Recent video segmentation algorithms often exploits both spatial and temporal attention mechanisms. Research in
human action recognition has accelerated significantly since the introduction of powerful tools such as
Convolutional Neural Networks (CNNs). However, effective methods for incorporation of temporal information into CNNs are still being actively explored. Motivated by the popular recurrent attention models in
natural language processing, the Attention-aware Temporal Weighted CNN (ATW CNN) is proposed in videos, which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented as temporal weighting and it effectively boosts the recognition performance of video representations. Besides, each stream in the proposed ATW CNN framework is capable of end-to-end training, with both network parameters and temporal weights optimized by
stochastic gradient descent (SGD) with
back-propagation. Experimental results show that the ATW CNN attention mechanism contributes substantially to the performance gains with the more discriminative snippets by focusing on more relevant video segments. == Literature ==