Action Classification Using Interaction-Aware Spatio-Temporal Pyramid Attention Networks PROJECT TITLE : Interaction-Aware Spatio-Temporal Pyramid Attention Networks for Action Classification ABSTRACT: When it comes to CNN-based visual action recognition, the accuracy of the process could be improved by concentrating on local key action regions. The task of self-attention is to concentrate on important aspects while ignoring information that is not pertinent. Therefore, paying attention to one's own actions can be beneficial for action recognition. However, most of the time, the current methods of self-attention ignore the correlations that exist among the local feature vectors located at spatial positions in CNN feature maps. In this paper, we propose an efficient interaction-aware self-attention model that is able to learn attention maps by extracting information about the interactions that occur between feature vectors. For the purpose of modeling attention, we introduce a spatial pyramid that contains feature maps at various layers. This is due to the fact that the various layers of a network each capture feature maps at a different scale. The information from multiple scales is utilized in order to arrive at attention scores that are more accurate. After using these attention scores to weight the local feature vectors of the feature maps, attentional feature maps are then calculated using those weighted local feature vectors. Because there is no limit placed on the number of feature maps that can be fed into the spatial pyramid attention layer, it is simple for us to extend this attention layer into a spatio-temporal variant. To create a video-level, end-to-end attention network for action recognition, our model can be incorporated into any general CNN. In order to obtain precise forecasts of human behavior, researchers are looking into a number of different ways to combine the RGB and flow streams. Our method has been shown to achieve state-of-the-art results on the datasets UCF101, HMDB51, Kinetics-400, and untrimmed Charades, according to the findings of our experimental research. Did you like this research project? To get this research project Guidelines, Training and Code... Click Here facebook twitter google+ linkedin stumble pinterest Drowsiness detection through the learning of multimodal representations Weakly-Supervised Disease Localization in X-Ray Images Using the GREN Graph-Regularized Embedding Network