PROJECT TITLE:
Learning Spatial and Temporal Extents of Human Actions for Action Detection
For the problem of action detection, most existing methods require that relevant portions of the action of interest in training videos be manually annotated with bounding boxes. Some recent works avoid this tedious manual annotation and propose to automatically identify the relevant portions in training videos. However, these methods address identification in either the spatial or the temporal domain alone, and may capture irrelevant content from the other domain. Such irrelevant content is typically undesirable in training, as it can degrade detection performance. This paper advances previous work by proposing a joint learning framework that simultaneously identifies the spatial and temporal extents of the action of interest in training videos. To obtain pixel-level localization results, our method uses dense trajectories extracted from videos as low-level features to represent actions. We first present a trajectory split-and-merge algorithm that segments a video into the background and several separate foreground moving objects. In this algorithm, the inherent temporal smoothness of human actions is exploited to facilitate segmentation. Then, with a latent SVM framework built on the segmentation results, the spatial and temporal extents of the action of interest are treated as latent variables that are inferred jointly with action recognition. Experiments on two challenging datasets show that action detection with our learned spatial and temporal extents outperforms state-of-the-art methods.
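The latent-variable idea above can be sketched in a few lines: the spatio-temporal extent of the action is a latent variable h, and detection scores a video by maximizing a linear score w·φ(x, h) over candidate extents. This is a minimal illustrative sketch, not the authors' implementation; the function `score_action`, the mean-pooled feature φ, and the representation of extents as index arrays over trajectories are all assumptions made for illustration. In the paper, candidate extents would come from the trajectory split-and-merge segmentation.

```python
import numpy as np

def score_action(w, traj_features, candidate_extents):
    """Score a video under a latent-SVM-style model (illustrative sketch).

    w                 : weight vector, shape (d,)
    traj_features     : per-trajectory descriptors, shape (n_traj, d)
    candidate_extents : list of index arrays; each hypothesizes which
                        trajectories lie inside the action's extent
    Returns (best_score, best_extent_index).
    """
    best_score, best_idx = -np.inf, -1
    for i, h in enumerate(candidate_extents):
        # phi(x, h): pool (here, average) the descriptors of the
        # trajectories falling inside the hypothesized extent h
        phi = traj_features[h].mean(axis=0)
        s = float(w @ phi)
        if s > best_score:
            best_score, best_idx = s, i
    return best_score, best_idx
```

At training time, the same maximization over h is performed inside the latent SVM loop, so the model learns w while simultaneously inferring which spatial and temporal extent best explains each positive example.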