ABSTRACT:

We propose a novel framework for multimodal video indexing and retrieval that uses shrinkage-optimized directed information assessment (SODA) as a similarity measure. Directed information (DI) is a variant of classical mutual information that captures the direction of information flow naturally present in video. It is applied directly to the empirical probability distributions of audio and visual features over successive frames. We use RASTA-PLP features to represent audio and SIFT features to represent the visual stream, and fuse the two modalities by computing the joint probability density function of the audio and visual features. With SODA, we estimate the DI between pairs of audio-video modalities in a manner suited to high feature dimension $p$ and small sample size $n$ (the large $p$, small $n$ regime). We demonstrate the superiority of the SODA approach for video indexing, retrieval, and activity recognition over state-of-the-art methods such as hidden Markov models (HMMs), support vector machines (SVMs), cross-media indexing space (CMIS), and noncausal divergence measures such as mutual information (MI). We also demonstrate the success of SODA in audio-video localization and in indexing/retrieval of data with misaligned modalities.
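The abstract's core idea, estimating directed information from empirical probabilities with shrinkage regularization in the large $p$, small $n$ regime, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the audio and visual features have already been vector-quantized into discrete symbols, uses a first-order Markov approximation of DI (summing $I(X_t; Y_t \mid Y_{t-1})$ terms) rather than the full causally conditioned form, and applies a James-Stein-style shrinkage of cell probabilities toward a uniform target. All function names here are hypothetical.

```python
import numpy as np

def shrinkage_probs(counts):
    """Shrink empirical cell probabilities toward a uniform target
    (James-Stein-style; stabilizes estimates when the sample size is
    small relative to the number of cells)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_ml = counts / n                      # maximum-likelihood estimate
    k = counts.size
    target = np.full(k, 1.0 / k)           # uniform shrinkage target
    # Data-driven shrinkage intensity lambda in [0, 1]
    num = np.sum(p_ml * (1.0 - p_ml)) / (n - 1.0)
    den = np.sum((target - p_ml) ** 2)
    lam = 1.0 if den == 0 else min(1.0, max(0.0, num / den))
    return lam * target + (1.0 - lam) * p_ml

def directed_information_order1(x, y, nx, ny):
    """First-order plug-in approximation of directed information
    I(X -> Y) = sum_t I(X_t; Y_t | Y_{t-1}), in bits, estimated from
    shrinkage-regularized joint probabilities of (y_{t-1}, x_t, y_t)."""
    x, y = np.asarray(x), np.asarray(y)
    counts = np.zeros((ny, nx, ny))
    for t in range(1, len(x)):
        counts[y[t - 1], x[t], y[t]] += 1
    p = shrinkage_probs(counts.ravel()).reshape(ny, nx, ny)
    p_z = p.sum(axis=(1, 2))               # P(y_{t-1})
    p_zx = p.sum(axis=2)                   # P(y_{t-1}, x_t)
    p_zy = p.sum(axis=1)                   # P(y_{t-1}, y_t)
    di = 0.0
    for a in range(ny):                    # conditional mutual information
        for b in range(nx):
            for c in range(ny):
                if p[a, b, c] > 0:
                    di += p[a, b, c] * np.log2(
                        p[a, b, c] * p_z[a] / (p_zx[a, b] * p_zy[a, c]))
    return di
```

Because DI is asymmetric, `directed_information_order1(x, y, ...)` and `directed_information_order1(y, x, ...)` generally differ, which is what lets a DI-based similarity score reflect the direction of information flow between the audio and visual streams.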


