PROJECT TITLE :
A novel lip descriptor for audio-visual keyword spotting based on adaptive decision fusion - 2016
Keyword spotting remains a challenge in real-world environments with dramatically changing noise. Recent studies have shown that audio-visual integration methods are advantageous, since visual speech is not influenced by acoustic noise. However, in visual speech recognition, individual utterance mannerisms can cause confusion and false recognition. To address this drawback, this paper presents a novel lip descriptor combining both geometry-based and appearance-based features. Specifically, a set of geometry-based features is derived from an advanced facial landmark localization technique. To obtain a robust and discriminative representation, a spatiotemporal lip feature is proposed that measures similarities among textons and maps the feature to an intra-category subspace. Moreover, a parallel two-step keyword spotting strategy based on decision fusion is proposed to make the best use of audio-visual speech and adapt to varying noise conditions: weights generated by a neural network combine the acoustic and visual contributions. Experimental results on the OuluVS and PKU-AV datasets demonstrate that the proposed lip descriptor achieves competitive performance compared to the state of the art. In addition, the proposed audio-visual keyword spotting (AV-KWS) method based on decision-level fusion significantly improves noise robustness, attains higher performance than feature-level fusion, and adapts well to diverse noisy conditions.
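The decision-level fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `adaptive_weight` function below is a hypothetical single-neuron stand-in for the paper's weight-generating neural network, and it assumes the acoustic reliability cue is a signal-to-noise ratio in dB. The per-keyword score vectors are invented for the example.

```python
import numpy as np

def fuse_scores(audio_scores, visual_scores, weight):
    """Decision-level fusion: convex combination of per-keyword scores
    from independent acoustic and visual keyword classifiers."""
    return weight * audio_scores + (1.0 - weight) * visual_scores

def adaptive_weight(snr_db, w=2.0, b=-10.0):
    """Toy stand-in for the paper's weight-generating network: a logistic
    gate that maps an acoustic reliability cue (assumed here to be SNR in
    dB) to an audio weight in (0, 1). High SNR -> trust audio more."""
    return 1.0 / (1.0 + np.exp(-(w * snr_db + b)))

# Hypothetical per-keyword scores (3 candidate keywords) from each modality.
audio = np.array([0.7, 0.2, 0.1])   # acoustic classifier favors keyword 0
visual = np.array([0.2, 0.7, 0.1])  # visual classifier favors keyword 1

clean = fuse_scores(audio, visual, adaptive_weight(20.0))   # clean audio
noisy = fuse_scores(audio, visual, adaptive_weight(-20.0))  # heavy noise

# Under clean conditions the fused decision follows the acoustic stream;
# under heavy noise it falls back to the visual stream.
print(int(np.argmax(clean)), int(np.argmax(noisy)))  # → 0 1
```

In the paper the weights are learned rather than hand-set, so the fusion can track noise conditions that a fixed weighting would mishandle.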