State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition

PROJECT TITLE:

State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition - 2015

ABSTRACT:

The hybrid deep neural network (DNN) and hidden Markov model (HMM) approach has recently achieved dramatic performance gains in automatic speech recognition (ASR). The DNN-based acoustic model is very powerful, but its training process is extremely time-consuming. In this paper, we propose a novel DNN-based acoustic modeling framework for speech recognition, in which the posterior probabilities of HMM states are computed from multiple DNNs (mDNN) instead of a single large DNN, for the purpose of parallel training towards faster turnaround. In the proposed mDNN method, all tied HMM states are first grouped into several disjoint clusters using data-driven methods. Next, several hierarchically structured DNNs are trained separately and in parallel for these clusters using multiple computing units (e.g. GPUs). In decoding, the posterior probabilities of HMM states are calculated by combining the outputs of the multiple DNNs. In this work, we show that the training procedure of the mDNN under common criteria, including both frame-level cross-entropy (CE) and sequence-level discriminative training, can be parallelized efficiently to yield a significant speedup. The training speedup is mainly attributed to the fact that the multiple DNNs are parallelized over multiple GPUs, and that each DNN is smaller in size and trained on only a subset of the training data. We have evaluated the proposed mDNN method on a 64-hour Mandarin transcription task and the 320-hour Switchboard task. Compared to the conventional DNN, a four-cluster mDNN model of similar size yields comparable recognition performance on Switchboard (only about 2% performance degradation), with a more than 7-fold speedup in CE training and a 2.9-fold speedup in sequence training when four GPUs are used.
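The abstract says that state posteriors are obtained in decoding by combining the outputs of multiple DNNs, but it does not state the combination rule. The sketch below illustrates one plausible reading, assuming a hierarchical factorization p(s | x) = p(c | x) * p(s | c, x), where a cluster-level network scores the clusters and one network per cluster scores only the states in that cluster. The function names, the toy logits, and the factorization itself are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriors(cluster_logits, state_logits_per_cluster, clusters):
    """Combine per-cluster DNN outputs into one posterior over all tied states.

    Assumed (hypothetical) factorization for a single frame x:
        p(state | x) = p(cluster | x) * p(state | cluster, x)

    cluster_logits: output of an assumed cluster-level network, one per cluster.
    state_logits_per_cluster[c]: output of the network for cluster c,
        one logit per state in clusters[c].
    clusters[c]: global ids of the tied states in cluster c (disjoint sets).
    """
    n_states = sum(len(ids) for ids in clusters)
    posterior = [0.0] * n_states
    p_cluster = softmax(cluster_logits)
    for c, state_ids in enumerate(clusters):
        p_state_given_cluster = softmax(state_logits_per_cluster[c])
        for local_idx, state_id in enumerate(state_ids):
            posterior[state_id] = p_cluster[c] * p_state_given_cluster[local_idx]
    return posterior


# Toy frame: 5 tied states split into 2 disjoint clusters (made-up numbers).
clusters = [[0, 1, 2], [3, 4]]
cluster_logits = [0.2, -0.1]
state_logits = [[1.0, 0.0, -1.0], [0.5, 0.5]]
posterior = combine_posteriors(cluster_logits, state_logits, clusters)
```

Because the clusters are disjoint and each softmax sums to one, the combined vector is itself a valid distribution over all tied states. Under this assumed scheme, each per-cluster network needs only the frames and output units of its own cluster, which is consistent with the abstract's remark that each DNN is smaller and trained on a subset of the data.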
