PROJECT TITLE:
Analyzing ASR Pretraining for Low-Resource Speech-to-Text Translation
ABSTRACT:
Previous research has demonstrated that pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language can improve automatic speech-to-text translation (AST) for low-resource source languages.
However, it is unclear which factors, such as language relatedness or the amount of pretraining data, yield the greatest benefits, or whether pretraining can be effectively combined with other methods such as data augmentation. We experiment with pretraining on a variety of datasets, including languages both related and unrelated to the AST source language.
We find that the word error rate of the pretrained ASR model is the strongest predictor of final AST performance, and that differences in ASR/AST performance correlate with how phonetic information is encoded in the later RNN layers of our model. We also show that pretraining and data augmentation yield complementary benefits for AST.
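To illustrate the transfer setup described above, the following minimal PyTorch-style sketch shows one way to pretrain an encoder-decoder on high-resource ASR data and then reuse the encoder weights to initialize the low-resource AST model. Class names, layer sizes, vocabulary sizes, and the attention-free decoder are illustrative assumptions, not the project's actual implementation.

# Minimal illustrative sketch of ASR pretraining followed by AST transfer.
# All names and dimensions below are hypothetical.
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Stacked bidirectional RNN encoder over speech features (e.g., filterbanks)."""
    def __init__(self, feat_dim=80, hidden=256, layers=3):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=layers,
                           batch_first=True, bidirectional=True)

    def forward(self, feats):
        out, _ = self.rnn(feats)        # (batch, time, 2 * hidden)
        return out

class Seq2Seq(nn.Module):
    """Encoder plus a simple decoder projecting to a target vocabulary
    (no attention, kept deliberately small for illustration)."""
    def __init__(self, encoder, vocab_size):
        super().__init__()
        self.encoder = encoder
        self.decoder = nn.LSTM(512, 512, batch_first=True)
        self.out = nn.Linear(512, vocab_size)

    def forward(self, feats):
        enc = self.encoder(feats)
        dec, _ = self.decoder(enc)
        return self.out(dec)            # per-frame scores over the vocabulary

# 1) Pretrain on high-resource ASR data (speech -> transcript in that language),
#    monitoring word error rate on a held-out set.
asr_model = Seq2Seq(SpeechEncoder(), vocab_size=5000)
# ... train asr_model on the ASR corpus ...

# 2) Transfer: initialize the AST encoder from the pretrained ASR encoder,
#    then fine-tune on the (possibly augmented) low-resource AST data.
ast_model = Seq2Seq(SpeechEncoder(), vocab_size=8000)
ast_model.encoder.load_state_dict(asr_model.encoder.state_dict())
# ... fine-tune ast_model on speech -> translation pairs ...

In this sketch only the encoder is transferred; in practice the decoder could also be reused when the ASR and AST target vocabularies are shared.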