Authors
Ryo Fukuda Katsuhito Sudoh Satoshi Nakamura
Publisher
The Association for Natural Language Processing
Journal
自然言語処理 (Journal of Natural Language Processing, ISSN: 1340-7619)
Volume, issue, pages, and publication date
vol.29, no.2, pp.344-366, 2022 (Released:2022-06-15)
Number of references
42

Recent studies consider knowledge distillation a promising method for end-to-end speech translation (ST). However, its usefulness in cascade ST, which combines automatic speech recognition (ASR) and machine translation (MT) models, has not yet been clarified. ASR output typically contains recognition errors, and an MT model trained only on human transcripts performs poorly on such erroneous ASR output; the MT model should therefore be trained to account for the ASR errors it will face during inference. In this paper, we propose using knowledge distillation to train the MT model of a cascade ST system so that it becomes robust against ASR errors. We distilled knowledge from a teacher model that reads human transcripts to a student model that reads erroneous ASR transcriptions. Our experimental results showed that the proposed method improves translation performance on erroneous transcriptions. Further investigation combining knowledge distillation and fine-tuning consistently improved performance on two different datasets: MuST-C English--Italian and Fisher Spanish--English.
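The training objective described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration (not the authors' released code) of token-level knowledge distillation in this setting: the teacher MT model reads the clean human transcript, the student MT model reads the ASR hypothesis of the same utterance, and the student's loss interpolates cross-entropy on the reference translation with a KL term towards the teacher's output distribution. The function name, tensor shapes, and hyperparameters (alpha, temperature) are illustrative assumptions.

```python
# Minimal sketch of token-level knowledge distillation for cascade ST.
# Assumptions: student_logits come from the student MT model fed the ASR
# hypothesis, teacher_logits from a frozen teacher MT model fed the clean
# human transcript of the same utterance; both decode the same target side.

import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, reference_ids,
                      pad_id, alpha=0.5, temperature=1.0):
    """Interpolate cross-entropy on the reference translation with
    KL divergence towards the teacher's (detached) output distribution.

    student_logits: (batch, tgt_len, vocab), student on the ASR transcript
    teacher_logits: (batch, tgt_len, vocab), teacher on the human transcript
    reference_ids:  (batch, tgt_len), gold target token ids
    """
    vocab = student_logits.size(-1)

    # Standard cross-entropy against the human reference translation.
    ce = F.cross_entropy(
        student_logits.reshape(-1, vocab),
        reference_ids.reshape(-1),
        ignore_index=pad_id,
    )

    # Token-level distillation: match the teacher's softened distribution.
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits.detach() / t, dim=-1)
    kd = F.kl_div(student_logp, teacher_p, reduction="none").sum(-1)

    # Average the KL term over non-padding target positions only.
    mask = reference_ids.ne(pad_id).float()
    kd = (kd * mask).sum() / mask.sum()

    return alpha * ce + (1.0 - alpha) * (t * t) * kd
```

In a training step, teacher_logits would be produced by the frozen teacher on the human transcript and student_logits by the student on the corresponding ASR hypothesis. The exact form of distillation (token-level vs. sequence-level) and its combination with fine-tuning follow the paper's experimental setup rather than this sketch.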