Skip to main navigation Skip to search Skip to main content

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

31 Scopus citations

Abstract

Spoken question answering (SQA) has recently drawn considerable attention in the speech community. It requires systems to find correct answers from the given spoken passages simultaneously. The common SQA systems consist of the automatic speech recognition (ASR) module and text-based question answering module. However, previous methods suffer from severe performance degradation due to ASR errors. To alleviate this problem, this work proposes a novel multi-modal residual knowledge distillation method (MRD-Net), which further distills knowledge at the acoustic level from the audio-assistant (Audio-A). Specifically, we utilize the teacher (T) trained on manual transcriptions to guide the training of the student (S) on ASR transcriptions. We also show that introducing an Audio-A helps this procedure by learning residual errors between T and S. Moreover, we propose a simple yet effective attention mechanism to adaptively leverage audio-text features as the new deep attention knowledge to boost the network performance. Extensive experiments demonstrate that the proposed MRD-Net achieves superior results compared with state-of-the-art methods on three spoken question answering benchmark datasets.

Original languageEnglish
Title of host publicationProceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
EditorsZhi-Hua Zhou
PublisherInternational Joint Conferences on Artificial Intelligence
Pages3985-3991
Number of pages7
ISBN (Electronic)9780999241196
DOIs
StatePublished - 2021
Event30th International Joint Conference on Artificial Intelligence, IJCAI 2021 - Virtual, Online, Canada
Duration: Aug 19 2021Aug 27 2021

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence

Conference

Conference30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Country/TerritoryCanada
CityVirtual, Online
Period08/19/2108/27/21

Fingerprint

Dive into the research topics of 'MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering'. Together they form a unique fingerprint.

Cite this