TY - GEN
T1 - Adaptive bi-directional attention
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
AU - Chen, Nuo
AU - Liu, Fenglin
AU - You, Chenyu
AU - Zhou, Peilin
AU - Zou, Yuexian
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - Recently, attention-enhanced multi-layer encoders such as the Transformer have been extensively studied in Machine Reading Comprehension (MRC). To predict the answer, it is common practice to employ a predictor that draws information only from the final encoder layer, which yields coarse-grained representations of the source sequences, i.e., the passage and the question. Previous studies have shown that the representation of the source sequence shifts from fine-grained to coarse-grained as the encoding layers deepen. It is generally believed that, as the number of layers in a deep neural network grows, the encoding process aggregates increasingly more information at each position, producing more coarse-grained representations and thereby increasing each position's similarity to other positions (i.e., homogeneity). This phenomenon can mislead the model into wrong judgments and degrade performance. To this end, we propose a novel approach called Adaptive Bidirectional Attention, which adaptively provides the predictor with source representations from different encoder layers. Experimental results on the benchmark dataset SQuAD 2.0 demonstrate the effectiveness of our approach, which outperforms the previous state-of-the-art model by 2.5% EM and 2.3% F1.
KW - Machine reading comprehension
KW - Multi-granularity representation
KW - Natural language processing
KW - Question answering
UR - https://www.scopus.com/pages/publications/85115172738
U2 - 10.1109/ICASSP39728.2021.9414067
DO - 10.1109/ICASSP39728.2021.9414067
M3 - Conference contribution
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7833
EP - 7837
BT - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 June 2021 through 11 June 2021
ER -
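
Editor's note: the sketch below is a minimal, illustrative reading aid for the idea in the abstract, namely feeding the predictor an adaptive fusion of representations from all encoder layers rather than only the final one. It is written in PyTorch; all names (AdaptiveLayerFusion, layer_logits, proj) are hypothetical, and the scalar per-layer gating shown is only one simple way such adaptive exploitation could be realized. It is not the authors' released implementation.

# Minimal sketch: adaptively fuse multi-layer encoder representations
# before the answer predictor. Hypothetical module, not the paper's code.
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLayerFusion(nn.Module):
    """Learn per-layer mixing weights so the predictor can draw on
    fine-grained (lower-layer) as well as coarse-grained (upper-layer)
    representations, instead of only the final encoder layer."""

    def __init__(self, num_layers: int, hidden_dim: int):
        super().__init__()
        # One learnable logit per encoder layer (assumption: scalar gates
        # shared across positions; the paper may use a richer mechanism).
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, layer_states: List[torch.Tensor]) -> torch.Tensor:
        # layer_states: one [batch, seq_len, hidden_dim] tensor per layer.
        stacked = torch.stack(layer_states, dim=0)             # [L, B, T, H]
        weights = F.softmax(self.layer_logits, dim=0)          # [L]
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # [B, T, H]
        return self.proj(fused)

# Toy usage: 6 encoder layers, hidden size 128, batch of 2, 40 tokens.
layers = [torch.randn(2, 40, 128) for _ in range(6)]
fusion = AdaptiveLayerFusion(num_layers=6, hidden_dim=128)
fused = fusion(layers)
print(fused.shape)  # torch.Size([2, 40, 128]), fed to the span predictor

Because the mixing weights are learned, the model can shift probability mass toward lower (fine-grained) or higher (coarse-grained) layers as training demands, which is the intuition the abstract attributes to Adaptive Bidirectional Attention.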