TY - GEN
T1 - Learning to represent bilingual dictionaries
AU - Chen, Muhao
AU - Tian, Yingtao
AU - Chen, Haochen
AU - Chang, Kai-Wei
AU - Skiena, Steven
AU - Zaniolo, Carlo
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics.
PY - 2019
Y1 - 2019
N2 - Bilingual word embeddings have been widely used to capture the correspondence of lexical semantics in different human languages. However, the cross-lingual correspondence between sentences and words is less studied, even though this correspondence can significantly benefit many applications such as cross-lingual semantic search and textual inference. To bridge this gap, we propose a neural embedding model that leverages bilingual dictionaries. The proposed model is trained to map lexical definitions to the cross-lingual target words, for which we explore different sentence encoding techniques. To enhance the learning process on limited resources, our model adopts several critical learning strategies, including multi-task learning on different bridges of languages, and joint learning of the dictionary model with a bilingual word embedding model. We conduct experiments on two new tasks. In the cross-lingual reverse dictionary retrieval task, we demonstrate that our model is capable of comprehending bilingual concepts based on descriptions, and that the proposed learning strategies are effective. In the bilingual paraphrase identification task, we show that our model effectively associates sentences in different languages via a shared embedding space, and outperforms existing approaches in identifying bilingual paraphrases.
AB - Bilingual word embeddings have been widely used to capture the correspondence of lexical semantics in different human languages. However, the cross-lingual correspondence between sentences and words is less studied, even though this correspondence can significantly benefit many applications such as cross-lingual semantic search and textual inference. To bridge this gap, we propose a neural embedding model that leverages bilingual dictionaries. The proposed model is trained to map lexical definitions to the cross-lingual target words, for which we explore different sentence encoding techniques. To enhance the learning process on limited resources, our model adopts several critical learning strategies, including multi-task learning on different bridges of languages, and joint learning of the dictionary model with a bilingual word embedding model. We conduct experiments on two new tasks. In the cross-lingual reverse dictionary retrieval task, we demonstrate that our model is capable of comprehending bilingual concepts based on descriptions, and that the proposed learning strategies are effective. In the bilingual paraphrase identification task, we show that our model effectively associates sentences in different languages via a shared embedding space, and outperforms existing approaches in identifying bilingual paraphrases.
UR - https://www.scopus.com/pages/publications/85084335425
M3 - Conference contribution
T3 - CoNLL 2019 - 23rd Conference on Computational Natural Language Learning, Proceedings of the Conference
SP - 152
EP - 162
BT - CoNLL 2019 - 23rd Conference on Computational Natural Language Learning, Proceedings of the Conference
PB - Association for Computational Linguistics
T2 - 23rd Conference on Computational Natural Language Learning, CoNLL 2019
Y2 - 3 November 2019 through 4 November 2019
ER -