TY - GEN
T1 - Word-based dialect identification with georeferenced rules
AU - Scherrer, Yves
AU - Rambow, Owen
PY - 2010
Y1 - 2010
AB - We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character-n-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequently encountered in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, which trained models struggle to achieve due to sparsity of training data.
UR - https://www.scopus.com/pages/publications/80053231143
M3 - Conference contribution
SN - 1932432868
SN - 9781932432862
T3 - EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 1151
EP - 1161
BT - EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2010
Y2 - 9 October 2010 through 11 October 2010
ER -