TY - GEN
T1 - Automatic Multilingual Question Generation for Health Data Using LLMs
AU - Ackerman, Ryan
AU - Balyan, Renu
N1 - Publisher Copyright: © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
N2 - Question Generation (QG) involves the automatic generation of yes/no, factual, and Wh-questions from data sources such as a database, raw text, or a semantic representation. QG can be used in an adaptive intelligent tutoring system or a dialog system to improve question answering in various text generation tasks. Traditional QG has used syntactic rules with linguistic features to generate factoid questions; however, more recent research has proposed using pre-trained Transformer-based models to generate questions that are more aware of the answers. The goal of this study was to create a multilingual database (English and Spanish) of automatically generated question sets using artificial intelligence (AI), machine learning (ML), and large language models (LLMs) in particular, for a culturally sensitive health intelligent tutoring system (ITS) for the Hispanic population. Several language models (LMs), including ChatGPT, valhalla/t5-base-e2e-qg, T5 (Small, Base, and Large), mrm8488/bert2bert-spanish-question-generation, mT5 (Small and Base), Flan-T5 (Small, Base, and Large), BART (Base and Large), and mBART (Large), were chosen for our experiments; each was given a prompt to produce a set of questions (3, 5, 7, or 10) using transcribed texts as the context. We observed that a text/transcript of at least 100 words was sufficient to generate 5–7 questions of reasonable quality. When models were prompted to produce 10 or more questions from texts of 100 words or fewer, the meaningfulness, syntactic correctness, and semantic soundness of the outputs decreased notably.
AB - Question Generation (QG) involves the automatic generation of yes/no, factual, and Wh-questions from data sources such as a database, raw text, or a semantic representation. QG can be used in an adaptive intelligent tutoring system or a dialog system to improve question answering in various text generation tasks. Traditional QG has used syntactic rules with linguistic features to generate factoid questions; however, more recent research has proposed using pre-trained Transformer-based models to generate questions that are more aware of the answers. The goal of this study was to create a multilingual database (English and Spanish) of automatically generated question sets using artificial intelligence (AI), machine learning (ML), and large language models (LLMs) in particular, for a culturally sensitive health intelligent tutoring system (ITS) for the Hispanic population. Several language models (LMs), including ChatGPT, valhalla/t5-base-e2e-qg, T5 (Small, Base, and Large), mrm8488/bert2bert-spanish-question-generation, mT5 (Small and Base), Flan-T5 (Small, Base, and Large), BART (Base and Large), and mBART (Large), were chosen for our experiments; each was given a prompt to produce a set of questions (3, 5, 7, or 10) using transcribed texts as the context. We observed that a text/transcript of at least 100 words was sufficient to generate 5–7 questions of reasonable quality. When models were prompted to produce 10 or more questions from texts of 100 words or fewer, the meaningfulness, syntactic correctness, and semantic soundness of the outputs decreased notably.
KW - Healthcare
KW - LLMs
KW - Multilingual
KW - Question Generation
UR - https://www.scopus.com/pages/publications/85177171415
U2 - 10.1007/978-981-99-7587-7_1
DO - 10.1007/978-981-99-7587-7_1
M3 - Conference contribution
SN - 9789819975860
T3 - Communications in Computer and Information Science
SP - 1
EP - 11
BT - AI-generated Content - 1st International Conference, AIGC 2023, Revised Selected Papers
A2 - Zhao, Feng
A2 - Miao, Duoqian
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st International Conference on AI-generated Content, AIGC 2023
Y2 - 25 August 2023 through 26 August 2023
ER -