Automatic Multilingual Question Generation for Health Data Using LLMs

  • SUNY Old Westbury

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Question Generation (QG) is the automatic generation of yes/no, factual, and Wh-questions from data sources such as a database, raw text, or a semantic representation. QG can be used in an adaptive intelligent tutoring system or a dialog system to improve question answering in various text generation tasks. Traditional QG has relied on syntactic rules with linguistic features to generate factoid questions. More recent research, however, has proposed using pre-trained Transformer-based models to generate questions that are more aware of the answers. The goal of this study was to create a multilingual (English and Spanish) database of automatically generated question sets using artificial intelligence (AI), machine learning (ML), and, in particular, large language models (LLMs), for a culturally sensitive health intelligent tutoring system (ITS) for the Hispanic population. Several language models (LMs) were chosen for our experiments: ChatGPT, valhalla/t5-based-e2e-qg, T5 (Small, Base, and Large), mrm8488/bert2bert-spanish-question-generation, mT5 (Small and Base), Flan-T5 (Small, Base, and Large), BART (Base and Large), and mBART (Large). Each model was given a prompt to produce a set of questions (3, 5, 7, or 10) using transcribed texts as the context. We observed that a text or transcript of at least 100 words was sufficient to generate 5–7 questions of reasonable quality. When models were prompted to produce 10 or more questions from texts of 100 words or fewer, the meaningfulness, syntax, and semantic soundness of the outputs decreased notably.

Original language: English
Title of host publication: AI-generated Content - 1st International Conference, AIGC 2023, Revised Selected Papers
Editors: Feng Zhao, Duoqian Miao
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 1-11
Number of pages: 11
ISBN (Print): 9789819975860
State: Published - 2024
Event: 1st International Conference on AI-generated Content, AIGC 2023 - Shanghai, China
Duration: Aug 25 2023 → Aug 26 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 1946 CCIS

Conference

Conference: 1st International Conference on AI-generated Content, AIGC 2023
Country/Territory: China
City: Shanghai
Period: 08/25/23 → 08/26/23

Keywords

  • Healthcare
  • LLMs
  • Multilingual
  • Question Generation
