IdeaBench: Benchmarking Large Language Models for Research Idea Generation

  • Sikun Guo
  • Amir Hassan Shariatmadari
  • Guangzhi Xiong
  • Albert Huang
  • Myles Kim
  • Corey M. Williams
  • Stefan Bekiranov
  • Aidong Zhang
  • University of Virginia

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

Large Language Models (LLMs) have revolutionized interactions between human and artificial intelligence (AI) systems, demonstrating state-of-the-art performance across various domains, including scientific discovery and hypothesis generation. However, the absence of a comprehensive and systematic evaluation framework for LLM-driven research idea generation hinders a rigorous understanding of their strengths and limitations. To address this gap, we propose IdeaBench, a benchmark system that provides a structured dataset and evaluation framework for standardizing the assessment of research idea generation by LLMs. Our dataset comprises titles and abstracts from 2,374 influential papers across eight research domains, along with their 29,408 referenced works, creating a context-rich environment that mirrors human researchers' ideation processes. By profiling LLMs as domain-specific researchers and grounding them in similar contextual constraints, we directly leverage the models' knowledge learned from the pre-training stage to generate new research ideas. To systematically evaluate LLMs' research ideation capability and approximate human assessment, we propose a reference-based metric that aligns with human judgment to quantify idea quality with the assistance of LLMs. Through this evaluation, we find that while LLMs excel at generating novel ideas, they may struggle with generating feasible ideas. IdeaBench serves as a critical resource for benchmarking and comparing LLMs, ultimately advancing research on AI's role in automating scientific discovery.

Original language: English
Title of host publication: KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 5888-5899
Number of pages: 12
ISBN (Electronic): 9798400714542
State: Published - Aug 3 2025
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: Aug 3 2025 to Aug 7 2025

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/Territory: Canada
City: Toronto
Period: 08/3/25 to 08/7/25

Keywords

  • AI for science
  • hypothesis generation
  • large language models
