Methods for constructing and evaluating consensus genomic interval sets

  • Julia Rymuza
  • Yuchen Sun
  • Guangtao Zheng
  • Nathan J. Leroy
  • Maria Murach
  • Neil Phan
  • Aidong Zhang
  • Nathan C. Sheffield

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.

Original language: English
Pages (from-to): 10119-10131
Number of pages: 13
Journal: Nucleic Acids Research
Volume: 52
Issue number: 17
State: Published - Sep 23 2024
