
Heterogeneous data integration with the consensus clustering formalism

  • University of California at Davis

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Meaningfully integrating massive multi-experimental genomic data sets is becoming critical for understanding gene function. We have recently proposed methodologies for integrating large numbers of microarray data sets based on consensus clustering. Our methods combine gene clusters into a unified representation, or consensus, that is insensitive to misclassifications in the individual experiments. Here we extend their utility to heterogeneous data sets and focus on their refinement and improvement. First, we compare our best heuristic to the popular majority rule consensus clustering heuristic and show that the former yields tighter consensuses. We then refine our consensus algorithm by clustering the source-specific clusterings before finding the consensus between them, which improves our original results and increases their biological relevance. We demonstrate our methodology on three yeast data sets, with biologically interesting results. Finally, we show that our methodology deals successfully with missing experimental values.
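For readers unfamiliar with the majority rule baseline mentioned in the abstract, the following is a minimal illustrative sketch, not the authors' heuristic or their implementation. It assumes each source clustering is a mapping from gene to cluster label, that two genes are joined when they share a cluster in more than half of the clusterings that contain both (one possible way to tolerate missing values, assumed here), and that the consensus is the transitive closure of those joins. The function name `majority_rule_consensus` and the toy gene names are hypothetical.

```python
from itertools import combinations


def majority_rule_consensus(clusterings):
    """Sketch of a majority rule consensus over several clusterings.

    clusterings: list of dicts mapping item -> cluster label.
    An item absent from a dict is treated as missing in that source.
    """
    items = sorted({item for c in clusterings for item in c})
    parent = {item: item for item in items}  # union-find forest

    def find(x):
        # Find the representative of x, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        # Assumption: only clusterings that contain both items get a vote.
        both = [c for c in clusterings if a in c and b in c]
        together = sum(1 for c in both if c[a] == c[b])
        if both and 2 * together > len(both):
            parent[find(a)] = find(b)  # strict majority: merge the groups

    groups = {}
    for item in items:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(g) for g in groups.values())


# Toy example: g4 is missing from the second source.
sources = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 0, "g2": 1, "g3": 2, "g4": 2},
]
print(majority_rule_consensus(sources))
```

In this toy input, g1 and g2 co-cluster in two of three sources, and g3 and g4 co-cluster in both sources that contain them, so the sketch groups them as two consensus clusters. Merging by transitive closure can chain loosely related genes together, which is one reason a majority rule consensus may be looser than other heuristics.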
