Cluster analysis for gene expression data: A survey

  • SUNY Buffalo

Research output: Contribution to journal, Review article, peer-review

1023 Scopus citations

Abstract

DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than to the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms aimed specifically at gene expression data have recently been proposed. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.

Original language: English
Pages (from-to): 1370-1386
Number of pages: 17
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 16
Issue number: 11
State: Published - Nov 2004

Keywords

  • Clustering
  • Gene expression data
  • Microarray technology
