
Resolving read assignment ambiguities in metagenomic clustering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Clustering is a popular technique for analyzing metagenomic data. Specifically, it is used to assign metagenomic reads to clusters, each cluster representing a species or a higher-level taxonomic unit. Because homologous subsequences shared by multiple species are difficult to tell apart, and because no perfect similarity measure between reads exists, it is not possible to deduce a correct assignment of reads to clusters. Metagenomic clustering methods must therefore either allow ambiguity or make the best available choice at each read assignment stage, which could lead to incorrect clusters and potentially cascading errors. In this paper, we argue for first generating an ambiguous clustering and then resolving the ambiguities collectively by analyzing the ambiguous clusters. We propose a rigorous formulation of this problem and show that it is NP-hard. We then propose an efficient heuristic to solve it in practice. We validate our approach on several synthetically generated datasets and on a metagenomic dataset of 16S rRNA sequences from the gut microbiome.
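The abstract does not state the problem formulation or the heuristic itself, so the following is only an illustrative sketch under an assumed formulation, not the authors' method. Assume a parsimony criterion: given an ambiguous clustering in which each read lists its candidate clusters, choose as few clusters as possible that still account for every read. This is an instance of set cover, which is NP-hard in general, so the sketch resolves the ambiguities with the standard greedy set-cover heuristic. The function name `resolve_ambiguities` and the input format (read ID mapped to a set of candidate cluster IDs) are hypothetical.

```python
from collections import defaultdict


def resolve_ambiguities(candidates: dict[str, set[str]]) -> dict[str, str]:
    """Hypothetical greedy set-cover resolution of ambiguous read assignments.

    candidates maps each read ID to the set of clusters it might belong to
    (the "ambiguous clustering"). Returns a mapping from each read ID to
    exactly one cluster, favoring as few distinct clusters as possible.
    """
    # Invert the ambiguous clustering: cluster -> reads that could belong to it.
    members: dict[str, set[str]] = defaultdict(set)
    for read, clusters in candidates.items():
        if not clusters:
            raise ValueError(f"read {read!r} has no candidate cluster")
        for cluster in clusters:
            members[cluster].add(read)

    unassigned = set(candidates)
    assignment: dict[str, str] = {}
    while unassigned:
        # Greedy step: pick the cluster that absorbs the most still-unassigned
        # reads; ties go to the lexicographically smallest cluster ID so the
        # result is deterministic. Every unassigned read has a candidate, so
        # the chosen cluster always absorbs at least one read.
        best = min(members, key=lambda c: (-len(members[c] & unassigned), c))
        absorbed = members[best] & unassigned
        for read in absorbed:
            assignment[read] = best
        unassigned -= absorbed
    return assignment


if __name__ == "__main__":
    ambiguous = {
        "r1": {"A"},
        "r2": {"A", "B"},
        "r3": {"B", "C"},
        "r4": {"C"},
        "r5": {"A", "C"},
    }
    print(resolve_ambiguities(ambiguous))
```

In this toy input, clusters A and C each explain three reads, so A is chosen first (tie broken by ID) and takes r1, r2 and r5; C then takes the remaining r3 and r4, and cluster B is never used. Greedy set cover runs in polynomial time but carries no optimality guarantee beyond the usual logarithmic approximation bound, which is the kind of trade-off an efficient heuristic for an NP-hard formulation accepts.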

Original language: English
Title of host publication: 5th International Conference on Bioinformatics and Computational Biology 2013, BICoB 2013
Pages: 73-80
Number of pages: 8
State: Published - 2013
Event: 5th International Conference on Bioinformatics and Computational Biology 2013, BICoB 2013, Honolulu, HI, United States
Duration: Mar 4 2013 → Mar 6 2013

Publication series

Name: 5th International Conference on Bioinformatics and Computational Biology 2013, BICoB 2013

Conference

Conference: 5th International Conference on Bioinformatics and Computational Biology 2013, BICoB 2013
Country/Territory: United States
City: Honolulu, HI
Period: 3/4/13 → 3/6/13

