Abstract
Motivation k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers. Results Squeakr takes 2×-4.3× less time than the state-of-the-art to count and perform a random-point-query workload. Squeakr is memory-efficient, consuming 1.5×-4.3× less memory than the state-of-the-art. It offers competitive counting performance. In fact, it is faster for larger k-mers, and answers point queries (i.e. queries for the abundance of a particular k-mer) over an order-of-magnitude faster than other systems. The Squeakr representation of the k-mer multiset turns out to be immediately useful for downstream processing (e.g. de Bruijn graph traversal) because it supports fast queries and dynamic k-mer insertion, deletion, and modification. Availability and implementation https://github.com/splatlab/squeakr available under BSD 3-Clause License. Contact [email protected] Supplementary informationSupplementary dataare available at Bioinformatics online.
| Original language | English |
|---|---|
| Pages (from-to) | 568-575 |
| Number of pages | 8 |
| Journal | Bioinformatics |
| Volume | 34 |
| Issue number | 4 |
| DOIs | |
| State | Published - Feb 15 2018 |
Fingerprint
Dive into the research topics of 'Squeakr: An exact and approximate k -mer counting system'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver