Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping

H Dai, Y Guan - Bioinformatics, 2020 - academic.oup.com
We present Nubeam-dedup, a fast and RAM-efficient tool to de-duplicate sequencing reads
without reference genome. Nubeam-dedup represents nucleotides by matrices, transforms …

ParDRe: faster parallel duplicated reads removal tool for sequencing studies

J González-Domínguez, B Schmidt - Bioinformatics, 2016 - academic.oup.com
Current next generation sequencing technologies often generate duplicated or near-
duplicated reads that (depending on the application scenario) do not provide any interesting …

Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers

Y Liu, X Zhang, Q Zou, X Zeng - Bioinformatics, 2021 - academic.oup.com
Removing duplicate and near-duplicate reads, generated by high-throughput sequencing
technologies, is able to reduce computational resources in downstream applications. Here …

A study on optimizing markduplicate in genome sequencing pipeline

Q Zhao - Proceedings of the 5th International Conference on …, 2018 - dl.acm.org
MarkDuplicate is typically one of the most time-consuming operations in the whole genome
sequencing pipeline. Picard tool, which is widely used by biologists to sort reads in genome …

Umi-reducer: collapsing duplicate sequencing reads via unique molecular identifiers

S Mangul, SV Driesche, LS Martin, KC Martin, E Eskin - bioRxiv, 2017 - biorxiv.org
Summary: Every sequencing library contains duplicate reads.
While many duplicates arise during polymerase chain reaction (PCR), some duplicates …

SAMBLASTER: fast duplicate marking and structural variant read extraction

GG Faust, IM Hall - Bioinformatics, 2014 - academic.oup.com
Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data,
and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble …

Super deduper, fast PCR duplicate detection in fastq files

KR Petersen, DA Streett, AT Gerritsen… - Proceedings of the 6th …, 2015 - dl.acm.org
Our goal was to explore the accuracy and utility of identifying and removing PCR duplicates
from HTS data using Super Deduper. Super Deduper is a pre-alignment, sequence read …

Removing duplicate reads using graphics processing units

A Manconi, M Moscatelli, G Armano, M Gnocchi… - BMC …, 2016 - Springer
Background: During library construction polymerase chain reaction is used to enrich the DNA
before sequencing. Typically, this process generates duplicate read sequences. Removal of …

Fulcrum: condensing redundant reads from high-throughput sequencing studies

MS Burriesci, EM Lehnert, JR Pringle - Bioinformatics, 2012 - academic.oup.com
Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads,
which can consume computational resources in downstream applications. A tool that …

Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data

S Chen, Y Zhou, Y Chen, T Huang, W Liao, Y Xu, Z Li… - BMC Bioinformatics, 2019 - Springer
Background: Removing duplicates might be considered as a well-resolved problem in next-generation
sequencing (NGS) data processing domain. However, as NGS technology gains …