Extracts mappable reads from NGS data.

Works with large fastq files.


Posted on October 23rd 2013 by Timo Lassmann

Raw sequences produced by next generation sequencing (NGS) machines may contain adapter, linker, barcode and fingerprint sequences. TagDust2 is a program to extract and correctly label the sequences to be mapped in downstream pipelines.

TagDust allows users to specify the expected architecture of a read and converts it into a hidden Markov model. The model can assign sequences to a particular barcode (or index) even in the presence of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants etc.) are automatically discarded.
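To give a feel for error-tolerant barcode assignment, here is a minimal Python sketch. It is not TagDust's hidden Markov model: it swaps in a much simpler Hamming-distance comparison, and the function names, the example barcodes and the max_mismatches parameter are all assumptions made for illustration. It assumes a read architecture of a fixed-length barcode followed by the sequence to be mapped.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Assign a read to its closest barcode (hypothetical helper).

    Returns (barcode, remaining_sequence), or None when the read should be
    discarded: too short, too many mismatches, or an ambiguous best match.
    This is a simplified stand-in, not the HMM that TagDust builds.
    """
    k = len(barcodes[0])
    prefix = read[:k]
    if len(prefix) < k:
        return None
    scored = sorted((hamming(prefix, bc), bc) for bc in barcodes)
    best_dist, best_bc = scored[0]
    if best_dist > max_mismatches:
        return None  # does not fit the expected architecture
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two barcodes fit equally well
    return best_bc, read[k:]


barcodes = ["ACGT", "TGCA", "GATC"]  # illustrative barcodes only
print(assign_barcode("ACGAAGGCTTAGC", barcodes))  # one error in the barcode
print(assign_barcode("CCCCAGGCTTAGC", barcodes))  # matches no barcode
```

The first read is assigned to ACGT despite a single mismatch, while the second is rejected. A probabilistic model such as an HMM can additionally handle insertions, deletions and multi-segment architectures, which this sketch does not attempt.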

Version 2.2 released

Posted on May 29th 2014 by Timo Lassmann

Version 2.2 includes two major updates:

1) Integration with Illumina Casava output.

2) A test suite was added to ensure all paper figures can be easily reproduced.

Have a look at the expanded user manual.

Version 2.12 released

Posted on May 29th 2014 by Timo Lassmann

Version 2.12 includes minor bug fixes.

Version 2.11 released

Posted on May 19th 2014 by Timo Lassmann

Version 2.11 includes three major updates:

1) Support for both single and paired end data.

2) Automatic selection of read architectures. In a lab running several NGS protocols users can now create a template file listing all possible read architectures / barcode combinations used. TagDust will automatically select the correct architecture.

3) Automatic threshold selection. In previous versions the -q option was used to select the extraction threshold. The new version runs a small simulation and automatically selects the threshold giving the best combined sensitivity and specificity. This greatly improves the results when the read architecture is simple (e.g. a three letter barcode followed by the read).
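The idea behind automatic threshold selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TagDust's actual procedure: the scores are drawn from two made-up normal distributions standing in for simulated reads that do and do not fit the architecture, and pick_threshold is a hypothetical helper that scans candidate cutoffs and keeps the one maximising sensitivity + specificity.

```python
import random


def pick_threshold(positive_scores, negative_scores):
    """Return (cutoff, sensitivity + specificity) for the best cutoff.

    A read is accepted when its score is >= cutoff. Hypothetical helper,
    not part of TagDust.
    """
    best_cutoff, best_value = None, -1.0
    for cutoff in sorted(set(positive_scores) | set(negative_scores)):
        sensitivity = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        specificity = sum(s < cutoff for s in negative_scores) / len(negative_scores)
        if sensitivity + specificity > best_value:
            best_cutoff, best_value = cutoff, sensitivity + specificity
    return best_cutoff, best_value


# Synthetic scores (assumption): reads matching the architecture score
# higher on average than random sequences.
rng = random.Random(0)
positives = [rng.gauss(30, 5) for _ in range(1000)]
negatives = [rng.gauss(10, 5) for _ in range(1000)]

cutoff, value = pick_threshold(positives, negatives)
print(f"selected cutoff: {cutoff:.2f}, sensitivity + specificity: {value:.3f}")
```

With a short barcode the two score distributions overlap more, so a fixed cutoff chosen by hand is more likely to be wrong; selecting it from simulated data adapts the cutoff to the architecture at hand.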


tar -zxvf tagdust-X.XX.tar.gz
cd tagdust
make check
make install