usage: blobtools bamfilter -b FILE [-i FILE] [-e FILE] [-u] [-o PREFIX]
[--sort] [--keep] [--threads INT]
[-h|--help]
Options:
-h --help show this
-b, --bam FILE BAM file (sorted by name)
-i, --include FILE List of contigs whose reads are included
- writes interleaved FASTQs of pairs where at least
one read maps to sequences in list
(InUn.fq, InIn.fq, ExIn.fq)
-e, --exclude FILE List of contigs whose reads are excluded (outputs reads that do not map to sequences in list)
- writes interleaved FASTQs of pairs where at least
one read does not map to sequences in list
(InUn.fq, InIn.fq, ExIn.fq)
-u, --include_unmapped Include pairs where both reads are unmapped
--sort Sort BAM file by name
--keep Keep sorted BAM file (deleted otherwise)
--threads INT Number of sorting/compression threads
[default: 2]
-o, --out PREFIX Output prefix
Comments
This script is only needed when paired-end FASTQ files are to be extracted from a mapping (if single-end reads were mapped, they can easily be extracted using samtools/GNU grep).
The module bamfilter can be controlled with a list of sequence IDs to include or to exclude.
Use of an exclusion list causes all sequence IDs, except those specified, to be included.
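The relationship between the two list types can be sketched in a few lines of Python. This is an illustration only, not bamfilter's actual code: the helper name `resolve_included` and its arguments are hypothetical, and treating the two lists as mutually exclusive alternatives is an assumption.

```python
def resolve_included(all_contigs, include=None, exclude=None):
    """Return the set of sequence IDs whose reads count as 'included'.

    Hypothetical helper for illustration; not part of blobtools.
    """
    all_contigs = set(all_contigs)
    # Assumption: an inclusion list and an exclusion list are alternatives.
    if include is not None and exclude is not None:
        raise ValueError("provide an inclusion list or an exclusion list, not both")
    if include is not None:
        # Only the listed IDs are included.
        return all_contigs & set(include)
    if exclude is not None:
        # Everything except the listed IDs is included.
        return all_contigs - set(exclude)
    # No list given: every sequence counts as included.
    return all_contigs
```

For example, `resolve_included(["c1", "c2", "c3"], exclude=["c2"])` yields the set containing `c1` and `c3`.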
bamfilter writes up to four interleaved FASTQ files, depending on how the read pairs actually map and on whether the parameter --include_unmapped is provided:
*.InIn.fq both reads in read pair mapping to included sequences
*.InUn.fq one read mapping to an included sequence and the other being unmapped
*.ExIn.fq one read mapping to an included sequence and the other mapping to an excluded sequence
*.UnUn.fq read pairs where neither read maps to the assembly. This file is only written if --include_unmapped is provided
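The four categories above can be expressed as a small decision function. The following Python sketch is a reading of the category descriptions, not the actual bamfilter implementation; `classify_pair` and its parameters are hypothetical names. It assumes that pairs without any read on an included sequence (both excluded, or one excluded and one unmapped) are not written to any file, since no output file is listed for them.

```python
def classify_pair(contig1, contig2, included, include_unmapped=False):
    """Return the output category for a read pair, or None if it is skipped.

    contig1 / contig2: ID of the sequence each read maps to, or None if unmapped.
    included: set of sequence IDs treated as included.
    Hypothetical helper for illustration; not part of blobtools.
    """
    def state(contig):
        if contig is None:
            return "Un"  # unmapped read
        return "In" if contig in included else "Ex"

    # Sorting gives an order-independent label: "Ex" < "In" < "Un".
    label = "".join(sorted(state(c) for c in (contig1, contig2)))
    if label in ("InIn", "InUn", "ExIn"):
        return label
    if label == "UnUn" and include_unmapped:
        return label
    # Assumption: ExEx, ExUn, and UnUn without --include_unmapped are not written.
    return None
```

With `included = {"c1"}`, a pair mapping to `c1` and `c2` falls into `ExIn`, while a fully unmapped pair is reported as `UnUn` only when `include_unmapped=True`.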
If no exclusion/inclusion list is provided, bamfilter will output three interleaved FASTQ files:
*.InIn.fq
*.InUn.fq
*.UnUn.fq
The script also writes a summary file (*.info.txt) with the numbers of read pairs extracted for each category.
If downstream processes require deinterleaved FASTQ files, I suggest using one of these approaches.
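As one possibility, the following Python sketch splits an interleaved FASTQ file into two files. It is a minimal illustration rather than a recommended tool: the function name `deinterleave` and the file paths are hypothetical, and it assumes uncompressed input with exactly four lines per record, with read 1 and read 2 of each pair strictly alternating.

```python
from itertools import islice


def deinterleave(interleaved_path, r1_path, r2_path):
    """Split an interleaved FASTQ into read-1 and read-2 files.

    Assumes four-line records and strictly alternating mates.
    Returns the number of pairs written.
    """
    pairs = 0
    with open(interleaved_path) as src, \
            open(r1_path, "w") as r1, \
            open(r2_path, "w") as r2:
        while True:
            rec1 = list(islice(src, 4))  # next four lines: read 1
            if not rec1:
                break  # clean end of file
            rec2 = list(islice(src, 4))  # following four lines: read 2
            if len(rec1) != 4 or len(rec2) != 4:
                raise ValueError("truncated or unpaired FASTQ record")
            r1.writelines(rec1)
            r2.writelines(rec2)
            pairs += 1
    return pairs
```

A call such as `deinterleave("sample.InIn.fq", "sample.InIn.1.fq", "sample.InIn.2.fq")` would write the two mate files; the file names here are placeholders.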