usage: blobtools bamfilter -b FILE [-i FILE] [-e FILE] [-u] [-o PREFIX]
                           [--sort] [--keep] [--threads INT] [-h|--help]

    Options:
        -h --help                   show this
        -b, --bam FILE              BAM file (sorted by name)
        -i, --include FILE          List of contigs whose reads are included
                                    - writes interleaved FASTQs of pairs where at least
                                      one read maps to sequences in list
                                      (InUn.fq, InIn.fq, ExIn.fq)
        -e, --exclude FILE          List of contigs whose reads are excluded
                                    (outputs reads that do not map to sequences in list)
                                    - writes interleaved FASTQs of pairs where at least
                                      one read does not map to sequences in list
                                      (InUn.fq, InIn.fq, ExIn.fq)
        -u, --include_unmapped      Include pairs where both reads are unmapped
        --sort                      Sort BAM file by name
        --keep                      Keep sorted BAM file (deleted otherwise)
        --threads INT               number of sorting/compression threads for sorting
                                    [default: 2]
        -o, --out PREFIX            Output prefix

Comments

  • This script is only needed when paired-end FASTQ reads are to be extracted from a mapping (if single-end reads were mapped, they can easily be extracted using samtools/GNU grep).
  • The module bamfilter can be controlled with a list of sequence IDs to include or to exclude.
  • Use of an exclusion list causes all sequence IDs, except those specified, to be included.
  • It will output up to four interleaved FASTQ files, depending on the actual mapping behaviour of the read pairs and on whether the parameter --include_unmapped is provided:
      • *.InIn.fq: both reads in the read pair map to included sequences
      • *.InUn.fq: one read maps to an included sequence and the other is unmapped
      • *.ExIn.fq: one read maps to an included sequence and the other maps to an excluded sequence
      • *.UnUn.fq: read pairs where neither read maps to the assembly. This file is only written if --include_unmapped is provided
  • If no exclusion/inclusion list is provided, bamfilter will output up to three interleaved FASTQ files:
      • *.InIn.fq
      • *.InUn.fq
      • *.UnUn.fq (only if --include_unmapped is provided, as noted above)
  • The script also writes a summary file (*.info.txt) with the numbers of read pairs extracted for each category.
  • If downstream processes require deinterleaved FASTQ files, I suggest using one of these approaches.
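
As one illustration of deinterleaving, the following is a minimal Python sketch and not part of blobtools. It assumes the interleaved file uses strict four-line FASTQ records with read 1 immediately followed by read 2 and no blank lines; the function name deinterleave and the file names in the usage text are hypothetical.

```python
import sys
from itertools import islice


def deinterleave(interleaved, out_r1, out_r2):
    """Split an interleaved FASTQ stream into two streams (read 1 / read 2).

    Assumes four-line records in the order R1, R2, R1, R2, ...
    Returns the number of read pairs written.
    """
    pairs = 0
    while True:
        r1 = list(islice(interleaved, 4))  # next record: read 1
        if not r1:
            break  # clean end of input
        r2 = list(islice(interleaved, 4))  # its mate: read 2
        if len(r1) != 4 or len(r2) != 4:
            raise ValueError(
                "truncated record or unpaired read after pair %d" % pairs
            )
        out_r1.writelines(r1)
        out_r2.writelines(r2)
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical usage:
    #   python deinterleave.py PREFIX.InIn.fq reads_1.fq reads_2.fq
    src_path, r1_path, r2_path = sys.argv[1:4]
    with open(src_path) as src, open(r1_path, "w") as o1, open(r2_path, "w") as o2:
        n = deinterleave(src, o1, o2)
    print("wrote %d read pairs" % n)
```

The sketch raises an error on a truncated or odd-length file rather than silently writing unequal outputs, since most downstream tools expect the two files to stay in step.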