bamfilter

usage: blobtools bamfilter  -b FILE [-i FILE] [-e FILE] [-u] [-o PREFIX]
                                [--sort] [--keep] [--threads INT]
                                [-h|--help]

    Options:
        -h --help                   show this
        -b, --bam FILE              BAM file (sorted by name)
        -i, --include FILE          List of contigs whose reads are included
                                    - writes interleaved FASTQs of pairs where at least
                                        one read maps to sequences in list
                                        (InUn.fq, InIn.fq, ExIn.fq)
        -e, --exclude FILE          List of contigs whose reads are excluded (outputs reads that do not map to sequences in list)
                                    - writes interleaved FASTQs of pairs where at least
                                        one read does not map to sequences in list
                                        (InUn.fq, InIn.fq, ExIn.fq)
        -u, --include_unmapped      Include pairs where both reads are unmapped
        --sort                      Sort BAM file by name
        --keep                      Keep sorted BAM file (deleted otherwise)
        --threads INT               Number of sorting/compression threads
                                    [default: 2]
        -o, --out PREFIX            Output prefix
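
As an illustration of how the options above combine, here is a minimal Python sketch that assembles a bamfilter command line. The helper name bamfilter_command and the file names in the usage note are hypothetical; only the flags come from the usage text above. The sketch builds the argument list and does not run blobtools.

```python
def bamfilter_command(bam, out_prefix=None, include=None, exclude=None,
                      include_unmapped=False, sort=False, keep=False,
                      threads=None):
    """Build an argument list for `blobtools bamfilter` (hypothetical helper).

    Only flags listed in the usage text are emitted; nothing is executed.
    """
    cmd = ["blobtools", "bamfilter", "-b", bam]
    if include is not None:
        cmd += ["-i", include]            # list of contigs to include
    if exclude is not None:
        cmd += ["-e", exclude]            # list of contigs to exclude
    if include_unmapped:
        cmd.append("-u")                  # also write pairs with both reads unmapped
    if out_prefix is not None:
        cmd += ["-o", out_prefix]
    if sort:
        cmd.append("--sort")              # BAM is not yet sorted by name
    if keep:
        cmd.append("--keep")              # keep the name-sorted BAM afterwards
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd
```

For example, bamfilter_command("mapping.bam", "filtered", include="contigs.txt", include_unmapped=True) yields the arguments for blobtools bamfilter -b mapping.bam -i contigs.txt -u -o filtered. The resulting list could be passed to subprocess.run if blobtools is on the PATH.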

Comments

  • This script is only needed when paired-end FASTQ reads are to be extracted from a mapping (if single-end reads were mapped, these can easily be extracted using samtools/GNU grep).
  • The module bamfilter can be controlled with a list of sequence IDs to include or to exclude.
  • Use of an exclusion list causes all sequence IDs, except those specified, to be included.
  • It will output up to four interleaved FASTQ files depending on the actual mapping behaviour of the read pairs and whether the parameter --include_unmapped is provided:
      • *.InIn.fq: both reads in read pair mapping to included sequences
      • *.InUn.fq: one read mapping to an included sequence and the other being unmapped
      • *.ExIn.fq: one read mapping to an included sequence and the other mapping to an excluded sequence
      • *.UnUn.fq: read pairs where neither read maps to the assembly. This file is only written if --include_unmapped is provided.
  • If no exclusion/inclusion list is provided, bamfilter will output three interleaved FASTQ files:
      • *.InIn.fq
      • *.InUn.fq
      • *.UnUn.fq
  • The script also writes a summary file (*.info.txt) with the numbers of read pairs extracted for each category.
  • If downstream processes require deinterleaved FASTQ files, I suggest using one of these approaches.
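
One possible way to deinterleave the FASTQ files written by bamfilter is sketched below in Python. This is not part of blobtools: the function name deinterleave is hypothetical, and the sketch assumes four-line FASTQ records (no wrapped sequence lines) with read 1 always directly followed by read 2 and no blank lines in the file.

```python
from itertools import islice


def deinterleave(interleaved_path, r1_path, r2_path):
    """Split an interleaved FASTQ file into separate read 1 and read 2 files.

    Assumes four lines per record and strict R1, R2, R1, R2 ordering.
    Returns the number of read pairs written.
    """
    pairs = 0
    with open(interleaved_path) as src, \
            open(r1_path, "w") as r1, \
            open(r2_path, "w") as r2:
        while True:
            # One read pair is two records of four lines each.
            chunk = list(islice(src, 8))
            if not chunk:
                break
            if len(chunk) != 8:
                raise ValueError("incomplete read pair at end of file")
            r1.writelines(chunk[:4])   # first record goes to the read 1 file
            r2.writelines(chunk[4:])   # second record goes to the read 2 file
            pairs += 1
    return pairs
```

With an output prefix of filtered (a hypothetical name), a call such as deinterleave("filtered.InIn.fq", "filtered.InIn.1.fq", "filtered.InIn.2.fq") would write the two mate files and return the number of pairs. The file is read eight lines at a time, so memory use stays small even for large FASTQ files.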