Coverage parsing


How blobtools create and map2cov parse coverage depends on the file format of the coverage file.


1. BAM files


1.1. samtools flagstat is called on the BAM file in order to obtain:

  • total number of reads
  • mapped number of reads
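As an illustration of step 1.1, the two counts could be pulled out of flagstat text roughly as follows. This is a minimal sketch, not the blobtools implementation: the function name parse_flagstat is hypothetical, the sample text is made up for the example, and taking only the QC-passed count (the first number on each line) is an assumption.

```python
import re


def parse_flagstat(text):
    """Return (total_reads, mapped_reads) from `samtools flagstat` text.

    Hypothetical helper: only the QC-passed count (the first number on
    each line) is used, which is an assumption made for this sketch.
    """
    # Lines look like "1000 + 0 in total (QC-passed reads + QC-failed reads)".
    total = re.search(r"^(\d+) \+ (\d+) in total", text, re.MULTILINE)
    # Anchoring at the line start skips the "primary mapped" line that
    # newer samtools versions also print.
    mapped = re.search(r"^(\d+) \+ (\d+) mapped \(", text, re.MULTILINE)
    if total is None or mapped is None:
        raise ValueError("could not find total/mapped lines in flagstat text")
    return int(total.group(1)), int(mapped.group(1))


# Made-up flagstat excerpt, for illustration only.
sample = (
    "1000 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "0 + 0 secondary\n"
    "900 + 0 mapped (90.00% : N/A)\n"
)
total_reads, mapped_reads = parse_flagstat(sample)
```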

1.2. samtools view -F 4 -F 1028 -F 256 is called on the BAM file and the output is parsed.

1.3. For each read:

  • read coverage of the sequence increases by 1
  • base coverage of the sequence increases by N, where N is the sum of the read's alignment match positions (CIGAR operations M, X, and =) divided by the number of A, C, G, and T bases in the sequence
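The per-read increment N described above can be sketched as follows. This is an illustrative reading of the rule, not the blobtools source: the names matched_bases and base_cov_increment are hypothetical, and "the sequence" is taken to be the sequence the read maps to.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def matched_bases(cigar):
    """Sum the lengths of the M, X and = operations in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MX=")


def base_cov_increment(cigar, sequence):
    """Return N: matched positions divided by the A/C/G/T count of `sequence`.

    `sequence` is assumed to be the sequence the read maps to; returning
    0.0 for a sequence without any A/C/G/T is a choice made for this sketch.
    """
    acgt = sum(sequence.upper().count(base) for base in "ACGT")
    return matched_bases(cigar) / acgt if acgt else 0.0
```

For example, a read with CIGAR 10M mapped to a sequence containing 8 A/C/G/T bases and 2 N's would add 10 / 8 to that sequence's base coverage under this reading.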

2. SAM files


2.1. Only mapped reads (where RNAME is not "*") of the SAM file are parsed.

2.2. For each read:

  • read coverage of the sequence increases by 1
  • base coverage of the sequence increases by N, where N is the sum of the read's alignment match positions (CIGAR operations M, X, and =) divided by the number of A, C, G, and T bases in the sequence
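A minimal sketch of the SAM procedure above, assuming tab-separated SAM records with RNAME in column 3 and CIGAR in column 6 (as in the SAM format). The function accumulate_sam_coverage and the acgt_counts mapping (sequence name to its A/C/G/T count) are hypothetical names introduced for this example, not part of blobtools.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def accumulate_sam_coverage(lines, acgt_counts):
    """Return (read_cov, base_cov) dicts keyed by sequence name.

    `acgt_counts` is a hypothetical mapping from sequence name to the
    number of A/C/G/T bases in that sequence.
    """
    read_cov = defaultdict(int)
    base_cov = defaultdict(float)
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        rname, cigar = fields[2], fields[5]
        if rname == "*":
            continue  # unmapped read
        acgt = acgt_counts.get(rname, 0)
        if acgt == 0:
            continue  # sketch choice: skip sequences without a known count
        matched = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MX=")
        read_cov[rname] += 1
        base_cov[rname] += matched / acgt
    return dict(read_cov), dict(base_cov)


# Made-up records, for illustration only.
example_lines = [
    "@SQ\tSN:contig_1\tLN:10",
    "r1\t0\tcontig_1\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t0\tcontig_1\t3\t60\t2S3M\t*\t0\t0\tACGTA\tIIIII",
]
read_cov, base_cov = accumulate_sam_coverage(example_lines, {"contig_1": 10})
```

In this made-up input, r2 is skipped because its RNAME is "*", so only r1 and r3 contribute to contig_1.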

3. CAS files


3.1. clc_mapping_info -s is called and the output is parsed for:

  • total number of reads
  • mapped number of reads

3.2. clc_mapping_info -n is called and, for each sequence in the assembly, the following values are parsed:

  • corrected base coverage (column 6 of "Contig info")
  • read coverage (column 3 of "Contig info")

4. COV files


  • Total/mapped read count is parsed from the header
  • Base/read coverage is parsed for each sequence