Coverage parsing
The process of coverage parsing used by blobtools create and map2cov differs depending on the file format of the coverage file.
1. BAM files
1.1. samtools flagstat is called on the BAM file to obtain:
- total number of reads
- mapped number of reads
1.2. samtools view -F 4 -F 1028 -F 256 is called on the BAM file and the output is parsed
1.3. For each read:
- read coverage of sequence increases by 1
- base coverage of sequence increases by N
- where N is the sum of alignment match positions (CIGAR operations M, X and =) divided by the number of A, G, C and T bases in the sequence
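The BAM steps above can be sketched in Python as follows. This is a minimal illustration, not the blobtools source. The function names `parse_flagstat`, `matched_bases` and `base_cov_increment` are hypothetical. The sketch assumes that "sequence" means the assembly sequence the read maps to, that the match positions are read from the CIGAR string, and that only the QC-passed column of the flagstat output is used.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_flagstat(text):
    """Return (total, mapped) QC-passed read counts from `samtools flagstat` text."""
    total = mapped = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if " in total" in line:
            total = int(fields[0])
        elif fields[3] == "mapped":
            # Matches "N + 0 mapped (...)" but not "primary mapped" or mate lines.
            mapped = int(fields[0])
    return total, mapped


def matched_bases(cigar):
    """Sum the lengths of the M, X and = operations in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def base_cov_increment(cigar, sequence):
    """N for one read: matched positions divided by the A/C/G/T count of the sequence."""
    acgt = sum(sequence.upper().count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # assumption: a sequence without A/C/G/T gains no base coverage
    return matched_bases(cigar) / acgt
```

For example, a read with CIGAR `50M` mapped to a 100 bp sequence containing 80 unambiguous bases would add 1 to the read coverage and 50/80 = 0.625 to the base coverage of that sequence.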
2. SAM files
2.1. Only mapped reads (where RNAME is not "*") of the SAM file are parsed
2.2. For each read:
- read coverage of sequence increases by 1
- base coverage of sequence increases by N
- where N is the sum of alignment match positions (CIGAR operations M, X and =) divided by the number of A, G, C and T bases in the sequence
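The SAM steps above can be sketched as a single pass over the file. This is a hedged illustration rather than the blobtools implementation. The function name `parse_sam` and the `acgt_counts` argument (a mapping from sequence name to its number of A, C, G and T bases) are hypothetical, and the sketch assumes that the match positions are taken from the CIGAR column.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam(lines, acgt_counts):
    """Accumulate read and base coverage per sequence from SAM lines.

    acgt_counts maps sequence name -> number of A/C/G/T bases (hypothetical input).
    """
    read_cov = defaultdict(int)
    base_cov = defaultdict(float)
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        rname, cigar = fields[2], fields[5]
        if rname == "*":
            continue  # unmapped read
        matched = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")
        read_cov[rname] += 1
        acgt = acgt_counts.get(rname, 0)
        if acgt:
            base_cov[rname] += matched / acgt
    return dict(read_cov), dict(base_cov)
```

Passing an open file handle as `lines` keeps memory use flat, since the records are consumed one at a time.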
3. CAS files
3.1. clc_mapping_info -s is called and the output is parsed for:
- total number of reads
- mapped number of reads
3.2. clc_mapping_info -n is called and, for each sequence in the assembly, the following values are parsed:
- corrected base coverage (column 6 of "Contig info")
- read coverage (column 3 of "Contig info")
4. COV files
- Total/mapped read count is parsed from the header
- Base/read coverage is parsed for each sequence
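A COV parser along these lines might look like the sketch below. The layout is an assumption, not something stated above: header lines are taken to start with "#", the read counts are taken to appear as "## Total Reads = N" and "## Mapped Reads = N", and each data line is taken to hold three tab-separated fields (sequence name, read coverage, base coverage). The function name `parse_cov` is hypothetical.

```python
def parse_cov(lines):
    """Parse a COV file; returns (total_reads, mapped_reads, read_cov, base_cov).

    Assumed layout (not confirmed by the text above):
      ## Total Reads = N
      ## Mapped Reads = N
      name<TAB>read_cov<TAB>base_cov
    """
    total = mapped = None
    read_cov, base_cov = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            if line.startswith("## Total Reads"):
                total = int(line.split("=")[1])
            elif line.startswith("## Mapped Reads"):
                mapped = int(line.split("=")[1])
            continue  # other header lines are ignored
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # blank or malformed line
        name, reads, bases = fields
        read_cov[name] = int(reads)
        base_cov[name] = float(bases)
    return total, mapped, read_cov, base_cov
```

Because the counts come straight from the header, no external tool needs to be called for this format.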