Taxonomic annotation
How does it work?
Assumptions
- Scores of hits in hits file(s) (results of sequence similarity searches against a database) are a proxy for evolutionary distance.
- Scores in hits file(s) are additive.
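As a rough illustration of the additivity assumption, the sketch below sums the scores of all hits that point to the same taxon for a single sequence. The taxon names, scores, and the two-file setup are invented for the example. How scores from different hits files are actually combined is governed by the tax-rule, which this sketch does not model.

```python
from collections import Counter

# Hypothetical (taxon, score) hits for ONE sequence, taken from two hits files.
hits_file_a = [("Nematoda", 150.0), ("Arthropoda", 180.0)]
hits_file_b = [("Nematoda", 90.0)]

# Additivity: scores of hits to the same taxon are simply summed.
totals = Counter()
for taxon, score in hits_file_a + hits_file_b:
    totals[taxon] += score

print(dict(totals))  # {'Nematoda': 240.0, 'Arthropoda': 180.0}
```

Here no single Nematoda hit outscores the Arthropoda hit, but the summed Nematoda evidence does, which is the practical consequence of treating scores as additive.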
Process
For taxonomic annotation, BlobTools create:
1. Parses one (or multiple) hits file(s) and extracts the sequence ID, NCBI TaxID, and score of each hit. If a hit contains more than one TaxID separated by semicolon(s), only the first one is extracted.
2. Infers a taxonomy at every taxonomic rank for each TaxID in the hits file(s).
3. Calculates the taxonomy for each sequence in the assembly, under a given tax-rule, using the scores (parsed in 1) and the taxonomic names (inferred in 2).
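The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not BlobTools' implementation. It assumes a tab-separated hits file whose first three columns are sequence ID, TaxID, and score. It replaces the NCBI taxonomy with a small hard-coded table (`TOY_LINEAGES`), and it uses a simplified rule that assigns each sequence the taxon with the highest summed score. The names `parse_hits`, `annotate`, `TOY_LINEAGES`, and the `"unassigned"` placeholder are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the NCBI taxonomy: TaxID -> name at each rank.
TOY_LINEAGES = {
    "6239": {"phylum": "Nematoda", "genus": "Caenorhabditis"},
    "7227": {"phylum": "Arthropoda", "genus": "Drosophila"},
    "6279": {"phylum": "Nematoda", "genus": "Brugia"},
}


def parse_hits(lines):
    """Step 1: yield (sequence ID, first TaxID, score) per hit line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq_id, taxids, score = line.split("\t")[:3]
        # Only the first TaxID of a semicolon-separated list is kept.
        yield seq_id, taxids.split(";")[0], float(score)


def annotate(lines, rank, lineages=TOY_LINEAGES):
    """Steps 2 and 3: map TaxIDs to names at `rank`, sum scores, pick the best."""
    sums = defaultdict(lambda: defaultdict(float))
    for seq_id, taxid, score in parse_hits(lines):
        # Step 2: look up the taxon name at the requested rank.
        name = lineages.get(taxid, {}).get(rank, "unassigned")
        # Step 3 (simplified rule): scores for the same name are added up.
        sums[seq_id][name] += score
    return {seq_id: max(by_name, key=by_name.get)
            for seq_id, by_name in sums.items()}


hits = [
    "contig_1\t6239\t200.0",
    "contig_1\t7227\t250.0",
    "contig_1\t6279;6239\t100.0",  # two TaxIDs: only 6279 is used
    "contig_2\t7227\t80.0",
]

print(annotate(hits, "phylum"))
# {'contig_1': 'Nematoda', 'contig_2': 'Arthropoda'}
```

Note that the result depends on the rank: at phylum level the two nematode hits for `contig_1` add up to 300 and beat the single arthropod hit at 250, whereas `annotate(hits, "genus")` assigns `contig_1` to Drosophila because the nematode scores are split between two genera.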