Taxonomic annotation

How does it work?


Assumptions

  • The scores of hits in the hits file(s) (the results of sequence similarity searches against a database) are a proxy for evolutionary distance.
  • Scores in hits file(s) are additive.

Process

For taxonomic annotation, BlobTools create:

  1. Parses one (or multiple) hits file(s) and extracts: sequence ID, NCBI TaxID, and score. If a hit contains more than one TaxID separated by semicolon(s), only the first one is extracted.
  2. Infers a taxonomy at every taxonomic rank for each TaxID in the hits file(s).
  3. Calculates the taxonomy for each sequence in the assembly, under a given tax-rule, using the scores (parsed in step 1) and the taxonomic names (inferred in step 2).
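
The steps above can be sketched in a few lines of Python. This is an illustration only, not BlobTools' own code. Several things are assumptions made for the example: a tab-separated hits file with three columns (sequence ID, TaxID(s), score), the function names, the placeholder TaxIDs and taxon names, and the simple "highest summed score wins" rule standing in for a tax-rule. The lookup table stands in for the taxonomy inferred in step 2, reduced to a single rank.

```python
import io
from collections import defaultdict


def parse_hits(handle):
    """Yield (sequence_id, taxid, score) from tab-separated lines (assumed layout)."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seq_id, taxids, score = line.split("\t")[:3]
        # If several TaxIDs are separated by semicolons, keep only the first.
        first_taxid = taxids.split(";")[0]
        yield seq_id, int(first_taxid), float(score)


def sum_scores(hits, taxid_to_name):
    """Add up scores per sequence and taxon name (scores are assumed additive)."""
    totals = defaultdict(lambda: defaultdict(float))
    for seq_id, taxid, score in hits:
        name = taxid_to_name.get(taxid, "undef")
        totals[seq_id][name] += score
    return totals


def best_taxon(totals):
    """Assign each sequence the taxon with the highest summed score (assumed rule)."""
    return {seq_id: max(by_name, key=by_name.get) for seq_id, by_name in totals.items()}


# Hypothetical hits file: sequence ID, TaxID(s), score.
hits_text = (
    "contig_1\t201\t250.0\n"
    "contig_1\t101;102\t200.0\n"
    "contig_1\t103\t100.0\n"
    "contig_2\t201\t500.0\n"
)

# Hypothetical result of step 2 at one rank: TaxID -> taxon name.
taxid_to_name = {101: "PhylumA", 103: "PhylumA", 201: "PhylumB"}

totals = sum_scores(parse_hits(io.StringIO(hits_text)), taxid_to_name)
assignment = best_taxon(totals)
print(assignment)
```

In this made-up input, the single best hit for contig_1 points to PhylumB (250), but the two PhylumA hits add up to 300, so the summed scores assign contig_1 to PhylumA. This is where the additivity assumption matters.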