Taxonomic annotation

How does it work?

### Assumptions

* Scores of hits in [hits file(s)](doc:taxonomy-file) (results of sequence similarity searches against a database) are a proxy for evolutionary distance
* Scores in [hits file(s)](doc:taxonomy-file) can be

### Process

For taxonomic annotation, ```blobtools create```

1. Parses one (or multiple) [hits file(s)](doc:taxonomy-file) and extracts:
   * sequence ID
   * taxID
     - if a hit contains more than one taxID separated by semicolon(s), only the first one is extracted
   * score
2. Infers the taxonomic name at all [taxonomic ranks](doc:taxonomic-rank) for each [taxID](doc:taxid) in the [hits file(s)](doc:taxonomy-file)
3. Calculates the [taxonomy](doc:taxonomy) for each sequence in the assembly, under a given [tax-rule](doc:tax-rule), using the (bit)scores (parsed in 1) and the taxonomic names (inferred in 2).

### Sources of errors in taxonomic annotation

* False negatives due to the narrow phylogenetic scope of a sequence collection/database
* False positives due to mis-annotated sequences in a sequence collection/database
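
### Illustrative sketch

The three steps listed under Process can be sketched in a few lines of Python. This is not the blobtools source code; it is a minimal illustration under stated assumptions. It assumes that a hits file is tab-separated with sequence ID, taxID, and score in the first three columns, and that the taxonomic names for one rank are already available in a dictionary (`phylum_names` is a hypothetical stand-in for the lookup in step 2). The rule in `assign_taxonomy` (sum the scores per taxon and keep the highest sum) is one plausible tax-rule chosen for illustration, and `"unknown"` is a placeholder label for taxIDs without a name.

```python
import io
from collections import defaultdict


def parse_hits(handle):
    """Yield (sequence_id, tax_id, score) tuples from a hits file.

    Assumption: tab-separated columns in the order sequence ID, taxID, score.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        seq_id, tax_field, score = line.split("\t")[:3]
        # If several taxIDs are separated by semicolons, keep only the first.
        tax_id = tax_field.split(";")[0]
        yield seq_id, tax_id, float(score)


def assign_taxonomy(hits, names):
    """Return the best-scoring taxon name per sequence.

    Hypothetical rule: sum the scores of all hits sharing a taxon name
    and pick the name with the highest sum.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for seq_id, tax_id, score in hits:
        name = names.get(tax_id, "unknown")  # placeholder for unmapped taxIDs
        totals[seq_id][name] += score
    return {
        seq_id: max(by_name, key=by_name.get)
        for seq_id, by_name in totals.items()
    }


# Made-up example data; the scores are not real search results.
example = io.StringIO(
    "contig_1\t6239;6238\t200.0\n"
    "contig_1\t9606\t150.0\n"
    "contig_1\t6239\t80.0\n"
    "contig_2\t9606\t90.0\n"
)
# Hypothetical taxID-to-name lookup at a single rank (phylum).
phylum_names = {"6239": "Nematoda", "6238": "Nematoda", "9606": "Chordata"}

result = assign_taxonomy(parse_hits(example), phylum_names)
print(result)
```

In this toy input, `contig_1` has two hits mapping to Nematoda (summed score 280) and one to Chordata (150), so it is assigned Nematoda. A single mis-annotated database sequence with a high score could flip that assignment, which is the false-positive risk listed above.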