{"__v":0,"_id":"5797b2bee8e2090e003b845d","category":{"project":"57618347b65324200072d6a5","version":"57618347b65324200072d6a8","_id":"579ca6f3d46f960e0029a8ec","__v":0,"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-07-30T13:09:07.617Z","from_sync":false,"order":9999,"slug":"glossary","title":"Glossary"},"parentDoc":null,"project":"57618347b65324200072d6a5","user":"57617c8caa540f3600bfed20","version":{"__v":8,"_id":"57618347b65324200072d6a8","project":"57618347b65324200072d6a5","createdAt":"2016-06-15T16:33:11.587Z","releaseDate":"2016-06-15T16:33:11.587Z","categories":["57618347b65324200072d6a9","5761912d207db7170022fbe9","57619455a7c9f729009a74e0","576e8ae1f37ab41700147471","5797b8e5209a6e0e00b8321b","57989a8817ced017003c4c69","579ca6f3d46f960e0029a8ec","579ca703fefb1d0e00c94f06"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"blobtools v0.9.19","version_clean":"0.9.19","version":"0.9.19"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2016-07-26T18:58:06.172Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":1,"body":"* Tax-rules are algorithms that determine how the taxonomy of each sequence in the assembly is calculated during the [taxonomic annotation](doc:taxonomic-annotation) process. 
\n\n* Tax-rules are applied to each [taxonomic rank](doc:taxonomic-rank).\n\n* Currently, only two tax-rules exist:\n\n * bestsum\n  * Scores of all hits across **all** [hits files](doc:taxonomy-file) are summed by taxonomic name\n  * The taxonomy is then set to:\n   * name = taxonomic name with the highest sum-score\n   * score = highest sum-score\n   * c-value = count of alternative candidates for taxonomic name (c_index in the code below)\n \n * bestsumorder\n  * Scores of all hits are summed by taxonomic name for the **first** [hits file](doc:taxonomy-file) (determined by the input order), analogous to bestsum\n  * If no hits were found in the first [hits file](doc:taxonomy-file), the next [hits file](doc:taxonomy-file) is processed.\n  \n* The behaviour of a tax-rule can be controlled using arguments when calling ```blobtools create```\n * ```[--min_diff FLOAT]``` : if the two best-scoring taxonomic names are within FLOAT of each other, the taxonomy is set to:\n  * taxonomic name = \"unresolved\"\n\n * ```[--tax_collision_random]``` : if the two best-scoring taxonomic names have the same score, select a taxonomic name at random (otherwise the taxonomic name is set to \"unresolved\")\n\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"from collections import defaultdict\\n\\ndef bestsum(hitsfiles, ranks, min_diff, tax_collision_random):\\n    # Assumes each hits file is a dict mapping rank -> list of (tax, score) pairs\\n    # Sum scores by taxonomic name across all hits files\\n    sums = {rank: defaultdict(float) for rank in ranks}\\n    for hitsfile in hitsfiles:\\n        for rank in ranks:\\n            for tax, score in hitsfile.get(rank, []):\\n                sums[rank][tax] += score\\n\\n    # Infer the winning taxonomy for each rank\\n    taxonomy = {}\\n    for rank in ranks:\\n        # Sort candidates by sum-score, highest first\\n        ranked = sorted(sums[rank].items(), key=lambda item: item[1], reverse=True)\\n        for tax, score in ranked:\\n            if rank not in taxonomy:  # best-scoring name\\n                taxonomy[rank] = {'tax': tax, 'score': score, 'c_index': 0}\\n                continue\\n            best = taxonomy[rank]['score']\\n            if score == best:\\n                # Tie: the first name is kept only if tax_collision_random is set\\n                if not tax_collision_random:\\n                    taxonomy[rank]['tax'] = 'unresolved'\\n            elif (best - score) <= min_diff:\\n                taxonomy[rank]['tax'] = 'unresolved'\\n            taxonomy[rank]['c_index'] += 1  # count alternative candidates\\n    return taxonomy\",\n      \"language\": 
\"python\",\n      \"name\": \"bestsum\"\n    },\n    {\n      \"code\": \"from collections import defaultdict\\n\\ndef bestsumorder(hitsfiles, ranks, min_diff, tax_collision_random):\\n    # Assumes each hits file is a dict mapping rank -> list of (tax, score) pairs\\n    taxonomy = {}\\n    for hitsfile in hitsfiles:  # processed in input order\\n        if taxonomy:  # hits were found in an earlier hits file\\n            break\\n        for rank in ranks:\\n            # Sum scores by taxonomic name within this hits file only\\n            sums = defaultdict(float)\\n            for tax, score in hitsfile.get(rank, []):\\n                sums[tax] += score\\n            ranked = sorted(sums.items(), key=lambda item: item[1], reverse=True)\\n            for tax, score in ranked:\\n                if rank not in taxonomy:  # best-scoring name\\n                    taxonomy[rank] = {'tax': tax, 'score': score, 'c_index': 0}\\n                    continue\\n                best = taxonomy[rank]['score']\\n                if score == best:\\n                    if not tax_collision_random:\\n                        taxonomy[rank]['tax'] = 'unresolved'\\n                elif (best - score) <= min_diff:\\n                    taxonomy[rank]['tax'] = 'unresolved'\\n                taxonomy[rank]['c_index'] += 1  # count alternative candidates\\n    return taxonomy\",\n      \"language\": \"python\",\n      \"name\": \"bestsumorder\"\n    }\n  ]\n}\n[/block]","excerpt":"","slug":"tax-rule","type":"basic","title":"tax-rule"}
* Tax-rules are algorithms that determine how the taxonomy of each sequence in the assembly is calculated during the [taxonomic annotation](doc:taxonomic-annotation) process.

* Tax-rules are applied to each [taxonomic rank](doc:taxonomic-rank).

* Currently, only two tax-rules exist:

 * bestsum
  * Scores of all hits across **all** [hits files](doc:taxonomy-file) are summed by taxonomic name
  * The taxonomy is then set to:
   * name = taxonomic name with the highest sum-score
   * score = highest sum-score
   * c-value = count of alternative candidates for taxonomic name (c_index in the code below)

 * bestsumorder
  * Scores of all hits are summed by taxonomic name for the **first** [hits file](doc:taxonomy-file) (determined by the input order), analogous to bestsum
  * If no hits were found in the first [hits file](doc:taxonomy-file), the next [hits file](doc:taxonomy-file) is processed.

* The behaviour of a tax-rule can be controlled using arguments when calling ```blobtools create```
 * ```[--min_diff FLOAT]``` : if the two best-scoring taxonomic names are within FLOAT of each other, the taxonomy is set to:
  * taxonomic name = "unresolved"

 * ```[--tax_collision_random]``` : if the two best-scoring taxonomic names have the same score, select a taxonomic name at random (otherwise the taxonomic name is set to "unresolved")

[block:code]
{
  "codes": [
    {
      "code": "from collections import defaultdict\n\ndef bestsum(hitsfiles, ranks, min_diff, tax_collision_random):\n    # Assumes each hits file is a dict mapping rank -> list of (tax, score) pairs\n    # Sum scores by taxonomic name across all hits files\n    sums = {rank: defaultdict(float) for rank in ranks}\n    for hitsfile in hitsfiles:\n        for rank in ranks:\n            for tax, score in hitsfile.get(rank, []):\n                sums[rank][tax] += score\n\n    # Infer the winning taxonomy for each rank\n    taxonomy = {}\n    for rank in ranks:\n        # Sort candidates by sum-score, highest first\n        ranked = sorted(sums[rank].items(), key=lambda item: item[1], reverse=True)\n        for tax, score in ranked:\n            if rank not in taxonomy:  # best-scoring name\n                taxonomy[rank] = {'tax': tax, 'score': score, 'c_index': 0}\n                continue\n            best = taxonomy[rank]['score']\n            if score == best:\n                # Tie: the first name is kept only if tax_collision_random is set\n                if not tax_collision_random:\n                    taxonomy[rank]['tax'] = 'unresolved'\n            elif (best - score) <= min_diff:\n                taxonomy[rank]['tax'] = 'unresolved'\n            taxonomy[rank]['c_index'] += 1  # count alternative candidates\n    return taxonomy",
      "language": "python",
      "name": "bestsum"
    },
    {
      "code": "from collections import defaultdict\n\ndef bestsumorder(hitsfiles, ranks, min_diff, tax_collision_random):\n    # Assumes each hits file is a dict mapping rank -> list of (tax, score) pairs\n    taxonomy = {}\n    for hitsfile in hitsfiles:  # processed in input order\n        if taxonomy:  # hits were found in an earlier hits file\n            break\n        for rank in ranks:\n            # Sum scores by taxonomic name within this hits file only\n            sums = defaultdict(float)\n            for tax, score in hitsfile.get(rank, []):\n                sums[tax] += score\n            ranked = sorted(sums.items(), key=lambda item: item[1], reverse=True)\n            for tax, score in ranked:\n                if rank not in taxonomy:  # best-scoring name\n                    taxonomy[rank] = {'tax': tax, 'score': score, 'c_index': 0}\n                    continue\n                best = taxonomy[rank]['score']\n                if score == best:\n                    if not tax_collision_random:\n                        taxonomy[rank]['tax'] = 'unresolved'\n                elif (best - score) <= min_diff:\n                    taxonomy[rank]['tax'] = 'unresolved'\n                taxonomy[rank]['c_index'] += 1  # count alternative candidates\n    return taxonomy",
      "language": "python",
      "name": "bestsumorder"
    }
  ]
}
[/block]