My first blobplot

This section describes how to generate a blobplot from the files provided in the ```test_files/``` directory.

[block:api-header]
{
  "type": "basic",
  "title": "1. Create a blobDB file from input files"
}
[/block]
### 1.1 Input file requirements

The required input files are:
- **one** [assembly file](doc:assembly-file), e.g. test_files/assembly.fna
- **one** (or more) [coverage file(s)](doc:coverage-file), e.g. test_files/mapping_1.bam
- **one** (or more) [hits file(s)](doc:taxonomy-file), e.g. test_files/blast.out

### 1.2 Creating a blobDB
[block:code]
{
  "codes": [
    {
      "code": "# Since we are going to use a BAM file, make sure that samtools is on your $PATH\nexport PATH=\"/path/to/samtools:$PATH\"\n\n# Use the files provided in the test_files/ folder\n./blobtools create \\\n  -i test_files/assembly.fna \\\n  -b test_files/mapping_1.bam \\\n  -t test_files/blast.out \\\n  -o test_files/my_first_blobplot",
      "language": "shell",
      "name": "Input"
    },
    {
      "code": "[STATUS]\t\t: Parsing FASTA - test_files/assembly.fna\n[STATUS]\t\t: names.dmp/nodes.dmp not specified. Retrieving nodesDB from data/nodesDB.txt\n[PROGRESS]\t: \t100%\n[STATUS]\t\t: Parsing tax0 - test_files/blast.out\n[STATUS]\t\t: Computing taxonomy using taxrule(s) bestsum\n[PROGRESS]\t: \t100%\n[STATUS]\t\t: Parsing bam0 - test_files/mapping_1.bam\n[STATUS]\t\t: \tChecking with 'samtools flagstat'\n[STATUS]\t\t: \tMapping reads = 15,313, total reads = 15,313 (mapping rate = 100.0%)\n[PROGRESS]\t: \t100%\n[STATUS]\t\t: \tWriting mapping_1.bam.cov\n[STATUS]\t\t: Generating BlobDB and writing to file test_files/my_first_blobplot.blobDB.json\n# writes file 'test_files/my_first_blobplot.blobDB.json'",
      "language": "text",
      "name": "Output"
    }
  ]
}
[/block]

[block:api-header]
{
  "type": "basic",
  "title": "2. Create a view of a blobDB file"
}
[/block]
### 2.1 Creating a view

* Using ```blobtools view```, one can extract information stored in a [blobDB file](doc:blobdb).
* With the default settings, ```blobtools view``` generates tabular output at the [taxonomic rank](doc:taxonomic-rank) of "phylum" under the [tax-rule](doc:tax-rule) "bestsum".
[block:code]
{
  "codes": [
    {
      "code": "./blobtools view \\\n  -i test_files/my_first_blobplot.blobDB.json \\\n  -o test_files/",
      "language": "shell",
      "name": "Input"
    },
    {
      "code": "[STATUS]\t\t: Reading BlobDB test_files/my_first_blobplot.blobDB.json\n[STATUS]\t\t: \tLoading BlobDB into memory ...\n[STATUS]\t\t: \tSerialising BlobDB (using 'ujson' module) (this may take a while) ...\n[STATUS]\t\t: \tFinished in 0.0109348297119s\n[STATUS]\t\t: Preparing view(s) ...\n[PROGRESS]\t: \t100%\n[STATUS]\t\t: \tWriting test_files/my_first_blobplot.blobDB.table.txt\n[STATUS]\t\t: Writing output ...\n# writes file 'test_files/my_first_blobplot.blobDB.table.txt'",
      "language": "text",
      "name": "Output"
    }
  ]
}
[/block]
### 2.2 Inspecting the output

* The resulting file can be inspected using a one-liner that aligns the columns, e.g.:
[block:code]
{
  "codes": [
    {
      "code": "grep '^##' test_files/my_first_blobplot.blobDB.table.txt ; \\\n  grep -v '^##' test_files/my_first_blobplot.blobDB.table.txt | \\\n  column -t -s $'\\t'",
      "language": "shell",
      "name": "Input"
    },
    {
      "code": "## blobtools v0.9.19\n## assembly\t: /Users/dominik/git/blobtools/test_files/assembly.fna\n## coverage\t: bam0 - /Users/dominik/git/blobtools/test_files/mapping_1.bam\n## taxonomy\t: tax0 - /Users/dominik/git/blobtools/test_files/blast.out\n## nodesDB\t: /Users/dominik/git/blobtools/data/nodesDB.txt\n## taxrule\t: bestsum\n##\n# name     length  GC      N  bam0     phylum.t.6      phylum.s.7  phylum.c.8\ncontig_1   756     0.2606  0  90.406   Actinobacteria  200.0       0\ncontig_2   1060    0.2623  0  168.409  Actinobacteria  2300.0      0\ncontig_3   602     0.2342  0  43.761   Actinobacteria  10000.0     0\ncontig_4   951     0.3155  0  456.313  Actinobacteria  1000.0      0\ncontig_5   614     0.329   0  163.557  Nematoda        2000.0      0\ncontig_6   216     0.1944  0  25.88    Tardigrada      4000.0      2\ncontig_7   4060    0.2584  0  52.312   Nematoda        2000.0      0\ncontig_8   2346    0.2801  0  91.742   unresolved      2000.0      1\ncontig_9   1599    0.2439  0  74.757   Nematoda        200.0       0\ncontig_10  6273    0.3067  0  310.634  no-hit          0.0         None",
      "language": "text",
      "name": "Output"
    }
  ]
}
[/block]
* The main header (##) contains the version number, input files, and parameters used to create the [blobDB file](doc:blobdb).
* The table header (#) lists the names of the columns (explained in the table below).
* All following rows contain information about the sequences, in the order they appear in the [assembly file](doc:assembly-file).
[block:parameters]
{
  "data": {
    "h-0": "column header",
    "h-1": "description",
    "0-0": "name",
    "0-1": "Name of the sequence",
    "1-0": "length",
    "1-1": "Total length of the sequence, i.e. count(A, G, C, T, N)",
    "2-0": "GC",
    "2-1": "GC content of the sequence as a proportion, i.e. count(G, C)/count(A, G, C, T)",
    "3-0": "N",
    "3-1": "Number of N's in the sequence, i.e. count(N)",
    "4-0": "bam0",
    "4-1": "Coverage from bam0 (see main header for filename)",
    "5-0": "phylum.t.6",
    "5-1": "The assigned [taxonomy](doc:taxonomy) of the sequence at the [taxonomic rank](doc:taxonomic-rank) of \"phylum\" under the [tax-rule](doc:tax-rule) \"bestsum\"",
    "6-0": "phylum.s.7",
    "6-1": "The sum of scores for the [taxonomy](doc:taxonomy) of the sequence at the [taxonomic rank](doc:taxonomic-rank) of \"phylum\" under the [tax-rule](doc:tax-rule) \"bestsum\"",
    "7-0": "phylum.c.8",
    "7-1": "The [c-index](doc:c-index) for the taxonomy of the sequence at the [taxonomic rank](doc:taxonomic-rank) of \"phylum\" under the [tax-rule](doc:tax-rule) \"bestsum\""
  },
  "cols": 2,
  "rows": 8
}
[/block]
* If N > 1 [coverage files](doc:coverage-file) are provided, N+1 coverage columns will appear in the table: one column per [coverage file](doc:coverage-file), plus an additional column listing the sum of coverages across all files.
* The last three columns list information about the taxonomy at the [taxonomic rank](doc:taxonomic-rank) of "phylum" under the [tax-rule](doc:tax-rule) "bestsum".
 * If N > 1 [taxonomic ranks](doc:taxonomic-rank) are specified using the ```-r``` argument, additional columns will appear in the table.
 * If the ```--hits``` flag is used, the table will include the sum of scores by [hits file](doc:taxonomy-file) for each [taxID](doc:taxid).
 * The headers of the "taxonomy" columns are numbered (1-based) to ease filtering with command line applications such as ```GNU awk``` or ```UNIX/LINUX cut```.
[block:api-header]
{
  "type": "basic",
  "title": "3. Create a blobplot"
}
[/block]

[block:code]
{
  "codes": [
    {
      "code": "./blobtools blobplot \\\n  -i test_files/blobDB.json \\\n  -o test_files/",
      "language": "text",
      "name": "input"
    },
    {
      "code": "[STATUS]\t: Reading BlobDB test_files/blobDB.json\n[STATUS]\t: \tLoading BlobDB into memory ...\n[STATUS]\t: \tSerialising BlobDB (using 'ujson' module) (this may take a while) ...\n[STATUS]\t: \tFinished in 0.00234913825989s\n[STATUS]\t: Extracting data for plots ...\n\t[INFO]\t: no-hit : sequences = 1, span = 0.01 MB, N50 = 6,273 nt\n\t[INFO]\t: Nematoda : sequences = 3, span = 0.01 MB, N50 = 4,060 nt\n\t[INFO]\t: unresolved : sequences = 2, span = 0.0 MB, N50 = 2,346 nt\n\t[INFO]\t: Actinobacteria : sequences = 3, span = 0.0 MB, N50 = 951 nt\n\t[INFO]\t: Tardigrada : sequences = 1, span = 0.0 MB, N50 = 216 nt\n[STATUS]\t: Plotting test_files/blobDB.json.bestsum.phylum.p7.span.100.blobplot.bam0.png\n[STATUS]\t: Plotting test_files/blobDB.json.bestsum.phylum.p7.span.100.blobplot.read_cov.bam0.png\n[STATUS]\t: Writing test_files/blobDB.json.bestsum.phylum.p7.span.100.blobplot.stats.txt",
      "language": "text",
      "name": "output"
    }
  ]
}
[/block]
* This generates three files:
 * the [blobplot](doc:blobplot) ```blobDB.json.bestsum.phylum.p7.span.100.blobplot.bam0.png```
 * the [readcovplot](doc:readcovplot) ```blobDB.json.bestsum.phylum.p7.span.100.blobplot.read_cov.bam0.png```
 * the stats file ```blobDB.json.bestsum.phylum.p7.span.100.blobplot.stats.txt```
  * This file lists stats for **all** the groups in the blobDB, also the ones that were
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/bc23079-blobDB.json.bestsum.phylum.p7.span.100.blobplot.bam0.png",
        "blobDB.json.bestsum.phylum.p7.span.100.blobplot.bam0.png",
        3500,
        3500,
        "#f3f3f3"
      ],
      "sizing": "smart",
      "border": true,
      "caption": "Blobplot of test_files/blobDB.json"
    }
  ]
}
[/block]

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/f9f2f7b-blobDB.json.bestsum.phylum.p7.span.100.blobplot.read_cov.bam0.png",
        "blobDB.json.bestsum.phylum.p7.span.100.blobplot.read_cov.bam0.png",
        3000,
        1000,
        "#f1f1f0"
      ],
      "border": true,
      "caption": "Readcovplot of test_files/blobDB.json"
    }
  ]
}
[/block]
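
Section 2.2 notes that the taxonomy column headers carry their 1-based column position (e.g. ```phylum.t.6``` is column 6) so that the table can be filtered with ```awk``` or ```cut```. The following is a minimal sketch of that idea and is not part of blobtools itself. It assumes a tab-separated table as written by ```blobtools view```; the scratch file and the variable names (`table`, `nematoda`, `name_cov`) are hypothetical, and the rows are a subset copied from the example output in section 2.2.

```shell
# Hypothetical scratch file; in practice point this at
# test_files/my_first_blobplot.blobDB.table.txt instead.
table="$(mktemp)"

# A few rows copied from the example output in section 2.2 (tab-separated).
printf '## blobtools v0.9.19\n' > "$table"
printf '# name\tlength\tGC\tN\tbam0\tphylum.t.6\tphylum.s.7\tphylum.c.8\n' >> "$table"
printf 'contig_1\t756\t0.2606\t0\t90.406\tActinobacteria\t200.0\t0\n' >> "$table"
printf 'contig_5\t614\t0.329\t0\t163.557\tNematoda\t2000.0\t0\n' >> "$table"
printf 'contig_7\t4060\t0.2584\t0\t52.312\tNematoda\t2000.0\t0\n' >> "$table"
printf 'contig_10\t6273\t0.3067\t0\t310.634\tno-hit\t0.0\tNone\n' >> "$table"

# Column 6 (phylum.t.6) holds the assigned phylum:
# print the names of all sequences assigned to Nematoda.
nematoda=$(grep -v '^#' "$table" | awk -F'\t' '$6 == "Nematoda" { print $1 }')
echo "$nematoda"

# Columns 1 and 5 (name, bam0): extract sequence name and coverage with cut.
name_cov=$(grep -v '^#' "$table" | cut -f 1,5)
echo "$name_cov"

rm -f "$table"
```

Dropping every line that starts with `#` removes both the main header and the table header, so only sequence rows reach ```awk``` and ```cut```. If additional ranks are requested with ```-r```, the column numbers in the headers change accordingly, so it is worth checking the table header before hard-coding a column index.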