These are chat archives for dereneaton/ipyrad

13th
Jun 2018
Isaac Overcast
@isaacovercast
Jun 13 2018 01:15
@jmark_porter_twitter This is bizarre to the max.
Isaac Overcast
@isaacovercast
Jun 13 2018 01:21
Can you blank your ipyrad_log.txt file (rm ipyrad_log.txt), then rerun ipyrad like this: ipyrad -p params.txt -s 3 -f -d (sub in your params file, naturally). With the -d flag set you should see more stuff like this:
2018-06-12 21:18:18,017         pid=36058       [cluster_within.py]     DEBUG   ['/home/isaac/ipyrad/ipyrad/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/tmp/ipyrad-test/rad_edits/1_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.85', '-minsl', '0.5', '-userout', '/tmp/ipyrad-test/rad_clust_0.85/1.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '2', '-notmatched', '/tmp/ipyrad-test/rad_clust_0.85/1.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']                                                                                                                       
2018-06-12 21:18:18,019         pid=36239       [cluster_within.py]     DEBUG   ['/home/isaac/ipyrad/ipyrad/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/tmp/ipyrad-test/rad_edits/11_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.85', '-minsl', '0.5', '-userout', '/tmp/ipyrad-test/rad_clust_0.85/11.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '2', '-notmatched', '/tmp/ipyrad-test/rad_clust_0.85/11.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']                                                                                                                    
2018-06-12 21:18:18,022         pid=36078       [cluster_within.py]     DEBUG   ['/home/isaac/ipyrad/ipyrad/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/tmp/ipyrad-test/rad_edits/6_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.85', '-minsl', '0.5', '-userout', '/tmp/ipyrad-test/rad_clust_0.85/6.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '2', '-notmatched', '/tmp/ipyrad-test/rad_clust_0.85/6.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
The vsearch -cluster_smallmem command is what's happening during step 3, and it's the parameters for these calls that I'm curious about. It's crazy that it's happening to different samples now. Is it possible that multiple assemblies are running at the same time? I can't think of what else would do this.
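The parameter comparison described here can be done mechanically rather than by eye. Below is a minimal Python sketch, assuming the log layout visible in the lines quoted above (timestamp, pid=, [module], level, then a Python-style list holding the vsearch call). The names parse_vsearch_line, flags_to_dict and repeated_outputs are made up for this illustration and are not part of ipyrad.

```python
import ast
import re
from collections import defaultdict

# Assumed layout, inferred from the quoted log lines:
# "<date> <time>  pid=<n>  [<module>]  <LEVEL>  <optional text> ['cmd', 'arg', ...]"
LINE_RE = re.compile(
    r"^(?P<stamp>\S+ \S+)\s+pid=(?P<pid>\d+)\s+\[(?P<module>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)\s+(?P<rest>.*)$"
)


def is_flag(token):
    # A lone "-" (stdin placeholder in the derep call) is a value, not a flag.
    return token.startswith("-") and len(token) > 1


def flags_to_dict(cmd):
    """Turn ['binary', '-flag', 'value', '-switch', ...] into a dict."""
    opts = {}
    i = 1  # skip the path to the vsearch binary
    while i < len(cmd):
        name = cmd[i].lstrip("-")
        nxt = cmd[i + 1] if i + 1 < len(cmd) else None
        if nxt is not None and not is_flag(nxt):
            opts[name] = nxt
            i += 2
        else:
            opts[name] = True  # bare switch such as -fulldp
            i += 1
    return opts


def parse_vsearch_line(line):
    """Return a dict describing one logged call, or None if unparseable."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    start = rest.find("[")
    if start == -1:
        return None
    try:
        cmd = ast.literal_eval(rest[start:])
    except (ValueError, SyntaxError):
        return None  # e.g. a line cut off mid-list
    if not isinstance(cmd, list) or len(cmd) < 2:
        return None
    return {
        "stamp": match.group("stamp"),
        "pid": int(match.group("pid")),
        "level": match.group("level"),
        "subcommand": cmd[1].lstrip("-"),
        "options": flags_to_dict(cmd),
    }


def repeated_outputs(lines):
    """Map each -userout path that was logged more than once to its calls."""
    by_output = defaultdict(list)
    for line in lines:
        call = parse_vsearch_line(line)
        if call is None or call["subcommand"] != "cluster_smallmem":
            continue
        userout = call["options"].get("userout")
        if userout is not None:
            by_output[userout].append((call["stamp"], call["pid"]))
    return {path: hits for path, hits in by_output.items() if len(hits) > 1}
```

Run over a freshly blanked ipyrad_log.txt, an empty result from repeated_outputs means each -userout path was logged once. A path logged more than once would be consistent with two runs writing to the same files, though it would not prove it.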
J. Mark Porter
@jmark_porter_twitter
Jun 13 2018 01:41
Hi @isaacovercast, I have removed the previous copy of ipyrad_log.txt and restarted the run of step 3, enabling debug mode. It has started the clustering step. The program has started rewriting the clustering files.
Isaac Overcast
@isaacovercast
Jun 13 2018 02:40
Great. Let me know how it goes. I'm really perplexed by this problem; I've never seen anything like this before. I'm confident we'll figure it out!
J. Mark Porter
@jmark_porter_twitter
Jun 13 2018 17:17

Hi @isaacovercast, the ipyrad run of step 3 appears to still be running. Here is the output:

Enabling debug mode


ipyrad [v.0.7.19]

Interactive assembly and analysis of RAD-seq data

loading Assembly: opuntia_TEST80
from saved path: ~/bigdata/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80.json
establishing parallel connection:
host compute node: [10 cores] on c08
host compute node: [10 cores] on c01
host compute node: [10 cores] on c05
host compute node: [10 cores] on c07

Step 3: Clustering/Mapping reads
[####################] 100% dereplicating | 0:00:36
[################ ] 83% clustering | 14:51:37

The files are being updated, as you can see by the date and time. However, samples 15421_I and 15430_C still do not have corresponding .utemp.sort and .clust.gz files written to the opuntia_TEST80_clust_0.8 folder. Here are the first few lines of an ls -l of the folder:

mporter@globus:~/bigdata/ddRAD/opuntia_ipyrad/TESTS$ ls -l opuntia_TEST80_clust_0.8
total 1019392
-rw-r--r-- 1 mporter columbuslab 21365333 Jun 12 19:12 15220_D.clust.gz
-rw-r--r-- 1 mporter columbuslab 26423095 Jun 12 19:11 15220_D.htemp
-rw-r--r-- 1 mporter columbuslab 35397190 Jun 12 19:11 15220_D.utemp
-rw-r--r-- 1 mporter columbuslab 35397190 Jun 12 19:11 15220_D.utemp.sort
-rw-r--r-- 1 mporter columbuslab 16777216 Jun 13 00:43 15421_I.htemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 13 03:36 15421_I.utemp
-rw-r--r-- 1 mporter columbuslab 9748118 Jun 12 18:40 15424_H.clust.gz
-rw-r--r-- 1 mporter columbuslab 8353924 Jun 12 18:40 15424_H.htemp
-rw-r--r-- 1 mporter columbuslab 22616994 Jun 12 18:40 15424_H.utemp
-rw-r--r-- 1 mporter columbuslab 22616994 Jun 12 18:40 15424_H.utemp.sort

Also, I notice that in the *-tmpalign folder, there are no files relating to 15421_I and 15430_C. At what point should these files be written? Thanks for your insight.

Isaac Overcast
@isaacovercast
Jun 13 2018 18:27
The files in the tmpalign folder get created after the clustering step. In this case these samples are hanging before completing the clustering. It looks like they get most of the way there, if not completely finished. Can I see the results of this: grep vsearch ipyrad_log.txt in the directory you run ipyrad from?
Also this: ls -l opuntia_TEST80_clust_0.8 | grep 15430 so I can see the file sizes for the other bad sample.
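Spotting the stalled samples from a long directory listing can also be scripted. The sketch below is a rough illustration and assumes that a finished sample has all four suffixes seen for 15220_D and 15424_H in the listing above (.utemp, .htemp, .utemp.sort, .clust.gz). The function name incomplete_samples is hypothetical, and the file names are the ones from the quoted ls -l output.

```python
from collections import defaultdict

# Suffixes observed for completed samples in the listing above (an assumption
# about what "finished" looks like, not taken from the ipyrad source).
EXPECTED = (".utemp", ".htemp", ".utemp.sort", ".clust.gz")


def incomplete_samples(filenames):
    """Map each sample name to the expected suffixes it is still missing."""
    seen = defaultdict(set)
    for name in filenames:
        for suffix in EXPECTED:
            if name.endswith(suffix):
                seen[name[: -len(suffix)]].add(suffix)
                break
    missing = {}
    for sample, found in seen.items():
        gap = sorted(set(EXPECTED) - found)
        if gap:
            missing[sample] = gap
    return missing


# File names copied from the quoted listing; for a live check one could pass
# os.listdir("opuntia_TEST80_clust_0.8") instead.
listing = [
    "15220_D.clust.gz", "15220_D.htemp", "15220_D.utemp", "15220_D.utemp.sort",
    "15421_I.htemp", "15421_I.utemp",
    "15424_H.clust.gz", "15424_H.htemp", "15424_H.utemp", "15424_H.utemp.sort",
]
print(incomplete_samples(listing))
# prints {'15421_I': ['.clust.gz', '.utemp.sort']}
```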
J. Mark Porter
@jmark_porter_twitter
Jun 13 2018 18:31
mporter@owl:~/bigdata/ddRAD/opuntia_ipyrad/TESTS$ grep vsearch ipyrad_log.txt
2018-06-12 18:37:16,664 pid=6571 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15424_J_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,707 pid=24784 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15537_G_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,707 pid=24794 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15501_E_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,720 pid=34700 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15424_H_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,720 pid=34702 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15430_C_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,720 pid=34694 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15220_D_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:16,838 pid=6581 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15426_J_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:17,016 pid=6570 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15501_I_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:17,431 pid=2577 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15421_I_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-12 18:37:17,441 pid=2585 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15429_M_derep.fastq', '--threads', '4', '--fasta_width', '0
Isaac Overcast
@isaacovercast
Jun 13 2018 19:15
Weird. Why is it not showing the vsearch -cluster_smallmem calls? These should only print to the log file when debug mode is on, but it's only printing the derep calls... If you do this: grep DEBUG ipyrad_log.txt | tail, do you see anything there?
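One way to check which kinds of vsearch calls actually made it into the log, without depending on how much output a paste preserves, is to tally them by log level. This is a minimal sketch: it assumes the level appears right after the bracketed module name, as in the quoted lines, and that the two subcommands of interest are the ones seen in this conversation (derep_fulllength and cluster_smallmem). The function name tally_vsearch_calls is made up for the example.

```python
import re
from collections import Counter

# The level follows "[module]" in the quoted log lines.
LEVEL_RE = re.compile(r"\]\s+(DEBUG|INFO|WARNING|ERROR)\s")
# Subcommands seen in this conversation; extend if other calls are logged.
SUBCOMMANDS = ("derep_fulllength", "cluster_smallmem")


def tally_vsearch_calls(lines):
    """Count logged vsearch calls, keyed by (level, subcommand)."""
    counts = Counter()
    for line in lines:
        if "vsearch" not in line:
            continue
        found = LEVEL_RE.search(line)
        level = found.group(1) if found else "?"
        for sub in SUBCOMMANDS:
            if sub in line:
                counts[(level, sub)] += 1
    return counts
```

A possible use is to open ipyrad_log.txt and pass the file object straight in, for example `with open("ipyrad_log.txt") as fh: print(tally_vsearch_calls(fh))`. Plain substring matching is used on purpose so that lines cut off partway through the argument list are still counted.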
J. Mark Porter
@jmark_porter_twitter
Jun 13 2018 19:41
OK, this is what I get...
mporter@globus:/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS$ grep DEBUG ipyrad_log.txt | tail
2018-06-12 18:37:52,555 pid=6571 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15501_I_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.8', '-minsl', '0.5', '-userout', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15501_I.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '4', '-notmatched', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15501_I.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
2018-06-12 18:37:52,560 pid=6581 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15424_J_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.8', '-minsl', '0.5', '-userout', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15424_J.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '4', '-notmatched', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15424_J.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
2018-06-12 18:37:52,569 pid=6570 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15430_C_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.8', '-minsl', '0.5', '-userout', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15430_C.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '4', '-notmatched', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15430_C.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
2018-06-12 18:37:52,556 pid=34702 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15537_G_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.8', '-minsl', '0.5', '-userout', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15537_G.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '4', '-notmatched', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15537_G.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
2018-06-12 18:37:52,560 pid=2577 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15500_C_derep.fastq', '-strand', 'plus', '-query_cov', '0.75', '-id', '0.8', '-minsl', '0.5', '-userout', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15500_C.utemp', '-userfields', 'query+target+id+gaps+qstrand+qcov', '-maxaccepts', '1', '-maxrejects', '0', '-threads', '4', '-notmatched', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_clust_0.8/15500_C.htemp', '-fasta_width', '0', '-fastq_qmax', '100', '-fulldp', '-usersort']
2018-06-12 18:37:52,574 pid=24784 [cluster_within.py] DEBUG ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '-cluster_smallmem', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15426_J_derep.fastq', '-strand', 'p