These are chat archives for dereneaton/ipyrad

11th
Jun 2018
joqb
@joqb
Jun 11 12:57

Hi @isaacovercast, @eaton-lab, a colleague and I are trying to get ipyrad 0.7.24 running on a new dataset.
However, we got an error at the very first step:
ImportError(No module named ipyparallel.serialize.canning). We were also unable to update to the new version.

Any idea?
Cheers

Isaac Overcast
@isaacovercast
Jun 11 14:06
@joqb What happened when you tried to update to the new version? What's the exact error message? Are you sure ipyparallel is installed properly? Try conda install ipyparallel.
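To confirm that the module named in the ImportError can be found by the same Python that runs ipyrad, a small check like the one below may help. This is a minimal sketch using the standard library's importlib.util.find_spec, written for Python 3 (importlib.util.find_spec does not exist in Python 2.7). The helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if the (possibly dotted) module `name` can be located."""
    try:
        # For a dotted name, find_spec imports the parent packages first,
        # then looks for the last component without importing it.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (a subclass
        # of ImportError); ValueError covers modules with no __spec__.
        return False
```

In a working environment, module_available("ipyparallel.serialize.canning") should return True. A False result would suggest that ipyrad is running under a Python where ipyparallel is missing or incomplete.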
@jmark_porter_twitter Could it be something goofy in one of the sample names? Can you show me the names of the 12 samples you're running?
joqb
@joqb
Jun 11 15:28

@isaacovercast When trying to update to 0.7.25, only 0.7.24 appears. There isn't really an error; 0.7.25 just doesn't seem to be available.

I also tried updating ipyparallel, but that didn't resolve the issue. After killing all of my colleague's processes and screen sessions, we were able to start ipyrad properly. I'm not sure exactly what the problem was, but it works now.

J. Mark Porter
@jmark_porter_twitter
Jun 11 16:25
Hi @isaacovercast , here are the names of the 12 files that I am using for parameter exploration: 15220_DR1.fastq 15424_HR1.fastq 15426_JR1.fastq 15430_CR1.fastq 15500_CR1.fastq 15501_IR1.fastq
15421_IR1.fastq 15424_JR1.fastq 15429_MR1.fastq 15442_AR1.fastq 15501_ER1.fastq 15537_GR1.fastq
All of the names have the same structure: a collection number, an underscore, a letter for the individual, and R1. I am perplexed.
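The question of "goofy" names can be checked mechanically. The sketch below is hypothetical. The regular expression encodes the naming scheme described above. Mapping 15424_HR1.fastq to 15424_H mirrors the sample names that appear in the directory listings later in this thread, not ipyrad's own name-parsing code. Any name with hidden characters, such as stray whitespace, would land in the second list.

```python
import re

# Assumed scheme: collection number, underscore, one uppercase letter
# for the individual, then "R1.fastq".
NAME_PATTERN = re.compile(r"^(\d+_[A-Z])R1\.fastq$")


def check_names(filenames):
    """Split filenames into (expected sample names, names that break the pattern)."""
    samples, odd = [], []
    for fname in filenames:
        match = NAME_PATTERN.match(fname)
        if match:
            samples.append(match.group(1))
        else:
            odd.append(fname)
    return samples, odd


files = """15220_DR1.fastq 15424_HR1.fastq 15426_JR1.fastq 15430_CR1.fastq
15500_CR1.fastq 15501_IR1.fastq 15421_IR1.fastq 15424_JR1.fastq
15429_MR1.fastq 15442_AR1.fastq 15501_ER1.fastq 15537_GR1.fastq""".split()

samples, odd = check_names(files)
print(len(samples), odd)
# Duplicate sample names after stripping the suffix would also be a problem.
print(sorted({s for s in samples if samples.count(s) > 1}))
```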
J. Mark Porter
@jmark_porter_twitter
Jun 11 16:32
Hi @isaacovercast, I forgot to mention that the run is still active. Nine hours into the run, clustering was 83% complete (2 samples remained). It has now been running for 1 day, 22:45:16, but has not written to the log file since 2018-06-09 19:30:40,018.
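The "has not written to the log file since" check can be scripted. This is a minimal sketch that assumes the timestamp format shown in the ipyrad_log.txt lines quoted in this thread. The six-hour threshold and the function names are hypothetical choices, not ipyrad settings.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the ipyrad_log.txt lines quoted in this thread,
# e.g. "2018-06-09 19:30:40,018" (the digits after the comma are milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_silence(last_log_time, now):
    """Time elapsed between the last log timestamp and `now`."""
    return now - datetime.strptime(last_log_time, LOG_TIME_FORMAT)


def looks_stalled(last_log_time, now, threshold=timedelta(hours=6)):
    """Hypothetical heuristic: flag a run whose log has been quiet too long."""
    return log_silence(last_log_time, now) > threshold


print(looks_stalled("2018-06-09 19:30:40,018", datetime.now()))
```

The log timestamps use the server's local clock, so `now` should come from the same machine.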
Isaac Overcast
@isaacovercast
Jun 11 17:00
@joqb Oh yeah, the Linux version of 0.7.25 isn't on conda yet; I've been meaning to push it up. Sorry for the confusion, and glad you got it working.
@jmark_porter_twitter You mentioned that some of the samples didn't make it to tmpalign? Which two? Also, can I see an ls -ltr of your *_clust directory?
J. Mark Porter
@jmark_porter_twitter
Jun 11 17:08
Hi @isaacovercast, 15421_IR1.fastq and 15430_CR1.fastq did not make it into tmpalign. Here are the folder contents:
/opuntia_TEST80_clust_0.8$ ls -ltr
total 1117696
-rw-r--r-- 1 mporter columbuslab 8353924 Jun 9 10:45 15424_H.htemp
-rw-r--r-- 1 mporter columbuslab 22616994 Jun 9 10:45 15424_H.utemp
-rw-r--r-- 1 mporter columbuslab 22616994 Jun 9 10:45 15424_H.utemp.sort
-rw-r--r-- 1 mporter columbuslab 9748118 Jun 9 10:45 15424_H.clust.gz
-rw-r--r-- 1 mporter columbuslab 11377764 Jun 9 10:48 15429_M.htemp
-rw-r--r-- 1 mporter columbuslab 22575371 Jun 9 10:48 15429_M.utemp
-rw-r--r-- 1 mporter columbuslab 22575371 Jun 9 10:48 15429_M.utemp.sort
-rw-r--r-- 1 mporter columbuslab 13088000 Jun 9 10:48 15501_E.htemp
-rw-r--r-- 1 mporter columbuslab 24381436 Jun 9 10:48 15501_E.utemp
-rw-r--r-- 1 mporter columbuslab 24381436 Jun 9 10:48 15501_E.utemp.sort
-rw-r--r-- 1 mporter columbuslab 10980645 Jun 9 10:48 15429_M.clust.gz
-rw-r--r-- 1 mporter columbuslab 12377527 Jun 9 10:48 15501_E.clust.gz
-rw-r--r-- 1 mporter columbuslab 11966537 Jun 9 10:50 15501_I.htemp
-rw-r--r-- 1 mporter columbuslab 23539654 Jun 9 10:50 15501_I.utemp
-rw-r--r-- 1 mporter columbuslab 23539654 Jun 9 10:50 15501_I.utemp.sort
-rw-r--r-- 1 mporter columbuslab 11618296 Jun 9 10:51 15501_I.clust.gz
-rw-r--r-- 1 mporter columbuslab 17976974 Jun 9 10:52 15500_C.htemp
-rw-r--r-- 1 mporter columbuslab 29978573 Jun 9 10:52 15500_C.utemp
-rw-r--r-- 1 mporter columbuslab 29978573 Jun 9 10:52 15500_C.utemp.sort
-rw-r--r-- 1 mporter columbuslab 16113106 Jun 9 10:52 15500_C.clust.gz
-rw-r--r-- 1 mporter columbuslab 10413974 Jun 9 10:57 15537_G.htemp
-rw-r--r-- 1 mporter columbuslab 33478735 Jun 9 10:57 15537_G.utemp
-rw-r--r-- 1 mporter columbuslab 33478735 Jun 9 10:57 15537_G.utemp.sort
-rw-r--r-- 1 mporter columbuslab 13743582 Jun 9 10:57 15537_G.clust.gz
-rw-r--r-- 1 mporter columbuslab 19084859 Jun 9 11:00 15426_J.htemp
-rw-r--r-- 1 mporter columbuslab 42698205 Jun 9 11:00 15426_J.utemp
-rw-r--r-- 1 mporter columbuslab 42698205 Jun 9 11:00 15426_J.utemp.sort
-rw-r--r-- 1 mporter columbuslab 20166847 Jun 9 11:00 15426_J.clust.gz
-rw-r--r-- 1 mporter columbuslab 23508687 Jun 9 11:05 15442_A.htemp
-rw-r--r-- 1 mporter columbuslab 31803005 Jun 9 11:05 15442_A.utemp
-rw-r--r-- 1 mporter columbuslab 31803005 Jun 9 11:05 15442_A.utemp.sort
-rw-r--r-- 1 mporter columbuslab 18853671 Jun 9 11:06 15442_A.clust.gz
-rw-r--r-- 1 mporter columbuslab 26423095 Jun 9 11:09 15220_D.htemp
-rw-r--r-- 1 mporter columbuslab 35397190 Jun 9 11:09 15220_D.utemp
-rw-r--r-- 1 mporter columbuslab 35397190 Jun 9 11:09 15220_D.utemp.sort
-rw-r--r-- 1 mporter columbuslab 21365333 Jun 9 11:10 15220_D.clust.gz
-rw-r--r-- 1 mporter columbuslab 25360367 Jun 9 19:29 15424_J.htemp
-rw-r--r-- 1 mporter columbuslab 38287291 Jun 9 19:29 15424_J.utemp
-rw-r--r-- 1 mporter columbuslab 38287291 Jun 9 19:29 15424_J.utemp.sort
-rw-r--r-- 1 mporter columbuslab 21389779 Jun 9 19:30 15424_J.clust.gz
-rw-r--r-- 1 mporter columbuslab 50331648 Jun 10 14:31 15430_C.utemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 01:40 15430_C.htemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 02:54 15421_I.htemp
-rw-r--r-- 1 mporter columbuslab 83886080 Jun 11 06:28 15421_I.utemp
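The two missing samples can also be picked out of a listing like the one above. In that listing, the samples that have a .clust.gz file are the ones that got past this stage. A sample with temporary files but no .clust.gz appears not to have finished. This minimal sketch assumes that the sample name is everything before the first "." in each filename, which holds for every file shown. The function name is hypothetical, and the example list is an excerpt of the listing.

```python
def unfinished_samples(filenames):
    """Return samples that have clustering temp files but no .clust.gz yet."""
    suffixes = {}
    for fname in filenames:
        # "15424_H.utemp.sort" -> sample "15424_H", suffix "utemp.sort"
        sample, _, suffix = fname.partition(".")
        suffixes.setdefault(sample, set()).add(suffix)
    return sorted(s for s, found in suffixes.items() if "clust.gz" not in found)


listing = [
    "15424_H.htemp", "15424_H.utemp", "15424_H.utemp.sort", "15424_H.clust.gz",
    "15220_D.htemp", "15220_D.utemp", "15220_D.utemp.sort", "15220_D.clust.gz",
    "15430_C.utemp", "15430_C.htemp", "15421_I.htemp", "15421_I.utemp",
]
print(unfinished_samples(listing))  # ['15421_I', '15430_C']
```

For a real run, os.listdir() on the *_clust directory would supply the filenames.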
Isaac Overcast
@isaacovercast
Jun 11 17:19
-rw-r--r-- 1 mporter columbuslab 50331648 Jun 10 14:31 15430_C.utemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 01:40 15430_C.htemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 02:54 15421_I.htemp
-rw-r--r-- 1 mporter columbuslab 83886080 Jun 11 06:28 15421_I.utemp
Can you look in top and see if vsearch is running?
Also you can ps -ef | grep vsearch
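One caveat with ps -ef | grep vsearch is that the grep process usually shows up in its own output, because its command line contains "vsearch". The common shell workaround is ps -ef | grep '[v]search'. The check can also be done in Python, as in the sketch below. The function names are hypothetical, and running() assumes a POSIX system with ps available.

```python
import subprocess


def matching_processes(ps_output, name):
    """Return the lines of `ps -ef` output whose text contains `name`.

    The header line is skipped. Filtering in Python avoids matching a grep
    process, but any other command line that happens to contain `name`
    will still match.
    """
    lines = ps_output.splitlines()[1:]  # drop the "UID PID PPID ..." header
    return [line for line in lines if name in line]


def running(name):
    """Run `ps -ef` and return the process lines mentioning `name`."""
    result = subprocess.run(["ps", "-ef"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return matching_processes(result.stdout, name)
```

An empty list from running("vsearch") while the log claims clustering is in progress would be consistent with the "stuck" state described here.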
It looks like both of the samples that haven't finished have much larger files. Can I see an ls -l of the *_edits directory?
The htemp files are the unique loci, and the utemp files are the reads that match these loci. These massive utemp files will really slow things down, and they probably also indicate lots of singletons. Can I see your params file as well?
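A toy example can make this description of the two file types concrete. The sketch below assumes, rather than confirms from ipyrad's source, that each .utemp line is tab-separated, with the matching read's name first and the seed (locus) it matched second. It also assumes that the seed names come from the .htemp file. The function names are hypothetical. Counting how many loci picked up no matching reads is one way to gauge the "lots of singletons" suspicion.

```python
def build_clusters(utemp_lines, seeds):
    """Group matched reads under the seed (locus) they hit.

    Assumes tab-separated utemp lines with the query (read) name in the
    first column and the target (seed) name in the second.
    """
    clusters = {seed: [] for seed in seeds}
    for line in utemp_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, target = fields[0], fields[1]
        clusters.setdefault(target, []).append(query)
    return clusters


def singleton_fraction(clusters):
    """Fraction of loci whose seed matched no other reads."""
    if not clusters:
        return 0.0
    singles = sum(1 for members in clusters.values() if not members)
    return singles / len(clusters)
```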
J. Mark Porter
@jmark_porter_twitter
Jun 11 17:52
Hi @isaacovercast, here are the contents of the edits folder:
/opuntia_TEST80_edits$ ls -l
total 3585024
-rw-r--r-- 1 mporter columbuslab 173364756 Jun 9 10:41 15220_D.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 76229835 Jun 11 10:15 15220_D_derep.fastq
-rw-r--r-- 1 mporter columbuslab 254519730 Jun 9 10:40 15421_I.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 311438095 Jun 11 10:15 15421_I_derep.fastq
-rw-r--r-- 1 mporter columbuslab 158156641 Jun 9 10:40 15424_H.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 39915256 Jun 11 10:15 15424_H_derep.fastq
-rw-r--r-- 1 mporter columbuslab 179658938 Jun 9 10:41 15424_J.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 79192111 Jun 11 10:15 15424_J_derep.fastq
-rw-r--r-- 1 mporter columbuslab 274067908 Jun 9 10:41 15426_J.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 78824913 Jun 11 10:15 15426_J_derep.fastq
-rw-r--r-- 1 mporter columbuslab 154886088 Jun 9 10:41 15429_M.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 42928090 Jun 11 10:15 15429_M_derep.fastq
-rw-r--r-- 1 mporter columbuslab 326688904 Jun 9 10:42 15430_C.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 438029303 Jun 11 10:15 15430_C_derep.fastq
-rw-r--r-- 1 mporter columbuslab 168733476 Jun 9 10:40 15442_A.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 68207491 Jun 11 10:15 15442_A_derep.fastq
-rw-r--r-- 1 mporter columbuslab 160165476 Jun 9 10:40 15500_C.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 60039888 Jun 11 10:15 15500_C_derep.fastq
-rw-r--r-- 1 mporter columbuslab 154624083 Jun 9 10:41 15501_E.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 47224379 Jun 11 10:15 15501_E_derep.fastq
-rw-r--r-- 1 mporter columbuslab 155009130 Jun 9 10:40 15501_I.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 44911195 Jun 11 10:15 15501_I_derep.fastq
-rw-r--r-- 1 mporter columbuslab 159926330 Jun 9 10:41 15537_G.trimmedR1.fastq.gz
-rw-r--r-- 1 mporter columbuslab 57211078 Jun 11 10:15 15537_G_derep.fastq
-rw-r--r-- 1 mporter columbuslab 1741 Jun 9 10:42 s2_rawedit_stats.txt
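The listing bears out the observation about "much larger file sizes" numerically. The sketch below flags any sample whose *_derep.fastq is more than three times the median size. The sizes are copied from the listing above. The threshold of 3 and the function name are arbitrary, hypothetical choices.

```python
import statistics

# Sizes (bytes) of the *_derep.fastq files from the listing above.
derep_sizes = {
    "15220_D": 76229835, "15421_I": 311438095, "15424_H": 39915256,
    "15424_J": 79192111, "15426_J": 78824913, "15429_M": 42928090,
    "15430_C": 438029303, "15442_A": 68207491, "15500_C": 60039888,
    "15501_E": 47224379, "15501_I": 44911195, "15537_G": 57211078,
}


def oversized(sizes, factor=3.0):
    """Return samples whose size exceeds `factor` times the median size."""
    cutoff = factor * statistics.median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > cutoff)


print(oversized(derep_sizes))  # ['15421_I', '15430_C']
```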
I apologize, I canceled the run just before you responded. I restarted it with shorter names, and clustering is now about 63% complete. However, vsearch does not seem to be running at the moment:
9871 jdean007 20 0 4580 832 440 R 50.8 0.0 7:12.70 gzip
13852 mwoolls 20 0 267016 14672 8520 R 50.8 0.0 2642:00 mpirun
21957 cyang 20 0 749524 667816 1544 R 50.5 0.5 16516:19 mb
22461 root 0 -20 18.199g 0.010t 1.780g S 2.6 8.0 4951:23 mmfsd
14937 mporter 20 0 65664 3924 1884 R 1.3 0.0 0:01.20 top
19180 pmuhind+ 20 0 65780 3992 1904 R 1.3 0.0 6:04.78 top
10 root 20 0 0 0 0 S 0.3 0.0 251:42.92 rcu_sched
5184 root 20 0 1070892 73600 42580 S 0.3 0.1 353:57.19 fail2ban-server
7046 root 20 0 2935948 37152 4412 S 0.3 0.0 986:31.78 python
7556 ganglia 20 0 167160

Here is the params file:
------- ipyrad params file (v.0.5.15)-------------------------------------------
opuntia_TEST80 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
~/bigdata/ddRAD/opuntia_ipyrad/TESTS ## [1] [project_dir]: Project dir (made in curdir if not present)
                                                          ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
                                                          ## [3] [barcodes_path]: Location of barcodes file
~/bigdata/ddRAD/opuntia_ipyrad/newsclero_TEST_fastqs/*.fastq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                                                          ## [6] [reference_sequence]: Location of reference sequence file
ddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TA, TGCA ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
1 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
10 ## [11] [mindepth_statistical]: Min depth for statistical base calling
10 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.80 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
0 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
50 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
2 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4 ## [21] [min_samples_locus]: Min # samples per locus for output
20 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
G, a, g, k, m, l, n, p, s, u, t, v ## [27] [output_formats]: Output formats (see docs)
                                                          ## [28] [pop_assign_file]: Path to population assignment file
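Each entry in the params file sits on its own line in the form "value ## [index] [name]: description", so the file is easy to inspect programmatically. For example, a script could confirm which clust_threshold a given test run actually used. The parser below is a hypothetical helper based only on the layout visible above, not ipyrad's own loader. Empty values, such as raw_fastq_path here, come back as empty strings.

```python
import re

# Assumed layout: "<value> ## [<index>] [<name>]: <description>"
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>[^\]]+)\]"
)


def parse_params(text):
    """Map each parameter name to (index, raw value string)."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # header and blank lines do not match and are skipped
            params[match.group("name")] = (
                int(match.group("index")),
                match.group("value").strip(),
            )
    return params


def as_list(raw_value):
    """Split a comma-separated value such as "TA, TGCA" into items."""
    return [item.strip() for item in raw_value.split(",") if item.strip()]
```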
Isaac Overcast
@isaacovercast
Jun 11 18:02
Wait until it gets back to being "stuck" again, and then check top and ps for vsearch.
Whoa, just noticed this is conspicuous:
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 01:40 15430_C.htemp
-rw-r--r-- 1 mporter columbuslab 33554432 Jun 11 02:54 15421_I.htemp
Identical file sizes for the htemp files for the two misbehaving samples? That's astronomically unlikely by chance.
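A little arithmetic puts the matching sizes in context. Every file belonging to the two unfinished samples (15430_C and 15421_I) is an exact multiple of 16 MiB (2^24 = 16,777,216 bytes). Finished files such as 15424_H.htemp and 15424_J.utemp are not. One plausible but unconfirmed reading, offered here only as an assumption, is that these files are still being written, and that their sizes reflect data flushed in fixed-size blocks rather than identical contents. The sizes below are copied from the listing; the helper name is hypothetical.

```python
MIB = 1024 * 1024
CHUNK = 16 * MIB  # 2**24 = 16,777,216 bytes

# Sizes in bytes, copied from the ls -ltr listing of the *_clust directory.
unfinished = {
    "15430_C.utemp": 50331648,
    "15430_C.htemp": 33554432,
    "15421_I.htemp": 33554432,
    "15421_I.utemp": 83886080,
}
finished = {
    "15424_H.htemp": 8353924,
    "15424_J.utemp": 38287291,
}


def is_chunk_multiple(size, chunk=CHUNK):
    """True if `size` is an exact whole number of `chunk`-sized blocks."""
    return size % chunk == 0


for name, size in {**unfinished, **finished}.items():
    print(f"{name}: {size / MIB:.2f} MiB, whole 16 MiB blocks: {is_chunk_multiple(size)}")
```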
Did you run ipyrad with -d again? Can you grep vsearch ipyrad_log.txt?
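The suggested grep can also be done in Python. In the log lines Mark posts in reply, each vsearch call is recorded as a Python list literal after the word "cmd". The argument list can therefore be recovered with ast.literal_eval. This sketch relies on that assumption about the log layout, and the function name is hypothetical. Lines cut off mid-list are skipped rather than guessed at. Usage would be vsearch_commands(open("ipyrad_log.txt")).

```python
import ast


def vsearch_commands(log_lines):
    """Yield (step, output_path, argv) for each logged vsearch command.

    Assumes lines shaped like the ipyrad_log.txt excerpt in this thread:
    '<date> <time> pid=<n> [<module>] INFO <step> cmd [<list of args>]'.
    """
    for line in log_lines:
        if "vsearch" not in line or " cmd [" not in line:
            continue
        head, _, arglist = line.partition(" cmd ")
        try:
            argv = ast.literal_eval(arglist.strip())
        except (ValueError, SyntaxError):
            continue  # truncated or otherwise unparsable argument list
        step = head.split()[-1]  # the word just before "cmd", e.g. "derep"
        output = None
        if "--output" in argv[:-1]:  # only if a value follows the flag
            output = argv[argv.index("--output") + 1]
        yield step, output, argv
```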
J. Mark Porter
@jmark_porter_twitter
Jun 11 18:07
Yes, I used the -d flag...
Here is the tail of the grep output:
2018-06-11 10:15:20,536 pid=34688 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15424_H_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:20,559 pid=34697 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15430_C_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:20,638 pid=46048 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15537_G_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:20,638 pid=46049 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15421_I_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:20,638 pid=46060 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15429_M_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:21,003 pid=29181 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15501_I_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:21,003 pid=29146 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15426_J_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:21,040 pid=40981 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15501_E_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:21,046 pid=40985 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15220_D_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '--relabel_md5']
2018-06-11 10:15:21,046 pid=40984 [cluster_within.py] INFO derep cmd ['/rhome/mporter/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--derep_fulllength', '-', '--strand', 'plus', '--output', '/bigdata/columbuslab/mporter/ddRAD/opuntia_ipyrad/TESTS/opuntia_TEST80_edits/15424_J_derep.fastq', '--threads', '4', '--fasta_width', '0', '--fastq_qmax', '1000', '--sizeout', '
J. Mark Porter
@jmark_porter_twitter
Jun 11 19:17
Hi @isaacovercast, I believe the run has reached the stall point. Looking at the *clust_0.8 folder, I spotted two new files with identical sizes. As you pointed out, this is very unlikely. Note that these are not the same two files as last time, though one individual is the same.
-rw-r--r-- 1 mporter columbuslab 16777216 Jun 11 12:09 15421_I.utemp
-rw-r--r-- 1 mporter columbuslab 16777216 Jun 11 12:01 15424_J.utemp