These are chat archives for dereneaton/ipyrad

17th
May 2017
LinaValencia85
@LinaValencia85
May 17 2017 14:02
@isaacovercast @dereneaton thanks. I will be waiting for the updated version.
Isaac Overcast
@isaacovercast
May 17 2017 15:14
@vieves If we have barcode information, we use it to trim reversecut+bcode+adapter from the reverse read. If not, we have to apply a more general cut to make sure we remove the barcode; this uses wildcards and so will have more false positives that trim a little extra from the ends of reads.
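A minimal Python sketch of why the barcode-free cut over-trims, assuming a plain regex search. The `build_trim_pattern` and `trim` helpers, the short example sequences, and the use of `.` as the wildcard are hypothetical illustrations, not ipyrad's actual trimming code.

```python
import re


def build_trim_pattern(revcut, adapter, barcode=None, bc_len=6):
    """Compile a regex for the cut site + barcode + adapter junction.

    Hypothetical illustration: with a known barcode the pattern is exact;
    without one, each barcode position becomes a '.' wildcard.
    """
    bcode = re.escape(barcode) if barcode else "." * bc_len
    return re.compile(re.escape(revcut) + bcode + re.escape(adapter))


def trim(read, pattern):
    """Keep only the part of the read before the first pattern match."""
    match = pattern.search(read)
    return read[:match.start()] if match else read


exact = build_trim_pattern("AATT", "AGATCG", barcode="CATCAT")
wild = build_trim_pattern("AATT", "AGATCG")  # barcode unknown

true_hit = "ACGTACGT" + "AATT" + "CATCAT" + "AGATCG" + "GGGG"
lookalike = "ACGTACGT" + "AATT" + "GGGGGG" + "AGATCG" + "TTTT"
```

With the barcode known, only the exact junction matches; with six wildcard positions, any stretch that merely looks like cut site + 6 bases + adapter also matches, so an occasional read loses a little extra sequence.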
Katherine Silliman
@ksil91
May 17 2017 15:50
@isaacovercast @dereneaton I'm having a similar issue to one someone had in February. I'm running step 1 on one HiSeq2500 lane of data with the ipyrad v0.6.17 API (SSH tunnel to a remote jupyter notebook). Step 1 gets to 88% sorting reads really quickly, then has been stuck there for 15 hours so far. The fastqs folder has lots of "tmp_Sample_4669.fastq" files, like 9 per sample. My ipyrad_log file is blank. Any idea what's going on? Should I cancel it and update to the newest version?
Isaac Overcast
@isaacovercast
May 17 2017 15:52
@ksil91 Step 1 is pretty rock solid, so the version you have shouldn't matter. One thing to check is that you have enough disk space on the drive you're demuxing to, and that you aren't exhausting your disk quota (if there is one). Either could cause step 1 to misbehave.
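Since the run is driven from a Jupyter notebook, free space can also be checked in Python with the standard library's `shutil.disk_usage`. This is a minimal sketch: the `check_free_space` and `human` helpers and whatever threshold you pass are assumptions, and it reports free space on the filesystem only, not a per-user quota, which needs your cluster's own tools.

```python
import shutil


def check_free_space(path, needed_bytes):
    """Return (free_bytes, ok) for the filesystem that holds `path`.

    `ok` is True when at least `needed_bytes` are free. A per-user quota
    can be smaller than this and is not visible here.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free >= needed_bytes


def human(nbytes):
    """Format a byte count roughly like `df -h` (powers of 1024)."""
    for unit in ["B", "K", "M", "G", "T"]:
        if nbytes < 1024:
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024
    return f"{nbytes:.1f}P"
```

For example, `free, ok = check_free_space(".", 50 * 1024**3)` followed by `print(human(free), ok)`, where the 50 GB threshold is only an illustrative number.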
Katherine Silliman
@ksil91
May 17 2017 16:34
@isaacovercast Thanks for the reply. I have lots of space available and checked if there was a disk quota set, and there isn't. Could it have something to do with the API or ipcluster? It's also odd that it only shows a loading bar for "sorting reads", not for "chunking large files", and it created a tmp-chunks-assemblyName folder that is empty.
Katherine Silliman
@ksil91
May 17 2017 16:46
@isaacovercast Another thought- in my barcodes file I have a couple of samples that have the same name, but different barcode sequences (technical replicates I want to combine for this analysis). Is that allowed?
Isaac Overcast
@isaacovercast
May 17 2017 17:10
@ksil91 Mmm, multiple barcodes mapping to the same sample name are not something we support. Each unique sample name is allowed one barcode sequence, so the last one to be read from the file will be selected and reads with the other barcodes will be thrown out. The easiest way around this is to give each technical replicate a unique name, then after step one concatenate all the _fastq/*.fq files for each replicate into just one of the sample fq files. Then you can branch and remove all but one of the samples per replicate group. Not the most straightforward, but it should work.
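A minimal Python sketch of both points, assuming a whitespace-separated barcodes file with the sample name in the first column and the barcode in the second. `read_barcodes` and `merge_into` are hypothetical helpers, not ipyrad functions: the first shows how a dict keyed by sample name keeps only the last barcode it sees, and the second appends the other replicates' files onto the one you keep.

```python
import shutil


def read_barcodes(path):
    """Parse a two-column barcodes file (name, sequence) into a dict.

    Simplified, hypothetical parser: if a name appears twice, the later
    line overwrites the earlier one, so only the last barcode survives.
    """
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                barcodes[fields[0]] = fields[1]
    return barcodes


def merge_into(keeper, others):
    """Append each file in `others` onto the end of `keeper`, in order.

    Byte-wise appending works for plain FASTQ, and concatenated .gz files
    also form a valid multi-member gzip file.
    """
    with open(keeper, "ab") as out:
        for part in others:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

Run the merge only once per replicate group: because it appends, running it again would add the same reads a second time.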
@ksil91 Step 1 should run really fast, even with large datasets. You can try killing it and rerunning with the -d flag to turn on verbose debug logging to the ipyrad_log.txt file. Are you running this on an HPC cluster? Can you show me the last 20 or 30 lines of ls -ltr in the fastqs folder?
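From inside the notebook rather than a shell, a rough Python stand-in for `ls -ltr | tail -n 30` is sketched here; `newest_entries` is a hypothetical helper that sorts a folder's entries by modification time, oldest first, and keeps the last `n`.

```python
import os


def newest_entries(folder, n=30):
    """Return the last `n` entries of `folder` as (name, size, mtime) tuples.

    Entries are ordered oldest-to-newest by modification time, like the
    tail of `ls -ltr`.
    """
    entries = []
    with os.scandir(folder) as it:
        for entry in it:
            st = entry.stat()
            entries.append((entry.name, st.st_size, st.st_mtime))
    entries.sort(key=lambda e: e[2])
    return entries[-n:]
```

For example, loop over `newest_entries(fastqs_dir)` and print each tuple, where `fastqs_dir` is a placeholder for the path to your fastqs folder.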
Katherine Silliman
@ksil91
May 17 2017 17:57
@isaacovercast Ok, I changed the barcodes file so all samples have a different name, and it seems to be doing the same thing. I'm running this remotely on a single-node HPC with 10 processors. I also can't figure out how to turn on debugging with the API. Output of ls -ltr:
-rw------- 1 ksilliman ksilliman 152748741 May 17 12:25 tmp_CA4_2_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  34012262 May 17 12:25 tmp_CA3_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   6107012 May 17 12:25 tmp_WA1_11_R1_10259.fastq
-rw------- 1 ksilliman ksilliman     65744 May 17 12:25 tmp_WA1_10_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  90076467 May 17 12:25 tmp_OR3_1_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   7164508 May 17 12:25 tmp_WA9_6_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  10307884 May 17 12:25 tmp_WA12_16_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  15102912 May 17 12:25 tmp_WA1_12_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  48974628 May 17 12:25 tmp_WA11_9_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  92471729 May 17 12:25 tmp_WA10_12_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  75982615 May 17 12:25 tmp_WA10_10_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 246736753 May 17 12:25 tmp_WA10_11_R1_10259.fastq
-rw------- 1 ksilliman ksilliman     29802 May 17 12:25 tmp_WA10_14_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  10431331 May 17 12:25 tmp_OR3_13_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 104935177 May 17 12:25 tmp_CA7_9_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   5130878 May 17 12:25 tmp_WA1_9_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  33404870 May 17 12:25 tmp_CA2_10_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 166842437 May 17 12:25 tmp_WA1_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   5418741 May 17 12:25 tmp_WA13_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   1867617 May 17 12:25 tmp_WA1_7_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  10490523 May 17 12:25 tmp_WA13_6_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 149818739 May 17 12:25 tmp_CA7_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 406436948 May 17 12:25 tmp_WA13_3_R1_10259.fastq
-rw------- 1 ksilliman ksilliman    575182 May 17 12:25 tmp_WA12_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  28838162 May 17 12:25 tmp_WA11_1_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 289799374 May 17 12:25 tmp_WA11_3_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  13462148 May 17 12:25 tmp_CA5_15_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  12114325 May 17 12:25 tmp_CA5_12_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 108027140 May 17 12:25 tmp_CA2_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  67854032 May 17 12:25 tmp_BC3_17_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 184038028 May 17 12:25 tmp_CA1_15_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  19875505 May 17 12:25 tmp_WA9_12_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  26101222 May 17 12:25 tmp_OR2_4_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  24735065 May 17 12:25 tmp_CA1_19_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  28812934 May 17 12:25 tmp_CA5_13b_R1_10259.fastq
-rw------- 1 ksilliman ksilliman    578269 May 17 12:25 tmp_CA5_13a_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  37184240 May 17 12:25 tmp_WA9_8_R1_10259.fastq
-rw------- 1 ksilliman ksilliman 144970290 May 17 12:25 tmp_WA12_22_R1_10259.fastq
-rw------- 1 ksilliman ksilliman   6811865 May 17 12:25 tmp_OR3_12_R1_10259.fastq
-rw------- 1 ksilliman ksilliman  44485970 May 17 12:25 tmp_BC4_3_R1_10259.fastq
-rw------- 1 ksilliman ksilliman     47769 May 17 12:25 tmp_10259_0.p
Isaac Overcast
@isaacovercast
May 17 2017 18:05
Can you show me the output of df -h? Also, can you show me the output of ls -ltr again? I want to see if that last line is a fluke; it looks weird: tmp_10259_0.p
Deren Eaton
@dereneaton
May 17 2017 18:06
The .p file is a pickled dict
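To see what is inside a file like tmp_10259_0.p, the standard `pickle` module can load it. A minimal sketch: `peek_pickle` is a hypothetical helper, and since the exact contents of these temporary files aren't described in this thread, it just returns the first few items of whatever dict it finds. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import pickle


def peek_pickle(path, max_items=10):
    """Load a pickle and, if it holds a dict, return its first few items.

    encoding="latin1" lets Python 3 read pickles written under Python 2;
    it has no effect on pickles written by Python 3.
    """
    with open(path, "rb") as handle:
        obj = pickle.load(handle, encoding="latin1")
    if isinstance(obj, dict):
        return list(obj.items())[:max_items]
    return obj
```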
With the API, debugging is a little tricky (well, trickier to turn off). You can turn it on with ip._debug_on().
Deren Eaton
@dereneaton
May 17 2017 18:12
To turn debugging off in the API, use ip._set_debug_dict("ERROR").
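Because debugging is easier to turn on than off, one option is to pair the two calls from this thread in a context manager so the level is reset even if a step fails. A minimal sketch: `ipyrad_debug` is a hypothetical helper, and it assumes `ip._debug_on()` and `ip._set_debug_dict("ERROR")` behave as described above. Both are private functions and may change between releases.

```python
from contextlib import contextmanager


@contextmanager
def ipyrad_debug(ip):
    """Enable ipyrad debug logging for the duration of a `with` block.

    `ip` is the imported ipyrad module. The finally clause sets the level
    back to "ERROR" even if the wrapped step raises an exception.
    """
    ip._debug_on()
    try:
        yield
    finally:
        ip._set_debug_dict("ERROR")
```

For example, after `import ipyrad as ip`, wrap a step as `with ipyrad_debug(ip): data.run("1")`, where `data` is your Assembly object.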