These are chat archives for dereneaton/ipyrad

Aug 2016
Aug 03 2016 03:06
@dereneaton Thanks! Looking forward to the preprint.
Deren Eaton
Aug 03 2016 04:16
@RobertaDamasceno setting the 'trim_overhang' back to its default value (4,4,4,4) should fix the problem.
Aug 03 2016 18:18
Thanks, @dereneaton! It worked! But what does 4,4,4,4 mean? The ipyrad manual doesn't describe this parameter. Is it similar to the pyrad parameter on line 29: "Line 29: Allow overhanging ends of reads in final data set. If reads are different lengths or overlap to different degrees, 1,1 will trim to shortest sequence on either side of locus; 0,1 would trim only the right side; 0,0 allows both ends to overhang. For paired data 1,1,1,1 would trim overhangs on left1,right1,left2,right2"? Thanks a lot!
Aug 03 2016 19:32
I'm having problems creating branches. I ran steps 1-6 with one dataset, then created a branch with a subset of the data and tried to run from step 4 onward with the subsampled dataset. I ran step 1 with the subsample and then tried to run -s 4567, but it didn't work.

yeti:damasceno Yeti$ ipyrad -p data2-p12_12_p13_1000_P14_09_P21_45_P22_20_P24_025.txt -b subdata subdata1_samples_cat_pic.txt -f

loading Assembly: data2
from saved path: ~/Documents/damasceno/ipyrad_test_2/data2.json
Creating a new branch called 'subdata' with 71 Samples
Writing new params file to params-subdata.txt

yeti:damasceno Yeti$ ipyrad -p subdata1_params.txt -s 1

ipyrad [v.0.3.25]

Interactive assembly and analysis of RAD-seq data

New Assembly: subdata1
ipyparallel setup: Local connection to 24 Engines

Step1: Linking sorted fastq data to Samples
Linking to demultiplexed fastq files in:
91 new Samples created in subdata1.
91 fastq files linked to 91 new Samples.

yeti:damasceno Yeti$ ipyrad -p subdata1_params.txt -s 4567

ipyrad [v.0.3.25]

Interactive assembly and analysis of RAD-seq data

loading Assembly: subdata1
from saved path: ~/Documents/damasceno/ipyrad_test_2/subdata1.json
ipyparallel setup: Local connection to 24 Engines

Step4: Joint estimation of error rate and heterozygosity
No Samples ready for joint estimation. First run step3().

Step5: Consensus base calling
No Samples ready for consensus calling. First run step4().

Step6: Clustering across 0 samples at 0.9 similarity
No Samples ready for clustering. First run step5().

Step7: Filter and write output files for 91 Samples
Database file not found. First run step6

Not really sure what went wrong in the branching process. Thanks tons!
Isaac Overcast
Aug 03 2016 19:34
Hola Roberta!
Isaac Overcast
Aug 03 2016 19:47
I see what's happening here. Each sample has an associated state that tracks which step it has reached in the assembly process. If you run a set of samples up through step 6, then branch a subset and re-run step 1, the branched samples will be reset to state 1, so trying to run later steps on these samples will fail. The logic is that if you run any given step on a sample, all previously obtained results from later steps for that sample become invalid, because you may have changed assembly parameters that affect those results.
The best way to do what it sounds like you want is to run all samples as one assembly through step 3, then create different branches for each subset of samples you're interested in and run steps 4-7 on each branch.
Branching is useful but not super straightforward; it's def on my list to improve the documentation for this.
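A minimal sketch of the workflow described above (one assembly through step 3, then one branch per subset for steps 4-7). It uses only the flags that appear in this conversation (-p, -s, -b). The file names params-full.txt, subsetA_samples.txt and subsetB_samples.txt, the branch names, the `run` helper and the RUN variable are all hypothetical placeholders. The params-<branch>.txt naming follows the "Writing new params file to params-subdata.txt" line in the log above. Commands are only printed unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: file and branch names are hypothetical placeholders.
set -euo pipefail

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Run the full sample set through steps 1-3 once.
run ipyrad -p params-full.txt -s 123

# 2. Branch each subset and run steps 4-7 on the branch.
#    Do not re-run step 1 on a branch: that resets its samples to state 1.
for subset in subsetA subsetB; do
    # Assumed: ${subset}_samples.txt lists the samples to keep in the branch.
    run ipyrad -p params-full.txt -b "$subset" "${subset}_samples.txt"
    run ipyrad -p "params-${subset}.txt" -s 4567
done
```

Since the branches here start from samples that have only reached step 3, the sketch does not pass -f. If steps 4-7 had already been run before branching, the force flag would presumably be needed to redo them.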
Deren Eaton
Aug 03 2016 20:46
@RobertaDamasceno From what I'm reading, I think you ran one data set and then wanted to run it again under a different set of params starting from step 4. You had the right idea with branching. However, after branching and subselecting taxa, you only had to continue your run from step 4 (using the force flag to tell it you want to continue running these samples from step 4). The problem arose when you ran step 1, which acts a little differently from the other steps. Because you ran step 1, it reverted the sample states to state 1, I believe. So I expect your code should look something like this:
## the old assembly you ran
ipyrad -p params-old.txt -s 1234567

## branching to make a new assembly with a subset of taxa
ipyrad -p params-old.txt -b new subset.txt

## modify the params-new.txt to enter new params
## ...

## run steps 4-7 on the new assembly
ipyrad -p params-new.txt -s 4567 -f