These are chat archives for dereneaton/ipyrad

7th
Nov 2016
Isaac Overcast
@isaacovercast
Nov 07 2016 00:02
v.0.5.0 - Huge performance improvement for reference sequence mapping. Also, reference is not reindexed on -f for step 3 (saves time).
Deren Eaton
@dereneaton
Nov 07 2016 01:12
@jldimond the VCF file now has the DP info for each sample.
@ksil91 I found the source of the VCF "4/4" bug and it is now corrected. Thanks.
Edgardo M. Ortiz
@edgardomortiz
Nov 07 2016 07:07
@dereneaton, just to report that step 4 works without problems now
Deren Eaton
@dereneaton
Nov 07 2016 15:00
:rocket:
Edgardo M. Ortiz
@edgardomortiz
Nov 07 2016 18:05
@dereneaton @isaacovercast I found a bug in step 2: I tried writing the actual cutsite sequences in the edit_cutsites parameter, and that caused step 2 to fail. It must be because cutadapt expects numbers for trimming the cutsites instead of sequences.
Deren Eaton
@dereneaton
Nov 07 2016 18:06
Oh yeah, I forgot to update the documentation and the params parsing to reflect that we no longer support sequence input.
Edgardo M. Ortiz
@edgardomortiz
Nov 07 2016 18:06
I was using 2 for filter_adapters
ok, I will update my params file
Deren Eaton
@dereneaton
Nov 07 2016 18:07
Yeah, you should just have to change edit_cutsites to integers giving the lengths of sequence you want trimmed.
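
For anyone updating an older params file the same way, here is a minimal sketch of the conversion described above: each cutsite sequence is replaced by its length. The helper name `cutsites_to_lengths` and the example sequences are illustrative assumptions, not part of ipyrad.

```python
def cutsites_to_lengths(entry):
    """Convert a comma-separated edit_cutsites entry into integer lengths.

    Hypothetical helper, not an ipyrad function. Fields that are already
    integers are passed through unchanged; sequence fields are replaced
    by the number of bases they contain.
    """
    lengths = []
    for field in entry.split(","):
        field = field.strip()
        if field.isdigit():
            lengths.append(int(field))   # already a trim length
        else:
            lengths.append(len(field))   # sequence -> number of bases
    return tuple(lengths)


# Example sequences are made up for illustration.
print(cutsites_to_lengths("TGCAG, CCG"))
```

The resulting integers are what would go into the edit_cutsites line in place of the sequences.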
Alexander McKelvy
@SnakeEvolution_twitter
Nov 07 2016 19:05

Hi guys - I used the branching command to create subsets of my dataset from a text file of sample names, as you mention and as documented in the advanced tutorial at the bottom of this page: http://ipyrad.readthedocs.io/tutorial_advanced_cli.html#tutorial-advanced-cli
I am a little confused about how to use these, though. When I create the subset, ipyrad says I've made it with the correct number of individuals, but when I then use those params files in the basic ipyrad steps it still chugs through the entire plate and creates a fastqs folder with all sequences.

So we start with this to get the subset: ipyrad -p params-paramtest.txt -b subset1 subset1.txt
Then what command would you use to just run ipyrad on this subset, exporting just this subset?
I thought it would go like: ipyrad -p params-subset1.txt -s 1 --force
but this uses the entire plate's data just as if I hadn't created a subset - I'm missing some important step or operator, or maybe I'm jumping too far back in the process.

Deren Eaton
@dereneaton
Nov 07 2016 19:14

@SnakeEvolution_twitter

Step 1 works a little differently from the other steps, because it reads in the raw data and creates the samples. Thus you can't subselect samples until after step 1, i.e., after the samples are created. A typical workflow in which samples are separated into different assemblies might look like the following:

## demultiplex all samples
ipyrad -p params-all.txt -s 1 

## branch to separate samples into studies
ipyrad -p params-all.txt -b study1 listofsamples.txt
ipyrad -p params-all.txt -b study2 listofothersamples.txt

## analyse those data sets
ipyrad -p params-study1.txt -s 234567
ipyrad -p params-study2.txt -s 234567
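
The ordering in the workflow above (step 1 once, then branch, then run the later steps per branch) can be sketched as a small script that only builds the command lines. The function name `branch_workflow` is a hypothetical helper; it uses only the `-p`, `-s`, and `-b` flags shown in this thread and assumes each branch writes a `params-<name>.txt` file, as the branching printout suggests.

```python
def branch_workflow(base_params, studies, steps="234567"):
    """Build the ipyrad command lines for a demultiplex-then-branch workflow.

    Hypothetical helper: it only assembles argument lists and does not
    run ipyrad. `studies` maps a branch name to its sample-names file.
    """
    # Step 1 runs once on the full assembly, before any branching.
    commands = [["ipyrad", "-p", base_params, "-s", "1"]]
    # Branch the demultiplexed assembly into one assembly per study.
    for name, names_file in studies.items():
        commands.append(["ipyrad", "-p", base_params, "-b", name, names_file])
    # Run the remaining steps on each branch's own params file.
    for name in studies:
        commands.append(["ipyrad", "-p", "params-%s.txt" % name, "-s", steps])
    return commands


studies = {"study1": "listofsamples.txt", "study2": "listofothersamples.txt"}
for cmd in branch_workflow("params-all.txt", studies):
    print(" ".join(cmd))
```

Each argument list could be handed to a process runner, but the sketch stops at printing so the order of the commands is easy to inspect.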
Alexander McKelvy
@SnakeEvolution_twitter
Nov 07 2016 19:44

That makes sense, thanks! I have it running like that now; I should have been more patient to see what it would output. I was going to wait to ask, but I was confused by step 4 because I got this output:

Joint estimation of error rate and heterozygosity
skipping 114. Too few high depth reads (1.0).
skipping 43. Too few high depth reads (1.0).

Neither sample 114 nor sample 43 was included as part of that subset - is it expected that they would be used in this step?

Deren Eaton
@dereneaton
Nov 07 2016 19:48
If the samples have almost no data then it is best to exclude them by branching before step 4.
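
One way to build the list of samples to keep for such a branch is to filter on read counts. This is only a sketch: `names_to_keep` is a hypothetical helper, the 1000-read cutoff is an arbitrary assumption, and the `low_sample` entry is made up (the other two counts are copied from the test printout later in this thread).

```python
def names_to_keep(read_counts, min_reads):
    """Return the sample names whose read count is at least min_reads.

    Hypothetical helper, not part of ipyrad. `read_counts` maps a sample
    name to its number of reads.
    """
    return [name for name, n in read_counts.items() if n >= min_reads]


# "low_sample" is an invented near-empty sample for illustration.
read_counts = {"1A_0": 199615, "1B_0": 200606, "low_sample": 12}
keep = names_to_keep(read_counts, min_reads=1000)

# One name per line, assuming that is the layout of the names file
# passed to the branching command.
print("\n".join(keep))
```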
Alexander McKelvy
@SnakeEvolution_twitter
Nov 07 2016 20:21

I thought that I had excluded them by specifying the new branch, then re-running using the "params-study1.txt" that the branching command generated. I was confused because it was using those two even though they weren't supposed to be retained.

My workflow was like this -
1) Ran the whole plate together as a test/tutorial, to make sure things were working and to learn the software, work out any problems with hardware/software on my end
2) Created branches for my different species - for example "ipyrad -p params-paramtest.txt -b Coluber Coluber.txt", where Coluber.txt had the sample names. This returned a positive message, with the correct # of samples in the subset listed
3) Reran this branch - ipyrad -p params-Coluber.txt -s 2345 --force

At each step it says it's referencing Coluber.json - that seems right, but I get the "skipping" notice despite not including those samples in the text file in #2 above
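
A quick way to confirm a mismatch like this is to compare the names in the "skipping" lines against the names file used for the branch. The sketch below is an assumption-laden check, not ipyrad functionality: `skipped_samples` and `unexpected_skips` are hypothetical helpers, the names file is assumed to hold one sample name per line, and the subset names "12" and "57" are made up.

```python
import re

# Matches lines such as: skipping 114. Too few high depth reads (1.0).
SKIP_PATTERN = re.compile(r"^\s*skipping (\S+)\. Too few high depth reads")


def skipped_samples(log_text):
    """Return sample names from 'skipping ...' lines, in order of appearance."""
    names = []
    for line in log_text.splitlines():
        match = SKIP_PATTERN.match(line)
        if match:
            names.append(match.group(1))
    return names


def unexpected_skips(log_text, names_file_text):
    """Return skipped samples that are NOT listed in the subset names file."""
    subset = {line.strip() for line in names_file_text.splitlines() if line.strip()}
    return [name for name in skipped_samples(log_text) if name not in subset]


log_text = """Joint estimation of error rate and heterozygosity
skipping 114. Too few high depth reads (1.0).
skipping 43. Too few high depth reads (1.0).
"""
names_file_text = "12\n57\n"  # hypothetical subset, one name per line

print(unexpected_skips(log_text, names_file_text))
```

Any name printed here was reported as skipped even though it is absent from the subset file, which is the situation described above.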

Deren Eaton
@dereneaton
Nov 07 2016 20:55
@SnakeEvolution_twitter When you use the -r option do you see stats for those two samples in the Coluber Assembly?
or do you see them in the fullstats files it produces?
Deren Eaton
@dereneaton
Nov 07 2016 21:05
I just ran a quick test and it appears to be working correctly:
deren@tinus:~/Documents/ipyrad/tests$ ipyrad -v
ipyrad 0.5.0
deren@tinus:~/Documents/ipyrad/tests$ ipyrad -p params-cli.txt -s 123

 -------------------------------------------------------------
  ipyrad [v.0.5.0]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  New Assembly: cli
  local compute node: [40 cores] on tinus

  Step 1: Demultiplexing fastq data to Samples
  [####################] 100%  chunking large files  | 0:00:00  
  [####################] 100%  sorting reads         | 0:00:42  
  [####################] 100%  writing/compressing   | 0:00:01  

  Step 2: Filtering reads 
  [####################] 100%  processing reads      | 0:00:08  

  Step 3: Clustering/Mapping reads
  [####################] 100%  dereplicating         | 0:00:00  
  [####################] 100%  clustering            | 0:00:01  
  [####################] 100%  building clusters     | 0:00:01  
  [####################] 100%  chunking              | 0:00:00  
  [####################] 100%  aligning              | 0:01:04  
  [####################] 100%  concatenating         | 0:00:00  

deren@tinus:~/Documents/ipyrad/tests$ ipyrad -p params-cli.txt -b sub 1A_0 1B_0 1C_0 1D_0

  loading Assembly: cli
  from saved path: ~/Documents/ipyrad/tests/cli/cli.json
  creating a new branch called 'sub' with 4 Samples
  writing new params file to params-sub.txt

deren@tinus:~/Documents/ipyrad/tests$ ipyrad -p params-sub.txt -s 4 -r

 -------------------------------------------------------------
  ipyrad [v.0.5.0]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: sub
  from saved path: ~/Documents/ipyrad/tests/cli/sub.json
  local compute node: [40 cores] on tinus

  Step 4: Joint estimation of error rate and heterozygosity
  [####################] 100%  inferring [H, E]      | 0:00:34  


Summary stats of Assembly sub
------------------------------------------------
      state  reads_raw  reads_passed_filter  clusters_total  clusters_hidepth  \
1A_0      4     199615               199615           10000             10000   
1B_0      4     200606               200606           10000             10000   
1C_0      4     200318               200318           10000             10000   
1D_0      4     199379               199379           10000             10000   

      hetero_est  error_est  
1A_0    0.001980   0.000758  
1B_0    0.001897   0.000743  
1C_0    0.001933   0.000744  
1D_0    0.001918   0.000749
@SnakeEvolution_twitter If you can paste the printout from ipyrad we can probably get to the bottom of it.