These are chat archives for dereneaton/ipyrad

15th
May 2016
Cycadales
@Cycadales
May 15 2016 14:01
Hi guys, sorry to bring something else up... there seem to be some missing files in the advanced CLI tutorial. I downloaded the data with curl -LkO https://github.com/dereneaton/ipyrad/raw/master/tests/ipsimdata.tar.gz and ran the command tar -xvzf ipsimdata.tar.gz. There are two files that seem to be missing: .simrad_test_R1.fastq.gz and sim_rad_test_barcodes.txt. I have checked the directory and they just do not seem to be there. Am I missing something?
Isaac Overcast
@isaacovercast
May 15 2016 14:42
I'll check it out.
Isaac Overcast
@isaacovercast
May 15 2016 15:03
@Cycadales We recently regenerated all the sim data and changed the naming scheme. The new files for the advanced tutorial are rad_example_R1_.fastq.gz and rad_example_barcodes.txt. I'm updating the docs now.
Deren Eaton
@dereneaton
May 15 2016 17:11
Yeah, sorry @Cycadales, the tutorials are in flux at the moment, I've been editing them the last few days.
Deren Eaton
@dereneaton
May 15 2016 21:59

@isaacovercast, New feature implemented in 0.2.6. I'm planning to deprecate the excludes parameter. Instead, subsampling taxa will be done during branching. As an example, you can subsample a data set by listing the names you want to include after a branch name:

ipyrad -p params-data1.txt -b data2 1A_0 1B_0 1C_0

This seems like a much cleaner way of doing things. It reports back names that do not match, but we could add other user-friendliness to it, like a way of listing names to exclude, instead of just those to include...
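
A minimal sketch of the include-list behavior described above, written as a standalone illustration rather than the actual ipyrad code. The function name `subsample_names` and the sample list are hypothetical; the only behavior taken from the discussion is that listed names are kept and names that do not match are reported back.

```python
def subsample_names(available, requested):
    """Split requested names into those present in the parent data set
    and those that match nothing (hypothetical helper, not ipyrad API)."""
    available_set = set(available)
    matched = [name for name in requested if name in available_set]
    unmatched = [name for name in requested if name not in available_set]
    return matched, unmatched


# Sample names in the parent assembly (made up for illustration).
samples = ["1A_0", "1B_0", "1C_0", "1D_0", "2E_0"]

# Names a user might list after the branch name; "9Z_0" is a deliberate typo.
matched, unmatched = subsample_names(samples, ["1A_0", "1B_0", "1C_0", "9Z_0"])

print("kept:", matched)
print("no match:", unmatched)
```

An exclude list could reuse the same check by keeping every available name that was not requested.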

Isaac Overcast
@isaacovercast
May 15 2016 22:06
That makes sense. One immediate feature I could see that would be useful is wildcard expansion for sample names
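
One way wildcard expansion for sample names might look, sketched with Python's standard `fnmatch` module. The helper name `expand_sample_patterns` and the sample names are assumptions for illustration; nothing here reflects how ipyrad actually implements it.

```python
from fnmatch import fnmatchcase


def expand_sample_patterns(available, patterns):
    """Expand shell-style patterns against known sample names.

    Returns the selected names (in first-match order, no duplicates)
    and the patterns that matched nothing. Hypothetical helper.
    """
    selected = []
    unmatched = []
    for pattern in patterns:
        hits = [name for name in available if fnmatchcase(name, pattern)]
        if not hits:
            unmatched.append(pattern)
        for name in hits:
            if name not in selected:
                selected.append(name)
    return selected, unmatched


# Made-up sample names from two plates.
samples = ["1A_0", "1B_0", "1C_0", "2E_0", "2F_0"]

# A plain name is just a pattern without wildcards, so both forms can be mixed.
selected, unmatched = expand_sample_patterns(samples, ["1*", "2F_0", "3*"])

print("selected:", selected)
print("patterns with no match:", unmatched)
```

On the command line, patterns like these would probably need quoting so the shell does not expand them against file names first.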
Deren Eaton
@dereneaton
May 15 2016 22:08
Yeah, we could def. do that. The relevant code is in:
ipyrad.__main__.branch_assembly() and ipyrad.core.assembly.branch.
Isaac Overcast
@isaacovercast
May 15 2016 22:36
I was thinking, one thing we do need to have straight before release is at least an idea of how people with multiple plates can merge these before (or maybe during) step 7. Or alternatively make it so you can specify multiple raw input and barcode files. Some way to support people with multiple plates of data, which is getting more and more common.
Deren Eaton
@dereneaton
May 15 2016 22:36
yeah, for sure. Don't we have a merge method?
Isaac Overcast
@isaacovercast
May 15 2016 22:37
We have a merge method ticket, #120, lol. I can work on it.
Deren Eaton
@dereneaton
May 15 2016 22:38
I guess I remember us talking about how it gets a bit complicated about which statsfiles to list in the summary, etc.
Isaac Overcast
@isaacovercast
May 15 2016 22:38
Yes, I remember we talked about it and it felt like opening Pandora's box...
Deren Eaton
@dereneaton
May 15 2016 22:39
But yeah, I think merging should require creating a new assembly, like branching.
ipyrad -m newAssembly data1 data2 data3
Isaac Overcast
@isaacovercast
May 15 2016 22:39
Agreed.
Deren Eaton
@dereneaton
May 15 2016 22:40
and it just deepcopies all of the samples into a new Assembly, and complains if the parent assemblies have a difference in their params besides file paths.
Isaac Overcast
@isaacovercast
May 15 2016 22:41
good idea.
Deren Eaton
@dereneaton
May 15 2016 22:42
the statsfiles are just a string, so we can write whatever we want for the new assembly, like "merged(data1,data2,data3)".
the statsdfs can be regenerated if we want... but there would not be clear directories to put printouts of them into... which is fine as long as the data is in the JSON and can be used going forward.
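
A rough sketch of the merge idea discussed above, using plain dicts as stand-ins for Assembly objects. Everything named here (`merge_assemblies`, `PATH_PARAMS`, the dict layout, the parameter keys) is hypothetical and only illustrates the three points raised: deep-copying samples into a new assembly, complaining about parameter differences other than file paths, and writing a free-form string for the statsfile. Raising on duplicate sample names is an extra assumption.

```python
import copy

# Parameters treated as file paths and therefore allowed to differ
# between parents (an assumed list, for illustration only).
PATH_PARAMS = {"project_dir", "raw_fastq_path", "barcodes_path", "sorted_fastq_path"}


def merge_assemblies(new_name, parents):
    """Merge dict-based stand-ins for assemblies into a new one.

    Each parent is {"name": str, "params": dict, "samples": dict}.
    Returns the merged dict and a list of complaints about params.
    """
    first = parents[0]
    complaints = []
    for other in parents[1:]:
        keys = (set(first["params"]) | set(other["params"])) - PATH_PARAMS
        for key in sorted(keys):
            if first["params"].get(key) != other["params"].get(key):
                complaints.append(
                    "{} differs between {} and {}".format(key, first["name"], other["name"])
                )

    samples = {}
    for parent in parents:
        for sname, sample in parent["samples"].items():
            if sname in samples:
                # Assumption: clashing names are an error rather than overwritten.
                raise ValueError("duplicate sample name: " + sname)
            samples[sname] = copy.deepcopy(sample)

    merged = {
        "name": new_name,
        "params": copy.deepcopy(first["params"]),
        "samples": samples,
        # The statsfile is only a label here, as suggested in the chat.
        "statsfile": "merged({})".format(",".join(p["name"] for p in parents)),
    }
    return merged, complaints


data1 = {"name": "data1",
         "params": {"project_dir": "./plate1", "clust_threshold": 0.85},
         "samples": {"1A_0": {"reads": 1000}}}
data2 = {"name": "data2",
         "params": {"project_dir": "./plate2", "clust_threshold": 0.90},
         "samples": {"2E_0": {"reads": 1200}}}

merged, complaints = merge_assemblies("newAssembly", [data1, data2])
print(merged["statsfile"])
print(complaints)
```

Because the samples are deep-copied, later changes to the merged assembly would not touch the parents, which matches the idea that merging creates a new assembly the way branching does.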
Isaac Overcast
@isaacovercast
May 15 2016 23:08
Is there really any advantage to merging before step 7? What if merging was only something you did in step 7 and it didn't create a new assembly, just wrote out merged output files for all the input assemblies. Seems like the merging thing could get messy, maybe we shouldn't do it unless there's a good reason?
We already have the assembly.merge() func, but I'm not familiar with how well it works. I guess I'll just mess with it and see what makes the most sense.
Deren Eaton
@dereneaton
May 15 2016 23:55
I could imagine two main scenarios for merging:
  1. You demultiplex several Assemblies from different files+barcodes and then merge them before step 2 into a single data set.
  2. You sequenced new Samples that you want to add to an existing data set, so you run the new Samples in an Assembly for steps 1-5 and then merge it with the old Assembly before clustering across Samples in step 6.