These are chat archives for dereneaton/ipyrad

29th
Oct 2016
James Clugston
@Cycadales_twitter
Oct 29 2016 00:29
@dereneaton @isaacovercast actually, looking at the error, it looks like one sample is not good and that is what is causing the problems. Removing this sample seems to have made it work though.
Deren Eaton
@dereneaton
Oct 29 2016 19:32
@Cycadales_twitter hey James, cool to see you're using iplant, I haven't tried it yet. It looks like the error is reporting that the two paired read files for that sample do not match up. Did you apply any prefiltering to them? Did you demultiplex them in ipyrad? Or did one of the files perhaps become corrupted or not finish transferring?
@SheaML it is hard to know without more information whether or not something is wrong. Not all steps can be fully parallelized at all times, so there are instances during which fewer cpus will be used. During step 3 ipyrad may devote 2 or 4 threads (cpus) per sample for clustering, but depending on how efficiently vsearch can cluster the data, each sample may run at only around 100% (1 cpu) even though it has more resources available. We try to automatically optimize this, but it is not always perfect. Also, if you are connecting to multiple nodes, for example 4 8-core nodes, and you use 'top' to check the resource use, you will only be able to see the 8 cores running on the node you're connected to, even though the others are all running, just on the other nodes. When you use the MPI option in ipyrad it should print how many cpus it is connecting to and from which host. If this info is printed, then it has found and is using all of those cpus.
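The arithmetic behind that explanation can be sketched as below. This is only an illustration of the numbers mentioned in the message (2 or 4 threads per sample, 4 nodes of 8 cores), not ipyrad's actual scheduling code; the function names are hypothetical.

```python
def concurrent_samples(total_cores, threads_per_sample):
    """How many samples could be clustered at once (hypothetical helper).

    Assumes each sample is given a fixed number of threads, and that at
    least one sample always runs.
    """
    return max(1, total_cores // threads_per_sample)


def cores_seen_by_top(nodes, cores_per_node):
    """Return (cores visible in `top` on one node, total cores in the job)."""
    return cores_per_node, nodes * cores_per_node


# Example with the numbers from the message: 4 nodes with 8 cores each.
visible, total = cores_seen_by_top(nodes=4, cores_per_node=8)
print(visible, total)
print(concurrent_samples(total, threads_per_sample=4))
```

So with 32 cores and 4 threads per sample, up to 8 samples could be clustering at once, while 'top' on the login node would only ever show the 8 local cores.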
Deren Eaton
@dereneaton
Oct 29 2016 19:40
@ahaponski_twitter you cannot set the outgroup specifically in ipyrad. Instead, our approach is to simply treat the outgroup like any other population, and you can assign samples to populations using the 'popfile' parameter. This allows you to set a minimum number of samples that must have data from each population for the locus to be kept in the final data set. Actually, though, we haven't finished implementing the population minimums yet, so currently you can only filter by minsamples, but the pop setting will come soon, and when it is ready you will only have to rerun step 7 (very fast) to get the new outputs.
@ahaponski_twitter One difference is that in pyrad the outgroups would be put at the end of the list of reads when clustering across samples in step 6, with the idea that ingroup samples should be first clustered to each other, and only after that clustered to the outgroups. This was also implemented more formally in the 'hierarchical' clustering method. Though we may choose to re-implement this feature in the future, I found that because clustering speed has improved so much, it is no longer worth using the hierarchical method just for a speed increase, and also that for most reasonable cluster threshold settings (0.80-0.95) the outgroup will cluster with the ingroup just fine, even over quite divergent time scales.
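The idea of a per-population minimum can be illustrated with a small sketch. It is not ipyrad's implementation (which the message says was unfinished at the time), and it does not assume any particular popfile format; the function name and the dictionaries are hypothetical.

```python
from collections import Counter


def passes_pop_minimums(samples_with_data, sample_to_pop, pop_minimums):
    """Return True if a locus has enough samples in every population.

    samples_with_data: names of samples that have data at this locus
    sample_to_pop:     hypothetical mapping of sample name -> population
    pop_minimums:      hypothetical mapping of population -> minimum count
    """
    counts = Counter(
        sample_to_pop[name] for name in samples_with_data if name in sample_to_pop
    )
    # Counter returns 0 for populations with no samples at this locus.
    return all(counts[pop] >= minimum for pop, minimum in pop_minimums.items())


# The outgroup is treated like any other population.
sample_to_pop = {"a1": "ingroup", "a2": "ingroup", "o1": "outgroup"}
pop_minimums = {"ingroup": 2, "outgroup": 1}
print(passes_pop_minimums({"a1", "a2", "o1"}, sample_to_pop, pop_minimums))
print(passes_pop_minimums({"a1", "a2"}, sample_to_pop, pop_minimums))
```

The second locus would be dropped because no outgroup sample has data for it, which is how an outgroup requirement can be expressed without a dedicated outgroup setting.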
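The ordering described for pyrad (ingroups first, outgroups last) amounts to a stable sort on outgroup membership. The sketch below only illustrates that ordering with hypothetical sample names; it is not the pyrad code.

```python
def order_for_clustering(samples, outgroups):
    """Put ingroup samples first and outgroup samples last.

    Python's sort is stable, so the original relative order is kept
    within each group. False (ingroup) sorts before True (outgroup).
    """
    return sorted(samples, key=lambda name: name in outgroups)


print(order_for_clustering(["o1", "a1", "a2", "o2"], {"o1", "o2"}))
```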
James Clugston
@Cycadales_twitter
Oct 29 2016 20:56
@dereneaton I think the problem is with one file, that is it. When I removed that file it worked fine... but yes, I trimmed all the reads using TRIMMOMATIC before running them through ipyrad.
Deren Eaton
@dereneaton
Oct 29 2016 20:58
@Cycadales_twitter OK, cool. We still want to fix it, since if one sample fails it should just leave that one in state=1, and move on with the others. It seems like trimmomatic should filter the R1s and R2s in a way that is synced so that you cannot end up with the two mismatched. That is what 'cutadapt' does, which is what ipyrad implements in step2 for read trimming (if filter_adapters is set to 2).
James Clugston
@Cycadales_twitter
Oct 29 2016 21:03
@dereneaton ahh ok, well I am not sure why this one sample has failed. When I do use trimmomatic I do use it in PE mode, as it seems to check if reads can be paired or not. Oh, with regards to iPlant, it is good that they give you the resources you need. But every time I get the resources I need I come up against an error and I get nowhere. I know that is just the nature of this type of work but it's quite stressful. By the way, I only set filter_adapters to 1 as I already use adapter trimming in trimmomatic. Can I ask what you would do about this sample?
Deren Eaton
@dereneaton
Oct 29 2016 21:09
@Cycadales_twitter I would first take a look at the files and figure out why read1 NS500799:97:HTV37BGXX:4:11401:6904:1031 1:N:0:TAATGCGC+AGGCTATA does not match NS500799:97:HTV37BGXX:2:11101:16232:1067 1:N:0:TAATGCGC+AGGCTATA in the read2 file. It may be something very simple. Maybe remake the files in trimmomatic. Or you could skip trimmomatic and use the unfiltered data in ipyrad. To make sure it works you could start by running just that one sample on your laptop, or anywhere that you can get fast access to a compute node.
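One way to "take a look at the files" is to walk both FASTQ files in parallel and report the first record whose read names differ. The sketch below is a minimal standalone check, not part of ipyrad or trimmomatic. It assumes standard 4-line FASTQ records and that the read name is the first whitespace-delimited field of the header line (with an optional trailing /1 or /2); the function names and file names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name from each 4-line FASTQ record in `path`."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read name
            fields = line.split()
            if not fields:
                continue  # tolerate a trailing blank line
            name = fields[0].lstrip("@")
            # Older headers mark the mate with a /1 or /2 suffix.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_index, r1_name, r2_name) for the first unsynced pair.

    Returns None if every record matches. If one file is shorter, the
    missing name is reported as None.
    """
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


if __name__ == "__main__":
    # Hypothetical file names; replace with the R1/R2 files of the sample.
    print(first_mismatch("sample_R1.fastq.gz", "sample_R2.fastq.gz"))
```

If the first mismatch appears at the very first record, the files were probably written in different orders; if it appears partway through, a read was likely dropped from only one of the two files.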
James Clugston
@Cycadales_twitter
Oct 29 2016 21:21
@dereneaton I think I could just run that sample through trimmomatic, as I am unfamiliar with the method in ipyrad at the moment. I am just reading about it now.
James Clugston
@Cycadales_twitter
Oct 29 2016 22:42
@dereneaton tried the sample again and ran it through trimmomatic, and the same thing happens. I will just drop this sample, as losing one sample from a population will not be the end of the world.
Deren Eaton
@dereneaton
Oct 29 2016 22:45
weird, but yeah, maybe you can figure it out later.
James Clugston
@Cycadales_twitter
Oct 29 2016 22:51
@dereneaton I think this may be the sample that caused me problems back in July but I could never really get to the bottom of it. I also tried my other datasets and they get past the filter with no problems. Any luck with the step six error?
Deren Eaton
@dereneaton
Oct 29 2016 22:53
it's on my docket for tomorrow. I've been travelling the last two weeks, so it's been a bit delayed, but should be simple to fix, I think.
James Clugston
@Cycadales_twitter
Oct 29 2016 22:54
@dereneaton ok excellent. I take it I will not need to run step three again? I can just run it from step 6? Thank you Deren, I do appreciate it. When is the ipyrad paper coming out?
Deren Eaton
@dereneaton
Oct 29 2016 22:57
Hopefully as a preprint within a few weeks. We really need to buckle down on it.