These are chat archives for dereneaton/ipyrad

27 Feb 2017
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 27 2017 10:42
@dereneaton Hi Deren, let me know whether you need some data from me to replicate the problems I have with the VCF. For now, I will filter out the problematic loci.
Amanda Haponski
@ahaponski_twitter
Feb 27 2017 15:00
Hi, I am running my ipyrad .phy output files in RAxML and am seeing some weird results. I am trying to vary my coverage of individuals and have decreased the coverage from 70% to 60% of my samples. When I analyze the 60% coverage, I go from high supports (95-100%) for my tree nodes to 0s with the placement of many individuals unresolved when in the 70% analyses they had formed well defined and supported clades. However, if I run the .snps.phy file then I get a similar topology to the 70% with high supports. Any insights as to what is going on? I've checked my alignments and there's nothing glaringly wrong with them. I've also re-run ipyrad and got the same result. Thanks in advance.
Deren Eaton
@dereneaton
Feb 27 2017 16:33

@ahaponski_twitter , I have a few thoughts:

  1. In general I would not use the snps.phy file in RAxML unless you are also using one of its ascertainment-bias models to account for the fact that you've excluded all invariant sites.

  2. There have been quite a few papers showing that, in general, you will get more accurate phylogenetic results by including more data, to the extent it's computationally feasible. Meaning you can set the minimum sample coverage as low as 4 samples (the minimum needed for a site to be phylogenetically informative) and expect better results than requiring data to be shared across 70% of samples. In general, I usually see the same topology supported across most values of min coverage, up until you require too stringent a value, at which point you start to get bad trees simply because there is too little information left in the data.

  3. There is no general rule that 60% or 70% coverage tells you how much phylogenetic information is present. What really matters is how much data (and specifically variable data) is shared by your samples. 70% coverage across 1000 sites will most likely carry much less information than 10% coverage across 10M sites.

  4. That said, it is strange that you would get a highly supported tree from the 70% snps.phy but not from the .phy file. I'm not sure what's going on there, but the .snps.phy data is probably being poorly modeled, as I mentioned above.

Cheers,
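For point 1 above, an ascertainment-bias run in RAxML v8 looks roughly like this (a sketch; the file name, run name, and seeds are placeholders, and the SNP alignment must contain no invariant sites or RAxML will refuse to run):

```shell
# ML search plus rapid bootstrapping under an ascertainment-bias-corrected
# model (Lewis correction) for a SNPs-only alignment.
# data.snps.phy, snps_asc, and the seed values are placeholders.
raxmlHPC -f a \
    -m ASC_GTRGAMMA --asc-corr=lewis \
    -s data.snps.phy -n snps_asc \
    -p 12345 -x 12345 -N 100
```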

LinaValencia85
@LinaValencia85
Feb 27 2017 18:22
@dereneaton I have updated ipyrad to version 0.6.8, and when I try to run a job I get the following error: No ipcluster instance found. This may be a problem with your installation setup. I would recommend that you contact the ipyrad developers.
Deren Eaton
@dereneaton
Feb 27 2017 18:24
Hi @LinaValencia85 was this on a HPC cluster?
Katherine Silliman
@ksil91
Feb 27 2017 18:24
@dereneaton @isaacovercast I'm rerunning steps 3, 4, 5, and 6 on a dataset that I initially ran with v.0.5.15 at 80% clustering similarity, mindepth_stat = 6, mindepth_majrule = 4, and now with v.0.6.6 at 80%, mindepth_stat = 6, mindepth_majrule = 6, with all else held the same. Comparing the clusters_total column in the -r results between the 2 runs, my v.0.5.15 run has fewer clusters than the v.0.6.6 run. I would expect clusters_total to stay the same (since clustering similarity is the same) and fewer clusters in the newer run (since mindepth_majrule is higher). What changed in step 3 between the versions to cause this? Thanks!
Deren Eaton
@dereneaton
Feb 27 2017 18:26
@ksil91 what kind of data do you have (e.g., "rad" or "gbs", or "pairddrad", etc)?
LinaValencia85
@LinaValencia85
Feb 27 2017 18:26
@dereneaton Yes, lonestar5 from TACC: https://portal.tacc.utexas.edu/user-guides/lonestar5
you had to work with @edgardomortiz before to figure it out. It was running perfectly with previous ipyrad versions.
Deren Eaton
@dereneaton
Feb 27 2017 18:27
lol, yeah that cluster keeps giving us trouble.
the problem is basically that it sometimes takes >30 seconds for the parallel engines to spin up and be ready, but we tell ipyrad to raise an error if all the engines aren't ready by 30 seconds or so. There is a fix for it:
@LinaValencia85 are you trying to run on a single node or multiple nodes?
Katherine Silliman
@ksil91
Feb 27 2017 18:29
@dereneaton SE GBS
LinaValencia85
@LinaValencia85
Feb 27 2017 18:29
single node.
@dereneaton single node
Deren Eaton
@dereneaton
Feb 27 2017 18:30
OK, in your submission script put the following:
ipcluster start --daemonize
sleep 120
ipyrad -p params-{yours}.txt -s 123 --ipyclient
This tells it to start a cluster separately, then wait 120 seconds, and then look for the cluster with ipyrad.
Alexander McKelvy
@SnakeEvolution_twitter
Feb 27 2017 18:39

Hi guys - I'm having a problem now in branching. ipyrad works fine without it, but when I try to create a new assembly like I did in the past, using a txt file with sample names, I get a relatively long error that starts:

Failed to read input file with sample names.
list index out of range
Traceback (most recent call last):
File "/home/tayne/miniconda2/bin/ipyrad", line 11, in <module>
load_entry_point('ipyrad==0.6.8', 'console_scripts', 'ipyrad')()

and the debug file says:
2017-02-27 13:27:36,360 pid=21803 [load.py] DEBUG skipping: no svd results present in old assembly

LinaValencia85
@LinaValencia85
Feb 27 2017 18:40
@dereneaton I am getting this error now:
ipyrad: error: unrecognized arguments: --ipyclient
Should I try --ipcluster instead?
Deren Eaton
@dereneaton
Feb 27 2017 18:40
Oh, sorry, my bad, I mean --ipcluster
@SnakeEvolution_twitter I think this might be caused by blank lines in the sample names file. I was just able to recreate the problem that way. Can you check for that?
I'll work on a fix for that.
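The blank-line failure mode can be avoided by filtering empty lines while parsing; a minimal sketch (read_sample_names is a hypothetical helper for illustration, not ipyrad's actual code):

```python
def read_sample_names(path):
    """Read one sample name per line, skipping blank/whitespace-only lines.

    A blank line would otherwise produce an empty token list, and indexing
    into it raises the "list index out of range" IndexError seen above.
    """
    with open(path) as infile:
        return [line.strip().split()[0] for line in infile if line.strip()]
```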
LinaValencia85
@LinaValencia85
Feb 27 2017 18:44
@dereneaton It did! Thanks!
Deren Eaton
@dereneaton
Feb 27 2017 18:47
@LinaValencia85 Great! When you run it this way it will connect to all available cores on the node. If you want to learn about more options to ipcluster you can find more info here (http://ipyparallel.readthedocs.io/en/latest/process.html).
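Putting the pieces from this thread together, a single-node submission script looks roughly like this (a sketch; the params file name is a placeholder, and your scheduler's directives go at the top):

```shell
#!/bin/bash
# Single-node job script for ipyrad on a cluster where engine startup is slow.

# start the parallel engines separately, detached from the terminal
ipcluster start --daemonize

# give slow clusters time to spin up all engines (avoids the ~30 s timeout)
sleep 120

# tell ipyrad to attach to the already-running ipcluster instance
ipyrad -p params-myassembly.txt -s 123 --ipcluster
```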
LinaValencia85
@LinaValencia85
Feb 27 2017 18:47
@dereneaton ok! Thanks
Alexander McKelvy
@SnakeEvolution_twitter
Feb 27 2017 18:52

Interesting @dereneaton - there was one blank line. I deleted it and the program then got a step further, listing the number of samples and writing a new params file. It seems to have worked, as I can start running step 6 on the new assembly, but it produced a huge number of error reports, starting:

Traceback (most recent call last):
File "/home/tayne/miniconda2/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
File "/home/tayne/miniconda2/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
File "/home/tayne/miniconda2/lib/python2.7/logging/__init__.py", line 465, in format
record.message = record.getMessage()
File "/home/tayne/miniconda2/lib/python2.7/logging/__init__.py", line 329, in getMessage
msg = msg % self.args

Deren Eaton
@dereneaton
Feb 27 2017 18:54
@SnakeEvolution_twitter hmm, I've never seen that before. Seems to be a problem with the debugger. Are you using the -d flag? Try just running it again.
Alexander McKelvy
@SnakeEvolution_twitter
Feb 27 2017 18:55
I have been running with the -d flag. Yeah, it seems to be working so far, thanks!
Deren Eaton
@dereneaton
Feb 27 2017 18:56
ok, we'll just call that logging bug a fluke. In general, unless you're having problems, avoid the -d flag because it can really slow things down.
LinaValencia85
@LinaValencia85
Feb 27 2017 21:14
@dereneaton @isaacovercast I have another issue with this new ipyrad version. I am running the same samples through the reference+denovo pipeline that I ran through the reference pipeline, but in the reference+denovo run I am losing all of my reads in step 2, as they are all being filtered by min-length. In my params file I have:
35 ## [17] [filter_min_trim_len] and
2 ## [16] [filter_adapters]
Is this new version doing something different with cutadapt?
Deren Eaton
@dereneaton
Feb 27 2017 21:21
@ksil91 Yes, we changed a major setting in how reads are clustered for GBS data. Because GBS reads can be sequenced from either end (the same cutsite overhang occurs on both ends), you can get reads that overlap partially or completely if the size selection allows fragments smaller than 2X the read length. We used to require that the reads overlap by at least 33%, but I increased this value to 50%.
It seemed to yield cleaner results. The value is hard-coded, but we could make it so that users can modify the minimum overlap setting.
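The overlap rule described above amounts to a simple fractional threshold; a sketch for illustration (not ipyrad's actual implementation):

```python
def passes_min_overlap(overlap_len, read_len, min_frac=0.50):
    """Return True if two GBS reads overlap by at least min_frac of the
    read length. The threshold was raised from ~0.33 to 0.50 in ipyrad's
    step 3 clustering; here it is a parameter for illustration."""
    return overlap_len / float(read_len) >= min_frac
```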
@LinaValencia85 yes, we made some changes to the cutadapt settings. What is your data type?
LinaValencia85
@LinaValencia85
Feb 27 2017 21:23
ddRAD PE 150bp
Deren Eaton
@dereneaton
Feb 27 2017 21:24
@LinaValencia85 do you have a value entered in your params file for the "trim_reads" parameter?
that is a new parameter that is replacing edit_cutsites
LinaValencia85
@LinaValencia85
Feb 27 2017 21:26
@dereneaton well I was using the old params files that I had for my previous runs, and I just realized that there is no "trim_reads" parameter.
however, I was using: 5, 4 ## [25] [edit_cutsites]:
Deren Eaton
@dereneaton
Feb 27 2017 21:26
aha, that is the problem.
LinaValencia85
@LinaValencia85
Feb 27 2017 21:26
I am going to redo my params file and run it again! Thanks!!!!
Deren Eaton
@dereneaton
Feb 27 2017 21:27
@LinaValencia85 OK, the trim_reads parameter is a little more intuitive in how it works. http://ipyrad.readthedocs.io/parameters.html#trim-reads
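If it helps, the mapping from the old setting to the new one looks roughly like this (a sketch based on the linked docs; the four trim_reads values are, as I understand them, trim from R1 start, R1 end, R2 start, R2 end, so check the docs for your version):

```
5, 4           ## [25] [edit_cutsites]: old two-value format (trim R1 start, R2 start)
5, 0, 4, 0     ## [25] [trim_reads]: roughly equivalent new four-value format
```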
LinaValencia85
@LinaValencia85
Feb 27 2017 21:27
@dereneaton cool! Thanks!