These are chat archives for dereneaton/ipyrad

3rd
Nov 2016
Cycadales
@Cycadales
Nov 03 2016 00:02
@dereneaton excellent thanks Deren
markusruhsam
@markusruhsam
Nov 03 2016 09:32
I've got paired-end RAD data, and I don't understand why the vcf file only contains information about the first half of the sequences in a cluster from the .loci file (i.e., before the 'nnnn') and not the sequences after the 'nnnn'. Also, there's a mismatch in the cluster number between the vcf file (#chrom column) and the .loci file (the number between the |, e.g., |230|): the #chrom number in the vcf file is always one more than the number between the | in the .loci file (for instance, #chrom is 231 and the corresponding .loci cluster number is |230|).
Deren Eaton
@dereneaton
Nov 03 2016 13:22
Hi @markusruhsam we know about the locus numbering. We prefer for our numbering to be 0-indexed, b/c that is just a Python thing. But some vcf reading software can't handle that, and so the loci are 1-indexed there.
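A minimal sketch of the off-by-one mapping described above, for anyone matching vcf records back to .loci entries. It assumes the #chrom value is a plain integer string such as "231", as in the example reported here; the function names are hypothetical and not part of ipyrad.

```python
def vcf_chrom_to_loci_index(chrom: str) -> int:
    """Convert a 1-indexed vcf #chrom value to the 0-indexed .loci number."""
    number = int(chrom)
    if number < 1:
        raise ValueError(f"vcf locus numbers start at 1, got {number}")
    return number - 1


def loci_index_to_vcf_chrom(index: int) -> str:
    """Convert a 0-indexed .loci number to the 1-indexed vcf #chrom value."""
    if index < 0:
        raise ValueError(f".loci numbers start at 0, got {index}")
    return str(index + 1)


print(vcf_chrom_to_loci_index("231"))  # 230
print(loci_index_to_vcf_chrom(230))    # 231
```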
Deren Eaton
@dereneaton
Nov 03 2016 13:28
Oh, paired support in the vcf output is on the to-do list. We just haven't added it yet. If I have time I'll try today.
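Until paired support lands in the vcf output, the two halves of a paired-end locus can still be pulled apart from the .loci sequences directly. This is a rough sketch that assumes the separator is the literal lowercase string 'nnnn' mentioned in the question; `split_paired` is a hypothetical helper, not an ipyrad function.

```python
def split_paired(sequence: str, separator: str = "nnnn") -> tuple:
    """Split one .loci sequence into (first_read, second_read).

    If the separator is absent, the whole sequence is returned as the
    first read and the second read is an empty string.
    """
    first, _, second = sequence.partition(separator)
    return first, second


r1, r2 = split_paired("ACGTTGCAnnnnGGATCCTA")
print(r1)  # ACGTTGCA
print(r2)  # GGATCCTA
```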
joqb
@joqb
Nov 03 2016 13:29

@dereneaton @isaacovercast Hi, I did an analysis with a previous version (0.3.42) with all my individuals. Today I updated ipyrad to 0.4.7 and I wanted to create outfiles for a subset of individuals.
So I did ipyrad -p params_090_60.txt -b subdata samples_to_keep.txt
and then ran ipyrad -p params-subdata.txt -s 7
but I got the error

Step 7: Filter and write output files for 109 Samples

Encountered an unexpected error (see ./ipyrad_log.txt)
Error message is below -------------------------------
invalid index to scalar variable.

What does it mean? BTW ipyrad_log.txt is empty!

Thanks for the help

Deren Eaton
@dereneaton
Nov 03 2016 13:37
@joqb, some changes between 0.3.x and 0.4.x are not fully compatible. I would recommend either downgrading to get the outputs you want, or continuing with the new version, in which case you will have to run step 6 again as well.
joqb
@joqb
Nov 03 2016 13:46
@dereneaton Ok thanks for the quick answer, I'll try to rerun step 6.
joqb
@joqb
Nov 03 2016 15:17
@dereneaton it worked however I am not able to output .geno files, any idea why? Here are the files I got (using *)
.hdf5
.loci
.phy
.snps.map
.snps.phy
.str
.u.snps.phy
.u.str
.vcf.gz
stats.txt
Deren Eaton
@dereneaton
Nov 03 2016 15:42
@joqb Oh, I see a bug that caused the geno output to fail silently. I'll work on fixing it. But FYI, I'm not sure that the geno output is very useful anymore. The latest versions of the program ADMIXTURE stopped supporting .geno as an input format. I am not sure why.
But I suspect it may be because there is some kind of problem with how it handles that kind of input. I believe it now only takes .ped + .map files as input, which are a huge hassle to produce, in my experience, using something like plink.
I've been using STRUCTURE instead of ADMIXTURE for my recent analyses, and it gives much more reasonable results. But it is slow and needs to be run for reeeeaaaallly long, and it starts to become infeasible for thousands of SNPs if you have large numbers of samples (e.g., >50). I'm hoping to support the fineradstructure input format soon, which hopefully performs better, but I haven't tested it yet. There are a few other alternatives available too. But in short, I don't fully trust ADMIXTURE results, so I recommend caution, or at least testing with more than one program.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 16:08
This error in step 4 for PE reads persists:
2016-11-03 10:39:29,984 pid=31575 [jointestimate.py] ERROR Sample DIP-Dmeye failed with error IndexError(index 28449 is out of bounds for axis 0 with size 28449)
This is my proposed fix, probably not the best but after expanding the array, step 4 ends without error:
edgardomortiz/ipyrad#1
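For readers unfamiliar with this kind of message: "index N is out of bounds for axis 0 with size N" is what NumPy reports when code writes one slot past the end of a preallocated array, since valid indices run from 0 to N-1. The sketch below reproduces the message and shows one generic way to grow an array when it fills up. It is only an illustration of the idea of "expanding the array"; it is not the code in jointestimate.py or in the linked fix, and both function names are hypothetical.

```python
import numpy as np


def off_by_one_message(size: int) -> str:
    """Write to index `size` of an array of length `size`; return the error text."""
    arr = np.zeros(size)
    try:
        arr[size] = 1  # the last valid index is size - 1
    except IndexError as err:
        return str(err)
    return ""


def store_with_growth(values, initial_size: int) -> np.ndarray:
    """Store values in a preallocated array, doubling it whenever it is full."""
    arr = np.zeros(initial_size, dtype=np.int64)
    n_used = 0
    for value in values:
        if n_used == arr.shape[0]:
            # Array is full: append as many empty slots as it already has
            # (at least one, so a zero-length start also works).
            extra = np.zeros(max(1, arr.shape[0]), dtype=arr.dtype)
            arr = np.concatenate([arr, extra])
        arr[n_used] = value
        n_used += 1
    # Trim the unused tail before returning.
    return arr[:n_used]


print(off_by_one_message(3))
print(store_with_growth(range(5), initial_size=2))
```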
Deren Eaton
@dereneaton
Nov 03 2016 16:08
@edgardomortiz oh yeah, sorry I forgot about that. Will check it out.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 16:10
No problem, I remembered because I got the error message again this morning with 0.4.7
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 16:41
@dereneaton please ignore the previous link, this is the correct one:
dereneaton/ipyrad#198
joqb
@joqb
Nov 03 2016 17:57
@dereneaton OK thanks. I need geno file for using tess3r (Structure + spatial data). Did you ever try fastStructure? It deals relatively well with SNP data.
Deren Eaton
@dereneaton
Nov 03 2016 17:59
@joqb Ok, cool, I haven't tried tess3r, but the bugfix to get the geno outputs again is very easy, I will push it sometime today. I tried faststructure once, maybe I should give it another look.
Deren Eaton
@dereneaton
Nov 03 2016 19:38
@edgardomortiz Can you check the change I just pushed to Master? I think the problem is fixed.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:17
Cool, I will do!
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:29
I see, you are building custom-sized arrays for each sample. I will test it with my data. Should I run step 3 again because of the new functions in cluster_within.py?
Deren Eaton
@dereneaton
Nov 03 2016 20:30
No, try running from 4 and/or 5.
it should work for sure from step 3. I'm hoping to have you test on your data whether it works when you change mindepth after step 3
it worked for me, but I was testing on sim data.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:32
OK, I will test it also without change in mindepth since I got the error this morning without having changed it.
I will let you know what happens in each case
Deren Eaton
@dereneaton
Nov 03 2016 20:33
thanks
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:36
Question: I have everything set up on the cluster. How do I pull the changes to ipyrad there? I guess the changes are not available through conda yet.
Deren Eaton
@dereneaton
Nov 03 2016 20:37
if you have the git repo cloned, then you can go into the repo and run git pull to update it, and then pip install -e . to install the local version of ipyrad as your working version. This will take priority over the conda version.
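One way to confirm which copy is actually being picked up after an install like this is to ask Python where it would load the package from. The helper below is a generic sketch using only the standard library; `describe_install` is a hypothetical name, and the example call uses the stdlib `json` module so it runs anywhere. Passing "ipyrad" instead should show whether the location points into the cloned repo or into the conda environment.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str) -> dict:
    """Report whether a top-level package is importable, and from where."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"found": False, "location": None, "version": None}
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata (e.g. stdlib).
        version = None
    return {"found": True, "location": spec.origin, "version": version}


# Replace "json" with "ipyrad" to inspect the ipyrad install.
print(describe_install("json"))
```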
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:38
Nice, thanks
Deren Eaton
@dereneaton
Nov 03 2016 20:39
if you are using ipcluster you also have to restart that when you make changes to ipyrad, because it will otherwise keep running the version of ipyrad, and all other Python packages, that were installed when it was launched.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:41
do I just delete the ipyrad profile in the .ipython/ folder?
Deren Eaton
@dereneaton
Nov 03 2016 20:45
no, that's different, it's just a name for an individual ipcluster instance/settings. It doesn't actually have anything to do with ipyrad, it's no different from 'profile_default', but by setting a unique name the idea is that if you were running some other code somewhere you would not have ipyrad fighting with your 'default' cluster for engines.
you just need to run ipcluster stop if you happen to have ipcluster running now.
then ipcluster start to start it again after you update ipyrad
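The stop-then-start sequence can be scripted if it has to be repeated often. This is a sketch under a few assumptions: `ipcluster` is on the PATH, the optional profile name is passed with `--profile=`, and `--daemonize` is used so that the start call returns instead of blocking. The function names are hypothetical. By default nothing is executed; the commands are only printed.

```python
import subprocess


def ipcluster_restart_commands(profile=None):
    """Build the stop and start commands, optionally for a named profile."""
    stop = ["ipcluster", "stop"]
    start = ["ipcluster", "start", "--daemonize"]
    if profile:
        stop.append(f"--profile={profile}")
        start.append(f"--profile={profile}")
    return [stop, start]


def restart_ipcluster(profile=None, dry_run=True):
    """Print the restart commands, or run them when dry_run is False."""
    commands = ipcluster_restart_commands(profile)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: stopping may fail harmlessly if nothing is running.
            subprocess.run(command, check=False)
    return commands


restart_ipcluster(profile="ipyrad")
```

Run it with `dry_run=False` only after ipyrad has been updated, so that the new engines import the new version.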
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 20:46
ah, got it
Deren Eaton
@dereneaton
Nov 03 2016 20:46
and if you are just letting ipyrad launch ipcluster on its own then none of this matters.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 21:37
Same error when I don't change mindepth:
2016-11-03 16:35:34,484 pid=26317 [jointestimate.py] ERROR Sample PIO-Dcine failed with error IndexError(index 123 is out of bounds for axis 0 with size 123)
now will start the test on the branch reducing mindepth...
however, this time the sample with fewest loci is the one that failed
Deren Eaton
@dereneaton
Nov 03 2016 21:40
oh, does the sample that is failing maybe have 0 clusters with depth>mindepth_majrule?
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 21:43
it does have several clusters above mindepth_majrule
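A quick way to check a question like this is to count how many clusters in a sample reach a given depth threshold. The sketch below assumes the per-cluster depths are already available as a list of integers, and that a cluster passes when its depth is greater than or equal to the threshold; both are assumptions for illustration, and `count_clusters_passing` is not an ipyrad function. The depth values are made up.

```python
def count_clusters_passing(depths, mindepth: int) -> int:
    """Count clusters whose depth is at least `mindepth` (inclusive, by assumption)."""
    return sum(1 for depth in depths if depth >= mindepth)


# Made-up depths for one sample, for illustration only.
example_depths = [1, 4, 5, 6, 10]
print(count_clusters_passing(example_depths, 6))  # 2
print(count_clusters_passing(example_depths, 4))  # 4
```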
Deren Eaton
@dereneaton
Nov 03 2016 21:44
If you could send me the clustS.gz file I can investigate.
Edgardo M. Ortiz
@edgardomortiz
Nov 03 2016 23:29
Step 3 was run with mindepth_statistical=6 and mindepth_majrule=6, I branched with mindepth_statistical=5 and mindepth_majrule=4 and step 4 ended normally without errors