These are chat archives for dereneaton/ipyrad

19th
Sep 2017
Wind-ant
@Wind-ant
Sep 19 2017 01:21
thanks a lot @dereneaton @isaacovercast
arminf
@arminf82
Sep 19 2017 11:01
@isaacovercast Thanks, ya, I just saw that! Thank you.
tommydevitt
@tommydevitt
Sep 19 2017 16:40

@isaacovercast I've been launching the ipcluster from a batch script

ipcluster start --n=96 --engines=MPI --ip='*' --profile=$profile

I haven't tried launching it locally, but have been able to launch and run it remotely in non-MPI mode.

Isaac Overcast
@isaacovercast
Sep 19 2017 17:06
@tommydevitt What are you setting the value of $profile as?
tommydevitt
@tommydevitt
Sep 19 2017 17:06
@isaacovercast I didn't. What should that value be?
Isaac Overcast
@isaacovercast
Sep 19 2017 17:51

@tommydevitt You can just leave it off. Run it like this:

ipcluster start --n=96 --engines=MPI --ip='*'

then run ipyrad like this:

ipyrad -p params-1.txt -s 1 --MPI --ipcluster
the --profile flag is used if you want to name your ipcluster instance, but it just uses the default one if you don't specify.
Deren Eaton
@dereneaton
Sep 19 2017 18:01
yeah that's probably the problem @tommydevitt. The profile name from your batch script ($profile) needs to match the profile name in your python code (ipp.Client(profile='MPI96')).
tommydevitt
@tommydevitt
Sep 19 2017 18:07
@dereneaton @isaacovercast Thanks y'all. So, if I leave it off the batch script, I don't need to include it in the python code?
Deren Eaton
@dereneaton
Sep 19 2017 18:21
that's right. The only real reason to use a profile argument is if you are running multiple ipcluster instances at the same time, to keep them distinct. The other use is to customize a profile by editing its settings so that it will always use the same arguments. I typically just provide a name for it so I remember what settings I used to start it (e.g., MPI96).
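A minimal Python sketch of the point above: keep the profile name in one place so the ipcluster launch and the client connection cannot drift apart. The helper names `ipcluster_command` and `client_kwargs` are hypothetical and exist only for illustration; the flags and the `ipp.Client(profile=...)` call are the ones quoted in this thread.

```python
# Hypothetical helpers (not part of ipyrad or ipyparallel) that derive both
# the launch command and the client arguments from a single profile value.

def ipcluster_command(n_engines, engines="MPI", profile=None):
    """Build the argv list for `ipcluster start`, adding --profile only if named."""
    # In a shell you would write --ip='*'; the quotes only protect the * from
    # the shell, so in an argv list the bare * is what ipcluster should receive.
    cmd = ["ipcluster", "start", f"--n={n_engines}", f"--engines={engines}", "--ip=*"]
    if profile:
        cmd.append(f"--profile={profile}")
    return cmd


def client_kwargs(profile=None):
    """Return keyword arguments for ipp.Client that match the launch command."""
    return {"profile": profile} if profile else {}


PROFILE = "MPI96"  # set to None to use the default profile on both sides

print(" ".join(ipcluster_command(96, profile=PROFILE)))
print(client_kwargs(PROFILE))

# With a cluster running, the connection would then look like this
# (left commented out because it needs ipyparallel and a live ipcluster):
# import ipyparallel as ipp
# client = ipp.Client(**client_kwargs(PROFILE))
```

Setting `PROFILE = None` drops the flag from the command and passes no `profile` argument to the client, which corresponds to the "just leave it off" advice.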
tommydevitt
@tommydevitt
Sep 19 2017 18:22
Gotcha.
toczydlowski
@toczydlowski
Sep 19 2017 19:47
@dereneaton Hi Deren, I'm finding weirdness in my vcf from ipyrad that doesn't make sense to me. I am working with a table of genotype frequencies per SNP that I built from the vcf file, the raw vcf file, and the .loci file. My species is diploid, and I set parameters as such. The following is all at one example SNP that I looked at.

The vcf file states G and A,T as the alleles. This was weird to me as I expected to only have 2 alleles per segregating site. When I look visually in .loci, all samples are G, -, or K at this SNP position.

Confusing thing 1: "K" is G or T, so why are the alleles listed earlier on this vcf line and binned as G, A/T (so there aren't really 3 alleles, at least in this case, it's a translation of the "K code" issue)? For this SNP, most individuals in the vcf file are 0/0:X:0,0,0,X. This seems right to me.

Confusing thing 2: For 20/315 individuals, the pattern is 0/0:X:0,0,X,0. Shouldn't this be coded 1/1? Or differently than 0/0?

For completeness, I have 1 individual that is 0/2:9:1,1,2,5 and one that is 1/1:6:0,6,0,0. All other individuals at this SNP follow one of the first two patterns or are missing (./.). Thoughts?
Isaac Overcast
@isaacovercast
Sep 19 2017 19:59
@toczydlowski What version of ipyrad are you running? Can you dropbox me the .loci and .vcf files? That does sound weird.
toczydlowski
@toczydlowski
Sep 19 2017 20:31
@isaacovercast Hi Isaac. Quick follow-up. For the individuals coded as 0/0:X:0,0,X,0, they are in fact G in .loci. So 0/0 is the correct coding (that is, it gives the correct end result), but it seems the X should be in the last (4th) position, like for the other samples that are G, not in the third position. I am running 0.5.15 for this work. I have read through the bug report/updates from this version forward multiple times, but I don't remember if this is a bug that was found and fixed or not. I did a bunch of optimization of running replicate samples with different parameters in this version, so I don't want to switch to a more recent version (yet) if I don't have to. How do I find you on Dropbox? (I use it, but rarely.)
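The check described in these messages can be sketched in Python. This assumes each sample field has the layout genotype:depth:counts and that the four counts are base depths in C,A,T,G order; that order is an inference from the examples quoted here (G samples carry their depth in the 4th position), not something confirmed in this thread. The function names are hypothetical, and a depth of 7 stands in for the unspecified X.

```python
# Assumption: the last sub-field holds read depths in C, A, T, G order.
BASE_ORDER = "CATG"


def parse_sample(field):
    """Split a sample field such as '0/2:9:1,1,2,5'; return None if missing."""
    parts = field.split(":")
    gt = parts[0].replace("|", "/")
    if "." in gt or len(parts) < 3:
        return None  # missing genotype such as './.'
    alleles = [int(a) for a in gt.split("/")]
    depth = int(parts[1])
    counts = dict(zip(BASE_ORDER, (int(c) for c in parts[2].split(","))))
    return alleles, depth, counts


def called_bases(field, ref, alts):
    """Translate genotype indices into bases using the REF and ALT alleles."""
    parsed = parse_sample(field)
    if parsed is None:
        return None
    alleles, _, counts = parsed
    site_alleles = [ref] + list(alts)  # index 0 is REF, 1.. are ALT
    return [site_alleles[i] for i in alleles], counts


def unsupported_calls(field, ref, alts):
    """Return called bases that have zero depth in the per-base counts."""
    result = called_bases(field, ref, alts)
    if result is None:
        return []
    bases, counts = result
    return sorted({b for b in bases if counts[b] == 0})


# The site discussed above: REF G, ALT A,T.
for sample in ["0/0:7:0,0,0,7", "0/0:7:0,0,7,0", "0/2:9:1,1,2,5", "1/1:6:0,6,0,0", "./."]:
    print(sample, called_bases(sample, "G", ["A", "T"]), unsupported_calls(sample, "G", ["A", "T"]))
```

Under these assumptions only the 0/0 call with its depth in the third position is flagged: the genotype says G while all of the reads are counted under T, which is the mismatch reported above.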
Isaac Overcast
@isaacovercast
Sep 19 2017 23:31
@toczydlowski Well, I would bet good money the problem is the very old version. I know of several bugs we've fixed in the vcf output since 0.5. If you update to the newest version you will only have to re-run all the assemblies from step 5. I would highly recommend updating.