These are chat archives for dereneaton/ipyrad

2nd Nov 2016
markusruhsam
@markusruhsam
Nov 02 2016 13:31
Which output contains the mapped reads that underpin a cluster within samples, i.e. how can I find out how many reads underpin a specific SNP in my chosen sample?
Deren Eaton
@dereneaton
Nov 02 2016 13:33
@markusruhsam That information can be found in the VCF output.
markusruhsam
@markusruhsam
Nov 02 2016 13:38
And how can I pinpoint the SNP I am looking at in the .phy file (using Geneious, for example) in the VCF output?
Deren Eaton
@dereneaton
Nov 02 2016 15:15
@markusruhsam I would do the following: (1) copy a sequence of 10 or so bases around your SNP; (2) open the .loci file with the less command, press / to search the file, and paste the copied sequence; (3) see which locus it is found in. Then search for the resulting locus number in the VCF file.
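A rough command-line sketch of those steps (grep works as well as less for the search; the 10-base string and the "wat" file names below are just placeholders for your own SNP and output files):
# search the .loci file for ~10 bases copied from around the SNP
grep -n "ATCGGTTACA" wat_outfiles/wat.loci
# the separator line that closes that locus ends with its locus number, e.g. |1234|
# then open the VCF and search for that number (press / in less and type it)
less wat_outfiles/wat.vcf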
Isaac Overcast
@isaacovercast
Nov 02 2016 17:54
@SnakeEvolution_twitter Also, 20GB of memory is pretty low. If you are analyzing real data I recommend 32GB or more. The alternative suggested by @ksil91 (dialing down the number of cores you're using) should work as well, because using fewer cores reduces the demand on memory.
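For what it's worth, the number of cores can be capped with the -c flag, so something like the following (with params-wat.txt standing in for your own params file) would run all steps on just two cores:
ipyrad -p params-wat.txt -s 1234567 -c 2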
Deren Eaton
@dereneaton
Nov 02 2016 17:56
I've been aiming to limit memory use to 1GB per core. I think I can say that we're hitting that mark for any single-end data set currently, but we need to make some changes to keep paired-end data in this range.
Alexander McKelvy
@SnakeEvolution_twitter
Nov 02 2016 20:12
Using three cores instead of four resulted in a similar error. Dialed down to one core and it completed without error. Seems to have gone faster that way. I guess I'll continue to use multiple cores unless I'm writing outputs. Thanks for the help! At what stage in your workflow do you guys recommend removing problematic samples from the dataset, or subsampling? Is it as simple as removing them from the working fastq folder and running ipyrad again at step 2?
Isaac Overcast
@isaacovercast
Nov 02 2016 20:40
To remove samples from an assembly, you should create a new branch and pass the "-" argument to specify which samples to remove, like this:
ipyrad -p params-wat.txt -b newAssemblyName - badsample1 badsample2
This will create a new assembly without the two bad samples listed.
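The new branch should also get its own params file (written as params-newAssemblyName.txt, if I remember the naming correctly), which you then use to run whichever steps you still need on the reduced assembly, e.g.:
ipyrad -p params-newAssemblyName.txt -s 67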
Alexander McKelvy
@SnakeEvolution_twitter
Nov 02 2016 20:43
Thanks! Does the opposite work as well? For example, would ipyrad -p params-wat.txt -b newAssemblyName + goodsample1 goodsample2
create a new assembly with only those two samples?
Deren Eaton
@dereneaton
Nov 02 2016 20:51
yes, but without the plus.
ipyrad -p params-wat.txt -b new sample1 sample2
Alexander McKelvy
@SnakeEvolution_twitter
Nov 02 2016 20:52
great, thank you!
Deren Eaton
@dereneaton
Nov 02 2016 20:53
or you can pass in a txt file as the last arg, with names listed one per line.
and names in the list will be kept
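For example, with the names to keep listed one per line in a file (samples_to_keep.txt here is just a placeholder name):
ipyrad -p params-wat.txt -b new samples_to_keep.txt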
Deren Eaton
@dereneaton
Nov 02 2016 23:25
@/all v.0.4.7 of ipyrad is now up.
If you were running into a memory error in steps 5-7 in a previous version, this should now be handled better. To avoid having to go back to step 3 to fix the memory problem, follow these directions after updating ipyrad to continue from step 5 or 6 with your data set:
  1. open your JSON file in a text editor and find the value "max_fragment_length"
  2. If it is very high (like ~300), set it to a lower number like 100 or 200, corresponding to whether your data set contains 100 bp reads or paired 100 bp reads, respectively.
  3. Run ipyrad with the -f (force) flag starting from either step 5 or 6, as in the example below.
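For example, a forced rerun of steps 5 and 6 would look something like this (params-wat.txt standing in for your own params file):
ipyrad -p params-wat.txt -s 56 -f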