These are chat archives for dereneaton/ipyrad

19th
Sep 2018
N-atalia
@N-atalia
Sep 19 2018 00:48
Found out what was wrong! thanks anyway
AliceLedent
@AliceLedent
Sep 19 2018 13:27
Hi
Is it correct that, in the output of the third step, "clusters_hidepth" is the number of clusters that passed the mindepth threshold and "avg_depth_total" is the mean depth of all clusters? What are "normal" values for "avg_depth_total"? Mine are around 2; I suppose this is too low? Does that mean my genome is huge, or is there another explanation?
Thanks in advance!
Nitish Narula
@nitishnarula
Sep 19 2018 14:22
Hi. I have been testing the bundled analysis tool, tetrad. I ran a simple test using a snps.phy file containing 64 samples. This was my command line call: tetrad -s test.snps.phy -n test -b 100 -c 16 --MPI. I believe tetrad completed without errors (nothing in the screen output indicated an error), but I don't see the same output files as in the example in the docs here. In particular, .stats.txt and .full.tre are missing. I have the following files:
test.boots
test.cons
test.input.h5
test.nhx
test.output.h5
test.quartets.txt
test.tet.json
test.tree
Eaton Lab
@eaton-lab
Sep 19 2018 14:25
Hi @nitishnarula, we need to update the docs. The 'full' tree is now simply labeled '.tree', and we currently don't print any statistics for the run beyond what is printed to the screen while running, though that is something I would like to add.
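The renaming described above can be written down as a small lookup. This is only a sketch: `DOCS_TO_CURRENT` and `resolve_output` are hypothetical names made up for illustration, not part of tetrad, and the mapping contains only the two files mentioned in this exchange.

```python
# Output suffixes named in the older docs example, mapped to what replaces
# them now according to the reply above. None means no equivalent file is
# currently written. (Hypothetical helper, not part of tetrad.)
DOCS_TO_CURRENT = {
    ".full.tre": ".tree",   # the 'full' tree is now simply '.tree'
    ".stats.txt": None,     # run statistics are only printed to the screen
}


def resolve_output(name, docs_suffix):
    """Return the current filename for an output named in the older docs."""
    # Suffixes not listed in the mapping are assumed to be unchanged.
    current = DOCS_TO_CURRENT.get(docs_suffix, docs_suffix)
    return None if current is None else f"{name}{current}"


print(resolve_output("test", ".full.tre"))   # test.tree
print(resolve_output("test", ".stats.txt"))  # None
```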
Nitish Narula
@nitishnarula
Sep 19 2018 14:28
@eaton-lab thanks for the info.
Eaton Lab
@eaton-lab
Sep 19 2018 14:29
Hi @AliceLedent, you are interpreting the depth statistics correctly. This is not abnormal. There are many possible explanations: too many loci (e.g., too common a cutter or too wide a size selection); too much multiplexing (not enough sequencing per sample); or many off-target reads or contaminants. You can retain these low-depth clusters or exclude them. It's usually best to try both ways.
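A minimal sketch of how the two statistics relate, assuming a plain list of per-cluster read depths for one sample. The function name `summarize_depths`, the example depths, and the threshold of 6 are assumptions for illustration; use whatever mindepth your params file sets. This is not ipyrad's own code.

```python
def summarize_depths(depths, mindepth=6):
    """Return (clusters_total, clusters_hidepth, avg_depth_total).

    depths   : read depth of each cluster in one sample
    mindepth : assumed threshold; clusters at or above it count as high depth
    """
    clusters_hidepth = sum(1 for d in depths if d >= mindepth)
    avg_depth_total = sum(depths) / len(depths)
    return len(depths), clusters_hidepth, avg_depth_total


# Hypothetical sample in which most clusters are singletons or doubletons.
depths = [1, 1, 1, 2, 1, 1, 8, 1, 1, 3]
total, hidepth, avg = summarize_depths(depths)
print(total, hidepth, avg)  # 10 1 2.0
```

In this made-up sample the mean depth over all clusters is 2, yet only one cluster passes the threshold, which is the pattern that many loci or shallow sequencing per sample would produce.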
Hi @marypsiboas_twitter , Hmm, it would be strange if there was no difference between .85 and .90 clustering... One possibility is that you created a new branch and named it clust90 or something but then forgot to actually change the clustering threshold in it to .90. I've done that before :hand:
Veronica Reyes
@VeroIarrachtai
Sep 19 2018 14:36

Hi everyone. I have some basic questions about the parameters and would appreciate your help.

For parameter 17, filter_min_trim_len: will sequences shorter than the value I set (35, for example) be discarded?
For parameter 10, phred_Qscore_offset: can I choose a Phred of 28? How do I do this? As I understand it, 33 corresponds to a Phred of 20 and 43 to a Phred of 30.
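Taking the two data points in the question at face value (an offset of 33 for a minimum Phred of 20, and 43 for 30), the relationship is linear, and the offset for any other minimum can be sketched as below. The function name `offset_for_min_quality` is hypothetical, and the linear rule is an assumption drawn from those two points, so confirm the result against the ipyrad parameter documentation.

```python
def offset_for_min_quality(min_q, base_offset=33, base_min_q=20):
    """Offset implied by a desired minimum Phred score.

    Assumes each unit added to the offset raises the minimum accepted
    Phred score by one, anchored at offset 33 for a minimum of 20.
    """
    return base_offset + (min_q - base_min_q)


print(offset_for_min_quality(20))  # 33
print(offset_for_min_quality(30))  # 43
print(offset_for_min_quality(28))  # 41
```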

Nitish Narula
@nitishnarula
Sep 19 2018 14:52
@eaton-lab Another question about tetrad: is there a way to infer a species tree from an alignment that contains multiple samples per species? For example, by providing a map similar to the populations assignment file (parameter 28), but as a species assignment file? The SVDquartets paper has an example like that (Fig. 7).
Deren Eaton
@dereneaton
Sep 19 2018 15:06
No. Maybe I'll implement it eventually. It seems to me just a shortcut to make analyses run faster. If things run fast enough with all your samples then no point in subsampling I would think.
Mariana Vasconcellos
@marypsiboas_twitter
Sep 19 2018 17:04
Hi @dereneaton, I already checked for that, and the .90 and .85 assemblies are not identical, nor are the stats files generated afterwards. The only weird thing is that the columns referring to steps 3-6 in the stats files are identical. The 'loci in assembly' column from step 7 is different, however. @isaacovercast seems to know more about a potential bug in the generation of the stats file when you branch an already finished pipeline. Thank you very much for getting back to me! I hope this is an easy fix, since the branching seems to be working fine; the problem is only the stats file.