These are chat archives for dereneaton/ipyrad

18th
Dec 2017
Isaac Overcast
@isaacovercast
Dec 18 2017 16:27
@ChaoShenzjs I don't know exactly how disk usage scales with raw data size; it's complicated. Step 6 will consume much more disk than step 5, so if your run is already crashing on step 5 then you'll need much more disk. How many samples do you have? You say "44 files", but are these raw data files or 44 sample fastqs? What does the output of ls -l look like in your _clust* directory?
Deren Eaton
@dereneaton
Dec 18 2017 17:44
@ChaoShenzjs @isaacovercast The files produced in step 5 should be much smaller than the files from step 2 (in the edits/ directory). If you have already completed step 3 then you can remove the files in the edits/ directory to make more disk space available for step 5. As Isaac said, however, step 6 usually uses a lot of disk space, so you may run into problems then. It is hard to estimate how much: it depends on whether you have many loci, many samples, or very long loci. If all three are large, then yes, it will use a lot of disk space.
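Before deleting anything, it can help to see how much space each intermediate directory takes compared with what is free on the disk. Here is a minimal sketch using only the Python standard library. The function names and the example directory names in the usage comment are hypothetical, so substitute the actual directories in your project folder; nothing here is part of ipyrad itself.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def disk_report(project_dir, subdirs):
    """Return (free_bytes, {subdir: size_bytes}) for the subdirs that exist."""
    free = shutil.disk_usage(project_dir).free
    sizes = {}
    for sub in subdirs:
        full = os.path.join(project_dir, sub)
        if os.path.isdir(full):
            sizes[sub] = dir_size_bytes(full)
    return free, sizes


# Example usage (directory names are placeholders, not real paths):
# free, sizes = disk_report(".", ["myassembly_edits", "myassembly_clust_0.85"])
# for name, size in sizes.items():
#     print(f"{name}: {size / 1e9:.2f} GB (free on disk: {free / 1e9:.2f} GB)")
```

If the edits directory turns out to be large relative to the free space, removing it after step 3 (as described above) is where the savings would come from.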
@bioballs I think the IndexError may be caused by a problem with how you create the initial object bb. Do you provide a .loci file as the data input? As for the generate_tests_from_tree function, it is very picky about how you enter the constraints. If there are no subtrees possible that meet the given constraints on the input tree then it will return zero tests. In the example you pasted above, you entered two arguments for "p4", maybe you meant to enter "p3" for one of them.
Deren Eaton
@dereneaton
Dec 18 2017 17:50
Alternatively, you can generate the tests by hand rather than using the generate function. This is the easier way to go if you want to run tests that are not represented on the tree that you have. For example, you can write them out by hand, or write a for-loop to generate many tests:
## a single test
bb.tests = {
    "p4": ["32082_przewalskii", "33588_przewalskii"],
    "p3": ["29154_superba"], 
    "p2": ["33413_thamno"], 
    "p1": ["40578_rex"],
}

## or, multiple tests
bb.tests = [
    {
     "p4": ["32082_przewalskii", "33588_przewalskii"],
     "p3": ["41954_cyathophylloides"], 
     "p2": ["33413_thamno"], 
     "p1": ["40578_rex"],
    },
    {
     "p4": ["32082_przewalskii", "33588_przewalskii"],
     "p3": ["41478_cyathophylloides"], 
     "p2": ["33413_thamno"], 
     "p1": ["40578_rex"],
    },
]
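The for-loop approach could look roughly like the sketch below, which builds one test per candidate "p3" sample while holding the other three tips fixed. The helper name build_tests is hypothetical, the sample names are the ones from the examples above, and the final assignment assumes bb is the object discussed earlier in this thread.

```python
def build_tests(p3_candidates, p4, p2, p1):
    """Build one four-taxon test dict per candidate p3 sample."""
    tests = []
    for p3 in p3_candidates:
        tests.append({
            "p4": list(p4),   # copy so each test owns its own list
            "p3": [p3],
            "p2": list(p2),
            "p1": list(p1),
        })
    return tests


tests = build_tests(
    p3_candidates=["41954_cyathophylloides", "41478_cyathophylloides"],
    p4=["32082_przewalskii", "33588_przewalskii"],
    p2=["33413_thamno"],
    p1=["40578_rex"],
)

# Assumption: `bb` is the object created earlier in this thread.
# bb.tests = tests
```

Because every test is written out explicitly, this also sidesteps the case where the generate function returns zero tests because no subtree matches the constraints.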
arminf
@arminf82
Dec 18 2017 17:59
@dereneaton Hello, sorry for the typo, of course I used P3 and P4. BUT, I think the problem is somewhere else. I just reran a small subsample with ipyrad 0.7.15 starting from step 1 (demultiplexing) and the ABBA test worked fine with that output file. I think there is a problem with my dataset. In my original dataset I used ipyrad v.0.5.15 to demultiplex different lanes (5x20 samples), merged the demultiplexed files of some technical replicates by text concatenation, and ran the complete set of demultiplexed samples (around 100 samples) together starting from step 2. All output files from that analysis/workflow work fine with raxml, svdq, structure, PCA, BPP, Treemix etc. Just not ABBA-BABA.