These are chat archives for dereneaton/ipyrad

26th
Mar 2017
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 17:21

@dereneaton @isaacovercast: I completed a run of ipyrad of my entire dataset, and am now subsetting the dataset into different groups of individuals for downstream analysis. I was able to branch and rerun step 7 for one subset, but now I am trying to do the same thing again and it is not working. ipyrad will perform the branching step, but then, when I try to rerun step 7 with the new branch, it gives me the following output:

 -------------------------------------------------------------
  ipyrad [v.0.5.15]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: sub2
  from saved path: /scratch/ewars001/ipyrad/Lane2/Lib13/sub2.json
  local compute node: [24 cores] on u01

  Step 7: Filter and write output files for 2 Samples

  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
    Database file /scratch/ewars001/ipyrad/Lane2/Lib13/bL123_outfiles/bL123.hdf5 not found. First run step 6.

I have definitely run step 6, and I have double checked that this file is in the filepath, so I don't know what the problem could be, especially since it successfully branched and reran step 7 from the same dataset before. You may recall I have had branching issues in the past, so I'm pretty frustrated with this feature at this point. Is there a way to manually manipulate the files to do the same thing like there was in pyrad?

Isaac Overcast
@isaacovercast
Mar 26 2017 17:51
@ewarschefsky_twitter Sorry you're having problems with branching. Before we go any further with troubleshooting, will you please update to the latest version (0.6.10): conda install -c ipyrad ipyrad
Isaac Overcast
@isaacovercast
Mar 26 2017 17:57
Also, will you paste the results of this command: ls /scratch/ewars001/ipyrad/Lane2/Lib13/bL123_outfiles/
Isaac Overcast
@isaacovercast
Mar 26 2017 18:03
I tested branching with two samples and it works in the current version.
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 18:28

hey @isaacovercast - I have stopped updating ipyrad for a couple of reasons: 1. because every time I update, I get new errors that didn't exist before, and 2. because I'm in the middle of running multiple iterations of ipyrad on the same dataset, so I want everything to be run on the same version. The results of the ls command:

ls /scratch/ewars001/ipyrad/Lane2/Lib13/bL123_outfiles/
bL123.hdf5  bL123.hdf5_old  bL123.loci  bL123.phy  bL123.snps.map  bL123.snps.phy  bL123_stats.txt  bL123.vcf

What do you mean by "with two samples"?

Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 18:33
@isaacovercast - the only thing I can think of is that I have run the "touch" command on all of my files to update the time stamp so they don't get deleted from my scratch folder on our hpc. Could that be causing an issue?
Isaac Overcast
@isaacovercast
Mar 26 2017 18:34

No that wouldn't be the problem. We don't pay attention to timestamps on files.

By "With two samples" I mean it looks like you tried creating a branch with 2 samples, so I tested this with the simulated data:

bash-3.2$ ipyrad -p params.txt -b subset 1A_0 1B_0

  loading Assembly: ipyrad-test
  from saved path: /tmp/ipyrad-test/ipyrad-test.json
  creating a new branch called 'subset' with 2 Samples
  writing new params file to params-subset.txt

bash-3.2$ ipyrad -p params-subset.txt -s 7

 -------------------------------------------------------------
  ipyrad [v.0.6.10]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: subset
  from saved path: /tmp/ipyrad-test/subset.json
  host compute node: [24 cores] on yeti

  Step 7: Filter and write output files for 2 Samples
  [####################] 100%  filtering loci        | 0:00:12  
  [####################] 100%  building loci/stats   | 0:00:01  

  Empty varcounts array. Probably no samples passed filtering.
ERROR:ipyrad.core.assembly:max() arg is an empty sequence

  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
max() arg is an empty sequence
The max() arg error I'm getting there at the end is because no loci have enough sample depth to get written out, so that's expected. Otherwise it works fine, on the simulated data at least.
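The "max() arg is an empty sequence" text is the ValueError that Python's built-in max() raises when it is handed an empty sequence. The following is a minimal sketch of that behavior and one way to guard against it. It is not ipyrad's code: the function names are hypothetical, `varcounts` is borrowed from the log message only as an illustrative variable name, and the `default=` argument assumes Python 3.4 or later.

```python
def largest_count(varcounts):
    """Return the largest value in varcounts, or None if it is empty."""
    # Passing default= stops max() from raising on an empty sequence.
    return max(varcounts, default=None)


def largest_count_unguarded(varcounts):
    """Same lookup without a default; raises ValueError on empty input."""
    return max(varcounts)
```

With an empty list, largest_count returns None while largest_count_unguarded raises ValueError, which is the situation described above when no loci pass filtering.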
Isaac Overcast
@isaacovercast
Mar 26 2017 18:39
I looked, and the error message you're seeing can be inaccurate. It can happen even if the hdf5 file actually does exist, but I'm trying to figure out how. Can you rerun step 7 on the branch, include the -d flag, and post the ipyrad_log.txt?
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 18:44
Ok, here you go:
2017-03-26 14:43:31,193         pid=5783        [load.py]       DEBUG   skipping: no svd results present in old assembly
2017-03-26 14:43:31,967         pid=5783        [parallel.py]   INFO    ['ipcluster', 'start', '--daemonize', '--cluster-id=ipyrad-cli-5783', '--engines=Local', '--profile=default', '--n=16']
2017-03-26 14:43:39,738         pid=5783        [assembly.py]   ERROR       Database file /scratch/ewars001/ipyrad/Lane2/Lib13/bL123_outfiles/bL123.hdf5 not found. First run step 6.

2017-03-26 14:43:39,887         pid=5783        [assembly.py]   INFO      shutting down engines
2017-03-26 14:43:39,963         pid=5783        [assembly.py]   INFO      finished shutdown
2017-03-26 14:43:41,720         pid=6012        [load.py]       DEBUG   skipping: no svd results present in old assembly
2017-03-26 14:43:42,477         pid=6012        [parallel.py]   INFO    ['ipcluster', 'start', '--daemonize', '--cluster-id=ipyrad-cli-6012', '--engines=Local', '--profile=default', '--n=16']
2017-03-26 14:43:49,945         pid=6012        [assembly.py]   ERROR       Database file /scratch/ewars001/ipyrad/Lane2/Lib13/bL123_outfiles/bL123.hdf5 not found. First run step 6.

2017-03-26 14:43:50,099         pid=6012        [assembly.py]   INFO      shutting down engines
2017-03-26 14:43:50,178         pid=6012        [assembly.py]   INFO      finished shutdown
Isaac Overcast
@isaacovercast
Mar 26 2017 18:57
Hm. Not much to go on there... Can you post the .json file for the branch?
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:00
The whole thing or a particular part?
Isaac Overcast
@isaacovercast
Mar 26 2017 19:02
Just do this: grep clust_database <whateverYourJsonFileIsCalled>
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:03
"clust_database":"/scratch/ewars001/ipyrad/Lane2/Lib13/L23_consens/bL123.clust.hdf5",
Isaac Overcast
@isaacovercast
Mar 26 2017 19:03
Ok now this: ls -l /scratch/ewars001/ipyrad/Lane2/Lib13/L23_consens/bL123.clust.hdf5
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:05
ah ha: ls: cannot access /scratch/ewars001/ipyrad/Lane2/Lib13/L23_consens/bL123.clust.hdf5: No such file or directory
Isaac Overcast
@isaacovercast
Mar 26 2017 19:07
Yes, that's what I expected. Did you rename any directories before, during or after the branching?
Or move any directories?
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:14
I did have to do something like that because I was having problems trying to merge after step 5/before steps 6-7. Ipyrad was looking for the consensus files in the L23_consens folder, even though some of them were in a different folder (L1_consens). I saved the L23_consens folder as L23_consens_old and replaced it with a new folder that contained all of the consensus sequences (L1_consens and L23_consens) that were supposedly merged, but I did not copy/paste the bL123.clust.hdf5 file into the new folder...
However, since that time, I successfully ran steps 6-7, branched multiple times, and reran step 7 multiple times...
and if I run the grep clust_database command on the branch that worked, it also gives me "clust_database":"/scratch/ewars001/ipyrad/Lane2/Lib13/L23_consens/bL123.clust.hdf5",
Isaac Overcast
@isaacovercast
Mar 26 2017 19:19
Yeah, that'll do it. ipyrad makes a pretty strong assumption that the folders and files it creates won't get moved around... If you are creating directories and moving stuff around by hand, the behavior of ipyrad will be unpredictable... If the bL123.clust.hdf5 file isn't in that directory, then step 7 should fail for any assembly that references it. I'll bet if you rerun step 7 on the branch that worked it'll error out in an identical fashion.
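The grep and ls checks above can be generalized into a small script that reads an assembly's .json file and reports every referenced .hdf5 path that no longer exists on disk. This is only a sketch using the Python standard library: it assumes nothing about the layout of the JSON file beyond what the grep output shows (string values holding absolute paths), and the function names are hypothetical, not part of ipyrad.

```python
import json
import os


def find_missing_paths(obj, suffixes=(".hdf5",)):
    """Recursively collect string values that look like absolute file
    paths ending in one of `suffixes` but do not exist on disk."""
    missing = []
    if isinstance(obj, dict):
        for value in obj.values():
            missing.extend(find_missing_paths(value, suffixes))
    elif isinstance(obj, list):
        for value in obj:
            missing.extend(find_missing_paths(value, suffixes))
    elif isinstance(obj, str):
        looks_like_path = os.path.isabs(obj) and obj.endswith(tuple(suffixes))
        if looks_like_path and not os.path.exists(obj):
            missing.append(obj)
    return missing


def check_assembly_json(json_path):
    """Load a JSON file and return the missing paths it references."""
    with open(json_path) as handle:
        return find_missing_paths(json.load(handle))
```

Calling check_assembly_json on the branch's .json file would be expected to list the clust_database path if that file has been moved or was never copied into the replacement folder.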
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:23
Yeah, every branching I have tried fails now. So...would you recommend copy/pasting that hdf5 file into the L23_consens folder?
Isaac Overcast
@isaacovercast
Mar 26 2017 19:29
That would probably fix it.
It's a little hard to say though, because I don't know which samples are in the L1_consens assembly and which are in the L23_consens assembly. Moving the hdf5 file back will be a good start, but don't be surprised if it still does goofy things if it can't find samples...
For the best results I would recommend going back to step 5 and trying to get the merge to work right.
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:34
well that was something you guys said you needed to work on still..
and I was not keen to go back to step 3 to do it...
but thanks for your help with this
it is at least running step 7 now, so we'll see if the output looks alright.
Isaac Overcast
@isaacovercast
Mar 26 2017 19:36
np. hope it works!
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 19:39
the stats file looks good! thank you - you saved my Sunday!
Isaac Overcast
@isaacovercast
Mar 26 2017 19:41
Happy to help. Glad it worked.
Emily Warschefsky
@ewarschefsky_twitter
Mar 26 2017 21:00
...or not - it is giving me the same problem I had before when the merging wasn't working properly.
it runs fine, but all of the samples from the first Lane of data (L1_consens) are entirely missing data.
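One way to confirm which samples are entirely missing is to compute the fraction of missing characters per sample in the .phy output. The sketch below assumes a relaxed PHYLIP layout (a header line with the number of samples and sites, then one line per sample with the name followed by whitespace and the sequence) and treats N, - and ? as missing. Both are assumptions about the file, and the function name is hypothetical.

```python
def missing_fraction_per_sample(phy_text, missing_chars="N-?"):
    """Map each sample name to the fraction of its sites that are missing."""
    lines = [line for line in phy_text.splitlines() if line.strip()]
    fractions = {}
    for line in lines[1:]:  # skip the "nsamples nsites" header line
        parts = line.split()
        name, seq = parts[0], "".join(parts[1:]).upper()
        if not seq:
            continue  # a name with no sequence on the line; nothing to score
        n_missing = sum(seq.count(ch) for ch in missing_chars)
        fractions[name] = n_missing / len(seq)
    return fractions
```

Samples whose fraction is 1.0 carry no data at all, which is what would be expected for the L1_consens samples if they were not found during steps 6-7.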
R2C2.lab
@R2C2_Lab_twitter
Mar 26 2017 21:05
@isaacovercast did you manage to take a look at the raw I sent?
Isaac Overcast
@isaacovercast
Mar 26 2017 22:13
@R2C2_Lab_twitter Yes, I did. I found the problem and fixed it, but I'm waiting for the final test run to complete before I push the fix to conda.