These are chat archives for dereneaton/ipyrad

25th
Jan 2016
Isaac Overcast
@isaacovercast
Jan 25 2016 00:24
I must be smokin it...
This dataset still crashes for me on step6: ipyrad/tests/data/simrad_test_R1.fastq.gz
Hope you weathered the storm ok. NYC straight up shut down for like 24 hrs, pretty wild.
Deren Eaton
@dereneaton
Jan 25 2016 01:23
yeah, I found some more problems in 5 & 6. Working on it.
snow was pretty wicked, they got it mostly cleaned up here already though
Isaac Overcast
@isaacovercast
Jan 25 2016 17:15
Found a bug in the CLI: raw_fastq_path is resolved relative to the assembly working directory, not relative to wherever you ran ipyrad from. This was not the behavior I expected. Is this intentional? If so I can learn to live with it ;p
Deren Eaton
@dereneaton
Jan 25 2016 17:17
do you mean just the default location in the paramsdict?
I started on step7 filters and edge trimming using the hdf arrays. It's really clean compared to iterating over the catclust file.
Isaac Overcast
@isaacovercast
Jan 25 2016 17:20
No, I set the rawfastqpath to "./raw" and it wants to look in data.paramsdict["working_directory"]/raw. It's a simple fix (there's a chdir in __main__), I just wasn't sure if this was what you intended.
(emphasis unintentional) ;p
Deren Eaton
@dereneaton
Jan 25 2016 17:20
Oh, whoa. I didn't even know about chdir.
Isaac Overcast
@isaacovercast
Jan 25 2016 17:21
Ok, i must have put that in there. I'll fix it.
Deren Eaton
@dereneaton
Jan 25 2016 17:21
I don't think we need to chdir at any point, we should always be using abs paths.
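A minimal sketch of the "always use abs paths" idea, assuming paths are resolved once when the params are read. The `resolve_path` helper, the `launch_dir` argument, and the example directory are hypothetical and not taken from ipyrad; only the two parameter names come from the discussion above.

```python
import os

def resolve_path(path, launch_dir=None):
    """Return an absolute, normalized version of path.

    Relative paths are resolved against launch_dir (by default the
    directory the program was started from), so nothing downstream
    depends on a later chdir.
    """
    if launch_dir is None:
        launch_dir = os.getcwd()
    path = os.path.expanduser(path)
    if not os.path.isabs(path):
        path = os.path.join(launch_dir, path)
    return os.path.normpath(path)

# Hypothetical params; the values and launch directory are made up.
params = {"working_directory": "./test_assembly", "raw_fastq_path": "./raw"}
resolved = {key: resolve_path(val, launch_dir="/home/user/project")
            for key, val in params.items()}
```

With this approach "./raw" means "raw next to where the command was run", regardless of where the working directory ends up.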
I'm not going to have time to work on ipyrad much this week. Should I push the changes I have for step7 on a separate branch so you can get the gist of the direction I'm going with it, and hack on them too? Or just push to master and not worry about step7 being broken at this point?
Isaac Overcast
@isaacovercast
Jan 25 2016 17:23
Agreed... I'm still fine-tuning the preview mode pipeline for Glenn's GBS data. If you don't have enough high depth clusters going into step 4 it isn't happy. Also trying to get more informative preview mode output.
Hm, why don't you push a branch, that's a good idea. I'll check it out.
Deren Eaton
@dereneaton
Jan 25 2016 17:30
ok
Isaac Overcast
@isaacovercast
Jan 25 2016 19:02
This is probably obvious, but you can't have more than one assembly using the same working directory. Especially for any step that uses zcat_make_temps, because they will end up writing over the same tmpdir. Could fix this by prepending data.name to tmpdir, if you think this could be a problem.
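One way the tmpdir fix could look, as a sketch only. `make_tmpdir` is a hypothetical helper, not the code in zcat_make_temps. It goes slightly beyond the suggestion above by using `tempfile.mkdtemp`, which adds a random suffix after the assembly name, so even two runs of the same assembly would not collide. It assumes Python 3 (`exist_ok`).

```python
import os
import tempfile

def make_tmpdir(working_dir, assembly_name):
    """Create a temp dir inside working_dir that is unique to this run."""
    os.makedirs(working_dir, exist_ok=True)
    # The prefix carries the assembly name; mkdtemp appends a random
    # suffix and creates the directory.
    return tempfile.mkdtemp(prefix=assembly_name + "-tmp-", dir=working_dir)

# Two assemblies sharing one working directory get separate tmpdirs.
base = tempfile.mkdtemp()
tmp_a = make_tmpdir(base, "assembly_a")
tmp_b = make_tmpdir(base, "assembly_b")
```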
Deren Eaton
@dereneaton
Jan 25 2016 20:06
oh, good catch. Yeah, we should do that. I suppose someone could run two analyses simultaneously and it would get f'd.
I think similarly that the aligning steps make a dir called tmpchunks or something too.
In the CLI, does data.name get grabbed from working_dir? Or where in the params file does it get the name from?
Isaac Overcast
@isaacovercast
Jan 25 2016 21:15
Lol, I was doing exactly that, running two analyses simultaneously. I'll fix it.
In CLI the name does get grabbed from working_dir, it's working_dir.rsplit("/", 1)[1]. It works, but i'm not super happy with it, i'd be open to other ideas.
I wrote a routine to iterate over various values (between .5 and 4 million sequences) for preview_truncate and get timing information for each step. Side-effect is it collects information about whether any step failed, so we could maybe use it as a crude test skeleton.
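A rough sketch of what such a timing routine might look like. This is not the routine described above; `time_steps` and the stand-in step functions are hypothetical, and a real version would call the actual pipeline steps with preview_truncate set to each value.

```python
import time

def time_steps(steps, truncate_values):
    """Run every step for every truncate value; record time and failures.

    steps: list of (name, callable) pairs. Each callable takes the
           truncate value as its only argument.
    """
    results = []
    for n_reads in truncate_values:
        for name, func in steps:
            start = time.perf_counter()
            try:
                func(n_reads)
                error = None
            except Exception as exc:  # record the failure and keep going
                error = repr(exc)
            elapsed = time.perf_counter() - start
            results.append({"truncate": n_reads, "step": name,
                            "seconds": elapsed, "error": error})
    return results

# Stand-in steps for illustration only.
def fake_step_ok(n_reads):
    return sum(range(1000))

def fake_step_fails(n_reads):
    if n_reads < 1000000:
        raise ValueError("stand-in failure for small inputs")

steps = [("step3", fake_step_ok), ("step4", fake_step_fails)]
results = time_steps(steps, [500000, 1000000, 2000000, 4000000])
failed = [(r["truncate"], r["step"]) for r in results if r["error"]]
```

Because failures are recorded instead of raised, the same table doubles as a crude pass/fail report. A real harness would probably skip later steps once an earlier one fails for a given value.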
Deren Eaton
@dereneaton
Jan 25 2016 21:20
What happens if workdir is ./
Or should we require users to give a name?
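To make the "./" question concrete, here is a small comparison. `name_from_rsplit` wraps the expression quoted above; `name_from_abspath` is a hypothetical alternative that normalizes the path first, not something from ipyrad.

```python
import os

def name_from_rsplit(working_dir):
    # The expression quoted in the chat.
    return working_dir.rsplit("/", 1)[1]

def name_from_abspath(working_dir):
    # Hypothetical alternative: make the path absolute and normalized,
    # then take the last component.
    return os.path.basename(os.path.abspath(working_dir))

# A trailing slash leaves the rsplit version with an empty name.
empty_dot = name_from_rsplit("./")             # ""
empty_trailing = name_from_rsplit("/data/run1/")  # ""

# The normalized version ignores the trailing slash, and "./" becomes
# the name of the current directory.
fixed_trailing = name_from_abspath("/data/run1/")  # "run1"
fixed_dot = name_from_abspath("./")
```

A path with no slash at all (for example "run1") raises IndexError in the rsplit version, since there is nothing at index 1. Requiring an explicit name would sidestep all of these cases.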
Deren Eaton
@dereneaton
Jan 25 2016 21:25
I feel like workdir doesn't make it clear that this is a new top level dir that we're going to make. Maybe if we called it project_name it would be more clear?
The problem is still, users will get to step 7 and want to output results for several mindepth settings and they need to be able to give different names to the outputs.