These are chat archives for dereneaton/ipyrad

19th
Feb 2016
Deren Eaton
@dereneaton
Feb 19 2016 02:12
Check out the docs, I put some serious work into them tonight. I think it's best to push all of the API stuff to after the CLI descriptions. Probably best to explain it as an analysis method, but then say, hey, you could also assemble your data using it too.
Deren Eaton
@dereneaton
Feb 19 2016 02:57
Because there are so many new kinds of RAD-like data, I'm thinking about changing the names of the "datatypes". You know, some people use "GBS" to mean all genotyping methods. I've used it to mean a method that uses a single-cutter without sonication. Pretty different.
It would be better to categorize them by the way they have to be analyzed.
Deren Eaton
@dereneaton
Feb 19 2016 15:26
meh, I don't know, I guess it's fine the way it is. I can't think of a better set of names so we'll fall back on making it similar to pyrad's names
Isaac Overcast
@isaacovercast
Feb 19 2016 18:58
I think that makes the most sense re: datatypes naming. I see what you're doing on this page: http://ipyrad.readthedocs.org/files.html and I think that's the best way to solve the problem. Keep the datatype names as they are in the code, and in the docs be very explicit about what each datatype means.
I totally auto-trolled myself: I started step 3 on the cluster with the CLI and didn't set the -c flag. It's been running for days on 4 cores, and it's too late to stop and restart. I know the default used to be "use all available", which maybe does seem like overkill, but it also makes our shit look wicked fast :rocket:
Deren Eaton
@dereneaton
Feb 19 2016 19:06
yeah, I've thought about that. But we can make ipcluster launch faster if it knows how many cores to look for. Otherwise it has to keep looping until it thinks there are no more engines being added. I figured for the CLI it gives it a better feel to be able to start up faster. But we should keep it in mind.
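The launch-time trade-off can be sketched in Python. This is a hypothetical helper, not ipyrad's launch code: `wait_for_engines` takes any callable that returns the number of connected engines (with ipyparallel that could be something like `lambda: len(client.ids)`, assuming an already connected `ipyparallel.Client`). When the expected core count is known, it returns as soon as that many engines are up; otherwise it can only stop once the count has stopped changing for a few polls, which always costs extra waiting.

```python
import time

def wait_for_engines(get_count, expected=None, interval=0.5, stable_polls=3, timeout=60.0):
    """Poll get_count() until the engines look ready (hypothetical helper).

    With `expected` set, return as soon as that many engines are up.
    Without it, return only after the count has stayed the same for
    `stable_polls` consecutive polls, so it always waits longer.
    """
    start = time.monotonic()
    last, unchanged = None, 0
    while time.monotonic() - start < timeout:
        count = get_count()
        if expected is not None:
            if count >= expected:
                return count
        elif count == last and count > 0:
            # Same non-zero count as the previous poll: maybe no more engines coming.
            unchanged += 1
            if unchanged >= stable_polls:
                return count
        else:
            # Count changed (or is still zero), so restart the stability window.
            unchanged = 0
        last = count
        time.sleep(interval)
    raise TimeoutError("engines did not come up within {} s".format(timeout))
```

In the situation described above, the core count given with the CLI's `-c` flag would play the role of `expected`.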
Isaac Overcast
@isaacovercast
Feb 19 2016 19:07
That makes sense
Isaac Overcast
@isaacovercast
Feb 19 2016 19:12

> Check out the docs, I put some serious work into them tonight. I think it's best to push all of the API stuff to after the CLI descriptions. Probably best to explain it as an analysis method, but then say, hey, you could also assemble your data using it too.

I agree here too. In terms of organization, I like the way msprime has it set up, and I also like the way you set up the docs for pyrad. The full tutorial is basically the first thing you see after the download link, and all the more complicated stuff is below it (on the assumption that API users will be more comfortable digging for information, and to lower the barrier to entry for noobs).

Deren Eaton
@dereneaton
Feb 19 2016 19:34
Ok, I've made a pretty good dent in the docs. Unfortunately I'm busy as hell next week, including a job interview, which is why I was hoping we could launch before then, but it'll be pretty tough. I still need to put some serious work into step 7.
Do you know what you're gonna work on next?
Isaac Overcast
@isaacovercast
Feb 19 2016 19:39
documentation and closing tickets, but I'm open to suggestions if you think there's something that's high priority.
Deren Eaton
@dereneaton
Feb 19 2016 19:42
No, that sounds good. I think getting the refseq location info into the outputs will have to be a 0.2 goal.
I imagine it will get complicated figuring out what to do with the info when the reads are only partially overlapping.
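To make the partial-overlap problem concrete, here is a small hypothetical sketch (not ipyrad code) that compares where two reads map on a reference. It assumes each mapping is a half-open (start, end) interval in the same coordinate system, as in BED files; the function names are illustrative only. The "partial" case is the one that raises the question of which reference location to report.

```python
def overlap_length(a, b):
    """Number of bases shared by half-open intervals a=(start, end) and b=(start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def classify_overlap(a, b):
    """Return 'none', 'contained' (one interval lies inside the other), or 'partial'."""
    shared = overlap_length(a, b)
    if shared == 0:
        return "none"
    if shared == min(a[1] - a[0], b[1] - b[0]):
        return "contained"
    return "partial"

# Two 100-bp reads that share 50 bases on the reference.
print(classify_overlap((1000, 1100), (1050, 1150)))  # partial
```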
I'm cramming a bunch of stuff into the "advanced_tutorial" but maybe we should make a few separate ones. One that is a "preview-mode tutorial" and another that is "branching tutorial", and another that is "refseq-tutorial". I'm not sure... cuz there's obviously lots of redundancy.
Isaac Overcast
@isaacovercast
Feb 19 2016 19:47
Yeah, that's true, but it's redundancy that serves a purpose. Having the tutorials be separated makes it easier for people to find and learn about what they specifically want.
Deren Eaton
@dereneaton
Feb 19 2016 19:51
agreed.
I've been editing them by hand for expediency, but if we keep changing things and they need to be updated, a much easier way to create them is to write them as jupyter notebooks and convert them with `jupyter nbconvert --to rst`. This way we can quickly rebuild the tutorials even if we change the data or the program.
and it ensures that the code we write actually works.
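A minimal sketch of that rebuild step, assuming the tutorials live as `.ipynb` files in one directory (the directory names and the `rebuild_tutorials` and `nbconvert_command` helpers are hypothetical). It shells out to `jupyter nbconvert --to rst --execute`, so each notebook is re-run before conversion; a failing cell makes nbconvert exit with an error, which is what gives the "the code actually works" check.

```python
import subprocess
from pathlib import Path

def nbconvert_command(notebook, output_dir="docs", execute=True):
    """Build the jupyter nbconvert command that turns one notebook into .rst."""
    cmd = ["jupyter", "nbconvert", "--to", "rst"]
    if execute:
        # Re-run every cell first; an error in any cell makes nbconvert fail.
        cmd.append("--execute")
    cmd += ["--output-dir", str(output_dir), str(notebook)]
    return cmd

def rebuild_tutorials(notebook_dir, output_dir="docs", dry_run=False):
    """Convert every notebook in notebook_dir; return the commands run (or planned)."""
    commands = [nbconvert_command(nb, output_dir)
                for nb in sorted(Path(notebook_dir).glob("*.ipynb"))]
    if not dry_run:
        for cmd in commands:
            # check=True stops the rebuild at the first notebook that fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `rebuild_tutorials("tutorials", dry_run=True)` lists the commands without running anything, which is handy for checking what a docs build would do.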
Isaac Overcast
@isaacovercast
Feb 19 2016 19:53
That's important ;P
the nbconvert trick is handy! I didn't know about that.
Deren Eaton
@dereneaton
Feb 19 2016 19:54
I think it's even possible to have the notebooks run and build the docs inside the conda.recipe for Read the Docs, but we'll see. Having slow-building docs that need to assemble several data sets would be pretty annoying.
Isaac Overcast
@isaacovercast
Feb 19 2016 20:54
Do you think it would be useful to give users some kind of feedback about the progress of each step? We could do that thing people do where they print a bunch of dots to indicate stuff is happening. Totally not married to this idea, just putting it out there.
Deren Eaton
@dereneaton
Feb 19 2016 21:06
I like the idea. But it easy require querying ipclient for finished jobs. I think it's doable with async results, but might be tough to figure out
Autocorrect. Not easy. But doable.
Deren Eaton
@dereneaton
Feb 19 2016 21:24
Maybe I'm wrong; it'd be easier for some steps than others.
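The async-results idea can be sketched as a polling loop that counts finished jobs. ipyparallel's `AsyncResult` objects report completion through `ready()`; to keep this example runnable without a cluster, it swaps in `concurrent.futures` futures, whose equivalent method is `done()`. The `wait_with_polling` name and its `report` callback are hypothetical, not part of ipyrad.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_finished(jobs):
    """How many jobs have completed (with ipyparallel this would call job.ready())."""
    return sum(1 for job in jobs if job.done())

def wait_with_polling(jobs, interval=0.1, report=None):
    """Poll until every job is done, calling report(finished, total) when the count changes."""
    total = len(jobs)
    last = None
    while True:
        finished = count_finished(jobs)
        if finished != last:
            if report is not None:
                report(finished, total)
            last = finished
        if finished == total:
            # Collect results in submission order; re-raises any job's exception.
            return [job.result() for job in jobs]
        time.sleep(interval)

if __name__ == "__main__":
    # Twelve fake "samples" that each take 50 ms, run on four worker threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = [pool.submit(time.sleep, 0.05) for _ in range(12)]
        wait_with_polling(jobs, interval=0.01,
                          report=lambda done, total: print(done, "of", total, "samples finished"))
```

Polling like this only needs the job handles, which is why it is easier for steps that submit one job per sample than for steps that do their work in a single call.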
Deren Eaton
@dereneaton
Feb 19 2016 22:10
If we did make a progress bar it would be nice to have it print to the same line as "saving assembly"
Isaac Overcast
@isaacovercast
Feb 19 2016 22:11
I'll give it some thought; I think it's the kind of thing people would appreciate.
Deren Eaton
@dereneaton
Feb 19 2016 22:11
Say there are 60 samples in an Assembly; we could have it print a dot after every 6 samples, so the output would look like:
```
.......... Saving assembly
```
or [.......] saving, which could also make it clearer if a job stopped and saved before it was finished.
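The dot scheme works out to one dot per `total / width` finished samples, so 60 samples with a 10-dot bar gives a dot every 6 samples. A minimal sketch with hypothetical function names: it redraws the bar on the same terminal line using a carriage return and pads unfinished slots with spaces, so an assembly that stopped early shows a visibly incomplete bar next to the saving message.

```python
import sys

def progress_line(finished, total, width=10, label="saving assembly"):
    """One dot per total/width finished samples, padded so unfinished slots stay visible."""
    dots = finished * width // total if total else width
    return "[{}{}] {}".format("." * dots, " " * (width - dots), label)

def print_progress(finished, total, stream=sys.stdout, **kwargs):
    """Overwrite the current terminal line; end it with a newline once everything is done."""
    stream.write("\r" + progress_line(finished, total, **kwargs))
    if finished >= total:
        stream.write("\n")
    stream.flush()
```

Calling `print_progress(n, 60)` after each finished sample redraws the bar in place, e.g. `[.....     ] saving assembly` at 30 of 60.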
Isaac Overcast
@isaacovercast
Feb 19 2016 23:13
Is this that thing you were talking about the other day about multiplexed tags:
Deren Eaton
@dereneaton
Feb 19 2016 23:21
yeah, you think you could get your hands on some double multiplexed data? Would be sweet if we built support for it.
is the debugger still broken?
Isaac Overcast
@isaacovercast
Feb 19 2016 23:29
you mean logging?
It appears to be behaving better than I remember; it looks like it's working.
Not sure what changed.