wm75 (Wolfgang Maier)
@wm75:matrix.org
[m]
Does anybody know what happened to the changeset diffs on the toolshed?
For me they are currently getting rendered mal-/unformatted.
Nicola Soranzo
@nsoranzo
@wm75:matrix.org I think nothing has changed, most diffs are displayed correctly.
wm75 (Wolfgang Maier)
@wm75:matrix.org
[m]
Weird, I only checked two and they both looked like above. Do you have a link to a correct looking one for me?
Ah, don't bother. I found one.
Any idea what causes the formatting issues for some tools then?
Nicola Soranzo
@nsoranzo
Not off the top of my head
wm75 (Wolfgang Maier)
@wm75:matrix.org
[m]
Well, ok then :) It isn't really a big issue. Thanks!
Greg Von Kuster
@gregvonkuster

@galaxyproject/iuc can the following PRs be addressed? The vSNP suite consists of 6 tools, with individual tools not being useful unless the full suite is available. In fact, the suite of tools is most useful within a workflow that I will contribute (somewhere??) when the PRs have all been merged.

In the meantime, I am having to maintain 2 sets of these tools (one with owner greg and the other with owner iuc) since I am not able to update the many vSNP workflows that I maintain that use these tools.

Significant effort has already gone into reviewing these PRs, so I'm not sure significantly more investment is beneficial, and I'd hate to see these PRs end up languishing here indefinitely.

galaxyproject/tools-iuc#3407 - I think this can be merged without too much invested into a review. It simply fixes tools that have already been reviewed and merged. The only question is if another test is desirable - if so, the test dataset is fairly large.

galaxyproject/tools-iuc#3405 - I think the only question about this is if there is a better way to do what the tool is doing.

I believe the only question about galaxyproject/tools-iuc#3404 and galaxyproject/tools-iuc#3381 is that they accept collections as input, so Galaxy will not split the collections across jobs. The reason the tools accept collections is that the outputs must be merged instead of split across nested collections, and the Galaxy framework does not yet support merging outputs when the non-collection inputs are split across jobs. The Python multiprocessing library is used to help with this shortfall.
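
As a rough illustration of that pattern (a minimal sketch, not the actual vSNP code): the tool receives the whole collection, processes the elements in parallel with `multiprocessing`, and then merges the per-element results into a single output. The names `process_one`, `merge_results` and `process_collection` are hypothetical, and a plain dict of strings stands in for the datasets of a Galaxy collection.

```python
import multiprocessing


def process_one(text):
    """Hypothetical per-input step: count the non-empty lines of one input."""
    return sum(1 for line in text.splitlines() if line.strip())


def merge_results(names, counts):
    """Merge step: one combined result instead of one output per job."""
    return dict(zip(names, counts))


def process_collection(inputs, processes=2):
    """Process every element of a collection in parallel, then merge.

    `inputs` maps an element name to its content (an assumption made
    for this sketch; a real tool would receive file paths).
    """
    names = sorted(inputs)
    with multiprocessing.Pool(processes=processes) as pool:
        counts = pool.map(process_one, [inputs[name] for name in names])
    return merge_results(names, counts)


if __name__ == "__main__":
    collection = {"sample_a": "ACGT\nTTGA\n", "sample_b": "GGCC\n"}
    print(process_collection(collection))
```

The parallelism here lives inside a single Galaxy job, which is why the job scheduler cannot spread the work across separate jobs.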

Marius van den Beek
@mvdbeek
It’s a long review, but if a tool needs access to more than one input file to calculate its output, it should be a collection input or a multiple=“true” input. If any summary can be created after running the tool, but processing can otherwise happen in parallel, I think the best way is to have 2 tools, one for processing, one for reporting, but we can accept an all-in-one tool if the runtime per input is negligible.
M Bernt
@bernt-matthias
+1
Marius van den Beek
@mvdbeek
Might also be a reasonable scenario for independently schedulable command blocks 🍔
Since galaxy tools are not complicated enough yet :laughing:
M Bernt
@bernt-matthias
Adding to my +1: .. Given that the separate jobs have runtime larger than the overhead for creating and scheduling the jobs..
Actually support for array jobs would be nice.. I have this on my mind for a while since we have a 1k job limit per user on our system..
Nate Coraor
@natefoo:matrix.org
[m]
@mvdbeek: fwiw James wanted and proposed independently schedulable command blocks years ago. It'd give us some nice scheduling flexibility for tools like mappers when you've selected a custom uploaded reference where an index has to be built using a single-threaded indexer before the mapper can run.
Marius van den Beek
@mvdbeek
Can you check if there’s already an issue out there, and if not, create one ? I think this is reasonable as long as we can make that conditional on the job runner supporting it
Nate Coraor
@natefoo:matrix.org
[m]
Although we should probably do an implicit index build as a separate step and store that index as a MetadataFile on the input so it doesn't have to be done every time.
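
The build-once idea can be sketched roughly as below. This is only an illustration of caching an index next to its input, not Galaxy's MetadataFile mechanism; `build_index` and `get_index` are hypothetical names, and the cache is keyed on a SHA-256 of the reference content as an assumption of the sketch.

```python
import hashlib
import tempfile
from pathlib import Path


def build_index(reference: Path, index: Path) -> None:
    """Hypothetical stand-in for a slow, single-threaded indexer."""
    index.write_text(f"index of {reference.stat().st_size} bytes\n")


def get_index(reference: Path, cache_dir: Path) -> Path:
    """Return a cached index for `reference`, building it only on first use."""
    digest = hashlib.sha256(reference.read_bytes()).hexdigest()
    index = cache_dir / f"{digest}.idx"
    if not index.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        build_index(reference, index)
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref = Path(tmp) / "custom.fa"
        ref.write_text(">chr1\nACGT\n")
        first = get_index(ref, Path(tmp) / "cache")
        second = get_index(ref, Path(tmp) / "cache")  # reuses the cached file
        print(first == second)
```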
Marius van den Beek
@mvdbeek
I remember, but I’m always hesitant to say it’s on the roadmap for 10 years now :laughing:
Nate Coraor
@natefoo:matrix.org
[m]
;D
Marius van den Beek
@mvdbeek

Although we should probably do an implicit index build as a separate step and store that index as a MetadataFile on the input so it doesn't have to be done every time.

That’s gonna be a big invisible (to the user) file, otherwise I do like that idea

Nate Coraor
@natefoo:matrix.org
[m]
It's similar to BAM indexes though, except it's not generated with the BAM, it's generated when you use it as an input to a tool that needs that index. We could unhide that implicit dataset to make it obvious and so there'd be a visible job in the history.
Björn Grüning
@bgruening
a (global) reference genome cache and get rid of data managers ;)
Nate Coraor
@natefoo:matrix.org
[m]
I guess if it's not a datatype conversion there's no implicit conversion to be done, though. I mean you could do it today as an implicit conversion but then you have 2 "copies" of the same dataset. Actually I guess it would be 2 actual copies since the input dataset to the conversion would not be the same dataset as the output.
Marius van den Beek
@mvdbeek
Right, but bam indexes are pretty small compared to the dataset, that’s not true for all indexes
like, we have the new history panel now, maybe we can fix the invisible aspect of this?
Nate Coraor
@natefoo:matrix.org
[m]
Yeah, that's a good point
John Chilton
@jmchilton
none of this is worth the complexity, this is already why we can't have nice things on the backend, let's not make it worse
Nate Coraor
@natefoo:matrix.org
[m]
John Chilton has joined the room 😂
So you're saying we're best just leaving all this as-is?
John Chilton
@jmchilton
Definitely we could do better by allowing all these tools to consume indices controlled by the users and make steps away from "Galaxy is a walled garden requiring synchronized global state" and toward "Galaxy can take advantage of workflow knowledge and file-system/linked/referenced resources"
Marius van den Beek
@mvdbeek
So more indices as datasets ?
John Chilton
@jmchilton
Yeah!
Nate Coraor
@natefoo:matrix.org
[m]
Let's revive parent/child datasets! ;D
John Chilton
@jmchilton
We have a way to split computation into multiple parts - it is workflows
Marius van den Beek
@mvdbeek
we’ve had some wrappers in the early days that did this, I guess this was a UX issue ?
Maybe something like “tool-provided workflows” could help there ?
M Bernt
@bernt-matthias
bgruening
@bgruening:matrix.org
[m]

Anyway I apologize for not being able to not have the last word, I should be in therapy.

I can help with that ;) You are assuming users are technical people, but that is questionable. From my experience there are a lot of Galaxy users that are happy if nifty details can be hidden as much as possible and this is considered a good UX. In this sense implicit > explicit, even if we as technical people have a hard time understanding this.

bgruening
@bgruening:matrix.org
[m]

"Yay, I love being able to not understand where data is coming from - that lack of transparency makes this all really usable for me".

I guess it's useless to discuss this further on that level. Let's postpone it again a few years :)

Marius van den Beek
@mvdbeek
I mean none of this is an either/or, we can certainly pass indexes as datasets around
We update bwa, bowtie and minimap to also accept existing indexes and then we probably cover 90% of the cases
We don’t need framework engineering to do that, that’s just a decision the IUC can take
bgruening
@bgruening:matrix.org
[m]
We already have framework stuff for it in "custom_builds", don't we?
Marius van den Beek
@mvdbeek
Ugh, this is the worst though
I mean, it could be fixed, but the current state is … not good
Nate Coraor
@natefoo:matrix.org
[m]
Also custom builds IIRC only generates chrom lengths and not any indexes
It was basically only ever used for Trackster?
Marius van den Beek
@mvdbeek
yeah, and in a terrible way that breaks a lot of other conventions and rules
we can take the idea maybe (probably not though), but not the implementation