    Marius van den Beek
    @mvdbeek
    no, mulling is not related to this

    And remap the $tool_directory into a fixed internal path?

    $__tool_directory__ is the path to the directory where the wrapper lives. I don’t know what remapping to a fixed internal path means
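    For context, a minimal sketch of how a Galaxy tool wrapper typically references a script shipped alongside it — tool id and parameters here follow the join1 example discussed below, but the exact attributes are illustrative, not copied from the real wrapper:

    ```xml
    <tool id="join1" name="Join two Datasets" version="2.1.3">
        <!-- $__tool_directory__ expands to the directory containing this XML,
             so join.py must sit next to the wrapper at job runtime -->
        <command><![CDATA[
            python '$__tool_directory__/join.py' '$input1' '$input2' '$out_file1'
        ]]></command>
    </tool>
    ```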

    Nuwan Goonasekera
    @nuwang
    But the wrapper is only read by Galaxy right? The tool container does not need a copy of the wrapper, only join.py. So if the mulling process simply copies join.py into the container at a fixed path, the tool would no longer need to have join.py injected/mounted from outside?
    Marius van den Beek
    @mvdbeek
    Mulling has nothing to do with Galaxy
    John Chilton
    @jmchilton
    Yeah - that isn't the path we're going down.
    Marius is right here
    The tool can be rewritten to depend on a Conda package if you'd like to do that - otherwise I think we want to aim for tools annotating the assets they need and Pulsar transferring them
    Nuwan Goonasekera
    @nuwang
    I see your point. The conda package would also need to contain join.py, not just the mulled container. But is there a reason to not treat all code other than the wrapper xml as part of the conda package?
    Marius van den Beek
    @mvdbeek
    That is what we’re suggesting. You can create a package instead of shipping wrapper scripts.
    It’s another step we’d be pushing onto tool developers, and I wouldn’t want to require this for contributors to the IUC for instance
    But I can see that not doing this is a little tricky for the kubernetes job runner case and tools shipped with Galaxy
    Nuwan Goonasekera
    @nuwang
    Ok, thanks, I understand the issue better now.
    Marius van den Beek
    @mvdbeek
    fwiw that is the current situation, plus the additional wrapper scripts for converters
    ~/src/galaxy/tools (shed_indexing_fix_for_python2) $ find . -name \*.py | wc -l
          84
    ~/src/galaxy/tools (shed_indexing_fix_for_python2) $ find . -name \*.sh | wc -l
           7
    ~/src/galaxy/tools (shed_indexing_fix_for_python2) $ find . -name \*.pl | wc -l
          19
    ~/src/galaxy/tools (shed_indexing_fix_for_python2) $ find . -name \*.R | wc -l
    Marius van den Beek
    @mvdbeek
    I think staging these as part of the job is probably the most reasonable thing to do. Scripting the package creation is an option, but I think this is going to be complicated (they don’t all require the same dependencies, so you’d have to walk back up to the tool wrapper and check what is needed).
    Enis Afgan
    @afgane
    Have all the tools shipped with Galaxy now been converted like the join1 tool?
    Marius van den Beek
    @mvdbeek
    No, only those for which we have standalone galaxy-* packages that can be used for the imports
    That said most tools now use requirements
    The ones that we can’t use without Galaxy being importable are listed in https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tools/__init__.py#L125
    So continuing the possibilities: we could put $__tool_directory__ on the jobs’ PATH, collect all wrapper scripts as symlinks in a package, change $__tool_directory__/wrapper.py to just say wrapper.py, and annotate the tools with a requirement on this new package
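    That option could look roughly like this in a wrapper — the package name and version are hypothetical; the point is only that the script would be resolved via PATH instead of via $__tool_directory__:

    ```xml
    <!-- before: script resolved relative to the wrapper's own directory -->
    <command>python '$__tool_directory__/wrapper.py' '$input'</command>

    <!-- after: script found on PATH, provided by a collected-scripts package
         (package name and version are made up for illustration) -->
    <requirements>
        <requirement type="package" version="1.0">galaxy-tool-wrapper-scripts</requirement>
    </requirements>
    <command>wrapper.py '$input'</command>
    ```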
    Marius van den Beek
    @mvdbeek
    that would keep the wrapper scripts in place and in the context within which they are used
    I don’t know if I really like this option … it’s not too much work, but publishing a package without annotating the requirements feels wrong
    and whenever we change a wrapper we’d have to release and update a new version (and update all the tool wrappers that use any one of the wrapper scripts)
    Nuwan Goonasekera
    @nuwang
    This path seems like a good middle ground - it maintains ease of tool wrapping, but each version of a tool has an implicit dependency on an autogenerated artefact. The final packages are completely self-contained.
    Pablo Moreno
    @pcm32
    Wouldn't it be easier if things like join.py are made an executable in the path of the galaxy-util container? We sorted other similar tools like that in the past. Then you would avoid the $__tool_directory__ altogether.
    Marius van den Beek
    @mvdbeek
    no
    galaxy-util is a pypi package used in many projects
    we’ll not pollute this with legacy scripts
    Pablo Moreno
    @pcm32
    ok, shouldn’t then the join tool have its own container/conda package with this python logic? This is how 99% of all tools work AFAIK, why should this one be different?
    Marius van den Beek
    @mvdbeek
    I have discussed this above ^^
    and no, this is how ~ 70% of Galaxy tools work
    the other (guesstimated) 30% use wrapper scripts
    Nuwan Goonasekera
    @nuwang
    I think the issue is that a lot of tool wrappers have their own wrapper scripts. Requiring the tool to first be packaged as an artefact makes it difficult for tool authors to iterate on tool development, and keeping the code relevant to the tool together is convenient. Is that a fair summary?
    Marius van den Beek
    @mvdbeek
    yep, thanks
    Coming back to any eventual fix: I don’t think we’d want to add new tool dependencies to the 20.01 release, since that would mean deployers would need to rebuild their dependencies without any warning or announcement. So is there any way that you can provide the tools/ folder in your deployments?
    Pablo Moreno
    @pcm32
    on the older galaxy stable setup, every time the container started in k8s, it would copy a number of directories to the persistent volume claim, among them the tools directory, and then the tools config was set accordingly. This was necessary to use tools that used $__tool_directory__.
    this made the startup of the process somewhat slower, but if it is only these default tools, it should be ok-ish I guess.
    Marius van den Beek
    @mvdbeek
    that would work
    Nuwan Goonasekera
    @nuwang

    So wrt the concern you raised about implicit dependencies, what if there was a variant on dependencies like this:

    <requirement type="code">join.py</requirement>

    Sort of like the old code tag. I always thought it would be nice to capture the fact that the wrapper relies on an external file like this somewhat more explicitly. The conda package for them could be implicit and synced to the tool wrapper’s version.
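    Under that idea, a wrapper might declare its companion script explicitly — note this tag variant does not exist in Galaxy today; the syntax is purely illustrative:

    ```xml
    <requirements>
        <!-- hypothetical: would tell Galaxy/Pulsar to stage join.py alongside
             the job, versioned implicitly with the wrapper itself -->
        <requirement type="code">join.py</requirement>
    </requirements>
    ```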

    John Chilton
    @jmchilton
    <requirement type="tool_file">*py</requirement>
    I can't find an issue but this has been on the Pulsar roadmap forever
    Alexandru Mahmoud
    @almahmoud
    Jumping in late on this conversation, but I just checked and join.py is at /cvmfs/main.galaxyproject.org/galaxy/tools/filters/join.py, which is already mounted into the job containers. Assuming the code there is kept up to date, can't we universally point to the galaxy code on the CVMFS for any wrapper portion requiring gxy code? Unless I'm misunderstanding the problem
    Enis Afgan
    @afgane
    A problem with the CVMFS is that we wouldn’t have a self-contained solution: changes on the CVMFS could affect execution of established pipelines.
    Nate Coraor
    @natefoo
    So that thing I was asking about last week
    Was to test running some jobs in k8s without a shared FS using John's new extended metadata stuff
    Nuwan Goonasekera
    @nuwang
    @natefoo I actually made a PR for that: galaxyproject/cloudlaunch-registry#9 but it never got merged because cloudlaunch went down. Will merge it and test now
    Nate Coraor
    @natefoo
    Awesome, thanks!
    Nuwan Goonasekera
    @nuwang
    There’s also a running cluster on jetstream: https://149.165.156.41/ Let me know if you want to use that or launch a fresh one
    Nate Coraor
    @natefoo
    Ok, I got caught up in some other work but I'll come back to this tomorrow.
    Thanks!