    Jeremy Goecks
    @jgoecks
    @bgruening Sorry, try again please
    Björn Grüning
    @bgruening
    works!
    Jeremy Goecks
    @jgoecks
    Excellent, and everyone should have full editor rights for everything as well
    Björn Grüning
    @bgruening
    Mh, it reads the same as what the single-cell people said in the beginning: single-cell is so complex that you can never create a GUI for it. You always need to have experts :(
    Jeremy Goecks
    @jgoecks
    Agree, that's a frustrating line of criticism.
    Björn Grüning
    @bgruening
    It's also hard to show. We can show runtime stats and point them to the ELIXIR workshop with 200+ participants ... but ...
    Jeremy Goecks
    @jgoecks
    Well, we did show three complex use cases that worked well. Let's think about whether there's anything else to be done. Maybe we need to push harder to get reviewers from the workflow/computational workbench community; it seems the NM reviewers were not familiar with Galaxy or infrastructure in general. We could also retitle in hopes the editors won't get all ML people as reviewers.
    Alireza Khanteymoori
    @khanteymoori
    @jgoecks thanks for the update. I am out next week too, will work on it after.
    Björn Grüning
    @bgruening
    We could also put a stronger emphasis on the underlying infrastructure, meaning people not only get a gateway but also access to the hardware.
    Jeremy Goecks
    @jgoecks
    @bgruening I like that idea a lot, could fold in the software installation as well. So accessibility = no coding + scalable infrastructure
    Qiang Gu
    @qiagu
    Frustrating news! One thing on my mind is that we could emphasize the expandability. When there is a new algorithm, people can wrap it in scikit-learn APIs and drop it into the existing UI structure; it can then take advantage of the existing facilities, like the hyperparameter search tool, Galaxy collection runs, automation, and so on. In fact, we've done a lot of integration of third-party algorithms, and even of our own algorithm implementations. Even for deep learning, I've proved the platform can expand to different scenarios, from sequences and imaging to other things.
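To make the "wrap it and get the rest for free" idea concrete, here is a minimal sketch in plain Python. It assumes only the usual scikit-learn estimator conventions (`fit`, `predict`, `get_params`, `set_params`); `ThresholdClassifier` and `grid_search` are hypothetical names invented for illustration, and the hand-rolled search merely stands in for a real hyperparameter search tool.

```python
from itertools import product


class ThresholdClassifier:
    """Toy algorithm exposing the scikit-learn style estimator interface."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def get_params(self, deep=True):
        return {"threshold": self.threshold}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Nothing is learned here; we only record the labels that were seen.
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]


def grid_search(estimator, param_grid, X, y):
    """Try every parameter combination; works with any conforming estimator.

    For brevity the score is accuracy on the training data, with no
    cross-validation.
    """
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = estimator.set_params(**params).fit(X, y)
        predictions = model.predict(X)
        score = sum(p == t for p, t in zip(predictions, y)) / len(y)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]
best_params, best_score = grid_search(
    ThresholdClassifier(), {"threshold": [0.0, 0.5, 1.0]}, X, y
)
print(best_params, best_score)
```

The search code never looks inside the estimator, so any new algorithm that follows the same conventions can reuse it unchanged.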
    Björn Grüning
    @bgruening
    @qiagu I agree, this was one of the first design goals of the relatively complicated wrappers. Drop in a GUI element and get the rest for free.
    But I'm not sure this is a top selling argument for Nature.
    Qiang Gu
    @qiagu
    In the deep learning part, we may need to solve something new, like making a better model, not just repeating the cases, I guess.
    John Chilton
    @jmchilton
    Kaivan was asking about a tool wrapper - is it true that scikit-learn is not available in the default conda channels of Galaxy?
    It feels like there must be a way others are using it without adding a new channel?
    Björn Grüning
    @bgruening
    It is available. It is in conda-forge
    And conda-forge is in the default channels
    John Chilton
    @jmchilton
    Yeah - odd - I wonder what that was about. https://github.com/conda-forge/scikit-learn-feedstock
    Björn Grüning
    @bgruening
    Does Kaivan have an error report? Or does conda search simply not list it?
    Qiang Gu
    @qiagu
    I usually check the availability of a package directly on Anaconda.
    https://anaconda.org/search?q=scikit-learn
    conda-forge/scikit-learn is listed first.
    Jeremy Goecks
    @jgoecks
    @qiagu Agree with @bgruening We ran into a tough set of reviewers at Nature Methods who didn't see the value of democratizing ML—or perhaps we didn't write the manuscript well enough. Either way, my plan is minor revisions in the text and then send to Nature Biotechnology, where I think we will get more traction from reviewers.
    Simon Bray
    @simonbray

    Just to note: all the sklearn tool tests are failing when run in docker containers:

    Using TensorFlow backend.
    /usr/local/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=DeprecationWarning)
    Traceback (most recent call last):
      File "/home/simon/.planemo/planemo_tmp_mkwbqqfs/pca.py", line 4, in <module>
        from galaxy_ml.utils import read_columns
      File "/usr/local/lib/python3.6/site-packages/galaxy_ml/utils.py", line 15, in <module>
        import xgboost
      File "/usr/local/lib/python3.6/site-packages/xgboost/__init__.py", line 11, in <module>
        from .core import DMatrix, Booster
      File "/usr/local/lib/python3.6/site-packages/xgboost/core.py", line 136, in <module>
        _LIB = _load_lib()
      File "/usr/local/lib/python3.6/site-packages/xgboost/core.py", line 128, in _load_lib
        lib.XGBGetLastError.restype = ctypes.c_char_p
    UnboundLocalError: local variable 'lib' referenced before assignment

    I don't know if anyone cares about this or would like to deploy the ML tools in docker/singularity containers; this is just FYI.
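The final `UnboundLocalError` only says that `lib` was used inside `_load_lib` before anything was assigned to it. One plausible way to end up there is a loader loop that skips failed load attempts, so the real error is hidden. The sketch below illustrates that general pattern; it is not xgboost's actual code, and `load_first_unsafe`, `load_first_safe`, and `failing_loader` are hypothetical names.

```python
def load_first_unsafe(candidates, loader):
    """Loader loop that silently skips failures (hypothetical helper)."""
    for path in candidates:
        try:
            lib = loader(path)
        except OSError:
            continue  # the real reason for the failure is discarded here
    # If every candidate failed, `lib` was never assigned.
    return lib


def load_first_safe(candidates, loader):
    """Same loop, but failures are collected and reported."""
    errors = []
    for path in candidates:
        try:
            return loader(path)
        except OSError as exc:
            errors.append(f"{path}: {exc}")
    raise RuntimeError("could not load any library: " + "; ".join(errors))


def failing_loader(path):
    # Stand-in for a shared-library load that fails, for example because
    # the interpreter and the library were built for different architectures.
    raise OSError("cannot load shared object")


try:
    load_first_unsafe(["libexample.so"], failing_loader)
except UnboundLocalError as exc:
    print("unsafe loader:", type(exc).__name__)

try:
    load_first_safe(["libexample.so"], failing_loader)
except RuntimeError as exc:
    print("safe loader:", exc)
```

If this is what happens in the container, the underlying `OSError` from the failed library load is the message worth looking for.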

    Qiang Gu
    @qiagu
    @simonbray I found a similar issue, linked below. Could you please check whether the Docker container was running 32-bit Python?
    dmlc/xgboost#3616
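A quick way to answer the 32-bit question from inside the container, using only the standard library. The helper name `python_bitness` is made up for this sketch.

```python
import platform
import struct
import sys


def python_bitness():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


print("interpreter:", python_bitness(), "bit")
print("machine:", platform.machine())
print("version:", sys.version)
```

Run it with the same Python executable that the failing tool tests use, since a container can hold more than one interpreter.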
    Simon Bray
    @simonbray
    yes, I saw that issue as well. I'll check on Monday :+1:
    Anup Kumar
    @anuprulez
    I have added a few sentences about Galaxy infra in the ML paper: https://docs.google.com/document/d/1sbdLrPTUW_dyuVf0IUhDrZf_6UOfFSA9pbwfzSegr7Y/edit#
    We should schedule a meeting to discuss the way forward
    Jeremy Goecks
    @jgoecks
    Thanks @anuprulez The way forward is clear: make the changes suggested by Nature Methods reviewers and resubmit to Nature Biotechnology. This shouldn't take long but I haven't had a chance to do it. Any help from anyone would be great.
    kxk302
    @kxk302
    Hi,
    I was experimenting with Keras in Galaxy and noticed that it does not accept output files that have multiple columns. These files are needed for multiclass/multilabel classification problems. Looked at the Galaxy code and the check_X_y() method in galaxy_ml/keras_galaxy_models.py does not specify a value for ‘multi_output’. The default value for ‘multi_output’ is False, meaning that it expects y to be a vector (it errors out otherwise). It seems if we pass ‘multi_output=True’ to check_X_y() in galaxy_ml/keras_galaxy_models.py we might be able to handle output files with multiple columns. Just wanted to get feedback on this and see how this can be tested
    Qiang Gu
    @qiagu
    Yes, that's one place that needs revising to support multiple outputs. To make toolkit-wide support happen, there are other places we should take care of, such as prediction output and evaluation scores.
    Inputs as well.
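For background, the behaviour under discussion can be sketched without any dependencies. The function below is a simplified, hypothetical stand-in for the target check, not scikit-learn's implementation: by default it rejects a `y` with more than one column, and with `multi_output=True` it lets a 2-D `y` through. The real call discussed above would be `check_X_y(X, y, multi_output=True)`.

```python
def check_targets(y, multi_output=False):
    """Simplified stand-in for the target validation (hypothetical helper)."""
    rows = [list(r) if isinstance(r, (list, tuple)) else [r] for r in y]
    widths = {len(r) for r in rows}
    if len(widths) != 1:
        raise ValueError("rows of y have inconsistent lengths")
    n_cols = widths.pop()
    if multi_output:
        return rows  # always returned as 2-D in this sketch
    if n_cols != 1:
        raise ValueError(
            f"y should be a 1d array, got shape ({len(rows)}, {n_cols})"
        )
    return [r[0] for r in rows]


one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

try:
    check_targets(one_hot)
except ValueError as exc:
    print("default check rejects it:", exc)

print("multi_output=True accepts it:", check_targets(one_hot, multi_output=True))
```

As noted in the reply above, passing the check is only the first step; prediction output and evaluation scores also have to handle 2-D targets.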
    kxk302
    @kxk302
    Can we have a Zoom meeting sometime to discuss this? I would like to add support for multiple outputs, but I need a bit of background info about the ML workbench:
    its components, how to run a Galaxy instance locally with the ML workbench installed, tests, etc.
    kxk302
    @kxk302
    Thank you!
    Qiang Gu
    @qiagu
    Sure. We can do that.
    kxk302
    @kxk302
    Great! Could you please share your email with me? I’ll message you to schedule a date/time. Thx
    Qiang Gu
    @qiagu
    @kxk302 FYI, there is already a tool specifically for image deep learning. The image data generator, IMO, offers advantages over flattened tabular inputs.
    https://github.com/goeckslab/Galaxy-ML/blob/master/galaxy_ml/tools/keras_image_deep_learning.xml
    The tool needs to be published though.
    Jeremy Goecks
    @jgoecks
    Two directions I'd like to see for the deep learning:
    1. implementation of some common models/layers in a library for reuse
    2. integration of ludwig (https://ludwig-ai.github.io/ludwig-docs/), which seems sufficiently powerful and high-level to be very useful
    Qiang Gu
    @qiagu
    I made a galaxy datatype for machine learning models. Please have a look, if you're interested in the topic.
    galaxyproject/galaxy#11825
    Björn Grüning
    @bgruening
    Thanks, made a comment.
    And tensorflow now has a nice conda package with ARM and GPU support in version 2.4.
    Qiang Gu
    @qiagu
    @bgruening thanks for the comment. Tool upgrading is close to being finished. We will get tensorflow v2.4.* soon. The datatype and dynamic options for hyperparameters are the last pieces.
    kxk302
    @kxk302
    Hello
    Was wondering if someone can help with this PR? I'm trying to get all my presentations ready before GCC and would appreciate any help.
    galaxyproject/training-material#2476
    Qiang Gu
    @qiagu
    [attached image: image.png]
    How about this kind of preview for h5mlm?