    Michael Barton
    @michaelbarton
    The problem is that they may all want a different database?
    pbelmann
    @pbelmann
    yes
    Michael Barton
    @michaelbarton
    And providing all databases would take too much space.
    pbelmann
    @pbelmann
    yes
    But at the moment I have kraken and metabat as examples and they don't have to use a database.
    Michael Barton
    @michaelbarton
    That’s good.
    I think for the user, not having a database is simpler, and so they might end up being used more.
    pbelmann
    @pbelmann
    Yes that's true. But another problem is that kraken has a custom database, and custom databases are the worst case for bioboxes because there is no standard. So the kraken developers offer a mini kraken database of about 4 GB that will always be downloaded before kraken starts. I introduced a cache parameter so that a database can be reused, but that is still not a nice solution.
    Michael Barton
    @michaelbarton
    Yes, that’s not ideal either.
    @fungs mentioned converting them to fasta, would that help with standardization?
    I guess that kraken needs it to be in its own special format.
    pbelmann
    @pbelmann
    I'm not sure.
    Even if it is in fasta
    you don't want to add 100+ entries in the yaml.
    Michael Barton
    @michaelbarton
    Yes
    I agree
    That’s tricky.
    pbelmann
    @pbelmann
    Yes
    Michael Barton
    @michaelbarton
    It makes it harder for a user because then they have to manage all the databases too.
    pbelmann
    @pbelmann
    All they would have to do is download the database (I added the ftp link) and reference it in the yaml. I'm not sure if that is such a big problem. I'm more worried that it is not really interchangeable, or at least only to a certain degree.
    Michael Barton
    @michaelbarton
    Yes
    You won’t be able to swap out the bioboxes.
    pbelmann
    @pbelmann
    exactly
    Michael Barton
    @michaelbarton
    If they use different databases, you’d have to change the YAML.
    pbelmann
    @pbelmann
    yes
    Michael Barton
    @michaelbarton
    What do you think?
    pbelmann
    @pbelmann

    I would continue to work on the spec (actually I started today with the validator) and we have to admit that there are some tools that allow bioboxes to be interchangeable only to a certain degree. It does not mean that we should stop pursuing the aim of creating interchangeable tools, but maybe it means that we have to find a way in the long term to make it as easy as possible to use such tools, e.g.:

    • Find a way for the tools to report in yaml which databases they need (I think in the initial proposal you wrote that the tools could report their types, for example.)

      ... we can instead specify a list of morphisms and each container can list which of those they implement. ...

    • Maybe create a container that checks if another container needs a database, downloads it, and places it somewhere (something like an adapter). I mean, if you want to use binning tools that use such databases, you would have to download them anyway.

    I'm sure for profiling tools we will have the same problems.
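
A minimal sketch of the adapter idea above, assuming each container could report the databases it needs by name. The `DECLARED` mapping, the `missing_databases` helper, and the cache layout (one directory per database) are all hypothetical illustrations and not part of any bioboxes spec. The adapter only works out what still has to be fetched; the download itself is left out.

```python
from pathlib import Path

# Hypothetical: the databases each container reports that it needs.
# The names and the structure are illustrative only.
DECLARED = {
    "kraken": ["minikraken"],
    "metabat": [],
}


def missing_databases(container, declared, cache_dir):
    """Return the databases `container` declares that are not in `cache_dir`.

    A database counts as cached when a directory with its name exists
    directly under `cache_dir` (an assumed layout).
    """
    cache = Path(cache_dir)
    return [
        name
        for name in declared.get(container, [])
        if not (cache / name).is_dir()
    ]
```

With an empty cache directory, `missing_databases("kraken", DECLARED, cache_dir)` would list `minikraken` as still to be downloaded, while a container that declares no databases returns an empty list and can be swapped in without touching the cache.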

    pbelmann
    @pbelmann
    What do you think?
    Michael Barton
    @michaelbarton
    I agree, this is a good path forward.
    We can continue to evolve the spec as we have done with the assembler.
    We are discussing putting Docker/bioboxes in production here at the JGI.
    pbelmann
    @pbelmann
    That's good to hear. :+1:
    Michael Barton
    @michaelbarton
    And so this could help identify issues with the specs, however it would mean that developers here would write bioboxes.
    pbelmann
    @pbelmann
    Wow, that would be great
    Michael Barton
    @michaelbarton
    We are mostly interested in the preprocessors and assemblers, as we have standard preprocessing and assembly pipelines.
    This is still longer term though.
    The sysadmins are experimenting with how to run Docker on the shared supercomputer cluster.
    pbelmann
    @pbelmann
    Yes, I would really like to help with the preprocessing containers. I think they are not that difficult, right?
    Michael Barton
    @michaelbarton
    No, they should be simple.
    pbelmann
    @pbelmann
    ah ok
    Michael Barton
    @michaelbarton
    I have to manage some other responsibilities here at the JGI so I have to juggle my time.
    Also my laptop was stolen so I can’t work at home for the time being either.
    pbelmann
    @pbelmann
    Oh no, really?
    Michael Barton
    @michaelbarton
    However I did start to experiment with parsing the signature into the json spec - https://github.com/michaelbarton/bioboxes-signature-validator
    If this works we would not have to write the spec documents ourselves.
    This would simplify development, I think.
    We would only enforce that each container provides the default signature.
    Anyway.
    I have to go.
    pbelmann
    @pbelmann
    Ok. see you