    Michael Barton
    @michaelbarton
    @pbelmann Sorry I didn’t get a chance to look at #147 today.
    pbelmann
    @pbelmann
    no problem Michael
    Christian Frech
    @Gig77
    Bioboxes are awesome and could make all our lives easier. The one thing I worry about is the need for Yaml files, because they generate quite some overhead for both users and developers. Why not stick to good ol' Linux command line parameter syntax (see git, samtools, etc. for good examples)? To keep bioboxes exchangeable a spec could still define required and optional parameters for each class of tools that could even be enforced/validated. So running an assembler could be as easy as 'docker run velvet --input-fastq=in.fastq -o contigs.fa'. Other assembler? 'docker run ray --input-fastq=in.fastq -o contigs.fa'. Proper volume mounts would stay the responsibility of the user so that files can be found. Another advantage of this would be that piping is still possible, e.g. 'cat in.fastq | docker run velvet | gzip > out.fa.gz'. I think that would be closer to the hearts of the creators of GNU/Linux and Docker. Thoughts?
    Christian Frech
    @Gig77
    What about Yaml being an option only for Bioboxes that require complex input data types like assemblers (e.g. via 'docker run velvet --yaml inputs.yaml'), instead of making Yaml mandatory for all Bioboxes?
    pbelmann
    @pbelmann
    @Gig77 We originally started with environment variables, and it became quite complicated when you want, for example, to assign different insert sizes to multiple fasta files.
    But mixing different interfaces might work, yes.
    I'm not sure if we could still integrate piping in bioboxes even with the current yaml based interface.
    We had our longest discussion regarding interfaces in issue #61. Everything in Bioboxes is open for discussion, so feel free to create issues with a proposal for mixing interfaces or passing a yaml with the commandline.
    Michael Barton
    @michaelbarton
    We could consider a simpler command line interface over the top of the YAML API. This might be a script that takes a fastq file and then takes care of mounting the files and generating the bioboxes.yml file.
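A minimal sketch of the wrapper idea described above, assuming a single FASTQ file. The container-side mount points (`/bbx/input`, `/bbx/output`), the document layout, the `"0.9.0"` version string, and the function name `build_invocation` are all illustrative assumptions, not something confirmed in this conversation. Serializing the document to YAML and mounting it into the container are left out.

```python
import os


def build_invocation(fastq_path, output_dir, image="bioboxes/velvet"):
    """Return (document, docker_argv) for one FASTQ file.

    Hypothetical helper: mount points and document layout are assumptions.
    """
    fastq_path = os.path.abspath(fastq_path)
    host_dir, file_name = os.path.split(fastq_path)

    # Assumed container-side locations for input and output data.
    container_in = "/bbx/input"
    container_out = "/bbx/output"

    # The document refers to the file as the container will see it,
    # not as it is named on the host.
    document = {
        "version": "0.9.0",  # placeholder version string
        "arguments": [
            {"fastq": [{"id": "reads_0",
                        "type": "paired",
                        "value": container_in + "/" + file_name}]}
        ],
    }

    # Read-only mount for the input directory, read-write for the output.
    docker_argv = [
        "docker", "run",
        "--volume=%s:%s:ro" % (host_dir, container_in),
        "--volume=%s:%s:rw" % (os.path.abspath(output_dir), container_out),
        image,
    ]
    return document, docker_argv
```

The user only supplies a host path; the wrapper decides what to mount and what path to write into the generated file.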
    pbelmann
    @pbelmann
    @michaelbarton I agree
    Michael Barton
    @michaelbarton
    @Gig77 Would you be interested in implementing a simpler CLI over the existing YAML one?
    ecerami
    @ecerami
    hello, bioboxes people. I had a few newbie questions for you...
    Michael Barton
    @michaelbarton
    Yes, I’ll try to help if I can.
    You can ask any questions you have.
    ecerami
    @ecerami
    hi. Sorry, stepped away. I am basically wondering which doc to read that explains how bioboxes is distinct from docker.
    Michael Barton
    @michaelbarton
    Bioboxes is a standard for docker containers. We make suggestions for how certain types of docker containers should respond to different inputs and outputs.
    For example the short read assembler spec describes how docker containers of these software should accept input and give output.
    The aim is to make them interchangeable, all with the same interface.
    ecerami
    @ecerami
    ok, thanks. which document explains this though? I am looking at the https://github.com/bioboxes/rfc. this link also appears broken: http://bioboxes.org/getting-started/. I am just looking for best starting point for documentation. thanks.
    Michael Barton
    @michaelbarton
    The website http://bioboxes.org has the most recent documentation. You find that this site doesn’t really explain how bioboxes relates to docker?
    pbelmann
    @pbelmann
    @ecerami the links on https://github.com/bioboxes/rfc are fixed now. You can find the user guide here: http://bioboxes.org/guide/user/
    ecerami
    @ecerami
    thanks, everyone. for a complete newbie, yes I found the documentation a bit hard to follow. for example, this page: https://github.com/bioboxes/rfc is a good intro, but it's not obvious where the actual meat of the RFC is, or whether Assembly, Binning, and Profiling are starting points for specific types of applications, or what I would do if I wanted to create an application that did not fall into one of these three categories. Anyway, I will read more. thanks.
    pbelmann
    @pbelmann
    @ecerami I agree it does not directly lead to the interfaces.
    We want to display the github rfc in bioboxes.org so that a developer/user does not have to switch between bioboxes.org and github. But for now we should maybe just reference it from here: https://github.com/bioboxes/rfc as you stated.
    Could you create an issue in github for everything you think could be improved or even better provide a pull request?
    Michael Barton
    @michaelbarton
    The next bioboxes review meeting is set for July 02, the issue is #159.
    In the last meeting we agreed to have more focused milestones to help organise development goals. The milestone for the next three months will be increasing usage of biobox and to do this we will start tracking downloads - #157.
    @Gig77 In response to your comments and that of others in a similar vein, we will start developing a simpler interface to allow using bioboxes in development workflows #152.
    @pbelmann Do you need the binning validator set up for download from EC2?
    pbelmann
    @pbelmann
    Yes that would be great.
    and assembly benchmark validator too
    Michael Barton
    @michaelbarton
    Ok
    pbelmann
    @pbelmann
    thanks michael
    Michael Barton
    @michaelbarton
    I’ve created a docker container repository which should simplify this.
    The circle ci server still needs the EC2 parameters however.
    So it still requires manual work.
    pbelmann
    @pbelmann
    ok
    Michael Barton
    @michaelbarton
    Perhaps AWS code pipeline might be useful - http://aws.amazon.com/codepipeline/
    It’s still in beta
    pbelmann
    @pbelmann
    But you would still have to provide EC2 keys ?
    Michael Barton
    @michaelbarton
    Yes, hopefully it would allow creating a deployment template or something like that. At the moment I basically have to copy and paste a set of commands each time into circle ci.
    pbelmann
    @pbelmann
    ah ok
    A container runtime too - http://blog.docker.com/2015/06/runc
    Open containers project - https://www.opencontainers.org/
    pbelmann
    @pbelmann
    I think appc was already a great start for a container runtime. I hope they will reuse most of it.
    Michael Barton
    @michaelbarton
    Metrics page is now live on the site - http://bioboxes.org/metrics/
    Michael Barton
    @michaelbarton
    My suggestion for the biobox command line interface
    biobox short-read-assemble bioboxes/velvet -i FASTQ -o CONTIGS
    Michael Barton
    @michaelbarton
    Bioboxes data file - https://github.com/bioboxes/data
    Johannes Dröge
    @fungs
    I think the syntax is clear and just what I was thinking of. I'd suggest to make it a bit more abstract for further extension and then simplify via shortcuts/alias names like:
    biobox run --container docker://bioboxes/velvet --specification biobox.yaml --arguments -i FASTQ -o CONTIGS
    or shorthand:
    biobox run docker://bioboxes/velvet -i FASTQ -o CONTIGS
    docker:// is the container runtime backend
    run is analogous to docker run (further commands to be added),
    the specification can be passed as a file but if the biobox command can link the container id and the spec itself via metadata, then there is no need to pass it.
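To make the shorthand form concrete, here is a rough sketch of how a wrapper might split `biobox run docker://bioboxes/velvet -i FASTQ -o CONTIGS` into its parts. The function name `parse_run_command` and the fallback to `docker` when no backend prefix is given are assumptions for illustration. The long form with `--container`, `--specification` and `--arguments` is not handled.

```python
def parse_run_command(argv):
    """Split ['run', CONTAINER_URI, tool args...] into its parts.

    Hypothetical helper covering only the shorthand form.
    """
    if len(argv) < 2 or argv[0] != "run":
        raise ValueError("expected: run CONTAINER_URI [tool arguments...]")

    container_uri, tool_args = argv[1], list(argv[2:])

    # 'docker://bioboxes/velvet' -> backend 'docker', image 'bioboxes/velvet'
    backend, separator, image = container_uri.partition("://")
    if not separator:
        # Assumption: with no backend prefix, fall back to docker.
        backend, image = "docker", container_uri

    return {"backend": backend, "image": image, "tool_args": tool_args}
```

Everything after the container URI is passed through untouched, so each biobox type can define its own flags without the wrapper knowing about them.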
    Johannes Dröge
    @fungs
    IMO, there should also be an option to pass the YAML file itself (I believe it is better to let the YAML file point to valid data on the local system and to let the biobox wrapper transform it to paths according to the container-internal mount points before passing it to the container).
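The path transformation suggested here could look roughly like the sketch below, which maps host paths to container-internal paths and records which directories need mounting. The mount root `/bbx/input`, the numbered subdirectory scheme, and the name `to_container_paths` are assumptions made for illustration only.

```python
import os


def to_container_paths(host_paths, mount_root="/bbx/input"):
    """Translate host file paths into container paths.

    Returns (translated_paths, mounts), where mounts maps each host
    directory to the container directory it should be mounted at.
    Hypothetical helper: the mount layout is an assumption.
    """
    mounts = {}
    translated = []
    for path in host_paths:
        host_dir, file_name = os.path.split(os.path.abspath(path))
        # Give every distinct host directory its own numbered mount point
        # so files with the same name in different directories do not clash.
        if host_dir not in mounts:
            mounts[host_dir] = "%s/%d" % (mount_root, len(mounts))
        translated.append(mounts[host_dir] + "/" + file_name)
    return translated, mounts
```

The wrapper would then rewrite each path in the user's YAML with its translated value and add one volume mount per entry in `mounts`.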