Helena Rasche
@erasche
@mvdbeek we found a solution! @shiltemann discovered the 'assembly report' in NCBI's ftp, e.g. ecoli k-12 or hg38, where things are separated as "assembled molecule" or "unlocalized-scaffold", so, good, someone does have that information as I suspected :)
I guess we will now look into writing a data manager
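
A minimal sketch of how such an assembly report could be split by sequence role, e.g. as a first step towards a data manager. It assumes the report is tab-separated, that comment lines start with "#", and that the columns follow the order listed in COLUMNS; the role strings are assumed to be hyphenated as "assembled-molecule" and "unlocalized-scaffold". The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Assumed column order of an NCBI assembly report (from its "# Sequence-Name ..." header line).
COLUMNS = [
    "Sequence-Name", "Sequence-Role", "Assigned-Molecule",
    "Assigned-Molecule-Location/Type", "GenBank-Accn", "Relationship",
    "RefSeq-Accn", "Assembly-Unit", "Sequence-Length", "UCSC-style-name",
]


def group_by_role(lines):
    """Group sequence names by the value of their Sequence-Role column."""
    by_role = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the "#"-prefixed comment/header block.
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(COLUMNS, line.split("\t")))
        by_role[fields["Sequence-Role"]].append(fields["Sequence-Name"])
    return dict(by_role)


# Made-up rows in the assumed layout; names and accessions are placeholders.
SAMPLE = [
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule",
    "chrA\tassembled-molecule\tA\tChromosome\tACC0001.1\t=\tREF0001.1\tPrimary Assembly\t1000\tchrA",
    "scaf1\tunlocalized-scaffold\tA\tChromosome\tACC0002.1\t=\tREF0002.1\tPrimary Assembly\t200\tna",
    "scaf2\tunplaced-scaffold\tna\tna\tACC0003.1\t=\tREF0003.1\tPrimary Assembly\t150\tna",
]

if __name__ == "__main__":
    for role, names in group_by_role(SAMPLE).items():
        print(role, names)
```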
Björn Grüning
@bgruening
:+1:
Dave B.
@davebx
:thoughtful beard stroke: is there a mechanism to populate a select param's options from a dataset's metadata file?
M Bernt
@bernt-matthias
metadata file?
Dave B.
@davebx
yeah, e.g. the bam index (though not specifically that)
I'm working on a thing to put the info/format/filter data from a vcf file into dataset metadata, but it would get too big for the database, so it needs to go in a file. And then the tool needs a way to access that information so that parameters can be populated
Björn Grüning
@bgruening
@davebx I would love to have this
I was searching for a few days for a tool that does that, as it seems like such a natural use case
but could not find a tool
Dave B.
@davebx
well, we need it for some gatk4 tools, so it's going to be a thing. I have the metadata part done, now I need the tool param part
Björn Grüning
@bgruening
We also need this for a few HiC tools
Helena Rasche
@erasche
code files?
that's the normal thing
Dave B.
@davebx
I thought those were deprecated?
Helena Rasche
@erasche
see conversation from above
according to marius
they're only deprecated in the sense that "if there is a better way don't use them"
Dave B.
@davebx
aha
Helena Rasche
@erasche
(which was a TIL for me too)
Björn Grüning
@bgruening
I'm not sure those tools will work with Pulsar etc ...
Helena Rasche
@erasche
they should
it's calculated beforehand
data sent with job
they don't work if you want to use any fun dependencies
e.g. apollo tools which had to copy-and-paste in a TTL library
M Bernt
@bernt-matthias

Since I'm currently working on the data_meta filter, the discussion is really helpful. Currently only metadata given in the form of text or lists is used properly. Anything else, like dict, OrderedDict, or even files, is currently ignored (or rather: I don't know what would happen). For the files my guess would be that the filenames are listed.

Is there a list somewhere of the data structures in which metadata may be encoded?

Dave B.
@davebx
I'm not aware of an existing list
M Bernt
@bernt-matthias

Seems to be int, string, list, OrderedDict (which are MetadataParameter) and file names (which are metadata.FileParameter), so one can differentiate them.

Question is if there is an easy way to address the data of interest

How about the following:

  • for int and string metadata: just add them to the options
  • for metadata given as lists: we extend the options list by the contents of the list
  • for metadata given as mapping types: add the key value pairs as comma separated strings to the options list
  • for files: add the lines of the file to the options list (would only work for text files)

For the latter case, the relevant pieces of the info can be extracted with a combination of the regexp and multiple_splitter filters.

One thing that one needs to keep in mind is that usually only a part of large files is considered.
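
The four cases proposed above could be sketched roughly as follows. This is not the data_meta filter's actual code: the function name is hypothetical, a pathlib.Path is used as a stand-in for a file-backed metadata element, and max_lines is an assumed cap that mimics reading only part of a large file.

```python
from collections import OrderedDict
from itertools import islice
from pathlib import Path


def metadata_to_options(value, max_lines=1000):
    """Flatten one metadata value into a list of option strings.

    Hypothetical helper: Path stands in for a file-backed metadata element.
    """
    if isinstance(value, Path):
        # Files: one option per line, text files only, capped at max_lines.
        with value.open() as handle:
            return [line.rstrip("\n") for line in islice(handle, max_lines)]
    if isinstance(value, (int, str)):
        # Scalars: add the value itself as a single option.
        return [str(value)]
    if isinstance(value, dict):
        # Mapping types (dict, OrderedDict): "key,value" strings.
        return ["%s,%s" % (key, val) for key, val in value.items()]
    if isinstance(value, (list, tuple)):
        # Lists: extend the options by the contents of the list.
        return [str(item) for item in value]
    raise TypeError("unsupported metadata type: %r" % type(value))


if __name__ == "__main__":
    print(metadata_to_options(4))
    print(metadata_to_options(["chr1", "chr2"]))
    print(metadata_to_options(OrderedDict([("DP", "Integer"), ("AF", "Float")])))
```

The "key,value" strings from the mapping case would still need a splitting step afterwards to pick out the part of interest.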

M Bernt
@bernt-matthias
One alternative would be to implement something analogous to from_dataset, e.g. from_dataset + key ...
Dave B.
@davebx
yeah, I was thinking along the lines of (vague idea so far) <options from_metadata="metadata.FileParameter element name" format="json" values="jsonfile.key[.key.key]" /> or <options from_metadata="metadata.FileParameter element name" format="xml" values="(tag name)" />
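
A rough sketch of what resolving a values="jsonfile.key[.key.key]" style path might look like for the json case. The from_metadata attribute above is only an idea at this point, so everything here is hypothetical: the function name, the dotted-path convention, and the sample document.

```python
import json


def values_from_json(text, key_path):
    """Walk a dotted key path through a JSON document and return option strings."""
    node = json.loads(text)
    for key in key_path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    if isinstance(node, dict):
        # For an object, offer its keys as options.
        return list(node)
    if isinstance(node, list):
        return [str(item) for item in node]
    return [str(node)]


# Made-up metadata document for illustration.
SAMPLE_JSON = '{"info": {"fields": ["AC", "AF", "DP"]}, "filter": {"q10": "low quality"}}'

if __name__ == "__main__":
    print(values_from_json(SAMPLE_JSON, "info.fields"))
    print(values_from_json(SAMPLE_JSON, "filter"))
```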
M Bernt
@bernt-matthias
OK - I see. You are thinking of more complex metadata file types like json or xml. I'm not sure this is a good idea, because it potentially adds quite a bit of computation to the filters. Just imagine a collection of such datasets: to compute the option values, an xml/json file needs to be parsed for each one. Also I think it would be a good idea to stick to the "rule" of considering only the first MB of the data -- which might be a problem.
M Bernt
@bernt-matthias
Still, I think allowing simple txt files is a good idea. The same could then be achieved by doing a bit of precomputing in the datatype (i.e. preparing an additional simple txt file (or dict) from the more complex files). Then this happens only once, when the dataset is created. ... Though datatypes also have this 1MB "rule" -- I guess mainly because metadata computation happens on the head node (as far as I know) .. it would be better to have this happen in the jobs.
M Bernt
@bernt-matthias
My last commit (https://github.com/galaxyproject/galaxy/pull/8599/commits/e41f13443fc36412777fb4d63e2e618fa65dca2b) in galaxyproject/galaxy#8599 would add mapping types to the data_meta filter. I guess files would be possible in the same way - simply as an additional if statement.
Dave B.
@davebx
I suppose we could store the VCF header information in a tabular file
at least INFO and FORMAT seem to have 4 key-value pairs each
dang, FILTER is an exception
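
One way the tabular idea could look, as a sketch only: INFO and FORMAT header lines carry ID, Number, Type and Description, while FILTER lines carry just ID and Description, so the missing columns are filled with "." here. The function name, the column layout, and the "." placeholder are assumptions, not an existing Galaxy datatype feature.

```python
import re

# Matches "##INFO=<...>", "##FORMAT=<...>" and "##FILTER=<...>" header lines.
HEADER_RE = re.compile(r'^##(INFO|FORMAT|FILTER)=<(.*)>$')
# Matches key=value pairs; quoted values may contain commas.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')

COLUMNS = ["ID", "Number", "Type", "Description"]


def header_rows(lines):
    """Turn VCF INFO/FORMAT/FILTER header lines into rows of equal width."""
    rows = []
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if not match:
            continue
        section, body = match.groups()
        pairs = {key: val.strip('"') for key, val in PAIR_RE.findall(body)}
        # FILTER has no Number/Type, so those cells become ".".
        rows.append([section] + [pairs.get(col, ".") for col in COLUMNS])
    return rows


# Made-up header lines for illustration.
SAMPLE_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all samples">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FILTER=<ID=q10,Description="Quality below 10">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

if __name__ == "__main__":
    for row in header_rows(SAMPLE_HEADER):
        print("\t".join(row))
```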
Lucille Delisle
@lldelisle
Hi,
When you use planemo locally to test a wrapper, how do you know which version of galaxy is used?
Nicola Soranzo
@nsoranzo
By default it's the master branch
Lucille Delisle
@lldelisle
Thanks @nsoranzo
Nadia Goué
@nagoue
Hi, we have updated our galaxy-dev instance to 19.05 and I face an issue with data_manager_manual: No module named 'urllib2'. I removed and reinstalled the tool, but the error remains. As urllib2 is not compatible with Python 3, how should I handle this?
Nadia Goué
@nagoue
OK, think I solved it. I will do a PR
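
For reference, a common pattern for this kind of error is an import fallback like the one below. This is a generic sketch, not necessarily what the fix for data_manager_manual looks like, and it only covers urlopen; other urllib2 names moved to urllib.request and urllib.error in a similar way.

```python
try:
    # Python 3: urlopen lives in urllib.request.
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: the old urllib2 module.
    from urllib2 import urlopen


def read_url(url):
    """Fetch a URL and return the raw response body as bytes."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
```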
Nicola Soranzo
@nsoranzo
@nagoue Also have a look at galaxyproject/tools-iuc#2032
Nadia Goué
@nagoue
@nsoranzo, thanks, I didn't see it
Dave B.
@davebx
@bernt-matthias so for a real-world use case for what I have in mind, GATK's VariantsToTable function takes a list of fields from the INFO part of the vcf header. I'd rather not store all of that in database metadata, so a metadata file makes sense to me, but (as discussed) we don't (as far as I can tell) have an option to populate e.g. a multi-select from metadata stored in a file
Prashant Kumar Kuntala
@PrashantKuntala
What is the recommended way to add a Java jar tool? I see that jar files are git-ignored
Dave B.
@davebx
I usually put the jar stuff in bioconda
see also gatk, picard
Prashant Kumar Kuntala
@PrashantKuntala
thank you @davebx
Prashant Kumar Kuntala
@PrashantKuntala
Hi everyone, I'm trying to use Pillow, Python's imaging library, as a requirement for one of the tools, and planemo tests fail with the error ImportError: No module named PIL. My understanding is Pillow is not available in conda-forge or bioconda. Is there a way to use Pillow