@peterjc You're doing excellent work maintaining the BLAST+ wrappers, but if you want to share the burden, please open a PR!
I wanted to modify the wrappers so that they accept fasta.gz files, which are becoming prevalent in what we’re doing here. So I’ll fork and take a pass over the weekend
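For anyone wondering what "accept fasta.gz" could look like on the script side, here is a minimal sketch. It is not how the BLAST+ wrappers do it or will do it; the helper names `open_maybe_gzip` and `count_fasta_records` are made up for illustration, and it assumes gzip files can be recognised by their two-byte magic number rather than by file extension.

```python
import gzip


def open_maybe_gzip(path):
    """Open a file in text mode, decompressing on the fly if it is gzipped.

    Hypothetical helper: detects gzip by its magic bytes (0x1f 0x8b)
    instead of trusting the file name.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fasta_records(path):
    """Count FASTA header lines in a plain or gzipped FASTA file."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Sniffing the content instead of the extension matters in Galaxy because dataset files on disk usually do not keep the original file name.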
Peter's repo is so awesome (the most awesome focused tool repository if I picked favorites). A goal of the IUC should be to encourage awesome, curated repos like that, not only to encourage more and more repos to migrate to a big centralized repo. tools-iuc is awesome and y'all do amazing work - but we should be encouraging more people to build awesome repositories like Peter's - I'd be sad to lose his example. :sweat:
I also like the independent, highly curated repos. Would a good 'handle' to them, and a way to organize/endorse, be to add them as subrepos in IUC?
(ostensibly the tool shed should have done this for discoverability, but...)
Subrepos are just the first thing that came to mind, but it could be that or some other way to note/endorse external repos by IUC, seems a useful thing to look into.
PS: Thank you for the kind words, folks. And I'll look forward to a pull request next week from Anton :+1:
Looking for some advice about data format conversion tools. A while back I had investigated a tool to convert from GTF to BED12, specifically for RSeQC tools which require the 12 column variant of BED files. However, at the time, BED12 was a "hidden" format and I wasn't able to specify it as an input type. I believe that has changed, but I'm not sure. At least, I seem to be able to select the format when uploading now. If so, any advice on how I can/should write the conversion tool? I can write a straightforward one, but it would be nice if Galaxy could silently convert for people...
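To make the "straightforward" converter concrete, here is a rough sketch of the core GTF to BED12 logic. It is only an illustration under several assumptions: the function name `gtf_to_bed12` is invented, transcripts are grouped by the `transcript_id` attribute, exons within a transcript do not overlap, the thick region is taken from `CDS` features (stop codons are not added back), and non-coding transcripts get thickStart = thickEnd = chromEnd. It says nothing about how Galaxy registers implicit converters.

```python
import re
from collections import defaultdict

# Pulls the transcript_id value out of the GTF attribute column.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Return one BED12 line (tab-separated string) per transcript."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    cds = defaultdict(list)    # same layout, CDS features only
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in ("exon", "CDS"):
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF is 1-based inclusive; BED is 0-based half-open.
        start0, end = int(fields[3]) - 1, int(fields[4])
        info[tid] = (fields[0], fields[6])
        target = exons if fields[2] == "exon" else cds
        target[tid].append((start0, end))

    bed_lines = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        tx_start, tx_end = blocks[0][0], blocks[-1][1]
        coding = cds.get(tid)
        if coding:
            thick_start = min(start for start, _ in coding)
            thick_end = max(end for _, end in coding)
        else:
            # Assumed convention for non-coding transcripts.
            thick_start = thick_end = tx_end
        sizes = ",".join(str(end - start) for start, end in blocks)
        starts = ",".join(str(start - tx_start) for start, _ in blocks)
        bed_lines.append("\t".join([
            chrom, str(tx_start), str(tx_end), tid, "0", strand,
            str(thick_start), str(thick_end), "0",
            str(len(blocks)), sizes, starts,
        ]))
    return bed_lines
```

Called with an open GTF file handle, it returns one line per transcript that can be written straight to a `.bed` file.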
@mblue9 Yes, I think that would work, or be the best we can do. They don't seem to document the data source/version in the readme, just the content.
Now that content info is good -- maybe put a link to the subread tool package and a quote from the readme that describes what the annotation contains?
like this part, but edited. Users have asked what the gene identifiers are (source), turns out they are entrez, which I didn't know till I looked here (but probably could have found out with a google, ha):
annotation Directory including NCBI RefSeq gene annotations for genomes 'hg19', 'hg38', 'mm10' and 'mm9'.
Each row is an exon. Entrez gene identifiers and chromosomal coordinates are provided for each exon.
As for putting the data into a shared library on the server, that seems possible, but perhaps hard to maintain. Not sure of the best solution here.
The GTN is going to be organizing training material data in a structured format in data libraries. Perhaps can model after that for any built-in annotation hosted on any particular server. And annotate the lib with source links, etc. Then maybe reference that data lib from the tool form (top level) so the tool help doesn't need to be updated every time the wrapper is updated. Just name the dir with the wrapper version whenever we update it. Thoughts?
name the sub-directories, not the top level. sorry if unclear
Hum, on second thought that won't work. It creates a dependency between the data libs on a server and the tool. Back to thinking we could just add some help on the tool form about the built-in annotation content, with links to the source (versioned). When the tool wrapper is updated and a new version of this source is used, the help noting the source version could be part of the update.
Open to other ideas, I don't really like any of mine, but think we do need to be clear about annotation format, content, and source/version ... somehow
@jj-umn reported problem using RSEM tool repo owned by you. Do you know what could be going wrong? Do these tools work for you in the 18.01 stable release? or the 18.05 pre-release? https://biostar.usegalaxy.org/p/28044/
@jennaj I'll add info on the annotation to the featurecounts wrapper help section (as per your suggestion) after this featurecounts header PR galaxyproject/tools-iuc#1890 (assuming I should wait and make a separate PR)
there are also newer versions of featurecounts, 1.6.1 (in bioconda) and 1.6.2 (not in bioconda), so maybe it would be good to update the wrapper
I'd also like to move the featurecounts stranded option up out of "Advanced options" to under "Alignment file" as another PR if no one objects
btw how are people adding gene names to macs2 and diffbind peaks? I don't have a good method for that yet
To get genes overlapping with peaks, you need to get a reference dataset with the gene bounds/coordinates and compare to the peaks by coordinate overlap. Don't think there is an easier way. This gets asked quite a bit, maybe we should build a tool for that ("annotate peaks")
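As a rough illustration of the coordinate-overlap idea, here is a small sketch. The function name `annotate_peaks` and the tuple layouts are made up for this example, coordinates are assumed to be 0-based half-open (BED style), and the scan is a naive per-chromosome loop rather than an interval tree, so it is only meant to show the logic. For real data, an existing interval tool such as `bedtools intersect` is the more sensible route.

```python
from collections import defaultdict


def annotate_peaks(peaks, genes):
    """Attach the names of overlapping genes to each peak.

    peaks: iterable of (chrom, start, end)
    genes: iterable of (chrom, start, end, name)
    Returns a list of (chrom, start, end, [gene names]).
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    annotated = []
    for chrom, p_start, p_end in peaks:
        hits = [
            name
            for g_start, g_end, name in genes_by_chrom.get(chrom, [])
            # Half-open intervals overlap when each starts before the other ends.
            if p_start < g_end and g_start < p_end
        ]
        annotated.append((chrom, p_start, p_end, hits))
    return annotated
```

Peaks that fall in intergenic regions come back with an empty list, which is where a "nearest gene" rule would have to be added if that is what users expect.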
And I like the idea of putting the info in the help and moving the strand assignment up top on the form. Digging for it is not always obvious and it really matters. The alignment has to be stranded, too. Maybe mention that in the help under the option? eg: "Strand setting must be the same as the strand settings used to produce mapped BAM input(s)". Could be more specific.. or better worded .. or we can leave it out and hope for the best :)
Other IUC peeps should comment about which version of featurecounts to use. Not sure which is better, or easier, to incorporate.
They're hitting a problem with mothur and readline - but I wonder why the <requirements> state vsearch but the command line is | mothur ?
oh I see the mothur is coming in via the macros.xml - that makes sense
@mblue9 Homer probably needs an update or replacement...
Maybe create a ticket at IUC github for discussion? We'll have that large codefest in late June/early July at GCCBOSC, could put it out there as a potential project for hacks unless someone else picks it up first
we started a few years ago to integrate the HOMER suite, which is really a great tool collection, but never really got to it anymore
I've been in a discussion with @nekrut about the newick data format in Galaxy. Unfortunately, this format uses the nhx extension instead of just using newick. @nekrut would like me to submit a PR to the Galaxy core to change this extension to be newick. Although nhx is based on newick, it seems to have become deprecated, and phyloxml is advised, at least based on this link: https://sites.google.com/site/cmzmasek/home/software/forester/nhx. But changing the extension in Galaxy will affect tools in both the tools-devteam and tools-iuc repos. My understanding is that support for a new PhyloTree viz is soon coming to the Galaxy core. This viz will be used in the HIV research with which Galaxy is involved, and it will be using the newick format as well. What does the iuc advise here?
can one of the admins grep in the TS to see if this extension is used in TS tools? At least to get an estimate of the size of the problem?
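For a local checkout of tool repos (not the Tool Shed itself), something like the following sketch could give a first estimate. The function name `find_extension_usage` is invented, and it assumes that datatype extensions show up in the `format`, `ext` or `ftype` attributes of tool XML, possibly as comma-separated lists; anything referenced another way (e.g. inside Cheetah templates) would be missed.

```python
import os
import xml.etree.ElementTree as ET

# Attribute names assumed to carry datatype extensions in tool XML.
FORMAT_ATTRS = ("format", "ext", "ftype")


def find_extension_usage(root_dir, extension):
    """List (path, tag, attribute) for every XML element using `extension`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in sorted(filenames):
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                # Skip files that are not well-formed XML.
                continue
            for element in tree.iter():
                for attr in FORMAT_ATTRS:
                    values = [v.strip() for v in element.get(attr, "").split(",")]
                    if extension in values:
                        hits.append((path, element.tag, attr))
    return hits
```

Running `find_extension_usage("tools-iuc", "nhx")` on a clone would list the wrappers that need touching if the extension is renamed.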
We use it for a few Earlham tools
I'm not sure I understand what the problem is, is it just a change of name?
@jennaj I've created an issue for discussion of wrapping HOMER here galaxyproject/tools-iuc#1892
@mblue9 nice! thank you
Hiya there... I'm working on a tool that merges BLAST XML outputs... this tool takes either multiple datasets as input or a list collection, and outputs a single output (i.e. it is a "reduce" step) - is there any way to unify this or must I just use a conditional for switching between the two input modes?
If you use a data parameter with multiple=true you will be able to run it on individual datasets or a collection. You’d need a conditional if you wanted to either (1) have more control of the order of the datasets (use a repeat) or (2) you want to be able to map over a collection instead of reduce it (less likely in this case it sounds like)
Thanks @jmchilton - the tool is done then, #1893 - just waiting on the checks to complete. Thanks for the CWL work on planemo btw - I have yet to use it because of the messiness of the CWLs I've been working with lately but it is nice to see that ecosystem get some attention