Marius van den Beek
@mvdbeek
I think that should work
Anton Nekrutenko
@nekrut
alright - I’ll test in the afternoon (expect unusual activity on the channel ;)
Marius van den Beek
@mvdbeek
$ paste <(echo "hi") <(echo "there")
hi    there
alright, good luck!
mblue9
@mblue9
Sorry @lparsons, I missed your notifications above. Totally agree, having different featurecounts outputs for deseq2 vs multiqc/edger/limma-voom is not good. And there's no good reason for it, only that the limma-voom tool that I copied to make the edger tool expected a header, and apparently multiqc needs a header to know what the sample is, whereas the deseq2 wrapper doesn't expect a header. I agree that probably the best solution is to have deseq2 also accept a header and remove the featurecounts deseq2 output. I could do that when I can get the time, unless someone else gets to it first. P.S. You're braver than me, I haven't tried subworkflows yet. I think I'll wait til I hear they work ;) thanks for testing!
mblue9
@mblue9

@jennaj sorry, only seeing your notification now too. The featurecounts annotation files are not a data table; they're in the subread conda package (featurecounts is part of subread). They should be in the annotation dir beside the bin dir, e.g.

$ ls envs/__subread\@1.6.0/annotation/
hg19_RefSeq_exon.txt  hg38_RefSeq_exon.txt  mm10_RefSeq_exon.txt  mm9_RefSeq_exon.txt

but this is the line in the featurecounts wrapper that adds the path to the annotation folder: https://github.com/galaxyproject/tools-iuc/blob/42cb8c709549fd2fe8882be18eaa1dffe17474f8/tools/featurecounts/featurecounts.xml#L12

could something be wrong there? as I don't see the built-in files in usegalaxy.eu either, but they work in our instance (and work well afaict)

mblue9
@mblue9
forgot to add: the built-in annotation format is SAF, as shown here: http://bioinf.wehi.edu.au/featureCounts/
SAF=Simplified Annotation Format
GeneID    Chr    Start    End    Strand
497097    chr1    3204563    3207049    -
497097    chr1    3411783    3411982    -
497097    chr1    3660633    3661579    -
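For anyone who wants to read these files programmatically, here is a minimal Python sketch, assuming a tab-separated file with the five SAF columns and header shown above. The names `SafRecord` and `read_saf` are made up for illustration; they are not part of subread or Galaxy.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SafRecord(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str


def read_saf(handle) -> Iterator[SafRecord]:
    """Yield one record per exon row of a tab-separated SAF file."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if not row or row[0] == "GeneID":  # skip blank lines and the header
            continue
        gene_id, chrom, start, end, strand = row[:5]
        yield SafRecord(gene_id, chrom, int(start), int(end), strand)


# The three example rows from above, joined with tabs.
example = (
    "GeneID\tChr\tStart\tEnd\tStrand\n"
    "497097\tchr1\t3204563\t3207049\t-\n"
    "497097\tchr1\t3411783\t3411982\t-\n"
    "497097\tchr1\t3660633\t3661579\t-\n"
)
records = list(read_saf(io.StringIO(example)))
```

Each row is one exon, so a gene with several exons shows up as several records sharing the same `gene_id`.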
Jennifer Hillman-Jackson
@jennaj
@mblue9 Thanks, that helps a lot. Are these versioned in any way? RefSeq updates daily plus has full releases. Be good to know when a certain amount of time has passed and these should be updated. Might be good to reveal that date (and the original source) to the user.
They are sort of using annotation "blind" with respect to the version this way. I know users supply all kinds of reference data from the history that isn't versioned, but when supplied by Galaxy, we tend to version these (or did.....). Megablast is one example: that tool's target databases were labeled by the GenBank division (which implies source) and date retrieved, in the select name revealed to users.
Not trying to make this overly complicated, just thinking of how cached data should be modeled so it has some legs, eg: easier to update over time, have multiple versions present (for reproducibility), etc
Jennifer Hillman-Jackson
@jennaj
This is a big question and could be done a few ways. Maybe even a new data table, to standardize all reference data -- whether genome fasta/indexes (what is in data tables now) or ref anno (could be in a data table, instead of a specialized per-tool data construct)
I like using data tables slightly better but it is not my call. There could be reasons why not to do that. Hopefully others might comment, maybe @blankenberg has an opinion?
mblue9
@mblue9
@jennaj the version of the featurecounts built-in annotation would be the version of subread I guess e.g. 1.6.0, see here http://subread.sourceforge.net/
it's 4 files (hg19, hg38, mm9 and mm10) of 7Mb each, could they be provided in Shared Data if users want to see them?
Anton Nekrutenko
@nekrut
@mvdbeek -> tx for this echo example
Marius van den Beek
@mvdbeek
np
Anton Nekrutenko
@nekrut
does anyone know where ncbi blast+ is? I cannot see it in iuc or devteam on github
Anton Nekrutenko
@nekrut
aha, @peterjc , I see
Peter Cock
@peterjc
Yep, RE: peterjc/galaxy_blast#101 I assume? But perhaps it would be sensible to move the BLAST+ wrappers into IUC at some point?
Nicola Soranzo
@nsoranzo
@peterjc You're doing excellent work maintaining the BLAST+ wrappers, but if you want to share the burden, please open a PR!
Anton Nekrutenko
@nekrut
+1 here
I wanted to modify the wrappers so that they accept fasta.gz files, which have become prevalent in what we’re doing here. So I’ll fork and take a pass over the weekend
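As a generic illustration of what accepting both plain and gzipped FASTA could look like on the script side, here is a minimal Python sketch. It is not how the BLAST+ wrappers handle input; the helper names `open_maybe_gzip` and `count_fasta_records` are hypothetical.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path: str):
    """Open a text file for reading, transparently handling gzip compression."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def count_fasta_records(path: str) -> int:
    """Count sequences by counting header lines that start with '>'."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Small self-check with a plain and a gzipped copy of the same FASTA.
fasta = ">seq1\nACGT\n>seq2\nGGCC\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "example.fasta")
gz_path = os.path.join(tmpdir, "example.fasta.gz")
with open(plain_path, "w") as out:
    out.write(fasta)
with gzip.open(gz_path, "wt") as out:
    out.write(fasta)
plain_count = count_fasta_records(plain_path)
gz_count = count_fasta_records(gz_path)
```

Checking the first two bytes rather than the file extension is a design choice here, on the assumption that a dataset's file name may not end in `.gz`.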
John Chilton
@jmchilton
Peter's repo is so awesome (the most awesome focused tool repository if I picked favorites). A goal of the IUC should be encouraging awesome, curated repos like that, not just encouraging more and more repos to migrate to a big centralized repo. tools-iuc is awesome and y'all do amazing work - but we should be encouraging more people to build awesome repositories like Peter's - I'd be sad to lose his example. :sweat:
Dannon
@dannon
I also like the independent, highly curated repos. Would a good 'handle' to them, and a way to organize/endorse, be to add them as subrepos in IUC?
(ostensibly the tool shed should have done this for discoverability, but...)
Subrepos are just the first thing that came to mind, but it could be that or some other way to note/endorse external repos by IUC, seems a useful thing to look into.
Peter Cock
@peterjc
Interesting thought. You could also fork and mirror some high profile (semi)independent repositories under the Galaxy GitHub account? We do something like that with work tools https://github.com/HuttonICS - see https://blastedbio.blogspot.co.uk/2016/05/sync-github-mirror-with-cron.html for automating the updating side of this, which ought to be easier
Peter Cock
@peterjc
PS: Thank you for the kind words, folks. And I'll look forward to a pull request next week from Anton :+1:
Lance Parsons
@lparsons
Looking for some advice about data format conversion tools. A while back I had investigated a tool to convert from GTF to BED12, specifically for RSeQC tools which require the 12 column variant of BED files. However, at the time, BED12 was a "hidden" format and I wasn't able to specify it be an input type. I believe that has changed, but not sure. At least, I seem to be able to select the format when uploading now. If so, any advice on how I can/should write the conversion tool? I can write a straightforward one, but it would be nice if Galaxy could silently convert for people...
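To make the coordinate handling concrete, here is a minimal Python sketch of the core of a GTF to BED12 conversion. It assumes the exon records have already been grouped per transcript, and it ignores CDS records by setting thickStart and thickEnd both to chromStart. The function name `exons_to_bed12` is hypothetical, and this is not an existing Galaxy converter.

```python
from typing import List, Tuple


def exons_to_bed12(chrom: str, name: str, strand: str,
                   exons: List[Tuple[int, int]]) -> str:
    """Build one BED12 line from a transcript's GTF exon coordinates.

    exons are (start, end) pairs in GTF convention: 1-based, end inclusive.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0] - 1  # BED starts are 0-based
    chrom_end = max(end for _, end in exons)  # BED ends are exclusive, so the GTF end is unchanged
    block_sizes = [end - start + 1 for start, end in exons]
    block_starts = [start - 1 - chrom_start for start, _ in exons]
    fields = [
        chrom, chrom_start, chrom_end, name,
        0, strand,
        chrom_start, chrom_start,  # thickStart == thickEnd: no CDS (assumption)
        "0", len(exons),
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
    ]
    return "\t".join(map(str, fields))


# Two exons given out of order; the function sorts them by start.
line = exons_to_bed12("chr1", "tx1", "+", [(201, 300), (101, 150)])
```

A real converter would also need to parse the GTF attribute column to group exons by transcript, and would probably want to use CDS records for the thick coordinates.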
Jennifer Hillman-Jackson
@jennaj
@mblue9 Yes, I think that would work, or be the best we can do. They don't seem to document the data source/version in the readme, just the content.
Now that content info is good -- maybe put a link to the subread tool package and a quote from the readme that describes what the annotation contains?
like this part, but edited. Users have asked what the gene identifiers are (source); turns out they are Entrez, which I didn't know till I looked here (but probably could have found out with a google, ha):
--------------
annotation    Directory including NCBI RefSeq gene annotations for genomes 'hg19', 'hg38', 'mm10' and 'mm9'.
              Each row is an exon. Entrez gene identifiers and chromosomal coordinates are provided for each exon.
As for putting the data into a shared library on the server, that seems possible, but perhaps hard to maintain. Not sure of the best solution here.
Jennifer Hillman-Jackson
@jennaj
The GTN is going to be organizing training material data in a structured format in data libraries. Perhaps can model after that for any built-in annotation hosted on any particular server. And annotate the lib with source links, etc. Then maybe reference that data lib from the tool form (top level) so the tool help doesn't need to be updated every time the wrapper is updated. Just name the dir with the wrapper version whenever we update it. Thoughts?
name the sub-directories, not the top level. sorry if unclear
Jennifer Hillman-Jackson
@jennaj
Hum, on second thought that won't work. Creates a dependency between the data libs on a server and the tool. Back to thinking could just add in some help on tool form about the built-in annotation content with links to the source (versioned). When the tool wrapper is updated, and new version of this source is used, the help noting the source version could be part of update.
Open to other ideas, I don't really like any of mine, but think we do need to be clear about annotation format, content, and source/version ... somehow
Jennifer Hillman-Jackson
@jennaj
@jj-umn reported problem using RSEM tool repo owned by you. Do you know what could be going wrong? Do these tools work for you in the 18.01 stable release? or the 18.05 pre-release? https://biostar.usegalaxy.org/p/28044/
mblue9
@mblue9
@jennaj I'll add info on the annotation to the featurecounts wrapper help section (as per your suggestion) after this featurecounts header PR galaxyproject/tools-iuc#1890 (assuming I should wait and make a separate PR)
there are also newer versions of featurecounts: 1.6.1 (in bioconda) and 1.6.2 (not in bioconda). Maybe it would be good to update the wrapper
I'd also like to move the featurecounts stranded option up out of "Advanced options" to under "Alignment file" as another PR, if no one objects
mblue9
@mblue9
btw how are people adding gene names to macs2 and diffbind peaks? I don't have a good method for that yet
Jennifer Hillman-Jackson
@jennaj
To get genes overlapping with peaks, you need to get a reference dataset with the gene bounds/coordinates and compare to the peaks by coordinate overlap. Don't think there is an easier way. This gets asked quite a bit, maybe we should build a tool for that ("annotate peaks")
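The coordinate-overlap comparison described here can be sketched in a few lines of Python. This is a minimal illustration assuming half-open, BED-style intervals and small inputs; the function name `annotate_peaks` and the example genes and peaks are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name), half-open like BED


def annotate_peaks(peaks: List[Interval], genes: List[Interval]) -> Dict[str, List[str]]:
    """Map each peak name to the names of genes whose coordinates overlap it."""
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits: Dict[str, List[str]] = {}
    for chrom, p_start, p_end, p_name in peaks:
        hits[p_name] = [
            g_name
            for g_start, g_end, g_name in genes_by_chrom.get(chrom, [])
            if g_start < p_end and p_start < g_end  # half-open interval overlap
        ]
    return hits


genes = [("chr1", 100, 500, "geneA"), ("chr1", 450, 900, "geneB"), ("chr2", 100, 500, "geneC")]
peaks = [("chr1", 480, 520, "peak1"), ("chr1", 900, 950, "peak2"), ("chr2", 0, 50, "peak3")]
result = annotate_peaks(peaks, genes)
```

This compares every peak against every gene on the same chromosome, which is fine for a sketch but slow on real data; an interval tree or a dedicated tool such as `bedtools intersect` would be the usual choice there.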
Jennifer Hillman-Jackson
@jennaj
And I like the idea of putting the info in the help and moving the strand assignment up top on the form. Digging for it is not always obvious and it really matters. The alignment has to be stranded, too. Maybe mention that in the help under the option? eg: "Strand setting must be the same as the strand settings used to produce mapped BAM input(s)". Could be more specific.. or better worded .. or we can leave it out and hope for the best :)
Other IUC peeps should comment about which version of featurecounts to use. Not sure which is better, or easier, to incorporate.
mblue9
@mblue9
thanks @jennaj, we used to have the homer annotate peaks tool that worked well afaik, this tool: https://toolshed.g2.bx.psu.edu/view/kevyin/homer/f0b5827b6051
but I just tried to install it and it didn't work (might be our setup): I get Conda False in the Manage Tool Dependencies