Maxime Garcia
@MaxUlysse
maybe with a .collect() to be sure to catch multiple executions of said process
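A minimal sketch of that idea (all process and channel names here are hypothetical): the downstream process takes the upstream output through .collect(), so it only starts once every execution of the upstream process has finished.

process concat_fastq {
    input:
    set val(group), file(reads) from grouped_reads_ch   // hypothetical channel of [group, [fastq files]]

    output:
    file("${group}.merged.fastq") into merged_ch

    script:
    """
    cat ${reads} > ${group}.merged.fastq
    """
}

process after_all_groups {
    input:
    file(merged_files) from merged_ch.collect()          // collect() waits for ALL concat_fastq tasks

    script:
    """
    echo "all groups concatenated: ${merged_files}"
    """
}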
Marko Melnick
@Senorelegans
I am concatenating fastq files by groups (I made a parser in Groovy to separate them by group). Is there no way to just make one process wait for another one with a dummy channel or some null variable?
I guess my real issue is that I am reading channels from pairs earlier. And I have a list of the file names with their condition group in a sample table that I can read, but I am struggling to bring them together and operate on them by group.
micans
@micans
@Senorelegans can you make a functioning toy example that illustrates your issue? Grouping is usually done by groupKey -> transpose -> groupTuple; e.g.
Channel.from(['a', [1, 2, 3]], ['b', [4, 5]], ['c', [6, 7, 8]])
   .map { tag, stuff -> tuple( groupKey(tag, stuff.size()), stuff ) }
   .view()
   .transpose()
   .map { tag, num -> [tag, num*num+1 ] }
   .view()
   .groupTuple()
   .view()
Ashley S Doane
@DoaneAS
@rsuchecki any ideas on this issue with Java CPU use when Nextflow is reading the results cache following nextflow -resume? I'm running Nextflow on plenty of available resources (192 CPUs, 10 TB RAM), but I'll try submitting the nextflow command as a job. That way SGE will kill it if CPU usage is too high.
Ashley S Doane
@DoaneAS

Also, I had the

executor {
    queueSize=1000
}

But thinking more about how SGE works, this is not necessary (jobs are not run based on how long they have been in the queue, but based on a job priority that SGE determines and updates). It seems possible that this setting could have caused an issue.

Anthony Ferrari
@af8
Hi all, is it possible to have the publishDir directive resolve symlinks in 'move' mode? I have a folder going from one process to another; I modify its content in the second process and publish it. But what is actually published is the symlink into the first process's workdir. I would like it to be moved physically. Thanks
Rad Suchecki
@rsuchecki
A few ideas @DoaneAS - but I am not convinced by any of them... here are a couple:
  • If the problem persists, try looking into the JVM settings regarding memory and GC, which could be the cause - but why would there be that much garbage to collect in the first place?
  • IO - the NF process having to shift all the result files to publishDir - again, there are not that many of them, so not sure this could be the issue, but you could disable publishDir to see if it makes a difference and re-enable it in another run with -resume to publish the cached results.
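For that second idea, publishing can be made switchable between runs; a minimal sketch, assuming a Nextflow version that supports the enabled option of publishDir and a hypothetical params.publish flag (set to false for the test run, true for the follow-up run with -resume):

process foo {
    publishDir params.outdir, mode: 'copy', enabled: params.publish   // params.publish is hypothetical

    output:
    file('result.txt') into results_ch

    script:
    """
    echo "some result" > result.txt
    """
}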
Ashley S Doane
@DoaneAS
@rsuchecki thanks for the suggestions!
marchoeppner
@marchoeppner
Hi, quick question about joining multiple channels (2 parts to this question):
if I have multiple output channels, each of the format [ a_label, a_result ] - would I put them together by successive ".join" statements?
input_to_summary = read_files_summary
    .join(fusioncatcher_fusions)
    .join(star_fusion_fusions)
    .join(ericscript_fusions)
    .join(pizzly_fusions)
    .join(squid_fusions)
or is there a more elegant way?
Second part, what would be a way to deal with empty channels in this scenario?
The scenario is the following: multiple parallel analyses of the same sequencing data, which are then merged into a report, with each analysis being optional.
micans
@micans
@marchoeppner I don't see anything more elegant; to solve the empty channels I can only think of using mix() combined with groupTuple(), e.g.
a = Channel.from(['a', 1], ['b', 2])
b = Channel.from(['a', 3], ['b', 4])
c = Channel.empty()

a.mix(b).mix(c).groupTuple().view()
marchoeppner
@marchoeppner
ok thanks, I will try that!
micans
@micans
@marchoeppner note that groupTuple() will block until the channel has completed. In your case, you may know the size of each eventual tuple (as it is the number of analyses), so you could give it the size parameter; in that case a tuple is released once it reaches that size.
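A minimal sketch of the size option with made-up values, where each sample key is expected to appear exactly three times (one per analysis):

Channel.from(['s1', 'resA'], ['s2', 'resA'], ['s1', 'resB'], ['s2', 'resB'], ['s1', 'resC'], ['s2', 'resC'])
    .groupTuple(size: 3)   // a tuple is emitted as soon as its key has 3 items
    .view()                // e.g. [s1, [resA, resB, resC]]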
micans
@micans
@marchoeppner a further caveat: the tuples you get can have the analyses in different orders. If the elements are files, I imagine it does not matter much.
marchoeppner
@marchoeppner
that might actually be a problem, since I need to do something like:
input: set val(sample_id),file(analysis1),file(analysis2),file(analysis3) from Foo
so it still is a bit tricky, I reckon..
it's not my pipeline, so I don't have too much control over the basic logic of it all; I'm just trying to implement support for multiple input data sets (right now it assumes that there is only one sample)
micans
@micans

How is this

input: set val(sample_id),file(analysis1),file(analysis2),file(analysis3) from Foo

going to work if some analyses could be missing? As for the order, that could be fixed, I think, with an additional sorting step.
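One way to get a deterministic order, slightly different from a separate sorting step, is the sort option of groupTuple; a toy sketch:

Channel.from(['a', 'z.txt'], ['a', 'b.txt'], ['a', 'm.txt'])
    .groupTuple(sort: true)   // natural ordering of the grouped elements
    .view()                   // [a, [b.txt, m.txt, z.txt]]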

marchoeppner
@marchoeppner
indeed ^^
it used to be multiple arguments under "input:" with an added ".ifEmpty('')" - but that seems difficult to do now
well, the empty channel needs to emit something at least; then one could check each element to see whether it is an actual file or just a placeholder, like '' or NULL
micans
@micans
yes, that would be a way of structuring the program. It would not be an empty channel, it would emit dummy values.
marchoeppner
@marchoeppner
The problem is that this will require much more substantial changes, since we want to join/mix based on a key - so it would have to be [ some_key, NULL ] or something along those lines :D I think the whole pipeline needs to be set up in a different way... maybe have the reporting step work like MultiQC, so that it automatically detects which outputs are present or whatever
and just skip the need for dummy values...
micans
@micans
I've experimented a bit... with a setup like the following, you could perhaps put some intelligence in the script section to detect what it has?
a = Channel.from(['a', 1], ['b', 2])
b = Channel.from(['a', 3], ['b', 4])
c = Channel.from(['a', 5], ['b', 6])
d = Channel.empty()

a.mix(b).mix(c).mix(d).groupTuple().view().set { ch }

process bar {
  echo true

  input: set val(a), val(b) from ch

  shell:
  '''
  echo "one value !{a} other values !{b}"
  '''
}
micans
@micans
If the inputs to the script are files, it could consider file suffixes/infixes. Or the pipeline could notify the script via a different channel.
micans
@micans
(edit to poke @marchoeppner)
Combiz
@combiz_k_twitter
any ideas why a function from a package in R would not be found when run in singularity via nextflow? e.g. could not find function "sumCountsAcrossCells"
and even calling the function with scater::sumCountsAcrossCells gives "Error: 'sumCountsAcrossCells' is not an exported object from 'namespace:scater'"
Combiz
@combiz_k_twitter
ah nevermind, realised it's a singularity image troubleshooting issue, thanks
David Mas-Ponte
@davidmasp
Hi, stupid Groovy question, sorry if it does not make sense.
I am trying to use a parameter from params inside a .map call. Is that possible?

This is what I want to do

bed_split_ch = bed_ch
    .map{it -> tuple(it[0],it[1].splitText(by: 2, file: true))}

and it works. I would like to use params.splitBed instead of 2.

David Mas-Ponte
@davidmasp
okay, I am stupid. It does work as expected... I was running the console and it does not reset the params between executions (could that be it?). Sorry for bothering.
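For reference, the working form described above, with the literal 2 swapped for the pipeline parameter (channel and parameter names as in the original snippet):

bed_split_ch = bed_ch
    .map { it -> tuple(it[0], it[1].splitText(by: params.splitBed, file: true)) }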
Paolo Di Tommaso
@pditommaso
yes, the console can have some tricky behavior because it resumes the same session; there should be a command to clear it in the menu somewhere...
mmatthews06
@mmatthews06
@davidmasp, you're talking about the online Groovy console, or something else?
mmatthews06
@mmatthews06
:thumbsup:
Matthieu Pichaud
@MatPich_twitter
I am running Nextflow on AWS using Batch.
The spun-up instance remains when the workflow is complete.
Is this the expected behavior?
Paolo Di Tommaso
@pditommaso
Instances should be torn down after a while
Stephen Kelly
@stevekm
hey, I also posted this in the Google group, but I am trying to come up with methods to implement unit testing and CI for Nextflow pipelines. I got as far as doing unittest in Python, but I am not sure what exactly I should be testing. Any suggestions? My work so far is here: https://github.com/stevekm/nextflow-ci
possibly also helpful: I started a super basic module to run Nextflow from Python; it's in there under nextflow.py, mostly just a CLI wrapper with ENV variables thrown in
Luca Cozzuto
@lucacozzuto
Hi all, I have a process that should be performed only when a parameter has a certain value, so I use the "when" condition. The problem I have is that the script complains about the absence of the input channel for that process even if the condition is not met... If I make an empty channel instead, it hangs without doing anything... is this the normal behaviour?
micans
@micans
@lucacozzuto I use when a lot exactly as you describe, and it works for me. Can you make an example illustrating the point?
Luca Cozzuto
@lucacozzuto
do you use Channel.create() for the empty channel? I think I found the solution using Channel.empty()
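A minimal sketch of that pattern (everything here is hypothetical except the Channel.empty() and when: mechanics): Channel.empty() completes immediately, so a process gated by when: does not hang waiting on it, whereas a Channel.create() that is never written to or closed will wait forever.

// params.run_step, params.step_input and the process name are invented for this example
optional_in_ch = params.run_step ? Channel.fromPath(params.step_input) : Channel.empty()

process optionalStep {
    input:
    file(x) from optional_in_ch

    when:
    params.run_step

    script:
    """
    echo "processing ${x}"
    """
}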