Maxime Garcia
@MaxUlysse
I'll make a minimal example and an issue
Sri Harsha Meghadri
@harshameghadri
Hey folks, I am pretty new to using Docker and Singularity. I want to use nf-core/rnaseq for my analysis. I consistently get errors while trying to execute this command: singularity pull --name nf-core-rnaseq-1.3.img docker://nf-core/rnaseq:1.3
Unable to pull docker://nf-core/rnaseq:1.3: conveyor failed to get: Error reading manifest 1.3 in docker.io/nf-core/rnaseq: errors:
I am trying to pull to rackham. My analysis needs to be executed on bianca. Any tips are super appreciated.
Maxime Garcia
@MaxUlysse
try singularity pull --name nf-core-rnaseq-1.3.img docker://nfcore/rnaseq:1.3 instead, without the - in nf-core
I'm guessing if you have more questions about nf-core pipelines, it would be better to ask on our Slack: https://nf-co.re/join
Sri Harsha Meghadri
@harshameghadri
I tried that as well, getting the same error @MaxUlysse
Maxime Garcia
@MaxUlysse
You tried that on bianca?
Sri Harsha Meghadri
@harshameghadri
nope on rackham, I guess it needs internet
Maxime Garcia
@MaxUlysse
Sure
Just trying to find an easy mistake, sorry ;-)
Have you tried running that on an interactive node?
Sri Harsha Meghadri
@harshameghadri
hmmm on rackham? I don't have an allocation there.
Maxime Garcia
@MaxUlysse
I'm afraid singularity might be too demanding on the regular login node
Let me see if I can help you in another way
Sri Harsha Meghadri
@harshameghadri
perfect, thank you Maxime.
Maxime Garcia
@MaxUlysse
@harshameghadri I messaged you ;-)
Marko Melnick
@Senorelegans
Is there any way to force a process to wait for another process to finish before it is started? I tried to do it with a dummy channel, but I am concatenating files, so the number of items in the channel changes from one process to the next.
Maxime Garcia
@MaxUlysse
can't you use the output from the process that needs to be finished as an input for the other?
maybe with a .collect() to be sure to catch multiple executions of said process
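A minimal sketch of what Maxime is suggesting (process and channel names are hypothetical; DSL1 syntax as used elsewhere in this chat): the second process takes the collected output of the first as an input, so it cannot start until every task of the first process has finished.
process first {
    input:
    val x from Channel.from(1, 2, 3)

    output:
    file('done.txt') into first_done

    """
    echo $x > done.txt
    """
}

process second {
    // collect() gathers all 'first' outputs into a single emission,
    // so this process only starts once every 'first' task has completed
    input:
    file(all_done) from first_done.collect()

    """
    echo "upstream finished: $all_done"
    """
}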
Marko Melnick
@Senorelegans
I am concatenating fastq files by group (I made a parser in Groovy to separate by group). Is there no way to just make one process wait for another one with a dummy channel or some null variable?
I guess my real issue is that I am reading the channels from pairs earlier. I have a list of the file names with their condition group in a sample table that I can read, but I am struggling to bring them together and operate on them by group.
micans
@micans
@Senorelegans can you make a functioning toy example that illustrates your issue? Grouping is usually done by groupKey -> transpose -> groupTuple; e.g.
Channel.from(['a', [1, 2, 3]], ['b', [4, 5]], ['c', [6, 7, 8]])
   .map { tag, stuff -> tuple( groupKey(tag, stuff.size()), stuff ) }  // record the expected group size in the key
   .view()
   .transpose()                                                        // emit one item per list element
   .map { tag, num -> [tag, num*num + 1] }                             // per-item work
   .view()
   .groupTuple()                                                       // regroup; each tuple is released once it reaches its groupKey size
   .view()
Ashley S Doane
@DoaneAS
@rsuchecki any ideas on this issue with Java CPU use when Nextflow is reading the results cache following nextflow -resume? I'm running Nextflow on plenty of available resources (192 CPUs, 10 TB RAM), but I'll try submitting the nextflow command as a job; that way SGE will kill it if CPU usage is too high.
Ashley S Doane
@DoaneAS

Also, I had the following in my config:

executor {
    queueSize=1000
}

But thinking more about how SGE works, this is not necessary (jobs are not run based on how long they have been in the queue, but based on a job priority that SGE determines and updates). It seems possible that this setting could have caused an issue.

Anthony Ferrari
@af8
Hi all, is it possible to have the publishDir directive resolve symlinks in move mode? I have a folder going from one process to another; I modify its content in the second process and publish it. But what is actually published is the symlink to the first process's work dir. I would like it to be moved physically. Thanks
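For context, a rough sketch of the setup Anthony describes (names are hypothetical). By default Nextflow stages inputs as symlinks, so publishing the "modified" folder can publish a link back into the upstream work dir; staging a physical copy with stageInMode is one possible workaround, not confirmed in this thread.
process modify {
    stageInMode 'copy'                     // stage a real copy instead of a symlink
    publishDir 'results', mode: 'move'

    input:
    file(data_dir) from upstream_dir_ch    // hypothetical upstream channel

    output:
    file(data_dir) into modified_dir_ch

    """
    touch ${data_dir}/extra_file.txt
    """
}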
Rad Suchecki
@rsuchecki
A few ideas @DoaneAS - but I am not convinced by any of them... here are a couple:
  • If the problem persists, try looking into the JVM settings regarding memory and GC - which could be the cause - but why would there be that much garbage to collect in the first place?
  • IO - the NF process having to shift all the result files to publishDir - again, there are not that many of them, so I am not sure this could be the issue, but you could disable publishDir at first to see if it makes a difference and re-enable it in another run with -resume to publish the cached results.
Ashley S Doane
@DoaneAS
@rsuchecki thanks for the suggestions!
marchoeppner
@marchoeppner
Hi, quick question about joining multiple channels (2 parts to this question):
if I have multiple output channels, each of the format [ a_label, a_result ] - would I put them together by successive ".join" statements?
input_to_summary = read_files_summary
    .join(fusioncatcher_fusions)
    .join(star_fusion_fusions)
    .join(ericscript_fusions)
    .join(pizzly_fusions)
    .join(squid_fusions)
or is there a more elegant way?
Second part, what would be a way to deal with empty channels in this scenario?
The scenario is the following: multiple parallel analyses of the same sequencing data, which are then merged into a report - each analysis being optional.
micans
@micans
@marchoeppner I don't see anything more elegant; to solve the empty channels I can only think of using mix() combined with groupTuple(), e.g.
a = Channel.from(['a', 1], ['b', 2])
b = Channel.from(['a', 3], ['b', 4])
c = Channel.empty()

a.mix(b).mix(c).groupTuple().view()
marchoeppner
@marchoeppner
ok thanks, I will try that!
micans
@micans
@marchoeppner note that groupTuple() will block until the channel has completed. In your case, you may know the size of each eventual tuple (as it is the number of analyses), so you could give it the size parameter; in that case a tuple is released once it reaches that size.
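For illustration, continuing micans' earlier mix()/groupTuple() example: if each key is expected to appear in exactly three of the mixed channels, the size parameter lets groupTuple emit a tuple as soon as the third item arrives (this assumes the number of analyses is known up front).
a = Channel.from(['a', 1], ['b', 2])
b = Channel.from(['a', 3], ['b', 4])
c = Channel.from(['a', 5], ['b', 6])

// each tuple is released once 3 items share the same key,
// instead of waiting for the whole channel to complete
a.mix(b).mix(c).groupTuple(size: 3).view()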
micans
@micans
@marchoeppner a further caveat: the tuples you get can have the analyses in different orders. If the elements are files, I imagine it does not matter much.
marchoeppner
@marchoeppner
that might actually be a problem, since I need to do something like:
input: set val(sample_id),file(analysis1),file(analysis2),file(analysis3) from Foo
so it is still a bit tricky, I reckon..
it's not my pipeline, so I don't have too much control over the basic logic of it all; I am just trying to implement support for multiple input data sets (right now it assumes that there is only one sample)
micans
@micans

How is this

input: set val(sample_id),file(analysis1),file(analysis2),file(analysis3) from Foo

going to work if some analysis could be missing? As for the order, I think that could be fixed with an additional sorting step.
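As an illustration of that sorting step (a sketch; grouped_ch stands for a hypothetical channel of [ sample_id, list_of_files ] tuples coming out of groupTuple()):
grouped_ch
    .map { sample_id, files -> tuple( sample_id, files.sort { it.name } ) }   // stable order by file name
    .view()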

marchoeppner
@marchoeppner
indeed ^^
it used to be multiple arguments under "input:" with an added ".ifEmpty('')" - but that seems difficult to do now
well, the empty channel needs to emit something at least; then one could verify each element to see if it is an actual file or just a placeholder, like '' or NULL
micans
@micans
yes, that would be a way of structuring the program. It would not be an empty channel, it would emit dummy values.
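A sketch of that dummy-value idea (the channel names and the NO_FILE placeholder are hypothetical): each optional analysis emits [ sample_id, placeholder ] when it produced nothing, so the join never stalls, and a downstream step filters the placeholders out.
EMPTY = file('NO_FILE')                               // placeholder standing in for a missing result

analysis1 = Channel.from( ['s1', file('a1.txt')] )
analysis2 = Channel.from( ['s1', EMPTY] )             // this analysis was skipped

analysis1
    .join(analysis2)
    .map { id, a1, a2 -> tuple( id, [a1, a2].findAll { it.name != 'NO_FILE' } ) }
    .view()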
marchoeppner
@marchoeppner
problem is that this will require much more substantial changes, since we want to join/mix based on a key - so it would have to be [ some_key, NULL ] or something along those lines :D I think the whole pipeline needs to be set up in a different way... maybe have the reporting step work like MultiQC, so that it automatically detects which outputs are present or whatever
and just skips the need for dummy values...
micans
@micans
I've experimented a bit... with a setup like the following, you could perhaps stick some intelligence in the script section to detect what it has?
a = Channel.from(['a', 1], ['b', 2])
b = Channel.from(['a', 3], ['b', 4])
c = Channel.from(['a', 5], ['b', 6])
d = Channel.empty()

a.mix(b).mix(c).mix(d).groupTuple().view().set { ch }

process bar {
  echo true

  input: set val(a), val(b) from ch

  shell:
  '''
  echo "one value !{a} other values !{b}"
  '''
}