These are chat archives for nextflow-io/nextflow

22nd
Feb 2018
Paolo Di Tommaso
@pditommaso
Feb 22 2018 07:50
@ewels sorry, there's something broken in that build
just pushed a new one
  Version: 0.28.0-SNAPSHOT build 4762
  Modified: 22-02-2018 07:35 UTC (08:35 CEST)
refresh your snapshot with this command:
CAPSULE_RESET=1 NXF_VER=0.28.0-SNAPSHOT nextflow info
then use it as usual
NXF_VER=0.28.0-SNAPSHOT nextflow run ...etc
Simone Baffelli
@baffelli
Feb 22 2018 08:34

Hello; I have some doubts regarding the sort option of the groupTuple operator. If I apply the operator to a channel emitting lists, e.g. [dateId, channelId, file], and I use groupTuple(by:1, sort:True), how will the grouped tuples be sorted? Will the relative order be preserved? I.e., if I have a process:

process average {
    input:
    set val(dateId), val(channel), file(toAverage:"*") from filesWithIds.groupTuple(by:1, sort:True)

    shell:
    '''
    somecommand
    '''
}

Will the dateIds in the process correspond to the same file as they did before being collected?

Paolo Di Tommaso
@pditommaso
Feb 22 2018 08:37
elements are supposed to have the same type
it's true, not True
Simone Baffelli
@baffelli
Feb 22 2018 08:38
Whoops, python user here
Paolo Di Tommaso
@pditommaso
Feb 22 2018 08:39
it uses the type's default comparison criteria by default
you can provide your own comparator
Simone Baffelli
@baffelli
Feb 22 2018 08:40
But I don't really understand the mechanics of the sort. If I group lists, will each of these lists be sorted? Or will the result of grouping be sorted?
Well, I see that I should not try using sort. It will apparently first flatten the list, which is something I don't want to happen
Paolo Di Tommaso
@pditommaso
Feb 22 2018 08:44
I don't think so
Simone Baffelli
@baffelli
Feb 22 2018 08:45
well, I (very naively) tried .groupTuple(by:1, sort:{a->println(a)})
And what it shows are the individual elements of the lists
Paolo Di Tommaso
@pditommaso
Feb 22 2018 08:45
that's not a comparator!
you need to specify either a closure returning the element to compare
e.g. { obj -> return obj.field }
or a closure implementing the Comparator interface
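Paolo's two options correspond to the two usual ways of customizing a sort. The sketch below is Python, not Nextflow's Groovy, and the (dateId, fileName) values are made up; it only shows that a one-argument key closure and a two-argument comparator can express the same ordering.

```python
from functools import cmp_to_key

# Hypothetical grouped items: (dateId, fileName) pairs.
items = [("20180203", "c.bin"), ("20180101", "a.bin"), ("20180115", "b.bin")]

# Option 1: a one-argument closure returning the value to compare,
# analogous to { obj -> return obj.field } in Groovy.
by_key = sorted(items, key=lambda obj: obj[0])

# Option 2: a two-argument comparator returning a negative, zero or
# positive number, analogous to a closure implementing Comparator.
def compare_dates(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])

by_comparator = sorted(items, key=cmp_to_key(compare_dates))

print(by_key == by_comparator)  # True
```

A closure like {a -> println(a)} returns nothing useful, so it works as neither form.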
Phil Ewels
@ewels
Feb 22 2018 08:56
$ NXF_VER=0.28.0-SNAPSHOT nextflow -version

      N E X T F L O W
      version 0.28.0-SNAPSHOT build 4762
      last modified 22-02-2018 07:35 UTC (08:35 CEST)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

$ NXF_VER=0.28.0-SNAPSHOT nextflow run [...]
Can only specify option -profile once. -- Check the available commands and options and syntax with 'help'
still no dice..
Paolo Di Tommaso
@pditommaso
Feb 22 2018 08:57
how did you specify it .. ?
Phil Ewels
@ewels
Feb 22 2018 08:59
Ah sorry, just restarted my Mac for an update :smile: I'll check the full command when it’s finished doing its thing.
Simone Baffelli
@baffelli
Feb 22 2018 09:00
@pditommaso I know, I wanted to see what the function was seeing
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:00
multiple profiles need to be provided in the same -profile option separating them by comma
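Put differently, repeating the flag (-profile a -profile b) produces the "Can only specify option -profile once" error, while -profile a,b is accepted. A small Python sketch of a wrapper that builds the command line that way; the helper name and the profile names are hypothetical, only the comma-separated convention comes from the chat.

```python
def nextflow_run_args(script, profiles=(), extra=()):
    """Build argv for `nextflow run`, passing all profiles in one -profile option."""
    args = ["nextflow", "run", script]
    if profiles:
        # A single -profile flag whose value is the comma-separated list;
        # repeating the flag is what triggers the error above.
        args += ["-profile", ",".join(profiles)]
    args += list(extra)
    return args

print(" ".join(nextflow_run_args("main.nf", ["standard", "docker"])))
```

The resulting list can be handed to subprocess.run as-is.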
Simone Baffelli
@baffelli
Feb 22 2018 09:00
Just to try and understand how sorting worked internally
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:01
you are a hacker :)
Simone Baffelli
@baffelli
Feb 22 2018 09:03
More of a person too lazy to check the sources... I still do not get it... what gets sorted?
It looks like the sort tries to sort each sublist individually, breaking the relative order of files and dates.
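Simone's observation can be reproduced with a simplified model. This Python sketch is not Nextflow's groupTuple implementation; it assumes, as observed above, that the sort is applied to each collected list on its own. The tuple layout follows the question ([dateId, channelId, file]); the values and the group_tuple helper are hypothetical.

```python
from collections import OrderedDict

def group_tuple(rows, by, sort_each=False):
    """Group rows on column `by`, collecting every other column into a list.

    With sort_each=True each collected list is sorted independently, which
    models the behaviour observed in the chat and can break row pairing.
    """
    groups = OrderedDict()
    for row in rows:
        others = [v for i, v in enumerate(row) if i != by]
        cols = groups.setdefault(row[by], [[] for _ in others])
        for col, value in zip(cols, others):
            col.append(value)
    result = []
    for key, cols in groups.items():
        if sort_each:
            cols = [sorted(col) for col in cols]
        # Put the grouping key back at its original position.
        result.append(cols[:by] + [key] + cols[by:])
    return result

# Hypothetical [dateId, channelId, file] tuples, all on the same channel.
rows = [
    ("20180203", "ch1", "a.bin"),
    ("20180101", "ch1", "b.bin"),
    ("20180115", "ch1", "c.bin"),
]

# Independent sorting: dates and files are sorted separately, so
# 20180101 ends up next to a.bin, which originally belonged to 20180203.
print(group_tuple(rows, by=1, sort_each=True))

# Row-preserving alternative: sort whole rows by dateId, then group.
print(group_tuple(sorted(rows, key=lambda r: r[0]), by=1))
```

Sorting whole rows before grouping keeps every dateId next to its file, because all collected lists are filled in the same row order.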
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:06
you know -> test case -> github issue -> discuss solution -> PR -> :)
sorry kidding
Simone Baffelli
@baffelli
Feb 22 2018 09:07
I will probably!
I have implemented something similar, regarding the sorting
called sortListOfListsBySublist
I like descriptive names
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:08
wow
Simone Baffelli
@baffelli
Feb 22 2018 09:08
my workflows are full of horrible names like that
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:09
you have a great future as Java developer .. :)
Simone Baffelli
@baffelli
Feb 22 2018 09:10
itCouldBeAnOptionForWhenIAmDoneWithThisDamnedPhD
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:10
GoGoGo!
Simone Baffelli
@baffelli
Feb 22 2018 09:10
IAmSureJavaDevelopersArePaidBetter
ok, let's stop
Paolo Di Tommaso
@pditommaso
Feb 22 2018 09:14
need to write some docs, going offline for a while
Phil Ewels
@ewels
Feb 22 2018 10:38

multiple profiles need to be provided in the same -profile option separating them by comma

Aha, I didn't realise this. That would be the problem then :+1:

Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:39
:ok_hand:
Phil Ewels
@ewels
Feb 22 2018 10:39
I'm thinking that this is overkill to just avoid a few pesky warning messages though
I might just create some dummy empty processes that don't run to avoid them :laughing:
Basically - I'm already choosing which tool to use by choosing which script to run. So it's kind of a pain to have to double-specify it in the profile too.
Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:40
I see, it could be possible to add a switch to turn the warning off
but it sounds like bad practice
but why do you have config for non-existent processes?
conditional execution with an if?
Phil Ewels
@ewels
Feb 22 2018 10:41
I have two pipelines in one repository.
So yes, I could merge both pipeline scripts into one
I guess that could be better
only three out of about 10 steps are shared though
so I just kept them in separate files to keep the script simpler and easier to work with
Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:42
and do you use the same config for both ?
Phil Ewels
@ewels
Feb 22 2018 10:42
yup
Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:42
I see
It's the same for Sarek (the pipeline formerly known as CAW): https://github.com/SciLifeLab/Sarek
Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:43
I see, I need to think about that
Phil Ewels
@ewels
Feb 22 2018 10:43
Makes sense to keep the pipelines in one repo so that everything is together (versioning, similar inputs etc). But having a single script to run everything is really unwieldy
Hence previous discussions about NF modules etc
Paolo Di Tommaso
@pditommaso
Feb 22 2018 10:44
you may want to open an issue to keep track of this problem
Phil Ewels
@ewels
Feb 22 2018 10:44
:+1:
Maxime Garcia
@MaxUlysse
Feb 22 2018 10:50
:+1:
Alexander Peltzer
@apeltzer
Feb 22 2018 12:00
https://github.com/SciLifeLab/NGI-ExoSeq will have similar issues (two-step process) and it makes sense to keep these split... I'll also bookmark the issue to stay in the loop ;-)
Bioninbo
@Bioninbo
Feb 22 2018 13:39
Hello,
I was wondering: is it possible to set the runOptions parameter on a per container basis? So far I have been doing singularity.runOptions = ... However this applies to all my singularity containers.
Paolo Di Tommaso
@pditommaso
Feb 22 2018 13:46
on todo list nextflow-io/nextflow#415
Bioninbo
@Bioninbo
Feb 22 2018 13:50
Ah ok, thanks. Then I'll try doing singularity exec --myoptions mycommand directly in the process where needed
Phil Ewels
@ewels
Feb 22 2018 14:28
@apeltzer - issue is here: nextflow-io/nextflow#621
Ah, you found it already, nvm :)
Bioninbo
@Bioninbo
Feb 22 2018 15:09
The singularity exec --myoptions mycommand command works when I use it in the appropriate work folder, however it doesn't within the script. I think there is a binding issue and the produced files are not exported from the container to the appropriate work folder. I tried -B workpath -B "$PWD" but it didn't help. Any idea on how to solve this problem?
Paolo Di Tommaso
@pditommaso
Feb 22 2018 15:11
why do you need to handle singularity command line at process level ?
Maxime Borry
@maxibor
Feb 22 2018 15:11
Hello,
Is there conditional input/output in NF ? It's not mentioned in the doc, so I'm assuming no...
To achieve this, I'm creating similar processes that deal with different input/output, but basically do the same thing. It works, but is there a better, more NFish way to decrease code duplication?
Félix C. Morency
@fmorency
Feb 22 2018 15:13
You can do something like
output:
file "foo.txt" optional true
You could also check the when: directive. It's in the documentation.
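A rough Python model of what the two directives do, assuming Nextflow's documented behaviour: a task fails when a declared output file is missing unless it is marked optional true, and a when: condition that evaluates to false means the task is not run. The function names and arguments are hypothetical; this is not how Nextflow runs tasks.

```python
import tempfile
from pathlib import Path

def collect_output(workdir, pattern, optional=False):
    """Return files matching `pattern`; a missing non-optional output is an error."""
    matches = sorted(Path(workdir).glob(pattern))
    if not matches and not optional:
        raise FileNotFoundError(f"missing output file(s): {pattern}")
    return matches

def run_task(value, when, script, pattern, optional=False):
    """Skip the task when `when(value)` is false, otherwise run it and collect outputs."""
    if not when(value):
        return None  # like a when: guard evaluating to false, no task is run
    workdir = Path(tempfile.mkdtemp())  # stand-in for the task work directory
    script(workdir, value)
    return collect_output(workdir, pattern, optional)

def make_foo_if_even(workdir, value):
    # Only sometimes produces foo.txt, so the output has to be optional.
    if value % 2 == 0:
        (workdir / "foo.txt").write_text(str(value))

for v in range(3):
    print(v, run_task(v, when=lambda x: x < 2, script=make_foo_if_even,
                      pattern="foo.txt", optional=True))
```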
Bioninbo
@Bioninbo
Feb 22 2018 15:14
@pditommaso I need to specify the --containall option for certain containers, otherwise the script fails for some reason, not really sure why
Paolo Di Tommaso
@pditommaso
Feb 22 2018 15:16
I see, what's the side effect if you apply that setting to all processes ?
Bioninbo
@Bioninbo
Feb 22 2018 15:22
I get this error on a macs2 container: [Errno 28] No space left on device The Python egg cache directory is currently set to: /.../cache/Python-Eggs
Perhaps your account does not have write access to this directory? You can change the cache directory by setting the PYTHON_EGG_CACHE environment variable to point to an accessible directory.
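Besides bind-mounting the cache path, the message itself suggests pointing PYTHON_EGG_CACHE at an accessible directory. A hedged Python sketch of that check; the helper name is hypothetical, the fallback to a fresh temporary directory is an assumption, and it only tests existence and write permission, so it would not detect a full filesystem (errno 28). In a task script the same effect would come from exporting the variable before calling the tool.

```python
import os
import tempfile

def ensure_writable_egg_cache(environ=None):
    """Point PYTHON_EGG_CACHE at a writable directory and return that directory."""
    env = os.environ if environ is None else environ
    current = env.get("PYTHON_EGG_CACHE")
    if current and os.path.isdir(current) and os.access(current, os.W_OK):
        return current  # already usable, leave it alone
    # Assumption: the temporary directory is writable inside the container.
    fallback = tempfile.mkdtemp(prefix="python-eggs-")
    env["PYTHON_EGG_CACHE"] = fallback
    return fallback

print(ensure_writable_egg_cache({"PYTHON_EGG_CACHE": "/nonexistent/cache"}))
```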
Paolo Di Tommaso
@pditommaso
Feb 22 2018 15:24
this is more a topic for the Singularity forum, but I would suggest fixing the issue instead of using task-specific settings
Félix C. Morency
@fmorency
Feb 22 2018 15:25
Sounds like a missing bind path
Bioninbo
@Bioninbo
Feb 22 2018 15:25
In other containers I also get other errors related to conflicting tools between my local install and the container which is why I thought to use the containall option to isolate better certain containers
Ok I'll try to solve this by adding the missing path.
Thanks @pditommaso and @fmorency
Bioninbo
@Bioninbo
Feb 22 2018 15:33
Working now :)
Paolo Di Tommaso
@pditommaso
Feb 22 2018 15:36
How?
Caspar
@caspargross
Feb 22 2018 15:37
Why is it not possible to use multiple channels on a single input declaration? How else can I run the same process on two different channels?
Félix C. Morency
@fmorency
Feb 22 2018 15:38
Merge the channels?
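Félix's suggestion amounts to combining both sources into one stream that a single process consumes; in Nextflow this is usually done with the mix operator (e.g. ch_a.mix(ch_b)), which gives no guarantee about item order. The Python sketch below models only the idea, with a round-robin interleave and a hypothetical source tag so the consumer can still tell inputs apart; it is not how Nextflow schedules tasks.

```python
from itertools import zip_longest

def mix(*channels):
    """Interleave items from several sources into one stream, tagging each
    item with the index of the source it came from."""
    sentinel = object()
    for batch in zip_longest(*channels, fillvalue=sentinel):
        for source, item in enumerate(batch):
            if item is not sentinel:
                yield source, item

def process(tagged):
    # One "process" body handles items coming from every source.
    source, sample = tagged
    return f"processed {sample} (from channel {source})"

reads_a = ["s1", "s2"]
reads_b = ["s3"]
for tagged in mix(reads_a, reads_b):
    print(process(tagged))
```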
Bioninbo
@Bioninbo
Feb 22 2018 15:39
I added the path missing for the macs2 container in the binding options : runOptions = "--containall -B 'workingdir' -B '/home/user/.cache/Python-Eggs'"
Caspar
@caspargross
Feb 22 2018 15:42
but would this not be basically the same idea as directly splitting the output? see nextflow-io/nextflow#97
Félix C. Morency
@fmorency
Feb 22 2018 15:57
I see your point. I don't think that's possible atm
Maxime Borry
@maxibor
Feb 22 2018 16:25
Thanks @fmorency I'll probably change my if/else to when. Won't change the code duplication, but will clean up the code a bit ! Thanks !
Caspar
@caspargross
Feb 22 2018 16:29
@maxibor I'm thinking I have a similar problem. I use a single channel for different kinds of inputs/outputs and an identifier which defines the exact action of the process
the identifier is also in the channel i.e. set id, assembly1 ...
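Caspar's approach, a single channel whose items carry an identifier that selects the action, works like a dispatch table. A minimal Python sketch of the idea; the identifiers and handler functions are hypothetical. In a Nextflow process the same branching can live in the script block, which is Groovy code and can choose which command string to return based on the id value.

```python
def polish(data):
    return f"polish {data}"

def annotate(data):
    return f"annotate {data}"

# Hypothetical identifiers mapped to the action the shared process performs.
HANDLERS = {"polish": polish, "annotate": annotate}

def run(item):
    """Choose the action from the identifier carried with the data."""
    ident, data = item
    handler = HANDLERS.get(ident)
    if handler is None:
        raise ValueError(f"unknown identifier: {ident}")
    return handler(data)

channel = [("polish", "assembly1.fa"), ("annotate", "assembly2.fa")]
for item in channel:
    print(run(item))
```

Adding a new variant then means adding one handler entry rather than duplicating a whole process.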