These are chat archives for nextflow-io/nextflow

15th
Nov 2016
Trevor Tanner
@tantrev
Nov 15 2016 02:01
I've been following the faq (specifically question 2) on how to get folder names to appear in the publishDir, but can't seem to figure it out. Specifically, I can only get the "output:" line to work if I write "set val(datasetID)", instead of "set (datasetID)". Are there any other resources documenting output folder name behavior?
Trevor Tanner
@tantrev
Nov 15 2016 03:20
I was able to hack something together with a Closure and the "saveAs" method in the publishDir options, but it'd be nice to have the faq's functionality working. Also, I had to manually create a routine to create folders that didn't exist in my transformed saveAs target paths. Could be nice if such folders were automatically created by Nextflow.
Mike Smoot
@mes5k
Nov 15 2016 04:18
@tantrev It's not quite clear from your questions which directories you'd like created with which names. Can you maybe post your code?
Paolo Di Tommaso
@pditommaso
Nov 15 2016 08:19
@tantrev Yes, there was a typo in the FAQ. Thanks for reporting that. Regarding folders returned by saveAs they should be automatically created.
Trevor Tanner
@tantrev
Nov 15 2016 08:52
@mes5k and @pditommaso - thank you for the kind replies. I apologize, my explanation was rather poor and I indeed was wrong about the path saving issue (forgive my stupidity, I'm a groovy newbie). I still can't seem to figure out the expected "set" behavior from the faq, however. I've made a sample .nf file that showcases the issue here: http://pastebin.com/raw/scjnLE1w
Alongside the script, I just made an empty "data" folder with empty files ["1.a.txt","1.b.txt","2.c.txt"] to test the expected naming behavior.
The "expected behavior" being that the final directory structure would be:
1/1.a.txt
1/1.b.txt
2/2.c.txt
Paolo Di Tommaso
@pditommaso
Nov 15 2016 09:00
still not understanding why you are creating the published file dirs structure
I'm just using this
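Closure saveClosure = { file ->
    // Folder name is the part of the file name before the first dot
    def relative_target_dir = file.tokenize(".")[0]
    // The last expression is the relative target path returned to publishDir
    relative_target_dir + "/" + file
}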
and it produces the following results
results/
├── 1
│   └── 1.a.txt
└── 2
    └── 2.b.txt
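A minimal stand-alone sketch of the same idea in plain Groovy, assuming file names follow the "<id>.<rest>" pattern of the test data. The typed parameter and the name targetDir are illustrative choices, not part of the original script.

```groovy
// Sketch: map a published file name to "<id>/<file name>",
// where <id> is the text before the first dot (assumed naming scheme).
def saveClosure = { String filename ->
    def targetDir = filename.tokenize('.')[0]
    return "${targetDir}/${filename}".toString()
}

// Stand-alone check with the file names from the test data folder
['1.a.txt', '1.b.txt', '2.c.txt'].each { name ->
    println "${name} -> ${saveClosure(name)}"
}
```

Inside a process, a closure like this would be passed as publishDir 'results', saveAs: saveClosure.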
Trevor Tanner
@tantrev
Nov 15 2016 09:05
Right, that's the expected behavior. And yes, the folder-creating part is unnecessary. The main thing I was trying to point out is in the "process unexpectedBehavior" that doesn't utilize a saveClosure - I thought that a folder structure was supposed to be created from the set system alone.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 09:09
it does that, as long as the output has that structure
e.g.:

    output:
    set val(fake_file_id), file("dir-1/${fake_file_id}") into output_data_1

    script:
    """
    mkdir dir-1/
    echo $fake_file > dir-1/${fake_file_id}
    """
Trevor Tanner
@tantrev
Nov 15 2016 09:15
Ohhh - I get it now. I was just thrown off about the "results/broccoli/" reference. That makes total sense though, thank you for the example.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 09:16
:+1:
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 12:59

Hello!

ERROR ~ No signature of method: nextflow.util.Duration.multiply() is applicable for argument types: (groovy.util.ConfigObject) values: [[:]]
Possible solutions: multiply(java.lang.Number)

My settings.config looks like this:

process {
    executor='slurm'

    $mapping {
        clusterOptions = "--account=hugues --time=${12.h * task.attempt} --mem-per-cpu=3140 --cpus-per-task=${6 * task.attempt}"
    }
}

Any idea what goes wrong here? Probably a typo...

Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 13:06
(0.20.0)
Mokok
@Mokok
Nov 15 2016 14:21
(Hi) random comment incoming :
  • what about processing the value out of the value field ""
  • what about single quotes ' instead of double quotes " ?
Mokok
@Mokok
Nov 15 2016 14:26
  • in the docs I read that the time setting is taken into account by Nextflow for Slurm, and that it is written as 12.hour (or 12h without a dot ...but not 12.h)
    now i go back to my own work :|
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 14:42
thanks @Mokok . I'll try your suggestion about the dot ..
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:31

OK if I submit

clusterOptions = "--time=${task.attempt}"

then my .command.run contains

#SBATCH --time=[:]
Mokok
@Mokok
Nov 15 2016 16:35
doesn't the "task" variable belong to the task level AT runtime ? (have you tried to put the clusterOptions= .. --time in the concerned script directly ?)
(please note that I'm just keeping an eye on this gitter, I'm not using Nextflow currently...maybe soon. I just studied it a bit)
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:38
no it is not in the nf script, it is in the NEXTFLOW_CONFIG script. And yes, as you say, it seems that this is not available yet.
Mokok
@Mokok
Nov 15 2016 16:38
(my ideas may be garbage, @pditommaso is the real hero here :) )
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:39
any pair of eyes is welcome, thanks :)
Mokok
@Mokok
Nov 15 2016 16:39
you're welcome
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:40
doing a time = "${task.attempt}" directive in the config file results in a much clearer error message:
Not a valid time value in process definition: [:]
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:42
@pditommaso Your suggestion on :point_up: June 14, 2016 10:35 AM
doesn't seem to work :p
Mokok
@Mokok
Nov 15 2016 16:43
why does everybody hate single quotes ( ' )
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:44
yes yes this is inside the nextflow script as a process directive
I should probably do that, but having a config file seemed neater
Mokok
@Mokok
Nov 15 2016 16:45
anyway, have a nice end of day, i go off ;)
good luck with this, I prefer a config file too ^^
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 16:45
well you found the issue. Now I just need feedback from paolo on whether this is a bug or a feature.
thanks!
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 17:20
Changing my settings.config to be like this:
process {
    executor='slurm'

    $mapping {
        clusterOptions = "--account=hugues  --mem-per-cpu=3140"
        time = '1h * task.attempt'
    }
}

works :)
but

cpus = '6 * task.attempt'

Does not.

Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 18:00
In the code I see that it is possible to call task.config.getTime().format('HH:mm:ss'), but i don't see the equivalent for CPU's, only task.config.cpus.toString() . (I've been scouting SlurmExecutor and TaskConfig)
Am I right?
Hugues Fontenelle
@huguesfontenelle
Nov 15 2016 18:12
In issue #245
Phil Ewels
@ewels
Nov 15 2016 21:46
Hi @huguesfontenelle - I don't think that you'll ever get task.attempt to work in the config file. I think we tried something a while back, but it won't work as the config file is processed before the pipeline runs. task is only available within a process scope (as it varies according to run-time things), so it can't be included in a config file
Also, if you're using variables inside strings, then they'll need to be double quotes (I think) and need a dollar prefix - eg. "$task.attempt". And using operators won't work inside strings, so '6 * task.attempt' won't be evaluated. It should probably be cpus = { 6 * task.attempt } (no quotes)
Except, as above, it won't work in a config file as task isn't available in that scope
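In plain Groovy, outside of Nextflow, the difference between a quoted string, an interpolated string and a closure can be sketched as below. The task map is a hypothetical stand-in for Nextflow's task object, used only to show when each form is evaluated.

```groovy
// Hypothetical stand-in for Nextflow's task object
def task = [attempt: 2]

def single = '6 * task.attempt'      // plain String: never evaluated
def eager  = "${6 * task.attempt}"   // GString: evaluated right here
def lazy   = { 6 * task.attempt }    // closure: evaluated when called

// Simulate a later retry
task.attempt = 3

println single    // still the literal text
println eager     // value computed before the retry
println lazy()    // value computed now, after the retry
```

Only the closure picks up the attempt number that is current when it is called.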
We do something similar, but set an arbitrary value to what we want for the initial number of cpus. Then in the pipeline script we multiply it by the task attempt:
Phil Ewels
@ewels
Nov 15 2016 21:51
params.process_cpus = 6
process {
    cps { params.process_cpus * task.attempt }
}
You can then set params.process_cpus wherever you want, such as in config files or on the command line
Hope that helps!
Maybe all of that has been discussed above and I'm missing the point..
Paolo Di Tommaso
@pditommaso
Nov 15 2016 21:58
not at all, but I've just commented on the issue opened by Hugues
Phil Ewels
@ewels
Nov 15 2016 21:58
Yeah I just saw, so you can use task in a config file?
Paolo Di Tommaso
@pditommaso
Nov 15 2016 21:59
Yes, but it needs to be wrapped by a closure to defer the evaluation of the expression
Phil Ewels
@ewels
Nov 15 2016 21:59
and you can do stuff like multiplication inside strings?
Paolo Di Tommaso
@pditommaso
Nov 15 2016 21:59
BTW it should be
process {
    cpus = { params.process_cpus * task.attempt }
}
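Applied to the $mapping process from earlier in this thread, a config file using deferred closures might look like the following sketch. The resource values are illustrative, the combination is an assumption pieced together from the messages above, and it has not been tested against a specific Nextflow version.

```groovy
// settings.config (sketch; values are illustrative)
params.process_cpus = 6

process {
    executor = 'slurm'

    $mapping {
        // Closures defer evaluation until the task runs,
        // when task.attempt is known
        cpus = { params.process_cpus * task.attempt }
        time = { 1.h * task.attempt }
        clusterOptions = '--account=hugues --mem-per-cpu=3140'
    }
}
```

task.attempt only goes above 1 when failed tasks are re-submitted, e.g. with errorStrategy 'retry' and a maxRetries value set for the process.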
Phil Ewels
@ewels
Nov 15 2016 21:59
huh, ok I should totally do that in my pipeline. That's much nicer.
yup, in the config, my example was for the pipeline script.. right?
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:00
I know it may look inconsistent, but in the config file you need = to assign a value
in the script it's not required (and not suggested to use)
ahh :+1:
yes in the script you can put also into a string
cpus "${params.process_cpus * task.attempt}"
Phil Ewels
@ewels
Nov 15 2016 22:02
I did a big workaround thing to put all of this into the pipeline script because I didn't think that we could defer evaluation like that, so thought that it was impossible to have task in a config file.. ;)
Nice!
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:03
there are a lot of tricks that are possible thanks to closures .. maybe too many ;)
out of curiosity what are the pros/cons of NF vs clusterflow ?
Phil Ewels
@ewels
Nov 15 2016 22:05
CF is less suited to working at scale - no equivalent to -resume and generally quite a bit messier and more crude
However, it comes with a bunch of analysis pipelines built into it. So if these do what you want already, then it's easier to get up and running with.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:05
interesting
Phil Ewels
@ewels
Nov 15 2016 22:05
Also, it's written in Perl, which seems to be a more popular language amongst bioinformaticians ;)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:05
uhhhh :)
I was thinking python
Phil Ewels
@ewels
Nov 15 2016 22:06
At my last job / institute, CF was perfect for what we were doing - lots of tweaking, small scale runs
New job we work with much larger volumes, reproducibility is more important, stability more important, NF is better suited
Yes - I'm a Python convert! But not many people seem to know groovy (myself included)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:06
great, now you are at Uppsala?
Phil Ewels
@ewels
Nov 15 2016 22:07
Stockholm
Our facility is split with Uppsala though
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:07
nice, I love that city
Phil Ewels
@ewels
Nov 15 2016 22:07
You should come visit :) :+1:
(seriously, if you're interested then it'd be great to have you over! ;) )
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:08
I came in 2014 if I'm not wrong for a course on supercomputing at KTH !
too expensive to come for holidays!!!
as soon as I get fired I will take it into consideration ;)
Phil Ewels
@ewels
Nov 15 2016 22:10
haha, yup! Especially the beer.. Fine if you live here though ;)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:10
yes I guess so
Phil Ewels
@ewels
Nov 15 2016 22:12
As you can probably guess from the volume of support requests, we've been adopting Nextflow for quite a few pipelines here though..
It's come at a good time, trying to get people across various platforms to use similar methodologies where possible.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:13
that sounds nice
so you are deploying the same pipelines across different clusters and groups?
Phil Ewels
@ewels
Nov 15 2016 22:14
Nearly everyone in Sweden uses a single supercomputing platform called UPPMAX. A few different servers there, but they're all pretty similar in setup (though some have no internet connection for data privacy, which is a huge pain)
Been having a lot of problems with it lately though, lots of downtime. So a few of us are quietly exploring alternatives to see what is possible (eg. recent AWS / docker enquiries)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:15
though some have no internet connection for data privacy
same at BSC ..
Phil Ewels
@ewels
Nov 15 2016 22:15
Yup, makes development work very slow..
But yeah, the ideal is to build pipelines for ourselves (core sequencing facility) which other people like / understand / trust and can use themselves
So getting other people to run the same pipelines in different groups eventually, yes
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:16
I know.. supercomputing people really do not understand how bioinformaticians work
Phil Ewels
@ewels
Nov 15 2016 22:17
And also by virtue of unexpected interest by some people, maybe different clusters. I've had a couple of requests from people not using uppmax so that's a bonus.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:17
nice, we have exactly the same requirements
gradually shifting to the cloud whenever the cost makes it possible
have you already tested NF on AWS ?
Phil Ewels
@ewels
Nov 15 2016 22:18
yup, our main aim with our current AWS testing is to get a more concrete figure on price equivalence, to use as a bargaining chip / plan B when stuff goes wrong
@Galithil is in the process of testing currently.
He needed me to add additional steps to the pipeline (eg. building references) though, also remove a bunch of hardcoded stuff. So been adding that and breaking a lot of stuff over the past few days.
Docker image is building and AWS instance is up now though, so getting closer to getting something to run.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:20
great
Phil Ewels
@ewels
Nov 15 2016 22:21
Do you run any stuff on AWS yet?
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:22
No, mostly on our cluster
but the original idea of NF was to enable transparent migration to the cloud
for the same reasons you were mentioning
Phil Ewels
@ewels
Nov 15 2016 22:23
:+1: It's definitely coming..
We have a test cloud server as part of UPPMAX that we're trying to use too (called smog :cloud: )
So if that ends up being scaled up I could see us using NF with that too
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:24
a private cloud ?
Phil Ewels
@ewels
Nov 15 2016 22:25
Yup
Relatively far off at this point though
smog is a proof of concept running on an old cluster
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:26
based on OpenStack
would be interesting to have a support for it in NF
it's quite popular in private cloud environments
though I'm a bit skeptical about private clouds ..
Phil Ewels
@ewels
Nov 15 2016 22:28
AWS / equivalent would for sure be nicer, but main concern for us is Swedish law
Not sure at this point whether we'll be allowed to run anything Human on servers hosted outside of Sweden / the Nordics
Still, we'll cross these bridges when we get there I think. A fair way off this being a large concern yet.
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:29
I see, I think there's a similar problem at EU level as well
but this makes sense, local consortiums need to be promoted at the EU or regional level
Phil Ewels
@ewels
Nov 15 2016 22:46
Yup! Not easy stuff..
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:47
BTW do you know the author of this PR ?
nextflow-io/nextflow#232
Phil Ewels
@ewels
Nov 15 2016 22:50
Yes, he's a sysadmin at UPPMAX (trying to install nextflow as an environment module at my request)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:51
I've made some comments on his pull request and in the end closed it w/o merging because there was no feedback from him
hope it's not a problem
Phil Ewels
@ewels
Nov 15 2016 22:53
I'll try to follow up with him on my support ticket thread. I gave up with using it as an environment module and started recommending people to just install nextflow themselves locally, but it would be nice if we could get it to work I guess.
Found that the length of time it took to get updates installed on the system didn't really match the speed at which nextflow updates are released ;)
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:54
I agree! :)
Phil Ewels
@ewels
Nov 15 2016 22:54
But hopefully as our pipelines mature, they won't rely on cutting edge changes so much and not running the very latest version won't matter so much
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:55
btw I've already incorporated a change he proposed
and refused the other because there was already a way to handle it
Phil Ewels
@ewels
Nov 15 2016 22:57
Yup, plus his PR had stuff like his own hardcoded github url and other nasties by the look of it..
He's a busy guy, I think he only works part-time with the uppmax stuff and we request a lot of modules / sysadmin tasks
As I stopped bugging him about it I suspect it was buried under more pressing issues
Paolo Di Tommaso
@pditommaso
Nov 15 2016 22:58
a classic for sysadmins .. :grin:
ok, thanks for the nice chat
going offline
Phil Ewels
@ewels
Nov 15 2016 22:59
Yup, same :+1:
Will try out the config file task stuff tomorrow ;)