These are chat archives for nextflow-io/nextflow

4th
Apr 2019
Michael Webb
@michaelwebb_twitter
Apr 04 02:50
I also submitted this as a bug, with some additional info: nextflow-io/nextflow#1098
Looking at the log file, the 10 second gap occurs between these two lines:
Apr-03 19:38:19.570 [main] DEBUG nextflow.Session - Executor pool size: 8
Apr-03 19:38:29.612 [main] DEBUG nextflow.cli.CmdRun -
teoKusa!
@teoKusa_twitter
Apr 04 07:06
test
hi, is it possible to reuse the same variable name as input and output in a process? i.e. get a variable as input and put it in an output channel
I'm trying this in a process:
    input:
    file modeld from guess_model_dir

    output:
    set val(modeld.baseName.split("_")[1])
however I get "ERROR ~ No such variable: modeld"
Lavanya Veeravalli
@veeravalli
Apr 04 07:17
The -resume option on an AWS Batch setup fails with the following error message: Apr-04 15:10:38.615 [main] ERROR nextflow.cli.Launcher - @unknown java.lang.UnsupportedOperationException: null at com.upplication.s3fs.S3FileSystemProvider.move(S3FileSystemProvider.java:554). The same script run locally with -resume is able to cache and proceed further. Anything I am missing here? If I delete the publish dir, then the whole workflow starts from the first step and is not able to cache the earlier steps. I even tried adding cache = 'deep' but it doesn't help. Thanks.
Anthony Underwood
@aunderwo
Apr 04 08:29

It's something to do with bind paths.
If I bind the parent directory

pwd
/lustre/scratch118/infgen/team212/au3/singularity/work
bash-4.2$ singularity exec --bind /lustre/scratch118/infgen/team212/au3/singularity /lustre/scratch118/infgen/team212/au3/singularity/ghru-assembly.sif /bin/bash -c "zcat /lustre/scratch118/infgen/team212/au3/singularity/assembly_test/fastqs/ERR230474_1.fastq.gz | head"
@ERR230474.1 1/1
TGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATCTTTAATCG
+
?????BBBDDDDDDDEGGGGGGIIIIIIIIIIIICGHHEFHIIIHHIIIHIIIIIH9AAFHHIIIIIIIIHIIIIIIHIIIIIHIHHHHHIHIIIIIIIIACGHIHIIIIHHHGDGHHGHHHHHHHHHHHFGAFFGFGGFGGGGGGGGGE
@ERR230474.2 2/1
TAGCTCATTGATTATCTAGTCATAATTCAAGCAACTACTACAATATAACAAAATCCTTTTTATAACGCAAGTTCATTTTATGCTACTGCTCAATTTTTTTACTTTTATCGATTAAAGATAGAAATACACGATGCGAGCAATCAAATTTCA
+
AAAAABBBDDDDDDDEGGGGGGIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHHHHHGHIHIHIIIIIIIIIIIIIIIIIIIIIHHIHHHHHHHHHHHHDHHHHHHHFHHHHHGGGGHGGGGGGGGGGGGGGGHHE
@ERR230474.3 3/1
TTAGTGAGTGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATC

In the end this was solved by using singularity.runOptions = '--bind /lustre'
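For anyone hitting the same thing, a minimal sketch of that fix as a nextflow.config entry (assuming a Lustre filesystem mounted at /lustre) might look like:

```groovy
// nextflow.config
// Bind the whole /lustre mount into the container, so absolute
// paths under it resolve inside Singularity as well.
singularity {
    enabled    = true
    runOptions = '--bind /lustre'
}
```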

KochTobi
@KochTobi
Apr 04 08:37
@veeravalli -resume works fine for me on AWSBatch. How are you starting your pipeline? Which nextflow version are you using?
Lavanya Veeravalli
@veeravalli
Apr 04 09:08
@KochTobi Thanks. I am using version 19.01.0. As I mentioned earlier, the local version works.
Paolo Di Tommaso
@pditommaso
Apr 04 10:19
I guess it's this nextflow-io/nextflow#813
Jonathan Manning
@pinin4fjords
Apr 04 10:39
Quick one- what's the best way to derive a list of the failed processes and associated working directories for a Nextflow run?
Paolo Di Tommaso
@pditommaso
Apr 04 10:40
during or after run
Jonathan Manning
@pinin4fjords
Apr 04 10:40
after
Is it just log grepping?
Paolo Di Tommaso
@pditommaso
Apr 04 10:41
nextflow log <run name> -h
Jonathan Manning
@pinin4fjords
Apr 04 10:42
Thanks - what's the solution for 'during'?
Paolo Di Tommaso
@pditommaso
Apr 04 10:44
nextflow run <script> -with-trace
Jonathan Manning
@pinin4fjords
Apr 04 10:46
Okay thank you- will look into those. Had turned trace off because the files were getting big, but clearly I need it.
Rad Suchecki
@rsuchecki
Apr 04 10:46
this will give task hashes but not full paths to workDirs, right?
Paolo Di Tommaso
@pditommaso
Apr 04 10:47
also full paths and all other metadata, depending on the -f option
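For example, a sketch of pulling out the failed tasks and their work directories (the field names can be listed with `nextflow log -l`; the `-F` filter expression shown here is an assumption about that option):

```shell
# List name, status and work directory of every task in the last run,
# keeping only the failed ones.
nextflow log last -f name,status,workdir -F "status == 'FAILED'"
```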
NEW TRICK! ;)
Rad Suchecki
@rsuchecki
Apr 04 10:47
:tada:
is there config equivalent of -f?
Paolo Di Tommaso
@pditommaso
Apr 04 10:49
what config ?
Rad Suchecki
@rsuchecki
Apr 04 10:50
as in a flag to set in nextflow.config that would have the same effect as -f at run time
Paolo Di Tommaso
@pditommaso
Apr 04 10:51
not sure I understand which flag you are referring to, but as a rule of thumb I would say no
ahh that was for log
not trace
Ido Tamir
@idot
Apr 04 11:27
process write {
output:
file "written.txt" into written

exec:
new File("written.txt").text = "this is the text"
print("done")

}
Hello,
I try to write a file, but the file is not created.
[warm up] executor > local
done[73/608310] Submitted process > write
ERROR ~ Error executing process > 'write'

Caused by:
  Missing output file(s) `written.txt` expected by process `write`

Source block:
  new File("written.txt").text = "this is the text"
  print("done")
Ido Tamir
@idot
Apr 04 11:35
ok, the file is in the project folder
Ido Tamir
@idot
Apr 04 11:49
ok, task.workDir gets me to the process folder
Rad Suchecki
@rsuchecki
Apr 04 11:50
yep e.g. task.workDir.resolve('written.txt')
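To spell that out, a minimal sketch of the process above rewritten so the file lands in the task work directory (and is therefore found as a declared output):

```groovy
process write {
    output:
    file "written.txt" into written

    exec:
    // Resolve against the task work dir instead of the launch dir,
    // so Nextflow finds the declared output file.
    task.workDir.resolve('written.txt').text = "this is the text"
    println("done")
}
```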
Caspar
@caspargross
Apr 04 13:30
Is there a variable to access the number of available cores inside a nextflow script? For example if I want to run a process with n-1 threads and the number of max threads changes depending on the executor and/or profile.
micans
@micans
Apr 04 13:30
task.cpus
Caspar
@caspargross
Apr 04 13:30
ty!
micans
@micans
Apr 04 13:30
:+1:
Caspar
@caspargross
Apr 04 13:40
From my understanding task.cpus is only defined inside a process. Is it also possible to access this information at the start of a script?
micans
@micans
Apr 04 13:56
is this for the local executor? What is the use case?
Caspar
@caspargross
Apr 04 14:42
Yes, for local. I wanted to give the user feedback about the number of available CPUs in a log.info at application startup. But it's not really that important.
Alexander Peltzer
@apeltzer
Apr 04 14:46
Try int cores = Runtime.getRuntime().availableProcessors();
It includes hyperthreading "cores" though, and I'm not sure that works in Nextflow; this would be the Java way, which might work ;-)
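As a sketch, that Java call can be used directly at the top of a Nextflow script; note it counts logical CPUs (hyperthreads) on the machine running Nextflow itself, not on the compute nodes of a batch executor:

```groovy
// Top of main.nf: report how many logical CPUs the launch host has.
int cores = Runtime.getRuntime().availableProcessors()
log.info "Detected ${cores} logical CPUs on the launch host"
```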
Stephen Kelly
@stevekm
Apr 04 14:47

@micans @stevekm Just curious, is the volume of images substantial enough to opt for shared space over user specific SINGULARITY_CACHEDIRs? Or is it more about different users re-running the same pipeline and needing fixed containers for task caching?

@rsuchecki I have about 35 .simg files, 16GB total, for the two main pipelines I am running. At the moment it's just me and one other user, but I don't see any reason to force each user to create their own copy of every image file; in fact that would probably be bad in this case, because this is a production-style pipeline, so it should be the same for everyone that runs it, and everyone should be using the exact same images. If it was more of an independent research-group-style setup and everyone was doing their own thing, then it would make sense to let everyone have their own container dirs privately. Still, I am not sure I see any real value in trying to keep the containers protected; it's just a lot easier, I think, to put everyone's .simg files in a common location for all to use, the same way your HPC's "modules" work, unless you have reason not to. Also it seems like a lot more trouble than it's worth to try and pull from Singularity Hub on the fly when running your Nextflow pipeline, compared to just saving the static files. I have an example repo here: https://github.com/NYU-Molecular-Pathology/containers where I even include all the steps users need to create the containers locally and then upload them to the shared location on the HPC. I dunno, it's worked great for me.

micans
@micans
Apr 04 14:48
@caspargross @apeltzer In my case this is batch-system specific; for example under LSF I find out how many CPUs I asked for / have on a machine like this: bjobs -o "slots" $LSB_JOBID. Will Runtime.getRuntime().availableProcessors() just count all CPUs on the machine?
@stevekm I forgot / sorry to ask again; what is your solution for the singularity cache multi-user write problem, if any?
Stephen Kelly
@stevekm
Apr 04 14:51
oh I do not use any Singularity cache at all, I just use the full path to the location with the image .simg files
micans
@micans
Apr 04 14:52
got it, thanks
Stephen Kelly
@stevekm
Apr 04 14:57

@teoKusa_twitter

hi, is it possible to reuse the same variable name as input and output in a process? i.e. get a variable as input and put it in an output channel
I'm trying this in a process:

   input:
   file modeld from guess_model_dir

   output:
   set val(modeld.baseName.split("_")[1])

You can definitely reuse a variable as an output, however I am gonna go out on a limb here and suggest that the problem is that you are trying to call all those extra methods on the variable. You might try simply reassigning the variable within the process like this:

process do_thing {
    input:
    file modeld from guess_model_dir

    output:
    val(newval)

    script:
    newval = modeld.baseName.split("_")[1]
    """
    # empty script section
    """
}

The scoping and order of execution of things inside the process statements is sometimes weird and unintuitive, so it's best to keep it as simple as possible, in my experience

wow markdown hates me today lol

@stevekm I think, now that the weblog feature will soon-ish include complete workflow metadata, you should be able to easily build such a monitoring app. We will do so as well at QBiC, and the same will be possible with @ewels' idea of the monitor feature of nf-core tools. I am looking forward to all implementations :) With respect to results: as nf-core standardises the reports (and their location), it is quite easy to trigger a second service via e.g. a webhook after completion and register the results. We will evaluate and most likely use StackStorm for the orchestration of events. Sorry to the others, this gets a bit off-topic ;)

@sven1103 I am really interested in following the progress on these. Also, is there a separate Gitter for nf-core? I have some questions about the standardized aspects, in addition to its implementation and the webhooks you refer to

Phil Ewels
@ewels
Apr 04 16:36
We used to have Gitter for nf-core but moved to slack instead
Hit the link top right of any page on nf-co.re to join :tada:
Direct link for slack invite: https://nf-core-invite.herokuapp.com/
Phil Ewels
@ewels
Apr 04 16:52
To clarify, https://nf-co.re is the website
Sinisa Ivkovic
@sivkovic
Apr 04 17:05
I just saw the discussion here: nextflow-io/nextflow#826 and nextflow-io/nextflow#838. @pditommaso did you have time to work on this in the meantime? I also don't think it's a good idea to add a new directive only to specify AWS Batch mounts inside the process, but I do think the ability to specify a mount point should exist. It actually solves two problems: the first one is the timeouts, which is really big for us since we can't run the pipeline using AWS Batch without it, and it also helps with having more control over the volumes attached to the instance, whether with autoscaling scripts or EFS or something else. Maybe it could be part of the configuration, similar to workDir? What do you think?
Sven F.
@sven1103
Apr 04 18:34
@ewels thank you for answering
Michael Webb
@michaelwebb_twitter
Apr 04 20:03
Does anyone know if it's possible to turn off the display of the run name during Nextflow execution (i.e., after typing nextflow run in the terminal)?
I find that looking at phrases like "desperate_lovelace" and "nasty_babbage" all day is a bit distracting and not very nice!
(I know that these names are useful, but I can always look them up if I need them using nextflow log)
Jonathan Gesell
@s181706_gitlab
Apr 04 20:16
I am attempting to run a script that at the end will launch one of several other scripts. However, once the launching script launches the other script, it runs through the first part, then closes out without error or abnormal output. So it is running the first step of the secondary script, then just exiting. Is there a known way to launch a Nextflow script from within a Nextflow script?
Rad Suchecki
@rsuchecki
Apr 04 22:53
@stevekm This makes perfect sense, even more so given that the images could have been built from Docker images. Good point also about the parallels with a module system; I might bring that to the attention of our sysadmins. Even though a group/lab can do this on their own, such an approach might make it easier for admins as well. I think we already have some cases where a module just hides a Singularity container
Rad Suchecki
@rsuchecki
Apr 04 23:00
@s181706_gitlab I can't remember all the details, but it's generally not advised. If you have to run separate Nextflow instances, make sure execution is from separate directories. Also look at whether the proposed #984 could be of use.