These are chat archives for nextflow-io/nextflow

14th
Aug 2018
Guillaume Collet
@gcollet
Aug 14 2018 06:39 UTC
@skptic I replaced perl by awk and it works.
Guillaume Collet
@gcollet
Aug 14 2018 06:55 UTC
After a few tests, I found that when a Perl script called from my Nextflow script opens a file, running with -with-report fails with the same error as @skptic: Nextflow cannot kill a process. So I suppose Perl is doing something weird with processes... I'm investigating.
Maxime Garcia
@MaxUlysse
Aug 14 2018 07:22 UTC
@gcollet Hi, nice to have you join here ;-)
Guillaume Collet
@gcollet
Aug 14 2018 07:26 UTC
@MaxUlysse Hey Max, nice to see you
OK, I found how to fix the bug: in the Perl script I open a file with '<:encoding(UTF-8)', but when I use only '<' (read mode without encoding), it works.
I saw that the encoding layer uses a C dynamic library loaded through Perl's XSLoader, so I suppose it is called in another process.
Vanessasaurus
@vsoch
Aug 14 2018 07:50 UTC
Sorry I missed the earlier comments about Canada. Awesome!
Sander Bervoets
@Biocentric
Aug 14 2018 10:09 UTC
Hey all, I've been trying out the Nextflow and AWS Batch combination. When I run the rnaseq-nf demo pipeline on AWS Batch, it almost completely works, but it returns "segmentation faults" near the end. rnaseq-nf works and finishes fine on local machines. Error: ".command.sh: line 2: 122 Segmentation fault salmon quant --threads 8 --libType=U -i index -1 ggal_liver_1.fq -2 ggal_liver_2.fq -o ggal_liver". Does anyone have a clue?
Stijn van Dongen
@micans
Aug 14 2018 10:13 UTC
Hello ... I would like to dump NF state to a file after a run. For example, I want to get the run name or session ID so that I can dump a log of the commands run using nextflow log <runname> -f script. It seems that dumping or serialising the workflow object in an onComplete() handler is pretty much what I'm after. Has anyone done something like this, or is there something I overlooked?
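A minimal sketch of that idea, assuming the workflow metadata fields runName, sessionId, success, duration and launchDir are what you want to keep. The output file name run_info.txt is hypothetical:

```groovy
// Runs once, after the whole pipeline has finished (successfully or not).
workflow.onComplete {
    // hypothetical file name, written to the directory the run was launched from
    def info = file("${workflow.launchDir}/run_info.txt")

    def lines = [
        "runName   : ${workflow.runName}",
        "sessionId : ${workflow.sessionId}",
        "success   : ${workflow.success}",
        "duration  : ${workflow.duration}",
        // ready-made command to list the scripts executed by this run
        "commands  : nextflow log ${workflow.runName} -f script"
    ]
    info.text = lines.join('\n') + '\n'
}
```

Writing the run name rather than serialising the whole workflow object keeps the dump small; nextflow log can recover the per-task details from it later.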
tbugfinder
@tbugfinder
Aug 14 2018 11:58 UTC
@Biocentric what's your AWS Batch setup like?
Karin Lagesen
@karinlag
Aug 14 2018 12:17 UTC
Hi
is there any way to make an entire process conditional on something?
let's say I set a params.doAssembly = "no", is there a way for me to skip a process if that parameter is set to no?
Alexander Peltzer
@apeltzer
Aug 14 2018 12:20 UTC
Use the when directive
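A minimal sketch of the when directive applied to the params.doAssembly flag from the question (DSL1 syntax, as used elsewhere in this thread). The input path and your_assembler are placeholders:

```groovy
params.doAssembly = 'no'   // override on the command line, e.g. --doAssembly yes

// hypothetical input location
reads_ch = Channel.fromFilePairs('data/*_{1,2}.fq.gz')

process assembly {
  input:
  set pair_id, file(reads) from reads_ch

  output:
  file "${pair_id}.contigs.fa" into contigs_ch

  // tasks of this process are executed only when the condition is true
  when:
  params.doAssembly == 'yes'

  script:
  """
  your_assembler ${reads} > ${pair_id}.contigs.fa
  """
}
```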
Karin Lagesen
@karinlag
Aug 14 2018 12:23 UTC
thanks
was a bit tricky to find :)
question: will this do all of the "setup work" for the processes even though they won't be run?
Alexander Peltzer
@apeltzer
Aug 14 2018 13:02 UTC
That's more of a question for Paolo I guess ;-)
Sander Bervoets
@Biocentric
Aug 14 2018 13:09 UTC
@tbugfinder It's set up for nextflow/rnaseq-nf plus my AWS Batch queue and region, and I have a modified AMI with added storage and the AWS CLI, all correctly linked. I've struggled with the whole AWS setup (incl. S3) for a while, but I think THAT part is OK: it submits jobs, stores work, and most salmon jobs finish correctly. Only one of the salmon jobs crashes, as mentioned above.
Karin Lagesen
@karinlag
Aug 14 2018 13:10 UTC
@apeltzer I sort of assume that he will answer when and if he has time :)
not critical for my processes though, more curious
next scenario for you guys and/or @pditommaso : I have a process that downloads some files that are used in later processes
I'd like to be able to give such a file as a parameter, in case somebody wants to use an already downloaded file
Karin Lagesen
@karinlag
Aug 14 2018 13:15 UTC
In this scenario I would need to skip the download process with a when, but I'm unsure how to organize having two different ways of setting an input to a process
Félix C. Morency
@fmorency
Aug 14 2018 13:23 UTC
@karinlag are those two ways mutually exclusive?
Karin Lagesen
@karinlag
Aug 14 2018 13:24 UTC
I guess not
I am just having trouble wrapping my head around it
so, I have a directory that is an input to the command that I'm running
that directory is now a fixed filename that I get from a previous process
so I'd need to parameterize that, and then ...
Stijn van Dongen
@micans
Aug 14 2018 13:26 UTC

@karinlag, I sometimes also define processes inside a Groovy if statement, e.g. if (params.doAssembly == 'yes') { process ... }. I find this easier at times, though I am not sure whether it is (always) bad style. It is also possible to set a channel to Channel.empty() conditionally, which can be useful too. @pditommaso gave this example two months back:

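params.trim_to = 50   // trim length in bases; set to 0 or false to skip trimming
Channel.fromPath('.data/reads/*_1.fq.gz').set{ fastq_ch }
// send the reads either to trim_to_length or straight to next_stage, never both
trim_ch = params.trim_to ? fastq_ch : Channel.empty()
no_trimmed_ch = !params.trim_to ? fastq_ch : Channel.empty()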

process trim_to_length {
  input:
  file(r1) from trim_ch

  output:
  file('r1.trim.fastq.gz') into trimmed_fastq

  script:
  """
  zcat $r1 | awk 'NR%2==0 {print substr(\$0, 1, $params.trim_to)} NR%2!=0' | gzip -c -1 > r1.trim.fastq.gz
  """
}

process next_stage {
  input:
  file x from trimmed_fastq.mix(no_trimmed_ch)
  """
  echo your_command --input $x
  """
}

I try to be vigilant and see which of the NF idioms is suitable, but I definitely need more experience.
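For comparison, the if-statement variant mentioned above could look roughly like this in DSL1. It reuses the hypothetical params.doAssembly flag; the input path and your_assembler are placeholders, and the else branch (an assumption about how to keep downstream code valid) defines an empty channel under the same name:

```groovy
params.doAssembly = 'yes'

// hypothetical input location
reads_ch = Channel.fromPath('data/*.fq.gz')

if (params.doAssembly == 'yes') {
  // the process only exists when the condition holds
  process assembly {
    input:
    file reads from reads_ch

    output:
    file "${reads.baseName}.contigs.fa" into contigs_ch

    script:
    """
    your_assembler $reads > ${reads.baseName}.contigs.fa
    """
  }
}
else {
  // downstream processes can still read from contigs_ch; it just emits nothing
  contigs_ch = Channel.empty()
}
```

Compared with when, this avoids declaring the process at all, at the cost of having to provide a stand-in for every output channel.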

Félix C. Morency
@fmorency
Aug 14 2018 13:28 UTC
@karinlag can you post an isolated code snippet of what you're trying to achieve?
Karin Lagesen
@karinlag
Aug 14 2018 13:28 UTC
I'll try :)

process run_ariba_amr_pred {
  publishDir params.out_dir + "/" + params.amr_results, mode: 'copy'

  input:
  set pair_id, file(reads) from read_pairs_amr
  file "db_amr_prepareref" from db_amr_prepareref

  output:
  file "${pair_id}_amr_report.tsv" into pair_id_amr_tsv
  file "${pair_id}_ariba" into pair_id_amr_aribadir

  when:
  params.do_amr == "yes"

  // explicit label so the script string is not parsed as part of the when block
  script:
  """
  ${preCmd}
  ariba run db_amr_prepareref ${pair_id}_R*_concat.fq.gz ${pair_id}_ariba &> ariba.out
  cp ${pair_id}_ariba/report.tsv ${pair_id}_amr_report.tsv
  """
}

got a bit cut, but should do the trick
it's db_amr_prepareref that I'd also like to be able to set via a parameter
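One way this could be wired up, as a sketch: params.amr_db and the download_amr_db process are hypothetical names, and the download commands are a placeholder for whatever the real download process does. The channel feeding run_ariba_amr_pred comes either from the user-supplied path or from the download step:

```groovy
// Hypothetical parameter: path to an already prepared ARIBA database directory.
// Leave it unset to let the pipeline download and prepare the database itself.
params.amr_db = null

process download_amr_db {
  output:
  file "db_amr_prepareref" into db_amr_downloaded

  // skip the download whenever the user supplied a database
  when:
  params.amr_db == null

  script:
  """
  # placeholder: put the real download/prepare commands here
  mkdir db_amr_prepareref
  """
}

// Pick exactly one source for the database. first() turns the process output
// into a value channel, so run_ariba_amr_pred can reuse it for every read pair.
db_amr_prepareref = params.amr_db ?
    Channel.value(file(params.amr_db)) :
    db_amr_downloaded.first()
```

Usage: run as usual to download the database, or add --amr_db /path/to/db_amr_prepareref to reuse an existing one. Using a value channel matters here; with a plain queue channel holding a single item, the ARIBA process would run for only one read pair.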
Maxime Garcia
@MaxUlysse
Aug 14 2018 13:31 UTC
@karinlag I just quickly read what you wanted to do, and I think it's totally possible to do that
Basically, for our pipeline's Travis CI tests, I have a specific script that builds all of the indexes and downloads all of the references, and then other scripts to run the pipeline
And like @apeltzer said, the when directive is quite useful to filter out processes
Karin Lagesen
@karinlag
Aug 14 2018 13:35 UTC
I am assuming it's possible, just having trouble figuring out how
Possibly because it's late in the day :P
Maxime Garcia
@MaxUlysse
Aug 14 2018 13:38 UTC
Go get more coffee ;-)
Karin Lagesen
@karinlag
Aug 14 2018 13:38 UTC
hih! Trying to cut down, got to the level where I was drowning in the stuff...
Maxime Garcia
@MaxUlysse
Aug 14 2018 13:39 UTC
Some tea maybe?
Karin Lagesen
@karinlag
Aug 14 2018 13:39 UTC
Not a bad idea :)
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 13:59 UTC
Hello nextflowers!!

I am facing this error: even though the repository does contain the nextflow.config file, Nextflow is not able to recognise it. Judging by similar errors, this is common when the repository is private and no credentials are provided; however, my repo is public.

So please, any help with this issue is welcome.

nextflow run https://github.com/TainVelasco-Luquez/QUARS
N E X T F L O W  ~  version 0.30.2
Pulling TainVelasco-Luquez/QUARS ...
Not a valid Nextflow project -- The repository `https://github.com/TainVelasco-Luquez/QUARS` must contain a the script `main.nf` or the file `nextflow.config`
Maxime Garcia
@MaxUlysse
Aug 14 2018 14:06 UTC
@TainVelasco-Luquez How is your main script called?
you should specify the mainScript in the manifest scope within your nextflow.config file
Given your command line, I'm guessing you're trying to have Nextflow download your pipeline from GitHub, but maybe you want to test it locally first with something like:
nextflow run /pathto/QUARS/main.nf
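For reference, a manifest entry along these lines in nextflow.config tells Nextflow which script to run when the entry script is not called main.nf (quars.nf is the name used in this thread):

```groovy
// nextflow.config
manifest {
    mainScript = 'quars.nf'   // entry script used instead of the default main.nf
}
```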
KochTobi
@KochTobi
Aug 14 2018 14:10 UTC
@TainVelasco-Luquez Which configuration do you want to have? You have specified params.cpus in your nextflow.config as well as in your quars.nf. Maybe an overriding issue?
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:13 UTC
@KochTobi Right! Initially I set them in quars.nf and later in the .config file, but I forgot to remove the params from quars.nf. It's fixed and will be available with the next push. Thanks.
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:19 UTC
@MaxUlysse My main script is called quars.nf, but I set that name as mainScript in nextflow.config. I have already tested it locally and it worked fine. However, now that I have published it, I cannot pull it with nextflow run my/repo.
Maxime Garcia
@MaxUlysse
Aug 14 2018 14:19 UTC
I just had a look at your repo, and I think @KochTobi idea with the params.cpus could be it
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:25 UTC
@MaxUlysse After fixing the overriding issue @KochTobi pointed out, the same error showed up. =(
Kevin Sayers
@KevinSayers
Aug 14 2018 14:34 UTC
@TainVelasco-Luquez I think it is the author in the manifest
@TainVelasco-Luquez if you do author = '@TainVelasco-Luquez' I think it will work
Venkat Malladi
@vsmalladi
Aug 14 2018 14:39 UTC
I'm trying to read the number of lines in a file, but I'm getting this error: groovyx.gpars.dataflow.DataflowQueue.readLines() is applicable for argument types: () values: []
any idea?
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:39 UTC
@KevinSayers Thanks for the update, however, it does not work either =(
Kevin Sayers
@KevinSayers
Aug 14 2018 14:47 UTC
@TainVelasco-Luquez that is odd. I can reproduce it in my fork and the author switch fixes it
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:52 UTC
@KevinSayers great news!! So I am doing something wrong then. Did you pull the htcondor_integration branch (the one with the changes suggested by you, @MaxUlysse and @KochTobi )? I am running as follows and still get the same error:
nextflow run https://github.com/TainVelasco-Luquez/QUARS -r htcondor_integration
N E X T F L O W  ~  version 0.30.2
Pulling TainVelasco-Luquez/QUARS ...
Not a valid Nextflow project -- The repository `https://github.com/TainVelasco-Luquez/QUARS` must contain a the script `main.nf` or the file `nextflow.config`
Kevin Sayers
@KevinSayers
Aug 14 2018 14:52 UTC
@TainVelasco-Luquez yea if I try your branch I get the same
@TainVelasco-Luquez if you use my fork nextflow run https://github.com/KevinSayers/QUARS does it work for you?
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 14:56 UTC
@KevinSayers Yes! It works shinily. I am trying to figure out where my mistake is.
nextflow run https://github.com/KevinSayers/QUARS
N E X T F L O W  ~  version 0.30.2
Pulling KevinSayers/QUARS ...
 downloaded from https://github.com/KevinSayers/QUARS.git
Launching `KevinSayers/QUARS` [voluminous_davinci] - revision: 6ef0836608 [master]

 Fastp is about to run..
Venkat Malladi
@vsmalladi
Aug 14 2018 14:57 UTC
noUniqueExperiments = Channel .from(uniqueExperiments) .readLines() .size()
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Aug 14 2018 15:07 UTC

@KevinSayers OK, I solved it. Somehow, after pulling the changes in the htcondor_integration branch (my feature branch), running the following lines still throws the same error:

nextflow run https://github.com/TainVelasco-Luquez/QUARS -r htcondor_integration
N E X T F L O W  ~  version 0.30.2
Pulling TainVelasco-Luquez/QUARS ...
Not a valid Nextflow project -- The repository `https://github.com/TainVelasco-Luquez/QUARS` must contain a the script `main.nf` or the file `nextflow.config`

However, after merging the branch htcondor_integration to the master, it works shinily:

nextflow run https://github.com/TainVelasco-Luquez/QUARS
N E X T F L O W  ~  version 0.30.2
Pulling TainVelasco-Luquez/QUARS ...
 downloaded from https://github.com/TainVelasco-Luquez/QUARS.git
Launching `TainVelasco-Luquez/QUARS` [shrivelled_mirzakhani] - revision: 1b9ebb3296 [master]

 Fastp is about to run...

Thanks a lot @KevinSayers, @MaxUlysse and @KochTobi for your help.

Happy nextflowing !!
:+1:

Venkat Malladi
@vsmalladi
Aug 14 2018 18:04 UTC
any ideas?
tbugfinder
@tbugfinder
Aug 14 2018 18:05 UTC
@Biocentric which instance type and which Batch job definition do you use?
tbugfinder
@tbugfinder
Aug 14 2018 18:18 UTC
@vsmalladi is uniqueExperiments the filename?
Mike Smoot
@mes5k
Aug 14 2018 18:26 UTC
@vsmalladi do you want noUniqueExperiments to be a normal groovy variable or do you want it to be a channel with a single value in it? If you want it to be a groovy variable, then I think the code should be file(uniqueExperiments).readLines().size(). If you want the size to be a value in a channel, then I think it needs to be (something like) Channel.from(uniqueExperiments).readLines().toList().map{ it.size() }
Venkat Malladi
@vsmalladi
Aug 14 2018 18:52 UTC
@mes5k thanks trying the second one now
@mes5k the issue I am having is that uniqueExperiments is already an output channel
Mike Smoot
@mes5k
Aug 14 2018 18:55 UTC
then shouldn't it be uniqueExperiments.readLines().toList()...?
Venkat Malladi
@vsmalladi
Aug 14 2018 18:55 UTC
ya, that is what I suspect
@mes5k but I get the error: ERROR ~ Not a valid channel type: [groovyx.gpars.dataflow.expression.DataflowGetPropertyExpression]
Mike Smoot
@mes5k
Aug 14 2018 18:57 UTC
Can you break the chain of operators up so that you can identify which operator is causing this problem?
Venkat Malladi
@vsmalladi
Aug 14 2018 18:57 UTC
ya
@mes5k below is the example that I am testing
params.designFile = "$baseDir/test_data/design_ENCSR238SGC_SE.txt"
designFile = file(params.designFile)


// Annotate Peaks
process test {

  publishDir "$baseDir/output/${task.process}", mode: 'copy'

  input:

  file designFile

  output:

  file "test.txt" into test

  script:

  """
  cp $designFile test.txt
  """
}
// Define channel to find number of unique experiments
 test.readLines().each { println it }
the output I get is DataflowInvocationExpression(value=null)
Venkat Malladi
@vsmalladi
Aug 14 2018 19:02 UTC
@mes5k any thoughts
Mike Smoot
@mes5k
Aug 14 2018 19:05 UTC
heh, I was just assuming that readLines was a valid operator - I don't think it is - you want splitText instead.
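To make the two options concrete (a sketch, not tested against this exact pipeline): readLines() works on a file object, while a channel needs operators such as splitText() and count(). This assumes the design file already exists when the script starts, and reuses the test channel produced by the process in the example above:

```groovy
// Option 1: plain Groovy value, computed eagerly when the script is evaluated.
designFile = file(params.designFile)
nLines = designFile.readLines().size()
println "design file has $nLines lines"

// Option 2: channel-based, for a file produced by a process at run time.
// splitText() emits one item per line; count() emits the total as a value channel.
test
    .splitText()
    .count()
    .view { "test.txt has $it lines" }
```

The channel version is the only one that works when the file is an output of a process, because its content does not exist yet when the script is parsed.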
Venkat Malladi
@vsmalladi
Aug 14 2018 19:10 UTC
interesting