These are chat archives for nextflow-io/nextflow

28th
Jun 2017
Alexander Mikheyev
@mikheyev
Jun 28 2017 02:46
@pditommaso Weirdly, the jobs just disappear without an error being generated. I am working with the cluster admins to figure out the reason for it. I am not sure what Nextflow's behavior should be in such a pathological case.
Nextflow actually tries to kill some of these nonexistent jobs, but can't:
```
Jun-28 11:43:41.743 [main] DEBUG n.executor.AbstractGridExecutor - Unable to killing pending jobs
  • cmd executed: scancel 13931726 13931728 13931729 13931733 13931740 13931746 13931747 13931748 13931751 13931755 13931756 13931759 13931763 13931764 13931769
  • exit status : 130
  • output :
    scancel: error: slurm_kill_job2() failed Job/step already completing or completed
    scancel: error: slurm_kill_job2() failed Job/step already completing or completed
    scancel: error: slurm_kill_job2() failed Job/step already completing or completed
    scancel: error: slurm_kill_job2() failed Job/step already completing or completed
    scancel: error: slurm_kill_job2() failed Job/step already completing or completed
```
So, maybe the job poller can notice that a job is not there, and throw an error?
Alexander Mikheyev
@mikheyev
Jun 28 2017 04:43

I am trying to split a series of paired-end libraries, and then phase them together, but some of the resulting files appear to be out of sync:

```
forward = Channel
    .fromPath("data/reads/*R1*.fastq.gz")
    .map { file -> tuple(file.name.replaceAll(/_R1.*/, ""), file) }
    .splitFastq(file: true, by: params.chunkSize)

reverse = Channel
    .fromPath("data/reads/*R2*.fastq.gz")
    .map { file -> tuple(file.name.replaceAll(/_R2.*/, ""), file) }
    .splitFastq(file: true, by: params.chunkSize)

process map {

    input:
    file ref from file( params.ref )
    set file(forward), file(reverse) from forward.phase(reverse).map { f, r -> tuple(f[1], r[1]) }
```

This seems to work the vast majority of the time, but once in a while the mapping process finds read files that are not properly paired. What am I doing wrong?

Alexander Mikheyev
@mikheyev
Jun 28 2017 05:15
Actually, I think I know what's wrong -- the library name is not unique to the split files, but how do I make it so?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 06:13
I think the problem is that phase doesn't work as expected here
by doing that you will have many chunks with the same sample ID, so they may be out of sync
Paolo Di Tommaso
@pditommaso
Jun 28 2017 06:23
you can use something like the following
```
params.chunkSize = 10_000

Channel
  .fromFilePairs('data/reads/*_{1,2}.fq.gz', flat: true)
  .into { fwd; rev }

fwd = fwd.map { sample, file1, file2 -> tuple(sample, file1) }.splitFastq(file: true, by: params.chunkSize)
rev = rev.map { sample, file1, file2 -> tuple(sample, file2) }.splitFastq(file: true, by: params.chunkSize)

fwd.merge(rev) { left, right -> tuple(left[0], left[1], right[1]) }.println()
```
That said, I think it would be handy if splitFastq were able to split two or more files at a time. We were speaking with @skptic about that yesterday
Alexander Mikheyev
@mikheyev
Jun 28 2017 06:51
Thanks!
Paolo Di Tommaso
@pditommaso
Jun 28 2017 06:53
also, you should make sure that both read files contain the same number of reads (!)
Alexander Mikheyev
@mikheyev
Jun 28 2017 06:53
I second the addition of a multiple file feature for splitFastq
Paolo Di Tommaso
@pditommaso
Jun 28 2017 06:54
great
:)
regarding the problem with your cluster, so what happens is that some job work directories are just dropped?
Maxime Garcia
@MaxUlysse
Jun 28 2017 08:39
Hi @pditommaso, quick question about Singularity: Nextflow does not download images from shub yet, right?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 08:41
Not possible at this time, see #356
Maxime Garcia
@MaxUlysse
Jun 28 2017 08:42
Oh yes, I remember reading this issue, I'll subscribe to it then
I'm trying to run CAW with Singularity on one of the UPPMAX clusters
It will probably be fun
;-)
Paolo Di Tommaso
@pditommaso
Jun 28 2017 08:44
I guess so
Just download the image and use the file
Curious to know how it will perform
Maxime Garcia
@MaxUlysse
Jun 28 2017 08:45
I'll keep you updated
Alexander Mikheyev
@mikheyev
Jun 28 2017 08:55
@pditommaso For the cluster, it looks like some jobs launch, then suddenly disappear without an error log or any other trace that they existed
Paolo Di Tommaso
@pditommaso
Jun 28 2017 08:55
!!
Alexander Mikheyev
@mikheyev
Jun 28 2017 08:56
It is super-puzzling and appears to affect only members of my research group.
So, my guess is that it has something to do with Lustre and the particular permission set I have.
Paolo Di Tommaso
@pditommaso
Jun 28 2017 08:56
But the task work directories created by NF are deleted as well?
Alexander Mikheyev
@mikheyev
Jun 28 2017 08:59
I don't think they are ever even created
I ran a test with just a sleep command, and the jobs disappear into the ether
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:03
Oh, I see
Phil Ewels
@ewels
Jun 28 2017 09:03
@pditommaso - running on a SLURM cluster, is there any way to find the SLURM job ID of a failed process? I want to check its memory usage using tools we have on our cluster (I didn't use -with-trace) but I need the numeric job ID.
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:04
Job IDs are in the trace file generated by NF with the -with-trace option
Phil Ewels
@ewels
Jun 28 2017 09:04
yeah, but I didn't specify that ;)
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:05
What do you mean if so? :(
Oops I mean .. :)
Shellfishgene
@Shellfishgene
Jun 28 2017 09:10
@pditommaso I'm trying your Kallisto pipeline, but something is not working. I have single-end reads named S1.fastq.gz, S2.fastq.gz ... S20.fastq.gz and I'm setting --reads ./raw_data/*.fastq.gz. However, nf is only running the first sample, S10, and the $name variable from set val(name), file(reads) from read_files in the mapping process is not set. Any idea?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:12
Have you put the path between single quotes? e.g.
Phil Ewels
@ewels
Jun 28 2017 09:12
@pditommaso - time to rerun with -with-trace ;)
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:13
--reads './raw_data/*.fastq.gz'
@ewels yes, the job IDs are not reported otherwise
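For reference, the trace report includes a native_id column holding the scheduler's job ID, which can be pulled out with standard tools. A minimal sketch (the trace rows below are fabricated for illustration; a real file is produced by nextflow run main.nf -with-trace):

```shell
# Fabricated trace excerpt (tab-separated), mimicking the layout of the
# file written by `nextflow run main.nf -with-trace`
printf 'task_id\thash\tnative_id\tname\tstatus\n' > trace.txt
printf '1\ta1/b2c3\t13931726\tmap (1)\tFAILED\n' >> trace.txt
printf '2\td4/e5f6\t13931728\tmap (2)\tCOMPLETED\n' >> trace.txt

# Print the SLURM job IDs of failed tasks (native_id is column 3,
# status is column 5 in this sketch)
awk -F'\t' 'NR > 1 && $5 == "FAILED" { print $3 }' trace.txt
```

The numeric ID printed can then be fed to site tools such as sacct for memory accounting.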
Shellfishgene
@Shellfishgene
Jun 28 2017 09:14
Right, if I don't, bash will expand that, I guess. Now it works, thanks!
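The difference is easy to see directly in a shell (a minimal sketch; the filenames are made up):

```shell
mkdir -p raw_data
touch raw_data/S1.fastq.gz raw_data/S2.fastq.gz

# Unquoted: bash expands the glob before Nextflow runs, so the option
# receives only the first match as its value
echo --reads ./raw_data/*.fastq.gz
# prints: --reads ./raw_data/S1.fastq.gz ./raw_data/S2.fastq.gz

# Quoted: the literal pattern reaches Nextflow, which expands it itself
# and picks up every matching file
echo --reads './raw_data/*.fastq.gz'
# prints: --reads ./raw_data/*.fastq.gz
```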
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:17
:+1:
Jose Espinosa-Carrasco
@JoseEspinosa
Jun 28 2017 09:22
hi there!
I am running the following code:
```
source = Channel.from( [1, 'midbody', 1], [1, 'tail', 2], [2, 'midbody', 1], [2, 'tail', 2] )
target = Channel.from( [1, 'midbody', 1], [1, 'tail', 2], [2, 'midbody', 1], [2, 'tail', 2] )

source.cross(target).subscribe { println it }
```
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:23
and...
Jose Espinosa-Carrasco
@JoseEspinosa
Jun 28 2017 09:23
I would expect it to return combinations of all the list having the same first value, but I get this:
```
[[1, midbody, 1], [1, midbody, 1]]
[[1, tail, 2], [1, tail, 2]]
[[2, midbody, 1], [2, midbody, 1]]
[[2, tail, 2], [2, tail, 2]]
```
I solved it using combine
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:24
Use combine instead of cross, in most cases it's better
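As a sketch of the suggested alternative (untested here; combine with by: 0 pairs every item from the two channels that shares the same first element):

```groovy
source = Channel.from( [1, 'midbody', 1], [1, 'tail', 2] )
target = Channel.from( [1, 'midbody', 1], [1, 'tail', 2] )

// combine(by: 0) emits every pairing of source and target items
// whose first element (the key) matches
source.combine(target, by: 0).println()
```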
Jose Espinosa-Carrasco
@JoseEspinosa
Jun 28 2017 09:25
yeah but it is this the expected behavior of cross?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 09:26
Can't check now, on my way to the :airplane:
Jose Espinosa-Carrasco
@JoseEspinosa
Jun 28 2017 09:26
ok thanks, have a nice flight! :smile:
tonywang0613
@tonywang0613
Jun 28 2017 15:57
Hi, @pditommaso quick question, can we set up a switch in the directives? For example, if params.test == "xx" then process.container = "DOCKER IMAGE ID" else process.container = "ANOTHER IMAGE ID"
Paolo Di Tommaso
@pditommaso
Jun 28 2017 15:58
yep
tonywang0613
@tonywang0613
Jun 28 2017 15:59
great, can I set it up under each process, or can it be in the config file?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 15:59
```
process foo {
  container { params.test=='xx' ? 'image/x' : 'image/y' }
  :
}
```
tonywang0613
@tonywang0613
Jun 28 2017 15:59
cool
Paolo Di Tommaso
@pditommaso
Jun 28 2017 16:00
it's the same in the config, but you need to use the assignment operator, i.e.
tonywang0613
@tonywang0613
Jun 28 2017 16:00
got it, I will try it
Paolo Di Tommaso
@pditommaso
Jun 28 2017 16:00
```
process.container = { params.test=='xx' ? 'image/x' : 'image/y' }
```
tonywang0613
@tonywang0613
Jun 28 2017 16:00
Thanks, that's really helpful
Paolo Di Tommaso
@pditommaso
Jun 28 2017 16:00
:+1:
tonywang0613
@tonywang0613
Jun 28 2017 16:01
thanks
Matthieu Pichaud
@MatPich_twitter
Jun 28 2017 17:43
Hi, this may be more a Groovy than a Nextflow problem ... I'm struggling to create a Channel that takes a path such as xxxx/keyword/file.txt and yields [keyword, fullpath].
I've tried to tune fromFilePairs and fromPath for this need, but with no success. Any idea how to do that?
Paolo Di Tommaso
@pditommaso
Jun 28 2017 17:46
Channel.fromPath('*').map { tuple(it.name, it) }
^^^ for example
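If the key should be the enclosing directory name rather than the file name (as in xxxx/keyword/file.txt → [keyword, path]), a variant sketch (untested; relies on Nextflow's Path extensions for the name property):

```groovy
Channel
    .fromPath('xxxx/*/file.txt')
    // use the parent directory name as the grouping key
    .map { tuple(it.parent.name, it) }
```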
Félix C. Morency
@fmorency
Jun 28 2017 20:15
@pditommaso how do you mix optional and into?
I get errors like ERROR ~ No signature of method: nextflow.script.SetOutParam.optional() is applicable for argument types: (java.lang.Boolean) values: [true]
Matthieu Pichaud
@MatPich_twitter
Jun 28 2017 20:22
Thanks a lot for your quick help (this time again), Paolo
Félix C. Morency
@fmorency
Jun 28 2017 20:49
and set
Félix C. Morency
@fmorency
Jun 28 2017 20:57
set sid, "bar" optional true into foo doesn't work :(
Paolo Di Tommaso
@pditommaso
Jun 28 2017 21:08
optional can only be used on file outputs
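A minimal sketch of the file-output form (the names bar and foo are just placeholders from the question above):

```groovy
output:
// `optional true` marks a file output that a task is allowed not to produce
file 'bar' optional true into foo
```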
Paolo Di Tommaso
@pditommaso
Jun 28 2017 21:38
@MatPich_twitter welcome
Félix C. Morency
@fmorency
Jun 28 2017 22:16
Oh I see