Nextflow community chat moved to Slack! https://www.nextflow.io/blog/2022/nextflow-is-moving-to-slack.html
Hi all,
My challenge is to create input for pairwise comparisons. Based on the comparison list, I would like to select the samples that belong to each group. I have tried the following, but without complete success: the outer view() prints DataflowBroadcast around DataflowStream[?]
bams = Channel.from(["sampleX_1", "groupX", "bamX_1", "baiX_1"],
                    ["sampleX_2", "groupX", "bamX_2", "baiX_2"],
                    ["sampleY_1", "groupY", "bamY_1", "baiY_1"],
                    ["sampleY_2", "groupY", "bamY_2", "baiY_2"],
                    ["sampleZ_1", "groupZ", "bamZ_1", "baiZ_1"],
                    ["sampleZ_2", "groupZ", "bamZ_2", "baiZ_2"])
comparison_list = Channel.from(["groupX", "groupY"], ["groupX", "groupZ"])
group_input = comparison_list.map{ it ->
    def bam_by_group = bams.groupTuple(by:1)
    def compare1 = it[0]
    def compare2 = it[1]
    def group_ctrl = bam_by_group.first{ it[1] == compare1}
    def group_case = bam_by_group.first{ it[1] == compare2}
    def group_input = group_ctrl.combine(group_case)
    return group_input.view() // - desired outcome, see below
}.view() // results into - DataflowBroadcast around DataflowStream[?]
I would like the result to be in this format:
// [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleY_1, sampleY_2], groupY, [bamY_1, bamY_2], [baiY_1, baiY_2]]
// [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleZ_1, sampleZ_2], groupZ, [bamZ_1, bamZ_2], [baiZ_1, baiZ_2]]
The inner group_input.view() prints exactly this, but how can I use that result downstream?
Thanks!
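One way to avoid calling channel operators inside a map closure (which hands back a channel object rather than its values) is to key the grouped BAMs by group name and use combine with the by option twice. The following is only a sketch of that idea, assuming DSL2 and the toy values from the question; the closure parameter names (ctrl_grp, case_grp, s1, b1, ...) are made up for illustration.

```groovy
nextflow.enable.dsl = 2

workflow {
    bams = Channel.of(
        ["sampleX_1", "groupX", "bamX_1", "baiX_1"],
        ["sampleX_2", "groupX", "bamX_2", "baiX_2"],
        ["sampleY_1", "groupY", "bamY_1", "baiY_1"],
        ["sampleY_2", "groupY", "bamY_2", "baiY_2"],
        ["sampleZ_1", "groupZ", "bamZ_1", "baiZ_1"],
        ["sampleZ_2", "groupZ", "bamZ_2", "baiZ_2"]
    )
    comparison_list = Channel.of(["groupX", "groupY"], ["groupX", "groupZ"])

    // One item per group, with the group name moved to the front as the key:
    // [group, [samples], [bams], [bais]]
    bam_by_group = bams
        .groupTuple(by: 1)
        .map { samples, group, bam, bai -> [group, samples, bam, bai] }

    group_input = comparison_list
        // attach the control group: [ctrl_grp, case_grp, s1, b1, i1]
        .combine(bam_by_group, by: 0)
        // re-key on the case group so the second combine matches on it
        .map { ctrl_grp, case_grp, s1, b1, i1 -> [case_grp, ctrl_grp, s1, b1, i1] }
        // attach the case group: [case_grp, ctrl_grp, s1, b1, i1, s2, b2, i2]
        .combine(bam_by_group, by: 0)
        // reorder into the layout requested in the question
        .map { case_grp, ctrl_grp, s1, b1, i1, s2, b2, i2 ->
            [s1, ctrl_grp, b1, i1, s2, case_grp, b2, i2]
        }

    group_input.view()
}
```

Here group_input is an ordinary channel, so it can be passed to a process or further operators. The order in which the two comparisons are emitted is not guaranteed.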
On a related note, I'm trying to group process outputs together for downstream paired processing. I have a digest file that looks like
sample_id,read1,read2,control_id
cell1_ctl,A.fq.gz,B.fq.gz,
cell1_trt1,C.fq.gz,D.fq.gz,cell1_ctl
cell1_trt2,E.fq.gz,F.fq.gz,cell1_ctl
cell2_ctl,G.fq.gz,H.fq.gz,
cell2_trt1,I.fq.gz,J.fq.gz,cell2_ctl
cell2_trt2,K.fq.gz,L.fq.gz,cell2_ctl
and I have a basic pipeline that drives from per-column channels and looks like
aln_bams = bwamem(samples_ch, read1_ch, read2_ch)
filt_bams = mapq_filter(aln_bams, 30)
but now I need to get to BAM channels that look like
[cell1_trt1.bam, cell1_trt2.bam, cell2_trt1.bam, cell2_trt2.bam]
[cell1_ctl.bam, cell1_ctl.bam, cell2_ctl.bam, cell2_ctl.bam]
The issue is I can't find a way to join the filtered bams with the original IDs. In DSL=1, I could merge; but using DSL=2 I need an index on which to join, and there's no such thing as .mapWithIndex.
Any ideas?
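A common alternative to joining by position is to carry the sample ID through every process as part of a tuple and join on it. The sketch below assumes (this is not in the original pipeline) that mapq_filter emits tuple(sample_id, bam); the two Channel.of calls are stand-ins for the digest rows and for the process output.

```groovy
nextflow.enable.dsl = 2

workflow {
    // Stand-in for the digest: [sample_id, control_id] for the treated samples only
    pairs = Channel.of(
        ['cell1_trt1', 'cell1_ctl'],
        ['cell1_trt2', 'cell1_ctl'],
        ['cell2_trt1', 'cell2_ctl'],
        ['cell2_trt2', 'cell2_ctl']
    )

    // Stand-in for the filtered BAMs, assumed to be emitted as [sample_id, bam]
    filt_bams = Channel.of(
        ['cell1_ctl',  'cell1_ctl.bam'],
        ['cell1_trt1', 'cell1_trt1.bam'],
        ['cell1_trt2', 'cell1_trt2.bam'],
        ['cell2_ctl',  'cell2_ctl.bam'],
        ['cell2_trt1', 'cell2_trt1.bam'],
        ['cell2_trt2', 'cell2_trt2.bam']
    )

    paired = pairs
        // match each treated sample to its own BAM: [trt_id, ctl_id, trt_bam]
        .join(filt_bams)
        // re-key on the control ID
        .map { trt_id, ctl_id, trt_bam -> [ctl_id, trt_id, trt_bam] }
        // attach the control BAM: [ctl_id, trt_id, trt_bam, ctl_bam]
        .combine(filt_bams, by: 0)
        .map { ctl_id, trt_id, trt_bam, ctl_bam -> [trt_id, trt_bam, ctl_bam] }

    paired.view()
}
```

Each emitted item holds a treated BAM together with its control BAM, so a downstream process can take them as a single tuple input instead of relying on two channels staying in the same order.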
Is there any usage example for this? nextflow-io/nextflow#524
I did
process {
    cpus = 4
    withLabel:test2 {
        publish_dir = 'result_data'
    }
}
and got the following warnings:
WARN: Access to undefined parameter `publish_dir` -- Initialise it to a default value eg. `params.publish_dir = some_value`
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
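The first warning suggests that the script refers to params.publish_dir, which the config above never defines; inside a withLabel selector, a key named publish_dir is not a process directive. A minimal config sketch, assuming the process labelled test2 declares publishDir params.publish_dir (an assumption, since the process itself is not shown):

```groovy
// nextflow.config

// Option A: give the parameter a default so `publishDir params.publish_dir` resolves
params.publish_dir = 'result_data'

process {
    cpus = 4

    // Option B: set the publishDir directive itself for processes labelled test2
    withLabel: test2 {
        publishDir = [path: 'result_data']
    }
}
```

Either option on its own should be enough; they are shown together only for comparison.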
I modified the map_join function created by multimeric (nextflow-io/nextflow#559). I extended the function to "loop over" the keys and it worked!
… .nextflow.log file in the nextflow.config file? I know it's possible on the command line using the -log option, but I would like to specify it in nextflow.config so I can modify the path based on our params.outdir parameter.
How do I specify the thread count in a Nextflow script for SGE jobs?
For local jobs, the CPU count is provided using the following configuration setting:
process {
    cpus = 12
}
After that, the value is available in the $task.cpus variable. But in the case of the SGE executor, the cpus variable is not allowed, so I provided the threads using the -pe option. However, I am unable to access that value in the script.
Here is my configuration setup:
process {
    executor = 'sge'
    clusterOptions = '-V -pe threaded 13'
}
So, can someone suggest a solution for this? Thanks in advance.
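For SGE, the Nextflow documentation describes a penv directive that names the parallel environment; when it is set, cpus can be used as usual and the value is available as task.cpus in the script. A config sketch, assuming the parallel environment on this cluster is called threaded (taken from the -pe option above):

```groovy
// nextflow.config
process {
    executor = 'sge'
    penv = 'threaded'        // parallel environment name, cluster-specific
    cpus = 13                // submitted together with penv; readable as task.cpus
    clusterOptions = '-V'    // the -pe option is no longer passed by hand
}
```

In the process script, the thread count can then be referenced as ${task.cpus}.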
Hi all. I am brand-new to Nextflow. I just installed Nextflow on my local computer and ran into the following error. Here is my code:
num = Channel.from( 1, 2, 3 )
process test {
    input:
    val num
    script:
    println "echo process test $num"
}
Here is my error:
N E X T F L O W ~ version 21.10.0
Launching process.nf
[irreverent_feynman] - revision: 386318b60a
executor > local (3)
executor > local (3)
[b7/c314a2] process > test (1) [100%] 3 of 3, failed: 3 ✘
echo process test 2
echo process test 1
echo process test 3
Error executing process > 'test (3)'
Caused by:
Process test (3)
terminated with an error exit status (127)
Command executed:
null
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: null: command not found
Work dir:
/Users/limin/limin_practice/nextflow/work/37/b1c3dd131fb6078750b0f16150fe40
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
Can anyone help me understand what the error message means? How should I correct my code?
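The "Command executed: null" part of the log is the hint: the script block is expected to evaluate to a string containing the shell command, and println prints at evaluation time but returns null, so the task tries to run a command called null. A minimal sketch of the same process in the DSL1 style used above, with the command written as a string; the echo true directive (available in this Nextflow version, later superseded by debug) forwards the task's stdout to the console.

```groovy
num = Channel.from( 1, 2, 3 )

process test {
    // forward the task's stdout to the console
    echo true

    input:
    val num

    script:
    // the last expression of the script block must be the command string
    """
    echo process test $num
    """
}
```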
… memory { 2.GB * task.attempt }, with separate labels that use the same retry-with-increased-resources strategy (like low_mem and high_mem) to coarsely indicate the size of the job. The label definitions are specified in the configuration file, and the labels are applied in the process file.
… x.GB * task.attempt), while the actual amount (2.GB, etc) varies based on some calculation done by the process directive?
… memory (retryFunc { inputfile.size * 3 })
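For reference, the label-based setup described above might look roughly like this in the configuration file. This is only a sketch: the 16.GB value, the unconditional retry strategy, and maxRetries = 3 are placeholders, not values from the original message.

```groovy
// nextflow.config
process {
    errorStrategy = 'retry'   // placeholder: retries on any failure
    maxRetries = 3

    // same scaling rule for every label, different base amount
    withLabel: low_mem  { memory = { 2.GB  * task.attempt } }
    withLabel: high_mem { memory = { 16.GB * task.attempt } }
}
```

A process then opts in with label 'low_mem' or label 'high_mem'.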
After upgrading Nextflow from 21.04.3 to 21.10.3, my workflow fails with the following error:
groovy.lang.MissingMethodException: No signature of method: nextflow.script.WorkflowDef.nr_records() is applicable for argument types: ([Ljava.lang.Object;) values: [[/groups/solve-rd/tmp10/projects/vip/nextflow/git/vip/test/output/test_snv/.nxf_work/92/1158ff50d7175029ff592f11a241f3/snv_prepared.vcf.gz.stats]]
at nextflow.script.WorkflowParamsResolver.invokeMethod(WorkflowDef.groovy:217)
def nr_records(statsFilePath) {
    ...
}
channel.fromPath(params.input) \
    | prepare \
    | branch {
        small: nr_records(it.last()) <= params.chunk_size
        large: true
    }
...
process prepare {
    ...
    output:
    tuple ..., path("${vcfOutputPath}.stats")
    ...
I've tried updating the caller and the callee in various ways without success. Does anyone have a clue what changed between Nextflow versions?
Hello, I installed Nextflow on the computing cluster, but I am not able to use the Docker containers required for running the DeepVariant analysis. I used the following command:
nextflow run nf-core/deepvariant -profile test,docker
N E X T F L O W ~ version 21.10.4
Launching nf-core/deepvariant
[lethal_tesla] - revision: 2b5486356c [master]
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
Pipeline Name : nf-core/deepvariant
Pipeline Version: 1.0
Bam file : https://github.com/nf-core/test-datasets/raw/deepvariant/testdata/NA12878_S1.chr20.10_10p1mb.bam
Bed file : https://github.com/nf-core/test-datasets/raw/deepvariant/testdata/test_nist.b37_chr20_100kbp_at_10mb.bed
Reference genome: hg19chr20
Fasta Ref : s3://deepvariant-data/genomes/hg19chr20/chr20.fa
Fasta Index : s3://deepvariant-data/genomes/hg19chr20/chr20.fa.fai
Fasta gzipped : s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz
Fasta gzipped Index: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz.fai
Fasta bgzip Index: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz.gzi
Max Memory : 6 GB
Max CPUs : 2
Max Time : 2d
Model : wgs
Output dir : results
Working dir : /scicore/home/cichon/thirun0000/my-pipelines/nf-core/deepvariant-master/work
Container Engine: docker
Container : nfcore/deepvariant:1.0
Current home : /scicore/home/cichon/thirun0000
Current user : thirun0000
Current path : /scicore/home/cichon/thirun0000/my-pipelines/nf-core/deepvariant-master
Script dir : /scicore/home/cichon/thirun0000/.nextflow/assets/nf-core/deepvariant
executor > local (1)
[- ] process > preprocess_bam -
[- ] process > make_examples -
[- ] process > call_variants -
[- ] process > postprocess_variants -
[e7/dec5cb] process > get_software_versions [100%] 1 of 1, failed: 1 ✘
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'get_software_versions'
Caused by:
Process get_software_versions
terminated with an error exit status (127)
Command executed:
echo 1.0 &> v_nf_deepvariant.txt
echo 21.10.4 &> v_nextflow.txt
ls /opt/conda/pkgs/ &> v_deepvariant.txt
python --version &> v_python.txt
pip --version &> v_pip.txt
samtools --version &> v_samtools.txt
lbzip2 --version &> v_lbzip2.txt
bzip2 --version &> v_bzip2.txt
scrape_software_versions.py &> software_versions_mqc.yaml
Command exit status:
127
Command output:
(empty)
Command error:
.command.run: line 279: docker: command not found
Work dir:
/scicore/home/cichon/thirun0000/my-pipelines/nf-core/deepvariant-master/work/e7/dec5cbd65b63e4fdd5ccced4f1e516
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
Fastagz file not found: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz
Fai file not found: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.fai
gzfai file not found: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz.fai
please specify --bed option (--bed bedfile)
https://github.com/nf-core/test-datasets/raw/deepvariant/testdata/NA12878_S1.chr20.10_10p1mb.bam not found
gzi file not found: s3://deepvariant-data/genomes/hg19chr20/chr20.fa.gz.gzi
Hello,
Is it possible to iterate over processes in DSL2, as in https://github.com/nextflow-io/patterns/blob/master/docs/feedback-loop.adoc?
What I tried doesn't work
(the goal is to annotate a VCF with several databases):
take:
entrees_vcf // tuple [info1, info2, info3, vcf]
bdds // collection of collections: [ [bdd1,index1,tag1,fields1], [bdd2,index2,tag2,fields2], ...]
main:
input_ch = input_vcf
.concat( bdds.map{it[0]}, Channel.of(0) ).flatten().toList()
.mix(
Annotation_bdd_vcf.out
.concat( bdds )
.toList()
.map { ch_tuple, bdds ->
if ( ch_tuple[4] < bdds.size() ) { //(bdds[iteration]) {
return [ ch_tuple[0], ch_tuple[1], ch_tuple[2], ch_tuple[3], bdds[iteration], ch_tuple[4] ].flatten()
}
}
)
Annotation_bdd_vcf( input_ch )
In the Annotation_bdd_vcf input, the last value (coming from Channel.of(0)) is incremented each time the process is executed.
… -M option) and the only thing that is mentioned in the documentation and the issues is using env.SLURM_CLUSTERS. Unfortunately, as I read in another issue (and have confirmed by trying), that is not valid in a process { withLabel: xxx {cant put env here}} selector block. Is my only option a PR?
nextflow.file.FileHelper - Can't check if specified path is NFS (1): /mop2-bucket-1/scratch
… publishDir with DSL2. This announcement suggested that there would be improvements here, and I found this that seems to provide some mechanism, but I was assuming that you should be able to define a process that does not need to worry about publishDir and let the workflow choose which outputs to publish? Is this possible? Any examples?
Is it possible to use a .count() channel as the size parameter for groupTuple?
I currently have:
bcftools_index_somatic.out.vcfs_w_csis.groupTuple(by: [0, 1, 2, 3], size: extract_chroms_from_bed.out.chroms_list.count()).set{ vcfs_by_patient }
But I get the error:
Value 'DataflowVariable(value=null)' cannot be used in in parameter 'size' for operator 'groupTuple' -- Value don't match: class java.lang.Integer
Is it possible to convert the .count() channel into something consumable by size:?
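The size option expects a plain integer, so a channel cannot be passed to it directly. One workaround is to attach the count to every item with combine and wrap the grouping key with groupKey, which carries the expected group size along with the key. The sketch below uses made-up stand-in data (patient names, file names, a three-chromosome list) in place of the real process outputs, and a single-element key instead of the four-element key in the question.

```groovy
nextflow.enable.dsl = 2

workflow {
    // Hypothetical stand-ins for chroms_list and vcfs_w_csis
    chroms = Channel.of('chr1', 'chr2', 'chr3')
    vcfs = Channel.of(
        ['patientA', 'A.chr1.vcf.gz'],
        ['patientA', 'A.chr2.vcf.gz'],
        ['patientA', 'A.chr3.vcf.gz'],
        ['patientB', 'B.chr1.vcf.gz'],
        ['patientB', 'B.chr2.vcf.gz'],
        ['patientB', 'B.chr3.vcf.gz']
    )

    vcfs_by_patient = vcfs
        // append the number of chromosomes to every item: [patient, vcf, n]
        .combine(chroms.count())
        // wrap the key so groupTuple knows how many items to expect per group
        .map { patient, vcf, n -> tuple(groupKey(patient, n as int), vcf) }
        .groupTuple()

    vcfs_by_patient.view()
}
```

With a composite key such as by: [0, 1, 2, 3], the same idea should apply by wrapping a list of those four elements in groupKey and grouping on that single wrapped element.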
… $params because this gives a comma-separated map, but something that returns a table. Any ideas?
… docker.fixOwnership = true (and install procps in your image, otherwise Nextflow will complain that you don't have ps installed). Best guess is that the mismatch in the ids for the owner of the image manifest file was preventing it from being accessed. Haven't tried it, but a quick peek at the source code suggests that setting NXF_OWNER will also make things work.