How can I use this as the 'size' part of a groupTuple? I tried:
aligned_bams
    .groupTuple(by: 0, size: lane_calc)
But it did not like it: it complained about the value type. All thoughts gladly received!
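A hedged sketch: groupTuple's size: option expects a plain integer, not a value carried in a channel, which would explain the type complaint. When the group size is computed per key, the usual pattern is groupKey(). Assuming lane_calc is a channel of [key, count] pairs:
aligned_bams
    .combine(lane_calc, by: 0)                                   // [key, bam, count]
    .map { key, bam, count -> tuple(groupKey(key, count), bam) }
    .groupTuple()                                                // each group closes once `count` items arrive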
.collect() the files from each tuple at the same time. Has anyone done this before? I've already read about each and combine, so my input channel is of the right format. The problem I have is with .collect(): I'm not sure how to incorporate it in the input tuple.
Hi, in DSL2, do functions allow the passing in of channels? I was hoping to clean up a workflow where a similar sequence of operations is applied to channels, but it looks like channels passed into functions are demoted to plain Java collections.
The error reads:
No signature of method: java.util.LinkedHashMap.collectFile() is applicable for argument types: (LinkedHashMap, Script_77e02758$_collect_file_tuples_closure1) values: [[storeDir:null, sort:hash], Script_77e02758$_collect_file_tuples_closure1@54e2fe]
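For what it's worth, channels passed into plain functions stay channels in DSL2; the trace above shows collectFile() being invoked on a LinkedHashMap (note the [storeDir:null, sort:hash] values), which suggests the function received its options map where it expected the channel. A minimal sketch under that assumption (the helper name and files are made up; a.txt and b.txt are assumed to exist):
def collect_file_tuples(ch) {
    // ch is still a dataflow channel here, so operators work as usual
    ch.collectFile(sort: 'hash') { id, f -> [ "${id}.txt", f.text ] }
}

workflow {
    samples = Channel.of( ['a', file('a.txt')], ['b', file('b.txt')] )
    collect_file_tuples(samples).view()
}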
process {
    withName: structural_alignment {
        if (task.exitStatus in 140..143) {
            errorStrategy = 'retry'
            cpus = { 2 * task.attempt }
            maxRetries = 5
        }
        else {
            errorStrategy = 'retry'
            maxRetries = 10
        }
    }
}
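A hedged rewrite: nextflow.config is evaluated once, before any task runs, so task.exitStatus is not defined where the if/else above sits; per-task conditions have to live inside closures, which Nextflow re-evaluates for each attempt. Note maxRetries takes a single static value, so the two different retry limits cannot be switched on the exit status; 'finish' below is just a placeholder for whatever the non-resource branch should do:
process {
    withName: structural_alignment {
        // evaluated per task, where task.exitStatus and task.attempt exist
        errorStrategy = { task.exitStatus in 140..143 ? 'retry' : 'finish' }
        maxRetries    = 5
        cpus          = { 2 * task.attempt }
    }
}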
Hi guys.
I have a simple doubt (I think). I just need to save my process output to an external directory, so I'm using the publishDir directive. However, I need two things: i) use variables (collected from the input tuple); ii) create the directory, because it doesn't exist.
I can run it successfully like this:
publishDir "${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification", mode: 'move'
input:
file STAR_bam_file from STAR_alignment_bam
set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3
But it doesn't create the directory. So I tried switching to this:
outDir = file("${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification")
outDir.mkdirs()
publishDir outDir, mode: 'move'
input:
file STAR_bam_file from STAR_alignment_bam
set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3
However, it fails with the error: No such variable: tcga_project
Any help with this situation? Thanks :)
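A hedged note on both attempts: the publishDir string is resolved lazily, per task, after the inputs are bound, so the first form is the right shape, and publishDir creates missing target directories itself. The second form fails because file(...) and mkdirs() run when the script is parsed, before tcga_project exists. If nothing shows up in the target directory, check that the process declares the files in an output: block, since only declared outputs are published. A sketch (the output pattern and script body are illustrative):
process RSEM_quantification {
    // resolved per task; the directory is created if it does not exist
    publishDir "${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification", mode: 'move'

    input:
    file STAR_bam_file from STAR_alignment_bam
    set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3

    output:
    file '*.results' into rsem_out_ch   // only declared outputs get published

    """
    rsem-calculate-expression --bam ${STAR_bam_file} rsem_ref ${tcga_barcode}
    """
}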
-profile docker at the command line level; instead I would like to have profile = docker in my nextflow.config file, such that it successfully imports all of the config options specified by the docker profile. Thanks!
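A hedged workaround: there is no profile = docker config setting, but a profile is just a named scope, so its contents can be promoted to the top level of nextflow.config to apply unconditionally:
// nextflow.config -- apply what `-profile docker` would enable
docker.enabled = true
// copy across anything else your docker profile sets, e.g.
// process.container = 'ubuntu:22.04'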
collectFile in my workflow to combine the results of several upstream processes, and setting storeDir. However, when I resume my workflow (using the AWS Batch executor), even though the upstream processes are correctly cached, the downstream process that consumes the output of collectFile always re-executes. I've tried sorting the input to collectFile, with no luck.
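A hedged suggestion: collectFile rewrites its output file on every run, so its timestamp changes even when the content does not, and the default cache mode hashes file attributes including the timestamp. Marking the downstream process with cache 'lenient' (path and size only) is a common workaround, together with a deterministic sort. A sketch (input and script are illustrative):
process consume_collected {
    // 'lenient' hashes file path and size only, ignoring the timestamp
    // that collectFile refreshes on every run
    cache 'lenient'

    input:
    path merged

    output:
    path 'summary.txt'

    """
    wc -l ${merged} > summary.txt
    """
}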
How come
Channel
    .fromFilePairs( "${params.reads}/*{1,2}*.fastq.gz", size: 2 )
    .into { paired_reads }
does not work when the file starts with a number (like 7126-MS-1_100_S1_L005_R1_001.fastq.gz and 7126-MS-1_100_S1_L005_R2_001.fastq.gz, or 21VR067049-211104_S1_L001_R1_001.fastq.gz and 21VR067049-211104_S1_L001_R2_001.fastq.gz)?
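A hedged explanation: the {1,2} alternation can match any 1 or 2 anywhere in the name, including the digits inside 7126 or 21VR067049, so the pair id inferred for the two mates differs and no pair of size 2 is formed. Anchoring the pattern to the read-number token usually fixes it:
Channel
    .fromFilePairs( "${params.reads}/*_R{1,2}_001.fastq.gz", size: 2 )
    .set { paired_reads }   // .set is enough for a single target channel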
.collate() in my pipeline, and then an implementation of this which works elegantly. My problem is with the -resume option: some processes get cached, but the rest are re-scheduled. I am assuming the order changes from run to run before the collate operator, so Nextflow assumes the input has changed. Could this be it?
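Quite possibly. Channel order is not deterministic when parallel tasks feed the same channel, so collate can batch different items together on each run and the downstream cache keys change. Imposing an order first should stabilise it; a minimal sketch, assuming [id, file] tuples and a batch size of 10:
upstream_ch
    .toSortedList { a, b -> a[0] <=> b[0] }   // deterministic order by id
    .flatMap()                                // re-emit the items one by one
    .collate(10)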
Hello group members, I am trying a sample Nextflow script using Azure Blob storage for the work directory. I am getting an error and can't tell whether my setup is the cause; I'd appreciate your response. The error appears when I run my sample script against Azure Blob storage like this:
nextflow run tutorial.nf -w az://nextflow-work/work
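A hedged pointer, in case the error is about credentials: az:// work directories need the azure storage scope configured, roughly like this (placeholder values):
azure {
    storage {
        accountName = '<storage-account-name>'
        accountKey  = '<storage-account-key>'
    }
}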
How do I deal with an empty path channel in a process (DSL2)? I saw a solution for an empty value channel (https://github.com/nextflow-io/patterns/blob/master/docs/process-when-empty.adoc)
nextflow.enable.dsl=2

params.a = '*.txt'
params.b = ''
params.c = ''

process test {
    tag "test"
    echo true

    input:
    path file1
    path file2
    val value1

    """
    echo $file1
    echo $file2
    echo $value1
    """
}

workflow {
    a_ch = Channel.fromPath(params.a)
    b_ch = Channel.fromPath(params.b)
    test(a_ch, b_ch, params.c)
}
The Error message:
N E X T F L O W ~ version 21.04.3
Launching `code/test2.nf` [modest_colden] - revision: b4078eef6a
Missing `fromPath` parameter
a_ch = params.a ? Channel.fromPath(params.a) : []
maybe?
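Note an empty channel (or []) means the process never fires at all. The same patterns repo has an optional-input variant that passes a placeholder file instead, so the process still runs; a sketch (assets/NO_FILE is an assumed empty placeholder the script can test for):
b_ch = params.b ? Channel.fromPath(params.b) : Channel.fromPath("$projectDir/assets/NO_FILE")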
Hi,
I'm trying to use Docker in a Nextflow script and I'm getting the following error:
"connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Fr
un%2Fdocker.sock/v1.24/containers/create?name=nxf-8IoBF1Fj7HBmfAtOGRGpW5Fp": dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'."
Does anyone have any advice on how to handle this?
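A hedged pointer: permission denied on /var/run/docker.sock usually means the user running Nextflow isn't allowed to talk to the Docker daemon. On most Linux hosts, adding the user to the docker group and logging out and back in fixes it:
sudo usermod -aG docker $USER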
Hello,
I am trying to run Nextflow inside a Docker container and, from there, run other containers on the host machine. For that I have built a Docker image with Nextflow and Docker inside, like so:
FROM nextflow/nextflow
# Adds ability to run docker in the host
RUN amazon-linux-extras install docker -y
WORKDIR /nextflow
But when running the line below, it fails with Command error: /bin/bash: .command.sh: No such file or directory
docker run -v /var/run/docker.sock:/var/run/docker.sock custom/nextflow bash -c 'nextflow run hello -with-docker nextflow/examples'
Does anyone have an idea on how to fix this?
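A hedged guess at the cause: mounting docker.sock gives you sibling containers, not nested ones, so the bind paths Nextflow requests are resolved on the host; the work directory inside your Nextflow container does not exist at the same path on the host, hence the missing .command.sh. Mounting a work directory at the same absolute path on both sides is the usual fix (the path is arbitrary):
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/nxf-work:/tmp/nxf-work \
    custom/nextflow \
    bash -c 'nextflow run hello -w /tmp/nxf-work -with-docker nextflow/examples'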
publish command? I understand it is asynchronous by default, but I have a pipeline that runs for 7-10 days and some of the useful results are complete after the first day. It would be good if I could request that the day-1 results are "published" soon after completion.
init_Config (3) terminated for an unknown reason -- Likely it has been terminated by the
Hello everyone,
Is it possible to update process information using withLabel on the command line? So something like:
nextflow run hello_world.nf -process.withLabel:hello_world.cpus 2
I know that something as simple as this works:
nextflow run hello_world.nf -process.cpus 2
but I want to target a particular process.
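A hedged workaround: the -process.* command-line options don't take selectors, but a throwaway config file passed with -c achieves the same thing:
// extra.config -- per-label override, used as: nextflow run hello_world.nf -c extra.config
process {
    withLabel: hello_world {
        cpus = 2
    }
}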
Hello everyone, I am using Singularity images from BioContainers for most of my processes; they are pre-downloaded and used directly (as the cluster nodes don't have internet access). However, I am having issues with the Prokka image from that site. I pull the images from BioContainers with singularity pull and use the image directly in the Nextflow pipeline. However, it seems that Nextflow doesn't want to use the Prokka image directly; it rather wants to download it from Docker. Note that it works fine for all other processes whose images have been downloaded the same way. Does anyone know anything about this image? https://biocontainers.pro/tools/prokka
(Note: I have tried several versions of the Singularity image; all give the same result.)
My nextflow.config:
params.container_dir = 'path/to/images'

singularity {
    enabled = true
    autoMounts = true
}
The container is called like this in the prokka process:
process PROKKA {
    container "${container_dir}/prokka:1.14.5--pl526_1"
Thanks in advance!
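A hedged observation: the config defines params.container_dir, but the directive references ${container_dir} without the params. prefix. Also, Nextflow/Singularity only use the container string as a local image when it points at an existing file; singularity pull typically writes prokka_1.14.5--pl526_1.sif, so a path ending in prokka:1.14.5--pl526_1 looks like a registry name to pull. A sketch of the directive under those assumptions (input, output, and script are illustrative):
process PROKKA {
    // params. prefix, plus the actual filename produced by `singularity pull`
    container "${params.container_dir}/prokka_1.14.5--pl526_1.sif"

    input:
    path assembly

    output:
    path 'prokka_out'

    """
    prokka --outdir prokka_out ${assembly}
    """
}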
Hello,
I am trying to figure out how to separate and categorize some files based on string values in their names. For example, if the files are named:
X_filename_f0_r1-1_1_R1.fastq
X_filename_f0_r0-1_1_R1.fastq
X_filename_f0_r0-1_2_R1.fastq
X_filename_f0_r1-0_3_R1.fastq
X_filename_f1_r1-1_1_R1.fastq
X_filename_f1_r0-1_1_R1.fastq
X_filename_f1_r0-1_2_R1.fastq
X_filename_f1_r1-0_3_R1.fastq
X_filename_f2_r1-1_1_R1.fastq
X_filename_f2_r0-1_1_R1.fastq
X_filename_f2_r0-1_2_R1.fastq
X_filename_f2_r1-0_3_R1.fastq
X_filename_f3_r1-1_1_R1.fastq
X_filename_f3_r0-1_1_R1.fastq
X_filename_f3_r0-1_2_R1.fastq
X_filename_f3_r1-0_3_R1.fastq
How can I separate the files based on f0, f1, f2, f3 and have each category saved under a different name? Then I want to separate the same files based on the number after the "r" but before the underscore. There are a lot of different numbers, so I can't hard-code in the script which numbers to select. The same files will then be separated again based on the number before "_R1". So in the end, I will have categories of files based on the f-number, then a subcategory based on the r-number, then another subcategory under that based on the final number.
Thanks
Tj
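A hedged sketch of one way to do this: parse the three tokens out of each name with a regex and group on them, assuming the names follow X_filename_f<N>_r<A-B>_<M>_R1.fastq:
Channel
    .fromPath('*.fastq')
    .map { f ->
        def m = (f.name =~ /_f(\d+)_r([\d-]+)_(\d+)_R1/)[0]
        tuple(m[1], m[2], m[3], f)   // [f-number, r-token, final number, file]
    }
    .groupTuple(by: [0, 1, 2])
    .view()
Each emitted item is then one f/r/number category with its list of files, which can be written out or dispatched under whatever name you build from the key.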
Hi all,
The challenge in my code is to create the input for pairwise comparisons. Based on the comparison list, I would like to select the samples that are in each group. I have tried the following, but not with complete success, as I receive a DataflowBroadcast around DataflowStream[?]:
bams = Channel.from(["sampleX_1", "groupX", "bamX_1", "baiX_1"],
["sampleX_2", "groupX", "bamX_2", "baiX_2"],
["sampleY_1", "groupY", "bamY_1", "baiY_1"],
["sampleY_2", "groupY", "bamY_2", "baiY_2"],
["sampleZ_1", "groupZ", "bamZ_1", "baiZ_1"],
["sampleZ_2", "groupZ", "bamZ_2", "baiZ_2"])
comparison_list = Channel.from(["groupX", "groupY"],["groupX", "groupZ"])
group_input = comparison_list.map { it ->
    def bam_by_group = bams.groupTuple(by: 1)
    def compare1 = it[0]
    def compare2 = it[1]
    def group_ctrl = bam_by_group.first { it[1] == compare1 }
    def group_case = bam_by_group.first { it[1] == compare2 }
    def group_input = group_ctrl.combine(group_case)
    return group_input.view() // - desired outcome, see below
}.view() // results in: DataflowBroadcast around DataflowStream[?]
I would like the result to be in this format:
// [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleY_1, sampleY_2], groupY, [bamY_1, bamY_2], [baiY_1, baiY_2]]
// [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleZ_1, sampleZ_2], groupZ, [bamZ_1, bamZ_2], [baiZ_1, baiZ_2]]
The group_input.view() inside the closure does print this, but how can I actually use it?
Thanks!
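A hedged rework: channel operators cannot be called inside a map { } closure; returning a channel from it is exactly what prints as DataflowBroadcast around DataflowStream[?]. The matching can instead be expressed with combine and filter. A sketch assuming DSL2, where a channel may be consumed more than once:
grouped = bams.groupTuple(by: 1)   // [[samples], group, [bams], [bais]]

group_input = comparison_list
    .combine(grouped)              // [c1, c2, s1, g1, b1, i1]
    .filter { c1, c2, s1, g1, b1, i1 -> g1 == c1 }
    .combine(grouped)              // append candidate case rows
    .filter { c1, c2, s1, g1, b1, i1, s2, g2, b2, i2 -> g2 == c2 }
    .map    { c1, c2, s1, g1, b1, i1, s2, g2, b2, i2 ->
              tuple(s1, g1, b1, i1, s2, g2, b2, i2) }
    .view()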
On a related note, I'm trying to group process outputs together for downstream paired processing. I have a digest file that looks like
sample_id,read1,read2,control_id
cell1_ctl,A.fq.gz,B.fq.gz,
cell1_trt1,C.fq.gz,D.fq.gz,cell1_ctl
cell1_trt2,E.fq.gz,F.fq.gz,cell1_ctl
cell2_ctl,G.fq.gz,H.fq.gz,
cell2_trt1,I.fq.gz,J.fq.gz,cell2_ctl
cell2_trt2,K.fq.gz,L.fq.gz,cell2_ctl
and I have a basic pipeline that drives from per-column channels and looks like
aln_bams = bwamem(samples_ch, read1_ch, read2_ch)
filt_bams = mapq_filter(aln_bams, 30)
but now I need to get to bam channels that look like
[cell1_trt1.bam, cell1_trt2.bam, cell2_trt1.bam, cell2_trt2.bam]
[cell1_ctl.bam, cell1_ctl.bam, cell2_ctl.bam, cell2_ctl.bam]
the issue is I can't find a way to join the filtered bams with the original IDs. In DSL=1, I could merge; but using DSL=2, I need an index on which to join, and there's no such thing as .mapWithIndex.
Any ideas?
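The usual DSL2 answer is to avoid positional merges entirely: carry sample_id through every process as part of the tuple, so no index is needed. A sketch (filter threshold hard-coded, script body illustrative):
process mapq_filter {
    input:
    tuple val(sample_id), path(bam)

    output:
    tuple val(sample_id), path("${sample_id}.filt.bam")

    """
    samtools view -b -q 30 ${bam} > ${sample_id}.filt.bam
    """
}
Downstream, filt_bams can then be joined back onto the digest rows with join, both keyed on sample_id, and the trt/ctl split falls out of the control_id column.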
Is there any usage example for this? nextflow-io/nextflow#524
I did
process {
    cpus = 4
    withLabel: test2 {
        publish_dir = 'result_data'
    }
}
And got the following warnings:
WARN: Access to undefined parameter `publish_dir` -- Initialise it to a default value eg. `params.publish_dir = some_value`
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
WARN: Process `test2` publishDir path contains a variable with a null value
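A hedged reading of the warnings: publish_dir is not a process directive, so the withLabel setting never reaches anything, and the pipeline script apparently reads params.publish_dir for its publishDir path. Either supply the parameter, or set the real directive on the labelled processes:
// option 1: the parameter the script seems to expect
params.publish_dir = 'result_data'

// option 2: the actual publishDir directive for that label
process {
    cpus = 4
    withLabel: test2 {
        publishDir = [path: 'result_data', mode: 'copy']
    }
}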