Paolo Di Tommaso
@pditommaso
Then it falls back to loading from a remote URL if the repository hasn't been cloned with submodules
if you want to use submodules, put manifest.gitmodules = true in the config
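e.g. a minimal nextflow.config sketch (double-check the exact flag name against the docs for your Nextflow version):

manifest {
    gitmodules = true   // clone the repository's git submodules when the pipeline is pulled
}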
Phil Ewels
@ewels
ok nice, will try this. Thanks!
Paolo Di Tommaso
@pditommaso
welcome!
Sofia Stamouli
@sofiastam_gitlab

@AlaaBadredine_twitter I get the following error:

Command error:
.command.sh: line 2: /home/work/42/10d5a92402990314bf2ae1b15e4eb5/File1_spades-contigs.fa: Permission denied

Work dir:
/home/scripts/work/3b/6b9118eb30615b7d3f9cffc4aac7f3

It seems that the script is not reaching the update command. I tried to do the following:

        script:
        """
        filepath_contigs.each {
        sqlite3 ${params.db} "update ${params.work_queue} set ${params.status}='finished' where ${params.result_table}='nextflow_spades'"
        sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}=$it,${params.log}='$filepath_logs' where ${params.status}='pending'"
        }
        """

But I just get null in ${params.contigs}, and it doesn't seem to make any sense

Alaa Badredine
@AlaaBadredine_twitter
could you paste the .command.sh here?
Sofia Stamouli
@sofiastam_gitlab

@AlaaBadredine_twitter

#!/bin/bash -e

/home/scripts/work/42/10d5a92402990314bf2ae1b15e4eb5/File1_spades-contigs.fa

The nextflow.config looks like:

docker.runOptions='-u $(id -u):$(id -g)'
docker.enabled = true

process.shell = ['/bin/bash','-e']

Can I change the write permissions in the work directory?

Alaa Badredine
@AlaaBadredine_twitter
can your Docker container access that path?
because it looks like Docker does not have permission to go there
Sofia Stamouli
@sofiastam_gitlab
Docker was able to access the path before I used 'each' in the script, and I do not see why it should not be possible now. But I agree that it seems like it does not have permission to go there.
Raoul J.P. Bonnal
@helios
For those developing on Windows 10: are you using IntelliJ? How do you run make compile? I mean, what software do I need to install to build Nextflow from source? (Not using Ubuntu WSL)
Alaa Badredine
@AlaaBadredine_twitter
well it shouldn't be a problem even after using the each()... this is intriguing to me
Paolo Di Tommaso
@pditommaso
@helios last time I used windows was 2009 .. :satisfied:
Raoul J.P. Bonnal
@helios
@pditommaso the last time I used a Mac was 4 yrs ago; I got tired of replacing them because they were not robust enough for me :)
Funny enough, which would be the best laptop for development, considering Linux as the base OS?
Paolo Di Tommaso
@pditommaso
dell xps ubuntu edition?
Raoul J.P. Bonnal
@helios
we have a couple of XPS machines and they are very nice; now I am running a Surface Pro, which is nice as well
Raoul J.P. Bonnal
@helios
in any case, running Gradle from IntelliJ's console worked
Ólavur Mortensen
@olavurmortensen

Nextflow doesn't seem to know that the fromList method exists. I'm running the example from the docs:

Channel
    .fromList( ['a', 'b', 'c', 'd'] )
    .view { "value: $it" }

And getting an error:

ERROR ~ No signature of method: static nextflow.Channel.fromList() is applicable for argument types: (ArrayList) values: [[a, b, c, d]]
Possible solutions: from([Ljava.lang.Object;), from(java.util.Collection), fromPath(java.lang.Object), fromSRA(java.lang.Object)

 -- Check script 'test.nf' at line: 4 or see '.nextflow.log' file for more details
Ólavur Mortensen
@olavurmortensen
This works with Channel.from. To me it seems like fromList is not supported.
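For reference, the same example rewritten with Channel.from, which the error message lists as an available signature, does work (my guess is that fromList simply needs a newer Nextflow release than the one I have installed):

Channel
    .from( ['a', 'b', 'c', 'd'] )
    .view { "value: $it" }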
Sofia Stamouli
@sofiastam_gitlab

@AlaaBadredine_twitter I changed the file permissions for the two files I am running the pipeline on. That should fix the problem of Docker getting access to the path, but I still think it is very strange.

However, it seems that even with 'each', the files are just appended to the command. The .command.sh looks like:

#!/bin/bash -e
['/home/scripts/work/5e/569093003ea6fa1e8ca2b4621dd515/File1-contigs.fa', '/home/scripts/work/69/bf6c8095be25fc10bee8b401cabfa5/File2-contigs.fa']

Any ideas? Should I write my own function instead?

Alaa Badredine
@AlaaBadredine_twitter
Is this the full content of .command.sh?
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter Yes
Alaa Badredine
@AlaaBadredine_twitter
but you see, there are no commands; what is it supposed to run?
is this still your script block?
script:
        """
        filepath_contigs.each {
        sqlite3 ${params.db} "update ${params.work_queue} set ${params.status}='finished' where ${params.result_table}='nextflow_spades'"
        sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}=$it,${params.log}='$filepath_logs' where ${params.status}='pending'"
        }
        """
because it should be more like
script:
        filepath_contigs.each {
        """
        sqlite3 ${params.db} "update ${params.work_queue} set ${params.status}='finished' where ${params.result_table}='nextflow_spades'"
        sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}=$it,${params.log}='$filepath_logs' where ${params.status}='pending'"
        """
        }
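note that each returns the collection itself rather than the strings built inside it, which is also why the bare list ends up in .command.sh. A rough, untested sketch that actually joins the generated commands into one script string would be:

script:
        def updates = filepath_contigs.collect { contig ->
            """sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}='$contig',${params.log}='$filepath_logs' where ${params.status}='pending'" """
        }.join('\n')
        """
        $updates
        """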
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter Sorry for the confusion here. This is the script block:
    script:

        filepath_contigs.each {contig ->
        """
        #sqlite3 ${params.db} "update ${params.work_queue} set ${params.status}='finished' where ${params.result_table}='nextflow_spades'"
        sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}=$contig,${params.log}='$filepath_logs' where ${params.status}='pending'"

        """
        }
Alaa Badredine
@AlaaBadredine_twitter
ok, so let's try to check the whole thing again...
there are a few things that you can really improve in your script
this is what you pasted at the beginning
process SPAdesAssembly {


        publishDir "${params.outdir}/SPAdesContigs", mode:"copy"


        input:
        set val(sample_name), file(raw_data) from spades_ch2


        output:
        set val(sample_name),file("${sample_name}_spades-contigs.fa") into  spades_assembly_results
        set val(sample_name),file("${sample_name}_spades.log") into spades_assembly_logs

        script:
        """
        spades.py --only-assembler -s $raw_data -t ${threads} -o spades_output
        mv spades_output/contigs.fasta ${sample_name}_spades-contigs.fa
        mv spades_output/spades.log ${sample_name}_spades.log
        """
}


process get_filepath {

        publishDir "${params.outdir}/filePath", mode:"copy"

        input:
        set val(sample_name),file (spades_contigs) from spades_assembly_results
        set val(sample_name),file (spades_logs) from spades_assembly_logs

        output:
        file ("${sample_name}-contigs_filepath") into contigs_filepath_ch
        file ("${sample_name}-logs_filepath") into spades_filepath_ch

        script:
        """
        readlink -f ${spades_contigs} > ${sample_name}-contigs_filepath
        readlink -f ${spades_logs} > ${sample_name}-logs_filepath
        """

}


contigs_filepath_ch.map { it.text.trim() }.set {contigs_link}
spades_filepath_ch.map {it.text.trim() }.set {logs_link}


process update_nextflow_spades {


        input:
        val filepath_contigs from contigs_link
        val filepath_logs from logs_link
        file(db) from db_channel

        script:
        """
       sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}='${filepath_contigs}',${params.log}='${filepath_logs}' where ${params.status}='pending'"
        """

}
  1. in the process update_nextflow_spades, the input file(db) from db_channel is never used
  2. You are creating many processes when you could do all of this in a single one; I mean, creating a process just to read links and then passing it to another process is not worth it... unless you have a reason
you could instead do something like this by combining all three of them
process SPAdesAssembly {


        publishDir "${params.outdir}/SPAdesContigs", mode:"copy"


        input:
        set val(sample_name), file(raw_data) from spades_ch2
        file(db) from db_channel


        output:
        set val(sample_name),file("${sample_name}_spades-contigs.fa") into  spades_assembly_results
        set val(sample_name),file("${sample_name}_spades.log") into spades_assembly_logs

        script:
        """
        spades.py --only-assembler -s $raw_data -t ${threads} -o spades_output
        mv spades_output/contigs.fasta ${sample_name}_spades-contigs.fa
        mv spades_output/spades.log ${sample_name}_spades.log

        sqlite3 ${params.db} "update ${params.nextflow_spades} set ${params.contigs}='\$(readlink -f ${sample_name}_spades-contigs.fa)',${params.log}='\$(readlink -f ${sample_name}_spades.log)' where ${params.status}='pending'"
        """
}
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter Thank you for taking the time to help with my question :)
You are right about the unnecessary processes. I will try your suggestion now. Clean and smooth.
Alaa Badredine
@AlaaBadredine_twitter
you're welcome!
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter Another reason I had so many unnecessary processes is that I wanted one Docker image per process in the nextflow.config file.
Alaa Badredine
@AlaaBadredine_twitter
I understand your point of view. But then again, is it worth it to spawn one Docker container just to read links for a file? If I need to perform small tasks like echo or cat or similar stuff that doesn't require heavy computation, I'd rather put them in one process
you get cleaner and less complex code, since it handles all the input for the same sample at the same time, for example
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter I agree. I am getting there slowly!
Alaa Badredine
@AlaaBadredine_twitter
You will! no worries
Gregor Sturm
@grst
Hi,
how can I process a channel in chunks?
Say I have a channel files containing 10,000 output files from the previous process.
How can I process them in the next process in chunks of 100?
In the documentation I only found splitText, which works on files but not on channels.
Sofia Stamouli
@sofiastam_gitlab
@AlaaBadredine_twitter But since I cannot define multiple Docker images for one process, how can I use both the sqlite3 image and the spades image in the same process?
Alaa Badredine
@AlaaBadredine_twitter
Well, I am no expert on Docker :( I am sorry, but I can't answer this. However, perhaps there is a way to call the sqlite3 image from another image?
if yes, then you only need to call that image inside your script, as you would on your command line in bash
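otherwise, the usual approach is to build one custom image that bundles both tools and assign it per process with a selector in nextflow.config; a sketch, where spades-sqlite3 is a hypothetical image you would have to build yourself:

process {
    withName: SPAdesAssembly {
        container = 'spades-sqlite3:latest'  // hypothetical image containing both spades.py and sqlite3
    }
}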
Jerry Jeyachandra
@jerdra
@grst i think you can use the collate operator https://www.nextflow.io/docs/latest/operator.html#collate
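a minimal sketch of what that looks like (the glob path is just a made-up example):

Channel
    .fromPath('results/*.txt')  // stand-in for the channel of 10,000 files
    .collate(100)               // emits lists of 100 items each
    .set { file_chunks }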
Gregor Sturm
@grst
Thanks a lot, @jerdra!
Steven P. Vensko II
@spvensko
Is it possible to have multiple publishDirs for a process, such that some output files go to one publishDir and others go to a secondary publishDir?
For context, I am running Abra2 and it produces two output BAMs, and I'd like to store them in different directories (the normal output BAM in a directory associated with the normal SRA run ID, and the tumor output BAM in a directory associated with the tumor SRA run ID).
Jerry Jeyachandra
@jerdra
@spvensko yes you can use publishDir twice with the "pattern" option for each output
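something like this skeleton (directory names and file name patterns are made up):

process abra2 {
    // each output file is published only to the directory whose pattern it matches
    publishDir "${params.outdir}/normal", mode: 'copy', pattern: '*_normal.bam'
    publishDir "${params.outdir}/tumor",  mode: 'copy', pattern: '*_tumor.bam'

    // input/output/script blocks as usual
}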
Steven P. Vensko II
@spvensko
Ah, I overlooked the pattern option. Thanks @jerdra!
Raoul J.P. Bonnal
@helios

@pditommaso @fstrozzi my tentative new Channel.fromDBSql helios/nextflow@d6bf631
the channel name could be changed to fromDBSelect, forcing some internal check to avoid SQL commands that can modify the DB. I followed the Channel.fromSRA implementation. The proper driver(s) can be downloaded and encapsulated using NXF_GRAB. I need to write some examples and a bit of documentation. The channel returns a groovy.sql.GroovyResultSetExtension if there is no closure, or you can directly manipulate the query results by passing a closure, such as:

query = """SELECT * FROM libraries JOIN annotations ON libraries.library_id=annotations.library_id JOIN files ON libraries.library_id=files.library_id WHERE libraries.project='${params.project}' AND libraries.group='${params.group}'"""
if (params.filter) {
    query = query + """ AND ${params.filter}"""
}

Channel.fromDBSql(query, dbUrl: "jdbc:postgresql://${your_server}/s${your_db}", dbUser: "guest", dbPassword: "${your_password}", dbDriver: "org.postgresql.Driver") { [it.library_id, it.path] }
       .subscribe { println it }

then let's discuss with whoever is interested