These are chat archives for nextflow-io/nextflow

18th
May 2016
Mike Smoot
@mes5k
May 18 2016 17:17
I'm wondering if there's a way to get an index out of a channel? I'd like to pair the value in a channel with an index to create a new tuple of (index, value) in the channel. Is that possible?
Paolo Di Tommaso
@pditommaso
May 18 2016 17:23
Use a map with a counter defined in the script global context
Mike Smoot
@mes5k
May 18 2016 17:25
Ok, can do. Thanks!
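
A minimal sketch of the suggestion above, assuming a small channel of made-up values (`alpha`, `beta`, `gamma`). The variable name `counter` is hypothetical. The counter is declared at the top level of the script and incremented inside the `map` closure, so each value is paired with the index at which it was emitted:

```groovy
// Counter declared in the script's global context, outside any process
def counter = 0

Channel
    .from('alpha', 'beta', 'gamma')                // illustrative values only
    .map { value -> tuple(counter++, value) }     // emits (index, value) pairs
    .subscribe { println it }
```

The index reflects emission order in the source channel, so it is only as stable as that order.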
Mike Smoot
@mes5k
May 18 2016 17:39
@pditommaso I just noticed in cytoscape.js.dag.template.html on line 31 of master that you've got a redundant http assignment. I'd fix this, but I'm not sure if your intent is to always use https for cytoscape.js or toggle like the others.
Paolo Di Tommaso
@pditommaso
May 18 2016 18:42
bug!
thanks for pointing it out
Jason Byars
@jbyars
May 18 2016 20:26
is there a quick way to dump the effective values for all of the aws configuration options?
Paolo Di Tommaso
@pditommaso
May 18 2016 20:27
I think they are logged in the .nextflow.log file
(when used)
Jason Byars
@jbyars
May 18 2016 20:28
Is that something that shows up with the -with-trace option?
meeting afk
Paolo Di Tommaso
@pditommaso
May 18 2016 20:29
let me check
You should find this in the log file
however not sure that's what you are looking for
Jason Byars
@jbyars
May 18 2016 21:19
I get AWS S3 config details: {region=us-east-1} from the log file. I want the dump of this http://www.nextflow.io/docs/latest/config.html#config-aws
If I haven't changed any values for the advanced client options, where are the default values coming from?
Paolo Di Tommaso
@pditommaso
May 18 2016 21:26
Those settings are available only if you define them in the config file
Jason Byars
@jbyars
May 18 2016 21:35
ok, but if I don't define them in config file, are there any default values nextflow plugs in, or are the AWS defaults used?
Mike Smoot
@mes5k
May 18 2016 21:36
@pditommaso the spread operator is using the now deprecated just operator in 0.19. Should I enter tickets for minor stuff like this or just let you know here?
Paolo Di Tommaso
@pditommaso
May 18 2016 21:36
nope it just relies on aws-s3-sdk defaults
Jason Byars
@jbyars
May 18 2016 21:38
great, then I can debug this. I'm getting occasional socket timeouts when fetching files at the beginning of runs.
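
For context, a sketch of how the advanced client options on the config page linked above can be set explicitly instead of falling back to the SDK defaults. The option names are those I understand the `aws.client` scope to document; the values are purely illustrative assumptions, not recommendations:

```groovy
// nextflow.config: illustrative values only
aws {
    region = 'us-east-1'
    client {
        connectionTimeout = 10000   // milliseconds to wait when opening a connection
        socketTimeout     = 60000   // milliseconds to wait for data on an open connection
        maxErrorRetry     = 5       // retry attempts for failed retryable requests
    }
}
```

Anything left out of this block is not set by nextflow and stays at the AWS SDK default.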
Paolo Di Tommaso
@pditommaso
May 18 2016 21:38
@mes5k oops, you are right. I'm losing control of the codebase :/
yes, please report that on github. I will put together a minor release tomorrow
Mike Smoot
@mes5k
May 18 2016 21:40
no problem and no rush, just wanted to let you know!
Paolo Di Tommaso
@pditommaso
May 18 2016 21:40
@mes5k Thanks!
@jbyars You should be able to debug that by adding the -debug com.amazonaws.somePackage option on the nxf command line
Jason Byars
@jbyars
May 18 2016 21:42
and once again, debugging becomes considerably easier... Thanks!
Paolo Di Tommaso
@pditommaso
May 18 2016 21:43
that's a tool made for developers!
Jason Byars
@jbyars
May 18 2016 22:07
now you just need to make the front page :clap:
Paolo Di Tommaso
@pditommaso
May 18 2016 22:07
what do you mean ? :)
Jason Byars
@jbyars
May 18 2016 22:09
get a nextflow article to show up on the front page of everybody's reddit feed.
that is the new achievement metric right?
Paolo Di Tommaso
@pditommaso
May 18 2016 22:11
actually I'm not fond of reddit
Jason Byars
@jbyars
May 18 2016 22:11
fair enough, it's not what it used to be.
Mike Smoot
@mes5k
May 18 2016 22:25

Paolo, I'm seeing some strange behavior with this simple example.

ch = Channel.from(tuple("A", "/Users/msmoot/code/nextflow_resequencing/marge.fa"),
                  tuple("B", "/Users/msmoot/code/nextflow_resequencing/marge.fa"))

process testIt {

    input:
    set sample, file(fasta) from ch

    output:
    file("${sample}.fa") into sample_fasta

    script:
    """
    cp ${fasta} ${sample}.fa
    """
}

I expect the 'fasta' file to be the actual file path, but 'input.1' is passed in. I swear I've had similar things working elsewhere, so I'm not sure what I've done wrong...

Paolo Di Tommaso
@pditommaso
May 18 2016 22:28
actually, as long as you specify a file handle instead of a file name, the file is linked that way
just use
  input:
    set sample, file('marge.fa') from ch
Mike Smoot
@mes5k
May 18 2016 22:30
What happens when it's not 'marge.fa' each time, but something different?
The second element of the tuple will always be a different file in my case. I'm not sure how I reference it from the channel.
Paolo Di Tommaso
@pditommaso
May 18 2016 22:33
from the beginning the idea was to always use the same name, to avoid having to parametrise the script
even when source files have different names
I find it a very bad practice to use the file name to carry metadata
however, you can still modify the name as you want, for example:
  input:
    set sample, file("${sample}.fa") from ch
or
  input:
    set sample, file('*') from ch
to keep the source name
Mike Smoot
@mes5k
May 18 2016 22:36
Hmmmm, I'm clearly missing something.
Paolo Di Tommaso
@pditommaso
May 18 2016 22:36
what
I get your problem now
Channel.from(tuple("A", "/Users/msmoot/code/nextflow_resequencing/marge.fa"),
                  tuple("B", "/Users/msmoot/code/nextflow_resequencing/marge.fa"))
the second element of the tuple is not a file but just a string
Paolo Di Tommaso
@pditommaso
May 18 2016 22:41
thus the process will create a new file containing the /Users/msmoot/code/nextflow_resequencing/marge.fa string
Mike Smoot
@mes5k
May 18 2016 22:41

Ok, this example works for me:

ch = Channel.from(tuple("A", "/Users/msmoot/code/nextflow_resequencing/marge.fa"),
                  tuple("B", "/Users/msmoot/code/nextflow_resequencing/homer.fa"),
                  tuple("C", "/Users/msmoot/code/nextflow_resequencing/bart.fa"))

process testIt {

    input:
    set sample, fasta from ch

    output:
    file("${sample}.fa") into sample_fasta

    script:
    """
    cp ${fasta} ${sample}.fa
    """
}

What confuses me is that it works with

set sample, fasta from ch

but not

set sample, file(fasta) from ch
Paolo Di Tommaso
@pditommaso
May 18 2016 22:43
have you read my previous reply? I mean
thus the process will create a new file containing the /Users/msmoot/code/nextflow_resequencing/marge.fa string
Mike Smoot
@mes5k
May 18 2016 22:44
Yes, we were typing at the same time I think.
Paolo Di Tommaso
@pditommaso
May 18 2016 22:45
ok,
set sample, fasta from ch
the above seems to work because it just handles fasta as a string
instead
set sample, file(fasta) from ch
in this case it expects fasta to be a file; since it received a string, it will create a file containing that string
to sum up, you should have created the channel like this:
ch = Channel.from(tuple("A", file("/Users/msmoot/code/nextflow_resequencing/marge.fa")),
                  tuple("B", file("/Users/msmoot/code/nextflow_resequencing/homer.fa")),
                  tuple("C", file("/Users/msmoot/code/nextflow_resequencing/bart.fa")))
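
Putting the pieces of this exchange together, a sketch of the whole script in the syntax used in this conversation. The paths under `/data` are hypothetical placeholders. Because each path is wrapped in `file()`, the channel carries file objects, and `file(fasta)` in the input block stages the real file instead of writing the path string into a new file:

```groovy
// Hypothetical paths: wrap each one in file() so the tuple holds a file, not a string
ch = Channel.from(
    tuple("A", file("/data/marge.fa")),
    tuple("B", file("/data/homer.fa")),
    tuple("C", file("/data/bart.fa")))

process testIt {

    input:
    // fasta is staged into the task directory as a real file
    set sample, file(fasta) from ch

    output:
    file("${sample}.fa") into sample_fasta

    script:
    """
    cp ${fasta} ${sample}.fa
    """
}
```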
Mike Smoot
@mes5k
May 18 2016 22:49
Yes, that finally makes sense to me!
Paolo Di Tommaso
@pditommaso
May 18 2016 22:49
:+1:
Mike Smoot
@mes5k
May 18 2016 22:49
Thanks for your patience with my dumb questions! :)
Paolo Di Tommaso
@pditommaso
May 18 2016 22:49
actually I contributed to the confusion
when I said the file name is lost, that's not true!
Jason Byars
@jbyars
May 18 2016 23:00
Up for one more dilemma? I have a cufflinks job that creates a subfolder with several result files in it. I want to retain the subfolder in the output. When I use publishDir everything is flattened into the folder specified by publishDir. What is the correct way to output a subfolder of results?
Paolo Di Tommaso
@pditommaso
May 18 2016 23:03
um, in the output are you capturing the folder or the files in the subfolder?
Jason Byars
@jbyars
May 18 2016 23:34

I've tried both with no success. If I do

publishDir "s3://somebucket", mode: 'move'
output:
file foldername into transcripts

I just get the folder created in the bucket without any contents. If I do

output:
file "${foldername}/*" into transcripts 
or
file "${foldername}**" into transcripts

all the files get dumped into the folder specified by publishDir without the subfolder

I want s3://somebucket/foldername/results