These are chat archives for nextflow-io/nextflow

14th
Feb 2017
amacbride
@amacbride
Feb 14 2017 00:20
So, good-news, bad-news for me; while
 .toSortedList()
 .flatMap()
...did what I wanted in terms of giving me my ordered sequence of tuples, it looks like the order of consumption by the downstream process of that channel isn't ordered (I'm assuming it's pulling from a hash or something).
Is there a way to guarantee that a downstream process pulls items from a channel in an ordered fashion (treating it like a FIFO)?
Mike Smoot
@mes5k
Feb 14 2017 00:24
I think groupTuple is what you want - that will combine all 4 of your files into one tuple that can then be processed in the order that you need.
But you might need to rework the process that consumes that tuple to handle 4 inputs at once rather than one at a time.
amacbride
@amacbride
Feb 14 2017 00:25
FIFO semantics is what I expected given this statement in the docs: "A channel is a non-blocking unidirectional FIFO queue which connects two processes."
(I'd rather not rework the downstream process, if I can avoid it.)
Mike Smoot
@mes5k
Feb 14 2017 00:28
That's true, but remember that you have (potentially) multiple instances of that process running concurrently.
Which reminds me, you could force the issue and set maxForks: 1 for the process and then I think they'd run sequentially.
amacbride
@amacbride
Feb 14 2017 00:31
Oh, I want and expect them to run concurrently, but I would like them to be submitted into my cluster job-queuing system (SLURM) in order, so that the pipeline doesn't stall, and I get better throughput.
(shrug) It's an optimization, and I can live without it, but I was surprised that the semantics aren't really FIFO.
Mike Smoot
@mes5k
Feb 14 2017 00:44
I'd guess that the process threads are actually created in order of tuples coming out of the channel, but what you're seeing in stdout or the logs is when the thread says it's been submitted, which, given the vagaries of thread scheduling, is not necessarily the order in which the thread was created.
Maxime Garcia
@MaxUlysse
Feb 14 2017 08:17
Hi @pditommaso, can you give a small example for your answer to @boulund about the multiple publishDir possibilities? Thanks
Paolo Di Tommaso
@pditommaso
Feb 14 2017 08:31
Hi, something like this
process foo {
  publishDir 'results', saveAs: { it == 'file_a' ? 'dir1/file_a' : 'dir2/file_b' }

  output:
  file '*'

  '''
  touch file_a file_b
  '''
}
Maxime Garcia
@MaxUlysse
Feb 14 2017 08:32
Ooh I see
Thanks a lot
Paolo Di Tommaso
@pditommaso
Feb 14 2017 08:32
the key point is that saveAs should return either a path relative to the main publishing dir or an absolute path
Paolo Di Tommaso
@pditommaso
Feb 14 2017 08:57
@amacbride Channels are FIFO; however, the order of process executions cannot be guaranteed when they run in parallel
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:12
Hello. Please can you direct me to the documentation for how to re-use a process with different input/output files? My Google-fu isn't strong enough.
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:14
Hi, what exactly do you mean by re-use? Running multiple instances of the same process with different files?
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:15
yes. Specifically, I am cat'ing per-lane, per-read fastq files into 2 files, one for each read
I'd like to use one process for the cat job, and use it twice
(to help me learn nextflow)
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:16
oh, that's one of the basic examples
also this tutorial could be useful
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 09:17
Hello!
process bar {
    errorStrategy 'ignore'

    output:
    file("f.txt") into ch

"""
exit 1
echo "Hello world" > f.txt
"""
}

process foo {
    echo true

    input:
    file(f) from ch

"""
if [ -f $f ]; then
    echo "File contents:"
    cat $f
else
    echo "File does not exist."
fi
"""
}
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:18
there's a party !
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:18
Ok, thanks, Paolo
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:18
welcome !
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 09:18
oops I interrupted a conversation sry
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:18
have a look at the tutorial and don't hesitate to ask specific questions
Hi Hugues !
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 09:19
:) Hi!
Here I'd still want foo to run after bar has run (whether it has failed or not). Any idea?
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:20
NF doesn't help much with this
you need to hack it at the Bash level
e.g.
'''
set +e
exit 1
echo $? > f.txt
set -e
echo ",Hello world" >> f.txt
'''
in this way you can capture the error code and prepend it to f.txt
or a similar strategy
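A minimal sketch of the same idea as a plain Bash script, assuming the step that may fail is an ordinary command: `false` is a stand-in for it, and `f.txt` mirrors the file name used above. Capturing the status with `|| status=$?` is a swapped-in variant of the `set +e`/`set -e` toggle that keeps working while `set -e` is active. Because the script itself then exits with status 0, the task should succeed and `f.txt` should be emitted.

```shell
#!/usr/bin/env bash
set -e

# `false` is a stand-in for any real command that may return a non-zero status.
# A command on the left of `||` does not trigger `set -e`, so the script continues.
status=0
false || status=$?

# Write the exit status first, then the regular output,
# so a downstream process can inspect the first line.
echo "$status" > f.txt
echo ",Hello world" >> f.txt
```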
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:30
Hi again. I can see how to handle each read pair, but I'd like to combine all forward reads, and all reverse reads from multiple read pairs into one read pair.
I have a channel, rawForwardFastQs, which contains a list of the forward reads. And rawReverseFastQs, which contains a list of the reverse reads
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:31
so far so good
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:31
my first task is to concat all the forwards, and all the reverse reads, into 1 file each
E.g.:
process catForward {

    input:
    file rawForwardFastQFiles from rawForwardFastQs.toList()

    output:
    file 'forward.fastq.gz' into rawForwardFastQFile

    script:
    """
    cat $rawForwardFastQFiles > forward.fastq.gz
    """
}
Can I write a process that can be re-used, or should I write one for the forward, and one for the reverse, substituting the variable/file names?
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:34
you can write an external script and reuse it in your pipeline
LukeGoodsell
@LukeGoodsell
Feb 14 2017 09:34
I see. Thanks
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:34
but you will still need to declare a separate process
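A minimal sketch of such a shared helper, written as a Bash function; the name `cat_fastqs` and its argument order (output first, then inputs) are hypothetical. The same body could be saved as an executable script, assuming it is placed somewhere on the task `PATH` (for example the project's `bin/` directory). Plain `cat` is enough for gzipped FASTQ files, because concatenated gzip streams form a valid multi-member gzip file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate any number of (gzipped) FASTQ files
# into a single output file.
# Usage: cat_fastqs OUTPUT INPUT...
cat_fastqs() {
    local out=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: cat_fastqs OUTPUT INPUT..." >&2
        return 1
    fi
    # No decompression needed: gzip members can simply be appended.
    cat "$@" > "$out"
}
```

Each of the two processes (catForward and catReverse) would then call the same helper with its own output name, e.g. `cat_fastqs forward.fastq.gz $rawForwardFastQFiles`, while still being declared as separate processes.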
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 09:53
Ignored processes do not emit anything into the channel.
Can't make it work...
Paolo Di Tommaso
@pditommaso
Feb 14 2017 09:54
why is it ignored?
that should not be ignored as long as you suppress the error code
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 09:59
process bar {
    errorStrategy 'ignore'

    output:
    file("f.txt") into ch

"""
set +e
exit 1
echo \$? > f.txt
set -e
echo ", Hello world" >> f.txt
"""
}

process foo {
    echo true

    input:
    file(f) from ch

"""
cat $f
"""
}
foo never runs
Paolo Di Tommaso
@pditommaso
Feb 14 2017 10:02
It turns out that exit cannot be suppressed, but I guess it stands in for a command returning a non-zero exit status;
in that case it would work.
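A small Bash check of this point, running each variant in a child shell via `bash -c`: `exit 1` ends the shell even under `set +e`, so nothing after it runs, whereas a command that merely returns non-zero (`false`, standing in for a real failing tool) lets the script carry on.

```shell
#!/usr/bin/env bash

# `exit 1` terminates the child shell immediately; `set +e` has no effect on it,
# so the echo after it is never reached.
bash -c 'set +e; exit 1; echo "never printed"' && rc=0 || rc=$?
echo "exit 1 -> status $rc"

# A command that just fails does not stop the child shell once `set +e`
# is in effect, so the next line still runs and can report the status.
out=$(bash -c 'set +e; false; echo "status was $?"')
echo "$out"
```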
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 10:03
Oh yes it does ..
Paolo Di Tommaso
@pditommaso
Feb 14 2017 10:03
:+1:
(need to leave now)
Hugues Fontenelle
@huguesfontenelle
Feb 14 2017 10:04
I just wrote garbage instead of exit 1 !
Thanks
Fredrik Boulund
@boulund
Feb 14 2017 14:28

Quoting @pditommaso: "Yes, you can use a closure as a value for the saveAs option to dynamically define the target path of the publishDir"
Great! I should've thought of that, I love passing functions everywhere :) Thanks

Paolo Di Tommaso
@pditommaso
Feb 14 2017 14:34
:)
:+1:
Félix C. Morency
@fmorency
Feb 14 2017 21:27
Playing with nextflow cloud and got a "'t2.micro' is an unsupported instance type" error. Ideas?
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:28
do you have the stack trace?
Félix C. Morency
@fmorency
Feb 14 2017 21:29
nextflow cloud create test-cluster -c 2
> cluster name: test-cluster
> instances count: 2
> Launch configuration:
 - driver: 'aws'
 - imageId: 'ami-782be56e'
 - instanceType: 't2.micro'
 - keyFile: /home/morency/.ssh/id_rsa.pub
 - sharedStorageId: 'fs-15e4465c'
 - sharedStorageMount: '/mnt/efs'
 - spotPrice: 0.06
 - subnetId: 'subnet-4b813813'
 - userName: 'morency'

Please confirm you really want to launch the cluster with above configuration [y/n] y
Launching worker node -- Waiting for `running` status.. ERROR ~ Value (t2.micro) for parameter instanceType is invalid. 't2.micro' is an unsupported instance type (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterValue; Request ID: a69e446c-10fa-46be-9a0f-72c68684491a)
This is the only info I have
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:30
it didn't launch anything, right?
Félix C. Morency
@fmorency
Feb 14 2017 21:30
correct
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:30
I guess it's an error returned by the AWS API
try running it like this
Félix C. Morency
@fmorency
Feb 14 2017 21:30
(I'm still wrapping my head around this aws thing so I might have done something wrong)
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:30
nextflow -log nf.log cloud create test-cluster -c 2
you should find more info in the log file
Félix C. Morency
@fmorency
Feb 14 2017 21:32
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:33
exactly
com.amazonaws.services.ec2.model.AmazonEC2Exception: Value (t2.micro) for parameter instanceType is invalid. 't2.micro' is an unsupported instance type (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterValue; Request ID: 49bbd448-6329-4fa9-aa9e-fc11368b7675)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1406)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:950)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:723)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:475)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:437)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:386)
    at com.amazonaws.services.ec2.AmazonEC2Client.doInvoke(AmazonEC2Client.java:12031)
    at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:12001)
    at com.amazonaws.services.ec2.AmazonEC2Client.requestSpotInstances(AmazonEC2Client.java:11046)
Félix C. Morency
@fmorency
Feb 14 2017 21:33
Could it be because I want a t2.micro instance and there are no spot instances of that type?
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:34
I think so, try not using spotPrice
Félix C. Morency
@fmorency
Feb 14 2017 21:34
Yup removing spotPrice did the trick
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:35
I think t2.micro instances are supposed to be free... quite difficult to get a lower price for them :)
Félix C. Morency
@fmorency
Feb 14 2017 21:37
Right. I simply copy/pasted the nextflow.config from the NF cloud blog post :P
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:37
really ?!
Félix C. Morency
@fmorency
Feb 14 2017 21:38
Yeah. I changed the AMI and the EFS tho
which one? not here
Mmm I now have a
Launching master node -- Waiting for `running` status.. 
ERROR ~ Unable to execute HTTP request: ec2.us-east-1.amazonaws.com
I can see the instances running
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:40
weird, any network problem?
Félix C. Morency
@fmorency
Feb 14 2017 21:41
Not that I know of. I'll retry creating the cluster
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:41
use the -log just in case
Félix C. Morency
@fmorency
Feb 14 2017 21:52
The cluster finally started
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:53
:+1:
Félix C. Morency
@fmorency
Feb 14 2017 21:54
Thanks
Paolo Di Tommaso
@pditommaso
Feb 14 2017 21:54
enjoy