These are chat archives for nextflow-io/nextflow

31st
Aug 2017
mahdi-b
@mahdi-b
Aug 31 2017 00:34
Thanks for the info. That's very useful. I'll start by getting familiar with the AmazonCloudDriver and will get back in touch.
spaceturtle
@spaceturtle
Aug 31 2017 07:53
I wonder whether nextflow can create a channel that watches for a specific pair of files and emits the pair once both files have been created. Something like watchPath, except that watchPath monitors individual files, not pairs of files.
Paolo Di Tommaso
@pditommaso
Aug 31 2017 07:58
nope, but it should be possible to implement it as below
Channel
    .watchPath( params.pairs )
    .map { path -> 
       def prefix = readPrefix(path, params.pairs)
       tuple(prefix, path) 
    }
    .groupTuple(size: 2, sort: true)
    .set { read_pairs_ch }
where readPrefix is a function that extracts the prefix given a file path and the pattern, e.g.
def readPrefix( Path actual, template ) {

    final fileName = actual.getFileName().toString()

    // keep only the file-name part of the pattern
    def filePattern = template.toString()
    int p = filePattern.lastIndexOf('/')
    if( p != -1 ) filePattern = filePattern.substring(p+1)
    // no wildcard in the pattern: treat it as a file-name suffix
    if( !filePattern.contains('*') && !filePattern.contains('?') ) 
        filePattern = '*' + filePattern 

    // translate the glob into a regex, capturing the wildcard parts
    def regex = filePattern
                    .replace('.','\\.')
                    .replace('*','(.*)')
                    .replace('?','(.?)')
                    .replace('{','(?:')
                    .replace('}',')')
                    .replace(',','|')

    def matcher = (fileName =~ /$regex/)
    if( matcher.matches() ) {  
        // the prefix ends where the last captured group ends
        def end = matcher.end(matcher.groupCount() )      
        def prefix = fileName.substring(0,end)
        // strip trailing separators
        while(prefix.endsWith('-') || prefix.endsWith('_') || prefix.endsWith('.') ) 
          prefix=prefix[0..-2]

        return prefix
    }

    // the file name does not match the pattern
    return null
}
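To see what readPrefix returns, here is a minimal sketch that runs as a plain Groovy script outside Nextflow. The pattern `/data/reads/*_{1,2}.fq` and the file names are hypothetical, chosen only for illustration. The `groupBy` call is a plain-Groovy stand-in for the `map` + `groupTuple` step above, not the actual channel operators. The function is repeated, with the separator check written as `p != -1`, so the script is self-contained.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Same logic as readPrefix above, repeated so this script runs on its own
def readPrefix( Path actual, template ) {
    final fileName = actual.getFileName().toString()

    def filePattern = template.toString()
    int p = filePattern.lastIndexOf('/')
    if( p != -1 ) filePattern = filePattern.substring(p+1)
    if( !filePattern.contains('*') && !filePattern.contains('?') )
        filePattern = '*' + filePattern

    def regex = filePattern
                    .replace('.','\\.')
                    .replace('*','(.*)')
                    .replace('?','(.?)')
                    .replace('{','(?:')
                    .replace('}',')')
                    .replace(',','|')

    def matcher = (fileName =~ /$regex/)
    if( !matcher.matches() ) return null

    def prefix = fileName.substring(0, matcher.end(matcher.groupCount()))
    while( prefix.endsWith('-') || prefix.endsWith('_') || prefix.endsWith('.') )
        prefix = prefix[0..-2]
    return prefix
}

// Hypothetical pattern and file names, for illustration only
def pattern = '/data/reads/*_{1,2}.fq'
def files = ['sampleA_1.fq', 'sampleB_1.fq', 'sampleA_2.fq', 'sampleB_2.fq', 'notes.txt']
    .collect { Paths.get('/data/reads', it) }

// Plain-Groovy stand-in for map + groupTuple: drop non-matching files,
// then group the remaining paths by their prefix
def pairs = files
    .findAll { readPrefix(it, pattern) != null }
    .groupBy { readPrefix(it, pattern) }
    .collectEntries { prefix, paths -> [prefix, paths*.fileName*.toString().sort()] }

assert pairs == [
    sampleA: ['sampleA_1.fq', 'sampleA_2.fq'],
    sampleB: ['sampleB_1.fq', 'sampleB_2.fq']
]
println pairs
```

With this pattern the glob becomes the regex `(.*)_(?:1|2)\.fq`, so the prefix is whatever the `*` captured, and `notes.txt` is skipped because readPrefix returns null for it.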
Anthony Underwood
@aunderwo
Aug 31 2017 10:11
Hi @pditommaso I know you've been talking with colleagues at Oxford University (Jeremey Swann) about using kubernetes. What are your long-term plans for Kubernetes? I know it's experimental at the moment.
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:23
currently the main problem is on Kubernetes side, since it's not so easy to configure a shared file system
Anthony Underwood
@aunderwo
Aug 31 2017 10:24
If colleagues were to make changes on a fork of Nextflow would you consider a pull request?
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:25
yes of course
what's their idea ?
Anthony Underwood
@aunderwo
Aug 31 2017 10:27
To alter the nextflow kubernetes code to allow specification of mounts
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:28
I see, he was mentioning that
I fear it's not enough
Anthony Underwood
@aunderwo
Aug 31 2017 10:29
OK. We are also investigating other options without editing nextflow code to allow auto mounting of filesystems on kubernetes machines
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:31
mounting the file system only in the container would require running NF as a pod as well
but then it won't be able to access kubectl
tho it could work using the ignite executor instead
Anthony Underwood
@aunderwo
Aug 31 2017 10:32
Problem is we are constrained to using Azure :(
don't ask
!!
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:33
um, doesn't it provide something similar to AWS Batch ?
Anthony Underwood
@aunderwo
Aug 31 2017 10:35
yes but the executors in nextflow are not compatible
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:36
but they could be :)
likely the easiest way is to deploy a slurm cluster in the azure cloud, but there are a lot of alternatives that can be explored, for which some NF extension could be implemented
Anthony Underwood
@aunderwo
Aug 31 2017 10:41
OK thanks. we'll explore a bit further. Keen to use kubernetes in some ways since it's more cloud agnostic
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:43
makes sense
Anthony Underwood
@aunderwo
Aug 31 2017 10:48
Is there any way of viewing the pod yaml file for kubernetes that nextflow creates?
Paolo Di Tommaso
@pditommaso
Aug 31 2017 10:58
have a look at the tests
Mike Smoot
@mes5k
Aug 31 2017 18:28
Hi @pditommaso do you have any idea which s3 permissions are required for file("s3://my-bucket/whatever.fa").exists() to work? The IAM role I'm using has s3:Get* and s3:List*, but the exists check is still failing. I'm trying to avoid s3:* if I can help it.
Paolo Di Tommaso
@pditommaso
Aug 31 2017 18:34
umm, Get and List should work, is there nothing in the log that can help ?
Mike Smoot
@mes5k
Aug 31 2017 18:44
No exceptions, it just returns false.
Sorry, I do see access denied in the logs.
Paolo Di Tommaso
@pditommaso
Aug 31 2017 18:46
what's the exact message ?
Mike Smoot
@mes5k
Aug 31 2017 18:46
Aug-31 18:44:49.760 [Actor Thread 1] ERROR n.extension.DataflowExtensions - @unknown com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 411CC43EFF9E93E8)
Paolo Di Tommaso
@pditommaso
Aug 31 2017 18:48
frankly I don't know, I've always used it with full permissions
Mike Smoot
@mes5k
Aug 31 2017 18:51
Ok, that's probably what I'll end up doing. Looking at the stack trace for the error above, it looks like it's getting the bucket ACL and then querying that.