These are chat archives for nextflow-io/nextflow

16th
Sep 2016
Paolo Di Tommaso
@pditommaso
Sep 16 2016 07:59
Hi Mike, welcome to the famous JAR hell :)
Anyway it's not clear to me how you are building your module and how you are including it in the nextflow script.
Paolo Di Tommaso
@pditommaso
Sep 16 2016 09:33
upload it with this command
scp  -i <key> <data> ec2-user@ec2-54-194-247-113.eu-west-1.compute.amazonaws.com:~
Mike Smoot
@mes5k
Sep 16 2016 17:01

Hi Paolo, here is my gradle build file: http://pastebin.com/xexb9Y1a. The way I consume the code in nextflow is like this:

@GrabResolver(name='sgi_snapshots', root='http://192.168.99.100:8081/content/repositories/snapshots/')
@GrabResolver(name='sgi_releases', root='http://192.168.99.100:8081/content/repositories/releases/')


@Grab(group='com.syntheticgenomics.compbio', module='nextflow_utilities', version='0.1-SNAPSHOT')
@GrabExclude('org.codehaus.groovy:groovy-all')

import com.syntheticgenomics.compbio.nextflow.Utils

// normal pipeline stuff

workflow.onComplete{ Utils.notify( workflow, 'none', 'test subject', 'results', true ) }

and then here is the notify method itself: http://pastebin.com/H609PH2F

Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:02
I see. If you use @Grab annotation you need to disable the transitive deps already included in the Nextflow runtime
Mike Smoot
@mes5k
Sep 16 2016 17:02
If you've got a better way, I'm all ears! :)
Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:03
yes, copy your JAR in a lib folder in the project root
it is automatically added to the classpath
(actually you can even add groovy files, and they are compiled on the fly)
Mike Smoot
@mes5k
Sep 16 2016 17:08
I'm not sure that would actually solve the problem. I can find and resolve the jar file without problem, the issue seems to be that the nextflow version I specify in the gradle build file needs to be exactly the same as the version of nextflow that I use to run the pipeline. Somehow when I compile the jar groovy embeds a version number into the jar and only that precise version of groovy can be used to run the pipeline.
Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:09
the JAR does not contain any information on the dependencies
are you uploading it to Maven ?
Mike Smoot
@mes5k
Sep 16 2016 17:10
Yeah, I publish it to an internal nexus repository.
Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:10
I saw, you have a local repo, that tracks the deps
Yes, your lib depends on
  compile 'org.codehaus.groovy:groovy-all:2.4.4'
  compile 'io.nextflow:nextflow:0.+'
you have three options
1) tell @Grab to ignore transitive deps, I'm quite sure there's a flag for that
2) declare them in your gradle build as provided instead of compile, which means that Gradle will use the deps to compile, but won't list them as deps of your lib because they are supposed to be provided at runtime
3) copy your jar manually in the lib folder as I said
about 2) provided is not implemented by default by Gradle, you need to write a custom rule
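
Sketch, not from the original chat: option 2 could look roughly like this in build.gradle. It assumes Gradle 2.12 or later, where the Java and Groovy plugins offer a built-in `compileOnly` configuration that behaves much like a `provided` scope. The two dependency coordinates are the ones quoted above; nothing else about the real build file is known.

```groovy
// build.gradle fragment (assumes Gradle >= 2.12 and the groovy plugin)
apply plugin: 'groovy'

dependencies {
    // compileOnly: visible while compiling the library, but not recorded as
    // runtime dependencies, since the Nextflow runtime already ships both.
    compileOnly 'org.codehaus.groovy:groovy-all:2.4.4'
    compileOnly 'io.nextflow:nextflow:0.+'
}
```

On older Gradle versions a custom `provided` configuration would be needed instead. Either way, it is worth checking the published POM to confirm the two artifacts no longer appear in it.
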
Mike Smoot
@mes5k
Sep 16 2016 17:14
I think 2) might be the trick. I've already got 1) with the @GrabExclude('org.codehaus.groovy:groovy-all'), although perhaps I need to exclude nextflow too. Will experiment!
Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:14
it seems the latest versions of Gradle have implemented it
> I need to exclude nextflow too. Will experiment!
yes of course
Mike Smoot
@mes5k
Sep 16 2016 17:15
Yup, excluding nextflow did the trick! Thank you so much. This saved much copy+pasting! :)
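
Sketch, not from the original chat: the working script header presumably looked something like the following. The second `@GrabExclude` line is an assumption about how nextflow was excluded, since the final annotations were never posted. The resolver URL and coordinates are copied from the earlier snippet.

```groovy
@GrabResolver(name='sgi_snapshots', root='http://192.168.99.100:8081/content/repositories/snapshots/')
@Grab(group='com.syntheticgenomics.compbio', module='nextflow_utilities', version='0.1-SNAPSHOT')
// Keep Grape from pulling in artifacts the Nextflow runtime already provides.
@GrabExclude('org.codehaus.groovy:groovy-all')
@GrabExclude('io.nextflow:nextflow')   // assumed form of the extra exclude
import com.syntheticgenomics.compbio.nextflow.Utils
```

Setting `transitive=false` on `@Grab` is the flag-style alternative hinted at in option 1. It skips all transitive dependencies, so it only fits when the library needs nothing beyond what Nextflow already ships.
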
Paolo Di Tommaso
@pditommaso
Sep 16 2016 17:15
:)
enjoy
Mike Smoot
@mes5k
Sep 16 2016 17:16
I'm curious why you think adding the jar to a lib directory is a better approach than grape grab? It seems like the jar would need to be in git, would then get copied, etc..
Paolo Di Tommaso
@pditommaso
Sep 16 2016 18:35
well, because you won't need to manage the deployment to the nexus repo
but it's a valid option
Evan Floden
@evanfloden
Sep 16 2016 19:57
Out of curiosity, has anyone run NF using bash for windows or has a windows machine with the developer preview they can run the nextflow install command on?
Paolo Di Tommaso
@pditommaso
Sep 16 2016 20:09
interesting to investigate
Mike Smoot
@mes5k
Sep 16 2016 21:07

@pditommaso Hi Paolo, sorry to bother with another debugging problem, but I've got a process that seems to be failing intermittently and I'm at a loss to explain why. Here is the process:

process preprocessProteinAlignments {
    container "${euk_container}"
    tag { genewise_file.name }
    errorStrategy 'retry'
    maxRetries 2
    cache 'deep'

    input:
    set contig_id, file(genewise_file), file(dna_file), file(protein_file) from genewise_output_1

    output:
    set stdout, file("${genewise_file}.fa"), file(genewise_file) into valid_proteins

    script:
    """
    write_valid_proteins.py -g ${genewise_file} -d ${dna_file} --hints ${hint_list}
    """
}

The process fails because one or the other input files (dna_file or genewise_file) doesn't exist when the python script tries to open it. Naturally, when I look in the work directory, I see the files. If this were running on an NFS partition, I'd blame the problem on network gremlins, but it's running on a local disk. My first attempt at handling this problem was to add the errorStrategy 'retry' and maxRetries 2, but for reasons I don't understand the process does not get retried. I've got

process {
  errorStrategy = 'finish'
}

set in the nextflow.config file if that changes anything. Any thoughts on how to debug this further would be appreciated.

Mike Smoot
@mes5k
Sep 16 2016 21:17
Is the lack of retries because I didn't have maxErrors 2 set?
Paolo Di Tommaso
@pditommaso
Sep 16 2016 21:59
> process does not get retried
never?
I would focus on the task problem rather than on retrying it
I mean, isolate it by running the task script directly, w/o nextflow
Mike Smoot
@mes5k
Sep 16 2016 22:04

The task script has run fine several thousand other times. Here's the error I'm getting from the python script when this process fails:

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/usr/bin/write_valid_proteins.py", line 24, in <module>
      main()
    File "/usr/bin/write_valid_proteins.py", line 21, in main
      parseValidProteinAlignments(args.hints, args.genewise, args.dna)
    File "/usr/lib/python2.7/site-packages/eukaryotic_annotation/gff.py", line 811, in parseValidProteinAlignments
      if gwObj.validStartCodon(dnaFile) and gwObj.validStopCodon(dnaFile):
    File "/usr/lib/python2.7/site-packages/eukaryotic_annotation/gff.py", line 462, in validStartCodon
      dnaRecord = self.__parseDnaRecord(dnaFile)
    File "/usr/lib/python2.7/site-packages/eukaryotic_annotation/gff.py", line 556, in __parseDnaRecord
      with open(dnaFile, 'rU') as fhIn:
  IOError: [Errno 2] No such file or directory: 'we03730_nuc_v150325_PM_chr_2_186000-189000.fa'

I'm not sure how else to interpret that other than the file doesn't seem to be linked into the directory yet.

Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:05
local file system you said?
Mike Smoot
@mes5k
Sep 16 2016 22:05
yup
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:06
what if you change into the task work dir and run bash .command.run?
Mike Smoot
@mes5k
Sep 16 2016 22:09
trying
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:09
this is the first thing to try!
;)
Mike Smoot
@mes5k
Sep 16 2016 22:11
it failed, which surprises me
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:11
ok, now you have a thing on which you can work :)
Mike Smoot
@mes5k
Sep 16 2016 22:12
that's good!
Looking at this I see that both symlinks resolve to actual files
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:13
um
Mike Smoot
@mes5k
Sep 16 2016 22:27
So I did a cp -Lr work/74/... new_dir and ran things again to see if this was somehow related to the links. It failed, but what really confuses me is that running bash .command.run somehow changed all the actual files in new_dir to symlinks again.
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:28
yes, the wrapper script creates the links
you may want to comment out that code
anyway it seems there's a problem with the python script
Mike Smoot
@mes5k
Sep 16 2016 22:31
I think you're right. I just pulled out the function that was failing into a standalone script and it worked. So it seems like the filename is getting munged somehow in the python script. Will keep banging on things!
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:32
good luck & good night (for me)
Mike Smoot
@mes5k
Sep 16 2016 22:34
Thanks for the help!
Paolo Di Tommaso
@pditommaso
Sep 16 2016 22:34
welcome