These are chat archives for nextflow-io/nextflow

27th
Jun 2018
Radoslaw Suchecki
@bioinforad_twitter
Jun 27 2018 07:26
Can basic file I/O operations be applied to files within a process's native execution exec: block?
Radoslaw Suchecki
@bioinforad_twitter
Jun 27 2018 07:34

For example given:

process generateGenomeBlocks {
  input:
    set val(species), file(idx) from remoteIndices

you can easily access values e.g.

exec: 
  println species

or

exec:
  println "$species"

but I can't find a way to access the content of idx file.

Beyond this technical issue, would it even be a good idea to do more computation via exec?
Pierre Lindenbaum
@lindenb
Jun 27 2018 08:32
Hi all, I'm generating a lot of SQL files per gene and I read them later. Is there a way, using introspection, to get the hash of the current job (something like work/0d/02fb6f698880081fd9156ab315ea2a) so I can insert it in my SQL and quickly find the directory of my gene when I look at the results? I cannot find this information in https://www.nextflow.io/docs/latest/metadata.html (I tried scriptId, sessionId, scriptFile, ...)
Maxime Garcia
@MaxUlysse
Jun 27 2018 08:33
a pwd inside the script?
Pierre Lindenbaum
@lindenb
Jun 27 2018 08:34
@MaxUlysse nice idea ! :-) thanks !
Maxime Garcia
@MaxUlysse
Jun 27 2018 08:34
Thanks
but otherwise I would look into task metadata, not workflow metadata
Pierre Lindenbaum
@lindenb
Jun 27 2018 08:41
@MaxUlysse where is it in the doc ?
Maxime Garcia
@MaxUlysse
Jun 27 2018 08:45
Not sure it's in the doc
but in the trace-report they are displaying the hash
so it is definitely somewhere
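
For reference, the pwd suggestion above can be sketched as follows. This is a minimal example in the same DSL style used elsewhere in this log; the genes channel, the makeSql process name and the SQL comment format are hypothetical. The escaped \$PWD is left for the shell to expand at run time, which should give the task work directory. Nextflow may also expose this as task.workDir inside script blocks, but treat that as an assumption to check against your version.

```groovy
// Hypothetical channel of gene names, for illustration only.
genes = Channel.from('BRCA1', 'TP53')

process makeSql {
  input:
    val gene from genes
  output:
    file "${gene}.sql" into sql_files

  script:
  """
  # \$PWD is escaped so that the shell, not Nextflow, expands it;
  # at run time it holds the task work directory (work/xx/yyyy...).
  echo "-- gene: ${gene} workdir: \$PWD" > ${gene}.sql
  """
}
```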
Pierre Lindenbaum
@lindenb
Jun 27 2018 08:49
:+1:
Pierre Lindenbaum
@lindenb
Jun 27 2018 08:59

Another question, I often gather a list of files using

cat << __EOF__ > statements.sql
${one_sql_file_pergene.join("\n")}
__EOF__

is there a more elegant solution ?

thanks

Maxime Garcia
@MaxUlysse
Jun 27 2018 09:01
where are you doing that?
Pierre Lindenbaum
@lindenb
Jun 27 2018 09:05
In the script section. Gathering a list of files to use it later. For example, using a list of VCF files per chromosome for java -jar picard.jar GatherVcfs I=subvcf.list O=out.vcf (when the argument ends with '.list', Picard interprets it as a list of files).
Maxime Garcia
@MaxUlysse
Jun 27 2018 09:09
Oh I see
No idea on how to make that more elegant, but I'd love to see that too
Pierre Lindenbaum
@lindenb
Jun 27 2018 09:15
... something like one_sql_file_pergene.toFile("statement.sql") ?
Paolo Di Tommaso
@pditommaso
Jun 27 2018 16:00
@lindenb I think that could be a nice addition to the Nextflow DSL, providing a helper method or something like that.
Could you open an issue on GitHub describing this use case?
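
Until such a helper exists, one possible workaround is the collectFile operator, which can append text items to a single file. A minimal sketch, assuming the per-gene files arrive on a channel rather than as a process input list (the glob path and the channel names sql_files and sql_list are hypothetical). The paths are converted to strings first, because collectFile would otherwise concatenate the file contents instead of listing the file names:

```groovy
// Hypothetical source channel: one SQL file per gene.
sql_files = Channel.fromPath('/my/data/*.sql')

sql_files
    .map { it.toString() }                                // emit each path as text
    .collectFile(name: 'statements.list', newLine: true)  // one path per line
    .set { sql_list }                                     // emits the single list file
```

The resulting sql_list channel emits one file that a downstream process can take as a file input. The listed paths are absolute paths to the original files, so this only helps when the task can read those locations directly.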
Mike Smoot
@mes5k
Jun 27 2018 16:14
Hi @pditommaso how do I build this project https://github.com/nextflow-io/nextflow-s3fs such that I can use it in my nextflow build? I'm able to successfully generate the jar file, but I'm not sure how gradle goes about sharing it (local maven repo?). FWIW, I'm trying to add this as an option for aws authentication.
Jemma Nelson
@fwip
Jun 27 2018 16:17
Documentation nit: It looks like process directives are meant to be listed in alphabetical order in the documentation, but the label directive appears after queue: https://www.nextflow.io/docs/latest/process.html#label
It tripped me up a second when I was looking for the label documentation, but if it's intended don't worry about it. :)
Paolo Di Tommaso
@pditommaso
Jun 27 2018 16:47
@mes5k use ./gradlew publishToMavenLocal in that s3fs project
@fwip oops, thanks for reporting that
Jemma Nelson
@fwip
Jun 27 2018 16:49
no problem :) thank you for being so helpful and responsive in this channel! I really appreciate it.
Paolo Di Tommaso
@pditommaso
Jun 27 2018 17:06
:smile:
Mike Smoot
@mes5k
Jun 27 2018 17:20
@pditommaso that sounds right, but I get the following:
[master !?]$ gradlew publishToMavenLocal

FAILURE: Build failed with an exception.

* What went wrong:
Task 'publishToMavenLocal' not found in root project 'nextflow-s3fs'.

* Try:
Run gradlew tasks to get a list of available tasks. Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

* Get more help at https://help.gradle.org
Paolo Di Tommaso
@pditommaso
Jun 27 2018 17:21
Oh
Mike Smoot
@mes5k
Jun 27 2018 17:21
Also, on a fresh checkout ~1/4 of the tests fail.
Paolo Di Tommaso
@pditommaso
Jun 27 2018 18:10
Unfortunately that project is in bad shape; those tests have been failing since the original fork, so you can ignore them.
It should probably be possible to install the jar locally using
./gradlew install -Dmaven.repo.local=${HOME}/.nextflow/capsule/deps/ -x signArchives
Mike Smoot
@mes5k
Jun 27 2018 18:17
Cool, thanks. The patch I'm working on may not matter in the end. I just realized that the session tokens I'm trying to deal with are only good for one hour, which would make it impossible to run a pipeline that lasts longer than an hour without somehow refreshing the tokens. I think we're going to need local IAM roles. Just need to convince IT.
Olivia Sandvold
@osandvold302
Jun 27 2018 21:22
hi, new to Nextflow, and I have a quick question - I would like to bring in a number of .fa files into a work dir so that when I run bwa mem, the command can then look through all of the files and find the appropriate indexed file.
I do not want to run the same process on each .fa file.
How do I set up this input channel so that it brings all the files I need into the work dir without running the process for each file? My current implementation attempt has all of the files stored in an array.
Paolo Di Tommaso
@pditommaso
Jun 27 2018 21:27
something like this:
my_files = Channel.fromPath('/my/data/*.fa')

process foo {
  input: file all_files from my_files.collect()
  """
    your command --input $all_files
  """
}
Olivia Sandvold
@osandvold302
Jun 27 2018 21:35
great thanks!
Paolo Di Tommaso
@pditommaso
Jun 27 2018 21:36
you are welcome