These are chat archives for nextflow-io/nextflow

28th Nov 2018
Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 07:34

@pditommaso Hi, I'm stuck using Nextflow with S3 (MinIO) over HTTPS.
Can I skip certificate validation, like this?

krittin@dgx101:~$ mc ls biotec_minio
mc: <ERROR> Unable to list folder. Get https://10.227.2.57/: x509: cannot validate certificate for 10.227.2.57 because it doesn't contain any IP SANs
krittin@dgx101:~$ mc ls biotec_minio --insecure
[2018-11-27 01:33:10 +07]     0B biotec/
[2018-11-26 22:22:13 +07]     0B moph/
[2018-11-28 19:53:27 +07]     0B test/
[2018-11-28 21:06:14 +07]     0B wgsoutput/
krittin@dgx101:~$

I get this error with my simple config:

aws {
    accessKey = 'my_access_key'
    secretKey = 'my_secret_key'
    client {
      endpoint='10.227.2.57:443'
      protocol='https'
    }
}
ERROR ~ Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Paolo Di Tommaso
@pditommaso
Nov 28 2018 10:35
that's some mess with the SSL certificates in your JVM
Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 13:32
Thanks for your hint! I've now added the public cert to the JVM and changed the config to use the domain name, but
I get this error instead.
How can I access S3 via the URI mydomain.com/minio/{bucket_name} instead of resolving {bucket_name}.mydomain.com?
N E X T F L O W  ~  version 18.10.1
Launching `/data/krittin/alignment/main.nf` [prickly_lovelace] - revision: 95adc78b9f
[warm up] executor > local
ERROR ~ Unable to execute HTTP request: mybucket.mydomain.com

 -- Check script 'main.nf' at line: 43 or see '/data/krittin/alignment//logs/20181128/.nextflow.2017.log' file for more details
My files point to s3://mybucket/project. All of this works fine when I point to the IP and don't use HTTPS.
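For reference, the bucket-in-the-path layout asked about above maps to S3 "path-style access", and Nextflow's AWS client scope exposes a flag for it. A minimal sketch, assuming a Nextflow version that supports aws.client.s3PathStyleAccess (credentials and endpoint are placeholders):

aws {
    accessKey = 'my_access_key'
    secretKey = 'my_secret_key'
    client {
        // resolve buckets as mydomain.com/bucket instead of bucket.mydomain.com
        endpoint = 'https://mydomain.com'
        protocol = 'https'
        s3PathStyleAccess = true
    }
}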
micans
@micans
Nov 28 2018 13:36

In the script section, but before the actual script, I'm trying to get some Groovy code to write a file in the process execution directory. The file writing works, but the file ends up in baseDir. Can I access the execution directory? The code looks roughly like this:

  input:
  set val(aligner), val(counts) from ch_merge

  script:
  File lstfile = new File("${aligner}.txt")
  lstfile.withWriter{ out ->
    counts.each {out.println it}
  }
  """
  do something with ${aligner}.txt
  """

The reason is again (as yesterday) a large number of files, but this time I have them in tuples as a result of transpose(), so collectFile() will not work.

Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 14:45
I tried disabling certificate validation with:
export NXF_OPTS='-Dcom.sun.net.ssl.checkRevocation=false'
Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 15:03
ERROR ~ Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 15:17
Also no luck with this:
env {
  NXF_OPTS='-Dcom.sun.net.ssl.checkRevocation=false'
}
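One thing to note: the env scope only exports variables into the task execution environment, so NXF_OPTS set there never reaches the Nextflow JVM itself. The usual fix for the PKIX error is to make the JVM trust the server certificate, either by importing it into the default cacerts store or by pointing NXF_OPTS at a custom trust store from the launching shell. A rough sketch, with placeholder paths, alias and store password (Java 8 cacerts layout shown):

# import the MinIO server's public certificate into the JVM trust store
keytool -importcert -trustcacerts -alias minio \
    -file minio-public.crt \
    -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit

# or point the Nextflow JVM at a custom trust store (set in the launching shell,
# not in the config's env scope)
export NXF_OPTS='-Djavax.net.ssl.trustStore=/path/to/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit'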
micans
@micans
Nov 28 2018 16:01
fwiw ... for now I've worked around it like this:
    fnames = counts.collect{ it.toString() }.join("\n")
    """
    cat <<EOD > lsofiles
$fnames
EOD
    # rest of script
    """
(apologies @sinonkt for interleaving with your S3/SSL issues)
Krittin Phornsiricharoenphant
@sinonkt
Nov 28 2018 16:06
@micans no worries :))
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:14
@micans do that outside the process using a map operator
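A rough sketch of that suggestion, with illustrative names only (ch_merge and the list-file location are placeholders, not from the pipeline): build the list file in a map operator before the process, so nothing has to be written from the script block:

ch_merge
    .map { aligner, counts ->
        // hypothetical location for the per-aligner list file
        def lst = file("${baseDir}/lists/${aligner}.txt")
        lst.parent.mkdirs()
        lst.text = counts.collect { it.toString() }.join('\n') + '\n'
        [ aligner, lst ]
    }
    .set { ch_merge_listed }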
micans
@micans
Nov 28 2018 16:25
Thanks @pditommaso ! I overlooked that option -- :grin: :palm_tree:
third time lucky
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:26
:+1:
micans
@micans
Nov 28 2018 16:28
I think I have a test now for the many-files-in-directory problem, but it's not working yet. This is the code:
ch_merge_fc
  .map{ aligner, file -> [aligner, file.toString()] }
  .transpose()
  .groupTuple()
  .set{ ch_merge_fc_byaligner }
In the execution directory I get these files:
bc-3,e8/7de35b1a2983badc653ca670d91c40, ls
farm4b+small-star-fc-genecounts.txt  input.1  input.2  input.3    input.4  lsofiles
where
bc-3,e8/7de35b1a2983badc653ca670d91c40, file input.1
input.1: symbolic link to `/lustre/scratch117/cellgen/cellgeni/TIC-misc/tic-97/work-farm4b+small/tmp/6e/6a849f7461b252d172daec57993f2a/input.1'
Perhaps this is not very clear ... I can make a standalone example. Anyway I was hoping that file.toString() would contain the true locations of the input files, but instead I get this magical input.? behaviour
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:33
that's happening because you are using file.toString() instead of file, I guess
micans
@micans
Nov 28 2018 16:36
Yesterday I mentioned the 30K files; my aim is to prevent linking 30K files into a single execution directory, so the thought was to create a file containing, one per line, the paths of files in other work directories. The above is me trying to work towards that. If I omit the map I still get symbolic links ... any ideas?
Something like file.trueWorkDirectoryLocation, as it were, while omitting the linking of files. This may be too much to ask ... nonetheless, the 30k files do pose a problem.
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:38
yes, but I was suggesting to use collectFile for that
micans
@micans
Nov 28 2018 16:39
Indeed, but now I have the transpose().groupTuple() that gets in the way .... extra challenge. Does that make it impossible?
Otherwise I may need to duplicate the process, so I can do collectFile() twice. I hope I'm missing something obvious somewhere.
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:47
not understanding how you're planning to use transpose().groupTuple() to get a list of files
collectFile allows you to collect each entry you have in a channel into a file; if you use groupTuple you will get many files per item, making it more complicated
I would suggest you prototype that snippet with the console
micans
@micans
Nov 28 2018 16:54
Fair enough. I have a prototype with values, will modify it to have files. I'm testing a feature where our pipeline can run different aligners in the same workflow. So I will have 2 times 30k files, and each item is a tuple [aligner, countfile]. When I transpose().groupTuple() this I get [star, [f1 .. 30k files]] and [hisat2, [g1 .. 30k files]]. So it is the combination of these two features (1) many files and (2) multiple aligners that leads to this problem.
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:55
then you will have a task processing 30k files at a time?
micans
@micans
Nov 28 2018 16:55
Yes ... merging count files for 30k samples.
Perhaps not sustainable long term, but right now I don't see a way around it.
Yesterday we got a Smartseq data set of 30k samples.
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:57
Are the 30k paths supposed to be in the text file you are trying to create, or just passed to the process using the usual groupTuple?
With the latter I expect you will get huge command wrappers
micans
@micans
Nov 28 2018 16:58
The first option you mention -- well that was my suggestion to overcome the link problem.
Yes, exactly. No, I'd like to create a text file and not have all the files linked.
Paolo Di Tommaso
@pditommaso
Nov 28 2018 16:59
if that's the case it's more or less the same
 transpose().groupTuple().collectFile { id, files -> [ id, files.collect{ it.toString() }.join('\n') ] }
or something like that
try it
micans
@micans
Nov 28 2018 17:01
ahhhhhhh. cool. Thanks! It looks so promising, hadn't realised you can use collectFile like that. Now need to do something for Vlad first :-)
:+1:
Paolo Di Tommaso
@pditommaso
Nov 28 2018 19:28
quick reminder if you didn't vote yet
Tobias "Tobi" Schraink
@tobsecret
Nov 28 2018 19:42
So this poll shows Nextflow has the most active Twitter community :sweat_smile:
Paolo Di Tommaso
@pditommaso
Nov 28 2018 19:45
definitely, still valuable :wink:
micans
@micans
Nov 28 2018 21:45
@pditommaso I have a small example, and it works; it looks a bit like magic, especially what happens to id in the above code.
process star {
  output: set val('star'), file('*.txt') into ch_f
  script:
  '''
  echo amazingly > f1.txt; echo few > f2.txt; echo discotheques > f3.txt
  '''
}
process hisat2 {
  output: set val('hisat2'), file('*.txt') into ch_g
  script:
  '''
  echo six > g1.txt; echo jeopardized > g2.txt; echo gunboats > g3.txt
  '''
}
ch_g.mix(ch_f)
  .transpose()
  .groupTuple()
  .collectFile { id, files -> [ id, files.collect{ it.toString() }.join('\n') + '\n' ] }
  .println()
This prints:
/Users/micans/nf/work/tmp/ac/371521c6b3d93f2dbd644c36dd75e5/hisat2
/Users/micans/nf/work/tmp/ac/371521c6b3d93f2dbd644c36dd75e5/star
So the id ends up in the filename, at the end; is this a completely robust feature?
It works though:
gershwin:nf micans$ cat /Users/micans/nf/work/tmp/ac/371521c6b3d93f2dbd644c36dd75e5/hisat2
/Users/micans/nf/work/a6/aaee6a5897801dd4e150e81013a446/g1.txt
/Users/micans/nf/work/a6/aaee6a5897801dd4e150e81013a446/g2.txt
/Users/micans/nf/work/a6/aaee6a5897801dd4e150e81013a446/g3.txt
I also noted with interest that the magic files reside in work/tmp