These are chat archives for nextflow-io/nextflow

11th
Jan 2019
tbugfinder
@tbugfinder
Jan 11 05:59
@t-neumann What's in your key file? Is it just the public ssh key?
Tobias Neumann
@t-neumann
Jan 11 07:51
@tbugfinder No it is the RSA PRIVATE KEY. Was always sufficient to log on to any of the manually launched EC2 instances
Martin Proks
@matq007
Jan 11 13:53
Hey guys has anyone faced an issue with time limit when using local executor?
Caused by:
  Process exceeded running time limit (4h)
Martin Proks
@matq007
Jan 11 13:56
I just realized the issue when looking into base.config: time = { check_max( 4.h * task.attempt, 'time' ) }
Paolo Di Tommaso
@pditommaso
Jan 11 13:56
what's the issue ?
Martin Proks
@matq007
Jan 11 13:57
no issue, I just have to change the 4h * task.attempt to 8h, this should solve the issue hopefully
Paolo Di Tommaso
@pditommaso
Jan 11 13:57
nice
Maxime Garcia
@MaxUlysse
Jan 11 14:03
@matq007 So you had a time limit set up
Tobias Neumann
@t-neumann
Jan 11 14:07
So I'm still failing to startup an instance with nextflow cloud create with my .pem keyFile. I would like to try maybe the keyName instead, but what is this actually referring to? In the documentation it states it is a "key-pair defined in your AWS account". What does that actually mean?
Paolo Di Tommaso
@pditommaso
Jan 11 14:08
it's the name associated with your key pair
tip: let NF create one automatically by not specifying it
micans
@micans
Jan 11 14:25
I assume there is a change in .command.run in the latest edge release: tail: unrecognized option '--pid=52', and nfx_trace_linux has tail --pid=$mem_proc -f /dev/null
Paolo Di Tommaso
@pditommaso
Jan 11 14:25
ah-ah, here it comes
micans
@micans
Jan 11 14:25
hehehe
Paolo Di Tommaso
@pditommaso
Jan 11 14:26
yes, it has been heavily rewritten
micans
@micans
Jan 11 14:26
kingfisher,24/1c335c79574b043607542eae26bcaa :-() tail --version
tail (GNU coreutils) 8.26
I'm happy to try and find a better tail
Paolo Di Tommaso
@pditommaso
Jan 11 14:26
too bad
micans
@micans
Jan 11 14:26
bsd tail?
ah
np, exciting new development
maybe our tail is ancient
our LSF tail does have that option
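
One way to tell which tail a given environment ships is to probe for the option directly. The following is a minimal sketch, assuming a POSIX shell; the helper name tail_supports_pid is made up for illustration and is not part of Nextflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nextflow): report whether the local
# `tail` accepts the GNU-style --pid option seen in the error above.
tail_supports_pid() {
    # Reading zero lines from /dev/null returns immediately; a tail that
    # does not know --pid fails here with an "unrecognized option" error.
    tail --pid=$$ -n 0 /dev/null >/dev/null 2>&1
}

if tail_supports_pid; then
    echo "tail --pid: supported"
else
    echo "tail --pid: unsupported"
fi
```

Running the same probe on the host and inside the container shows quickly whether the two environments differ.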
Paolo Di Tommaso
@pditommaso
Jan 11 14:27
I see :/
micans
@micans
Jan 11 14:27
mmmmm. My NF pod has it too. Hold on.
Paolo Di Tommaso
@pditommaso
Jan 11 14:27
do you feel like opening an issue?
micans
@micans
Jan 11 14:28
I do, but let me make the issue clearer first
Paolo Di Tommaso
@pditommaso
Jan 11 14:28
nice
micans
@micans
Jan 11 14:39
We have another (external) container where the program ps is missing. We need to derive a new patched container from it ... in general nextflow will depend on a certain set of core utils present in all containers, right? There is no way around that
Paolo Di Tommaso
@pditommaso
Jan 11 14:39
ps is (and was) needed if you want to collect metrics
before, it was failing silently, which is a bad idea ..
I struggled to keep the deps as minimal as possible and compliant with the biocontainers base image
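
Whether an image carries the tools a task wrapper relies on can be checked before a pipeline runs in it. The following is a rough sketch, assuming a POSIX shell inside the container; the helper name missing_tools is hypothetical, and the tool list holds only ps and tail because those are the two named in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print every named tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# ps and tail are the two tools named in this thread; extend as needed.
missing=$(missing_tools ps tail)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```

Running it inside the image in question shows what a derived container would need to add.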
micans
@micans
Jan 11 14:42
ps is in our tracer/bracer arm of the pipeline. (1) I did not use reporting (2) the pipeline ran (it had glusterfs issues that are unrelated I assume, not sure) but (3) I saw the ps errors popping up in the .err files
The image with the failing tail is this: quay.io/biocontainers/samtools:1.8--4; looking into it
Paolo Di Tommaso
@pditommaso
Jan 11 14:43
tail should be fixed
micans
@micans
Jan 11 14:44
fixed in that image you mean?
Paolo Di Tommaso
@pditommaso
Jan 11 14:44
no, in NF
weird about ps, is it a container?
micans
@micans
Jan 11 14:44
yes, I blame the container/maintainer
that is just weird, not having ps
Paolo Di Tommaso
@pditommaso
Jan 11 14:45
yes, but my point is that w/o trace ps should not be a problem
could you report an issue with a test and the container so that I can debug
micans
@micans
Jan 11 14:46
Will try hard :-)
Paolo Di Tommaso
@pditommaso
Jan 11 14:46
are we having the call in 15 min BTW ? :)
micans
@micans
Jan 11 14:47
I am surprised about the samtools/tail issue. Want to know for sure they have an old tail somehow. Yes, meeting is coming up. Not much progress here I'm afraid, but Vlad (and I) are ready for it.
Paolo Di Tommaso
@pditommaso
Jan 11 14:48
:+1:
micans
@micans
Jan 11 14:50

As a potential shortcut, let me describe the ps thing here. I'll copy it to an issue! I get e.g. 26 lines in a single .err file:

/workspace/svd/tic-98/work/92/cdf4ba07f830a4b32a94008b69770e/.command.stub: line 45: ps: command not found

In .command.stub there is only one ps invocation, context:

nxf_tree() {
    declare -a ALL_CHILD
    while read P PP;do
        ALL_CHILD[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)
how is this on the hot-cold scale?
Paolo Di Tommaso
@pditommaso
Jan 11 14:51
ummm .command.stub does not exist any more !
micans
@micans
Jan 11 14:51
hehehe
freezing!
ok, I ran this with 18.12.0
cool, this is potentially a non-issue then
Paolo Di Tommaso
@pditommaso
Jan 11 14:52
I see
micans
@micans
Jan 11 14:52
most likely even
just to look into tail now
sorry about the line noise
Paolo Di Tommaso
@pditommaso
Jan 11 14:54
online now, when you want
micans
@micans
Jan 11 14:54
Coming ... same link?
Paolo Di Tommaso
@pditommaso
Jan 11 14:55
I guess so
micans
@micans
Jan 11 14:56
:+1:
Vladimir Kiselev
@wikiselev
Jan 11 15:01
Hi Paolo, we are at the link
Paolo Di Tommaso
@pditommaso
Jan 11 15:02
ummm
I can't see you
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Jan 11 16:31
@stevekm Right, I get the differences. Regarding the STAR usage, the idea of @rsuchecki is quite possible, as it loads the reference string into memory in one process and then, in a different process, aligns the reads to the already loaded string. That is the usage of STAR. However, due to some time limits I am not exploring the load part for now. Thank you anyway for your ideas and responses. Best
Tain Mauricio Velasco Luquez
@TainVelasco-Luquez
Jan 11 17:01

Hello nextflowers!

The objective of the pipeline is to run the aligner STAR and then quantify using RSEM for some RNAseq reads. There are several read files: read1.fastq.gz, read2.fastq.gz... In the first process, what I need is to create a dir named after the base name of each file and store inside it only two of the several files STAR outputs, which I will use in the second process.

The final tree output should be like:

outDir/file1/STAR/twofiles
outDir/file1/RSEM/outfile
outDir/file2/STAR/twofiles
outDir/file2/RSEM/outfile

This is what I have done for the first part:

params.reads
//params.star_index
outDir = params.outdir
//params.cpus

Channel
.fromPath( params.reads )
.ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
.set { dataset }


process STAR_alignment {

  tag "$prefix"

  publishDir pattern: "*Aligned*",  "${params.outdir}/STAR/$prefix/", mode: 'copy'

  input:
  file( reads ) from dataset.collect()

  output:
  file("*Aligned.toTranscriptome.out.bam") into aligned_to_transcritome
  file('*.Aligned.out.bam') into aligned_to_genome

  script:
  prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/

  """
  for i in $reads
  do
  for j in $prefix
  do
    mkdir \$j
    STAR \\
      --runThreadN 22 \\
      --genomeDir /home/tain/Documents/RNA_seq/My_genomes/GCA.GRCh38/GCA_000001405.15_GRCh38.STAR.index/ \\
      --readFilesIn \$i \\
      --readFilesCommand gunzip -c \\
      --outSAMtype BAM Unsorted \\
      --quantMode TranscriptomeSAM \\
      --outFileNamePrefix \$j/
  done
done
  """
}

I had to add the for loops because all files were being sent at the same time to STAR: STAR --readFilesIn read1.fastq.gz read2.fastq.gz
I also added the mkdir because STAR is not capable of creating the dir with the option --outFileNamePrefix.

Now the error I am getting is that in the first pass the dir is created with the name of the first prefix, but in the second pass mkdir takes the name of the first prefix again, so there is a clash and mkdir outputs:

Command error:
  mkdir: cannot create directory ‘lines2’: File exists

Do you have any idea of how can I fix this?

I fixed the link ;)
mate, you are creating a directory in a loop, the second time that directory already exists ..
micans
@micans
Jan 11 17:07
Also it seems that you only have one prefix? So you don't need the second for loop (over j). Then you can do the mkdir before the first for loop. But I may not understand the program logic as you intend it.
so you could mkdir $prefix before the for i in $reads
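
The mkdir clash can also be avoided inside the loop itself. The following is a sketch under stated assumptions, not the poster's pipeline: the helper names prefix_of and prepare_dirs are hypothetical, the sed expression mirrors the Groovy regex from the question, the STAR call is replaced by an echo placeholder, and the prefix is derived once per read file rather than from the first file only.

```shell
#!/bin/sh
# Sketch only: derive one prefix per read file and create its directory.
# The sed expression mirrors the Groovy regex from the question.
prefix_of() {
    printf '%s\n' "$1" |
        sed -E 's/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$//'
}

prepare_dirs() {
    for i in "$@"; do
        j=$(prefix_of "$(basename "$i")")
        # -p makes mkdir succeed even if the directory already exists.
        mkdir -p "$j"
        # Placeholder for the real call:
        #   STAR ... --readFilesIn "$i" --outFileNamePrefix "$j/"
        echo "would align $i into $j/"
    done
}
```

Calling prepare_dirs read1.fastq.gz read2.fastq.gz creates the directories read1 and read2, and calling it again does not fail. Within Nextflow itself, dropping .collect() from the input would hand each file to its own task, which removes the need for the loop altogether.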
tbugfinder
@tbugfinder
Jan 11 19:34
@t-neumann don't use the private key file. Extract the public key and use that one. If you want to use the key-name create a keypair in the EC2 Console (define the name), download pem file and use it to login. Maybe you already have a key in EC2 Console?
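
Extracting the public key from a .pem file can be done with ssh-keygen. The following is a minimal sketch, assuming OpenSSH is installed and the private key is not passphrase-protected (otherwise ssh-keygen prompts for the passphrase); the helper name extract_public_key and the file names in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write the public half of a private key file.
# ssh-keygen -y reads a private key and prints the matching public key.
extract_public_key() {
    private_key=$1
    public_key=$2
    ssh-keygen -y -f "$private_key" > "$public_key"
}
```

For example, extract_public_key my-key.pem my-key.pub writes the public key to my-key.pub, which can then be used in place of the private key file.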