These are chat archives for nextflow-io/nextflow

26th
Oct 2018
Maxime HEBRARD
@mhebrard
Oct 26 2018 02:35 UTC

Hello. Is there a way to check if a folder exists? I tried:

Channel.fromPath(params.folder)
    .ifEmpty { exit 1, "folder not found: ${params.folder}" }

But when the folder doesn't exist, the error is not fired.

Maxime HEBRARD
@mhebrard
Oct 26 2018 02:45 UTC
Solved:
if (file(params.folder).isEmpty()) {
    exit 1, "directory not found: ${params.folder}"
}
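Note: a minimal sketch of a stricter variant of the check above, not taken from the chat. As far as I know, isEmpty() is true for a missing path but should also be true for a directory that exists and has no content, so this version tests exists() and isDirectory() instead. It assumes params.folder is a plain path without glob characters, so that file() returns a single path object.

```groovy
// Nextflow script fragment; params.folder is the parameter from the question above
def folder = file(params.folder)

// Fail early when the path is missing or is not a directory
if( !folder.exists() || !folder.isDirectory() ) {
    exit 1, "directory not found: ${params.folder}"
}
```

The original Channel.fromPath(...).ifEmpty { ... } attempt probably never fired because fromPath, by default, does not check whether a non-glob path exists, so the channel is not empty.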
Maxime HEBRARD
@mhebrard
Oct 26 2018 06:58 UTC
Question:
Channel.fromPath(params.samples)
    .splitCsv(header: true, sep: '\t')
    .map {
        // Here each line is one obj {key1:val1, key2:val2}
        // How can I edit val2 on each line and return the object?
        // out: {key1:val1, key2:newVal2}
    }
Actually I wish to delete or substitute the [space] in val2... can I do that here?
Maxime HEBRARD
@mhebrard
Oct 26 2018 07:39 UTC
I found out that splitCsv creates Groovy maps, so if I know the keys I can do:
Channel.fromPath(params.samples)
    .splitCsv(header: ['firstKey', 'secondKey'], skip: 1, sep: '\t')
    .map{ [firstKey: it.firstKey, secondKey: it.secondKey.replaceAll(' ', '_')] }
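Note: a small sketch, not from the chat, of a variant that changes one key without listing all the others. It relies on plain Groovy: adding two maps returns a new map in which the right-hand entries override the left-hand ones. The rows and their values below are made up to stand in for what splitCsv(header: true) emits.

```groovy
// Made-up rows shaped like the maps emitted by splitCsv(header: true)
def rows = [
    [firstKey: 'a', secondKey: 'x y'],
    [firstKey: 'b', secondKey: 'p q r'],
]

// Map + Map builds a new map; only secondKey is replaced, other keys are kept as they are
def cleaned = rows.collect { row ->
    row + [secondKey: row.secondKey.replaceAll(' ', '_')]
}

println cleaned
```

Inside a pipeline the same closure body could be passed to the map operator, for example .map { row -> row + [secondKey: row.secondKey.replaceAll(' ', '_')] }.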
Paolo Di Tommaso
@pditommaso
Oct 26 2018 09:35 UTC
:+1:
Luca Cozzuto
@lucacozzuto
Oct 26 2018 12:52 UTC
Dear @pditommaso, I have a question :)
//peptideCSVs
def peptideCSVs = [:]
peptideCSVs["QC01"] = file("${CSV_folder}/knime_peptides_final.csv")
peptideCSVs["QC02"] = file("${CSV_folder}/knime_peptides_final.csv")
peptideCSVs["QC03"] = file("${CSV_folder}/knime_peptides_qc4l.csv") 


process calc_peptide_area {

    input:
    set sample_id, internal_code from shot_featureXMLfiles_for_calc_peptide_area.mix(srm_featureXMLfiles_for_calc_peptide_area)
    file(csvfile) from file(peptideCSVs[internal_code])

    """
        echo "${internal_code} ${csvfile} ${peptideCSVs['QC01']} ${peptideCSVs['QC03']}"
    """
}
I got:
echo "QC01 knime_peptides_qc4l.csv /nfs/software/bi/biocore_tools/git/nextflow/Qcloud/csv/knime_peptides_final.csv /nfs/software/bi/biocore_tools/git/nextflow/Qcloud/csv/knime_peptides_qc4l.csv"
I don't understand why it is doing this...
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:01 UTC
what is this ?
Luca Cozzuto
@lucacozzuto
Oct 26 2018 13:20 UTC
csvfile = knime_peptides_qc4l.csv
while it should be
/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/csv/knime_peptides_final.csv
since
internal_code = QC01
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:32 UTC
The main problem is that you are not supposed to specify file parameters in that way.
Input files need to be declared as inputs.
Luca Cozzuto
@lucacozzuto
Oct 26 2018 13:34 UTC
what is wrong?
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:35 UTC
you are not supposed to use peptideCSVs in the command script in that way
Luca Cozzuto
@lucacozzuto
Oct 26 2018 13:36 UTC
It is a simple dictionary (hash or associative array, depending on the language :) )
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:36 UTC
as you want
Luca Cozzuto
@lucacozzuto
Oct 26 2018 13:37 UTC
I would like to call a file depending on the value of internal_code.
internal_code can be QC01, QC02, QC03... and based on that, call one of the two files.
If there is a better way, I'll be happy to use it.
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:38 UTC
you need to resolve that externally, likely using a map operator
Luca Cozzuto
@lucacozzuto
Oct 26 2018 13:41 UTC
ok
I'll do it. Many thanks.
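Note: a hedged sketch, not from the chat, of what resolving the file externally with a map operator could look like, written in the same process syntax as the question. The CSV_folder value, the stand-in samples channel and the channel name input_for_calc_peptide_area are hypothetical. In the real pipeline the source would be the mixed shot/srm channels, and the CSV files must exist for the task to be staged.

```groovy
// Hypothetical location; in the chat CSV_folder is defined elsewhere in the script
def CSV_folder = '/path/to/csv'

def peptideCSVs = [
    QC01: file("${CSV_folder}/knime_peptides_final.csv"),
    QC02: file("${CSV_folder}/knime_peptides_final.csv"),
    QC03: file("${CSV_folder}/knime_peptides_qc4l.csv"),
]

// Stand-in for the mixed channels from the chat: (sample_id, internal_code) pairs
samples = Channel.from( ['sampleA', 'QC01'], ['sampleB', 'QC03'] )

// Look up the CSV for each item before the process, so it arrives as a declared input
samples
    .map { sample_id, internal_code -> [ sample_id, internal_code, peptideCSVs[internal_code] ] }
    .set { input_for_calc_peptide_area }

process calc_peptide_area {
    input:
    set sample_id, internal_code, file(csvfile) from input_for_calc_peptide_area

    """
    echo "${internal_code} ${csvfile}"
    """
}
```

With this layout the lookup happens once per channel item, and the process only sees files that were declared in its input block.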
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:42 UTC
:+1:
Krittin Phornsiricharoenphant
@sinonkt
Oct 26 2018 13:45 UTC
Hi Paolo, can I use a glob pattern over S3, e.g. file('s3://my-bucket/data/*.fa'), and is there any cache awareness behind the scenes for this kind of downloading and uploading?
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:45 UTC
You can use it, but there's no caching.
Krittin Phornsiricharoenphant
@sinonkt
Oct 26 2018 13:47 UTC
And should a private S3-compatible store like Minio, instead of AWS, work just fine?
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:48 UTC
It may work if you specify your endpoint in the config file.
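Note: a sketch, not from the chat, of what such a config entry might look like, using the aws scope of nextflow.config. Every value below is a placeholder, and whether a given Minio deployment works with only the endpoint set is an assumption that would need testing.

```groovy
// nextflow.config fragment; all values are placeholders
aws {
    accessKey = '<YOUR_ACCESS_KEY>'
    secretKey = '<YOUR_SECRET_KEY>'
    client {
        // Point the S3 client at the private S3-compatible service instead of AWS
        endpoint = 'https://minio.example.org:9000'
    }
}
```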
Krittin Phornsiricharoenphant
@sinonkt
Oct 26 2018 13:48 UTC
I’ll try that. Thanks, you’re always lovely. :))
Paolo Di Tommaso
@pditommaso
Oct 26 2018 13:49 UTC
ahah
there are different opinions on this :satisfied:
Krittin Phornsiricharoenphant
@sinonkt
Oct 26 2018 13:51 UTC
LOL 😆
Tobias "Tobi" Schraink
@tobsecret
Oct 26 2018 15:03 UTC
My cluster admin wants me to download my starting data into a certain directory. I see there is a move option for publishDir, but that breaks re-runs. Is there a way to move the files but leave a link to the files in the work directory for each task? (Basically the inverse of the symlink option.)
Luca Cozzuto
@lucacozzuto
Oct 26 2018 15:08 UTC
storeDir?
Paolo Di Tommaso
@pditommaso
Oct 26 2018 15:27 UTC
Use mode 'link', provided the files are on the same storage: https://www.nextflow.io/docs/latest/process.html#publishdir
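Note: a minimal sketch, not from the chat, of a process using that mode. The process name, the target path and the output channel name are hypothetical, and the hard links only work when the work directory and the target directory are on the same file system.

```groovy
process fetch_data {
    // Hard-link outputs into the target directory instead of copying or moving them
    publishDir '/path/required/by/admin', mode: 'link'

    output:
    file 'data.txt' into fetched_data

    """
    echo "placeholder content" > data.txt
    """
}
```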
Tobias "Tobi" Schraink
@tobsecret
Oct 26 2018 19:33 UTC
Hmmm, so if that folder is flushed, do the files get deleted? Looks like that might not be the case with mode 'link' since what's in those folders realistically is just a link. Or maybe I am not understanding hardlinks correctly
Tobias "Tobi" Schraink
@tobsecret
Oct 26 2018 19:41 UTC
Looks like storeDir is the better alternative in this case. I mean, I could also just use mode 'link' and pretend our cluster admins are flushing the directory when they're really just deleting a bunch of links, but they'd probably be pretty mad when they found out.
Thanks for the help, folks!
Paolo Di Tommaso
@pditommaso
Oct 26 2018 21:31 UTC
A hard link (or just link) is an additional entry in the file system table for the same file.
If you delete it, the file will still be there.
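Note: a small plain-Groovy illustration of that behaviour, not from the chat. It uses java.nio.file.Files.createLink in a temporary directory (the file names are made up) and assumes the file system supports hard links.

```groovy
import java.nio.file.Files

// Work in a throwaway directory so nothing real is touched
def dir = Files.createTempDirectory('hardlink-demo')
def original = dir.resolve('original.txt')
def link = dir.resolve('link.txt')

Files.write(original, 'some data'.bytes)
Files.createLink(link, original)   // second directory entry pointing at the same file
Files.delete(original)             // removes one entry only, not the data

def linkSurvives = Files.exists(link)
def content = new String(Files.readAllBytes(link))
println "link still exists: ${linkSurvives}, content: ${content}"
```

The data is released by the file system only once the last entry pointing at it has been deleted.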
Tobias "Tobi" Schraink
@tobsecret
Oct 26 2018 21:54 UTC
Yes, that was my intuition as well. I had forgotten about the storeDir directive, didn't know that it supported reruns
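Note: a minimal sketch, not from the chat, of the storeDir directive. The process name, path and channel name are hypothetical. According to the Nextflow documentation, the process is skipped on later runs when the declared output files already exist in that directory, and the stored files are used as its result.

```groovy
process fetch_data {
    // Outputs are kept here permanently; the task is skipped when they already exist
    storeDir '/path/required/by/admin'

    output:
    file 'data.txt' into fetched_data

    """
    echo "placeholder content" > data.txt
    """
}
```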
Krittin Phornsiricharoenphant
@sinonkt
Oct 26 2018 22:20 UTC
Hi again :), what does your local development environment look like? I have a hard time on Mac, so I decided to make this Docker image to work with: https://github.com/sinonkt/docker-centos7-singularity-nextflow. It just supports your newly released Nextflow 18.0 and Singularity 3.0. What do you think of it? Have I missed something?
Haven't even tried it yet. 😆