These are chat archives for nextflow-io/nextflow

23rd
Jun 2016
Johan Viklund
@viklund
Jun 23 2016 07:53
When I run nextflow run <repo> -r branch I get the error "Not a valid Nextflow project"
because it can't find the main.nf file
the file is in the base-dir of the repo but only in the branch specified on the command line
is the check done before the correct branch is chosen?
Johan Viklund
@viklund
Jun 23 2016 08:00
I think this line should be different somehow:
for that to work...
should probably be something like "${config.endpoint}/repos/$project/contents/$branch/$path"
but I'm not that familiar with the github api
Emilio Palumbo
@emi80
Jun 23 2016 08:09
Hi @viklund, I think for Github api it should be something like "${config.endpoint}/repos/$project/contents/$path?ref=$branch"
to solve this you could also set branch as the default branch of your Github repo
but I guess this isn't exactly what you want to do...
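For reference, a minimal shell sketch of the URL shape suggested above, where the branch is passed as the ref query parameter instead of as a path segment. The function name contents_url and the argument values are hypothetical, and this only illustrates the URL; it is not the Nextflow code under discussion.

```shell
#!/usr/bin/env bash
# Build a GitHub "contents" API URL for a file on a given branch.
# All four arguments are illustrative placeholders.
contents_url() {
    local endpoint="$1" project="$2" path="$3" branch="$4"
    # The branch goes in the ?ref= query parameter, not in the path.
    printf '%s/repos/%s/contents/%s?ref=%s\n' "$endpoint" "$project" "$path" "$branch"
}

# Example call with made-up project and branch names
contents_url "https://api.github.com" "some-user/some-repo" "main.nf" "my-branch"
```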
Rickard Hammarén
@Hammarn
Jun 23 2016 08:21

Hi! We are thinking about setting up a check on the alignment rate for our STAR process in our pipeline and wondered about the way to do it. Basically we want to be able to throw an error / print a message if the alignment rate is below 10%. @ewels quickly wrote something to check for that:

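def percent_aligned = 0
new File(logFinal).eachLine { line ->
    // Escape the pipe so it matches the literal '|' separator instead of acting as regex alternation
    def matcher = line =~ /Uniquely mapped reads %\s*\|\s*([\d\.]+)%/
    if (matcher) {
        // The captured group is a String, so convert it before the numeric comparison
        percent_aligned = matcher[0][1].toFloat()
    }
}
if (percent_aligned <= 10) {
    error "Very poor alignment rate for ${file}! Only ${percent_aligned}%"
}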

We thought about using the afterScript directive but I suspect it won't take multiple lines of code. We don't want to continue analysing a sample with such a low alignment rate, but we want the other samples to continue. I could see that having a process with a channel that collects all the STAR output files might work for generating the warning, but not for cancelling downstream analysis of the sample.
Quite a lengthy post. To summarise my questions: 1) Where to execute the code? 2) How to stop one sample but not the entire pipeline?

Johan Viklund
@viklund
Jun 23 2016 08:22
I just wanted to test before making a pull request.
@Hammarn use a filter on the channel?
Rickard Hammarén
@Hammarn
Jun 23 2016 08:31
What do you suggest?
Johan Viklund
@viklund
Jun 23 2016 08:41
I haven't tried this, but if you wrap the code you have up there in a function that returns a boolean you should be able to do something like
starFiles.filter { alignment_rate_ok( it ) }.set { goodStarFiles }
given that you have all files from the STAR process in the starFiles channel and that your next step takes the goodStarFiles channel
hmm, but you are checking the log files right?
Johan Viklund
@viklund
Jun 23 2016 08:47
that was a bit trickier
in that case
Rickard Hammarén
@Hammarn
Jun 23 2016 08:52
Yeah, I'm looking at the Star log file xxx.Log.final.out
Johan Viklund
@viklund
Jun 23 2016 08:53
and you want to move xxx.smth through the channel?
In that case I would probably do the check in bash in the job and, if the alignment rate is too low, rename the output file to xxx.SKIP and then have a filter that removes all files with SKIP in their name.
The renamed file still has to match the output file glob in the job, otherwise the job will fail because it can't find its output files. You could of course ignore all errors, but then you won't get information on runs that failed for some other reason.
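A minimal bash sketch of that in-job check, assuming the STAR Log.final.out file has a line of the form "Uniquely mapped reads % | 85.23%" and that the process output glob is '*.bam'. The function names, the ".SKIP.bam" suffix and the default threshold of 10 are illustrative choices, not anything from the pipeline itself.

```shell
#!/usr/bin/env bash
# Print the "Uniquely mapped reads %" value (without the % sign) from a STAR log.
uniquely_mapped_pct() {
    awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t%]/, "", $2); print $2; exit }' "$1"
}

# Rename <name>.bam to <name>.SKIP.bam when the rate is at or below the threshold.
# The new name still ends in .bam, so it keeps matching a '*.bam' output glob.
mark_low_alignment() {
    local log="$1" bam="$2" min_pct="${3:-10}"
    local pct
    pct=$(uniquely_mapped_pct "$log")
    if [ -z "$pct" ]; then
        echo "No alignment rate found in ${log}" >&2
        return 1
    fi
    # awk does the floating-point comparison; exit status 0 means "too low".
    if awk -v p="$pct" -v m="$min_pct" 'BEGIN { exit !(p + 0 <= m + 0) }'; then
        mv "$bam" "${bam%.bam}.SKIP.bam"
        echo "Very poor alignment rate for ${bam}: only ${pct}%" >&2
    fi
}
```

Inside a process script this would be called after STAR finishes, for example mark_low_alignment xxx.Log.final.out xxx.bam, and the downstream channel would then be filtered on file names containing SKIP.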
Rickard Hammarén
@Hammarn
Jun 23 2016 09:04
That sounds like a neat solution, thanks!
Johan Viklund
@viklund
Jun 23 2016 09:04
yw
Phil Ewels
@ewels
Jun 23 2016 09:09
The way I want this to work is to have a new errorStrategy which doesn't publish that process step's files to the channels - kind of like ignore but removing any output
Johan Viklund
@viklund
Jun 23 2016 09:09
if you delete the files at the end of the process, that should be what happens, right?
Phil Ewels
@ewels
Jun 23 2016 09:10
Yeah, but then you can't find out why it failed :)
Johan Viklund
@viklund
Jun 23 2016 09:10
meh
Phil Ewels
@ewels
Jun 23 2016 09:10
I only want to remove the output from the channels so that the downstream processes don't work on these files
Which is exactly what your code above would do, but somehow it feels like a lot of hoops to jump through
Johan Viklund
@viklund
Jun 23 2016 09:11
maybe you can have a dynamic output directive?
Phil Ewels
@ewels
Jun 23 2016 09:11
Sorry, I shouldn't complain - thanks for your suggestion, it is appreciated and we'll probably use it :)
though output is probably not a directive in that sense
no, it's not
Paolo Di Tommaso
@pditommaso
Jun 23 2016 09:31
I think a filter operator could work for your use case
Phil Ewels
@ewels
Jun 23 2016 09:37
Yup, thanks @pditommaso - that's what @viklund suggested further up (apologies, been spamming with quite a lot of messages)
Though slightly complex as we want to filter the .bam file channel using stats which are captured in the .Log.final channel
Paolo Di Tommaso
@pditommaso
Jun 23 2016 09:41
I see, maybe using Picard can help
otherwise you can try to embed that check in the process itself or to have a downstream process to handle the filtering
Johan Viklund
@viklund
Jun 23 2016 09:42
is it possible to have the output be something like set file(*.log), file(*.bam)?
then you could filter on the channel
Paolo Di Tommaso
@pditommaso
Jun 23 2016 09:43
yes
actually set file('*.log'), file('*.bam') ...
you will get a tuple in which the first item is the list of log files and the second the list of bam files
Phil Ewels
@ewels
Jun 23 2016 09:50
ok, yeah that could work - then filter that channel and return just the bams
And we can still have a second set file('*.log') in addition to that, right? As we need it for MultiQC as well
(and would like to include it in the MultiQC report despite the fact that we're halting downstream analysis)
Paolo Di Tommaso
@pditommaso
Jun 23 2016 09:54
yes, you can have
channel
  .filter { logs, bams -> ... } 
  .map {  logs, bams -> bams } 
  .set { newChannel }
Paolo Di Tommaso
@pditommaso
Jun 23 2016 10:01
(leaving now, out of office these days)
Rickard Hammarén
@Hammarn
Jun 23 2016 10:01
Thanks for the help
Phil Ewels
@ewels
Jun 23 2016 10:01
Thanks for the suggestion / help! :+1:
Paolo Di Tommaso
@pditommaso
Jun 23 2016 10:02
:+1:
Mike Smoot
@mes5k
Jun 23 2016 21:42
Does anyone have an example of using the OrderBy comparator as the sort option in the groupTuple operator? Or an example using any comparator in groupTuple? I'm having a hard time working out the syntax here.
Paolo Di Tommaso
@pditommaso
Jun 23 2016 21:47
Check these tests
you need to specify a closure that can either be a comparator or just return the element to sort by
Mike Smoot
@mes5k
Jun 23 2016 21:51
Perfect, it was the "as Comparator" bit that I was missing! Thank you so much!
Paolo Di Tommaso
@pditommaso
Jun 23 2016 21:51
Great, welcome