doc: attributes that are defined as lists (e.g. on field definitions or record type definitions) - or is this only valid for type: documentation? If it is a bug, I can fix it - I just wanted to verify it is a bug.
id: input
type:
  inputBinding:
    position: 3
    valueFrom: c
  type: record
  fields:
    - name: input_field
      type: string?
      inputBinding:
        position: 2
        valueFrom: b
  name: input
Hi everyone, just wondering if there's a way to slightly rename an output file after the tool is run. Essentially, GATK MergeSamFiles is generating the index as ^.bai, but I need it as .bai on input, and then on the glob as well. I could manually brute force some mv statements in there with && and shell=True, but less keen if there's a better way.
The more tools I explore, the more confused I am that there seems to be no consistency or standard in this respect.
Hi @mr-c , I've been playing with rewriting the secondaryFiles, but inside the output glob it looks like there isn't a secondaryFiles property.
cwlVersion: v1.0
class: CommandLineTool
baseCommand: "ls"
inputs:
  bam:
    type: File
    secondaryFiles: ["^.bai"]
outputs:
  std: stdout
  out:
    type: File
    secondaryFiles: ["^.bai"]
    outputBinding:
      glob: $(inputs.bam.basename)
      outputEval: |
        ${
          console.log(self)
          self[0].secondaryFiles[0].basename=".bai"
        }
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.bam)
  InlineJavascriptRequirement: {}
(Formatted error)
('Error collecting output for parameter \'out\':Expression evaluation error:Expecting value: line 1 column 1 (char 0)script was:
01 "use strict";
02 var inputs = {
03 "bam": {
04 "class": "File",
05 "location": "file:///Users/franklinmichael/Desktop/tmp/bamsplit/BRCA1.bam",
06 "size": 2997846,
07 "basename": "BRCA1.bam",
08 "nameroot": "BRCA1",
09 "nameext": ".bam",
10 "secondaryFiles": [
11 {
12 "location": "file:///Users/franklinmichael/Desktop/tmp/bamsplit/BRCA1.bai",
13 "basename": "BRCA1.bai",
14 "class": "File",
15 "nameroot": "BRCA1",
16 "nameext": ".bai",
17 "path": "/private/tmp/docker_tmplo7vt97z/BRCA1.bai",
18 "dirname": "/private/tmp/docker_tmplo7vt97z"
19 }
20 ],
21 "path": "/private/tmp/docker_tmplo7vt97z/BRCA1.bam",
22 "dirname": "/private/tmp/docker_tmplo7vt97z"
23 }
24 };
25 var self = [
26 {
27 "location": "file:///private/tmp/docker_tmplo7vt97z/BRCA1.bam",
28 "path": "/private/tmp/docker_tmplo7vt97z/BRCA1.bam",
29 "basename": "BRCA1.bam",
30 "nameroot": "BRCA1",
31 "nameext": ".bam",
32 "class": "File",
33 "checksum": "sha1$8a38a4e8d58c91d9aaca8cdc739e05eddf7dec1d",
34 "size": 2997846
35 }
36 ];
37 var runtime = {
38 "cores": 1,
39 "ram": 1024,
40 "tmpdirSize": 1024,
41 "outdirSize": 1024,
42 "tmpdir": "/private/var/folders/jz/y9gqxt_s7jxcjkc26gr71ywr7zs5yz/T/tmpoer80xbz",
43 "outdir": "/private/tmp/docker_tmplo7vt97z"
44 };
45 (function(){
46 console.log(self)
47 self[0].secondaryFiles[0].basename=".bai"
48 })()
stdout was: \'\'
stderr was: \'evalmachine.<anonymous>:47
self[0].secondaryFiles[0].basename=".bai"
^
TypeError: Cannot read property \'0\' of undefined
    at evalmachine.<anonymous>:47:27
    at evalmachine.<anonymous>:48:3
    at Script.runInContext (vm.js:107:20)
    at Script.runInNewContext (vm.js:113:17)
    at Object.runInNewContext (vm.js:296:38)
    at Socket.<anonymous> ([eval]:11:57)
    at Socket.emit (events.js:182:13)
    at addChunk (_stream_readable.js:283:12)
    at readableAddChunk (_stream_readable.js:260:13)
    at Socket.Readable.push (_stream_readable.js:219:10)\'', {})
My guess is this is because the "outputEval" block is before the secondaryFiles block
I checked out the source code and moved the outputEval block below the secondary files, and this works.
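A minimal sketch of why the access fails, using plain JavaScript outside of cwltool. The `globResult` value copies a subset of the fields from `self` in the error above; the `hasSecondaryFiles` guard is a hypothetical helper for illustration, not part of cwltool or the CWL standard.

```javascript
// Same shape as `self` in the logged script: the globbed File has no
// secondaryFiles property at the point where outputEval runs.
var globResult = [
  {
    location: "file:///private/tmp/docker_tmplo7vt97z/BRCA1.bam",
    basename: "BRCA1.bam",
    nameroot: "BRCA1",
    nameext: ".bam",
    class: "File"
  }
];

// Reproduce the failing statement from the outputEval body.
var caught = null;
try {
  globResult[0].secondaryFiles[0].basename = ".bai";
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // true

// Hypothetical guard: only touch secondaryFiles when the list exists.
function hasSecondaryFiles(file) {
  return Array.isArray(file.secondaryFiles) && file.secondaryFiles.length > 0;
}
console.log(hasSecondaryFiles(globResult[0])); // false
```

The guard only avoids the crash. It does not by itself attach or rename the index.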
outputEval, including attaching the index to the primary file in a synthesized secondaryFiles property. Then hopefully this full object will be passed without problems through the secondaryFiles: .bai processing stage.
cwltool and I'd have to think on it more to see if it is compliant with the standard. Maybe @tetron has an idea.
secondaryFiles field of your output and not use outputEval, as you are allowed to return a File object and thus can do the regular renaming trick.
I'm happy to use a CWL expression to do this, but I don't quite understand what I should return if:
myfile.bam
myfile.bai is present in the output directory
myfile.bam.bai
This would ensure it's correctly localised in future steps. But maybe I don't quite understand how secondary files are passed around in CWL, especially CWLTool.
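One possible shape for the returned value, sketched as plain JavaScript. It assumes the expression sits in the output's secondaryFiles field, where `self` is the primary File object, and that the tool wrote the index as the nameroot plus .bai next to the BAM. The function name `renamedIndex` and the `primary` variable are hypothetical stand-ins, and whether a given runner stages the file under the new basename should be checked against that runner.

```javascript
// Hypothetical body of a secondaryFiles expression; `primary` stands in for `self`.
function renamedIndex(primary) {
  // Directory part of the primary file's location, including the trailing slash.
  var dir = primary.location.slice(0, primary.location.lastIndexOf("/") + 1);
  return {
    class: "File",
    // Where the index actually is on disk (assumed: <nameroot>.bai).
    location: dir + primary.nameroot + ".bai",
    // The name it should carry from here on (<basename>.bai).
    basename: primary.basename + ".bai"
  };
}

// Primary file fields taken from the error log earlier in the thread.
var primary = {
  class: "File",
  location: "file:///private/tmp/docker_tmplo7vt97z/BRCA1.bam",
  basename: "BRCA1.bam",
  nameroot: "BRCA1",
  nameext: ".bam"
};

var index = renamedIndex(primary);
console.log(index.location); // file:///private/tmp/docker_tmplo7vt97z/BRCA1.bai
console.log(index.basename); // BRCA1.bam.bai
```

Inside a CWL document the same logic would be written as a `${ ... }` expression that ends with a return statement, with InlineJavascriptRequirement enabled.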
Handling bam + bai|csi and cram + crai for a single tool gets complicated. We ended up having to define multiple CWLs:
https://github.com/cancerit/dockstore-cgpmap/tree/develop/cwls
More elegant solutions would be preferable, but we have to ensure they are also compatible (or will be compatible) with the Dockstore registry.
csi is not going to be that much of an issue for us. CRAM with modified split size (1k, instead of 10k) is as fast and sometimes faster than BAM to parse now (with samtools/htslib). If I was tied to BAM (legacy tools can be a pain), and setting up something new I'd certainly go with csi.