cdb813b50789fb95 completed
Created 9 hr ago
Started 5 min ago
Signature checked
TASK:
=====
bash('mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)\nmem_gbytes=$(( $mem_bytes / 1024 **3 ))\n\necho "Starting at $(date)"\necho "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"\necho "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"\necho " I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"\n')
TAGS:
=====
187c6b10c86049c7 default notebooks
GLOBAL:
=======
(<_ast.Module object at 0x7f6b8f19bb50>, {})
ENVIRONMENT:
============
__signature_vars__ {'bash'}
_depends []
_index 0
_input []
_output Unspecified
_runtime {'queue': 'yale_hpc_slurm',
'run_mode': 'interactive',
'sig_mode': 'default',
'verbosity': 2,
'walltime': '00:05:00',
'workdir': path('/data2/helen_mixed_infection/notebooks')}
step_name 'default'
workflow_id '187c6b10c86049c7'
EXECUTION STATS:
================
Duration: 0s
Peak CPU: 0.0 %
Peak mem: 36.7 MiB
execution script:
================
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=2
#SBATCH --mem-per-cpu=5G
#SBATCH --job-name=cdb813b50789fb95
#SBATCH --output=/home/pgc29/.sos/tasks/cdb813b50789fb95.out
#SBATCH --error=/home/pgc29/.sos/tasks/cdb813b50789fb95.err
cd /data2/helen_mixed_infection/notebooks
sos execute cdb813b50789fb95 -v 2 -s default -m interactive
standard output:
================
Starting at Wed Oct 7 05:49:27 EDT 2020
Job submitted to the general partition, the default partition on farnam
Job name: cdb813b50789fb95, Job ID: 31767135
I have 2 CPUs and 10GiB of RAM on compute node c23n12.farnam.hpc.yale.internal
standard error:
================
/var/spool/slurmd/job31767135/slurm_script: line 8: cd: /data2/helen_mixed_infection/notebooks: No such file or directory
INFO: cdb813b50789fb95 started
You can check the task with sos status or the %task status magic. However, because the %run -q magic is currently blocking, that magic would not be run until after the end of %run. Ideally %run would not be blocking after the tasks are all submitted, so for now you have to use sos status from the command line (because %task is blocked)... definitely something that needs to be improved. Please feel free to submit tickets for problems and feature requests so that we know what the "pain points" are.
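For example, while %run is still blocking in the notebook, you can check the task from a regular terminal; a sketch using the task ID from the output above (sos status accepts -q to query the remote queue):
sos status cdb813b50789fb95 -q yale_hpc_slurm -v4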
Yes, it will take some configuration and time to get used to, and you might want to add
module load {" ".join(modules)}
to your template, which allows you to write
task: modules=['module1', 'module2']
to load particular modules for the scripts in the task. Again, feel free to let me know if you get into trouble so that we can make this process as easy and error-proof as possible.
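As a rough sketch (an assumption about your setup, not a drop-in template), the line would sit in the cluster's task_template in ~/.sos/hosts.yml, next to the lines that produce the generated script shown above; the template variable names here are assumptions based on that script:
task_template: |
  #!/bin/bash
  #SBATCH --time={walltime}
  #SBATCH --job-name={task}
  ... other #SBATCH lines from your current template ...
  cd {workdir}
  module load {" ".join(modules)}
  {command}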
Okay, now I'm trying to run a real job but having an issue. Per your suggestion I set a scratch path for my local machine as scratch: /data2 and for the cluster as scratch: /home/pgc29/scratch60.
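(A rough sketch of how such a scratch definition might look in ~/.sos/hosts.yml; the host alias laptop and the redacted cluster address are illustrative:)
localhost: laptop
hosts:
  laptop:
    paths:
      scratch: /data2
  yale_hpc_slurm:
    address: pgc29@xxx.xxx.xxx.xxx
    paths:
      scratch: /home/pgc29/scratch60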
Then I try to run:
%run test -q 'yale_hpc_slurm'
[test]
input: f'/data2/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R1_001.fastq.gz',
f'/data2/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R2_001.fastq.gz'
output: f'/data2/helen_mixed_infection/data/tb-profiler/results/R9994_CATCAAGT_S35_L001.results.json'
task: walltime='00:15:00', mem='2G', workdir='#scratch/helen_mixed_infection/data/tb-profiler'
run: expand=True
module load miniconda
conda activate tbprofiler
cd /data2/helen_mixed_infection/data/tb-profiler
tb-profiler profile -1 {_input[0]} -2 {_input[1]} -p R9994_CATCAAGT_S35_L001
It ends up hanging forever with INFO: Waiting for the completion of 1 task. Running sos status b5eb33955aee5a77 -v4 gives:
b5eb33955aee5a77 submitted
Created 33 min ago
TASK:
=====
run(fr"""module load miniconda
conda activate tbprofiler
cd /data2/helen_mixed_infection/data/tb-profiler
tb-profiler profile -1 {_input[0]} -2 {_input[1]} -p R9994_CATCAAGT_S35_L001
""")
TAGS:
=====
1a2a7669047f0dc5 notebooks test
GLOBAL:
=======
(<_ast.Module object at 0x2b96dfcc8ca0>, {})
ENVIRONMENT:
============
__signature_vars__ {'_input', 'run'}
_depends []
_index 0
_input [file_target('/data2/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R1_001.fastq.gz'), file_target('/data2/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R2_001.fastq.gz')]
_output [file_target('/data2/helen_mixed_infection/data/tb-profiler/results/R9994_CATCAAGT_S35_L001.results.json')]
_runtime {'mem': 2000000000,
'queue': 'yale_hpc_slurm',
'run_mode': 'interactive',
'sig_mode': 'default',
'verbosity': 2,
'walltime': '00:15:00',
'workdir': path('/data2/helen_mixed_infection/notebooks')}
step_name 'test'
workflow_id '1a2a7669047f0dc5'
b5eb33955aee5a77.sh:
====================
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=5G
#SBATCH --job-name=b5eb33955aee5a77
#SBATCH --output=/home/pgcudahy/.sos/tasks/b5eb33955aee5a77.out
#SBATCH --error=/home/pgcudahy/.sos/tasks/b5eb33955aee5a77.err
cd /data2/helen_mixed_infection/notebooks
sos execute b5eb33955aee5a77 -v 2 -s default -m interactive
b5eb33955aee5a77.job_id:
========================
job_id: 31773933
But when I run sacct -j 31773933 on the cluster I get:
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
31773933     b5eb33955+    general cohen_the+          1     FAILED      1:0
31773933.ba+      batch            cohen_the+          1     FAILED      1:0
31773933.ex+     extern            cohen_the+          1  COMPLETED      0:0
I expected the workdir /data2/helen_mixed_infection/notebooks to be translated to /home/pgc29/scratch60/helen_mixed_infection/notebooks, but b5eb33955aee5a77.sh has cd /data2/helen_mixed_infection/notebooks. So I also tried using the named path #scratch directly in the step:
[test]
input: f'#scratch/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R1_001.fastq.gz',
f'#scratch/helen_mixed_infection/dataraw/R9994_CATCAAGT_S35_L001_R2_001.fastq.gz'
output: f'#scratch/helen_mixed_infection/data/tb-profiler/results/R9994_CATCAAGT_S35_L001.results.json'
task: walltime='00:15:00', mem='2G', workdir='#scratch/helen_mixed_infection/data/tb-profiler'
run: expand=True
module load miniconda
conda activate tbprofiler
tb-profiler profile -1 {_input[0]} -2 {_input[1]} -p R9994_CATCAAGT_S35_L001
But the generated script still has a literal #scratch in it:
execution script:
================
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=5G
#SBATCH --job-name=a904366ca2a493bc
#SBATCH --output=/home/pgc29/.sos/tasks/a904366ca2a493bc.out
#SBATCH --error=/home/pgc29/.sos/tasks/a904366ca2a493bc.err
cd #scratch/helen_mixed_infection/data/tb-profiler
/home/pgc29/.local/bin/sos execute a904366ca2a493bc -v 2 -s default -m interactive
Hello, I have what I think is a simple question but haven't been able to get it to work. I'm processing genomes that are a mix of single-end and paired-end reads. Before mapping them to a reference the commands differ between single and paired reads, so I have two parallel pipelines, but after mapping they are all bam files and I'd like to continue processing them in a single pipeline. To show you what I mean, first I grab all the fastq files and group the ones that are paired. The filenames have the sample name followed by "_R1.fastq.gz" or "_R2.fastq.gz" to indicate a forward or reverse read.
[global]
import glob
import itertools
import os
fastq_files = sorted(glob.glob("/data/*.fastq.gz"))
grouped_fastq_dict = dict()
for k, v in itertools.groupby(fastq_files, lambda a: os.path.split(a)[1].split("_R", 1)[0]):
grouped_fastq_dict[k] = list(v)
single_read, paired_read = dict(), dict()
for k,v in grouped_fastq_dict.items():
if len(v) == 1:
single_read[k] = v
elif len(v) == 2:
paired_read[k] = v
else:
print(f'Error: {k} has < 1 or more than 2 associated fastq files')
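For example (hypothetical filenames), given /data/sampleA_R1.fastq.gz, /data/sampleA_R2.fastq.gz and /data/sampleB_R1.fastq.gz, the grouping key is the part of the basename before "_R", so the loop above produces
paired_read == {'sampleA': ['/data/sampleA_R1.fastq.gz', '/data/sampleA_R2.fastq.gz']}
single_read == {'sampleB': ['/data/sampleB_R1.fastq.gz']}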
Then I process them and map them to a reference
[trimmomatic-single]
input: single_read, group_by=1
output: trim_single = f'/data/{_input.labels[0]}/{_input:bnn}_trimmed.fastq.gz'
run: expand=True
trimmomatic SE -phred33 {_input} {_output} LEADING:10 TRAILING:10 SLIDINGWINDOW:4:16 MINLEN:40
[trimmomatic-paired]
input: paired_read, group_by=2
output: trim_paired_1=f'/data/{_input.labels[0]}/{_input[0]:bnn}_trimmed.fastq.gz',
trim_unpaired_1=f'/data/{_input.labels[0]}/{_input[0]:bnn}_trimmed_unpaired.fastq.gz',
trim_paired_2=f'/data/{_input.labels[0]}/{_input[1]:bnn}_trimmed.fastq.gz',
trim_unpaired_2=f'/data/{_input.labels[0]}/{_input[1]:bnn}_trimmed_unpaired.fastq.gz'
run: expand=True
trimmomatic PE -phred33 {_input} {_output["trim_paired_1"]} {_output["trim_unpaired_1"]} \
{_output["trim_paired_2"]} {_output["trim_unpaired_2"]} LEADING:10 TRAILING:10 SLIDINGWINDOW:4:16 MINLEN:40
[map-single]
input: output_from("trimmomatic-single"), group_by=1
output: bam = f'/data/{_input.name.split("_R")[0]}_GCF_000195955.2_filtered_sorted.bam'
id=_input.name.split("_R")[0]
rg=f'\"@RG\\tID:{id}\\tPL:Illumina\\tSM:{id}\"'
run: expand=True
bwa mem -v 3 -Y -R {rg} {reference} {_input} | samtools view -bu - | \
samtools sort -T /data2/helen_mixed_infection/data/bam/tmp.{id} -o {_output}
[map-paired]
input: output_from("trimmomatic-paired")["trim_paired_1"], output_from("trimmomatic-paired")["trim_paired_2"], group_by="pairs"
output: bam = f'/data/{_input["trim_paired_1"].name.split("_R")[0]}_GCF_000195955.2_filtered_sorted.bam'
id=_input["trim_paired_1"].name.split("_R")[0]
rg = f'\"@RG\\tID:{id}\\tPL:Illumina\\tSM:{id}\"'
run: expand=True
bwa mem -v 3 -Y -R {rg} {reference} {_input} | samtools view -bu - | \
samtools sort -T /data2/helen_mixed_infection/data/bam/tmp.{id} -o {_output}
But now I want to combine the output of the two parallel pipelines into the next step
[duplicate_marking]
input: output_from("map-single"), output_from("map-paired"), group_by=1
output: dedup=f'{_input:n}_dedup.bam'
bash: expand=True
export JAVA_OPTS='-Xmx3g'
picard MarkDuplicates I={_input} O={_output} M={_output:n}.duplicate_metrics \
REMOVE_DUPLICATES=false ASSUME_SORT_ORDER=coordinate
But SoS complains because the outputs from map-single and map-paired are of different lengths. How can I use the output from both steps as the input to my duplicate_marking step?
Running sos run test combined with test.sos containing the following workflow:
[single]
input: for_each=dict(i=range(2))
output: f'single_{i}.bam'
_output.touch()
[double]
input: for_each=dict(i=range(2))
output: f'double_{i}.bam'
_output.touch()
[combined]
input: output_from('single'), output_from('double')
print(_input)
You will see that the two groups from single and double are combined to form two groups, each with one output from single and one output from double. When the two steps produce different numbers of groups, as in your case, you can instead write
[single]
input: for_each=dict(i=range(2))
output: f'single_{i}.bam'
_output.touch()
[double]
input: for_each=dict(i=range(3))
output: f'double_{i}.bam'
_output.touch()
[combined]
input: output_from('single', group_by='all'), output_from('double', group_by='all'), group_by=1
print(_input)
basically "flatten" and join both output_from
into a single group before separating them into groups with one file (group_by=
).
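Applied to the workflow above, a sketch of the duplicate_marking step (assuming the step names and picard command from your post; only the input: line changes) would be
[duplicate_marking]
input: output_from("map-single", group_by='all'), output_from("map-paired", group_by='all'), group_by=1
output: dedup=f'{_input:n}_dedup.bam'
bash: expand=True
    export JAVA_OPTS='-Xmx3g'
    picard MarkDuplicates I={_input} O={_output} M={_output:n}.duplicate_metrics \
        REMOVE_DUPLICATES=false ASSUME_SORT_ORDER=coordinate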
I am getting
ERROR: ERROR workflow_executor.py:1206 - Failed to connect to yale_hpc_slurm: ssh connection to pgc29@xxx.xxx.xxx.xxx time out with prompt: b'' - None
errors. Is there a way to adjust the timeout to make it longer?
I'm also having an issue with remote inputs and outputs where SoS does not notice changed files and skips steps because of saved signatures. Where exactly are signatures stored for jobs run remotely, and how can I clear them? I've tried !sos remove -s from within my notebook, but I still get steps skipped.
This behavior when workdir is set certainly needs improvement.
When I run a workflow entirely on the remote host (e.g. %run check_validation -q yale_hpc_task_spooler -r yale_hpc_task_spooler), is there a way to synchronize the output back to my local computer? Using a named path like #scratch fails with
WARNING: Error from step check_validation is ignored: [check_validation]: Failed to process step output (f'#scratch/helen_mixed_infection/data/fqtools/good_files.txt'): 'NoneType' object has no attribute 'expanduser'