Bo
@BoPeng
@N7DR Let us see, with your script
[A]
input:
output: f'constant-file-name.txt'
parameter: n_mx = int
bash: expand=True

  echo {n_mx} > constant-file-name.txt

[B]
depends: sos_step('A')
input: f'constant-file-name.txt'
output:
parameter: n_mx = int
bash: expand=True

  echo cat {_input}
The first time it seems to be ok
 ✗ sos run test B  --n-mx 60
INFO: Running A:
INFO: A is completed.
INFO: A output:   constant-file-name.txt
INFO: Running B:
cat constant-file-name.txt
INFO: B is completed.
INFO: Workflow B (ID=36b8cf2aec411212) is executed successfully with 2 completed steps.
Second time with the same parameter, ok
✗ sos run test B  --n-mx 60
INFO: Running A:
INFO: A (index=0) is ignored due to saved signature
INFO: A output:   constant-file-name.txt
INFO: Running B:
INFO: B (index=0) is ignored due to saved signature
INFO: Workflow B (ID=36b8cf2aec411212) is ignored with 2 ignored steps.
With a different parameter, there seems to be a bug
✗ sos run test B  --n-mx 90
INFO: Running A:
INFO: A (index=0) is ignored due to saved signature
INFO: A output:   constant-file-name.txt
INFO: Running B:
INFO: B (index=0) is ignored due to saved signature
INFO: Workflow B (ID=0403969ddecaa791) is ignored with 2 ignored steps.
which you can work around for now with the option -s force (-s sets the signature mode)
✗ sos run test B  --n-mx 90 -s force
INFO: Running A:
INFO: A is completed.
INFO: A output:   constant-file-name.txt
INFO: Running B:
cat constant-file-name.txt
INFO: B is completed.
INFO: Workflow B (ID=0403969ddecaa791) is executed successfully with 2 completed steps.
Bo
@BoPeng

The case that has been implemented and tested is for parameter: to be defined in the global section,

[global]
parameter: n_mx=int

[A]
input:
output: f'constant-file-name.txt'
bash: expand=True

  echo {n_mx} > constant-file-name.txt

[B]
depends: sos_step('A')
input: f'constant-file-name.txt'
output:
bash: expand=True

  echo cat {_input}

and the signatures are correctly handled

(sos) ➜  demo git:(master) ✗ sos run test B  --n-mx 90 -s force
INFO: Running A:
INFO: A is completed.
INFO: A output:   constant-file-name.txt
INFO: Running B:
cat constant-file-name.txt
INFO: B is completed.
INFO: Workflow B (ID=d91e38138c007847) is executed successfully with 2 completed steps.
(sos) ➜  demo git:(master) ✗ sos run test B  --n-mx 60 -s force
INFO: Running A:
INFO: A is completed.
INFO: A output:   constant-file-name.txt
INFO: Running B:
cat constant-file-name.txt
INFO: B is completed.
INFO: Workflow B (ID=efc47f8da932e0f3) is executed successfully with 2 completed steps.
Bo
@BoPeng
This is now #1372
Bo
@BoPeng
@N7DR #1372 is fixed and I have released sos 0.21.7 for it.
D. R. Evans
@N7DR
When I get a moment I'll figure out how to update sos (easy, I expect, but I haven't done it before); sounds like you've done exactly what's needed, though. Thanks very much.
Bo
@BoPeng
pip install sos -U should do the trick.
D. R. Evans
@N7DR
Thank you! So helpful, sir.
D. R. Evans
@N7DR

I am running afoul of the new warning: WARNING: Embedding script "..." without indentation is error-prone and will be deprecated in the future.

How do I reformat the following step so as not to receive the warning?

[plot_classical_phase_at_same_height]
depends: sos_step('calculate_classical_phase_at_same_height')
input: f'phase-0.txt'
output: f'phase-0.png', f'phase-0.gplt', f'phase-0-diff.png'
bash: expand=True

cat << 'EOF' > phase-0.gplt
set terminal png
set output "phase-0.png"

set key off

#set xrange [180:0]

set title "Direct/Reflected Phase Difference"

set xlabel "log10(d(λ))"
set ylabel "Phase Difference(°)"

#set errorbars small

plot "phase-0.txt" using (log10($6)):14 with lines

set output "phase-0-diff.png"
set ylabel "abs(Phase Difference - 180°)"

plot "phase-0.txt" using (log10($6)):(abs($14-180)) with lines

EOF

gnuplot phase-0.gplt

[I note in passing that I would use the gnuplot kernel directly, except: (1) I don't think SoS supports that kernel; (2) there is a bug somewhere that causes the gnuplot kernel, at least on Debian stable, not to handle Unicode correctly: https://github.com/has2k1/gnuplot_kernel/issues/21. In the absence of the gnuplot kernel, I think I have to do something like the above.]

Bo
@BoPeng
Adding indentation will suppress the warning. Depending on the editor you use, this can be trivial or quite troublesome.
The warning was added due to vatlab/sos#1363
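For reference, the step above with the script body indented might look like the sketch below; only the leading whitespace changes, which SoS should strip before the script runs, so the here-document still terminates correctly (the gnuplot commands are abbreviated here):

[plot_classical_phase_at_same_height]
depends: sos_step('calculate_classical_phase_at_same_height')
input: f'phase-0.txt'
output: f'phase-0.png', f'phase-0.gplt', f'phase-0-diff.png'
bash: expand=True
    cat << 'EOF' > phase-0.gplt
    set terminal png
    set output "phase-0.png"
    set key off
    set title "Direct/Reflected Phase Difference"
    set xlabel "log10(d(λ))"
    set ylabel "Phase Difference(°)"
    plot "phase-0.txt" using (log10($6)):14 with lines
    set output "phase-0-diff.png"
    set ylabel "abs(Phase Difference - 180°)"
    plot "phase-0.txt" using (log10($6)):(abs($14-180)) with lines
    EOF
    gnuplot phase-0.gplt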
Bo
@BoPeng
SoS supports all kernels, just that there is no variable exchange for kernels of unsupported languages. https://vatlab.github.io/sos-docs/doc/user_guide/expand_capture_render.html
D. R. Evans
@N7DR
I really don't understand this whole indentation issue, even though you've tried to explain it :-( The cognitive dissonance I experience when having to indent shell code is not at all trivial :-( Isn't there some other way to write bash code so that it doesn't have to be indented... it feels like going back to FORTRAN IV :-) How about allowing something like:
D. R. Evans
@N7DR

Stupid thing was in chat mode... I hate the way it does that, basically requiring one to remember to check which mode it's in... anyway, as I was saying, how about allowing something like:

bash: expand=True, end='FRED'

cat << 'EOF' > phase-0.gplt
set terminal png
set output "phase-0.png"

set key off

#set xrange [180:0] etc. etc.

plot "phase-0.txt" using (log10($6)):(abs($14-180)) with lines

EOF
gnuplot phase-0.gplt
FRED

so that one can write the bash script without indentation up until one hits the string 'FRED' on a line by itself. ['FRED', of course, could be any string one likes, defined by the parameter to "end=" on the "bash:" line.]

D. R. Evans
@N7DR

SoS supports all kernels, just that there is no variable exchange for kernels of unsupported languages

I had tried the following:

[C]
input:
output:
parameter:
gnuplot:

plot "phase-0.txt" using (log10($6)):14 with lines

But running the step produced: NameError: name 'gnuplot' is not defined, so I thought that that meant that I couldn't use the gnuplot kernel. What do I need to change in order for it to work? (Although it still wouldn't be usable for many of my plots anyway, because of the Unicode problem.)

Bo
@BoPeng
You are using the sos kernel, not a gnuplot kernel, if there is such a thing. The sh: stuff is a sos function/action. Since sos does not provide a gnuplot one, you can use
script:
Bo
@BoPeng
There is actually a gnuplot kernel. https://github.com/has2k1/gnuplot_kernel If you use it, you can use SoS Notebook, and run the script directly in the kernel.

If you are using a SoS workflow, in a SoS cell, a gnuplot: action is not defined, but you can try

run:
    #!/bin/env gnuplot
    script...

which will use the gnuplot command to run the script. Or you can do

script: args='gnuplot {filename}'
    script ...

to specify the interpreter. see https://vatlab.github.io/sos-docs/doc/user_guide/sos_actions.html#Option--args for details.
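As a rough sketch of that second form, adapted to the phase-0 example above (the step name and file names are just placeholders, and gnuplot is assumed to be on the PATH of the executing host):

[plot_phase]
input: 'phase-0.txt'
output: 'phase-0.png'
script: args='gnuplot {filename}'
    set terminal png
    set output "phase-0.png"
    plot "phase-0.txt" using (log10($6)):14 with lines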

D. R. Evans
@N7DR

There is actually a gnuplot kernel. https://github.com/has2k1/gnuplot_kernel

Yes, that's part of what I was trying to get across :-)

I'll try your first suggestion above; that seems cleaner in the absence of explicit support for the gnuplot kernel.

Bo
@BoPeng
I saw a message that was later deleted. I think https://vatlab.github.io/sos-docs/doc/user_guide/sos_actions.html#Option-template-and-template_name was what was asked.
Patrick Cudahy
@pgcudahy

Hello, I'm trying to set up a remote host for my university cluster but am having trouble getting started.
My hosts.yml is

localhost: macbook
hosts:
  yale_farnam:
    address: farnam.hpc.yale.edu
    paths:
      home: /home/pgc29/scratch60
  macbook:
    address: 127.0.0.1
    paths:
      home: /Users/pgcudahy

But when I run something simple like

%run -r yale_farnam -c ~/.sos/hosts.yml
sh:
    echo Working on `pwd` of $HOSTNAME

I get ERROR: Failed to connect to yale_farnam: pgcudahy@farnam.hpc.yale.edu: Permission denied (publickey).

I also tried sos remote setup but get the error INFO: scp -P 22 /Users/pgcudahy/.ssh/id_rsa.pub farnam.hpc.yale.edu:id_rsa.pub.yale_farnam ERROR: Failed to copy public key to farnam.hpc.yale.edu

Perhaps the issue is that my usernames for my computer and the cluster are different. A simple ssh pgc29@farnam.hpc.yale.edu works for me, but ssh farnam.hpc.yale.edu does not. The scp command referenced in the error message doesn't prepend a username to the cluster's domain name. Any help on how to move forward would be great. Thanks

Bo
@BoPeng
Could you try to change address: farnam.hpc.yale.edu to address: pgc29@farnam.hpc.yale.edu
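That is, the yale_farnam entry would become something like:

hosts:
  yale_farnam:
    address: pgc29@farnam.hpc.yale.edu
    paths:
      home: /home/pgc29/scratch60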
Patrick Cudahy
@pgcudahy
Ah, thanks. Of course the cluster just went down for maintenance until Wednesday! I'll try it as soon as it's back up.
Bo
@BoPeng
Since it is a cluster, you will have to install sos on it, set sos: /path/to/sos in the config file (to avoid changing $PATH on the server), and then add a template to submit jobs. I usually have two hosts, one for the headnode and one for the cluster, in case I want to run something on the headnode. Let me know if you encounter any problems.
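A sketch of that two-host layout might look like the following; the host names and the sos path are only placeholders, and the cluster entry would also carry the queue definition (submit_cmd, task_template and so on):

hosts:
  yale_headnode:
    address: pgc29@farnam.hpc.yale.edu
    sos: /home/pgc29/.local/bin/sos   # so $PATH on the server need not change
    paths:
      home: /home/pgc29/scratch60
  yale_cluster:
    address: pgc29@farnam.hpc.yale.edu
    sos: /home/pgc29/.local/bin/sos
    queue_type: pbs                   # plus submit_cmd, status_cmd, task_template, ...
    paths:
      home: /home/pgc29/scratch60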
Bo
@BoPeng
BTW -c ~/.sos/hosts.yml is not needed. That file is always read.
Patrick Cudahy
@pgcudahy
Adding my username to the address worked well, thanks! My cluster recommends installing pip modules like sos with anaconda, but I couldn't figure out how to get my local notebook to access sos in the remote conda environment. Instead I was able to install sos with pip install --user sos and then change my PYTHONPATH in ~/.bash_profile
Patrick Cudahy
@pgcudahy

Okay, so now that I can directly run remote jobs, I'm trying to get SLURM set up. The cluster documentation has this example task that I want to replicate

#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --out="slurm-%j.out"
#SBATCH --time=01:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=2
#SBATCH --mem-per-cpu=5G
#SBATCH --mail-type=ALL

mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
mem_gbytes=$(( $mem_bytes / 1024 **3 ))

echo "Starting at $(date)"
echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

My updated hosts.yml is now

%save ~/.sos/hosts.yml -f

localhost: ubuntu
hosts:
  ubuntu:
    address: 127.0.0.1
    paths:
      home: /data2
  yale_farnam:
    address: pgc29@farnam1.hpc.yale.edu
    paths:
      home: /home/pgc29/scratch60/
  yale_hpc_slurm:
    address: pgc29@farnam.hpc.yale.edu
    paths:
      home: /home/pgc29/scratch60/
    queue_type: pbs
    submit_cmd: sbatch {job_file}
    submit_cmd_output: "Submitted batch job {job_id}"
    status_cmd: squeue --job {job_id}
    kill_cmd: scancel {job_id}
    status_check_interval: 120
    max_running_jobs: 100
    max_cores: 200 
    max_walltime: "72:00:00"
    max_mem: 1280G
    task_template: |
        #!/bin/bash
        #SBATCH --time={walltime}
        #SBATCH --nodes=1
        #SBATCH --ntasks-per-node={cores}
        #SBATCH --job-name={task}
        #SBATCH --output=/home/{user_name}/.sos/tasks/{task}.out
        #SBATCH --error=/home/{user_name}/.sos/tasks/{task}.err
        cd {workdir}
        {command}

But when I run

%run -q yale_hpc_slurm 

bash:
    mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
    mem_gbytes=$(( $mem_bytes / 1024 **3 ))

    echo "Starting at $(date)"
    echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
    echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
    echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

It tries to run on my local machine and returns

INFO: Running default:

/tmp/tmp9o84b7hs.sh: line 1: /sys/fs/cgroup/memory/slurm/uid_/job_/memory.limit_in_bytes: No such file or directory
/tmp/tmp9o84b7hs.sh: line 2: / 1024 **3 : syntax error: operand expected (error token is "/ 1024 **3 ")
Starting at Wed Oct  7 04:14:57 EDT 2020
Job submitted to the  partition, the default partition on 
Job name: , Job ID: 
  I have  CPUs and GiB of RAM on compute node ubuntu

INFO: Workflow default (ID=187c6b10c86049c7) is executed successfully with 1 completed step.

I tried

%run -r yale_hpc_slurm

bash:
    mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
    mem_gbytes=$(( $mem_bytes / 1024 **3 ))

    echo "Starting at $(date)"
    echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
    echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
    echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

and got

ERROR: No workflow engine or invalid engine definition defined for host yale_hpc_slurm

Workflow exited with code 1

Where have I messed up my template? Thanks

Patrick Cudahy
@pgcudahy

Okay, got it to work with

%run -q 'yale_hpc_slurm'

task: walltime='00:05:00'
bash:
    mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
    mem_gbytes=$(( $mem_bytes / 1024 **3 ))

    echo "Starting at $(date)"
    echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
    echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
    echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

Not sure why I had to quote the remote host name. Also it does not produce a .out file in ~/.sos/tasks/ on the remote computer or my local computer, just job_id, pulse, task and sh files.

Bo
@BoPeng
OK, first, if you add sos: /path/to/sos, you do not need to set up PATH or PYTHONPATH on the cluster. In our experience it is better to leave $PATH on the cluster alone, because your job might be using a different Python than the one sos uses.
Second, task: is needed to define a portion of the step as an external task; using only bash: would not work. Basically the template executes the task with sos execute, which can contain bash, python, R and any other scripts...
Third, the "Starting at $(date)" stuff usually belongs in the template, as your notebook should focus on the "real" work, not anything directly related to the cluster. There is a problem with interpolation of ${ }, since sos expands { } in templates, so you will have to write ${{ }} to avoid that.
mem_bytes is read from your job file. Actually, sos provides a variable mem which is exactly that.
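For example (a sketch using only the template fields already shown above), a task_template line that needs a literal shell variable doubles the braces, while {walltime}, {task} and friends are filled in by SoS:

task_template: |
    #!/bin/bash
    #SBATCH --time={walltime}
    #SBATCH --job-name={task}
    cd {workdir}
    echo "Job ${{SLURM_JOB_ID}} started at $(date)"
    {command}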
Bo
@BoPeng
Finally, I do not see where you specify mem in your template; is it needed at all for your system?

Also, at least on our cluster, running stuff in $HOME is not recommended so I have things like

    cluster:
        paths:
            scratch: /path/to/scratch/

and

task: workdir='#scratch/project/etc'

to run the tasks under scratch directory.
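So the task line from the example above might become something like this (the project directory is purely an illustration):

task: walltime='00:05:00', mem='1G', workdir='#scratch/my_project'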

Patrick Cudahy
@pgcudahy

Thanks, I guess my main question right now is when I run the example task

%run -q 'yale_hpc_slurm'

task: walltime='00:05:00', mem='1G'
bash:
    mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
    mem_gbytes=$(( $mem_bytes / 1024 **3 ))

    echo "Starting at $(date)"
    echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
    echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
    echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

The notebook reports

INFO: cdb813b50789fb95 submitted to yale_hpc_slurm with job id 31774151
INFO: Waiting for the completion of 1 task.
INFO: Workflow default (ID=187c6b10c86049c7) is executed successfully with 1 completed step and 1 completed task.

But there are no .out or .err files in ~/.sos/tasks

Just a .task and a .pulse
Bo
@BoPeng
Hmm, on the cluster, if you run sos status cdb813b50789fb95 -v4, what do you see? sos "absorbs" these files into the task file to keep the number of files low.
Patrick Cudahy
@pgcudahy
Ah, there it is. It says
cdb813b50789fb95        completed

Created 9 hr ago
Started 5 min ago
Signature checked
TASK:
=====
bash('mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)\nmem_gbytes=$(( $mem_bytes / 1024 **3 ))\n\necho "Starting at $(date)"\necho "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"\necho "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"\necho "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"\n')

TAGS:
=====
187c6b10c86049c7 default notebooks

GLOBAL:
=======
(<_ast.Module object at 0x7f6b8f19bb50>, {})

ENVIRONMENT:
============
__signature_vars__    {'bash'}
_depends              []
_index                0
_input                []
_output               Unspecified
_runtime              {'queue': 'yale_hpc_slurm',
 'run_mode': 'interactive',
 'sig_mode': 'default',
 'verbosity': 2,
 'walltime': '00:05:00',
 'workdir': path('/data2/helen_mixed_infection/notebooks')}
step_name             'default'
workflow_id           '187c6b10c86049c7'

EXECUTION STATS:
================
Duration:       0s
Peak CPU:       0.0 %
Peak mem:       36.7 MiB

execution script:
================
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=2
#SBATCH --mem-per-cpu=5G
#SBATCH --job-name=cdb813b50789fb95
#SBATCH --output=/home/pgc29/.sos/tasks/cdb813b50789fb95.out
#SBATCH --error=/home/pgc29/.sos/tasks/cdb813b50789fb95.err
cd /data2/helen_mixed_infection/notebooks
sos execute cdb813b50789fb95 -v 2 -s default -m interactive


standard output:
================
Starting at Wed Oct  7 05:49:27 EDT 2020
Job submitted to the general partition, the default partition on farnam
Job name: cdb813b50789fb95, Job ID: 31767135
  I have 2 CPUs and 10GiB of RAM on compute node c23n12.farnam.hpc.yale.internal


standard error:
================
/var/spool/slurmd/job31767135/slurm_script: line 8: cd: /data2/helen_mixed_infection/notebooks: No such file or directory
INFO: cdb813b50789fb95 started
So if I want to figure out why a job failed, I run sos status?
Bo
@BoPeng
Yes. The error messages are absorbed, so sos status -v4 is currently the only way to go. From the notebook, you can run %task status jobid -q queue with the same effect.
Actually, if you hover the mouse over a task, there is a little icon that submits the %task status magic for you. However, because the %run -q magic is currently blocking, that magic would not run until after the end of %run.
I have been thinking of making %run non-blocking after the tasks are all submitted.
Patrick Cudahy
@pgcudahy
That would be nice
Bo
@BoPeng
I am using this mechanism heavily these days because it makes running "small" scripts very easy. Debugging failed jobs is not particularly easy, and I have found myself logging into the cluster to run sos status (because %task is blocked)... definitely something that needs to be improved. Please feel free to submit tickets for problems and feature requests so that we know what the "pain points" are.