Joe Hamman
@jhamman
This is on my list of things to deal with today.
Matthew Rocklin
@mrocklin
@jhamman I'm around. I've submitted a job and am going to wait a while to see if it clears
Joe Hamman
@jhamman
Cheyenne is down right now, so you’ll need to be on Casper.
Matthew Rocklin
@mrocklin
Well that's good to know :)
Joe Hamman
@jhamman
do we have a working SLURM cluster right now?
https://jupyterhub.ucar.edu/dav will put you on Casper (which uses SLURM)
Matthew Rocklin
@mrocklin
I'm ssh'ing in
I've just discovered that we use SLURM rather than PBS
sbatch: error: You must specify an account (--account)
In [4]: print(cluster.job_script())
#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60G
#SBATCH -t 00:30:00
#SBATCH -C skylake
JOB_ID=${SLURM_JOB_ID%;*}



/glade/u/home/mrocklin/miniconda/envs/dev/bin/python -m distributed.cli.dask_worker tcp://10.12.203.5:39794 --nthreads 16 --memory-limit 64.00GB --name dask-worker--${JOB_ID}-- --death-timeout 60 --interface ib0


In [5]: import dask

In [6]: dask.config.get("jobqueue.slurm")
Out[6]:
{'name': 'dask-worker',
 'cores': 1,
 'memory': '25 GB',
 'processes': 1,
 'interface': 'ib0',
 'walltime': '00:30:00',
 'job-extra': {'-C skylake': None},
 'death-timeout': 60,
 'local-directory': None,
 'shebang': '#!/usr/bin/env bash',
 'queue': None,
 'project': None,
 'extra': ['--interface', 'ib0'],
 'env-extra': [],
 'job-cpu': None,
 'job-mem': None,
 'log-directory': None}
Matthew Rocklin
@mrocklin
I'm good
I had to copy over my project from my PBS_ACCOUNT environment variable
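A rough sketch of that workaround with dask-jobqueue (assuming the account code lives in the PBS_ACCOUNT environment variable mentioned above, and using the project key shown in the printed config, which maps to "#SBATCH -A"):

import os
from dask_jobqueue import SLURMCluster

# Pass the SLURM account explicitly instead of relying on the site config;
# other settings (cores, memory, walltime) still come from jobqueue.slurm.
cluster = SLURMCluster(project=os.environ["PBS_ACCOUNT"])
cluster.scale(4)  # request four worker jobs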
Joe Hamman
@jhamman
sounds good. Enjoy.
Matthew Rocklin
@mrocklin
It looks like we're storing some config here:
/glade/u/apps/config/dask
Is this global?
Matthew Rocklin
@mrocklin
I'm not certain that that config is optimal
Joe Hamman
@jhamman
@mrocklin - yes, that is the baseline config we have but we can ask for specific edits.
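For anyone wondering how the site-wide file interacts with a personal one, a small sketch (not specific to the NCAR setup) of how to inspect what dask has loaded:

import dask
import dask_jobqueue  # importing this registers the jobqueue.* defaults

print(dask.config.paths)                  # directories dask searches for YAML config
print(dask.config.get("jobqueue.slurm"))  # merged result after all files are read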
Matthew Rocklin
@mrocklin
It's cool that they've added a baseline config
OK, I'm all set. Thanks for your help @jhamman !
Joe Hamman
@jhamman
do you have some specific suggestions on edits to that config?
Matthew Rocklin
@mrocklin
In the future we should extend dask-jobqueue to respect environment variables, and add project: $PBS_ACCOUNT
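Until something like that lands, one possible workaround (a sketch, not an agreed-upon change) is to push the value into the config from Python before building the cluster:

import os
import dask
import dask_jobqueue  # registers the jobqueue.* config keys

# Copy the PBS-style account variable into the jobqueue config so that
# SLURMCluster() picks it up without an explicit keyword argument.
dask.config.set({"jobqueue.slurm.project": os.environ.get("PBS_ACCOUNT")})
cluster = dask_jobqueue.SLURMCluster()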
Joe Hamman
@jhamman
+1
Matthew Rocklin
@mrocklin
I did have suggestions, but then I realized that I was mixing up two config files
Joe Hamman
@jhamman
Great!
Pier
@PhenoloBoy
@jhamman I have to customize the installation with GCSFS/FUSE as described at https://gcsfs.readthedocs.io/en/latest/fuse.html, and it seems that the only way (as far as I understand) is to create a clone of the repository and follow the instructions here: https://github.com/pangeo-data/pangeo-cloud-federation. If you have any other idea, I'm more than open to it; it would save me a huge headache.
karan bhatia
@lila
Looking forward to attending the Pangeo community meeting next week http://pangeo.io/meetings/2019_summer-meeting.html ... Thank you all for organizing, and in particular for providing remote access for those not able to attend in person (but I will be there in person)...
Joe Hamman
@jhamman
Hi Karan, we’re looking forward to having you.
Tina Odaka
@tinaok
Hi, I am in France and cannot attend the meeting in person, but I would like to attend remotely. I saw that you plan to provide remote access; do I need to register in advance for remote attendance? Thank you for your help!
Hi Kevin, thanks for the merge. I have a question:

in compute_study.md, it is indicated that
'Duplicate each study for 2, 4, 8, and 16 workers per node (reducing chunk size proportionally)'

But I do not recall seeing this reduction of chunk size for each increase in workers per node in utils.py.

Am I missing something?
Kevin Paul
@kmpaul
@tinaok I believe you are correct; utils.py does not account for a reduction in chunk size as you increase the workers per node.
Let me look more closely...
@tinaok Also, if you are planning on attending the Pangeo community meeting remotely, you might be able to present some of your benchmarking work in a Lightning Talk. It would be a very short (and remote) presentation, but it might be possible. @jhamman would know for sure.
Kevin Paul
@kmpaul
@tinaok Ok. I've looked more closely at the benchmarking code. (@andersy005 may have something more to say about this, but I'll take a stab myself.) The design of the current code assumes 1 chunk per worker (see benchmarks/datasets.py), and it therefore assumes that the total dataset size will be equal to the (chunk size) * (number of nodes) * (number of workers per node).
This may not be optimal, but I think this can be amended in later versions.
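As a toy illustration of that sizing rule (all numbers made up, not taken from the benchmark suite):

# One chunk per worker, so the total dataset size follows directly from the
# chunk size and the worker layout.
chunk_size_gb = 0.25
nodes = 4
workers_per_node = 8

total_size_gb = chunk_size_gb * nodes * workers_per_node
print(total_size_gb)  # 8.0 GB for this hypothetical configuration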
Kevin Paul
@kmpaul
The thought behind the compute_study.md writeup was to sketch out what would be needed to generate some preliminary scaling studies for various "common" operations done with xarray and dask. The results of each study should be a plot of "Number of Nodes" vs "Operation Runtime". However, the "Operation Runtime" depends on much more than "Number of Nodes", including "Number of Workers per Node", "Number of Threads per Worker", "Total Number of Chunks", "Chunk Size", etc.
I wanted to consider 2 kinds of studies, strong and weak, since these are considered "canonical" in the HPC world. In the strong scaling studies, the "Total Data Size" should be fixed while the "Number of Nodes" is varied. In the weak scaling studies, the "Data Size per Node" should be fixed while the "Number of Nodes" is varied.
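A small sketch of how the two study types behave under the one-chunk-per-worker assumption (all numbers hypothetical, not from the actual studies):

workers_per_node = 8

def strong_scaling_chunk_gb(total_gb, nodes):
    # Total data size is fixed, so chunks shrink as nodes are added.
    return total_gb / (nodes * workers_per_node)

def weak_scaling_chunk_gb(per_node_gb, nodes):
    # Data size per node is fixed, so the chunk size stays constant
    # and the total grows with the number of nodes.
    return per_node_gb / workers_per_node

for nodes in (1, 2, 4, 8):
    print(nodes, strong_scaling_chunk_gb(64, nodes), weak_scaling_chunk_gb(8, nodes))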
Kevin Paul
@kmpaul
When I was coming up with the compute_study.md document, I tried to find a way to fix all of the other parameters in a way such that each study was "fair." I chose 1 "Chunk per Worker" and 1 "Thread per Worker", and I chose to vary the "Chunk Size" and "Number of Workers per Node". And later we tried to come up with a way of varying the "Chunking Scheme" (i.e., chunk over all dimensions, chunk over only spatial dimensions, chunk over only time), too. But we need to generate data that looks at how these numbers vary with "Chunks per Worker" and "Threads per Worker", too.
Joe Hamman
@jhamman
@/all - the agenda and attendee list for next week’s community meeting in Seattle are now final. See details here: http://pangeo.io/meetings/2019_summer-meeting.html#
Remote participation details are also available on this page. @kmpaul - we’ll probably need to wait and see if remote lightning talks will work. We may need a proxy presenter.
Daniel Rothenberg
@darothen
Such an awesome agenda... very much looking forward to participating remotely as much as possible!
Kevin Paul
@kmpaul
@jhamman Thanks for the info regarding remote lightning talks.
@tinaok If you want to present something, and you need a proxy presenter, I'll do it for you.
Scott Henderson
@scottyhq
thought folks on this channel might be interested in this job opening at JPL https://jpl.jobs/jobs/2019-10892-Big-Data-Software-Lead
Scott
@scollis
Wow... those are quite the requirements ;)
James A. Bednar
@jbednar
Good luck!
David Brochart
@davidbrochart
Is anyone else seeing the "Error displaying widget" message with, e.g., the dask_kubernetes.KubeCluster widget (or any other widget)? It looks like this is related to ipywidgets==7.5. I have a Pangeo environment with jupyterlab=0.35, tornado=5.1.1, and dask_labextension==0.3.3, because I noticed that this was a working configuration at some point, but I'm not sure it is still the recommended one.
Matthew Rocklin
@mrocklin

I get into Seattle a bit early (leaving tonight). I plan to mostly work on HPC deployments M/Tu if anyone wants to join.

Also, do folks want to meet up for drinks Tuesday night? I imagine that people will be arriving then.

Ryan Abernathey
@rabernat
@mrocklin - I would enjoy meeting up post-dinner on Tuesday night, preferably in the Eastlake or Capitol Hill area. @jhamman will also be around.
Ryan Abernathey
@rabernat
Question for the Kubernetes folks: where is the log that tells me who has been logging into the JupyterHubs?