Scott
@scollis
so the base Docker image.. does that have the dask and jupyterlab setups? so no need to include those anymore in the binder dir environment.yml?
i.e. the environment.yml should now only have domain-specific packages?
Scott Henderson
@scottyhq
that’s the idea, yes. Because we’re using repo2docker, you end up layering packages that are defined in the following places, in order:
1) https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/conda/environment.yml
2) https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/environment.yml
3) your own environment.yml
if you want different versions of these base packages, including those versions in your own environment.yml should override them
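For illustration, a minimal domain-only binder/environment.yml might look like the sketch below; the package names and pins are hypothetical examples, not taken from this chat:

# binder/environment.yml -- only domain-specific packages;
# dask, jupyterlab, etc. come in from the base-image layers above
name: binder              # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - arm_pyart             # hypothetical domain package
  - xarray=0.12.3         # hypothetical pin; overrides the base image's version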
Scott
@scollis
Great
Scott Henderson
@scottyhq
https://github.com/scottyhq/pangeo-binder-test/tree/esip-tech-dive is a working current example (although I don’t automatically run the notebook to check for errors)
in the binder folder, this line in the Dockerfile is key: FROM pangeo/base-notebook-onbuild:2019.08.08. It fetches the dated image from https://github.com/pangeo-data/pangeo-stacks and then runs your environment.yml, postBuild, etc. on top
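A sketch of the corresponding binder/Dockerfile; only the FROM line is quoted from the chat, and the comments are assumptions based on the onbuild behavior described above:

# binder/Dockerfile
# the -onbuild image automatically copies this repo's binder/ contents
# and applies environment.yml, postBuild, etc. on top of the dated base image
FROM pangeo/base-notebook-onbuild:2019.08.08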
Joe Hamman
@jhamman
@scottyhq - I think we’re good with pangeo-data/pangeo-cloud-federation#362. Please take a look around staging.hub.pangeo.io to see if you spot anything out of place.
Scott
@scollis
Thanks @scottyhq .. trying now. This is very nice
Scott
@scollis
@scottyhq I assume it is the same for postBuild and other stuff in /binder/*?
Scott Henderson
@scottyhq
that’s right, except unfortunately I don’t think we enabled the ‘start’ script.
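For reference, binder/postBuild is just an executable script that repo2docker runs once at build time; this sketch assumes a typical JupyterLab-era use, and the extension name is an illustrative example rather than something specified in this chat:

#!/bin/bash
# binder/postBuild -- runs once after the conda environment is built
set -e
jupyter labextension install @jupyter-widgets/jupyterlab-manager  # hypothetical example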
Scott
@scollis
@scottyhq up and running nicely.. some issues with the dask integration into the jupyterlab panel space, but stuff I can debug and iron out..
Radar Pangeo here we come!
Scott Henderson
@scottyhq
awesome! great news
Joe Hamman
@jhamman
@scottyhq - ideas on this error/warning from conda:
The following packages are causing the inconsistency:

  - conda-forge/linux-64::notebook==6.0.0=py37_0
  - conda-forge/linux-64::ipyleaflet==0.11.1=py37_1
  - conda-forge/noarch::geoviews==1.6.2=py_0
  - conda-forge/linux-64::widgetsnbextension==3.5.1=py37_0
  - conda-forge/noarch::ipywidgets==7.5.1=py_0
  - conda-forge/noarch::jupyter==1.0.0=py_2
  - conda-forge/noarch::nbconvert==5.5.0=py_0
  - conda-forge/noarch::hvplot==0.4.0=py_1
  - conda-forge/noarch::geoviews-core==1.6.2=py_0
  - conda-forge/noarch::nbgitpuller==0.7.0=py_0
  - conda-forge/linux-64::jupyterlab==1.0.4=py37_0
  - conda-forge/noarch::jupyter-server-proxy==1.1.0=py_0
  - conda-forge/noarch::jupyterlab_server==1.0.0=py_1
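One possible way to attack an inconsistency like this (an assumption on my part; the chat does not record a fix) is to pin the flagged packages explicitly in the user environment.yml, using the override layering described earlier, so the solver re-resolves them together:

# hypothetical pins added to binder/environment.yml
dependencies:
  - jupyterlab=1.0.4
  - notebook=6.0.0
  - ipywidgets=7.5.1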
Matthew Rocklin
@mrocklin
@jhamman ah, sorry, I was on a flight
I'm also currently away from a full-sized USB port, maybe tomorrow though?
Joe Hamman
@jhamman
Yeah, sounds good. You’ll find me here tomorrow as well. :airplane:
Pier
@PhenoloBoy
@jhamman are you the person I need to contact to get the git-crypt symmetric key, as described here? https://github.com/pangeo-data/pangeo-cloud-federation
Scott
@scollis
@jhamman is there a publication I can cite for Pangeo?
Joe Hamman
@jhamman
@scollis - nothing comprehensive just yet. However, we did publish the NSF proposal (https://figshare.com/articles/Pangeo_NSF_Earthcube_Proposal/5361094) and @guillaumeeb had this conference paper published:
Eynard-Bontemps, G., R Abernathey, J. Hamman, A. Ponte, W. Rath, 2019: The Pangeo Big Data Ecosystem and its use at CNES. In P. Soille, S. Loekken, and S. Albani, Proc. of the 2019 conference on Big Data from Space (BiDS’2019), 49-52. EUR 29660 EN, Publications Office of the European Union, Luxembourg. ISBN: 978-92-76-00034-1, doi:10.2760/848593.
@PhenoloBoy - we’re trying not to share the git-crypt key unless absolutely necessary. Can I ask what you are up to?
Scott
@scollis
Perfect, thanks… This would be a great fit for something like AMS BAMS, or even a glossy like Science..
Joe Hamman
@jhamman
We’re…trying...
We actually have something in review with Nature but things seem to have stalled out in a big way.
This is on my list of things to deal with today.
Matthew Rocklin
@mrocklin
@jhamman I'm around. I've submitted a job and am going to wait a while to see if it clears
Joe Hamman
@jhamman
cheyenne is down right now so you’ll need to be on casper.
Matthew Rocklin
@mrocklin
Well that's good to know :)
Joe Hamman
@jhamman
do we have a working slurm cluster right now?
https://jupyterhub.ucar.edu/dav will put you on casper (which uses slurm)
Matthew Rocklin
@mrocklin
I'm ssh'ing in
I've just discovered that we use SLURM rather than PBS
sbatch: error: You must specify an account (--account)
In [4]: print(cluster.job_script())
#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60G
#SBATCH -t 00:30:00
#SBATCH -C skylake
JOB_ID=${SLURM_JOB_ID%;*}



/glade/u/home/mrocklin/miniconda/envs/dev/bin/python -m distributed.cli.dask_worker tcp://10.12.203.5:39794 --nthreads 16 --memory-limit 64.00GB --name dask-worker--${JOB_ID}-- --death-timeout 60 --interface ib0


In [5]: import dask

In [6]: dask.config.get("jobqueue.slurm")
Out[6]:
{'name': 'dask-worker',
 'cores': 1,
 'memory': '25 GB',
 'processes': 1,
 'interface': 'ib0',
 'walltime': '00:30:00',
 'job-extra': {'-C skylake': None},
 'death-timeout': 60,
 'local-directory': None,
 'shebang': '#!/usr/bin/env bash',
 'queue': None,
 'project': None,
 'extra': ['--interface', 'ib0'],
 'env-extra': [],
 'job-cpu': None,
 'job-mem': None,
 'log-directory': None}
Matthew Rocklin
@mrocklin
I'm good
I had to copy over my project from my PBS_ACCOUNT environment variable
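Sketched as code, that workaround might look like the following; the kwargs echo the jobqueue.slurm config printed above, and project= was the dask-jobqueue spelling of the SLURM account option at the time:

import os
from dask_jobqueue import SLURMCluster

# pass the SLURM account explicitly, copied from $PBS_ACCOUNT as described
cluster = SLURMCluster(
    cores=1,
    memory="25 GB",
    interface="ib0",
    walltime="00:30:00",
    project=os.environ["PBS_ACCOUNT"],  # assumption: account taken from the env
)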
Joe Hamman
@jhamman
sounds good. Enjoy.
Matthew Rocklin
@mrocklin
It looks like we're storing some config here:
/glade/u/apps/config/dask
Is this global?
Matthew Rocklin
@mrocklin
I'm not certain that that config is optimal
Joe Hamman
@jhamman
@mrocklin - yes, that is the baseline config we have but we can ask for specific edits.
Matthew Rocklin
@mrocklin
It's cool that they've added a baseline config
OK, I'm all set. Thanks for your help @jhamman !
Joe Hamman
@jhamman
do you have some specific suggestions on edits to that config?
Matthew Rocklin
@mrocklin
In the future we should extend dask-jobqueue to respect environment variables, and add project: $PBS_ACCOUNT
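For what it's worth, dask's config module already ships an expand_environment_variables helper, so a sketch of the proposed behavior could look like this; wiring it into jobqueue's own config loading is the extension being suggested:

import os
import dask
from dask.config import expand_environment_variables

os.environ.setdefault("PBS_ACCOUNT", "ABC123")             # hypothetical account id
raw = {"jobqueue": {"slurm": {"project": "$PBS_ACCOUNT"}}}
# expand $PBS_ACCOUNT in the raw config, then merge it into dask's config
dask.config.update(dask.config.config, expand_environment_variables(raw))
print(dask.config.get("jobqueue.slurm.project"))           # -> ABC123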
Joe Hamman
@jhamman
+1
Matthew Rocklin
@mrocklin
I did have suggestions, but then I realized that I was mixing up two config files
Joe Hamman
@jhamman
Great!