The following packages are causing the inconsistency:
- conda-forge/linux-64::notebook==6.0.0=py37_0
- conda-forge/linux-64::ipyleaflet==0.11.1=py37_1
- conda-forge/noarch::geoviews==1.6.2=py_0
- conda-forge/linux-64::widgetsnbextension==3.5.1=py37_0
- conda-forge/noarch::ipywidgets==7.5.1=py_0
- conda-forge/noarch::jupyter==1.0.0=py_2
- conda-forge/noarch::nbconvert==5.5.0=py_0
- conda-forge/noarch::hvplot==0.4.0=py_1
- conda-forge/noarch::geoviews-core==1.6.2=py_0
- conda-forge/noarch::nbgitpuller==0.7.0=py_0
- conda-forge/linux-64::jupyterlab==1.0.4=py37_0
- conda-forge/noarch::jupyter-server-proxy==1.1.0=py_0
- conda-forge/noarch::jupyterlab_server==1.0.0=py_1
Eynard-Bontemps, G., R. Abernathey, J. Hamman, A. Ponte, W. Rath, 2019: The Pangeo Big Data Ecosystem and its use at CNES. In P. Soille, S. Loekken, and S. Albani (Eds.), Proc. of the 2019 conference on Big Data from Space (BiDS'2019), 49-52. EUR 29660 EN, Publications Office of the European Union, Luxembourg. ISBN: 978-92-76-00034-1, doi:10.2760/848593.
sbatch: error: You must specify an account (--account)
In [4]: print(cluster.job_script())
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60G
#SBATCH -t 00:30:00
#SBATCH -C skylake
JOB_ID=${SLURM_JOB_ID%;*}
/glade/u/home/mrocklin/miniconda/envs/dev/bin/python -m distributed.cli.dask_worker tcp://10.12.203.5:39794 --nthreads 16 --memory-limit 64.00GB --name dask-worker--${JOB_ID}-- --death-timeout 60 --interface ib0
In [5]: import dask
In [6]: dask.config.get("jobqueue.slurm")
Out[6]:
{'name': 'dask-worker',
'cores': 1,
'memory': '25 GB',
'processes': 1,
'interface': 'ib0',
'walltime': '00:30:00',
'job-extra': {'-C skylake': None},
'death-timeout': 60,
'local-directory': None,
'shebang': '#!/usr/bin/env bash',
'queue': None,
'project': None,
'extra': ['--interface', 'ib0'],
'env-extra': [],
'job-cpu': None,
'job-mem': None,
'log-directory': None}
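One way to avoid the `sbatch: error: You must specify an account (--account)` failure shown above is to put the required directive into the dask-jobqueue configuration, e.g. in `~/.config/dask/jobqueue.yaml`. A sketch, using the same config keys that `dask.config.get("jobqueue.slurm")` reports; `your_account` and the resource values are placeholders:

```yaml
# ~/.config/dask/jobqueue.yaml -- sketch; "your_account" is a placeholder
jobqueue:
  slurm:
    cores: 16
    memory: 60GB
    walltime: '00:30:00'
    interface: ib0
    project: your_account   # emitted as "#SBATCH -A your_account"
    job-extra:              # job-extra should be a YAML list of raw #SBATCH lines
      - '-C skylake'
```

Note that `job-extra` is expected to be a list; the `{'-C skylake': None}` mapping in the output above suggests it was written as an inline YAML mapping instead.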
In compute_study.md, it is indicated that we should 'Duplicate each study for 2, 4, 8, and 16 workers per node (reducing chunk size proportionally)'. But I do not see this reduction of chunk size for each increase in workers per node in utils.py (benchmarks/datasets.py), which therefore assumes that the total dataset size will be equal to (chunk size) * (number of nodes) * (number of workers per node).
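The proportional reduction can be made explicit. A minimal sketch (the function names are illustrative, not from utils.py) that keeps the per-node data volume fixed as the worker count grows:

```python
def chunk_size_for(size_per_node, workers_per_node):
    """Chunk size that keeps per-node data fixed as workers increase."""
    return size_per_node / workers_per_node

def total_dataset_size(chunk_size, n_nodes, workers_per_node):
    """Relationship assumed by the benchmark:
    total = (chunk size) * (number of nodes) * (number of workers per node)."""
    return chunk_size * n_nodes * workers_per_node

# With proportional reduction, the total size is constant for a given node count:
base = 64  # GB of data per node; an illustrative number
sizes = [chunk_size_for(base, w) for w in (2, 4, 8, 16)]
assert sizes == [32.0, 16.0, 8.0, 4.0]
assert all(total_dataset_size(s, 4, w) == base * 4
           for s, w in zip(sizes, (2, 4, 8, 16)))
```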
The intent of the compute_study.md writeup was to sketch out what would be needed to generate some preliminary scaling studies for various "common" operations done with xarray and dask. The result of each study should be a plot of "Number of Nodes" vs "Operation Runtime". However, the "Operation Runtime" depends on much more than the "Number of Nodes", including the "Number of Workers per Node", "Number of Threads per Worker", "Total Number of Chunks", "Chunk Size", etc.