Filipe
@ocefpaf
@robfatland that's the one we found. Don't have it.
Kai Mühlbauer
@kmuehlbauer
Hi, short question. I want to test https://binder.pangeo.io/ with one of my notebook repos. The data used in this repo is normally cloned to the system the notebooks are running on and made available to the notebooks via env variables. Is there a simple way to achieve this? If not, what are my options?
Tim Head
@betatim
if you put a start file in your repo (https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts) you can run commands when your repo is launched
you could also bake the data into the image by using a postBuild file. or you use a data format which allows random access over the network and only read what you need when you need it
depends a bit on how big your data is and how much of it you need
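A minimal sketch of what such a `start` file could look like (the repo URL and variable name here are placeholders standing in for Kai's setup, not from the chat):

```bash
#!/bin/bash
# 'start' at the repo root: repo2docker runs this when the container launches,
# before the notebook server command. URL and variable name are hypothetical.
export MY_DATA_DIR="${HOME}/data"   # env variable the notebooks read
git clone --depth 1 https://github.com/example/notebook-data "${MY_DATA_DIR}"
exec "$@"                           # hand off to the notebook server (required)
```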
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks for the hints. How much can I safely bake into the container? I need to check how much it is. If it fits within your suggested size, I would bake it in instead of pulling it every time.
Tim Head
@betatim
if it is more than ~1000MB i'd think about only pulling it when you actually need it (so neither postBuild nor start) because it either makes the image huge (slow startup times) or gives your binder a lot to do on startup (slow startup times :) ) as well as a lot of network traffic
David Brochart
@davidbrochart
I'm going to update the GPM precipitation dataset at gs://pangeo-data/gpm_imerg_early, because a new version has been released and the TRMM era has been merged, so it's now available from June 2000 instead of 2014. I will also include the probabilityLiquidPrecipitation field, along with the precipitationCal field, so that we have information about solid precipitation.
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks again, I'll think about restructuring everything.
David Hoese
@djhoese
@kmuehlbauer (hi from another non-vispy project) In some of my satpy repositories I've had it run a download script in the background from the start script. The data being downloaded was on Google's cloud storage though, so I wasn't too worried about the network traffic for the ~3GB I was downloading.
Scott
@scollis
@kmuehlbauer I use postBuild for data
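For reference, a `postBuild` that bakes data in might look like this sketch; it runs once at image build time, so the download becomes part of the image (the dataset URL is a placeholder):

```bash
#!/bin/bash
# 'postBuild' at the repo root: repo2docker runs this once while building the
# image, so whatever is fetched here gets baked in. URL is hypothetical.
set -euo pipefail
mkdir -p "${HOME}/data"
wget -q https://example.org/sample-dataset.tar.gz -O /tmp/data.tar.gz
tar -xzf /tmp/data.tar.gz -C "${HOME}/data"
rm /tmp/data.tar.gz
```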
Scott
@scollis
Are folks having issues today with postBuilds.. my jupyter labextensions install is throwing errors..
according to the tracker it should be fixed..
I wonder if the docker image needs updating
@scottyhq ...
David Hoese
@djhoese
@scollis the issue with doing the data download in postBuild is that it all gets bundled into the image. If the image takes too long to download, it can't be loaded in time and binderhub will time out
if you do the data download in start without putting it in the background, a similar thing happens: binderhub waits for jupyter lab to start up, but it won't happen in time because the data is taking too long to download
(from experience)
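A sketch of that background-download pattern in a `start` script (download_data.sh is a hypothetical helper living in the repo):

```bash
#!/bin/bash
# 'start' that backgrounds the download so jupyter comes up before the
# BinderHub timeout; download_data.sh is a made-up name for the fetch logic.
nohup bash "${HOME}/download_data.sh" > "${HOME}/download.log" 2>&1 &
exec "$@"   # start the notebook server immediately
```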
Scott
@scollis
Great point! One issue is that the current way I am building my image uses a Pangeo docker container as a base.. and somehow this stops start from being run
BTW Great seeing @kmuehlbauer here.. Kai, I plan to build an OpenRadarScience Pangeo
the AMS Radar workshop will be a launchpad…
Scott Henderson
@scottyhq
@scollis - looks like you can pin jupyterlab<1.1 in your environment file as a solution? if you’re up for a bit of digging a pull request to enable running the ‘start’ script would be great! https://github.com/pangeo-data/pangeo-stacks/blob/4c90b98836c66403ab81ca837ce979ec9628a232/onbuild/r2d_overlay.py#L125
Kai Mühlbauer
@kmuehlbauer
@scollis You seem to be everywhere :grinning: I think I got it working using postBuild. Need to check tomorrow. jupyter hub on a mobile just isn't usable (in terms of screen size, :grimacing:).
Scott
@scollis
Thanks @scottyhq ! I’ll take a look
Scott
@scollis
something like this @scottyhq
Scott Henderson
@scottyhq
Seems like the right idea! looks like you’ll have to add a line here too https://github.com/pangeo-data/pangeo-stacks/blob/4c90b98836c66403ab81ca837ce979ec9628a232/onbuild/Dockerfile#L15. Are you able to run locally to test? @yuvipanda has mentioned putting r2d-overlay in a separate repo b/c it is a useful tool! So some additional changes could be in store in the future, but this feature would be nice to have in the meantime.
Scott
@scollis
Would that be ONSTART RUN?
if I do ONBUILD RUN /usr/local/bin/r2d_overlay.py start
it would just run that at build time, right?
(winging it :D )
and do you just check it with Docker?
wonder if it is this
Scott Henderson
@scottyhq
(I’m also winging it ;) I think it would be best to take a look at https://github.com/pangeo-data/pangeo-stacks/issues/38#issuecomment-494125005. Then go ahead and start your pull request w/ the current modifications saying it’ll close that issue.
Scott
@scollis
Cool… so it is Entrypoint
Scott Henderson
@scottyhq
yes - i’ve run repo2docker often in the past locally to check builds - and you can always push to dockerhub and then point to these images to run with any of the binderhubs with Dockerfile.
Joe Hamman
@jhamman
@scollis - yes, the start script is run as an entrypoint. You may be interested in this PR in the main repo2docker repo where the feature was first developed: https://github.com/jupyter/repo2docker/pull/363/files
(we renamed launch -> start but the idea is there)
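An illustrative Dockerfile fragment of the build-time vs. launch-time distinction being discussed (a sketch, not the actual pangeo-stacks Dockerfile):

```dockerfile
# ONBUILD RUN fires while a *child* image is being built FROM this one, i.e.
# at build time, so this would run 'start' during the build (the wrong moment):
#   ONBUILD RUN /usr/local/bin/r2d_overlay.py start
# The entrypoint runs at container launch, which is where 'start' belongs:
ENTRYPOINT ["/usr/local/bin/r2d_overlay.py", "start"]
```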
Scott
@scollis
cool.. I commended on this issue
oh cool.. yeah I can just point repo2docker to the branch
hah.. commended.. I meant commented.. sounds like I have a mouth full of marbles
Scott
@scollis
@scottyhq cool.. I am learning a lot today.. got repo2docker running..
Scott
@scollis
ok.. it ran and I could see the server.. now making a saved image and pushing to dockerhub
Yuvi Panda
@yuvipanda
@scollis @scottyhq @jhamman I left a comment in the issue!
thanks for working on it
Scott
@scollis
@scottyhq can you share the command line you use to make the image?
I am doing `jupyter-repo2docker --no-run --user-name=jovyan --image=pgimnomod https://github.com/pangeo-data/pangeo-stacks` to reproduce the base stack
Scott Henderson
@scottyhq
or `repo2docker --no-run --user-name=jovyan --user-id 1000 --image-name=scottyhq/geohackweek2019:2019-08-01 ./geohackweek2019`
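The save-and-push step Scott mentioned would then be roughly the following (tag reused from the command above; repo2docker images serve Jupyter on port 8888 by default):

```bash
# Push the freshly built image to Docker Hub and smoke-test it locally.
docker push scottyhq/geohackweek2019:2019-08-01
docker run --rm -p 8888:8888 scottyhq/geohackweek2019:2019-08-01
```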