Ryan Abernathey
@rabernat
For some reason, @tjcrone and I are having trouble making intake work with S3. I think Tim will open an intake issue.
Rob Fatland
@robfatland
Two questions: Who is going to the Princeton Workshop on Next Gen Cloud Research Infrastructure? My abstract is only tangentially about pangeo and I was thinking someone was going to attend to carry that banner but I'm not sure who that is. Second: Is gitter Lobby just any and all conversations?
Joe Hamman
@jhamman
Rob, @rabernat is submitting on cloud-native data formats (e.g. Zarr) and I am submitting on the Pangeo Cloud Architecture/Principles.
This gitter lobby is a place for general conversation, but we tend to limit the depth of any conversation because it's hard to track.
Anderson Banihirwe
@andersy005
@jhamman, @rabernat, I still haven't had time to create a static intake catalog for the CESM1 LENS data in S3. What I have today is something that works with intake-esm: https://intake-esm.readthedocs.io/en/latest/notebooks/examples/cesm1-lens-aws.html
Rob Fatland
@robfatland
Ok first: Great. Second: The Princeton thing is Mon/Tues and I'm at the Cornell Cloud Forum the prior Wed/Thu/Fri so plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms. And third: I'll keep my gitter discourse oh wait, cool ! PewDiePie > 1e8 subscribers!

plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms

Would love to meet up when you're in the city

Scott
@scollis
Hey folks.. I am using the KubeCluster and want to write data back to the main binder machine for analysis in my workflow.. but the dask workers get a permission denied error when trying to write back to the file system.
Joe Hamman
@jhamman
Your workers don’t have access to the home directory of your notebook session.
Scott
@scollis
Thanks @jhamman .. is there a FS the workers can see?
Joe Hamman
@jhamman
No. Well, they all have their own persistent disk.
But if you want the data locally, you need to compute first.
Scott
@scollis
Right, is there a way to copy back from the workers to the notebook session FS? I am generating grid data on the workers that would be too big to fit in memory if I brought it all back with a client.gather(future)
I guess I could push the data back in the futures and then loop over them and dispose of the data as I save..
Let me experiment with that.. fun :)
Pier
@PhenoloBoy
@scollis Hi, in which format are you going to save your data once back from the workers?
Scott
@scollis
netcdf.
I know.. I really need to learn XARR
Pier
@PhenoloBoy
Can you process your data line by line?
Scott
@scollis
It's very nicely parallel.. a task per time step. So chunking by time is easy..
Pier
@PhenoloBoy
I've had to solve a similar problem and, in my case, the only solution has been to append results to a netCDF4 file
netcdf not xarray
Scott
@scollis
Cool. That’s the kind of thing I am looking at doing
Pier
@PhenoloBoy
You have to create an empty netCDF file (http://unidata.github.io/netcdf4-python/netCDF4/index.html) and that's almost it.
Even if it isn't a super clean method, in my case it has been the only solution.
But be aware that netCDF has some problems on GC,
so before you start with this approach, have a look at some of the discussion about netCDF and cloud-based infrastructure.
Scott
@scollis
Will do.. I am just messing around right now but will be hacking at this more seriously soon
Anyone got some cool code that shows dealing with futures as they complete on dask?
Pier
@PhenoloBoy
let me see if I've anything, most of the time I'm using that approach
Scott
@scollis
Thanks!
This is my loop once done script I want to have run while the compute is ongoing
    tpls = []
    for this_future in future:
        gathered = client.gather(this_future)
        pyart.io.write_grid(gathered[-1], gathered[-2])
        tpls.append(tpls[0:-3])
        del gathered
hmm.. not so good for showing code
:D
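For the question about dealing with futures as they complete: dask.distributed provides an `as_completed` helper that yields Dask futures as they finish. The sketch below shows the same pattern with the standard library's `concurrent.futures`, so that it runs without a cluster. `make_grid` and `save_grid` are hypothetical stand-ins for the gridding task and the write step, not functions from the conversation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_grid(time_step):
    """Hypothetical stand-in for the per-time-step gridding task."""
    return time_step, [time_step * 10 + i for i in range(3)]


def save_grid(time_step, grid, saved):
    """Hypothetical stand-in for writing one result to disk."""
    saved[time_step] = sum(grid)


saved = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(make_grid, t) for t in range(8)]
    # as_completed yields each future as soon as it finishes, in completion
    # order rather than submission order, so saving overlaps with computing.
    for future in as_completed(futures):
        time_step, grid = future.result()
        save_grid(time_step, grid, saved)
        del grid  # drop the local reference once the result is saved
```

With Dask, the analogous loop would iterate over `dask.distributed.as_completed(futures)` and call `.result()` on each future, so only one result needs to be held in the notebook session at a time.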
Pier
@PhenoloBoy
don't worry it's enough
Have a look at this; even if it is far from perfect and looks like it was written by a monkey, it could help you. You will have to adapt it, as the writing part isn't there. Unfortunately, I couldn't retest whether it works, as I'm a little bit busy.
Scott
@scollis
Awesome, thanks!
Pier
@PhenoloBoy
The approach is unconventional and I don't recommend that anybody follow it. From time to time, though, it is the only solution I have figured out.
Scott
@scollis
@jhamman is there any example that stores data from Kubernetes workers to a cloud store like Google Cloud as a way of returning data?
Pier
@PhenoloBoy
Zarr or Parquet is your solution
Scott
@scollis
That's really nice @PhenoloBoy … acts as a way to start thinking about stuff.. Yeah.. gotta learn to use Zarr
Pier
@PhenoloBoy
I asked @jhamman the same question a few days ago.
The solution is to use Zarr or Parquet.
It seems that in the upcoming 6 months there will be a beta of netCDF that uses Zarr, but right now it is more gossip than anything else (at least as far as I understand).
Rob Fatland
@robfatland
@rabernat fantastic, I was hoping you'd be available.
Charles Blackmon-Luca
@charlesbluca
Quickly tossed together some notebooks to generate a catalog of all the data on gs://pangeo-data; my idea is that one day a script could be automated to do this on a regular basis
Rob Fatland
@robfatland
If nobody minds, I'm going to hijack appear.in/pangeo from 5pm to 5:30pm PDT today for a conversation on Megaptera, our citizen science whale call identification ML project. LMK if there's any conflict and I'll bang on over to Zoom.