Ryan Abernathey
@rabernat
Does anyone have an example of an intake catalog pointing at zarr datasets on S3?
Aimee Barciauskas
@abarciauskas-bgse
I was hoping to do this as a part of my sprint but didn't get to it :disappointed:
Joe Hamman
@jhamman
@andersy005 :arrow_heading_up:
Ryan Abernathey
@rabernat
For some reason, @tjcrone and I are having trouble making intake work with s3. I think Tim will open an intake issue.
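For reference, a hypothetical intake catalog entry for a Zarr store on S3 might look like the sketch below (it assumes the `zarr` driver from intake-xarray; the source name, bucket, and path are placeholders, not a real catalog):

```yaml
# Hypothetical catalog entry — requires intake-xarray for the zarr driver.
sources:
  my_zarr_data:
    driver: zarr
    description: Example Zarr dataset on S3 (placeholder)
    args:
      urlpath: s3://my-bucket/my-dataset.zarr
      storage_options:
        anon: true    # anonymous S3 access; drop for credentialed buckets
```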
Rob Fatland
@robfatland
Two questions: Who is going to the Princeton Workshop on Next Gen Cloud Research Infrastructure? My abstract is only tangentially about pangeo and I was thinking someone was going to attend to carry that banner but I'm not sure who that is. Second: Is gitter Lobby just any and all conversations?
Joe Hamman
@jhamman
Rob, @rabernat is submitting on cloud-native data formats (e.g. Zarr) and I am submitting on the Pangeo Cloud Architecture/Principles.
This gitter lobby is a place for general conversation but we tend to try to limit the depth of any conversation because it's hard to track.
Anderson Banihirwe
@andersy005
@jhamman, @rabernat, I still haven't had time to create static intake catalog for the CESM1 LENS data in S3. What I have today is something that works with intake-esm: https://intake-esm.readthedocs.io/en/latest/notebooks/examples/cesm1-lens-aws.html
Rob Fatland
@robfatland
Ok first: Great. Second: The Princeton thing is Mon/Tues and I'm at the Cornell Cloud Forum the prior Wed/Thu/Fri so plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms. And third: I'll keep my gitter discourse oh wait, cool ! PewDiePie > 1e8 subscribers!

plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms

Would love to meet up when you're in the city

Scott
@scollis
Hey folks.. I am using the KubeCluster and want to write data back to the main binder machine for analysis in my workflow.. but the dask workers get a permission denied error when trying to write back to the file system
Joe Hamman
@jhamman
Your workers don’t have access to the home directory of your notebook session.
Scott
@scollis
Thanks @jhamman .. is there a FS the workers can see?
Joe Hamman
@jhamman
No. Well, they all have their own persistent disk.
But if you want the data locally, you need to compute first.
Scott
@scollis
right, is there a way to copy back from the workers to the notebook session FS? I am generating grid data on the workers that would be too big to fit in memory if I brought it all back by a client.gather(future)
I guess I could push the data back in the futures and then loop over them and dispose of the data as I save..
Let me experiment with that.. fun :)
Pier
@PhenoloBoy
@scollis Hi, in which format are you going to save your data once back from the workers?
Scott
@scollis
netcdf.
I know.. I really need to learn XARR
Pier
@PhenoloBoy
can you process your data per lines?
Scott
@scollis
It's very nicely parallel.. a task per time step. So chunking by time is easy..
Pier
@PhenoloBoy
I've got to solve a similar problem and, in my case, the only solution has been to append results to a netCDF4 file
netCDF4 directly, not xarray
Scott
@scollis
Cool. That’s the kind of thing I am looking at doing
Pier
@PhenoloBoy
you have to create an empty netCDF file http://unidata.github.io/netcdf4-python/netCDF4/index.html and that's almost it
even if it isn't a super clean method, in my case it has been the only solution
but be aware that netcdf has some problems over GC
so before you start with this approach, have a look at some discussions about netCDF and cloud-based infrastructures
Scott
@scollis
Will do.. I am just messing around right now but will be hacking at this more seriously soon
Anyone got some cool code that shows dealing with futures as they complete on dask?
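One common pattern for this is `distributed.as_completed`, which yields futures in the order they finish so you can save each result and release it before the next arrives. A minimal sketch, with a toy function standing in for the real gridding task:

```python
from dask.distributed import Client, as_completed

client = Client(processes=False)        # small local cluster for the demo

def make_grid(t):
    return t * t                        # stand-in for a per-timestep grid

futures = client.map(make_grid, range(5))

results = []
for future in as_completed(futures):    # yields each future as it finishes
    results.append(future.result())     # save/write the result here...
    future.release()                    # ...then free the worker's memory

client.close()
```

Releasing each future after writing keeps at most one result in the client's memory at a time, which matters when the full set would not fit.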
Pier
@PhenoloBoy
let me see if I've anything, most of the time I'm using that approach
Scott
@scollis
Thanks!
This is my "loop once done" script that I want to have run while the compute is ongoing
    tpls = []
    for this_future in future:
        gathered = client.gather(this_future)
        pyart.io.write_grid(gathered[-1], gathered[-2])
        tpls.append(gathered[0:-3])
        del gathered
hmm.. not so good for showing code
:D
Pier
@PhenoloBoy
don't worry, it's enough
have a look at this; even if it's far from perfect and written by a monkey, it could help you. You'll have to readapt it as the writing part isn't there. Unfortunately, I couldn't retest whether it's working as I'm a little bit busy
Scott
@scollis
Awesome, thanks!
Pier
@PhenoloBoy
the approach is unconventional and I don't suggest anybody follow it. From time to time, in some cases, it's the only solution that I've figured out
Scott
@scollis
@jhamman is there any example that stores data from Kubernetes workers to a cloud store like Google Cloud as a way of returning data?
Pier
@PhenoloBoy
Zarr or Parquet is your solution
Scott
@scollis
That's really nice @PhenoloBoy … acts as a way to start thinking about stuff.. Yeah.. gotta learn to use Zarr
Pier
@PhenoloBoy
I asked @jhamman the same question a few days ago
the solution is to use Zarr or Parquet
it seems that in the upcoming 6 months there will be a beta for netCDF that will use Zarr, but right now it's more gossip than anything else (at least to my understanding)
Rob Fatland
@robfatland
@rabernat fantastic, I was hoping you'd be available.