Ryan Abernathey
@rabernat

plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms

Would love to meet up when you're in the city

Scott
@scollis
Hey folks. I am using KubeCluster and want to write data back to the main Binder machine for analysis in my workflow, but the dask workers get a "permission denied" error when trying to write back to the file system.
Joe Hamman
@jhamman
Your workers don’t have access to the home directory of your notebook session.
Scott
@scollis
Thanks @jhamman. Is there a file system (FS) the workers can see?
Joe Hamman
@jhamman
No. Well, they all have their own persistent disk.
But if you want the data locally, you need to compute first.
Scott
@scollis
Right, is there a way to copy data back from the workers to the notebook session's FS? I am generating gridded data on the workers that would be too big to fit in memory if I brought it all back at once with client.gather(future).
I guess I could return the data in the futures, then loop over them and dispose of each result as I save it.
Let me experiment with that. Fun :)
Pier
@PhenoloBoy
@scollis Hi, in which format are you going to save your data once back from the workers?
Scott
@scollis
netCDF.
I know, I really need to learn xarray.
Pier
@PhenoloBoy
Can you process your data piece by piece?
Scott
@scollis
It's very nicely parallel: one task per time step, so chunking by time is easy.
Pier
@PhenoloBoy
I had to solve a similar problem and, in my case, the only solution was to append the results to a netCDF4 file,
using netCDF4 directly, not xarray.
Scott
@scollis
Cool. That’s the kind of thing I am looking at doing
Pier
@PhenoloBoy
You have to create an empty netCDF file (http://unidata.github.io/netcdf4-python/netCDF4/index.html) and that's almost it.
Even if it isn't a super clean method, in my case it has been the only solution.
But be aware that netCDF has some problems on Google Cloud (GC),
so before you start with this approach, have a look at some of the discussions about netCDF on cloud-based infrastructure.
Scott
@scollis
Will do. I am just messing around right now, but will be hacking at this more seriously soon.
Anyone got some cool code that shows how to deal with futures as they complete in dask?
Pier
@PhenoloBoy
Let me see if I have anything; I use that approach most of the time.
Scott
@scollis
Thanks!
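This is my "loop once done" script, which I want to run while the compute is ongoing:

    import pyart

    tpls = []
    for this_future in future:  # `future` is the list of futures from the cluster
        gathered = client.gather(this_future)
        # write_grid(filename, grid): last item is the filename, the one before it the grid
        pyart.io.write_grid(gathered[-1], gathered[-2])
        tpls.append(gathered[0:-3])  # keep the leading results, not the grid
        del gathered  # drop the local copy once it has been written
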
hmm.. not so good for showing code
:D
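A minimal sketch of handling results as they complete rather than gathering them all at once. It uses the standard library's `concurrent.futures.as_completed` so it runs without a cluster; `dask.distributed` also provides an `as_completed` iterator that is used the same way with the futures returned by `client.submit` or `client.map`. `make_grid` and `save` are hypothetical stand-ins for the per-time-step task and for the write step (e.g. `pyart.io.write_grid`).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_grid(step):
    """Hypothetical stand-in for the per-time-step task run on a worker."""
    grid = [[float(step)] * 3 for _ in range(2)]  # a tiny 2 x 3 "grid"
    return step, grid


def save(step, grid, saved):
    """Hypothetical stand-in for writing one grid to disk.

    Here it just records the sum of the grid so the result can be checked.
    """
    saved[step] = sum(sum(row) for row in grid)


def run(n_steps, max_workers=4):
    saved = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(make_grid, s) for s in range(n_steps)]
        # Handle each result as soon as its task finishes (completion order,
        # not submission order) instead of gathering everything at once.
        for fut in as_completed(futures):
            step, grid = fut.result()
            save(step, grid, saved)
    return saved


if __name__ == "__main__":
    print(run(5))
```

Results are processed one at a time in completion order, so the write step can overlap with tasks that are still running instead of waiting for one large gather at the end.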
Pier
@PhenoloBoy
Don't worry, it's enough.
Have a look at this; even if it is far from perfect and looks like it was written by a monkey, it could help you. You'll have to adapt it, as the writing part isn't there. Unfortunately, I couldn't retest whether it's working, as I'm a little bit busy.
Scott
@scollis
Awesome, thanks!
Pier
@PhenoloBoy
The approach is unconventional and I wouldn't recommend it to anybody. From time to time, though, it's the only solution I've figured out.
Scott
@scollis
@jhamman is there any example that stores data from Kubernetes workers to a cloud store like Google Cloud as a way of returning data?
Pier
@PhenoloBoy
Zarr or Parquet is your solution.
Scott
@scollis
That's really nice @PhenoloBoy, it's a way to start thinking about stuff. Yeah, gotta learn to use Zarr.
Pier
@PhenoloBoy
I asked @jhamman the same question a few days ago;
the solution is to use Zarr or Parquet.
It seems that in the upcoming 6 months there will be a beta of netCDF that uses Zarr, but right now it's more gossip than anything else (at least as far as I understand).
Rob Fatland
@robfatland
@rabernat fantastic, I was hoping you'd be available.
Charles Blackmon-Luca
@charlesbluca
I quickly tossed together some notebooks to generate a catalog of all the data on gs://pangeo-data; my idea is that one day a script could automate this on a regular basis.
Rob Fatland
@robfatland
If nobody minds, I'm going to hijack appear.in/pangeo from 5pm to 5:30pm PDT today for a conversation on Megaptera, our citizen-science whale call identification ML project. LMK if anything conflicts and I'll bang on over to Zoom.
Anderson Banihirwe
@andersy005

RE: For some reason, @tjcrone and I are having trouble making intake work with S3. I think Tim will open an intake issue.

@rabernat & @tjcrone, did you figure this out? Could you expand on what the exact issue was? I am trying to create static intake catalogs pointing to CESM LENS data in S3 and I seem to be having some issues when accessing the data.

Joe Hamman
@jhamman
@robfatland :thumbsup:
Tom Augspurger
@TomAugspurger
Ping me if you’re having issues with s3fs. There’s been some churn lately.
Filipe
@ocefpaf
Did anyone lose a laptop charger during the meeting last week?
Ryan Abernathey
@rabernat
@tjcrone - could you open an intake issue about the s3fs / intake problem we ran into? I don't have the code to reproduce on my machine.
Ryan Abernathey
@rabernat
Satpy / pyresample experts. What is the best way to serialize an area definition and store it in an xarray dataset?
Satpy does some of this stuff internally, but I want to roll my own.