plan to kill the weekend wandering around Manhattan looking for sandbagger climbing gyms

Would love to meet up when you're in the city

Scott
@scollis
Hey folks.. I am using the KubeCluster and want to write data back to the main binder machine for analysis in my workflow.. but the dask workers get a permission denied error when trying to write back to the file system
Joe Hamman
@jhamman
Your workers don’t have access to the home directory of your notebook session.
Scott
@scollis
Thanks @jhamman .. is there a FS the workers can see?
Joe Hamman
@jhamman
No. Well, they all have their own persistent disk.
But if you want the data locally, you need to compute first.
Scott
@scollis
right, is there a way to copy back from the workers to the notebook session FS? I am generating grid data on the workers that would be too big to fit in memory if I brought it all back with a client.gather(future)
I guess I could push the data back in the futures and then loop over them and dispose of the data as I save..
Let me experiment with that.. fun :)
Pier
@PhenoloBoy
@scollis Hi, in which format are you going to save your data once back from the workers?
Scott
@scollis
netcdf.
I know.. I really need to learn XARR
Pier
@PhenoloBoy
can you process your data line by line?
Scott
@scollis
It's very nicely parallel.. a task per time step. So chunking by time is easy..
Pier
@PhenoloBoy
I've had to solve a similar problem and, in my case, the only solution has been to append results to a netCDF4 file
netcdf not xarray
Scott
@scollis
Cool. That’s the kind of thing I am looking at doing
Pier
@PhenoloBoy
you have to create an empty netCDF file (http://unidata.github.io/netcdf4-python/netCDF4/index.html) and that's almost it
even if it isn't a super clean method, in my case it has been the only solution
but be aware that netCDF has some problems on GC
so before you start with this approach, have a look at some of the discussions about netCDF and cloud-based infrastructure
Scott
@scollis
Will do.. I am just messing around right now but will be hacking at this more seriously soon
Anyone got some cool code that shows dealing with futures as they complete on dask?
Pier
@PhenoloBoy
let me see if I've anything, most of the time I'm using that approach
Scott
@scollis
Thanks!
This is my loop-once-done script that I want to have run while the compute is ongoing
tpls = []
for this_future in future:
    gathered = client.gather(this_future)
    pyart.io.write_grid(gathered[-1], gathered[-2])
    tpls.append(tpls[0:-3])
    del gathered
hmm.. not so good for showing code
:D
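A minimal sketch of dealing with futures as they complete, rather than looping once everything is done. It assumes each task returns a `(filename, grid)` pair; `drain`, `make_grid`, `write_result`, and `run_with_dask` are hypothetical names, and `write_result` stands in for a call such as `pyart.io.write_grid(filename, grid)`. `as_completed` from `dask.distributed` yields futures in the order they finish, so only one result has to sit in the notebook's memory at a time.

```python
def drain(completed, write_result):
    """Write each result as its future finishes, holding one result at a time.

    `completed` is any iterable of finished futures exposing `.result()`,
    for example `dask.distributed.as_completed(futures)`.
    """
    written = []
    for future in completed:
        filename, grid = future.result()  # pulls a single result to the client
        write_result(filename, grid)      # e.g. pyart.io.write_grid(filename, grid)
        written.append(filename)
        del grid, future                  # drop local references before the next one
    return written


def make_grid(step):
    """Hypothetical per-time-step task; returns (filename, data)."""
    return f"grid_{step:03d}.nc", [step] * 4


def run_with_dask(n_steps=4):
    """Run the pattern on a local in-process cluster (needs dask.distributed)."""
    from dask.distributed import Client, as_completed

    saved = {}
    with Client(processes=False) as client:
        futures = client.map(make_grid, range(n_steps))
        names = drain(as_completed(futures), saved.__setitem__)
    return names, saved
```

`run_with_dask()` uses a dictionary as a stand-in for the file system; in a real workflow the second argument to `drain` would be the function that writes to disk.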
Pier
@PhenoloBoy
don't worry it's enough
have a look at this. Even if it's far from perfect and looks like it was written by a monkey, it could help you. You'll have to adapt it, as the writing part isn't there. Unfortunately, I couldn't retest whether it's working as I'm a little bit busy
Scott
@scollis
Awesome, thanks!
Pier
@PhenoloBoy
the approach is unconventional and I don't suggest that anybody follow it. From time to time, in some cases, it's the only solution I've figured out
Scott
@scollis
@jhamman is there any example that stores data from Kubernetes workers to a cloud store like Google Cloud as a way of returning data?
Pier
@PhenoloBoy
Zarr or Parquet is your solution
Scott
@scollis
That's really nice @PhenoloBoy … acts as a way to start thinking about stuff.. Yeah.. gotta learn to use Zarr
Pier
@PhenoloBoy
I asked @jhamman the same question a few days ago
the solution is to use Zarr or Parquet
it seems that in the upcoming 6 months there will be a beta of netCDF that will use Zarr, but right now it's more gossip than anything else (at least as far as I understand)
Rob Fatland
@robfatland
@rabernat fantastic i was hoping you'd be available.
Charles Blackmon-Luca
@charlesbluca
Quickly tossed together some notebooks to generate a catalog of all the data on gs://pangeo-data; my idea is that one day a script could be automated to do this on a regular basis
Rob Fatland
@robfatland
If nobody minds I'm going to hijack appear.in/pangeo from 5pm to 5:30pm PDT today for a conversation on Megaptera, our citizen science whale call identification ML project. LMK if any conflicting and I'll bang on over to zoom.
Anderson Banihirwe
@andersy005

RE: For some reason, @tjcrone and I are having trouble making intake work with s3. I think Tim will open an intake issue.

@rabernat & @tjcrone, did you figure this out? Could you expand on what the exact issue was? I am trying to create static intake catalogs pointing to CESM LENS data in S3 and I seem to be having some issues when accessing the data.

Joe Hamman
@jhamman
@robfatland :thumbsup:
Tom Augspurger
@TomAugspurger
Ping me if you’re having issues with s3fs. There’s been some churn lately.
Filipe
@ocefpaf
Did anyone lose a laptop charger during the meeting last week?
Ryan Abernathey
@rabernat
@tjcrone - could you open an intake issue about the s3fs / intake problem we ran into? I don't have the code to reproduce on my machine.
Ryan Abernathey
@rabernat
Satpy / pyresample experts. What is the best way to serialize an area definition and store it in an xarray dataset?