Joe Hamman
@jhamman
Rich! I’ll hang out in https://whereby.com/pangeo
Tom Augspurger
@TomAugspurger
Who is able to do DNS things for pangeo.io? I’d like to set up grafana.<deployment>.pangeo.io before I forget.
Scott Henderson
@scottyhq
tom i’ll post the login info via keybase so that you can do it
Tom Augspurger
@TomAugspurger
Thanks.
Andrew Annex
@AndrewAnnex
hey all, does anyone here work with rasterio and dask? I am trying to work out a way to get concurrent reads from xarray open_rasterio working, but disabling the lock runs into attribute errors elsewhere, as does attempting to use the other locks provided in xarray.
Rich Signell
@rsignell-usgs
@AndrewAnnex , if nobody responds here, perhaps try the rioxarray folks? https://github.com/corteva/rioxarray
Andrew Annex
@AndrewAnnex
@rsignell-usgs I will take a look, since xarray has open_rasterio the use of rioxarray is a little unclear to me at the moment
Rich Signell
@rsignell-usgs
Pangeans: The US Unified Forecast System folks are having their first users group meeting, and despite having dozens of talks, there is no mention of Pangeo! https://dtcenter.org/events/2020/unified-forecast-system-ufs-users-workshop/agenda-public
Ryan Abernathey
@rabernat
Their loss? :laughing:
We can't make every conference
Rich Signell
@rsignell-usgs
Kevin Jorrisen from AWS said he would be willing to include one slide in his 10 min talk, so the question is: does anyone have a good one slide summary of Pangeo?
Rich Signell
@rsignell-usgs
Thanks Ryan! And yes, it's their loss, but it's really sad because Pangeo is the perfect framework for what that community is trying to accomplish
Ryan Abernathey
@rabernat
We need a champion from the WX community if we want to get traction in that crowd. I have credibility in PO circles because I'm an actual oceanographer. We have lots of good ocean examples. It's a lot harder to have an influence as an outsider.
Scott Henderson
@scottyhq
Hi @AndrewAnnex ! I’m very interested in what you are trying to do, but never dug into it because ‘things were working well enough’. You might find some useful info here https://github.com/pangeo-data/pangeo-example-notebooks/issues/21#issuecomment-442243515
I also recommend the rasterio mailing list if you aren’t already signed up https://rasterio.groups.io/g/main/search?q=concurrent
Keep us posted on your efforts!
Andrew Annex
@AndrewAnnex
@scottyhq I'll post to the rasterio mailing list (I've been there for a few years actually). In short, I have a collection of rasters and a collection of geospatial vectors where each vector defines a smallish read operation on the raster file, and then I do other stuff on the values retrieved from the raster. Each of these read operations can be done independently, but what I have seen in the dask task stream is that these reads are done one at a time for an individual raster, and that I can't seem to even get dask to read multiple rasters simultaneously and have the read operations occur serially within each raster. I have an existing pipeline that uses futures.ThreadPoolExecutor with rasterio (for each vector, open the raster file, read out the blocks I need, close it, and then do other stuff), which is currently outperforming the dask/xarray method I've made.
@scottyhq I'll check back later/ask on rasterio mailing list later
Andrew Annex
@AndrewAnnex
It could be that since I am just developing on a laptop I am hitting the overhead with dask/xarray, or that I need to make sure all my geotiffs are internally tiled correctly, but naively I wouldn't expect the overhead to be as high as I am seeing (old pipeline takes ~90 seconds, this process takes about 2-3x as long)
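For reference, the thread-pool pattern described above might look roughly like the following. This is a minimal sketch and not Andrew's actual pipeline: the function names, the `(path, window)` task layout, and the default of 8 workers are assumptions made for illustration. Only `ThreadPoolExecutor`, `rasterio.open`, and `rasterio.windows.Window` are existing APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def read_window_rasterio(path, window):
    """Open the raster, read one window from band 1, and close it again.

    `window` is (col_off, row_off, width, height). rasterio is imported
    lazily so the rest of the sketch loads without it installed.
    """
    import rasterio
    from rasterio.windows import Window

    # Each task opens its own dataset handle, so no handle is shared
    # between threads.
    with rasterio.open(path) as src:
        return src.read(1, window=Window(*window))


def process_features(tasks, reader=read_window_rasterio, max_workers=8):
    """Run one independent read per (path, window) task in a thread pool.

    Results come back in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: reader(*task), tasks))
```

The `reader` argument is injectable (a hypothetical design choice), so the scheduling logic can be tried with a stand-in function before pointing it at real GeoTIFFs.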
Scott Henderson
@scottyhq
Interesting, thanks for the info @AndrewAnnex - if you have a notebook/script available I think this is something worth documenting and getting to the bottom of. Because of the various libraries involved it might be worth making a post here https://discourse.pangeo.io
As a model, this post on operations on NetCDF files has been extremely illuminating https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588
if you’re using dask.distributed a performance report could be helpful to loop in dask developers for diagnosing issues https://distributed.dask.org/en/latest/diagnosing-performance.html#performance-reports
Andrew Annex
@AndrewAnnex
I'll look into posting on the discourse/catching up on that best practice post (as it is long). I've made a performance report so it is easy to see the pauses. My current thinking is that it could be because in my "per vector feature" function (the one passed to map partitions) I use the rio.clip_box function and call compute on it so that I can use xarray interp in the function. If I don't, I get an xarray error on running interp on chunked data.
Anne Fouilloux
@annefou
@AndrewAnnex I am also very interested. Sorry I am a bit new here and tend to get lost with all your communication channels... I also have performance issues with more "GIS-like" operations and would like to find (or help to develop if they do not exist) best practices from A to Z (from creation of dataset in an optimized format to data analysis and visualization). Examples tend to focus on one aspect; they always work perfectly but as soon as I deviate from them, I fail (probably because there are a few fundamentals I do not understand/miss).
Ryan Abernathey
@rabernat

and tend to get lost with all your communication channels

@annefou -- this is our problem, not yours! We have too many channels. We try to recommend discourse.pangeo.io as the main place to ask questions.

Anne Fouilloux
@annefou
Thanks
Daniel Rothenberg
@darothen

Regarding the UFS meeting - the workshop this week is really heavily focused on NWP/HPC applications. I think the "EPIC" program came up very briefly at the beginning of yesterday's meeting, but it's been really clear that the cloud/data support focus needed for the UFS to successfully evolve is not really within the scope of this meeting.

Inside of EPIC though, it's a different story, and one with a ton of potential for synergy with the Pangeo community.

Ryan Abernathey
@rabernat
@darothen - how can we best take advantage of the EPIC opportunity? What sort of top-level coordination is needed from Pangeo? This might be a good way to bootstrap 2i2c (http://2i2c.org/)
Daniel Rothenberg
@darothen

@rabernat it's a great question... I wish I had a good answer!

One of the most important things will be to see the outcome of the open-competition RFP that NOAA put out earlier this year to solicit a commercial partner for building out core EPIC infrastructure (https://wpo.noaa.gov/Resources/News/ArtMID/446/ArticleID/69/Contract-Opportunity-NOAAs-EPIC-releases-RFP). A few big name folks are submitting responses (my company is partnered with another major contractor as the prime) and I think whoever wins will likely be very open to direct collaboration, since a major enabler of EPIC will be the establishment of easy-access, cloud-based and optimized datasets to feed into whatever evolves into "UFS on the Cloud"

The timeline of the published outcomes for UFS capabilities expected from EPIC is incredibly accelerated... it's going to leave very little room for infrastructure tech development, which I suspect is going to leave a lot of folks looking for ready-to-go analysis platforms like Pangeo that can be deployed to the cloud co-located with the UFS deployment itself
I can try to join the next Pangeo weekly video meeting and chat with folks about it if there's interest
Ryan Abernathey
@rabernat
:+1:
Aimee Barciauskas
@abarciauskas-bgse
@rabernat as noted in the rechunker repo, I’m working on rechunking a zarr store using an S3 file system and a local dask cluster operating on an r5.16xlarge (20 workers). The rechunking (estimated time to complete right now would be about 100 hours) is going very slowly even though everything is in us-west-2. Workers have enough memory (using 30-40%) but are maxing out CPU (>100%, each worker has 4 cores). Any advice? Should I try an instance with more CPU? (Should I also post this question on pangeo-data discourse)?
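One quick sanity check on the numbers above is whether the workers request more threads than the instance has vCPUs. A rough sketch, assuming "4 cores" means 4 threads per worker and assuming the r5.16xlarge exposes 64 vCPUs (taken from AWS's published instance specs, not stated in the chat). The helper name is hypothetical.

```python
def cpu_oversubscription(n_workers, threads_per_worker, vcpus):
    """Return requested threads divided by available vCPUs.

    A value above 1 means more threads are requested than vCPUs exist.
    """
    return (n_workers * threads_per_worker) / vcpus


# Numbers from the message: 20 workers, 4 threads each.
# The 64 vCPU figure is an assumption about the r5.16xlarge.
ratio = cpu_oversubscription(n_workers=20, threads_per_worker=4, vcpus=64)
print(f"{20 * 4} threads on 64 vCPUs -> {ratio:.2f}x")
# prints: 80 threads on 64 vCPUs -> 1.25x
```

A ratio above 1 would only hint at CPU contention. It does not by itself explain the slow rechunk.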
Ryan Abernathey
@rabernat
Open an issue in rechunker with all the details, and we will help you debug
Aimee Barciauskas
@abarciauskas-bgse
:+1:
Aimee Barciauskas
@abarciauskas-bgse
Chelle Gentemann
@cgentemann
A new JAXA instrument, launch ~2024, is asking for advice about file formats. They have two suggested models, both netcdf4, cf-compliant. 1) data organized into groups. 2) data not organized into groups. I generally don't like groups because of the way you have to load them in xarray as separate datasets. Are there any reasons to prefer / not prefer groups? Thanks.
Joe Hamman
@jhamman
My argument against using netcdf groups usually comes down to avoiding the complexity of linked data between groups. From a purely practical perspective, it's quite nice for a group to be self-contained (including all its own necessary metadata) without references to other groups. In my experience, data providers that choose to go the groups route get too clever with the groups concept (usually to save space), and that makes use of the data harder for everyone else.
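To illustrate the "separate datasets" point: with `xarray.open_dataset`, each netCDF group is opened through the `group` keyword, one call per group. A minimal sketch, where the helper name, the file name, and the group names in the usage line are all hypothetical:

```python
def open_groups(path, group_names, opener=None):
    """Open each named netCDF group as its own dataset, keyed by group name.

    By default this calls xarray.open_dataset(path, group=name). The opener
    is injectable so the sketch can be exercised without a real file.
    """
    if opener is None:
        # Imported lazily; needs xarray plus a netCDF backend installed.
        import xarray as xr
        opener = xr.open_dataset
    return {name: opener(path, group=name) for name in group_names}


# Hypothetical usage:
# datasets = open_groups("granule.nc", ["geolocation", "science"])
```

Each group comes back as a separate `Dataset`, which is the inconvenience mentioned above.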
Ryan Abernathey
@rabernat
Weekly checkin meeting in 5 minutes!
Chelle Gentemann
@cgentemann
link?
Tom Augspurger
@TomAugspurger
Daniel Rothenberg
@darothen
@cgentemann - the easiest place to reach me is my work e-mail, daniel@climacell.co. I'll also drop my contact info in the Google Docs meeting notes
Ryan Abernathey
@rabernat
Are all the ESIP videos online? I can't find them.
Scott Henderson
@scottyhq
maybe not all of them are posted, and the links aren’t obvious. But for this session https://2020esipsummermeeting.sched.com/event/cIvF you can scroll down to links to ‘View Recording’ and ‘View Session Notes’
Joe Hamman
@jhamman
Long overdue pangeo coffee break today at 8a pt.
Tom Augspurger
@TomAugspurger
Sorry thunderstorm going over. Had a power flicker.
And my modem isn’t on a UPS.
Losing power now so probably won’t be able to rejoin.