Ryan Abernathey
@rabernat
What repos are you launching?
MikeBeller
@MikeBeller
Yes this binder works fine if launched using the "launch binder" button on that page. It's exciting to watch the Dask cluster! :-) ( I'm not sure it matters but it only works fine if you launch it from that button. If you try to put the URL of the github repo into binder.pangeo.io it doesn't work.) So getting back to what I was doing: I am using this repository: https://github.com/pangeo-data/pangeo-example-notebooks -- I tried launching from the "launch binder" button in the readme. I also tried launching by taking the repo URL (https://github.com/pangeo-data/pangeo-example-notebooks.git) and putting it into the GIT URL on binder.pangeo.io, and clicking "launch". Regardless of which notebook I open in the binder, when I get to the gateway.new_cluster() call, it always fails with that same basic error.
Chris Erdmann
@libcce
Hi all! I've been mulling over an idea... @jbusecke created this cookie cutter, https://github.com/jbusecke/cookiecutter-science-project, which had me thinking about how we often try to tell authors early in their projects about practices that could help them later, like at publication, when they have to share their research outcomes (data, software, notebooks, etc.). A cookie cutter seemed like a nice forkable approach to communicating our guidance. I was also thinking it is a nice flexible solution compared with some of the admin tools that we have to use, such as the DMP Tool. What if you could generate statements, etc. more easily from a YAML file... There is this nice project coming out of AUS called RAID that is an identifier for projects and a wrapper for all the other identifiers (will ultimately be available via DataCite). @cgentemann also mentioned an example, https://ncsu-libraries.github.io/jekyll-academic-docs/, where a cookie cutter can be as easy to use as the Jekyll Academic instructions. This is my first post here, and an idea inspired by @jbusecke, but I'm wondering if anyone is interested and would like to brainstorm further?
Anne Fouilloux
@annefou
I am teaching master students, PhD students, postdocs and researchers about Pangeo. What is the most recommended way/package to transform vertical coordinates? We are using Pangeo CMIP6 (atmosphere data) with "atmosphere_hybrid_sigma_pressure_coordinate" or "alevel" and would like to compare models (for instance transform them on the same pressure levels). Can xgcm do that?
It can definitely do it. But we could use more examples for the documentation.
Anne Fouilloux
@annefou
Thanks. I will try to make new examples.
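For readers following along, here is a minimal sketch of the calculation behind such a transform for a single column. It assumes the CF form of the hybrid sigma pressure coordinate, p(k) = ap(k) + b(k) * ps, and uses plain NumPy rather than xgcm's own API. The function names, coefficients, and temperatures are all hypothetical, and interpolating linearly in log-pressure is just one common choice.

```python
import numpy as np


def hybrid_to_pressure(ap, b, ps):
    """Pressure (Pa) on each model level: p(k) = ap(k) + b(k) * ps."""
    return np.asarray(ap) + np.asarray(b) * ps


def interp_to_pressure_levels(values, p_model, p_target):
    """Interpolate one column onto target pressure levels, linearly in log(p).

    Targets outside the model's pressure range are returned as NaN.
    """
    order = np.argsort(p_model)  # np.interp needs increasing x values
    log_p = np.log(p_model[order])
    return np.interp(np.log(p_target), log_p, values[order],
                     left=np.nan, right=np.nan)


# Hypothetical 4-level column (surface first), not taken from any real model.
ap = np.array([0.0, 5000.0, 20000.0, 10000.0])      # Pa
b = np.array([1.0, 0.8, 0.3, 0.0])                  # dimensionless
ps = 100000.0                                       # surface pressure, Pa
temperature = np.array([288.0, 280.0, 250.0, 220.0])  # K

p_model = hybrid_to_pressure(ap, b, ps)
p_target = np.array([85000.0, 50000.0, 5000.0])     # common levels, Pa
t_on_p = interp_to_pressure_levels(temperature, p_model, p_target)
```

Applying the same two steps to every model gives fields on shared pressure levels that can then be compared directly. For full datasets the column function would need to be applied along the vertical dimension, which is the part a library such as xgcm is meant to handle.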
Docetom
@Docetom
I just started using pangeo.io yesterday and found it interesting. I launched the binder, duplicated one of the examples, and renamed it. Later, after restarting my system, I could not find the code again, so I had to start afresh. What am I doing wrong? I wanted to sign up in case that would save my work, but did not see where or how. Please kindly direct me accordingly.
Robin Wilson
@robintw
Hi everyone - I'm starting to use some of the pangeo tools and technologies in a workflow involving STAC, COGs and Dask. I'm trying to use xbatcher to extract batches of patches from my xarrays for machine learning, but I'm struggling with a few things. The most important issue for me is that I'm struggling to put the patches back together again at the end - the co-ordinates from the original DataArray don't seem to have been preserved.
I've created an issue on the xbatcher GitHub (pangeo-data/xbatcher#37) and it'd be great if anyone has any ideas.
Ryan Abernathey
@rabernat
Hi Robin, thanks for your feedback! Xbatcher is very new and untested in many use cases. Your input is really valuable. Someone will get back to you on the issue you opened.
Joe Hamman
@jhamman
Yeah, thanks @robintw for the ping. I’ll try to respond today.
Robin Wilson
@robintw
That's wonderful, thank you so much @rabernat and @jhamman
Martin Durant
@martindurant
Do we have a location/issue/PR where bringing analytic coordinates to xarray is discussed?
James A. Bednar
@jbednar
(And linked PRs)
Martin Durant
@martindurant
Thank you
Ryan Abernathey
@rabernat
This project board is also a useful place to track progress on flexible indexes: https://github.com/pydata/xarray/projects/1
Joe Hamman
@jhamman
There is also a regular (weekly) meeting on the subject: pydata/xarray#5452
James A. Bednar
@jbednar
So much going on!
ben
@ben:matrix.nrp-nautilus.io
Hi all, I'm trying to deploy pangeo on a k8s cluster and was wondering where is the appropriate place to post a technical question. Is there a pangeo chat room for topics such as deployment-related questions?
Ryan Abernathey
@rabernat
this is that room :)
but you'd be better off on our discourse
ben
@ben:matrix.nrp-nautilus.io
thanks, Ryan. I posted over there.
Alejandro ©
@acocac
Hi all! Not sure if this is the right channel to raise issues related to the Pangeo binder, but I've experienced some recent issues launching it. Following pangeo-data/pangeo-binder#192, I'm testing a short-term solution which suggests setting up a Pangeo binder template, but it still returns: Failed to create temporary user for gcr.io/pangeo-181919/prod-acocac-2dpangeo-2dbinder-2dtemplate-0d9a05:c3201d8e480aba5d9ebad622a251fb4660ba1021.
till90
@till90
Hi folks, I'm trying to derive soil moisture values from Sentinel-1 backscatter values with Python's Keras and TensorFlow libraries. So far I have tried linear regression, but the CNN I train doesn't take into account that each training value is related to its previous and next values, as it is in the real world for soil moisture time series. Now I'm trying to find resources to learn about time series regression(?). I'm unsure about some of the terms, maybe because English isn't my mother tongue. When I search for, for example, "time series regression" in combination with MLP, CNN or other ML keywords, I always end up with the term "forecasting", which I translate as future value prediction. But I don't want to predict the future; I want to train a model with Sentinel-1 backscatter values linked to real-world soil moisture. After training, I want to get soil moisture time series from Sentinel-1 backscatter time series. Can you give me some keywords I can google to research the right topic? Thanks a lot and cheers!
Shane Mill
@ShaneMill1
Hi all, I have been looking a lot into an infrastructure where AWS Lambda could connect with a Dask cluster. For example, the python Lambda function could read Zarr stored in S3 into an Xarray dataset object, and return the JSON to API Gateway. What would be the best practice for this using Dask? Could you create a Cluster using Fargate, Kubernetes, or Parallel Cluster and somehow have the distributed client within the Lambda python script connect to that? Could Lambda have a LocalCluster? I am not fully sure what is possible and would love some insights. Came across this article from several years ago which seems to talk about something similar: https://medium.com/informatics-lab/exploring-dask-and-distributed-on-aws-lambda-55d81d9641d. Thanks again, any feedback is greatly appreciated!!
Ryan Abernathey
@rabernat
Dask cloudprovider can deploy on fargate: https://cloudprovider.dask.org/en/latest/
But from what you described, I'm not sure why you need Dask here. Is there any computation involved?
Shane Mill
@ShaneMill1
The main benefit of Dask here would be the asynchronous computing of the data chunks coming from Zarr, just to allow for faster computation when invoking xarray.Dataset.to_dict().
Ryan Abernathey
@rabernat
:+1:
Shane Mill
@ShaneMill1
The overall thought is an API that uses API Gateway to call a lambda function. Based on user provided query parameters, Xarray loads the data from Zarr, does the trimming and slicing, and returns the data response to API Gateway.
Ryan Abernathey
@rabernat
I've always seen Dask and serverless as two different distributed computing paradigms
Shane Mill
@ShaneMill1
As always, thanks for the help. Still new to Lambda and Fargate, so still trying to work out the details.
Ryan Abernathey
@rabernat
With serverless, you invoke a lambda function many times to achieve parallelism. This is good for embarrassingly parallel operations.
With dask, you have a cluster that schedules jobs in parallel. It's a more sophisticated form of parallelism.
I'd be interested to learn about how they can be combined
I think lambda would be a great candidate for a subsetting service. Like if you have a giant (many TB) Zarr on S3, and you want to produce a netCDF subset, lambda would be great for that
Shane Mill
@ShaneMill1
Exactly! That is exactly what we are looking at prototyping here.
Ryan Abernathey
@rabernat
In that scenario, you wouldn't want to use a dask cluster inside the lambda function but rather just the simple (default) multithreaded scheduler
Shane Mill
@ShaneMill1
Okay interesting! I will look at that
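As a rough illustration of the subsetting idea discussed above, the sketch below shows a Lambda-style handler that reads only the chunks covering a requested range, using a thread pool inside a single invocation instead of a separate cluster. It uses only the Python standard library as a stand-in: the in-memory CHUNKS store, read_chunk, and the start/stop query parameters are hypothetical, and a real service would read Zarr chunks from S3 through xarray with Dask's default threaded scheduler. The queryStringParameters key follows the API Gateway proxy event format.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a chunked store: chunk index -> list of values.
CHUNK_SIZE = 4
CHUNKS = {i: list(range(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE)) for i in range(5)}


def read_chunk(index):
    """Pretend to fetch one chunk (in practice, an object-store read)."""
    return CHUNKS[index]


def handler(event, context=None):
    """Lambda-style entry point: return the values in [start, stop) as JSON."""
    params = event["queryStringParameters"]
    start, stop = int(params["start"]), int(params["stop"])

    # Only the chunks that overlap the requested range are read.
    first, last = start // CHUNK_SIZE, (stop - 1) // CHUNK_SIZE

    # Threads overlap the I/O-bound chunk reads within this one invocation.
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(read_chunk, range(first, last + 1)))

    # Stitch the chunks together and trim to the exact requested range.
    flat = [value for chunk in chunks for value in chunk]
    offset = first * CHUNK_SIZE
    subset = flat[start - offset:stop - offset]
    return {"statusCode": 200, "body": json.dumps({"values": subset})}


response = handler({"queryStringParameters": {"start": "3", "stop": "10"}})
```

Each request is independent, so many such invocations can run side by side, which is the embarrassingly parallel pattern described above. Input validation (for example, stop not greater than start, or a range outside the store) is left out of this sketch.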
Joe Hamman
@jhamman
@/all - please take a few minutes to fill out the 2022 Xarray user survey: https://docs.google.com/forms/d/e/1FAIpQLSfnMd8UsC1XP1lPuFczl148VfpmwnFu4a0Z94odt1L6U0R0Pw/viewform?usp=sf_link
Rich Signell
@rsignell-usgs
Done! Thanks @jhamman !
James A. Bednar
@jbednar
Hey! It's supposed to be anonymous :-) But yes, I'm done too.
Joe Hamman
@jhamman
It can be as anonymous as you want it to be. Thanks both for filling it out.
Martin Durant
@martindurant
@rabernat, you had mentioned before that there were efforts to enable parameterised/analytic coordinates in xarray, as opposed to materialised grids. Where can I find out about progress on this?
Benoit Bovy
@benbovy
@martindurant such coordinates would be enabled by the flexible indexes refactor, you can track the progress in pydata/xarray#6293 and https://github.com/pydata/xarray/projects/1. (also pydata/xarray#3620 specific to functional coordinates).
Martin Durant
@martindurant
Thank you, and sorry that I must have asked about this before and forgotten...