Andy Traumüller
@TheSwaine
@pgierz hi there, try something like xr.open_dataset("/work/ba0989/a270077/coupled_ice_paper/model_data/coupled/LIG_coupled/outdata/fesom//LIG_coupled_fesom_thetao_19680101.nc", chunks={'depth': 2, 'time': 10})
Paul Gierz
@pgierz
Thanks @TheSwaine, it turns out it was a problem from earlier: my files had different internal structures, which I hadn't noticed
not sure why that throws a memory error though
Paul Gierz
@pgierz
Hello, I am having trouble assigning values to a new output dataset. For some reason, I’m getting missing values
[screenshot attached: Screenshot 2020-02-24 at 11.27.25.png]
If I plot and add the values “by hand”, I don’t get missing value stripes, but if I do it in an assignment, they appear. What am I doing wrong?
James A. Bednar
@jbednar
Usually with xarray I've found behavior like that to be due to broadcasting rules or coordinate handling that I've failed to understand, where it does something that's well defined but totally not what I expected. Not sure in this specific case...
Deepak Cherian
@dcherian

@pgierz some latitudes on the RHS are not exactly equal to latitudes on the LHS.

out_ds["precip_0"] = (...).assign_coords(latitude=out_ds.latitude)

James A. Bednar
@jbednar
Yes, something like that!
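A minimal sketch of the failure mode Deepak describes (all names hypothetical): latitudes on the right-hand side differ from the target's by floating-point noise, so assignment aligns on the labels and fills non-matches with NaN; copying the target's coordinate first makes the labels exactly equal.

import numpy as np
import xarray as xr

lat = np.linspace(-90, 90, 5)
out_ds = xr.Dataset(coords={"latitude": lat})
# same grid, but the labels are off by floating-point noise
rhs = xr.DataArray(np.ones(5), dims="latitude",
                   coords={"latitude": lat + 1e-12})

out_ds["precip_0"] = rhs  # aligns on unequal labels -> all NaN
out_ds["precip_0"] = rhs.assign_coords(latitude=out_ds.latitude)  # exact labels -> values kept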
Thomas Diederen
@Patrickens_gitlab

dear all:

I have an xarray DataArray with multiple coordinates along a single dimension. In the example below, coords a and b are defined along dimension dim1. How would I group by two coordinates that are defined along the same dimension(s)? Unlike this question, I am not trying to group along different dimensions, but along a single one.

import xarray as xr

d = xr.DataArray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    coords={
        'a': ('dim1',['A', 'A', 'B', 'B']),
        'b': ('dim1',['1', '2', '1', '2']),
        'c': ('dim2',['x', 'y', 'z'])
    },
    dims=['dim1', 'dim2'])
d.groupby(['a','b']) # this gives: TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension

Or should a and b not be coords but xr.IndexVariable objects? And if so, how do you groupby then?

Andy Traumüller
@TheSwaine
@Patrickens_gitlab hi, as you can see in the error message, this is a TypeError. groupby is a Dataset function; have a look here http://xarray.pydata.org/en/stable/api.html
you have a DataArray, but you can convert via d = d.rename("foo").to_dataset()
Thomas Diederen
@Patrickens_gitlab
@TheSwaine Yes, I see. The problem is that I have many annotations along dimension(s) (e.g. a and b in the example above) and I would like to group by them in different ways.
My current solution is to construct a new coordinate myself, group by that, and delete it afterwards.
import numpy as np
import xarray as xr
from itertools import product

def test(xarr):
    return xarr / xarr.argmax(dim='dim2')

d = xr.DataArray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    coords={
        'a': ('dim1', ['A', 'A', 'B', 'B']),
        'b': ('dim1', ['1', '2', '1', '2']),
        'c': ('dim2', ['x', 'y', 'z'])
    },
    dims=['dim1', 'dim2'])

def construct_grouper(xarr, groups):
    # only works for coords that share the same dimensions
    old_groups = np.squeeze(np.dstack([xarr[group].values for group in groups]))
    dtypes = []
    for group in groups:
        dtype = xarr[group].dtype
        if dtype == object:
            dtype = xarr[group].astype('U').dtype
        dtypes.append((group, dtype.str))
    # structured array holding one (a, b, ...) key tuple per element
    new_groups = np.zeros(xarr[groups[0]].shape, dtype=dtypes)

    slicer = [range(n) for n in new_groups.shape]
    slicer.append([slice(None)])

    for combo in product(*slicer):
        new_groups[combo[:-1]] = tuple(old_groups[combo])  # TODO: maybe check whether np.vectorize can return an array of tuples
    return new_groups

def groupby_apply_same_dims(xarr, groups, func, func_kwargs=None):
    grouper = construct_grouper(xarr=xarr, groups=groups)
    xarr = xarr.assign_coords(coords={'grouper': (xarr[groups[0]].dims, grouper)})
    xarr = xarr.groupby('grouper', restore_coord_dims=False).apply(func, **(func_kwargs or {}))
    del xarr['grouper']
    return xarr

groupby_apply_same_dims(d, ['a', 'b'], test)
but this seems terribly inefficient
Now imagine dim1 has another extra coord d and I would like to group by a and d, or by b and d; and I also have many coords along dim2 that I would like to group by in different ways
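A possibly lighter-weight variant of the same idea (a sketch reusing d and test from the snippet above): build a single combined key along dim1 and pass it to groupby directly, since groupby accepts a DataArray as the group; no structured dtype is needed.

import xarray as xr

key = xr.DataArray(
    ["{}|{}".format(a, b) for a, b in zip(d["a"].values, d["b"].values)],
    dims="dim1", name="ab",
)
d.groupby(key).map(test)  # .apply(test) on older xarray versions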
Riley Brady
@bradyrx
I’m having some trouble with multi-level indexing and looking for a quick (and scalable) solution:
da = xr.DataArray(np.arange(3), dims=['x'], coords=[[1,2,3]])
>>> <xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x        (x) int64 1 2 3

# I want to do the following all at once. I.e., select different combinations of `x`
# over some additional dimension (e.g. `y`) which is duplicates of `x`
da[[0,0,1]]
da[[0,1,1]]

# Perhaps some form of multi-level indexing after creating the arbitrary dimension `y`?
da = da.expand_dims({'y': 2})
da
>>> <xarray.DataArray (y: 2, x: 3)>
array([[0, 1, 2],
       [0, 1, 2]])
Coordinates:
  * x        (x) int64 1 2 3
Dimensions without coordinates: y
I want to do this without using xr.concat because it’s quite slow; I’m envisioning doing this 500-1000 times, let’s say. This is for resampling with replacement: we have an arbitrary dimension like x here and need to take different combinations of x with replacement over many iterations (e.g. a bootstrap or resample dimension).
I’m thinking of using expand_dims and then vectorized/multi-level indexing to efficiently select different combinations of label x all at once, then applying the function to our resampled dataset. Any thoughts on this would help! The dask task graphs get really slow to build when we do this for each individual iteration and then concatenate everything at the end.
Riley Brady
@bradyrx

Here is a working example in numpy of what I’m aiming for:

member = np.arange(10)
time = np.arange(1900, 1990)
X = np.random.randn(len(member), len(time))
N = 100
idx = np.random.randint(0, member.size, (N, member.size))
X[idx].shape
>>> (100,10,90)

However, this doesn’t work for dask arrays or xarray DataArrays. Any idea on how to extend it to either? We anticipate working with/creating resampled arrays larger than memory so we don’t want to go the numpy route.

member = np.arange(10)
time = np.arange(1990, 1995)
X = xr.DataArray(np.random.randn(len(member), len(time)),
                 dims=['member', 'time'],
                 coords=[member, time])
N = 100
idx = np.random.randint(0, member.size, (N, member.size))
X[idx].shape
IndexError: Unlabeled multi-dimensional array cannot be used for indexing: member
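For reference, xarray's vectorized (pointwise) indexing can express this directly once the integer index is wrapped in a DataArray; a minimal sketch continuing the snippet above, with "sample" as a hypothetical dimension name:

# the indexer's dims replace 'member', giving (sample, member, time)
idx_da = xr.DataArray(idx, dims=["sample", "member"])
X[idx_da].shape  # equivalently: X.isel(member=idx_da)
>>> (100, 10, 5)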
Andrew
@IAteAnDrew1_twitter

The solution was to use xr.apply_ufunc:

import numpy as np
import xarray as xr

np.random.seed(100)
lon = np.arange(3)
lat = np.arange(4)
time = np.arange(5)
member = np.arange(6)
x = np.random.randn(len(member), len(time), len(lat), len(lon))

n = 7
idx = np.random.randint(0, member.size, (n, member.size))
x2 = x[idx].mean()  # want the same as this

x_da = xr.DataArray(
    x, dims=('member', 'time', 'lat', 'lon'),
    coords={'member': member, 'time': time, 'lat': lat, 'lon': lon}
)
idx_da = xr.DataArray(
    idx, dims=('samples', 'member'),
    coords={'samples': range(n), 'member': member}
)

def xr_broadcast(x, idx):
    return np.moveaxis(x.squeeze()[idx.squeeze().transpose()], 0, -1)

x2_da = xr.apply_ufunc(xr_broadcast, x_da, idx_da).mean()  # xarray version

x2 == x2_da

however, this breaks if one of the dimensions has size == 1, because of the squeeze

Andrew Tolmie
@DancingQuanta
Hi, is there an equivalent of pandas's get_loc for xarray?
Joe Hamman
@jhamman
ds.indexes['x'].get_loc('a')
ds.indexes[key] gives you the underlying pandas indexes.
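A tiny self-contained example of the pattern:

import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b", "c"]})
ds.indexes["x"].get_loc("a")  # -> 0, via the underlying pandas.Index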
Andrew Tolmie
@DancingQuanta
Oh thank you! I think this is not documented
Julia Signell
@jsignell
@DancingQuanta perhaps you want to propose a change to these docs: http://xarray.pydata.org/en/stable/indexing.html#underlying-indexes?
Andrew Tolmie
@DancingQuanta
Thanks, I missed that. I also have a plotting issue: data.plot(x='index', hue='segments', ax=ax2, add_legend=False, **marker_style) followed by handles, labels = ax2.get_legend_handles_labels() returns empty lists. I was trying to move the legend around.
What can I do to control the legend position?
Deepak Cherian
@dcherian
FYI pydata/xarray#3862 was just merged. If you are assigning to the .values or .data attribute of dimension coordinates, it will now raise an error. This assignment has been broken for a while (pydata/xarray#3470). Please use .assign_coords instead
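A minimal before/after sketch (ds and new_values hypothetical):

# previously (silently broken, now an error):
#     ds["x"].values = new_values
# instead:
ds = ds.assign_coords(x=new_values)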
Andrew Tolmie
@DancingQuanta
Good morning. Is it possible to overlay multiple FacetGrids? I have a scatter plot and a line plot that I wanted to overlay in a FacetGrid.
Deepak Cherian
@dcherian
You can plot to each individual axes using FacetGrid.axes, but not otherwise
Andrew Tolmie
@DancingQuanta
Hm, I tried hue for scatter plots but that creates a colorbar. Does FacetGrid allow for legends?

Also

g = data.plot.scatter(x='Bext', y='y1', row='I', col='direction')
ax = g.axes.flat[0]
handles, labels = ax.get_legend_handles_labels()

returns empty handles. Where do I get access to the data?

Deepak Cherian
@dcherian
You can force a legend instead of a colorbar by setting hue_style='discrete'. Additionally, the boolean kwarg add_guide can be used to prevent the display of a legend or colorbar (as appropriate).
from https://xarray.pydata.org/en/stable/plotting.html#datasets. Looks like we need a graphical example if you have the time to contribute one
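A minimal sketch of that legend behaviour, with hypothetical data and variable names:

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"y1": ("obs", np.random.randn(6)), "Bext": ("obs", np.arange(6.0))},
    coords={"segments": ("obs", ["a", "b", "c"] * 2)},
)
# discrete hue -> legend instead of colorbar
ds.plot.scatter(x="Bext", y="y1", hue="segments", hue_style="discrete")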
Akihiro Matsukawa
@amatsukawa

What is the easiest way to do df.asfreq in xarray?

In [10]: index = pd.to_datetime(["1991-01-01 10:00", "1991-01-02 10:00", "1991-01-03 10:00", "1991-01-04 10:00",])

In [11]: df = pd.DataFrame(range(4), index=index)

In [12]: df
Out[12]:
                     0
1991-01-01 10:00:00  0
1991-01-02 10:00:00  1
1991-01-03 10:00:00  2
1991-01-04 10:00:00  3

In [13]: df.asfreq("2D")
Out[13]:
                     0
1991-01-01 10:00:00  0
1991-01-03 10:00:00  2

Note that this is different from .resample("2D").asfreq() which would pin the timestamps to the day.

Right now, I'm just converting my time coordinate to a pd dataframe, doing the above in pandas, and then doing a .reindex using the new index that pandas computes. Wondering if there is a better way.
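One way to keep that reindex entirely in xarray (a sketch reusing index from the snippet above):

import pandas as pd
import xarray as xr

da = xr.DataArray(range(4), dims="time", coords={"time": index})
# every-2-days labels anchored on the first timestamp, like df.asfreq("2D")
target = pd.date_range(da.time.values[0], da.time.values[-1], freq="2D")
da.reindex(time=target)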
Christian O'Reilly
@christian-oreilly
Let's say I have a Dataset with a variable that has the dimensions subjects X epochs X channels. Subjects also have an age. I want the age values to be "attached" to the subject coordinate, so that if I drop subjects, or perform any kind of operation, I can still link my variable with the age without having to carry around an age lookup table that I always need to "align" with the subject coordinate. Ideally I would have added the age as a kind of "linked coordinate" that is attached to the subject coordinate and can be used in its place; a synonym coordinate, if you like. From what I've seen, there is no such thing in xarray. So I added the age as a supplementary variable in the dataset, sharing the subject coordinate. But this has side effects: for example, aggregating operations that I want to run on my main variable also get run on my secondary "age" variable, which I clearly don't want affected by these operations. It would work if there were something like Dataset.median(some_dim, ..., variable="my_variable") to tell xarray to apply a function like the median to only one of the variables of a Dataset, but it seems there is no such option either. Any idea how to deal with this use case in a clean way?
This is a relatively generic use case that I think arises in many situations where a coordinate identifies some entity (e.g., a subject) that has properties (e.g., age, gender, etc.) that we eventually correlate with the variable of interest...
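One workaround in the meantime, sketched with hypothetical names (my_variable, age, epochs): reduce only the main variable, then re-attach the untouched per-subject variable, which stays aligned through the shared subject coordinate.

# reduce just the main variable, leaving age alone
reduced = ds["my_variable"].median("epochs").to_dataset(name="my_variable")
reduced["age"] = ds["age"]  # still indexed by 'subjects', stays aligned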
Andrew Tolmie
@DancingQuanta
Morning, is there a way to make Dataset.plot.scatter display line plots?
James A. Bednar
@jbednar
Dataset.plot.line?
Deepak Cherian
@dcherian
you could hack it. scatter switches from scatter to plot when hue_style="discrete"; so do that and pass linestyle?
Davis Bennett
@d-v-b
does anyone here have experience using DataArrays for representing image pyramids?
I'm looking for a nice data structure to represent a collection of arrays sampled from the same space (i.e., same axis names) but at different levels of resolution. The constituent arrays can be DataArrays, but the collection can't be a Dataset, because it looks like a Dataset wants arrays that share dimension names to also share sizes.
Tim Crone
@tjcrone
This recent discussion of multiscale arrays with some references to Xarray may interest you: zarr-developers/zarr-specs#50
Davis Bennett
@d-v-b
yeah, i'm in that comment thread :)
Tim Crone
@tjcrone
Sorry!
Davis Bennett
@d-v-b
but there's no discussion in there of the right in-memory data structure for representing a multiresolution pyramid
basically a metadata-rich group of arrays in the same physical space, but at different resolutions.
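One lightweight in-memory option, sketched with hypothetical names: a plain dict keyed by pyramid level, where every level shares the same dim names but carries its own size and physical-coordinate spacing.

import numpy as np
import xarray as xr

base = 256
pyramid = {
    level: xr.DataArray(
        np.zeros((base // 2**level, base // 2**level)),
        dims=("y", "x"),
        coords={
            "y": np.arange(base // 2**level) * 2**level,
            "x": np.arange(base // 2**level) * 2**level,
        },
        attrs={"level": level},
    )
    for level in range(3)
}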