
not sure why that throws a memory error though

If I plot and add the values “by hand”, I don’t get missing value stripes, but if I do it in an assignment, they appear. What am I doing wrong?

dear all:

I have an xarray with multiple coordinates along a single dimension. In the example below, coords `a` and `b` are defined along dimension `dim1`. How would I `groupby` using two coordinates that are defined along the same dimension(s)? Unlike this question, I am not trying to group along different dimensions, but a single one.

```
import xarray as xr
d = xr.DataArray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                 coords={
                     'a': ('dim1', ['A', 'A', 'B', 'B']),
                     'b': ('dim1', ['1', '2', '1', '2']),
                     'c': ('dim2', ['x', 'y', 'z'])
                 },
                 dims=['dim1', 'dim2'])
d.groupby(['a','b']) # this gives: TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension
```

Or should `a` and `b` not be coords but `xr.IndexVariables`? And if so, how do you `groupby` then?

@Patrickens_gitlab hi, as you can see in the error message, this is a TypeError. Groupby is a dataset function; have a look here http://xarray.pydata.org/en/stable/api.html

You have a DataArray, but you can convert it via `d = d.rename("foo").to_dataset()`

My current solution is to construct a new coordinate myself, and then to groupby that and later delete it.

```
from itertools import product

import numpy as np
import xarray as xr


def test(xarr):
    return xarr / xarr.argmax(dim='dim2')


d = xr.DataArray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                 coords={
                     'a': ('dim1', ['A', 'A', 'B', 'B']),
                     'b': ('dim1', ['1', '2', '1', '2']),
                     'c': ('dim2', ['x', 'y', 'z'])
                 },
                 dims=['dim1', 'dim2'])


def construct_grouper(xarr, groups):
    # only works for coords that have the same dimensions
    old_groups = np.squeeze(np.dstack([xarr[group].values for group in groups]))
    # build a structured dtype with one field per grouping coordinate
    dtypes = []
    for group in groups:
        dtype = xarr[group].dtype
        if dtype == object:
            dtype = xarr[group].astype('U').dtype
        dtypes.append((group, dtype.str))
    new_groups = np.zeros(xarr[groups[0]].shape, dtype=dtypes)
    slicer = [range(n) for n in new_groups.shape]
    slicer.append([slice(None)])
    for combo in product(*slicer):
        new_groups[combo[:-1]] = tuple(old_groups[combo])  # TODO: maybe look whether np.vectorize returns array of tuples
    return new_groups


def groupby_apply_same_dims(xarr, groups, func, func_kwargs={}):
    grouper = construct_grouper(xarr=xarr, groups=groups)
    # temporary coordinate holding the combined key; removed again below
    xarr = xarr.assign_coords(coords={'grouper': (xarr[groups[0]].dims, grouper)})
    xarr = xarr.groupby('grouper', restore_coord_dims=False).apply(func)
    del xarr['grouper']
    return xarr


groupby_apply_same_dims(d, ['a', 'b'], test)
```

but this seems terribly inefficient

now imagine `dim1` has another coord `d` and I would like to groupby `a` and `d` or `b` and `d`, and I also have many coords that I would like to groupby in different ways along `dim2`
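A lighter-weight variant of the same workaround, as a rough sketch rather than a recommended API: join the labels of the chosen coordinates into one string key per position and group by that. The helper name `add_combined_key`, the coordinate name `combined`, and the `_` separator are made up for illustration, and `b` is changed slightly from the example above so that one group has two members. It assumes all grouping coords are 1-D, share the same dimension, and never contain the separator.

```python
import xarray as xr

d = xr.DataArray(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    coords={
        "a": ("dim1", ["A", "A", "B", "B"]),
        "b": ("dim1", ["1", "1", "1", "2"]),  # modified so that (A, 1) occurs twice
        "c": ("dim2", ["x", "y", "z"]),
    },
    dims=["dim1", "dim2"],
)


def add_combined_key(xarr, groups, name="combined", sep="_"):
    """Attach a string coordinate that joins the labels of `groups` (hypothetical helper)."""
    columns = [xarr[g].values.astype(str) for g in groups]
    key = [sep.join(parts) for parts in zip(*columns)]
    return xarr.assign_coords({name: (xarr[groups[0]].dims, key)})


# Group by any pair of coords on dim1 by passing a different list of names.
summed = add_combined_key(d, ["a", "b"]).groupby("combined").sum("dim1")
print(summed)
```

Choosing a different pair (say `a` and a further coord `d`) only changes the list passed to the helper, so nothing has to be rebuilt by hand per combination.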

I’m having some trouble with multi-level indexing and am looking for a quick (and scalable) solution:

```
da = xr.DataArray(np.arange(3), dims=['x'], coords=[[1,2,3]])
>>> <xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
* x (x) int64 1 2 3
# I want to do the following all at once. I.e., select different combinations of `x`
# over some additional dimension (e.g. `y`) which is duplicates of `x`
da[[0,0,1]]
da[[0,1,1]]
# Perhaps some form of multi-level indexing after creating the arbitrary dimension `y`?
da = da.expand_dims({'y': 2})
da
>>> <xarray.DataArray (y: 2, x: 3)>
array([[0, 1, 2],
[0, 1, 2]])
Coordinates:
* x (x) int64 1 2 3
Dimensions without coordinates: y
```

I want to do this without using `xr.concat` because it’s quite slow. I’m envisioning doing this 500-1000 times, let’s say. This is to be applied to resampling with replacement. We have an arbitrary dimension like `x` here and need to take different combinations of `x` with replacement over many iterations (e.g. a `bootstrap` or `resample` dimension).
I’m thinking of using `expand_dims` and then using vectorized/multi-level indexing to efficiently select different combinations of label `x` all at once, then applying the function to our resampled dataset. Any thoughts on this would help! Our `dask` task graphs are really slow to build when we do all this for each individual iteration and then concatenate it all at the end.
Here is a working example in `numpy` of what I’m aiming for:

```
import numpy as np
member = np.arange(10)
time = np.arange(1900, 1990)
X = np.random.randn(len(member), len(time))
N = 100
idx = np.random.randint(0, member.size, (N, member.size))
X[idx].shape
>>> (100,10,90)
```

However, this doesn’t work for `dask` arrays or `xarray` DataArrays. Any idea on how to extend it to either? We anticipate working with/creating resampled arrays larger than memory, so we don’t want to go the `numpy` route.

```
member = np.arange(10)
time = np.arange(1990, 1995)
X = xr.DataArray(np.random.randn(len(member), len(time)),
dims=['member', 'time'],
coords=[member, time])
N = 100
idx = np.random.randint(0, member.size, (N, member.size))
X[idx].shape
```

`IndexError: Unlabeled multi-dimensional array cannot be used for indexing: member`

The solution was to use xr.apply_ufunc:

```
import numpy as np
import xarray as xr
np.random.seed(100)
lon = np.arange(3)
lat = np.arange(4)
time = np.arange(5)
member = np.arange(6)
x = np.random.randn(len(member), len(time), len(lat), len(lon))
n = 7
idx = np.random.randint(0, member.size, (n, member.size))
x2 = x[idx].mean() # want the same as this
x_da = xr.DataArray(
x, dims=('member', 'time', 'lat', 'lon'),
coords={'member': member, 'time': time, 'lat': lat, 'lon': lon}
)
idx_da = xr.DataArray(
idx, dims=('samples', 'member'),
coords=({'samples': range(n), 'member': member})
)
def xr_broadcast(x, idx):
return np.moveaxis(x.squeeze()[idx.squeeze().transpose()], 0, -1)
x2_da = xr.apply_ufunc(xr_broadcast, x_da, idx_da).mean() # xarray version
x2 == x2_da
```

however, this breaks if any of the dimensions has size 1, because the `squeeze` drops it
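A possible alternative that avoids `squeeze`, sketched under assumptions: xarray's vectorized indexing accepts a *labeled* indexer, so wrapping `idx` in a DataArray with named dimensions sidesteps the "Unlabeled multi-dimensional array" error. The dimension names `iteration` and `draw` are invented here; `draw` is deliberately not called `member`, to avoid clashing with the dimension being indexed. How well this behaves on dask-backed arrays is not checked in this sketch.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
member = np.arange(10)
time = np.arange(1990, 1995)
X = xr.DataArray(
    rng.standard_normal((member.size, time.size)),
    dims=["member", "time"],
    coords=[member, time],
)

N = 100
idx = rng.integers(0, member.size, (N, member.size))

# Label the indexer: one row of member positions per bootstrap iteration.
idx_da = xr.DataArray(idx, dims=["iteration", "draw"])

# Vectorized (pointwise) selection along "member" for all iterations at once.
resampled = X.isel(member=idx_da)
print(dict(resampled.sizes))
```

The original `member` labels travel along as a 2-D coordinate on (`iteration`, `draw`), so each resampled value can still be traced back to the member it came from.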

`ds.indexes[key]` gives you the underlying pandas indexes.
@DancingQuanta perhaps you want to propose a change to these docs: http://xarray.pydata.org/en/stable/indexing.html#underlying-indexes?
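A tiny illustration of the `ds.indexes[key]` remark, using a made-up dataset (the names `v` and `x` are placeholders):

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({"v": ("x", [10, 20, 30])}, coords={"x": [1, 2, 3]})

# The index behind the dimension coordinate "x" is a pandas object.
index = ds.indexes["x"]
print(isinstance(index, pd.Index))

# Regular pandas index methods are available on it, e.g. label -> position.
position = index.get_loc(2)
print(position)
```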

What can I do to control legend position?

FYI pydata/xarray#3862 was just merged. If you are assigning to the `.values` or `.data` attribute of *dimension coordinates*, it will now raise an error. This assignment has been broken for a while (pydata/xarray#3470). Please use `.assign_coords` instead
Also

```
g = data.plot.scatter(x='Bext', y='y1', row='I', col='direction')
ax = g.axes.flat[0]
handles, labels = ax.get_legend_handles_labels()
```

shows empty handles. Where do I get access to data?

`You can force a legend instead of a colorbar by setting hue_style='discrete'. Additionally, the boolean kwarg add_guide can be used to prevent the display of a legend or colorbar (as appropriate).`

from https://xarray.pydata.org/en/stable/plotting.html#datasets. Looks like we need a graphical example if you have the time to contribute one
What is the easiest way to do `df.asfreq` in xarray?

```
In [10]: index = pd.to_datetime(["1991-01-01 10:00", "1991-01-02 10:00", "1991-01-03 10:00", "1991-01-04 10:00",])
In [11]: df = pd.DataFrame(range(4), index=index)
In [12]: df
Out[12]:
0
1991-01-01 10:00:00 0
1991-01-02 10:00:00 1
1991-01-03 10:00:00 2
1991-01-04 10:00:00 3
In [13]: df.asfreq("2D")
Out[13]:
0
1991-01-01 10:00:00 0
1991-01-03 10:00:00 2
```

Note that this is different from `.resample("2D").asfreq()`, which would pin the timestamps to the day.

Right now, I'm just converting my time coordinate to a pandas DataFrame, doing the above in pandas, and then doing a `.reindex` using the new index that pandas computes. Wondering if there is a better way.
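One way to skip the DataFrame round trip, as a sketch only: build the target index directly with `pd.date_range` from the first to the last timestamp and reindex onto it. This mirrors the workaround described above rather than any dedicated xarray method; the variable names are illustrative.

```python
import pandas as pd
import xarray as xr

index = pd.to_datetime(
    ["1991-01-01 10:00", "1991-01-02 10:00", "1991-01-03 10:00", "1991-01-04 10:00"]
)
da = xr.DataArray(list(range(4)), dims="time", coords={"time": index})

# Same spacing as df.asfreq("2D"): start at the first timestamp, keep its time of day.
target = pd.date_range(index[0], index[-1], freq="2D")

# Labels missing from the original index would be filled with NaN by reindex.
out = da.reindex(time=target)
print(out)
```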
Let's say I have a Dataset with a variable which has the dimensions subjects X epochs X channels. Subjects also have an age. I want the age values to be "attached" to the subject coordinate, so that if I drop subjects, or perform any kind of operation, I can still link my variable with the age without having to carry around an age lookup table that I always need to "align" with the subject coordinate. Ideally I would have added the age as a kind of "linked coordinate" that is attached to the subject coordinate and can be used in place of the subject, a kind of synonym coordinate if you want. From what I have seen, there is no such thing in xarray. So I added the age as a supplementary variable in the dataset, sharing the subject coordinate. But this has side effects. For example, aggregating operations that I want to run on my main variable also get run on my secondary "age" variable, which I clearly don't want to be affected by these operations. That would work if there were something like Dataset.median(some_dim, ..., variable="my_variable") to tell xarray to apply a function like the median to only one of the variables of a Dataset, but it seems that this is not possible either. Any idea how to deal with this use case in a clean way?

This is a relatively generic use case that I think is of use for many situations when some coordinate is identifying some identity (e.g., a subject) which has some properties (e.g., age, gender, etc.) that we eventually correlate with the variable of interest...
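One thing that may be worth trying for this use case: xarray supports non-dimension coordinates, i.e. extra coordinates defined along an existing dimension. A small sketch with invented names and values (`signal`, `s1`..`s3`, the ages) shows `age` stored as a coordinate along `subject`, where reductions over other dimensions leave it untouched and selections keep it aligned:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"signal": (("subject", "epoch", "channel"), rng.normal(size=(3, 4, 2)))},
    coords={
        "subject": ["s1", "s2", "s3"],
        "age": ("subject", [25, 31, 47]),  # non-dimension coordinate on "subject"
    },
)

# Aggregating over epochs only touches data variables; "age" is carried along.
med = ds.median("epoch")

# Dropping a subject keeps the matching ages aligned automatically.
subset = med.sel(subject=["s1", "s3"])
print(subset["age"].values)

# To reduce a single variable, select it first and reduce the DataArray.
signal_median = ds["signal"].median("epoch")
```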

I'm looking for a nice data structure to represent a collection of arrays that represent data sampled from the same space (i.e., same axis names) but at different levels of resolution. The constituent arrays can be DataArrays, but the collection can't be a `Dataset`, because it looks like `Dataset` wants all the arrays to have different dimensions
This recent discussion of multiscale arrays with some references to Xarray may interest you: zarr-developers/zarr-specs#50

basically a metadata-ful group of arrays in the same physical space, but at different resolutions.