Windowed Computations
Contents

Windowed Computations#
Xarray has built-in support for windowed operations:
In this notebook, we’ll learn to
Compute rolling, or sliding window, means along one or more dimensions.
Compute block averages along a dimension.
Use
construct
to reshape arrays so that a new dimension provides windowed views to the data.
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
%config InlineBackend.figure_format='retina'
ds = xr.tutorial.load_dataset("ersstv5")
ds
<xarray.Dataset> Dimensions: (lat: 89, lon: 180, time: 624, nbnds: 2) Coordinates: * lat (lat) float32 88.0 86.0 84.0 82.0 ... -82.0 -84.0 -86.0 -88.0 * lon (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 352.0 354.0 356.0 358.0 * time (time) datetime64[ns] 1970-01-01 1970-02-01 ... 2021-12-01 Dimensions without coordinates: nbnds Data variables: time_bnds (time, nbnds) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36 sst (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan Attributes: (12/37) climatology: Climatology is based on 1971-2000 SST, Xue, Y.... description: In situ data: ICOADS2.5 before 2007 and NCEP i... keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sci... keywords: Earth Science > Oceans > Ocean Temperature > S... instrument: Conventional thermometers source_comment: SSTs were observed by conventional thermometer... ... ... creator_url_original: https://www.ncei.noaa.gov license: No constraints on data access or use comment: SSTs were observed by conventional thermometer... summary: ERSST.v5 is developed based on v4 after revisi... dataset_title: NOAA Extended Reconstructed SST V5 data_modified: 2022-06-07
Rolling or moving windows#
Rolling window operations
can be applied along any dimension, or along multiple dimensions.
returns object of same shape as input
pads with NaNs to make (3) possible
Again, all common reduction operations are available
rolling = ds.rolling(time=12, center=True)
rolling
DatasetRolling [time->12(center)]
ds_rolling = rolling.mean()
ds_rolling
<xarray.Dataset> Dimensions: (time: 624, nbnds: 2, lat: 89, lon: 180) Coordinates: * time (time) datetime64[ns] 1970-01-01 1970-02-01 ... 2021-12-01 * lat (lat) float32 88.0 86.0 84.0 82.0 ... -82.0 -84.0 -86.0 -88.0 * lon (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 352.0 354.0 356.0 358.0 Dimensions without coordinates: nbnds Data variables: time_bnds (time, nbnds) float64 nan nan nan nan nan ... nan nan nan nan nan sst (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan Attributes: (12/37) climatology: Climatology is based on 1971-2000 SST, Xue, Y.... description: In situ data: ICOADS2.5 before 2007 and NCEP i... keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sci... keywords: Earth Science > Oceans > Ocean Temperature > S... instrument: Conventional thermometers source_comment: SSTs were observed by conventional thermometer... ... ... creator_url_original: https://www.ncei.noaa.gov license: No constraints on data access or use comment: SSTs were observed by conventional thermometer... summary: ERSST.v5 is developed based on v4 after revisi... dataset_title: NOAA Extended Reconstructed SST V5 data_modified: 2022-06-07
ds.sst.sel(lon=300, lat=50).plot(label="monthly anom")
ds_rolling.sst.sel(lon=300, lat=50).plot(label="12 month rolling mean")
plt.legend()
<matplotlib.legend.Legend at 0x7f5fdfd48d30>

We can apply rolling mean along multiple dimensions as a 2D smoother in (lat, lon). Here is an example of a 5-point running mean applied along both the lat
and lon
dimensions
extract = ds.sst.isel(time=0)
smoothed = extract.rolling(lon=5, lat=5, center=True).mean()
f, ax = plt.subplots(2, 1, sharex=True, sharey=True)
extract.plot(ax=ax[0], robust=True)
smoothed.plot(ax=ax[1], robust=True)
f.set_size_inches((10, 7))
plt.tight_layout()

Note the addition of NaNs at the data boundaries and near continental boundaries.
Custom reductions#
While common reductions are implemented by default, sometimes it is useful to apply our own windowed operations. For these uses, Xarray provides the construct
methods for DataArray.rolling and Dataset.rolling.
For rolling over a dimension time
with a window size N
, construct
adds a new dimension (with user-provided name) of size N
.
We illustrate with a simple example array:
simple = xr.DataArray(np.arange(10), dims="time", coords={"time": np.arange(10)})
simple
<xarray.DataArray (time: 10)> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9
We call construct
and provide a name for the new dimension: window
# adds a new dimension "window"
simple.rolling(time=5, center=True).construct("window")
<xarray.DataArray (time: 10, window: 5)> array([[nan, nan, 0., 1., 2.], [nan, 0., 1., 2., 3.], [ 0., 1., 2., 3., 4.], [ 1., 2., 3., 4., 5.], [ 2., 3., 4., 5., 6.], [ 3., 4., 5., 6., 7.], [ 4., 5., 6., 7., 8.], [ 5., 6., 7., 8., 9.], [ 6., 7., 8., 9., nan], [ 7., 8., 9., nan, nan]]) Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9 Dimensions without coordinates: window
Exercise#
Illustrate the difference between center=True
and center=False
for rolling by looking at the construct
-ed array.
display("center=True")
display(simple.rolling(time=5, center=True).construct("window"))
display("center=False")
display(simple.rolling(time=5, center=False).construct("window"))
'center=True'
<xarray.DataArray (time: 10, window: 5)> array([[nan, nan, 0., 1., 2.], [nan, 0., 1., 2., 3.], [ 0., 1., 2., 3., 4.], [ 1., 2., 3., 4., 5.], [ 2., 3., 4., 5., 6.], [ 3., 4., 5., 6., 7.], [ 4., 5., 6., 7., 8.], [ 5., 6., 7., 8., 9.], [ 6., 7., 8., 9., nan], [ 7., 8., 9., nan, nan]]) Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9 Dimensions without coordinates: window
'center=False'
<xarray.DataArray (time: 10, window: 5)> array([[nan, nan, nan, nan, 0.], [nan, nan, nan, 0., 1.], [nan, nan, 0., 1., 2.], [nan, 0., 1., 2., 3.], [ 0., 1., 2., 3., 4.], [ 1., 2., 3., 4., 5.], [ 2., 3., 4., 5., 6.], [ 3., 4., 5., 6., 7.], [ 4., 5., 6., 7., 8.], [ 5., 6., 7., 8., 9.]]) Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9 Dimensions without coordinates: window
Coarsening#
coarsen
does something similar to rolling
, but allows us to work with discrete non-overlapping blocks of data.
You will need to specify boundary
if the length of the dimension is not a multiple of the window size (“block size”). You can choose to
trim
the excess valuespad
with NaNs
Again, all standard reductions are implemented.
coarse = ds.coarsen(lon=5, lat=5)
coarse
DatasetCoarsen [windows->{'lon': 5, 'lat': 5},side->left]
# we expect an error here because lat has size 89, which is not divisible by block size 5
coarse.mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[11], line 2
1 # we expect an error here because lat has size 89, which is not divisible by block size 5
----> 2 coarse.mean()
File /usr/share/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/xarray/core/rolling.py:1106, in DatasetCoarsen._reduce_method.<locals>.wrapped_func(self, keep_attrs, **kwargs)
1104 reduced = {}
1105 for key, da in self.obj.data_vars.items():
-> 1106 reduced[key] = da.variable.coarsen(
1107 self.windows,
1108 func,
1109 self.boundary,
1110 self.side,
1111 keep_attrs=keep_attrs,
1112 **kwargs,
1113 )
1115 coords = {}
1116 for c, v in self.obj.coords.items():
1117 # variable.coarsen returns variables not containing the window dims
1118 # unchanged (maybe removes attrs)
File /usr/share/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/xarray/core/variable.py:2449, in Variable.coarsen(self, windows, func, boundary, side, keep_attrs, **kwargs)
2446 if not windows:
2447 return self._replace(attrs=_attrs)
-> 2449 reshaped, axes = self.coarsen_reshape(windows, boundary, side)
2450 if isinstance(func, str):
2451 name = func
File /usr/share/miniconda3/envs/xarray-tutorial/lib/python3.10/site-packages/xarray/core/variable.py:2485, in Variable.coarsen_reshape(self, windows, boundary, side)
2483 if boundary[d] == "exact":
2484 if n * window != size:
-> 2485 raise ValueError(
2486 f"Could not coarsen a dimension of size {size} with "
2487 f"window {window} and boundary='exact'. Try a different 'boundary' option."
2488 )
2489 elif boundary[d] == "trim":
2490 if side[d] == "left":
ValueError: Could not coarsen a dimension of size 89 with window 5 and boundary='exact'. Try a different 'boundary' option.
coarse = ds.coarsen(lat=5, lon=5, boundary="trim").mean()
coarse
<xarray.Dataset> Dimensions: (time: 624, nbnds: 2, lat: 17, lon: 36) Coordinates: * lat (lat) float32 84.0 74.0 64.0 54.0 ... -46.0 -56.0 -66.0 -76.0 * lon (lon) float32 4.0 14.0 24.0 34.0 44.0 ... 324.0 334.0 344.0 354.0 * time (time) datetime64[ns] 1970-01-01 1970-02-01 ... 2021-12-01 Dimensions without coordinates: nbnds Data variables: time_bnds (time, nbnds) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36 sst (time, lat, lon) float32 -1.757 -1.78 -1.8 ... -1.716 -1.685 nan Attributes: (12/37) climatology: Climatology is based on 1971-2000 SST, Xue, Y.... description: In situ data: ICOADS2.5 before 2007 and NCEP i... keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sci... keywords: Earth Science > Oceans > Ocean Temperature > S... instrument: Conventional thermometers source_comment: SSTs were observed by conventional thermometer... ... ... creator_url_original: https://www.ncei.noaa.gov license: No constraints on data access or use comment: SSTs were observed by conventional thermometer... summary: ERSST.v5 is developed based on v4 after revisi... dataset_title: NOAA Extended Reconstructed SST V5 data_modified: 2022-06-07
coarse.sst.isel(time=0).plot();

Custom reductions#
Like rolling
, coarsen
also provides a construct
method for custom block operations.
Tip coarsen.construct
is a handy way to reshape Xarray objects.
Consider a “monthly” 1D timeseries. This simple example has one value per month for 2 years
months = xr.DataArray(
np.tile(np.arange(1, 13), reps=2),
dims="time",
coords={"time": np.arange(1, 25)},
)
months
<xarray.DataArray (time: 24)> array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) Coordinates: * time (time) int64 1 2 3 4 5 6 7 8 9 10 ... 15 16 17 18 19 20 21 22 23 24
Now we reshape to get one new dimension year
of size 12.
# break "time" into two new dimensions: "year", "month"
months.coarsen(time=12).construct(time=("year", "month"))
<xarray.DataArray (year: 2, month: 12)> array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]) Coordinates: time (year, month) int64 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24 Dimensions without coordinates: year, month
Exercise#
Imagine the array months
was one element shorter. Use boundary="pad"
and the side
kwarg to reshape months.isel(time=slice(1, None))
to a 2D DataArray with the following values:
array([[nan, 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.],
[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.]])
months.isel(time=slice(1, None)).coarsen({"time": 12}, boundary="pad", side="right").construct(
time=("year", "month")
)
<xarray.DataArray (year: 2, month: 12)> array([[nan, 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.], [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.]]) Coordinates: time (year, month) float64 nan 2.0 3.0 4.0 5.0 ... 21.0 22.0 23.0 24.0 Dimensions without coordinates: year, month
Note that coarsen
pads with NaNs. For more control over paddnig, use
DataArray.pad explicitly.
Going further#
Follow the tutorial on high-level computational patterns