High-level computational patterns
Contents
High-level computational patterns#
Motivation / Learning goals#
From https://toolz.readthedocs.io/en/latest/control.html
The Toolz library contains dozens of patterns like map and groupby. Learning a core set (maybe a dozen) covers the vast majority of common programming tasks often done by hand. A rich vocabulary of core control functions conveys the following benefits:
You identify new patterns
You make fewer errors in rote coding
You can depend on well tested and benchmarked implementations
The same is true for xarray
Xarray’s high-level patterns#
Xarray allows you to leverage dataset metadata to write more readable analysis code. The metadata is stored with the data; not in your head.
Dimension names:
dim="latitude"
instead ofaxis=0
Coordinate “labels”: or axis tick labels.
data.sel(latitude=45)
instead ofdata[10]
Xarray also provides high-level computational patterns that cover many data analysis tasks.
rolling
: Operate on rolling windows of your data e.g. running meancoarsen
: Downsample your datagroupby
: Bin data in to groups and reducegroupby_bins
: GroupBy after discretizing a numeric variable.resample
: Groupby specialized for time axes. Either downsample or upsample your data.weighted
: Weight your data before reducing
Load example dataset#
import numpy as np
import xarray as xr
xr.set_options(keep_attrs=True, display_expand_data=False)
da = xr.tutorial.load_dataset("air_temperature", engine="netcdf4").air
monthly = da.resample(time="M").mean()
data = da.isel(time=0)
data.plot()
<matplotlib.collections.QuadMesh at 0x7f02f8ea1720>

Concept: “index space” vs “label space”#
These are windowed operations with a window of a fixed size.
rolling
: sliding window operations e.g. running meancoarsen
: decimating; reshaping
data
<xarray.DataArray 'air' (lat: 25, lon: 53)> 241.2 242.5 243.5 244.0 244.1 243.9 ... 298.0 297.8 297.6 296.9 296.8 296.6 Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
# index space
data[10, :] # 10th element along the first axis; ¯\_(ツ)_/¯
<xarray.DataArray 'air' (lon: 53)> 277.3 277.4 277.8 278.6 279.5 280.1 ... 280.5 282.9 284.7 286.1 286.9 286.6 Coordinates: lat float32 50.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
# slightly better index space
data.isel(lat=10) # slightly better, 10th element in latitude
<xarray.DataArray 'air' (lon: 53)> 277.3 277.4 277.8 278.6 279.5 280.1 ... 280.5 282.9 284.7 286.1 286.9 286.6 Coordinates: lat float32 50.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
# "label" space
data.sel(lat=50) # much better! lat=50°N
<xarray.DataArray 'air' (lon: 53)> 277.3 277.4 277.8 278.6 279.5 280.1 ... 280.5 282.9 284.7 286.1 286.9 286.6 Coordinates: lat float32 50.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
# What I wanted to do
data.sel(lat=50)
# What I had to do (if I wasn't using xarray)
data[10, :]
<xarray.DataArray 'air' (lon: 53)> 277.3 277.4 277.8 278.6 279.5 280.1 ... 280.5 282.9 284.7 286.1 286.9 286.6 Coordinates: lat float32 50.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
Xarray provides high-level patterns in both “index space” and “label space”#
Index space#
rolling
: Operate on rolling windows of your data e.g. running meancoarsen
: Downsample your data
Label space#
groupby
: Bin data in to groups and reducegroupby_bins
: GroupBy after discretizing a numeric variable.resample
: Groupby specialized for time axes. Either downsample or upsample your data.
Index space: windows of fixed width#
Sliding windows of fixed length: rolling
#
returns object of same shape as input
pads with NaNs to make this happen
supports multiple dimensions
Here’s the dataset
data.plot()
<matplotlib.collections.QuadMesh at 0x7f02f0e4fc70>

And now smoothed 5 point running mean in lat and lon
data.rolling(lat=5, lon=5, center=True).mean().plot()
<matplotlib.collections.QuadMesh at 0x7f02f0cb2440>

Apply an existing numpy-only function with reduce
#
Tip: The reduce
method expects a function that can receive and return plain
arrays (e.g. numpy). The map
method expects a function that can receive and
return Xarray objects.
Here’s an example function: np.mean
Exercise Calculate the rolling mean in 5 point bins along both latitude and
longitude using
rolling(...).reduce
Answer:
# exactly equivalent to data.rolling(...).mean()
data.rolling(lat=5, lon=5, center=True).reduce(np.mean).plot()
<matplotlib.collections.QuadMesh at 0x7f02f0c548e0>

For more complicated analysis, construct a new array with a new dimension.#
Allows things like short-time fourier transform, spectrogram, windowed rolling etc.
simple = xr.DataArray(np.arange(10), dims="time", coords={"time": np.arange(10)})
simple
<xarray.DataArray (time: 10)> 0 1 2 3 4 5 6 7 8 9 Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9
# adds a new dimension "window"
simple.rolling(time=5, center=True).construct("window")
<xarray.DataArray (time: 10, window: 5)> nan nan 0.0 1.0 2.0 nan 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 nan 7.0 8.0 9.0 nan nan Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9 Dimensions without coordinates: window
Exercise Calculate the 5 point running mean in time using
rolling.construct
Answer
(simple.rolling(time=5, center=True).construct("window").mean("window"))
<xarray.DataArray (time: 10)> 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 7.5 8.0 Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9
construct
is clever.
It constructs a view of the original array, so it is memory-efficient. but you didn’t have to know that.
It does something sensible for dask arrays (though generally you want big chunksizes for the dimension you’re sliding along).
It also works with rolling along multiple dimensions!
Advanced: Another construct
example#
This is a 2D rolling example; we need to provide two new dimension names
(data.rolling(lat=5, lon=5, center=True).construct(lat="lat_roll", lon="lon_roll"))
<xarray.DataArray 'air' (lat: 25, lon: 53, lat_roll: 5, lon_roll: 5)> nan nan nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Dimensions without coordinates: lat_roll, lon_roll Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
Block windows of fixed length: coarsen
#
For non-overlapping windows or “blocks” use coarsen
. The syntax is very
similar to rolling
. You will need to specify boundary
if the length of the
dimension is not a multiple of the block size
data
<xarray.DataArray 'air' (lat: 25, lon: 53)> 241.2 242.5 243.5 244.0 244.1 243.9 ... 298.0 297.8 297.6 296.9 296.8 296.6 Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
data.plot()
<matplotlib.collections.QuadMesh at 0x7f02f0b327a0>

data.coarsen(lat=5, lon=5, boundary="trim").std()
<xarray.DataArray 'air' (lat: 5, lon: 10)> 12.73 12.19 11.7 6.48 2.827 2.738 2.491 ... 4.97 1.925 1.98 1.824 1.17 0.9438 Coordinates: * lat (lat) float32 70.0 57.5 45.0 32.5 20.0 * lon (lon) float32 205.0 217.5 230.0 242.5 ... 280.0 292.5 305.0 317.5 time datetime64[ns] 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
(data.coarsen(lat=5, lon=5, boundary="trim").mean().plot())
<matplotlib.collections.QuadMesh at 0x7f02f0a0bd60>

coarsen supports reduce
for custom reductions#
Exercise Use coarsen.reduce
to apply np.mean
in 5x5 (latxlon) point
blocks of data
Answer
(data.coarsen(lat=5, lon=5, boundary="trim").reduce(np.mean).plot())
<matplotlib.collections.QuadMesh at 0x7f02f0901b70>

coarsen supports construct
for block reshaping#
This is usually a good alternative to np.reshape
A simple example splits a 2-year long monthly 1D time series into a 2D array shaped (year x month)
months = xr.DataArray(
np.tile(np.arange(1, 13), reps=2),
dims="time",
coords={"time": np.arange(1, 25)},
)
months
<xarray.DataArray (time: 24)> 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Coordinates: * time (time) int64 1 2 3 4 5 6 7 8 9 10 ... 15 16 17 18 19 20 21 22 23 24
# break "time" into two new dimensions: "year", "month"
months.coarsen(time=12).construct(time=("year", "month"))
<xarray.DataArray (year: 2, month: 12)> 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Coordinates: time (year, month) int64 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24 Dimensions without coordinates: year, month
Note two things:
The
time
dimension was also reshaped.The new dimensions
year
andmonth
don’t have any coordinate labels associated with them.
What if the data had say 23 instead of 24 values? In that case we specify a
different boundary
, here we pad to 24 values.
months.isel(time=slice(1, None)).coarsen(time=12, boundary="pad").construct(time=("year", "month"))
<xarray.DataArray (year: 2, month: 12)> 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ... 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 nan Coordinates: time (year, month) float64 2.0 3.0 4.0 5.0 6.0 ... 22.0 23.0 24.0 nan Dimensions without coordinates: year, month
This ends up adding values at the end of the array, not so sensible for this
problem. We have some control of the padding through the side
kwarg to coarsen
. For side="right"
we get more sensible output.
months.isel(time=slice(1, None)).coarsen(time=12, boundary="pad", side="right").construct(
time=("year", "month")
)
<xarray.DataArray (year: 2, month: 12)> nan 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ... 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 Coordinates: time (year, month) float64 nan 2.0 3.0 4.0 5.0 ... 21.0 22.0 23.0 24.0 Dimensions without coordinates: year, month
Note that coarsen
pads with NaNs. For more control over paddnig, use
DataArray.pad explicitly.
(
months.isel(time=slice(1, None))
.pad(time=(1, 0), constant_values=-1)
.coarsen(time=12)
.construct(time=("year", "month"))
)
<xarray.DataArray (year: 2, month: 12)> -1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Coordinates: time (year, month) float64 nan 2.0 3.0 4.0 5.0 ... 21.0 22.0 23.0 24.0 Dimensions without coordinates: year, month
Exercise Reshape the time
dimension of the DataArray monthly
to year x
month and visualize the seasonal cycle for two years at 250°E
Answer
# splits time dimension into year x month
year_month = monthly.coarsen(time=12).construct(time=("year", "month"))
# assign a nice coordinate value for month
year_month["month"] = [
"jan",
"feb",
"mar",
"apr",
"may",
"jun",
"jul",
"aug",
"sep",
"oct",
"nov",
"dec",
]
# assign a nice coordinate value for year
year_month["year"] = [2013, 2014]
# seasonal cycle for two years
year_month.sel(lon=250).plot.contourf(col="year", x="month", y="lat")
<xarray.plot.facetgrid.FacetGrid at 0x7f02f0794b50>

This exercise came up during the live lecture.
Exercise Calculate the rolling 4 month average, averaged across years.
Answer
We first reshape using
coarsen.construct
to addyear
as a new dimension.Then
rolling
on the month dimension.It turns out that
roll.mean(["year", "month"])
doesn’t work. So we useroll.construct
to get a DataArray with a new dimensionwindow
and then take the mean overwindow
andyear
reshaped = months.coarsen(time=12).construct(time=("year", "month"))
roll = reshaped.rolling(month=4, center=True)
roll.construct("window").mean(["window", "year"])
<xarray.DataArray (month: 12)> 1.5 2.0 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.0 Dimensions without coordinates: month
Index space summary#
Use rolling
and coarsen
for fixed size windowing operations.
rolling
for overlapping windowscoarsen
for non-overlapping windows.
Both provide the usual reductions as methods (.mean()
and friends), and also
reduce
and construct
for custom operations.
Label space “windows” or bins : GroupBy#
Generalization of coarsen
: sometimes the windows you want are not regular.
groupby
: e.g. climatologies, composites; works when “groups” are exact: e.g. characters or integers; not floatsgroupby_bins
: binning operations e.g. histogramsresample
: groupby but specialized for time grouping (so far)
tip Both groupby_bins
and resample
are implemented as GroupBy
with a
specific way of constructing group labels.
Deconstructing GroupBy#
Commonly called “split-apply-combine”.
“split” : break dataset into groups
“apply” : apply an operation, usually a reduction like
mean
“combine” : concatenate results from apply step along new “group” dimension
But really there is a first step: “identifying groups” also called “factorization” (or “binning”). Usually this is the hard part.
So “identify groups” → “split into groups” → “apply function” → “combine results”.
da.groupby("time.month")
DataArrayGroupBy, grouped over 'month'
12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
da.groupby("time.month").mean()
<xarray.DataArray 'air' (month: 12, lat: 25, lon: 53)> 246.3 246.4 246.2 245.8 245.2 244.6 ... 298.1 298.0 298.0 297.6 297.6 297.5 Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
This is how xarray identifies “groups” for the monthly climatology computation
da.time.dt.month.plot()
[<matplotlib.lines.Line2D at 0x7f02f0a73d60>]

Similarly for binning,
data.groupby_bins("lat", bins=[20, 35, 40, 45, 50])
DataArrayGroupBy, grouped over 'lat_bins'
4 groups with labels (45.0,, 50.0], ..., (20.0,, 35.0].
and resampling…
da.resample(time="M")
DataArrayResample, grouped over '__resample_dim__'
24 groups with labels 2013-01-31, ..., 2014-12-31.
Constructing group labels#
Xarray uses pandas.factorize
for groupby
and pandas.cut
for
groupby_bins
.
If the automatic group detection doesn’t work for your problem then these functions are useful for constructing “group labels” in many cases
numpy.digitize (binning)
numpy.searchsorted supports many other data types
pandas.factorize supports characters, strings etc.
pandas.cut for binning
More commonly useful are “datetime components”#
See a full list here
Accessed using DataArray.dt.*
da.time
<xarray.DataArray 'time' (time: 2920)> 2013-01-01 2013-01-01T06:00:00 ... 2014-12-31T12:00:00 2014-12-31T18:00:00 Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: standard_name: time long_name: Time
da.time.dt.day
<xarray.DataArray 'day' (time: 2920)> 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 ... 28 28 28 29 29 29 29 30 30 30 30 31 31 31 31 Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
da["time.day"]
<xarray.DataArray 'day' (time: 2920)> 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 ... 28 28 28 29 29 29 29 30 30 30 30 31 31 31 31 Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
da.time.dt.season
<xarray.DataArray 'season' (time: 2920)> 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Demo Grouping over a custom definition of seasons using numpy.isin.
We want to group over 4 seasons: DJF
, MAM
, JJAS
, ON
- this makes
physical sense in the Indian Ocean basin
Start by extracting months.
month = da.time.dt.month.data
month
array([ 1, 1, 1, ..., 12, 12, 12])
Create a new empty array
season = np.full(month.shape, " ")
season
array([' ', ' ', ' ', ..., ' ', ' ', ' '], dtype='<U4')
Use isin
to assign custom seasons,
season[np.isin(month, [12, 1, 2])] = "DJF"
season[np.isin(month, [3, 4, 5])] = "MAM"
season[np.isin(month, [6, 7, 8, 9])] = "JJAS"
season[np.isin(month, [10, 11])] = "ON"
season = da.time.copy(data=season)
season
<xarray.DataArray 'time' (time: 2920)> 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' 'DJF' Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: standard_name: time long_name: Time
(
# Calculate climatology
da.groupby(season)
.mean()
# reindex to get seasons in logical order (not alphabetical order)
.reindex(time=["DJF", "MAM", "JJAS", "ON"])
.plot(col="time")
)
<xarray.plot.facetgrid.FacetGrid at 0x7f02f05fff70>

floor
, ceil
and round
time#
Basically “resampling”
da.time
<xarray.DataArray 'time' (time: 2920)> 2013-01-01 2013-01-01T06:00:00 ... 2014-12-31T12:00:00 2014-12-31T18:00:00 Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: standard_name: time long_name: Time
# remove roundoff error in timestamps
# floor to daily frequency
da.time.dt.floor("D")
<xarray.DataArray 'floor' (time: 2920)> 2013-01-01 2013-01-01 2013-01-01 2013-01-01 ... 2014-12-31 2014-12-31 2014-12-31 Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
strftime
can be extremely useful#
So useful and so unintuitive that it has its own website: https://strftime.org/
This example avoids merging “Feb-29” and “Mar-01” for a daily climatology
da.time.dt.strftime("%b-%d")
<xarray.DataArray 'strftime' (time: 2920)> 'Jan-01' 'Jan-01' 'Jan-01' 'Jan-01' ... 'Dec-31' 'Dec-31' 'Dec-31' 'Dec-31' Coordinates: * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
groupby supports reduce
for custom reductions#
This applies to groupby_bins
and resample
(da.groupby("time.month").reduce(np.mean).plot(col="month", col_wrap=4))
<xarray.plot.facetgrid.FacetGrid at 0x7f02f052fee0>

tip map
is for functions that expect and return xarray objects (see also
Dataset.map
). reduce
is for functions that expect and return plain arrays
(like numpy or scipy functions)
GroupBy does not provide construct#
All the groups need not be the same “length” (e.g. months can have 28, 29, 30, or 31 days)
Instead looping over groupby objects is possible#
Maybe you want to plot data in each group separately?
for label, group in da.groupby("time.month"):
print(label)
1
2
3
4
5
6
7
8
9
10
11
12
This is a DataArray contain data for all December days
group
<xarray.DataArray 'air' (time: 248, lat: 25, lon: 53)> 268.8 266.6 263.8 260.7 257.6 254.8 ... 297.9 297.4 297.2 296.5 296.2 295.7 Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-12-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
Maybe you want a histogram of December temperatures?
group.plot.hist()
(array([ 218., 6763., 32167., 34735., 26251., 29243., 40491., 41906.,
72286., 44540.]),
array([221. , 229.33900452, 237.67799377, 246.01699829,
254.35598755, 262.69500732, 271.03399658, 279.37298584,
287.7119751 , 296.05099487, 304.38998413]),
<BarContainer object of 10 artists>)

In most cases, avoid a for loop using map
#
Apply functions that expect xarray Datasets or DataArrays.
Avoid having to manually combine results using concat
def iqr(da, dim):
"""Calculates interquartile range"""
return (da.quantile(q=0.75, dim=dim) - da.quantile(q=0.25, dim=dim)).rename("iqr")
da.groupby("time.month").map(iqr, dim="time")
<xarray.DataArray 'iqr' (month: 12, lat: 25, lon: 53)> 7.528 7.425 7.025 6.65 6.152 6.135 6.127 ... 0.9 0.9 0.855 1.0 1.3 1.432 1.5 Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
Summary#
Xarray provides methods for high-level analysis patterns:
rolling
: Operate on rolling windows of your data e.g. running meancoarsen
: Downsample your datagroupby
: Bin data in to groups and reducegroupby_bins
: GroupBy after discretizing a numeric variable.resample
: Groupby specialized for time axes. Either downsample or upsample your data.weighted
: Weight your data before reducing
More resources#
More tutorials here: https://tutorial.xarray.dev/
Answers to common questions on “how to do X” are here: https://docs.xarray.dev/en/stable/howdoi.html