Handling complex output#
We’ve seen how to use apply_ufunc
to handle relatively simple functions that transform every element, or reduce along a single dimension.
This lesson will show you how to handle cases where the output is more complex in two ways:
Handle adding a new dimension by specifying
output_core_dims
Handling the change in size of an existing dimension by specifying
exclude_dims
in addition tooutput_core_dims
Introduction#
A good example of a function that returns relatively complex output is numpy’s 1D interpolate function numpy.interp
:
Signature: np.interp(x, xp, fp, left=None, right=None, period=None)
Docstring:
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear interpolant to a function
with given discrete data points (`xp`, `fp`), evaluated at `x`.
This function expects a 1D array as input, and returns a 1D array as output. That is, numpy.interp
has one core dimension.
Tip
We’ll reduce the length of error messages using %xmode minimal
See the ipython documentation for details.
%xmode minimal
import xarray as xr
import numpy as np
np.set_printoptions(threshold=10, edgeitems=2)
xr.set_options(display_expand_data=False)
air = (
xr.tutorial.load_dataset("air_temperature")
.air.sortby("lat") # np.interp needs coordinate in ascending order
.isel(time=-0, lon=0) # choose a 1D subset
)
air
Exception reporting mode: Minimal
<xarray.DataArray 'air' (lat: 25)> Size: 200B 296.3 295.9 296.6 297.0 295.4 293.8 ... 272.1 274.5 266.5 250.0 243.8 241.2 Coordinates: * lat (lat) float32 100B 15.0 17.5 20.0 22.5 25.0 ... 67.5 70.0 72.5 75.0 lon float32 4B 200.0 time datetime64[ns] 8B 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
# Our goal is to densify from 25 to 100 coordinate values:s
newlat = np.linspace(15, 75, 100)
np.interp(newlat, air.lat.data, air.data)
array([296.29 , 296.19545455, ..., 241.83030303, 241.2 ])
Adding a new dimension#
1D interpolation transforms the size of the input along a single dimension.
Logically, we can think of this as removing the old dimension and adding a new dimension.
We provide this information to apply_ufunc
using the output_core_dims
keyword argument
output_core_dims : List[tuple], optional
List of the same length as the number of output arguments from
``func``, giving the list of core dimensions on each output that were
not broadcast on the inputs. By default, we assume that ``func``
outputs exactly one array, with axes corresponding to each broadcast
dimension.
Core dimensions are assumed to appear as the last dimensions of each
output in the provided order.
For interp
we expect one returned output with one new core dimension that we will call "lat_interp"
.
Specify this using output_core_dims=[["lat_interp"]]
newlat = np.linspace(15, 75, 100)
xr.apply_ufunc(
np.interp, # function to apply
newlat, # 1st input to np.interp
air.lat, # 2nd input to np.interp
air, # 3rd input to np.interp
input_core_dims=[["lat_interp"], ["lat"], ["lat"]], # one entry per function input, 3 in total!
output_core_dims=[["lat_interp"]],
)
<xarray.DataArray (lat_interp: 100)> Size: 800B 296.3 296.2 296.1 296.0 295.9 296.0 ... 245.1 243.7 243.1 242.5 241.8 241.2 Coordinates: lon float32 4B 200.0 time datetime64[ns] 8B 2013-01-01 Dimensions without coordinates: lat_interp
Exercise
Apply the following function using apply_ufunc
. It adds a new dimension to the input array, let’s call it newdim
. Specify the new dimension using output_core_dims
. Do you need any input_core_dims
?
def add_new_dim(array):
return np.expand_dims(array, axis=-1)
Solution
def add_new_dim(array):
return np.expand_dims(array, axis=-1)
xr.apply_ufunc(
add_new_dim,
air,
output_core_dims=[["newdim"]],
)
Dimensions that change size#
Imagine that you want the output to have the same dimension name "lat"
i.e. applyingnp.interp
changes the size of the "lat"
dimension.
We get an a error if we specify "lat"
in output_core_dims
newlat = np.linspace(15, 75, 100)
xr.apply_ufunc(
np.interp, # first the function
newlat,
air.lat,
air,
input_core_dims=[["lat"], ["lat"], ["lat"]],
output_core_dims=[["lat"]],
)
ValueError: size of dimension 'lat' on inputs was unexpectedly changed by applied function from 25 to 100. Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size. The data returned was:
array([296.29 , 296.195455, 296.100909, 296.006364, 295.911818, 296.048485,
296.218182, 296.387879, 296.557576, 296.672727, 296.769697, 296.866667,
296.963636, 296.757576, 296.369697, 295.981818, 295.593939, 295.204848,
294.814545, 294.424242, 294.033939, 293.727273, 293.56 , 293.392727,
293.225455, 292.924242, 292.221212, 291.518182, 290.815152, 290.130303,
289.572727, 289.015152, 288.457576, 287.9 , 287.560606, 287.221212,
286.881818, 286.542424, 286.09697 , 285.636364, 285.175758, 284.715152,
284.270909, 283.832121, 283.393333, 282.954545, 282.367273, 281.690909,
281.014545, 280.338182, 279.806061, 279.418182, 279.030303, 278.642424,
278.299091, 278.03 , 277.760909, 277.491818, 277.254242, 277.111212,
276.968182, 276.825152, 276.675758, 276.481818, 276.287879, 276.093939,
275.9 , 275.630909, 275.361818, 275.092727, 274.823636, 274.558788,
274.294545, 274.030303, 273.766061, 273.409091, 273.021212, 272.633333,
272.245455, 272.463636, 273.045455, 273.627273, 274.209091, 273.530303,
271.590909, 269.651515, 267.712121, 265. , 261. , 257. ,
253. , 249.624242, 248.121212, 246.618182, 245.115152, 243.721212,
243.090909, 242.460606, 241.830303, 241.2 ])
As the error message points out,
Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size.
Looking at the docstring we need to specify exclude_dims
as a “set”:
exclude_dims : set, optional
Core dimensions on the inputs to exclude from alignment and
broadcasting entirely. Any input coordinates along these dimensions
will be dropped. Each excluded dimension must also appear in
``input_core_dims`` for at least one argument. Only dimensions listed
here are allowed to change size between input and output objects.
newlat = np.linspace(15, 75, 100)
xr.apply_ufunc(
np.interp, # first the function
newlat,
air.lat,
air,
input_core_dims=[["lat"], ["lat"], ["lat"]],
output_core_dims=[["lat"]],
exclude_dims={"lat"},
)
<xarray.DataArray (lat: 100)> Size: 800B 296.3 296.2 296.1 296.0 295.9 296.0 ... 245.1 243.7 243.1 242.5 241.8 241.2 Coordinates: lon float32 4B 200.0 time datetime64[ns] 8B 2013-01-01 Dimensions without coordinates: lat
Returning multiple variables#
Another common, but more complex, case is to handle multiple outputs returned by the function.
As an example we will write a function that returns the minimum and maximum value along the last axis of the array.
We will work with a 2D array, and apply the function minmax
along the "lat"
dimension:
def minmax(array):
return array.min(axis=-1), array.max(axis=-1)
def minmax(array):
return array.min(axis=-1), array.max(axis=-1)
air2d = xr.tutorial.load_dataset("air_temperature").air.isel(time=0)
air2d
<xarray.DataArray 'air' (lat: 25, lon: 53)> Size: 11kB 241.2 242.5 243.5 244.0 244.1 243.9 ... 298.0 297.8 297.6 296.9 296.8 296.6 Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 time datetime64[ns] 8B 2013-01-01 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ]
By default, Xarray assumes one array is returned by the applied function.
Here we have two returned arrays, and the input core dimension "lat"
is removed (or reduced over).
So we provide output_core_dims=[[], []]
i.e. an empty list of core dimensions for each of the two returned arrays.
minda, maxda = xr.apply_ufunc(
minmax,
air2d,
input_core_dims=[["lat"]],
output_core_dims=[[], []],
)
minda
<xarray.DataArray 'air' (lon: 53)> Size: 424B 241.2 242.5 243.5 244.0 243.4 242.4 ... 227.5 228.8 230.6 232.8 235.3 238.6 Coordinates: * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 time datetime64[ns] 8B 2013-01-01
Exercise
We presented the concept of “core dimensions” as the “smallest unit of data the function could handle.” Do you understand how the above use of apply_ufunc
generalizes to an array with more than one dimension?
Try applying the minmax function to a 3d air temperature dataset
air3d = xr.tutorial.load_dataset("air_temperature").air
Your goal is to have a minimum and maximum value of temperature across all latitudes for a given time and longitude.
Solution
We want to use minmax
to compute the minimum and maximum along the “lat” dimension always, regardless of how many dimensions are on the input. So we specify input_core_dims=[["lat"]]
. The output does not contain the “lat” dimension, but we expect two returned variables. So we pass an empty list []
for each returned array, so output_core_dims=[[], []]
just as before.
minda, maxda = xr.apply_ufunc(
minmax,
air3d,
input_core_dims=[["lat"]],
output_core_dims=[[],[]],
)