Handling complex output

We’ve seen how to use apply_ufunc to handle relatively simple functions that transform every element, or reduce along a single dimension.

This lesson shows how to handle two cases where the output is more complex:

  1. Adding a new dimension, by specifying output_core_dims

  2. Changing the size of an existing dimension, by specifying exclude_dims in addition to output_core_dims

Introduction

A good example of a function that returns relatively complex output is NumPy’s 1D interpolation function, numpy.interp:

    Signature: np.interp(x, xp, fp, left=None, right=None, period=None)
    Docstring:
        One-dimensional linear interpolation.

    Returns the one-dimensional piecewise linear interpolant to a function
    with given discrete data points (`xp`, `fp`), evaluated at `x`.

This function expects a 1D array as input, and returns a 1D array as output. That is, numpy.interp has one core dimension.

Tip

We’ll reduce the length of error messages using %xmode minimal. See the IPython documentation for details.

%xmode minimal

import xarray as xr
import numpy as np

np.set_printoptions(threshold=10, edgeitems=2)
xr.set_options(display_expand_data=False)

air = (
    xr.tutorial.load_dataset("air_temperature")
    .air.sortby("lat")  # np.interp needs coordinate in ascending order
    .isel(time=0, lon=0)  # choose a 1D subset: first time step, first longitude
)
air
Exception reporting mode: Minimal
<xarray.DataArray 'air' (lat: 25)>
296.3 295.9 296.6 297.0 295.4 293.8 ... 272.1 274.5 266.5 250.0 243.8 241.2
Coordinates:
  * lat      (lat) float32 15.0 17.5 20.0 22.5 25.0 ... 65.0 67.5 70.0 72.5 75.0
    lon      float32 200.0
    time     datetime64[ns] 2013-01-01
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
# Our goal is to densify from 25 to 100 coordinate values:
newlat = np.linspace(15, 75, 100)
np.interp(newlat, air.lat.data, air.data)
array([296.29000854, 296.19545954, ..., 241.83029776, 241.19999695])
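
To check the shape behaviour by hand, here is a minimal sketch on made-up data (the arrays `xp`, `fp`, and `x_new` are hypothetical and not part of the tutorial dataset). It only illustrates that the length of the result follows the evaluation points, not the original data points.

```python
import numpy as np

# Hypothetical toy data: three known points on the line y = 10 * x
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([0.0, 10.0, 20.0])

# Five evaluation points spanning the same range
x_new = np.linspace(0.0, 2.0, 5)

# 1D in, 1D out: the output has one value per evaluation point
y_new = np.interp(x_new, xp, fp)

print(y_new.shape)  # length follows x_new (5), not xp (3)
print(y_new)
```

This change in length along the core dimension is what the rest of this lesson has to describe to apply_ufunc.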

Adding a new dimension

1D interpolation transforms the size of the input along a single dimension.

Logically, we can think of this as removing the old dimension and adding a new dimension.

We provide this information to apply_ufunc using the output_core_dims keyword argument:

   output_core_dims : List[tuple], optional
        List of the same length as the number of output arguments from
        ``func``, giving the list of core dimensions on each output that were
        not broadcast on the inputs. By default, we assume that ``func``
        outputs exactly one array, with axes corresponding to each broadcast
        dimension.

        Core dimensions are assumed to appear as the last dimensions of each
        output in the provided order.

For interp we expect one returned output with one new core dimension that we will call "lat_interp".

Specify this using output_core_dims=[["lat_interp"]]

newlat = np.linspace(15, 75, 100)

xr.apply_ufunc(
    np.interp,  # function to apply
    newlat,  # 1st input to np.interp
    air.lat,  # 2nd input to np.interp
    air,  # 3rd input to np.interp
    input_core_dims=[["lat_interp"], ["lat"], ["lat"]],  # one entry per function input, 3 in total!
    output_core_dims=[["lat_interp"]],
)
<xarray.DataArray (lat_interp: 100)>
296.3 296.2 296.1 296.0 295.9 296.0 ... 245.1 243.7 243.1 242.5 241.8 241.2
Coordinates:
    lon      float32 200.0
    time     datetime64[ns] 2013-01-01
Dimensions without coordinates: lat_interp
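
Note that the result has no coordinate along "lat_interp". One possible way to attach the interpolation points is assign_coords. The sketch below uses a small synthetic array (`toy`, an assumption standing in for `air`) so that it runs without downloading the tutorial dataset.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `air`: temperature falling linearly with latitude
lat = np.linspace(15, 75, 25)
toy = xr.DataArray(300.0 - lat, dims="lat", coords={"lat": lat}, name="air")

newlat = np.linspace(15, 75, 100)

interped = xr.apply_ufunc(
    np.interp,  # function to apply
    newlat,  # 1st input: points to evaluate at
    toy.lat,  # 2nd input: original coordinate values
    toy,  # 3rd input: original data values
    input_core_dims=[["lat_interp"], ["lat"], ["lat"]],
    output_core_dims=[["lat_interp"]],
)

# Attach the interpolation points as the coordinate of the new dimension
interped = interped.assign_coords(lat_interp=newlat)
print(interped)
```

With the coordinate attached, label-based selection such as interped.sel(lat_interp=15.0) becomes available on the new dimension.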

Exercise 25

Apply the following function using apply_ufunc. It adds a new dimension to the input array, let’s call it newdim. Specify the new dimension using output_core_dims. Do you need any input_core_dims?

def add_new_dim(array):
    return np.expand_dims(array, axis=-1)

Dimensions that change size

Imagine that you want the output to keep the same dimension name "lat", i.e. applying np.interp changes the size of the "lat" dimension.

We get an error if we specify "lat" in output_core_dims:

newlat = np.linspace(15, 75, 100)

xr.apply_ufunc(
    np.interp,  # first the function
    newlat,
    air.lat,
    air,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
)
ValueError: size of dimension 'lat' on inputs was unexpectedly changed by applied function from 25 to 100. Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size. The data returned was:

array([296.290009, 296.19546 , 296.100911, 296.006362, 295.911813, 296.048481,
       296.218181, 296.387881, 296.557581, 296.672732, 296.7697  , 296.866669,
       296.963637, 296.757575, 296.369695, 295.981814, 295.593934, 295.204844,
       294.814545, 294.424245, 294.033946, 293.727281, 293.560008, 293.392734,
       293.225461, 292.924247, 292.221211, 291.518175, 290.815138, 290.130285,
       289.572712, 289.015139, 288.457567, 287.899994, 287.560601, 287.221209,
       286.881817, 286.542424, 286.096971, 285.636366, 285.175762, 284.715157,
       284.270916, 283.832128, 283.393341, 282.954554, 282.36728 , 281.690914,
       281.014549, 280.338183, 279.80606 , 279.41818 , 279.030299, 278.642419,
       278.299086, 278.029999, 277.760911, 277.491824, 277.254249, 277.111213,
       276.968176, 276.825139, 276.67574 , 276.481803, 276.287867, 276.09393 ,
       275.899994, 275.630907, 275.361819, 275.092732, 274.823644, 274.558791,
       274.294542, 274.030293, 273.766044, 273.409077, 273.021204, 272.633331,
       272.245458, 272.463642, 273.045458, 273.627275, 274.209092, 273.530303,
       271.590909, 269.651515, 267.712121, 265.      , 261.      , 257.      ,
       253.      , 249.624242, 248.121208, 246.618175, 245.115142, 243.7212  ,
       243.090899, 242.460599, 241.830298, 241.199997])

As the error message points out,

Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size.

Looking at the docstring we need to specify exclude_dims as a “set”:

exclude_dims : set, optional
        Core dimensions on the inputs to exclude from alignment and
        broadcasting entirely. Any input coordinates along these dimensions
        will be dropped. Each excluded dimension must also appear in
        ``input_core_dims`` for at least one argument. Only dimensions listed
        here are allowed to change size between input and output objects.

newlat = np.linspace(15, 75, 100)

xr.apply_ufunc(
    np.interp,  # first the function
    newlat,
    air.lat,
    air,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims={"lat"},
)
<xarray.DataArray (lat: 100)>
296.3 296.2 296.1 296.0 295.9 296.0 ... 245.1 243.7 243.1 242.5 241.8 241.2
Coordinates:
    lon      float32 200.0
    time     datetime64[ns] 2013-01-01
Dimensions without coordinates: lat
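
As the docstring says, input coordinates along excluded dimensions are dropped, which is why the result above reports "Dimensions without coordinates: lat". A minimal sketch of restoring the coordinate afterwards, again on a synthetic array (`toy` is an assumption, not the tutorial data):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `air`: temperature falling linearly with latitude
lat = np.linspace(15, 75, 25)
toy = xr.DataArray(300.0 - lat, dims="lat", coords={"lat": lat}, name="air")

newlat = np.linspace(15, 75, 100)

result = xr.apply_ufunc(
    np.interp,
    newlat,
    toy.lat,
    toy,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims={"lat"},  # "lat" is allowed to change size
)

# The "lat" coordinate was dropped because "lat" is an excluded dimension
had_lat_coord = "lat" in result.coords

# Re-attach the new latitude values as the coordinate
result = result.assign_coords(lat=newlat)
print(had_lat_coord)
print(result)
```

Whether to reuse the name "lat" or introduce a new name such as "lat_interp" is a design choice. Reusing the name keeps downstream code unchanged but requires exclude_dims and a manual assign_coords step.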

Returning multiple variables

Another common, but more complex, case is to handle multiple outputs returned by the function.

As an example we will write a function that returns the minimum and maximum value along the last axis of the array.

We will work with a 2D array, and apply the function minmax along the "lat" dimension:

def minmax(array):
    return array.min(axis=-1), array.max(axis=-1)


air2d = xr.tutorial.load_dataset("air_temperature").air.isel(time=0)
air2d
<xarray.DataArray 'air' (lat: 25, lon: 53)>
241.2 242.5 243.5 244.0 244.1 243.9 ... 298.0 297.8 297.6 296.9 296.8 296.6
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

By default, Xarray assumes one array is returned by the applied function.

Here we have two returned arrays, and the input core dimension "lat" is removed (or reduced over).

So we provide output_core_dims=[[], []] i.e. an empty list of core dimensions for each of the two returned arrays.

minda, maxda = xr.apply_ufunc(
    minmax,
    air2d,
    input_core_dims=[["lat"]],
    output_core_dims=[[], []],
)
minda
<xarray.DataArray 'air' (lon: 53)>
241.2 242.5 243.5 244.0 243.4 242.4 ... 227.5 228.8 230.6 232.8 235.3 238.6
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
    time     datetime64[ns] 2013-01-01
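
As a sanity check, the two outputs can be compared against xarray's built-in reductions. The sketch below uses a small random array (`toy2d`, hypothetical data) rather than the tutorial dataset so that it is self-contained.

```python
import numpy as np
import xarray as xr


def minmax(array):
    # Reduce over the last axis, where apply_ufunc places the core dimension
    return array.min(axis=-1), array.max(axis=-1)


# Hypothetical 2D data with dimensions (lat, lon)
rng = np.random.default_rng(0)
toy2d = xr.DataArray(
    rng.normal(size=(4, 3)),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0, 30.0, 40.0], "lon": [0.0, 5.0, 10.0]},
)

minda, maxda = xr.apply_ufunc(
    minmax,
    toy2d,
    input_core_dims=[["lat"]],  # "lat" is moved to the last axis and reduced
    output_core_dims=[[], []],  # two outputs, neither has a core dimension
)

# Both outputs keep only the broadcast dimension "lon"
print(minda.dims, maxda.dims)
print(np.array_equal(minda.values, toy2d.min("lat").values))
print(np.array_equal(maxda.values, toy2d.max("lat").values))
```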

Exercise 26

We presented the concept of “core dimensions” as the “smallest unit of data the function could handle.” Do you understand how the above use of apply_ufunc generalizes to an array with more than one dimension?

Try applying the minmax function to a 3D air temperature dataset:

air3d = xr.tutorial.load_dataset("air_temperature").air

Your goal is to have a minimum and maximum value of temperature across all latitudes for a given time and longitude.