Fast vectorization with Numba

Fast vectorization with Numba#

https://numba.pydata.org/_static/numba-blue-horizontal-rgb.svg

np.vectorize is a very convenient function but is unfortunately slow. It is only marginally faster than writing a for loop in Python and looping.

A common way to get around this is to write a base interpolation function that can handle nD arrays in a compiled language like C or Fortran and then pass that to apply_ufunc.

Another option is to use the numba package which provides two very convenient decorators to build numpy universal functions or ufuncs:

vectorize for functions that act on scalars, and
guvectorize for functions that operates on subsets of the array along core-dimensions. Any decorated function gets compiled and will loop over the loop dimensions in parallel when necessary.

For apply_ufunc the key concept is that we must provide vectorize=False (the default) when using Numba vectorized functions. Numba handles the vectorization (or looping) and apply_ufunc handles converting Xarray objects to bare arrays and handling metadata.

Load data#

%xmode minimal

import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [12, 13, 14]},
    attrs={"foo": "bar"},
)
da

Exception reporting mode: Minimal

<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 12 13 14
Dimensions without coordinates: y
Attributes:
    foo:      bar

`vectorize`#

Our squared_error example from earlier works element-by-element, and is a great example for vectorize

from numba import vectorize, float64


@vectorize([float64(float64, float64)])
def squared_error(x, y):
    return (x - y) ** 2

See the numba documentation to understand @vectorize([float64(float64, float64)])

Now use apply_ufunc to apply it.

xr.apply_ufunc(squared_error, da, 1)

<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[  1.,   0.,   1.,   4.],
       [  9.,  16.,  25.,  36.],
       [ 49.,  64.,  81., 100.]])
Coordinates:
  * x        (x) int64 24B 12 13 14
Dimensions without coordinates: y

`guvectorize`#

guvectorize is for functions that work on small subsets of the data. Quoting the Numba documentation

While vectorize() allows you to write ufuncs that work on one element at a time, the guvectorize() decorator takes the concept one step further and allows you to write ufuncs that will work on an arbitrary number of elements of input arrays, and take and return arrays of differing dimensions. The typical example is a running median or a convolution filter.

This description should remind you of apply_ufunc!

We will use the example function g from the numba docs, which adds a scalar y to a 1D vector x. The res argument here will contain the output (this is a Numba detail).

from numba import guvectorize, int64


@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y


a = np.arange(5)
g(a, 2)

array([2, 3, 4, 5, 6])

Unlike squared_error we cannot pass an Xarray object to g directly.

g(da, 1)

NotImplementedError: <numba._GUFunc 'g'> not supported: xarray objects do not directly implement generalized ufuncs. Instead, use xarray.apply_ufunc or explicitly convert to xarray objects to NumPy arrays (e.g., with `.values`).

Now use apply_ufunc.

xr.apply_ufunc(
    g,
    da,
    1,
    input_core_dims=[["x"], []],
    output_core_dims=[["x"]],
)

<xarray.DataArray (y: 4, x: 3)> Size: 96B
array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
Coordinates:
  * x        (x) int64 24B 12 13 14
Dimensions without coordinates: y

Notice the following:

The guvectorize decorator includes the concept of “core dimensions”: '(n),()->(n)'. This string means that the g takes a 1D vector of size n, a scalar, and returns a 1D vector of size n. There is one core dimension for the input, and one core dimension for the output. Both core dimensions have the same size.
That string translates to input_core_dims=[["x"], []], output_core_dims=[["x"]] in apply_ufunc.
We don’t provide vectorize=True to apply_ufunc since numba will handle the vectorization in compiled code automatically.

With dask#

Use the chunked DataArray

da_dask = da.chunk({"y": 1})
da_dask

<xarray.DataArray (x: 3, y: 4)> Size: 96B
dask.array<xarray-<this-array>, shape=(3, 4), dtype=int64, chunksize=(3, 1), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 24B 12 13 14
Dimensions without coordinates: y
Attributes:
    foo:      bar

Exercise

Apply g to da_dask

Solution

xr.apply_ufunc(
    g,
    da_dask, 
    1, 
    input_core_dims=[["x"], []], 
    output_core_dims=[["x"]],
    dask="parallelized",
)

Next#

For more, see the numpy.interp end-to-end example in the left sidebar.