Parallelizing custom functions

Parallelizing custom functions#

Almost all of xarray’s built-in operations work on Dask arrays.

Sometimes analysis calls for functions that aren’t in xarray’s API (e.g. scipy). There are three ways to apply these functions in parallel on each block of your xarray object:

  1. Extract Dask arrays from xarray objects (.data) and use Dask directly e.g. (apply_gufunc, map_blocks, map_overlap, blockwise, reduction). Then wrap the result as an Xarray object.

  2. Use apply_ufunc to apply functions that consume and return duck arrays. This automates extracting the data from Xarray objects, applying a function, and then converting the bare array result back to a Xarray object.

  3. Use map_blocks, Dataset.map_blocks or DataArray.map_blocks to apply functions that consume and return xarray objects.

Which method you use ultimately depends on the type of input objects expected by the function you’re wrapping, and the level of performance or convenience you desire.