A gentle introduction

map_blocks is inspired by the dask.array function of the same name and lets you map a function on blocks of the xarray object (including Datasets!).

At compute time, your function will receive an xarray object with concrete (computed) values along with appropriate metadata. This function should return an xarray object.
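To make this concrete, here is a minimal sketch on a small synthetic DataArray. The array `da` and the function `add_one` are made up for illustration and are not part of the tutorial data. The assertions inside the function check that each block arrives as an xarray object backed by a NumPy array:

```python
import numpy as np
import xarray as xr

# Hypothetical toy array, split into two blocks along "x".
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "y"),
    coords={"x": np.arange(4)},
    name="demo",
).chunk({"x": 2})


def add_one(block):
    # Each block is an xarray object holding concrete NumPy values.
    assert isinstance(block, xr.DataArray)
    assert isinstance(block.data, np.ndarray)
    # The function must hand back an xarray object.
    return block + 1


result = da.map_blocks(add_one)  # still lazy
computed = result.compute()      # runs add_one on every block
```

The result stays dask-backed until `.compute()` is called.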


import dask
import numpy as np
import xarray as xr

First, let's set up a LocalCluster using dask.distributed.

You can use any kind of dask cluster. This step is completely independent of xarray. While not strictly necessary, the dashboard provides a nice learning tool.

from dask.distributed import Client

client = Client()





Click the Dashboard link above. Or click the "Search" button in the dashboard.

Let’s test that the dashboard is working.

import dask.array

dask.array.ones((1000, 4), chunks=(2, 1)).compute()  # should see activity in dashboard
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       ...,
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Let’s open a dataset. We specify chunks so that the DataArrays are backed by dask arrays.

ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 100})
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 dask.array<chunksize=(100, 25, 53), meta=np.ndarray>
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
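
The effect of chunking can be checked without downloading anything. This sketch uses a small made-up dataset (`small` is an assumption, not the tutorial data) and `Dataset.chunk`, which yields the same kind of dask-backed variables as passing chunks when opening a file:

```python
import dask.array
import numpy as np
import xarray as xr

# Hypothetical in-memory dataset standing in for the tutorial data.
small = xr.Dataset({"air": (("time", "lat"), np.zeros((6, 2)))})

# Split the "time" dimension into blocks of length 3.
chunked = small.chunk({"time": 3})

# The variable is now backed by a dask array instead of a NumPy array.
is_dask = isinstance(chunked["air"].data, dask.array.Array)
time_chunks = chunked["air"].chunks[0]  # block sizes along "time"
```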

Simple example

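Here is an example. Note that, despite its name, time_mean averages over the lat dimension, so each block keeps its time and lon dimensions.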

def time_mean(obj):
    # use xarray's convenient API here
    # you could convert to a pandas dataframe and use pandas' extensive API
    # or use .plot() and plt.savefig to save visualizations to disk in parallel.
    return obj.mean("lat")

ds.map_blocks(time_mean)  # this is lazy!
<xarray.Dataset>
Dimensions:  (time: 2920, lon: 53)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * lon      (lon) float64 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (time, lon) float32 dask.array<chunksize=(100, 53), meta=np.ndarray>
# this will calculate values and will return True if the computation works as expected
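
One way to run such a check is to compare the map_blocks result with the same reduction applied directly to the whole object. The sketch below does this on a small synthetic dataset (`toy` is an assumption, used so that no download is needed); the comparison with `identical` is one reasonable choice, not necessarily the only one:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the tutorial dataset, chunked along "time".
toy = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.arange(96.0).reshape(8, 3, 4))},
    coords={
        "time": np.arange(8),
        "lat": [75.0, 72.5, 70.0],
        "lon": [200.0, 202.5, 205.0, 207.5],
    },
).chunk({"time": 4})


def time_mean(obj):
    # Same reduction as in the tutorial: average over "lat".
    return obj.mean("lat")


mapped = toy.map_blocks(time_mean)  # lazy, block by block
expected = toy.mean("lat")          # the same reduction without map_blocks

# identical() computes both sides and compares values, coordinates and attributes.
ok = mapped.identical(expected)
```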


Try applying the following function with map_blocks. Specify scale as an argument and offset as a kwarg.

The map_blocks docstring should help:

def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset
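
One possible solution is sketched below on a small synthetic dataset (`toy` is an assumption, not the tutorial data). It relies on the `args` and `kwargs` parameters of map_blocks; the values 5 and 10 are arbitrary:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the tutorial dataset.
toy = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.arange(24.0).reshape(4, 2, 3))},
    coords={"time": np.arange(4)},
).chunk({"time": 2})


def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset


# Positional arguments go in `args`, keyword arguments in `kwargs`.
scaled = toy.map_blocks(time_mean_scaled, args=[5], kwargs={"offset": 10})

# Reference result computed without map_blocks.
expected = toy.mean("lat") * 5 + 10
```

Calling `scaled.compute()` should then give the same values as `expected.compute()`.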

More advanced functions

map_blocks needs to know exactly what the returned object looks like. It does so by passing a 0-shaped xarray object to the function and examining the result. This approach cannot work in all cases, for example when the function fails on empty input. For such advanced use cases, map_blocks accepts a template kwarg that describes the expected result. See the xarray documentation on map_blocks for more details.
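
As a minimal sketch of the template kwarg, consider a function that selects the first element along a dimension. Indexing position 0 is not possible on a 0-length dimension, so the output cannot be inferred automatically. The array `da` and the function `first_along_y` are hypothetical names used only for this illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical toy array, split into two blocks along "x".
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "y"),
    coords={"x": np.arange(4)},
    name="demo",
).chunk({"x": 2})


def first_along_y(block):
    # Raises on a 0-length "y" dimension, so inference from empty input fails.
    return block.isel(y=0)


# The template is a lazy object with the dims, coords and chunks of the
# expected result; its values are never used.
template = da.isel(y=0)

result = da.map_blocks(first_along_y, template=template)
computed = result.compute()
```

Here the template is built lazily from the input itself, which is often the simplest way to get matching chunks and coordinates.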