A gentle introduction#
map_blocks is inspired by the dask.array function of the same name and lets you map a function over blocks of an xarray object (including Datasets!).
At compute time, your function will receive an xarray object with concrete (computed) values along with appropriate metadata. This function should return an xarray object.
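As a minimal sketch of that contract, the snippet below maps a function over a small synthetic DataArray and checks inside the function that each block arrives with NumPy-backed values. The array `da` and the helper `double` are made up for illustration and are not part of the tutorial's dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data: 6 time steps split into 3 dask chunks of 2
da = xr.DataArray(
    np.arange(12.0).reshape(6, 2),
    dims=("time", "x"),
    coords={"time": np.arange(6)},
    name="demo",
).chunk({"time": 2})


def double(block):
    # Inside the mapped function the data is a concrete NumPy array,
    # not a lazy dask array
    assert isinstance(block.data, np.ndarray)
    return block * 2  # return an xarray object


doubled = da.map_blocks(double)  # still lazy at this point
result = doubled.compute()       # runs double() once per block
```

The metadata (dimension names, coordinates, name) travels with each block, so the function can use label-based xarray operations as usual.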
Setup#
First, let's set up a LocalCluster using dask.distributed.
You can use any kind of dask cluster. This step is completely independent of xarray. While not strictly necessary, the dashboard provides a nice learning tool.
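The cell that creates the cluster is not shown on this page, so here is one possible sketch of it. The worker counts mirror the summary printed on this page; `processes=False` is an assumption made so the snippet also runs as a plain script without a `__main__` guard, whereas the cluster reported on this page uses separate worker processes.

```python
from dask.distributed import Client, LocalCluster

# Assumed settings for illustration: 4 single-threaded workers living
# in the current process (threads rather than separate processes)
cluster = LocalCluster(n_workers=4, threads_per_worker=1, processes=False)
client = Client(cluster)

# The dashboard URL, if the dashboard could be started
print(client.dashboard_link)
```

Once a Client exists, dask computations are sent to this cluster by default; call client.close() when you are done.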
Client
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 4 | Total threads: 4 | Total memory: 15.61 GiB
Status: running | Using processes: True
👆 Click the Dashboard link above, or click the "Search" button in the dashboard. Let's test that the dashboard is working.
import dask.array
dask.array.ones((1000, 4), chunks=(2, 1)).compute() # should see activity in dashboard
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       ...,
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
Let's open a dataset. We specify chunks so that the DataArrays are backed by dask arrays.
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 100})
ds
<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB dask.array<chunksize=(100, 25, 53), meta=np.ndarray>
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
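If you cannot download the tutorial dataset, the same effect of chunking can be seen on a small synthetic array. This is only a sketch: `demo` is a made-up array, and `.chunk` is used here in place of the `chunks` argument of `open_dataset`.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory array: 250 time steps, 2 points
demo = xr.DataArray(np.zeros((250, 2)), dims=("time", "x"), name="demo")

# Chunking along time wraps the data in a dask array
chunked = demo.chunk({"time": 100})

print(type(chunked.data))  # a dask array rather than a NumPy array
print(chunked.chunks)      # block sizes along each dimension
```

The last block along time is smaller than the others because 250 is not a multiple of 100; map_blocks will call your function once for each of these blocks.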
Simple example#
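Here is an example. Despite its name, time_mean averages over lat; because the data is chunked only along time, each block holds every latitude, so the blockwise result matches the full computation.
def time_mean(obj):
    # use xarray's convenient API here
    # you could convert to a pandas dataframe and use pandas' extensive API
    # or use .plot() and plt.savefig to save visualizations to disk in parallel.
    return obj.mean("lat")


ds.map_blocks(time_mean)  # this is lazy!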
<xarray.Dataset> Size: 1MB
Dimensions:  (lon: 53, time: 2920)
Coordinates:
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lon) float64 1MB dask.array<chunksize=(100, 53), meta=np.ndarray>
# this will calculate values and will return True if the computation works as expected
ds.map_blocks(time_mean).identical(ds.mean("lat"))
True
Exercise#
Try applying the following function with map_blocks. Specify scale as an argument and offset as a kwarg.
The docstring should help: https://docs.xarray.dev/en/stable/generated/xarray.map_blocks.html
def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset
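One possible solution is sketched below on a small synthetic dataset, so it runs without downloading anything. `ds_demo` and the values 5 and -2 are arbitrary choices for illustration; the `args` and `kwargs` parameters are the ones described in the map_blocks docstring.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the tutorial dataset, chunked along time only
ds_demo = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.random.default_rng(0).normal(size=(20, 3, 4)))},
    coords={
        "time": np.arange(20),
        "lat": [10.0, 20.0, 30.0],
        "lon": [0.0, 1.0, 2.0, 3.0],
    },
).chunk({"time": 5})


def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset


# scale is passed positionally through args, offset by name through kwargs
scaled = ds_demo.map_blocks(time_mean_scaled, args=[5], kwargs={"offset": -2})

# Reference result computed without map_blocks
expected = ds_demo.mean("lat") * 5 - 2
xr.testing.assert_allclose(scaled.compute(), expected.compute())
```

With the tutorial dataset, the same call would be ds.map_blocks(time_mean_scaled, args=[5], kwargs={"offset": -2}).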
More advanced functions#
map_blocks needs to know exactly what the returned object looks like. It does so by passing a 0-shaped xarray object to the function and examining the result. This approach cannot work in all cases. For such advanced use cases, map_blocks allows a template kwarg. See https://docs.xarray.dev/en/stable/user-guide/dask.html#map-blocks for more details.
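As a rough sketch of a case that needs template: the hypothetical function `first_step` below selects the first time step of each block, which cannot be evaluated on a 0-shaped input. The synthetic array `da` and the way the template is built (select the expected rows, then rechunk to one row per block) are assumptions for illustration; the template has to describe the final result, including its chunk structure.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data: 30 time steps in 3 chunks of 10
da = xr.DataArray(
    np.arange(30 * 4, dtype="float64").reshape(30, 4),
    dims=("time", "x"),
    coords={"time": np.arange(30), "x": np.arange(4)},
    name="demo",
).chunk({"time": 10})


def first_step(block):
    # Indexing position 0 fails on an empty (0-shaped) block,
    # so the output cannot be inferred automatically
    return block.isel(time=[0])


# Describe the expected result: one time step per input block,
# with one output chunk per input chunk along time
template = da.isel(time=[0, 10, 20]).chunk({"time": 1})

mapped = da.map_blocks(first_step, template=template)  # lazy
result = mapped.compute()
```

The values in the template are never used, only its dimensions, coordinates, dtype, and chunking, so any cheap lazy object with the right structure will do.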
client.close()