# Hierarchical computations

In this lesson, we extend what we learned about <project:#fundamentals/basic-computation> to hierarchical datasets. By the end of the lesson, we will be able to:

- Apply basic arithmetic and label-aware reductions to xarray DataTree objects
- Apply arbitrary functions to all nodes of a tree

In [None]:
import xarray as xr
import numpy as np

xr.set_options(keep_attrs=True, display_expand_attrs=False, display_expand_data=False)

## Example dataset

First we load the NMC reanalysis air temperature dataset and arrange it to form a hierarchy of temporal resolutions:

In [None]:
ds = xr.tutorial.open_dataset("air_temperature")

ds_daily = ds.resample(time="D").mean("time")
ds_weekly = ds.resample(time="W").mean("time")
ds_monthly = ds.resample(time="ME").mean("time")

tree = xr.DataTree.from_dict(
    {
        "daily": ds_daily,
        "weekly": ds_weekly,
        "monthly": ds_monthly,
        "": xr.Dataset(attrs={"name": "NMC reanalysis temporal pyramid"}),
    }
)
tree

## Arithmetic

As an extension to `Dataset`, `DataTree` objects automatically apply arithmetic to all variables within all nodes:

In [None]:
tree - 273.15

## Indexing

Just like arithmetic, indexing is simply forwarded to the node datasets. The only difference is that nodes that don't have a certain coordinate / dimension are skipped instead of raising an error:

In [None]:
tree.isel(lat=slice(None, 10))

In [None]:
tree.sel(time="2013-11")

## Reductions

In a similar way, we can reduce all nodes in the `DataTree` at once:

In [None]:
tree.mean(dim=["lat", "lon"])

## Applying functions designed for `Dataset` with `map_over_datasets`

What if we wanted to convert the data to log-space? For a `Dataset` or `DataArray`, we could just use {py:func}`xarray.ufuncs.log`, but that does not support `DataTree` objects yet:

In [None]:
xr.ufuncs.log(tree)

Note that the result is an empty `Dataset`.

To map a function to all nodes, we can use either {py:func}`xarray.map_over_datasets` or {py:meth}`xarray.DataTree.map_over_datasets`:

In [None]:
tree.map_over_datasets(xr.ufuncs.log)

We can also use a custom function to perform more complex operations, like subtracting a group mean:

In [None]:
def demean(ds):
    return ds.groupby("time.day") - ds.groupby("time.day").mean()

Applying that to the tree raises an error, though:

In [None]:
tree.map_over_datasets(demean)

The reason for this error is that the root node does not have any variables, and thus in particular no `"time"` coordinate. To avoid the error, we have to skip computing the function for that node:

In [None]:
def demean(ds):
    if "time" not in ds.coords:
        return ds
    return ds.groupby("time.day") - ds.groupby("time.day").mean()


tree.map_over_datasets(demean)