xarray.DataTree and hierarchical data#

In this lesson, we will learn how to use xarray.DataTree with hierarchical data. By the end of the lesson, we will be able to:

Learning Goals

  • Understand a basic DataTree structure (nodes, parents and children)

  • Select nodes and variables from a DataTree

  • Understand coordinate inheritance

What is a DataTree#

Example DataTree 1#

image.png

Example DataTree 2#

image.png

import xarray as xr

Opening a dataset with open_datatree()#

Let’s open up a precipitation dataset. This dataset was derived from “GPM_3IMERGHH_07” and “M2T1NXFLX_5.12.4” products.

precipitation = xr.tutorial.open_datatree('precipitation.nc4')

Nodes#

In the DataTree model, each group in a netCDF4 or HDF5 file is represented as a “node”. We can list all of the groups with .groups:

precipitation.groups
('/', '/observed', '/reanalysis')

Accessing variables in nested groups#

Nested variables and groups can be accessed with either dict-like syntax (including path-like strings such as '/observed/precipitation') or attribute-like syntax.

precipitation['observed']

# Returns a DataTree object, containing the variables, dimensions, and coordinates in the "observed" node
<xarray.DatasetView> Size: 2MB
Dimensions:        (time: 10, lon: 320, lat: 150)
Coordinates:
  * time           (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-...
  * lon            (lon) float32 1kB -109.9 -109.8 -109.8 ... -78.15 -78.05
  * lat            (lat) float32 600B 20.05 20.15 20.25 ... 34.75 34.85 34.95
Data variables:
    precipitation  (time, lon, lat) float32 2MB ...
precipitation['/observed/precipitation']
<xarray.DataArray 'precipitation' (time: 10, lon: 320, lat: 150)> Size: 2MB
[480000 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-29T16:...
  * lon      (lon) float32 1kB -109.9 -109.8 -109.8 ... -78.25 -78.15 -78.05
  * lat      (lat) float32 600B 20.05 20.15 20.25 20.35 ... 34.75 34.85 34.95
Attributes:
    LongName:          \nComplete merged microwave-infrared (gauge-adjusted)\...
    Units:             mm/hr
    units:             mm/hr
    CodeMissingValue:  -9999.9
    DimensionNames:    time,lon,lat
precipitation.reanalysis.precipitation

# Attribute-based syntax
<xarray.DataArray 'precipitation' (time: 10, lat: 31, lon: 52)> Size: 64kB
[16120 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-29T16:...
  * lon      (lon) float64 416B -110.0 -109.4 -108.8 ... -79.38 -78.75 -78.12
  * lat      (lat) float64 248B 20.0 20.5 21.0 21.5 22.0 ... 33.5 34.0 34.5 35.0

Get the parent and child nodes from a group#

precipitation['reanalysis'].parent
<xarray.DatasetView> Size: 80B
Dimensions:  (time: 10)
Coordinates:
  * time     (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-29T16:...
Data variables:
    *empty*
precipitation.children
Frozen({'observed': <xarray.DataTree 'observed'>
Group: /observed
    Dimensions:        (time: 10, lon: 320, lat: 150)
    Coordinates:
      * lon            (lon) float32 1kB -109.9 -109.8 -109.8 ... -78.15 -78.05
      * lat            (lat) float32 600B 20.05 20.15 20.25 ... 34.75 34.85 34.95
    Inherited coordinates:
      * time           (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-...
    Data variables:
        precipitation  (time, lon, lat) float32 2MB ..., 'reanalysis': <xarray.DataTree 'reanalysis'>
Group: /reanalysis
    Dimensions:        (time: 10, lat: 31, lon: 52)
    Coordinates:
      * lon            (lon) float64 416B -110.0 -109.4 -108.8 ... -78.75 -78.12
      * lat            (lat) float64 248B 20.0 20.5 21.0 21.5 ... 34.0 34.5 35.0
    Inherited coordinates:
      * time           (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-...
    Data variables:
        precipitation  (time, lat, lon) float32 64kB ...})

Inheritance#

DataTree implements a simple inheritance mechanism. Coordinates, dimensions and their associated indices are propagated downward from the root node to all descendant nodes.

Let’s take a look at some inherited coordinates in our precipitation dataset:

precipitation.time
<xarray.DataArray 'time' (time: 10)> Size: 80B
array(['2021-08-29T07:30:00.000000000', '2021-08-29T08:30:00.000000000',
       '2021-08-29T09:30:00.000000000', '2021-08-29T10:30:00.000000000',
       '2021-08-29T11:30:00.000000000', '2021-08-29T12:30:00.000000000',
       '2021-08-29T13:30:00.000000000', '2021-08-29T14:30:00.000000000',
       '2021-08-29T15:30:00.000000000', '2021-08-29T16:30:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-29T16:...

The "time" coordinate is defined at the root node of our dataset and is propagated downward to the “observed” and “reanalysis” groups:

precipitation.observed
<xarray.DatasetView> Size: 2MB
Dimensions:        (time: 10, lon: 320, lat: 150)
Coordinates:
  * time           (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-...
  * lon            (lon) float32 1kB -109.9 -109.8 -109.8 ... -78.15 -78.05
  * lat            (lat) float32 600B 20.05 20.15 20.25 ... 34.75 34.85 34.95
Data variables:
    precipitation  (time, lon, lat) float32 2MB ...
precipitation.reanalysis
<xarray.DatasetView> Size: 65kB
Dimensions:        (time: 10, lat: 31, lon: 52)
Coordinates:
  * time           (time) datetime64[ns] 80B 2021-08-29T07:30:00 ... 2021-08-...
  * lon            (lon) float64 416B -110.0 -109.4 -108.8 ... -78.75 -78.12
  * lat            (lat) float64 248B 20.0 20.5 21.0 21.5 ... 34.0 34.5 35.0
Data variables:
    precipitation  (time, lat, lon) float32 64kB ...

Review#

Example DataTree 1#

image.png

Example DataTree 2#

image.png

Exercises#

  1. Make a plot of the observed and reanalysis precipitation

  2. Think through whether there are datasets from your field that would fit a hierarchical DataTree structure