A Tour of Xarray Customizations#

Xarray is easily extensible. This means it is easy to add on to to build custom packages that tackle particular computational problems or supply domain specific functionality.

These packages can plug in to xarray in various different ways. They may build directly on top of xarray, or they may take advantage of some of xarray’s dedicated interfacing features:

  • Accessors (extensions)

  • Backend (filetype) entrypoint

  • Metadata attributes

  • Duck-array wrapping interface

  • Flexible indexes (coming soon!)

Here we introduce several popular or interesting extensions that are installable as their own packages (via conda and pip). These packages integrate with xarray using one or more of the features mentioned above.

  • hvplot, a powerful interactive plotting library

  • rioxarray, for working with geospatial raster data using rasterio

  • cf-xarray, for interpreting CF-compliant data

  • pint-xarray, for unit-aware computations using pint.

Specific examples for implementing your own xarray customizations using these interfacing features are available in the ADVANCED section of this book.

import xarray as xr
import numpy as np

Quick note on accessors#

Before we look at the packages we need to briefly introduce a feature they commonly use: “xarray accessors”.

The accessor-style syntax is used heavily by the other libraries we are about to cover.

For users, accessors just allow us to have a familiar dot (method-like) syntax on xarray objects, for example da.hvplot(), da.pint.quantify(), or da.cf.describe().

hvplot via accessors#

The HoloViews library makes great use of accessors to allow seamless plotting of xarray data using a completely different plotting backend (by default, xarray uses matplotlib).

We first need to import the code that registers the hvplot accessor

import hvplot.xarray

And now we can call the .hvplot method to plot using holoviews in the same way that we would have used .plot to plot using matplotlib.

ds = xr.tutorial.load_dataset("air_temperature")
ds['air'].isel(time=1).hvplot(cmap="fire")

For some more examples of how powerful HoloViews is see here.

Rioxarray via the backend entrypoint#

Rioxarray is a Python library that enhances Xarray’s ability to work with geospatial data and coordinate reference systems. Geographic information systems use GeoTIFF and many other formats to organize and store gridded, or raster, datasets.

The Geospatial Data Abstraction Library (GDAL) provides foundational drivers and geospatial algorithms, and the rasterio library provides a Pythonic interface to GDAL. Rioxarray brings key features of rasterio to Xarray:

  1. A backend engine to read any format recognized by GDAL

  2. A .rio accessor for rasterio’s geospatial algorithms such as reprojection

Below a couple brief examples to illustrate these features:

import rioxarray  # ensure you have rioxarray installed in your environment

You can explicitly use rioxarray’s ‘rasterio’ engine to load myriad geospatial raster formats. Below is a Cloud-Optimized Geotiff from an AWS public dataset of synthetic aperture radar data over Washington, State, USA. overview_level=4 is an argument specific to the rasterio engine that allows opening a pre-computed lower resolution “overview” of the data.

url = 'https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/U/CU/2017/S1A_20170101_10UCU_ASC/Gamma0_VV.tif'
da = xr.open_dataarray(url, engine='rasterio', open_kwargs={"overview_level": 2})
da
<xarray.DataArray 'band_data' (band: 1, y: 687, x: 687)> Size: 2MB
[471969 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 5kB 3.001e+05 3.002e+05 ... 4.096e+05 4.097e+05
  * y            (y) float64 5kB 5.4e+06 5.4e+06 5.4e+06 ... 5.29e+06 5.29e+06
    spatial_ref  int64 8B ...
Attributes:
    ABSOLUTE_ORBIT_NUMBER:  14632
    DATE:                   2017-01-01
    MISSION_ID:             S1A
    NUMBER_SCENES:          1
    ORBIT_DIRECTION:        ascending
    OVR_RESAMPLING_ALG:     AVERAGE
    SCENES:                 S1A_IW_GRDH_1SDV_20170101T021007_20170101T021036_...
    SCENE_1_METADATA:       {"title": "S1A_IW_GRDH_1SDV_20170101T021007_20170...
    SCENE_1_PRODUCT_INFO:   {"id": "S1A_IW_GRDH_1SDV_20170101T021007_20170101...
    TILE_ID:                10UCU
    VALID_PIXEL_PERCENT:    66.318
    AREA_OR_POINT:          Area

The spatial_ref coordinate is added by rioxarray to store standardized geospatial Coordinate Reference System (CRS) information. We can access that information and additional methods via the .rio accessor:

da.rio.crs
CRS.from_epsg(32610)

EPSG refers to ‘European Petroleum Survey Group’, a database of the many CRS definitions for our Planet used over the years! EPSG=32610 is the “UTM 10N” CRS, with coordinate units in meters. Let’s say you want longitude,latitude coordinate points in degrees instead. You’d have to reproject this data:

da_lonlat = da.rio.reproject('epsg:4326')
da_lonlat
<xarray.DataArray 'band_data' (band: 1, y: 548, x: 820)> Size: 2MB
array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
Coordinates:
  * x            (x) float64 7kB -125.7 -125.7 -125.7 ... -124.2 -124.2 -124.2
  * y            (y) float64 4kB 48.75 48.74 48.74 48.74 ... 47.74 47.74 47.73
  * band         (band) int64 8B 1
    spatial_ref  int64 8B 0
Attributes:
    ABSOLUTE_ORBIT_NUMBER:  14632
    DATE:                   2017-01-01
    MISSION_ID:             S1A
    NUMBER_SCENES:          1
    ORBIT_DIRECTION:        ascending
    OVR_RESAMPLING_ALG:     AVERAGE
    SCENES:                 S1A_IW_GRDH_1SDV_20170101T021007_20170101T021036_...
    SCENE_1_METADATA:       {"title": "S1A_IW_GRDH_1SDV_20170101T021007_20170...
    SCENE_1_PRODUCT_INFO:   {"id": "S1A_IW_GRDH_1SDV_20170101T021007_20170101...
    TILE_ID:                10UCU
    VALID_PIXEL_PERCENT:    66.318
    AREA_OR_POINT:          Area

Note that that the size of the data has changed as well as the coordinate values. This is typical of reprojection, as your data must be resampled and often interpolated to match the new CRS grid! A quick plot will compare the results of our reprojected data:

import panel as pn

img1 = da.sel(band=1).hvplot.image(
    x='x', y='y', rasterize=True, cmap='gray', clim=(0, 0.5), title='UTM'
)
img2 = da_lonlat.sel(band=1).hvplot.image(
    rasterize=True, cmap='gray', clim=(0, 0.5), title='LON/LAT'
)

pn.Column(img1, img2)