Xarray’s Data structures#

In this lesson, we cover the basics of Xarray data structures. By the end of the lesson, we will be able to:

Learning Goals

Understand the basic Xarray data structures DataArray and Dataset
Customize the display of Xarray data structures
Understand the connection between pandas and Xarray data structures

Data structures#

Xarray provides two data structures: the DataArray and the Dataset. The DataArray class attaches dimension names, coordinates, and attributes to a multi-dimensional array, while the Dataset class combines multiple DataArrays, which may share dimensions and coordinates.

Both classes are most commonly created by reading data. To learn how to create a DataArray or Dataset manually, see the Creating Data Structures tutorial.
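As a rough illustration of what the two classes hold, the sketch below builds a small DataArray by hand and combines two of them into a Dataset. The variable names (temperature, pressure), the coordinate values, and the units attribute are made up for this example and do not come from the lesson's data.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinates shared by both arrays
coords = {"lat": [75.0, 72.5], "lon": [200.0, 202.5, 205.0]}

# A DataArray: values plus dimension names, coordinates and attributes
temperature = xr.DataArray(
    np.zeros((2, 3)),
    dims=("lat", "lon"),
    coords=coords,
    attrs={"units": "K"},
    name="temperature",
)
pressure = xr.DataArray(
    np.ones((2, 3)),
    dims=("lat", "lon"),
    coords=coords,
    name="pressure",
)

# A Dataset: several DataArrays that share the lat/lon coordinates
combined = xr.Dataset({"temperature": temperature, "pressure": pressure})
print(combined)
```

Indexing the Dataset by name, for example combined["temperature"], returns the corresponding DataArray.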

import numpy as np
import xarray as xr
import pandas as pd

# In a Jupyter notebook you may want to customize Xarray's display settings.
# The following settings reduce the amount of data displayed by default.
xr.set_options(display_expand_attrs=False, display_expand_data=False)
np.set_printoptions(threshold=10, edgeitems=2)
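If a setting should apply only to part of a notebook, xr.set_options can also be used as a context manager. The following is a minimal sketch; the array da_demo and its attribute are invented for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical array used only to have something to display
da_demo = xr.DataArray(np.arange(1000.0), dims="x", attrs={"units": "m"})

before = xr.get_options()["display_expand_attrs"]

# The option is changed only inside the with-block
with xr.set_options(display_expand_attrs=True):
    inside = xr.get_options()["display_expand_attrs"]
    text = repr(da_demo)

# On exit, the previous value is restored
after = xr.get_options()["display_expand_attrs"]
print(before, inside, after)
```

Calling xr.set_options outside a with-block, as in the cell above, changes the setting for the rest of the session instead.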

To Pandas and back#

DataArray and Dataset objects are frequently created by converting from other libraries such as pandas or by reading from data storage formats such as NetCDF or Zarr.

To convert to and from pandas, we can use the to_xarray method on pandas objects and the to_pandas method on Xarray objects:

series = pd.Series(np.ones((10,)), index=list("abcdefghij"))
series

a    1.0
b    1.0
c    1.0
d    1.0
e    1.0
f    1.0
g    1.0
h    1.0
i    1.0
j    1.0
dtype: float64

arr = series.to_xarray()
arr

<xarray.DataArray (index: 10)> Size: 80B
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Coordinates:
  * index    (index) object 80B 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'

arr.to_pandas()

index
a    1.0
b    1.0
c    1.0
d    1.0
e    1.0
f    1.0
g    1.0
h    1.0
i    1.0
j    1.0
dtype: float64
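The same round trip works for a DataFrame. The sketch below assumes a small, made-up DataFrame with a two-level MultiIndex (the names letter, number and value are illustrative); each index level becomes a dimension of the resulting Dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-level index: 2 letters x 3 numbers
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": np.arange(6.0)}, index=index)

# Each index level becomes a dimension; each column becomes a data variable
ds_small = df.to_xarray()
print(ds_small)

# Converting back gives a DataFrame indexed by (letter, number) again
back = ds_small.to_dataframe()
print(back)
```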

We can also control which pandas object is returned by calling to_series or to_dataframe:

to_series#

This always converts a DataArray to a pandas.Series, using a MultiIndex when the array has more than one dimension.

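# ds is the Dataset loaded earlier in this lesson; the output below is consistent
# with xr.tutorial.load_dataset("air_temperature"), which is assumed here.
ds.air.to_series()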

time                 lat   lon  
2013-01-01 00:00:00  75.0  200.0    241.20
                           202.5    242.50
                           205.0    243.50
                           207.5    244.00
                           210.0    244.10
                                     ...  
2014-12-31 18:00:00  15.0  320.0    297.39
                           322.5    297.19
                           325.0    296.49
                           327.5    296.19
                           330.0    295.69
Name: air, Length: 3869000, dtype: float64
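To see the MultiIndex on a smaller scale, the sketch below uses a made-up 2 × 3 array rather than the lesson's dataset. The name da_small and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2-D array with named dimensions and coordinates
da_small = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [75.0, 72.5], "lon": [200.0, 202.5, 205.0]},
    name="air",
)

# Each (lat, lon) pair becomes one entry of the Series' MultiIndex
s = da_small.to_series()
print(s)

# Values can be looked up by a (lat, lon) tuple
print(s.loc[(72.5, 205.0)])
```

The Series takes its name from the DataArray, and its index levels are named after the dimensions.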

to_dataframe#

This always converts a DataArray or Dataset to a pandas.DataFrame. Note that a DataArray has to be named for this to work. Because all columns in a DataFrame share the same index, variables with different dimensions are broadcast against each other.

ds.air.to_dataframe()

                                   air
time                lat  lon
2013-01-01 00:00:00 75.0 200.0  241.20
                         202.5  242.50
                         205.0  243.50
                         207.5  244.00
                         210.0  244.10
...                 ...  ...       ...
2014-12-31 18:00:00 15.0 320.0  297.39
                         322.5  297.19
                         325.0  296.49
                         327.5  296.19
                         330.0  295.69

3869000 rows × 1 columns
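The naming requirement and the broadcasting can both be seen on a small example. This is a sketch with invented data: ds_demo, elevation and the column name "values" are hypothetical and not part of the lesson's dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset: "air" is 2-D (lat, lon), "elevation" is 1-D (lat)
ds_demo = xr.Dataset(
    {
        "air": (("lat", "lon"), np.arange(6.0).reshape(2, 3)),
        "elevation": (("lat",), [10.0, 20.0]),
    },
    coords={"lat": [75.0, 72.5], "lon": [200.0, 202.5, 205.0]},
)

# Both columns share the (lat, lon) index, so each elevation value
# is repeated for every lon value
df = ds_demo.to_dataframe()
print(df)

# An unnamed DataArray cannot be converted directly
unnamed = xr.DataArray([1.0, 2.0], dims="x")
try:
    unnamed.to_dataframe()
    raised = False
except ValueError:
    raised = True

# Supplying a name at conversion time works
named_df = unnamed.to_dataframe(name="values")
print(named_df)
```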
