Advanced Indexing

Learning Objectives

Distinguish orthogonal indexing from pointwise (vectorized) indexing.
Use pointwise indexing in Xarray to extract data at a collection of points.
Understand the difference between NumPy and Xarray indexing behavior.

Overview

In the previous notebooks, we learned basic forms of indexing with Xarray, including positional and label-based indexing, datetime indexing, and nearest neighbor lookups. We also learned that indexing an Xarray DataArray directly works (mostly) like it does for NumPy arrays; however, Xarray's indexing behavior deviates from NumPy's when multiple arrays are used as indexers, as in arr[[0, 1], [0, 1]].

To better understand this difference, let’s take a look at an example of a 2D 5x5 array:

import numpy as np

# Create a 5x5 array with values from 1 to 25
np_array = np.arange(1, 26).reshape(5, 5)
np_array

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

Now create an Xarray DataArray from this NumPy array:

import xarray as xr

da = xr.DataArray(np_array, dims=["x", "y"])
da

<xarray.DataArray (x: 5, y: 5)> Size: 200B
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
Dimensions without coordinates: x, y

Now, let’s see how the indexing behavior differs between a NumPy array and an Xarray DataArray when indexing with multiple arrays:

np_array[[0, 2, 4], [0, 2, 4]]

array([ 1, 13, 25])

da[[0, 2, 4], [0, 2, 4]]

<xarray.DataArray (x: 3, y: 3)> Size: 72B
array([[ 1,  3,  5],
       [11, 13, 15],
       [21, 23, 25]])
Dimensions without coordinates: x, y

The image below summarizes the difference between vectorized and orthogonal indexing for a 2D 5x5 NumPy array and Xarray DataArray:

[Figure: Orthogonal vs. Vectorized Indexing]

Pointwise or Vectorized indexing, shown on the left, selects specific elements at given coordinates, resulting in an array of those individual elements. In the example shown, the indices [0, 2, 4], [0, 2, 4] select the elements at positions (0, 0), (2, 2), and (4, 4), resulting in the values [1, 13, 25]. This is the default behavior of NumPy arrays.

In contrast, orthogonal indexing uses the same indices to select entire rows and columns, forming the Cartesian product of the specified indices. This method results in sub-arrays that include all combinations of the selected rows and columns. The example demonstrates this by selecting rows 0, 2, and 4 and columns 0, 2, and 4, resulting in a subarray containing [[1, 3, 5], [11, 13, 15], [21, 23, 25]]. This is Xarray DataArray’s default behavior.

For these indices, the output of vectorized indexing is a 1D array of length 3, while the output of orthogonal indexing is a 3x3 array.
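To make the two selection rules concrete, the following minimal sketch reproduces both results with plain Python loops over the same 5x5 array. The names `rows`, `cols`, `pointwise`, and `orthogonal` are illustrative only and are not part of NumPy or Xarray.

```python
import numpy as np

np_array = np.arange(1, 26).reshape(5, 5)
rows = [0, 2, 4]
cols = [0, 2, 4]

# Pointwise (vectorized): pair the indices up element by element.
pointwise = [int(np_array[i, j]) for i, j in zip(rows, cols)]

# Orthogonal (outer): take every combination of a selected row and column.
orthogonal = [[int(np_array[i, j]) for j in cols] for i in rows]

print(pointwise)   # [1, 13, 25]
print(orthogonal)  # [[1, 3, 5], [11, 13, 15], [21, 23, 25]]
```

The pointwise result has one element per index pair, whereas the orthogonal result has one element per row/column combination.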

Tip

To Summarize:

Pointwise or vectorized indexing is the more general form of indexing and allows arbitrary combinations of indexing arrays. The indexers are broadcast against each other, following NumPy's broadcasting rules, and the shape of the result is determined by the shape of the indexers. This is the default behavior in NumPy.
Orthogonal or outer indexing treats each indexer as a one-dimensional array and applies it independently along its own dimension, so the result is the same as indexing one dimension at a time with integer or boolean arrays. This is analogous to vector indexing in languages like MATLAB, Fortran, and R. This is the default behavior in Xarray.
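The two points above can be illustrated with a short sketch. It assumes the same 5x5 array as before; the variable names (`rows`, `cols`, `corners`, `mask`, `da_outer`) are made up for this example. The NumPy part shows that the result takes the shape of the indexers, and the Xarray part shows boolean masks acting independently along each dimension.

```python
import numpy as np
import xarray as xr

np_array = np.arange(1, 26).reshape(5, 5)
da = xr.DataArray(np_array, dims=["x", "y"])

# NumPy (vectorized): 2x2 indexers give a 2x2 result, here the four corners.
rows = np.array([[0, 0], [4, 4]])
cols = np.array([[0, 4], [0, 4]])
corners = np_array[rows, cols]

# Xarray (orthogonal): each boolean mask selects along its own dimension,
# so three selected rows and three selected columns give a 3x3 block.
mask = np.array([True, False, True, False, True])
da_outer = da[mask, mask]

print(corners.shape, da_outer.shape)
```

Here `corners` holds the values 1, 5, 21, and 25 arranged as a 2x2 array, while `da_outer` is the same 3x3 block obtained earlier with `da[[0, 2, 4], [0, 2, 4]]`.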

Orthogonal indexing with NumPy

While pointwise indexing is the default behavior in NumPy, you can achieve orthogonal indexing with the np.ix_ function. This function constructs an open mesh from multiple arrays, allowing you to index along each dimension independently, similar to Xarray's indexing behavior. For example:

ixgrid = np.ix_([0, 2, 4], [0, 2, 4])
np_array[ixgrid]
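As a quick check of what `np.ix_` builds, the sketch below inspects the shapes of the index arrays it returns and compares the NumPy result with Xarray's default indexing. The names `np_subset` and `da_subset` are introduced here for illustration.

```python
import numpy as np
import xarray as xr

np_array = np.arange(1, 26).reshape(5, 5)
da = xr.DataArray(np_array, dims=["x", "y"])

ixgrid = np.ix_([0, 2, 4], [0, 2, 4])

# np.ix_ returns one index array per dimension, shaped so that they
# broadcast against each other: (3, 1) for rows and (1, 3) for columns.
print([ix.shape for ix in ixgrid])  # [(3, 1), (1, 3)]

np_subset = np_array[ixgrid]
da_subset = da[[0, 2, 4], [0, 2, 4]]

# Both approaches select the same 3x3 block.
print(np.array_equal(np_subset, da_subset.values))  # True
```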

Exercises

Exercise

In the simple 2D 5x5 Xarray DataArray above, use pointwise indexing to select the elements at positions (0, 0), (2, 2), and (4, 4):

Solution

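indices = np.array([0, 2, 4])

# Indexers that share a dimension name ("points") are paired element by element
xs_da = xr.DataArray(indices, dims="points")
ys_da = xr.DataArray(indices, dims="points")

# Pass xs_da for x and ys_da for y
subset_da = da.sel(x=xs_da, y=ys_da)
subset_da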
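The same pattern works when the x and y positions differ. The sketch below is a variation on the solution, not part of the original exercise: the index values and the names `xs`, `ys`, and `picked` are chosen for illustration, and `isel` is used because the positions are integer offsets.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(1, 26).reshape(5, 5), dims=["x", "y"])

# Indexers that share the dimension name "points" are paired element by
# element instead of being combined orthogonally.
xs = xr.DataArray([0, 1, 4], dims="points")
ys = xr.DataArray([3, 1, 0], dims="points")

picked = da.isel(x=xs, y=ys)

print(picked.dims)    # ('points',)
print(picked.values)  # [ 4  7 21]
```

The result is one-dimensional along the new `points` dimension, with one value per (x, y) pair: positions (0, 3), (1, 1), and (4, 0).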

Additional Resources

Xarray Docs - Indexing and Selecting Data
