# Core dimensions

[Previously](gentle-intro) we learned to use `apply_ufunc` on simple functions that acted element by element. 

Here we move on to slightly more complex functions like `np.mean` that can act along a subset of an input array's dimensions.

Such operations involve the concept of "core dimensions". 

Our learning goals are:
- Learn how to identify "core dimensions" for the function you're applying.
- Learn that "core dimensions" are automatically moved or transposed to the end of the array.


## Introduction

For more complex operations that consider some array values collectively,
it's important to understand the idea of **core dimensions**.
Usually, they correspond to the fundamental dimensions over
which an operation is defined, e.g., the summed axis in `np.sum`. One way to think about core dimensions
is to consider the smallest dimensionality of data that the function acts on.

```{important}

A good clue that core dimensions are needed is the presence of an `axis` argument on the
corresponding NumPy function.

```
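
To make the "smallest dimensionality" idea concrete, here is a minimal sketch using a plain NumPy array. The array `temps` and its shape are made up for illustration. `np.mean` fundamentally acts on a 1D vector, and the `axis` argument only says which dimension supplies that vector.

```python
import numpy as np

# Hypothetical data: 4 time steps on a 2 x 3 grid, i.e. (time, lat, lon).
temps = np.arange(24.0).reshape(4, 2, 3)

# Reducing along axis 0 consumes the "time" dimension.
mean_over_time = np.mean(temps, axis=0)
print(mean_over_time.shape)  # (2, 3)

# This is equivalent to applying the 1D function at every grid point.
by_hand = np.empty((2, 3))
for i in range(2):
    for j in range(3):
        by_hand[i, j] = np.mean(temps[:, i, j])
```

In this sketch, the dimension that is consumed (`time`) plays the role of the core dimension, while the remaining grid dimensions are simply looped over.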


## Setup

In [None]:
%xmode minimal

import numpy as np
import xarray as xr

# limit the amount of information printed to screen
xr.set_options(display_expand_data=False)
np.set_printoptions(threshold=10, edgeitems=2)

Let's load a dataset.

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds

## Reducing with `np.mean`

Let's write a function that computes the mean along `time` for a provided xarray object. 

This function requires one core dimension, `time`. Note that for `ds.air`, `time` is the 0th axis.

In [None]:
ds.air.dims

`get_axis_num` is a useful method for looking up the integer axis position of a named dimension.

In [None]:
ds.air.get_axis_num("time")

In [None]:
np.mean(ds.air, axis=ds.air.get_axis_num("time"))

In [None]:
np.mean(ds.air.data, axis=0)

Let's try to use `apply_ufunc` to replicate `np.mean(ds.air.data, axis=0)`.

In [None]:
xr.apply_ufunc(
    # function to apply
    np.mean,
    # object with data to pass to function
    ds,
    # keyword arguments to pass to np.mean
    kwargs={"axis": 0},
)

The error here
```
applied function returned data with unexpected number of dimensions. 
Received 2 dimension(s) but expected 3 dimensions with names: ('time', 'lat', 'lon')
```

means that while `np.mean` did indeed reduce one dimension, we did not tell `apply_ufunc` that this would happen. That is, we need to specify the core dimensions on the input.

Do that by passing a list of dimension names for each input object. This function has one input, `ds`, with a single core dimension `"time"`, so we pass `input_core_dims=[["time"]]`.

In [None]:
xr.apply_ufunc(
    np.mean,
    ds,
    # specify core dimensions as a list of lists
    # here 'time' is the core dimension on `ds`
    input_core_dims=[
        ["time"],  # core dimension for ds
    ],
    kwargs={"axis": 0},
)

This next error is a little confusing.

```
size of dimension 'lat' on inputs was unexpectedly changed by applied function from 25 to 53. 
Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size.
```


A good trick here is to pass a little wrapper function to `apply_ufunc` instead and inspect the shapes of data received by the wrapper.


In [None]:
def wrapper(array, **kwargs):
    print(f"received {type(array)} shape: {array.shape}, kwargs: {kwargs}")
    result = np.mean(array, **kwargs)
    print(f"result.shape: {result.shape}")
    return result


xr.apply_ufunc(
    wrapper,
    ds,
    # specify core dimensions as a list of lists
    # here 'time' is the core dimension on `ds`
    input_core_dims=[["time"]],
    kwargs={"axis": 0},
)

Now we see the issue:

    received <class 'numpy.ndarray'> shape: (25, 53, 2920), kwargs: {'axis': 0}
    result.shape: (53, 2920)
    
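The `time` dimension is of size `2920` and is now the last axis of the array, but it was initially the first axis. With `axis=0`, `np.mean` therefore averaged over `lat` (size 25) instead of `time`, and returned an array of shape `(53, 2920)` where `apply_ufunc` expected `(25, 53)`.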

In [None]:
ds.air.get_axis_num("time")

```{important}
This illustrates an important concept. Arrays are transposed so that core dimensions are at the end.
```

With `apply_ufunc`, core dimensions are recognized by name and then moved to
the end of each input array before the given function is applied.
This means that for functions that accept an `axis` argument, you usually need
to set `axis=-1`.

Such behaviour means that our functions (like `wrapper` or `np.mean`) do not need to know the exact order of dimensions. They can rely on the core dimensions being at the end, which allows us to write very general code!
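
The transposition can be imitated with plain NumPy. This is only a rough sketch of the idea, not a description of how `apply_ufunc` is implemented. The array `air` is a made-up, scaled-down stand-in for `ds.air`.

```python
import numpy as np

# Made-up stand-in for ds.air with dims (time, lat, lon).
air = np.random.default_rng(0).normal(size=(6, 3, 4))

# Imitate what happens to a core dimension: move "time" (axis 0) to the end.
moved = np.moveaxis(air, 0, -1)
print(moved.shape)  # (3, 4, 6)

# axis=0 now reduces "lat", which is what triggered the confusing error.
wrong = np.mean(moved, axis=0)
print(wrong.shape)  # (4, 6)

# axis=-1 reduces "time", regardless of where it sat originally.
right = np.mean(moved, axis=-1)
print(right.shape)  # (3, 4)
```

Here `right` matches `np.mean(air, axis=0)`, the reduction we set out to replicate.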

We can fix our `apply_ufunc` call by specifying `axis=-1` instead.

In [None]:
def wrapper(array, **kwargs):
    print(f"received {type(array)} shape: {array.shape}, kwargs: {kwargs}")
    result = np.mean(array, **kwargs)
    print(f"result.shape: {result.shape}")
    return result


xr.apply_ufunc(
    wrapper,
    ds,
    input_core_dims=[["time"]],
    kwargs={"axis": -1},
)
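
As a quick sanity check, the same call can be tried on a small synthetic `DataArray` and compared against xarray's built-in reduction. The array `da` below is invented for illustration, and `time` is deliberately placed in the middle to show that its original position does not matter.

```python
import numpy as np
import xarray as xr

# Made-up data with "time" as the middle dimension.
da = xr.DataArray(
    np.arange(24.0).reshape(2, 4, 3),
    dims=("lat", "time", "lon"),
)

result = xr.apply_ufunc(
    np.mean,
    da,
    input_core_dims=[["time"]],  # "time" is moved to the last axis
    kwargs={"axis": -1},  # so we reduce along the last axis
)

print(result.dims)  # ('lat', 'lon')

# Same values as the built-in reduction over "time".
xr.testing.assert_allclose(result, da.mean("time"))
```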

::::{admonition} Exercise
:class: tip

Use `apply_ufunc` to apply `scipy.integrate.trapezoid` along the `time` axis.

:::{admonition} Solution
:class: dropdown

```python
import scipy.integrate

xr.apply_ufunc(scipy.integrate.trapezoid, ds, input_core_dims=[["time"]], kwargs={"axis": -1})
```
:::
::::