Binary data without lazy loading#

Author: Aureliana Barghini (B-Open)

BackendEntrypoint#

Implement a subclass of BackendEntrypoint that exposes an open_dataset method:

from xarray.backends import BackendEntrypoint

class MyBackendEntrypoint(BackendEntrypoint):
    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
    ):
        # my_open_dataset is a placeholder for your own reader function
        return my_open_dataset(filename_or_obj, drop_variables=drop_variables)

BackendEntrypoint integration#

Declare this class as an external plugin in your setup.py:

setuptools.setup(
    entry_points={
        'xarray.backends': ['engine_name=package.module:MyBackendEntrypoint'],
    },
)

Alternatively, pass the class directly to xr.open_dataset:

xr.open_dataset(filename, engine=MyBackendEntrypoint)
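
Once a package declaring the entry point is installed, the engine name should show up among the engines that xarray discovers. A minimal sketch for checking this, assuming the placeholder name engine_name from the setup.py snippet above (it will only be listed if such a package is actually installed):

```python
import xarray as xr

# Mapping of engine name -> backend entrypoint for every backend xarray can see.
engines = xr.backends.list_engines()
print(sorted(engines))

# "engine_name" is the placeholder used in the setup.py snippet; it is only
# present if a package registering that entry point has been installed.
is_registered = "engine_name" in engines
print(is_registered)
```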

Example backend for binary files#

import numpy as np
import xarray as xr

Create sample files#

arr = np.arange(30000000, dtype=np.int64)
with open("foo.bin", "wb") as f:
    arr.tofile(f)

arr = np.arange(30000000, dtype=np.float64)
with open("foo_float.bin", "wb") as f:
    arr.tofile(f)
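
Files written this way are raw dumps with no header: the size on disk is the element count times the item size, and nothing in the file records the dtype. That is why a reader has to be told which dtype to use. A small sketch of this, using a shorter array and a hypothetical temporary file name:

```python
import os
import tempfile

import numpy as np

# Small stand-in for the 30-million-element arrays above.
small = np.arange(1000, dtype=np.int64)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "small.bin")  # hypothetical file name
    small.tofile(path)

    # A raw dump has no header: size on disk is count * itemsize.
    size_on_disk = os.path.getsize(path)

    # Reading with the matching dtype recovers the original values.
    as_int = np.fromfile(path, dtype=np.int64)
    # Reading with another 8-byte dtype silently reinterprets the same bytes.
    as_float = np.fromfile(path, dtype=np.float64)

round_trip_ok = np.array_equal(as_int, small)
print(size_on_disk, round_trip_ok, as_float.dtype)
```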

Define the entrypoint#

An example backend that opens raw binary files:

class BinaryBackend(xr.backends.BackendEntrypoint):
    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        # backend specific parameter
        dtype=np.int64,
    ):
        # read the raw bytes in binary mode; the whole file is loaded eagerly
        with open(filename_or_obj, "rb") as f:
            arr = np.fromfile(f, dtype=dtype)

        var = xr.Variable(dims=("x",), data=arr)
        coords = {"x": np.arange(arr.size) * 10}
        return xr.Dataset({"foo": var}, coords=coords)
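
The backend above accepts drop_variables but does not use it, and it does not tell xarray which files it can handle. The following is a sketch of one way to address both, not the tutorial's own implementation: the class name BinaryBackendWithGuess, the ".bin" suffix check, and the demo file name are assumptions made for illustration. guess_can_open is the optional BackendEntrypoint method xarray consults when no engine is given.

```python
import os
import tempfile

import numpy as np
import xarray as xr


class BinaryBackendWithGuess(xr.backends.BackendEntrypoint):
    """Hypothetical variant of BinaryBackend, for illustration only."""

    description = "Open raw binary files as a 1-D variable"

    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        dtype=np.int64,
    ):
        arr = np.fromfile(filename_or_obj, dtype=dtype)
        var = xr.Variable(dims=("x",), data=arr)
        coords = {"x": np.arange(arr.size) * 10}
        ds = xr.Dataset({"foo": var}, coords=coords)
        # Honor drop_variables instead of silently ignoring it.
        if drop_variables is not None:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds

    def guess_can_open(self, filename_or_obj):
        # Assumption: files meant for this backend end in ".bin".
        try:
            return os.path.splitext(filename_or_obj)[1] == ".bin"
        except TypeError:
            # Not a path-like object (for example an open file handle).
            return False


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.bin")  # hypothetical file name
    np.arange(5, dtype=np.float64).tofile(path)

    backend = BinaryBackendWithGuess()
    can_open = backend.guess_can_open(path)
    ds = backend.open_dataset(path, dtype=np.float64)
    ds_dropped = backend.open_dataset(
        path, drop_variables=["foo"], dtype=np.float64
    )

print(can_open)
print(ds)
```

Calling the backend instance directly, as done here, keeps the sketch independent of engine discovery; the same class can also be passed as engine= to xr.open_dataset.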

It Works!#

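But it can be memory-demanding: np.fromfile reads the entire file into memory as soon as the dataset is opened, even if only a small slice is used afterwards.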

arr = xr.open_dataarray("foo.bin", engine=BinaryBackend)
arr
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB
[30000000 values with dtype=int64]
Coordinates:
  * x        (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990

arr = xr.open_dataarray("foo_float.bin", engine=BinaryBackend, dtype=np.float64)
arr
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB
[30000000 values with dtype=float64]
Coordinates:
  * x        (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990

arr.sel(x=slice(0, 100))
<xarray.DataArray 'foo' (x: 11)> Size: 88B
[11 values with dtype=float64]
Coordinates:
  * x        (x) int64 88B 0 10 20 30 40 50 60 70 80 90 100
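
The sizes in the reprs above follow directly from the element count: 30,000,000 values of 8 bytes each give 240MB for the data and, since the x coordinate is also a full int64 array, another 240MB for the coordinate. Only the final selection of 11 values is small. A quick check of that arithmetic (a sketch, using only the numbers shown above):

```python
import numpy as np

n_values = 30_000_000
# int64 and float64 both use 8 bytes per value.
bytes_per_value = np.dtype(np.int64).itemsize

data_bytes = n_values * bytes_per_value   # the "foo" variable
coord_bytes = n_values * bytes_per_value  # the int64 "x" coordinate
slice_bytes = 11 * bytes_per_value        # result of sel(x=slice(0, 100))

print(f"data:  {data_bytes / 1e6:.0f} MB")
print(f"coord: {coord_bytes / 1e6:.0f} MB")
print(f"slice: {slice_bytes} B")
```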