Binary data without lazy loading#
Author: Aureliana Barghini (B-Open)
BackendEntrypoint#
Implement a subclass of BackendEntrypoint
that expose a method open_dataset
:
from xarray.backends import BackendEntrypoint
class MyBackendEntrypoint(BackendEntrypoint):
def open_dataset(
self,
filename_or_obj,
*,
drop_variables=None,
):
return my_open_dataset(filename_or_obj, drop_variables=drop_variables)
BackendEntrypoint integration#
Declare this class as an external plugin in your setup.py
:
setuptools.setup(
entry_points={
'xarray.backends': ['engine_name=package.module:my_backendentrypoint'],
},
)
or pass it in xr.open_dataset
:
xr.open_dataset(filename, engine=MyBackendEntrypoint)
Example backend for binary files#
Create sample files#
Define the entrypoint#
Example of backend to open binary files
class BinaryBackend(xr.backends.BackendEntrypoint):
def open_dataset(
self,
filename_or_obj,
*,
drop_variables=None,
# backend specific parameter
dtype=np.int64,
):
with open(filename_or_obj) as f:
arr = np.fromfile(f, dtype)
var = xr.Variable(dims=("x"), data=arr)
coords = {"x": np.arange(arr.size) * 10}
return xr.Dataset({"foo": var}, coords=coords)
It Works!#
But it may be memory demanding
arr = xr.open_dataarray("foo.bin", engine=BinaryBackend)
arr
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB [30000000 values with dtype=int64] Coordinates: * x (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990
arr = xr.open_dataarray("foo_float.bin", engine=BinaryBackend, dtype=np.float64)
arr
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB [30000000 values with dtype=float64] Coordinates: * x (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990
arr.sel(x=slice(0, 100))
<xarray.DataArray 'foo' (x: 11)> Size: 88B [11 values with dtype=float64] Coordinates: * x (x) int64 88B 0 10 20 30 40 50 60 70 80 90 100