Skip to content

Kvikio backend entrypoint with Zarr v3 #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 39 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
9deadb7
Add Kvikio backend entrypoint
dcherian Aug 2, 2022
aa2dc91
Add demo notebook
dcherian Aug 2, 2022
7fb4b94
Update kvikio notebook
dcherian Aug 16, 2022
743fe7d
Merge branch 'main' into kvikio-entrypoint
dcherian Aug 17, 2022
5d501e4
Merge branch 'main' into kvikio-entrypoint
andersy005 Aug 17, 2022
facf5f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 17, 2022
f3f5189
Update cupy_xarray/kvikio.py
dcherian Aug 17, 2022
9c98d19
Merge branch 'main' into kvikio-entrypoint
andersy005 Jan 3, 2023
dd8bc57
Merge branch 'main' into kvikio-entrypoint
dcherian Jan 20, 2023
d2da1e4
Add url, description.
dcherian Jan 21, 2023
b87c3c2
Working
dcherian Aug 18, 2023
87cb74e
Updated notebook
dcherian Aug 22, 2023
d7394ef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 22, 2023
1b23fef
Merge remote-tracking branch 'upstream/main' into kvikio-entrypoint
dcherian Nov 3, 2023
ca0cf45
Add tests
dcherian Nov 3, 2023
97260d6
Merge branch 'main' into kvikio-entrypoint
weiji14 Jun 21, 2024
5d27b26
Move kvikio notebook under docs/source
weiji14 Jun 21, 2024
85491d7
Add zarr as a dependency in ci/doc.yml
weiji14 Jun 22, 2024
c470b97
Add entry for KvikioBackendEntrypoint in API docs
weiji14 Jun 22, 2024
95efa18
Fix input argument into CupyZarrArrayWrapper
weiji14 Jun 22, 2024
d684dad
Merge branch 'main' into kvikio-entrypoint
weiji14 Dec 14, 2024
ae2a7f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2024
15fbafd
Re-add kvikio backend entrypoint to pyproject.toml
weiji14 Dec 14, 2024
f3df115
Fix C408 and E402
weiji14 Dec 14, 2024
4e1857a
Use get_duck_array instead of get_array
weiji14 Dec 14, 2024
7345b61
Fix SIM108 Use ternary operator
weiji14 Dec 16, 2024
e2b410e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
7dd78e9
Install nightly version of kvikio=25.04.00a and zarr>=3.0.5
weiji14 Mar 10, 2025
cb77678
Remove custom open_store_variable method from GDSZarrStore class
weiji14 Mar 10, 2025
0262151
Fix UserWarning compressor -> compressors
weiji14 Mar 10, 2025
e26ed24
Reuse logic from xarray.backends.zarr.ZarrStore.open_group
weiji14 Mar 11, 2025
f185b44
Install xarray=2025.1.3.dev22+g0184702f
weiji14 Mar 11, 2025
1a52ce5
Add zarr.config.enable_gpu() context manager to test_lazy_indexing
weiji14 Mar 11, 2025
789a9b6
Refresh kvikIO demo notebook
weiji14 Mar 11, 2025
3894d29
Merge branch 'main' into kvikio-backend
weiji14 Mar 11, 2025
7fa7c06
Bump xarray from 2025.1.3.dev22+g0184702f to 2025.03.0
weiji14 Mar 21, 2025
1e205ec
Try overriding default prototype to be GPU buffer
weiji14 Apr 1, 2025
9227810
Merge branch 'main' into kvikio-backend
weiji14 Apr 11, 2025
b45decb
Update to stable version of kvikio=25.04.00
weiji14 Apr 11, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions ci/doc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,9 @@ channels:
- conda-forge
dependencies:
- cupy-core
- rapidsai::kvikio>=25.04.00
- pip
- python=3.10
- python=3.11
- sphinx
- sphinx-design
- sphinx-copybutton
Expand All @@ -14,7 +15,8 @@ dependencies:
- ipywidgets
- furo>=2024.8.6
- myst-nb
- xarray
- xarray>=2025.03.0
- zarr>=3.0.3
- pip:
# relative to this file. Needs to be editable to be accepted.
- --editable ..
3 changes: 2 additions & 1 deletion cupy_xarray/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
from . import _version
from .accessors import CupyDataArrayAccessor, CupyDatasetAccessor # noqa
from .accessors import CupyDataArrayAccessor, CupyDatasetAccessor # noqa: F401
from .kvikio import KvikioBackendEntrypoint # noqa: F401

__version__ = _version.get_versions()["version"]
104 changes: 104 additions & 0 deletions cupy_xarray/kvikio.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
"""
:doc:`kvikIO <kvikio:index>` backend for xarray to read Zarr stores directly into CuPy
arrays in GPU memory.
"""

import functools

from xarray.backends.common import _normalize_path # TODO: can this be public
from xarray.backends.store import StoreBackendEntrypoint
from xarray.backends.zarr import ZarrBackendEntrypoint, ZarrStore
from xarray.core.dataset import Dataset
from xarray.core.utils import close_on_error # TODO: can this be public.

try:
import kvikio.zarr
import zarr

has_kvikio = True
except ImportError:
has_kvikio = False


class KvikioBackendEntrypoint(ZarrBackendEntrypoint):
"""
Xarray backend to read Zarr stores using 'kvikio' engine.

For more information about the underlying library, visit
:doc:`kvikIO's Zarr page<kvikio:zarr>`.
"""

available = has_kvikio
description = "Open zarr files (.zarr) using Kvikio"
url = "https://docs.rapids.ai/api/kvikio/stable/api/#zarr"

# disabled by default
# We need to provide this because of the subclassing from
# ZarrBackendEntrypoint
def guess_can_open(self, filename_or_obj):
return False

def open_dataset(
self,
filename_or_obj,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables=None,
use_cftime=None,
decode_timedelta=None,
group=None,
mode="r",
synchronizer=None,
consolidated=None,
chunk_store=None,
storage_options=None,
zarr_version=None,
zarr_format=None,
store=None,
engine=None,
use_zarr_fill_value_as_mask=None,
cache_members: bool = True,
) -> Dataset:
filename_or_obj = _normalize_path(filename_or_obj)
if not store:
with zarr.config.enable_gpu():
_store = kvikio.zarr.GDSStore(root=filename_or_obj)

# Override default buffer prototype to be GPU buffer
# buffer_prototype = zarr.core.buffer.core.default_buffer_prototype()
buffer_prototype = zarr.core.buffer.gpu.buffer_prototype
_store.get = functools.partial(_store.get, prototype=buffer_prototype)
_store.get_partial_values = functools.partial(
_store.get_partial_values, prototype=buffer_prototype
)

store = ZarrStore.open_group(
store=_store,
group=group,
mode=mode,
synchronizer=synchronizer,
consolidated=consolidated,
consolidate_on_close=False,
chunk_store=chunk_store,
storage_options=storage_options,
zarr_version=zarr_version,
use_zarr_fill_value_as_mask=None,
zarr_format=zarr_format,
cache_members=cache_members,
)

store_entrypoint = StoreBackendEntrypoint()
with close_on_error(store):
ds = store_entrypoint.open_dataset(
store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
concat_characters=concat_characters,
decode_coords=decode_coords,
drop_variables=drop_variables,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
return ds
54 changes: 54 additions & 0 deletions cupy_xarray/tests/test_kvikio.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
import cupy as cp
import numpy as np
import pytest
import xarray as xr
from xarray.core.indexing import ExplicitlyIndexedNDArrayMixin

kvikio = pytest.importorskip("kvikio")
zarr = pytest.importorskip("zarr")

import kvikio.zarr # noqa
import xarray.core.indexing # noqa


@pytest.fixture
def store(tmp_path):
ds = xr.Dataset(
{
"a": ("x", np.arange(10), {"foo": "bar"}),
"scalar": np.array(1),
},
coords={"x": ("x", np.arange(-5, 5))},
)

for var in ds.variables:
ds[var].encoding["compressors"] = None

store_path = tmp_path / "kvikio.zarr"
ds.to_zarr(store_path, consolidated=True)
return store_path


def test_entrypoint():
assert "kvikio" in xr.backends.list_engines()


@pytest.mark.parametrize("consolidated", [True, False])
def test_lazy_load(consolidated, store):
with xr.open_dataset(store, engine="kvikio", consolidated=consolidated) as ds:
for _, da in ds.data_vars.items():
assert isinstance(da.variable._data, ExplicitlyIndexedNDArrayMixin)


@pytest.mark.parametrize("indexer", [slice(None), slice(2, 4), 2, [2, 3, 5]])
def test_lazy_indexing(indexer, store):
with zarr.config.enable_gpu(), xr.open_dataset(store, engine="kvikio") as ds:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, xr.open_dataset(store, engine="kvikio") should have zarr.config.enable_gpu() set already, so GPU-backed cupy arrays are returned when a user accesses the arrays. Unless we expect there to be users who wants to use the kvikio engine while returning CPU-backed numpy arrays. Default should probably be cupy though?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure... The (v3) kvikio.zarr.GDSStore is written to accept either. I'm tempted to leave this up to the user (so make them configure Zarr correctly) rather than assuming they want to use GPU memory, but maybe engine="kvikio" is sufficient indication that they want stuff on the GPU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, kvikio.zarr.GDSStore can return numpy (CPU) arrays too, and I guess there might be a case where someone wants nvCOMP decompression on the GPU (once zarr-developers/zarr-python#2863 is implemented), but have a numpy array returned in CPU memory? But I do feel that engine="kvikio" should default to putting things on GPU/cupy, while having a way to toggle it off as needed.

Looking at xr.open_dataset, I'm thinking if it might make sense to pass something into the backend_kwargs parameter to indicate the user's preference for GPU or CPU outputs. Is the from_array_kwargs or chunked_array_type parameters (experimental API) the intended use for this actually?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am inclined to agree that kvikio.zarr.GDSStore should have the ability to not enable the GPU zarr configs even if someone wants to get data on the GPU since they can try and use a different BufferPrototype that return torch tensors instead of cupy arrays, for example.

But for ease of use, it might be better to default to zarr.config.enable_gpu() and that can be overridden by passing in a different context manager? So xr.open_dataset(store, engine="kvikio") will default to using cupy arrays, but xr.open_dataset(store, engine="kvikio", zarr_context=contextlib.nullcontext()) would return numpy arrays and xr.open_dataset(store, engine="kvikio", zarr_context=MyCustomTorchContext()) would return torch tensors?

Copy link
Member Author

@weiji14 weiji14 Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is similar to the discussion at zarr-developers/zarr-python#2473. That zarr_context parameter is exactly what I'm thinking of, I'm trying to find out if there's a way to implement this right now by setting/monkeypatching the default prototype in kvikio.zarr.GDSStore's get method here:

https://github.com/rapidsai/kvikio/blob/a4170fc098e80d339a42c5da9a605796eb864c9f/python/kvikio/kvikio/zarr/_zarr_python_3.py#L98-L102

Currently the prototype: BufferPrototype is set dynamically by calling zarr.core.buffer.core.default_buffer_prototype(). But I'm wondering if we can have a pre-determined default buffer prototype that is set when the GDSStore instance is created?

ds = ds.isel(x=indexer)
for _, da in ds.data_vars.items():
assert isinstance(da.variable._data, ExplicitlyIndexedNDArrayMixin)

loaded = ds.compute()
for _, da in loaded.data_vars.items():
if da.ndim == 0:
continue
assert isinstance(da.data, cp.ndarray)
13 changes: 13 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -51,3 +51,16 @@ Methods

Dataset.cupy.as_cupy
Dataset.cupy.as_numpy


KvikIO engine
-------------

.. currentmodule:: cupy_xarray

.. automodule:: cupy_xarray.kvikio

.. autosummary::
:toctree: generated/

KvikioBackendEntrypoint
1 change: 1 addition & 0 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
"python": ("https://docs.python.org/3/", None),
"dask": ("https://docs.dask.org/en/latest", None),
"cupy": ("https://docs.cupy.dev/en/latest", None),
"kvikio": ("https://docs.rapids.ai/api/kvikio/stable", None),
"xarray": ("http://docs.xarray.dev/en/latest/", None),
}

Expand Down
Loading