-
Notifications
You must be signed in to change notification settings - Fork 17
Kvikio backend entrypoint with Zarr v3 #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
weiji14
wants to merge
39
commits into
main
Choose a base branch
from
kvikio-backend
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
39 commits
Select commit
Hold shift + click to select a range
9deadb7
Add Kvikio backend entrypoint
dcherian aa2dc91
Add demo notebook
dcherian 7fb4b94
Update kvikio notebook
dcherian 743fe7d
Merge branch 'main' into kvikio-entrypoint
dcherian 5d501e4
Merge branch 'main' into kvikio-entrypoint
andersy005 facf5f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] f3f5189
Update cupy_xarray/kvikio.py
dcherian 9c98d19
Merge branch 'main' into kvikio-entrypoint
andersy005 dd8bc57
Merge branch 'main' into kvikio-entrypoint
dcherian d2da1e4
Add url, description.
dcherian b87c3c2
Working
dcherian 87cb74e
Updated notebook
dcherian d7394ef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 1b23fef
Merge remote-tracking branch 'upstream/main' into kvikio-entrypoint
dcherian ca0cf45
Add tests
dcherian 97260d6
Merge branch 'main' into kvikio-entrypoint
weiji14 5d27b26
Move kvikio notebook under docs/source
weiji14 85491d7
Add zarr as a dependency in ci/doc.yml
weiji14 c470b97
Add entry for KvikioBackendEntrypoint in API docs
weiji14 95efa18
Fix input argument into CupyZarrArrayWrapper
weiji14 d684dad
Merge branch 'main' into kvikio-entrypoint
weiji14 ae2a7f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 15fbafd
Re-add kvikio backend entrypoint to pyproject.toml
weiji14 f3df115
Fix C408 and E402
weiji14 4e1857a
Use get_duck_array instead of get_array
weiji14 7345b61
Fix SIM108 Use ternary operator
weiji14 e2b410e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 7dd78e9
Install nightly version of kvikio=25.04.00a and zarr>=3.0.5
weiji14 cb77678
Remove custom open_store_variable method from GDSZarrStore class
weiji14 0262151
Fix UserWarning compressor -> compressors
weiji14 e26ed24
Reuse logic from xarray.backends.zarr.ZarrStore.open_group
weiji14 f185b44
Install xarray=2025.1.3.dev22+g0184702f
weiji14 1a52ce5
Add zarr.config.enable_gpu() context manager to test_lazy_indexing
weiji14 789a9b6
Refresh kvikIO demo notebook
weiji14 3894d29
Merge branch 'main' into kvikio-backend
weiji14 7fa7c06
Bump xarray from 2025.1.3.dev22+g0184702f to 2025.03.0
weiji14 1e205ec
Try overriding default prototype to be GPU buffer
weiji14 9227810
Merge branch 'main' into kvikio-backend
weiji14 b45decb
Update to stable version of kvikio=25.04.00
weiji14 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
from . import _version | ||
from .accessors import CupyDataArrayAccessor, CupyDatasetAccessor # noqa | ||
from .accessors import CupyDataArrayAccessor, CupyDatasetAccessor # noqa: F401 | ||
from .kvikio import KvikioBackendEntrypoint # noqa: F401 | ||
|
||
__version__ = _version.get_versions()["version"] |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
""" | ||
:doc:`kvikIO <kvikio:index>` backend for xarray to read Zarr stores directly into CuPy | ||
arrays in GPU memory. | ||
""" | ||
|
||
import functools | ||
|
||
from xarray.backends.common import _normalize_path # TODO: can this be public | ||
from xarray.backends.store import StoreBackendEntrypoint | ||
from xarray.backends.zarr import ZarrBackendEntrypoint, ZarrStore | ||
from xarray.core.dataset import Dataset | ||
from xarray.core.utils import close_on_error # TODO: can this be public. | ||
|
||
try: | ||
import kvikio.zarr | ||
import zarr | ||
|
||
has_kvikio = True | ||
except ImportError: | ||
has_kvikio = False | ||
|
||
|
||
class KvikioBackendEntrypoint(ZarrBackendEntrypoint): | ||
""" | ||
Xarray backend to read Zarr stores using 'kvikio' engine. | ||
|
||
For more information about the underlying library, visit | ||
:doc:`kvikIO's Zarr page<kvikio:zarr>`. | ||
""" | ||
|
||
available = has_kvikio | ||
description = "Open zarr files (.zarr) using Kvikio" | ||
url = "https://docs.rapids.ai/api/kvikio/stable/api/#zarr" | ||
|
||
# disabled by default | ||
# We need to provide this because of the subclassing from | ||
# ZarrBackendEntrypoint | ||
def guess_can_open(self, filename_or_obj): | ||
return False | ||
|
||
def open_dataset( | ||
self, | ||
filename_or_obj, | ||
mask_and_scale=True, | ||
decode_times=True, | ||
concat_characters=True, | ||
decode_coords=True, | ||
drop_variables=None, | ||
use_cftime=None, | ||
decode_timedelta=None, | ||
group=None, | ||
mode="r", | ||
synchronizer=None, | ||
consolidated=None, | ||
chunk_store=None, | ||
storage_options=None, | ||
zarr_version=None, | ||
zarr_format=None, | ||
store=None, | ||
engine=None, | ||
use_zarr_fill_value_as_mask=None, | ||
cache_members: bool = True, | ||
) -> Dataset: | ||
filename_or_obj = _normalize_path(filename_or_obj) | ||
if not store: | ||
with zarr.config.enable_gpu(): | ||
_store = kvikio.zarr.GDSStore(root=filename_or_obj) | ||
|
||
# Override default buffer prototype to be GPU buffer | ||
# buffer_prototype = zarr.core.buffer.core.default_buffer_prototype() | ||
buffer_prototype = zarr.core.buffer.gpu.buffer_prototype | ||
_store.get = functools.partial(_store.get, prototype=buffer_prototype) | ||
_store.get_partial_values = functools.partial( | ||
_store.get_partial_values, prototype=buffer_prototype | ||
) | ||
|
||
store = ZarrStore.open_group( | ||
store=_store, | ||
group=group, | ||
mode=mode, | ||
synchronizer=synchronizer, | ||
consolidated=consolidated, | ||
consolidate_on_close=False, | ||
chunk_store=chunk_store, | ||
storage_options=storage_options, | ||
zarr_version=zarr_version, | ||
use_zarr_fill_value_as_mask=None, | ||
zarr_format=zarr_format, | ||
cache_members=cache_members, | ||
) | ||
|
||
store_entrypoint = StoreBackendEntrypoint() | ||
with close_on_error(store): | ||
ds = store_entrypoint.open_dataset( | ||
store, | ||
mask_and_scale=mask_and_scale, | ||
decode_times=decode_times, | ||
concat_characters=concat_characters, | ||
decode_coords=decode_coords, | ||
drop_variables=drop_variables, | ||
use_cftime=use_cftime, | ||
decode_timedelta=decode_timedelta, | ||
) | ||
return ds |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import cupy as cp | ||
import numpy as np | ||
import pytest | ||
import xarray as xr | ||
from xarray.core.indexing import ExplicitlyIndexedNDArrayMixin | ||
|
||
kvikio = pytest.importorskip("kvikio") | ||
zarr = pytest.importorskip("zarr") | ||
|
||
import kvikio.zarr # noqa | ||
import xarray.core.indexing # noqa | ||
|
||
|
||
@pytest.fixture | ||
def store(tmp_path): | ||
ds = xr.Dataset( | ||
{ | ||
"a": ("x", np.arange(10), {"foo": "bar"}), | ||
"scalar": np.array(1), | ||
}, | ||
coords={"x": ("x", np.arange(-5, 5))}, | ||
) | ||
|
||
for var in ds.variables: | ||
ds[var].encoding["compressors"] = None | ||
|
||
store_path = tmp_path / "kvikio.zarr" | ||
ds.to_zarr(store_path, consolidated=True) | ||
return store_path | ||
|
||
|
||
def test_entrypoint(): | ||
assert "kvikio" in xr.backends.list_engines() | ||
|
||
|
||
@pytest.mark.parametrize("consolidated", [True, False]) | ||
def test_lazy_load(consolidated, store): | ||
with xr.open_dataset(store, engine="kvikio", consolidated=consolidated) as ds: | ||
for _, da in ds.data_vars.items(): | ||
assert isinstance(da.variable._data, ExplicitlyIndexedNDArrayMixin) | ||
|
||
|
||
@pytest.mark.parametrize("indexer", [slice(None), slice(2, 4), 2, [2, 3, 5]]) | ||
def test_lazy_indexing(indexer, store): | ||
with zarr.config.enable_gpu(), xr.open_dataset(store, engine="kvikio") as ds: | ||
ds = ds.isel(x=indexer) | ||
for _, da in ds.data_vars.items(): | ||
assert isinstance(da.variable._data, ExplicitlyIndexedNDArrayMixin) | ||
|
||
loaded = ds.compute() | ||
for _, da in loaded.data_vars.items(): | ||
if da.ndim == 0: | ||
continue | ||
assert isinstance(da.data, cp.ndarray) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally,
xr.open_dataset(store, engine="kvikio")
should havezarr.config.enable_gpu()
set already, so GPU-backed cupy arrays are returned when a user accesses the arrays. Unless we expect there to be users who wants to use the kvikio engine while returning CPU-backed numpy arrays. Default should probably be cupy though?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure... The (v3) kvikio.zarr.GDSStore is written to accept either. I'm tempted to leave this up to the user (so make them configure Zarr correctly) rather than assuming they want to use GPU memory, but maybe
engine="kvikio"
is sufficient indication that they want stuff on the GPU.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah,
kvikio.zarr.GDSStore
can return numpy (CPU) arrays too, and I guess there might be a case where someone wants nvCOMP decompression on the GPU (once zarr-developers/zarr-python#2863 is implemented), but have a numpy array returned in CPU memory? But I do feel thatengine="kvikio"
should default to putting things on GPU/cupy, while having a way to toggle it off as needed.Looking at
xr.open_dataset
, I'm thinking if it might make sense to pass something into thebackend_kwargs
parameter to indicate the user's preference for GPU or CPU outputs. Is thefrom_array_kwargs
orchunked_array_type
parameters (experimental API) the intended use for this actually?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am inclined to agree that
kvikio.zarr.GDSStore
should have the ability to not enable the GPU zarr configs even if someone wants to get data on the GPU since they can try and use a differentBufferPrototype
that return torch tensors instead of cupy arrays, for example.But for ease of use, it might be better to default to
zarr.config.enable_gpu()
and that can be overridden by passing in a different context manager? Soxr.open_dataset(store, engine="kvikio")
will default to using cupy arrays, butxr.open_dataset(store, engine="kvikio", zarr_context=contextlib.nullcontext())
would return numpy arrays andxr.open_dataset(store, engine="kvikio", zarr_context=MyCustomTorchContext())
would return torch tensors?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is similar to the discussion at zarr-developers/zarr-python#2473. That
zarr_context
parameter is exactly what I'm thinking of, I'm trying to find out if there's a way to implement this right now by setting/monkeypatching the defaultprototype
inkvikio.zarr.GDSStore
'sget
method here:https://github.com/rapidsai/kvikio/blob/a4170fc098e80d339a42c5da9a605796eb864c9f/python/kvikio/kvikio/zarr/_zarr_python_3.py#L98-L102
Currently the
prototype: BufferPrototype
is set dynamically by callingzarr.core.buffer.core.default_buffer_prototype()
. But I'm wondering if we can have a pre-determined default buffer prototype that is set when the GDSStore instance is created?