Skip to content

Public API for buffer objects #2876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Feb 28, 2025

This moves the public imports from buffer things out of zarr.core.buffer.

  • Abstract stuff is availble under zarr.abc.buffer.
  • Concrete implementations are available under zarr.buffer.{cpu,gpu}.

I haven't added any new tests, but I updated the tests in tests/test_buffer.py to use the public API.

Closes #2871

This moves the public imports from buffer things out of `zarr.core`.

Abstract stuff is availble under `zarr.abc.buffer`.

Concrete implementations are available under `zarr.buffer.{cpu,gpu}`.
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 28, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Mar 1, 2025
@dstansby
Copy link
Contributor

dstansby commented Mar 4, 2025

Do you think it's worth just moving the actual code to the new public location, so it doesn't live in zarr.core.buffer any more? That would make for a simpler code base and be my preference, but I might be missing some reason not to move the code.

@TomAugspurger
Copy link
Contributor Author

Mmm, I think I'd still want the implementation to be in a private module so that implementation details like numpy imports don't leak into the public API's namespace. So it'd end up looking pretty similar. And if we're moving stuff out of zarr.core I'd want to do it properly, since I wouldn't be surprised if people were already using it. IMO not worth it right now.

@TomAugspurger
Copy link
Contributor Author

I wouldn't be surprised if people were already using it.

Yep: https://github.com/earth-mover/icechunk/blob/584d4f2b0aad1160f8e4ba9a48ebbf770c058c00/icechunk-python/python/icechunk/store.py#L13

I think this PR takes care of those zarr.core.buffer imports. IMO, we should ensure that all the types in our public API (like BytesLike) are exported somewhere, but that should be done separately.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments/suggestions inline.

I also think we should move the code to where we expect users to import it from. This is because:

  • having code in zarr.core.buffer, which we expcititly mark as private API, could lead developers to think they can make API changes without deprecations. But in reality it's public API.
  • It forces us to import code in our tests from where users are also importing it

I appreciate it's a bit more work to move the code around, but that's the cost of being a stable and widely used library 😄

And if we're moving stuff out of zarr.core I'd want to do it properly, since I wouldn't be surprised if people were already using it.

We decided that the API in zarr.core is private, so I think we should feel free to do what we want there - if downstream users are using it then I don't think that's our problem, since we don't document it's existence. On a practical level I can appreciate just moving the code might cause issues, in which case we could still have the code importable from zarr.core, but issue deprecation warnings before removing it completely.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Mar 28, 2025

I don't plan to move the implementation at this time. We already know that some libraries, like icechunk, are depending on it. I don't want to break those implementations because Zarr lacked a public API for this previously. We need to offer a public API first and then give them some time to migrate over.

I'll post a proposed plan over in #2621.

@TomAugspurger
Copy link
Contributor Author

Should be good to go.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as my inline comments/suggestions, I noticed that there are parts of the API that are typed with zarr.core.buffer... - it would be good to change those to use the new public interface. As an example, zarr.abc.codec.ArrayArrayCodec

@@ -83,7 +83,10 @@ Coming soon.
Custom array buffers
--------------------

Coming soon.
zarr-python provides control where and how arrays stored in memory through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zarr-python provides control where and how arrays stored in memory through
Zarr-Python provides control for where and how arrays stored in memory through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're inconsistent about zarr-python vs. Zarr-python in the docs.

@dstansby
Copy link
Contributor

I don't plan to move the implementation at this time. We already know that some libraries, like icechunk, are depending on it. I don't want to break those implementations because Zarr lacked a public API for this previously. We need to offer a public API first and then give them some time to migrate over.

👍 - my suggestion would be to introduce those deprecations in this PR for the buffer stuff that's been made public, but happy to punt that to a later PR if that's easier.

@TomAugspurger
Copy link
Contributor Author

The latest pair of commits updates our zarr.config, which references zarr.core.buffer. We needed to slightly update our registration code to let the class being registered have a different qualname / import path than where they're defined.

@TomAugspurger
Copy link
Contributor Author

Thanks for the review.

I noticed that there are parts of the API that are typed

This is proving to be a bit more work than I can do right now. Feel free to push a commit if you're able to, or we can merge this as is since and handle that later.

@TomAugspurger
Copy link
Contributor Author

👍 - my suggestion would be to introduce those deprecations in this PR for the buffer stuff that's been made public, but happy to punt that to a later PR if that's easier.

I'd recommend waiting. A release with this public API will give us a chance to migrate packages like icechunk without their users ever seeing a warning.

@d-v-b
Copy link
Contributor

d-v-b commented Apr 20, 2025

It would be nice to get this merged. In the interest of moving forward here, @dstansby are you OK with sorting out some of the requested changes in follow-up PRs?

@dstansby
Copy link
Contributor

dstansby commented Apr 22, 2025

I had a think about this again, and I actually think fairly strongly that if we are moving where we want users to import this code from, then we should deprecate importing it from the old locations at the same time, in the same PR. This would make sure:

  1. It's logistically possible to deprecate the old namespace (otherwise, if it's impossible there's no point complicating things with a new namespace)
  2. Doing the deprecation happens and isn't lost.

I know it's extra work, but I think in order to make sure the current nice bit of work is worth it, pushing users towards using the new work (ie deprecating the old import path) should happen at the same time.

@dstansby
Copy link
Contributor

that's my two cents - if there are a couple of other developers that feel strongly in the other direction (splitting creation of new API/deprecating old so there's a period with duplicate API), then happy to do that. I have a gut feeling that it has the potential to go wrong and create a mess if not done in one go though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Public API for zarr.core.buffer
3 participants