What happened?
I'm trying to read a sharded Zarr dataset using obstore, but xarray raises the error "NotSupportedError: Operation not supported: Azure does not support suffix range requests". However, I am able to read and write sharded arrays just fine using zarr-python directly, so I think they have found a way around this.
Reading and writing without the shards works.
I was not able to test this using an FsspecStore because reading/writing to Azure blob storage from xarray isn't working at all, probably due to an issue in fsspec or adlfs that I wasn't able to track down.
What did you expect to happen?
I expect to be able to read and write sharded zarr v3 stores.
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
import zarr
from zarr.storage import ObjectStore
from obstore.store import AzureStore
objstore = ObjectStore(
    store=AzureStore(
        container_name=CONTAINER,
        prefix="xr-test/test_shards.zarr-v3",
        account_name=ACCOUNT,
        sas_key=SAS,
    )
)
# Reading sharded array with zarr-python works as expected
root = zarr.create_group(store=objstore, zarr_format=3, overwrite=True)
z1 = root.create_array(name='foo', shape=(10000, 10000), shards=(2000, 2000), chunks=(1000, 1000), dtype='int32')
z1[:] = np.random.randint(0, 100, size=(10000, 10000))
root_read = zarr.open_group(store=objstore, zarr_format=3, mode='r')
root_read['foo'][:]
# Writing to xarray with shards also works
ds = xr.Dataset(
    {"foo": xr.DataArray(root_read['foo'][:], dims=['x', 'y'])},
)
objstore_xr = ObjectStore(
    store=AzureStore(
        container_name=CONTAINER,
        prefix="xr-test/test_shards_xr.zarr-v3",
        account_name=ACCOUNT,
        sas_key=SAS,
    )
)
ds.to_zarr(
    objstore_xr,
    mode='w',
    consolidated=False,
    zarr_format=3,
    encoding={'foo': {'chunks': (1000, 1000), 'shards': (2000, 2000)}}
)
# Opening the dataset also works as expected
ds_n = xr.open_zarr(objstore_xr, consolidated=False)
# However, I get the error when loading the chunks into memory
ds_n.compute()
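A possible explanation for the difference, stated as an assumption rather than a confirmed diagnosis: `root_read['foo'][:]` covers every shard completely, while a dask-backed `compute()` usually issues one read per dask chunk, and with 1000 x 1000 chunks inside 2000 x 2000 shards each such read touches only part of a shard. The sketch below is plain Python, not zarr code; `shard_coverage` is a hypothetical helper that reports, for a rectangular read, which shards are fully covered.

```python
from itertools import product


def shard_coverage(selection, shard_shape):
    """Map each touched shard to True if the read covers it completely.

    selection   -- one (start, stop) pair per axis, stop exclusive
    shard_shape -- shard size per axis
    Assumes the array shape is a multiple of the shard shape (true for the
    10000 x 10000 array with 2000 x 2000 shards used in this report).
    """
    per_axis = []
    for (start, stop), size in zip(selection, shard_shape):
        touched = []
        # Walk over every shard index along this axis that the read overlaps.
        for idx in range(start // size, (stop - 1) // size + 1):
            lo, hi = idx * size, (idx + 1) * size
            touched.append((idx, start <= lo and hi <= stop))
        per_axis.append(touched)
    # A shard is fully covered only if it is fully covered along every axis.
    return {
        tuple(idx for idx, _ in combo): all(full for _, full in combo)
        for combo in product(*per_axis)
    }


shards = (2000, 2000)
whole_array = shard_coverage(((0, 10000), (0, 10000)), shards)  # like foo[:]
one_chunk = shard_coverage(((0, 1000), (0, 1000)), shards)      # one 1000 x 1000 block
print(all(whole_array.values()), len(whole_array))
print(one_chunk)
```

If that assumption holds, opening the dataset with dask chunks that match the shard shape (for example `chunks={'x': 2000, 'y': 2000}` in `xr.open_zarr`) might avoid the partial-shard path. This has not been tested here.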
MVCE confirmation
Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.
Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
NotSupportedError                         Traceback (most recent call last)
Cell In[44], line 1
----> 1 ds_n.compute()
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/dataset.py:714, in Dataset.compute(self, **kwargs)
    690 """Manually trigger loading and/or computation of this dataset's data
    691 from disk or a remote source into memory and return a new dataset.
    692 Unlike load, the original dataset is left unaltered.
   (...)
    711 dask.compute
    712 """
    713 new = self.copy(deep=False)
--> 714 return new.load(**kwargs)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/dataset.py:541, in Dataset.load(self, **kwargs)
    538 chunkmanager = get_chunked_array_type(*lazy_data.values())
    540 # evaluate all the chunked arrays simultaneously
--> 541 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    542     *lazy_data.values(), **kwargs
    543 )
    545 for k, data in zip(lazy_data, evaluated_data, strict=False):
    546     self.variables[k].data = data
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:85, in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/dask/base.py:656, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    653     postcomputes.append(x.__dask_postcompute__())
    655 with shorten_traceback():
--> 656     results = schedule(dsk, keys, **kwargs)
    658 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/indexing.py:574, in ImplicitToExplicitIndexingAdapter.__array__(self, dtype, copy)
    570 def __array__(
    571     self, dtype: np.typing.DTypeLike = None, /, *, copy: bool | None = None
    572 ) -> np.ndarray:
    573     if Version(np.__version__) >= Version("2.0.0"):
--> 574         return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
    575     else:
    576         return np.asarray(self.get_duck_array(), dtype=dtype)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/indexing.py:579, in ImplicitToExplicitIndexingAdapter.get_duck_array(self)
    578 def get_duck_array(self):
--> 579     return self.array.get_duck_array()
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/indexing.py:790, in CopyOnWriteArray.get_duck_array(self)
    789 def get_duck_array(self):
--> 790     return self.array.get_duck_array()
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/indexing.py:653, in LazilyIndexedArray.get_duck_array(self)
    649     array = apply_indexer(self.array, self.key)
    650 else:
    651     # If the array is not an ExplicitlyIndexedNDArrayMixin,
    652     # it may wrap a BackendArray so use its __getitem__
--> 653     array = self.array[self.key]
    655 # self.array[self.key] is now a numpy array when
    656 # self.array is a BackendArray subclass
    657 # and self.key is BasicIndexer((slice(None, None, None),))
    658 # so we need the explicit check for ExplicitlyIndexed
    659 if isinstance(array, ExplicitlyIndexed):
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/backends/zarr.py:223, in ZarrArrayWrapper.__getitem__(self, key)
    221 elif isinstance(key, indexing.OuterIndexer):
    222     method = self._oindex
--> 223 return indexing.explicit_indexing_adapter(
    224     key, array.shape, indexing.IndexingSupport.VECTORIZED, method
    225 )
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    992 """Support explicit indexing by delegating to a raw indexing method.
    993
    994 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
   1011 Indexing result, in the form of a duck numpy-array.
   1012 """
   1013 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1014 result = raw_indexing_method(raw_key.tuple)
   1015 if numpy_indices.tuple:
   1016     # index the loaded duck array
   1017     indexable = as_indexable(result)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/xarray/backends/zarr.py:213, in ZarrArrayWrapper._getitem(self, key)
    212 def _getitem(self, key):
--> 213     return self._array[key]
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/array.py:2430, in Array.__getitem__(self, selection)
   2428     return self.vindex[cast(CoordinateSelection | MaskSelection, selection)]
   2429 elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
-> 2430     return self.get_orthogonal_selection(pure_selection, fields=fields)
   2431 else:
   2432     return self.get_basic_selection(cast(BasicSelection, pure_selection), fields=fields)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/_compat.py:43, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     41 extra_args = len(args) - len(all_args)
     42 if extra_args <= 0:
---> 43     return f(*args, **kwargs)
     45 # extra_args > 0
     46 args_msg = [
     47     f"{name}={arg}"
     48     for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:], strict=False)
     49 ]
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/array.py:2872, in Array.get_orthogonal_selection(self, selection, out, fields, prototype)
   2870     prototype = default_buffer_prototype()
   2871 indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
-> 2872 return sync(
   2873     self._async_array._get_selection(
   2874         indexer=indexer, out=out, fields=fields, prototype=prototype
   2875     )
   2876 )
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/sync.py:163, in sync(coro, loop, timeout)
    160 return_result = next(iter(finished)).result()
    162 if isinstance(return_result, BaseException):
--> 163     raise return_result
    164 else:
    165     return return_result
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/sync.py:119, in _runner(coro)
    114 """
    115 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
    116 exception, the exception will be returned.
    117 """
    118 try:
--> 119     return await coro
    120 except Exception as ex:
    121     return ex
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/array.py:1289, in AsyncArray._get_selection(self, indexer, prototype, out, fields)
   1286     _config = replace(_config, order=self.metadata.order)
   1288 # reading chunks and decoding them
-> 1289 await self.codec_pipeline.read(
   1290     [
   1291         (
   1292             self.store_path / self.metadata.encode_chunk_key(chunk_coords),
   1293             self.metadata.get_chunk_spec(chunk_coords, _config, prototype=prototype),
   1294             chunk_selection,
   1295             out_selection,
   1296             is_complete_chunk,
   1297         )
   1298         for chunk_coords, chunk_selection, out_selection, is_complete_chunk in indexer
   1299     ],
   1300     out_buffer,
   1301     drop_axes=indexer.drop_axes,
   1302 )
   1303 if isinstance(indexer, BasicIndexer) and indexer.shape == ():
   1304     return out_buffer.as_scalar()
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:464, in BatchedCodecPipeline.read(self, batch_info, out, drop_axes)
    458 async def read(
    459     self,
    460     batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
    461     out: NDBuffer,
    462     drop_axes: tuple[int, ...] = (),
    463 ) -> None:
--> 464     await concurrent_map(
    465         [
    466             (single_batch_info, out, drop_axes)
    467             for single_batch_info in batched(batch_info, self.batch_size)
    468         ],
    469         self.read_batch,
    470         config.get("async.concurrency"),
    471     )
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/common.py:68, in concurrent_map(items, func, limit)
     65     async with sem:
     66         return await func(*item)
---> 68 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/common.py:66, in concurrent_map.<locals>.run(item)
     64 async def run(item: tuple[Any]) -> V:
     65     async with sem:
---> 66         return await func(*item)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:251, in BatchedCodecPipeline.read_batch(self, batch_info, out, drop_axes)
    244 async def read_batch(
    245     self,
    246     batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
    247     out: NDBuffer,
    248     drop_axes: tuple[int, ...] = (),
    249 ) -> None:
    250     if self.supports_partial_decode:
--> 251         chunk_array_batch = await self.decode_partial_batch(
    252             [
    253                 (byte_getter, chunk_selection, chunk_spec)
    254                 for byte_getter, chunk_spec, chunk_selection, *_ in batch_info
    255             ]
    256         )
    257         for chunk_array, (_, chunk_spec, _, out_selection, _) in zip(
    258             chunk_array_batch, batch_info, strict=False
    259         ):
    260             if chunk_array is not None:
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:207, in BatchedCodecPipeline.decode_partial_batch(self, batch_info)
    205 assert self.supports_partial_decode
    206 assert isinstance(self.array_bytes_codec, ArrayBytesCodecPartialDecodeMixin)
--> 207 return await self.array_bytes_codec.decode_partial(batch_info)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/abc/codec.py:198, in ArrayBytesCodecPartialDecodeMixin.decode_partial(self, batch_info)
    178 async def decode_partial(
    179     self,
    180     batch_info: Iterable[tuple[ByteGetter, SelectorTuple, ArraySpec]],
    181 ) -> Iterable[NDBuffer | None]:
    182     """Partially decodes a batch of chunks.
    183     This method determines parts of a chunk from the slice selection,
    184     fetches these parts from the store (via ByteGetter) and decodes them.
   (...)
    196     Iterable[NDBuffer | None]
    197     """
--> 198     return await concurrent_map(
    199         list(batch_info),
    200         self._decode_partial_single,
    201         config.get("async.concurrency"),
    202     )
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/common.py:68, in concurrent_map(items, func, limit)
     65     async with sem:
     66         return await func(*item)
---> 68 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/core/common.py:66, in concurrent_map.<locals>.run(item)
     64 async def run(item: tuple[Any]) -> V:
     65     async with sem:
---> 66         return await func(*item)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/codecs/sharding.py:506, in ShardingCodec._decode_partial_single(self, byte_getter, selection, shard_spec)
    503     shard_dict = shard_dict_maybe
    504 else:
    505     # read some chunks within the shard
--> 506     shard_index = await self._load_shard_index_maybe(byte_getter, chunks_per_shard)
    507     if shard_index is None:
    508         return None
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/codecs/sharding.py:718, in ShardingCodec._load_shard_index_maybe(self, byte_getter, chunks_per_shard)
    713     index_bytes = await byte_getter.get(
    714         prototype=numpy_buffer_prototype(),
    715         byte_range=RangeByteRequest(0, shard_index_size),
    716     )
    717 else:
--> 718     index_bytes = await byte_getter.get(
    719         prototype=numpy_buffer_prototype(), byte_range=SuffixByteRequest(shard_index_size)
    720     )
    721 if index_bytes is not None:
    722     return await self._decode_shard_index(index_bytes, chunks_per_shard)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/storage/_common.py:124, in StorePath.get(self, prototype, byte_range)
    122 if prototype is None:
    123     prototype = default_buffer_prototype()
--> 124 return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
File ~/code/lsimpfendoerfer/xr-test/.venv/lib/python3.11/site-packages/zarr/storage/_obstore.py:109, in ObjectStore.get(self, key, prototype, byte_range)
    107     return prototype.buffer.from_bytes(await resp.bytes_async())  # type: ignore[arg-type]
    108 elif isinstance(byte_range, SuffixByteRequest):
--> 109     resp = await obs.get_async(
    110         self.store, key, options={"range": {"suffix": byte_range.suffix}}
    111     )
    112     return prototype.buffer.from_bytes(await resp.bytes_async())  # type: ignore[arg-type]
    113 else:
NotSupportedError: Operation not supported: Azure does not support suffix range requests
Debug source:
NotSupported {
source: "Azure does not support suffix range requests",
}
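The last zarr frames request the shard index with `SuffixByteRequest(shard_index_size)`. As a rough sketch of how small that read is, assuming the default sharding layout (index stored at the end of the shard, one `(offset, nbytes)` pair of 64-bit unsigned integers per inner chunk, followed by a 4-byte crc32c checksum), the index size for the arrays in this report can be estimated as below. `shard_index_nbytes` is a hypothetical helper, not a zarr-python function.

```python
from math import prod


def shard_index_nbytes(shard_shape, chunk_shape, checksum_nbytes=4):
    """Estimate the encoded shard index size in bytes.

    Assumes the default index codecs (raw bytes plus a crc32c checksum);
    a different index codec configuration would change the result.
    """
    if any(s % c for s, c in zip(shard_shape, chunk_shape)):
        raise ValueError("shard shape must be a multiple of the chunk shape")
    chunks_per_shard = [s // c for s, c in zip(shard_shape, chunk_shape)]
    # Two uint64 values (offset, nbytes) per inner chunk = 16 bytes each.
    return prod(chunks_per_shard) * 16 + checksum_nbytes


# Shapes taken from the example in this report.
index_size = shard_index_nbytes((2000, 2000), (1000, 1000))
print(index_size)
```

Under these assumptions the failing request only asks for the last few dozen bytes of each shard object, which is why a size lookup followed by a bounded range read would be an equivalent substitute.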
Anything else we need to know?
No response
Environment
Note that I'm using the latest version of zarr available on GitHub so I can use the obstore library.
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!
I'll look into this tomorrow. I thought the Azure backend automatically switched to two requests (HEAD + range) instead of a suffix request, because Azure doesn't support suffix range requests.
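The two-request fallback mentioned above can be sketched without any cloud dependency. `NoSuffixStore` and `get_suffix` are hypothetical names for illustration and are not obstore or zarr APIs; the in-memory class only mimics a backend that offers a size lookup and bounded range reads but no suffix reads.

```python
class NoSuffixStore:
    """In-memory stand-in for a backend without suffix range support."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def head(self, key):
        # First request: look up the object size (what a HEAD call would return).
        return len(self._objects[key])

    def get_range(self, key, start, end):
        # Second request: bounded read of bytes [start, end).
        return self._objects[key][start:end]


def get_suffix(store, key, suffix):
    """Return the last `suffix` bytes of `key` using head + bounded range."""
    size = store.head(key)
    start = max(size - suffix, 0)  # clamp if the object is shorter than suffix
    return store.get_range(key, start, size)


store = NoSuffixStore({"foo/c/0/0": b"chunk-data|INDEX"})
tail = get_suffix(store, "foo/c/0/0", 5)
print(tail)
```

The cost of this approach is one extra round trip per object, and the size could in principle change between the two requests if the object is overwritten concurrently.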
Environment
INSTALLED VERSIONS
commit: None
python: 3.11.9 (main, Apr 6 2024, 17:59:24) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1086-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2025.3.1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.7.dev8+g018f61d9
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.2
cupy: None
pint: None
sparse: None
flox: 0.10.2
numpy_groupies: 0.11.2
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: 9.1.0
sphinx: None