RecursionError when reindexing Dataset with pandas extension arrays inside #10218


Open
5 tasks done
ilia-kats opened this issue Apr 11, 2025 · 0 comments
Labels
bug, needs triage (Issue that has not been reviewed by xarray team member)

Comments

@ilia-kats

What happened?

When trying to reindex a Dataset that contains a Pandas extension array, I get a RecursionError: maximum recursion depth exceeded.

What did you expect to happen?

Reindexing succeeds.

Minimal Complete Verifiable Example

import xarray as xr
import pandas as pd

# A Series backed by a pandas extension array (nullable Int32)
series = pd.Series([1, 2, pd.NA, 3], dtype=pd.Int32Dtype())
test = xr.Dataset({"test": series})
# Reindexing, even with the Series' own index, raises RecursionError
test.reindex(dim_0=series.index)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/dataset.py:3571, in Dataset.reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   3373 """Conform this object onto a new set of indexes, filling in
   3374 missing values with ``fill_value``. The default fill value is NaN.
   3375 
   (...)
   3568 
   3569 """
   3570 indexers = utils.either_dict_or_kwargs(indexers, indexers_kwargs, "reindex")
-> 3571 return alignment.reindex(
   3572     self,
   3573     indexers=indexers,
   3574     method=method,
   3575     tolerance=tolerance,
   3576     copy=copy,
   3577     fill_value=fill_value,
   3578 )

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/structure/alignment.py:1000, in reindex(obj, indexers, method, tolerance, copy, fill_value, sparse, exclude_vars)
    981 # TODO: (benbovy - explicit indexes): uncomment?
    982 # --> from reindex docstrings: "any mismatched dimension is simply ignored"
    983 # bad_keys = [k for k in indexers if k not in obj._indexes and k not in obj.dims]
   (...)
    987 #         "or unindexed dimension in the object to reindex"
    988 #     )
    990 aligner = Aligner(
    991     (obj,),
    992     indexes=indexers,
   (...)
    998     exclude_vars=exclude_vars,
    999 )
-> 1000 aligner.align()
   1001 return aligner.results[0]

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/structure/alignment.py:583, in Aligner.align(self)
    581     self.results = self.objects
    582 else:
--> 583     self.reindex_all()

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/structure/alignment.py:558, in Aligner.reindex_all(self)
    557 def reindex_all(self) -> None:
--> 558     self.results = tuple(
    559         self._reindex_one(obj, matching_indexes)
    560         for obj, matching_indexes in zip(
    561             self.objects, self.objects_matching_indexes, strict=True
    562         )
    563     )

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/structure/alignment.py:559, in <genexpr>(.0)
    557 def reindex_all(self) -> None:
    558     self.results = tuple(
--> 559         self._reindex_one(obj, matching_indexes)
    560         for obj, matching_indexes in zip(
    561             self.objects, self.objects_matching_indexes, strict=True
    562         )
    563     )

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/structure/alignment.py:547, in Aligner._reindex_one(self, obj, matching_indexes)
    544 new_indexes, new_variables = self._get_indexes_and_vars(obj, matching_indexes)
    545 dim_pos_indexers = self._get_dim_pos_indexers(matching_indexes)
--> 547 return obj._reindex_callback(
    548     self,
    549     dim_pos_indexers,
    550     new_variables,
    551     new_indexes,
    552     self.fill_value,
    553     self.exclude_dims,
    554     self.exclude_vars,
    555 )

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/dataset.py:3270, in Dataset._reindex_callback(self, aligner, dim_pos_indexers, variables, indexes, fill_value, exclude_dims, exclude_vars)
   3268         reindexed = self._overwrite_indexes(new_indexes, new_variables)
   3269     else:
-> 3270         reindexed = self.copy(deep=aligner.copy)
   3271 else:
   3272     to_reindex = {
   3273         k: v
   3274         for k, v in self.variables.items()
   3275         if k not in variables and k not in exclude_vars
   3276     }

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/dataset.py:1042, in Dataset.copy(self, deep, data)
    945 def copy(self, deep: bool = False, data: DataVars | None = None) -> Self:
    946     """Returns a copy of this dataset.
    947 
    948     If `deep=True`, a deep copy is made of each of the component variables.
   (...)
   1040     pandas.DataFrame.copy
   1041     """
-> 1042     return self._copy(deep=deep, data=data)

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/dataset.py:1078, in Dataset._copy(self, deep, data, memo)
   1076         variables[k] = index_vars[k]
   1077     else:
-> 1078         variables[k] = v._copy(deep=deep, data=data.get(k), memo=memo)
   1080 attrs = copy.deepcopy(self._attrs, memo) if deep else copy.copy(self._attrs)
   1081 encoding = (
   1082     copy.deepcopy(self._encoding, memo) if deep else copy.copy(self._encoding)
   1083 )

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/variable.py:896, in Variable._copy(self, deep, data, memo)
    893         ndata = indexing.MemoryCachedArray(data_old.array)  # type: ignore[assignment]
    895     if deep:
--> 896         ndata = copy.deepcopy(ndata, memo)
    898 else:
    899     ndata = as_compatible_data(data)

File /usr/lib/python3.11/copy.py:172, in deepcopy(x, memo, _nil)
    170                 y = x
    171             else:
--> 172                 y = _reconstruct(x, memo, *rv)
    174 # If is its own copy, don't memoize.
    175 if y is not x:

File /usr/lib/python3.11/copy.py:272, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    270 if deep:
    271     state = deepcopy(state, memo)
--> 272 if hasattr(y, '__setstate__'):
    273     y.__setstate__(state)
    274 else:

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

    [... skipping similar frames: PandasExtensionArray.__getattr__ at line 112 (2971 times)]

File /data/ilia/envs/famo/lib/python3.11/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

RecursionError: maximum recursion depth exceeded
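The tail of the traceback shows the mechanism: `copy.deepcopy` rebuilds the object via `cls.__new__` (skipping `__init__`) and then probes `hasattr(y, '__setstate__')` on the still-empty instance; since `PandasExtensionArray.__getattr__` forwards every failed lookup to `self.array`, and `array` itself is not set yet, the lookup re-enters `__getattr__` forever. A minimal sketch of the pitfall, using a hypothetical `Wrapper` class rather than xarray's actual code:

```python
import copy

# Hypothetical stand-in for a class that delegates attribute access to a
# wrapped array, as PandasExtensionArray.__getattr__ does in the traceback.
class Wrapper:
    def __init__(self, array):
        self.array = array

    def __getattr__(self, attr):
        # Only called when normal lookup fails; forwards to the wrapped array.
        # If 'array' itself is missing, this re-enters __getattr__ forever.
        return getattr(self.array, attr)

# copy.deepcopy reconstructs objects via cls.__new__ (no __init__), then
# probes hasattr(new_obj, '__setstate__') -- the copy.py:272 frame above.
half_built = Wrapper.__new__(Wrapper)  # empty __dict__, as during deepcopy

try:
    hasattr(half_built, "__setstate__")  # missing -> __getattr__ -> self.array
    outcome = "no error"                 # -> __getattr__('array') -> ...
except RecursionError:
    outcome = "RecursionError"

print(outcome)  # RecursionError

# One common guard (an illustration, not a proposed xarray patch): make
# __getattr__ fail fast while the instance is only half-initialized, so
# deepcopy's attribute probes get a clean AttributeError instead of recursing.
class GuardedWrapper(Wrapper):
    def __getattr__(self, attr):
        if "array" not in self.__dict__:
            raise AttributeError(attr)
        return getattr(self.array, attr)

copied = copy.deepcopy(GuardedWrapper([1, 2, 3]))
print(copied.array)  # [1, 2, 3]
```

With the guard in place, `deepcopy` completes: the `hasattr` probe on the half-built copy raises `AttributeError`, so `copy._reconstruct` falls back to updating `__dict__` from the copied state.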

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.12.12+bpo-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: None

xarray: 2025.3.1
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 3.0.6
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.1.0
distributed: 2025.2.0
matplotlib: 3.9.2
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: 0.16.0
flox: None
numpy_groupies: None
setuptools: 66.1.1
pip: 23.0.1
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: 8.1.3
