BUG: Impossible creation of array with dtype=string #61263

Open
wants to merge 11 commits into
base: main
5 changes: 4 additions & 1 deletion pandas/_libs/lib.pyx
@@ -769,7 +769,10 @@ cpdef ndarray[object] ensure_string_array(
             return out
         arr = arr.to_numpy(dtype=object)
     elif not util.is_array(arr):
-        arr = np.array(arr, dtype="object")
+        # GH#61155: Guarantee a 1-d result when array is a list of lists
+        input_arr = arr
+        arr = np.empty(len(arr), dtype="object")
+        arr[:] = input_arr

     result = np.asarray(arr, dtype="object")

2 changes: 2 additions & 0 deletions pandas/core/arrays/string_.py
@@ -655,6 +655,8 @@ def _from_sequence(
                 #  zero_copy_only to True which caused problems see GH#52076
                 scalars = np.array(scalars)
             # convert non-na-likes to str, and nan-likes to StringDtype().na_value
+            if isinstance(scalars, list) and all(isinstance(x, list) for x in scalars):
+                scalars = [str(x) for x in scalars]
Comment on lines +658 to +659
Member

@rhshadrach rhshadrach Apr 13, 2025

Instead of this, can you modify ensure_string_array in pandas/_libs/lib.pyx as follows? Instead of

elif not util.is_array(arr):
    arr = np.array(arr, dtype="object")

do

elif not util.is_array(arr):
    # GH#61155: Guarantee a 1-d result when array is a list of lists
    input_arr = arr
    arr = np.empty(len(arr), dtype="object")
    arr[:] = input_arr

This has almost no performance impact.
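
For readers following the thread, here is a minimal NumPy-only sketch of the difference between the two constructions. The variable names (`data`, `two_d`, `one_d`) are illustrative assumptions and do not appear in the pandas source; the input mirrors the equal-length list-of-lists case from the test file in this PR.

```python
import numpy as np

# Illustrative input: a list of equal-length lists.
data = [list("test"), list("word")]

# np.array interprets equal-length inner lists as a second dimension,
# so the result is a 2-d object array of single characters.
two_d = np.array(data, dtype="object")

# Preallocating a 1-d object array and assigning into it keeps each
# inner list as one element of the array.
one_d = np.empty(len(data), dtype="object")
one_d[:] = data

print(two_d.shape)  # (2, 4)
print(one_d.shape)  # (2,)
```

With ragged inner lists (for example `list("test")` and `list("words")`), `np.array(..., dtype="object")` already returns a 1-d array, which is presumably why only the equal-length case triggered the bug.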

Contributor Author

Thank you very much for the suggestions, I have made the necessary changes as per the guidance.

Member

Thanks for making the update. The changes in this file should now be reverted.

             result = lib.ensure_string_array(scalars, na_value=na_value, copy=copy)

         # Manually creating new array avoids the validation step in the __init__, so is
4 changes: 4 additions & 0 deletions pandas/core/indexes/base.py
@@ -4912,6 +4912,10 @@ def values(self) -> ArrayLike:
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.

.. versionchanged:: 3.0.0

The returned array is read-only.

Returns
-------
array: numpy.ndarray or ExtensionArray
4 changes: 4 additions & 0 deletions pandas/tests/arrays/test_string_array.py
@@ -0,0 +1,4 @@
import pandas as pd

print(pd.array([list("test"), list("words")], dtype="string"))
print(pd.array([list("test"), list("word")], dtype="string"))