Series constructor skips dtype=str conversion for list data #16605

Closed
kbg opened this issue Jun 5, 2017 · 2 comments · Fixed by #18795
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions

@kbg

kbg commented Jun 5, 2017

Code Example

from pandas import Series, DataFrame

int_list = [1, 2, 3]

s1 = Series(int_list)
s2 = Series(int_list, dtype=float)
s3 = Series(int_list, dtype=str)
s4 = Series(int_list, dtype='U')
s5 = Series(Series(int_list), dtype=str)

print('Series element type:')
print('  s1:', type(s1[0]))
print('  s2:', type(s2[0]))
print('  s3:', type(s3[0]))
print('  s4:', type(s4[0]))
print('  s5:', type(s5[0]))

f1 = DataFrame(int_list)
f2 = DataFrame(int_list, dtype=float)
f3 = DataFrame(int_list, dtype=str)
f4 = DataFrame(int_list, dtype='U')
f5 = DataFrame(DataFrame(int_list), dtype=str)

print('\nDataFrame element type:')
print('  f1:', type(f1.iloc[0, 0]))
print('  f2:', type(f2.iloc[0, 0]))
print('  f3:', type(f3.iloc[0, 0]))
print('  f4:', type(f4.iloc[0, 0]))
print('  f5:', type(f5.iloc[0, 0]))

Output:

Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'int'>
  s4: <class 'int'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Problem description

When creating a Series from a list using dtype=str, the data elements are not converted to strings. The Series instance apparently just keeps the original (Python) data type in this case.

This problem does not occur when, instead of a list, another Series is used as input data (s5 in the example above). It also does not happen when creating DataFrame instances from list data.
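On an affected version, one way to sidestep the behaviour is to build the Series first and convert afterwards, which follows the same path as the working s5 case. This is a minimal sketch, not a fix for the constructor; `to_str_series` is a hypothetical helper name and not part of pandas.

```python
from pandas import Series


def to_str_series(data):
    """Build a Series from list-like data, then convert its elements to str.

    Hypothetical helper: converting after construction avoids relying on
    the dtype=str argument of the Series constructor.
    """
    return Series(data).astype(str)


s = to_str_series([1, 2, 3])
# .iloc[0] selects by position, independent of the index labels.
print(type(s.iloc[0]))
```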

Expected Output

Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'str'>
  s4: <class 'str'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.3-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.1
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0

@jreback
Contributor

jreback commented Jun 5, 2017

Hmm, the array-like should be converted exactly like .astype(str). In other words,

Series(arr, dtype=str) should be equal to Series(arr).astype(str).

I think str is treated as object here. Pull requests welcome.
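The equivalence described above can be turned into a small check. This is only a sketch: `constructor_matches_astype` is a hypothetical name, and the comparison goes through plain Python lists so that it does not depend on which dtype pandas reports for string data. On a release that still has this bug, the function would be expected to return False for list input.

```python
from pandas import Series


def constructor_matches_astype(data):
    """Return True if Series(data, dtype=str) and Series(data).astype(str)
    hold the same element values and the same element types."""
    via_constructor = Series(data, dtype=str).tolist()
    via_astype = Series(data).astype(str).tolist()
    same_values = via_constructor == via_astype
    same_types = (
        [type(x) for x in via_constructor] == [type(x) for x in via_astype]
    )
    return same_values and same_types


result = constructor_matches_astype([1, 2, 3])
print(result)
```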

@jreback jreback added Bug Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions labels Jun 5, 2017
@jreback jreback added this to the Next Major Release milestone Jun 5, 2017
@kbg
Author

kbg commented Jun 7, 2017

I don't have the time to fix it right now. Maybe I'll find some time at the weekend.

In case somebody else wants to fix this: the problem is located at the very end of pandas.core.series._sanitize_array(), which is called by the Series constructor.

There is also an issue with scalar input values:

>>> type(pandas.Series(1.0, dtype=str)[0])
float

which needs some additional changes near the end of pandas.core.series._sanitize_array().
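The scalar case can be checked the same way as the list case. A minimal sketch, assuming a pandas release in which the constructor converts scalars for dtype=str; on an affected version the element would still be a float, as shown above.

```python
from pandas import Series

# Scalar input with dtype=str; with no index given, this is a length-1 Series.
scalar_series = Series(1.0, dtype=str)

# .iloc[0] selects by position, independent of the index labels.
element = scalar_series.iloc[0]
print(type(element), repr(element))
```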
