BUG: Convert data elements when dtype=str in Series constructor with … #18795

Merged
jreback merged 5 commits into pandas-dev:master on Dec 21, 2017

Conversation

reidy-p
Contributor

@reidy-p reidy-p commented Dec 15, 2017

…int/float list

Not sure if my solution is correct but it seems to resolve the issue and pass the tests.

# If not empty convert the data to dtype
if not isna(data).all():
    data = np.array(data, dtype=dtype)

Contributor

You can set copy=False here, I think,
and just use .astype(), which copies.

Contributor Author

A lot of tests started failing when I replaced

data = np.array(data, dtype=dtype)

with

data = np.array(data, copy=False).astype(dtype)

Do you know why? Is it related to the fact that dtype= is used only for upcasting but .astype(dtype) is used for downcasting?
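For readers comparing the two spellings discussed above, here is a minimal sketch on a plain integer list. It does not reproduce the test failures mentioned in this comment, and it substitutes `np.asarray` for `np.array(data, copy=False)`, which is an assumption made for this illustration: under NumPy 2.x, `copy=False` raises when a copy cannot be avoided, as is the case for a Python list.

```python
import numpy as np

data = [1, 2, 3]

# Spelling used in the PR: let np.array convert while building the array.
direct = np.array(data, dtype=str)

# Reviewer's suggestion, written with np.asarray instead of
# np.array(data, copy=False) (assumption: NumPy 2.x rejects copy=False
# when the input is a list, because a copy is unavoidable).
via_astype = np.asarray(data).astype(str)

# Both arrays hold the same string values; the fixed-width unicode
# dtypes may differ in width, so print them rather than assume.
print(direct.tolist(), direct.dtype)
print(via_astype.tolist(), via_astype.dtype)
```

For this input both paths yield the same string values, so any difference seen in the test suite would have to come from other inputs or from how the result is used downstream.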

Contributor

Yeah, this is numpy stringifying things. I think this might be OK if you follow this with copy=False; IOW this will copy things, so you don't need to copy again.


codecov bot commented Dec 15, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@07d8c2d).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #18795   +/-   ##
=========================================
  Coverage          ?   91.62%           
=========================================
  Files             ?      154           
  Lines             ?    51410           
  Branches          ?        0           
=========================================
  Hits              ?    47106           
  Misses            ?     4304           
  Partials          ?        0
Flag                    Coverage Δ
#multiple               89.49% <100%> (?)
#single                 40.84% <100%> (?)

Impacted Files          Coverage Δ
pandas/core/series.py   94.83% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 07d8c2d...11542e9.

jreback added the Bug, Dtype Conversions (unexpected or buggy dtype conversions), and Reshaping (concat, merge/join, stack/unstack, explode) labels on Dec 15, 2017
@@ -142,6 +142,22 @@ def test_constructor_list_like(self):
        result = Series(obj, index=[0, 1, 2])
        assert_series_equal(result, expected)

    @pytest.mark.parametrize('input_vals, dtype, expected', [
        ([1, 2, 3], 'str', ['1', '2', '3']),
Contributor

Can you add other dtypes here: strings, datetimes, datetime w/tz, period, timedelta, categorical (with int), categorical (with float), categorical (with str), interval. Throw in a NaN or two on the floats.

(You can make it an array of len 2 to type a bit less if it helps.) Could also parametrize this, I think (IOW, an astype of str on a list should be equivalent to str on each element).
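The equivalence the reviewer describes can be sketched as a small check. This is an illustration only, not the test that was merged: the helper names `elementwise_str` and `constructor_str` are hypothetical, and NaN is deliberately left out because how the constructor treats missing values under `dtype=str` can depend on the pandas version.

```python
import pandas as pd


def elementwise_str(values):
    """Reference result: call str() on every element (hypothetical helper)."""
    return [str(v) for v in values]


def constructor_str(values):
    """Result under test: let the Series constructor convert (hypothetical helper)."""
    return list(pd.Series(values, dtype=str))


# Inputs chosen to avoid NaN, whose handling is version-dependent.
cases = ([1, 2, 3], [1.5, 2.5], ["a", "b"])

for values in cases:
    assert constructor_str(values) == elementwise_str(values), values
```

In a pytest suite the `cases` tuple would be a natural candidate for `pytest.mark.parametrize`, in the style of the diff above.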

Contributor Author

I added some more dtypes to the tests. I'm not sure how to add a test for categorical, though, because Series(..,dtype=str) throws a ValueError for a Categorical but .astype(str) doesn't:

In [1]: pd.Series(pd.Categorical([1.0, 2.0, np.nan])).astype(str)
Out[1]: 
0    1.0
1    2.0
2    nan
dtype: object

In [2]: pd.Series(pd.Categorical([1.0, 2.0, np.nan]), dtype=str)
Out[2]:
ValueError: cannot specify a dtype with a Categorical unless dtype='category'
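The asymmetry shown in this session can be probed without assuming which behaviour a given pandas release has. The sketch below wraps the constructor call and reports whether it succeeded or raised; the helper `construct_with_str` is hypothetical, and later pandas versions may no longer raise the error quoted above.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical([1.0, 2.0, np.nan])


def construct_with_str(values):
    """Return ('ok', Series) or ('error', message) for Series(values, dtype=str).

    Hypothetical helper: it only reports what the installed pandas does.
    """
    try:
        return "ok", pd.Series(values, dtype=str)
    except (ValueError, TypeError) as exc:
        return "error", str(exc)


# Conversion after construction, as in In [1] above.
astype_result = pd.Series(cat).astype(str)

# Conversion requested in the constructor, as in In [2] above.
status, payload = construct_with_str(cat)

print(astype_result.tolist())
print(status, payload)
```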

    ])
    def test_constructor_list_str(self, input_vals):
        # GH 16605
        # Ensure that data elements are converted to strings when
Contributor

the original issue also covers DataFrame, I am not sure this fix will handle this. Can you add a test for that as well?

Contributor Author

@reidy-p reidy-p Dec 18, 2017

Yeah, the original issue covers DataFrame but shows that it works as expected. For example, from the original issue:

In [1]: int_list = [1, 2, 3]
In [2]: s3 = pd.Series(int_list, dtype=str)

# Expect str but get int
In [3]: print(type(s3[0]))
Out[3]:
<class 'int'>

In [4]: f3 = pd.DataFrame(int_list, dtype=str)

# Expect str and get str
In [5]: print(type(f3.iloc[0, 0]))
Out[5]:
<class 'str'>

But I can try to add similar DataFrame tests if they don't exist already.
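As a script rather than an IPython session, the comparison from the original issue might look like the sketch below. It assumes a pandas version that includes this fix, in which case both containers are expected to hold `str` elements; on the affected versions the Series line would report `int` instead.

```python
import pandas as pd

int_list = [1, 2, 3]

# Series constructor with dtype=str (the case this PR fixes).
s3 = pd.Series(int_list, dtype=str)

# DataFrame constructor with dtype=str (already behaved as expected).
f3 = pd.DataFrame(int_list, dtype=str)

# Collect the Python type names of the stored elements.
series_types = {type(v).__name__ for v in s3}
frame_types = {type(v).__name__ for v in f3.iloc[:, 0]}

print("Series element types:", series_types)
print("DataFrame element types:", frame_types)
```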

Contributor

are things covered for DataFrame? ping when ready.

Contributor Author

I couldn't find any tests that directly covered the issue so I added a similar test in tests/frame/test_dtypes.py


pep8speaks commented Dec 19, 2017

Hello @reidy-p! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 19, 2017 at 21:34 Hours UTC

@jreback jreback added this to the 0.22.0 milestone Dec 20, 2017
@jreback jreback merged commit 4a2d55b into pandas-dev:master Dec 21, 2017
Contributor

jreback commented Dec 21, 2017

thanks @reidy-p

@reidy-p reidy-p deleted the series_dtype_str branch December 21, 2017 16:22
Development

Successfully merging this pull request may close these issues.

Series constructor skips dtype=str conversion for list data
3 participants