BUG: Convert data elements when dtype=str in Series constructor with … #18795
pandas/core/series.py
Outdated
    # If not empty convert the data to dtype
    if not isna(data).all():
        data = np.array(data, dtype=dtype)
You can set copy=False here, I think, and just use .astype(), which copies.
A lot of tests started failing when I replaced
data = np.array(data, dtype=dtype)
with
data = np.array(data, copy=False).astype(dtype)
Do you know why? Is it related to the fact that dtype= is used only for upcasting but .astype(dtype) is used for downcasting?
Yeah, this is numpy stringifying things. I think this might be OK if you follow this with copy=False. IOW, this will copy things, so you don't need to copy again.
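To make the two spellings discussed above concrete, here is a minimal NumPy-only sketch (it is not the pandas code under review). It uses np.asarray as a stand-in for np.array(data, copy=False), on the assumption that recent NumPy releases reject copy=False when a copy cannot be avoided, which is the case for a plain list. The resulting strings match, but the fixed-width string dtypes the two routes produce may differ, so the sketch only inspects the dtype kind.

```python
import numpy as np

data = [1, 2, 3]

# Route 1: let np.array convert the elements while it builds the array.
direct = np.array(data, dtype=str)

# Route 2: build the array first, then convert.
# np.asarray is used here as an assumption-laden substitute for
# np.array(data, copy=False); astype() returns a new array by default.
base = np.asarray(data)
via_astype = base.astype(str)

print(direct.tolist())
print(via_astype.tolist())
print(direct.dtype.kind, via_astype.dtype.kind)

# astype() produced its own buffer, so no further copy is needed.
print(np.shares_memory(base, via_astype))
```

Because astype() already allocates a new array, asking for another copy on top of it would be redundant, which seems to be the point of the copy=False suggestion.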
Codecov Report
@@ Coverage Diff @@
## master #18795 +/- ##
=========================================
Coverage ? 91.62%
=========================================
Files ? 154
Lines ? 51410
Branches ? 0
=========================================
Hits ? 47106
Misses ? 4304
Partials ? 0
@@ -142,6 +142,22 @@ def test_constructor_list_like(self):
        result = Series(obj, index=[0, 1, 2])
        assert_series_equal(result, expected)

    @pytest.mark.parametrize('input_vals, dtype, expected', [
        ([1, 2, 3], 'str', ['1', '2', '3']),
Can you add other dtypes here: strings, datetimes, datetime w/tz, period, timedelta, categorical (with int), categorical (with float), categorical (with str), interval. Throw in a NaN or two on the floats.
(You can make it an array of len 2 to type a bit less if it helps.) Could also parametrize this, I think (IOW an astype of str of a list should be equivalent to str on each element).
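A rough sketch of the parametrized check suggested above, assuming pandas and pytest are installed. The test name and the input lists are hypothetical and are not taken from the PR. NaN is deliberately left out, since how missing values are stringified may vary between pandas versions.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("input_vals", [
    [1, 2],          # ints
    ["1", "2"],      # already strings
    [1.5, 2.5],      # floats (no NaN here, see the note above)
])
def test_astype_str_matches_elementwise_str(input_vals):
    # Hypothetical test: astype(str) on the whole Series should agree
    # with calling str() on each element of the input list.
    result = pd.Series(input_vals).astype(str)
    expected = [str(x) for x in input_vals]
    assert result.tolist() == expected
```

Comparing plain lists rather than whole Series keeps the check independent of which dtype pandas chooses to hold the strings.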
I added some more dtypes to the tests. I'm not sure how to add a test for categorical, though, because Series(..., dtype=str) throws a ValueError for a Categorical but .astype(str) doesn't:
In [1]: pd.Series(pd.Categorical([1.0, 2.0, np.nan])).astype(str)
Out[1]:
0 1.0
1 2.0
2 nan
dtype: object
In [2]: pd.Series(pd.Categorical([1.0, 2.0, np.nan]), dtype=str)
Out[2]:
ValueError: cannot specify a dtype with a Categorical unless dtype='category'
dae1c35 to f3fd1d9
    ])
    def test_constructor_list_str(self, input_vals):
        # GH 16605
        # Ensure that data elements are converted to strings when
the original issue also covers DataFrame, I am not sure this fix will handle this. Can you add a test for that as well?
Yeah, the original issue covers DataFrame but shows that it works as expected. For example, from the original issue:
In [1]: int_list = [1, 2, 3]
In [2]: s3 = pd.Series(int_list, dtype=str)
# Expect str but get int
In [3]: print(type(s3[0]))
Out[3]:
<class 'int'>
In [4]: f3 = pd.DataFrame(int_list, dtype=str)
# Expect str and get str
In [5]: print(type(f3.iloc[0, 0]))
Out[5]:
<class 'str'>
But I can try to add similar DataFrame tests if they don't exist already.
are things covered for DataFrame? ping when ready.
I couldn't find any tests that directly covered the issue so I added a similar test in tests/frame/test_dtypes.py
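For illustration only, a check along these lines could look like the sketch below. It is not the test that was added to tests/frame/test_dtypes.py, and it assumes a pandas version that already includes this fix, so that both constructors convert the elements.

```python
import pandas as pd

int_list = [1, 2, 3]

# Build both containers with dtype=str, as in the original issue.
ser = pd.Series(int_list, dtype=str)
frame = pd.DataFrame(int_list, dtype=str)

# Collect the Python types of the stored elements.
series_types = {type(x) for x in ser}
frame_types = {type(x) for x in frame.iloc[:, 0]}

print(series_types)
print(frame_types)
```

On a pandas build without the fix, the Series line is where the integer elements would be expected to show up.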
f3fd1d9 to 697385e
697385e to 732f1f8
Hello @reidy-p! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 19, 2017 at 21:34 Hours UTC
thanks @reidy-p
…int/float list
git diff upstream/master -u -- "*.py" | flake8 --diff
Not sure if my solution is correct but it seems to resolve the issue and pass the tests.