Categorical.from_codes shouldn't coerce to int64 #18501
Please show a copy-pastable example.
I'm not sure what you want. The intermediate conversion to int64 isn't observable, so there would be nothing to show in a copy-paste example. It's an efficiency enhancement request, not a functionality one.
I want a copy-pastable example of your input.
```python
import numpy as np
import pandas as pd

# Shouldn't convert the input codes array to int64, because we then convert it
# to int8 anyway.
pd.Categorical.from_codes(codes=np.asarray([0, 1], np.int16), categories=["foo", "bar"])
```
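A quick way to see which dtype the stored codes end up in, for several integer input dtypes. This is a minimal check assuming a pandas version where `from_codes` accepts these arrays; the loop over input dtypes is illustrative and not from the thread.

```python
import numpy as np
import pandas as pd

categories = ["foo", "bar"]

# Record the dtype of the stored codes for several integer input dtypes.
result_dtypes = {}
for in_dtype in (np.int8, np.int16, np.int32, np.int64):
    codes = np.asarray([0, 1], dtype=in_dtype)
    cat = pd.Categorical.from_codes(codes=codes, categories=categories)
    result_dtypes[np.dtype(in_dtype).name] = cat.codes.dtype

# With only two categories, every input is stored as int8,
# so an intermediate int64 array adds nothing to the result.
print(result_dtypes)
```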
Should be able to wrap the try / except in pandas/pandas/core/categorical.py (lines 599 to 603 in 38f41e6) like this:

```python
if not is_integer_dtype(codes):
    # do the try / except
```

and see what breaks. @dcolascione, could you submit a PR for that, along with tests and a release note?

Note that even if we avoid the cast to int64, the resulting codes are still int8:

```
In [6]: pd.Categorical.from_codes(codes=np.asarray([0,1], np.int16), categories=["foo", "bar"]).codes.dtype
Out[6]: dtype('int8')
```

Avoiding all copies may be more difficult, but possible.
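The proposed change can be sketched outside pandas with plain NumPy. This is a hypothetical stand-in, not the pandas implementation: `validate_codes` and `downcast_codes` are illustrative names, and the int8/int16/int32 thresholds assume the codes dtype is picked from the number of categories in the same way `coerce_indexer_dtype` does. Integer input skips the int64 cast entirely, and `astype(..., copy=False)` avoids a copy when the input already has the target dtype.

```python
import numpy as np


def validate_codes(codes):
    """Hypothetical: only cast to int64 when the input is not already integer."""
    codes = np.asarray(codes)
    if np.issubdtype(codes.dtype, np.integer):
        # Integer input: no intermediate int64 array is allocated.
        return codes
    # Non-integer input: fall back to the int64 conversion and reject
    # values that are not whole numbers (e.g. 1.5 or NaN).
    as_int = codes.astype(np.int64)
    if not np.array_equal(as_int, codes):
        raise ValueError("codes need to be array-like integers")
    return as_int


def downcast_codes(codes, n_categories):
    """Hypothetical: pick the smallest signed integer dtype for the codes."""
    for dtype in (np.int8, np.int16, np.int32):
        if n_categories < np.iinfo(dtype).max:
            # copy=False returns the input itself when no conversion is needed.
            return codes.astype(dtype, copy=False)
    return codes.astype(np.int64, copy=False)


# Usage: int16 input goes straight to int8 without an int64 intermediate.
codes16 = np.asarray([0, 1], dtype=np.int16)
small = downcast_codes(validate_codes(codes16), n_categories=2)
```

With this structure the only conversion left for integer input is the final downcast, which is the step the `Out[6]` above shows is still needed.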
OK, this looks fine to change.
Hi, I would like to work on this issue. Is there any reason I shouldn't? Also, are there any suggestions or documents for first-time contributors, e.g. on how to write a test case? Thanks!
Categorical.from_codes unconditionally coerces its input to an np.int64 array, even though the Categorical constructor immediately coerces the input to another dtype using coerce_indexer_dtype. This coercion can cause a memory usage spike when codes is large. ISTM that we can avoid the conversion in from_codes entirely and let coerce_indexer_dtype take care of any error case.
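To put a rough number on the memory spike: for a large int16 `codes` array, the intermediate int64 copy is four times the size of the input and eight times the size of int8 codes. A back-of-the-envelope sketch with plain NumPy, assuming few enough categories that the final codes are int8 (the array length is arbitrary):

```python
import numpy as np

n = 1_000_000
codes = np.zeros(n, dtype=np.int16)      # user-supplied codes

intermediate = codes.astype(np.int64)    # the unconditional int64 cast
final = intermediate.astype(np.int8)     # roughly what is kept when there are few categories

for name, arr in [("input int16", codes), ("intermediate int64", intermediate), ("final int8", final)]:
    print(f"{name:>20}: {arr.nbytes / 2**20:.1f} MiB")
```

The int64 array exists only transiently, but its peak size is what drives the spike.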
Version: master