PERF: categorical rank #15498
Comments
@jreback @jorisvandenbossche: in the _values_for_rank method in Categorical, reorganizing the code and moving the typecast to float outside the if condition, like below, has the advantage that I can set a single rank function for categoricals in the _get_data_algo function in pandas/core/algorithms.py, which IMO is cleaner. Should I move it out, or do you think otherwise?
yes going to need some reorg
@ikilledthecat You can open a PR with the above change; that will be the easiest way to discuss it. In any case, the above certainly looks reasonable. A reason to keep the astype in the if condition is to avoid converting the data from int to float when not needed, which would give a (small) performance penalty. @jreback why is it needed to pass rank args? The above (or similar) seems OK to me without additional args.
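To illustrate the trade-off mentioned above, here is a minimal sketch of a conditional cast. It is not the code from the PR: the helper name `codes_to_rank_values` is hypothetical, and the only pandas fact assumed is that categorical codes use -1 for missing values.

```python
import numpy as np


def codes_to_rank_values(codes):
    """Hypothetical helper: cast codes to float only when NaN is needed."""
    mask = codes == -1  # -1 marks a missing value in categorical codes
    if mask.any():
        # Integers cannot hold NaN, so the cast is required in this branch.
        values = codes.astype("float64")
        values[mask] = np.nan
        return values
    # No missing values: return the integer codes without copying or casting.
    return codes


no_missing = np.array([0, 2, 1], dtype="int8")
with_missing = np.array([0, -1, 1], dtype="int8")

print(codes_to_rank_values(no_missing).dtype)
print(codes_to_rank_values(with_missing).dtype)
```

Moving the cast outside the condition would make both branches return float64, which simplifies dispatch to a single rank function at the cost of one extra conversion when nothing is missing.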
Another thing we could do for performance, for the unordered categorical, is to first check whether the categories are already sorted before doing the renaming (from a quick test, this check is much less expensive than the actual renaming). Although that may not be worth the complexity.
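One way to read this suggestion, as a rough sketch only: skip the remapping step when the categories are already in sorted order. The helper name `codes_in_sorted_order` is hypothetical, and it uses `reorder_categories` as a stand-in for the renaming done in the actual method.

```python
import pandas as pd


def codes_in_sorted_order(cat):
    """Hypothetical helper: return codes that follow the sorted category order."""
    categories = cat.categories
    if categories.is_monotonic_increasing:
        # Cheap check passed: the existing codes already reflect sorted order.
        return cat.codes
    # Otherwise pay for the remapping so that codes follow the sorted categories.
    return cat.reorder_categories(categories.sort_values()).codes


already_sorted = pd.Categorical(["b", "a"], categories=["a", "b"])
not_sorted = pd.Categorical(["b", "a", "c"], categories=["c", "a", "b"])

print(codes_in_sorted_order(already_sorted).tolist())
print(codes_in_sorted_order(not_sorted).tolist())
```

Whether the extra branch pays off depends on how often categories arrive sorted, which is the complexity concern raised in the comment.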
So when rank is called on the categories it's fine, though that will be handled when the categories are re-expanded, so maybe not needed. Yeah, makes more sense that way.
closes pandas-dev#15498

Author: Prasanjit Prakash <[email protected]>

Closes pandas-dev#15518 from ikilledthecat/rank_categorical_perf and squashes the following commits:

30b49b9 [Prasanjit Prakash] PERF: GH15498 - pep8 changes
ad38544 [Prasanjit Prakash] PERF: GH15498 - asv tests and whatsnew
1ebdb56 [Prasanjit Prakash] PERF: categorical rank GH#15498
a67cd85 [Prasanjit Prakash] PERF: categorical rank GH#15498
81df7df [Prasanjit Prakash] PERF: categorical rank GH#15498
45dd125 [Prasanjit Prakash] PERF: categorical rank GH#15498
33249b3 [Prasanjit Prakash] PERF: categorical rank GH#15498
xref #15422 (comment)
easy enough after #15422 to rank the categories themselves rather than using expanded values; prob most relevant for object dtypes.
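The idea in the issue can be sketched roughly as follows, using only public pandas and NumPy calls. This is an illustration under stated assumptions, not the merged implementation: the function name `rank_via_categories` is hypothetical, and the sketch assumes at least one category is present.

```python
import numpy as np
import pandas as pd


def rank_via_categories(cat):
    """Hypothetical helper: rank a Categorical without expanding object values."""
    # Rank the (few) unique categories once; this is the only object-dtype rank.
    category_ranks = pd.Series(cat.categories).rank().to_numpy()
    codes = cat.codes
    # Map each element to its category's rank; code -1 (missing) becomes NaN.
    values = np.where(codes == -1, np.nan, category_ranks[codes])
    # Rank the resulting float array, which is cheap compared to object values.
    return pd.Series(values).rank()


cat = pd.Categorical(
    ["b", "a", "c", "a", None, "b"], categories=["c", "a", "b"]
)
ranked = rank_via_categories(cat)
print(ranked.tolist())
```

The saving comes from comparing strings only across the categories, which are usually far fewer than the expanded values.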