BUG: drop_duplicates drops non-duplicate rows in the presence of integer columns #11403

Merged: 2 commits, Oct 24, 2015
Revert "PERF: perf improvements in drop_duplicates for integer dtyped arrays"

This reverts commit a00c7ea, but leaves new tests and benchmark
evanpw authored and Evan Wright committed Oct 23, 2015
commit b7107283df30a7c45dbc30347d06c3bbda7f05f3
8 changes: 1 addition & 7 deletions pandas/core/frame.py
@@ -2994,13 +2994,7 @@ def duplicated(self, subset=None, keep='first'):
         from pandas.hashtable import duplicated_int64, _SIZE_HINT_LIMIT
 
         def f(vals):
-
-            # if we have integers we can directly index with these
-            if com.is_integer_dtype(vals):
-                from pandas.core.nanops import unique1d
-                labels, shape = vals, unique1d(vals)
-            else:
-                labels, shape = factorize(vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
+            labels, shape = factorize(vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
             return labels.astype('i8',copy=False), len(shape)
 
         if subset is None:
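
A minimal sketch of why the reverted shortcut can merge distinct rows, for anyone reading the diff without the issue thread. It assumes the per-column (labels, size) pairs returned by f are later combined into one key per row in a mixed-radix fashion; combine_labels below is a hypothetical pure-Python stand-in for that step, not the pandas internals, and shortcut_labels / factorize_labels are illustrative names only.

```python
def shortcut_labels(vals):
    """Reverted approach: raw integers as labels, unique count as size."""
    return list(vals), len(set(vals))


def factorize_labels(vals):
    """Dense codes 0..n_unique-1, assigned in order of first appearance."""
    codes = {}
    labels = [codes.setdefault(v, len(codes)) for v in vals]
    return labels, len(codes)


def combine_labels(encoded_columns):
    """Hypothetical stand-in: fold per-column labels into one key per row.

    This is only injective when every label is smaller than its column's size.
    """
    n_rows = len(encoded_columns[0][0])
    keys = [0] * n_rows
    for labels, size in encoded_columns:
        keys = [k * size + lab for k, lab in zip(keys, labels)]
    return keys


# Two distinct rows: (0, 2) and (1, 0).
a = [0, 1]
b = [2, 0]

bad_keys = combine_labels([shortcut_labels(a), shortcut_labels(b)])
good_keys = combine_labels([factorize_labels(a), factorize_labels(b)])

print(bad_keys)   # [2, 2]  both rows collapse to the same key
print(good_keys)  # [0, 3]  rows stay distinct
```

In column b the raw value 2 is not smaller than the unique count 2, so it overflows into the slot belonging to the next value of a. Factorizing first keeps every label inside [0, size), which is the invariant the revert restores.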