[PERF] Get rid of MultiIndex conversion in IntervalIndex.intersection #26225
Merged
33 commits, all by makbigc:
- a5a1272 Gid rid of MultiIndex conversion in IntervalIndex.intersection
- 3cd095a Add benchmark for IntervalIndex.intersection
- 0486a4e clear code
- 09c89f1 Add whatsnew note
- 841a0b7 Modity the case for duplicate index
- 8b22623 Combine the set operation to find indexer into one
- 32d4005 Move setops tests to test_setops.py and add two tests
- d502fcb Remove relundant line
- 8ec6366 Remove duplicate line in whatsnew note
- 6000904 Isort interval/test_setops.py
- 7cb7d2c Split the intersection into two sub-functions
- bcf36bb Functionalize some indexes
- 745c0bb Remove relundant lines in whatsnew
- ff8bb97 Fixturize the sort parameter
- 17d775f Factor out the check and decorate the setops
- 03a989a Add docstring to two subfunction
- b36cbc8 Add intersection into _index_shared_docs
- 1cdb170 Isort and change the decorator's name
- 18c2d37 Remove object inheritance
- d229677 merge master
- 35594b0 Add docstring to setop_check
- 0834206 Merge master again
- 3cf5be8 merge again
- 9cf9b7e complete merge
- ab67edd 2nd approach
- 402b09c Add a new benchmark
- b4f130d Fix linting issue
- 3ff4c64 Change the decorator name to SetopCheck
- 3db3130 Amend and add test for a more corner case
- 1f25adb Merge commit to resolve conflict
- 4a9cd29 merge master
- 1467e94 merge
- ea2550a merge again
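The PR title and commit messages (for example "Split the intersection into two sub-functions") describe intersecting two `IntervalIndex` objects without first converting them to a `MultiIndex`. As a rough illustration of that idea, the sketch below matches intervals directly by their endpoints using only public pandas API. It is not the code merged in this PR: the function name `interval_intersection` is hypothetical, and the sketch assumes both indexes use the same `closed` side and contain no missing values.

```python
import numpy as np
import pandas as pd


def interval_intersection(a: pd.IntervalIndex, b: pd.IntervalIndex) -> pd.IntervalIndex:
    """Hypothetical sketch: intervals of `a` that also appear in `b`, no MultiIndex."""
    if a.closed != b.closed:
        raise ValueError("sketch assumes both indexes share the same `closed` side")

    if a.left.is_unique and a.right.is_unique:
        # Fast path (requires unique endpoints in `a` so get_indexer is valid):
        # locate each interval of `b` by its left and by its right endpoint.
        lindexer = a.left.get_indexer(b.left)
        rindexer = a.right.get_indexer(b.right)
        # An interval of `b` is present in `a` only if both endpoints were found
        # AND they point at the same position in `a`.
        match = (lindexer == rindexer) & (lindexer != -1)
        return a.take(lindexer[match])

    # Fallback for repeated endpoints in `a`: keep every interval of `a`
    # whose (left, right) pair occurs somewhere in `b`.
    b_pairs = set(zip(b.left, b.right))
    mask = np.array([pair in b_pairs for pair in zip(a.left, a.right)], dtype=bool)
    return a[mask]
```

Note that the fallback keeps every duplicate present in `a`, so its output can contain more copies of an interval than `b` does; that is the same duplicate-multiplicity question raised in the review thread.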
Amend and add test for a more corner case
commit 3db3130bf2dece5394aaff5c919f18de4e342912
There might be an issue with this approach when dupes are present in `self` and `other`. For other index types, such a scenario can result in more dupes being present in the intersection than in `self`. This behavior looks a bit buggy and inconsistent, though, so I'm not sure if we actually want `IntervalIndex` to be consistent with it.

Some examples of the buggy and inconsistent behavior with `Index`:

It seems strange that `[3]` has more dupes present than in either original index, but `[4]` does not. Similarly, it seems like `[5]` and `[6]` should be identical, as the presence of a non-intersecting element shouldn't impact the number of dupes returned.

@jreback: Do you know what the expected behavior for `intersection` with dupes should be? Or whether there are any dependencies on the behavior of `intersection` that would dictate this?

If we treat indexes like multisets, then the intersection should contain the minimum multiplicity of dupes; e.g. `idx2.intersection(idx3)` and `idx3.intersection(idx2)` should both have length 2, so you maintain the property of the intersection being a subset of the original indexes.
Yeah, this is weird, as these are set ops.
What happens (meaning how much breakage) if
Probably need to do this for all set ops.
I haven't had time to test this out extensively, but I made the two changes you suggested in `indexes/base.py` for `intersection`, and both resulted in some breakage. Aside from some breakage in the index set ops tests, there was also some breakage in `tests/reshape/test_merge.py`.