API/ERR: allow iterators in df.set_index & improve errors #24984
Changes from 1 commit
@@ -4151,14 +4151,23 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
# arrays are fine as long as they are one-dimensional
if getattr(col, 'ndim', 1) > 1:
raise ValueError(err_msg)
elif is_list_like(col, allow_sets=False):
# various iterators/generators are hashable, but should not
# raise a KeyError
tipo = type(col)
just put type(col) directly rather than adding another line here

Done.
raise ValueError(err_msg + ' Received column of '
'type {}'.format(tipo))
I added this extra branch to give a more sensible error for the iterator/generator cases. It's maybe worth noting that it would be easy to re-add the capability to consume list-likes (excluding tuples) here, because tuples now always enter the first branch. In any case, that would be something for

I am not fully sure we should use

I know this is a PITA .. (and thanks for the updates!)

There's a line to draw somewhere, right? Iterables should not be keys (excluding strings, obviously), hashable or not. Have a look at this franken-example and tell me this is something we should explicitly support. I think the current set-up already goes a long way towards making it really clear to the user what's happening or what's wrong. But hashable and iterable? I don't know how to sort that out with reasonable complexity (while keeping more important errors clean and clear), and I already spent way too much time that I don't have today on the last commit, to help with getting out 0.24.1

And the line we draw is on hashable (at least, if it is not hashable, the indexing machinery simply doesn't work), not on iterable (tuples are iterable, strings are iterable). I am not saying I would use it myself, but there are certainly reasonable examples of things you can put in an object dtype array that are iterable. One example is a shapely geometry object, which is iterable (it iterates through its coordinate points; strictly speaking, these objects are currently not hashable, but that is something they plan to fix). Given 0.24.1, I think we might want to release today. Are you fine with us adding some small changes here to get us to merge it during the day? (keeping the broad rationale of course)

@jorisvandenbossche
Why should an object dtype array (which is mutable) be hashable? My suggestion would be that you try to come up with a hashable/iterable example that would also be list-like (from the POV of
Edit: didn't see the shapely example, but this case is simply at odds with the iterator/generator case, which, I'd argue, is more widespread.

Sorry, I just generally meant object dtype array-like, like an Index or Series. So that it can make sense to put hashable but iterable objects in an Index or Series

As I said, e.g. a shapely geometry.

Didn't see your comments pop up, and edited my answer independently about the shapely case. In any case, I can't work on this any more today. Failures seem to be due to dict-views being hashable on
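The hashable-versus-iterable distinction debated in this thread can be checked with the standard library alone. The following is a minimal sketch, not part of the PR: the helper `describe` and the `candidates` mapping are hypothetical names introduced only for illustration.

```python
from collections.abc import Iterable


def describe(obj):
    """Return (is_hashable, is_iterable) for obj. Hypothetical helper."""
    try:
        hash(obj)
        hashable = True
    except TypeError:
        hashable = False
    return hashable, isinstance(obj, Iterable)


# Illustrative inputs only; none of these come from the PR's test suite.
candidates = {
    "string": "A",
    "tuple": ("A", "B"),
    "list": ["A", "B"],
    "generator": (c for c in "AB"),
    "iterator": iter(["A", "B"]),
}

results = {name: describe(obj) for name, obj in candidates.items()}

for name, (hashable, iterable) in results.items():
    print("{:<10} hashable={} iterable={}".format(name, hashable, iterable))
```

Strings, tuples, generators and plain iterators all come out as both hashable and iterable, while a list is iterable but not hashable. Neither property on its own therefore separates "label" from "collection of values", which is the crux of the disagreement here.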
else:
# everything else gets tried as a key; see GH 24969
try:
self[col]
except KeyError:
found = col in self.columns
except TypeError:
tipo = type(col)
same as above
raise ValueError(err_msg,
'Received column of type {}'.format(tipo))
raise TypeError(err_msg + ' Received column of '
'type {}'.format(tipo))
else:
if not found:
WillAyd marked this conversation as resolved.
missing.append(col)
jreback marked this conversation as resolved.
if missing:
raise KeyError('{}'.format(missing))
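The "everything else gets tried as a key" branch and the final `KeyError` can be illustrated without pandas. This is a rough sketch under stated assumptions: `check_labels` is a hypothetical function, a plain `set` stands in for `self.columns`, the `err_msg` text is a placeholder, and the `TypeError` variant of the raise is assumed to be the one the diff keeps.

```python
def check_labels(columns, keys):
    """Mimic the label branch: unhashable keys raise TypeError, and
    absent labels are collected and reported in a single KeyError.

    Hypothetical stand-in; not the pandas implementation.
    """
    err_msg = "placeholder error message."  # assumption: real text not shown in the diff
    known = set(columns)  # stand-in for self.columns
    missing = []

    for col in keys:
        try:
            # Membership in a set hashes the key, so unhashable
            # objects raise TypeError here.
            found = col in known
        except TypeError:
            raise TypeError(err_msg + ' Received column of '
                            'type {}'.format(type(col)))
        else:
            if not found:
                missing.append(col)

    if missing:
        raise KeyError('{}'.format(missing))


# Usage: all absent labels are reported at once.
try:
    check_labels(["A", "B"], ["A", "X", "Y"])
except KeyError as exc:
    print("missing labels:", exc.args[0])
```

Collecting the absent labels before raising means the caller sees every bad key in one error rather than fixing them one at a time.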
!= 1, is a 0-dim numpy scalar valid?

Done
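The difference between the `> 1` check in the diff and the suggested `!= 1` only shows up for zero-dimensional inputs. Here is a small sketch of that difference. `FakeArray` is a hypothetical stand-in that exposes nothing but an `ndim` attribute, used so the example runs without NumPy (a 0-dim NumPy scalar or array reports `ndim == 0`).

```python
class FakeArray:
    """Hypothetical stand-in exposing only an ``ndim`` attribute."""

    def __init__(self, ndim):
        self.ndim = ndim


def accepted_with_gt(col):
    # Check as written in the diff: reject only ndim > 1.
    return not (getattr(col, 'ndim', 1) > 1)


def accepted_with_ne(col):
    # Check as suggested in the review: reject anything that is not 1-dim.
    return not (getattr(col, 'ndim', 1) != 1)


zero_dim = FakeArray(0)
one_dim = FakeArray(1)
two_dim = FakeArray(2)
plain_list = ["a", "b"]  # no ndim attribute, so the default of 1 applies

for name, col in [("0-dim", zero_dim), ("1-dim", one_dim),
                  ("2-dim", two_dim), ("list", plain_list)]:
    print(name, accepted_with_gt(col), accepted_with_ne(col))
```

With `> 1` a zero-dimensional object slips through the one-dimensionality check, whereas `!= 1` rejects it. Objects without an `ndim` attribute behave the same under both checks.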