BUG: ngroups and len(groups) do not equal when grouping with a list of Grouper and column label #26326
Comments
returns the
As of pandas 0.24.2, the
so it returns the length of the first set of codes in the The groups that are formed in the example's
It appears that the correct groups are not being formed correctly when you are passing a Not sure if this is appropriate behaviour though. |
This is pretty tricky. I'm surprised to see a difference between the below as well:
>>> df.groupby([pd.Grouper(level=0), 'foo']).grouper.groups.keys()
dict_keys([('Dennis', 1), ('Mona', 2)])
>>> df.reset_index(level=1).groupby(['name', 'foo']).grouper.groups.keys()
dict_keys([('Dennis', 1), ('Dennis', 2), ('Mona', 1), ('Mona', 2)])
>>> df.groupby([pd.Grouper(['Mona', 'Mona', 'Dennis', 'Dennis']), 'foo']).grouper.groups.keys()
dict_keys([('Dennis', 1), ('Dennis', 2), ('Mona', 1), ('Mona', 2)])
I think the problem starts here: pandas/pandas/core/groupby/ops.py Line 261 in ee6b131
Where the groups are getting constructed from the individual groupings (here the first grouping is the The problem is that with the former iteration only goes over the unique values (here 'Dennis' and 'Mona') which is why your length calculations are getting truncated to 2: pandas/pandas/core/groupby/grouper.py Line 348 in ee6b131
I'm not sure of the best resolution off the top of my head, but if you'd like to investigate further and try your hand at a PR, it would certainly be welcome!
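The truncation described in this comment can be illustrated without pandas. The sketch below is a simplified analogy, not pandas' actual implementation: the row values are assumptions chosen to match the keys printed in the REPL output, and the variable names are hypothetical. It shows how pairing a sequence of unique values with a sequence of per-row values silently stops at the shorter one.

```python
# Simplified illustration only; this is NOT the code in pandas' ops.py or grouper.py.
# Assumed per-row data, chosen to be consistent with the keys shown in the thread.
names = ["Mona", "Mona", "Dennis", "Dennis"]  # per-row labels of index level 0
foo = [1, 2, 1, 2]                            # per-row values of column 'foo'

# If one grouping contributes only its unique values ...
unique_names = sorted(set(names))             # ['Dennis', 'Mona']

# ... zipping it with a per-row grouping stops after two pairs.
truncated_keys = sorted(set(zip(unique_names, foo)))

# Zipping per-row labels with per-row values keeps every combination.
full_keys = sorted(set(zip(names, foo)))

print(truncated_keys)
print(full_keys)
```

Under these assumptions, the truncated pairing yields two keys and the per-row pairing yields four, which mirrors the 2-versus-4 discrepancy reported in the issue.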
When a
The
Meanwhile, if the
The (Is this behavior intentional?) Also as @WillAyd mentioned,
when working with a So, a possible fix might be either to handle the special case of a |
Adding to the above discussion,
Thanks for confirming the bug. I will group by a combination of labels instead.
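A minimal sketch of that label-only workaround follows. The issue's original DataFrame is not reproduced in this text, so the data here is hypothetical: the second index level name `row`, the `bar` column, and all values are assumptions. Only `reset_index(level=1)` followed by `groupby(['name', 'foo'])` comes from the thread.

```python
import pandas as pd

# Hypothetical data; the DataFrame from the original report is not shown here.
index = pd.MultiIndex.from_arrays(
    [["Mona", "Mona", "Dennis", "Dennis"], [0, 1, 2, 3]],
    names=["name", "row"],  # 'row' is an assumed name for the second level
)
df = pd.DataFrame({"foo": [1, 2, 1, 2], "bar": [10, 20, 30, 40]}, index=index)

# Move the second index level into a column, then group by plain labels only
# (an index level name and a column label), with no pd.Grouper involved.
flat = df.reset_index(level=1)
by_labels = flat.groupby(["name", "foo"])

ngroups = by_labels.ngroups
n_keys = len(by_labels.groups)
print(ngroups, n_keys)
```

With plain labels, `ngroups` and `len(groups)` should agree, since both are derived from the same per-row keys.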
Code
Problem description
I grouped a DataFrame by a list containing a pandas.Grouper object and a column label. The number of groups (or group keys) should be 4, but the length of the GroupBy object, which I stored as n2, is 2. Why? Thanks in advance for your answer.
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Darwin
OS-release: 18.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: zh_CN.UTF-8