
BUG: DataFrameGroupBy.transform with axis=1 fails (#36308) #36350


Merged
merged 6 commits into from
Nov 14, 2020

Conversation

ethanator
Contributor

@ethanator ethanator commented Sep 14, 2020

Updated pandas.core.groupby.generic._wrap_transformed_output as suggested by @rhshadrach, but also had to modify pandas.core.groupby.groupby.pct_change to get two tests to pass where the operation passed to transform is pct_change.
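For readers who want to see what a column-wise transform is expected to compute, here is a minimal sketch. The frame, the grouping keys, and the choice of "cumsum" are illustrative assumptions, not taken from the PR diff. Because the axis=1 keyword of groupby is not accepted by every pandas version, the sketch computes the same thing by transposing, grouping rows, and transposing back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])

# Hypothetical grouping: columns "a" and "b" form group 0, column "c" forms group 1.
keys = np.array([0, 0, 1])

# The kind of call this PR is about (needs a pandas version that accepts axis=1):
#   result = df.groupby(keys, axis=1).transform("cumsum")

# Same computation without axis=1: transpose, group the rows, transpose back.
expected = df.T.groupby(keys).transform("cumsum").T
print(expected)
```

Within each row, "b" holds the running sum of "a" and "b", while "c" is unchanged because it is alone in its group.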

All tests passed except for pandas/tests/io/test_clipboard.py and pandas/tests/io/test_parquet.py, which are already broken on the latest master as of September 14, 2020 01:15 UTC.

@dsaxton dsaxton added Apply Apply, Aggregate, Transform, Map Bug Groupby labels Sep 14, 2020
@dsaxton
Member

dsaxton commented Sep 14, 2020

Thanks @ethanator, can you also add a test and bug fix note in the v1.1.3 whatsnew?

@jreback jreback modified the milestone: 1.2 Sep 15, 2020
Contributor

@jreback jreback left a comment


Put this on 1.2, and please add a test for shift and for pct_change, which was modified as well.

@rhshadrach
Member

@ethanator Are you still interested in working on this?

@ethanator
Contributor Author

> @ethanator Are you still interested in working on this?

Yes. I got distracted but I'm adding tests now to wrap this up.

@github-actions
Contributor

github-actions bot commented Nov 5, 2020

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 5, 2020
@rhshadrach
Member

@ethanator I'd like to get this in 1.2, would it be okay if I finished it up?

@jbrockmendel
Member

> I'd like to get this in 1.2, would it be okay if I finished it up?

Yes, if you can address the comments about tests and merge master. This looked almost ready.

@rhshadrach
Member

Both shift and pct_change are included in the transformation functions in the added test.
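As a rough illustration of what column-wise shift and pct_change should produce, the sketch below uses a small made-up frame and grouping (assumptions, not the test from the PR) and computes the reference values by transposing, grouping rows, and transposing back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])

# Hypothetical grouping: columns "a" and "b" share a group, "c" is on its own.
keys = np.array([0, 0, 1])

# Reference results for each transformation, computed without axis=1.
results = {
    name: df.T.groupby(keys).transform(name).T
    for name in ("shift", "pct_change")
}

for name, frame in results.items():
    print(name)
    print(frame)
```

In both cases the first column of each group ("a" and "c") has no predecessor, so it comes back as NaN and the whole result is float.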

@rhshadrach rhshadrach removed the Stale label Nov 14, 2020
@rhshadrach
Member

Test fails on Windows 32-bit because DataFrame gets constructed with 32-bit ints but doing df["b"] = df["b"].astype(int) results in 64-bit ints. Not sure what a good way to do this is; maybe specify dtype=int in the DataFrame construction?

@jbrockmendel
Member

Well, which is the "correct" answer? If you want to get int32 when on a 32-bit machine, you can do

expected["b"] = expected["b"].astype(np.intp)

@rhshadrach
Member

Thanks @jbrockmendel, that's exactly what I was asking but it turns out I had it backwards. The result in the test was 64-bit, but using astype(int) on expected (to undo the coercion to float in the expected computation) resulted in 32-bit. I've updated the test to use astype("int64").

However, I'm not sure if this is really the correct behavior. Should the result be coming back as 64-bit here as is currently the case?
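To make the three casts mentioned in this thread easier to compare, here is a small sketch on a made-up float Series (an assumption for illustration). It only prints the resulting dtypes, since the width produced by astype(int) and astype(np.intp) depends on the platform and NumPy build, whereas astype("int64") is fixed.

```python
import numpy as np
import pandas as pd

floats = pd.Series([2.0, 2.0])

fixed = floats.astype("int64")          # explicitly 64-bit on every platform
pointer_sized = floats.astype(np.intp)  # follows the platform pointer size
default_int = floats.astype(int)        # width depends on platform and NumPy version

for label, ser in [("int64", fixed), ("np.intp", pointer_sized), ("int", default_int)]:
    print(label, "->", ser.dtype)
```

Pinning "int64" in a test keeps the expected dtype the same on 32-bit and 64-bit builds.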

@jreback jreback added this to the 1.2 milestone Nov 14, 2020
@jreback
Contributor

jreback commented Nov 14, 2020

> Thanks @jbrockmendel, that's exactly what I was asking but it turns out I had it backwards. The result in the test was 64-bit, but using astype(int) on expected (to undo the coercion to float in the expected computation) resulted in 32-bit. I've updated the test to use astype("int64").
>
> However, I'm not sure if this is really the correct behavior. Should the result be coming back as 64-bit here as is currently the case?

Your inputs are all int64 (on Windows as well), so it IS surprising that you are getting an intp back (32-bit on Windows). Though I suppose somewhere we are constructing a return array for the ints that uses intp. I would say OK for now, but let's open an issue to dig deeper on this.

@rhshadrach OK on this otherwise? (LGTM.)

@rhshadrach
Member

> Your inputs are all int64 (on Windows as well), so it IS surprising that you are getting an intp back (32-bit on Windows).

The result in the test is coming back as int64, so I think this is okay. Unless you're surprised that astype(int) is giving back 32-bit ints on 32-bit Windows? In the call to astype(int), the inputs are floats.

Aside from this, the PR is good on my end.


if transformation_func == "diff":
    # Result contains nans, so transpose coerces to float
    expected["b"] = expected["b"].astype("int64")
Contributor


Oh, I see: you are doing this because we want to compare to ints. Got it, then this is fine.
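The coercion discussed in this review comment can be reproduced with a small sketch. The frame and grouping keys are illustrative assumptions. The "diff" of the first column in each group is NaN, so the transposed reference frame is float everywhere, and the "b" column (which has no NaN) is cast back to a fixed-width integer for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])

# Hypothetical grouping: columns "a" and "b" share a group, "c" is on its own.
keys = np.array([0, 0, 1])

# Reference computation via transpose; "a" and "c" become NaN under diff.
expected = df.T.groupby(keys).transform("diff").T

# Every column is float at this point, including the NaN-free "b".
dtype_before = expected["b"].dtype

# Undo the float coercion for "b" with an explicit, platform-independent width.
expected["b"] = expected["b"].astype("int64")
print(dtype_before, "->", expected["b"].dtype)
```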

@jreback jreback merged commit 138d575 into pandas-dev:master Nov 14, 2020
@jreback
Contributor

jreback commented Nov 14, 2020

thanks @ethanator

Labels
Apply Apply, Aggregate, Transform, Map Bug Groupby
Development

Successfully merging this pull request may close these issues.

BUG: DataFrameGroupBy.transform with axis=1 fails
5 participants