BUG: DataFrame.diff(axis=1) with mixed (or EA) dtypes #32995

jbrockmendel · 2020-03-25T01:31:54Z

TomAugspurger

I don't think we can just raise NotImplementedError here. I suspect a mix of floats and bool will be somewhat common.

Can we instead skip doing things blockwise for axis=1 and do it columnwise instead? Something like

In [39]: df = pd.DataFrame({'a': [1,1,2,2],'b': [1, 2, 3, 4.], 'c': [True, False, False, True]})

In [40]: import toolz

In [41]: pd.concat([a - b for a, b in toolz.sliding_window(2, (df.iloc[:, i] for i in range(len(df.columns))))], axis=1)
Out[41]:
     0    1
0  0.0  0.0
1 -1.0  2.0
2 -1.0  3.0
3 -2.0  3.0

(without using toolz, and including the all-NA columns and fixing the column names). We could perhaps only do that when nblocks > 1, to preserve the performance in the homogeneous case.
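Tom's suggestion can be sketched without toolz. This is a minimal sketch, assuming `periods=1`; `diff_axis1_columnwise` is a hypothetical helper name, not pandas API and not the code that was merged. One detail differs from the transcript: `a - b` there subtracts the right-hand column from the left one, whereas `DataFrame.diff` computes each column minus the one before it, so the sketch subtracts in that order. It also adds the leading all-NA column and restores the column labels, the two gaps Tom calls out.

```python
import numpy as np
import pandas as pd


def diff_axis1_columnwise(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: diff along axis=1 for periods=1, one column pair at a time.

    Each subtraction sees only the two Series involved, so each pair follows
    its own dtype rules instead of going through one consolidated block.
    """
    if df.shape[1] == 0:
        return df.copy()
    # The first column has no predecessor, so its result is all-NA (float64 here).
    pieces = [pd.Series(np.nan, index=df.index)]
    for i in range(1, df.shape[1]):
        # Current column minus the previous one, matching DataFrame.diff's sign.
        pieces.append(df.iloc[:, i] - df.iloc[:, i - 1])
    result = pd.concat(pieces, axis=1)
    # concat labels unnamed Series 0..n-1; put the original labels back.
    result.columns = df.columns
    return result
```

On the frame from the transcript this yields an all-NaN `a`, `b - a` in `b`, and `c - b` in `c`. The leading NaN column is plain float64 in this sketch; a fuller version would presumably want it to match the first column's dtype (for example `Int64`), which is not attempted here.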


jbrockmendel · 2020-03-25T17:33:37Z

> Can we instead skip doing things blockwise for axis=1 and do it columnwise instead?

That would also allow us to avoid consolidating, which would be nice.


jbrockmendel · 2020-03-25T23:24:32Z

updated to operate column-wise.

DatetimeTZBlock.diff could have its NotImplementedError case removed in a follow-up

jreback · 2020-03-26T00:00:07Z

pandas/core/frame.py

@@ -6667,6 +6667,30 @@ def diff(self, periods: int = 1, axis: Axis = 0) -> "DataFrame":
        5  NaN  NaN   NaN
        """
        bm_axis = self._get_block_manager_axis(axis)
+        self._consolidate_inplace()


this is not pretty. why are we not simply transposing and calling .diff()?

jbrockmendel · 2020-03-26

> why are we not simply transposing and calling .diff()?

I'd be fine with that, but it is potentially costly

jorisvandenbossche · 2020-03-26

I personally think the column-wise approach could be fine

jreback · 2020-03-29

this should be co-located with BlockManager.diff. but to be honest I think transposing is just fine here. The block type is already inferred and handled. This is just adding a lot of complexity.

jreback · 2020-04-05

if you really want to do this column based approach then move this with the other .diff methods (I actually prefer just a transpose here, it's totally fine for now)
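For comparison, the transpose route discussed in this thread can be sketched as `df.T.diff().T`. The sketch below, using the frame from Tom's transcript, shows one concrete cost besides the copy: pandas has no common numeric dtype for bool mixed with int/float, so the transposed frame is all object, the diff runs on Python objects, and the result comes back as object until `infer_objects()` is applied. Variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 1, 2, 2], "b": [1, 2, 3, 4.0], "c": [True, False, False, True]}
)

# Transposing mixed int/float/bool columns forces one common dtype,
# which pandas resolves to object for this combination.
transposed = df.T
print(transposed.dtypes.unique())

# diff over the transposed rows is diff over the original columns;
# transposing back keeps the object dtype.
roundtrip = transposed.diff().T
print(roundtrip.dtypes)

# infer_objects() tries to recover a numeric dtype for each column.
recovered = roundtrip.infer_objects()
print(recovered)
```

The values match a column-wise diff; the difference is the object-dtype detour, which a column-wise implementation avoids.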

jbrockmendel · 2020-03-31T20:28:39Z

Do we have anything approaching consensus here? I'm OK with either column-wise or transpose

TomAugspurger · 2020-03-31T20:40:15Z

I'm pretty strongly in favor of columnwise when there's more than 1 block. While I don't think sparse support should ever be a primary motivator, the difference in performance of a columnwise diff vs. a .T.diff().T for something like

In [6]: a = pd.arrays.SparseArray([1] + [0] * 1000)

In [7]: df = pd.DataFrame({"A": a, "B": a})

will be huge.
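A shape-only sketch of Tom's sparse example (it makes no timing claims): transposing the two long sparse columns produces one column per original row, so `.T.diff().T` has to build, diff, and re-transpose 1001 short columns, while the column-wise route only needs `B - A`, which stays sparse.

```python
import pandas as pd

a = pd.arrays.SparseArray([1] + [0] * 1000)
df = pd.DataFrame({"A": a, "B": a})

# Original frame: 1001 rows, 2 sparse columns.
print(df.shape, df.dtypes.tolist())

# Transposed frame: 2 rows, one column per original row.
t = df.T
print(t.shape)

# Column-wise route: for this two-column frame, diff along axis=1
# reduces to B - A for the second column, computed on sparse data.
colwise = df["B"] - df["A"]
print(colwise.dtype)
```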


jreback · 2020-04-05T20:21:25Z


if you really want to do this column based approach then move this with the other .diff methods (I actually prefer just a transpose here, it's totally fine for now)


jbrockmendel · 2020-04-09T23:54:10Z

@jreback is column-wise vs transpose a deal breaker? If not, I think this is ready.

jreback · 2020-04-10T00:08:23Z

if you really want to do columnwise can you move impl to another location as indicated

the code is pretty complex and should be separated from inline in the function

jreback · 2020-04-10T16:06:06Z

thanks, IIRC there might be some issues this closes if you'd have a look

jbrockmendel added 2 commits March 24, 2020 18:27

BUG: DataFrame.diff(axis=1) with mixed (or EA) dtypes

a47d289

update GH refs

ee5f7dc

jbrockmendel added the Bug label Mar 25, 2020

TomAugspurger reviewed Mar 25, 2020


Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-diff-axis1

ddc9361

jbrockmendel added 4 commits March 25, 2020 15:45

Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-diff-axis1

7c8079b

Operate column-wise

1d46ed6

restore test

1df730b

simplify, unused import

3a161b7

jreback requested changes Mar 26, 2020


Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-diff-axis1

5778212

jreback requested changes Apr 5, 2020


jbrockmendel added 2 commits April 6, 2020 11:11

Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-diff-axis1

78b75ce

Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-diff-axis1

bc1aa2e

jbrockmendel added 2 commits April 9, 2020 18:15

transpose

f7cb97d

update whatsnew

189b02e

jreback added this to the 1.1 milestone Apr 10, 2020

jreback added the Numeric Operations label (Arithmetic, Comparison, and Logical operations) Apr 10, 2020

jreback approved these changes Apr 10, 2020


jreback merged commit a142ad7 into pandas-dev:master Apr 10, 2020

jbrockmendel deleted the bug-diff-axis1 branch April 10, 2020 18:18

dsaxton mentioned this pull request Apr 22, 2020

BUG: argument 'axis=1' is silently ignored in DataFrame diff() method when dtype is nullable integer (Int64) #33726

Closed
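The linked report (#33726) concerns nullable `Int64` columns. A quick check of the behaviour it asks for, assuming a recent pandas release: with `axis=1` honoured, the second column becomes `b - a`, and the first column, having no predecessor, is all missing. If `axis=1` were silently ignored, `b` would instead hold row-wise differences starting with a missing value. This checks current behaviour only and does not by itself show which change fixed that issue.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 6, 9]}, dtype="Int64")

# axis=1 diffs across columns: each column minus the one to its left.
out = df.diff(axis=1)
print(out)
```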

jbrockmendel mentioned this pull request Sep 28, 2020

.diff(axis=1) gives NaNs with different types. #21437

Closed


jbrockmendel commented Mar 25, 2020

TomAugspurger left a comment

jbrockmendel commented Mar 25, 2020

jbrockmendel commented Mar 25, 2020

jreback Mar 26, 2020

jbrockmendel Mar 26, 2020

jorisvandenbossche Mar 26, 2020

jreback Mar 29, 2020

jreback Apr 5, 2020

jbrockmendel commented Mar 31, 2020

TomAugspurger commented Mar 31, 2020

jreback Apr 5, 2020

jbrockmendel commented Apr 9, 2020

jreback commented Apr 10, 2020

jreback commented Apr 10, 2020
