Rolling skewness and kurtosis fail on a sample of all equal values #5749

Closed
yieldsfalsehood opened this issue Dec 19, 2013 · 2 comments · Fixed by #5760
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Numeric Operations (arithmetic, comparison, and logical operations)

@yieldsfalsehood
Contributor

For a sample of data like this:

import pandas as pd
d = pd.Series([1] * 25)

Both of these throw an exception (during an attempt to divide by zero):

pd.rolling_skew(d, window=25)
pd.rolling_kurt(d, window=25)

The issue is in algos.pyx: there are no checks for what amounts to zero variance in the data. If one value occurs at least as many times in a row as the size of the window, the entire rolling computation fails, rather than just returning NaN for that one period (which is what I'd expect). For reference, scipy gives a kurtosis of -3 and a skewness of 0 (plus a warning) in this situation, which is not what I'd expect either: the variance and the higher central moments are all zero, so both statistics amount to a division by zero.

>>> from scipy import stats
>>> stats.kurtosis([1,1,1,1,1,1,1])
-3.0
>>> stats.skew([1,1,1,1,1,1,1])
/usr/lib/python2.7/dist-packages/scipy/stats/stats.py:1067: RuntimeWarning: invalid value encountered in double_scalars
  vals = np.where(zero, 0, m3 / m2**1.5)
0.0
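
To make the zero-variance case concrete, here is a minimal sketch in plain Python. The helper names `central_moment` and `skew_or_nan` are made up for illustration and are not part of pandas or scipy; the skewness used is the plain population form m3 / m2**1.5 that appears in the scipy warning above.

```python
import math

def central_moment(values, k):
    """k-th population central moment of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skew_or_nan(values):
    """Population skewness m3 / m2**1.5, or NaN when the variance is zero."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        # Constant sample: m3 is also 0, so the ratio would be 0 / 0.
        return math.nan
    return central_moment(values, 3) / m2 ** 1.5

constant = [1] * 25
# Both the second and third central moments are zero for a constant sample.
print(central_moment(constant, 2), central_moment(constant, 3))
print(skew_or_nan(constant))
```

The exact `m2 == 0` comparison works for this integer example; for general floating-point data a small tolerance would probably be needed instead.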

Below is the approach I was taking to weed out any possible divide-by-zero issues. I'll submit a proper pull request tomorrow; in the meantime, this is here in case I can get any feedback, preferably on whether these added conditions are enough (I think the kurtosis could still break) and on how to add some tests for both of these.

diff --git a/pandas/algos.pyx b/pandas/algos.pyx
index 08ec707..78b619f 100644
--- a/pandas/algos.pyx
+++ b/pandas/algos.pyx
@@ -1160,7 +1160,7 @@ def roll_skew(ndarray[double_t] input, int win, int minp):

                 nobs -= 1

-        if nobs >= minp:
+        if nobs >= minp and not (x == 0 and xx == 0) and nobs != 2:
             A = x / nobs
             B = xx / nobs - A * A
             C = xxx / nobs - A * A * A - 3 * A * B
@@ -1227,7 +1227,7 @@ def roll_kurt(ndarray[double_t] input,

                 nobs -= 1

-        if nobs >= minp:
+        if nobs >= minp and not (x == 0 and xx == 0) and nobs != 2:
             A = x / nobs
             R = A * A
             B = xx / nobs - R
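
For comparison, here is a rough pure-Python sketch of the behaviour I'd expect, reusing the A, B, C quantities from the hunk above. It is not the Cython code in algos.pyx: the function name `rolling_skew` is hypothetical, the window sums are recomputed naively for each position, and the final formula is the standard adjusted sample skewness, which I'm assuming here because it divides by `nobs - 2` (the reason for the `nobs != 2` check). The guard is on the variance `B` itself rather than on `x` and `xx`.

```python
import math

def rolling_skew(values, window, min_periods=None):
    """Rolling adjusted sample skewness; NaN wherever it is undefined."""
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        nobs = len(chunk)
        # Fewer than 3 observations would divide by (nobs - 2) <= 0.
        if nobs < max(min_periods, 3):
            out.append(math.nan)
            continue
        A = sum(chunk) / nobs                       # mean
        B = sum(v * v for v in chunk) / nobs - A * A  # population variance
        C = sum(v ** 3 for v in chunk) / nobs - A * A * A - 3 * A * B  # third central moment
        # Zero variance: emit NaN for this period only and keep going.
        if B <= 0:
            out.append(math.nan)
            continue
        R = math.sqrt(B)
        out.append(math.sqrt(nobs * (nobs - 1)) * C / ((nobs - 2) * R ** 3))
    return out

print(rolling_skew([1] * 25, window=25))        # every entry is NaN, no exception
print(rolling_skew([1, 1, 1, 2, 5], window=3))  # NaN only where the window is constant or too short
```

With float input, rounding can leave `B` slightly above or below zero for a constant window, so a real fix would likely compare against a small threshold instead of exactly zero.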
@jreback
Contributor

jreback commented Dec 19, 2013

yep...prob nice to have some nice edge tests for these

@yieldsfalsehood
Contributor Author

I sent in a pull request for this - #5760
