Rolling skewness and kurtosis fail on a sample of all equal values #5749
Labels: Algos, Bug, Numeric Operations
For a sample of data like this:
Both of these throw an exception (during an attempt to divide by zero):
The issue is in algos.pyx: there are no checks for zero variance in the data. If one value occurs at least as many times in a row as the size of the window, the entire rolling computation fails, rather than just returning NaN for the affected periods (which is what I'd expect). For reference, scipy gives a kurtosis of -3 and a skewness of 0 (plus a warning) for this situation, which is also not what I'd expect: the central moments are all zero, so both statistics amount to 0/0.
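To make the expected behaviour concrete, here is a minimal pure-Python sketch of a rolling sample skewness that returns NaN for a zero-variance window instead of raising. It is not the code in algos.pyx; the name `rolling_skew` and the choice of the bias-corrected estimator are assumptions made for illustration.

```python
import math


def rolling_skew(values, window):
    """Rolling sample skewness (hypothetical helper, not the pandas implementation).

    Returns NaN for incomplete windows, windows with fewer than 3 points,
    and windows whose values are all equal (zero variance).
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # window not yet full
            continue
        w = values[i + 1 - window : i + 1]
        n = len(w)
        # Guard 1: a constant window has zero variance, so skewness is 0/0.
        if n < 3 or min(w) == max(w):
            out.append(math.nan)
            continue
        mean = sum(w) / n
        m2 = sum((x - mean) ** 2 for x in w) / n  # second central moment
        m3 = sum((x - mean) ** 3 for x in w) / n  # third central moment
        # Guard 2: the variance can still underflow to zero for tiny spreads.
        denom = m2 ** 1.5
        if denom == 0.0:
            out.append(math.nan)
            continue
        g1 = m3 / denom
        # Bias correction for the sample skewness.
        out.append(math.sqrt(n * (n - 1)) / (n - 2) * g1)
    return out


data = [1.0] * 5 + [2.0, 4.0, 8.0]
result = rolling_skew(data, 3)
```

Only the windows that consist entirely of the repeated value come back as NaN; later windows are unaffected. Comparing `min(w)` with `max(w)` avoids relying on the computed variance being exactly zero, which floating-point rounding of the mean does not guarantee.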
Below is the approach I was taking to weed out any possible divide-by-zero issues. I'll submit a proper pull request tomorrow; in the meantime, this is here in case I can get any feedback, preferably on whether these added conditions are enough (I think the kurtosis could still break) and on how to add some tests for both of these.
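On the testing question, the following is an independent sketch and not the patch referred to above. It pairs a guarded kurtosis with pytest-style checks; `window_kurt`, `rolling_kurt`, and the test names are all hypothetical. It also shows one way the kurtosis can still break after a plain zero-variance check: the squared variance can underflow to zero even when the variance itself does not.

```python
import math


def window_kurt(w):
    """Sample excess kurtosis of one window, or NaN where it is undefined."""
    n = len(w)
    if n < 4 or min(w) == max(w):  # too few points, or zero variance
        return math.nan
    mean = sum(w) / n
    m2 = sum((x - mean) ** 2 for x in w) / n
    m4 = sum((x - mean) ** 4 for x in w) / n
    denom = m2 * m2
    if denom == 0.0:  # m2 > 0 but m2 squared underflowed
        return math.nan
    g2 = m4 / denom - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def rolling_kurt(values, window):
    """Apply window_kurt over trailing windows; NaN until the window is full."""
    return [
        window_kurt(values[i + 1 - window : i + 1]) if i + 1 >= window else math.nan
        for i in range(len(values))
    ]


def test_constant_window_is_nan():
    assert math.isnan(window_kurt([5.0, 5.0, 5.0, 5.0]))


def test_underflowing_variance_is_nan():
    # Variance is tiny but nonzero; its square underflows to 0.0.
    assert math.isnan(window_kurt([0.0, 0.0, 0.0, 1e-100]))


def test_known_value():
    assert abs(window_kurt([1.0, 2.0, 3.0, 4.0, 5.0]) - (-1.2)) < 1e-12


def test_constant_run_only_affects_its_own_windows():
    out = rolling_kurt([3.0] * 6 + [1.0, 2.0, 4.0, 7.0], 4)
    assert all(math.isnan(x) for x in out[:6])
    assert all(math.isfinite(x) for x in out[6:])
```

The same four cases (constant window, near-constant window, a known reference value, and a constant run inside a longer series) would apply equally to the skewness.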