ENH: make Series.ptp() handle missing values #11163

Closed
ajcr opened this issue Sep 21, 2015 · 4 comments
Labels: Missing-data, Numeric Operations

@ajcr
Contributor

ajcr commented Sep 21, 2015

Currently (in master), Series.ptp() is just implemented using np.ptp() and so the method will return nan for any Series that has one or more missing values:

>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan

It is simple to write s.max() - s.min() instead, but the ptp() result is surprising as most pandas methods are designed to handle missing data gracefully. I think most users would expect the ptp() method to ignore NaN.
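
The contrast can be reproduced with a short script. This is a minimal sketch, assuming a reasonably recent pandas (for `Series.to_numpy()`) and NumPy; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 0, np.nan, -3, 2])

# np.ptp on the underlying ndarray propagates the missing value.
raw_range = np.ptp(s.to_numpy())
print(raw_range)

# Series.max() and Series.min() skip NaN by default (skipna=True),
# so their difference is the range of the non-missing values.
nan_aware_range = s.max() - s.min()
print(nan_aware_range)
```

The first value is NaN, while the second is the range of the four non-missing entries.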

If there is agreement that ptp() should be changed, I would like to work on a pull request!


Extending the idea, it might be useful to have both DataFrame.ptp() and groupby.ptp() methods.

For this example DataFrame...

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

...I would expect the following behaviour:

>>> df.ptp()
a     1.0
b    69.0
c    21.6
dtype: float64

>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64

>>> df.groupby('a').ptp()
    b    c
a         
1  43  8.9
2  61  7.2

Again, if there is any consensus from the community on whether these additional methods should be added, I'd be happy to work on the pull request.
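
Until such methods exist, the same results can be computed with existing pandas calls. The following is a rough sketch, not a proposed implementation: it assumes the non-numeric column `d` is simply dropped via `select_dtypes`, and the names `numeric`, `col_range`, `row_range` and `group_range` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

# Keep only numeric columns; string columns have no meaningful range.
numeric = df.select_dtypes(include="number")

# Column-wise range (max and min skip NaN by default).
col_range = numeric.max() - numeric.min()

# Row-wise range across the numeric columns.
row_range = numeric.max(axis=1) - numeric.min(axis=1)

# Per-group range, grouping on column 'a'.
grouped = numeric.groupby("a")
group_range = grouped.max() - grouped.min()

print(col_range)
print(row_range)
print(group_range)
```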

@jreback
Contributor

jreback commented Sep 21, 2015

Absolutely. This should do .align, then .max() - .min(). It's probably only here as a convenience.

Want to do a pull request?

jreback added the Missing-data and Numeric Operations labels Sep 21, 2015
jreback added this to the Next Major Release milestone Sep 21, 2015
@ajcr
Contributor Author

ajcr commented Sep 21, 2015

Sure, I can work on this. It looks pretty straightforward, although I guess non-numeric columns will have to be skipped over, as 'str' - 'str' will raise an error, for example.
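
The failure mode is easy to check on a string column. A small sketch, assuming plain Python strings stored in a Series (the names `d`, `hi`, `lo` and `failed` are illustrative):

```python
import pandas as pd

d = pd.Series(['v', 'w', 'x', 'y', 'z'])

# max() and min() work on strings (lexicographic ordering) ...
hi, lo = d.max(), d.min()

# ... but subtracting one string from another is not defined.
try:
    hi - lo
    failed = False
except TypeError as exc:
    failed = True
    print(f"TypeError: {exc}")
```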

@jorisvandenbossche
Member

Certainly OK to fix the NaN issue in Series.ptp! (we could consider it a bug)

But, I am a bit more hesitant on the second part, adding it to DataFrame. Some reasons: 1) we already have many methods, and personally I don't think ptp has enough added value to justify addition, 2) it is really easy to do this yourself and 3) I also don't find ptp a very clear name.

@ajcr
Contributor Author

ajcr commented Sep 22, 2015

I agree ptp is very easily implemented by the user if they ever need it, so maybe it's not worth adding it as a new method to DataFrame and groupby for now.

In that case, I'll just change Series.ptp() so that it's written as max() - min() and so is NaN-aware.
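
As a rough illustration of that idea (not the actual pandas change), a standalone helper could look like the sketch below. The function name `nan_aware_ptp` and its `skipna` parameter are hypothetical; the parameter mirrors the `skipna` argument that `Series.max()` and `Series.min()` already accept.

```python
import numpy as np
import pandas as pd


def nan_aware_ptp(series: pd.Series, skipna: bool = True):
    """Peak-to-peak range of a Series, written as max() - min().

    Hypothetical helper for illustration only. With skipna=True the
    missing values are ignored; with skipna=False they propagate.
    """
    return series.max(skipna=skipna) - series.min(skipna=skipna)


s = pd.Series([5, 0, np.nan, -3, 2])
print(nan_aware_ptp(s))                # range of the non-missing values
print(nan_aware_ptp(s, skipna=False))  # NaN propagates
```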

jreback modified the milestones: 0.17.1, Next Major Release Oct 5, 2015
jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015