ENH: Add DatetimeIndexResampler.nlargest #17791

Open
edschofield opened this issue Oct 5, 2017 · 10 comments

Comments

@edschofield

Code Sample, a copy-pastable example if possible

With this setup:

import numpy as np
import pandas as pd
n = 1000
dates = pd.date_range(start='2010-01-01', periods=n)
rain_random = pd.Series(data=np.random.uniform(size=n), index=dates)

these two operations give different results:

rain_random.groupby(rain_random.index.year).nlargest(3)
rain_random.resample('A').nlargest(3)

Problem description

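The Series.resample().nlargest() operation is inconsistent with DataFrame.resample()[column].nlargest() and the groupby equivalent. It emits a FutureWarning and returns the three largest annual means rather than the three largest values within each year.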

Output:

/Users/schofield/miniconda/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.resample() is now a deferred operation
You called nlargest(...) on this deferred object which materialized it into a series
by implicitly taking the mean.  Use .resample(...).mean() instead
  """Entry point for launching an IPython kernel.
Out[427]:
2010-12-31    0.507550
2012-12-31    0.490082
2011-12-31    0.478356
dtype: float64

Expected output:

Date        Date      
1930-12-31  1930-10-06      288.135370
            1930-10-05      285.587734
            1930-10-07      259.439935
            1930-10-08      227.587389
            1930-10-09      190.054844
1931-12-31  1931-01-26     3052.104566
            1931-01-25     2839.126102
            1931-01-29     2196.167129
            1931-02-01     1953.331709
            1931-01-27     1893.975328
1932-12-31  1932-01-19     9526.953864
            1932-01-20     4278.291105
            1932-03-03     2952.348903
            1932-03-02     2946.385433
            1932-03-04     2098.108897

pd.show_versions() output:

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0

@jreback
Contributor

jreback commented Oct 5, 2017

nlargest is not a first class operation on resample, so you need to do this.

In [4]: rain_random.resample('A').apply(lambda x: x.nlargest(3))
Out[4]: 
2010-12-31  2010-09-24    0.998530
            2010-04-27    0.997371
            2010-03-09    0.996582
2011-12-31  2011-11-30    0.999936
            2011-02-20    0.997470
            2011-01-17    0.992270
2012-12-31  2012-07-23    0.999762
            2012-06-20    0.998130
            2012-02-25    0.998010
dtype: float64
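
For readers who want a per-period `nlargest` without calling `apply`, here is a minimal sketch of the groupby route. It assumes the same data shape as the original report and that grouping with `pd.Grouper` on a year-end offset produces the same bins as the resample call above. The seed and the names `top3_by_bin` and `top3_by_year` are illustrative only and do not come from the thread.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report: 1000 daily values from 2010-01-01.
rng = np.random.default_rng(0)  # seed is arbitrary, only for repeatability
n = 1000
dates = pd.date_range(start="2010-01-01", periods=n)
rain_random = pd.Series(rng.uniform(size=n), index=dates)

# Group into year-end bins and take the three largest values per bin.
# An offset object is used instead of a string alias because the
# spelling of the annual alias differs between pandas versions.
year_end = pd.offsets.YearEnd()
top3_by_bin = rain_random.groupby(pd.Grouper(freq=year_end)).nlargest(3)

# Grouping by calendar year selects the same rows; only the outer index
# level differs (integer year instead of a year-end timestamp).
top3_by_year = rain_random.groupby(rain_random.index.year).nlargest(3)

print(top3_by_bin)
```

Both results carry a two-level index: the group label on the outside and the original date of each selected value on the inside, which is the layout shown in the expected output of the report.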

@sinhrks sinhrks added Groupby Resample resample method labels Oct 6, 2017
@discort
Contributor

discort commented Aug 23, 2018

@jreback @sinhrks
The bug is not reproducible now:

In [14]: rain_random.resample('A').nlargest(3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-ec29cc197ee8> in <module>()
----> 1 rain_random.resample('A').nlargest(3)

/Users/discort/python/fun/pandas/pandas/core/resample.py in __getattr__(self, attr)
     96             return self[attr]
     97
---> 98         return object.__getattribute__(self, attr)
     99
    100     def __iter__(self):

AttributeError: 'DatetimeIndexResampler' object has no attribute 'nlargest'

version

INSTALLED VERSIONS

commit: 9122952
python: 3.5.3.candidate.1
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+510.g9122952d8
pytest: 3.7.2
pip: 18.0
setuptools: 33.1.1
Cython: 0.28.4
numpy: 1.12.0
scipy: None
pyarrow: None
xarray: None
IPython: 5.2.2
sphinx: 1.6.6
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@hellojinwoo

hellojinwoo commented Jun 7, 2020

I am using pandas version 1.0.1 and rain_random.resample('A').nlargest(3) is still not working. Hope this function is added in the following updates.

AttributeError                            Traceback (most recent call last)
<ipython-input-102-7057e1436432> in <module>
----> 1 rain_random.resample('A').nlargest(3)

~\anaconda3\lib\site-packages\pandas\core\resample.py in __getattr__(self, attr)
    105             return self[attr]
    106 
--> 107         return object.__getattribute__(self, attr)
    108 
    109     def __iter__(self):

AttributeError: 'DatetimeIndexResampler' object has no attribute 'nlargest'

@jreback
Contributor

jreback commented Jun 7, 2020

pull requests are accepted;
this is how issues get addressed in open source

@hellojinwoo

hellojinwoo commented Jun 8, 2020

> pull requests are accepted;
> this is how issues get addressed in open source

May I ask what you mean by "this is how issues get addressed in open source?"

@jreback
Contributor

jreback commented Jun 8, 2020

pandas and virtually all open source project are all volunteer

the core team will review pull requests

since there are 3000+ open issue most patches must come from the community

issues get fixed when folks like you open pull requests

@hellojinwoo

hellojinwoo commented Jun 8, 2020

> pandas and virtually all open source project are all volunteer
>
> the core team will review pull requests
>
> since there are 3000+ open issue most patches must come from the community
>
> issues get fixed when folks like you open pull requests

Yeah, I know that pandas is an open-source project. But regarding this issue, resample('D').nlargest(3), I can see neither assignees nor linked pull requests, which would appear on the right side of this webpage. So I was curious to know what you meant by "pull requests are accepted".

And since this issue was raised about two and a half years ago, I just wanted to point out that it has not been resolved yet. So it puzzled me a little when you said "this is how issues get addressed in open source".

@jreback
Contributor

jreback commented Jun 8, 2020

there are no assignees (who would we assign?)

and PRs would be linked to the issue

that’s the point here - no one has submitted anything

you or anyone else are welcome to do so

in this or any other issue

noting that something is not done is not that helpful - the issue is marked open

what IS helpful is submitting changes / examples /
tests

@hellojinwoo

hellojinwoo commented Jun 8, 2020

> there are no assignees (who would we assign?)
>
> and PRs would be linked to the issue
>
> that’s the point here - no one has submitted anything
>
> you or anyone else are welcome to do so
>
> in this or any other issue
>
> noting that something is not done is not that helpful - the issue is marked open
>
> what IS helpful is submitting changes / examples /
> tests

Now I can see why your replies have been sour. I am new to this pandas-dev zone, so if it is rude to report an old issue once again, I would like to apologize. You don't need to be sulky like that either because that IS NOT helpful either, right? Good day

@jreback
Contributor

jreback commented Jun 8, 2020

@hellojinwoo thanks for the apology

we have been getting the: why has this x year old issue not been resolved

many times

and to be honest it’s very rude of folks to do this but i guess new folks just don’t realize this so ok

people work extremely hard in open source and volunteer much time - yet continued comments like this (and to be clear i am not calling you out at all) cause burnout for this thankless task

so thank you for commenting in the issue
as i said above - if you would like to help out great

@rhshadrach changed the title from "Series.resample().nlargest produces incorrect output" to "ENH: Add DatetimeIndexResampler.nlargest" on Nov 10, 2024