Inconsistency, NaT included in result of groupby method first but not NaN #10590

larvian · 2015-07-15T21:23:28Z

NaT is included in the result of the groupby method first, while NaN is not. I expect first to skip both NaN and NaT and return the first value for which pandas.isnull is False.
Demonstration of the inconsistency (note that both the NaT and the NaN in the data frame are produced by np.nan; the difference is that the d_t column contains date values).

import numpy as np
import pandas as pd
from datetime import datetime as dt

# DataFrame is reached through the pd namespace imported above
testFrame = pd.DataFrame({'IX': ['A', 'A'], 'num': [np.nan, 100], 'd_t': [np.nan, dt.now()]})

Resulting data frame:

  IX                     d_t  num
0  A                     NaT  NaN
1  A 2015-07-15 22:47:10.635  100

Grouping this data frame on the IX column and calling the first method gives the result below, which shows the inconsistency between the d_t and num columns: NaN is skipped in num, but NaT is kept in d_t.

testFrame.groupby('IX').first()

Resulting dataframe:

        d_t  num
IX              
A       NaT  100
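The behaviour expected in this report can be spelled out as a small sketch. The helper name first_valid is hypothetical (it is not a pandas function), and a fixed timestamp with an explicit pd.NaT replaces dt.now() and np.nan so the example is reproducible. It selects, per group and per column, the first value for which pd.isnull is False.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "IX": ["A", "A"],
    "num": [np.nan, 100],
    # Fixed timestamp instead of dt.now(), so the result is deterministic
    "d_t": [pd.NaT, pd.Timestamp("2015-07-15 22:47:10.635")],
})


def first_valid(series):
    """Hypothetical helper: first entry where pd.isnull is False, else NaN."""
    valid = series[~pd.isnull(series)]
    return valid.iloc[0] if len(valid) else np.nan


# Apply the helper to every non-key column within each group
expected = frame.groupby("IX").agg(first_valid)
print(expected)
```

With this explicit definition both columns skip their missing first row, which is the result the report asks first() to produce.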

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None

jreback · 2015-07-15T21:28:26Z

hmm, @sinhrks I thought this was fixed?

sinhrks · 2015-07-15T21:48:35Z

I remember some issues when NaT is in the group key, but I was not aware of the aggregation issue. .min might be affected as well.

jreback · 2015-07-15T22:20:08Z

probably needs some adjustment for the comparison vs iNaT in first/last (though I thought it was there)

sinhrks · 2015-07-16T13:12:39Z

Confirmed that .min is also affected. @larvian a PR is appreciated :)

testFrame.groupby('IX').min()
#    d_t  num
# IX         
# A  NaT  100

larvian · 2015-07-18T08:33:38Z

@sinhrks :) I see. Well, I have started to look into GitHub and the pandas source code a bit, but unfortunately I don't have much time available right now, so waiting for me could require patience.
If I get to the point where I feel confident that I can contribute to the solution, I will open a PR. If someone else does it, I am OK with that too :)

For groupby, the timestamps are converted to integers, and NaT becomes the integer value tslib.iNaT, which is -9223372036854775808. The aggregation is then done on this value, with an incorrect result as a consequence. The solution proposed here is to replace that value with np.nan when the column is a datetime or timedelta.
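The effect of the sentinel can be reproduced with NumPy alone. This is a minimal sketch, not the code from the pull request: the names INAT, naive_min and first_valid are chosen for illustration, and it uses a boolean mask rather than a conversion to np.nan, because casting nanosecond timestamps to float would lose precision.

```python
import numpy as np

# NaT viewed as int64 is the smallest 64-bit integer, the value quoted above
INAT = np.iinfo(np.int64).min

stamps = np.array(["NaT", "2015-07-15T22:47:10"], dtype="datetime64[ns]")
as_int = stamps.view("i8")  # raw integer representation of the timestamps

# Aggregating the raw integers treats the sentinel as ordinary data:
# the minimum is the sentinel itself, and the "first" value is too.
naive_min = as_int.min()
naive_first = as_int[0]

# Masking the sentinel before aggregating gives the intended answer
valid = as_int != INAT
first_valid = np.datetime64(int(as_int[valid][0]), "ns")

print(naive_min == INAT, naive_first == INAT, first_valid)
```

The same masking idea applies to min, which is why both first and min showed NaT in this thread.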

jreback added the Bug, Datetime, Groupby, and Missing-data labels Jul 15, 2015

jreback added this to the 0.17.0 milestone Jul 15, 2015

larvian mentioned this issue Jul 20, 2015

BUG: Fix issue with incorrect groupby handling of NaT #10625

Merged

jreback modified the milestones: Next Major Release, 0.17.0 Sep 1, 2015

jreback added the Prio-medium label Sep 1, 2015

jreback modified the milestones: 0.17.0, Next Major Release Sep 2, 2015

jreback closed this as completed in #10625 Sep 3, 2015
