Possible Bug - math on like-indexed datetime series doesn't work as expected #7500

aullrich2013 · 2014-06-18T19:15:25Z

I have two like-indexed Series of datetimes. I'm trying to do simple math operations on them and noticed the results don't match what I'd expect. Specifically, subtracting one Series from the other doesn't always subtract across the aligned index labels. Converting each Series to a DataFrame with a dummy column gets closer, but the resulting dtype and units are wrong.

print firstOrderNotEval.loc[site]
print firstEvalOrder.loc[site]
print type(firstOrderNotEval.loc[site])
print type(firstEvalOrder.loc[site])

### output:
#2008-08-21 00:00:00
#2013-09-10 00:00:00
# <class 'pandas.tslib.Timestamp'>
# <class 'pandas.tslib.Timestamp'>

timeToFirstNonEvalPurchase_doesntWork = ((firstOrderNotEval - firstEvalOrder)/np.timedelta64(1,'D'))
timeToFirstNonEvalPurchase = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D'))['a']

print timeToFirstNonEvalPurchase_doesntWork.loc[2898717]
print timeToFirstNonEvalPurchase.loc[2898717]

### output:
# nan
# -1846 nanoseconds # note: should be -1846 days

Subtracting individual elements gives the correct result, but as a datetime.timedelta. Subtracting the series directly gives NaT:

site = 2898717   
print (firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print type(firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print (firstOrderNotEval - firstEvalOrder).loc[site]
### output:
# -1846 days, 0:00:00
# <type 'datetime.timedelta'>
# NaT
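
For comparison, here is a minimal sketch of the scalar case using the two dates printed above. It assumes a current pandas release, where subtracting two Timestamp objects returns a pandas Timedelta (a subclass of datetime.timedelta). The variable names are illustrative only.

```python
import datetime

import pandas as pd

# The two dates printed for this site earlier in the report.
first_order_not_eval = pd.Timestamp("2008-08-21")
first_eval_order = pd.Timestamp("2013-09-10")

# Scalar subtraction involves no index alignment, so the reported bug
# does not come into play here.
delta = first_order_not_eval - first_eval_order

print(delta.days)                             # whole-day component of the difference
print(isinstance(delta, datetime.timedelta))  # Timedelta subclasses datetime.timedelta
```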

Perhaps this has to do with the timestamp type itself given the following example:

print (firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')).loc[site]/np.timedelta64(1,'D')
print ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D')).loc[site]

### output:
# a   -1846
# Name: 2898717.0, dtype: float64
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]

Note that the following give different results depending on how the division by timedelta64 is performed:

tmp = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')))
print (tmp/np.timedelta64(1,'D')).loc[site]
print tmp.apply(lambda x: x/np.timedelta64(1,'D')).loc[site]
### output:
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]
### what we'd expect: 
# a   -1846
# Name: 2898717.0, dtype: float64
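
A sketch of the same comparison on made-up data, assuming a pandas release that includes the fix for this issue. There, dividing the whole frame and dividing column by column through apply should both yield float days. The one-row frames below are hypothetical stand-ins for the pickled series.

```python
import numpy as np
import pandas as pd

site = 2898717

# Hypothetical one-row stand-ins for the two series in the report.
not_eval = pd.Series(pd.to_datetime(["2008-08-21"]), index=[site])
evaluated = pd.Series(pd.to_datetime(["2013-09-10"]), index=[site])

# Frame subtraction aligns on both the index and the column label "a".
tmp = not_eval.to_frame("a") - evaluated.to_frame("a")

one_day = np.timedelta64(1, "D")
whole_frame = tmp / one_day                        # divide the entire frame at once
per_column = tmp.apply(lambda col: col / one_day)  # divide one column at a time

print(whole_frame.loc[site, "a"])
print(per_column.loc[site, "a"])
```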

The attached pickle files (saved with a .jpg extension) contain the series used in this example:

firstOrderNotEval.to_pickle('./firstOrderNotEval.jpg')
firstEvalOrder.to_pickle('./firstEvalOrder.jpg')

jreback · 2014-06-18T19:20:53Z

Show what pd.show_versions() displays.

jreback · 2014-06-18T19:28:10Z

So you need to align the series first (this is a bug; the alignment should be done internally).

x,y = firstOrderNotEval.align(firstEvalOrder)

then

x-y will be fine
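
A minimal, self-contained sketch of this workaround. The data is made up for illustration (the real series live in the attached pickles): two datetime Series that hold the same labels in a different order. Series.align reindexes both to a common index before the subtraction.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same index labels, stored in a different order.
first_order = pd.Series(
    pd.to_datetime(["2012-01-01", "2008-08-21"]), index=[1000001, 2898717]
)
first_eval = pd.Series(
    pd.to_datetime(["2013-09-10", "2012-01-11"]), index=[2898717, 1000001]
)

# Reindex both series to a shared index (outer join by default).
x, y = first_order.align(first_eval)

# With identical indexes the subtraction is purely positional.
diff = x - y

# Convert the timedelta result to a float number of days.
days = diff / np.timedelta64(1, "D")
print(days)
```

On releases that include the fix, `first_order - first_eval` should perform this alignment internally, so the explicit `align` call is only needed on affected versions.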

jreback · 2014-06-19T00:01:48Z

@aullrich2013 this is now fixed in master. Thanks for the report.

jreback added the Bug label Jun 18, 2014

jreback added this to the 0.14.1 milestone Jun 18, 2014

jreback mentioned this issue Jun 18, 2014

BUG: Bug in timeops with non-aligned Series (GH7500) #7503

Merged

jreback closed this as completed in #7503 Jun 19, 2014
