Possible Bug - math on like-indexed datetime series doesn't work as expected #7500

Closed
aullrich2013 opened this issue Jun 18, 2014 · 3 comments · Fixed by #7503
Labels
Bug, Datetime (Datetime data dtype), Timedelta (Timedelta data type)
Comments

@aullrich2013
firstordernoteval
firstevalorder

I have two Series of datetimes that share the same kind of index. Simple arithmetic on them does not give the results I'd expect. Specifically, subtracting one series from the other doesn't always subtract across the aligned indices. Converting each series to a DataFrame with a dummy column gets closer, but the resulting dtype is handled incorrectly.

site = 2898717  # index label used throughout this report
print firstOrderNotEval.loc[site]
print firstEvalOrder.loc[site]
print type(firstOrderNotEval.loc[site])
print type(firstEvalOrder.loc[site])

### output:
# 2008-08-21 00:00:00
# 2013-09-10 00:00:00
# <class 'pandas.tslib.Timestamp'>
# <class 'pandas.tslib.Timestamp'>

timeToFirstNonEvalPurchase_doesntWork = ((firstOrderNotEval - firstEvalOrder)/np.timedelta64(1,'D'))
timeToFirstNonEvalPurchase = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D'))['a']

print timeToFirstNonEvalPurchase_doesntWork.loc[2898717]
print timeToFirstNonEvalPurchase.loc[2898717]

### output:
# nan
# -1846 nanoseconds # note: should be -1846 days

Subtracting individual elements gives the correct result, but as a datetime.timedelta. Subtracting the series directly gives NaT:

site = 2898717   
print (firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print type(firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print (firstOrderNotEval - firstEvalOrder).loc[site]
### output:
# -1846 days, 0:00:00
# <type 'datetime.timedelta'>
# NaT
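
For reference, the scalar case can be reproduced without the pickle files. This is a minimal sketch that assumes a recent pandas on Python 3 and uses only the two dates printed earlier in this report; the variable names are illustrative.

```python
import pandas as pd

# The two timestamps shown in the report for site 2898717.
first_order_not_eval = pd.Timestamp("2008-08-21")
first_eval_order = pd.Timestamp("2013-09-10")

# Scalar subtraction yields a timedelta-like object.
delta = first_order_not_eval - first_eval_order

# Whole days, and the same value as a float number of days.
print(delta.days)
print(delta / pd.Timedelta(days=1))
```

The sign is negative because the non-eval order date precedes the eval order date.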

Perhaps this has to do with the timestamp type itself given the following example:

print (firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')).loc[site]/np.timedelta64(1,'D')
print ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D')).loc[site]

### output:
# a   -1846
# Name: 2898717.0, dtype: float64
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]

Note that the following give different results depending on how the division by timedelta64 is performed:

tmp = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')))
print (tmp/np.timedelta64(1,'D')).loc[site]
print tmp.apply(lambda x: x/np.timedelta64(1,'D')).loc[site]
### output:
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]
### what we'd expect: 
# a   -1846
# Name: 2898717.0, dtype: float64
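
The comparison above can be sketched on a one-row stand-in for the attached series. The data here is hypothetical (only the two dates and the index label come from this report), and the expectation that both forms agree is an assumption about recent pandas releases, not something the thread states.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row stand-ins for the two series, as single-column frames.
idx = [2898717]
a = pd.Series(pd.to_datetime(["2008-08-21"]), index=idx).to_frame("a")
b = pd.Series(pd.to_datetime(["2013-09-10"]), index=idx).to_frame("a")
tmp = a - b  # timedelta-valued DataFrame

one_day = np.timedelta64(1, "D")

# Divide the whole frame at once.
direct = tmp / one_day

# Divide column by column, as in the apply() variant above.
via_apply = tmp.apply(lambda col: col / one_day)

print(direct.loc[2898717, "a"], via_apply.loc[2898717, "a"])
```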

The attached pickle files (uploaded with a .jpg extension) contain the series used in this example:

firstOrderNotEval.to_pickle('./firstOrderNotEval.jpg')
firstEvalOrder.to_pickle('./firstEvalOrder.jpg')
@jreback
Contributor

jreback commented Jun 18, 2014

Please show what pd.show_versions() displays.

@jreback
Contributor

jreback commented Jun 18, 2014

You need to align the series first (this is a bug; the alignment should be done internally).

x,y = firstOrderNotEval.align(firstEvalOrder)

Then x - y will be fine.
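
A minimal sketch of this workaround, assuming a recent pandas on Python 3. The two series below are hypothetical stand-ins with only partly overlapping indices; only the dates for label 2898717 come from the report.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: the indices overlap only at label 2898717.
first_order_not_eval = pd.Series(
    pd.to_datetime(["2008-08-21", "2010-01-01"]), index=[2898717, 1111111]
)
first_eval_order = pd.Series(
    pd.to_datetime(["2009-06-15", "2013-09-10"]), index=[2222222, 2898717]
)

# Align explicitly; the default outer join gives both series the union index,
# filling labels missing from either side with NaT.
x, y = first_order_not_eval.align(first_eval_order)

# Subtract, then divide by one day to get a float number of days.
days = (x - y) / np.timedelta64(1, "D")
print(days.loc[2898717])
```

Labels present in only one of the series come out as NaN after the division, since one operand is NaT there.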

@jreback
Contributor

jreback commented Jun 19, 2014

@aullrich2013 this is now fixed in master. Thanks for the report.
