statsmodels frequency inference problems on 0.14.0 #7922

Closed
jseabold opened this issue Aug 4, 2014 · 7 comments
Labels
Frequency DateOffsets

Comments

@jseabold
Contributor

jseabold commented Aug 4, 2014

Reports are starting to trickle in of failing statsmodels tests on pandas 0.14.0. I haven't had a chance to see what the problem is yet.

statsmodels/statsmodels#1822

@jreback
Contributor

jreback commented Aug 4, 2014

OK, please post the relevant failing code as an example. These look like failures against statsmodels master, yes?

@TomAugspurger
Contributor

I ran test_arima off statsmodels 0.6.0.dev-8709f00, pandas '0.14.1-25-ga797b28'

(py3) ~/E/p/l/p/s/s/s/t/tests (master=) nosetests test_arima.py
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE............** On entry to DLASCL, parameter number  4 had an illegal value
** On entry to DLASCL, parameter number  5 had an illegal value
** On entry to DLASCL, parameter number  4 had an illegal value
** On entry to DLASCL, parameter number  5 had an illegal value
.........
======================================================================
ERROR: statsmodels.tsa.tests.test_arima.test_arma_predict_indices
----------------------------------------------------------------------
Traceback (most recent call last):
  File "index.pyx", line 542, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9582)
  File "hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7032)
  File "hashtable.pyx", line 387, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6973)
KeyError: 1262217600000000000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/base/datetools.py", line 42, in _index_date
    date = dates.get_loc(date)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas-0.14.1_25_ga797b28-py3.4-macosx-10.9-x86_64.egg/pandas/tseries/index.py", line 1301, in get_loc
    return self._engine.get_loc(stamp)
  File "index.pyx", line 519, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9851)
  File "index.pyx", line 544, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9632)
KeyError: Timestamp('2009-12-31 00:00:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/nose-1.3.3-py3.4.egg/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/tests/test_arima.py", line 971, in test_arma_predict_indices
    _check_start(*((model,) + case))
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/tests/test_arima.py", line 917, in _check_start
    start = model._get_predict_start(given, dynamic)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/arima_model.py", line 581, in _get_predict_start
    method)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/arima_model.py", line 306, in _validate
    start = _index_date(start, dates)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/base/datetools.py", line 57, in _index_date
    "an integer" % date)
ValueError: There is no frequency for these dates and date 2009-12-31 00:00:00 is not in dates index. Try giving a date that is in the dates index or use an integer

======================================================================
ERROR: statsmodels.tsa.tests.test_arima.test_arima_predict_indices
----------------------------------------------------------------------
Traceback (most recent call last):
  File "index.pyx", line 542, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9582)
  File "hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7032)
  File "hashtable.pyx", line 387, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6973)
KeyError: 1262217600000000000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/base/datetools.py", line 42, in _index_date
    date = dates.get_loc(date)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas-0.14.1_25_ga797b28-py3.4-macosx-10.9-x86_64.egg/pandas/tseries/index.py", line 1301, in get_loc
    return self._engine.get_loc(stamp)
  File "index.pyx", line 519, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9851)
  File "index.pyx", line 544, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:9632)
KeyError: Timestamp('2009-12-31 00:00:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/nose-1.3.3-py3.4.egg/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/tests/test_arima.py", line 1043, in test_arima_predict_indices
    _check_start(*((model,) + case))
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/tests/test_arima.py", line 917, in _check_start
    start = model._get_predict_start(given, dynamic)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/arima_model.py", line 959, in _get_predict_start
    method)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/arima_model.py", line 306, in _validate
    start = _index_date(start, dates)
  File "/Users/tom/Envs/py3/lib/python3.4/site-packages/statsmodels/statsmodels/tsa/base/datetools.py", line 57, in _index_date
    "an integer" % date)
ValueError: There is no frequency for these dates and date 2009-12-31 00:00:00 is not in dates index. Try giving a date that is in the dates index or use an integer

----------------------------------------------------------------------
Ran 485 tests in 24.284s

FAILED (errors=2)

I may have a chance to debug this tomorrow.

@jseabold
Contributor Author

jseabold commented Aug 4, 2014

Thanks. The C/Cython errors may be local issues. I don't recall having seen those.

The relevant failing code is the test suite... ;) Specifically any of the tsa tests and graphics.tests.test_tsaplots.

@TomAugspurger
Contributor

This was broken by commit 4b27023 in issue #7606 (cc @sinhrks).

The statsmodels tests check whether a DatetimeIndex has the freqstr attribute set.
statsmodels was relying on hasattr(dates, 'freqstr') returning True being equivalent to dates.freq is not None.
After 4b27023, we explicitly set the frequency to None, so a DatetimeIndex with no frequency still has the freqstr attribute; it's just set to None.

The change was made so that the code works as documented. @jseabold are you OK with me submitting a PR for statsmodels? Sorry to break the tests like that, but I think DatetimeIndex should always have a freqstr attribute, even if it's None.
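To make the difference concrete, here is a minimal sketch of the two checks. It assumes a recent pandas where a DatetimeIndex built from explicit dates has freq set to None; the helper name has_frequency is hypothetical and is not the actual statsmodels code or the eventual fix.

```python
import pandas as pd


def has_frequency(index):
    """Hypothetical helper: True only when the index carries a real frequency."""
    # Checking the value rather than the attribute's existence keeps working
    # whether or not freqstr/freq is always defined on the index.
    return getattr(index, "freq", None) is not None


# Built from explicit dates, so no frequency is attached.
no_freq = pd.DatetimeIndex(["2009-12-28", "2009-12-29", "2009-12-30"])
# Built with date_range, so the daily frequency is attached.
with_freq = pd.date_range("2009-12-28", periods=3, freq="D")

# The attribute exists in both cases, so hasattr cannot tell them apart.
print(hasattr(no_freq, "freqstr"), no_freq.freqstr)
print(hasattr(with_freq, "freqstr"), with_freq.freqstr)

# The value-based check does distinguish them.
print(has_frequency(no_freq), has_frequency(with_freq))
```

Testing the value instead of the attribute's presence should behave the same before and after 4b27023, which is why it seems like the safer check on the statsmodels side.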

@TomAugspurger
Contributor

Closing this as it's been fixed in statsmodels.

Sorry again for the break.

@startakovsky

Am I doing anything wrong to generate this? I don't think so, because when I run this for small values of i things look good, but for large values (>20000) I get a collection of errors. The df.iloc[i]=series[i] line is inside a for i in range(200000) loop. Thoughts?

Traceback (most recent call last):
  File "/home/vagrant/shared/packages/cluster.py", line 41, in getGeoDataFrameClusterCounts
    df.iloc[i]=series[i]
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 502, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1400, in get_value
    return self._engine.get_value(s, k)
  File "pandas/index.pyx", line 99, in pandas.index.IndexEngine.get_value (pandas/index.c:3080)
  File "pandas/index.pyx", line 107, in pandas.index.IndexEngine.get_value (pandas/index.c:2809)
  File "pandas/index.pyx", line 153, in pandas.index.IndexEngine.get_loc (pandas/index.c:3675)
  File "pandas/hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7201)
  File "pandas/hashtable.pyx", line 387, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7139)

@jreback
Contributor

jreback commented Dec 4, 2014

Ideally, show a copy-pastable, self-contained example that reproduces the problem.

At a minimum, show some more code along with the output of df.info() and series.dtype.
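For illustration, a self-contained report could look something like the sketch below. The data is made up, and the suggested cause is only a guess, since the traceback above is truncated and no data was posted: on a Series with an integer index, series[i] looks i up as a label, so it fails once i is not one of the index labels, whereas series.iloc[i] is positional.

```python
import pandas as pd

# Made-up data: an integer index with gaps, as might result from filtering rows.
series = pd.Series([10.0, 20.0, 30.0], index=[0, 2, 5])
df = pd.DataFrame({"value": [0.0, 0.0, 0.0]})

# The diagnostics requested above.
df.info()
print(series.dtype)


def lookup_by_label(s, key):
    """Return s[key], or None when key is not a label in the index."""
    try:
        return s[key]
    except KeyError:
        return None


# Label 1 is not in the index, so the label-based lookup fails...
print(lookup_by_label(series, 1))
# ...while the positional lookup returns the second element.
print(series.iloc[1])

# Positional on both sides, so gaps in the labels do not matter.
for i in range(len(series)):
    df.iloc[i] = series.iloc[i]
print(df)
```

If the real series has a contiguous integer index covering every i in the loop, this is not the cause, and the df.info() and series.dtype output would help narrow it down.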
