add fix for opening zero observation dta files #7369

bquistorff · 2014-06-06T04:38:38Z

Opening a Stata dta file with no observations (but having variables) resulted in an error. Example file: https://dl.dropboxusercontent.com/u/6705315/no_obs_v115.dta.

>>> import pandas as pd
>>> pd.read_stata("no_obs_v115.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 49, in read_stata
    return reader.data(convert_dates, convert_categoricals, index)
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 855, in data
    data = DataFrame(data, columns=self.varlist, index=index)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 255, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 367, in _init_ndarray
    return create_block_manager_from_blocks([values.T], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3185, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3166, in construction_error
    passed,implied))
ValueError: Shape of passed values is (0, 0), indices imply (1, 0)
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: None
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

The PR fixes this issue in stata.py, though maybe the issue should be fixed in DataFrame.
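
For illustration, a guard of the kind described could look like the sketch below. This is not the PR's actual diff: `frame_from_rows` is a hypothetical helper, and `rows` and `varlist` are stand-ins for the reader's parsed observations and variable names.

```python
import pandas as pd


def frame_from_rows(rows, varlist, index=None):
    """Hypothetical helper: build a DataFrame from parsed rows.

    Not the PR's code; it only illustrates special-casing zero observations.
    """
    if len(rows) == 0:
        # No observations: build an empty frame that still carries the columns.
        return pd.DataFrame(columns=varlist, index=index)
    return pd.DataFrame(rows, columns=varlist, index=index)


empty = frame_from_rows([], ["unit"])
print(empty.shape)  # (0, 1)
```

Special-casing the empty case in the reader keeps the change local to stata.py; handling it in the DataFrame constructor would be the broader alternative mentioned above.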

jreback · 2014-06-06T13:02:17Z

needs a test for doing this (e.g. write an empty frame then read it back in and compare), put in io/tests/test_stata.py
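
A minimal sketch of such a round trip, assuming `DataFrame.to_stata` and `pd.read_stata` behave as in current pandas releases. The temporary file name is arbitrary, and the checks here are deliberately loose (column names and row count only) rather than a full frame comparison.

```python
import os
import tempfile

import pandas as pd

# GH 7369: a frame with one variable but zero observations.
empty_ds = pd.DataFrame(columns=["unit"])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "empty.dta")  # arbitrary file name
    # Write the empty frame without the index, then read it back.
    empty_ds.to_stata(path, write_index=False)
    round_tripped = pd.read_stata(path)

# The variable survives the round trip and there are still no rows.
assert list(round_tripped.columns) == ["unit"]
assert len(round_tripped) == 0
```

Inside the pandas test suite the comparison would more likely use the project's own frame-equality helper; the plain assertions above just keep the sketch standalone.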

jreback · 2014-06-06T13:02:46Z

you can also add your test file to the repo (same place as existing test files)

bquistorff · 2014-06-06T16:46:42Z

OK, I've added a test. I've left out my file, because generating the empty file seems to work just fine. Let me know if something isn't right as I'm new to git.

jreback · 2014-06-06T17:31:10Z

pandas/io/tests/test_stata.py

@@ -72,6 +72,13 @@ def read_dta(self, file):
    def read_csv(self, file):
        return read_csv(file, parse_dates=True)

+    def test_read_empty_dta(self):
+        empty_ds = DataFrame(columns=['unit'])


add the comment GH 7369 here

jreback · 2014-06-06T17:31:31Z

add a note in Bug Fixes in v0.14.1.txt otherwise looks good

bquistorff · 2014-06-06T19:05:05Z

OK. Thanks for the help.

jreback · 2014-06-06T19:06:05Z

ok looks fine

pls squash down to a single commit

bquistorff · 2014-06-07T02:06:07Z

OK. I think that worked.

hayd · 2014-06-07T06:19:13Z

@bquistorff There's a merge conflict (most likely in the release notes), do you mind rebasing?

@jreback perhaps we could put like fifty blank lines in each subsection of the release notes and let people insert theirs randomly (to minimise merge conflicts early on in the release cycle)?

jreback · 2014-06-07T10:15:51Z

@hayd good idea!
go for it

hayd · 2014-06-08T01:01:08Z

@jreback done 8633e6f (I bet this'll cause at least 3 merge conflicts). Is the issue that people append their release note at the end rather than insert it somewhere randomly? Hmmm, we'll see if this helps at all.

@bquistorff pandas git tip: insert release notes somewhere randomly rather than append at the end! :)

jreback · 2014-06-13T20:33:41Z

@bquistorff thanks!

jreback added the Stata label Jun 6, 2014

jreback reviewed Jun 6, 2014

jreback added this to the 0.14.1 milestone Jun 6, 2014

jreback added the Bug label Jun 6, 2014

BUG: add fix for opening zero observation dta files

f5a1113

jreback merged commit f5a1113 into pandas-dev:master Jun 13, 2014
