add fix for opening zero observation dta files #7369

Merged
merged 1 commit into from
Jun 13, 2014

bquistorff
Contributor

Opening a Stata dta file with no observations (but having variables) resulted in an error. Example file: https://dl.dropboxusercontent.com/u/6705315/no_obs_v115.dta.

>>> import pandas as pd
>>> pd.read_stata("no_obs_v115.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 49, in read_stata
    return reader.data(convert_dates, convert_categoricals, index)
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 855, in data
    data = DataFrame(data, columns=self.varlist, index=index)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 255, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 367, in _init_ndarray
    return create_block_manager_from_blocks([values.T], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3185, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3166, in construction_error
    passed,implied))
ValueError: Shape of passed values is (0, 0), indices imply (1, 0)
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: None
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

The PR fixes this issue in stata.py, though maybe the issue should be fixed in DataFrame.
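
For illustration, a guard of the kind described above might look like the sketch below. This is not the change merged in this PR: `frame_from_rows` is a hypothetical helper, and it only shows the general idea of building the frame from the column labels alone when there are no observations.

```python
import pandas as pd


def frame_from_rows(rows, columns):
    """Hypothetical helper (not from pandas): build a DataFrame from row data,
    special-casing the zero-observation case."""
    if len(rows) == 0:
        # No observations: construct the frame from the column labels only,
        # so the shape of an empty data array is never compared to the labels.
        return pd.DataFrame(columns=columns)
    return pd.DataFrame(rows, columns=columns)


empty = frame_from_rows([], ["unit", "value"])
print(empty.shape)  # (0, 2)
```

The same helper passes non-empty data straight through, so the special case only affects files with zero observations.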

@jreback
Contributor

jreback commented Jun 6, 2014

needs a test for doing this (e.g. write an empty frame then read it back in and compare), put in io/tests/test_stata.py
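
A minimal sketch of the round trip described here, assuming a recent pandas where `DataFrame.to_stata` and `pd.read_stata` are available. The temporary file name and the variable names are illustrative, not taken from the test added in this PR.

```python
import os
import tempfile

import pandas as pd

# A frame with one variable ("unit") and zero observations.
empty_ds = pd.DataFrame(columns=["unit"])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "empty.dta")
    # Write without the index so only the "unit" variable is stored.
    empty_ds.to_stata(path, write_index=False)
    # Reading the zero-observation file back should not raise.
    round_tripped = pd.read_stata(path)

print(round_tripped.shape)
print(list(round_tripped.columns))
```

Comparing the shape and column labels is a deliberately loose check; a stricter test could compare the two frames directly.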

@jreback jreback added the Stata label Jun 6, 2014
@jreback
Contributor

jreback commented Jun 6, 2014

you can also add your test file to the repo (same place as existing test files)

@bquistorff
Contributor Author

OK, I've added a test. I've left out my file, because generating the empty file seems to work just fine. Let me know if something isn't right as I'm new to git.

@@ -72,6 +72,13 @@ def read_dta(self, file):
    def read_csv(self, file):
        return read_csv(file, parse_dates=True)

    def test_read_empty_dta(self):
        empty_ds = DataFrame(columns=['unit'])

add the comment GH 7369 here

@jreback
Contributor

jreback commented Jun 6, 2014

add a note in Bug Fixes in v0.14.1.txt otherwise looks good

@jreback jreback added this to the 0.14.1 milestone Jun 6, 2014
@jreback jreback added the Bug label Jun 6, 2014
@bquistorff
Contributor Author

OK. Thanks for the help.

@jreback
Contributor

jreback commented Jun 6, 2014

ok looks fine

pls squash down to a single commit

@bquistorff
Contributor Author

OK. I think that worked.

@hayd
Contributor

hayd commented Jun 7, 2014

@bquistorff There's a merge conflict (most likely in the release notes), do you mind rebasing?

@jreback perhaps we could put like fifty blank lines in each subsection of the release notes and let people insert theirs in randomly (to minimise merge conflicts early on in the release cycle)?

@jreback
Contributor

jreback commented Jun 7, 2014

@hayd good idea!
go for it

@hayd
Contributor

hayd commented Jun 8, 2014

@jreback done 8633e6f (I bet this'll cause at least 3 merge conflicts). Is the issue that people append their release note at the end rather than insert it somewhere randomly? Hmmm, we'll see if this helps at all.

@bquistorff pandas git tip: insert release notes somewhere randomly rather than append at the end! :)

@jreback jreback merged commit f5a1113 into pandas-dev:master Jun 13, 2014
@jreback
Contributor

jreback commented Jun 13, 2014

@bquistorff thanks!

Labels
Bug IO Stata read_stata, to_stata

3 participants