read_fwf/table on py3 has trouble with BytesIO #4785

Closed
ghost opened this issue Sep 9, 2013 · 7 comments
Labels: Bug, Unicode (Unicode strings)

Comments

ghost commented Sep 9, 2013

>>> import pandas as pd
>>> from io import BytesIO
>>> pd.read_fwf(BytesIO("שלום".encode('utf8')), widths=[2])
  File ".../pandas/io/parsers.py", line 1944, in <listcomp>
    for (fromm, to) in self.colspecs]
TypeError: Type str doesn't support the buffer API

The python engine is also broken, by another path:

>>> from io import BytesIO
>>> pd.read_table(BytesIO("שלום::1234\n".encode('cp1255')), sep="::", engine='python', encoding='cp1255')
  File "/usr/local/lib/python3.3/dist-packages/pandas-0.12.0_357_g218f334-py3.3-linux-x86_64.egg/pandas/io/parsers.py", line 1324, in _read
    yield pat.split(line.decode('utf-8').strip())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte

The line is decoded as UTF-8 even though encoding='cp1255' was passed. Note that len(sep) > 1 currently activates the python engine anyway.
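
To make the expected behaviour concrete, here is a minimal sketch of decoding each raw line with the caller's encoding before applying the regex separator. It is not the pandas code: `split_lines` is a hypothetical helper invented for illustration, and it only assumes that iterating a `BytesIO` yields `bytes` lines.

```python
import re
from io import BytesIO


def split_lines(buf, sep, encoding="utf-8"):
    """Hypothetical helper (not part of pandas): decode, then split on a regex."""
    pat = re.compile(sep)
    rows = []
    for line in buf:
        # On Python 3 a BytesIO yields bytes, so decode with the
        # encoding the caller asked for instead of a hard-coded 'utf-8'.
        if isinstance(line, bytes):
            line = line.decode(encoding)
        rows.append(pat.split(line.strip()))
    return rows


data = BytesIO("שלום::1234\n".encode("cp1255"))
rows = split_lines(data, "::", encoding="cp1255")
```

With the encoding passed through, the single input line splits into the Hebrew word and `1234` instead of raising `UnicodeDecodeError`.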

Related: #4784

Edit: fixed incorrect encoding and updated error
Edit: Updated examples

ghost referenced this issue in jtratner/pandas Sep 9, 2013
`gzip` and `bz2` both now return `bytes` rather than `str` in Python 3,
so need to check for bytes and decode as necessary.
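
For readers unfamiliar with the behaviour that commit message describes, a small standard-library sketch (not the actual patch) shows that a gzip stream opened in binary mode hands back `bytes` on Python 3, and how a reader might check and decode. The UTF-8 encoding here is an assumption chosen for the example.

```python
import gzip
from io import BytesIO

# Build a small gzip payload in memory.
raw = BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write("a,b\n1,2\n".encode("utf-8"))
raw.seek(0)

# Reading it back in binary mode returns bytes on Python 3.
with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
    first = gz.readline()

was_bytes = isinstance(first, bytes)

# Check for bytes and decode as necessary (encoding assumed to be UTF-8).
if was_bytes:
    first = first.decode("utf-8")
```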
ghost assigned jtratner Sep 9, 2013
jtratner commented Sep 9, 2013

I don't think the error in your example comes from pandas. I get the same error as you have above just by running the following two lines:

from io import BytesIO
BytesIO("שלום".encode('cp1252')).read()

ghost commented Sep 9, 2013

You're right, that's the wrong codepage (it should be cp1255), and the example doesn't trigger that code path anyway. Fixed.
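
A quick standard-library check of the codepage mix-up, independent of pandas: cp1252 has no Hebrew letters, so encoding fails before any parser is involved, while cp1255 encodes the word with a leading byte 0xf9, the same byte reported in the `UnicodeDecodeError` in the issue description.

```python
text = "שלום"

# cp1252 (Western European) cannot represent Hebrew, so this raises
# UnicodeEncodeError regardless of pandas.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# cp1255 (Hebrew) works; its first byte is not a valid UTF-8 start byte.
encoded = text.encode("cp1255")
try:
    encoded.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```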

jtratner commented Sep 9, 2013

The first error appears to be resolved by #4784 (since it's just that bytes aren't being decoded). The second one still raises after #4784.

ghost commented Sep 9, 2013

Glad to see you've claimed this bug @jtratner.

Just noting here that #4784 is only a half-measure fix and so I've reopened the original #3963,
adding an explanation of the problem.

jtratner commented

@y-p How would you set up the test case in this issue for Python 2? I'm not sure whether I'm encountering a bug with this in PY2 or just translating this test incorrectly.

It doesn't need to work in Python 3. I can definitely figure out how to make a cross-compatible test case if I see the Python 2 version. It would be helpful if it didn't use the unicode literals future import.

ghost commented Sep 13, 2013

Not sure I follow. It's a bug that manifests only on py3, so reasonably can be tested only on py3. Not so?
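
One possible shape for such a test, as a sketch only: it is skipped unless running on Python 3 and builds the input from `u""` escapes, so it needs no unicode literals future import. The class and test names are hypothetical, and the body only round-trips the bytes with the standard library; a real test would call `pd.read_table` where the `decode` call is.

```python
import sys
import unittest
from io import BytesIO

PY3 = sys.version_info[0] >= 3


class TestBytesInput(unittest.TestCase):  # hypothetical test class
    @unittest.skipUnless(PY3, "bytes/str distinction only matters on Python 3")
    def test_cp1255_line(self):
        # Escapes spell the Hebrew word without a unicode_literals import.
        text = u"\u05e9\u05dc\u05d5\u05dd::1234\n"
        buf = BytesIO(text.encode("cp1255"))
        line = buf.readline()
        self.assertIsInstance(line, bytes)
        # Placeholder for the parser call under test.
        self.assertEqual(line.decode("cp1255"), text)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBytesInput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```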

jtratner commented

Closed by #4783
