read_fwf/table on py3 has trouble with BytesIO #4785

ghost · 2013-09-09T07:10:50Z

>>> import pandas as pd
>>> from io import BytesIO
>>> pd.read_fwf(BytesIO("שלום".encode('utf8')),widths=[2])
pandas/io/parsers.py", line 1944, in <listcomp>
    for (fromm, to) in self.colspecs]
TypeError: Type str doesn't support the buffer API

By another path:

>>> from io import BytesIO
>>> pd.read_table(BytesIO("שלום::1234\n".encode('cp1255')),sep="::", engine='python', encoding='cp1255')
  File "/usr/local/lib/python3.3/dist-packages/pandas-0.12.0_357_g218f334-py3.3-linux-x86_64.egg/pandas/io/parsers.py", line 1324, in _read
    yield pat.split(line.decode('utf-8').strip())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte

This call is broken as well. Note that `len(sep) > 1` currently activates the python engine anyway.

related #4784

Edit: fixed incorrect encoding and updated error
Edit: Updated examples



jtratner · 2013-09-09T11:19:39Z

I don't think your example is due to pandas...I get the same error as you have above just by running the following 2 lines:

from io import BytesIO
BytesIO("שלום".encode('cp1252')).read()
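For anyone reproducing this, a minimal sketch of why those two lines fail without pandas: cp1252 has no Hebrew letters, so `.encode()` raises before `BytesIO` is even constructed, whereas cp1255 maps each letter to one byte. The variable names here are only illustrative.

```python
text = "שלום"

# cp1252 (Western European) cannot represent Hebrew letters, so the
# encode step itself raises; BytesIO and pandas are never reached.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# cp1255 (Hebrew) encodes each of the four letters as a single byte.
encoded = text.encode("cp1255")
round_trip = encoded.decode("cp1255")

print(cp1252_ok, len(encoded), round_trip == text)
```

The first cp1255 byte is 0xf9, which is the byte the utf-8 decoder complains about in the second traceback of the report.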

ghost · 2013-09-09T12:05:26Z

You're right, that's the wrong codepage (Should be cp1255) and the example doesn't trigger that code path anyway. fixed.

jtratner · 2013-09-09T12:26:54Z

The first error appears to be resolved by #4784 (since it's just that bytes aren't being decoded). The second one still raises after #4784.
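To illustrate the second failure outside pandas, here is a rough sketch using only the standard library. It assumes the python engine splits each line with a compiled regex (as the `pat.split(line.decode('utf-8').strip())` frame in the traceback suggests). Wrapping the buffer in `io.TextIOWrapper` is one possible way to honor the caller's encoding, not necessarily what the eventual fix does.

```python
import io
import re

payload = "שלום::1234\n".encode("cp1255")

# Decoding cp1255 bytes with a hard-coded utf-8 codec reproduces the error.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the caller-supplied encoding before splitting avoids it.
text_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="cp1255")
pat = re.compile("::")
rows = [pat.split(line.strip()) for line in text_stream]

print(utf8_ok, rows)
```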

ghost · 2013-09-09T15:58:34Z

Glad to see you've claimed this bug @jtratner.

Just noting here that #4784 is only a half-measure fix and so I've reopened the original #3963,
adding an explanation of the problem.

jtratner · 2013-09-13T12:29:37Z

@y-p How would you set up the test case in this issue for Python 2? I'm not sure whether I'm encountering a bug with this in PY2 or just translating this test incorrectly.

Doesn't need to work in Python 3...I can definitely figure out how to make a cross-compatible test case if I see the Python 2 version. It would be helpful if it didn't use the unicode literals future import.

ghost · 2013-09-13T14:02:15Z

Not sure I follow. It's a bug that manifests only on py3, so reasonably can be tested only on py3. Not so?
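If it helps, a py3-only test could look roughly like the sketch below. It uses plain `unittest` and a made-up class name rather than the pandas test helpers, and it only checks the decode-then-split step, so treat it as a shape for the test rather than the real thing.

```python
import sys
import unittest
from io import BytesIO

PY3 = sys.version_info[0] >= 3


class TestBytesInput(unittest.TestCase):  # hypothetical test class
    @unittest.skipUnless(PY3, "bytes/str mismatch only manifests on Python 3")
    def test_cp1255_bytes_are_decoded(self):
        buf = BytesIO("שלום::1234\n".encode("cp1255"))
        line = buf.readline()
        # On Python 3 a BytesIO yields bytes, which must be decoded
        # with the requested encoding before splitting on the separator.
        self.assertIsInstance(line, bytes)
        fields = line.decode("cp1255").strip().split("::")
        self.assertEqual(fields, ["שלום", "1234"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBytesInput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```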

jtratner · 2013-09-14T02:00:38Z

Closed by #4783

ghost referenced this issue in jtratner/pandas Sep 9, 2013

BUG: Fix read_fwf with compressed files.

8633d23

`gzip` and `bz2` both now return `bytes` rather than `str` in Python 3, so need to check for bytes and decode as necessary.
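A minimal sketch of the "check for bytes and decode as necessary" idea, using an in-memory gzip stream. The helper name `ensure_text` and its `encoding` parameter are hypothetical and are not taken from the commit.

```python
import gzip
import io


def ensure_text(line, encoding="utf-8"):
    """Hypothetical helper: decode bytes to str, pass str through unchanged."""
    if isinstance(line, bytes):
        return line.decode(encoding)
    return line


# Build a small gzip payload in memory.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write("1234 abcd\n".encode("utf-8"))
buf.seek(0)

# On Python 3, iterating a GzipFile opened in binary mode yields bytes lines.
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    raw_lines = list(gz)

lines = [ensure_text(raw) for raw in raw_lines]
print(lines)
```

Because `str` input is returned unchanged, the same helper can sit in front of both text and binary sources.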

ghost assigned jtratner Sep 9, 2013

jtratner mentioned this issue Sep 12, 2013

BUG: Fix input bytes conversion in Py3 to return str #4783

Merged

jtratner closed this as completed Sep 14, 2013

wesm unassigned jtratner Oct 12, 2016
