BUG: Fix input bytes conversion in Py3 to return str #4783

Merged

jtratner
Contributor

@jtratner jtratner commented Sep 9, 2013

Fixes #3963, #4785

Fixed a bug where compressed files were read in as bytes rather than str in Python 3 (gzip and bz2 both now return bytes rather than str in Python 3), as well as the lack of conversion of BytesIO. Now, _get_handle and _wrap_compressed both wrap the handle in an io.TextIOWrapper, so that the parsers work internally only with str in Python 3. In Python 3.2, the entire file has to be read in first (because gzip and bz2 files both lack a read1() method in 3.2).
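
A minimal sketch of the wrapping idea described above, not the actual pandas code: `wrap_as_text` is a hypothetical helper name, and the UTF-8 encoding and in-memory payload are assumptions made for illustration.

```python
import gzip
import io


def wrap_as_text(binary_handle, encoding="utf-8"):
    """Hypothetical helper: expose a bytes-yielding handle as a str-yielding one."""
    return io.TextIOWrapper(binary_handle, encoding=encoding)


# Build a small gzip payload in memory so the example needs no file on disk.
compressed = io.BytesIO(gzip.compress("a,b\n1,2\n".encode("utf-8")))

# GzipFile yields bytes in Python 3; the wrapper decodes them to str.
gz_lines = wrap_as_text(gzip.GzipFile(fileobj=compressed, mode="rb")).readlines()

# A plain BytesIO needs the same treatment.
bio_lines = wrap_as_text(io.BytesIO(b"a,b\n1,2\n")).readlines()
```

Both calls produce lists of `str`, so code downstream of the handle never has to deal with `bytes`.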

Also adds support for passing fileobjects with compression == 'bz2'.
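
For illustration only, here is roughly what handing an already-open file object to bz2 can look like on Python 3.3 or later, where `bz2.BZ2File` accepts file objects. `open_bz2_text` is a hypothetical name, not a pandas function, and the UTF-8 encoding is an assumption.

```python
import bz2
import io


def open_bz2_text(fileobj, encoding="utf-8"):
    """Hypothetical helper: decompress a binary file object and decode it to str."""
    # BZ2File accepts an open binary file object in place of a filename (3.3+).
    return io.TextIOWrapper(bz2.BZ2File(fileobj, mode="rb"), encoding=encoding)


# An in-memory stand-in for a file object the caller already opened.
payload = io.BytesIO(bz2.compress("x,y\n3,4\n".encode("utf-8")))

with open_bz2_text(payload) as handle:
    text = handle.read()
```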

@jtratner
Contributor Author

jtratner commented Sep 9, 2013

bz2 in python 3.2 doesn't support a read1 method, for what look to be historical reasons. It does support read1 in Python 3.3. Because it's a C module, you can't monkey-patch it directly.

@jtratner
Contributor Author

@y-p I'm getting closer on this, but decided to give up and just read the entire file in with 3.2, then decode and rewrap with StringIO -- otherwise we couldn't support compressed files (easily) in that version...
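
A rough sketch of the fallback described in this comment, under stated assumptions: `to_text_handle` and `NoRead1` are hypothetical names, and `NoRead1` only imitates a 3.2-style compressed file that lacks `read1()`. It is not the code that was merged.

```python
import io


def to_text_handle(binary_handle, encoding="utf-8"):
    """Hypothetical helper: return a str-yielding handle for a binary one."""
    if hasattr(binary_handle, "read1"):
        # Streaming path: TextIOWrapper can decode incrementally.
        return io.TextIOWrapper(binary_handle, encoding=encoding)
    # Fallback path: read everything, decode once, rewrap in StringIO.
    return io.StringIO(binary_handle.read().decode(encoding))


class NoRead1:
    """Stand-in for a compressed file object without read1() (assumption)."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


fallback = to_text_handle(NoRead1(b"a\nb\n"))
streamed = to_text_handle(io.BytesIO(b"a\nb\n"))
```

The fallback costs memory proportional to the decompressed file, which is the trade-off accepted here for 3.2.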

@ghost

ghost commented Sep 12, 2013

Yep, seems like a reasonable minimal workaround to me.

@jtratner
Contributor Author

@y-p look okay to you? going to merge if so...

@ghost

ghost commented Sep 13, 2013

Yep, go ahead.

Fixed bug with reading compressed files in as `bytes` rather than
`str` in Python 3. `gzip` and `bz2` both now return `bytes` rather
than `str` in Python 3, so just needed to wrap them in an
`io.TextIOWrapper` to make everything work.

Only wrap BytesIO and compressed streams

Read entire compressed file in 3.2 to get around inconsistencies with TextIOWrapper
jtratner added a commit that referenced this pull request Sep 14, 2013
…-py3

BUG: Fix input bytes conversion in Py3 to return str
@jtratner jtratner merged commit d702de0 into pandas-dev:master Sep 14, 2013
@jtratner jtratner deleted the wrap-compressed-fileobjects-in-py3 branch September 21, 2013 20:07
Successfully merging this pull request may close these issues.

BUG: python 3 compression and read_fwf