BUG: read_stata ignoring encoding? #4626

jseabold · 2013-08-21T21:57:28Z

I don't have time to debug right now, and maybe my expectations are just off, but it looks like read_stata doesn't respect the encoding keyword. I'm also not sure it's needed. AFAIK, Stata doesn't (and likely won't) support unicode. It always uses latin-1, so we can always use the latin-1 encoding for strings (maybe not desirable though).

https://www.dropbox.com/s/hq42trq4327ker8/encoding_issue.dta

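import pandas as pd

# Default call: non-ASCII strings come back undecoded
dta = pd.read_stata("./encoding_issue.dta")
dta.head()

# Passing encoding explicitly makes no visible difference
dta = pd.read_stata("./encoding_issue.dta", encoding="latin-1")
dta.head()

# Decoding the column by hand gives the expected text
dta = pd.read_stata("./encoding_issue.dta")
dta.kreis1849.str.decode("latin-1")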
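To illustrate why always decoding as latin-1 would at least never fail, here is a minimal standard-library sketch. The sample value and the `decode_latin1` helper are hypothetical, made up for illustration; they are not taken from encoding_issue.dta or from pandas.

```python
# Hypothetical sample value, not read from encoding_issue.dta.
raw = "Köln".encode("latin-1")  # bytes as a latin-1 file would store them

# latin-1 maps every byte 0x00-0xFF to exactly one code point,
# so decoding arbitrary bytes with it cannot raise.
decoded = raw.decode("latin-1")

# The same bytes are not valid UTF-8, so a UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def decode_latin1(value):
    """Decode bytes as latin-1; return anything that is already text unchanged."""
    return value.decode("latin-1") if isinstance(value, bytes) else value
```

A helper like this could be mapped over a string column as a stopgap. It only addresses decoding and says nothing about whether the encoding keyword should stay.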


jreback mentioned this issue Aug 22, 2013

BUG: (GH4626) Fix decoding based on a passed in non-default encoding in pd.read_stata #4643

Merged

jreback closed this as completed in #4643 Aug 26, 2013
