UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

pcy1127918

License: CC 4.0 BY-SA

Tags: encoding errors

Article link: https://blog.csdn.net/pcy1127918/article/details/79956479

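While trying to read a txt file I ran into a UnicodeDecodeError. Reading the file with 'utf-8' encoding did not help. The fix is to re-save the text file with UTF-8 encoding. If the error persists after that, convert the text to HTML, wrap it in <html></html> tags, and read it through a urllib request.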

This post addresses a problem that came up in my previous post, 《简易版计算文本相似度》 (A Simple Text Similarity Calculation):

Traceback (most recent call last):
File "D:/pythonlianxi/wenbensimi1.py", line 52, in <module>
d3 = open(doc3).read()

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

An encoding error. Fine, so I added encoding='utf-8' to the open() call, but the problem remained:

Traceback (most recent call last):
File "D:/pythonlianxi/wenbensimi1.py", line 9, in <module>
d1 = open(doc1,'r',encoding='utf-8').read()
File "C:\Users\asus\AppData\Local\Programs\Python\Python35\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 0: invalid start byte
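Note that the two tracebacks come from different files (doc3 and doc1), so the files may not all share one encoding. A byte 0xbf at position 2 is consistent with, though not proof of, a UTF-8 byte order mark (EF BB BF) being read as GBK, while a byte 0xb9 at position 0 means that file is not UTF-8 at all and is probably GBK (what Windows calls ANSI on Simplified Chinese systems). As a minimal sketch, assuming the only candidates are UTF-8 (with or without BOM) and GBK, a hypothetical helper `read_text_any` can read the raw bytes and try each encoding in turn:

```python
from pathlib import Path

# Candidate encodings, tried in order (an assumption for this sketch).
# "utf-8-sig" accepts plain UTF-8 and also strips a leading BOM if present;
# "gbk" covers files saved as ANSI on Simplified Chinese Windows.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gbk")


def read_text_any(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of {} could decode {!r}".format(encodings, str(path)))
```

UTF-8 is tried first on purpose: GBK will sometimes decode UTF-8 bytes without raising and silently produce garbage, whereas UTF-8 almost always rejects GBK-encoded Chinese text. A call such as `d1, enc = read_text_any(doc1)` would then replace `open(doc1).read()`.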

Solution: since all the files being opened are .txt files, use "Save As" on each one, change the encoding from ANSI to UTF-8, and save.
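The same re-save can be done from Python rather than by hand in a text editor. The following is a sketch, assuming the source file really is GBK (ANSI); `convert_to_utf8` and its parameters are hypothetical names introduced here, not part of the original script:

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="gbk"):
    """Re-encode the text file `src` (assumed to be `src_encoding`) as UTF-8 at `dst`."""
    # Work on raw bytes so that line endings are preserved exactly.
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

Writing to a separate `dst` keeps the original file intact in case the guessed source encoding turns out to be wrong. Unlike some versions of Notepad, this writes UTF-8 without a byte order mark, so the result can be read with `open(dst, encoding='utf-8')` directly.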

If you have already done this and still get an encoding error, try changing d3 = open(doc3,'r',encoding='utf-8').read() to d3 = urllib.request.urlopen("http://127.0.0.1/zhenhuan.html").read().decode("utf-8"). That is, change the file from txt to html (optionally wrapping the content in <html></html> tags), save it, put it on a server, and fetch it with urllib.request.urlopen.
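A minimal sketch of that fetch step follows. The standard-library module is spelled `urllib`; the helper name `fetch_text` is hypothetical, and the URL in the usage note is the one from the text and assumes a web server is running locally and serving that file:

```python
import urllib.request


def fetch_text(url, encoding="utf-8"):
    """Download `url` and decode the response body with `encoding`."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)
```

It would be called as `d3 = fetch_text("http://127.0.0.1/zhenhuan.html")`. Serving the file over HTTP does not change its bytes, so the `.decode("utf-8")` step can only succeed if the HTML file was itself saved as UTF-8.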