TypeError when using 'comment=...' in read_csv from a file #31396
Comments
I could reproduce your issue with the same versions of Python and pandas. The problem also occurs without the comment in the file, but with […]. It seems like […].
The original issue is probably that in line 2392:
line = self._check_comments([line])[0]
the argument to […]. I assume changing the line to
line = self._check_comments([[line]])[0]
should solve the issue.
It would probably be good to add a test for this parameter combination. I can do it, but I'm unsure how to add the test. Should I just add it as a parameter to […]?
Nice! Thank you for your fast investigation 👍 I think that, in principle, it's good practice to add an extra test, but I'm not active in pandas development.
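To illustrate why the difference between [line] and [[line]] in the comment above could matter, here is a minimal sketch. It is not the pandas implementation: strip_comments is a hypothetical stand-in, written under the assumption that the real helper expects a list of rows, each row being a list of string fields. It only shows how the nesting level changes what such a helper iterates over, and does not claim to reproduce the exact TypeError.

```python
def strip_comments(lines, comment="#"):
    """Hypothetical stand-in for a comment-stripping helper.

    Assumes `lines` is a list of rows, where each row is a list of
    string fields. This is an illustration, not pandas' own code.
    """
    cleaned = []
    for row in lines:
        kept = []
        for field in row:
            if comment not in field:
                kept.append(field)
            else:
                # Keep only the text before the comment marker,
                # then ignore the rest of the row.
                head = field[: field.find(comment)]
                if head:
                    kept.append(head)
                break
        cleaned.append(kept)
    return cleaned


line = "a,b # note\n"

# [line]: the string itself is treated as a row, so the helper
# walks over its individual characters.
as_chars = strip_comments([line])[0]

# [[line]]: the whole string is treated as a single field of one row.
as_field = strip_comments([[line]])[0]

print(as_chars)  # ['a', ',', 'b', ' ']
print(as_field)  # ['a,b ']
```

With the single-level list the result is a list of characters rather than the cleaned line, which is the kind of shape mismatch the proposed one-line change would avoid.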
Added a test case to io/parser/test_python_parser_only.py in order to reproduce pandas-dev#31396.
This makes it possible to use read_csv with sep=None and comment set to a non-None value. Fixes pandas-dev#31396.
Added a test case to reproduce issue pandas-dev#31396.
This makes read_csv work when sep=None and comment is set to a value. Fixes pandas-dev#31396.
Added a note in whatsnew/v1.0.0.rst and moved test for pandas-dev#31396 to the end of tests/io/parser/test_python_parser_only.py.
Code Sample
Given a data file data.csv with a line that is commented out:
Problem description
It raises a TypeError when using the comment parameter:
TypeError
Without the comment in the data file and without the parameter comment='#', everything works as expected.
It seems that sep=None is the problem here.
When using sep=',' it works. But in our case, the import is part of a general importer that should accept a variety of different files. Thus, we must use sep=None.
Expected Output
I would expect the following output:
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.13-201.fc31.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None