
pandas.read_excel 'Can't determine version for xlrd' - old bug on pandas-2.1 #56692

Closed
brobr opened this issue Dec 31, 2023 · 6 comments · Fixed by #57708
Labels
Bug Dependencies Required and optional dependencies IO Excel read_excel, to_excel

Comments

@brobr

brobr commented Dec 31, 2023

  BUG: read_excel failing to check older xlrd versions properly #39355

I ran into this bug on pandas-2.1.
Without xlrd present there is no problem, but when it is installed, the Excel file is not loaded:

In [5]: df_ref=pd.read_excel(refprots, engine="openpyxl", na_values =[''])
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 df_ref=pd.read_excel(refprots, engine="openpyxl", na_values =[''])

  ImportError: Can't determine version for xlrd

In [6]: df_ref=pd.read_excel(refprots, na_values =[''])
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[6], line 1
----> 1 df_ref=pd.read_excel(refprots, na_values =[''])

  ImportError: Can't determine version for xlrd

In [7]: import xlrd

In [8]: xlrd.__version__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 xlrd.__version__

AttributeError: module 'xlrd' has no attribute '__version__'

In [9]: xlrd.__VERSION__
Out[9]: '1.1.0'

In [10]: pd.__version__
Out[10]: '2.1.4'

Originally posted by @brobr in #39355 (comment)

@brobr brobr changed the title I ran into this bug on pandas-2.1 pandas.read_excel 'Can't determine version for xlrd' - old bug on pandas-2.1 Dec 31, 2023
@asishm
Contributor

asishm commented Dec 31, 2023

xlrd minimum version supported is 2.0.1 (was changed with pandas 1.4.0 release)

See https://pandas.pydata.org/docs/getting_started/install.html#excel-files

edit: misread -

@rhshadrach
Member

Thanks for the report! It seems to me we should only be checking the xlrd version if we intend to use it. Also, I'm wondering if pandas.compat._optional.get_version should either use or fall back to importlib.metadata.version()
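A minimal sketch of the fallback idea above, for illustration only. It is not the pandas implementation: the function name get_version_with_fallback and the dist_version hook are hypothetical, and it assumes the distribution name matches the module name (true for xlrd, not for every package).

```python
import importlib.metadata
import types


def get_version_with_fallback(module, dist_version=importlib.metadata.version):
    """Return a module's version string, or raise ImportError if unknown.

    `dist_version` is a hypothetical hook (not part of pandas) so the
    metadata lookup can be swapped out when experimenting.
    """
    # First choice: the conventional module-level attribute.
    version = getattr(module, "__version__", None)
    if version is None:
        try:
            # Fallback: ask the installed distribution's metadata.
            # Assumes the distribution name equals the module name.
            version = dist_version(module.__name__)
        except importlib.metadata.PackageNotFoundError:
            version = None
    if version is None:
        raise ImportError(f"Can't determine version for {module.__name__}")
    return version


# Stand-in for xlrd 1.1.0, which only defines __VERSION__.
old_xlrd = types.ModuleType("xlrd")
old_xlrd.__VERSION__ = "1.1.0"

# The lambda simulates what the installed package metadata would report.
print(get_version_with_fallback(old_xlrd, dist_version=lambda name: "1.1.0"))
```

With a fallback like this, an old xlrd would yield a real version string, so the caller could report it as too old rather than as undeterminable.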

@rhshadrach rhshadrach added IO Excel read_excel, to_excel Dependencies Required and optional dependencies Bug labels Dec 31, 2023
@pmhatre1
Contributor

pmhatre1 commented Jan 2, 2024

I could reproduce this issue with xlrd version 1.1.0 @rhshadrach @brobr. The code will work for xlrd 2.0.1. For the optional file, xlrd version is hardcoded and so whenever object tries to check the version for 1.1.0 in the optional file, it returns a None when we get to getattr(module, "__version__", None) and thus the code breaks.

@pmhatre1
Contributor

pmhatre1 commented Jan 2, 2024

Screenshot 2024-01-01 at 9 19 05 PM

@rhshadrach
Member

For the optional file, xlrd version is hardcoded...

Yes - this is the minimum supported version. Any version older than 2.0.1 is not supported by pandas. Still, we shouldn't be checking an optional import that you are not trying to use.

...and so whenever object tries to check the version for 1.1.0 in the optional file, it returns a None when we get to getattr(module, "__version__", None) and thus the code breaks.

This is not correct. getattr(module, "__version__", None) does not use the hardcoded value.
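The point about not checking an optional import that is not being used can be sketched roughly as follows. Everything here is hypothetical (ENGINE_PACKAGES, check_engine, fake_version_of, and the openpyxl version string) and only illustrates scoping the check to the requested engine; it is not how pandas is structured.

```python
# Hypothetical mapping from read_excel engine name to the package behind it.
ENGINE_PACKAGES = {"openpyxl": "openpyxl", "xlrd": "xlrd"}


def check_engine(engine, version_of):
    """Look up the version of the package backing `engine`, and no other."""
    return version_of(ENGINE_PACKAGES[engine])


def fake_version_of(package):
    # Simulates the environment from the report: xlrd 1.1.0 exposes no
    # __version__, while openpyxl does (its version string is made up).
    if package == "xlrd":
        raise ImportError("Can't determine version for xlrd")
    return "3.1.2"


# Requesting openpyxl never touches xlrd, so the broken xlrd is harmless.
print(check_engine("openpyxl", fake_version_of))
```

Under this arrangement the ImportError would only surface when engine="xlrd" is actually requested.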

@pmhatre1
Contributor

pmhatre1 commented Jan 3, 2024

I get what's happening here, @rhshadrach. I should have been a bit more thorough with debugging earlier.
getattr(module, "__version__", None) is the source of the error. But as you said, it's not comparing the hardcoded value of the current version against the earlier version. In fact, it's getting the version from the info.py file in the xlrd module.

The info.py file mentions version for 2.0.1 as __version__ = __VERSION__ = "2.0.1"
The info.py file mentions version for 1.1.0 as __VERSION__ = "1.1.0"

We are retrieving the version using getattr(module, "__version__", None), which returns None for 1.1.0, and thus the code breaks after this with an ImportError.
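The difference can be reproduced without installing either xlrd release. This is a small sketch using stand-in module objects that carry only the attributes quoted from info.py above; the names xlrd_1_1_0, xlrd_2_0_1, and lookup are made up for the example.

```python
import types

# Stand-in for xlrd 1.1.0: only the upper-case attribute exists.
xlrd_1_1_0 = types.ModuleType("xlrd")
xlrd_1_1_0.__VERSION__ = "1.1.0"

# Stand-in for xlrd 2.0.1: both spellings are defined.
xlrd_2_0_1 = types.ModuleType("xlrd")
xlrd_2_0_1.__version__ = xlrd_2_0_1.__VERSION__ = "2.0.1"


def lookup(module):
    # Same attribute lookup as discussed in this thread.
    return getattr(module, "__version__", None)


print(lookup(xlrd_1_1_0))  # None
print(lookup(xlrd_2_0_1))  # 2.0.1
```

Attribute names are case-sensitive, so the __VERSION__ defined by 1.1.0 is never seen by the lookup.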
