BUG/DOC: Styler apply/format do not accept callable subset #46685

Open
3 tasks done
tehunter opened this issue Apr 7, 2022 · 5 comments
Labels
Bug Styler conditional formatting using DataFrame.style

Comments

@tehunter
Contributor

tehunter commented Apr 7, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

test = pd.DataFrame({"A": [1, 2, 3]}, index=["I1", "I2", "I3"])
indexer = lambda df: df.index == "I2"

print(test.loc[indexer])
#     A
# I2  2

test.style.format(lambda x: f"{x:0.3f}", subset=indexer).to_html()
# KeyError: "None of [Index([<function <lambda> at 0x00000________>], dtype='object')] are in the [index]"

Issue Description

The .loc documentation states that a callable function is an allowable input. Styler documentation implies that subset accepts any valid loc input:

A valid 2d input to DataFrame.loc[<subset>], or, in the case of a 1d input or single key, to DataFrame.loc[:, <subset>] where the columns are prioritised, to limit data to before applying the function.

When a callable is passed to the subset argument of a Styler method, it raises a KeyError. The root cause is that the _non_reducing_slice function does not properly handle callable subsets: the callable is treated as an index label rather than being evaluated against the DataFrame.

Expected Behavior

The format function is applied correctly to the "I2" index only.

<style type="text/css">
</style>
<table id="T_163a1">
  <thead>
    <tr>
      <th class="blank level0" >&nbsp;</th>
      <th id="T_163a1_level0_col0" class="col_heading level0 col0" >A</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th id="T_163a1_level0_row0" class="row_heading level0 row0" >I1</th>
      <td id="T_163a1_row0_col0" class="data row0 col0" >1</td>
    </tr>
    <tr>
      <th id="T_163a1_level0_row1" class="row_heading level0 row1" >I2</th>
      <td id="T_163a1_row1_col0" class="data row1 col0" >2.000</td>
    </tr>
    <tr>
      <th id="T_163a1_level0_row2" class="row_heading level0 row2" >I3</th>
      <td id="T_163a1_row2_col0" class="data row2 col0" >3</td>
    </tr>
  </tbody>
</table>

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version :
machine : AMD64
processor :
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.4.1
numpy : 1.22.0
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.4
setuptools : 60.2.0
Cython : None
pytest : 7.0.1
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : 1.1.4
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : 1.4.32
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None

@tehunter tehunter added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 7, 2022
@simonjayhawkins
Member

Thanks @tehunter for the report.

Note that passing the boolean indexer directly does not work either, even though it is valid input for df.loc:

>>> print(test.loc[[False, True, False]])
    A
I2  2
>>> test.style.format(lambda x: f"{x:0.3f}", subset=[False, True, False]).to_html()
...
IndexError: Boolean index has wrong length: 3 instead of 1

whereas

test.style.format(
    lambda x: f"{x:0.3f}", subset=([False, True, False], slice(None))
).to_html()

gives the expected output in the OP.

This is because, as the documentation states, subset must be "A valid 2d input to DataFrame.loc[<subset>], or, in the case of a 1d input or single key, to DataFrame.loc[:, <subset>] where the columns are prioritised, to limit data to before applying the function." A 1d boolean list is therefore applied to the columns, and this frame has only one column.

So I think the expected output of the code sample in the OP is IndexError: Boolean index has wrong length: 3 instead of 1 as the callable returns a 1d indexer?

@simonjayhawkins simonjayhawkins added Styler conditional formatting using DataFrame.style and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 8, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Apr 8, 2022
@tehunter
Contributor Author

tehunter commented Apr 8, 2022

Thanks for the additional perspective; I hadn't considered that the indexer function returns a 1D array. I guess it is somewhat ambiguous then: the subset argument itself is a single value, but is it a single key? I think it's reasonable to treat a callable as a special case, differently than other single keys. Even DataFrame.loc assumes that a callable is not a single key:

indexer_1d = lambda df: df["B"] == 11

df_index = pd.DataFrame({"A": [0, 10], "B": [1, 11]}, index=pd.Index([indexer_1d, "Not a function"]))
print(df_index)
#                                             A   B
# <function <lambda> at 0x000001C599CE5D30>   0   1
# Not a function                             10  11

# Treats argument as callable, not as a scalar index value:
print(df_index.loc[indexer_1d])
#                  A   B
# Not a function  10  11

# Treats argument as an array of index values:
print(df_index.loc[[indexer_1d]])
#                                            A  B
# <function <lambda> at 0x000001C599CE5D30>  0  1

df_index.style.applymap(lambda x: "color:red", subset=indexer_1d)

For the final line, the current Styler behavior treats a callable subset as an index value, just like any other single value:

image

If the behavior mirrored .loc, it would treat a callable subset as a special case with the following output:

image

It seems like a callable value that feeds into DataFrame.loc would be a nice thing to support, I can certainly imagine use cases in my own work. The documentation could look like: "A valid 2d input or callable to DataFrame.loc[<subset>], or, in the case of a 1d input or single key, to DataFrame.loc[:, <subset>] where the columns are prioritised, to limit data to before applying the function."

Let me know if you think that's worth pursuing. I could try to put the pull request together when I have some time.

@simonjayhawkins
Member

Let me know if you think that's worth pursuing. I could try to put the pull request together when I have some time.

Thanks @tehunter for the further investigation. Contributions most welcome. I'm not sure of the history here, or why the subset argument differs from .loc for a 1d indexer, so I can't say with certainty how a callable should be treated. @attack68?

@attack68
Contributor

attack68 commented Apr 9, 2022

I think the history is that originally there was no subset for the apply method. Then a subset was introduced that allowed column specification, and then, later, combination [row, column] subsets were also allowed. In order to create some kind of documentation that was widely understood 'pandas wise' I referenced the loc framework, although as you have pointed out this is not one to one as there are some missing elements.

Contributions are welcome here to synchronise the usage but be aware:

  • At least one of the functions (hide) can only accept subset in 1d, whereas most others accept 2d.
  • The columns priority should be maintained, otherwise backwards compatibility will be lost and tests will break

@attack68
Contributor

attack68 commented Apr 9, 2022

@tehunter on the topic of using callables as an index label to a dataframe, I would consider that a pathological example, which I would expect to be warned against for unpredictable behaviour. I would not design a solution accounting for callables as index labels.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022