BUG: df agg() issue when dataframe has a column called 'name' #36212

Closed
3 tasks
shyam-sreenivasan opened this issue Sep 8, 2020 · 2 comments · Fixed by #36224
Labels: Apply (Apply, Aggregate, Transform, Map), Bug

@shyam-sreenivasan

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
data = {"name": ["abc", "xyz"]}
df = pd.DataFrame(data)
print(df.agg({'name': 'count'}))

Problem description

The DataFrame in the snippet above has a column called 'name', and running the snippet raises an exception. Following the stack trace, line 475 of core/base.py passes df.name as the name argument in

result = Series(result, name=getattr(self, "name", None))

Because the DataFrame has a column called 'name', getattr returns that column (a Series) instead of the default None.

The same snippet works fine for any other column name. For example, if we change the column name to nameee, it runs without error.
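To make the attribute lookup concrete, here is a minimal sketch of what getattr(self, "name", None) evaluates to for a DataFrame with and without a 'name' column. The variable names are illustrative only; the sketch relies on pandas exposing columns as attributes and is not taken from the pandas source.

```python
import pandas as pd

# A frame with a column literally called "name", as in the report.
df_with_name = pd.DataFrame({"name": ["abc", "xyz"]})
# The same data under a different label ("nameee" is the example from the report).
df_other = pd.DataFrame({"nameee": ["abc", "xyz"]})

# DataFrame exposes columns as attributes, so this lookup finds the column.
attr_with_name = getattr(df_with_name, "name", None)
# There is no "name" column here, so the lookup falls back to the default.
attr_other = getattr(df_other, "name", None)

print(type(attr_with_name).__name__)  # Series
print(attr_other)                     # None
```

In the first case the value handed on as a name is a whole column rather than a label, which is consistent with the TypeError in the stack trace below.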

Stack trace

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    470             try:
--> 471                 result = DataFrame(result)
    472             except ValueError:

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    467         elif isinstance(data, dict):
--> 468             mgr = init_dict(data, index, columns, dtype=dtype)
    469         elif isinstance(data, ma.MaskedArray):

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    282         ]
--> 283     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    284 

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     77         if index is None:
---> 78             index = extract_index(arrays)
     79         else:

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/internals/construction.py in extract_index(data)
    386         if not indexes and not raw_lengths:
--> 387             raise ValueError("If using all scalar values, you must pass an index")
    388 

ValueError: If using all scalar values, you must pass an index

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
   7358         try:
-> 7359             result, how = self._aggregate(func, axis=axis, *args, **kwargs)
   7360         except TypeError as err:

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
   7383             return result, how
-> 7384         return super()._aggregate(arg, *args, **kwargs)
   7385 

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    474                 # we have a dict of scalars
--> 475                 result = Series(result, name=getattr(self, "name", None))
    476 

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    228 
--> 229             name = ibase.maybe_extract_name(name, data, type(self))
    230 

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/indexes/base.py in maybe_extract_name(name, obj, cls)
   5658     if not is_hashable(name):
-> 5659         raise TypeError(f"{cls.__name__}.name must be a hashable type")
   5660 

TypeError: Series.name must be a hashable type

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-efe06ed8dce0> in <module>
      2 data = {"name": ["abc", "xyz"]}
      3 df = pd.DataFrame(data)
----> 4 print(df.agg({'name': 'count'}))

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
   7363                 f"incompatible data and dtype: {err}"
   7364             )
-> 7365             raise exc from err
   7366         if result is None:
   7367             return self.apply(func, axis=axis, args=args, **kwargs)

TypeError: DataFrame constructor called with incompatible data and dtype: Series.name must be a hashable type

Expected Output

name    2
dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.5.final.0
python-bits : 64
OS : Darwin
OS-release : 17.4.0
Version : Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.1
numpy : 1.18.4
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1
setuptools : 46.1.3
Cython : 0.29.16
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.11
tables : None
tabulate : 0.7.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@shyam-sreenivasan added the Bug and Needs Triage labels Sep 8, 2020
@phofl
Member

phofl commented Sep 8, 2020

Hi, thanks for your report.

Looks like the error is in:

~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    474                 # we have a dict of scalars
--> 475                 result = Series(result, name=getattr(self, "name", None))

In this case, self.name refers to a Series (the 'name' column) rather than a hashable value.
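The diagnosis above can be checked outside of agg(). The sketch below passes the 'name' column directly as the name of a new Series, then shows one possible guard. The helper result_name is hypothetical and only illustrates the idea of ignoring .name on two-dimensional objects; it is not claimed to be the change made in #36224.

```python
import pandas as pd

df = pd.DataFrame({"name": ["abc", "xyz"]})
scalars = {"name": 2}  # a dict of scalars, like the aggregation result

# Passing the column (a Series) as the name reproduces the TypeError.
try:
    pd.Series(scalars, name=getattr(df, "name", None))
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)


def result_name(obj):
    """Hypothetical guard: reuse .name only for one-dimensional objects."""
    return obj.name if obj.ndim == 1 else None


# With the guard, a DataFrame never contributes a name, so construction succeeds.
guarded = pd.Series(scalars, name=result_name(df))
print(guarded)
```

The guard keeps the existing behavior for a Series, which has a real name attribute, and falls back to None for a DataFrame regardless of its column labels.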

@phofl added the Apply (Apply, Aggregate, Transform, Map) label and removed the Needs Triage label Sep 8, 2020
@shyam-sreenivasan
Author

> Hi, thanks for your report.
>
> Looks like the error is in:
>
> ~/.virtualenvs/dimensions-connectors/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
>     474                 # we have a dict of scalars
> --> 475                 result = Series(result, name=getattr(self, "name", None))
>
> In this case self.name is referencing a Series instead of a hashable type

Yeah, that's right. Hope this gets fixed soon. For now, I'm working around this in my development. Thanks a lot.
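The workaround actually used is not stated in the thread. As a rough sketch, two options that avoid calling the dict form of agg() on a column labelled 'name' might look like the following; the temporary label name_tmp is a made-up placeholder.

```python
import pandas as pd

df = pd.DataFrame({"name": ["abc", "xyz"]})

# Option 1: count through column selection, avoiding the dict form of agg().
count_direct = df[["name"]].count()

# Option 2: aggregate under a temporary label, then restore the original label.
tmp = "name_tmp"  # hypothetical placeholder label
count_renamed = (
    df.rename(columns={"name": tmp})
    .agg({tmp: "count"})
    .rename({tmp: "name"})
)

print(count_direct)
print(count_renamed)
```

Option 2 relies on the observation in the report that the same aggregation works for any column label other than 'name'.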
