DataFrame.itertuples() incorrectly determines when plain tuples should be used #28282

Closed
plamut opened this issue Sep 4, 2019 · 1 comment · Fixed by #30600
Labels
Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode

plamut commented Sep 4, 2019

Code Sample, a copy-pastable example if possible

>>> import pandas, sys
>>> sys.version
'3.6.7 (default, Oct 25 2018, 09:16:13) \n[GCC 5.4.0 20160609]'
>>> pandas.__version__
'0.25.1'
>>> df = pandas.DataFrame([{f"foo_{i}": f"bar_{i}" for i in range(255)}])
>>> df.itertuples(index=False)
...
SyntaxError: more than 255 arguments

The issue seems to have been caused (or revealed) by this commit, which removed the try/except block around the namedtuple class creation.

FWIW, this issue is not reproducible in version 0.24.2, and it is also not a problem in Python 3.7+, where the limit on the number of arguments that can be passed to a function has been removed (AFAIK).
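For anyone who wants to check the Python-version dependence without installing pandas, here is a minimal standard-library sketch. The helper name `try_namedtuple` and the class name `Row` are made up for illustration; the only assumption is that the failure surfaces as a `SyntaxError` when the namedtuple class is created, as in the traceback above.

```python
import collections
import sys


def try_namedtuple(n_fields):
    """Return True if a namedtuple class with n_fields fields can be created."""
    fields = [f"foo_{i}" for i in range(n_fields)]
    try:
        collections.namedtuple("Row", fields)
    except SyntaxError:
        # Expected on Python < 3.7 once _cls plus the fields exceed 255 arguments
        return False
    return True


# Report the interpreter version and whether 255 fields are accepted
print(sys.version_info[:2], try_namedtuple(255))
```

On Python 3.6 this should report `False` for 255 fields, while on 3.7+ it should report `True`.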

Problem description

The condition in the itertuples() method does not correctly determine when plain tuples should be used instead of named tuples.

This is how the named tuple class template defines the __new__() method (in Python 3.6 at least):

"""
...
def __new__(_cls, {arg_list}):
    ...
"""

If there are 255 column names given, the total number of arguments to __new__() will be 256 because of the extra _cls parameter, which causes a syntax error.
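The "fields plus one" arithmetic can be observed directly. This is only a sketch: it relies on the CPython implementation detail that the generated `__new__()` is an ordinary Python function with a `__code__` attribute, and `new_argcount` is a hypothetical helper, not part of pandas or the standard library.

```python
import collections


def new_argcount(n_fields):
    """Count the positional parameters of a generated namedtuple __new__()."""
    fields = [f"foo_{i}" for i in range(n_fields)]
    Row = collections.namedtuple("Row", fields)
    # co_argcount includes the leading _cls parameter as well as every field
    return Row.__new__.__code__.co_argcount


print(new_argcount(3))    # 4: _cls + 3 fields
print(new_argcount(254))  # 255: the largest count Python 3.6 accepts
```

So 254 columns is the most that fits under the old 255-argument limit, and 255 columns already goes over it.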

@plamut plamut changed the title from "DataFrame.itertuples() incorrectly determines when a plain tuples should be used" to "DataFrame.itertuples() incorrectly determines when plain tuples should be used" Sep 4, 2019
@TomAugspurger TomAugspurger added the Regression Functionality that used to work in a prior pandas version label Sep 10, 2019
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Sep 10, 2019
@TomAugspurger
Contributor

Happy to take a fix for this.

simongibbons added a commit to simongibbons/pandas that referenced this issue Jan 1, 2020
Currently DataFrame.itertuples() has an off-by-one error
when it inspects whether or not it should return namedtuples
or plain tuples in its response.

This PR addresses that bug by correcting the condition
that is used when making the check.

Fixes: pandas-dev#28282
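
As a rough illustration of what a corrected check could look like, here is a standalone sketch. It is not the code from the referenced commit: `can_use_namedtuple`, `make_row_factory`, and `MAX_ARGS_BEFORE_37` are hypothetical names, and the version cutoff is taken from the issue's statement that the argument limit is gone in Python 3.7+.

```python
import collections
import sys

# Python < 3.7 rejects functions with more than 255 arguments
MAX_ARGS_BEFORE_37 = 255


def can_use_namedtuple(n_fields, version_info=sys.version_info):
    """Return True if a namedtuple with n_fields fields can be created."""
    if version_info >= (3, 7):
        return True
    # +1 accounts for the implicit _cls parameter of __new__()
    return n_fields + 1 <= MAX_ARGS_BEFORE_37


def make_row_factory(fields, name="Pandas", version_info=sys.version_info):
    """Pick a namedtuple constructor when possible, else fall back to tuple."""
    if name is not None and can_use_namedtuple(len(fields), version_info):
        return collections.namedtuple(name, fields, rename=True)._make
    return tuple


row = make_row_factory(["a", "b"])((1, 2))
print(row)  # Pandas(a=1, b=2)
```

The `version_info` parameter exists only so the pre-3.7 branch can be exercised on a newer interpreter, for example `can_use_namedtuple(255, (3, 6))`.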
simongibbons added a commit to simongibbons/pandas that referenced this issue Jan 1, 2020
Currently DataFrame.itertuples() has an off-by-one error
when it inspects whether or not it should return namedtuples
or plain tuples in its response.

This PR addresses that bug by correcting the condition
that is used when making the check.

Closes: pandas-dev#28282
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Jan 1, 2020
@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 1, 2020