
pandas.concat doesn't accept a deque #8645

Closed
waveform80 opened this issue Oct 27, 2014 · 2 comments · Fixed by #8668
Labels: API Design, Docs, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone: 0.15.1

Comments

@waveform80
Contributor

The pandas.concat routine's behaviour is slightly at odds with its documentation (and its error message). The documentation states that the first parameter (objs) accepts a "list or dict of Series, DataFrame, ...", but the routine is rather more forgiving and appears to accept lists, tuples, dicts, and generator objects (very useful!), which left me with the impression that it accepted iterables in general. Unfortunately, that isn't the case: attempting to concatenate a deque of DataFrame objects results in the following:

from collections import deque
import pandas as pd

df = pd.DataFrame.from_dict({'a': [1, 2, 3], 'b': [4, 5, 6]})
d = deque((df, df, df))
pd.concat(d)
----> 1 pd.concat(d)

/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    720                        keys=keys, levels=levels, names=names,
    721                        verify_integrity=verify_integrity,
--> 722                        copy=copy)
    723     return op.get_result()
    724 

/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    735             raise TypeError('first argument must be a list-like of pandas '
    736                             'objects, you passed an object of type '
--> 737                             '"{0}"'.format(type(objs).__name__))
    738 
    739         if join == 'outer':

TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"
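Until the check is relaxed, one workaround (a sketch that relies only on lists being accepted, which the documentation promises) is to convert the deque to a list before calling pd.concat:

```python
from collections import deque

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
d = deque((df, df, df))

# Materialising the deque as a list gets past the isinstance check,
# because lists are explicitly accepted by pd.concat.
result = pd.concat(list(d))
print(result.shape)  # (9, 2)
```

The copy made by list() is shallow (it only holds references to the same DataFrame objects), so the extra cost is negligible compared with the concatenation itself.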

The error message states that the first argument "must be a list-like of pandas objects" (which is slightly different from the documentation, and closer to the implementation's actual behaviour). A deque is iterable and supports integer indexing (though not slicing); given that even generator expressions, which support neither indexing nor slicing, are accepted, a deque seems to fulfil the criteria of being "list-like".
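For comparison, this standard-library snippet shows what a deque and a generator expression each support: both are iterable, a deque also accepts integer indexes but not slices, and a generator accepts neither.

```python
from collections import deque

d = deque(['x', 'y', 'z'])
gen = (item for item in ['x', 'y', 'z'])

# Both objects can be iterated over.
print(list(d))  # ['x', 'y', 'z']

# A deque supports integer indexing...
print(d[0], d[-1])  # x z

# ...but slicing raises TypeError.
try:
    d[0:2]
except TypeError as exc:
    print('deque slice failed:', exc)

# A generator is not subscriptable at all.
try:
    gen[0]
except TypeError as exc:
    print('generator index failed:', exc)

# The failed subscription did not consume the generator.
print(list(gen))  # ['x', 'y', 'z']
```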

Digging into the implementation, the check that fails appears to be the first line of _Concatenator.__init__ in pandas.tools.merge, which reads as follows (in my installation):

        if not isinstance(objs, (list,tuple,types.GeneratorType,dict,TextFileReader)):
            raise TypeError('first argument must be a list-like of pandas '
                            'objects, you passed an object of type '
                            '"{0}"'.format(type(objs).__name__))
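The effect of that whitelist can be reproduced with the standard library alone. This is a simplified model rather than the pandas code itself: TextFileReader is omitted so the snippet runs without pandas, and the names ACCEPTED_TYPES and passes_whitelist are hypothetical.

```python
import types
from collections import deque

# Simplified copy of the isinstance whitelist quoted above
# (pandas' TextFileReader omitted so this runs without pandas).
ACCEPTED_TYPES = (list, tuple, types.GeneratorType, dict)


def passes_whitelist(objs):
    """Hypothetical helper: True if objs would get past the type check."""
    return isinstance(objs, ACCEPTED_TYPES)


print(passes_whitelist([1, 2]))             # True
print(passes_whitelist(x for x in [1, 2]))  # True
print(passes_whitelist(deque([1, 2])))      # False
print(passes_whitelist(iter([1, 2])))       # False
```

Note that the deque is not the only casualty: any iterable outside the whitelist, such as a plain list iterator or a range, is rejected in the same way.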

So it appears that the set of iterables actually accepted by pandas.concat is: lists, tuples, generator objects (including generator expressions), dicts, and instances of TextFileReader. I suggest that it might be better to check for (and act upon) special cases, and otherwise assume that objs is a suitable iterable of pandas objects. In other words, remove that check entirely and add a couple of checks for "expected" special cases (such as a user mistakenly passing a single DataFrame as objs; there's already a check for dicts a bit further on).
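A minimal sketch of the proposed approach in plain Python: reject known special cases up front, handle mappings separately, and treat anything else as an arbitrary iterable. The helper normalize_objs, its single_object_types parameter, and the FakeFrame class are all hypothetical; FakeFrame stands in for a single Series or DataFrame so the example runs without pandas, and the error wording is adapted.

```python
from collections import deque
from collections.abc import Mapping


class FakeFrame:
    """Hypothetical stand-in for a single pandas object (e.g. a DataFrame)."""


def normalize_objs(objs, single_object_types=(FakeFrame,)):
    """Hypothetical helper: return (keys, list_of_objs) or raise TypeError."""
    # Special case: a lone pandas object (or a string) is iterable, but
    # passing one is almost certainly a mistake, so fail with a clear message.
    if isinstance(objs, (str, bytes) + tuple(single_object_types)):
        raise TypeError(
            'first argument must be an iterable of pandas objects, '
            'you passed an object of type "{0}"'.format(type(objs).__name__))
    # Special case: mappings supply their own keys.
    if isinstance(objs, Mapping):
        keys = list(objs)
        return keys, [objs[k] for k in keys]
    # Everything else is assumed to be an iterable of pandas objects.
    return None, list(objs)


frames = [FakeFrame(), FakeFrame()]
print(normalize_objs(deque(frames))[1] == frames)  # True
print(normalize_objs({'x': frames[0]})[0])         # ['x']
```

Checking for the few types that must be rejected, rather than whitelisting the types that are allowed, means new iterable types work without any change to the check.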

The conversion of objs to a list later on (the list comprehension under if keys is None) should raise a TypeError if objs isn't iterable, so the change shouldn't have much impact: a non-iterable passed as objs will still raise an exception of the same class as with the existing code.
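The point about exception classes can be checked directly. Both functions below are hypothetical simplifications (the whitelist version omits generators and TextFileReader): one performs an explicit isinstance check first, the other simply iterates. Passing a non-iterable raises TypeError either way; only the message differs.

```python
def whitelist_check(objs):
    """Roughly the current behaviour: explicit isinstance check first."""
    if not isinstance(objs, (list, tuple, dict)):
        raise TypeError('first argument must be a list-like of pandas '
                        'objects, you passed an object of type '
                        '"{0}"'.format(type(objs).__name__))
    return [obj for obj in objs]


def iterate_only(objs):
    """Proposed behaviour: no check, just iterate."""
    return [obj for obj in objs]


for func in (whitelist_check, iterate_only):
    try:
        func(42)
    except TypeError as exc:
        print(func.__name__, 'raised TypeError:', exc)
```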

If this sounds reasonable, I'm happy to provide a pull request.

@jreback
Contributor

jreback commented Oct 27, 2014

Happy to have a pull request to do this.
The issue is that a single pandas object (e.g. a Series or DataFrame) also satisfies list-like, and then the user doesn't get a sensible error message.
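To illustrate the concern: a single DataFrame is itself iterable (iterating over it yields its column labels), so a generic "is it iterable?" test cannot tell it apart from a collection of frames. This sketch uses pandas.api.types.is_list_like from current pandas releases purely for illustration.

```python
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A single DataFrame passes a generic list-like test...
print(is_list_like(df))  # True

# ...but iterating over it yields column labels, not DataFrames.
print(list(df))  # ['a', 'b']
```

Without an explicit special case, passing df instead of [df] would therefore fail later with a confusing error about strings rather than a clear message about the argument type.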

So I guess the real requirement here is that any iterable that is not itself a pandas object should work, whether or not it is indexable (e.g. a dict); generators may need to be handled explicitly too, as they aren't technically covered by that definition.

A docstring improvement would also be useful.

Please add your example as a test, but existing tests can't be changed.
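A regression test for the deque example might look like the following sketch. It assumes pytest-style test discovery and a pandas release that accepts arbitrary iterables; the test name is hypothetical, and it uses the public pandas.testing.assert_frame_equal helper.

```python
from collections import deque

import pandas as pd
import pandas.testing as pdt


def test_concat_accepts_deque():
    """Hypothetical regression test: a deque behaves like the equivalent list."""
    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
    expected = pd.concat([df, df, df])
    result = pd.concat(deque((df, df, df)))
    pdt.assert_frame_equal(result, expected)
```

Comparing against the list-based result, rather than a hand-built expected frame, keeps the test focused on the input type and leaves existing concat tests untouched.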

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Oct 27, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 27, 2014
@waveform80
Contributor Author

Opened pull request #8668 with implementation
