The pandas.concat routine's behaviour is slightly at odds with its documentation (and error messages). The documentation states that the first parameter (objs) accepts "list or dict of Series, DataFrame, ..." but the routine is rather more forgiving and appears to accept lists, tuples, dicts, and generator objects (very useful!); hence my impression (during usage) was that it accepted "iterables" generally. Unfortunately it turns out this isn't the case; attempting to concatenate a deque of DataFrame objects results in the following:
----> 1 pd.concat(d)
/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
720 keys=keys, levels=levels, names=names,
721 verify_integrity=verify_integrity,
--> 722 copy=copy)
723 return op.get_result()
724
/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
735 raise TypeError('first argument must be a list-like of pandas '
736 'objects, you passed an object of type '
--> 737 '"{0}"'.format(type(objs).__name__))
738
739 if join == 'outer':
TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"
The error message states that the first argument "must be a list-like of pandas objects" (which is slightly different to the documentation, and closer to the actual implementation's behaviour). Given that a deque is an iterable, sized container (it even supports integer indexing, though not slicing), it seems to fulfil the criteria of being "list-like".
Digging into the implementation, the test that's failing appears to be the first line of _Concatenator.__init__ in pandas.tools.merge which reads as follows (in my installation):
    if not isinstance(objs, (list, tuple, types.GeneratorType, dict, TextFileReader)):
        raise TypeError('first argument must be a list-like of pandas '
                        'objects, you passed an object of type '
                        '"{0}"'.format(type(objs).__name__))
So it appears the actual set of iterables accepted by pandas.concat is lists, tuples, generator expressions, dicts, and instances of TextFileReader. I suggest that it might be better to check for (and act upon) special cases and otherwise assume that objs is a suitable iterable of DataFrame objects. In other words, get rid of that check entirely, and add a couple of checks for "expected" special cases (such as a user mistakenly passing a DataFrame as objs; there's already a check in place for dicts a bit further on).
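To illustrate which containers that type check lets through, here is a minimal standard-library sketch. It is not the pandas code: `passes_whitelist` is a hypothetical helper, `TextFileReader` is left out so that nothing beyond the standard library is needed, and plain strings stand in for DataFrame objects.

```python
import types
from collections import deque

# Types named in the quoted isinstance check. TextFileReader is omitted
# here (an assumption of this sketch) to avoid importing pandas.
ACCEPTED = (list, tuple, types.GeneratorType, dict)


def passes_whitelist(objs):
    """Hypothetical helper mirroring the quoted isinstance test."""
    return isinstance(objs, ACCEPTED)


frames = ["df1", "df2"]  # placeholders standing in for DataFrame objects

candidates = {
    "list": list(frames),
    "tuple": tuple(frames),
    "generator": (f for f in frames),
    "dict": {"a": "df1", "b": "df2"},
    "deque": deque(frames),
    "set": set(frames),
    "iterator": iter(frames),
}

# Map each container kind to whether the whitelist would accept it.
results = {name: passes_whitelist(obj) for name, obj in candidates.items()}

for name in sorted(results):
    print(name, results[name])
```

Under these assumptions the deque, the set and the plain iterator are all rejected even though each can be looped over, which is the mismatch described above.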
The conversion of objs to a list by the list comprehension later on (below if keys is None) should raise a TypeError if objs isn't iterable, so the change should have little impact: a non-iterable passed as objs would still raise an exception of the same class as the existing code does.
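A rough sketch of the shape this suggestion could take, using only the standard library. Everything here is hypothetical: `normalize_objs` is not a pandas function, `FakeFrame` is a stand-in for a pandas object, and the dict branch (values in sorted-key order) is a simplification rather than a copy of the real handling.

```python
from collections import deque


class FakeFrame:
    """Stand-in for a pandas object. It is iterable, as a real DataFrame is
    (iterating a DataFrame yields its column labels)."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        return iter(())


def normalize_objs(objs, pandas_types=(FakeFrame,)):
    """Hypothetical helper: reject known mistakes, otherwise trust iteration."""
    # Special case: a single pandas object passed where a collection is expected.
    if isinstance(objs, pandas_types):
        raise TypeError('first argument must be an iterable of pandas '
                        'objects, you passed an object of type '
                        '"{0}"'.format(type(objs).__name__))
    # Special case: a dict contributes its values (sorted by key in this sketch).
    if isinstance(objs, dict):
        return [objs[key] for key in sorted(objs)]
    # General case: any iterable works; a non-iterable raises TypeError here.
    return [obj for obj in objs]


a, b = FakeFrame("a"), FakeFrame("b")
from_deque = normalize_objs(deque([a, b]))
from_generator = normalize_objs(f for f in (a, b))
from_dict = normalize_objs({"y": b, "x": a})
```

With this arrangement a deque, a generator and a dict all end up as a plain list, while both a lone `FakeFrame` and a non-iterable such as an integer raise `TypeError`.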
If this sounds reasonable, I'm happy to provide a pull request.
happy to have a pull request to do this
the issue is that a single pandas object (e.g. a Series or DataFrame) also counts as list-like, and passing one then doesn't give a sensible error message
so I guess the real requirement here is an iterable, indexable or not (e.g. a dict), that is not itself a pandas object; that might work (generators may need to be added explicitly, as they're not technically covered by that definition)
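One way to read that definition as a predicate, sketched with the standard library only. `is_acceptable_objs` and `FakeSeries` are hypothetical names for illustration, and `collections.abc.Iterable` is an assumed stand-in for whatever "list-like" test pandas itself uses.

```python
from collections import deque
from collections.abc import Iterable


class FakeSeries:
    """Stand-in for a pandas object; iterable, like a real Series."""

    def __iter__(self):
        return iter(())


def is_acceptable_objs(objs, pandas_types=(FakeSeries,)):
    """Hypothetical predicate: any iterable that is not itself a pandas object."""
    return isinstance(objs, Iterable) and not isinstance(objs, pandas_types)


checks = {
    "deque": is_acceptable_objs(deque([FakeSeries()])),
    "generator": is_acceptable_objs(s for s in [FakeSeries()]),
    "dict": is_acceptable_objs({"a": FakeSeries()}),
    "single_object": is_acceptable_objs(FakeSeries()),
    "integer": is_acceptable_objs(42),
}
```

Generators satisfy `Iterable`, so under this reading they would not need a separate entry. Strings are also iterable and would pass this predicate, so a real check would presumably want to handle them separately.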
doc string improvement also useful
pls add your example as a test; the change can't alter any existing tests