
ENH: Named tuple fields as column names #11181

Closed

max-sixty opened this issue Sep 23, 2015 · 8 comments
Labels: API Design · Compat (pandas objects compatibility with Numpy or Python functions)

@max-sixty
Contributor

Currently, passing a list of namedtuples (or SQLAlchemy results, which is our specific case) to DataFrame.from_records doesn't pick up the field names as column names.

I think it would make sense to do so, at least when the field names are the same across all records. Happy to do a PR if people agree.

In [17]: from collections import namedtuple

In [18]: Record = namedtuple('Record',('date','value'))

In [19]: records = [Record(d, v) for (d, v) in zip(pd.date_range('2000',freq='D', periods=20), range(20))]

In [20]: records
Out[20]: 
[Record(date=Timestamp('2000-01-01 00:00:00', offset='D'), value=0),
 Record(date=Timestamp('2000-01-02 00:00:00', offset='D'), value=1),
...
 Record(date=Timestamp('2000-01-19 00:00:00', offset='D'), value=18),
 Record(date=Timestamp('2000-01-20 00:00:00', offset='D'), value=19)]

In [21]: pd.DataFrame.from_records(records)
Out[21]: 
            0   1
0  2000-01-01   0
1  2000-01-02   1
...
18 2000-01-19  18
19 2000-01-20  19

Desired behavior:

Out[21]: 
         date  value
0  2000-01-01      0
1  2000-01-02      1
...
18 2000-01-19     18
19 2000-01-20     19
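
In the meantime, a minimal workaround sketch (reusing the Record/records defined above; this is not proposed pandas behaviour, just an explicit columns= call):

import pandas as pd
from collections import namedtuple

Record = namedtuple('Record', ('date', 'value'))
records = [Record(d, v)
           for d, v in zip(pd.date_range('2000-01-01', freq='D', periods=20), range(20))]

# Supplying columns= explicitly carries the field names through today.
df = pd.DataFrame.from_records(records, columns=Record._fields)
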
@jreback
Contributor

jreback commented Sep 24, 2015

You could do this, but you would then have to deal with potentially multiple names (for a single column) and should raise in that case. This is unambiguous with a recarray, but you are passing a list of record-likes here. FYI, this is the same for DataFrame(records). Further, you would have to be careful about perf here.

@jreback
Contributor

jreback commented Sep 24, 2015

any reason you are not simply using pd.read_sql which already does this?
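
(For reference, a minimal read_sql sketch; the table name and connection string are illustrative, not from this discussion:)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///example.db')  # illustrative connection

# read_sql executes the query and names the columns from the result set directly.
df = pd.read_sql('SELECT date, value FROM records', engine)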

@max-sixty
Contributor Author

@jreback - we're using SQLAlchemy as an ORM, so we have a SQLAlchemy Query object, which encapsulates joins & filtering.

My understanding is that those methods take a SELECT string - are we missing something, and will they work with the ORM? We're using this pattern fairly heavily, so I'm keen to know if we're doing something inefficient.

Thanks

@max-sixty
Contributor Author

I hear you re performance, if you have to check every item.

I had thought that if the first item is a tuple with ._fields, it would be a reasonable default to set the appropriate axis index to those fields. The alternative is integers, so there's not a lot of competition, and the user is free to supply columns / index as ever.

Without that convenience, it does seem like a lot of lifting for pandas vs the benefit.
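
For reference, a rough sketch of that default (the helper name here is made up, not pandas internals): only the first record is inspected, so the cost stays constant.

def infer_columns(data, columns=None):
    # Only infer when the caller did not supply columns explicitly,
    # and only look at the first record to keep the check O(1).
    if columns is None and len(data):
        first = data[0]
        if isinstance(first, tuple) and hasattr(first, '_fields'):
            columns = list(first._fields)
    return columns

# e.g. infer_columns(records) would give ['date', 'value'] for the Record namedtuple above.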

@jreback
Contributor

jreback commented Sep 24, 2015

this is prob not that crazy to add
we obviously already support lists of ndarrays and such
so it's just another path

@jreback
Contributor

jreback commented Sep 24, 2015

side issue: #4916

love to remove from_records entirely (and just have it figured out in the constructor)

@jorisvandenbossche
Member

@MaximilianR regarding the sqlalchemy Query object: at the moment this will indeed not work with read_sql_query, but would it be useful to add that functionality?
I think you can try it out by providing the selectable attribute (http://docs.sqlalchemy.org/en/rel_1_0/orm/query.html#sqlalchemy.orm.query.Query.selectable), something like: pd.read_sql_query(Query().selectable, engine)
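
A rough usage sketch of that suggestion, assuming an illustrative declarative model and an in-memory engine (none of these names come from the issue):

import pandas as pd
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Event(Base):                      # illustrative model
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True)
    value = Column(Integer)

engine = create_engine('sqlite://')     # in-memory placeholder
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

query = session.query(Event).filter(Event.value >= 0)

# Query.selectable is the underlying SELECT, which read_sql_query accepts.
df = pd.read_sql_query(query.selectable, engine)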

@max-sixty
Contributor Author

@jorisvandenbossche Yes, that works, thanks!

I raised a question with the SQLAlchemy project to see if they had any guidance. I could imagine a few options: leave it as-is, allow read_sql to take a Query object as a very simple wrapper, or something more tightly coupled with SQLAlchemy if there are optimizations to be had:
https://bitbucket.org/zzzeek/sqlalchemy/issues/3542/design-for-pandas-access-to-query
