
ENH: Named tuple fields as column names #11181

Closed

max-sixty opened this issue Sep 23, 2015 · 8 comments
Labels: API Design · Compat (pandas objects compatibility with Numpy or Python functions)

@max-sixty
Contributor

Currently, passing a list of namedtuples (or SQLAlchemy results, which is our specific case) to DataFrame.from_records doesn't pick up the field names as column names.

I think it would make sense to do so, at least when the field names are the same across all records. Happy to do a PR if people agree.

In [17]: from collections import namedtuple

In [18]: Record = namedtuple('Record',('date','value'))

In [19]: records = [Record(d, v) for (d, v) in zip(pd.date_range('2000',freq='D', periods=20), range(20))]

In [20]: records
Out[20]: 
[Record(date=Timestamp('2000-01-01 00:00:00', offset='D'), value=0),
 Record(date=Timestamp('2000-01-02 00:00:00', offset='D'), value=1),
...
 Record(date=Timestamp('2000-01-19 00:00:00', offset='D'), value=18),
 Record(date=Timestamp('2000-01-20 00:00:00', offset='D'), value=19)]

In [21]: pd.DataFrame.from_records(records)
Out[21]: 
            0   1
0  2000-01-01   0
1  2000-01-02   1
...
18 2000-01-19  18
19 2000-01-20  19

Desired behavior:

Out[21]: 
         date  value
0  2000-01-01      0
1  2000-01-02      1
...
18 2000-01-19     18
19 2000-01-20     19
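
In the meantime, a minimal workaround sketch (reusing the Record/records defined above; this is not proposed pandas behaviour, just an explicit columns= call):

import pandas as pd
from collections import namedtuple

Record = namedtuple('Record', ('date', 'value'))
records = [Record(d, v)
           for d, v in zip(pd.date_range('2000-01-01', freq='D', periods=20), range(20))]

# Supplying columns= explicitly carries the field names through today.
df = pd.DataFrame.from_records(records, columns=Record._fields)
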
@jreback
Contributor

jreback commented Sep 24, 2015

You could do this, but you would then have to deal with potentially multiple names (for a single column) and should raise in that case. This is unambiguous with a recarray, but you are passing a list of record-likes here. FYI, this is the same for DataFrame(records). Further, you would have to be careful about perf here.

@jreback
Contributor

jreback commented Sep 24, 2015

any reason you are not simply using pd.read_sql which already does this?
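
(For reference, a minimal read_sql sketch; the table name and connection string are illustrative, not from this discussion:)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///example.db')  # illustrative connection

# read_sql executes the query and names the columns from the result set directly.
df = pd.read_sql('SELECT date, value FROM records', engine)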

@max-sixty
Contributor Author

@jreback - we're using SQLAlchemy as an ORM, so we have a SQLAlchemy Query object, which encapsulates joins & filtering.

My understanding is that those methods take a SELECT string - are we missing something, and will they work with the ORM? We're using this pattern fairly heavily, so I'm keen to know if we're doing something inefficient.

Thanks

@max-sixty
Contributor Author

I hear you re performance, if you have to check every item.

I had thought that if the first item is a tuple with ._fields, it would be a reasonable default to set the appropriate axis index to those fields. The alternative is integers, so there's not a lot of competition, and the user is free to supply columns / index as ever.

Without that convenience, it does seem like a lot of lifting for pandas vs the benefit.
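
For reference, a rough sketch of that default (the helper name here is made up, not pandas internals): only the first record is inspected, so the cost stays constant.

def infer_columns(data, columns=None):
    # Only infer when the caller did not supply columns explicitly,
    # and only look at the first record to keep the check O(1).
    if columns is None and len(data):
        first = data[0]
        if isinstance(first, tuple) and hasattr(first, '_fields'):
            columns = list(first._fields)
    return columns

# e.g. infer_columns(records) would give ['date', 'value'] for the Record namedtuple above.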

@jreback
Contributor

jreback commented Sep 24, 2015

this is prob not that crazy to add
we obviously already support lists of ndarrays and such
so it's just another path

@jreback
Contributor

jreback commented Sep 24, 2015

side issue: #4916

love to remove from_records entirely (and just have it figured out in the constructor)

@jorisvandenbossche
Member

@MaximilianR regarding the sqlalchemy Query object: at the moment this will indeed not work with read_sql_query, but would it be useful to add that functionality?
I think you can try it out by providing the selectable attribute (http://docs.sqlalchemy.org/en/rel_1_0/orm/query.html#sqlalchemy.orm.query.Query.selectable), something like: pd.read_sql_query(Query().selectable, engine)
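
A rough usage sketch of that suggestion, assuming an illustrative declarative model and an in-memory engine (none of these names come from the issue):

import pandas as pd
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Event(Base):                      # illustrative model
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True)
    value = Column(Integer)

engine = create_engine('sqlite://')     # in-memory placeholder
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

query = session.query(Event).filter(Event.value >= 0)

# Query.selectable is the underlying SELECT, which read_sql_query accepts.
df = pd.read_sql_query(query.selectable, engine)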

@max-sixty
Contributor Author

@jorisvandenbossche Yes, that works, thanks!

I raised a question with the SQLAlchemy project to see if they had any guidance. I could imagine a few options: leave it as-is, allow read_sql to take a Query object as a very simple wrapper, or something more tightly coupled with SQLAlchemy if there are optimizations to be had:
https://bitbucket.org/zzzeek/sqlalchemy/issues/3542/design-for-pandas-access-to-query
