
BigQuery: load_table_from_dataframe fails on datetime64 column used for partitioning, saying it's an INTEGER #9206

@simonvanderveldt

Description


Versions:

google-cloud-bigquery==1.19.0

We were initially using 1.18.0, but I noticed #9044 was included in 1.19.0, so we tried that as well; it made no difference.

We're using a pandas DataFrame read from Parquet; example data would be:

id status created_at execution_date
0 1 NEW 2018-09-12 09:24:31.291 2019-05-10 17:40:00
1 2 NEW 2018-09-12 09:26:45.890 2019-05-10 17:40:00
[628 rows x 4 columns]

df.dtypes shows:

id object
status object
created_at datetime64[ns]
execution_date datetime64[ns]
dtype: object

When trying to load this into BigQuery using load_table_from_dataframe() and setting the job config's time_partitioning to bigquery.table.TimePartitioning(field="execution_date") we get the following error:

The field specified for time partitioning can only be of type TIMESTAMP or DATE. The type found is: INTEGER.

Which doesn't really make sense, since the field is clearly datetime64.
The job config shown in the console looks correct (i.e. it's set to partition by day and it uses the correct field).

edit:
It seems the cause is that DataFrame columns of type datetime64 are being converted to INTEGER instead of DATE (or TIMESTAMP? I'm not sure which one would be the correct type in BigQuery).
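To illustrate where the INTEGER comes from, here is a pandas-only sketch (column name and values mirror the example data above, and are otherwise hypothetical):

```python
import pandas as pd

# Minimal frame mirroring the example data above.
df = pd.DataFrame(
    {"execution_date": pd.to_datetime(["2019-05-10 17:40:00",
                                       "2019-05-10 17:40:00"])}
)

print(df["execution_date"].dtype)  # datetime64[ns]

# A datetime64[ns] column is backed by int64 epoch nanoseconds. If the
# logical type is lost when the frame is serialized for the load job,
# only these raw integers remain, which BigQuery then reports as INTEGER.
print(df["execution_date"].astype("int64").iloc[0])
```

So the error message is consistent with the datetime type information being dropped somewhere on the serialization path, leaving the raw integer representation.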

edit2:
Could it be that this mapping is wrong and it should be DATE instead of DATETIME?
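A possible workaround is to pass an explicit schema in the job config so the partitioning column's type is declared rather than inferred. This is a sketch only (untested here; the destination table id and the DataFrame are placeholders):

```python
import pandas as pd
from google.cloud import bigquery

# Placeholder frame; in practice this is the frame read from Parquet.
df = pd.DataFrame(
    {"execution_date": pd.to_datetime(["2019-05-10 17:40:00"])}
)

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    # Declare the partitioning column explicitly so its type is not
    # inferred as INTEGER from the serialized values.
    schema=[bigquery.SchemaField("execution_date", "TIMESTAMP")],
    time_partitioning=bigquery.table.TimePartitioning(field="execution_date"),
)

# "my_dataset.my_table" is a placeholder destination.
job = client.load_table_from_dataframe(
    df, "my_dataset.my_table", job_config=job_config
)
job.result()  # wait for the load job to complete
```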

Metadata

Labels

api: bigquery (Issues related to the BigQuery API.)
type: question (Request for information or clarification. Not an issue.)
