BUG/ENH: Bad columns dtype when creating empty DataFrame #22858

lowerthansound · 2018-09-27T17:44:46Z

Code Sample

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df
Empty DataFrame
Columns: [A, B, C]
Index: []
>>> df.dtypes
A    float64
B    float64
C    float64
dtype: object

Problem description

When creating a DataFrame with no rows, passing an integer dtype (here int64) produces float64 columns instead of the requested dtype. The problem does not happen if the DataFrame has one or more rows:

>>> df = pd.DataFrame([[1, 2, 3]], columns=list('ABC'), dtype='int64')
>>> df
   A  B  C
0  1  2  3
>>> df.dtypes
A    int64
B    int64
C    int64
dtype: object

Expected Output

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df.dtypes
A    int64
B    int64
C    int64
dtype: object
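As a possible workaround while the constructor behaves this way, the empty frame can be built from empty, explicitly typed Series. This is only a sketch and not part of the original report; it assumes pandas is importable as `pd` and relies on nothing beyond `pd.Series(dtype=...)` and the dict form of the `DataFrame` constructor.

```python
import pandas as pd

# Build each column as an empty Series that already carries the wanted dtype,
# so the DataFrame constructor has no reason to upcast to float64.
df = pd.DataFrame({name: pd.Series(dtype="int64") for name in "ABC"})

print(len(df))     # number of rows in the frame
print(df.dtypes)   # per-column dtypes
```

Because every column arrives with its dtype already set, the `dtype` argument of the constructor is not needed at all.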

Output of `pd.show_versions()`

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.5-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: 0.4.0
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None


JustinZhengBC · 2018-10-02T22:34:41Z

This seems to be intended behaviour, as demonstrated by the following test in pandas/tests/frame/test_constructor.py::TestDataFrameConstructors::test_constructor_corner:

df = DataFrame(index=lrange(10), columns=['a', 'b'], dtype=int)
assert df.values.dtype == np.dtype('float64')

The code responsible for this behaviour is found in pandas/core/dtypes/cast.py, on line 1223. Commenting out the two lines shown below causes the above test, and no others, to fail in the pytest suite.

if is_integer_dtype(dtype) and isna(value):
    dtype = np.float64

lowerthansound · 2018-10-03T00:17:33Z

I don't feel this is intended behavior, but it may be a rough corner produced by the code you mentioned.

In the issue sample, the columns are empty, so there is no need to upcast to float:

>>> df = pd.DataFrame(columns=list('ABC'), dtype='int64')
>>> df
Empty DataFrame
Columns: [A, B, C]
Index: []

In the test case you mentioned, though, the DataFrame must be filled with NaN, and therefore a float dtype is needed:

>>> df = pd.DataFrame(index=range(10), columns=['a', 'b'], dtype=int)
>>> df
    a   b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN

JustinZhengBC · 2018-10-03T00:53:50Z

Good point. Theoretically it could be fixed by casting int to float only if an index (such as the lrange in the test) is specified. I can try it out later and submit a PR if the tests pass.
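The idea can be sketched roughly as follows. This is not the code from pandas or from any PR: `resolve_constructor_dtype` and its `n_rows` parameter are hypothetical names used only for illustration, and the sketch assumes that "an index is specified" can be approximated by "there is at least one row to fill with NaN".

```python
import numpy as np


def resolve_constructor_dtype(dtype, n_rows):
    """Hypothetical helper: choose the dtype for a frame built without data.

    Integer dtypes cannot represent NaN, so they are upcast to float64 only
    when there are rows that would have to be filled with NaN.
    """
    dtype = np.dtype(dtype)
    if n_rows > 0 and np.issubdtype(dtype, np.integer):
        return np.dtype("float64")
    # No rows to fill (or a dtype that needs no upcast): keep the request.
    return dtype


print(resolve_constructor_dtype("int64", n_rows=0))   # int64
print(resolve_constructor_dtype("int64", n_rows=10))  # float64
```

With this rule the test quoted earlier (ten rows, dtype=int) would still see float64, while the empty frame from the issue sample would keep int64.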

sinhrks added the Dtype Conversions and MultiIndex labels Oct 1, 2018

JustinZhengBC mentioned this issue Oct 3, 2018

BUG GH22858 When creating empty dataframe, only cast int to float if index given #22963

Merged

4 tasks

jreback added this to the 0.24.0 milestone Oct 4, 2018

WillAyd closed this as completed in #22963 Oct 4, 2018
