Misleading error for pd.read_msgpack #27160

Closed
smspillaz opened this issue Jul 1, 2019 · 5 comments · Fixed by #27201
Labels
Error Reporting Incorrect or improved errors from pandas

@smspillaz

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.read_msgpack('this/path/does/not/exist')

Problem description

The resulting error is misleading because it suggests a problem with the type of the argument being passed, not that the path does not exist. The error raised is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".local/anaconda3/lib/python3.7/site-packages/pandas/io/packers.py", line 226, in read_msgpack
    raise ValueError('path_or_buf needs to be a string file path or file-like')
ValueError: path_or_buf needs to be a string file path or file-like

Expected Output

Raise an error indicating that the path was not found.
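
To illustrate the requested behaviour, here is a minimal sketch of a check that reports a missing path separately from an unsupported argument type. The function name `check_path_or_buf` and the wording of the FileNotFoundError message are assumptions made for illustration; this is not the pandas implementation.

```python
import os


def check_path_or_buf(path_or_buf):
    """Hypothetical validator for illustration only; not pandas code."""
    if isinstance(path_or_buf, str):
        if not os.path.exists(path_or_buf):
            # The string is treated as a path, so a missing file is reported
            # as such instead of being blamed on the argument type.
            raise FileNotFoundError(f"File {path_or_buf!r} does not exist")
        return path_or_buf
    if isinstance(path_or_buf, bytes) or hasattr(path_or_buf, "read"):
        # Raw bytes and file-like objects are accepted unchanged.
        return path_or_buf
    # Only genuinely unsupported types reach the original message.
    raise ValueError("path_or_buf needs to be a string file path or file-like")
```

With a check like this, passing 'this/path/does/not/exist' would raise FileNotFoundError, while passing something like an integer would still raise the ValueError shown above.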

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datarea

@TomAugspurger
Contributor

TomAugspurger commented Jul 1, 2019 via email

@jreback jreback added the Error Reporting (Incorrect or improved errors from pandas) and IO Msgpack labels Jul 1, 2019
@jreback
Contributor

jreback commented Jul 1, 2019

yeah, this is true of several routines (e.g. read_json); there is an issue about this somewhere. But since msgpack is deprecated, this is out of scope (we would take a reasonable patch though).

@jreback jreback closed this as completed Jul 1, 2019
@jreback jreback modified the milestones: No action, 0.25.0 Jul 1, 2019
@jreback jreback reopened this Jul 1, 2019
@jreback
Contributor

jreback commented Jul 1, 2019

reopened, as a PR has been submitted :->

@simonjayhawkins
Member

I think the first argument of read_msgpack can also be data.

I think that assuming a string passed to pd.read_msgpack is a filepath and then raising if not found is OK?

the data as bytes works as intended.

the docs for pandas.DataFrame.to_msgpack are misleading: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_msgpack.html?highlight=to_msgpack#pandas-dataframe-to-msgpack suggests that a string is returned, when bytes are actually returned...

path : string File path, buffer-like, or None
if None, return generated string
>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame
>>> df = DataFrame(np.random.randn(10, 2))
>>>
>>>
>>> df.to_msgpack(None)
b'\x84\xa3typ\xadblock_manager\xa5klass\xa9DataFrame\xa4axes\x92\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4s
top\x02\xa4step\x01\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4stop\n\xa4step\x01\xa6blocks\x91\x86\xa4locs\x
86\xa3typ\xa7ndarray\xa5shape\x91\x02\xa4ndim\x01\xa5dtype\xa5int64\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00
\x00\xa8compress\xc0\xa6values\xc7\xa0\x00A\x10\x94Z\x0f|\xd0?F]>R\xc7\xfc\xf5\xbf\xa4\xeb\xe2:\x07X\xc5\xbf&\x1bAje\t\xbb?\x98w9\x17:"\xe1?#\x
e4\xc9\xda\x86\xdf\xaf\xbf\xec\xe63K2\x03\xee\xbf\xad0%v\x11$\xda\xbf\xa1\x02@\xff\xb7\xc8\xff?\xb0G\x11\x02\x80\x13\xe1?)\xf8l\xcb~/\xd2?\xb2\
x17I\xeb\x91k\x03@\xbf\xfaj\xb2\x89\x14\xc2\xbf\xbd5\xba\xb3j\x1c\xed?u\xe504\x17\xaf\xd0\xbf\xc7\xa5\xc3\xf3\x12\xf1\xf4?\xe6\xf0\x05\xf2\xef\
xd6\x05@\xec\xeb\xd1\x80w}\xf0\xbfx\x94\x82\x10"U\xeb?.\xbdZI\x89X\xea?\xa5shape\x92\x02\n\xa5dtype\xa7float64\xa5klass\xaaFloatBlock\xa8compre
ss\xc0'
>>>
>>> pd.read_msgpack(df.to_msgpack(None))
sys:1: FutureWarning: The read_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
          0         1
0  0.257572  0.284149
1 -1.374214  2.427524
2 -0.166749 -0.141252
3  0.105612  0.909719
4  0.535428 -0.260687
5 -0.062252  1.308856
6 -0.937890  2.729950
7 -0.408451 -1.030632
8  1.986504  0.854142
9  0.533630  0.823308
>>>
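
The distinction drawn in this comment (a str is assumed to be a file path, bytes are assumed to be the serialized data) can be sketched as follows. The helper `load_raw` is hypothetical and only returns the raw bytes; it does not decode msgpack and is not how pandas implements read_msgpack. The payload is an arbitrary stand-in, not a complete msgpack document.

```python
import io
import os
import tempfile


def load_raw(path_or_buf):
    """Hypothetical helper: return the raw bytes behind the argument."""
    if isinstance(path_or_buf, bytes):
        # bytes are taken to be the serialized data itself
        return path_or_buf
    if isinstance(path_or_buf, str):
        # str is always taken to be a path; open() raises
        # FileNotFoundError on its own if the file is missing
        with open(path_or_buf, "rb") as fh:
            return fh.read()
    if hasattr(path_or_buf, "read"):
        # file-like objects are read directly
        return path_or_buf.read()
    raise ValueError("path_or_buf needs to be a string file path or file-like")


payload = b"\x84\xa3typ"  # arbitrary stand-in bytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.msgpack")
    with open(path, "wb") as fh:
        fh.write(payload)
    from_path = load_raw(path)

from_bytes = load_raw(payload)
from_buffer = load_raw(io.BytesIO(payload))
```

Under this assumption all three calls return the same bytes, and a str that names no existing file fails with FileNotFoundError rather than a type-related ValueError.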

@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jul 2, 2019
U09Kane added a commit to U09Kane/pandas that referenced this issue Jul 3, 2019
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 6, 2019
@happyshows

Hi,

I'm confused by the deprecation message. Compared to read_msgpack, which function in pyarrow is the replacement?
