Misleading error for pd.read_msgpack #27160
I think the first argument of read_msgpack can *also* be data.
```
In [4]: pd.read_msgpack(b'')
/Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3296:
FutureWarning: The read_msgpack is deprecated and will be removed in a
future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas
objects.
exec(code_obj, self.user_global_ns, self.user_ns)
Out[4]: []
```
Regardless, I believe we're deprecating read_msgpack so this may not be
worth changing.
…On Mon, Jul 1, 2019 at 8:02 AM Sam Spilsbury ***@***.***> wrote:
Code Sample, a copy-pastable example if possible
import pandas as pd
pd.read_msgpack('this/path/does/not/exist')
Problem description
Such an error is misleading because it suggests that there is a problem
with the datatype being passed, not that the path does not exist. The error
raised is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".local/anaconda3/lib/python3.7/site-packages/pandas/io/packers.py", line 226, in read_msgpack
    raise ValueError('path_or_buf needs to be a string file path or file-like')
ValueError: path_or_buf needs to be a string file path or file-like
Expected Output
Raise an error indicating that the path was not found.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datarea
Yeah, this is true of several routines (e.g. read_json); there is an issue about this somewhere. But since msgpack is deprecated, this is out of scope (we would take a reasonable patch, though).
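For illustration, a patch along those lines might separate the "path does not exist" case from the "wrong type" case. The sketch below uses only the standard library. `resolve_path_or_buf` is a hypothetical helper, not the actual pandas code, and it assumes that a `str` argument is always meant as a path while raw data arrives as `bytes`.

```python
import os
from io import BytesIO


def resolve_path_or_buf(path_or_buf):
    """Hypothetical helper: turn the argument into a binary file-like object.

    Assumption (not taken from pandas): str / os.PathLike means "path",
    bytes means "already-serialized data".
    """
    if isinstance(path_or_buf, bytes):
        # Raw bytes are treated as data and wrapped in an in-memory buffer.
        return BytesIO(path_or_buf)
    if isinstance(path_or_buf, (str, os.PathLike)):
        path = os.fspath(path_or_buf)
        if not os.path.exists(path):
            # Report the missing path instead of a generic type complaint.
            raise FileNotFoundError(f"File {path!r} does not exist")
        return open(path, "rb")
    if hasattr(path_or_buf, "read"):
        # Already file-like: hand it back unchanged.
        return path_or_buf
    raise ValueError(
        "path_or_buf needs to be a string file path, bytes, or file-like"
    )
```

With a helper like this, `resolve_path_or_buf('this/path/does/not/exist')` raises `FileNotFoundError` naming the path, while an argument of an unsupported type such as an integer still raises `ValueError`.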
A PR is submitted :->
I think that assuming a string passed to the data as the docs for
>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame
>>> df = DataFrame(np.random.randn(10, 2))
>>>
>>>
>>> df.to_msgpack(None)
b'\x84\xa3typ\xadblock_manager\xa5klass\xa9DataFrame\xa4axes\x92\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4s
top\x02\xa4step\x01\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4stop\n\xa4step\x01\xa6blocks\x91\x86\xa4locs\x
86\xa3typ\xa7ndarray\xa5shape\x91\x02\xa4ndim\x01\xa5dtype\xa5int64\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00
\x00\xa8compress\xc0\xa6values\xc7\xa0\x00A\x10\x94Z\x0f|\xd0?F]>R\xc7\xfc\xf5\xbf\xa4\xeb\xe2:\x07X\xc5\xbf&\x1bAje\t\xbb?\x98w9\x17:"\xe1?#\x
e4\xc9\xda\x86\xdf\xaf\xbf\xec\xe63K2\x03\xee\xbf\xad0%v\x11$\xda\xbf\xa1\x02@\xff\xb7\xc8\xff?\xb0G\x11\x02\x80\x13\xe1?)\xf8l\xcb~/\xd2?\xb2\
x17I\xeb\x91k\x03@\xbf\xfaj\xb2\x89\x14\xc2\xbf\xbd5\xba\xb3j\x1c\xed?u\xe504\x17\xaf\xd0\xbf\xc7\xa5\xc3\xf3\x12\xf1\xf4?\xe6\xf0\x05\xf2\xef\
xd6\x05@\xec\xeb\xd1\x80w}\xf0\xbfx\x94\x82\x10"U\xeb?.\xbdZI\x89X\xea?\xa5shape\x92\x02\n\xa5dtype\xa7float64\xa5klass\xaaFloatBlock\xa8compre
ss\xc0'
>>>
>>> pd.read_msgpack(df.to_msgpack(None))
sys:1: FutureWarning: The read_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
          0         1
0  0.257572  0.284149
1 -1.374214  2.427524
2 -0.166749 -0.141252
3  0.105612  0.909719
4  0.535428 -0.260687
5 -0.062252  1.308856
6 -0.937890  2.729950
7 -0.408451 -1.030632
8  1.986504  0.854142
9  0.533630  0.823308
>>>
Hi, I'm confused by the deprecation message. Compared to read_msgpack, which function in pyarrow is the replacement?