BUG: OverflowError on to_json with numbers larger than sys.maxsize #34395
Comments
I checked that this bug exists in the master version. Output of pd.show_versions():
INSTALLED VERSIONS
commit : 62c7dd3
pandas : 1.1.0.dev0+1681.g62c7dd3e7
I dug a little and tracked the problem down to the version of dumps specified in
import sys
import pandas as pd
pd._libs.json.dumps(sys.maxsize)  # succeeds
pd._libs.json.dumps(sys.maxsize + 1)  # raises OverflowError
I'm stuck on finding the actual code for
I think the implementation comes from the embedded version of ultrajson in pandas/_libs/src/ujson. I'm not sure how it's vendored or gets linked up, though. |
Thanks! It looks to me like the code which does the encoding is in |
I guess that there is no way to fix the problem without messing with the ultrajson source code? In
|
Related to #20599. This isn’t really feasible to do in the ujson source, so we would probably have to catch the error and coerce to a serializable type.
@WillAyd Thanks for this! Reading through that thread it seems like a solution to this issue would be to wrap ultrajson's
So, instead of:
dumps = json.dumps # line 28
we would do something like this:
def dumps(obj, default_handler=str, **kwargs):
    try:
        return json.dumps(obj, **kwargs)
    except OverflowError:
        # Fall back to the handler's representation when ujson overflows.
        return json.dumps(default_handler(obj), **kwargs)
This fixes the original error. I checked that with this change the code still passes the unit test in
I'm happy to keep working on fixing this if this solution isn't quite right! Once we've settled on the fix, would the next steps be these?
|
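The wrapper proposed above can be tried without pandas. The following is only a sketch: strict_dumps is a hypothetical stand-in for ujson's dumps, written on the assumption that the vendored encoder rejects any int outside the signed 64-bit range, and it is not pandas code. The dumps wrapper itself mirrors the one in the comment.

```python
import json


def strict_dumps(obj, **kwargs):
    """Hypothetical stand-in for ujson's dumps (not the real encoder).

    It raises OverflowError for ints outside the signed 64-bit range and
    otherwise defers to the standard-library json module.
    """

    def check(value):
        if isinstance(value, bool):
            return
        if isinstance(value, int) and not -(2**63) <= value < 2**63:
            raise OverflowError("int too big to convert")
        if isinstance(value, dict):
            for item in value.values():
                check(item)
        elif isinstance(value, (list, tuple)):
            for item in value:
                check(item)

    check(obj)
    return json.dumps(obj, **kwargs)


def dumps(obj, default_handler=str, **kwargs):
    """Wrapper from the comment above: retry through default_handler on overflow."""
    try:
        return strict_dumps(obj, **kwargs)
    except OverflowError:
        return strict_dumps(default_handler(obj), **kwargs)


in_range = dumps(2**63 - 1)       # encoded as a JSON number
too_big = dumps(2**63)            # encoded as a JSON string
container = dumps([1, 2**63])     # the whole list is coerced to one string
```

One design consequence is visible in the last line: because default_handler is applied to the object that was passed in, a single oversized value inside a list turns the entire list into a quoted string rather than only that value. The commit list further down in this thread includes "removed dumps wrapper", so this approach appears to have been an intermediate step.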
Yea if you want to add a test case and submit a pull request we can go from there. Will also want to check the performance benchmarks for JSON which you’ll find more info on here |
* BUG: overflow on to_json with numbers larger than sys.maxsize
* TST: overflow on to_json with numbers larger than sys.maxsize (#34395)
* DOC: update with issue #34395
* TST: removed unused import
* ENH: added case JT_BIGNUM to encode
* ENH: added JT_BIGNUM to JSTYPES
* BUG: changed error for ints>sys.maxsize into JT_BIGNUM
* ENH: removed debug statements
* BUG: removed dumps wrapper
* removed bigNum from TypeContext
* TST: fixed bug in the test
* added pointer to string rep converter for BigNum
* TST: removed ujson.loads from the test
* added getBigNumStringValue
* added code to JT_BIGNUM handler by analogy with JT_UTF8
* TST: update pandas/tests/io/json/test_ujson.py Co-authored-by: William Ayd <[email protected]>
* added Object_getBigNumStringValue to pyEncoder
* added skeletal code for Object_GetBigNumStringValue
* completed Object_getBigNumStringValue using PyObject_Repr
* BUG: changed Object_getBigNumStringValue
* improved Object_getBigNumStringValue some more
* update getBigNumStringValue argument
* corrected Object_getBigNumStringValue
* more fixes to Object_getBigNumStringValue
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c
* Update pandas/_libs/src/ujson/python/objToJSON.c
* updated pyEncoder for JT_BIGNUM
* updated pyEncoder
* moved getBigNumStringValue to pyEncoder
* fixed declaration of Object_getBigNumStringValue
* fixed Object_getBigNumStringValue
* catch overflow error with PyLong_AsLongLongAndOverflow
* remove unnecessary error check
* added shortcircuit for error check
* simplify int overflow error catching Co-authored-by: William Ayd <[email protected]>
* Update long int test in pandas/tests/io/json/test_ujson.py Co-authored-by: William Ayd <[email protected]>
* removed tests expecting numeric overflow
* remove underscore from overflow Co-authored-by: William Ayd <[email protected]>
* removed underscores from _overflow everywhere
* fixed small typo
* fix type of exc
* deleted numeric overflow tests
* remove extraneous condition in if statement Co-authored-by: William Ayd <[email protected]>
* remove extraneous condition in if statement Co-authored-by: William Ayd <[email protected]>
* change _Bool into int Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/python/objToJSON.c Co-authored-by: William Ayd <[email protected]>
* Update pandas/_libs/src/ujson/lib/ultrajsonenc.c Co-authored-by: William Ayd <[email protected]>
* allocate an extra byte in Object_getBigNumStringValue Co-authored-by: William Ayd <[email protected]>
* allocate an extra byte in Object_getBigNumStringValue Co-authored-by: William Ayd <[email protected]>
* reinstate RESERVE_STRING(szlen) in JT_BIGNUM case
* replaced (private) with (public) in whatnew
* release bytes in Object_endTypeContext
* in JT_BIGNUM change if+if into if+else if
* added reallocation of bigNum_bytes
* removed bigNum_bytes
* added to_json test for ints>sys.maxsize
* Use python malloc to match PyObject_Free in endTypeContext Co-authored-by: William Ayd <[email protected]>
* TST: added manually constructed strs to compare encodings
* fixed styling to minimize diff with master
* fixed styling
* fixed conflicts with master
* fix styling to minimize diff
* fix styling to minimize diff
* fixed styling
* added negative nigNum to test_to_json_large_numers
* added negative nigNum to test_to_json_large_numers
* Update pandas/tests/io/json/test_ujson.py Co-authored-by: William Ayd <[email protected]>
* fixe test_to_json_for_large_nums for -ve
* TST: added xfail for ujson.encode with long int input
* TST: fixed variable names in test_to_json_large_numbers
* TST: added xfail test for json.decode Series with long int
* TST: added xfail test for json.decode DataFrame with long int
* BENCH: added benchmarks for long ints Co-authored-by: William Ayd <[email protected]>
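Reading the commit list above, the merged fix seems to detect the overflow (the list mentions PyLong_AsLongLongAndOverflow) and then write the integer's string representation (the list mentions PyObject_Repr) into the output as an unquoted number. The Python sketch below only illustrates that idea under this reading; encode_int and the INT64 bounds are hypothetical names for illustration and do not correspond to functions in the C source.

```python
import json

# Assumed bounds of the fast path: a signed 64-bit integer.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def encode_int(value: int) -> str:
    """Hypothetical illustration of a big-number branch in an encoder."""
    if INT64_MIN <= value <= INT64_MAX:
        # Fast path: the value fits in a 64-bit integer.
        return format(value, "d")
    # Big-number path: emit the decimal digits of the Python int directly,
    # unquoted, so the result is still a JSON number rather than a string.
    return repr(value)


small = encode_int(42)
big = encode_int(2**63)
negative_big = encode_int(-(2**63) - 1)

# The standard-library decoder reads the digits back as the same ints.
round_trip = json.loads("[" + big + ", " + negative_big + "]")
```

Compared with coercing through a default handler, this keeps oversized values as numbers in the output, at the cost of producing JSON that some 64-bit-only decoders may not be able to read back.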
Code Sample, a copy-pastable example
Problem description
The Pandas JSON dumper doesn't seem to handle number values larger than sys.maxsize (a word). I have a dataframe that I'm trying to write with to_json, but it's failing with OverflowError: int too big to convert. There are some numbers larger than 9223372036854775807 in it.
Passing a default_handler doesn't help. It doesn't get called for the error.
Expected Output
Python's built-in json module handles large numbers without issues.
I expect Pandas to be able to output large numbers to JSON. An option to use the built-in json module instead of ujson would be fine.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
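For comparison with the expected output described above, here is a small sketch of how the standard-library json module treats integers beyond 9223372036854775807. The record layout and the "id" key are made up for illustration; no pandas objects are involved.

```python
import json

# One past the largest signed 64-bit integer, and its negative counterpart.
big = 9223372036854775807 + 1
records = [{"id": big}, {"id": -big - 1}]

# The built-in encoder writes arbitrary-precision ints as plain JSON numbers.
encoded = json.dumps(records)

# Decoding restores the original Python ints without loss.
decoded = json.loads(encoded)
```

Assuming a DataFrame named df, something like json.dumps(df.to_dict(orient="records")) is one way to route the output through the built-in module, though values held as NumPy scalars may need converting to native Python types first.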