DOC: Categorical "Memory Usage" uses nbytes instead of memory_usage(deep=True)
#48438
Location of the documentation
https://pandas.pydata.org/docs/dev/user_guide/categorical.html#memory-usage
Documentation problem
The "Memory Usage" section states that the memory usage of an `object` dtype is a constant times the length of the data, then provides an example of memory usage for `object` and `category` Series using the `.nbytes` property. In the example provided, the Series data contains only 3-character strings. The documentation does not address the fact that `nbytes` only includes the size of the "pointers" to the objects (e.g. 8 bytes * 2000 items), and does not include the memory allocated to the string objects themselves (e.g. 52 bytes * 2000 items). An array of longer strings will take up even more memory.
Even though this is in the "Gotchas" section, I think it is important to draw attention to the impact of object size. A Categorical will tend to provide better memory reduction on large objects than on small objects, which might impact whether a user will want to use a Categorical or not. Here's a quick example:
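A minimal sketch of the undercounting (the data here is illustrative, not the docs' exact example): `.nbytes` reports the same figure for short and long strings, because it only counts the 8-byte object pointers, while `memory_usage(deep=True)` also counts the string payloads.

```python
import pandas as pd

# Hypothetical data: 2000 short strings vs 2000 longer strings.
short = pd.Series(["foo"] * 2000)
long = pd.Series(["foo" * 20] * 2000)

# .nbytes counts only the object pointers (8 bytes each on a
# 64-bit build), so both Series report the same size.
assert short.nbytes == long.nbytes

# memory_usage(deep=True) also sizes the string objects themselves,
# so the long-string Series reports substantially more memory.
assert short.memory_usage(deep=True) < long.memory_usage(deep=True)
```

The gap between the two numbers grows with string length, which is exactly the effect `.nbytes` hides.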
Suggested fix for documentation
I recommend the text be changed to:
And the code be changed to use `.memory_usage(deep=True)` for a more accurate understanding of the memory difference.
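For instance, a comparison along the lines the docs could show (again a sketch with made-up data, not the proposed wording): with `deep=True` the categorical's savings become visible, since each distinct string is stored once in the categories while the Series itself holds only small integer codes.

```python
import pandas as pd

# Hypothetical data: 2000 values drawn from 3 distinct strings.
values = (["apple", "banana", "cherry"] * 667)[:2000]
obj = pd.Series(values, dtype=object)
cat = obj.astype("category")

obj_deep = obj.memory_usage(deep=True)
cat_deep = cat.memory_usage(deep=True)

# deep=True counts the string payloads, so the object Series pays
# per element while the Categorical pays per unique category.
assert cat_deep < obj_deep
```

With plain `.nbytes` the object Series would look artificially small (pointers only), understating how much the Categorical actually saves.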