Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
Documentation problem
The "Memory Usage" section states that the memory usage of an object
dtype is a constant times the length of the data, then provides an example of memory usage for object
and category
Series using the .nbytes
property.
In the example provided, the Series data contains only 3-character strings. The documentation does not address the fact that nbytes only includes the size of the "pointers" to the objects (e.g. 8 bytes * 2000 items) and does not include the memory allocated to the string objects themselves (e.g. 52 bytes * 2000 items). An array of longer strings will take up even more memory.
import sys
import pandas as pd
s = pd.Series(["foo", "bar"] * 1000)
sys.getsizeof(s.iloc[0])
>>> 52
s.nbytes
>>> 16000
s.memory_usage(deep=True)
>>> 120128
pd.Series(["foooo", "barrr"] * 1000).memory_usage(deep=True)
>>> 124128
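For reference, here is a rough breakdown of where the 120128 bytes in the example above come from, assuming a 64-bit CPython build where each object pointer is 8 bytes and sys.getsizeof("foo") is 52:

# 2000 pointers * 8 bytes        -> 16000   (s.nbytes)
# 2000 string objects * 52 bytes -> 104000  (sys.getsizeof per string)
# RangeIndex overhead            -> 128     (s.memory_usage() - s.nbytes)
16000 + 104000 + 128
>>> 120128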
Even though this is in the "Gotchas" section, I think it is important to draw attention to the impact of object size. A Categorical will tend to provide better memory reduction for large objects than for small objects, which may affect whether a user wants to use a Categorical at all. Here's a quick example:
import numpy as np
import pandas as pd

for t in [np.int8, np.int16, np.int32, np.int64]:
    # 64 unique categories, each repeated 16 times
    s = pd.Series([t(i) for i in range(64)] * 16)
    # Negative => Categorical uses more memory; Positive => Categorical uses less memory
    print(s.memory_usage(deep=True) - s.astype("category").memory_usage(deep=True))
>>> -2616
>>> -1592
>>> 456
>>> 4552
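The effect is even more pronounced with strings, since every repeated value is stored as a separate Python object under object dtype. A quick sketch of the same comparison (outputs omitted; exact numbers depend on the Python build), varying a hypothetical string width:

import pandas as pd

for width in [1, 10, 100]:
    # 2 unique strings of `width` characters, repeated 1000 times each
    s = pd.Series(["x" * width, "y" * width] * 1000)
    # The savings from converting to category grow with the size of the objects
    print(width, s.memory_usage(deep=True) - s.astype("category").memory_usage(deep=True))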
Suggested fix for documentation
I recommend the text be changed to:
The memory usage of a Categorical is proportional to the number and size of the categories, plus the length of the data. In contrast, the memory usage of an object dtype is proportional to the size of the objects times the length of the data.
And the code example be changed to use .memory_usage(deep=True) for a more accurate understanding of the memory difference.
s = pd.Series(["foo", "bar"] * 1000)
# object dtype
s.memory_usage(deep=True)
>>> 120128
# category dtype
s.astype("category").memory_usage(deep=True)
>>> 2356
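To reinforce the suggested wording, here is a small sketch (hypothetical values; output omitted) showing that the memory usage of a Categorical grows with the number of unique categories for a fixed data length:

import pandas as pd

for n_unique in [2, 200, 2000]:
    # 2000 values in total, drawn from n_unique distinct labels
    s = pd.Series([f"val{i}" for i in range(n_unique)] * (2000 // n_unique))
    print(n_unique, s.astype("category").memory_usage(deep=True))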