42.Histograms

A histogram is a graphical representation of numerical data distribution, using bars to show the frequency of data points within specified ranges or 'bins'. The document explains how to create histograms using Python's pandas and matplotlib libraries, including customization options for appearance. It also provides examples of generating histograms from datasets, illustrating how to visualize frequency distributions effectively.

Histograms

A histogram represents data that has been grouped into ranges. It is an
accurate method for the graphical representation of the distribution of
numerical data. It is a type of bar plot in which the X-axis represents the
bin ranges and the Y-axis gives the frequency.

Histograms are column charts in which each column represents a range of
values, and the height of a column corresponds to how many values fall in
that range. To make a histogram, the data is sorted into "bins" and the
number of data points in each bin is counted. The height of each column is
then proportional to the number of data points its bin contains. The
df.plot(kind='hist') function selects the bins automatically from the spread
of values in the data; by default it divides the range of the data into 10
equal-width bins.
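The binning-and-counting step described above can be sketched with NumPy's histogram() function, which is brought in here only for illustration. The values are the Height column from the program that follows, taken on its own, and the variable names are illustrative.

```python
import numpy as np

# Height values borrowed from the DataFrame in the program that follows
heights = [60, 61, 63, 65, 61, 60]

# Split the range 60..65 into 10 equal-width bins and count the values in each
counts, edges = np.histogram(heights, bins=10)

print(edges)   # 11 bin edges, from 60.0 to 65.0 in steps of 0.5
print(counts)  # number of values falling into each of the 10 bins
```

Ten bins need eleven edges, and the last bin includes its right-hand edge, so the value 65 is counted in the final bin.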

import pandas as pd
import matplotlib.pyplot as plt

# Sample data: only the numeric columns (Height, Weight) are plotted
data = {'Name': ['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash', 'Nazar'],
        'Height': [60, 61, 63, 65, 61, 60],
        'Weight': [47, 89, 52, 58, 50, 47]}

df = pd.DataFrame(data)
df.plot(kind='hist')
plt.show()
Program 4-9 displays the histogram for all attributes that have numeric
values, i.e., the 'Height' and 'Weight' attributes, as shown in Figure 4.9.
The plot() function calculated the bin values from the height and weight
values provided in the DataFrame. It is also possible to set a value for the
bins parameter, either as a number of bins or as a sequence of bin edges,
for example:
df.plot(kind='hist', bins=20)
df.plot(kind='hist', bins=[18, 19, 20, 21, 22])
df.plot(kind='hist', bins=range(18, 25))
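As a sketch of the bins parameter applied to this data, the example below plots only the Height column with one-unit-wide bins from 60 to 66. The choice of column and of bin edges is an assumption made for illustration, picked so that the edges cover the Height values in this DataFrame.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Height': [60, 61, 63, 65, 61, 60],
                   'Weight': [47, 89, 52, 58, 50, 47]})

# Bin edges 60, 61, ..., 66 give six bins, each one unit wide (illustrative choice)
edges = list(range(60, 67))
ax = df['Height'].plot(kind='hist', bins=edges)

# Each bar is a rectangle whose height is the count for that bin
bar_heights = [bar.get_height() for bar in ax.patches]
print(bar_heights)

plt.show()
```

Reading the bar heights back from the axes is a quick way to confirm that the chosen edges actually capture the data.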
A histogram is a graph showing frequency distributions: the number of
observations within each given interval.
Example: suppose you ask 250 people for their height. You might end up with
a histogram like this:

2 people from 140 to 145cm
5 people from 145 to 150cm
15 people from 151 to 156cm
31 people from 157 to 162cm
46 people from 163 to 168cm
53 people from 168 to 173cm
45 people from 173 to 178cm
28 people from 179 to 184cm
21 people from 185 to 190cm
4 people from 190 to 195cm
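The frequency table above can be checked and drawn with a few lines of Python. This is only a sketch: because the counts are already tallied, it uses plt.bar() rather than a histogram function, and the variable names are illustrative.

```python
import matplotlib.pyplot as plt

# Interval labels and counts copied from the example above
labels = ['140 to 145', '145 to 150', '151 to 156', '157 to 162', '163 to 168',
          '168 to 173', '173 to 178', '179 to 184', '185 to 190', '190 to 195']
counts = [2, 5, 15, 31, 46, 53, 45, 28, 21, 4]

total = sum(counts)                              # total number of people surveyed
most_common = labels[counts.index(max(counts))]  # interval with the most people
print(total, most_common)

# Draw one bar per interval; the bar height is the number of people
plt.bar(labels, counts)
plt.xticks(rotation=45)
plt.xlabel('Height (cm)')
plt.ylabel('Number of people')
plt.tight_layout()
plt.show()
```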

Creating a Histogram
To create a histogram, the first step is to divide the whole range of values
into a series of intervals (bins), and then count the values that fall into
each interval. Bins are consecutive, non-overlapping intervals of a
variable. The matplotlib.pyplot.hist() function computes and draws the
histogram of x.
The following table shows the parameters accepted by the
matplotlib.pyplot.hist() function:

Attribute   Description
x           array or sequence of arrays
bins        optional parameter; an integer, a sequence, or a string
density     optional parameter; a boolean value
range       optional parameter; the lower and upper range of the bins
histtype    optional parameter; the type of histogram to draw [bar, barstacked, step, stepfilled], default is "bar"
align       optional parameter; controls the plotting of the histogram [left, right, mid]
weights     optional parameter; an array of weights with the same dimensions as x
bottom      location of the baseline of each bin
rwidth      optional parameter; the relative width of the bars with respect to the bin width
color       optional parameter; a color or sequence of color specs
label       optional parameter; a string or sequence of strings to match with multiple datasets
log         optional parameter; sets the histogram axis to a log scale
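A few of these parameters can be combined in a single call, as in the sketch below. The data, the seed and the particular parameter values are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 250 values around 170 with standard deviation 10
rng = np.random.default_rng(0)
x = rng.normal(170, 10, 250)

# hist() returns the bar heights, the bin edges and the drawn patches
n, edges, patches = plt.hist(
    x,
    bins=12,            # number of equal-width bins
    range=(140, 200),   # lower and upper limit of the bins
    density=True,       # normalise so that the total bar area is 1
    histtype='bar',
    rwidth=0.9,         # bars take 90% of the bin width
    color='steelblue',
    label='heights',
)
plt.legend()
plt.show()
```

With density=True the bar heights are no longer raw counts; multiplying each height by its bin width and summing gives 1.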
Let's create a basic histogram of some random values. In Matplotlib, we use
the hist() function to create histograms. The hist() function takes an array
of numbers, passed as an argument, and creates a histogram from it.
For simplicity, we use NumPy to randomly generate an array of 250 values
that concentrate around 170 with a standard deviation of 10.

import matplotlib.pyplot as plt
import numpy as np

# 250 normally distributed values: mean 170, standard deviation 10
x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()

The bin edges can also be given explicitly, as in the following program:

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])

# Creating histogram with bin edges at 0, 25, 50, 75 and 100
fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(a, bins=[0, 25, 50, 75, 100])

# Show plot
plt.show()
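The hist() call also returns the counts it computed, which makes it possible to check the bars against the data. A minimal sketch using the same dataset and bin edges as the program above (the variable names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

fig, ax = plt.subplots(figsize=(10, 7))
# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges, bars = ax.hist(a, bins=[0, 25, 50, 75, 100])

for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(low)} to {int(high)}: {int(count)} values")

plt.show()
```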

Customising Histogram:

Taking the same DataFrame as in Program 4-9, let us see how the histogram
can be customised. Let us change the edgecolor, which is the border of each
bar, to green. Also, let us change the line style to ":" and the line width
to 2. Let us try another property called fill, which takes boolean values.
The default, True, means each bar is filled with color, and False means each
bar is left empty. Another property called hatch can be used to fill each
bar with a pattern ('-', '+', 'x', '\\', '*', 'o', 'O', '.'). In Program
4-10, we have used the hatch value "o".

import pandas as pd
import matplotlib.pyplot as plt

data = {'Name': ['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash', 'Nazar'],
        'Height': [60, 61, 63, 65, 61, 60],
        'Weight': [47, 89, 52, 58, 50, 47]}
df = pd.DataFrame(data)

# Green dotted outlines, no fill, bars hatched with small circles
df.plot(kind='hist', edgecolor='green', linewidth=2, linestyle=':',
        fill=False, hatch='o')
plt.show()
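The same styling keywords can also be passed to Matplotlib's hist() directly, without going through pandas. The sketch below applies them to a small illustrative array (the values are borrowed from the earlier NumPy example) and reads the properties back from the first drawn bar.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

fig, ax = plt.subplots()
counts, edges, bars = ax.hist(
    a,
    bins=[0, 25, 50, 75, 100],
    edgecolor='green',   # border colour of each bar
    linewidth=2,
    linestyle=':',       # dotted border
    fill=False,          # leave the bars empty
    hatch='o',           # pattern drawn inside each bar
)

# Read the styling back from the first bar to confirm it was applied
first = bars[0]
print(first.get_hatch(), first.get_fill(), first.get_linewidth())

plt.show()
```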
