In - Gov.transport VHINSC

Uploaded by

harshkumarswarnkar123

Data Visualization using Pyplot

Data visualization means representing data in a graphical format that is easier to understand.
For data visualization in Python, these notes use the Matplotlib library.

Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy
formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python
and IPython shells, the Jupyter notebook, web application servers, and different graphical user interface
toolkits.

Types of plots/charts (to be studied as per syllabus):

Some examples of charts are:
1. Line plots
2. Bar plot
3. Histograms

Working with matplotlib

For working with matplotlib usually we use the following import command:
import matplotlib.pyplot as plt

and frequently we need numpy for creating datasets, so numpy is also imported as follows:
import numpy as np

Matplotlib Object Hierarchy

A plot drawn in matplotlib is a hierarchy of nested Python objects as shown below:

• A Figure object is the outermost container for a matplotlib graphic. It is the overall window/page on
which everything is drawn.
• The Axes is the area on which the data is plotted with functions such as plot() and scatter(). A
Figure can contain multiple Axes, but an Axes object is a part of only one Figure.
• Below the Axes in the hierarchy are smaller objects such as tick marks, individual lines, legends, and
text boxes. Almost every “element” of a chart is a Python object which can be manipulated, all the way
down to the ticks and labels.
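The hierarchy described above can be inspected from code. The following is a minimal sketch, not part of the original notes; the variable names and the sample data are illustrative assumptions, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                              # one Figure containing one Axes
line, = ax.plot([1, 2, 3], [2, 4, 6], label="demo")   # plot() returns a list of line objects

# Walk the hierarchy from the outside in
axes_in_figure = fig.axes        # list of Axes owned by the Figure
lines_in_axes = ax.get_lines()   # line objects drawn on this Axes
owner = ax.figure                # every Axes belongs to exactly one Figure

# Even a single line is an object whose properties can be changed
line.set_linewidth(3)
line.set_color("red")

print(len(axes_in_figure), len(lines_in_axes), owner is fig)
```

Changing a property on the line object has the same effect as passing the matching keyword to plot().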
Parts of a Figure/Plot

Basic Steps involved in drawing any plot

1. Identify the data you want to represent on the plot.

For plots such as a line graph, this means identifying the values that will be represented on the X-axis as well
as on the Y-axis. For pie charts, histograms etc. there will usually be only one dataset.
2. Identify the structure of the plot you want
The next step is identifying which plot will be suitable to represent the data accurately. It can be line plot,
bar plot, histogram etc. Also consider whether you want many sets of data to be represented in the same
plot or to show different plots for different sets of data.
3. Setup the different parameters of the plot
Each plot has different components such as the xticks, yticks, the shape/colour of markers/plots, legend
etc. Set the parameters of the plot.
4. Draw the plot.
Line plots:
A line chart or line plot or line graph or curve chart is a type of chart which displays information as a series of
data points called 'markers' connected by straight line segments.

A line chart is often used to visualize a trend in data over intervals of time – a time series – thus the line is often
drawn chronologically.
Plotting a Line Plot

#1 line plot - basic

import matplotlib.pyplot as plt

#1 setup the data

x=[1,2,3,4,5]
y=[2,4,6,8,10]

#2 setup the parameters for the plot

plt.plot(x,y)

#3 display the plot

plt.show()

Output:

1. For drawing any plot usually the matplotlib.pyplot is imported as plt

2. The plot function plt.plot() draws a line graph by joining each pair of successive (x, y) points with a
straight line.
3. The plot function accepts two datasets: the first is a list of x-coordinates and the second is a list of the
corresponding y-coordinates. The number of values in the x and y lists must be the same.
4. The plt.plot(x,y) is used to draw the graph and the plt.show() function is used to display the plot on the
screen.
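The rule that x and y must have the same number of values can be checked directly. This is a small sketch with made-up lists (y_short is a hypothetical name); when the lengths differ, plot() raises a ValueError.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y_short = [2, 4, 6]            # only three values for five x-coordinates

try:
    plt.plot(x, y_short)       # lengths differ, so this call fails
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("plot() rejected the data:", err)

# With matching lengths the call succeeds and returns the line that was drawn
lines = plt.plot(x, [2, 4, 6, 8, 10])
```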

Variations to the above program:

1. The x- and y-coordinates can be numpy arrays. The advantage of using numpy is that linearly spaced
values of x-coordinates can be generated as follows.
import numpy as np
x=np.linspace(10,120,100)
The above code generates 100 values that are linearly spaced between the end values of 10 and 120.

2. After generating the x-values if the y-values can be represented as an equation in terms of x-variable
then the y values can be generated directly as follows:
y = x*x +2*x + 6
The above code will generate the corresponding y-values for all the x- values that are generated above.
For generating the sin values the following can be used:
y=np.sin(x)

3. To get a smooth curve, the x-values must be spaced closely enough over the range being displayed. If
there are too few points, the straight segments joining them are visible and the curve looks jagged.
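The three points above can be tried out without drawing anything. The sketch below is illustrative only: the helper max_gap_from_sine is a hypothetical name, and it measures how far the straight segments joining the sample points stray from the true sine curve.

```python
import numpy as np

# 100 linearly spaced values; both end values are included
x = np.linspace(10, 120, 100)
step = x[1] - x[0]                 # spacing is (120 - 10) / 99
y = x*x + 2*x + 6                  # one y-value is computed for every x-value

def max_gap_from_sine(num_points):
    """Largest distance between sin(x) and the straight segments joining num_points samples."""
    xs = np.linspace(0, 2*np.pi, num_points)
    dense = np.linspace(0, 2*np.pi, 2001)          # fine grid used only for measuring
    segments = np.interp(dense, xs, np.sin(xs))    # what a line plot would draw
    return float(np.max(np.abs(segments - np.sin(dense))))

coarse_gap = max_gap_from_sine(5)    # few points: visibly jagged
fine_gap = max_gap_from_sine(100)    # many points: looks smooth

print(step, coarse_gap, fine_gap)
```

The gap shrinks quickly as the number of points grows, which is why 100 points are enough for a smooth sine curve.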
#2 line plot - using numpy arrays
import matplotlib.pyplot as plt
import numpy as np

#1 setup the data

x=np.linspace(10,120,100) #generate 100 linearly spaced values between 10 and 120
y = x*x +2*x + 6 #generate the corresponding y-values

plt.plot(x,y) #plot and show

plt.show()

x=np.linspace(0,2*np.pi,100) #generate 100 values between 0 and 2pi

y=np.sin(x)
plt.plot(x,y)
plt.show()

Output:

Setting the parameters on the graph

The plt object can be used to set the different parameters of the plot and then display the plot.
The various parameters that can be set are:
1. plt.xlabel('time') - xlabel sets the x-axis label
2. plt.ylabel('speed') - ylabel sets the y-axis label
3. plt.yticks([5,7,10]) - yticks sets the tick marks that appear on y-axis
4. plt.xticks([1,3,4],['abc','def','ghi']) - xticks sets the ticks to appear on the x-axis at
points [1,3,4] the second parameter changes the corresponding labels to ['abc','def','ghi'].
5. plt.grid() - displays the gridlines
6. plt.legend() - displays the legend using the labels for the corresponding plots. legend is drawn
only after the plot() is called since it takes the labels from plot function.
7. In addition, the plot function can accept a third parameter, the format string -
plt.plot(x,y,'>--c', label='car 1')
The format string (fmt) has the following specification:
fmt = '[marker][line][color]'
All the values are optional and the possible values for marker, line and color are shown below:

marker                             line style                        color
character  description             character  description            character  color
'.'        point marker            '-'        solid line style       'b'        blue
','        pixel marker            '--'       dashed line style      'g'        green
'o'        circle marker           '-.'       dash-dot line style    'r'        red
'v'        triangle_down marker    ':'        dotted line style      'c'        cyan
'^'        triangle_up marker                                        'm'        magenta
'<'        triangle_left marker                                      'y'        yellow
'>'        triangle_right marker                                     'k'        black
'1'        tri_down marker                                           'w'        white
'2'        tri_up marker
'3'        tri_left marker
'4'        tri_right marker
's'        square marker
'p'        pentagon marker
'*'        star marker
'h'        hexagon1 marker
'H'        hexagon2 marker
'+'        plus marker
'x'        x marker
'D'        diamond marker
'd'        thin_diamond marker
'|'        vline marker
'_'        hline marker
Example of format strings:
'b' # blue markers with default shape
'or' # red circles
'-g' # green solid line
'--' # dashed line with default color
'^k:' # black triangle_up markers connected by a dotted line
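A format string is a shorthand for three keyword arguments of plot(). The sketch below, with illustrative variable names, draws one line using the format string '>--c' and another using marker, linestyle and color, then compares the resulting line objects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()

# Format string: marker '>', line style '--', color 'c'
fmt_line, = ax.plot(x, y, '>--c', label='format string')

# The same appearance written out with keyword arguments
kw_line, = ax.plot(x, [v + 1 for v in y], marker='>', linestyle='--',
                   color='c', label='keywords')

same_marker = fmt_line.get_marker() == kw_line.get_marker()
same_style = fmt_line.get_linestyle() == kw_line.get_linestyle()
same_color = to_rgba(fmt_line.get_color()) == to_rgba(kw_line.get_color())

ax.legend()
print(same_marker, same_style, same_color)
```

The keyword form is longer but easier to read when only one of the three properties needs to change.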

#3 line plot - setting parameters

import matplotlib.pyplot as plt

#1 setup the data

x=[1,2,3,4,5]
y=[2,4,6,8,10]

#2 setup the parameters for the plot

plt.xlabel('time') #xlabel - x axis label
plt.ylabel('speed') #ylabel - y axis label
plt.title('speed vs time') # title - title of plot
plt.xticks([1,3,4],['abc','def','ghi']) #xticks - ticks on x-axis
plt.yticks([5,7,10]) #yticks - ticks on y-axis
plt.plot(x,y,'>--c', label='car 1') #using format strings
plt.legend() #display legend using label of plot
plt.grid() #display gridlines

#3 display the plot

plt.show()

Output:
Multiple plots in the same figure
Multiple plots can be drawn in the same figure by any one of the following methods:
1. By using the plt.plot() function multiple times with different data sets and parameters each time.
2. By using a single plot function with multiple parameters for x and y variables as shown below:
plt.plot(x1,y1,'formatstring1', x2,y2,'formatstring2')
When using this method, the labels for both the plots must be passed as a list while calling the legend() function
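A short sketch of the second method, using sample car data as an assumption: a single plot() call with two datasets returns one line object per dataset, and the labels are then supplied to legend() as a list in the same order.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x1 = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
x2 = [3, 7, 9, 12, 15, 17]
y2 = [3, 9, 12, 18, 23, 27]

# One call, two datasets: the result is a list holding two line objects
lines = plt.plot(x1, y1, '>--c', x2, y2, 'o-.g')

# Labels are given in the same order as the datasets were plotted
leg = plt.legend(lines, ['car 1', 'car 2'])
legend_texts = [t.get_text() for t in leg.get_texts()]
print(len(lines), legend_texts)
```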

#3 line plot - multiple plots on same figure

import matplotlib.pyplot as plt

#1 setup the data

x1=[1,2,3,4,5] #dataset for first plot
y1=[2,4,6,8,10]

x2=[3,7,9,12,15,17] #dataset for second plot

y2=[3,9,12,18,23,27]

#2 setup the common parameters

plt.xlabel('time')
plt.ylabel('location')
plt.title('location vs time')
plt.xticks([8,13,16],['loc1','loc2','loc3']) #ticks can be assigned names
plt.yticks([5,12,25])
plt.grid()

#3 parameters for first plot

plt.plot(x1,y1,'>--c', label='car 1') #car 1 dataset is x1,y1

#4 parameters for second plot

plt.plot(x2,y2,'o-.g', label='car 2') #car 2 dataset is x2,y2

plt.legend() #display legend only after plotting all graphs

#5 display the plot

plt.show()

#6 another way of plotting multiple plots

#after show() the earlier figure is finished, so labels, ticks and grid set above do not carry over
#if needed, these parameters must be set again before the next show()
plt.plot(x1,y1,'>--c', x2,y2,'o-.g') #plot does not accept multiple labels
plt.legend(['car 1', 'car 2']) #multiple entries in legend written here
plt.show()

Output:
Plotting from a DataFrame
The plot() function can accept the source data for the x- and y- coordinates from an object having tabular data
such as from a DataFrame. The syntax used for plot() in this case is:
plt.plot('column_name_for_x_axis', 'column_name_for_y_axis',data=DataFrameName, label='labelname')

While using this method, we can also draw multiple plots from the same DataFrame by calling the plot() function
multiple times with different x- and y- columns.
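The same syntax can be tried without a CSV file. This sketch builds a small DataFrame by hand from the first four rows of the reservoir data listed in these notes (an assumption made so the example is self-contained) and plots two columns against 'Date'.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of the reservoir table, typed in by hand for this sketch
df = pd.DataFrame({
    'Date': ['01-01-2018', '01-02-2018', '01-03-2018', '01-04-2018'],
    'POONDI': [1012, 1387, 2011, 1611],
    'REDHILLS': [1585, 1368, 1194, 1660],
})

plt.xticks(rotation=90)   # rotate the date labels

# Column names are looked up in the object passed as data=
poondi_line, = plt.plot('Date', 'POONDI', data=df, label='POONDI')
redhills_line, = plt.plot('Date', 'REDHILLS', data=df, label='REDHILLS')
plt.legend()

print(list(poondi_line.get_ydata()))
```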
#4 line plot - plotting from a DataFrame
import matplotlib.pyplot as plt
import pandas as pd

df1=pd.read_csv('./chennai_reservoir_levels.csv') #forward slash avoids the invalid '\c' escape
#print('df1=\n', df1)

#1 plotting a single plot

plt.xticks(rotation=90) #rotate the xticks
plt.plot('Date','POONDI',data=df1, label='POONDI')
plt.legend()
plt.show()

#2 multiple plots in same Figure from DataFrame

plt.xticks(rotation=90) #rotate the xticks
plt.plot('Date','POONDI', data=df1)
plt.plot('Date','CHOLAVARAM', data=df1)
plt.plot('Date','REDHILLS', data=df1)
plt.plot('Date','CHEMBARAMBAKKAM', data=df1)
plt.legend(['POONDI','CHOLAVARAM','REDHILLS','CHEMBARAMBAKKAM'])
plt.show()

Output:
The source data in file 'chennai_reservoir_levels.csv' is shown below:
Date        POONDI  CHOLAVARAM  REDHILLS  CHEMBARAMBAKKAM
01-01-2018    1012         513      1585             1842
01-02-2018    1387         451      1368             1693
01-03-2018    2011         398      1194             1507
01-04-2018    1611         100      1660             1215
01-05-2018     396          70      1779             1198
01-06-2018     184          68      1427             1214
01-07-2018     132          61      1120              906
01-08-2018      50          26       920              628
01-09-2018      13           1       713              445
01-10-2018      93           8       478              338
01-11-2018     695          20       809              232
01-12-2018     381          40      1102              185
01-01-2019     298          48       941              102
01-03-2019     477          48       520               22
01-04-2019     333          42       301               10
01-05-2019     193          11       125                2
Plotting multiple subplots
We can plot multiple subplots in the same Figure object by dividing the Figure object into subplots as shown
below:
plt.subplots(num_of_rows, num_of_columns, sharex=False, sharey=False)

where
num_of_rows - is the number of rows in the figure
num_of_columns - is the number of columns in the Figure
sharex - set sharex to True if the subplots should share the xticks. When sharex is True, the xtick labels are
shown only on the bottom-most plots. (Default value is False)
sharey - set sharey to True if the subplots should share the yticks. When sharey is True, the ytick labels are
shown only on the leftmost plots. (Default value is False)

The subplots() function returns two values, the figure object and the axes object. The figure object refers to the
entire drawing area. The axes objects can be used in two ways:
Method 1:
f1, (ax1,ax2) = plt.subplots(1,2,sharey=True)
The figure object is divided into 1 row and 2 columns i.e. two subplots are created. The first subplot is
assigned to object ax1 and the second subplot is assigned to ax2
Method 2:
f1, ax = plt.subplots(2,2,sharex=True, sharey=True)
The figure object is divided into 2 rows and 2 columns i.e. four subplots are created. All the four objects
are passed as a matrix to the ax object. Individual axes object is accessed using the matrix notation, i.e.
the first subplot is ax[0,0], the second subplot is ax[0,1], third subplot is ax[1,0], fourth subplot is ax[1,1].

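[Figure: two layouts side by side. Left: f1, (ax1,ax2) = plt.subplots(1,2,sharey=True) gives one row with ax1 and
ax2. Right: f1, ax = plt.subplots(2,2,sharex=True, sharey=True) gives ax[0,0] and ax[0,1] in the top row and
ax[1,0] and ax[1,1] in the bottom row.]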

After getting the individual axes objects we can use the plot() function with the individual axes objects and draw
independent plots in each of the axes object areas.

While using the individual axes objects the following care is to be taken:

1. For setting ticks on the x- and y- axes, instead of plt.xticks() and plt.yticks() use the functions:
ax1.set_xticks() and ax1.set_yticks()
2. For rotating the labels - instead of plt.xticks(rotation=90) use :
ax[1,0].tick_params( axis='x', labelrotation =90)
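The points above can be tried with generated data instead of a file. This is a sketch under the assumption of simple made-up curves; it creates a 2x2 grid, uses matrix notation to reach each Axes, and applies set_xticks() and tick_params() on the Axes objects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)   # made-up data: 1, 2, 3, 4, 5

fig, ax = plt.subplots(2, 2, sharex=True, sharey=True)   # ax is a 2x2 array of Axes

ax[0, 0].plot(x, x, label='x')
ax[0, 1].plot(x, 2 * x, label='2x')
ax[1, 0].plot(x, x ** 2, label='x squared')
ax[1, 1].plot(x, 10 - x, label='10 - x')

# With Axes objects use set_xticks() rather than plt.xticks()
ax[1, 0].set_xticks([1, 3, 5])

# Rotate the x tick labels of the bottom row
ax[1, 0].tick_params(axis='x', labelrotation=90)
ax[1, 1].tick_params(axis='x', labelrotation=90)

# Each subplot gets its own legend
for single_axes in ax.flat:
    single_axes.legend()

bottom_left_ticks = list(ax[1, 0].get_xticks())
print(ax.shape, bottom_left_ticks)
```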
#4 line plot - plotting multiple plots in different subplots
import matplotlib.pyplot as plt
import pandas as pd

df1=pd.read_csv('./chennai_reservoir_levels.csv') #forward slash avoids the invalid '\c' escape
#print('df1=\n', df1)

#1 creating 1x2 grid subplots

f1, (ax1,ax2) = plt.subplots(1,2,sharey=True) #yticks shown only once per row

#2 set the parameters for the two subplots

ax1.set_xticks([]) #with axis objects use set_xticks not xticks
ax2.set_xticks([])
ax1.plot('Date','POONDI', data=df1,label='POONDI')
ax2.plot('Date','CHOLAVARAM', data=df1, label='CHOLAVARAM')
ax1.legend()
ax2.legend()

#3 Display the plot

plt.show()

Output:
#5 line plot - 2x2 subplots, sharing axes, rotating labels
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df1=pd.read_csv('./chennai_reservoir_levels.csv') #forward slash avoids the invalid '\c' escape
#print('df1=\n', df1)

#1 creating 2x2 grid

f2, ax = plt.subplots(2,2,sharey=True,sharex=True) #using single axes object

#2 using array notation to access individual subplot

ax[0,0].plot('Date','POONDI', data=df1, label='POONDI' )
ax[0,1].plot('Date','CHOLAVARAM', data=df1, label='CHOLAVARAM')
ax[1,0].plot('Date','REDHILLS', data=df1, label='REDHILLS')
ax[1,1].plot('Date','CHEMBARAMBAKKAM', data=df1, label='CHEMBARAMBAKKAM')

ax[1,0].set_xticks(np.arange(0,18,5)) #show only 0,5,10,15 th reading dates

#rotate labels in x-axis by 90 degrees

ax[1,0].tick_params( axis='x', labelrotation =90)
ax[1,1].tick_params( axis='x', labelrotation =90)

#display all legends

ax[0,0].legend()
ax[0,1].legend()
ax[1,0].legend()
ax[1,1].legend()
plt.show()

Output:
Bar plot:
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or
lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories
being compared, and the other axis represents a measured value.
Plotting a Bar Graph

The syntax for plotting a Bar Graph is:

plt.bar(x, height, width=0.8, bottom=None, align='center', data=None)

where:
x : sequence of scalars which form the x coordinates of the bars
height: sequence of scalars which form the heights of the bars.
width: scalar or array-like, optional which are the width(s) of the bars (default: 0.8)
bottom: scalar or array-like, optional. The y coordinate(s) of the bars bases (default: 0)
align: {'center', 'edge'}, optional, default: 'center'. It shows the alignment of the bars to the x coordinates:
'center': Center the base on the x positions.
'edge': Align the left edges of the bars with the x positions.
data: If the source of the data is another matrix like structure such as a DataFrame then the name of the object
is mentioned here.

Explanation regarding width of bar graph

If the x-coordinates are numbers then width is in the same unit as the number on the x-axis.

If the x-coordinates are not numbers (e.g. strings), the categories are placed one unit apart, so the width is
the fraction of the distance between one xtick and the next that the bar occupies. For example, consider the
xticks 'a', 'b' and 'c'.

By default width=0.8. This means that each bar occupies 0.8, i.e. 80 percent, of the spacing between adjacent
xticks such as 'a' and 'b', and the remaining 20 percent is the gap between one bar and the next.

The other parameters of the bar graph such as xlabel, ylabel, title, xticks, yticks, legend are same as the line
plots and can be set using the plt object.
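The width rule can be confirmed by reading back the rectangles that bar() draws. The following sketch uses illustrative names and data; it looks at the left edge and width of the bars for string categories and for numeric x values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax_text, ax_num) = plt.subplots(1, 2)

# String categories are placed at positions 0, 1, 2, ... one unit apart
text_bars = ax_text.bar(['a', 'b', 'c'], [35, 60, 75])
left_a = text_bars[0].get_x()                    # left edge of bar 'a'
right_a = left_a + text_bars[0].get_width()      # right edge of bar 'a'
gap = text_bars[1].get_x() - right_a             # empty space before bar 'b'

# Numeric x values: the same width=0.8 is measured in x-axis units
num_bars = ax_num.bar([10, 20, 30], [35, 60, 75])
num_left = num_bars[0].get_x()                   # 10 - 0.8/2

print(left_a, right_a, gap, num_left)
```

With numeric x values that are 10 units apart, bars of width 0.8 look thin, so a larger width is usually passed in that case.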
# 7 Simple bar plot
import matplotlib.pyplot as plt

x1=[10, 20, 30, 40, 50]

y1 = [35, 60, 75, 25, 90]
plt.bar(x1,y1) #width of graph is 0.8 since x1 is numeric data
plt.show()

x2=['a', 'b', 'c', 'd', 'e']

y2 = [35, 60, 75, 25, 90]
plt.bar(x2,y2) #width of graph is 80% since x2 is string data
plt.show()

Output:
# 8 bar plot setting parameters
import matplotlib.pyplot as plt

x=['a', 'b', 'c', 'd', 'e']

y = [35, 60, 75, 25, 90]

plt.xlabel('city') #xlabel - x axis label

plt.ylabel('number of birds') #ylabel - y axis label
plt.title('Birds in Cities') # title - title of plot
plt.yticks(y) #yticks - ticks on y-axis
plt.bar(x,y,label='Birds')

plt.legend() #display legend using label of plot

plt.grid() #display gridlines

plt.show()

Output:
Displaying Bar plot from a DataFrame

Consider the excel file 'product_sales.xlsx' containing the following data and imported into DataFrame df:

Sales Area  Chocolate  Cake  Biscuit
Area A             20     5       20
Area M             30     9       12
Area B             12    12       18
Area N              8     7       23

For displaying data from a DataFrame df, we use the appropriate column names for the x- and y-coordinates and
pass the parameter data=df when using the bar() function.

# 9 bar plot from DataFrame

import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('product_sales.xlsx')

plt.bar('Sales Area','Chocolate',data=df, label='Chocolate')

plt.legend() #display legend using label of plot

plt.grid() #display gridlines

plt.show()

Output:
Displaying grouped bar chart

For displaying grouped data, we change the x-position on the x-axis where the bar for each of the individual
plots should appear. Consider that we want to draw three bar plots in the same figure for 'Area A', 'Area B' and
'Area C' to be shown on the x-axis. On the y-axis we want the data for 'Chocolate', 'Cake' and 'Biscuit' i.e. three
groups to be shown. The steps are as follows:

1. First change the xticks to start from 1,2,3 and so on. The names that are displayed on the tick marks can
be 'Area A', 'Area B' and 'Area C'.

[Figure: x-axis with xticks at 1, 2 and 3, labelled 'Area A', 'Area B' and 'Area C']

2. The distance between any two xticks is 1. This is the maximum width that is available for displaying all
the three bar plots. Select any one width such that the sum of all the widths of the three bar plots is less
than 1. For example if we select the width as 0.2, then since we display three bar plots, the combined
width becomes (0.2 x 3) = 0.6, which is less than 1. The remaining (1 - 0.6) =0.4 is the empty space
between one grouped bar plot and the next grouped bar plot.
3. Next step is rearrange the x-positions of the three individual bar plots so that they are adjacent and do
not overlap.
For doing so, the following method is adopted.

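width of individual bar plot, wd=0.2

[Figure: x-axis with xticks at 1, 2 and 3, labelled 'Area A', 'Area B' and 'Area C'. At each xtick position x,
three bars of width wd are drawn side by side, centered at (x-wd), (x) and (x+wd).]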

a) The centre bar plot of blue colour is centered exactly at the xtick position, (x)
b) The first bar plot of green colour is centered at (x-wd)
c) The third bar plot of red colour is centered at (x+wd)
i.e. for displaying three grouped bar plots, the first bar plot will be plotted at (x-wd) position, second bar plot
will be plotted at (x) position and the third bar plot will be plotted at (x+wd) position.

Similar method is adopted for displaying any number of grouped data. For example for displaying two grouped
bar plots, the first bar plot will be plotted at (x-wd/2) position and the second bar plot will be plotted at
(x+wd/2) position.
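The positioning rule above extends to any number of groups. The helper below, group_offsets, is a hypothetical name introduced for this sketch; it returns the offsets of the bar centres from the xtick, and the sample sales figures are typed in from the table in these notes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def group_offsets(num_groups, wd):
    """Offsets from the xtick position for the centres of num_groups adjacent bars."""
    return (np.arange(num_groups) - (num_groups - 1) / 2) * wd

three = group_offsets(3, 0.2)   # -wd, 0, +wd
two = group_offsets(2, 0.2)     # -wd/2, +wd/2

# Grouped bar chart drawn with the offsets (data typed in by hand)
areas = ['Area A', 'Area M', 'Area B', 'Area N']
sales = {'Chocolate': [20, 30, 12, 8],
         'Cake': [5, 9, 12, 7],
         'Biscuit': [20, 12, 18, 23]}

x = np.arange(1, len(areas) + 1)   # xtick positions 1, 2, 3, 4
wd = 0.2
for offset, (name, values) in zip(group_offsets(len(sales), wd), sales.items()):
    plt.bar(x + offset, values, width=wd, label=name)

plt.xticks(x, areas)
plt.legend()
print(three, two)
```

The total width of a group is num_groups x wd, so wd must be chosen to keep that product below 1.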
# 10 Displaying grouped bar charts
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df=pd.read_excel('product_sales.xlsx')
x=np.arange(1, len(df)+1) #x values are 1,2,3,4…
wd=0.2 #width of bar plot=0.2
xlbl=df['Sales Area']

plt.xticks(x,xlbl)
plt.bar(x-wd,'Chocolate',data=df, label='Chocolate', width=wd)
plt.bar(x,'Cake',data=df, label='Cake', width=wd)
plt.bar(x+wd,'Biscuit',data=df, label='Biscuit', width=wd)

plt.legend() #display legend using label of plot

plt.grid() #display gridlines

plt.show()

Output:
Displaying Stacked bar plots

For displaying stacked bar plot we use the parameter bottom=[y values] while plotting the second bar plot. The
[y values] are the list of y-coordinate values of the first bar plot on which we want to stack the second bar plot.

Consider the following population data of Males and Females contained in the excel file 'population.xlsx'
for which we want to show the stacked bar plot.

city    Male  Female  Total
Area A    20      17     37
Area M    30      19     49
Area B    12      15     27
Area N     8       7     15
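A self-contained sketch of the stacking idea, with the population figures above typed in as plain lists (an assumption made so that no Excel file is needed): the Female bars start where the Male bars end, so the top of each stack equals the Total column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cities = ['Area A', 'Area M', 'Area B', 'Area N']
male = [20, 30, 12, 8]
female = [17, 19, 15, 7]

fig, ax = plt.subplots()
lower = ax.bar(cities, male, label='Male')
upper = ax.bar(cities, female, bottom=male, label='Female')   # stacked on the Male bars

# Top of each stacked bar = where it starts + its own height
tops = [float(bar.get_y() + bar.get_height()) for bar in upper]

ax.legend()
print(tops)
```

For a third layer, the bottom would be the element-wise sum of the first two lists.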

# 11 Displaying stacked bar charts

import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('population.xlsx')

plt.bar('city','Male',data=df, label='Male')
plt.bar('city','Female',data=df, label='Female', bottom='Male')

plt.legend() #display legend using label of plot

plt.grid() #display gridlines

plt.show()

Output:
Horizontal Bar plot

To plot a horizontal bar plot use the barh() with the following syntax:

plt.barh(y, width, height=0.8, align='center', data=None)

where:
y : sequence of scalars which form the y coordinates of the bars
width: scalar or array-like, which are the width(s) of the bars, i.e. their lengths along the x-axis
height: scalar or array-like, optional, which are the height(s) (thickness) of the bars along the y-axis (default: 0.8)
align: {'center', 'edge'}, optional, default: 'center'. It shows the alignment of the bars to the y coordinates:
'center': Center the bars on the y positions.
'edge': Align the bottom edges of the bars with the y positions.
data: If the source of the data is another matrix like structure such as a DataFrame then the name of the object
is mentioned here.
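In barh() the roles of width and height are swapped compared with bar(). The sketch below uses hand-typed sample values and an explicit height of 0.5 (both assumptions) and reads the rectangle of the first bar back.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cities = ['Area A', 'Area M', 'Area B', 'Area N']
male = [20, 30, 12, 8]

bars = plt.barh(cities, male, height=0.5, label='Male')

first = bars[0]
bar_length = first.get_width()      # data value, measured along the x-axis
bar_thickness = first.get_height()  # the height parameter, measured along the y-axis
bottom_edge = first.get_y()         # bar is centred on y position 0

plt.legend()
print(bar_length, bar_thickness, bottom_edge)
```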

# 12 Displaying horizontal bar plots

import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('population.xlsx')

plt.barh('city','Male',data=df, label='Male')

plt.legend() #display legend using label of plot

plt.grid() #display gridlines

plt.show()

Output:
Histogram
Histogram is a graphical display of data using bars of different heights to group numbers into ranges. The height
of each bar shows how many of the data fall in that particular range.

A histogram is an accurate representation of the distribution of numerical data. It differs from a bar graph, in
the sense that a bar graph relates two variables, but a histogram relates only one variable.
How to draw a Histogram
Example 1: For the dataset containing the CGPA of 15 students shown below, draw the histogram using 10 bins:

6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8, 5.3, 3.9, 8.1, 6.1, 2.7

Step 1: Calculate the range of the data set

range = largest value - smallest value = 9.8 - 2.7 = 7.1

Step 2: Divide the range by the number of groups you want and then round up.
For example, to divide the data set into 10 groups (in Python, if the number of bins is not mentioned, 10 is
taken as the default), the width of each group is found by
class-width = range / number of groups = 7.1 / 10 = 0.71
Therefore class width = 0.71

Step 3: Use the class width to create your groups

The smallest value is 2.7 and class-width is 0.71, so first class or first bin is from 2.7 to (2.7 + 0.71) i.e. from 2.7
to 3.41.
The second class or second bin is from 3.41 to (3.41 +0.71) i.e. second bin is 3.41 to 4.12 and so on…
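Steps 1 to 3 can be reproduced in a few lines. This is a sketch with illustrative variable names; it also compares the hand-computed edges with numpy's histogram_bin_edges(), which is assumed here to use the same equal-width rule for an integer number of bins.

```python
import numpy as np

cgpa = [6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8,
        5.3, 3.9, 8.1, 6.1, 2.7]

# Step 1: range of the data set
data_range = max(cgpa) - min(cgpa)

# Step 2: class width for 10 groups
num_bins = 10
class_width = data_range / num_bins

# Step 3: bin edges start at the smallest value and go up in steps of class_width
# (rounded to 2 decimals only to hide floating point noise)
edges = [round(min(cgpa) + i * class_width, 2) for i in range(num_bins + 1)]

# The same edges as computed by numpy
numpy_edges = np.histogram_bin_edges(cgpa, bins=num_bins)

print(round(data_range, 2), round(class_width, 2))
print(edges)
```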

Draw the following table with the classes/bins:

Bin          Classes          Tally  Frequency
First Bin    [2.7 - 3.41)
Second Bin   [3.41 - 4.12)
Third Bin    [4.12 - 4.83)
Fourth Bin   [4.83 - 5.54)
Fifth Bin    [5.54 - 6.25)
Sixth Bin    [6.25 - 6.96)
Seventh Bin  [6.96 - 7.67)
Eighth Bin   [7.67 - 8.38)
Ninth Bin    [8.38 - 9.09)
Tenth Bin    [9.09 - 9.8]

For any class/bin square brackets [ or ] means including that number and round brackets ) means excluding
that number. For example, [2.7 – 3.41) means the range from 2.7 (including 2.7) till 3.41(excluding 3.41).

Only the last bin has both ends with square brackets which mean that both 9.09 and 9.8 will be counted in the
last bin.

[Note: Due to limitations of storing floating point numbers accurately in a computer, values lying exactly on a
boundary are sometimes counted in the adjacent bin. For example, if 5.54 appears in the data, then ideally it
should fall in bin [5.54 - 6.25), but because of inaccuracies in floating point calculation/storage it may be
counted in bin [4.83 - 5.54). Apart from this, Python follows the boundary rules explained above.]
Step 4: Fill the tally column
For each element in the dataset find the correct bin and place a tally mark ( | ) against that bin. The filled in
table is shown below:
Bin          Classes          Tally  Frequency
First Bin    [2.7 - 3.41)     |
Second Bin   [3.41 - 4.12)    ||
Third Bin    [4.12 - 4.83)    |
Fourth Bin   [4.83 - 5.54)    ||
Fifth Bin    [5.54 - 6.25)    |||
Sixth Bin    [6.25 - 6.96)    |
Seventh Bin  [6.96 - 7.67)
Eighth Bin   [7.67 - 8.38)    ||
Ninth Bin    [8.38 - 9.09)    |
Tenth Bin    [9.09 - 9.8]     ||

Step 5: Fill the Frequency column

Count the number of tally marks and fill the frequency column
Bin          Classes          Tally  Frequency
First Bin    [2.7 - 3.41)     |      1
Second Bin   [3.41 - 4.12)    ||     2
Third Bin    [4.12 - 4.83)    |      1
Fourth Bin   [4.83 - 5.54)    ||     2
Fifth Bin    [5.54 - 6.25)    |||    3
Sixth Bin    [6.25 - 6.96)    |      1
Seventh Bin  [6.96 - 7.67)           0
Eighth Bin   [7.67 - 8.38)    ||     2
Ninth Bin    [8.38 - 9.09)    |      1
Tenth Bin    [9.09 - 9.8]     ||     2
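The tally can be checked with a short program. To sidestep floating point trouble at the bin boundaries, this sketch works in hundredths (2.7 becomes 270, class width 0.71 becomes 71); that integer scaling is an assumption that only works here because the data has at most two decimals and the range divides evenly into 10 bins.

```python
cgpa = [6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8,
        5.3, 3.9, 8.1, 6.1, 2.7]

# Work in whole hundredths so that boundary comparisons are exact
scaled = [round(value * 100) for value in cgpa]

num_bins = 10
low, high = min(scaled), max(scaled)
width = (high - low) // num_bins        # 710 // 10 = 71 hundredths

frequency = [0] * num_bins
for value in scaled:
    index = (value - low) // width      # bin is closed on the left, open on the right
    index = min(index, num_bins - 1)    # the largest value belongs to the last bin
    frequency[index] += 1

print(frequency)
```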
Step 6: Draw the histogram
Take the classes in the X-axis and the Frequency on the Y-axis and draw the histogram.
Drawing histogram using hist() function of pyplot
The hist() function can be used to draw a histogram. It accepts only a single dimensional 1D array or a list to
draw the histogram. The other properties of plot object such as setting xlabels, ylabels, xticks, yticks etc. remain
the same as line/bar plots.

In its simplest form histogram is drawn using the command:

plt.hist(data, bins=10)
where-
data - is the list or 1D array containing the data on which histogram is to be created
bins - it can either be a number or a list. If it is a single number it denotes the number of intervals of the
histogram we want. If bins parameters is a list, then the elements of the list are the bin edges. The
number of bin edges must be one greater than the number of intervals needed for the histogram. If bins
parameter is not passed a default value of 10 is taken.
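The two forms of the bins parameter can be compared side by side. In this sketch the list of edges [2, 4, 6, 8, 10] is an illustrative choice; it is picked so that no data value lies exactly on an edge.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cgpa = [6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8,
        5.3, 3.9, 8.1, 6.1, 2.7]

fig, (ax_number, ax_list) = plt.subplots(1, 2)

# bins as a number: 10 equal-width intervals between the smallest and largest value
counts10, edges10, bars10 = ax_number.hist(cgpa, bins=10)

# bins as a list: five edges define four intervals
counts4, edges4, bars4 = ax_list.hist(cgpa, bins=[2, 4, 6, 8, 10])
ax_list.set_xticks(edges4)   # place the xticks on the bin edges

print(list(counts4), list(edges4))
```

hist() returns the frequencies, the bin edges and the drawn bars, in that order.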

Program : Drawing Histogram using pyplot

#18 Histogram using pyplot

import matplotlib.pyplot as plt
import pandas as pd

dict1={ 'student': ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10',
                    's11','s12','s13','s14','s15'],
        'cgpa': [6.1,4.12,8.2,6.4,3.6,9.2,5.5,8.4,6.2,9.8,
                 5.3,3.9,8.1,6.1,2.7],
        'numattempts': [1,2,1,3,1,1,2,1,1,2,1,1,3,1,2] }

df1=pd.DataFrame(dict1)
data = df1['cgpa'] #extract 1D data on which the histogram is to be drawn

plt.xlabel('cgpa range')
plt.ylabel('Number of Students')
plt.title('cgpa range vs Number of students')
plt.grid()
plt.hist(data,bins=10)
plt.show() #bin edges not shown automatically

Output:

When pyplot draws a histogram, the xticks are calculated automatically and are usually not aligned with the bin edges.
Histogram - displaying bin edges correctly
There are two ways to display the bin edges correctly-
1. Create a list of bin edges. Use this list as the value for bins parameter of hist() function and set the xticks
to this list of bin edges.
2. Use the return value of the hist() function. The hist() function returns a tuple. The first element of the
tuple is the numpy array containing the frequencies of the intervals of the histogram. The second element
is the numpy array containing the bin edges of the histogram that was created. (The third element holds
the bar objects that were drawn.) The number of bin edges is one more than the number of frequencies.

#19 Histogram setting bin edges correctly

import matplotlib.pyplot as plt

data= [6.1,4.12,8.2,6.4,3.6,9.2,5.5,8.4,6.2,9.8,5.3,3.9,8.1,6.1,2.7]

#method 1 setting bin edges manually and setting the xticks to bin edges
b1=[3,5.5,7.5,10] #note: 2.7 lies below the first edge (3), so hist() leaves it out
plt.hist(data, bins=b1)
plt.xticks(b1)
plt.grid()
plt.show()

#method2 accessing the bin edges returned by hist function

rtval = plt.hist(data)
print('rtval=', rtval)
print('rtval[0]=', rtval[0]) #contains frequencies
print('rtval[1]=', rtval[1]) #contains bin edges
plt.xticks(rtval[1]) #set xticks to bin edges
plt.grid()
plt.show()
Output:

rtval[0]= [1. 2. 1. 2. 3. 1. 0. 2. 1. 2.]

rtval[1]= [2.7 3.41 4.12 4.83 5.54 6.25 6.96 7.67 8.38 9.09 9.8 ]

9 pages
50SS Us
No ratings yet
50SS Us
24 pages
BLR-RDP 13-09-2023, RDP-BLR 17-9-2023
No ratings yet
BLR-RDP 13-09-2023, RDP-BLR 17-9-2023
5 pages
Development of Refrigeration Oil For Use With R32
No ratings yet
Development of Refrigeration Oil For Use With R32
7 pages
Validations I
No ratings yet
Validations I
4 pages
ASTM-D2859-16-2021-
No ratings yet
ASTM-D2859-16-2021-
3 pages
Parts List: M0-3304E-OE4 M0-3314E-BE6 M0-3316E-DE4
No ratings yet
Parts List: M0-3304E-OE4 M0-3314E-BE6 M0-3316E-DE4
36 pages
DPV 60hz Technical Data DP Pumps PDF
100% (1)
DPV 60hz Technical Data DP Pumps PDF
88 pages
PEDUC 10 Module-3 - FINAL
No ratings yet
PEDUC 10 Module-3 - FINAL
3 pages
Walvis Bay, Namibia: Medium-Voltage Solution To Erongo RED's Paratus Substation
No ratings yet
Walvis Bay, Namibia: Medium-Voltage Solution To Erongo RED's Paratus Substation
2 pages
Functional Project
No ratings yet
Functional Project
4 pages
E-Procurement Government of Karnataka
No ratings yet
E-Procurement Government of Karnataka
93 pages
Microcontroller Inputs and Outputs
No ratings yet
Microcontroller Inputs and Outputs
1 page
Supply Autarsys ESS Medium
No ratings yet
Supply Autarsys ESS Medium
4 pages
CoDaPack An Excel and Visual Basic Based Software
No ratings yet
CoDaPack An Excel and Visual Basic Based Software
9 pages
Thermo Quiz 1 (OAC)
No ratings yet
Thermo Quiz 1 (OAC)
4 pages
Lesson Plan ME410A (R16)
No ratings yet
Lesson Plan ME410A (R16)
2 pages
APAC-Newsletter 1st-Edition March2022
No ratings yet
APAC-Newsletter 1st-Edition March2022
10 pages
Nikola Tesla: Thomas Edison
No ratings yet
Nikola Tesla: Thomas Edison
2 pages
Service Bulletin Letter Index
No ratings yet
Service Bulletin Letter Index
198 pages
Mint Delhi 18-06
No ratings yet
Mint Delhi 18-06
22 pages


Data Visualization using Pyplot

Types of plots/charts (to be studied as per syllabus):

Working with matplotlib

Matplotlib Object Hierarchy

Basic Steps involved in drawing any plot

1. Identify the data you want to represent on the plot.

#1 line plot - basic

#1 setup the data

#2 setup the parameters for the plot

#3 display the plot

1. To draw any plot, matplotlib.pyplot is usually imported as plt

Variations to the above program:

#1 setup the data

plt.plot(x,y) #plot and show

x=np.linspace(0,np.pi*2,100) #generate 100 values between 0 and 2*pi
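As a minimal sketch of how the linspace call above is typically used, the snippet below generates the 100 x values and plots a curve over them. The choice of np.sin as the function to plot is an assumption made for illustration; the notes only show how x is generated.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced values from 0 to 2*pi (both end points are included)
x = np.linspace(0, np.pi * 2, 100)

# Assumed example function: the sine of every x value
y = np.sin(x)

plt.plot(x, y)   # draw the curve
plt.show()       # display the plot
```

Because linspace includes both end points, x[0] is 0 and x[-1] is 2*pi.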

Setting the parameters on the graph

marker line style color

#3 line plot - setting parameters

#1 setup the data

#2 setup the parameters for the plot

#3 display the plot
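The marker, line style and color parameters listed above can be passed as keyword arguments to plt.plot(). The following is a sketch only; the data values, the title and the axis labels are made up for illustration.

```python
import matplotlib.pyplot as plt

# 1 set up the data (made-up values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# 2 set up the parameters for the plot
plt.title("Line plot with parameters")   # hypothetical title
plt.xlabel("x values")
plt.ylabel("y values")
# marker="o" draws a circle at each point, linestyle="--" gives a dashed line
(line,) = plt.plot(x, y, marker="o", linestyle="--", color="green")

# 3 display the plot
plt.show()
```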

#3 line plot - multiple plots on same figure

#1 setup the data

x2=[3,7,9,12,15,17] #dataset for second plot

#2 setup the common parameters

#3 parameters for first plot

#4 parameters for second plot

plt.legend() #display legend only after plotting all graphs

#5 display the plot
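The steps above, for drawing multiple plots on the same figure, might look like the sketch below. Only x2 is taken from the notes; x1, y1, y2 and all labels are made-up values used for illustration.

```python
import matplotlib.pyplot as plt

# 1 set up the data
x1 = [1, 2, 4, 6, 8, 10]        # made-up dataset for first plot
y1 = [2, 4, 3, 7, 6, 9]
x2 = [3, 7, 9, 12, 15, 17]      # dataset for second plot (from the notes)
y2 = [1, 5, 4, 8, 7, 10]        # made-up values

# 2 set up the common parameters
plt.title("Two line plots on one figure")
plt.xlabel("x")
plt.ylabel("y")

# 3 parameters for first plot
plt.plot(x1, y1, color="blue", label="first plot")

# 4 parameters for second plot
plt.plot(x2, y2, color="red", label="second plot")

leg = plt.legend()   # display legend only after plotting all graphs

# 5 display the plot
plt.show()
```

Each call to plt.plot() before plt.show() adds a line to the same Axes, and the label given to each line is what the legend displays.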

#6 another way of plotting multiple plots

#1 plotting a single plot

#2 multiple plots in same Figure from DataFrame

f1, (ax1,ax2) = plt.subplots(1,2,sharey=True)

f1, ax = plt.subplots(2,2,sharex=True, sharey=True)

#1 creating 1x2 grid subplots

#2 set the parameters for the two subplots

#3 Display the plot

#1 creating 2x2 grid

#2 using array notation to access individual subplot

ax[1,0].set_xticks(np.arange(0,18,5)) #show only 0,5,10,15 th reading dates

#rotate labels in x-axis by 90 degrees

#display all legends
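A hedged sketch of the subplot steps outlined above: it creates a 2x2 grid with shared axes, accesses individual subplots with array notation, limits the x ticks, rotates the x labels and shows the legends. The 18 "reading dates" and their values are made up, and tick_params() is one possible way to rotate labels, not necessarily the one used in the original program.

```python
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(18)        # 18 hypothetical reading dates, numbered 0..17
readings = days ** 2        # made-up values

# 1 create a 2x2 grid of subplots that share both axes
f1, ax = plt.subplots(2, 2, sharex=True, sharey=True)

# 2 use array notation ax[row, column] to access an individual subplot
ax[0, 0].plot(days, readings, label="readings")
ax[1, 0].plot(days, readings[::-1], label="reversed readings")

ax[1, 0].set_xticks(np.arange(0, 18, 5))   # ticks only at 0, 5, 10, 15

# rotate labels on the x-axis by 90 degrees
ax[1, 0].tick_params(axis="x", labelrotation=90)

# display legends on every subplot that has a line drawn on it
for a in ax.flat:
    if a.get_lines():
        a.legend()

# 3 display the plot
plt.show()
```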

The syntax for plotting a Bar Graph is:

Explanation regarding width of bar graph

x1=[10, 20, 30, 40, 50]

x2=['a', 'b', 'c', 'd', 'e']

x=['a', 'b', 'c', 'd', 'e']

plt.xlabel('city') #xlabel - x axis label

plt.legend() #display legend using label of plot

Sales Area Chocolate Cake Biscuit

# 9 bar plot from DataFrame

plt.bar('Sales Area','Chocolate',data=df, label='Chocolate')

plt.legend() #display legend using label of plot
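The plt.bar() call above refers to DataFrame columns by name through the data argument. A minimal sketch, assuming a DataFrame with the columns named in the table header above; the area names and sales figures are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sales figures; only the column names come from the notes
df = pd.DataFrame({
    "Sales Area": ["Area A", "Area B", "Area C"],
    "Chocolate": [40, 55, 30],
    "Cake": [25, 35, 45],
    "Biscuit": [60, 20, 50],
})

# With data=df, the strings are looked up as column names of df
bars = plt.bar("Sales Area", "Chocolate", data=df, label="Chocolate")

plt.xlabel("Sales Area")
plt.ylabel("Units sold")   # hypothetical axis label
plt.legend()               # display legend using label of plot
plt.show()
```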

[Figure: grouped bar layout. Width of individual bar plot, wd=0.2; the three bars in each group are placed at (x-wd), (x) and (x+wd).]
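The (x-wd), (x), (x+wd) placement can be sketched as follows: each group sits at an integer position x, and the three bars of the group are shifted left, kept in place and shifted right by the bar width. The area names and sales values are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

areas = ["Area A", "Area B", "Area C"]   # hypothetical group names
chocolate = [40, 55, 30]                 # made-up values
cake = [25, 35, 45]
biscuit = [60, 20, 50]

x = np.arange(len(areas))   # group positions 0, 1, 2
wd = 0.2                    # width of an individual bar

left = plt.bar(x - wd, chocolate, width=wd, label="Chocolate")
middle = plt.bar(x, cake, width=wd, label="Cake")
right = plt.bar(x + wd, biscuit, width=wd, label="Biscuit")

plt.xticks(x, areas)   # put the group name under the middle bar
plt.legend()           # display legend using label of plot
plt.show()
```

Since each bar is exactly wd wide and the centres are wd apart, the bars of a group touch each other without overlapping.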

plt.legend() #display legend using label of plot

city Male Female Total

# 11 Displaying stacked bar charts

plt.legend() #display legend using label of plot
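One common way to draw a stacked bar chart is to pass the heights of the lower bars as the bottom argument of the upper bars. The sketch below assumes city-wise Male and Female counts as in the table header above; the city names and numbers are made up.

```python
import matplotlib.pyplot as plt

cities = ["City A", "City B", "City C"]   # hypothetical city names
male = [20, 35, 30]                       # made-up counts
female = [25, 32, 34]
total = [m + f for m, f in zip(male, female)]

lower = plt.bar(cities, male, label="Male")
# bottom=male makes every Female bar start where the Male bar ends
upper = plt.bar(cities, female, bottom=male, label="Female")

plt.xlabel("city")
plt.ylabel("population")
plt.legend()   # display legend using label of plot
plt.show()
```

The top of each stacked bar then equals the Total for that city.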

plt.barh(y, width, height=0.8, align='center', data=None)
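A short sketch of the barh() signature above: y gives the positions (or category names) along the vertical axis, width gives the length of each bar, and height is the thickness of a bar. The city names and values are made up for illustration.

```python
import matplotlib.pyplot as plt

cities = ["City A", "City B", "City C"]   # hypothetical categories on the y axis
population = [30, 45, 25]                 # made-up bar lengths

bars = plt.barh(cities, population, height=0.8, align="center",
                label="Population")

plt.xlabel("population")   # values now run along the x axis
plt.ylabel("city")
plt.legend()
plt.show()
```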

# 12 Displaying horizontal bar plots

plt.legend() #display legend using label of plot

Step 1: Calculate the range of the data set

Step 3: Use the class width to create your groups

Draw the following table with the classes/bins:

Step 5: Fill the Frequency column
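The manual steps for building the frequency table can be sketched in plain Python. The marks, the choice of 5 classes, and the rule of rounding the class width up to a whole number are assumptions made for illustration; the steps not shown in this outline are filled in one plausible way only.

```python
import math

marks = [12, 25, 37, 41, 48, 53, 59, 64, 66, 72, 78, 85, 91, 99]  # made-up data
num_bins = 5                                                      # assumed number of classes

# Calculate the range of the data set
data_range = max(marks) - min(marks)

# Assumed rule: class width = range / number of classes, rounded up
class_width = math.ceil(data_range / num_bins)

# Use the class width to create the groups (lower limit included, upper excluded)
start = min(marks)
bins = [(start + i * class_width, start + (i + 1) * class_width)
        for i in range(num_bins)]

# Fill the Frequency column
frequency = [0] * num_bins
for m in marks:
    idx = min((m - start) // class_width, num_bins - 1)
    frequency[idx] += 1

for (low, high), f in zip(bins, frequency):
    print(f"{low:>3} - {high:<3} : {f}")
```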

In its simplest form, a histogram is drawn using the command:

Program : Drawing Histogram using pyplot

#18 Histogram using pyplot

dict1={ 'student': ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10',

#19 Histogram setting bin edges correctly

#method2 accessing the bin edges returned by hist function

rtval[0]= [1. 2. 1. 2. 3. 1. 0. 2. 1. 2.]
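plt.hist() returns a tuple whose first element holds the frequency of each bin and whose second element holds the bin edges, which is what rtval[0] above refers to. The sketch below uses made-up marks and 5 bins, and places the x ticks on the returned edges so that the tick labels line up with the bars.

```python
import matplotlib.pyplot as plt

marks = [12, 25, 37, 41, 48, 53, 59, 64, 66, 72, 78, 85, 91, 99]  # made-up data

rtval = plt.hist(marks, bins=5, edgecolor="black")

counts = rtval[0]   # frequency of each bin
edges = rtval[1]    # bin edges; always one more than the number of bins

plt.xticks(edges)   # put the ticks exactly on the bin edges
plt.xlabel("marks")
plt.ylabel("frequency")
plt.show()

print("rtval[0]=", counts)
print("rtval[1]=", edges)
```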
