Session 13, Data Visualization
Data visualization is the practice of understanding data by placing it in a visual context, so that
patterns, trends and correlations that are hard to see in raw numbers become visible. Python offers
several mature plotting libraries, ranging from low-level tools that give full control to high-level
ones with convenient defaults.
A few popular plotting libraries:
• Matplotlib: low level, provides lots of freedom
• Pandas Visualization: easy to use interface, built on Matplotlib
• Seaborn: high-level interface, great default styles
• ggplot: based on R’s ggplot2, uses Grammar of Graphics
• Plotly: can create interactive plots
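The point that pandas visualization is "built on Matplotlib" can be checked directly: `DataFrame.plot` draws onto a Matplotlib Axes and returns it, so the usual `ax.set_...` methods still apply. This is a minimal sketch; the small DataFrame and its column names (`Year`, `Unemployment_Rate`) are illustrative assumptions built from the unemployment figures used later in this session.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small illustrative DataFrame (column names chosen for this example)
econ_df = pd.DataFrame({"Year": [1920, 1930, 1940],
                        "Unemployment_Rate": [9.8, 12, 8]})

# pandas builds the chart with Matplotlib and returns the Axes it drew on
ax = econ_df.plot(x="Year", y="Unemployment_Rate", marker=".", legend=False)

# Because ax is an ordinary Matplotlib Axes, Matplotlib methods work on it
ax.set_title("Unemployment Data")
ax.set_ylabel("Unemployment Rate")
print(type(ax))
plt.show()
```

The one-line pandas call is convenient for quick exploration, while the returned Axes keeps Matplotlib's fine-grained control available.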
matplotlib
matplotlib is the most widely used Python library for producing plots and other 2D data visualizations.
It is well suited to creating publication-quality figures.
It integrates well with IPython, providing a comfortable interactive environment for plotting and
exploring data.
When a plot is displayed in an interactive window, you can zoom in on a section of it and pan around
using the toolbar in the plot window.
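For publication, a figure is usually written to a file rather than only shown on screen. A minimal sketch using `Figure.savefig`; the output file name, figure size and 300 dpi resolution are illustrative choices, not requirements.

```python
import os
import matplotlib.pyplot as plt

# figsize is in inches; dpi (set at save time) controls pixels per inch in the file
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1920, 1930, 1940], [9.8, 12, 8], marker=".")
ax.set_xlabel("Year")
ax.set_ylabel("Unemployment Rate")

# Write a high-resolution PNG; bbox_inches="tight" trims surplus whitespace
out_path = "unemployment.png"  # hypothetical output file name
fig.savefig(out_path, dpi=300, bbox_inches="tight")
print(f"Saved {out_path} ({os.path.getsize(out_path)} bytes)")
```

`savefig` picks the format from the file extension, so a name ending in `.pdf` or `.svg` produces a vector file instead of a raster image.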
Importing Datasets
import pandas as pd
df=pd.read_csv('irisData.csv')
df
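Before plotting, it helps to check what `read_csv` actually loaded. Since `irisData.csv` may not be at hand, this sketch reads a few illustrative rows from an in-memory string instead; the column names (`Sepal.Length`, `Sepal.Width`, `Species`, ...) are an assumption based on the names used in the scatter-plot code that follows, and the real file may differ.

```python
import io
import pandas as pd

# Stand-in for irisData.csv: a few rows in an assumed column layout
csv_text = """Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # exact column names to use when plotting
print(df.head())            # first five rows
```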
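Scatter Plot
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv('irisData.csv')
fig, ax=plt.subplots()
# x = sepal length, y = sepal width: one point per flower
ax.scatter(df['Sepal.Length'], df['Sepal.Width'])
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
plt.show()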
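A common extension is to colour the points by group. This sketch assumes the data has a `Species` column (an assumption about `irisData.csv`) and uses a few illustrative rows from an in-memory string; calling `scatter` once per group gives each species its own colour and legend entry.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for irisData.csv with an assumed "Species" column
csv_text = """Sepal.Length,Sepal.Width,Species
5.1,3.5,setosa
4.9,3.0,setosa
7.0,3.2,versicolor
6.4,3.2,versicolor
6.3,3.3,virginica
5.8,2.7,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
# groupby yields (species name, rows for that species) pairs in sorted order
for species, group in df.groupby("Species"):
    ax.scatter(group["Sepal.Length"], group["Sepal.Width"], label=species)

ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend(title="Species")
plt.show()
```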
Line Chart (1 of 2) Economy Data
In Matplotlib we can create a line chart using the plot method.
Year   Unemployment_Rate
1920   9.8
1930   12
1940   8
# Data in lists
import matplotlib.pyplot as plt
Year = [1920, 1930, 1940]
Unemployment_Rate = [9.8, 12, 8]
fig, ax=plt.subplots()
ax.plot(Year, Unemployment_Rate, color="blue", marker=".")
ax.set_title("Unemployment Data", fontsize=18)
ax.set_xlabel("Year", fontsize=14)
ax.set_ylabel("Unemployment Rate", fontsize=14)
plt.show()
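The `color` and `marker` keywords can also be written as a compact format string: `"b.-"` means blue, point markers, solid line. A sketch reusing the three data points from the table above; it also shows that `plot` returns a list of `Line2D` objects that can be inspected or restyled after creation.

```python
import matplotlib.pyplot as plt

Year = [1920, 1930, 1940]
Unemployment_Rate = [9.8, 12, 8]

fig, ax = plt.subplots()
# "b.-" = blue ("b"), point marker ("."), solid line ("-")
(line,) = ax.plot(Year, Unemployment_Rate, "b.-")

# The returned Line2D can be changed after it has been drawn
line.set_linewidth(2)
print(line.get_marker(), line.get_linestyle(), line.get_linewidth())
plt.show()
```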
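Line Chart (2 of 2) Unemployment Data
# Data in DataFrame (column 0 = Year, column 1 = Unemployment_Rate)
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame([[1920, 9.8], [1930, 12], [1940, 8]])
fig, ax=plt.subplots()
ax.plot(df[0], df[1], color="blue", marker=".")
ax.set_title("Unemployment Data", fontsize=18)
ax.set_xlabel("Year", fontsize=14)
ax.set_ylabel("Unemployment Rate", fontsize=14)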
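Selecting columns by position (`df[0]`, `df[1]`) works, but named columns are easier to read. Matplotlib's `plot` also accepts a `data=` argument, in which case string arguments are looked up as column names. A sketch, assuming the DataFrame is built with named columns (`Year`, `Unemployment_Rate`) instead of numbered ones.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same three data points, but with named columns (an assumption for this sketch)
econ_df = pd.DataFrame({"Year": [1920, 1930, 1940],
                        "Unemployment_Rate": [9.8, 12, 8]})

fig, ax = plt.subplots()
# With data=econ_df, "Year" and "Unemployment_Rate" are resolved as columns
ax.plot("Year", "Unemployment_Rate", data=econ_df, color="blue", marker=".")
ax.set_title("Unemployment Data", fontsize=18)
ax.set_xlabel("Year", fontsize=14)
ax.set_ylabel("Unemployment Rate", fontsize=14)
plt.show()
```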
Bar Chart (1 of 2)
In Matplotlib we can create a bar chart using the bar method.
# Data in DataFrame (column 0 = bar positions, column 1 = bar heights)
import pandas as pd
import matplotlib.pyplot as plt
data_df=pd.DataFrame([[1,5], [2,6], [3,7], [4, 8], [5,5]])
fig, ax = plt.subplots()
ax.bar(data_df[0], data_df[1])
ax.set_title('Bar Plot')
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")
plt.show()
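The same bar chart can be drawn straight from Python lists, and `Axes.bar_label` (available since Matplotlib 3.4) can print each bar's value above it. A sketch; the list names `x` and `heights` are illustrative, and the values repeat those in `data_df`.

```python
import matplotlib.pyplot as plt

# Data in lists: same values as data_df
x = [1, 2, 3, 4, 5]
heights = [5, 6, 7, 8, 5]

fig, ax = plt.subplots()
bars = ax.bar(x, heights)
labels = ax.bar_label(bars)  # one text label per bar showing its height
ax.set_title("Bar Plot")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")
plt.show()
```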
Histogram
In Matplotlib we can create a histogram using the hist method. It splits the range of the data into
bins (10 equal-width bins by default) and counts how many values fall into each bin. For discrete
data, such as the points column from the wine-review dataset, this shows how often each value or
range of values occurs.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist([1,1,1, 3,3, 4,5,5,5,5,5])
ax.set_title('Histogram')
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
plt.show()
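`hist` returns the counts it computed, which makes the binning visible. In this sketch, passing explicit bin edges centred on the integers 1 to 5 (an illustrative choice) gives one bar per value, so each count is exactly how often that value occurs in the list above.

```python
import matplotlib.pyplot as plt

values = [1, 1, 1, 3, 3, 4, 5, 5, 5, 5, 5]

fig, ax = plt.subplots()
# Bin edges 0.5, 1.5, ..., 5.5 put each integer 1..5 in its own bin
edges = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
counts, bin_edges, patches = ax.hist(values, bins=edges, edgecolor="black")

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1]
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {int(n)}")
ax.set_title("Histogram")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
plt.show()
```

With the default 10 bins the bar heights for this data are the same, but some bins are empty and the bars do not line up with the integer values.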
Pie Chart (1 of 2)
In Matplotlib we can create a pie chart using the pie method.
Fruits   Quantity
Apple    25
Banana   40
Cherry   15
Dates    10
#Data in List
import matplotlib.pyplot as plt
fruits = ["Apples", "Bananas", "Cherries", "Dates"]
weight=[25,40,15,10]
fig, ax=plt.subplots()
# autopct prints each slice's percentage; explode pulls the second slice (Bananas) outward
ax.pie(weight, labels=fruits, autopct="%0.2f%%", explode=[0.0, 0.2, 0.0, 0.0])
plt.show()
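When `autopct` is set, `pie` returns three lists: the wedges, the label texts and the percentage texts. Each percentage is that slice's share of the total (here 25 + 40 + 15 + 10 = 90) formatted with the `autopct` string, and each `explode` entry offsets a wedge by that fraction of the radius. A sketch that prints the computed percentages:

```python
import matplotlib.pyplot as plt

fruits = ["Apples", "Bananas", "Cherries", "Dates"]
weight = [25, 40, 15, 10]

fig, ax = plt.subplots()
# With autopct given, pie returns (wedges, label texts, percentage texts)
wedges, texts, autotexts = ax.pie(weight, labels=fruits, autopct="%0.2f%%",
                                  explode=[0.0, 0.2, 0.0, 0.0])

# Each percentage is 100 * weight / sum(weight), e.g. 100 * 25 / 90 = 27.78
for name, pct in zip(fruits, autotexts):
    print(name, pct.get_text())
plt.show()
```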
Pie Chart (2 of 2)
# Data in DataFrame (column 0 = fruit, column 1 = quantity)
import pandas as pd
import matplotlib.pyplot as plt
fruit_df=pd.DataFrame([['Apple', 25],
                       ['Banana', 40],
                       ['Cherry', 15],
                       ['Dates', 10]])
fig, ax=plt.subplots()
ax.pie(fruit_df[1], labels=fruit_df[0], autopct="%0.2f%%",
       explode=[0.0, 0.2, 0.0, 0.0])
plt.show()
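The same chart can also be drawn with pandas' own plotting interface, `Series.plot.pie`, which uses the Series index as the slice labels and passes extra keywords such as `autopct` and `explode` through to Matplotlib's `pie`. A sketch; the Series name `Quantity` is an illustrative choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Quantities as a Series indexed by fruit name; the index becomes the labels
quantity = pd.Series([25, 40, 15, 10],
                     index=["Apple", "Banana", "Cherry", "Dates"],
                     name="Quantity")

fig, ax = plt.subplots()
# pandas forwards autopct and explode to Matplotlib's Axes.pie
quantity.plot.pie(ax=ax, autopct="%0.2f%%", explode=[0.0, 0.2, 0.0, 0.0])
ax.set_ylabel("")  # pandas labels the y-axis with the Series name; hide it
plt.show()
```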
Readings