Data Mining_Week - 6

The document provides an overview of data visualization in Python, emphasizing its importance for understanding data and communicating insights effectively. It outlines key principles for creating effective visualizations, types of charts, and practical examples using Python libraries like Matplotlib and Seaborn. Mastery of these techniques enhances data analysis and presentation skills for data analysts and scientists.

Uploaded by nghiemhoa4895
Copyright © All Rights Reserved

Week 6

Data Visualization in Python Lecture Notes


I. Introduction to Data Visualization
Data visualization is the graphical representation of information and data using
visual elements such as charts, graphs, and maps. It helps reveal patterns,
outliers, and trends that might not be evident in raw data, and it enables
decision-makers to grasp insights quickly and act on them. Visualizing data is
particularly important for conveying findings to stakeholders in a way that is
easy to understand.

Key Benefits of Data Visualization:

Simplifies complex data for better understanding.

Helps identify relationships, trends, and patterns.

Facilitates effective communication of data insights.

Supports exploratory data analysis to help discover new insights.

Python is one of the most popular programming languages for data visualization,
owing to its libraries such as Matplotlib, Seaborn, Plotly, and Bokeh.

II. Principles of Data Visualization

To create effective data visualizations, it is essential to follow certain fundamental
principles. These principles ensure that visualizations communicate insights
effectively and are easy to interpret by the target audience.

1. Clarity: Ensure that the visual representation is clear, without unnecessary
clutter. The key message should be instantly understandable to the audience.

Avoid overloading the chart with too many elements.

Use appropriate labels and legends.

2. Accuracy: Represent the data truthfully without distorting the information.

Use the correct chart type for the data (e.g., bar charts for categorical
data, line charts for time trends).

Ensure scales are accurate and do not mislead the viewer.

3. Simplicity: The visualization should be as simple as possible while effectively
conveying the message.

Avoid using unnecessary colors or excessive visual elements.

Highlight the key information you want the audience to focus on.

4. Visual Hierarchy: Direct the viewer's eye to the most critical information.

Use different sizes, colors, or annotations to emphasize key points.

Order elements in a way that guides the viewer through the data logically.

5. Context: Provide adequate context for the data being visualized.

Include titles, labels, and units of measurement to explain what is being
presented.

Use comparisons to help the audience understand the significance of the
data (e.g., comparing current sales to previous years).

6. Choosing the Right Chart Type: Select the chart type that best suits the data
and the message you want to convey.

Example: Use a histogram for frequency distribution or a scatter plot for
relationships between two numerical variables.
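
Several of the principles above can be applied together in one small chart. The following is a minimal sketch, assuming hypothetical quarterly revenue figures (the data, labels, and the choice of a bar chart are illustrative only). It starts the value axis at zero (accuracy), mutes every bar except the one being emphasized (visual hierarchy, simplicity), and adds a title, axis labels, and units (context).

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue, for illustration only
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [120, 135, 128, 160]  # thousand USD

# Visual hierarchy: gray out every bar except the highest one
peak = revenue.index(max(revenue))
colors = ["lightgray"] * len(revenue)
colors[peak] = "tab:blue"

fig, ax = plt.subplots()
x = range(len(quarters))
bars = ax.bar(x, revenue, color=colors, tick_label=quarters)

# Accuracy: start the value axis at zero so bar heights stay proportional
ax.set_ylim(0, max(revenue) * 1.2)

# Context: title, axis labels, and units
ax.set_title("Revenue by Quarter (hypothetical data)")
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue (thousand USD)")

# Clarity: annotate only the key point instead of labeling every bar
ax.text(peak, revenue[peak] + 4, "Highest quarter", ha="center")

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Truncating the y-axis (for example, starting it at 110) would exaggerate the differences between quarters, which is the kind of distortion the accuracy principle warns against.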

III. Types of Data Visualization Charts in Python and Use Cases

1. Line Chart

Use Case: Ideal for displaying trends over time (e.g., stock prices,
temperature change).

Example: Plotting daily sales over the course of a month to observe
trends.

2. Bar Chart

Use Case: Useful for comparing quantities across categories (e.g.,
revenue by region, product popularity).

Example: Showing total sales per product category in a given quarter.

3. Histogram

Use Case: Suitable for showing the frequency distribution of a variable
(e.g., distribution of customer ages).

Example: Plotting the distribution of customer ages to determine the target
audience.

4. Scatter Plot

Use Case: Effective for showing relationships between two variables (e.g.,
height vs weight, price vs demand).

Example: Analyzing the relationship between advertising expenditure and
sales.

5. Box Plot

Use Case: Useful for visualizing the spread and identifying outliers within
a dataset (e.g., analyzing salaries).

Example: Displaying the distribution of salaries across different
departments.

6. Heatmap

Use Case: Great for showing the intensity of relationships between
multiple variables (e.g., correlation between features).

Example: Visualizing the correlation matrix to find features that are highly
correlated.

7. Pie Chart

Use Case: Useful for displaying proportions (e.g., market share, budget
allocation).

Example: Showing the percentage of expenses allocated to different
business departments.

8. Pair Plot

Use Case: Used for visualizing pairwise relationships between different
features in a dataset.

Example: Exploring relationships among several numerical features in a
dataset to determine how they interact.

IV. Using Matplotlib and Seaborn

Matplotlib and Seaborn are two of the most commonly used Python libraries for
data visualization. They offer a variety of visualization types and tools for
customizing charts.

1. Matplotlib

Introduction: Matplotlib is a foundational plotting library in Python. It is
highly customizable and can create static, animated, and interactive
visualizations.

Basic Plotting: The pyplot module provides functions similar to MATLAB,
allowing for quick and easy chart generation.

import matplotlib.pyplot as plt


x = [1, 2, 3, 4, 5]
y = [2, 5, 7, 1, 6]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Basic Line Chart')
plt.show()

2. Seaborn

Introduction: Seaborn is built on top of Matplotlib and provides a high-level
interface for creating attractive statistical graphics. It simplifies
complex visualizations and integrates well with Pandas.

Basic Plotting: Seaborn's default styles and color palettes make it a
powerful tool for creating visually appealing plots.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Example dataset
data = {'Age': [22, 25, 29, 31, 35],
        'Salary': [35000, 42000, 48000, 50000, 60000]}
df = pd.DataFrame(data)

# Scatter plot of two DataFrame columns, referenced by name
sns.scatterplot(x='Age', y='Salary', data=df)
plt.title('Scatter Plot of Age vs Salary')
plt.show()
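
The Pandas integration mentioned above goes beyond passing x and y column names. A minimal sketch, assuming a hypothetical "Department" column added to the example data (all values are made up for illustration): the hue parameter of sns.scatterplot colors each point by the value in that column and builds the legend automatically.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical employee records, for illustration only
df = pd.DataFrame({
    "Age": [22, 25, 29, 31, 35, 38],
    "Salary": [35000, 42000, 48000, 50000, 60000, 64000],
    "Department": ["Sales", "IT", "Sales", "IT", "Sales", "IT"],
})

# Column names are read straight from the DataFrame;
# hue= colors the points by the values in the "Department" column
ax = sns.scatterplot(data=df, x="Age", y="Salary", hue="Department")
ax.set_title("Age vs Salary by Department (hypothetical data)")
# plt.show()  # uncomment to display the figure
```

Axis labels default to the column names, so the chart carries some context even before any manual labeling.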

3. Formatting Charts

Customizing Charts with Matplotlib: You can customize plots by adding
titles, labels, legends, and gridlines, and by modifying color schemes.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 7, 1, 6]

# Red dashed line with a circular marker at each data point
plt.plot(x, y, color='red', linestyle='--', marker='o')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Chart')
plt.grid(True)
plt.legend(['Data Line'])
plt.show()

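Seaborn Customizations: Seaborn offers easy-to-use functions for
setting themes, color palettes, and styles.

# Reuses df, sns, and plt from the Seaborn example above
sns.set(style='whitegrid', palette='muted')
# Each Age value occurs once here, so every box collapses to a single line
sns.boxplot(x='Age', y='Salary', data=df)
plt.title('Box Plot with Custom Style')
plt.show()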
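
A box plot is more informative when each group contains several observations. The following is a minimal sketch of the "salaries across departments" use case from Section III, assuming hypothetical salary data (the department names and values are invented for illustration). It uses sns.set_theme, which accepts the same style and palette arguments as sns.set.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical salaries, for illustration only
df = pd.DataFrame({
    "Department": ["Sales"] * 5 + ["IT"] * 5,
    "Salary": [35000, 38000, 40000, 42000, 75000,
               48000, 50000, 52000, 55000, 58000],
})

sns.set_theme(style="whitegrid", palette="muted")

# One box per department; points beyond 1.5 x IQR from the box are drawn as outliers
ax = sns.boxplot(data=df, x="Department", y="Salary")
ax.set_title("Salary Distribution by Department (hypothetical data)")
ax.set_ylabel("Salary (USD)")
# plt.show()  # uncomment to display the figure
```

In this made-up data the 75000 salary in Sales lies well above the upper whisker limit, so it appears as a separate outlier point.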

V. Creating Different Visualization Types with Matplotlib and Seaborn

1. Creating a Line Chart with Matplotlib

import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points between 0 and 10
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, label='Sine Wave', color='blue', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.title('Line Chart Example - Sine Wave')
plt.legend()
plt.grid(True)
plt.show()

2. Creating a Histogram with Seaborn

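import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# 1000 samples from a standard normal distribution
data = np.random.randn(1000)
# kde=True overlays a kernel density estimate on the histogram
sns.histplot(data, kde=True, bins=20)
plt.title('Histogram with Density Plot')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()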

3. Creating a Heatmap with Seaborn

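import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# 10 x 12 grid of random values in [0, 1); annot=True prints each value in its cell
data = np.random.rand(10, 12)
sns.heatmap(data, annot=True, cmap='coolwarm')
plt.title('Heatmap Example')
plt.show()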
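
The heatmap above uses random values. The correlation-matrix use case from Section III can be sketched as follows, assuming a small synthetic dataset (the column names and the way "sales" is generated from "ad_spend" are assumptions made only so that one pair of features is visibly correlated).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 200
ad_spend = rng.normal(50, 10, n)
df = pd.DataFrame({
    "ad_spend": ad_spend,
    "sales": 3 * ad_spend + rng.normal(0, 10, n),  # built to track ad_spend
    "noise": rng.normal(0, 1, n),                  # generated independently
})

# Pairwise Pearson correlations between the numeric columns
corr = df.corr()

# Fix the color scale to [-1, 1] so colors are comparable across charts
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 vmin=-1, vmax=1, square=True)
ax.set_title("Correlation Matrix (synthetic data)")
# plt.show()  # uncomment to display the figure
```

Pinning vmin and vmax to the full correlation range keeps a weak correlation from being rendered in a strong color just because it happens to be the largest value in the matrix.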

4. Creating a Pair Plot with Seaborn

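import matplotlib.pyplot as plt
import seaborn as sns

# load_dataset fetches the iris data online on first use (then caches it)
iris = sns.load_dataset('iris')
sns.pairplot(iris, hue='species', markers=['o', 's', 'D'])
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()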

5. Creating a Bar Chart with Matplotlib

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [10, 25, 30, 20]
# Orange bars with a black outline
plt.bar(categories, values, color='orange', edgecolor='black')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart Example')
plt.show()
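
Section III also lists the pie chart, which the examples above do not cover. A minimal sketch with Matplotlib, assuming hypothetical expense shares per department (names and numbers are illustrative only):

```python
import matplotlib.pyplot as plt

# Hypothetical expense allocation, for illustration only
departments = ["Engineering", "Marketing", "Operations", "HR"]
expenses = [45, 25, 20, 10]  # thousand USD

fig, ax = plt.subplots()
# autopct prints each slice's share of the total as a percentage
ax.pie(expenses, labels=departments, autopct="%1.1f%%", startangle=90)
ax.axis("equal")  # equal aspect ratio keeps the pie circular
ax.set_title("Expense Allocation by Department (hypothetical data)")
# plt.show()  # uncomment to display the figure
```

Pie charts work best with a handful of categories. With many small slices, a bar chart is usually easier to read.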

Data visualization is an essential skill for data analysts and data scientists. Using
libraries such as Matplotlib and Seaborn, you can create a variety of visualizations
to better understand data and communicate findings effectively. Mastering these
techniques enhances exploratory data analysis and improves your ability to present
complex information clearly to stakeholders and decision-makers. Following the
principles of data visualization ensures that your charts are not only visually
appealing but also accurate and insightful.
