Data Mining_Week - 6
Python is one of the most popular programming languages for data visualization,
owing to its libraries such as Matplotlib, Seaborn, Plotly, and Bokeh.
II. Principles of Data Visualization
effectively and are easy for the target audience to interpret.
Use the correct chart type for the data (e.g., bar charts for categorical
data, line charts for time trends).
Highlight the key information you want the audience to focus on.
4. Visual Hierarchy: Direct the viewer's eye to the most critical information.
Order elements in a way that guides the viewer through the data logically.
6. Choosing the Right Chart Type: Select the chart type that best suits the data
and the message you want to convey.
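As a rough illustration of matching chart type to data, the sketch below draws categorical counts as a bar chart and a monthly series as a line chart. The category names, month labels, and numbers are made up for illustration and are not from the lecture; the Agg backend and the output file name are assumptions chosen so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up illustrative data (hypothetical values)
units_by_category = {"Books": 120, "Games": 85, "Music": 60}
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_visits = [310, 340, 365, 390, 420, 455]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical data: one bar per category makes the groups easy to compare
ax_bar.bar(list(units_by_category.keys()), list(units_by_category.values()),
           color="steelblue")
ax_bar.set_title("Units Sold by Category")
ax_bar.set_ylabel("Units")

# Time trend: connected points show the direction of change over time
ax_line.plot(months, monthly_visits, marker="o", color="darkorange")
ax_line.set_title("Monthly Visits")
ax_line.set_ylabel("Visits")

fig.tight_layout()
fig.savefig("chart_types.png")  # hypothetical output file name
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `savefig` displays the figure directly.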
1. Line Chart
Use Case: Ideal for displaying trends over time (e.g., stock prices,
temperature change).
2. Bar Chart
3. Histogram
4. Scatter Plot
Use Case: Effective for showing relationships between two variables (e.g.,
height vs weight, price vs demand).
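A minimal sketch of the height vs weight case, assuming synthetic data: the values are generated with a fixed seed and an assumed linear relationship plus noise, so they only illustrate how the plot is built. The Agg backend and file name are likewise assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (hypothetical): weight follows height linearly, plus noise
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=100)
weight_kg = 0.9 * (height_cm - 100) + rng.normal(0, 5, size=100)

# Pearson correlation summarizes the strength of the linear relationship
r = np.corrcoef(height_cm, weight_kg)[0, 1]

fig, ax = plt.subplots()
ax.scatter(height_cm, weight_kg, alpha=0.7)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title(f"Height vs Weight (r = {r:.2f})")
fig.savefig("scatter_example.png")  # hypothetical output file name
```

Each point is one observation; an upward-sloping cloud, as here, suggests a positive relationship between the two variables.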
5. Box Plot
Use Case: Useful for visualizing the spread and identifying outliers within
a dataset (e.g., analyzing salaries).
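The salary case can be sketched as follows, assuming a small made-up sample. Matplotlib's default box plot draws whiskers at 1.5 times the interquartile range (IQR) and marks points beyond them as outliers; the sketch computes the same fences by hand so the flagged point can be checked. The salary values and file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up salaries in thousands (hypothetical); one value is far from the rest
salaries = np.array([42, 45, 47, 50, 52, 55, 58, 60, 63, 150])

# The 1.5 * IQR rule used by the default box plot whiskers
q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

fig, ax = plt.subplots()
bp = ax.boxplot(salaries)  # points beyond the whiskers are drawn as fliers
ax.set_ylabel("Salary (thousands)")
ax.set_title("Salary Spread and Outliers")
fig.savefig("boxplot_example.png")  # hypothetical output file name

print("Outliers:", outliers.tolist())
```

The box spans the first to third quartile, the line inside it marks the median, and the isolated point above the upper whisker is the salary flagged as an outlier.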
6. Heatmap
Example: Visualizing the correlation matrix to find features that are highly
correlated.
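A minimal sketch of the correlation-matrix example, assuming a synthetic DataFrame: the column names (`area`, `rooms`, `age`), the generated values, and the 0.8 threshold are all illustrative assumptions, with `rooms` deliberately constructed to track `area`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic features (hypothetical): 'rooms' is built to follow 'area' closely
rng = np.random.default_rng(42)
n = 200
area = rng.normal(100, 20, n)
df = pd.DataFrame({
    "area": area,
    "rooms": area / 25 + rng.normal(0, 0.3, n),
    "age": rng.normal(30, 10, n),  # generated independently of the others
})

corr = df.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Matrix")
fig.savefig("correlation_heatmap.png")  # hypothetical output file name

# List each feature pair once if its absolute correlation exceeds the threshold
threshold = 0.8  # assumed cut-off for "highly correlated"
high_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale centered on zero, so strongly positive and strongly negative correlations stand out symmetrically.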
7. Pie Chart
Use Case: Useful for displaying proportions (e.g., market share, budget
allocation).
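The budget allocation case might look like the sketch below. The budget items and amounts are made up for illustration, and the Agg backend and file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up monthly budget (hypothetical amounts)
budget = {"Rent": 1200, "Food": 600, "Transport": 300, "Savings": 900}

# Each slice is drawn in proportion to its share of the total
total = sum(budget.values())
percentages = {item: 100 * amount / total for item, amount in budget.items()}

fig, ax = plt.subplots()
ax.pie(list(budget.values()), labels=list(budget.keys()),
       autopct="%1.1f%%", startangle=90)
ax.set_aspect("equal")  # keep the pie circular
ax.set_title("Budget Allocation")
fig.savefig("pie_example.png")  # hypothetical output file name

print(percentages)
```

Pie charts tend to read best with a handful of slices; with many small categories, a bar chart usually makes the comparison easier.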
8. Pair Plot
Matplotlib and Seaborn are two of the most commonly used Python libraries for
data visualization. They offer a variety of visualization types and tools for
customizing charts.
1. Matplotlib
2. Seaborn
complex visualizations and integrates well with Pandas.
3. Formatting Charts
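import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with 'Age' and 'Salary' columns
sns.set_theme(style='whitegrid', palette='muted')  # global style and palette
sns.boxplot(x='Age', y='Salary', data=df)
plt.title('Box Plot with Custom Style')
plt.show()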
1. Creating a Line Chart with Matplotlib
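import matplotlib.pyplot as plt
import numpy as np

# Sample 100 evenly spaced points on [0, 10] and evaluate the sine at each
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, label='Sine Wave', color='blue', linewidth=2)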
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.title('Line Chart Example - Sine Wave')
plt.legend()
plt.grid(True)
plt.show()
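2. Creating a Heatmap with Seaborn
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# 10 x 12 grid of random values in [0, 1)
data = np.random.rand(10, 12)
sns.heatmap(data, annot=True, cmap='coolwarm')
plt.title('Heatmap Example')
plt.show()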
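3. Creating a Pair Plot with Seaborn
import matplotlib.pyplot as plt
import seaborn as sns
# load_dataset fetches the iris sample data (downloaded on first use)
iris = sns.load_dataset('iris')
sns.pairplot(iris, hue='species', markers=['o', 's', 'D'])
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()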
Data visualization is an essential skill for data analysts and data scientists. Using
libraries such as Matplotlib and Seaborn, you can create a variety of visualizations
to better understand data and communicate findings effectively. Mastery of data
visualization techniques not only enhances exploratory data analysis but also
improves the ability to present complex information clearly to stakeholders and
decision-makers. Following the principles of data visualization ensures that your
charts are not only visually appealing but also accurate and informative.