Data Exploration and Visualization - Laboratory Practical Experiments

All the best

Uploaded by

madhanraj143nve

www.BrainKart.com

VARUVAN VADIVELAN INSTITUTE OF TECHNOLOGY

DHARMAPURI – 636701

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3301

DATA EXPLORATION AND VISUALIZATION LABORATORY

https://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes&hl=en_IN

PRACTICAL EXERCISES:

1. Install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.
2. Perform exploratory data analysis (EDA) on datasets such as an email data set. Export all your emails as a
dataset, import them into a pandas DataFrame, visualize them, and derive insights from the data.
3. Work with NumPy arrays, pandas DataFrames, and basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample
data sets and visualize them.
5. Perform time series analysis and apply various visualization techniques.
6. Perform data analysis and representation on a map using various map data sets with mouse rollover
effect, user interaction, etc.
7. Build cartographic visualizations for multiple datasets involving various countries of the world,
and states and districts in India.
8. Perform EDA on the Wine Quality data set.
9. Use a case study on a data set, apply the various EDA and visualization techniques, and present an
analysis report.


LIST OF EXPERIMENTS
S.NO EXPERIMENTS PAGE NO MARKS SIGNATURE

EX NO: 1
DATE: INSTALLING DATA ANALYSIS AND VISUALIZATION TOOL

AIM:
To write the steps to install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.

PROCEDURE:
R:
• R is a programming language and software environment specifically designed for statistical
computing and graphics.
Windows:
• Download R from the official website: https://cran.r-project.org/mirrors.html
• Run the installer and follow the installation instructions.
macOS:
• Download R for macOS from the official website: https://cran.r-project.org/mirrors.html
• Open the downloaded file and follow the installation instructions.
Linux:
• You can typically install R using your distribution's package manager. For example, on Ubuntu, you
can use the following command:
sudo apt-get install r-base
Python:
• Python is a versatile programming language widely used for data analysis. You can install Python
and its data analysis libraries using a package manager such as conda or pip.
Windows:
• Download Python from the official website: https://www.python.org/downloads/windows/
• Run the installer, and make sure to check the "Add Python to PATH" option during installation.
• You can install data analysis libraries such as NumPy, pandas, and Matplotlib using pip.
macOS:
• macOS may come with a version of Python pre-installed, but it is often outdated or missing, so
install a current release from the official website if needed. You can install additional packages
using pip, or set up a virtual environment using Anaconda (conda).
Linux:
• Python is often pre-installed on Linux. Use your distribution's package manager to install Python if
it's not already installed. You can also use conda or pip to manage Python packages.
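After installing Python, a quick check can confirm that the libraries used in the later experiments are importable. The following is a minimal sketch using only the standard library; the helper name check_packages is hypothetical and not part of the original manual.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in check_packages(["numpy", "pandas", "matplotlib"]).items():
        status = "installed" if found else f"missing (try: pip install {name})"
        print(f"{name}: {status}")
```

Any package reported as missing can then be installed with pip or conda as described above.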


Tableau Public:
• Tableau Public is a free version of Tableau for creating and sharing interactive data visualizations.
• Go to the Tableau Public website: https://public.tableau.com/s/gallery
• Download and install Tableau Public by following the instructions on the website.
Power BI:
• Power BI is a business analytics service by Microsoft for creating interactive reports and dashboards.
• Go to the Power BI website: https://powerbi.microsoft.com/en-us/downloads/
• Download and install Power BI Desktop, which is the tool for creating reports and dashboards.
Note: Installation steps may change over time, so check the official websites for the most
up-to-date instructions and download links. System requirements also vary, so make sure your
computer meets the necessary specifications for these tools.

Ex no: 2
Date: Exploratory Data Analysis (EDA) on Datasets

Aim:
To perform exploratory data analysis (EDA) on datasets such as an email data set.
Procedure:
Exploratory Data Analysis (EDA) on email datasets involves importing the data, cleaning it, visualizing
it, and extracting insights. The following step-by-step guide shows how to perform EDA on an email
dataset using Python and Pandas.
1. Import Necessary Libraries:
Import the required Python libraries for data analysis and visualization.
2. Load Email Data:
Assuming you have a folder containing email files (e.g., .eml files), you can use the email library to
parse and extract the email contents.
3. Data Cleaning:
Depending on your dataset, you may need to clean and preprocess the data. Common
cleaning steps include handling missing values, converting dates to datetime format, and removing
duplicates.
4. Data Exploration:
Now, you can start exploring the dataset using various techniques. Here are some common EDA
tasks:
Basic Statistics:
Get summary statistics of the dataset.
Distribution of Dates:
Visualize the distribution of email dates.
5. Word Cloud for Subject or Message:
Create a word cloud to visualize common words in email subjects or messages.
6. Top Senders and Recipients:
Find the top email senders and recipients.
Depending on your dataset, you can explore further, analyze sentiment, perform network analysis, or
any other relevant analysis to gain insights from your email data.
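The loading, cleaning, and top-sender steps above can be sketched with the standard-library email package. This is only an illustration: the three messages below are made up to stand in for the contents of .eml files, and the helper names parse_email and load_emails are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Made-up messages standing in for the raw text of .eml files.
RAW_EMAILS = [
    "From: Alice <alice@example.com>\nTo: bob@example.com\n"
    "Subject: Meeting\nDate: Mon, 01 Jan 2024 10:00:00 +0000\n\nSee you at ten.",
    "From: Alice <alice@example.com>\nTo: carol@example.com\n"
    "Subject: Report\nDate: Tue, 02 Jan 2024 09:30:00 +0000\n\nReport attached.",
    "From: bob@example.com\nTo: alice@example.com\n"
    "Subject: Re: Meeting\nDate: Mon, 01 Jan 2024 11:15:00 +0000\n\nConfirmed.",
]


def parse_email(raw):
    """Extract sender, recipient, subject, date, and body from one raw message."""
    msg = message_from_string(raw)
    return {
        "sender": parseaddr(msg["From"])[1],
        "recipient": parseaddr(msg["To"])[1],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),  # date string -> datetime
        "body": msg.get_payload().strip(),
    }


def load_emails(raw_messages):
    """Parse messages and drop duplicates (same sender, subject, and date)."""
    seen, rows = set(), []
    for raw in raw_messages:
        row = parse_email(raw)
        key = (row["sender"], row["subject"], row["date"])
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


# One duplicate is appended on purpose to exercise the cleaning step.
rows = load_emails(RAW_EMAILS + RAW_EMAILS[:1])
top_senders = Counter(row["sender"] for row in rows).most_common(2)
emails_per_day = Counter(row["date"].date().isoformat() for row in rows)
print(top_senders)
print(dict(emails_per_day))
```

The list of dictionaries can be passed directly to pd.DataFrame(rows) to continue the analysis in pandas.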

Program:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
# Load the dataset (raw string so the backslashes are not treated as escapes)
df = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\Emaildataset.csv')
# Display basic information about the dataset
print(df.info())
# Display the first few rows of the dataset
print(df.head())
# Descriptive statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
# Visualize the distribution of numerical variables
sns.pairplot(df)
plt.show()
# Visualize the distribution of the categorical 'label' column (ham/spam)
sns.countplot(x='label', data=df)
plt.show()
# Correlation matrix for numerical variables only
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
# Word cloud for the email text (the 'text' column of this dataset)
text_data = ' '.join(df['text'].astype(str))
wordcloud = WordCloud(width=800, height=400, random_state=21,
                      max_font_size=110).generate(text_data)
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

OUTPUT:
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 5171 non-null int64
1 label 5171 non-null object
2 text 5171 non-null object
3 label_num 5171 non-null int64
dtypes: int64(2), object(2)
memory usage: 161.7+ KB
None
Unnamed: 0 label text label_num
0 605 ham Subject: enron methanol ; meter # : 988291\r\n... 0

1 2349 ham Subject: hpl nom for january 9 , 2001\r\n( see... 0
2 3624 ham Subject: neon retreat\r\nho ho ho , we ' re ar... 0
3 4685 spam Subject: photoshop , windows , office . cheap ... 1
4 2030 ham Subject: re : indian springs\r\nthis deal is t... 0
Unnamed: 0 label_num
count 5171.000000 5171.000000
mean 2585.000000 0.289886
std 1492.883452 0.453753
min 0.000000 0.000000
25% 1292.500000 0.000000
50% 2585.000000 0.000000
75% 3877.500000 1.000000
max 5170.000000 1.000000
Unnamed: 0 0
label 0
text 0
label_num 0
dtype: int64


Result:
Thus, exploratory data analysis (EDA) on an email data set has been performed successfully.

Ex no: 03
Date: Working with NumPy arrays, Pandas DataFrames, and basic plots using Matplotlib

Aim:
To write the steps for working with NumPy arrays, Pandas DataFrames, and basic plots using Matplotlib.
Procedure:
1. NumPy:
NumPy is a fundamental library for numerical computing in Python. It provides support for
multi-dimensional arrays and various mathematical functions. To get started, first install NumPy if
you haven't already (you can use pip):

pip install numpy

Once NumPy is installed, you can use it as follows:

import numpy as np
# Creating NumPy arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Basic operations
mean = np.mean(arr)
total = np.sum(arr)  # named 'total' to avoid shadowing the built-in sum()
# Mathematical functions
square_root = np.sqrt(arr)
exponential = np.exp(arr)
# Indexing and slicing
first_element = arr[0]
sub_array = arr[1:4]
# Array operations
combined_array = np.concatenate([arr, sub_array])
OUTPUT:
[1 2 3 4 5]

2. Pandas:
Pandas is a powerful library for data manipulation and analysis.
You can install Pandas using pip:
pip install pandas
Here's how to work with Pandas DataFrames:
import pandas as pd

# Creating a DataFrame from a dictionary

data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [25, 30, 35, 28, 22],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}

df = pd.DataFrame(data)
# Display the entire DataFrame
print("DataFrame:")
print(df)
# Accessing specific columns
print("\nAccessing 'Name' column:")
print(df['Name'])
# Adding a new column
df['Salary'] = [50000, 60000, 75000, 48000, 55000]
# Filtering data
print("\nPeople older than 30:")
print(df[df['Age'] > 30])
# Sorting by a column
print("\nSorting by 'Age' in descending order:")
print(df.sort_values(by='Age', ascending=False))
# Aggregating data
print("\nAverage age:")
print(df['Age'].mean())
# Grouping and aggregation
grouped_data = df.groupby('City')['Salary'].mean()
print("\nAverage salary by city:")
print(grouped_data)
# Applying a function to a column
df['Age_Squared'] = df['Age'].apply(lambda x: x ** 2)
# Removing a column
df = df.drop(columns=['Age_Squared'])
# Saving the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
# Reading a CSV file into a DataFrame
new_df = pd.read_csv('output.csv')
print("\nDataFrame from CSV file:")
print(new_df)

OUTPUT:

3. Matplotlib:

Matplotlib is a popular library for creating static, animated, or interactive plots and graphs.
Install Matplotlib using pip:
pip install matplotlib
Here's a simple example of creating a basic plot:
import numpy as np
import matplotlib.pyplot as plt
# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a line plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Wave')
plt.title('Sine Wave Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()
OUTPUT:


RESULT:
Thus, working with NumPy arrays, Pandas DataFrames, and basic Matplotlib plots has been completed successfully.

Ex no:4
Date: Exploring various variable and row filters in R for cleaning data
Aim:
To explore various variable and row filters in R for cleaning data.
PROCEDURE:
Data Preparation and Cleaning
First, let's create a sample dataset and then explore various variable and row filters to clean the data.

# Create a sample dataset

set.seed(123)
data <- data.frame(
  ID = 1:10,
  Age = sample(18:60, 10, replace = TRUE),
  Gender = sample(c("Male", "Female"), 10, replace = TRUE),
  Score = sample(1:100, 10)
)
# Print the sample data
print(data)
OUTPUT:

Variable Filters
1. Filtering by a Single Condition:
To filter rows based on a condition on one variable (e.g., only show rows where Age is greater than
30):
filtered_data <- data[data$Age > 30, ]

2. Filtering by Multiple Conditions:
You can filter rows based on multiple conditions using the & (AND) or | (OR) operators (e.g., show
rows where Age is greater than 30 and Gender is "Male"):
filtered_data <- data[data$Age > 30 & data$Gender == "Male", ]
Row Filters
1. Removing Duplicate Rows:
To remove duplicate rows based on certain columns (e.g., remove duplicates based on 'ID') while
keeping all columns:
cleaned_data <- data[!duplicated(data$ID), ]
2. Removing Rows with Missing Values:
To remove rows with missing values (NA) from the de-duplicated data:
cleaned_data <- na.omit(cleaned_data)
Data Visualization
1. Apply various plot features using the ggplot2 package to visualize the cleaned data.
# Load the ggplot2 package
library(ggplot2)
# Create a scatterplot of Age vs. Score with points colored by Gender
ggplot(data = cleaned_data, aes(x = Age, y = Score, color = Gender)) +
  geom_point() +
  labs(title = "Scatterplot of Age vs. Score",
       x = "Age",
       y = "Score")
# Create a histogram of Age
ggplot(data = cleaned_data, aes(x = Age)) +
  geom_histogram(binwidth = 5, fill = "blue", alpha = 0.5) +
  labs(title = "Histogram of Age",
       x = "Age",
       y = "Frequency")
# Create a bar chart of Gender distribution
ggplot(data = cleaned_data, aes(x = Gender)) +
  geom_bar(fill = "green", alpha = 0.7) +
  labs(title = "Gender Distribution",
       x = "Gender",
       y = "Count")

RESULT:
Thus, various variable and row filters in R for cleaning data have been explored, and the cleaned data has been visualized successfully.

EXNO: 5 PERFORM EDA ON WINE QUALITY DATA SET.
DATE
AIM:
To write a program to perform EDA on the Wine Quality data set.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
data = pd.read_csv("pathname")
# Display the first few rows of the dataset
print(data.head())
# Get information about the dataset
print(data.info())
# Summary statistics
print(data.describe())
# Distribution of wine quality (pass the column name via x= for current seaborn)
sns.countplot(x='quality', data=data)
plt.title("Wine Quality data set")
plt.show()
# Box plots for selected features by wine quality
features = ['alcohol', 'volatile acidity', 'citric acid', 'residual sugar']
for feature in features:
    plt.figure(figsize=(8, 6))
    sns.boxplot(x='quality', y=feature, data=data)
    plt.title(f'{feature} by Wine Quality')
    plt.show()
# Pair plot of selected features
sns.pairplot(data, vars=features, hue='quality', diag_kind='kde')
plt.suptitle("Pair Plot of Selected Features", y=1.02)
plt.show()
# Correlation heatmap (numeric columns only)
corr_matrix = data.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
# Histograms of selected features
for feature in features:
    plt.figure(figsize=(6, 4))
    sns.histplot(data[feature], kde=True, bins=20)
    plt.title(f"Distribution of {feature}")
    plt.show()

OUTPUT:


RESULT:
Thus, the above program to perform EDA on the Wine Quality data set has been executed successfully.

EX NO:6
DATE: TIME SERIES ANALYSIS USING VARIOUS VISUALIZATION
TECHNIQUES
AIM:
To perform time series analysis and apply the various visualization techniques.

DOWNLOADING DATASET:
Step 1: Open a web browser, enter the following address, and download a dataset:
http://github.com/jbrownlee/Datasets
Step 2: Write the following code to load the dataset and view its details.
from pandas import read_csv
from matplotlib import pyplot
# Assumes the first column of the CSV holds the dates; it becomes a datetime index
series = read_csv('pathname', header=0, index_col=0, parse_dates=True)
print(series.head())
series.plot()
pyplot.show()

OUTPUT:

Step 3: To get the time series line plot:
series.plot(style='-.')
pyplot.show()

Step 4:
To create a Histogram:
series.hist()
pyplot.show()


Step 5:
To create density plot:
series.plot(kind='kde')
pyplot.show()

Result:
Thus, time series analysis has been performed using various visualization techniques.


EX NO: 7
DATE: DATA ANALYSIS AND REPRESENTATION ON A MAP

AIM:
Write a program to perform data analysis and representation on a map using various map data sets
with mouse rollover effect, user interaction.
PROCEDURE:
STEP 1:
• Make sure to install the necessary libraries.
pip install geopandas folium bokeh
PROGRAM:
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.layouts import column
import pandas as pd
import folium
# Load your data (expects 'Latitude', 'Longitude', and 'Info' columns)
data = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\geographic.csv')
# Create a Bokeh figure
p = figure(width=800, height=400, tools='pan,wheel_zoom,reset')
# Create a ColumnDataSource to hold data
source = ColumnDataSource(data)
# Add circle markers to the figure
p.scatter(x='Longitude', y='Latitude', size=10, source=source, color='orange')
# Create a hover tool for mouse rollover effect
hover = HoverTool()
hover.tooltips = [("Info", "@Info"), ("Latitude", "@Latitude"),
                  ("Longitude", "@Longitude")]
p.add_tools(hover)
# Display the Bokeh plot
layout = column(p)
show(layout)
# Create a map centered on the mean position of the data points
m = folium.Map(location=[data['Latitude'].mean(), data['Longitude'].mean()],
               zoom_start=10)
# Add markers for your data points
for index, row in data.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=str(row['Info']),    # Display additional info on mouse click
        tooltip=str(row['Info']),  # Display the same info on mouse rollover
    ).add_to(m)
# Save the map to an HTML file
m.save('map.html')


OUTPUT:

RESULT:
Data analysis and representation on a map using various map data sets with mouse rollover effect,
user interaction has been completed successfully.

EX NO: 8
DATE: BUILDING CARTOGRAPHIC VISUALIZATION

AIM:
To build cartographic visualizations for multiple datasets involving various countries of the world,
and states and districts in India.
PROCEDURE:
STEP 1:
Collect Datasets
Gather the datasets containing geographical information for countries, states, or districts. Make sure these
datasets include the necessary attributes for mapping (e.g., country/state/district names, codes, and
relevant data).
STEP 2:
Install Required Libraries:
pip install geopandas matplotlib
STEP 3:
Load Geographic Data:
Use Geopandas to load the geographic data for countries, states, or districts. Make sure to match the
geographical data with your datasets based on the common attributes.
STEP 4:
Merge Datasets:
Merge your datasets with the geographic data based on common attributes. This step is crucial for linking
your data to the corresponding geographic regions.
STEP 5:
Create Cartographic Visualizations:
Use Matplotlib to create cartographic visualizations. You can create separate plots for different datasets
or overlay them on a single map.
STEP 6:
Customize and Enhance:
Customize your visualizations based on your needs. You can add legends, labels, titles, and other
elements to enhance the interpretability of your maps.
STEP 7:
Save and Share:
Save your visualizations as image files or interactive plots if needed. You can then share these
visualizations with others.
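The merge in Step 4 can be sketched with plain pandas. This is an illustration under stated assumptions: the regions table stands in for the attribute columns of a GeoDataFrame, the column names iso_a3 and country_code are hypothetical, and the values are made up.

```python
import pandas as pd

# Stand-in for the attribute table of a GeoDataFrame (geometry column omitted).
regions = pd.DataFrame({
    "iso_a3": ["IND", "FRA", "DEU"],
    "name": ["India", "France", "Germany"],
})

# Made-up values to be mapped; "BRA" has no matching region on purpose.
values = pd.DataFrame({
    "country_code": ["IND", "FRA", "BRA"],
    "value": [10.0, 20.0, 30.0],
})

# A left join keeps every region, so regions without data stay on the map.
merged = regions.merge(values, left_on="iso_a3", right_on="country_code",
                       how="left", indicator=True)

# indicator=True adds a '_merge' column that flags rows with no match.
unmatched_regions = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(merged[["name", "value", "_merge"]])
print("Regions without data:", unmatched_regions)
```

Checking the unmatched rows before plotting helps catch mismatched codes or spellings. When the left table is a GeoDataFrame, the same merge call keeps the geometry column, so the result can be plotted with something like merged.plot(column="value", legend=True).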

PROGRAM:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
df = pd.DataFrame({'city': ['Berlin', 'Paris', 'Munich'],
                   'latitude': [52.518611111111, 48.856666666667, 48.137222222222],
                   'longitude': [13.408333333333, 2.3516666666667, 11.575555555556]})
# Build point geometries from the longitude/latitude columns (WGS 84)
gdf = gpd.GeoDataFrame(df.drop(['latitude', 'longitude'], axis=1),
                       geometry=gpd.points_from_xy(df.longitude, df.latitude),
                       crs='EPSG:4326')
print(gdf)
# Note: gpd.datasets was removed in GeoPandas 1.0; on newer versions, pass the
# path of a downloaded Natural Earth countries file to gpd.read_file instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
base = world.plot(color='white', edgecolor='black')
gdf.plot(ax=base, marker='o', color='red', markersize=5)
plt.show()

OUTPUT:
city geometry
0 Berlin POINT (13.40833 52.51861)
1 Paris POINT (2.35167 48.85667)
2 Munich POINT (11.57556 48.13722)


RESULT:
Thus, the cartographic visualization for multiple datasets involving various countries of the world
has been built successfully.


EX NO :9
DATE: VISUALIZING VARIOUS EDA TECHNIQUES AS CASE STUDY FOR
IRIS DATASET
AIM:
Use a case study on a data set and apply the various EDA and visualization techniques and
present an analysis report.
PROCEDURE:
Import Libraries:
Start by importing the necessary libraries and loading the dataset.
Descriptive Statistics:
Compute and display descriptive statistics.
Check for Missing Values:
Verify if there are any missing values in the dataset.
Visualize Data Distributions:
Visualize the distribution of numerical variables.
Correlation Heatmap:
Examine the correlation between numerical variables.
Boxplots for Categorical Variables:
Use boxplots to visualize the distribution of features by species.
Violin Plots:
Combine box plots with kernel density estimation for better visualization.
Correlation between Features:
Visualize pair-wise feature correlations.
Conclusion and Summary:
Summarize key findings and insights from the analysis.
This case study provides a comprehensive analysis of the Iris dataset, including data exploration,
descriptive statistics, visualization of data distributions, correlation analysis, and feature-specific
visualizations.
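The procedure above can be sketched as follows. This is a minimal illustration, not the manual's own program: the six rows are an excerpt of the classic Iris data (two per species) so that the sketch runs without a download, the underscore column names follow the convention of seaborn's copy of the dataset, and the helper names summarize and plot_all are hypothetical.

```python
import pandas as pd

# Six-row excerpt of the Iris data; use the full 150-row dataset in practice.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})


def summarize(df):
    """Numeric steps: descriptive statistics, missing values, correlations."""
    numeric = df.select_dtypes("number")
    return {
        "describe": numeric.describe(),
        "missing": df.isnull().sum(),
        "corr": numeric.corr(),
        "species_means": df.groupby("species")[list(numeric.columns)].mean(),
    }


def plot_all(df):
    """Plotting steps; matplotlib and seaborn are imported only when called."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    features = list(df.select_dtypes("number").columns)
    df[features].hist(bins=10, figsize=(8, 6))           # data distributions
    plt.figure()
    sns.heatmap(df[features].corr(), annot=True, cmap="coolwarm")
    for feature in features:
        fig, axes = plt.subplots(1, 2, figsize=(10, 4))
        sns.boxplot(x="species", y=feature, data=df, ax=axes[0])
        sns.violinplot(x="species", y=feature, data=df, ax=axes[1])
    sns.pairplot(df, hue="species")                      # pair-wise relations
    plt.show()


report = summarize(iris)
print(report["describe"])
print(report["missing"])
print(report["species_means"])
```

Calling plot_all(iris) produces the histograms, heatmap, box plots, violin plots, and pair plot named in the procedure; it is left uncalled here so the numeric summary runs even where seaborn is not installed.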

45 pages
Design and Analysis of Algorithms - AD3351 - Important Questions With Answer - Unit 1 - Introduction
No ratings yet
Design and Analysis of Algorithms - AD3351 - Important Questions With Answer - Unit 1 - Introduction
10 pages
Data Exploration and Visualization - AD3301 - Important Questions With Answer - Unit 1 - Exploratory Data Analysis
No ratings yet
Data Exploration and Visualization - AD3301 - Important Questions With Answer - Unit 1 - Exploratory Data Analysis
8 pages
FDS UNIT 3 IMPORTANT QUES
No ratings yet
FDS UNIT 3 IMPORTANT QUES
46 pages
Data Structures Design Laboratory - AD3271 - Lab Manual
No ratings yet
Data Structures Design Laboratory - AD3271 - Lab Manual
43 pages
Data Exploration and Visualization - AD3301 - Hand Written Notes - Unit 5 - Multivariate and Time Series Analysis
No ratings yet
Data Exploration and Visualization - AD3301 - Hand Written Notes - Unit 5 - Multivariate and Time Series Analysis
59 pages
Deep Learning - AD3501 - Notes - Unit 2 - Convolutional Neural Networks
No ratings yet
Deep Learning - AD3501 - Notes - Unit 2 - Convolutional Neural Networks
36 pages
Data and Information Security - CW3551 - Important Questions on Model Paper With Answers
No ratings yet
Data and Information Security - CW3551 - Important Questions on Model Paper With Answers
19 pages
Deep Learning - AD3501 - Notes - Unit 1 - Deep Networks Basics
No ratings yet
Deep Learning - AD3501 - Notes - Unit 1 - Deep Networks Basics
45 pages
Database Design and Management - AD3391 - Important Questions With Answer - Unit 2 - Relational Model and SQL
100% (1)
Database Design and Management - AD3391 - Important Questions With Answer - Unit 2 - Relational Model and SQL
12 pages
Artificial Intelligence - AL3391 - Important Questions With Answer - Unit 5 - Probabilistic Reasoning
No ratings yet
Artificial Intelligence - AL3391 - Important Questions With Answer - Unit 5 - Probabilistic Reasoning
8 pages
Deep Learning - AD3501 - Notes - Unit 4 - Model Evaluation
No ratings yet
Deep Learning - AD3501 - Notes - Unit 4 - Model Evaluation
18 pages
Design and Analysis of Algorithms - AD3351 - Hand Written Notes - Unit 3 - Dynamic Programming and Greedy Technique
No ratings yet
Design and Analysis of Algorithms - AD3351 - Hand Written Notes - Unit 3 - Dynamic Programming and Greedy Technique
41 pages
Data Exploration and Visualization - AD3301 - Important Questions With Answer - Unit 4 - Bivariate Analysis
No ratings yet
Data Exploration and Visualization - AD3301 - Important Questions With Answer - Unit 4 - Bivariate Analysis
8 pages
Probability and Statistics - MA3391 - Notes (1)
No ratings yet
Probability and Statistics - MA3391 - Notes (1)
292 pages
Machine Learning - AL3451 - Notes - Unit 4 - Neural Networks
No ratings yet
Machine Learning - AL3451 - Notes - Unit 4 - Neural Networks
38 pages
Brainkart_Operating Systems - AL3452 2021 Regulation - Notes.bin
No ratings yet
Brainkart_Operating Systems - AL3452 2021 Regulation - Notes.bin
282 pages
Fundamentals of Data Science and Analytics - AD3491 - Important Questions With Answer - Unit 2 - Descriptive Analytics
No ratings yet
Fundamentals of Data Science and Analytics - AD3491 - Important Questions With Answer - Unit 2 - Descriptive Analytics
92 pages
AID 5th Semester Deep Learning Laboratory - AD3511 - Lab Manual
No ratings yet
AID 5th Semester Deep Learning Laboratory - AD3511 - Lab Manual
86 pages
Fds Unit 2 Notes
No ratings yet
Fds Unit 2 Notes
84 pages
Artificial Intelligence - AL3391 - Important Questions With Answer - Unit 4 - Logical Reasoning
No ratings yet
Artificial Intelligence - AL3391 - Important Questions With Answer - Unit 4 - Logical Reasoning
8 pages
Artificial Intelligence - AL3391 - Hand Written Notes - Unit 1 - Intelligent Agents
No ratings yet
Artificial Intelligence - AL3391 - Hand Written Notes - Unit 1 - Intelligent Agents
65 pages
AID 4th Semester Machine Learning Laboratory - Lab Manual
No ratings yet
AID 4th Semester Machine Learning Laboratory - Lab Manual
56 pages
AID 3rd Semester - Design and Analysis of Algorithms Laboratory - AD3351 - Lab Manual
No ratings yet
AID 3rd Semester - Design and Analysis of Algorithms Laboratory - AD3351 - Lab Manual
36 pages
AID 4th Semester Machine Learning Laboratory - Lab Manual2
No ratings yet
AID 4th Semester Machine Learning Laboratory - Lab Manual2
61 pages
Python Machine Learning: Machine Learning Algorithms for Beginners - Data Management and Analytics for Approaching Deep Learning and Neural Networks from Scratch
From Everand
Python Machine Learning: Machine Learning Algorithms for Beginners - Data Management and Analytics for Approaching Deep Learning and Neural Networks from Scratch
Ahmed Ph. Abbasi
No ratings yet
NMR-Spectroscopy: Modern Spectral Analysis
From Everand
NMR-Spectroscopy: Modern Spectral Analysis
Ursula Weber
No ratings yet
Discrete Math Basis Syllabus
No ratings yet
Discrete Math Basis Syllabus
5 pages
MATH 140 Chapter 3 Practice Sheet 3 PDF
No ratings yet
MATH 140 Chapter 3 Practice Sheet 3 PDF
2 pages
Design and Analysis of Experiments
No ratings yet
Design and Analysis of Experiments
6 pages
JupyterLab MAPEO DE INUNDACIONES
No ratings yet
JupyterLab MAPEO DE INUNDACIONES
19 pages
Essential Statistics in Business and Economics 3rd Edition Doane Test Bank download pdf
100% (22)
Essential Statistics in Business and Economics 3rd Edition Doane Test Bank download pdf
59 pages
Learning Curves: Distribution Without The Prior Written Consent of Mcgraw-Hill Education
No ratings yet
Learning Curves: Distribution Without The Prior Written Consent of Mcgraw-Hill Education
18 pages
Unit 1 and 2. Dfa
0% (1)
Unit 1 and 2. Dfa
43 pages
12.04.dynamic Programming
No ratings yet
12.04.dynamic Programming
97 pages
Flow Through Venturi Meter Lab Report G3
No ratings yet
Flow Through Venturi Meter Lab Report G3
11 pages
Cs Mcq's Mod For Conduct
No ratings yet
Cs Mcq's Mod For Conduct
8 pages
(Join AICTE Telegram Group) 22303 (MOS) Mechanics of Structural
0% (1)
(Join AICTE Telegram Group) 22303 (MOS) Mechanics of Structural
4 pages
How To Create A Cleanse Library
No ratings yet
How To Create A Cleanse Library
9 pages
4-1 Inverse and Direct Variation
No ratings yet
4-1 Inverse and Direct Variation
18 pages
Perl Programming Exercises 1 - 'A B C'
No ratings yet
Perl Programming Exercises 1 - 'A B C'
29 pages
ENGR 2213 Thermodynamics: F. C. Lai School of Aerospace and Mechanical Engineering University of Oklahoma
No ratings yet
ENGR 2213 Thermodynamics: F. C. Lai School of Aerospace and Mechanical Engineering University of Oklahoma
20 pages
SAS Macro
No ratings yet
SAS Macro
7 pages
Btech Ec 503 Control System 2012
No ratings yet
Btech Ec 503 Control System 2012
7 pages
Aristotle's Theory of The Unity of Science
67% (3)
Aristotle's Theory of The Unity of Science
286 pages
CH 08
No ratings yet
CH 08
81 pages
diagnostic test for 3is
No ratings yet
diagnostic test for 3is
4 pages
Ugrd Nsci6201 2313t All Answers DD
No ratings yet
Ugrd Nsci6201 2313t All Answers DD
765 pages
Oklahoma School Testing Program: 2009-2010 Released Items
No ratings yet
Oklahoma School Testing Program: 2009-2010 Released Items
28 pages
Module 1 PPT PDF
No ratings yet
Module 1 PPT PDF
90 pages
JHS Week 1 1 PDF
No ratings yet
JHS Week 1 1 PDF
12 pages
TS Maths Target 10 by 10 (23 - 24)
No ratings yet
TS Maths Target 10 by 10 (23 - 24)
69 pages
Density of Water Lab Conclusion
64% (14)
Density of Water Lab Conclusion
2 pages
Done Classic - Data - Structures - by - D. - Samanta PDF
No ratings yet
Done Classic - Data - Structures - by - D. - Samanta PDF
99 pages
Cs1004 Data Warehousing & Mining Unit 5
No ratings yet
Cs1004 Data Warehousing & Mining Unit 5
10 pages
Mensuration Formula
No ratings yet
Mensuration Formula
8 pages
Screening Sample Test PDF
No ratings yet
Screening Sample Test PDF
9 pages

Data Exploration and Visualization - Laboratory Practical Experiments


VARUVAN VADIVELAN INSTITUTE OF TECHNOLOGY

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA

DATA EXPLORATION AND VISUALIZATION LABORATORY

pip install numpy

Once NumPy is installed, you can use it as follows:
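A minimal sketch of basic NumPy usage after installation. The array name `marks` and its values are hypothetical, chosen only for illustration; they do not come from the lab manual.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
marks = np.array([72, 85, 90, 66, 78])

# Basic properties of the array
print("Shape:", marks.shape)
print("Data type:", marks.dtype)

# Summary statistics computed over the whole array
print("Mean:", marks.mean())
print("Max:", marks.max())

# Element-wise arithmetic: every value is doubled
doubled = marks * 2
print("Doubled:", doubled)

# Boolean masking: keep only the values greater than 75
above_75 = marks[marks > 75]
print("Above 75:", above_75)
```

Operations such as `marks * 2` act on every element at once, which is usually the reason to prefer a NumPy array over a plain Python list for numeric data.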

# Creating a DataFrame from a dictionary
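One way to build a DataFrame from a dictionary is sketched below, assuming pandas is installed (for example with `pip install pandas`). The column names and values are hypothetical and serve only as an example.

```python
import pandas as pd

# Hypothetical data: each key becomes a column, each list holds that column's values
data = {
    "name": ["Asha", "Ravi", "Meena", "Karthik"],
    "age": [21, 23, 22, 24],
    "score": [88.5, 72.0, 91.0, 65.5],
}

# All lists must have the same length, otherwise pandas raises a ValueError
df = pd.DataFrame(data)

print(df)         # the full table
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # the data type inferred for each column
```

The dictionary keys become the column labels, and pandas assigns a default integer index starting at 0.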

# Create a sample dataset

2. Filtering by Multiple Conditions:
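Filtering on several conditions in pandas can be sketched as follows. The DataFrame and its columns are hypothetical. Each condition is wrapped in parentheses and combined with `&` (and) or `|` (or), because Python's `and` and `or` keywords do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical sample dataset, used only for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Karthik"],
    "age": [21, 23, 22, 24],
    "score": [88.5, 72.0, 91.0, 65.5],
})

# Rows where BOTH conditions hold: age above 21 and score of at least 70
both = df[(df["age"] > 21) & (df["score"] >= 70)]
print(both)

# Rows where EITHER condition holds: age above 23 or score above 90
either = df[(df["age"] > 23) | (df["score"] > 90)]
print(either)
```

The parentheses are required: `&` and `|` bind more tightly than comparison operators, so leaving them out changes how the expression is evaluated and typically raises an error.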
