dev record final (3)
ACADEMIC YEAR(2024-2025)
REGULATION-2021
Register No. :
Year/Semester/Sec :
II/III/B
BONAFIDE CERTIFICATE
semester.
AD3301–DATA EXPLORATION AND VISUALIZATION LABORATORY
INDEX
Ex 1.
Install the data analysis and visualization tool: R / Python / Power BI.
DATE:
Aim:
Algorithm:
1. Installation:
dframe
3. Importing Data with Pandas
6. Installation
pip install matplotlib
7. Pandas Plotting
# import the required module
import matplotlib.pyplot as plt

# plot a histogram
df['Observation Value'].hist(bins=10)

# shows presence of a lot of outliers/extreme values
df.boxplot(column='Observation Value', by='Time period')

# plotting points as a scatter plot
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label="stars", color="m", marker="*", s=30)

# x-axis label
plt.xlabel('Observation Value')
# frequency label
plt.ylabel('Time period')

# function to show the plot
plt.show()
Output:
Result:
Thus the program to install the data analysis and visualization tool using Python has been
executed successfully.
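A quick way to confirm that the installation worked is to import each package and print its version. The sketch below is only an illustration: the helper name `installed_versions` and the package list are assumptions, not part of the original record.

```python
import importlib


def installed_versions(packages):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Some modules do not expose __version__; report "unknown" then.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list for this laboratory; adjust as needed.
    for name, version in installed_versions(["numpy", "pandas", "matplotlib"]).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

A package reported as NOT INSTALLED can be added with `pip install <package>`, as in step 6 above.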
Ex 2. Perform exploratory data analysis (EDA) on datasets like an email data set.
Export all your emails as a dataset, import them inside a pandas data frame,
visualize them and get different insights from the data.
DATE:
AIM:
To perform exploratory data analysis (EDA) on datasets like an email data set: export all
your emails as a dataset, import them inside a pandas data frame, visualize them and get different
insights from the data.
ALGORITHM:
PROGRAM:
import numpy as np
import pandas as pd
from google.colab import drive  # needed for drive.mount (Google Colab only)
drive.mount('/content/gdrive')
import mailbox
mbox = mailbox.mbox(mboxfile)
mbox
print(key)
import csv
writer = csv.writer(outputfile)
writer.writerow(['subject','from','date','to','label','thread'])
dfs.dtypes
dfs = dfs[dfs['date'].notna()]
dfs.to_csv('gmail.csv')
dfs.info()
dfs.head(10)
dfs.columns
def extract_email_ID(string):
if not email:
myemail = '[email protected]'
dfs.drop(columns='to', inplace=True)
dfs.head(10)
import datetime
import pytz
def refactor_timezone(x):
    est = pytz.timezone('US/Eastern')
    return x.astimezone(est)
dfs.index = dfs['date']
del dfs['date']
print(dfs['label'].value_counts())
est = pytz.timezone('US/Eastern')
df[~ind].plot.scatter('year', 'timeofday', s=s, alpha=0.6, ax=ax, color=color)
ax.set_ylim(0, 24)
ax.yaxis.set_major_locator(MaxNLocator(8))
ax.set_xlabel('')
ax.set_ylabel('')
ax.set_title(title)
ax.grid(ls=':', color='k')
return ax
sent = dfs[dfs['label']=='sent']
received = dfs[dfs['label']=='inbox']
year = df[df['year'].notna()]['year'].values
T = year.max() - year.min()
ax.grid(ls=':', color='k')
def plot_number_perdhour_per_year(df, ax, label=None, dt=1, smooth=False,
weight_fun=None, **plot_kwargs):
tod = df[df['timeofday'].notna()]['timeofday'].values
year = df[df['year'].notna()]['year'].values
Ty = year.max() - year.min()
T = tod.max() - tod.min()
if weight_fun is None:
else:
weights = weight_fun(df)
if smooth:
hst = f(x)
else:
ax.grid(ls=':', color='k')
orientation = plot_kwargs.get('orientation')
ax.set_xlim(0, 24)
ax.xaxis.set_major_locator(MaxNLocator(8))
for ts in ax.get_xticks()]);
ax.set_ylim(0, 24)
ax.yaxis.set_major_locator(MaxNLocator(8))
for ts in ax.get_yticks()]);
class TriplePlot:
def __init__(self):
gs = gridspec.GridSpec(6, 6)
plt.setp(self.ax2.get_yticklabels(), visible=False);
plt.setp(self.ax3.get_xticklabels(), visible=False);
tpl.plot(dfs[dfs['from'] == addr], color=colors[ct], alpha=0.3, yr_bin=0.5, markersize=1.0)
plt.xlabel('');
import scipy.ndimage
plt.figure(figsize=(8,5))
ax = plt.subplot(111)
df_r = received[received['dayofweek']==dow]
df_s = sent[sent['dayofweek']==dow]
plt.legend(loc='upper left')
OUTPUT:
   subject                                             from                               date                       label  thread
1  New Books: The Python Journeyman + Understandi...   [email protected]                  2019-09-20 14:07:05+00:00  inbox  1645216686186738105
2  iPhone 11 Pro og iPhone 11 er her                   News_Europe@InsideApple.Apple.com  2019-09-20 10:33:27+00:00  inbox  1645190169696380553
3  =?utf-8?Q?Save=20on=20Burlap=20Bags=20Today=21...   support@totebagfactory.com         2019-09-20 15:32:31+00:00  inbox  1645209548975264659
4  Hi there, looking for the best Dashain deals? ...   [email protected]               2019-09-17 06:19:10+00:00  inbox  1644916038153843699
5  The file =?UTF-8?B?J0JyYW5kX0Jvb2sgdGVzdC5wZGY...   [email protected]                  2019-09-20 19:04:16+00:00  inbox  1645222431795507661
RESULT:
Thus the program to perform exploratory data analysis (EDA) on datasets like an
email data set (exporting all emails as a dataset, importing them inside a pandas data
frame, visualizing them and getting different insights from the data) was executed
successfully.
Ex 3. Working with NumPy arrays, Pandas data frames, Basic plots using
Matplotlib. NumPy arrays using Matplotlib
DATE:
Aim:
To work with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
NumPy arrays using Matplotlib
Algorithm:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y=2*x+5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
Output:
Result:
3(b) Pandas dataframe using Matplotlib
import pandas as pd
Basic plots
Result:
Thus the given program to work with NumPy arrays, Pandas data frames, and basic
plots using Matplotlib has been executed successfully.
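For part 3(b), a Pandas data frame can be plotted directly through its `plot` method, which draws with Matplotlib. The sketch below reuses the values y = 2x + 5 from part 3(a); the function names `build_frame` and `plot_frame` and the choice of a line plot and a bar plot are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def build_frame():
    """Build a small data frame with the same values as the NumPy example."""
    x = np.arange(1, 11)
    return pd.DataFrame({"x": x, "y": 2 * x + 5})


def plot_frame(df):
    """Draw a line plot and a bar plot of y against x, side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    df.plot(x="x", y="y", kind="line", ax=axes[0], title="Line plot")
    df.plot(x="x", y="y", kind="bar", ax=axes[1], title="Bar plot")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    frame = build_frame()
    print(frame.head())
    plot_frame(frame)
```

With an interactive backend, calling `plt.show()` after `plot_frame` displays both plots.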
Ex 4. Explore various variable and row filters in python for cleaning data.
Apply various plot features in python on sample data sets and visualize
DATE:
Aim:
To explore various variable and row filters in Python for cleaning data, and to apply various plot
features in Python on sample data sets and visualize them.
Algorithm:
Program:
import pandas as pd
import numpy as np
print(df)
Output:
Program: (Checking missing values)
import pandas as pd
import numpy as np
print(df['one'].isnull())
Output:
a False
b True
c False
d True
e False
f False
g True
h False
Name: one, dtype: bool
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
# reindexing introduces row 'b', which is filled with NaN
df = df.reindex(['a', 'b', 'c'])
print(df)
print("NaN replaced with '0':")
print(df.fillna(0))
Output:
import pandas as pd
import numpy as np
df = pd.DataFrame({'one':[10,20,30,40,50,2000],
'two':[1000,0,30,40,50,60]})
print(df.replace({1000: 10, 2000: 60}))
Output:
one two
0 10 10
1 20 0
2 30 30
3 40 40
4 50 50
5 60 60
Result:
Thus the given program to explore various variable and row filters in Python for cleaning
data and to apply various plot features in Python has been executed successfully.
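The programs above handle missing values and replacement; row and variable (column) filters can be sketched in the same style. The data frame, the threshold of 1000, and the function name `clean` below are hypothetical and chosen only to illustrate the filters.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a duplicate row, a missing value and an outlier.
df = pd.DataFrame({
    "name": ["a", "b", "b", "c", "d"],
    "one": [10, 20, 20, np.nan, 2000],
    "two": [1000, 0, 0, 40, 50],
})


def clean(frame):
    """Apply row filters, then a variable filter, and return the result."""
    out = frame.drop_duplicates()          # row filter: remove repeated rows
    out = out.dropna(subset=["one"])       # row filter: remove rows missing 'one'
    out = out[out["one"] < 1000]           # row filter: boolean mask on a column
    return out[["name", "one"]]            # variable filter: keep chosen columns


if __name__ == "__main__":
    print(clean(df))
```

Each step returns a new data frame, so the original `df` stays unchanged and the filters can be reordered or tested one at a time.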
Ex 5. Perform Time Series Analysis and apply the various visualization
techniques.
DATE:
Aim:
To perform time series analysis and apply the various visualization techniques.
Algorithm:
Program:
# Time series data source: fpp package in R.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
Output:
Result:
Thus the program to perform time series analysis and apply the various visualization
techniques is executed successfully.
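Beyond the plain line plot, two common time series techniques are a rolling mean to expose the trend and aggregation by year. The sketch below uses a synthetic monthly series so that it runs without a download; the series itself, the 12-month window, and the function names are assumptions and do not reproduce the drug sales data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_series(periods=48):
    """Synthetic monthly series: linear trend plus a yearly seasonal cycle."""
    idx = pd.date_range("2000-01-01", periods=periods, freq="MS")
    trend = np.linspace(10, 20, periods)
    season = 2 * np.sin(2 * np.pi * idx.month.to_numpy() / 12)
    return pd.DataFrame({"value": trend + season}, index=idx)


def add_rolling_mean(df, window=12):
    """Add a rolling mean column; the first window-1 rows are NaN."""
    out = df.copy()
    out["rolling_mean"] = out["value"].rolling(window).mean()
    return out


def yearly_totals(df):
    """Sum the values within each calendar year."""
    return df["value"].groupby(df.index.year).sum()


def plot_with_trend(df):
    """Plot the raw series together with its rolling mean."""
    fig, ax = plt.subplots(figsize=(10, 4))
    df["value"].plot(ax=ax, label="value")
    df["rolling_mean"].plot(ax=ax, label="rolling mean")
    ax.legend()
    return ax


if __name__ == "__main__":
    data = add_rolling_mean(make_series())
    print(yearly_totals(data))
    plot_with_trend(data)
```

The same functions should apply to the a10 data frame loaded above, since it also has a date index and a `value` column.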
Ex 6. Perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction, etc.
DATE:
Aim:
To perform data analysis and representation on a map using various map data sets with
mouse rollover effect, user interaction, etc.
Algorithm:
Program:
import folium
import pandas as pd

data = pd.read_csv("your_data.csv")
# Create a map centered on a specific location
m = folium.Map(location=[45.523, -122.675], zoom_start=13)
# Perform some analysis on your data
# e.g. group by and count the data points in each location
location_data = data.groupby(['lat', 'lon']).count()
# Add a marker to the map for each location
for i in range(0, len(location_data)):
    folium.Marker(
        location=[location_data.iloc[i].name[0], location_data.iloc[i].name[1]],
        popup=f'Count: {location_data.iloc[i].iloc[0]}',
        icon=folium.Icon(color='red')
    ).add_to(m)
# Add markers that also carry an additional popup
for i in range(0, len(location_data)):
    folium.Marker(
        location=[location_data.iloc[i].name[0], location_data.iloc[i].name[1]],
        popup=f'Count: {location_data.iloc[i].iloc[0]}',
        icon=folium.Icon(color='red')
    ).add_child(folium.Popup("Additional Info")).add_to(m)
# Add a button to the map to allow the user to toggle the visibility
# of the marker
folium.LayerControl().add_to(m)
# Display the map
m
Output:
Result:
Thus, the program to perform data analysis and representation on a map using various
map data sets with mouse rollover effect, user interaction, etc. is written and executed
successfully.
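The analysis step of this program, counting data points per location, can be separated from the map drawing and checked on its own. The sketch below is a pandas-only illustration: the function name `location_counts`, the `tooltip` column, and the sample coordinates are assumptions. The closing comment shows how the result could be passed to `folium.Marker`, whose `tooltip` argument provides the mouse rollover text.

```python
import pandas as pd


def location_counts(data):
    """Count rows per (lat, lon) pair and prepare the rollover text."""
    counts = data.groupby(["lat", "lon"]).size().reset_index(name="count")
    counts["tooltip"] = "Count: " + counts["count"].astype(str)
    return counts


if __name__ == "__main__":
    # Hypothetical sample points; replace with the contents of your CSV file.
    sample = pd.DataFrame({
        "lat": [45.52, 45.52, 45.53],
        "lon": [-122.67, -122.67, -122.68],
    })
    print(location_counts(sample))
    # With folium installed, each row could become one marker, for example:
    #   for row in location_counts(sample).itertuples():
    #       folium.Marker([row.lat, row.lon], tooltip=row.tooltip).add_to(m)
```

Using `size()` instead of `count()` keeps the count available even when the data frame has no columns other than `lat` and `lon`.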
Ex 7. Build cartographic visualization for multiple datasets involving
various countries of the world, states and districts in India, etc.
DATE:
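Aim:
To build cartographic visualization for multiple datasets involving various countries of
the world, states and districts in India, etc.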
Algorithm:
1. Collect and clean the data: Use a library such as Pandas to import and manipulate the
data, remove missing values, and ensure that the data is in the correct format.
2. Map the data: Use a library such as Folium or Plotly to create the map and overlay the
cleaned data on it. Shapefiles may be needed to define the boundaries of countries, states,
and districts.
3. Style the map: Customize the appearance of the map by changing colors, adding labels,
and creating interactive elements such as hover-over text.
4. Export the map: Save the map in a format that can be easily shared or embedded on a
website, either as an image using a library such as Matplotlib or Seaborn, or as an
interactive map using a library such as Plotly.
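The four steps above can be sketched in a simplified form using only Pandas and Matplotlib. This is not a boundary-based map: it is a point map on longitude and latitude axes, used here as a stand-in because no shapefiles are supplied in the record. The city coordinates are approximate, the values are made up, and all names (`SAMPLE`, `clean_locations`, `plot_point_map`) are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample: approximate city coordinates with made-up values.
SAMPLE = pd.DataFrame({
    "place": ["Delhi", "Mumbai", "Chennai", "Kolkata", "Unknown"],
    "lat": [28.61, 19.08, 13.08, 22.57, np.nan],
    "lon": [77.21, 72.88, 80.27, 88.36, 75.00],
    "value": [40, 65, 30, 55, 10],
})


def clean_locations(df):
    """Step 1: drop rows with missing coordinates or values."""
    return df.dropna(subset=["lat", "lon", "value"]).reset_index(drop=True)


def plot_point_map(df, outfile=None):
    """Steps 2 to 4: draw, style and optionally export a simple point map."""
    fig, ax = plt.subplots(figsize=(6, 6))
    # Step 2: place each location at its longitude/latitude
    ax.scatter(df["lon"], df["lat"], s=df["value"] * 5, c=df["value"],
               cmap="viridis", alpha=0.7)
    # Step 3: label the points and style the axes
    for row in df.itertuples():
        ax.annotate(row.place, (row.lon, row.lat), xytext=(4, 4),
                    textcoords="offset points")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Sample values by location")
    ax.grid(ls=":")
    # Step 4: export as an image when a file name is given
    if outfile is not None:
        fig.savefig(outfile)
    return ax


if __name__ == "__main__":
    plot_point_map(clean_locations(SAMPLE))
```

For country, state or district boundaries, the same cleaned data frame would be handed to a mapping library such as Folium or Plotly together with the boundary files, as described in step 2.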
Program:
Output:
Result:
Thus the given program to build cartographic visualization for multiple datasets involving
various countries of the world, states and districts in India has been executed successfully.
Ex 8. Perform EDA on Wine Quality Data Set
DATE:
Aim:
To perform EDA on the Wine Quality Data Set.
Algorithm:
Program:
Output:
Result:
Thus the given program to perform EDA on the Wine Quality Data Set is executed successfully.
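Typical first steps of EDA on a data set of this kind are checking the shape and missing values, summarizing each column, and relating the features to the quality score. The sketch below runs on a randomly generated stand-in, not on the actual Wine Quality data: the three columns `alcohol`, `pH` and `quality`, the generating formula, and the function names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def make_sample(n=200, seed=0):
    """Generate a synthetic stand-in with three wine-like columns."""
    rng = np.random.default_rng(seed)
    alcohol = rng.normal(10.5, 1.0, n)
    ph = rng.normal(3.3, 0.15, n)
    # Quality is made to rise with alcohol, plus noise, clipped to 3..8
    raw = 5 + 0.8 * (alcohol - 10.5) + rng.normal(0, 0.7, n)
    quality = np.clip(np.round(raw), 3, 8).astype(int)
    return pd.DataFrame({"alcohol": alcohol, "pH": ph, "quality": quality})


def eda_report(df):
    """Collect basic EDA tables for a data frame with a 'quality' column."""
    return {
        "shape": df.shape,
        "missing": df.isna().sum(),
        "summary": df.describe(),
        "corr_with_quality": df.corr()["quality"].drop("quality"),
        "mean_by_quality": df.groupby("quality").mean(),
    }


if __name__ == "__main__":
    report = eda_report(make_sample())
    for name, table in report.items():
        print(f"--- {name} ---")
        print(table)
```

On the real data set, `make_sample()` would be replaced by `pd.read_csv` on the downloaded file, and the tables could be followed by histograms and a correlation heatmap.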
Ex 9. Use a case study on a data set and apply the various EDA and
visualization techniques and present an analysis report
DATE:
Aim:
To use a case study on a data set, apply the various EDA and visualization
techniques, and present an analysis report.
Algorithm:
1. Install the necessary packages.
2. Create a data set of date and price.
3. Fix the start date and end date.
4. Generate a price corresponding to the dates.
5. List the data by columns.
6. Group the columns by date.
7. Display the data.
8. Show the plot.
9. Present an analysis report.
Program:
import datetime
import math
import random

import pandas as pd
import radar
from faker import Faker

fake = Faker()

def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-1', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    df = df.groupby(by='Date').mean()
    return df

# n = 50 is an assumed sample size; the record does not state the value used
df = generateData(50)
print(df)

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)
plt.show()
Output:
Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
Result:
Thus the given program to apply the various EDA and visualization techniques and
presenting an analysis report has been executed successfully.
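The data generation step can also be written without the `radar` and `faker` packages, using only the standard library and Pandas. This variant is a sketch under assumptions: the function name `generate_data` and the optional `seed` argument are not in the original program, and the random prices it produces will differ from the output shown above.

```python
import random

import pandas as pd


def generate_data(n, seed=None):
    """Draw n random (date, price) pairs in August 2019 and average per date."""
    rng = random.Random(seed)
    days = pd.date_range("2019-08-01", "2019-08-30", freq="D")
    rows = [[rng.choice(days), round(rng.uniform(900, 1000), 4)]
            for _ in range(n)]
    df = pd.DataFrame(rows, columns=["Date", "Price"])
    # Several draws can fall on the same date; keep their mean price
    return df.groupby("Date").mean()


if __name__ == "__main__":
    prices = generate_data(50, seed=1)
    print(prices.head(10))
    print(prices["Price"].describe())
```

Passing a fixed `seed` makes the generated table repeatable, which helps when the analysis report quotes specific values.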