
DELHI TECHNOLOGICAL UNIVERSITY

IT-205
PRACTICAL FILE

DATA SCIENCE AND VISUALIZATION

Submitted By: Vansh, 23/IT/169 (Group-2), 3rd Semester
Submitted To: Dr Abhishek Verma

Delhi Technological University
INDEX

S.No | Experiment | Date | Sign
1 | Familiarize with Python software (Fibonacci numbers, sorting a list of numbers) | 22-08-2024 |
2 | Write a program to load a dataset from the UCI repository into Python workspace and print its dimensions. Also, load the target or class variable and print its dimensions. | 29-08-2024 |
3 | Write a program to clean the data by removing noisy data or outliers and solving missing value problems. | 05-09-2024 |
4 | Write a program to explore different data visualisation techniques. | 12-09-2024 |
5 | Write a program to perform statistical analysis of the data in a given dataset (mean, variance, standard deviation, median, mode). | 19-09-2024 |
6 | Write a program to perform a classification experiment on a dataset and its target or class variable (Naïve Bayes, Random Forest). | 26-09-2024 |
7 | Write a program to perform a regression experiment on a dataset (linear regression). | 03-10-2024 |
8 | Write a program to perform a clustering experiment on a dataset (K-means, Hierarchical agglomerative clustering). | 24-10-2024 |
9 | Write a program to perform time series analysis for a given dataset. | 24-10-2024 |
10 | Write a program to perform association rule mining for a given dataset. | 31-10-2024 |



PROGRAM 1
Objective: To familiarize with Python software (Fibonacci numbers, sorting a list of
numbers)

Code:
def fibonacci(n):
    # Generate the first n Fibonacci numbers.
    fib_sequence = [0, 1]
    while len(fib_sequence) < n:
        fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])
    return fib_sequence[:n]  # Slicing handles n < 2 correctly.

n = 10
print(f"First {n} Fibonacci numbers: {fibonacci(n)}")

def bubble_sort(arr):
    # Repeatedly swap adjacent out-of-order elements; the largest
    # remaining value "bubbles" to the end of the list on each pass.
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

arr = [64, 34, 25, 12, 22, 11, 90]
sorted_arr = bubble_sort(arr)
print("Sorted array:", sorted_arr)

Output:



PROGRAM 2
Objective: Write a program to load a dataset from the UCI repository into the Python workspace and print its dimensions. Also, load the target or class variable and print its dimensions.

Code:
from ucimlrepo import fetch_ucirepo

# Fetch the Rice (Cammeo and Osmancik) dataset by its UCI repository id.
rice_dataset = fetch_ucirepo(id=545)

# Features and targets are returned as pandas DataFrames.
data = rice_dataset.data.features
target = rice_dataset.data.targets

print("Data dimensions:", data.shape)
print("Target dimensions:", target.shape)

Output:



PROGRAM 3
Objective: Write a program to clean the data by removing noisy data or outliers and solving missing value problems.

Code:
import pandas as pd
from ucimlrepo import fetch_ucirepo

rice_cammeo_and_osmancik = fetch_ucirepo(id=545)
X = rice_cammeo_and_osmancik.data.features
y = rice_cammeo_and_osmancik.data.targets

df = X.copy()
df['Target'] = y

# Fill missing values in the feature columns with each column's median.
df = df.fillna(df.drop(columns=['Target']).median())

numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns

def remove_outliers_iqr(df, numeric_columns):
    # Keep only rows whose numeric values fall within 1.5 * IQR of the quartiles.
    Q1 = df[numeric_columns].quantile(0.25)
    Q3 = df[numeric_columns].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df_cleaned = df[~((df[numeric_columns] < lower_bound) |
                      (df[numeric_columns] > upper_bound)).any(axis=1)]
    return df_cleaned

df_cleaned = remove_outliers_iqr(df, numeric_columns)

print(f"\nOriginal dataset size: {df.shape[0]}")
print(f"Cleaned dataset size: {df_cleaned.shape[0]}")
Output:



PROGRAM 4
Objective: Write a program to explore different data visualization techniques.
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Iris dataset directly from the UCI repository.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
dataset = pd.read_csv(url, names=column_names)

# Scatter plot: relationship between two features, colored by class.
plt.figure(figsize=(8, 6))
sns.scatterplot(x='sepal_length', y='sepal_width', hue='class', data=dataset)
plt.title("Scatter Plot of Sepal Length vs Sepal Width")
plt.show()

# Histogram: distribution of a single feature.
plt.figure(figsize=(8, 6))
dataset['sepal_length'].hist(bins=20)
plt.title("Histogram of Sepal Length")
plt.xlabel("Sepal Length")
plt.ylabel("Frequency")
plt.show()

# Box plot: per-class spread and outliers of a feature.
plt.figure(figsize=(8, 6))
sns.boxplot(x='class', y='sepal_length', data=dataset)
plt.title("Box Plot of Sepal Length by Class")
plt.show()

# Pair plot: pairwise feature relationships across all classes.
sns.pairplot(dataset, hue='class')
plt.show()
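A correlation heatmap is another technique worth exploring; a minimal sketch assuming the dataset DataFrame from above:

# Heatmap of pairwise feature correlations (class column excluded).
plt.figure(figsize=(6, 5))
sns.heatmap(dataset.drop(columns=['class']).corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()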
Output:

PROGRAM 5
Objective: Write a program to perform statistical analysis of the data in a given dataset
(mean, variance, standard deviation, median, mode).
Code:
import pandas as pd
from ucimlrepo import fetch_ucirepo

rice_cammeo_and_osmancik = fetch_ucirepo(id=545)
X = rice_cammeo_and_osmancik.data.features
y = rice_cammeo_and_osmancik.data.targets

df = pd.DataFrame(X)
df['Target'] = y

def statistical_analysis(dataframe, feature):
    # Collect the basic descriptive statistics for one feature column.
    analysis = {}
    analysis['Mean'] = dataframe[feature].mean()
    analysis['Variance'] = dataframe[feature].var()
    analysis['Standard Deviation'] = dataframe[feature].std()
    analysis['Median'] = dataframe[feature].median()
    analysis['Mode'] = dataframe[feature].mode()[0]
    return analysis

feature_to_analyze = 'Area'
stats = statistical_analysis(df, feature_to_analyze)
print(f'Statistical Analysis for {feature_to_analyze}:')
for stat, value in stats.items():
    print(f'{stat}: {value}')
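For a quick cross-check, pandas computes most of these in one call; a short sketch assuming the df and feature_to_analyze from above (note that describe() does not report mode or variance):

# One-line summary of all numeric columns, plus the same stats for one feature.
print(df.describe())
print(df[feature_to_analyze].agg(['mean', 'var', 'std', 'median']))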
Output:



PROGRAM 6
Objective: Write a program to perform a classification experiment on a dataset and its target or class variable (Naïve Bayes, Random Forest).
Code:
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import load_iris
import numpy as np

def load_data():
    data = load_iris()
    X = data.data
    y = data.target
    return X, y

def naive_bayes_classification(X, y):
    # Gaussian Naive Bayes evaluated with 5-fold cross-validation.
    nb_model = GaussianNB()
    scores = cross_val_score(nb_model, X, y, cv=5)

    print("\nNaive Bayes Classifier Results (5-Fold CV):")
    print(f"Mean Accuracy: {np.mean(scores)}")
    print("Accuracy per Fold:", scores)

    # Hold-out split for a detailed report on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    nb_model.fit(X_train, y_train)
    y_pred = nb_model.predict(X_test)

    print("Classification Report:\n", classification_report(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

def random_forest_classification(X, y):
    # Initialize the Random Forest classifier
    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(rf_model, X, y, cv=5)  # 5-fold cross-validation

    print("\nRandom Forest Classifier Results (5-Fold CV):")
    print(f"Mean Accuracy: {np.mean(scores)}")
    print("Accuracy per Fold:", scores)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    rf_model.fit(X_train, y_train)
    y_pred = rf_model.predict(X_test)

    print("Classification Report:\n", classification_report(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

def perform_classification_experiment():
    X, y = load_data()
    naive_bayes_classification(X, y)
    random_forest_classification(X, y)

perform_classification_experiment()
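As a follow-up, Random Forests expose per-feature importances; a minimal standalone sketch (variable names here are illustrative), refitting on the full iris data:

# Per-feature importances from a Random Forest fitted on all iris samples.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(iris.data, iris.target)
for name, importance in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")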

Output:



PROGRAM 7

Objective: Write a program to perform a regression experiment on a dataset (linear regression).


Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing

# Load the California Housing regression dataset.
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
Output:



PROGRAM 8

Objective: Write a program to perform a clustering experiment on a dataset (K-means, Hierarchical agglomerative clustering).

Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage

# Load and standardize the iris features (clustering is distance-based).
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# K-means with 3 clusters (n_init set explicitly for consistent behaviour
# across scikit-learn versions).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X_scaled)

# Hierarchical agglomerative clustering with Ward linkage.
hac = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')
hac_labels = hac.fit_predict(X_scaled)

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=kmeans_labels, cmap='viridis', marker='o')
plt.title("K-means Clustering")
plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])

plt.subplot(1, 2, 2)
Z = linkage(X_scaled, method='ward')
dendrogram(Z)
plt.title("Hierarchical Agglomerative Clustering (Dendrogram)")
plt.xlabel("Sample Index")
plt.ylabel("Distance")
plt.tight_layout()
plt.show()
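To compare the two clusterings quantitatively, the silhouette score is a common choice; a minimal sketch assuming X_scaled, kmeans_labels, and hac_labels from above:

# Silhouette score ranges from -1 to 1; higher means tighter, better-separated clusters.
from sklearn.metrics import silhouette_score

print("K-means silhouette:", silhouette_score(X_scaled, kmeans_labels))
print("HAC silhouette:", silhouette_score(X_scaled, hac_labels))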
Output:



PROGRAM 9

Objective: Write a program to perform time series analysis for a given dataset.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Step 1: Load the Household Power Consumption dataset from the UCI repository
# (infer_datetime_format is omitted; it is deprecated in recent pandas).
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip'
data = pd.read_csv(url, sep=';', parse_dates={'DateTime': ['Date', 'Time']},
                   na_values=['?'], low_memory=False)

# Convert to a datetime index and clean the data
data.set_index('DateTime', inplace=True)
data = data[['Global_active_power']].astype(float)
data.dropna(inplace=True)

# Resample to daily frequency; interpolate any fully missing days, since
# decomposition and forecasting below fail on NaN values
data = data.resample('D').mean()
data = data.interpolate()

# Step 2: Plot the time series data
plt.figure(figsize=(12, 6))
plt.plot(data, label="Global Active Power")
plt.title("Global Active Power Consumption")
plt.xlabel("Date")
plt.ylabel("Power (kW)")
plt.legend()
plt.show()

# Step 3: Decompose the time series to observe trend and seasonality
decomposition = seasonal_decompose(data, model='additive', period=365)  # assumes yearly seasonality
decomposition.plot()
plt.show()

# Step 4: Forecast using Exponential Smoothing
model = ExponentialSmoothing(data, trend="add", seasonal="add", seasonal_periods=365)
model_fit = model.fit()

# Forecast the next 30 days
forecast = model_fit.forecast(steps=30)

# Plot the forecasted values
plt.figure(figsize=(12, 6))
plt.plot(data, label="Historical Data")
plt.plot(forecast, label="Forecast", color="red")
plt.title("Forecast of Global Active Power Consumption")
plt.xlabel("Date")
plt.ylabel("Power (kW)")
plt.legend()
plt.show()
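A common preliminary step in time series analysis is a stationarity check with the augmented Dickey-Fuller test; a minimal sketch assuming the daily data from above:

# ADF test: a p-value below 0.05 suggests the series is stationary.
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(data['Global_active_power'].dropna())
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")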



Output:



PROGRAM 10

Objective: Write a program to perform association rule mining for a given dataset.
Code:
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Sample transactional dataset in one-hot form (or replace with your own dataset)
data = {'Milk':   [1, 1, 0, 1, 0],
        'Bread':  [1, 0, 1, 1, 1],
        'Butter': [0, 1, 1, 0, 1],
        'Cheese': [1, 0, 1, 1, 0],
        'Eggs':   [0, 1, 1, 0, 1]}

# Convert the dictionary into a DataFrame (boolean dtype avoids a mlxtend warning)
df = pd.DataFrame(data).astype(bool)

# Step 1: Generate frequent itemsets using the FP-Growth algorithm
frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)

# Step 2: Generate association rules filtered by lift
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)

# Display the results
print("Frequent Itemsets using FP-Growth:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)
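Real transaction data usually arrives as item lists rather than a one-hot table; mlxtend's TransactionEncoder handles the conversion. A minimal sketch with hypothetical transactions:

# Convert hypothetical raw transactions to the one-hot format fpgrowth expects.
from mlxtend.preprocessing import TransactionEncoder

transactions = [['Milk', 'Bread', 'Cheese'],
                ['Milk', 'Butter', 'Eggs'],
                ['Bread', 'Butter', 'Cheese', 'Eggs']]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
print(onehot)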
Output:
