Data Mining & Data Science Practical Slips

Uploaded by

ag8411877


Slip 1

Q.1 Write an R program to calculate the multiplication table using a function.

[15]
Solution:-

# Print the multiplication table of n (n x 1 up to n x 10)
multiplication_table <- function(n) {
  for (i in 1:10) {
    cat(n, "x", i, "=", n * i, "\n")
  }
}
n <- as.integer(readline("Enter the number: "))
cat("Multiplication table of", n, ":\n")
multiplication_table(n)

Q.2 Write a Python program to convert the categorical values in a given dataset
into numeric format.
[15]

Solution:-
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample dataset
data = {
    'Category': ['A', 'B', 'A', 'C', 'B', 'A']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Initialize the LabelEncoder
label_encoder = LabelEncoder()
# Apply label encoding to the 'Category' column
df['Category_encoded'] = label_encoder.fit_transform(df['Category'])
print(df)
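To make the encoding concrete, here is a minimal plain-Python sketch of what label encoding does to the sample column: the distinct labels are sorted and each value is replaced by the position of its label. The names classes, mapping and decoded are illustrative only and are not part of the solution above.

```python
# Same sample column as in the solution
categories = ['A', 'B', 'A', 'C', 'B', 'A']

# Sorted distinct labels; scikit-learn's LabelEncoder also orders its classes this way
classes = sorted(set(categories))

# Map each label to its integer code
mapping = {label: code for code, label in enumerate(classes)}

# Encode the column, then decode it again to check the round trip
encoded = [mapping[value] for value in categories]
decoded = [classes[code] for code in encoded]

print(mapping)
print(encoded)
print(decoded)
```

The integer codes carry no real order, so they suit tree-based models or target labels better than models that treat the codes as magnitudes.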

Slip 2
Q.1 Consider the student data set. It can be downloaded from:
https://drive.google.com/open?id=1oakZCv7g3mlmCSdv9J8kdSaqO5_6dIOw
Write a program in Python to apply simple linear regression and find the
mean absolute error, mean squared error and root mean squared error.
[15]

Solution:-

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

data_set = pd.read_csv('student_scores.csv')
print(data_set)
# Target (Scores) and feature (Hours) as column vectors
y = data_set['Scores'].values.reshape(-1, 1)
X = data_set['Hours'].values.reshape(-1, 1)
print(X)
print(y)
print(X.shape)
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train)
print(X_test)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)
# Predicted score for 9.5 hours of study
score = regressor.predict([[9.5]])
print(score)
y_pred = regressor.predict(X_test)
print(y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(mae)
print(mse)
print(rmse)
print('Actual', y_test)
print('Predicted', y_pred)
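The three error measures can also be computed by hand, which shows what the scikit-learn functions report. The sketch below uses a small set of made-up actual and predicted scores (not taken from student_scores.csv); the function name regression_errors is hypothetical.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse

# Illustrative values only
actual = [20, 27, 69, 30, 62]
predicted = [17, 34, 75, 27, 60]

mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

RMSE is in the same unit as the target (marks here), which usually makes it easier to interpret than MSE.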

Q.2 Write an R program to reverse a number and also calculate the sum of
digits of that
number. [15]

Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}

cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Slip 3
Q.1 Write a Python program to convert the categorical values in a given dataset
into numeric format.
[15]
Solution:-

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample dataset
data = {
    'Category': ['A', 'B', 'A', 'C', 'B', 'A']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Initialize the LabelEncoder
label_encoder = LabelEncoder()
# Apply label encoding to the 'Category' column
df['Category_encoded'] = label_encoder.fit_transform(df['Category'])
print(df)

Q.2 Write an R program to create a data frame using two given vectors and
display the
duplicate elements. [15]

Solution:-
vector1 <- c(1,2,3,4,5,6,7,8,6,4)
vector2 <- c(1, 'B', 'C', 'D', 'E', 'D', 'F', 'G',2,3)
data<-data.frame(vector1,vector2)
duplicates =
data[duplicated(data$vector1)|duplicated(data$vector1,fromLast
=TRUE),]
cat("Original Data Frame:\n")
print(data)
cat("\nDuplicate Elements:\n")
print(duplicates)

Slip 4

Q.1 Write an R program to calculate the multiplication table using a function.
[15]

Solution:-

# Print the multiplication table of n (n x 1 up to n x 10)
multiplication_table <- function(n) {
  for (i in 1:10) {
    cat(n, "x", i, "=", n * i, "\n")
  }
}

n <- as.integer(readline("Enter the number: "))

cat("Multiplication table of", n, ":\n")

multiplication_table(n)

Q.2 Consider following dataset

weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
Use the Naïve Bayes algorithm to predict which class the tuple [0: Overcast, 2: Mild]
belongs to, that is, whether to play the sport or not.
[15]

Solution:-

from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

weather = ['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny','sunny','rainy','sunny','overcast','overcast','rainy']
temp = ['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mild','hot','mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
le = preprocessing.LabelEncoder()
# LabelEncoder assigns codes in alphabetical order: overcast=0, rainy=1, sunny=2
weather_encoded = le.fit_transform(weather)
print(weather_encoded)
# cool=0, hot=1, mild=2
temp_encoded = le.fit_transform(temp)
# No=0, Yes=1
label = le.fit_transform(play)
print("Temp:", temp_encoded)
print("Play:", label)
features = list(zip(weather_encoded, temp_encoded))
print(features)
model = GaussianNB()
model.fit(features, label)
# Predict for weather=0 (overcast) and temp=2 (mild)
predicted = model.predict([[0, 2]])
print("Predicted Value:", predicted)
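The same prediction can be checked by counting, which is how Naïve Bayes works for categorical features. The sketch below is a plain-Python illustration under stated assumptions: it uses frequency estimates without smoothing, whereas the GaussianNB model in the solution fits Gaussian distributions to the integer codes, so the two can give different probabilities. The function name naive_bayes_scores is hypothetical.

```python
from collections import Counter

weather = ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
           'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy']
temp = ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool',
        'mild', 'cool', 'mild', 'mild', 'mild', 'hot', 'mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

def naive_bayes_scores(query_weather, query_temp):
    """Return P(class) * P(weather | class) * P(temp | class) for each class."""
    class_counts = Counter(play)
    total = len(play)
    scores = {}
    for cls, count in class_counts.items():
        prior = count / total
        weather_hits = sum(1 for w, p in zip(weather, play)
                           if p == cls and w == query_weather)
        temp_hits = sum(1 for t, p in zip(temp, play)
                        if p == cls and t == query_temp)
        scores[cls] = prior * (weather_hits / count) * (temp_hits / count)
    return scores

scores = naive_bayes_scores('overcast', 'mild')
prediction = max(scores, key=scores.get)
print(scores)
print(prediction)
```

Every overcast day in the data is a "Yes" day, so the score for "No" is zero and the tuple is classified as "Yes".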

Slip 5
Q.1 Write a python program to find all null values in a given data set
and remove them.
(Download dataset from github.com)
[15]

Solution:-
# Setup: copy the diabetes dataset into the same folder (not the Jupyter folder), delete 2 or 3
# values where 0 is written (so that they become null values), rename the file to
# diabetes_null_values.csv and then copy it into your Jupyter folder.
import pandas as pd
# Load the dataset
df = pd.read_csv('diabetes_null_values.csv')
print(df)
# Display the number of null values in each column
null_counts = df.isnull().sum()
print("Null value counts:\n", null_counts)
# Remove rows with any null values
df_cleaned = df.dropna()
# Display the cleaned dataset
print("\nCleaned dataset:\n", df_cleaned)
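If the edited CSV file is not at hand, the same steps can be tried on a small in-memory table. This is only a sketch: the column names and values below are made up for illustration and are not taken from the diabetes dataset.

```python
import pandas as pd

# Hypothetical four-row table with two missing values
df = pd.DataFrame({
    'Glucose': [148, None, 183, 89],
    'BMI': [33.6, 26.6, None, 28.1],
})

# Count the nulls in each column
null_counts = df.isnull().sum()
print(null_counts)

# Drop every row that contains at least one null
df_cleaned = df.dropna()
print(df_cleaned)
```

dropna() returns a new DataFrame and leaves df unchanged, so the original data remains available for comparison.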
Q.2 Consider the student data set. It can be downloaded from:
https://drive.google.com/open?id=1oakZCv7g3mlmCSdv9J8kdSaqO5_6dIOw
Write a program in Python to apply simple linear regression and find the
mean absolute error, mean squared error and root mean squared error.
[15]
Solution:-
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

data_set = pd.read_csv('student_scores.csv')
print(data_set)
# Target (Scores) and feature (Hours) as column vectors
y = data_set['Scores'].values.reshape(-1, 1)
X = data_set['Hours'].values.reshape(-1, 1)
print(X)
print(y)
print(X.shape)
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train)
print(X_test)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)
# Predicted score for 9.5 hours of study
score = regressor.predict([[9.5]])
print(score)
y_pred = regressor.predict(X_test)
print(y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(mae)
print(mse)
print(rmse)
print('Actual', y_test)
print('Predicted', y_pred)

Slip 6

Q.1 Write a Python program to split a dataset into training and
testing sets. [15]

Solution:-
# numpy is used for mathematical operations.
# pandas is used to read .csv or Excel files and to select columns from a dataset.
# Scikit-learn (sklearn) is a Python library for implementing machine learning models
# and statistical modelling. It provides models for regression, classification and
# clustering, together with statistical tools for analysing these models.
# The encode() method of a Python string returns the encoded form of that string.
# The fit_transform() method fits a transformer to the data and transforms the data
# into a form more suitable for the model, in a single step.
# In iloc[:, :-1], ":" means all rows and ":-1" means every column except the last.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("play_tennis.csv")
print(dataset)
# Convert every categorical column to integer codes
le = preprocessing.LabelEncoder()
dataset['outlook'] = le.fit_transform(dataset.outlook)
dataset['temp'] = le.fit_transform(dataset.temp)
dataset['humidity'] = le.fit_transform(dataset.humidity)
dataset['wind'] = le.fit_transform(dataset.wind)
dataset['play'] = le.fit_transform(dataset.play)
# Features: all columns except the last
x = dataset.iloc[:, :-1].values
print(x)
# Target: column index 4
y = dataset.iloc[:, 4].values
print(y)
# Keep 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(x_train)
print(x_test)
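Because train_test_split shuffles the rows, each run of the solution gives a different split. The sketch below, on a made-up 10-row array rather than play_tennis.csv, shows two optional arguments that are often useful: random_state makes the split repeatable and stratify keeps the class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, 2 features, two balanced classes
features = np.arange(20).reshape(10, 2)
labels = np.array([0] * 5 + [1] * 5)

x_train, x_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.2,      # 2 of the 10 rows go to the test set
    random_state=42,    # fixed seed, so the split is reproducible
    stratify=labels,    # same class balance in train and test
)

print(x_train.shape, x_test.shape)
print(sorted(y_test.tolist()))
```

With a dataset as small as the play-tennis table, stratification helps avoid a test set that contains only one class.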

Q.2 Write a script in R to create a list of employees and perform the

following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
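Employee <- data.frame(
  eno = c(1, 2, 3),
  ename = c("Pratik", "Rohan", "Tushar"),
  sal = c(10000, 20000, 30000)
)
print(Employee)
# a. Display the names of the employees
print(Employee$ename)
# b. Add an employee at the end (binding a data frame row keeps eno and sal numeric)
new_data <- rbind(Employee, data.frame(eno = 4, ename = "XYZ", sal = 2000))
print(new_data)
# c. Remove the third element
data <- new_data[-3, ]
print(data)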
Slip 7
Q.1 Write an R program to create a data frame using two given vectors and
display the
duplicate elements.
[15]

Solution:-

vector1 <- c(1,2,3,4,5,6,7,8,6,4)
vector2 <- c(1, 'B', 'C', 'D', 'E', 'D', 'F', 'G',2,3)
data <- data.frame(vector1, vector2)
# Keep every row whose vector1 value occurs more than once
# (fromLast = TRUE also marks the first occurrence of each repeated value)
duplicates <- data[duplicated(data$vector1) | duplicated(data$vector1, fromLast = TRUE), ]
cat("Original Data Frame:\n")
print(data)
cat("\nDuplicate Elements:\n")
print(duplicates)
Q.2 Write a Python program to build a Decision Tree Classifier using the
Scikit-learn
package for the diabetes data set (download the dataset from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
[15]

Solution:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load the diabetes dataset (downloaded from the provided URL)
# dataset_url = 'https://raw.githubusercontent.com/uciml/pima-indians-diabetes-database/master/diabetes.csv'
df = pd.read_csv("diabetes.csv")
# Split features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Create and train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Slip 8

Q.1 Write a Python program to split a dataset into training and
testing sets. [15]

Solution:-
# numpy is used for mathematical operations.
# pandas is used to read .csv or Excel files and to select columns from a dataset.
# Scikit-learn (sklearn) is a Python library for implementing machine learning models
# and statistical modelling. It provides models for regression, classification and
# clustering, together with statistical tools for analysing these models.
# The encode() method of a Python string returns the encoded form of that string.
# The fit_transform() method fits a transformer to the data and transforms the data
# into a form more suitable for the model, in a single step.
# In iloc[:, :-1], ":" means all rows and ":-1" means every column except the last.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("play_tennis.csv")
print(dataset)
# Convert every categorical column to integer codes
le = preprocessing.LabelEncoder()
dataset['outlook'] = le.fit_transform(dataset.outlook)
dataset['temp'] = le.fit_transform(dataset.temp)
dataset['humidity'] = le.fit_transform(dataset.humidity)
dataset['wind'] = le.fit_transform(dataset.wind)
dataset['play'] = le.fit_transform(dataset.play)
# Features: all columns except the last
x = dataset.iloc[:, :-1].values
print(x)
# Target: column index 4
y = dataset.iloc[:, 4].values
print(y)
# Keep 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(x_train)
print(x_test)
Q.2 Consider following dataset
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
Use the Naïve Bayes algorithm to predict which class the tuple [0: Overcast, 2: Mild]
belongs to, that is, whether to play the sport or not.
[15]
Solution:-
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

weather = ['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny','sunny','rainy','sunny','overcast','overcast','rainy']
temp = ['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mild','hot','mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
le = preprocessing.LabelEncoder()
# LabelEncoder assigns codes in alphabetical order: overcast=0, rainy=1, sunny=2
weather_encoded = le.fit_transform(weather)
print(weather_encoded)
# cool=0, hot=1, mild=2
temp_encoded = le.fit_transform(temp)
# No=0, Yes=1
label = le.fit_transform(play)
print("Temp:", temp_encoded)
print("Play:", label)
features = list(zip(weather_encoded, temp_encoded))
print(features)
model = GaussianNB()
model.fit(features, label)
# Predict for weather=0 (overcast) and temp=2 (mild)
predicted = model.predict([[0, 2]])
print("Predicted Value:", predicted)

Slip 9

Q.1 Write an R program to reverse a number and also calculate the sum of
digits of that
number. [15]

Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0

while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Q.2 Write a Python program to read the dataset ("Iris.csv"), downloaded from
https://archive.ics.uci.edu/ml/datasets/iris, and apply the Apriori algorithm.
[15]
Solution:-
# Type each statement separately in Jupyter.
# Before importing the libraries, install the apyori package by typing the
# following command at the Command Prompt:
#     pip install apyori
# Then type the following program in Jupyter.
import pandas as pd
from apyori import apriori

dataset = pd.read_csv('Iris.csv')
print(dataset)
# Treat each of the 150 rows as one transaction made of 5 string items
transactions = []
for i in range(0, 150):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 5)])
rules = apriori(transactions=transactions, min_support=0.003,
                min_confidence=0.2, min_lift=3, min_length=2, max_length=2)
results = list(rules)
print(results)
for item in results:
    # item[0] is the itemset, item[1] its support, item[2] the rule statistics
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
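The three numbers printed for each rule follow from simple counting. The sketch below computes them in plain Python for a tiny made-up list of shopping transactions (not the Iris rows); the function names support, confidence and lift are illustrative helpers, not part of the apyori package.

```python
# Hypothetical transactions, each a set of items
transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk', 'butter'},
    {'milk'},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({'bread', 'milk'})
rule_confidence = confidence({'bread'}, {'milk'})
rule_lift = lift({'bread'}, {'milk'})
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this toy rule, means the two items occur together less often than independence would predict; the min_lift=3 setting in the solution keeps only strongly associated pairs.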

Slip 10

Q.1 Consider the following observations/data, apply simple linear regression and
find out the estimated coefficients b0 and b1. (Use the numpy package.)
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18]
[15]
Solution:-
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,11,13]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12,16, 18])
print(x)
print(y)
model = LinearRegression()
model.fit(x, y)
print('intercept:-',model.intercept_)
print('Slope:- ', model.coef_)
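Since the question asks for the numpy package, the coefficients can also be obtained directly from the least-squares formulas b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). The following is a minimal sketch of that calculation on the same data; the variable names are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18], dtype=float)

x_mean = x.mean()
y_mean = y.mean()

# Slope: sum of cross-deviations divided by sum of squared deviations of x
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means
b0 = y_mean - b1 * x_mean

print('b0 (intercept):', b0)
print('b1 (slope):', b1)
```

These values should agree with model.intercept_ and model.coef_ from the scikit-learn solution, up to floating-point rounding.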
Q.2 Write an R program to create a data frame using two given vectors and
display the
duplicate elements. [15]

Slip 11

Q.1 Write an R program to reverse a number and also calculate the sum
of digits of that
number. [15]
Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10

sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Q.2 Consider the following observations/data, apply simple linear regression and
find out the estimated coefficients b0 and b1. Also analyse the performance of the
model.
(Use the sklearn package.)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
[15]
Solution:-
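import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1,2,3,4,5,6,7,8]).reshape((-1, 1))
print(x)
y = np.array([7,14,15,18,19,21,26,23])
print(y)
model = LinearRegression()
model.fit(x, y)
# Estimated coefficients: b0 (intercept) and b1 (slope)
print('Intercept:- ', model.intercept_)
print('Slope:- ', model.coef_)
# Performance: coefficient of determination (R^2) on the fitted data
print('R^2:- ', model.score(x, y))
# Prediction for a new observation x = 9
x_new = np.array(9).reshape((-1, 1))
y_new_pred = model.predict(x_new)
print(y_new_pred)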

Slip 12

Q.1 Write a Python program to implement a multiple linear regression model
for a car
dataset. The dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[15]
Note: From the above link, copy the car data into an Excel file, save it
as .xls and then convert it to
.csv.
Solution:
import pandas
from sklearn import linear_model
df = pandas.read_csv("car.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])

print(predictedCO2)
Q.2 Write an R program to calculate the sum of two matrices of given
size. [15]

Solution:-
# Define a function to calculate the sum of two matrices
matrix_sum <- function(matrix1, matrix2) {
  # identical() returns a single TRUE/FALSE, which is what if() requires
  if (!identical(dim(matrix1), dim(matrix2))) {
    stop("Matrices must have the same dimensions for addition.")
  }
  result_matrix <- matrix1 + matrix2
  return(result_matrix)
}

# Input the size of the matrices
n_rows <- as.integer(readline("Enter the number of rows for the matrices: "))
n_cols <- as.integer(readline("Enter the number of columns for the matrices: "))

# Create the first matrix
cat("Enter values for the first matrix:\n")
matrix1 <- matrix(nrow = n_rows, ncol = n_cols)
for (i in 1:n_rows) {
  for (j in 1:n_cols) {
    matrix1[i, j] <- as.integer(readline(paste("Enter element at position [", i, ",", j, "]: ")))
  }
}

# Create the second matrix
cat("Enter values for the second matrix:\n")
matrix2 <- matrix(nrow = n_rows, ncol = n_cols)
for (i in 1:n_rows) {
  for (j in 1:n_cols) {
    matrix2[i, j] <- as.integer(readline(paste("Enter element at position [", i, ",", j, "]: ")))
  }
}

# Calculate the sum of the matrices
result <- matrix_sum(matrix1, matrix2)

# Print the result
cat("Sum of the two matrices:\n")
print(result)
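For comparison with the Python slips, the same dimension check and element-wise addition can be sketched with numpy. This is an illustrative equivalent, not part of the required R answer; the function name matrix_sum simply mirrors the R function above and the sample matrices are made up.

```python
import numpy as np

def matrix_sum(matrix1, matrix2):
    """Add two matrices element by element after checking their shapes."""
    a = np.asarray(matrix1)
    b = np.asarray(matrix2)
    if a.shape != b.shape:
        raise ValueError("Matrices must have the same dimensions for addition.")
    return a + b

# Hypothetical 2 x 2 inputs
result = matrix_sum([[1, 2], [3, 4]], [[10, 20], [30, 40]])
print(result)
```

The explicit shape check matters in numpy because broadcasting would otherwise silently add, for example, a 2 x 2 matrix and a 1 x 2 row.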

Slip 13
Q.1 Write a Python program to implement a multiple linear regression model
for the stock market data frame given below:
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719] }
Also draw a graph of stock index price versus interest rate.
[15]

Solution:-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Every list holds 24 values (12 months of 2017 followed by 12 months of 2016)
data = {'year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
        'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
        'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
        'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
        'index_price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]}
df = pd.DataFrame(data)
print(df)
x = df[['interest_rate', 'unemployment_rate']]
print(x)
y = df['index_price']
print(y)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(X_train)
print(X_test)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
y_pred = regr.predict(X_test)
print(y_pred)
# R^2 score on the test set, expressed as a percentage
accuracy = r2_score(y_test, y_pred) * 100
print(accuracy)
# Graph of stock index price versus interest rate, as the question asks
plt.scatter(df['interest_rate'], df['index_price'])
plt.xlabel('Interest rate')
plt.ylabel('Stock index price')
plt.show()
# Actual versus predicted index price on the test set
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')
sns.regplot(x=y_test, y=y_pred, ci=None, color='red')
plt.show()
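Q.2 Write an R program to concatenate two given factors.
[15]

Solution:-
data1 <- c("ABC","PQR","XYZ")
data2 <- c(1,2,3)
factor1 <- factor(data1)
factor2 <- factor(data2)
print(factor1)
print(factor2)
# Combine the labels as characters and rebuild the factor. This works on every
# R version, whereas c() on factors returned integer codes before R 4.1.
concatenated <- factor(c(as.character(factor1), as.character(factor2)))
print(concatenated)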

Slip 14

Q.1 Write a script in R to create a list of employees and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
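Employee <- data.frame(
  eno = c(1, 2, 3),
  ename = c("Pratik", "Rohan", "Tushar"),
  sal = c(10000, 20000, 30000)
)
print(Employee)
# a. Display the names of the employees
print(Employee$ename)
# b. Add an employee at the end (binding a data frame row keeps eno and sal numeric)
new_data <- rbind(Employee, data.frame(eno = 4, ename = "XYZ", sal = 2000))
print(new_data)
# c. Remove the third element
data <- new_data[-3, ]
print(data)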

Q.2 Consider following observations/data. And apply simple linear

Slip 15
Q.1 Write an R program to add, multiply and divide two vectors of integer
type. (The vector
length should be at least 4.)
[15]
Solution: -
vector1<-c(1,2,3,4,5)
vector2<-c(6,7,8,9,10)
Addition<- vector1+vector2
print(Addition)
Multiplication<-vector1*vector2
print(Multiplication)
Division<-vector1/vector2
print(Division)
Q.2 Write a Python program to build a Decision Tree Classifier using the Scikit-learn
package for the diabetes data set (download the dataset from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
[15]
Solution:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load the diabetes dataset (downloaded from the provided URL)
# dataset_url = 'https://raw.githubusercontent.com/uciml/pima-indians-diabetes-database/master/diabetes.csv'
df = pd.read_csv("diabetes.csv")
# Split features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Create and train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Slip 16
Q.1 Write a Python program to implement a multiple linear regression model
for a car
dataset. The dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[15]

Note: From the above link, copy the car data into an Excel file, save it
as .xls and then convert it to
.csv.
Solution:
import pandas
from sklearn import linear_model
df = pandas.read_csv("car.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Q.2 Write a script in R to create a list of employees and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
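Employee <- data.frame(
  eno = c(1, 2, 3),
  ename = c("Pratik", "Rohan", "Tushar"),
  sal = c(10000, 20000, 30000)
)
print(Employee)
# a. Display the names of the employees
print(Employee$ename)
# b. Add an employee at the end (binding a data frame row keeps eno and sal numeric)
new_data <- rbind(Employee, data.frame(eno = 4, ename = "XYZ", sal = 2000))
print(new_data)
# c. Remove the third element
data <- new_data[-3, ]
print(data)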
Slip 17

Q.1 Write a python program to implement k-means algorithms on a

synthetic dataset.

[15]
/* Write all the coding in Single ….. in Jupyter */
Solution :
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# data[0] holds the 300 x 2 feature matrix and data[1] the true blob labels
data = make_blobs(n_samples=300, n_features=2, centers=5,
                  cluster_std=1.8, random_state=101)
print(data[0].shape)
print(data[1])
plt.scatter(data[0][:,0], data[0][:,1], c=data[1], cmap='brg')
kmeans = KMeans(n_clusters=5)
kmeans.fit(data[0])
print(kmeans.cluster_centers_)
print(kmeans.labels_)
# Side-by-side comparison of the K-means clusters and the original blobs
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10,6))
ax1.set_title('K Means')
ax1.scatter(data[0][:,0], data[0][:,1], c=kmeans.labels_, cmap='brg')
ax2.set_title("Original")
ax2.scatter(data[0][:,0], data[0][:,1], c=data[1], cmap='brg')
plt.show()
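The two scatter plots compare the clusterings by eye. Because the synthetic data comes with its true labels, the agreement can also be summarised as a number. The sketch below uses the adjusted Rand index from scikit-learn as one possible measure; the n_init and random_state settings on KMeans are assumptions added here for repeatability and are not part of the solution above.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Same synthetic data as in the solution
features, true_labels = make_blobs(n_samples=300, n_features=2, centers=5,
                                   cluster_std=1.8, random_state=101)

# n_init and random_state are fixed so that repeated runs give the same result
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
predicted_labels = kmeans.fit_predict(features)

# 1.0 means identical groupings; values near 0 mean chance-level agreement
ari = adjusted_rand_score(true_labels, predicted_labels)
print(ari)
```

The index ignores how the clusters are numbered, so it does not matter that K-means assigns label values different from those of make_blobs.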
Q.2 Write an R program to sort a list of strings in ascending and descending
order.

[15]
Solution:-
list<-c("apple","banana","Pineapple","mango","Orange")
asc<-sort(list)
print(asc)
desc<-sort(list,decreasing = TRUE)
print(desc)
Slip 18

Q.1 Write an R program to reverse a number and also calculate the sum
of digits of that
number. [15]
Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Q.2 Write a python program to implement hierarchical Agglomerative
clustering algorithm. (Download Customer.csv dataset from github.com).
[15]
Solution :
import matplotlib.pyplot as mtp
import pandas as pd
import scipy.cluster.hierarchy as shc
from sklearn.cluster import AgglomerativeClustering

dataset = pd.read_csv('Mall_Customers.csv')
# Columns 3 and 4: Annual Income (k$) and Spending Score (1-100)
x = dataset.iloc[:, [3, 4]].values
# Dendrogram built with Ward linkage
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
# Ward linkage always uses Euclidean distance, so the affinity argument
# (deprecated and later removed in newer scikit-learn releases) is omitted
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred = hc.fit_predict(x)
# One scatter call per cluster, each in its own colour
colors = ['blue', 'green', 'red', 'cyan', 'magenta']
for k, color in enumerate(colors):
    mtp.scatter(x[y_pred == k, 0], x[y_pred == k, 1], s=100, c=color,
                label='Cluster ' + str(k + 1))
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
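The dendrogram and the cluster labels come from the same merge tree, and SciPy can cut that tree into a chosen number of clusters directly. The sketch below illustrates this on made-up two-dimensional points (two well-separated groups generated with numpy), not on Mall_Customers.csv; the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two tight groups of 10 points, centred at 0 and at 10
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(10, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(10, 2))
points = np.vstack([group_a, group_b])

# Build the merge tree with Ward linkage, as in the solution
merge_tree = linkage(points, method="ward")

# Cut the tree so that at most 2 clusters remain
labels = fcluster(merge_tree, t=2, criterion="maxclust")
print(labels)
```

In practice the number of clusters is read off the dendrogram by looking for the tallest vertical gap that no horizontal merge line crosses.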
Slip 19

Q.1 Write a python program to implement k-means algorithm to build

prediction model
(Use Credit Card Dataset CC GENERAL.csv Download from kaggle.com)
[15]
Solution :
import matplotlib.pyplot as mtp
import pandas as pd
from sklearn.cluster import KMeans

dataset = pd.read_csv('creditcard.csv')
print(dataset)
# Use the columns at index 3 and 4 as the two clustering features
x = dataset.iloc[:, [3, 4]].values
print(x)
# Elbow method: within-cluster sum of squares (WCSS) for k = 1 to 10
wcss_list = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()
# Final model with 3 clusters
kmeans = KMeans(n_clusters=3, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)
# One scatter call per cluster, each in its own colour
colors = ['blue', 'green', 'red']
for k, color in enumerate(colors):
    mtp.scatter(x[y_predict == k, 0], x[y_predict == k, 1], s=100, c=color,
                label='Cluster ' + str(k + 1))
# Cluster centres
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroid')
mtp.title('Clusters of Credit Card')
mtp.xlabel('V3')
mtp.ylabel('V4')
mtp.legend()
mtp.show()

Q.2 Write a script in R to create a list of employees and perform the

following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
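Solution:-
Employee <- data.frame(
  eno = c(1, 2, 3),
  ename = c("Pratik", "Rohan", "Tushar"),
  sal = c(10000, 20000, 30000)
)
print(Employee)
# a. Display the names of the employees
print(Employee$ename)
# b. Add an employee at the end (binding a data frame row keeps eno and sal numeric)
new_data <- rbind(Employee, data.frame(eno = 4, ename = "XYZ", sal = 2000))
print(new_data)
# c. Remove the third element
data <- new_data[-3, ]
print(data)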

Slip 20

Q.1 Write a python program to implement hierarchical clustering

algorithm. (Download
Wholesale customers data dataset from github.com).
[15]
Solution:-
import matplotlib.pyplot as mtp
import pandas as pd
import scipy.cluster.hierarchy as shc
from sklearn.cluster import AgglomerativeClustering

dataset = pd.read_csv('Wholesale customers data.csv')
print(dataset)
# Columns 3 and 4: Milk and Grocery
x = dataset.iloc[:, [3, 4]].values
print(x)
# Dendrogram built with Ward linkage
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
# Ward linkage always uses Euclidean distance, so the affinity argument
# (deprecated and later removed in newer scikit-learn releases) is omitted
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred = hc.fit_predict(x)
# One scatter call per cluster, each in its own colour
colors = ['blue', 'green', 'red', 'cyan', 'magenta']
for k, color in enumerate(colors):
    mtp.scatter(x[y_pred == k, 0], x[y_pred == k, 1], s=100, c=color,
                label='Cluster ' + str(k + 1))
mtp.title('Clusters of customers')
mtp.xlabel('Milk')
mtp.ylabel('Grocery')
mtp.legend()
mtp.show()
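Q.2 Write an R program to concatenate two given factors.
[15]
Solution:-
data1 <- c("ABC","PQR","XYZ")
data2 <- c(1,2,3)
factor1 <- factor(data1)
factor2 <- factor(data2)
print(factor1)
print(factor2)
# Combine the labels as characters and rebuild the factor. This works on every
# R version, whereas c() on factors returned integer codes before R 4.1.
concatenated <- factor(c(as.character(factor1), as.character(factor2)))
print(concatenated)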

No ratings yet
14401172022_tanu raman ml lab file
21 pages
23BCE7199 ML Lab Assignment[1]
No ratings yet
23BCE7199 ML Lab Assignment[1]
15 pages
Programs Lab Bca
No ratings yet
Programs Lab Bca
16 pages
ml lab
No ratings yet
ml lab
14 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Introduction To User Studies
No ratings yet
Introduction To User Studies
52 pages
EDUCATIONAL PSYCHOLOGY-FINAL Exam
50% (2)
EDUCATIONAL PSYCHOLOGY-FINAL Exam
4 pages
Lesson Plan History of Manicure
No ratings yet
Lesson Plan History of Manicure
2 pages
Food Processing DLL 1 Week 2
No ratings yet
Food Processing DLL 1 Week 2
6 pages
National Flag-The Tricolour: Quiz Time
No ratings yet
National Flag-The Tricolour: Quiz Time
5 pages
Ame101 F17 PS2
No ratings yet
Ame101 F17 PS2
2 pages
Ssic Lecture Notes Modules 1 To 5 Ordered and Edited
No ratings yet
Ssic Lecture Notes Modules 1 To 5 Ordered and Edited
96 pages
CHE 110 Molecular Orbital Practice Problems Answers PDF
No ratings yet
CHE 110 Molecular Orbital Practice Problems Answers PDF
3 pages
Analysis of Time-Dependent Deformation of A CFRP Mirror Under Hot and Humid Conditions
No ratings yet
Analysis of Time-Dependent Deformation of A CFRP Mirror Under Hot and Humid Conditions
16 pages
2025_EPAM1514_Study guide_Reader(1)
No ratings yet
2025_EPAM1514_Study guide_Reader(1)
78 pages
Schema
No ratings yet
Schema
1 page
Isaca CISM: Practice Exam
No ratings yet
Isaca CISM: Practice Exam
37 pages
Hse Law
No ratings yet
Hse Law
43 pages
Swimming Pool
No ratings yet
Swimming Pool
66 pages
Viva Questions .. (12th) Term 2
No ratings yet
Viva Questions .. (12th) Term 2
7 pages
BofA MM Outlook
No ratings yet
BofA MM Outlook
77 pages
+1 Basic Concepts of Chemistry
No ratings yet
+1 Basic Concepts of Chemistry
16 pages
IT Report Springboard
No ratings yet
IT Report Springboard
40 pages
Module 6 - Good Manners and Right Conduct
No ratings yet
Module 6 - Good Manners and Right Conduct
7 pages
TS311 Week 2 - Sustainable Tourism - Unit 2 S02 - 2020 - Compressed PDF
No ratings yet
TS311 Week 2 - Sustainable Tourism - Unit 2 S02 - 2020 - Compressed PDF
57 pages
CUET-UG - 2024 Ip
No ratings yet
CUET-UG - 2024 Ip
3 pages
Ga Ing05 Ci Vi Unido
No ratings yet
Ga Ing05 Ci Vi Unido
68 pages
July 25, 2022
No ratings yet
July 25, 2022
8 pages
5 Sense Properties, Stereotype, & Sense Relation
No ratings yet
5 Sense Properties, Stereotype, & Sense Relation
62 pages
Where can buy Effective Leadership and Organization’s Market Success 1st Edition Ila Sharma ebook with cheap price
100% (1)
Where can buy Effective Leadership and Organization’s Market Success 1st Edition Ila Sharma ebook with cheap price
55 pages
Asking Questions - TeachingEnglish - British Council - BBC
No ratings yet
Asking Questions - TeachingEnglish - British Council - BBC
4 pages
USP 1117 Microbiological Best Laboratory Practices - Draft
No ratings yet
USP 1117 Microbiological Best Laboratory Practices - Draft
11 pages
Faktor-Faktor Yang Memotivasi Minat Mahasiswa Dalam Berwirausaha Setelah Mendapatkan Materi Kwu
No ratings yet
Faktor-Faktor Yang Memotivasi Minat Mahasiswa Dalam Berwirausaha Setelah Mendapatkan Materi Kwu
10 pages
ADSP Lab Manual
No ratings yet
ADSP Lab Manual
15 pages
Full Download Operational Energy 1st Edition Alan Howard PDF DOCX
100% (1)
Full Download Operational Energy 1st Edition Alan Howard PDF DOCX
67 pages