0% found this document useful (0 votes)
15 views

Data Mining & Data Science Practical Slips

Data Mining & Data Science Practical Slips (1)

Uploaded by

ag8411877
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

Data Mining & Data Science Practical Slips

Data Mining & Data Science Practical Slips (1)

Uploaded by

ag8411877
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Slip 1

Q.1 Write a R program to calculate the multiplication table using a function.


[15]
Solution:-

multiplication_table <- function(n) {


table <- matrix(0, nrow = n, ncol = n)
for (i in 1:10) {
cat(i*n,"\n")
}
}
n <- as.integer(readline("Enter the number: "))
cat("Multiplication Table : of ", n,":\n")
multiplication_table(n)

Q.2 Write a python program the Categorical values in numeric format for a
given dataset.
[15]

Solution:-
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset

1
data = {
'Category': ['A', 'B', 'A', 'C', 'B', 'A']
}
# Creating a DataFrame
df = pd.DataFrame(data)
# Initialize the LabelEncoder
label_encoder = LabelEncoder()
# Apply label encoding to the 'Category' column
df ['Category_encoded'] = label_encoder.fit_transform(df
['Category'])
print(df)

Slip 2
Q.1 Consider the student data set It can be downloaded from:
https://drive.google.com/open?id=1oakZCv7g3mlmCSdv9J8kdSaqO5_6dIOw
Write a programme in python to apply simple linear regression and find out
mean
absolute error, mean squared error and root mean squared error.
[15]

Solution:-

import numpy as nm
import pandas as pd
2
data_set= pd.read_csv('student_scores.csv')
print(data_set)
y = data_set['Scores'].values.reshape(-1, 1)
X = data_set['Hours'].values.reshape(-1, 1)
print(X)
print(y)
print(X.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =
0.2)
print(X_train)
print(X_test)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)
score = regressor.predict([[9.5]])
print(score)
y_pred = regressor.predict(X_test)
print(y_pred)
from sklearn.metrics import mean_absolute_error,
mean_squared_error

3
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = nm.sqrt(mse)
print(mae)
print(mse)
print(rmse)
print('Actual',y_test)
print('Predicted',y_pred)

Q.2 Write a R program to reverse a number and also calculate the sum of
digits of that
number. [15]

Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}

4
cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Slip 3
Q.1 Write a python program the Categorical values in numeric format for a
given dataset.
[15]
Solution:-

import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
'Category': ['A', 'B', 'A', 'C', 'B', 'A']
}
# Creating a DataFrame
df = pd.DataFrame(data)

5
# Initialize the LabelEncoder
label_encoder = LabelEncoder()
# Apply label encoding to the 'Category' column
df ['Category_encoded'] = label_encoder.fit_transform(df
['Category'])
print(df)

Q.2 Write a R program to create a data frame using two given vectors and
display the
duplicate elements [15]

Solution:-
vector1 <- c(1,2,3,4,5,6,7,8,6,4)
vector2 <- c(1, 'B', 'C', 'D', 'E', 'D', 'F', 'G',2,3)
data<-data.frame(vector1,vector2)
duplicates =
data[duplicated(data$vector1)|duplicated(data$vector1,fromLast
=TRUE),]
cat("Original Data Frame:\n")
print(data)
cat("\nDuplicate Elements:\n")
print(duplicates)

Slip 4

6
Q.1 Write a R program to calculate the multiplication table using a function.
[15]

Solution:-

multiplication_table <- function(n) {


table <- matrix(0, nrow = n, ncol = n)

for (i in 1:10) {
cat(i*n,"\n")
}

n <- as.integer(readline("Enter the number: "))

cat("Multiplication Table : of ", n,":\n")

multiplication_table(n)

Q.2 Consider following dataset


7
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','S
unny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mi
ld','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Y
es','No']. Use Naïve Bayes algorithm to predict[ 0:Overcast, 2:Mild]
tuple belongs to which class whether to play the sports or not.
[15]

Solution:-

weather=['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny'
,'sunny','rainy','su
nny','overcast','overcast','rainy']
temp=['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mi
ld','hot','mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes',
'No']
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
wheather_encoded = le.fit_transform(weather)
print(wheather_encoded)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Temp:",temp_encoded)
print("Play:",label)
features = list(zip(wheather_encoded,temp_encoded))
print(features)
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(features,label)
predicted = model.predict([[0,2]])
print("Predicted Value:",predicted)

Slip 5
Q.1 Write a python program to find all null values in a given data set
and remove them.
(Download dataset from github.com)
[15]

8
Solution:-
/* For this copy and paste diabetes dataset in same folder (not in
jupyter folder) , delete 2 or 3
values where 0 is written (means now it becomes null values) ,
rename it as diabetes_null_values and
then copy and paste in ur jupyter folder */
import pandas as pd
# Load the dataset
df =pd.read_csv('diabetes_null_values.csv')
print(df)
# Display the number of null values in each column
null_counts = df.isnull().sum()
print("Null value counts:\n", null_counts)
# Remove rows with any null values
df_cleaned = df.dropna()
# Display the cleaned dataset
print("\nCleaned dataset:\n", df_cleaned)
Q.2 Consider the student data set It can be downloaded from:
https://drive.google.com/open?id=1oakZCv7g3mlmCSdv9J8kdSaqO5_6dIOw
Write a programme in python to apply simple linear regression and find out
mean
absolute error, mean squared error and root mean squared error.
[15]
Solution:-
import numpy as nm

9
import pandas as pd
data_set= pd.read_csv('student_scores.csv')
print(data_set)
y = data_set['Scores'].values.reshape(-1, 1)
X = data_set['Hours'].values.reshape(-1, 1)
print(X)
print(y)
print(X.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =
0.2)
print(X_train)
print(X_test)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)
score = regressor.predict([[9.5]])
print(score)
y_pred = regressor.predict(X_test)
print(y_pred)

10
from sklearn.metrics import mean_absolute_error,
mean_squared_error
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = nm.sqrt(mse)
print(mae)
print(mse)
print(rmse)
print('Actual',y_test)
print('Predicted',y_pred)

Slip 6

Q.1 Write a python program to splitting the dataset into training and
testing set. [15]

Solution:-
(
// numpy for mathematical operations
// pandas to use .csv or .xl file, or to import column from dataset
// Scikit-Learn, also known as sklearn is a python library to
implement machine learning models
and statistical modelling. Through scikit-learn, we can implement
various machine learning

11
models for regression, classification, clustering, and statistical
tools for analyzing these models.
// The encode() function in Python is responsible for returning the
encoded form of any given
string
// The fit_transform () method is used to fit the data into a model
and transform it into a form
that is more suitable for the model in a single step.
//: means all row, : -1 means excluding last column
)
Solution:
import numpy as np
import pandas as pd
dataset = pd.read_csv("play_tennis.csv")
dataset
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
dataset['outlook'] = le.fit_transform(dataset.outlook)
dataset['temp'] = le.fit_transform(dataset.temp)
dataset['humidity'] = le.fit_transform(dataset.humidity)
dataset['wind'] = le.fit_transform(dataset.wind)
dataset['play'] = le.fit_transform(dataset.play)
x=dataset.iloc[:,:-1].values

12
print(x)
y=dataset.iloc[:,4].values
print(y)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(x,y,test_size=0.2)
print(x_train)
print(x_test)

Q.2 Write a script in R to create a list of employees and perform the


following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
Employee<-data.frame(
eno=c(1,2,3),
ename=c("Pratik","Rohan","Tushar"),
sal=c(10000,20000,30000)
)
print(Employee)
new_data<-rbind(Employee,c(4,"XYZ",2000))
print(new_data)
data<-new_data[-3,]
print(data)
13
Slip 7
Q.1 Write a R program to create a data frame using two given vectors and
display the
duplicate elements.
[15]

Solution:-

vector1 <- c(1,2,3,4,5,6,7,8,6,4)


vector2 <- c(1, 'B', 'C', 'D', 'E', 'D', 'F', 'G',2,3)
data<-data.frame(vector1,vector2)
# 11.Write a R program to create a data frame using two given
vectors and display the duplicate
duplicates =
data[duplicated(data$vector1)|duplicated(data$vector1,fromLast
=TRUE),]
cat("Original Data Frame:\n")
print(data)
cat("\nDuplicate Elements:\n")
print(duplicates)
Q.2 Write a Python program build Decision Tree Classifier using
Scikit-learn
package for diabetes data set (download database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
[15]

14
Solution:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Load the diabetes dataset (downloaded from the provided URL)
#dataset_url = 'https://raw.githubusercontent.com/uciml/pima-
indians-
#diabetes-database/master/diabetes.csv'
df=pd.read_csv("diabetes.csv")
# Split features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=42)
# Create and train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy

15
accuracy = accuracy_score(y_test, y_pred)
print ("Accuracy:", accuracy)

Slip 8

Q.1 Write a python program to splitting the dataset into training and
testing set. [15]

Solution:--
(
// numpy for mathematical operations
// pandas to use .csv or .xl file, or to import column from dataset
// Scikit-Learn, also known as sklearn is a python library to
implement machine learning models
and statistical modelling. Through scikit-learn, we can implement
various machine learning
models for regression, classification, clustering, and statistical
tools for analyzing these models.
// The encode() function in Python is responsible for returning the
encoded form of any given
string
// The fit_transform () method is used to fit the data into a model
and transform it into a form
that is more suitable for the model in a single step.
//: means all row, : -1 means excluding last column

16
)
Solution:
import numpy as np
import pandas as pd
dataset = pd.read_csv("play_tennis.csv")
dataset
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
dataset['outlook'] = le.fit_transform(dataset.outlook)
dataset['temp'] = le.fit_transform(dataset.temp)
dataset['humidity'] = le.fit_transform(dataset.humidity)
dataset['wind'] = le.fit_transform(dataset.wind)
dataset['play'] = le.fit_transform(dataset.play)
x=dataset.iloc[:,:-1].values
print(x)
y=dataset.iloc[:,4].values
print(y)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(x,y,test_size=0.2)
print(x_train)
print(x_test)
Q.2 Consider following dataset
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','S

17
unny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mi
ld','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Y
es','No']. Use Naïve Bayes algorithm to predict[ 0:Overcast, 2:Mild]
tuple belongs to which class whether to play the sports or not.
[15]
Solution:-
weather=['sunny','sunny','overcast','rainy','rainy','rainy','overcas
t','sunny','sunny','rainy','su
nny','overcast','overcast','rainy']
temp=['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild',
'mild','mild','hot','mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','
Yes','Yes','No']
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
wheather_encoded = le.fit_transform(weather)
print(wheather_encoded)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Temp:",temp_encoded)
print("Play:",label)
features = list(zip(wheather_encoded,temp_encoded))
print(features)
from sklearn.naive_bayes import GaussianNB
18
model = GaussianNB()
model.fit(features,label)
predicted = model.predict([[0,2]])
print("Predicted Value:",predicted)

Slip 9

Q.1 Write a R program to reverse a number and also calculate the sum of
digits of that
number. [15]

Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0

19
while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Q.2 Write a Python Programme to read the dataset (“Iris.csv”). dataset


download from
(https://archive.ics.uci.edu/ml/datasets/iris) and apply Apriori algorithm.
[15]
Solution:-
/* Pls type each line separately in Jupyter */
/** Before importing the libraries, we will use the below line of
code to install the apyori
package to use further **/
pip install apyori /**Type this command on Command Prompt
**/
/** Type the following program in Jupyter **/
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('Iris.csv')

20
dataset
transactions=[]
for i in range(0, 150): transactions.append([str(dataset.values[i,j])
for j in range(0,5)])
from apyori import apriori
rules= apriori(transactions= transactions, min_support=0.003,
min_confidence = 0.2,
min_lift=3, min_length=2, max_length=2)
results= list(rules)
results
for item in results:pair = item[0] , item = [x for x in pair]
print("Rule: " + item[0] + " -> " + item[1])
print("Support: " + str(item[1]))
print("Confidence: " + str(item[2][0][2]))
print("Lift: " + str(item[2][0][3]))
print("=====================================")

Slip 10

Q.1 Consider following observations/data. And apply simple linear regression


and find
out estimated coefficients b0 and b1.( use numpy package)
x= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,11,13]
y = ([1, 3, 2, 5, 7, 8, 8, 9, 10, 12,16, 18]
[15]
21
Solution:-
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,11,13]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12,16, 18])
print(x)
print(y)
model = LinearRegression()
model.fit(x, y)
print('intercept:-',model.intercept_)
print('Slope:- ', model.coef_)
Q.2 Write a R program to create a data frame using two given vectors and
display the
duplicate elements [15]

Solution:-
vector1 <- c(1,2,3,4,5,6,7,8,6,4)
vector2 <- c(1, 'B', 'C', 'D', 'E', 'D', 'F', 'G',2,3)
data<-data.frame(vector1,vector2)
duplicates =
data[duplicated(data$vector1)|duplicated(data$vector1,fromLast
=TRUE),]
cat("Original Data Frame:\n")
print(data)
22
cat("\nDuplicate Elements:\n")
print(duplicates)

Slip 11

Q.1 Write a R program to reverse a number and also calculate the sum
of digits of that
number. [15]
Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10

23
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

Q.2 Consider following observations/data. And apply simple linear


regression and find
out estimated coefficients b1 and b1 Also analyse the performance of the
model
(Use sklearn package)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
[15]
Solution:-
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([1,2,3,4,5,6,7,8]).reshape((-1, 1))
print(x)
y = np.array([7,14,15,18,19,21,26,23])
print(y)
model = LinearRegression()
model.fit(x, y)
x_new = np.array(9).reshape((-1, 1))
y_new_pred = model.predict(x_new)
print(y_new_pred)

24
print('Slope:- ', model.coef_)

Slip 12

Q.1 Write a python program to implement multiple Linear Regression model


for a car
dataset. Dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[15]
/**** From the above link, copy data of car into excel file, save it
by .xls and then convert it into
.csv ***/
Solution:
import pandas
from sklearn import linear_model
df = pandas.read_csv("car.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])

25
print(predictedCO2)
Q.2 Write a R program to calculate the sum of two matrices of given
size. [15]

Solution:-
# Define a function to calculate the sum of two matrices
matrix_sum <- function(matrix1, matrix2) {
if (dim(matrix1) != dim(matrix2)) {
stop("Matrices must have the same dimensions for addition.")
}

result_matrix <- matrix1 + matrix2


return(result_matrix)
}

# Input the size of the matrices


n_rows <- as.integer(readline("Enter the number of rows for the
matrices: "))
n_cols <- as.integer(readline("Enter the number of columns for
the matrices: "))

# Create the first matrix


cat("Enter values for the first matrix:\n")

26
matrix1 <- matrix(nrow = n_rows, ncol = n_cols)
for (i in 1:n_rows) {
for (j in 1:n_cols) {
matrix1[i, j] <- as.integer(readline(paste("Enter element at
position [", i, ",", j, "]: ")))
}
}

# Create the second matrix


cat("Enter values for the second matrix:\n")
matrix2 <- matrix(nrow = n_rows, ncol = n_cols)
for (i in 1:n_rows) {
for (j in 1:n_cols) {
matrix2[i, j] <- as.integer(readline(paste("Enter element at
position [", i, ",", j, "]: ")))
}
}

# Calculate the sum of the matrices


result <- matrix_sum(matrix1, matrix2)

# Print the result


cat("Sum of the two matrices:\n")

27
print(result)

Slip 13
Q.1 Write a python programme to implement multiple linear regression model
for stock
market data frame as follows:
Stock_Market = {'Year':
[2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,
20
16,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate':
[2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1
.7
5,1.75,1.75,1.75,1.75],
'Unemployment_Rate':
[5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.
2,6
.2,6.1],
'Stock_Index_Price':
[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,9
43,
958,971,949,884,866,876,822,704,719] }
And draw a graph of stock market price verses interest rate.
[15]

Solution:-
import pandas as pd
from sklearn import linear_model
data = {'year':
28
[2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2
016,2016,2016,201
6,2016,2016,2016,2016,2016,2016,2016,2016],
'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'interest_rate':
[2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1
.75,1.75,1.75,1.75,
1.75,1.75,1.75],
'unemployment_rate':
[5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,
6.1,5.9,6.2,6.2,6.1],
'index_price':
[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1
047,965,943,958,97
1,949,884,866,876,822,704,719] }
df = pd.DataFrame(data)
print(df)
x = df[['interest_rate','unemployment_rate']]
print(x)
y = df['index_price']
print(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size =
0.2)

29
print(X_train)
print(X_test)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
y_pred=regr.predict(X_test)
print(y_pred)
from sklearn.metrics import r2_score
Accuracy=r2_score(y_test,y_pred)*100
print(Accuracy)
import matplotlib.pyplot as plt
plt.scatter(y_test,y_pred);
plt.xlabel('Actual');
plt.ylabel('Predicted');
import seaborn as sns
sns.regplot(x=y_test,y=y_pred,ci=None,color ='red');
Q.2 Write a R program to concatenate two given factors.
[15]

Solution:-
data1 <- c("ABC","PQR","XYZ")
data2 <- c(1,2,3)
30
factor1<-factor(data1)
factor2<-factor(data2)
print(factor1)
print(factor2)
concatinated<-c(factor1,factor2)
print(concatinated)

Slip 14

Q.1 Write a script in R to create a list of employees and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
Employee<-data.frame(
eno=c(1,2,3),
ename=c("Pratik","Rohan","Tushar"),
sal=c(10000,20000,30000)
)
print(Employee)
new_data<-rbind(Employee,c(4,"XYZ",2000))
print(new_data)
data<-new_data[-3,]
31
print(data)

Q.2 Consider following observations/data. And apply simple linear


regression and find
out estimated coefficients b1 and b1 Also analyse the performance of the
model
(Use sklearn package)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
[15]
Solution:-
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([1,2,3,4,5,6,7,8]).reshape((-1, 1))
print(x)
y = np.array([7,14,15,18,19,21,26,23])
print(y)
model = LinearRegression()
model.fit(x, y)
x_new = np.array(9).reshape((-1, 1))
y_new_pred = model.predict(x_new)
print(y_new_pred)
print('Slope:- ', model.coef_)
Slip 15

32
Q.1 Write a R program to add, multiply and divide two vectors of integer
type. (vector
length should be minimum 4)
[15]
Solution: -
vector1<-c(1,2,3,4,5)
vector2<-c(6,7,8,9,10)
Addition<- vector1+vector2
print(Addition)
Multiplication<-vector1*vector2
print(Multiplication)
Division<-vector1/vector2
print(Division)
Q.2 Write a Python program build Decision Tree Classifier using Scikit-learn
package for diabetes data set (download database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
[15]
Solution:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Load the diabetes dataset (downloaded from the provided URL)
#dataset_url = 'https://raw.githubusercontent.com/uciml/pima-
indians-

33
#diabetes-database/master/diabetes.csv'
df=pd.read_csv("diabetes.csv")
# Split features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=42)
# Create and train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print ("Accuracy:", accuracy)
Slip 16
Q.1 Write a python program to implement multiple Linear Regression model
for a car
dataset. Dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[15]

34
/**** From the above link, copy data of car into excel file, save it
by .xls and then convert it into
.csv ***/
Solution:
import pandas
from sklearn import linear_model
df = pandas.read_csv("car.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Q.2 Write a script in R to create a list of employees and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Solution:-
Employee<-data.frame(
eno=c(1,2,3),

35
ename=c("Pratik","Rohan","Tushar"),
sal=c(10000,20000,30000)
)
print(Employee)
new_data<-rbind(Employee,c(4,"XYZ",2000))
print(new_data)
data<-new_data[-3,]
print(data)
Slip 17

Q.1 Write a python program to implement k-means algorithms on a


synthetic dataset.

[15]
/* Write all the coding in Single ….. in Jupyter */
Solution :
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
data = make_blobs(n_samples=300, n_features=2, centers=5,
cluster_std=1.8,random_state=101)
data[0].shape
data[1]

36
plt.scatter(data[0][:,0],data[0][:,1],c=data[1],cmap='brg')
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(data[0])
kmeans.cluster_centers_
kmeans.labels_f, (ax1, ax2) = plt.subplots(1, 2,
sharey=True,figsize=(10,6))
ax1.set_title('K Means')
ax1.scatter(data[0][:,0],data[0][:,1],c=kmeans.labels_,cmap='brg')
ax2.set_title("Original")
ax2.scatter(data[0][:,0],data[0][:,1],c=data[1],cmap='brg')
Q.2 Write a R program to sort a list of strings in ascending and descending
order.

[15]
Solution:-
list<-c("apple","banana","Pineapple","mango","Orange")
asc<-sort(list)
print(asc)
desc<-sort(list,decreasing = TRUE)
print(desc)
Slip 18

37
Q.1 Write a R program to reverse a number and also calculate the sum
of digits of that
number. [15]
Solution:-
x = as.integer(readline("Enter any number:- "))
temp=x
rev=0
while(temp>0)
{
rem = temp%%10
rev=(rev*10)+rem
temp=floor(temp/10)
}
cat("Reverse of number is ",rev)
sum=0
while(x>0)
{
rem = x%%10
sum=sum+rem
x=floor(x/10)
}
cat("Sum of digits of the number is ",sum)

38
Q.2 Write a python program to implement hierarchical Agglomerative
clustering algorithm. (Download Customer.csv dataset from github.com).
[15]
Solution :
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('Mall_Customers.csv')
x = dataset.iloc[:, [3, 4]].values
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogrma Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
hc= AgglomerativeClustering(n_clusters=5, affinity='euclidean',
linkage='ward')
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c =
'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c =
'green', label = 'Cluster 2')

39
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red',
label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c =
'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c =
'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Slip 19

Q.1 Write a python program to implement k-means algorithm to build


prediction model
(Use Credit Card Dataset CC GENERAL.csv Download from kaggle.com)
[15]
Solution :
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('creditcard.csv')
dataset

40
x = dataset.iloc[:, [3, 4]].values
print(x)
from sklearn.cluster import KMeans
wcss_list= []
for i in range(1, 11):kmeans = KMeans(n_clusters=i, init='k-
means++', random_state= 42)
kmeans.fit(x)
wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elobw Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()
kmeans = KMeans(n_clusters=3, init='k-means++',
random_state= 42)
y_predict= kmeans.fit_predict(x)
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c =
'blue', label ='Cluster 1')
#for first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c =
'green', label ='Cluster 2')
#for second cluster
mtp.scatter(x[y_predict== 2, 0], x[y_predict == 2, 1], s = 100, c =
'red', label ='Cluster 3')

41
#for third cluster
mtp.scatter(kmeans.cluster_centers_[:, 0],
kmeans.cluster_centers_[:, 1], s = 300,
c = 'yellow', label = 'Centroid')
mtp.title('Clusters of Credit Card')
mtp.xlabel('V3')
mtp.ylabel('V4')
mtp.legend()
mtp.show()

Q.2 Write a script in R to create a list of employees and perform the


following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
[15]
Employee<-data.frame(
eno=c(1,2,3),
ename=c("Pratik","Rohan","Tushar"),
sal=c(10000,20000,30000)
)
print(Employee)
new_data<-rbind(Employee,c(4,"XYZ",2000))
print(new_data)
data<-new_data[-3,]

42
print(data)

Slip 20

Q.1 Write a python program to implement hierarchical clustering


algorithm. (Download
Wholesale customers data dataset from github.com).
[15]
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('Wholesale customers data.csv')
dataset
x = dataset.iloc[:, [3, 4]].values
print(x)
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogrma Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
hc= AgglomerativeClustering(n_clusters=5, affinity='euclidean',
linkage='ward')

43
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c =
'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c =
'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red',
label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c =
'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c =
'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Milk')
mtp.ylabel('Grocery')
mtp.legend()
mtp.show()
Q.2 Write a R program to concatenate two given factors.
[15]
data1 <- c("ABC","PQR","XYZ")
data2 <- c(1,2,3)
factor1<-factor(data1)
factor2<-factor(data2)
print(factor1)
print(factor2)

44
concatinated<-c(factor1,factor2)
print(concatinated)

45

You might also like