
KALLAM HARANADHAREDDY

INSTITUTE OF TECHNOLOGY
(Approved by AICTE New Delhi & Affiliated to JNTUK, Kakinada)
NH- 5, Chowdavaram, Guntur-522 019
An ISO 9001:2015 Certified Institution, Accredited by NAAC & NBA

KHIT

PRACTICAL RECORD

Name:……………………………………………………………………..
Roll No:……………..…... Year & Semester:…………..………..

Branch:…………………. Section:…………………………........

Lab:……………………………………………………………………..…
KALLAM HARANADHAREDDY
INSTITUTE OF TECHNOLOGY
(APPROVED BY AICTE NEW DELHI, AFFILIATED TO
JNTUK, KAKINADA) CHOWDAVARAM, GUNTUR-19

Roll No:

CERTIFICATE

This is to certify that this is the bonafide record of the laboratory work done by

Mr/Ms…………………………………………………………………………………………..

of……..B.Tech/M.Tech/Diploma……...Semester in ………..Branch has completed…..…..


experiments in ……………………………………………………….………………………..

Laboratory during the Academic year 20 -20

Faculty-in-charge Head of the Department

Internal Examiner External Examiner


INDEX

Ex. No | Date | Name of the Experiment | Page From | Page To | Marks | Signature
CSE DEPARTMENT VISION, MISSION, GOALS

Vision
Imparting quality technical education to learners in the field of Computer Science and Engineering to produce technically competent software personnel with the advanced skills, knowledge and behavior needed to meet global real-time computational challenges.

Mission

To impart quality technical education through training in:

M1: Fundamentals of software and hardware in the field of computer science and engineering, to global standards.
M2: To educate students to become software professionals and lifelong learners through professional training and practice.
M3: To develop professional ethics in students to lead life with good human values.
M4: To be a state-of-the-art research centre in the field of computer science & engineering, promoting innovation and research.

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO-1: Acquire knowledge of, and implement, concepts of programming languages, software engineering, computer networks, databases and computer automation to solve computing problems.

PSO-2: Understand, analyze, design, develop and test computer programs for problems related to Algorithms, Internet of Things, Data Science, Cloud Computing, Artificial Intelligence and Machine Learning.

PSO-3: Apply theoretical and practical knowledge, using modern software tools and techniques, to build application software.
Course Outcomes:

At the end of the course, students will be able to:

• Apply machine learning approaches to a given problem.

• Analyze and identify the need for machine learning techniques in a particular domain.
• Develop real-time applications and predict their outcomes using machine learning algorithms.

CO-PO Mapping
(3/2/1 indicates strength of correlation: 3 = Strong, 2 = Medium, 1 = Weak)

COs | PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 | PSO1 PSO2 PSO3

CO1 1 1 1 1 1 1 2

CO2 1 1 2 1 1 1 1 1 1 1
Exp. No: Date:

1. Install the Python software (Anaconda/Python), install useful packages for machine learning, load a sample dataset, and understand and visualize the data.

A) Installation of Anaconda/Python and useful packages for machine learning (installation screenshots omitted).

B) Loading, understanding and visualizing the data set.

import pandas as pd
import matplotlib.pyplot as plt
# importing the dataset
dataset = pd.read_csv('D:/Salary_Data.csv')
print(dataset.head())

Output:
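The head() call above only previews the first five rows. A minimal visualization sketch, assuming the common Salary_Data.csv layout with YearsExperience and Salary columns (an assumption, since the file contents are not shown), could be:

# scatter plot of the raw data (column names are an assumption)
plt.scatter(dataset['YearsExperience'], dataset['Salary'], color='blue')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.title('Salary data preview')
plt.show()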

Exp. No: Date:

2. Implement simple linear regression.

import numpy as np

import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
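As an optional sanity check (an addition, not part of the manual's listing), NumPy's built-in least-squares fit should reproduce the same coefficients when run at the end of main():

# np.polyfit returns coefficients from highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print("polyfit check: b_0 = {}, b_1 = {}".format(intercept, slope))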

And the graph obtained looks like this:

Exp. No: Date:

3. Implement multivariate linear regression.

import pandas

from sklearn import linear_model

a = {
    'slips' : [2,4,6,8],
    'open' : [2,4,6,8],
    'marks' : [20,40,60,80]
}

df = pandas.DataFrame(a)

X = df[['slips', 'open']]

y = df['marks']

regr = linear_model.LinearRegression()

regr.fit(X, y)

#predict the Marks for slips=5 and open=5

predictedMarks= regr.predict([[5,5]])

print(predictedMarks)

Output:

[50.]
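Because the two features are identical columns (slips equals open), the individual coefficients are not uniquely determined; only their combined effect is. To see how the model distributed the weight, inspect the fitted parameters (an optional addition):

# for this data the two coefficients must sum to 10 and the intercept is 0,
# since marks = 10 * slips exactly
print(regr.coef_)
print(regr.intercept_)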

Exp. No: Date:

4. Implement simple Logistic Regression and Multivariate Logistic Regression.

import pandas as pd

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn import metrics

a = {
    'slips' : [2,4,6,8,10,15,18],
    # the original listing gave only six labels for seven slips values;
    # a seventh label is assumed here so the column lengths match
    'pass_or_fail' : [0,0,1,1,1,1,1]
}

DF = pd.DataFrame(a)

#split dataset in features and target variable

feature_cols = ['slips']

X = DF[feature_cols] # Features

y = DF.pass_or_fail # Target variable

# split X and y into training and testing sets

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.5,random_state=0)

#instantiate the model (using the default parameters)

logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

y_pred=logreg.predict(X_test)

#import the metrics class

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

print(cnf_matrix)

Output:

runfile('C:/Users/student/.spyder-py3/temp.py', wdir='C:/Users/student/.spyder-py3')

Accuracy: 0.75

[[0 0]

[1 3]]
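Beyond hard 0/1 predictions, the fitted model also exposes class probabilities, which make results like the confusion matrix above easier to interpret (an optional addition; scikit-learn may warn about missing feature names on the second call):

# probability of class 0 (fail) and class 1 (pass) for each test example
print(logreg.predict_proba(X_test))

# probability for a new student who wrote 12 slip tests (12 is an arbitrary example value)
print(logreg.predict_proba([[12]]))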

Exp. No: Date:

5. Implement Decision Trees.

# Load libraries
import pandas as pd

from sklearn.tree import DecisionTreeClassifier

# Import Decision Tree Classifier

from sklearn.model_selection import train_test_split


# Import train_test_split function

from sklearn import metrics

#Import scikit-learn metrics module for accuracy calculation


a = {
    'easy' : [0,1,1,0,0,0,0,0],
    'slips' : [0,0,2,2,4,6,8,10],
    'result' : [0,1,1,0,1,1,1,1]
}

pima = pd.DataFrame(a)

print(pima.head())

#split dataset in features and target variable

feature_cols = ['easy', 'slips']

X = pima[feature_cols] # Features

y = pima.result # Target variable

# Split dataset into training set and test set

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=1)

# 75% training and 25% test

# Create Decision Tree classifier object

clf = DecisionTreeClassifier()

# Train Decision Tree Classifier


clf = clf.fit(X_train,y_train)

#Predict the response for test dataset

y_pred =clf.predict(X_test)

# Model Accuracy, how often is the classifier correct?

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

OUTPUT:
runfile('C:/Users/student/.spyder-py3/untitled1.py', wdir='C:/Users/student/.spyder-py
easy slips result
0 0 0 0
1 1 0 1
2 1 2 1
3 0 2 0
4 0 4 1
Accuracy: 1.0
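To inspect the rules the tree actually learned (an optional addition), scikit-learn can print the fitted tree as text; export_text is available in scikit-learn 0.21 and later:

from sklearn.tree import export_text

# textual view of the fitted tree, using the same feature names as above
print(export_text(clf, feature_names=feature_cols))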

Exp. No: Date:

6. Implement any 3 Classification Algorithms.

import pandas as pd

import matplotlib.pyplot as plt

# importing the dataset

dataset = pd.read_csv('Salary_Data.csv')

print(dataset.head())

# data preprocessing

X = dataset.iloc[:, :-1].values # independent variable array

y = dataset.iloc[:, 1].values # dependent variable vector

# splitting the dataset

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# fitting the regression model

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

# actually produces the linear eqn for the data

# predicting the test set results

y_pred = regressor.predict(X_test)

print(y_pred)

print(y_test)

# visualizing the results

# plot for the TRAIN


plt.scatter(X_train, y_train, color='red')

# plotting the observation line

plt.plot(X_train, regressor.predict(X_train), color='blue')

# plotting the regression line

plt.title("Salary vs Experience (Training set)")

# stating the title of the graph

plt.xlabel("Years of experience")

# adding the name of x-axis

plt.ylabel("Salaries")

# adding the name of y-axis

plt.show()

# specifies end of graph

# plot for the TEST

plt.scatter(X_test, y_test, color='red')

plt.plot(X_train, regressor.predict(X_train), color='blue')

# plotting the regression line

plt.title("Salary vs Experience (Testing set)")

plt.xlabel("Years of experience")

plt.ylabel("Salaries")

plt.show()

Output:


import pandas

from sklearn import linear_model

a = {
    'slips' : [2,4,6,8],
    'open' : [2,4,6,8],
    'marks' : [20,40,60,80]
}

df = pandas.DataFrame(a)

X = df[['slips', 'open']]

y = df['marks']

regr = linear_model.LinearRegression()

regr.fit(X, y)

#predict the Marks for slips=5 and open=5

predictedMarks= regr.predict([[5,5]])

print(predictedMarks)

Output:

[50.]


import pandas as pd

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn import metrics

a = {
    'slips' : [2,4,6,8,10,15,18],
    # the original listing gave only six labels for seven slips values;
    # a seventh label is assumed here so the column lengths match
    'pass_or_fail' : [0,0,1,1,1,1,1]
}

DF = pd.DataFrame(a)

#split dataset in features and target variable

feature_cols = ['slips']

X = DF[feature_cols] # Features

y = DF.pass_or_fail # Target variable

# split X and y into training and testing sets

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.5,random_state=0)

#instantiate the model (using the default parameters)

logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

y_pred=logreg.predict(X_test)

#import the metrics class

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

print(cnf_matrix)

Output:

runfile('C:/Users/student/.spyder-py3/temp.py', wdir='C:/Users/student/.spyder-py3')

Accuracy: 0.75

[[0 0]

[1 3]]
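Note that the three listings above repeat the regression programs from experiments 2, 3 and 4 rather than classification algorithms. A minimal sketch of three actual classifiers (K-nearest neighbours, Gaussian Naive Bayes and a linear SVM) on the built-in iris dataset is given below as an illustrative assumption, not as the manual's prescribed solution:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=1)

# train and score three classifiers on the same split
for clf in (KNeighborsClassifier(n_neighbors=3), GaussianNB(), SVC(kernel='linear')):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(type(clf).__name__, "accuracy:", accuracy_score(y_test, y_pred))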
Exp. No: Date:

7. Implement Random Forests Algorithm.

#Import scikit-learn dataset library

from sklearn import datasets

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn import metrics

#Load dataset

iris = datasets.load_iris()

# print the label species(setosa, versicolor,virginica)

print(iris.target_names)

# print the names of the four features

print(iris.feature_names)

# print the iris data (top 5 records)

print(iris.data[0:5])

# print the iris labels (0:setosa, 1:versicolor, 2:virginica)

print(iris.target)

# Creating a DataFrame of given iris dataset.

data=pd.DataFrame({

'sepal length':iris.data[:,0],

'sepal width':iris.data[:,1],

'petal length':iris.data[:,2],

'petal width':iris.data[:,3],


'species':iris.target

})

data.head()

X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]  # Features
y = data['species']  # Labels

# Split dataset into training set and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 70% training and 30% test

#Create a Random Forest classifier

clf=RandomForestClassifier(n_estimators=100)

#Train the model using the training sets

clf.fit(X_train,y_train)

y_pred=clf.predict(X_test)

#Import scikit-learn metrics module for accuracy calculation

# Model Accuracy, how often is the classifier correct?

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Output:
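A common follow-up, added here as an optional sketch using the variables defined above, is to rank the features by the importance scores the fitted forest assigns them:

# feature_importances_ sums to 1; higher values mean the feature was more useful for splitting
feature_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(feature_imp)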

Exp. No: Date:

8. Implement K-Means, KNN algorithms.

A) K-Means clustering.

#Create artificial data set

from sklearn.datasets import make_blobs

raw_data = make_blobs(n_samples = 200, n_features = 2, centers = 4, cluster_std = 1.8)

#Data imports

import pandas as pd

import numpy as np

#Visualization imports

import matplotlib.pyplot as plt

#Visualize the data

plt.scatter(raw_data[0][:,0], raw_data[0][:,1])

plt.scatter(raw_data[0][:,0], raw_data[0][:,1], c=raw_data[1])

#Build and train the model
from sklearn.cluster import KMeans

model = KMeans(n_clusters=4)

model.fit(raw_data[0])

#See the predictions

print(model.labels_)

print(model.cluster_centers_)

#PLot the predictions against the original data set

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True,figsize=(10,6))

ax1.set_title('Our Model')

ax1.scatter(raw_data[0][:,0],raw_data[0][:,1],c=model.labels_)


ax2.set_title('Original Data')

ax2.scatter(raw_data[0][:,0], raw_data[0][:,1],c=raw_data[1])

Output:
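Here n_clusters=4 was known because the blobs were generated with four centers. When the number of clusters is unknown, a common heuristic (shown as an optional sketch) is the elbow plot of inertia against k:

# inertia_ is the within-cluster sum of squared distances; look for the 'elbow' in the curve
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k)
    km.fit(raw_data[0])
    inertias.append(km.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('number of clusters k')
plt.ylabel('inertia')
plt.show()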

B) KNN classification.

#Common imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#Import the data set
raw_data = pd.read_csv('classified_data.csv', index_col = 0)
print(raw_data.columns)

#Import standardization functions from scikit-learn


from sklearn.preprocessing import StandardScaler
#Standardize the data set
scaler = StandardScaler()
scaler.fit(raw_data.drop('TARGET CLASS', axis=1))
scaled_features = scaler.transform(raw_data.drop('TARGET CLASS', axis=1))
scaled_data = pd.DataFrame(scaled_features, columns = raw_data.drop('TARGET CLASS',
axis=1).columns)
#Split the data set into training data and test data
from sklearn.model_selection import train_test_split
x = scaled_data
y = raw_data['TARGET CLASS']
x_training_data, x_test_data, y_training_data, y_test_data = train_test_split(x, y, test_size = 0.3)
#Train the model and make predictions
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors = 1)
model.fit(x_training_data, y_training_data)
predictions = model.predict(x_test_data)
#Performance measurement
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

print(classification_report(y_test_data, predictions))
print(confusion_matrix(y_test_data, predictions))
#Selecting an optimal K value
error_rates = []
for i in np.arange(1, 101):
    new_model = KNeighborsClassifier(n_neighbors=i)
    new_model.fit(x_training_data, y_training_data)
    new_predictions = new_model.predict(x_test_data)
    error_rates.append(np.mean(new_predictions != y_test_data))
plt.figure(figsize=(16,12))
plt.plot(error_rates)

Output:
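The error-rate curve can be reduced to a single recommended K (an optional addition):

# index of the smallest error rate; +1 because the K values start at 1
best_k = int(np.argmin(error_rates)) + 1
print("lowest error rate", min(error_rates), "at K =", best_k)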

Exp. No: Date:

9. Implement SVM on any applicable datasets.

#Import scikit-learn dataset library

from sklearn import datasets

from sklearn.model_selection import train_test_split

from sklearn import svm

from sklearn import metrics

#Load dataset

cancer=datasets.load_breast_cancer()

# print the names of the 30 features

print("Features: ", cancer.feature_names)

# print the label type of cancer('malignant' 'benign')

print("Labels: ", cancer.target_names)

# print data(feature)shape

cancer.data.shape

# print the cancer data features (top 5 records)

print(cancer.data[0:5])

# print the cancer labels (0:malignant, 1:benign)

print(cancer.target)

# Split dataset into training set and test set

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)  # 70% training and 30% test

#Create a svm Classifier

clf = svm.SVC(kernel='linear')
# Linear Kernel


#Train the model using the training sets


clf.fit(X_train, y_train)

#Predict the response for test dataset


y_pred = clf.predict(X_test)
# Model Accuracy: how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
# Model Precision: what percentage of positive tuples are labeled as such?
print("Precision:",metrics.precision_score(y_test, y_pred))
# Model Recall: what percentage of positive tuples are labelled as such?

print("Recall:",metrics.recall_score(y_test, y_pred))

Output:
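For a fuller per-class summary (an optional addition), scikit-learn's classification report combines precision, recall and F1-score in one table:

from sklearn.metrics import classification_report

# per-class precision/recall/F1 for the SVM predictions above
print(classification_report(y_test, y_pred, target_names=cancer.target_names))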

Exp. No: Date:

10. Implement Neural Networks.

import numpy as np

class NeuralNetwork():

    def __init__(self):
        # seeding for random number generation
        np.random.seed(1)

        # converting weights to a 3 by 1 matrix with values from -1 to 1 and mean of 0
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        # applying the sigmoid function
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # computing derivative to the Sigmoid function
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):
        # training the model to make accurate predictions while adjusting weights continually
        for iteration in range(training_iterations):
            # siphon the training data via the neuron
            output = self.think(training_inputs)

            # computing error rate for back-propagation
            error = training_outputs - output

            # performing weight adjustments
            adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))
            self.synaptic_weights += adjustments

    def think(self, inputs):
        # passing the inputs via the neuron to get output
        # converting values to floats
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":

    # initializing the neuron class
    neural_network = NeuralNetwork()

    print("Beginning Randomly Generated Weights:")
    print(neural_network.synaptic_weights)

    # training data consisting of 4 examples--3 input values and 1 output
    training_inputs = np.array([[0, 0, 1],
                                [1, 1, 1],
                                [1, 0, 1],
                                [0, 1, 1]])

    training_outputs = np.array([[0, 1, 1, 0]]).T

    # training taking place
    neural_network.train(training_inputs, training_outputs, 15000)

    print("Ending Weights After Training: ")
    print(neural_network.synaptic_weights)

    user_input_one = str(input("User Input One: "))
    user_input_two = str(input("User Input Two: "))
    user_input_three = str(input("User Input Three: "))

    print("Considering New Situation: ", user_input_one, user_input_two, user_input_three)
    print("New Output data: ")
    print(neural_network.think(np.array([user_input_one, user_input_two, user_input_three])))

    print("Wow, we did it!")

Output:
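Instead of the interactive prompts, the trained neuron can be exercised directly. Because the training outputs equal the first input column, the prediction for [1, 0, 0] should come out close to 1 (an optional, non-interactive check):

# scripted prediction; expected value is near 1 after 15000 training iterations
print(neural_network.think(np.array([1, 0, 0])))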

Exp. No: Date:

11. Implement PCA.

# import all libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.decomposition import PCA

from sklearn.preprocessing import StandardScaler

#import the breast cancer dataset

from sklearn.datasets import load_breast_cancer

data=load_breast_cancer()

data.keys()

# Check the output classes

print(data['target_names'])

# Check the input attributes

print(data['feature_names'])

# construct a dataframe using pandas

df1=pd.DataFrame(data['data'],columns=data['feature_names'])

# Scale data before applying PCA

scaling=StandardScaler()

# Use fit and transform method

scaling.fit(df1)

Scaled_data=scaling.transform(df1)


# Set the n_components=3

principal=PCA(n_components=3)

principal.fit(Scaled_data)

x=principal.transform(Scaled_data)

# Check the dimensions of data after PCA

print(x.shape)

# Check the values of eigenvectors produced by the principal components
principal.components_

plt.figure(figsize=(10,10))

plt.scatter(x[:,0],x[:,1],c=data['target'],cmap='plasma')

plt.xlabel('pc1')

plt.ylabel('pc2')

# import relevant libraries for 3d graph

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(10,10))

# choose projection 3d for creating a 3d graph

axis = fig.add_subplot(111, projection='3d')

# x[:,0]is pc1,x[:,1] is pc2 while x[:,2] is pc3

axis.scatter(x[:,0],x[:,1],x[:,2], c=data['target'],cmap='plasma')

axis.set_xlabel("PC1", fontsize=10)

axis.set_ylabel("PC2", fontsize=10)

axis.set_zlabel("PC3", fontsize=10)


# check how much variance is explained by each principal component
print(principal.explained_variance_ratio_)

Output:

(569, 3)
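To choose n_components in general (an optional addition), the cumulative explained variance is the usual guide:

# running total of the variance captured by the first 1, 2, 3, ... components
print(np.cumsum(principal.explained_variance_ratio_))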

Exp. No: Date:

12. Implement anomaly detection and recommendation.

# import pandas library

import pandas as pd

# Get the data

column_names = ['user_id', 'item_id', 'rating', 'timestamp']

path = 'https://media.geeksforgeeks.org/wp-content/uploads/file.tsv'

df = pd.read_csv(path, sep='\t', names=column_names)

# Check the head of the data

df.head()

# Check out all the movies and their respective IDs

movie_titles = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv')

movie_titles.head()

data = pd.merge(df, movie_titles, on='item_id')

data.head()

# Calculate mean rating of all movies

data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

# Calculate count rating of all movies

data.groupby('title')['rating'].count().sort_values(ascending=False).head()

# creating dataframe with 'rating' count values

ratings = pd.DataFrame(data.groupby('title')['rating'].mean())

ratings['num of ratings'] = pd.DataFrame(data.groupby('title')['rating'].count())

ratings.head()


import matplotlib.pyplot as plt

import seaborn as sns

sns.set_style('white')

%matplotlib inline

# plot graph of 'num of ratings column'

plt.figure(figsize =(10, 4))

ratings['num of ratings'].hist(bins = 70)

# plot graph of 'ratings' column

plt.figure(figsize =(10, 4))

ratings['rating'].hist(bins = 70)

moviemat = data.pivot_table(index='user_id', columns='title', values='rating')

moviemat.head()

# Sorting values according to the 'num of ratings' column
ratings.sort_values('num of ratings', ascending=False).head(10)

# analysing correlation with similar movies

starwars_user_ratings = moviemat['Star Wars (1977)']

liarliar_user_ratings = moviemat['Liar Liar (1997)']

starwars_user_ratings.head()

# analysing correlation with similar movies

similar_to_starwars = moviemat.corrwith(starwars_user_ratings)

similar_to_liarliar = moviemat.corrwith(liarliar_user_ratings)


corr_starwars=pd.DataFrame(similar_to_starwars,columns=['Correlation'])

corr_starwars.dropna(inplace = True)

corr_starwars.head()

Output:

Exp. No: Date:

Anomaly detection

# import pandas library

import pandas as pd

# Get the data

column_names = ['user_id', 'item_id', 'rating', 'timestamp']

path = 'https://media.geeksforgeeks.org/wp-content/uploads/file.tsv'

df = pd.read_csv(path, sep='\t', names=column_names)

# Check the head of the data

df.head()

# Check out all the movies and their respective IDs

movie_titles = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv')

movie_titles.head()

data = pd.merge(df, movie_titles, on='item_id')

data.head()

# Calculate mean rating of all movies

data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

# Calculate count rating of all movies

data.groupby('title')['rating'].count().sort_values(ascending=False).head()

# creating dataframe with 'rating' count values

ratings = pd.DataFrame(data.groupby('title')['rating'].mean())

ratings['num of ratings'] = pd.DataFrame(data.groupby('title')['rating'].count())

ratings.head()


import matplotlib.pyplot as plt

import seaborn as sns

sns.set_style('white')

%matplotlib inline

# plot graph of 'num of ratings column'

plt.figure(figsize =(10, 4))

ratings['num of ratings'].hist(bins = 70)

# plot graph of 'ratings' column
plt.figure(figsize=(10, 4))

ratings['rating'].hist(bins = 70)

Output:
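The block above prepares and plots the ratings data but stops before any detector is applied. A minimal anomaly-detection sketch using scikit-learn's IsolationForest (an assumed choice of detector, not the manual's prescription) flags movies whose mean rating and rating count are jointly unusual:

from sklearn.ensemble import IsolationForest

# treat each movie's (mean rating, number of ratings) pair as a point;
# contamination=0.01 assumes roughly 1% of movies are anomalous
iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(ratings[['rating', 'num of ratings']])

# fit_predict returns -1 for anomalies and 1 for normal points
print(ratings[labels == -1].head())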
