ML Lab Programs PDF
1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
4. Given the following data, which specify classifications for nine combinations of VAR1
and VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using
the result of k-means clustering with 3 means (i.e., 3 centroids).
PROBLEM STATEMENT 1
The probability that it is Friday and that a student is absent is 3%. Since there are 5
school days in a week, the probability that it is Friday is 20%. What is the probability
that a student is absent given that today is Friday? Apply Bayes' rule in Python to get
the result. (Ans: 15%)
PROCEDURE:
If A and B are two events in a sample space S, then the conditional probability
of A given B is defined as
P(A|B) = P(A∩B) / P(B)
Similarly, the P(B|A) formula is:
P(B|A) = P(A∩B) / P(A)
Two events A and B are independent if and only if P(A∩B)=P(A)P(B)
For the given experiment, let A be the event that it is Friday and B the event
that a student is absent. Then
P(A∩B) = 0.03
P(A) = 0.2
P(B|A) = P(A∩B) / P(A) = 0.03 / 0.2 = 0.15
Another example: a fair die is rolled. Let A be the event that the outcome is
an odd number, i.e., A = {1,3,5}. Also let B be the event that the outcome is less
than or equal to 3, i.e., B = {1,2,3}. The figure shows the Venn diagram of the
events. Since A∩B = {1,3}, P(B|A) = P(A∩B) / P(A) = (2/6) / (3/6) = 2/3 ≈ 0.667
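The die example can be checked by enumerating the sample space. The following is a minimal sketch, assuming all six outcomes are equally likely; the helper name prob is illustrative and not part of the lab program. It also applies the independence test stated earlier to the same two events.

```python
from fractions import Fraction

# Sample space of a fair six-sided die; all outcomes are equally likely.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # outcome is odd
B = {1, 2, 3}   # outcome is less than or equal to 3

def prob(event):
    """Probability of an event (hypothetical helper): |event| / |S|."""
    return Fraction(len(event), len(S))

# Conditional probability P(B|A) = P(A∩B) / P(A)
p_b_given_a = prob(A & B) / prob(A)

# Independence holds only if P(A∩B) equals P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)

print("P(B|A) =", p_b_given_a)
print("A and B independent?", independent)
```

Using exact fractions avoids rounding, so the result can be compared directly with 2/3.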
Source code:
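# P(Friday and absent), entered by the user (0.03 in this problem)
probitisFridaynstudentAbsent=float(input("Enter the probability of being Friday and student is absent: "))
# P(Friday): one of five school days
probitisFriday=0.2
# Conditional probability: P(absent | Friday) = P(Friday and absent) / P(Friday)
pstudentisAbsentgivenitisFriday=probitisFridaynstudentAbsent/probitisFriday
print("Probability that student is absent given it is Friday is:", pstudentisAbsentgivenitisFriday)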
OUTPUT:
Enter the probability of being Friday and student is absent: 0.03
Probability that student is absent given it is Friday is: 0.15
PROBLEM STATEMENT 2
Extract the data from database using python
PROCEDURE:
SOURCE CODE:
import mysql.connector
OUTPUT:
+------------+-------------+
| FIRST_NAME | Country     |
+------------+-------------+
| Shikhar    | India       |
| Jonathan   | SouthAfrica |
| Kumara     | Srilanka    |
| Virat      | India       |
| Rohit      | India       |
+------------+-------------+
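Only the import line of the program above survives, so the original query cannot be reproduced here. As a stand-in, the following is a minimal sketch of the connect, execute, fetch pattern using Python's built-in sqlite3 module with an in-memory database instead of MySQL. The table name players and the helper fetch_players are assumptions made for illustration; the rows are the ones shown in the output above.

```python
import sqlite3

def fetch_players(conn):
    """Return (FIRST_NAME, Country) rows from the hypothetical players table."""
    cur = conn.cursor()
    cur.execute("SELECT FIRST_NAME, Country FROM players")
    rows = cur.fetchall()
    cur.close()
    return rows

# In-memory database so the sketch runs without a server (assumption:
# the lab itself used a MySQL database whose schema is not shown).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (FIRST_NAME TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [
        ("Shikhar", "India"),
        ("Jonathan", "SouthAfrica"),
        ("Kumara", "Srilanka"),
        ("Virat", "India"),
        ("Rohit", "India"),
    ],
)
conn.commit()

rows = fetch_players(conn)
for first_name, country in rows:
    print(first_name, country)
conn.close()
```

With mysql.connector the same cursor calls apply once the connection has been opened with mysql.connector.connect(), passing the host, user, password, and database of your own server.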
PROBLEM STATEMENT 3
K-NN Algorithm
• Load the training data.
• Choose K, the number of nearest neighbors to consider.
• Compute the test point's distance from each training point.
• Sort the distances in ascending order.
• Use the sorted distances to select the K nearest neighbors.
• Use majority rule (for classification) or averaging (for regression).
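The steps above can be sketched in plain Python. This is a minimal illustration for the classification case, assuming Euclidean distance; the function name knn_predict and the toy data are made up for the example and are not part of the lab program.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Ascending sort puts the closest points first.
    distances.sort(key=lambda pair: pair[0])
    # Keep the labels of the k nearest neighbors and take the majority.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (made up for illustration): two small clusters.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
print(prediction)
```

For regression, the last step would return the mean of the neighbors' target values instead of the most common label.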
Advantages of KNN
1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems
Disadvantages of KNN
1. Memory intensive / computationally expensive
2. Sensitive to the scale of the data
3. Does not work well on a rare-event (skewed) target variable
4. Struggles with a high number of independent variables
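The sensitivity to scale can be seen on a tiny example. The sketch below uses made-up income and age values (an assumption, not lab data) and a hypothetical helper nearest_index; it finds the nearest training point before and after standardizing each column.

```python
import numpy as np

# Made-up training data: columns are income and age.
train = np.array([[50000.0, 25.0],
                  [51000.0, 60.0]])
query = np.array([50900.0, 26.0])

def nearest_index(points, q):
    """Index of the row in `points` closest to `q` (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))

# Raw features: the income gap dominates the distance.
raw_nearest = nearest_index(train, query)

# Standardize each column with the training mean and standard deviation.
mean = train.mean(axis=0)
std = train.std(axis=0)
scaled_nearest = nearest_index((train - mean) / std, (query - mean) / std)

print("nearest before scaling:", raw_nearest)
print("nearest after scaling:", scaled_nearest)
```

On the raw values the second training point is nearest because the query's income is close to it, even though the ages differ by 34 years. After standardization both columns contribute comparably and the first point becomes the nearest neighbor.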
SOURCE CODE:
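import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_curve
import warnings
warnings.filterwarnings('ignore')
sns.set()
#%matplotlib inline

# Load and inspect the data
df = pd.read_csv(r'C:\Users\AIMLJAVA4\Desktop\lab 3\diabetes.csv')
print(df)
df.info(verbose=True)
print(df.describe().T)

# A zero in these columns means "not recorded", so mark it as missing
df_copy = df.copy(deep=True)
cols = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']
df_copy[cols] = df_copy[cols].replace(0, np.nan)
print(df_copy.isnull().sum())
# The imputation step is missing from the source; filling with the column
# median is an assumption (KNN cannot be fitted on NaN values)
df_copy[cols] = df_copy[cols].fillna(df_copy[cols].median())
p = df.hist(figsize = (20,20))

# Standardize the features, because KNN is sensitive to scale
sc_X = StandardScaler()
features = df_copy.drop(["Outcome"], axis = 1)
X = pd.DataFrame(sc_X.fit_transform(features), columns=features.columns)
print(X.head())
y = df_copy.Outcome

# The split is missing from the source; test_size and random_state are assumptions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42, stratify=y)

# Train and test accuracy for k = 1..14
test_scores = []
train_scores = []
for i in range(1,15):
    knn = KNeighborsClassifier(i)
    knn.fit(X_train,y_train)
    train_scores.append(knn.score(X_train,y_train))
    test_scores.append(knn.score(X_test,y_test))

## score that comes from testing on the same datapoints that were used for training
max_train_score = max(train_scores)
train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]
print('Max train score {} % and k = {}'.format(max_train_score*100, list(map(lambda x: x+1, train_scores_ind))))

## score that comes from testing on the datapoints that were split in the beginning to be used for testing solely
max_test_score = max(test_scores)
test_scores_ind = [i for i, v in enumerate(test_scores) if v == max_test_score]
print('Max test score {} % and k = {}'.format(max_test_score*100, list(map(lambda x: x+1, test_scores_ind))))

# Refit with k = 11 and evaluate on the test set
knn = KNeighborsClassifier(11)
knn.fit(X_train,y_train)
print(knn.score(X_test,y_test))

# Confusion matrix as a heatmap
y_pred = knn.predict(X_test)
sns.heatmap(pd.DataFrame(confusion_matrix(y_test, y_pred)), annot=True, fmt='g')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

print(classification_report(y_test,y_pred))

# ROC curve: shows how well the model can distinguish between the two classes
y_pred_proba = knn.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1], [0,1], 'k--')
plt.plot(fpr, tpr, label='KNN')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.show()

# Grid search over k with 5-fold cross-validation
param_grid = {'n_neighbors':np.arange(1,50)}
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn,param_grid,cv=5)
knn_cv.fit(X,y)
print("Best score:", knn_cv.best_score_)
print("Best parameters:", knn_cv.best_params_)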
OUTPUT:
SOURCE CODE:
The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of 'golf' and the conditional probability
of 'single' given 'medRisk' in the dataset.
SOURCE CODE:
total_Records=10
numGolfRecords=4
unConditionalprobGolf=numGolfRecords / total_Records
print("Unconditional probability of golf: ={}".format(unConditionalprobGolf))
#conditional probability of 'single' given 'medRisk'
numMedRiskSingle=2
numMedRisk=3
probMedRiskSingle=numMedRiskSingle/total_Records
probMedRisk=numMedRisk/total_Records
conditionalProb=(probMedRiskSingle/probMedRisk)
print("Conditional probability of single given medRisk: = {}".format(conditionalProb))
OUTPUT:
Unconditional probability of golf: =0.4
Conditional probability of single given medRisk: = 0.6666666666666667
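The program above hard-codes the counts. As a cross-check, the counts can be derived from the ten training examples themselves. This is a minimal sketch; the variable names are illustrative, and the tuples are a direct transcription of the records in the problem statement.

```python
# The ten training examples from the problem statement, one tuple per record:
# (income, recreation, job, status, age_group, home_owner, risk)
records = [
    ("medium", "skiing", "design", "single", "twenties", "no", "highRisk"),
    ("high", "golf", "trading", "married", "forties", "yes", "lowRisk"),
    ("low", "speedway", "transport", "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking", "single", "thirties", "yes", "lowRisk"),
    ("high", "flying", "media", "married", "fifties", "yes", "highRisk"),
    ("low", "football", "security", "single", "twenties", "no", "medRisk"),
    ("medium", "golf", "media", "single", "thirties", "yes", "medRisk"),
    ("medium", "golf", "transport", "married", "forties", "yes", "lowRisk"),
    ("high", "skiing", "banking", "single", "thirties", "yes", "highRisk"),
    ("low", "golf", "unemployed", "married", "forties", "yes", "highRisk"),
]

total = len(records)

# Unconditional probability of 'golf': share of records with recreation == golf
num_golf = sum(1 for r in records if r[1] == "golf")
p_golf = num_golf / total

# Conditional probability of 'single' given 'medRisk':
# restrict to medRisk records, then count how many of them are single
med_risk = [r for r in records if r[6] == "medRisk"]
num_med_risk_single = sum(1 for r in med_risk if r[3] == "single")
p_single_given_med_risk = num_med_risk_single / len(med_risk)

print("golf records:", num_golf, "of", total)
print("single among medRisk:", num_med_risk_single, "of", len(med_risk))
print("P(golf) =", p_golf)
print("P(single | medRisk) =", p_single_given_med_risk)
```

The counts agree with the hard-coded values 4, 2 and 3. Because this version divides the two counts directly, the last printed digit of the conditional probability may differ from the output above, which divides two already rounded probabilities.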
PROBLEM STATEMENT 6
SOURCE CODE:
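import numpy as np
import matplotlib.pyplot as plt

# estimate_coef is called but not defined in the source; this ordinary
# least-squares version is an assumption
def estimate_coef(x, y):
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    # cross-deviation of x and y, and deviation of x about its mean
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting the observations and the fitted line, then putting labels
    plt.scatter(x, y)
    plt.plot(x, b[0] + b[1] * x)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

if __name__ == "__main__":
    main()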
OUTPUT:
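The coefficients reported by the program can be cross-checked with NumPy's built-in polynomial fit. This is a minimal sketch that assumes the program is fitting an ordinary least-squares straight line to the x and y observations listed above; np.polyfit is a swapped-in technique here, not the lab's own method.

```python
import numpy as np

# Same observations as in the program above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 least-squares fit; polyfit returns the highest power first,
# so the slope (b_1) comes before the intercept (b_0).
slope, intercept = np.polyfit(x, y, 1)

print("b_0 = {:.4f}".format(intercept))
print("b_1 = {:.4f}".format(slope))
```

If the two sets of coefficients agree to several decimal places, the hand-written estimate is consistent with the library fit.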