ML Python Exercises UOM BDS Classification
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import tree

df = pd.read_csv("salaries.csv")
print(df)

# Separate the target column from the input features
target = df['salary_more_then_100k']
inputs = df.drop('salary_more_then_100k', axis=1)

# Encode each categorical column with its own LabelEncoder
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()
inputs['company_n'] = le_company.fit_transform(inputs['company'])
inputs['job_n'] = le_job.fit_transform(inputs['job'])
inputs['degree_n'] = le_degree.fit_transform(inputs['degree'])
print(inputs)

# Keep only the encoded columns for training
inputs_n = inputs.drop(['company', 'job', 'degree'], axis=1)

model = tree.DecisionTreeClassifier()
model.fit(inputs_n, target)
# Print model score and make a prediction (encoded feature values below are illustrative)
print("Model Score:", model.score(inputs_n, target))
print(model.predict([[2, 1, 0]]))
Output:
Model Score: 1.0
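As a quick check, the fitted tree itself can be visualized with scikit-learn's plot_tree; a minimal sketch, assuming the model and inputs_n from the program above:
import matplotlib.pyplot as plt
from sklearn import tree

# Draw the learned decision tree with the encoded feature names
plt.figure(figsize=(10, 6))
tree.plot_tree(model, feature_names=list(inputs_n.columns), filled=True)
plt.show()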
The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used
for solving classification problems. It is mainly used in text classification, which involves
high-dimensional training datasets.
The Bayesian method of calculating conditional probabilities is used in machine learning applications
that involve classification tasks. The Naïve Bayes classifier simplifies Bayes' theorem by assuming
that the features are conditionally independent of one another, which reduces computation time and cost.
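To make the theorem concrete, here is a tiny hand computation of a naive Bayes posterior; all probabilities below are made-up illustrative numbers, not taken from a real dataset:
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Naive Bayes assumes features are independent given the class, so
# P(class | f1, f2) is proportional to P(class) * P(f1|class) * P(f2|class)
p_yes, p_no = 0.6, 0.4               # class priors (illustrative)
p_f_given_yes = 0.7 * 0.5            # P(f1|yes) * P(f2|yes)
p_f_given_no = 0.2 * 0.3             # P(f1|no) * P(f2|no)
score_yes = p_yes * p_f_given_yes
score_no = p_no * p_f_given_no
# Normalize the two scores into posterior probabilities
total = score_yes + score_no
print("P(yes|features) =", score_yes / total)
print("P(no|features)  =", score_no / total)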
PROGRAM:
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

# Sample training data (reconstructed; the original lists were truncated in the source,
# so these values are illustrative)
age = ['youth', 'youth', 'middle_aged', 'senior', 'senior', 'middle_aged', 'youth', 'senior']
income = ['high', 'high', 'medium', 'low', 'low', 'low', 'high', 'medium']
student = ['no', 'no', 'no', 'yes', 'yes', 'yes', 'no', 'yes']
credit_rating = ['fair', 'excellent', 'fair', 'fair', 'excellent', 'excellent', 'fair', 'fair']
buys_computer = ['no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'yes']

le = preprocessing.LabelEncoder()
# Converting string labels into numbers
age_encoded = le.fit_transform(age)
print(age_encoded)
income_encoded = le.fit_transform(income)
print(income_encoded)
student_encoded = le.fit_transform(student)
print(student_encoded)
credit_encoded = le.fit_transform(credit_rating)
print(credit_encoded)
label = le.fit_transform(buys_computer)
print(label)

# Combining age, income, student, and credit rating into a single list of tuples
features = list(zip(age_encoded, income_encoded, student_encoded, credit_encoded))

model = GaussianNB()
model.fit(features, label)

# Predict output for one encoded sample (values illustrative)
predicted = model.predict([[1, 0, 0, 1]])
print("Predicted:", predicted)
Output:
Multinomial Naive Bayes is one of the most popular supervised learning classifiers for categorical
text data. Text classification is gaining popularity because there is an enormous amount of
information available in email, documents, websites, etc. that needs to be analyzed.
Examples of categorical variables are race, sex, age group, and educational level. While the latter two
variables may also be treated numerically by using exact values for age and highest grade completed,
it is often more informative to categorize such variables into a relatively small number of groups.
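As a minimal illustration of Multinomial Naive Bayes on text (a sketch with made-up sentences, not part of the original exercise):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus with two classes (illustrative data)
texts = ["free prize money", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

# Turn each document into a vector of word counts
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(vec.transform(["free money offer"])))  # expected: ['spam']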
This dataset is the result of a chemical analysis of wines grown in the same region in Italy but
derived from three different plant varieties. The dataset comprises 13 features (alcohol, malic_acid,
ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins,
color_intensity, hue, od280/od315_of_diluted_wines, proline) and the type of wine plant variety.
The data has three types of wine: class_0, class_1, and class_2.
PROGRAM:
# Load libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load dataset
wine = datasets.load_wine()
print("Features:", wine.feature_names)
print("Labels:", wine.target_names)
print(wine.data.shape)
print(wine.data[:5])
print(wine.target)

# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target,
                                                    test_size=0.3, random_state=109)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print("Predicted Labels:", y_pred)
Output:
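The predictions can also be scored against the held-out labels; a short follow-up, assuming y_test and y_pred from the program above:
from sklearn import metrics
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))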
4. LINEAR KERNEL:
Kernels, also known as kernel functions, are a family of pattern-analysis techniques that let a
linear classifier solve a non-linear problem. SVM (Support Vector Machines) uses kernel methods
in ML to solve classification and regression problems.
The linear kernel is used when the data is linearly separable, that is, when it can be separated
by a single line. It is one of the most common kernels and is mostly used when a dataset has a
large number of features.
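A linear kernel is simply the dot product of two feature vectors, and scikit-learn's SVC also accepts a callable kernel, so the equivalence can be sketched directly (illustrative data, not part of the original exercise):
import numpy as np
from sklearn.svm import SVC

# k(x, y) = x . y -- the linear kernel written out as an explicit function
def linear_kernel(A, B):
    return A @ B.T

# SVC(kernel=linear_kernel) behaves like SVC(kernel='linear')
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel=linear_kernel).fit(X, y)
print(clf.predict([[0.9, 0.2]]))  # expected: [1]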
PROGRAM:
# Load libraries
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris.csv", names=column_names)

# Separate features and target, then split (70% training and 30% test)
X = df.drop('Class', axis=1)
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a linear-kernel SVM and predict on the test set
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred)

# Calculate per-class accuracy with the classification report
print(classification_report(y_test, y_pred))

# Confusion matrix as a heatmap
cm = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
ax = sn.heatmap(cm, annot=True)
plt.show()
Output:
5. POLYNOMIAL KERNEL:
The polynomial kernel computes similarity from polynomial combinations of the input features, which
lets the SVM learn curved decision boundaries.
PROGRAM:
# Import libraries
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris.csv", names=colnames)

# Split features and target (70% training and 30% test)
X = df.drop('Class', axis=1)
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a polynomial-kernel SVM (degree 3 is scikit-learn's default)
clf = SVC(kernel='poly', degree=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy*100)
print(classification_report(y_test, y_pred))

# Confusion matrix as a heatmap
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
ax = sn.heatmap(confusion_matrix, annot=True)
plt.show()
Output:
6. RADIAL BASIS FUNCTION KERNEL:
The radial basis function (RBF) kernel measures similarity by the distance between two points and is
the default kernel in scikit-learn's SVC.
PROGRAM:
# Load libraries
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("iris.csv", names=colnames)
X = dataset.drop('Class', axis=1)
y = dataset['Class']

# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train an RBF-kernel SVM and report per-class accuracy
clf = SVC(kernel='rbf')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

# Generate and display confusion matrix as a heatmap
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)
plt.show()
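The RBF kernel's gamma parameter controls how far a single training example's influence reaches; a quick comparison sketch, assuming the X_train/y_train and X_test/y_test split from the program above:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Small gamma gives a smoother boundary; large gamma fits tightly around points
for gamma in [0.01, 0.1, 1, 10]:
    m = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: accuracy={accuracy_score(y_test, m.predict(X_test)):.3f}")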
7. K-NEAREST NEIGHBOURS
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised
learning classifier, which uses proximity to make classifications or predictions about the grouping of
an individual data point.
The make_blobs function from sklearn.datasets is used to generate a synthetic dataset with 500
samples, 2 features, 4 centers, and a cluster standard deviation of 1.5. The X variable contains
the feature vectors, and the y variable contains the corresponding labels.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Generate a synthetic dataset: 500 samples, 2 features, 4 centers
X, y = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1.5, random_state=4)
plt.style.use('seaborn')  # on newer Matplotlib, use 'seaborn-v0_8'
plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=y, s=50)
plt.show()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit KNN classifiers with k=5 and k=1
knn5 = KNeighborsClassifier(n_neighbors=5)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn5.fit(X_train, y_train)
knn1.fit(X_train, y_train)
y_pred_5 = knn5.predict(X_test)
y_pred_1 = knn1.predict(X_test)

# Plot test-set predictions for k=5 and k=1 side by side
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred_5, s=50)
plt.title("Predicted values with k=5")
plt.subplot(1, 2, 2)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred_1, s=50)
plt.title("Predicted values with k=1")
plt.show()
Output:
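To compare the two models numerically, accuracy can be computed on the test set; a short follow-up, assuming y_test, y_pred_5, and y_pred_1 from the program above:
from sklearn.metrics import accuracy_score
print("Accuracy with k=5:", accuracy_score(y_test, y_pred_5))
print("Accuracy with k=1:", accuracy_score(y_test, y_pred_1))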
8. RANDOM FOREST
Random Forest is an ensemble learning method: it builds decision trees on different samples and
takes their majority vote for classification and their average in case of regression.
PROGRAM:
# Load libraries
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load dataset
df = pd.read_csv("iris.csv")

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df.columns = colnames

# Split features and target (70% training and 30% test)
X = df.drop('Class', axis=1)
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a random forest of 100 trees
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Predictions:", y_pred)

print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Display the confusion matrix as a heatmap
plt.figure(figsize=(8, 6))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
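One advantage of random forests is that the fitted model reports how much each feature contributed to its decisions; a short follow-up, assuming clf and X from the program above:
import pandas as pd

# Rank the four iris measurements by their importance in the forest
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))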