0% found this document useful (0 votes)
127 views38 pages

MLT - Lab - Manual FINAL

Uploaded by

svbdj.ek.sh.sk.v
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
127 views38 pages

MLT - Lab - Manual FINAL

Uploaded by

svbdj.ek.sh.sk.v
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 38

MACHINE LEARNING TECHNIQUES

LABORATORY MANUAL

(19AI24502)
V-SEMESTER - B.TECH - AI&DS

MAHENDRA ENGINEERING COLLEGE


(AUTONOMOUS)
DEPARTMENT OF
ARTIFICIAL INTELLIGENCE &DATA SCIENCE
MAHENDHIRAPURI, MALLASAMUDRAM
NAMAKKAL-637 503.

1
MAHENDRA ENGINEERING COLLEGE
(Autonomous)
Syllabus
Artificial Intelligence & Programme
Department
Data Science Code & Name
V Semester
Periods/ Cred
Course Maximum
Course Name Week it
Code marks
L T P C
MACHINE LEARNING
TECHNIQUES 0 0 4 2 100
LABORATORY
Upon completion of this course, the student should be able to get
an idea on:
 To apply the concepts of Machine Learning to solve real-world
problems.
 To implement basic algorithms in clustering &
Objective(s) classification applied to text & numeric data.
 To implement algorithms emphasizing the importance of
bagging & boosting in classification & regression.
 To implement algorithms related to dimensionality reduction.
 To apply machine learning algorithms for Natural Language
Processing applications.
On completion of this course, students will be able to
 To learn to use Weka tool for implementing machine
learning algorithms related to numeric data.
 To learn the application of machine learning algorithms for text
data.
Outcome(s)
 To use dimensionality reduction algorithms for image
processing applications.
 To apply CRFs in text processing applications.
 To use fundamental and advanced neural network algorithms
for solving real-world data.

LISTOF EXPERIMENTS

1. Solving Regression & Classification using Decision Trees

2
2. Root Node Attribute Selection for Decision Trees using Information Gain
3. Bayesian Inference in Gene Expression Analysis
4. Pattern Recognition Application using Bayesian Inference
5. Bagging in Classification
6. Bagging, Boosting applications using Regression Trees
7. Data & Text Classification using Neural Networks
8. Using Weka tool for SVM classification for chosen domain application
9. Data & Text Clustering using K-means algorithm
10 Data & Text Clustering using Gaussian Mixture Models
.
TOTAL PERIODS 45

INDEX

S.NO NAME OF THE EXPERIMENTS PAGE NO.

3
SOLVING REGRESSION & CLASSIFICATION USING 5
1. DECISION TREES

ROOT NODE ATTRIBUTE SELECTION FOR 11


2. DECISION TREES USING INFORMATION GAIN

BAYESIAN INFERENCE IN GENE EXPRESSION 15


3.
ANALYSIS
PATTERN RECOGNITION APPLICATION USING 17
4. BAYESIAN INFERENCE

BAGGING IN CLASSIFICATION 19
5.
BAGGING, BOOSTING APPLICATIONS USING 24
6.
REGRESSION TREES
DATA & TEXT CLASSIFICATION USING NEURAL 28
7.
NETWORKS
USING WEKA TOOL FOR SVM CLASSIFICATION 30
8. FOR CHOSEN DOMAIN APPLICATION

DATA & TEXT CLUSTERING USING K-MEANS 33


9.
ALGORITHM
DATA & TEXT CLUSTERING USING GAUSSIAN 36
10.
MIXTURE MODELS

Exp No :1 Solving Regression & Classification Using Decision Trees

4
Aim:
To solve Regression & Classification using Decision Trees.

Algorithm:
Step 1: Import the required libraries.

Step 2: Initialize and print the Dataset.

Step 3: Select all the rows and column 1 from the dataset to “X”.

Step 4: Select all of the rows and column 2 from the dataset to “y”.

Step 5: Fit decision tree regressor to the dataset.

Step 6: Predicting a new value.

Step 7: Visualising the result.

Step 8: The tree is finally exported and shown in the TREE


STRUCTURE below, visualized using http://www.webgraphviz.com/ by
copying the data from the ‘tree.dot’ file.

Program:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = np.array(
[['Asset Flip', 100, 1000],
['Text Based', 500, 3000],
['Visual Novel', 1500, 5000],
['2D Pixel Art', 3500, 8000],
['2D Vector Art', 5000, 6500],
['Strategy', 6000, 7000],
['First Person Shooter', 8000, 15000],
['Simulator', 9500, 20000],

5
['Racing', 12000, 21000],
['RPG', 14000, 25000],
['Sandbox', 15500, 27000],
['Open-World', 16500, 30000],
['MMOFPS', 25000, 52000],
['MMORPG', 30000, 80000]
])

# print the dataset


print(dataset)

Output:
[['Asset Flip' '100' '1000']
['Text Based' '500' '3000']
['Visual Novel' '1500' '5000']
['2D Pixel Art' '3500' '8000']
['2D Vector Art' '5000' '6500']
['Strategy' '6000' '7000']
['First Person Shooter' '8000' '15000']
['Simulator' '9500' '20000']
['Racing' '12000' '21000']
['RPG' '14000' '25000']
['Sandbox' '15500' '27000']
['Open-World' '16500' '30000']
['MMOFPS' '25000' '52000']
['MMORPG' '30000' '80000']]

X = dataset[:, 1:2].astype(int)

# print X
print(X)

6
Output:
[[ 100]
[ 500]
[ 1500]
[ 3500]
[ 5000]
[ 6000]
[ 8000]
[ 9500]
[12000]
[14000]
[15500]
[16500]
[25000]
[30000]]
y = dataset[:, 2].astype(int)
print(y)

Output:
[1000 3000 5000 8000 6500 7000 15000 20000 21000 25000 27000 30000
52000 80000]

from sklearn.tree import DecisionTreeRegressor


regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)

Output:
DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,

7
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=0, splitter='best')

y_pred = regressor.predict([[3750]])
print("Predicted price: % d\n"% y_pred)
Output:
Predicted price: 8000

X_grid = np.arange(min(X), max(X), 0.01)


X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue'))
plt.title('Profit to Production Cost (Decision Tree Regression)')
plt.xlabel('Production Cost')

plt.ylabel('Profit')
plt.show()

from sklearn.tree import export_graphviz


export_graphviz(regressor, out_file ='tree.dot',
feature_names =['Production Cost'])

8
Output:

Output (Decision Tree):

9
Result:
The above experiment of Solving Regression & Classification using
Decision Trees is executed and output verified successfully.

10
Exp No 2a: Bagging Applications Using Regression Trees.

Aim:
To Solve Bagging, applications using Regression Trees.

Algorithm:

Step 1: Start

Step 2: import the libraries like numpy ,matplotlib,etc

Step 3: Do the experiment with coin as given in question

Step 4: find the head and tail probability and cumulative

Step 5: get output by using the function show

Step 6: stop

Program:

Bagging Examples

1 # evaluate bagging algorithm for classification


2 from numpy import mean
3 from numpy import std
4 from sklearn.datasets import make_classification
5 from sklearn.model_selection import cross_val_score
6 from sklearn.model_selection import RepeatedStratifiedKFold
7 from sklearn.ensemble import BaggingClassifier
8 # define dataset
9 X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
1 n_redundant=5, random_state=5)
0 # define the model

11
1 model = BaggingClassifier()
1 # evaluate the model
1 cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
2 n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1,
1 error_score='raise')
3 # report performance
1 print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
4
1
5
1
6

OUTPUT:

Accuracy: 0.856 (0.037)

Result:

12
The above experiment of Solving Bagging applications using Regression
Trees. is executed and output verified successfully.

Exp No 2b: Boosting Applications Using Regression Trees.

Aim:
To Solve Boosting applications using Regression Trees.

Algorithm:

Step 1: Start

Step 2: import the libraries like numpy , matplotlib etc

Step 3: Do the experiment with the given dataset.

Step 4: Train and test the given model and datasets.

Step 5: get output

Step 6: stop

Program:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import style
style.use('fivethirtyeight')
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_validate
import scipy.stats as sps

13
# Load in the data and define the column labels

dataset = pd.read_csv('data\mushroom.csv',header=None)
dataset = dataset.sample(frac=1)
dataset.columns = ['target','cap-shape','cap-surface','cap-color','bruises','odor','gill-
attachment','gill-spacing',
'gill-size','gill-color','stalk-shape','stalk-root','stalk-surface-above-
ring','stalk-surface-below-ring','stalk-color-above-ring',
'stalk-color-below-ring','veil-type','veil-color','ring-number','ring-
type','spore-print-color','population',
'habitat']

# Encode the feature values from strings to integers since the sklearn
DecisionTreeClassifier only takes numerical values
for label in dataset.columns:
dataset[label] = LabelEncoder().fit(dataset[label]).transform(dataset[label])

Tree_model = DecisionTreeClassifier(criterion="entropy",max_depth=1)

X = dataset.drop('target',axis=1)
Y = dataset['target'].where(dataset['target']==1,-1)

predictions = np.mean(cross_validate(Tree_model,X,Y,cv=100)['test_score'])

print('The accuracy is: ',predictions*100,'%')

OUTPUT:

The accuracy is: 73.06860322953968 %

14
Result:
The above experiment of Solving Boosting applications using Regression
Trees. is executed and output verified successfully.

Exp No:3 Data & Text Classification Using Neural Networks

Aim:
To solve Data & Text Classification using Neural Networks

Algorithm:

1. Refer to libraries we need

2. provide training data

3. organize our data

4. iterate: code + test the results + tune the model

5. Abstract

Program:

training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []

15
# list of tokenized words for the pattern
pattern_words = doc[0]
# stem each word
pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
# create our bag of words array
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
training.append(bag)
# output is a '0' for each tag and '1' for current tag
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
output.append(output_row)
# sample training/output
i=0
w = documents[i][0]
print ([stemmer.stem(word.lower()) for word in w])
print (training[i])
print (output[i])

Output:
['how', 'ar', 'you', '?']
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]

16
Result:
The above experiment of Solving Data & Text Classification using Neural
Networks Trees .is executed and output verified successfully.

Exp No: 4 Data & Text Clustering using K-means algorithm

Aim:
To Solve Data & Text Clustering using K-means algorithm

Algoithm:

1. Refer to libraries we need and import them as necessary

2. provide training data in csv extention

3. organize our data and train them

4. iterate: code + test the results + tune the model

5. get executable output

Program:

def
elbow_method(Y_sklearn)
:
"""
This is the function used to get optimal number
of clusters in order to feed to the k-means
clustering algorithm.
"""
17
number_clusters = range(1, 7) # Range of
possible clusters that can be generated
kmeans = [KMeans(n_clusters=i, max_iter =
600) for i in number_clusters] # Getting no. of
clusters

score =
[kmeans[i].fit(Y_sklearn).score(Y_sklearn) for i
in range(len(kmeans))] # Getting score
corresponding to each cluster.
score = [i*-1 for i in score] # Getting list of
positive scores.

plt.plot(number_clusters, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Method')
plt.show()
elbow_method(Y_sklearn)
# Optimal Clusters = 2

OUTPUT:

18
Result:
The above experiment of Solving Data & Text Clustering using K-means
algorithm Trees .is executed and output verified successfully.

Exp No: 5 Data & Text Clustering Using Gaussian Mixture Models

Aim:
To solve Data & Text Clustering using Gaussian Mixture Models

Algorithm:

 Initialize the mean μkμk, the covariance matrix ΣkΣk and the mixing
coefficients πkπk by some random values(or other values).
 Compute the CkCk values for all k.
 Again Estimate all the parameters using the current \C_k values.
 Compute log-likelihood function.
 Put some convergence criterion
 If the log-likelihood value converges to some value (or if all the parameters
converge to some values) then stop, else return to Step 2.
This algorithm only guarantee that we land to a local optimal point, but it do not
guarantee that this local optima is also the global one. And so, if the algorithm

19
starts from different initialization points, in general it lands into different
configurations.

Program:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
/kaggle/input/ccdata/CC GENERAL.csv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import DataFrame
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn import metrics
In [3]:
raw_df = pd.read_csv('../input/ccdata/CC GENERAL.csv')
raw_df = raw_df.drop('CUST_ID', axis = 1)
raw_df.fillna(method ='ffill', inplace = True)
raw_df.head(2)
Out[3]:

20
P
B
O U C C M P
A IN
N R PU AS A I R
L ST ON P
E C C RC H_ S C N C
A A EO U
O A H HA A H R I _
N P LL FF R
F S A SES D _ E P M F
B C U M _P C
F H S _IN V A D A U U T
A E R E UR H
_ _ E ST A D I Y M L E
L _ C N CH A
P A S_ AL N V T M _ L N
A F H TS AS S
U D F LM CE A _ E P _ U
N R A _P ES E
R V R EN _F N L N A P R
C E S U _F S
C A E TS_ RE C I T Y A E
E Q E R RE _
H N Q FRE Q E M S M Y
U S C QU T
A C U QU UE _ I E M
E H EN R
S E E EN N T T N E
N AS CY X
E N CY C R T N
C ES
S C Y X S T
Y
Y

2
4 1
0 0
0 3 0.
. 1 1
. 9. 0
0. 9 0 0. 0 .
9 0.08 5 0
81 5 0. 95. 0 16 0.0 0 8 1
0 0 0.0 333 0 2 0 0
81 . 0 4 0 66 0 0 0 2
0 3 9 0
82 4 0 67 . 2
7 7 0
0 0 0
4 8 0
0 8
9 7
4

1 3 0. 0 0. 0.0 6 0. 0.0 0.00 0.2 4 0 7 4 1 0. 1


2 90 . 0 4 00 000 5 0 1 0 2 2
0 90 0 4 00 0 0 0 7 2
2 91 2 00 0 3 2. 2
. . . . 3 2

21
P
B
O U C C M P
A IN
N R PU AS A I R
L ST ON P
E C C RC H_ S C N C
A A EO U
O A H HA A H R I _
N P LL FF R
F S A SES D _ E P M F
B C U M _P C
F H S _IN V A D A U U T
A E R E UR H
_ _ E ST A D I Y M L E
L _ C N CH A
P A S_ AL N V T M _ L N
A F H TS AS S
U D F LM CE A _ E P _ U
N R A _P ES E
R V R EN _F N L N A P R
C E S U _F S
C A E TS_ RE C I T Y A E
E Q E R RE _
H N Q FRE Q E M S M Y
U S C QU T
A C U QU UE _ I E M
E H EN R
S E E EN N T T N E
N AS CY X
E N CY C R T N
C ES
S C Y X S T
Y
Y

4 9 0
4
6 4 3
0
7 5 2 2
0 2
4 4 5 2
1
1 8 9
7
6 3 7

In [4]:
# Standardize data
scaler = StandardScaler()
scaled_df = scaler.fit_transform(raw_df)

22
# Normalizing the Data
normalized_df = normalize(scaled_df)
# Converting the numpy array into a pandas DataFrame
normalized_df = pd.DataFrame(normalized_df)
# Reducing the dimensions of the data
pca = PCA(n_components = 2)
X_principal = pca.fit_transform(normalized_df)
X_principal = pd.DataFrame(X_principal)
X_principal.columns = ['P1', 'P2']
X_principal.head(2)
Out[4]:

P1 P2

0 -0.489949 -0.679976

1 -0.519098 0.544828

In [5]:
gmm = GaussianMixture(n_components = 3)
gmm.fit(X_principal)
Out[5]:
GaussianMixture(covariance_type='full', init_params='kmeans', max_iter=100,

23
means_init=None, n_components=3, n_init=1, precisions_init=None,
random_state=None, reg_covar=1e-06, tol=0.001, verbose=0,
verbose_interval=10, warm_start=False, weights_init=None)
In [6]:
# Visualizing the clustering
plt.scatter(X_principal['P1'], X_principal['P2'],
c = GaussianMixture(n_components = 3).fit_predict(X_principal), cmap
=plt.cm.winter, alpha = 0.6)
plt.show()

Result:
The above experiment of Solving Data & Text Clustering using Gaussian
Mixture Models is executed and output verified successfully.

Root Node Attribute Selection For Decision Trees Using


Exp No: 6
Information Gain

Aim:
To solve Data & Text Clustering using Gaussian Mixture Models

24
Algorithm:

 Root Node: It represents the entire population or sample and this further

gets divided into two or more homogeneous sets.

 Splitting: It is a process of dividing a node into two or more sub-nodes.

 Decision Node: When a sub-node splits into further sub-nodes, then it is

called the decision node.

 Leaf / Terminal Node: Nodes do not split is called Leaf or Terminal node.

 Pruning: When we remove sub-nodes of a decision node, this process is

called pruning. You can say the opposite process of splitting.

 Branch / Sub-Tree: A subsection of the entire tree is called branch or sub-

tree.

 Parent and Child Node: A node, which is divided into sub-nodes is called a

parent node of sub-nodes whereas sub-nodes are the child of a parent node.

Program:

Now, lets draw a Decision Tree for the following data using Information
gain.
Training set: 3 features and 2 classes

25
X Y Z C

1 1 1 I

1 1 0 I

0 0 1 II

1 0 0 II

Output:

Here, we have 3 features and 2 output classes.


To build a decision tree using Information gain. We will take each of the
feature and calculate the information for each feature.

Split on feature X

26
Split on feature Y

Split on feature Z

27
Result:
The above experiment of root node attribute selection for decision trees
using information gain is executed and output verified successfully.

28
Exp No: 7 Bayesian Inference In Gene Expression Analysis

Aim:
To solve Bayesian Inference in Gene Expression Analysis.

Algorithm:

 Start
 Omics techniques have changed the way we depict the molecular features of
a cell.
 The integrative and quantitative analysis of omics data raises unprecedented
expectations for understanding biological systems on a global scale.
 However, its inherently noisy nature, together with limited knowledge of
potential sources of variation impacting health and disease, require the use
of proper mathematical and computational methods for its analysis and
integration.
 Bayesian inference of probabilistic models allows propagation of the
uncertainty from the experimental data
 Stop

Program:

1. Prior: P(θ = t) is the probability a priori that the unobserved parameter of


interest θ takes a value t, based only on our knowledge before performing the
experiment. For the example above, the prior probability distribution should be
centered around 1. Bayesian statistics allows us to tune this expectation: If we
are confident that the expression of Pparγ is 1, we will suggest as prior a
distribution with almost no dispersion around the expected value (Figure 1A).
That would be a strong prior. On the other hand, if our experiments were
unclear, we would propose a non-informative or weak prior with a lot of
dispersion around the expected value (Figure 1B).
2. Likelihood: P(y | θ = t), how plausible it is to have observed the data y if we
are in certain scenario, that is, if θ was to take a value of t. In our example, we
know that the measurements of the gene expression using RNA-Seq after
normalization and in the log2 scale follow approximately a normal distribution,
centered around its expected actual expression 1:

29
Output:

P(y|θ)~N(θ,σ)

Posterior: P(θ = t|y) is the probability that the parameter θ takes a value t,
knowing now that we have observed the data y. Given the data and the a
priori distribution of the parameter of interest, we want to infer the probability of
the different possible values of the parameters of interest. Using the Bayes
Theorem:

Output:

P(θ|y)=p(y|θ)p(θ)p(y)∝ P(y|θ)⋅P(θ), since P(y) is constant

In complex models with many parameters, the relationships between the data and
the parameters, and among parameters, can be represented in Directed Acyclic
Graphs (DAGs), which are useful representations of complex probabilistic models
and show their modular structure (10). Figure 2 shows the DAG for the Bayesian
inference problem of trying to infer differences in gene expression across two
conditions. In particular, DAGs help us to decompose the joint priors and posterior
distributions through the chain rule, paying attention only to the parents of each
parameter:

Output:

P(θ1,…,θn)=∏i=1nP(θi|parents(θi))

Result:
The above experiment of solving Bayesian Inference in Gene Expression
Analysis.

30
Exp No: 8 Pattern Recognition Application Using Bayesian Inference.

Aim:
To solve Pattern Recognition Application using Bayesian
Inference.

Algorithm:

 Start
 Electromyogram (EMG) has been utilized to interface signals for prosthetic
hands and information devices
 A scale mixture model is a stochastic EMG model in which the EMG
variance is considered as a random variable, enabling the representation of
uncertainty in the variance.
 his model is extended in this study and utilized for EMG pattern
classification.
 The proposed method is trained by variational Bayesian learning, thereby
allowing the automatic determination of the model complexity.
 Stop

Procedure:

A mutual information-based determination method is


introduced. Simulation and EMG analysis experiments demonstrated the
relationship between the hyperparameters and classification accuracy of the
proposed method as well as the validity of the proposed method. The comparison
using public EMG datasets revealed that the proposed method outperformed the
various conventional classifiers.

These results indicated the validity of the proposed method and


its applicability to EMG-based control systems. In EMG pattern recognition, a
classifier based on a generative model that reflects the stochastic characteristics of
EMG signals can outperform the conventional general-purpose classifier.

31
PROGRAM:

from sklearn import tree


from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import numpy#Preparing the data set - Loading the data via iris.data - Loading the
descriptions of the data via iris.target
#The names of the plant species can be retrieved via "iris_target_names". The
names are stored as IDs (numbers) in "data".
iris = load_iris()
x_coordinate = iris.data
y_coordinate = iris.target
plant_names = iris.target_names#Create random indexes used to retrieve the data
in the iris dataset
array_ids = numpy.random.permutation(len(x_coordinate))#In "train" the data is
used for learning for the Machine Learning program.
#In "real" the actual data is stored, which is used to check the predicted data.
#The last 15 values are used for "real" for checking, the rest for "train".
x_coordinate_train = x_coordinate[array_ids[:-15]]
x_coordinate_real = x_coordinate[array_ids[-15:]]y_coordinate_train =
y_coordinate[array_ids[:-15]]
y_coordinate_real = y_coordinate[array_ids[-15:]]#Classify the data using a
decision tree and train it with the previously created data.
data_classification = tree.DecisionTreeClassifier()
data_classification.fit(x_coordinate_train,y_coordinate_train)#Create predictions
from existing data (in data set "real")
prediction = data_classification.predict(x_coordinate_real)#Display the predicted
names
print(prediction)
#The actual values
print(y_coordinate_real)
#Calculate the accuracy of the predicted data -
# Method accuracy_score() gets the predicted value and the actual value returned
print("Accuracy in percent: %.2f" %
((accuracy_score(prediction,y_coordinate_real)) * 100))

32
OUTPUT:

[1 1 0 2 2 0 1 2 1 0 2 1 2 0 2]
[2 1 0 2 2 0 1 2 1 0 2 1 2 0 2]
Accuracy in percent: 93.33

Result:
The above experiment of solving pattern recognition application using
bayesian inference.

33
Exp No: 9 Bagging In Classification

Aim:
To solve Bagging in classification.

Algorithm:

 Start
 A Bagging classifier is an ensemble meta-estimator that fits base classifiers
each on random subsets of the original dataset
 Such a meta-estimator can typically be used as a way to reduce the variance
of a black-box estimator
 Each base classifier is trained in parallel with a training set which is
generated by randomly drawing.
 Stop

Program:

Since Bagging resamples the original training dataset with replacement, some
instance(or data) may be present multiple times while others are left out.
Original training dataset: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Resampled training set 1: 2, 3, 3, 5, 6, 1, 8, 10, 9, 1
Resampled training set 2: 1, 1, 5, 6, 3, 8, 9, 10, 2, 7
Resampled training set 3: 1, 5, 8, 9, 2, 10, 9, 7, 5, 4
Algorithm for the Bagging classifier:
Classifier generation:

Let N be the size of the training set.


for each of t iterations:
sample N instances with replacement from the original training set.
apply the learning algorithm to the sample.
store the resulting classifier.

Classification:
for each of the t classifiers:
predict class of instance using classifier.
return class that was predicted most often.

34
Below is the Python implementation of the above algorithm:

from sklearn import model_selection


from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# load the data


url = "/home/debomit/Downloads/wine_data.xlsx"
dataframe = pd.read_excel(url)
arr = dataframe.values
X = arr[:, 1:14]
Y = arr[:, 0]

seed = 8
kfold = model_selection.KFold(n_splits = 3,
random_state = seed)

# initialize the base classifier


base_cls = DecisionTreeClassifier()

# no. of base classifier


num_trees = 500

# bagging classifier
model = BaggingClassifier(base_estimator = base_cls,
n_estimators = num_trees,
random_state = seed)

results = model_selection.cross_val_score(model, X, Y, cv = kfold)


print("accuracy :")
print(results.mean())

35
Output:

accuracy :
0.8372093023255814

Result:
The above experiment of solving Bagging in classification is executed and
output verified successfully.

36
Using Weka Tool For SVM Classification For
Exp No: 10
Chosen Domain Application.

Aim:
To solve using weka tool for SVM classification for chosen
domain application.

Algorithm:

Weka tool is an open-source tool developed by students of Waikato


university which stands for Waikato Environment for Knowledge Analysis having
all inbuilt machine learning algorithms.
 It is used for solving real-life problems using data mining techniques. The
tool was developed using the Java programming language so that it is
platform-independent.
 The tool itself contains some data sets in the data file of the application, We
can them to implement our algorithms.
 The dataset we are going to use is breast-cancer.arff. Classification can be
defined by Prediction models that predict continuous-valued functions,
while classification models predict categorical class marks.
 In this article, we are going to learn the classification implementation on a
dataset using WEKA tool. We will use two different classifiers for this.

PROGRAM:

>>> import numpy as np


>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn import svm
>>> X, y = make_classification(n_samples=10, random_state=0)
>>> X_train , X_test , y_train, y_test = train_test_split(X, y, random_state=0)
>>> clf = svm.SVC(kernel='precomputed')
>>> # linear kernel computation
>>> gram_train = np.dot(X_train, X_train.T)
>>> clf.fit(gram_train, y_train)
SVC(kernel='precomputed')
>>> # predict on training examples
>>> gram_test = np.dot(X_test, X_train.T)
37
>>> clf.predict(gram_test)
array([0, 1, 0])

OUTPUT:

RESULT:
Thus by using Weka tool for SVM classification for chosen domain
application is successfully executed and verified successfully.

38

You might also like