0% found this document useful (0 votes)

127 views38 pages

MLT - Lab - Manual FINAL

Uploaded by

svbdj.ek.sh.sk.v

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

127 views38 pages

MLT - Lab - Manual FINAL

Uploaded by

svbdj.ek.sh.sk.v

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 38

MACHINE LEARNING TECHNIQUES

LABORATORY MANUAL

(19AI24502)
V-SEMESTER - B.TECH - AI&DS

MAHENDRA ENGINEERING COLLEGE

(AUTONOMOUS)
DEPARTMENT OF
ARTIFICIAL INTELLIGENCE &DATA SCIENCE
MAHENDHIRAPURI, MALLASAMUDRAM
NAMAKKAL-637 503.

1
MAHENDRA ENGINEERING COLLEGE
(Autonomous)
Syllabus
Artificial Intelligence & Programme
Department
Data Science Code & Name
V Semester
Periods/ Cred
Course Maximum
Course Name Week it
Code marks
L T P C
MACHINE LEARNING
TECHNIQUES 0 0 4 2 100
LABORATORY
Upon completion of this course, the student should be able to get
an idea on:
 To apply the concepts of Machine Learning to solve real-world
problems.
 To implement basic algorithms in clustering &
Objective(s) classification applied to text & numeric data.
 To implement algorithms emphasizing the importance of
bagging & boosting in classification & regression.
 To implement algorithms related to dimensionality reduction.
 To apply machine learning algorithms for Natural Language
Processing applications.
On completion of this course, students will be able to
 To learn to use Weka tool for implementing machine
learning algorithms related to numeric data.
 To learn the application of machine learning algorithms for text
data.
Outcome(s)
 To use dimensionality reduction algorithms for image
processing applications.
 To apply CRFs in text processing applications.
 To use fundamental and advanced neural network algorithms
for solving real-world data.

LISTOF EXPERIMENTS

1. Solving Regression & Classification using Decision Trees

2
2. Root Node Attribute Selection for Decision Trees using Information Gain
3. Bayesian Inference in Gene Expression Analysis
4. Pattern Recognition Application using Bayesian Inference
5. Bagging in Classification
6. Bagging, Boosting applications using Regression Trees
7. Data & Text Classification using Neural Networks
8. Using Weka tool for SVM classification for chosen domain application
9. Data & Text Clustering using K-means algorithm
10 Data & Text Clustering using Gaussian Mixture Models
.
TOTAL PERIODS 45

INDEX

S.NO NAME OF THE EXPERIMENTS PAGE NO.

3
SOLVING REGRESSION & CLASSIFICATION USING 5
1. DECISION TREES

ROOT NODE ATTRIBUTE SELECTION FOR 11

2. DECISION TREES USING INFORMATION GAIN

BAYESIAN INFERENCE IN GENE EXPRESSION 15

3.
ANALYSIS
PATTERN RECOGNITION APPLICATION USING 17
4. BAYESIAN INFERENCE

BAGGING IN CLASSIFICATION 19
5.
BAGGING, BOOSTING APPLICATIONS USING 24
6.
REGRESSION TREES
DATA & TEXT CLASSIFICATION USING NEURAL 28
7.
NETWORKS
USING WEKA TOOL FOR SVM CLASSIFICATION 30
8. FOR CHOSEN DOMAIN APPLICATION

DATA & TEXT CLUSTERING USING K-MEANS 33

9.
ALGORITHM
DATA & TEXT CLUSTERING USING GAUSSIAN 36
10.
MIXTURE MODELS

Exp No :1 Solving Regression & Classification Using Decision Trees

4
Aim:
To solve Regression & Classification using Decision Trees.

Algorithm:
Step 1: Import the required libraries.

Step 2: Initialize and print the Dataset.

Step 3: Select all the rows and column 1 from the dataset to “X”.

Step 4: Select all of the rows and column 2 from the dataset to “y”.

Step 5: Fit decision tree regressor to the dataset.

Step 6: Predicting a new value.

Step 7: Visualising the result.

Step 8: The tree is finally exported and shown in the TREE

STRUCTURE below, visualized using http://www.webgraphviz.com/ by
copying the data from the ‘tree.dot’ file.

Program:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = np.array(
[['Asset Flip', 100, 1000],
['Text Based', 500, 3000],
['Visual Novel', 1500, 5000],
['2D Pixel Art', 3500, 8000],
['2D Vector Art', 5000, 6500],
['Strategy', 6000, 7000],
['First Person Shooter', 8000, 15000],
['Simulator', 9500, 20000],

5
['Racing', 12000, 21000],
['RPG', 14000, 25000],
['Sandbox', 15500, 27000],
['Open-World', 16500, 30000],
['MMOFPS', 25000, 52000],
['MMORPG', 30000, 80000]
])

# print the dataset

print(dataset)

Output:
[['Asset Flip' '100' '1000']
['Text Based' '500' '3000']
['Visual Novel' '1500' '5000']
['2D Pixel Art' '3500' '8000']
['2D Vector Art' '5000' '6500']
['Strategy' '6000' '7000']
['First Person Shooter' '8000' '15000']
['Simulator' '9500' '20000']
['Racing' '12000' '21000']
['RPG' '14000' '25000']
['Sandbox' '15500' '27000']
['Open-World' '16500' '30000']
['MMOFPS' '25000' '52000']
['MMORPG' '30000' '80000']]

X = dataset[:, 1:2].astype(int)

# print X
print(X)

6
Output:
[[ 100]
[ 500]
[ 1500]
[ 3500]
[ 5000]
[ 6000]
[ 8000]
[ 9500]
[12000]
[14000]
[15500]
[16500]
[25000]
[30000]]
y = dataset[:, 2].astype(int)
print(y)

Output:
[1000 3000 5000 8000 6500 7000 15000 20000 21000 25000 27000 30000
52000 80000]

from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)

Output:
DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,

7
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=0, splitter='best')

y_pred = regressor.predict([[3750]])
print("Predicted price: % d\n"% y_pred)
Output:
Predicted price: 8000

X_grid = np.arange(min(X), max(X), 0.01)

X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue'))
plt.title('Profit to Production Cost (Decision Tree Regression)')
plt.xlabel('Production Cost')

plt.ylabel('Profit')
plt.show()

from sklearn.tree import export_graphviz

export_graphviz(regressor, out_file ='tree.dot',
feature_names =['Production Cost'])

8
Output:

Output (Decision Tree):

9
Result:
The above experiment of Solving Regression & Classification using
Decision Trees is executed and output verified successfully.

10
Exp No 2a: Bagging Applications Using Regression Trees.

Aim:
To Solve Bagging, applications using Regression Trees.

Algorithm:

Step 1: Start

Step 2: import the libraries like numpy ,matplotlib,etc

Step 3: Do the experiment with coin as given in question

Step 4: find the head and tail probability and cumulative

Step 5: get output by using the function show

Step 6: stop

Program:

Bagging Examples

1 # evaluate bagging algorithm for classification

2 from numpy import mean
3 from numpy import std
4 from sklearn.datasets import make_classification
5 from sklearn.model_selection import cross_val_score
6 from sklearn.model_selection import RepeatedStratifiedKFold
7 from sklearn.ensemble import BaggingClassifier
8 # define dataset
9 X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
1 n_redundant=5, random_state=5)
0 # define the model

11
1 model = BaggingClassifier()
1 # evaluate the model
1 cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
2 n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1,
1 error_score='raise')
3 # report performance
1 print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
4
1
5
1
6

OUTPUT:

Accuracy: 0.856 (0.037)

Result:

12
The above experiment of Solving Bagging applications using Regression
Trees. is executed and output verified successfully.

Exp No 2b: Boosting Applications Using Regression Trees.

Aim:
To Solve Boosting applications using Regression Trees.

Algorithm:

Step 1: Start

Step 2: import the libraries like numpy , matplotlib etc

Step 3: Do the experiment with the given dataset.

Step 4: Train and test the given model and datasets.

Step 5: get output

Step 6: stop

Program:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import style
style.use('fivethirtyeight')
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_validate
import scipy.stats as sps

13
# Load in the data and define the column labels

dataset = pd.read_csv('data\mushroom.csv',header=None)
dataset = dataset.sample(frac=1)
dataset.columns = ['target','cap-shape','cap-surface','cap-color','bruises','odor','gill-
attachment','gill-spacing',
'gill-size','gill-color','stalk-shape','stalk-root','stalk-surface-above-
ring','stalk-surface-below-ring','stalk-color-above-ring',
'stalk-color-below-ring','veil-type','veil-color','ring-number','ring-
type','spore-print-color','population',
'habitat']

# Encode the feature values from strings to integers since the sklearn
DecisionTreeClassifier only takes numerical values
for label in dataset.columns:
dataset[label] = LabelEncoder().fit(dataset[label]).transform(dataset[label])

Tree_model = DecisionTreeClassifier(criterion="entropy",max_depth=1)

X = dataset.drop('target',axis=1)
Y = dataset['target'].where(dataset['target']==1,-1)

predictions = np.mean(cross_validate(Tree_model,X,Y,cv=100)['test_score'])

print('The accuracy is: ',predictions*100,'%')

OUTPUT:

The accuracy is: 73.06860322953968 %

14
Result:
The above experiment of Solving Boosting applications using Regression
Trees. is executed and output verified successfully.

Exp No:3 Data & Text Classification Using Neural Networks

Aim:
To solve Data & Text Classification using Neural Networks

Algorithm:

1. Refer to libraries we need

2. provide training data

3. organize our data

4. iterate: code + test the results + tune the model

5. Abstract

Program:

training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
# initialize our bag of words
bag = []

15
# list of tokenized words for the pattern
pattern_words = doc[0]
# stem each word
pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
# create our bag of words array
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
training.append(bag)
# output is a '0' for each tag and '1' for current tag
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
output.append(output_row)
# sample training/output
i=0
w = documents[i][0]
print ([stemmer.stem(word.lower()) for word in w])
print (training[i])
print (output[i])

Output:
['how', 'ar', 'you', '?']
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]

16
Result:
The above experiment of Solving Data & Text Classification using Neural
Networks Trees .is executed and output verified successfully.

Exp No: 4 Data & Text Clustering using K-means algorithm

Aim:
To Solve Data & Text Clustering using K-means algorithm

Algoithm:

1. Refer to libraries we need and import them as necessary

2. provide training data in csv extention

3. organize our data and train them

4. iterate: code + test the results + tune the model

5. get executable output

Program:

def
elbow_method(Y_sklearn)
:
"""
This is the function used to get optimal number
of clusters in order to feed to the k-means
clustering algorithm.
"""
17
number_clusters = range(1, 7) # Range of
possible clusters that can be generated
kmeans = [KMeans(n_clusters=i, max_iter =
600) for i in number_clusters] # Getting no. of
clusters

score =
[kmeans[i].fit(Y_sklearn).score(Y_sklearn) for i
in range(len(kmeans))] # Getting score
corresponding to each cluster.
score = [i*-1 for i in score] # Getting list of
positive scores.

plt.plot(number_clusters, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Method')
plt.show()
elbow_method(Y_sklearn)
# Optimal Clusters = 2

OUTPUT:

18
Result:
The above experiment of Solving Data & Text Clustering using K-means
algorithm Trees .is executed and output verified successfully.

Exp No: 5 Data & Text Clustering Using Gaussian Mixture Models

Aim:
To solve Data & Text Clustering using Gaussian Mixture Models

Algorithm:

 Initialize the mean μkμk, the covariance matrix ΣkΣk and the mixing
coefficients πkπk by some random values(or other values).
 Compute the CkCk values for all k.
 Again Estimate all the parameters using the current \C_k values.
 Compute log-likelihood function.
 Put some convergence criterion
 If the log-likelihood value converges to some value (or if all the parameters
converge to some values) then stop, else return to Step 2.
This algorithm only guarantee that we land to a local optimal point, but it do not
guarantee that this local optima is also the global one. And so, if the algorithm

19
starts from different initialization points, in general it lands into different
configurations.

Program:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
/kaggle/input/ccdata/CC GENERAL.csv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import DataFrame
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn import metrics
In [3]:
raw_df = pd.read_csv('../input/ccdata/CC GENERAL.csv')
raw_df = raw_df.drop('CUST_ID', axis = 1)
raw_df.fillna(method ='ffill', inplace = True)
raw_df.head(2)
Out[3]:

20
P
B
O U C C M P
A IN
N R PU AS A I R
L ST ON P
E C C RC H_ S C N C
A A EO U
O A H HA A H R I _
N P LL FF R
F S A SES D _ E P M F
B C U M _P C
F H S _IN V A D A U U T
A E R E UR H
_ _ E ST A D I Y M L E
L _ C N CH A
P A S_ AL N V T M _ L N
A F H TS AS S
U D F LM CE A _ E P _ U
N R A _P ES E
R V R EN _F N L N A P R
C E S U _F S
C A E TS_ RE C I T Y A E
E Q E R RE _
H N Q FRE Q E M S M Y
U S C QU T
A C U QU UE _ I E M
E H EN R
S E E EN N T T N E
N AS CY X
E N CY C R T N
C ES
S C Y X S T
Y
Y

2
4 1
0 0
0 3 0.
. 1 1
. 9. 0
0. 9 0 0. 0 .
9 0.08 5 0
81 5 0. 95. 0 16 0.0 0 8 1
0 0 0.0 333 0 2 0 0
81 . 0 4 0 66 0 0 0 2
0 3 9 0
82 4 0 67 . 2
7 7 0
0 0 0
4 8 0
0 8
9 7
4

1 3 0. 0 0. 0.0 6 0. 0.0 0.00 0.2 4 0 7 4 1 0. 1

2 90 . 0 4 00 000 5 0 1 0 2 2
0 90 0 4 00 0 0 0 7 2
2 91 2 00 0 3 2. 2
. . . . 3 2

21
P
B
O U C C M P
A IN
N R PU AS A I R
L ST ON P
E C C RC H_ S C N C
A A EO U
O A H HA A H R I _
N P LL FF R
F S A SES D _ E P M F
B C U M _P C
F H S _IN V A D A U U T
A E R E UR H
_ _ E ST A D I Y M L E
L _ C N CH A
P A S_ AL N V T M _ L N
A F H TS AS S
U D F LM CE A _ E P _ U
N R A _P ES E
R V R EN _F N L N A P R
C E S U _F S
C A E TS_ RE C I T Y A E
E Q E R RE _
H N Q FRE Q E M S M Y
U S C QU T
A C U QU UE _ I E M
E H EN R
S E E EN N T T N E
N AS CY X
E N CY C R T N
C ES
S C Y X S T
Y
Y

4 9 0
4
6 4 3
0
7 5 2 2
0 2
4 4 5 2
1
1 8 9
7
6 3 7

In [4]:
# Standardize data
scaler = StandardScaler()
scaled_df = scaler.fit_transform(raw_df)

22
# Normalizing the Data
normalized_df = normalize(scaled_df)
# Converting the numpy array into a pandas DataFrame
normalized_df = pd.DataFrame(normalized_df)
# Reducing the dimensions of the data
pca = PCA(n_components = 2)
X_principal = pca.fit_transform(normalized_df)
X_principal = pd.DataFrame(X_principal)
X_principal.columns = ['P1', 'P2']
X_principal.head(2)
Out[4]:

P1 P2

0 -0.489949 -0.679976

1 -0.519098 0.544828

In [5]:
gmm = GaussianMixture(n_components = 3)
gmm.fit(X_principal)
Out[5]:
GaussianMixture(covariance_type='full', init_params='kmeans', max_iter=100,

23
means_init=None, n_components=3, n_init=1, precisions_init=None,
random_state=None, reg_covar=1e-06, tol=0.001, verbose=0,
verbose_interval=10, warm_start=False, weights_init=None)
In [6]:
# Visualizing the clustering
plt.scatter(X_principal['P1'], X_principal['P2'],
c = GaussianMixture(n_components = 3).fit_predict(X_principal), cmap
=plt.cm.winter, alpha = 0.6)
plt.show()

Result:
The above experiment of Solving Data & Text Clustering using Gaussian
Mixture Models is executed and output verified successfully.

Root Node Attribute Selection For Decision Trees Using

Exp No: 6
Information Gain

Aim:
To solve Data & Text Clustering using Gaussian Mixture Models

24
Algorithm:

 Root Node: It represents the entire population or sample and this further

gets divided into two or more homogeneous sets.

 Splitting: It is a process of dividing a node into two or more sub-nodes.

 Decision Node: When a sub-node splits into further sub-nodes, then it is

called the decision node.

 Leaf / Terminal Node: Nodes do not split is called Leaf or Terminal node.

 Pruning: When we remove sub-nodes of a decision node, this process is

called pruning. You can say the opposite process of splitting.

 Branch / Sub-Tree: A subsection of the entire tree is called branch or sub-

tree.

 Parent and Child Node: A node, which is divided into sub-nodes is called a

parent node of sub-nodes whereas sub-nodes are the child of a parent node.

Program:

Now, lets draw a Decision Tree for the following data using Information
gain.
Training set: 3 features and 2 classes

25
X Y Z C

1 1 1 I

1 1 0 I

0 0 1 II

1 0 0 II

Output:

Here, we have 3 features and 2 output classes.

To build a decision tree using Information gain. We will take each of the
feature and calculate the information for each feature.

Split on feature X

26
Split on feature Y

Split on feature Z

27
Result:
The above experiment of root node attribute selection for decision trees
using information gain is executed and output verified successfully.

28
Exp No: 7 Bayesian Inference In Gene Expression Analysis

Aim:
To solve Bayesian Inference in Gene Expression Analysis.

Algorithm:

 Start
 Omics techniques have changed the way we depict the molecular features of
a cell.
 The integrative and quantitative analysis of omics data raises unprecedented
expectations for understanding biological systems on a global scale.
 However, its inherently noisy nature, together with limited knowledge of
potential sources of variation impacting health and disease, require the use
of proper mathematical and computational methods for its analysis and
integration.
 Bayesian inference of probabilistic models allows propagation of the
uncertainty from the experimental data
 Stop

Program:

1. Prior: P(θ = t) is the probability a priori that the unobserved parameter of

interest θ takes a value t, based only on our knowledge before performing the
experiment. For the example above, the prior probability distribution should be
centered around 1. Bayesian statistics allows us to tune this expectation: If we
are confident that the expression of Pparγ is 1, we will suggest as prior a
distribution with almost no dispersion around the expected value (Figure 1A).
That would be a strong prior. On the other hand, if our experiments were
unclear, we would propose a non-informative or weak prior with a lot of
dispersion around the expected value (Figure 1B).
2. Likelihood: P(y | θ = t), how plausible it is to have observed the data y if we
are in certain scenario, that is, if θ was to take a value of t. In our example, we
know that the measurements of the gene expression using RNA-Seq after
normalization and in the log2 scale follow approximately a normal distribution,
centered around its expected actual expression 1:

29
Output:

P(y|θ)~N(θ,σ)

Posterior: P(θ = t|y) is the probability that the parameter θ takes a value t,
knowing now that we have observed the data y. Given the data and the a
priori distribution of the parameter of interest, we want to infer the probability of
the different possible values of the parameters of interest. Using the Bayes
Theorem:

Output:

P(θ|y)=p(y|θ)p(θ)p(y)∝ P(y|θ)⋅P(θ), since P(y) is constant

In complex models with many parameters, the relationships between the data and
the parameters, and among parameters, can be represented in Directed Acyclic
Graphs (DAGs), which are useful representations of complex probabilistic models
and show their modular structure (10). Figure 2 shows the DAG for the Bayesian
inference problem of trying to infer differences in gene expression across two
conditions. In particular, DAGs help us to decompose the joint priors and posterior
distributions through the chain rule, paying attention only to the parents of each
parameter:

Output:

P(θ1,…,θn)=∏i=1nP(θi|parents(θi))

Result:
The above experiment of solving Bayesian Inference in Gene Expression
Analysis.

30
Exp No: 8 Pattern Recognition Application Using Bayesian Inference.

Aim:
To solve Pattern Recognition Application using Bayesian
Inference.

Algorithm:

 Start
 Electromyogram (EMG) has been utilized to interface signals for prosthetic
hands and information devices
 A scale mixture model is a stochastic EMG model in which the EMG
variance is considered as a random variable, enabling the representation of
uncertainty in the variance.
 his model is extended in this study and utilized for EMG pattern
classification.
 The proposed method is trained by variational Bayesian learning, thereby
allowing the automatic determination of the model complexity.
 Stop

Procedure:

A mutual information-based determination method is

introduced. Simulation and EMG analysis experiments demonstrated the
relationship between the hyperparameters and classification accuracy of the
proposed method as well as the validity of the proposed method. The comparison
using public EMG datasets revealed that the proposed method outperformed the
various conventional classifiers.

These results indicated the validity of the proposed method and

its applicability to EMG-based control systems. In EMG pattern recognition, a
classifier based on a generative model that reflects the stochastic characteristics of
EMG signals can outperform the conventional general-purpose classifier.

31
PROGRAM:

from sklearn import tree

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import numpy#Preparing the data set - Loading the data via iris.data - Loading the
descriptions of the data via iris.target
#The names of the plant species can be retrieved via "iris_target_names". The
names are stored as IDs (numbers) in "data".
iris = load_iris()
x_coordinate = iris.data
y_coordinate = iris.target
plant_names = iris.target_names#Create random indexes used to retrieve the data
in the iris dataset
array_ids = numpy.random.permutation(len(x_coordinate))#In "train" the data is
used for learning for the Machine Learning program.
#In "real" the actual data is stored, which is used to check the predicted data.
#The last 15 values are used for "real" for checking, the rest for "train".
x_coordinate_train = x_coordinate[array_ids[:-15]]
x_coordinate_real = x_coordinate[array_ids[-15:]]y_coordinate_train =
y_coordinate[array_ids[:-15]]
y_coordinate_real = y_coordinate[array_ids[-15:]]#Classify the data using a
decision tree and train it with the previously created data.
data_classification = tree.DecisionTreeClassifier()
data_classification.fit(x_coordinate_train,y_coordinate_train)#Create predictions
from existing data (in data set "real")
prediction = data_classification.predict(x_coordinate_real)#Display the predicted
names
print(prediction)
#The actual values
print(y_coordinate_real)
#Calculate the accuracy of the predicted data -
# Method accuracy_score() gets the predicted value and the actual value returned
print("Accuracy in percent: %.2f" %
((accuracy_score(prediction,y_coordinate_real)) * 100))

32
OUTPUT:

[1 1 0 2 2 0 1 2 1 0 2 1 2 0 2]
[2 1 0 2 2 0 1 2 1 0 2 1 2 0 2]
Accuracy in percent: 93.33

Result:
The above experiment of solving pattern recognition application using
bayesian inference.

33
Exp No: 9 Bagging In Classification

Aim:
To solve Bagging in classification.

Algorithm:

 Start
 A Bagging classifier is an ensemble meta-estimator that fits base classifiers
each on random subsets of the original dataset
 Such a meta-estimator can typically be used as a way to reduce the variance
of a black-box estimator
 Each base classifier is trained in parallel with a training set which is
generated by randomly drawing.
 Stop

Program:

Since Bagging resamples the original training dataset with replacement, some
instance(or data) may be present multiple times while others are left out.
Original training dataset: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Resampled training set 1: 2, 3, 3, 5, 6, 1, 8, 10, 9, 1
Resampled training set 2: 1, 1, 5, 6, 3, 8, 9, 10, 2, 7
Resampled training set 3: 1, 5, 8, 9, 2, 10, 9, 7, 5, 4
Algorithm for the Bagging classifier:
Classifier generation:

Let N be the size of the training set.

for each of t iterations:
sample N instances with replacement from the original training set.
apply the learning algorithm to the sample.
store the resulting classifier.

Classification:
for each of the t classifiers:
predict class of instance using classifier.
return class that was predicted most often.

34
Below is the Python implementation of the above algorithm:

from sklearn import model_selection

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# load the data

url = "/home/debomit/Downloads/wine_data.xlsx"
dataframe = pd.read_excel(url)
arr = dataframe.values
X = arr[:, 1:14]
Y = arr[:, 0]

seed = 8
kfold = model_selection.KFold(n_splits = 3,
random_state = seed)

# initialize the base classifier

base_cls = DecisionTreeClassifier()

# no. of base classifier

num_trees = 500

# bagging classifier
model = BaggingClassifier(base_estimator = base_cls,
n_estimators = num_trees,
random_state = seed)

results = model_selection.cross_val_score(model, X, Y, cv = kfold)

print("accuracy :")
print(results.mean())

35
Output:

accuracy :
0.8372093023255814

Result:
The above experiment of solving Bagging in classification is executed and
output verified successfully.

36
Using Weka Tool For SVM Classification For
Exp No: 10
Chosen Domain Application.

Aim:
To solve using weka tool for SVM classification for chosen
domain application.

Algorithm:

Weka tool is an open-source tool developed by students of Waikato

university which stands for Waikato Environment for Knowledge Analysis having
all inbuilt machine learning algorithms.
 It is used for solving real-life problems using data mining techniques. The
tool was developed using the Java programming language so that it is
platform-independent.
 The tool itself contains some data sets in the data file of the application, We
can them to implement our algorithms.
 The dataset we are going to use is breast-cancer.arff. Classification can be
defined by Prediction models that predict continuous-valued functions,
while classification models predict categorical class marks.
 In this article, we are going to learn the classification implementation on a
dataset using WEKA tool. We will use two different classifiers for this.

PROGRAM:

>>> import numpy as np

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn import svm
>>> X, y = make_classification(n_samples=10, random_state=0)
>>> X_train , X_test , y_train, y_test = train_test_split(X, y, random_state=0)
>>> clf = svm.SVC(kernel='precomputed')
>>> # linear kernel computation
>>> gram_train = np.dot(X_train, X_train.T)
>>> clf.fit(gram_train, y_train)
SVC(kernel='precomputed')
>>> # predict on training examples
>>> gram_test = np.dot(X_test, X_train.T)
37
>>> clf.predict(gram_test)
array([0, 1, 0])

OUTPUT:

RESULT:
Thus by using Weka tool for SVM classification for chosen domain
application is successfully executed and verified successfully.

Fundamentals of Bayesian Epistemology 1 : Introducing Credences Michael G. Titelbaum - The full ebook version is ready for instant download
100% (1)
Fundamentals of Bayesian Epistemology 1 : Introducing Credences Michael G. Titelbaum - The full ebook version is ready for instant download
54 pages
An Introduction To Bayesian Statistics
100% (9)
An Introduction To Bayesian Statistics
20 pages
Xiao Et Al. - 2025 - A General Probabilistic Framework for Impact Local
No ratings yet
Xiao Et Al. - 2025 - A General Probabilistic Framework for Impact Local
35 pages
Reviewer For Management Science
No ratings yet
Reviewer For Management Science
23 pages
Combine PDF
No ratings yet
Combine PDF
75 pages
UNIT III MACHINE LEARNING
No ratings yet
UNIT III MACHINE LEARNING
19 pages
Bayesian Estimation of The Concentration Parameter of The L-Mode Von Mises Distribution
No ratings yet
Bayesian Estimation of The Concentration Parameter of The L-Mode Von Mises Distribution
3 pages
Notes On Econometrics I: Grace Mccormack
No ratings yet
Notes On Econometrics I: Grace Mccormack
50 pages
Baio Bayesian Health Economics PDF
No ratings yet
Baio Bayesian Health Economics PDF
67 pages
ML5_Implementation
No ratings yet
ML5_Implementation
32 pages
ml record
No ratings yet
ml record
19 pages
Exam Sta 426
No ratings yet
Exam Sta 426
6 pages
Sims Zha - Error Bands For IRF
No ratings yet
Sims Zha - Error Bands For IRF
42 pages
Title
No ratings yet
Title
10 pages
Decision Trees
67% (3)
Decision Trees
14 pages
Lecture 7 Dynare
No ratings yet
Lecture 7 Dynare
46 pages
7 Bayesian Analysis
No ratings yet
7 Bayesian Analysis
96 pages
Regresion Heterocedástica
No ratings yet
Regresion Heterocedástica
21 pages
Chapter 4
No ratings yet
Chapter 4
3 pages
305_BA_MachineLearning_And_Cognitive_Intellegence_using_Python_1
No ratings yet
305_BA_MachineLearning_And_Cognitive_Intellegence_using_Python_1
32 pages
Bishop Solutions PDF
No ratings yet
Bishop Solutions PDF
87 pages
ML Practical Updated
No ratings yet
ML Practical Updated
64 pages
Decision Tree
No ratings yet
Decision Tree
16 pages
Bayesian Model Averaging For Linear Regression Models
No ratings yet
Bayesian Model Averaging For Linear Regression Models
14 pages
Introduction To Decision Tree: Gini Index
No ratings yet
Introduction To Decision Tree: Gini Index
15 pages
When The World Becomes Too Real
No ratings yet
When The World Becomes Too Real
7 pages
ml using python programs
No ratings yet
ml using python programs
12 pages
Machine Learning
No ratings yet
Machine Learning
16 pages
MLT 1 - 7 Kanish
No ratings yet
MLT 1 - 7 Kanish
24 pages
3 Bayesian Deep Learning
No ratings yet
3 Bayesian Deep Learning
33 pages
Is Lab Aman Agarwal PDF
No ratings yet
Is Lab Aman Agarwal PDF
8 pages
Probabilistic Engineering Mechanics: Swarup Ghosh, Subrata Chakraborty
No ratings yet
Probabilistic Engineering Mechanics: Swarup Ghosh, Subrata Chakraborty
12 pages
AIH_Lab2
No ratings yet
AIH_Lab2
10 pages
2002 Banning. Archaeological Survey as Optimal Search
No ratings yet
2002 Banning. Archaeological Survey as Optimal Search
11 pages
ml lab programs 2
No ratings yet
ml lab programs 2
16 pages
Decision Tree and Related Techniques For Classification in Scalation
No ratings yet
Decision Tree and Related Techniques For Classification in Scalation
12 pages
Leon-Garcia-IPPR_Chapters 1-6
No ratings yet
Leon-Garcia-IPPR_Chapters 1-6
180 pages
Karmbir 19 ML
No ratings yet
Karmbir 19 ML
20 pages
Don't Calibrate MMM Models Through Experiments
No ratings yet
Don't Calibrate MMM Models Through Experiments
27 pages
Naive Bayes
No ratings yet
Naive Bayes
60 pages
Orange3 Data Mining Library Using Python
50% (2)
Orange3 Data Mining Library Using Python
102 pages
Mlsp Lab Exp4
No ratings yet
Mlsp Lab Exp4
9 pages
Programs Lab Bca
No ratings yet
Programs Lab Bca
16 pages
IV Ai & Ds Al3451 Ml Unit2
No ratings yet
IV Ai & Ds Al3451 Ml Unit2
50 pages
practical 15 python
No ratings yet
practical 15 python
6 pages
MLA Lab 6:-Implementation of Decision Tree
No ratings yet
MLA Lab 6:-Implementation of Decision Tree
16 pages
A Tutorial On Conducting and Interpreting A Bayesian ANOVA in JASP
No ratings yet
A Tutorial On Conducting and Interpreting A Bayesian ANOVA in JASP
22 pages
Phase 3 IBM
No ratings yet
Phase 3 IBM
7 pages
ML Priyesha - 778
No ratings yet
ML Priyesha - 778
23 pages
ML for predictive analysis
No ratings yet
ML for predictive analysis
4 pages
ML 2
No ratings yet
ML 2
3 pages
Random Forest: The Algorithm in A Nutshell
No ratings yet
Random Forest: The Algorithm in A Nutshell
10 pages
Decision Trees
No ratings yet
Decision Trees
11 pages
Silver Oak College of Computer Application: Subject:Machine Learning
No ratings yet
Silver Oak College of Computer Application: Subject:Machine Learning
15 pages
확통1 LectureNote09 on Bayesian Statistical Inference
No ratings yet
확통1 LectureNote09 on Bayesian Statistical Inference
78 pages
Machine Learnig Revision
No ratings yet
Machine Learnig Revision
93 pages
PRACTICAL FILE DL
No ratings yet
PRACTICAL FILE DL
14 pages
ML RECORD - Merged
No ratings yet
ML RECORD - Merged
33 pages
Practical No4 - 5 ML
No ratings yet
Practical No4 - 5 ML
11 pages
Mlda - Lab
No ratings yet
Mlda - Lab
35 pages
Machine Learning (Se204A) Lab Manual
No ratings yet
Machine Learning (Se204A) Lab Manual
27 pages
LAB MANUAL For Machine Learning
No ratings yet
LAB MANUAL For Machine Learning
15 pages
Bayesian Inference: Statisticat, LLC
No ratings yet
Bayesian Inference: Statisticat, LLC
30 pages
unit 4 ML
No ratings yet
unit 4 ML
24 pages
Seminar Presentation
No ratings yet
Seminar Presentation
25 pages
Machine Learning Strategies
No ratings yet
Machine Learning Strategies
59 pages
Optimized Classification On Forest Covertype: COMP5318 M L D M A 2 R
No ratings yet
Optimized Classification On Forest Covertype: COMP5318 M L D M A 2 R
16 pages
ML Shristi File
No ratings yet
ML Shristi File
49 pages
Lecture 7.2 - DTC Algorithm Implementation
No ratings yet
Lecture 7.2 - DTC Algorithm Implementation
7 pages
Dl
No ratings yet
Dl
10 pages
Chapter 03
No ratings yet
Chapter 03
30 pages
Chapter 03
No ratings yet
Chapter 03
30 pages
AIML 7 To 11
No ratings yet
AIML 7 To 11
7 pages
MlLabManualdocx 2024 09 04 22 02 58
No ratings yet
MlLabManualdocx 2024 09 04 22 02 58
19 pages
Orange 3
100% (1)
Orange 3
46 pages
Machine Learning
No ratings yet
Machine Learning
16 pages
entropy and information gain for decision tree algorithm
No ratings yet
entropy and information gain for decision tree algorithm
12 pages
Bayesian-Statistics Final 20140416 3
No ratings yet
Bayesian-Statistics Final 20140416 3
38 pages
Week 7 Laboratory Activity
No ratings yet
Week 7 Laboratory Activity
12 pages
ML Recap
No ratings yet
ML Recap
96 pages
Bayes Manuscripts
No ratings yet
Bayes Manuscripts
180 pages
Chapter - 5 (New) PDF
No ratings yet
Chapter - 5 (New) PDF
17 pages
Machine Learning Guide: Meher Krishna Patel
No ratings yet
Machine Learning Guide: Meher Krishna Patel
121 pages
An Introduction To Bayesian Statistics and MCMC Methods
No ratings yet
An Introduction To Bayesian Statistics and MCMC Methods
69 pages
Preface To The Second Edition V 1 1
No ratings yet
Preface To The Second Edition V 1 1
9 pages
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
From Everand
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
César Pérez López
No ratings yet
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
César Pérez López
No ratings yet

MLT - Lab - Manual FINAL

Uploaded by

MLT - Lab - Manual FINAL

Uploaded by

MACHINE LEARNING TECHNIQUES

MAHENDRA ENGINEERING COLLEGE

1. Solving Regression & Classification using Decision Trees

S.NO NAME OF THE EXPERIMENTS PAGE NO.

ROOT NODE ATTRIBUTE SELECTION FOR 11

BAYESIAN INFERENCE IN GENE EXPRESSION 15

DATA & TEXT CLUSTERING USING K-MEANS 33

Exp No :1 Solving Regression & Classification Using Decision Trees

Step 2: Initialize and print the Dataset.

Step 5: Fit decision tree regressor to the dataset.

Step 6: Predicting a new value.

Step 7: Visualising the result.

Step 8: The tree is finally exported and shown in the TREE

# print the dataset

from sklearn.tree import DecisionTreeRegressor

X_grid = np.arange(min(X), max(X), 0.01)

from sklearn.tree import export_graphviz

Output (Decision Tree):

Step 2: import the libraries like numpy ,matplotlib,etc

Step 3: Do the experiment with coin as given in question

Step 4: find the head and tail probability and cumulative

Step 5: get output by using the function show

1 # evaluate bagging algorithm for classification

Accuracy: 0.856 (0.037)

Exp No 2b: Boosting Applications Using Regression Trees.

Step 2: import the libraries like numpy , matplotlib etc

Step 3: Do the experiment with the given dataset.

Step 4: Train and test the given model and datasets.

Step 5: get output

print('The accuracy is: ',predictions*100,'%')

The accuracy is: 73.06860322953968 %

Exp No:3 Data & Text Classification Using Neural Networks

1. Refer to libraries we need

2. provide training data

3. organize our data

4. iterate: code + test the results + tune the model

Exp No: 4 Data & Text Clustering using K-means algorithm

1. Refer to libraries we need and import them as necessary

2. provide training data in csv extention

3. organize our data and train them

4. iterate: code + test the results + tune the model

5. get executable output

1 3 0. 0 0. 0.0 6 0. 0.0 0.00 0.2 4 0 7 4 1 0. 1

Root Node Attribute Selection For Decision Trees Using

gets divided into two or more homogeneous sets.

 Splitting: It is a process of dividing a node into two or more sub-nodes.

 Decision Node: When a sub-node splits into further sub-nodes, then it is

called the decision node.

 Pruning: When we remove sub-nodes of a decision node, this process is

called pruning. You can say the opposite process of splitting.

 Branch / Sub-Tree: A subsection of the entire tree is called branch or sub-

Here, we have 3 features and 2 output classes.

1. Prior: P(θ = t) is the probability a priori that the unobserved parameter of

P(θ|y)=p(y|θ)p(θ)p(y)∝ P(y|θ)⋅P(θ), since P(y) is constant

A mutual information-based determination method is

These results indicated the validity of the proposed method and

from sklearn import tree

Let N be the size of the training set.

from sklearn import model_selection

# load the data

# initialize the base classifier

# no. of base classifier

results = model_selection.cross_val_score(model, X, Y, cv = kfold)

Weka tool is an open-source tool developed by students of Waikato

>>> import numpy as np

You might also like