
MACHINE LEARNING

Machine Learning is a branch of artificial intelligence. It gives systems the capability to learn without being explicitly programmed. It provides techniques to extract information from data, learn from the collected data, and then, with the help of well-defined algorithms, predict future trends from that data.

Ex: Google search, Amazon, Netflix

Arthur Samuel first used the term "machine learning" in 1959.

Features of Machine Learning:

 Machine learning uses data to detect various patterns in a given dataset.
 It can learn from past data and improve automatically.
 It is a data-driven technology.
 Machine learning is similar to data mining, as both deal with huge amounts of data.
Categories of Machine Learning
At a broad level, machine learning can be classified as:
1. Supervised Learning
Supervised learning is a type of machine learning in which the algorithm is trained on a labeled dataset.

In supervised learning, the algorithm is provided with input features and corresponding output labels, and it learns to generalize from this data to make predictions on new, unseen data.

There are two main types of supervised learning:

 Regression:

In a regression task, the algorithm learns to predict continuous values based on input features.

Common regression algorithms in machine learning are: Linear Regression, Polynomial Regression, Ridge Regression, Decision Tree Regression, Random Forest Regression, Support Vector Regression, etc.

 Classification:

In a classification task, the algorithm learns to assign input data to a specific category or class based on input features.

The output labels in classification are discrete values.

Classification algorithms can be binary, where the output is one of two possible classes, or multiclass, where the output can be one of several classes.

Common classification algorithms in machine learning are: Logistic Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), etc.

2. Unsupervised Machine Learning

Unsupervised learning is a type of machine learning where the algorithm learns to recognize patterns in data without being explicitly trained using labeled examples. The goal of unsupervised learning is to discover the underlying structure or distribution in the data.

There are two main types of unsupervised learning:

 Clustering:

Clustering algorithms group similar data points together based on their characteristics. Some popular clustering algorithms include K-means and Hierarchical clustering.

 Dimensionality Reduction:

Dimensionality reduction algorithms reduce the number of input variables in a dataset while preserving as much of the original information as possible.

This is useful for reducing the complexity of a dataset and making it easier to visualize and analyze.

Some popular dimensionality reduction algorithms include Principal Component Analysis (PCA), t-SNE, and Autoencoders.

Ex: [figure omitted: applying a classification model to new data]

---
INTRODUCING SCIKIT-LEARN

Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python.

Data Representation in Scikit-Learn

Data as table

The best way to represent data in Scikit-learn is in the form of tables. A table represents a 2-D grid of data where the rows represent the individual elements of the dataset and the columns represent the quantities related to those individual elements.

Ex:

import seaborn as sns
# Load the Iris dataset as a Pandas DataFrame: rows are samples, columns are features
iris = sns.load_dataset('iris')
iris.head()

In general, we will refer to the rows of the matrix as samples, the number of rows as
n_samples, columns of the matrix as features, and the number of columns as n_features.

Data as Feature Matrix

The features matrix may be defined as the table layout where the information can be thought of as a 2-D matrix. It is stored in a variable named X and assumed to be two-dimensional with shape [n_samples, n_features]. Mostly, it is contained in a NumPy array or a Pandas DataFrame.

The samples (rows) always refer to the individual objects and the features (columns)
always refer to the distinct observations that describe each sample in a quantitative manner.
Data as Target array

Along with the features matrix, denoted by X, we also have a target array, also called the label, denoted by y. The label or target array is usually one-dimensional, with length n_samples. The target array may contain either continuous numerical values or discrete class labels.
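A minimal sketch, continuing the Iris example above and assuming 'species' as the label column:

import seaborn as sns

iris = sns.load_dataset('iris')
X = iris.drop('species', axis=1)   # features matrix, shape [n_samples, n_features] = (150, 4)
y = iris['species']                # target array (labels), length n_samples = 150
print(X.shape, y.shape)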

---

SCIKIT-LEARN’S ESTIMATOR API

It is one of the main APIs implemented by Scikit-learn.

It provides a consistent interface for a wide range of ML applications.

That’s why all machine learning algorithms in Scikit-Learn are implemented via the Estimator API.
The Scikit-Learn API is designed with the following guiding principles

Consistency

All objects share a common interface drawn from a limited set of methods, with consistent documentation.

Inspection

All specified parameter values are exposed as public attributes.

Limited object hierarchy

Only algorithms are represented by Python classes; datasets are represented in standard formats (NumPy arrays, Pandas DataFrames, SciPy sparse matrices) and parameter names use standard Python strings.

Composition

Many ML algorithms can be expressed as a sequence of more fundamental algorithms. Scikit-learn makes use of these fundamental algorithms whenever needed.

Sensible defaults

When models require user-specified parameters, the library defines an appropriate default value.

Steps in using Estimator API

Step 1: Choose a class of model

In this first step, we need to choose a class of model. It can be done by importing the
appropriate Estimator class from Scikit-learn.

Step 2: Choose model hyperparameters

In this step, we need to choose the model hyperparameters. This is done by instantiating the class with the desired values.
Step 3: Arranging the data

Next, we need to arrange the data into a features matrix (X) and a target vector (y).

Step 4: Model Fitting

Now, we need to fit the model to the data. This is done by calling the fit() method of the model instance.

Step 5: Applying the model

After fitting the model, we can apply it to new data. For supervised learning, use the predict() method to predict the labels for unknown data; for unsupervised learning, use predict() or transform() to infer properties of the data.

Ex: Supervised learning example: Simple linear regression
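A minimal sketch of these five steps for simple linear regression, using made-up synthetic data (the LINEAR REGRESSION section below shows the fully plotted version):

import numpy as np
from sklearn.linear_model import LinearRegression     # Step 1: choose a class of model

# Synthetic data scattered around the line y = 2x - 1 (assumed example data)
rng = np.random.RandomState(42)
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

model = LinearRegression(fit_intercept=True)          # Step 2: choose model hyperparameters
X = x[:, np.newaxis]                                  # Step 3: arrange data into features matrix X and target vector y
model.fit(X, y)                                       # Step 4: fit the model to the data
xnew = np.linspace(0, 10, 5)
print(model.predict(xnew[:, np.newaxis]))             # Step 5: apply the model to new data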

---

FEATURE ENGINEERING
Feature engineering is the process of transforming raw data into features that are
suitable for machine learning models. In other words, it is the process of selecting, extracting,
and transforming the most relevant features from the available data to build more accurate and
efficient machine learning models.

The success of machine learning models heavily depends on the quality of the features
used to train them. Feature engineering involves a set of techniques that enable us to create
new features by combining or transforming the existing ones.
Categorical Features
Categorical-feature encoding transforms each categorical attribute into a numeric representation. Transforming categorical data into numeric data is often called “categorical-column encoding”.

One-hot encoding is the simplest and most basic categorical-column encoding method.
The idea is to have a unique binary number of multiple digits for each category. Hence, the
number of digits is the number of categories. The binary number has one digit as 1 and the rest
zeros, hence the name ‘one-hot.’
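As a minimal sketch (with a made-up housing example), scikit-learn's DictVectorizer can perform this one-hot encoding, assuming a reasonably recent scikit-learn version:

from sklearn.feature_extraction import DictVectorizer

# Hypothetical data: 'city' is a categorical attribute, the others are numeric
data = [{'price': 850000, 'rooms': 4, 'city': 'Hyderabad'},
        {'price': 700000, 'rooms': 3, 'city': 'Chennai'},
        {'price': 650000, 'rooms': 3, 'city': 'Delhi'}]
vec = DictVectorizer(sparse=False, dtype=int)   # expands 'city' into one binary column per category
X = vec.fit_transform(data)
print(vec.get_feature_names_out())              # e.g. ['city=Chennai', 'city=Delhi', 'city=Hyderabad', 'price', 'rooms']
print(X)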

Text Features
Another common need in feature engineering is to convert text to a set of representative
numerical values. One of the simplest methods of encoding data is by word counts: you take
each snippet of text, count the occurrences of each word within it, and put the results in a table.

Ex:
sample = ['problem of evil', 'evil queen', 'horizon problem']
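A minimal sketch of this word-count encoding for the sample above, using scikit-learn's CountVectorizer (pandas is assumed only for a readable display):

from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

sample = ['problem of evil', 'evil queen', 'horizon problem']
vec = CountVectorizer()                   # builds the vocabulary and counts word occurrences
X = vec.fit_transform(sample)             # sparse matrix of shape (3 snippets, vocabulary size)
print(pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out()))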

Image Features
Another common need is to suitably encode images for machine learning analysis.
The simplest approach is to use the pixel values.
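A minimal sketch using the digits dataset bundled with scikit-learn (an assumed example): each 8x8 image is flattened into 64 pixel-value features.

from sklearn.datasets import load_digits

digits = load_digits()
print(digits.images.shape)                          # (1797, 8, 8): 1797 images of 8x8 pixels
X = digits.images.reshape(len(digits.images), -1)   # flatten each image into 64 pixel features
print(X.shape)                                      # (1797, 64)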
Derived Features
The goal of creating derived features is to improve the performance of machine learning
models by providing additional information or reducing noise in the data. Derived features are
useful in many machine-learning applications, including image recognition, natural language
processing, and financial analysis.
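For instance, polynomial terms of an existing feature are a common kind of derived feature. A minimal sketch with scikit-learn's PolynomialFeatures on made-up data:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4, 5])
poly = PolynomialFeatures(degree=3, include_bias=False)
X2 = poly.fit_transform(x[:, np.newaxis])   # columns: x, x^2, x^3
print(X2)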

---

NAIVE BAYES CLASSIFICATION

 Naive Bayes algorithm is a supervised learning algorithm, which is based on Bayes' theorem and used for solving classification problems.
 It is mainly used in text classification that includes a high-dimensional training dataset.
 It is one of the simplest and most effective classification algorithms and helps in building fast machine learning models that can make quick predictions.
 It predicts on the basis of the probability of an object.
 Some popular applications of the Naive Bayes algorithm are spam filtering and classifying articles.

The name Naive Bayes is composed of two words, Naive and Bayes:

Naive:

It is called Naive because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.

Ex: If a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple.

Bayes:

It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:

Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional probability.

The formula for Bayes' theorem is given as:

P(A|B)=\frac{P(B|A)\,P(A)}{P(B)}
Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Types of Naïve Bayes Model:

Gaussian:

The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values
are sampled from the Gaussian distribution.

Multinomial:

The Multinomial Naive Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as Sports, Politics, Education, etc.

Steps to implement:

 Data Pre-processing step
 Fitting Naive Bayes to the Training set
 Predicting the test result
 Test accuracy of the result (creation of confusion matrix)
 Visualizing the test set result.
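A minimal sketch of these steps, assuming the Iris dataset as example data and a Gaussian Naive Bayes model:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = load_iris(return_X_y=True)                      # data pre-processing: load features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()                                   # fitting Naive Bayes to the training set
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                         # predicting the test result
print(confusion_matrix(y_test, y_pred))                # test accuracy of the result (confusion matrix)
print(accuracy_score(y_test, y_pred))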

---
LINEAR REGRESSION

Linear regression is one of the easiest and most popular Machine Learning algorithms.

The linear regression algorithm models a linear relationship between a dependent variable and one or more independent variables.

Types of Linear Regression

Simple Linear Regression

This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:

y=\beta_{0}+\beta_{1}X

where:

y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope

Ex:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.linear_model import LinearRegression

# Generate noisy data scattered around the line y = 2x - 5
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

# Fit a simple linear regression model (x must be reshaped into a 2-D features matrix)
model = LinearRegression(fit_intercept=True)
model.fit(x[:, np.newaxis], y)

# Predict on a fine grid and plot the fitted line over the data
xfit = np.linspace(0, 10, 1000)
yfit = model.predict(xfit[:, np.newaxis])
plt.scatter(x, y)
plt.plot(xfit, yfit);
Multiple Linear Regression

This involves more than one independent variable and one dependent variable. The
equation for multiple linear regression is:

y=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\cdots+\beta_{n}X_{n}

where:

y is the dependent variable

X1, X2, …, Xn are the independent variables

β0 is the intercept

β1, β2, …, βn are the slopes

Best Fit Line

Our primary objective while using linear regression is to locate the best-fit line, which
implies that the error between the predicted and actual values should be kept to a minimum.
The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables. The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).

---
DECISION TREES AND RANDOM FORESTS

Decision Trees Classification

Decision Tree is a supervised learning technique. It is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.

A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.

In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node.
Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes
are the output of those decisions and do not contain any further branches.

Implementation of Decision Tree

 Data Pre-processing step
 Fitting a Decision-Tree algorithm to the Training set
 Predicting the test result
 Test accuracy of the result.
 Visualizing the test set result.
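A minimal sketch of these steps, assuming the Iris dataset as example data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                           # data pre-processing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
tree.fit(X_train, y_train)                                  # fitting the decision tree to the training set

y_pred = tree.predict(X_test)                               # predicting the test result
print(accuracy_score(y_test, y_pred))                       # test accuracy of the result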
Random Forest

Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML.

Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and takes the average of their predictions to improve the predictive accuracy on that dataset. A greater number of trees in the forest leads to higher accuracy.

Implementation of Random Forest Algorithm

 Data Pre-processing step
 Fitting the Random forest algorithm to the Training set
 Predicting the test result
 Test accuracy of the result (Creation of Confusion matrix)
 Visualizing the test set result.
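A minimal sketch of these steps, again assuming the Iris dataset as example data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 decision trees on random subsets
forest.fit(X_train, y_train)                                       # fitting the random forest to the training set

y_pred = forest.predict(X_test)                                    # predicting the test result
print(confusion_matrix(y_test, y_pred))                            # test accuracy (confusion matrix)
print(accuracy_score(y_test, y_pred))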

---

PRINCIPAL COMPONENT ANALYSIS

Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features. These new transformed features are called the Principal Components. The number of these PCs is either equal to or less than the number of original features present in the dataset.

Some properties of these principal components are given below:

 The principal components must be linear combinations of the original features.
 These components are orthogonal, i.e., the correlation between a pair of components is zero.
 The importance of each component decreases when going from 1 to n; that is, the 1st PC has the most importance and the nth PC has the least importance.

Steps for PCA algorithm

 Getting the dataset
 Representing data into a structure
 Standardizing the data
 Calculating the Covariance of Z
 Calculating the Eigen Values and Eigen Vectors
 Sorting the Eigen Vectors
 Calculating the new features or Principal Components
 Removing less important or unimportant features from the new dataset.
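A minimal sketch, assuming the Iris measurements as example data; scikit-learn's PCA class performs the covariance and eigen-decomposition steps internally:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)               # keep only the 2 most important principal components
X_reduced = pca.fit_transform(X)        # project the 4 original features onto those components
print(pca.explained_variance_ratio_)    # fraction of variance captured by each component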

PCA as Noise Filtering

When utilizing real-life data, several factors can impact the data. One significant element is noise. Data collection often presents opportunities for human error, and unreliable data collection tools can lead to inaccuracies commonly referred to as noise. This noise can present challenges in machine learning, as algorithms can misinterpret and generalize from it.

If a dataset has a high volume of noise, it can severely disrupt the whole data analysis. Data scientists often measure noise using a signal-to-noise ratio. Therefore, data scientists must address and manage noise in their data science algorithms.

PCA aims to eliminate noise from a signal or image while keeping the essential features. It is a geometric and statistical technique that lowers the dimensionality of the input signal data by projecting it along different axes. In simple terms, you can imagine projecting a point in the XY plane onto the X-axis and subsequently removing the noisy Y-axis. This process is known as "dimensionality reduction."
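A minimal sketch of PCA as a noise filter, assuming the scikit-learn digits dataset with artificially added Gaussian noise: components that explain little variance are discarded, and the reconstruction keeps mainly the signal.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
rng = np.random.RandomState(42)
noisy = digits.data + 4 * rng.normal(size=digits.data.shape)   # add Gaussian noise to the pixel features

pca = PCA(0.50).fit(noisy)                     # keep enough components to explain 50% of the variance
components = pca.transform(noisy)              # project onto the retained components
filtered = pca.inverse_transform(components)   # reconstruct: noise along the discarded axes is removed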

---

K-MEANS CLUSTERING

K-Means Clustering is an unsupervised learning algorithm, which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process.

The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the cluster assignments no longer change. The value of k should be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

 Determines the best value for the K center points or centroids by an iterative process.
 Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form a cluster.

Hence each cluster has data points with some commonalities, and it is distinct from the other clusters.

The working of the K-Means algorithm is explained in the below steps:

 Step-1: Select the number K to decide the number of clusters.
 Step-2: Select K random points or centroids. (These may be points other than those from the input dataset.)
 Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
 Step-4: Calculate the variance and place a new centroid for each cluster.
 Step-5: Repeat the third step, which means reassign each data point to the new closest centroid of each cluster.
 Step-6: If any reassignment occurs, then go to Step-4; else go to FINISH.
 Step-7: The model is ready.
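A minimal sketch using scikit-learn's KMeans on synthetic blob data (an assumed example):

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # unlabeled-style input data
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)      # K must be chosen in advance
labels = kmeans.fit_predict(X)                                # iterative centroid update + assignment
print(kmeans.cluster_centers_)                                # final K center points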

---
