AI-Module 4 - Updated
• The feature engineering process selects the most useful predictor variables
for the model.
• Example: Categorical Imputation
• Example: Numerical Imputation
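Both imputation examples can be sketched in a few lines. This is a minimal illustration, assuming that missing categorical values are filled with the most frequent category and missing numerical values with the median (one common choice among several). The column names, the values and the helper `impute` are hypothetical.

```python
from statistics import median, mode

def impute(values, fill):
    """Return a copy of values with every None replaced by fill."""
    return [fill if v is None else v for v in values]

# Hypothetical columns; None marks a missing entry.
colors = ["red", "blue", None, "red", "green", None]
ages = [25, None, 31, 40, None, 28]

# Categorical imputation (assumed strategy): most frequent observed category.
observed_colors = [v for v in colors if v is not None]
colors_imputed = impute(colors, mode(observed_colors))

# Numerical imputation (assumed strategy): median of the observed values.
observed_ages = [v for v in ages if v is not None]
ages_imputed = impute(ages, median(observed_ages))

print(colors_imputed)  # ['red', 'blue', 'red', 'red', 'green', 'red']
print(ages_imputed)    # [25, 29.5, 31, 40, 29.5, 28]
```

The median is used here rather than the mean because it is less sensitive to outliers; the mean would be an equally valid assumption for well-behaved data.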
(i) Integer Encoding:
• Integer encoding consists of replacing the
categories with integers from 1 to n (or 0 to n-1),
where n is the number of distinct categories of
the variable.
• Each unique category is assigned an integer
value.
• This method is also called label encoding.
• This method is used when an ordinal
relationship exists among the categories.
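Integer encoding can be sketched as a simple lookup table. The example below is illustrative only: the "size" variable and its category order are hypothetical, and the order is supplied explicitly so that the integers respect the ordinal relationship.

```python
# Hypothetical ordinal variable: sizes have a natural order.
sizes = ["small", "large", "medium", "small", "large"]

# The category order is given explicitly so the integers follow it (1 to n).
order = ["small", "medium", "large"]
mapping = {category: index for index, category in enumerate(order, start=1)}

encoded = [mapping[s] for s in sizes]

print(mapping)  # {'small': 1, 'medium': 2, 'large': 3}
print(encoded)  # [1, 3, 2, 1, 3]
```

Using `start=0` in `enumerate` would give the 0 to n-1 variant instead.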
(ii) One-Hot Encoding:
• For categorical variables where no ordinal
relationship exists, a one-hot encoding (OHE) can be
applied.
• Here a new binary variable is added for each
unique category.
• In the “color” variable example, there are 3
categories: red, green and blue.
• Therefore 3 binary variables are needed: ‘color_red’,
‘color_blue’ and ‘color_green’.
• A “1” value is placed in the binary variable for the
observed color and “0” values in those for the other colors.
• The binary variables are often called “dummy
variables” or “indicator variables”.
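The "color" example can be sketched as follows. This is a minimal illustration in plain Python; the sample values and the helper `one_hot` are hypothetical, and the binary variables are named with the `color_` prefix used in the example.

```python
# Hypothetical sample of the "color" variable.
colors = ["red", "green", "blue", "green"]

# One binary variable per distinct category.
categories = sorted(set(colors))  # ['blue', 'green', 'red']

def one_hot(value, categories, prefix="color"):
    """Return a dict with 1 for the matching category and 0 for all others."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}

rows = [one_hot(c, categories) for c in colors]

for row in rows:
    print(row)
# First row: {'color_blue': 0, 'color_green': 0, 'color_red': 1}
```

Each row contains exactly one "1", which is what makes the encoding "one-hot".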
28/04/2024 Prof. Trupthi Rao, Dept. of AI & DS, GAT 16
4. Feature Splitting:
• Feature splitting is the process of separating features into two or more
parts to make new features.
• This technique helps the algorithms to better understand and learn the
patterns in the dataset.
• Example 1: Sale Date is split into year, month and day.
• Example 2: Time stamp is split into 6 different attributes.
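Both splitting examples can be sketched with the standard `datetime` module. The input strings and formats are hypothetical, and the six timestamp attributes are assumed here to be year, month, day, hour, minute and second.

```python
from datetime import datetime

def split_date(text, fmt="%Y-%m-%d"):
    """Split a date string into year, month and day features."""
    d = datetime.strptime(text, fmt)
    return {"year": d.year, "month": d.month, "day": d.day}

def split_timestamp(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into six features (assumed attribute set)."""
    ts = datetime.strptime(text, fmt)
    return {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "hour": ts.hour, "minute": ts.minute, "second": ts.second,
    }

sale_features = split_date("2024-04-28")
stamp_features = split_timestamp("2024-04-28 14:30:05")

print(sale_features)   # {'year': 2024, 'month': 4, 'day': 28}
print(stamp_features)
```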
a) Evolution of Machine Learning
• The term Machine Learning (ML) was first used by Arthur Samuel, one of
the pioneers of Artificial Intelligence at IBM, in 1959.
• Machine learning (ML) is an important tool for putting artificial
intelligence technologies to practical use.
• Because of its learning and decision-making abilities, machine learning is often
referred to as AI, though, in reality, it is a subfield of AI.
• Until the late 1970s, it was a part of AI’s evolution. Then, it branched off to
evolve on its own.
• Machine learning is now responsible for some of the most significant
advancements in technology.
b) What is Machine Learning (ML)?
• Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that
gives machines the ability to learn automatically from data and past experience,
identify patterns and make predictions with minimal human intervention.
• Machine learning is a branch of artificial intelligence (AI) and computer science
that focuses on the use of data and algorithms to imitate the way humans learn,
gradually improving in accuracy.
• Supervised machine learning is defined by its use of labeled datasets
to train algorithms to classify data or predict outcomes accurately.
• As input data is fed into the model, the model adjusts its weights until
it has been fitted appropriately. This occurs as part of the
cross-validation process, which helps ensure that the model avoids
overfitting or underfitting.
• Supervised learning helps organizations solve a variety of real-world
problems at scale, such as filtering spam into a separate folder from
your inbox. Some methods used in supervised learning include neural
networks, naïve Bayes, linear regression, logistic regression, random
forests, and support vector machines (SVMs).
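The cross-validation step mentioned above can be sketched as a data split. This is a minimal illustration assuming k-fold cross-validation, one common scheme; the slides do not say which scheme is meant, and the helper `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(6, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
# train: [2, 3, 4, 5] validation: [0, 1]
# train: [0, 1, 4, 5] validation: [2, 3]
# train: [0, 1, 2, 3] validation: [4, 5]
```

A model is trained on each `train` split and scored on the matching `validation` split. A large gap between training and validation scores suggests overfitting, while poor scores on both suggest underfitting.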
1. Supervised Machine Learning Algorithms:
• The primary purpose of supervised learning is to generalize from
labeled sample data and make predictions for unavailable, future or
unseen data.
• Supervised learning is where there are input variables (x) and an
output variable (Y) and an algorithm is used to learn the mapping
function from the input to the output, Y = f(x).
• The goal is to approximate the mapping function so well that when
new input data (x) arrives, the machine is able to
predict the output variable (Y) for that data.
• Supervised machine learning includes two major
processes: classification and regression.
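Learning the mapping Y = f(x) from labeled data can be sketched with a regression example. This is a minimal illustration assuming f is a straight line fitted by ordinary least squares; the data and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled sample: inputs x with known outputs Y (here Y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

def f(x):
    """The learned approximation of the mapping function."""
    return slope * x + intercept

prediction = f(5.0)  # prediction for an unseen input
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```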
Classification is the process of categorizing a set of data into classes
(yes/no, true/false, 0/1, yes/no/maybe). There are various types of classification
problems, such as binary classification, multi-class classification and multi-label
classification. Examples of classification problems include spam filtering, image
classification, sentiment analysis, classifying cancerous and non-cancerous
tumors, and customer churn prediction.
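A binary classification problem such as spam filtering can be sketched with a very small classifier. The example below assumes a 1-nearest-neighbor rule, chosen only for brevity; the two features, the data and the helper name are hypothetical.

```python
def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to query."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )
    return train_labels[best]

# Hypothetical features per email: (number of links, number of exclamation marks).
emails = [(0, 0), (1, 0), (7, 5), (9, 8)]
labels = ["not spam", "not spam", "spam", "spam"]

print(nearest_neighbor_predict(emails, labels, (8, 6)))  # spam
print(nearest_neighbor_predict(emails, labels, (0, 1)))  # not spam
```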
Dimensionality reduction: Most of the time, there is a lot of noise in the incoming data.
Machine learning algorithms use dimensionality reduction to remove this noise while distilling
the relevant information. Examples: image compression, reducing the number of features before
classifying a database full of emails into “not spam” and “spam”.
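Dimensionality reduction can be sketched on a toy case. The example below assumes principal component analysis (PCA) restricted to two features, where the first principal axis of the 2x2 covariance matrix has a closed-form angle; the data and the helper `pca_2d_to_1d` are hypothetical.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal axis (PCA, 2 -> 1 feature)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Angle of the direction of largest variance for a 2x2 symmetric matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))

    return [x * direction[0] + y * direction[1] for x, y in centered]

# Two perfectly correlated features carry only one feature's worth of information.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
projected = pca_2d_to_1d(points)
print(projected)  # approximately [-1.414, 0.0, 1.414]
```

For more than two features, the principal axes come from an eigendecomposition or SVD of the covariance matrix, which is normally delegated to a numerical library.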
• Unsupervised machine learning uses machine learning algorithms to
analyze unlabeled datasets and group them into subsets called clusters.
These algorithms discover hidden patterns or data groupings without the
need for human intervention.
• This method’s ability to discover similarities and differences in
information makes it ideal for exploratory data analysis, cross-selling
strategies, customer segmentation, and image and pattern recognition.
• It’s also used to reduce the number of features in a model through the
process of dimensionality reduction. Principal component analysis
(PCA) and singular value decomposition (SVD) are two common
approaches for this. Other algorithms used in unsupervised learning
include neural networks, k-means clustering, and probabilistic
clustering methods.
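Clustering unlabeled data can be sketched with k-means. This is a minimal illustration, assuming fixed starting centroids so the result is deterministic (real implementations usually pick them at random); the data and helper names are hypothetical.

```python
def assign(points, centroids):
    """Group each point with its nearest centroid (squared Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        clusters[nearest].append(p)
    return clusters

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            # New centroid = mean of the cluster; keep the old one if it is empty.
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)

# Hypothetical unlabeled 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```

No labels are supplied at any point: the two groups emerge only from the distances between the points.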