Intro to Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that includes Supervised and Unsupervised Learning. Supervised Learning uses labeled data to predict outcomes, while Unsupervised Learning analyzes unlabeled data to find hidden patterns. Effective model training requires proper data splitting and performance evaluation metrics to measure accuracy and generalization.

Uploaded by easyupload999

Supervised and Unsupervised Machine Learning

Understanding Training Sets, Test Sets, and Performance Metrics


Introduction to Machine Learning

• Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn patterns from data.
• Two primary types: Supervised Learning and Unsupervised Learning

Supervised learning

Definition: The model learns from labeled data (input-output pairs).

Goal: Map inputs to the correct output using past examples.

Examples:
• Spam detection (emails labeled as spam or not spam)
• Image classification (dog vs. cat)
• Predicting hotel prices

Unsupervised learning
Unsupervised learning is particularly useful when dealing with data that lacks clear labels or predefined categories.

1. Handles unlabeled data
2. Discovers hidden patterns
3. Real-world applications:
• Used in marketing, healthcare, and finance
• Can segment data or detect anomalies
• Valuable tool for data analysis and decision-making
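
As a small illustration of anomaly detection on unlabeled data, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score rule). This is only one simple technique, not necessarily the one the slides have in mind; the function name `find_anomalies`, the threshold of 2.0, and the sample amounts are all assumptions made for the example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction amounts; note that no labels are provided.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
print(find_anomalies(amounts))  # [250]
```

No one told the code which amount is unusual; it is inferred from the structure of the data alone, which is the defining trait of unsupervised learning.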
Supervised Learning Workflow
1. Collect labeled data.
2. Split into training set and test set.
3. Train the model using the training set.
4. Evaluate performance on the test set.
5. Optimize and deploy the model.
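
Steps 1 to 4 can be sketched end to end in a few lines. The example below is a minimal illustration under assumptions: the data is synthetic (one numeric feature, label 1 when the feature exceeds 50), and the model is a 1-nearest-neighbor classifier chosen only because it fits in a few lines. Step 5 (optimization and deployment) is outside the scope of the sketch.

```python
import random

# Step 1: collect labeled data (synthetic here: (feature, label) pairs,
# where the label is 1 when the feature is above 50).
rng = random.Random(0)
data = [(x, int(x > 50)) for x in (rng.uniform(0, 100) for _ in range(200))]

# Step 2: split into training set and test set (80% / 20%).
rng.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]


# Step 3: "train" a 1-nearest-neighbor model, which simply keeps the
# training set and predicts the label of the closest stored example.
def predict(x, train_set):
    nearest = min(train_set, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Step 4: evaluate performance on the test set.
correct = sum(predict(x, train) == y for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```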
Training Set vs. Test Set

1. Training Set: Used to train the machine learning model.
2. Test Set: Used to evaluate model performance on unseen data.
3. Typical split: 70% Training, 30% Test or 80% Training, 20% Test
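
A split like the ones above can be written as a small helper. This is a sketch, not a library API: the name `train_test_split`, its parameters, and the fixed seed are assumptions for illustration. The data is shuffled first so that any ordering in the original rows does not leak into the split.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))
train, test = train_test_split(rows, test_fraction=0.3)
print(len(train), len(test))  # 70 30
```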
K-Fold cross validation
1. Divide the entire dataset into K equally sized folds (segments).
2. Treat each fold, in turn, as the test set.
3. Use the remaining K-1 folds as the training set.
4. Train and evaluate once per fold, then average the K scores.
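
The fold bookkeeping can be sketched as a generator of index lists. The function name `k_fold_splits` is an assumption for illustration; the sketch does not shuffle, so in practice the data would usually be shuffled beforehand. When the dataset size is not divisible by K, the first folds get one extra sample each.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_splits(10, 5):
    print("test:", test_idx, "train size:", len(train_idx))
```

Every sample appears in exactly one test fold, so each one is used for evaluation once and for training K-1 times.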
Overfitting vs Underfitting
Overfitting: A model learns the training data too well, including noise and random fluctuations, leading to poor generalization on unseen data.

Symptoms:
1. High accuracy on training data but low accuracy on test data.
2. The model captures noise as if it were a pattern.

Underfitting: A model is too simple to capture the underlying patterns in the data, leading to poor performance on both training and test data.

Symptoms:
1. Low accuracy on both training and test data.
2. Model fails to capture important patterns.
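
The accuracy symptoms listed above can be turned into a rough check. The sketch below is a heuristic only: the function name `diagnose_fit` and the thresholds (`low=0.7` for "low accuracy", `gap=0.1` for a large train/test gap) are assumptions picked for illustration, and sensible values depend on the task.

```python
def diagnose_fit(train_acc, test_acc, low=0.7, gap=0.1):
    """Rough label based on train/test accuracy; thresholds are illustrative."""
    if train_acc < low and test_acc < low:
        return "underfitting"  # low accuracy on both sets
    if train_acc - test_acc > gap:
        return "overfitting"  # high on training data, much lower on test data
    return "reasonable fit"


print(diagnose_fit(0.99, 0.72))  # overfitting
print(diagnose_fit(0.58, 0.55))  # underfitting
print(diagnose_fit(0.91, 0.89))  # reasonable fit
```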
Performance evaluation metrics for classification
A confusion matrix is a performance evaluation tool for classification models. It helps visualize how well a model's predictions match the actual class labels.

1. True Positive (TP): Correctly predicted positive cases.
2. False Positive (FP) (Type I Error): Incorrectly predicted positive cases (actual negative).
3. False Negative (FN) (Type II Error): Incorrectly predicted negative cases (actual positive).
4. True Negative (TN): Correctly predicted negative cases.
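
The four cells can be counted directly from paired actual and predicted labels. This is a minimal sketch for the binary case; the function name `confusion_counts`, the `positive` parameter, and the sample labels are assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for a binary classification task."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # Type I error: predicted positive, actually negative
        elif a == positive:
            fn += 1  # Type II error: predicted negative, actually positive
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```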
Metrics Derived from the Confusion Matrix
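
The slide's own list of metrics did not survive extraction. As an assumption, the sketch below computes the four metrics most commonly derived from TP, FP, FN and TN: accuracy, precision, recall and F1 score. The function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 for an undefined ratio is a convention chosen for the sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(classification_metrics(tp=3, fp=1, fn=1, tn=3))
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is the harmonic mean of the two.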
Common regression metrics
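
The content of this slide did not survive extraction. As an assumption, the sketch below computes four widely used regression metrics: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and the coefficient of determination (R²). The function name `regression_metrics` and the sample prices are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE, and R^2 for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    # R^2 is undefined when all actual values are identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical hotel prices: actual vs. predicted.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(regression_metrics(actual, predicted))
```

For these sample values every prediction is off by 10, so MAE and RMSE are both 10 and R² is 0.968.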
Conclusions
• Supervised Learning is best for predictive tasks with labeled data.
• Unsupervised Learning helps discover hidden structures in unlabeled data.
• Proper data splitting and performance evaluation are crucial for model effectiveness.
