Lecture 3 Ver2

Uploaded by

Abo dahab

CSE 588 - Pattern Recognition

Lecture 3

Dr. Dina Salem

• Introduction to Types of Pattern Recognition Tasks

• Step 1: Data collection and preprocessing
• Step 5: Model Evaluation
Introduction to Types of Pattern Recognition Tasks
Types of PR Tasks

• As we discussed earlier, there are two main types of PR tasks according to the type and
method of learning:
1. Supervised Learning
2. Unsupervised Learning
• In addition, there are other types, such as:
3. Semi-supervised Learning: A combination of a small amount of labeled data and a large amount
of unlabeled data. (Example: Training a system on a few labeled medical images and a large
dataset of unlabeled ones to improve diagnostic accuracy.)
4. Reinforcement Learning: Feedback from actions (rewards or penalties) but not specific labeled
examples. (Example: a robot learns to navigate a maze by receiving rewards for reaching the goal
and penalties for hitting walls, improving its strategy over time based on this feedback.)
Pattern Recognition Systems can also be divided into several types, depending on
the knowledge needed to be extracted from the available data.
1. Classification

• Type of Learning: Supervised

• Knowledge Needed: Labeled data with predefined categories or classes.
• Purpose: Assign new data to one of the predefined classes by learning from labeled training
data.
• Example: Identifying whether an email is spam or not based on learned patterns from labeled
emails.
• Some Algorithms:
1. Decision Trees: Learn rules from labeled data to classify new instances.
2. Support Vector Machines (SVM): Find the optimal boundary (hyperplane) that separates
different classes.
3. Neural Networks (NN): Use layers of neurons to model complex decision boundaries for
classification tasks.
4. k-Nearest Neighbors (k-NN): Classify new data points based on the majority class of their
nearest neighbors in the dataset.
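
As a rough illustration of the k-NN idea above, the following minimal sketch classifies a query point by majority vote among its k nearest training points. The function name `knn_predict`, the two made-up numeric features, and the toy labels are assumptions for illustration only; they do not come from the lecture.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two made-up numeric features per email.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (4.9, 5.1)))
```

With an odd k and two classes there are no tied votes, which is one common reason for choosing k = 3 or k = 5 in small examples.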

2. Clustering

• Type of Learning: Unsupervised

• Knowledge Needed: No labeled data; the algorithm identifies inherent patterns or groupings in
the data.
• Purpose: Discover natural groupings (clusters) of similar data points without predefined
categories.
• Example: Grouping customers into distinct segments based on purchasing behaviors, even if the
segments aren’t predefined.
• Some Algorithms:
1. k-Means: Partitions data into k clusters by minimizing the variance within each cluster.
2. Hierarchical Clustering: Builds a tree of clusters based on distance between data points.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Clusters data based on
density, identifying core points and outliers.
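
The k-Means description above can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. This is a minimal sketch, not a production implementation; the fixed starting centroids and the toy points are assumptions chosen to keep the run deterministic (real implementations usually pick starting centroids at random).

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate the assignment step and the centroid update step."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


# Hypothetical toy data with two visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```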

3. Regression

• Type of Learning: Supervised

• Knowledge Needed: Labeled data with continuous target values (instead of classes).
• Purpose: Predict continuous values based on input features by learning from labeled training
data.
• Example: Predicting house prices based on patterns related to square footage, location, and
other features.
• Some Algorithms:
1. Linear Regression: Models the relationship between input variables and a continuous output by
fitting a linear equation.
2. Polynomial Regression: Extends linear regression to model non-linear relationships.
3. Support Vector Regression (SVR): Similar to SVM, but predicts continuous output instead of
classifying data points.
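
To make "fitting a linear equation" concrete, here is a minimal sketch of simple (one-feature) linear regression using the closed-form least-squares solution. The areas and prices are made-up numbers that lie exactly on the line price = 2 * area + 10, so the fit can be checked by eye; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: area in square metres, price in thousands.
areas = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # prediction for an unseen area
print(slope, intercept, predicted_price)
```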

4. Dimensionality Reduction

• Type of Learning: Unsupervised

• Knowledge Needed: Data without explicit labels but with patterns to reduce redundancy or
find the most informative features.
• Purpose: Reduce the number of features in the data while retaining the most important
structures or patterns.
• Example: Simplifying a dataset of images for facial recognition to reduce the number of pixels
analyzed, while still capturing the most important features needed for recognizing faces.
• Some Algorithms:
1. Principal Component Analysis (PCA): Identifies the directions (principal components) along
which the data has the most variance, reducing the dimensionality of the data.
2. t-SNE (t-Distributed Stochastic Neighbor Embedding): Reduces dimensionality while preserving
the relationships between data points for visualization
5. Outlier Detection

• Type of Learning: Supervised or Unsupervised

• Knowledge Needed: In unsupervised cases, patterns of normal behavior are inferred from the
data; in supervised cases, labeled data contains examples of outliers or anomalies.
• Purpose: Identify data points that deviate significantly from the majority, indicating potential
anomalies or noise.
• Example: Identifying unusual sensor readings in an industrial machine by detecting values that
fall far outside the normal operating range.
• Some Algorithms:
1. Isolation Forest: Constructs trees to isolate anomalies from the rest of the data based on their
uniqueness.
2. One-Class SVM: Learns a model of the "normal" class and flags instances that don't fit as
outliers.
3. LOF (Local Outlier Factor): Measures the local density deviation of data points to identify
anomalies.
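
The sensor example above can be illustrated with a much simpler rule than the three algorithms listed: a Z-score test that flags readings lying many standard deviations from the mean. This is a sketch under stated assumptions: the readings are invented, and the threshold of 2.0 is an arbitrary choice for this tiny sample (with only a handful of values, a Z-score can never get very large, so the more commonly quoted threshold of 3 would flag nothing here).

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical temperature readings with one obviously abnormal value.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
outliers = z_score_outliers(readings)
print(outliers)
```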

6. Association Rule Learning

• Type of Learning: Unsupervised

• Knowledge Needed: No explicit labels; the algorithm identifies patterns of co-occurrence or
frequent associations between variables.
• Purpose: Discover relationships between variables, typically in transactional datasets.
• Example: In a retail store, discovering that customers who buy bread often also buy butter.
• Some Algorithms:
1. Apriori Algorithm: Finds frequent itemsets and generates association rules based on those
itemsets.
2. FP-Growth (Frequent Pattern Growth): A more efficient algorithm for finding frequent itemsets
without candidate generation.
3. Eclat: Uses a depth-first search strategy to find frequent itemsets for association rule learning
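
Association rules such as "bread implies butter" are usually scored with two standard measures that the slide does not name: support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both by brute force on a made-up set of five transactions; it is not an implementation of Apriori, FP-Growth, or Eclat, which exist to avoid exactly this kind of exhaustive counting on large data.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"bread", "butter"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
```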

Other Types (Assignment)

• Other Types include:

1. Density Estimation Algorithms (Unsupervised Learning)
2. Sequence Analysis Algorithms (Supervised or Unsupervised Learning)
3. Topic Modeling Algorithms (Unsupervised Learning)
4. Reinforcement Learning Algorithms (Knowledge Gained from Interaction)

TASK: Write the knowledge needed, purpose, some algorithms with theory of operation
in one sentence and give an example for each of the previous PR tasks.

Data Collection and Preprocessing

In a Pattern Recognition (PR) system, data collection and preprocessing are the
foundation steps that directly impact the success of pattern recognition tasks.
Below is a brief explanation of each step, along with examples of algorithms used.
DATA COLLECTION
Definition: Data collection involves gathering the raw data required for training
and testing the pattern recognition model. The quality and quantity of this data
are critical for the effectiveness of the system.
Types of Data:
• Sensor Data: Collected from physical sensors (e.g., camera images, audio recordings, biometric
sensors).
• Historical Data: Data from previous instances stored in databases, often used for supervised
learning tasks (e.g., medical records, transaction logs).
• Generated Data: Synthetic data created for specific needs (e.g., simulated datasets for rare events).
Challenges: Ensuring data is representative, diverse, and large enough to cover the variations in
patterns.
Tools/Techniques:
• Manual Data Entry: Human-driven data entry processes.
• Automated Sensors: Devices like cameras, microphones, or IoT sensors that automatically collect
data.
• Web Scraping: Algorithms to extract data from websites or online resources.
Pre-Processing
Definition: Preprocessing refers to transforming raw data into a clean and structured format
suitable for pattern recognition algorithms. It is essential to reduce noise, handle missing
values, normalize, and enhance data quality.
Importance: Preprocessing ensures that the data used in the PR system is clean, consistent, and
optimized for the algorithms, leading to better pattern recognition accuracy and performance.
Pre-Processing Key Steps
1. Data Cleaning: Removing or fixing incomplete, inconsistent, or noisy data.
• Example Algorithm: Outlier removal techniques (e.g., Z-score for outlier detection).
2. Data Transformation: Converting data into a suitable format, such as normalizing numerical
values or encoding categorical variables.
• Example Algorithm: Min-Max Normalization (scales data to the range [0,1]) or Z-score
normalization.
3. Dimensionality Reduction: Reducing the number of features or variables in the dataset while
retaining important information.
• Example Algorithms: Principal Component Analysis (PCA) and t-SNE (t-distributed Stochastic
Neighbor Embedding).
Pre-Processing Key Steps (cont.)
4. Feature Extraction/Selection: Identifying and selecting the most relevant features for pattern
recognition to reduce redundancy and computational complexity.
• Example Algorithms: Lasso regression (for feature selection), Mutual Information.
5. Handling Missing Data: Dealing with incomplete data by either imputing values or removing
instances with missing attributes.
• Example Algorithms: K-Nearest Neighbor (KNN) Imputation (predicting missing values),
Mean/Median Imputation.
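
Three of the techniques named in the key steps are short enough to sketch directly: Min-Max normalization, Z-score normalization, and mean imputation. The helper names below are hypothetical, missing values are represented as `None` by assumption, and the Min-Max version assumes the values are not all identical (otherwise the range is zero).

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    known = [v for v in values if v is not None]
    fill = statistics.mean(known)
    return [fill if v is None else v for v in values]


print(min_max_normalize([10, 20, 30]))
print(z_score_normalize([2.0, 4.0, 6.0]))
print(mean_impute([1.0, None, 3.0]))
```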
Examples of Pre-Processing Algorithms
1. Image Preprocessing:
• Gaussian Smoothing: Reduces noise in images.
• Histogram Equalization: Enhances contrast in images.
2. Text Data Preprocessing:
• Tokenization: Splits text into words or sentences.
• Stop Word Removal: Removes common but unimportant words (e.g., "the", "is").
• Stemming/Lemmatization: Reduces words to their base form (e.g., "running" to "run").
3. Time-Series Data Preprocessing:
• Resampling: Adjusts the frequency of time-series data (e.g., from daily to monthly).
• Smoothing: Applies moving averages to reduce volatility.
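
A few of these operations can be sketched with the standard library alone: tokenization, stop word removal, and moving-average smoothing. The stop word list here is a tiny illustrative set rather than a real lexicon, the regular expression only handles lowercase English letters and apostrophes, and stemming/lemmatization is deliberately left out because it needs a linguistic resource.

```python
import re

# Tiny illustrative stop word list (an assumption, not a standard lexicon).
STOP_WORDS = {"the", "is", "a", "an", "and", "of"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def moving_average(series, window=3):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


tokens = tokenize("The cat is running, and the dog is sleeping.")
print(remove_stop_words(tokens))
print(moving_average([1, 2, 3, 4, 5], window=3))
```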
Evaluation

• How predictive is the model we learned?

• Error on the training data is not a good indicator of performance on future data
Q: Why?
A: Because new data will probably not be exactly the same as the training data!
• Overfitting – fitting the training data too precisely - usually leads to poor results on new
data
Evaluation issues
Possible evaluation measures:

• Classification Accuracy
• Total cost/benefit – when different errors involve different costs
• Lift and ROC curves
• Error in numeric predictions

How reliable are the predicted results?

Classifier error rate

Natural performance measure for classification problems: error rate

• Success: instance’s class is predicted correctly
• Error: instance’s class is predicted incorrectly
• Error rate: proportion of errors made over the whole set of instances
Training set error rate: far too optimistic!
• you can find patterns even in random data

Evaluation on “LARGE” data
• If many (thousands) of examples are available, including several hundred
examples from each class, then a simple evaluation is sufficient
• Randomly split data into training and test sets (usually 2/3 for train, 1/3
for test)
• Build a classifier using the train set and evaluate it using the test set.
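
A random 2/3 to 1/3 split like the one described above can be sketched as follows. The function name `train_test_split` is chosen here for readability and is a hand-written helper, not a call into any library; the fixed seed is an assumption that makes the shuffle repeatable.

```python
import random


def train_test_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset: 30 record identifiers.
records = list(range(30))
train, test = train_test_split(records)
print(len(train), len(test))
```

Every record ends up in exactly one of the two sets, so the test set stays unseen while the classifier is built.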

Classification Step 1: Split data into train and test sets
[Figure: data with known results ("the past"), labeled + and -, is split into a training set and a testing set.]

Classification Step 2: Build a model on a training set
[Figure: the training set is passed to a model builder; the testing set is held aside.]

Classification Step 3: Evaluate on test set (Re-train?)
[Figure: the model builder's predictions (+/-) for the testing set are compared with the known results (Y/N) to evaluate the model.]

Handling unbalanced data
• Sometimes, classes have very unequal frequency
• Attrition prediction: 97% stay, 3% attrite (in a month)
• medical diagnosis: 90% healthy, 10% disease
• eCommerce: 99% don’t buy, 1% buy
• Security: >99.99% of Americans are not terrorists
• Similar situation with multiple classes
• Majority class classifier can be 97% correct, but useless
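
The last point can be checked with a few lines of arithmetic. The sketch below builds made-up labels that mirror the 97% / 3% attrition example and scores a classifier that always predicts the majority class.

```python
# Hypothetical labels mirroring the 97% / 3% attrition example.
actual = ["stay"] * 97 + ["attrite"] * 3

# A majority class classifier ignores its input and always predicts "stay".
predicted = ["stay"] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Share of the actual attriters that the classifier manages to identify.
attriters_found = sum(
    a == "attrite" and p == "attrite" for a, p in zip(actual, predicted)
)
attrite_recall = attriters_found / actual.count("attrite")

print(f"accuracy = {accuracy:.2f}, attriters found = {attrite_recall:.2f}")
```

The accuracy looks high, yet not a single attriter is identified, which is why accuracy alone is misleading on unbalanced data.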

Balancing unbalanced data
• With two classes, a good approach is to build BALANCED train and test sets, and train
model on a balanced set
• randomly select desired number of minority class instances
• add equal number of randomly selected majority class
• Generalize “balancing” to multiple classes
• Ensure that each class is represented with approximately equal proportions in train
and test
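
One way to read the balancing recipe above is as random undersampling: draw the same number of instances from every class. The sketch below assumes records are `(features, label)` pairs and that each class has at least `n_per_class` members; `balance_classes` is a hypothetical helper name, and the same code covers the multi-class case.

```python
import random
from collections import Counter


def balance_classes(records, n_per_class, seed=0):
    """Randomly draw `n_per_class` records from each class, then shuffle."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in records:
        by_class.setdefault(label, []).append((features, label))
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_per_class))
    rng.shuffle(balanced)
    return balanced


# Hypothetical eCommerce-style data: 5 buyers among 100 visitors.
records = [(i, "buy") for i in range(5)] + [(i, "no buy") for i in range(5, 100)]
balanced = balance_classes(records, n_per_class=5)
print(Counter(label for _, label in balanced))
```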

A note on parameter tuning
• It is important that the test data is not used in any way to create the classifier
• Some learning schemes operate in two stages:
• Stage 1: builds the basic structure
• Stage 2: optimizes parameter settings
• The test data can’t be used for parameter tuning!
• Proper procedure uses three sets: training data, validation data, and test data
• Validation data is used to optimize parameters
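
The role of the validation set can be sketched with a deliberately tiny example. Stage 1 (building the basic model) is assumed to have already happened and to have produced a score for each record; the code only shows Stage 2, where one parameter (a decision threshold) is tuned on validation data, followed by a single final evaluation on test data. All scores, labels, and candidate thresholds are invented for illustration.

```python
def accuracy(threshold, data):
    """Fraction of (score, label) pairs where `score >= threshold` matches label 1."""
    hits = sum((score >= threshold) == bool(label) for score, label in data)
    return hits / len(data)


# Hypothetical (score, label) pairs in two disjoint held-out sets.
validation = [(0.2, 0), (0.45, 0), (0.55, 1), (0.7, 1)]
test = [(0.15, 0), (0.35, 0), (0.65, 1), (0.85, 1)]

# Stage 2: choose the parameter using validation data only.
candidates = [0.2, 0.5, 0.8]
best_threshold = max(candidates, key=lambda t: accuracy(t, validation))

# Final step: the test data is touched once, after all tuning is finished.
test_accuracy = accuracy(best_threshold, test)
print(best_threshold, test_accuracy)
```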

Making the most of the data
• Once evaluation is complete, all the data can be used to build the final classifier
• Generally, the larger the training data the better the classifier (but returns diminish)
• The larger the test data the more accurate the error estimate

Classification: Train, Validation, Test split
[Figure: data with known results is split into a training set, a validation set, and a final test set. A model builder learns from the training set, its predictions are evaluated on the validation set, and the final model receives a final evaluation on the final test set.]

Evaluating Classification & Predictive Performance

Why Evaluate?
• Multiple methods are available to classify or predict
• For each method, multiple choices are available for settings
• To choose the best model, we need to assess each model’s performance

Accuracy Measures (Classification)

Misclassification error
• Error = classifying a record as belonging to one class when it belongs to another class.
• Error rate = percent of misclassified records out of the total records in the validation data
Separation of Records
• “High separation of records” means that using predictor variables attains low error
• “Low separation of records” means that using predictor variables does not improve much on
the naïve rule

Confusion Matrix

Classification Confusion Matrix
                Predicted Class
Actual Class    1       0
1               201     85
0               25      2689

• 201 1’s correctly classified as “1”
• 85 1’s incorrectly classified as “0”
• 25 0’s incorrectly classified as “1”
• 2689 0’s correctly classified as “0”
Confusion matrix glossary
• In a 2-class problem where the class is either C or not C, the confusion
matrix looks like this:

                Classifier Output
True Class      C       not C
C               TP      FN
not C           FP      TN

• TP is the number of true positives. It’s a C, and classifier output is C.

• FN is the number of false negatives. It’s a C, and classifier output is not C.
• TN is the number of true negatives. It’s not C, and classifier output is not C.
• FP is the number of false positives. It’s not C, and classifier output is C.
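
The four counts defined above can be tallied directly from paired lists of true classes and classifier outputs. This is a minimal sketch; `confusion_counts` is a hypothetical helper, and the seven example labels are made up.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FN, FP, TN), treating `positive` as the class C."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # it is a C, and the classifier output is C
        elif a == positive:
            fn += 1  # it is a C, but the classifier output is not C
        elif p == positive:
            fp += 1  # it is not C, but the classifier output is C
        else:
            tn += 1  # it is not C, and the classifier output is not C
    return tp, fn, fp, tn


# Hypothetical true classes and classifier outputs.
actual = ["C", "C", "C", "not C", "not C", "not C", "not C"]
predicted = ["C", "C", "not C", "C", "not C", "not C", "not C"]

print(confusion_counts(actual, predicted, positive="C"))
```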
Performance measurements
• Accuracy: how close the classifier's results come to the true values.
Accuracy = number of correctly classified patterns / total number of patterns =
(TP+TN)/(TP+TN+FP+FN)
• Error Rate: (number of misclassified records)/(total records) = (FP+FN)/(TP+TN+FP+FN)
• Sensitivity (True Positive Rate, TPR): measures the proportion of positives that are correctly
identified. TPR = TP/P = TP/(TP+FN)
• Specificity (True Negative Rate, TNR): measures the proportion of negatives that are correctly
identified. TNR = TN/N = TN/(TN+FP)

Example
Using the confusion matrix below, calculate:
1- Accuracy
2- Error Rate
3- Sensitivity
4- Specificity

Classification Confusion Matrix
                Predicted Class
Actual Class    1       0
1               201     85
0               25      2689

Solution:
TP = 201, FP = 25, TN = 2689, FN = 85
Overall error rate = (25+85)/3000 = 3.67%
Accuracy = 1 – err = (201+2689)/3000 = 96.33%
Sensitivity = 201/(201+85) = 70.28%
Specificity = 2689/(2689+25) = 99.08%
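
The four measures can be recomputed from the confusion matrix counts with a short helper, which is a convenient way to check hand calculations like the one above. `classification_metrics` is a hypothetical function name; the formulas are the ones given under "Performance measurements".

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the four measures from the confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Counts taken from the confusion matrix in the example.
metrics = classification_metrics(tp=201, fn=85, fp=25, tn=2689)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```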
Cutoff for classification
Most DM algorithms classify via a 2-step process. For each record:
1. Compute probability of belonging to class “1”
2. Compare to cutoff value, and classify accordingly
• Default cutoff value is 0.50
If >= 0.50, classify as “1”
If < 0.50, classify as “0”
• Can use different cutoff values
• Typically, error rate is lowest for cutoff = 0.50

Cutoff
Table 1

Actual Class    Prob. of "1"    Actual Class    Prob. of "1"
1               0.996           1               0.506
1               0.988           0               0.471
1               0.984           0               0.337
1               0.980           1               0.218
1               0.948           0               0.199
1               0.889           0               0.149
1               0.848           0               0.048
0               0.762           0               0.038
1               0.707           0               0.025
1               0.681           0               0.022
1               0.656           0               0.016
0               0.622           0               0.004

• If cutoff is 0.50: 13 records are classified as “1”
• If cutoff is 0.80: seven records are classified as “1”

Confusion Matrix for Different Cutoffs

Cut off Prob.Val. for Success (Updatable): 0.25

Classification Confusion Matrix
                Predicted Class
Actual Class    owner   non-owner
owner           11      1
non-owner       4       8

Cut off Prob.Val. for Success (Updatable): 0.75

Classification Confusion Matrix
                Predicted Class
Actual Class    owner   non-owner
owner           7       5
non-owner       1       11

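
The effect of moving the cutoff can be reproduced in a few lines. The sketch below uses the 24 records of Table 1 and applies the rule "classify as 1 if the probability is >= cutoff". Two assumptions are made: the actual class of the record with probability 0.984 is taken to be 1 (its class label is not legible in the table), and class "1" is taken to correspond to "owner" in the confusion matrices, which is consistent with their row totals of 12 and 12.

```python
# (actual class, estimated probability of class "1") for the 24 records in Table 1.
# Assumption: the record with probability 0.984 has actual class 1.
records = [
    (1, 0.996), (1, 0.988), (1, 0.984), (1, 0.980), (1, 0.948), (1, 0.889),
    (1, 0.848), (0, 0.762), (1, 0.707), (1, 0.681), (1, 0.656), (0, 0.622),
    (1, 0.506), (0, 0.471), (0, 0.337), (1, 0.218), (0, 0.199), (0, 0.149),
    (0, 0.048), (0, 0.038), (0, 0.025), (0, 0.022), (0, 0.016), (0, 0.004),
]


def confusion_at_cutoff(records, cutoff):
    """Return (TP, FN, FP, TN) when records with prob >= cutoff are classified as 1."""
    tp = fn = fp = tn = 0
    for actual, prob in records:
        predicted = 1 if prob >= cutoff else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


for cutoff in (0.25, 0.50, 0.75, 0.80):
    tp, fn, fp, tn = confusion_at_cutoff(records, cutoff)
    print(f"cutoff {cutoff:.2f}: classified as 1 = {tp + fp}, "
          f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

Lowering the cutoff catches more actual 1's at the price of more false positives, while raising it does the opposite, so the choice depends on which kind of error is more costly.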
Assignment: Prepare a report to:
• Discuss the theory of operation of the discriminant analysis classifier, including:
1. Idea of operation
2. Mathematical formulation
3. Advantages and disadvantages
4. Python instructions for using this classifier, with a detailed description of their
function, including all input and output parameters
• Write the Python instructions used for data splitting and for constructing a confusion matrix,
with a detailed explanation of the functions used
• Use any of the datasets available in Python to write a full program that classifies this dataset
using the discriminant analysis classifier. Split the data into 70% training and 30% testing, and
construct the resulting confusion matrix. Obtain as many performance measurements as you can.
Each step in your program should be accompanied by a screenshot of its output.
