ML UNIT4
Discuss Classification
What is it? Classification is a supervised machine learning task where the goal is to predict a discrete category (class) for a data point, as opposed to regression, where the goal is to predict a continuous numerical value.
How it works: The model learns from a set of examples (training data) where the correct class is already known. It identifies patterns and features that distinguish each class.
Goal: Once trained, the model can predict the class of new, unseen data.
Examples: spam detection (spam vs. not spam email), medical diagnosis (disease present vs. absent), and image classification (identifying the object category in an image).
A typical classification workflow involves the following steps (a small end-to-end sketch follows the list):
Data Collection & Preparation: Gather and clean your data. This includes handling missing
values, formatting data, and potentially splitting it into training, validation, and test sets.
Model Selection: Choose an appropriate classification algorithm (e.g., k-NN, SVM, Decision
Tree).
Training: Feed the training data to the chosen algorithm. The model learns the relationships
between features and classes.
Evaluation: Assess the model's performance on a separate dataset (validation set or test set).
Metrics like accuracy, precision, recall, and F1-score are used.
Deployment (Optional): Integrate the trained model into a real-world application to make
predictions on new data.
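A minimal end-to-end sketch of this workflow, assuming scikit-learn and its bundled Iris dataset purely for illustration (k-NN stands in here for whichever algorithm is selected):

# Sketch of the classification workflow (scikit-learn assumed available).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Data collection & preparation: load a toy dataset and split it.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 2. Model selection: here, k-NN with k = 5.
model = KNeighborsClassifier(n_neighbors=5)

# 3. Training: the model learns the relationship between features and classes.
model.fit(X_train, y_train)

# 4. Evaluation: accuracy plus per-class precision/recall/F1 on the held-out test set.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))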
This involves studying various classification algorithms, understanding their strengths and weaknesses, and knowing when to apply them. This unit focuses on k-NN, but a few common ones are:
k-Nearest Neighbor (k-NN): A simple, instance-based method that classifies a point by the classes of its nearest neighbors.
Decision Tree: An interpretable model built by recursively splitting the data on feature tests.
Random Forest: An ensemble of decision trees that reduces overfitting.
Support Vector Machines (SVM): Effective in high-dimensional spaces, good for complex datasets.
4. k-Nearest Neighbor
What is it? A supervised learning algorithm used for both classification and regression. It
classifies a new data point based on the classes of its 'k' nearest neighbors in the training
data.
5. Working of k-NN
1. Choose 'k': Select the number of neighbors to consider (a small odd value such as 3 or 5 is common for binary classification).
2. Calculate Distance: Find the distance between the new data point and all points in the
training data (e.g., Euclidean distance).
3. Find Nearest Neighbors: Identify the 'k' points with the smallest distances.
4. Majority Vote: Assign the new data point to the class that is most frequent among its 'k'
neighbors.
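A minimal from-scratch sketch of these steps, assuming NumPy; the tiny dataset and the choice k = 3 are made up for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 2. Calculate Euclidean distance from the new point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 3. Find the indices of the k smallest distances (the nearest neighbors).
    nearest = np.argsort(distances)[:k]
    # 4. Majority vote among the neighbors' class labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative dataset: two features, two classes (0 and 1).
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([3.8, 4.0]), k=3))  # expected: 1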
6. Strengths and Weaknesses of k-NN
Strengths:
o Simple to understand and implement; there is no explicit training phase (the stored training data itself is the model).
o Naturally handles multi-class problems and non-linear decision boundaries.
Weaknesses:
o Prediction can be slow on large datasets, since distances to all training points must be computed.
o Sensitive to the choice of 'k', to feature scaling, and to irrelevant features.
7. Applications of k-NN
Credit Scoring: Assessing the creditworthiness of individuals based on the history of similar
borrowers.
1. Decision Tree
What is it? A decision tree is a supervised learning algorithm used for both classification and
regression. It's structured like a tree with nodes representing decisions or tests on features,
branches representing the outcomes of those decisions, and leaves representing the final
classifications or predictions.
How it works: The tree is built by recursively partitioning the data based on the features that
best separate the classes or minimize the prediction error.
Analogy: Think of a flowchart where each step asks a question about a feature, leading you
down a different path until you reach a final decision.
Core Idea: The goal is to find the "best" features to split the data at each node. "Best" usually
means maximizing information gain or minimizing impurity (e.g., Gini impurity).
Recursive Process:
1. Start with the entire training set at the root node.
2. Evaluate each candidate feature using the chosen criterion (information gain or impurity reduction).
3. Split the data on the best feature, creating one branch per outcome.
4. Repeat steps 2 and 3 for each branch until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf, or pure nodes).
Process: To classify a new data point, you start at the root node and traverse down the tree
based on the feature values of the data point. At each node, you follow the branch
corresponding to the outcome of the test on that feature. This process continues until you
reach a leaf node, which provides the final classification or prediction.
Entropy: A measure of the impurity or disorder of the examples at a node.
o High entropy: Indicates a lot of mixing (e.g., roughly equal numbers of examples from different classes).
o Low entropy: Indicates that the data points are mostly from one class.
Information Gain: Measures the reduction in entropy achieved by splitting the data based on
a particular feature. The feature with the highest information gain is usually chosen for
splitting at each node.
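A small numeric sketch of entropy and information gain; the parent/child class counts are made up for illustration:

import math

def entropy(class_counts):
    # H = -sum(p_i * log2(p_i)) over the classes present at the node.
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# Parent node: 10 examples, 5 of each class -> maximum entropy of 1.0 bit.
parent = entropy([5, 5])

# A candidate split produces two children with counts [4, 1] and [1, 4] (5 examples each).
children = (5 / 10) * entropy([4, 1]) + (5 / 10) * entropy([1, 4])

# Information gain = parent entropy - weighted average child entropy (about 0.28 here).
print("Information gain:", parent - children)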
Common Algorithms:
o ID3 (Iterative Dichotomiser 3): Uses information gain as the splitting criterion (for
categorical features).
o C4.5: An improvement over ID3, handles both categorical and continuous features,
and can deal with missing values.
o CART (Classification and Regression Trees): Uses Gini impurity as the splitting
criterion and can be used for both classification and regression.
Building the Tree (general procedure):
1. Start with the full training set at the root node and calculate its entropy (or impurity).
2. For each feature, calculate the information gain (or reduction in impurity) obtained by splitting on that feature.
3. Choose the feature with the highest information gain (or greatest reduction in impurity).
4. Split the data into child nodes according to the values (or a threshold) of the chosen feature.
5. Recursively repeat steps 1-4 for each child node until a stopping criterion is met.
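A minimal sketch of building and using a decision tree with scikit-learn (assumed available); the dataset, the entropy criterion, and the depth limit are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a tree using entropy (information gain) as the splitting criterion,
# with a maximum depth as the stopping criterion.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Inspect the learned splits, then classify a new point by traversing root -> leaf.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))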
Strengths:
o Easy to understand and interpret; the learned rules can be visualized as a flowchart.
o Handles both numerical and categorical features with little preprocessing.
Weaknesses:
o Prone to overfitting if grown too deep (pruning or depth limits are needed).
o Small changes in the training data can produce a very different tree (high variance).
Applications of Decision Trees:
Medical Diagnosis: Predicting diseases based on patient symptoms and medical history.
Customer Churn Prediction: Identifying customers who are likely to cancel their services.
2. Random Forest
Key Idea: Instead of relying on a single decision tree (which can be prone to overfitting), Random Forest creates a "forest" of trees, each trained on a different subset of the data and features. The final prediction is made by aggregating the predictions of all the individual trees.
How it works:
1. Bootstrap Sampling: Create multiple subsets (samples) of the training data by randomly
sampling with replacement. This means some data points may appear multiple times in a
subset, while others may be left out. Each subset is used to train a different decision tree.
2. Feature Randomness: When building each decision tree, consider only a random subset of
features at each node (rather than all available features). This introduces further diversity
among the trees.
3. Decision Tree Training: Train a decision tree on each of the bootstrapped datasets, using the selected random features for splitting at each node. These trees are typically grown fully (not pruned); individual trees therefore have high variance, which the aggregation step below reduces.
4. Prediction Aggregation:
o Classification: For classification, the Random Forest takes a majority vote over the classes predicted by the individual trees.
o Regression: For regression, the Random Forest averages the predictions of all the individual trees.
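A minimal sketch of these steps using scikit-learn's RandomForestClassifier (assumed available); the parameter values are illustrative, with n_estimators, bootstrap, and max_features corresponding to the number of trees, bootstrap sampling, and feature randomness respectively:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 trees, each fit on a bootstrap sample (bootstrap=True) and considering
# only a random subset of features at each split (max_features="sqrt").
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, max_features="sqrt", random_state=0)
forest.fit(X, y)

# Prediction aggregation: predict() returns the majority vote across the trees.
print(forest.predict([[6.0, 2.9, 4.5, 1.5]]))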
Out-of-Bag (OOB) Error Estimation:
Concept: Since each tree is trained on a different bootstrapped subset of the data, some data points are "out-of-bag" (not included) in the training set for a particular tree.
Use: These out-of-bag samples can be used to estimate the generalization error of the
Random Forest, similar to how a validation set is used. This is called the out-of-bag error
estimate.
Advantage: Provides a built-in estimate of the model's performance without the need for a
separate validation set.
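A minimal sketch of the out-of-bag estimate with scikit-learn (assumed available); the oob_score flag requests exactly this built-in performance estimate:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True evaluates each training point using only the trees that did not
# see it during training, giving a validation-like estimate "for free".
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print("Out-of-bag accuracy:", forest.oob_score_)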
Strengths:
o High Accuracy: Generally achieves high accuracy due to the ensemble nature and
reduction in overfitting.
o Robustness: Less prone to overfitting compared to individual decision trees.
o Handles High Dimensionality: Can handle datasets with a large number of features.
Weaknesses:
o Interpretability: A forest of many trees is much harder to interpret than a single decision tree.
o Memory Usage: Storing a large number of trees can require significant memory.
o Prediction Speed: Aggregating predictions from many trees makes inference slower than using a single tree.
Applications of Random Forest:
Natural Language Processing: Tasks like sentiment analysis, text classification, and named entity recognition.
Bioinformatics: Analyzing gene expression data, predicting protein function, and disease
diagnosis.
Financial Modeling: Predicting stock prices, credit risk assessment, and fraud detection.
3. Support Vector Machines (SVM)
Core Idea: SVMs aim to find the optimal hyperplane that maximally separates data points of different classes.
Hyperplane: A hyperplane is a generalization of a line (in 2D) or a plane (in 3D) to higher
dimensions. It's a decision boundary that separates data points of different classes.
Separating Data: In a binary classification problem, the SVM finds the hyperplane that best
divides the data points into two classes.
Margin: The margin is the distance between the hyperplane and the nearest data points
(support vectors) of each class. A larger margin generally leads to better generalization.
Optimization: SVMs use optimization techniques to find the hyperplane that maximizes the
margin while also correctly classifying as many training examples as possible.
Support Vectors: The data points closest to the hyperplane are called support vectors. They
play a crucial role in defining the hyperplane and are the only points that influence the
optimization process.
Optimal (Maximum-Margin) Hyperplane: The hyperplane that maximizes the margin between the two classes.
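A minimal sketch of a linear SVM on a toy, linearly separable dataset (scikit-learn assumed; the points are made up for illustration):

import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, classes 0 and 1.
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8], [4.0, 4.0], [4.2, 3.8], [3.8, 4.1]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel searches for the maximum-margin separating hyperplane.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# The support vectors are the training points closest to the hyperplane;
# only these points determine its position.
print("Support vectors:\n", svm.support_vectors_)
print("Prediction for a new point:", svm.predict([[2.0, 2.0]]))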
5. Kernel Trick
Problem: Many datasets are not linearly separable in their original feature space, so no flat hyperplane can separate the classes.
Solution: The kernel trick maps the data points into a higher-dimensional space where it becomes possible to find a linear separating hyperplane.
Kernels: Kernel functions are used to perform this mapping implicitly without actually transforming the data. Common kernels include:
o Linear Kernel: No mapping; suitable when the data is already (approximately) linearly separable.
o Polynomial Kernel: Captures polynomial interactions between features.
o Radial Basis Function (RBF) Kernel: A very versatile kernel that can capture complex non-linear relationships.
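A minimal sketch of the kernel trick in practice, using scikit-learn's make_circles data (assumed purely for illustration) to compare a linear kernel with an RBF kernel on data that is not linearly separable:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles here, while the RBF kernel implicitly maps the data
# into a higher-dimensional space where a separating hyperplane exists.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("Linear kernel training accuracy:", linear_svm.score(X, y))
print("RBF kernel training accuracy:", rbf_svm.score(X, y))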
Strengths:
o Effective in high-dimensional spaces, even when the number of features exceeds the number of samples.
o Memory-efficient at prediction time, since only the support vectors define the decision boundary.
o Flexible: different kernels allow both linear and non-linear decision boundaries.
Weaknesses:
o Training can be slow on very large datasets.
o Performance is sensitive to the choice of kernel and hyperparameters (e.g., C, gamma).
o Does not directly provide probability estimates.
7. Applications of SVM
Image Classification: Identifying objects or categories in images.
1. Discuss Regression
What is it? Regression is a supervised machine learning task where the goal is to predict a
continuous numerical value (as opposed to classification, where the goal is to predict a
category).
How it works: Regression models learn the relationship between independent variables
(features) and a dependent variable (target) to predict the target value for new, unseen data.
Examples: predicting house prices from size and location, forecasting sales, or predicting stock prices.
This involves studying various regression algorithms, understanding their strengths and weaknesses, and knowing when to apply them. This unit focuses on linear regression, but a few common ones are:
Simple Linear Regression: Predicts a continuous value based on a single predictor variable,
assuming a linear relationship.
Polynomial Regression: Models a non-linear relationship between the predictors and the
target by introducing polynomial terms.
Support Vector Regression (SVR): Applies the principles of Support Vector Machines to
regression tasks.
Decision Tree Regression: Uses a decision tree to predict continuous values by partitioning
the data space into regions.
Random Forest Regression: An ensemble method that combines multiple decision trees for
regression.
Simple Linear Regression
Equation: y = mx + c, where:
o y is the dependent (target) variable,
o x is the independent (predictor) variable,
o m is the slope of the line, and
o c is the y-intercept.
Interpretation: The slope (m) represents the change in the dependent variable (y) for a unit change in the independent variable (x).
Calculation: The slope is calculated using the available data points (training data) by
minimizing the sum of squared errors (least squares method).
Steps to Calculate the Regression Line:
1. Collect Data: Gather paired observations of the independent variable (x) and the dependent variable (y).
2. Calculate Slope (m): Use the least squares method to find the line of best fit: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)².
3. Calculate Intercept (c): Use the calculated slope and the means of the data (the least squares line passes through (x̄, ȳ)): c = ȳ − m·x̄.
Example: Imagine you want to predict ice cream sales (y) based on temperature (x). You collect data on sales and temperature for several days. Using this data, you calculate the slope and intercept of the simple linear regression line. Now, given a new temperature, you can plug it into the equation to predict ice cream sales.
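A minimal sketch of this ice-cream example in Python; the temperature and sales figures are made up for illustration:

import numpy as np

# Made-up training data: daily temperature (°C) and ice cream sales (units).
x = np.array([20.0, 22.0, 25.0, 28.0, 30.0, 33.0])
y = np.array([120.0, 135.0, 160.0, 190.0, 210.0, 240.0])

# Least squares estimates:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   c = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# Predict sales for a new temperature by plugging it into y = mx + c.
new_temp = 27.0
print(f"y = {m:.2f}x + {c:.2f}; predicted sales at {new_temp} degrees: {m * new_temp + c:.1f}")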
Multiple Linear Regression
What is it? An extension of simple linear regression where you have multiple independent variables (x1, x2, ..., xn) affecting the dependent variable (y).
Equation: y = m1x1 + m2x2 + ... + mnxn + c, where:
o m1, m2, ..., mn are the coefficients of the respective independent variables.
o c is the y-intercept.
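A minimal sketch with two predictor variables using scikit-learn's LinearRegression (assumed available); the house size/age data is made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: [size in square metres, age in years] -> price (in thousands).
X = np.array([[50, 30], [70, 20], [90, 10], [110, 5], [130, 2]])
y = np.array([150, 210, 280, 350, 420])

model = LinearRegression()
model.fit(X, y)

# model.coef_ holds m1..mn and model.intercept_ holds c in y = m1*x1 + ... + mn*xn + c.
print("Coefficients (m1, m2):", model.coef_)
print("Intercept (c):", model.intercept_)
print("Predicted price for a 100 sq m, 8-year-old house:", model.predict([[100, 8]]))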
Challenges in Regression:
Overfitting: The model learns the training data too well, including noise, and performs poorly on unseen data.
Underfitting: The model is too simple to capture the underlying patterns in the data and
performs poorly on both training and unseen data.
Outliers: Data points that are far from the other data points can significantly affect the
regression line.
Non-linearity: If the relationship between predictors and the target is not linear, linear
regression models will not perform well.
Applications of Regression and Classification
(This is broader than just regression, but covers both regression and classification.)
Regression Applications:
o Predicting house prices, stock prices, sales forecasts, etc. (as mentioned above).
Classification Applications:
o Spam detection.
o Medical diagnosis.