ML Supervised Learning Unit 3
VI Semester
MACHINE LEARNING
Course Code: CA-C27T
Supervised learning
Supervised learning is a type of machine learning where the model learns from labeled data,
meaning each training example consists of input data (features) along with their corresponding
correct output (labels). The goal of supervised learning is to learn a mapping from input to output
based on the available labeled data, such that the model can make accurate predictions on new,
unseen data.
Types of Supervised Learning:
Classification:
• Classification is a type of supervised learning where the goal is to predict the
category or class label of input data.
• The output variable is discrete and categorical, with a finite number of
possible values or classes.
• Example: Email spam detection, sentiment analysis, image classification.
Regression:
• Regression is a type of supervised learning where the goal is to predict a
continuous output variable based on input features.
• The output variable is numerical, representing a quantity or value along a
continuous scale.
• Example: Predicting house prices, stock prices, temperatures, or a person's salary.
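To make the distinction concrete, here is a small illustrative sketch (an added example, not part of the original notes; the toy data and model choices are assumptions) showing that a classifier outputs a discrete label while a regressor outputs a continuous value:

# Sketch: classification vs. regression with scikit-learn (illustrative toy data)
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: discrete label (e.g., 0 = not spam, 1 = spam)
X_cls = [[1.0], [2.0], [3.0], [4.0]]   # one toy feature
y_cls = [0, 0, 1, 1]                   # categorical target
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[3.5]]))            # -> a class label, e.g. [1]

# Regression: continuous value (e.g., a house price)
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [10.0, 20.0, 30.0, 40.0]       # numeric target on a continuous scale
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[3.5]]))            # -> a continuous estimate, e.g. [35.]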
Classification
Binary Classifier:
A binary classifier is a classification model that assigns each input instance to one of two possible classes. The output variable is binary, meaning it has only two possible values (e.g., spam/not spam, positive/negative).
Examples:
Logistic Regression: A popular linear model used for binary classification tasks.
Support Vector Machines (SVM): Effective for separating two classes with a hyperplane in the
feature space.
Decision Trees: Can be used for binary classification by splitting the feature space into regions
corresponding to each class.
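To make the separating-hyperplane idea concrete, here is a brief illustrative sketch (an added example with assumed toy data, not part of the original notes) that fits a linear SVM to a two-class dataset and inspects the learned hyperplane:

# Sketch: a linear SVM separating two classes with a hyperplane (illustrative)
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy two-class dataset; parameters chosen only for illustration
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)

svm = SVC(kernel="linear").fit(X, y)

# For a linear kernel, coef_ and intercept_ define the hyperplane w.x + b = 0
print("hyperplane weights:", svm.coef_)
print("hyperplane intercept:", svm.intercept_)
print("predicted class for one instance:", svm.predict(X[:1]))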
Multi-class Classifier:
A multi-class classifier is a classification model that assigns each input instance to one of three or more possible classes. The output variable can take more than two possible values.
Examples:
1. Random Forest: Can handle multi-class classification tasks by combining multiple decision
trees.
2. K-Nearest Neighbors (KNN): Can be used for multi-class classification by assigning the
majority class among the K nearest neighbors.
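As an added illustrative sketch (not from the original notes; the dataset parameters are assumptions), the snippet below shows a Random Forest handling a problem with three classes:

# Sketch: Random Forest on a three-class classification problem (illustrative)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset with 3 classes; settings are assumptions for the example
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("classes learned:", forest.classes_)            # e.g. [0 1 2]
print("test accuracy:", forest.score(X_test, y_test))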
Lazy Learner (Instance-based Learning):
Lazy learners, also known as instance-based learners, delay the process of
learning until a new instance needs to be classified. They store the training
instances and perform classification based on similarity measures between the
new instance and the stored instances.
Example:
1. K-Nearest Neighbors (KNN): A lazy learning algorithm that stores all
instances of the training data and classifies new instances based on the
majority class among its nearest neighbors.
Eager Learner (Model-based Learning):
Eager learners, also known as model-based learners, construct a generalized
model from the training data during the learning phase. This model is then
used to make predictions on new instances without requiring the entire training
data.
Example:
1. Decision Trees: An eager learning algorithm that constructs a tree-based
model by recursively partitioning the feature space based on the training data.
Classification learning steps:
1. Data Collection and Preprocessing:
1. Gather labeled data, where each instance is associated with a class label. Preprocess the data by handling missing values, outliers,
and scaling features if necessary.
2. Feature Selection and Engineering:
1. Select relevant features that contribute to the predictive task. Create new features or transform existing ones to enhance the model's
performance.
3. Model Selection:
1. Choose an appropriate classification algorithm based on the nature of the problem, size of the dataset, and computational resources
available.
4. Training the Model:
1. Train the selected model on the labeled training data. The model learns to identify patterns and relationships between input features
and class labels.
5. Model Evaluation:
1. Evaluate the trained model's performance using metrics such as accuracy, precision, recall, and F1-score on a separate validation set
or through cross-validation.
6. Hyperparameter Tuning:
1. Fine-tune the model's hyperparameters to optimize its performance. This may involve grid search, random search, or other
optimization techniques.
7. Testing and Deployment:
1. Assess the model's performance on unseen data using the test set. If satisfactory, deploy the model into production for real-world use.
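The sketch below walks through these steps end to end on one example; it is an added illustration (the dataset, model, and hyperparameter grid are assumptions, not prescribed by the notes):

# Sketch: end-to-end classification workflow following the steps above
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Steps 1-2: data collection/preprocessing (a built-in dataset stands in here)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 3-4: model selection and training (scaling + logistic regression in one pipeline)
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Step 6: hyperparameter tuning with grid search and cross-validation
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

# Steps 5 and 7: evaluation on held-out test data before deployment
y_pred = grid.predict(X_test)
print(classification_report(y_test, y_pred))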
K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is a simple and effective classification algorithm
used for both binary and multi-class classification tasks. It's a type of lazy
learning algorithm that stores all instances of the training data and makes
predictions for new instances based on their similarity to the nearest neighbors.
When can KNN be used?
KNN can be used when labeled training data is available, the dataset is small to moderate in size (since prediction compares the new instance with every stored instance), and no assumption about the underlying data distribution is desired.
KNN working steps
The K-Nearest Neighbors (KNN) algorithm is a simple and intuitive classification
algorithm. Here are the steps involved in its working:
Initialize the Training Data:
• Store all instances of the training data along with their corresponding class
labels
Calculate Distance:
• For a new instance (datapoint), calculate the distance between this datapoint
and all other datapoints in the training set.
• Common distance metrics include
• A. Euclidean distance,
• B. Manhattan distance,
• C. Minkowski distance.
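As a quick illustration of these metrics (an added sketch, not part of the original notes), the following computes each distance for two example feature vectors using NumPy:

# Sketch: Euclidean, Manhattan, and Minkowski distances between two points
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt(sum((a_i - b_i)^2)) = 5.0
manhattan = np.sum(np.abs(a - b))           # sum(|a_i - b_i|) = 7.0
p = 3                                       # Minkowski order (p=1 Manhattan, p=2 Euclidean)
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)   # about 4.5 for p=3

print(euclidean, manhattan, minkowski)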
Select Nearest Neighbors:
• Choose the K nearest neighbors to the new instance based on the calculated
distances. K is a predefined hyperparameter.
Majority Voting:
• For classification tasks, count the class labels of the K nearest neighbors.
• Assign the most common class label among the K nearest neighbors as the
predicted class for the new instance.
Prediction:
• Once the majority class label (or average value for regression) is determined,
assign it as the prediction for the new instance.
Hyperparameter Tuning:
• Experiment with different values of K and choose the one that gives the best
performance on a validation set or through cross-validation.
Model Deployment:
• Deploy the trained KNN model into production for making predictions on new, unseen
data.
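Putting the steps together, here is a minimal from-scratch sketch of KNN classification (an added illustration using Euclidean distance and simple majority voting; the toy data is made up):

# Sketch: from-scratch KNN prediction following the steps above
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: the "model" is just the stored training data (lazy learning)
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(np.sum((np.asarray(X_train) - np.asarray(x_new)) ** 2, axis=1))
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 4: majority voting among the neighbors' class labels
    votes = Counter(np.asarray(y_train)[nearest])
    # Step 5: the most common label is the prediction
    return votes.most_common(1)[0][0]

# Tiny usage example with made-up data
X_train = [[1, 1], [1, 2], [5, 5], [6, 5]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [5, 6], k=3))   # expected output: 1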
Advantages of K-Nearest Neighbors (KNN):
1. Simple and Intuitive: KNN is easy to understand and implement, making it
suitable for beginners in machine learning.
2. Non-parametric: KNN doesn't make any assumptions about the underlying
data distribution, making it robust to noisy data and outliers.
3. Versatile: KNN can be applied to both classification and regression tasks,
making it a versatile algorithm.
4. No Training Phase: KNN is a lazy learner, meaning it doesn't require a training
phase. The model is simply memorizing the training data, making it
computationally efficient during training.
5. Adaptability to New Data: KNN can easily adapt to new data without needing
to retrain the model.
Disadvantages of K-Nearest Neighbors (KNN):
1. Computationally Expensive: KNN requires calculating distances between
the new instance and all instances in the training data, which can be
computationally expensive, especially for large datasets.
2. High Memory Requirement: Since KNN stores all training instances in
memory, it can require a significant amount of memory, especially for large
datasets.
3. Sensitive to Irrelevant Features and Scale: KNN is sensitive to irrelevant features,
and its distance calculations are dominated by features with larger numeric ranges,
so feature scaling is usually required.
4. Need for Optimal K: The choice of the hyperparameter K (number of
neighbors) significantly affects the model's performance. Finding the optimal
value of K can be challenging and may require experimentation.
Applications:
1. Classification:
1. KNN is commonly used for classification tasks such as email spam detection, sentiment
analysis, and image recognition.
2. Regression:
1. KNN can be applied to regression problems such as predicting house prices, stock prices,
and demand forecasting.
3. Recommendation Systems:
1. KNN-based collaborative filtering algorithms are used in recommendation systems to
suggest items to users based on their similarity to other users.
4. Anomaly Detection:
1. KNN can be used for anomaly detection in cybersecurity, fraud detection, and network
intrusion detection.
5. Medical Diagnosis:
1. KNN is utilized in medical diagnosis to classify diseases based on patient symptoms and
medical history.
KNN Program
This program does the following:
1. Loads the Iris dataset using load_iris() from scikit-learn.
2. Splits the dataset into features (X) and target (y).
3. Splits the data into training and testing sets using train_test_split().
4. Scales the features using StandardScaler() to ensure mean=0 and
variance=1.
5. Initializes the KNN classifier with a specified number of neighbors (k).
6. Trains the KNN classifier on the scaled training data.
7. Makes predictions on the scaled testing data.
8. Evaluates the accuracy of the classifier using accuracy_score().
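The program listing itself is not reproduced in these notes, so the sketch below reconstructs it from the steps described above (the value of k and the split proportion are assumptions):

# Sketch: KNN classification on the Iris dataset, following the listed steps
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: load the Iris dataset and separate features (X) and target (y)
iris = load_iris()
X, y = iris.data, iris.target

# Step 3: split into training and testing sets (test size assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: scale the features to mean 0 and variance 1
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 5-6: initialize the KNN classifier with k neighbors and train it
k = 5  # assumed value of k
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train_scaled, y_train)

# Steps 7-8: predict on the scaled test data and evaluate accuracy
y_pred = knn.predict(X_test_scaled)
print("Accuracy:", accuracy_score(y_test, y_pred))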
Decision Trees
Key terminology:
1. Root Node: The topmost node of the decision tree, representing the feature that best splits the dataset into subsets. It has no incoming edges.
2. Leaf Node (Terminal Node): Nodes at the bottom of the decision tree that do not split further. Each leaf node represents a class label (in classification) or a predicted value (in regression).
3. Decision Rule: The rule or condition based on which the dataset is split at each node. It involves comparing the value of a feature with a threshold.
4. Impurity: A measure of the disorder or uncertainty in a dataset. Decision trees aim to minimize impurity at each node to make the splits more informative.
5. Gini Impurity: A measure of impurity used in classification tasks. It calculates the probability of misclassifying a randomly chosen data point if it were labeled according to the distribution of classes in the subset.
6. Entropy: Another measure of impurity used in classification tasks. It quantifies the uncertainty in a dataset's class distribution.
7. Pruning: The process of removing nodes or branches from the decision tree to prevent overfitting. Pruning helps simplify the tree and improve its generalization ability.
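For concreteness, the sketch below computes both impurity measures for a small, made-up class distribution (the standard formulas are Gini = 1 - sum(p_i^2) and Entropy = -sum(p_i * log2(p_i)); the example labels are assumptions):

# Sketch: Gini impurity and entropy of the class labels at one node
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)          # Gini = 1 - sum(p_i^2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))       # Entropy = -sum(p_i * log2(p_i))

node = [0, 0, 0, 1, 1, 1, 1, 1]          # made-up class labels at a node
print("Gini:", gini(node))               # 1 - (0.375^2 + 0.625^2) = 0.46875
print("Entropy:", entropy(node))         # about 0.954 bits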
Working Principle:
1. Splitting Criteria:
1. Decision trees recursively split the data into subsets based on the values of features. The
algorithm selects the best feature and split point that maximizes the homogeneity (or purity) of
the subsets.
2. Tree Construction:
1. Starting from the root node, the algorithm iteratively splits the data at each node based on the
selected splitting criteria until a stopping criterion is met. This process continues until the
subsets are pure (all instances belong to the same class) or the tree reaches a predefined depth.
3. Decision Rules:
1. At each internal node, the decision tree applies a decision rule based on a feature value.
4. Leaf Nodes:
1. When a stopping criterion is met (e.g., maximum depth reached, no further improvement in
purity), the algorithm creates a leaf node representing the majority class (in classification) or the
average value (in regression) of the instances in that subset.
5. Prediction:
1. To make predictions for new instances, the decision tree traverses from the root node down to a
leaf node based on the decision rules defined at each node. The predicted class or value at the
leaf node is assigned to the new instance.
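As a brief illustration of this process (an added example, not prescribed by the notes), the snippet below trains a shallow decision tree on the Iris dataset and prints its learned decision rules, so a prediction can be read as a root-to-leaf traversal:

# Sketch: training a small decision tree and inspecting its decision rules
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Limiting the depth is one stopping criterion that keeps the tree readable
tree = DecisionTreeClassifier(max_depth=2, criterion="gini", random_state=0)
tree.fit(X, y)

# Each printed condition is a decision rule at an internal node; leaves give the class
print(export_text(tree, feature_names=iris.feature_names))

# Prediction traverses from the root down to a leaf for the new instance
print("Predicted class:", tree.predict(X[:1]))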
Advantages of Decision Trees:
1. Interpretability:
1. Decision trees are easy to understand and interpret, making them suitable for
explaining the decision-making process to non-experts.
2. Handles Mixed Data Types:
1. Decision trees can handle both numerical and categorical data without the need
for feature scaling or one-hot encoding.
3. Implicit Feature Selection:
1. Decision trees perform implicit feature selection by selecting the most informative
features for splitting the data at each node.
4. Handles Missing Values and Outliers:
1. Decision trees can handle missing values and outliers in the data by choosing
alternative splits.
Disadvantages of Decision Trees:
1. Overfitting:
1. Decision trees are prone to overfitting, especially on noisy or high-dimensional
data.
2. Instability:
1. Small variations in the data or random noise can lead to different decision tree
structures, making them sensitive to the specific training data used.
3. Limited Expressiveness:
1. Decision trees may fail to capture complex relationships in the data, especially
when the decision boundaries are not axis-aligned.
Applications of Decision Trees:
Decision trees are widely used for tasks such as credit scoring and loan approval, medical diagnosis, customer churn prediction, and fraud detection.