Unit 1
Course outlines:
Unit I: Introduction to machine learning:
Introduction – Learning, Types of Learning, Well defined learning problems, Designing a Learning System, History of ML, Introduction of
Machine Learning Approaches, Introduction to Model Building, Sensitivity Analysis, Underfitting and Overfitting, Bias and Variance,
Concept Learning Task, Find-S Algorithm, Version Space and Candidate Elimination Algorithm, Inductive Bias, Issues in Machine
Learning and Data Science vs Machine Learning.
Unit-II Mining association and supervised learning:
Classification and Regression, Regression: Linear Regression, Multiple Linear Regression, Logistic Regression, Polynomial Regression,
Decision Trees: ID3, C4.5, CART. Apriori Algorithm: Market basket analysis, Association Rules. Neural Networks: Introduction,
Perceptron, Multilayer Perceptron, Support vector machine.
UNIT-III UNSUPERVISED LEARNING:
Introduction to clustering, K-means clustering, K-Nearest Neighbor, Iterative distance-based clustering, Dealing with continuous,
categorical values in K-Means, Hierarchical: AGNES, DIANA, Partitional: K-means clustering, K-Mode Clustering, density-based
clustering, Expectation Maximization, Gaussian Mixture Models.
UNIT-IV PROBABILISTIC LEARNING & ENSEMBLE
Bayesian Learning, Bayes Optimal Classifier, Naïve Bayes Classifier, Bayesian Belief Networks. Ensemble methods: Bagging &
boosting, C5.0 boosting, Random Forest, Gradient Boosting Machines and XGBoost.
UNIT-V REINFORCEMENT LEARNING & CASE STUDIES
Reinforcement Learning: Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice,
Learning Models for Reinforcement – (Markov Decision process, Q Learning – Q Learning function, QLearning Algorithm), Application
of Reinforcement Learning.
Case Study: Health Care, E Commerce, Smart Cities
Introduction: Machine learning
▪ Machine learning is a subfield of artificial intelligence that focuses on the development of
algorithms that can learn patterns and make predictions based on what they have learned from data.
▪ Aim: Analyse data to discover patterns, make predictions, or automate tasks.
Amazon Echo is a smart speaker that uses Alexa, the virtual assistant AI technology developed by
Amazon. Amazon Alexa is capable of voice interaction, playing music, setting alarms, playing
audiobooks, and giving real-time information such as news, weather, sports, and traffic reports.
For example, suppose a person wants to know the current temperature in Chicago. The person’s voice is first
converted into a machine-readable format. The formatted data is then fed into the Amazon Alexa system for
processing and analysis. Finally, Alexa returns the desired voice output via Amazon Echo.
AI, ML and DL
• Machine Learning is a subset of artificial intelligence that helps you build AI-driven applications.
• Deep Learning is a subset of machine learning that uses vast volumes of data and complex algorithms to train
a model.
Difference between AI, ML and DL
• Artificial Intelligence (AI): Intelligence demonstrated by machines, aiming to mimic human cognitive functions.
• Machine Learning (ML): A subset of AI that enables systems to learn and make predictions without explicit programming.
• Deep Learning (DL): A subfield of ML that uses neural networks with multiple layers to model and process complex data representations.
Types of machine learning:
➢ Supervised Learning:
➢ Unsupervised Learning:
➢ Semi-Supervised Learning:
➢ Reinforcement Learning:
➢ Transfer Learning:
➢ Deep Learning:
Supervised learning:
▪ In supervised learning, the model learns from labeled data: each training example pairs input features with a known output label.
➢ Dataset Preparation: A labeled dataset is collected, for example customer records with age, gender, and income as features and a purchase decision (Yes/No) as the label (e.g., 30, Female, $35,000, No; 28, Female, $45,000, No).
➢ Model Training: The labeled dataset is used to train a model using various algorithms such as
regression, decision trees, random forests, support vector machines, or neural networks. The model
learns the underlying patterns and relationships between the input features and the output labels.
➢ Model Evaluation: The trained model is evaluated using separate test data that was not used during
the training phase. Evaluation metrics such as accuracy, precision, recall, or mean squared error are
commonly used to assess the model's performance.
➢ Prediction: Once the model is trained and validated, it can be used to make predictions on new, unseen
data by providing the input features, and the model generates the corresponding output based on its
learned patterns.
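To make the train/evaluate/predict workflow above concrete, here is a minimal scikit-learn sketch; the customer-style features, labels, and the choice of a random forest are illustrative assumptions, not taken from the slides.

```python
# Minimal supervised learning workflow: train, evaluate, predict.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Illustrative features: [age, income]; label: 1 = purchased, 0 = did not purchase
X = np.array([[30, 35000], [45, 90000], [28, 45000], [52, 120000],
              [23, 28000], [40, 80000], [36, 60000], [60, 150000]])
y = np.array([0, 1, 0, 1, 0, 1, 1, 1])

# Hold out unseen data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Model training: the model learns patterns linking features to labels
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Model evaluation on data not seen during training
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Prediction on a new, unseen example
print("Prediction for a 33-year-old earning $50,000:", model.predict([[33, 50000]]))
```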
Data pre-processing:
➢ Pre-processing includes a number of techniques and actions:
➢ Data cleaning: These techniques, manual and automated, remove data incorrectly added or classified.
➢ Data imputations: Most ML frameworks include methods and APIs for balancing or filling in missing data. Techniques
generally include imputing missing values with standard deviation, mean, median and k-nearest neighbors (k-NN) of
the data in the given field.
➢ Oversampling: Bias or imbalance in the dataset can be corrected by generating more observations/samples with
methods like repetition.
➢ Data integration: Combining multiple datasets to get a large corpus can overcome incompleteness in a single
dataset.
➢ Data normalization: Features measured on very different scales increase the memory and processing required for iterations during training.
Normalization rescales values to a common range, reducing their order of magnitude and helping training converge.
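As a rough illustration of the imputation and normalization steps listed above, the following scikit-learn sketch fills a missing value with the column mean and rescales both features; the small array is invented for the example.

```python
# Data pre-processing sketch: mean imputation followed by normalization.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Illustrative raw data with a missing value (np.nan)
X = np.array([[25.0, 50000.0],
              [32.0, np.nan],      # missing income
              [47.0, 120000.0],
              [51.0, 95000.0]])

# Data imputation: replace missing entries with the column mean
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Data normalization: rescale each feature to the [0, 1] range
X_scaled = MinMaxScaler().fit_transform(X_imputed)
print(X_scaled)
```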
• Advantages of Supervised Machine Learning
• Supervised Learning models can have high accuracy as they are trained
on labelled data.
• The process of decision-making in supervised learning models is often
interpretable.
• Pre-trained supervised models can often be reused, which saves time and
resources when developing new models from scratch.
• Disadvantages of Supervised Machine Learning
• It may struggle with unseen or unexpected patterns that are not present in the training data.
• It can be time-consuming and costly, as it relies on labeled data.
• It may generalize poorly to new data.
• Applications of Supervised Learning
• Supervised learning is used in a wide variety of applications, including:
• Image classification: Identify objects, faces, and other features in images.
• Natural language processing: Extract information from text, such as sentiment, entities, and relationships.
• Speech recognition: Convert spoken language into text.
• Recommendation systems: Make personalized recommendations to users.
• Predictive analytics: Predict outcomes, such as sales, customer churn, and stock prices.
• Medical diagnosis: Detect diseases and other medical conditions.
• Fraud detection: Identify fraudulent transactions.
• Autonomous vehicles: Recognize and respond to objects in the environment.
• Email spam detection: Classify emails as spam or not spam.
• Quality control in manufacturing: Inspect products for defects.
• Credit scoring: Assess the risk of a borrower defaulting on a loan.
• Gaming: Recognize characters, analyze player behavior, and create NPCs.
• Customer support: Automate customer support tasks.
• Weather forecasting: Make predictions for temperature, precipitation, and other meteorological parameters.
• Sports analytics: Analyze player performance, make game predictions, and optimize strategies.
Unsupervised Learning:
▪ Unsupervised learning deals with unlabeled data.
▪ The model learns to identify similarities, differences, or groupings in the data based on its intrinsic properties.
▪ The objective is to discover the underlying structure or patterns within the data without any predefined output labels.
▪ Common tasks in unsupervised learning include clustering similar data points together or dimensionality reduction to
identify important features.
1. Dataset Preparation: An unlabeled dataset is collected, consisting only of input features without any
corresponding output labels.
2. Model Training: The model is trained on the unlabeled data using algorithms such as clustering,
dimensionality reduction, or generative models. The model identifies patterns, relationships, or groupings in
the data based on statistical properties or other measures of similarity.
3. Model Evaluation: Unsupervised learning models are evaluated based on internal metrics such as cohesion,
separation, or reconstruction error. Domain-specific evaluation measures can also be utilized, depending on
the task.
4. Knowledge Extraction: Once the model is trained, it can be used to gain insights, find anomalies, or create
representations that aid in downstream tasks, such as data visualization, anomaly detection, or feature
extraction.
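A minimal sketch of this unsupervised workflow using scikit-learn's k-means; the unlabeled 2-D points, the number of clusters, and the silhouette score as the internal metric are illustrative choices.

```python
# Unsupervised learning sketch: cluster unlabeled points with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Unlabeled data: two loose groups of 2-D points (no output labels)
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [8.3, 7.9], [7.8, 8.1]])

# Model training: k-means groups points by similarity (distance)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)

# Internal evaluation: silhouette score measures cohesion vs. separation
print("Silhouette score:", silhouette_score(X, kmeans.labels_))
```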
Semi-supervised learning:
▪ Semi-supervised machine learning is a learning process that combines elements of both supervised and
unsupervised learning.
▪ In this approach, a model is trained on a dataset that contains a mixture of labeled and unlabeled examples.
▪ The goal of semi-supervised learning is to leverage the information present in both labeled and unlabeled data
to improve the model's performance, especially when obtaining a large amount of labeled data is costly or
impractical.
▪ Suppose you have 100 labeled movie reviews (50 positive, 50 negative) and 1000 unlabeled movie reviews.
▪ You train a base sentiment analysis model using the labeled data.
▪ In the self-training phase, you use the base model to predict sentiment labels for the unlabeled reviews.
▪ You treat these predictions as pseudo-labels and incorporate them into the training set.
▪ In the co-training phase, you could train two separate sentiment analysis models on different types of features (e.g.,
bag-of-words counts and word embeddings). These models exchange their predictions on the unlabeled data to enhance
each other's training.
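The self-training phase described above can be sketched as follows; the tiny review texts, the confidence threshold, and the use of logistic regression over bag-of-words features are assumptions made purely for illustration.

```python
# Self-training sketch: pseudo-label unlabeled reviews and retrain.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["great movie", "loved it", "terrible film", "waste of time"]
labels = np.array([1, 1, 0, 0])                       # 1 = positive, 0 = negative
unlabeled_texts = ["really great acting", "awful plot", "loved every minute"]

# Bag-of-words features shared by labeled and unlabeled data
vec = CountVectorizer()
X_labeled = vec.fit_transform(labeled_texts)
X_unlabeled = vec.transform(unlabeled_texts)

# Train a base sentiment model on the small labeled set
model = LogisticRegression().fit(X_labeled, labels)

# Predict pseudo-labels for the unlabeled reviews and keep only confident ones
probs = model.predict_proba(X_unlabeled)
confident = np.flatnonzero(probs.max(axis=1) >= 0.6)  # confidence threshold (assumed)
pseudo_labels = probs.argmax(axis=1)[confident]

# Retrain on labeled data plus confidently pseudo-labeled data
X_combined = sp.vstack([X_labeled, X_unlabeled[confident]])
y_combined = np.concatenate([labels, pseudo_labels])
model = LogisticRegression().fit(X_combined, y_combined)
print("Retrained on", X_combined.shape[0], "examples")
```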
• Advantages of Semi- Supervised Machine Learning
• It can lead to better generalization than purely supervised learning, since it
leverages both labeled and unlabeled data.
• Can be applied to a wide range of data.
• Disadvantages of Semi- Supervised Machine Learning
• Semi-supervised methods can be more complex to implement compared
to other approaches.
• It still requires some labeled data that might not always be available or
easy to obtain.
• Noisy or incorrectly pseudo-labeled unlabeled data can degrade the model's performance.
• Applications of Semi-Supervised Learning
• Here are some common applications of semi-supervised learning:
• Image Classification and Object Recognition: Improve the accuracy of models by
combining a small set of labeled images with a larger set of unlabeled images.
• Natural Language Processing (NLP): Enhance the performance of language models and
classifiers by combining a small set of labeled text data with a vast amount of unlabeled
text.
• Speech Recognition: Improve the accuracy of speech recognition by leveraging a limited
amount of transcribed speech data and a more extensive set of unlabeled audio.
• Recommendation Systems: Improve the accuracy of personalized recommendations by
supplementing a sparse set of user-item interactions (labeled data) with a wealth of
unlabeled user behavior data.
• Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a small set
of labeled medical images alongside a larger set of unlabeled images.
Reinforcement learning:
▪ Reinforcement learning involves training an agent to interact with an environment and learn from the
feedback it receives.
▪ The agent learns through a trial-and-error process by taking actions and receiving rewards or penalties based
on its performance.
▪ The goal is to maximize the cumulative reward over time, leading to the development of optimal strategies or
policies.
• Helps you to discover which action yields the highest reward over the longer period.
• Reinforcement learning also provides the learning agent with a reward function.
• It also allows it to figure out the best method for obtaining large rewards.
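As a toy sketch of this trial-and-error loop, the following tabular Q-learning agent learns to walk right along a five-state corridor to reach a rewarding goal; the environment, reward values, and hyperparameters are invented for illustration (Q-learning itself is treated in detail in Unit V).

```python
# Tabular Q-learning sketch: a 5-state corridor where moving right eventually
# reaches a rewarding goal state; the agent learns by trial and error.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))        # action-value table
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != 4:                                  # episode ends at the goal
        # Epsilon-greedy action selection (explore vs. exploit)
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0       # reward only at the goal
        # Q-update: move Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print("Greedy action per state:", Q.argmax(axis=1))    # mostly 1 = move right
```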
Advantages and disadvantages of reinforcement learning:
Advantages:
• It can solve higher-order and complex problems, and the solutions obtained can be very accurate.
• The model undergoes a rigorous training process that can take time; during this process, errors are progressively corrected.
• Due to its learning ability, it can be combined with neural networks; this is termed deep reinforcement learning.
• Since the model learns continuously, a mistake made earlier is unlikely to be repeated in the future.
• Even when no training data is available in advance, the agent learns from the experience it gains by interacting with the
environment.
Disadvantages:
• Using reinforcement learning models for simpler problems is usually not appropriate, because these models are designed
to tackle complex problems.
• Reinforcement learning models require a lot of training experience to produce accurate results.
• This consumes time and significant computational power.
• When building models for real-world problems, the maintenance cost is very high.
• Excessive training can lead to an explosion in the number of states the model must track.
• This can exhaust memory while processing the training experience.
Transfer learning (Pre-trained models):
▪ Transfer learning involves leveraging knowledge or models learned from one task or domain to improve performance on
another related task or domain.
▪ In transfer learning, pretrained models are commonly used. These are models that have been trained on a large dataset for a
different task, such as image classification on ImageNet (1.2 million images across 1,000 categories).
▪ These pretrained models capture general features that can be valuable for a variety of tasks.
▪ The pre-trained models are fine-tuned or adapted to the new problem with a smaller amount of task-specific data.
▪ Pre-trained models are often used for image classification, object detection, natural language processing, and generative
modeling. By leveraging pre-trained models, we can reduce the need for extensive training on large datasets from scratch.
Commonly used pre-trained models include, among others, EfficientNet (EfficientNetB0–EfficientNetB7) and Xception.
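A minimal fine-tuning sketch in Keras using EfficientNetB0, one of the pre-trained models listed above; the frozen base, the three-class head, and the 224x224 input size are assumptions chosen for illustration, not a prescribed recipe.

```python
# Transfer learning sketch: reuse EfficientNetB0 pre-trained on ImageNet and
# train a new classification head on a small task-specific dataset.
import tensorflow as tf

# Pre-trained convolutional base without its original ImageNet classifier head
base = tf.keras.applications.EfficientNetB0(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze the general-purpose features

# New head for a hypothetical 3-class problem (assumed for illustration)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train on the new, smaller dataset
```

Once the new head has converged, some or all of the base layers can optionally be unfrozen and trained further with a small learning rate.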
Advantages and disadvantages of transfer learning:
Advantages:
• Speed up the training process: By using a pre-trained model, the model can learn more quickly and effectively on the
second task, as it already has a good understanding of the features and patterns in the data.
• Better performance: Transfer learning can lead to better performance on the second task, as the model can leverage the
knowledge it has gained from the first task.
• Handling small datasets: When there is limited data available for the second task, transfer learning can help to prevent
overfitting, as the model will have already learned general features that are likely to be useful in the second task.
Disadvantages:
• Domain mismatch: The pre-trained model may not be well-suited to the second task if the two tasks are vastly different
or the data distribution between the two tasks is very different.
• Overfitting: Transfer learning can lead to overfitting if the model is fine-tuned too much on the second task, as it may
learn task-specific features that do not generalize well to new data.
• Complexity: The pre-trained model and the fine-tuning process can be computationally expensive and may require
specialized hardware.
Deep learning neural network:
4. Long Short-Term Memory (LSTM): LSTMs are a specialized type of RNN that addresses the vanishing gradient
problem, which occurs when training deep neural networks. LSTMs have a more complex structure with memory cells,
input gates, forget gates, and output gates. They are widely used for tasks that involve longer-term dependencies, such as
speech recognition, text generation, and sentiment analysis.
5. Generative Adversarial Networks (GAN): GANs consist of two neural networks, a generator and a discriminator,
that compete against each other in a game-theoretic framework. The generator tries to produce realistic data samples,
while the discriminator aims to distinguish between real and generated samples. GANs are popular for generating realistic
images, video synthesis, and data augmentation.
6. Transformer: Transformers are an attention-based model architecture that has become highly influential in natural
language processing (NLP). Unlike traditional sequential models like RNNs, Transformers can capture dependencies
between words in a sentence simultaneously, enabling parallel computation and improved performance. Transformers have
been used in tasks such as machine translation, question answering, and text summarization.
Deep learning layers:
1. Input Layer: This is the first layer of the neural network, which receives the raw input data. Its role is to pass this input
data to the next layer without modifying it. The number of neurons in this layer corresponds to the dimensionality of the
input data.
2. Dense (Fully Connected) Layer: In this layer, each neuron is connected to every neuron in the previous and next layers.
This dense connectivity allows the network to learn complex patterns in the data. Each connection has an associated
weight, which the network learns during training to make predictions.
3. Convolutional Layer: This layer applies a set of learnable filters (kernels) to the input data using the convolution
operation. Each filter captures different local patterns in the input, allowing the network to extract hierarchical features
such as edges, textures, and shapes from images or spatial data.
4. Pooling Layer: Pooling layers reduce the spatial dimensions of the input data by down-sampling. Common pooling
operations include max pooling and average pooling, which respectively take the maximum or average value from a set
of values within a small window. Pooling helps to decrease the computational complexity of the model and make it more
robust to variations in input data.
Deep learning layers:
5. Recurrent Layer: Recurrent layers are designed to process sequential data by maintaining an internal state (hidden
state) that is updated at each time step. The output of the layer at each time step depends not only on the current
input but also on the previous hidden state, allowing the network to capture temporal dependencies in the data.
6. Dropout Layer: Dropout layers are a regularization technique used during training to prevent overfitting. During
training, a fraction of randomly selected neurons in the layer are temporarily dropped out (set to zero) with a
certain probability. This forces the network to learn more robust features by preventing it from relying too much
on any individual neuron.
7. Batch Normalization Layer: This layer normalizes the activations of the previous layer across the mini-batch of
data. Normalization helps to stabilize and accelerate the training process by reducing internal covariate shift and
allowing higher learning rates.
8. Activation Layer: Activation layers apply non-linear transformations to the output of the previous layer. Common
activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh. These non-linearities introduce
flexibility into the model, enabling it to learn complex mappings between inputs and outputs.
9. Output Layer: The output layer produces the final predictions or outputs of the network. The number of neurons
in this layer depends on the desired output dimensionality for the task at hand. The activation function used in this
layer depends on the nature of the task (e.g., softmax for multi-class classification, linear for regression).
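To tie these layer types together, here is a small, illustrative Keras model for 10-class image classification that uses most of the layers described above; the filter counts and layer sizes are arbitrary choices, not prescribed by the slides.

```python
# Sketch: a small CNN combining the layer types described above.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),       # input layer: raw 28x28 grayscale images
    layers.Conv2D(16, 3, padding="same"),    # convolutional layer: learnable filters
    layers.BatchNormalization(),             # batch normalization layer
    layers.Activation("relu"),               # activation layer (non-linearity)
    layers.MaxPooling2D(2),                  # pooling layer: down-sampling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),     # dense (fully connected) layer
    layers.Dropout(0.3),                     # dropout layer: regularization
    layers.Dense(10, activation="softmax"),  # output layer: 10-class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```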
Well posed learning problems:
A well-defined problem includes not only a clear problem statement but also well-defined evaluation
criteria. This means that the problem statement should precisely outline what needs to be achieved or
predicted, and the evaluation criteria should provide a measurable way to assess the performance of the
solution.
For example, in a classification problem, the problem statement might specify that the task is to classify
emails as either spam or not spam. The evaluation criteria could be the accuracy of the classifier, measured
by the proportion of correctly classified emails in a test dataset.
In a regression problem, the problem statement might involve predicting house prices based on various
features. The evaluation criteria could be the mean squared error (MSE) between the predicted prices and
the actual prices in a test dataset.
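As a small illustration of such evaluation criteria (all numbers below are made up), accuracy and mean squared error can be computed as follows.

```python
# Evaluation criteria sketch: accuracy for classification, MSE for regression.
from sklearn.metrics import accuracy_score, mean_squared_error

# Spam classification: 1 = spam, 0 = not spam (made-up labels)
y_true_cls = [1, 0, 1, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1]
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))       # 4 of 6 correct

# House price prediction in thousands of dollars (made-up values)
y_true_reg = [250, 310, 480]
y_pred_reg = [240, 330, 470]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
```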
Well posed learning problems:
▪ A computer program is said to learn from experience E with respect to some task T and some performance
measure P, if its performance on T, as measured by P, improves with experience E.
▪ Any problem can be segregated as well-posed learning problem if it has three traits –
I. Task
II. Performance Measure
III. Experience
▪ Some examples that clearly define a well-posed learning problem are –
1. To better filter emails as spam or not
Task – Classifying emails as spam or not
Performance Measure – The fraction of emails accurately classified as spam or not spam
Experience – Observing you label emails as spam or not spam
2. A checkers learning problem
Task – playing the game of checkers
Performance Measure – percentage of games won against opponents
Experience – playing practice games against itself
Well posed learning problems:
3. Handwriting Recognition Problem
Task – recognizing and classifying handwritten words within images
Performance Measure – percentage of words accurately classified
Experience – a database of handwritten words with given classifications
4. Fruit Prediction Problem
Task – recognizing and classifying different kinds of fruit
Performance Measure – the proportion of fruit varieties correctly predicted
Experience – training the machine on a large dataset of fruit images
5. Face Recognition Problem
Task – recognizing and classifying different faces
Performance Measure – the proportion of faces correctly predicted
Experience – training the machine on a large dataset of face images
6. Automatic Translation of Documents
Task – translating text in a document from one language to another
Performance Measure – the accuracy with which text is converted from one language to another
Experience – training the machine on a large dataset of documents in different languages
Designing a learning system:
▪ Designing a learning system involves several steps which are discussed below.
1. Problem Definition and Understanding:
1. Clearly define the problem you intend to solve. Understand the context, goals, and objectives of the problem.
2. Determine whether the problem requires supervised learning, unsupervised learning, reinforcement learning, or a
combination of these approaches.
3. Identify the input data (features) and the desired output (target) for the learning system.
2. Data Collection and Preprocessing:
1. Gather relevant data for training, validation, and testing. Data can come from various sources, such as databases,
APIs, sensors, or surveys.
2. Clean the data by handling missing values, outliers, and noisy data.
3. Preprocess the data by transforming and scaling features. This might involve techniques like normalization,
feature extraction, and dimensionality reduction.
3. Feature Engineering:
1. Select or engineer appropriate features that will be used as input for the learning algorithm.
2. Create new features that capture relevant patterns and information from the data.
3. Ensure that the features are meaningful, relevant, and contribute to the learning process.
Designing a learning system:
4.Model Selection:
1. Choose a suitable learning algorithm or model architecture based on the problem type (e.g.,
classification, regression, clustering) and the characteristics of the data.
2. Consider factors such as model complexity and computational requirements.
5.Model Training:
1. Split the dataset into training, testing and validation sets.
2. Use the training data to train the selected model. During this process, the model learns the relationships
between the input features and the target output.
3. Tune hyperparameters using the validation set to optimize the model's performance.
6.Model Evaluation:
1. Evaluate the trained model's performance using the testing set, which the model has not seen during
training.
2. Use appropriate evaluation metrics based on the problem type. For example, accuracy, precision, recall,
F1-score, mean squared error, etc.
3. Analyze the results to understand how well the model is performing and whether it meets the desired
criteria.
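A compact sketch of steps 5 and 6: splitting the data, tuning a hyperparameter with cross-validation on the training portion, and evaluating on a held-out test set. The iris dataset, decision tree, and parameter grid are illustrative assumptions.

```python
# Model training, tuning, and evaluation sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Split: hold out a test set that the model never sees during training/tuning
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Tune hyperparameters via cross-validation on the training portion
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={"max_depth": [2, 3, 5, None]}, cv=5)
grid.fit(X_train, y_train)
print("Best hyperparameters:", grid.best_params_)

# Final evaluation on the untouched test set
print("Test accuracy:", accuracy_score(y_test, grid.predict(X_test)))
```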
Designing a learning system:
7. Model Optimization:
1. If the model's performance is not satisfactory, consider adjusting hyperparameters, experimenting with
different algorithms, or collecting more relevant data.
2. Address issues like overfitting (model performs well on training data but poorly on new data) or
underfitting (model is too simple to capture underlying patterns).
Bias and Variance:
▪ Bias is the error introduced by overly simple assumptions in the model, while variance is the error introduced by the
model's sensitivity to small fluctuations in the training data.
▪ Low variance machine learning models: linear regression, logistic regression, and linear discriminant analysis.
▪ High variance machine learning models: decision tree, support vector machine, and k-nearest neighbours.
Different Combinations of Bias-Variance:
There are four possible combinations of bias and variance, which are summarized below.
▪ High Bias, Low Variance: A model with high bias and low variance is said to be underfitting.
▪ High Variance, Low Bias: A model with high variance and low bias is said to be overfitting.
▪ High-Bias, High-Variance: A model with both high bias and high variance is not able to
capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the training data (high
variance). As a result, the model will produce inconsistent and inaccurate predictions on average.
▪ Low Bias, Low Variance: A model that has low bias and low variance means that the model is able to capture the
underlying patterns in the data (low bias) and is not too sensitive to changes in the training data (low variance). This
is the ideal scenario for a machine learning model, as it is able to generalize well to new, unseen data and produce
consistent and accurate predictions. In practice, however, achieving perfectly low bias and low variance at the same time is rarely possible.
Bias-Variance Trade-Off:
▪ While building the machine learning model, it is really important to take care of bias and variance in order to avoid
overfitting and underfitting in the model.
▪ If the model is very simple with fewer parameters, it may have low variance and high bias.
▪ Whereas, if the model has a large number of parameters, it will have high variance and low bias.
▪ So, it is required to make a balance between bias and variance errors, and this balance between the bias error and
variance error is known as the Bias-Variance trade-off.
Ways to reduce bias in machine learning:
▪ Use a more complex model: One of the main causes of high bias is an overly simplified model that cannot capture the
complexity of the data. In such cases, we can make the model more complex by increasing the number of hidden layers
in a deep neural network, or by using a more expressive model such as polynomial regression for non-linear datasets, a
CNN for image processing, or an RNN for sequence learning.
▪ Increase the number of features: Adding more features to the training data increases the complexity of the model and
improves its ability to capture the underlying patterns in the data.
▪ Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting
and improve the generalization ability of the model. If the model has high bias, reducing the strength of regularization
or removing it altogether can help to improve its performance.
▪ Increase the size of the training data: Increasing the size of the training data can help to reduce bias by providing
the model with more examples to learn from the dataset.
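A brief sketch of the first remedy above: a plain linear model underfits a curved trend (high bias), while adding polynomial features reduces that bias. The synthetic data is invented for illustration.

```python
# Reducing bias sketch: polynomial features let a linear model fit a curved trend.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 60)          # quadratic relationship + noise

linear = LinearRegression().fit(X, y)                 # too simple: high bias
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("Linear model MSE:    ", mean_squared_error(y, linear.predict(X)))
print("Polynomial model MSE:", mean_squared_error(y, poly.predict(X)))
```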
Ways to reduce variance in machine learning:
• Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation can help identify
if a model is overfitting or underfitting and can be used to tune hyperparameters to reduce variance.
• Feature selection: Choosing only the relevant features decreases the model's complexity and can reduce the variance error.
• Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and
stacking are common ensemble methods that can help reduce variance and improve generalization performance.
• Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters or
layers in a neural network, can also help reduce variance and improve generalization performance.
• Early stopping: Early stopping is a technique used to prevent overfitting by stopping the training of the deep
learning model when the performance on the validation set stops improving.
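A short sketch of the cross-validation idea above, comparing a shallow and a fully grown decision tree with 5-fold cross-validation; the dataset and depth values are illustrative choices.

```python
# Variance-reduction sketch: cross-validation to compare model complexities.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (2, None):                               # None = fully grown (higher variance)
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0),
                             X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f} "
          f"(+/- {scores.std():.3f})")
```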
Overfitting and underfitting:
▪ Overfitting occurs when a model learns the training data too well, capturing not only the underlying patterns but also
the noise and randomness present in the data.
▪ As a result, an overfit model will perform exceptionally well on the training data but will struggle to generalize to
new, unseen data.
▪ This phenomenon can be thought of as the model "memorizing" the training data rather than learning the true
underlying relationships.
▪ Underfitting occurs when a model is too simple to capture the underlying patterns in the data; it performs poorly on
both the training data and new, unseen data.
Fig: Underfitting
Characteristics of underfitting :
▪ High training error: The model's performance on the training data is not good.
▪ High validation/testing error: The model's performance on new data is also poor.
▪ Model is too simple: The model lacks the capacity to capture the underlying relationships in the data.
▪ Oversimplified features: The model might not be able to understand the complexities of the data.
• Causes of Underfitting:
• Simplistic model: Using a model with too few parameters or overly simplified structure.
• Insufficient training: Not training the model for enough epochs or iterations.
• Insufficient features: Lack of relevant features or using overly generalized features.
Vapnik-chervonenkis (VC) dimension:
▪ Vapnik-Chervonenkis (VC) dimension measures the capacity of a hypothesis space (set of possible functions or
classifiers) to shatter a given set of points.
1. Shattering: The concept of shattering refers to whether a hypothesis space can classify (or label) a given set of points
in all possible ways. In other words, if a hypothesis space can assign arbitrary labels to a set of points, it is said to
shatter those points.
2. VC Dimension: The VC dimension of a hypothesis space is the maximum number of points that can be shattered by
the space. In other words, it's the largest dataset size for which the hypothesis space can represent all possible
labelings.
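As a standard textbook illustration (not taken from the slides): half-plane (linear) classifiers in the plane can shatter any three points in general position but no set of four points, so their VC dimension is 3. In symbols:

```latex
% VC dimension: the size of the largest point set that the hypothesis space H can shatter.
\mathrm{VC}(H) = \max\{\, d : \exists\, \{x_1,\dots,x_d\} \text{ shattered by } H \,\}
\qquad \text{e.g. } \mathrm{VC}(\text{half-planes in } \mathbb{R}^2) = 3 .
```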
Sensitivity Analysis:
▪ Sensitivity analysis examines how changes in a model's inputs (or parameters) affect its predictions, helping identify
which attributes most influence the output.
Fig: Correlation matrix for the five attributes of interest in the diamonds dataset
Fig: Pie chart showing the average contribution of each diamond attribute on the diamond price
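A rough sketch of how such a correlation matrix could be computed, using the copy of the diamonds dataset bundled with seaborn; the choice of these five numeric attributes is an assumption made for illustration.

```python
# Sensitivity-style check: how strongly do diamond attributes correlate with price?
import seaborn as sns

diamonds = sns.load_dataset("diamonds")               # bundled example dataset
attrs = ["carat", "depth", "table", "x", "price"]     # five numeric attributes (assumed)
corr = diamonds[attrs].corr()
print(corr.round(2))                                   # correlation matrix
print(corr["price"].drop("price").abs().sort_values(ascending=False))
```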
Concept learning:
▪ In Machine Learning, concept learning can be termed as “a problem of searching through a predefined space of potential
hypothesis for the hypothesis that best fits the training examples” – Tom Mitchell.
▪ The Find-S algorithm is a machine learning algorithm that seeks to find a maximally specific hypothesis based on labeled
training data. It starts with the most specific hypothesis and generalizes it by incorporating positive examples. It
ignores negative examples during the learning process.
▪ The algorithm's objective is to discover a hypothesis that accurately represents the target concept by progressively
expanding the hypothesis space until it covers all positive instances.
Find-S algorithm follows the steps written below:
▪ Start with the most specific hypothesis i.e. h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
▪ Take the next example and if it is negative, then no changes occur to the hypothesis.
▪ If the example is positive and we find that our initial hypothesis is too specific then we update our current hypothesis
to a general condition.
▪ Keep repeating the above steps till all the training examples are complete.
▪ After we have completed all the training examples, we will have the final
hypothesis, which we can use to classify new examples.
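A compact sketch of these steps in Python; the data loosely mirrors the worked example that follows, except that the attribute values of the negative rows (which Find-S ignores anyway) are placeholders.

```python
# Find-S sketch: start maximally specific, generalize only on positive examples.
def find_s(examples):
    """examples: list of (attribute_tuple, label) pairs with label 'YES' or 'NO'."""
    h = None                                    # None stands for the all-phi hypothesis
    for attrs, label in examples:
        if label != "YES":                      # negative examples are ignored
            continue
        if h is None:
            h = list(attrs)                     # first positive example taken as-is
        else:
            h = [hv if hv == av else "?"        # generalize mismatching attributes to '?'
                 for hv, av in zip(h, attrs)]
    return h

data = [                                        # negative rows use placeholder values
    (("GREEN",  "HARD", "NO",  "WRINKLED"), "YES"),
    (("GREEN",  "HARD", "YES", "SMOOTH"),   "NO"),
    (("BROWN",  "SOFT", "NO",  "WRINKLED"), "NO"),
    (("ORANGE", "HARD", "NO",  "WRINKLED"), "YES"),
    (("GREEN",  "SOFT", "YES", "SMOOTH"),   "YES"),
]
print(find_s(data))                             # -> ['?', '?', '?', '?'] for this data
```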
▪ Consider example 1 :
The data in example 1 is { GREEN, HARD, NO, WRINKLED }. We see that our initial hypothesis is more specific and
we have to generalize it for this example. Hence, the hypothesis becomes :
h = { GREEN, HARD, NO, WRINKLED }
• Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this example and our hypothesis remains the
same.
h = { GREEN, HARD, NO, WRINKLED }
• Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this example and our hypothesis remains the
same.
h = { GREEN, HARD, NO, WRINKLED }
Example: Find-S algorithm
• Consider example 4 :
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare every single attribute with
the initial data and if any mismatch is found we replace that particular attribute with a general case ( ” ? ” ). After
doing the process the hypothesis becomes :
h = { ?, HARD, NO, WRINKLED }
• Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We compare every single attribute with the
initial data and if any mismatch is found we replace that particular attribute with a general case ( ” ? ” ). After doing
the process the hypothesis becomes :
h = { ?, ?, ?, ? }
• Since we have reached a point where all the attributes in our hypothesis have the general condition, example 6 and
example 7 would result in the same hypothesis with all general attributes.
h = { ?, ?, ?, ? }
• Hence, for the given data the final hypothesis would be :
Final Hypothesis: h = { ?, ?, ?, ? }
Example 2: Find S algorithm
• Looking at the data set, we have six attributes and a final attribute that defines the positive or negative
example. In this case, yes is a positive example, which means the person will go for a walk.
Time      Weather   Temperature   Company   Humidity   Wind     Goes
Morning   Sunny     Warm          Yes       Mild       Strong   Yes
Evening   Rainy     Cold          No        Mild       Normal   No
Morning   Sunny     Moderate      Yes       Normal     Normal   Yes
Evening   Sunny     Cold          Yes       High       Strong   Yes
Step 1:
Initialize G & S as most General and specific hypothesis.
G ={'?', '?','?','?', '?','?'}
S = {'φ','φ','φ','φ','φ','φ'}
Step 2:
For each positive example, make the specific hypothesis more general.
Take the first positive instance as the initial specific hypothesis:
S = {'Morning', 'Sunny', 'Warm', 'Yes', 'Mild', 'Strong'}
• Step 3:
Compare with another positive instance for each attribute.
if (attribute value = hypothesis value) do nothing.
else
replace the hypothesis value with more general constraint '?'.
The next positive instance is instance 3. Comparing its attributes with the current hypothesis, Temperature, Humidity, and
Wind change, so we generalize those attributes.
• S = {'Morning', 'Sunny', '?', 'Yes', '?', '?'}
Step 4:
Instance 2 is negative, so for each negative example we make the general hypothesis more specific.
We compare every attribute of the negative instance with the specific hypothesis; for each attribute whose value in S is still
specific and differs from the negative instance, we add a hypothesis to G that constrains that attribute, so that G excludes
the negative example.
G = {<'Morning', '?', '?', '?', '?', '?'>, <'?', 'Sunny', '?', '?', '?', '?'>, <'?', '?', '?', 'Yes', '?', '?'>}
(Processing the remaining positive instance 4 would further generalize S to <'?', 'Sunny', '?', 'Yes', '?', '?'> and remove
<'Morning', '?', '?', '?', '?', '?'> from G.)
• Example (Candidate Elimination Algorithm): Consider instances described by three attributes: Size {Big, Small}, Colour {Red, Blue}, and Shape {Circle, Triangle}. The target concept is learned from a sequence of positive and negative training examples.
• Solution:
• S0: (0, 0, 0) Most Specific Boundary
• G0: (?, ?, ?) Most Generic Boundary
• The first example is negative, the hypothesis at the specific boundary is consistent, hence we retain it, and the
hypothesis at the generic boundary is inconsistent hence we write all consistent hypotheses by removing one
“?” at a time.
• S1: (0, 0, 0)
• G1: (Small, ?, ?), (?, Blue, ?), (?, ?, Triangle)
Continue:
The second example is negative, the hypothesis at the specific boundary is consistent, hence we retain it, and the
hypothesis at the generic boundary is inconsistent hence we write all consistent hypotheses by removing one “?” at a time.
S2: (0, 0, 0)
G2: (Small, Blue, ?), (Small, ?, Circle), (?, Blue, ?), (Big, ?, Triangle), (?, Blue, Triangle)
The third example is positive, the hypothesis at the specific boundary is inconsistent, hence we extend the specific
boundary, and the consistent hypothesis at the generic boundary is retained and inconsistent hypotheses are removed from
the generic boundary.
S3: (Small, Red, Circle)
G3: (Small, ?, Circle)
The fourth example is negative; the hypothesis at the specific boundary is consistent, hence we retain it, and the hypothesis
at the generic boundary already excludes this negative example, hence it is also retained unchanged.
S4: (Small, Red, Circle)
G4: (Small, ?, Circle)
The fifth example is positive, the hypothesis at the specific boundary is inconsistent, hence we extend the specific
boundary, and the consistent hypothesis at the generic boundary is retained and inconsistent hypotheses are removed from
the generic boundary.
S5: (Small, ?, Circle)
G5: (Small, ?, Circle)
Advantages and disadvantages:
Advantages of CEA over Find-S:
1.Improved accuracy: CEA considers both positive and negative examples to generate the hypothesis, which can result in
higher accuracy when dealing with noisy or incomplete data.
2.Flexibility: CEA can handle more complex classification tasks, such as those with multiple classes or non-linear decision
boundaries.
3.More efficient: CEA reduces the number of hypotheses by generating a set of general hypotheses and then eliminating
them one by one. This can result in faster processing and improved efficiency.
4.Better handling of continuous attributes: CEA can handle continuous attributes by creating boundaries for each attribute,
which makes it more suitable for a wider range of datasets.
Disadvantages of CEA in comparison with Find-S:
1.More complex: CEA is a more complex algorithm than Find-S, which may make it more difficult for beginners or those
without a strong background in machine learning to use and understand.
2.Higher memory requirements: CEA requires more memory to store the set of hypotheses and boundaries, which may
make it less suitable for memory-constrained environments.
3.Slower processing for large datasets: CEA may become slower for larger datasets due to the increased number of
hypotheses generated.
4.Higher potential for overfitting: The increased complexity of CEA may make it more prone to overfitting on the training
data, especially if the dataset is small or has a high degree of noise.
Issues in machine learning:
1. Bias and Fairness:
Issue: Bias in training data can lead to discriminatory or unfair predictions, disproportionately affecting certain groups.
Example: A hiring model trained on historical data might unfairly favor male candidates if the past hiring decisions
were biased towards male applicants.
2. Data Quality and Quantity:
Issue: Inaccurate or insufficient data can lead to poor model performance.
Example: A weather forecasting model trained on incomplete or incorrect weather data might struggle to accurately
predict future weather patterns.
3. Overfitting and Underfitting:
Issue: Overfitting occurs when a model captures noise in the training data and doesn't generalize well to new data,
while underfitting is when the model is too simple to capture underlying patterns.
Example: An overfitted spam email classifier memorizes specific words in the training data, leading to poor
performance on new emails.
Issues in machine learning with example:
4. Interpretable Models:
Issue: Complex models like deep neural networks can be difficult to interpret, making it challenging to understand why
a certain prediction was made.
Example: A medical diagnosis model based on a neural network might accurately diagnose patients, but doctors may
struggle to explain the reasoning behind the predictions.
5. Feature Engineering:
Issue: Selecting relevant features and engineering them properly is crucial for model performance.
Example: Building a sentiment analysis model for movie reviews requires identifying and representing important
features like sentiment-bearing words.
6. Computational Resources:
Issue: Training large and complex models can be computationally expensive and require powerful hardware.
Example: Training a deep learning model for image recognition might require specialized GPUs to process the vast
amount of data efficiently.
7. Scalability:
Issue: Adapting machine learning solutions to handle large datasets or real-time applications can be challenging.
Example: An e-commerce recommendation system needs to quickly process user interactions and adjust
recommendations in real-time as more users interact with the platform.
Issues in machine learning with example:
8. Ethical Considerations:
Issue: Machine learning applications can raise ethical concerns, such as privacy violations or biased decision-making.
Example: An AI-powered lending model might unfairly deny loans to certain demographic groups, perpetuating
historical biases.
9. Model Robustness:
Issue: Models can be sensitive to small changes in input data, making them susceptible to adversarial attacks.
Example: An autonomous vehicle's image recognition system might misinterpret a small sticker on a stop sign, leading
to a potentially dangerous situation.
10. Continual Learning:
Issue: Traditional models might struggle to adapt to new data over time without forgetting previous knowledge.
Example: A language translation model needs to continuously learn and incorporate new language patterns as
languages evolve.
Machine learning vs data science:
Focus: Machine Learning builds algorithms that learn from data and make predictions or decisions; Data Science extracts insights from data to inform decisions and strategies.
Techniques: Machine Learning uses decision trees, neural networks, support vector machines, etc.; Data Science uses statistical analysis, data visualization, data preprocessing, etc.
Role: Machine Learning is a specialized subset of data science; Data Science is a broad field encompassing various activities.
Data Utilization: Machine Learning utilizes data to train and improve models; Data Science utilizes data for analysis and decision-making.
Integration: Machine Learning is used within data science for predictive modeling; Data Science is part of the broader data analysis process.
Dependency: Machine Learning requires data for training and evaluation; Data Science requires data for analysis and insights.
Quiz:
Q.1 In supervised learning, what type of data does the model learn from?
a) Unlabeled data
b) Labeled data
c) Noisy data
d) Both labeled and unlabeled data
Q.2 What type of data does the model learn from in unsupervised learning?
a) Labeled data
b) Noisy data
c) Unlabeled data
d) Both labeled and unlabeled data
Q.3 What does semi-supervised learning utilize?
a) Only labeled data
b) Only unlabeled data
c) Both labeled and unlabeled data
d) Noisy data
Quiz:
Q.4 In reinforcement learning, how does the model learn to make decisions?
a) By receiving labeled data
b) By interacting with an environment and receiving feedback
c) By clustering data points
d) By memorizing patterns in the data
Q.5 What is the primary characteristic of unsupervised learning?
a) The model learns from both labeled and unlabeled data, using a combination of supervised and unsupervised
techniques.
b) Input data is labeled, and the model learns to map input to output based on provided examples.
c) Input data is not labeled, and the model learns to find patterns or structure in the data.
d) The model interacts with an environment, receiving feedback in the form of rewards or penalties.
Q.6 What is an example of supervised learning?
a) Clustering
b) Reinforcement learning
c) Classification
d) Dimensionality reduction
Quiz:
Q.7 What is transfer learning in machine learning?
a) It refers to transferring data between different devices.
b) It involves transferring knowledge from one machine learning task to another.
c) It is the process of transferring data from a local machine to a cloud server.
d) It involves transferring data from one domain to another without any modifications.
Q.8 What is a key characteristic of deep learning algorithms?
a) They require a small amount of data to train effectively.
b) They only work with shallow neural networks.
c) They involve the use of multiple layers to learn hierarchical representations of data.
d) They are not suitable for processing unstructured data.
Q.9 What is a characteristic of a well-defined learning problem?
a) Ambiguity in the desired output
b) Lack of available data
c) Clear specification of input and output
d) Complexity beyond current technology
Quiz:
Q.10 What is an example of bias in machine learning?
a) Selecting a model that is too simple
b) Selecting a model that is too complex
c) Failing to consider certain features that are important for prediction
d) Fitting the training data too closely
Q.11 What is the main goal of the Candidate Elimination Algorithm?
a) To find the best hyperparameters for a model
b) To eliminate candidates for the final model based on their performance
c) To identify the most suitable algorithm for a given dataset
d) To incrementally update the version space based on observed data
Q.12 What does inductive bias refer to in machine learning?
a) The inherent limitations of the learning algorithm
b) The bias introduced by the data collection process
c) The bias towards simpler models
d) The bias towards complex models
Quiz:
Q.13 What does sensitivity analysis in machine learning involve?
a) Analyzing the sensitivity of a model's predictions to changes in its parameters
b) Analyzing the sensitivity of the dataset
c) Sensing the environment for data
d) Analyzing the sensitivity of the loss function
Q.14 What is underfitting in machine learning?
a) When a model performs well on the training data but poorly on unseen data
b) When a model performs poorly on both the training and unseen data
c) When a model is too complex and captures noise in the training data
d) When a model fails to capture the underlying patterns in the data
Q.15 What is overfitting in machine learning?
a) The model fits the training data too closely and fails to generalize well to unseen test data.
b) The model's predictions are highly sensitive to small changes in the training data, leading to inaccurate
performance.
c) The model performs poorly on both the training and unseen test data due to underfitting.
d) The model tends to oversimplify the underlying patterns in the data, resulting in biased predictions.
References:
1. https://data-flair.training/blogs/machine-learning-tutorial/
2. https://www.javatpoint.com/supervised-machine-learning
3. https://towardsdatascience.com/unsupervised-machine-learning-example-
in-keras-8c8bf9e63ee0
4. https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-
semisupervised-learning
5. https://techvidvan.com/tutorials/reinforcement-
learning/#:~:text=What%20is%20Reinforcement%20Learning?
6. https://www.javatpoint.com/transfer-learning-in-machine-learning
7. https://john.sisler.info/resume/deep-learning-specialization/neural-
networks-and-deep-learning
8. https://www.simplilearn.com/tutorials/machine-learning-
tutorial/classification-in-machine-learning
9. https://www.javatpoint.com/bias-and-variance-in-machine-learning
10. https://www.superannotate.com/blog/overfitting-and-underfitting-in-
machine-learning
References:
11. https://www.geeksforgeeks.org/ml-find-s-algorithm/
12. https://www.edureka.co/blog/find-s-algorithm-in-machine-learning/
13. https://www.geeksforgeeks.org/ml-candidate-elimination-algorithm/?ref=gcse
14. https://www.getwayssolution.com/2019/12/candidate-elimination-algorithm-concept.html