DMI UNIT 4
❖ DISADVANTAGES:
1) A decision tree with many layers becomes complex and hard to interpret.
2) It may suffer from overfitting, which can be mitigated by using the Random Forest
algorithm.
3) As the number of class labels grows, the computational complexity of building the
decision tree may increase.
4) Instability to small variations: Decision Trees are sensitive to small variations in
the data. A small change in the training data can result in a significantly different
tree structure, which can reduce the stability and reliability of the model.
❖ APPLICATIONS:
1) Medical Diagnosis: Decision Trees are used in healthcare for diagnosing
diseases based on symptoms and patient characteristics. They can help doctors
make decisions by analyzing patient data and suggesting potential diagnoses or
treatment plans.
2) Credit Risk Assessment: Financial institutions use Decision Trees to assess the
creditworthiness of loan applicants. By analyzing factors like credit history,
income, and debt, Decision Trees can predict the likelihood of a borrower
defaulting on a loan.
3) Customer Relationship Management: Decision Trees are employed in customer
relationship management (CRM) systems to analyze customer data and predict
customer behavior. This can include predicting customer churn, identifying
upsell opportunities, or determining the most effective marketing strategies.
4) Fraud Detection: Decision Trees are utilized in fraud detection systems across
various industries, including banking, insurance, and e-commerce. By analyzing
transaction data and user behavior, Decision Trees can identify patterns
indicative of fraudulent activity, helping to prevent financial losses and protect
against cybercrime.
❖ TREE PRUNING
1) Pruning is a process of deleting the unnecessary nodes from a tree in order to
get the optimal decision tree.
2) A too-large tree increases the risk of overfitting, and a small tree may not
capture all the important features of the dataset. Therefore, a technique that
decreases the size of the learning tree without reducing accuracy is known as
Pruning.
3) Such methods typically use statistical measures to remove the least-reliable
branches.
4) Pruned trees tend to be smaller and less complex and, thus, easier to
comprehend. They are usually faster and better at correctly classifying
independent test data (i.e., of previously unseen tuples) than unpruned trees.
5) In the prepruning approach, a tree is “pruned” by halting its construction early
(e.g., by deciding not to further split or partition the subset of training tuples
at a given node). Upon halting, the node becomes a leaf. The leaf may hold
the most frequent class among the subset tuples or the probability distribution
of those tuples.
6) The second and more common approach is postpruning, which removes
subtrees from a “fully grown” tree. A subtree at a given node is pruned by
removing its branches and replacing it with a leaf. The leaf is labeled with the
most frequent class among the tuples of the subtree being replaced.
7) Two pruning techniques are commonly used (a brief code sketch illustrating
cost-complexity pruning follows this list):
▪ Cost Complexity Pruning: It's a method where the tree is pruned by
removing nodes with the least impact on overall accuracy, based on a cost-
complexity measure, which balances tree simplicity and accuracy.
▪ Reduced Error Pruning: This method involves iteratively removing
branches of the decision tree that don't significantly improve accuracy on a
separate validation dataset, aiming to reduce overfitting by simplifying the
tree.
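For illustration, a minimal sketch of cost-complexity pruning using scikit-learn's ccp_alpha parameter (the dataset and the use of the test split for choosing alpha are simplifications made for brevity; a separate validation set would normally be used):

    # Minimal sketch of cost-complexity (post-)pruning with scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Compute the pruning path of a fully grown tree: each ccp_alpha value
    # corresponds to a progressively smaller (more heavily pruned) subtree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    # Refit one tree per alpha and keep the one that does best on held-out data.
    best_alpha, best_score = 0.0, 0.0
    for alpha in path.ccp_alphas:
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
        score = tree.score(X_test, y_test)
        if score > best_score:
            best_alpha, best_score = alpha, score

    print("best alpha:", best_alpha, "held-out accuracy:", best_score)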
❖ DISADVANTAGES OF RULE-BASED CLASSIFICATION
• Complexity in Rule Generation: Creating effective rules often requires domain
expertise and significant manual effort to analyze and understand the data. In
complex domains, the number of rules generated can be very large, leading to
maintenance challenges.
• Rule Redundancy and Overfitting: Rule-based systems can suffer from
overfitting, where rules are overly specific to the training data and do not
generalize well to unseen data. This can result in redundant rules that describe
noise or outliers in the data.
• Interpretability vs. Performance Trade-off: While rule-based systems are often
praised for their interpretability, overly complex rule sets can become difficult to
interpret. Additionally, simpler rule sets may sacrifice predictive performance.
• Difficulty in Handling Noisy Data: Rule-based systems are sensitive to noisy
data, as they may generate rules that describe outliers or irrelevant patterns.
Preprocessing steps such as data cleaning and outlier detection are often
necessary to improve the quality of rules generated.
❖ APPLICATIONS OF RULE-BASED CLASSIFICATION
1. Expert Systems:
• Rule-based systems are widely used in expert systems, which are computer
programs that mimic the decision-making ability of a human expert in a
specific domain.
• Expert systems use rules to encode the knowledge and expertise of domain
experts, allowing them to make decisions, provide recommendations,
diagnose problems, or solve complex problems.
2. Business Rules Management:
• Rule-based algorithms are used in business rules management systems
(BRMS), which are software tools that enable organizations to define,
manage, and execute business rules.
• BRMS allow businesses to automate decision-making processes, enforce
business policies and regulations, and ensure consistency and compliance
across various business operations.
3. Medical Diagnosis:
• Rule-based systems are applied in medical diagnosis and decision support
systems to assist healthcare professionals in diagnosing diseases, selecting
appropriate treatments, and making clinical decisions.
• These systems use rules to interpret patient data, symptoms, medical history,
and test results, helping doctors to make informed decisions and improve
patient outcomes.
4. Natural Language Processing (NLP):
• Rule-based algorithms are utilized in natural language processing (NLP)
applications for tasks such as text classification, information extraction,
sentiment analysis, and question answering.
• Rule-based approaches allow developers to define grammatical rules,
patterns, and linguistic rules to analyze and process natural language text,
enabling the extraction of meaningful information and insights from textual
data.
4.6 Support vector machines
1) Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
2) The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional space that can separate the data points in different classes in the
feature space.
3) The hyperplane is chosen so that the margin between it and the closest points of the
different classes is as large as possible.
4) The dimension of the hyperplane depends upon the number of features. If the
number of input features is two, then the hyperplane is just a line. If the number of
input features is three, then the hyperplane becomes a 2-D plane.
o Linear SVM: Linear SVM is used for linearly separable data, i.e., data that can be
divided into two classes by a single straight line; the corresponding classifier is
called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, i.e., data
that cannot be separated by a single straight line; the corresponding classifier is
called a Non-linear SVM classifier.
In a 2-D space, such classes can be separated by a single straight line, but many
different lines could separate them, so the question is which line is best.
Hence, the SVM algorithm finds the best line or decision boundary; this best boundary is
called a hyperplane. SVM identifies the points of both classes that lie closest to the
boundary; these points are called support vectors. The distance between the support
vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this
margin. The hyperplane with the maximum margin is called the optimal hyperplane, as the
sketch below illustrates.
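As a rough illustration (a minimal sketch; the toy data, the C value, and the use of scikit-learn's SVC are assumptions made for brevity), the support vectors, the weight vector, and the resulting margin width of a linear SVM can be inspected as follows:

    # Minimal sketch: fit a linear SVM and inspect its support vectors and margin.
    import numpy as np
    from sklearn.svm import SVC

    # Two small, linearly separable clusters (toy data).
    X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y)

    print("support vectors:", clf.support_vectors_)   # the points closest to the boundary
    w = clf.coef_[0]                                   # hyperplane: w . x + b = 0
    print("w:", w, "b:", clf.intercept_[0])
    print("margin width:", 2.0 / np.linalg.norm(w))    # margin of a linear SVM is 2 / ||w||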
Non-Linear SVM:
1) If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line.
2) linear SVMs can be extended to create nonlinear SVMs for the classification
of linearly inseparable data (also called nonlinearly separable data, or
nonlinear data for short). Such SVMs are capable of finding nonlinear
decision boundaries (i.e., nonlinear hypersurfaces) in input space.
3) We obtain a nonlinear SVM by extending the approach for linear SVMs as
follows: In the first step, we transform the original input data into a higher
dimensional space using a nonlinear mapping.
Once the data have been transformed into the new higher space, the second
step searches for a linear separating hyperplane in the new space.
4) To separate such data points, we add one more dimension. For linear data we
used two dimensions, x and y; for non-linear data we add a third dimension z,
calculated as z = x^2 + y^2.
5) After adding this third dimension, points that were not separable by a line in
2-D become separable in the higher-dimensional space.
6) SVM can then find a linear separating hyperplane (a plane) between the classes
in this 3-D space.
7) Because we are now in 3-D space, the separator is a plane parallel to the x-axis;
projected back into the original 2-D input space (for example at z = 1), this plane
corresponds to a circular, non-linear decision boundary (see the sketch below).
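A minimal sketch of this idea (the toy concentric-circles data and the scikit-learn calls are assumptions chosen only for illustration): the explicit mapping z = x^2 + y^2 makes the data linearly separable in 3-D, and an RBF-kernel SVM achieves a comparable effect implicitly.

    # Minimal sketch: non-linear SVM via an explicit mapping vs. the kernel trick.
    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    # (a) Explicit mapping: add the third dimension z = x^2 + y^2;
    #     a *linear* SVM can then separate the classes in 3-D space.
    z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
    X3 = np.hstack([X, z])
    print("linear SVM after adding z:", SVC(kernel="linear").fit(X3, y).score(X3, y))

    # (b) Kernel trick: an RBF kernel performs a similar non-linear mapping implicitly.
    print("RBF-kernel SVM on raw 2-D data:", SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y))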
❖ ADVANTAGES
1. Effective in High-Dimensional Spaces: SVMs perform well in high-
dimensional spaces, making them suitable for classification tasks where the
number of features exceeds the number of samples. This capability makes
SVMs particularly useful in text classification, image recognition, and
bioinformatics.
2. Memory Efficient: SVMs only use a subset of training points (support vectors)
to define the decision boundary, which makes them memory efficient,
especially when dealing with large datasets.
3. Versatile Kernel Functions: SVMs can use different kernel functions to
transform input data into higher-dimensional feature spaces. This flexibility
allows SVMs to capture complex relationships between data points, making
them effective in handling non-linear classification tasks.
4. Robust to Overfitting: SVMs have regularization parameters that help control
the trade-off between maximizing the margin and minimizing classification
errors. This regularization makes SVMs robust to overfitting, especially in
cases of noisy or small datasets.
❖ DISADVANTAGES
1. Sensitivity to Noise and Outliers: SVMs are sensitive to noise and outliers in the
training data, as they aim to maximize the margin between classes. Outliers or
mislabeled instances can significantly affect the decision boundary and lead to
poor generalization performance.
2. Black Box Model: SVMs produce complex decision boundaries, especially in
higher-dimensional feature spaces or when using non-linear kernels. As a result,
they can be challenging to interpret and understand compared to simpler models
like decision trees or logistic regression.
3. Computationally Intensive for Large Datasets: Training an SVM on a large
dataset can be computationally intensive, especially when using non-linear
kernels or when the number of features is high. Training time typically grows
between O(n^2) and O(n^3) in the number of training instances n, and prediction
cost grows with the number of support vectors.
4. Memory Intensive: SVMs store all support vectors in memory, which can be
memory-intensive for datasets with a large number of support vectors. This can
lead to scalability issues, particularly when dealing with datasets that cannot fit
into memory.
❖ APPLICATIONS
1. Text Classification:
• SVMs are widely used in natural language processing (NLP) for tasks such
as text classification, sentiment analysis, and spam detection.
• In text classification, SVMs can effectively classify documents into
different categories (e.g., news articles, emails) based on the presence of
certain keywords or features.
2. Image Recognition:
• SVMs are employed in computer vision applications for image recognition
and object detection tasks.
• SVMs can classify images into different categories (e.g., recognizing
handwritten digits, identifying objects in photographs) by learning
discriminative features from image data.
3. Bioinformatics:
• SVMs are utilized in bioinformatics for tasks such as protein classification,
gene expression analysis, and disease diagnosis.
• SVMs can analyze biological data (e.g., DNA sequences, gene expression
profiles) and classify samples into different classes (e.g., healthy vs.
diseased) based on relevant features.
4. Financial Forecasting:
• SVMs are applied in financial forecasting and stock market analysis to
predict stock price movements, identify trading signals, and detect
anomalies.
• SVMs can analyze financial data (e.g., stock prices, trading volumes) and
predict future trends or patterns, helping traders and investors make
informed decisions.
❖ WORKING OF A GENETIC ALGORITHM
1. Initialization
A genetic algorithm starts by generating a set of individuals, called the population.
Each individual is a candidate solution to the given problem and is characterized by a
set of parameters called genes. The genes are joined into a string to form a chromosome,
which encodes the solution. One of the most popular initialization techniques is to use
random binary strings.
2. Fitness Assignment
The fitness function determines how fit an individual is, i.e., how well it can compete
with other individuals. In every iteration, individuals are evaluated by the fitness
function, which assigns each individual a fitness score. This score determines the
probability of being selected for reproduction: the higher the fitness score, the greater
the chance of being selected.
3. Selection
The selection phase chooses individuals for the reproduction of offspring. The selected
individuals are arranged in pairs, and each pair passes its genes on to the next
generation.
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this
step, the genetic algorithm uses two variation operators that are applied to the parent
population. The two operators involved in the reproduction phase are given below:
o Crossover: Crossover plays the most significant role in the reproduction phase
of a genetic algorithm. A crossover point is selected at random within the genes,
and the crossover operator swaps the genetic information of two parents from the
current generation to produce a new individual representing the offspring.
The genes of the parents are exchanged up to the crossover point, and the newly
generated offspring are added to the population. This process is also called
recombination or crossover. Common crossover styles include:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Arithmetic crossover
o Mutation: The mutation operator inserts random genes into the offspring (new child)
to maintain diversity in the population. It can be done by flipping some bits in the
chromosome. Mutation helps solve the problem of premature convergence and enhances
diversification.
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
5. Termination
After the reproduction phase, a stopping criterion determines when the algorithm
terminates, for example once a threshold fitness is reached or a maximum number of
generations has passed. The best individual in the final population is returned as the
final solution.
❖ Pseudocode (summary): 1) Randomly initialize a population p; 2) evaluate the fitness
of each individual; 3) repeatedly apply selection, crossover, and mutation, re-evaluating
fitness, until the termination criterion is met. A minimal Python sketch of this loop
follows.
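As a rough illustration of the full loop above, here is a minimal sketch (the population size, mutation rate, and the toy "maximize the number of 1-bits" fitness function are assumptions chosen for brevity):

    # Minimal genetic-algorithm sketch: evolve bit strings to maximize the number of 1s.
    import random

    GENES, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.01

    def fitness(chromosome):                     # fitness assignment
        return sum(chromosome)

    def select(population):                      # tournament selection of one parent
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(parent1, parent2):             # one-point crossover
        point = random.randint(1, GENES - 1)
        return parent1[:point] + parent2[point:]

    def mutate(chromosome):                      # flip-bit mutation
        return [1 - g if random.random() < MUTATION_RATE else g for g in chromosome]

    # 1) Initialization: a population of random binary strings.
    population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

    # 2)-4) Repeat selection, crossover and mutation until the termination criterion
    # (here simply a fixed number of generations) is met.
    for _ in range(GENERATIONS):
        population = [mutate(crossover(select(population), select(population)))
                      for _ in range(POP_SIZE)]

    best = max(population, key=fitness)
    print("best fitness:", fitness(best))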
❖ ADVANTAGES
1. Parallelism: Genetic algorithms can be parallelized easily, allowing multiple
candidate solutions to be evaluated simultaneously. This parallelism can lead to
faster convergence and better utilization of computational resources.
2. Iterative Improvement: Genetic Algorithms iteratively evolve a population of
candidate solutions over generations, gradually improving the quality of
solutions over time. This iterative process allows for continual refinement until
satisfactory solutions are found.
3. No Need for Derivative Information: Unlike some optimization methods that
rely on derivative information, Genetic Algorithms do not require such
information. This property makes them suitable for problems where derivative
information is unavailable, noisy, or computationally expensive to obtain.
4. Versatility: Genetic Algorithms can optimize various types of problems,
including discrete functions, multi-objective problems, and continuous functions.
This versatility makes them applicable to a wide range of optimization tasks
across different domains.
❖ DISADVANTAGES
1. Inefficiency for Simple Problems: Genetic Algorithms may not be the most
efficient choice for solving straightforward problems with simple solution spaces.
Their exploration of the solution space and iterative process may be overkill for
problems that have known, straightforward solutions.
2. No Guarantee of Quality Solution: Genetic Algorithms do not guarantee
finding the optimal or best solution to a problem. Due to their stochastic nature
and reliance on probabilistic mechanisms, there is no assurance that the final
solution obtained will be the best possible solution. The quality of the solution
depends on factors such as population size, genetic operators, and termination
criteria.
3. Repetitive Fitness Evaluations: Genetic Algorithms require evaluating the
fitness of individuals in the population repeatedly, which can be computationally
expensive for complex fitness functions or large populations. This repetitive
calculation of fitness values can lead to computational challenges, especially
when dealing with resource-intensive fitness evaluations or when optimizing
real-time systems.
4. Domain Knowledge Required: Genetic algorithms often require domain
knowledge and problem-specific expertise to design effective fitness functions,
choose appropriate genetic operators, and tune algorithm parameters. Lack of
domain knowledge may lead to suboptimal solutions or inefficient algorithm
configurations.
❖ APPLICATIONS
1. Traffic Optimization:
• Genetic algorithms can optimize traffic flow in cities by adjusting
traffic light timings.
• The algorithm considers factors like traffic volume, congestion, and
pedestrian crossings to find the most efficient timing patterns for traffic
lights, reducing congestion and travel times.
2. Design Optimization:
• Engineers use genetic algorithms to optimize the design of structures,
such as bridges or buildings.
• By defining design parameters like material strength, weight, and cost,
the algorithm searches for the best combination of these parameters to
create an optimal design that meets all requirements.
3. Financial Portfolio Optimization:
• Investors use genetic algorithms to optimize their investment
portfolios.
• The algorithm considers factors like risk tolerance, expected returns,
and correlation between assets to construct a diversified portfolio that
maximizes returns while minimizing risk.
4. Robotics and Automation:
• Genetic algorithms are used in robotics to optimize the movement and
behavior of robots.
• By defining objectives like task completion time, energy efficiency, and
obstacle avoidance, the algorithm evolves robot control strategies to
perform tasks efficiently and adapt to changing environments.
❖ COMPONENTS OF A FUZZY LOGIC SYSTEM
1. Rule Base
The rule base stores the set of rules and If-Then conditions provided by experts, which
are used to control the decision-making system.
2. Fuzzification
Fuzzification is the module or component that transforms the system inputs, i.e., it
converts crisp numbers into fuzzy values. Crisp numbers are the inputs measured by
sensors; fuzzification converts them before they are passed into the control system for
further processing. In a typical fuzzy logic system, this component maps each input
signal onto a small number of fuzzy states (for example: large positive, medium positive,
small, medium negative, and large negative).
3. Inference Engine
The inference engine determines how well the fuzzified inputs match each rule in the
rule base, fires the matching rules, and combines their results into the fuzzy output
that is passed on to defuzzification.
4. Defuzzification
Defuzzification is the component that takes the fuzzy set output produced by the
inference engine and transforms it into a crisp value. It is the last step in a fuzzy
logic system; the crisp value is the final output in a form the user can act on. Several
defuzzification techniques exist, and the designer selects the one that minimizes error
for the application; a small sketch of the widely used centroid method follows.
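As a rough illustration, a minimal sketch of centroid (centre-of-gravity) defuzzification, computed as the membership-weighted average of sampled output values (the sample values below are assumptions):

    # Minimal sketch of centroid defuzzification:
    # crisp output = sum(mu(x) * x) / sum(mu(x)) over sampled output values.

    def centroid_defuzzify(xs, memberships):
        # xs: sampled points of the output range; memberships: aggregated degree at each point.
        weighted = sum(x * mu for x, mu in zip(xs, memberships))
        total = sum(memberships)
        return weighted / total if total else 0.0

    # Assumed aggregated fuzzy output over an output range 0..10.
    xs = [0, 2, 4, 6, 8, 10]
    mu = [0.0, 0.2, 0.6, 0.9, 0.5, 0.1]
    print("crisp output:", centroid_defuzzify(xs, mu))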
❖ ALGORITHM (SIMPLIFIED):
1. Define Fuzzy Sets: Imagine categories that aren't strict yes/no. For example, a
"tall" person might be anyone above 5'10", but someone 5'9" could be "kind of
tall" too. You define fuzzy sets with a range and a "degree of membership"
(between 0 and 1) for each element.
2. Input & Fuzzification: When using the algorithm, you take an input (like a
person's height) and figure out how much it belongs to each fuzzy set ("short,"
"kind of short," "tall," etc.).
3. Rules & Processing: You define rules based on fuzzy sets (e.g., "If someone is kind
of tall and strong, they might be good at basketball"). The algorithm processes the
input through these fuzzy rules.
4. Defuzzification: Finally, the algorithm combines the fired rules and converts the
result into a usable output (e.g., a score of 0.7 for how "decent" this person would
be at basketball, based on the fuzzy rules); a small code sketch of these steps follows
this list.
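A minimal sketch of steps 1-3 using the "tall"/basketball example above (the ramp-shaped membership function, the 0-to-1 strength scale, and the single rule are assumptions made only for illustration):

    # Minimal sketch: fuzzify a height, then apply one fuzzy rule
    # "IF height is tall AND strength is high THEN basketball suitability is good".

    def tall(height_cm):
        # Degree of membership in the fuzzy set "tall" (assumed ramp from 170 cm to 185 cm).
        if height_cm <= 170:
            return 0.0
        if height_cm >= 185:
            return 1.0
        return (height_cm - 170) / 15.0

    def strong(strength):
        # Strength is assumed to be given already on a 0..1 scale.
        return max(0.0, min(1.0, strength))

    # Fuzzification of the crisp inputs.
    mu_tall = tall(178)       # "kind of tall" -> about 0.53
    mu_strong = strong(0.7)

    # Rule evaluation: fuzzy AND is commonly taken as min, so the rule fires to this degree.
    suitability = min(mu_tall, mu_strong)
    print("degree of 'good at basketball':", suitability)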
❖ Disadvantages:
1. Fuzzy logic is not suitable for problems that require high accuracy.
2. Difficulty in Formalization: Formalizing fuzzy concepts into precise mathematical
terms can be difficult. Determining appropriate membership functions and fuzzy rules
often relies on empirical knowledge or trial and error, which can be time-consuming
and uncertain.
❖ Applications:
1. Control Systems: Fuzzy logic is widely used in control systems, such as in household
appliances, automotive systems, and industrial processes, where precise control based
on imprecise inputs is necessary.
2. Pattern Recognition: Fuzzy sets are applied in pattern recognition tasks where data
might be ambiguous or noisy, such as in image processing, speech recognition, and
handwriting recognition.
3. Decision Making: Fuzzy sets are used in decision-making processes, particularly in
situations where criteria are subjective or uncertain, such as in financial analysis, risk
assessment, and medical diagnosis.
4. Natural Language Processing: Fuzzy sets play a role in natural language processing
tasks, like information retrieval, sentiment analysis, and text summarization, where the
meaning of words and phrases can be vague or context-dependent.
❖ ROUGH SETS: EXAMPLE
1. You have a basket of apples (objects) with attributes like color (red, green)
and ripeness (ripe, unripe).
2. The algorithm groups them by color (red apples, green apples).
3. But some red apples might be unripe (uncertain outcome).
4. It then defines a core of clearly ripe red apples and a boundary of uncertain
red apples.
5. Finally, it might create a rule: "If an apple is red, it's likely ripe" (considering
the core).
❖ ALGORITHM:
1) Data Organization: You have data with objects (things you're analyzing) and
attributes (characteristics of those objects). These attributes can be anything
from color (red, blue) to size (small, large).
2) Grouping by Similarities: The algorithm groups objects based on their shared
attributes. This grouping helps identify patterns and potential relationships.
3) Defining "Roughness": Not all objects in a group might have the same outcome
(decision). The algorithm creates two areas: a definite core (objects with clear
outcomes) and a boundary region (objects with uncertain outcomes).
4) Extracting Rules: By analyzing the core and boundary regions, the algorithm can
identify "if-then" rules that connect specific attributes to certain outcomes.
These rules can be used for predictions despite some data uncertainty (see the
sketch after this list).
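A minimal sketch of these steps on the apple example (the toy data and the extra "spots" attribute are assumptions, added so that both a non-empty core and a boundary region appear):

    # Minimal rough-set sketch: lower/upper approximation of the concept "ripe".
    apples = [
        {"id": 1, "color": "red",   "spots": "no",  "ripe": True},
        {"id": 2, "color": "red",   "spots": "no",  "ripe": True},
        {"id": 3, "color": "red",   "spots": "yes", "ripe": False},
        {"id": 4, "color": "red",   "spots": "yes", "ripe": True},   # indiscernible from apple 3
        {"id": 5, "color": "green", "spots": "no",  "ripe": False},
    ]

    # Group objects that are indiscernible on the condition attributes (color, spots).
    classes = {}
    for a in apples:
        classes.setdefault((a["color"], a["spots"]), []).append(a)

    ripe_ids = {a["id"] for a in apples if a["ripe"]}

    lower, upper = set(), set()
    for group in classes.values():
        ids = {a["id"] for a in group}
        if ids <= ripe_ids:      # every object in the class is ripe -> definite core
            lower |= ids
        if ids & ripe_ids:       # at least one ripe object -> possibly ripe
            upper |= ids

    print("lower approximation (certainly ripe):", lower)    # {1, 2}
    print("boundary region (uncertain):", upper - lower)     # {3, 4}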
❖ Advantages:
1. Handles Uncertainty: Rough sets can handle situations where data has missing
values or isn't perfectly clear-cut. It works well with real-world data that often
isn't pristine.
2. Reduces Data Dimensionality: By identifying irrelevant information, rough
sets can simplify complex datasets. This makes analysis faster and helps focus
on the key factors.
3. Knowledge Discovery: Rough sets can help uncover hidden patterns and
relationships within data. This can be useful for decision making and
prediction.
4. Easy to Understand: Compared to some complex data analysis techniques,
rough set theory has a relatively intuitive framework.
❖ Disadvantages:
1. Computational Complexity: While simpler than some algorithms, rough set
computations can become demanding with very large datasets.
2. Limited to Categorical Data: Rough sets primarily work with categorical data
(e.g., colors, sizes) and might not be ideal for continuous data (e.g.,
temperature, weight).
3. Interpretation Challenges: Interpreting the results of rough set analysis can
require some expertise in the specific application domain.
4. May Not Capture All Uncertainties: While rough sets handle some types of
uncertainty, they might not be suitable for all situations with complex data
inconsistencies.
❖ Applications:
1. Feature Selection: Rough sets can help identify the most relevant features
(variables) in a dataset for tasks like classification or prediction. This
improves model performance and reduces training time.
2. Medical Diagnosis: By analyzing patient data, rough sets can aid in medical
diagnosis by identifying patterns and relationships between symptoms and
diseases.
3. Decision Support Systems: Rough sets can be integrated into systems that
recommend courses of action based on learned patterns from past data. This
can be helpful in various domains like finance, marketing, and engineering.
4. Fault Detection: In manufacturing and other technical fields, rough sets can
be used to analyze sensor data and identify potential equipment failures before
they occur.
4.7 K-Nearest neighbor classifier
1. K-Nearest Neighbor is one of the simplest Machine Learning algorithms based
on Supervised Learning technique.
2. The K-NN algorithm assumes similarity between the new case/data and the available
cases and puts the new case into the category that is most similar to the available
categories.
3. The K-NN algorithm can be used for Regression as well as Classification, but it is
mostly used for Classification problems.
4. It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs computation
only at classification time.
5. At the training phase, the KNN algorithm just stores the dataset; when it gets new
data, it classifies that data into the category most similar to the new data.
6. When given an unknown tuple, a k-nearest-neighbor classifier searches the
pattern space for the k training tuples that are closest to the unknown tuple.
7. “Closeness” is defined in terms of a distance metric, such as Euclidean distance
8. For k-nearest-neighbor classification, the unknown tuple is assigned the most
common class among its k-nearest neighbors. When k = 1, the unknown tuple
is assigned the class of the training tuple that is closest to it in pattern space.
9. The K-NN algorithm works by finding the K nearest neighbors to a given data
point based on a distance metric, such as Euclidean distance. The class or value
of the data point is then determined by the majority vote or average of the K
neighbors. This approach allows the algorithm to adapt to different patterns and
make predictions based on the local structure of the data.
How does K-NN work?
The working of K-NN can be summarized as follows: (1) choose the number K of neighbors;
(2) compute the distance (e.g., Euclidean distance) from the new data point to every
training tuple; (3) take the K training tuples closest to the new point; (4) assign the
new point to the class that is most common among these K neighbors (or, for regression,
take the average of their values). A minimal from-scratch sketch follows.
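A minimal from-scratch sketch of this procedure (the toy data and K = 3 are assumptions; in practice a library implementation such as scikit-learn's KNeighborsClassifier would usually be used):

    # Minimal k-nearest-neighbor sketch: Euclidean distance + majority vote.
    import math
    from collections import Counter

    def knn_predict(train, new_point, k=3):
        # train: list of (feature_tuple, class_label) pairs.
        distances = [(math.dist(features, new_point), label) for features, label in train]
        neighbors = sorted(distances)[:k]                 # the k closest training tuples
        votes = Counter(label for _, label in neighbors)  # majority vote among neighbors
        return votes.most_common(1)[0][0]

    # Toy training data and an unknown tuple (assumed values).
    train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
    print(knn_predict(train, (1.1, 1.0), k=3))            # -> "A"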
❖ Advantages:
1. Simplicity: KNN is incredibly easy to understand and implement. It doesn't involve
complex model training, making it accessible to beginners.
2. Interpretability: The predictions made by KNN are easy to interpret. You can see which
neighbors influenced the outcome, providing insights into the decision process.
3. Robust to Noise: KNN can handle noisy data to some extent, as outliers might not
significantly impact the final prediction if enough neighbors are considered.
4. Adapts to New Data: Since KNN doesn't require explicit training, it can seamlessly
incorporate new data points without retraining the entire model. This is beneficial for
situations where data keeps evolving.
❖ Disadvantages:
1. Curse of Dimensionality: KNN's performance can suffer in high-dimensional data (many
features). Calculating distances in a space with many dimensions becomes inefficient.
2. High Computational Cost: For large datasets, KNN can be computationally expensive during
prediction. It needs to calculate distances to all data points for each new prediction.
3. Sensitive to Choice of K: Choosing the optimal value for K (number of neighbors) is crucial
for KNN's effectiveness. A poor choice of K can lead to overfitting or underfitting.
4. Data Storage Requirements: KNN needs to store the entire training dataset for prediction.
This can be a challenge for very large datasets.
❖ Applications: