IDA Unit-4
OBJECT SEGMENTATION
Object Segmentation: Regression vs Segmentation – Supervised and Unsupervised Learning, Tree Building – Regression, Classification, Overfitting, Pruning and Complexity, Multiple Decision Trees, etc.
Time Series Methods: ARIMA, Measures of Forecast Accuracy, the STL approach, extracting features from a generated model (such as height and average energy) and analyzing them for prediction.
Object segmentation
Object segmentation in data analytics involves using techniques from computer vision and machine
learning to analyze images and extract information about the objects within them. The goal is to automatically
identify and isolate individual objects or regions of interest within an image or video.
Object segmentation can be applied to a wide range of industries and use cases. For example, in healthcare,
it can be used to analyze medical images and identify abnormalities or lesions in the body. In agriculture, it can be
used to identify and track the growth of crops and monitor the health of plants. In manufacturing, it can be used to
detect defects in products and ensure quality control.
There are several techniques that can be used for object segmentation in data analytics. Traditional
computer vision methods include thresholding, edge detection, and region growing. More advanced techniques use
machine learning algorithms such as convolutional neural networks (CNNs) and deep learning frameworks like
Mask R-CNN.
In CNN-based object segmentation, the neural network is trained on a large dataset of labeled images,
where the objects of interest are annotated with bounding boxes or pixel-level masks. The network learns to detect
and segment objects based on features learned from the training data.
Mask R-CNN, on the other hand, is a deep learning framework that combines object detection and instance
segmentation. This allows it to detect and classify multiple objects within an image and provide a precise
segmentation mask for each object.
Overall, object segmentation in data analytics can help organizations extract valuable insights and
information from visual data, leading to improved decision-making and better outcomes.
Regression vs Segmentation
Regression and segmentation are two different techniques used in data analytics for different types of data analysis tasks.
Regression analysis is a statistical technique used to model the relationship between a dependent variable
and one or more independent variables. The goal of regression analysis is to identify the nature and strength of the
relationship between the variables, and to make predictions about the dependent variable based on the values of the
independent variables.
Regression can be used for a variety of tasks, including predicting sales or revenue based on marketing
spend, estimating the impact of a particular factor on customer satisfaction, or forecasting future trends based on
historical data.
Segmentation, on the other hand, involves dividing a larger dataset into smaller groups or segments based
on similarities or differences in the data. The goal of segmentation is to identify patterns and relationships within the
data that may not be apparent when analyzing the dataset as a whole.
Segmentation can be used for a variety of tasks, including market research, customer profiling, and product
development. For example, a company may use segmentation to identify different customer groups based on
demographic or behavioral factors, and then tailor their marketing strategies to each group.
While regression and segmentation are both important techniques in data analytics, they are used for different types of analysis tasks. Regression is used to analyze the relationship between a dependent variable and one or more independent variables, while segmentation is used to find patterns and relationships within a larger dataset.
Supervised learning
Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Basically, supervised learning is when we teach or train the machine using data that is well labelled, which means some of the data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.
For instance, suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine on all the different fruits, one by one, like this:
If the shape of the object is rounded with a depression at the top and it is red in colour, it will be labelled as Apple.
If the shape of the object is a long curving cylinder with a green-yellow colour, it will be labelled as Banana.
Now suppose that, after training on the data, you are given a new fruit from the basket, say a banana, and asked to identify it. Since the machine has already learned from the previous data, it now uses that knowledge: it first classifies the fruit by its shape and colour, confirms the fruit name as Banana, and puts it in the Banana category. Thus the machine learns from the training data (the basket of fruit) and then applies that knowledge to the test data (the new fruit).
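Below is a minimal, hypothetical sketch of this fruit example using a decision tree classifier from scikit-learn; the numeric shape and colour encodings are invented purely for illustration.

    # Hypothetical sketch of the fruit example (features are invented for illustration):
    # shape: 0 = round, 1 = long cylinder; colour: 0 = red, 1 = green-yellow.
    from sklearn.tree import DecisionTreeClassifier

    X_train = [[0, 0], [0, 0], [1, 1], [1, 1]]   # labelled training fruits
    y_train = ["Apple", "Apple", "Banana", "Banana"]

    model = DecisionTreeClassifier().fit(X_train, y_train)

    # A new, unseen fruit that is long and green-yellow should be labelled Banana.
    print(model.predict([[1, 1]]))               # ['Banana']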
Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is already tagged
with the correct answer.
Types:-
Regression
Logistic Regression
Classification
Naive Bayes Classifiers
K-NN (k nearest neighbors)
Decision Trees
Support Vector Machine
Advantages:-
Supervised learning allows collecting data and produces data output from previous experiences.
Helps to optimize performance criteria with the help of experience.
Supervised machine learning helps to solve various types of real-world computation problems.
It performs classification and regression tasks.
It allows estimating or mapping the result to a new sample.
We have complete control over choosing the number of classes we want in the training data.
Disadvantages:-
Unsupervised learning
Unsupervised learning is the training of a machine using information that is neither classified nor labeled
and allowing the algorithm to act on that information without guidance. Here the task of the machine is to group
unsorted information according to similarities, patterns, and differences without any prior training of data.
Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine is therefore left to find the hidden structure in unlabeled data by itself.
For instance, suppose the machine is given pictures containing both dogs and cats, which it has never seen before. The machine has no idea about the features of dogs and cats, so it cannot categorize them as ‘dogs’ and ‘cats’. But it can categorize them according to their similarities, patterns, and differences; for example, it can easily divide the pictures into two parts, the first containing all the pictures with dogs and the second containing all the pictures with cats. Here the machine did not learn anything beforehand, which means there was no training data or examples.
It allows the model to work on its own to discover patterns and information that was previously undetected.
It mainly deals with unlabelled data.
Clustering
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Commonly used unsupervised learning algorithms:-
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
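As an illustrative sketch of one of the algorithms listed above, the following fits K-means to synthetic, unlabelled two-dimensional points; scikit-learn and NumPy are assumed, and the data is made up for illustration.

    # K-means sketch on synthetic, unlabelled 2-D points.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    points = np.vstack([rng.normal(0, 1, (50, 2)),    # one blob around (0, 0)
                        rng.normal(5, 1, (50, 2))])   # another blob around (5, 5)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(kmeans.cluster_centers_)   # learned centre of each group
    print(kmeans.labels_[:5])        # cluster assigned to the first few points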
Comparison of supervised and unsupervised machine learning:
Input Data: supervised algorithms are trained using labeled data, whereas unsupervised algorithms are used against data that is not labeled.
Computational Complexity: supervised learning is the simpler method, whereas unsupervised learning is computationally more complex.
Algorithms Used: supervised learning uses linear and logistic regression, random forest, support vector machine, neural network, etc., whereas unsupervised learning uses K-means clustering, hierarchical clustering, the Apriori algorithm, etc.
Training Data: supervised learning uses training data to infer the model, whereas unsupervised learning uses no training data.
Model: we can test a supervised model, but we cannot test an unsupervised model.
Disadvantages of unsupervised learning:-
It is difficult to measure accuracy or effectiveness due to the lack of predefined answers during training.
The results often have lower accuracy.
The user needs to spend time interpreting and labelling the classes that result from the clustering.
Lack of guidance: Unsupervised learning lacks the guidance and feedback provided by labeled data, which
can make it difficult to know whether the discovered patterns are relevant or useful.
Sensitivity to data quality: Unsupervised learning can be sensitive to data quality, including missing
values, outliers, and noisy data.
Scalability: Unsupervised learning can be computationally expensive, particularly for large datasets or
complex algorithms, which can limit its scalability.
Difference between Supervised and Unsupervised Learning:
Computational Complexity: supervised learning has less computational complexity; unsupervised learning is more computationally complex.
Real-Time Analysis: supervised learning uses off-line analysis; unsupervised learning uses real-time analysis of data.
Number of Classes: in supervised learning the number of classes is known; in unsupervised learning the number of classes is not known.
Test of Model: we can test a supervised model; we cannot test an unsupervised model.
A regression tree is basically a decision tree that is used for the task of regression and can be used to predict continuous-valued outputs instead of discrete outputs.
In decision trees for classification, we saw how the tree asks the right questions at the right nodes in order to give accurate and efficient classifications. Classification trees do this by using two measures, namely entropy and information gain. But since we are predicting continuous variables, we cannot calculate entropy and go through the same process. We need a different measure, one that tells us how much our predictions deviate from the original target, and that is where the mean squared error comes in.
TREE BUILDING
In regression, tree building refers to the construction of a decision tree model to predict
continuous numerical values. The process involves dividing the feature space into distinct
regions or segments, where each segment corresponds to a specific predicted value.
Decision Tree is one of the most commonly used, practical approaches for supervised
learning. It can be used to solve both Regression and Classification tasks with the latter being put
more into practical application.
It is a tree-structured classifier with three types of nodes. The Root Node is the initial
node which represents the entire sample and may get split further into further nodes.
The Interior Nodes represent the features of a data set and the branches represent the decision
rules. Finally, the Leaf Nodes represent the outcome. This algorithm is very useful for solving
decision-related problems.
A particular data point is run completely through the entire tree by answering true/false questions until it reaches a leaf node. The final prediction is the average of the values of the dependent variable in that particular leaf node. Through multiple iterations, the tree is able to predict a proper value for the data point.
Here's a general overview of the tree building process in regression:
1. Data Preparation: Start with a dataset that includes input features (independent variables)
and corresponding target values (dependent variable). Ensure the data is cleaned and
preprocessed, handling missing values and outliers as needed.
2. Splitting Criteria: Choose a splitting criterion that determines how to divide the data at
each node of the decision tree. Popular splitting criteria for regression include mean
squared error (MSE), mean absolute error (MAE), or variance reduction.
3. Root Node: Begin by creating the root node of the tree, considering all the available
features and the corresponding target values.
4. Recursive Splitting: At each internal node, the algorithm evaluates different feature and
threshold combinations to find the best split that minimizes the chosen splitting criterion.
The goal is to create child nodes that capture distinct subsets of the data with more
homogeneous target values.
5. Terminal Nodes: The splitting process continues until certain stopping conditions are
met. These conditions can include a predefined maximum depth of the tree, a minimum
number of samples required to split a node, or a minimum improvement in the chosen
splitting criterion.
6. Prediction: Once the tree is built, each leaf node represents a segment of the feature
space. The target value assigned to a leaf node is typically the average (or median) of the
target values within that segment.
7. Prediction for New Data: To predict the target value for new data, traverse the decision
tree by evaluating the input features at each node and following the appropriate branch
until reaching a leaf node. The predicted value is then based on the target value
associated with that leaf node.
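A minimal sketch of these steps using scikit-learn's DecisionTreeRegressor on synthetic data; the stopping parameters (max_depth, min_samples_leaf) and the data itself are illustrative choices only.

    # Regression-tree sketch: fit, then predict the leaf average for a new point.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, (200, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)   # noisy continuous target

    # Stopping conditions limit how far the recursive splitting goes (step 5).
    tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5).fit(X, y)

    print(mean_squared_error(y, tree.predict(X)))   # MSE, the splitting criterion used here
    print(tree.predict([[2.5]]))                    # prediction = average of that leaf (step 7)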
It's worth mentioning that decision trees can suffer from overfitting, especially when they
are allowed to grow deeply. Techniques like pruning, ensemble methods (e.g., random forests,
gradient boosting), or regularization can be applied to mitigate overfitting and improve the
performance of the regression tree model.
Tree building in classification refers to the process of constructing a decision tree model
to classify data into discrete categories or classes. Decision trees are a popular machine learning
algorithm for classification tasks due to their interpretability and ability to handle both
categorical and numerical features.
1. Data Preparation: Start with a labeled dataset where each data point has a set of features
(independent variables) and a corresponding class label (dependent variable). Ensure the
data is cleaned, preprocessed, and encoded appropriately, handling missing values and
categorical variables as needed.
2. Splitting Criteria: Choose a splitting criterion that determines how to divide the data at
each node of the decision tree. Common splitting criteria for classification include Gini
impurity and information gain (entropy).
3. Root Node: Begin by creating the root node of the tree, considering all the available
features and their corresponding class labels.
4. Recursive Splitting: At each internal node, the algorithm evaluates different feature and
threshold combinations to find the best split that maximizes the chosen splitting criterion.
The goal is to create child nodes that separate the data into subsets with more
homogeneous class labels.
5. Terminal Nodes: The splitting process continues until certain stopping conditions are
met. These conditions can include a predefined maximum depth of the tree, a minimum
number of samples required to split a node, or a minimum improvement in the chosen
splitting criterion. At the end of the process, the tree will have leaf nodes that represent
the predicted class labels.
6. Prediction: To classify new data, traverse the decision tree by evaluating the input
features at each node and following the appropriate branch until reaching a leaf node. The
predicted class label is then based on the majority class of the training samples associated
with that leaf node.
7. Pruning and Regularization: Decision trees can suffer from overfitting, where they
become too complex and tailored to the training data, leading to poor generalization.
Techniques like pruning, which remove or collapse nodes in the tree, and regularization
methods can be applied to reduce overfitting and improve the performance of the
classification tree model.
It's important to note that decision trees are prone to high variance and can be sensitive to
small changes in the data. Ensemble methods such as random forests and gradient boosting are
commonly used to mitigate these issues and improve the accuracy and stability of the
classification models.
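A minimal sketch of the classification steps above on scikit-learn's bundled Iris dataset; the Gini criterion and the depth limit are illustrative choices, not prescriptions.

    # Classification-tree sketch: train on labelled data, predict class labels for new data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Gini impurity is the splitting criterion; max_depth acts as simple pre-pruning.
    clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
    clf.fit(X_tr, y_tr)

    print(accuracy_score(y_te, clf.predict(X_te)))   # accuracy on held-out data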
OVERFITTING
Overfitting is a common issue that can occur in decision tree models, where the model
becomes too complex and captures noise or irrelevant patterns in the training data, leading to
poor generalization on unseen data. Overfitting in decision trees can result in high variance and
decreased performance.
1. Tree Depth: Decision trees with excessive depth tend to overfit the training data, as they
can memorize specific instances or noise. Setting a maximum depth or limiting the
number of nodes can help control the complexity of the tree.
2. Minimum Samples for Split: Setting a minimum number of samples required to split a
node can prevent the tree from creating overly specific and noise-sensitive splits. It
ensures that a node must have enough representative samples before further division.
3. Pruning: Pruning is a technique used to remove or collapse nodes from a decision tree to
simplify it and reduce overfitting. Post-pruning, also known as backward pruning,
involves pruning nodes after the tree has been fully grown. Pruning can be based on
metrics such as the reduction in impurity or the decrease in error on validation data.
4. Feature Selection: Decision trees can be sensitive to irrelevant or noisy features. Careful
feature selection or feature engineering can help eliminate or reduce the impact of such
features, leading to a more robust and less overfit model.
5. Ensemble Methods: Ensemble methods, such as Random Forests and Gradient Boosting,
can help mitigate overfitting in decision trees. These methods involve combining multiple
decision trees to make predictions, which helps reduce the individual tree's overfitting
and improve overall performance and generalization.
6. Regularization: Regularization techniques can be applied to decision trees to prevent
overfitting. One such technique is to introduce constraints on the tree structure, such as
limiting the number of leaf nodes, restricting the depth, or imposing a penalty on the
complexity of the tree during training.
7. Cross-Validation: Proper evaluation of the model using cross-validation techniques can
help detect and mitigate overfitting. Cross-validation provides a more robust estimate of
the model's performance by assessing its generalization ability on multiple subsets of the
data.
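The sketch below illustrates points 1 and 2 above: an unconstrained tree is compared with a depth-limited tree on training and test accuracy; scikit-learn is assumed and the synthetic dataset is for illustration only.

    # Overfitting sketch: the fully grown tree fits the training data almost perfectly
    # but generalizes worse; the shallow tree shows a smaller train/test gap.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                 # grows fully
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

    print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))        # high train, lower test
    print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))  # smaller gap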
PRUNING AND COMPLEXITY
Pruning and controlling the complexity of decision trees are important techniques to
prevent overfitting and improve the generalization ability of the model. Pruning involves
removing or collapsing nodes from the tree, simplifying its structure and reducing overfitting.
Here's a closer look at pruning and managing complexity in decision trees:
1. Pre-Pruning:
Maximum Depth: Set a maximum depth for the tree, limiting the number of levels
or splits. This prevents the tree from growing too deep and capturing noise or
irrelevant patterns.
Minimum Samples for Split: Define a minimum number of samples required to
split a node. Nodes with fewer samples than the specified threshold are not split
further, avoiding the creation of very specific and noise-sensitive splits.
Minimum Samples per Leaf: Specify a minimum number of samples required to
be present in a leaf node. If a potential split would result in a leaf node with fewer
samples, the split is not performed, preventing overly complex and overfitted
branches.
2. Post-Pruning:
Reduced-Error Pruning: Start with a fully grown decision tree and iteratively
evaluate the impact of removing or replacing nodes on a validation dataset. If
removing a node leads to an improvement in performance, prune it. This process
continues until further pruning results in performance degradation.
Cost-Complexity Pruning: Assign a cost or complexity parameter to each node in
the tree based on its impurity or error rate. By iteratively evaluating the impact of
removing nodes with the lowest cost-complexity ratio, a sequence of trees with
different complexities is generated. The optimal complexity is determined by
cross-validation or using a separate validation dataset.
3. Complexity Parameters:
Tree Depth: Limiting the depth of the tree controls its complexity. A shallower
tree is less likely to overfit and may generalize better.
Minimum Impurity Decrease: Define a threshold for the minimum impurity
decrease required for a split to occur. This prevents splits that do not contribute
significantly to improving the overall purity or classification accuracy.
Maximum Leaf Nodes: Set a maximum number of leaf nodes in the tree, which
indirectly controls the complexity. A smaller number of leaf nodes results in a
simpler tree.
4. Ensemble Methods:
Random Forests: Random Forests combine multiple decision trees by training
them on different subsets of the data and averaging their predictions. The
randomness in feature selection and training data reduces overfitting.
Gradient Boosting: Gradient Boosting trains decision trees sequentially, with each
subsequent tree correcting the mistakes of the previous ones. Regularization
techniques, such as shrinkage or learning rate, help manage complexity.
By using pruning techniques and controlling the complexity of decision trees, you can
strike a balance between capturing useful patterns in the data and avoiding overfitting, resulting
in more reliable and generalizable models.
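A sketch of cost-complexity (post-)pruning as described above, using scikit-learn's cost_complexity_pruning_path and the ccp_alpha parameter; the dataset and the simple selection loop are illustrative only.

    # Cost-complexity pruning sketch: generate the pruning path, then pick the alpha
    # whose pruned tree scores best on held-out data.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

    scores = []
    for alpha in path.ccp_alphas:                      # one tree per complexity level
        t = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        scores.append((t.score(X_te, y_te), alpha))
    print(max(scores))                                 # (best test accuracy, chosen alpha)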
MULTIPLE DECISION TREES
Multiple decision trees, typically combined as an ensemble such as a Random Forest, can also be applied to object segmentation. Here's an overview of how multiple decision trees can be used for object segmentation:
1. Training Data Preparation: Collect a labeled dataset consisting of input images and
corresponding ground truth segmentation masks. Each pixel or region in the image
should be labeled as either part of the object or background.
2. Feature Extraction: Extract relevant features from the input images to represent the pixel
or region characteristics. These features can include color, texture, gradient, or other
image descriptors.
3. Random Forest Training:
Construct a Random Forest ensemble by training multiple decision trees using the
labeled training dataset.
Each decision tree is trained using a random subset of the training data and a
random subset of the features.
At each pixel or region, the decision trees learn to predict the class label (object or
background) based on the extracted features.
4. Inference:
For a new input image, pass each pixel or region through the ensemble of decision
trees.
Each decision tree provides a prediction for the class label of the pixel or region.
The final segmentation result is obtained by aggregating the predictions from all
the decision trees. For example, voting or averaging can be used to determine the
final class label.
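A minimal sketch of this pixel-wise Random Forest idea, assuming the per-pixel features are simply the RGB values of a synthetic image; a real system would add texture, gradient, or other descriptors and train on many labelled images.

    # Random-forest segmentation sketch: one feature row and one label per pixel.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))                    # synthetic RGB image
    mask = (image[:, :, 0] > 0.5).astype(int)          # synthetic ground-truth mask

    X = image.reshape(-1, 3)                           # per-pixel features
    y = mask.ravel()                                   # 1 = object, 0 = background

    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Each tree votes per pixel; the aggregated prediction is reshaped into a mask.
    predicted_mask = forest.predict(X).reshape(image.shape[:2])
    print(predicted_mask.shape)                        # (64, 64)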
TIME SERIES METHODS
Time series methods are specifically designed to analyze and forecast data that exhibits a
temporal or sequential order. These methods take into account the dependence and patterns
present in time-dependent data. Here are some commonly used time series methods:
1. Moving Averages (MA): Moving averages smooth out the irregularities in a time series
by calculating the average of a sliding window of consecutive data points. They are
useful for identifying trends and removing short-term fluctuations or noise.
2. Autoregressive (AR) Models: AR models assume that the value of a variable in the time
series is linearly dependent on its previous values. An AR(p) model considers the
previous p values to predict the current value. The model coefficients are estimated using
techniques like the Yule-Walker equations or maximum likelihood estimation.
3. Moving Average with Exogenous Inputs (ARMAX): ARMAX models extend the
autoregressive model by incorporating exogenous variables that may influence the time
series. These models are useful when the variable of interest is affected by external
factors.
4. Autoregressive Integrated Moving Average (ARIMA): ARIMA models combine the
autoregressive (AR) and moving average (MA) components with differencing to handle
non-stationary time series. The differencing step removes trends and seasonality, making
the time series stationary before applying the AR and MA components.
5. Seasonal ARIMA (SARIMA): SARIMA models are an extension of ARIMA models that
can handle seasonal patterns in the data. They incorporate additional seasonal
components to capture recurring patterns within a given season.
6. Exponential Smoothing: Exponential smoothing methods forecast future values by
assigning exponentially decreasing weights to past observations. This includes simple
exponential smoothing (SES), Holt's linear exponential smoothing, and Holt-Winters'
seasonal exponential smoothing for data with trend and/or seasonality.
7. Vector Autoregression (VAR): VAR models are used when analyzing multiple time
series variables that influence each other. VAR models capture the interdependencies
among variables and can be used for forecasting and understanding the dynamic
relationships between them.
8. State Space Models: State space models represent a time series as a combination of
unobserved (latent) states and observed outputs. They are widely used for modeling and
forecasting complex time series data and can handle various structures and dependencies.
9. Machine Learning Approaches: Machine learning algorithms, such as Support Vector
Machines (SVM), Random Forests, and Recurrent Neural Networks (RNN), can be
applied to time series analysis and forecasting. These models can capture complex
patterns and dependencies in the data but often require larger datasets and more
computational resources.
The choice of method depends on the characteristics of the time series, including trend,
seasonality, data volume, and the presence of exogenous variables. It is important to consider the
specific requirements and properties of the data when selecting an appropriate time series
method.
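As a sketch of fitting one of the models listed above, the following fits an ARIMA(1, 1, 1) model to a synthetic monthly series using statsmodels; the order and the data are illustrative only.

    # ARIMA sketch: fit an ARIMA(p, d, q) model and forecast the next six points.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    index = pd.date_range("2020-01-01", periods=60, freq="MS")
    series = pd.Series(np.cumsum(rng.normal(0.5, 1.0, 60)), index=index)  # trending series

    model = ARIMA(series, order=(1, 1, 1))   # p = AR order, d = differencing, q = MA order
    fit = model.fit()

    print(fit.forecast(steps=6))             # forecasts for the next six months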
ARIMA
ARIMA (Autoregressive Integrated Moving Average) is a time series forecasting method and
is not directly applicable to object segmentation tasks. Object segmentation involves identifying
and delineating objects within an image or video, while ARIMA is used for modeling and
predicting values in a time series.
However, in certain cases, time series analysis techniques can be used as a pre-processing
step in object segmentation tasks. For example, if you have a video sequence of images and want
to segment objects based on their motion or temporal patterns, you can apply motion-based or
spatio-temporal methods that incorporate time series analysis.
Here are some ways in which time series analysis techniques can be used in object
segmentation:
1. Optical Flow: Optical flow methods estimate the motion vectors of pixels or regions
between consecutive frames in a video. By analyzing the temporal changes in pixel
intensity, optical flow can help identify object boundaries and track their movements over
time.
2. Temporal Smoothing: Time series smoothing techniques, such as moving averages or
exponential smoothing, can be applied to temporal sequences of pixel intensities to
reduce noise or short-term fluctuations. Smoothing can improve the accuracy of
subsequent segmentation algorithms that rely on stable and smooth intensity profiles.
3. Temporal Context: The temporal context of an object's appearance and motion can be
leveraged to enhance segmentation. By considering the evolution of an object's
appearance over time, you can incorporate temporal consistency constraints to refine
object boundaries or handle occlusions.
4. Temporal Segmentation: Time series clustering or segmentation algorithms can be used
to identify temporal segments or regions in a video that exhibit similar patterns. This can
be useful for segmenting objects with distinct temporal behaviors or activities.
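A sketch of the optical-flow idea in point 1, assuming OpenCV's dense Farneback flow and two synthetic grayscale frames; in practice the frames would be consecutive frames read from a video.

    # Optical-flow sketch: estimate per-pixel motion between two frames and threshold it.
    import numpy as np
    import cv2

    frame1 = np.zeros((64, 64), dtype=np.uint8)
    frame2 = np.zeros((64, 64), dtype=np.uint8)
    frame1[20:30, 20:30] = 255                 # a bright square...
    frame2[20:30, 23:33] = 255                 # ...shifted three pixels to the right

    # Dense Farneback flow: one (dx, dy) motion vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(frame1, frame2, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    magnitude = np.linalg.norm(flow, axis=2)
    moving_pixels = magnitude > 1.0            # crude motion-based segmentation mask
    print(moving_pixels.sum(), "pixels flagged as moving")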
It's important to note that while time series analysis techniques can complement object
segmentation, they are not the primary methods used for segmenting objects in images or videos.
Traditional object segmentation techniques, such as thresholding, region-based methods, or deep
learning-based approaches like semantic segmentation and instance segmentation, are more
commonly employed for accurate and robust object segmentation tasks.
Measures of forecast accuracy
Measures of forecast accuracy are used to assess the performance and reliability of
forecasting models. These measures help quantify the accuracy of predictions by comparing the
forecasted values to the actual values. Here are some commonly used measures of forecast
accuracy:
1. Mean Absolute Error (MAE): MAE calculates the average absolute difference between the forecasted values and the actual values. It measures the average magnitude of errors without considering their direction, making it a robust measure. The formula for MAE is:
MAE = (1/n) * Σ |Actual_i - Forecast_i|
2. Mean Squared Error (MSE): MSE calculates the average of the squared differences between the forecasted values and the actual values. It penalizes larger errors more than MAE and is commonly used in many forecasting applications. The formula for MSE is:
MSE = (1/n) * Σ (Actual_i - Forecast_i)^2
3. Root Mean Squared Error (RMSE): RMSE is the square root of MSE and provides an
interpretable measure in the same units as the original data. It is widely used and is
helpful in understanding the average size of errors. The formula for RMSE is:
RMSE = sqrt(MSE)
4. Mean Absolute Percentage Error (MAPE): MAPE measures the average percentage difference between the forecasted values and the actual values. It is useful when the scale of the data varies widely and provides a relative measure of accuracy. The formula for MAPE is:
MAPE = (100/n) * Σ |(Actual_i - Forecast_i) / Actual_i|
These measures provide different perspectives on forecast accuracy and have their
strengths and weaknesses. Some measures are more sensitive to outliers or extreme errors, while
others focus on the overall magnitude of errors. It is important to choose the appropriate measure
based on the specific requirements and characteristics of the forecasting task.
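A small sketch computing the four measures above for a toy forecast; NumPy is assumed and the numbers are made up for illustration.

    # Forecast-accuracy sketch: MAE, MSE, RMSE, and MAPE for a toy forecast.
    import numpy as np

    actual = np.array([100.0, 110.0, 120.0, 130.0])
    forecast = np.array([102.0, 108.0, 123.0, 127.0])

    errors = actual - forecast
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(errors / actual)) * 100   # undefined if any actual value is zero

    print(mae, mse, rmse, mape)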
STL APPROACH
STL (Seasonal and Trend decomposition using Loess) decomposes a time series into three components: a seasonal component, a trend component, and a residual component.
1. Seasonal Component: The STL approach first estimates the seasonal component of the
time series. It does this by applying a locally weighted regression (loess) to smooth out
the series and identify the underlying seasonal patterns. The loess method fits a smooth
curve to the data, giving more weight to nearby observations. The seasonal component
represents the repeating patterns at fixed intervals, such as daily, weekly, or yearly
cycles.
2. Trend Component: After extracting the seasonal component, the STL approach estimates
the trend component. This is done by applying another loess smoothing, but this time on
the detrended series. The detrended series is obtained by removing the seasonal
component from the original time series. The trend component captures the long-term
direction or tendency of the time series.
3. Residual Component: The residual component represents the remaining variation in the
time series after removing the seasonal and trend components. It contains the irregular or
unpredictable fluctuations that are not accounted for by the seasonal and trend patterns.
The residual component can be further analyzed for identifying any remaining patterns or
to model the random component of the series.
Advantages of the STL approach:
It provides a robust and flexible method for decomposing time series data into
interpretable components.
It handles irregularities, outliers, and non-linear patterns in the data effectively.
It allows for separate analysis and modeling of the individual components, enabling
better understanding and forecasting of the time series.
Once the time series is decomposed using STL, the individual components can be
analyzed, modeled, and forecasted independently. For example, forecasting can be done
separately for the trend and seasonal components, and then combined to obtain the forecast for
the original series.
STL is widely used in various fields, including finance, economics, and environmental
studies, to analyze and forecast time series data. It provides a valuable tool for understanding the
underlying patterns and extracting meaningful information from complex time series.
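A minimal sketch of an STL decomposition on a synthetic monthly series, assuming statsmodels' STL class; the series and the period are illustrative.

    # STL sketch: decompose a series into trend, seasonal, and residual components.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    rng = np.random.default_rng(0)
    index = pd.date_range("2018-01-01", periods=72, freq="MS")
    trend = np.linspace(10, 30, 72)                            # long-term direction
    seasonal = 5 * np.sin(2 * np.pi * np.arange(72) / 12)      # yearly cycle
    series = pd.Series(trend + seasonal + rng.normal(0, 1, 72), index=index)

    result = STL(series, period=12).fit()

    print(result.trend.head())      # estimated trend component
    print(result.seasonal.head())   # estimated seasonal component
    print(result.resid.head())      # leftover irregular component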
EXTRACTING FEATURES FROM A GENERATED MODEL
To extract features, such as height and average energy, from a generated model (for example, a decision tree or random forest) and analyze them for prediction, you can follow these steps:
1. Traverse the Model: Start at the root node of the decision tree or the ensemble of trees in
the random forest. Traverse the model by following the branching logic based on the
splitting criteria until you reach the leaf nodes.
2. Track Depth or Height: As you traverse the model, keep track of the depth or height of
each node you visit. The depth represents the number of levels from the root node to the
current node. Store the depth for each feature node.
3. Calculate Average Energy: At each leaf node, calculate the average energy of the training
samples associated with that node. The energy measure can be specific to your domain or
problem. It could be a statistical measure like mean, variance, or any other relevant
metric that captures the energy or magnitude of the samples associated with that leaf
node.
4. Extract Features: Extract the features you are interested in, such as feature names or
indices, depth or height, and average energy, for each feature node in the model.
5. Analyze for Prediction: Once you have extracted the features, you can analyze them for
prediction purposes. Here are some possible analyses:
Feature Importance: Analyze the relationship between the extracted features and
the target variable. You can examine the importance of each feature based on their
heights, average energy, or any other relevant measure. This analysis can help
identify the most influential features in the model.
Feature Selection: Use the extracted features as inputs for further analysis or
prediction tasks. You can apply feature selection techniques, such as filtering or
wrapper methods, to select a subset of features that are most relevant for your
prediction task. This can help reduce dimensionality and improve prediction
accuracy.
Model Evaluation: Evaluate the performance of your model using the extracted
features. Compare different models or variations of your model based on their
prediction accuracy, using appropriate evaluation metrics such as accuracy,
precision, recall, or F1-score.
Interpretation: Analyze the relationship between the extracted features and the
model's decision-making process. Investigate how the features' heights and
average energy contribute to the model's predictions. This can provide insights
into the internal workings and behavior of the model.
By extracting features such as height and average energy from your generated model and
conducting analysis for prediction, you can gain a better understanding of the model's behavior
and potentially improve prediction performance by selecting important features or incorporating
feature-related insights into your prediction pipeline.
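A sketch of steps 1 to 4 above on a fitted scikit-learn regression tree; here "average energy" is taken, as an assumption for illustration, to be the mean target value stored in each leaf.

    # Feature-extraction sketch: record each node's depth and each leaf's average value.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))
    y = X[:, 0] * 10 + rng.normal(0, 0.5, 200)

    reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
    t = reg.tree_

    # Walk the tree from the root, tracking the depth of every node (step 2).
    depth = {0: 0}
    stack = [0]
    while stack:
        node = stack.pop()
        for child in (t.children_left[node], t.children_right[node]):
            if child != -1:                  # -1 marks "no child", i.e. past a leaf
                depth[child] = depth[node] + 1
                stack.append(child)

    # For each leaf, report its depth and the average target value it stores (steps 3-4).
    for node in range(t.node_count):
        if t.children_left[node] == -1:      # leaf node
            leaf_mean = t.value[node][0][0]
            print(f"leaf {node}: depth={depth[node]}, average value={leaf_mean:.2f}")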