Predictive Maintenance
Application of Machine Learning Algorithms
2 CONDITION-BASED MAINTENANCE VS PREDICTIVE MAINTENANCE
DIFFERENCES / SIMILARITIES
failures:
2. IoT technology
The most important part of predictive maintenance (and arguably the hardest one) is building predictive (a.k.a. prognostic) algorithms. In essence, you must build a model that considers many different variables and how they interconnect and impact one another, with the ultimate goal of being able to predict machine failures.
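As an illustration only, the minimal sketch below shows what such a prognostic model can look like in Python. It assumes a hypothetical file sensor_data.csv with sensor readings and a binary "failure" column; the file name, column names and the choice of a random forest are assumptions for the sketch, not part of the course material.

# Minimal sketch of a failure-prediction model (assumed data layout).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Hypothetical dataset: sensor readings plus a 0/1 "failure" label.
df = pd.read_csv("sensor_data.csv")
X = df.drop(columns=["failure"])   # sensor variables (temperature, vibration, ...)
y = df["failure"]                  # 1 = machine failed, 0 = healthy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))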
5 HOW TO ESTABLISH A PREDICTIVE MAINTENANCE PROGRAM
Linear Regression, Decision Trees, Random Forest & SVM for Predictive Maintenance
10 DATA PREPROCESSING STEP 1
11 DATA PREPROCESSING STEP 1
Categorical data refers to the information that has specific categories within the dataset, such as the following:
•Ordinal categorical variables — These variables can be ordered. Example: the size of a T-shirt (S < M < L).
•Nominal categorical variables — These variables cannot be ordered. Example: the colour of a T-shirt. We can't say that Blue < Green, as it doesn't make any sense to compare the colours; they don't have an intrinsic order.
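A small sketch of how these two kinds of categorical variables are typically encoded in Python is given below; the columns "size" and "colour" are illustrative assumptions, not the course dataset.

# Encoding ordinal vs nominal categorical variables (illustrative columns).
import pandas as pd

df = pd.DataFrame({"size":   ["S", "M", "L", "M"],
                   "colour": ["Blue", "Green", "Blue", "Red"]})

# Ordinal: the categories have a meaningful order, so map them to ranked integers.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_encoded"] = df["size"].map(size_order)

# Nominal: no order exists, so use one-hot (dummy) encoding instead.
df = pd.get_dummies(df, columns=["colour"])
print(df)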
A training set denotes the subset of a dataset that is used for training the machine learning model.
A test set is the subset of the dataset that is used for testing the trained model.
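A typical way to create these two subsets with scikit-learn is sketched below; the 80/20 split ratio and the toy arrays are assumptions for illustration.

# Splitting a dataset into training and test subsets (80/20 split assumed).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix
y = np.arange(10)                  # toy target vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)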
15 DATA PREPROCESSING STEP 1
Linear regression is used for finding the linear relationship between a target and one or more predictors. There are two types of linear regression: simple and multiple.
The core idea is to obtain a line that best fits the data. The best-fit line is the one for which the total prediction error (over all data points) is as small as possible. The error is the distance from a point to the regression line.
17 LINEAR REGRESSION
1. Simple Linear Regression
This method uses a single independent variable to predict a dependent variable by fitting a best
linear relationship.
Its equation is y = m·x + c, where m is the slope of the best-fit line and c is its intercept.
2. Multiple Linear Regression
This method uses more than one independent variable to predict a dependent variable by
fitting a best linear relationship.
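Both variants can be fitted in a few lines with scikit-learn, as sketched below; the toy data values are made up for illustration.

# Simple and multiple linear regression on toy data (illustrative values).
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple: one independent variable.
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])            # roughly y = 2x + 1
simple = LinearRegression().fit(x, y)
print(simple.coef_, simple.intercept_)     # slope m and intercept c

# Multiple: several independent variables.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
y2 = np.array([8, 7, 17, 16, 25])
multiple = LinearRegression().fit(X, y2)
print(multiple.coef_, multiple.intercept_)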
18 LINEAR REGRESSION – SOME QUESTIONS
Gradient descent is an iterative optimization algorithm for finding the minimum of a function. Here that function is our loss function. Gradient descent starts with an initial set of parameter values (c and m) and iteratively moves towards a set of values that minimizes the loss function.
20 LINEAR REGRESSION
Loss Function
The loss is the error in the values predicted with our current c and m. Our goal is to minimize this error to obtain the most accurate values of c and m.
Mean Squared Error (also called L2 loss) is the average of the squared differences between the actual and the predicted values. For a data point Yi and its predicted value Ŷi, where n is the total number of data points in the dataset, the mean squared error is defined as:
MSE = (1/n) Σ (Yi − Ŷi)²
21 LINEAR REGRESSION
Loss Function
RMSE takes the MSE value and applies a square root to it. RMSE can be directly interpreted as the 'average error' that our prediction model makes. For a data point Yi and its predicted value Ŷi, where n is the total number of data points in the dataset, RMSE is defined as:
RMSE = √( (1/n) Σ (Yi − Ŷi)² )
22 LINEAR REGRESSION
Similarities: Both MAE and RMSE express average model prediction error in units of the variable of interest. Both metrics can range from 0 to ∞ and are indifferent to the direction of errors.
Differences: In RMSE, the errors are squared before they are averaged, so it gives a relatively high weight to large errors. This means RMSE is more useful when large errors are particularly undesirable.
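The three regression metrics can be computed as in the sketch below; the prediction values are toy numbers chosen only to show that RMSE penalises the large error more than MAE does.

# Computing MAE, MSE and RMSE for a set of predictions (toy values).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 11.0])

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print(mae, mse, rmse)   # RMSE >= MAE because squaring emphasises large errors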
23 LINEAR REGRESSION
STEPS OF GRADIENT DESCENT
Dₘ is the value of the partial derivative with respect to m. Similarly let´s find the partial
derivative with respect to c, Dc :
24 LINEAR REGRESSION
STEPS OF GRADIENT DESCENT
3. Now we update the current values of m and c using the following equations:
m = m − α · Dₘ
c = c − α · Dc
4. We repeat this process until our loss function is a very small value or ideally 0 (which means 0 error, i.e. 100% accuracy). The values of m and c that we are left with are the optimum values.
25 LINEAR REGRESSION
STEPS OF GRADIENT DESCENT
The choice of a correct learning rate α is very important, as it ensures that gradient descent converges in a reasonable time:
1. If we choose α to be very large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
2. If we choose α to be very small, gradient descent will take very small steps towards the local minimum and will take a long time to reach it.
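The whole procedure (loss, derivatives Dₘ and Dc, parameter updates, learning rate α) can be put together in a short from-scratch sketch like the one below; the synthetic data and the learning rate of 0.05 are illustrative choices.

# Gradient descent for simple linear regression, written from scratch.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = 2 * x + 1                      # synthetic data: true m = 2, true c = 1

m, c = 0.0, 0.0                    # initial parameter values
alpha = 0.05                       # learning rate
n = len(x)

for epoch in range(2000):
    y_pred = m * x + c
    D_m = (-2 / n) * np.sum(x * (y - y_pred))   # dMSE/dm
    D_c = (-2 / n) * np.sum(y - y_pred)         # dMSE/dc
    m -= alpha * D_m               # update step: m = m - alpha * D_m
    c -= alpha * D_c               # update step: c = c - alpha * D_c

print(m, c)                        # should approach 2 and 1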
First of all, you need to make sure that you train the model on the training dataset and compute the evaluation metrics on the test set, to avoid overfitting. Afterwards, you can check several evaluation metrics to determine how well your model performed.
What is overfitting?
Overfitting means the model is useful only in reference to its training data set, and not to any other new data sets.
For example, it would be a big red flag if our model saw 99% accuracy on the training set but only 55% accuracy on the test set.
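One common way to spot this in practice is simply to compare the score on the training set with the score on the test set, as in the sketch below; a deliberately unconstrained decision tree on noisy synthetic data is used here only to provoke overfitting.

# Detecting overfitting by comparing train and test accuracy (toy example).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))   # close to 1.0
print("test accuracy: ", model.score(X_test, y_test))     # noticeably lower -> overfitting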
To find out …
In Batch Gradient Descent, all the training data is taken into consideration to take a single
step. We take the average of the gradients of all the training examples and then use that mean
gradient to update our parameters. So that’s just one step of gradient descent in one epoch.
In the case of Stochastic Gradient Descent (SGD), the parameters are updated after every single training example, so there will be 'm' iterations (updates) per epoch, where 'm' is the number of observations in the dataset.
31 LINEAR REGRESSION – SOME QUESTIONS
What is Mini-Batch Gradient Descent?
The batch size is the number of samples to process before the model is updated.
If we have a dataset with 1280 samples and we choose a batch size of 32 and 1000 epochs, then we'll have 40 batches, each with 32 samples. The model will be updated after each batch of 32 samples, so one epoch will involve 40 batches, i.e. 40 updates to the model (iterations).
With 1000 epochs, the model will be exposed to the whole dataset 1000 times, that is a total of 40 000 parameter updates over the whole training run.
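The arithmetic in this example can be checked with a couple of lines of Python, using the numbers quoted above.

# Batches per epoch and total parameter updates for the example above.
n_samples  = 1280
batch_size = 32
epochs     = 1000

batches_per_epoch = n_samples // batch_size          # 40 batches (updates) per epoch
total_updates     = batches_per_epoch * epochs        # 40 000 updates in total
print(batches_per_epoch, total_updates)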
2) Predicting the rainfall of the coming days based on the increase in temperature,
3) Agricultural scientists often use linear regression to measure the effect of fertilizer and water on crop yields,
4) Data scientists for professional sports teams often use linear regression to measure the effect that different training regimens have on player performance,
5) Crime data mining: predicting the crime rate of a state based on drug usage and other socio-economic factors.
The decision tree is one of the most powerful and popular tools for classification and prediction. Decision tree algorithms are often referred to as CART (Classification and Regression Trees).
A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities. This gives it a tree-like shape.
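A minimal scikit-learn sketch of building and inspecting such a tree is shown below; the Iris dataset and the depth limit of 3 are assumptions used purely as a stand-in.

# Fitting and displaying a small decision tree (Iris used as a stand-in dataset).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Print the tree: the root node branches into further nodes, giving the tree shape.
print(export_text(tree, feature_names=list(iris.feature_names)))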
37 DECISION TREE
38 DECISION TREE
EXAMPLE OF A DECISION TREE
39 DECISION TREE
What is Entropy?
Entropy measures the impurity (randomness) of a dataset. For a node whose instances belong to the positive class with probability p and to the negative class with probability 1 − p, the entropy is E = −p·log₂(p) − (1 − p)·log₂(1 − p); it is 0 for a pure node and 1 for a 50/50 split.
Information gain measures the reduction of uncertainty given some feature, and it is also a deciding factor for which attribute should be selected as a decision node or root node.
It is simply the entropy of the full dataset minus the entropy of the dataset given some feature.
Example:
Suppose our entire population has a total of 30 instances. The objective is to predict whether
the person will go to the gym or not. Let’s say 16 people go to the gym and 14 people don’t.
Feature 1 is “Energy” which takes two values “high” and “low”
Feature 2 is “Motivation” which takes 3 values “No motivation”, “Neutral” and “Highly
motivated”.
41 DECISION TREE - CLASSIFICATION
Information Gain
Now that we have the values of E(Parent) and E(Parent|Energy), the information gain is:
IG(Energy) = E(Parent) − E(Parent|Energy)
42 DECISION TREE - CLASSIFICATION
Information Gain
Now that we have the values of E(Parent) and E(Parent|Motivation), the information gain is:
IG(Motivation) = E(Parent) − E(Parent|Motivation)
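The numeric results were shown as figures on the original slides. The sketch below recomputes E(Parent) for the 16/14 split from the example and shows how the information gain of a feature would be obtained; the per-category counts used for "Energy" are assumptions for illustration, not the course's actual numbers.

# Entropy and information gain for the gym example (feature counts are assumed).
import numpy as np

def entropy(counts):
    """Entropy of a node given the class counts in it."""
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

# Parent node: 16 people go to the gym, 14 do not (from the example).
e_parent = entropy([16, 14])
print("E(Parent) =", round(e_parent, 3))           # about 0.997

# Assumed split on "Energy": high -> [12 go, 1 not], low -> [4 go, 13 not].
children = [[12, 1], [4, 13]]
n_total = 30
e_children = sum((sum(c) / n_total) * entropy(c) for c in children)
info_gain = e_parent - e_children
print("IG(Energy) =", round(info_gain, 3))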
To find out …
F1-SCORE
44 CLASSIFICATION METRICS
Confusion Matrix: a table that summarizes a classifier's predictions by counting true positives, false positives, true negatives and false negatives.
Accuracy
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:
Accuracy = Number of correct predictions / Total number of predictions = (TP + TN) / (TP + TN + FP + FN)
It works well only if there are an equal number of samples belonging to each class.
For example, consider that there are 98% samples of class A and 2% samples of class B in our training set. Then our model can easily get 98% training accuracy by simply predicting every training sample as belonging to class A.
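The effect is easy to reproduce: in the sketch below a "model" that always predicts class A reaches 98% accuracy on a 98/2 split while learning nothing about class B. The label vectors are toy values.

# A majority-class "model" on an imbalanced dataset still scores 98% accuracy.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0] * 98 + [1] * 2)   # 98% class A (0), 2% class B (1)
y_pred = np.zeros_like(y_true)          # always predict class A
print(accuracy_score(y_true, y_pred))   # 0.98, despite never detecting class B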
47 CLASSIFICATION METRICS
Accuracy
Imagine someone claimed to create a model to identify terrorists trying to board flights with greater than 99 percent accuracy ... entirely in their head. Would you believe them?
Well, here's the model: simply label every single person flying from a U.S. airport as "not a terrorist." Given the 800 million average passengers on U.S. flights per year and the 19 (confirmed) terrorists who boarded U.S. flights from 2000–2017, this model achieves an accuracy well above 99.99 percent.
While this solution has nearly perfect accuracy, this problem is one in which accuracy is clearly not an adequate metric.
Accuracy
The terrorist detection task is an imbalanced classification problem: we have two classes to identify, and one of them (non-terrorists) represents the overwhelming majority of the data points. Another imbalanced classification problem occurs in disease detection when the rate of the disease in the public is very low. In both these cases, the negative class vastly outnumbers the positive class.
Recall
Recall attempts to answer the following question: what proportion of actual positives was identified correctly?
Mathematically, we define recall as the number of true positives divided by the number of true positives plus the number of false negatives:
Recall = TP / (TP + FN)
50 CLASSIFICATION METRICS
Recall
Recall is a good measure to use when the cost of a False Negative is high, for instance in sick-patient detection. If a sick patient (actual positive) goes through the test and is predicted as not sick (predicted negative), the cost associated with that False Negative is extremely high, since the sickness goes untreated.
Precision
Precision attempts to answer the following question: what proportion of positive identifications was actually correct?
Mathematically, precision is the number of true positives divided by the number of true positives plus the number of false positives:
Precision = TP / (TP + FP)
52 CLASSIFICATION METRICS
Precision
Precision is a good measure to use when the cost of a False Positive is high, for instance in email spam detection. In spam detection, a false positive means that a non-spam email (actual negative) has been identified as spam (predicted spam). The email user might lose important emails if the precision of the spam detection model is not high.
53 CLASSIFICATION METRICS
EXAMPLE
54 CLASSIFICATION METRICS
Precision & Recall
Example: returning to the terrorist-detection model, labelling every passenger as a terrorist would give a recall of 1.0 (no terrorist is missed) but a precision close to 0, since virtually every person we would stop is not a terrorist and would wrongly be prevented from flying.
55 CLASSIFICATION METRICS
Precision & Recall
In some situations, we might know we want to maximize either recall or precision at the expense of the other metric. For example, in preliminary disease screening of patients for follow-up examinations, we would probably want a recall near 1.0 (we want to find all patients who actually have the disease) and we can accept a low precision (we accidentally flag some patients as having the disease who actually don't), provided the cost of the follow-up examination isn't high. However, in cases where we want to find an optimal blend of precision and recall, we can combine the two metrics using the F1 score.
F1 Score
The F1 score is needed when you want to seek a balance between precision and recall. It is the harmonic mean of precision and recall, taking both metrics into account in the following equation:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
The F1 score might be a better measure to use if [we need to seek a balance between precision and recall] AND [there is an uneven class distribution (large number of actual negatives)].
57 CLASSIFICATION METRICS - RECAP
Four Outcomes of Binary Classification
•True positives: data points labeled as positive that are actually positive
•False positives: data points labeled as positive that are actually negative
•True negatives: data points labeled as negative that are actually negative
•False negatives: data points labeled as negative that are actually positive
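These four outcomes, together with accuracy, precision, recall and F1, can all be obtained from scikit-learn as sketched below; the label vectors are toy values.

# Confusion matrix and the derived metrics on toy labels.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))    # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))        # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))            # harmonic mean of the two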
Conclusion: Since IG₂ > IG₁, we would first split the data using feature F₂
60 DECISION TREE - CLASSIFICATION
Step 1: Sort the data in ascending order of the continuous independent variable.
Step 2: Compute the average of each pair of adjacent values; these averages are the candidate split thresholds.
Step 3: For quantitative data there will be only two branches per split, either ≤ or >. The check is applied to all the average values computed, and whichever threshold results in the lowest impurity (lowest weighted entropy or Gini index) is chosen as the split point, as sketched below.
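A small sketch of this threshold search is given below; it uses Gini impurity and made-up values for a single continuous feature.

# Finding the best split threshold for one continuous feature (toy values).
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

x = np.array([2.0, 3.5, 4.1, 5.0, 6.3, 7.8])   # continuous feature (already sorted)
y = np.array([0,   0,   0,   1,   1,   1  ])   # class labels

# Candidate thresholds: averages of adjacent feature values.
thresholds = (x[:-1] + x[1:]) / 2

best = min(
    thresholds,
    key=lambda t: (np.sum(x <= t) * gini(y[x <= t]) +
                   np.sum(x >  t) * gini(y[x >  t])) / len(x))
print("best threshold:", best)   # splits the data with the lowest weighted impurity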
random_state:
It is important to note that the random_state value can have a significant effect on the quality of your model (by quality I essentially mean its accuracy of prediction). So, it is important to check which random_state value provides you with the most accurate model.
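The sensitivity described above can be checked empirically, as in the sketch below, by refitting the same model over several random_state values for the train/test split; the Iris dataset and the range of five seeds are assumptions for illustration.

# Checking how the train/test split's random_state affects the measured accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"random_state={seed}: accuracy={acc:.3f}")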