The document discusses the concept of Empirical Risk Minimization (ERM) in neural networks and deep learning, emphasizing its importance in selecting models that minimize empirical risk based on training data. It explains the distinction between empirical risk and true risk, as well as the impact of bias and variance errors on model performance. Additionally, it highlights regularization techniques to prevent overfitting and improve model generalization on unseen data.
NEURAL NETWORKS & DEEP LEARNING
(21MCA24DB3)
Prepared & Presented By:
Dr. Balkishan, Assistant Professor
Department of Computer Science & Applications
Maharshi Dayanand University, Rohtak

Empirical Risk Minimization
The empirical risk minimization principle states that the learning algorithm should choose a function/model/hypothesis which minimizes the empirical risk.

Understanding the concept of risk
• What is a loss function?
• Given a set of inputs and outputs, a loss function measures the difference between the predicted output and the true output.
• But this applies only to the given set of inputs and outputs.
• We want to know what the loss is over all the possibilities.
• This is where "true risk" comes into the picture.
• True risk is the average loss over all the possibilities, i.e. over the entire data distribution.
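To make this concrete, here is a minimal Python sketch (not from the slides; all numbers are invented) that computes the average squared-error loss of a candidate model over a small dataset. This sample average is exactly the "empirical risk" defined next; the true risk would be the same average taken over the entire distribution.

```python
# Minimal sketch: average loss over a finite dataset (hypothetical numbers).
# Empirical risk: R_emp(h) = (1/n) * sum_i L(h(x_i), y_i)

def squared_loss(y_pred, y_true):
    """Loss for a single example: squared difference."""
    return (y_pred - y_true) ** 2

def empirical_risk(model, xs, ys):
    """Average loss of `model` over the given (x, y) pairs."""
    return sum(squared_loss(model(x), y) for x, y in zip(xs, ys)) / len(xs)

# A candidate model/hypothesis: h(x) = 2x
h = lambda x: 2 * x

xs = [1.0, 2.0, 3.0]   # inputs (made up for illustration)
ys = [2.1, 3.9, 6.2]   # observed true outputs

print(empirical_risk(h, xs, ys))  # average loss on this dataset
```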
What exactly is Empirical Risk Minimization?
• If we compute the loss using the data points in our dataset, it is called empirical risk.
• It is "empirical" (experimental) rather than "true" because we are using a dataset that is only a subset of the whole population.
• When we build our learning model, we need to pick the function that minimizes the empirical risk, i.e. the difference between the predicted output and the actual output over the data points in our dataset.
• This process of finding the function that minimizes the empirical risk is called empirical risk minimization.
Importance of Empirical Risk Minimization
• ERM is essential to understanding the limits of machine learning algorithms and forms a good basis for practical problem-solving skills.

Empirical risk minimization (ERM)
• It is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on their performance.
• The idea is that we don't know exactly how well an algorithm will work in practice (the true "risk") because we don't know the true distribution of the data that the algorithm will work on; as an alternative, we can measure its performance on a known set of training data.
• We assume that our samples come from this distribution and use our dataset as an approximation of it.

Example of Empirical Risk Minimization
• Example: we want to build a model that can differentiate between a male and a female based on specific features.
• If we select 150 random people where the women happen to be very short and the men very tall, the model might incorrectly assume that height is the differentiating feature.
• For a truly accurate model, we would have to gather all the women and men in the world to extract the differentiating features. Unfortunately, that is not possible! So we select a small number of people and hope that this sample is representative of the whole population.
• Computing the loss on this sample gives the empirical risk, but what we really want to minimize is the true risk; the sketch below illustrates how the one approximates the other.

Training and Testing of Model
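A minimal sketch of the sampling idea from the example above, using a synthetic "population" (all numbers invented): the risk computed over the whole population stands in for the true risk, and the risk on a random sample of 150 points is the empirical risk.

```python
# Sketch: empirical risk on a small sample vs. risk on the whole "population".
# The population here is synthetic (y = 3x + noise), purely for illustration.
import random

random.seed(0)
population = [(x, 3 * x + random.gauss(0, 1)) for x in
              [random.uniform(0, 10) for _ in range(100_000)]]

h = lambda x: 3 * x   # a fixed hypothesis to evaluate

def risk(pairs):
    return sum((h(x) - y) ** 2 for x, y in pairs) / len(pairs)

sample = random.sample(population, 150)   # like selecting 150 random people

print("empirical risk (150 samples):", risk(sample))
print("'true' risk (whole population):", risk(population))
# The two numbers are close but not identical: the sample only approximates
# the population, which is why empirical risk only approximates true risk.
```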
Model (Function) Fitting
• How well a model performs on the training and evaluation datasets defines its characteristics:

                     Underfit     Overfit      Good Fit
Training Dataset     Poor         Very Good    Good
Evaluation Dataset   Very Poor    Poor         Good

Model Fitting – Visualization
Variations of model fitting
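The sketch below (not from the slides; data and polynomial degrees are invented) reproduces the pattern in the table above by fitting polynomials of different degrees with NumPy and comparing training versus evaluation error.

```python
# Sketch: reproducing the table's pattern with polynomial fits of different
# degrees. Data are synthetic (cubic + noise); degrees are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 40)
y = x**3 - 2 * x + rng.normal(0, 2, x.shape)   # hidden cubic relationship

x_train, y_train = x[:25], y[:25]              # training dataset
x_eval,  y_eval  = x[25:], y[25:]              # evaluation dataset

def mse(coeffs, xs, ys):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

for degree, label in [(1, "underfit"), (15, "overfit"), (3, "good fit")]:
    coeffs = np.polyfit(x_train, y_train, degree)
    print(f"{label:8s} train={mse(coeffs, x_train, y_train):8.2f} "
          f"eval={mse(coeffs, x_eval, y_eval):8.2f}")
# Expected pattern: underfit -> poor on both; overfit -> very good on train,
# poor on eval; good fit -> good on both (matching the table above).
```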
Errors in Machine Learning
• In machine learning, an error is a measure of how accurately an algorithm can make predictions on a previously unseen dataset.
• On the basis of these errors, we select the machine learning model that can perform best on the particular dataset.

Machine Learning Errors
• Reducible errors: these errors can be reduced to improve the model's accuracy. They can be further classified into bias and variance.
• Irreducible errors: these errors will always be present in the model regardless of which algorithm is used. Their cause is unknown variables whose influence cannot be reduced.
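These error types are related by the standard bias–variance decomposition for squared loss, a well-known result (not stated on the slides) in which f is the true function, f̂ the learned model, and σ² the irreducible noise variance:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```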
What is Bias?
• In general, a machine learning model analyses the data, finds patterns in it, and makes predictions.
• While training, the model learns these patterns in the dataset and applies them to the test data for prediction.
• While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
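A minimal sketch of bias error (all numbers invented): an intentionally too-simple constant model is evaluated on data that really follows y = 2x, so a systematic gap remains between predictions and actual values.

```python
# Sketch: bias error as the gap between predictions and actual values for a
# model that is too simple to capture the pattern (high bias).

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # actual/expected values (y = 2x)

constant_model = lambda x: 5.0     # underfit hypothesis: ignores x entirely

errors = [constant_model(x) - y for x, y in zip(xs, ys)]
print("per-point prediction - actual:", errors)
print("mean absolute bias error:", sum(abs(e) for e in errors) / len(errors))
# The systematic gap persists no matter how much data we collect: the model
# family is too simple to capture the pattern, i.e. it has high bias.
```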
What is a Variance Error?
• Variance specifies the amount by which the prediction would change if different training data were used.
• In simple words, variance tells how much a random variable differs from its expected value.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at capturing the hidden mapping between the input and output variables.
• Variance errors are classified as either low variance or high variance.
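The sketch below (synthetic data, invented parameters) estimates variance directly from its definition: fit the same one-parameter model on many resampled training sets and measure the spread of its prediction at a fixed test point.

```python
# Sketch: variance as how much predictions change across different training
# sets. We refit the same model many times and measure the spread.
import random
import statistics

random.seed(1)

def make_training_set():
    xs = [random.uniform(0, 10) for _ in range(20)]
    return [(x, 2 * x + random.gauss(0, 3)) for x in xs]   # y = 2x + noise

def fit_slope(data):
    """Least-squares slope for the model h(x) = w * x."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den

x_test = 5.0
predictions = [fit_slope(make_training_set()) * x_test for _ in range(200)]

print("mean prediction at x=5:", statistics.mean(predictions))
print("variance of predictions:", statistics.variance(predictions))
# Low variance: predictions barely move across training sets. A large spread
# would indicate high variance, a symptom of overfitting.
```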
Figure: an over-fitted model, showing model performance on (a) the training data and (b) new data.

Regularizing a Deep Network (Technique to prevent overfitting)
• Regularization is a technique which makes slight modifications to the learning algorithm such that the model generalizes better.
• This in turn improves the model's performance on unseen data.
• It reduces the complexity of the model.
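One common such modification (chosen here for illustration; the slides do not name a specific method) is L2 regularization, or weight decay: a penalty on large weights is added to the training loss, which effectively reduces the model's complexity. A minimal sketch, with an assumed penalty strength lam:

```python
# Sketch: L2 regularization (weight decay) as a "slight modification" to the
# training objective. The penalty strength lam is a hypothetical choice.

def empirical_risk(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def regularized_risk(w, data, lam=0.5):
    # Original loss plus a penalty that discourages large (complex) weights.
    return empirical_risk(w, data) + lam * w**2

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # made-up training pairs

candidates = [c / 10 for c in range(0, 41)]   # w in 0.0 .. 4.0
w_plain = min(candidates, key=lambda w: empirical_risk(w, data))
w_reg   = min(candidates, key=lambda w: regularized_risk(w, data))

print("without regularization:", w_plain)     # fits the data as closely as possible
print("with L2 regularization:", w_reg)       # pulled toward 0, a simpler model
```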