Cross-Entropy Cost Functions used in Classification
Cost functions play a crucial role in improving a Machine Learning model's performance: they are an integral part of the gradient descent algorithm, which optimizes the weights of the model. In this article, we will learn about one such cost function, the cross-entropy function, which is generally used for classification problems.
What is Entropy?
Entropy is a measure of the randomness, or uncertainty, in the probability distribution of some specific target. If you studied entropy in a physics class, you may recall that the entropy of a gas is higher than that of a solid, because the faster movement of gas particles leads to a higher degree of randomness.
What is the difference between Entropy and Cross-Entropy?
Entropy measures the randomness of a single probability distribution, but the requirement in machine learning is somewhat different: we need to compare the distribution of predicted probabilities with the true distribution of the target variable. For this, the cross-entropy function comes in handy and serves the purpose. For reference, the entropy of a random variable X is defined as:
H\left ( X \right )=\begin{cases} -\int_{x} p(x)\log p(x)\,dx, & \text{if } X \text{ is continuous}\\ -\sum_{x} p(x)\log p(x), & \text{if } X \text{ is discrete} \end{cases}
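The cross-entropy between a true distribution p and a predicted distribution q (the quantity computed in the examples later in this article) extends this idea by taking the log of the predicted probabilities instead of the true ones:

H\left ( p, q \right )=-\sum_{x} p(x)\log q(x)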
It is this difference between the predicted and the actual probability distribution of the target variable that we try to minimize, and as the value of the cost function decreases, the performance of the Machine Learning model improves.
In the formulas shown above, you will notice a negative sign. Its purpose is to make the resulting values positive, for better clarity. Without it, the result would be negative, because probabilities lie in the range 0 to 1 and the logarithm of any value in that range is negative, as can be seen from the graph below.
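To make this concrete, here is a minimal NumPy sketch (the distribution values are purely illustrative) that evaluates the discrete entropy and cross-entropy formulas and shows why the minus sign is needed:

```python
import numpy as np

# Illustrative discrete distributions over four outcomes
p = np.array([0.1, 0.6, 0.2, 0.1])   # true distribution
q = np.array([0.2, 0.5, 0.2, 0.1])   # predicted distribution

print(np.log(p))   # every entry is negative, hence the leading minus sign

entropy = -np.sum(p * np.log(p))         # H(p)
cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
print(entropy, cross_entropy)            # H(p, q) is never smaller than H(p)
```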
[Figure: range and domain of the logarithmic function]
Cross Entropy for Multi-class Classification
Now that we have a basic understanding of the theory and the mathematical formulation of the cross-entropy function, let's work through a few sample problems to get a feel for how its value is calculated.
Example 1:
Actual Probabilities:
[0, 1, 0, 0]
Predicted Probabilities:
[0.21, 0.68, 0.09, 0.10]
Cross Entropy = - 1 * log(0.68) ≈ 0.167

(Here and in the examples below, the logarithm is taken to base 10, which matches the stated results; machine learning libraries typically use the natural logarithm, which scales the values but does not change the comparison between examples.)
From the above example, we can observe that the three terms for which the actual probability is zero contribute nothing, so the value of the cross-entropy depends only on the predicted probability for the class whose actual probability is one.
Example 2:
Actual Probabilities:
[0, 1, 0, 0]
Predicted Probabilities:
[0.12, 0.28, 0.34, 0.26]
Cross Entropy = - 1 * log(0.28) ≈ 0.553
Example 3:
Actual Probabilities:
[0, 1, 0, 0]
Predicted Probabilities:
[0.05, 0.80, 0.09, 0.06]
Cross Entropy = - 1 * log(0.80) ≈ 0.097
One may say that all three examples are more or less the same, but there is a subtle difference between them: as the predicted probability gets closer to the actual probability, the value of the cross-entropy decreases, and as the predicted probability deviates from the actual probability, the value of the cross-entropy function shoots up.
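A short NumPy sketch that reproduces the three examples above (using log base 10 to match the hand calculations):

```python
import numpy as np

actual = np.array([0, 1, 0, 0])

predictions = [
    np.array([0.21, 0.68, 0.09, 0.10]),  # Example 1
    np.array([0.12, 0.28, 0.34, 0.26]),  # Example 2
    np.array([0.05, 0.80, 0.09, 0.06]),  # Example 3
]

for predicted in predictions:
    # Terms where actual == 0 vanish, so only the true class contributes
    ce = -np.sum(actual * np.log10(predicted))
    print(round(ce, 3))   # 0.167, 0.553, 0.097
```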
What is Binary Cross Entropy?
So far we have dealt with multiclass classification, where probabilities are predicted for more than two classes. Binary cross-entropy, as the name suggests, is the special case in which the number of classes is only two, so we only need to predict the probability of one class; the probability of the other is simply 1 minus the predicted probability.
The cross-entropy formula can be specialized for the binary case accordingly:
Loss=-\left( p(x)\log q(x) + \left(1 - p(x)\right)\log\left(1 - q(x)\right) \right)
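A minimal sketch of this formula applied over a batch of labels (the clipping constant eps is an illustrative choice to guard against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; y_true holds 0/1 labels, y_pred holds P(class 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return losses.mean()

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.90, 0.10, 0.80, 0.60])
print(binary_cross_entropy(y_true, y_pred))  # ~0.236
```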
Categorical Cross-Entropy
The error in classification for the complete model is given by the mean of the cross-entropy over the complete training dataset. This is the categorical cross-entropy. Categorical cross-entropy is used when the actual-value labels are one-hot encoded, meaning that only one 'bit' of data is true at a time, like [1, 0, 0], [0, 1, 0], or [0, 0, 1]. The categorical cross-entropy can be mathematically represented as:

\text{Loss}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log \hat{y}_{i,c}

where N is the number of training examples, C is the number of classes, y_{i,c} is 1 if example i belongs to class c (and 0 otherwise), and \hat{y}_{i,c} is the predicted probability of class c for example i.
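A minimal NumPy sketch of the formula above, averaged over a small illustrative batch of one-hot labels:

```python
import numpy as np

# One-hot labels for a batch of 3 examples with 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Predicted class probabilities (each row sums to 1)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])

# Mean cross-entropy over the batch (natural log, as in most libraries)
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(loss)  # ~0.424
```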
Sparse Categorical Cross-Entropy
In sparse categorical cross-entropy, truth labels are encoded as integers rather than one-hot vectors. For example, in a 3-class problem the labels would be encoded as 0, 1, and 2 (Keras expects zero-based class indices). Note that the binary cross-entropy, categorical cross-entropy, and sparse categorical cross-entropy cost functions are all provided with the Keras API.
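A brief sketch, assuming TensorFlow/Keras is installed, showing the three built-in losses mentioned above and that the one-hot and sparse variants agree on the same label:

```python
import tensorflow as tf

y_pred = [[0.1, 0.8, 0.1]]          # predicted class probabilities for one example

# The same ground-truth label in two encodings
y_true_onehot = [[0.0, 1.0, 0.0]]   # one-hot encoding of class 1
y_true_sparse = [1]                 # integer (sparse) encoding of class 1

cce = tf.keras.losses.CategoricalCrossentropy()
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(cce(y_true_onehot, y_pred)))   # ~0.223, i.e. -ln(0.8)
print(float(scce(y_true_sparse, y_pred)))  # same value, different label format

# Binary case: a single probability per example
bce = tf.keras.losses.BinaryCrossentropy()
print(float(bce([[1.0]], [[0.8]])))        # ~0.223 as well
```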