ML_Unit-5
Kinds of features
There are two broad types of data, Qualitative and Quantitative, which are further classified
into four kinds of features:
• Nominal features
• Ordinal features
• Discrete features
• Continuous features
Qualitative features
Qualitative or Categorical Data is data that cannot be measured or counted in the form of
numbers. These data are sorted by category rather than by number, which is why they are also
known as Categorical Data.
Such data may consist of audio, images, symbols, or text. The gender of a person, i.e., male,
female, or others, is qualitative data.
Qualitative data captures the perceptions of people. It helps market researchers understand
customers' tastes and then design their ideas and strategies accordingly.
Qualitative features are further classified into two kinds:
Nominal features (categorical)
A nominal feature labels variables without any order or quantitative value. The colour of hair
can be considered a nominal feature, as one colour cannot be compared with another.
Examples of nominal features:
• Colour of hair (Blonde, Red, Brown, Black, etc.)
• Marital status (Single, Widowed, Married)
• Nationality (Indian, German, American)
• Gender (Male, Female, Others)
• Eye Color (Black, Brown, etc.)
Ordinal features
Ordinal features have a natural ordering: each value occupies a position on a scale. These
features are used for observations such as customer satisfaction, happiness, etc., but we
cannot perform arithmetic on them.
Examples of ordinal features:
• Letter grades in the exam (A, B, C, D, etc.)
• Ranking of people in a competition (First, Second, Third, etc.)
• Economic Status (High, Medium, and Low)
• Education Level (Higher, Secondary, Primary)
Quantitative features (numerical)
Quantitative features are expressed as numerical values, which makes them countable and
suitable for statistical analysis. These kinds of features are also known as numerical features.
Examples of quantitative features:
• Height or weight of a person or object
• Room Temperature
• Scores and Marks (Ex: 59, 80, 60, etc.)
• Time
Continuous features
Continuous features can take fractional (real-valued) numbers.
Examples of continuous features:
• Height of a person
• Speed of a vehicle
• “Time-taken” to finish the work
• Wi-Fi Frequency
• Market share price
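To make the distinction concrete, here is a minimal sketch (hypothetical data, assuming pandas is available) with one column of each kind of feature:

```python
# Hypothetical data illustrating the four kinds of features.
import pandas as pd

df = pd.DataFrame({
    "eye_color": ["Brown", "Black", "Brown"],         # nominal: no order
    "education": ["Primary", "Secondary", "Higher"],  # ordinal: ordered categories
    "num_siblings": [0, 2, 1],                        # discrete: countable integers
    "height_cm": [172.5, 160.2, 181.0],               # continuous: fractional values
})

# Declaring the ordinal feature with an explicit category order preserves the ordering.
df["education"] = pd.Categorical(
    df["education"], categories=["Primary", "Secondary", "Higher"], ordered=True
)
print(df.dtypes)
```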
Calculations on Features:
The possible calculations on features are statistics of central tendency, statistics of dispersion
and shape statistics.
Statistics of central tendency:
● The mean or average value;
● The median, which is the middle value if we order the instances from lowest to highest feature
value;
● And the mode, which is the majority value or values.
The second kind of calculation on features is statistics of dispersion or ‘spread’.
Two well-known statistics of dispersion are the variance or average squared deviation from the
(arithmetic) mean, and its square root, the standard deviation.
Other statistics of dispersion include percentiles. The p-th percentile is the value such that p
per cent of the instances fall below it.
The skew and ‘peakedness’ of a distribution can be measured by shape statistics such as
skewness and kurtosis.
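As a brief illustration, the statistics above can be computed with NumPy and SciPy on a small hypothetical sample (the values below are made up for demonstration):

```python
# Hypothetical sample; statistic names follow the text above.
import numpy as np
from scipy import stats

x = np.array([2.0, 3.0, 3.0, 5.0, 8.0, 13.0])

# Central tendency
mean = x.mean()
median = np.median(x)
mode = stats.mode(x, keepdims=False).mode   # majority value (SciPy >= 1.9 signature)

# Dispersion
variance = x.var()                          # average squared deviation from the mean
std_dev = x.std()                           # square root of the variance
p25, p75 = np.percentile(x, [25, 75])       # 25th and 75th percentiles

# Shape
skewness = stats.skew(x)                    # asymmetry
kurt = stats.kurtosis(x)                    # 'peakedness' (excess kurtosis)

print(mean, median, mode, variance, std_dev, p25, p75, skewness, kurt)
```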
Feature transformations
Feature transformations aim at improving the utility of a feature by removing, changing, or
adding information.
Binarisation transforms a categorical feature into a set of Boolean features, one for each value
of the categorical feature.
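A minimal sketch of binarisation, assuming pandas is available; the hair-colour values are a hypothetical categorical feature:

```python
# One Boolean column per value of the categorical feature (one-hot encoding).
import pandas as pd

hair = pd.Series(["Blonde", "Brown", "Black", "Brown"], name="hair_color")
boolean_features = pd.get_dummies(hair, prefix="hair")
print(boolean_features)
# Each row is True in exactly one of hair_Black, hair_Blonde, hair_Brown.
```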
Unordering trivially turns an ordinal feature into a categorical one by discarding the ordering
of the feature values.
Thresholding and discretisation: Thresholding transforms a quantitative or an ordinal feature
into a Boolean feature by finding a feature value to split on.
Concretely, let f : X → R be a quantitative feature and let t ∈ R be a threshold; then
ft : X → {true, false} is a Boolean feature defined by ft(x) = true if f(x) ≥ t and ft(x) = false
if f(x) < t. Such thresholds can be chosen in an unsupervised or a supervised way.
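A small sketch of thresholding on hypothetical feature values, here using the median as an unsupervised choice of threshold:

```python
# Hypothetical feature values f(x); t is chosen as the median (unsupervised).
import numpy as np

f_values = np.array([1.2, 4.7, 3.3, 9.8, 0.5])
t = np.median(f_values)

f_t = f_values >= t      # f_t(x) = true if f(x) >= t, false otherwise
print(t, f_t)
```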
Discretisation transforms a quantitative feature into an ordinal feature. Each ordinal value is
referred to as a bin and corresponds to an interval of the original quantitative feature.
Unsupervised discretisation methods typically require one to decide the number of bins
beforehand. A simple method that often works reasonably well is to choose the bins so that
each bin has approximately the same number of instances: this is referred to as equal-
frequency discretisation.
Another unsupervised discretisation method is equal-width discretisation, which chooses the
bin boundaries so that each interval has the same width.
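A minimal sketch of both unsupervised discretisation methods, assuming pandas: qcut produces equal-frequency bins and cut produces equal-width bins.

```python
# Hypothetical quantitative feature discretised into 4 bins in two ways.
import pandas as pd

values = pd.Series([1, 2, 2, 3, 5, 8, 13, 21, 34, 55])

equal_freq = pd.qcut(values, q=4)      # ~same number of instances per bin
equal_width = pd.cut(values, bins=4)   # same interval width per bin

print(equal_freq.value_counts().sort_index())
print(equal_width.value_counts().sort_index())
```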
In supervised discretisation methods, we can distinguish between top–down or divisive
discretisation methods on the one hand, and bottom–up or agglomerative discretisation
methods on the other. Divisive methods work by progressively splitting bins, whereas
agglomerative methods proceed by initially assigning each instance to its own bin and
successively merging bins. In either case an important role is played by the stopping criterion,
which decides whether a further split or merge is worthwhile.
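One possible sketch of divisive (top-down) supervised discretisation: a shallow decision tree is fitted on a single feature and its split points are read off as bin boundaries. The data, labels, and the use of max_leaf_nodes as a simple stopping criterion are illustrative assumptions, not a prescribed method.

```python
# Hypothetical single feature x and binary labels y; the tree's split points
# serve as supervised bin boundaries, and max_leaf_nodes caps the number of bins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

x = np.array([1, 2, 3, 10, 11, 12, 20, 21, 22], dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

tree = DecisionTreeClassifier(max_leaf_nodes=3).fit(x, y)

# Internal nodes store the thresholds they split on; leaf nodes are marked with -2.
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)
print(thresholds)   # boundaries of the supervised bins
```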
Normalization and calibration:
Thresholding and discretisation are feature transformations that remove the scale of a
quantitative feature.
Normalisation and calibration adapt the scale of a quantitative feature, or add a scale to an
ordinal or categorical feature. If this is done in an unsupervised fashion it is usually called
normalisation, whereas calibration refers to supervised approaches taking in the (usually
binary) class labels.
Feature normalisation is often required to neutralise the effect of different quantitative features
being measured on different scales. If the features are approximately normally distributed, we
can convert them into z-scores by centring on the mean and dividing by the standard deviation.
In certain cases it is mathematically more convenient to divide by the variance instead. If we
don’t want to assume normality we can centre on the median and divide by the interquartile
range.
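A minimal sketch of both normalisation variants on a hypothetical feature containing an outlier:

```python
# Hypothetical feature with an outlier (95.0).
import numpy as np

x = np.array([10.0, 12.0, 11.0, 14.0, 95.0])

# z-scores: centre on the mean, divide by the standard deviation
z_scores = (x - x.mean()) / x.std()

# robust alternative: centre on the median, divide by the interquartile range
median = np.median(x)
iqr = np.percentile(x, 75) - np.percentile(x, 25)
robust = (x - median) / iqr

print(z_scores)
print(robust)
```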
Feature calibration is understood as a supervised feature transformation adding a meaningful
scale carrying class information to arbitrary features.
The problem of feature calibration can thus be stated as follows: given a feature F : X → F,
construct a calibrated feature Fc : X → [0,1] such that Fc(x) estimates the probability
P(⊕|v), where v = F(x) is the value of the original feature for x.
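A rough sketch of such a calibration on a hypothetical categorical feature F, estimating P(⊕|v) as the empirical fraction of positives per feature value (pandas assumed):

```python
# Hypothetical training data: F is the original feature, label 1 marks the positive class.
import pandas as pd

train = pd.DataFrame({
    "F": ["a", "a", "b", "b", "b", "c"],
    "label": [1, 0, 1, 1, 1, 0],
})

# Fc(x) = estimated P(positive | v), the fraction of positives among instances with F = v.
calibration_map = train.groupby("F")["label"].mean()
train["Fc"] = train["F"].map(calibration_map)
print(calibration_map)
```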
Feature construction and selection
Feature selection
A feature is an attribute that has an impact on the problem or is useful for solving it;
choosing the important features for a model is known as feature selection.
Below are some benefits of using feature selection in machine learning:
o It helps in avoiding the curse of dimensionality.
o It helps in the simplification of the model so that it can be easily interpreted by the
researchers.
o It reduces the training time.
o It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
There are mainly two types of Feature Selection techniques, which are:
Supervised Feature Selection technique
Supervised Feature selection techniques consider the target variable and can be used for the
labelled dataset.
Unsupervised Feature Selection technique
Unsupervised Feature selection techniques ignore the target variable and can be used for the
unlabelled dataset.
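As an illustration of the two families, the sketch below uses scikit-learn's SelectKBest (supervised, it looks at the target) and VarianceThreshold (unsupervised, it ignores the target) on a small made-up dataset:

```python
# Small made-up dataset: column 1 is nearly constant, columns 0 and 2 vary.
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X = np.array([[1.0, 0.0, 5.0],
              [2.0, 0.0, 3.0],
              [3.0, 0.1, 8.0],
              [4.0, 0.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Supervised: keep the 2 features most related to the target (ANOVA F-test).
X_sup = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Unsupervised: drop low-variance features without looking at y.
X_unsup = VarianceThreshold(threshold=0.01).fit_transform(X)

print(X_sup.shape, X_unsup.shape)
```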
Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by considering
the interaction of features while keeping the computational cost low. They are fast, like
filter methods, but more accurate.
These methods are also iterative: every iteration of model training is evaluated, and the
features that contribute most to the training in that iteration are retained. Some techniques
of embedded methods are:
o Regularization - Regularization adds a penalty term to the parameters of the machine
learning model to avoid overfitting. Because the penalty is applied to the coefficients, it
shrinks some of them to zero, and features with zero coefficients can be removed from the
dataset. Regularization techniques of this kind include L1 regularization (Lasso) and Elastic
Net (combined L1 and L2 regularization); a sketch of this follows after this list.
o Random Forest Importance - Tree-based methods provide feature-importance scores and thus a
way of selecting features. Here, feature importance specifies which features matter most in
model building or have the greatest impact on the target variable. Random Forest is such a
tree-based method: a bagging algorithm that aggregates a number of decision trees. It
automatically ranks the nodes by their performance, i.e., the decrease in impurity (Gini
impurity) over all the trees. Nodes are ordered by their impurity values, which allows the
trees to be pruned below a specific node; the remaining nodes correspond to a subset of the
most important features.
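A rough sketch of the two embedded techniques above on synthetic data; the regularisation strength and forest size are arbitrary choices for illustration:

```python
# Synthetic data: only the first two features actually influence the target.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200) > 0).astype(int)

# L1 regularization (Lasso) shrinks some coefficients exactly to zero;
# features with zero coefficients can be dropped.
lasso = Lasso(alpha=0.05).fit(X, y)
kept_by_lasso = np.flatnonzero(lasso.coef_ != 0)

# Random forest: impurity-based (Gini) importances rank the features.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

print("Lasso keeps features:", kept_by_lasso)
print("Forest importance ranking (best first):", ranking)
```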
How to choose a Feature Selection Method?
Numerical input, Numerical output:
o Pearson's correlation coefficient (for linear correlation).
o Spearman's rank coefficient (for non-linear correlation).
Numerical input, Categorical output:
o ANOVA correlation coefficient (linear).
o Kendall's rank coefficient (nonlinear).
Categorical input, Numerical output:
o Kendall's rank coefficient (linear).
o ANOVA correlation coefficient (nonlinear).
Categorical input, Categorical output:
o Chi-Squared test (contingency tables).
o Mutual Information.
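A few of the measures from the list above can be computed with SciPy; the data below is hypothetical and only meant to show the calls:

```python
# Hypothetical numerical pair (x, y) and a 2x2 contingency table of two categorical variables.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Numerical input, numerical output
print(stats.pearsonr(x, y))    # linear correlation
print(stats.spearmanr(x, y))   # rank (non-linear) correlation

# Categorical input, categorical output: chi-squared test on a contingency table
table = np.array([[20, 30],
                  [35, 15]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
```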
Feature construction
Step 1: In the Random Forest model, a subset of data points and a subset of features are
selected for constructing each decision tree. Simply put, n random records and m features are
taken from a data set having k records.
Step 2: Individual decision trees are constructed for each sample.
Step 3: Each decision tree will generate an output.
Step 4: The final output is obtained by majority voting for classification or by averaging for
regression.
For example, consider a fruit basket as the data set. A number of samples are drawn from the
fruit basket, and an individual decision tree is constructed for each sample. Each decision
tree generates its own output, and the final output is decided by majority voting: if most of
the trees predict an apple rather than a banana, the final output is taken as an apple.
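A minimal sketch of the four steps using scikit-learn's RandomForestClassifier on a synthetic dataset; the parameter values are illustrative:

```python
# Synthetic classification data; each tree sees a bootstrap sample of records
# and a random subset of features, and prediction is a majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(
    n_estimators=50,       # number of decision trees (Step 2)
    max_features="sqrt",   # random subset of m features per split (Step 1)
    bootstrap=True,        # random subset of n records per tree (Step 1)
    random_state=0,
).fit(X, y)

print(forest.predict(X[:5]))   # majority vote over the 50 trees (Steps 3-4)
```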