Feature+Encoding
Feature+Encoding
Another very important check before feeding data to Machine Learning Algorithms is to check
whether all the features are of Numerical data types or not. If the dataset has categorical features
with String data types, then the machine cannot interpret it directly, thus it needs to be converted
into numerical fields. This process of converting the String (or object) Data type features to
numerical data type features is termed as encoding.
Once we classify the categorical features present in our dataset, next step is to choose an
appropriate encoding technique.
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
2. Label Encoding:
In label encoding, we replace the categorical value with a numeric value between 0 and
the number of classes minus 1. If the categorical variable value contains 5 distinct classes,
we use (0, 1, 2, 3, and 4) as the labels.
Using the same example as before, below is what the State feature would encoded into
using label encoding –
[email protected]
18XHT46RCY
The labels (0 to 5) are assigned in the Alphabetical order of the unique values i.e. Delhi:0,
Gujrat:1 and so on.
3. Ordinal Encoding
An Ordinal Encoder is used to encode categorical features into an ordinal numerical value
(ordered set). This approach transforms categorical value to numerical value in ordered
sets. This encoding technique appears almost similar to Label Encoding. However, label
encoding would not consider whether a variable is ordinal or not, but in the case of ordinal
encoding, it will assign a sequence of numerical values as per the order of data.
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Note that if were to use Label Encoding instead, Excellent would assigned 1, Good 2 and
so on, leading to misinformation regarding the order.
Python Codes -
1. One-hot Encoding :
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
2. Label Encoding –
# Argument Categories = None to use the unique classes from the feature, and
ordered = None to ensure nominal encoding.
data['ref_city_cat'] = pd.Categorical(values =data['referral_preferred_city'] ,
categories=None, ordered=None)
3. Ordinal Encoding –
[email protected]
i) Importing the encoders from sklearn
18XHT46RCY
from sklearn.preprocessing import OrdinalEncoder
ord_encoder.fit(data_train[['referral_preferred_city']])
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
[email protected]
18XHT46RCY
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.