When working with datasets, we often encounter categorical data, which must be converted into numerical form before most machine learning algorithms can process it. For example, in a cars dataset, a column of car brands ("Toyota", "Honda", "Ford") or colors ("Red", "Blue", "Green") is categorical. One common method for this conversion is Label Encoding.
In this article, we will briefly cover the concept of label encoding along with a Python implementation.
Label Encoding
Label Encoding is a technique for converting categorical columns into numerical ones so that they can be used by machine learning models that accept only numerical data. It assigns a unique integer to each category in the data and is an important pre-processing step in a machine learning project.
Example of Label Encoding
Suppose a dataset has a column Height with the values Tall, Medium, and Short. To convert this categorical column into a numerical one, we apply label encoding to it. After encoding, the Height column contains the values 0, 1, and 2, where 0 is the label for Tall, 1 for Medium, and 2 for Short. The particular assignment shown here is illustrative; what matters is that each category gets its own integer.
Height | Encoded Height |
Tall | 0 |
Medium | 1 |
Short | 2 |
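As a rough illustration of the idea, the following sketch encodes a small Height column by hand. The helper label_encode and the sample values are hypothetical, and numbering the categories in sorted order of their names is an assumption of this sketch; it therefore yields Medium = 0, Short = 1, Tall = 2 rather than the assignment in the table above.

```python
def label_encode(values):
    """Map each distinct value to an integer and return (codes, mapping)."""
    # Sorting the distinct values makes the numbering deterministic.
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    codes = [mapping[value] for value in values]
    return codes, mapping


# Hypothetical sample column
heights = ["Tall", "Medium", "Short", "Medium", "Tall"]
codes, mapping = label_encode(heights)
print(mapping)  # {'Medium': 0, 'Short': 1, 'Tall': 2}
print(codes)    # [2, 0, 1, 0, 2]
```

Any one-to-one assignment of integers to categories is a valid label encoding; sorting is just one convenient way to make the result reproducible.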
How to Perform Label Encoding in Python
We will apply Label Encoding to the target column of the Iris dataset, species. It contains three species: Iris-setosa, Iris-versicolor, and Iris-virginica.
Python
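import numpy as np
import pandas as pd

# Load the Iris dataset (the path is relative to the working directory)
df = pd.read_csv('../../data/Iris.csv')
# List the distinct category names in the target column
df['species'].unique()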
Output:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
After applying Label Encoding with LabelEncoder(), each categorical value is replaced with an integer.
Python
from sklearn import preprocessing

# Create the encoder, then learn the categories and replace them with integers
label_encoder = preprocessing.LabelEncoder()
df['species'] = label_encoder.fit_transform(df['species'])
df['species'].unique()
Output:
array([0, 1, 2], dtype=int64)
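To see which integer corresponds to which category, and to recover the original names later, one option is the encoder's classes_ attribute together with inverse_transform. The sketch below uses a short hand-written list (an assumption, standing in for the species column) so that it runs without the CSV file.

```python
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the species column (assumed sample values)
species = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(species)

# classes_ holds the original names; the index of each name is its integer label
label_map = {str(name): code for code, name in enumerate(encoder.classes_)}
print(label_map)

# inverse_transform converts integer labels back to the original names
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted encoder around is what makes it possible to translate model predictions back into readable category names.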
Advantages of Label Encoding
1. Label Encoding is straightforward to use. It requires little preprocessing because it directly converts each unique category into a numeric value, so we don't need to create additional features or complex transformations.
For example, given the categories ["Red", "Green", "Blue"], Label Encoding simply assigns integers such as [0, 1, 2] without extra steps.
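2. Label Encoding works well for ordinal data, where the order of categories is meaningful (e.g., Low, Medium, High). When the integers are assigned in rank order, the numerical representation preserves the relationship between categories.
Example: (Low = 0, Medium = 1, High = 2), which helps the model understand their ranking or progression. Because each category maps to a single integer in a single column, the encoding is also compact and cheap to compute. Note that scikit-learn's LabelEncoder numbers categories in sorted order of their names, so a rank-preserving mapping may need to be specified explicitly.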
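The ordinal case can be sketched as follows. The column name priority and its values are hypothetical. The example contrasts LabelEncoder, which numbers categories in sorted order of their names, with an explicit dictionary mapping that follows the intended rank Low < Medium < High.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column with a natural order: Low < Medium < High
priority = pd.Series(["Low", "High", "Medium", "Low"])

# LabelEncoder sorts the names, giving High = 0, Low = 1, Medium = 2
alphabetical = LabelEncoder().fit_transform(priority)
print(list(alphabetical))  # [1, 0, 2, 1]

# An explicit mapping keeps the integers in rank order
order = {"Low": 0, "Medium": 1, "High": 2}
ranked = priority.map(order)
print(ranked.tolist())  # [0, 2, 1, 0]
```

With the explicit mapping, a larger integer always means a higher rank, which is the property the model is expected to exploit.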
Limitation of Label Encoding
For nominal data, the encoded values can imply a relationship that does not exist (e.g., Red = 0 and Blue = 2 might suggest Red < Blue), so the model may incorrectly interpret the data as ordinal. To address this, consider using One-Hot Encoding.
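As a minimal sketch of the alternative, pandas.get_dummies can one-hot encode a nominal column so that no order is implied. The color column and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical nominal column: the colors have no inherent order
colors = pd.DataFrame({"color": ["Red", "Green", "Blue", "Green"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(colors, columns=["color"], dtype=int)
print(one_hot)
```

Each row has exactly one 1, in the column for its color, so no color appears larger or smaller than another. The trade-off is one extra column per category.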
Conclusion
Label Encoding is an essential technique for preprocessing categorical data in machine learning. It's simple, efficient, and works well for ordinal data. However, be cautious of its limitations and use other encoding techniques like One-Hot Encoding when necessary.