Unit 3 Categorical - Data

This document discusses categorical data in pandas. Categorical data can take on a limited, fixed set of values and does not support numerical operations. The categorical type is useful for string variables with a small set of values, for variables whose logical order differs from their lexical order, and as a signal to libraries about the variable type. The document describes how to create categorical data using Series and the Categorical constructor. It also covers describing, renaming, adding and removing categories, and comparing categorical data.

Uploaded by

Vatsal Bhalani


L.J. Institute of Engineering & Technology Semester: V (2022) PDS (3150713)

Categorical data
Categorical variables can take on only a limited, and usually fixed, number of possible values.
Besides this fixed set of values, categorical data may have an order, but numerical operations
(such as addition or division) cannot be performed on it. Categorical is a pandas data type.

The categorical data type is useful in the following cases:

A string variable consisting of only a few different values. Converting such a string variable to a
categorical variable will save some memory.

The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By
converting to a categorical and specifying an order on the categories, sorting and min/max will
use the logical order instead of the lexical order.

As a signal to other Python libraries that this column should be treated as a categorical variable
(e.g. to use suitable statistical methods or plot types).
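
The first two cases can be illustrated with a minimal sketch. The `sizes` data and all variable names are made up for illustration, and exact byte counts depend on the pandas version, so only the comparison is printed.

```python
import pandas as pd

# Hypothetical data: 4,000 strings drawn from only three distinct values.
sizes = pd.Series(["small", "large", "medium", "small"] * 1000)

# Case 1: memory. Each category is stored once; rows hold small integer codes.
plain_bytes = sizes.memory_usage(deep=True)
category_bytes = sizes.astype("category").memory_usage(deep=True)
print(category_bytes < plain_bytes)

# Case 2: logical order. Lexically "large" < "medium" < "small",
# but the intended order is small < medium < large.
size_dtype = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
ordered_sizes = sizes.astype(size_dtype)
print(ordered_sizes.min(), ordered_sizes.max())
```

With the ordered dtype, min and max follow the declared category order rather than alphabetical order.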

Object Creation

A categorical object can be created in several ways, which are described below.

import pandas as pd

s = pd.Series(["a","b","c","a"], dtype="category")
print(s)
Its output is as follows,
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a, b, c]
Four elements are passed to the Series object, but there are only three categories.
The Categories line of the output shows this.
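
As a small sketch of why the element count and the category count differ, the `.cat` accessor exposes the distinct categories and the integer code stored for each element (variable names here are illustrative):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Four elements in the Series...
print(len(s))
# ...but only three distinct categories.
print(list(s.cat.categories))
# Each element is stored as an integer code pointing into the categories.
print(s.cat.codes.tolist())
```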

pd.Categorical
Using the standard pandas Categorical constructor, we can create a categorical object.
pandas.Categorical(values, categories=None, ordered=None)

Let’s take an example

import pandas as pd

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])

print(cat)
Its output is as follows,
[a, b, c, a, b, c]
Categories (3, object): [a, b, c]

Let’s have another example,

import pandas as pd

# The second positional argument is the list of allowed categories
cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'])

print(cat)
Its output is as follows,
[a, b, c, a, b, c, NaN]
Categories (3, object): [c, b, a]
Here, the second argument signifies the categories. Thus, any value which is not present in the
categories will be treated as NaN.
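
A minimal sketch of how the missing value shows up internally: a value outside the categories gets the code -1 and is reported as missing. The variable names are illustrative.

```python
import pandas as pd

cat = pd.Categorical(list("abcabcd"), categories=["c", "b", "a"])

# Codes index into the categories list; 'd' is not a category, so its code is -1.
print(cat.codes.tolist())
# The same position is reported as missing.
print(pd.isna(cat).tolist())
```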

Now, take a look at the following example,

import pandas as pd

cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'], ordered=True)

print(cat)
Its output is as follows,
[a, b, c, a, b, c, NaN]
Categories (3, object): [c < b < a]

Logically, the order means that a is greater than b, and b is greater than c.
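
The effect of this ordering can be sketched as follows. The example leaves out the out-of-category value 'd' to keep the result free of NaN; names are illustrative.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "a", "b", "c"],
                     categories=["c", "b", "a"], ordered=True)

# min/max follow the declared order c < b < a, not alphabetical order.
print(cat.min(), cat.max())

# Sorting a Series of this type also follows the declared order.
sorted_values = pd.Series(cat).sort_values().tolist()
print(sorted_values)
```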

describe()
Using the .describe() method on categorical data gives output similar to that of
a Series or DataFrame of type string: count, unique, top (the most frequent value) and freq.
For numeric data, describe() instead reports basic statistics such as percentiles, mean and std
of a DataFrame or Series.

import pandas as pd
import numpy as np

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])

df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]})

print(df.describe())
print(df["cat"].describe())

Its output is as follows,

       cat  s
count    3  3
unique   2  2
top      c  c
freq     2  2
count     3
unique    2
top       c
freq      2
Name: cat, dtype: object
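
As a sketch, the individual fields of the summary can be read by label. The call to value_counts() is an addition not used in the handout: on categorical data it also lists declared categories that never occur, which explains why unique is 2 although three categories exist.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])

# describe() on a categorical Series returns count, unique, top and freq.
summary = pd.Series(cat, name="cat").describe()
print(summary["count"], summary["unique"], summary["top"], summary["freq"])

# value_counts() includes the unused category "b" with a count of zero.
counts = pd.Series(cat).value_counts()
print(counts["b"])
```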

Renaming Categories
Renaming categories is done with the Series.cat.rename_categories() method, which returns a
new Series. Older pandas versions also allowed assigning new values to the
series.cat.categories property, but recent releases no longer support that assignment.
import pandas as pd

s = pd.Series(["a","b","c","a"], dtype="category")
s = s.cat.rename_categories(["Group %s" % g for g in s.cat.categories])
print(s.cat.categories)
Its output is as follows,
Index(['Group a', 'Group b', 'Group c'], dtype='object')
The initial categories [a, b, c] are replaced by the new names passed to rename_categories().
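
A short sketch of a variation: rename_categories() also accepts a dict-like mapping, in which case categories that are not mentioned keep their names. The mapping below is a made-up example.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Only "a" is renamed; "b" and "c" pass through unchanged.
renamed = s.cat.rename_categories({"a": "Group a"})
print(list(renamed.cat.categories))
print(renamed.tolist())
```

The original Series `s` is left untouched, since the method returns a new object.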

Appending New Categories

Using the Categorical.add_categories() method (available on a Series as s.cat.add_categories()), new categories can be appended.
import pandas as pd

s = pd.Series(["a","b","c","a"], dtype="category")
s = s.cat.add_categories([4])
print(s.cat.categories)
Its output is as follows,

Index(['a', 'b', 'c', 4], dtype='object')
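
A minimal sketch of why appending matters: a new category is unused at first, and only after it has been added can a value of that category be assigned. The category "d" is a made-up example, not taken from the handout.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Append the hypothetical category "d"; no element uses it yet.
s = s.cat.add_categories(["d"])
print(list(s.cat.categories))

# Now that "d" is a known category, it can be assigned to an element.
s.iloc[0] = "d"
print(s.tolist())
```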

Removing Categories
Using the Categorical.remove_categories() method, unwanted categories can be removed.
import pandas as pd

s = pd.Series(["a","b","c","a"], dtype="category")
print("Original object:")
print(s)

print("After removal:")
print(s.cat.remove_categories("a"))
Its output is as follows,
Original object:
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a, b, c]

After removal:
0 NaN
1 b
2 c
3 NaN
dtype: category
Categories (2, object): [b, c]
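
The following sketch checks the behaviour shown above and adds one related call, remove_unused_categories(), which the handout does not cover: it drops categories that no longer occur after filtering. Variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Removing a category turns its values into NaN.
removed = s.cat.remove_categories(["a"])
print(removed.isna().tolist())

# Alternative: filter the rows first, then drop categories that are no longer used.
trimmed = s[s != "a"].cat.remove_unused_categories()
print(list(trimmed.cat.categories))
```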

Comparison of Categorical Data

Comparing categorical data with other objects is possible in three cases:
• comparing equality (== and !=) to a list-like object (list, Series, array, ...) of the same
  length as the categorical data.
• all comparisons (==, !=, >, >=, <, and <=) of categorical data to another categorical
  Series, when ordered=True and the categories are the same.
• all comparisons of categorical data to a scalar.
Take a look at the following example

import pandas as pd

# astype() no longer accepts categories/ordered keywords; pass a CategoricalDtype instead
dtype = pd.CategoricalDtype(categories=[1,2,3], ordered=True)
cat = pd.Series([1,2,3]).astype(dtype)
cat1 = pd.Series([2,2,2]).astype(dtype)

print(cat > cat1)

Its output is as follows,

0    False
1    False
2     True
dtype: bool
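
As a sketch of the "categories are the same" requirement, comparing two ordered categorical Series whose category sets differ raises a TypeError. The second dtype below, with the extra category 4, is a made-up example.

```python
import pandas as pd

a = pd.Series([1, 2, 3], dtype=pd.CategoricalDtype([1, 2, 3], ordered=True))
# Hypothetical second Series with a different set of categories.
b = pd.Series([2, 2, 2], dtype=pd.CategoricalDtype([1, 2, 3, 4], ordered=True))

try:
    a > b
    raised = False
except TypeError as exc:
    # pandas refuses to compare categoricals with different categories.
    raised = True
    print("TypeError:", exc)
```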
