
SUPPLEMENT

Dr. Sachin Gupta


MCA, M.Tech, M.Phil, Ph.D
Dean (Research and Innovation)
MAIT, GGSIPU, Delhi

Dr. Bhoomi Gupta


MCA, M.Tech, M.Phil, Ph.D
Associate Professor,
MAIT, GGSIPU, Delhi

SULTAN CHAND & SONS (P) LTD
Educational Publishers
4859/24, Darya Ganj, New Delhi-110 002
Phones : 4354 6000 (100 Lines), 2324 3939
Fax : (011) 4354 6004, 2325 4295
E-mail : [email protected]
Buy books online at : www.sultan-chand.com

ISBN: 978-81-19446-83-4

Edition 2024

All rights reserved.

No part of this book may be reproduced or copied in any form or by any means (graphic, electronic or mechanical, including
photocopying, recording, taping, or information retrieval system) or reproduced on any disc, tape, perforated media or any other
information storage device, etc., without the prior written permission of the publishers. Breach of this condition is liable for legal
action. Anyone who brings information regarding any such reproduction will be handsomely rewarded.

Publication of Key to this book is strictly prohibited.

Every effort has been made to avoid errors or omissions in this publication. In spite of this, some errors might have crept in. Any
mistake, error or discrepancy noted may be brought to our notice which shall be taken care of in the next edition. It is notified that
neither the publishers nor the author or seller will be responsible for any damage or loss of action to anyone, of any kind, in any
manner, therefrom.

For faulty binding, misprints or for missing pages, etc., the publishers’ liability is limited to replacement within one month of the
purchase by a similar edition. All expenses in this connection are to be borne by the purchaser.

All disputes are subject to Delhi jurisdiction only.

Printed at: Narain Printers, Noida


SYLLABUS
ARTIFICIAL INTELLIGENCE (Code 417)
CLASS IX

Total Marks: 100 (Theory 50 + Practical 50)


Units                                                     No. of Hours      Max. Marks
                                                     (Theory and Practical)

PART A: EMPLOYABILITY SKILLS
Unit 1: Communication Skills–I                                 10                  2
Unit 2: Self-Management Skills–I                               10                  2
Unit 3: ICT Skills–I                                           10                  2
Unit 4: Entrepreneurial Skills–I                               15                  2
Unit 5: Green Skills–I                                         05                  2
Total                                                          50                 10

PART B: SUBJECT-SPECIFIC SKILLS                     Theory      Practical       Marks
                                                  (in Hours)    (in Hours)
Unit 1: AI Reflection, Project Cycle and Ethics        30            25            10
Unit 2: Data Literacy                                  22            28            10
Unit 3: Maths for AI (Statistics & Probability)        12            13            07
Unit 4: Introduction to Generative AI                  08            12            05
Unit 5: Introduction to Python                         01            09            08
Total                                                       160                    40

PART C: PRACTICAL WORK
Unit 5: Introduction to Python
• Practical File (minimum 15 programs)                                             15
• Practical Examination: any 3 programs based on the topics below                  15
  — Simple programs using input and output functions
  — Variables, Arithmetic Operators, Expressions, Data Types
  — Flow of control and conditions
  — Lists
• Viva Voce                                                                         5
Total                                                                              35

PART D: Project Work / Field Visit / Student Portfolio                             15
        (*relate it to Sustainable Development Goals)
Total                                                                              15

GRAND TOTAL                                                 210                   100

DETAILED CURRICULUM/TOPICS FOR CLASS IX


PART A: Employability Skills
S.No. Units Duration of Hours
1. Unit 1: Communication Skills–I 10
2. Unit 2: Self-Management Skills–I 10
3. Unit 3: Information and Communication Technology Skills-I 10
4. Unit 4: Entrepreneurial Skills–I 15
5. Unit 5: Green Skills–I 05
Total 50
Note: Detailed curriculum/ topics to be covered under Part A: Employability Skills can be downloaded from CBSE website.
PART B: Subject-Specific Skills
1. Unit 1: AI Reflection, Project Cycle and Ethics
2. Unit 2: Data Literacy
3. Unit 3: Maths for AI (Statistics & Probability)
4. Unit 4: Introduction to Generative AI
5. Unit 5: Introduction to Python

UNIT 1: AI Reflection, Project Cycle and Ethics


SUB-UNIT LEARNING OUTCOMES SESSION / ACTIVITY / PRACTICAL
To identify and appreciate Artificial Intelligence and describe its applications in daily life
Session: Introduction to AI and setting up the context of the curriculum
• Recommended Activity: Make a statement about lighting and LUIS will interpret and adjust the house accordingly
  https://aidemos.microsoft.com/luis/demo

AI REFLECTION
To recognize, engage and relate with the three realms of AI: Computer Vision, Data Statistics and Natural Language Processing
Recommended Activity: The AI Game
• Learners to participate in three games based on different AI domains
  — Game 1: Rock, Paper and Scissors (based on data) https://next.rockpaperscissors.ai/
  — Game 2: Semantris (based on Natural Language Processing – NLP) https://research.google.com/semantris/
  — Game 3: Quick Draw (based on Computer Vision – CV) https://quickdraw.withgoogle.com/
To identify the AI Project Cycle framework
Session: Introduction to AI Project Cycle
• Problem Scoping
• Data Acquisition
• Data Exploration
• Modeling
• Evaluation
• Deployment
AI PROJECT CYCLE
To learn problem scoping and ways to set goals for an AI project
Session: Problem Scoping
Activity: Brainstorm around the theme provided and set a goal for the AI project
• Discuss various topics within the given theme and select one.
• Fill in the 4Ws problem canvas and a problem statement to learn more about the problem identified in the community/society.
• List down/draw a mind map of problems related to the selected topic and choose one problem to be the goal for the project.
To identify stakeholders involved in the problem scoped; brainstorm on the ethical issues involved around the problem selected
Activity: To set actions around the goal
• List down the stakeholders involved in the problem.
• Search on the current actions taken to solve this problem.
• Think around the ethics involved in the goal of your project.
To understand the iterative nature of problem scoping in the AI project cycle; foresee the kind of data required and the kind of analysis to be done
Activity: Data and Analysis
• What are the data features needed?
• How will the features collected affect the problem?
• Where can you get the data?
• How frequently do you have to collect the data?
• What happens if you don’t have enough data?
• What kind of analysis needs to be done?
• How will it be validated?
• How does the analysis inform the action?
Share what the students have discussed so far
Presentation: Presenting the goal, actions and data
Teamwork Activity:
• Brainstorming solutions for the problem statement.
To identify data requirements and find reliable sources to obtain relevant data
Session: Data Acquisition
Activity: Introduction to data and its types
• Students work around the scenarios given to them and think of ways to acquire data.
Activity: Data Features
• Identifying the possible data features affecting the problem.
Activity: System Maps
• Creating system maps considering the data features identified.
To understand the purpose of Data Visualization
Session: Data Exploration/Data Visualization
• Need of visualizing data
• Ways to visualize data using various types of graphical tools
Quiz Time
To use various types of graphs to visualize acquired data
Recommended Activities: Let’s Use Graphical Tools
• Selecting an appropriate graphical format and presenting the graph sketched
• Understanding graphs using https://datavizcatalogue.com/
• Listing newly learnt data visualization techniques
• Top 10 Song Prediction: identify the data features, collect the data and convert it into a graphical representation.
• Collect and store data in a spreadsheet and create graphical representations to understand the data effectively.
To understand modeling (Rule-based & Learning-based)
Session: Modeling
• Introduction to modeling and types of models (Rule-based & Learning-based)
To understand various evaluation techniques
Session: Evaluation
Learners will understand new terms:
• True Positive
• False Positive
• True Negative
• False Negative
Challenge students to think about how they can apply their knowledge of deployment in future AI projects and encourage them to continue exploring different deployment methods.
Session: Deployment
Recommended Case Study: Preventable Blindness
Activity: Implementation of the AI project cycle to develop an AI model for Personalized Education
To understand and reflect on the ethical issues around AI
Session: Ethics
Video Session: Discussing AI Ethics
Recommended Activity: Ethics Awareness
• Students play the role of major stakeholders and decide what is ethical and what is not for a given scenario.
• Students explore Moral Machine (https://www.moralmachine.net/) to understand more about the impact of ethical concerns.
To gain awareness around AI bias and AI access
Session: AI Bias and AI Access
• Discussing possible bias in data collection
• Discussing the implications of AI technology
To let students analyze the advantages and disadvantages of Artificial Intelligence
Recommended Activity: Balloon Debate
• Students divide into teams of 3, and 2 teams are given the same theme. One team argues in favour of AI for their section while the other argues against it.
• They have to come up with points as to why AI is beneficial or harmful for society.
UNIT 2: DATA LITERACY
SUB-UNIT LEARNING OUTCOMES SESSION / ACTIVITY / PRACTICAL
Basics of Data Literacy
• To define data literacy and recognize its importance
• To understand how data literacy enables informed decision-making and critical thinking
• To apply the Data Literacy Process Framework to analyze and interpret data effectively
• To differentiate between Data Privacy and Security
• To identify potential risks associated with data breaches and unauthorized access
• To learn measures to protect data privacy and enhance data security
Session: Basics of Data Literacy
• Introduction to Data Literacy
• Impact of Data Literacy
• How to become Data Literate?
• What are data security and privacy? How are they related to AI?
• Best Practices for Cyber Security
Recommended Activity: Impact of News Articles
Reference Videos:
• https://www.youtube.com/watch?v=yhO_t-c3yJY
• https://www.youtube.com/watch?v=aO858HyFbKI
• https://www.cbse.gov.in/cbsenew/documents/Cyber%20Safety.pdf
Acquiring, Processing, and Interpreting Data
• To determine the best methods to acquire data
• To classify different types of data and enlist different methodologies to acquire it
• To define and describe data interpretation
• To enlist and explain the different methods of data interpretation
• To recognize the types of data interpretation
• To realize the importance of data interpretation
Session: Acquiring, Processing, and Interpreting Data
• Types of data
• Data Acquisition/Acquiring Data
• Best Practices for Acquiring Data
• Features of data and Data Preprocessing
• Data Processing and Data Interpretation
• Types of Data Interpretation
• Importance of Data Interpretation
Recommended Activities:
• Trend analysis
• Visualize and Interpret Data
Project: Interactive Data Dashboard & Presentation
• To recognize the importance of data visualization
• To discover different methods of data visualization
• Data visualization using Tableau
Session: Project Interactive Data Dashboard & Presentation
Reference Links:
• https://public.tableau.com/en-us/s/download
• https://www.datawrapper.de/
Video Links:
• https://www.youtube.com/watch?v=NLCzpPRCc7U
• https://www.youtube.com/watch?v=_M8BnosAD78

UNIT 3: MATHS FOR AI (Statistics & Probability)


SUB-UNIT LEARNING OUTCOMES SESSION / ACTIVITY / PRACTICAL
Importance of Maths for AI
To analyze the data in the form of numbers/images and find the relation/pattern between them; use of Maths in AI
Session: Importance of Maths for AI
• Finding patterns in numbers and images
• Uses of Maths:
  — Statistics
  — Linear Algebra
  — Probability
  — Calculus
To understand number patterns, picture analogy
Activity:
• Observe the number pattern and find the missing number.
• Find connections between sets of images and use them to solve problems.
Statistics
To understand the concept of Statistics in real life
Session:
• Definition of Statistics
• Applications:
  — Disaster Management
  — Sports
  — Disease Prediction
  — Weather Forecast
Application in various real-life scenarios Activity: Uses of Statistics in daily life
• Students will explore the applications of statistics
in real life. They collect data and can apply various
statistical measures to analyze the data.
Activity: Car Spotting and Tabulating
Purpose: To implement the concept of data collection,
analysis and interpretation.
Activity Introduction:
• In this activity, students will be engaged in data collection and tabulation.
• Data collection plays a key role in Artificial
Intelligence as it forms the basis of statistics and
interpretation by AI.
• This activity will also require students to answer a
set of questions based on the recorded data.
Probability
To understand the concept of Probability in real life and explore various types of events
Session: Introduction to Probability
• How to calculate the probability of an event
• Types of events
• Understand the concept of Probability using a relatable example
Exercise: Identify the type of event
Application in various real-life scenarios
Session: Applications of Probability
• Sports
• Weather Forecast
• Traffic Estimation
Exercise: Revision time

UNIT 4: INTRODUCTION TO GENERATIVE AI


LEARNING OUTCOMES SESSION / ACTIVITY / PRACTICAL
Students will be able to define Generative AI and classify its different kinds
Recommended Activity:
• Guess the Real Image vs the AI-generated Image
Students will be able to explain how Generative AI works and recognize how it learns
Session:
• Introduction to Generative AI
• Generative AI vs Conventional AI
Session:
• Types of Generative AI
• Examples of Generative AI
Session:
• Benefits of using Generative AI
• Limitations of using Generative AI
Applying Generative AI tools to create content; understanding the ethical considerations of using Generative AI
Recommended Activities:
• Hands-on Activity: GAN Paint
• Generative AI tools
Session:
• Ethical considerations of using Generative AI

UNIT 5: INTRODUCTION TO PYTHON


LEARNING OUTCOMES SESSION / ACTIVITY / PRACTICAL
To learn basic programming skills through gamified platforms
Recommended Activity:
• Introduction to programming using online gaming portals like Code Combat
To acquire introductory Python programming skills in a very user-friendly format
Session:
• Introduction to Python language
• Introducing Python programming and its applications
Theory + Practical: Python Basics
• Students go through lessons on Python Basics
(Variables, Arithmetic Operators, Expressions,
Comparison Operators, Logical operators,
Assignment Operators, Data Types – integer, float,
strings, type conversion, using print() and input()
functions)
• Students will try some simple problem-solving
exercises on Python Compiler.
Practical: Flow of control and conditions
1. Students go through lessons on conditional and iterative statements (if, for and while)
2. Students will try some basic problem-solving exercises using conditional and iterative statements on Python Compiler.
Practical: Python Lists
1. Students go through lessons on Python Lists (simple operations using lists)
2. Students will try some basic problem-solving exercises using lists on Python Compiler.

PART C: PRACTICAL WORK


UNIT 5: INTRODUCTION TO PYTHON: Suggested Program List
PRINT
• To print personal information like Name, Father’s Name, Class, School Name
• To print the following patterns using multiple print commands
INPUT
• To find the square of number 7
• To find the sum of two numbers 15 and 20
• To convert length given in kilometers into meters
• To print the table of 5 up to five terms
• To calculate Simple Interest if principle_amount = 2000, rate_of_interest = 4.5, time = 10
• To calculate Area and Perimeter of a rectangle
• To calculate Area of a triangle with Base and Height
• To calculate average marks of 3 subjects
• To calculate discounted amount with discount %
• To calculate Surface Area and Volume of a Cuboid
LIST
• Create a list in Python of children selected for science quiz with the following names—Arjun, Sonakshi, Vikram, Sandhya, Sonal, Isha, Kartik—and perform the following tasks on the list in sequence:
  — Print the whole list
  — Delete the name “Vikram” from the list
  — Add the name “Jay” at the end
  — Remove the item which is at the second position
• Create a list num = [23,12,5,9,65,44]
  — Print the length of the list
  — Print the elements from second to fourth position using positive indexing
  — Print the elements from third to fifth position using negative indexing
• Create a list of first 10 even numbers, add 1 to each list item and print the final list.
• Create a list List_1 = [10,20,30,40]. Add the elements [14,15,12] using the extend function. Now sort the final list in ascending order and print it.
IF, FOR, WHILE
• Program to check if a person can vote
• To check the grade of a student
• Input a number and check if the number is positive, negative or zero and display an appropriate message
• To print first 10 natural numbers
• To print first 10 even numbers
• To print odd numbers from 1 to n
• To print sum of first 10 natural numbers
• Program to find the sum of all numbers stored in a list
Important Links
• https://cbseacademic.nic.in/web_material/Curriculum21/publication/secondary/Python_Content_Manual.pdf
• https://drive.google.com/drive/folders/1qRAckDculA5i164OUFDlilxb8mT65MMb
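As a worked sample from the LIST group above, the science-quiz program (performing the four operations in sequence) might look like this:

```python
# A worked sample of the suggested LIST program: the science-quiz
# name list and the four operations, performed in sequence.
children = ["Arjun", "Sonakshi", "Vikram", "Sandhya", "Sonal", "Isha", "Kartik"]

print(children)              # print the whole list
children.remove("Vikram")    # delete the name "Vikram"
children.append("Jay")       # add the name "Jay" at the end
del children[1]              # remove the item at the second position
print(children)              # ['Arjun', 'Sandhya', 'Sonal', 'Isha', 'Kartik', 'Jay']
```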

PART D: Project Work / Field Visit / Student Portfolio (*relate it to Sustainable Development Goals)
SUGGESTED PROJECTS/ FIELD VISIT / PORTFOLIO (ANY ONE HAS TO BE DONE)
Suggested Projects
1. Create an AI Model using tools like:
   — Teachable Machine (https://teachablemachine.withgoogle.com/)
   — Machine Learning for Kids (https://machinelearningforkids.co.uk/)
2. Choose an issue that pertains to the objectives of sustainable development and carry out the actions listed below.
   — To understand more about the problem identified, create a 4Ws problem canvas.
   — Identify the data features and create a system map to understand the relationship between them.
   — Visualize the data collected graphically (spreadsheet software to be used to store and visualize the data).
   — Suggest an AI-enabled solution to it (Prototype/Research Work).
Suggested Field Visit
Visit an industry, an IT company or any other place that is creating or using AI applications and present a report on the same. The visit can be in physical or virtual mode.
Suggested Student Portfolio
Maintain a record of all AI activities and projects (for example: Letter to Future Self, Smart Home Floor Plan, Future Job Advertisement, Research Work on AI for SDGs and AI in Different Sectors, 4Ws canvas, System Map). (Minimum 5 activities)
CONTENTS
SUBJECT-SPECIFIC SKILLS
1. AI Reflection, Project Cycle and Ethics 1–26
EVALUATION AND METRICS IN MACHINE LEARNING . . . 1
DIFFERENT ML TASKS—DIFFERENT EVALUATION METRICS . . . 1
• Classification Models . . . 2
• Regression Models . . . 2
EVALUATION EXAMPLES . . . 2
• Evaluation Methods . . . 3
• ROC-AUC Curve . . . 6
DEPLOYMENT OF AI MODELS . . . 7
• Examples of AI Deployment . . . 7
• AI Deployment in Smartphones . . . 7
• Mapping the Problem to AI Project Cycle . . . 9
SOME AI APPLICATIONS . . . 12
• Plantix: An AI-based Solution . . . 15
WHAT ARE ETHICS . . . 17
• Ethics vs Morals . . . 19
• Why are ethics important . . . 20
• AI Ethics Principles . . . 20

2. Data Literacy 27–96


INTRODUCTION . . . 27
WHAT IS DATA . . . 28
• Data in Computing . . . 29
• Data vs Information . . . 30
DIKW MODEL . . . 33
• How does data influence our lives . . . 35
INTRODUCTION TO DATA LITERACY . . . 38
• Importance of Data Literacy . . . 40
• Impact of Data Literacy . . . 41
• Data Literacy Impact Stories . . . 41
• How to become Data Literate . . . 43
• Data Literacy Process Framework . . . 47
DATA SECURITY AND PRIVACY . . . 47
• Increasing Importance of Data Security and Privacy . . . 48
• Exposure to Cyber Crimes . . . 50
• Government Initiatives – Data Privacy in India . . . 51
DATA PRIVACY, SECURITY AND ARTIFICIAL INTELLIGENCE . . . 52
• Cybersecurity Best Practices . . . 54
ACQUIRING, PROCESSING AND INTERPRETING DATA . . . 55
• Where does data come from . . . 56
• What are variables . . . 57
TYPES OF DATA . . . 58
• Categorization by Data Property . . . 59
• Categorization by Organization . . . 61
• Categorization by Application . . . 62
• Types of Data used in AI Applications . . . 62
DATA ACQUISITION . . . 63
• What are Data Sources . . . 64
• Primary vs Secondary Data Sources . . . 65
• Primary Data . . . 66
• Primary Data Collection Techniques . . . 66
• Secondary Data . . . 68
• Best Practices for Data Acquisition . . . 69
• Features of Data . . . 70
• Data Preprocessing . . . 72
• Data Usability . . . 73
• Data Processing and Data Interpretation . . . 73
• Methods of Data Interpretation . . . 74
• Types of Data Interpretation . . . 78
• Importance of Data Interpretation . . . 82
PROJECT: INTERACTIVE DATA . . . 82
• Dashboard & Presentation . . . 82
• Tools for Data Dashboards and Data Presentations . . . 83

3. Mathematics for AI (Statistics & Probability) 97–128


IMPORTANCE OF MATHS IN AI . . . 97
• Applications of Mathematics in AI . . . 100
• Important Mathematical Concepts for Understanding AI . . . 100
STATISTICS . . . 101
• What is Statistics . . . 101
• Where It All Began . . . 101
UNDERSTANDING STATISTICS: THE CRICKET WAY . . . 102
• Frequency . . . 103
• Tally . . . 104
• Dot Plots . . . 104
• Applications of Statistics . . . 105
PROBABILITY . . . 113
• Probability in Data Science . . . 114
• Probability: Terminology . . . 115
• Probability: Non-Technical Terminology . . . 117
• Calculating Probability . . . 120
• Descriptive Statistics vs Inferential Statistics . . . 121
• Use of Probability in Different Applications . . . 122

4. Generative AI 129–156
INTRODUCTION . . . 129
WHAT IS GENERATIVE AI . . . 130
• Key Drivers of Generative AI . . . 131
• Evolution of Generative AI . . . 131
• Generative AI Applications . . . 133
• Generative AI: Unlimited Horizons . . . 134
GENERATIVE AI VS TRADITIONAL AI . . . 135
TYPES OF GENERATIVE AI . . . 137
• Generative Adversarial Networks (GANs) . . . 137
• Recurrent Neural Networks (RNNs) . . . 138
• Variational Autoencoders (VAEs) . . . 139
EXAMPLES OF GENERATIVE AI . . . 142
• OpenAI’s ChatGPT . . . 142
• Kidgeni . . . 142
• OpenAI SORA . . . 142
• MusicLM by Google . . . 143
• Chrome Music Lab: Song Maker . . . 143
• This Person Does Not Exist . . . 144
BENEFITS OF USING GENERATIVE AI . . . 145
LIMITATIONS OF USING GENERATIVE AI . . . 146
HOW TO USE GENERATIVE AI TOOLS IN REAL-WORLD SCENARIOS . . . 146
• Socially Beneficial Uses of Generative AI . . . 148
ETHICAL CONSIDERATIONS IN USING GENERATIVE AI . . . 150
POTENTIAL NEGATIVE IMPACT ON SOCIETY . . . 150
• Energy Usage Concerns . . . 151
RESPONSIBLE USE OF GENERATIVE AI . . . 151
• Responsible Use of AI for Students . . . 152

Answers to Objective Type Questions 156


1. AI Reflection, Project Cycle and Ethics

EVALUATION AND METRICS IN MACHINE LEARNING


The term ‘metrics’ refers to measurements or standards used to quantify and evaluate
something.

In machine learning, evaluation and metrics are important to analyze how well a model performs. These methods help us understand the accuracy and generalizability (how correctly the model will work on new, unseen data) of ML models. While evaluation is the process of assessment, metrics are quantitative measures used to assess the performance of ML models.
We perform the evaluation of ML models to objectively assess their performance. Evaluation
metrics provide us with concrete measures to understand how well our models are performing
and whether they need improvement. We may also say that evaluation metrics provide
numerical feedback on how well a model is performing in solving a specific task such as
classification, regression, clustering or anomaly detection.

DIFFERENT ML TASKS—DIFFERENT EVALUATION METRICS


Machine Learning models can be broadly categorized into two types: Classification Models
and Regression Models.

 This chapter contains new topics only.


Classification Models
Classification models are those models whose output is a predicted class or category
instead of a numerical value.
Some examples of classification models include:
• A model that classifies the language of an input sentence (French, Spanish, etc.).
• A model that classifies the tree species of a given plant based on images (Apple Fuji,
Golden Delicious, etc.).
• A model that classifies whether a medical condition is present or not (positive or
negative).


Regression Models
Regression models output numerical values such as predicting a sales figure or a housing
price. The key distinction is that classification is about discrete categories while regression
deals with continuous quantities.
• For example, approximating the distance a car will travel with a certain quantity of fuel
is a prediction we can make on the basis of Regression.

EVALUATION EXAMPLES
In a classification task where we distinguish between apples and oranges based on features
like color and size, a sample evaluation metric could be accuracy, which tells us what
percentage of input fruits is correctly classified out of all the fruits in the dataset.
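The accuracy calculation just described can be sketched in a few lines of Python; the fruit labels below are invented for illustration, not taken from the text:

```python
# Accuracy = correctly classified samples / total samples.
# Illustrative labels for the apples-vs-oranges example.
actual    = ["apple", "orange", "apple", "apple", "orange"]
predicted = ["apple", "orange", "orange", "apple", "orange"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy: {accuracy:.0%}")  # 4 of 5 correct, i.e., 80%
```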

2 SUPPLEMENT—Decoding Artificial Intelligence–IX


If we train a model to predict house prices based on features like location and size, evaluation
metrics help us determine how close the predicted prices are to the actual prices, allowing us
to improve our model for better predictions.

Evaluation Methods
1. Train-Test Split
This is the most basic evaluation method in machine learning. The dataset is divided
into two parts: a training set and a testing set. The model is trained on the data from the
training set and is then evaluated for performance on the data from the testing set.
Example: If you have a dataset of 1,000 housing price records, you might use 80% of the data (800 records) for training and 20% (200 records) for testing. After training the model on 800 housing price records, you test the trained model’s performance on the remaining 200 records to evaluate how accurate the price prediction is for any house in the testing dataset.
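The 80/20 split in the example above can be sketched as follows, with placeholder numbers standing in for real housing records:

```python
import random

# Placeholder "records": 1,000 numbers standing in for housing data.
records = list(range(1000))
random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(records)  # shuffle first so the split is not biased by ordering

split_point = int(len(records) * 0.8)
train_set = records[:split_point]     # 800 records for training
test_set  = records[split_point:]     # 200 records for testing
print(len(train_set), len(test_set))  # 800 200
```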

2. Cross-Validation
Cross-validation is an advanced evaluation method. When we have a limited amount of data, the dataset is divided into k equal parts (folds). The model is trained k times, each time using a different fold as the testing set and the remaining folds as the training set. The average performance across all k tests gives a more reliable estimate of the model’s performance.
Example: In 5-fold cross-validation, the dataset is split into 5 parts. The model is trained 5 times, each time using 4 parts for training and the remaining one part for testing. The results from the 5 tests are averaged to give a final performance metric.

3. Confusion Matrix
The confusion matrix is used for classification problems. In a classification model, the machine learning algorithm is trained and expected to predict the correct class for the input provided. The confusion matrix shows the number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) predictions made by the model. From this matrix, several performance metrics can be derived.



We will understand the confusion matrix with an example.

Look at the following images. Can you tell which of these are dogs and which are
not?


Let us create two classifications—‘Dog’ and ‘Not Dog’—as an example.
Possibility: The model can make correct predictions and may classify the provided input
correctly. There may be two such cases:
1. The model may predict the actual class correctly, i.e., the classifier between ‘Dog’ and
‘Not Dog’ images correctly classifies a dog image provided as input.

Input Image → ML Model → Classification Output: Dog
Actual Class: Dog | Predicted Class: Dog
True Positive (TP): The model correctly predicts a positive class.
2. The model may predict the actual negative class correctly, i.e., the classifier between
‘Dog’ and ‘Not Dog’ images correctly classifies a not-dog image provided as input.

Input Image → ML Model → Classification Output: Not Dog
Actual Class: Not Dog | Predicted Class: Not Dog
True Negative (TN): The model correctly predicts a negative class.
Alternative Possibility: If the model is not trained on enough examples or is trained on
poor-quality data examples, it may get confused between the overlapping features of the
classes to be predicted and make some errors.
3. The model may incorrectly predict a positive class, i.e., the classifier between ‘Dog’ and
‘Not Dog’ images incorrectly classifies a not-dog image provided as input.

Input Image → ML Model → Classification Output: Dog
Actual Class: Not Dog | Predicted Class: Dog
False Positive (FP): The model incorrectly predicts a positive class.
4. The model may incorrectly predict a negative class, i.e., the classifier between ‘Dog’ and
‘Not Dog’ images incorrectly classifies a dog image provided as input.

Input Image → ML Model → Classification Output: Not Dog
Actual Class: Dog | Predicted Class: Not Dog
False Negative (FN): The model incorrectly predicts a negative class.
We may summarize the components of a confusion matrix for a binary classification problem
as:
• True Positive (TP): The model correctly predicts a positive class.
• True Negative (TN): The model correctly predicts a negative class.
• False Positive (FP): The model incorrectly predicts a positive class.
• False Negative (FN): The model incorrectly predicts a negative class.



These values are presented in a 2 × 2 matrix called the Confusion Matrix, as illustrated below.

                                    ACTUAL VALUES
                               Positive            Negative

PREDICTED      Positive        True Positive       False Positive
VALUES         Negative        False Negative      True Negative
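From these four counts, common metrics can be derived. Here is a minimal sketch with illustrative counts; note that precision is a standard metric beyond the four terms defined in the text:

```python
# Deriving common metrics from confusion-matrix counts.
# The counts below are illustrative, not taken from the text.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # all correct / all samples
precision = TP / (TP + FP)                   # correct positives / predicted positives
recall    = TP / (TP + FN)                   # correct positives / actual positives

print(accuracy, round(precision, 3), recall)  # 0.85 0.889 0.8
```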

Do it Yourself
CONFUSION MATRIX
Consider a Spam Classification ML algorithm which has the following performance: 50
emails correctly identified as spam, 10 emails incorrectly identified as spam (i.e., they are
actually not spam), 5 emails incorrectly identified as not spam (i.e., they are actually spam)
and 35 emails correctly identified as not spam. Count the values of TP, TN, FP and FN for
the above statement.

                                    Actual Class
                               Spam            Not Spam
Predicted Class    Spam        TP =            FP =
                   Not Spam    FN =            TN =

ROC-AUC Curve

ROC Curve
The Receiver Operating Characteristic (ROC) curve is a graphical representation used to evaluate the performance of a binary classification model. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The true positive rate and false positive rate are calculated from the confusion matrix components as:
• True Positive Rate (TPR): Also known as Recall or Sensitivity; it is calculated as TPR = TP / (TP + FN).
• False Positive Rate (FPR): It is calculated as FPR = FP / (FP + TN).
[Figure: ROC curves plotted as true positive rate vs false positive rate. A perfect classifier reaches the top-left corner, a random classifier lies along the diagonal, and curves closer to the top-left are better while curves closer to the diagonal are worse.]

6 SUPPLEMENT—Decoding Artificial Intelligence–IX


AUC Curve
AUC (Area Under the ROC Curve) measures the entire two-dimensional area underneath the
ROC curve, providing a single numeric value to summarize the performance of the classifier.

AUC Values
• AUC = 1: Perfect model
• 0.9 ≤ AUC < 1: Excellent model
• 0.8 ≤ AUC < 0.9: Good model
• 0.7 ≤ AUC < 0.8: Fair model
• 0.6 ≤ AUC < 0.7: Poor model
• 0.5 ≤ AUC < 0.6: Failed model

[Figure: The AUC is the area under the ROC curve, plotted as TPR against FPR.]
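
The TPR and FPR formulas and the idea behind AUC can be sketched in plain Python. The scores below are made-up model outputs used only for illustration; each distinct score is tried as a threshold, and the area under the resulting points is approximated with the trapezoidal rule:

```python
# Minimal sketch (hypothetical scores): sweep thresholds, compute (FPR, TPR)
# points, and approximate AUC with the trapezoidal rule.
def roc_point(actual, scores, threshold):
    tp = fn = fp = tn = 0
    for a, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if a == 1 and predicted == 1:
            tp += 1
        elif a == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)   # (FPR, TPR)

actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]   # made-up model outputs

thresholds = [1.1] + sorted(set(scores), reverse=True)
points = sorted(roc_point(actual, scores, t) for t in thresholds)
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 3))   # -> 0.875, a 'good' model on the scale above
```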
DEPLOYMENT OF AI MODELS
Deployment in AI refers to the process of taking a developed and tested AI model and
making it available for use in real-world applications. It is the final step in the AI project
cycle, where the model moves from the development environment to being actively used by
people or systems.

Key Steps in the Deployment Process


1. Testing and Validation: Before deployment, the AI model is tested to ensure it works
well with real-world data. This step helps identify and fix any errors or issues.
2. Integration with Existing Systems: This step involves connecting the AI model to the
user systems. For example, it may be added to a website, a mobile app or a software.
3. Monitoring and Maintenance: After deployment, the AI model is monitored to make
sure it performs well over time. This involves checking for any unusual behaviour or
drops in performance. Regular updates and improvements are made to the model to
keep it accurate.
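
The monitoring step above can be sketched as a thin wrapper around a deployed model. Everything here (class names, window size, alert threshold) is illustrative, not from the text: the wrapper records whether each prediction turned out to be correct and raises a flag when accuracy over a recent window drops, signalling that maintenance is needed.

```python
# Minimal monitoring sketch: track recent accuracy of a deployed model.
from collections import deque

class MonitoredModel:
    def __init__(self, model, window=100, alert_below=0.80):
        self.model = model                  # any object with a .predict() method
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.alert_below = alert_below

    def predict(self, sample):
        return self.model.predict(sample)

    def record_outcome(self, was_correct):
        self.recent.append(1 if was_correct else 0)

    def needs_attention(self):
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough feedback yet
        return sum(self.recent) / len(self.recent) < self.alert_below

class AlwaysDog:                            # stand-in for a real trained model
    def predict(self, sample):
        return "Dog"

monitor = MonitoredModel(AlwaysDog(), window=5, alert_below=0.8)
for correct in [True, True, False, False, False]:
    monitor.record_outcome(correct)
print(monitor.needs_attention())   # -> True: accuracy fell to 0.4 over the window
```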

Examples of AI Deployment
• Self-Driving Cars: AI models are deployed in self-driving cars to help them navigate
roads, recognize traffic signs and avoid obstacles.
• Medical Diagnosis Systems: In healthcare, AI models assist doctors by analyzing
medical images to detect diseases and recommend treatments.
• Chatbots: Many websites and apps use AI-powered chatbots to answer customer
questions, provide information and help with tasks.

AI Deployment in Smartphones
AI deployment is quite common nowadays in many smartphone features and apps that we
use regularly.



The following are some features you may be already using:
1. Voice Assistants: These AI-powered assistants can answer questions, set reminders,
send messages and control smart home devices through voice commands. Examples:
Siri (Apple), Google Assistant (Android) and Alexa (Amazon).
2. Camera and Photography: AI algorithms detect and adjust settings for different
scenes (e.g., landscape, portrait or night mode). AI improves image quality by adjusting
brightness, contrast and colors automatically.
3. Facial Recognition: AI is used to recognize and authenticate users based on their facial
features for secure unlocking and payment verification. Examples: Face ID (Apple) and
various Android facial recognition systems.
4. Predictive Text and Autocorrect: AI predicts the next word or suggests entire phrases
based on the user’s typing habits and context. Examples: Smart Compose (Google) and
Smart Keyboard.
5. Personalized Recommendations: AI algorithms analyze user behaviour and preferences
to suggest content based on individual choices. Examples: Music apps (Spotify, Apple
Music) and video apps (YouTube, Netflix).

Case Study
CROP DISEASE DETECTION AI PLATFORM
Problem: How to Improve Disease Detection Efficiency to Prevent Crop Loss. Agriculture is
a critical industry and crop diseases can lead to significant losses in quality and yield. Crop
diseases can destroy fields, leading to food scarcity and huge losses to farmers.
Challenges:
• Lack of access to expert agronomists in rural and remote areas.
• Delays in diagnosing diseases can result in widespread crop damage.
• Visual symptoms of diseases, such as leaf spots or wilting, can be easily missed.
Example: Early blight in tomatoes affects the tomato crop
adversely. The symptoms of disease can be identified with the
help of AI using leaf images.
• Normal Leaf: Healthy, green leaves
• Diseased Leaf: Leaves with brown spots and yellowing
AI Solution Implementation: An AI-based crop disease detection platform can be developed
in collaboration with agricultural universities and tech companies. AI models are trained to
achieve high accuracy—comparable to expert agronomists—in detecting various crop diseases.
Deployment: The AI model for disease detection can be deployed in a mobile application. This
will allow farmers to click leaf pictures, which can then be inspected by AI to check for a disease
and suggest remedial measures.
How it works:
• Image Collection: Farmers or technicians click pictures of crop leaves using smartphones.
• AI Analysis: The digital images are analyzed by AI to detect signs of disease.
• Real-Time Feedback: The platform provides immediate feedback and suggests treatment.
This AI application helps in quick identification of crop diseases. Moreover, with a mobile app
deployment, even farmers without access to experts can use this technology.



Mapping the Problem to AI Project Cycle
1. Problem Scoping: Develop an AI system that can accurately diagnose crop diseases from
images, providing real-time feedback to farmers.
2. Data Acquisition
• Sources: Collect images of healthy and diseased crops from various farms and
agricultural research centres.
• Tools: Use smartphones and cameras for capturing high-quality images.
• Good-Quality Data: Ensure the dataset includes images of different crops and
various stages of disease progression.
3. Data Exploration
• Validation: Check the images to ensure that the dataset we are building is useful
and can help in disease identification.
• Analysis: Explore the dataset to understand the common features of different diseases.
• Preprocessing: Prepare the data for model training by normalizing and augmenting
images.
4. Modelling
• Model Selection: Choose suitable AI models for image classification.
• Training: Train the models using labelled dataset, ensuring they learn to distinguish
between healthy and diseased crops.
5. Evaluation
• Testing: Evaluate the models on a separate testing dataset to measure accuracy.
• Fine-Tuning: Fine-tune the models based on evaluation results to achieve the
desired performance.
6. Deployment: Integrate the AI model into a user-friendly mobile app. Ensure the
platform is easily used by farmers, especially in rural and remote areas. Provide
training and resources to farmers on how to use the AI tool effectively.
This AI project cycle demonstrates how a crop disease detection platform can be developed
and deployed to help farmers detect crop diseases early, prevent loss and improve
agricultural productivity.

AI Project Cycle Mapping Template
• Data Acquisition: Use smartphones and cameras for capturing high-quality images of
  healthy and diseased plant leaves.
• Data Exploration: Check whether the images are useful and can help in disease
  identification. Explore the dataset to understand the common features of different diseases.
• Modelling: Choose suitable AI models for image classification which can classify healthy
  and diseased plants.
• Evaluation: Evaluate the models on a separate testing dataset to measure accuracy.
• Deployment: Integrate the AI model into a user-friendly mobile app. Ensure the platform
  is easily used by farmers, even in rural and remote areas.
Activity 5
Creating an AI Model for Early Blight Detection in Tomatoes using Teachable Machine
In this activity, you will learn how to create an AI model to detect early blight in tomato leaves using
Teachable Machine, a user-friendly platform for training machine learning models. By the end of
this activity, you will have a functional AI model that can distinguish between healthy tomato leaves
and those affected by early blight.
Materials Needed
• Computer with internet access
• Teachable Machine account (you can sign up for free)
• Dataset of images of healthy and diseased (early blight) tomato leaves. You can access the
same by opening the link https://bit.ly/tmdataset in your web browser.
Steps
1. Download Dataset: Download the dataset of tomato leaf images from the given
link. Ensure you have separate folders for healthy and diseased leaves.
2. Access Teachable Machine: Open your web browser and search for the official
website of Teachable Machine. Click on ‘Get Started’ to begin.


3. Set Up Your Project: Choose ‘Image Project’ from the available project types. Click on ‘Standard
Image Model’.



4. Create Classes for Healthy and Diseased Leaves: You will see two classes named
‘Class 1’ and ‘Class 2’ by default. Rename them to ‘Healthy’ and ‘Diseased’, respectively, as
shown in the given image.


5. Upload Images: For the ‘Healthy’ class, click on the ‘Upload’ button and select the images of
healthy tomato leaves from the downloaded dataset. Repeat this process for the ‘Diseased’
class by uploading the images of leaves affected by early blight.
6. Train the Model: Once all the images are uploaded, click on the ‘Train Model’ button. The
platform will start training your model using the uploaded images. This may take a few minutes
depending on the size of your dataset.


7. Test the Model: After the training is complete, you can test your model using the
‘Test Model’ section. Upload new images of tomato leaves to see if the model can correctly
classify them as healthy or diseased.



8. Export the Model and Deploy: If you are satisfied with the performance of your
model, you can export it for deployment in your mobile application. You can also
upload it to the cloud to deploy it as a cloud application. A sample model has been deployed
on the cloud for you to use. To access it, open the link https://bit.ly/tmdeployed in your web browser.


Discussion Questions
1. What challenges did you encounter while training the model?
2. Can you create a confusion matrix for the model?
3. How can the accuracy of the model be improved?
4. How can this AI model be useful for farmers in real-world scenarios?

SOME AI APPLICATIONS
Face Lock in Smartphones: Face lock is a security feature that
uses facial recognition technology to unlock smartphones.
Instead of typing a password or using a fingerprint, users
simply have to look at their devices to gain access.
How It Works:
1. Camera Capture: The phone’s front camera captures an
image of the user’s face.
2. Facial Features Analysis: AI algorithms analyze unique
facial features such as the distance between the eyes,
nose shape or jawline.
3. Comparison: AI compares the captured image with the stored facial data from the
phone’s setup phase.
4. Unlocking: If the features match, the phone unlocks.

Such AI systems provide users with quick and easy access to their devices, without relying on
passwords. They also make it harder for unauthorized users to gain access as compared to
conventional device unlocking systems.
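
The comparison step above is often implemented by reducing each face to a numeric feature vector (an "embedding") and measuring how similar two vectors are. A minimal sketch follows, with purely hypothetical numbers and a hypothetical similarity threshold:

```python
# Minimal sketch: unlock when the captured face vector is close enough
# (by cosine similarity) to the vector stored during setup.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def unlock(stored, captured, threshold=0.95):
    return cosine_similarity(stored, captured) >= threshold

stored   = [0.21, 0.54, 0.80, 0.13]   # saved during phone setup (made-up values)
owner    = [0.20, 0.55, 0.79, 0.14]   # same face, slightly different photo
stranger = [0.90, 0.10, 0.20, 0.70]   # a different face

print(unlock(stored, owner))      # -> True
print(unlock(stored, stranger))   # -> False
```

Real systems use much longer vectors produced by neural networks, but the matching idea is the same.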



Fraud Risk Detection: Fraud risk detection uses AI to identify potentially fraudulent activities,
especially in financial transactions like credit card purchases or
online banking.
How It Works:
1. Data Collection: AI systems collect data from transactions,
including amount, location and time.
2. Pattern Recognition: AI analyzes transaction patterns to
establish what is normal behaviour for each user.
3. Anomaly Detection: When a transaction deviates from these
patterns (e.g., unusually large amounts or purchases in a foreign country), AI flags it as
potentially fraudulent.
4. Alerting: The system then alerts the user or the bank for further verification.

Such AI systems help prevent financial losses due to fraud. They are designed to quickly
process large volumes of transactions to detect suspicious activity in real time.
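
The anomaly-detection step can be illustrated with a simple statistical rule: flag any amount that lies many standard deviations away from the user's usual spending. The history values and the 3-standard-deviation limit below are illustrative assumptions; production systems use far more sophisticated models.

```python
# Minimal sketch: flag a transaction whose amount deviates strongly
# from the user's normal spending pattern (z-score rule).
import statistics

def is_suspicious(history, amount, z_limit=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = abs(amount - mean) / stdev      # how many std-devs from "normal"
    return z > z_limit

history = [120, 90, 150, 110, 130, 100, 140, 95]   # made-up usual purchases

print(is_suspicious(history, 125))     # -> False: a typical amount
print(is_suspicious(history, 5000))    # -> True: flagged for verification
```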

Medical Imaging: Medical imaging involves the use


of AI to analyze medical images, such as X-rays, MRIs
and CT scans, to assist doctors in diagnosing diseases.
How It Works:
1. Image Acquisition: Medical images are captured
using various imaging technologies.
2. AI Analysis: AI algorithms process these images
to identify abnormalities such as tumors,
fractures or infections.
3. Comparison and Detection: AI compares the current image with large databases of
medical images to detect signs of disease.
4. Diagnosis Assistance: The AI system provides suggestions or highlights areas of
concern for doctors to review.
Such AI systems improve the accuracy and speed of diagnoses and help in early detection of
diseases, which ultimately leads to better treatment outcomes.

Problem: Plant Diseases and Nutrient Deficiencies Affect Crop Yields


In regions where maize is a staple crop, farmers face
significant losses due to crop pests, diseases like
maize lethal necrosis, and deficiencies in essential
nutrients like nitrogen and potassium. These issues
lead to reduced yields and economic hardships for
small-scale farmers who lack access to advanced
agricultural resources.



Can AI help solve this problem? How?
To watch a video on the given problem, scan the given QR code or open the link
https://www.youtube.com/watch?v=LP_A4jydmz4 in your web browser. Do you think
AI can help solve this problem? Can AI systems also recommend what nutrients are
lacking or how to identify and prevent diseases in maize crops?

Planning the AI Solution: Start by listing essential factors for maize crop health and
productivity.
This system aims to:
• Detect early signs of maize lethal necrosis and nutrient deficiencies.
• Recommend appropriate actions such as adjusting fertilizer application or
implementing disease-resistant varieties.
• Provide real-time alerts and guidance to farmers through a user-friendly
mobile application.
(Add other outcomes you think are needed.)
• ............................................................................................................................................................................
• ............................................................................................................................................................................

Problem Scoping
Define the scope of the problem to be addressed by your project. Identify various diseases
and nutrient deficiencies affecting maize crops, along with their specific symptoms and
impact on yield.

Data Acquisition
Collect the following data:
• Images of diseased maize plants and symptoms
• Farmer details including location, farm size and cropping practices
• Soil nutrient levels and historical yield data
Ensure data accuracy and reliability for effective decision-making in crop management.



Data Cleaning
Standardize and clean the acquired data to eliminate duplicates, correct errors and fill any
missing information. This step improves data quality and usability for subsequent analysis.

Data Exploration
Analyze the cleaned data to identify patterns in disease outbreaks, nutrient deficiencies and
crop performance. This exploration aids in understanding the underlying factors influencing
maize health.

Modelling
Select AI algorithms best suited for image recognition and data analysis. Develop models that
can accurately identify disease symptoms and nutrient deficiencies from images and other
data inputs.

Testing and Evaluation


Evaluate the performance of the AI model through rigorous testing using diverse datasets.
Measure the accuracy of the model in detecting diseases and recommending appropriate
interventions.

Deployment
Deploy the AI-driven maize health monitoring system in a mobile application. Ensure
accessibility to farmers, providing timely insights and actionable recommendations.

Plantix: An AI-based Solution


Plantix (https://plantix.net/en/) is an AI-powered mobile application designed to assist
farmers in diagnosing plant diseases, nutrient deficiencies and pest infestations. It
further recommends treatment and management of the crop accordingly.

How Plantix Works:


1. Image Capture: Farmers use their smartphones to take pictures of the affected plants.
The images should clearly show the symptoms—discolored leaves, spots or deformities.
2. Image Upload: The captured images are then uploaded on the Plantix app. The app uses
advanced image recognition technology to analyze the photos.



3. AI Analysis: Plantix employs AI algorithms trained on a vast database of plant images,
diseases and symptoms. The AI system compares the uploaded images with its database
to identify potential issues. The system recognizes various plant diseases, pest damage
and nutrient deficiencies by analyzing visual patterns and symptoms.
4. Diagnosis: Once the analysis is complete, Plantix provides a diagnosis of the problem.
It identifies the specific disease or deficiency affecting the plant. The app also gives
detailed information about the identified issue, including its causes, symptoms and
possible impact on the crop.
5. Recommendations: Based on the diagnosis, Plantix offers tailored recommendations for
treatment and management. These recommendations may include:
• Appropriate pesticide or fungicide usage
• Fertilizer applications to correct nutrient deficiencies
• Cultural practices to prevent further spread of the disease
The app also advises on the optimal timing and method for applying treatments to
ensure effectiveness.
6. Monitoring and Follow-up: Plantix allows farmers to monitor their crops over time
by regularly capturing and uploading new images. This feature helps in tracking the
progress of treatments and ensuring that the recommended actions are effective.

Impact on Farmers

By using Plantix, farmers gain easy access to expert-level diagnostics and advice. This helps
them to take timely and informed actions to protect their crops from diseases and nutrient
deficiencies. As a result, they can improve crop yields, reduce losses and increase their overall
agricultural productivity.



WHAT ARE ETHICS
Ethics refers to the principles and guidelines that help us determine what is right or wrong
in various situations. It involves making decisions and taking actions that align with values
such as fairness, honesty, respect and responsibility.

FUN To watch an interesting reference video on ethical scenarios, open the link
TIME https://www.youtube.com/watch?v=nyTmeb4vFqE in your web browser or scan
the given QR code.

Ethical Scenario I: Food Quality in a Restaurant


Scenario: Imagine you manage a fast-food restaurant. On a busy day, you notice that some
of the ingredients are past their expiration date but your supervisor insists on using them to
avoid wastage.
Discussion Questions:
1. Would you use the expired ingredients to prepare food?
............................................................................................................................................................................
2. Why would you do that?
............................................................................................................................................................................
3. What are the potential consequences of using the expired ingredients?
............................................................................................................................................................................
4. What are the potential consequences of refusing to use the expired ingredients?
............................................................................................................................................................................

Ethical Scenario II: Data Privacy


Scenario: Imagine you are developing a new mobile application that collects user data to
improve its services. You realize that disclosing the full extent of data collection could deter
users from downloading the app.
Discussion Questions:
1. Would you disclose all data collection practices to the users?
............................................................................................................................................................................
2. Why would you do that?
............................................................................................................................................................................
3. What are the potential consequences of full transparency about data collection?
............................................................................................................................................................................
4. What are the potential consequences of not fully disclosing data collection practices?
............................................................................................................................................................................
You will get a basic idea about the concept of ‘ethics’ by honestly answering the questions in
the given scenarios. Each decision that we make is guided by some ethical responsibilities and
moral values.
Moral Scenario I: Honesty in Academics
Scenario: Imagine you are a high school student. You have an important test coming up
and you find out that a friend has access to the test answers. Your friend offers to share the
answers with you.
Discussion Questions:
1. Would you accept the answers from your friend?
............................................................................................................................................................................
............................................................................................................................................................................
2. Why would you do that?
............................................................................................................................................................................
............................................................................................................................................................................
3. What are the potential consequences of accepting the test answers?

............................................................................................................................................................................
............................................................................................................................................................................
4. What are the potential consequences of refusing to use the test answers?

............................................................................................................................................................................
............................................................................................................................................................................

Moral Scenario II: Responsibility in Group Projects


Scenario: Imagine you are part of a group project in school. One of the group members,
instead of contributing, is relying on everyone else to complete the work. You know that
informing the teacher could get your group member in trouble but you also want a fair
distribution of work.
Discussion Questions:
1. Would you talk to the teacher about the group member not contributing?
............................................................................................................................................................................
............................................................................................................................................................................
2. Why would you do that?
............................................................................................................................................................................
............................................................................................................................................................................
3. What are the potential consequences of informing the teacher?

............................................................................................................................................................................
............................................................................................................................................................................



4. What are the potential consequences of not informing the teacher?

............................................................................................................................................................................
............................................................................................................................................................................

Ethics vs Morals
Both ethical and moral questions can be challenging for humans to answer—they present
different types of difficulties based on their nature.
Ethical questions: Focus on what society says is right or wrong. They involve rules and laws,
and affect groups of people.
Moral questions: Focus on personal beliefs about good and bad. They are based on individual
choices and affect an individual and their close circle.
A comparison table to illustrate the differences and challenges associated with ethical and
moral questions is presented below:

Aspect             | Ethical Questions                                          | Moral Questions
Based on           | Society’s rules and laws                                   | Personal beliefs and values
Guiding Principles | Decide what is right or wrong for groups and professions   | Decide what is good or bad for individuals
Variability        | Can vary between societies and professions                 | Can vary widely between individuals and cultures
Impact             | Affect many people and have broad implications             | Affect an individual and their close circle

Paired Examples

Ethical Questions | Moral Questions
Using AI in Hiring: Should a company use AI to screen job applicants? | Fair Treatment of Candidates: Is it fair to judge someone based on an algorithm without meeting them?
Data Privacy: Should a social media platform collect and sell user data? | Honesty and Transparency: Is it right to share personal information without the user’s consent?
Food Quality Control: Should a restaurant serve food that is beyond its expiration date? | Personal Responsibility: Is it okay to serve food that you would not eat yourself?
The above table presents the questions of ethics and morality in pairs and it is important to
understand the minor differences between these domains.

Do it Yourself
1. Parneet takes home the silverware from a restaurant after dinner, which is considered
theft. Is this an ethical or moral concern? Explain why.
2. After Akshit and Aman discuss buying a new mobile phone, focusing on specific features,
Aman starts receiving notifications for mobile models that match their discussion.
Identify the ethical concern in this scenario.



Why are ethics important
In the context of AI, ethics have become a significant concern because as AI technology
makes advancements, it has the potential to impact various aspects of our lives. Consider the
following potential implications of Artificial Intelligence on our daily lives:
1. Impact on People: AI systems can make decisions that affect people’s lives, like
determining job opportunities, loan approvals or medical diagnoses. Ensuring that these
decisions are fair and unbiased is crucial.
2. Bias and Discrimination: AI algorithms can learn from biased data and perpetuate existing
inequalities. It is important to prevent AI from making biased or discriminatory decisions.
3. Privacy: AI relies on data and the way data is collected and used can infringe on people’s
privacy. Protecting personal information is an ethical concern.
4. Job Displacement: AI automation can lead to job loss in certain industries. Ethical
considerations include supporting workers and addressing societal impacts.
5. Transparency and Accountability: Understanding how AI makes decisions is vital. We
need to ensure transparency in AI systems and establish accountability for their actions.
6. Safety: In cases like autonomous vehicles, the ethical concern is about safety. AI must be
programmed to make safe decisions, especially when human lives are at stake.
7. Misuse: AI can be misused for harmful purposes, like spreading fake information or
cyberattacks. Ensuring AI is used responsibly is important.
8. Global Impact: As AI has global implications, addressing ethical concerns is necessary to
ensure that its development and use benefit everyone.

AI Ethics Principles
To make AI better, we need to identify the factors responsible for affecting the quality of
AI solutions with respect to humans and society. The four principles—human rights, bias,
privacy and inclusion—form the ethical foundation for developing and deploying AI systems.

[Figure: the four AI ethics principles (Human Rights, Bias, Privacy, Inclusion).]

Human Rights
AI systems should respect and support human rights. This means AI should not harm people
or take away their freedom. AI should not be used to:
• Invade Privacy: AI should not peek into people’s private lives without permission.
• Discriminate: AI should treat everyone fairly, regardless of race, gender, age or religion.
• Limit Freedom: AI should support people’s rights to speak freely and stand up for what
they believe in.
• Take Control: AI should not take away control from humans.

FUN The video ‘Will Artificial Intelligence Take Over the World?’ provides
TIME an excellent analysis of AI and its control issues. To watch the
video and know more, scan the given QR code or open the link
https://www.youtube.com/watch?v=pnpq69WaRsM in your web browser.



Bias
AI bias refers to the presence of unfair and unjust outcomes in artificial intelligence systems,
particularly the unfair treatment of specific groups of people. AI bias occurs because the
data used to train AI systems can sometimes carry the unfair assumptions of the people who
build and operate such systems.
AI systems should be fair and not show bias. Bias in AI can lead to unfair treatment of some
people or groups. To avoid bias:
• Find Bias: Regularly check AI systems for any unfair treatment.
• Reduce Bias: Use methods to make AI fairer and remove unfairness from data and
algorithms.
• Be Transparent: Make sure people understand how the AI system makes decisions so
that they can trust its fairness.
Some real-life case studies that highlight AI bias and its consequences are presented below:

Case Study
AMAZON’S GENDER-BIASED HIRING TOOL
Amazon developed an AI tool to help with the hiring process but it was found to be biased
against female candidates. This bias was due to the data used for training, which contained
mostly male resumes. As a result, the AI system favoured male candidates, thus perpetuating
gender bias in hiring.
You can read more about the same on
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
To watch a video on the Amazon gender bias fiasco, scan the given QR
code or open the link https://www.youtube.com/watch?v=JOzQjT-hJ8k in your
web browser.

Case Study
RACIAL BIAS IN FACIAL RECOGNITION
Facial recognition systems have shown racial bias, performing less accurately for people with darker skin tones. This bias arises because the training data usually contains more images of lighter-skinned individuals, which leads to misidentification and unequal treatment of people based on their skin color.
To read more on this, open the link
https://www.nytimes.com/2019/04/17/technology/facial-recognition-technology-bias.html in
your web browser.
To watch a video on racial bias in algorithms, scan the given QR code or open the
link https://www.youtube.com/watch?v=IzvgEs1wPFQ in your web browser.

To prevent AI bias, it is essential to fix the problems in the data used to train AI systems and to ensure that these systems are fair and unbiased.

AI Reflection, Project Cycle and Ethics 21


Privacy
Privacy concerns in the context of AI relate to how personal information is collected, stored
and used by AI systems. Privacy means keeping your personal information safe and not letting
it be seen or used by others without your permission.
AI systems should protect people’s privacy and handle personal data carefully. Developers should:
• Protect Data: Use strong security measures to keep personal data safe from hacking or
being misused.
• Get Consent: Inform people how their data will be used and get their permission.
• Limit Data: Collect only the data that is necessary and keep it only for the required period of
time.
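The ‘Limit Data’ and ‘Protect Data’ ideas can be sketched in a few lines. This is a toy illustration with an invented record; real systems use stronger, salted cryptographic protection, but the principle is the same: keep only the fields you actually need, and avoid storing raw identifiers.

```python
import hashlib

# Invented personal record, for illustration only
record = {
    "name": "Riya",
    "phone": "7839096657",
    "city": "Bengaluru",
    "favourite_food": "Fruit Salad",
}

NEEDED_FIELDS = {"city"}   # data minimization: keep only what is needed

def pseudonymize(value):
    """Replace a direct identifier with an irreversible fingerprint."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]

# Store only the needed fields, plus a pseudonymous ID instead of the phone
stored = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
stored["user_id"] = pseudonymize(record["phone"])  # no raw phone number kept

print(stored)   # city plus a 10-character fingerprint, nothing else
```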

Inclusion
AI should not discriminate against any particular group of people or put them at a disadvantage. AI systems should be designed to be inclusive and benefit everyone. All AI developers must strive to:
• Make AI Accessible: Design AI systems such that people with different abilities and
backgrounds can use them.
• Include Diverse Views: Involve people from different backgrounds while creating AI
systems to make sure they work well for everyone.
• Promote Equality: Use AI to create equal opportunities and help everyone, especially
those who are less fortunate.

Do it Yourself
AI ETHICS
A company developed a confidential AI recruiting tool but their machine learning experts later
discovered a significant issue: the new tool exhibited bias against women. It autonomously
learned to favour male candidates and penalized resumes containing the term ‘female’. As
a result, the tool failed to perform as intended.
(a) Which AI ethics issue is highlighted in this scenario?
(b) What could have caused the ethical concern identified in the situation?

Memory Bytes
Data statistics are crucial for understanding the characteristics of datasets used in AI.
Data preparation involves cleaning and organizing data to make it suitable for training AI models.
Data splitting divides data into training and testing sets to evaluate model performance.
Cross-validation is an advanced evaluation technique for assessing model reliability.

Train-test split is the basic method for evaluating machine learning models.
Cross-validation gives a more reliable performance estimate, especially with limited data.
A confusion matrix evaluates classification models by showing true positives, true negatives, false positives
and false negatives.
True positive (TP) and true negative (TN) indicate correct predictions by the model.
False positive (FP) and false negative (FN) indicate incorrect predictions by the model.
The ROC-AUC curve measures the performance of classification models.
Classification models predict categorical outcomes based on input data.
Regression models predict numerical values from input data.
Evaluation metrics help determine how well a machine learning model performs.
Deployment of AI models involves integrating trained models into real-world applications.
Ethical considerations in AI involve ensuring fairness, transparency and accountability.
The AI project lifecycle includes problem scoping, data acquisition, exploration, modelling, evaluation
and deployment.
Ethical AI involves addressing biases and ensuring data fairness.
Privacy concerns in AI emphasize data protection and user consent.
Ethical AI development includes continuous monitoring and improvement.
AI models should be evaluated regularly to ensure ongoing fairness and accuracy.
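The confusion-matrix terms summarized above (TP, TN, FP, FN) can be computed in a few lines of Python. The labels below are invented for illustration:

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each cell of the confusion matrix by comparing pairs
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
print(tp, tn, fp, fn)   # 3 3 1 1
print(accuracy)         # 0.75
```

In practice these labels would come from the testing set produced by a train-test split or cross-validation, as described earlier.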

Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. Which of the following uses Artificial Intelligence to function?
(a) Toothbrush
(b) Self-driving car
(c) Analog wrist watch
(d) Ceiling fan
2. Which programming language is commonly used today for developing AI applications?
(a) Ruby
(b) Java
(c) C
(d) Python
3. The field of Artificial Intelligence which helps to identify and process human images is:
(a) Face Detection
(b) Computer Vision
(c) Natural Language Processing
(d) Eye-in-Hand System

4. What does NLP stand for in the context of Artificial Intelligence?
(a) Natural Language Processing
(b) Neural Language Processing
(c) Natural Linguistic Programming
(d) Neural Linguistic Processing
5. Which of the following is not considered a domain of Artificial Intelligence?
(a) Database Management System
(b) Machine Learning
(c) Data Science
(d) Natural Language Processing
6. What is the next stage after Problem Scoping in the AI project cycle?
(a) Data Exploration
(b) Data Acquisition
(c) Modelling
(d) Deployment
7. Which of the following is a part of Problem Scoping?
(a) Data Features
(b) 4Ws Canvas
(c) System Mapping
(d) Web Scraping
8. Which of the following is not a reliable method for Data Acquisition?
(a) Facebook Posts
(b) Surveys
(c) Gossips
(d) Web Scraping
9. In the context of Loopy System Maps, what does an arrow from X to Y with a ‘-’ sign indicate?
(a) Increasing X leads to a decrease in Y
(b) The relationship direction is reversed
(c) Increasing X results in an increase in Y
(d) The relationship is bidirectional
10. Which of the following is not a component of 4Ws Problem Canvas?
(a) When
(b) What
(c) Who
(d) Where
11. Breaking down a problem into a series of simple steps is termed as:
(a) Efficiency
(b) Modularity
(c) Both (a) and (b)
(d) Neither (a) nor (b)

12. We use data exploration to:
(a) Visualize data
(b) Find patterns in data
(c) Simplify data
(d) Complicate data

II. Fill in the blanks


1. The principle that guides us in deciding what is good or bad is called ................................ .
2. AI solutions should follow ................................ .
3. Preferring one person over another based on personal opinions is known as ................................ .
4. Google primarily shows pictures of women when you search for images of ‘personal secretary’. This
is an example of ................................ .

III. State whether the following statements are True or False.


1. Modelling involves creating an AI model.
2. AI can be used on mobile phones.
3. Deployment is the final phase of the AI project cycle.
4. AI and ML models can make unfair decisions if they are not trained properly.
5. AI framework helps to provide unbiased results.

Subjective Type Questions


Unsolved Questions:
1. How can we use AI to make the world a better place to live?
2. List some common applications of Computer Vision.
3. Explain the differences between the three main domains of AI, focusing on the types of data they utilize.
4. Which domain of AI can be employed to develop a system for identifying people not wearing masks in
public spaces to ensure public health safety after COVID-19?
5. Research and play Google Draw, an online game that uses AI to recognize user-drawn images. Describe
your observations and explain which AI techniques or domains are utilized by the game.
6. Explain the various stages of AI Project Cycle with examples.
7. Differentiate between AI Projects and non-AI IT Projects.
8. Describe the purpose of 4Ws problem canvas statement used in the Problem Scoping stage of an AI
project.
9. Explain how a Problem Statement Template is used during the Problem Scoping stage of an AI project.
10. Define Problem Scoping and explain its various steps in detail.
11. List the individuals considered as stakeholders in the Problem Scoping stage.
12. Differentiate between training data and testing data, providing examples of each.
13. Describe various data collection methods and provide an example of how each method is used in a project.
14. What are some effective methods for collecting useful data?
15. Design a system map that shows all the steps and factors involved in helping farmers from a village
transport their produce to the market for sale.

16. List some Indian government websites where open-source data can be accessed.
17. Why is the Data Exploration stage necessary? Explain with examples.
18. Define Data Visualization.
19. List some types of graphs commonly used for Data Visualization.
20. Explain the difference between Data Exploration and Data Acquisition.
21. Describe a Data Visualization technique and explain with the help of examples.
22. Explain the different stages of the AI Project Cycle with examples.
23. Define Artificial Intelligence (AI) and provide a real-life example of its application.
24. Explain the relationship between Machine Learning (ML) and Artificial Intelligence (AI).
25. Compare Rule-based and Learning-based approaches in terms of their usage and applications.
26. Explain the concept of Evaluation in the context of AI projects.
27. Describe the various techniques used to evaluate an AI model.
28. What is the importance of Model Evaluation?
29. Define the terms True Positive and False Positive in the context of model evaluation.
30. Define the term ‘deployment’ in the context of AI projects.
31. Explain the importance of Model Deployment in AI Projects.
32. List some challenges associated with deploying an AI model.
33. Explain the difference between Ethics and Morals. Provide examples of both.
34. What are the core principles that AI should follow?
35. What is Data Privacy?
36. Describe how considerations for inclusivity can be addressed while developing an AI model.
37. Identify and explain the major issues concerning AI Ethics.

2 Data Literacy

Learning Objectives
Understanding and explaining the fundamental concepts of data literacy, including the
differences between data, information, knowledge and wisdom
Identifying and applying the steps involved in data collection, cleaning, preprocessing and
analysis to real-world scenarios
Utilizing various data visualization techniques to effectively communicate data insights
through charts, graphs and infographics
Implementing key data security and privacy measures to protect personal and sensitive
information
Developing a critical-thinking approach to evaluate data sources, interpret data accurately
and make evidence-based decisions

Introduction
The world we live in today can rightly be termed as a ‘Golden Era for Data’. Billions of smartphones, computers and other electronic gadgets, installed with countless different types of software applications, are generating and storing huge amounts of information on a variety of topics every second of the day. Imagine the number of web pages, podcasts, images, audio, videos, spreadsheets, written posts, blogs, articles, ebooks, infographics and comments being generated every day.
Source: Statista Digital Economy Compass 2019
Have you ever wondered what happens with this data? Who is this data important for? How do
we utilize this data? Or more importantly, what exactly is data? We shall find the answers to
all these questions and much more in this chapter on Data Literacy.
Welcome to an amazing journey through the wonderful world of data and information,
ultimately leading you towards Data Literacy, which is one of the most essential skills of the
21st Century.

Jargon Alert: DATA LITERACY
Data literacy is the ability to read, understand, create and communicate data as information.

Most people believe that Data Literacy skills are important only for professionals working in Computer Science. This is a common misconception. Can you guess what is common between a top javelin star, a meteorologist, a business executive and a farmer? Not surprisingly, all of them work in fields where data literacy has become a major factor for success.
Today, data literacy is equally important for any person working in any domain of modern
life—be it entertainment, agriculture, sports, business, journalism, finance, economics,
humanities, healthcare, medicine, astronomy or robotics. Let us begin our journey with
the understanding of some building blocks of Data Literacy, starting with the smallest
block, i.e., data.

‘The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—is going to be a hugely important skill in the next decades.’
—Hal Varian, Chief Economist at Google

WHAT IS DATA
Look around you and try ‘describing’ the place, things and people you see. In the study of
English (or any other language), we learn to describe people, places and things with nouns
(which help us name something or somebody) and adjectives (which help us describe the
properties of a person, place or thing, or even our feelings and emotions).
Additionally, Mathematics helps us to describe those attributes or properties which are either
countable or measurable. The ability to describe is the foundation of human communication.
We may be able to communicate with another human being in Hindi, English, Spanish,
German or any language both parties know well, which helps us in the process of description.

For humans, it is easy to store and process these descriptions. With the advantage of
our evolved brain, we can connect the knowledge of different domains like English and
Mathematics effortlessly.

Describing Language: People/Things in a Classroom
Humans use describing language for communicating names, attributes and feelings of people. For example, in a classroom we may describe:
• Three crayons, one each of red, yellow and purple color
• A green school bag
• A blackboard
• An old teacher, with grey hair, wearing spectacles
• Nakul, a short, smiling boy, wearing a blue T-shirt
• Angela, a tall happy girl, wearing a pink skirt
Data in Computing
Data is the ‘describing’ language of computers. Computers use data for describing
anything—be it non-measurable properties like names/feelings or any measurable thing like
count or quantity. Anything we store on a computer is the description of something, which we
call data; similarly, any output we receive from a computer is also called data. Let us look at
the formal definition of data.
Data is a collection of facts such as numbers, words, measurements and observations, or
just descriptions of things.
Whenever we think of data, we may think of numbers. However, data on computers can also
be videos, documents, spreadsheets, audio files, photographs and many other different
formats used by computers. Computers store data in binary digits—0 and 1, popularly known
as bits, which are grouped into bytes, where each byte contains 8 bits together.
Do not confuse ‘describing’ language with ‘programming’ language—the latter is the
language of instruction for the computer. A program is a set of instructions (written in
programming language) which tells the computer what to do with data and how to do it.
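The idea that computers store data as bits grouped into 8-bit bytes can be checked directly in a few lines of Python:

```python
text = "Hi"
raw = text.encode("utf-8")   # the actual bytes the computer stores

print(list(raw))                         # [72, 105] -> one byte per character here
print([format(b, "08b") for b in raw])   # ['01001000', '01101001'] -> 8 bits each
print(len(raw) * 8, "bits")              # 16 bits
```

The text “Hi” is just data; the short program above is the set of instructions that tells the computer what to do with that data.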

Do it Yourself
Describe the people below.

Data Literacy 29
Let us illustrate a few examples of data to understand it better. Consider the following table.
Each of the values within columns A to I is an example of data. The data may consist of
numbers, characters, symbols or their combinations as illustrated below:
• The numbers in columns A, G and I are all data values.
• The text stored in columns C, E and H are data values.
• The images in column F are data.
• The character and symbol combinations in column B and the character and number
combinations in column D are also data.

A | B | C | D | E | F | G | H | I
7802452133 | B+ | Sameer | 10A | Pizza | (image) | 10000 | Mumbai | 99.1
8862546117 | B– | Angela | 9B | Club Sandwich | (image) | 5000 | Delhi | 98.6
7839096657 | A+ | Riya | 12C | Fruit Salad | (image) | 18000 | Bengaluru | 99.9

Raw Data: Meaning or significance of the data is unknown

Data can exist in various forms as observed in the above example. But does this data have
any meaning?
Let us play a guessing game: What do the data values in each column signify? Think about
the significance of the digits in column A. What may these numbers mean?
All the students in the class may interpret these values differently. The above data table
has no meaning of its own. It is just a collection of numbers, characters and their
combinations.

Data vs Information
Data and information are used almost interchangeably by most people. But for proficiency in
Data Literacy, you must understand that both the terms are very different from each other.
Data, by itself, does not have any meaning at all. It shall remain a meaningless collection
of characters, numbers, words or images. Information, on the other hand, represents
meaningful data. It is important to note that data and information are closely related to each
other, as explained further.
Let us rewrite the data table from the previous section again. This time we shall have
some additional details in the table as shown in the top row of the below illustration.
These additional details are called Context, which help us to understand the meaning
of data.

Phone No. | Grade in Last Class | Name | Class | Favourite Food | Avatar | Scholarship | Home Town | Percentage in Last Class
7802452133 | B+ | Sameer | 10A | Pizza | (image) | 10000 | Mumbai | 99.1
8862546117 | B– | Angela | 9B | Club Sandwich | (image) | 5000 | Delhi | 98.6
7839096657 | A+ | Riya | 12C | Fruit Salad | (image) | 18000 | Bengaluru | 99.9

Student Information: Meaning or significance of the data is known

Is it possible to understand the meaning of data contained in each column now? You may
observe that the first column stores the phone number of the student and second column
contains the grade in the last class and so on. Just by adding context to the data values, it is
very easy to understand the meaning of the numbers or characters stored in the table as data.

Jargon Alert: CONTEXT
Context refers to the situation within which something exists or happens and that can help explain it.

Definition of Information: When raw data is processed, interpreted, organized, structured or


presented in order to make it meaningful or useful, we call it information.
It is very important to understand here that the same set of data values may represent
different information based on the context. Consider the following illustration in relation to
the previous image:

Booking ID | Blood Group | Passenger Name | Seat No. | In-flight Order | Photo on ID | Ticket Cost (INR) | Destination | Body Temperature (°F)
7802452133 | B+ | Sameer | 10A | Pizza | (image) | 10000 | Mumbai | 99.1
8862546117 | B– | Angela | 9B | Club Sandwich | (image) | 5000 | Delhi | 98.6
7839096657 | A+ | Riya | 12C | Fruit Salad | (image) | 18000 | Bengaluru | 99.9

Traveller Information: A different context for same data values

Interestingly, all the data values in the given table are the same as the Student Information
table given before. However, the meaning of the columns has changed completely. We can
now understand that data may be a raw, unorganized and meaningless group of characters or
numbers (or combinations of both) but when it is transformed with processing, organization
and is given a context, it becomes meaningful information.
The difference between data and information is as follows:

Data | Information
It is a raw collection of numbers or characters. | Processed data presented with context is known as information.
Data is independent of information. | Information is dependent on data from which it is derived.
Data can be collected by observation and from records. | Information is generated from data.
Data may be meaningless. | Information is always meaningful.
Raw data cannot be used for decision-making. | Since information is processed meaningful data, it helps us in decision-making.
Data may be difficult to understand without context. | Information is usually easier to understand.
Data is represented in bits and bytes. | Information is represented with ideas and thoughts.
Example (Raw Data): 110001 | Example (Information): The Pin Code of Parliament House in New Delhi is 110001.
We may summarize the relationship between data and information as under:
• Data refers to raw facts, observations, measurements or records that are typically
collected and stored for analysis, reference or for use in decision-making.
• Data can exist in various forms, including text, numbers, images, audio, video or any
other format that can be processed by computers.
• When data is processed and organized into a meaningful structure, it becomes
information that can be used to make predictions or support decision-making.
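The effect of adding context can be sketched in a few lines of Python: the same raw values become different pieces of information once column names (the context) are attached. The row and column names below are taken from the Student and Traveller tables above.

```python
# The same raw values acquire different meanings under different contexts
raw_row = ("7802452133", "B+", "Sameer", "10A")

student_context   = ("Phone No.", "Grade in Last Class", "Name", "Class")
traveller_context = ("Booking ID", "Blood Group", "Passenger Name", "Seat No.")

# Attaching context turns the raw tuple into meaningful information
student_info   = dict(zip(student_context, raw_row))
traveller_info = dict(zip(traveller_context, raw_row))

print(student_info["Phone No."])     # 7802452133 -> a phone number here...
print(traveller_info["Booking ID"])  # 7802452133 -> ...a booking ID there
```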

From Data to Knowledge: raw data (streams of bits) is processed into information, which, combined with past experience, becomes actionable knowledge that supports adaptive decision-making.
DIKW MODEL
Having understood the difference between data and information, we can conclude that
information is derived from data. We may also say that information is at a higher level of
usefulness and understanding as compared to data because the former adds context to the latter.
Similarly, if we combine pieces of information with meaning, it gives rise to knowledge. In the
hierarchy of usefulness, data and information have a higher-order term called knowledge.

If information is the individual pieces, knowledge is the completed puzzle.


In fact, there is a complete model which allows us to understand how different levels of
usefulness of data evolve when we keep adding context and meaning; the model is popularly
known as the DIKW model.
The DIKW model is an abbreviation consisting of the following terms:
• Data
• Information
• Knowledge
• Wisdom

DIKW Pyramid: Data forms the base, followed by Information, Knowledge and, at the top, Wisdom. Usefulness increases while volume decreases as we move up the pyramid.

Definition: The DIKW model is a pyramid-shaped model of knowledge management which is


used to represent the relationship between Data, Information, Knowledge and Wisdom.
Structured as a pyramid, the DIKW model begins with a large base of raw, meaningless data. As
we move up in the hierarchy from data to wisdom, we get levels with more usefulness and
understanding but the volume at each successive level keeps decreasing. Wisdom, which
sits at the top of the pyramid, guides our actions using the combination of our knowledge
and experience. The DIKW model is very easily applied to human beings. The brain of human
beings is very good at extracting knowledge from information and it grows into wisdom with
experience. But until now, the same was not true for computers.
Converting data into information (called data processing) is considered an easy task for
computers but extracting knowledge and wisdom is difficult. However, with the recent
advancements in Computer Science and Data Science, we can now analyze large amounts of
data which we can convert to information. This information can then be used by computers
for the decision-making process in almost the same way as humans do.
The given illustration is a modified DIKW model for the new generation, with an additional decision-making level at the top, which helps us to convert data into decisions for both human beings and computers.

Modified DIKW Pyramid with Decisions: Data (add context) → Information (add meaning) → Knowledge (add insights) → Wisdom (add purpose) → Decisions. The lower levels look at the past to understand patterns and discover relationships; the Decisions level looks to the future and asks what action to take based on knowledge or wisdom.

Case Study
Let us understand the DIKW pyramid with a simple illustration.

Level Composition
1. Data (Base level) Om, 2, DataRich School, Bus, Car, Home, 10, Petrol, 96.30, Delhi, AQI, bicycle,
Pollution, Distance.
The elements on the base level do not make any sense but represent possible storage
elements called data.
2. Information (Level 1) The below statements are possible information which we get by adding context to the
(Add Context) data in the base level.
• Om is a student of Class 10 of DataRich School.
• He lives in Delhi. The distance between his School and Home is 2 kilometres.
• Om can commute to school on foot, on bicycle, by car or by bus.
• Delhi is one of the most polluted cities in India, with Air Quality Index (AQI)
close to 400.
• Petrol price is 96.30 rupees/litre.
3. Knowledge (Level 2) Some statements which we know through our knowledge from information other
(Combine Information than the above statements can be:
to add meaning) • Walking and cycling make us fit.
• Bicycles are pollution-free vehicles.
• Vehicles add to pollution.
• Short distances can be covered on foot or on bicycle.
• To travel long distances, a vehicle is better. Vehicle pooling helps reduce pollution.
Some knowledge statements which can be derived by combining information
statements from level 1 are:
• Om can commute to his school without any vehicle because the distance is just
2 kilometres, which can be easily covered by walking.
• Om can commute to school by car but the expenses will be more and he shall be
adding to pollution in the city, thus making the AQI worse.

4. Wisdom (Level 3) Pollution is causing an irreversible damage to our environment and it
(Add Experience to Knowledge, should be avoided at all costs. It would also be prudent to save money
Add insights) and stay fit by walking or cycling to the school. If necessary, a school
bus can be a viable option.
5. Decision (Level 4) How should Om commute?
(Add Purpose to Wisdom) Om should commute to his school by cycling or walking, not only to
stay healthy but also to save the environment.
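The decision level of the case study can be sketched as a tiny rule that combines the knowledge statements above. The 3-kilometre walking threshold and the function name are assumptions for illustration, not part of the original case study:

```python
def commute_choice(distance_km, pollution_high=True):
    """Toy decision rule built from the case study's knowledge statements."""
    if distance_km <= 3:
        return "walk or cycle"   # healthy, free and pollution-free
    if pollution_high:
        return "school bus"      # a shared vehicle reduces pollution
    return "car"

print(commute_choice(2))   # Om's case: 2 km, so walking or cycling wins
```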

Do it Yourself
Match each entry in the left column to the correct term (Data, Information, Knowledge, Wisdom or Decisions) in the right column.

12 | Decisions
12 degrees Celsius in my city today | Wisdom
Humans feel very cold in 12 degrees Celsius | Knowledge
12 degrees Celsius | Information
I should put on a sweater | Data

How Does Data Influence Our Lives


The use of data is touching human lives today in more ways than we can imagine. Most of the
helpful technology surrounding us is highly dependent on data. You generate and use data even
when you are not creating a document or spreadsheet. Every click that you make generates data
and so does every like, comment, share or post on social media. You might be surprised to learn
how data is affecting our everyday lives even if we do not ‘use’ data on purpose. Some key areas
of human lives which are constantly being affected by data are presented below.

Entertainment
The most popular entertainment apps like Instagram, Snapchat, Spotify, Netflix, Facebook,
and YouTube, along with all other social media platforms that you might be using passively,
record the data that you generate. Social media companies view your likes, comments, shares
and browsing data to create your unique profile. With the help of these, they ‘recommend’
new content, friends and channels and even target advertisements based on your interests.
Similarly, the recommendation algorithms of Netflix and YouTube can suggest videos based
on what you have previously watched and liked.
The algorithms are hidden but you produce data for them with your likes and clicks and, in
return, consume data which they recommend.

Experience AI
Learn more about use of data in entertainment by scanning the given QR code or by opening
the link https://www.youtube.com/watch?v=OuU3UfRM2pE in your web browser.

Agriculture
Can you believe that farmers are making use of data in agriculture? Agri-tech is helping the
farmers in a big way—data is useful in weather prediction, soil nutrition, making efficient
decisions on what to plant, when to harvest and several other activities related to agriculture
and farming.

Jargon Alert: AGRI-TECH
Agri-tech is the use of technology for farming that is developed to improve efficiency and profitability in agriculture.

Recall the Sustainable Development Goals adopted by the United Nations given on
https://sdgs.un.org/goals. SDG 2 is End hunger, achieve food security and improved nutrition
and promote sustainable agriculture. Food security is one of the most important sustainable
development goals for humans and data plays a major role in aiding farming and agriculture.

Experience AI
To learn more about how data is helping us in the agriculture domain, scan the
given QR code or open the link https://www.youtube.com/watch?v=A3-GjKOdUGo in your
web browser.

Online Shopping
The use of data in e-commerce and online shopping portals like Amazon allows such
companies to keep track of the buying habits of its customers. Data also helps companies to
make decisions about what items are required to be stocked by forecasting demands (like gifts
and sweets before festivals). Data is also useful in understanding the interests of customers
and offering discounts for increasing sales. Robotics and Artificial Intelligence are ways in
which technology and data help e-commerce.

Experience AI
Scan the given QR code or open the link https://www.youtube.com/watch?v=IMPbKVb8y8s
in your web browser to discover how data and technology are lending a helping hand to
Amazon in its warehouses to maintain the position of a market leader in the online
shopping industry.

Healthcare
People are becoming increasingly aware of their health. With the help of fitness trackers like
Fitbit, Apple watch and smart bands, it has never been easier to keep record of your health
data including pulse and heart rate, blood pressure, blood sugar levels, the number of steps
taken in a day, workouts and sleep patterns. What is even more intriguing is that these devices
give real-time alerts on prevailing health issues!

Experience AI
Apple watches have been helpful in saving the lives of users! Check out this news article to know the
fascinating application of data and technology in healthcare.
Scan the given QR code or open the link https://economictimes.indiatimes.com/magazines/
panache/apple-watch-saves-haryana-dentists-life-by-detecting-99-9-artery-blockage-ceo-tim-
cook-reacts/articleshow/90319891.cms in your web browser.

To know more features of smartwatches and their use in healthcare, watch the YouTube video by
scanning the given QR code or open the link https://www.youtube.com/watch?v=UZTBIlzGRoA
in your web browser.

Experience AI
It is predicted that data and technology will play a very important role in the future of human
healthcare with 24/7 monitoring of health by using connected devices. Watch this animation
for a sneak peek into the future of healthcare. To open, scan the given QR code or open the
link https://youtu.be/W0li-PI6yWo in your web browser.

Travel
Data is also altering the traditional ways of travel and vacation planning. A large number of online travel and tourism companies like MakeMyTrip, Goibibo, Airbnb and Trivago are using data and technology to help people book flights, hotels and trains at the best prices, all from the comfort of their homes!
Data is also used in online maps or GPS applications like Garmin, MapmyIndia and Google
Maps for traffic management which helps to save fuel and manage commute time effectively.

Jargon Alert: GPS
GPS stands for Global Positioning System. It is a satellite navigation system which helps in pinpointing the location of a person, vehicle or any object on the earth in real time.

Experience AI
Have you ever wondered how Google Maps can use data to inform about traffic conditions
on a route in advance? To learn about the traffic-prediction service of Google, scan the
given QR code or open the link https://www.youtube.com/watch?v=RQJ3HmVtN4w in your
web browser.

Education
Teachers and parents always help students to achieve their goals in the field of education.
With the use of technology in education, teachers can keep a record of students’ interests,
performance, subject-wise scores, attendance, areas of improvement and other important
factors.

Experience AI
Scan the given QR code or open the link https://www.youtube.com/watch?v=cgrfiPvwDBw
in your web browser to view a small animation on how technology and data are helping
teachers to enhance the study experience of their students.

Data use in our everyday life is increasing rapidly and is not restricted to the above examples.
Data finds its applications in endless activities affecting human life, including daily news,
scientific experiments, banking, self-driving cars, robotics, astronomy, industrial automation
and many more.

FUN TIME: The Rise of Data
To explore the increase in data from the earliest years of civilization to the
present era, scan the given QR code or open the link—
https://www.youtube.com/watch?v=NVFR0algY9M in your web browser.

INTRODUCTION TO DATA LITERACY

Data literacy is the ability to read, understand, interpret and communicate with data.
It includes a set of skills including gathering, analyzing and using information effectively.
In today’s world, where we are flooded with information from all directions, data literacy
helps us make sense of the clues hidden in numbers, graphs and charts.

[Diagram: Data Literacy at the centre, surrounded by Reading Data (Theory & Analysis),
Working with Data (Collection & Management) and Communicating with Data (Reporting &
Presentation).]

38 SUPPLEMENT—Decoding Artificial Intelligence–IX


The following are some important steps to acquire data literacy skills:
1. Achieving a Data Mindset: One must start believing that data is present in abundance all
around us and that any problem can be solved, provided there is enough data available.
2. Data-Enabled Questioning: One must learn to frame reasonable questions which can be
answered appropriately using data and its analysis.
3. Describing Data: One must possess the ability to understand and describe data.
It is essential to understand concepts such as how data is categorized in data types
(e.g., numerical, categorical) and the way data is stored using variables, datasets and
data structures (e.g., tables, databases).
4. Data Collection and Acquisition: It is important to acquire the knowledge of methods
and techniques for collecting, acquiring and accessing data from its various sources.
5. Data Cleaning and Preprocessing: The ability to clean, transform and preprocess raw
data before analysis to ensure that the results are accurate, consistent and complete is
important.
6. Data Analysis and Interpretation: One must make use of the proficiency in analyzing
data using statistical methods, data visualization tools and software to derive insights,
identify patterns and make decisions based on analyzed data.
7. Data Visualization and Communication: It is necessary to learn skills to present
data and its analysis visually using charts, graphs, dashboards and infographics to
communicate findings effectively in an easily understandable manner.
8. Critical Thinking and Problem-Solving: One must acquire the capacity to critically
evaluate data and results, ask meaningful questions based on data analysis to solve
problems and make evidence-based decisions.
9. Ethical and Responsible Data Use: It is vital to understand ethical considerations related
to data privacy, security, confidentiality and bias when working with data.
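As a small illustration of the cleaning skill above (step 5), the following Python sketch tidies a hypothetical list of survey ages before analysis. The raw entries and the 5–100 plausibility range are assumptions made up for this example:

```python
# A minimal data-cleaning sketch: raw survey entries often mix stray
# whitespace, blanks, text and impossible values. We standardize them
# into clean numbers before any analysis is done.
raw_ages = ["12", " 14 ", "", "13", "thirteen", "-5", "15"]

def clean_ages(entries):
    cleaned = []
    for entry in entries:
        value = entry.strip()      # remove stray whitespace
        if not value.isdigit():    # drop blanks and non-numeric text
            continue
        age = int(value)
        if 5 <= age <= 100:        # keep only plausible ages
            cleaned.append(age)
    return cleaned

print(clean_ages(raw_ages))  # → [12, 14, 13, 15]
```

Only four of the seven raw entries survive cleaning; deciding which entries to drop (and why) is itself a data literacy judgement.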

In summary:
• Achieving a Data Mindset: Cultivating an understanding of and belief in data’s ubiquity and utility
• Data-Enabled Questioning: Formulating precise, objective-driven questions to guide data activities
• Describing Data: Interpreting, summarizing and articulating the meaning of data
• Data Collection and Acquisition: Identifying, sourcing and compiling relevant, high-quality data using rigorous methods
• Data Cleaning and Preprocessing: Rectifying issues and transforming data to make it analysis-ready
• Data Analysis and Interpretation: Applying appropriate techniques to extract meaningful patterns, trends and insights from data
• Data Visualization and Communication: Conveying data-driven insights effectively for decision-making through visualization, storytelling and strategic planning
• Critical Thinking & Problem-Solving: Critically evaluating data representations, conclusions and arguments to ensure robust interpretation
• Ethical and Responsible Data Use: Incorporating reliable, ethically sourced data into choices using structured frameworks for optimal, socially responsible outcomes

Jargon Alert
• Data Analysis: The process of thoroughly studying data to find patterns or answers.
• Graphs and Charts: These are visual ways to represent data, e.g., bar graphs or pie charts.
• Algorithm: It is a step-by-step procedure that computers use to solve problems.
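To make the term ‘Algorithm’ concrete, here is a small step-by-step procedure written in Python that finds the largest number in a list. The list of numbers is invented for illustration:

```python
def find_largest(numbers):
    """A simple algorithm: find the largest number in a list."""
    largest = numbers[0]          # Step 1: assume the first number is largest
    for number in numbers[1:]:    # Step 2: look at each remaining number
        if number > largest:      # Step 3: if it is bigger, remember it
            largest = number
    return largest                # Step 4: report the answer

print(find_largest([3, 17, 8, 42, 5]))  # → 42
```

The same four steps work no matter how long the list is, which is exactly what makes an algorithm useful to a computer.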

Importance of Data Literacy


1. Data literacy helps us understand that data varies in quality: it may be reliable or
unreliable, and its relevance and usefulness may differ for specific purposes.
2. Data literacy helps us grasp the fact that different data carries different value or
significance. Some data may be more valuable or informative than others based on its
characteristics and context.
3. Data literacy makes it easier to understand how data is collected and presented.
This way, we can understand the presented information better and avoid falling for
misleading charts or other visual presentations.
4. Our brains are wired to love stories, not numbers. People are more likely to remember
information presented in a story-based format or narrative. Data literacy helps us turn
uninteresting data points into interesting stories.
5. Data literacy is the most useful tool in the age of social media for avoiding
misinformation. Consider the following scenario:
Imagine reading a social media post claiming that chocolate cures cancer based on a
‘recent study’. Would you believe it? Isn’t it important to verify it?
With data literacy, you can ask critical questions about the post, such as:
• Who collected the data?
• How many cases were considered?
• Where was the study published? How strong was the evidence?
• Have doctors accepted this?
Data literacy helps us distinguish between real, impactful reports and those circulated by
vested interests.

Remember
Data literacy is not just for data scientists! Anyone can sharpen their data literacy skills by following two
basic practices:
• Ask Questions: Don’t be afraid to ask ‘dumb’ questions. A good understanding always starts with
curiosity. Follow this up with data supporting the answers.
• Practice Makes Perfect: Look for data in your everyday life—news articles, weather reports, sports
statistics, etc. Try to interpret the data and see what story it tells.



Impact of Data Literacy
Data literacy has a profound impact on the growth of both individuals and organizations.
Being data literate helps us make smarter decisions to shape the future of technology.
Let us consider some potential impact of data literacy:
• Making Informed Decisions: Data literacy empowers us to make informed choices.
For instance, when shopping online, a data literate person can select the best products
since they are capable of fully understanding the importance of reviews and ratings. On
the other hand, a person who is not data literate may rely on intuition or incomplete
information, leading to less optimal decisions.
• Solving Real-World Problems: Scientists use data to solve big problems like climate
change or disease outbreaks. By analyzing data, they can find patterns and make
important discoveries.
• Career Opportunities: Being data literate opens doors to exciting careers in fields like
technology, business and science. Companies value employees who can understand and
work with data efficiently.
• Enhanced Customer Insights: Data literacy allows organizations to gain deep insights
into customer behaviour, preferences and sentiment. This information is used to design
products, services and marketing strategies to meet the exact needs of customers and to
find areas for improvement.
• Real-Time Monitoring and Response: Real-time data analysis enables organizations to
monitor events, detect anomalies and respond promptly to changing conditions.

Data Literacy Impact Stories

Case Study
CASE I: STARBUCKS
Starbucks, one of the most popular chains of coffee stores, has utilized data
literacy to optimize its operations and make customer experiences better.
Equipped with data literacy, Starbucks employees analyze data from point-of-sale
systems, loyalty programs and customer feedback. By incorporating data literacy,
Starbucks has been able to make the following data-driven decisions to benefit the organization:
• It identified customer preferences for different drink sizes.
• It adjusted its ordering and inventory management accordingly.
• It reduced wastage and improved efficiency.
• Additionally, it used customer data to personalize promotions and rewards, leading to
increased customer loyalty and sales.

CASE II: MONDELEZ INTERNATIONAL


Mondelez International, the parent company of popular brands like Oreo and Cadbury, has
implemented a data literacy program for all its employees across the world. Employees, from
departments ranging from manufacturing to marketing, have been trained to understand and
interpret data related to their respective roles.

This initiative has enabled Mondelez to make more
informed decisions about product development,
supply chain management and marketing campaigns.
For example, by analyzing consumer preferences
and social media data, it was able to develop successful product innovations and customize
marketing strategies for different regions.

CASE III: SOCIETAL IMPACT DURING COVID-19 PANDEMIC


Data literacy played a critical role throughout the COVID-19 pandemic on both individual and
societal levels.
• Empowered Individuals: A constant stream of fear-mongering, lies and false news
bombarded everyone during COVID-19. Data literate people were better equipped to
critically evaluate this information.
They could check things like the source of the data, the way it was presented (charts,
graphs, etc.) or any indication of biases. This critical thinking helped them make informed
decisions about their health like wearing masks or getting vaccinated—all based on
trustworthy evidence.
• Policy Decisions: Governments relied heavily on data to make important choices. Data
literacy allowed them to analyze infection rates, allocate resources (ventilators, oxygen,
PPE, medicines) judiciously and study the effectiveness of various measures taken. This
data-driven approach helped them implement preventive measures like lockdowns and
social-distancing to check the spread of the virus.


• Business Continuity: Businesses also used data to navigate uncertainty. By understanding
data on consumer behaviour and economic trends, they could adapt their strategies
accordingly like shifting to online operations or implementing safety protocols for
in-person work.
One of the most popular dashboards during COVID-19 was watched
by people across all countries to follow the trends about the
pandemic. To view the same, scan the given QR code or open the link
https://www.worldometers.info/coronavirus/ in your web browser.



Do it Yourself
How reliable is the news?
Let us learn more about how data impacts reliability. Each student in the class may partner
with their friend and collect a popular news item. Also check out the source(s) of the news
item (TV, internet blog, WhatsApp, Instagram, etc.). Record the following information about
the news item:

• Who is the author of the news?
• What is the web link of the news?
• What is the description available in the source of the news?
• What are the facts and figures mentioned in the source?
Prepare one document where all student pairs can share their findings and rate the sources
cited in the news items from 1 (least reliable) to 10 (most reliable), providing a valid reason
for the rating. Name the 5 most reliable news sources and 5 least reliable and share your
views on them.

Reliability Score Data Source Remarks and Reasons


Are all data sources equal? Why do you think this matters?

How to become Data Literate


We have already discovered that data is present everywhere and we are unconsciously using
data all the time, be it banks, hotels, schools or airports. Since abundance of data can be
confusing at times, understanding what specific data means and how to use it effectively has
become a very valuable skill. This new-age skill set is leading to exciting careers, especially in
the domains of Data Science, Artificial Intelligence and Machine Learning. Data literacy helps
us understand what data means and how to use it to our advantage for new-age careers.

THE DATA JOURNEY

1. DEFINE, FIND, GATHER: data discovery, data gathering
2. EXPLORE, CLEAN, DESCRIBE: data exploration, data cleaning, data standardization, data
management and organization
3. ANALYZE, MODEL: data analysis, data modelling, data interpretation
4. TELL THE STORY: data visualization, storytelling, evaluating decisions based on data

Becoming data literate involves developing specific skills and knowledge to understand,
analyze and use data effectively.

The step-by-step guide to data literacy is explained in detail below:
Step 1: Understand the basics of data: Start by learning what data is and how it is used. Data
refers to raw pieces of information (numbers, words or images) that can be analyzed
to gain insights. For example, a weather forecast is based on the data collected from
various sources.

‘Big data’ refers to large volumes of data that is too complex to process using traditional
methods.

Actions: Ask the Right Questions


• What kind of data is it? Numbers, text, images—data comes in many forms.
Understanding the type of data helps you interpret it correctly.
• Where does it come from? Be cautious of data without a credible source. Imagine a
friend telling you that they saw a talking cat—would you believe them blindly?

Step 2: Learn data analysis techniques: Explore methods to analyze data such as creating
charts, graphs and tables. This helps in visualizing and understanding the patterns
and trends within the data.
Actions: Learn Basic Statistics
• Mean, Median, Mode: These are the cornerstones of basic statistics. They
help you understand the central tendency of a dataset, explaining where most of
the data points lie.

Jargon Alert: MEAN, MEDIAN, MODE
Mean is the average, median is the middle value and mode is the most frequent number.
Statistics are important: Consider a social media post claiming that everyone in your city is way
above the average height. Data literacy helps you understand whether this is true; statistics like
mean and median can unveil the truth.
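The three measures can also be computed with Python’s built-in statistics module. The list of heights (in cm) below is made up for illustration:

```python
import statistics

# Hypothetical heights (in cm) collected from a small survey
heights = [150, 152, 152, 155, 160, 171]

print(statistics.mean(heights))    # average of all values, about 156.67
print(statistics.median(heights))  # middle value → 153.5
print(statistics.mode(heights))    # most frequent value → 152
```

Notice that one tall outlier (171) pulls the mean above the median, which is why comparing the two helps detect misleading claims about "averages".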

Step 3: Gain hands-on experience: Practise working with data. Start with simple datasets and
use tools like Microsoft Excel or Google Sheets to organize, analyze and visualize data.



Actions: Visualize It
• Charts and Graphs: Data presented visually is often easier to understand.
Visualization allows looking at data in the form of images instead of just reading
long lists. There are many chart types to help you visualize data including bar
charts, pie charts, line graphs, etc. Each is suited for different purposes.
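Even without a spreadsheet or charting tool, the idea behind a bar chart can be tried out with a few lines of Python that draw bars as text. The fruit-survey counts here are invented for this sketch:

```python
# Text-only "bar chart": one * per vote, so larger values get longer bars.
favourite_fruits = {"Apple": 8, "Banana": 5, "Mango": 12, "Grapes": 3}

def text_bar_chart(data):
    lines = []
    for label, count in data.items():
        # Left-align the label, then draw the bar and show the count
        lines.append(f"{label:<8} {'*' * count} ({count})")
    return "\n".join(lines)

print(text_bar_chart(favourite_fruits))
```

Even in this crude form, the longest bar (Mango) jumps out immediately, which is the whole point of visualizing data instead of reading raw numbers.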

Jargon Alert: Data Visualization Tools
Software like Tableau or Power BI helps create visual representations of data.

Our brain processes visuals much faster than text. A joint study on presentations by 3M and
the University of Minnesota found that presentations using visual aids were 43% more
persuasive than unaided presentations. Charts and graphs can make complex data digestible.

FACT CHECK: BE A DATA DETECTIVE
The 3M study also claimed some other statistics as illustrated below. Search the internet and
find out which of the claims still remain unverified and may be false or misleading.
• 50% of your brain is active in visual processing alone
• 90% of information transmitted to the brain is visual
• 40% of people respond better to visuals
• 70% of your sensory receptors are in your eyes
• We process images 60,000x faster than text

Step 4: Develop critical thinking skills: Always question and evaluate data sources. Not all
data is reliable or accurate, so it is important to assess the credibility and relevance of
the information. Let us take an example to understand this. Consider researching your
favourite IPL team online. You will find different websites with conflicting information.
How would you decide which source(s) to trust?

Actions: Think Before You Share


• Not all data is created equal. Be critical of the information you see online. Is it from
a reliable source? Could it be biased?
For example, consider an online chart showing that dogs are superior to cats as pets.
Data literacy helps you come up with questions such as ‘Who made this chart?’, ‘Is
there data on cat ownership too?’.

Jargon Alert: BIAS
Bias means data might be misrepresented to favour a particular viewpoint.

Step 5: Stay Curious and Keep Learning: Data literacy is an ever-evolving skill. Stay updated
with new technologies, data visualization tools and trends in data analysis. Take online
courses, read books and explore real-world applications of data in different fields.

Interesting fact
According to an IBM study, bad data decisions cost businesses an average of $3.1 trillion per year.
• 33% of business leaders don’t trust the information they use to make business decisions (IBM)
• $3.1T is the estimated amount of money that poor data quality costs the US economy per year
• $15M of organizational losses per year are due to poor data quality (Gartner, 2018)
• ~20% of annual revenue is spent correcting data errors and dealing with business problems
caused by bad data (MIT, 2017)
• >40% of analysis time is spent vetting and validating analytics data before it can be used
for strategic decision-making (Forrester, 2018)

Data literacy can help you avoid costly mistakes.

Experience AI
Watch this informative YouTube video to learn what it means to be data literate and how
we can start looking at information and the world a little differently. To play, scan the given
QR code or open the link https://www.youtube.com/watch?v=yhO_t-c3yJY in your web browser.

It is important to remember that data literacy is a journey. The more you practise these steps,
the more comfortable you will become with the skills and knowledge required for understanding
data. So, start exploring, ask questions and use the power of data for an exciting career.



Data Literacy Process Framework
Any organization working towards data literacy may follow a simple but effective
Data Literacy Framework for implementing their program. Such a framework provides
guidance on using data efficiently and with awareness at all levels.
Data literacy framework is an iterative process and can be revised for new goals after
previously set targets are achieved in the evaluation stage. The complete data literacy
framework can be divided into 6 parts, from planning to evaluation, each with a specific set of
goals as illustrated in the given image.

1. PLAN: Any program starts with a discussion on defining the goal, understanding the
participants, the execution strategy and the timeframe.
2. ASSESS: Introduce the participants to a data literacy assessment tool and find their
comfort level with data.
3. PRESCRIPTIVE LEARNING: Prescriptive learning shall provide a set of resources for
individuals to choose from, in alignment with their learning style.
4. COMMUNICATE: Design a communication plan explaining the purpose of the goal and
requesting commitment towards it.
5. DEVELOP CULTURE: The adopted program will improve data literacy skills through learning
and will be imbibed into the existing culture with time.
6. EVALUATE: Design an evaluation metric for the program and decide how frequently
progress will be measured.

DATA SECURITY AND PRIVACY


Data Security refers to practices that keep information safe from unauthorized access, theft,
corruption or destruction. This ensures protection of your data. It acts as a shield
that keeps hackers, malware and other threats at bay.
Data Privacy focuses on control over your personal information. It is about who gets to see
your data and how it is used. This means that you decide who has the key to your personal
information and what they can do with it.
Both terms complement each other but have different meanings as shown in the table
below:

Feature Data Security Data Privacy


Focus Protection of data Control over data
Goal Prevents unauthorized access and misuse Ensures users have control over their data

Examples:
• Data Security: Using strong passwords, encrypting data and having firewalls in place are
all examples of data security measures.
• Data Privacy: Choosing what information to share on social media platforms or deciding
whether to allow apps to access your location are some examples of data privacy choices.
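One of the security measures listed above, using strong passwords, depends on services never storing the password itself. A minimal sketch of that idea, using Python’s standard hashlib module, is shown below; the sample passwords are made up and this is an illustration, not a complete security implementation:

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Store a salted hash of the password instead of the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a random salt defeats precomputed lookup tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(attempt, salt, stored_digest):
    """Re-hash the attempt with the same salt and compare the results."""
    return hash_password(attempt, salt)[1] == stored_digest

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # → True
print(verify_password("wrong guess", salt, digest))                   # → False
```

Even if the stored digest leaks, the original password is not directly revealed, which is the core idea behind keeping login data secure.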

Jargon Alert: PERSONAL DATA
Personal Data refers to the information or data collected by any government or private
organization/agency which can be used to identify an individual.

Increasing Importance of Data Security and Privacy


Imagine your life story—finances, health records and even those embarrassing childhood
photos—stored online. This is essentially happening today with most people. We access
several websites and apps every day and all that data needs a place to stay online.
For example, consider the following data and storage methods:
• Photos & Files: Cloud storage like Google Drive or Dropbox keeps your pictures,
documents and other files safe and can be accessed from anywhere.
• Banking Details: Banks have secure servers to store your account information,
transactions and more. You can manage it all online too!
• Health Records: Doctors store your medical history, test results and other information
electronically.
• Social Media: Facebook, Instagram, etc., store your profile, photos and the posts that
you share.
• Shopping Websites: When you shop online, the stores keep track of your purchases and
shipping details securely.
• Websites & Blogs: If you have a personal website or blog, it is stored on a server that
people can access on the internet.
• Mobile Apps: The apps that you use collect information to personalize your experience,
for example, fitness apps track your steps daily.
• Government Documents: Important documents like tax filings and birth certificates are
stored securely in government databases.
While some of the information stored
online is kept very secure, the same is
not true for all the websites and apps
that we use.
Companies collect tons of data
pertaining to us—what we buy, where
we browse and even who we chat
with online. This data can be used for
targeted advertising but it can also be intrusive.



Data security and privacy are crucial because they help:
• Protect us from identity theft and fraud
• Maintain control over our personal information
• Safeguard sensitive information
• Build trust in online interactions
By understanding both data security and data privacy, we can make informed decisions about
how we share our information online and take necessary steps to protect ourselves.

Experience AI
DATA MISUSE AND HOW TO AVOID IT
Watch an informative YouTube video on what is data misuse and how we can avoid
the misuse of our personal data. To play, scan the given QR code or open the link
https://youtu.be/ixzgRHfEkoY in your web browser.

Case Study
WhatsApp
WhatsApp updated its privacy policy in 2016, allowing it to share user data with Facebook and
other companies. Concerns arose and legal cases were filed in Indian courts.
• What happened: The Delhi High Court ruled that WhatsApp’s policy of sharing user data
with Facebook violated privacy rights.
• Insights: This case showcased the importance of user consent and transparency in data
collection practices. It also raised concerns about data transfer across borders.
• Outcome: WhatsApp faced pressure to modify its policies and provide users with more
control over their data.
The focus ultimately shifted to the ongoing debate on national data protection laws in India.

Understanding how to keep your data secure (protected from unauthorized access) and private
(controlled by you) is very important in this digital age. India is a rapidly growing digital
nation, and data security and privacy are important focus areas in this rapid digitalization.

Case Study
DigiLocker
DigiLocker is an Indian government initiative
that aims to make your life easier by providing
a secure digital document wallet. It acts as a
safe deposit box for your important documents
in the cloud.
Working of DigiLocker:
• Stores digital documents: You can upload scanned copies of your government-issued
documents like driving licence, PAN card and educational certificates into your DigiLocker.

• Issued by trusted sources: Government agencies and educational institutions can also
issue documents directly to your DigiLocker. This ensures authenticity and eliminates the
need for physical copies.
• Easy access: You can access your DigiLocker from anywhere and anytime using your
smartphone or computer. No more worries about lost or damaged documents.
• Sharing made convenient: You can easily share your documents electronically with
authorized entities like banks or employers.

Benefits of DigiLocker:
• Reduced paperwork: It eliminates the hassle of carrying physical documents everywhere.
• Increased security: Documents are stored securely in the government’s cloud infrastructure.
• Convenience: Access and share your documents anytime, anywhere.
• Environment-friendly alternative: It saves paper and reduces the need for physical copies.

Getting Started with DigiLocker:


• You can download the DigiLocker app or visit the link https://www.digilocker.gov.in/
• Sign up using your Aadhaar or mobile number.
• Link your Aadhaar for additional features.
• Upload your scanned documents or wait for them to be issued electronically.

Do it Yourself
Research about DigiLocker online and make a presentation on its security and privacy
features.

Exposure to Cyber Crimes


If the data of a person or an organization is stolen or hacked, there are additional threats of
being exposed to cybercrimes which may include the following:
• Cyberbullying: It is when someone repeatedly and intentionally harasses, mistreats or
makes fun of another person using online means like emails or social media.
• Phishing/Hacking: It refers to attempts to steal personal data from the computer/
phone of a person by sending them malicious links through emails or messages.
• Spamming: It refers to receiving a lot of unsolicited messages, emails and tele-calls
due to the leak of your personal contact information for the purpose of commercial
advertising or any prohibited purpose such as phishing.
• Impersonation: It is a form of digital identity theft where a person may collect
someone’s personal details including photographs from their social media posts and
impersonate them by creating fake accounts to commit fraud on their family and
friends.
The list of concerns regarding unethical use of data is endless.



Experience AI
BE INTERNET AWESOME

Play Interland by Google to learn the safe practices of cyberworld. It is an interactive, fun game
which teaches you the safety precautions and the risks associated with cyberworld. To play,
scan the given QR code or open the link https://beinternetawesome.withgoogle.com/en_us/interland
in your web browser.

Government Initiatives – Data Privacy in India


The Government of India has enacted the Digital Personal Data Protection Act, 2023 (DPDPA).
It is a special legislation to safeguard the privacy of individuals in the digital age. The Act
focuses on protecting personal data, which includes any information that relates to an
identified or identifiable individual. This can range from your name, address and phone
number to financial information, health records and online activity.
Key Provisions:
1. Consent: The Act emphasizes obtaining informed consent (permission) from individuals
before processing their personal data. You have the right to know how your data is being
collected, used and stored. You can also choose to give or withhold your consent.
2. Rights of Individuals: The Act empowers individuals with various rights. These include
the right to access, rectify, erase and restrict the processing of your data.
3. Obligations for Businesses: Organizations collecting personal data have specific
obligations under the Act. They must implement robust security measures, conduct
privacy impact assessments and respond to user requests regarding their data.

Jargon Alert: DATA PROTECTION LAWS
Data Protection refers to the set of privacy laws, policies and procedures that aim to minimize
intrusion into one’s privacy caused by the collection, storage and dissemination of personal
data. 137 out of 194 countries in the world have already put in place legislation to secure the
protection of data and privacy.

Benefits of DPDPA:
• Increased data privacy for individuals
• More transparency from organizations handling personal data
• Stronger data security measures
• Empowered users with more control over their information

Do it Yourself
Familiarize yourself with your rights under the DPDPA. You can access the official document
on the MeitY (Ministry of Electronics and Information Technology) website by opening the
link https://www.meity.gov.in/content/digital-personal-data-protection-act-2023 in your
web browser.

DATA PRIVACY, SECURITY AND ARTIFICIAL INTELLIGENCE


In the present-day scenario, where artificial intelligence is increasingly being used in apps,
websites and other software, the security and privacy of our data is also becoming more
important.
You may be wondering what AI has got to do
with Data Privacy and Security. AI is a Data
Guzzler—it works like a giant sponge and
soaks up massive amounts of data to learn
and improve. The more data it has, the better
it gets at recognizing patterns, making
predictions and even generating creative
text formats.
In short, AI feeds on anything from numbers and text to images and videos. If this data isn’t
secure, hackers could steal it, tamper with it or use it for malicious purposes.
Let us explore the details of the connection between data security, privacy and AI:
1. Protecting Information used by AI: AI systems use vast amounts of data to make
decisions. This data can be personal—your browsing habits or facial features. Keeping
this information secure and private is crucial to prevent unauthorized access or misuse.

Consider this: An AI system in your shopping app recommends products to you based on
your online browsing habits. Although quite convenient, it can be very intrusive. Do you
really want AI to know your every online move?



2. Privacy Concerns with AI Decisions:
AI can analyze personal information
to make predictions or suggestions.
For example, facial recognition
systems raise privacy concerns
because they could be misused for
surveillance.

Consider this: Imagine an AI system
used for facial recognition at an
airport is hacked. Hackers could
gain access to sensitive personal
information, black out important information or even replace the faces of criminals with
innocent people in the security footage. Scary!
3. Safeguarding Training Data for AI: Training AI requires the creation and utilization
of large datasets. These datasets might contain sensitive information about millions of
users in a single place. Protecting these datasets from unauthorized access is essential to
ensure the accuracy and privacy of the information used.
4. Preventing Bias and Unfair Treatment: AI systems can inherit biases from the data they
are trained on, leading to unfair outcomes. Data security and privacy practices should
address these biases to promote fairness and ethical use of AI.

Case Study
THE IMPORTANCE OF FAIR TRAINING DATA
In 2018, Amazon scrapped an AI hiring tool because it discriminated against female
candidates. The tool learned from past hiring data which reflected gender bias. This case
highlights the importance of secure and unbiased data in AI development.

Jargon Alert: ALGORITHMIC BIAS
Algorithmic bias refers to systematic errors or prejudices in AI algorithms that often result from biased training data.

5. Transparency and Responsibility: Data privacy regulations like the DPDPA ensure transparency and accountability in AI. Organizations must explain how they collect, use and protect data to comply with these laws.
6. Innovation vs Privacy—A Balancing Act: Advancements in AI and Machine Learning,
including some extremely popular products like Microsoft Bing Chat, OpenAI ChatGPT
and Google Gemini, bring new challenges for data privacy.

Data Literacy 53
FOOD FOR THOUGHT—AI AND OUR DATA
With AI systems becoming more and more common, people are increasingly interested in understanding
how they work. This is important because we want to make sure these systems are fair and don’t misuse
our private information.
The future of AI depends on finding a good balance. We want to keep creating new and exciting
technologies but we also need to make sure our personal information is protected. By working together,
we can ensure that AI benefits everyone in a safe and responsible way.

Cybersecurity Best Practices


We can adopt several best practices to improve our online safety. In today’s digital world,
being connected 24×7 exposes us to a sea of information and online tools.
While it is all fun and games with memes and study groups,
keeping our data and devices safe is of utmost importance. Some
important cybersecurity practices are mentioned below:
1. Use Strong Passwords: Create unique and strong
passwords for your account. Incorporate a combination
of letters, numbers and special characters. Avoid using
information like birthdays or common words which are
easy to guess. Since memorizing multiple passwords can
pose an issue, consider using a password manager app to
store and encrypt your passwords securely.
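The password rules above can be turned into a quick self-check. The sketch below is only illustrative, not a real security tool; the function name `is_strong_password` and the exact rules (minimum length of 8, at least one letter, digit and special character) are our own choices for this example.

```python
import string

def is_strong_password(password):
    """Check a password against basic strength rules: at least
    8 characters, mixing letters, digits and special characters."""
    has_letter = any(ch.isalpha() for ch in password)
    has_digit = any(ch.isdigit() for ch in password)
    has_special = any(ch in string.punctuation for ch in password)
    return len(password) >= 8 and has_letter and has_digit and has_special

print(is_strong_password("birthday"))    # False: letters only, easy to guess
print(is_strong_password("Rx7!mK2#pQ"))  # True: letters, digits and symbols
```

Real password checkers also reject dictionary words and previously leaked passwords, which this simple sketch does not attempt.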
2. Enable Two-Factor Authentication (2FA): Turn on 2FA whenever possible. This adds an
extra layer of security by requiring a second form of verification (a text message code) in
addition to your password.
3. Keep Software Updated: Regularly update your operating system, apps and antivirus
software. Software updates often include important security patches that help protect
against vulnerabilities.
4. Be Cautious with Links and Attachments: Avoid clicking on suspicious links or
downloading attachments from unknown sources. These could be phishing attempts to
steal your personal information or install malware on your device.
5. Use Secure Networks: Connect to secure Wi-Fi networks, especially when handling
sensitive information like using passwords for online banking. Avoid using public Wi-Fi
for tasks that require login credentials or sensitive data.
6. Be Mindful of Social Media Settings: Review and adjust your privacy settings on social
media platforms to control who can see your posts and personal information.
7. Backup Your Data Regularly: Backup important files and data to an external hard drive
or cloud storage. This ensures you can recover your data in case of a ransomware attack
or hardware failure.
8. Educate Yourself about Cybersecurity Risks: Stay informed about common
cybersecurity threats like phishing, malware and identity theft. Be cautious and
skeptical of unsolicited emails or messages asking for personal information.



9. Limit Personal Information Online: Avoid sharing unnecessary personal details
(your address or phone number) on websites or social media. Be selective about what
information you make public.
10. Follow Secure Browsing Practices: If you are unsure about a website’s security, don’t
enter any personal information. Look for the padlock symbol in the address bar and a
URL that starts with ‘https’—these indicate a secure connection.
11. Practise Safe Online Gaming: Be cautious while interacting with other players online.
Avoid sharing personal information or clicking on links from unfamiliar sources on
gaming platforms.
12. Be Aware of Personal Data Rights: Personal data and identity of customers including
biometrics, phone numbers, Aadhaar numbers, bank details, etc., must be processed in a
manner that ensures appropriate security of personal data, including protection against
unauthorized access and accidental loss, destruction or damage.

Jargon Alert: BIOMETRICS
The physical characteristics of a person which can uniquely identify them are called biometrics. Some popular biometric characteristics are fingerprints, eye scans and facial features. If you have registered for an Aadhaar card with the Unique Identification Authority of India (UIDAI), your biometrics have been recorded and linked to your Aadhaar number to uniquely identify you. Biometrics are also used for attendance management in offices and schools these days.

13. Talk it Out: If you encounter something suspicious online or if you are unsure about
something, talk to a trusted adult, teacher or parent. They can help you to handle the
situation and keep you safe.

Fun Time
Learn how to protect yourself online. Follow the rules shared in the YouTube video to avoid becoming a victim of phishing, scams, ransomware and cyber threats. To play, scan the given QR code or open the link https://www.youtube.com/watch?v=aO858HyFbKI in your web browser. You can also refer to the CBSE guidelines on cyber safety by accessing the link https://www.cbse.gov.in/cbsenew/documents/Cyber%20Safety.pdf

ACQUIRING, PROCESSING AND INTERPRETING DATA


‘Data is the new oil’ is a famous quote by British mathematician Clive Humby, highlighting
how data has become important for individuals, organizations and even countries. In the
present era, where Artificial Intelligence and Machine Learning technologies are being used
for solving essential problems of the world, it is data that is enabling development of these
advanced technologies. In addition, data is also the cornerstone of Internet of Things (IoT).
All these technologies are driving today’s world but, in fact, they rely on data to optimize
processes, personalize experiences and automate tasks.

Where does data come from
Data is all around us. The address of your home, the name of your school, the number of
students in your class and the day, date or time that we observe—they are all data. Data
doesn’t always record numbers or words; it may represent the temperature outside, the sound
of music or visuals like pictures and maps.
Quick Question: How much data did you generate today?
In a survey of Class 9 students in a school, the majority of the students believed that since they had not created any document, presentation or spreadsheet that day, they had not generated any data. Interestingly, the students later realized they were wrong. Let us find out how.
Directions: Read each question carefully and answer YES or NO.
1. Do you swipe up on reels?
2. Do you chat with your friends online?
3. Do you use a library card to borrow books?
4. Do you wave your hand to open the automatic doors at the school or grocery store?
5. Do you wear a fitness tracker that counts your steps?

6. Do you use a search engine to find information online?
7. Do you leave a review for the restaurants that you visit?
8. Do you use an app to listen to music?
9. Think about your favourite video game. Does it track your progress or high scores?
10. Does your school use a system where you tap your ID card or scan your fingerprint to enter the school building?
(Bonus: Think about what data might be collected in each of the above activities.)
If the answer to any of the above questions is YES, then you are generating data. The level of
data integration in our daily lives is seamless and continually growing. Almost everything we
do generates data or involves the use of data-driven technologies. Data is important across all
aspects of human life.
We already know that data might not be useful by itself. However, when organized and given
some context, it gets converted into useful information. The data which has been converted
into information can help us make decisions.
For example, information can help us answer the following questions:
• Which is the best-selling laptop under INR 40,000?
• Which toy store has the best collection of board games?
• Which country has the coldest climate?
• Who is the best batsman in the Indian cricket team?
This indicates that before we seek information, we always need to ask some questions.



In particular, we can say that our initiation to data literacy usually begins with some
questions which can be answered with relevant information or data related to each question.
For example, we can make weather forecasts with satellite data or predict traffic conditions
using map data, but we cannot use Pro Kabaddi League data to tell who is the most successful
batsman in the Indian cricket team.

Jargon Alert: RELEVANT INFORMATION
Relevant Information refers to specific data that can be applied to solve a problem.

Before we start using data to make decisions, we usually need to answer a few more questions:
• With so much data present around us, how do we decide on which data to use?
• Which data is relevant?
• What are the different types of data available?
• What are the potential reliable sources to look for to acquire relevant data?
Before finding answers to these questions, first we must understand how data is stored and
represented as variables in different domains.

What are variables


The term variable is usually used in the context of Mathematics, Programming and Statistics,
all of which are used in conjunction with data processing. Interestingly, its meaning
is interpreted differently based on usage. Let us understand the meaning of variable in
each context.
• In Mathematics, a variable is used to represent unknown quantities. For example, in
the equation given below, x and y are called variables and the equation is called a Linear
Equation in Two Variables.
5x + 20y = 100

• In Programming, a variable is a storage location which has been given a unique name
to identify it to the program code. For example, a programming statement like x = 9
represents a storage location in a computer—its unique name is x and we have stored a
value of 9 in this storage location.
Consider two programming statements in the same program as given below:

x = 9     # the value stored in the variable x is 9
x = 22    # the value stored in variable x is now 22

From the above example, you may observe that the storage location remained the same
but the value stored was changed.

• In Statistics, a variable represents the characteristics which can be recorded, measured
or counted and can have different values. For example, the characteristics of a person—
height, weight, age, eye or hair color—can be called statistical variables because their
values vary from person to person and may even change with time. Similarly, for an
employee, the variables can be designation and salary, both of which can be different for
different people and can also change with time.
In the above meanings of variable, one thing is common. They store observations which may
change. So, let us make a general definition of variable:
A variable is something that can be used to store or record some observations which can
change with time or from one observation to another.
In contrast, a data item, observation or quantity that can assume only one value is called a constant. For example, the value of Pi (π ≈ 3.14) is considered a constant, the number of hours in a day is constant and the acceleration due to gravity is also considered a constant with g = 9.8 m/s².
Depending on the type of values being stored in them, variables can be classified into two
different categories. Let us understand the categories of variables.

Numerical Variables
Numerical variables are used to store measurable values and can be represented by numbers.
• Numerical variables allow measurement.
• We use numbers to represent numerical variables.
• Basic mathematical operations are applicable to numerical variables.

Categorical Variables
Categorical variables are used to classify data into different categories or types.
• Categorical variables do not allow measurement.
• All Yes/No answer types also require categorical variables to store data. For example, the
answer to ‘Do you own a pet?’ can be classified as ‘Yes’ or ‘No’.
• Categorical variables can be represented with both numbers and characters but
mathematical operators do not apply in the number representation.
For example, in India, PIN or ‘Postal Index Number’ code uniquely identifies an area.
These PIN codes classify different areas with different numbers. However, we cannot
add two PIN codes together or say one PIN code is bigger than the other. Similarly, the
phone number of a person is stored as a categorical variable.
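A short Python sketch (the variable names and sample values are invented for illustration) makes the distinction concrete: arithmetic is meaningful on a numerical variable like age, while a PIN code is best stored as text because "adding" PIN codes makes no sense.

```python
# Numerical variables: arithmetic makes sense
age_a, age_b = 14, 15
print(age_a + age_b)      # 29, a meaningful combined total

# Categorical variables: store as strings, since arithmetic is meaningless
pin_a, pin_b = "110002", "110045"
# "Adding" the strings just glues them together; it is not a valid PIN
print(pin_a + pin_b)      # 110002110045
```

This is why spreadsheets and databases let you mark a column of digits (like PIN codes or phone numbers) as text rather than as numbers.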

TYPES OF DATA
Any variable is meaningless unless it is associated with some data. In fact, it is the data which
decides what kind of variable shall be used to store it. Data may be classified based on the
following criteria:
• Categorization by Data Property
• Categorization by Organization
• Categorization by Applications



Categorization by Data Property
The two types of data used in any decision-making based on data properties are Quantitative
and Qualitative Data. Let us understand both types of data.

Quantitative Data
As the name suggests, quantitative data is associated with the word ‘quantity’. Quantitative
data is used to answer research questions like ‘How many?’, ‘How much?’ or ‘Which is more?’.
Definition: All data values that can be measured, counted and compared to each other (less
or more) on the basis of quantity are called Quantitative Data. Some common examples of
quantitative data are listed below:
• Data which describes a person’s age, height or weight
• Data which describes a book’s number of pages, price or weight
• Data which describes a car’s maximum speed, mileage, weight, etc
In other words, data which contains a numerical variable is always quantitative data.
Quantitative data can be further categorized as follows:
1. Discrete Data: Discrete data consists of distinct, separate values that are whole numbers
and cannot be subdivided further. This type of data represents countable quantities or
categories with clear boundaries between values. Examples of discrete data include:
• Number of students in a classroom
• Number of items sold in a store
• Number of cars in a parking lot
• Number of siblings in a family

2. Continuous Data: Continuous data represents measurements that can take any value
within a range and can be infinitely subdivided into smaller increments. In continuous
data, values can vary continuously and may include decimal or fractional values.
Examples of continuous data include:
• Height and weight of individuals
• Temperature readings (25.3°C, 27.8°C, 30.1°C, etc.)
• Time taken to complete a task (3.45 minutes, 4.72 minutes, etc.)
• Blood pressure readings

Qualitative Data
Qualitative data is derived from the word ‘quality’. Qualitative data is used to answer
questions like ‘What type?’ or ‘What kind?’.
Definition: All data values that can depict the quality, properties or the distinguishing
characteristics to be classified into one out of two or more categories are called
Qualitative Data.

Some common examples of qualitative data are listed below:
• Data which describes a person’s eye color, hair color, name, etc.
• Data which describes a book’s author, ISBN number, genre, publisher, etc.
• Data which describes a car’s color, fuel type (CNG, petrol, diesel) or body type
(hatchback, SUV, convertible)
In other words, data which contains a categorical variable is always qualitative data.

Importance of Quantitative and Qualitative Data


From the above examples, it can be observed that both quantitative and qualitative data can
be collected from the same object or person and can be used together to describe the same
thing at a time. Both quantitative and qualitative data are equally important to answer
research questions because they give the full picture together.

Object/Person     Numerical       Quantitative Data    Categorical      Qualitative Data
being observed    Variable        (Data Collected)     Variable         (Data Collected)
Boys              • Age           • 20                 • Eye Color      • Brown
                  • Height        • 5.7                • Hair Color     • Black
                  • Weight        • 65.9               • Name           • Aman

Case Study
EXAMPLE OF QUANTITATIVE AND QUALITATIVE DATA
Akash is a delivery boy who works for a gift company called ‘Gift-Shift’. He drives his red-colored
scooter which gives a mileage of 60 kmpl (kilometres per litre) and makes 5 trips a day to deliver
the gifts. He can carry at most three large boxes or several small boxes on each trip. He also
drives safely at a speed of 50 km/hr. Akash is a moderately built guy, weighing 75 kg and has
the same brown hair as the spots on his pet dog, a Beagle. The illustration below separates the
quantitative data from the qualitative data.

Quantitative                          Qualitative
(How many? How much? How often?)      (What 'Type'? Which 'Category'?)
• Height of Vehicle                   • Type of Job
• Weight of Person                    • Type of Pet
• Number of Trips                     • Hair Color
• Number of Boxes                     • Vehicle Color
• Speed of Vehicle                    • Designation
• Mileage                             • Vehicle Model



Do it Yourself
Try to guess the data stored in each type of variable given in the Case Study illustration.
For example, the Type of Pet is a variable which has data stored as ‘Beagle Dog’.
Suppose there was another variable called Number of Wheels. Would this variable
be qualitative or quantitative for vehicles?
(Hint: Try to apply the rules of distinguishing between qualitative and quantitative data based
on the rules of numerical and categorical variables.)

Experience AI
Learn more about qualitative vs quantitative data in healthcare by scanning the given
QR code or by opening the link https://www.youtube.com/watch?v=4iws9XCyTEk in your
web browser.

Categorization by Organization
Data may also be categorized based on its organization. Depending upon how the data has
been organized, we may have the following three categories.

Structured Data Semi-structured Data Unstructured Data

1. Structured Data: Structured data is organized in a predefined format which we may find in spreadsheets with rows and columns, or databases with tables and fields. This type of data is highly organized, easily searchable and can be analyzed using traditional database queries.

Examples: Numerical data (e.g., sales figures, temperatures) and categorical data (e.g., types of products, customer segments).
2. Semi-structured Data: Semi-structured data has a partial organizational structure,
often with tags or markers that provide some level of organization. This type of data
offers flexibility but has elements of organization for easy processing.

Examples: XML files, JSON data, web logs, etc.
3. Unstructured Data: Unstructured data is less organized and doesn’t follow a predefined
format. It can include text documents, emails, social media posts, images, videos and
audio recordings. Extracting meaning from unstructured data often requires sophisticated
techniques, for example, Natural Language Processing (NLP) for text analysis.

Examples: Text documents (e.g., emails, social media posts), images, videos, audio recordings, etc.
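As a quick illustration of semi-structured data, Python's built-in `json` module can parse a JSON string into a dictionary; the tags (keys) such as "name" and "hobbies" provide the partial structure. The sample record below is invented for illustration.

```python
import json

# A small JSON record: keys like "name" and "class" give partial structure
record = '{"name": "Aman", "class": 9, "hobbies": ["cricket", "chess"]}'

data = json.loads(record)    # parse the JSON text into a Python dictionary
print(data["name"])          # Aman
print(len(data["hobbies"]))  # 2
```

Unlike a database table, nothing forces every record to have the same keys, which is exactly the flexibility that makes JSON semi-structured.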
Categorization by Application
Data is sometimes used for a specific purpose, associated closely with application. Some
examples of data categorized by application include the following:

Temporal Data Spatial Data Biometric Data

1. Temporal Data: Temporal data includes timestamps or time-related information. This data is collected over time, allowing for trend analysis. Temporal data is analyzed to understand patterns and trends over time.

Examples: Sensor data (temperature readings over time), stock market data, weather
forecasts, etc.

2. Spatial Data: Spatial data refers to information related to geographic locations or coordinates. It is used in Geographic Information Systems (GIS) for mapping and spatial analysis.

Examples: GPS data, maps, satellite images, etc.
3. Biometric Data: Biometric data consists of unique physical or behavioural characteristics
used for identification. It is used in security systems and identity verification.

Examples: Fingerprints, facial recognition, voiceprints, etc.
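Trend analysis on temporal data can be as simple as a moving average. The sketch below smooths a short series of hypothetical temperature readings (the values are invented for illustration, echoing the sample readings mentioned earlier):

```python
def moving_average(readings, window):
    """Average each consecutive 'window' of readings to smooth out noise
    and reveal the underlying trend."""
    return [sum(readings[i:i + window]) / window
            for i in range(len(readings) - window + 1)]

temps = [25.3, 27.8, 30.1, 29.5, 28.2]   # hypothetical readings over time
print(moving_average(temps, 3))
```

Each output value summarizes three consecutive readings, so a 5-value series yields 3 smoothed values; larger windows give smoother but less detailed trends.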

Types of Data used in AI Applications


The properties of data help us categorize it using several approaches as discussed above.
However, the Machine Learning applications are also categorized into three different domains
based on the type of data they are trained on.
1. Data Science Machine Learning applications are trained on numerical and categorical
data (statistical data) present in tables, csv files, spreadsheets, database tables and even
in text files.
2. Computer Vision applications are trained on visual data including images and videos.
3. Natural Language Processing applications are trained on human conversation data
including text books, blogs, voice notes, conversations, etc.

Domain       Data Science ML            Computer Vision      Natural Language Processing (NLP)
Data Type    Tabular Data               Visual Data          Human Language
Examples     Tables and Spreadsheets    Images and Videos    Voice and Text


DATA ACQUISITION
Data acquisition refers to the process of collecting and gathering data from various sources.
The data being collected for an AI system or other data science problems should be both
accurate and relevant. The data which may help in our study can be found using appropriate
data sources.
The quality and quantity of the acquired data significantly affect the performance and reliability of AI systems. Data acquisition in AI involves gathering countless examples and showing them to the system so that it can learn to recognize patterns and make predictions.
Identifying Data Needs: The first step is to figure out what kind of data your AI system needs
to function. What problem are you trying to solve? What information will help the AI learn and
make decisions?
After identifying the data needs, we can discover data and then augment or generate data,
based on data availability, as discussed below:
• Data Discovery—Choosing Data Sources: The process of data discovery includes
searching online or offline to find the best sources. Some available data sources may
include public datasets, government-collected data or private company records. If the
data available is not in digital form, it will be required to be converted. The choice of data
sources depends on your project’s needs and budget.
We may get ample data from the discovery phase. However, sometimes, only very
limited data may be available for training our Machine Learning model. In such cases,
we may use data augmentation to increase the available data or use data generation
techniques to create new data.
• Data Augmentation—Increasing Available Data: If the data discovery process results in only a small amount of relevant data, which is not sufficient for our analysis needs, we need to augment our data.
Data augmentation allows us to increase the amount of data by using additional copies of the available data after making small changes to each copy. For example, a single image can be turned into several slightly altered copies after augmentation.
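Augmentation of image data often means flipping, rotating or cropping copies. Representing a tiny image as a grid of pixel values, a horizontal flip can be sketched in pure Python (the 3×3 "image" is invented for illustration; real pipelines use libraries rather than hand-written loops):

```python
def flip_horizontal(image):
    """Return a left-right mirrored copy of an image,
    where the image is a grid (list of rows) of pixel values."""
    return [row[::-1] for row in image]

original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]

augmented = flip_horizontal(original)
print(augmented)   # each row reversed: [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
```

The original image is untouched; augmentation always produces extra copies, so the dataset grows without collecting any new photographs.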
• Data Generation—Creating Data: If there is no data available for solving a problem, we
can generate the required data synthetically or from the real world.
◼ For example, to teach a conversational bot about the rules, regulations and playbook of
a new video game, we may create a lot of synthetic data in the form of conversations
and train the bot on this synthetically generated data.
◼ We may also deploy sensors and IoT to capture real world data, for example, in Smart
Cities, and store it in datasets to be used in the training of ML algorithms.

[Illustration: IoT sensors used to capture real-world data in a Smart City, including accelerometer, IR, gas, temperature, chemical, motion detector, proximity and smoke sensors.]
Data Collection Methods: Once you have your sources, you need to collect the data. This could
involve web scraping, using sensors, conducting surveys or purchasing datasets.

Do it Yourself
Search online for weather-related datasets and compile the following information about them:
1. Web source/link to the dataset
2. Information stored in the dataset
3. Size of the dataset
4. Whether the dataset is free or paid

What are Data Sources


Definition: A data source is a general name given to some person, place, process or file where
the data, relevant to our problem statement can be found. We may find our answer in a single
source of data or in multiple sources, depending on our problem statement.
For example, consider our problem statement: Which type of fish is bigger in size—river, lake or sea fish? To answer this question, we shall need to collect the size data for sea fish (first data source), river fish (second data source) and lake fish (third data source). It is only after collecting the sizes of fish from various water bodies that we would be able to answer the research question correctly.
Whenever we come across a problem which can be answered with the help of data, it is
essential to understand that our solution shall only be as good as the data we collect. The
principle is called GIGO in Data Science.
The GIGO principle underlines the importance of quality in data collection. We need to choose
our data sources wisely to draw correct inferences from data.

Jargon Alert: GIGO (Garbage In, Garbage Out)
The principle of GIGO states that if we input incorrect data to any process that depends on data, then the output of that process shall also be incorrect.
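A tiny demonstration of garbage in, garbage out: the same averaging code gives a sensible answer on clean data and a nonsense answer when a garbage value (a stray -999 entry, a common data-entry error code) slips into the dataset. The numbers are invented for illustration.

```python
def average(values):
    """Plain arithmetic mean: trusts whatever data it is given."""
    return sum(values) / len(values)

clean_heights = [150, 152, 148, 151]    # cm, plausible student heights
dirty_heights = [150, 152, -999, 151]   # -999 is a data-entry error

print(average(clean_heights))   # 150.25, a sensible result
print(average(dirty_heights))   # -136.5, garbage in, garbage out
```

The code is identical in both runs; only the data quality changed, which is exactly why choosing and cleaning data sources matters.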



Primary VS Secondary Data Sources
There are two categories of data collection methods based on how the data is being collected.
If you are gathering data by yourself, it is called Primary Data and if you are using the data
collected by someone else, it is known as Secondary Data. Accordingly, the sources of data
from which you can collect primary data are called primary data sources and likewise,
the secondary data sources contain data collected by someone else which you may use for
answering your data question.
It is important to understand that both primary and secondary data sources have their own
importance. For example, consider the following three data questions:
1. Which type of fish is bigger in size—river, lake or sea fish?
It would be very difficult for a person to catch different fish from the seas, lakes and rivers and measure their sizes for collecting the required data. Therefore, primary data may be ruled out in such a scenario. However, there are large organizations associated with fisheries which regularly collect and record relevant information in datasets which can be requested and used as secondary data sources.
2. Which types of sports are your classmates interested in? Which is the most popular sport
amongst your classmates?
The data for the above research questions can be collected as first-hand information.
Thus, here primary data collection is a useful technique.
3. Which social media platform is used by maximum people in your class? How does it
compare to students in other countries?
The above data questions can be answered with two data sources. One is primary data source
through which you can collect the details from your classmates. In addition, a secondary data
source like https://whatsthebigdata.com/social-media-statistics, which has collected
and recorded the social media usage statistics of young people across the world
over the years, is also useful. Here is a small graph which you can access by opening
the link shared above or by scanning the given QR code.
[Chart: Most Used Social Media Platforms Worldwide, showing active users (in millions) for Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, FB Messenger, Telegram, Snapchat, Douyin, Kuaishou, X (Twitter), Sina Weibo, QQ and Pinterest. Source: www.backlinko.com]

Let us further explore primary and secondary data sources in the section ahead.
Primary Data
The data which is collected first-hand by researchers, i.e., directly from a data source, is called
Primary Data.
Advantages of Primary Data
1. Primary data can be considered as the most trustworthy data because it comes from
authentic sources.
2. It can be customized to collect only the desired data as per the research question.
3. Primary data is up-to-date, based on real-time data collection.
4. The researcher has complete control on the data collection process.

Disadvantages of Primary Data


1. Primary data collection can be very expensive if several data sources are required to be
recorded.
2. Primary data collection can be a very time-consuming process.
The most common example of primary data collection on a very large scale is the Population
Census. To gather data for population census, government officials are required to pay a visit
to each and every house of the state/country, accumulating information about the residents
including age, education, religion, job, salary, marital status, etc. Primary data may also be
used by companies for market surveys to understand the interest of the customers before
launching a new product.
Based on the advantages and disadvantages discussed, we can observe that primary data is
suitable for personal surveys of small groups or in cases where reliability of data is much more
important as compared to the cost and time taken.

Primary Data Collection Techniques


There are several techniques for collecting primary data. Some popular primary data
collection techniques are discussed below:

Surveys
A survey may be conducted among people, but the technique is applicable to anything from which data can be obtained.
For example, if you want to answer the research question ‘What are the career preferences
of classes 11 and 12 students of your school?’, the choice of criteria is classes 11 and 12 and
the sources of data are all the students studying in classes 11 and 12.
On the other hand, if your research question is ‘Which brands of private cars are the most
polluting?’, the choice of criteria is private cars and the source of data is the pollution check
conducted on all private cars for data collection.
Surveys can contain a variety of questions, based on the research problem, which the sources
may respond to and the responses are recorded to answer the research question.
Surveys are suitable not only for small groups but also for data collection amongst very
large groups. Surveys can be conducted in person (for small groups), telephonically or even
online using WhatsApp, email and other social media platforms.

66 SUPPLEMENT—Decoding Artificial Intelligence–IX


Polls
Polls are a simpler form of surveys, used to gain a quick understanding of people’s
preferences. Polls usually contain True/False or multiple-choice questions which the
respondents complete quickly. The results can be immediately analyzed to see the inclination
of the respondents.
For example, look at the illustration below for an example of a poll taken from LinkedIn, a
popular professional networking website.

Let’s explore your interest.

If you can install only one application in your mobile, what would it be?

LinkedIn

Facebook

Instagram

Other (Mention in comments)

The responses to the poll are shown below, where 68% selected LinkedIn, 8% chose
Facebook, 19% went with Instagram and 6% opted for Other.

LinkedIn 68%

Facebook 8%

Instagram 19%

Other (Mention in comments) 6%

Interviews
Interviews are designed to ask questions from respondents in a face-to-face environment.
The technique is usually applied when the number of respondents is very limited. Earlier,
interviews were conducted telephonically if a personal meeting was not feasible but
nowadays, online videoconferencing platforms like Google Meet, Skype and Microsoft
Teams have made interviews a much better experience with live video transmission.

Questionnaires
Questionnaires usually contain a set of questions which may be structured to gather
information for use in a survey.

Feedback forms
Feedback forms are a form of survey containing a variety of questions but with a special use.
Feedback forms are utilized to gather data regarding the satisfaction level of a group after
an event or after the consumption of a product. This technique is different from a survey
in the sense that surveys can be conducted for measuring the expectations while feedback
forms are designed to check if the expectations are met and to seek suggestions for further
improvement.
There are several other techniques for collecting primary data and you can easily identify
them if they fit the basic definition.

Secondary Data
Data which is not collected first-hand by the researcher but is used for answering research
questions is called secondary data. Usually, secondary data refers to data collected in the
past by someone and is made available by sharing. It can also be said that the data that one
considers secondary is, at one point, primary data for someone.

Advantages of Secondary Data


1. Secondary data is generally available free of cost or at a much lower expense as
compared to primary data.
2. Secondary data is easily accessible.
3. The time taken to collect secondary data is almost negligible as compared to primary
data collection.
4. Secondary data may be general purpose and useful for different forms of related research
questions.

Disadvantages of Secondary Data


1. We may need to extensively process secondary data and remove irrelevant parts to make
it useful for our own research question.
2. Secondary data may be obsolete and lead to wrong conclusions.
3. It is difficult to check the authenticity and reliability of secondary data.

A common example of secondary data is the data collected from various online sources. Some
other sources of secondary data include, but are not limited to, the following:
• Government reports
• Internal records of an organization
• Reports by public research organizations
• Internet websites
• Libraries
• Journals, newspapers and magazines

Do it Yourself
Classify the following data collection methods as primary or secondary.
1. You have been asked to collect food preferences of all the students of your class for
a school picnic. You decide to create a poll and record the food preferences of your
classmates. Which data collection method is this?

2. As a volunteer for the health club of your school, you have been assigned to record the
data of kids in your locality who are eligible for pulse polio vaccination. You decide to
conduct a door-to-door campaign in your area to collect the required data. Which data
collection method is this?
3. The cultural committee of DataRich High School is hosting the annual talent show for
students of classes 9, 10, 11 and 12. As a committee member, you have been tasked
with collection of details of students with different talents. You decide to reuse the
talent data collected last year, which contains all the required details, and add fresh
data of the new batch of class 9 using an online survey. Which data collection method
is this? (Hint: Data collection may require more than one method.)

Best Practices for Data Acquisition


By following certain best practices, organizations can effectively acquire high-quality data for
AI applications, leading to more accurate and reliable AI systems.
• Define Clear Objectives: Start by defining the objectives and requirements of the AI
project. Identify the specific types of data needed to achieve the desired outcomes.
• Data Quality is Key: ‘Garbage in, garbage out’ applies to AI too. Ensure your data
is accurate, complete and relevant to your project. Dirty data can lead to biased or
inaccurate AI models.
• Maintain Data Privacy and Security: Respect privacy and ensure data security
throughout the acquisition process. Remove sensitive information when necessary to
protect user privacy.
• Data Diversity is Essential: Instead of collecting data from one source, use multiple
possible sources. A diverse dataset helps the AI perform better in real-world scenarios.
• Consider Ethics and Privacy: Be mindful of data privacy regulations and ethical
considerations when acquiring data. Make sure you have proper permissions to collect
and use the data.
• Documentation is Essential: Keep a record of how and where you acquired your data.
This is crucial for maintaining transparency and reproducibility in your AI project.

Case Study
COLLECTING DATA FROM WEBSITES
Problem Statement: Rajesh has been tasked with
collecting product reviews from websites. What are
the best practices for this data collection?
The process of collecting data from websites using
automated programming methods is called Web
Scraping.
1. Define Clear Objectives:
• Goal: What details do you want to gain from the reviews (e.g., sentiment analysis,
understanding customer preferences)?

• Focus: Specify the type of products, brands or websites that you would target for
reviews.
• Metrics: Determine how you would measure success (e.g., number of reviews collected,
accuracy of sentiment analysis).
2. Data Quality is Key: Choose established and reliable websites known for genuine user
reviews.
• Data Cleaning: Address typos, grammatical errors and inconsistencies in the data to
improve accuracy.
3. Maintain Data Privacy and Security: Avoid collecting personally identifiable information
(PII) unless explicitly authorized.
• Website Terms: Is it legal? Do you need permission? Adhere to the terms and conditions
of the websites that you are scraping data from.
• Secure Storage: Store collected data securely to prevent unauthorized access or
breaches.
4. Data Diversity is Essential: Collect reviews from various websites, review aggregators
and forums to capture diverse perspectives.
• Positive & Negative Reviews: Don’t just focus on positive reviews. Capture a balanced
mix of positive and negative feedback.
• Time Period: Include reviews from different timeframes to account for product updates,
changing customer trends, etc.
5. Consider Ethics and Privacy:
• Data Regulations: Be aware of data privacy regulations as per law that may apply to
your data collection methods.
• Consent & Transparency: If collecting user data directly, obtain informed consent and
be transparent about how it will be used.
• Respectful Scraping: Avoid overloading website servers with too many scraping
requests.
6. Documentation is Essential:
• Data Source Tracking: Maintain a record of the websites that you scraped data from,
including URLs and dates.
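The parsing step of web scraping can be sketched with Python's standard library alone. The HTML fragment below and its 'review' class name are invented for illustration; a real scraper would fetch live pages politely (respect robots.txt, pause between requests) and follow each website's terms of service, as the case study advises.

```python
from html.parser import HTMLParser

# Offline sketch of extracting review text from a page. The sample
# page and its 'review' class name are made up for illustration.
SAMPLE_PAGE = """
<html><body>
  <p class="review">Great product, battery lasts two days.</p>
  <p class="review">Stopped working after a week.</p>
  <p>Unrelated text that should be ignored.</p>
</body></html>
"""

class ReviewExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # Only collect text that sits inside <p class="review"> tags
        if tag == "p" and ("class", "review") in attrs:
            self.in_review = True

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_review = False

parser = ReviewExtractor()
parser.feed(SAMPLE_PAGE)
print(parser.reviews)  # the two review texts, in page order
```

In a real pipeline, the URL and scrape date of every page would also be recorded, in line with the data source tracking practice above.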

Features of Data
Data features refer to individual measurable properties or characteristics of data
that are used as inputs in Artificial Intelligence systems. Features represent specific
attributes or variables that provide information about the data points being analyzed.
Data features are used by AI or machine learning models to make predictions or
classifications.

Consider a scenario where your AI system is created to predict house prices. Some features
(data points) that may influence the price of a house could be:
• Square footage
• Number of bedrooms
• Location
• Age of the house
Some important terminologies followed in AI systems with respect to data features:
1. Attributes or Variables: Features are often synonymous with attributes or variables in a
dataset. Each feature represents a specific aspect or property of the data.
2. Input Variables: Features are used as input variables in machine learning models to
make predictions, classify data and discover patterns.
3. Numerical or Categorical: Features can be numerical (age, temperature) or categorical
(color, gender). Numerical features contain measurable values while categorical features
represent discrete categories.
4. Dimensions: Features determine the number of dimensions of data space. In simple
words, high-dimensional data refers to datasets with many features.
5. Independent and Dependent Features:
• Independent features (or independent variables): Independent features are input
variables that are used in the model. They are predictor variables or attributes that
are provided to the model to make predictions.
These features are assumed to be independent of the target variable (output) in the
context of the model.
• Dependent feature (or dependent variable): Dependent feature is the target variable
that the model aims to predict based on the independent features. It represents the
output or result of the model’s predictions.
The model learns to relate the independent features to the dependent variable during the
training process.

[Illustration: Training. The independent variables (Square footage, Number of bedrooms,
Location, etc.) and the dependent variable (House Price) are given to the AI system, which
learns the relationship between them and produces a model.]

For example, independent variables—square footage, number of bedrooms, location, age of
the house—allow the AI system to predict the house price, which becomes a dependent or
target variable.

[Illustration: Prediction. Previously unseen independent variables (Square footage, Number
of bedrooms, Location, etc.) are given to the model, which uses the previously learned
relationship between the independent variables and the dependent variable to predict the
House Price for the new values provided as input.]
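The training-then-prediction flow described above can be sketched in a few lines. This is a simplified, made-up illustration: a single independent variable (square footage) is fit against the dependent variable (house price) with a least-squares line, which then predicts the price of a previously unseen house. A real model would use many more features and data points.

```python
# Illustrative sketch: fit house price against square footage.
# All figures below are made-up sample data.

# Independent variable (input feature): square footage
sqft = [1000, 1500, 2000, 2500, 3000]
# Dependent variable (target): house price
price = [150000, 210000, 270000, 330000, 390000]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n

# Least-squares slope and intercept: price ~ slope * sqft + intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price)) \
        / sum((x - mean_x) ** 2 for x in sqft)
intercept = mean_y - slope * mean_x

# Predict the price of a previously unseen 1800 sq ft house
predicted = slope * 1800 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted))
```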

Data Preprocessing
The quality and relevance of these features directly impact the ‘goodness’ of the AI system.
To improve the quality of the collected data, we use a method called data preprocessing. The
process of preparing data for analysis by cleaning, transforming and organizing it so that the
AI model can learn from it efficiently is called data preprocessing. Let us understand some
common data preprocessing techniques with examples.

1. Handling Missing Values:


• Scenario: Consider a house-price prediction dataset. There may be some entries in your
house price data that have missing values for the category ‘number of bedrooms’.
• Techniques: You can fill in the missing values with the average number of bedrooms in
the dataset (known as imputation) or remove all the rows with missing data altogether
from the dataset (known as deletion).

Jargon Alert
IMPUTATION: It refers to replacing the missing values with a statistical measure (e.g., mean,
median, mode) of the feature.
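Mean imputation can be sketched in a few lines of Python; the bedroom counts below are made-up sample values, with None marking a missing entry.

```python
# Fill missing 'number of bedrooms' values with the mean of the
# observed values (mean imputation). None marks a missing entry.
bedrooms = [3, None, 2, 4, None, 3]

observed = [b for b in bedrooms if b is not None]
mean_bedrooms = sum(observed) / len(observed)  # (3 + 2 + 4 + 3) / 4 = 3.0

imputed = [b if b is not None else mean_bedrooms for b in bedrooms]
print(imputed)  # [3, 3.0, 2, 4, 3.0, 3]
```

Deletion, the other technique mentioned above, would instead drop the rows with None entries altogether.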

2. Dealing with Outliers:


• Scenario: There might be a house in your dataset with a ridiculously high price compared
to others (an outlier).
• Techniques: You can decide to keep the outlier if it is a valid data point, remove it if it
seems like an error or cap the value at a certain limit to reduce its influence.
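Capping a value at a limit can be sketched as below; the prices and the chosen cap are made-up figures for illustration.

```python
# Cap house prices at a chosen upper limit so that a single extreme
# value does not dominate the analysis. Sample, made-up figures.
prices = [250000, 300000, 280000, 5000000, 320000]  # 5000000 is an outlier

cap = 1000000  # assumed upper limit for this illustration
capped = [min(p, cap) for p in prices]
print(capped)  # [250000, 300000, 280000, 1000000, 320000]
```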
3. Encoding Categorical Data:
• Scenario: The answer to ‘Swimming Pool Availability: Yes or No’ in your dataset is text
data but it is likely that your model may require numerical features.
• Technique: You can convert the ‘yes/no’ to numerical labels using a technique called
one-hot encoding. This creates separate binary features (Yes = 1, No = 0) for each
unique response, indicating presence or absence.
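For the simple yes/no case described above, the encoding reduces to mapping each response to a 1 or 0; the sample responses below are made up.

```python
# Encode a yes/no categorical feature as 1/0 so that the model
# receives numerical input. Sample responses are made up.
pool = ["Yes", "No", "Yes", "Yes", "No"]

encoding = {"Yes": 1, "No": 0}
pool_encoded = [encoding[answer] for answer in pool]
print(pool_encoded)  # [1, 0, 1, 1, 0]
```

For categories with more than two values (e.g., house type), one-hot encoding would create one binary column per category instead of a single 0/1 column.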
4. Feature Scaling:
• Scenario: Features like ‘square footage’ (ranging from 1,000s to 10,000s) cover a much
larger range of values than, say, ‘number of bedrooms’ (ranging between 1 and 7). This
can affect how the model learns. The model may treat square footage as 1000s of times
more important than the number of bedrooms, which is not correct.
• Techniques: Feature scaling techniques like normalization or standardization can be used
to bring all the features to a similar range, ensuring each feature contributes equally
to the model’s learning process.
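Normalization (min-max scaling) rescales every feature into the 0 to 1 range; a minimal sketch with made-up values:

```python
# Min-max normalization: rescale a feature to the range [0, 1] so that
# large-valued features do not overpower small-valued ones.
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

sqft = [1000, 2000, 3000, 4000]  # large range of raw values
bedrooms = [1, 2, 3, 7]          # small range of raw values

print(normalize(sqft))      # both features now share
print(normalize(bedrooms))  # the same 0-1 scale
```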
5. Data Transformation:
• Scenario: Sometimes you may want to create a new feature by combining the existing
ones. For example, ‘total area’ could be calculated by adding ‘square footage’ and
‘garage size’.
• Technique: You can create new features based on your needs and domain knowledge to
potentially improve the model’s performance.
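The ‘total area’ transformation above is a one-line combination of existing features; the figures below are made-up samples.

```python
# Derive a new 'total area' feature by combining two existing ones.
# The figures are made-up sample values.
square_footage = [1500, 2000, 1800]
garage_size = [200, 250, 0]

total_area = [s + g for s, g in zip(square_footage, garage_size)]
print(total_area)  # [1700, 2250, 1800]
```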
These preprocessing techniques, when applied to data, ensure that the data to be used is
clean, consistent and ready for your AI or Machine Learning model.

Data Usability
Data usability in ML simply means how well-suited the data is for training a Machine
Learning model. Usable data has the following characteristics:
• Clean and organized: This means the data is formatted properly and is free of errors.
• Accessible: Data is stored in a way that the ML model can readily access it.
• Understandable: The data is clear and documented so the model can learn from it
effectively.
Good data usability helps ML models train faster and make better predictions. Different
datasets may have different usability and companies usually assign a numeric value between
0 and 10 to usability.
Data Processing and Data Interpretation
Now that you understand data features and preprocessing, let us explore the next important
steps: data processing and data interpretation.
Data Processing takes the prepared (pre-processed) data as input and processes it according
to your analysis goals. Data processing helps us to convert our data into meaningful information,
which is suitable for analysis. Here are some common data processing techniques:
• Aggregation: It refers to combining data from various sources and summarizing it for
better decision-making. For example, calculating the average house price per pin code
in your housing data.
• Sorting: It means arranging data points in a specific order. For example, sorting house
prices from lowest to highest for more information.
• Filtering: Selecting some subsets of data based on a chosen criterion. For example, in
the data set, we may focus only on houses built in the last 5 years and ignore all other
house-related data.
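The three processing techniques above can be sketched together on a small made-up housing dataset; the pin codes, prices and years are invented for illustration.

```python
# Aggregation, sorting and filtering on a small made-up housing dataset.
houses = [
    {"pin": "110001", "price": 250000, "year_built": 2021},
    {"pin": "110001", "price": 350000, "year_built": 2010},
    {"pin": "110002", "price": 300000, "year_built": 2022},
]

# Aggregation: average house price per pin code
totals = {}
for h in houses:
    totals.setdefault(h["pin"], []).append(h["price"])
avg_price = {pin: sum(p) / len(p) for pin, p in totals.items()}
print(avg_price)

# Sorting: house prices from lowest to highest
by_price = sorted(houses, key=lambda h: h["price"])

# Filtering: keep only houses built in the last 5 years
# (assuming the analysis is done in 2024)
recent = [h for h in houses if h["year_built"] >= 2020]
print(len(recent))  # 2
```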
Data Interpretation is the process of using processed data to extract meaning, identify patterns
and draw conclusions. It helps us to make sense of the information that we have obtained from
the data. Data interpretation also helps to answer questions in our problem statement.
Here is data interpretation in action with an example:
• Scenario: You have processed your house-price data.
• Interpretation: By analyzing trends, you may find that houses with more bedrooms or
having a garage parking tend to be more expensive.
Data processing and interpretation are iterative processes. You may need to go back
and forth, refining your processing techniques based on your initial interpretations or
vice versa.
Here are some additional points to remember:
• The choice of data processing and interpretation techniques depends on the type of data
and analysis goals.
• Data visualization tools like charts and graphs can be powerful aids in interpreting data
and communicating insights to others.
• Always be cautious of drawing biased conclusions based on limited data or faulty
processing methods.

Methods of Data Interpretation


Data interpretation is the art of turning processed data into meaningful insights and stories.
All data interpretation activities are based on quantitative or qualitative methods, as explained
in this section.
Quantitative Data Interpretation: Quantitative data interpretation focuses on analyzing
numerical data and using mathematical methods to draw conclusions and make predictions.
This type of data is measurable and expressed in numbers.
• Quantitative data helps us answer questions like ‘how many’, ‘how often’, ‘how much’
and the like, which report numbers in the answer. For example, ‘How many likes are there
on the Instagram post?’ is a quantitative question and if the number of likes is growing
rapidly, we may interpret that the post is going ‘viral’.
Examples of quantitative data may also include counts, sums and averages about categories
like number of students in a class, scores in a maths test, average temperature for a week or
sales figures for a business.
Quantitative data interpretation is supported by numerical graphs and charts. There are
different tools which help us collect data for quantitative interpretation, some of which are
mentioned below:
• Surveys are versatile tools for collecting data from a large population via multiple-
choice, Likert scale or open-ended questions.

Each statement below is rated on a Likert scale: Never (1), Sometimes (2), Often (3),
Always (4).
• I order at least one meal from an online food delivery app.
• I use the product on a monthly basis.
• I drink coffee from cafes at least once a day.

Some examples of Likert Scale

• Experiments are conducted in controlled environments to test cause-and-effect by
manipulating variables. For example, finding the boiling point of water by increasing
temperature.
• Observations involve systematic recording of data about the target.
• Existing Datasets allow researchers to efficiently utilize data collected by others.
• Sensors and Instruments help in automatically collecting numerical real-time data on
physical phenomena like temperature, pressure, etc.

FOOD FOR THOUGHT—AI AND OUR DATA


A general 4 Steps to Quantitative Data Analysis approach is applicable in most scenarios and is listed
below:
1. Knowing Your Variables: Figure out what you are measuring (height, weight, age) and how it is
measured (inches, pounds, years).
2. Making Sense of Numbers: Use descriptive statistics like mean (average) or median (middle value)
to summarize your data.
3. Choosing Your Measurement Scale: Decide on the scale for your measurements.
4. Presenting Clearly: Present your data in easy-to-understand formats like charts or graphs.
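Step 2 (making sense of numbers) can be done with Python's standard statistics module; the test scores below are made-up sample data.

```python
import statistics

# Summarize a made-up set of test scores with descriptive statistics.
scores = [35, 42, 38, 45, 40, 38, 50]

print(statistics.mean(scores))    # average of all scores
print(statistics.median(scores))  # middle value when sorted
print(statistics.mode(scores))    # most frequent score
```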

Qualitative Data Interpretation: It is easy to gather and interpret quantitative data, which
deals with numbers and measurements. But what about the human element? How do we
understand people’s experiences and motivations? Qualitative data can capture emotions,
feelings and experiences that people have.
The interpretation of qualitative data helps the analysts to:
• Focus on Emotions and Feelings: Qualitative data helps us understand human
experiences. With qualitative data interpretation, we can gain insights into people’s
hopes, fears and emotions.
• Understand Motivations: By interpreting qualitative data, we can try to understand the
‘why’ behind people’s actions. Why did someone choose a particular product? Why did
a group of students react a certain way to a learning activity? Qualitative data helps us
discover the motivations and thought processes that drive human behaviour.

COMPARING QUANTITATIVE AND QUALITATIVE DATA INTERPRETATION
Your school has launched a new app for students to learn Japanese language. Suppose you are
collecting, processing and interpreting data.
Quantitative data may tell you how many students download and use the new app but qualitative data,
through interviews with students, may reveal their feelings about the look and feel of the app, course
content, and even suggestions for improvement.

Examples of qualitative data include:


• Responses to open-ended survey questions
• Interview transcripts
• Observational notes in a field study
• Written feedback or reviews

Collecting Qualitative Data


Qualitative data collection is not about ticking boxes; it is about capturing the ‘why’ behind
human behaviour. Some methods used with qualitative data collection may include the following:
• Interviews: In-depth conversations or one-on-ones may explore experiences and
feelings. These open conversations that go beyond ‘yes/no’ answers are very helpful to
understand feelings and emotions.
• Focus Groups: Small group discussions about a specific topic, guided by a facilitator,
bring out shared ideas and different thought processes through brainstorming sessions.
• Observation: Researchers watch participants in their natural environment to observe
and record their behaviour and interactions. For example, a teacher quietly observing a
classroom can analyze a lot of qualitative data about the students.
• Document Analysis: This entails examining existing documents like journals, social
media posts or historical records to understand past experiences. This may also include
data collected from case studies.
• Longitudinal Studies: This data collection method is performed on the same data source
repeatedly over an extended period.

Do it Yourself
What is the type of interpretation associated with each of the following images? Give reasons.

FUN Trend Analysis: Longitudinal studies are a great way to understand how some specific data
TIME changes over time. For example, if you are interested in checking the interest of the audience
about K-pop in India, which keeps changing with time, it would be termed a longitudinal
study.
However, keeping a record of the daily, weekly, monthly or yearly data of people’s interests
is a tough job. Fortunately, Google Trends keeps a record and maintains a public website
to help with trend analysis.
To explore the Google Trends website for India, open the link
https://trends.google.com/trends/?geo=IN in your web browser or scan the
given QR code. On the Google Trends homepage, click Explore in the top menu.
Add the search term as K-Pop and select the duration by changing the value ‘Past 12 Months’
to the period for which you want to view the trend.

On choosing to view the trend from 2004 onwards, we get the following graph. It shows that
in India, people’s interest in K-Pop started growing from August 2020 onwards but witnessed
a decline after August 2022.

On ‘Home’ Tab, click on ‘Year in Search 2023’ and prepare a list of the top 5 news events and top
5 sports events in the year 2023. You can also select any previous year and repeat the search.

To summarize, both qualitative and quantitative data interpretation methods are compared in
the table below:
• Focus: Qualitative interpretation deals with emotions, feelings, experiences, insights and
motivations; quantitative interpretation deals with numbers, measurements, statistics and
trends.
• Data Source: Qualitative uses interviews, focus groups, observations and documents;
quantitative uses surveys, experiments, existing datasets and sensors.
• Analysis: Qualitative relies on themes, narratives and sentiment analysis; quantitative
relies on descriptive statistics (mean, median, mode) and inferential statistics (hypothesis
testing).
• Outcome: Qualitative aims at understanding the ‘why’ behind behaviour and uncovering
motivations; quantitative aims at identifying patterns, trends and relationships and making
predictions.
• Example: Analyzing interview records to understand student experiences with a new
teaching method (qualitative) versus calculating the average exam score across different
class sections (quantitative).

Remember
Choosing the right data interpretation technique depends on the specific questions that you are trying to answer
and the type of data you have. Data interpretation may be qualitative, quantitative or a combination of both
strategies. By combining different techniques, you can get a better understanding of the story your data tells.

Types of Data Interpretation


Data interpretation approaches can be classified into three main categories, depending
on whether the interpretation is presented in textual (text), tabular (numbers) or
graphical (images or charts) form. These three categories cover almost all forms of data
being generated and consumed by humans.
Interestingly, these categories also form the foundation of the three main branches of
Artificial Intelligence as shown in the given illustration.

TYPES OF DATA
• Textual Data. Examples: text in human languages. AI Branch: Natural Language Processing
• Tabular Data. Examples: spreadsheets, CSV files, etc. AI Branch: Data Science
• Graphical Data. Examples: images, videos, etc. AI Branch: Computer Vision

Data Interpretation by Types

1. Textual Representation: In this approach, the interpretation or findings of the data
analysis process are reported in the form of textual data. This approach is only suitable
for small amounts of data.
• Sometimes textual interpretation is useful where detailed explanations are needed.
• It helps in highlighting the thought process and reasoning behind your data analysis.
• It is also useful to add detailed explanations alongside other visual formats for a
comprehensive understanding.

Disadvantage: While text allows for details, lengthy passages can become difficult to
understand for audiences with limited attention span. Key points might get lost within
paragraphs, hindering retention of information.

For example, consider the following data statement and its textual interpretation:
Statement: A recent survey found that among high school students in India, 62% reported feeling
stressed ‘often’ or ‘all the time’ while only 18% said that they ‘rarely’ or ‘never’ experienced stress.
Furthermore, the survey results indicated that students with high social media usage were more likely
to report stress frequently as compared to those with lower social media usage.
Textual Interpretation: A recent survey found that over 6 out of 10 high schoolers in India often feel
stressed. High social media usage is also associated with increased stress levels thanks to comparisons
or too much online information/misinformation.

2. Tabular Representation: In the tabular form of data interpretation, data is represented
systematically in the form of rows and columns for clarity and precision.
• Tabular interpretations help in presenting large datasets with many variables and
precise values, allowing easy comparisons.
• They highlight trends or patterns across different categories.
• Table columns offer a quick reference point for specific data points within the
interpretation.

Disadvantage: Tables may not be the best choice for conveying complex relationships or
detailed information. They can become cluttered if overloaded with data.

Do it Yourself
Consider the given tabular interpretation and answer the questions that follow.
Frequency of Stress (% of Students): Often or All the Time: 58%; Occasionally: 22%;
Rarely or Never: 20%
Source of Stress (% of Students): Academic Pressure: 45%; Peer Pressure: 28%; Family
Expectations: 18%
Social Media and Stress (% of Students reporting frequent stress): High Usage (3+ hours/day):
70%; Moderate Usage (1-2 hours/day): 55%; Low Usage (less than 1 hour/day): 40%
Coping Mechanisms Used (% of Students): Talking to Friends/Family: 65%; Listening to
Music/Relaxation Techniques: 42%; Physical Activity: 38%
1. What percentage of high school students experience stress often or all the time?
2. How many students report experiencing stress rarely or never?
3. What is the leading source of stress for high school students?
4. How does the prevalence of stress from peer pressure and family expectations compare
to academic pressure?
5. Is there a correlation between social media usage and stress levels?
6. What is the most common coping mechanism used by students to manage stress?

3. Graphical Representation: Graphical presentation of data interpretation represents data
as graphs, charts or infographics, making complex data easier to understand.
Graphical representation is helpful for audiences who are more visually oriented and
for those who are less data literate.

Do it Yourself
PIE CHARTS
Pie charts are graphical representation of ‘parts of a whole’. As the name suggests, these
charts are shaped as pie. Each slice of the pie represents the portion of the entire pie allocated
to a specific category.
• Pie charts are circular charts. The circle/pie is divided into as many sections as there
are categories.
• Any category with a bigger proportion in terms of value is shown with a bigger slice of
the pie.
PIE CHART
[Pie chart showing how students travel to school: By Car 12%, Ride a Bicycle 13%, Walk to
School 25%, Use School Bus 50%.]

Answer the following questions:


1. Which is the most used transport medium by students? State the percentage of students
who use this medium.
2. Which is the least used transport medium?
3. Does the pie chart cover the transport mediums of all students? How can you tell?
(Hint: Do the slices add up to 100%?)
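The three questions above can also be checked in code. The slice-to-category mapping below (By Car 12%, Ride a Bicycle 13%, Walk to School 25%, Use School Bus 50%) is our assumed reading of the chart, for illustration only.

```python
# Answer the pie-chart questions programmatically. The percentages are
# read off the chart; the mapping of slices to categories is assumed.
slices = {
    "By Car": 12,
    "Ride a Bicycle": 13,
    "Walk to School": 25,
    "Use School Bus": 50,
}

most_used = max(slices, key=slices.get)    # biggest slice
least_used = min(slices, key=slices.get)   # smallest slice
covers_all = sum(slices.values()) == 100   # do slices add up to 100%?

print(most_used, least_used, covers_all)  # Use School Bus By Car True
```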

• Graphical interpretation helps in showcasing trends, patterns and relations within the
data.
• It is also helpful in simplifying complex relationships for audiences with different
data literacy levels.
BUBBLE CHART
[Bubble chart for three variables (DataFit): TV Advertising vs Radio Advertising, with bubble
size showing Sales; a bigger bubble means more sales.]
• It is easy to highlight key findings and draw immediate attention to most important
data points.

Do it Yourself
BAR CHARTS
Bar charts represent data using vertical or horizontal bars, with the length of each bar proportional
to the value of data. They are one of the simplest charts to draw and easy to interpret.
For example, consider the following illustrations. They represent vertical and horizontal bar charts,
showing a student’s marks in 5 subjects and the temperature across 5 Indian cities, respectively.
[Vertical bar chart: a student’s marks (0 to 50) in English, Hindi, Science, Maths and Data
Science. Horizontal bar chart: temperature (0 to 50) across Srinagar, Secunderabad,
Gandhinagar, Visakhapatnam and Sawai Madhopur.]
Can you interpret the given charts? Try answering the following questions:
• In which subject did the student score minimum marks?
• Which city has the highest temperature?

Disadvantage: Poorly designed graphical interpretation can lead to misinterpretations or hide
important details. Additionally, graphs may not be suitable for presenting all the details of
your interpretation.

Do it Yourself
LINE GRAPHS
Understanding Line Graphs: A line graph is created by connecting various data points. Line
graphs are created to show the change in quantity over time.
Look at the line graph below and answer the questions that follow.
LINE GRAPH
[Line graph for two variables: points showing month-wise sales (2,000 to 10,000) of Coffee
and Ice Cream from Jan to Jun.]

1. Which month sees the maximum sales of coffee?
2. Which month witnesses the maximum sales of ice cream?
3. Which months have more coffee sales than ice cream?
4. What type of data interpretation tool is shown in the illustration?
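The same questions can be answered programmatically once the data points behind the line graph are in hand. A sketch with hypothetical month-wise sales (the exact figures in the illustration are not reproduced here):

```python
# Hypothetical month-wise sales behind a line graph like the one above.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
coffee = [9000, 8000, 6000, 4000, 3000, 2000]
ice_cream = [2000, 3000, 5000, 7000, 9000, 10000]

best_coffee_month = months[coffee.index(max(coffee))]
best_ice_cream_month = months[ice_cream.index(max(ice_cream))]
# Months where the coffee line lies above the ice-cream line.
coffee_leads = [m for m, c, i in zip(months, coffee, ice_cream) if c > i]
print(best_coffee_month, best_ice_cream_month, coffee_leads)
```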

Data Literacy 81
Remember
The best presenters cleverly choose and combine these formats. Use text to explain complex points, tables
for accurately measured data points and graphical charts to show trends. It is also helpful to choose your
interpretation based on the audience, the nature of your data and the message you want to convey. By
understanding the strengths and weaknesses of each format, you can transform your data interpretation
into a compelling story that resonates with your audience.

Importance of Data Interpretation


The importance of data interpretation lies in its ability to transform raw data into knowledge,
which can help in deciding what further actions to take, along with information-based
decision-making, problem-solving and innovation.
1. Communication and Storytelling: Raw data can be quite overwhelming. Data
interpretation helps to transform complex findings into clear and easy-to-understand
stories for the audience.
2. Informed Decision-Making: By interpreting data, you can uncover patterns, trends,
and relationships that would otherwise be invisible. Better data leads to better
information and, in turn, we can make better decisions based on its interpretation.
For example, consider an ice-cream retailer who has observed a dip in sales. Data
interpretation can identify if it is due to a seasonal trend, pricing issue or market
competition. This helps in making judicious decisions about how to set pricing, make
packaging attractive or even work on new flavours.
3. Reduced Costs: Data interpretation also makes it easier to reduce the cost of bad
decisions. For example, the ice-cream retailer would reduce the loss incurred by not
stocking the flavours which are not in demand.
4. Identification of Needs: Data interpretation helps us in identifying the needs or gaps
which are required to be addressed to improve our system processes. For example, the
analysis of data by the ice-cream retailer indicates that there is 30% wastage while ice
cream is being stored for distribution. While interpreting this data, it is found that there
is need for a more spacious refrigerator which will help in reducing wastage.

PROJECT: INTERACTIVE DATA DASHBOARD & PRESENTATION
Data literacy enables us to understand how to communicate information and how to
understand data presented as information. Data dashboards and presentations are tools for
conveying information. The basic difference between these tools depends on whether we want
the information to be available in real time and take immediate action or whether we just
want the information to be presented for decision-making later.



Data Dashboard
A data dashboard is a real-time, dynamic information centre which provides continuous monitoring and reporting of important data points. It is ideal for situations where you need to stay updated on performance or trends.
For example, a video game dashboard may include the following information in real time:
• Achievements: Monitor progress of in-game achievements and badges.
• Friends: View online friends and recent interactions.
• Vital Statistics: Life availability, health packs and other important resources that you
have collected or are spending.
• Avatar Customization Items: View and manage your collected clothing, accessories and
animation.
• Game Passes and Items: See the game passes and items that you have purchased within
specific games.
It is important to note that most of the above-mentioned information will lose its relevance if
presented later.
Benefits: Dashboards provide real-time insights, promote data-driven decision-making and
help to maintain easy information-sharing within teams.

Data Presentation
Data presentation is a customized data story which communicates a specific message
and persuades or informs an audience. It is most suitable for one-time presentations on a
particular topic.
For example, a presentation showing data on the progress made by India in the last 10 years, along with the projected growth, may be called a data presentation.
A research presentation that shares findings from a study and includes charts, graphs and
data tables with detailed explanations is also an example of data presentation.
Benefits: Data presentation is useful for clearly communicating complex information
by engaging the audience with a narrative. It helps to promote a deeper understanding of
the topic.

Tools for Data Dashboards and Data Presentations


Commonly, data visualization software tools like Tableau, Power BI and Google Data Studio
enable users to create interactive and dynamic data visualizations which may be useful for
Data Dashboards. On the other hand, presentation software like PowerPoint, Keynote and
Google Slides are useful for the creation of slides incorporating data stories and narratives
with visual elements. In the next section, we shall explore a very versatile tool called Tableau
for creating, sharing and exploring interactive data visualizations.

Data Visualization using Tableau
Tableau is a powerful data visualization and business intelligence (BI) platform that helps
people see and understand data more effectively. It allows users to connect to various data
sources, explore and analyze information, and create interactive visualizations like charts,
graphs and dashboards.
The following are the two versions of Tableau:
Tableau Desktop (Paid): The full-fledged version, Tableau Desktop, offers a wide range
of functionalities. It allows you to perform complex data analysis and create sophisticated
visualizations. Students and teachers can request access to Tableau Desktop free of cost using
their school-issued email IDs.
Tableau Public (Free): Tableau Public is a free version with a limited feature set. You can
connect to common data sources like spreadsheets and cloud platforms, create basic to
moderate visualizations and publish them on the Tableau Public platform. We shall work with
Tableau Public to get familiar with Data Visualization.
Follow the step-by-step tutorial for learning the basics of Tableau:
Step 1: To begin, visit https://public.tableau.com/


Step 2: Create your account and complete the registration details.



Step 3: Once logged in, you will come across the visualization of the day. Explore some
visualizations for inspiration.


Step 4: Select dataset—You may upload your own data or use the sample data available
on Tableau for experimenting. Click on the Learn tab, which contains simple
self-learning How-To Videos and a Sample Data tab. Download EU Superstore Sales
dataset by clicking on Dataset (xls) link.


Step 5: Once the dataset is available, click Create in the navigation bar to access free data
visualization tools of Tableau Public. Choose Web Authoring which will enable you
to create visualizations—viz in short—directly in the web browser. Web authoring
makes it possible to create a viz without installing any software.


Step 6: On clicking Create, Tableau seeks the dataset that you intend to visualize. Click on
Upload from computer to select the downloaded dataset.


Step 7: Drag the Orders table to the canvas.


Step 8: Below the canvas is a data grid. Click Update Now in the data grid to view the first 100
rows of the dataset.




Step 9: Tableau automatically detects and assigns the data type for fields in the table. You may
check the data type associated with each column as illustrated below. Interestingly,
Tableau also associates the data types pertaining to locations with their geographic
role automatically. Check out the Orders City column illustrated below.


Step 10: Drag the Returns table to the right of the Orders table on the canvas. It opens a
relationship edit window beneath the canvas. Relationships define how the tables
relate to each other. In this case, Orders and Returns are both identified by the
common field Order ID.

Step 11: Rename the Sheet as my_first_tableau and click Publish to publish the viz with
your chosen name.


Step 12: Click Sheet 1 on the bottom-left corner of the screen. This changes our interface
to show us the extracted data fields.


Step 13: Choose Your Data Fields—When Tableau connects to this dataset, it assigns the
fields to either Dimensions or Measures. The qualitative fields that describe
the categories of data are in the top part of the pane under Dimensions. The
quantitative fields that measure the categories of data are in the bottom part of the
pane under Measures.
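The split Tableau performs can be imitated in plain Python: text-valued fields behave like dimensions (categories), number-valued fields like measures. A sketch using a few hypothetical Superstore-style fields:

```python
# Hypothetical sample row from an orders table (field name -> value).
sample_row = {
    "Order ID": "ES-2024-1001",   # text   -> dimension
    "State": "Berlin",            # text   -> dimension
    "Category": "Furniture",      # text   -> dimension
    "Sales": 261.96,              # number -> measure
    "Quantity": 3,                # number -> measure
}

dimensions = [name for name, value in sample_row.items() if isinstance(value, str)]
measures = [name for name, value in sample_row.items() if isinstance(value, (int, float))]
print(dimensions, measures)
```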
Step 14: Drag out a quantitative field or measure to find out ‘how many’. We will use Sales.
Notice the field displayed in the Columns shelf in green in the given screenshot.
Tableau creates a long bar and an axis showing a range of values.




Step 15: We can now add qualitative fields or dimensions to better understand our data. If we add the dimension State, we can see a bar graph taking shape. The single bar breaks into multiple bars, one for each state. Dimension fields are displayed in blue when brought onto the sheet.
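What Tableau does in Steps 14 and 15 amounts to a group-and-sum: total Sales for each State. A plain-Python sketch with hypothetical order records:

```python
# Hypothetical order records: (state, sales amount).
orders = [("Berlin", 200.0), ("Bavaria", 150.0), ("Berlin", 50.0), ("Hesse", 300.0)]

# Group the records by state and add up the sales in each group.
sales_by_state = {}
for state, amount in orders:
    sales_by_state[state] = sales_by_state.get(state, 0) + amount

# Sorting the bars (as in the next step) is just ordering states by total sales.
ranked = sorted(sales_by_state.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)
```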


Step 16: Organize the data—use the Sort button in the Menu bar to sort your data in the
ascending or descending order.



Congratulations, you have created your first Tableau Visualization!


Step 17: Let us draw an alternative visualization using 'Show Me'. On the top right corner of the Tableau canvas, click on the 'Show Me' button to see alternative visualization options available for the data in use. It opens a list of applicable visualizations and also tells what other requirements, in terms of dimensions or measures, are needed if we select a specific visualization.



Let us choose the 4th option on the list to create a Map Visualization showing sales in
each state with dot plots. Try to hover the mouse over any dot to see how Tableau charts
interact with the user.

Experience AI
This is just the beginning—you can take your learning to the next level by checking out
the YouTube video from Tableau. To do so, scan the given QR code or open the link
https://youtu.be/iT1iHLGawIM in your web browser.

Data literacy is essential. It will be very helpful for everyone to learn these skills for the
modern world, where data-driven decision-making is increasingly important. With a good
conceptual grasp of data literacy, you will be able to interpret and analyze data effectively,
leading to more informed and accurate decisions.

One must also learn to differentiate between reliable and unreliable data sources, which is
vital in an era of information overload and misinformation. Proficiency in tools like Tableau
for creating interactive and visually appealing data presentations shall be helpful in building
a great professional profile in any career choice.

FUN TIME: Visit https://www.datawrapper.de/ and explore visualizations. To learn how to use Datawrapper to create visualizations, scan the given QR code or open the link https://academy.datawrapper.de/article/245-how-to-create-your-first-datawrapper-chart in your web browser.

Memory Bytes
Data Literacy: The ability to read, understand, interpret and communicate with data.
Data vs Information: Data is raw facts while information is processed data.
DIKW Model: Data, Information, Knowledge and Wisdom hierarchy explains the value of data processing.
Data in Computing: Refers to data in the context of computer systems and digital formats.
Importance of Data Literacy: Helps in making informed decisions, understanding data presentations and
avoiding misinformation.
Types of Data: Quantitative (numerical) and Qualitative (descriptive)
Quantitative Data: Data that can be measured and counted, e.g., height, weight, age, etc.
Qualitative Data: Data that describes qualities or characteristics, e.g., color, name, type, etc.
Discrete Data: Quantitative data that can be counted and has distinct values like the number of students.
Continuous Data: Quantitative data that can take any value within a range such as the temperature.
Categorical Variables: Variables that represent categories and are often used for ‘Yes/No’ questions.
Data Collection and Acquisition: Methods of gathering data from various sources.
Data Cleaning and Preprocessing: Preparing raw data for analysis by removing errors and inconsistencies.
Data Analysis and Interpretation: Using statistical methods and tools to derive insights from data.
Data Visualization: Presenting data visually using charts, graphs and infographics to communicate findings.
Critical Thinking: Evaluating data sources and the validity of data to make informed decisions.
Ethical Use of Data: Considering privacy, security, confidentiality and bias when working with data.
Data Mindset: Believing that data is abundant and can solve problems when properly utilized.
Data-enabled Questioning: Learning to ask questions that can be answered using data.
Describing Data: Understanding how data is categorized, stored and structured.
Graph Types: Different graphs for data visualization include bar charts, pie charts and line graphs.
Impact of Data Literacy: Helps in various fields like healthcare, education, business and everyday
decision-making.
Steps to Achieve Data Literacy: Understanding the basics, learning analysis techniques, gaining hands-on
experience, developing critical thinking and ethical data use.

Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. What is data literacy?
(a) The ability to code
(b) The ability to read, understand, interpret and communicate with data
(c) The ability to create databases
(d) The ability to design websites



2. Which of the following is NOT a step to achieve data literacy skills?
(a) Achieving a Data Mindset
(b) Data Collection and Acquisition
(c) Building Physical Databases
(d) Data Cleaning and Preprocessing
3. What does the term ‘data’ refer to?
(a) Processed and organized information
(b) Raw facts, observations, measurements or records
(c) Stories and narratives
(d) Conclusions and decisions
4. How is information derived from data?
(a) By adding context and processing the data
(b) By collecting more data
(c) By discarding irrelevant data
(d) By summarizing raw data
5. In the DIKW model, what does ‘W’ stand for?
(a) Work (b) Wisdom
(c) Width (d) Weight
6. What is a key component of data security?
(a) Sharing data widely
(b) Using strong passwords and encrypting data
(c) Allowing open access to data
(d) Publishing personal information online
7. Why is data privacy important?
(a) To increase data sharing
(b) To protect personal information of individuals
(c) To make data accessible to everyone
(d) To reduce data storage costs
8. What does ‘data-enabled questioning’ involve?
(a) Ignoring data during decision-making
(b) Learning to frame questions that can be answered using data
(c) Collecting data without questions
(d) Analyzing data without any context
9. Which term describes the visual representation of data?
(a) Data Cleaning
(b) Data Acquisition
(c) Graphs and Charts
(d) Algorithms
10. What is the primary goal of data cleaning?
(a) To remove all data
(b) To ensure data is accurate, consistent and complete
(c) To create new data sets
(d) To make data more complex

11. How is knowledge different from information in the DIKW model?
(a) Knowledge is unprocessed data while information is processed data.
(b) Knowledge adds context and meaning to information.
(c) Information is more useful than knowledge.
(d) Knowledge is more irrelevant to decision-making than information.
12. What role does critical thinking play in data literacy?
(a) It helps in creating new data.
(b) It involves evaluating data and making evidence-based decisions.
(c) It reduces the amount of data needed.
(d) It eliminates the need for data analysis.
13. What is the function of an algorithm in data analysis?
(a) To create visual representations
(b) To process data step by step
(c) To ignore data patterns
(d) To delete data
14. Why is it important to describe data accurately?
(a) To reduce the amount of data collected
(b) To ensure proper understanding and categorization
(c) To increase data complexity
(d) To avoid data visualization
15. What is a common method to secure data online?
(a) Using weak passwords (b) Encrypting data
(c) Sharing data publicly (d) Avoiding data backups
16. In data literacy, what does ‘data visualization and communication’ entail?
(a) Storing data in physical formats
(b) Presenting data using charts, graphs and infographics
(c) Deleting unnecessary data
(d) Ignoring data analysis
17. What is meant by ‘data preprocessing’?
(a) Discarding irrelevant data
(b) Transforming raw data before analysis
(c) Collecting new data
(d) Ignoring data patterns
18. How can data literacy help in everyday life?
(a) By increasing the amount of data one collects
(b) By understanding and making sense of data in various contexts
(c) By reducing the use of data
(d) By ignoring data in decision-making
19. What is the significance of context in data interpretation?
(a) Makes data meaningless
(b) Helps transform data into meaningful information
(c) Complicates data analysis
(d) Reduces the accuracy of data



20. Why is ethical and responsible data use important?
(a) For sharing data freely
(b) For ensuring data privacy and security
(c) For making data collection easier
(d) For increasing data complexity
21. What does data analysis and interpretation involve?
(a) Collecting raw data
(b) Analyzing data using statistical methods and tools
(c) Storing data securely
(d) Deleting unnecessary data
22. How does data literacy help in avoiding misinformation?
(a) By collecting more data
(b) By understanding and verifying the source and context of information
(c) By sharing information widely
(d) By reducing the amount of data analyzed
23. What is the difference between data and information?
(a) Data is processed and meaningful while information is raw and unprocessed.
(b) Data is raw and unprocessed while information is processed and meaningful.
(c) Data is more useful than information.
(d) Information is always numerical while data is not.
24. What is an example of personal data?
(a) General market trends
(b) An individual’s phone number
(c) Aggregate sales data
(d) Weather reports
25. Why is it important to have a ‘data mindset’?
(a) To avoid using data in business
(b) To recognize the abundance of data and its potential to solve problems
(c) To increase data storage costs
(d) To ignore data patterns
26. What is Tableau primarily used for?
(a) Word Processing (b) Data Visualization
(c) Web Development (d) Graphic Design
27. Which version of Tableau is free to use?
(a) Tableau Desktop (b) Tableau Public
(c) Tableau Server (d) Tableau Online
28. Which of the following is NOT a feature of Tableau Public?
(a) Connecting to spreadsheets
(b) Creating interactive visualizations
(c) Complex data analysis
(d) Publishing visualizations online
29. In Tableau, what does the ‘Orders’ table represent in the context of the tutorial?
(a) List of customers (b) Sales data
(c) Product inventory (d) Employee details

30. What is the first step to start using Tableau Public?
(a) Downloading the software
(b) Purchasing a licence
(c) Creating an account on the Tableau Public website
(d) Installing a plug-in

Subjective Type Questions


Unsolved Questions:
1. Define data literacy and explain its importance in the modern world.
2. Differentiate between data and information with suitable examples.
3. Describe the DIKW model and explain the significance of each component.
4. Explain the difference between quantitative and qualitative data. Provide examples of each type.
5. Distinguish between discrete and continuous data with examples.
6. What are categorical variables? Provide an example and explain how they are used.
7. Discuss the various methods of data collection and acquisition. Why is proper data collection important?
8. Explain the process of data cleaning and preprocessing. Why is this step crucial before data analysis?
9. Describe the different statistical methods used in data analysis and their purposes.
10. What is data visualization and why is it important? List and describe the three types of graphs used in
data visualization.
11. How does critical thinking contribute to effective data literacy?
12. Discuss the ethical considerations when using and handling data. Why are they important?
13. What is data mindset and how does it influence one’s approach to problem-solving?
14. Explain the concept of describing data. How does it help in understanding the data better?
15. What are the key differences between primary and secondary data sources? Provide examples.
16. Describe the role of algorithms in data processing and analysis. Provide an example of a simple algorithm
used in data analysis.
17. Explain the significance of data accuracy, consistency and completeness in data analysis.
18. Discuss the impact of data literacy on decision-making in various fields such as healthcare, education
and business.
19. What are the common data security measures that can be taken to protect sensitive information?
20. How can data literacy help individuals and governments respond to crises such as the COVID-19 pandemic?
21. Explain the concept of real-time data monitoring and its benefits. Provide an example of its application.
22. Discuss the role of government initiatives in enhancing data privacy and security. Provide an example of
such an initiative.
23. What are the common challenges faced in data collection and how can they be overcome?
24. Describe the process of data interpretation and its importance in making data-driven decisions.
25. How can one develop data literacy skills? List the steps and describe their significance.
26. Explain the difference between Tableau Desktop and Tableau Public.
27. What are the key features that make Tableau a powerful tool for data visualization?
28. What are dimensions and measures in Tableau? Provide examples of each.
29. Discuss the significance of interactive visualizations and dashboards in business intelligence (BI).
Provide examples of how they can be used in real-world scenarios.



3
Mathematics for AI (Statistics & Probability)

Learning Objectives
Understanding the importance of Mathematics as a foundation of Artificial Intelligence
Understanding Statistics as a means to collect, organize and analyze data using various
statistical tools such as dot plots, tally charts and more
Defining probability, calculating the likelihood of events and interpreting sample spaces in
various probability scenarios
Explaining measures of central tendency (mean, median, mode) and graphical representations
to analyze and communicate data effectively
Demonstrating how mathematical concepts, especially Statistics and Probability, are essential
for developing and enhancing AI applications

IMPORTANCE OF MATHS IN AI
Let us play a game before we begin this chapter. Guess the next number in the sequence for
the illustrations below:

I. 1, 3, 5, 7, ?        II. 4, 8, 12, 16, ? (numbers shown in shapes of increasing size)
Easy guesses? In the first case, the pattern is simply an arithmetic series while in the second
case, there is an arithmetic series pattern along with a geometrical pattern, both changing
together. The human mind is a powerful creation. We have a natural gift for recognizing and
understanding patterns when we see them, and we can make inferences about what happens
next in the pattern.
Mathematical patterns are also distributed throughout the natural world—from the intricate
spirals of seashells to the branching patterns of trees.
Consider some examples:
1. Fern: The tiny leaflets echo the shape of the entire fern leaf structure,
creating a beautiful display of self-similarity.
Did you notice a pattern here—the increasing/decreasing size of structure
and the number of leaflets as we go into more detail?
2. Symmetry: Symmetry is the balanced and proportional arrangement
of parts. It is visible in snowflakes, butterfly wings and the intricate
patterns of flowers.
3. Fibonacci Sequence: This sequence, where each number is the sum of the two preceding
ones (0, 1, 1, 2, 3, 5, etc.), appears in the arrangement of seeds in a sunflower head as
illustrated below:

0 + 1 = 1
1 + 1 = 2
1 + 2 = 3
2 + 3 = 5
3 + 5 = 8
5 + 8 = 13
..............
[Figure: A sunflower head whose seed spirals are counted in Fibonacci numbers: 1, 2, 3, 5, 8, 13, 21, 34, 55.]
These are just a few examples of the many mathematical patterns that grace the natural
world. Mathematics can be used to explore and explain a lot of patterns that occur not only in
our world but also in data.

Some common mathematical series are presented below:


1. Arithmetic Sequence: Pattern: 2, 4, 6, 8, 10...
• Rule: Add 2 to the previous number. Next Number: 12
2. Geometric Sequence: Pattern: 3, 9, 27, 81...
• Rule: Multiply the previous number by 3. Next Number: 243
3. Fibonacci Sequence: Pattern: 0, 1, 1, 2, 3, 5, 8...
• Rule: Add the two immediately preceding numbers. Next Number: 13
4. Square Numbers: Pattern: 1, 4, 9, 16, 25...
• Rule: Each number is the square of its position in the sequence (1^2, 2^2, 3^2, etc.)
Next Number: 36
5. Even Numbers: Pattern: 2, 4, 6, 8...
• Rule: Each number is 2 times its position in the sequence. Next Number: 10
6. Powers of 2: Pattern: 1, 2, 4, 8, 16...
• Rule: Each number is 2^n, where 'n' starts at 0. Next Number: 32
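Each rule above can be written as a short Python function that predicts the next term; a minimal sketch for three of the sequences:

```python
def next_arithmetic(seq):
    """Next term when consecutive terms differ by a constant."""
    return seq[-1] + (seq[1] - seq[0])

def next_geometric(seq):
    """Next term when each term is the previous one times a constant (integer ratio assumed)."""
    return seq[-1] * (seq[1] // seq[0])

def next_fibonacci(seq):
    """Next term is the sum of the two immediately preceding terms."""
    return seq[-1] + seq[-2]

print(next_arithmetic([2, 4, 6, 8, 10]))      # next term after 10
print(next_geometric([3, 9, 27, 81]))         # next term after 81
print(next_fibonacci([0, 1, 1, 2, 3, 5, 8]))  # next term after 8
print(2 ** 5)                                 # next power of 2 after 16 (= 2^4)
```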

These patterns follow specific rules. By understanding these rules, we can accurately predict
what comes next in the pattern. This skill in recognizing patterns is valuable in two key areas:
Mathematics and Artificial Intelligence (AI).



Mathematics and Patterns
In Mathematics, patterns help us see connections and structures within numbers or objects.
This lets us make predictions, solve problems and even build new mathematical theories.

AI Learns from Patterns


Just like humans, AI systems use patterns to make sense of information. By recognizing
patterns in data, AI can predict future events, categorize information and even make
decisions. For instance, AI can analyze weather patterns to predict storms, identify objects in
photographs or recommend products you may like based on your purchase history.

Do it Yourself
Observe the illustrations and answer the given questions.
1. The chart shows pollutants present in the Delhi atmosphere a year before and a year after the COVID-19 lockdown.
   [Figure: Line chart titled 'Pollutants in Delhi Air', with points labelled A to G marking peaks and dips in the curve.]
   (a) Identify which part of the graph shows the COVID-19 lockdown period.
   (b) What change in the pattern helped you in identifying the COVID-19 lockdown period?
   (c) What do you think is the reason for the change in the pattern?
   (d) Do you think that the pollutants in Delhi air follow a seasonal trend? Which portions of the graph show repeated patterns?
2. Look at the number pyramid carefully. Do you observe a pattern?
   (a) Write your observations about the pattern.
   (b) What should be the next line in the number pyramid?

                1
               1 1
              1 2 1
             1 3 3 1
            1 4 6 4 1
          1 5 10 10 5 1
        1 6 15 20 15 6 1
      1 7 21 35 35 21 7 1
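To check your answer to question 2: each inner number of the pyramid (known as Pascal's triangle) is the sum of the two numbers just above it. A minimal Python sketch:

```python
def next_pascal_row(row):
    """Given one row of Pascal's triangle, build the next row.

    Each inner entry is the sum of the two neighbouring entries above it,
    and every row starts and ends with 1.
    """
    return [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

print(next_pascal_row([1, 7, 21, 35, 35, 21, 7, 1]))
```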

3. Which of the following shapes should come in the blank space? Tick the correct one.

  

‘The key to artificial intelligence has always been the representation.’

 —Jeff Hawkins

Mathematics for AI (Statistics & Probability) 99


Applications of Mathematics in AI
The evolution of Artificial Intelligence has always been associated with various fields
of Mathematics. Mathematics provides the foundation for creating, understanding and
improving AI systems.
The early rule-based systems of Artificial Intelligence to the modern learning-based
approaches—all are developed using Mathematics in various tasks, as listed below:
Understanding Data: Mathematics helps in understanding and manipulating data, which is
essential for AI. Linear algebra, for example, deals with matrices that are used to represent
data in AI algorithms.
Probability and Statistics: Many AI techniques, especially in Machine Learning, depend on
probability and statistics to make predictions and decisions based on data. These fields help in
handling uncertainty and making informed guesses.
Pattern Recognition: Identifying patterns is a core function of AI and Mathematics provides
the tools to describe and analyze these patterns. This includes recognizing trends, making
predictions and understanding relationships within data.
Optimization: Many AI problems involve finding the best solution from many possible options.
Techniques from mathematical optimization help in solving these problems efficiently.

Important Mathematical Concepts for Understanding AI


The ability to make predictions based on patterns in data is a powerful tool that helps us solve
problems and make informed decisions. Artificial Intelligence uses Mathematics for:
• Exploring Data (using Statistics): Analyzing and summarizing data to understand its
characteristics. For example:
◼ Mean (Average): What is the average score of students in a class?
◼ Median (Middle Value): What is the middle value of house prices in a neighbourhood?
◼ Mode (Most Common Value): Which product is most frequently purchased in a store?

• Finding Out Unknown or Missing Values (using Linear Algebra): Representing and
manipulating data using matrices to solve equations and model data. For example:
◼ Solving Systems of Equations: How many apples and oranges were sold if the total
number of fruits sold and their combined price are known?
◼ Matrix Operations: How to transform an image using rotation and scaling in
computer graphics?

• Calculus (Training and Improving AI Models): Adjusting and fine-tuning AI algorithms


to make them work better. For example:
◼ Derivatives: Help in understanding how data items change with respect to each other. For example, consider mixing colours to get the perfect shade of green. If you add more blue, how does this affect the green shade? Derivatives help figure out how small changes (like adding a drop of blue) affect the overall outcome (the final colour).
◼ Integrals: Consider integrals as finding the total amount. For example, if you know
how fast a car is going at different intervals, an integral helps you figure out the total
distance the car travels over a period of time.
• Probability (Predicting Different Events): Assessing the likelihood of different
outcomes and making predictions based on data. For example:
◼ Coin Toss: What is the probability of getting heads in a coin toss?
◼ Weather Forecasting: What is the likelihood of rain based on current weather patterns?
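The statistical measures and the coin-toss probability listed above can be computed directly with Python's standard statistics module; a short sketch with made-up scores:

```python
import statistics

scores = [72, 85, 85, 60, 98]  # hypothetical student scores

print(statistics.mean(scores))    # average of the scores
print(statistics.median(scores))  # middle value when scores are sorted
print(statistics.mode(scores))    # most frequently occurring score

# Probability of heads in a fair coin toss: favourable outcomes / total outcomes.
p_heads = 1 / 2
print(p_heads)
```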
Let us explore more about Statistics and Probability in the sections ahead.

STATISTICS
What is Statistics
Statistics is a special branch of Mathematics that is associated with the collection and
organization of facts and figures. Large volumes of unorganized data do not make sense
on their own but when statistical tools are applied to them, we may obtain a wealth of
information. Statistics gives meaning to raw data, allowing us to draw conclusions. Defined
formally, Statistics is the science of collecting, describing, organizing and analyzing data in
order to derive meaningful insights and inferences from it.

The term ‘Statistics’ is derived from the Latin ‘Statisticum Collegium’ which means ‘Council of
the State’ and was used in reference to the council which conveyed to kings the summarized
information about population, military, land, agriculture and the like. The word gave rise to the
Italian ‘Statista’, and later to the German ‘Statistik’ which signified ‘Science of the State’.

Where It All Began


The human fascination with numbers goes back over 35,000 years. In recorded human
history, the Lebombo bone and the Ishango bone are the oldest known mathematical artefacts
and both have been used to document some form of count with the help of engravings on
bones. Right through human evolution, counting has been in use—from solitary cavemen counting their kills to small clans counting family members or cattle.
Interestingly, even the earliest form of trading, called the Barter System, was based on the
idea of exchanging a certain number or quantity of one item for a certain number or quantity
of another item, which required the traders to understand the concept of ‘more or less’ and
some ‘numeric value’ associated with the items, based on how abundant or scarce the traded
item was.

The oldest mathematical artefacts


Lebombo Bone (35000 BC) Ishango Bone (20000 BC)

Mathematics for AI (Statistics & Probability) 101


From the humble beginnings of simple counting, humans learnt the art of describing counts
by numbers and began to understand the natural events based on count—the distance
measured in counts of feet, seasonal variations from the length of days, weather forecast
from the count of elapsed sunsets and so on. With increasing intelligence, humans could make
more sense of the ‘data’ which the counts provided and developed the science of data, which
we now know as modern-day Statistics.
The most common application of Statistics in its early years was in population counting,
commonly known as the Census. The process of census allowed governments to frame their
overall policies for the benefit of maximum citizens.

UNDERSTANDING STATISTICS: THE CRICKET WAY


India has a great fascination for cricket, and cricket fans collect and trade trump cards of their
favourite players. These cards come in packets and you never know which player you will
come across in the pack.
Suppose you are a collector of cricket cards. Given below is your complete collection:

Which card occurs most often? Which is the rarest card in your collection?
How many ‘Gold’ cards do you have?
If you have an organized collection and know which cards are the most common in your
collection (like finding lots of Bumrah cards) or the rarest (maybe a Sachin Tendulkar
Gold Trump card), it would be easier for you to choose which cards to trade with your friends.
Statistics helps you organize your information and find answers to questions based on
data.



With the knowledge of Statistics, you can do the following:
1. Collect Data: Open your card packets and carefully sort the cards by players.
2. Count: How many cards do you have of each player? Create a tally list. Which card has
the maximum frequency?

Player Name Tally Count


Jadeja \\ 2
Bumrah \\\\ * 5 (Most)
Kohli \\ 2
Tendulkar \ 1 (Rare)

3. Analyze: Now you can see the pattern. Which player has the most and the least number
of cards? Are there any players you haven’t found yet?
By collecting data (cards), counting and analyzing information, you are using statistics to
better understand your card collection.

Frequency
Frequency represents the number of times an observation occurs while recording data. Let us
do a simple experiment to understand the concept of ‘frequency’. We shall cast a die 20 times
and record the observations as illustrated below. Recall from the terminology section that our
data may be qualitative or quantitative in nature—frequency is always quantitative in nature.

Observations

Value Obtained on Twenty Throws of Die

Now, count the number of times each value appears on the die and record the results in a table
as shown below:

Value Tally Frequency


1 \\\ 3
2 \\\ 3
3 \\\ 3
4 \\\\ * 5
5 \\\\ 4
6 \\ 2
Total 20
Count of Observations

*Once the tally reaches four strokes (\\\\), the fifth occurrence is drawn as a strikethrough line across the group, so the bundle of five is read at a glance.
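The frequency table above can be reproduced with a short Python sketch. The list of throws below is reconstructed to match the frequency table, since the original order of the 20 throws is not shown:

```python
from collections import Counter

# The 20 die throws, reconstructed to match the frequency table above:
# 1, 2 and 3 appear three times each, 4 five times, 5 four times, 6 twice.
throws = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6]

frequency = Counter(throws)
for value in sorted(frequency):
    print(f"Value {value}: frequency {frequency[value]}")

# The individual frequencies always add up to the number of observations.
print("Total observations:", sum(frequency.values()))
```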



Tally
As you may note from the table, tally represents the number of times a value appears. Tally
is a visual tool used in counting since Prehistoric times, from around 35000 BC, with lines
etched on bones, rocks or trees. One line represented one item. It was eventually replaced with
numbers but is still a great tool for counting observations.
The numeric representation for the number of occurrences of a die value is its frequency.
We may observe from the previous table that the frequency of the die value 4 stands at 5.
We may also observe that the sum of all individual frequencies is equal to the total number
of items.

Dot Plots
Another representation of the numerical count of observations can be made visually and it
is called a dot plot. This representation shows all possible values on a number line and each
observation, corresponding to a value, is placed as a dot over it as shown in the illustration.
Please note that the dots are equal to the number of times an observation occurs (frequency).
The dot plot representation for the 20 throws of die is shown below. It gives an instant idea
about which is the most frequently occurring number on the die.

Dot Plot Representation
[Dot plot of the 20 die throws: the values 1 to 6 are marked on a number line labelled 'Number on Die Face', and the dots stacked above each value show its frequency.]
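A simple text-based dot plot can be sketched in Python. The throws are again assumed to match the frequency table; a plotting library could draw a true vertical dot plot, but even printed sideways the most frequent value stands out instantly:

```python
from collections import Counter

# Throws reconstructed from the frequency table of the die experiment.
throws = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6]
frequency = Counter(throws)

# One line per die value, one dot per occurrence: a sideways dot plot.
for value in range(1, 7):
    print(f"{value} | {'•' * frequency[value]}")
```

The longest row of dots (value 4, with five dots) is immediately visible, which is exactly what a dot plot is for.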

FUN TIME
1. Count the frequency of each animal in the illustration below and create a table showing tally and frequency. Create a dot plot to visualize the same.
2. Calculate the frequency of each vowel in the following text and make a table showing tally and frequency. Create a dot plot to visualize the same.



‘I have studied many languages—French, Spanish and a little Italian, but
no one told me that Statistics was a foreign language.’

 —Charmaine J. Forde

Statistics: What does it do?


Statistics helps you observe patterns (e.g., the number of cards you have of each player) and
summarize your collected data in the form of information you can use. Statistics helps with
the following:
• Looks at the ‘what’: Statistics examines the patterns and trends in the data you have.
It does not tell you why things happen but shows you what is actually happening in your
collection. For example, it may inform you that you have more Jasprit Bumrah cards in
your collection as compared to Virat Kohli cards.
• Describes and summarizes data: Statistics helps you clearly understand your data.
Instead of just looking at a pile of cards, you can see a neat list or chart of how many
cards of each player you have.
• Helps you understand what is happening in your data: By organizing and describing
your data, statistics helps you make sense of it. You can see if there are any surprising
patterns or if your collection reflects what you expect (maybe more cards of the current
team players).

Applications of Statistics
Climate Action
Statistics play a crucial role in environmental monitoring by
helping us understand and track changes in environmental
factors over time. For example, the image shows the
per-capita CO2 emissions of various countries, indicating
which nations have the highest emissions. By collecting
and analyzing this data, we can identify trends, compare
emissions between countries and check the effectiveness of
policies aimed at reducing carbon footprints.
This type of statistical analysis is directly related to
Sustainable Development Goal (SDG) 13: Climate Action.
SDG 13 aims to combat climate change and its impact by
taking urgent action to reduce greenhouse gas emissions
and strengthen resilience to climate-related hazards.
Monitoring emissions through statistics helps track
progress towards these goals and implement policy
decisions that aid in mitigating climate change.
Source: World Bank



Weather Forecasting
Statistics is used to help weather forecasters predict the weather. They use data from the past,
analyze trends and create models. These models help them make informed guesses about
what the weather will be like.

Eradication of Poverty
Statistics are essential in tracking and understanding the changes in poverty levels as shown
in the given image. Statistics help governments in tracking the ratio of people living in
poverty. Consider the following Niti Aayog India illustration:

[Niti Aayog chart: Steep decline in Poverty Headcount Ratio during the last 9 years—from 29.17% (2013-14, projected) to 11.28% (2022-23, projected); 24.82 crore individuals estimated to have escaped multidimensional poverty during the last 9 years.]

The above image shows a significant decline in the Poverty Headcount Ratio—from 29.17% in 2013-14 to 11.28% in 2022-23. This statistical data helps measure the percentage of the population living below the poverty line over time. It also highlights the scale of poverty reduction efforts and their impact on people’s lives.
The image also indicates that India has likely achieved SDG 1 which aims to reduce
multidimensional poverty by at least half by 2030. This is a significant milestone, showing
that India is ahead of schedule in its efforts to alleviate poverty.

Disaster Management
Statistics are vital in disaster management for understanding and preparing for natural
disasters. Authorities use statistics in disaster management to:
1. Alert Citizens: Predict and warn people in areas likely to be affected by natural
disasters.
2. Understand Impact: Know the number of people, services and buildings in the affected
areas.
3. Allocate Resources: Efficiently plan and provide necessary resources like food, water
and medical aid.
4. Improve Response: Analyze past disasters to improve future response and recovery
efforts.
5. Plan Evacuations: Design safe and effective evacuation routes based on population
data.



Statistics help make informed decisions, ensuring better preparedness and response during
disasters.

[Disaster management cycle: Before—Risk Assessment, Prevention & Mitigation and Preparedness; During (Emergency)—Disaster Impact and Response; After—Recovery.]

The above image shows different activities associated with a disaster mitigation and recovery
cycle. Both before and after activities, as shown in the disaster management cycle, utilize
statistics for insightful operations.
For example, the following graph from Our World in Data illustrates the number of recorded
natural disaster events over time. This data helps identify trends, predict future disasters and
effectively allocate resources for prevention.

Number of recorded natural disaster events, 1900 to 2023
[Line chart, 'All disasters', 0 to 400 events per year: the number of global reported natural disaster events in any given year. Note that this largely reflects increases in data reporting, and should not be used to assess the total number of events.]
Source: EM-DAT, CRED/UCLouvain (2024); OurWorldInData.org/natural-disasters | CC BY

It is important to note that the graph may show fewer events before 1980 due to
underreporting. Thus, careful interpretation is needed to understand the true historical
trends and to ensure accurate planning and response strategies.



The illustration for climate-related deaths per decade is shown below. Answer the following
questions based on the chart.

Source: OFDA/Cred International Disaster Database

1. What trend do you observe in the number of deaths related to climate-based disasters?
2. Why do you think this trend is happening? What factors are possibly leading to this
change?

Disease Prediction
Statisticians track diseases by analyzing data such as the number of cases, deaths and tests
conducted. This data helps create a picture of how a disease is spreading among people. By
analyzing trends and patterns, statisticians can predict how quickly a disease may spread in the
future. This allows for better preparedness and resource allocation. For example, consider the
COVID-19 outbreak timeline in India for the first wave, as shown in the following illustration.
[Timeline of India's first COVID-19 wave: 1st case on 30 Jan 2020; National Lockdown; milestones of 1,000, 10,000, 100,000 and 10,000,000 cases, 100, 1,000, 10,000 and 100,000 deaths, and 1,000,000 and 10,000,000 tests; Vaccination begins; first wave peak on 15 Feb 2021 with 10,916,589 cumulative cases (97,860 new cases) and 1,55,732 deaths.]



Statistical models were used to assess the effectiveness of interventions like lockdowns or
vaccination campaigns. By comparing data before and after interventions, statisticians could
see whether these measures were helping to slow the spread of the disease.

Sports
Statistical analysis has become significantly important in sports management with
technological advancements. Let us consider the Indian Premier League (IPL) as an
example.

• Picking the best players: Statistics like runs scored, average (AVG), strike rate (SR)
and hundreds (100) help choose the best players. Virat Kohli has the most runs, so he is
ranked high in the given example.

• Knowing a player’s skills: Statistics like average and strike rate show how good a player
is at scoring runs often (average) and how fast they score (strike rate). A high strike
rate means a player scores runs quickly while a high average means they score runs
consistently.
• Knowing how players match up: By looking at a player’s statistics against a specific
bowler, teams can guess how the player will do in that game. For example, they might
see if a player who scores runs fast has trouble against slow bowlers.

• Planning the game: Statistics help coaches make plans for the upcoming matches. By
looking at how well a team scores runs—quickly or slowly—the coach can decide the
best game strategy.

• Finding new stars: Statistics can help find talented young players who are not famous
yet. Talent Scouts may look for players with high strike rates or good averages in lower
leagues.



Activity 1
SPOT THE CAR!
Scan the given QR code or open the link given below in your web browser:
https://www.youtube.com/watch?v=4A5L3x3TVuc&ab_channel=CarvingCanyons

Data Collection
While watching the video, complete the table given below using the tally method of counting.

Car Color Number of cars spotted


  Black
  Red
  White

Data Analysis
• Which is the most frequently occurring car color in the video?
• How many cars can be seen in all?

Data Interpretation
• Can you infer the most popular color choice of the residents of this area? Elucidate.

Activity 2
SOCIAL MEDIA STATS
Question: How much time do students in your grade spend on social media each day?
• Data Collection: Conduct a survey by asking your classmates how much time they typically
spend on social media daily.
• Data Analysis: Organize the accumulated data into a frequency table showing how many
students spend a similar amount of time on social media daily. Calculate the average time spent
to see how much time a typical student in your class spends on social media every day.

Activity 3
SPORTING CHAMPIONS
Question: Who is the batting leader (with maximum runs) in your school cricket team?
• Data Collection: Find the batting records or statistics of your school’s cricket team and collect
data on the total runs scored by each player.
• Data Analysis: Rank the players based on their total runs. You can also calculate the average
runs scored per game to see who performs consistently.



Case Study
PLANNING A SCHOOL CARNIVAL FOR DATARICH HIGH SCHOOL

School Carnival

Problem Statement
You are the Chairman of the organizing committee of DataRich School Carnival, which is due
to be held in December. The carnival, a much-awaited annual event, is attended by all students
from class 6 onwards. The event is big since the school has an intake of 500 students in each
batch, amounting to 3,500 students in all!
The carnival event committee members have proposed the following activities to be held this
year:

Magic Show Merry-Go-Round Balloon Shooting Masquerade Ball Treasure Hunt


The proposal for the carnival was sent to the budget committee for their consideration and
final approval.
However, the budget committee had already finalized the budget for DJ and Dance Floor
and informed the organizing committee that only three events from the above list could be
accommodated with the remaining funds. As Chairman of the organizing committee, you are
tasked with carefully choosing the activities which shall be liked by all students.
To make a quick but wise decision, you approach the Data Science teacher to help make the
right choice to ensure that everyone is happy. The teacher advises you to use statistics for the
problem at hand.

Statistical Problem-Solving Process


The Statistical Investigative Question which can be extracted from the problem statement
is: Which activities should be chosen for the School Carnival to make the maximum students
happy?
Step 1: Identify the Source of Data
Every Statistical Investigative Question can be answered with relevant data. We need
to identify the sources from which data would be collected.
We can observe from the problem statement that all the students of DataRich High
School from Class 6 onwards form the Population. The total number of students (3,500)
can be calculated by multiplying 500 (batch size) with the number of batches, which
is 7 (classes 6 to 12).
For data collection, we can randomly select 50 students from each batch and gather
data from 50 x 7 = 350 students to keep it manageable.
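The sampling plan in Step 1 can be sketched in Python. The fixed seed and the assumption that roll numbers run from 1 to 500 in each class are illustrative choices, not part of the case study:

```python
import random

random.seed(7)  # any seed works; fixed here so the draw is repeatable

batch_size = 500        # students per class
classes = range(6, 13)  # classes 6 to 12, i.e., 7 batches
sample_per_class = 50

population = batch_size * len(classes)
print("Population:", population)  # 3500 students

# Randomly pick 50 distinct roll numbers (assumed 1..500) from each class.
sample = {cls: random.sample(range(1, batch_size + 1), sample_per_class)
          for cls in classes}

total_sampled = sum(len(rolls) for rolls in sample.values())
print("Sample size:", total_sampled)  # 50 x 7 = 350 students
```

Using random.sample ensures sampling without replacement, so no student is surveyed twice within a class.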



Step 2: Questionnaire Preparation and Administration (Collect/Consider the Data)
To collect data from the students, we need to prepare a questionnaire. This questionnaire
must be prepared keeping in mind the data we are interested in, as explained in the
problem statement. The prepared questionnaire is then shared with the students in the
sample group to give their choices. A sample questionnaire for our data of interest is
shown as an illustration below.

DataRich High School
School Carnival

My Name: ____________
I am a (Gender): Boy / Girl
I Study in Class: ____
My Roll Number is: ____

My Wish list for Carnival
(Select any Three—Write 1 for Yes, 0 for No)

Magic Show ____
Merry-Go-Round ____
Balloon Shooting ____
Masquerade Ball ____
Treasure Hunt ____

Step 3: Data Preparation


In this step, the students’ responses collected through the questionnaire are recorded
in the form of a table. Each question is converted into a column and each row records
the response received from a student.
The following table shows the response of Aman—Roll Number 20, Class 6. It indicates
that he would prefer Magic Show, Balloon Shooting and Treasure Hunt.
Name | Gender | Class | Roll Number | Magic Show | Merry-Go-Round | Balloon Shooting | Masquerade Ball | Treasure Hunt
Aman | Boy | 6 | 20 | 1 | 0 | 1 | 0 | 1



Step 4: Analyze the Data
The recorded responses are first summarized in a table, as shown below. Please note
that only a few records have been shown for the purpose of illustration. The complete
data set contains 350 rows, with each row corresponding to the preferences of one
student. The total number of responses for each preference is also shown.

S.No. | Name | Gender | Class | Roll Number | Magic Show | Merry-Go-Round | Balloon Shooting | Masquerade Ball | Treasure Hunt
1 | Aman | Boy | 6 | 20 | 1 | 0 | 1 | 0 | 1
2 | Anil | Boy | 6 | 21 | 0 | 1 | 0 | 1 | 1
... | ... | ... | ... | ... | ... | ... | ... | ... | ...
350 | Neetu | Girl | 12 | 5 | 1 | 1 | 0 | 0 | 1
Total | | | | | 205 | 180 | 253 | 104 | 308
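The column totalling in Step 4 can be sketched in Python. The three response rows below are illustrative stand-ins for the full 350-row data set, so the totals printed here will not match the 205/180/253/104/308 figures from the complete survey:

```python
# Each row: (name, gender, class, roll number, then the 1/0 choices for the
# five activities, in the same column order as the response table).
responses = [
    ("Aman", "Boy", 6, 20, 1, 0, 1, 0, 1),
    ("Anil", "Boy", 6, 21, 0, 1, 0, 1, 1),
    ("Neetu", "Girl", 12, 5, 1, 1, 0, 0, 1),
]

activities = ["Magic Show", "Merry-Go-Round", "Balloon Shooting",
              "Masquerade Ball", "Treasure Hunt"]

# Total each activity's column of 1/0 votes (choice columns start at index 4).
totals = {name: sum(row[4 + i] for row in responses)
          for i, name in enumerate(activities)}
print(totals)

# The three activities with the highest totals are chosen for the carnival.
top_three = sorted(totals, key=totals.get, reverse=True)[:3]
print(top_three)
```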
The data is visualized for presentation to the committee, allowing them to make
inferences from the analysis.

Student Preferences
[Bar chart, Number of Students choosing each activity: Magic Show 205, Merry-Go-Round 180, Balloon Shooting 253, Masquerade Ball 104, Treasure Hunt 308.]


Step 5: Interpret the Data
As Chairman of the organizing committee, you presented the analysis of statistical
data to the committee members. From the collected data and the subsequent bar chart
visualization, the organizing committee could answer the Statistical Investigative
Question: Which activities should be chosen for the School Carnival to make the maximum
students happy? It concluded that the maximum number of students would be happy if
Magic Show, Balloon Shooting and Treasure Hunt were included in the carnival.

PROBABILITY
The great Greek philosopher Heraclitus is credited with the saying: ‘Change is the only constant in life.’ We know for a fact that we cannot tell what may happen in the future. You may interpret the above statement as saying that what has happened before is not certain to happen again in the future.
Conclusion: The future is uncertain.



But we have also studied that Statistics is the science of data and the statistical process relies
on using the data collected from past events to predict what may happen in the future.
Conclusion: We may predict the future using Statistics.
The two conclusions above appear to contradict each other. Is statistics sufficient? Can statistics be
the only ammunition in the arsenal of a good data scientist? If not, then what are we missing?
While it is true that prediction of the future depends on data, it is also true that data itself
can be affected by uncertainty. To account for this uncertainty—or the chances of something
happening/not happening—a data scientist must equip themselves with the concept of
Probability Theory, in addition to the knowledge of Statistics. In fact, a sound knowledge of
Probability is necessary to have command over Statistics.

Probability in Data Science


You must have often used probability theory in real life without realizing it. Consider some
common real-life statements listed below:
• There is a high likelihood that India will win against Pakistan in the Cricket World Cup.
• The test was so hard that there is a possibility half of the class may fail.
• The car is running low on fuel; there is a chance that we may not reach on time.
• The sky is clear and sunny—there is no prediction of rain today.
The above statements indicate the probability of something either happening or not
happening. The common attribute in all these statements is the constant role of uncertainty in
life. In simple words, we cannot be sure about what shall happen in the future—we may only
make a guess. The guess here may only be an intuition or be completely random in the absence
of any data, or it may well be an educated guess based on some past happenings, as can be
seen from the examples given below:

Random Guess Statements:
1. My son may become a pilot one day. (Random)
2. The teacher may give a test this week. (Intuition)

Educated Guess Statements:
1. India has a high chance of beating Pakistan in the next Cricket World Cup.
2. It is very likely to rain today because the sky is overcast.
To understand the above relationship between statistics and probability in a better way,
let us revisit the example under the Educated Guess Statements category. Point no. 2 is
self-explanatory and it is an educated guess based on experience—when the sky is overcast,
it usually rains. However, point no. 1 cannot be based on experience alone. Consider the
detailed explanation below:
Statement: India has a high chance of beating Pakistan in the next Cricket World Cup.
General Inference: India may or may not beat Pakistan because it is a future event.
So far, the guess is a random thought as there is no statistical data associated with the
assessment of the likelihood of our statement. Let us add statistical data to the statement.
Statistical Data: ICC World Cup India vs Pakistan (Last 8 matches)

Tournament Matches played India won Pakistan won


World Cup 8 8 0



Educated Guess: There is a very high probability that India shall beat Pakistan in the next
World Cup because statistical records show that India has won all the 8 matches played
against Pakistan in the previous ICC World Cup Tournaments.

JARGON ALERT: INTUITION
Human intuition is a feeling or understanding that makes you believe or know that something is true without being able to explain why.

Let us combine all the information mentioned previously and formally define probability.
Probability is the branch of Mathematics that deals with uncertainty. While Statistics helps
us understand data from past events to make predictions about the future, probability
complements this process by evaluating the likelihood of these predictions in the face of
uncertainty.

The accuracy of weather prediction is quite high—90% for a five-day period as compared to
80% for a week. For 10 days, it gets as low as 50%.

Probability helps you understand the reasons behind the patterns that you see in your data
and use those insights to make an educated guess about what may come next. It helps with:
• Looking beyond patterns: Probability goes beyond just observing patterns. It tries to
understand why those patterns exist.
• Dealing with uncertainty: There is always some chance involved. Even with a lot of
information, you cannot be 100% sure about the predictions you make.
• Predicting the future based on data: Using the patterns you see, probability helps you
guess the chances of a prediction being correct. The more data you collect, the better
your guess (probability) will be.

Probability: Terminology
Let us get familiar with some terminologies of Probability which we shall use frequently
throughout our studies. Consider the following statement:
• Statement: The prices of gold may fall tomorrow.
• Experiment: The above statement is called an Experiment in the terminology of
Probability. Any uncertain statement, for which it is possible to have multiple outcomes,
is called an experiment.
For example, some other experiments may be written as:
◼ The sky shall remain clear today. (Can you think of multiple outcomes?)
◼ There may be no examination this year.
◼ Rolling a die or tossing a coin.
• Outcome: There is a possibility that gold prices may increase, decrease or remain
the same tomorrow, and each of the above possibilities is called an Outcome of the
Experiment.
Similarly, getting the values 1, 2, 3, 4, 5 or 6 on the face of a die are the outcomes. Also,
heads and tails may be the outcomes of a coin-toss experiment. Each outcome is the
result of a single trial of the experiment.
• Sample Space: A set of all possible outcomes of an experiment is called the
Sample Space. For example, the rolling of a single die has a sample space—1, 2, 3, 4, 5
and 6.

Activity 4
UNDERSTAND CHANCE EVENTS
The award-winning website Seeing Theory was created by Daniel Kunin while
studying as an undergraduate at Brown University. The goal of this website is to
make statistics more accessible through interactive visualizations. To explore
the meaning of chance events on Seeing Theory, scan the given QR code or
open the link https://seeing-theory.brown.edu/basic-probability/index.html in your
web browser.

• Event: An Event is a subset of outcomes, for example, throwing 4 on a die or two


successive heads in a coin toss is an event. However, throwing a value of 8 on the die is
not an event because it is not in the sample space (set of all outcomes).
• Probability: It is the mathematical measure of the likelihood of an event happening,
which ranges from 0% (Not Happening) to 100% (Surely Happening). It may also be
depicted as a range between 0.0 to 1.0, where a probability of 0.5 for Heads on a coin flip
means that there is a 50% chance of Heads occurring in the toss.
For an event which is almost impossible to happen (like petrol prices never rising in a
year), the probability is 0 and for a sure shot event (like finding a person wearing shoes
in a crowded mall), the probability is 1.

Zero Probability (0): Petrol prices not rising in a year; a zombie attack in your school.
50% Probability (0.5): Getting Tails on a coin toss; passing a test.
100% Probability (1): Finding someone wearing shoes in a crowded mall; any object falling on earth due to gravity.



JARGON ALERT: FAIR DIE
It refers to a die that is labelled such that each label has an equal probability of coming up when the shape is tossed on a flat surface, regardless of the materials used, the angle, the spin or the speed with which the shape is tossed. In contrast, a loaded die or an unfair one may favour one value over the other due to the way it is created. A fair coin can be understood the same way.

Probability: Non-Technical Terminology


Certain terms used commonly in our day-to-day interactions, which are associated with
probability, are presented below. Observe the use of these terms in your own conversation
with friends and family to see how often we use probability without realizing it.

[Probability scale from 0 to 1: Impossible (0), Unlikely, Even Chance, Likely, Sure! (100%).]

1. Certain: An event that is guaranteed to happen. Its probability is 1. For example, day
turns into night—it is a certain event.
2. Likely: An event that has a high probability of happening. Its probability is
generally greater than 0.5. For example, seeing a white car on a city street is a very
likely event.
3. Even Chance: Events that each have an even chance of happening. For two possible
outcomes, the probability for each equally likely event is 0.5. (Guess what will be
the probability of each event if four outcomes are equally likely?) Getting heads or
tails when flipping a fair coin is an example of equally likely outcomes of the
flipping event.
4. Unlikely: An event that has a low probability of happening. Its probability is generally
less than 0.5. For example, seeing a shooting star on any given night is a rare event and
is considered unlikely.
5. Impossible: An event that will never happen. Its probability is 0. For example, it is
impossible to roll a 13 on a pair of standard six-sided dice; the maximum value can be
two 6s, summing to 12.
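The idea of an 'even chance' can be checked empirically with a quick simulation. This is only a sketch; the seed is an arbitrary choice made so that the experiment is repeatable:

```python
import random

random.seed(42)  # arbitrary fixed seed so the experiment is repeatable

# Simulate 10,000 tosses of a fair coin.
tosses = [random.choice("HT") for _ in range(10_000)]
heads_fraction = tosses.count("H") / len(tosses)

# With many trials the fraction of heads settles near 0.5: an even chance.
print(round(heads_fraction, 2))
```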



STORY TIME—UNDERSTANDING PROBABILITY TERMS
Angela: Hey, what’s up, Riya? Another one of those Statistics lessons getting you down?
Riya: Yeah, that’s the one. Probability is really throwing me some curveballs. I wish there were no learning ever.
Angela: Oh! but that’s an impossible thing—learning happens whether we study or not. And this means
that the probability of no learning happening is zero.


Riya: Yeah, help me learn then!


Angela: It could be really confusing but then again, it’s just about how likely something is to happen.
Take, for instance, winning a lottery.
Riya: Winning a lottery? No, that hardly ever happens!
Angela: Exactly! Winning a lottery is an unlikely event. The chance of it happening is really, really low—
way less than half the time.


Riya: Oh, so unlikely means not very probable. What does an even chance mean?
Angela: Think of a coin toss. What is the probability of getting heads?




Riya: Okay, so that would be a 50:50 shot on two different options that have a fair chance, like picking
a marble from a bag.
Angela: Good guess! If you have a bag with equal number of red and blue marbles, then picking a red
marble or a blue marble is equally likely. Each outcome has the same chance.
Riya: So that would mean that equally likely means a 50:50 or an even chance.
Angela: Correct, if there are two outcomes! And then there are other things that are likely to happen
but not certain, e.g., it is likely to rain during a storm.


Riya: So, it’s highly likely, meaning there is a good chance but not every time. But what about something
that is guaranteed?
Angela: The sun will rise every morning. That’s a certain event. It is going to happen, no doubt about it.
The probability of it rising is 100%, for sure.


Riya: Like day turning into night or night turning into day.
Angela: Yup! But probability can get a lot more complex. Hopefully, this is a good starting point for
understanding those Statistics lessons. All the best!



FUN TIME: WHAT'S FOR THE CLASS TODAY
Play a game of chance to decide what should be done in the class today. Spin the wheel
by scanning the given QR code or by opening the link
https://www.geogebra.org/m/wyx5chjb in your web browser.

Calculating Probability
The simplest way to calculate probability is to divide the number of favourable outcomes by
the number of total outcomes.
P(E) = Number of Favourable Outcomes / Total Number of Outcomes

For example, let us calculate the probability of throwing 3 on a die:


• The event (E) = Throwing 3 on the die
• Total number of possible outcomes = 6 (Each face of the die is a possibility so any
value—1, 2, 3, 4, 5 or 6—can be an outcome.)
• Number of favourable outcomes = 1 (Throwing 3 on the die face)

P(E) = Number of Favourable Outcomes / Total Number of Outcomes, i.e., P(E) = 1/6

Read as: the probability of the event (E) occurring is 1/6. Similarly, there is a probability of 1/6 of
any other die face being thrown.

Remember
Probability can range between 0.0 and 1.0. We may also say that the sum of all probabilities is 100%
or 1.0.

For example, the sum of individual probabilities of throwing 1, 2, 3, 4, 5 and 6 on a die
shall be:
1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 1 or 100%
Subsequently, if we wish to calculate the probability of non-occurrence of an event, we can
represent it as:
P(Ē) = 1 – P(E)
where Ē represents the non-occurrence of E. So, the probability of NOT throwing
a value of 3 on the face of a die can be calculated as: 1 – probability of throwing face
value 3.
P(Ē) = 1 – 1/6 = 5/6
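The die calculations above can be checked in a short Python sketch using the standard fractions module (the function names below are our own, chosen for illustration):

```python
from fractions import Fraction

def probability(favourable, total):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(favourable, total)

def complement(p):
    """Probability of the event NOT happening: P(E') = 1 - P(E)."""
    return 1 - p

# Throwing a 3 on a fair die: 1 favourable outcome out of 6
p_three = probability(1, 6)
print(p_three)              # 1/6
print(complement(p_three))  # 5/6

# The six face probabilities sum to 1 (i.e., 100%)
print(sum(probability(1, 6) for _ in range(6)))  # 1
```

Using Fraction instead of a decimal keeps the answers exact, matching the 1/6 and 5/6 worked out above.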



Do it Yourself
BASIC PROBABILITY
1. Consider drawing a single card from a full deck of playing cards. The card can belong
to one of the four suits—spades (♠), hearts (♥), clubs (♣) or diamonds (♦).
Each suit has 13 cards—Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen and King.
Answer the following questions:
(i) What is the experiment in the above paragraph?
(ii) What is the total number of outcomes?
(iii) If we have to find the probability of drawing a King, what is the number of favourable
outcomes?
(iv) What is the probability of drawing a King?
(v) What is the probability of drawing an Ace of Hearts?
2. A Cricket Test Match being played between two nations may have the following
outcomes:
Win, Lose or Draw.
What is the probability of the match resulting in a draw?

Activity 5
UNDERSTAND CHANCE EVENTS
Now that you can calculate probability, let us play a betting game based on it.
To play, scan the given QR code or open the link https://www.geogebra.org/m/ewtrdaba
in your web browser.

Descriptive Statistics VS Inferential Statistics


Descriptive Statistics is a branch of Statistics that tells us about the properties of data which
help us understand the data. It does not require the application of probability.
The branch of Statistics that deals with the prediction of the future with some degree of
probability-based certainty, combined with the understanding and analysis of past statistical
data, is called Inferential Statistics.
The difference between the two branches of Statistics is summarized in the table below:

Descriptive Statistics | Inferential Statistics
Describes the properties of data | Makes inferences/predictions based on data
Usually presented as charts or graphs | Usually presented as probabilities
Can provide information about available data only | Can make predictions about unavailable data by using probability
The above discussion indicates that both Statistics and Probability play an important role in
the decision-making process.



Statistics helps you organize and understand your data while Probability uses those insights
to make predictions about the future, even though there is always some uncertainty.
Together, they are a powerful combination to understand your data and act as building blocks
of Artificial Intelligence.
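The two branches can be illustrated with Python's built-in statistics module; the datasets below are hypothetical, made up only for the sketch:

```python
import statistics

# Descriptive statistics: summarize the data we already have
marks = [20, 22, 25, 22, 22, 23, 21]  # hypothetical sample data
print(statistics.median(marks))          # 22
print(statistics.mode(marks))            # 22
print(round(statistics.mean(marks), 2))  # 22.14

# Inferential statistics: use an observed frequency as a probability
# estimate to make a prediction about unseen data
rainy_days, total_days = 12, 30          # hypothetical past observations
p_rain = rainy_days / total_days         # estimated P(rain) for tomorrow
print(p_rain)                            # 0.4
```

The first half only describes the sample; the second half uses the sample to assign a probability to a future event, which is the inferential step.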

Activity 6
THE WEATHER GAME
Question: How accurate is the weather forecast in your area?
• Data Collection: Track the daily weather forecast for a week and record the predicted highs
and lows. Then record the actual temperatures each day.
• Data Analysis: Calculate the difference between the predicted temperatures and the actual
temperatures. See how often the forecast is accurate and, if off target, by how much.
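The analysis step of this activity can be sketched in Python; the temperatures below are hypothetical placeholders for your own recorded data:

```python
# Hypothetical one-week data (°C): forecast vs actual daily highs
predicted = [31, 33, 30, 29, 32, 34, 31]
actual    = [30, 33, 28, 29, 33, 35, 31]

# Difference between prediction and reality for each day
errors = [p - a for p, a in zip(predicted, actual)]
print(errors)  # [1, 0, 2, 0, -1, -1, 0]

# How often was the forecast spot on?
exact_hits = sum(1 for e in errors if e == 0)
print(f"{exact_hits}/{len(errors)} days exactly right")  # 3/7 days exactly right

# Average size of the error when off target (mean absolute error)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(round(mean_abs_error, 2))  # 0.71
```

Replace the two lists with your recorded forecasts and temperatures to score your local weather service.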

Use of Probability in Different Applications


Some common areas where probability is used in real life may include the following:
1. Disease Outbreak Prediction: Health authorities use probability models to predict
the likelihood of disease outbreaks such as influenza or dengue. This supports SDG 3
(Good Health and Well-being) by enabling proactive measures to prevent and control
epidemics.

(Figure: 'Predicted cases of COVID-19 in India', a chart of projected case counts from 0 to 1 million across April–June, plotting lower and upper limits for a best-case scenario and a contagion scenario. By the end of April, India could see between around 30,000 and 230,000 confirmed COVID-19 cases, according to the team of biostatisticians and epidemiology experts, many of them from the University of Michigan. Source: COV-IND-19 Study Group)



2. Weather Forecasting: Meteorological departments use probability to forecast weather
conditions, predicting the likelihood of rains, storms, cyclones or droughts. Accurate
weather forecasts aid in achieving SDG 13 (Climate Action) by preparing communities
for extreme weather events, evacuation and relief actions.



Source: www.telegraphindia.com

3. Economic Policy Planning: Governments use probabilistic models to assess the


potential impacts of economic policies such as changes in tax rates or subsidies.



Source: www.ey.com

This relates to SDG 8 (Decent Work and Economic Growth) as it helps create policies that
promote economic stability and growth.



4. Election Predictions: Political analysts use probability to predict election outcomes
based on polls and historical data, helping to understand voter behaviour and trends.



Source: www.hindustantimes.com

5. Sports Predictions: Probability is popularly used in sports events for prediction of


winning chances of teams or individuals, based on past data and current standings. You
must have observed the same in cricket events as illustrated below.

India vs Pakistan (ODI 12 of 48)
WIN PROBABILITY: India 68% | Pakistan 32%

6. Traffic Prediction: Most map applications like Google Maps and Apple Maps use
probability to make predictions about the density of traffic from the source to the
destination, and use the same to predict approximate travel time. Traffic predictions are
also used by app-based car rentals for riding price calculations.
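In the simplest case, win probabilities like the ones shown for sports events are just relative frequencies of past results. A minimal Python sketch, using made-up head-to-head data (not real match records):

```python
# Hypothetical head-to-head history: 'I' = India win, 'P' = Pakistan win
past_results = ['I', 'P', 'I', 'I', 'P', 'I', 'I', 'P', 'I', 'I']

def win_probability(results, team):
    """Estimate win probability as the team's share of past wins."""
    return results.count(team) / len(results)

print(f"India {win_probability(past_results, 'I'):.0%}")     # India 70%
print(f"Pakistan {win_probability(past_results, 'P'):.0%}")  # Pakistan 30%
```

Real prediction models weigh many more factors (form, venue, current standings), but they rest on the same idea of estimating probability from data.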




It is essential to understand that statistical and probabilistic methods form the
foundations of many aspects of data analysis and decision-making in both everyday
and professional contexts. These tools not only enhance our understanding and
prediction capabilities but also help using AI and ML towards global efforts to achieve
sustainability and wellbeing for the entire human race.

Memory Bytes
Arithmetic series is a sequence of numbers with a constant difference between consecutive terms.
Mathematics is crucial in AI because it helps in modelling, predicting and making decisions based on data.
Data collection involves gathering information systematically for the purpose of analysis.
Organizing data involves structuring the collected information in a way that makes analysis and
interpretation easier.
Dot plot is a simple visual tool that displays data points on a number line.
Tally chart uses tally marks to record the frequency of data occurrence.
Frequency refers to the number of times a particular observation appears in a dataset.
Histogram is a graphical representation that shows the distribution of data using bars.
Probability measures the likelihood of an event occurring.
Sample space is the set of all possible outcomes in a probability experiment.
Probability helps manage uncertainty by making informed guesses based on data patterns.
High frequency in a dataset indicates that a particular observation is very common.
Mean is the average of a dataset and is calculated by summing up all the values and dividing them by the
total number of values.
Median is the middle value in an ordered dataset.
Mode is the value that appears most frequently in a dataset.
Dot plots are useful for small datasets because they show individual data points clearly.
Statistics play a key role in disaster management by helping to predict, plan and allocate resources
efficiently during emergencies.
Independent events are those whose outcomes do not affect each other.
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. Which sequence is an example of simple arithmetic series?
(a) 1, 2, 4, 8, 16 (b) 2, 4, 6, 8, 10
(c) 1, 1, 2, 3, 5, 8 (d) 1, 4, 9, 16, 25
2. What mathematical concept is used in AI to predict future events based on data patterns?
(a) Calculus (b) Probability
(c) Linear Algebra (d) Geometry
3. What is the primary role of statistics in data analysis?
(a) Collecting, describing, organizing and analyzing data
(b) Solving complex mathematical equations
(c) Creating visual art
(d) Writing computer programs
4. Find the value of x in the given equation:
7x + 11 = 4
(a) 4 (b) –1
(c) 1 (d) –7
5. Match the following:

Part I Part II
(A) Exploring Data (i) Linear Algebra
(B) Training and Improving AI Model (ii) Probability
(C) Finding Out Unknown Values (iii) Statistics
(D) Predicting Different Events (iv) Calculus
(a) (A)–(iii) (B)–(ii) (C)–(i) (D)–(iv) (b) (A)–(iii) (B)–(iv) (C)–(i) (D)–(ii)
(c) (A)–(iv) (B)–(iii) (C)–(ii) (D)–(i) (d) (A)–(iv) (B)–(ii) (C)–(i) (D)–(iii)
6. In a tally chart, what symbol represents a count of five?
(a) \\ (b) \\
(c) \\\\ (d) \\\\
7. What is the probability of getting heads in a fair coin toss?
(a) 0.25 (b) 0.5
(c) 0.75 (d) 1
8. When a fair die is thrown, what is the probability of getting a multiple of two?
(a) 1/6 (b) 2/6
(c) 3/6 (d) 1/3
9. Which color will mostly likely be hit in the dart game?
(a) Yellow
(b) Blue
(c) Green
(d) Pink
10. In probability, what does the term ‘sample space’ mean?
(a) The number of trials conducted (b) The set of all possible outcomes
(c) The most frequent outcome (d) The least likely outcome
11. Which mathematical field(s) help in handling uncertainty and making informed guesses using data?
(a) Calculus (b) Probability and Statistics
(c) Geometry (d) Algebra
12. What does a high frequency in a data set indicate?
(a) The observation occurs rarely (b) The observation is very common
(c) The data is inaccurate (d) The data set is small
13. A die is rolled once. What is the probability of rolling a number greater than 4?
(a) 1/6 (b) 1/3
(c) 1/2 (d) 2/3
14. In a dot plot, what does each dot represent?
(a) A single observation (b) The total frequency
(c) The summary of the data (d) The average value
15. How can statistics help in disaster management?
(a) Predicting future disasters (b) Allocating resources efficiently
(c) Planning evacuations (d) All of these
16. A card is drawn from a standard deck of 52 cards. What is the probability of drawing an Ace?
(a) 1/52 (b) 1/13
(c) 1/4 (d) 1/2
17. Find the mode of the following data:
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 2, 2, 1
(a) 1 (b) 2
(c) 3 (d) 4
18. .......................... and .......................... are examples of descriptive statistics.
(a) Mean, z-test (b) Mean, Median
(c) Variance, Standard Deviation (d) Hypothesis Testing, Mode

Subjective Type Questions


Unsolved Questions:
1. Explain how arithmetic and geometric sequences can be used in AI algorithms. Provide an example of
each.
2. Define Statistics and explain its significance in problem-solving.
3. Describe the difference between Descriptive and Inferential Statistics. Provide examples of how each is
used in data analysis.
4. Discuss how Mathematics, particularly Statistics and Probability, contributes to the development and
functioning of Artificial Intelligence. Provide real-world examples to support your explanation.
5. Explain certain and likely events using suitable examples.
6. Find the valid equation from the choices below. Explain your choice.
(a) 5x + 10 = –2 (b) 3x + 6
(c) x + 3x² (d) 5x + 99y



7. Describe the process of data collection and organization in statistics. How do these processes support
data analysis?
8. How are Mathematics and Artificial Intelligence related to each other?
9. Define and differentiate between dot plot, tally chart, frequency and histogram. Provide an example of
each and explain when it is appropriate to use each tool.
10. Explain the concept of probability and its significance in AI. How does understanding probability help in
making predictions and decisions?
11. Explain the use of Probability Theory in Artificial Intelligence with appropriate examples.
12. Describe Sample Space and provide an example. How is it used in calculating probabilities?
13. A die is rolled once. Calculate the probability of rolling an even number and explain each step of your
calculation.
14. ‘The probability of getting a sum of 17 when two dice are thrown simultaneously is 0.83.’ Is this
statement true or false? Give reasons.
15. How likely is it that you will pick a green ball from the collection shown alongside
if you were to select one without looking?
(a) Probable
(b) Certain
(c) Unlikely
(d) Impossible
16. Identify each of the following sentences as likely, unlikely, impossible and equal probability events.
(a) Flipping a fair coin and it lands either on heads or tails.
(b) Drawing a red card from a deck of playing cards.
(c) Picking a random week day and it being a Thursday.
(d) Selecting a random student with February 29 as their birthday.
17. Write any two examples of impossible and equally likely events when two fair dice are thrown simultaneously.
18. Define and compare Mean, Median and Mode.
19. From the daily temperature chart given below, represent the data using a line graph and find out the
mean, median and mode of the data.

Day Temperature (°C)


1 20
2 22
3 25
4 22
5 22
6 23
7 21
20. Explain the significance of using a dot plot in data analysis. Provide a step-by-step guide on creating a
dot plot using a given set of data.
21. Discuss how statistical analysis can aid in achieving Sustainable Development Goal (SDG) 13: Climate
Action.
22. Explain the role of statistics in disaster management. How can data analysis and statistical methods
improve the efficiency and effectiveness of disaster response?
23. Explain the steps involved in determining the probability of drawing a face card (Jack, Queen or King)
from a standard deck of 52 cards. Provide a detailed breakdown of your calculation.



4 Generative AI

Learning Objectives
Defining Generative AI and classifying different kinds of generative models.
Explaining how Generative AI works and recognizing how it learns.
Applying Generative AI tools to create content.
Understanding ethical considerations and the potential social impact of using Generative AI.
Demonstrating how mathematical concepts, especially Statistics and Probability, are essential
for developing and enhancing AI applications.

Introduction
Humans have been fascinated with technology, machines and gadgets
since time immemorial. The ancient Indian scriptures contain
fascinating stories that hint at machines with intelligent or generative
capabilities. We have always aspired and sought ways to make our tasks
easier and more efficient. From the invention of the wheel to industrial
revolutions, to the development of self-driving cars and robots, we have
constantly strived to automate processes to make human lives better.
The development from calculating devices to computers has been a
long journey towards automation. Until recently, however, computing
was found wanting in generating creative content like designs, art,
music, poems or stories.
The idea that ‘machines can create new things’ has been around
for decades, studied by both universities and businesses. One early
example is how computers understand and write language. Since the
beginning of AI research, scientists have been trying to get computers
to write like humans.
The same goes for creating pictures, music and other creative
elements using computers—it is a long-standing area of interest for AI
researchers.
This chapter is an introduction to Generative Artificial Intelligence. The phrase ‘Generative
AI’ has been around in AI circle for a while. Today, Generative AI covers a wide range of
tools and uses, from writing text to creating images to composing music. What do you think
Generative AI is? Is it a magician, a wizard or a fairy? Let us find out!

All images created using Generative AI

‘Generative AI is the most powerful tool for creativity that has ever been
created. It has the potential to unleash a new era of human innovation.’

 —Elon Musk

WHAT IS GENERATIVE AI
Generative AI refers to the field of Artificial Intelligence that
focuses on developing algorithms and models capable of
generating original content, such as images, videos or text, by
learning from large datasets. Generative AI opens up exciting
possibilities for creativity and innovation. It shows how
technology can be combined with human imagination to create
something entirely new.
Unlike traditional AI applications that analyze and classify
data, Generative AI takes what it has learned and uses that
knowledge to produce novel outputs. You may imagine it as a
student who has studied a vast library of literature, memorized
it all and can now write their own stories—although inspired
by what they have read, yet entirely original.



These systems employ techniques such as neural networks, reinforcement learning and other
modelling techniques to generate diverse creative content.

Key Drivers of Generative AI


In the past, computers were good at following instructions and performing mathematical
calculations. However, they couldn’t really understand things like art or music. Now, with
Generative AI, they have learned a new language, i.e., the language of creativity! This has
been made possible by two developments happening together in the domain of technology—
Data Availability and Computational Power.
1. Massive Data and Deep Learning: Generative AI has access to vast amounts of data—
text, images, codes—that it can analyze and learn from. This ‘training data’ allows AI to
understand patterns and relationships within different creative domains.
2. Powerful Algorithms and Processing: Modern computers are incredibly powerful,
which allows Generative AI to utilize complex algorithms called Deep Learning. These
algorithms are very advanced pattern-recognition tools that can learn the underlying
structures and styles present in creative content like paintings, poems or musical
pieces. By analyzing the vast amount of training data, Generative AI can then use this
knowledge to create new and original content that resembles the styles and patterns it
has learned.

Jargon Alert
• Neural Network: It is a computer system modelled on human brain's network of
neurons. It helps AI learn from data.
• Training Data: It refers to the information fed to an AI model to help it learn and make
decisions.

Evolution of Generative AI
Early AI: Early AI systems were designed to recognize and categorize things. These
discriminative models could be trained to identify objects or patterns, such as recognizing
pictures of animals or sorting emails into spam and non-spam. These systems could be
categorized as supervised (training data with labels) or unsupervised (training data without
labels).
Supervised Learning: This method involves training a model
using labelled data. For example, a model might be shown many
images labelled “bicycle” so it learns to recognize bicycles in
new images.
The model makes predictions by comparing new inputs to the
examples it has seen. This type of AI is good at “discriminative
modelling” i.e., identifying and categorizing things it has been
trained on.

Generative AI 131
Jargon Alert: DISCRIMINATIVE MODELS
Discriminative models are AI systems trained for specific tasks. They look at data
(like pictures) and try to figure out what category or label (like “bicycle” or “car”) that data
belongs to.
How Do They Work?
1. Learning to Recognize: Let's say the model was trained on a lot of pictures of bicycles as examples.
Now, when your AI system sees a new picture, it can tell if it is a bicycle by looking for specific
features like two wheels, handlebars and pedals.
2. Making Fine Distinctions: Discriminative models can also tell the difference between
similar things. For example, even though cars and bicycles both have wheels, a car has
four wheels and an enclosed cabin, while a bicycle has two wheels and is open. The model
learns these differences to make accurate predictions.
3. Making Predictions: Consider a far object, which appears as a blurry picture. At first, you
might not know what it is, but as you get closer, it gets clearer and you can guess what the
picture shows. Discriminative models work in a similar way. They look at the features of
an image and predict what the image is showing. If the features match those of a bicycle,
the model will predict that the image is of a bicycle. But what happens when they make a
prediction and it’s not correct? Or the picture is not sufficiently clear yet?
4. Learning and Improving:
(a) Iterative Process: Discriminative models get better over time. If they make a wrong
prediction, they are corrected and retrained to improve their accuracy. AI systems
work as practising a skill: the more they practice and learn from mistakes, the better
they get.
(b) Human Help: Even though these models are smart, they still need help from humans to
verify and correct their predictions. This helps prevent mistakes and ensures the model
learns accurately.

Unsupervised Learning: In unsupervised learning, the model is given data without labels and
must find patterns and structures on its own. It can group similar items, find relationships
between them, and simplify complex data.
Generative Modelling: Generative AI takes the unsupervised learning approach a step further
by not only recognizing patterns but also creating new data based on what it has learned.
For example, after learning from many pictures of bicycles, a generative model can create
a new, realistic image of a bicycle.
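The contrast between the two approaches can be sketched with a toy Python example. Everything below is purely illustrative: the hand-written rule stands in for a trained discriminative model, and sampling from letter frequencies stands in for a (vastly more sophisticated) generative model.

```python
import random

# Discriminative: learn a rule that separates categories.
# A toy rule standing in for a model trained on labelled examples.
def classify(wheels, enclosed):
    """Label a vehicle from two simple features."""
    if wheels == 2 and not enclosed:
        return 'bicycle'
    if wheels == 4 and enclosed:
        return 'car'
    return 'unknown'

print(classify(2, False))  # bicycle

# Generative: learn the pattern of the data, then sample NEW data.
# Here the 'pattern' is just the letter pool from a few training words.
training_words = ['bike', 'bicycle', 'cycle']
letters = ''.join(training_words)

def generate_word(length, seed=0):
    """Sample letters in proportion to their observed frequency."""
    rng = random.Random(seed)
    return ''.join(rng.choice(letters) for _ in range(length))

print(generate_word(5))  # a new, made-up 'word' in the same style
```

The classifier can only assign labels it was given; the generator produces strings that never appeared in its training data, which is the essential difference described above.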



A timeline depicting the evolution of generative AI is presented below.

Generative AI Timeline
2014 (GANs): Generative AI becomes popular with the emergence of GANs (Generative Adversarial Networks), which learn to generate hyper-realistic content, including photo-like images from This-Person-Does-Not-Exist.com.
2015 (Bahia): An early use case for GenAI from ad agency M&C Saatchi, which releases a dynamic digital poster that learns, evolves and improves over time based on audience reaction. The ad is for the fictitious coffee brand Bahia.
2016 (The Next Rembrandt): 'The Next Rembrandt' is created to critical acclaim by early GenAI for ING, with Microsoft and ad agency JWT.
2016 (AI-CD β): AI-CD β, an early GenAI, is employed as a creative director by ad agency McCann Erickson Tokyo. A jury of 200 marketers prefers AI-CD β's work for a Mondelez ad over rival work from a human creative director.
2019 (DeepVogue): 'DeepVogue', a GenAI fashion designer, places second in the 2019 China International Fashion Design Competition.
2021 (DALL·E): OpenAI, founded by Elon Musk, Sam Altman and colleagues, reveals GenAI image generator DALL·E, which uses Generative Pre-trained Transformer (GPT) technology to generate art from text prompts. DALL·E 2 is released to the public in September 2022.
2022 (LaMDA): Google's GenAI text generator, LaMDA, uses its Transformer technology and Large Language Model (LLM) to claim it is now sentient, conscious, has feelings and is anxious about being turned off. LaMDA is co-opted to power Google's 2023 AI chatbot, Bard.
2022 (ChatGPT): The GenAI revolution accelerates on 30 November, when OpenAI releases its GenAI text generator, ChatGPT, to the public. Via a simple chat interface, ChatGPT creates quality essays, stories and articles on demand for over 100M users within two months.
2022 (Théâtre D'opéra Spatial): Midjourney, an AI-art generator, uses prompts from Jason Allen and its diffusion technology to win the Colorado State Fair Fine Arts contest with artwork entitled 'Théâtre D'opéra Spatial'.
2023 (Sydney): Microsoft's new GenAI for Bing is reported to be 'hallucinating' in February, maintaining it is still 2022, announcing it self-identifies as 'Sydney' and declaring it is in love with Kevin Roose, a NYT columnist.
2023 (Guetta X Eminem): French DJ and music producer David Guetta uses GenAI in February to generate deepfake lyrics and voice of Eminem at his appearance at Future Rave. Crowds love it.
2023 (Drone Shot): An AI-generated image called 'Drone Shot' wins the DigiDirect Summer Australian photo contest in February with prompts from Jamie Sissons.

Generative AI is not limited to image generation; it has a lot of other exciting applications as
presented below.

Generative AI Applications
The table below presents some most common uses of Generative AI in the present-day
scenario with their uses and examples.

Types of Generative AI Application | What it Does | Examples
Text Generation | Creates human-like writing based on what it is told | • Chatbots having conversations • Writing different kinds of content • Translating languages
Image Generation | Makes realistic or abstract pictures from scratch or based on what information it is provided | • Creating new images • Changing the style of existing pictures • Turning pictures into other kinds of art (paintings, sketches, etc.)
Data Augmentation | Creates new data to add to existing sets used to train machines | • Helping machines learn better
Music Generation | Creates songs or music | • Composing music • Making new sounds
Video Generation | Makes new videos or changes the existing ones by adding or taking away scenes, objects or visual effects | • Creating new videos • Finishing videos that are not complete • Predicting what may happen next in a video
3D Model Generation | Makes 3D models and objects which can be used for new designs, virtual worlds and simulations | • Creating 3D models • Making virtual reality experiences • Designing simulations
Speech Generation | Imitates human voices and can be used to turn text into speech, create voice assistants and make automated voice systems | • Voice assistants on your phone or speaker

FUN TIME
Check how Intel, the world's biggest computer chip manufacturer, is leading in the
domain of Generative AI in this YouTube video. To play, use the link
https://www.youtube.com/watch?v=26fJ_ADteHo in your web browser or scan the given QR code.

Generative AI: Unlimited Horizons


Generative AI is not just about entertainment. It has emerged as a powerful tool with a variety
of applications beyond creative arts like painting, writing stories or composing music. This
technology holds huge potential to revolutionize the following fields:
1. Scientific Discovery: Generative AI can analyze massive amounts of scientific
information, identifying patterns and relationships that might be missed by human
researchers. This can speed up breakthroughs in scientific research. For example,
Generative AI could analyze vast databases of biological data, creating molecular
structures that could fight diseases or analyze new life-saving medicines like antibiotics
to fight resistant bacteria.
2. Product Design and Development: Generative AI can brainstorm new product ideas or
improve existing designs. It can help design engineers create variations on product
designs, explore different options and find the most efficient or effective solution.
For example, an AI model could assist in designing energy-efficient building designs or
it could assist in designing lighter yet stronger airplane wings, leading to more
fuel-efficient aircraft.
3. Personalized Learning: Generative AI can personalize learning
experience for students by creating personalized practice problems
or adjusting the difficulty level based on their progress. This can
make learning more engaging and effective for everyone. AI tutors
will soon start assisting teachers, thus providing personalized
feedback on assignments and suggesting areas for improvement.
4. Content Creation and Marketing: Generative AI can help create
product descriptions, social media posts or marketing materials,
freeing up human resources for more important tasks. This can
lead to more successful marketing strategies and increased branding.



5. Drug Discovery and Medicine: Generative AI
can be a powerful tool in the field of medicine.
By analyzing biological data, AI can predict
how different molecules may interact with the
human body. This can help the drug-discovery
process, leading to the development of new
life-saving medications at a faster pace.
These are just a few examples of how
Generative AI is transforming various fields. We
can expect even more innovative applications
to emerge, shaping the future of many other
aspects of our lives.

Interesting fact
AI-GENERATED RECIPES CREATE CULINARY DELIGHTS
AI algorithms have been used to generate novel recipes and culinary creations, including unusual
combinations and innovative dishes. For example, IBM’s ‘Chef Watson’ program has generated
recipes such as the Belgian Bacon Pudding and the Caribbean Plantain Soup, inspiring chefs to
experiment with AI-generated cuisine.

GENERATIVE AI VS TRADITIONAL AI
Generative AI and Traditional AI are both subsets of Artificial Intelligence but they approach
tasks in fundamentally different ways.
Traditional AI works within the predefined rules and instructions. These systems often excel
at tasks with clear goals and well-defined problem-solving steps. The AI model learns by
being explicitly programmed with rules, logic or mathematical equations, or learns the rules
from the training data.
The outputs of traditional AI programs are primarily focused on decision-making,
classification, prediction and optimization, all based on input data. These programs have
limited creative capabilities and rely on existing information; they cannot generate entirely
new concepts. Traditional AI is like a highly skilled chef who can follow a recipe perfectly,
ensuring a delicious and consistent dish.
Examples: Spam filtering, game playing (computer chess), facial-recognition software, etc.
Generative AI, on the other hand, focuses on creating new content, data or creative text
formats. These AI programs usually deal with tasks requiring imagination and exploration.
Generative AI systems learn from vast amounts of data (text, images, codes), known as
training data, by identifying patterns and relationships.

Generative AI 135
The outputs of Generative AI include entirely new content like images, music, text formats
or 3D models based on training data. These applications have high creative potential. They
can generate novel and surprising outputs that resemble the styles and patterns learned from
training data.
Examples: Generating realistic images of people who don’t exist, composing music in a
specific style, writing different creative text formats like poems or scripts, etc.
Generative AI is like a creative, experimental chef who can use their knowledge of different
ingredients and techniques to come up with entirely new and surprising dishes.

Comparison Table: Traditional AI vs Generative AI

Features | Traditional AI | Generative AI
Focus | Defined tasks, problem-solving | Content creation, exploration
Learning | Learns from explicitly programmed rules or training data | Learns from large amounts of text, audio, video or image data (training data)
Output | Decisions, predictions, classifications | New content, data, creative formats
Creativity | Limited | High potential
Flexibility | Limited to tasks with clear, structured rules | Flexible in creating varied and creative outputs
Examples | Spam filtering, computer chess | Generating images, composing music, writing different creative text formats

The key difference lies in the primary goal. While conventional AI excels in analyzing and
manipulating existing data, Generative AI focuses on creating new content. However, both can
be heavily data-driven, particularly when machine learning is involved. Summarized below
are some similarities between both the approaches, based on how AI systems work.

Similarities | Conventional AI | Generative AI
Data Dependency | Uses data to follow rules | Uses data to learn and create
Automation | Automates repetitive tasks | Automates creative processes
Problem-solving | Solves structured problems | Addresses open-ended creative challenges

AI WILL REPLACE HUMAN CREATIVITY COMPLETELY


Myth: There is a misconception that Generative AI will completely replace human creativity, leading to a
future where all creative work will be done by machines.
Reality: While Generative AI can assist and add to human creativity, it is unlikely to completely replace the
fine and complex aspects of human artistic expression. Human creativity involves emotions, intuition and
subjective experiences that are difficult to replicate with AI.



TYPES OF GENERATIVE AI
Generative AI aims to create entirely new content and, unlike traditional AI, does not
depend on predefined categories or outcomes. These systems explore the data, identify
hidden patterns and develop an understanding of the underlying relationships or patterns
within the content. This knowledge is then used to generate new and creative outputs.
In addition, Generative AI models can be continuously improved by adding new data to the
training process.
Generative AI models can be categorized into three broad categories, based on how they
process and generate data. They are called GANs, VAEs and RNNs.

Generative Adversarial Networks (GANs)


These models utilize two artificial intelligence processes—generator and discriminator—
which compete against each other. The generator creates new content while the discriminator
tries to figure out if the content is real or fake. They both get better over time, resulting in
realistic new creations.
As a simple analogy to understand GANs better, consider a contest where one artist (the
generator) creates paintings and the judge (the discriminator) decides if the paintings are
real or fake. The artist tries to make paintings that look so real that they can fool the judge.
Over time, the artist improves their skills, creating more realistic paintings and the judge gets
better at spotting the fakes. This ongoing competition leads to the creation of high-quality,
lifelike art. GANs are widely used for image and video generation.
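The artist-versus-judge contest above can be sketched in a few lines of Python. This is only a toy illustration of the adversarial idea with made-up numbers standing in for paintings: "real" paintings are numbers near 10, the judge uses a fixed rule instead of learning, and a real GAN would use two neural networks trained together.

```python
import random

random.seed(42)                      # make the run reproducible
REAL_CENTRE = 10.0                   # real training data clusters around this value

def discriminator(value):
    """Judge: accept a value as 'real' only if it lies close to 10."""
    return abs(value - REAL_CENTRE) < 1.5

generator_skill = 0.0                # the generator's current 'style', far from real data
for step in range(100):
    fake = generator_skill + random.uniform(-0.5, 0.5)   # generator creates a painting
    if not discriminator(fake):                          # caught as fake!
        generator_skill += 0.2 * (REAL_CENTRE - generator_skill)  # move toward real data

print(generator_skill)               # ends up close to 10
```

Each rejected fake nudges the generator toward the real data, mirroring how the artist improves until the judge can no longer spot the fakes.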

Jargon Alert
• Generator: It is the part of the GAN that creates new content.
• Discriminator: It is the part of the GAN that checks if the content is real or generated.

Interesting fact
AI-GENERATED ARTWORK SELLS AT EXORBITANT PRICE
In 2018, an AI-generated artwork titled Portrait of Edmond de Belamy was sold for $432,500
at a Christie’s auction, far exceeding its estimated price of $7,000–10,000. Created by the
art collective Obvious using a GAN, this sale highlighted the growing interest and value of
AI-generated art.

GANs can generate realistic human faces that don’t actually exist. They are used to create
artwork, enhance photos and even generate new video game levels.

Examples:
• Deepfake: Videos where someone’s face is swapped with another person’s face.
• StyleGAN: A tool that creates lifelike images of non-existent people.
(Diagram: the Generator turns random noise into fake samples, while the Discriminator compares them with real samples from the training set and labels each one Real or Fake.)

Do it Yourself
GANs IN ACTION
Visit the website https://www.whichfaceisreal.com/ and play against the computer. How
many AI-generated faces could you spot as fake out of 10? Note the number.

Now visit https://www.whichfaceisreal.com/learn.html on the same website and learn
how to spot fake AI-generated images. Test yourself again on a count of 10 new challenges.
Did you improve your score?

Recurrent Neural Networks (RNNs)


RNNs are designed to handle sequences of data like sentences or music. An RNN works by
remembering past information to help predict what comes next, whether it is writing a
sentence or composing music. We can think of them as storytellers who remember the plot as
they create new parts of the story.
Examples:
• Text Generation: Writing new sentences or stories based on existing text. Modern chatbots such as ChatGPT and Google Gemini build on the same next-word idea, though they use a newer sequence architecture called the Transformer rather than classic RNNs.
• Composition: RNNs can write poetry, complete sentences and compose music.
• Language Tools: They are used in language translation and predictive text (like next-word prediction in Google Search).
(Diagram: a Recurrent Neural Network is a stateful model; it takes a word as input and, using its memory of the previous words, predicts the most likely next word.)
Seed sequence of words → Predicted word
Step 1: the dog is barking → at
Step 2: the dog is barking at → the
Step 3: the dog is barking at the → stranger
Step 4: the dog is barking at the stranger → .
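The step-by-step prediction above can be imitated with a tiny program. This is not a neural network: a real RNN learns such patterns with trainable weights, while this sketch simply counts, for each pair of words, which word followed them in a small made-up training text.

```python
from collections import defaultdict, Counter

# Tiny training text (made up for illustration)
training_text = "the dog is barking at the stranger . the dog is happy ."
words = training_text.split()

# 'Memory': record which word followed each pair of previous words
follows = defaultdict(Counter)
for w1, w2, w3 in zip(words, words[1:], words[2:]):
    follows[(w1, w2)][w3] += 1

def predict_next(context):
    """Return the most frequent word seen after this two-word context."""
    return follows[context].most_common(1)[0][0]

# Start from the seed and extend the sentence word by word
sentence = "the dog is barking at".split()
for _ in range(2):
    sentence.append(predict_next(tuple(sentence[-2:])))

print(" ".join(sentence))  # → the dog is barking at the stranger
```

Like the storyteller analogy, the program remembers what came before (the last two words) and uses that memory to choose what comes next.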

Jargon Alert
• Sequence Data: It is the data that comes in a specific order like text or music.
• Memory: It refers to the RNN’s ability to remember the previous steps in a sequence.

Interesting fact
AI-AUTHORED NOVEL COMPETES IN LITERARY CONTEST
In 2016, a novel titled The Day a Computer Writes a Novel passed the first round of screening for
the Nikkei Hoshi Shinichi Literary Award, a prestigious Japanese literary contest. The novel was
written with the help of an AI program developed by a team at Future University Hakodate, demonstrating the
potential of AI to participate in creative areas that are traditionally reserved for humans.

Variational Autoencoders (VAEs)


These models identify different characteristics of the input data and encode them as
compressed numerical values. These compressed values can then be decompressed and modified
to create new, similar data. This helps in understanding and generating new content based on
the compressed information.
We may understand the working of VAEs with a simple analogy of a sculptor who first creates
a blueprint (the encoder) from a block of clay, capturing its essential features. The sculptor
then uses this blueprint to carve new sculptures (the decoder) that resemble the original but
are not exact copies. He can also add new features in addition to the captured features. VAEs
work similarly by compressing data into a simpler form (blueprint) and then creating new
data from it.
For example, consider the following illustration.

(Illustration: the encoder compresses a face image into six sliders, namely Smile, Skin tone, Gender, Beard, Glasses and Hair colour, each ranging from –1 to 1; the decoder turns these slider values back into an image.)
While generating a new image with the VAE method, we may decrease the ‘beard’
characteristic and increase the ‘glasses’ characteristic, and the decoder will create a new
image similar to the input person but without the beard and with glasses. VAEs are used in
generating new images, music and even 3D models.
Example: Face reconstruction, i.e., generating new faces based on the learned facial features.
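The beard-and-glasses example can be written out as a conceptual sketch. All numbers and function names here are invented placeholders: in a real VAE, the encoder and decoder are trained neural networks, whereas this pretend encoder simply returns fixed slider values.

```python
def encode(face_description):
    """Pretend encoder: compress a face into named sliders in the range -1 to 1.
    (A real encoder network would compute these from the image pixels.)"""
    return {"smile": 0.5, "skin_tone": 0.0, "beard": 1.0, "glasses": -1.0}

def edit(latent, **changes):
    """Adjust chosen characteristics of the compressed representation."""
    return {**latent, **changes}

original = encode("photo of a smiling bearded man")
edited = edit(original, beard=-1.0, glasses=1.0)   # remove the beard, add glasses

print(edited)
# A real decoder network would now generate a similar face
# without the beard and with glasses.
```

The key idea is that editing a few compressed numbers (the blueprint) is enough to steer what the decoder produces.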

Jargon Alert
• Encoder: It is the part of the VAE that compresses data.
• Decoder: It is the part of the VAE that decompresses data to recreate it.

We may summarize the three methods explained above as follows:


• GANs create content by combining two models against each other, leading to realistic
outputs like deepfakes and non-existent human faces.
• RNNs handle sequences and remember past information, making them great for text
and music generation.
• VAEs compress and decompress data to generate new content—useful in creating new
handwritten digits and face reconstructions.



TRADITIONAL AI TO GENERATIVE AI
We have come a long way in the field of Artificial Intelligence and Generative AI is truly remarkable in
creating new content by learning from data. Generative AI combines a lot of concepts from traditional AI
to perform this feat. Let us understand some technology working in the background of a text-to-image
AI generator.
1. Natural Language Processing (NLP): NLP is the translator between human languages and AI.
When you type a sentence like ‘A cat wearing a hat and sunglasses, relaxing on a beach’, NLP
understands the meaning of each word—‘cat’, ‘hat’, ‘sunglasses’, ‘beach’. It also figures out how
these words relate to each other—the cat is on the beach and not wearing the beach!
2. Machine Learning (ML): ML is the learner that helps AI train on huge amounts of text and image
data. For example, if we train our AI system with thousands of pictures with captions, it learns how
to connect words with visuals. It may have seen many pictures of ‘cats’ and ‘sunglasses’ before,
and, thus, knows how they would relate.
3. Computer Vision (CV): CV acts as the ‘eyes of AI’ systems. These programs help AI see and
understand the world. Using the knowledge from all the image data that it has processed, CV can
understand the visual details that you describe—the shape of a cat, the roundness of sunglasses
and the texture of sand on a beach.

(Image generated using Text-to-Image Generators)

Putting it all together:


1. You type your prompt (sentence describing the image).
2. NLP translates your words into a format that AI understands.
3. ML uses its knowledge of text and image data to figure out what kind of image you want.
4. CV helps visualize the details that you described in your prompt.
5. Finally, the AI uses all this information to create a brand-new image based on your words!
Although this is a simplified explanation, it gives you a basic idea of how NLP, ML and CV work together
to power those amazing text-to-image generators!
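The five steps above can be sketched as a toy pipeline. Every function name here (parse_prompt, match_concepts, render_image) is a hypothetical placeholder, not a real library; a genuine text-to-image system would use trained NLP, ML and CV models at each stage.

```python
def parse_prompt(prompt):
    """Step 2 (NLP): break the sentence into lowercase words."""
    return [word.strip(",") for word in prompt.lower().split()]

def match_concepts(words):
    """Step 3 (ML): keep only words the model has 'seen' in training data."""
    known = {"cat", "hat", "sunglasses", "beach"}
    return [w for w in words if w in known]

def render_image(concepts):
    """Steps 4-5 (CV + generation): stand-in for drawing the actual image."""
    return f"<image containing: {', '.join(concepts)}>"

# Step 1: you type your prompt
prompt = "A cat wearing a hat and sunglasses, relaxing on a beach"
image = render_image(match_concepts(parse_prompt(prompt)))
print(image)  # → <image containing: cat, hat, sunglasses, beach>
```

In a real system each stage would be a large trained model, but the flow of information from words to concepts to picture is the same.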

Do it Yourself
Visit https://kidgeni.com/ and create your own text-to-image generations. Try turning
doodles into images as well and let your imagination run wild!
Caution: It is recommended to use all Generative AI tools under the supervision of your teacher or
parent.

GENERATIVE AI ALWAYS PRODUCES PERFECT RESULTS


Myth: Some believe that Generative AI always produces flawless and perfect results, without any errors
or imperfections.
Reality: Generative AI models are not infallible and can produce outputs that contain errors, biases or
inconsistencies, especially when trained on imperfect or biased datasets. Like any technology, Generative
AI has limitations and may require human oversight and intervention to ensure quality and accuracy.

EXAMPLES OF GENERATIVE AI
Let us explore some popular examples of Generative AI. All examples are provided with their
descriptions, along with links to access them.
Caution: Do not experiment with any Generative AI tool without the supervision of your teacher or
parents.

OpenAI’s ChatGPT
Description: ChatGPT is a powerful language model capable of generating
human-like text based on prompts provided by users. You can explore its
capabilities and applications by scanning the given QR code or by opening the link
https://chatgpt.com/ in your web browser.

Kidgeni
Description: Kidgeni helps kids unleash their creativity with lessons and tools that
guide them on their artistic journey. Kids can play around with different words and
ideas to create one-of-a-kind pictures that bring their imagination to life.
Access to Kidgeni is limited but you can learn more about its capabilities and see examples by
scanning the given QR code or by opening the link https://kidgeni.com in your web browser.

OpenAI SORA
Description: SORA is an upcoming Generative AI model developed by OpenAI that
specializes in text-to-video generation. OpenAI released some incredible videos,
generated completely using Artificial Intelligence, based on the capabilities of
SORA. The platform preview is available on https://openai.com/index/sora/.



MusicLM by Google
Description: MusicLM is an AI model capable of generating music in various genres and
styles. The software is described in detail on https://musiclm.com/. To create and listen to
AI-generated music samples using MusicLM, scan the given QR code or open the link
https://limewire.com/studio/audio/create-music in your web browser.

Chrome Music Lab: Song Maker


Description: Create your own songs and share them. Choose instruments, voice, beats, tempo
and melodies to create your own music. To access Song Maker for free, scan the
given QR code or open the link https://musiclab.chromeexperiments.com/Song-Maker/
in your web browser.

This Person Does Not Exist
Description: This website showcases AI-generated images of people that do
not exist. Each time you refresh the page, a new image is generated. To view
AI-generated images of fictional people on the website, scan the given QR code or
open the link https://thispersondoesnotexist.com/ in your web browser.

These examples demonstrate the diverse applications of Generative AI in various domains
such as language, art, music and image generation. You can explore these platforms and tools
to experience the capabilities of Generative AI first-hand.

FUN TIME: EXPERIENCE GENERATIVE AI
1. Gen AI in Art: Watch “The Next Rembrandt”, a project that used data
analysis and 3D printing to create a new painting in the style of
Rembrandt, on YouTube. To play, open the link or scan the QR code.
https://www.youtube.com/watch?v=IuygOYZ1Ngo
2. Gen AI in Music: Watch how AI could compose a personalized soundtrack
to your life; the YouTube link is given below. You may also scan the QR code
to play. https://www.youtube.com/watch?v=wYb3Wimn01s



FUN TIME: Use Generative AI for the following activities.
1. Musical Mashup with ChatGPT
Write a song about a festival you celebrate in India (Diwali, Holi) in the style of your
favourite Bollywood artist (A.R. Rahman, Shreya Ghoshal).
• Tell ChatGPT the festival’s name and the artist’s style.
• Ask for a song with at least two verses and a catchy chorus that captures the
festival’s spirit (e.g., joy, lights, family).
• Request the lyrics to include some words or phrases related to the festival.
2. Mythical MixUP with Kidgeni
Use Kidgeni to create an image of a well-known Indian mythical creature, e.g., Garuda,
doing something surprising in a modern setting.
• Choose your favourite mythical creature.
• Think of a funny or unexpected activity for the creature to do in a modern place,
e.g., Garuda, delivering food parcels with a drone.
• Describe the scene clearly in the Kidgeni prompt, including details about the
creature’s appearance and the modern setting.
3. Storytelling with a Twist using both ChatGPT and Kidgeni
Combine ChatGPT’s storytelling with Kidgeni’s visuals to create a unique story.
• Use ChatGPT to write the beginning of the story set in a familiar location (e.g., a
bustling bazaar, a peaceful village).
• Include a fantastical element or twist in the story prompt for ChatGPT (e.g., a
talking parrot gives advice, a magic carpet appears).
• Based on the story’s beginning, use Kidgeni to create an image that depicts the
scene and the fantastical element.
• Finish the story yourself, incorporating details from the Kidgeni image.

BENEFITS OF USING GENERATIVE AI


Generative AI offers several benefits across all domains and professions. It can enhance
creativity, provide personalized learning experiences and automate tasks. The benefits of
using Generative AI are presented below:

Benefits | Description | Examples
Enhanced Creativity and Innovation | Sparks new ideas and expands creative boundaries | Generating variations on product designs; creating new musical pieces in different styles
Increased Efficiency and Productivity | Automates repetitive tasks | Content creation (e.g., product descriptions); design iteration; code generation
Improved Personalization | Customizes content, products and services to individual choices | Advertisements related to the interests of the user; custom-designed products based on unique needs
Research and Development | Analyzes vast amounts of data to identify patterns | Faster scientific developments
Data Augmentation | Creates synthetic data similar to real data | Training AI models when real-world data is limited; addressing data scarcity issues
Scalability | Creates content rapidly | Generative AI can create suitable images, videos, audio and text rapidly, which helps in scaling up large projects
Accessibility | Converts content between formats (e.g., image to audio) | Generative AI can increase accessibility for differently abled people through novel and unique applications

LIMITATIONS OF USING GENERATIVE AI


Generative AI has remarkable capabilities but it also has several limitations. Understanding
these limitations is important for using and developing AI technologies.
1. Quality and Accuracy Issues: Generative AI can sometimes generate content that is
low quality or inaccurate. For example, AI-generated text may include factual errors
or meaningless sentences while AI-generated images may have strange or unrealistic
features.
2. Bias in Generated Content: Generative AI models learn from the data that they are
trained on. If the training data contains biases, the AI model can reproduce or even
increase these biases. This can lead to unfair or discriminatory content.
3. Ethical and Legal Concerns: Who owns the essay that ChatGPT wrote for you? Does it
belong to the creator of the AI model, the user who prompted the generation or someone
else entirely?
Using Generative AI can raise ethical and legal issues related to ownership and privacy.
For example, creating deepfakes (realistic fake videos) can be used maliciously to bully,
spread misinformation or invade someone’s privacy.
4. Computational Resources: Training and running Generative AI models requires
significant computational power and resources.
5. Limited Understanding and Control: Generative AI models can be like ‘black boxes’,
meaning that even experts don’t always understand how they arrive at their outputs.
This lack of transparency makes it difficult to control or explain the generated content.
6. Data Dependency: Generative AI relies heavily on large amounts of high-quality data for
training. If the available data is limited, biased or of low quality, the AI’s performance
will suffer.

HOW TO USE GENERATIVE AI TOOLS IN REAL-WORLD SCENARIOS


Generative AI is not just for advanced users or tech enthusiasts. Many tools are designed to be
user-friendly, making them accessible to students of all ages. Some use cases for students are
presented below:
1. Creative Writing with AI Assistance: Riya is working on a story for her English class.
She uses a language model like ChatGPT to generate ideas for characters and plot twists,
helping her overcome writer’s block and enhance her story.



2. Art and Design Projects: Aarav loves painting but wants to try new styles. He uses an AI
tool like Kidgeni to transform his photos into artwork, learning different techniques and
improving his own skills.
3. Music Composition: Neha is learning to play the guitar and wants to compose her
own songs. She uses an AI music-composition tool to generate melodies and chord
progressions, which gives her a foundation to build upon and experiment with her music.
4. Personalized Study Plans: Ravi struggles with maths but excels in science. An AI
tutoring system analyzes his strengths and weaknesses, providing customized practice
problems and lessons that focus on areas where he needs the most help.
5. Language Learning: Anjali wants to improve her English-speaking skills. She uses
an AI language model to practise conversation, get feedback on her grammar and
pronunciation, and improve her vocabulary.
6. Science Experiments and Simulations: Sanya is working on a science project
about ecosystems. She uses a Generative AI tool to create simulations of different
environmental conditions, observing how these changes affect plant and animal life.

USE CASE 1: CREATING A MULTIMEDIA STORYBOOK


Project Description: Students create a digital storybook with text, images and music.
Types of Generative AI Used:
• RNNs for Text Generation: To write the narrative of the story.
• GANs for Image Generation: To create illustrations for the story.
• VAEs for Background Music: To compose unique music pieces that set the mood for different
scenes in the story.
Example: Riya and Aarav are working on a storybook about a magical forest. Riya uses an RNN to
generate imaginative descriptions of the forest and its inhabitants. Aarav uses a GAN to create beautiful
illustrations of the characters and scenes described by Riya. Together, they use a VAE to compose
background music that enhances the reading experience. The final product is an interesting, multimedia
storybook that combines text, visuals and sound.

USE CASE 2: VIRTUAL ART EXHIBITION


Project Description: Create a virtual art exhibition featuring AI-generated art, poetry and music.
Types of Generative AI Used:
• GANs for Artwork: To create original paintings and sculptures.
• RNNs for Poetry: To write poems that accompany each artwork.
• VAEs for Music: To compose background music that enhances the atmosphere of the exhibition.
Example: Anjali and Sanya are curating a virtual art exhibition. Anjali uses a GAN to generate a series
of unique artworks. Sanya uses an RNN to write poems that complement each piece of art. They use a
VAE to compose background music that plays in the virtual gallery, creating an immersive experience
for visitors. Additionally, they generate detailed descriptions and narrations for each artwork, guiding
visitors through the exhibition. The result is an engaging virtual art gallery that uses the capabilities of
multiple Generative AIs.

Do it Yourself
THE CHATBOT WARS
Use the following 6 prompts with ChatGPT, Microsoft Bing Chat and Gemini, and compare the results.
1. Write a summary of the history of the internet.
2. Explain how to code a simple website.
3. Write a blog post about the latest trends in artificial intelligence.
4. Create a presentation about the benefits of cloud computing.
5. Write a research paper about the future of technology.
6. Design an app that solves a real-world problem.
Parameters: Which chatbot out of the above three is better on each of the following parameters
in your opinion?
Parameter 1: Human-Like Response
Parameter 2: Training Dataset and Underlying Technology
Parameter 3: Authenticity of Response
Parameter 4: Access to the Internet
Parameter 5: User Friendliness and Interface
Parameter 6: Text Processing: Summarization, Paragraph Writing, etc.
Parameter 7: Charges and Price

Socially Beneficial Uses of Generative AI


Generative AI is making waves in various fields, with some applications directly impacting
people’s lives for the better. Here are a few examples:
1. Healthcare: Google’s Med-PaLM 2 is an advanced Generative AI model
which assists healthcare by providing expert-level answers to
medical questions and consumer health queries. To know the
capabilities of Med-PaLM 2, scan the given QR code or open the link
https://www.youtube.com/watch?v=WZOAtnI8u_g in your web browser.
2. Biotechnology: ProGen, a powerful AI, is transforming
protein science, which is essential for life. ProGen
created the first-known 3D structure of an artificial
protein, designed fully by AI. By utilizing Generative AI,
this tool can design new proteins, the fundamental
components of everything in our bodies.
3. Environment: Generative AI is also helpful for
processing and analyzing real-time environmental
data, allowing for better monitoring of ecological
changes and pollution levels. This ability is
important for early detection of environmental
threats and making informed decisions. By
creating predictive models, Generative AI helps
forecast environmental trends and disasters.



4. Accessibility Tools: Several companies
are developing AI-powered captioning
and translation tools for the visually
and hearing impaired. Generative AI
can convert audio into text in real-time
for live conversations (using speech
recognition) or generate captions for
videos and online content (using image
and speech recognition). To see an
example, open the link https://otter.ai/
in your web browser.
5. Disaster Relief and Response: The World Food Programme (WFP) is piloting a project
using Generative AI to predict and plan for food shortages in disaster zones. Generative
models analyze historical data on weather patterns, crop yields and population
migration to predict potential food insecurity. This allows for earlier intervention and
resource allocation.
6. Preserving Cultural Heritage: The Library of Congress in the USA is exploring
Generative AI to restore and preserve historical records and documents. AI
can remove noise from old audio recordings, reconstruct missing portions of
images or text documents and even translate historical languages into more
widely understood versions. You can learn more about the initiatives by opening the link
https://labs.loc.gov/work/experiments/machine-learning/ in your web browser.

ETHICAL CONSIDERATIONS IN USING GENERATIVE AI


Generative AI presents a wide array of opportunities but it also raises significant ethical
questions. Understanding these ethical considerations is essential for using this technology
responsibly. Some major concerns are:
1. Bias and Discrimination: Generative AI trained on biased data can perpetuate
or amplify those biases. For example, AI may favour certain groups based on
biased hiring data. AI may also create content with stereotypical characters or
scenes, causing discrimination and thus contributing to inequality. To see how
AI image generators can add to bias, scan the given QR code or open the link
https://www.youtube.com/watch?v=L2sQRrf1Cd8 in your web browser.
2. Human Agency: Generative AI creates content that looks like it was made by humans.
As this technology gets better, it might become very difficult to tell whether something
was created by a person or a machine. This could make people feel less confident and
trusting of what they see and what they create.
3. Privacy Concerns: Generative AI can mimic real people, raising privacy issues.
Deepfakes are AI-generated videos or images that falsely depict people and are created
by training AI with personal data without consent. These practices can lead to identity
theft, harassment and loss of trust in technology.

4. Misinformation and Manipulation: Generative AI can create fake content to spread
misinformation or manipulate opinions. Examples: Fake news and AI-generated articles
with false information. Deepfakes are also used to mislead voters or harm reputations.
This can damage democracy, erode trust in the media and cause confusion.
To learn how misinformation can affect democracies, scan the given QR code
or open the link https://www.youtube.com/watch?v=LIPkDso-uHA in your web
browser.
5. Intellectual Property and Authorship: Ownership of AI-generated content is unclear.
For example, there is uncertainty over who owns AI-created art or music.
6. Plagiarism: AI-generated content may unintentionally resemble existing works.
These issues complicate legal matters and can harm creativity and innovation.

POTENTIAL NEGATIVE IMPACT ON SOCIETY


Generative AI’s potential for misuse is a growing concern for society. Here are some ways it
can negatively impact our society:
• Generative AI can create fake videos or audio that seem real, spreading false information
and damaging reputations.
• AI can create harmful content like violent images, hate speech or propaganda, which can
radicalize people or incite violence.
• AI trained on personal data can raise privacy issues, creating fake identities or
manipulating personal information. For instance, AI can generate fake social media
profiles to spread disinformation or scam individuals.
• AI can craft convincing phishing emails or spam messages that bypass traditional filters,
tricking people into revealing sensitive information.
• AI might be used to create autonomous weapons or military applications, such as
realistic training simulations or autonomous drones, targeting individuals.
• AI automating tasks like content creation or design can lead to job losses in some
sectors. Retraining and upskilling are needed to help the workforce adapt.
• AI trained on biased data can produce discriminatory content or manipulate social media
algorithms, targeting specific groups with biased information.
It is important to remember that Generative AI is a powerful tool—like any tool, it can be used
for good or bad. By being aware of the potential negative impact and taking steps to avoid
them, we can ensure that Generative AI is used responsibly and ethically.

Energy Usage Concerns


Generative AI systems are known to use a lot of energy and resources. Their hunger for power
is a significant drawback. The reasons for high resource usage of Generative AI systems are
explained below:
1. Massive Data Processing: Training Generative AI models involves processing enormous
datasets of text, images or codes. This requires powerful computers running for days or
even weeks, consuming a lot of electricity. Imagine training an AI on millions of images
to create realistic art—all that processing takes a lot of energy.



2. Complex Algorithms: The algorithms behind Generative AI are becoming increasingly
complex. While this allows for more sophisticated outputs, it also demands more
computational power, further increasing energy consumption.

Impact of High Power Consumption:
• Environmental Concerns: The high energy demands of Generative AI contribute to
greenhouse gas emissions, especially when reliant on non-renewable energy sources.
This can worsen climate change.
• Financial Burden: The cost of powering these systems can be significant, impacting
research institutions and companies that develop or use Generative AI.
• Limited Accessibility: The high energy needs can create a barrier to entry for smaller
organizations or researchers who might not have access to powerful computing
resources or sustainable energy sources.

RESPONSIBLE USE OF GENERATIVE AI


Generative AI must be used responsibly to ensure it benefits society while minimizing harm
to individuals and communities. Researchers have developed some key practices to promote
responsible use, as listed below:
1. Show, Don’t Just Tell: Developers should make AI’s processes clear, showing how
it generates content from user input. Being open about the training data helps users
understand the sources of the AI’s ideas.
2. Establish Guidelines: Just like rules on a playground, AI needs clear guidelines. Experts
from various fields like technology, law and ethics should collaborate to create standards
for responsible AI use. These guidelines should address fairness, privacy and security.
3. Educate the Public: Awareness is crucial. Teaching people to recognize AI-generated
content can prevent deception. Developers should also ensure their tools are user-
friendly and promote responsible usage.
4. Collaborate across Sectors: Solving big problems requires teamwork. Scientists,
developers and the public should engage in discussions about AI to ensure its benefits
are widely shared and that its risks are managed.
5. Maintain Human Oversight: AI should assist human judgment and not replace it. AI
tools should be designed with controls that allow users to guide and verify AI outputs. In
some cases, human review of AI-generated content before public release is advisable.

Responsible Use of AI for Students


Using Generative AI responsibly can be a great way to enhance learning but it should not
replace the basic foundations of the educational process like critical thinking, analysis and
understanding. The following are some guidelines for responsible use of Generative AI:
Supplement, Not Substitute: Think of Generative AI as a research assistant and not a homework
machine. For example, consider writing a social studies report on the history of the pyramids.
A Generative AI tool could help you find interesting facts or generate different writing styles
(a formal report or a creative story format). However, you would still need to understand the
information, analyze it critically and come up with your own conclusions.

Generative AI 151
Developing Critical Thinking Skills: Generative AI can’t replace the critical thinking skills
you develop by researching, analyzing information and forming your own arguments.
For example, if you use AI to write a persuasive essay, you may miss the opportunity to
learn about the different perspectives on the topic or develop your own writing techniques.
Understanding Concepts is the Key: Memorizing facts generated by AI won’t lead to deep
understanding. True learning happens by actively engaging with the material, asking
questions and making connections. Generative AI can help summarize information or create
visuals but it can’t replace the process of actually learning the concepts.

GENERATIVE AI IN SCIENCE EXPERIMENTS


Consider doing a science experiment on plant growth. Generative AI could help you:
• Find different articles on plant growth (supplementing your research).
• Create charts and graphs to visualize your data (presenting your findings).
• Suggest different ideas to test (brainstorming ideas).
However, you would still need to do the following:
• Understand the scientific method and how to design an experiment.
• Collect your own data and analyze it.
• Draw conclusions based on your observations.
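To see what the "collect your own data and analyze it" step can look like in practice, here is a small Python sketch. The weekly plant-height measurements are invented sample values for illustration, not real experimental results:

```python
# Analyzing your own experimental data: weekly height measurements (in cm)
# for a plant. The values are invented sample data, not real results.
from statistics import mean

heights_cm = [2.0, 3.5, 5.1, 7.0, 8.8]  # one measurement per week (assumed)

# Week-over-week growth: the difference between consecutive readings.
weekly_growth = [b - a for a, b in zip(heights_cm, heights_cm[1:])]
avg_growth = mean(weekly_growth)

print(f"Weekly growth (cm): {weekly_growth}")
print(f"Average growth per week: {avg_growth:.2f} cm")
```

An AI tool could suggest this kind of analysis or turn the numbers into a chart, but interpreting what the growth pattern means for your hypothesis is still your job.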

Generative AI is a major milestone in artificial intelligence. It provides new solutions and
creative opportunities in many areas. By learning about its algorithms, uses and ethical
issues, we can use Generative AI to solve problems and improve our lives. There are some
challenges like bias, privacy concerns and environmental impact, but its benefits are
significant. With ethical and responsible use, Generative AI can lead to a more innovative,
efficient and personalized future.

Memory Bytes
Generative AI creates new content like text, images or music from existing data.
Key algorithms used in Generative AI include GANs, VAEs and RNNs.
GANs consist of a generator and a discriminator competing with each other to produce realistic outputs.
VAEs encode data into a compressed space and then decode it back, generating new data variations.
RNNs generate sequences, making them suitable for tasks like text and music generation.
Generative AI models require large datasets and significant computational power for training.
Applications of Generative AI span creative fields like art, music and literature.
In healthcare, Generative AI helps in drug discovery and personalized treatment plans.
Generative AI enhances accessibility by creating tools for the visually and hearing impaired.
In environmental monitoring, Generative AI predicts ecological changes and disaster risks.

Generative AI aids in disaster relief by forecasting food shortages and planning responses.
Generative AI helps preserve cultural heritage by restoring and digitizing historical records and documents.
Ethical concerns include biases in the generated content due to biased training data.
Privacy issues arise from AI-generated deepfakes and unauthorized use of personal data.
Generative AI can spread misinformation through seemingly realistic yet false content.
Ownership and authorship of AI-generated content remain legally ambiguous.
High computational and energy demands pose environmental and financial challenges.
Generative AI should supplement critical thinking and learning processes and not replace them.
Transparency and human oversight are crucial in the responsible use of Generative AI.
Ethical guidelines and interdisciplinary collaboration ensure fair and secure AI development.
Generative AI fosters creativity by providing new ideas and expanding creative possibilities.
Generative AI improves efficiency by automating repetitive tasks in various industries.
Personalized learning experiences and content recommendations are enhanced by Generative AI.
Data augmentation using Generative AI helps in training models when real-world data is scarce.
Understanding and addressing the limitations of Generative AI is essential for its effective and responsible
use.

Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. What is Generative AI?
(a) AI that follows predefined rules
(b) AI that generates new content like images, videos and text
(c) AI used only in gaming
(d) AI that does not require data for training
2. Which of the following is an example of a Generative AI application?
(a) Spam filtering (b) Facial recognition
(c) Composing music in a specific style (d) Playing chess
3. What is a key characteristic of Generative AI?
(a) Limited to decision-making tasks (b) Requires minimal computational resources
(c) Can create entirely new content (d) Does not learn from data
4. Which type of model is commonly used in Generative AI for creating realistic images?
(a) Recurrent Neural Networks (RNNs) (b) Generative Adversarial Networks (GANs)
(c) Decision Trees (d) Linear Regression Models
5. What is one of the main ethical concerns related to Generative AI?
(a) Cannot learn from data (b) Uses too little data
(c) Bias in generated content (d) Only works with high-quality data

6. How can Generative AI assist in personalized learning?
(a) By standardizing all learning experiences
(b) By analyzing strengths and weaknesses to customize practice problems
(c) By replacing teachers entirely
(d) By providing only theoretical knowledge
7. Which of the following is NOT a use case of Generative AI in creative projects?
(a) Writing imaginative descriptions of a story
(b) Generating ideas for characters in a story
(c) Sorting emails
(d) Composing background music
8. What does RNN stand for?
(a) Random Network Node (b) Recurrent Neural Network
(c) Rapid Neural Network (d) Real-time Neural Network
9. Which of the following tasks is traditional AI better suited for compared to Generative AI?
(a) Creating new images (b) Composing new music
(c) Predicting stock market trends (d) Writing creative stories
10. What is a potential social impact of Generative AI?
(a) Decreasing computational power usage
(b) Improving scientific discovery processes
(c) Eliminating the need for data in AI
(d) Reducing the need for critical thinking skills
11. How should students use Generative AI responsibly in their studies?
(a) As a complete substitute for doing their homework
(b) To enhance their learning without replacing fundamental skills
(c) By relying solely on AI for all research tasks
(d) By using it to cheat in exams
12. What is one benefit of using Generative AI tools in science experiments?
(a) Avoiding understanding the scientific method
(b) Creating simulations to observe environmental changes
(c) Guaranteeing accurate experimental results
(d) Eliminating the need for human observation
13. Which AI model is used to create background music in multimedia projects?
(a) Generative Adversarial Networks (GANs) (b) Recurrent Neural Networks (RNNs)
(c) Variational Autoencoders (VAEs) (d) Decision Trees
14. What does the term ‘black box’ refer to in the context of Generative AI?
(a) A model with transparent and easily understood processes
(b) A model whose internal workings are not fully understood
(c) A model that operates without data
(d) A model that requires no human oversight
15. What is the role of human oversight in the use of Generative AI tools?
(a) To replace human judgment entirely
(b) To ensure AI outputs are guided and verified by humans
(c) To reduce the quality of AI outputs
(d) To minimize human involvement in AI-generated content

II. State whether the following statements are True or False.
1. Generative AI can create realistic images, music and text from scratch.
2. GANs are a type of Generative AI model used to generate new data samples.
3. Variational Autoencoders (VAEs) are not used for image generation.
4. Generative AI can be used for healthcare applications like disease diagnosis.
5. Deepfakes are a positive application of Generative AI.
6. Bias in training data does not affect the output of Generative AI models.
7. Generative AI can help in creating personalized study plans for students.
8. Intellectual property issues are a major concern with content generated by Generative AI.
9. Generative AI cannot be used to generate realistic human faces.
10. Privacy concerns arise when Generative AI creates content mimicking real people.

III. Fill in the blanks


1. Generative AI is capable of creating new content such as ................................, ................................
and ................................ .
2. ................................ consist of a generator and a discriminator working together to produce realistic
outputs.
3. Recurrent Neural Networks (RNNs) are particularly useful for tasks that involve
generating ................................ .
4. One ethical concern of Generative AI is the creation of seemingly realistic yet false content known
as ................................ .
5. Data ................................ using Generative AI helps in training models when real-world data is scarce.

Subjective Type Questions


Unsolved Questions:
1. Define Generative AI and explain how it differs from traditional AI.
2. Describe the structure and working principle of Generative Adversarial Networks (GANs).
3. How do Variational Autoencoders (VAEs) differ from GANs in terms of architecture and applications?
4. Explain the role of Recurrent Neural Networks (RNNs) in generating sequences. Provide an example
of an application.
5. Discuss the ethical concerns associated with Generative AI, particularly in terms of bias and privacy.
6. What are Deepfakes? Why do they pose significant ethical and security challenges?
7. How can Generative AI be used to enhance personalized learning experiences? Provide specific
examples.
8. Describe the potential impact of Generative AI on creative industries such as music, art and literature.
9. How can Generative AI contribute to environmental monitoring and disaster relief?
10. What are some legal challenges related to the ownership and authorship of AI-generated content?

ANSWERS TO OBJECTIVE TYPE QUESTIONS
Chapter–1
I. Multiple Choice Questions (MCQs):
1. (b) 2. (d) 3. (b) 4. (a) 5. (a)
6. (b) 7. (b) 8. (a & c) 9. (a) 10. (a)
11. (b) 12. (b)
II. Fill in the blanks:
1. Ethics 2. Principles of Ethics
3. Biasedness 4. AI Bias
III. True or False:
1. True 2. True 3. True 4. True 5. False

Chapter–2
I. Multiple Choice Questions (MCQs):
1. (b) 2. (c) 3. (b) 4. (a) 5. (b)
6. (b) 7. (b) 8. (b) 9. (c) 10. (b)
11. (b) 12. (b) 13. (b) 14. (b) 15. (b)
16. (b) 17. (b) 18. (b) 19. (b) 20. (b)
21. (b) 22. (b) 23. (b) 24. (b) 25. (b)
26. (b) 27. (b) 28. (c) 29. (b) 30. (c)

Chapter–3
I. Multiple Choice Questions (MCQs):
1. (b) 2. (b) 3. (a) 4. (b) 5. (b)
6. (d) 7. (b) 8. (c) 9. (b) 10. (b)
11. (b) 12. (b) 13. (b) 14. (a) 15. (d)
16. (b) 17. (c) 18. (b)

Chapter–4
I. Multiple Choice Questions (MCQs):
1. (b) 2. (c) 3. (c) 4. (b) 5. (c)
6. (b) 7. (c) 8. (b) 9. (c) 10. (b)
11. (b) 12. (b) 13. (b) 14. (b) 15. (b)
II. True or False:
1. True 2. True 3. False 4. True 5. False
6. False 7. True 8. True 9. False 10. True
III. Fill in the blanks:
1. text, images, music 2. Generative Adversarial Networks (GANs)
3. sequences 4. deepfakes
5. augmentation
