
CHAPTER 2

BASICS OF AI
In this chapter
• Challenges or Problems
• Types of AI
• Domains of AI
• Techniques of AI
• AI Project Cycle
• Steps of AI Project Cycle
○ Problem Scoping
○ Data Acquisition
○ Data Exploration
○ Modelling
○ Evaluation

Roboto: What happened, Sofia? You look very excited.
Sofia: You are right, Roboto.

GET SET GO!


You want to make a greeting card for your mother and present it to her on Mother’s Day. Write the steps you will follow to make the greeting card.

...............................
...............................
...............................
...............................

Sofia: Tomorrow is my sister’s birthday, so I want to prepare a cake. Will you help me with the recipe?
Roboto: Yes, why not! You may check the Internet for ingredient lists, method and preparation time. You may refer to different designs and flavours. Write the recipe of the cake you want to prepare with all the sub-steps so that it will be easy for you to proceed. Prepare the cake according to your recipe. I am sure you will make the tastiest cake for your sister.
Sofia: Thanks for helping me out. It will be fun.
We face many challenges or problems in our day-to-day life. Do you know that many of these can be
solved by AI projects? The projects need to be completed in steps with the knowledge of the basics of
Artificial Intelligence. These steps comprise the life cycle of a project.

In this chapter, we will learn about types of Artificial Intelligence, its domains, techniques and the steps of the
Artificial Intelligence project cycle in detail.
TYPES OF ARTIFICIAL INTELLIGENCE
Artificial intelligence is a branch of computer science that tries to replicate human intelligence in
a machine so that the machine can perform the assigned tasks. AI is run by machine learning
algorithms. Machine learning algorithms are improving day by day.

The categories of AI technologies depend on their capacity to mimic human traits, the technology they
use to do this, their real-world applications and the theory of mind. AI can be classified broadly into two
categories on the basis of its capabilities and functionalities.

Type 1: On the Basis of Capabilities


On the basis of capabilities, AI can be categorised into the following types (Fig. 1.4):

Fig. 1.4 Type 1 AI: Narrow AI (dedicated to one task), General AI (performs like a human) and Super AI (more intelligent than a human)

Narrow AI or Weak AI
Artificial Narrow Intelligence (ANI), also referred to as narrow AI or weak
AI, is goal-oriented. It is designed to perform a single task, for example,
facial recognition, speech recognition, voice assistant, driving a car,
cleaning the home (Fig. 1.5) or searching on the Internet. Machines with weak AI are intelligent enough to complete the specific task they have to do. Such machines seem to be intelligent but they operate on a limited set of specifications and parameters. Hence, they are not able to fully mimic human intelligence. Examples of weak AI are Apple’s Siri, playing chess with a computer, providing buying suggestions on an e-commerce website, cleaning the home with Roomba and image recognition.

Fig. 1.5 Robotic floor cleaner—an example of Narrow AI

General AI
Artificial General Intelligence (AGI), also referred to as deep AI, is
the concept where machines can mimic human intelligence and
behaviour. In this case, machines can learn and apply human-like
intelligence to solve a problem. AGI can think, understand and act as a human does in a given situation (Fig. 1.6). True general AI has not yet been achieved; today’s systems, such as chess programs and self-driving cars, are still forms of narrow AI.

Fig. 1.6 Example of General AI

Super AI or Strong AI
Super AI, although a hypothetical concept, refers to machines so intelligent that they can surpass human intelligence and perform any task better than a human, with cognitive abilities of their own (Fig. 1.7). However, AI scientists have not yet achieved strong AI. Some key characteristics of strong AI include the ability to think, reason, solve puzzles, make judgements and plans, and learn and communicate on its own.

Fig. 1.7 Example of Super AI

Differentiate between Narrow and General AI.

Narrow AI                                General AI
.............................            .............................
Example: ....................            Example: ....................

THINK BOT
Can you think of at least two advantages and two disadvantages of Narrow AI?

Type 2: On the Basis of Functionality

Based on functionality, AI can be categorised into the following types:

Reactive Machines
Reactive machines understand their environment and decide their actions. They do not store
memories or experiences. Hence, such machines do not learn from their experience. Examples
of such machines are IBM’s Deep Blue system and Google’s AlphaGo.

Limited-memory Machines
Limited-memory machines have characteristics of reactive machines and can also use their
experience for decision-making. Such machines can store some experience and data for a
short period of time. Self-driving cars are one of the best examples of limited memory systems.
These cars can store the recent speeds of nearby cars, the distance of other cars, their speed
limits and other information to navigate the road.

Theory of Mind
Theory of mind machines understand the emotions and beliefs of humans and are able to
interact socially like humans. Such systems have not been developed yet, but research is going on.

Self-awareness
Self-awareness machines will have consciousness, sentiments and self-awareness. These are still not developed but can be considered the future of AI.

DOMAINS OF ARTIFICIAL INTELLIGENCE


Machines can perform cognitive tasks such as thinking, perceiving, learning, problem-solving and
decision-making with the help of AI. AI is a simulation of human intelligence in a machine that is
programmed to think like humans and mimic their actions. Different domains of human intelligence
are logical, linguistic, spatial, musical, kinesthetic, interpersonal and intrapersonal. Similarly, the three
domains of artificial intelligence are as follows:
• Data

• Computer Vision (CV)

• Natural Language Processing (NLP)

Let us imagine a cloth braid made of red, blue and green cloth pieces
(Fig. 1.20). If the red one depicts data, the blue depicts natural language
processing and the green depicts computer vision, the braid is Artificial
Intelligence. So, data, computer vision and natural language processing
go side-by-side in AI. Let’s learn more about them.

Fig. 1.20 A Braid

Data
Data is the domain of AI that relates to the scientific methods, algorithms and processes that help gain information from structured as well as unstructured data. ‘Data’ is the plural form of ‘datum’. It is a collection of raw facts and figures collected from observations. As the digital world is growing, data is also growing exponentially. We have entered the era of Big Data. As per John Naisbitt, ‘We are drowning in information and starving for knowledge.’ Who will help us gain knowledge from that Big Data? That is a big question for us. There are about 1 trillion web pages on the Internet, and every day about 2.5 exabytes (2.5 × 10¹⁸ bytes) of data are generated; this figure is growing exponentially year by year. AI can use this data to develop its intelligence. Machine learning explores, studies and constructs algorithms that can learn from and make predictions on data. Such algorithms do not merely follow program instructions; they build a model from sample inputs and make predictions or decisions based on data. Machine learning algorithms can process large amounts of data and extract useful information. Big Data is time-consuming and difficult for a human to process, but it is the best fodder to train machine learning algorithms.

Data science is one of the domains of AI related to data systems and processes, in which the system collects a large amount of data, maintains data sets and derives meaning or makes sense out of them. Data is the backbone of AI; it is the food for all AI-enabled systems. Almost all AI systems depend on data. Most of this data is collected from us by the device all of us have in our hands all the time, i.e., the smartphone.

Computer Vision
Computer Vision (CV) is one of the domains of AI that trains machines for visual understanding. This happens by gaining an understanding of digital images and videos and interpreting them as humans do (Fig. 1.21). A machine can obtain and analyse visual information and afterwards make decisions about it. It acquires, screens, analyses, identifies and extracts information from a still image or video. This process helps computers to understand any visual content and act on it accordingly.

Fig. 1.21 Computer Vision

In computer vision, input to machines can be photographs, videos and pictures from thermal or infrared sensors, indicators and other sources. Face locking and unlocking in smartphones uses this feature: the smartphone’s owner sets up his/her face as the unlocking mechanism. The front camera detects and captures the face and saves its features during setup. From the next time onwards, whenever the features match, the phone is unlocked. Other applications of computer vision are self-driving cars, robotics, etc. Images and data are captured by cameras and recorders in the form of images and videos. An image is processed and manipulated to get useful information, for example, to reduce noise, control brightness and set color contrast.
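To get a feel for how a machine acquires and analyses visual information, here is a minimal sketch of face detection using the OpenCV library. It assumes the opencv-python package is installed; the file name family_photo.jpg is only a placeholder, not a file referred to anywhere in this chapter.

import cv2

# Load a pre-trained face detector that ships with OpenCV.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("family_photo.jpg")           # read the image from disk
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # detection works on grayscale

# Find rectangular regions that look like frontal faces.
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Draw a green rectangle around each detected face.
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

print("Detected", len(faces), "face(s)")
cv2.imwrite("faces_marked.jpg", image)           # save the annotated image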

Natural Language Processing


Natural Language Processing (NLP) is the domain of AI that trains machines to understand human language. The language used by a human is a natural language. NLP helps in understanding, processing and analysing data from human language, giving a machine the ability to read, process, understand and derive meaning like a human. The best and oldest example of NLP is separating junk e-mails by looking at the subject line and text. Currently, NLP is used in translation, sentiment analysis and speech recognition (Fig. 1.22).
Fig. 1.22 Natural Language Processing
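As a small illustration of the junk-mail example above, here is a minimal sketch of a text classifier built with scikit-learn. The e-mail subjects and labels are invented purely for illustration; a real spam filter would be trained on many thousands of messages.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

subjects = [
    "WIN a FREE prize now",         # junk
    "Lowest price offer, buy now",  # junk
    "Meeting agenda for Monday",    # normal
    "Your exam timetable",          # normal
]
labels = ["junk", "junk", "normal", "normal"]

vectorizer = CountVectorizer()        # turn text into word-count vectors
X = vectorizer.fit_transform(subjects)

model = MultinomialNB()               # a classic text classifier
model.fit(X, labels)

new_mail = vectorizer.transform(["Free prize waiting for you"])
print(model.predict(new_mail))        # expected to print ['junk']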

TECHNIQUES OF ARTIFICIAL INTELLIGENCE


In AI, machines perform tasks like speech recognition, problem-solving and learning. This refers to machines, especially computers, working like humans. Machines can work and act like a human if they have enough information, data and learning. So, in Artificial Intelligence, knowledge engineering plays a vital role. The relation between objects and their properties is established to implement knowledge engineering. Below are some techniques of Artificial Intelligence.


• Automation: Robotic Process Automation (RPA) or automation occurs when repetitive tasks with a
rule-based approach are automated through software or hardware systems.
• Machine Learning: Machine Learning (ML), a term coined by Arthur Samuel in 1959, means ‘the ability to learn without being explicitly programmed.’ Here, the programming is minimised and the machine learns from experience and actions like a human. It is the science of getting a computer to act without being explicitly programmed. Instead of hand-coding software libraries with well-defined, specific instructions for a particular task, the machine is trained using large amounts of data and algorithms, and in turn gains the capability to perform specific tasks. The thought process and the automation of predictive analysis are taken further in Deep Learning, which is a subset of Machine Learning.

Fig. 1.9 Timeline of AI, ML and DL

There are three types of Machine Learning algorithms, which are as follows:

Supervised Learning: This algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output, for example, how the inputs ‘time in years’ and ‘interest rates’ predict housing prices. In this algorithm, you know how to classify the input data and the type of behaviour you want to predict, but you need the algorithm to calculate it for you on new data. An example of this algorithm is face recognition. Facebook’s system is trained in such a way that it can recognise the face of a person from a group or a single portrait photograph: it takes a photo, finds the faces in it and guesses who is in the photo (suggests a tag), which is a supervised process.

Unsupervised Learning: This algorithm explores input data without being given an explicit output variable, for example, exploring customer demographic data to identify patterns. In this algorithm, you do not know how to classify the data and you want the algorithm to find patterns and classify the data for you. An example of this algorithm is the analysis of buying habits. The buying habits saved in a database are used by the unsupervised learning algorithm to group the customers into similar purchasing segments. This helps companies to target these grouped segments and can even resemble recommender systems.

Reinforcement Learning: This algorithm learns to perform a task simply by trying to maximise the rewards it receives for its actions, for example, maximising the points it receives for increasing the returns of an investment portfolio. In this algorithm, you do not have a lot of training data, you cannot clearly define the ideal end state, or the only way to learn about the environment is to interact with it. An example of this algorithm is games: one of the most common places where reinforcement learning is used is in game-playing systems like AlphaZero and AlphaGo.

Fig. 1.10 Machine Learning Algorithms (supervised, unsupervised and reinforcement learning)
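The following minimal sketch, assuming scikit-learn and NumPy are installed, contrasts supervised learning (inputs with known outputs) and unsupervised learning (inputs only). The housing and customer figures are invented purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: the inputs AND the known output (price) are given.
X = np.array([[5, 7.0], [10, 6.5], [15, 6.0], [20, 5.5]])   # [years, interest rate]
y = np.array([50, 65, 80, 95])                               # house price

reg = LinearRegression().fit(X, y)
print("Predicted price:", reg.predict([[12, 6.2]]))          # predict for new inputs

# Unsupervised learning: only inputs are given; the algorithm finds groups itself.
customers = np.array([[22, 15], [25, 18], [47, 60], [52, 65]])  # [age, monthly spend]
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print("Customer segments:", segments)                        # e.g. [0 0 1 1]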

Which type of learning do you think is the best fit for anomaly detection in a pattern?

• Deep Learning: It is a type of Machine Learning that can process a wider range of data resources,
requires less data processing by humans and can often produce more accurate results than
traditional machine learning approaches. In Deep Learning, interconnected layers of
software-based calculators known as ‘neurons’ form a neural network. The network can
ingest vast amounts of input data and process them through multiple layers that learn
increasingly complex features of the data at each layer. The network can then decide about
the data, learn if its determination is correct and use what it has learned to make
determinations about new data. For example, once it learns what an object looks like, it can recognise the object in a new image. (A small sketch of data flowing through layered ‘neurons’ is given after this list.)

• Machine Vision: This technology gives eyes to a machine. Here, cameras are used to capture visual information, which is converted from analog to digital form, and digital signal processing is employed to process the data. The resulting data is then fed to a computer. Examples are signature identification, pattern recognition, medical image analysis, etc.

• Natural Language Processing: The language used by a human is a natural language. The field of AI which gives machines the ability to read, process, understand and derive meaning like a human is Natural Language Processing (NLP). The best and oldest example of NLP is separating junk e-mails by looking at the subject line and text. Now, it is used in translation, sentiment analysis and speech recognition.

• Robotics: This is the field of engineering that focuses on the design and manufacturing of robots. Robots are used to perform difficult and repetitive tasks, for example, on assembly lines for car production.

Write down AI, ML and DL at the proper place in the diagram.

............................
....................
.................
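To make the idea of layered ‘neurons’ mentioned under Deep Learning concrete, here is a minimal sketch of one input passing forward through two layers, written with plain NumPy. The weights here are random, whereas a real deep learning system would learn them from large amounts of data.

import numpy as np

def relu(x):
    # Activation function: keeps positive signals, zeroes out negative ones.
    return np.maximum(0, x)

rng = np.random.default_rng(0)
x = np.array([0.5, 0.8, 0.2])                    # one input example with 3 features

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # layer 1: 3 inputs -> 4 neurons
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)    # layer 2: 4 neurons -> 2 outputs

hidden = relu(x @ W1 + b1)                       # each layer transforms its input...
output = hidden @ W2 + b2                        # ...and feeds the next layer
print("Network output:", output)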

AI PROJECT CYCLE


There are many problems which we face in our daily life. They could be small or big, sometimes ignored or sometimes even critical. Many times we become so used to a problem that it becomes a part of our life, as we are not able to think of its solution.

FACT BOT
According to the Cambridge dictionary, a problem is a situation, person or thing that needs immediate attention and needs to be dealt with or solved.

The meaning of the challenge or problem itself defines the solution required for that particular statement/situation/condition which stops us from achieving our goal.

As you know, 17 Sustainable Development Goals (SDGs) were announced by the United Nations to be achieved by 2030. But many factors like the economy, funds, government support, public support, etc. pose hurdles to achieving these goals. To achieve them, we need to plan solutions to the problems that come in the path. One of the biggest examples is the COVID-19 pandemic, which made day-to-day life challenging for all. We have dealt with many problems, and these problems are the challenges in the achievement of goals. Any of these problems can be selected as a project.

THINK BOT
What is your goal in life? Think about the problems that can be an obstacle in your path. What can you do to deal with these obstacles between you and your goal?

To plan a viable solution to a problem, we need to go through a step-by-step process. To go through these steps, we need to select a project. A project is nothing but a series of tasks that need to be completed to reach a specific outcome. It can also be defined as a set of inputs and outputs required to achieve a particular goal. Projects can range from simple to complex and can be managed by one person or a team.

Many problems in the world can be solved by AI projects. But as defined earlier, these projects need to
be completed in a series of phases. These phases are a part of the life cycle of a project.

CHECK BOT
Look around you and observe. Can you name some problems which people around you are
facing these days? Can you help them in any way? Write any five ways.

1. ................................................ 2. ................................................ 3. ................................................

4. ................................................ 5. ................................................

STEPS OF AI PROJECT CYCLE
To solve a problem with AI, we need to do tasks in steps, one after the other. These tasks help us to complete an AI project. To bake a cake, we followed the steps in sequence. Similarly, the AI project cycle provides an appropriate framework that can lead us towards the goal. There are mainly five stages of the AI project cycle (Fig. 2.1): Problem Scoping, Data Acquisition, Data Exploration, Modelling and Evaluation. Let’s learn about these stages in detail.

Fig. 2.1 AI Project Cycle

PROBLEM SCOPING
Problem scoping refers to identifying a problem and having a vision to solve it. The whole project
depends on this. This stage is time consuming. To define a problem and define the scope of that
problem is not an easy task. We need to have a deeper understanding of the problem so that the
picture becomes clearer while we are working to solve it.

Let’s take an example to understand this stage.

Imagine that the world’s largest and most precious diamond is in danger as Mr Ray has threatened to
steal it. Mr Ray is very notorious for his stealing capabilities. Till now, no one has been able to track
him. So, the situation is very critical. You have been appointed as the Chief Security Officer and your
job is to enhance the security of the diamond to make the area impossible for Mr Ray to break into and
steal the diamond. Now that you are aware of AI concepts, plan to use them in accomplishing your
task.

Start with listing down all the factors which you need to consider while framing a security system.

This system aims to ..........................................................................................................................................

.............................................................................................................................................................................

.............................................................................................................................................................................

Without knowing all the factors, it is quite difficult to frame a problem statement covering who is going to use the system, what it should do, and so on.

Steps of Problem Scoping


The different steps involved in problem scoping are:

Setting Goals
Identifying and defining goals is the first and the foremost step in any planning process and AI is no
exception to this. Goals are the objectives that an AI project needs to achieve. Goals also help in
looking for the reasons why a problem exists. They aid in minimising the various challenges faced in
finding a solution to the problem. Identifying and defining goals beforehand set the basis on which we
define actions. These goals need to be:

• Specific: Specific goals enable the planning team to ensure what is to be achieved and give a clear
vision to the executing team to ensure how they are to be achieved. They also reduce the chances
of uncertainties that may arise at a later stage.
• Measurable: The goals must be measurable. This helps in comparing the actual performance with
the planned outcome. For example, if the goal statement is ‘to increase the sales from the present
level of 50,000 pens per month’, then the goal is not measurable as the exact number that needs to
be increased is not defined. A more appropriate goal will be to increase the sales from the present
level of 50,000 pens per month to 75,000 pens per month.
• Achievable: The goals and objectives must be achievable by keeping in mind the available resources.
• Realistic: The goals and objectives must be easy to deliver, especially if you face problems or
complications. This is important because these problems will reduce the overall quality of the
project’s outcome and lead to running over budget and not meeting the set deadlines.
• Time-bound: This is important so that the project does not overstep the allocated time frame.

Identifying the Stakeholders


It is important to identify the stakeholders to a problem before developing any model. Stakeholders
refer to the person or group of persons who are likely to be affected by a given problem and its
outcome (both success as well as failure). These persons may be organisations, government, NGOs,
employees, customers, suppliers, trade unions, society, etc. They may be external or internal to
the problem. They may range from a few to public at large. While determining a person to be a
stakeholder, it is important to define the category under which that person may fit. It is important to
identify who all can affect the problem and who all will be affected by the outcome of the problem.

Identifying the Existing Measures


Some research may already exist on scoping similar problems. These are the known existing measures for a problem. They can be obtained from the documentation of existing research. A knowledge of this research is also necessary to understand the gap between the current knowledge and the expected future outcome. It helps in understanding the perspective from which the problem has already been analysed, and the change in perspective needed for the proposed research about the problem. A knowledge of contradictions between two or more existing perspectives also proves helpful in arriving at a suitable future course of research.

In short, there is no need to reinvent the ‘wheel’, rather the focus must be on improving the durability of
the ‘wheel’.


Identifying the Ethical Concerns


Ethical concerns refer to the moral conflict between what an individual needs and what a society
needs. Examples of ethical concerns can be a violation of legal rules, regulations, power, honesty,
professional morals, respect, rights of local society, etc.

It is important to identify moral concerns that may arise at the time of implementing a course of action
in the planning stage itself. These moral concerns may be identified by getting in touch with the
stakeholders about the problem and its outcome.

Let us now start scoping a problem. Look around you and select a theme which interests you the most. For more options, you can also refer to the 17 Sustainable Development Goals. For example, in health, there are medicinal aid, mobile medications, the spread of diseases, etc., all being very different from each other but still a part of the health theme. Thus, to effectively understand the problem and elaborate on it, we need to select one topic under the theme and write the problems based on that topic.
CHECK BOT
Select any one theme from your surroundings which interests you the most. Some of the suggested themes are: Environment, Agriculture, Traffic, Infrastructure, Health, Digital Literacy, Security, Education, Women Safety, Cyber Security, Transport, Entertainment, Travel and Tourism, Disability, Social Welfare and Research.

Now, answer the following questions:
1. Write the name of your selected theme.
2. Why did you select this theme?
3. List various topics related to your theme.
4. List down the problems which come under these topics.

We use the 4Ws Problem Canvas to help us out in problem scoping.

4Ws Problem Canvas


The 4Ws Problem Canvas is as follows (Fig. 2.2):

Who? What? Where? Why?

Fig. 2.2 4Ws Problem Canvas

The 4Ws Problem Canvas helps you in identifying the four crucial parameters related to the problem. Let us go through each of the blocks one by one.

Who?
The ‘Who’ block helps you to analyse the people getting affected directly or indirectly due to this
project. Under this, you find out who the ‘Stakeholders’ to this problem are and what you know about
them. Stakeholders are the people who face the problem and would be benefited with the solution.
These people are the best describers who can help you to define the problem as they are the ones
facing the problem.

What?
Under this block, you must know what you have in hand. At this stage, you determine the nature of
the problem. What is the problem and how do you know that it is a problem? Under this block, you
also gather evidence to prove that the problem you have selected actually exists. Newspaper articles,
media, announcements, etc. are examples.
Where?
Now that you know who is associated with the problem and what the problem actually is, you need to focus on the context/situation/location of the problem. This block will help you look into the situation in which the problem arises, the context of it and the locations where it is prominent.

Why?
Finally, you have all the major elements that affect the problem directly. Now, it is easy to understand
who the people that would be benefited by the solution are; what is to be solved; and where will the
solution be deployed. These three canvases now become the basis of why you want to solve this
problem. Thus, in the ‘Why’ canvas, think about the benefits which the stakeholders would get from the
solution and how it will benefit them as well as the society.

CHECK BOT
In the earlier activity, you have chosen one theme and one topic. Based on that topic, fill in
the 4Ws Problem Canvas.
Let us fill the ‘Who’ canvas!

• Who are the stakeholders?


.......................................................................................................................................................................
• What do you know about them?
.......................................................................................................................................................................
Let us fill the ‘What’ canvas!
• What is the problem?
.......................................................................................................................................................................
• How do you know that it is a problem?
.......................................................................................................................................................................
Let us fill the ‘Where’ canvas!
• What is the context or situation in which the stakeholders experience the problem?
.......................................................................................................................................................................
• Where is the problem located?
.......................................................................................................................................................................
Let us fill the ‘Why’ canvas!
• Why do you believe it is a problem worth solving to the stakeholders?
.......................................................................................................................................................................
• How will the solution improve their situation?
.......................................................................................................................................................................

After filling the 4Ws problem canvas, you now need to summarise all the 4Ws into one template. This
template is called the problem statement template (Fig. 2.3). The problem statement template helps
us to summarise all the key points into one single template so that in the future whenever there is a
need to look back at the basis of the problem, we can take a look at the problem statement template
and understand the key elements of it.

Problem Statement Template with space to fill details according to your goal:

Our [stakeholder(s)] ..................................................................... (Who)
has/have a problem that [issue, problem, need] ........................................... (What)
when/while [context, situation] .......................................................... (Where)
An ideal solution would [benefit or solution for them] ................................... (Why)

Fig. 2.3 Problem Statement Template
After observing these factors, you will get clarity on the issue to be solved, which leads you towards data acquisition.
THINK BOT
From where can you collect the data for your project? Is the data images, text, numbers or all of these?

DATA ACQUISITION
The next stage of the AI project cycle is about acquiring data for the project. Data is a piece of information or facts and statistics that need to be collected for reference or analysis. In an AI project, a large amount of data is needed to train it. This data is collected from various sources.

For example, suppose you want to make an artificially intelligent system that can predict wheat crop production using past weather, soil and climate data. The data on which the AI machine is trained is known as training data. Once training is complete, the system is ready to predict wheat crop production; the new data on which it makes its predictions is known as testing data.
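A minimal sketch of the training and testing data described above, using scikit-learn; the rainfall, temperature and yield figures are invented purely for illustration.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# [rainfall (mm), average temperature (deg C)] for past seasons
weather = [[500, 24], [620, 22], [450, 27], [700, 21], [550, 25], [480, 26]]
wheat_yield = [30, 38, 25, 42, 33, 27]          # production in quintals

# Training data teaches the model; testing data checks its predictions.
X_train, X_test, y_train, y_test = train_test_split(
    weather, wheat_yield, test_size=0.33, random_state=1
)

model = LinearRegression().fit(X_train, y_train)
print("Score on unseen (testing) data:", model.score(X_test, y_test))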
The stage of acquiring data from the relevant sources is known as data acquisition. The quality of data is very important. Authentic and relevant data should be used to train the machine.

GET IT RIGHT
Data can be collected from random websites as huge data is available online. This is not true. The data available on websites can be fake. Always collect data from authentic sources.

Features of Data
Features of data refer to the type of data required to train a particular machine. Data can be collected from various sources like the Internet, sensors, mobile devices, etc. Authentic and correct data can be collected from government portals. The data which we collect must be open-sourced and not someone’s property. Data collected without abiding1 by copyright policies from private sources can be offensive2.

1 Abiding by: following or complying with
2 Offensive: Rude in a way that causes somebody to feel upset
Some of the ways to collect data (Fig. 2.4) are surveys, web scraping, sensors, cameras, observations and APIs (Application Program Interfaces).

Fig. 2.4 Some Ways to Collect Data
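As one illustration of collecting data through an API, here is a minimal sketch using the Python requests library. The URL is a placeholder, not a real endpoint; an actual project would use a documented, authentic data source.

import requests

url = "https://example.com/api/weather"        # hypothetical open-data endpoint
params = {"city": "Delhi", "days": 7}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()                    # stop if the request failed

records = response.json()                      # many APIs return data as JSON
print("Number of records received:", len(records))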

Machine learning is a subset of AI. This discipline relies on the data we have just studied to perform AI training in a supervised or unsupervised way. Data is the most valuable resource for training the machine. The more authentic and relevant data you have, the better you can train the machine. In both cases, the most important factor is not the learning process but the quality of data. Data acquisition is the process of collecting this data.

The data that you collect yourself rather than the data collected from another party is primary data.
This is collected by survey. As you know, mobile is one of the most important devices to collect data.
The sensors of the device, cameras and browsing history or digital footprints create good data. That
data is the most authentic as it is collected by the day-to-day transactions. It cannot be fake. That’s why
data is called the new gold of this era.

Types of Data
Different types of data are used in developing AI projects. There are various criteria on the basis of which data can be classified. The most basic criterion for classification of data is as follows:

• Quantitative data: Quantitative data is also known as numeric data. It represents numerical values (i.e., how much, how often, how many) and gives information about the quantities of a specific thing. Examples of numerical data are height, length, size, weight, and so on. Quantitative data can be classified into two different types, based on the data sets: continuous data and discrete data.
○ Continuous data: Data that can take any numerical value is called continuous data. It has an infinite number of probable values that can be selected within a given specific range. For example, a temperature range, 10.5 kg and 200.50 km.
○ Discrete data: It is information that can only take certain values; for example, the number of students in a class can only be a natural number, not a decimal or fraction.

Fig. 2.5 Types of Data: Categorical or Qualitative Data (Nominal Data, Ordinal Data) and Numerical or Quantitative Data (Discrete Data, Continuous Data)
• Qualitative data: It is also known as categorical data. It describes data that fits into categories and is not numerical. Categorical information involves categorical variables that describe features such as a person’s gender, home town, etc. Categorical measures are defined in terms of natural language specifications, not in terms of numbers. Sometimes, categorical data can hold numerical values (quantitative values), but those values do not have mathematical sense. Examples of categorical data are birth date, favorite subject, city pin code, etc. Here, the birth date and city pin code hold quantitative values, but they do not carry numerical meaning.
The two different classifications of qualitative data are nominal data and ordinal data.
○ Nominal data: It is one of the types of qualitative information which helps to label the variables
without providing the numerical value. It is also known as nominal scale. It cannot be ordered
and measured. But sometimes, the data can be qualitative and quantitative. Examples of
nominal data are letters, symbols, words, gender, etc.
○ Ordinal data: Ordinal data/variable is a type of data which follows a natural order. The
significant feature of ordinal data is that the difference between the data values is not
determined. This variable is mostly found in surveys, finance, economics, questionnaires, and
so on.
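To see the quantitative and qualitative categories described above side by side, here is a minimal sketch using pandas on a small invented table.

import pandas as pd

students = pd.DataFrame({
    "height_cm": [150.5, 162.0, 158.2],   # continuous quantitative data
    "siblings": [1, 0, 2],                # discrete quantitative data
    "gender": ["F", "M", "F"],            # nominal qualitative data
    "grade": ["B", "A", "C"],             # ordinal qualitative data (A > B > C)
})

print(students.dtypes)                     # numeric vs object (text) columns
print(students["height_cm"].mean())        # maths makes sense for quantitative data
print(students["gender"].value_counts())   # categories are counted, not averaged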
Another criterion based on which data can be classified is the structure of the data. On this basis, data
can be classified as follows:

• Structured data: Structured data has a specific pattern or set of rules. It has a simple structure and is stored in specific forms such as tables. Examples are a cricket scoreboard, a school timetable, an exam datesheet, etc.
• Unstructured data: Data which does not have any specific pattern or constraints and can be stored in any form is known as unstructured data. Most of the data that exists in the world is unstructured data. Examples are YouTube videos, Facebook photos, etc.
• Semi-structured data: It is a combination of both structured and unstructured data. Some of the data may have a structure like a database, whereas other parts may only have markers and tags to identify the structure of the data.
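A minimal sketch contrasting structured and semi-structured data in Python; the scoreboard rows and video metadata below are invented purely for illustration.

import json

# Structured data: every row follows the same fixed columns (like a scoreboard).
scoreboard = [
    {"player": "Rohit", "runs": 45, "balls": 30},
    {"player": "Virat", "runs": 72, "balls": 55},
]

# Semi-structured data: tags and keys give some structure, but fields can vary.
video_metadata = json.loads("""
{
  "title": "My holiday vlog",
  "tags": ["travel", "family"],
  "comments": [{"user": "asha", "text": "Nice!"}]
}
""")

print(scoreboard[0]["runs"])        # predictable, table-like access
print(video_metadata["tags"])       # flexible, nested access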
Some other types of data can be as follows:

• Useless data: Useless data is unique and discrete data with no relationship with the result. For
example, in a list of people eligible to vote, a column having ‘Yes’ in all rows of the column is more
or less useless.
• Time-stamped data: It helps the system to predict the next best action. It follows a specific time-
order to define the sequence. This time can be the time of data captured or processed or collected.
• Machine data: It is the result or output of a specific program, system or technology. It consists of
data related to a user’s interaction with the system like the user’s logged-in session data, specific
search records, user engagement such as comments, likes and shares, etc.
• Binary data: Binary data is made up of only two categories: 0 for ‘off’ or ‘no’ and 1 for ‘on’ or ‘yes’.
This type of data is very common in the study of AI as the machine understands only binary
information. Binary data is a very common outcome variable in classification problems in AI. For
example, an AI model developed to predict whether a tumour is malignant or benign may give the
result in form of binary data.
• Spatiotemporal data: It contains information related to geographical location and time. It records
the location through GPS and time-stamped data where the event is captured or data is collected.

• Open data: It is freely available data for everyone. Anyone can reuse this kind of data.
• Real-time data: The data which is available with the event is considered as real-time data.
• Time data: It is used to denote features like daily, weekly, fortnightly, monthly, annually, etc. This
data plays an important role while developing an AI model to schedule activities as a step of
machine intelligence.
• Interval data: Interval data has a gap between the data entries. This data is represented in the form
of groups like age groups (0–2 years, 2–5 years, 5–8 years, etc.), income groups (Less than 50,000,
50,000 to 1,00,000, etc.). This type of data is a more precise measurement scale data.

There is one more type of data, i.e., Big Data. Let’s learn about this.

Big Data
Big Data can be defined as a concept used to describe a large (huge) volume of data, which are both
structured and unstructured, and that gets increased day by day by any system or business. It is data
that is quite large and complex. The hugeness and complexity of Big Data can be imagined keeping in
mind that none of the traditional tools of data management can efficiently store and process this
Big Data.

For example, the daily data generated by stock exchanges run into terabytes (TB) each day. One can
imagine the hugeness of data in a year or a decade. The daily data generation in the form of messages,
audios, videos and animations, on social media platforms like Facebook and WhatsApp is more than 500
TB. A jet engine generates data of more than 10 TB in a 30-minute long flight. Imagine the data
accumulated by jet engines if there are thousands of flights each day. The data is sure to run into
Petabytes (PB).
Benefits of Big Data Processing
The ability to process Big Data brings in many benefits such as follows:

• Better decision-making: Big Data analytics has boosted the decision-making process to a great extent. Rather than making decisions blindly, companies now consider Big Data analytics before concluding any decision.
• New product development: Using Big Data analytics, trends of customer needs and satisfaction can
be analysed. This can further help to develop a whole new product according to their requirements.
• Reduction in cost: Using Big Data tools like Hadoop and Cloud-based analytics, cost saving in
business can be done. In business, when large amounts of data are there, then these tools help
to handle and maintain that data in more efficient ways.
• Businesses can utilise outside intelligence while making decisions: The businesses can access
data from social networking sites and fine-tune their business strategies according to the needs
and demands of society.
• Improved customer service: Traditional customer feedback systems are getting replaced by new
systems designed with Big Data technologies. In these new systems, Big Data and natural language
processing technologies are being used to read and evaluate consumer responses.
• Early identification of risk to the product/services: The use of huge data resources helps in
identifying the expected product or service-related risk at the earliest possible stage.
• Better operational efficiency: In case of new additional data, Big Data systems are used to identify and segregate relevant data from the huge volume of new data. Such integration of Big Data systems and data warehouses helps an organisation to off-load infrequently accessed data.

Uses of Big Data


Big Data technology has immense uses. Some of the practical uses and applications of Big Data are
as follows:

• Education: The education industry is over-flooded with huge volumes of data covering details about
students, teachers, universities, courses, grades, educational resources, etc. These huge volumes
of data have helped the education industry towards the development of customised and dynamic
learning programs, reframing course material, advanced grading systems and career growth.
• Healthcare: The healthcare industry is also over-flooded with huge volumes of data covering details
about patients, doctors, hospitals, diseases, remedies, precautions, medicines, etc. The use of Big
Data in healthcare industries has resulted in reducing costs of treatment, preventive measures to
be taken in case of epidemic outbreaks, identifying and preventing the growth of malignant
diseases at an early stage and recommendation of evidence-based medicines.
• Government: The government of every country has to work on huge volumes of data. It has to
keep a record of citizens, GDP, energy resources, geographical surveys, infrastructure, sector-wise
growth, growth prospects, etc. This Big Data helps the government to introduce welfare schemes,
security of data, identify areas of attention, meet national challenges like terrorism, unemployment,
poverty, overpopulation, etc.
• Media and Entertainment: The media and entertainment industry have immense data in the form of
photos, videos, audio, animations, reviews, comments, etc. The social media platforms have added
to the existing data resources of the industry. These Big Data resources have helped the industry
to predict the interest of the audience, scheduling online streaming, analysis of customer reviews,
audience-targeted advertising, etc.
• Meteorology: The meteorological departments have data flowing over many years towards weather
trends of different regions in the country and across the globe. These environment-related Big Data
have helped in weather forecasting, studying global warming, disaster prediction and management,
patterns of disasters, availability of resources like wind, water, etc.
• Transportation: The transportation industry has used Big Data to plan travel routes, manage traffic
congestions, identification of accident-prone areas, increasing traffic safety levels, etc.
• Banking: The banking sector data has skyrocketed with digitalisation of the industry. The data
comprises details of customers, banks and their branches, employee codes, account numbers and
balances, details of services like credit cards, overdrafts, time deposits, etc. The industry has used
Big Data to offer better customer services round the clock, prevent misuse of credit/debit cards,
clarity of business proposals, prevention of money laundering, mitigation of customer risk, etc.

Parameters of Big Data


The parameters of Big Data can be expressed in the form of 4Vs: Volume, Variety, Velocity and
Variability.

• Volume: The name Big Data itself means the data which is bigger than the ordinary data. The size
of data plays an important role in classifying the data as Big data or not. If the data generated runs
into TB but is only one-time and is not growing, then it cannot be classified as Big Data.
• Variety: Big Data gets accumulated from various sources. So, such data has a lot of variety
amongst the individual records. As it is a collection of structured data (like databases and
spreadsheets) as well as unstructured data (like e-mails, PDFs, photos, videos, audios, etc.), it is most
useful for analytical purposes.
• Velocity: Velocity means the speed of the generation of data. Big Data is generated and processed
at an extraordinary speed. Big Data deals with the speed at which data flows in from sources like
business processes, application logs, networks, social media sites, sensors and mobile devices,
etc. The flow of data is colossal3 and continuous.
• Variability: Variability is the inconsistency shown by the data that may arise upon the introduction
of data from varied sources. This sometimes lessens the data processing speed as the machine
initially synchronises the data internally to give meaningful results.

Challenges of Big Data


The challenges facing the use and implementation of Big Data need to be understood before starting a
Big Data-based AI project. Some of the challenges of Big Data are:

• As a famous English proverb says, ‘A little knowledge is a dangerous thing’. Incomplete and
insufficient knowledge of Big Data may put the success of the entire project in jeopardy4.

3
Colossal: Extremely large
4
Jeopardy: In a dangerous position or situation and likely to be lost or harmed
• It cannot be guaranteed that the Big Data collected and analysed are totally (100%) accurate.
Redundant data, contradicting data or incomplete data are challenges that remain within it.
• Over-variety in Big Data may result in confusion and the selection of irrelevant data.
• A huge investment is involved in developing a Big Data model. Leaving a project incomplete may
result in huge losses.
• Sometimes, the less-trained managers may not be able to understand the complexity and quality
of Big Data. As a result, the outcome of the project may be different from what was planned or
targeted.
• The higher the volume of the data, the bigger the security risk. So, Big Data applications involve
cybersecurity risks and need to be plugged well in advance.
• The process of converting Big Data into valuable results is tricky and needs to be handled by well-
qualified and suitably trained personnel.
• The generation of such a massive amount of data needs space for storage and organisations face
challenges to handle such extensive data without suitable tools and technologies.
• The managing of Big Data is already a complex process. The upscaling or diversification of industry
using Big Data poses a greater challenge.
• Data growing at such a high velocity makes it difficult to extract insights from it. There is no 100% efficient way to filter out the relevant data.

Sources of Data
Data can be obtained from various sources. All the sources of data (Fig. 2.6) can be broadly classified
into primary sources and secondary sources.

Fig. 2.6 Sources of Data (Primary Sources: Interview, Survey, Observation, Experimental; Secondary Sources: Internal, External)

• Primary sources: These are the sources of data where the user collects the data on their own accord. This data is collected directly from the statistical units. The various techniques used for the collection of data from primary sources are:
○ Interview: In this case, the data is collected by interviewing the target audience. The person who asks the questions is called the interviewer and the person who answers is known as the interviewee. Some basic business- or product-related questions are asked, the responses are noted down in the form of notes, audio or video, and this data is stored for processing. Interviews can be both structured and unstructured, like personal interviews or formal interviews conducted over the telephone, face-to-face or by e-mail.
○ Survey: In this case, a list of relevant questions are asked and answers are noted down in the
form of text, audio or video. The survey method can be obtained in both online and offline
modes like through website forms and e-mail. Then these survey answers are stored for
analysing. Examples are online surveys or surveys through social media polls.
○ Observation: In this case, the researcher keenly observes the behaviour and practices of the target audience using some data-collecting tool and stores the observed data in the form of text, audio, video or any raw format. In this method, the data is collected directly by observing the participants rather than by posing questions to them. For example, observing a group of customers and their behaviour towards the products. The data obtained is then sent for processing.
○ Experimental: In this case, data is collected through performing experiments, research and
investigation. These outcomes help in arriving at a final solution to the problem. For example,
recently, the vaccine for COVID-19 has been developed by experimentation. Some commonly
used experiment methods are Completely Randomised Design (CRD), Randomised Block Design
(RBD), Latin Square Design (LSD) and Factorial Design (FD).
• Secondary sources: These are the sources of data where the researcher uses data from the
previously conducted research. The data may be obtained from internal sources (previous research
conducted by the management) or external sources (research conducted by another organisation,
government or NGO).
○ Internal sources: When data is collected from reports and records of the organisation itself, they are known as internal sources. Examples of internal sources are an organisation’s own sales records, employee records and past reports.
○ External sources: When data is collected from sources outside the organisation, they are
known as the external sources. Examples of external sources are government publications,
news publications, Registrar General of India, planning commission, international labour bureau,
syndicate services and other non-governmental publications.
Special care needs to be taken in case of using secondary sources about the authenticity and validity of
the data being used.

DATA EXPLORATION
While collecting data, we can notice that all data is not in the same format; for example, date formats can be dd/mm/yy, dd/mm/yyyy, etc. The data we have collected from various sources is full of numbers, and the quantity of data is so large that one cannot make sense of it directly. To take a fruitful decision from data, we need to summarise it and identify the trends and patterns in it. By observing the patterns, one can make decisions easily. Suppose you want to find some content; you may go to the library. There are many books. Can you read or select all the books in one go? No, we cannot do so; we browse the index first, and only if we find some interesting content do we choose a book for reading. Similarly, to analyse the data, we need to visualise it in user-friendly formats so that the trends and patterns can be found easily.

Data visualisation refers to presenting data in the form of pictures or graphs. It helps the users to
see and analyse data visually. In this way, they can compare existing patterns, identify new patterns
and understand the relationship between different variables. AI makes the visualisation of data
interactive. This further enables the user to reformat the visuals according to one’s need and levelof
understanding. Using charts or graphs helps the user to visualise large amounts of complex data
better than studying spreadsheets or reports. It is a quick and easy way to convey concepts. It also
helps in quickly identifying the areas that need attention or improvement. It clarifies and pinpoints the
factors that influence customer behaviour. It helps in the proper allocation and placement of productsto
achieve maximum objectives. Data visualisation also helps in predicting future trends of sales, profits,
production, etc.
Data Visualisation Tools
There are various tools available to convert textual data into visual data. Some of the effective and
popular data visualisation tools are as follows:

• Excel: It is the most basic and the most widely used data visualisation tool for basic purposes. It
cannot be used for Big Data visualisation but is effective for smaller organisations with a small set
of data.
• Tableau: It is a business intelligence tool for visually analysing the data. Users can create and
distribute an interactive and shareable dashboard, which depict the trends, variations and density
of the data in the form of graphs and charts. It can connect to files, relational and Big Data sources to
acquire and process data. The software allows data blending and real-time collaboration which
makes it very unique. It is used by businesses, academic researchers and many government
organisations for visual data analysis.
• Datawrapper: It is used to visualise data as beautiful charts, maps and tables. The visualisations
created using Datawrapper are device adjustable. This means that the visuals automatically adjust
to the change in device from mobile to tablet to a personal computer.
• Visme: It is an online graphic tool to create charts, graphs, impressive infographics, clean and
professional slides, social media graphics, and so much more.
Some commonly used data visualisation graphical tools are:

• BoxPlot: A boxplot is used to display the summary of a set of data values having properties like minimum, first quartile, median, third quartile and maximum. In the box plot, a box is created from the first quartile to the third quartile, and a vertical line goes through the box at the median (Fig. 2.7).

Fig. 2.7 BoxPlot
• Histogram: A histogram is a graphical representation of data using bars of different heights. In a histogram, each bar groups numbers into ranges; the x-axis denotes the data ranges to be plotted while the y-axis shows the frequency distribution. Taller bars show that more data falls in that range (Fig. 2.8).
• Heat Map: A heatmap is a graphical representation of data in two dimensions, using colors to demonstrate different factors. Brighter, reddish colors are used to represent more common values or higher activity, and darker colors are preferred to represent less common or lower-activity values. Heat maps are most useful when examining a large number of values (Fig. 2.9).

Fig. 2.8 Histogram    Fig. 2.9 Heat Map
• Line Chart: It is the simplest technique. It is used to plot the relationship or dependence of one
variable on another. For example, the price of different flavours of chocolates varies, which we can
represent with the help of this graph (Fig. 2.10).
• Bar Charts: Bar charts are used for comparing the quantities of different categories or groups.
Values of a category are represented with the help of bars and they can be configured with vertical
or horizontal bars, with the length or height of each bar representing the value (Fig. 2.11).

Fig. 2.10 Line Chart Fig. 2.11 Bar Chart

• Pie Chart: It is a circular statistical graph that uses slices or sectors to illustrate numerical
proportion. Each slice or sector denotes a proportionate part of the whole. Pie charts are generally
used to compare the parts of a whole and are most effective when there are limited components
and when text and percentages are included to describe the content (Fig. 2.12).
Fig. 2.12 Pie Chart

• Scatter Chart: A scatter chart is a two-dimensional plot representing the joint variation of two
data items. Scatter plots are used for examining the relationship, or correlation, between the X and
Y variables (Fig. 2.13).
Fig. 2.13 Scatter Chart
• Bubble Chart: A bubble chart displays the values of three
numeric variables, where each observation’s data is
shown by a circle (bubble), while the horizontal and
vertical positions of the bubble show the values of two
other variables. It is a variation of scatter chart, in which
the data points are replaced with bubbles
(Fig. 2.14).
Fig. 2.14 Bubble Chart
• Timeline Chart: Timeline charts are infographics used to show the progression of a particular
event or activity over time. They offer a visual description of the past, present and future condition
of a given matter. They are an essential tool that helps project leaders, corporate managers,
historians and scientists keep track of time and educate others on the subject. Some uses of
timeline charts include showcasing historical events in a creative way and presenting a business
project’s plans (Fig. 2.15).
• Treemap: Treemaps are visualisations for hierarchical data. They are made of a series of nested
rectangles of sizes proportional to the corresponding data value. A large rectangle represents a
branch of a data tree, and it is subdivided into smaller rectangles that represent the size of each
node within that branch (Fig. 2.16).

Fig. 2.15 Timeline Chart Fig. 2.16 Treemap

The various types of graphical representations are used for representing different parameters of
data. The graphical representation makes the data understandable for humans as we can discover
trends and patterns out of it.
THINK BOT
If data which is not cleaned or optimised is fed into an AI model, will it affect the system? How?
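The chart types discussed above can be produced with common plotting libraries. Below is a minimal,
illustrative sketch using Python’s matplotlib (assuming it is installed); the marks and subject names are
invented purely for demonstration and are not part of the chapter’s examples.

# A small, hypothetical dataset: marks scored by students in a class test
import matplotlib.pyplot as plt

marks = [12, 15, 15, 18, 20, 22, 22, 23, 25, 28, 30, 35]
subjects = ["Maths", "Science", "English", "Hindi"]
average_marks = [22, 19, 25, 21]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: how many students fall in each range of marks
axes[0].hist(marks, bins=5)
axes[0].set_title("Histogram of marks")

# Bar chart: comparing average marks across subjects
axes[1].bar(subjects, average_marks)
axes[1].set_title("Bar chart of averages")

# Box plot: minimum, quartiles, median and maximum of the marks
axes[2].boxplot(marks)
axes[2].set_title("Box plot of marks")

plt.tight_layout()
plt.show()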

MODELLING
Data in the computer is stored in the most basic form of numbers (which is binary: 0s and 1s). But
when we talk about discovering patterns and trends in data, the machine goes for mathematical
representations of the same. The ability to mathematically describe the relationship between
parameters is the heart of every AI model. Thus, whenever we talk about developing AI models, it is
the mathematical approach towards analysing data which we refer to. Generally, AI models can be
classified (Fig. 2.17) as follows:
Fig. 2.17 Classification of AI Models (Rule Based and Learning Based)

Rule-based Approach
A rule-based approach is generally based on the data and rules fed to the machine, where the
machine reacts accordingly to deliver the desired output (Fig. 2.18). A rule-based Artificial Intelligence
produces pre-defined outcomes that are based on a set of certain rules coded by humans. These are
very simple systems ruled by if-then statements.
Fig. 2.18 Rule-based Approach (Data and Rules fed to the machine produce Answers)
The two major components of rule-based Artificial Intelligence models are ‘a set of rules’ and ‘a set
of data’. You can develop a basic AI model with the help of these two components.
Rule-based approach refers to the AI modelling where the relationship or patterns in data are defined
by the developer. The machine follows the rules or instructions mentioned by the developer and
performs its task accordingly.
For example, suppose you have a dataset comprising 200 apple images and 200 banana images
(Fig. 2.19). In order to train a machine, feed this data into the machine and label each image as either
apple or banana. Now, if you test the machine with the image of an apple, it will compare the image
with the trained data and, according to the labels of the trained images, identify the test image as an
apple. This is known as the rule-based approach. The rules given to the machine in this example are
the labels given to the machine for each image in the training dataset.
Fig. 2.19 An Example of a Rule-based Approach (a machine trained using a labelled dataset is tested with testing data and identifies the image as APPLE)
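To make the idea of if-then rules concrete, here is a minimal sketch in Python. The rule (based on
colour and shape) and the function name are hypothetical and only illustrate how a developer-defined
rule produces the output; they are not taken from the image example above.

# A hypothetical rule written by the developer: the machine only follows it.
def identify_fruit(colour, shape):
    # Rule 1: red or green and round -> apple
    if colour in ("red", "green") and shape == "round":
        return "apple"
    # Rule 2: yellow and long -> banana
    if colour == "yellow" and shape == "long":
        return "banana"
    # No rule matched
    return "unknown"

# Testing data: the machine reacts according to the rules fed to it
print(identify_fruit("red", "round"))    # apple
print(identify_fruit("yellow", "long"))  # banana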

Let’s understand the rule-based approach with the help of some models.

• Regression: In regression, the algorithm generates a mapping function from the given data,
represented by the solid line. The dots shown in the graph are the data values and the solid line
represents the mapping done for them (Fig. 2.20). With the help of this mapping function, we can
predict future data. To apply the regression modelling technique, we need continuous data. For
example, if we want to predict the salary of an employee, we can use his past salaries as training
data and can predict his next salary.
• Classification: In classification, the algorithm can determine which set a given data point belongs
to by utilising a classification function represented by the dotted line (Fig. 2.21). The model
classifies datasets according to the rules given to it. Usually, the datasets used for classification
are labelled, and the data then gets sorted according to their labelling. Testing data is then
classified as one of the labels of the training dataset. For example, if we want to train a model to
identify if an image is of a guitar or a piano, we need to train it with multiple images of both guitar
and piano along with their labels. The machine will then classify images based on the labels and
predict the correct label for testing data. Classification works on discrete datasets.
Fig. 2.20 Regression Model Fig. 2.21 Classification Model
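As an illustration of regression on continuous data, here is a minimal sketch using scikit-learn
(assuming the library is installed). The experience and salary figures are invented for demonstration
and are not from any real dataset.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: years of experience vs. salary (in thousands of rupees)
experience = np.array([[1], [2], [3], [4], [5]])
salary = np.array([300, 350, 410, 460, 520])

# Fit a mapping function (the "solid line") to the data
model = LinearRegression()
model.fit(experience, salary)

# Predict future data: expected salary after 6 years of experience
print(model.predict(np.array([[6]])))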

Learning-based Approach
Under the learning-based approach, the machine is fed with data and the desired output, to which
the machine designs its own algorithm (or set of rules) to match the data to the desired output fed
into the machine (Fig. 2.22). Learning-based approach refers to the AI modelling where the
relationship or patterns in data are not defined by the developer. In this approach, random data is
fed to the machine and it is left to the machine to figure out patterns and trends out of it. Generally,
this approach is followed when the data is unlabelled and too random for a human to make sense
out of it. Thus, the machine looks at the data, tries to extract similar features out of it and clusters
similar datasets together. In the end, as output, the machine tells us about the trends which it
observed in the training data.
Fig. 2.22 Learning-based Approach (Data and Answers fed to the machine produce Rules)
GET IT RIGHT
An algorithm that is once designed cannot be altered without human intervention.
This is not true, as machines work on the data fed in them.

To understand this, we will take the earlier example of apples and bananas. Suppose a dataset has
200 images of apples and 200 images of bananas, and these images are labelled as apples and
bananas. The apples and bananas may be of different shapes and sizes. The algorithm is designed
in such a way that it can identify apples and bananas on the basis of their features, so it can predict
whether a given image belongs to apple or banana. After training, the machine is fed with testing
data. This data may not contain the same or similar images, but the machine can adapt itself and
identify the correct label for the new image fed to it.

Let’s understand this approach with the help of a model.


Clustering
• Clustering: This is a machine learning approach where the machine generates its own rules or
algorithms to differentiate amongst the given dataset to achieve the pre-decided goal. The data fed
to such a model is usually unlabelled or random, and thus the developer feeds the data directly into
the machine and instructs it to build its own algorithm. The machine then finds patterns or trends
in the training dataset and clusters the ones which follow the same pattern (Fig. 2.23). The output
rules might be very different from what was expected, as the machine has its own way of
recognising patterns. For example, if you have random data about the stray cows which live in your
locality and you are unable to find any meaningful pattern amongst them, you would feed their data
into the clustering algorithm. The algorithm would then analyse the data and divide them into
clusters according to their similarities, based on the trends noticed. The clusters are then given as
the output. Clustering works on discrete datasets.
Fig. 2.23 Clustering Model
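Below is a minimal clustering sketch using scikit-learn’s KMeans (assuming the library is installed).
The two-dimensional points about cows are invented purely for demonstration; the machine is given
no labels and groups the points by similarity on its own.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled data: (weight in kg, daily milk in litres) of stray cows
data = np.array([
    [150, 2.0], [160, 2.5], [155, 2.2],   # one natural group
    [240, 6.0], [250, 6.5], [245, 6.2],   # another natural group
])

# Ask the machine to find 2 clusters on its own, without any labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)

print(labels)                   # e.g. [0 0 0 1 1 1]: cluster assigned to each cow
print(kmeans.cluster_centers_)  # the centre of each discovered cluster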

Decision Tree
A decision tree is a powerful and popular tool used for prediction and classification. This method
uses a set of declarative rules as an input for generating a decision tree. It has a flowchart-like tree
structure and uses a top-down approach. Each node in the tree acts as a test case for some attribute,
and each edge descending from that node corresponds to one of the possible answers to the test
case. It is an inverted tree: the root is at the top and the leaves are at the bottom. An instance is
classified by starting at the root node of the tree, testing the attribute specified by this node, and then
moving down the tree branch corresponding to the value of the attribute. Tree-based methods
empower predictive models with high accuracy, stability and ease of interpretation. They are
considered to be among the best and most used supervised learning methods. A decision tree can
be utilised for both classification and regression types of problems.
Some common terms used for decision tree are as follows:

• Root node: It represents the entire sample that is further divided into two or more homogeneous
sets.
• Splitting: It is a process by which a node is divided into two or more sub-nodes.
• Decision or interior node: It is the node where the splitting takes place. In other words, it is a place
where the sub-node is divided into other sub-nodes.
• Leaf node or terminal node: It is the node with no child node.
• Branch or Sub-tree: A subsection of the decision tree is known as a branch or sub-tree.
• Parent node and child node: The bottom node which derives from the top node is known as the
child node whereas the top node is known as the parent node.
The decision tree is made up of various nodes. These nodes are the parts of a decision tree. They are
as follows:

• Decision Node: It represents a decision, typically shown with a square.


• Chance Node: It represents probability or uncertainty, shown in a circle.
• End Node: It represents the result or outcome, shown in a triangle.
The following things should be followed while making the decision tree:
• Observe your data.
• Decide what (data) will be the root of the decision tree.
• Decide what (data) will be at the leaves.
• Now, analyse the data properly and prepare the decision tree.
Example of the decision tree: A company is offering you a job and you are in a dilemma whether to
accept or decline the offer (Fig. 2.24). With this decision tree, it will become easier to decide.
Fig. 2.24 Decision Tree of Accepting a New Job Offer or Not (root node: salary at least ₹50,000;
decision nodes: commute more than 1 hour, provides cab facility; leaf nodes: accept offer, decline offer)
The advantages of the decision tree are as follows:

• It can generate understandable rules.


• It performs classification without requiring much computation.
• It can handle both continuous and categorical variables.
• It provides a clear indication of the fields which are most important for prediction and
classification.
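As a rough illustration only, a small decision process like the job-offer example can also be expressed
in code, with each if-statement acting like a decision node and each return acting like a leaf. The
function name, inputs and exact branch order below are assumptions for demonstration and may not
match Fig. 2.24 branch-for-branch.

# Each 'if' acts like a decision node; the returned strings are the leaf nodes.
def job_offer_decision(salary, commute_hours, provides_cab):
    if salary < 50000:        # root node: is the salary at least ₹50,000?
        return "decline offer"
    if commute_hours <= 1:    # decision node: is the commute more than 1 hour?
        return "accept offer"
    if provides_cab:          # decision node: does the company provide a cab facility?
        return "accept offer"
    return "decline offer"

print(job_offer_decision(salary=60000, commute_hours=2, provides_cab=True))    # accept offer
print(job_offer_decision(salary=40000, commute_hours=0.5, provides_cab=False)) # decline offer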

EVALUATION
Once a model has been made and trained, it needs to go through proper testing so that one can
calculate the efficiency and performance of the model. Hence, the model is tested with the help of
testing data which was separated from the acquired dataset at the Data Acquisition stage.
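A common way to carry out this testing step is to hold back part of the dataset and measure how
often the trained model predicts the held-back labels correctly. The sketch below uses scikit-learn
(assuming it is installed); the fruit measurements and labels are hypothetical and only demonstrate
the idea of evaluating on data kept aside from training.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset: [weight in grams, length in cm], labelled 0 = apple, 1 = banana
X = np.array([[150, 7], [160, 8], [170, 7], [120, 18], [115, 19],
              [125, 20], [155, 8], [165, 7], [118, 19], [130, 21]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1])

# Separate testing data from training data (kept aside at the Data Acquisition stage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a model on the training data only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate: compare predictions on unseen testing data with the true labels
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))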

A C T IV IT Y B OT
Let’s play JAM (Just a minute). Put some slips in a bowl and divide the class into two groups.
One student from each group will pick a slip from the bowl and speak for one minute on the selected topic.
The bowl can have the following topics:

• Stages of the AI project cycle • Decision tree
• Delay reason for a project • Modelling
• Different models • Evaluation
• AI model approaches
TALK BOT
What stages do you follow when you prepare a science project on a windmill?

AI G L O S S A R Y
• Algorithm: Step-by-step procedure to solve a problem by following certain rules
• Automation: Occurs when repetitive tasks with a rule-based approach are automated through
software or hardware systems
• Robotics: The field of engineering that focuses on the design and manufacturing of robots

• Problem scoping: To identify a problem and the vision to solve it


• Data acquisition: The stage of acquiring data from the relevant sources
• Quantitative data: Numerical value (i.e., how much, how often, how many)
• Qualitative data: Data that fits into the categories and which is not numerical
• Big Data: A concept used to describe a large (huge) volume of data, both structured and
unstructured, that keeps increasing day by day in any system or business.

• Data visualisation: Presenting data in the form of pictures or graphs


• Rule-based approach: An approach that is generally based on the data and rules fed to the machine,
where the machine reacts accordingly to deliver the desired output.

• Root node: The entire sample that is further divided into two or more homogeneous sets
• Splitting: A process by which a node is divided into two or more sub-nodes
• Decision node: A node where the splitting takes place
• Leaf node or terminal node: Node with no child node
• Branch or Sub-tree: A subsection of the decision tree
• Parent node: The top node is known as the parent node
• Child node: The bottom node which derives from the top node
• Chance nodes: Representation of probability or uncertainty, shown in a circle
• End nodes: Representation of the result or outcome, shown in a triangle

AI SU M M A R Y
• AI can be classified broadly in two categories, on the basis of their capabilities and on
functionalities.
• Narrow AI or weak AI is designed to perform a single task.
• General AI is the concept where machines can mimic human intelligence and behaviour.
• Super AI makes machines so intelligent that they can surpass human intelligence and can perform any task
better than a human, with cognitive properties.
• The three domains of Artificial Intelligence are: Data, Computer Vision and Natural Language Processing.
• There are three types of Machine Learning algorithms: Supervised Learning, Unsupervised Learning and
Reinforcement Learning.
• Deep Learning is a type of Machine Learning that can process a wider range of data resources and requires less
data processing by humans.
• Machine Vision technology gives eyes to a machine.

• To solve a problem with AI, we need to do some tasks in a series of phases or steps. These phases or steps
are a part of the life cycle of a project.
• Problem scoping refers to the identification of a problem and the vision to solve it.
• Identifying and defining goals is the first and the foremost step in any planning process.
• It is important to identify the stakeholders of a problem before developing any model.
• Stakeholders are the people who face the given problem and would be benefited with the solution.
• We use the 4Ws problem canvas for problem scoping. The four blocks are ‘Who’, ‘What’, ‘Where’ and
‘Why’.

• The stage of acquiring data from the relevant sources is known as data acquisition.
• Quantitative data represents the numerical value (i.e., how much, how often, how many) and gives
information about the quantities of a specific thing.

• Qualitative data are not numerical.


• Big Data is a concept used to describe a large (huge) volume of data, which are both structured and
unstructured, and that gets increased day-by-day.

• Data can be obtained from various sources. All the sources of data can be broadly classified into
primary sources and secondary sources.

• Data visualisation helps the users see and analyse data visually.
• Some of the data visualisation tools are Excel, Tableau, Datawrapper, Visme, etc.
• A rule-based approach is generally based on the data and rules fed to the machine, where the
machine reacts accordingly to deliver the desired output.

• In regression, the algorithm generates a mapping function from the given data, represented by the
solid line.

• In classification, the algorithm can determine which set a given data point belongs to by utilising a
classification function represented by the dotted line.

• Under the learning-based approach, the machine is fed with data and the desired output, to which the machine
designs its own algorithm (or set of rules) to match the data to the desired output fed into the
machine.

• Clustering is a machine learning approach where the machine generates its own rules or algorithms
to differentiate amongst the given dataset to achieve the pre-decided goal.

• A decision tree is a powerful tool used for predictions and classification. It is a flow control like a
tree structure.

AI LAB
AI Tool: AutoDraw
AutoDraw is a new kind of drawing tool. It pairs machine learning with drawings from talented artists to
help everyone create anything visual. It is the fastest way to draw. One of its features is that
there is nothing to download and it is available free of cost. It works quite well on all devices like
smartphones, tablets, laptops, desktops, etc.

Installing and launching the AutoDraw tool

The steps to install and launch the AutoDraw drawing tool are as follows:

Step 1: Open the web browser on your computer.

Step 2: Type the following URL in the address bar of the browser:

https://experiments.withgoogle.com/autodraw

You will get the following screen:


Step 3: Click on the LAUNCH EXPERIMENT button. You will get the following screen:

Step 4: Click on the Start Drawing button to start this. You will get a blank drawing canvas.

Select tool
Auto Draw tool
Draw tool
Type tool
Fill tool
Shape tool
Color tool
Zoom tool
Undo tool
Delete tool

Step 5: Now, click on the AutoDraw pencil / Draw tool and start drawing.
Step 6: When you draw an image using the AutoDraw pencil, it suggests many images which
are created by artists. You can select any of the images. The main idea is, you have to draw a
rough sketch and it will display a number of suggested images at the top. Now, you can
choose any of the images which suits your needs.

Step 7: You can fill color in your drawing using the Fill tool.

Step 8: You can also type the name of your drawing using the Text tool.
Step 9: You can draw different shapes using the Shape tool and can fill different colors in them.

Make beautiful drawings and enjoy painting!

AI CA SE S TU D IES REAL-LIFE CONNECT


Decision tree: The following is the best example to connect the decision tree with our real life.
It shows how we decide our life. The decision tree is just the manifestation of the pictorial
presentation of the decision process.
Root: Am I hungry?
  Yes → Have ₹60?
    Yes → Go to restaurant
    No → Buy a burger
  No → Go to sleep
(The branches connect the nodes; ‘Go to restaurant’, ‘Buy a burger’ and ‘Go to sleep’ are the leaves.)

At the root, the first condition is plotted, which is about the question: hungry or not? If yes, then again a
condition: do you have ₹60? If your answer is yes, then you can go to the restaurant, and if no, then you
can buy a burger. If you are not hungry, then you can go to sleep.

Based on this decision tree, answer the following questions:

1. How many branches does the tree shown above have?

2. How many leaves does the tree shown above have?

3. How many nodes are there in the shown tree?


B L E N d BOT ENGLISH
Collect the population data of India for the last 10 years. Draw a chart and try to find the pattern
in the data.
EX ER CI S E B O T
A. Tick (✓) the correct option.
1. Which of the following is an example of strong AI?
a. Chess-playing b. Facial recognition c. Making judgements and plans d. Cleaning home
2. Which of the following is not a domain of AI?
a. Data b. Computer Vision c. NLP d. Machine learning
3. What does NLP stand for?
a. Neutral Learning Projection b. Neuro-Linguistic Programming
c. Natural Language Processing d. Neural Logic Presentation
4. Face recognition is an example of which type of Machine learning algorithm?
a. Supervised b. Unsupervised c. Reinforcement d. All of these
5. Which of the following is not a part of the AI project cycle?
a. Problem facing b. Data acquisition c. Data exploration d. Evaluation
6. Which of the following is not the requirement of the goals setting?
a. Realistic b. Specific c. Not achievable d. Measurable
7. Which of the following is/are the step of problem scoping?
a. Setting goals b. Identifying the stakeholders
c. Identifying the existing measures d. All of these
8. Which one of the following is the first stage of an AI project life cycle?
a. Data Acquisition b. Modelling c. Problem scoping d. Evaluation
9. Which of the following is not a part of 4Ws problem canvas?
a. Who b. What c. Where d. Which
10. Which of the following is not a way to collect data?
a. Random b. Survey c. Sensors d. Observation
11. Which of the following is the most basic criterion for the classification of data?
a. Numerical b. Nominal c. Ordinal d. All of these
12. Which of the following is not a parameter of Big Data?
a. Volume b. Variety c. Velocity d. Verbal
13. Which of the following is not a data visualisation tool?
a. Excel b. Word c. Tableau d. Visme
14. The advantages of the decision tree are:
a. Decision tree can generate understandable rules.
b. Decision tree performs classification without requiring much computation.
c. Decision tree can handle both continuous and categorical variables.
d. All of these

B. Fill in the blanks.


1. Narrow AI is designed to perform a ................................... task.
2. ______________ machines understand the emotions and beliefs of humans and are able to interact socially like
humans.
3. ________________ is a collection of raw facts and figures collected from observations.
4. ________________ technology gives eyes to a machine.
5. There are mainly .................................... stages of the AI Project Cycle.
6. Identifying and .................................... is the first and the foremost step in any planning process.
7. .................................... concerns refer to the moral conflict between what an individual needs and what a
society needs.
8. The stage of acquiring data from the relevant sources is known as ................................... .
9. .................................... data has a simple structure and stores the data in specific forms such as
tabular form.

10. .................................... data is used to denote features like daily, weekly, fortnightly, monthly,
annually, etc.
11. Velocity means the .................................... of the generation of data.
12. The .................................... sources are the sources of data where the user collects the data on its
own accord.
13. .................................... allows users to create and distribute an interactive and shareable dashboard,
which depict the trends, variations, and density of the data in the form of graphs and charts.
14. .................................... node represents the result or outcome, shown in a triangle.

C. Write T for True or F for False.


1. Artificial General Intelligence (AGI) is the concept where machines can mimic human intelligence and behaviour. ...........
2. Reactive machines can use their experience for decision-making. ...........
3. NLP is the domain of AI that trains machines to understand human language. ...........
4. Computer Vision is the field of engineering that focuses on the design and manufacturing of robots. ...........
5. To define a problem and define the scope of that problem is an easy task. ...........
6. The goals must be defined keeping in mind the real-life complications that may arise at a
later stage of the problem. ...........
7. Ethical concerns refer to the moral conflict between what an individual needs and what a
society needs. ...........
8. The 4Ws problem canvas is ‘Who’, ‘What’, ‘Where’ and ‘Why’. ...........
9. Textual type of data is also referred to as quantitative data. ...........
10. Unstructured data is data that is organised in a pre-defined manner. ...........
11. Higher the volume of the data, the bigger the security risk. ...........
12. Secondary sources are the sources of data where the researcher uses data from the
previously conducted research. ...........
13. Heat maps are most useful when examining a large number of values. ...........
14. Regression algorithm does not generate a mapping function from the given data, represented
by the solid line. ...........

D. Short answer questions.


1. How is general AI better than weak AI?
2. What is robotics?
3. What is Computer Vision? Can you say it is the eye of a computer? Why or why not?
4. Write the names of all the stages of AI project cycle.
5. How many steps are there in problem scoping? Write their names.
6. What are the known existing measures to a problem?
7. What is 4Ws problem canvas?
8. What is problem statement template?
9. What are the most basic criteria for the classification of data? Write about them.
10. Write any two parameters of Big Data.
11. Define classification rule-based AI model.
12. What is clustering approach?
13. What is a decision tree?

E. Long answer questions.


1. Explain weak, general and strong AI.
2. What are the domains of AI? Explain about each.
3. Differentiate among the three types of machine learning algorithms.

4. What is AI project life cycle? What are the steps involved in the AI project life cycle? Explain in brief.
5. Write about different steps involved in the problem scoping.
6. What are the features of data?
7. Classify data on the basis of its structure.

8. What do you mean by Big Data? Write any four benefits of it.
9. Write the uses of Big Data in healthcare, banking and education.
10. What are the challenges of Big Data?
11. Explain the sources of collecting data.
12. What is data visualisation? Write about some of the popular data visualisation tools.
13. Write about the models of rule-based approach.
14. What is decision tree? Which nodes are the parts of a decision tree? Explain.

F. Skill-based questions.
1. Explore about SDGs and select any one out of them. Write the problem statement for the same. Also,
define data acquisition and exploration. Draw at least one graph for the collected data. Which learning
approach is used for your AI project modelling? Problem-solving skill
2. Write about any one problem from your surrounding and then write the possible solution for that. Write
the factors which may affect the solution of the problem. Write the factors in the form of questions
and answers. Write the consequences, if the answer is yes. Draw a decision tree for the same.
Critical-thinking skill
3. Rahul is depressed as he had not performed well in exams. Monika, his friend, wants to help Rahul out
of this depression. She asks Rahul to write the goals of his life on paper with all the possible aspects
and consequences. She also asks him to write what will happen if he is not able to achieve the goals
and how his life will be affected if he achieves his goals. She also asks him to prepare a table for the
above problem, solution, consequences and aspects, and prepare a tree chart. Then, she explains that
he is defeated in achieving only one but, there are many ways which go to his goals. What values has
Monika shown by helping Rahul? Social skill

IN TH E AI LAB
1. Explore the following websites for decision trees. These websites will help you draw the decision tree,
online.
• Creately
• SmartDraw
• Lucid Chart
Take any problem and try to draw a decision tree by using any of the three websites.
2. Type the following URL in the address bar of the browser:
https://teachable-snake.netlify.app/
This is the web link of an interactive web game ‘Teachable Snake’. To control the snake movement, you
need to draw the arrow on the paper instead of using physical buttons to control the game. For this, you
can draw a black arrow on a piece of white paper as a controller and move the snake by turning the paper in
different directions in front of the webcam. Try moving the snake and enjoy the game.
3. The data can be visualised by the Datawrapper tool. It allows you to visualise the data in various forms.
Type the following URL in the address bar of the browser:
https://app.datawrapper.de/chart/Gb0eb/upload
Upload some data and visualise it.

A I PROJECTS
1. As Artificial Intelligence gets incorporated in various industries, the employability of unskilled labour
reduces day-by-day. A lot of global reports and surveys have predicted mass unemployment in the near
future due to emerging technologies. Try to identify the problem in this scenario after collecting the
required details. Draw a 4Ws canvas for this problem.
2. Record the sounds of birds which chirp in free environment. After this, try to observe caged birds. Now,
explore the clips by listening to them carefully. Now, answer the following questions:
a. Do the sounds of free and caged birds sound similar?
b. Can you identify any difference in the sounds of free and caged birds?
c. Can you predict if a bird is caged or not just by listening to its chirping?
This project explains how an Artificially Intelligent machine is able to predict answers according to the
data on which it is trained.
3. The following is a dataset comprising four parameters which lead to the prediction of whether a tiger
would be spotted or not. The parameters which affect the prediction are:
Outlook, Temperature, Humidity and Wind. Draw a decision tree for this dataset.
Outlook Temperature Humidity Wind Tiger Spotted?
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
CONCEPTBOT
AI PROJECT CYCLE
• Problem scoping: Steps of problem scoping (Setting goals, Identifying the stakeholders, Identifying the
existing measures, Identifying the ethical concerns); 4Ws problem canvas (Who, What, Where, Why)
• Data acquisition: Features of data; Types of data; Big Data; Sources of data (Primary sources,
Secondary sources)
• Data exploration: Data visualisation tools (Excel, Tableau, Datawrapper, Visme); Graphical tools
(BoxPlot, Histogram, Heat map, Line chart, Bar chart, Pie chart, Scatter chart, Bubble chart,
Timeline chart, Treemap)
• Modelling: Rule-based, Learning-based, Decision tree
• Evaluation

For Teachers
You may explain the concept of the AI project cycle. Before going into detail, first select any one problem. Then, discuss
all the five stages for this problem. Explain each stage in detail. Discuss the 4Ws canvas and encourage students to
answer all the 4Ws. Explain the meaning of Big Data. Encourage them to give examples of Big Data from their
surroundings.

For Parents
You may ask your child to list the most important life goals and their planning to achieve these goals. Ask them to
write the process and stages which must be followed to achieve these goals.
