AI Chapter2
BASICS OF AI
In this chapter
• Types of AI
• Domains of AI
• Techniques of AI
• AI Project Cycle
• Steps of AI Project Cycle
• Problem Scoping
• Data Acquisition
• Data Exploration
• Modelling
• Evaluation
Roboto: What happened, Sofia? You look very excited.
Sofia: You are right, Roboto.
Sofia: Tomorrow is my sister’s birthday, so I
want to prepare a cake. Will you help me with the
recipe?
Roboto: Yes, why not! You may check on the
Internet for ingredient lists, method and
preparation time. You may refer to different
designs and flavours.
Write the recipe of the cake you want to prepare
with all the sub-steps so that it will be easy for
you to proceed. Prepare the cake according to your recipe.
I am sure you will make the tastiest cake for your sister.
Sofia: Thanks for helping me out. It would be fun.
We face many challenges or problems in our day-to-day life. Do you know that many of these can be
solved by AI projects? The projects need to be completed in steps with the knowledge of the basics of
Artificial Intelligence. These steps comprise the life cycle of a project.
In this chapter, we will learn about types of Artificial Intelligence, its domains, techniques and the steps of the
Artificial Intelligence project cycle in detail.
TYPES OF ARTIFICIAL INTELLIGENCE
Artificial intelligence is a branch of computer science that tries to replicate human intelligence in
a machine so that the machine can perform the assigned tasks. Many modern AI systems are driven by machine learning
algorithms, which are improving day by day.
The categories of AI technologies depend on their capacity to mimic human traits, the technology they
use to do this, their real-world applications and the theory of mind. AI can be classified broadly into two
categories on the basis of its capabilities and functionalities.
Narrow AI or Weak AI
Artificial Narrow Intelligence (ANI), also referred to as narrow AI or weak
AI, is goal-oriented. It is designed to perform a single task, for example,
facial recognition, speech recognition, voice assistant, driving a car,
cleaning the home (Fig. 1.5) or searching on the Internet. Machines with
weak AI are intelligent enough to complete the specific task they have to
do. Such machines seem to be intelligent but they operate on a limited set
of specifications and parameters. Hence, they are not able to fully mimic
human intelligence. Examples of weak AI are Apple’s Siri, playing chess
with a computer, providing buying suggestions on an e-commerce
website, cleaning home with Roomba and image recognition.
Fig. 1.5 Robotic floor cleaner: an example of Narrow AI
General AI
Artificial General Intelligence (AGI), also referred to as deep AI, is
the concept where machines can mimic human intelligence and
behaviour. In this case, machines can learn and apply human-like
intelligence to solve a problem. AGI can think, understand and act as
a human does in a given situation (Fig. 1.6). No existing system has achieved general AI yet: chess-playing computers
and self-driving cars, though impressive, are still examples of narrow AI.
Fig. 1.6 Example of General AI
Super AI or Strong AI
Super AI, although a hypothetical concept, makes machines so
intelligent that they can surpass human intelligence and can perform
any task better than a human with cognitive properties (Fig. 1.7).
However, AI scientists have not yet achieved strong AI. Some key
characteristics of strong AI include the ability to think, reason, solve
puzzles, make judgements and plans and learn and communicate on
its own.
Fig. 1.7 Example of Super AI
Differentiate between Narrow and General AI.
Narrow AI General AI
Example: Example:
Reactive Machines
Reactive machines understand their environment and decide their actions. They do not store
memories or experiences. Hence, such machines do not learn from their experience. Examples
of such machines are IBM’s Deep Blue system and Google’s AlphaGo.
Limited-memory Machines
Limited-memory machines have characteristics of reactive machines and can also use their
experience for decision-making. Such machines can store some experience and data for a
short period of time. Self-driving cars are one of the best examples of limited memory systems.
These cars can store the recent speeds of nearby cars, the distance of other cars, their speed
limits and other information to navigate the road.
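The difference between a reactive machine and a limited-memory machine can be sketched in a few lines of Python. This is only an illustration: the function `reactive_brake`, the class `LimitedMemoryDriver`, the 10-metre threshold and the speed readings are all hypothetical and are not taken from any real self-driving system.

```python
from collections import deque


def reactive_brake(distance_m):
    """Reactive rule: decide from the current reading only; nothing is stored."""
    return distance_m < 10.0  # hypothetical safety threshold in metres


class LimitedMemoryDriver:
    """Keeps only the last few speed readings of a nearby car."""

    def __init__(self, memory_size=3):
        # A deque with maxlen silently forgets the oldest reading when full.
        self.recent_speeds = deque(maxlen=memory_size)

    def observe(self, speed_kmph):
        self.recent_speeds.append(speed_kmph)

    def nearby_car_is_slowing(self):
        speeds = list(self.recent_speeds)
        # Slowing down means every stored reading is lower than the one before it.
        return len(speeds) >= 2 and all(b < a for a, b in zip(speeds, speeds[1:]))


driver = LimitedMemoryDriver(memory_size=3)
for speed in [60, 58, 55, 50]:
    driver.observe(speed)

print(list(driver.recent_speeds))      # only the three most recent readings remain
print(driver.nearby_car_is_slowing())
```

The reactive rule gives the same answer for the same input every time, while the limited-memory driver can notice a trend because it remembers a short history.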
Theory of Mind
Theory of mind machines understand the emotions and beliefs of humans and are able to
interact socially like humans. These systems have not been developed yet, but research is going on.
Self-awareness
Self-aware machines will have consciousness, sentiments and self-awareness. These have
not been developed yet but can be considered as the future of AI.
Let us imagine a cloth braid made of red, blue and green cloth pieces
(Fig. 1.20). If the red one depicts data, the blue depicts natural language
processing and the green depicts computer vision, the braid is Artificial
Intelligence. So, data, computer vision and natural language processing
go side-by-side in AI. Let’s learn more about them.
Fig. 1.20 A Braid
Data
Data is the domain of AI that relates to the scientific methods, algorithms and processes that help
gain information from various types of structured as well as unstructured data. ‘Data’ is the plural form of
‘datum’. It is a collection of raw facts and figures collected from observations. As the digital world
is growing, data is also growing exponentially. We have entered the era of Big Data. As per John
Naisbitt, ‘We are drowning in information and starving for knowledge.’ Who will help to gain
knowledge from that Big Data? It is a big question for us. There are 1 trillion web pages on the Internet.
Every day 2.5 exabytes (2.5 × 2⁶⁰ bytes) of data are generated. This data is growing exponentially,
year by year. AI can use the data to develop its intelligence. Machine learning explores, studies and
constructs algorithms that can learn and make predictions on data. Such algorithms do not just follow
program instructions but also make predictions or decisions based on data. They build a model from
sample inputs. Machine learning algorithms can process a large amount of data and extract useful
information. Big Data is time-consuming and difficult for a human to process, but it is the best
fodder to train machine learning algorithms.
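The idea of building a model from sample inputs and then making predictions can be sketched as follows. This is a minimal illustration that fits a straight line by the least-squares method. The function name `fit_line` and the study-hours and marks figures are made up for the example.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of the best-fit line from sample inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares formulas for a straight line y = slope * x + intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample inputs: hours studied and marks scored.
hours = [1, 2, 3, 4]
marks = [12, 22, 32, 42]

slope, intercept = fit_line(hours, marks)

# The learned model is then used on an input it has never seen.
predicted_marks = slope * 5 + intercept
print(slope, intercept, predicted_marks)
```

Nobody wrote a rule such as "add 10 marks per hour" into the program. The relationship was worked out from the data, which is the key difference from ordinary program instructions.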
Data science is one of the domains of AI. It relates to data systems and processes in which the
system collects a large amount of data, maintains data sets and derives meaning or makes sense out
of them. Data is the backbone of AI. It is the food for all AI-enabled systems. Almost 98% of AI
systems depend on data. Most of the data is collected from us by the device which all of us have in our
hands all the time, i.e., the smartphone.
Computer Vision
Computer Vision (CV) is one of the domains of AI that trains machines
for a visual understanding. This happens by gaining an understanding of
digital images and videos to interpret them as humans do (Fig. 1.21). A
machine can get and analyse visual information and afterward predict
some decisions about it. It acquires, screens, analyses, identifies and
extracts information from a still image or video. This process helps the
computers to understand any visual content and act on it accordingly.
Fig. 1.21 Computer Vision
In computer vision, input to machines can be photographs, videos and pictures from thermal or infrared
sensors, indicators and different sources. Face unlocking in smartphones uses computer vision: the
smartphone’s owner can set up his/her face as an unlocking mechanism.
The front camera detects and captures the face and saves its
features during initiation. Next time onwards, whenever the features match, the phone is unlocked. The
other applications of computer vision are self-driven cars, robotics, etc. The
data is captured by cameras and recorders in the form of images and videos. This image is
processed and manipulated to get some useful information. This processing is done to reduce noise,
control brightness and set color contrast.
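The three processing steps mentioned above (reducing noise, controlling brightness and setting contrast) can be sketched on a tiny greyscale image stored as a list of rows. This is a simplified illustration, not how any particular camera or library does it. The function names `adjust` and `mean_filter` and the pixel values are assumptions made for the example.

```python
def adjust(image, brightness=0, contrast=1.0):
    """Stretch pixels around mid-grey (128) for contrast, then shift for brightness."""
    return [
        [max(0, min(255, round((p - 128) * contrast + 128 + brightness))) for p in row]
        for row in image
    ]


def mean_filter(image):
    """Reduce noise: replace each pixel by the average of itself and its neighbours."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        new_row = []
        for c in range(width):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            new_row.append(round(sum(neighbours) / len(neighbours)))
        result.append(new_row)
    return result


# Hypothetical 3 x 3 image with one very bright "noisy" pixel in the middle.
noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]

smoothed = mean_filter(noisy)
brighter = adjust(noisy, brightness=20)
print(smoothed[1][1], brighter[0][0])
```

After smoothing, the odd pixel in the middle is pulled much closer to its neighbours, which is what noise reduction means in practice.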
There are three types of Machine Learning algorithms, which are as follows:
Fig. 1.10 Machine Learning Algorithms (diagram of inputs and outputs; labels shown include data with labels, states, actions, targets and evaluations)
Which type of learning do you think is the best fit for anomaly detection in a pattern?
• Deep Learning: It is a type of Machine Learning that can process a wider range of data resources,
requires less data processing by humans and can often produce more accurate results than
traditional machine learning approaches. In Deep Learning, interconnected layers of
software-based calculators known as ‘neurons’ form a neural network. The network can
ingest vast amounts of input data and process them through multiple layers that learn
increasingly complex features of the data at each layer. The network can then make a determination about
the data, learn if its determination is correct and use what it has learned to make
determinations about new data. For example, once it learns what an object looks like, it can
recognise the object in a new image.
• Machine Vision: This technology gives eyes to a machine. Here, machines (cameras) are used to
capture visual information, which is converted from analog to digital form and then
processed using digital signal processing. The resulting data is then fed
to a computer. Applications include signature identification, pattern recognition, medical image
analysis, etc.
• Natural Language Processing: The language
used by a human is a natural language. The
field of AI which gives machines the
ability to read, process, understand
and derive meaning from it like a
human is Natural Language
Processing (NLP). One of the best and
oldest examples of NLP is separating
junk e-mails by looking at the subject
line and text. Now, it is also used in
translation, sentiment analysis and
speech recognition.
Write down AI, ML and DL at the proper place in the diagram.
• Robotics: This is the field of engineering that
focuses on the design and manufacturing
of robots. It is used to perform
difficult and repeated tasks. For
example, robots are used in assembly
lines for car production.
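The interconnected layers of ‘neurons’ described under Deep Learning above can be sketched as a small calculation. In this illustration the weights and biases are fixed, made-up numbers. In a real network they would be learned from data, and the network would have far more layers and neurons. The names `sigmoid` and `layer` are chosen only for this example.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights: a hidden layer with two neurons and an output layer with one.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

features = [1.0, 1.0]  # the input data
hidden = layer(features, hidden_weights, hidden_biases)
output = layer(hidden, output_weights, output_biases)
print(hidden, output)
```

The output of the first layer becomes the input of the next one. Stacking many such layers is what makes the learning ‘deep’.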
Many problems in the world can be solved by AI projects. But as defined earlier, these projects need to
be completed in a series of phases. These phases are a part of the life cycle of a project.
CHECK BOT
Look around you and observe. Can you name some problems which people around you are
facing these days? Can you help them in any way? Write any five ways.
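1. ................................................ 2. ................................................ 3. ................................................
4. ................................................ 5. ................................................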
STEPS OF AI PROJECT CYCLE
To solve a problem with AI, we need to do tasks in steps one after the other.
These tasks help us to complete an AI project. Just as Sofia follows the steps of her
recipe in sequence to bake a cake, the AI project cycle provides an appropriate framework that can lead us
towards the goal. There are mainly five stages of the AI project cycle: problem scoping, data acquisition,
data exploration, modelling and evaluation (Fig. 2.1). Let’s learn about these stages in detail.
Fig. 2.1 AI Project Cycle
PROBLEM SCOPING
Problem scoping refers to identifying a problem and having a vision to solve it. The whole project
depends on this. This stage is time consuming. To define a problem and define the scope of that
problem is not an easy task. We need to have a deeper understanding of the problem so that the
picture becomes clearer while we are working to solve it.
Imagine that the world’s largest and most precious diamond is in danger as Mr Ray has threatened to
steal it. Mr Ray is very notorious for his stealing capabilities. Till now, no one has been able to track
him. So, the situation is very critical. You have been appointed as the Chief Security Officer and your
job is to enhance the security of the diamond to make the area impossible for Mr Ray to break into and
steal the diamond. Now that you are aware of AI concepts, plan to use them in accomplishing your
task.
Start with listing down all the factors which you need to consider while framing a security system.
.............................................................................................................................................................................
.............................................................................................................................................................................
Without knowing all the factors, such as who is going to use the system and what it should do, it is
quite difficult to frame a problem statement.
Setting Goals
Identifying and defining goals is the first and the foremost step in any planning process and AI is no
exception to this. Goals are the objectives that an AI project needs to achieve. Goals also help in
looking for the reasons why a problem exists. They aid in minimising the various challenges faced in
finding a solution to the problem. Identifying and defining goals beforehand sets the basis on which we
define actions. These goals need to be:
• Specific: Specific goals make it clear to the planning team what is to be achieved and give a clear
vision to the executing team of how it is to be achieved. They also reduce the chances
of uncertainties that may arise at a later stage.
• Measurable: The goals must be measurable. This helps in comparing the actual performance with
the planned outcome. For example, if the goal statement is ‘to increase the sales from the present
level of 50,000 pens per month’, then the goal is not measurable as the exact number that needs to
be increased is not defined. A more appropriate goal will be to increase the sales from the present
level of 50,000 pens per month to 75,000 pens per month.
• Achievable: The goals and objectives must be achievable by keeping in mind the available resources.
• Realistic: The goals and objectives must be practical to deliver, even if you face problems or
complications. This is important because unrealistic goals reduce the overall quality of the
project’s outcome and lead to running over budget and missing the set deadlines.
• Time-bound: This is important so that the project does not overstep the allocated time frame.
In short, there is no need to reinvent the ‘wheel’, rather the focus must be on improving the durability of
the ‘wheel’.
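A measurable goal can be checked with simple arithmetic. The sketch below uses the pen sales example from the ‘Measurable’ point above (from 50,000 to 75,000 pens per month). The function name `progress_towards_goal` and the actual sales figure of 60,000 are assumptions added only for illustration.

```python
def progress_towards_goal(baseline, target, actual):
    """Fraction of the planned increase that has been achieved so far."""
    return (actual - baseline) / (target - baseline)


baseline = 50_000  # present sales of pens per month
target = 75_000    # planned sales of pens per month

# Because the goal states an exact number, the planned increase can be computed.
planned_increase_percent = (target - baseline) / baseline * 100

# Hypothetical actual sales after a few months.
progress = progress_towards_goal(baseline, target, actual=60_000)

print(planned_increase_percent, progress)
```

With the vague goal ‘to increase the sales’, neither of these numbers could be calculated, so actual performance could not be compared with the plan.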
It is important to identify moral concerns that may arise at the time of implementing a course of action
in the planning stage itself. These moral concerns may be identified by getting in touch with the
stakeholders about the problem and its outcome.
Let us now start scoping a problem. Look around you and select a theme which interests you the
most. For more options, you can also refer to the 17 Sustainable Development Goals. For example, in
health, there are medicinal aid, mobile medications, spreading of diseases, etc., all being very
different from each other but still a part of the health theme. Thus, to effectively understand the
problem and elaborate on it, we need to select one topic under the theme and write the problems
based on that topic.
CHECK BOT
Select any one theme from your surroundings which interests you the most. Some of the
suggested themes are given below:
• Travel and Tourism
• Disability
• Research
• Social Welfare
The 4Ws Problem Canvas helps you in identifying the four crucial parameters related to the problem. Let us look at each of them.
Who?
The ‘Who’ block helps you to analyse the people getting affected directly or indirectly due to this
project. Under this, you find out who the ‘Stakeholders’ to this problem are and what you know about
them. Stakeholders are the people who face the problem and would be benefited with the solution.
These people are the best describers who can help you to define the problem as they are the ones
facing the problem.
What?
Under this block, you must know what you have in hand. At this stage, you determine the nature of
the problem. What is the problem and how do you know that it is a problem? Under this block, you
also gather evidence to prove that the problem you have selected actually exists. Newspaper articles,
media, announcements, etc. are examples.
Where?
Now that you know who is associated with the problem and what the problem actually is, you need to
focus on the context/situation/location of the problem. This block will help you look into the situation
in which the problem arises, the context of it, and the locations where it is prominent.
Why?
Finally, you have all the major elements that affect the problem directly. Now, it is easy to understand
who would benefit from the solution, what is to be solved and where the
solution will be deployed. These three canvases now become the basis of why you want to solve this
problem. Thus, in the ‘Why’ canvas, think about the benefits which the stakeholders would get from the
solution and how it will benefit them as well as society.
CHECK BOT
In the earlier activity, you have chosen one theme and one topic. Based on that topic, fill in
the 4Ws Problem Canvas.
Let us fill the ‘Who’ canvas!
After filling the 4Ws problem canvas, you now need to summarise all the 4Ws into one template. This
template is called the problem statement template (Fig. 2.3). The problem statement template helps
us to summarise all the key points into one single template so that in the future whenever there is a
need to look back at the basis of the problem, we can take a look at the problem statement template
and understand the key elements of it.
Problem Statement Template with space to fill details according to your goal:
Our [stakeholder(s)] ................................................................ (Who)
has/have a problem that [issue, problem, need] ................................................................ (What)
when/while [context, situation] ................................................................ (Where)
An ideal solution would [benefit or solution for them] ................................................................ (Why)
Fig. 2.3 Problem Statement Template
After observing these factors, you will get clarity towards the issue to be solved, which leads you towards
data acquisition.
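The problem statement template can also be kept as a small data structure so that the four answers are always stored together. The sketch below is only an illustration: the class name `ProblemStatement` and the sample answers about farmers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    who: str    # the stakeholders
    what: str   # the issue, problem or need
    where: str  # the context or situation
    why: str    # the benefit of the solution

    def render(self):
        """Fill the four answers into the wording of the template."""
        return (
            f"Our {self.who} has/have a problem that {self.what} "
            f"when/while {self.where}. An ideal solution would {self.why}."
        )


# Hypothetical answers taken from a filled 4Ws Problem Canvas.
statement = ProblemStatement(
    who="farmers",
    what="they cannot estimate wheat production in advance",
    where="planning the next sowing season",
    why="predict the yield from past weather and soil data",
)

print(statement.render())
```

Because all four fields are required, the program will not let you create a problem statement with one of the 4Ws missing.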
DATA ACQUISITION
The next stage of the AI project cycle is about acquiring
data for the project. Data is a piece of information or
facts and statistics that needs to be collected for reference or analysis. In an AI project, a large amount
of data is needed to train it. This data is collected from various sources.
THINK BOT
From where can you collect the data for your project? Are data images,
text, numbers or all of these?
For example, suppose you want to make an Artificially Intelligent system which can predict wheat
crop production by using the past weather, soil and climate data. The data by which the AI machine
is trained is known as training data. Once training is complete, the machine is ready to predict wheat
crop production. The data set used to check its predictions for the next wheat crop is known as testing data.
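The separation of training data from testing data can be sketched as follows. All the rainfall and yield numbers are invented for illustration, and splitting by year is just one simple way to keep the newest records aside for testing.

```python
# Hypothetical past records for the wheat example (all values are made up).
records = [
    {"year": 2016, "rainfall_mm": 580, "yield_tonnes": 2.9},
    {"year": 2017, "rainfall_mm": 610, "yield_tonnes": 3.1},
    {"year": 2018, "rainfall_mm": 640, "yield_tonnes": 3.3},
    {"year": 2019, "rainfall_mm": 560, "yield_tonnes": 2.8},
    {"year": 2020, "rainfall_mm": 600, "yield_tonnes": 3.0},
    {"year": 2021, "rainfall_mm": 630, "yield_tonnes": 3.2},
]


def split_by_year(all_records, first_test_year):
    """Older records train the machine; newer ones are kept aside to test it."""
    training = [r for r in all_records if r["year"] < first_test_year]
    testing = [r for r in all_records if r["year"] >= first_test_year]
    return training, testing


training_data, testing_data = split_by_year(records, first_test_year=2020)
print(len(training_data), "training records,", len(testing_data), "testing records")
```

The machine never sees the testing records while it is being trained, so they give a fair check of how well it predicts.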
The stage of acquiring data from the relevant sources
is known as data acquisition. The quality of data is very
important. Authentic and relevant data should be used to
train the machine.
GET IT RIGHT
Data can be collected from random websites as huge data is available online.
This is not true. The data available on the websites can be fake. Always collect
data from authentic sources.
Features of Data
Features of data refer to the type of data required to train
the particular machine. Data can be collected from various
sources like the Internet, sensors, mobile devices, etc. Authentic and correct data can be collected
from government portals. The data which we collect must be open source and not someone’s
property. Data collected from private sources without abiding¹ by copyright policies can be offensive².
Some of the ways to collect data (Fig. 2.4) are as follows:
• Cameras
• Observations
• API (Application Program Interface)
Machine learning is a subset of AI. This discipline of machine learning relies on data that we have
already studied to perform AI training in a supervised or unsupervised way. Data is the most valuable
resource to train the machine. The more authentic and relevant data you have, the better you can train
the machine. In both of the cases, the most important factor is not the learning process, but the quality
of data. Data acquisition is collecting data.
The data that you collect yourself, rather than data collected by another party, is primary data.
It can be collected through surveys. As you know, the mobile phone is one of the most important devices for collecting data.
The sensors of the device, cameras and browsing history or digital footprints create good data. Such
data is considered highly authentic as it is generated by day-to-day transactions and is hard to fake. That’s why
data is called the new gold of this era.
Types of Data
Different types of data are used in developing AI projects. There are various criteria on the basis of which
data can be classified. The most basic classification of data is as follows:
Fig. 2.5 Types of Data (Categorical or Qualitative Data: Nominal Data, Ordinal Data; Numerical or Quantitative Data: Discrete Data, Continuous Data)
• Quantitative data: Quantitative data is also known as numeric data. Quantitative data represents
the numerical value (i.e., how much, how often, how many). It gives information about the quantities
of a specific thing. Examples of numerical data are height, length, size, weight, and so
on. The quantitative data can be classified into two different types, based on the data
sets. The two different classifications of numerical data are continuous data and
discrete data.
○ Continuous data: Data that can take any numerical value is called continuous data.
It has an infinite number of probable values that can be selected within a given specific range. For
example, temperature range, 10.5 kg and 200.50 km.
○ Discrete data: It is information that can only take certain values, like the number of students in a
class can only be natural numbers, and not decimals or fractions.
¹ Abiding (by): Accepting and acting according to a rule or decision
² Offensive: Rude in a way that causes somebody to feel upset
• Qualitative data: It is also known as the categorical data. It describes the data that fits into the
categories. Qualitative data is not numerical. The categorical information involves categorical
variables that describe the features such as a person’s gender, home town, etc. Categorical
measures are defined in terms of natural language specifications, but not in terms of numbers.
Sometimes, categorical data can hold numerical values (quantitative value), but those values do not
have mathematical sense. Examples of the categorical data are birth date, favorite subject, city pin
code, etc. Here, the birth date and city pin code hold the quantitative value, but it does not give
numerical meaning.
The two different classifications of qualitative data are nominal data and ordinal data.
○ Nominal data: It is one of the types of qualitative information which helps to label the variables
without providing the numerical value. It is also known as nominal scale. It cannot be ordered
and measured. But sometimes, the data can be qualitative and quantitative. Examples of
nominal data are letters, symbols, words, gender, etc.
○ Ordinal data: Ordinal data/variable is a type of data which follows a natural order. The
significant feature of ordinal data is that the difference between the data values is not
determined. This variable is mostly found in surveys, finance, economics, questionnaires, and
so on.
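The four types of data described above allow different operations, which can be sketched as follows. The sample values are made up, and the ranking of survey answers from ‘poor’ to ‘excellent’ is an assumption chosen for illustration.

```python
from collections import Counter
from statistics import mean

heights_cm = [150.5, 162.0, 158.25]             # continuous: any value in a range
students_per_class = [30, 28, 32]               # discrete: whole numbers only
favourite_subjects = ["Maths", "Art", "Maths"]  # nominal: labels with no order
survey_answers = ["good", "poor", "excellent"]  # ordinal: labels with a natural order

# Quantitative data supports arithmetic such as averages and totals.
average_height = mean(heights_cm)
total_students = sum(students_per_class)

# Nominal data can only be counted, not ordered or averaged.
most_common_subject = Counter(favourite_subjects).most_common(1)[0][0]

# Ordinal data can be sorted, but the gap between two answers has no fixed size.
rank = {"poor": 0, "good": 1, "excellent": 2}
sorted_answers = sorted(survey_answers, key=rank.get)

print(total_students, most_common_subject, sorted_answers)
```

Trying to take the average of the favourite subjects would make no sense, which is a quick way to recognise qualitative data.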
Another criterion based on which data can be classified is the structure of the data. On this basis, data
can be classified as follows:
• Structured data: These data have a specific pattern or set of rules. They have a simple structure
and store the data in specific forms such as tabular form. For example, the cricket scoreboard,
school timetable, exam datesheet, etc.
• Unstructured data: Data which doesn’t have any specific pattern or constraints and can
be stored in any form is known as unstructured data. Most of the data that exists in the world is
unstructured. For example, YouTube videos, Facebook photos, etc.
• Semi-structured data: It is the combination of both structured and unstructured data. Some data
can have a structure like a database whereas some data can have markers and tags to identify the
structure of data.
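The three structures of data can be compared in a short sketch. Here a table in CSV form stands for structured data, a JSON record with named tags stands for semi-structured data and a plain sentence stands for unstructured data. The scoreboard figures and the profile are made up for the example.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns, like a scoreboard table.
structured_text = "team,runs,wickets\nIndia,287,6\nAustralia,254,10\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: tags (keys) mark the parts, but records need not look alike.
semi_structured_text = '{"name": "Sofia", "hobbies": ["baking", "robotics"]}'
profile = json.loads(semi_structured_text)

# Unstructured: free text with no columns or tags to rely on.
unstructured_text = "Tomorrow is my sister's birthday, so I want to prepare a cake."
word_count = len(unstructured_text.split())

print(rows[0]["team"], profile["hobbies"], word_count)
```

Notice how easy it is to ask for a column or a tag in the first two cases, while the free text can only be split into words until further processing (such as NLP) is applied.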
Some other types of data can be as follows:
• Useless data: Useless data is unique and discrete data with no relationship with the result. For
example, in a list of people eligible to vote, a column having ‘Yes’ in all rows of the column is more
or less useless.
• Time-stamped data: It helps the system to predict the next best action. It follows a specific time-
order to define the sequence. This time can be the time of data captured or processed or collected.
• Machine data: It is the result or output of a specific program, system or technology. It consists of
data related to a user’s interaction with the system like the user’s logged-in session data, specific
search records, user engagement such as comments, likes and shares, etc.
• Binary data: Binary data is made up of only two categories: 0 for ‘off’ or ‘no’ and 1 for ‘on’ or ‘yes’.
This type of data is very common in the study of AI as the machine understands only binary
information. Binary data is a very common outcome variable in classification problems in AI. For
example, an AI model developed to predict whether a tumour is malignant or benign may give the
result in form of binary data.
• Spatiotemporal data: It contains information related to geographical location and time. It records
the location through GPS and time-stamped data where the event is captured or data is collected.
• Open data: It is freely available data for everyone. Anyone can reuse this kind of data.
• Real-time data: Data which becomes available as soon as the event occurs is considered real-time data.
• Time data: It is used to denote features like daily, weekly, fortnightly, monthly, annually, etc. This
data plays an important role while developing an AI model to schedule activities as a step of
machine intelligence.
• Interval data: Interval data has a gap between the data entries. This data is represented in the form
of groups like age groups (0–2 years, 2–5 years, 5–8 years, etc.), income groups (Less than 50,000,
50,000 to 1,00,000, etc.). This type of data uses a more precise measurement scale.
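Three of the data types listed above (binary, time-stamped and interval data) can be sketched in code. This is only an illustration: the function names, the sample readings and the rule that an age on a boundary goes into the higher group are assumptions, since the text does not say which group a boundary value belongs to.

```python
from datetime import datetime


def to_binary(label):
    """Binary data: 1 for 'malignant', 0 for anything else (such as 'benign')."""
    return 1 if label == "malignant" else 0


def age_group(age_years):
    """Interval data: place an age into a group; the upper bound is excluded."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    for low, high in [(0, 2), (2, 5), (5, 8)]:
        if low <= age_years < high:
            return f"{low}-{high} years"
    return "8 years or more"


# Time-stamped data: hypothetical temperature readings with their capture time.
readings = [("2021-09-09 07:24:01", 31.5), ("2021-09-09 07:20:00", 30.9)]
ordered = sorted(
    readings, key=lambda item: datetime.strptime(item[0], "%Y-%m-%d %H:%M:%S")
)

print(to_binary("benign"), age_group(2), ordered[0])
```

Sorting by the time stamp puts the readings back into the order in which the events happened, which is what lets a system reason about sequence.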
There is one more type of data, i.e., Big Data. Let’s learn about this.
Big Data
Big Data can be defined as a concept used to describe a large (huge) volume of data, which are both
structured and unstructured, and that gets increased day by day by any system or business. It is data
that is quite large and complex. The hugeness and complexity of Big Data can be imagined keeping in
mind that none of the traditional tools of data management can efficiently store and process this
Big Data.
For example, the data generated by stock exchanges runs into terabytes (TB) each day. One can
imagine the hugeness of data in a year or a decade. The daily data generation in the form of messages,
audios, videos and animations, on social media platforms like Facebook and WhatsApp is more than 500
TB. A jet engine generates data of more than 10 TB in a 30-minute long flight. Imagine the data
accumulated by jet engines if there are thousands of flights each day. The data is sure to run into
Petabytes (PB).
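The jet engine example can be worked out with simple arithmetic. The figure of 5,000 flights per day is a hypothetical number chosen only to stand for ‘thousands of flights’, and the sketch assumes decimal units, where 1 PB equals 1,000 TB.

```python
TB = 10**12  # bytes in one terabyte (decimal units, an assumption)
PB = 10**15  # bytes in one petabyte

data_per_flight_tb = 10   # from the text: about 10 TB in a 30-minute flight
flights_per_day = 5_000   # hypothetical number of flights

daily_data_tb = data_per_flight_tb * flights_per_day
daily_data_pb = daily_data_tb * TB / PB

print(daily_data_tb, "TB per day =", daily_data_pb, "PB per day")
```

Even with these modest assumptions, a single day of flights produces tens of petabytes, which shows why traditional data management tools struggle with Big Data.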
Benefits of Big Data Processing
The ability to process Big Data brings in many benefits such as follows:
• Better decision-making: Big Data analytics has boosted the decision-making process to a great
extent. Rather than making decisions blindly, companies are considering Big Data analytics
before arriving at any decision.
• New product development: Using Big Data analytics, trends of customer needs and satisfaction can
be analysed. This can further help to develop a whole new product according to their requirements.
• Reduction in cost: Using Big Data tools like Hadoop and Cloud-based analytics, cost saving in
business can be done. In business, when large amounts of data are there, then these tools help
to handle and maintain that data in more efficient ways.
• Businesses can utilise outside intelligence while making decisions: The businesses can access
data from social networking sites and fine-tune their business strategies according to the needs
and demands of society.
• Improved customer service: Traditional customer feedback systems are getting replaced by new
systems designed with Big Data technologies. In these new systems, Big Data and natural language
processing technologies are being used to read and evaluate consumer responses.
• Early identification of risk to the product/services: The use of huge data resources helps in
identifying the expected product or service-related risk at the earliest possible stage.
• Better operational efficiency: In case of new additional data, Big Data systems are used to identify
and segregate relevant data from the huge volume of new data. Such integration of Big Data
systems and data warehouse help an organisation to off-load infrequently accessed data.
• Education: The education industry is over-flooded with huge volumes of data covering details about
students, teachers, universities, courses, grades, educational resources, etc. These huge volumes
of data have helped the education industry towards the development of customised and dynamic
learning programs, reframing course material, advanced grading systems and career growth.
• Healthcare: The healthcare industry is also over-flooded with huge volumes of data covering details
about patients, doctors, hospitals, diseases, remedies, precautions, medicines, etc. The use of Big
Data in healthcare industries has resulted in reducing costs of treatment, preventive measures to
be taken in case of epidemic outbreaks, identifying and preventing the growth of malignant
diseases at an early stage and recommendation of evidence-based medicines.
• Government: The government of every country has to work on huge volumes of data. It has to
keep a record of citizens, GDP, energy resources, geographical surveys, infrastructure, sector-wise
growth, growth prospects, etc. This Big Data helps the government to introduce welfare schemes,
security of data, identify areas of attention, meet national challenges like terrorism, unemployment,
poverty, overpopulation, etc.
• Media and Entertainment: The media and entertainment industry has immense data in the form of
photos, videos, audio, animations, reviews, comments, etc. The social media platforms have added
to the existing data resources of the industry. These Big Data resources have helped the industry
to predict the interest of the audience, scheduling online streaming, analysis of customer reviews,
audience-targeted advertising, etc.
• Meteorology: Meteorological departments have data collected over many years on the weather
trends of different regions in the country and across the globe. This environment-related Big Data
has helped in weather forecasting, studying global warming, disaster prediction and management,
finding patterns of disasters, assessing the availability of resources like wind and water, etc.
• Transportation: The transportation industry has used Big Data to plan travel routes, manage traffic
congestion, identify accident-prone areas, increase traffic safety levels, etc.
• Banking: The banking sector's data has skyrocketed with the digitalisation of the industry. The data
comprises details of customers, banks and their branches, employee codes, account numbers and
balances, details of services like credit cards, overdrafts, time deposits, etc. The industry has used
Big Data to offer better customer services round the clock, prevent misuse of credit/debit cards,
bring clarity to business proposals, prevent money laundering, mitigate customer risk, etc.
• Volume: The name Big Data itself means data that is bigger than ordinary data. The size
of data plays an important role in classifying the data as Big Data or not. If the data generated runs
into terabytes (TB) but is generated only once and is not growing, then it cannot be classified as Big Data.
• Variety: Big Data gets accumulated from various sources. So, such data has a lot of variety
amongst the individual records. As it is a collection of structured data (like databases and
spreadsheets) as well as unstructured data (like e-mails, PDFs, photos, videos, audio, etc.), it is most
useful for analytical purposes.
• Velocity: Velocity means the speed of the generation of data. Big Data is generated and processed
at an extraordinary speed. Big Data deals with the speed at which data flows in from sources like
business processes, application logs, networks, social media sites, sensors, mobile devices,
etc. The flow of data is colossal³ and continuous.
• Variability: Variability is the inconsistency shown by the data that may arise upon the introduction
of data from varied sources. This sometimes lessens the data processing speed as the machine
initially synchronises the data internally to give meaningful results.
• As a famous English proverb says, ‘A little knowledge is a dangerous thing’. Incomplete and
insufficient knowledge of Big Data may put the success of the entire project in jeopardy⁴.
³ Colossal: Extremely large
⁴ Jeopardy: In a dangerous position or situation and likely to be lost or harmed
• It cannot be guaranteed that the Big Data collected and analysed are totally (100%) accurate.
Redundant data, contradicting data or incomplete data are challenges that remain within it.
• Over-variety in Big Data may result in confusion and the selection of irrelevant data.
• A huge investment is involved in developing a Big Data model. Leaving a project incomplete may
result in huge losses.
• Sometimes, less-trained managers may not be able to understand the complexity and quality
of Big Data. As a result, the outcome of the project may be different from what was planned or
targeted.
• The higher the volume of the data, the bigger the security risk. So, Big Data applications involve
cybersecurity risks, and the security gaps need to be plugged well in advance.
• The process of converting Big Data into valuable results is tricky and needs to be handled by well-
qualified and suitably trained personnel.
• Such a massive amount of data needs storage space, and organisations find it challenging
to handle such extensive data without suitable tools and technologies.
• Managing Big Data is already a complex process. The upscaling or diversification of an industry
using Big Data poses a greater challenge.
• Data grows at such a high velocity that it becomes difficult to look for insights in it. There is no
100% efficient way to filter out relevant data.
Sources of Data
Data can be obtained from various sources. All the sources of data (Fig. 2.6) can be broadly classified
into primary sources and secondary sources.
DATA EXPLORATION
While collecting data, we may notice that not all data is in the same format. For example, date
formats can be dd/mm/yy, dd/mm/yyyy, etc. The data we have collected from various sources is
full of numbers, and the quantity is so large that one cannot make sense of it. To take a
fruitful decision from data, we need to summarise the data and identify the trends and patterns in it. By
observing the patterns, one can make decisions easily. Suppose you want to search for some content and
go to the library. There are many books. Can you read or select all the books in one go? No, you
cannot. You first browse the index of a book, and only if you find some interesting content do
you choose the book for reading. Similarly, to analyse data, we need to visualise it in some
user-friendly format so that the trends and patterns can be found easily.
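The mixed date formats mentioned above can be brought into one common format before the data is explored. The following is a minimal sketch in Python, assuming only the two formats named in the text; the function name `normalise_date` and the sample values are made up for illustration.

```python
from datetime import datetime

# The two formats mentioned in the text: dd/mm/yy and dd/mm/yyyy.
DATE_FORMATS = ["%d/%m/%y", "%d/%m/%Y"]

def normalise_date(text):
    """Return the date as yyyy-mm-dd, or None if neither format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match, so try the next one
    return None

# Hypothetical raw values collected from different sources.
raw_dates = ["05/03/21", "17/11/2020", "not a date"]
cleaned = [normalise_date(d) for d in raw_dates]
print(cleaned)
```

Values that fit neither format come back as None, so they can be counted and checked by hand instead of being silently dropped.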
Data visualisation refers to presenting data in the form of pictures or graphs. It helps the users to
see and analyse data visually. In this way, they can compare existing patterns, identify new patterns
and understand the relationship between different variables. AI makes the visualisation of data
interactive. This further enables the user to reformat the visuals according to their need and level of
understanding. Using charts or graphs helps the user to visualise large amounts of complex data
better than studying spreadsheets or reports. It is a quick and easy way to convey concepts. It also
helps in quickly identifying the areas that need attention or improvement. It clarifies and pinpoints the
factors that influence customer behaviour. It helps in the proper allocation and placement of products to
achieve maximum objectives. Data visualisation also helps in predicting future trends of sales, profits,
production, etc.
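To see why a picture is easier to read than a column of numbers, here is a small sketch that draws a bar chart using plain text characters. It is only an illustration: the sales figures are invented, the function name `text_bar_chart` is hypothetical, and all values are assumed to be positive.

```python
def text_bar_chart(values, width=30):
    """Return one text line per item, with a bar scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<10}{bar} {value}")
    return lines

# Hypothetical monthly sales figures, invented for this example.
monthly_sales = {"January": 120, "February": 90, "March": 150}
for line in text_bar_chart(monthly_sales):
    print(line)
```

The longest bar stands out at once, which is the kind of quick comparison that the visualisation tools discussed in this section provide with far richer charts.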
Data Visualisation Tools
There are various tools available to convert textual data into visual data. Some of the effective and
popular data visualisation tools are as follows:
• Excel: It is the most basic and the most widely used data visualisation tool for basic purposes. It
cannot be used for Big Data visualisation but is effective for smaller organisations with a small set
of data.
• Tableau: It is a business intelligence tool for visually analysing data. Users can create and
distribute interactive and shareable dashboards, which depict the trends, variations and density
of the data in the form of graphs and charts. It can connect to files, relational and Big Data sources to
acquire and process data. The software allows data blending and real-time collaboration, which
makes it distinctive. It is used by businesses, academic researchers and many government
organisations for visual data analysis.
• Datawrapper: It is used to visualise data as beautiful charts, maps and tables. The visualisations
created using Datawrapper are device adjustable. This means that the visuals automatically adjust
to the change in device from mobile to tablet to a personal computer.
• Visme: It is an online graphic tool to create charts, graphs, impressive infographics, clean and
professional slides, social media graphics, and so much more.
Some commonly used data visualisation graphical tools are:
MODELLING
Data in the computer is stored in the most basic form of numbers (which is binary: 0s and 1s).
But when we talk about discovering patterns and trends in data, the machine goes for mathematical
representations of the same. The ability to mathematically describe the relationship between
parameters is the heart of every AI model. Thus, whenever we talk about developing AI models, it is
the mathematical approach towards analysing data which we refer to. Generally, AI models can be
classified (Fig. 2.17) as follows:
Fig. 2.17 Classification of AI Models (Rule Based and Learning Based)
Rule-based Approach
The rule-based approach refers to AI modelling where the relationships or patterns in the data are
defined by the developer. The machine follows the rules or instructions mentioned by
the developer and performs its task accordingly.
For example, suppose you have a dataset comprising 200 images of apples and 200 images of bananas
(Fig. 2.19). To train the machine, feed this data into the machine
and label each image as either apple or banana. Now, if you test the machine with
the image of an apple, it will compare the image with the trained data and, according to the labels of the
trained images, it will identify the test image as an apple. This is known as the rule-based approach.
The rules given to the machine in this example are the labels given to the machine for each image in
the training dataset.
Fig. 2.19 An Example of a Rule-based Approach (the machine identifies the testing image as APPLE)
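One simple way to picture rules defined by the developer is a program in which every decision is written by hand. The sketch below is an illustration only. The features `colour` and `is_elongated` are hypothetical stand-ins for what a real system would measure from an image.

```python
def classify_fruit(colour, is_elongated):
    """Apply rules written by the developer; nothing is learnt from data."""
    if colour == "yellow" and is_elongated:
        return "banana"
    if colour in ("red", "green") and not is_elongated:
        return "apple"
    return "unknown"  # no rule covers this case

print(classify_fruit("red", False))
print(classify_fruit("yellow", True))
print(classify_fruit("purple", False))
```

The machine can only handle the cases that the developer thought of. Anything outside the rules, such as the purple fruit here, is reported as unknown.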
Let’s understand the rule-based approach with the help of some models.
• Regression: In regression, the algorithm generates a mapping function from the given data,
represented by the solid line. The dots shown in the graph are the data values and the solid line
represents the mapping done for them (Fig. 2.20). With the help of this mapping function,
we can predict future data. To apply the regression modelling technique, we need
continuous data. For example, if we want to predict the salary of an employee, we can
use his past salaries as training data and predict his next salary.
• Classification: In classification, the algorithm can determine which set a
given data point belongs to by utilising a classification function represented
by the dotted line. The model classifies datasets according to the rules given to it. Usually, the
datasets used for classification are labelled and the data then gets sorted according to its labelling
(Fig. 2.21). Testing data is then classified as one of the labels of
the training dataset. For example, if we want to train a model to identify if an image is of a guitar
or a piano, we need to train it with multiple images of both guitar and piano along with their labels.
The machine will then classify images based on the labels and predict the correct label for testing
data. Classification works on discrete datasets.
Fig. 2.20 Regression Model
Fig. 2.21 Classification Model
Fig. 2.22 Learning-based Approach
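The two models above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool. The salary figures, the two numeric features used for the instruments and the function names `fit_line` and `nearest_label` are all assumptions made for the example. Regression is shown as a least-squares straight line, and classification as a nearest-neighbour lookup among labelled examples.

```python
def fit_line(xs, ys):
    """Least-squares fit of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Regression: hypothetical past salaries (continuous data) for years 1 to 4.
years = [1, 2, 3, 4]
salaries = [30000, 32000, 34000, 36000]
slope, intercept = fit_line(years, salaries)
next_salary = slope * 5 + intercept  # prediction for year 5
print(next_salary)

def nearest_label(point, labelled_points):
    """Return the label of the training example closest to the given point."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    closest = min(labelled_points, key=lambda item: squared_distance(point, item[0]))
    return closest[1]

# Classification: hypothetical (length in cm, weight in kg) pairs with labels.
training = [
    ((100, 3), "guitar"),
    ((95, 2), "guitar"),
    ((150, 250), "piano"),
    ((140, 200), "piano"),
]
print(nearest_label((98, 4), training))
print(nearest_label((145, 220), training))
```

Regression returns a number on a continuous scale, while classification returns one label from a fixed set.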
Learning-based Approach
Under the learning-based approach, the machine is fed with data and the desired output, from which
the machine designs its own algorithm (or set of rules) to match the data to the desired
output fed into the machine (Fig. 2.22). The learning-based approach refers to AI modelling where
the relationships or patterns in the data are not defined by the developer. In this
approach, random data is fed to the machine and it is left to
the machine to figure out patterns and trends in it. Generally, this approach is followed when the
data is unlabelled and too random for a human to make sense of it. Thus, the machine looks at
the data, tries to extract similar features from it and clusters similar data together. In the end, as
output, the machine tells us about the trends which it observed in the training data.
GET IT RIGHT
An algorithm that is once designed cannot be altered without human intervention.
This is not true, as machines work on the data fed into them.
To understand this, we will take the earlier example of apples and bananas. Suppose a dataset has 200
images of apples and 200 images of bananas. These images are labelled as apples and bananas.
The apples and bananas may be of different shapes and sizes. Now, the algorithm is designed in
such a way that it can identify apples and bananas based on their features. It can predict whether any
image belongs to apple or banana. After training, the machine is fed with testing data. This data may
not have the same or similar images, but the machine can adapt itself and identify the correct label for
the new image fed to it.
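When the data is unlabelled, the machine can still group similar items together, as described above. The sketch below shows the idea with a very small version of k-means clustering for two groups. It assumes that each fruit is described by a single made-up number (its length-to-width ratio) and that the data has at least two different values. The function name `two_means` is hypothetical.

```python
def two_means(values, rounds=10):
    """Split numbers into two clusters with a tiny k-means (k = 2)."""
    # Start with the smallest and largest values as the two cluster centres.
    centres = [min(values), max(values)]
    for _ in range(rounds):
        clusters = [[], []]
        for v in values:
            # Put each value in the cluster whose centre is nearer.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the average of the values assigned to it.
        centres = [sum(c) / len(c) for c in clusters]
    return clusters

# Hypothetical length-to-width ratios of six fruits, with no labels given.
ratios = [1.0, 1.1, 0.9, 4.0, 4.5, 3.8]
round_fruits, long_fruits = two_means(ratios)
print(round_fruits)
print(long_fruits)
```

No one told the program which fruits are apples and which are bananas. It only found that the numbers fall into two groups, and a person still has to decide what each group means.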
Decision Tree
A decision tree is one of the most powerful and popular tools used for prediction and classification. This
method uses a set of declarative rules as input for generating a decision tree. It is a flowchart-like
tree structure that uses a top-down approach. Each node in the tree acts as a test case for some
attribute, and each edge descending from that node corresponds to one of the possible answers to
the test case. It is an inverted tree: the root is at the top and the leaves are at the bottom. An instance
is classified by starting at the root node of the tree, testing the attribute specified by this node, then
moving down the tree branch corresponding to the value of the attribute. This is repeated at each
node until a leaf is reached. Tree-based methods
give predictive models high accuracy, stability and ease of interpretation. Decision trees are
considered to be among the best and most used supervised learning methods. A decision tree can be
utilised for both classification and regression types of problems.
Some common terms used for decision tree are as follows:
• Root node: It represents the entire sample that is further divided into two or more homogeneous
sets.
• Splitting: It is a process by which a node is divided into two or more sub-nodes.
• Decision or interior node: It is the node where the splitting takes place. In other words, it is a place
where the sub-node is divided into other sub-nodes.
• Leaf node or terminal node: It is the node with no child node.
• Branch or Sub-tree: A subsection of the decision tree is known as a branch or sub-tree.
• Parent node and child node: The bottom node which derives from the top node is known as the
child node whereas the top node is known as the parent node.
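The terms above can be seen in a small hand-built tree. The sketch below is an illustration under assumptions: the tree is about an everyday decision (what to do when hungry), the question names are made up, and the tree is written by hand instead of being generated from data.

```python
# Interior (decision) nodes are dictionaries holding a question and one
# branch per answer. Leaf nodes are plain strings holding the final decision.
tree = {
    "question": "hungry",  # root node
    "yes": {
        "question": "have_enough_money",  # decision node
        "yes": "go to the restaurant",    # leaf node
        "no": "buy a burger",             # leaf node
    },
    "no": "go to sleep",  # leaf node
}

def decide(node, answers):
    """Walk from the root to a leaf, following the answer at each node."""
    while isinstance(node, dict):
        answer = "yes" if answers[node["question"]] else "no"
        node = node[answer]
    return node

print(decide(tree, {"hungry": True, "have_enough_money": False}))
print(decide(tree, {"hungry": False}))
```

Each pass through the loop moves from a parent node to one of its child nodes, and the walk stops when it reaches a node with no children.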
The decision tree is made up of various nodes. These nodes are the parts of a decision tree. They are as
follows:
EVALUATION
Once a model has been made and trained, it needs to go through proper testing so that one can
calculate the efficiency and performance of the model. Hence, the model is tested with the help of
testing data which was separated from the acquired dataset at the Data Acquisition stage.
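A simple way to express the performance of a model is its accuracy, that is, the share of testing examples for which the prediction matches the actual label. The sketch below assumes a hold-out split in which the last part of the dataset is kept for testing. The labels and predictions are invented, and the function names `split_dataset` and `accuracy` are hypothetical.

```python
def split_dataset(examples, test_fraction=0.2):
    """Keep the last part of the dataset aside as testing data."""
    test_size = int(len(examples) * test_fraction)
    cut = len(examples) - test_size
    return examples[:cut], examples[cut:]

def accuracy(predictions, actual_labels):
    """Fraction of testing examples where the prediction matches the label."""
    correct = sum(1 for p, a in zip(predictions, actual_labels) if p == a)
    return correct / len(actual_labels)

# Ten hypothetical examples: eight are used for training, two for testing.
training_data, testing_data = split_dataset(list(range(10)))
print(len(training_data), len(testing_data))

# Invented labels and model predictions for five testing images.
actual = ["apple", "banana", "apple", "banana", "apple"]
predicted = ["apple", "banana", "banana", "banana", "apple"]
print(accuracy(predicted, actual))
```

The testing data must not be used during training. Otherwise, the score would only show how well the model remembers what it has already seen.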
ACTIVITY BOT
Let’s play JAM (Just a minute). Put some slips in a bowl and divide the class into two groups.
One student from each group will pick a slip from the bowl and speak for one minute on the selected topic.
The bowl can have the following topics:
AI GLOSSARY
• Algorithm: Step-by-step procedure to solve a problem by following certain rules
• Automation: Occurs when repetitive tasks with a rule-based approach are automated through
software or hardware systems
• Robotics: The field of engineering that focuses on the design and manufacturing of robots
• Root node: The entire sample that is further divided into two or more homogeneous sets
• Splitting: A process by which a node is divided into two or more sub-nodes
• Decision node: A node where the splitting takes place
• Leaf node or terminal node: Node with no child node
• Branch or Sub-tree: A subsection of the decision tree
• Parent node: The top node is known as the parent node
• Child node: The bottom node which derives from the top node
• Chance nodes: Representation of probability or uncertainty, shown in a circle
• End nodes: Representation of the result or outcome, shown in a triangle
AI SUMMARY
• AI can be classified broadly into two categories: on the basis of capabilities and on the basis of
functionalities.
• Narrow AI or weak AI is designed to perform a single task.
• General AI is the concept where machines can mimic human intelligence and behaviour.
• Super AI makes machines so intelligent that they can surpass human intelligence and can perform many tasks
better than a human with cognitive properties.
• The three domains of Artificial Intelligence are: Data, Computer Vision and Natural Language Processing.
• There are three types of Machine Learning algorithms: Supervised Learning, Unsupervised Learning and
Reinforcement Learning.
• Deep Learning is a type of Machine Learning that can process a wider range of data resources and requires less
data processing by humans.
• Machine Vision technology gives eyes to a machine.
• To solve a problem with AI, we need to do some tasks in a series of phases or steps. These phases or steps
are a part of the life cycle of a project.
• Problem scoping refers to the identification of a problem and the vision to solve it.
• Identifying and defining goals is the first and the foremost step in any planning process.
• It is important to identify the stakeholders of a problem before developing any model.
• Stakeholders are the people who face the given problem and would benefit from the solution.
• We use the 4Ws problem canvas for problem scoping. The four blocks are ‘Who’, ‘What’, ‘Where’ and
‘Why’.
• The stage of acquiring data from the relevant sources is known as data acquisition.
• Quantitative data represents the numerical value (i.e., how much, how often, how many) and gives
information about the quantities of a specific thing.
• Data can be obtained from various sources. All the sources of data can be broadly classified into
primary sources and secondary sources.
• Data visualisation helps the users see and analyse data visually.
• Some of the data visualisation tools are Excel, Tableau, Datawrapper, Visme, etc.
• A rule-based approach is generally based on the data and rules fed to the machine, where the
machine reacts accordingly to deliver the desired output.
• In regression, the algorithm generates a mapping function from the given data, represented by the
solid line.
• In classification, the algorithm can determine which set a given data point belongs to by utilising a
classification function represented by the dotted line.
• Under the learning-based approach, the machine is fed with data and the desired output, from which the machine
designs its own algorithm (or set of rules) to match the data to the desired output fed into the
machine.
• Clustering is a machine learning approach where the machine generates its own rules or algorithms
to differentiate amongst the given dataset to achieve the pre-decided goal.
• A decision tree is a powerful tool used for prediction and classification. It is a flowchart-like
tree structure.
AI LAB
AI Tool: AutoDraw
AutoDraw is a new kind of drawing tool. It pairs machine learning with drawings from talented artists to
help everyone create anything visual. It is a fast way to draw. There is nothing to download and
it is available free of cost. It works quite well on all devices like
smartphones, tablets, laptops, desktops, etc.
The steps to install and launch the AutoDraw drawing tool are as follows:
Step 2: Type the following URL in the address bar of the browser:
https://experiments.withgoogle.com/autodraw
Step 4: Click on the Start Drawing button to start this. You will get a blank drawing canvas.
Select tool
Auto Draw tool
Draw tool
Type tool
Fill tool
Shape tool
Color tool
Zoom tool
Undo tool
Delete tool
Step 5: Now, click on the AutoDraw pencil / Draw tool and start drawing.
Step 6: When you draw an image using the AutoDraw pencil, it suggests many images which
are created by artists. The main idea is that you draw a
rough sketch and it displays a number of suggested images at the top. Now, you can
choose any of the images which suits your needs.
Step 7: You can fill color in your drawing using the Fill tool.
Step 8: You can also type the name of your drawing using the Type tool.
Step 9: You can draw different shapes using the Shape tool and can fill different colors in them.
At the root, the first condition is plotted as a question: hungry or not? If yes, then there is a second
condition: do you have ₹60? If your answer is yes, you can go to the restaurant; if no, you
can buy a burger. If you are not hungry, you can go to sleep.
10. .................................... data is used to denote features like daily, weekly, fortnightly, monthly,
annually, etc.
11. Velocity means the .................................... of the generation of data.
12. The .................................... sources are the sources of data where the user collects the data on its
own accord.
13. .................................... allows users to create and distribute an interactive and shareable dashboard,
which depict the trends, variations, and density of the data in the form of graphs and charts.
14. .................................... node represents the result or outcome, shown in a triangle.
4. What is the AI project life cycle? What are the steps involved in the AI project life cycle? Explain in brief.
5. Write about different steps involved in the problem scoping.
6. What are the features of data?
7. Classify data on the basis of its structure.
8. What do you mean by Big Data? Write any four benefits of it.
9. Write the uses of Big Data in healthcare, banking and education.
10. What are the challenges of Big Data?
11. Explain the sources of collecting data.
12. What is data visualisation? Write about some of the popular data visualisation tools.
13. Write about the models of rule-based approach.
14. What is a decision tree? Which nodes are the parts of a decision tree? Explain.
F. Skill-based questions.
1. Explore the SDGs and select any one of them. Write the problem statement for it. Also,
define data acquisition and exploration. Draw at least one graph for the collected data. Which learning
approach is used for your AI project modelling? (Problem-solving skill)
2. Write about any one problem from your surroundings and then write a possible solution for it. Write
the factors which may affect the solution of the problem. Write the factors in the form of questions
and answers. Write the consequences if the answer is yes. Draw a decision tree for the same.
(Critical-thinking skill)
3. Rahul is depressed as he did not perform well in his exams. Monika, his friend, wants to help Rahul out
of this depression. She asks Rahul to write the goals of his life on paper with all the possible aspects
and consequences. She also asks him to write what will happen if he is not able to achieve the goals
and how his life will be affected if he achieves them. She also asks him to prepare a table of the
problem, solution, consequences and aspects, and to prepare a tree chart. Then, she explains that
he has failed at only one attempt, but there are many paths that lead to his goals. What values has
Monika shown by helping Rahul? (Social skill)
IN THE AI LAB
1. Explore the following websites for decision trees. These websites will help you draw the decision tree,
online.
• Creately
• SmartDraw
• Lucid Chart
Take any problem and try to draw a decision tree by using any of the three websites.
2. Type the following URL in the address bar of the browser:
https://teachable-snake.netlify.app/
This is the web link of an interactive web game, ‘Teachable Snake’. To control the movement of the snake, you
draw an arrow on paper instead of using physical buttons. For this, you
can draw a black arrow on a piece of white paper as the controller and move the snake by turning the paper in
different directions in front of the webcam. Try moving the snake and enjoy the game.
3. The data can be visualised by the Datawrapper tool. It allows you to visualise the data in various forms.
Type the following URL in the address bar of the browser:
https://app.datawrapper.de/chart/Gb0eb/upload
Upload some data and visualise it.
AI PROJECTS
1. As Artificial Intelligence gets incorporated in various industries, the employability of unskilled labour
reduces day-by-day. A lot of global reports and surveys have predicted mass unemployment in the near
future due to emerging technologies. Try to identify the problem in this scenario after collecting the
required details. Draw a 4Ws canvas for this problem.
2. Record the sounds of birds which chirp in a free environment. After this, record the sounds of caged birds. Now,
explore the clips by listening to them carefully and answer the following questions:
a. Do the sounds of free and caged birds sound similar?
b. Can you identify any difference in the sounds of free and caged birds?
c. Can you predict if a bird is caged or not just by listening to its chirping?
This project explains how an Artificially Intelligent machine is able to predict answers according to the
data on which it is trained.
3. The following is a dataset comprising four parameters which lead to the prediction of whether a tiger
would be spotted or not. The parameters which affect the prediction are:
Outlook, Temperature, Humidity and Wind. Draw a decision tree for this dataset.
Outlook    Temperature   Humidity   Wind     Tiger Spotted?
Sunny      Hot           High       Weak     No
Sunny      Hot           High       Strong   No
Overcast   Hot           High       Weak     Yes
Rain       Mild          High       Weak     Yes
Rain       Cool          Normal     Weak     Yes
Rain       Cool          Normal     Strong   No
Overcast   Cool          Normal     Strong   Yes
Sunny      Mild          High       Weak     No
Sunny      Cool          Normal     Weak     Yes
Rain       Mild          Normal     Weak     Yes
Sunny      Mild          Normal     Strong   Yes
Overcast   Mild          High       Strong   Yes
Overcast   Hot           Normal     Weak     Yes
Rain       Mild          High       Strong   No
CONCEPTBOT
AI PROJECT CYCLE
• Problem scoping
  • Steps of problem scoping: Setting goals, Identifying the stakeholders, Identifying the
    existing measures, Identifying the ethical concerns
  • 4Ws problem canvas: Who, What, Where, Why
• Data acquisition
  • Features of data
  • Types of data
  • Big Data
  • Sources of data: Primary sources, Secondary sources
• Data exploration
  • Data visualisation tools: Excel, Tableau, Datawrapper, Visme, BoxPlot, Histogram, Heat map,
    Line chart, Bar chart, Pie chart, Scatter chart, Bubble chart, Timeline chart, Treemap
• Modelling
  • Rule-based
  • Learning-based
  • Decision tree
• Evaluation
For Teachers
You may explain the concept of the AI project cycle. Before going into detail, first select any one problem. Then, discuss
all five stages for this problem. Explain each stage in detail. Discuss the 4Ws canvas and encourage students to
answer all the 4Ws. Explain the meaning of Big Data. Encourage them to give examples of Big Data from their
surroundings.
For Parents
You may ask your child to list their most important life goals and their plans to achieve these goals. Ask them to
write the process and stages which must be followed to achieve these goals.