Class_10_AI_STUDY_MATERIAL_19.08.2024_2_1
The objective of this module/curriculum - which combines both the Inspire and Acquire modules - is to develop a
readiness for understanding and appreciating Artificial Intelligence and its application in our lives. This
module/curriculum focuses on:
1. Helping learners understand the world of Artificial Intelligence and its applications through
games, activities and multi-sensorial learning to become AI-Ready.
2. Introducing the learners to three domains of AI in an age-appropriate manner.
3. Allowing the learners to construct meaning of AI through interactive participation and engaging hands-on activities.
4. Introducing the learners to AI Project Cycle.
5. Introducing the learners to programming skills - Basic python coding language.
LEARNING OUTCOMES:
11. Brainstorm on the ethical issues involved around the problem selected.
12. Foresee the kind of data required and the kind of analysis to be done, identify data requirements and find reliable sources to obtain relevant data.
13. Use various types of graphs to visualize acquired data.
14. Understand, create and implement the concept of Decision Trees.
15. Understand and visualize computer’s ability to identify alphabets and handwritings.
16. Understand and appreciate the concept of Neural Network through gamification and learn basic
programming skills through gamified platforms.
17. Acquire introductory Python programming skills in a very user-friendly format.
SKILLS TO BE DEVELOPED:
SCHEME OF STUDIES:
This course is a planned sequence of instructions consisting of units meant for developing employability and vocational competencies of students of Class IX opting for the skill subject along with other education subjects. The unit-wise distribution of hours and marks for classes IX & X is as follows:
Practical Examination:
• Unit 3: Advance Python – 5
• Unit 4: Data Science – 5
• Unit 5: Computer Vision – 5
Viva Voce – 5
Total – 35
PART D: Project Work / Field Visit / Student Portfolio (any one to be done) – 10
Viva Voce – 5
Total – 15
GRAND TOTAL – 210 (hours) / 100 (marks)
Note: The detailed curriculum/ topics to be covered under Part A: Employability Skills
can be downloaded from CBSE website
UNIT 7: EVALUATION
The equipment / materials listed below are required to conduct effective hands-on learning sessions while
delivering the AI curriculum to class 10 students. The list below consists of minimal configuration required to
execute the AI curriculum for class 10 and create social impact real time solutions/ projects. The quantities
mentioned here are recommended for a batch of 20 students, keeping the human-machine ratio at 2:1. An
exhaustive list may be compiled by the teacher(s) teaching the subject.
A. SYSTEM SPECIFICATIONS
1. Processor: Intel® Core™ i5-7300U Processor or equivalent with minimum SYSmark® 2018 Rating of 750 or higher
2. Graphic Card: Integrated graphics
3. Form Factor: USFF (Ultra Small Form Factor), system chassis volume less than one litre
4. RAM: 8GB DDR4 – 2400MHz or above
NOTE: In keeping with the spirit of Recycle, Upcycle and Reuse, it is recommended to make use of any equipment/devices/accessories from the existing inventory in the school.
UNIT-1: COMMUNICATION
SKILLS
Communication is the act of giving, receiving, and sharing information -- in other
words, talking or writing, and listening or reading. Good communicators listen
carefully, speak or write clearly, and respect different opinions.
Communication is defined as the imparting or exchanging of information by speaking,
writing, or using some other medium.
Communication skills allow you to understand and be understood by others.
These can include but are not limited to effectively communicating ideas to others,
actively listening in conversations, giving and receiving critical feedback, and public
speaking.
Communication skills involve listening, speaking, observing, and empathizing. It is also
helpful to understand the differences in how to communicate through face-to-face
interactions, phone conversations, and digital communications like email and social
media.
Session 1 – Methods of Communication:
The word ‘communication’ comes from the Latin word commūnicāre, meaning ‘to share’.
Communication Skills
Non-Verbal Communication:
Non-verbal communication is the expression or exchange of information or messages
without using any spoken or written word. In other words, we send signals and messages to
others, through expressions, gestures, postures, touch, space, eye contact and para language.
Importance of Non-verbal Communication
In our day-to-day communication
• 55% communication is done using body movements, face, arms, etc.
• 38% communication is done using voice, tone, pauses, etc.
• only 7% communication is done using words. Around 93% of our communication is non-
verbal.
Expressions
• Smiling when you are happy
• Making a sad face when you are sad
Body Language
Postures by which attitudes and feelings are communicated, e.g., standing straight to show interest.
Visual Communication
Visual communication proves to be effective since it involves interchanging messages only
through images or pictures and therefore, you do not need to know any particular language for
understanding it. It is simple and remains consistent across different places.
Some examples:
Importance of Feedback
Feedback is the final component and one of the most
important factors in the process of communication since
it is defined as the response given by the receiver to
the sender. Let us look at certain reasons why feedback
is important.
• It validates effective listening
• It motivates
• It is always there
• It boosts learning
• It improves performance
11 Classify the following actions as examples of good or bad non-verbal communication. (2 marks)
a) Laughing during formal communication
b) Scratching head
c) Smiling when speaking to a friend
d) Nodding when you agree with something
12 List down the various elements of a communication cycle. (2 marks)
Ans: The various elements of a communication cycle are:
• Sender: the person beginning the communication.
• Message: the information that the sender wants to convey.
• Channel: the means by which the information is sent.
• Receiver: the person to whom the message is sent.
• Feedback: the receiver’s acknowledgement and response to the message.
13 Mention 2 positive facial expressions which you can use in making
effective communication.
Ans:
• Smiling when meeting someone.
• Keeping face relaxed.
• Matching expressions with your words.
• Nodding while listening.
14 List down the basic parts of speech.
Ans: The part of speech indicates how a particular word functions in
meaning as well as grammatically within the sentence. Some examples are
nouns, pronouns, adjectives, verbs and adverbs.
15 Write two sentences of each type of sentence—statement, question,
exclamatory and order.
Components of self-management: Stress Management, Time Management, Self-Awareness, Goal Setting and Self-Motivation.
Self-Management:
Self-management is the ability to control one’s emotions, thoughts and behaviour
effectively in different situations.
Essential Skills for Success:
Dedication
Importance of Self-Management:
Self-sufficient and independent
Ownership and accountability lead to self-confidence
Goal-oriented and strategy maker
Self-monitoring and discipline reinforce good habits and behaviours
Organise life and remove stress
Stress: Stress can be defined as emotional, mental, physical and social reaction to
any perceived demands or threats.
Symptoms of Stress:
Causes of Stress:
• Conflict or rivalry
• Lack of confidence
• Meeting deadlines or expectations
• Work pressure
• Physical discomfort
• Change of routine
Effects of Stress:
• Deteriorates mental and physical health
• Lack of concentration and productivity in work
• Damage in personal and professional relationships
Stress Management – the ABC model:
A – Adversity, B – Beliefs, C – Consequences
Self-Awareness
Know Yourself: Belief, Background, Opinion, Choice, Values
Self-Motivation:
Self-motivation is the internal force that drives one to act towards achievement of
goals.
Types of Motivation:
Building self-motivation: find your own strengths, set goals and focus on them, develop a plan to achieve the goals, and stay loyal to the goals despite adversity.
Goal Setting
➢ Goal Setting: Goal setting is all about finding and listing one’s goals in life
and planning on achieving them.
➢ Importance of Goal Setting:
o Helps to think and decide about future plans
o Helps to prioritize things in life
o Helps to focus on important tasks
Emotional intelligence is the ability to identify and manage own and others’
emotions.
Personality Management:
ANSWERS:
4 List down any two methods that can be followed for effective
time management.
✓ Symbian
✓ Windows Phone
✓ iOS
Small pictures on the desktop are called icons. These icons represent files, folders, applications, etc. At the bottom of the desktop is a long bar called the Taskbar. To the left of the taskbar is the Start button.
File Concept, File Operations, File Organization, Directory Structures and File System Structures
Everything you store on your computer is stored in the form of a file. There are
specific naming conventions for naming files or folders, like characters that can
be used, maximum number of characters, etc. Files can be separately placed into
groups, called folders/directories. Each directory/folder can contain related files
and/or sub-folders.
You should use an external hard drive for backup of the data on your computer.
• Run anti-virus periodically
• Keep anti-virus software up to date
• Do not overcharge the batteries
• Do not block the vents
• Always shut down the computer properly
Q28. Rishi wants to categorize different types of devices, so help him by listing any four input, output and storage devices which are used in day-to-day life. (2 marks)
Ans: Input Devices: Keyboard, Mouse, Scanner, Microphone
Output Devices: Monitor, printer, plotter, Speaker
Storage Devices: Hard Disk, CD/DVD, Pen Drive, Memory card
Q29. How do you delete files and folders permanently from the Recycle Bin window? (2 marks)
Ans: 1. Double-click on the Recycle Bin.
2. The Recycle Bin window appears.
3. Click ‘Empty the Recycle Bin’.
Q30. Ravish wants to change his phone but wants to transfer his old data for later use, so suggest to him the term for this process. (2 marks)
Ans: The process through which Ravish can transfer his old data for later
use is known as data backup. Backing up data means to save the
information present on your computer on another device, such as
CD/DVD drives or hard disk.
MIND MAP: Qualities and Functions
A) Risk:
• Contrary to wage employment, one has to risk one’s own savings,
time and efforts
B) Workload:
• It takes serious hustle to get a new business up and running from
scratch. While it can be an exciting time, full of possibility, it can
also be exhausting for an entrepreneur.
C) Challenges:
• Being an entrepreneur is not without its challenges. One may face
lonely weekends and late-night work. Further, low funding in the initial stages may also lower the chances of success of the venture.
D) Uncertainty:
• Entrepreneurs often face headwinds from various quarters. Change
in market dynamics, government policies or even consumer
preferences, all can affect survival chances of a venture.
Over the years, with economic development, there has been an increase in
environmental pollution. For example, with the introduction of high input
agriculture, we can grow more food by using fertilizers, pesticides and hybrid
crops. But it has led to soil and environmental degradation. We need to plan the
use of resources in a sustainable manner so that we and our future generations
can enjoy a good environment.
What is Sustainable Development?
Sustainable development is the development that satisfies the needs of the
present without compromising the capacity of future generations, guaranteeing
the balance between economic growth, care for the environment and social
well-being.
(a) Food: The amount of rich, fertile land needed to grow crops, such as wheat,
rice, etc., is becoming less as we are using up more and more land for other
purposes. Soil nutrients are also getting depleted and lots of chemicals are
spoiling the soil due to use of chemical fertilizers.
but dump garbage into them. The rivers and ponds are getting polluted. This
way after several years, we will have no clean water for our use.
(c) Fuel: We are using a lot of wood from trees as fuels and for construction of
homes and furniture. As more and more trees are being cut, it is affecting the
climate of the place. Extreme weather conditions, such as floods, extreme cold
or heat, are seen in many places, which affect the people living there.
between concrete buildings;
• using more environment-friendly or biodegradable material; and
• use of technologies which are environmentally friendly and based on efficient use of resources.
Quality Education
Education is one of the most important factors for sustainable development. Children who have gone to school will be able to do jobs so that they can take care of themselves and their families. Education helps us become aware of our role as responsible citizens. We should
1. use the facilities present in our areas.
2. take our friends to school.
3. help friends study.
4. stop friends from dropping out of school.
Reduced Inequalities
To reduce inequalities, we can
1. be helpful to one another.
2. be friendly with everyone.
3. include everyone while working or playing.
help others by including everyone, whether they are small or big, girl or boy.
Intelligence:
Intelligence is an ability to understand information, and to retain it as knowledge to be
applied in a particular situation or context.
Artificial Intelligence:
Refers to any technique that enables computers to mimic human intelligence. It gives machines the ability to recognize a human’s face, to move and manipulate objects, to understand voice commands given by humans, and to do other tasks. AI-enabled machines think algorithmically and execute what they have been asked for intelligently.
Artificial Intelligence vs Natural Intelligence
• Artificial Intelligence is found in machines; Natural Intelligence is found in humans.
• AI machines are built/designed with data and algorithms; natural intelligence is built through observation, learning, etc.
• Machines with AI can perform large, complex calculations; humans have limitations in computation.
What is NOT AI:
It is very common for us to mistake other technologies for AI.
A machine/device that is trained with data and makes decisions or predictions based on data and algorithms is considered AI. Below are some examples of technologies that are not AI.
An automatic washing machine operates based on instructions provided by the user.
An air conditioner is operated by humans using a remote. Humans need to set the timer and temperature based on the requirement. An air conditioner can be turned on/off from a different location, but it still needs a human to operate it.
A Smart TV uses different applications and technologies which make it easy to use, but it also needs humans to operate it.
Applications of AI
A. AI in E-Commerce websites
(Examples: Amazon, Flipkart, Myntra, etc.)
B. AI in Virtual Assistants (Examples: Google Assistant, Alexa, Siri, etc.)
C. AI in Self-Driving Cars (Examples: Tesla, XUV 700, etc.)
D. AI in Healthcare (Examples: Medical Image Analysis, AI-Enabled Medical Diagnosis, etc.)
E. AI in Gaming (Examples: Cricket, FIFA, Racing Games, etc.)
AI Bias:
An AI model is trained with a huge set of data. This data is called training data. If this data is biased, the output of the AI model will also be biased.
AI bias is an irregularity in the outcome of a model/algorithm that arises because the data collected is unbalanced or based on wrong assumptions.
2. In a weather forecasting system, the technology used to predict the temperature, rainfall, etc. is _________________.
a. Computer Vision
b. Interpersonal Intelligence
c. Natural Language Processing
d. Artificial Intelligence
MARKING SCHEME
OBJECTIVE TYPE QUESTIONS 1 MARK
1. (d) Natural Language Processing
2. (d) Artificial Intelligence
3. (c) Computer Vision
4. (d) All the above
5. (c) i, ii & iv
6. (d) Natural Language Processing
7. (c) Virtual Assistants are not applications of NLP
8. (d) All the above
3. Domains of AI
The Three Domain of AI:
A. Data Science: -
Data science is a domain of AI related to data systems and processes, in which
the system collects numerous data, maintains data sets and derives
meaning/sense out of them.
The information extracted through data science can be used to make decisions.
Applications:
Price Comparison Websites, Targeted Advertising, Stock Market Analysis, etc.
B. Computer Vision: -
Computer Vision is a domain of AI that gives a machine the capability to acquire and analyse visual information and afterwards predict decisions based on it.
The entire process involves acquiring, screening, analysing, identifying and extracting information from images.
This makes devices visually enabled and gives them the capability to understand visual data.
4. AI Categories:
Artificial Intelligence:
Refers to any technique that enables computers to mimic human intelligence. It gives
the ability to machines to recognize a human’s face; to move and manipulate objects;
to understand the voice commands by humans, and also do other tasks. The AI-
enabled machines think algorithmically and execute what they have been asked for
intelligently.
1. Problem Scoping
Data Features
o Refer to the type of data you want to collect.
o E.g.: Salary amount, increment percentage, increment period, bonus etc.
Big data
o It includes data with sizes that exceed the capacity of
traditional software to process within an acceptable time
and value.
o The main focus is on unstructured type of data
The three Vs of Big Data:
a) Volume – the amount of data produced
b) Variety – the types of data produced
c) Velocity – the speed at which data is produced
Data Exploration
❖ Web Scraping
❖ API
Outliers
2. By using an Imputer to find the best possible substitute to replace missing values (a short sketch follows the table below).
3. Erroneous Data:
Erroneous data is test data that falls outside of what is acceptable and should be rejected by the system. For example, in the table below one Class entry is a number instead of a class name:
Student Name – Class
RIYA GEORGE – XA
JOSHUA SAM – XA
APARNA BINU – XA
SIDHARDH V R – XA
NITHILA M – 57
ATHULYA M S – XA
ANUJA MS – XB
KEERTHI KRISHNANATH – XB
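As an illustration of point 2 above (replacing missing values with an imputer), here is a minimal sketch using scikit-learn's SimpleImputer; the library choice and the marks values are assumptions made only for demonstration.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical marks data with one missing value (np.nan)
data = pd.DataFrame({"Marks": [78, 85, np.nan, 90, 66]})

# Replace the missing value with the mean of the remaining marks
imputer = SimpleImputer(strategy="mean")
data["Marks"] = imputer.fit_transform(data[["Marks"]]).ravel()

print(data)   # the missing entry is now filled with the column mean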
1) Data Visualization
1. Area Graphs
Area Graphs are Line Graphs but with the area below the line filled in with a certain colour or texture. Like Line Graphs, Area Graphs are used to display the development of quantitative values over an interval or time period. They are most commonly used to show trends, rather than convey specific values.
2. Bar Charts
The classic Bar Chart uses either horizontal or vertical bars (column chart) to show discrete, numerical comparisons across categories. Bar Charts are distinguished from Histograms, as they do not display continuous developments over an interval. (A short plotting sketch follows this list of chart types.)
3. Histogram
5. Scatterplots
Scatterplots use dots to represent values for two different numeric variables and show whether a relationship or correlation between the two variables exists.
6. Flow Charts
This type of diagram is used to show the sequential steps of a process. Flow Charts map out a process using a series of connected symbols, which makes the process easy to understand and aids in its communication to other people. Flow Charts are useful for explaining how a complex and/or abstract procedure, system, concept or algorithm works. Drawing a Flow Chart can also help in planning and developing a new process or improving an existing one.
7. Pie Charts
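To make some of the chart types above concrete, here is a minimal sketch assuming the matplotlib library is available; the subjects and marks are made-up values.

import matplotlib.pyplot as plt

subjects = ["Maths", "Science", "English"]
marks = [85, 92, 78]   # made-up values for demonstration

# Bar chart: discrete comparison across categories
plt.figure()
plt.bar(subjects, marks)
plt.title("Marks per Subject")

# Pie chart: each category's share of the whole
plt.figure()
plt.pie(marks, labels=subjects, autopct="%1.1f%%")
plt.title("Share of Total Marks")

plt.show()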
In this approach, the rules are defined by the developer. The machine
follows the rules or instructions mentioned by the developer and
performs its task accordingly. So, it’s a static model. i.e. the machine
once trained, does not take into consideration any changes made in the
original training dataset
Rule based approach: Data + Rules → Answers
Thus, machine learning gets introduced as an extension to this as in that
case, the machine adapts to change in data and rules and follows the
updated path only, while a rule-based model does what it has been taught
once.
Learning Based Approach
It’s a type of AI modelling where the machine learns by itself. Under
the Learning Based approach, the AI model gets trained on the data fed
to it and then is able to design a model which is adaptive to the change
in data. That is, if the model is trained with X type of data and the
machine designs the algorithm around it, the model would modify itself
according to the changes which occur in the data, so that the new cases are also handled.
Learning based approach: Data + Answers → Rules
After training, the machine is now fed with testing data. Now, the testing
data might not have similar images as the ones on which the model has
been trained. So, the model adapts to the features on which it has been
trained and accordingly predicts the output.
In this way, the machine learns by itself by adapting to the new data
which is flowing in. This is the machine learning approach which
introduces the dynamicity in the model.
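A minimal sketch of the learning-based approach described above, assuming scikit-learn as the library; the study-hours data and the Decision Tree model are illustrative choices, not part of the original text.

from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [hours studied] -> pass (1) / fail (0)
X_train = [[1], [2], [3], [6], [7], [8]]
y_train = [0, 0, 0, 1, 1, 1]

# The machine learns the rule from data + answers instead of being given the rule
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

print(model.predict([[5]]))   # prediction for unseen data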
Generally, learning based models can be classified as follows:
Machine Learning Models:
I. Supervised Learning – Regression, Classification
II. Unsupervised Learning – Clustering, Dimensionality Reduction
III. Reinforcement Learning

a) Clustering
It refers to the unsupervised learning algorithm which can
cluster the unknown data according to the patterns or trends
identified out of it. The patterns observed might be the ones
which are known to the developer or it might even come up
with some unique patterns
out of it.
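A minimal sketch of clustering, assuming scikit-learn's KMeans; the points and the choice of two clusters are made up for illustration.

from sklearn.cluster import KMeans

# Unlabelled 2-D points (made up); no answers are given to the machine
points = [[1, 1], [1.5, 2], [8, 8], [9, 9], [1, 0.5], [8.5, 9.5]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)   # groups similar points together

print(labels)   # e.g. [0 0 1 1 0 1] – two clusters discovered from the data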
b) Dimensionality Reduction
We humans are able to visualize only up to 3 dimensions, but according to a lot of theories and algorithms, there are various entities which exist beyond 3 dimensions. For example, in Natural Language Processing, words are considered to be N-dimensional entities, which means that we cannot visualize them as they exist beyond our visualization ability. Hence, to make sense of them, we need to reduce their dimensions, and this is where a dimensionality reduction algorithm is used.
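A minimal sketch of dimensionality reduction, assuming scikit-learn's PCA; the 3-dimensional points are made up and are reduced to 2 dimensions so that they could be visualized.

from sklearn.decomposition import PCA

# Made-up 3-dimensional data points
data_3d = [[2.5, 2.4, 1.2], [0.5, 0.7, 0.3], [2.2, 2.9, 1.1], [1.9, 2.2, 0.9]]

pca = PCA(n_components=2)             # keep only the 2 most informative dimensions
data_2d = pca.fit_transform(data_3d)

print(data_2d.shape)                  # (4, 2) – each point now has only 2 dimensions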
III. Reinforcement Learning
5. Evaluation
Evaluation is the process of understanding the reliability of an AI model, based on its outputs when the test dataset is fed into the model and the results are compared with the actual answers. In other words, once a model has been made and trained, it needs to go through proper testing so that one can calculate the efficiency and performance of the model. Hence, the model is tested with the help of testing data (which was separated out of the acquired dataset at the Data Acquisition stage).
The efficiency of the model is calculated on the basis of the parameters mentioned below:
1. Accuracy
Accuracy is defined as percentage of the correct predictions out of all the observations.
2. Precision
Precision is defined as the percentage of true positive cases versus all the cases where the model has predicted positive.
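A minimal sketch of computing these two parameters for a trained model's outputs, assuming scikit-learn's metrics functions and made-up predictions.

from sklearn.metrics import accuracy_score, precision_score

# Made-up actual answers and model predictions (1 = positive, 0 = negative)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(actual, predicted))    # correct predictions / all predictions
print("Precision:", precision_score(actual, predicted))   # true positives / all predicted positives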
(b) Hidden layer
(c) Input layer
(d) Data layer
5. How can you identify problem scoping in a project?
a. Understand why the project was started
b. Define the project’s primary objectives
c. Outline the project’s work statement.
d. All of the above
6. Identify the algorithm based on the given graph
12.___________refers to why we need to address the problem and what the advantages will
be for the stakeholders once the problem is solved.
a. Who
b. What
c. Where
d. Why
24.What is a System Map?
a. Helps to make relation between multiple element
b. Only one element will be responsible
c. Indicate the relationship using + or –
d. Both a) and c)
25.Data analysts utilize data visualization and statistical tools to convey dataset
characterizations, such as ___________.
a. size
b. amount
c. accuracy
d. All of the above
26. Data exploration is a technique used to visualize data in the form of ___________.
a. Statistical methods
b. Graphical methods
c. Both a) and b)
d. None of the above
27.Data Exploration helps you gain a better understanding of a _________.
a. Dataset
b. Database
c. accuracy
d. None of the above
28._____________helps to represent graphical data that use symbols to convey a story and
help people understand large volumes of information.
a. Dataset
b. Data visualization
c. Data Exploration
d. None of the above
29. A machine that works and reacts like a human is known as ____________.
a. Artificial Intelligence
b. Machine Learning
c. Deep Learning
d. None of the above
30. The ability of machines to learn from experience or data is known as ____________.
a. Artificial Intelligence
b. Machine Learning
c. Deep Learning
d. None of the above
31._________ is a program that has been trained to recognize patterns using a set of data.
a. AI model
b. Dataset
c. Visualization
d. None of the above
32. The types of AI models are _____________.
a. Lesson Based and Rood Based
b. Learning Based and Rule Based
c. Machine Learning and Visualization
d. None of the above
33.___________refers to AI modelling in which the developer hasn’t specified the
relationship or patterns in the data.
a. Learning Based
b. Rule Based
c. Decision Tree
d. None of the above
34. After a model has been created and trained, it must be thoroughly tested in order to
determine its efficiency and performance; this is known as ___________.
a. Evaluation
b. Learning
c. Decision
d. None of the above
35.Which of the following is the first and the crucial stage of AI Project development which
focuses on identifying and understanding problems?
(i) Problem Scoping (ii) Data Acquisition (iii) Data Exploration (iv) Modelling
36.…………………… refer to the type of data to be collected.
(i) Data security (ii) Data policy (iii) Data quality (iv) Data features
37.Which of the following uses dots to represent the relationship between two different
numeric variables represented on the x and y axis?
(i) Histogram (ii) Scatter plot (iii) Bullet Graphs (iv) Tree Diagram
38.Statement A: Neural networks are made up of layers of neurons.
Statement B: Human brain consists of millions of neurons.
(i) Only Statement A is correct (ii) Only Statement B is correct
(iii) Both the statements are correct (iv) None of the statements is correct
39. The process of developing AI machines has different stages that are collectively
known as AI …………………… .
(i) Project status (ii) Project cycle (iii) Both (i) and (ii) (iv) None of these
Ans) Project Cycle is a step-by-step process to solve problems using proven scientific
methods and
drawing inferences about them. The AI Project Cycle provides us with an appropriate
framework which can lead us towards the goal.
The AI Project Cycle mainly has 5 stages. They are:
a) Problem Scoping b) Data Acquisition c) Data Exploration d) Modelling e) Evaluation.
2) Name the 4Ws of problem canvases under the problem scoping stage of the AI
Project Cycle.
Ans) a. Who, b. What c. Where d. Why
3) What is a problem statement template and what is its significance?
Ans) The problem statement template gives a clear idea about the basic framework required to achieve the goal. It is based on the 4Ws canvas, which segregates: who is affected, what is the problem, where does it arise, and why is it a problem. It takes us straight to the goal.
4) What is the need of an AI Project Cycle? Explain.
Ans) Project cycle is the process of planning, organizing, coordinating, and finally
developing a project effectively throughout its phases, from planning through execution
then completion and review to achieve pre-defined objectives. Our mind makes up plans for
every task which we have to accomplish which is why things become clearer in our mind.
Similarly, if we have to develop an AI project, the AI Project Cycle provides us with an
appropriate framework which can lead us towards the goal. The major role of AI Project
Cycle is to distribute the development of AI project in various stages so that the
development becomes easier, clearly understandable and the steps / stages should become
more specific to efficiently get the best possible output. It mainly has 5 ordered stages
which distribute the entire development in specific and clear steps: These are Problem
Scoping, Data Acquisition, Data Exploration, Modelling and Evaluation.
5) What is Sustainable development?
ANS – Sustainable development is the development that satisfies the needs of the present
without compromising the capacity of future generations.
This was a warning to all countries about the effects of globalization and economic growth
on the environment.
6) How many goals are there in
Sustainable Development? Mention any
two goals
ANS – In 2015, the General Assembly of the UN adopted the 2030 Agenda for Sustainable Development based on the principle of “Leaving No One Behind”. The 17 Sustainable Development Goals are –
1. No poverty
2. Zero Hunger
3. Good Health and Well Being
4. Quality Education
5. Gender Equality
6. Clean water and Sanitation
7. Affordable and Clean Energy
8. Decent Work and Economic Growth
9. Industry Innovation and Infrastructure
10.Reduced Inequalities
11.Sustainable Cities and Communities
12.Responsible Consumption and Production
13.Climate Action
14.Life Below Water
15.Life on Land
16.Peace, Justice and Strong Institution
17.Partnership for the Goals
Ans) Data should be collected from an authentic source, and should be accurate. The
redundant and irrelevant data should not be a part of prediction.
9) Explain Data Exploration Stage.
Ans) In this stage of the project cycle, we try to interpret some useful information from the data we have acquired. For this purpose, we need to explore the data and try to put it uniformly for a better understanding. This stage deals with validating or verifying the collected data and analysing that:
➢ The data is according to the specifications decided.
➢ The data is free from errors.
➢ The data is meeting our needs.
Ans) Any Artificial Neural Network, irrespective of the style and logic of
implementation, has a few basic features as given below.
• The Artificial Neural Network systems are modelled on the human brain and nervous
system.
• They are able to automatically extract features, without the programmer having to feed in the features manually.
• Every node of layer in a Neural Network is compulsorily a machine learning algorithm.
• It is very useful to implement when solving problems for very huge datasets.
Ans) a) Supervised learning is an approach to creating artificial intelligence (AI), where the
program is given labelled input data and the expected output results. OR Supervised learning
is a learning in which we teach or train the machine using data which is well labelled that
means some data is already tagged with the correct answer. After that, the machine is
provided with a new set of examples (data) so that supervised learning algorithm analyses the
training data (set of training examples) and produces a correct outcome from labelled data. In
a supervised learning model, the dataset which is fed to the machine is labelled. It means
some data is already tagged with the correct answer. In other words, we can say that the
dataset is known to the person who is training the machine only then he/she is able to label
the data.
14) Explain the Unsupervised Learning
Ans) Classification: The classification Model works on the labelled data. For example, we
have 3 coins of different denomination which are labelled according to their weight then the
model would look for the labelled features for predicting the output. This model works on
discrete dataset which means the data need not be continuous.
16) Draw the graphical representation of Regression AI model.
Regression: These models work on continuous data to predict the output based on patterns.
For example, if you wish to predict your next salary, then you would put in the data of your
previous salary, any increments, etc., and would train the model. Here, the data which has
been fed to the machine is continuous.
UNIT-3: ADVANCE PYTHON
ADVANCE PYTHON
Advanced Python refers to the expert-level concepts, techniques, and libraries that go beyond
the basics of the Python programming language. It includes:
- Advanced data structures and algorithms
- Decorators, generators, and asynchronous programming (see the short sketch after this list)
- Web development frameworks like Django and Flask
- Data analysis and visualization libraries like Pandas, NumPy, and Matplotlib
- Machine learning libraries like scikit-learn and TensorFlow
- Object-Oriented Programming (OOP) concepts and design patterns
- Regular Expressions (regex) and advanced text processing
- Concurrency and parallel processing
- Debugging and testing techniques
- Advanced numerical computing and scientific computing
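As a small, self-contained illustration of two of the concepts listed above (decorators and generators), here is a minimal sketch; the function names are made up.

def shout(func):
    # A decorator wraps a function and changes its behaviour
    def wrapper(name):
        return func(name).upper() + "!"
    return wrapper

@shout
def greet(name):
    return f"hello, {name}"

def squares(n):
    # A generator yields values one at a time instead of building a full list
    for i in range(n):
        yield i * i

print(greet("class 10"))    # HELLO, CLASS 10!
print(list(squares(5)))     # [0, 1, 4, 9, 16]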
5. What is the main advantage of using NumPy arrays over Python lists?
a) Slower performance
b) Faster performance and efficient memory usage
c) Less memory usage
d) More memory usage
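A minimal sketch illustrating answer (b) above: NumPy arrays support fast, element-wise arithmetic, while a plain Python list needs an explicit loop (NumPy is assumed to be installed).

import numpy as np

numbers_list = [1, 2, 3, 4, 5]
numbers_array = np.array(numbers_list)

# Element-wise arithmetic works directly on the array (implemented in optimized C code)
print(numbers_array * 2)                 # [ 2  4  6  8 10]

# With a plain list, the same operation needs an explicit Python loop
print([x * 2 for x in numbers_list])     # [2, 4, 6, 8, 10]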
10. What is the name of the popular Python library for data analysis and manipulation?
Answer: pandas.
12. What is the name of the Python library for machine learning and data modeling?
Answer: scikit-learn.
14. What is the name of the Python library for natural language processing?
Answer: NLTK.
Answer: "join()" is used to join two DataFrames based on a common column, while
"concat()" is used to concatenate two or more DataFrames along a particular axis.
9. What is the difference between collections.Counter and collections.defaultdict?
Answer: Counter is a dictionary subclass for counting hashable objects, while
defaultdict is a dictionary subclass that calls a factory function to supply missing
values.
15. What is the purpose of the __init__.py file in a Python package?
Answer: It indicates that the directory should be treated as a package.
In Python, coroutines are implemented using the async and await keywords. An async
function is a coroutine that can be paused and resumed, while an await expression
suspends the execution of a coroutine until a specific condition is met.
In this example, the fetch_data function is an asynchronous coroutine that uses the aiohttp
library to fetch data from a given URL. The async with statement creates a context
manager that ensures the session and response objects are properly closed.
The await expression suspends the execution of the coroutine until the response text is
available. Once the data is received, the coroutine resumes execution and returns the
fetched data.
import asyncio
import aiohttp

async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()   # suspend until the body is available

async def main():
    data = await fetch_data("https://example.com")   # placeholder URL
    print(data)

asyncio.run(main())
This code defines a main coroutine that calls the fetch_data coroutine and prints the result.
The asyncio.run function runs the main coroutine to completion.
Example (apply method):
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.apply(lambda col: col ** 2)   # apply() works column-wise here, squaring each column
print(result)

Output:
   A   B
0  1  16
1  4  25
2  9  36
applymap method:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.applymap(lambda x: x ** 2)    # applymap() applies the function to every element
print(result)

Output:
   A   B
0  1  16
1  4  25
2  9  36
3. Explain the concept of meta classes in Python and how they are used to customize class
creation. Provide an example of a simple meta class that adds a new attribute to a class.
Answer: Meta classes in Python are classes that create classes. They are used to customize
the creation of classes, allowing you to modify or extend the class definition before it's
created. A meta class is a class that inherits from type and defines a __new__ method,
which is responsible for creating the new class.
Here's a simple example of a meta class that adds a new attribute to a class:
class AddAttributeMeta(type):
    def __new__(cls, name, bases, namespace):
        # Create the new class
        new_class = super().__new__(cls, name, bases, namespace)
        # Add a new attribute to the class
        new_class.new_attribute = "This attribute was added by the metaclass"
        return new_class

class MyClass(metaclass=AddAttributeMeta):
    pass

print(MyClass.new_attribute)  # Output: This attribute was added by the metaclass
In this example, the AddAttributeMeta metaclass inherits from type and defines a __new__
method. This method is called when a new class is created using the metaclass. The method
creates the new class using the super().__new__ call, and then adds a new attribute
new_attribute to the class.
The MyClass class is created using the AddAttributeMeta metaclass, and as a result, it has
the added attribute new_attribute.
Metaclasses are powerful tools for customizing class creation and can be used for a wide range of tasks.
However, metaclasses can also make the code harder to understand and debug, so they
should be used judiciously and only when necessary.
4. Describe the difference between the functools.partial function and the functools.reduce
function. Provide an example of when you would use each.
Answer: The functools.partial and functools.reduce functions in Python are both higher-
order functions, but they serve different purposes:
functools.partial:
- Creates a new function that "partially applies" a given function by fixing some of its
arguments.
- Returns a new function that can be called with the remaining arguments.
Example:
from functools import partial

def add(x, y, z):
    return x + y + z

add_5_3 = partial(add, 5, 3)   # x and y fixed to 5 and 3
result = add_5_3(2)            # calls add(5, 3, 2)
print(result)                  # Output: 10
In this example, partial creates a new function add_5_3 that has x and y fixed to 5 and 3,
respectively. When we call add_5_3(2), it's equivalent to calling add(5, 3, 2).
functools.reduce:
Example:
from functools import reduce
numbers = [1, 2, 3, 4, 5]
product = reduce(lambda x, y: x * y, numbers)
print(product) # Output: 120
In this example, reduce applies the multiplication function to the elements of the list,
starting from the first two elements (1 and 2), then to the result (2) and the next element (3),
and so on, until the final result is calculated.
In summary:
- functools.partial creates a new function with some arguments fixed, while
- functools.reduce applies a function to an iterable, reducing it to a single output.
Use partial when you need to create a new function with some arguments pre-set, and
reduce when you need to aggregate values from an iterable using a binary function.
5.Explain the concept of descriptor protocols in Python and how they are used to implement
advanced attribute access and behavior. Provide an example of a simple descriptor that
implements a read-only attribute.
Answer: Descriptors in Python are a way to customize attribute access and behavior. They
are special objects that implement the descriptor protocol, which consists of the __get__,
__set__, and __delete__ methods. These methods are called by Python when an attribute is
accessed, set, or deleted, respectively.
Descriptors are used to implement advanced attribute access and behavior. Here is a simple example of a descriptor that implements a read-only attribute:
class ReadOnlyDescriptor:
    def __init__(self, value):
        self.value = value
    def __get__(self, instance, owner):
        # Return the stored value when the attribute is read
        return self.value
    def __set__(self, instance, value):
        # Prevent the attribute from being changed
        raise AttributeError("This attribute is read-only")

class MyClass:
    my_attribute = ReadOnlyDescriptor("Initial value")

obj = MyClass()
print(obj.my_attribute)         # Output: Initial value
obj.my_attribute = "New value"  # Raises AttributeError
In this example, the ReadOnlyDescriptor class implements the descriptor protocol. The
__get__ method returns the value of the attribute, and the __set__ method raises an
AttributeError to prevent the attribute from being set.
UNIT-4: DATA SCIENCE
DATA SCIENCE
Data science is a field that combines statistics, computer science, and domain expertise to
extract insights and knowledge from data. It involves using various techniques such as
machine learning, data visualization, and statistical modeling to analyze and interpret
complex data sets.
These are just a few examples of the many uses of data science. The field is constantly
evolving and has the potential to transform various industries and aspects of our lives.
Here are some of the advantages and disadvantages of data science
Advantages:
- Better decision-making: Data science helps businesses and organizations make
better-informed decisions.
- Improved efficiency: Data science can help companies and organizations streamline
their operations by identifying inefficiencies and areas for improvement.
- Enhanced customer experience: Data science can help businesses and organizations
tailor their products and services to better meet the needs of their target audience.
- Predictive analytics: Data science can be used for predictive analytics, which
involves using data to forecast future trends and outcomes.
- Innovation and new discoveries: Data science can lead to new discoveries and
innovations by revealing previously unknown relationships and insights in data.
Disadvantages:
- Data privacy concerns: There is a risk of data privacy concerns when data is collected
and analyzed.
- Bias in data: Data can be biased due to many factors, such as the selection of the data
or the way it is collected.
- Misinterpretation of data: Data science involves complex statistical analysis, which
can sometimes lead to misinterpretation of the data.
- Data quality issues: Data science depends on the quality of the data used. If the data
is not accurate, complete or consistent, it can lead to incorrect results.
- Cost and time: Data science can be time-consuming and expensive.
3. What is the term for the process of cleaning and preparing data?
Answer: Data preprocessing.
2. Which algorithm is used for finding the most important features in a dataset?
Answer: Principal Component Analysis (PCA).
3. What is the name of the technique used to handle missing values in a dataset?
Answer: Imputation.
5. What is the name of the popular data science tool used for data manipulation and
analysis?
Answer: Pandas.
9. What is the name of the technique used to reduce the dimensionality of a dataset?
Answer: Feature selection.
11.What is the name of the popular data science library used for machine learning?
Answer: scikit-learn.
- Deployment and maintenance
2. Explain the concept of overfitting in machine learning and how it can be prevented.
Answer: Overfitting occurs when a model is too complex and performs well on
training data but poorly on new data. Techniques to prevent overfitting include:
- Regularization (L1, L2)
- Early stopping
- Data augmentation
- Cross-validation (see the sketch after this list)
- Ensemble methods
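A minimal sketch of one of the techniques above, cross-validation, assuming scikit-learn and made-up study-hours data; the scores estimate how the model performs on data it has not seen during training.

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Made-up data: [hours studied] -> pass (1) / fail (0)
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = DecisionTreeClassifier()
scores = cross_val_score(model, X, y, cv=5)   # train and test on 5 different splits

print(scores)          # one accuracy score per split
print(scores.mean())   # average performance estimate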
5. Explain the concept of bias and variance in machine learning and how they affect
model performance.
Answer: Bias refers to systematic error, while variance refers to the model's sensitivity to the training data. High bias leads to underfitting, while high variance leads to overfitting. The goal is to balance bias and variance to achieve optimal model performance.
UNIT-5: COMPUTER VISION
Computer vision is a branch of AI that enables computers to derive meaningful information from images, videos, and other visual inputs.
Computer vision works much like the human eye: it enables a machine to see through images or visual data and to process and analyse them, on the basis of algorithms and methods, in order to understand actual phenomena from images.
Facial recognition
Facial recognition is most frequently used in smartphones. It is a technology to remember and verify a person, object, etc. from visuals by comparing them against given pre-defined data. Such mechanisms are often used for security and safety purposes.
For example, face security locks in devices and traffic cameras are some examples that use facial recognition.
Facial filters
Modern-day social media apps like Snapchat and Instagram use this kind of technology to extract facial landmarks and process them using AI to get the best result.
Google Lens
To search data, Google Lens uses Computer Vision to capture and compare different features of the input image against a database of images and then returns the search results.
Automotive
The machinery in industries now uses Computer Vision. Automated cars are equipped with sensors and software which can detect movement in 360 degrees, determine the location, detect objects and establish the depth or dimensions of the surrounding world.
For eg: Companies like Tesla are now interested in developing self-driving cars.
Medical Imaging
For the last few decades, computer vision applications in medical imaging have been a trustworthy help for physicians and doctors. They create and analyse images and help doctors with their interpretation.
The application is used to read and convert 2D scan images into interactive 3D models.
Computer vision applications perform certain tasks on the data or input provided by the user so that the system can process and analyse the situation and predict the outcome.
Single object – Image Classification: Image Classification is the task of identifying an object in the input image and labelling it from a predefined category.
Multiple objects – Object Detection: Object detection tasks extract features from the input and use learned formulas to recognize instances of an object category.
Basics of Images
The word “pixel” means a picture element.
Pixels
• Pixels are the fundamental element of a photograph.
• They are the smallest unit of information that make up a picture.
• They are typically arranged in a 2-dimensional grid.
• In general terms, the more pixels you have, the more closely the image resembles the original.
Resolution
• The number of pixels in an image, i.e. the area covered by the pixels, is conventionally known as the resolution.
• For eg: 1080 x 720 pixels is a resolution giving the number of pixels in the width and height of that picture.
• A megapixel is a million pixels.
Pixel value
• Pixel value represents the brightness of the pixel.
• The range of a pixel value is 0 to 255 (2^8 − 1 = 255),
• where 0 is taken as black (no colour) and 255 is taken as white.
Computer systems work only with ones and zeros (the binary system). Each bit in a computer system can hold either a zero or a one. Each pixel of such an image uses 1 byte (8 bits), and since each bit can have two possible values, the 8 bits can represent 256 possible values, ranging from 0 to 255.
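A minimal sketch (assuming NumPy) of how a tiny grayscale image can be represented as pixel values between 0 and 255.

import numpy as np

# A made-up 3 x 3 grayscale "image": 0 = black, 255 = white, values in between = gray
image = np.array([[  0, 128, 255],
                  [ 64, 128, 192],
                  [255, 128,   0]], dtype=np.uint8)

print(image.shape)   # (3, 3) – the image is 3 pixels by 3 pixels
print(image[0, 2])   # 255 – the top-right pixel is white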
Grayscale Images
• Grayscale images are images which have a range of shades of gray without apparent colour.
• The lightest shade is white (total presence of colour, 255) and the darkest shade is black (0).
• Intermediate shades of gray have equal brightness levels of the three primary colours (RGB).
• Computers store the images we see in the form of these numbers.
RGB colours
• All coloured images are made up of three primary colours: Red, Green and Blue.
• All other colours are formed by using these primary colours in different proportions.
• Computers store RGB images in three different channels called the R channel, G channel and B channel.
Image Features
• A feature is a description of an image.
• Features are the specific structures in the image such as points, edges or objects.
• Other examples of features are related to tasks of CV motion in image sequences, or
to shapes defined in terms of curves or boundaries between different image regions.
OpenCV, or the Open Source Computer Vision Library, is a tool that helps a computer extract these features from images. It is capable of processing images and videos to identify objects, faces, or even handwriting.
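A minimal sketch of using OpenCV to read an image and convert it to grayscale; the file name "sample.jpg" is only a placeholder for any image stored on the computer.

import cv2

# Read a colour image; OpenCV loads it as an array of B, G, R pixel values
image = cv2.imread("sample.jpg")   # placeholder file name

if image is not None:
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # convert to grayscale
    print("Colour image shape   :", image.shape)      # (height, width, 3 channels)
    print("Grayscale image shape:", gray.shape)       # (height, width)
else:
    print("Image file not found")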
4. Which of the following tasks involves identifying and locating objects within
an image?
A) Image compression
B) Feature extraction
C) Object detection
D) Image enhancement
5. A _______________ is a technology based on computer vision that
identifies, verifies, or matches a digital image of a human face against a
database of stored facial images.
6. What does segmentation in Computer Vision refer to?
A) Enhancing image details
B) Dividing an image into parts or regions
C) Reducing image size
D) Increasing image resolution
7. A grayscale image represents intensity values ranging from 0 to ________.
8. What is feature extraction in Computer Vision?
A) Reducing image noise
B) Identifying and describing relevant characteristics from an image
C) Increasing image contrast
D) Storing image data
9. What is the main function of the Google Translate App when interpreting
foreign language signs?
A) To provide dictionary definitions
B) To teach grammar rules
C) To translate text into your preferred language almost instantly
D) To convert voice to text
10. What does the pixel value represent in a grayscale image?
A) The color
B) The intensity
C) The contrast
D) The brightness
11. Which of the following is a common use of Computer Vision in medical
imaging?
A) Audio transcription
B) Image segmentation
C) Video streaming
D) Data encryption
12. Which of the following is a primary color in the RGB color model?
A) Yellow B) Cyan
C) Green D) Magenta
13. What is a common application of Computer Vision in security systems?
A) Document editing
B) Video streaming
C) Facial recognition
D) Web browsing
14. Which format is typically used to store a color image in digital form?
A) Grayscale
B) Binary
C) RGB
D) Indexed
15. ____________ is the core technology behind the development of
autonomous vehicles
16. ____________allows you to point your phone’s camera at the words and tell
you what it means in your preferred language almost instantly.
17. True/False
A higher resolution in an image implies less detail.
Assertion Reasoning Questions
18. Assertion (A): Computer vision is a field of artificial intelligence that enables
computers to interpret and make decisions based on visual data from the world.
Reasoning (R): Computer vision uses algorithms to process and analyse images
and videos, enabling tasks like object detection and facial recognition.
19. Assertion (A): Image classification is the process of categorizing and labeling
groups of pixels or vectors within an image based on specific rules.
Reasoning (R): Image classification is a crucial step in medical imaging,
allowing for the diagnosis of diseases from X-rays or MRI scans.
A. Both A and R are true, and R is the correct explanation for A.
B. Both A and R are true, but R is not the correct explanation for A.
C. A is true, but R is false.
D. A is false, but R is true.
E. Both A and R are false.
20 Assertion (A): Computer vision can be used in automated quality inspection in
manufacturing industries.
Reasoning (R): Automated quality inspection systems use computer vision to
identify defects or irregularities in products on a production line.
1. Case Study: Autonomous Vehicles
Consider a smart surveillance system that employs Computer Vision for security
purposes. Explain how object detection and facial recognition are used in this
system to enhance security. What ethical considerations should be considered
when implementing such a system?
4. Case Study: Digital Image Restoration
9. A pixel is the smallest unit of a digital image, representing a single point in the
image with a specific color or intensity. Pixels are important because they
collectively form the entire image, determining its resolution and detail.
10. Resolution refers to the number of pixels in an image, usually measured in
pixels per inch (PPI). Higher resolution means more pixels and greater detail,
resulting in better image quality.
11. Grayscale images consist of shades of gray, ranging from black to white, with
each pixel representing an intensity value. RGB images use three color channels
(Red, Green, Blue), where each pixel is a combination of these three colors,
allowing for a wide range of colors in the image.
12. In a grayscale image, the pixel value is represented by an intensity level ranging
from 0 to 255, where 0 represents black, 255 represents white, and values in
between represent different shades of gray.
13. In an RGB image, each pixel has three color channels (Red, Green, Blue). The
intensity of each channel determines the final color of the pixel. By combining
different intensities of these three channels, a wide range of colors can be
represented.
14. High-resolution images provide more detail and clarity, which can improve the
accuracy of Computer Vision tasks such as object detection, recognition, and
segmentation, as they allow for better feature extraction and analysis.
15. Pixel density, measured in pixels per inch (PPI), affects the sharpness and clarity
of an image. Higher pixel density means more pixels are packed into a given
area, resulting in a crisper and more detailed image, which is particularly
important for high-quality displays and prints.
2. Pixel Value: In digital images, pixel value represents the intensity or color
information of a pixel. In grayscale images, it ranges from 0 (black) to 255
(white). In RGB images, it is defined by the intensities of red, green, and blue
channels.
Resolution: Resolution refers to the number of pixels in an image, typically
measured in pixels per inch (PPI). Higher resolution means more pixels and
greater detail, enhancing image clarity and quality.
Color Channels: In RGB images, each pixel is composed of three-color
channels (red, green, blue). The combination of these channels at varying
intensities produces a wide range of colors. High-quality images require
accurate representation of these color channels.
Collective Impact: High pixel values, resolution, and well-defined color
channels contribute to a detailed, sharp, and color-rich image. Lower values or
resolution can result in blurred, pixelated, or distorted images, reducing visual
quality and effectiveness in Computer Vision tasks.
5. Object Detection: This task involves identifying and locating objects within an
image, providing bounding boxes around detected objects. It focuses on
detecting multiple objects and their positions.
Image Classification: This task assigns a single label to an entire image based
on its content. It does not provide the locations of objects, only categorizes the
image as a whole.
Image Segmentation: This task divides an image into segments, each
representing a different object or region. It provides pixel-level classification,
offering detailed information about the structure and boundaries within the
image.
Case Study/Application-Based Questions on Computer Vision- 5 marks-
Answers
1. Computer Vision helps in lane detection by using cameras to identify lane
markings on the road, ensuring the vehicle stays within its lane. Pedestrian
recognition involves detecting and tracking pedestrians to avoid collisions.
Traffic sign recognition uses image processing to identify and interpret traffic
signs, allowing the vehicle to respond accordingly. Challenges in adverse
weather conditions include reduced visibility and accuracy. These can be
mitigated by using additional sensors such as radar and LIDAR, as well as
implementing advanced algorithms to enhance image processing in poor
visibility.
2. Computer Vision algorithms can analyze medical images to detect abnormalities
like tumors by identifying unusual patterns and shapes that indicate the presence
of disease. The advantages include faster and more accurate diagnosis, early
detection of diseases, and improved treatment planning. This technology reduces
the workload on medical professionals and increases the chances of successful
treatment by identifying issues at an early stage.
3. Object detection is used to identify and monitor objects within the surveillance
area, alerting security personnel to any suspicious activity. Facial recognition
identifies individuals by comparing captured images with a database of known
faces, enhancing security by recognizing potential threats. Ethical considerations
include privacy concerns, potential biases in recognition algorithms, and the
need for transparency and accountability in how the data is used and stored.
4. Understanding pixel values helps in identifying the intensity and color
information of each pixel, which is essential for correcting damaged areas.
Resolution knowledge is important for maintaining image detail during
restoration. Color channels are used to accurately restore the colors in RGB
images. Computer Vision enhances quality by using algorithms to fill in missing
parts, correct color imbalances, and sharpen details, resulting in a restored image
that closely resembles the original.
5. Computer Vision can track inventory levels in real-time by analyzing shelf
images, ensuring timely restocking. It can analyze customer behavior by
monitoring movement patterns and product interactions, helping in optimizing
store layout and marketing strategies. Automated checkout systems use image
recognition to identify products and streamline the payment process. Benefits
include increased efficiency, reduced labor costs, and improved customer
satisfaction. Challenges include the high cost of implementation, potential
technical issues, and ensuring data privacy and security.
UNIT-6: NATURAL LANGUAGE PROCESSING (NLP)
INTRODUCTION
Computers can understand the structured form of data like spreadsheets and the
tables in the database, but human languages, texts, and voices form an unstructured
category of data, and it gets difficult for the computer to understand it, and there
arises the need for Natural Language Processing.
In NLP, we can break down the process of understanding English for a model into
a number of small pieces.
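As a small illustration of breaking text into smaller pieces, here is a minimal sketch assuming the NLTK library is installed and its tokenizer data has been downloaded.

import nltk
# nltk.download("punkt")   # one-time download of the tokenizer data

text = "Computers cannot understand unstructured text directly. NLP breaks it into smaller pieces."

sentences = nltk.sent_tokenize(text)   # split the text into sentences
words = nltk.word_tokenize(text)       # split the text into words (tokens)

print(sentences)
print(words)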
Applications of Natural Language Processing
1. Chatbots
First, they identify the meaning of the question asked and collect all the data
from the user that may be required to answer the question. Then they answer the
question appropriately.
Have you noticed that search engines tend to guess what you are
typing and automatically complete your sentences? For example,
on typing “game” in Google, you may get further suggestions for
“game of thrones”, “game of life” or if you are interested in maths
then “game theory”. All these suggestions are provided using auto
complete that uses Natural Language Processing to guess what you
want to ask. Search engines use their enormous data sets to
analyze what their customers are probably typing when they enter
particular words and suggest the most common possibilities. They
use Natural Language Processing to make sense of these words
and how they are interconnected to form different sentences.
3. Voice Assistants
These days voice assistants are all the rage! Whether it's Siri, Alexa, or Google Assistant, almost everyone uses one of these to make calls, place reminders, schedule meetings, set alarms, surf the internet, etc. These voice assistants have made life much easier. But how do they work? They use a complex combination of speech recognition, natural language understanding, and natural language processing to understand what humans are saying and then act on it.
4. Language Translator
Language translators, such as Google Translate, use Natural Language Processing to convert text or speech from one language to another.
5. Grammar Checkers
Grammar checkers have become an important tool for any professional writer. They can not only correct grammar and check spellings but also suggest better synonyms and improve the overall readability of your content.
And guess what, they utilize natural language processing to
provide the best possible piece of writing! The NLP algorithm
is trained on millions of sentences to understand the correct
format. That is why it can suggest the correct verb tense, a
better synonym, or a clearer sentence structure than what you
have written. Some of the most popular grammar checkers that
use NLP include Grammarly, WhiteSmoke, ProWritingAid,
etc.
6. Sentiment Analysis
Almost all the world is on social media these days! And companies can use
sentiment analysis to understand how a particular type of user feels about a
particular topic, product, etc. They can use natural language processing,
computational linguistics, text analysis, etc. to understand the general sentiment of
the users for their products and services and find out if the sentiment is good, bad,
or neutral. Companies can use sentiment analysis in a lot of ways such as to find
out the emotions of their target audience, to understand product reviews, to gauge
their brand sentiment, etc. And not just private companies, even governments use
sentiment analysis to find popular opinion and also catch out any threats to the
security of the nation.
7. Email Filtering
Emails are still the most important method for professional communication. However, all of us still get thousands of promotional emails that we don't want to read. Thankfully, our emails are automatically divided into 3 sections namely,
Primary, Social, and Promotions which means we never have to open the
Promotional section! But how does this work? Email services use natural language
processing to identify the contents of each Email with text classification so that it
can be put in the correct section. This method is not perfect since there are still
some Promotional newsletters in Primary, but it’s better than nothing. In more
advanced cases, some companies also use specialty anti-virus software with natural
language processing to scan the emails and see if there are any patterns and phrases
that may indicate a phishing attempt on the employees.
8. Text Summarization
Text summarization is the process of creating a shorter version of the text with
only vital information and thus, helps the user to understand the text in a shorter
amount of time. The main advantage of text summarization lies in the fact that it
reduces user’s time in searching the important details in the document.
9. Text Classification
Text classification assigns a category or label to a piece of text, for example marking an email as Primary, Social or Promotions.
Text Normalisation
In Text Normalisation, we undergo several steps to normalise the text to a lower level. The first step is sentence segmentation, in which the whole corpus is divided into individual sentences.
Tokenisation
After segmenting the sentences, each sentence is then further divided into tokens. A token is the term used for any word, number or special character occurring in a sentence. Under tokenisation, every word, number and special character is considered separately, and each of them becomes a separate token.
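For practice, tokenisation can be tried out in Python with NLTK (a minimal sketch, assuming NLTK and its 'punkt' tokenizer data are installed; the sentence is only a sample):
import nltk
nltk.download('punkt')                          # one-time download of the tokenizer models
from nltk.tokenize import sent_tokenize, word_tokenize
text="Raj likes to play football. Vijay prefers to play online games."
print(sent_tokenize(text))                      # sentence segmentation
print(word_tokenize(text))                      # every word and punctuation mark becomes a token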
Removing Stop words, Special characters and Numbers
In this step, the tokens which are not necessary are removed from the token list.
What can be the possible words which we might not require?
Stop words are the words in any language which do not add much meaning to a sentence. They can safely be ignored without sacrificing the meaning of the sentence.
Humans use grammar to make their sentences meaningful for the other person to understand. But grammatical words do not add any essence to the information which is to be transmitted through the statement, hence they come under stop words. Some examples of stop words are: a, an, and, are, is, of, the, to, etc.
These words occur the most in any given sentence but say very little or nothing about its context or meaning. Hence, to make it easier for the computer to focus on meaningful terms, these words are removed.
Along with these words, the sentence might have special characters and/or
numbers. Now it depends on the type of sentence in the documents that we are
working on whether we should keep them in it or not. For example, if you are
working on a document containing email IDs, then you might not want to remove
the special characters and numbers whereas in some other textual data if these
characters do not make sense, then you can remove them along with the stop
words.
Converting text to a common case
After the stop words removal, we convert the whole text into a similar case, preferably lower case. This ensures that the machine does not consider the same words as different just because they are written in different cases.
Here, in this example, all the 6 forms of 'hello' would be converted to lower case and hence would be treated as the same word by the machine.
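A small illustrative sketch of these two steps using NLTK's English stop word list (assuming the 'stopwords' corpus has been downloaded; the tokens are from the example used later in this unit):
import nltk
nltk.download('stopwords')                      # one-time download of the stop word lists
from nltk.corpus import stopwords
tokens=["Aman","and","Anil","are","stressed"]
tokens=[t.lower() for t in tokens]              # convert every token to lower case
stop=set(stopwords.words('english'))
filtered=[t for t in tokens if t.isalpha() and t not in stop]   # drop stop words, numbers and special characters
print(filtered)                                 # ['aman', 'anil', 'stressed']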
Stemming
In this step, the remaining words are reduced to their root words. In other words, stemming is the process in which the affixes of words are removed and the words are converted to their base form.
Note that in stemming, the stemmed words (the words we get after removing the affixes) might not be meaningful. Here, in this example, as you can see: healed, healing and healer were all reduced to heal, but studies was reduced to studi after the affix removal, which is not a meaningful word. Stemming does not check whether the stemmed word is meaningful or not; it just removes the affixes, hence it is faster.
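The same behaviour can be observed with NLTK's PorterStemmer (an illustrative sketch; the exact stems depend on the stemmer used):
from nltk.stem import PorterStemmer
ps=PorterStemmer()
for w in ["healed","healing","studies"]:
    print(w,"->",ps.stem(w))                    # healed and healing reduce to 'heal'; studies becomes 'studi'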
Lemmatization
Stemming and lemmatization are alternative processes to each other, as the role of both is the same – removal of affixes. The difference between them is that in lemmatization, the word we get after affix removal (also known as the lemma) is a meaningful one. Lemmatization makes sure that the lemma is a word with meaning, and hence it takes longer to execute than stemming.
As you can see in the same example, the output for studies after affix removal has become study instead of studi.
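A comparable sketch with NLTK's WordNet lemmatizer (assuming the 'wordnet' data has been downloaded; passing a part-of-speech tag for 'healing' is only for illustration):
import nltk
nltk.download('wordnet')                        # one-time download of the WordNet dictionary
from nltk.stem import WordNetLemmatizer
lem=WordNetLemmatizer()
print(lem.lemmatize("studies"))                 # 'study' - the lemma is a meaningful word
print(lem.lemmatize("healing", pos="v"))        # 'heal' when treated as a verb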
Difference between stemming and lemmatization can be summarized by this
example:
With this we have normalised our text to tokens, which are the simplest form of words. Now it is time to convert the tokens into numbers. For this, we use the Bag of Words algorithm.
This image gives us a brief overview of how bag of words works. As you can see at the right, it shows a list of words appearing in the corpus, and the number corresponding to each word shows how many times that word has occurred in the text body. Thus, we can say that the bag of words gives us two things:
1. A vocabulary of words for the corpus.
2. The frequency of these words (the number of times each has occurred in the whole corpus).
Calling this algorithm a "bag" of words symbolises that the sequence of sentences or tokens does not matter in this case, as all we need are the unique words and their frequencies.
The steps of the Bag of Words algorithm are:
1. Text Normalisation: Collect data and pre-process it.
2. Create Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how many times each word from the unique list of words has occurred.
4. Create document vectors for all the documents.
Let us go through all the steps with an example:
Document 1: Aman and Anil are stressed
Document 2: Aman went to a therapist
Document 3: Anil went to download a health chatbot
Here are three documents having one sentence each. After text normalisation, the text becomes:
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, download, a, health, chatbot]
Note that no tokens have been removed in the stop words removal step. It is
because we have very little data and since the frequency of all the words is almost
the same, no word can be said to have lesser value than the other.
Note that even though some words are repeated in different documents, they are all
written just once as while creating the dictionary, we create the list of unique words.
In this step, the vocabulary is written in the top row. Now, for each word in the
document, if it matches with the vocabulary, put a 1 under it. If the same word
appears again, increment the previous value by 1. And if the word does not
occur in that document, put a 0 under it.
Since in the first document, we have words: aman, and, anil, are, stressed. So, all
these words get a value of 1 and the rest of the words get a 0 value.
Same exercise has to be done for all the documents. Hence, the table becomes:
In this table, the header row contains the vocabulary of the corpus and three rows
correspond to three different documents. Take a look at this table and analyze the
positioning of 0s and 1s in it.
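For practice, the same document vector table can be generated with a short Python sketch (an illustration, not the textbook's code; the three normalised documents are the ones used above):
docs=[["aman","and","anil","are","stressed"],
      ["aman","went","to","a","therapist"],
      ["anil","went","to","download","a","health","chatbot"]]
vocab=[]
for d in docs:
    for w in d:
        if w not in vocab:
            vocab.append(w)                     # dictionary of unique words (vocabulary)
vectors=[[d.count(w) for w in vocab] for d in docs]   # one document vector per document
print(vocab)
print(vectors)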
Finally, this gives us the document vector table for our corpus. But the tokens have still not been converted to numbers. This leads us to the final step of our algorithm: TFIDF.
Bag of words algorithm gives us the frequency of words in each document we have
in our corpus. It gives us an idea that if the word is occurring more in a document,
its value is more for that document. For example, if I have a document on air
pollution, air and pollution would be the words which occur many times in it. And
these words are valuable too as they give us some context around the document.
But let us suppose we have 10 documents and all of them talk about different issues.
One is on women empowerment, the other is on unemployment and so on. Do you
think air and pollution would still be one of the most occurring words in the whole
corpus? If not, then which words do you think would have the highest frequency in
all of them?
And, this, is, the, etc. are the words which occur the most in almost all the
documents. But these words do not talk about the corpus at all. Though they are
important for humans as they make the statements understandable to us, for the
machine they are a complete waste as they do not provide us with any information
regarding the corpus. Hence, these are termed as stopwords and are mostly removed
at the pre-processing stage only.
Take a look at this graph. It is a plot of occurrence of words versus their value. As
you can see, if the words have highest occurrence in all the documents of the
corpus, they are said to have negligible value hence they are termed as stop words.
These words are mostly removed at the pre- processing stage only. Now as we
move ahead from the stop words, the occurrence level drops drastically and the
words which have adequate occurrence in the corpus are said to have some amount
of value and are termed as frequent words. These words mostly talk about the
document’s subject and their occurrence is adequate in the corpus. Then as the
occurrence of words drops further, the value of such words rises. These words are
termed as rare or valuable words. These words occur the least but add the most
value to the corpus. Hence, when we look at the text, we take frequent and rare
words into consideration.
TFIDF stands for Term Frequency and Inverse Document Frequency. TFIDF helps
in identifying the value for each word.
Term Frequency
Term frequency is the frequency of a word in one document. Here, you can see that the frequency of each word for each document has been recorded in the table. These numbers are nothing but the Term Frequencies!
Now, let us look at the other half of TFIDF which is Inverse Document Frequency.
For this, let us first understand what does document frequency mean. Document
Frequency is the number of documents in which the word occurs irrespective of
how many times it has occurred in those documents. The document frequency for
the exemplar vocabulary would be:
Here, you can see that the document frequency of ‘aman’, ‘anil’, ‘went’, ‘to’ and
‘a’ is 2 as they have occurred in two documents. Rest of them occurred in just one
document hence the document frequency for them is one.
Talking about inverse document frequency, we put the document frequency in the denominator while the total number of documents is the numerator. Here, the total number of documents is 3, hence the inverse document frequency becomes 3/2 for 'aman', 'anil', 'went', 'to' and 'a', and 3/1 for the rest of the words.
Here, you can see that the IDF value for 'aman' is the same in each row, and a similar pattern is followed for all the words of the vocabulary. After calculating all the values, we get the TFIDF table.
Finally, the words have been converted to numbers. These numbers are the TFIDF values of each word for each document. Here, you can see that since we have a small amount of data, words like 'are' and 'and' also have a high value. In general, the more documents a word occurs in, the lower its value becomes. For example, if a corpus had 10 documents and the word 'pollution' occurred in 3 of them, then IDF(pollution) = 10/3 = 3.3333, which means log(3.3333) = 0.522; this shows that the word 'pollution' has considerable value in the corpus.
Summarizing the concept, we can say that:
Words that occur in all the documents with high term frequencies have the least
values and are considered to be the stop words.
For a word to have high TFIDF value, the word needs to have a high term
frequency but less document frequency which shows that the word is important for
one document but is not a common word for all documents.
These values help the computer understand which words are to be considered while
processing the natural language. The higher the value, the more important the word
is for a given corpus.
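As a quick illustration (a sketch only, using the same three documents and this unit's formula TFIDF(W) = TF(W) * log(IDF(W)), with log taken to base 10):
import math
docs=[["aman","and","anil","are","stressed"],
      ["aman","went","to","a","therapist"],
      ["anil","went","to","download","a","health","chatbot"]]
vocab=sorted({w for d in docs for w in d})
N=len(docs)                                     # total number of documents
for w in vocab:
    df=sum(1 for d in docs if w in d)           # document frequency of the word
    for i,d in enumerate(docs,1):
        value=d.count(w)*math.log10(N/df)       # TFIDF(W) = TF(W) * log(IDF(W))
        print("document",i,w,round(value,3))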
Applications of TF-IDF
Topic Modelling: It helps in predicting the topic for the corpus. Topic modelling refers to a method of identifying short and informative descriptions of a document in a large collection, which can further be used for various text mining tasks such as summarisation, document classification, etc.
NLTK:
NLTK is a leading platform for building Python programs to work with human
language data. It provides easy-to-use interfaces to over 50 corpora and lexical
resources such as WordNet, along with a suite of text processing libraries for
classification, tokenization, stemming, tagging, parsing, and semantic reasoning,
wrappers for industrial-strength NLP libraries, and an active discussion forum.
NLTK is suitable for linguists, engineers, students, educators, researchers, and
industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best
of all, NLTK is a free, open source, community-driven project.
NLTK has been called “a wonderful tool for teaching, and working in,
computational linguistics using Python,” and “an amazing library to play with
natural language.”
Installing NLTK
Install Python 3.8: https://www.python.org/downloads/ (avoid the 64-bit versions)
Install Numpy (optional): https://www.scipy.org/scipylib/download.html
Install NLTK: https://pypi.python.org/pypi/nltk
Test installation: Start > Python38, then type import nltk
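Once installed, the set-up can be verified and the data files used in the examples above can be fetched (an illustrative sketch; the resource names are the standard NLTK package identifiers):
import nltk
print(nltk.__version__)                         # confirms that NLTK can be imported
nltk.download('punkt')                          # tokenizer models
nltk.download('stopwords')                      # stop word lists
nltk.download('wordnet')                        # dictionary used by the lemmatizer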
QUESTION BANKS – MCQS:
1. What is NLTK tool in Python?
(a) Natural Linguistics Tool
(b) Natural Language Toolkit
(c) Neutral Language Kit
(d) Neutral Language Toolkit
2. TF-IDF in NLP is defined as:
a. Term Frequency and Definite Frequency
b. Term Frequency and Indefinite Frequency
c. Term Frequency and Inverse Document Frequency
d. Term Frequency and Integrated Document Frequency
3. What do we call the process of dividing a string into component words?
a. Regression
b. Word Tokenization
c. Classification
d. Clustering
4. “Converting text to a common case” is a step in Text Normalisation. (True/False)
5. The higher the value, the more important the word in the document – this is
true of which model?
(a) Bag of Words
(b)TF-IDF
(c) YOLO
(d) SSD
6. Which of these is not an NLP library?
(a) NLTK
(b) NLP Kit
(c) Open NLP
(d) NLP Suite
7. What is a chatbot called which uses simple FAQs without any intelligence?
(a) Smart Chatbot
(b) Script Chatbot
(c) AI Chatbot
(d) ML Chatbot
8. What is the process of extracting emotions within a text data using NLP called?
a. Sentiment Analysis
b. Emotional Data Science
c. Emotional Processing
d. Emotional Classification
9. In lemmatization, the word which we get after removing the affixes is called
a. Lemmat
b. Lemma
c. Lemmatiz
d. Lemmatiza
10. _____are the words which occur very frequently in the corpus but do not add
any value to it.
a. Special Characters
b. Stopwords
c. Roman Numbers
d. Useless Words
5. What is the difference between how humans interpret communication and how NLP interprets it?
The communications made by the machines are very basic and simple. Human
communication is complex. There are multiple characteristics of the human
language that might be easy for a human to understand but extremely difficult for a
computer to understand.
Q. No.    Questions    Marks
PART -A
1. Find the odd men out 1
a) Chatbot
b) Grammar Checkers
c) Jabber-wacky
d) PriceGrabber
2. WhiteSmoke is an example of ____________ domain of AI 1
a) Data Science
b) Computer Vision
c) NLP
d) None of these
3. The Email services use __________________________ to identify the 1
contents of each Email with text classification.
a) Grammar checker
b) Natural Language Processing
c) Computer Vision
d) Data Analysis
4. Companies can use _____________ in a lot of ways such as to find out the 1
emotions of their target audience, to understand product reviews, to gauge
their brand sentiment.
a) Sentiment Analysis
b) structure Analysis
c) Deep Learning
d)Emails
5. Two main approaches to summarising text documents are 1
a) Extractive Method & Abstractive Method
b) Classification & Regression
c) Clustering & calculating
d) Chat box & Smart bot
6. Spam filtering in email is an example of ______________ 1
a) Text summarisation
b) Text Classification
c) Sentiment Analysis
d) None of the above
7. Google Assistant, Alexa, Cortana, Siri are examples of 1
a) Script Bot
b) Smart Bot
c) Sling Bot
d) None of these
8. _________ is a term used for any word or number or special character 1
occurring in a sentence in Text Normalisation.
a) Tokens
b) Numbers
c) Common case
d) None of the above
9. In text normalisation, we work on text from multiple documents. The term used for 1
the whole textual data from all the documents altogether is known as _________
a) Corpus
b) Tokens
c) Lemma
d) Stem
10. Using the _________, we can find a vocabulary of words for the corpus and the 1
frequency of these words (number of times it has occurred in the whole corpus).
11. The process of extracting the root form of the word is known as 1
_________
a) Tokenisation
b) Stemming
c) Lemmatisation
d) Segmentation
12. ___________ is a statistical measure that evaluates how relevant a word 1
is to a document in a collection of documents.
a) TF b) IDF c) TF -IDF d) All of these
13. How many tokens are there in the following sentence: 1
“Traffic Jams have become a common part of our lives nowadays. Living
in an urban area means, you have to face traffic each and every time you
get out on the road. Mostly, school students opt for buses to go to school.”
14. NLP stands for ___________________. 1
a) Natural Language Processing
b) Natural Language Program
c) Neural Language Program
d) Natural Learning Program
15. A corpus contains 4 documents in which the word ‘diet’ was appearing 1
once in document 1. Identify the term in which we can categorise the word
‘diet’.
(a) Stop word (b) Rare word
(c) Frequent word (d) Removable word
16. Aditi, a student of class XII developed a chatbot that clarifies the doubts of 1
Economics students. She trained the software with lots of data sets
catering to all difficulty levels. If any student would type or ask questions
related to Economics, the software would give an instant reply. Identify
the domain of AI in the given scenario.
(a) Computer Vision
(b) Data Science
(c) Natural Language Processing
(d) None of these
17. What do you mean by syntax of a language? 1
a) Meaning of a sentence
b) Grammatical structure of a sentence
c) Semantics of a sentence
d) Synonym of a sentence
18. There are 10 documents in which the word “and” appears totally 10 times.
What is the IDF value for “and”
a) 10
b)10/1
c) 1
d) 0
19. The formula of TFIDF for any word W is: 1
a) TFIDF(W) = IDF(W) * log(IDF(W))
b) TFIDF(W) = TF(W) * log(IDF(W))
c) TFIDF(W) = IDF(W) * log(TF(W))
d) TFIDF(W) = IDF(W) * log(DF(W))
27. Write the stem and lemma words for the following: 2
Healing, studies, studying, caring
28. Define Text Summarisation. 2
29. Identify the stop words in the given sentence: 2
Q. No.    Answers    Marks
PART -A
1. d) PriceGrabber 1
2. c) NLP 1
3. b) Natural Language Processing 1
4. a) Sentiment Analysis 1
5. a) Extractive Method & Abstractive Method 1
6. b) Text Classification 1
7. b) Smart Bot 1
8. a) Tokens 1
9. b) Tokens 1
10. Bag of words algorithm 1
11. c) Lemmatisation 1
12. c) TF -IDF 1
13. 47 1
14. a) Natural Language Processing 1
15. (b) Rare word 1
16. (c) Natural Language Processing 1
17. b) Grammatical structure of a sentence 1
18. c) 1
19. b) TFIDF(W) = TF(W) * log(IDF(W)) 1
20. a) Heal 1
PART -B
21. Stemming is the process of removing a part of a word, or reducing it to its 2
stem or root, e.g., in stemming, the word “studies” gets reduced to its
stem ’studi’ with ‘es’ removed; similarly, the word ‘advisable’ gets
reduced to its stem ‘advis’.
22. 1.Text Normalization: Collect data and pre-process it by removing the 2
known stop words.
2. Design the vocabulary. Prepare the corpus (a collection of words)
from the words in the document. The whole collection of textual data
from all the documents is called corpus.
25. Script-bot vs Smart-bot: 2
a. A scripted chatbot doesn’t carry even a glimpse of AI, whereas smart bots are built on NLP and ML.
b. Script bots are easy to make, whereas smart bots are comparatively difficult to make.
c. Script bots work around a script which is programmed in them and their functioning is very limited as they are less powerful, whereas smart bots are flexible and powerful.
d. Script bots need no or little language processing skills, whereas smart bots require NLP and machine learning skills.
e. Script bots have limited functionality (example: the bots deployed in the customer care section of various companies), whereas smart bots have wide functionality (example: Google Assistant, Alexa, Cortana, Siri, etc.).
26. 2
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to be a footballer.
5. Vijay wants to become an online gamer.
27.          Stem     Lemma     2
Healing      Heal     Heal
Studies      Studi    Study
Studying     Studi    Study
Caring       Car      Care
28. Text summarisation is the process of creating a shorter version of the 2
text with only vital information and thus, helps the user to understand the
text in a shorter amount of time. The main advantage of text
summarisation lies in the fact that it reduces user’s time in searching the
important details in the document.
29. is, the, of, that, into, are, and 2
30. NLTK is a Python Package that you can use for NLP. It is a platform 2
used for building Python programs that work with human language data
for applying in statistical natural language processing (NLP). It contains
text processing libraries for tokenisation, parsing, classification,
stemming, tagging and semantic reasoning.
31. Syntax: Syntax refers to the grammatical structure of a sentence. 2
Semantics: It refers to the meaning of the sentence.
32. Term frequency is the frequency of a word in one document. Term 2
frequency can easily be found from the document vector table as in that
table we mention the frequency of each word of the vocabulary in each
document.
33. In Text Normalization, we undergo several steps to normalize the text to 2
a lower level.
That is, we will be working on text from multiple documents and the
term used for the whole textual data from all the documents altogether is
known as corpus.
OR
A corpus is a large and structured set of machine-readable texts that have
been
produced in a natural communicative setting.
OR
A corpus can be defined as a collection of text documents. It can be
thought of as just a bunch of text files in a directory, often alongside
many other directories of text files.
34. No, the vocabulary of a corpus does not remain the same before and after 2
text normalization. Reasons are –
● In normalization the text is normalized through various steps and is
lowered to minimum vocabulary since the machine does not require
grammatically correct statements but the essence of it.
● In normalization Stop words, Special Characters and Numbers are
removed.
● In stemming the affixes of words are removed and the words are
converted to their base form.
So, after normalization, we get the reduced vocabulary.
35. Since we all know that the language of computers is Numerical, the very 2
first step that comes to our mind is to convert our language to
numbers. This conversion takes a few steps to happen. The first step to it
is Text Normalization.
Since human languages are complex, we need to first of all simplify
them in order to make sure that the understanding becomes possible.
Text Normalization helps in cleaning up the textual data in such a way
that it comes down to a level where its complexity is lower than the
actual data.
PART -C
36. a) 4
Document Classification: TF-IDF helps in classifying the type and genre
of a document by looking at the frequencies of words in the text. Based
on the TF-IDF values, it is easy to classify emails as spam or ham, to
classify news as real or fake and so on.
Topic Modelling: It helps in predicting the topic for the corpus. Topic
modelling refers to a method of identifying short and informative
descriptions of a document in a large collection that can further be used
for various text mining tasks such a summarisation, document
classification etc.
Key word Extraction: It is also useful for extracting keywords from text.
Information Retrieval System: To extract the important information out
of a corpus.
Stop word Filtering: It helps in removing unnecessary words out of a text
body
b)
6. Lemmatization: In lemmatization, the word we get after affix removal
(also known as lemma) is a meaningful one. Lemmatization makes sure
that lemma is a word with meaning and hence it takes a longer time to
execute than stemming.
e. Grammar Checkers - They not only correct grammar and check spellings but also suggest better synonyms and improve the overall readability of your content. The NLP algorithm is trained on millions of sentences to understand the correct format. Some of the most popular grammar checkers that use NLP include Grammarly, WhiteSmoke, ProWritingAid, etc.
40. Human language vs Computer language: 4
Human language: It is made up of letters, words and sentences, depending on the language. It is very easy for humans to process and communicate in natural languages like English, Hindi, etc. Our brain keeps on processing the sounds that it hears around itself and tries to make sense out of them all the time.
Computer language: A machine/computer understands the language of numbers (binary numbers – 0’s and 1’s). Everything that is sent to the machine has to be converted to numbers. For machines, understanding and generating natural languages is a very complex process. A computer uses NLP techniques like Text Normalisation and Bag of Words to convert the text to numbers before it can process it.
UNIT-7: EVALUATION
Problem Scoping → Data Acquisition → Data Exploration → Modelling → Evaluation
Evaluation is the final stage in AI Project Cycle. Once a model has been made and
trained, it needs to go through proper testing so that one can calculate the efficiency
and performance of the model. Hence, the model is tested with the help of Testing
Data.
Why do we need evaluation?
Ans-Training data must not be used for evaluation purposes because a model simply
remembers the whole of training data, therefore always predicts the correct output
for any point in the training set whenever training data is fed again. But it gives
very wrong answers if a new dataset is introduced to the model. This situation is
known as overfitting.
1. Prediction: The output given by the machine after training and testing on the data is known as the Prediction. (Output of the machine)
2. Reality: Reality is the real situation or scenario for which the prediction has been made by the machine. (Reality or truth)
We will consider several scenarios for evaluation. Each scenario compares the prediction with the reality.
1. Case 1
Is this a Football?
1. Prediction = YES
2. Reality = YES
3. True Positive
Here, we can see in the picture that it’s a football. The model’s prediction is Yes, which means it’s a football. The Prediction matches the Reality. Hence, this condition is termed as True Positive.
2. Case 2
Is this a Football?
1. Prediction = NO
2. Reality = NO
3. True Negative
Here this is Not an image of Football hence the reality is No. In this case, the
machine has predicted it correctly as a No. Therefore, this condition is termed as
True Negative.
3. Case 3
Is this a Football?
1. Prediction = YES
2. Reality = NO
3. False Positive (Type 1 Error)
Here the reality is that it is not Football. But the machine has incorrectly predicted
that this is Football. This case is termed False Positive.
Another example- You predicted that India won the cricket match series against
England but they lost.
4.Case 4
Is this a Football?
1. Prediction = NO
2. Reality = YES
3. False Negative ( Type 2 Error)
Here, the football appears in a different look, because of which the Reality is Yes but the machine has incorrectly predicted it as a No, which means the machine predicts that it is not a football. Therefore, this case becomes False Negative.
Now these combinations are done by using different metrics. One of them is the
Confusion Matrix.
Confusion Matrix-
1. The comparison between the results of Prediction and reality is called the Confusion
Matrix.
2. It is a record that helps in evaluation.
3. It is not a calculation; it is a performance measurement for machine learning
1) Accuracy: It is the percentage of correct predictions out of all the observations. True Positives and True Negatives are the cases in which the Prediction matches the Reality.
Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
Example:
True Positives = 0, True Negatives = 95, Total cases = 100
Therefore, accuracy becomes: (0 + 95) / 100 = 95%
2. Precision Parameter
It is defined as the percentage of true positive cases versus all the cases where the prediction is true. It takes True Positives and False Positives into account.
Precision = TP / (TP + FP) × 100%
Going back to the football example, in this case, assume that the model always predicts that the object is a Football, irrespective of the reality. All the positive predictions would then be considered, that is, True Positives (the object is a football and the model predicted a football) and False Positives (the object is not a football but the model predicted a football). In this case, the players would have to check every time whether the object really is a football (that is, whether the reality is True or False).
If Precision is high, this means the True Positive cases are more, giving fewer false predictions.
3. Recall Parameter
It is the fraction of positive cases that are correctly identified. It considers the cases where, in Reality, there was a football and the machine either detected it correctly or didn’t. That is, it considers True Positives (there was a football in reality and the model predicted a football) and False Negatives (the object is a football but the model predicts it is not).
Recall = TP / (TP + FN) × 100%
We can see that the numerator in both Precision and Recall is the same: True Positive. But in the denominator, Precision counts the False Positives while Recall takes the False Negatives into consideration.
Which one is more important than another, Precision or Recall?
1. Choosing between Precision and Recall depends on the condition in which
the model has been deployed. In a case like Forest Fire, a False Negative can cost
us a lot and is risky too. Imagine no alert being given even when there is a Forest
Fire. The whole forest might burn down.
2. Another case where a False Negative can be dangerous is Viral Outbreak.
Imagine a deadly virus has started spreading and the model which is supposed to
predict a viral outbreak does not detect it. The virus might spread widely and
infect a lot of people.
3. On the other hand, there can be cases in which the False Positive
condition costs us more than False Negatives. One such case is Mining. Imagine a
model telling you that there exists treasure at a point and you keep on digging
there but it turns out that it is a false alarm. Here, the False Positive case
(predicting there is a treasure but there is no treasure) can be very costly.
4. Consider a model that predicts whether a mail is spam or not. If the model
always predicts that the mail is spam, people would not look at it and eventually
might lose important information. Here also False Positive condition (Predicting
the mail as spam while the mail is not spam) would have a high cost.
- If we want to know if our model`s performance is good, we need these two
measures: Precision and Recall. For some cases, you might have High precision but
Low Recall or Low Precision but High Recall. But since both the measures are
important, there is a need for a parameter which takes both Precision and Recall into
account.
4. F1 Score
It can be defined as the measure of balance between Precision and Recall.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
The ideal situation is when we have a value of 1 for both Precision and Recall; then the F1 score is also 1 (100%), which is known as the perfect value for the F1 Score. A model has good performance if its F1 Score is high.
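All four measures can be computed directly from the counts in the confusion matrix. Below is a small illustrative Python sketch (the counts passed in are only sample values):
def evaluate(TP, TN, FP, FN):
    # evaluation helper based on the formulas above
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(evaluate(TP=60, TN=10, FP=25, FN=5))      # sample counts -> approximately (0.7, 0.706, 0.923, 0.80)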
1 Mark question
1. The process of understanding the reliability of any AI model, based on output, by feeding the test dataset is
a. Data feed
b. Data Reliability
c. Model Evaluation
d. None of these
2. The percentage of true positive cases versus all the cases where the prediction is true is defined as
a. Precision
b. Accuracy
c. F1 Score
d. None of these
3. The percentage of correct predictions out of all observations is
a. Prediction
b. Accuracy
c. F1 Score
d. None of these
4. The result of comparison between the prediction and reality is recorded in
a. F1 Score
b. Confusion matrix
c. Evaluation Model
d. All of these
5. The measure of balance between precision and recall is
a. Accuracy
b. F1 Score
c. Precision
d. None of these
6. Which of the following talks about how true the predictions are by any model?
a. Accuracy
b. Reliability
c. Recall
d. F1 Score
7. Which of the following parameters will be considered by recall while evaluating a model's performance?
i. False negative
ii. True negative
iii. False positive
iv. True Positive
Choose the correct option:
a. only (i)   b. (ii) and (iii)   c. (iii) and (iv)   d. (i) and (iv)
8. The output given by the AI machine is known as ________
a. Prediction
b. Reality
9. Which of the following statements is not true about overfitting models?
(a) This model learns the pattern and noise in the data to such an extent that it harms the performance of the model on a new dataset
(b) Training result is very good and the test result is poor
(c) It interprets noise as patterns in the data
(d) The training accuracy and test accuracy both are low
10. Seema is learning the conditions that make up the confusion matrix. She came across a scenario in which the machine that was supposed to predict a bird was always predicting a bird. What is this condition called?
a. False Positive
b. True Positive
c. False Negative
d. True Negative
11. What is the value of the F1 score if the model is 100% accurate?
a. 100
b. 1
c. 0
d. 50
12. When the prediction is True and reality is False, that condition is termed as
a. TN
b. TF
c. FP
d. FN
13. Out of the following, which evaluation methods are used to calculate the F1 score?
a. Accuracy & recall
b. Precision & F1 score
c. Accuracy and Precision
d. Precision & Recall
17. In spam email detection, which of the following will be considered a “False Negative”?
a. When a spam email is mistakenly identified as legitimate.
b. When an email is accurately recognised as spam.
c. When an email is inaccurately labelled as important.
d. When a legitimate email is accurately identified as not spam.
18. When the prediction is False and reality is True, that condition is called ________
a. TN
b. TF
c. FP
d. FN
19. Statement 1: F1 score is evaluated based on precision and recall.
Statement 2: When the F1 score is 0, the model accuracy is 100%.
14. According to the data given below, calculate TP, TN, FP and FN.
Index:      1     2     3         4         5         6     7         8         9         10
Actual:     Bird  Bird  Bird      Not bird  Not bird  Bird  Not bird  Bird      Not bird  Bird
Predicted:  Bird  Bird  Not bird  Bird      Not bird  Bird  Bird      Not bird  Not bird  Bird
15. Consider the confusion matrix and calculate the recall and precision.
Confusion matrix        Reality
                        YES    NO
Prediction   YES        40     60
Prediction   NO         80     20
4 mark question
1 Explain the different methods of evaluation of AI models.
2  Consider the scenario where an AI model is created to predict if there will be rain or not. The confusion matrix for the same is given below. Calculate precision, accuracy and recall.
Confusion matrix        Reality
                        YES    NO
Prediction   YES        70     30
Prediction   NO         50     50
3  A binary classification model has been developed to classify information spread through social media as either “Fake” or “Real”. The model was tested on a dataset of 300 pieces of information, and the resulting confusion matrix is as follows:
Confusion matrix        Reality
                        YES    NO
Prediction   YES        150    40
Prediction   NO         50     60
5  An IT company situated in Bombay developed an AI model which predicts the purchasing of electronic gadgets. During testing, the AI model came up with the following predictions. Based on the given predictions, calculate the following:
Confusion matrix        Reality
                        YES    NO
Prediction   YES        60     25
Prediction   NO         5      10
i. How many total tests have been performed in the above scenario?
ii. Calculate precision, recall and F1 score.
ANSWERS
1 mark questions
1 Model Evaluation
2 Precision
3 Accuracy
4 Confusion matrix
5 F1 score
6 Accuracy
7 d. (i) and (iv)
8 Prediction
9 (d) The training accuracy and test accuracy both are low
10 False Negative
11 1
12 c. FP
13 Precision & Recall
14 0 to 1
15 True
16 Accuracy
17 When a spam email is mistakenly identified as legitimate.
18 FN
19 Statement 1 is correct but statement 2 is incorrect.
20 Model Evaluation
2-mark question
1 Evaluation is the process of understanding the reliability of any AI model,
based on outputs by feeding test dataset into the model and comparing with
actual answers
2 Confusion matrix is a table that shows the result of comparison between the
prediction and reality. The confusion matrix allows us to understand the
prediction results.
Confusion matrix        Reality
                        YES    NO
Prediction   YES        TP     FP
Prediction   NO         FN     TN
3 Overfitting is a problem where the evaluation of machine learning algorithms
on training data is different from unseen data.
4 Accuracy is defined as the percentage of correct predictions out of all the
observations. A prediction can be said to be correct if it matches the reality.
Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
5 Recall is defined as the fraction of positive cases that are correctly identified. Recall = TP / (TP + FN)
6 Precision is defined as the percentage of true positive cases versus all the
cases where the prediction is true. Precision = TP / (TP + FP)
8 TP stands for True Positive. When the Prediction matches with the Reality,
that condition is called TP. That is, prediction is True and the Reality is True.
9 TN stands for True Negative. When the prediction is False and the Reality is also False, the condition is called TN.
15 From the given confusion matrix: TP = 40, FP = 60, FN = 80, TN = 20.
Precision = TP / (TP + FP) = 40 / (40 + 60) = 0.40
Recall = TP / (TP + FN) = 40 / (40 + 80) = 0.33
4 mark question
1 Accuracy is defined as the percentage of correct predictions out of all the
observations. A prediction can be said to be correct if it matches the reality.
Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
Recall is defined as the fraction of positive cases that are correctly identified. Recall = TP / (TP + FN)
Precision is defined as the percentage of true positive cases versus all the
cases where the prediction is true. Precision = TP / (TP + FP)
F1 Score is the measure of balance between precision and recall. F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
2 Accuracy = (70+50)/200 = 0.60
Recall = 70/120 = 0.58
Precision = 70/100 = 0.70
F1 Score = 2 × (0.58 × 0.70) / (0.58 + 0.70) ≈ 0.64
3 Accuracy = (150+60)/300 = 0.70
Recall = 150/200 = 0.75
Precision = 150/190 = 0.789
F1 score = 2 × 0.75 × 0.789 / (0.75 + 0.789) ≈ 0.77
4 Accuracy=(90+50)/160=0.875
Recall=90/100=0.9
Precision=90/100=0.9
F=2*recall*precision/(recall+precision)
=2*0.81/1.8=0.9
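A hedged way to cross-check such hand calculations is a small Python helper like the one sketched earlier in this unit (the counts below are read from the confusion matrices of questions 2 and 3):
def evaluate(TP, TN, FP, FN):
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3)

print(evaluate(TP=70, TN=50, FP=30, FN=50))     # question 2 -> accuracy 0.6, precision 0.7, recall 0.583, F1 0.636
print(evaluate(TP=150, TN=60, FP=40, FN=50))    # question 3 -> accuracy 0.7, precision 0.789, recall 0.75, F1 0.769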
Activity 2: Write a Python code to calculate Area of a triangle with Base and Height
B=int(input("Enter Base of a triangle"))
H=int(input("Enter Height of a triangle"))
print("Area of a triangle is",0.5*B*H)
o/p:
Enter Base of a triangle5
Enter Height of a triangle4
Area of a triangle is 10.0
Enter person’s age 21
Person is Eligible to vote
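The code of the activity that produces this output is not reproduced on this page; a minimal sketch that would behave like the sample run above (the voting age limit of 18 is assumed):
age=int(input("Enter person's age "))
if age>=18:
    print("Person is Eligible to vote")
else:
    print("Person is not Eligible to vote")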
Activity 6: Write a program to add the elements of the two lists.
l1=[20,30,40]
l2=[30,50,10]
l3=l1+l2
print("Addition of",l1,"and",l2,"is",l3)
o/p:
Addition of [20, 30, 40] and [30, 50, 10] is [20, 30, 40, 30, 50, 10]
o/p:
Activity 9: Write a program to display a scatter chart for the following points
(2,5), (9,10),(8,3),(5,7),(6,18).
import matplotlib.pyplot as plt
x=[2,9,8,5,6]
y=[5,10,3,7,18]
plt.scatter(x,y)
plt.title("Line chart")
plt.show()
o/p:
Activity 10: Write a program to display bar chart for the following data with
appropriate titles:
Subjects=[“Eng”,”Sci”,”Soc”,”Maths”,”AI”]
Marks=[89,87,78,90,99]
import matplotlib.pyplot as plt
Sub=["Eng","Sci","Soc","Maths","AI"]
Marks=[89,87,78,90,99]
plt.bar(Sub,Marks)
plt.title("Term-1 Performance")
plt.xlabel("Subjects")
plt.ylabel("Marks")
plt.show()
o/p:
Activity 11: Read CSV file saved in your system and display 5 rows
import pandas as pd
df=pd.read_csv(r"C:\Users\ADMIN\Desktop\abc.csv",nrows=5)
print(df)
o/p:
RNO NAME MARKS
0 1 HARI 67
1 2 RAMESH 89
2 3 SOMESH 56
3 4 RAJESH 78
4 5 BHIMESH 45
Activity 12: Read CSV file saved in your system and display its
information
import pandas as pd
df=pd.read_csv(r"C:\Users\ADMIN\Desktop\abc.csv",nrows=10)
print(df)
o/p:
RNO NAME MARKS
0 1 HARI 67
1 2 RAMESH 89
2 3 SOMESH 56
3 4 RAJESH 78
4 5 BHIMESH 45
5 6 SRIKANTH 67
6 7 SRINIVAS 89
7 8 SANDHYA 90
8 9 SADANA 56
9 10 RAJU 45
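Since the heading of Activity 12 mentions displaying the file's information, an alternative sketch (using the same illustrative file path as above) prints the DataFrame's structure rather than its rows with pandas' info() method:
import pandas as pd
df=pd.read_csv(r"C:\Users\ADMIN\Desktop\abc.csv")
df.info()            # prints the number of rows, column names, non-null counts and data types
print(df.head())     # first 5 rows of the file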
Activity 14: Write a program to read an image and display image shape
and size using Python
import cv2
img=cv2.imread(r"C:\Users\ADMIN\Desktop\abc.jpg")
cv2.imshow('myimg',img)
print("The shape of the image is",img.shape)
print("The Size of the image is",img.size)
cv2.waitKey(0)
o/p:
ARTIFICIAL INTELLIGENCE (SUBJECT CODE - 417)
Sample Question Paper for Class X (Session 2024-2025)
General Instructions:
(iii) Which of the following is a key aspect of time management? 1
(a) Procrastination and delaying tasks
(b) Prioritizing tasks based on urgency and importance
(c) Taking on more tasks than can be realistically completed
(d) Ignoring deadlines and commitments
(iv) You are training employees on safe computing practices to avoid 1
cyber threats. What steps would you take while using public Wi-Fi
networks?
(a) Disable firewall protection
(b) Avoid accessing sensitive websites
(c) Use a Virtual Private Network (VPN)
(d) Share Wi-Fi login credentials with others
(v) What is a key characteristic of successful entrepreneurs? 1
(a) Avoiding risks and playing it safe
(b) Focusing solely on short-term profits
(c) Being adaptable and willing to learn from failures
(d) Rejecting new ideas and sticking to traditional methods
(vi) Ecotech Solutions is a company specializing in green technologies. 1
They are planning to expand their operations globally. What
strategies can they adopt to ensure their expansion aligns with green
principles?
(a) Prioritizing cost-cutting measures over environmental concerns
(b) Implementing renewable energy sources in their production facilities
(c) Disregarding local environmental regulations for faster growth
(d) Promoting excessive consumption of their products without
considering sustainability
Q.2. Answer any 5 questions out of 6 1x5=
5
(i) Fill in the blanks: "Human intelligence encompasses various components such as 1
reasoning, problem-solving, and ____________."
(ii) Artificial Intelligence (AI) always operates ethically and without bias. - 1
True or False?
(iii) "With Great Power Comes Great Responsibility" - List 2 1
suggestions for responsible use of AI.
(iv) Which of the following statements about AI bias are incorrect? 1
a) AI bias can result from biased training data.
b) AI systems are inherently unbiased.
c) Addressing AI bias requires diverse and inclusive data.
d) Regular monitoring and auditing can help mitigate AI bias.
(v) How can AI be used in real life? 1
a) Autonomous driving vehicles
b) Personalized medicine
c) Predicting future stock prices
d) All of the above
(vi) What are some ethical concerns involved in AI development? 1
a) AI bias
b) Data privacy
c) Unemployment due to automation
d) Transparency in decision-making
Q3 Answer any 5 out of the given 6 questions 1x5
=5
(i) What is the first step in the AI project cycle? 1
(a) Model training
(b) Data collection and preprocessing
(c) Model deployment
(d) Evaluation and testing
(ii) Which technique is commonly used in data science to handle missing 1
data in a dataset? (a) Ignoring the missing values
(b) Filling missing values with the mean or median
(c) Dropping rows with missing values
(d) Creating synthetic data to replace missing values
(iii) What is the primary application of object detection in computer vision? 1
(a) Classifying images into categories
(b) Segmenting images into regions
(c) Identifying and locating objects within an image
(d) Generating captions for images
(iv) Which task in natural language processing involves predicting the next 1
word in a sequence of words?
(a) Named Entity Recognition (NER)
(b) Sentiment Analysis
(c) Part-of-Speech Tagging (POS)
(d) Language Modeling
(v) What is the purpose of model evaluation in machine learning? (a) 1
To train the model on new data
(b) To select the best model based on performance metrics
(c) To preprocess the data before training
(d) To collect data for future analysis
(vi) The total number of Sustainable Development Goals (SDGs) were 1
launched at the United Nations Sustainable Development Summit in New
York in the year 2015, forming the 2030 Agenda for Sustainable
Development are:
a) 17 b)15 c)13 d)1
Q 4. Answer any 5 out of the given 6 questions 1X5
=5
(i) Which of the following includes major tasks of NLP? 1
a) Automatic Summarization
b) Discourse Analysis
c) Machine Translation
d) All of the mentioned
(ii) Which NLP task involves determining the sentiment or emotional tone 1
expressed in a piece of text such as positive, negative, or
neutral?
a) Named entity recognition
b) Sentiment analysis
c) Part of speech tagging
d) Machine translation
(iii) Rock, Papers, and Scissors game is based on the following domain. 1
a) Data for AI b) Natural Language Processing
c) Computer Vision d) Image Processing
(iv) The _________ makes the data understandable for humans as we can 1
discover trends and patterns out of it.
a) Random Data
b) Graphical Representation
c) Unstructured Data
d) None of the above
(v) In unsupervised learning model, if we need to reduce their 1
dimension, which algorithm do we have to use?
a) Supervised algorithm
b) Dimensionality reduction algorithm
c) Clustering algorithm
d) None of the above
(vi) Chatbots often use a specific type of NLP model to maintain the context of 1
a conversation. What is the name of this model?
a) Recurrent Neural Network (RNN)
b) Convolutional Neural Network (CNN)
c) Transformer Model
d) Decision Tree Classifier
Q5 Answer any 5 out of the given 6 1x5=
5
(i) What is the primary purpose of a confusion matrix in model evaluation? 1
a) To compare different machine learning algorithms
b) To visualize the model's decision boundary
c) To measure the model's prediction accuracy
d) To evaluate the performance of a classification model
(ii) Which evaluation metric represents the ratio of true negatives to all actual 1
negative instances and is commonly used in binary classification?
a) Accuracy   b) Precision   c) Recall   d) Specificity
(iii) In model evaluation, what is the term for the process of splitting the data set 1
into two parts, one for training and one for testing?
a) Data sampling
b) Data cleaning
c) Data splitting
d) Data transformation
(iv) If a model simply remembers the whole training set and will therefore 1
always predict the correct label for any point in the training set, this is known as:
a) Overfitting
b) Overriding
c) Over remembering
d) None of the above
(v) The percentage of true positive cases versus all the cases where the 1
prediction is true is called
A. Overfitting
B. Accuracy
C. Precision
D. Data Acquisition
(vi) Rhea wants to know what is the primary purpose of validation data set 1
in machine learning.
It is:
A. To train the model.
B. To evaluate the model on unseen data.
C. To test the model's performance on the training data.
D. To visualize data relationships.
Q 7. As a student managing multiple assignments and deadlines, how could 2
you use AI tools or apps to organize your tasks, set priorities, and ensure
timely completion of each assignment? Provide 2 AI-based strategies for
effective self-management.
Q 8. How can AI-based recommendation systems enhance the user 2
experience on e-commerce platforms? Provide an example of how
these systems work.
Q 9. Discuss the role of AI in improving agricultural practices to reduce 2
water usage and increase crop yield.
Q 10. Mention precautions to take to do secure online payments 2
Answer any 4 out of the given 6 questions in 20 – 30 words each (2 x 4 = 8 marks)
Q 11. Compare and contrast the approaches of symbolic AI and machine 2
learning in solving AI tasks, highlighting their strengths and limitations.
Q 12. Evaluate the role of continuous testing and validation throughout the AI 2
project cycle in ensuring the reliability and accuracy of AI models.
Q 13. Explain the impact of data quality on the outcomes of data science 2
projects, considering factors such as data completeness, accuracy, and
relevance
Q 14. What are the ethical considerations related to the use of facial 2
recognition technology in public spaces? Discuss privacy concerns
and potential biases.
Q 15. What are NLP systems with machine learning-based approaches? 2
Highlight their applicability in different NLP tasks.
Q 16. Evaluate the effectiveness of different evaluation metrics, such as 2
precision, recall, and F1 score, in assessing the performance of AI models
across various tasks.
Answer any 3 out of the given 5 questions in 50– 80 words each 4 x 3 = 12
Q 17. Aaadya is multi-talented and has excelled in academics, music, dancing, 4
sports and painting.
Describe different types of intelligences by naming and explaining any four
types of intelligences?
Q 18. After class 12, Rahul wanted to join an AI course. His parents didn’t know 4
much about its domains. Explain the domains of AI to them.
Q 20. Normalise the text on the segmented sentences given below: 4
Document 1: Diya and Riya are best friends.
Document 2: Diya likes to play guitar but Riya prefers to play violin
ANSWER KEY
Q1 (i) (b) To understand the speaker's message fully and accurately
(ii) (d) Use diplomatic language and provide constructive feedback
during the discussion
(iii) (b) Prioritizing tasks based on urgency and importance
(iv) (c) Use a Virtual Private Network (VPN)
(v) (c) Being adaptable and willing to learn from failures
(vi) (b) Implementing renewable energy sources in their
production facilities
Q2 (i) creativity
(ii) False
(iii)(a) Ensure transparency in AI decision-making processes
(b) Regularly audit AI systems for bias and fairness
(iv) b) AI systems are inherently unbiased.
(v) d) All of the above
(vi)(a) AI bias(b) Data privacy
Q3 (ii) (b) Filling missing values with the mean or median
(iii) (c) Identifying and locating objects within an image
(iv) (d) Language Modeling
(v) (b) To select the best model based on performance metrics
(vi) (a) 17
Q4 (i) d) All of the mentioned
(ii) b) Sentiment analysis
(iii) d) Image Processing
(iv) b) Graphical Representation
(v) b) Dimensionality reduction algorithm
Q5 (i) d) To evaluate the performance of classification model
(ii) d) Specificity
(iii) c) Data splitting
(iv) a) Overfitting
(v) C. Precision
(vi) B. To evaluate the model on unseen data.
Q6 AI-powered chatbots can enhance customer service in retail by providing
immediate assistance, answering frequently asked questions, and guiding
customers through the purchasing process. For example, a chatbot can help
customers track their orders, recommend products based on their preferences,
and resolve billing inquiries in real-time.
Q7 As a student, AI tools can help organize tasks and set priorities by
using task management apps that utilize AI algorithms to schedule
assignments based on deadlines and workload. Additionally, AI-
powered virtual assistants can provide reminders and suggestions for
effective time management.
Q8 AI-based recommendation systems enhance user experience on e-
commerce platforms by analyzing user preferences and behavior to
provide personalized product recommendations. For instance, platforms
like Amazon use collaborative filtering algorithms to suggest products
based on past purchases, browsing history, and similar users' preferences.
Q9 AI plays a crucial role in improving agricultural practices by analyzing
data from sensors, drones, and satellites to optimize water usage, detect
crop diseases
early, and forecast yield. AI algorithms can provide insights on when and
where to irrigate, identify areas needing pest control, and suggest crop
varieties suited to specific conditions.
Q10 Precautions for secure online payments include using trusted payment
gateways, ensuring the website has SSL encryption, avoiding public Wi-
Fi for transactions, regularly monitoring bank statements, and enabling
two-factor authentication where possible. Additionally, using virtual
cards or digital wallets can add an extra layer of security.
Q11 Symbolic AI relies on predefined rules and representations to solve AI
tasks, while machine learning learns patterns from data. Symbolic AI is
transparent and interpretable but may struggle with complex or
ambiguous tasks. Machine learning, on the other hand, can handle large
datasets and adapt to new information but may lack transparency and
require substantial computational resources for training.
Q12 Continuous testing and validation throughout the AI project cycle ensure
the reliability and accuracy of AI models by detecting and correcting
errors early. It helps in refining models, improving performance, and
ensuring that they meet the desired objectives and specifications.
Q13 Data quality significantly impacts the outcomes of data science projects.
Factors such as data completeness, accuracy, and relevance influence the
reliability and effectiveness of analyses and models. Poor data quality can lead
to biased results, erroneous insights, and ineffective decision-making.
Q14 Facial recognition technology in public spaces raises ethical concerns
regarding privacy invasion and potential biases. There are concerns about
surveillance, consent, and the misuse of facial data. Biases in facial recognition
algorithms can lead to discriminatory outcomes, particularly against certain
demographics.
Q15 NLP systems with machine learning-based approaches utilize algorithms
to learn patterns from textual data, enabling tasks such as sentiment
analysis, named entity recognition, and machine translation. These
systems excel in handling large and diverse datasets, offering scalability
and adaptability across various NLP tasks.
Q17 Types of Intelligences:
Linguistic Intelligence: Aaadya's ability to excel in academics and
possibly writing or public speaking showcases linguistic intelligence,
which involves proficiency in language and communication.
Musical Intelligence: Aaadya's talent in music indicates musical
intelligence, which involves sensitivity to rhythm, melody, and sound.
Bodily-Kinesthetic Intelligence: Aaadya's prowess in dancing and
sports suggests bodily-kinesthetic intelligence, which relates to physical
coordination, agility, and control.
Visual-Spatial Intelligence: Aaadya's skill in painting reflects visual-
spatial intelligence, which involves the ability to perceive the world
accurately and manipulate objects mentally.
Q20 Hint -Stopwords in the given sentence which should not be removed are:
@, . (fullstop) ,_(underscore) , 123(numbers) These tokens are generally
considered as stopwords, but in the above sentence, these tokens are part
of email id. removing these tokens may lead to invalid website address
and email ID. So these words should not be removed from the above
sentence. (1 mark for identifying any two
stop words from the above, and 1 mark for the valid justification.
Precision = True Positives / (True Positives + False Positives) = 60 / (60 + 25) = 0.706
Recall = True Positives / (True Positives + False Negatives) = 60 / (60 + 5) = 0.923
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.706 * 0.923) / (0.706 + 0.923) ≈ 0.801
(ii) Total tests performed = Sum of all entries in the confusion matrix =
60 + 25 + 5
+ 10 = 100.