Decoding AI Class X Supplement_2025-26
The CBSE, in its recently released Artificial Intelligence syllabus for 2025-26, has
announced certain changes in the Class X curriculum (Code 417). To apprise students
of the latest changes and provide them with the new content in printed form, we have
prepared this Supplement, which completes our book ‘Decoding Artificial Intelligence’
in all respects vis-à-vis the latest syllabus. We are offering this Supplement FREE OF COST
to all those using our book.
For a soft copy of this Supplement, write to us at [email protected].
Educational Publishers
Syllabus
ARTIFICIAL INTELLIGENCE (Code No. 417)
CLASS X (2025–26)
Total Marks: 100 (Theory 50 + Practical 50)
UNITS | NO. OF HOURS (Theory and Practical) | MAX. MARKS (Theory and Practical)

PART A: EMPLOYABILITY SKILLS
Unit 1: Communication Skills–II | 10 | 2
Unit 2: Self-Management Skills–II | 10 | 2
Unit 3: ICT Skills–II | 10 | 2
Unit 4: Entrepreneurial Skills–II | 10 | 2
Unit 5: Green Skills–II | 10 | 2
Total | 50 | 10

PART B: SUBJECT-SPECIFIC SKILLS | Theory Hours | Practical Hours | Marks
Unit 1: Revisiting AI Project Cycle & Ethical Frameworks for AI | 11 | 7 | 4
Unit 2: Advanced Concepts of Modeling in AI | 18 | 7 | 11
Unit 3: Evaluating Models | 21 | 4 | 10
Unit 4: Statistical Data | – | 28 | –
Unit 5: Computer Vision | 10 | 20 | 4
Unit 6: Natural Language Processing | 20 | 7 | 8
Unit 7: Advance Python | – | 10 | –
Total | 160 (Theory + Practical) | 40

PART C: PRACTICAL & PROJECT WORK | Marks
Practical File with minimum 15 Programs | 15
Practical Examination (Unit 4: Statistical Data; Unit 5: Computer Vision; Unit 6: Natural Language Processing; Unit 7: Advance Python) | 15
Viva Voce | 5
Project Work / Field Visit / Student Portfolio (any one to be done) | 10
Viva Voce (related to project work) | 5
Total | 50

GRAND TOTAL | 210 | 100
To begin, play the MyGoodness game and complete the 10 giving decisions. Pay attention to
how you make choices, especially when some details are hidden. How did you decide whom to
give to?
1. Did you prefer giving to certain people, causes or locations?
2. Did hidden information affect your choices?
3. Were your decisions based on emotions, personal experiences, or assumptions?
AI systems learn from human data, which may include biases like favouring certain groups,
overlooking hidden factors or making assumptions based on incomplete information.
Just as you made decisions based on limited data, AI can also develop biases depending on how
it is trained.
Factors affecting human decision-making:
1. Personal and Emotional Factors: Our decisions are usually influenced by emotions,
past experiences and upbringing. People may favour choices that are connected with their
values, beliefs or personal experiences.
2. Perception of Need and Impact: Our choices are also governed by how urgent or
effective an option appears. We tend to prioritize actions that seem to have a direct or
visible impact.
3. Bias in Human vs Non-Human Considerations: Humans are more likely to prioritize
their own needs over those of animals or the environment. However, emotional
attachment or ethical beliefs can shift these preferences.
4. Geographic and Demographic Biases: People are more likely to make decisions
that benefit those in familiar locations or social groups. Stereotypes and personal
identification can shape preferences and priorities.
5. Religious and Ethical Views: Faith and moral beliefs influence decision-making,
affecting judgments on fairness, responsibility and what is considered right or wrong.
6. Transparency and Trust: People prefer options that feel reliable and verifiable. Lack of
information or fear of deception can discourage certain choices.
Do it Yourself
Play the MyGoodness game again and see if your decisions change when you actively try
to reduce bias. Discuss how bias in AI could impact areas like hiring, loan approvals or
criminal justice.
Principles of Bioethics
Respect for Persons/Autonomy: This principle recognizes that each
person has inherent value and dignity and is capable of making their own
decisions. For doctors, it is not enough to simply treat someone; they
must also honour the patient's choices, allowing patients to be active
participants in the decision-making process. In the context of
medicine, autonomy demands that doctors fully inform patients about
proposed procedures, obtain their consent and respect their refusal.
Beneficence (Doing good): This principle is a call for action and a moral
imperative to act in the best interests of others, seeking ways to help
them. Medical interventions, treatments and research should be driven
by a desire to bring maximum benefit and provide improved care to those
seeking help.
The development of vaccines for diseases like polio, smallpox and
COVID-19 is an example of the principle of beneficence, with the sole
aim of improving the well-being of millions. These treatments were
successful because the intention to help others reigned supreme.
Non-Maleficence (Avoiding harm): This bioethics principle
is the commitment to ‘do no harm’. Doctors, researchers and
healthcare providers must be cautious about potential risks, actively
avoiding unnecessary or unjustifiable harm to their patients.
Justice (Fairness): This is the ethical principle that reminds us to
treat everyone fairly, irrespective of social, economic or other
differences. Resources should be distributed equitably and access
to healthcare should be guaranteed for all. This principle requires
that healthcare be a right for every human, not a privilege.
Case Study
SMART MEDICINE DISPENSER AND THE VILLAGE DOCTOR
Consider Asha Gram, a rural village in India. Like many other villages, Asha Gram faces challenges in
healthcare access. There is one primary health centre run by Dr Sharma, a dedicated doctor who works
long hours in the service of the people. Delivering medicines on time to the remote parts of
the village, however, remains a constant challenge.
HealTech, a tech company, has developed a new ‘Smart Medicine Dispenser’. This is a small, AI-powered
device that is designed to automatically dispense the right medicine and dosage to patients, based on the
doctor’s prescription and the patient’s unique identification (through a fingerprint scan or Aadhaar card).
It is equipped with a screen that shows simple instructions while recording details of each dispensing.
This could be particularly helpful in rural areas with shortage of trained medical staff.
HealTech proposes a pilot program for Asha Gram—they will install multiple
smart dispensers at community centres and train local volunteers to assist
people in using them. Dr Sharma will initially prescribe medicines as
usual. However, eventually, the AI dispenser could also give suggestions
based on the data it collects, such as a patient’s prior health records
and symptom descriptions (entered by the local health volunteer or the
patient themselves). This can also help to track usage of medicines and
provide analytics to public health workers to identify outbreaks or gaps
in health service delivery.
Key Issues
• Limited Access: Asha Gram has limited access to healthcare professionals and medications and
solely relies on Dr Sharma.
• New Technology: Smart Medicine Dispenser offers a potential solution but also brings up ethical
questions.
Bioethical Considerations
1. Autonomy: Does Smart Medicine Dispenser respect the autonomy of patients? Do the patients
have a right to choose if they want to use this technology or not? Who makes the ultimate
decision on their healthcare?
2. Beneficence: How can Smart Medicine Dispenser improve healthcare in Asha Gram? What are the
benefits for the patients and the community? Does using this device truly do any good?
3. Non-Maleficence: What are the risks associated with this technology? How might it potentially
harm patients? What safeguards need to be put in place?
4. Justice: Would using these dispensers be fair to everyone in Asha Gram? How might differences
in technology literacy, health awareness or accessibility create inequities in healthcare access?
AI Ethics Considerations
1. Data Privacy: What are the ethical concerns about collecting and storing patient data with Smart
Medicine Dispenser? How should this data be protected? Is consent being obtained fairly and in a
culturally appropriate manner?
2. Transparency: How does the AI system determine which medication to dispense? How transparent
is the decision-making process to Dr Sharma and the patients?
3. Bias: How can we ensure that the AI system is not biased against certain groups of patients? Is the
AI using data from other countries which may not be suitable for patients in India?
4. Accountability: Who is responsible if the AI system makes an error and causes harm to a patient—
HealTech, Dr Sharma or the village volunteer?
5. Solutions: What steps should Dr Sharma, the villagers and HealTech take to ensure that Smart
Medicine Dispenser is implemented ethically and effectively? How should the villagers’ concerns
be addressed? What happens when things go wrong?
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. Which of the following is not a principle of bioethics?
(a) Autonomy (b) Justice
(c) Accountability (d) Beneficence
2. What is the primary focus of the principle of beneficence?
(a) Avoiding harm to patients (b) Acting in the best interest of others
(c) Ensuring fairness in treatment (d) Respecting patients’ autonomy
3. The Hippocratic Oath is most closely associated with which bioethical principle?
(a) Non-Maleficence (b) Autonomy
(c) Justice (d) Beneficence
4. What ethical concern arises when AI systems make independent decisions in healthcare?
(a) Data transparency (b) Patient autonomy
(c) Equitable resource allocation (d) All of these
5. Which of the following is a primary reason for needing ethical frameworks in AI?
(a) To increase profit margins in tech companies
(b) To ensure fairness, accountability and transparency
(c) To promote rapid adoption of AI systems
(d) To eliminate the need for human oversight
3. Labels (Target Variable): Labels are the expected output that AI models try to
predict. They also vary depending on the application and input data, e.g., in a spam
detection system, labels would be ‘Spam’ and ‘Not Spam’ while in a cat vs dog classifier,
labels would be ‘Cat’ and ‘Dog’. In a sentiment analysis model, labels would be ‘Positive’,
‘Negative’ and ‘Neutral’.
Data Type | Features (Input) | Label (Output)
Tabular | Email contains ‘Win’, includes link | Spam / Not Spam
Image | Pixels, Edges, Color Distribution | Cat / Dog
Text | Words, Punctuation, Sentiment Score | Positive / Negative
4. Labelled and Unlabelled Data: In AI, data can be categorized as labelled or unlabelled,
depending on whether the outputs (labels) are provided. The choice of data type affects
how AI learns and makes predictions.
• Labelled data consists of input examples that are tagged with the correct output (label).
This type of data is used in supervised learning, where AI learns by mapping inputs
to known outputs. Some examples are as follows:
Data Type | Features (Input) | Label (Output)
Tabular | Age, Symptoms | Disease Name
Image | Pixels, Colors | Cat / Dog
Text | Words, Length | Positive / Negative
• Unlabelled data consists of input examples without predefined labels. AI models analyze
patterns and group similar data without knowing the correct answer. This is used in
unsupervised learning, where AI finds hidden structures in data. For example—
Tabular Data: A dataset of customer transactions without predefined categories,
where AI groups similar spending behaviours.
Image Data: A collection of photos without labels, where AI clusters similar-looking
images.
Text Data: A set of news articles where AI automatically categorizes topics (e.g.,
sports, politics, entertainment) without predefined labels.
5. Training Data and Test Data: These are derived from a dataset by dividing it into
two parts. Usually, the training dataset is 70-80% of the main dataset while the testing
dataset is the remaining 20-30%. Training data is used to teach the AI model while test
data is used to evaluate how well the AI model performs on unseen data.
For example, in a handwriting recognition AI model, 80% of digit images are used for
training, while 20% are used for testing. Similarly, in a chatbot, historical chat logs
are used for training while new user messages serve as test data.
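This split can be performed in one line with scikit-learn. A minimal sketch, using placeholder features and labels to stand in for the digit images:

```python
# A minimal sketch of an 80:20 train-test split using scikit-learn.
from sklearn.model_selection import train_test_split

# Ten illustrative examples, represented by placeholder feature lists,
# with their correct labels (the digits 0-9).
X = [[i, i + 1] for i in range(10)]   # stand-in features for 10 images
y = list(range(10))                   # stand-in labels

# 80% of the data goes to training, 20% to testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(X_train), "training examples,", len(X_test), "testing examples")
# Output: 8 training examples, 2 testing examples
```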
[Figure: labelled training images of apples and oranges (Labelled Data)]
• Building the Model: The program then builds a model which helps it classify future images.
• Prediction: When we give it a new image, the model tries to classify it, saying “Is this an
apple or an orange?” It outputs a classification label or a category corresponding to
the new input based on its learning, as illustrated below:
[Figure: given a new image, the model outputs the label ‘Orange’]
• Performance: The most important goal is to get a good classification accuracy, which
means that it can correctly label the images most of the time.
Many daily activities use classification models. For example, when you upload a video on a
platform, it is automatically categorized as ‘education’, ‘entertainment’, etc. Another example
is how your email service places new messages into folders like Inbox or Spam, or under a
customized label that you may have created.
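A classification model of this kind can be sketched in a few lines of Python, assuming scikit-learn is installed; the fruit features (weight, colour score) and data points are invented for illustration:

```python
# An illustrative fruit classifier; the features and data points are made up.
from sklearn.tree import DecisionTreeClassifier

# Training data: [weight_g, colour_score], where a higher colour score means more orange.
X_train = [[150, 2], [170, 3], [140, 2], [200, 8], [220, 9], [210, 7]]
y_train = ["apple", "apple", "apple", "orange", "orange", "orange"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)           # the model learns from labelled examples

# Prediction: classify a new, unseen fruit.
print(model.predict([[205, 8]])[0])   # likely prints: orange
```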
[Figure: labelled training data with features Rooms, Size and Location, and label Price (Labelled Data)]
• Model Creation: The model creates an equation which best describes how house size
and location relate to its price.
• Price Prediction: When we have details of a new house, we put these values into the
equation and the model predicts its likely selling price.
[Figure: the model predicts the price of a new house from its Rooms, Size and Location]
• Goal: The goal is to make predictions as close to the actual prices as possible. We
check the model’s performance by measuring how close the predictions are to the actual
values. In the process, we try to make sure the difference or the error is as small as
possible.
Programs that predict stock markets work in the same way. They analyze a lot of
historical data to predict the future price of a stock. Such models are used in a number
of applications, including financial predictions, sales forecasting, weather forecasting,
medical research and even environmental protection.
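A minimal regression sketch along these lines, assuming scikit-learn is installed; the houses, prices and the numeric encoding of location are invented for illustration:

```python
# A minimal regression sketch; houses, prices and location codes are illustrative.
from sklearn.linear_model import LinearRegression

# Features: [rooms, size_sq_ft, location_code]; target: price (in lakhs).
X_train = [[2, 800, 1], [3, 1200, 2], [4, 1600, 2], [3, 1000, 1], [5, 2000, 3]]
y_train = [40, 75, 100, 55, 140]

model = LinearRegression()
model.fit(X_train, y_train)      # learns an equation relating features to price

new_house = [[4, 1500, 2]]
print(model.predict(new_house))  # predicted price for the new house
```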
Clustering
Illustrative Example: Think about a situation where you have a mix of different card games,
e.g., Uno, regular playing cards and Bingo cards, all jumbled together. How would we sort them
out? We want to group them based on their properties without knowing their names or game
types.
• The Data: We have a mixed set of cards, i.e., Uno cards, Bingo cards and regular playing
cards. For each card, we note down features like its color, whether it has a number or a
symbol, and its shape (square for Bingo, rectangular for others). However, we don’t have
labels telling us if a card is Uno, Bingo or a regular playing card.
• Finding Similarities: The clustering algorithm looks at the different features of each
card and tries to find the ones that are similar.
• Grouping: It groups together cards with similar features into a cluster. For example, one
cluster might have all the number cards from the regular playing cards, another cluster
may contain the brightly-colored Uno cards and yet another cluster may include square-
shaped Bingo cards.
• Results: In the end, we will have several clusters and the cards in each cluster will be
more similar to each other than cards in other clusters. We may have a cluster for Uno
cards, another cluster for regular playing cards and yet another cluster for Bingo cards.
[Figure: cards grouped into clusters by similarity (Clustering)]
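A minimal clustering sketch of the card example, assuming scikit-learn is installed; the numeric card features below are invented for illustration:

```python
# A minimal k-means clustering sketch; the card 'features' are invented.
from sklearn.cluster import KMeans

# Each card is described only by numbers: [colour_intensity, shape_code].
# There are NO labels; the algorithm must find the groups on its own.
cards = [
    [9, 1], [8, 1], [9, 1],   # brightly coloured, rectangular (Uno-like)
    [2, 1], [3, 1], [2, 1],   # plain, rectangular (playing-card-like)
    [5, 2], [4, 2], [5, 2],   # medium colour, square (Bingo-like)
]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(cards)   # assigns each card to one of 3 clusters
print(labels)                        # e.g., [0 0 0 1 1 1 2 2 2]
```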
Associations
Definition: An association model looks for rules or patterns that describe how different items
or events are connected. It tries to find which data tends to occur together.
Illustrative Example: Consider the items that people buy in a supermarket.
• The Data: We collect lots of data on customer transactions that indicate the items
people usually buy together.
• Finding Relationships: The association model analyzes this data to find out the items
that are frequently purchased together.
• Discovering Patterns: The model might discover rules like ‘if a customer buys milk,
they are also likely to buy cereal’ or ‘if a customer buys a pizza, they will also buy a soft
drink’.
• Generating Rules: The model creates rules about what occurs together frequently and
how strong this pattern is.
• Actionable Insights: These discovered relationships can be used to make decisions.
In practice, e-commerce companies use association models to provide product
recommendations to their customers based on past purchases. To increase its sales, a retail
store can also rearrange products based on what customers frequently buy together.
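The strength of such a rule is usually measured by its support (how often the items occur together) and its confidence (how often the rule holds when its first item is bought). A tiny sketch in plain Python, with invented transactions, checking one candidate rule:

```python
# A tiny association-rule sketch: invented transactions, checking 'milk -> cereal'.
transactions = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"pizza", "soft drink"},
    {"milk", "cereal", "eggs"},
]

milk = [t for t in transactions if "milk" in t]
both = [t for t in milk if "cereal" in t]

support = len(both) / len(transactions)   # how often milk and cereal occur together
confidence = len(both) / len(milk)        # of the milk buyers, how many also buy cereal

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# support = 0.60, confidence = 0.75
```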
Deep Learning uses more complex models built from neural networks that have many layers,
an approach inspired by how the human brain works. Such networks can learn very complex
patterns from data and give us better results than simpler machine learning models.
We will discuss two popular types of deep learning models—Artificial Neural Networks (ANN)
and Convolutional Neural Networks (CNN).
[Figure: an ANN with an Input Layer, two Hidden Layers and an Output Layer]
Understanding ANN: Let us assume that we want to build a program that can understand
and translate a sentence from English to Hindi, without needing to manually write the rules of
translation.
The Challenge: Normally, for translation, we would need to have a lot of labelled data,
like sentences in English along with their Hindi translations, and have manual rules that
define how each word is translated. This can be difficult and time-consuming. This is where
simple supervised learning cannot perform well, as it cannot generalize to unseen data in such
a complex environment or create the rules required to translate a language.
• ANN Solution: An ANN can learn to translate by looking at a large number of sentences
in both languages, without us having to tell it all the specific translation rules.
• Neurons and Connections: An ANN consists of interconnected artificial neurons arranged
in layers that process words sequentially.
• Learning Process: ANN learns how to represent the meanings of words and how to
translate them correctly by adjusting the connections in the network.
• Making Translations: After learning, when given a new sentence in English, ANN uses
this knowledge to give the corresponding sentence in Hindi.
• Complex Relationships: ANN can automatically learn complicated rules of language
translation, which are difficult to define in traditional ways. This is why supervised
learning is not adequate for this case.
[Figure: a perceptron takes weighted inputs (x1·w1, x2·w2, x3·w3) and calculates the weighted sum z]
Point to Remember
The weights and features used in this perceptron example are for demonstration purposes only. They
are simplified and may vary based on individual preferences, environmental factors and specific
contexts. In real-world AI models, weights are learned from data and feature importance depends
on the dataset and training process.
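The weighted-sum step can be sketched in a few lines of Python. As the Point to Remember notes, the inputs, weights and bias below are purely illustrative:

```python
# A minimal perceptron sketch: illustrative inputs, weights and bias only.
inputs  = [1.0, 0.5, 0.2]     # feature values x1, x2, x3
weights = [0.4, 0.6, 0.9]     # weights w1, w2, w3 (learned from data in real models)
bias = -0.5

# Weighted sum: z = w1*x1 + w2*x2 + w3*x3 + bias
z = sum(w * x for w, x in zip(weights, inputs)) + bias

# Step activation: fire (output 1) if z crosses the threshold, else output 0.
output = 1 if z > 0 else 0
print(f"z = {z:.2f}, output = {output}")   # z = 0.38, output = 1
```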
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. What is the primary goal of a classification model?
(a) To group similar data points together (b) To predict numerical values
(c) To sort data into predefined categories (d) To discover relationships between data
2. In supervised learning, which of the following is NOT required for training?
(a) Labelled data (b) Training examples
(c) Feedback from the algorithm (d) Unlabelled data
3. Which of the following statements about regression models is correct?
(a) They classify data into groups.
(b) They predict a category for input data.
(c) They predict continuous numerical values.
(d) They discover associations between variables.
4. Which of these is an example of clustering in unsupervised learning?
(a) Identifying spam emails
(b) Grouping customers based on purchasing behaviour
(c) Predicting housing prices based on size and location
(d) Translating English to Hindi
Prerequisite: Understand the role of evaluation in the development and implementation of AI systems
(See Book Unit 3, ‘Evaluation’, pages 177-178; Unit 8, ‘Evaluation’, pages 395-398.)
TRAIN-TEST SPLIT
Evaluation: How Do We Know If Our Model is Good?
We have already learned how to train machine learning models using data. But how do we know
if the model is any good, i.e., whether it is making good predictions or classifications?
Consider an analogy: You are preparing for an exam. You need to study using the training
material available but you also need a way to test how well you have learned. Similarly, to see
how well our model is performing, we need to evaluate its performance. Evaluation is usually
done using a special method called train-test split. This helps us understand if our model is
actually learning and making good predictions.
[Image 1: the complete data is split into Training Data and Testing Data]
1. Complete Data: Start with the entire collection of cat and dog images. This is
represented in the top row in Image 1, i.e., ‘Complete Data’.
2. Marked Split: Decide how to divide the data for training and testing. This split is shown
in the second row in Image 1, i.e., ‘Complete Data (With Split Percentage Marked)’, where
the blocks are now colored differently. You may decide to divide the data in 70:30 ratio,
i.e., 70% for training and 30% for testing. (The split ratio can be 80:20, 60:40, 75:25 or
any other ratio depending upon the problem and data. However, a 70:30 split is usually
considered acceptable.)
3. Split Data: Separate the data into two parts, as shown with arrows in the image, which
results in ‘Training Data’ and ‘Testing Data’. This is a random split, having photos of both
cats and dogs in the training dataset as well as in the testing dataset.
[Image 2: the ML model is trained on the Training Data and then predicts on the Testing Data]
4. Training: Train the model using the ‘Training Data’, as shown in Image 2, by feeding
it to the machine learning model. The model studies these images to learn how to
differentiate a cat from a dog based on features such as color, shape, size, etc.
5. Testing: Once the training is completed, use the ‘Testing Data’ to test the model’s
accuracy. Check how accurately the model classifies the images in the test data that it has
never seen during training.
6. Evaluation: The predicted classifications are then compared to the correct labels in the
testing set. If the model is accurate, the predicted labels should be the same as the correct
labels.
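The whole cycle, from split to evaluation, can be sketched in a few lines of Python with scikit-learn; its built-in digits dataset stands in here for the cat and dog images:

```python
# Train-test split, training and evaluation in one short sketch (scikit-learn).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)   # a built-in dataset, standing in for cat/dog images

# Steps 2-3: a 70:30 random split into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 4: train the model on the training data only.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Steps 5-6: predict on unseen testing data and compare with the correct labels.
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```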
CLASSIFICATION METRICS
To understand what is meant by correct or incorrect predictions, we need to define something
called the positive class. In the given example, let us choose to define ‘Cat’ as the positive
class and ‘Dog’ as the negative class. True Positive, True Negative, False Positive and False
Negative are defined with respect to this chosen positive class. Let us look at this with our
classification example:
• True Positive (TP): The number of times the model correctly predicts a cat’s image as
‘cat’, i.e., the image is of a cat.
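These four counts, and the accuracy built from them, can be computed directly from predictions. A minimal sketch with ‘Cat’ as the positive class and illustrative label lists:

```python
# Counting TP, TN, FP, FN with 'cat' as the positive class (illustrative labels).
actual    = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog", "cat", "dog"]

tp = sum(a == "cat" and p == "cat" for a, p in zip(actual, predicted))  # correct 'cat'
tn = sum(a == "dog" and p == "dog" for a, p in zip(actual, predicted))  # correct 'dog'
fp = sum(a == "dog" and p == "cat" for a, p in zip(actual, predicted))  # dog called cat
fn = sum(a == "cat" and p == "dog" for a, p in zip(actual, predicted))  # cat called dog

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)   # 3 3 1 1 0.75
```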
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. What is the purpose of a train-test split in machine learning?
(a) To collect data
(b) To evaluate model performance on unseen data
(c) To ensure 100% accuracy of the model
(d) To optimize training data usage
2. Which dataset is used to evaluate the model’s performance?
(a) Training dataset (b) Testing dataset
(c) Validation dataset (d) Full dataset
3. In train-test split method, what does the testing set represent?
(a) Data used for model training
(b) Data used for hyperparameter tuning
(c) Unseen data to check the model’s generalization ability
(d) Data used for increasing model accuracy
4. If a model has an accuracy of 90%, what is its error rate?
(a) 10% (b) 90%
(c) 80% (d) Cannot be determined
5. What does ‘True Negative’ represent in classification?
(a) Model incorrectly predicts the positive class
(b) Model correctly predicts the positive class
(c) Model correctly predicts the negative class
(d) Model incorrectly predicts the negative class
AI FOR EVERYONE
We have learned about the complexities of Machine Learning and Deep Learning, but any
AI-based application needs data to understand the problem and to be able to analyze it. Once
we have gathered sufficient data, we can build AI models. Let us understand what statistical
data is and where it is used.
2. Google Cloud AutoML: A suite of no-code machine learning tools by Google that
enables users to train AI models for tasks like image recognition, text classification and
translation without coding. It uses Google’s powerful AI infrastructure to automatically
optimize models based on provided data.
Installation Steps
Prerequisites: Ensure Python (3.6 or a newer version) is installed on your computer. You can
download Python from python.org
Install Orange: The software can be installed from the command line or from an installer. You
can also visit the official website https://orangedatamining.com/download/
To access the blank interface of Orange Data Mining tool, click New and the following screen
will appear.
Case Study
LOAN CLASSIFICATION
Let us discuss a case study where we will classify loan application status using loan data, containing
two files, one each for training data and testing data. The dataset, which can be downloaded from
https://bit.ly/loan-data, has been used here along with the hands-on example to learn more about Orange
Data Mining tool.
Understanding Use Case: Loan Classification
In this case study, we will build an AI model to classify loan applications. This is a common problem faced
by banks and financial institutions.
• Problem: Banks receive many loan applications every day. Making decisions on which loans to
approve and which to deny can be quite time-consuming.
• Solution: We will use historical data to train a model that can classify loan applications as ‘Approved’
or ‘Denied’, based on the information provided.
• Objective: The aim is to automate the decision-making process.
Steps for AI Project Cycle
We will follow the AI project cycle for this project. Thus, the steps for our project will include the following:
1. Problem Definition: Classify loan applications as ‘Approved/Yes’ or ‘Denied/No’ based on the
features provided. We wish to automate the decision-making process using historical data to create
a classification model to predict whether a bank will approve a loan application or not. Banks
usually decide whether to give a loan to an applicant based on certain factors, which are provided
as features in the dataset.
Data Acquisition
Step 1: Load Data: Start Orange and load train_loan.csv dataset using File widget, as
shown in the following screenshots. This will import the data into our tool. Consider the
following steps:
Step 1(a): Drag File widget and drop in the workflow area.
Step 1(c): Select the downloaded train_loan.csv file from your computer.
Step 1(d): Check the data available in the train_loan.csv file by double-clicking the File
widget.
Step 1(f): Repeat the steps to import test_loan.csv on a File widget and rename it as
TestData.
Which features have missing values? Let us find out with the help of Feature
Statistics widget. Drag and drop the Feature Statistics widget on the
workflow canvas. Observe carefully that this widget has two surrounding
dotted lines on both left and right flanks as compared to only one
dotted line in the File Widget.
These dotted lines which flank a widget are the Input and Output Interfaces for the widgets,
which help them connect to other widgets.
Step 2(a): Connecting Widgets—Try connecting the right flank dotted line (output
interface) of TrainingData to the left flank dotted line (input interface) of Feature Statistics
by dragging a line from TrainingData to Feature Statistics. The result of joining the two is
shown below:
The last column shows the number of missing values in each feature. We can see that Credit
History is missing in 50 out of the 614 records. We can either drop those rows to improve
data quality or use imputation to fill the missing values with the average (or most frequent)
value of the feature.
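For those curious about what the Impute widget does behind the scenes, here is the same idea as a minimal pandas sketch (the column name follows the loan dataset used here; treat the snippet as illustrative):

```python
# Illustrative imputation with pandas: fill missing Credit_History values
# with the most frequent value (mode) of that column.
import pandas as pd

df = pd.read_csv("train_loan.csv")
print(df["Credit_History"].isna().sum())   # number of missing values

df["Credit_History"] = df["Credit_History"].fillna(df["Credit_History"].mode()[0])
```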
Step 3: Imputation—Add the Impute widget available under
Transform tab and connect TrainingData to Impute. Data flows
automatically to the Impute widget upon connection.
Let us apply an imputation operation by double-clicking the
Impute widget. We can choose the default impute settings
under ‘Default Method’ where we have chosen ‘Average/Most
Frequent’ value for filling in the missing values.
In case we wish to apply a different imputation method to a specific feature, select the feature
in the list and choose the preferred method.
To achieve this, we must update the Feature Type of Loan Status from a Categorical Feature
to a Categorical Label. Let us use the Select Columns widget under the Transform tab
and connect the output of Impute widget to it.
To select Loan_Status as the target label, we can drag and drop the Loan_Status from
features to the Target window as illustrated below:
But which part of the data split is this data info showing?
Let us find out by clicking on the ‘Link Label’ as highlighted below. A single click highlights
the link label, while a double-click shows which part of the split data is passed to the
Data Info widget.
The data sampler interface has two output check
boxes and a line showing which part of the split
data is passed on to the Data Info widget. By
default, the data sample is passed to Data Info.
Let us inspect the remaining data by adding
another Data Info and connecting Data Sampler to
this widget.
Double-click the link label and edit it to remove
the connection between Data Sampler and Data Info,
and create a different connection from ‘Remaining
Data’ to the Data Info widget. You can edit the
connection by clicking on the connecting line and
then drawing another line between the boxes you
wish to connect.
Now you can observe both 80% data in the first Data Info widget and 20% remaining data
in the second Data Info widget by double-clicking to see the properties. The Link Labels are
automatically updated as well!
We have used a Random Sampling Method with a Training Size of 80%. But we can also let
the Test and Score widget evaluate the Learner by connecting the remaining data to it for
testing its performance.
Let us make another connection from the data sampler to the Test and Score widget but this
time we will change the link label to use Remaining Data. Also, double-click Test and Score
widget and select the Radio Button—Test on Test Data.
All settings are illustrated below for clarity.
Let us apply another learner to check comparative performance of two Machine Learning
algorithms on the same training dataset.
We shall use Random Forest as the second learner and connect this learner to Test and Score,
keeping all other connections same as before.
Double-click Test and Score widget for a comparative analysis after applying both learners.
The outputs of both Confusion Matrix and ROC Analysis can be obtained by double-clicking
the widgets and we have already learned how to understand the outputs in previous chapters.
Use the model with the best results after testing all the classification models in the same way.
2. The connection line appears dashed or broken. This happens because we have not
connected the training data to our learner. Let us connect the data sampler to Logistic
Regression learner and test predictions again.
3. We can add the Tree learner to the Predictions widget and connect the Data Sampler to
Tree Learner in the same way. The complete flow is shown below.
In the end, double-click the Predictions widget, which will show the predicted loan
approval status for each entry in the TestData for both Learners, in separate columns.
Activity 1
PALMER PENGUINS
Objective: Analyze the Palmer Penguins dataset using Orange Data Mining to classify penguin
species based on physical measurements.
Steps to Engage:
1. Data Preparation:
• Download the Palmer Penguins dataset from https://bit.ly/4atlCN7
• Ensure the dataset is clean and organized for importing into Orange.
2. Load Data into Orange:
• Open Orange and create a new workflow.
• Drag and drop Import Data widget to load the Palmer Penguins dataset.
• Connect Data Table and Scatter Plot widgets to explore the dataset visually.
3. Feature Extraction and Classification:
• Use Image Embeddings widget to extract relevant features, if using images, or directly
proceed with Data Table, if using numerical features.
• Add a Classifier Learner (e.g., Logistic Regression or Random Forest) and connect it
to Test & Score widget.
4. Evaluation and Analysis:
• Evaluate the model using Test and Score widget.
• Observe metrics like accuracy, precision and recall.
• Connect Confusion Matrix widget to analyze misclassifications.
Evaluate the model’s performance and suggest improvements.
Discuss the practical implications of using no-code tools for data analysis.
Resource Name: Palmer Penguins Case Study in Orange Data Mining
Link to Dataset: https://bit.ly/4atlCN7
[Figure: Computer Vision draws on Deep Learning, a subset of Machine Learning, within the broader field of Artificial Intelligence]
Difference between Computer Vision and Image Processing
Computer Vision | Image Processing
CV is an AI-driven technology that allows computers to interpret images/videos and make decisions. | Image processing refers to enhancing or modifying images using techniques like filtering, resizing and color adjustment.
It helps in understanding visual content, e.g., object recognition, motion tracking. | It helps in improving image quality, e.g., noise removal, sharpening, color correction.
It uses deep learning and neural networks for feature extraction and object detection. | It implements image filters, transformations, pixel manipulation and other simple techniques for processing images.
Examples: Facial recognition, self-driving cars, medical diagnostics | Examples: Image resizing, contrast enhancement, watermark removal
CV is a subset of AI and strongly relies on AI and Machine Learning while image processing
does not necessarily use AI and is usually done using mathematical techniques.
• In the search bar, type Image Analytics. Select Image Analytics and click OK.
• After adding the extension, load Orange3 and create a New Workflow (File → New).
Step 2: Load image data.
• Drag Import Images widget onto the canvas.
• Click the widget and select the “Directory” to
upload dandelion and sunflower images from your
computer. (Do ensure that images in the folders
are correctly labelled.)
Step 3: Extract image features.
• Add Image Embedding widget. Connect Import Images to Image Embedding.
• Choose an Embedder (embedding model, e.g., SqueezeNet or InceptionV3).
Jargon Alert
EMBEDDING
Embedding refers to turning an image into numbers that a computer can understand.
When you look at a picture of a dandelion or sunflower, you see colors, shapes and textures.
But a machine doesn’t ‘see’ like we do—it needs numbers.
An embedding model (like SqueezeNet or InceptionV3) is a smart tool that looks at an image
and creates a list of numbers (features) that represent important details like shape, texture and
color patterns.
These numbers (embeddings) help the computer recognize and classify images correctly.
Instead of comparing raw pictures, the model compares these numerical features to decide if
an image is a dandelion or a sunflower.
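A toy sketch of the idea, assuming Pillow and NumPy are installed: a real embedder like SqueezeNet outputs hundreds of learned numbers, but even four handcrafted ones show how an image becomes a list of numbers (the file name is a placeholder):

```python
# A toy 'embedding': turn an image into a short list of numbers.
# Real embedders (e.g., SqueezeNet) output hundreds of learned features;
# here we compute just four simple ones for illustration.
from PIL import Image
import numpy as np

img = np.asarray(Image.open("sunflower.jpg").convert("RGB"), dtype=float)

embedding = [
    img[..., 0].mean(),   # average red
    img[..., 1].mean(),   # average green
    img[..., 2].mean(),   # average blue
    img.std(),            # overall contrast
]
print(embedding)          # four numbers that roughly describe the image
```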
• Double-click on Test and Score widget and choose Cross validation or Random
sampling to split data for training and testing.
Double-click Confusion Matrix widget to check results. The matrix will show how
well the model distinguishes between dandelions and sunflowers.
Activity 1
Learn how to plot AUC curves using Orange; the outputs should look like the image below.
Hint: Connect another widget from Evaluate tab to Test and Score.
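For comparison, the same kind of curve can also be plotted in Python with scikit-learn and matplotlib; the labels and scores below are illustrative:

```python
# Plotting a ROC curve and its AUC with scikit-learn (illustrative data).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                    # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]   # predicted probabilities

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")              # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```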
[Figure: features of natural language such as Arbitrariness (meaning changes) and Productivity (new words)]
Consider an example: ‘Riya forgot her umbrella. She got wet on the way home.’ An NLP system
infers that it was raining, even though the text does not explicitly say so. This reasoning is
essential for making sense of the text in a human-like way.
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. Which feature of natural languages allows humans to create and understand new sentences they
have never heard before?
(a) Arbitrariness (b) Productivity (Creativity)
(c) Displacement (d) Cultural Transmission
2. In NLP, which stage is responsible for breaking text into individual words or tokens?
(a) Syntax Analysis (b) Lexical Analysis
(c) Semantic Analysis (d) Pragmatic Analysis
3. What is the primary function of sentiment analysis in NLP?
(a) Translating text between languages (b) Detecting grammatical errors
(c) Identifying emotions in text (d) Classifying news articles
4. Which of the following is NOT an application of NLP?
(a) Google Translate (b) Video Compression
(c) Auto-Summarization (d) Speech-to-Text
NO CODE NLP
Natural language processing has progressed very rapidly during the past five years and its
widespread usage has resulted in the development of tools that can be used by anyone with
basic knowledge, without any programming requirements.
make it accessible for beginners with a no-code approach to problem-solving. A comparative
analysis of No-Code tools with code-based NLP libraries is presented below.
No-Code NLP Tools vs Code-based NLP Libraries
Feature | No-Code Tools (e.g., Orange Data Mining) | Code-based Libraries (e.g., spaCy, NLTK)
Ease of Use | Drag-and-drop interface; does not require programming knowledge | Requires coding skills (Python), setup and scripting
Flexibility | Limited customization, mainly pre-built components | Highly customizable with deep control over NLP tasks
Speed & Performance | User-friendly but may be slower for large datasets | Faster processing with optimized algorithms, especially in spaCy
Supported Tasks | Basic NLP tasks like tokenization, sentiment analysis and word clouds | Advanced NLP tasks such as named entity recognition (NER), dependency parsing and custom model training
Machine Learning Integration | Provides built-in models but limited fine-tuning | Allows full control over training custom ML models for NLP
Use Case Suitability | Ideal for beginners, educators and quick exploratory analysis | Preferred for research, industry applications and production-level NLP systems
One popular application of NLP is sentiment analysis. Sentiment analysis is a Natural
Language Processing (NLP) technique used to determine the emotional tone behind a piece
of text. It classifies text as positive, negative or neutral, helping to analyze opinions and
emotions.
Some applications of Sentiment Analysis include:
• Social Media Monitoring: Analyzing tweets, comments or posts to understand public
sentiment about brands, products or events.
• Customer Feedback Analysis: Companies use it to assess product reviews and improve
services based on customer opinions.
• Stock Market Prediction: Investors analyze news articles and social media sentiment
to predict stock trends.
• Political Opinion Analysis: Used to understand public sentiment towards political
candidates or policies.
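Before moving to the no-code workflow, here is a minimal code-based sketch of sentiment analysis using NLTK's VADER analyzer, one of many possible methods; the example sentences are made up:

```python
# Minimal sentiment analysis sketch using NLTK's VADER lexicon.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")   # one-time download of the sentiment lexicon
sia = SentimentIntensityAnalyzer()

for text in ["I love this product!", "This is the worst service ever.", "The parcel arrived."]:
    score = sia.polarity_scores(text)["compound"]   # -1 (negative) to +1 (positive)
    label = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    print(f"{label:>8}: {text}")
```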
• Use the Data tab and add the CSV File Import widget to the workflow. Double-click
it and select the csv file to be uploaded from your computer.
• Let us load our data to a corpus by adding and connecting the Corpus widget from
the Text Mining tab to the CSV File Import widget.
We may add multiple data files, as a corpus is a collection of documents that will
be used for Natural Language Processing tasks.
• To view the contents of the corpus, add the Corpus Viewer widget from the
Text Mining tab to the workflow and connect it to the Corpus widget.
• You can inspect the corpus data by double-clicking the Corpus Viewer widget, which
displays the content of all the documents added to the corpus.
You may experiment with different methods like Liu Hu, SentiArt or Multilingual Sentiment to
observe how well they perform.
Also connect a Corpus Viewer and a Data Table widget to the output of sentiment analysis
and observe the output sentiments associated with each post, as illustrated below:
Exercises
Objective Type Questions
I. Multiple Choice Questions (MCQs):
1. Which of the following tools can be used for No-Code NLP sentiment analysis?
(a) Jupyter Notebook (b) Orange Data Mining
(c) TensorFlow (d) Visual Studio Code
2. In No-Code NLP, which widget in Orange is used to visualize the frequency of words?
(a) Sentiment Analysis Widget (b) Corpus Viewer Widget
(c) WordCloud Widget (d) Preprocess Text Widget
Chapter 2
I. Multiple Choice Questions (MCQs):
1. (c) 2. (d) 3. (c) 4. (b) 5. (c)
6. (b) 7. (b) 8. (b)
II. Fill in the blanks:
1. mathematical equations 2. clusters 3. human brain
4. Labels 5. Activation 6. Features
III. True or False:
1. True 2. False 3. True 4. False 5. True
IV. Assertion and Reasoning Based Questions:
1. (i) 2. (iv) 3. (iii) 4. (iii)
Chapter 3
I. Multiple Choice Questions (MCQs):
1. (b) 2. (b) 3. (c) 4. (a) 5. (c)
II. Fill in the blanks:
1. Training set 2. (TP + TN) / (TP + TN + FP + FN)
3. Error
III. True or False:
1. False 2. True
IV. Assertion and Reasoning Based Questions:
1. (iii) 2. (iii)
Chapter 4
I. Multiple Choice Questions (MCQs):
1. (b) 2. (c) 3. (c) 4. (c) 5. (b)
6. (c)
II. Fill in the blanks:
1. Numerical 2. education
3. Orange Data Mining 4. numerical
III. True or False:
1. False 2. True
IV. Assertion and Reasoning Based Questions:
1. (iii) 2. (i)
Chapter 6
I. Multiple Choice Questions (MCQs):
1. (b) 2. (b) 3. (c) 4. (b)
II. Fill in the blanks:
1. Syntax Analysis 2. Speech-to-text 3. Keywords
III. True or False:
1. False 2. True 3. False
IV. Assertion and Reasoning Based Questions:
1. (i) 2. (iv)
Chapter 6 (Practicals)
I. Multiple Choice Questions (MCQs):
1. (b) 2. (c)
II. True or False:
1. False 2. True