AI- Machine Learning Engineer Handbook
Sector
IT-ITeS
Sub-Sector
Future Skills
Occupation
Artificial Intelligence & Big Data Analytics
This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and
license their new creations under the identical terms. This license is often compared to “copyleft” free and open-source software
licenses. All new works based on yours will carry the same license, so any derivatives will also allow commercial use. This is the
license used by Wikipedia and is recommended for materials that would benefit from incorporating content from Wikipedia and
similarly licensed projects.
Disclaimer
The information contained herein has been obtained from various reliable sources. IT-ITeS Sector Skills Council
NASSCOM disclaims all warranties as to the accuracy, completeness or adequacy of such information. NASSCOM shall
have no liability for errors, omissions, or inadequacies in the information contained herein, or for interpretations
thereof. Every effort has been made to trace the owners of the copyright material included in the book. The
publishers would be grateful for any omissions brought to their notice for acknowledgements in future editions of
the book. No entity in NASSCOM shall be responsible for any loss whatsoever sustained by any person who relies
on this material. All pictures shown are for illustration purposes only. The coded boxes in the book, called Quick
Response Codes (QR codes), help to access the e-resources linked to the content. These QR codes are generated
from links and YouTube video resources available on the Internet for knowledge enhancement on the topic and are not
created by NASSCOM. Embedding of a link or QR code in the content should not be assumed to be an endorsement of
any kind. NASSCOM is not responsible for the views expressed or the content or reliability of linked videos. NASSCOM
cannot guarantee that these links/QR codes will work all the time, as we do not have control over the availability of the
linked pages.
Skilling is building a better India.
If we have to move India towards
development then Skill Development
should be our mission.
COMPLIANCE TO QUALIFICATION PACK – NATIONAL OCCUPATIONAL STANDARDS
is hereby issued by the
IT – ITeS Sector Skill Council NASSCOM
for
Acknowledgements
NASSCOM would like to express its gratitude towards company representatives, who believe in our
vision of improving employability for the available pool of engineering students. SSC NASSCOM makes
the process easier by developing and implementing courses that are relevant to the projected industry
requirements.
The aim is to close the industry-academia skill gap and create a talent pool that can withstand upcoming
externalities within the IT-BPM industry.
This initiative reflects the belief of NASSCOM and concerns every stakeholder – students, academia, and
industry. The ceaseless support and tremendous amount of work offered by IT-ITeS members to
strategize meaningful program training materials, both in terms of content and design, are
truly admirable.
We would also like to show our appreciation to Orion ContentGrill Pvt. Ltd. for their persistent effort,
and for the production of this course publication.
Participant Handbook
The Participant Handbook details the relevant activities to be performed by the AI- Machine Learning Engineer. After
studying this handbook, the job holders will be proficient enough to perform their duties as per the
applicable quality standards. The latest and approved edition of The AI- Machine Learning Engineer’s
Handbook aligns with the following National Occupational Standards (NOS):
1. SSC/N8121: Evaluate technical performance of algorithmic models
2. SSC/N8122: Develop software code to support the deployment of algorithmic models
3. SSC/N9014: Maintain an inclusive, environmentally sustainable workplace
4. DGT/VSQ/N0102: Employability Skills (60 Hours)
The handbook has been divided into an appropriate number of units and sub-units based on the contents
of the relevant QPs. We hope that it will facilitate easy and structured learning for the participants,
enabling them to acquire advanced knowledge and skills.
Table of Contents
S.N. Modules and Units Page No
https://www.skillindiadigital.gov.in/content/list
9. Annexure 123
1. Artificial Intelligence & Big Data Analytics – An Introduction
Unit 1.1 - Introduction to Artificial Intelligence & Big Data Analytics
Bridge Module
Unit Objectives
By the end of this unit, the participants will be able to:
1. Explain the relevance of AI & Big Data Analytics for the society
2. Explain the various use-cases of AI & Big Data in the industry
3. Define “general” and “narrow” AI
4. Describe the fields of AI such as image processing, computer vision, robotics, NLP, etc.
5. Analyse the differences between key terms such as Supervised Learning, Unsupervised Learning
and Deep Learning
6. Outline a career map for roles in AI & Big Data Analytics
(Figure: Significance of AI & Big Data Analytics for society – ethical decision making, economic growth and job creation, improved healthcare, and smart cities and infrastructure.)
1. Enhanced Decision-Making: AI and Big Data Analytics empower individuals, businesses, and
governments to make more informed decisions. By analysing large datasets, patterns and trends
can be identified, enabling proactive decision-making in various sectors such as healthcare, finance,
and public policy.
2. Innovation and Automation: AI fosters innovation by automating repetitive tasks and processes,
freeing up human resources for more creative and strategic endeavours. This leads to increased
efficiency, productivity, and the development of new products and services that cater to evolving
societal needs.
3. Improved Healthcare: In the healthcare sector, AI and Big Data Analytics play a pivotal role in
disease diagnosis, treatment optimization, and personalized medicine. Predictive analytics helps
identify potential outbreaks, while machine learning algorithms contribute to the development of
innovative medical solutions.
4. Smart Cities and Infrastructure: The integration of AI and Big Data Analytics in urban planning results
in the creation of smart cities. These cities leverage data to enhance infrastructure, transportation,
and public services, leading to improved quality of life for residents.
5. Economic Growth and Job Creation: The adoption of AI and Big Data Analytics contributes to
economic growth by fostering modernization and creating new job prospects. Industries embracing
these technologies witness increased competitiveness on a global scale.
6. Ethical Considerations: The widespread use of AI and Big Data Analytics raises ethical concerns
related to privacy, bias, and accountability. It becomes imperative for society to navigate these
challenges and establish frameworks that ensure responsible and ethical use of these technologies.
1.1.4 Types of AI
Artificial Intelligence (AI) constitutes a multifaceted domain, embracing diverse approaches and
capabilities. Within the expansive realm of AI, we encounter two fundamental classifications: "General
AI" and "Narrow AI," each delineating distinct scopes and functionalities in the quest to emulate human
intelligence. Additionally, there is a futuristic concept known as "Super AI," which envisions an even
more advanced form of artificial intelligence transcending the capabilities of both General and Narrow
AI.
• Limited Context: Narrow AI operates within a constrained context and may struggle when faced
with situations outside its predefined parameters.
1.1.5 Fields of AI
Artificial Intelligence (AI) spans a multitude of specialized fields, each addressing unique aspects
of intelligent system development. These fields leverage advanced algorithms, machine learning
techniques, and data analysis to enhance various applications. Here are descriptions of some prominent
fields within AI:
• Image Processing: Involves manipulating and analysing visual data for quality improvement,
information extraction, and pattern recognition. Applications include medical imaging, facial
recognition, object detection, and satellite imagery enhancement.
• Computer Vision: Enables machines to interpret and understand visual information, akin to human
visual perception. Applications include object recognition, image and video analysis, autonomous
vehicles, and augmented reality.
• Machine Learning: Encompasses the development of algorithms and models for systems to
learn from data and improve performance over time. Applications include predictive analytics,
recommendation systems, fraud detection, and autonomous decision-making.
• Speech Recognition: Technology that interprets and understands spoken language, converting
it into text or triggering specific actions. Applications include virtual assistants, voice-controlled
devices, and transcription services.
• Expert Systems: Computer programs that mimic human expert decision-making in specific domains.
Applications include diagnosing medical conditions, providing technical support, and offering
expertise in various professional fields.
The following comparison summarizes the key points of difference between supervised learning, unsupervised learning, and deep learning.

Definition
• Supervised Learning: In supervised learning, the algorithm is trained on a labelled dataset, where input-output pairs are provided. The algorithm learns to map inputs to corresponding outputs, making predictions or classifications based on this learned mapping.
• Unsupervised Learning: Unsupervised learning deals with unlabelled data, where the algorithm explores the inherent structure and patterns within the data without explicit guidance. The goal is often to discover hidden relationships or groupings.
• Deep Learning: Deep learning is a subset of machine learning that involves neural networks with numerous layers (deep neural networks). These networks, inspired by the structure of the human brain, can automatically learn hierarchical representations from data.

Training Data
• Supervised Learning: Requires a labelled dataset where the algorithm learns from input-output pairs.
• Unsupervised Learning: Works with unlabelled data, exploring patterns without predefined outputs.
• Deep Learning: Can involve both labelled and unlabelled data, but often benefits from large labelled datasets for training intricate models.

Goal
• Supervised Learning: Aims to predict or classify based on learned relationships between input and output data.
• Unsupervised Learning: Focuses on uncovering patterns, relationships, or groupings within the data.
• Deep Learning: Primarily employed for automatically learning hierarchical representations, facilitating complex tasks like image recognition and natural language understanding.

Use Cases
• Supervised Learning: Commonly used for tasks such as classification and regression. Examples include spam detection, image recognition, and predicting stock prices.
• Unsupervised Learning: Applied in clustering, dimensionality reduction, and association rule learning. Examples include customer segmentation, anomaly detection, and topic modelling.
• Deep Learning: Excels in complex tasks like image and speech recognition, natural language processing, and autonomous systems.

Training Process
• Supervised Learning: The model is trained on a labelled dataset, adjusting parameters iteratively to minimize the difference between predicted and actual outputs.
• Unsupervised Learning: The algorithm autonomously identifies patterns or structures within the data without predefined labels.
• Deep Learning: Utilizes neural networks with multiple layers. Training involves backpropagation, adjusting weights in the network to minimize errors.

Applications
• Supervised Learning: Widely used in real-world applications such as sentiment analysis, credit scoring, and recommendation systems.
• Unsupervised Learning: Applied in scenarios where finding hidden patterns or groupings is essential, like market basket analysis or identifying outliers.
• Deep Learning: Dominates tasks requiring sophisticated pattern recognition, such as image and speech recognition, language translation, and autonomous vehicle control.
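To make the distinction concrete, here is a minimal sketch that applies a supervised classifier, an unsupervised clustering algorithm, and a small multi-layer neural network to the same synthetic dataset. It assumes scikit-learn is available; the toy dataset and model choices are illustrative assumptions, not a toolchain prescribed by this handbook.

```python
# Illustrative sketch (assumes scikit-learn); toy data and models chosen only for demonstration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression      # supervised learning
from sklearn.cluster import KMeans                       # unsupervised learning
from sklearn.neural_network import MLPClassifier         # small multi-layer (deep-style) model
from sklearn.model_selection import train_test_split

# Toy labelled dataset: 500 samples, 10 features, 2 classes.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Supervised learning: learns a mapping from inputs to known labels.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: ignores the labels and looks for structure (clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_train)
print("Cluster sizes:", [list(km.labels_).count(c) for c in set(km.labels_)])

# Deep learning in miniature: a multi-layer network trained with backpropagation,
# adjusting weights to minimize prediction error.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
print("Neural network accuracy:", mlp.score(X_test, y_test))
```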
Fig. 1.1.11: Career Map for roles in AI & Big Data Analytics – Big Data Engineer and Machine Learning Engineer (NSQF Levels 7 and 8)
Summary
• AI and Big Data Analytics play a crucial role in addressing complex societal challenges by providing
insights, optimizing processes, and enabling informed decision-making.
• Industries leverage AI and Big Data for applications like predictive analytics, personalized marketing,
fraud detection, and process optimization, enhancing efficiency and competitiveness.
• General AI refers to machines with human-like cognitive abilities across various tasks, while Narrow
AI focuses on specific tasks, demonstrating expertise in a defined domain.
• AI encompasses diverse fields, including image processing (manipulating visual data), computer
vision (interpreting visual information), robotics (automation), and NLP (interpreting and generating
human language).
• Supervised learning uses labelled data for training, unsupervised learning finds patterns in unlabelled
data, and deep learning involves neural networks with multiple layers to extract complex features.
• Careers in AI and Big Data Analytics involve steps like gaining relevant education, acquiring skills
in programming and data analysis, and specializing in areas such as machine learning or data
engineering.
Exercise
Multiple Choice Questions
1. What is the primary role of AI and Big Data Analytics in society?
a. Entertainment b. Addressing societal challenges
c. Political activism d. Cultural preservation
5. What is a crucial step in the career map for AI and Big Data Analytics roles?
a. Gaining expertise in a specific field b. Pursuing a career in robotics
c. Ignoring programming skills d. Avoiding specialization
Descriptive Questions
1. Explain the impact of AI and Big Data Analytics on a specific industry of your choice.
2. Discuss the ethical considerations associated with the use of AI in society.
3. Elaborate on the applications of computer vision in real-world scenarios.
4. Compare and contrast supervised learning and unsupervised learning with examples.
5. How can individuals prepare themselves for a career in Big Data Analytics, considering the evolving
industry requirements?
Scan the QR codes or click on the link to watch the related videos
https://youtu.be/ad79nYk2keg?si=U3fOp-AmnaBCe-Gl https://youtu.be/XFZ-rQ8eeR8?si=5ptCRjz5Lg6zVkyB
2. Product Engineering Basics
Unit 2.1 - Exploring Product Development and Management Processes
Bridge Module
Unit Objectives
By the end of this unit, the participants will be able to:
1. Analyse activities across product development stages.
2. Discuss product management processes, including ideation, market research, wireframing,
prototyping, and user stories.
3. Explore strategies for generating and managing new product ideas.
4. Evaluate product risks and devise corresponding risk response plans.
Planning Phase:
• Create a product roadmap outlining key milestones and deliverables.
• Define project scope, objectives, and success criteria.
product lifecycle, guiding strategic direction, feature prioritization, and go-to-market strategies. By
leveraging market research findings, product managers can effectively mitigate risks, identify growth
opportunities, and tailor product offerings to meet evolving market demands.
Wireframing, prototyping, and user stories serve as essential tools in translating conceptual ideas
into tangible product features and functionalities. Wireframes provide skeletal representations of
user interfaces, enabling stakeholders to visualize layout structures and navigation flows. Prototyping
facilitates iterative design and development by creating interactive mock-ups that simulate user
interactions and workflows. User stories capture user requirements and behaviours in a narrative
format, guiding development teams in prioritizing features and delivering value-aligned solutions.
Together, these processes streamline communication, foster collaboration, and ensure alignment
between product vision and execution, ultimately enhancing the likelihood of delivering successful
products that resonate with end-users.
Ideation:
• Collaborative Brainstorming Sessions: Engage stakeholders from various departments to generate
a diverse range of ideas.
• Customer Feedback Analysis: Gather insights from existing and potential customers to understand
their pain points and needs.
• Market Trends Analysis: Monitor industry trends, competitor activities, and emerging technologies
to identify opportunities for innovation.
• Creativity Nurturing: Encourage a culture of creativity and experimentation within the organization
to foster a conducive environment for ideation.
Market Research:
• Comprehensive analysis: Conduct in-depth research to gather data on consumer behaviors,
preferences, and market dynamics.
• Understanding customer pain points: Identify key challenges faced by customers and prioritize
features that address these pain points.
• Strategic decision-making: Utilize market research findings to inform product strategy, feature
prioritization, and go-to-market plans.
• Risk mitigation and opportunity identification: Identify potential risks and opportunities in the
market landscape to make informed decisions and capitalize on emerging trends.
Product development includes all aspects of producing innovation, from thinking of a concept to
delivering the product to customers. When modifying an existing product to create new interest, these
stages verify the potential success of the modifications at generating business. The seven stages of
product development are:
Benefits of Product Development Strategy: A robust product development strategy offers numerous
benefits to organizations, ranging from enhanced competitiveness to increased customer satisfaction.
By systematically innovating and refining products, companies can stay ahead of market trends and
maintain relevance in dynamic environments. Additionally, a well-executed product development
strategy enables organizations to capitalize on emerging opportunities, expand market reach, and
drive revenue growth. Moreover, by prioritizing customer needs and preferences throughout the
development process, companies can deliver products that resonate with their target audience,
fostering brand loyalty and positive customer experiences. Furthermore, an effective product
development strategy facilitates agility and adaptability, allowing organizations to respond swiftly to
changing market conditions and customer demands, thereby positioning themselves for long-term
success and sustainability in competitive landscapes.
• Make New Items: Changing your product idea is one way to go about product development. If a
market isn't reacting well to innovation, the business might think about investing its resources to
determine what that market wants. Since not every idea will lead to a successful product, it can be
good to be open to changing ideas as necessary.
• Look for New Markets: Numerous goods can be offered profitably in a variety of markets. Marketing
an established product to a new market or demographic is one approach to product creation. This
could involve marketing to a different age group, focusing on businesses rather than individual
customers, or expanding the geographic reach of your product.
• Technical Risks:
ᴑ Risk: Technical challenges or complexities in product development, such as scalability issues,
integration problems, or software bugs.
ᴑ Response: Conduct thorough technical feasibility studies and prototype testing to identify
and address potential issues early in the development process. Implement agile development
methodologies to facilitate rapid iteration and adaptation to changing requirements. Maintain
a skilled development team and leverage external expertise or partnerships as needed.
• Market Risks:
ᴑ Risk: Market volatility, changing consumer preferences, or unexpected competitive pressures
that may affect product demand or adoption.
ᴑ Response: Conduct comprehensive market research to understand target audience needs,
preferences, and market trends. Develop flexible go-to-market strategies to adapt to changing
market conditions. Implement pilot testing or market validation activities to assess product-
market fit before full-scale launch. Diversify product offerings or target markets to mitigate
dependency on a single market segment.
• Regulatory and Compliance Risks:
ᴑ Risk: Non-compliance with regulatory requirements, industry standards, or data protection
regulations, leading to legal liabilities or market restrictions.
ᴑ Response: Stay updated on relevant regulations and standards applicable to the product's
industry and target markets. Incorporate compliance considerations into the product design
and development process from the outset. Engage legal experts or regulatory consultants to
ensure adherence to applicable laws and regulations. Implement robust data protection and
security measures to safeguard customer privacy and mitigate data breach risks.
• Financial Risks:
ᴑ Risk: Cost overruns, budget constraints, or insufficient return on investment (ROI) due to
unforeseen expenses or revenue shortfall.
ᴑ Response: Develop detailed budget plans and financial projections to estimate development
and launch costs accurately. Implement cost control measures, such as resource optimization,
vendor negotiation, and risk contingency planning. Explore alternative funding sources, such
as grants, partnerships, or crowdfunding, to supplement financial resources. Continuously
monitor project expenses and performance metrics to identify deviations from the budget and
take corrective actions as needed.
Summary
• Categorize activities across stages such as conceptualization, planning, design and development,
testing and validation, launch and deployment, and post-launch evaluation and maintenance.
• Discuss key processes, including ideation, market research, wireframing, prototyping, and user
stories to guide effective product development.
• Explore strategies for generating ideas through brainstorming, customer feedback, and managing
new products through innovation frameworks and structured processes.
• Evaluate potential risks throughout the product lifecycle and devise response plans to address
technical, market, regulatory, financial, and reputation risks.
• Demonstrate the importance of budgeting and scheduling in product development, ensuring
efficient resource allocation and timely project delivery.
• Apply cost models and forecasts to optimize product development costs, enhancing profitability
and competitiveness in the market.
• Foster a culture of innovation to generate creative ideas that address market needs and gaps,
driving product development forward.
• Utilize comprehensive market research to understand consumer behaviours, preferences, and
trends, enabling informed decision-making and strategic planning.
• Create prototypes to validate product concepts and gather user feedback, facilitating iterative
design and ensuring alignment with user needs.
• Embrace a cycle of continuous improvement, leveraging feedback and data analytics to refine
products, optimize processes, and stay responsive to evolving market demands.
Exercise
Multiple-choice Question:
1. In which stage of product development would wireframing typically occur?
a. Ideation b. Planning
c. Design and Development d. Testing and Validation
3. Which of the following is NOT a strategy for exploring new product ideas?
a. Conducting market research b. Hosting brainstorming sessions
c. Prototyping d. Ignoring customer feedback
Descriptive Questions
1. Discuss the significance of wireframing in the product development process and its role in facilitating
user interface design.
2. Explore the role of market research in product development.
3. Describe two methods for exploring new product ideas and managing the innovation process within
organizations.
4. Evaluate the importance of risk assessment in product development.
5. Demonstrate the process of budgeting and scheduling in product development.
Scan the QR codes or click on the link to watch the related videos
https://www.youtube.com/watch?v=oE6VD23Kr0I https://www.youtube.com/watch?v=XD45n_agC3g
3. Statistical Analysis Fundamentals
Unit 3.1 - Statistical Analysis Fundamentals
Bridge Module
Unit Objectives
By the end of this unit, the participants will be able to:
1. Differentiate between probability distributions like Normal, Poisson, Exponential, and Bernoulli by
categorizing their characteristics and applications.
2. Employ graphical techniques like scatterplots to identify correlations between variables and analyse
patterns within datasets to discern relationships.
3. Apply descriptive statistics fundamentals, using mean, median, and mode measures to accurately
summarize and interpret data distributions.
4. Utilize correlation techniques, such as Pearson’s Correlation Coefficient and Methods of Least
Squares, to assess relationships between variables, quantifying associations within datasets.
Bernoulli distribution is the simplest of the distributions mentioned here, dealing with binary outcomes:
success or failure. It models situations where there are only two possible outcomes in a single trial,
such as flipping a coin (heads or tails) or a single yes-or-no question. The distribution is defined by a
single parameter p, which represents the probability of success in a single trial.
Each of these distributions plays a vital role in probability theory and statistics, offering valuable
tools for modelling and analysing various types of data and phenomena, from the continuous and
symmetrical to the discrete and binary. Understanding their properties and applications is essential for
making informed decisions in a wide range of fields, from finance and engineering to healthcare and
social sciences.
Each of these probability distributions serves distinct purposes in statistical analysis. The Normal
distribution provides a versatile framework for modelling continuous variables with a tendency to
cluster around a central value, making it applicable in fields ranging from natural sciences to economics.
The Poisson distribution is specifically tailored to model rare event occurrences within fixed intervals,
such as customer arrivals or product defects, offering insights into probability distributions for discrete
events. The Exponential distribution, focusing on the time between events in memoryless processes,
aids in analysing waiting times or lifespans of components in various engineering and operational
contexts. Meanwhile, the Bernoulli distribution simplifies scenarios to binary outcomes, which is
crucial for modelling success or failure probabilities in single trials and laying the groundwork for more
complex binomial distribution models. Each distribution caters to specific data characteristics and
analytical needs, contributing uniquely to statistical inference and decision-making processes across
diverse fields and applications.
The summary below lists the characteristics and applications of each probability distribution.

Normal
• Characteristics: Symmetrical and bell-shaped curve; mean, median, and mode at the centre; two parameters: mean (μ) and standard deviation (σ).
• Applications: Modeling continuous random variables with data clustering around a central value and predictable spread. Used in natural and social sciences, finance, engineering, and quality control.

Poisson
• Characteristics: Describes rare event occurrences; single parameter λ (lambda) representing the average rate.
• Applications: Modeling the probability of a certain number of events happening within a fixed interval. Used for customer arrivals, product defects, accidents, and rare event analyses.

Exponential
• Characteristics: Describes the time between consecutive events; memoryless property: events occur independently at a constant rate; single parameter λ (lambda) representing the average rate.
• Applications: Analyzing the time duration between occurrences in memoryless processes. Used for waiting times between events, component lifespans, and reliability analysis.

Bernoulli
• Characteristics: Deals with binary outcomes; single trial with two possible outcomes.
• Applications: Modeling outcomes with only two possible states: success/failure, heads/tails, or yes/no. Used for single-trial experiments, binary decision-making, and as the foundation for binomial distribution models.
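As a concrete illustration, the sketch below draws random samples from each of these four distributions using NumPy; the parameter values (μ, σ, λ, p) and the sample size are arbitrary assumptions chosen only for demonstration.

```python
# Illustrative sketch using NumPy's random generators; all parameter values are
# arbitrary assumptions chosen for demonstration.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

normal      = rng.normal(loc=50, scale=10, size=n)   # mean μ = 50, standard deviation σ = 10
poisson     = rng.poisson(lam=3, size=n)             # average rate λ = 3 events per interval
exponential = rng.exponential(scale=1 / 3, size=n)   # mean waiting time 1/λ with λ = 3
bernoulli   = rng.binomial(n=1, p=0.3, size=n)       # single trial, success probability p = 0.3

print("Normal:      mean ≈", round(normal.mean(), 2), " std ≈", round(normal.std(), 2))
print("Poisson:     mean ≈", round(poisson.mean(), 2))
print("Exponential: mean ≈", round(exponential.mean(), 3))
print("Bernoulli:   proportion of successes ≈", round(bernoulli.mean(), 3))
```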
Correlation is a statistical measure that quantifies the strength and direction of the relationship between
two variables. There are several types of correlation, each indicating a different kind of relationship
between variables:
• Positive Correlation: In a positive correlation, as one variable increases, the other variable also
tends to increase. Conversely, as one variable decreases, the other variable tends to decrease. The
correlation coefficient for a positive correlation is between 0 and +1.
• Negative Correlation: In a negative correlation, as one variable increases, the other variable tends
to decrease, and vice versa. The correlation coefficient for a negative correlation is between -1 and
0.
• Zero Correlation: There is no systematic relationship between the variables in a zero correlation.
Changes in one variable do not predict changes in the other variable. The correlation coefficient is
close to 0.
• Linear Correlation: Linear correlation occurs when the relationship between variables can be
approximated by a straight line on a scatterplot. Positive and negative correlations can both be
linear.
• Nonlinear Correlation: Nonlinear correlation occurs when the relationship between variables
cannot be adequately represented by a straight line. Instead, the relationship may follow a curved
or irregular pattern on a scatterplot.
• Perfect Correlation: Perfect correlation occurs when all data points fall exactly on a straight line (in
the case of positive or negative correlation) or a curve (in the case of nonlinear correlation). The
correlation coefficient is either +1 or -1.
• Partial Correlation: Partial correlation measures the relationship between two variables while
controlling for the effects of one or more additional variables. After accounting for other factors, it
helps to assess the unique association between variables.
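A minimal sketch of how such relationships can be visualized on scatterplots follows, assuming NumPy and Matplotlib are available; the synthetic datasets are invented so that they exhibit positive, negative, and near-zero correlation respectively.

```python
# Illustrative sketch: synthetic data and scatterplots for positive, negative,
# and near-zero correlation (assumes NumPy and Matplotlib).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
x = rng.normal(size=200)
noise = rng.normal(scale=0.5, size=200)

datasets = {
    "Positive correlation": x + noise,          # y rises as x rises
    "Negative correlation": -x + noise,         # y falls as x rises
    "Zero correlation": rng.normal(size=200),   # y unrelated to x
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (title, y) in zip(axes, datasets.items()):
    r = np.corrcoef(x, y)[0, 1]                 # sample correlation coefficient
    ax.scatter(x, y, s=10)
    ax.set_title(f"{title} (r = {r:.2f})")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
plt.tight_layout()
plt.show()
```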
Mean: The mean, also known as the average, is calculated by summing all values in a dataset and
dividing by the total number of observations. It represents the central tendency of the data and is
sensitive to extreme values, making it useful for symmetrically distributed data. However, outliers can
influence it, potentially skewing its value and misrepresenting the data.
One of the primary advantages of using the mean is its ability to provide a single representative value
that summarizes the entire dataset. This makes it particularly useful for datasets with a symmetric
distribution, where the values cluster around a central point. For example, when examining the heights
of a population, the mean height would provide a concise summary of the typical height within that
group.
However, a key limitation of the mean is its sensitivity to extreme values, also known as outliers.
Outliers are data points that lie significantly far away from the majority of the data. Since the mean
takes into account every value in the dataset, outliers can substantially impact its value. For instance, if
a dataset contains a few extremely high or low values, the mean can be skewed towards these outliers,
misrepresenting the central tendency.
To illustrate this, consider a dataset representing the incomes of individuals in a country. Most people
may earn moderate incomes, but if a few billionaires are included in the dataset, their exceptionally high
incomes will inflate the mean, making it appear much higher than the typical income of the population.
Median: The median is the middle value of a dataset when arranged in ascending or descending order.
It is not affected by extreme values, making it robust to outliers. The median is particularly useful
for datasets with skewed distributions, accurately reflecting the central tendency even in non-normal
distributions. It is often preferred over the mean in such cases.
The median is a measure of central tendency that provides valuable insights into data distribution,
especially in cases where the data may be skewed or influenced by outliers. Unlike the mean, which
extreme values can heavily influence, the median remains robust and unaffected by outliers. This
property makes it particularly useful for datasets with skewed distributions, where the mean may not
accurately represent the central tendency.
When a dataset is arranged in ascending or descending order, the median is simply the middle value.
If there is an odd number of observations, the median is the value exactly in the middle of the ordered
list. If there is an even number of observations, the median is the average of the two middle values.
This calculation method ensures that the median represents a central value that divides the dataset
into two equal halves.
In skewed distributions, where the data is not symmetrically distributed around a central value, extreme
values may pull the mean towards the skewed tail of the distribution. However, the median remains
unaffected by these extreme values, accurately reflecting the central tendency of the majority of the
data points. This property makes the median a more reliable measure of central tendency in skewed
distributions.
Overall, the median is often preferred over the mean in situations where the distribution of data is non-
normal, skewed, or influenced by outliers. Its robustness to extreme values makes it a valuable tool for
accurately summarizing and interpreting data distributions, providing insights into the typical or central
values within the dataset.
Mode: The mode is the most frequently occurring value in a dataset. It provides insight into the central
tendency of categorical or nominal data. Unlike the mean and median, the mode can be calculated
for any type of data, including nominal and ordinal variables. However, depending on the distribution,
datasets may have multiple modes or no modes.
The mode is a fundamental measure of central tendency in descriptive statistics, representing the value
that occurs most frequently in a dataset. Unlike the mean and median, which are typically used for
numerical data, the mode can be calculated for any type of data, including categorical or nominal
variables. For example, the mode would indicate the most common colour in a dataset representing
the colours of cars owned by a group of people.
One of the key advantages of the mode is its applicability to a wide range of data types. Whether the
data is categorical, nominal, or ordinal, the mode can provide insight into the central tendency by
identifying the most prevalent category or value. This makes it a versatile tool for summarizing datasets
across various domains and disciplines.
However, it's important to note that datasets may exhibit different characteristics in terms of their
mode. In some cases, a dataset may have a single mode, indicating a clear peak or dominant value. For
example, in a dataset representing the ages of individuals in a community, the mode might be the age
group with the highest frequency, such as the age range of 30-40 years.
On the other hand, datasets may also have multiple modes, where two or more values occur with the
same highest frequency. This scenario often occurs in bimodal or multimodal distributions, where the
data exhibits more than one peak. For instance, in a dataset representing the scores of students on an
exam, there might be two distinct modes representing different performance levels.
Furthermore, it's also possible for a dataset to have no mode at all, particularly if all values occur with
equal frequency or if there is no clear pattern of repetition in the data. This situation is more common
in continuous or uniformly distributed datasets where no single value predominates.
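The short sketch below, which assumes nothing beyond Python's standard statistics module, computes all three measures for a small invented income-style dataset and shows how a single extreme value pulls the mean while barely moving the median.

```python
# Illustrative sketch using Python's standard statistics module; the income
# figures are invented purely to show the effect of an outlier.
import statistics

incomes = [32, 35, 35, 38, 40, 42, 45]           # hypothetical incomes (in thousands)

print("mean:  ", statistics.mean(incomes))       # ≈ 38.14
print("median:", statistics.median(incomes))     # 38
print("mode:  ", statistics.mode(incomes))       # 35 (most frequent value)

incomes_with_outlier = incomes + [900]           # add one extreme earner
print("mean with outlier:  ", statistics.mean(incomes_with_outlier))    # jumps to ≈ 145.9
print("median with outlier:", statistics.median(incomes_with_outlier))  # only moves to 39
```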
Central Tendency: Mean, median, and mode all measure central tendency, but they may differ based
on the distribution of the data. When the mean, median, and mode are approximately equal, it suggests a
symmetric distribution. If they differ significantly, it indicates skewness in the distribution.
Central tendency refers to the typical or central value around which data points in a dataset tend
to cluster. Mean, median, and mode are three commonly used measures of central tendency, each
providing different insights into the distribution of data. When the mean, median, and mode are
approximately equal, it suggests that the data are symmetrically distributed around a central value.
In a symmetric distribution, the mean, median, and mode are all located at the center of the distribution,
with roughly equal values on both sides. This indicates that the data are evenly distributed around the
central value, resulting in a balanced distribution. For example, the mean, median, and mode are all
located at the same point in a perfectly normal distribution, resulting in a symmetrical bell-shaped
curve.
However, when the mean, median, and mode differ significantly, it suggests that the distribution is
skewed. Skewness occurs when the data are not evenly distributed around the central value, causing
the distribution to be asymmetrical. In a positively skewed distribution, the mean is typically greater
than the median and mode, with a tail extending towards higher values. Conversely, in a negatively
skewed distribution, the mean is usually less than the median and mode, with a tail extending towards
lower values.
The difference between the mean, median, and mode can provide valuable insights into the shape
and characteristics of the distribution. For example, if the mean is greater than the median and
mode, it suggests that the distribution is positively skewed, with a concentration of data
points towards the lower end and a few extreme values towards the higher end. Conversely, if
the mean is less than the median and mode, it indicates a negatively skewed distribution,
with a concentration of data points towards the higher end and a few extreme values towards the
lower end.
Comparing the mean, median, and mode can help analysts assess the symmetry or skewness of a
distribution and understand the typical or central values in the dataset. This understanding is crucial for
interpreting data accurately and making informed decisions based on the distribution's characteristics.
Variability: Descriptive statistics also include measures of variability, such as range, variance, and
standard deviation. These measures quantify the spread or dispersion of the data around the central
tendency. A large spread indicates high variability, while a small spread suggests low variability.
Variability is a critical aspect of descriptive statistics that measures the extent to which data points
deviate from the central tendency. Among the key measures of variability are the range, variance, and
standard deviation.
The range is the simplest measure of variability and is calculated as the difference between the highest
and lowest values in a dataset. It provides a straightforward indication of the spread of the data but can
be sensitive to outliers, as it only considers two extreme values. While the range is easy to calculate and
understand, it may not capture the full extent of variability, especially in larger datasets with diverse
distributions.
Variance and standard deviation offer more sophisticated measures of variability by considering the
deviation of each data point from the mean. Variance is calculated by averaging the squared differences
between each data point and the mean, providing a measure of the average dispersion of data points
around the mean. However, since variance is in squared units, it may not be easily interpretable in the
original units of the data.
Standard deviation, on the other hand, is the square root of the variance and provides a more
interpretable measure of variability in the original units of the data. It quantifies the average distance
of data points from the mean, with larger standard deviations indicating greater variability and smaller
standard deviations suggesting less variability. Standard deviation is widely used in statistics and
provides valuable insights into the dispersion of data points within a dataset.
Measures of variability such as range, variance, and standard deviation are essential components of
descriptive statistics. They quantify the spread or dispersion of data around the central tendency,
providing valuable insights into the variability of the dataset. Analysts can better interpret and analyze
data by understanding variability, leading to more informed decision-making and deeper insights into
the underlying patterns and trends.
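A brief sketch of these measures, assuming NumPy and using two invented datasets with the same mean but different spreads, is shown below.

```python
# Illustrative sketch with NumPy; the two datasets are invented to contrast
# low and high variability around the same mean.
import numpy as np

low_spread  = np.array([48, 49, 50, 50, 51, 52])
high_spread = np.array([20, 35, 50, 50, 65, 80])

for name, data in [("low spread", low_spread), ("high spread", high_spread)]:
    data_range = data.max() - data.min()   # simplest measure of spread
    variance   = data.var(ddof=1)          # sample variance (in squared units)
    std_dev    = data.std(ddof=1)          # sample standard deviation (original units)
    print(f"{name}: mean={data.mean():.1f}, range={data_range}, "
          f"variance={variance:.1f}, std={std_dev:.1f}")
```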
Shape of Distribution: Descriptive statistics help analysts understand the shape of the distribution,
whether it is symmetric, skewed, bimodal, or uniform. The choice of central tendency measure (mean,
median, mode) depends on the distribution's shape and the data analysis type.
Understanding the shape of the distribution is essential in descriptive statistics as it provides insights
into the underlying characteristics of the dataset. One common shape is a symmetric distribution,
where the data is evenly distributed around the central value. In such cases, the mean, median, and
mode are typically similar and can accurately represent the central tendency of the data. Symmetric
distributions are common in many natural phenomena, such as human heights or exam scores, where
values cluster around a central point without significant skewness.
Conversely, skewed distributions exhibit asymmetry, with the data clustering more towards one end
of the distribution than the other. Positive skewness occurs when the tail of the distribution extends
towards higher values, while negative skewness indicates a longer tail towards lower values. In
skewed distributions, the choice of central tendency measure becomes crucial. For positively skewed
distributions, where extreme values pull the mean in the direction of the skew, the median may provide
a more representative measure of central tendency. Similarly, in negatively skewed distributions, the
median may better capture the central tendency than the mean.
Bimodal distributions have two distinct peaks or modes, indicating that the dataset contains two
separate clusters or categories of values. In such cases, using a single central tendency measure like the
mean may not accurately represent the data. Analysts may need to consider both modes separately or
use alternative measures like the median for a more nuanced understanding of the distribution. Finally,
uniform distributions occur when all values in the dataset have the same frequency, resulting in a flat
or constant distribution. In uniform distributions, the mean, median, and mode may all be the same, as
values are not clustered around a central point.
The shape of the distribution plays a crucial role in selecting the appropriate central tendency measure
for summarizing and interpreting the data accurately. By understanding whether the distribution is
symmetric, skewed, bimodal, or uniform, analysts can make informed decisions about which descriptive
statistics to use and how to interpret the characteristics of the dataset effectively.
Interpretation: By analysing descriptive statistics, analysts can interpret the data distribution
characteristics accurately. They can identify outliers, assess the presence of skewness, and determine
the typical or central values in the dataset. This interpretation provides valuable insights for further
analysis and decision-making.
Analysing descriptive statistics enables analysts to gain deep insights into the characteristics of a
dataset, empowering them to make informed decisions and conduct further analysis effectively.
One key aspect of interpreting descriptive statistics is the identification of outliers. Outliers are data
points that significantly deviate from the rest of the dataset and can skew the interpretation of the
central tendency and variability measures. By identifying outliers through methods such as box plots
or z-scores, analysts can assess their impact on the data distribution and determine whether they
represent genuine anomalies or errors in measurement.
Additionally, descriptive statistics help analysts assess the presence of skewness in the data distribution.
Skewness refers to the distribution's asymmetry, where the distribution's tail extends more to one side
than the other. Positive skewness indicates that the distribution is skewed to the right, with a longer
tail on the right side of the distribution, while negative skewness indicates a left-skewed distribution.
Analysts can infer the direction and magnitude of skewness by examining measures such as the mean,
median, and mode, allowing them to understand the underlying distributional characteristics more
accurately.
Moreover, descriptive statistics aid analysts in determining the typical or central values in the dataset,
providing valuable insights into the overall pattern of the data. Measures such as the mean, median,
and mode offer different perspectives on central tendency, allowing analysts to choose the most
appropriate measure based on the distribution of the data and the nature of the variables involved.
By interpreting these measures in conjunction with measures of variability, such as range or standard
deviation, analysts can develop a comprehensive understanding of the dataset's central tendency
and variability, facilitating robust analysis and decision-making processes. In summary, interpreting
descriptive statistics provides analysts with valuable insights into the data distribution characteristics,
enabling them to identify outliers, assess skewness, and determine central values effectively, ultimately
supporting informed analysis and decision-making in various domains.
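As one simple way of operationalizing the outlier check mentioned above, the sketch below flags values whose z-score exceeds a conventional threshold of three standard deviations; the data, the injected extreme value, and the threshold are all assumptions for illustration.

```python
# Illustrative z-score outlier check (assumes NumPy); the data and the
# threshold of 3 standard deviations are conventional but arbitrary choices.
import numpy as np

rng = np.random.default_rng(seed=2)
data = rng.normal(loc=100, scale=5, size=200)   # 200 typical observations
data = np.append(data, 140.0)                   # inject one extreme value

z_scores = (data - data.mean()) / data.std()    # distance from the mean in standard deviations
outliers = data[np.abs(z_scores) > 3]           # conventional |z| > 3 rule

print("flagged as outliers:", np.round(outliers, 1))   # typically only the injected value
```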
Correlation techniques are statistical methods used to quantify and analyze the relationship between
variables in a dataset. They help determine the extent to which changes in one variable correspond to
changes in another variable. Two common correlation techniques are Pearson's correlation coefficient
and least squares methods.
Pearson's Correlation Coefficient: Pearson's correlation coefficient, denoted as r, measures the strength
and direction of a linear relationship between two continuous variables. It ranges from -1 to +1, where:
• +1 indicates a perfect positive linear relationship (both variables increase together).
• 0 indicates no linear relationship (variables are independent).
• -1 indicates a perfect negative linear relationship (one variable increases as the other decreases).
Pearson's correlation coefficient is sensitive to outliers and assumes that the relationship between
variables is linear and that the variables are normally distributed.
Methods of Least Squares: Methods of least squares fit a model to observed data points
by minimising the sum of the squared differences between the observed and predicted values. One
common application is linear regression, which assumes a linear relationship between variables. The
least squares method estimates the coefficients of the linear equation that best fits the data. This
method allows analysts to quantify the relationship between variables, predict values, and assess the
significance of the relationship through hypothesis testing.
These correlation techniques provide valuable insights into the relationships between variables in a
dataset, enabling analysts to:
• Assess the strength and direction of relationships.
• Identify patterns and trends in the data.
• Make predictions and forecasts based on observed relationships.
• Determine the significance of relationships through hypothesis testing.
• Understand the impact of one variable on another, aiding decision-making processes in various
fields such as economics, finance, healthcare, and social sciences.
By applying correlation techniques appropriately, analysts can gain a deeper understanding of the
underlying dynamics within datasets, facilitating more informed analysis and decision-making.
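The sketch below combines the two techniques on invented data, assuming NumPy and SciPy are available: scipy.stats.pearsonr quantifies the linear relationship, and numpy.polyfit performs the least-squares straight-line fit.

```python
# Illustrative sketch (assumes NumPy and SciPy); the synthetic data are invented
# so that y depends roughly linearly on x.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 4 + rng.normal(scale=2, size=100)   # underlying line plus noise

# Pearson's correlation coefficient: strength and direction of the linear relationship.
r, p_value = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p-value = {p_value:.3g}")

# Least squares: fit y = slope*x + intercept by minimising squared residuals.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"Fitted line: y = {slope:.2f}x + {intercept:.2f}")

# The quantity the fit minimises: the sum of squared residuals.
residuals = y - (slope * x + intercept)
print("Sum of squared residuals:", round(float(np.sum(residuals**2)), 1))
```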
Summary
• Probability distributions such as Normal, Poisson, Exponential, and Bernoulli differ in their
characteristics and applications, representing different types of random variables and outcomes.
• Graphical techniques, including scatterplots, are employed to visualize the correlation between
variables, aiding in the identification of patterns and relationships within datasets.
• Descriptive statistics basics, such as measures of central tendency like mean, median, and mode,
provide insights into the typical values of a dataset, facilitating data interpretation and analysis.
• Various correlation techniques, such as Pearson's Correlation Coefficient and Methods of Least
Squares, are utilized to assess relationships between variables and quantify associations within
datasets.
• Regression analysis encompasses techniques like linear, logistic, ridge, and lasso regression, each
serving different purposes in modelling relationships between variables and making predictions.
• Hypothesis testing is employed to draw inferences about population parameters based on sample
data and measure the statistical significance of observed differences or relationships.
• Probability distributions, including Normal, Poisson, Exponential, and Bernoulli, play a crucial
role in modelling random phenomena and analyzing uncertainty in various fields such as finance,
engineering, and healthcare.
• Scatterplots and other graphical techniques provide visual representations of data relationships,
allowing analysts to identify trends, clusters, and outliers that may impact decision-making and
further analysis.
• Descriptive statistics, such as measures of central tendency (mean, median, mode), variability
(standard deviation, variance), and distributional characteristics, summarize and interpret datasets,
providing valuable insights into the underlying data structure.
• Regression analysis techniques, such as linear, logistic, ridge, and lasso regression, enable analysts
to model relationships between variables, predict outcomes, and understand the influence of
predictors on the response variable.
Exercise
Multiple-choice Question:
1. What type of distribution is characterized by its symmetrical and bell-shaped curve?
a. Poisson distribution b. Exponential distribution
c. Normal distribution d. Bernoulli distribution
2. Which distribution is commonly used to model rare event occurrences, such as customer arrivals
or product defects?
a. Poisson distribution b. Exponential distribution
c. Normal distribution d. Bernoulli distribution
5. What measure of central tendency represents the most frequently occurring value in a dataset?
a. Mean b. Median
c. Mode d. Range
Descriptive Questions
1. Describe the main characteristic of a normal distribution.
2. Explain the use of Poisson distribution in modeling rare event occurrences.
3. What type of relationships can be identified by analyzing trends in scatterplots?
4. Differentiate between mean, median, and mode as measures of central tendency.
5. How does Pearson's correlation coefficient quantify the strength and direction of a linear relationship
between variables?
Scan the QR codes or click on the link to watch the related videos
https://www.youtube.com/watch?v=xTpHD5WLuoA https://www.youtube.com/watch?v=11c9cs6WpJU
4. Development Tools
and Usage
Unit 4.1 - Software Development Practices and
Performance Optimization
Bridge Module
Unit Objectives
By the end of this unit, the participants will be able to:
1. Evaluate programming styles, documentation habits, and code/design principles for enhanced
software development.
2. Apply scripting languages for task automation, program creation, and addressing development
requirements.
3. Utilize suitable tools for efficient program building, debugging, testing, and maintenance.
4. Configure OS components and utilize cloud platforms for software performance and scalability
optimization.
• Learning Curve: Scripting languages typically have a lower learning curve compared to compiled
languages, making them accessible to beginners and experienced developers alike. Their simplicity
and readability encourage rapid learning and adoption, enabling developers to become productive
quickly.
• Extensibility and Customization: Scripting languages offer extensibility features that allow
developers to extend their functionality through modules, plugins, or extensions. This enables
customization and adaptation of scripting environments to specific project requirements or
workflow preferences.
• Continuous Improvement: Scripting languages evolve over time through community contributions, language enhancements, and updates. This continuous improvement ensures that scripting languages remain relevant, efficient, and capable of addressing evolving development needs and industry trends. A small task-automation sketch follows this list.
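As a minimal illustration of the kind of task automation a scripting language enables, the sketch below sorts the files in a folder into sub-folders named after their extensions, using only the Python standard library. The target directory name is a hypothetical example, not a prescribed location.

```python
# Minimal task-automation sketch using only the Python standard library:
# sort the files in a directory into sub-folders named after their extensions.
# The target path below is a hypothetical example.
from pathlib import Path
import shutil

target = Path("downloads")  # hypothetical directory to tidy up

for item in target.iterdir():
    if item.is_file():
        suffix = item.suffix.lstrip(".").lower() or "no_extension"
        dest_dir = target / suffix
        dest_dir.mkdir(exist_ok=True)                  # create the bucket folder if needed
        shutil.move(str(item), dest_dir / item.name)   # move the file into it
        print(f"moved {item.name} -> {dest_dir}/")
```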
Program Building: Program building involves utilizing integrated development environments (IDEs) such
as Visual Studio Code, IntelliJ IDEA, or Eclipse to efficiently code, compile, and package applications.
These IDEs provide essential features like syntax highlighting, code completion, and built-in debugging
tools, streamlining the development process and enabling developers to effectively create executable
software from source code. Additionally, version control systems (VCS) like Git and Subversion ensure
version control integrity, facilitating collaborative development and efficient management of source
code changes across team members.
• Integrated Development Environments (IDEs): IDEs like Visual Studio Code, PyCharm, and IntelliJ
IDEA provide comprehensive environments for coding, compiling, and building applications. They
offer features such as syntax highlighting, code completion, and built-in debugging tools.
• Version Control Systems (VCS): Tools like Git, Mercurial, and Subversion facilitate collaborative
development by tracking changes to source code, enabling team members to work simultaneously
on projects without conflicts.
• Build Automation Tools: Build automation tools such as Apache Maven and Gradle automate the process of compiling source code into executable binaries, reducing manual errors and streamlining the build process.
Debugging: Debugging is the process of identifying, analyzing, and resolving errors or defects in
software code to ensure its proper functionality. Developers use various tools and techniques, such as
integrated development environment (IDE) debuggers, logging frameworks, and error-tracking systems,
to pinpoint and rectify issues efficiently. Through step-by-step code execution, variable inspection,
and error diagnosis, debugging allows developers to troubleshoot and fix bugs, ensuring software
applications' smooth and reliable operation.
• Debugger Tools: Integrated debugger tools within IDEs allow developers to step through code, set
breakpoints, and inspect variables during runtime to identify and fix errors efficiently.
• Logging Frameworks: Logging frameworks like Log4j, Logback, and Python's logging module help developers track the execution flow and capture relevant information for troubleshooting purposes (see the short logging sketch after this list).
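The sketch below shows the kind of logging just described, using Python's standard logging module. It is a minimal sketch only; the component name, function, and messages are made up for illustration.

```python
# Minimal sketch of structured logging with Python's standard logging module.
# The component name, function, and messages are illustrative only.
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("order_service")  # hypothetical component name

def process_order(order_id, amount):
    logger.debug("processing order %s for amount %.2f", order_id, amount)
    try:
        if amount <= 0:
            raise ValueError("amount must be positive")
        logger.info("order %s processed successfully", order_id)
    except ValueError:
        # logger.exception() records the message together with the stack trace
        logger.exception("order %s failed validation", order_id)

process_order("A-1001", 49.99)
process_order("A-1002", -5.00)
```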
Testing: Testing is a critical phase in software development where various techniques and tools
are employed to assess a software product's quality, functionality, and performance. It involves
systematically executing predefined test cases, evaluating the system's behaviour under different
conditions, and identifying defects or discrepancies between expected and actual outcomes. Testing
encompasses a wide range of activities, including unit testing, integration testing, system testing, and
acceptance testing, each validating different aspects of the software's functionality and ensuring that
it meets the specified requirements and user expectations. Through thorough and rigorous testing,
developers can identify and rectify issues early in development, ultimately delivering a reliable and
high-quality software product to end-users.
• Unit Testing Frameworks: Frameworks such as JUnit, NUnit, and pytest enable developers to write and execute automated unit tests to verify the functionality of individual components or units of code (a minimal pytest sketch follows this list).
• Integration Testing Tools: Tools like Selenium, Postman, and SoapUI facilitate automated testing of
application integrations, APIs, and user interfaces to ensure smooth interaction between different
modules.
• Code Coverage Tools: Code coverage tools like JaCoCo, Cobertura, and Coverage.py measure the
extent to which source code is tested by identifying areas that are not covered by test cases, helping
developers assess the quality of their test suites.
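As a minimal example of unit testing with pytest, the sketch below tests a small, made-up function. The function under test and the file name are illustrative only.

```python
# test_discount.py -- minimal pytest sketch; the function under test is illustrative.
import pytest

def apply_discount(price, percent):
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_happy_path():
    assert apply_discount(200.0, 25) == 150.0

def test_apply_discount_zero_percent_returns_original_price():
    assert apply_discount(99.99, 0) == 99.99

def test_apply_discount_rejects_invalid_percent():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```

Running `pytest` in the directory containing this file discovers and executes the tests automatically.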
Summary
• Evaluate and adhere to best programming practices while maintaining thorough documentation for
clarity and future reference.
• Utilize scripting languages to automate tasks and develop simple programs, improving workflow
efficiency.
• Employ suitable tools for building, debugging, testing, tuning, and maintaining programs to ensure
reliability and effectiveness in the development process.
• Configure OS components to optimize performance and functionality, enhancing the overall
efficiency of software operations.
• Identify and promptly address software development needs and changes to adapt to evolving
requirements and maintain competitiveness.
• Utilize diverse cloud computing platforms and services to leverage scalability, flexibility, and cost-
effectiveness in software deployment and management.
• Apply code and design quality principles to develop robust, maintainable, and efficient software
solutions, prioritizing readability, modularity, and scalability.
• Implement strategies to optimize software performance, including efficient algorithms, proper
resource management, and scalable architecture design.
• Embrace a culture of continuous improvement in software development processes, fostering
innovation, collaboration, and adaptation to emerging technologies and industry trends.
• Prioritize security and compliance measures throughout the software development lifecycle,
safeguarding against cyber threats and ensuring adherence to data protection and privacy
regulatory requirements.
Exercise
Multiple-choice Questions:
1. Which of the following is NOT a characteristic of good programming styles?
a. Readability b. Efficiency
c. Complexity d. Consistency
2. Which scripting language is commonly used for task automation and simple program writing?
a. Java b. Python
c. C++ d. Ruby
3. What tools are used for building, debugging, testing, tuning, and maintaining programs?
a. Hardware tools
b. Integrated Development Environments (IDEs)
c. Household tools
d. None of the above
4. Which component of the operating system is responsible for managing hardware resources?
a. User Interface b. Kernel
c. File System d. Device Drivers
Descriptive Questions
1. How can programming styles, documentation habits, and code/design principles be evaluated to improve the quality of software development?
2. What role do scripting languages play in automating tasks and writing simple programs, and why are they well suited to this purpose?
3. Which tools support program building, debugging, testing, tuning, and maintenance, and how do they contribute to the development process?
4. How does configuring operating system components help optimize software performance and functionality?
5. How can cloud computing platforms and services be used to improve the scalability, flexibility, and cost-effectiveness of software deployment?
Notes
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Scan the QR codes or click on the link to watch the related videos
https://www.youtube.com/watch?v=Fi3_BjVzpqk https://www.youtube.com/watch?v=g0Q-VWBX5Js
5. Performance
Evaluation of
Algorithmic Models
Unit 5.1 - Algorithmic Model Development and
Assessment Tasks
SSC/N8121
Unit Objectives
By the end of this unit, the participants will be able to:
1. Differentiate between supervised and unsupervised learning algorithms
2. Identify technical parameters for an algorithmic model given a set of specified requirements
3. Assess various system limitations (such as runtime, memory, and parallel programming constraints)
while running an algorithmic model
4. Demonstrate the testing and debugging of sample algorithmic models
5. Analyse performance indicators (such as runtime, memory usage, model efficiency, etc.) of sample
algorithmic models
Unsupervised Learning: Unsupervised learning algorithms, in contrast, learn from unlabelled data,
meaning that there are no predefined output labels provided during training. Instead, the algorithm
seeks to identify patterns or structures in the data without explicit guidance. Unsupervised learning
is often used for exploratory data analysis, clustering, and dimensionality reduction. In clustering, the
algorithm groups similar data points together into clusters based on some similarity metric, while in
dimensionality reduction, the algorithm reduces the number of features in the data while preserving
its important structure. Common unsupervised learning algorithms include k-means clustering, hierarchical clustering, and principal component analysis (PCA).
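A minimal clustering sketch is shown below, assuming scikit-learn is installed; the data is synthetic and generated only for illustration.

```python
# Minimal unsupervised-learning sketch: k-means clustering on synthetic data.
# Assumes scikit-learn is installed; the data is generated only for illustration.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# 300 unlabelled points drawn around 3 hidden centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)          # cluster assignment for each point

print("cluster centres:\n", kmeans.cluster_centers_)
print("first 10 assignments:", labels[:10])
```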
• Algorithm Selection: Choose an appropriate algorithm based on the nature of the problem, data
characteristics, and desired outcomes. Options include regression algorithms (linear, logistic),
decision trees, neural networks, and ensemble methods like random forests or gradient boosting.
• Feature Selection: Identify relevant features (input variables) that contribute to the model's
predictive performance. Use techniques such as feature importance analysis, correlation analysis,
or domain knowledge to select the most informative features while avoiding overfitting.
• Data Pre-processing: Prepare the data by appropriately handling missing values, outliers, and categorical variables. Techniques include imputation, outlier detection and removal, one-hot encoding for categorical variables, and feature scaling for numerical variables.
• Model Training: Split the data into training and validation sets to train and evaluate the model's
performance. Utilize techniques such as cross-validation to assess model robustness and prevent
overfitting.
• Hyperparameter Tuning: Fine-tune the algorithm's hyperparameters to optimize performance. Techniques include grid search, random search, or Bayesian optimization to efficiently search the hyperparameter space (see the short sketch after this list).
• Model Evaluation: Assess the model's performance using appropriate evaluation metrics such as
accuracy, precision, recall, F1-score, or area under the ROC curve (AUC). Choose metrics based on
the problem domain and the desired trade-offs between false positives and false negatives.
• Validation Strategies: Select validation strategies such as holdout validation, k-fold cross-validation,
or time series split based on the dataset size, characteristics, and requirements. Validate the model's
performance on unseen data to ensure generalization.
• Regularization Techniques: Apply regularization techniques such as L1 (Lasso) or L2 (Ridge)
regularization to prevent overfitting and improve model generalization. Regularization penalizes
large coefficients and promotes simpler models.
• Model Interpretability: Enhance model interpretability by using techniques such as feature
importance analysis, partial dependence plots, or SHAP (SHapley Additive exPlanations) values to
understand the model's decision-making process.
• Scalability and Efficiency: Consider the scalability and computational efficiency of the algorithm,
especially for large datasets. Utilize distributed computing frameworks or algorithmic optimizations
to improve performance and reduce computational resources.
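The sketch below ties several of the steps above together: a train/test split, cross-validated grid search over the penalty of a regularized (ridge) regression, and held-out evaluation. It is a minimal sketch only, assuming scikit-learn and a synthetic dataset; the parameter grid is an arbitrary example.

```python
# Minimal sketch: data split, cross-validated grid search over a ridge penalty,
# and held-out evaluation. Assumes scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search over the L2 regularization strength with 5-fold cross-validation
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
print("best alpha:", search.best_params_["alpha"])
print("test MSE  :", mean_squared_error(y_test, y_pred))
print("test R^2  :", r2_score(y_test, y_pred))
```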
ᴑ Impact on User Experience: Exceeding runtime constraints can negatively impact the user
experience, leading to slow response times, unresponsiveness, or perceived system failures.
In applications involving user interaction, such as web applications or mobile apps, meeting
runtime constraints is essential for maintaining user satisfaction.
ᴑ Hardware and Software Optimization: Meeting runtime constraints often requires optimization
efforts at both the hardware and software levels. Hardware optimization may involve using
high-performance computing resources, such as multi-core processors or specialized hardware
accelerators, to speed up algorithm execution. Software optimization may include algorithmic
optimizations, code profiling, and tuning parameters to improve efficiency and reduce execution
time.
ᴑ Scalability Considerations: As the size of input data or computational workload increases, the
runtime of the algorithm may also increase. Scalability considerations are important to ensure
that the algorithm can handle larger datasets or higher workloads while still meeting runtime
constraints. Techniques such as parallel processing, distributed computing, and algorithmic
optimizations can help improve scalability and mitigate the impact on runtime.
ᴑ Monitoring and Performance Tuning: Continuous monitoring of runtime performance is
essential to identify potential bottlenecks, optimize resource utilization, and address any
deviations from the specified runtime constraints. Performance tuning efforts may involve
adjusting algorithm parameters, optimizing data processing pipelines, or upgrading hardware
infrastructure to improve runtime performance.
ᴑ Trade-offs: In some cases, meeting strict runtime constraints may require trade-offs in terms
of algorithm complexity, accuracy, or resource usage. Balancing these trade-offs is important to
ensure that the algorithm meets its performance requirements while still achieving the desired
level of functionality and reliability.
• Memory Constraints: Memory usage refers to the amount of system memory (RAM) required
by the algorithm during execution. System limitations may impose constraints on the maximum
amount of memory that can be allocated for running the algorithmic model. Exceeding memory
constraints can lead to performance degradation, slowdowns, or even system crashes.
ᴑ Impact of Exceeding Memory Constraints: When the memory usage of an algorithm exceeds
the maximum allowable limit imposed by system constraints, several issues may arise. Firstly,
performance degradation occurs as the system resorts to using slower forms of memory, such as
virtual memory or disk storage, to compensate for the shortage of RAM. This leads to increased
access times and slower execution speeds.
ᴑ Slowdowns and System Crashes: Exceeding memory constraints can cause significant
slowdowns in algorithm execution due to frequent swapping of data between RAM and disk
storage. This can result in delays in processing, increased response times, and overall degraded
system performance. In extreme cases, where the algorithm consumes excessive amounts of
memory beyond the system's capacity, it can lead to system crashes or instability.
ᴑ Optimization Strategies: To mitigate the impact of memory constraints, optimization strategies
can be employed. This includes optimizing data structures and algorithms to reduce memory
footprint, minimizing unnecessary data storage, and employing memory management
techniques such as caching and recycling memory resources.
ᴑ Memory Profiling: Memory profiling tools can be used to monitor and analyze the memory usage of an algorithm during execution. These tools provide insights into memory allocation patterns, identify memory leaks or inefficient memory usage, and help optimize memory utilization to stay within the constraints of the system (a small memory-profiling sketch follows this block).
ᴑ Trade-offs and Balancing Act: Balancing memory usage with algorithm performance is a trade-
off that developers must carefully consider. While minimizing memory usage is desirable to
avoid exceeding constraints, it should not come at the expense of algorithm efficiency or
functionality. Striking the right balance between memory usage and performance is essential
for optimal algorithm execution.
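As a minimal memory-profiling sketch, the example below uses Python's built-in tracemalloc module to observe allocations made by a deliberately memory-hungry function. The workload is made up for illustration.

```python
# Minimal memory-profiling sketch with Python's built-in tracemalloc module.
# The workload below is illustrative only.
import tracemalloc

def build_big_list(n):
    return [i * i for i in range(n)]

tracemalloc.start()

data = build_big_list(1_000_000)

current, peak = tracemalloc.get_traced_memory()   # values are in bytes
print(f"current allocations: {current / 1_000_000:.1f} MB")
print(f"peak allocations   : {peak / 1_000_000:.1f} MB")

# Show the source lines responsible for the largest allocations
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```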
• Parallel Programming Constraints: Many algorithmic models can benefit from parallel processing to improve performance by leveraging multiple CPU cores or distributed computing resources. However, parallel programming introduces its own set of constraints, including synchronization overhead, communication latency, and load-balancing challenges. System limitations may include constraints on the maximum number of threads or processes that can be utilized concurrently. A short multiprocessing sketch follows these considerations.
ᴑ Synchronization Overhead: In parallel programming, multiple threads or processes often
need to synchronize their execution to ensure correct and consistent results. Synchronization
mechanisms such as locks, semaphores, and barriers introduce overhead, as threads may
need to wait for each other to complete certain operations, leading to potential performance
bottlenecks.
ᴑ Communication Latency: Parallel programming involves exchanging data or coordinating tasks
between threads or processes. Communication latency refers to the time delay incurred when
sending or receiving messages between different processing units. High communication latency
can impact overall performance, especially in distributed computing environments where
communication occurs over networks.
ᴑ Load Balancing Challenges: Load balancing is essential in parallel programming to distribute
computational tasks evenly across processing units, ensuring efficient resource utilization and
minimizing idle time. However, achieving optimal load balancing can be challenging, particularly
for irregular or dynamically changing workloads, leading to uneven workload distribution and
potential performance degradation.
ᴑ Maximum Concurrency Constraints: System limitations may impose constraints on the
maximum number of threads or processes that can be utilized concurrently. Hardware
limitations, operating system settings, or resource allocation policies may influence these
constraints. Exceeding maximum concurrency limits can lead to resource contention, increased
overhead, and decreased performance.
ᴑ Resource Management: Effective resource management is crucial in parallel programming
to optimally allocate system resources such as CPU cores, memory, and network bandwidth.
System limitations may impact resource availability and require careful consideration of
resource allocation strategies to minimize contention and maximize throughput.
ᴑ Scalability Considerations: Scalability refers to the ability of a parallel algorithm to efficiently
utilize additional processing units as the workload or problem size increases. System limitations
may affect the scalability of parallel algorithms, requiring scalability considerations such as
workload partitioning, data distribution, and communication optimization to achieve optimal
performance across different system configurations.
ᴑ Performance Tuning: Addressing parallel programming constraints often involves performance
tuning techniques to optimize synchronization, communication, and load balancing overhead.
Performance profiling tools and techniques can help identify performance bottlenecks and
guide optimization efforts to enhance parallel program efficiency and scalability.
ᴑ Fault Tolerance: In distributed computing environments, fault tolerance mechanisms are
essential to ensure system reliability and resilience against failures. System limitations may
impact fault tolerance strategies, influencing decisions regarding error detection, recovery, and
fault handling mechanisms in parallel programs.
ᴑ Programming Complexity: Parallel programming introduces additional complexity compared
to sequential programming, requiring developers to consider concurrency, synchronization,
and communication aspects. System limitations may exacerbate programming complexity
by imposing constraints on available resources, concurrency models, and communication
protocols, necessitating careful design and implementation of parallel algorithms.
ᴑ Future Trends and Challenges: Emerging technologies such as multi-core processors, GPUs, and
distributed computing frameworks continue to drive advancements in parallel programming.
However, addressing parallel programming constraints and harnessing the full potential of
these technologies remain ongoing challenges, requiring innovative solutions and collaborative
efforts across the research and development community.
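The sketch below illustrates simple data-parallel execution with Python's multiprocessing.Pool, with the worker count capped to mimic a concurrency limit. The workload and the limit of four workers are made-up examples, not recommended values.

```python
# Minimal parallel-programming sketch with multiprocessing.Pool.
# The CPU-bound workload is illustrative; the worker cap mimics a hypothetical
# system limit on the number of concurrently usable cores.
import os
from multiprocessing import Pool

def cpu_bound_task(n):
    """Toy CPU-bound workload: sum of squares up to n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    workloads = [2_000_000 + i for i in range(8)]

    # Respect a hypothetical concurrency constraint of at most 4 workers
    max_workers = min(4, os.cpu_count() or 1)

    with Pool(processes=max_workers) as pool:
        results = pool.map(cpu_bound_task, workloads)  # distributes tasks across workers

    print(f"ran {len(workloads)} tasks on {max_workers} workers")
    print("first result:", results[0])
```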
• Resource Allocation: System limitations may also include constraints on resource allocation, such
as the maximum number of CPU cores, GPU units, or network bandwidth available for running
the algorithmic model. Optimizing resource allocation is essential to maximize performance while
adhering to system constraints.
ᴑ CPU Cores Allocation: System limitations may restrict the number of CPU cores available for
running the algorithmic model. Optimizing CPU core allocation involves efficiently distributing
computational tasks across available cores to maximize parallelism and minimize processing
time. Techniques such as multi-threading and task scheduling can be employed to effectively
utilize CPU resources.
ᴑ GPU Units Allocation: In scenarios where the algorithmic model involves intensive parallel
processing tasks, leveraging GPU units can significantly accelerate computation. However,
system limitations may impose constraints on the maximum number of GPU units available
or the amount of GPU memory accessible. Optimizing GPU unit allocation involves efficiently
distributing computational tasks across available units and minimizing data transfer overhead
between CPU and GPU.
ᴑ Network Bandwidth Allocation: Algorithmic models that rely on distributed computing or
cloud-based resources may encounter constraints on network bandwidth. Efficiently managing
network bandwidth allocation is crucial for minimizing communication latency and maximizing
data throughput. Techniques such as data compression, network protocol optimization, and
traffic prioritization can be utilized to optimize network bandwidth usage.
ᴑ Dynamic Resource Allocation: Resource allocation needs may vary over time based on
workload fluctuations and system demand in dynamic computing environments. Implementing
dynamic resource allocation strategies enables the algorithmic model to adapt to changing
resource availability and optimize performance accordingly. Techniques such as auto-scaling
and resource provisioning based on workload metrics can help efficiently manage resource
allocation in dynamic environments.
ᴑ Load Balancing: System limitations may lead to uneven resource distribution among
multiple computational nodes or processing units, resulting in load imbalance and degraded
performance. Load balancing techniques aim to evenly distribute computational tasks across
available resources to maximize utilization and minimize processing time. Strategies such as
task migration, workload partitioning, and dynamic load balancing algorithms can be employed
to achieve optimal load distribution and resource utilization.
• Scalability: Scalability refers to the ability of the algorithmic model to efficiently utilize system
resources as the input data size or computational workload increases. System limitations may
impact the scalability of the model, requiring careful consideration of scalability challenges and
optimization strategies.
• Optimization Techniques: To mitigate the impact of system limitations, various optimization
techniques can be employed, including algorithmic optimizations, memory management strategies,
parallelization techniques, and distributed computing frameworks. These techniques aim to
improve performance, reduce resource usage, and ensure compliance with system constraints.
ᴑ Impact of System Limitations: System limitations, such as constraints on runtime, memory,
and parallel programming, can directly affect the scalability of the algorithmic model. These
limitations may constrain the model's ability to efficiently utilize available resources as the
workload increases.
ᴑ Challenges of Scalability: Scalability challenges may arise due to various factors, including
limitations in hardware resources, inefficient algorithm design, and bottlenecks in data
processing or communication. As the input data size or computational workload grows, these
challenges can exacerbate, leading to performance degradation or system failures.
ᴑ Optimization Strategies: Optimization strategies are employed to improve the efficiency
and performance of the algorithmic model to address scalability challenges. These strategies
may include algorithmic optimizations, parallelization techniques, distributed computing
frameworks, and resource allocation optimizations.
ᴑ Algorithmic Optimizations: Algorithmic optimizations involve redesigning or refining the algorithm to improve its efficiency and scalability. This may include reducing computational complexity, optimizing data structures, and minimizing redundant computations to enhance performance.
ᴑ Parallelization Techniques: Parallelization techniques enable the algorithmic model to leverage
multiple processing units or distributed computing resources to perform computations
concurrently. Parallel programming frameworks, such as MPI (Message Passing Interface) or
OpenMP, facilitate the efficient utilization of parallel resources, enhancing scalability.
ᴑ Distributed Computing: Distributed computing frameworks, such as Apache Hadoop or Spark,
enable the algorithmic model to distribute computations across multiple nodes in a cluster,
enabling horizontal scalability. By distributing the workload, these frameworks can handle
larger datasets and computational workloads more effectively.
ᴑ Resource Allocation Optimization: Optimizing resource allocation involves efficiently allocating
system resources, such as CPU cores, memory, and network bandwidth, to ensure balanced
utilization and prevent resource contention. Resource allocation strategies aim to maximize
performance while adhering to system constraints.
ᴑ Continuous Monitoring and Optimization: Scalability is an ongoing concern that requires
continuous monitoring and optimization. Performance metrics, such as throughput, latency,
and resource utilization, are monitored to identify scalability bottlenecks and optimize the
algorithmic model accordingly.
ᴑ Importance in Real-World Applications: Scalability is crucial for algorithmic models deployed
in real-world applications, where the volume of data and computational requirements can vary
significantly over time. Ensuring scalability enables the model to handle increasing demands
and maintain optimal performance, enhancing its reliability and usability.
• Performance Monitoring and Profiling: Continuous monitoring and profiling of system performance
are essential to identify potential bottlenecks, optimize resource utilization, and address runtime,
memory, and parallel programming constraints effectively. Performance monitoring tools provide
insights into system behaviour, resource usage patterns, and areas for improvement.
ᴑ Profiling Tools: Profiling tools are used to collect detailed information about the execution of an algorithmic model, including function execution times, memory allocations, and I/O operations. By analyzing profiling data, analysts can pinpoint specific areas of the code that are consuming excessive resources or causing performance bottlenecks (a minimal profiling sketch follows this list).
ᴑ Identifying Bottlenecks: Performance monitoring and profiling help identify bottlenecks in the
system, such as CPU-bound or memory-bound operations, disk contention, or network latency.
Analysts can prioritize optimization efforts to address the most critical performance issues by
understanding where the bottlenecks occur.
ᴑ Optimizing Resource Utilization: Performance monitoring tools provide insights into resource
utilization patterns, allowing analysts to optimize resource allocation and utilization. For
example, if CPU usage is consistently high, analysts may need to optimize the algorithm to
reduce computational overhead or parallelize tasks to leverage multiple CPU cores efficiently.
ᴑ Addressing Runtime Constraints: Performance monitoring helps identify runtime constraints,
such as long-running tasks or inefficient algorithms, that may impact system responsiveness or
throughput. By optimizing algorithms and improving code efficiency, analysts can stay within runtime constraints and improve overall system performance.
ᴑ Managing Memory Usage: Memory profiling tools help analyze memory usage patterns and
identify memory leaks or excessive memory allocations. Analysts can reduce memory-related
bottlenecks and improve system stability and performance by optimising memory usage and
implementing efficient memory management strategies.
ᴑ Optimizing Parallel Programming: Performance monitoring provides insights into parallel
programming constraints, such as thread contention, synchronization overhead, or load
imbalance. By analyzing parallel execution profiles, analysts can identify opportunities to
optimize parallelization strategies, improve scalability, and maximize parallel performance.
ᴑ Continuous Improvement: Performance monitoring and profiling are iterative processes that
involve continuous monitoring, analysis, and optimization. By regularly monitoring system
performance, identifying areas for improvement, and implementing optimization strategies,
analysts can achieve continuous performance improvements and ensure that the algorithmic
model operates efficiently under varying workloads and conditions.
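As a minimal runtime-profiling sketch, the example below uses Python's built-in cProfile and pstats modules to find where time is spent. The profiled functions are made up and contain a deliberate bottleneck.

```python
# Minimal runtime-profiling sketch with Python's built-in cProfile module.
# The profiled functions are illustrative and contain a deliberate bottleneck.
import cProfile
import pstats

def slow_lookup(items, queries):
    # O(n) list membership test per query -- the bottleneck
    return [q for q in queries if q in items]

def run_workload():
    items = list(range(10_000))
    queries = list(range(0, 20_000, 3))
    return slow_lookup(items, queries)

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Report the functions that consumed the most cumulative time
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```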
• Functional Testing: Conduct functional testing to evaluate the overall functionality and behaviour of
the algorithmic model against its specified requirements and objectives. Functional testing focuses
on verifying that the model performs the intended tasks accurately and produces the expected
outputs.
• Performance Testing: Evaluate the performance of the algorithmic model under various conditions,
including different input sizes, data distributions, and computational workloads. Performance
testing helps identify performance bottlenecks, assess resource utilization, and optimize the
model's efficiency and scalability.
• Regression Testing: Perform regression testing to ensure that recent changes or modifications
to the algorithmic model do not introduce new defects or regressions in existing functionalities.
Regression testing involves retesting previously validated components and verifying that they
continue to function correctly after changes are made.
• Error Handling and Exception Testing: Test the algorithmic model's error handling and exception
mechanisms to ensure robustness and resilience in handling unexpected situations or erroneous
inputs. Error handling testing involves deliberately introducing errors or invalid inputs and verifying
that the model responds appropriately and gracefully.
• Debugging and Error Resolution: Use systematic debugging techniques to identify and diagnose
errors, bugs, or unexpected behaviours in the algorithmic model. Debugging involves analyzing error
messages, examining code logic, and using debugging tools to trace and resolve issues effectively.
• Documentation and Reporting: Document test results, debugging efforts, and any identified issues
or improvements in comprehensive reports. Documentation helps maintain a record of testing
activities, facilitates communication among team members, and supports future algorithmic model
maintenance and enhancements.
• Iterative Improvement: Continuously refine and improve the algorithmic model based on feedback
from testing and debugging activities. Iterate through the testing and debugging process to address
identified issues, incorporate enhancements, and ensure the ongoing reliability and quality of the
model.
• Accuracy: Accuracy measures the extent to which the algorithmic model's predictions or outputs align with the ground truth or expected outcomes. It reflects the model's ability to make correct decisions or classifications based on input data. A short sketch showing how several of these indicators are computed follows this list.
• Precision and Recall: Precision measures the proportion of true positive predictions among all
positive predictions made by the model, while recall measures the proportion of true positive
predictions among all actual positive instances in the dataset. These metrics are particularly
relevant in classification tasks and provide insights into the model's ability to correctly identify
relevant instances and minimize false positives and false negatives.
• F1 Score: The F1 score is the harmonic mean of precision and recall, providing a balanced measure
of a model's performance in classification tasks. It combines precision and recall into a single metric,
allowing analysts to assess the overall effectiveness of the model in capturing both true positives
and minimizing false positives and false negatives.
• Confusion Matrix: A confusion matrix provides a comprehensive summary of the model's
performance by tabulating true positive, true negative, false positive, and false negative predictions.
It enables analysts to visualize the distribution of prediction outcomes and evaluate the model's
strengths and weaknesses across different classes or categories.
• Mean Absolute Error (MAE) and Mean Squared Error (MSE): MAE and MSE are common metrics
used to evaluate regression models' performance by measuring the average magnitude of errors
between predicted and actual values. Lower values of MAE and MSE indicate better model
performance, with MAE providing a more interpretable measure of error magnitude than MSE.
• R-squared (R²): R-squared is a statistical measure that quantifies the proportion of variance in the
dependent variable explained by the independent variables in a regression model. It ranges from 0
to 1, with higher values indicating a better fit of the model to the data and greater predictive power.
• Computational Efficiency: Computational efficiency measures the algorithmic model's ability to
process input data and produce outputs within a reasonable timeframe. It encompasses factors
such as runtime, memory usage, and scalability and is crucial for assessing the model's feasibility
and practicality in real-world applications.
• Robustness: Robustness refers to the algorithmic model's ability to maintain performance and
reliability under diverse conditions, including variations in input data, environmental changes, and
potential disruptions. A robust model exhibits consistent performance across different scenarios
and is less susceptible to data perturbations or external factors.
• Interpretability: Interpretability measures the ease with which analysts can understand and
interpret the algorithmic model's predictions or decision-making process. A highly interpretable
model provides transparent insights into its internal workings, facilitating trust, validation, and
actionable insights for end-users and stakeholders.
• Scalability: Scalability assesses the algorithmic model's ability to handle increasing volumes of
data or computational workload efficiently. A scalable model can adapt to changing demands and
resource constraints, maintaining performance and reliability as the dataset size or complexity
grows.
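The sketch below computes several of the indicators described above with scikit-learn. It is a minimal sketch only; the label, prediction, and regression arrays are made-up values for illustration.

```python
# Minimal sketch: common evaluation metrics with scikit-learn.
# The label/prediction arrays are made-up values for illustration.
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, mean_absolute_error, mean_squared_error, r2_score,
)

# Classification example
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))

# Regression example
y_actual    = [3.2, 4.8, 5.1, 6.0, 7.4]
y_estimated = [3.0, 5.0, 4.8, 6.3, 7.1]
print("MAE:", mean_absolute_error(y_actual, y_estimated))
print("MSE:", mean_squared_error(y_actual, y_estimated))
print("R^2:", r2_score(y_actual, y_estimated))
```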
Summary
• Supervised learning algorithms are trained on labelled data, where the algorithm learns from input-
output pairs to make predictions or classifications. In contrast, unsupervised learning algorithms
work with unlabeled data, aiming to discover patterns or structures without explicit guidance.
• Identifying technical parameters for an algorithmic model involves input data characteristics,
algorithm selection, model architecture, hyperparameters tuning, feature engineering, evaluation
metrics, computational resources, and deployment environment.
• Various data and computational structures, including arrays, matrices, graphs, trees, and hash
tables, can be utilized to develop algorithmic models, depending on the nature of the problem and
computational requirements.
• When running an algorithmic model, it's crucial to assess system limitations such as runtime,
memory usage, and parallel programming constraints to ensure efficient execution and resource
utilization.
• Evaluating the speed and memory interdependencies between a system and an algorithmic model
helps optimize performance and resource allocation, balancing computational efficiency with
model accuracy.
• Naïve algorithms are simplistic and inefficient, often characterized by high computational complexity,
while efficient algorithms leverage optimization techniques for better performance and scalability.
• Developing data flow diagrams for proposed algorithmic models aids in visualizing data processing
steps, inputs, outputs, and dependencies, facilitating better understanding and communication of
the model's design.
• Using Big O notation and asymptotic notation helps evaluate algorithmic models' runtime and memory requirements, providing insights into their scalability and efficiency as input sizes increase (a small timing sketch follows this summary).
• Demonstrating the testing and debugging of sample algorithmic models is essential to identify and rectify errors or inconsistencies in the model's implementation or performance.
• Analysing performance indicators such as runtime, memory usage, and model efficiency provides
insights into the effectiveness and scalability of algorithmic models, guiding optimization efforts
and informing decision-making processes.
• Developing documentation to record the results of model performance analysis ensures
transparency, reproducibility, and accountability, facilitating knowledge sharing and future model
improvements.
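As a small illustration of the contrast between a naïve and an efficient algorithm noted above, the sketch below compares an O(n²) duplicate check with an O(n) version and times both with the standard timeit module. The task and sizes are made up; it uses only the Python standard library.

```python
# Small sketch: a naive O(n^2) duplicate check versus an O(n) version using a set,
# timed with the standard timeit module. The task and sizes are illustrative.
import timeit

def has_duplicates_naive(values):
    # Compares every pair of elements: O(n^2)
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_fast(values):
    # A set gives average O(1) membership checks: O(n) overall
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False

data = list(range(2_000))  # no duplicates: worst case for both functions

print("naive:", timeit.timeit(lambda: has_duplicates_naive(data), number=3))
print("fast :", timeit.timeit(lambda: has_duplicates_fast(data), number=3))
```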
Exercise
Multiple-choice Question:
1. Which type of learning algorithm learns from labelled data?
a. Supervised learning b. Unsupervised learning
c. Reinforcement learning d. Semi-supervised learning
Descriptive Questions
1. Explain the concept of supervised learning and provide an example.
2. What are some key factors to consider when addressing runtime constraints in algorithmic models?
3. Why is testing and debugging essential in the development lifecycle of algorithmic models?
4. Describe the significance of performance indicators in evaluating algorithmic models.
5. How do system limitations such as runtime, memory, and parallel programming constraints influence the execution and performance of an algorithmic model?
Notes
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Scan the QR codes or click on the link to watch the related videos
https://www.youtube.com/watch?v=1FZ0A1QCMWc https://www.youtube.com/watch?v=yN7ypxC7838
6. Performance
Evaluation of
Algorithmic Models
Unit 6.1 - Examination of Data Flow Diagrams of
Algorithmic Models
SSC/N8122
Unit Objectives
By the end of this unit, the participants will be able to:
1. Assess the designs of core algorithmic models in autonomous systems.
2. Examine and interpret data flow diagrams of algorithmic models.
3. Investigate available resources for the productionisation of algorithmic models.
4. Determine parallel programming requirements (e.g., MISD, MIMD) for algorithmic models.
5. Discuss the principles of code and design quality.
6. Scrutinize technical requirements like scalability, reliability, and security.
7. Describe the process of converting technical specifications into software code.
• Decision Making and Control: Decision-making algorithms enable autonomous systems to make
high-level decisions based on sensor inputs, environmental context, and mission objectives. These
algorithms incorporate techniques from artificial intelligence, such as reinforcement learning,
planning, and optimization, to select actions that maximize performance and achieve desired
outcomes. Control algorithms translate high-level commands into low-level control signals to
effectively actuate the system's actuators.
• Machine Learning and Adaptation: Machine learning techniques play a crucial role in autonomous
systems, enabling them to learn from experience and adapt to changing conditions. Supervised
learning, unsupervised learning, and reinforcement learning algorithms can be used to train models
for perception, decision-making, and control tasks. Adaptive algorithms continuously update their
behaviour based on new data, allowing autonomous systems to improve performance and adapt
to novel scenarios over time.
• Continuous integration and deployment pipelines, version control systems, and performance monitoring tools can streamline the management of algorithmic models in production environments.
• Collaborative Development and Open Standards: Collaboration and knowledge sharing are
essential for advancing the field of autonomous systems. Open standards, open-source software,
and collaborative development platforms enable researchers and developers to exchange ideas,
share best practices, and collectively address common challenges in algorithmic model design.
Designing core algorithmic models for autonomous systems is a multifaceted and challenging
endeavour that requires expertise in machine learning, robotics, computer vision, and control theory.
By understanding the key components, challenges, and considerations involved in algorithmic model
design, developers can create sophisticated autonomous systems that are safe, reliable, and capable
of operating effectively in diverse and dynamic environments. As technology continues to advance, the
future of autonomous systems holds promise for revolutionizing various industries and improving the
quality of human life.
When designing flow diagrams for algorithmic models, it is essential to consider simplicity, consistency, hierarchy, clarity, and modularity. Simplifying the flow diagram ensures that only
essential steps and decision points are included, avoiding unnecessary complexity or detail. Consistency
in the use of symbols, notation, and formatting throughout the diagram enhances clarity and coherence,
enabling all stakeholders to interpret the meaning of each component consistently. Organizing the flow
diagram in a hierarchical manner with higher-level processes and decision points represented at the top
and lower-level details elaborated as needed improves readability and comprehension. Using clear and
descriptive labels for process blocks, decision points, and input/output symbols enhances clarity and
understanding. Finally, breaking down complex workflows into smaller, modular components promotes
modularity and manageability, facilitating easier comprehension, maintenance, and modification of
the flow diagram over time.
Flow diagrams consist of various components that represent different elements of the algorithmic
model's workflow. These components include:
• Start and End Points: The flow diagram begins with a start point and ends with an end point,
indicating the initiation and completion of the model's execution.
• Process Blocks: Process blocks represent individual steps or operations performed within the
algorithmic model. Each process block is labelled with a descriptive name and may contain
additional details or instructions.
• Decision Points: Decision points, represented by diamond-shaped symbols, indicate branching
paths in the workflow where different actions are taken based on specified conditions or criteria.
• Arrows and Connectors: Arrows and connectors connect the various components of the flow
diagram, illustrating the sequence of steps and decision points in the model's execution flow.
• Input and Output: Input and output symbols represent the data inputs and outputs of the
algorithmic model, indicating where external data is provided to the model and where results are
generated.
Design Considerations for Flow Diagrams: When designing flow diagrams for algorithmic models,
it's essential to consider various factors to ensure that the diagrams effectively communicate the
workflow and logic of the model. One critical consideration is simplicity. Flow diagrams should strive
for simplicity by focusing on the essential steps and decision points involved in the model's execution.
Avoiding unnecessary complexity or detail helps maintain clarity and readability, making it easier for
stakeholders to understand and analyze the diagram. By keeping the flow diagram simple and concise,
developers and analysts can convey the model's logic in a clear and straightforward manner, facilitating
collaboration and decision-making throughout the development process.
Consistency is another crucial design consideration for flow diagrams. Consistency in symbols, notation,
and formatting helps ensure that stakeholders can easily interpret the meaning of each component.
Using standardized symbols and conventions enhances the coherence and readability of the diagram,
reducing the risk of confusion or misinterpretation. Consistency also fosters interoperability with
other diagrams and documentation, enabling seamless integration into the development lifecycle.
By adhering to consistent design principles, flow diagrams become reliable tools for conveying the
workflow and logic of algorithmic models to stakeholders across different teams and disciplines.
When creating flow diagrams for algorithmic models, several design considerations should be taken
into account to ensure clarity, readability, and accuracy. These considerations include:
• Simplicity: Keep the flow diagram simple and concise, focusing on the essential steps and decision
points involved in the model's execution. Avoid unnecessary complexity or detail that may confuse
or overwhelm stakeholders.
• Consistency: Use consistent symbols, notation, and formatting throughout the flow diagram to
maintain clarity and coherence. Ensure that all stakeholders can easily interpret the meaning of
each component.
• Hierarchy: Organize the flow diagram in a hierarchical manner, with higher-level processes and
decision points represented at the top and lower-level details and sub-processes elaborated as
needed.
• Clarity: Use clear and descriptive labels for process blocks, decision points, and input/output symbols
to accurately convey their purpose and functionality. Avoid ambiguous or vague terminology that
may lead to misinterpretation.
• Modularity: Break down complex workflows into smaller, modular components that are easy to understand and manage. Use sub-processes or modules to encapsulate repetitive or specialized functionality within the flow diagram.
Best Practices for Creating Flow Diagrams: Creating effective flow diagrams requires adherence to best
practices to ensure clarity, readability, and accuracy. Firstly, planning and outlining the flow diagram
before beginning the design process is essential. This involves identifying the key steps, decision
points, inputs, outputs, and dependencies involved in the algorithmic model's workflow. By clearly
understanding the model's logic and functionality upfront, developers can create a well-structured
flow diagram that accurately represents the model's execution flow. Using standardized symbols and
notation is crucial for maintaining consistency and interoperability across different diagrams. Adhering
to commonly accepted conventions ensures that stakeholders can easily interpret the meaning of
each component, facilitating effective communication and collaboration throughout the development
process. Furthermore, iterative design is essential for refining and improving the flow diagram over
time. Regularly reviewing the diagram, soliciting feedback from stakeholders, and incorporating
suggestions for optimization and enhancement help ensure that the flow diagram accurately reflects
the model's requirements and functionality. Through collaboration and documentation, developers can
create flow diagrams that serve as valuable tools for visualizing and understanding algorithmic models.
In addition to planning and outlining the flow diagram, another best practice is to prioritize simplicity
and clarity in the design. Flow diagrams should aim to convey complex processes in a clear and
straightforward manner, avoiding unnecessary complexity or detail that may confuse stakeholders. By
keeping the diagram concise and focusing on the essential steps and decision points, developers can
ensure that stakeholders can easily follow the model's workflow and logic. Furthermore, modularization
can enhance the readability and manageability of flow diagrams by breaking down complex workflows
into smaller, more manageable components. Using sub-processes or modules to encapsulate repetitive
or specialized functionality helps maintain the overall clarity and coherence of the diagram while
facilitating easier navigation and understanding. By adhering to these best practices, developers can
create flow diagrams that effectively communicate the workflow of algorithmic models, enabling
stakeholders to comprehend, analyse, and make informed decisions throughout the development
process.
To create effective flow diagrams for algorithmic models, follow these best practices:
• Plan and Outline: Before creating the flow diagram, plan and outline the key steps and decision
points involved in the model's execution. Identify the main processes, inputs, outputs, and
dependencies to be included in the diagram.
• Use Standard Symbols: Use standardized symbols and notation for process blocks, decision points,
inputs, outputs, and connectors. Adhere to commonly accepted conventions to ensure consistency
and interoperability with other diagrams.
• Iterative Design: Iteratively refine and improve the flow diagram based on feedback from
stakeholders and validation against the model's requirements. Review the diagram regularly to
identify areas for optimization and enhancement.
• Collaboration: Collaborate with domain experts, developers, and stakeholders to ensure that the
flow diagram accurately reflects the functionality and logic of the algorithmic model. Incorporate
feedback and suggestions to enhance the diagram's effectiveness and clarity.
• Documentation: Document the flow diagram with explanatory notes, annotations, and references
to additional documentation or resources. Provide context and background information to help
stakeholders understand the model's purpose, inputs, outputs, and constraints.
Flow diagrams are invaluable tools for visualizing and understanding the workflow of algorithmic
models. By representing the sequence of steps and decision points in a clear and structured manner,
flow diagrams enable stakeholders to comprehend the model's functionality, logic, and dependencies
effectively. By adhering to design considerations and best practices, developers and analysts can create
flow diagrams that facilitate communication, collaboration, and decision-making throughout the
development lifecycle.
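To make the guidance above concrete, the short sketch below generates a simple model-workflow diagram programmatically. It uses the open-source graphviz Python package (an assumption; any diagramming tool or drawing convention works equally well), and the step names are purely illustrative.

    # Minimal sketch: drawing a model workflow with standard flowchart shapes.
    # Assumes the graphviz Python package and the Graphviz binaries are installed.
    from graphviz import Digraph

    flow = Digraph("model_workflow", format="png")

    # Ovals for start/end, rectangles for process steps, a diamond for the decision.
    flow.node("start", "Start", shape="oval")
    flow.node("ingest", "Ingest data", shape="box")
    flow.node("clean", "Clean and validate data", shape="box")
    flow.node("train", "Train model", shape="box")
    flow.node("check", "Metrics acceptable?", shape="diamond")
    flow.node("deploy", "Deploy model", shape="box")
    flow.node("end", "End", shape="oval")

    flow.edge("start", "ingest")
    flow.edge("ingest", "clean")
    flow.edge("clean", "train")
    flow.edge("train", "check")
    flow.edge("check", "deploy", label="yes")
    flow.edge("check", "train", label="no (tune)")
    flow.edge("deploy", "end")

    flow.render("model_workflow", cleanup=True)  # writes model_workflow.png

Keeping the diagram definition in code alongside the model makes it easy to review, version, and update as the workflow evolves.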
Moreover, human resources are essential for the successful productionisation of algorithmic models.
This includes skilled professionals such as data engineers, DevOps engineers, software developers, and
data scientists who collaborate to design, implement, deploy, and maintain the production infrastructure
and processes. Cross-functional teams with expertise in data science, software engineering, and
operations are crucial for addressing the complex challenges involved in effectively productionizing
algorithmic models.
Lastly, documentation and knowledge-sharing resources are essential for ensuring that stakeholders
understand how to use and maintain the productionized models. This includes comprehensive
documentation covering model architecture, deployment procedures, API endpoints, data pipelines,
and troubleshooting guidelines. Knowledge-sharing sessions, training workshops, and internal
communication channels also facilitate the dissemination of knowledge and best practices among team
members involved in model productionisation efforts. Overall, leveraging these resources enables
organizations to streamline the deployment and management of algorithmic models in production
environments, enabling them to derive maximum value from their data science initiatives.
The productionisation of algorithmic models is a critical stage in the lifecycle of data science projects,
as it involves transitioning models from development environments to operational systems where they
can be deployed, managed, and utilized at scale. This process requires careful planning, coordination,
and allocation of various resources to ensure the successful integration of models into production
environments and their ongoing maintenance and optimization.
• Computational Resources: Computational resources are the backbone of deploying algorithmic
models in production. These resources encompass hardware infrastructure such as servers, cloud
computing resources (e.g., AWS, Azure, Google Cloud), and specialized hardware accelerators like
GPUs or TPUs. The selection of computational resources depends on the scale and requirements of
the model, ensuring that it can handle the expected workload and deliver timely responses to user
requests without performance degradation. Scalability is a key consideration, as computational
resources must be able to scale dynamically to accommodate increased demand or processing
requirements.
• Software Resources: Software resources are essential for the productionisation process, providing
the necessary tools and frameworks for model deployment, management, and serving. This
includes containerization platforms like Docker and Kubernetes, which facilitate the packaging
and deployment of models in isolated, scalable environments. Orchestration tools like Apache
Airflow enable the automation and scheduling of data workflows, ensuring smooth execution and
coordination of tasks. Model serving frameworks such as TensorFlow Serving or TorchServe allow developers to expose models as APIs, enabling seamless integration with other systems and applications (a minimal serving sketch follows this list).
• Human Resources: Skilled professionals are indispensable for the successful production of
algorithmic models. This includes data engineers, DevOps engineers, software developers, and data
scientists who collaborate to design, implement, deploy, and maintain the production infrastructure
and processes. Cross-functional teams with expertise in data science, software engineering, and
operations are crucial for addressing the complex challenges involved in effectively productionizing
algorithmic models. Continuous collaboration, communication, and knowledge sharing among
team members are essential for ensuring the smooth execution of productionisation efforts.
• Documentation and Knowledge Sharing: Comprehensive documentation and knowledge-sharing
resources are essential for ensuring that stakeholders understand how to use and maintain the
productionized models. This includes documentation covering model architecture, deployment
procedures, API endpoints, data pipelines, and troubleshooting guidelines. Clear, well-organized
documentation facilitates onboarding new team members, troubleshooting issues, and ongoing
maintenance of the production environment. Knowledge-sharing sessions, training workshops,
and internal communication channels further facilitate the dissemination of knowledge and best
practices among team members involved in model productionisation efforts.
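As a concrete illustration of the software resources listed above, the sketch below exposes a trained model as a small HTTP prediction service using Flask. This is only an assumed, minimal setup: the file name model.pkl, the /predict route, and the payload format are hypothetical, and production teams would more typically rely on TensorFlow Serving, TorchServe, or a framework such as FastAPI running behind a container orchestrator.

    # Minimal, illustrative model-serving sketch (not a production configuration).
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical artefact produced earlier in the pipeline.
    with open("model.pkl", "rb") as fh:
        model = pickle.load(fh)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()                 # expects {"features": [[...], ...]}
        predictions = model.predict(payload["features"])
        return jsonify({"predictions": predictions.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Packaging such a script in a container image and running several replicas behind a load balancer is where the computational and orchestration resources described above come into play.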
Parallel systems execute multiple instructions or operations concurrently on various CPUs. A single computer with many processors, a group of computers connected by a network to form a parallel processing cluster, or a combination of both can be used by parallel systems. Because parallel computer architectures vary widely and the actions of numerous CPUs must be coordinated and synchronized, programming parallel systems is more challenging than programming computers with a single processor. CPUs are the foundation of parallel processing. Depending on how many instruction streams and data streams are processed simultaneously, computing systems are classified into four major categories.
Flynn's classification: Single-instruction, single-data (SISD) systems – A uniprocessor machine that runs one instruction at a time on a single data stream is called an SISD computing system. Computers that use this sequential processing model, executing machine instructions one after another, are known as sequential computers, and the architecture of most ordinary computers is SISD. All the instructions and data to be processed must reside in primary memory.
In the SISD model, the speed of the processing element is limited by the rate at which the computer can move information internally. IBM PCs and workstations are the most common examples of SISD systems.
Single-instruction, multiple-data (SIMD) systems – A multiprocessor machine that executes the same instruction on each CPU while operating on several different data streams is known as a SIMD system. Because scientific computing involves many vector and matrix operations, machines built on the SIMD architecture are well suited to this type of work. Organised data elements of vectors can be divided into multiple sets (N sets for a system with N processing elements) so that the information can be delivered to all the processing elements (PEs), with each PE processing one data set.
The Cray vector processing machine is a dominant representative of SIMD systems.
Multiple-instruction, single-data (MISD) systems – A multiprocessor machine that executes different instructions on its various PEs while all of them operate on the same data stream is known as a MISD computing system. An example is Z = sin(x) + cos(x) + tan(x), where several operations are applied to the same data set. Although some machines have been built on the MISD model, none are commercially available, and machines made using this paradigm are unsuitable for most applications.
Multiple-instruction, multiple-data (MIMD) systems – A multiprocessor machine that can execute different instructions on different data sets is called a MIMD system. Because every PE in the MIMD model has a distinct instruction stream and data stream, machines constructed with this paradigm can be used for any form of application. PEs in MIMD machines operate asynchronously, in contrast to SIMD and MISD machines.
Based on how the PEs are connected to main memory, MIMD computers can be broadly divided into shared-memory and distributed-memory categories. In the tightly coupled (shared-memory) MIMD paradigm, all PEs are connected to, and have access to, a single global memory. PEs communicate with each other via this shared memory: changes made to data in the global memory by one PE are visible to all other PEs. Sun/IBM SMP (Symmetric Multi-Processing) systems and Silicon Graphics machines are the two most common examples of shared-memory MIMD systems. In a loosely coupled multiprocessor system with distributed memory, every PE has its own local memory.
In this approach, the interconnection network—also known as the interprocess communication channel,
or IPC—is the means by which PEs communicate with one another. Depending on the needs, PEs can
be connected by a network that can be set up in a tree, mesh, or other configuration. Compared to the
distributed memory MIMD model, the shared-memory MIMD architecture is easier to program but less
resilient to errors and more difficult to expand. In contrast to the distributed architecture, where each
PE may be easily isolated, failures in a shared-memory MIMD impact the entire system. Furthermore,
because memory contention arises with additional PEs, shared memory MIMD systems are less likely
to scale. Each PE has its own memory when distributed memory is used, so this is not the case. Given
the practical results and user requirements, the distributed memory MIMD architecture outperforms
the other models now in use.
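The sketch below illustrates the data-parallel style discussed above using Python's multiprocessing module. Each worker runs in its own process with its own memory, loosely mirroring the loosely coupled (distributed-memory) arrangement; the workload and chunking scheme are purely illustrative.

    # Illustrative data parallelism: the same function runs on separate data
    # partitions in separate processes, each with its own private memory.
    from multiprocessing import Pool

    def score_chunk(chunk):
        """Apply the same operation to one partition of the data."""
        return [x * x for x in chunk]

    if __name__ == "__main__":
        data = list(range(1_000_000))
        n_workers = 4
        # Split the data into roughly equal partitions, one per worker.
        chunks = [data[i::n_workers] for i in range(n_workers)]

        with Pool(processes=n_workers) as pool:
            results = pool.map(score_chunk, chunks)   # chunks processed concurrently

        total = sum(sum(part) for part in results)
        print(f"processed {len(data)} items, checksum={total}")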
5. Open-Closed Principle (OCP):
OCP states that software entities should be open for extension but closed for modification, so new behaviour can be added without altering existing code. Design patterns like Strategy and Decorator exemplify OCP by allowing new behaviours to be added through composition or inheritance rather than modification (a short Python sketch after this list illustrates OCP together with DIP).
6. Liskov Substitution Principle (LSP):
LSP emphasizes the use of polymorphism to enable interchangeable components within a system.
It states that objects of a superclass should be replaceable with objects of its subclass without
affecting the correctness of the program. By adhering to LSP, developers ensure that derived classes
adhere to the contracts established by their base classes, promoting code reusability and flexibility.
7. Interface Segregation Principle (ISP):
ISP advocates for designing interfaces specific to clients' needs, avoiding the temptation to create
large, monolithic interfaces. By breaking interfaces into smaller, cohesive units, ISP reduces coupling
between components and prevents clients from depending on methods they do not use. This
principle promotes code maintainability and facilitates easier refactoring and evolution.
8. Dependency Inversion Principle (DIP):
DIP encourages abstraction and decoupling by shifting dependencies from concrete implementations
to abstractions or interfaces. High-level modules should not depend on low-level details but rather
on abstractions, allowing for flexibility and easier substitution of components. Dependency injection
and inversion of control containers are common techniques used to adhere to DIP.
9. Don't Repeat Yourself (DRY):
DRY emphasizes the avoidance of code duplication by promoting code reuse through abstraction
and modularization. Duplication leads to maintenance challenges, as changes must be propagated
across multiple locations. By centralizing logic and data, developers reduce redundancy, improve
consistency, and minimize the risk of introducing errors.
10. Keep It Simple, Stupid (KISS):
KISS advocates for simplicity in design and implementation, favouring straightforward solutions
over complex ones. Simple designs are easier to understand, maintain, and debug, leading to
higher-quality software with fewer defects. However, simplicity should not come at the expense of
functionality or performance; instead, it aims to strike a balance between complexity and clarity.
11. Code Readability and Maintainability:
Readable code is essential for collaboration, debugging, and long-term maintenance. Consistent
formatting, meaningful variable names, and clear documentation enhance code readability,
enabling developers to quickly understand its purpose and functionality. Moreover, modular, well-
structured codebases are easier to maintain and evolve over time, reducing technical debt and
enhancing productivity.
12. Testing and Quality Assurance:
Comprehensive testing is vital for verifying software correctness, identifying defects, and ensuring
robustness. Unit tests, integration tests, and acceptance tests validate different aspects of the
software, providing confidence in its behaviour under various conditions. Automated testing
frameworks and continuous integration pipelines streamline the testing process, enabling rapid
feedback and early detection of regressions.
13. Performance Optimization:
Optimizing code for performance involves identifying and eliminating bottlenecks, improving
efficiency, and reducing resource consumption. Profiling tools help identify areas of code that
require optimization, such as tight loops or memory-intensive operations. Techniques like
algorithmic optimization, caching, and parallelization can significantly enhance the performance of
software systems, ensuring responsiveness and scalability.
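The minimal Python sketch below, with hypothetical class names, illustrates two of the principles above: the Pipeline class depends only on the Scaler abstraction (DIP), and new scaling strategies can be added without modifying Pipeline (OCP). Because both scalers honour the same contract, they are also interchangeable in the spirit of LSP.

    # Illustrative sketch of OCP and DIP using a simple strategy-style design.
    from abc import ABC, abstractmethod

    class Scaler(ABC):
        """Abstraction that high-level code depends on (DIP)."""
        @abstractmethod
        def transform(self, values):
            ...

    class MinMaxScaler(Scaler):
        def transform(self, values):
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0
            return [(v - lo) / span for v in values]

    class ZScoreScaler(Scaler):
        def transform(self, values):
            mean = sum(values) / len(values)
            std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
            return [(v - mean) / std for v in values]

    class Pipeline:
        def __init__(self, scaler: Scaler):   # dependency injected, not hard-coded
            self.scaler = scaler

        def run(self, values):
            return self.scaler.transform(values)

    # New scalers can be added without changing Pipeline (open for extension,
    # closed for modification).
    print(Pipeline(MinMaxScaler()).run([1.0, 2.0, 3.0]))
    print(Pipeline(ZScoreScaler()).run([1.0, 2.0, 3.0]))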
By adhering to these principles and practices, software engineers can develop high-quality code
and design systems that are robust, maintainable, and scalable. Continuous learning, feedback, and
adaptation are essential for evolving software engineering practices and meeting the evolving needs of
users and stakeholders.
Scalability
Scalability refers to a system's ability to handle increasing workloads or growing demands by expanding
resources without sacrificing performance. It is a critical aspect of system design, especially in modern
applications where usage patterns can vary widely and rapidly.
Scalability can be achieved through various approaches, including vertical scaling (adding more
resources to a single machine) and horizontal scaling (distributing the workload across multiple
machines). Horizontal scaling is often preferred for its flexibility and cost-effectiveness, particularly in
cloud-based environments.
One of the primary challenges in achieving scalability is ensuring that the system can effectively utilize
additional resources as they are added. This requires careful design considerations such as decoupling
components, implementing asynchronous communication, and leveraging distributed architectures
like microservices or serverless computing.
Additionally, monitoring and performance testing play crucial roles in assessing and maintaining
scalability. Continuous monitoring helps identify bottlenecks and performance issues, while load testing
allows for simulating various usage scenarios to evaluate system response under different conditions.
In terms of best practices, designing for scalability from the outset is essential. This involves breaking
down the system into smaller, manageable components that can be independently scaled. Employing
technologies like containerization and orchestration tools such as Kubernetes can simplify the
management of distributed systems and facilitate seamless scaling.
Reliability
Reliability is the measure of a system's ability to perform consistently and predictably under normal
and adverse conditions. It encompasses factors such as fault tolerance, availability, and resilience to
failures.
Achieving reliability requires robust error-handling mechanisms, redundancy, and failover capabilities
to mitigate the impact of hardware failures, software bugs, or network issues. Redundancy can be
implemented at various levels of the system, including data storage, network infrastructure, and
application logic.
High availability architectures, such as active-passive or active-active setups, help ensure uninterrupted
service by automatically redirecting traffic or failing over to standby components in case of failures.
Additionally, implementing techniques like circuit breakers and retry strategies can help gracefully
handle transient errors and prevent cascading failures.
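As a small illustration of the retry strategies mentioned above, the sketch below retries a transient failure with exponential backoff and jitter. The function names are hypothetical, and real systems often use a dedicated library (for example tenacity) or a full circuit-breaker implementation instead.

    # Illustrative retry-with-exponential-backoff helper for transient errors.
    import random
    import time

    def call_with_retry(fn, max_attempts=5, base_delay=0.5):
        for attempt in range(1, max_attempts + 1):
            try:
                return fn()
            except (ConnectionError, TimeoutError):
                if attempt == max_attempts:
                    raise                                   # give up; surface the error
                delay = base_delay * (2 ** (attempt - 1))   # exponential backoff
                delay += random.uniform(0, delay / 2)       # jitter avoids synchronised retries
                time.sleep(delay)

    def flaky_call():
        # Hypothetical remote call that fails intermittently.
        if random.random() < 0.6:
            raise ConnectionError("temporary network failure")
        return "ok"

    print(call_with_retry(flaky_call))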
Continuous testing and monitoring are crucial for maintaining reliability. Automated tests, including unit,
integration, and end-to-end tests, help detect regressions and vulnerabilities early in the development
cycle. Continuous monitoring of key metrics such as uptime, response times, and error rates allows for
proactive identification and resolution of issues before they impact users.
Adopting a culture of reliability engineering, where reliability is treated as a first-class concern and
shared responsibility across development, operations, and quality assurance teams, is essential. This
involves incorporating reliability requirements into the software development process, conducting
post-incident reviews to learn from failures, and continuously improving system resilience over time.
Security
Security is paramount in software systems, particularly as cyber threats evolve in complexity and
sophistication. It encompasses measures to protect data, prevent unauthorized access, and mitigate
risks associated with vulnerabilities and attacks.
A comprehensive security strategy involves multiple layers of defence, including network security,
application security, and data encryption. Network security measures such as firewalls, intrusion
detection systems, and secure network protocols help protect against external threats and unauthorized
access.
Application security involves implementing secure coding practices, such as input validation, output
encoding, and proper handling of sensitive data, to prevent common vulnerabilities such as SQL
injection, cross-site scripting (XSS), and authentication bypass. Secure authentication and authorization
mechanisms, such as multi-factor authentication and role-based access control (RBAC), help ensure
that only authorized users can access sensitive resources.
Data encryption is essential for protecting data both at rest and in transit. Encrypting data stored in
databases or filesystems helps prevent unauthorized access in case of data breaches or unauthorized
access to storage devices. Similarly, using secure communication protocols such as HTTPS/TLS for data
transmission over networks ensures data confidentiality and integrity.
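As a minimal illustration of encryption at rest, the sketch below uses the Fernet recipe from the open-source cryptography package (an assumption; any vetted library or managed service can serve the same purpose). Key management, which is the harder problem in practice, is deliberately out of scope here.

    # Illustrative symmetric encryption of a record before it is written to storage.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # store securely (e.g., in a secrets manager), never with the data
    cipher = Fernet(key)

    record = b'{"user_id": 42, "card_last4": "1234"}'
    token = cipher.encrypt(record)     # safe to persist to disk or a database
    restored = cipher.decrypt(token)

    assert restored == record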
In addition to preventive measures, proactive threat detection and incident response capabilities are
critical for effective security management. Security monitoring tools, intrusion detection systems (IDS),
and security information and event management (SIEM) platforms help detect and respond to security
incidents in real-time.
Compliance with industry standards and regulations, such as the General Data Protection Regulation
(GDPR), Health Insurance Portability and Accountability Act (HIPAA), and Payment Card Industry Data
Security Standard (PCI DSS), is essential for ensuring legal and regulatory compliance. Regular security
audits, vulnerability assessments, and penetration testing help identify and address security gaps and
ensure continuous improvement of security posture.
Therefore, scalability, reliability, and security are fundamental technical requirements that must be
carefully considered and addressed throughout the software development lifecycle. By incorporating
best practices and leveraging appropriate technologies and strategies, organizations can build robust
and resilient systems that meet the demands of modern applications while protecting against evolving
threats and vulnerabilities.
Validation processes play a pivotal role in the technical performance assessment of AI algorithms as they
provide a means to gauge how well a model generalizes to unseen data and real-world scenarios. The
efficacy of these processes directly influences the reliability and accuracy of performance assessments,
impacting decisions regarding model deployment, optimization, and further development.
Firstly, validation processes serve as a critical step in mitigating overfitting, a common challenge in
machine learning. Overfitting occurs when a model learns to memorize the training data rather than
capturing underlying patterns, resulting in poor generalization to new data. Through techniques like
cross-validation and holdout validation, validation processes enable researchers to assess whether
a model has learned meaningful patterns or merely noise from the training data. By evaluating
performance on separate validation sets, researchers can identify instances of overfitting and fine-tune
model parameters to improve generalization performance.
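A minimal sketch of this idea is shown below: the model is trained on one split and scored on a held-out split, and a large gap between the two scores is a warning sign of overfitting. It assumes scikit-learn is available and uses synthetic data and an arbitrary model purely for illustration.

    # Illustrative holdout validation to spot overfitting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    val_acc = model.score(X_val, y_val)

    # A training score far above the validation score suggests memorisation
    # rather than generalisation.
    print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")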
Moreover, validation processes help identify and address biases present in AI algorithms, ensuring fair
and equitable outcomes across different demographic groups. Biases can manifest in various forms,
such as underrepresentation of certain groups in the training data or skewed distributions of features.
Through rigorous validation methodologies like stratified cross-validation and fairness-aware evaluation
metrics, researchers can systematically assess the model's performance across different demographic
subgroups. This allows for the detection of biases and the implementation of mitigation strategies, such
as data augmentation or algorithmic adjustments, to enhance the fairness and inclusivity of AI systems.
Furthermore, validation processes play a crucial role in benchmarking the performance of AI algorithms
against established standards and competing models. By adopting standardized evaluation metrics and
methodologies, researchers can compare the performance of different algorithms on common datasets
or benchmark suites. This facilitates fair and objective comparisons, enabling stakeholders to identify
state-of-the-art approaches and areas for improvement. Additionally, validation processes enable the
tracking of model performance over time, allowing researchers to assess the impact of algorithmic
modifications or dataset changes on technical performance metrics. Through continuous validation
and benchmarking, researchers can iteratively refine AI algorithms to achieve superior performance
across diverse applications and domains.
Validation processes also facilitate the identification of failure modes and edge cases where AI
algorithms may exhibit suboptimal performance or unexpected behavior. Through rigorous testing and
validation methodologies, researchers can systematically explore the model's behavior across a wide
range of scenarios and inputs. This includes stress-testing the model with adversarial examples, out-
of-distribution data, or real-world anomalies to assess its robustness and resilience. By uncovering
failure modes and edge cases during validation, researchers can refine the model's architecture,
training process, or input data preprocessing to enhance its performance and reliability in real-world
deployment scenarios.
Moreover, validation processes contribute to the establishment of trust and transparency in AI systems
by providing stakeholders with insights into the model's capabilities, limitations, and uncertainties.
Through transparent reporting of validation results, including performance metrics, validation
methodologies, and potential biases or limitations, researchers can foster greater understanding
and confidence in the model's predictions and recommendations. This transparency is essential for
facilitating informed decision-making by end-users, policymakers, and other stakeholders who rely
on AI algorithms for critical tasks. By demonstrating the rigor and reliability of validation processes,
researchers can instill trust in AI systems and promote their responsible and ethical use across various
domains.
Additionally, validation processes play a crucial role in supporting regulatory compliance and industry
standards for AI systems. Many regulatory frameworks and industry guidelines require thorough
validation and testing of AI algorithms to ensure their safety, efficacy, and fairness. By adhering to
established validation methodologies and performance benchmarks, organizations can demonstrate
compliance with regulatory requirements and industry best practices, reducing the risk of legal liabilities
and reputational damage. Furthermore, validation processes enable organizations to proactively
address emerging challenges and regulatory developments in the rapidly evolving landscape of AI
governance and ethics. By investing in robust validation processes, organizations can build trust with
regulators, customers, and the public while driving innovation and responsible AI adoption.
Algorithmic Transparency: Algorithmic transparency refers to the principle of making the algorithms
and processes underlying automated decision-making systems, such as AI models, understandable and
interpretable to stakeholders, including end-users, developers, regulators, and other interested parties.
It involves providing insight into how algorithms work, why they make certain decisions, and what
factors influence those decisions. Here's a breakdown of algorithmic transparency:
• Provide clear explanations of algorithms and techniques used, such as loss functions, optimization
algorithms, and regularization techniques.
• Describe the mathematical formulations behind these algorithms to aid stakeholders' understanding
of the model's learning process.
• Document any novel or customized algorithms developed, showcasing the model's unique
capabilities and innovations.
Evaluation Metrics and Methodologies: Evaluation metrics and methodologies are essential
components of assessing the performance of AI models. They help quantify how well a model performs
its intended task and provide insights into its strengths and weaknesses. Here's an explanation of
evaluation metrics and methodologies commonly used in AI:
• Document the evaluation metrics used to assess model performance, including accuracy, precision,
recall, and F1 score.
• Present results of model testing, validation, and benchmarking against relevant baselines or
industry standards.
• Include visualizations of model predictions and error analyses to provide insights into its effectiveness
and limitations.
Evaluation Metrics:
Accuracy measures the proportion of correctly classified instances out of the total instances. It's a
common metric for classification tasks with balanced class distributions.
Accuracy = Number of Correct Predictions / Total Number of Predictions
Precision and Recall: Precision measures the proportion of true positive predictions out of all positive
predictions, indicating the model's ability to avoid false positives.
Precision = True Positives / (True Positives + False Positives)
Recall (also known as sensitivity) measures the proportion of true positive predictions out of all actual
positives, indicating the model's ability to capture all positive instances.
Recall = True Positives / (True Positives + False Negatives)
F1 Score: F1 score is the harmonic mean of precision and recall, providing a balance between the two
metrics. It's useful when there's an imbalance between classes.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Confusion Matrix: A confusion matrix is a table that summarizes the model's performance by comparing
predicted and actual class labels. It's useful for understanding the types of errors made by the model.
Mean Absolute Error (MAE) and Mean Squared Error (MSE): Mean Absolute Error (MAE) and Mean
Squared Error (MSE) are common metrics used to evaluate the performance of regression models. Both
metrics measure the difference between the actual values and the predicted values generated by the
model.
• MAE and MSE are commonly used for regression tasks.
• MAE = (1/n) ∑ᵢ |yᵢ − ŷᵢ|, summing over i = 1 to n
• MSE = (1/n) ∑ᵢ (yᵢ − ŷᵢ)², summing over i = 1 to n
R² Score (Coefficient of Determination): R² score measures the proportion of the variance in the
dependent variable that is predictable from the independent variables.
R² = 1 − MSE(model) / MSE(baseline)
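The metrics above can be computed directly rather than by hand; the sketch below uses scikit-learn (an assumption) with small hard-coded label lists purely for illustration.

    # Illustrative computation of common classification and regression metrics.
    from sklearn.metrics import (
        accuracy_score, confusion_matrix, f1_score, mean_absolute_error,
        mean_squared_error, precision_score, r2_score, recall_score,
    )

    # Classification example
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))
    print("confusion matrix:\n", confusion_matrix(y_true, y_pred))

    # Regression example
    actual = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5, 0.0, 2.0, 8.0]
    print("MAE:", mean_absolute_error(actual, predicted))
    print("MSE:", mean_squared_error(actual, predicted))
    print("R2 :", r2_score(actual, predicted))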
Evaluation Methodologies:
Train-Test Split:
• The dataset is divided into training and testing sets, where the training set is used to train the
model, and the testing set is used to evaluate its performance.
• Common splits include 70-30, 80-20, or 90-10 ratios for training and testing, respectively.
Cross-Validation:
• Cross-validation involves partitioning the dataset into multiple subsets (folds), training the model
on some folds, and evaluating it on the remaining fold.
• Common techniques include k-fold cross-validation and stratified k-fold cross-validation (see the sketch after this list).
Holdout Validation:
• Similar to train-test split, but with an additional validation set used for hyperparameter tuning.
• The dataset is divided into three sets: training, validation, and testing.
Bootstrapping:
• Sampling with replacement is performed on the dataset to create multiple bootstrap samples.
• Each bootstrap sample is used for training and testing the model, and the results are aggregated to
estimate performance.
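The sketch below, referenced in the cross-validation bullet above, shows two of these methodologies in a few lines using scikit-learn and synthetic data (both assumptions): 5-fold cross-validation and a simple bootstrap estimate. For brevity the bootstrap models are scored on the full dataset; out-of-bag evaluation would be more rigorous.

    # Illustrative k-fold cross-validation and bootstrap estimation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.utils import resample

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # k-fold cross-validation: average accuracy across 5 folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

    # Bootstrapping: refit on resampled data and aggregate the scores.
    boot_scores = []
    for seed in range(20):
        X_boot, y_boot = resample(X, y, replace=True, random_state=seed)
        boot_scores.append(model.fit(X_boot, y_boot).score(X, y))
    print(f"bootstrap accuracy estimate: {np.mean(boot_scores):.3f}")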
Converting technical specifications into software code typically proceeds through the following steps:
1. Understanding Technical Specifications:
The process begins with a careful reading of the documented requirements to build a clear picture of the intended functionality. Additionally, it requires clear communication and collaboration with stakeholders, including project managers, business analysts, and end-users, to ensure a shared understanding of the specifications.
2. Designing Software Architecture:
Once the technical specifications are understood, the next step is to design the software system's
architecture. This involves defining the system's overall structure, components, interfaces, and
interactions. Architects use various design principles, patterns, and methodologies, such as object-
oriented design (OOD), model-view-controller (MVC), and service-oriented architecture (SOA), to
create a robust and scalable architecture that aligns with the specifications.
3. Writing Pseudocode and Algorithms:
Before diving into actual coding, it is often beneficial to draft pseudocode or algorithms based
on the technical specifications. Pseudocode provides a high-level outline of the logic and steps
required to implement the desired functionality. It helps developers conceptualize the solution
and identify potential challenges or edge cases before writing actual code. Algorithms, in turn, outline the specific sequences of operations or computations required to achieve particular tasks within the software.
4. Selecting Programming Languages and Frameworks:
With a clear understanding of the requirements and architecture, the next step is to select the
appropriate programming languages and frameworks for implementing the software. Factors such
as the nature of the application, performance requirements, team expertise, and compatibility with
existing systems influence the choice of languages and frameworks. Commonly used languages
include Java, Python, C++, and JavaScript, while frameworks like Spring, Django, .NET, and Angular
provide additional support for development.
5. Writing Clean and Maintainable Code:
Writing clean, readable, and maintainable code is essential for ensuring software projects' long-
term success and sustainability. Developers adhere to coding standards, conventions, and best
practices established by the development team or industry guidelines. This includes meaningful
variable names, proper indentation, modularization, documentation, and adherence to design
patterns. Additionally, developers leverage code review processes and tools to solicit feedback and
ensure code quality.
6. Implementing Unit Tests and Integration Tests:
Testing is an integral part of the software development process, beginning at the coding stage.
Developers write unit tests to verify the functionality of individual units or components of the
software in isolation. These tests validate that each unit behaves as expected and meets the defined
specifications. Integration tests, on the other hand, verify the interactions and interoperability
between different components or modules of the system. Automated testing frameworks such as
JUnit, NUnit, and pytest facilitate the creation and execution of tests (a short pytest sketch follows this list).
7. Refactoring and Optimization:
Developers continuously refactor and optimize the code as the codebase evolves to improve its
quality, performance, and maintainability. Refactoring involves restructuring the code without
changing its external behaviour to enhance readability, reduce complexity, and eliminate
duplication. Optimization focuses on improving the efficiency and speed of the code by identifying
and eliminating bottlenecks, unnecessary computations, or memory leaks. Profiling tools and
performance metrics aid in identifying areas for optimization.
8. Version Control and Collaboration:
Throughout the coding process, developers use version control systems such as Git, Subversion, or
Mercurial to manage changes to the codebase, track revisions, and facilitate collaboration among
team members. Version control enables developers to work concurrently on different features or
branches, merge changes seamlessly, and revert to previous versions if necessary. Collaboration
platforms like GitHub, Bitbucket, or GitLab also provide tools for code review, issue tracking, and
continuous integration.
9. Documentation and Comments:
Documenting the code is essential for facilitating understanding, maintenance, and future
development. Developers write inline comments, API documentation, and README files to
explain the code's purpose, functionality, inputs, outputs, and usage. Documentation also includes
instructions for setting up the development environment, building the software, running tests, and
deploying the application. Clear and comprehensive documentation enhances the accessibility and
usability of the codebase for developers and stakeholders alike.
10. Adhering to Coding Standards and Guidelines:
Consistency is key to ensuring software projects' readability, maintainability, and scalability.
Developers adhere to coding standards, style guides, and best practices established by the
organization or industry. This includes conventions for naming, formatting, indentation,
error handling, and exception handling. Tools such as linters, code formatters, and static analysis
tools enforce coding standards and identify deviations from best practices during development.
11. Reviewing and Iterating:
Code review is a critical step in the software development lifecycle, where developers evaluate
each other's code for correctness, quality, and adherence to specifications and standards. Code
reviews provide opportunities for feedback, knowledge sharing, and identifying potential issues or
improvements. Developers discuss design decisions, implementation details, and potential edge
cases during code reviews, leading to iterative improvements and refinements to the codebase.
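To ground step 6 above, the short pytest example below tests a small, hypothetical preprocessing function in isolation; the file name and function are illustrative only.

    # test_preprocess.py -- run with:  pytest test_preprocess.py -q

    def normalize(values):
        """Scale a list of numbers into the range [0, 1]."""
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    def test_normalize_range():
        result = normalize([2.0, 4.0, 6.0])
        assert min(result) == 0.0
        assert max(result) == 1.0
        assert result == [0.0, 0.5, 1.0]

    def test_normalize_constant_input():
        # Edge case: identical values should not cause a division by zero.
        assert normalize([5.0, 5.0]) == [0.0, 0.0]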
Converting technical specifications into software code is a systematic and iterative journey that involves
understanding requirements, designing architecture, writing code, testing, and refining the solution. By
following best practices, leveraging appropriate tools and technologies, and fostering collaboration
and communication among team members, developers can successfully translate specifications into
high-quality, maintainable, and reliable software solutions that meet the needs of stakeholders and
end-users.
Furthermore, documenting the evaluation metrics and methodologies used to assess AI model
performance is essential for stakeholders to gauge its effectiveness and suitability for intended tasks.
This involves describing both quantitative metrics, such as accuracy, precision, recall, and F1 score,
as well as qualitative evaluations, such as visualizations of model predictions and error analyses.
Transparently presenting the results of model testing, validation, and benchmarking against relevant
baselines or industry standards enables stakeholders to make informed decisions about the model's
deployment and potential impact. Additionally, documenting any challenges encountered during
model development and validation, along with strategies employed to address them, fosters trust and
confidence in the reliability and robustness of the AI solution.
Integration with Evaluation Framework: Integrate these defect types and corrective measures into the
evaluation framework for AI ML engineering. This could involve:
• Establishing metrics to measure the prevalence and severity of each type of defect.
• Incorporating these metrics into the overall evaluation criteria for AI models and systems.
• Implementing automated testing and monitoring processes to continuously assess model
performance and identify potential defects.
Documentation and Knowledge Sharing: Document these defect types and corrective measures in
a knowledge base or repository accessible to AI ML engineers. Encourage knowledge sharing and
collaboration to foster continuous improvement in AI model development practices.
• This builds trust among stakeholders, including clients, regulators, and the public, and mitigates
legal and reputational risks associated with non-compliance.
Summary
• Participants will gain the ability to assess the designs of core algorithmic models within sample
autonomous systems. This involves analyzing the architecture and functionality of these models to
ensure they meet performance and reliability requirements in autonomous environments.
• They will also learn to evaluate data flow diagrams of sample algorithmic models, understanding the
flow of data and operations within the models. This skill enables participants to identify potential
bottlenecks, inefficiencies, or areas for optimization in data processing pipelines.
• Participants will explore various available resources for the productionization of algorithmic
models, including computational, software, human, and documentation resources. Understanding
these resources is crucial for successfully deploying and managing algorithmic models in real-world
applications.
• They will assess parallel programming requirements such as MISD and MIMD for sample algorithmic
models. This involves understanding how parallelism can be leveraged to improve performance and
scalability in algorithmic computations across multiple processing units.
• Participants will engage in discussions about the principles of code and design quality, emphasizing
the importance of writing clean, maintainable, and efficient code to ensure the reliability and
longevity of software systems.
• They will discuss technical requirements such as scalability, reliability, and security, considering
how these factors influence the design, implementation, and deployment of algorithmic models in
production environments.
• The process of converting technical specifications into software code will be explored, focusing on
translating requirements and design specifications into executable code that meets the intended
functionality and performance criteria.
• The importance of designing testable, version-controlled, and reproducible software code will be
emphasized, highlighting best practices for ensuring the reliability and maintainability of software
systems.
• Participants will evaluate best practices around deploying machine learning models and monitoring
model performance, understanding the importance of continuous monitoring and optimization to
ensure model effectiveness over time.
• They will develop software code to support the deployment of sample algorithmic models, gaining
hands-on experience implementing and managing algorithmic solutions in real-world applications.
• Participants will learn to develop continuous and automated integrations to deploy algorithmic
models, streamlining the deployment process and ensuring consistency and reliability in software
releases.
• They will use appropriate tools and software packages while integrating data flows, data structures,
and core algorithmic models, leveraging technology to optimize performance and efficiency in
algorithmic computations.
• Developing different types of test cases for the code will be explored, covering unit tests, integration
tests, and end-to-end tests to ensure the correctness and robustness of software implementations.
• They will demonstrate unit test case execution to analyze code performance, identifying areas for
optimization and improvement based on test results and performance metrics.
• Finally, participants will document test case results and optimize sample software code based on
test results, iterating on code implementations to achieve optimal performance and reliability in
real-world applications.
Exercise
Multiple-choice Question:
1. Which of the following is a focus area when evaluating designs of core algorithmic models in sample
autonomous systems?
a. Scalability and reliability b. Data visualization techniques
c. User interface design d. Network security protocols
2. What is the primary purpose of assessing parallel programming requirements for sample algorithmic
models?
a. To improve software documentation
b. To enhance code readability
c. To optimize performance and efficiency
d. To ensure compliance with industry standards
4. What are key considerations when discussing technical requirements such as scalability, reliability,
and security?
a. Optimizing for low memory usage b. Minimizing code modularity
c. Prioritizing single-threaded execution d. Adhering to data privacy regulations
5. What is the purpose of developing continuous and automated integrations to deploy algorithmic
models?
a. To increase manual intervention b. To decrease deployment frequency
c. To improve deployment efficiency d. To prolong software development cycles
Descriptive Questions
1. Explain the significance of evaluating designs of core algorithmic models in sample autonomous
systems.
2. How do data flow diagrams help in evaluating sample algorithmic models?
3. Discuss the importance of considering the various resources available to productionize algorithmic models.
4. Compare and contrast parallel programming requirements such as MISD and MIMD for sample
algorithmic models.
5. Why is discussing the principles of code and design quality important in the context of algorithmic
models?
Notes
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Scan the QR codes or click on the link to watch the related videos
https://www.youtube.com/watch?v=CT4xaXLcnpM https://www.youtube.com/watch?v=6VGTvgaJllM
7. Inclusive and
Environmentally
Sustainable
Workplaces
Unit 7.1 - Sustainability and Inclusion
SSC/N9014
Unit Objectives
By the end of this unit, the participants will be able to:
1. Describe different approaches for efficient energy resource utilisation and waste management
2. Describe the importance of following the diversity policies
3. Identify stereotypes and prejudices associated with people with disabilities and the negative
consequences of prejudice and stereotypes.
4. Discuss the importance of promoting, sharing, and implementing gender equality and PwD
sensitivity guidelines at the organizational level.
5. Practice the segregation of recyclable, non-recyclable and hazardous waste generated.
6. Demonstrate different methods of energy resource use optimization and conservation.
7. Demonstrate essential communication methods in line with gender inclusiveness and PwD
sensitivity.
Negative consequences of prejudice and stereotypes associated with people with disabilities include:
• Discrimination: Prejudice and stereotypes can fuel discriminatory practices and policies that
limit the rights and opportunities of individuals with disabilities. This can include employment
discrimination, educational segregation, inaccessible environments, and denial of healthcare
services.
• Social Exclusion: Stereotypes and prejudices can lead to social isolation and exclusion for
individuals with disabilities, as they may face barriers to participating fully in social, recreational,
and community activities. This can contribute to feelings of loneliness, low self-esteem, and mental
health challenges.
• Psychological Impact: Internalizing negative stereotypes and prejudices can have detrimental effects
on the mental health and well-being of individuals with disabilities. It can lead to feelings of shame,
self-doubt, and internalized ableism, which can exacerbate symptoms of anxiety, depression, and
other mental health conditions.
• Underestimation of Potential: Stereotypes and prejudices can result in the underestimation of
the abilities and potential of individuals with disabilities. This can lead to lowered expectations,
reduced personal and professional development opportunities, and barriers to achieving one’s full
potential.
• Economic Disadvantage: Discrimination and stigma can contribute to economic disadvantage
for individuals with disabilities, as they may face challenges in accessing employment, housing,
transportation, and other essential resources. This can perpetuate cycles of poverty and inequality.
Addressing stereotypes and prejudices associated with people with disabilities requires challenging
misconceptions, promoting awareness and understanding, advocating for inclusive policies and
practices, and valuing the diverse abilities and contributions of all individuals, regardless of disability
status.
• Provide Accessibility Options: Ensure that all communication materials, including written
documents, presentations, and digital content, are accessible to individuals with disabilities. This
may involve providing alternative formats such as braille, large print, or audio recordings and
ensuring that digital content is compatible with screen readers and other assistive technologies.
• Offer Accommodations: When organizing meetings, events, or training sessions, offer
accommodations to accommodate the needs of individuals with disabilities. This may include
providing sign language interpreters, captioning services, accessible transportation, or assistive
listening devices to ensure equal participation and access for all attendees.
• Respect Privacy and Confidentiality: Respect the privacy and confidentiality of individuals when
discussing gender identity or disability status. Avoid making assumptions about someone’s gender
identity or disability status and only discuss these topics if relevant and with the individual’s consent.
• Provide Training and Education: Offer training and education to employees on gender inclusiveness
and sensitivity towards individuals with disabilities. This may include workshops, seminars, or online
courses that raise awareness, promote understanding, and provide practical strategies for creating
inclusive environments and communicating respectfully with all individuals.
• Seek Feedback and Input: Encourage feedback and input from individuals with diverse backgrounds,
including those from different genders and individuals with disabilities. Create opportunities for
open dialogue and collaboration, and actively listen to the perspectives and experiences of others
to inform decision-making and improve communication practices.
• Lead by Example: Demonstrate inclusive communication practices by leading by example. Be
mindful of your language, behaviour, and attitudes towards gender and disability diversity, and
strive to create an environment where all individuals feel valued, respected, and empowered to
contribute.
• Address Microaggressions and Bias: Be vigilant about addressing microaggressions, unconscious
bias, and discriminatory behaviour that may occur in communication interactions. Take proactive
steps to challenge stereotypes, correct misinformation, and foster a culture of respect and
acceptance for all individuals.
Enforce policies and guidelines that mandate compliance with waste segregation practices. Clearly
communicate the consequences of non-compliance, such as fines or penalties, to create a deterrent
against improper waste disposal. Monitoring and auditing practices regularly help assess compliance
levels, identify improvement areas, and adjust procedures accordingly.
Collaborate closely with waste management providers to ensure proper collection, transportation,
and processing of segregated waste. Establish connections with local recycling facilities, composting
centres, and hazardous waste disposal sites to guarantee that each waste type undergoes appropriate
handling.
Promote awareness and participation through ongoing communication, educational campaigns, and
incentives for compliance. Recognize and celebrate waste reduction and recycling achievements to
inspire continuous engagement and foster a sense of community responsibility.
Leading by example is crucial in driving change. Demonstrate a commitment to environmental
stewardship by practising proper waste segregation personally. Highlight the multifaceted benefits
of waste segregation, including its positive impact on the environment, public health, and resource
conservation. Organizations and communities can contribute significantly to effective waste
management and sustainable practices by implementing these strategies.
• Behavioural Changes and Awareness: Promoting energy conservation behaviour and raising
awareness about the importance of energy efficiency can encourage individuals and organizations
to adopt more sustainable practices. Simple actions such as turning off lights when not in use, using
energy-efficient appliances, and reducing unnecessary energy consumption can significantly affect
overall energy use.
• Government Policies and Incentives: Government policies, regulations, and incentives are crucial
in promoting energy conservation and efficiency. This includes setting energy efficiency standards
for appliances and buildings, offering tax incentives or rebates for energy-saving investments, and
implementing carbon pricing mechanisms to internalize the costs of greenhouse gas emissions.
Summary
• Various approaches such as energy audits, implementing renewable energy sources, and
adopting waste reduction strategies are crucial for efficient energy resource utilization and waste
management, contributing to sustainability and cost-effectiveness.
• Diversity policies promote inclusivity, equality, and respect for individuals from diverse backgrounds,
fostering a positive work environment, innovation, and organizational growth.
• Identifying and challenging stereotypes and prejudices against people with disabilities is essential to
promote inclusivity, combat discrimination, and create a supportive environment where everyone
feels valued and respected.
• Implementing gender equality and sensitivity guidelines at the organizational level helps create a
fair and inclusive workplace where individuals are judged based on their abilities and contributions
rather than gender or disability.
• Practising waste segregation into recyclable, non-recyclable, and hazardous categories, along with
demonstrating energy conservation and optimization methods, are crucial steps in promoting
environmental sustainability and resource efficiency.
Exercise
Multiple Choice Questions
1. What is one approach for efficient energy resource utilization?
a. Increasing energy consumption b. Conducting energy audits
c. Ignoring renewable energy sources d. Avoiding waste reduction strategies
3. What are the negative consequences of stereotypes and prejudices against people with disabilities?
a. Increased inclusivity b. Enhanced work environment
c. Discrimination and marginalization d. Positive organizational culture
4. What is the significance of promoting gender equality and sensitivity guidelines in organizations?
a. To encourage discrimination b. To create an unfair work environment
c. To foster fairness and inclusivity d. To reinforce gender biases
Descriptive Questions
1. Can you explain the importance of waste segregation and energy conservation in promoting
environmental sustainability?
2. How do diversity policies contribute to organizational success and employee satisfaction?
3. What are some common stereotypes and prejudices associated with people with disabilities, and
how can they impact workplace dynamics?
4. Discuss the role of organizational gender equality initiatives in promoting fairness and inclusivity in
the workplace.
5. Can you demonstrate a practical method for waste segregation and discuss its importance in waste
management practices?
Notes
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Scan the QR codes or click on the links to watch the related videos:
https://youtu.be/RnvCbquYeIM?si=TLTyWhrv4p-sodME (Can 100% renewable energy power the world?)
https://youtu.be/1gCr4jOsweo?si=YPQ0qt2tXJ4gowfz (Gender Sensitivity)
8. Employability Skills
DGT/VSQ/N0102
https://www.skillindiadigital.gov.in/content/list
Employability Skills
9. Annexure
Module No. | Unit No. | Topic Name | Page No. | Link for QR Code(s) | QR Code Caption
Module 1: Artificial Intelligence & Big Data Analytics – An Introduction | Unit 1.1: Introduction to Artificial Intelligence & Big Data Analytics | 1.1.1 Introduction to Artificial Intelligence & Big Data Analytics | 15 | https://youtu.be/ad79nYk2keg?si=U3fOp-AmnaBCe-Gl | What Is AI?
Module 1: Artificial Intelligence & Big Data Analytics – An Introduction | Unit 1.1: Introduction to Artificial Intelligence & Big Data Analytics | 1.1.4 Types of AI | 15 | https://youtu.be/XFZ-rQ8eeR8?si=5ptCRjz5Lg6zVkyB | The 7 Types of AI
Module 2: Product Engineering Basics | Unit 2.1: Exploring Product Development and Management Processes | 2.1.1 Activities across Product Development Stages | 28 | https://www.youtube.com/watch?v=oE6VD23Kr0I | What is Product development?
Module 2: Product Engineering Basics | Unit 2.1: Exploring Product Development and Management Processes | 2.1.2 Product Management Processes | 28 | https://www.youtube.com/watch?v=XD45n_agC3g | What Is Product Management?
Module 3: Product Engineering Basics | Unit 3.1: Statistical Analysis Fundamentals | 3.1.2 Analysing Correlations with Graphical Techniques | 46 | https://www.youtube.com/watch?v=xTpHD5WLuoA | Correlation and Regression Analysis
Module 3: Product Engineering Basics | Unit 3.1: Statistical Analysis Fundamentals | 3.1.4 Introduction to Pearson's Correlation Coefficient and Methods of Least Squares | 46 | https://www.youtube.com/watch?v=11c9cs6WpJU | Correlation Coefficient
Module 4: Development Tools and Usage | Unit 4.1: Software Development Practices and Performance Optimization | 4.1.1 Evaluation of Software Development Practices | 60 | https://www.youtube.com/watch?v=Fi3_BjVzpqk | Introduction To Software Development LifeCycle
Module 4: Development Tools and Usage | Unit 4.1: Software Development Practices and Performance Optimization | 4.1.2 Harnessing Scripting Languages for Development Efficiencies | 60 | https://www.youtube.com/watch?v=g0Q-VWBX5Js | Scripting Language Vs Programming Language
Module 5: Performance Evaluation of Algorithmic Models | Unit 5.1: Algorithmic Model Development and Assessment Tasks | 5.1.1 Supervised and Unsupervised Learning Algorithms | 76 | https://www.youtube.com/watch?v=1FZ0A1QCMWc | Supervised vs Unsupervised vs Reinforcement Learning
Module 5: Performance Evaluation of Algorithmic Models | Unit 5.1: Algorithmic Model Development and Assessment Tasks | 5.1.2 Technical Parameters for Algorithmic Models | 76 | https://www.youtube.com/watch?v=yN7ypxC7838 | All Machine Learning Models
Module 6: Performance Evaluation of Algorithmic Models | Unit 6.1: Examination of Data Flow Diagrams of Algorithmic Models | 6.1.1 Designs of Core Algorithmic Models in Autonomous Systems | 105 | https://www.youtube.com/watch?v=CT4xaXLcnpM | Autonomous Systems
Module 6: Performance Evaluation of Algorithmic Models | Unit 6.1: Examination of Data Flow Diagrams of Algorithmic Models | 6.1.2 Data Flow Diagrams of Algorithmic Models | 105 | https://www.youtube.com/watch?v=6VGTvgaJllM | Data Flow Diagrams
Module 7: Inclusive and environmentally sustainable workplaces | Unit 7.1: Sustainability and Inclusion | 7.1.1 Approaches for Efficient Energy Resource Utilization and Waste Management | 119 | https://youtu.be/RnvCbquYeIM?si=TLTyWhrv4p-sodME | Can 100% renewable energy power the world?
Module 7: Inclusive and environmentally sustainable workplaces | Unit 7.1: Sustainability and Inclusion | 7.1.4 Importance of Raising Awareness of Gender Equality and PwD Sensitivity | 119 | https://youtu.be/1gCr4jOsweo?si=YPQ0qt2tXJ4gowfz | Gender Sensitivity
IT – ITeS Sector Skill Council NASSCOM
Address: Plot No. – 7, 8, 9 & 10, Sector – 126, Noida, Uttar Pradesh – 201303
New Delhi – 110049
Website: www.sscnasscom.com
e-mail: [email protected]
Phone: 0120 4990111 – 0120 4990172
Price: ₹