ML Unit 1
2-Marks Questions
1. What is machine learning?
Answer: Machine learning (ML) is a branch of artificial intelligence focused on
building systems that learn and improve from data without explicit programming. It
allows computers to identify patterns and make decisions.
2. Define a well-posed learning problem.
Answer: A well-posed learning problem specifies three components: a task (T) that the
system must perform, a performance measure (P) that scores how well it performs T, and
experience (E), the data it learns from. The system is said to learn if its performance
on T, as measured by P, improves with E.
3. What is the difference between training and testing?
Answer: Training fits a model to example data (labeled data in the supervised case),
while testing evaluates the fitted model on separate, unseen data to measure how well
it generalizes.
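The split between training and testing data can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 25% test fraction, and the toy data are assumptions made for the example, not part of the notes.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled samples become the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (input, label) pairs.
data = [(x, 2 * x + 1) for x in range(20)]
train_set, test_set = train_test_split(data)
```

The model would be fitted on `train_set` only, and `test_set` would be touched once, at the end, to estimate performance on unseen data.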
4. What is overfitting?
Answer: Overfitting happens when a model learns noise and specific details in the
training data, resulting in poor performance on new data.
5. Explain supervised learning.
Answer: In supervised learning, the model is trained on labeled data, where each input
has a corresponding output. The model learns to map inputs to correct outputs.
6. What is the goal of unsupervised learning?
Answer: The goal of unsupervised learning is to discover patterns or structure in data
without labeled outputs, such as grouping similar items through clustering.
7. What is reinforcement learning?
Answer: Reinforcement learning is a type of machine learning where an agent learns
by interacting with an environment, receiving rewards or penalties for actions, and
aiming to maximize cumulative rewards.
8. Define feature engineering.
Answer: Feature engineering is the process of selecting, transforming, or creating
features (variables) in a dataset to improve the performance of a machine learning
model.
9. What are predictive tasks?
Answer: Predictive tasks involve using historical data to predict future or unknown
outcomes, commonly applied in regression and classification tasks.
10. Name two examples of logical models in machine learning.
Answer: Decision Trees and Rule-Based Systems are examples of logical models that
use a set of rules to make predictions.
4-Marks Questions
1. Describe the steps in designing a learning system.
Answer: The steps in designing a learning system include data collection, feature
selection/engineering, model selection, training, evaluation, and deployment. Each step
ensures that the model can learn patterns in data, make predictions, and operate
effectively in real-world scenarios.
2. Differentiate between classification and regression tasks.
Answer: Classification tasks aim to assign inputs to discrete categories (e.g., spam vs.
non-spam emails), while regression tasks predict continuous numerical values (e.g.,
house prices).
3. Explain the key differences between supervised and unsupervised learning.
Answer: In supervised learning, the model is trained on labeled data, meaning each
input has an associated output. In unsupervised learning, the model identifies patterns
in unlabeled data, grouping or organizing information without predefined labels.
4. What are geometric models in machine learning? Provide an example.
Answer: Geometric models treat data points as vectors in a feature space and use
geometric notions such as distances, lines, or planes to separate or group them. An
example is k-Nearest Neighbors (k-NN), which classifies a point based on its proximity
to labeled points in the feature space.
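A minimal sketch of k-NN classification, assuming Euclidean distance and a simple majority vote (the notes do not specify either choice). The function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Two hypothetical, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(points, labels, (1.2, 1.4))
```

Here the query point lies among the "A" points, so all three of its nearest neighbors vote for class "A".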
5. List and describe two main characteristics of machine learning tasks.
Answer: Machine learning tasks are broadly classified as predictive tasks, which
focus on making predictions based on historical data, and descriptive tasks, which
focus on identifying patterns and insights, like clustering and association rule mining.
6. What is feature engineering and why is it important?
Answer: Feature engineering involves selecting, creating, or modifying features in the
data to improve model performance. It is crucial because the right features can enhance
a model's accuracy, reduce overfitting, and improve interpretability.
7. Define underfitting and describe one way to avoid it.
Answer: Underfitting occurs when a model is too simple to capture the underlying data
patterns. This can be avoided by using a more complex model or by adding relevant
features to improve representation.
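As a rough illustration of underfitting, the sketch below compares a model that always predicts the mean (too simple for the data) with a least-squares line on the same training set. The data and variable names are made up for the example.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # hypothetical data that follows y = 2x + 1

def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Model 1: a constant (always predict the mean of y). It cannot represent the trend.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
underfit_mse = mse([mean_y] * len(ys), ys)

# Model 2: a straight line fitted by ordinary least squares.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
linear_mse = mse([slope * x + intercept for x in xs], ys)
```

The constant model has a large error even on its own training data, which is the signature of underfitting; giving the model one more parameter (the slope) removes it.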
8. Explain the concept of reinforcement learning with an example.
Answer: Reinforcement learning involves an agent interacting with an environment,
receiving rewards or penalties based on its actions, and learning to maximize
cumulative rewards. For example, a robot learning to navigate a maze can adjust its
actions to reach the goal faster by learning from rewards received.
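The maze example can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the notes do not name a specific algorithm, so this choice is an assumption). The "maze" here is simplified to a hypothetical five-cell corridor with the goal at the right end; the learning rate, discount factor, and exploration rate are illustrative values.

```python
import random

N_STATES = 5                  # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)            # index 0 moves left, index 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning rate, discount, exploration

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                action = rng.randrange(2)        # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1   # exploit
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
```

After training, the greedy policy read off the table moves right in every non-goal cell, which is the shortest route to the reward.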
9. Describe the difference between overfitting and underfitting.
Answer: Overfitting is when a model captures noise and specific details from training
data, reducing generalization ability. Underfitting occurs when the model is too
simplistic, failing to capture underlying trends, and thus performing poorly on both
training and test data.
10. List and briefly describe two probabilistic models in machine learning.
Answer:
o Naïve Bayes: A classification model based on Bayes’ theorem, assuming the
features are conditionally independent given the class.
o Gaussian Mixture Models (GMM): Models data as a mixture of several
Gaussian distributions, useful for clustering and density estimation.
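In symbols, the two models above are commonly written as follows. The notation (class y, features x_1..x_n, K mixture components with weights \pi_k, means \mu_k, and covariances \Sigma_k) is an assumption introduced here; the notes themselves give no formulas.

```latex
% Naive Bayes: class posterior under the conditional-independence assumption
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)

% Gaussian mixture model: density as a weighted sum of K Gaussians
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1
```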
6-Marks Questions
1. Explain three types of machine learning with examples.
Answer:
o Supervised Learning: Trains on labeled data to map inputs to outputs, such as
classification (e.g., spam detection) or regression (e.g., predicting prices).
o Unsupervised Learning: Learns patterns in unlabeled data, useful in clustering
(e.g., customer segmentation) and association (e.g., market basket analysis).
o Reinforcement Learning: An agent interacts with an environment, learning
through rewards and penalties to maximize cumulative rewards, as in robotics
or game-playing.
2. Discuss the main types of machine learning models and provide examples.
Answer:
o Geometric Models: Represent data points in space, such as k-Nearest
Neighbors.
o Logical Models: Use rules, like Decision Trees, which split data based on
feature values.
o Probabilistic Models: Use probability distributions to handle uncertainty, such
as Naïve Bayes.
3. How does overfitting occur, and what are some methods to prevent it?
Answer: Overfitting occurs when a model learns noise and specific details in the
training data, making it less effective on new data. It is typically detected with
cross-validation and reduced by pruning (for decision trees), regularization,
simplifying the model, or gathering more training data.
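Cross-validation, mentioned above, can be sketched as a k-fold split: the data is divided into k parts, and each part takes one turn as the validation set while the rest is used for training. The helper name `k_fold_indices` is hypothetical, and this version does not shuffle the data first, which a real setup usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
```

A model that scores much better on the training indices than on the validation indices, across folds, is likely overfitting.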
4. Explain the process of training and testing in machine learning and their
significance.
Answer: Training involves feeding a model example data (labeled, in the supervised
case) so it can learn patterns, while testing evaluates its performance on unseen data
to check that it generalizes well. Keeping the two sets separate matters because
performance measured on the training data overstates how the model will do on new data.
5. Describe the importance of feature selection and its impact on model performance.
Answer: Feature selection involves choosing the most relevant variables, which helps
improve accuracy, reduces computation, and can prevent overfitting. The right features
provide clearer signals, allowing the model to learn more effectively.
6. What are association tasks in machine learning? Provide an example.
Answer: Association tasks discover interesting relationships between variables in data,
such as finding patterns in transactional data. An example is market basket analysis,
where purchasing one item may suggest a likelihood of purchasing related items.
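Association rules are usually scored with support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). These two measures are standard in market basket analysis but are not named in the notes; the baskets below are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
bread_butter_support = support(baskets, {"bread", "butter"})    # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})    # 2 of the 3 bread baskets
```

A rule such as "bread implies butter" is considered interesting when both its support and its confidence exceed thresholds chosen by the analyst.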
7. Compare geometric, logical, and probabilistic models.
Answer:
o Geometric Models (e.g., k-NN) focus on spatial relationships in data.
o Logical Models (e.g., Decision Trees) use rules for classification and
decision-making.
o Probabilistic Models (e.g., Naïve Bayes) rely on probabilities and handle
uncertainty in predictions.
8. Discuss the importance of evaluation metrics in machine learning. Give two
examples.
Answer: Evaluation metrics assess model performance, guiding improvement.
Examples include accuracy for classification tasks and mean squared error (MSE)
for regression. Metrics provide quantitative ways to compare models and select the
best-performing one.
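The two metrics named above can be computed directly from their definitions. This is a small sketch with made-up labels and values; the function names are chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: 3 of the 4 hypothetical predictions are correct.
acc = accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])

# Regression: squared errors are 0.25, 0.0, and 4.0.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
```

Higher accuracy is better, while lower MSE is better; because MSE squares each error, a single large miss (the 4.0 here) dominates the score.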
9. Explain clustering and its applications in machine learning.
Answer: Clustering groups similar data points into clusters based on shared
characteristics without labels. Applications include customer segmentation in
marketing, grouping documents by topics, and image segmentation.
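One widely used clustering algorithm is k-means, sketched below for one-dimensional data. The notes do not name an algorithm, so k-means is an assumption here, as are the starting centroids and the fixed iteration count.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it in place if empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical values forming two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
```

No labels are supplied at any point: the two groups emerge only from how close the values are to each other.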
10. Describe a scenario where reinforcement learning is more suitable than supervised
or unsupervised learning.
Answer: Reinforcement learning is ideal when an agent must make a sequence of decisions
and feedback arrives as rewards, often delayed, rather than as labeled examples, such
as in game-playing or robotics. For instance, an AI agent in a chess game learns
strategies by maximizing rewards (winning) through trial and error.