ML Unit 1

Copyright
© All Rights Reserved

1. Introduction to Machine Learning


• What is Machine Learning?
o Machine Learning (ML) is a subset of artificial intelligence (AI) that enables
systems to learn and improve from experience without being explicitly
programmed. It focuses on using data and algorithms to imitate the way humans
learn, gradually improving accuracy.
o Key Concept: ML is fundamentally about automating the process of learning
patterns and making predictions based on data.
• Well-Posed Learning Problems
o A learning problem is considered "well-posed" when it has:
1. Task (T): The objective or activity the system needs to perform (e.g.,
predicting, classifying).
2. Performance Measure (P): A metric used to evaluate the success of the
system in performing the task (e.g., accuracy, precision).
3. Experience (E): Historical data or knowledge from which the system
can learn.
o Example of a Well-Posed Learning Problem:
▪ Predicting house prices:
▪ Task (T): Predict the sale price of houses.
▪ Performance Measure (P): Mean Absolute Error (MAE) or
Mean Squared Error (MSE) in prediction.
▪ Experience (E): Historical data of past house sales, including
features like area, number of bedrooms, and location.
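To make the performance measure (P) concrete, here is a minimal sketch of how MAE and MSE could be computed for the house-price example. The prices are made-up illustration values and the function names are hypothetical, not taken from any particular library.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences; large errors are penalized more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sale prices (in thousands) and a model's predictions for them.
actual_prices = [200, 310, 150, 420]
predicted_prices = [210, 300, 170, 400]

mae = mean_absolute_error(actual_prices, predicted_prices)
mse = mean_squared_error(actual_prices, predicted_prices)
print(f"MAE: {mae}, MSE: {mse}")
```

Because MSE squares each error, the two predictions that are off by 20 contribute far more to it than the two that are off by 10, whereas MAE weights all errors in proportion to their size.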
• Designing a Learning System
o Step 1: Data Collection
▪ Gather relevant data that represents the problem domain. This could be
structured data (like databases) or unstructured data (like text or
images).
o Step 2: Feature Selection/Engineering
▪ Identifying the relevant attributes or features in the data that will help
the model make accurate predictions.
▪ Involves transforming raw data into inputs that the model can interpret
(e.g., converting dates to numeric values).
o Step 3: Choosing a Model
▪ Selecting an algorithm suited to the problem type (classification,
regression, clustering).
▪ Different models have strengths depending on data type and problem
requirements.
o Step 4: Training
▪ Feeding the data to the model to learn the relationship between inputs
and outputs.
o Step 5: Evaluation
▪ Testing the model’s performance using metrics (e.g., accuracy for
classification, RMSE for regression).
o Step 6: Deployment
▪ Integrating the model into an application where it can make predictions
on new data in real-time or batch processing.
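Step 2 mentions converting dates to numeric values. One possible way to do that is sketched below using only the standard library. The records, the reference date, and the `to_features` helper are assumptions made for illustration; a real project would choose features that suit its own data.

```python
from datetime import date

# Hypothetical raw records: (sale date, area in square metres).
raw_sales = [
    (date(2021, 3, 15), 120.0),
    (date(2023, 11, 2), 85.5),
]

REFERENCE_DATE = date(2020, 1, 1)  # arbitrary origin chosen for this example


def to_features(sale_date, area):
    """Turn one raw record into a list of numeric features."""
    days_since_reference = (sale_date - REFERENCE_DATE).days
    # The month is kept as a second feature so a model could pick up seasonality.
    return [days_since_reference, sale_date.month, area]


feature_rows = [to_features(d, a) for d, a in raw_sales]
print(feature_rows)
```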
• Learning vs. Designing
o Learning: Developing models based on data, letting the model find patterns
autonomously.
o Designing: In traditional programming, rules and logic are manually coded by
engineers rather than discovered by learning.
• Training vs. Testing
o Training: The model learns patterns from a set of labeled examples (known as
the training dataset).
o Testing: The trained model is then evaluated on a separate dataset (testing
dataset) to check how well it generalizes to new, unseen data.
o Goal: Ensure that the model performs well on testing data, indicating it can
generalize to new data.
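The training/testing distinction can be sketched with a one-feature least-squares line. The data points and the fixed split below are assumptions chosen so the example is reproducible; in practice the split is usually made at random.

```python
import math


def fit_line(pairs):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pairs, slope, intercept):
    """Root mean squared error of the line on the given (x, y) pairs."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: (area in square metres, price in thousands).
data = [(50, 150), (60, 180), (70, 205), (80, 240),
        (90, 275), (100, 300), (110, 325), (120, 365)]

# Hold out every third record for testing; train on the rest.
test_set = data[2::3]
train_set = [row for i, row in enumerate(data) if i % 3 != 2]

slope, intercept = fit_line(train_set)           # training uses only train_set
train_rmse = rmse(train_set, slope, intercept)
test_rmse = rmse(test_set, slope, intercept)     # testing uses unseen records
print(f"train RMSE: {train_rmse:.2f}, test RMSE: {test_rmse:.2f}")
```

The test records never influence the fitted slope or intercept, so the test RMSE is an estimate of how the model would behave on new data.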

2. Characteristics of Machine Learning Tasks


• Predictive Tasks
o These tasks focus on predicting a target variable using given input features.
o Examples:
▪ Classification: Assigning data points to predefined categories (e.g.,
spam vs. non-spam emails).
▪ Regression: Predicting a continuous output variable (e.g., predicting
house prices based on features).
• Descriptive Tasks
o These tasks aim to explore data and identify patterns without making explicit
predictions.
o Examples:
▪ Clustering: Grouping data points with similar characteristics (e.g.,
customer segmentation).
▪ Association Rule Mining: Finding relationships between variables in a
dataset (e.g., “market basket analysis” in retail, where buying one item
is linked to buying another).
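As a small illustration of association rule mining, the sketch below computes the two standard measures for a rule such as "bread implies butter": support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). The baskets are invented for the example.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
print(f"support: {rule_support:.2f}, confidence: {rule_confidence:.2f}")
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates a single rule.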

3. Machine Learning Models


• Geometric Models
o These models treat data points as vectors in a geometric space and rely on
geometric notions such as lines, planes, and distances, for example to find
boundaries that separate classes or to fit a line through the points.
o Examples:
▪ Linear Regression: Fits a straight line (or hyperplane) to the data
points using a linear equation.
▪ k-Nearest Neighbors (k-NN): Classifies data points based on the
closest labeled points in the feature space.
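A minimal sketch of k-NN classification follows, assuming Euclidean distance and a simple majority vote. The points, labels, and the `knn_predict` name are illustrative; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature data with two classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (5.0, 4.8)))
```

There is no training step beyond storing the data: all the work happens at prediction time, when distances to the stored points are computed.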
• Logical Models
o Logical models use rules or trees to make decisions based on data features.
o Examples:
▪ Decision Trees: Divide data by asking a series of “yes/no” questions
about feature values until a leaf is reached that predicts a class or value.
▪ Rule-Based Systems: If-then rules derived from data to classify or
predict.
• Probabilistic Models
o These models are based on probability theory and often work well with
uncertainty.
o Examples:
▪ Naïve Bayes: Uses Bayes’ theorem to classify data based on conditional
probabilities.
▪ Gaussian Mixture Models (GMM): Models data as a mixture of
several Gaussian distributions, useful in clustering.
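The following sketch shows how Naïve Bayes might be applied to the spam example, assuming a multinomial word-count model with add-one (Laplace) smoothing. The four training messages are invented, and log-probabilities are used to avoid multiplying many small numbers.

```python
import math
from collections import Counter

# Hypothetical training messages, already split into words.
training = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]

class_counts = Counter(label for _, label in training)
word_counts = {label: Counter() for label in class_counts}
for words, label in training:
    word_counts[label].update(words)
vocabulary = {w for words, _ in training for w in words}


def log_posterior(words, label):
    """log P(label) plus the sum of log P(word | label), up to a shared constant."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(training))
    for w in words:
        # Add-one smoothing keeps unseen words from giving zero probability.
        score += math.log((word_counts[label][w] + 1) / (total_words + len(vocabulary)))
    return score


def classify(words):
    """Pick the class with the highest (unnormalized) posterior."""
    return max(class_counts, key=lambda label: log_posterior(words, label))


print(classify(["free", "money"]))
print(classify(["meeting", "noon"]))
```

The "naïve" part is the sum over words: each word is treated as independent of the others given the class.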
• Issues in Machine Learning
o Overfitting: Model learns details/noise in the training data and fails to
generalize to new data.
o Underfitting: Model is too simple to capture the underlying trend in the data.
o Data Quality and Quantity: The effectiveness of machine learning is highly
dependent on high-quality, representative data.
o Model Complexity: The more complex a model, the harder it is to interpret and
the more computing power it may require.
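Overfitting and underfitting can be seen side by side on a tiny hand-made dataset. In the sketch below the training targets follow y = 2x with alternating noise of +1 and -1, and the test targets are noise-free. The three models (a memorizer, a constant, and a least-squares line) are illustrative choices, not the only way to demonstrate the effect.

```python
def mean_abs_error(model, xs, ys):
    """Mean absolute error of `model` on the given points."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]            # y = 2x with noise +1, -1, +1, ...
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]      # noise-free y = 2x


def memorizer(x):
    """Overfit: return the target of the nearest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def constant(x):
    """Underfit: always predict the training mean, ignoring x."""
    return mean_y


# Least-squares line fitted on the training data only.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def linear(x):
    return slope * x + intercept


results = {}
for name, model in [("memorizer", memorizer), ("constant", constant), ("linear", linear)]:
    results[name] = (mean_abs_error(model, train_x, train_y),
                     mean_abs_error(model, test_x, test_y))
    print(f"{name:10s} train MAE {results[name][0]:.3f}  test MAE {results[name][1]:.3f}")
```

The memorizer has zero training error but a larger test error than the line, which is the signature of overfitting. The constant model does poorly on both sets, which is the signature of underfitting.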

4. Types of Machine Learning


• Learning Associations
o Association learning focuses on discovering interesting relations between
variables in large datasets.
o Example: Market Basket Analysis (e.g., finding patterns like "people who buy
bread also buy butter").
• Supervised Learning
o In supervised learning, the model learns from labeled data, where each data
point has an input and the correct output.
o Types of Supervised Learning:
▪ Classification: Predicts discrete values (e.g., spam vs. not spam).
▪ Regression: Predicts continuous values (e.g., predicting house prices).
o Objective: Make accurate predictions on new data based on learned
relationships.
o Examples of Algorithms: Linear Regression, Support Vector Machine,
Decision Trees, Random Forest, k-NN.
• Unsupervised Learning
o In unsupervised learning, the model learns patterns from data without any
labeled responses.
o Types of Unsupervised Learning:
▪ Clustering: Grouping similar data points together (e.g., customer
segmentation).
▪ Association Analysis: Finding rules that capture associations between
data items.
o Objective: Identify patterns, structure, or groupings within data.
o Examples of Algorithms: k-means, Hierarchical Clustering, Apriori, and PCA
(which reduces dimensionality rather than clustering or mining rules).
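As an illustration of clustering, here is a minimal one-feature version of k-means (Lloyd's algorithm). The spend values and the starting centroids are assumptions chosen to keep the run deterministic; real implementations usually pick starting centroids at random and work in many dimensions.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and
    recomputing each centroid as the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical annual spend per customer (in thousands).
annual_spend = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(annual_spend, centroids=[1, 12])
print(centroids, clusters)
```

No labels are supplied at any point: the two customer segments emerge purely from the distances between the values.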
• Reinforcement Learning
o A model learns by interacting with an environment and receives feedback in the
form of rewards or penalties.
o Key Concepts:
▪ Agent: Learner or decision maker.
▪ Environment: The world in which the agent operates.
▪ Actions: All possible moves the agent can make.
▪ Reward: Feedback from the environment based on the actions taken.
o Objective: Learn a strategy (policy) that maximizes cumulative reward over
time.
o Examples of Applications: Robotics, Game Playing, Autonomous Vehicles.
o Common Algorithms: Q-learning, Deep Q-Networks (DQN), Policy Gradient
Methods.
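The agent, environment, action, and reward concepts can be sketched with tabular Q-learning on a toy corridor of five cells, where the agent starts in cell 0 and is rewarded only on reaching cell 4. The environment, the learning rate, the discount factor, and the purely random exploration are all assumptions made for this example.

```python
import random

N_STATES = 5                    # corridor cells 0..4; cell 4 is the goal
ACTIONS = ["left", "right"]
ALPHA, GAMMA = 0.5, 0.9         # learning rate and discount factor


def step(state, action):
    """Environment: move one cell; reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


rng = random.Random(0)          # fixed seed so the run is repeatable
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):            # episodes
    state = 0
    while state != N_STATES - 1:
        # Explore with random actions; Q-learning still learns the greedy policy.
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# Greedy policy for each non-terminal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The learned policy is read off the table by taking the highest-valued action in each cell. Because future rewards are discounted by GAMMA, moving toward the goal is valued more highly than moving away from it.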

2-Marks Questions
1. What is machine learning?
Answer: Machine learning (ML) is a branch of artificial intelligence focused on
building systems that learn and improve from data without explicit programming. It
allows computers to identify patterns and make decisions.
2. Define a well-posed learning problem.
Answer: A well-posed learning problem has three components: a task (T) that specifies
what the model needs to do, a performance measure (P) that evaluates its success, and
experience (E) from data to learn from.
3. What is the difference between training and testing?
Answer: Training involves teaching a model using labeled data, while testing evaluates
the model's ability to generalize to new, unseen data.
4. What is overfitting?
Answer: Overfitting happens when a model learns noise and specific details in the
training data, resulting in poor performance on new data.
5. Explain supervised learning.
Answer: In supervised learning, the model is trained on labeled data, where each input
has a corresponding output. The model learns to map inputs to correct outputs.
6. What is the goal of unsupervised learning?
Answer: The goal of unsupervised learning is to discover patterns or structure in data
without labeled outputs, such as grouping similar items through clustering.
7. What is reinforcement learning?
Answer: Reinforcement learning is a type of machine learning where an agent learns
by interacting with an environment, receiving rewards or penalties for actions, and
aiming to maximize cumulative rewards.
8. Define feature engineering.
Answer: Feature engineering is the process of selecting, transforming, or creating
features (variables) in a dataset to improve the performance of a machine learning
model.
9. What are predictive tasks?
Answer: Predictive tasks involve using historical data to predict future or unknown
outcomes, commonly applied in regression and classification tasks.
10. Name two examples of logical models in machine learning.
Answer: Decision Trees and Rule-Based Systems are examples of logical models that
use a set of rules to make predictions.

4-Marks Questions
1. Describe the steps in designing a learning system.
Answer: The steps in designing a learning system include data collection, feature
selection/engineering, model selection, training, evaluation, and deployment. Each step
ensures that the model can learn patterns in data, make predictions, and operate
effectively in real-world scenarios.
2. Differentiate between classification and regression tasks.
Answer: Classification tasks aim to assign inputs to discrete categories (e.g., spam vs.
non-spam emails), while regression tasks predict continuous numerical values (e.g.,
house prices).
3. Explain the key differences between supervised and unsupervised learning.
Answer: In supervised learning, the model is trained on labeled data, meaning each
input has an associated output. In unsupervised learning, the model identifies patterns
in unlabeled data, grouping or organizing information without predefined labels.
4. What are geometric models in machine learning? Provide an example.
Answer: Geometric models interpret data points as vectors in a geometric space and
aim to find boundaries that separate classes. An example is k-Nearest Neighbors
(k-NN), which classifies points based on their proximity to labeled points in the
feature space.
5. List and describe two main characteristics of machine learning tasks.
Answer: Machine learning tasks are broadly classified as predictive tasks, which
focus on making predictions based on historical data, and descriptive tasks, which
focus on identifying patterns and insights, like clustering and association rule mining.
6. What is feature engineering and why is it important?
Answer: Feature engineering involves selecting, creating, or modifying features in the
data to improve model performance. It is crucial because the right features can enhance
a model's accuracy, reduce overfitting, and improve interpretability.
7. Define underfitting and describe one way to avoid it.
Answer: Underfitting occurs when a model is too simple to capture the underlying data
patterns. This can be avoided by using a more complex model or by adding relevant
features to improve representation.
8. Explain the concept of reinforcement learning with an example.
Answer: Reinforcement learning involves an agent interacting with an environment,
receiving rewards or penalties based on its actions, and learning to maximize
cumulative rewards. For example, a robot learning to navigate a maze can adjust its
actions to reach the goal faster by learning from rewards received.
9. Describe the difference between overfitting and underfitting.
Answer: Overfitting is when a model captures noise and specific details from training
data, reducing generalization ability. Underfitting occurs when the model is too
simplistic, failing to capture underlying trends, and thus performing poorly on both
training and test data.
10. List and briefly describe two probabilistic models in machine learning.
Answer:
o Naïve Bayes: A classification model based on Bayes’ theorem, assuming
independence among features.
o Gaussian Mixture Models (GMM): Models data as a mixture of several
Gaussian distributions, useful for clustering and density estimation.

6-Marks Questions
1. Explain three types of machine learning with examples.
Answer:
o Supervised Learning: Trains on labeled data to map inputs to outputs, such as
classification (e.g., spam detection) or regression (e.g., predicting prices).
o Unsupervised Learning: Learns patterns in unlabeled data, useful in clustering
(e.g., customer segmentation) and association (e.g., market basket analysis).
o Reinforcement Learning: An agent interacts with an environment, learning
through rewards and penalties to maximize cumulative rewards, as in robotics
or game-playing.
2. Discuss the main types of machine learning models and provide examples.
Answer:
o Geometric Models: Represent data points in space, such as k-Nearest
Neighbors.
o Logical Models: Use rules, like Decision Trees, which split data based on
feature values.
o Probabilistic Models: Use probability distributions to handle uncertainty, such
as Naïve Bayes.
3. How does overfitting occur, and what are some methods to prevent it?
Answer: Overfitting occurs when a model learns noise and specific details in the
training data, making it less effective on new data. To prevent overfitting, use
techniques like cross-validation, pruning (for decision trees), regularization, and
simplifying the model.
4. Explain the process of training and testing in machine learning and their
significance.
Answer: Training involves feeding a model labeled data to learn patterns, while testing
evaluates its performance on unseen data to ensure it generalizes well. This process is
vital for assessing a model's predictive power and accuracy.
5. Describe the importance of feature selection and its impact on model performance.
Answer: Feature selection involves choosing the most relevant variables, which helps
improve accuracy, reduces computation, and can reduce overfitting by removing
irrelevant or noisy inputs. The right features provide clearer signals, allowing the
model to learn more effectively.
6. What are association tasks in machine learning? Provide an example.
Answer: Association tasks discover interesting relationships between variables in data,
such as finding patterns in transactional data. An example is market basket analysis,
where purchasing one item may suggest a likelihood of purchasing related items.
7. Compare geometric, logical, and probabilistic models.
Answer:
o Geometric Models (e.g., k-NN) focus on spatial relationships in data.
o Logical Models (e.g., Decision Trees) use rules for classification and
decision-making.
o Probabilistic Models (e.g., Naïve Bayes) rely on probabilities and handle
uncertainty in predictions.
8. Discuss the importance of evaluation metrics in machine learning. Give two
examples.
Answer: Evaluation metrics assess model performance, guiding improvement.
Examples include accuracy for classification tasks and mean squared error (MSE)
for regression. Metrics provide quantitative ways to compare models and select the
best-performing one.
9. Explain clustering and its applications in machine learning.
Answer: Clustering groups similar data points into clusters based on shared
characteristics without labels. Applications include customer segmentation in
marketing, grouping documents by topics, and image segmentation.
10. Describe a scenario where reinforcement learning is more suitable than supervised
or unsupervised learning.
Answer: Reinforcement learning is ideal when an agent must make a sequence of
decisions and the only feedback is a reward, often delayed, rather than a labeled
correct answer for each step, as in game-playing or robotics. For instance, an AI agent
in a chess game learns strategies by maximizing rewards (winning) through trial and
error.
