machine_learning_units_1_to_5_bolded_questions

The document provides a comprehensive overview of Machine Learning, covering key concepts, types, and techniques across five units. It includes definitions and explanations of various algorithms, evaluation metrics, and challenges in the field. The content serves as a study guide with questions and answers for understanding fundamental and advanced topics in Machine Learning.

Uploaded by

Sony Yadav
Copyright
© All Rights Reserved

Machine Learning - Questions and Answers for Units 1 to 5 (2-Mark Questions and Answers)

Unit 1: Introduction to Machine Learning

1. What is Machine Learning?


Answer: Machine Learning is a field of artificial intelligence that allows
machines to learn from data, identify patterns, and make decisions without
being explicitly programmed.

2. What are the types of Machine Learning?


Answer: The three main types are Supervised Learning, Unsupervised
Learning, and Reinforcement Learning.

3. What is Supervised Learning?


Answer: Supervised Learning is a type of machine learning where the model
is trained on labeled data, meaning the correct output is known for each
training example.

4. What is Unsupervised Learning?


Answer: Unsupervised Learning is a type of machine learning where the
model is trained on unlabeled data, aiming to find hidden patterns or
intrinsic structures in the data.

5. What is Reinforcement Learning?


Answer: Reinforcement Learning is a type of machine learning where an
agent learns to make decisions by interacting with an environment and
receiving feedback based on its actions.

6. What is the primary goal of Machine Learning?


Answer: The primary goal is to enable computers to automatically improve
their performance from experience (data) without human intervention.

7. What is the difference between Artificial Intelligence (AI) and Machine Learning (ML)?
Answer: AI refers to the broader concept of machines performing tasks that
require human intelligence, while ML is a subset of AI that focuses on the
ability of machines to learn from data.

8. What are the key challenges in Machine Learning?


Answer: Some challenges include data quality, overfitting, underfitting,
bias, and interpretability of models.

9. What is Data Preprocessing in Machine Learning?


Answer: Data preprocessing involves cleaning and preparing raw data into a
format suitable for model training, including tasks like handling missing
values, normalization, and encoding categorical features.
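As an illustration, the three tasks named above can be sketched in plain Python. The helper names (impute_mean, min_max_scale, one_hot) are hypothetical, and mean imputation and min-max scaling are only one common choice for each step.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Normalize values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted set of categories."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


ages = min_max_scale(impute_mean([1.0, None, 3.0]))
colors = one_hot(["red", "blue", "red"])
```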

10. What are the key steps in a typical Machine Learning project?
Answer: Key steps include problem definition, data collection, data
preprocessing, model selection, training, evaluation, and deployment.

11. What is Feature Engineering?


Answer: Feature Engineering is the process of selecting, modifying, or
creating new features from raw data to improve model performance.

12. What is Model Evaluation in Machine Learning?


Answer: Model evaluation refers to assessing the performance of a machine
learning model using metrics such as accuracy, precision, recall, and
F1-score, often read off a confusion matrix.

13. What is Overfitting in Machine Learning?


Answer: Overfitting occurs when a model learns the details and noise in the
training data to the extent that it negatively impacts its performance on new
data.

14. What is Underfitting in Machine Learning?


Answer: Underfitting occurs when a model is too simple to capture the
underlying patterns in the data, leading to poor performance on both training
and testing data.

15. What is Cross-validation in Machine Learning?


Answer: Cross-validation is a technique used to assess the performance of a
machine learning model by splitting the data into multiple subsets, training
the model on some subsets and testing it on others.
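A minimal sketch of the splitting step, assuming k-fold cross-validation without shuffling. The function name k_fold_indices is hypothetical; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one test fold, so every data point is used for both training and evaluation across the k rounds.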

16. What is the difference between a Training Set and a Test Set?
Answer: A training set is used to train the model, while a test set is used to
evaluate the performance of the trained model on unseen data.

17. What is a Hyperparameter in Machine Learning?


Answer: Hyperparameters are parameters whose values are set before the
learning process begins. These include learning rate, batch size, and the
number of hidden layers in a neural network.

18. What is the purpose of the Loss Function in Machine Learning?


Answer: The loss function quantifies the difference between the predicted
output and the actual output. It is used to guide the optimization process
during model training.

19. What is Regularization in Machine Learning?


Answer: Regularization is a technique used to reduce the risk of overfitting
by adding a penalty term to the loss function based on the complexity of the
model.
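For illustration, here is a sketch of a loss function with a penalty term added. It assumes mean squared error as the loss and an L2 (ridge) penalty, which is only one possible choice. The names ridge_loss and lam are hypothetical.

```python
def mse_loss(y_true, y_pred):
    """Mean squared error between actual and predicted outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ridge_loss(y_true, y_pred, weights, lam):
    """MSE plus an L2 penalty; lam controls how strongly large weights are punished."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse_loss(y_true, y_pred) + penalty


plain = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
penalized = ridge_loss([1.0, 2.0], [1.0, 2.0], weights=[3.0, 4.0], lam=0.1)
```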

20. What is Feature Selection?


Answer: Feature Selection is the process of selecting the most relevant
features to use in the model to improve its performance and reduce
overfitting.

Unit 2: Supervised Learning

1. What is Linear Regression?


Answer: Linear Regression is a supervised learning algorithm used to model
the relationship between a dependent variable and one or more independent
variables by fitting a linear equation.
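A minimal sketch for the case of a single independent variable, using the closed-form least-squares solution. The function name is hypothetical and the data is made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1.
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```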

2. What is Logistic Regression?


Answer: Logistic Regression is a classification algorithm that predicts the
probability of a binary outcome by modeling the relationship between input
features and the probability using the sigmoid function.

3. What is the difference between Linear and Logistic Regression?


Answer: Linear Regression predicts continuous values, while Logistic
Regression predicts probabilities for binary outcomes.

4. What is the use of the Sigmoid Function in Logistic Regression?


Answer: The Sigmoid function is used in Logistic Regression to map the
output of a linear equation to a probability value between 0 and 1.
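As a sketch, the mapping from a linear score to a probability can be written as follows. The names predict_proba, weights, and bias are assumptions made for the example. The two branches in sigmoid only avoid floating-point overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


p = predict_proba([1.0, -1.0], 0.0, [2.0, 2.0])  # linear score is 0
```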

5. What is a Decision Tree?


Answer: A Decision Tree is a model that makes decisions based on splitting
data into subsets using feature values, based on criteria like Gini Index or
Information Gain.

6. What is Information Gain?


Answer: Information Gain is a measure used to determine the best feature
for splitting data at each node in a Decision Tree, based on the reduction of
entropy.
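To make the reduction of entropy concrete, here is a small sketch. The function names are hypothetical. Entropy is measured in bits (log base 2), and the gain is the parent's entropy minus the size-weighted entropy of the child subsets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted


parent = ["a", "a", "b", "b"]
perfect_split = information_gain(parent, [["a", "a"], ["b", "b"]])
useless_split = information_gain(parent, [["a", "b"], ["a", "b"]])
```

A split that separates the classes perfectly recovers all of the parent's entropy, while a split that leaves each subset as mixed as the parent gains nothing.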

7. What is a Random Forest?


Answer: Random Forest is an ensemble learning method that builds
multiple decision trees and merges them to improve accuracy and reduce
overfitting.

8. What is Bagging?
Answer: Bagging (Bootstrap Aggregating) is an ensemble method that trains
multiple models on bootstrap samples of the data (random subsets drawn with
replacement) and combines their predictions by averaging or voting, mainly
to reduce variance.

9. What is Boosting?
Answer: Boosting is an ensemble method that builds models sequentially,
with each new model focusing on correcting the errors of the previous one.

10. What is Support Vector Machine (SVM)?


Answer: SVM is a supervised learning algorithm used mainly for
classification (and also for regression). It finds the hyperplane that
separates the classes with the largest possible margin.

11. What is the Kernel Trick in SVM?


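Answer: The Kernel Trick lets an SVM operate as if the input data had been
mapped to a higher-dimensional space, by computing inner products in that
space through a kernel function instead of performing the mapping explicitly.
This makes it possible to find a linear separating hyperplane for data that
is not linearly separable in the original space.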

12. What is K-Nearest Neighbors (K-NN)?


Answer: K-NN is a supervised learning algorithm that classifies data points
based on the majority class of the K nearest neighbors.
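A minimal sketch of the majority vote, assuming Euclidean distance and made-up 2D points. The function name knn_predict is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]


points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["red", "red", "blue", "blue"]
near_origin = knn_predict(points, labels, (1, 0), k=3)
near_corner = knn_predict(points, labels, (5, 4), k=3)
```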

13. What is Naive Bayes?


Answer: Naive Bayes is a probabilistic classifier based on Bayes' theorem.
It assumes the features are conditionally independent given the class and
calculates the probability of each class from the input features.

14. What is Cross-validation?


Answer: Cross-validation is a technique used to assess the performance of a
machine learning model by splitting the data into multiple subsets, training
the model on some subsets and testing it on others.

15. What is the difference between classification and regression?


Answer: Classification is used for predicting categorical outcomes, while
regression is used for predicting continuous numerical values.

16. What is Precision in classification?


Answer: Precision is a metric that measures the accuracy of positive
predictions made by the model, calculated as the number of true positives
divided by the sum of true positives and false positives.

17. What is Recall in classification?


Answer: Recall is a metric that measures the ability of the model to identify
all relevant positive instances, calculated as the number of true positives
divided by the sum of true positives and false negatives.

18. What is the F1-Score?


Answer: The F1-Score is the harmonic mean of Precision and Recall,
providing a balance between them, particularly when there is an uneven
class distribution.
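The three formulas for Precision, Recall, and F1-Score can be sketched together. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 2 true positives, 1 false positive, 1 false negative.
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```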

19. What is the AUC-ROC curve?


Answer: The ROC curve is a performance measurement for classification
problems. It plots the true positive rate (recall) against the false positive
rate at different classification thresholds. AUC, the area under this curve,
summarizes the model's ability to distinguish between classes.

20. What is Hyperparameter Tuning?


Answer: Hyperparameter tuning is the process of selecting the best set of
hyperparameters to optimize the performance of a machine learning model.

Unit 3: Unsupervised Learning

1. What is Clustering in Machine Learning?


Answer: Clustering is an unsupervised learning technique that groups
similar data points together based on certain features, without any predefined
labels.

2. What is K-Means Clustering?


Answer: K-Means is an iterative clustering algorithm that partitions data
into K clusters by minimizing the variance within each cluster.
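The iteration can be sketched as alternating between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a minimal version under stated assumptions: the initial centroids are passed in by the caller, the number of iterations is fixed, and the function name k_means is hypothetical.

```python
import math


def k_means(points, centroids, n_iters=10):
    """Run a fixed number of K-Means iterations; return (centroids, clusters)."""
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```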

3. What is the Curse of Dimensionality?


Answer: The Curse of Dimensionality refers to the challenges that arise
when working with high-dimensional data, such as increased computational
cost and the risk of overfitting.

4. What is Expectation Maximization (EM)?


Answer: EM is an iterative algorithm used for finding maximum likelihood
estimates of parameters in statistical models, particularly when the data has
missing or unobserved variables.

5. What is Gaussian Mixture Model (GMM)?


Answer: A Gaussian Mixture Model is a probabilistic model that assumes
that data is generated from a mixture of several Gaussian distributions with
unknown parameters.

6. What is Dimensionality Reduction?


Answer: Dimensionality reduction is the process of reducing the number of
input variables in a dataset, often using techniques like PCA, to simplify the
model and improve computational efficiency.

7. What is Principal Component Analysis (PCA)?


Answer: PCA is a technique used to reduce the dimensionality of data by
transforming the original variables into a smaller set of uncorrelated
variables called principal components.

8. How does PCA help in data visualization?


Answer: PCA reduces the dimensionality of high-dimensional data,
enabling it to be visualized in 2D or 3D plots while retaining the essential
features of the data.

9. What is Independent Component Analysis (ICA)?


Answer: ICA is a technique used to separate a multivariate signal into
additive, independent components, commonly used in signal processing and
feature extraction.

10. What is Agglomerative Clustering?


Answer: Agglomerative Clustering is a hierarchical clustering method
where each data point starts as its own cluster, and pairs of clusters are
merged based on a similarity measure.

11. What is DBSCAN (Density-Based Spatial Clustering of Applications with Noise)?
Answer: DBSCAN is a density-based clustering algorithm that groups
together closely packed data points and marks points in low-density regions
as outliers.

12. What is t-SNE (t-distributed Stochastic Neighbor Embedding)?


Answer: t-SNE is a dimensionality reduction technique that is particularly
good for visualizing high-dimensional data by minimizing the divergence
between probability distributions in high and low dimensions.

13. What is Hierarchical Clustering?
Answer: Hierarchical Clustering is a clustering method that builds a tree of
nested clusters (a dendrogram), either by repeatedly merging smaller clusters
or by repeatedly splitting larger ones.

14. What is the difference between K-Means and DBSCAN?


Answer: K-Means requires the number of clusters (K) to be specified
beforehand and assumes spherical clusters, while DBSCAN can find
arbitrarily shaped clusters and does not require the number of clusters to be
specified.

15. What is the role of distance metrics in clustering?


Answer: Distance metrics, such as Euclidean distance, are used to measure
the similarity between data points. The choice of distance metric can
significantly affect the results of the clustering algorithm.

16. What is the Silhouette Score?


Answer: The Silhouette Score measures how similar each point is to its own
cluster compared to other clusters. A higher score indicates better-defined
clusters.

17. What is the difference between Agglomerative and Divisive Clustering?
Answer: Agglomerative clustering starts with individual points and merges
them into larger clusters, while Divisive clustering starts with all points in
one cluster and recursively splits them.

18. What is the elbow method in K-Means?


Answer: The elbow method is used to determine the optimal number of
clusters (K) by plotting the sum of squared distances between data points
and their cluster centers for a range of K values and looking for the
"elbow" where the rate of decrease slows down.

19. What is the role of the distance metric in K-Means clustering?


Answer: The distance metric in K-Means clustering (typically Euclidean
distance) determines how the algorithm calculates the similarity between
data points and assigns them to clusters.

20. What is the limitation of K-Means clustering?
Answer: K-Means assumes that clusters are spherical and of similar sizes,
which can be a limitation when dealing with clusters of different shapes or
densities.

Unit 4: Probabilistic Graphical Models

1. What are Probabilistic Graphical Models (PGMs)?


Answer: PGMs are a family of models that represent complex distributions
through graphs, where nodes represent variables, and edges represent
probabilistic dependencies between them.

2. What is a Directed Graphical Model?


Answer: A Directed Graphical Model is a probabilistic model where the
edges represent conditional dependencies and have a direction, often used in
Bayesian Networks.

3. What is a Bayesian Network?


Answer: A Bayesian Network is a directed acyclic graph (DAG) that
represents a set of variables and their conditional dependencies via
probability distributions.

4. What is the difference between Directed and Undirected Graphical Models?
Answer: Directed graphical models use directed edges to represent
conditional dependencies (which may, but need not, be causal), while
undirected graphical models represent symmetric associations without
implying any direction.

5. What are Markov Networks?


Answer: Markov Networks are undirected graphical models where nodes
represent random variables, and edges represent dependencies between these
variables.

6. What is Naive Bayes?


Answer: Naive Bayes is a probabilistic classifier based on Bayes' theorem.
It assumes the features are conditionally independent given the class and
calculates the probability of each class from the input features.

7. What is Conditional Independence in PGMs?


Answer: Conditional independence in PGMs means that two variables are
independent of each other once the value of a third variable (or set of
variables) is known.

8. What is the role of the 'prior' in Bayesian Networks?


Answer: The prior represents the initial belief or knowledge about the
variables in a Bayesian Network before observing any evidence.

9. What is the role of 'likelihood' in Bayesian Networks?


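Answer: The likelihood is the probability of the observed evidence given
particular values of the variables or parameters. Combined with the prior
through Bayes' theorem, it gives the posterior, which is the quantity that
is updated as new evidence is obtained.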
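To see how the prior and the likelihood work together, here is a small sketch of Bayes' theorem over a discrete set of hypotheses. The function name posterior and the numbers (a 1% prior and a test that is positive for 95% of disease cases and 5% of healthy cases) are made up for illustration.

```python
def posterior(prior, likelihood):
    """Combine P(h) and P(evidence | h) into P(h | evidence) for each hypothesis h."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(evidence), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}


prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}  # P(positive test | h)
post = posterior(prior, likelihood)
```

Even with a fairly accurate test, the posterior probability of disease stays well below one half here, because the prior is so small.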

10. What is a Hidden Markov Model (HMM)?


Answer: An HMM is a statistical model where the system being modeled is
assumed to be a Markov process with hidden states. It is widely used in
speech recognition and time-series analysis.

11. What is Markov Chain?


Answer: A Markov Chain is a sequence of random variables where the
probability of each state depends only on the previous state, with no memory
of earlier states.
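A minimal sketch of this property: the distribution over states at the next step is computed from the current distribution and a transition matrix alone. The two-state matrix below is made up for the example, and the function name is hypothetical.

```python
def step_distribution(dist, transition):
    """Advance a state distribution by one step; transition[i][j] = P(next=j | current=i)."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


transition = [
    [0.9, 0.1],  # from state 0
    [0.5, 0.5],  # from state 1
]
start = [1.0, 0.0]  # begin in state 0 with certainty
after_one = step_distribution(start, transition)
after_two = step_distribution(after_one, transition)
```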

12. What is Maximum Likelihood Estimation (MLE) in PGMs?


Answer: MLE is used to estimate the parameters of a model by maximizing
the likelihood of the observed data under the model.

13. What is the Inference Problem in PGMs?


Answer: The Inference Problem involves computing the posterior
distribution of a set of variables given observed evidence in a probabilistic
graphical model.

14. What is the difference between Naive Bayes and Bayesian Networks?
Answer: Naive Bayes assumes conditional independence of features given
the class, while Bayesian Networks can encode any dependency structure
among the variables, as long as the graph is acyclic.

15. What is a Conditional Probability Table (CPT)?


Answer: A CPT is a table used in Bayesian Networks that represents the
conditional probability of a variable given its parents in the graph.

16. What is a Junction Tree Algorithm in PGMs?


Answer: The Junction Tree Algorithm is a method used for exact inference
in probabilistic graphical models by transforming the model into a tree
structure to simplify calculations.

17. What is the role of the evidence in Bayesian Networks?


Answer: Evidence in Bayesian Networks refers to observed data that is used
to update the probability distribution of the remaining unobserved variables.

18. What are the advantages of Bayesian Networks?


Answer: Bayesian Networks allow the modeling of complex probabilistic
relationships and the incorporation of prior knowledge, making them useful
for uncertain or incomplete data.

19. What is a probabilistic inference?


Answer: Probabilistic inference is the process of computing the probability
distribution of certain variables in a model, given observed evidence.

20. What is the difference between generative and discriminative models in PGMs?
Answer: Generative models model the joint probability distribution of inputs
and outputs, P(X, Y), while discriminative models model the conditional
probability of the output given the input, P(Y | X).

Unit 5: Advanced Learning

1. What is Monte Carlo Sampling?


Answer: Monte Carlo Sampling is a method used to approximate complex
integrals or distributions by generating random samples and using statistical
analysis to estimate the results.
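A classic illustration is estimating pi from random points in the unit square: the fraction that lands inside the quarter circle approximates pi / 4. The function name and the fixed seed are assumptions made so the example is reproducible.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples) if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


approx_pi = estimate_pi(100_000)
```

The estimate gets more accurate as the number of samples grows, with the error shrinking roughly in proportion to one over the square root of the sample count.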

2. What is the significance of Reinforcement Learning?


Answer: Reinforcement Learning is significant because it allows an agent to
learn optimal strategies or policies by interacting with an environment and
receiving feedback, which can be applied to areas like robotics and game AI.

3. What is the concept of an agent in Reinforcement Learning?


Answer: In Reinforcement Learning, an agent is an entity that interacts with
the environment and makes decisions to maximize cumulative rewards based
on the state it observes.

4. What is the reward function in Reinforcement Learning?


Answer: The reward function defines the feedback received by the agent
after performing an action in the environment, guiding it towards desirable
behaviors.

5. What is the exploration-exploitation trade-off in Reinforcement Learning?
Answer: The exploration-exploitation trade-off involves balancing between
exploring new actions to gather more information (exploration) and
choosing known actions that maximize the reward (exploitation).
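One common way to balance the two is the epsilon-greedy rule, which is not named in the answer above and is shown here only as an example strategy. With probability epsilon the agent picks a random action, otherwise it picks the action with the highest estimated value. The function and parameter names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: random with probability epsilon, otherwise the best known."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
random_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
```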

6. What is Q-Learning?
Answer: Q-Learning is a reinforcement learning algorithm that learns an
action-value function, which gives the expected utility of taking a particular
action in a given state.
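A minimal sketch of the tabular Q-Learning update, which moves the stored value towards the reward plus the discounted best value of the next state. The names q_update, alpha (learning rate), and gamma (discount factor), and the toy states and actions, are assumptions made for this example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update to the table q and return the new value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # all action values start at 0.0
actions = ["left", "right"]
first = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
second = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```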

7. What is Deep Q-Learning?


Answer: Deep Q-Learning combines Q-Learning with deep neural
networks, where the Q-table is replaced by a neural network to approximate
the Q-values for large state spaces.

8. What is the Bellman equation in Reinforcement Learning?


Answer: The Bellman equation is a recursive equation used to calculate the
value of a state or action, which helps in optimal policy determination in
Reinforcement Learning.

9. What is Temporal Difference (TD) Learning?


Answer: TD Learning is a reinforcement learning method that combines the
benefits of Monte Carlo methods and Dynamic Programming by updating
estimates based on other learned estimates.

10. What is the role of the discount factor in Reinforcement Learning?


Answer: The discount factor determines the importance of future rewards
compared to immediate rewards. A value close to 1 means future rewards
are highly valued, while a value close to 0 means immediate rewards are
prioritized.
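The effect of the discount factor can be seen by computing the discounted return of a short reward sequence. The function name and the rewards are made up for the example.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]
short_sighted = discounted_return(rewards, gamma=0.0)
balanced = discounted_return(rewards, gamma=0.5)
far_sighted = discounted_return(rewards, gamma=1.0)
```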

11. What is the concept of policy in Reinforcement Learning?


Answer: A policy in Reinforcement Learning is a strategy or function that
dictates the actions an agent should take in each state to maximize
cumulative reward.

12. What is the difference between model-based and model-free reinforcement learning?
Answer: Model-based reinforcement learning involves building a model of
the environment, while model-free reinforcement learning learns directly
from interactions with the environment without modeling it.

13. What is Supervised Fine-Tuning?


Answer: Supervised fine-tuning involves adjusting a pre-trained model’s
parameters on a specific task using labeled data to improve performance on
the new task.

14. What is an optimizer in deep learning?


Answer: An optimizer is an algorithm used to minimize the loss function by
adjusting the weights of the model during training, such as Gradient Descent
or Adam.

15. What is dropout in deep learning?


Answer: Dropout is a regularization technique used in neural networks to
prevent overfitting by randomly setting a fraction of a layer's units
(activations) to zero during training.
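As a sketch, dropout on a list of activations might look like this. It assumes the "inverted dropout" convention, where surviving values are scaled up by 1 / (1 - rate) so that their expected value is unchanged. That scaling is an addition not stated in the answer above. The function name is hypothetical and rate must be less than 1.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability rate during training; scale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
train_out = dropout([1.0] * 1000, rate=0.5, rng=rng)
eval_out = dropout([1.0, 2.0], rate=0.5, rng=rng, training=False)
```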

16. What is Deep Learning?


Answer: Deep Learning involves neural networks with many layers,
enabling automatic feature extraction from data, often used in computer
vision, NLP, and speech recognition.

17. What is the role of activation functions in neural networks?


Answer: Activation functions introduce non-linearity into the model,
allowing it to learn complex relationships in the data.

18. What is a Convolutional Neural Network (CNN)?


Answer: A CNN is a deep learning architecture primarily used for image
classification and computer vision tasks, involving layers like convolution,
pooling, and fully connected layers.

19. What is a Recurrent Neural Network (RNN)?


Answer: RNNs are designed for sequential data like time series or text.
They have loops in their architecture, allowing information to persist from
previous time steps.

20. What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
Answer: Batch Gradient Descent computes the gradient using the entire
dataset, while Stochastic Gradient Descent computes the gradient based on a
single sample.
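The difference comes down to which samples are passed to the gradient computation at each step. Below is a sketch for a one-parameter model y = w * x with mean squared error. The data, the learning rate of 0.05, and the step counts are made up for the example.

```python
import random


def gradient(w, samples):
    """Gradient of mean squared error for the model y = w * x over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x

# Batch Gradient Descent: every step uses the entire dataset.
w_batch = 0.0
for _ in range(100):
    w_batch -= 0.05 * gradient(w_batch, data)

# Stochastic Gradient Descent: every step uses one randomly chosen sample.
w_sgd = 0.0
rng = random.Random(0)
for _ in range(300):
    w_sgd -= 0.05 * gradient(w_sgd, [rng.choice(data)])
```

Both loops approach w = 2 on this noiseless data. On real data the stochastic updates are cheaper per step but noisier.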
Here is a curated list of the must-learn questions and their answers from the
5 units based on the key concepts of Machine Learning. These questions
cover the fundamental aspects you should focus on:

Unit 1: Introduction to Machine Learning

1. What is Machine Learning?


Answer: Machine Learning is a field of artificial intelligence that allows
machines to learn from data, identify patterns, and make decisions without
being explicitly programmed.

2. What are the types of Machine Learning?


Answer: The three main types are Supervised Learning, Unsupervised
Learning, and Reinforcement Learning.

3. What is Supervised Learning?


Answer: Supervised Learning is a type of machine learning where the model
is trained on labeled data, meaning the correct output is known for each
training example.

4. What is the primary goal of Machine Learning?


Answer: The primary goal is to enable computers to automatically improve
their performance from experience (data) without human intervention.

5. What is Overfitting in Machine Learning?


Answer: Overfitting occurs when a model learns the details and noise in the
training data to the extent that it negatively impacts its performance on new
data.

Unit 2: Supervised Learning

1. What is Linear Regression?


Answer: Linear Regression is a supervised learning algorithm used to model
the relationship between a dependent variable and one or more independent
variables by fitting a linear equation.

2. What is Logistic Regression?


Answer: Logistic Regression is a classification algorithm that predicts the
probability of a binary outcome by modeling the relationship between input
features and the probability using the sigmoid function.

3. What is a Decision Tree?


Answer: A Decision Tree is a model that makes decisions based on splitting
data into subsets using feature values, based on criteria like Gini Index or
Information Gain.

4. What is Random Forest?


Answer: Random Forest is an ensemble learning method that builds
multiple decision trees and merges them to improve accuracy and reduce
overfitting.

5. What is the difference between Classification and Regression?


Answer: Classification is used for predicting categorical outcomes, while
regression is used for predicting continuous numerical values.

Unit 3: Unsupervised Learning

1. What is Clustering in Machine Learning?


Answer: Clustering is an unsupervised learning technique that groups
similar data points together based on certain features, without any predefined
labels.

2. What is K-Means Clustering?


Answer: K-Means is an iterative clustering algorithm that partitions data
into K clusters by minimizing the variance within each cluster.

3. What is Dimensionality Reduction?


Answer: Dimensionality reduction is the process of reducing the number of
input variables in a dataset, often using techniques like PCA, to simplify the
model and improve computational efficiency.

4. What is Principal Component Analysis (PCA)?


Answer: PCA is a technique used to reduce the dimensionality of data by
transforming the original variables into a smaller set of uncorrelated
variables called principal components.

5. What is the Silhouette Score?


Answer: The Silhouette Score measures how similar each point is to its own
cluster compared to other clusters. A higher score indicates better-defined
clusters.

Unit 4: Probabilistic Graphical Models

1. What are Probabilistic Graphical Models (PGMs)?


Answer: PGMs are a family of models that represent complex distributions
through graphs, where nodes represent variables, and edges represent
probabilistic dependencies between them.

2. What is a Bayesian Network?


Answer: A Bayesian Network is a directed acyclic graph (DAG) that
represents a set of variables and their conditional dependencies via
probability distributions.

3. What is Naive Bayes?


Answer: Naive Bayes is a probabilistic classifier based on Bayes' theorem.
It assumes the features are conditionally independent given the class and
calculates the probability of each class from the input features.

4. What is Conditional Independence in PGMs?


Answer: Conditional independence in PGMs means that two variables are
independent of each other once the value of a third variable (or set of
variables) is known.

5. What is a Hidden Markov Model (HMM)?


Answer: An HMM is a statistical model where the system being modeled is
assumed to be a Markov process with hidden states. It is widely used in
speech recognition and time-series analysis.

Unit 5: Advanced Learning

1. What is Monte Carlo Sampling?


Answer: Monte Carlo Sampling is a method used to approximate complex
integrals or distributions by generating random samples and using statistical
analysis to estimate the results.

2. What is Reinforcement Learning?


Answer: Reinforcement Learning is a type of machine learning where an
agent learns to make decisions by interacting with an environment and
receiving feedback based on its actions.

3. What is the exploration-exploitation trade-off in Reinforcement Learning?
Answer: The exploration-exploitation trade-off involves balancing between
exploring new actions to gather more information (exploration) and
choosing known actions that maximize the reward (exploitation).

4. What is Q-Learning?
Answer: Q-Learning is a reinforcement learning algorithm that learns an
action-value function, which gives the expected utility of taking a particular
action in a given state.

5. What is Deep Q-Learning?


Answer: Deep Q-Learning combines Q-Learning with deep neural
networks, where the Q-table is replaced by a neural network to approximate
the Q-values for large state spaces.
