Exam Preparation- Machine Learning Applications

The document contains a series of questions related to machine learning concepts, including linear regression, logistic regression, decision trees, support vector machines, and clustering techniques. It covers definitions, conceptual understanding, and problem-solving aspects, along with a checklist of important topics in machine learning. The questions range from short definitions to detailed explanations and comparisons of various algorithms and techniques.

Uploaded by

nsahil0454
© All Rights Reserved

1-Mark Questions (Short Answer/Definitions)

1. What is the primary objective of Linear Regression?
2. State one key difference between simple linear regression and multivariate linear
regression.
3. In Logistic Regression, what is the purpose of the sigmoid function?
4. Define 'overfitting' in the context of machine learning.
5. What is the role of regularization in machine learning?
6. Name two common types of regularization techniques.
7. What is the main goal of 'pruning' a decision tree?
8. Give one example of a splitting criterion used in classification decision trees.
9. What is the core idea behind Support Vector Machines (SVMs)?
10. What does 'boosting' aim to achieve in machine learning?
11. Name one common evaluation metric for regression models.
12. What is the difference between supervised and unsupervised learning?
13. What is the primary objective of K-Means clustering?
14. In PCA, what do 'principal components' represent?
15. What is 'hierarchical clustering' primarily used for?
16. Name one technique for handling missing values in a dataset.
17. Why is feature scaling often necessary before training a machine learning model?
18. What is the purpose of a 'test set' in machine learning?
19. What is 'cross-validation' used for?
20. What is a 'decision boundary' in the context of classification?
21. What is the primary goal of a Decision Tree algorithm for classification?
22. Name one common metric used to determine the best split in a classification Decision
Tree.
23. What is a "leaf node" in a Decision Tree?
24. Define 'pruning' in the context of Decision Trees.
25. What is the main problem that pruning aims to solve?
26. What is the main objective of an SVM in a binary classification problem?
27. Define the 'hyperplane' in the context of SVMs.
28. What are 'support vectors'?
29. What is the purpose of the 'kernel trick' in SVMs?
30. Name one common kernel function used in SVMs.
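
Several of the 1-mark questions above (the sigmoid function, decision boundaries) can be grounded with a minimal sketch. The weights and input below are purely illustrative, not from any dataset:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

# Logistic Regression turns a linear score w.x + b into a probability,
# and the decision boundary is the set of points where p = 0.5 (i.e. w.x + b = 0).
w, b = np.array([2.0, -1.0]), 0.5   # illustrative weights and intercept
x = np.array([1.0, 0.5])            # illustrative input
p = sigmoid(w @ x + b)              # P(y = 1 | x)
label = int(p >= 0.5)               # predicted class
```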

5-Mark Questions (Conceptual Understanding/Short Explanation)

1. Explain the concept of 'Least Squares' in the context of Linear Regression.
2. Describe the difference between L1 and L2 regularization. When might you prefer
one over the other?
3. How does Logistic Regression predict a binary outcome, even though it's a regression
algorithm? Explain the role of the sigmoid function.
4. Discuss two practical aspects of implementation that are crucial before training any
machine learning model.
5. Explain the process of how a Decision Tree makes a prediction for a given input.
6. What is 'pruning' in Decision Trees, and why is it important?
7. Briefly explain the concept of a 'hyperplane' and 'margin' in Support Vector
Machines.
8. Compare and contrast 'bagging' and 'boosting' ensemble methods.
9. Describe the K-Means clustering algorithm. What are its main steps?
10. Explain the intuition behind Principal Component Analysis (PCA) as a dimensionality
reduction technique.
11. Discuss the importance of 'feature selection' in developing a machine learning
algorithm.
12. How would you approach 'debugging' a machine learning model that is performing
poorly?
13. What is the 'bias-variance trade-off'? How does it relate to underfitting and
overfitting?
14. Explain the concept of 'one-hot encoding' and why it's used.
15. Outline the basic steps involved in setting up a supervised learning problem.
16. Explain how a Decision Tree makes a prediction for a new, unseen data point.
Illustrate with a simple example.
17. Describe the concept of "Information Gain" or "Gini Impurity" as a splitting criterion
in Decision Trees. How do these metrics help in building the tree?
18. What are the disadvantages of unpruned Decision Trees? Explain how pruning
addresses these issues.
19. Briefly compare and contrast Decision Trees with Linear Regression in terms of their
underlying model assumptions and types of problems they can solve.
20. Explain the difference between 'pre-pruning' (early stopping) and 'post-pruning' in
Decision Trees.
21. Explain the concept of 'margin' in SVMs. Why is maximizing the margin important?
22. How does SVM handle non-linearly separable data? Explain the role of the 'kernel
trick'.
23. Discuss the difference between 'hard margin SVM' and 'soft margin SVM'. When
would you use a soft margin SVM?
24. Explain the intuition behind the 'regularization parameter' (C) in SVMs. How does it
affect the trade-off between maximizing the margin and minimizing classification
errors?
25. Briefly compare SVMs with Logistic Regression for classification tasks.
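
For the Decision Tree questions above (splitting criteria, Information Gain vs. Gini Impurity), a short sketch of Gini impurity and the impurity decrease of a candidate split may help; the labels are illustrative:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# A perfectly balanced parent node has impurity 0.5; a pure split
# drives both children to impurity 0 and recovers the full 0.5 as gain.
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])
gain = gini_gain(parent, left, right)
```

The tree-building algorithm greedily picks, at each node, the split with the largest such gain.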

15-Mark Questions (Detailed Explanation/Problem Solving with Sub-parts)

1. Linear Regression and Diagnostics
o a) Derive the cost function for Simple Linear Regression and briefly explain
its objective. (5 Marks)
o b) Explain the assumptions of Linear Regression. Why is it important to check
these assumptions before interpreting the model results? (5 Marks)
o c) Describe two common evaluation metrics for regression models and explain
what each metric indicates about the model's performance. (5 Marks)
2. Logistic Regression and Regularization
o a) Explain how Logistic Regression can be used for multi-class classification.
(Hint: One-vs-Rest strategy) (5 Marks)
o b) Define L1 and L2 regularization. Explain how each penalizes the model's
coefficients and their respective effects on feature selection and model
complexity. (6 Marks)
o c) Why is regularization particularly important when dealing with high-
dimensional datasets? (4 Marks)
3. Decision Trees and Ensemble Methods
o a) Describe the algorithm for building a Decision Tree for a classification
problem. Include details on how splits are chosen (e.g., using Gini impurity or
Entropy). (7 Marks)
o b) What is the problem of overfitting in Decision Trees? Explain two different
pruning techniques to mitigate this issue. (8 Marks)
4. Support Vector Machines (SVMs)
o a) Explain the core principle of Support Vector Machines, focusing on the
concepts of hyperplane and margin. Illustrate with a simple 2D example. (8
Marks)
o b) How does the 'kernel trick' allow SVMs to handle non-linearly separable
data? Give an example of a common kernel function. (7 Marks)
Alternative: Core Concepts and Kernel Trick
o a) Explain the fundamental principle behind Support Vector Machines for
binary classification. Clearly define and illustrate with a simple 2D example
the concepts of 'hyperplane', 'margin', and 'support vectors'. (8 Marks)
o b) What is the "kernel trick"? Elaborate on how it allows SVMs to effectively
classify data that is not linearly separable in the original feature space. Provide
an example of a commonly used kernel function and describe its effect. (7
Marks)
Alternative: Soft Margin SVM and Hyperparameter Tuning
o a) In real-world datasets, data is often not perfectly linearly separable. Explain
the concept of 'soft margin SVM' and how it addresses this challenge. Discuss
the role of slack variables. (7 Marks)
o b) The performance of an SVM heavily depends on its hyperparameters,
particularly the regularization parameter 'C' and kernel parameters (e.g.,
gamma for RBF kernel). Explain how these parameters influence the SVM
model's behavior and discuss methods for effectively tuning them. (8 Marks)
5. Unsupervised Learning and Practical Considerations
o a) Explain the K-Means clustering algorithm, including the steps involved and
how the clusters are formed. What is a common challenge with K-Means? (7
Marks)
o b) Describe the concept of Principal Component Analysis (PCA) as a
dimensionality reduction technique. How can PCA be beneficial in machine
learning workflows? (8 Marks)
6. Developing Learning Algorithms in Practice
o a) You are tasked with building a machine learning model to predict house
prices. Outline the steps you would take to set up this problem, from data
acquisition to model deployment (excluding specific model details). (8 Marks)
o b) Explain the importance of 'cross-validation' and 'hyperparameter tuning' in
the machine learning development cycle. Give an example of how you might
perform hyperparameter tuning for a Decision Tree. (7 Marks)
7. Debugging and Algorithm Selection
o a) Your machine learning model is performing poorly. Describe a systematic
approach to debugging a learning algorithm, focusing on identifying whether
the issue is related to high bias (underfitting) or high variance (overfitting). (8
Marks)
o b) Given a new machine learning task, discuss the factors you would consider
when choosing from multiple algorithms (e.g., Linear Regression, Logistic
Regression, Decision Trees, SVMs, etc.). (7 Marks)
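
Question 6b asks about cross-validation; the index bookkeeping behind k-fold cross-validation can be sketched in a few lines (a minimal version using only NumPy; the fold count and seed are arbitrary):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)   # shuffle once, then partition
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Across the k iterations, every sample appears in exactly one validation fold.
splits = list(kfold_indices(20, k=5))
```

For hyperparameter tuning (e.g., a Decision Tree's maximum depth), one would train a model on each `train` split for each candidate depth and average the validation scores.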

Checklist

1. Different applications of machine learning

2. K-means clustering (important)

3. What is PCA and its use in dimensionality reduction, compared with t-SNE

4. Explain different clustering techniques

5. Logistic regression vs. linear regression (differences), derivation of the cost function, its
importance, and its uses.

6. Boosting vs. bagging: the differences between them, the main boosting techniques (AdaBoost,
Gradient Boosting, XGBoost), and the common bagging variants.

7. Supervised vs Unsupervised learning.

8. Machine learning metrics such as accuracy, precision, recall, F1-Score, TP, FP, TN, FN.

9. Confusion Matrix.

10. General concepts of machine learning, concept learning, PAC.
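
Checklist items 8 and 9 (metrics and the confusion matrix) reduce to a few formulas over the four cell counts; the counts below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Hypothetical confusion-matrix counts: true/false positives and negatives.
tp, fp, tn, fn = 40, 10, 45, 5

accuracy  = (tp + tn) / (tp + fp + tn + fn)   # fraction of all predictions correct
precision = tp / (tp + fp)                    # of predicted positives, how many were right
recall    = tp / (tp + fn)                    # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Note that accuracy alone can be misleading on imbalanced data, which is why precision, recall, and F1 are asked about separately.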
