MACHINE LEARNING QUESTION BANK

MOHAMED YOUSUFF (AP/CSE, MITS)

Contents
1 UNIT-I MACHINE LEARNING PRELIMINARIES
1.1 1 Mark Questions
1.2 10 Mark Questions

2 UNIT II UNSUPERVISED LEARNING
2.1 1 Mark Questions
2.2 10 Mark Questions

3 UNIT III SUPERVISED LEARNING
3.1 1 Mark Questions
3.2 10 Mark Questions

4 UNIT III SUPERVISED LEARNING
4.1 1 Mark Questions
4.2 10 Mark Questions

5 UNIT V ADVANCED TOPICS IN MACHINE LEARNING
5.1 1 Mark Questions
5.2 10 Mark Questions


1 UNIT-I MACHINE LEARNING PRELIMINARIES


1.1 1 Mark Questions
1. How is the addition of feature vectors u = [1, 2, 3] and v = [4, 5, 6] used in machine
learning?
Answer: The result is [5, 7, 9]. In machine learning, this can represent combining feature
values from two different data points. For instance, in a recommendation system, it might
combine the attributes of two users or two items.
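2. Show how matrix multiplication can be applied when multiplying a feature matrix
$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ by a weight matrix $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$ in a neural network layer.
Answer: Matrix multiplication involves multiplying the rows of matrix A by the columns of
matrix B. For this example:

$$A \times B = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$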

In machine learning, this operation could represent passing a set of features through a layer
of a neural network by applying learned weights.
3. How do you scale the feature vector v = [1, 2, 3] by 4, and why is scaling important in
machine learning?
Answer: To scale the vector, multiply each element by 4: [4, 8, 12]. In machine learning,
scaling vectors is common when adjusting the magnitude of features, such as in feature
normalization or standardization.

4. What is the key limitation of traditional AI compared to machine learning?


Answer: Traditional AI is limited by its reliance on manually created rules, making it
unsuitable for dynamic or complex data environments where exhaustive programming would
be impractical.

5. Define a random variable and explain its use in machine learning.


Answer: A random variable is a variable that represents outcomes of a random phenomenon.
In machine learning, random variables can represent data points or features in classification
and regression problems.
6. What is a probability distribution, and how is it applied in machine learning?
Answer: A probability distribution defines how probabilities are distributed over the values
of a random variable. It is used in machine learning to model uncertainties and predict
outcomes.
7. How does the probability density function (PDF) differ from the probability mass
function (PMF)?

Answer: The PDF applies to continuous random variables and describes the relative likelihood
of the variable taking on a given value, while the PMF applies to discrete random variables.
Answer: Descriptive statistics summarize data using measures such as the mean and standard
deviation. These help in understanding data distributions.

8. Explain the concept of skewness in a dataset and its importance.


Answer: Skewness measures the asymmetry of a data distribution. Positive skewness means
a long right tail, while negative skewness means a long left tail, affecting how a model fits
the data.
9. What is the purpose of inferential statistics in machine learning?
Answer: Inferential statistics allow us to draw conclusions about a population based on a
sample, which is critical for judging how well machine learning models generalize.
10. What is the significance of the correlation coefficient in machine learning?
Answer: The correlation coefficient measures the strength and direction of the linear
relationship between two variables, helping in feature selection.
11. What are labeled and unlabeled samples in machine learning?
Answer: Labeled samples are data points that include both input features and the
corresponding target values. Unlabeled samples only contain input features without output labels.
Supervised learning uses labeled data, while unsupervised learning uses unlabeled data.
12. Explain multiclass classification.
Answer: Multiclass classification is a type of classification problem where the goal is to
assign data points to one of three or more classes. An example is classifying an iris flower as
Setosa, Versicolor, or Virginica in the Iris dataset.
13. What is regression in machine learning?
Answer: Regression is a supervised learning task that predicts a continuous output (numerical
value) based on input features. Unlike classification, which predicts categories, regression
predicts values such as house prices or temperatures.
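
The vector and matrix arithmetic in questions 1 to 3 can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names (add_vectors, scale_vector, matmul) are illustrative and not part of the question bank, and a real project would more likely use a numerical library.

```python
def add_vectors(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def scale_vector(v, c):
    """Multiply every element of v by the scalar c."""
    return [c * x for x in v]


def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


print(add_vectors([1, 2, 3], [4, 5, 6]))            # [5, 7, 9]
print(scale_vector([1, 2, 3], 4))                   # [4, 8, 12]
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```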

1.2 10 Mark Questions


1. Apply the concepts of linear algebra in the context of machine learning and explain their
usage with relevant examples and applications.
2. Apply the concepts of statistics in the context of machine learning and explain their usage
with relevant examples and applications.
3. Apply the concepts of probability in the context of machine learning and explain their usage
with relevant examples and applications.
4. Explain in detail the various types and sub-categories of machine learning, providing
appropriate examples, applications and illustrations.

5. Explain the different types of data distributions, providing appropriate examples and graphs,
and illustrate the machine learning techniques most suitable for each distribution.
6. Develop code to visualize both noise and outliers in the data, and explain the types, effects,
and strategies for handling them.
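
As a starting point for question 6, one common way to flag outliers is the interquartile-range (IQR) rule. The sketch below is an assumption-laden illustration, not a model answer: the rule, the multiplier 1.5, the function name iqr_outliers, and the sample readings are all choices made here, and plotting is left out so the example runs with the standard library alone.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings: mostly close to 11, with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(readings))  # [95]
```

The flagged values could then be highlighted in a box plot or scatter plot, removed, or capped, depending on the strategy being discussed.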

2 UNIT II UNSUPERVISED LEARNING


2.1 1 Mark Questions
1. What is the primary objective of the K-means algorithm?
Answer: The primary objective of K-means is to minimize the sum of squared distances
between data points and the centroid of the assigned cluster.
2. Why is it important to update centroids in K-means clustering?
Answer: Updating centroids is important because it refines the cluster positions to better fit
the data points.
3. Find the key difference between K-means and K-medoids clustering methods.
Answer: The key difference is that K-means uses centroids (average points), while K-medoids
uses actual data points (medoids) as cluster centers.

4. What is the significance of membership values in Fuzzy C-means?


Answer: Membership values represent the degree to which a data point belongs to each
cluster, allowing for overlapping clusters.

5. Select the type of data that Fuzzy C-means is particularly useful for.
Answer: Fuzzy C-means is useful for data where boundaries between clusters are not clearly
defined.

6. Label the two key parameters in DBSCAN.


Answer: The two key parameters in DBSCAN are epsilon (the neighborhood radius) and
minPts (the minimum number of points within that radius for a point to count as a core point).

7. Why is DBSCAN effective at handling outliers?


Answer: DBSCAN is effective at handling outliers because it labels points as noise if they
are not part of any dense region.

8. Why is hierarchical clustering used for small datasets?


Answer: Hierarchical clustering is computationally expensive and more suitable for small
datasets, where the hierarchy of clusters can be visualized easily.
9. Why is anomaly detection important in real-world applications?
Answer: Anomaly detection is important for identifying rare but critical events, such as
security breaches, system failures, or fraud, that could have severe consequences.


10. How does a Self-Organizing Map learn from data?


Answer: A SOM learns by updating the weights of the nodes in the grid based on the
distance to the input data points, with neighboring nodes also updated to preserve topological
relationships.

11. What is the A-priori property?


Answer: The A-priori property states that if an itemset is frequent, all of its subsets must also
be frequent.

12. What is the key difference between FP-Growth and the A-priori algorithm?
Answer: The key difference is that FP-Growth does not generate candidate itemsets and
instead uses an FP-tree to mine frequent patterns directly, making it more efficient than
A-priori.
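
The assignment and centroid-update steps behind questions 1 and 2 can be sketched in plain Python. The six points, the initial centroids, and the function names below are hypothetical, chosen only to make the loop easy to follow; the sketch also reports the sum of squared distances that K-means tries to minimize.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iters=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, [(1, 1), (8, 8)])
sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids, round(sse, 3))
```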

2.2 10 Mark Questions


1. Identify the appropriate final cluster centroids and data point assignments using the K-Means
clustering algorithm for the given dataset. Compute all necessary steps with k = 2, and list
the advantages, drawbacks, and applications of the algorithm.

Datapoints Feature 1 Feature 2


D1 - -
D2 - -
D3 - -
D4 - -
D5 - -
D6 - -

2. Utilize the DBSCAN clustering algorithm’s mathematical formulations to determine the
core, border, noise data points, and the final cluster formation for the given dataset. Compute
all necessary steps with ε = 2 and minPts = 3, and list the advantages, drawbacks, and
applications of the algorithm.

Datapoints Feature 1 Feature 2


D1 - -
D2 - -
D3 - -
D4 - -
D5 - -
D6 - -

3. Apply the Agglomerative Hierarchical Clustering (AHC) algorithm using Ward’s linkage
to cluster the given dataset. Compute the necessary steps, including proximity matrices,
and depict the final dendrogram. List the advantages, drawbacks, and applications of the
algorithm.


Datapoints Feature 1 Feature 2


D1 - -
D2 - -
D3 - -
D4 - -

4. Apply the K-Medoids clustering algorithm to determine the appropriate final medoids and
data point assignments for the given dataset. Compute all necessary steps with K = 2, and
outline the advantages, drawbacks, and applications of the algorithm.

Datapoints Feature 1 Feature 2


D1 - -
D2 - -
D3 - -
D4 - -
D5 - -
D6 - -

5. Utilize the Fuzzy C-Means algorithm to determine the final membership of each data point to
all clusters for the given dataset. Compute all necessary steps with k = 2 and a fuzzification
parameter of 2. List the advantages, drawbacks, and applications of the algorithm.

Datapoints Feature 1 Feature 2


D1 - -
D2 - -
D3 - -
D4 - -
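
For the Fuzzy C-Means question, the membership step can be sketched as follows, assuming the standard update u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) with fuzzification parameter m = 2. The point and cluster centers used here are hypothetical, since the dataset in the question is left blank, and the function name fcm_memberships is illustrative.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster for fixed cluster centers."""
    distances = [math.dist(point, c) for c in centers]
    # A point that coincides with a center is assigned fully to that center.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical 1-D example: the point 2.0 with centers at 1.0 and 4.0.
u = fcm_memberships((2.0,), [(1.0,), (4.0,)])
print([round(x, 3) for x in u])  # [0.8, 0.2]
```

The memberships sum to 1, and the nearer center receives the larger share. A full run would alternate this step with recomputing the centers as membership-weighted means.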

3 UNIT III SUPERVISED LEARNING


3.1 1 Mark Questions
1. Define regression and classification.
Answer: Regression predicts continuous values, while classification predicts discrete
categories.
2. What does a correlation coefficient of 1 indicate?
Answer: A correlation coefficient of 1 indicates a perfect positive linear relationship between
two variables.
3. How are eigenvalues used in regression?
Answer: Eigenvalues help analyze multicollinearity in regression by examining the condition
number of the feature matrix.


4. What is the objective of linear regression?


Answer: The objective of linear regression is to minimize the sum of squared differences
between observed and predicted values.
5. When is multivariate linear regression used?
Answer: Multivariate linear regression is used when there are multiple input (independent)
variables.
6. Why is the penalty term important in ridge regression?
Answer: The penalty term in ridge regression reduces overfitting by constraining the model
coefficients.
7. What is the role of the prior in Bayesian linear regression?
Answer: The prior incorporates prior knowledge or assumptions about the model parameters.
8. What is the difference between prior and posterior in Bayes’ theorem?
Answer: The prior is the initial belief about a parameter, while the posterior is the updated
belief after considering evidence.
9. Why is Naïve Bayes called "naïve"?
Answer: It assumes that all features are conditionally independent given the class, which
is often not true in practice.
10. Where is Naïve Bayes commonly used?
Answer: Naïve Bayes is commonly used for text classification tasks, such as spam detection.
11. How does Maximum Likelihood Estimation work?
Answer: MLE aims to find the parameters that maximize the likelihood of observing the
given data.
12. Which problems can a Multi-Layer Perceptron (MLP) solve?
Answer: MLP can solve both regression and classification problems.
13. What is the kernel trick in SVM?
Answer: The kernel trick allows SVM to operate in a high-dimensional space without
explicitly transforming the data.
14. What is the role of the margin in SVM?
Answer: The margin is the distance between the decision boundary and the closest data
points, and SVM maximizes this margin.
15. What kind of output does logistic regression produce?
Answer: Logistic regression produces a probability value between 0 and 1.
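
The least-squares objective from question 4 can be illustrated for a single input variable, where the minimizing slope and intercept have a closed form. The data and function names below are hypothetical and chosen so the fit is exact; this is a sketch of the idea rather than a general regression routine.

```python
def fit_line(xs, ys):
    """Slope and intercept that minimize the sum of squared errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on y = 2x + 1.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                               # 2.0 1.0
print(sum_squared_errors(xs, ys, slope, intercept))   # 0.0
```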

3.2 10 Mark Questions


1. Develop a multivariate linear regression model to estimate the coefficients and intercept using
an example dataset.
2. Utilize Bayes’ theorem to develop a Naive Bayes classifier for a categorical dataset.


3. Explain the working of logistic regression with an example, including parameter initialization
and updates.
4. Apply hard margin SVM and explain the kernel trick with an example dataset.
5. Develop a soft margin SVM classifier with slack variables for an example dataset.

6. Build a logistic regression model on multivariate data, calculate the loss and update the
weights.

7. Construct a decision tree for a classification task using entropy and information gain, detailing
each step of the process.
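
For the decision-tree question, the two quantities involved can be sketched as below. The label counts (9 "yes" and 5 "no", split into two hypothetical branches) are illustrative assumptions, not data from the question bank, and the function names are chosen here for readability.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Hypothetical split of 14 samples into two branches.
parent = ["yes"] * 9 + ["no"] * 5
branch_a = ["yes"] * 6 + ["no"] * 2
branch_b = ["yes"] * 3 + ["no"] * 3
print(round(entropy(parent), 3))                                  # 0.94
print(round(information_gain(parent, [branch_a, branch_b]), 3))   # 0.048
```

When building the tree, the attribute with the highest information gain would be chosen at each node.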

4 UNIT III SUPERVISED LEARNING


4.1 1 Mark Questions
1. Why is clustering used in data analysis?
Answer: Clustering groups data points into clusters based on similarity, aiding in pattern
recognition and data segmentation.
2. What does the number of clusters represent in clustering?
Answer: The number of clusters refers to the number of groups or partitions into which the
dataset is divided based on similarity.

3. Which measures are used to evaluate clustering indices?


Answer: Clustering indices evaluate the quality of clusters, such as cohesion within clusters
and separation between clusters.
4. Define accuracy in classification.
Answer: Accuracy is the proportion of correctly classified instances among all instances in
the dataset.
5. What is a confusion matrix used for?
Answer: A confusion matrix is a table summarizing the performance of a classification
model by showing true positives, true negatives, false positives, and false negatives.

6. What is the difference between false positive and false negative?


Answer: A false positive occurs when a negative instance is incorrectly classified as positive,
while a false negative occurs when a positive instance is classified as negative.

7. Which metric combines precision and recall into a single value?


Answer: The F-score is the harmonic mean of precision and recall, balancing the two metrics
for classification performance evaluation.
8. How does a ROC curve evaluate classification performance?
Answer: A ROC curve evaluates the performance of a classification model by plotting the
true positive rate against the false positive rate at various thresholds.


9. What are training, testing, and validation datasets used for?


Answer: Training data is used to train the model, validation data is used for hyperparameter
tuning, and testing data evaluates the model’s performance on unseen data.

10. How does k-fold cross-validation work?


Answer: K-fold cross-validation divides the dataset into k folds, training the model on k − 1
folds and validating it on the remaining fold, iteratively.
11. What is the bias-variance dilemma in machine learning?
Answer: The bias-variance dilemma refers to the trade-off between underfitting (high bias)
and overfitting (high variance) in a model.
12. Why is regularization important in machine learning?
Answer: Regularization prevents overfitting by adding a penalty term to the loss function,
controlling the magnitude of model parameters.
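
The fold rotation described in question 10 can be sketched as an index generator. This is a simplified illustration: the function name k_fold_indices is hypothetical, and the sketch does not shuffle the data, which is usually done before splitting in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Setting k equal to the number of samples gives leave-one-out cross-validation.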

4.2 10 Mark Questions


1. Analyze the elbow method and silhouette analysis for determining the optimal number of
clusters in clustering.
2. Analyze the Davies-Bouldin Index and Fowlkes-Mallows Index approaches for evaluating
clustering quality.
3. Assume TP = 45, FP = 10, TN = 40, and FN = 5, then draw the confusion matrix and compute
all classification metrics.
4. Examine leave-one-out and k-fold cross-validation methods using example data points.
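
The counts in question 3 (TP = 45, FP = 10, TN = 40, FN = 5) can be turned into the usual metrics with a short helper. The choice of metrics and the function name classification_metrics are assumptions made for this sketch; the question itself only asks for "all classification metrics".

```python
def classification_metrics(tp, fp, tn, fn):
    """Common metrics derived from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.8500
# precision: 0.8182
# recall: 0.9000
# specificity: 0.8000
# f1: 0.8571
```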

5 UNIT V ADVANCED TOPICS IN MACHINE LEARNING


5.1 1 Mark Questions
1. Define Singular Value Decomposition (SVD).
Answer: Singular Value Decomposition is a matrix factorization technique that decomposes
a matrix into three components: U, Σ, and $V^T$, where Σ contains the singular values.

2. How does Latent Semantic Analysis (LSA) work in text mining?


Answer: LSA reduces the dimensionality of the term-document matrix using SVD to identify
latent patterns and relationships between terms and documents.
3. What is the purpose of recommendation systems?
Answer: Recommendation systems suggest items to users based on their preferences and
past behavior.
4. When is matrix rank reduction useful in recommendation systems?
Answer: Matrix rank reduction is useful when reducing high-dimensional user-item matrices
into lower dimensions to uncover latent features.


5. Where are matrix completion algorithms applied?


Answer: Matrix completion algorithms are applied in recommendation systems to predict
missing entries in user-item matrices.

6. Which properties define scale-free networks?


Answer: Scale-free networks are characterized by a power-law degree distribution, the
presence of hubs, and robustness to random failures.
7. Who are the key members in clustering scale-free networks?
Answer: Key members in clustering scale-free networks are the hubs and central nodes that
connect different clusters or communities.
8. Why is graph spectral analysis important in network clustering?
Answer: Graph spectral analysis uses the eigenvalues and eigenvectors of matrices (e.g.,
Laplacian) to reveal structural properties and identify clusters in a network.

9. How are cluster centers identified in social networks?


Answer: Cluster centers are identified using metrics like centrality measures or by analyzing
the nodes with the highest connectivity within a cluster.

10. What are the applications of scale-free networks?


Answer: Scale-free networks are applied in social media analysis, biological networks, and
internet topology studies.
11. Define the term "clustering techniques" in the context of scale-free networks.
Answer: Clustering techniques in scale-free networks are methods used to group nodes
based on connectivity patterns and structural properties.
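
The idea of hubs and highly connected nodes in questions 6, 7, and 9 can be illustrated with node degree, the simplest centrality measure. The small edge list below is hypothetical, and degree is only one of several centrality measures that could be used.

```python
from collections import defaultdict


def degrees(edges):
    """Count how many edges touch each node of an undirected graph."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


# Hypothetical network: node A links to every other node, and D links to E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("D", "E")]
deg = degrees(edges)
hub = max(deg, key=deg.get)
print(deg)  # {'A': 4, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
print(hub)  # A
```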

5.2 10 Mark Questions


1. Apply Singular Value Decomposition to reduce the dimensionality of a term-document matrix
and give its significance in text mining.
2. Develop a text mining approach using Latent Semantic Analysis to uncover hidden patterns
in example textual data.
3. Build a topic modeling framework using Latent Dirichlet Allocation, detailing its key steps
and assumptions.
4. Develop a recommendation system using matrix rank reduction technique and matrix
completion algorithm.
5. Utilize the properties of scale-free networks to identify communities in social networks using
clustering techniques.

May your efforts be rewarded with great success!!!
