ML Question Bank-1
Contents
1 UNIT-I MACHINE LEARNING PRELIMINARIES
1.1 1 Mark Questions
1.2 10 Mark Questions
$$A \times B = \begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
In machine learning, this operation could represent passing a set of features through a layer
of a neural network by applying learned weights.
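As a quick check of the arithmetic above, the same product can be computed with NumPy; a minimal sketch, with A and B inferred from the worked entries shown above:

```python
import numpy as np

# Matrices inferred from the worked example above
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Matrix multiplication: each output entry is a row of A dotted with a column of B
C = A @ B
print(C)  # [[19 22]
          #  [43 50]]
```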
3. How do you scale the feature vector v = [1, 2, 3] by 4, and why is scaling important in
machine learning?
Answer: To scale the vector, multiply each element by 4: [4, 8, 12]. In machine learning,
scaling vectors is common when adjusting the magnitude of features, such as in feature
normalization or standardization.
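A minimal NumPy sketch of the same scaling, using the vector v and scalar 4 from the question:

```python
import numpy as np

v = np.array([1, 2, 3])

# Scalar multiplication scales every component of the vector
scaled = 4 * v
print(scaled)  # [ 4  8 12]
```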
Answer: The PDF applies to continuous random variables and describes the relative likelihood
of the variable taking on a given value, while the PMF applies to discrete random variables and
gives the probability that the variable takes on each exact value.
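To make the distinction concrete, a small sketch using SciPy; the normal and binomial distributions here are illustrative choices, not part of the original answer:

```python
from scipy.stats import norm, binom

# Continuous variable: the PDF gives a relative likelihood (a density), not a probability
print(norm.pdf(0.0, loc=0.0, scale=1.0))  # density of a standard normal at x = 0 (~0.3989)

# Discrete variable: the PMF gives the probability of an exact value
print(binom.pmf(3, n=10, p=0.5))          # P(X = 3) for X ~ Binomial(10, 0.5) (~0.1172)
```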
Answer: Descriptive statistics summarize data using measures such as the mean and standard
deviation. These help in understanding data distributions.
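A minimal NumPy sketch of the two measures mentioned in the answer; the sample data are made up for illustration:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.mean(data))         # mean: 5.0
print(np.std(data))          # population standard deviation: 2.0
print(np.std(data, ddof=1))  # sample standard deviation (~2.14)
```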
5. Explain the different types of data distributions, providing appropriate examples and graphs,
and illustrate the machine learning techniques most suitable for each distribution.
6. Develop code to visualize both noise and outliers in the data, and explain the types, effects,
and strategies for handling them.
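For question 6, a minimal sketch of how such a visualization could look; the dataset, noise level, and injected outliers are all assumptions made for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Clean signal with additive Gaussian noise (small random fluctuations around the pattern)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Inject a few outliers: points that deviate strongly from the overall pattern
outlier_idx = rng.choice(x.size, size=5, replace=False)
y[outlier_idx] += rng.choice([-4, 4], size=5)

plt.scatter(x, y, s=10, label="noisy data")
plt.scatter(x[outlier_idx], y[outlier_idx], color="red", s=40, label="outliers")
plt.plot(x, np.sin(x), color="black", label="underlying signal")
plt.legend()
plt.title("Noise vs. outliers (synthetic example)")
plt.show()
```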
5. Select the type of data that Fuzzy C-means is particularly useful for.
Answer: Fuzzy C-means is useful for data where boundaries between clusters are not clearly
defined.
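As a small illustration of the "soft boundaries" idea, a sketch of the standard Fuzzy C-Means membership formula for a single point; the points, centers, and fuzzifier m = 2 are assumed for illustration:

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-Means membership of point x in each cluster (values in [0, 1], summing to 1)."""
    d = np.linalg.norm(centers - x, axis=1)  # distances to each cluster center
    d = np.maximum(d, 1e-12)                 # guard against division by zero
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
print(fcm_memberships(np.array([1.0, 0.0]), centers))  # ~[0.9, 0.1]: mostly, but not fully, cluster 0
```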
12. What is the key difference between FP-Growth and the A-priori algorithm?
Answer: The key difference is that FP-Growth does not generate candidate itemsets and
instead uses an FP-tree to mine frequent patterns directly, making it more efficient than
A-priori.
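For a concrete comparison of the two algorithms' outputs, a minimal sketch assuming the mlxtend library is available; the toy transactions and min_support value are made up for illustration:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

transactions = [["bread", "milk"],
                ["bread", "butter"],
                ["milk", "butter", "bread"],
                ["milk", "butter"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Both calls return the same frequent itemsets; FP-Growth avoids candidate generation internally
print(apriori(df, min_support=0.5, use_colnames=True))
print(fpgrowth(df, min_support=0.5, use_colnames=True))
```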
3. Apply the Agglomerative Hierarchical Clustering (AHC) algorithm using Ward’s linkage
to cluster the given dataset. Compute the necessary steps, including proximity matrices,
and depict the final dendrogram. List the advantages, drawbacks, and applications of the
algorithm. (A minimal SciPy sketch of Ward-linkage clustering follows this list.)
4. Apply the K-Medoids clustering algorithm to determine the appropriate final medoids and
data point assignments for the given dataset. Compute all necessary steps with K = 2, and
outline the advantages, drawbacks, and applications of the algorithm.
5. Utilize the Fuzzy C-Means algorithm to determine the final membership of each data point to
all clusters for the given dataset. Compute all necessary steps with k = 2 and a fuzzification
parameter of 2. List the advantages, drawbacks, and applications of the algorithm.
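For question 3, a minimal SciPy sketch of Ward-linkage agglomerative clustering and its dendrogram; the small 2-D dataset is assumed, since the original dataset is not reproduced in this excerpt:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical 2-D points standing in for "the given dataset"
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0],
              [5.2, 4.8], [9.0, 1.0], [8.8, 1.3]])

# Ward's linkage merges the pair of clusters that minimizes the increase in within-cluster variance
Z = linkage(X, method="ward")
dendrogram(Z)
plt.title("Ward-linkage dendrogram (illustrative data)")
plt.show()
```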
3. Explain the working of logistic regression with an example, including parameter initialization
and updates. (A minimal NumPy sketch of one update step follows this list.)
4. Apply hard margin SVM and explain the kernel trick with an example dataset.
5. Develop a soft margin SVM classifier with slack variables for an example dataset.
6. Build a logistic regression model on multivariate data, calculate the loss and update the
weights.
7. Construct a decision tree for a classification task using entropy and information gain, detailing
each step of the process. (A short entropy/information-gain sketch also follows this list.)