UNIT – 01
Short Questions:
4. What is ridge regression, and how does it differ from ordinary least squares regression?
5. Describe the main difference between ridge regression and lasso regression.
8. How does Linear Discriminant Analysis (LDA) differ from logistic regression?
10. Define the Perceptron Learning Algorithm and state one limitation.
Long Questions:
1. Describe the process of linear regression using least squares. Explain how the model parameters
are estimated, and discuss how least squares minimizes the residuals. Include a brief discussion of
the assumptions of linear regression. (A reference sketch follows this list.)
2. Compare and contrast ridge regression and lasso regression. Explain how each method addresses
multicollinearity and overfitting, and describe scenarios in which one might be preferred over the
other. (A reference sketch contrasting the two penalties follows this list.)
3. Explain Logistic Regression as a classification method. Describe how logistic regression differs from
linear regression, the interpretation of coefficients, and the role of the sigmoid function in making
predictions.
4. Describe the process of subset selection in multiple regression. What are the advantages and
limitations of this approach? Discuss forward selection, backward elimination, and stepwise selection
methods.
5. Explain Linear Discriminant Analysis (LDA) and its assumptions. Describe the goal of LDA in
classification and outline the steps for applying it to a dataset. How does it differ from Quadratic
Discriminant Analysis (QDA)?
6. Discuss the Perceptron Learning Algorithm. Provide a detailed explanation of how it updates
weights and converges to a solution. Include limitations, especially with non-linearly separable data,
and mention how it can be adapted to solve classification problems. (A reference sketch of the
update rule follows this list.)
7. Explain multiple regression with multiple outputs. How does handling multiple output variables
differ from single-output regression, and what additional considerations are necessary for model
evaluation?
8. Describe the significance of regularization in regression models. Discuss how regularization helps
prevent overfitting, and compare the types of regularization penalties applied in ridge regression and
lasso regression.
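For long question 1 above, a minimal sketch of least-squares estimation via the normal equations, assuming NumPy; the data values are illustrative:

    # Solve beta_hat = (X^T X)^(-1) X^T y, which minimizes the sum of squared residuals.
    import numpy as np

    X = np.array([[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]])   # first column of ones gives the intercept
    y = np.array([2.1, 3.9, 6.2, 8.1])                       # illustrative responses

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)             # normal equations
    residuals = y - X @ beta_hat                             # what least squares makes as small as possible
    print(beta_hat, residuals)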
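For long questions 2 and 8 above, a minimal sketch contrasting the L2 (ridge) and L1 (lasso) penalties, assuming scikit-learn; the data and alpha values are illustrative:

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)   # two nearly collinear columns (multicollinearity)
    y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients but keeps all of them
    lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set some coefficients exactly to zero
    print("ridge:", ridge.coef_)
    print("lasso:", lasso.coef_)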
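For long question 6 above, a minimal sketch of the perceptron update rule on a small linearly separable (AND-style) dataset, assuming NumPy; the data and learning rate are illustrative:

    import numpy as np

    X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
    y = np.array([-1, -1, -1, 1])        # linearly separable labels
    w, b, lr = np.zeros(2), 0.0, 1.0

    for _ in range(100):                 # repeat passes until no point is misclassified
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # move the hyperplane toward the point
                b += lr * yi
                errors += 1
        if errors == 0:
            break                        # convergence is guaranteed only for separable data
    print(w, b)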
UNIT – 02
Short Questions (2 Marks)
3. Describe one advantage and one disadvantage of using K-Nearest Neighbor (K-NN) for
classification.
5. How does K-Nearest Neighbor (K-NN) determine the class of a new data point?
Long Questions:
1. Explain the backpropagation algorithm in detail. Describe each step involved and discuss how it is
used to train neural networks. Additionally, explain why this algorithm is computationally intensive.
2. Discuss the challenges associated with training neural networks. Include issues such as vanishing
gradients, overfitting, and computational costs, and provide some methods used to overcome these
issues.
3. Explain Support Vector Machines (SVM) for classification. Describe the concept of margin
maximization, the importance of support vectors, and how the kernel trick enables SVMs to perform
classification in higher-dimensional feature spaces.
4. Describe the role of K-Nearest Neighbor (K-NN) in image scene classification. Explain how K-NN
can be applied in this domain, including the distance metric used and the limitations of K-NN for
image classification tasks.
5. Discuss the concept of SVM for regression. How does it differ from SVM for classification? Explain
the ε-insensitive loss function and how it helps in regression.
6. Compare and contrast Neural Networks, Support Vector Machines, and K-Nearest Neighbor.
Discuss scenarios where each method might be preferable over the others, considering factors such
as dataset size, dimensionality, and computational resources.
7. Assume that the neurons have sigmoid activation functions. Perform a forward pass and a
backward pass on the network, assuming that the actual output y = 0.5 and the learning rate = 1.
(A worked sketch for a single sigmoid unit follows this list.)
8. Using the SVM algorithm, find the hyperplane with maximum margin for the following data:
N = 3, X1 = (2, 2), X2 = (4, 5), X3 = (7, 4); y1 = -1, y2 = +1, y3 = +1.
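For long question 7 above, a minimal worked sketch of the forward and backward pass for a single sigmoid unit; the network from the question is not reproduced here, so the inputs and initial weights below are assumed, while the target y = 0.5 and the learning rate of 1 follow the question:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.1, 0.3])     # assumed inputs
    w = np.array([0.4, 0.6])     # assumed initial weights
    b = 0.2                      # assumed initial bias
    y_target, lr = 0.5, 1.0      # from the question

    z = w @ x + b                # forward pass
    y_hat = sigmoid(z)

    # Backward pass for squared error E = 0.5 * (y_hat - y_target)^2.
    delta = (y_hat - y_target) * y_hat * (1 - y_hat)   # dE/dz uses the sigmoid derivative y_hat * (1 - y_hat)
    w -= lr * delta * x                                # gradient-descent updates
    b -= lr * delta
    print(y_hat, w, b)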
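For long question 8 above, a minimal sketch that fits a linear SVM to the three given points, assuming scikit-learn; a very large C is used to approximate the hard-margin case:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2., 2.], [4., 5.], [7., 4.]])
    y = np.array([-1, 1, 1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    print("w =", clf.coef_[0], "b =", clf.intercept_[0])   # maximum-margin hyperplane w.x + b = 0
    print("support vectors:", clf.support_vectors_)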
UNIT – 03
Short Questions:
1. What is unsupervised learning, and how does it differ from supervised learning?
4. What are principal components, and why are they important in data analysis?
Long Questions:
1. Describe the process of cluster analysis. What are some methods used for clustering, and
how can they be applied?
2. Explain the concept of association rules in detail. Describe support, confidence, and lift in the
context of association rule mining. (A reference sketch follows this list.)
3. Discuss the Principal Component Analysis (PCA) method. How does PCA help in
dimensionality reduction? (A reference sketch follows this list.)
4. Explain how a Random Forest algorithm works. What are the advantages and limitations of
using Random Forests for classification problems?
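For long question 2 above, a minimal sketch that computes support, confidence, and lift for a rule A -> B by direct counting; the transactions are illustrative:

    transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
                    {"milk"}, {"bread", "milk"}]
    n = len(transactions)
    A, B = {"bread"}, {"milk"}

    support_A  = sum(A <= t for t in transactions) / n          # P(A)
    support_B  = sum(B <= t for t in transactions) / n          # P(B)
    support_AB = sum((A | B) <= t for t in transactions) / n    # P(A and B)

    confidence = support_AB / support_A    # P(B | A)
    lift = confidence / support_B          # > 1 means A and B co-occur more often than by chance
    print(support_AB, confidence, lift)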
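For long question 3 above, a minimal sketch of PCA for dimensionality reduction, assuming scikit-learn; the data are illustrative:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # a nearly redundant column

    pca = PCA(n_components=2).fit(X)
    print(pca.explained_variance_ratio_)   # share of variance captured by each principal component
    X_reduced = pca.transform(X)           # projection onto the top two components
    print(X_reduced.shape)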
Short Questions:
Long Questions:
1. Discuss how to assess the performance of a classification algorithm using t-tests and
McNemar's test. What are the conditions for using these tests?
2. Explain Analysis of Variance (ANOVA) and its applications. How can ANOVA be used to
analyze differences across multiple groups?
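For the two questions above, a minimal sketch of a one-way ANOVA across three groups and McNemar's test on paired classifier outcomes, assuming SciPy and statsmodels; all numbers are illustrative:

    import numpy as np
    from scipy.stats import f_oneway
    from statsmodels.stats.contingency_tables import mcnemar

    g1 = [23, 25, 21, 22, 24]
    g2 = [30, 28, 31, 29, 27]
    g3 = [24, 26, 25, 23, 27]
    print(f_oneway(g1, g2, g3))          # F statistic and p-value across the groups

    # 2x2 table of (classifier A right/wrong) x (classifier B right/wrong) on the same test set.
    table = np.array([[40, 6],
                      [2, 12]])
    print(mcnemar(table, exact=True))    # exact test is appropriate when off-diagonal counts are small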
Short Questions:
1. Define big data. What are the characteristics of big data (5 Vs)?
2. What are some of the main challenges associated with big data analytics?
Long Questions:
1. Describe the challenges of big data in terms of storage, processing, and analysis. How do
these challenges impact businesses and analytics processes?
2. Discuss various tools and technologies used for big data analytics. How do they help address
the volume, velocity, and variety of big data?
UNIT – 04
Short Questions (1-2 marks each)
a. Define simple linear regression and give an example of where it can be applied.
b. Explain the concept of the p-value in the context of a linear regression coefficient.
c. How does multiple linear regression differ from simple linear regression?
d. List two assumptions of multiple linear regression.
e. In logistic regression, why is the sigmoid function used?
f. Describe a scenario where logistic regression would be more appropriate than linear
regression.
g. Explain the goal of Linear Discriminant Analysis.
h. How does LDA handle dimensionality reduction?
i. Why might one choose ridge regression over simple linear regression?
j. What is the primary role of the regularization parameter in ridge regression?
k. Briefly describe the difference between cross-validation and bootstrapping.
l. Why is cross-validation important in model evaluation?
m. What is the main difference between a classification tree and a regression tree?
n. Describe the process of "pruning" in decision trees.
o. Name two distance metrics that can be used in K-NN classification.
p. What is the goal of Principal Component Analysis?
q. Explain the importance of eigenvalues in PCA.
r. What is the purpose of the “elbow method” in K-means clustering?
s. How does K-means clustering handle categorical data?
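For question (r) above, a minimal sketch of the elbow method: plot the within-cluster sum of squares (inertia) against k and look for the bend. Assumes scikit-learn; the data are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

    inertias = []
    for k in range(1, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)
    print(inertias)   # inertia drops sharply up to k = 3, then flattens: the "elbow"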