ml_exam_answers
Answers
Q4. What is Cross-Validation and why is it used?
Definition:
Cross-Validation is a model evaluation technique used to assess how well a machine learning model performs on unseen
(test) data. It helps us check if the model is generalizing properly or if it is overfitting/underfitting.
Instead of using just one train-test split, cross-validation splits the data into multiple parts and tests the model on each
part. This gives a more reliable estimate of model performance.
Cross-validation helps ensure the model is not just memorizing the training data.
Instead of relying on one random test set, it averages results across multiple validations.
All the data gets used for both training and testing, just in different rounds.
It helps choose the best model or parameters based on consistent performance across folds.
Steps in K-Fold Cross-Validation:
1. Split the dataset into K equal parts (folds).
2. Train the model on K-1 folds and test it on the remaining fold.
3. This process is repeated K times, with each fold used once for testing.
4. Average the K results to get the final performance estimate.
Types of Cross-Validation:
1. K-Fold Cross-Validation – Most common type.
2. Stratified K-Fold – Ensures same class proportion in each fold (good for classification).
Advantages:
Reduces model evaluation bias.
Provides a better estimate of model performance.
Useful in parameter tuning and model comparison.
Disadvantages:
More computation time (as model is trained multiple times).
Can be slower for large datasets.
Diagram:
K-Fold Cross-Validation Flow (Just write 'Diagram' in answer sheet)
Conclusion:
Cross-validation is an essential tool in machine learning for testing model performance and avoiding overfitting. It
ensures that the model will work well on new, unseen data and helps in selecting the best model for the problem.
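A minimal Python sketch of K-fold cross-validation, assuming scikit-learn is available and using a synthetic dataset for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic classification data: 200 samples, 5 features
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the model is trained and evaluated 5 times,
# each time testing on a different fold.
scores = cross_val_score(model, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```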
y = a + bx
Where:
y = predicted value
x = input value
a = y-intercept
b = slope of the line
The slope is found by minimizing the squared errors:
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
Then,
a = (Σy − bΣx) / n
Example:
Let's consider a small dataset:
x  y
1  2
2  3
4  5
Now calculate:
Σx = 1 + 2 + 4 = 7
Σy = 2 + 3 + 5 = 10
Σxy = 2 + 6 + 20 = 28
Σx² = 1 + 4 + 16 = 21
n = 3
b = (3×28 − 7×10) / (3×21 − 7²) = 14 / 14 = 1
a = (10 − 1×7) / 3 = 3 / 3 = 1
So the regression line is:
y = 1 + 1x = x + 1
Estimate a Value:
Let's estimate y when x = 3:
y = x + 1 = 3 + 1 = 4
Conclusion:
Least Squares Regression is a foundational method for modeling linear relationships between variables. It helps in
making predictions based on past data using a simple linear formula.
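The same calculation can be checked with a short Python sketch (plain Python, values from the example above):

```python
x = [1, 2, 4]
y = [2, 3, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept from the least squares formulas
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(f"y = {a} + {b}x")                  # y = 1.0 + 1.0x
print("Prediction at x = 3:", a + b * 3)  # 4.0
```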
1. Multivariate Regression:
Multivariate regression predicts two or more output (dependent) variables from the same set of input features.
Example: Predicting both price and demand of a product using features like cost, marketing spend, etc.
2. Regularized Regression:
Regularization is used to prevent overfitting in regression models.
It adds a penalty term to the loss function to shrink the model coefficients.
Ridge Regression (L2): Penalizes squared coefficient values and shrinks them toward zero without eliminating features.
Lasso Regression (L1): Penalizes absolute values and can eliminate features entirely.
Regularized regression ensures better generalization on unseen data by reducing model complexity.
Conclusion:
Multivariate regression deals with multiple outputs, while regularized regression controls overfitting using penalties.
Both are extensions of traditional linear regression to handle real-world challenges better.
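An illustrative scikit-learn sketch of Ridge and Lasso; the dataset and alpha values are placeholders, not part of the answer:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: can drive coefficients to zero

print("Ridge coefficients:", ridge.coef_.round(2))
print("Lasso coefficients:", lasso.coef_.round(2))
```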
The model fits a line or plane through the input data by minimizing the sum of squared differences between predicted and
actual values.
Steps:
1. Encode class labels: For binary classification, assign y = 0 for class A, y = 1 for class B.
2. Fit a least squares regression line to the encoded labels.
3. Classify new points by thresholding the predicted value (e.g., output ≥ 0.5 → class B, otherwise class A).
Example:
Dataset with hours studied (x) and exam result (pass=1, fail=0). Use linear regression to fit a line and classify students.
Conclusion:
Least Squares Regression can be adapted for classification but is not optimal. It's better suited for regression tasks. For
classification, logistic regression or other methods are more accurate.
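A small Python sketch of this idea, with made-up hours-studied data and a 0.5 threshold (assumed, for illustration):

```python
import numpy as np

# Hypothetical data: hours studied (x) and pass/fail label (y in {0, 1})
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Fit a straight line y = a + b*x by least squares
b, a = np.polyfit(x, y, deg=1)  # polyfit returns [slope, intercept]

# Classify by thresholding the fitted value at 0.5
predictions = (a + b * x >= 0.5).astype(int)
print("Fitted line: y = %.2f + %.2fx" % (a, b))
print("Predicted classes:", predictions)
```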
Problems caused by high-dimensional data:
1. Higher computational cost.
2. Increased risk of overfitting.
3. Difficult visualization.
These components capture the maximum variance in the data using fewer dimensions.
Reducing overfitting
Steps in PCA:
1. Standardize the data (mean = 0).
2. Compute the covariance matrix of the features.
3. Find the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvectors by eigenvalue and keep the top k as principal components.
5. Project the data onto these components.
Example:
If you have 10 features and PCA finds that 2 components explain 95% of the variance, you can reduce the dataset from
10D to 2D.
Diagram:
PCA transformation steps and variance plot
Conclusion:
PCA helps overcome the curse of dimensionality by reducing unnecessary features while keeping the important patterns.
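A minimal scikit-learn sketch matching the 10-feature example; the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=100, n_features=10, random_state=0)

X_std = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=2)                   # keep 2 principal components
X_reduced = pca.fit_transform(X_std)

print("Reduced shape:", X_reduced.shape)                     # (100, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```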
Dataset:
X1  X2
2   4
3   6
4   8
5   10
Mean of X1 = 3.5
Mean of X2 = 7
Standardized values:
X1' X2'
-1.5 -3
-0.5 -1
0.5 1
1.5 3
Covariance matrix:
X1 X2
X1 1.67 3.33
X2 3.33 6.67
Diagram:
Showing data before and after PCA projection
Conclusion:
PCA reduces the data from 2D to 1D while preserving most of the information.
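The worked example can be verified with NumPy (same numbers, computed programmatically):

```python
import numpy as np

X = np.array([[2, 4], [3, 6], [4, 8], [5, 10]], dtype=float)
X_centered = X - X.mean(axis=0)          # means are 3.5 and 7

cov = np.cov(X_centered, rowvar=False)   # [[1.67, 3.33], [3.33, 6.67]]
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print("Covariance matrix:\n", cov.round(2))
print("Eigenvalues:", eigenvalues.round(2))  # one eigenvalue is ~0: the data is really 1-D

# Project onto the principal component with the largest eigenvalue
pc = eigenvectors[:, np.argmax(eigenvalues)]
print("1-D projection:", (X_centered @ pc).round(2))
```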
A = U × Σ × Vᵀ
Where:
A = original matrix
U = matrix of left singular vectors (orthogonal)
Σ = diagonal matrix of singular values
Vᵀ = transpose of the matrix of right singular vectors (orthogonal)
Purpose:
To reduce data dimensions
Applications of SVD:
1. Dimensionality Reduction
Similar to PCA, SVD helps compress large datasets into fewer features.
2. Image Compression
Keeping only the largest singular values stores an image with far less data while preserving most of the detail.
3. Latent Semantic Analysis
Used in text analysis to find hidden relationships between words and documents.
4. Recommender Systems
Example:
Given a 3×2 matrix A:
| 2 4 |
| 1 3 |
| 0 0 |
Using SVD, we factor A into U, Σ, and Vᵀ. We can use just the largest singular values to reconstruct A with minimal error.
Diagram:
SVD matrix breakdown and compression
Conclusion:
SVD is a powerful tool used across many fields to simplify complex data, reduce storage, and improve understanding of
patterns.
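A NumPy sketch of SVD on the example matrix, keeping only the largest singular value:

```python
import numpy as np

A = np.array([[2, 4], [1, 3], [0, 0]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("Singular values:", S.round(3))

# Rank-1 reconstruction using only the largest singular value
A1 = S[0] * np.outer(U[:, 0], Vt[0, :])
print("Rank-1 approximation:\n", A1.round(3))
print("Reconstruction error:", np.linalg.norm(A - A1).round(3))
```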
Structure of Perceptron:
1. Inputs (x₁, x₂, ..., xₙ)
2. Weights (w₁, w₂, ..., wₙ)
3. Bias (b)
4. Summation and activation function
Output:
y = f(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
Where:
w = weights, x = inputs, f = step activation function
b = bias
Example:
Let x₁ = 1, x₂ = 0
Weights: w₁ = 2, w₂ = -1
Bias: b = 1
Weighted sum = (2×1) + (-1×0) + 1 = 3
Activation: Step function gives output = 1 (class A)
Diagram:
Perceptron model showing input, weights, summation, bias, and activation
Conclusion:
The perceptron with bias is a fundamental building block in neural networks. It can learn simple decision boundaries and
is the basis for more advanced models.
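A few lines of Python reproducing the worked example above:

```python
def step(net):
    return 1 if net > 0 else 0

x = [1, 0]
w = [2, -1]
b = 1

net = sum(wi * xi for wi, xi in zip(w, x)) + b   # 2*1 + (-1)*0 + 1 = 3
print("Weighted sum:", net)        # 3
print("Output:", step(net))        # 1 (class A)
```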
Initial Weights: w1 = 0, w2 = 0
Bias: b = 0
Learning rate: η = 1; output = 1 when the net input (w1x1 + w2x2 + b) is greater than 0, else 0
Training Steps (first epoch):
1. Input (0,0) → output = 0 → target = 0 → no weight change
2. Input (0,1) → output = 0 → target = 1 → w2 becomes 1, b becomes 1
3. Input (1,0) → output = 1 → target = 1 → no weight change
4. Input (1,1) → output = 1 → target = 1 → no weight change
Weights after the first epoch:
w1 = 0
w2 = 1
b = 1
These weights still misclassify (0,0), so training is repeated for further epochs until all four patterns are classified correctly; the weights then settle at w1 = 1, w2 = 1, b = 0, which implement the OR gate.
Conclusion:
Using the perceptron learning algorithm, the OR gate can be implemented successfully.
Diagram:
Perceptron architecture with two inputs and OR logic output
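A Python sketch of this training loop, assuming a learning rate of 1 and a step output of 1 when the net input is strictly positive:

```python
def step(net):
    return 1 if net > 0 else 0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
w1 = w2 = b = 0.0
eta = 1.0

for epoch in range(10):                      # repeat until the weights are stable
    for (x1, x2), target in data:
        output = step(w1 * x1 + w2 * x2 + b)
        error = target - output
        w1 += eta * error * x1               # perceptron learning rule
        w2 += eta * error * x2
        b += eta * error

print("Learned weights:", w1, w2, "bias:", b)
for (x1, x2), target in data:
    print((x1, x2), "->", step(w1 * x1 + w2 * x2 + b), "target:", target)
```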
Steps:
1. Forward Pass: Inputs pass through the network layer by layer to produce an output.
2. Compute Error: The output is compared with the target value using an error (loss) function.
3. Backward Pass: The error is propagated backwards, and the gradient of the error with respect to each weight is computed using the chain rule.
4. Update Weights: Each weight is adjusted in the direction that reduces the error (w = w − η × ∂E/∂w).
5. Repeat: The process is repeated over the training data until the error becomes small.
Mathematics Behind:
Error function: E = ½ (target – output)²
Weight update: w_new = w_old – η × ∂E/∂w
Diagram:
Flowchart showing forward pass, error calculation, backpropagation, and weight update
Conclusion:
Backpropagation is essential for training deep neural networks by optimizing weights using error gradients.
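A minimal single-neuron backpropagation sketch in Python; the sigmoid activation, learning rate, and data point are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 0.5, 1.0          # one training sample
w, b, eta = 0.1, 0.0, 0.5

for _ in range(100):
    # Forward pass
    net = w * x + b
    out = sigmoid(net)
    # Error gradient via the chain rule: dE/dw = -(t - o) * o * (1 - o) * x
    error = target - out
    grad_w = -error * out * (1 - out) * x
    grad_b = -error * out * (1 - out)
    # Weight update (gradient descent)
    w -= eta * grad_w
    b -= eta * grad_b

print("Final output:", round(sigmoid(w * x + b), 3), "target:", target)
```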
ANN Architecture:
1. Input Layer:
Receives the raw input features; one neuron per feature.
2. Hidden Layers:
One or more layers between input and output. They perform computations using weights and activation functions.
3. Output Layer:
Produces the final prediction (a class label or a numeric value).
4. Weights and Biases:
Learnable parameters that connect neurons across layers.
5. Activation Function:
Introduces non-linearity (e.g., sigmoid, ReLU) so the network can learn complex patterns.
Working:
Inputs are multiplied by weights, summed, and passed through an activation function; this repeats layer by layer until the output layer produces the final prediction.
Diagram:
ANN with input, hidden, and output layers
Conclusion:
ANNs are powerful tools used in image recognition, language translation, and more. Their layered structure allows them
to learn complex patterns.
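A small NumPy sketch of a forward pass through such a network; the layer sizes and random weights are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.0])                        # input layer (2 features)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)    # input -> hidden (3 neurons)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden -> output (1 neuron)

hidden = np.tanh(x @ W1 + b1)                    # weighted sum + activation
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))   # sigmoid output

print("Hidden activations:", hidden.round(3))
print("Network output:", output.round(3))
```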
Q16. Explain Delta Learning Rule (LMS / Widrow-Hoff) with training process
What is Delta Rule?
Also known as the Least Mean Square (LMS) or Widrow-Hoff rule, it is a supervised learning rule that adjusts weights in proportion to the error between the target output and the actual output.
Formula:
Δw = η × (t – o) × x
Where:
Δw = change in weight
η = learning rate
t = target output
o = actual output
x = input
Steps in Training:
1. Initialize weights and bias.
2. Present an input pattern and compute the output.
3. Compute the error (t – o).
4. Update each weight: w = w + Δw, where Δw = η × (t – o) × x.
5. Repeat over all patterns until the error is sufficiently small.
Example:
For example, if η = 0.1, t = 1, o = 0 and x = 1, then
Δw = 0.1 × (1 – 0) × 1 = 0.1
so the weight is increased by 0.1.
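A short Python check of this update, with the example values assumed as above:

```python
eta = 0.1      # learning rate
t = 1          # target output
o = 0          # actual output
x = 1          # input

delta_w = eta * (t - o) * x
print("Weight change Δw =", delta_w)   # 0.1
```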
Hebb's Principle:
If two neurons are active at the same time, the connection between them is strengthened.
Learning Rule:
Δw = η × x × y
Where:
x = input
y = output
η = learning rate
Implementing OR Gate:
OR Truth Table:
x1 x2 Output
0 0 0
0 1 1
1 0 1
1 1 1
Δw1 = η × x1 × y
Δw2 = η × x2 × y
Update weights accordingly (with η = 1 and initial weights of 0, summing the updates over all four input patterns):
Final weights: w1 = 2, w2 = 2
Diagram:
Hebbian network showing input, weight connections, and output
Conclusion:
Hebbian learning is a biologically inspired unsupervised learning rule that strengthens connections based on co-activation of neurons.
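A short Python sketch of these Hebbian updates (η = 1, zero initial weights), reproducing the final weights above:

```python
truth_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR gate
eta = 1.0
w1 = w2 = 0.0

for (x1, x2), y in truth_table:
    w1 += eta * x1 * y    # Δw = η · x · y
    w2 += eta * x2 * y

print("Final weights:", w1, w2)   # 2.0 2.0
```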
Limitation: Non-differentiable
4. Ramp Function
Linearly increases within a range
Conclusion:
Activation functions are crucial for learning and generalization in neural networks.
Diagram:
Graphs of all four activation functions
1. Sigmoid Function
Formula: f(x) = 1 / (1 + e⁻ˣ)
Range: (0, 1)
Smooth curve
2. Tanh Function
Formula: f(x) = (eˣ – e⁻ˣ) / (eˣ + e⁻ˣ)
Range: (–1, 1)
Zero centered
3. ReLU Function
Formula: f(x) = max(0, x)
Range: [0, ∞)
4. Leaky ReLU
Formula: f(x) = x if x > 0 else 0.01x
Range: (–∞, ∞)
5. Softmax
Formula: f(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)
Outputs lie in (0, 1) and sum to 1, so softmax is used for multi-class outputs.
Conclusion:
Each activation function has its own use case depending on the task.
Diagram:
Graphs of sigmoid, tanh, ReLU, leaky ReLU, and softmax
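For reference, the same functions written with NumPy (a sketch, using the standard definitions):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", sigmoid(z).round(3))
print("tanh:", tanh(z).round(3))
print("ReLU:", relu(z))
print("Leaky ReLU:", leaky_relu(z))
print("softmax:", softmax(z).round(3))
```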
1. E-Step (Expectation):
Calculate the probability of each data point belonging to a cluster using the current parameters.
2. M-Step (Maximization):
Update the parameters (mean, variance, etc.) to maximize the likelihood using the probabilities.
Applications:
Clustering
Image segmentation
Diagram:
Flowchart showing E-step and M-step iteratively updating cluster parameters
Conclusion:
EM helps find hidden structures (clusters) in data by alternating between estimating probabilities and optimizing
parameters.
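A minimal sketch using scikit-learn's GaussianMixture, which runs the E and M steps internally; the data is synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two 1-D Gaussian clusters centred at 0 and 5
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("Estimated means:", gmm.means_.ravel().round(2))
print("Cluster of point 4.8:", gmm.predict([[4.8]])[0])
```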
Steps:
1. Find eigenvalues (λ) by solving: det(A – λI) = 0
2. Find an eigenvector for each eigenvalue by solving (A – λI)v = 0.
3. Form P from the eigenvectors (as columns) and D from the eigenvalues (on the diagonal), so that A = P D P⁻¹.
Example:
Matrix A =
| 4 1 |
| 2 3 |
Solving det(A – λI) = 0 gives λ² – 7λ + 10 = 0, so the eigenvalues are λ = 5 and λ = 2.
Conclusion:
Diagonalization simplifies matrix operations like powers and exponentials, since Aⁿ = P Dⁿ P⁻¹.
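A NumPy sketch of diagonalization for the example matrix, including a power computed via A² = P D² P⁻¹:

```python
import numpy as np

A = np.array([[4, 1], [2, 3]], dtype=float)
eigenvalues, P = np.linalg.eig(A)   # columns of P are the eigenvectors
D = np.diag(eigenvalues)

print("Eigenvalues:", eigenvalues.round(2))                         # 5 and 2
print("A rebuilt from P D P^-1:\n", (P @ D @ np.linalg.inv(P)).round(2))
print("A squared via P D^2 P^-1:\n", (P @ D**2 @ np.linalg.inv(P)).round(2))
```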
Steps to Find:
1. Solve det(A – λI) = 0 → gives eigenvalues
2. For each eigenvalue λ, solve (A – λI)v = 0 → gives the corresponding eigenvector v
Example:
Matrix A =
| 2 0 |
| 0 3 |
Since A is diagonal, the eigenvalues are simply 2 and 3, with eigenvectors [1, 0]ᵀ and [0, 1]ᵀ.
Applications:
PCA (Dimensionality Reduction)
Stability analysis
Vibration modes
Diagram:
Shows how eigenvectors don't change direction after transformation
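A short NumPy check of the example, including the defining property Av = λv:

```python
import numpy as np

A = np.array([[2, 0], [0, 3]], dtype=float)
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)            # [2. 3.]
print("Eigenvectors (columns):\n", eigenvectors)

# Check the defining property A v = λ v for the first pair
v, lam = eigenvectors[:, 0], eigenvalues[0]
print("A @ v:", A @ v, " λ * v:", lam * v)
```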
Q23. What is the Trace of a matrix? Mention its properties
Definition:
The trace of a matrix is the sum of diagonal elements.
Properties:
1. Trace(A + B) = Trace(A) + Trace(B)
2. Trace(cA) = c × Trace(A) for a scalar c
3. Trace(AB) = Trace(BA)
4. Trace(Aᵀ) = Trace(A)
Example:
Matrix A =
| 1 2 |
| 3 4 |
Trace(A) = 1 + 4 = 5
Applications:
The trace equals the sum of the eigenvalues
Appears in the characteristic equation
Used in optimization
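A quick NumPy check of the trace and its properties (matrix B is an arbitrary example):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print("Trace(A):", np.trace(A))                                     # 1 + 4 = 5
print("Trace(A+B) == Trace(A)+Trace(B):", np.trace(A + B) == np.trace(A) + np.trace(B))
print("Trace(AB) == Trace(BA):", np.trace(A @ B) == np.trace(B @ A))
print("Trace(A^T) == Trace(A):", np.trace(A.T) == np.trace(A))
print("Sum of eigenvalues:", np.linalg.eigvals(A).sum().round(2))   # also 5
```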
Formulas:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
Example:
TP = 70, TN = 50, FP = 10, FN = 20
Accuracy = 120 / 150 = 80%
Precision = 70 / 80 = 87.5%
Recall = 70 / 90 ≈ 77.78%
F1-Score ≈ 82.35%
Conclusion:
These metrics help evaluate classification model performance.
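The same numbers can be reproduced with a few lines of Python:

```python
TP, TN, FP, FN = 70, 50, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2%}")    # 80.00%
print(f"Precision: {precision:.2%}")   # 87.50%
print(f"Recall:    {recall:.2%}")      # 77.78%
print(f"F1-Score:  {f1:.2%}")          # 82.35%
```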
1. Accuracy:
Proportion of all predictions that are correct
2. Precision:
How many predicted positives are actually positive
3. Recall:
How many actual positives are correctly predicted
4. F1-Score:
Harmonic mean of precision and recall
5. ROC-AUC:
Area under the ROC curve; measures how well the model ranks positives above negatives across thresholds
6. Confusion Matrix:
Table showing the TP, FP, TN, and FN counts
Conclusion:
Multiple metrics should be considered to evaluate a model's true performance.
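For completeness, a scikit-learn sketch of these metrics on made-up labels and probabilities:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]   # predicted probabilities for ROC-AUC

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("ROC-AUC:", roc_auc_score(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```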
McCulloch-Pitts Model:
A single-layer network of MCP neurons cannot implement XOR directly, because XOR is not linearly separable.
Logic Construction:
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
Multiple MCP neurons arranged in two layers are used to implement this logic.
Diagram:
Multi-layer MCP network showing logic gate connections for XOR
Conclusion:
XOR needs multi-layer McCulloch-Pitts model due to non-linear separability.
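A Python sketch of this construction using simple threshold (MCP-style) neurons; the weights and thresholds shown are one valid choice, not the only one:

```python
def mcp(inputs, weights, threshold):
    """McCulloch-Pitts style neuron: fires (1) if the weighted sum reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

def xor(x1, x2):
    h1 = mcp((x1, x2), (1, -1), 1)   # x1 AND NOT x2
    h2 = mcp((x1, x2), (-1, 1), 1)   # NOT x1 AND x2
    return mcp((h1, h2), (1, 1), 1)  # h1 OR h2

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {xor(a, b)}")
```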
Logic:
ANDNOT(x1, x2) = x1 AND (NOT x2)
Weight for input x1 = +1
Weight for input x2 = –1
Threshold = 1
Neuron Output:
When x1 = 1 and x2 = 0 → Net = (+1)(1) + (–1)(0) = 1 → Output = 1
For the other inputs (0,0), (0,1) and (1,1), the net input is below the threshold of 1, so the output is 0.
Diagram:
MCP neuron with weights (+1, –1) and threshold 1
Conclusion:
ANDNOT can be implemented using a single MCP neuron with suitable weights and threshold.
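A few lines of Python implementing this neuron:

```python
def andnot(x1, x2):
    # Weights (+1, -1) and threshold 1, as described above
    net = (+1) * x1 + (-1) * x2
    return 1 if net >= 1 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"ANDNOT({x1}, {x2}) = {andnot(x1, x2)}")
```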