Assignment 2- Key Answers Spring 24-25
KCST
ID:
Assignment 1 [20 marks]:
Q1. [1 mark] What are the main types of data used in machine learning, and why is it important
to distinguish between them?
Answer: The three main types of data are structured, unstructured, and semi-structured. Structured data is
organized, like tables; unstructured data includes images or text; semi-structured data has elements of both.
Distinguishing between them is important because each requires different preprocessing methods to extract
meaningful information.
Q2. [1 mark] What is the difference between Training and Testing in Machine Learning?
Answer:
Training is the process where a model learns from a dataset by adjusting its parameters based on labeled
data. During this phase, the model improves its performance based on feedback from the data.
Testing, on the other hand, is evaluating the model's performance using a separate dataset (testing set) that
the model has not seen before. The goal is to assess how well the model generalizes to new, unseen data.
Key Difference: Training is about learning patterns from data, while testing evaluates how well the model
performs on new, unseen data.
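The split between training and testing can be illustrated with a minimal sketch. The data, the helper names (`train_test_split`, `fit_line`, `mean_squared_error`), and the 25% test fraction are all assumptions chosen for illustration; they are not part of the assignment.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Training: least-squares slope and intercept for y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mean_squared_error(points, a, b):
    """Testing: average squared error of the fitted line on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

# Hypothetical labeled data that follows y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(20)]

train, test = train_test_split(data)      # the model never sees `test` while fitting
a, b = fit_line(train)                    # learn parameters from the training set
test_mse = mean_squared_error(test, a, b) # evaluate on unseen data
```

Because the example data is noise-free, the error on the test set is essentially zero; with real data, a gap between training error and test error is the usual sign of poor generalization.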
Q3. [1 mark] You are working with a dataset where a feature represents test scores ranging from 50 to
100, and you need to scale the data using Z-Score Scaling (also known as standardization). Given that the
mean of the feature is 75, and the standard deviation is 10, calculate the scaled (Z-score) value for a test
score of 60.
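Equation:
The Z-score formula is:
$z = \frac{x - \mu}{\sigma}$
Where:
$x$ = the value to scale (60), $\mu$ = the mean of the feature (75), $\sigma$ = the standard deviation of the feature (10).
Answer:
Substitute the values into the formula:
$z = \frac{60 - 75}{10} = \frac{-15}{10} = -1.5$
A test score of 60 is therefore 1.5 standard deviations below the mean.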
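The arithmetic for Q3 can be checked with a short sketch. The function name `z_score` is an assumption for illustration; the numbers are the ones given in the question.

```python
def z_score(x, mean, std):
    """Standardize a value: how many standard deviations x lies from the mean."""
    return (x - mean) / std

# Values from Q3: score 60, mean 75, standard deviation 10.
z = z_score(60, 75, 10)
print(z)  # -1.5
```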
Q4. [4 marks] Write a description of the following pictures.
[Table with columns "#", "Picture", and "Your Description"; the pictures and their descriptions are missing from this copy.]
Q5. [1 mark] Explain the difference between Machine Learning and Deep Learning.
Machine Learning: A subset of AI that focuses on building algorithms to analyze data, learn from
it, and make predictions. It often uses structured data and requires feature engineering.
Deep Learning: A subset of machine learning that uses neural networks with many layers (deep
architectures). It can automatically extract features from raw data without extensive preprocessing.
Q6. [1 mark] Explain the difference between the convolution step and the pooling step.
Convolution Step: This operation applies a filter (kernel) to the input data to extract features such as
edges, shapes, or textures.
Pooling Step: This operation reduces the spatial size of the feature map by summarizing the most
significant information (e.g., max or average pooling), making the computation more efficient and
reducing overfitting.
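The convolution step described above can be sketched in plain Python. This is a minimal illustration, assuming a "valid" sliding window with stride 1 and no kernel flip (the cross-correlation form commonly used in CNNs); the image and the vertical-edge kernel are hypothetical.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hypothetical image: bright columns on the left, dark columns on the right.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# Hypothetical kernel that responds to a bright-to-dark vertical edge.
kernel = [
    [1, -1],
    [1, -1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The non-zero column in the feature map marks the position of the edge, which is the kind of feature the convolution step extracts.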
Q7. [1 mark] Explain, with a figure, the similarities and differences between the artificial neuron
and the human neuron.
Similarities:
o Both process inputs and generate outputs.
o Both involve weighted connections (synapses in human neurons, weights in artificial
neurons).
o Both aggregate multiple inputs and apply an activation (human neurons fire when the
potential exceeds a threshold, artificial neurons use activation functions).
Differences:
o Human neurons are biological and highly complex with adaptive learning mechanisms.
o Artificial neurons are mathematical constructs designed to mimic the function of biological
neurons in a simplified manner.
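The artificial neuron described above (weighted inputs, aggregation, activation) can be sketched as follows. The step activation, the weights, and the bias are assumptions chosen so that the neuron behaves like a logical AND; they are not taken from the assignment.

```python
def step(z, threshold=0.0):
    """Fire (1) only when the aggregated input exceeds the threshold."""
    return 1 if z > threshold else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(z)

# Hypothetical weights and bias that make the neuron act like a logical AND.
weights, bias = [1.0, 1.0], -1.5
outputs = [neuron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```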
The loss function quantifies the error between the predicted output and the actual target values. It
guides the training process by providing feedback to adjust the weights. Examples include:
o Mean Squared Error (MSE) for regression.
o Binary Cross-Entropy for binary classification.
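The two loss functions named above can be written out directly. This is a minimal sketch; the function names and the clipping constant `eps` (used only to avoid taking the logarithm of zero) are assumptions.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary Cross-Entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

regression_loss = mse([1, 2, 3], [1, 2, 5])                     # (0 + 0 + 4) / 3
classification_loss = binary_cross_entropy([1, 0], [0.5, 0.5])  # ln 2
```

Both functions return zero (or very nearly zero) for perfect predictions and grow as the predictions move away from the targets.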
The optimization function minimizes the loss function by adjusting the weights and biases of the
neural network. Common optimization algorithms include:
o Gradient Descent.
o Adam Optimizer.
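Plain gradient descent can be sketched on a one-parameter problem. The loss (w - 3)^2, the learning rate, and the number of steps are assumptions for illustration; Adam is not shown here.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly move the weight against the gradient of the loss."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Hypothetical loss: (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Each update shrinks the distance to the minimum by a constant factor in this example, so `w_final` ends up very close to 3.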
Pooling Step Explanation: Pooling reduces the dimensionality of the feature map by summarizing
regions (e.g., taking the maximum or average value).
1. What will you do if you have a dataset with missing values in the Age column?
Answer: Apply imputation (mean/median/mode or KNN) or remove rows/columns with excessive missing data.
2. How would you handle a dataset that contains noisy entries such as negative salary values?
Answer: Detect and correct the entries using statistical methods or domain knowledge.
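Both cleaning steps can be sketched with the standard library. The sample values and helper names are hypothetical, and treating a negative salary as missing and then imputing the median is only one possible correction; domain knowledge may suggest a different one.

```python
from statistics import median

def impute_with_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_negative(values):
    """Mark negative entries as missing (None) so they can be corrected."""
    return [None if v is not None and v < 0 else v for v in values]

# Hypothetical columns with a missing Age and a negative Salary.
ages = [25, None, 40, 31, None]
salaries = [3000, -500, 4200, 3900]

clean_ages = impute_with_median(ages)
clean_salaries = impute_with_median(flag_negative(salaries))

print(clean_ages)      # [25, 31, 40, 31, 31]
print(clean_salaries)  # [3000, 3900, 4200, 3900]
```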
Step 1: Convolution
-6 -8 -4 0
-12 -14 -6 -2
-8 -8 -10 -2
-3 -2 -4 -6
Step 2: Pooling
For 2×2 pooling, split the convolution result into 2×2 regions and apply max pooling (maximum value in
each region).
Pooling Result:
-6 0
-2 -2
or, if the 2×2 window moves one column at a time (horizontal stride 1, vertical stride 2):
-6 -4 0
-2 -2 -2
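The pooling results above can be reproduced with a short sketch. The function `max_pool2d` and its stride parameters are assumptions for illustration. The first result uses non-overlapping 2×2 regions; the second appears to correspond to a window that moves one column at a time while still moving two rows at a time.

```python
def max_pool2d(matrix, size=2, stride_rows=2, stride_cols=2):
    """Take the maximum of each size x size window of the matrix."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(0, rows - size + 1, stride_rows):
        pooled_row = []
        for j in range(0, cols - size + 1, stride_cols):
            window = [matrix[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

# Convolution result from Step 1.
conv_result = [
    [-6, -8, -4, 0],
    [-12, -14, -6, -2],
    [-8, -8, -10, -2],
    [-3, -2, -4, -6],
]

pooled = max_pool2d(conv_result)                     # non-overlapping 2x2 regions
pooled_alt = max_pool2d(conv_result, stride_cols=1)  # window slides one column at a time

print(pooled)      # [[-6, 0], [-2, -2]]
print(pooled_alt)  # [[-6, -4, 0], [-2, -2, -2]]
```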