AssignmentQuestion4Bigdata_2025

The document outlines the second assignment for a Big Data Analytics course, consisting of multiple questions covering topics such as Hadoop, Big Data characteristics, machine learning, regression techniques, clustering algorithms, and neural networks. Each question requires detailed explanations, comparisons, and diagrams where applicable. The assignment emphasizes individual work and must be submitted before the final exam.

Uploaded by

solomon
Big Data Analytics Second Assignment questions:

This is an individual, handwritten assignment. It should be submitted before the final exam of the Big Data Analytics course.
_________________________________________________________________________________
1. a) Explain the differences between Hadoop 1.0 and Hadoop 2.x
b) How would you show your understanding of the tools, trends and technology in big data?
c) What is the difference between Hadoop and Traditional RDBMS?
d) Explain the various V's (characteristics) of Big Data with suitable examples

2. a) Explain the different types of Big Data Analytics and their relationship
b) Explain the Hadoop 2.x application workflow with a neat diagram
c) Explain the differences between the Hadoop and Spark ecosystems
d) Describe the different stages in Big Data analytics with a neat diagram

3. a) What is machine learning? How is machine learning different from the traditional
programming approach?
b) Explain the features of Machine learning.
c) Describe the different phases of Machine learning with a neat diagram
d) Explain the steps involved in Machine Learning to develop a model.

4. a) Explain the differences between supervised learning and unsupervised learning with
suitable examples
b) Explain the major regression techniques with suitable examples
c) Explain cross-validation and why it’s essential.
d) Which metrics would you use to evaluate a regression model, and why?

5. a) Explain the following machine learning algorithms:
i. Polynomial Regression ii. KNN iii. SVM
b) Define overfitting and underfitting. What techniques can you use to combat them?
c) Describe the logistic function and explain its suitability for classification.
d) Explain k-fold cross-validation and why it’s preferred over a single train/test split.

6. a) What is a Confusion Matrix? Explain how accuracy, sensitivity and specificity can be
calculated using it.
b) Explain the significance and relationship between dependent and independent variables
in model development
c) Describe how decision trees split nodes based on purity measures (e.g., Gini index,
entropy).
d) How do you handle hyperparameter tuning?

7. a) Explain the differences between Lasso, Ridge and Elastic net regression techniques
b) Explain the differences between different cluster classification algorithms
c) Discuss the impact of tree depth on bias and variance.
d) What are the practical trade-offs between using random forests and gradient boosting
machines?

8. a) What is Euclidean Distance?
b) Explain the K-means clustering algorithm with a suitable example
c) Compare how bagging (e.g., Random Forest) and boosting (e.g., XGBoost, LightGBM)
reduce variance and bias differently.
d) Explain when polynomial regression might be more appropriate than linear regression.
What are the risks involved?

9. Construct a confusion table from the data below and calculate the values listed in part a).

Actual Value    Predicted Value
True            False
False           True
True            True
False           False
True            False
False           True
True            True
False           False
True            True
False           False
False           False
True            False
True            False
False           True
True            True
False           False
False           True
True            True
False           False
True            True

a) i. Accuracy ii. Misclassification rate iii. Precision iv. Recall v. Sensitivity vi. Specificity
vii. True Positive Rate (TPR) viii. False Positive Rate (FPR)
b) How do you interpret an ROC curve and an AUC score?
c) Explain the differences between Decision Trees and Random forest
d) Describe the differences between bagging and boosting
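
For question 9 a), the tallying and the ratio calculations can be sketched in Python as below. This is only an illustrative sketch, not a required method: it assumes that "True" is the positive class, and the names `pairs`, `confusion_counts` and `metrics` are made up for this example. Under that assumption, sensitivity, recall and TPR are the same ratio.

```python
# Rows copied from the question 9 table as (actual, predicted) pairs.
# Assumption: True is treated as the positive class.
pairs = [
    (True, False), (False, True), (True, True), (False, False),
    (True, False), (False, True), (True, True), (False, False),
    (True, True), (False, False), (False, False), (True, False),
    (True, False), (False, True), (True, True), (False, False),
    (False, True), (True, True), (False, False), (True, True),
]


def confusion_counts(rows):
    """Return (TP, TN, FP, FN) for a list of (actual, predicted) booleans."""
    tp = sum(1 for actual, pred in rows if actual and pred)
    tn = sum(1 for actual, pred in rows if not actual and not pred)
    fp = sum(1 for actual, pred in rows if not actual and pred)
    fn = sum(1 for actual, pred in rows if actual and not pred)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Compute the ratios asked for in part a) from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "recall_sensitivity_tpr": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }


if __name__ == "__main__":
    counts = confusion_counts(pairs)
    print("TP, TN, FP, FN =", counts)
    for name, value in metrics(*counts).items():
        print(f"{name}: {value:.2f}")
```

Checking the printed counts against a hand-drawn 2x2 table is a quick way to confirm the tally before computing the ratios.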

10. a) Define a multilayer perceptron. Why is at least one hidden layer necessary for learning
more complex functions?
b) Compare and contrast convolutional layers with fully connected layers. Why do
convolutional layers generally require fewer parameters?
c) Describe the components of a single artificial neuron (perceptron). How do they relate
to biological neurons?
d) List common activation functions (e.g., sigmoid, tanh, ReLU). Explain their
mathematical form and discuss their advantages and disadvantages.
