Unit 3 & 4 Question Bank

Uploaded by

Shritika Chandra
Copyright: © All Rights Reserved

MIT ADT University, Pune

MIT Art Design and Technology University


MIT School of Computing, Pune
Department of Computer Science and Engineering

Unit-wise Question Bank


Subject – Big Data Analytics
Class – LY AIA (SEM-I)

A.Y. 2024 - 2025


Question Bank Unit-III

21BTCS038 – Big Data Analytics


Question No | Question | Bloom's Taxonomy Level | Course Outcome (e.g. CO1, CO2, etc.)
1 | Suppose everyone who visits a retail website gets one promotional offer or no promotion at all. We want to see if making a promotional offer makes a difference. What statistical method would you recommend for this analysis? | Analyzing | CO3
2 | Apply K-Means clustering to a dataset of your choice and interpret the results. | Apply | CO3
3 | You are analyzing two normally distributed populations, and your null hypothesis is that the mean of the first population is equal to the mean of the second. Assume the significance level is set at 0.05. If the observed p-value is 4.33e-05, what will be your decision regarding the null hypothesis? | Remember | CO3
4 | Explain the k-means algorithm. | Evaluate | CO3
5 | Using the age and height clustering example, algebraically illustrate the impact on the measured distance when the height is expressed in meters rather than centimeters. Explain why different clusters will result depending on the choice of units for the patient's height. | Understand | CO3
6 | In the use of a categorical variable with n possible values, explain the following: (a) why only n - 1 binary variables are necessary; (b) why using n variables would be problematic. | Remember | CO3
7 | In the example of using Wyoming as the reference case, discuss the effect on the estimated model parameters, including the intercept, if another state were selected as the reference case. | Understand | CO3
8 | Describe how logistic regression can be used as a classifier. | Analyze | CO3
9 | If the probability of an event occurring is 0.4, then: (a) what is the odds ratio? (b) what is the log odds ratio? | Create | CO3
10 | Choose a topic of your interest, such as a movie, a celebrity, or any buzzword. Then collect 100 tweets related to this topic. Hand-tag them as positive, neutral, or negative. Next, split them into 80 tweets as the training set and the remaining 20 as the testing set. Run one or more classifiers over these tweets to perform sentiment analysis. What are the precision and recall of these classifiers? Which classifier performs better than the others? | Apply | CO3
11 | Analyze the strengths and weaknesses of using clustering for customer segmentation. | Analyze | CO3
12 | Evaluate the effectiveness of TF-IDF in representing textual data. | Evaluate | CO3
13 | Predict house prices using an advanced regression model and evaluate the performance. | Apply | CO3
14 | How are association rules used in market basket analysis? | Understand | CO3
15 | What are the key differences between hierarchical clustering and K-Means clustering? | Understand | CO3
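
Worked sketch for Questions 3 and 9 (not part of the original question bank). The arithmetic behind these two questions can be checked in a few lines of R. This is a minimal sketch under stated assumptions: Question 9 is read as asking for the odds p / (1 - p) of a single event and its natural logarithm, and the variable names (p, odds, log_odds, alpha, p_value, reject_null) are illustrative only.

```r
# Question 9: odds and log odds for an event with probability p = 0.4
# (assumption: "odds ratio" in the question means the odds p / (1 - p))
p <- 0.4
odds <- p / (1 - p)      # 0.4 / 0.6
log_odds <- log(odds)    # natural logarithm of the odds

# Question 3: compare the observed p-value with the significance level
alpha <- 0.05
p_value <- 4.33e-05
reject_null <- p_value < alpha   # TRUE means the null hypothesis is rejected

print(round(odds, 4))      # 0.6667
print(round(log_odds, 4))  # -0.4055
print(reject_null)         # TRUE
```

Because the p-value is far below 0.05, the sketch indicates rejecting the null hypothesis of equal means; the negative log odds reflect that the event is less likely than not.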
Question Bank Unit-IV


21BTCS038 – Big Data Analytics
Question No | Question | Bloom's Taxonomy Level | Course Outcome (e.g. CO1, CO2, etc.)
1 | Explain the features of R programming. | Analyzing | CO4
2 | Write an R program to find the nth highest value in a given vector: x = c(10, 20, 30, 20, 20, 25, 9, 26) | Evaluate | CO4
3 | How do you create a user-defined function in R? | Understand | CO4
4 | What types of loops exist in R, and what is the syntax of each type? | Remember | CO4
5 | What types of data plots can be created in R? | Understand | CO4
6 | What is the difference between the subset() and sample() functions in R? | Analyze | CO4
7 | Consider the data frame given below: i. Create a subset where subject is less than 4 by using the subset() function and demonstrate the output. ii. Create a subset where the subject column is less than 3 and the class equals 2 by using [ ] brackets and demonstrate the output. | Evaluate | CO4
8 | The data analyst of Argon technology, Mr. John, needs to enter the salaries of 10 employees in R. The salaries of the employees are given in the following table. i. Which R command will Mr. John use to enter these values? Demonstrate the output. ii. Mr. John now wants to add the salaries of 5 new employees to the existing table. Which command will he use to join the datasets with the new values in R? Demonstrate the output. | Evaluate | CO4
9 | i. Write the script to sort the values contained in the following vector in ascending order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the output. ii. Name and explain the operators used to form data subsets in R. | Evaluate | CO4
10 | Explain the different applications of R. | Analyze | CO4
11 | What is meant by a factor in R? | Understand | CO4
12 | How do you import a .csv file and a JSON file in R? | Evaluate | CO4
13 | How do you import web data in R? | Remember | CO4
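
Worked sketch for Questions 2 and 9(i) (not part of the original question bank). One possible approach to both uses only base R's sort() and unique(). The helper name nth_highest is hypothetical, and the sketch assumes that "nth highest" refers to distinct values, so the repeated 20s in x count once; a different reading of the question would drop the unique() call.

```r
# Question 2: nth highest value in a vector
x <- c(10, 20, 30, 20, 20, 25, 9, 26)

# Hypothetical helper: returns the nth highest DISTINCT value of v
nth_highest <- function(v, n) {
  distinct_desc <- sort(unique(v), decreasing = TRUE)
  if (n < 1 || n > length(distinct_desc)) {
    stop("n must be between 1 and the number of distinct values")
  }
  distinct_desc[n]
}

second_highest <- nth_highest(x, 2)
print(second_highest)  # 26

# Question 9(i): sort a vector in ascending and descending order
v <- c(23, 45, 10, 34, 89, 20, 67, 99)
ascending <- sort(v)
descending <- sort(v, decreasing = TRUE)
print(ascending)   # 10 20 23 34 45 67 89 99
print(descending)  # 99 89 67 45 34 23 20 10
```

Sorting the distinct values once and indexing into the result keeps the helper short; for very long vectors a partial sort could be cheaper, but that is beyond what the question asks.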
