Question Bank( DA)-1

Uploaded by

praneet trimukhe

UNIT WISE BTL QUESTION BANK

BTL – Bloom's Taxonomy Level

Level 1 – Remembering
Level 2 – Understanding
Level 3 – Applying
Level 4 – Analyzing
Level 5 – Evaluating
Level 6 – Creating
Sl. No. Questions BTL Level Course Outcome
UNIT – I
Part – A (2 Marks)
1 Why is Data Analytics so important? L1 CO1
2 What do you mean by Predictive Data Analytics? L1 CO1
3 What are the tools used in Data Analytics? L1 CO1
4 What are the three essential models in data architecture? L1 CO1
5 What is sensor data? L1 CO1
6 Define Qualitative Data & Quantitative Data. L1 CO1
7 Write about GPS and Signal data. L1 CO1
8 What are the features in datasets? L1 CO1
9 What do you mean by Information? L1 CO1
10 What is the difference between error data and noisy data? L1 CO1
Part – B (5 Marks)
1 Categorize the data pre-processing in detail. L4 CO1
2 Demonstrate the features of Amazon Web Services or Google Cloud Platform. L2 CO1
3 Identify and explain the secondary sources of data. L3 CO1
4 Compare the types of data analytics. L4 CO1
5 Explain data processing in detail. L3 CO1
6 List and explain the sources of data. L2 CO1
7 Illustrate how to handle missing values, or explain Missing Imputation. L2 CO1
8 Explain Data Management. L2 CO1
9 Compare the differences between Qualitative Data & Quantitative Data. L5 CO1
10 Explain data architecture in detail, along with the factors that influence Data Architecture. L2 CO1
UNIT – II
Part – A (2 Marks)
1 What is the Importance of Analytics? L1 CO2
2 What is the role of Business Data Analytics? L1 CO2
3 List three DA tools that work on the Hadoop stack. L1 CO2
4 List any 4 tools used in Data Analytics. L1 CO2
5 Write short notes on Data Modeling. L1 CO2
6 Briefly describe some applications of DA. L1 CO2
7 List the features of Apache Spark. L1 CO2
8 Why and where are pandas and matplotlib useful? L1 CO2
9 What is Missing Imputation? L1 CO2
10 What is a data warehouse? L1 CO2
Part – B (5 Marks)
1 Categorize the tools used in Data Analytics. L4 CO2
2 Describe Data Modeling in detail. L3 CO2
3 Contrast the types of Business Data Analytics. L4 CO2
4 Explain in detail the ways to use Data Analytics. L2 CO2
5 Identify the steps involved in Data Analytics and explain them in detail. L3 CO2
6 Summarize the different primary Analytics Tools for DA. L2 CO2
7 Illustrate how Apache Spark is built on Hadoop, and its components. L2 CO2
8 Compare SQL & NoSQL databases. L5 CO2
9 Describe different data types and variables. L3 CO2
10 Explain Missing Imputation in detail. L2 CO2
UNIT – III
Part – A (2 Marks)
1 When is Regression chosen? L1 CO3
2 List the Regression Analysis Techniques. L1 CO3
3 What are the advantages and limitations of Linear Regression? L1 CO3
4 How do you calculate B1 & B0 using Correlation and Standard Deviation? L1 CO3
5 What do you mean by Unbiasedness and Least Variance? L1 CO3
6 Define Variable Rationalization. L1 CO3
7 What is Pruning? L1 CO3
9 What is DA Modeling? L1 CO3
10 Define Discrete and Continuous Variables. L2 CO3
Part – B (5 Marks)
1 Explain Regression in detail with a suitable example. L2 CO3
2 Elaborate on Variable Rationalization. L3 CO3
3 Categorize discrete and continuous variables with examples. L4 CO3
4 Sketch various analytics applications to various Business Domains. L2 CO3
5 Explain the need for Business Modeling and Model Theory. L3 CO3
6 Compare the different types of NoSQL databases. L5 CO3
7 Describe Missing Imputation. L3 CO3
8 Compare and contrast Linear and Logistic Regression. L4 CO3
9 Illustrate the Model Building Life Cycle in Data Analytics. L2 CO3
10 Explain Regression along with Root Mean Squared Error. L3 CO3

UNIT – IV
Part – A (2 Marks)
1 What is Supervised Learning? L1 CO4
2 What is the major difference between Supervised & Unsupervised Learning? L1 CO4
3 What is Unsupervised Learning? L1 CO4
4 List some Supervised and Unsupervised Learning Techniques. L1 CO4
5 What are the types of Decision Tree Algorithms? L1 CO4
6 Write the terminologies used in Decision Trees and their representation. L1 CO4
7 Compare and contrast Entropy & Information Gain. L1 CO4
8 What are the appropriate problems for Decision Tree Learning? L1 CO4
9 What is Pruning in DTL? L1 CO4
10 Compare Overfitting and Underfitting. L2 CO4
Part – B (5 Marks)
1 Compare and contrast Supervised & Unsupervised Learning. L4 CO4
2 Explain Multiple Decision Trees and Random Forest. L2 CO4
3 Write and explain the advantages and limitations of DTL. L3 CO4
4 Explain in detail the Segmentation approach in Data Analytics. L2 CO4
5 Categorize time series methods in Data Analytics. L5 CO4
6 Write briefly about ARMA and ARIMA. L2 CO4
7 Compare and contrast Classification vs Regression, their methods, and CART. L4 CO4
8 Interpret the STL approach in Data Analytics. L2 CO4
9 Explain in detail the appropriate problems for Decision Tree Learning. L3 CO4
10 Explain in brief the types of Decision Tree Algorithms. L2 CO4
UNIT – V
Part – A (2 Marks)
1 What is Data Visualization? L1 CO5
2 Why is Data Visualization required? Elaborate. L1 CO5
3 What is a Tree Map? L1 CO5
4 What are the different Categories of Data Visualizations? L1 CO5
5 What is a Line Plot? L1 CO5
6 How is a Pie Chart represented? L1 CO5
7 What is a Scatter Plot? L1 CO5
8 What is a Box Plot? L1 CO5
9 What is Circle Packing? L1 CO5
10 What are a Chernoff Face and a Stick Figure? L1 CO5
Part – B (5/7 Marks)
1 Explore in detail the different Geometric Projection Visualization Techniques. L3 CO5
2 Explore the Pixel-Oriented Visualization Techniques. L3 CO5
3 Contrast the Icon-Based Visualization Techniques. L4 CO5
4 Elucidate the Hierarchical Visualization Techniques. L3 CO5
5 Briefly explain Data Visualization. L2 CO5
6 List and briefly explain the different terminologies used in a Box Plot. L3 CO5
7 Define Word Cloud. Explore the Visualizing of Complex Data and Relations. L2 CO5
8 Explain in detail the different Categories of Data Visualizations. L3 CO5
9 Explain the STL approach. L2 CO5
10 Discuss Time Series Methods. Explain ARIMA & ARMA. L3 CO5

OBJECTIVE TYPE QUESTIONS


UNIT – 1
MULTIPLE CHOICE QUESTIONS
1. Most of the data is generated from ______. [ ]
A. Print media B. Organizations
C. Social media D. e-commerce

2. Data Analytics is used to gather ______ in data. [ ]
A. hidden insights B. perform market analysis
C. Interesting Patterns D. All the above

3. Market Analysis can be performed to understand the ______ of competitors. [ ]
A. Strengths B. Weaknesses
C. Both Strengths and weaknesses D. Profits and loss

4. ______ policies and rules will help describe the manner in which an enterprise wishes to process its data. [ ]
A. Working Policies B. Labor Policies
C. Business policies D. Administration

5. The General Approach is based on designing the Architecture at ______ Levels of Specification. [ ]
A. Logical Level B. Physical
C. Implementational Level D. All

6. ______ is a group of non-numerical data such as words, sentences. [ ]
A. quantitative data B. Big Data
C. qualitative data D. Analytics

7. The data which is raw, original, and extracted directly from the official sources is known as ______. [ ]
A. Secondary Data B. primary data
C. Input Data D. Processed Data

8. CRD: ______ [ ]
A. Completely Randomized Design B. Complete Rough Data
C. Complete Raw Data D. Complete Raw Design

9. LSD – Latin Square Design is ______ squares with an equal number of rows and columns. [ ]
A. N x N B. N x M
C. N x 1 D. 1 x N

10. ______ is the assessment of how much the data is usable and fits its serving context. [ ]
A. data quality B. Data Integrity
C. Data Quantity D. Data Interpretability

FILL IN THE BLANKS:

11. ANOVA: ______.
12. ______ is the data which has already been collected and is reused for some valid purpose.
13. ______ is about handling of missing data, noisy data, etc.
14. ______ is a term that refers to storing and accessing data over the internet.
15. Amazon S3: ______.
16. ______ is a point or an observation that deviates significantly from the other observations.
17. Reasons for outliers: ______.
18. An increase in the error variance and a reduction in the power of statistical tests are due to ______.
19. PMM: ______.
20. The ______ approach groups similar data.

UNIT-2

MULTIPLE CHOICE QUESTIONS


1. ______ is a leading analytics tool used for statistics and data modeling. [ ]
A. Java Programming B. C Programming
C. R Programming D. C++ Programming

2. ______ is software that connects to any data source such as Excel, a corporate Data Warehouse, etc. [ ]
A. Tableau B. R
C. Java D. Python

3. ______ can be assembled on any platform like SQL Server, a MongoDB database or JSON. [ ]
A. Java Programming B. Python Programming
C. R Programming D. C++ Programming

4. ______ provides various machine learning and visualization libraries such as Scikit-learn, TensorFlow, Matplotlib, Pandas, Keras, etc. [ ]
A. Java Programming B. Python Programming
C. R Programming D. C++ Programming

5. ______ is one of the largest large-scale data processing engines that executes applications in Hadoop clusters. [ ]
A. Python B. R
C. Ruby D. Apache Spark

6. ______ is also known as Google Refine. [ ]
A. Open_Refine B. Closed Refine
C. Wide_Refine D. Big Refine

7. ______ is a collection of tightly or loosely connected computers that work together so that they act as a single entity. [ ]
A. Cluster computing B. Wide Computing
C. Big Computing D. Close Computing

8. ______ is a lightning-fast cluster computing technology, designed for fast computation. [ ]
A. Big Computing B. Distributed computing
C. Apache Spark D. MySQL

9. Spark helps to run an application in ______. [ ]
A. Hadoop cluster B. Oracle
C. Big Cluster D. NoSQL

10. ______ is a component on top of Spark Core that introduces a new data abstraction. [ ]
A. SQL B. Spark SQL
C. Spark D. NoSQL
FILL IN THE BLANKS:

11. ______ is a distributed machine learning framework above Spark.
12. ______ is a distributed graph-processing framework on top of Spark.
13. Scala is a statically typed programming language that incorporates both ______ programming.
14. Scala primarily runs on ______.
15. The name Scala is a portmanteau of ______.
16. Cloudera Impala is Cloudera's open source massively parallel processing ______.
17. Cloudera Impala is a query engine that runs on ______.
18. ______ Database is a non-relational Data Management System.
19. MongoDB is an example of a ______ database for document-oriented data.
20. Example of a Graph Database: ______.
UNIT-3

MULTIPLE CHOICE QUESTIONS


1. The term ______ is used to indicate the estimation or prediction of the average value of one variable for a specified value of another variable. [ ]
A. Segregation B. Progression
C. Regression D. Aggregation

2. In simple linear regression we want to model our data as ______. [ ]
A. y = B0*x * B1 B. y = B0 + B1 * x
C. y = B1 * x D. y = B0 + x

3. RMSE can be computed as: [ ]
A. $\frac{1}{n}\sum_{i=1}^{n} Err_i$ B. $\frac{1}{n}\sum_{i=1}^{n} Err_i^2$
C. $\sqrt{\frac{1}{n}\sum_{i=1}^{n} Err_i^2}$ D. $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} Err_i^2}$

4. When we have a single input attribute (x) and we want to use linear regression, this is called ______. [ ]
A. Multiple Linear Regression B. Continuous Linear Regression
C. simple linear regression D. Auto Linear Regression

5. In R, the function used to find a linear relation between x & y is ______. [ ]
A. lm(y,x) B. lm(y~x)
C. Linear(y,x) D. predict(x,y)

6. The goal of ______ is to improve the Data Processing in an optimal way through attribute subset selection. [ ]
A. Rationalization B. Variable correlation
C. Variable Rationalization D. Various Rationalization

7. ______ is a mathematical approach to create a statistical model to forecast future behavior based on input test data. [ ]
A. Progressive modeling B. Professional modelling
C. Predictive modeling D. Pro-active modeling

8. Logistic Regression is ______ modeling. [ ]
A. Supervised B. Semi Supervised
C. Unsupervised D. Reinforcement learning

9. In multinomial Logistic regression, there can be 3 or more possible ______ types of the dependent variable. [ ]
A. ordered B. Semi-ordered
C. unordered D. Under ordered

10. In Logistic Regression, the dependent variable must be ______ in nature. [ ]
A. Categorical B. Continuous
C. Correlated D. Classical

FILL IN THE BLANKS:

11. In ordinal Logistic regression, there can be 3 or more possible ______ types of dependent variables.
12. In ______, there can be only two possible types of the dependent variable: Class-0 & Class-1.
13. ______ is a statistical phenomenon in which multiple independent variables show high correlation between each other and they are too inter-related.
14. Logistic Regression uses a complex function, known as the ______.
15. Sigmoid function is ______.
16. False Positive is a Type-2 Error (True/False): ______.
17. False Negative is a Type ______ (I or II) Error.
18. In ______ error, the actual value was negative but the model predicted a positive value.
19. Formula for Precision: ______.
20. Formula for F1-Score: ______.

UNIT-4

MULTIPLE CHOICE QUESTIONS:


1. Supervised learning is a learning method in which models are trained using ______. [ ]
A. Unlabeled data B. Raw Data
C. Labeled data D. Complete Data

2. Unsupervised learning is a method in which ______ are inferred from the unlabeled input data. [ ]
A. Classes B. Patterns
C. Errors D. outputs

3. Supervised learning needs supervision to ______ the model. [ ]
A. Update B. Test
C. Train D. Both Train & Test

4. The purpose of ______ is to better understand your customers rather than data. [ ]
A. Segmentation B. Regression
C. Segregation D. Correlation

5. Demographic Segmentation is a ______ segmentation. [ ]
A. Can be Non-Objective or objective B. Non-Objective
C. Objective D. Semi-Objective

6. Decision Tree is a ______ technique. [ ]
A. supervised learning B. unsupervised learning
C. Semi-supervised learning D. Non-supervised learning

7. It is a tree-structured classifier, where ______ represent the features of a dataset. [ ]
A. internal nodes & leaf nodes B. Root & Internal nodes
C. Root nodes & leaf nodes D. Leaf nodes

8. In a regression decision tree, the decision or the outcome variable is ______. [ ]
A. Continuous B. Categorical
C. Discrete D. Distant

9. ______ is the process of removing the unwanted branches from the tree. [ ]
A. Edging B. Pruning
C. Regression D. Predicting

10. The decision tree is ______ to errors. [ ]
A. Tolerable B. Non-tolerable
C. Sensitive D. Doesn’t allow

FILL IN THE BLANKS:

11. Supervised learning can be categorized into ______ problems.
12. ______ can be classified into Clustering and Association problems.
13. Linear Regression and Logistic Regression are unsupervised learning models (True/False): ______.
14. Decision Tree is the successor of ______.
15. CART is ______.
16. The best attribute in the dataset is found using ______.
17. ______ is a measure of the randomness in the information being processed.
18. ______ is a statistical property that measures how well a given attribute separates the training examples.
19. ______ is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points and, in turn, fails in testing.
20. ARIMA is an acronym that stands for ______.
UNIT- 5
MULTIPLE CHOICE QUESTIONS
1. ______ is the art and practice of gathering, analyzing, and graphically representing empirical information. [ ]
A. Data Modification B. Data visualization
C. Data Validation D. Data Updating

2. ______ is used to get graphical output from data predictive analytics results. [ ]
A. Tabular Data B. Total Blue
C. Tableau D. Tally

3. Data Visualization induces the viewer to think about the substance rather than about ______ through graphic design. [ ]
A. Output B. outcome
C. Methodology D. error

4. We need to choose the dimensions and measures in the process of ______ Data. [ ]
A. Extracting B. Estimating
C. Expressing D. Exploring

5. ______ are the category type data points such as landing page, source medium, etc. [ ]
A. Directions B. Dimensions
C. Detections D. Du-points

6. One of the great qualities Tableau has is its ability to filter data in real time [ ]
A. show room B. Show space
C. Show case D. Work space

7. DPA: ______ [ ]
A. Data presentation architecture B. Dual presentation architecture
C. Data preparation architecture D. Directive presentation architecture

8. Data visualization is viewed by many disciplines as a modern equivalent of ______. [ ]
A. Virtual Communication B. Viral Communication
C. Visual Communication D. Vivid Communication

9. ______ is both an art and a science. [ ]
A. Data Virtualization B. Data Visualization
C. Data Variation D. Data Variance

10. ______ can be more precise and revealing than conventional statistical computations. [ ]
A. Data Representation B. Data Virtualization
C. Graphical representation D. Graphical Computation
FILL IN THE BLANKS:

11. Gain insight into an information space by mapping data onto ______ to provide a qualitative overview of large data sets.
12. ______ can be defined as the deviation of the Forecast or Prediction from the actual results.
13. In MFA, Error = ______.
14. CHAID stands for ______.
15. Regression tree analysis is when the predicted outcome can be considered a ______.
16. A ______ tree is a binary decision tree that is constructed by splitting a node into two child nodes repeatedly.
17. Decision Tree Learning can handle both numerical and categorical data (True/False): ______.
18. Decision Tree uses a White box Model (True/False): ______.
19. Regression trees / parallel regression modeling, in which the dependent variable is ______.
20. The CART growing method attempts to ______ within-node homogeneity.

OBJECTIVES KEY
UNIT-1 KEY
1). C 2). D 3). C 4). C 5). D 6). C 7). B 8). A 9). A 10). A
11). Analysis of Variance (ANOVA). 12). Secondary data 13). Data Cleaning
14). Cloud computing 15). Simple Storage Service 16). Outlier 17). Experimental
errors or special circumstances 18). Outliers 19). Predictive Mean Matching (PMM).
20). Clustering
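
Items 13, 16 and 18 of this key (data cleaning, outliers) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: mean imputation and the 1.5 x IQR rule are common textbook choices and are not prescribed by this question bank, and the function names `impute_mean` and `iqr_outliers` are hypothetical.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values.

    Mean imputation is an assumed, simple strategy; PMM (item 19) is a
    different, more elaborate method that is not shown here.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    print(impute_mean([10, None, 14]))
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 10, 95]))
```

Outliers are checked here on complete data; imputing with a mean that already includes an extreme value can hide that value from the IQR rule, which is one reason outliers inflate error variance.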

UNIT –2 KEY
1). C 2). A 3). B 4). B 5). D 6). A 7). A 8). C 9). A 10). B
11). MLlib 12). GraphX 13). functional and object oriented 14). JVM platform
15). "scalable" and "language" 16). SQL query engine 17). Apache Hadoop
18). NoSQL 19). NoSQL 20). Neo4j or Amazon Neptune

UNIT-3 KEY
1). C 2). B 3). C 4). C 5). B 6). C 7). C 8). A 9). C 10). A
11). Ordered 12). Binomial Logistic regression 13). Multi-Collinearity

14). Sigmoid function 15). $z = \mathrm{sigmoid}(y) = \sigma(y) = \frac{1}{1 + e^{-y}}$
16). False 17). Type-II 18). Type-I or False Positive
19). Precision = TP / (TP + FP) 20). F1-Score = 2 × Precision × Recall / (Precision + Recall)
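
The Unit-3 formulas (B0 and B1 from correlation and standard deviation, RMSE, the sigmoid function, Precision and F1-Score) can be checked numerically with a minimal Python sketch. The helper names below are hypothetical, and the sketch assumes the usual textbook definitions: B1 = r × (s_y / s_x), B0 = mean(y) − B1 × mean(x), and RMSE as the square root of the mean squared error.

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


def fit_line(xs, ys):
    """Simple linear regression y = B0 + B1 * x via correlation and std. deviation."""
    b1 = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt of the mean of squared errors."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def sigmoid(y):
    """Logistic (sigmoid) function 1 / (1 + e^-y)."""
    return 1.0 / (1.0 + math.exp(-y))


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]  # made-up data lying on y = 1 + 2x
    b0, b1 = fit_line(xs, ys)
    preds = [b0 + b1 * x for x in xs]
    print(b0, b1, rmse(ys, preds), sigmoid(0), f1_score(tp=8, fp=2, fn=2))
```

The sample data are invented for illustration; on points that lie exactly on a line the fitted RMSE is (up to rounding) zero.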

UNIT-4 KEY
1. C 2. B 3. D 4. A 5. B 6. A 7. B 8. A 9. B 10. A
11. Classification and Regression 12. Unsupervised Learning
13. False 14. ID3 15. Classification and Regression Tree
16. Attribute Selection Measure (ASM) 17. Entropy 18. Information gain
19. Overfitting 20. Auto Regressive Integrated Moving Average
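
Entropy and Information Gain (items 17 and 18) are the quantities an attribute selection measure compares when choosing a split. The following is a minimal Python sketch assuming the standard base-2 Shannon entropy; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    play = ["yes", "yes", "no", "no"]
    outlook = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes perfectly
    windy = ["t", "f", "t", "f"]                    # tells us nothing about the class
    print(information_gain(play, outlook), information_gain(play, windy))
```

A decision tree learner in the ID3 family would pick the attribute with the larger gain, here the hypothetical `outlook` column.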

UNIT-5 KEY
1. B 2. C 3. C 4. A 5. B 6. A 7. A 8. C 9. B 10. A
11. Graphical Primitives 12. Measure of Forecast Accuracy
13. Actual demand – Forecast 14. CHI-squared Automatic Interaction Detector
15. Real Number 16. CART 17. True 18. True 19. Quantitative
20. Maximize

**************************************************************************
