
Big Data Question Bank

Copyright
© All Rights Reserved

Q. No Question Module No CO No Bloom Level


1 What is Big Data? 2 2 CO1
2 List out the best practices of Big Data Analytics. 1 2 CO1
3 Write down the characteristics of Big Data Applications. 1 2 CO1
4 What is HDFS? 2 2 CO1
5 What is Map Reduce Programming Model? 3 2 CO1
6 How would you show your understanding of classic map-reduce? 3 2 CO1
7 Can you identify the different statistical concepts required for Big data? 1 2 CO1
8 Point out the characteristics of Hadoop. 2 2 CO1
9 What do you mean by regression? 5 2 CO1
10 List the details of reducer size and replication rate. 3 2 CO1
11 Analyze why we need NoSQL. 4 2 CO1
12 Examine the differences between HBase and Hive. 5 2 CO1
13 Examine the need for Apache Pig. 4 2 CO1
14 Differentiate between analysis and analytics. 1 2 CO1
15 List out various terminologies in Big Data environments? 1 2 CO1
16 List out the various use cases of Hadoop? 2 2 CO1
17 State how HDFS stores massive data in Hadoop Cluster? 2 2 CO1
18 What are the NameNode, Secondary NameNode and JobTracker? 3 2 CO1
19 Define what happens if the mapper output does not match the reducer input? 3 2 CO1
20 Classify Pig Latin commands in Pig? 4 2 CO1
21 List out the Data types in Hive? 4 2 CO1
22 Interpret joins with an example. 4 2 CO1
23 When should regression be used? 5 2 CO1
24 Discuss the modes of Pig scripts? 4 2 CO1
25 Identify block replication in HDFS? 2 2 CO1
26 What is ‘Supervised Learning’? 5 2 CO1
27 List major applications of machine learning. 1 2 CO1
28 Define Input Split? 3 2 CO1
29 Define the properties of Pig. 4 2 CO1
30 Define the complexity theory for MapReduce. 3 2 CO1
31 Compare learning vs programming. 1 2 CO2
32 Define Machine Learning and its types. 5 2 CO2
33 What is Avro? How is serialization achieved in Hadoop? 2 2 CO2
34 Summarize the advantages and disadvantages of Machine Learning. 1 2 CO2
35 Discuss the comparison of Pig with databases. 4 2 CO2
36 What is overfitting in machine learning? 5 2 CO2
37 How do you find hidden data in data storage? 1 2 CO2
38 Implement the Input Format for Compute-Intensive applications? 4 2 CO2
39 Generalise the term Record Reader/Writer? 4 2 CO2
40 What is Big R language? 5 2 CO2
41 What is HQL? 4 2 CO2
42 Explain the Pig Latin application flow? 4 2 CO2
43 Distinguish between Clustering and Association Analysis. 5 2 CO2
44 In Hive, explain the term ‘aggregation’ and its uses? 4 2 CO2
45 Write a shell command in Hive to list all the files in the current directory? 4 2 CO2
46 Why is a block in HDFS so large? 2 2 CO2
47 Define racks in Hadoop Cluster? 2 2 CO2
48 Discuss the Distributed Computing challenges? 2 2 CO2
49 What are the various applications of big data analytics? 1 2 CO2
50 Define Business Intelligence (BI). 1 2 CO2
51 Classify the differences between traditional Business Intelligence (BI) and Big Data. 1 10 CO2
52 Explain in detail the different types of analytics in Big Data. 1 10 CO2
53 Explain InfoSphere BigInsights and BigSheets, and how to find hidden data. 1 10 CO2
54 In a medical study, doctors recorded the average calorie intake for a group of adolescents in a specific year and their corresponding average increase in h… 5 10 CO5
55 Summarize how Big Data helps in decision making for the organization. How convenient is it for the organizational personnel in maintaining data in bulk at a sin… 5 10 CO5
56 Evaluate how big data analytics helps business people to increase their revenue. Discuss with any one real-time application. 5 10 CO5
57 Orbitz generates tremendous amounts of log data. The raw logs are only stored for a few days because of costly data warehousing. Orbitz needed an effe… 4 10 CO6
58 In a small start-up company, there are only 16 employees. To identify the different designations, the company uses a system of grades. There are two grade G1 e… 4 10 CO6
G1 - Rs. 25,000
G2 - Rs. 35,000
G3 - Rs. 45,000
G4 - Rs. 60,000 and
G5 - Rs. 5,00,000.
What are the mean, median and mode salaries, respectively, of the employees of this company?
59 Facebook is the world’s biggest social network by a huge margin, and most of us are used to using it to share details of our everyday lives with our frien… 1 10 CO6
60 Amazon is a big data giant. Apply this example to explain how specific organizations use big data. 1 10 CO3
61 Discuss the working model and Design concept of Hadoop Distributed File System (HDFS) with neat architecture? 2 10 CO2
62 Employees are both a business’s greatest asset and its greatest expense. So, hitting on the right formula for selecting them, and keeping them in place, … 4 10 CO3
63 Discuss Hadoop file system interfaces for structured and unstructured data in Big Data. 2 10 CO2
64 Justify how Hadoop technology satisfies business insights nowadays. 1 10 CO4
65 Explain the divide and conquer philosophy in processing big data. 2 10 CO2
66 OPower works with utility companies to provide engaging, relevant, and personalized content about home energy use to millions of households. The Pro… 2 10 CO6
67 Google: Big data and big business go hand in hand […]’s leading corporations are making of the endless amount of digital information the world is produci… 2 10 CO6
68 General Electric … 2 10 CO6
69 In smart parking, sensors are used for each parking slot, to detect whether the slot is empty or occupied. This information is aggregated by an on-site sma… 5 10 CO6
70 Sears is a department store (online and brick and mortar). The problem is Sears' process for analysing marketing campaigns for loyalty club members use… 2 10 CO6
71 Human Sciences: NextBio is using Hadoop MapReduce and HBase to process massive amounts of human genome data. Justify. 3 10 CO6
72 Illustrate how the Hadoop cluster is a special type of computational cluster designed for storing and analysing vast amounts of unstructured data in a distribute… 3 10 CO4
73 Explain how map reduce jobs run on YARN. 3 10 CO2
74 Discuss the various types of map reduce & its formats. 3 10 CO2
75 Apply the Hadoop processing of data in cloud computing and Amazon EC2 with an example. 3 10 CO3
76 Some analysis of the content of the queries. Many queries contain place names and geographic terms. For this part of the assignment, your job is to mak… 3 10 CO3
a) Find queries with references to India zip codes
b) Find queries with references to place names
c) Find other indicators of geographic locations. Further improve on (b) by doing something clever about ambiguous place names
77 Student 4 10 CO5
Student1
Student2
Student3
Student4
Student5
Student6
Student7
The result generated by the Map function is a key-value pair (K, V), which acts as the input for the Reduce function, using Pig Latin.
78 Some analysis of the content of the queries. Many queries contain place names and geographic terms. For this part of the assignment, your job is to mak… 5 10 CO5
Do this at least two different ways:
Find queries with references to India zip codes
Find queries with references to place names
Find other indicators of geographic locations.
Further improve on (b) by doing something clever about ambiguous place names
79 Case study: give a solution using a machine learning algorithm for scaling image processing used in roof inspections. Solution built for a risk management… 5 10 CO5
80 Consider a collection of literature survey made by a researcher in the form of a text document with respect to cloud and big data analytics. Using Hadoop… 3 10 CO5
81 Write the steps to convert unstructured data into NoSQL data and perform all operations, such as NoSQL queries with an API. 4 10 CO2
82 List the classification of NoSQL databases and explain Key-Value Stores. 4 10 CO5
83 Some of the challenges you face when you move from a single processor to a distributed computing system. Moving to a distributed environment is a no… 2 10 CO5
84 When should a data store use NoSQL instead of a relational database? Why does Big Data Analytics use NoSQL data stores? 4 10 CO2
85 Wikipedia to produce a rudimentary metric of how popular a programming language is, in an effort to see if our Wikipedia-based rankings bear any relati… 5 10 CO5
86 A client needs a database design for his blog with the following specifications. 5 10 CO6
Every post has a unique title, description and url.
Every post can have one or more tags.
Every post has the name of its publisher and total number of likes.
Every post has comments given by users along with their name, message, date-time and likes.
On each post, there can be zero or more comments
87 Explain ‘training set’ and ‘test set’ in a Machine Learning model. How much data will you allocate for your Training, Validation, and Test Sets? 5 10 CO3
88 Explain the characteristics of Big Data with proper examples. 1 5 CO2
89 Draw HDFS Architecture. Explain any two commands of HDFS from the following commands with syntax and at least one example of each. CopyFromLocal, set… 2 5 CO3
90 Explain working of various phases of Map Reduce with appropriate example and diagram 3 5 CO2
91 What are the advantages of Hadoop? Explain Hadoop Architecture and its components with a proper diagram. 1 5 CO2
92 What are the benefits of Big Data? Discuss challenges under Big Data. How can Big Data Analytics be useful in the development of smart cities? 1 5 CO2
93 What do you mean by HiveQL Data Definition Language? Explain any three HiveQL DDL commands with syntax and example. 4 5 CO2
94 What are the execution modes of Pig, and how is data validation done with a database? 4 5 CO2
95 Explain the function of ‘Supervised Learning’? 5 5 CO2
96 What are the differences between Collaborative Filtering and Content-Based Filtering? 5 5 CO2
97 Describe in detail about the usage of data analysis in Weather forecasting predictions 5 5 CO3
98 Apply Pig Latin and write user defined function in Pig Latin 4 5 CO4
99 Working Model of Map Reduce Anatomy for Job scheduling? 3 5 CO3
100 Solve, step by step, the execution process of word count reduce. 3 5 CO4
101 Write about the file-based data structures implemented in Big Data and explain with an example. 3 5 CO5
102 Write the different types of analytical methods of Big Data. 1 5 CO5
103 How would you distinguish analysis tools and reporting tools used in Big data? 1 5 CO3
104 Compare HBase vs RDBMS. 4 5 CO3
105 What are the different kinds of nodes available in the Hadoop ecosystem? 2 5 CO2
106 Explain retrieval and storage, pre-processing and analysis in order to convert multiple data sources into valuable data in HDFS. 2 5 CO4
107 How is the I/O serialization process implemented in Hadoop? 2 5 CO3
108 A typical course feedback system functions as per following features: 3 5 CO3
Course management.
Subject management for course.
Faculty subject engagement.
Student registration for course.
Student feedback for faculty for subject.
Using the MapReduce technique, how would you implement it?
109 Draw the architectural diagram for physical organization of compute nodes. How is data flow controlled and validated by Pig? 4 5 CO4
110 Discuss Data processing operators in Pig Latin 4 5 CO4
111 A start-up company wants to use Hive for storing its data. List the collection types provided by Hive for this purpose. 4 5 CO4
112 What would be the effect of a negative value of the second argument, which specifies the number of decimal places, on the output of the MySQL TRUNCATE() function? 4 5 CO4
113 Evaluate how some areas/disciplines are influenced by different types of machine learning. 5 5 CO5
114 What is the ‘R’ language? Discuss how Revolution R Enterprise (RRE) is designed for scale and performance. 5 5 CO2
115 List various configuration files used in Hadoop installation. What is the use of mapred-site.xml? 1 5 CO4
116 With proper examples discuss and differentiate structured, unstructured and semi-structured data. Make a note on how type of data affects data serializ… 5 5 CO5
117 Discuss the role of DataNode and NameNode in HDFS. Give commands with appropriate arguments to perform data transfer between local file system and H… 2 5 CO5
118 Compare row-oriented and column-oriented database structures. 4 5 CO3
119 Illustrate in detail the difference between SQL and NoSQL in Hadoop. 4 5 CO5
120 Illustrate how the Hadoop cluster is a special type of computational cluster designed for storing and analyzing vast amounts of unstructured data in a distribute… 4 5 CO4
121 Discuss the use of the FOREACH and ASSERT operators in Pig Latin. 4 5 CO5
122 Hit Rate: counts the number of times the direction of the stock is the same as predicted. Return of investment… 5 5 CO6
123 Create a shell command in Hive to list all the files in the current directory. 4 5 CO6
124 How many types of joins are there in Pig Latin? Explain with examples. 4 5 CO6
125 Write a short note on the following operators: a. GROUP b. ORDER BY 4 5 CO6
126 Write a Java program in MapReduce application for word counting on Hadoop cluster? 3 5 CO6
127 Describe in detail about the role of statistical models in Big Data. 3 5 CO3
128 Air pollution monitoring systems can monitor the emission of harmful gases by factories and automobiles using gaseous and meteorological sensors. The… 5 5 CO6
129 Illustrate in detail how big data are effectively filtered and mixed with the traditional one. 1 10 CO3
130 Summarize in detail about the challenges of the Big Data in Modern Data Analytics. 1 10 CO4
131 Analyze how the Google File System differs from the Hadoop file system and explain the Google File System architecture with a neat sketch. 2 10 CO4
132 Analyze the statement in detail: “Data Analysis is not a decision-making system, but a decision-supporting system”. 2 10 CO4
133 Analyze a regression model for “happy people get many hours of sleep” using your own data, and what kind of inferences it provides. 5 10 CO4
134 Compose the K-means partitioning algorithm using the given data. Consider five points {X1, X2, X3, X4, X5} with the following coordinates as a two dimen… 5 10 CO3
135 Perform analysis on web server report. 5 10 CO3
Sample Data: teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400], "GET,/images/KSC-logosmall.gif,HTTP/1.0", 304
136 Analyze how you will order the use of Hive. How does Hive interact with Hadoop? Explain in detail. 4 10 CO3
137 Formulate an HBase table from the following data. Data_file.txt contains the below data: 4 10 CO3
1. 1,India,Bihar,Champaran,2009,April,P1,1,5
2. 2,India, Bihar,Patna,2009,May,P1,2,10
3. 3,India, Bihar,Bhagalpur,2010,June,P2,3,15
4. 4,United States,California,Fresno,2009,April,P2,2,5
5. 5,United States
138 Analyze a generic design for a Realtime Analytics Platform (RTAP). Discuss your answer in relation to real-time sentiment analysis in Instagram. 4 10 CO4
139 Analyze a real-time stock market situation and bring out the various ideas used in prediction analysis. 5 10 CO4
140 Analyze the various operational modes of Hadoop cluster configuration and explain in detail about configuring/installing the Hadoop in local/standalone m… 5 10 CO4
141 Analyze the steps of MapReduce algorithms. Draw a neat diagram and explain with examples. 5 10 CO4
142 Formulate the role of analytic sandbox, its benefits and types 1 5 CO6
143 Compile with a neat sketch about processing of a job in Hadoop 2 5 CO6
144 Recommend a procedure to find the number of occurrence of a word in a document using Hive. 4 5 CO6
145 Write about the system architecture and components of Hive and Hadoop 4 5 CO6
146 Assess the structure of big data representation 1 5 CO4
147 Evaluate ways in which the big data is represented. 1 5 CO4
148 Evaluate in detail web data and what it reveals. 1 5 CO4
149 Elaborate the map reduce algorithm with an example. 3 5 CO4
150 Demonstrate on the importance of using HDFS. 2 5 CO4
151 Explain in detail about Predictive Analysis. 5 5 CO5
152 Write the Hive command to create a table with four columns: First name, last name, age, and income? 4 5 CO5
153 Discuss the use of the FILTER and DISTINCT operator in Pig Latin 4 5 CO3
154 Define the various Statements used in flow of data processing in Pig Latin? 4 5 CO3
155 Analyze: can MapReduce be used to solve any kind of computational problem? If not, explain the cases where MapReduce is not applicable. 3 5 CO4
156 Estimate the entire process of data analysis conducted in the MapReduce programming model? 3 5 CO4
157 Analyze in detail the ETL (Extract, Transform and Load) system. 4 5 CO3
158 Analyze the Avro data serialization technique in MapReduce. 2 5 CO3
159 Analyze challenges under Big Data. How can Big Data Analytics be useful in the development of smart cities? 2 5 CO3
160 Write MapReduce steps for counting occurrences of specific numbers in the input text file(s). Also write the commands to compile and run the code. 3 5 CO3
161 Explain the important features that are required to well define a learning problem. 5 5 CO3
162 Analyze the benefits of Apache Pig vs MapReduce. 4 5 CO3
163 Analyze big data analytics and explain various applications in real-world scenarios. 1 5 CO3
164 Demonstrate HBase and HBase clients in detail. 4 5 CO3
165 Compare and contrast Hadoop and MapR. 3 5 CO4
166 Show the method of invoking the Grunt shell. 4 5 CO3
167 Analyze the Pig data model in detail with a neat diagram. 4 5 CO4
168 Assess the several types of motivation and data analysis available for time series? 4 5 CO4
169 Analyze the risks involved in handling Big Data. 1 5 CO4
170 It’s true that HDFS is to be used for applications that have large data sets. Why is it not the correct tool to use when there are many small files? 2 5 CO5
171 Is it possible to create multiple tables in Hive for the same data? Justify. 4 5 CO5
172 Write the command used to copy data from the local system onto HDFS. 2 5 CO5
173 Mention the main configuration parameters that have to be specified by the user to run MapReduce. 3 5 CO5
174 Hadoop is a great file system for running big data applications but it is very costly. Comment on the truthfulness of this statement. 3 5 CO5
175 Web Content Recommendation: Such applications can leverage big data systems for recommending new content to the users based on user preferences… 5 5 CO5
176 Customer Recommendations: Big data systems can be used to analyze customer data (such as demographic data, shopping history, or customer feedback… 5 5 CO5
177 Analyze big data architecture with a neat schematic diagram. 1 5 CO5
178 Production planning and control systems measure various parameters of production processes and control the entire production process in real-time. Th… 1 5 CO5
179 In a MapReduce program, Map() and Reduce() are two functions. The Map function performs actions like filtering, grouping and sorting. While Reduce f… 1 5 CO5
180 Big data systems for real-time data analysis can be used for the analysis of large volumes of fast-moving data from wearable devices and other in-hospita… 1 5 CO5
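
Study note for the MapReduce questions in this bank (for example questions 100, 126 and 179): the sketch below imitates the map, shuffle and reduce steps of a word count in plain Python. It is only an in-memory illustration under stated assumptions, not Hadoop code; the function names map_phase, shuffle, reduce_phase and word_count are hypothetical and do not belong to any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group values by key, sorted by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: aggregate all values that were grouped under one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big"]))
```

In a real cluster the map and reduce steps would run in parallel on different nodes and the shuffle would move data across the network; here everything runs in one process so the data flow is easy to trace by hand.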