Q. No Question Module No CO No Bloom Level
1 What is Big Data? 2 2 CO1
2 List out the best practices of Big Data Analytics. 1 2 CO1
3 Write down the characteristics of Big Data applications. 1 2 CO1
4 What is HDFS? 2 2 CO1
5 What is the MapReduce programming model? 3 2 CO1
6 How would you show your understanding of classic MapReduce? 3 2 CO1
7 Can you identify the different statistical concepts required for Big Data? 1 2 CO1
8 Point out the characteristics of Hadoop. 2 2 CO1
9 What do you mean by regression? 5 2 CO1
10 List the details of reducer size and replication rate. 3 2 CO1
11 Analyze why we need NoSQL. 4 2 CO1
12 Examine the differences between HBase and Hive. 5 2 CO1
13 Examine the need for Apache Pig. 4 2 CO1
14 Differentiate between analysis and analytics. 1 2 CO1
15 List out various terminologies in Big Data environments. 1 2 CO1
16 List out the various use cases of Hadoop. 2 2 CO1
17 State how HDFS stores massive data in a Hadoop cluster. 2 2 CO1
18 What are the Name node, Secondary node and Job tracker? 3 2 CO1
19 Define what happens if the mapper output does not match the reducer input. 3 2 CO1
20 Classify Pig Latin commands in Pig. 4 2 CO1
21 List out the data types in Hive. 4 2 CO1
22 Interpret joins with examples. 4 2 CO1
23 When should regression be used? 5 2 CO1
24 Discuss the modes of Pig scripts. 4 2 CO1
25 Identify block replication in HDFS. 2 2 CO1
26 What is 'Supervised Learning'? 5 2 CO1
27 List major applications of machine learning. 1 2 CO1
28 Define Input Split. 3 2 CO1
29 Define the properties of Pig. 4 2 CO1
30 Define the complexity theory for MapReduce. 3 2 CO1
31 Compare learning vs programming. 1 2 CO2
32 Define Machine Learning and its types. 5 2 CO2
33 What is Avro? How is serialization achieved in Hadoop? 2 2 CO2
34 Summarize the advantages and disadvantages of Machine Learning. 1 2 CO2
35 Discuss the comparison of Pig with databases. 4 2 CO2
36 What is overfitting in machine learning? 5 2 CO2
37 How to find hidden data from data storage? 1 2 CO2
38 Implement the Input Format for compute-intensive applications. 4 2 CO2
39 Generalise the term Record Reader/Writer. 4 2 CO2
40 What is the Big R language? 5 2 CO2
41 What is HQL? 4 2 CO2
42 Explain the Pig Latin application flow. 4 2 CO2
43 Distinguish between Clustering and Association Analysis. 5 2 CO2
44 In Hive, explain the term 'aggregation' and its uses. 4 2 CO2
45 Write a shell command in Hive to list all the files in the current directory. 4 2 CO2
46 Why is a block in HDFS so large? 2 2 CO2
47 Define racks in a Hadoop cluster. 2 2 CO2
48 Discuss the distributed computing challenges. 2 2 CO2
49 What are the various applications of big data analytics? 1 2 CO2
50 Define Business Intelligence (BI). 1 2 CO2
51 Classify the differences between traditional Business Intelligence (BI) and Big Data. 1 10 CO2
52 Explain in detail the different types of analytics in Big Data. 1 10 CO2
53 Explain InfoSphere BigInsights and BigSheets and how to find hidden data. 1 10 CO2
54 In a medical study, doctors recorded the average calorie intake for a group of adolescents in a specific year and their corresponding average increase in h... 5 10 CO5
55 Summarize how Big Data helps in decision making for the organization. How convenient is it for the organizational personnel in maintaining data in bulk at a sin... 5 10 CO5
56 Evaluate how big data analytics helps business people to increase their revenue. Discuss with any one real-time application. 5 10 CO5
57 Orbitz generates tremendous amounts of log data. The raw logs are only stored for a few days because of costly data warehousing. Orbitz needed an effe... 4 10 CO6
58 In a small start-up company, there are only 16 employees. To identify the different designations, the company uses a system of grades. There are two grade G1 e... G1 - Rs. 25,000, G2 - Rs. 35,000, G3 - Rs. 45,000, G4 - Rs. 60,000 and G5 - Rs. 5,00,000. What are the mean, median and mode salaries, respectively, of the employees of this company? 4 10 CO6
59 Facebook is the world's biggest social network by a huge margin, and most of us are used to using it to share details of our everyday lives with our frien... 1 10 CO6
60 Amazon is a big data giant. Apply this example to explain how specific organizations use big data. 1 10 CO3
61 Discuss the working model and design concept of the Hadoop Distributed File System (HDFS) with a neat architecture diagram. 2 10 CO2
62 Employees are both a business's greatest asset and its greatest expense. So, hitting on the right formula for selecting them, and keeping them in place, ... 4 10 CO3
63 Discuss Hadoop file system interfaces for structured and unstructured data in Big Data. 2 10 CO2
64 Justify how Hadoop technology satisfies business insights nowadays. 1 10 CO4
65 Explain the divide-and-conquer philosophy in processing big data. 2 10 CO2
66 OPower works with utility companies to provide engaging, relevant, and personalized content about home energy use to millions of households. The Pro... 2 10 CO6
67 Google: Big data and big business go hand in hand ...'s leading corporations are making of the endless amount of digital information the world is produci... 2 10 CO6
68 General Electric ... 2 10 CO6
69 In smart parking, sensors are used for each parking slot to detect whether the slot is empty or occupied. This information is aggregated by an on-site sma... 5 10 CO6
70 Sears is a department store (online and brick and mortar). The problem is Sears' process for analysing marketing campaigns for loyalty club members use... 2 10 CO6
71 Human Sciences: NextBio is using Hadoop MapReduce and HBase to process massive amounts of human genome data. Justify. 3 10 CO6
72 Illustrate: the Hadoop cluster is a special type of computational cluster designed for storing and analysing vast amounts of unstructured data in a distribute... 3 10 CO4
73 Explain how MapReduce jobs run on YARN. 3 10 CO2
74 Discuss the various types of MapReduce and its formats. 3 10 CO2
75 Apply the Hadoop processing of data in cloud computing and Amazon EC2 with an example. 3 10 CO3
76 Some analysis of the content of the queries. Many queries contain place names and geographic terms. For this part of the assignment, your job is to mak... a) Find queries with references to India zip codes. b) Find queries with references to place names. c) Find other indicators of geographic locations. Further... 3 10 CO3
77 Student: Student1, Student2, Student3, Student4, Student5, Student6, Student7. The result generated by the Map function is a key-value pair (K, V), which acts as the input for the Reduce function, using Pig Latin. 4 10 CO5
78 Some analysis of the content of the queries. Many queries contain place names and geographic terms. For this part of the assignment, your job is to mak... Do this at least two different ways: find queries with references to India zip codes; find queries with references to place names; find other indicators of geographic locations. Further improve on (b) by doing something clever about ambiguous place names. 5 10 CO5
79 Case study: give a solution using a machine learning algorithm for scaling image processing used in roof inspections. Solution built for a risk management... 5 10 CO5
80 Consider a collection of literature surveys made by a researcher in the form of a text document with respect to cloud and big data analytics. Using Hadoop... 3 10 CO5
81 Write the steps for converting unstructured data into NoSQL data and perform all operations, such as NoSQL queries, with an API. 4 10 CO2
82 List the classification of NoSQL databases and explain Key-Value Stores. 4 10 CO5
83 Some of the challenges you face when you move from a single processor to a distributed computing system. Moving to a distributed environment is a no... 2 10 CO5
84 When should a data store use NoSQL instead of a relational database? Why does Big Data Analytics use NoSQL data stores? 4 10 CO2
85 Wikipedia to produce a rudimentary metric of how popular a programming language is, in an effort to see if our Wikipedia-based rankings bear any relati... 5 10 CO5
86 A client needs a database design for his blog with the following specifications. Every post has a unique title, description and URL. Every post can have one or more tags. Every post has the name of its publisher and total number of likes. Every post has comments given by users along with their name, message, date-time and likes. On each post, there can be zero or more comments. 5 10 CO6
87 Explain 'training set' and 'test set' in a Machine Learning model. How much data will you allocate for your Training, Validation, and Test sets? 5 10 CO3
88 Explain the characteristics of Big Data with proper examples. 1 5 CO2
89 Draw the HDFS architecture. Explain any two HDFS commands from the following, with syntax and at least one example of each: CopyFromLocal, set... 2 5 CO3
90 Explain the working of the various phases of MapReduce with an appropriate example and diagram. 3 5 CO2
91 What are the advantages of Hadoop? Explain the Hadoop architecture and its components with a proper diagram. 1 5 CO2
92 What are the benefits of Big Data? Discuss challenges under Big Data. How can Big Data Analytics be useful in the development of smart cities? 1 5 CO2
93 What do you mean by HiveQL Data Definition Language? Explain any three HiveQL DDL commands with their syntax and examples. 4 5 CO2
94 What are the execution modes of Pig, and how is data validation done with a database? 4 5 CO2
95 Explain the function of 'Supervised Learning'. 5 5 CO2
96 What are the differences between Collaborative Filtering and Content-Based Filtering? 5 5 CO2
97 Describe in detail the usage of data analysis in weather forecasting predictions. 5 5 CO3
98 Apply Pig Latin and write a user-defined function in Pig Latin. 4 5 CO4
99 Working model of MapReduce anatomy for job scheduling. 3 5 CO3
100 Solve the step-by-step execution process of word count reduces. 3 5 CO4
101 Write about the file-based data structures implemented in Big Data and explain with an example. 3 5 CO5
102 Write the different types of analytical methods of Big Data. 1 5 CO5
103 How would you distinguish analysis tools and reporting tools used in Big Data? 1 5 CO3
104 Compare HBase vs RDBMS. 4 5 CO3
105 What are the different kinds of nodes available in the Hadoop ecosystem? 2 5 CO2
106 Explain retrieval and storage, pre-processing and analysis in order to convert multiple data sources into valuable data in HDFS. 2 5 CO4
107 How is the I/O serialization process implemented in Hadoop? 2 5 CO3
108 A typical course feedback system functions as per the following features: course management; subject management for course; faculty subject engagement; student registration for course; student feedback for faculty for subject. Using the MapReduce technique, how would it be built? 3 5 CO3
109 Draw the architectural diagram for the physical organization of compute nodes. How is data flow controlled and validated by Pig? 4 5 CO4
110 Discuss data processing operators in Pig Latin. 4 5 CO4
111 A start-up company wants to use Hive for storing its data. List the collection types provided by Hive for this purpose. 4 5 CO4
112 What would be the effect of a negative value of the second argument, which specifies the number of decimal places, on the output of the MySQL TRUNCATE() function? 4 5 CO4
113 Evaluate how some areas/disciplines are influenced by different types of machine learning. 5 5 CO5
114 What is the 'R' language? Discuss how Revolution R Enterprise (RRE) is designed for scale and performance. 5 5 CO2
115 List the various configuration files used in Hadoop installation. What is the use of mapred-site.xml? 1 5 CO4
116 With proper examples, discuss and differentiate structured, unstructured and semi-structured data. Make a note on how type of data affects data serializ... 5 5 CO5
117 Discuss the role of the Data node and Name node in HDFS. Give commands with appropriate arguments to perform data transfer between local file system and H... 2 5 CO5
118 Compare row-oriented and column-oriented database structures. 4 5 CO3
119 Illustrate in detail the difference between SQL and NoSQL in Hadoop. 4 5 CO5
120 Illustrate: the Hadoop cluster is a special type of computational cluster designed for storing and analyzing vast amounts of unstructured data in a distribute... 4 5 CO4
121 Discuss the use of the FOREACH and ASSERT operators in Pig Latin. 4 5 CO5
122 Hit rate: accounts for the number of times the direction of the stock is the same as predicted. Return of investment 5 5 CO6
123 Create a shell command in Hive to list all the files in the current directory. 4 5 CO6
124 Create examples of the many types of joins in Pig Latin. 4 5 CO6
125 Write a short note on the following operators: a. GROUP b. ORDER BY 4 5 CO6
126 Write a Java MapReduce program for word counting on a Hadoop cluster. 3 5 CO6
127 Describe in detail the role of statistical models in Big Data. 3 5 CO3
128 Air pollution monitoring systems can monitor the emission of harmful gases by factories and automobiles using gaseous and meteorological sensors. The... 5 5 CO6
129 Illustrate in detail how big data are effectively filtered and mixed with traditional data. 1 10 CO3
130 Summarize in detail the challenges of Big Data in modern data analytics. 1 10 CO4
131 Analyze how the Google File System differs from the Hadoop file system and explain the Google File System architecture with a neat sketch. 2 10 CO4
132 Analyze the statement in detail: "Data Analysis is not a decision-making system, but a decision supporting system". 2 10 CO4
133 Analyze a regression model for "happy people get many hours of sleep" using your own data and state what kind of inferences it provides. 5 10 CO4
134 Compose the K-means partitioning algorithm using the given data. Consider five points {X1, X2, X3, X4, X5} with the following coordinates as a two dimen... 5 10 CO3
135 Perform analysis on web server report. Sample data: teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400], "GET,/images/KSC-logosmall.gif,HTTP/1.0", 304 5 10 CO3
136 Analyze how you will order the use of Hive. Explain in detail how Hive interacts with Hadoop. 4 10 CO3
137 Formulate an HBase table from the following data. Data_file.txt contains the below data: 1. 1,India,Bihar,Champaran,2009,April,P1,1,5 2. 2,India,Bihar,Patna,2009,May,P1,2,10 3. 3,India,Bihar,Bhagalpur,2010,June,P2,3,15 4. 4,United States,California,Fresno,2009,April,P2,2,5 5. 5,United States... 4 10 CO3
138 Analyze a generic design for a Realtime Analytics Platform (RTAP). Discuss your answer in relation to real-time sentiment analysis in Instagram. 4 10 CO4
139 Analyze a real-time stock market situation and bring out the various ideas used in prediction analysis. 5 10 CO4
140 Analyze the various operational modes of Hadoop cluster configuration and explain in detail about configuring/installing Hadoop in local/standalone m... 5 10 CO4
141 Analyze the steps of MapReduce algorithms. Draw a neat diagram and explain with examples. 5 10 CO4
142 Formulate the role of the analytic sandbox, its benefits and types. 1 5 CO6
143 Compile, with a neat sketch, the processing of a job in Hadoop. 2 5 CO6
144 Recommend a procedure to find the number of occurrences of a word in a document using Hive. 4 5 CO6
145 Write about the system architecture and components of Hive and Hadoop. 4 5 CO6
146 Assess the structure of big data representation. 1 5 CO4
147 Evaluate ways in which big data is represented. 1 5 CO4
148 Evaluate in detail web data and what it reveals. 1 5 CO4
149 Elaborate the MapReduce algorithm with an example. 3 5 CO4
150 Demonstrate the importance of using HDFS. 2 5 CO4
151 Explain in detail about Predictive Analysis. 5 5 CO5
152 Write the Hive command to create a table with four columns: first name, last name, age, and income. 4 5 CO5
153 Discuss the use of the FILTER and DISTINCT operators in Pig Latin. 4 5 CO3
154 Define the various statements used in the flow of data processing in Pig Latin. 4 5 CO3
155 Analyze: can MapReduce be used to solve any kind of computational problem? If not, explain the cases where MapReduce is not applicable. 3 5 CO4
156 Estimate the entire process of data analysis conducted in the MapReduce programming model. 3 5 CO4
157 Analyze in detail the ETL (Extract, Transform and Load) system. 4 5 CO3
158 Analyze the Avro data serialization technique in MapReduce. 2 5 CO3
159 Analyze the challenges under Big Data. How can Big Data Analytics be useful in the development of smart cities? 2 5 CO3
160 Write MapReduce steps for counting occurrences of specific numbers in the input text file(s). Also write the commands to compile and run the code. 3 5 CO3
161 Explain the important features that are required to define a learning problem well. 5 5 CO3
162 Analyze the benefits of Apache Pig vs MapReduce. 4 5 CO3
163 Analyze big data analytics and explain various applications in real-world scenarios. 1 5 CO3
164 Demonstrate HBase and HBase clients in detail. 4 5 CO3
165 Explain, compare and contrast Hadoop and MapR. 3 5 CO4
166 Show the method of invoking the Grunt shell. 4 5 CO3
167 Analyze the Pig data model in detail with a neat diagram. 4 5 CO4
168 Assess the several types of motivation and data analysis available for time series. 4 5 CO4
169 Analyze the risks involved in handling Big Data. 1 5 CO4
170 It's true that HDFS is to be used for applications that have large data sets. Why is it not the correct tool to use when there are many small files? 2 5 CO5
171 Is it possible to create multiple tables in Hive for the same data? Justify. 4 5 CO5
172 Write the command used to copy data from the local system onto HDFS. 2 5 CO5
173 Mention the main configuration parameters that have to be specified by the user to run MapReduce. 3 5 CO5
174 Hadoop is a great file system for running big data applications but it is very costly. Comment on the truthfulness of this statement. 3 5 CO5
175 Web content recommendation: such applications can leverage big data systems for recommending new content to the users based on user preferences 5 5 CO5
176 Customer recommendations: big data systems can be used to analyze customer data (such as demographic data, shopping history, or customer feedback... 5 5 CO5
177 Analyze big data architecture with a neat schematic diagram. 1 5 CO5
178 Production planning and control systems measure various parameters of production processes and control the entire production process in real-time. Th... 1 5 CO5
179 In a MapReduce program, Map() and Reduce() are two functions. The Map function performs actions like filtering, grouping and sorting. While Reduce f... 1 5 CO5
180 Big data systems for real-time data analysis can be used for the analysis of large volumes of fast-moving data from wearable devices and other in-hospita... 1 5 CO5