
ETI solved paper

Big Data analytics 6th semester diploma

Uploaded by

Pragati Dagale


Scheme - I
Sample Question Paper
Program Name : Diploma in Artificial Intelligence and Machine Learning
Program Code : AN
22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 70 Time: 3 Hrs.

Instructions:

(1) All questions are compulsory.

(2) Illustrate your answers with neat sketches wherever necessary.
(3) Figures to the right indicate full marks.
(4) Assume suitable data if necessary.
(5) Preferably, write the answers in sequential order.

Q.1) Attempt any FIVE of the following. 10 Marks

a) Define Big Data. ---------(I)
Big data is defined as collections of datasets whose volume, velocity or variety is
so large that it is difficult to store, manage, process and analyze the data using
traditional databases and data processing tools. In recent years, there has been
exponential growth in both structured and unstructured data generated by
information technology, industrial, healthcare, Internet of Things, and other
systems.

b) State the importance of Big Data Analytics. -------(I)

Big Data Analytics is crucial because it enables organizations to:
• Extract valuable insights from vast amounts of data.
• Improve decision-making processes.
• Enhance customer experiences through personalized services.
• Optimize business operations and reduce costs.
• Detect and prevent fraud in real-time.

c) State the various raw data sources. -----------(I/II)

d) Enlist any two key advantages of Hadoop. -------(III)


e) State any two complex data types of Hive. -------(IV)

f) Define RDD. -----------(V)


g) State the use of SPARK SQL. ------(V)

Spark SQL is the Spark module for working with structured and semi-structured data. It allows
SQL queries to be executed, disparate data sources to be joined, and Spark to be integrated with
traditional BI tools.

Q.2) Attempt any THREE of the following. 12 Marks

a) Explain the challenges with Big Data Analytics. ---------(I)


b) State any four importance of HADOOP. ------(III)

c) Explain any one domain specific example of Big Data.-----------(II)

Healthcare
The healthcare ecosystem consists of numerous entities including healthcare providers
(primary care physicians, specialists, or hospitals), payers (government, private health
insurance companies, employers), pharmaceutical, device and medical service companies, IT
solutions and services firms, and patients. The process of provisioning healthcare involves
massive healthcare data that exists in different forms (structured or unstructured), is stored in
disparate data sources (such as relational databases, or file servers) and in many different
formats. To promote more coordination of care across the multiple providers involved with
patients, their clinical information is increasingly aggregated from diverse sources into
Electronic Health Record (EHR) systems. EHRs capture and store information on patient
health and provider actions including individual-level laboratory results, diagnostic, treatment,
and demographic data. Though the primary use of EHRs is to maintain all medical data for an
individual patient and to provide efficient access to the stored data at the point of care, EHRs
can be the source for valuable aggregated information about overall patient populations [5, 6].

With the current explosion of clinical data, the problems of how to collect data from distributed and
heterogeneous health IT systems and how to analyze massive-scale clinical data have become
critical. Big data systems can be used for data collection from different stakeholders (patients, doctors,
payers, physicians, specialists, etc.) and disparate data sources (databases, structured and unstructured
formats, etc.). Big data analytics systems allow massive-scale clinical data analytics, facilitate the
development of more efficient healthcare applications, improve the accuracy of predictions and help
in timely decision making.
Let us look at some healthcare applications that can benefit from big data systems:
• Epidemiological Surveillance: Epidemiological Surveillance systems study the distribution
and determinants of health-related states or events in specified populations and apply these
studies for diagnosis of diseases under surveillance at national level to control health
problems. EHR systems include individual-level laboratory results, diagnostic, treatment, and
demographic data. Big data frameworks can be used for integrating data from multiple EHR
systems and timely analysis of data for effectively and accurately predicting outbreaks,
population-level health surveillance efforts, disease detection and public health mapping.
• Patient Similarity-based Decision Intelligence Application: Big data frameworks can be
used for analyzing EHR data to extract a cluster of patient records most similar to a particular
target patient. Clustering patient records can also help in developing medical prognosis
applications that predict the likely outcome of an illness for a patient based on the outcomes
for similar patients.
• Adverse Drug Events Prediction: Big data frameworks can be used to analyze EHR
data and predict which patients are most at risk of having an adverse response to a certain
drug, based on the adverse drug reactions of other patients.
• Detecting Claim Anomalies: Health insurance companies can leverage big data systems for
analyzing health insurance claims to detect fraud, abuse, waste, and errors.
• Evidence-based Medicine: Big data systems can combine and analyze data from a variety
of sources, including individual-level laboratory results, diagnostic, treatment and
demographic data, to match treatments with outcomes and predict patients at risk for a disease.
Systems for evidence-based medicine enable providers to make decisions based not only on
their own perceptions but also on the available evidence.
• Real-time health monitoring: Wearable electronic devices allow non-invasive and
continuous monitoring of physiological parameters. These wearable devices may be in various
forms such as belts and wrist-bands. Healthcare providers can analyze the collected healthcare
data to determine any health conditions or anomalies. Big data systems for real-time data
analysis can be used for analysis of large volumes of fast-moving data from wearable devices
and other in-hospital or in-home devices, for real-time patient health monitoring and adverse
event prediction.
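
The patient similarity application listed above can be illustrated with a small sketch. It is only an assumption-laden illustration, not a clinical method: the patient ids, the three normalized features and the helper names `distance` and `most_similar` are all hypothetical, and a simple nearest-neighbour ranking is used in place of a full clustering algorithm.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, records, k=2):
    """Return the ids of the k records closest to the target vector."""
    ranked = sorted(records, key=lambda pid: distance(target, records[pid]))
    return ranked[:k]

# Hypothetical, already-normalized features: (age, blood pressure, glucose).
patients = {
    "P1": (0.30, 0.50, 0.40),
    "P2": (0.32, 0.52, 0.41),
    "P3": (0.90, 0.80, 0.95),
    "P4": (0.28, 0.47, 0.45),
}
target = (0.31, 0.50, 0.42)

similar = most_similar(target, patients, k=2)
print(similar)  # ids of the two most similar patient records
```

In a big data setting the same ranking would be computed in a distributed fashion over many EHR records; the outcomes recorded for the returned patients could then inform a prognosis for the target patient.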

d) Describe HDFS. -----------(III)


Q.3) Attempt any THREE of the following. 12 Marks

a) Describe classification of Big Data Analytics. -----------(I)


b) State different types of data analytics. -------------(I)


c) Describe data preparation process with an example. -------(II)

Data can often be dirty and can have various issues that must be resolved before the
data can be processed, such as corrupt records, missing values, duplicates,
inconsistent abbreviations, inconsistent units, typos, incorrect spellings and
incorrect formatting. The data preparation step involves various tasks such as data
cleansing, data wrangling or munging, de-duplication, normalization, sampling and
filtering. Data cleansing detects and resolves issues such as corrupt records, records
with missing values and records with bad formatting. Data wrangling or
munging deals with transforming the data from one raw format to another. For
example, when we collect records as raw text files from different sources, we may
come across inconsistencies in the field separators used in different files. Some files
may use a comma as the field separator, while others may use a tab.
Data wrangling resolves these inconsistencies by parsing the raw data
from different sources and transforming it into one consistent format.
Normalization is required when data from different sources uses different units or
scales or has different abbreviations for the same thing. For example, weather data
reported by some stations may contain temperature in the Celsius scale while data from
other stations may use the Fahrenheit scale. Filtering and sampling may be useful
when we want to process only the data that meets certain rules. Filtering can also be
useful to reject bad records with incorrect or out-of-range values.
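
The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed answer: the record layout (station, temperature, unit), the sample data, the accepted range of -90 to 60 degrees Celsius and the function names `detect_delimiter`, `to_celsius` and `prepare` are all hypothetical.

```python
import csv

def detect_delimiter(line):
    """Wrangling helper: assume tab-separated if a tab is present, else comma."""
    return "\t" if "\t" in line else ","

def to_celsius(value, unit):
    """Normalization: convert Fahrenheit readings to Celsius.
    Any unit other than 'F' is assumed to already be Celsius."""
    if unit.upper() == "F":
        return round((value - 32) * 5 / 9, 1)
    return round(value, 1)

def prepare(raw_files):
    """Wrangle, cleanse, normalize, filter and de-duplicate raw station records."""
    seen = set()
    cleaned = []
    for text in raw_files:
        lines = text.strip().splitlines()
        if not lines:
            continue
        delimiter = detect_delimiter(lines[0])
        for row in csv.reader(lines, delimiter=delimiter):
            # Cleansing: drop records with missing or extra fields.
            if len(row) != 3 or any(not field.strip() for field in row):
                continue
            station, temp, unit = (field.strip() for field in row)
            try:
                value = float(temp)
            except ValueError:
                continue  # Cleansing: drop corrupt (non-numeric) readings.
            celsius = to_celsius(value, unit)
            # Filtering: reject out-of-range values (assumed plausible range).
            if not -90.0 <= celsius <= 60.0:
                continue
            record = (station, celsius)
            if record in seen:
                continue  # De-duplication.
            seen.add(record)
            cleaned.append(record)
    return cleaned

# Hypothetical raw inputs: one comma-separated file, one tab-separated file.
comma_file = "S1,25.0,C\nS2,,C\nS1,25.0,C\n"
tab_file = "S3\t77.0\tF\nS4\t999\tC\nS5\tabc\tC\n"

records = prepare([comma_file, tab_file])
print(records)
```

Here the record with a missing value (S2), the duplicate of S1, the out-of-range reading (S4) and the corrupt reading (S5) are discarded, and the Fahrenheit reading from S3 is converted so that all remaining records share one format and one scale.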

d) State any four data frame operations in SPARK session. ------(V)

Joining: Combining data from two data frames based on a common key.
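
What a join on a common key computes can be shown with a plain-Python sketch. This does not use the Spark API (in Spark the DataFrame `join` method performs the operation); the function `inner_join`, the sample rows and the column names are hypothetical and serve only to illustrate the idea.

```python
def inner_join(left, right, key):
    """Combine rows from two lists of dicts that share the same key value."""
    # Index the right-hand rows by key so each lookup is fast.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows without a matching key on the right are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical sample data standing in for two data frames.
employees = [
    {"dept_id": 1, "name": "Asha"},
    {"dept_id": 2, "name": "Ravi"},
    {"dept_id": 3, "name": "Meera"},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "IT"},
]

result = inner_join(employees, departments, "dept_id")
for row in result:
    print(row)
```

The row for Meera is absent from the result because no department has `dept_id` 3, which is the defining behaviour of an inner join.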

Q.4) Attempt any THREE of the following. 12 Marks

a) Compare RDBMS versus Hadoop. -----------(III)

b) Describe any four Hive data types. ----------------(IV)


c) Explain Hive file format. ------------(IV)


d) Describe data processing in HADOOP. ------------(III)


e) Write and explain the Scala/Python code to create the Spark session.------(V)

Q.5) Attempt any TWO of the following. 12 Marks

a) Describe the responsibilities of Data Scientist. -------------- (I)


b) Describe mapping analysis flow to big data stack. -----------(II)


c) Write syntax and example of Hive Query commands for following.---------- (IV)
(i) Create table
(ii) Alter Table
(iii) loading data into table from file

Q.6) Attempt any TWO of the following. 12 Marks

a) Describe Hive architecture.


b) Write a code for building Spark SQL application with SBT.


c) Explain Apache Spark Architecture


Scheme - I
Sample Test Paper - I
Program Name : Diploma in Artificial Intelligence and Machine Learning
Program Code : AN
22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 20 Time: 1 Hour

Instructions:

(1) All questions are compulsory.

Q.1) Attempt any FOUR. 08 Marks

a) Define Big Data Analytics.

b) State the characteristics of data.

c) State different Big Data Stack.

d) List domain specific examples of Big Data.

e) State the features of Hadoop. (answered)

Q.2) Attempt any THREE. 12 Marks

a) Explain Data Science.

b) Explain analytics flow for Big Data.


c) Explain Data Collection process of Big Data with example.

d) Describe HDFS. (answered)


Scheme - I
Sample Test Paper - II
Program Name : Diploma in Artificial Intelligence and Machine Learning
Program Code : AN 22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 20 Time: 1 Hour

Instructions:

(1) All questions are compulsory.

Q.1) Attempt any FOUR. 08 Marks

a) Enlist key advantages of Hadoop. (answered)

b) State the use of HIVE.

c) Write syntax for loading data into table from file in HIVE

d) State the Spark Components.


e) Define RDD. (answered)

Q.2) Attempt any THREE. 12 Marks

a) Compare RDBMS versus Hadoop. (answered)

b) Explain SERDE.

c) Describe Apache Spark Architecture. (answered)

d) Describe Data Frame Operations. (answered)

