ETI solved paper

Big Data analytics 6th semester diploma

Uploaded by

Pragati Dagale


Scheme - I
Sample Question Paper
Program Name : Diploma in Artificial Intelligence and Machine Learning
Program Code : AN
22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 70 Time: 3 Hrs.

Instructions:

(1) All questions are compulsory.


(2) Illustrate your answers with neat sketches wherever necessary.
(3) Figures to the right indicate full marks.
(4) Assume suitable data if necessary.
(5) Preferably, write the answers in sequential order.

Q.1) Attempt any FIVE of the following. 10 Marks


a) Define Big Data. ---------(I)
Big data is defined as collections of datasets whose volume, velocity or variety is
so large that it is difficult to store, manage, process and analyze the data using
traditional databases and data processing tools. In recent years, there has been
exponential growth in both the structured and unstructured data generated by
information technology, industrial, healthcare, Internet of Things, and other
systems.

b) State the importance of Big Data Analytics. -------(I)


Big Data Analytics is crucial because it enables organizations to:
• Extract valuable insights from vast amounts of data.
• Improve decision-making processes.
• Enhance customer experiences through personalized services.
• Optimize business operations and reduce costs.
• Detect and prevent fraud in real time.

c) State the various raw data sources. -----------(I/II)

d) Enlist any two key advantages of Hadoop. -------(III)



e) State any two complex data types of Hive. -------(IV)

f) Define RDD. -----------(V)



g) State the use of SPARK SQL. ------(V)


Spark SQL is used to execute SQL queries over structured and semi-structured data.
It also supports joining disparate data sources and integrating with traditional BI tools.

Q.2) Attempt any THREE of the following. 12 Marks

a) Explain the challenges with Big Data Analytics. ---------(I)



b) State any four points on the importance of Hadoop. ------(III)

c) Explain any one domain specific example of Big Data.-----------(II)

Healthcare
The healthcare ecosystem consists of numerous entities including healthcare providers
(primary care physicians, specialists, or hospitals), payers (government, private health
insurance companies, employers), pharmaceutical, device and medical service companies, IT
solutions and services firms, and patients. The process of provisioning healthcare involves
massive healthcare data that exists in different forms (structured or unstructured), is stored in
disparate data sources (such as relational databases, or file servers) and in many different
formats. To promote more coordination of care across the multiple providers involved with
patients, their clinical information is increasingly aggregated from diverse sources into
Electronic Health Record (EHR) systems. EHRs capture and store information on patient
health and provider actions including individual-level laboratory results, diagnostic, treatment,
and demographic data. Though the primary use of EHRs is to maintain all medical data for an
individual patient and to provide efficient access to the stored data at the point of care, EHRs
can be the source for valuable aggregated information about overall patient populations [5, 6].

With the current explosion of clinical data, the problems of how to collect data from distributed and
heterogeneous health IT systems and how to analyze massive-scale clinical data have become
critical. Big data systems can be used for data collection from different stakeholders (patients, doctors,
payers, physicians, specialists, etc.) and disparate data sources (databases, structured and unstructured
formats, etc.). Big data analytics systems allow massive-scale clinical data analytics, facilitate the
development of more efficient healthcare applications, improve the accuracy of predictions and help
in timely decision making.
Let us look at some healthcare applications that can benefit from big data systems:
• Epidemiological Surveillance: Epidemiological Surveillance systems study the distribution
and determinants of health-related states or events in specified populations and apply these
studies for diagnosis of diseases under surveillance at national level to control health
problems. EHR systems include individual-level laboratory results, diagnostic, treatment, and
demographic data. Big data frameworks can be used for integrating data from multiple EHR
systems and timely analysis of data for effectively and accurately predicting outbreaks,
population-level health surveillance efforts, disease detection and public health mapping.
• Patient Similarity-based Decision Intelligence Application: Big data frameworks can be
used for analyzing EHR data to extract a cluster of patient records most similar to a particular
target patient. Clustering patient records can also help in developing medical prognosis
applications that predict the likely outcome of an illness for a patient based on the outcomes
for similar patients.
• Adverse Drug Events Prediction: Big data frameworks can be used to analyze EHR
data and predict which patients are most at risk of an adverse response to a certain
drug, based on the adverse drug reactions of other patients.
• Detecting Claim Anomalies: Health insurance companies can leverage big data systems for
analyzing health insurance claims to detect fraud, abuse, waste, and errors.
• Evidence-based Medicine: Big data systems can combine and analyze data from a variety
of sources, including individual-level laboratory results, diagnostic, treatment and
demographic data, to match treatments with outcomes and predict patients at risk for a disease.
Systems for evidence-based medicine enable providers to make decisions not only based on
their own perceptions but also from the available evidence.
• Real-time health monitoring: Wearable electronic devices allow non-invasive and
continuous monitoring of physiological parameters. These wearable devices may be in various
forms such as belts and wrist-bands. Healthcare providers can analyze the collected healthcare
data to determine any health conditions or anomalies. Big data systems for real-time data
analysis can be used for analysis of large volumes of fast-moving data from wearable devices
and other in-hospital or in-home devices, for real-time patient health monitoring and adverse
event prediction.
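
The patient-similarity application listed above can be illustrated with a small sketch. The answer does not name an algorithm, so this uses a plain nearest-neighbour lookup as one simple possibility. The feature tuples (age, systolic blood pressure, glucose), the patient IDs and the outcome labels are all made-up assumptions, and a real system would scale the features and run on a big data framework rather than in-memory Python.

```python
import math
from collections import Counter

def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, patients, k=3):
    """Return the IDs of the k patient records closest to the target."""
    ranked = sorted(patients.items(), key=lambda item: distance(target, item[1]))
    return [patient_id for patient_id, _ in ranked[:k]]

def predict_outcome(target, patients, outcomes, k=3):
    """Predict the most common outcome among the k most similar patients."""
    votes = [outcomes[pid] for pid in most_similar(target, patients, k)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical records: (age, systolic BP, glucose). Illustrative values only.
patients = {
    "p1": (54, 130, 110),
    "p2": (56, 135, 115),
    "p3": (70, 160, 180),
    "p4": (58, 128, 108),
    "p5": (25, 110, 85),
}
outcomes = {
    "p1": "recovered",
    "p2": "recovered",
    "p3": "complication",
    "p4": "complication",
    "p5": "recovered",
}

target = (55, 132, 112)
similar = most_similar(target, patients, k=3)
prediction = predict_outcome(target, patients, outcomes, k=3)
print(similar, prediction)
```

An odd value of k is used here so that a two-way vote between outcomes cannot tie.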

d) Describe HDFS. -----------(III)



Q.3) Attempt any THREE of the following. 12 Marks

a) Describe classification of Big Data Analytics. -----------(I)



b) State different types of data analytics. -------------(I)



c) Describe data preparation process with an example. -------(II)


Data can often be dirty and can have various issues that must be resolved before the
data can be processed, such as corrupt records, missing values, duplicates,
inconsistent abbreviations, inconsistent units, typos, incorrect spellings and
incorrect formatting. The data preparation step involves various tasks such as data
cleansing, data wrangling or munging, de-duplication, normalization, sampling and
filtering. Data cleansing detects and resolves issues such as corrupt records, records
with missing values and records with bad formatting. Data wrangling or
munging deals with transforming the data from one raw format to another. For
example, when we collect records as raw text files from different sources, we may
come across inconsistencies in the field separators used in different files. Some files
may use a comma as the field separator, while others may use a tab.
Data wrangling resolves these inconsistencies by parsing the raw data
from different sources and transforming it into one consistent format.
Normalization is required when data from different sources use different units or
scales, or have different abbreviations for the same thing. For example, weather data
reported by some stations may contain temperature on the Celsius scale while data from
other stations may use the Fahrenheit scale. Filtering and sampling may be useful
when we want to process only the data that meets certain rules. Filtering can also be
useful to reject bad records with incorrect or out-of-range values.
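
The preparation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration and not a prescribed method: the three-field record layout (station, temperature, unit), the sample data and the accepted temperature range of -90 to 60 degrees Celsius are all assumptions made for the example.

```python
import csv

def parse_records(raw_text):
    """Wrangling: parse lines that use either a comma or a tab as separator."""
    records = []
    for line in raw_text.strip().splitlines():
        delimiter = "\t" if "\t" in line else ","
        fields = next(csv.reader([line], delimiter=delimiter))
        if len(fields) != 3:
            continue  # cleansing: skip badly formatted records
        station, temp, unit = (field.strip() for field in fields)
        records.append((station, temp, unit.upper()))
    return records

def to_celsius(value, unit):
    """Normalization: convert Fahrenheit readings to Celsius."""
    return (value - 32.0) * 5.0 / 9.0 if unit == "F" else value

def prepare(raw_sources):
    seen = set()
    cleaned = []
    for raw in raw_sources:
        for station, temp, unit in parse_records(raw):
            try:
                value = float(temp)
            except ValueError:
                continue  # cleansing: drop missing or corrupt values
            celsius = round(to_celsius(value, unit), 1)
            if not -90.0 <= celsius <= 60.0:
                continue  # filtering: reject out-of-range values (assumed range)
            key = (station, celsius)
            if key in seen:
                continue  # de-duplication: keep only the first copy
            seen.add(key)
            cleaned.append({"station": station, "temp_c": celsius})
    return cleaned

# Hypothetical raw files: one comma-separated, one tab-separated.
comma_file = "S1,25.0,C\nS2,,C\nS1,25.0,C\n"
tab_file = "S3\t77.0\tF\nS4\t999\tC\n"

result = prepare([comma_file, tab_file])
print(result)
```

Of the five input records, S2 is dropped for a missing value, the second S1 row as a duplicate and S4 as out of range, while the S3 reading is converted from Fahrenheit.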

d) State any four data frame operations in SPARK session. ------(V)

Joining: Combining data from two data frames based on a common key.
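
To make the idea of joining on a common key concrete, here is a sketch of the semantics of an inner join in plain Python. It is not the Spark DataFrame API. The function name `inner_join`, the column names and the sample rows are hypothetical and exist only for illustration.

```python
def inner_join(left, right, key):
    """Combine rows from two tables (lists of dicts) that share a key value."""
    # Index the right-hand rows by key so each left row is matched quickly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # merge the columns of both rows
    return joined

# Hypothetical sample data.
employees = [
    {"dept_id": 1, "name": "Asha"},
    {"dept_id": 2, "name": "Ravi"},
    {"dept_id": 3, "name": "Meera"},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "HR"},
]

joined = inner_join(employees, departments, "dept_id")
print(joined)
```

The row for `dept_id` 3 has no match in `departments`, so an inner join leaves it out of the result.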

Q.4) Attempt any THREE of the following. 12 Marks

a) Compare RDBMS versus Hadoop. -----------(III)

b) Describe any four Hive data types. ----------------(IV)



c) Explain Hive file format. ------------(IV)



d) Describe data processing in HADOOP. ------------(III)



e) Write and explain the Scala/Python code to create the Spark session.------(V)

Q.5) Attempt any TWO of the following. 12 Marks

a) Describe the responsibilities of Data Scientist. -------------- (I)



b) Describe mapping analysis flow to big data stack. -----------(II)



c) Write syntax and example of Hive Query commands for following.---------- (IV)
(i) Create table
(ii) Alter Table
(iii) loading data into table from file

Q.6) Attempt any TWO of the following. 12 Marks

a) Describe Hive architecture.



b) Write a code for building Spark SQL application with SBT.



c) Explain Apache Spark Architecture



Scheme - I
Sample Test Paper - I
Program Name : Diploma in Artificial Intelligence and Machine
Learning
Program Code : AN
22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 20 Time: 1 Hour

Instructions:

(1) All questions are compulsory.


(2) Illustrate your answers with neat sketches wherever necessary.
(3) Figures to the right indicate full marks.
(4) Assume suitable data if necessary.
(5) Preferably, write the answers in sequential order.

Q.1) Attempt any FOUR. 08 Marks

a) Define Big Data Analytics.

b) State the characteristics of data.

c) State different Big Data Stack.

d) List domain specific examples of Big Data.

e) State the features of Hadoop. (answered)

Q.2) Attempt any THREE. 12 Marks

a) Explain Data Science.

b) Explain analytics flow for Big Data.



c) Explain Data Collection process of Big Data with example.

d) Describe HDFS. (answered)



Scheme - I
Sample Test Paper - II
Program Name : Diploma in Artificial Intelligence and Machine
Learning
Program Code : AN 22684
Semester : Sixth
Course Title : Big Data Analytics
Marks : 20 Time: 1 Hour

Instructions:

(1) All questions are compulsory.


(2) Illustrate your answers with neat sketches wherever necessary.
(3) Figures to the right indicate full marks.
(4) Assume suitable data if necessary.
(5) Preferably, write the answers in sequential order.

Q.1) Attempt any FOUR. 08 Marks

a) Enlist key advantages of Hadoop. (answered)


b) State the use of HIVE.

c) Write syntax for loading data into table from file in HIVE

d) State the Spark Components.



e) Define RDD. (answered)

Q.2) Attempt any THREE. 12 Marks

a) Compare RDBMS versus Hadoop. (answered)


b) Explain SERDE.

c) Describe Apache Spark Architecture. (answered)


d) Describe Data Frame Operations. (answered)
