
Block 1

The document is a term-end examination paper for a Data Science and Big Data course. It contains five questions; Question 1 is compulsory and any three of the remaining four are to be attempted. Question 1 has eight sub-questions covering data types, probability distributions, Hadoop Distributed File System characteristics, data stream characteristics, NoSQL databases, data stream filtering mechanisms, R programming concepts (dataframes, lists and vectors) and logistic regression. Question 2 covers measurement scales, significance testing and data pre-processing concepts. Question 3 covers Big Data characteristics, the MapReduce paradigm and the features of Apache Spark, Hive, column-based databases and graph-based databases. Questions 4 and 5 cover PageRank, recommendation systems, document similarity, social network graphs and data analysis in R (matrix multiplication, box plots, multiple regression and decision trees).

Uploaded by

yadavrajani3396

No. of Printed Pages : 4    MCS-226

MASTER OF COMPUTER APPLICATIONS (MCA-NEW)

Term-End Examination

December, 2022

MCS-226 : DATA SCIENCE AND BIG DATA

Time : 3 hours    Maximum Marks : 100

Weightage : 70%

Note : Question no. 1 is compulsory and carries 40 marks. Attempt any three questions from the rest.

1. (a) Explain the following types of data : 6
(i) Semi-structured data
(ii) Unstructured data
(iii) Qualitative data
(iv) Quantitative data

(b) What is meant by "Probability distribution of continuous random variable" ? Explain with the help of a diagram. Also explain the normal distribution. 6

(c) What are the characteristics of Hadoop Distributed File System (HDFS) ? Why is it used for Big data processing ? 6

(d) Explain the characteristics of data streams. 4

(e) What are NoSQL databases ? Why are they used ? 4

(f) Explain any one mechanism of filtering of data streams. 4

(g) Explain the following, with the help of an example, in the context of R programming : 6
(i) Dataframe
(ii) List
(iii) Vector

(h) What is logistic regression ? Which function of R programming can be used to implement logistic regression ? 4

2. (a) Explain the characteristics of measurement scales of data. Use these characteristics to define various measurement scales of data. 6

(b) Explain the steps of significance testing, with the help of an example. 8

(c) Explain the following terms with the help of an example : 6
(i) Data pre-processing
(ii) Data curation
(iii) Data cleaning

3. (a) Explain the characteristics of Big data. How does Big data differ from relational data ? 6

(b) Explain the steps of map-reduce paradigm using the example of word counting. 6

(c) List the features of any two of the following : 8
(i) Apache Spark
(ii) Hive
(iii) Column-based databases
(iv) Graph-based databases

4. (a) How can link analysis be used to compute PageRank ? 4

(b) Explain the concept of Recommendation System. 6

(c) Explain how the similarity between two documents can be found. 6

(d) Explain how the social networks can be represented using a graph. 4

5. (a) Write an R program to create two 3 × 3 matrices and multiply them. How is this program different from a similar C program ? 5

(b) What is a box plot ? List the commands of R programming that can be used to create a box plot. 5

(c) What is multiple regression ? Write steps about how R programming can be used to create multiple regression model. 5

(d) What is a decision tree ? Write steps on how R programming can be used for making decision tree. 5
