Question Paper DSBDA
Course Outcomes:
After completion of the course, learners should be able to
CO1: Analyze needs and challenges for Data Science and Big Data Analytics
CO2: Apply statistics for Big Data Analytics
CO3: Apply the lifecycle of Big Data analytics to real world problems
CO4: Implement Big Data Analytics using Python programming
CO5: Implement data visualization using visualization tools in Python programming
CO6: Design and implement Big Databases using the Hadoop ecosystem
Unit I: Introduction 07 Hours
Basics and need of Data Science and Big Data, Applications of Data Science, Data explosion, 5
V’s of Big Data, Relationship between Data Science and Information Science, Business
intelligence versus Data Science, Data Science Life Cycle, Data: Data Types, Data Collection.
Need of Data wrangling, Methods: Data Cleaning, Data Integration, Data Reduction, Data
Transformation, Data Discretization.
Unit II: Statistical Inference 07 Hours
Need of statistics in Data Science and Big Data Analytics, Measures of Central Tendency: Mean,
Median, Mode, Mid-range. Measures of Dispersion: Range, Variance, Mean Deviation, Standard
Deviation. Bayes theorem, Basics and need of hypothesis and hypothesis testing, Pearson
Correlation, Sample Hypothesis testing, Chi-Square Tests, t-test.
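The descriptive measures and the Pearson correlation listed in Unit II can be computed with Python's standard library. The sketch below is illustrative only: the function names describe and pearson and the sample data are assumptions made for this example, and population (not sample) variance and standard deviation are used.

```python
import math
import statistics


def describe(values):
    """Return the central-tendency and dispersion measures named in Unit II."""
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "mid_range": (min(values) + max(values)) / 2,
        "range": max(values) - min(values),
        # Population variance and standard deviation (divide by n, not n - 1).
        "variance": statistics.pvariance(values),
        "mean_deviation": sum(abs(x - mean) for x in values) / len(values),
        "std_dev": statistics.pstdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sample data, chosen only to keep the arithmetic easy to check.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
r = pearson([1, 2, 3], [2, 4, 6])
```

For this sample the mean is 5, the median is 4.5, the mode is 4, and the population standard deviation is 2. The two sequences passed to pearson are perfectly linearly related, so r is 1 up to floating-point error.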
Text Books:
1. David Dietrich, Barry Heller, “Data Science and Big Data Analytics”, EMC Education
Services, Wiley publication, 2012, ISBN 0-07-120413-X.
2. Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and Techniques”,
Elsevier Publishers, Third Edition, ISBN: 9780123814791, 9780123814807.
Reference Books:
1. EMC Education Services, “Data Science and Big Data Analytics: Discovering,
Analyzing, Visualizing and Presenting Data”
2. DT Editorial Services, “Big Data, Black Book”, DT Editorial Services, ISBN:
9789351197577, 2016 Edition.
3. Chirag Shah, “A Hands-On Introduction To Data Science”, Cambridge University
Press, (2020), ISBN: 978-1-108-47244-9.
4. Wes McKinney, “Python for Data Analysis”, O'Reilly Media, ISBN: 978-1-449-31979-3
5. Trent Hauck, “Scikit-learn Cookbook”, Packt Publishing, ISBN: 9781787286382
6. Jenny Kim, Benjamin Bengfort, “Data Analytics with Hadoop”, O'Reilly Media, Inc., ISBN:
9781491913703.
7. Venkat Ankam, “Big Data Analytics”, Packt Publishing, ISBN: 9781785884696
e-Books:
• An Introduction to Statistical Learning by Gareth James:
https://www.ime.unicamp.br/~dias/Intoduction%20to%20Statistical%20Learning.pdf
• Python Data Science Handbook by Jake VanderPlas:
https://tanthiamhuat.files.wordpress.com/2018/04/pythondatasciencehandbook.pdf
• Introducing Data Science by Davy Cielen, Manning Publications [PDF]
• Handbook for Visualizing: A Handbook for Data Driven Design by Andy Kirk
• An Introduction to Data Science:
https://docs.google.com/file/d/0B6iefdnF22XQeVZDSkxjZ0Z5VUE/edit?pli=1
• Hadoop Tutorial:
https://www.tutorialspoint.com/hadoop/hadoop_tutorial.pdf
• Learning with Python; How to Think Like a Computer Scientist:
http://openbookproject.net/thinkcs/python/english3e/
• Python for Everybody:
http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf
• Scikit-learn Tutorial:
https://scikit-learn.org/stable/
Question No. Questions
1 Compare BI vs. Data Science.
Question No. Questions
1 What is the need for statistics in Data Science and Big Data Analytics?
Question No. Questions
1 Discuss the following in detail:
a. Conventional challenges in big data
b. Nature of data
2 Describe any five characteristics of Big Data.
4 Define big data. Why is big data required? How does a traditional BI
environment differ from a big data environment?
5 What are the challenges with big data?
6 What are the three characteristics of big data? Explain the differences
between BI and Data Science.
Question No. Questions
1 Discuss the looping statements with an example:
(i) while (ii) for (iii) range
2 Write a Python function to sum the numbers in a list.
3 List the features of Python. Give its advantages and disadvantages.
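As a point of reference for questions 1 and 2 above, a minimal sketch of the three looping constructs and a list-summing function might look like the following. The names sum_list, countdown, and evens are hypothetical and chosen only for this illustration; it is one possible answer, not a model solution.

```python
def sum_list(numbers):
    """Return the sum of the numbers in a list using an explicit for loop."""
    total = 0
    for n in numbers:  # for loop: visits each element of the list in order
        total += n
    return total


def countdown(start):
    """Collect start, start - 1, ..., 1 using a while loop."""
    values = []
    while start > 0:  # while loop: repeats as long as the condition holds
        values.append(start)
        start -= 1
    return values


# range(start, stop, step) yields 0, 2, 4, 6, 8; the stop value is excluded.
evens = []
for i in range(0, 10, 2):
    evens.append(i)
```

In everyday code the built-in sum() does the same job as sum_list; the explicit loop is written out here only because the question asks for a function.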
Question No. Questions
6 What is TF-IDF?
Question No. Questions
1 Introduce data visualization.
8 Describe Pig.
10 Explain Hive.
14 Define Histogram.
Instructions to Candidates:
1. Attempt Questions Q.1 OR Q.2, Q.3 OR Q.4.
2. Neat diagrams must be drawn wherever necessary
3. Assume suitable data, if necessary
c. Distinguish between one-tailed and two-tailed hypothesis tests; draw a diagram to support
your answer. 5
Instructions to Candidates:
1. Attempt Questions Q.1 OR Q.2, Q.3 OR Q.4, Q.5 OR Q.6, Q.7 OR Q.8
2. Neat diagrams must be drawn wherever necessary
3. Assume suitable data, if necessary
Q.5
a. Describe: 6
i. K-means clustering
ii. Hierarchical Clustering
b. What is TF-IDF? Explain with an example. 6
c. Explain the need for social network analysis and give an introduction to it. 4
OR
Q.6
a. What is parameter tuning and optimization? 4
b. Write short notes on: 6
i. Holdout Method
ii. Random Subsampling
c. Explain the confusion matrix in detail. 6
Q.7
b. Describe: 6
i. Line plot
ii. Density plot
iii. Box plot
c. What are MapReduce, Pig and Hive? 6
OR
Q.8
a. Explain the types of data visualization. 6
b. Explain data visualization techniques. 6
c. What are the analytical techniques used in Big Data visualization? 6
Instructions to Candidates:
1. Attempt Questions Q.1 OR Q.2, Q.3 OR Q.4, Q.5 OR Q.6, Q.7 OR Q.8
2. Neat diagrams must be drawn wherever necessary
3. Assume suitable data, if necessary
OR
Q.2
a. Explain the Data Science Life Cycle. 6
b. What is the relationship between Data Science and Information Science? 6
c. Write short notes on: 8
i. Data Cleaning
ii. Data Integration
iii. Data Reduction
iv. Data Discretization
OR
Q.4
a. Explain logistic regression. 6
b. Differentiate between the types of data analytics. 4
c. Explain the Naïve Bayes algorithm in detail. 6
Q.5
a. Describe: 6
i. K-means clustering
ii. Hierarchical Clustering
b. Explain text preprocessing in detail. 4
c. Explain Bag of Words and TF-IDF. 6
OR
Q.7
a. Explain the types of data visualization. 6
b. Explain data visualization techniques. 6
c. What are the analytical techniques used in Big Data visualization? 6
OR
Q.8
a. Explain data visualization and the challenges of Big Data visualization. 6
b. Describe: 6
i. Line plot
ii. Density plot
iii. Histogram
c. What are MapReduce, Pig and Hive? 6