Data Science Syllabus
Data Science
Unit-I
Introduction, Grasping the Fundamentals of Big Data, The Evolution of Data Management, Defining Big Data,
Building a Successful Big Data Management Architecture, Beginning with capture, organize, integrate, analyze, and
act, Setting the architectural foundation, Performance matters, Big Data Types, Defining Structured Data, Sources of
big structured data, Role of relational databases in big data, Defining Unstructured Data, Sources of unstructured
data, Integrating data types into a big data environment.
Unit-II
Statistics: Population, Sample, Sampled data, Sample space, Random sample, Sampling distribution, Variable,
Variation, Frequency, Random variable, Uniform random variable, Exponential random variable, Mean, Median,
Range, Mode, Variance, Standard deviation, Correlation, Linear Correlation, Correlation and Causality, Regression,
Linear Regression, Linear Regression with Nonlinear Substitution, Classification, Classification Criteria, Naive
Bayes Classifier, Support Vector Machine.
Unit-III
Introduction to Data Analytics, Drivers for analytics, Core components of analytical data architecture,
Data warehouse architecture, Column-oriented database, Parallel vs. distributed processing, Shared-nothing data
architecture and massively parallel processing, Elastic scalability, Data loading patterns, Data Analytics lifecycle:
Discovery, Data Preparation, Model Planning, Model Building, Communicating results and findings, Methods:
K-means clustering, Association rules.
Unit-IV
Data Science Tools: Cluster Architecture vs. Traditional Architecture, Hadoop, Hadoop vs. Distributed databases,
The building blocks of Hadoop, Hadoop data types, Hadoop software stack, Deployment of Hadoop in data center,
Hadoop infrastructure, HDFS concepts, Blocks, NameNodes and DataNodes, Overview of HBase, Hive, Cassandra
and Hypertable, Sqoop.
Unit-V
Introduction to R, Data Manipulation and Statistical Analysis with R, Basics, Simple manipulations, Numbers and
vectors, Input/Output, Arrays and Matrices, Loops and conditional execution, Functions, Data Structures, Data
transformations, Strings and dates, Graphics.
References:
1. Big Data For Dummies by Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, Wiley, ISBN 978-1-118-50422-2, 2013
2. Data Analytics: Models and Algorithms for Intelligent Data Analysis by Thomas A. Runkler, Springer Vieweg, ISBN 978-3-8348-2589-6, 2013
3. Big Data Analytics with R and Hadoop by Vignesh Prajapati, Packt Publishing, ISBN 978-1-78216-328-2, 2013