Big Data Analysis
Course Structure
Semester-1
Semester-2
Compulsory courses:
1. Foundations of Data Science
2. Advanced Statistical Methods
3. Machine Learning I
4. Data Mining
Semester-3
Compulsory courses:
1. Modelling in Operations Management
2. Enabling Technologies for Data Science
3. Value Thinking
1. Introduction to Econometrics & Finance
2. Machine Learning II
3. Time Series Analysis & Forecasting
4. Bioinformatics
Semester-4
1. Internship based project.
Semester-1
Suggested books :
1. Statistics : David Freedman, Robert Pisani & Roger Purves, W. W. Norton & Co., 4th Edition,
2007.
2. The Visual Display of Quantitative Information : Edward Tufte, Graphics Press, 2001.
3. Best Practices in Data Cleaning : Jason W. Osborne, Sage Publications 2012.
Random Variables : discrete and continuous probability models; some probability
distributions : Binomial, Poisson, Geometric, Hypergeometric, Normal, Exponential,
Chi-square; expectation, variance and other properties of the distributions.
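As a small illustration of the expectation and variance of a discrete distribution, the sketch below builds the Binomial probability mass function and computes both moments directly from their definitions. The helper names `binomial_pmf` and `mean_and_variance` and the parameters n = 10, p = 0.3 are chosen for illustration only and are not part of the syllabus.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def mean_and_variance(pmf):
    """Expectation and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

# Illustrative parameters (an assumption, not from the syllabus).
n, p = 10, 0.3
mean, var = mean_and_variance(binomial_pmf(n, p))
print(mean, var)
```

The computed values can be compared with the closed forms np and np(1 - p) for the Binomial distribution.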
Suggested Books:
1. A First Course in Probability : Sheldon M. Ross, 2014.
2. Introduction to Stochastic Processes : Paul G. Hoel, Sidney C. Port & Charles J. Stone,
Waveland Press, 1987.
3. Time Series Analysis and Its Applications : Robert H. Shumway and David S. Stoffer,
Springer 2010.
Suggested Books:
1. Linear Algebra and Its Applications : Gilbert Strang, 4th Edition, Academic Press.
B) Concepts of Computation (20 hrs – Theory 2 hrs + Lab 18 hrs)
Algorithms, convergence and complexity with illustrations; some sorting and searching
algorithms; some numerical methods, e.g. Newton-Raphson and steepest ascent.
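A minimal sketch of the Newton-Raphson method named above, suitable as a starting point for a lab exercise. The function name `newton_raphson`, the tolerance, the iteration cap and the test function x^2 - 2 are all assumptions made for illustration.

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Find a root of f starting from x0, using the derivative df."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # stop once the residual is small enough
            return x
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative vanished; try another start")
        x = x - fx / dfx           # Newton update: follow the tangent to zero
    raise RuntimeError("did not converge within max_iter iterations")

# Illustrative use: the positive root of x^2 - 2, i.e. the square root of 2.
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The stopping rule here tests the residual |f(x)|; testing the step size instead is an equally common choice, and comparing the two makes a natural convergence illustration.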
Suggested Books:
1. Database System Concepts : Abraham Silberschatz, Henry F. Korth and S. Sudarshan,
McGraw Hill, 2011.
Semester-2
D) Singular Value Decomposition (SVD): (5 hrs – Theory 1 hr + Lab 4 hrs)
Best rank k approximation, Power method for computing the SVD, Applications.
F) Algorithms for Massive Data Problems : (16 hrs – Theory 6 hrs + Lab 10 hrs)
Frequency Moments of data streams, matrix algorithms.
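The power method listed under Singular Value Decomposition can be sketched in plain Python as follows. It repeatedly applies A^T A to a vector to approximate the top right singular vector, then recovers the top singular value, the left singular vector and the best rank-1 approximation. The helper names, the fixed iteration count, the uniform start vector and the small example matrix are assumptions for illustration; a lab would more likely use a numerical library.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_triple(A, iters=200):
    """Approximate the largest singular value and its singular vectors."""
    At = transpose(A)
    n = len(A[0])
    # Uniform start vector; assumed not orthogonal to the top right singular vector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))              # one application of A^T A
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))     # sigma = |A v|
    u = [x / sigma for x in Av]
    return sigma, u, v

# Illustrative 3 x 2 matrix (an assumption, not from the syllabus).
A = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
sigma, u, v = top_singular_triple(A)
# Best rank-1 approximation: sigma * u * v^T.
rank1 = [[sigma * ui * vj for vj in v] for ui in u]
print(sigma, rank1)
```

Convergence is geometric in the ratio of the second to the first singular value squared, so matrices with a small gap between the two need more iterations.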
Suggested book :
1. Foundations of Data Science : John Hopcroft & Ravindran Kannan.
Suggested Books :
1. Statistical Inference : P. J. Bickel and K. A. Doksum, 2nd Edition, Prentice Hall.
2. Introduction to Linear Regression Analysis : Douglas C. Montgomery
C) Neural Networks : (9 hrs – Theory 3 hrs + Lab 6 hrs)
Representation learning; different models such as single- and multi-layer perceptrons;
backpropagation; applications.
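For the single-layer perceptron mentioned above, a minimal sketch of the classic perceptron learning rule is given below. The function names, the learning rate, the epoch count and the choice of the logical AND gate as training data are assumptions for illustration only.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single-layer perceptron with a step activation.

    samples is a list of (inputs, target) pairs with targets 0 or 1.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else 0
            error = target - predicted
            # Perceptron rule: move the boundary only when a sample is misclassified.
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Illustrative, linearly separable data: the logical AND gate.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_gate)
print([predict(w, b, x) for x, _ in and_gate])
```

A single layer can only separate classes with a hyperplane, which is why a problem such as XOR motivates the multi-layer perceptron and backpropagation.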
Suggested Books :
E) Cluster Analysis and Deviation Detection : (14 hrs – Theory 6 hrs + Lab 8 hrs)
Partitioning algorithms, density-based algorithms, grid-based algorithms, graph-theoretic
clustering.
F) Temporal and spatial data mining. (10 hrs – Theory 6 hrs + Lab 4 hrs)
Suggested Books
1. Data Mining Techniques : A. K. Pujari, Sangam Books Ltd., 2001
2. Mastering Data Mining : M. Berry and G. Linoff, John Wiley & Sons, 2000
Suggested Books :
1. Applied Multivariate Statistical Analysis : Richard A. Johnson and Dean W. Wichern,
Prentice Hall, 2002
Suggested Books :
1. Operations Research : Prem Kumar Gupta & D. S. Hira
2. Fundamentals of Queueing Theory : Donald Gross, John F. Shortle, James M. Thompson &
Carl M. Harris, Fourth Edition, Wiley
Semester-3
Suggested Books:
1. Hadoop in Action : Chuck Lam, 2010, ISBN : 9781935182191
2. Data-Intensive Text Processing with MapReduce : Jimmy Lin and Chris Dyer, Morgan &
Claypool Publishers, 2010
Movies:
1. Twelve Angry Men
2. Rashomon by Kurosawa
3. Trial of Nuremberg
4. Mahabharata by Peter Brook
Suggested Books:
1. The Hound of the Baskervilles by Arthur Conan Doyle
2. Five Little Pigs by Agatha Christie
3. The Purloined Letter by Edgar Allan Poe
4. The Case of the Substitute Face
Case Studies:
E) Clustering : (6 hrs – Theory 3 hrs + Lab 3 hrs)
Performance criteria, K-means clustering, EM algorithm
F) Collaborative filtering (6 hrs – Theory 3 hrs + Lab 3 hrs)
G) Combining models (6 hrs – Theory 3 hrs + Lab 3 hrs)
H) Probabilistic graphical models (6 hrs – Theory 3 hrs + Lab 3 hrs)
I) Large Scale Machine Learning : (6 hrs – Theory 3 hrs + Lab 3 hrs)
Gradient descent with large data sets
J) Genetic Algorithm. (6 hrs – Theory 3 hrs + Lab 3 hrs)
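The K-means clustering listed under Clustering above can be sketched with Lloyd's algorithm, which alternates an assignment step and a centroid update step until the centroids stop changing. The function name `kmeans`, the hand-picked initial centroids and the six sample points are assumptions for illustration; the syllabus does not prescribe an initialisation scheme.

```python
def kmeans(points, centroids, iters=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # empty cluster: keep its centroid
        if new_centroids == centroids:     # converged
            break
        centroids = new_centroids
    return labels, centroids

# Illustrative data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels, centroids)
```

The result depends on the starting centroids, so in practice the algorithm is usually run from several initialisations and the run with the lowest within-cluster sum of squares is kept, which ties in with the performance criteria named in the same unit.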
Suggested Books
1. Machine Learning : Tom Mitchell
Suggested Books
1. Introduction to Statistical Time Series : W. A. Fuller
2. Introduction to Time Series Analysis : P. J. Brockwell and R. A. Davis
K) Hidden Markov Model. (4 hrs – Theory 2 hrs + Lab 2 hrs)
L) Lattice Model. (4 hrs – Theory 2 hrs + Lab 2 hrs)
M) Algorithms. (8 hrs – Theory 6 hrs + Lab 2 hrs)
Suggested Books
1. Introduction to Computational Molecular Biology : C. Setubal & J. Meidanis, PWS
Publishing, Boston, 1997
Semester-4