IV - CSE - Data Warehousing and Data Mining
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
IV Year B.Tech. CSE - I Sem
L  T/P/D  C
4  1/-/-
UNIT I
Introduction: Fundamentals of Data Mining, Data Mining Functionalities, Classification of Data Mining Systems, Data Mining Task Primitives, Integration of a Data Mining System with a Database or a Data Warehouse System, Major Issues in Data Mining. Data Preprocessing: Need for Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation.
UNIT II
Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From Data Warehousing to Data Mining. Data Cube Computation and Data Generalization: Efficient Methods for Data Cube Computation, Further Development of Data Cube and OLAP Technology, Attribute-Oriented Induction.
UNIT III
Basic Concepts, Efficient and Scalable Frequent Itemset Mining Methods, Mining Various Kinds of Association Rules, From Association Mining to Correlation Analysis.
UNIT IV
Classification and Prediction: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Rule-Based Classification, Classification by Backpropagation, Support Vector Machines, Associative Classification, Lazy Learners, Other Classification Methods, Prediction, Accuracy and Error Measures, Evaluating the Accuracy of a Classifier or a Predictor, Ensemble Methods.
UNIT V
Cluster Analysis Introduction: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Model-Based Clustering Methods, Clustering High-Dimensional Data, Constraint-Based Cluster Analysis, Outlier Analysis.
UNIT VI
Mining Streams, Time Series and Sequence Data: Mining Data Streams, Mining Time-Series Data, Mining Sequence Patterns in Transactional Databases, Mining Sequence Patterns in Biological Data, Graph Mining, Social Network Analysis and Multirelational Data Mining.
UNIT VII
Mining Object, Spatial, Multimedia, Text and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects, Spatial Data Mining, Multimedia Data Mining, Text Mining, Mining the World Wide Web.
UNIT VIII
Applications and Trends in Data Mining: Data Mining Applications, Data Mining System Products and Research Prototypes, Additional Themes on Data Mining and Social Impacts of Data Mining.
TEXT BOOKS:
1. Data Mining - Concepts and Techniques - Jiawei Han & Micheline Kamber, Morgan Kaufmann Publishers, Elsevier, 2nd Edition, 2006.
2. Introduction to Data Mining - Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Pearson Education.
REFERENCE BOOKS:
1. ... Press.
2. Data Warehousing in the Real World - Sam Anahory & Dennis Murray, Pearson Edn Asia.
3. Insight into Data Mining - K.P. Soman, S. Diwakar, V. Ajay, PHI, 2008.
4. Data Warehousing Fundamentals - Paulraj Ponniah, Wiley Student Edition.
5. The Data Warehouse Life Cycle Tool Kit - Ralph Kimball, Wiley Student Edition.
6. Building the Data Warehouse - William H. Inmon, John Wiley & Sons Inc, 2005.
7. Data Mining: Introductory and Advanced Topics - Margaret H. Dunham, Pearson Education.
8. Data Mining - V. Pudi and P. Radha Krishna, Oxford University Press.
9. Data Mining: Methods and Techniques - A.B.M. Shawkat Ali and S.A. Wasimi, Cengage Learning.
10. Data Warehouse 2.0: The Architecture for the Next Generation of Data Warehousing - W.H. Inmon, D. Strauss, G. Neushloss, Elsevier, Distributed by SPD.
DATA WAREHOUSING AND DATA MINING
Unit-I: Introduction to Data Mining: What is data mining, motivating challenges, origins of data mining, data mining tasks, Types of Data: attributes and measurements, types of data sets, Data Quality (Tan)
Unit-II: Data preprocessing, Measures of Similarity and Dissimilarity: Basics, similarity and dissimilarity between simple attributes, dissimilarities between data objects, similarities between data objects, examples of proximity measures: similarity measures for binary data, Jaccard coefficient, Cosine similarity, Extended Jaccard coefficient, Correlation, Exploring Data: Data Set, Summary Statistics (Tan)
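As an illustration of two of the proximity measures listed in Unit-II, the following is a minimal Python sketch of the Jaccard coefficient for binary vectors and of cosine similarity. The function names and the choice to return 0.0 for degenerate inputs (all-zero vectors) are assumptions made for this example, not part of the syllabus.

```python
import math


def jaccard(x, y):
    """Jaccard coefficient of two equal-length binary (0/1) vectors.

    J = f11 / (f01 + f10 + f11); positions where both are 0 are ignored.
    """
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)  # f01 + f10
    denom = f11 + mismatches
    # Assumption for this sketch: two all-zero vectors get similarity 0.0.
    return f11 / denom if denom else 0.0


def cosine(x, y):
    """Cosine similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption for this sketch: a zero vector gets similarity 0.0.
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0
```

For example, `jaccard([1, 0, 0, 1], [1, 1, 0, 0])` has one shared 1 and two mismatches, so it evaluates to 1/3, while `cosine([1, 0], [0, 1])` is 0.0 because the vectors are orthogonal.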
Unit-III: Data Warehouse: basic concepts, Data Warehousing Modeling: Data Cube and OLAP, Data Warehouse implementation: efficient data cube computation, partial materialization, indexing OLAP data, efficient processing of OLAP queries. (H & K)
Unit-IV: Classification: Basic Concepts, General approach to solving a classification problem, Decision Tree induction: working of decision tree, building a decision tree, methods for expressing attribute test conditions, measures for selecting the best split, Algorithm for decision tree induction. Model overfitting: due to presence of noise, due to lack of representative samples, evaluating the performance of classifier: holdout method, random subsampling, cross-validation, bootstrap. (Tan)
Unit-V: Classification - Alternative techniques: Bayesian Classifier: Bayes theorem, using Bayes theorem for classification, Naïve Bayes classifier, Bayes error rate, Bayesian Belief Networks: Model representation, model building (Tan)
Unit-VI: Association Analysis: Problem Definition, Frequent Itemset generation, The Apriori principle, Frequent Itemset generation in the Apriori algorithm, candidate generation and pruning, support counting (eluding support counting using a Hash tree), Rule generation, compact representation of frequent itemsets, FP-Growth Algorithm. (Tan)
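The frequent itemset topics in Unit-VI (candidate generation, pruning by the Apriori principle, and support counting) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `apriori`, the use of an absolute support count as the threshold, and support counting by a plain scan of all transactions (no hash tree) are choices made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Support counting by scanning every transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

Called on five small market-basket transactions with `min_support=3`, the function returns the frequent single items and pairs; any 3-itemset whose pairs are all frequent is still counted and dropped if its own support falls below the threshold.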
Unit-VII: Overview, types of clustering, Basic K-means, K-means additional issues, Bisecting k-means, k-means and different types of clusters, strengths and weaknesses, k-means as an optimization problem.
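To make the Basic K-means topic and its view as an optimization problem concrete, here is a minimal pure-Python sketch. It assumes squared Euclidean distance, caller-supplied initial centroids, and that an empty cluster keeps its previous centroid; the names `kmeans` and `sse` are illustrative only.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic K-means on tuples of numbers, starting from given centroids.

    Returns (centroids, clusters), where clusters[i] lists the points
    assigned to centroids[i].
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # Assumption: an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared errors, the objective that K-means tries to minimize."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, c))
        for c, cluster in zip(centroids, clusters)
        for p in cluster
    )
```

Each assignment step and each update step can only lower or preserve the SSE, which is why the loop stops once the centroids no longer change. The result is a local minimum that depends on the initial centroids.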
Unit-VIII: Agglomerative Hierarchical clustering, basic agglomerative hierarchical clustering algorithm, specific techniques, DBSCAN: Traditional density: center-based approach, strengths and weaknesses (Tan)
TEXT BOOKS:
1. Introduction to Data Mining: Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson.
2. Data Mining, Concepts and Techniques, 3/e, Jiawei Han, Micheline Kamber, Elsevier.
REFERENCE BOOKS:
1. Introduction to Data Mining with Case Studies, 2nd ed: GK Gupta; PHI.
2. Data Mining: Introductory and Advanced Topics: Dunham, Sridhar, Pearson.
3. Data Warehousing, Data Mining & OLAP, Alex Berson, Stephen J Smith, TMH.
4. Data Mining Theory and Practice, Soman, Diwakar, Ajay, PHI, 2006.