Data Warehousing and Mining
Course objectives:
1. To identify the scope and importance of Data Warehousing and Mining.
2. To analyze data and choose relevant models and algorithms for specific applications.
3. To study spatial and web data mining.
4. To develop research interest in advances in data mining.
Module No. | Topics | Hrs.
1.0 | Introduction to Data Warehouse and Dimensional Modelling: Introduction to Strategic Information, Need for Strategic Information, Features of a Data Warehouse, Data Warehouses versus Data Marts, Top-down versus Bottom-up Approach, Data Warehouse Architecture, Metadata, E-R Modelling versus Dimensional Modelling, Information Package Diagram, STAR Schema, STAR Schema Keys, Snowflake Schema, Fact Constellation Schema, Factless Fact Tables, Updates to the Dimension Tables, Aggregate Fact Tables. | 8
2.0 | ETL Process and OLAP: Major Steps in the ETL Process, Data Extraction: Techniques, Data Transformation: Basic Tasks, Major Transformation Types, Data Loading: Applying Data, OLTP versus OLAP, OLAP Definition, Dimensional Analysis, Hypercubes, OLAP Operations: Drill Down, Roll Up, Slice, Dice and Rotation, OLAP Models: MOLAP, ROLAP. | 8
3.0 | Introduction to Data Mining, Data Exploration and Preprocessing: Data Mining Task Primitives, Architecture, Techniques, KDD Process, Issues in Data Mining, Applications of Data Mining, Data Exploration: Types of Attributes, Statistical Description of Data, Data Visualization, Data Preprocessing: Cleaning, Integration, Reduction: Attribute Subset Selection, Histograms, Clustering and Sampling, Data Transformation and Data Discretization: Normalization, Binning, Concept Hierarchy Generation, Concept Description: Attribute-Oriented Induction for Data Characterization. | 10
Text Books:
1. Paulraj Ponniah, "Data Warehousing: Fundamentals for IT Professionals", Wiley India.
2. Han, Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 3rd edition.
3. Reema Thareja, "Data Warehousing", Oxford University Press.
4. M.H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education.
Reference Books:
1. Ian H. Witten, Eibe Frank and Mark A. Hall, "Data Mining", Morgan Kaufmann, 3rd edition.
2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, "Introduction to Data Mining", Pearson.
3. R. Chattamvelli, "Data Mining Methods", Narosa Publishing House, 2nd edition.
Internal Assessment:
Assessment consists of two class tests of 20 marks each. The first class test is to be conducted when approximately 40% of the syllabus is completed, and the second class test when an additional 40% of the syllabus is completed. Each test shall be one hour long.