CS F415 1322 Data Mining
Pilani Campus
INSTRUCTION DIVISION
First Semester 2015-2016
COURSE HANDOUT (PART II)
In addition to Part-I (general handout for all courses appended to this time table) this portion gives further
details pertaining to the course.
Course No.: CS F415
Course Title: Data Mining
Instructor-in-charge: POONAM GOYAL ([email protected])
1. Objective and Scope
The course explores the concepts and techniques of data mining, a promising and flourishing frontier in
database systems. Data mining is the automated extraction of patterns representing knowledge implicitly stored
in large databases, data warehouses, and other massive information repositories. It is a decision support tool
that addresses decision support problems that cannot be solved by other data analysis tools such as Online
Analytical Processing (OLAP). The course covers data mining tasks such as constructing decision trees, finding
association rules, classification, and clustering. It is designed to give students a broad understanding of
the design and use of data mining algorithms, and aims to provide a holistic view of data mining from
database, statistical, algorithmic and application perspectives.
2. Text Book
i) Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006.
3. Reference Books
i) Han J & Kamber M, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers,
Second Edition, 2006
ii) Zaki M. J. & Meira W. Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”,
Cambridge Univ Press, 2014.
iii) Dunham M. H. & Sridhar S., “Data Mining: Introductory and Advanced Topics”, Pearson
Education, 2006.
4. Course Plan
Lecture No.: 1-2
  Learning Objective: To understand the definition and applications of Data Mining
  Topic(s): Introduction to Data Mining (Motivation; What is Data Mining?; Data Mining Tasks; Issues in Data Mining; Applications)
  Chapter Reference: 1 + Class Notes
Lecture No.: 3-5
  Learning Objective: To understand types of data and to improve the quality of data and efficiency and the ease of the mining process.
  Topic(s): Data Preprocessing (Types of data; Data Quality; Data preprocessing; Similarity and Dissimilarity Measures)
  Chapter Reference: 2
Lecture No.: 6
  Learning Objective: To study how to investigate the data
  Topic(s): Data Exploration (Data Set & its Statistics; Visualization; OLAP & Multidimensional Data Analysis)
  Chapter Reference: 3 Self Study
Lecture No.: 7-10
  Learning Objective: To understand Classification and its applications
  Topic(s): Classification (Introduction; Applications; Decision Tree based Algorithms; Model Over-fitting; Performance Evaluation of a Classifier; Comparing Classifiers)
  Chapter Reference: 4 + Class Notes
Lecture No.: 11-15
  Learning Objective: To study the alternative approaches for Classification
  Topic(s): Classification: Alternative Techniques (Rule Based Classifier; Nearest Neighbor Classifier; Bayesian Classification; Support Vector Machine; Ensemble Classifiers; Class Imbalance Problem; Multiclass Problem)
  Chapter Reference: 5
Lecture No.: 16-19
  Learning Objective: To understand applications of Association Rule Mining and algorithms to find them
  Topic(s): Association Rule Mining (Introduction; Applications; Market-Basket Analysis; Frequent Itemsets; Apriori Algorithm; Alternative Methods)
  Chapter Reference: 6
Lecture No.: 20-23
  Learning Objective: To understand methods and need for finding complex Association Rules
  Topic(s): Advanced Association Rule Mining (Generalized Association Rules; Multilevel Association Rules; Multidimensional Association Rules; Temporal Association Rules; Infrequent Patterns; Constraint-Based Association Rules)
  Chapter Reference: 7 + Class Notes
Lecture No.: 24-28
  Learning Objective: To understand applications and algorithms for Clustering
  Topic(s): Clustering (Introduction; Applications; Partitioning Algorithms; Hierarchical Algorithms; Density based Algorithms; Cluster Evaluation)
  Chapter Reference: 8
Lecture No.: 29-33
  Learning Objective: To study advanced topics in cluster analysis
  Topic(s): Clustering: Additional Issues and Algorithms (Characteristics of Data, Clusters and Clustering Algorithms; Graph Based Clustering; Scalable Clustering Algorithms)
  Chapter Reference: 9
Lecture No.: 34-35
  Learning Objective: To understand detection of anomalies & their causes
  Topic(s): Anomaly Detection (Preliminaries; Statistical Approaches; Proximity based Outlier Detection; Density based Outlier Detection; Clustering Based Techniques)
  Chapter Reference: 10
Lecture No.: 36-40
  Learning Objective: To introduce advanced topics in Data Mining
  Topic(s): Advanced Topics (Web Mining; Incremental Algorithms for Data Mining; Stream Data Mining)
  Chapter Reference: Class Notes
5. Evaluation Schedule
Component | Duration | Weightage(%) | Date & Time | Venue | Remarks
Mid Sem Exam | 90 Mins. | 30 | 7/10 2:00 - 3:30 PM | | Closed Book
Labs/Assignments | | 30 | To be announced | |
Comprehensive | 3 Hours | 40 | 07/05 7/12 FN | | Partly open
7. Labs
A one-hour lab will be conducted every week. Students will apply the concepts of data mining to
problems and cases using the data mining software IBM SPSS Modeler. Students will also be exposed
to modeling of the problems.
8. Assignments
Assignment(s) (programming/reading) will be given to the students. These will help the
students gain a better understanding of the subject.
9. Chamber Consultation Hours
To be announced in the class.
10. Make-up Policy: Prior permission is a must, and a make-up shall be granted only in genuine cases, based on
the individual’s needs and circumstances.
11. Notices
All notices concerning this course will be displayed on the IPC Notice Board or the course website.
Instructor-in-charge