BI unit 2-1
Decision Making
Syllabus
Mathematical models for decision making: Structure of mathematical models, Development of a model, Classes of models.
Data mining: Definition of data mining, Representation of input data, Data mining process, Analysis methodologies.
Data preparation: Data validation, Data transformation, Data reduction.
2.1 Modeling
Modeling is the building of models that represent the modules, also called the entities, of a system.
Modeling is needed for the following reasons:
To decompose the system into its basic entities.
To identify the essential entities and linkages.
To recompose a selected version of the system with its essential/relevant entities and linkages (i.e. the model).
2.2 Models
A model is a simplified representation of the essential entities of some specific reality and their characteristics.
Models are used for the following purposes:
Exploration
Explanation
Extrapolation
2.2.1 Mathematical Models
Mathematical Models can be classified as follows:
2. Analog Model
3. Symbolic Model
Continuous models are typically represented with f(t), and the changes are reflected over continuous time intervals.
TechKnowledge Publications
Business Intelligence and Data Analytics 2-5 Mathematical Models for Decision Making
2.3.2 Characteristics of Mathematical Models
To be used successfully in a typical Management Science (MS) project, a mathematical model must meet the following criteria:
(i) The model should be as simple and understandable as possible.
(ii) The model should be reasonable.
(iii) The model should be easy to maintain and control.
(iv) The model should be adaptive. The parameters and structure of the model should be easy to change as new insights and information evolve.
(v) The model should be complete on important issues, i.e. all important variables and factors should have been taken into consideration.
2.4 Classes of Models
There are various models which are used for making decisions. The various classes of mathematical models are as follows:
Risk analysis model
Predictive model
Optimisation model
4. Optimisation model
The Optimization Model class provides a common API for defining and accessing variables and constraints, as well as other properties of each model. We will now discuss each of these components in more detail.
Types of Optimization Models
Optimization problems can be classified in terms of the nature of the objective function and the nature of the constraints. Special forms of the objective function and the constraints give rise to specialized algorithms that are more efficient. From this point of view, there are four types of optimization problems, of increasing complexity.
An unconstrained optimization problem is an optimization problem where the objective function can be of any kind (linear or nonlinear) and there are no constraints. These types of problems are handled by the classes discussed in the earlier sections.
A linear program is an optimization problem with an objective function that is linear in the variables, and all constraints are also linear. Linear programs are implemented by the Linear Program class.
A quadratic program is an optimization problem with an objective function that is quadratic in the variables (i.e. it may contain squares and cross products of the decision variables), and all constraints are linear. A quadratic program with no squares or cross products in the objective function is a linear program. Quadratic programs are implemented by the Quadratic Program class.
A nonlinear program is an optimization problem with an objective function that is an arbitrary nonlinear function of the decision variables, and the constraints can be linear or nonlinear. Nonlinear programs are implemented by the Nonlinear Program class.
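To make the idea of a linear program concrete, the following is a minimal sketch in plain Python. It does not use the Linear Program class mentioned above; the function name `solve_small_lp`, the constraint format and the numbers are all assumptions chosen for illustration. It relies on the fact that a bounded two-variable linear program attains its optimum at a corner of the feasible region, so it simply checks every corner.

```python
from itertools import combinations

# Illustrative problem (assumed numbers): maximise 3x + 5y.
# Each constraint is (a, b, c), meaning a*x + b*y <= c.
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
objective = (3, 5)


def solve_small_lp(objective, constraints, tol=1e-9):
    """Return (best_point, best_value) by checking every feasible corner.

    Only suitable for small two-variable problems whose feasible
    region is bounded and non-empty.
    """
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines never cross
        # Cramer's rule gives the crossing point of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


best_point, best_value = solve_small_lp(objective, constraints)
```

Corner enumeration is only practical for tiny problems; the specialized algorithms referred to in the text exist because the number of corners grows very quickly with the number of variables and constraints.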
5. Waiting Line model
There are basically two costs that must be balanced in a waiting line system: the cost of service and the cost of waiting. Note that I am not considering another possible cost component, the cost of a scheduling system. Theoretically, a scheduling system is a management strategy designed to avoid waiting lines (meaning you should never wait in the doctor's office, yeah, right!) and is not covered in this module.
Scheduling systems are useful when the customer is known to the system and the short and long run costs of waiting are relatively high. We will study scheduling system applications in linear programming later on in the course.
Operational characteristics of waiting lines include:
1. The probability that no customers (or units) are in the system.
2. The average number of customers in the line.
3. The average number of customers in the system (customers in line plus those being served).
4. The average time a customer spends in the waiting line.
5. The average time a customer spends in the system (waiting time plus time in the service facility).
6. The probability that an arriving customer has to wait for service.
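The six operational characteristics listed above can be computed in closed form for the simplest waiting line. The sketch below assumes a single-server line with Poisson arrivals and exponential service times (the M/M/1 model); that model choice, the function name `mm1_characteristics` and the example rates are assumptions for illustration and are not stated in the text.

```python
def mm1_characteristics(arrival_rate, service_rate):
    """Operating characteristics of an M/M/1 waiting line.

    arrival_rate: average arrivals per time unit (lambda)
    service_rate: average customers served per time unit (mu)
    """
    if arrival_rate <= 0 or service_rate <= arrival_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for a stable line")
    utilisation = arrival_rate / service_rate
    in_line = arrival_rate ** 2 / (service_rate * (service_rate - arrival_rate))
    time_in_line = in_line / arrival_rate
    return {
        "p_empty": 1 - utilisation,                      # 1. no customers in the system
        "avg_in_line": in_line,                          # 2. average number in the line
        "avg_in_system": in_line + utilisation,          # 3. line plus those being served
        "avg_time_in_line": time_in_line,                # 4. average wait in the line
        "avg_time_in_system": time_in_line + 1 / service_rate,  # 5. wait plus service
        "p_wait": utilisation,                           # 6. arriving customer must wait
    }


# Example: 2 arrivals per hour, server handles 3 per hour.
result = mm1_characteristics(arrival_rate=2, service_rate=3)
```

With these rates the server is busy two thirds of the time, and a customer spends one hour in the system on average. Raising the service rate shortens the wait but increases the cost of service, which is the balance described above.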
6. Pattern recognition model
Pattern recognition deals with identifying a pattern and confirming it again. In general, a pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a bar code, or a web page on the Internet.
The individual patterns are often grouped into various categories based on their properties. When the patterns of same properties are grouped together, the resultant group is also a pattern, which is often called a pattern class.
Pattern recognition is the science of observing and distinguishing the patterns of interest, and making correct decisions about the patterns or pattern classes. Thus, a biometric system applies pattern recognition to identify and classify individuals by comparing them with the stored templates.
2.5 Data Mining Process
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers to develop more effective marketing strategies, increase sales and decrease costs.
Data mining depends on effective data collection, warehousing and computer processing. Data mining is also
known as data discovery and knowledge discovery.
2.5.1 Data Mining Parameters
In data mining, association rules are created by analysing data for frequent if/then patterns, then using the
support and confidence criteria to locate the most important relationships within the data. Support is how
frequently the items appear in the database, while confidence is the number of times the if/then statements are found to be accurate.
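Support and confidence can be illustrated with a small sketch. The transactions below are invented for illustration, and both measures are expressed as fractions rather than raw counts, which is a common convention but an assumption here. The function names `support` and `confidence` are likewise hypothetical.

```python
# Each transaction is the set of items bought together (invented data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the 'if' part that also have the 'then' part."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the 'if' part never occurs, so the rule is never tested
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


# Rule: if bread then milk.
rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
```

A rule is usually kept only when both values exceed thresholds chosen by the analyst.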
Other data mining parameters include Sequence or Path Analysis, Classification, Clustering and Forecasting.
Sequence or Path Analysis parameters look for patterns where one event leads to another later event. A sequence is an ordered list of sets of items, and it is a common type of data structure found in many databases.
A Classification parameter looks for new patterns, and might result in a change in the way the data is organized. Classification algorithms predict variables based on other factors within the database.
Clustering parameters find and visually document groups of facts that were previously unknown. Clustering groups a set of objects and aggregates them based on how similar they are to each other.
There are different ways a user can implement the cluster, which differentiate between each clustering model.
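As one possible illustration of clustering, the sketch below groups one-dimensional values by repeatedly assigning each value to its nearest centre and then moving each centre to the mean of its group (the k-means idea). The choice of k-means, the function name `kmeans_1d`, the starting centres and the ages are assumptions for illustration; the text does not prescribe a particular clustering model.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign the value to the closest current centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


ages = [22, 25, 27, 41, 45, 48]
centroids, clusters = kmeans_1d(ages, centroids=[20.0, 50.0])
```

Here the customers fall into a younger group and an older group that were not labelled in advance, which is the sense in which clustering finds previously unknown groups.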
Forecasting parameters within data mining can discover patterns in data that can lead to reasonable predictions
about the future, also known as predictive analysis.
2.6 Data Mining Architecture
The major components of any data mining system are data source, data warehouse server, data mining engine, pattern evaluation module, graphical user interface and knowledge base.
[Figure: Data mining architecture, showing the data sources (Database, Data Warehouse, World Wide Web, Other Data Repositories), the Data Mining Engine, Pattern Evaluation and the Knowledge Base]
Database, data warehouse, World Wide Web (WWW), text files and other documents are the actual sources of data. You need large volumes of historical data for data mining to be successful.
Organizations usually store data in databases or data warehouses. Data warehouses may contain one or more databases, text files, spreadsheets or other kinds of information repositories. Sometimes, data may reside even in plain text files or spreadsheets. The World Wide Web or the Internet is another big source of data.
Different processes
The data needs to be cleaned, integrated and selected before passing it to the database or data warehouse
server. As the data is from different sources and in different formats, it cannot be used directly for the data
mining process because the data might not be complete and reliable. So, first data needs to be cleaned and
integrated. Again, more data than required will be collected from different data sources and only the data of
interest needs to be selected and passed to the server.
These processes are not as simple as we think. A number of techniques may be performed on the data as
part of cleaning, integration and selection.
(b) Database or Data warehouse server
The database or data warehouse server contains the actual data that is ready to be processed. Hence, the server is responsible for retrieving the relevant data based on the data mining request of the user.
(c) Data mining engine
The data mining engine is the core component of any data mining system. It consists of a number of modules for performing data mining tasks including association, classification, characterization, clustering, prediction, time-series analysis etc.
(d) Pattern evaluation module
The pattern evaluation module is mainly responsible for the measure of interestingness of the pattern by using a threshold value. It interacts with the data mining engine to focus the search towards interesting patterns.
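The threshold idea can be sketched very simply. In the snippet below, the pattern list, the use of confidence as the interestingness measure, and the function name `interesting_patterns` are all assumptions for illustration; a real system may use other measures.

```python
# Candidate patterns produced by a (hypothetical) mining engine.
patterns = [
    {"rule": "bread -> milk", "confidence": 0.67},
    {"rule": "milk -> butter", "confidence": 0.33},
    {"rule": "butter -> bread", "confidence": 1.00},
]


def interesting_patterns(patterns, threshold, measure="confidence"):
    """Keep only the patterns whose measure reaches the threshold."""
    return [p for p in patterns if p[measure] >= threshold]


kept = interesting_patterns(patterns, threshold=0.6)
kept_rules = [p["rule"] for p in kept]
```

Raising the threshold narrows the search to fewer, stronger patterns; lowering it lets more candidates through for the user to inspect.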
(e) Graphical user interface
The graphical user interface module communicates between the user and the data mining system. This
module helps the user use the system easily and efficiently without knowing the real complexity behind the
process.
When the user specifies a query or a task, this module interacts with the data mining system and displays
the result in an easily understandable manner.
(f) Knowledge base
The knowledge base is helpful in the whole data mining process. It might be useful for guiding the search or
evaluating the interestingness of the result patterns.
The knowledge base might even contain user beliefs and data from user experiences that can be useful in
the process of data mining. The data mining engine might get inputs from the knowledge base to make the
result more accurate and reliable. The pattern evaluation module interacts with the knowledge base on a
regular basis to get inputs and also to update it.
The no-coupling architecture is considered a poor architecture for a data mining system, but it is used for simple data mining processes.
(i) Data Layer
2.6.2 Types of Data Mining Processes
Different data mining processes can be classified into two types: data preparation or data preprocessing, and data mining. In fact, the first four processes, that are data cleaning, data integration, data selection and data transformation, are considered as data preparation processes.
The last three processes, including data mining, pattern evaluation and knowledge representation, are integrated into one process called data mining.
Fig. 2.6.4 : Data Preparation, Data Mining, Evaluation, Patterns and Knowledge
Data cleaning is the process where the data gets cleaned. Data in the real world is normally incomplete,
noisy and inconsistent.
The data available in data sources might be lacking attribute values, data of interest etc. For example, you
want the demographic data of customers and what if the available data does not include attributes for the
gender or age of the customers? Then the data is of course incomplete. Sometimes the data might contain
errors or outliers.
An example is an age attribute with value 200. It is obvious that the age value is wrong in this case. The data
could also be inconsistent.
For example, the name of an employee might be stored differently in different data tables or documents.
Here, the data is inconsistent. If the data is not clean, the data mining results would be neither reliable nor
accurate.
Data cleaning involves a number of techniques including filling in the missing values manually, combined computer and human inspection, etc. The output of the data cleaning process is adequately cleaned data.
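The three problems described above (incomplete, noisy and inconsistent data) can be sketched as a small check that flags records for human inspection. The records, field names, the age limit of 120 and the function names are assumptions made for illustration.

```python
# Invented customer records showing the three kinds of problem.
records = [
    {"name": "A. Kumar", "age": 34, "gender": "M"},
    {"name": "a.  kumar ", "age": 200, "gender": "M"},   # noisy age, inconsistent name
    {"name": "R. Shah", "age": None, "gender": None},    # incomplete
]


def normalise_name(name):
    """Collapse extra spaces and unify capitalisation so spellings match."""
    return " ".join(name.split()).title()


def find_issues(records, required=("name", "age", "gender"), max_age=120):
    """Return (record index, field, problem) tuples for a person to review."""
    issues = []
    for index, row in enumerate(records):
        for field in required:
            if row.get(field) is None:
                issues.append((index, field, "missing"))
        age = row.get("age")
        if age is not None and not 0 <= age <= max_age:
            issues.append((index, "age", "out of range"))
    return issues


issues = find_issues(records)
names = [normalise_name(r["name"]) for r in records]
```

The computer finds the suspect values and a person decides how to fill or correct them, which matches the combined computer and human inspection mentioned above.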
(b) Data integration
Data integration is the process where data from different data sources are integrated into one. Data lies in different formats in different locations. Data could be stored in databases, text files, spreadsheets, documents, data cubes, Internet and so on. Data integration is a really complex and tricky task because data from different sources does not match normally.
Suppose a table A contains an entity named customer id whereas another table B contains an entity named number. It is really difficult to ensure whether both these entities refer to the same value or not.
Metadata can be used effectively to reduce errors in the data integration process. Another issue faced is data redundancy. The same data might be available in different tables in the same database or even in different data sources. Data integration tries to reduce redundancy to the maximum possible level without affecting the reliability of data.
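The role of metadata in integration can be sketched as follows. The tables, the metadata dictionary stating that "customer id" in table A and "number" in table B mean the same thing, and the function names are all assumptions for illustration; in practice that equivalence has to be established by someone who knows both sources.

```python
# Hypothetical metadata: maps each source's column names onto shared names.
metadata = {
    "table_a": {"customer id": "customer_id", "name": "name"},
    "table_b": {"number": "customer_id", "city": "city"},
}

table_a = [{"customer id": 1, "name": "Asha"}, {"customer id": 2, "name": "Ravi"}]
table_b = [
    {"number": 1, "city": "Pune"},
    {"number": 2, "city": "Mumbai"},
    {"number": 2, "city": "Mumbai"},  # redundant copy of the previous row
]


def rename(rows, mapping):
    """Rewrite each row using the shared column names from the metadata."""
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


def integrate(left, right, key):
    """Merge rows that share the same key; repeated rows collapse into one."""
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


integrated = integrate(
    rename(table_a, metadata["table_a"]),
    rename(table_b, metadata["table_b"]),
    key="customer_id",
)
```

Each customer ends up as a single row, so the redundant copy in table B disappears without losing any attribute.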
The data mining process requires large volumes of historical data for analysis. So, usually the data repository with integrated data contains much more data than actually required. From the available data, data of interest needs to be selected and stored. Data selection is the process where the data relevant to the analysis is retrieved from the database.
3. Fraud Detection
Fig. 2.7.2 : Business understanding, Data understanding, Data preparation, Modeling, Evaluation and Deployment, arranged around the Data
1. Business understanding
In the business understanding phase :
First, it is required to understand business objectives clearly and find out what are the business's needs.
Next, we have to assess the current situation by finding the resources, assumptions, constraints and other
important factors which should be considered.
Then, from the business objectives and current situations, we need to create data mining goals to achieve
the business objectives within the current situation.
Finally, a good data mining plan has to be established to achieve both business and data mining goals. The
plan should be as detailed as possible.
2. Data understanding
First, the data understanding phase starts with initial data collection, which we collect from available data sources, to help us get familiar with the data. Some important activities must be performed, including data load and data integration, in order to make the data collection successful.
Next, the "gross" or "surface" properties of acquired data need to be examined carefully and reported. Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting, and visualization.
Finally, the data quality must be examined by answering some important questions such as "Is the acquired data complete?" and "Are there any missing values in the acquired data?"
3. Data preparation
The data preparation typically consumes about 90% of the time of the project. The outcome of the data preparation phase is the final data set.
Once available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. The data exploration task at a greater depth may be carried out during this phase to notice the patterns based on business understanding.
4. Modeling
First, modeling techniques have to be selected to be used for the prepared dataset. Next, the test scenarios must be generated to validate the quality and validity of the model. Then, one or more models are created by running the modeling tool on the prepared dataset.
Finally, models need to be assessed carefully, involving stakeholders, to make sure that the created models meet business initiatives.
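The modeling steps above (select a technique, generate a test scenario, build the model, assess it) can be sketched on a toy dataset. The straight-line model, the hold-out split, the data and the function names are assumptions for illustration only; a real project would choose the technique and the test design to suit its data.

```python
def fit_line(points):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute gap between predicted and actual y values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Prepared dataset (invented): y follows 2x + 1 exactly.
dataset = [(x, 2 * x + 1) for x in range(10)]

# Test scenario: build on the first 7 rows, validate on the 3 held back.
train, test = dataset[:7], dataset[7:]
model = fit_line(train)
test_error = mean_abs_error(model, test)
```

Holding back part of the data gives an assessment on rows the model has not seen, which is the point of generating a test scenario before the model is built.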
5. Evaluation
In the evaluation phase, the model results must be evaluated in the context of business objectives in the first
phase. In this phase, new business requirements may be raised due to the new patterns that have been
discovered in the model results or from other factors.
Gaining business understanding is an iterative process in data mining. The go or no-go decision must be
made in this step to move to the deployment phase.
6. Deployment
The knowledge or information, which we gain through data mining process, needs to be presented in such a
way that stakeholders can use it when they want it. Based on the business requirements, the deployment
phase could be as simple as creating a report or as complex as a repeatable data mining process across the
organization.
In the deployment phase, the plans for deployment, maintenance, and monitoring have to be created for
implementation and also future support. From the project point of view, the final report of the project needs to summarize the project experiences and review the project to see what needs to be improved, creating lessons learned.
The CRISP-DM offers a uniform framework for experience documentation and guidelines. In addition, the
CRISP-DM can be applied in various industries with different types of data.
Fig. 2.8.1 : Data preparation (Integration, Cleaning, Selection and Transformation), followed by Data mining, Patterns and Knowledge
1. Smoothing
2. Aggregation
3. Generalization
4. Normalization
5. Attribute Construction
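Two of the transformation strategies listed above, normalization and smoothing, can be sketched as follows. Min-max scaling and a moving average are only one possible choice for each strategy, and the function names and sample values are assumptions for illustration.

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they fall between new_min and new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # all values equal, nothing to spread
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def moving_average(values, window=3):
    """Smooth a series by averaging each run of `window` neighbouring values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


incomes = [20000, 35000, 50000, 80000]
normalised = min_max_normalise(incomes)

readings = [4, 8, 9, 15, 21]
smoothed = moving_average(readings)
```

Normalization puts attributes with very different ranges on a common scale, while smoothing reduces the effect of noise in individual values.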
2. Dimensionality Reduction
3. Data Compression
4. Numerosity Reduction
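As a small illustration of numerosity reduction, the sketch below replaces a dataset with a smaller random sample of its rows. Simple random sampling without replacement is only one possible technique, and the function name `sample_rows`, the seed and the data are assumptions for illustration.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return k rows chosen at random without replacement.

    A fixed seed makes the same sample come back on every run.
    """
    rng = random.Random(seed)
    return rng.sample(rows, k)


rows = list(range(100))          # stand-in for 100 data records
reduced = sample_rows(rows, 10)  # keep a 10% sample
```

The sample is much smaller than the original data but, if drawn at random, can still be expected to reflect its overall distribution.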
Review Questions
Q. 1 What are the different types of model? (Refer Section 2.2.1) (5 Marks)
Q. 2 Write short note on structure of mathematical model. (Refer Section 2.3) (5 Marks)
Q. 3 Explain classes of model. (Refer Section 2.4) (5 Marks)
Q. 4 Define Data Mining. (Refer Section 2.5) (2 Marks)
Q. 5 Write short note on Data Mining parameters. (Refer Section 2.5) (5 Marks)
Q. 6 Draw and explain architecture of data mining. (Refer Section 2.6) (5 Marks)
Q. 7 Write various application of data mining. (Refer Section 2.6) (5 Marks)
Q. 8 Write short note on Corporate Analysis and Risk Management. (Refer Section 2.7.2) (5 Marks)
Q. 9 Write short note on fraud detection. (Refer Section 2.7.3) (5 Marks)
Q. 10 Draw and explain data preparation. (Refer Section 2.8) (5 Marks)
Q. 11 Write note on Data validation. (Refer Section 2.8.1) (5 Marks)
Q. 12 Explain data transformation with suitable diagram. (Refer Section 2.8.2) (5 Marks)
Q. 13 Write short note on data Reduction. (Refer Section 2.8.3) (5 Marks)