BI unit 2-1

The document outlines the syllabus for a course on Mathematical Models for Decision Making, covering the structure, development, and types of mathematical models, including iconic, analog, and symbolic models. It discusses the mathematical modeling process, characteristics, advantages, and disadvantages, as well as various classes of models used in decision-making such as risk analysis, project management, and predictive models. The document also highlights the importance of mathematical models in representing complex systems and aiding in effective decision-making.

2 Mathematical Models for Decision Making
Syllabus

Mathematical models for decision making: Structure of mathematical models, Development of a model, Classes of models.
Data mining: Definition of data mining, Representation of input data, Data mining process, Analysis methodologies.
Data preparation: Data validation, Data transformation, Data reduction.
2.1 Modeling
Modeling is the building of models to represent the modules, also called the entities, of a system.
The needs of modeling are as follows:
To decompose the system into its basic entities.
To identify the essential entities and linkages.
To recompose a selected version of the system with its essential/relevant entities and linkages (i.e. the model).

2.2 Models

A model is a simplified representation of the essential entities of some specific reality and their characteristics.
Models are used for the following purposes:
Exploration
Explanation
Extrapolation
2.2.1 Mathematical Models
Mathematical models can be classified as follows:

1. Iconic (Scale) Model
2. Analog Model
3. Symbolic Model

Fig. 2.2.1: Types of mathematical models


2-2 Mathematical Models for Decision Making
Business Intelligence and Data Analytics
1. Iconic (Scale) Model
An iconic model is a physical copy of a system, usually based on a different scale than the original. These may appear in three dimensions, like an airplane, car or bridge model built to scale. Photographs are another type of iconic model, but in only two dimensions. An iconic model is a look-alike representation of some specific entity, for example a house.
Iconic models can be represented in:
Two dimensions: e.g. photos, drawings, etc.
Three dimensions: e.g. scale models.
A scale model can be a:
reduction (scaled down, e.g. the model of a building),
reproduction (same scale, e.g. copy model, prototype or working model), or
enlargement (scaled up, e.g. the model of an atom).
2. Analog Model
An analog model does not look like the real system but behaves like it. These are usually two-dimensional charts or diagrams, e.g., organization charts showing structure, authority, and responsibility relationships.
Analog models are more abstract than iconic ones. An analogue model is the representation of the entities of a system by analogue entities pertaining to the model (e.g. through diagrams).
An analogue model can be built through:

(a) Two-dimensional visualization: charts, graphs, diagrams (e.g. the colour coding of a geographical chart for representing different altitudes).
(b) Three-dimensional visualization: analogue devices (e.g. the flow of water in pipes to represent the flow of electricity in wires or the flow of resources in an economic system).
3. Symbolic Model
The complexity of relationships in some systems cannot be represented physically, or the physical representation may be cumbersome and take time to construct. Therefore, a more abstract model is used with the aid of symbols.
Most management science analysis is executed with the aid of mathematical models, which utilize mathematical symbols. These are general rather than specific and can describe diverse situations. Furthermore, they can be manipulated easily for purposes of experimentation and prediction.
When the concept of a model is extended to the area of mathematics, it is useful to know, in a quantitative sense, how important or how pertinent the variables in the model are with regard to their impact on the solution.
TechKnowledge Publications
The mathematical models depict explicit relationships and interrelationships among the variables and factors deemed important in solving problems. A symbolic model is the representation of the entities of a system through symbols.
Symbols can be:
Mathematical.
Logical.
Ad hoc.
A symbolic model is used whenever:
the reality is too complex or too abstract to be portrayed through an iconic or analogue model, or
the factors of the system (variables) can be represented by symbols that can be manipulated in a meaningful and fruitful way.

2.3 The Structure of Mathematical Models
Mathematical models are typically in the form of equations or other mathematical statements.
For example, the relationship between cost, revenue and profit can be expressed as:
P = R - C ...(2.3.1)
where P is profit, R is revenue, and C is cost.
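Here is a minimal Python sketch of equation (2.3.1) treated as a symbolic model that can be manipulated for what-if experiments. The helper functions revenue and total_cost, along with the price, unit cost and fixed cost figures, are hypothetical assumptions and not part of the original text.

```python
def profit(revenue, cost):
    """Symbolic model (2.3.1): P = R - C."""
    return revenue - cost


def revenue(price, quantity):
    # Hypothetical sub-model: revenue is price times units sold
    return price * quantity


def total_cost(fixed, unit_cost, quantity):
    # Hypothetical sub-model: a fixed cost plus a constant cost per unit
    return fixed + unit_cost * quantity


# What-if experiment: vary the quantity sold and observe the profit
for q in (100, 200, 300):
    p = profit(revenue(50.0, q), total_cost(5000.0, 30.0, q))
    print(f"quantity={q}: profit={p}")
```

With these assumed figures the model shows a loss at 100 and 200 units and a profit at 300 units. To test a different scenario, you change one parameter and re-run the model. The symbolic form makes this kind of experimentation cheap.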
2.3.1 Classification of Mathematical Models
1. Linear vs. nonlinear
2. Deterministic vs. probabilistic (stochastic)
3. Static vs. dynamic
4. Discrete vs. continuous
5. Deductive, inductive, or floating

Fig. 2.3.1: Classification of mathematical models

1. Linear vs. nonlinear


Mathematical models are usually composed of variables, which are abstractions of quantities of interest in the described systems, and operators that act on these variables, which can be algebraic operators, functions, differential operators, etc.
If all the operators in a mathematical model exhibit linearity, the resulting mathematical model is defined as linear; a model is considered to be nonlinear otherwise. The question of linearity and nonlinearity is dependent on context, and linear models may have nonlinear expressions in them.
For example, in a statistical linear model, it is assumed that a relationship is linear in the parameters, but it may be nonlinear in the predictor variables. Similarly, a differential equation is said to be linear if it can be written with linear differential operators, but it can still have nonlinear expressions in it.
In a mathematical programming model, if the objective functions and constraints are represented entirely by linear equations, then the model is regarded as a linear model. If one or more of the objective functions or constraints are represented with a nonlinear equation, then the model is known as a nonlinear model.
Nonlinearity, even in fairly simple systems, is often associated with phenomena such as chaos and irreversibility. Although there are exceptions, nonlinear systems and models tend to be more difficult to study than linear ones.
A common approach to nonlinear problems is linearization, but this can be problematic if one is trying to study aspects such as irreversibility, which are strongly tied to nonlinearity.
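This sketch shows a model that is linear in its parameters but nonlinear in the predictor variable. It fits y = b0 + b1*x + b2*x^2 by ordinary least squares with NumPy. The data are synthetic and chosen only for illustration.

```python
import numpy as np

# Synthetic data generated from y = 1 + 2x + 3x^2 (quadratic in x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x**2

# Design matrix with columns [1, x, x^2]. The model y = b0 + b1*x + b2*x^2
# is nonlinear in x but linear in the unknown parameters b0, b1, b2,
# so ordinary (linear) least squares can estimate them.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef, 6))
```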
2. Deterministic vs. probabilistic (stochastic)
A deterministic model is one in which every set of variable states is uniquely determined by parameters in
the model and by sets of previous states of these variables. Therefore, deterministic models perform the
same way for a given set of initial conditions.
Conversely, in a stochastic model, randomness is present, and variable states are not described by unique
values, but rather by probability distributions.
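The following sketch contrasts the two model types using a hypothetical inventory example. The deterministic version always returns the same ending stock. The stochastic version draws daily demand from a normal distribution (an assumed distribution), so repeated runs give different results.

```python
import random


def deterministic_stock(initial, daily_demand, days):
    # Same inputs always give the same ending stock
    return initial - daily_demand * days


def stochastic_stock(initial, mean_demand, sd_demand, days, rng):
    # Daily demand is drawn from a normal distribution (clipped at zero),
    # so the ending stock is a random variable, not a single value
    stock = initial
    for _ in range(days):
        stock -= max(0.0, rng.gauss(mean_demand, sd_demand))
    return stock


print(deterministic_stock(500, 20, 10))  # always the same

rng = random.Random(7)  # fixed seed only so the run is reproducible
samples = [stochastic_stock(500, 20, 5, 10, rng) for _ in range(5)]
print([round(s, 1) for s in samples])  # a spread of possible outcomes
```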
3. Static vs. dynamic
A static model does not account for the element of time, while a dynamic model does.
Dynamic models typically are represented with difference equations or differential equations.
4. Discrete vs. Continuous
A discrete model does not take into account the function of time and usually uses time-advance methods, while a continuous model does.

Continuous models typically are represented with f(t), and the changes are reflected over continuous time intervals.
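The sketch below illustrates the discrete/continuous distinction using a balance that grows at a hypothetical 5% rate. It models the growth twice: first as a difference equation advanced period by period, then as the continuous-time function f(t) = x0 * e^(rt).

```python
import math


def discrete_growth(x0, rate, periods):
    # Difference equation x[t+1] = (1 + rate) * x[t], advanced one period at a time
    path = [x0]
    for _ in range(periods):
        path.append(path[-1] * (1 + rate))
    return path


def continuous_growth(x0, rate, t):
    # Continuous-time model f(t) = x0 * e^(rate * t), the solution of dx/dt = rate * x
    return x0 * math.exp(rate * t)


print(discrete_growth(1000.0, 0.05, 3))      # values at t = 0, 1, 2, 3 only
print(continuous_growth(1000.0, 0.05, 2.5))  # defined for any t, e.g. t = 2.5
```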

5. Deductive, inductive, or floating


A deductive model is a logical structure based on a theory. An inductive model arises from empirical findings and generalization from them. The floating model rests on neither theory nor observation, but is merely the invocation of expected structure.
Application of mathematics in social sciences outside of economics has been criticized for unfounded models. Application of catastrophe theory in science has been characterized as a floating model.
Seven Steps of Mathematical Modeling
1. Formulate the Problem.
2. Observe the System.
3. Formulate a Mathematical Model of the Problem.
4. Verify the Model and Use the Model for Prediction.
5. Select a Suitable Alternative.
6. Present the Results and Conclusion of the Study to the Organization.
7. Implement and Evaluate Recommendations.

2.3.2 Characteristics of Mathematical Models
To be used successfully in a typical Management Science (MS) project, a mathematical model must meet the following criteria:
(i) The model should be as simple and understandable as possible.
(ii) The model should be reasonable.
(iii) The model should be easy to maintain and control.
(iv) The model should be adaptive. The parameters and structure of the model should be easy to change as new insights and information evolve.
(v) The model should be complete on important issues, i.e., all important variables and factors should have been taken into consideration.

2.3.3 Develop the Model


a. Formulate the Model

Mathematical Representation : Translate the problem into equations and inequalities.


Objective Function : Maximize or Minimize f(x)
Constraints : g(x) ≤ c
Assumptions: State any simplifying assumptions.
b. Implement Computational Tools
Use software for model implementation, such as:
Python (libraries like NumPy, SciPy, and PuLP)
R (statistical computing)
Business intelligence platforms (e.g., Tableau, Power BI with integrated analytics).
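Here is a minimal sketch of the formulate-then-implement steps, using SciPy's linprog (SciPy is one of the Python tools listed above). The product-mix coefficients are a hypothetical textbook-style example. Because linprog minimizes, maximizing f(x) is expressed as minimizing -f(x). The method="highs" option assumes a reasonably recent SciPy release.

```python
from scipy.optimize import linprog

# Hypothetical product-mix model:
#   maximize    f(x) = 3*x1 + 5*x2
#   subject to  x1 <= 4,  2*x2 <= 12,  3*x1 + 2*x2 <= 18,  x1, x2 >= 0
c = [-3, -5]                        # negated objective for minimization
A_ub = [[1, 0], [0, 2], [3, 2]]     # left-hand sides of g(x) <= c
b_ub = [4, 12, 18]                  # right-hand sides
result = linprog(c, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(0, None), (0, None)], method="highs")

print("status:", result.status)     # 0 means an optimal solution was found
print("x1, x2 =", result.x)
print("maximum f(x) =", -result.fun)
```

The explicit assumptions step corresponds here to treating the coefficients as known constants and the variables as continuous.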

2.3.4 Advantages of Mathematical Models


1. Use of models avoids constructing costly plants and warehouses in locations that do not best meet the present and future needs of the customers.
2. A model indicates gaps that are not immediately apparent, and after testing, the character of a failure might give a clue to the model's deficiencies.
3. Models have the advantage of time, since results can be obtained within a relatively short time.
4. Because of the constant squeeze on profits, the cost and time savings that MS models allow make them decision-making tools of great value to the manager.

2.3.5 Disadvantages of Mathematical Models


1. A model that oversimplifies may inaccurately reflect the real-world situation.
2. If the person who builds a model does not understand the problem or the modeling technique, the output from the model will be incorrect.
3. Models can sometimes prove too expensive to develop when their cost is compared to the expected return from their use.

2.4 Classes of Models
There are various models which are used for making decisions. The various classes of mathematical models are as follows:

1. Risk analysis model
2. Project management model
3. Predictive model
4. Optimisation model
5. Waiting line model
6. Pattern recognition model

Fig. 2.4.1: Classes of Models

1. Risk analysis model


Risk analysis is the process of assessing the likelihood of an adverse event occurring within the corporate, government, or environmental sector.
Risk analysis is the study of the underlying uncertainty of a given course of action and refers to the uncertainty of forecasted cash flow streams, the variance of portfolio/stock returns, the probability of a project's success or failure, and possible future economic states.
Risk analysts often work in tandem with forecasting professionals to minimize future negative unforeseen effects.
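One common way to quantify this uncertainty is Monte Carlo simulation. The sketch below assumes a hypothetical project with normally distributed yearly cash flows and estimates the distribution of its net present value (NPV), including the probability of a loss. The cost, cash-flow and discount-rate figures are illustrative only.

```python
import random
import statistics


def simulate_npv(initial_cost, mean_flow, sd_flow, years, rate, trials=10_000, seed=42):
    """Monte Carlo estimate of the NPV distribution for one project."""
    rng = random.Random(seed)
    npvs = []
    for _ in range(trials):
        npv = -initial_cost
        for t in range(1, years + 1):
            flow = rng.gauss(mean_flow, sd_flow)  # uncertain yearly cash flow
            npv += flow / (1 + rate) ** t         # discount back to today
        npvs.append(npv)
    return npvs


npvs = simulate_npv(initial_cost=1000, mean_flow=300, sd_flow=100, years=5, rate=0.10)
p_loss = sum(v < 0 for v in npvs) / len(npvs)
print(f"mean NPV: {statistics.mean(npvs):.1f}")
print(f"standard deviation: {statistics.stdev(npvs):.1f}")
print(f"probability of loss: {p_loss:.3f}")
```

Setting sd_flow to 0 turns the same code into a deterministic model. This is a quick way to compare the single-point forecast with the simulated spread.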
2. Project management model
Every project is unique, which means we cannot have a standard structure to execute our projects and achieve success in our endeavor. However, to have a good plan we need some kind of framework or structure to follow, depending on the nature of the project.
Project management models or methodologies provide the framework to execute projects. A framework is something that tells you how often you will meet and discuss the progress, how you will document results, how you will communicate and so on.
3. Predictive model
Predictive modeling is a process that uses data mining and probability to forecast outcomes. Each model is made up of a number of predictors, which are variables that are likely to influence future results.
Once data has been collected for the relevant predictors, a statistical model is formulated. The model may employ a simple linear equation, or it may be a complex neural network, mapped out by sophisticated software. As additional data becomes available, the statistical analysis model is validated or revised.
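The fit-then-validate cycle described above can be sketched with a linear equation that has one predictor. The advertising and sales figures, the validation tolerance, and the function names are hypothetical. Real predictive models would typically use a statistics or machine-learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_abs_error(model, xs, ys):
    # Average absolute gap between observed and predicted outcomes
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical history: advertising spend (predictor) vs. sales (outcome)
spend = [10, 20, 30, 40]
sales = [25, 45, 62, 85]
model = fit_line(spend, sales)

# Validate on newly arrived data; refit on all data if the error is too large
new_spend, new_sales = [50, 60], [104, 123]
error = mean_abs_error(model, new_spend, new_sales)
if error > 5.0:  # hypothetical tolerance
    model = fit_line(spend + new_spend, sales + new_sales)
print("model (a, b):", model, "validation error:", error)
```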

4. Optimisation model
The Optimization Model class provides a common API for defining and accessing variables and constraints, as well as other properties of each model. We will now discuss each of these components in more detail.
Types of Optimization Models
Optimization problems can be classified in terms of the nature of the objective function and the nature of the constraints. Special forms of the objective function and the constraints give rise to specialized algorithms that are more efficient. From this point of view, there are four types of optimization problems, of increasing complexity.
An unconstrained optimization problem is an optimization problem where the objective function can be of any kind (linear or nonlinear) and there are no constraints. These types of problems are handled by the classes discussed in the earlier sections.
A linear program is an optimization problem with an objective function that is linear in the variables, and all constraints are also linear. Linear programs are implemented by the Linear Program class.
A quadratic program is an optimization problem with an objective function that is quadratic in the variables (i.e. it may contain squares and cross products of the decision variables), and all constraints are linear. A quadratic program with no squares or cross products in the objective function is a linear program. Quadratic programs are implemented by the Quadratic Program class.
A nonlinear program is an optimization problem with an objective function that is an arbitrary nonlinear function of the decision variables, and the constraints can be linear or nonlinear. Nonlinear programs are implemented by the Nonlinear Program class.
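The class names above come from the optimization library that the original material refers to. As a substitute illustration, the sketch below uses SciPy's scipy.optimize.minimize. It solves the same quadratic objective twice: once with no constraints, and once with a linear constraint added, which makes it a quadratic program. The objective and constraint are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def objective(v):
    # Quadratic in the decision variables: (x - 3)^2 + (y + 1)^2
    x, y = v
    return (x - 3) ** 2 + (y + 1) ** 2


start = np.zeros(2)

# Unconstrained optimization problem: no constraints at all
free = minimize(objective, start, method="BFGS")

# Quadratic program: same objective plus the linear constraint x + y <= 1.
# SciPy's "ineq" constraints are written so that fun(v) >= 0 must hold.
qp = minimize(objective, start, method="SLSQP",
              constraints=[{"type": "ineq", "fun": lambda v: 1 - v[0] - v[1]}])

print("unconstrained optimum:", np.round(free.x, 4))
print("constrained optimum:  ", np.round(qp.x, 4))
```

The unconstrained optimum (3, -1) violates x + y <= 1, so the constrained solution moves to the closest point on the line x + y = 1, which is (2.5, -1.5).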
5. Waiting Line model
There are basically two costs that must be balanced in a waiting line system: the cost of service and the cost of waiting. Note that I am not considering another possible cost component, the cost of a scheduling system. Theoretically, a scheduling system is a management strategy designed to avoid waiting lines (meaning you should never wait in the doctor's office; yeah, right!) and is not covered in this module.
Scheduling systems are useful when the customer is known to the system and the short- and long-run costs of waiting are relatively high. We will study scheduling system applications in linear programming later on in the course.
Operational characteristics of waiting lines include:
1. The probability that no customers (or units) are in the system.
2. The average number of customers in the line.
3. The average number of customers in the system (customers in line plus those being served).
4. The average time a customer spends in the waiting line.
5. The average time a customer spends in the system (waiting time plus time in the service facility).
6. The probability that an arriving customer has to wait for service.
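For the simplest queue, the single-server M/M/1 model, these characteristics have closed-form expressions. The sketch below assumes Poisson arrivals, exponentially distributed service times, one server, and an arrival rate lower than the service rate. The clinic rates in the example are hypothetical.

```python
def mm1_characteristics(arrival_rate, service_rate):
    """Operating characteristics of a single-server M/M/1 queue.

    Assumes Poisson arrivals, exponential service times, one server,
    first-come-first-served, and arrival_rate < service_rate.
    """
    lam, mu = arrival_rate, service_rate
    if lam >= mu:
        raise ValueError("the queue is unstable unless arrival_rate < service_rate")
    rho = lam / mu                          # server utilisation
    return {
        "P0": 1 - rho,                      # 1. probability the system is empty
        "Lq": lam**2 / (mu * (mu - lam)),   # 2. average number waiting in line
        "L": lam / (mu - lam),              # 3. average number in the system
        "Wq": lam / (mu * (mu - lam)),      # 4. average time waiting in line
        "W": 1 / (mu - lam),                # 5. average time in the system
        "Pw": rho,                          # 6. probability an arrival must wait
    }


# Hypothetical clinic: 2 patients arrive per hour, the doctor serves 3 per hour
for name, value in mm1_characteristics(2, 3).items():
    print(f"{name}: {value:.3f}")
```

With these assumed rates, the doctor is idle one third of the time. A patient spends one hour in the system on average, 40 minutes of it waiting in line.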
6. Pattern recognition model
Pattern recognition deals with identifying a pattern and confirming it again. In general, a pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a bar code, or a web page on the Internet.
The individual patterns are often grouped into various categories based on their properties. When patterns with the same properties are grouped together, the resultant group is also a pattern, which is often called a pattern class.
Pattern recognition is the science of observing and distinguishing the patterns of interest, and making correct decisions about the patterns or pattern classes. Thus, a biometric system applies pattern recognition to identify and classify individuals by comparing their features with the stored templates.
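The template-matching idea can be sketched as a nearest-neighbour classifier. A sample is assigned to the enrolled identity whose stored template is closest, and rejected if no template is close enough. The feature vectors, the Euclidean distance measure, and the 0.25 rejection threshold are hypothetical choices for illustration.

```python
import math

# Hypothetical enrolled templates: each person is a short feature vector
templates = {
    "alice": (0.12, 0.80, 0.33),
    "bob": (0.90, 0.10, 0.55),
}


def identify(sample, templates, max_distance=0.25):
    """Return the closest enrolled identity, or None if no template is close enough."""
    best_name, best_dist = None, math.inf
    for name, template in templates.items():
        d = math.dist(sample, template)  # Euclidean distance between feature vectors
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None


print(identify((0.10, 0.78, 0.35), templates))  # close to alice's template
print(identify((0.50, 0.50, 0.50), templates))  # too far from every template
```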
2.5 Data Mining Process
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers to develop more effective marketing strategies, increase sales and decrease costs.
Data mining depends on effective data collection, warehousing and computer processing. Data mining is also
known as data discovery and knowledge discovery.
2.5.1 Data Mining Parameters
In data mining, association rules are created by analysing data for frequent if/then patterns, then using the support and confidence criteria to locate the most important relationships within the data. Support is how frequently the items appear together in the database. Confidence is how often the if/then statement has been found to be true, i.e., the fraction of records containing the "if" items that also contain the "then" items.
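This is a minimal sketch of how support and confidence can be computed over a list of transactions. It also shows how minimum thresholds select the rules worth keeping. The shopping-basket data and the 0.5 support / 0.6 confidence thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical shopping-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    # Of the transactions containing the "if" items, the fraction that
    # also contain the "then" items: support(A and B) / support(A)
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.6  # hypothetical thresholds
items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    s = support({a, b}, transactions)
    c = confidence({a}, {b}, transactions)
    if s >= MIN_SUPPORT and c >= MIN_CONFIDENCE:
        rules.append((a, b))
        print(f"if {a} then {b}: support={s:.2f}, confidence={c:.2f}")
```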

Other data mining parameters include Sequence or Path Analysis, Classification, Clustering and Forecasting.
Sequence or path analysis parameters look for patterns where one event leads to another later event. A sequence is an ordered list of sets of items, and it is a common type of data structure found in many databases.
A classification parameter looks for new patterns, and might result in a change in the way the data is organized. Classification algorithms predict variables based on other factors within the database.
Clustering parameters find and visually document groups of facts that were previously unknown. Clustering groups a set of objects and aggregates them based on how similar they are to each other.
There are different ways a user can implement the cluster, which differentiate between each clustering model.
Forecasting parameters within data mining can discover patterns in data that can lead to reasonable predictions about the future, also known as predictive analysis.
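Clustering by similarity can be illustrated with a one-dimensional k-means sketch. Each value joins the cluster with the nearest centroid, and the centroids are recomputed until they stop changing. The spending figures and starting centroids are hypothetical. K-means is only one of the many possible clustering models mentioned above.

```python
def kmeans_1d(values, centroids, max_iterations=100):
    """Simple k-means for one-dimensional data; returns (centroids, clusters)."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical annual spend of six customers, with two starting centroids
annual_spend = [120, 150, 130, 900, 950, 1000]
centroids, clusters = kmeans_1d(annual_spend, [100, 1000])
print(centroids, clusters)
```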

2.5.2 Data Mining Tools and Techniques


Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing. Data mining techniques are a means to drive efficiencies and predict customer behavior; if they are used correctly, a business can set itself apart from its competition through the use of predictive analysis.
Web mining, a type of data mining used in customer relationship management, integrates information gathered by traditional data mining methods and techniques over the web.
Other data mining techniques include network approaches based on multitask learning for classifying patterns, ensuring parallel and scalable execution of data mining algorithms, the mining of large databases, the handling of relational and complex data types, and machine learning. Machine learning is a type of data mining tool that designs specific algorithms from which to learn and predict.

2.6 Data Mining Architecture
The major components of any data mining system are the data source, data warehouse server, data mining engine, pattern evaluation module, graphical user interface and knowledge base.

Fig. 2.6.1: Data Mining System (layers from top to bottom: Graphical User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; Data Cleaning, Integration and Selection; data sources: Database, Data Warehouse, World Wide Web, Other Data Repositories. A Knowledge Base sits alongside the pattern evaluation and data mining engine layers.)


(a) Data sources

Database, data warehouse, World Wide Web (WWW), text files and other documents are the actual sources of data. Large volumes of historical data are needed for data mining to be successful.
Organizations usually store data in databases or data warehouses. Data warehouses may contain one or
more databases, text files, spreadsheets or other kinds of information repositories. Sometimes, data may
reside even in plain text files or spreadsheets. World Wide Web or the Internet is another big source of data.
Different processes
The data needs to be cleaned, integrated and selected before passing it to the database or data warehouse
server. As the data is from different sources and in different formats, it cannot be used directly for the data
mining process because the data might not be complete and reliable. So, first data needs to be cleaned and
integrated. Again, more data than required will be collected from different data sources and only the data of
interest needs to be selected and passed to the server.
These processes are not as simple as we think. A number of techniques may be performed on the data as
part of cleaning, integration and selection.
(b) Database or Data warehouse server

The database or data warehouse server contains the actual data that is ready to be processed. Hence, the server is responsible for retrieving the relevant data based on the data mining request of the user.

(c) Data mining engine
The data mining engine is the core component of any data mining system. It consists of a number of modules for performing data mining tasks, including association, classification, characterization, clustering, prediction, time-series analysis, etc.
(d) Pattern evaluation modules
The pattern evaluation module is mainly responsible for measuring the interestingness of a pattern by using a threshold value. It interacts with the data mining engine to focus the search towards interesting patterns.
(e) Graphical user interface
The graphical user interface module communicates between the user and the data mining system. This
module helps the user use the system easily and efficiently without knowing the real complexity behind the
process.

When the user specifies a query or a task, this module interacts with the data mining system and displays
the result in an easily understandable manner.
(f) Knowledge base
The knowledge base is helpful in the whole data mining process. It might be useful for guiding the search or
evaluating the interestingness of the result patterns.
The knowledge base might even contain user beliefs and data from user experiences that can be useful in
the process of data mining. The data mining engine might get inputs from the knowledge base to make the
result more accurate and reliable. The pattern evaluation module interacts with the knowledge base on a
regular basis to get inputs and also to update it.

2.6.1 Four Types of Data Mining Architecture


a. No-coupling data mining
b. Loose coupling data mining
c. Semi-tight coupling data mining
d. Tight coupling data mining

Fig. 2.6.2: Types of Data Mining Architecture

(a) No-coupling data mining


In this architecture, the data mining system does not use any functionality of a database. A no-coupling data mining system retrieves data from particular data sources.
The no-coupling data mining architecture does not take any advantage of a database, which is already very efficient in organizing, storing, accessing and retrieving data.

The no-coupling architecture is considered a poor architecture for data mining systems, but it is used for simple data mining processes.

(b) Loose coupling data mining

In this architecture, the data mining system uses a database for data retrieval. In loose coupling data mining architecture, the data mining system retrieves data from a database and stores the results back in that system.
This architecture is suited to memory-based data mining systems that do not require high scalability and high performance.
(c) Semi-tight coupling data mining
In semi-tight coupling, the data mining system uses several features of data warehouse systems to perform some data mining tasks, including sorting, indexing and aggregation. In this architecture, some intermediate results can be stored in the database for better performance.
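The practical difference between loose and semi-tight coupling can be sketched with Python's built-in sqlite3 module. In the loose version, the rows are pulled out and aggregated in application memory before the result is stored back. In the semi-tight version, the aggregation and sorting are pushed into the database engine, and the intermediate result is kept as a table. The sales table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 200.0), ("south", 40.0)],
)

# Loose coupling: the database is only used to fetch rows and store results;
# the mining step (here, a simple aggregation) runs in application memory.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount
conn.execute("CREATE TABLE region_totals_loose (region TEXT, total REAL)")
conn.executemany("INSERT INTO region_totals_loose VALUES (?, ?)", totals.items())

# Semi-tight coupling: aggregation and sorting are pushed into the database
# engine, and the intermediate result is kept as a table for later steps.
conn.execute(
    "CREATE TABLE region_totals_semi AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
ranked = conn.execute(
    "SELECT region, total FROM region_totals_semi ORDER BY total DESC"
).fetchall()
print(totals, ranked)
```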
(d) Tight coupling data mining
In tight coupling, a data warehouse is treated as an information retrieval component. All the features of the database or data warehouse are used to perform data mining tasks.
This architecture provides system scalability, high performance, and integrated information. There are three tiers in the tight-coupling data mining architecture:
i. Data layer
ii. Data mining application layer
iii. Front-end layer

Fig. 2.6.3: Three tiers in the tight-coupling data mining architecture


(i) Data layer
The data layer can be defined as the database or data warehouse systems. This layer is an interface for all data sources. Data mining results are stored in the data layer, so they can be presented to the end user in the form of reports or other kinds of visualization.

(ii) Data mining application layer

This layer retrieves data from the database. Some transformation routines have to be performed here to transform the data into the desired format. The data is then processed using various data mining algorithms.
(iii) Front-end layer
It provides an intuitive and friendly user interface for the end user to interact with the data mining system. Data mining results are presented to the user in visualization form in the front-end layer.

2.6.2 Types of Data Mining Processes
Different data mining processes can be classified into two types: data preparation (data preprocessing) and data mining. In fact, the first four processes, namely data cleaning, data integration, data selection and data transformation, are considered data preparation processes.
The last three processes, namely data mining, pattern evaluation and knowledge representation, are integrated into one process called data mining.

Fig. 2.6.4: Data mining processes (Data preparation: cleaning, integration, selection and transformation turn raw data into cleaned and prepared data. Data mining: mining the prepared data produces patterns, and evaluating the patterns yields knowledge.)

(a) Data cleaning

Data cleaning is the process of detecting and correcting problems in the data. Data in the real world is normally incomplete, noisy and inconsistent.
The data available in data sources might be lacking attribute values, data of interest, etc. For example, suppose you want the demographic data of customers, but the available data does not include attributes for the gender or age of the customers. Then the data is incomplete. Sometimes the data might contain errors or outliers.
An example is an age attribute with the value 200. It is obvious that the age value is wrong in this case. The data could also be inconsistent.
For example, the name of an employee might be stored differently in different data tables or documents. Here, the data is inconsistent. If the data is not clean, the data mining results would be neither reliable nor accurate.

Data cleaning involves a number of techniques, including filling in the missing values manually, combined computer and human inspection, etc. The output of the data cleaning process is adequately cleaned data.
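Here is a minimal sketch of automated cleaning for the examples above. It normalizes inconsistent name spellings and replaces missing or impossible ages. The records are hypothetical. The 0-120 valid age range and the median fill are assumed rules, not the manual or human-inspection techniques the text mentions.

```python
from statistics import median

# Hypothetical raw records with a missing age, an impossible age (200),
# and an inconsistently stored name
records = [
    {"name": "Asha Rao", "age": 34},
    {"name": "asha rao ", "age": None},
    {"name": "Vikram Shah", "age": 200},
    {"name": "Meera Iyer", "age": 29},
]


def clean(records, min_age=0, max_age=120):
    # Assumed rule: ages outside [min_age, max_age] are errors
    valid_ages = [r["age"] for r in records
                  if r["age"] is not None and min_age <= r["age"] <= max_age]
    fill = median(valid_ages)  # assumed rule: replace bad or missing ages with the median
    cleaned = []
    for r in records:
        age = r["age"]
        if age is None or not (min_age <= age <= max_age):
            age = fill
        name = " ".join(r["name"].split()).title()  # collapse spaces, consistent case
        cleaned.append({"name": name, "age": age})
    return cleaned


for row in clean(records):
    print(row)
```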
(b) Data integration
Data integration is the process where data from different data sources are integrated into one. Data lies in different formats in different locations: it could be stored in databases, text files, spreadsheets, documents, data cubes, the Internet and so on. Data integration is a complex and tricky task because data from different sources normally does not match.
Suppose a table A contains an entity named customer_id, whereas another table B contains an entity named cust_number. It is really difficult to ensure whether both these entities refer to the same value or not.
Metadata can be used effectively to reduce errors in the data integration process. Another issue faced is data redundancy. The same data might be available in different tables in the same database or even in different data sources. Data integration tries to reduce redundancy to the maximum possible level without affecting the reliability of data.
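The customer_id / cust_number problem can be sketched in two steps. First, metadata in the form of a synonym mapping renames the attributes to one schema. Second, exact duplicate rows are dropped to reduce redundancy. The tables, the schema_map dictionary, and the integrate function are hypothetical.

```python
# Hypothetical source tables that name the same attribute differently
table_a = [{"customer_id": 1, "city": "Pune"}, {"customer_id": 2, "city": "Nagpur"}]
table_b = [{"cust_number": 2, "city": "Nagpur"}, {"cust_number": 3, "city": "Nashik"}]

# Hypothetical metadata: attribute synonyms mapped to one canonical name
schema_map = {"cust_number": "customer_id"}


def integrate(*tables, schema_map):
    """Merge tables under one schema and drop exact duplicate rows."""
    seen = set()
    merged = []
    for table in tables:
        for row in table:
            unified = {schema_map.get(k, k): v for k, v in row.items()}
            key = tuple(sorted(unified.items()))  # order-independent identity of the row
            if key not in seen:                   # skip redundant copies
                seen.add(key)
                merged.append(unified)
    return merged


for row in integrate(table_a, table_b, schema_map=schema_map):
    print(row)
```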

(c) Data selection

The data mining process requires large volumes of historical data for analysis. So, usually the data repository with integrated data contains much more data than is actually required. From the available data, the data of interest needs to be selected and stored. Data selection is the process where the data relevant to the analysis is retrieved from the database.

(d) Data transformation


Data transformation is the process of transforming and consolidating the data into different forms that are suitable for mining. Data transformation normally involves normalization, aggregation, generalization, etc.
For example, a data set available as "-5, 37, 100, 89, 78" can be transformed as "-0.05, 0.37, 1.00, 0.89, 0.78".
Here the data becomes more suitable for data mining. After data transformation, the available data is ready for data mining.
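The example above can be reproduced by dividing every value by the largest absolute value in the data set (100 here), a form of normalization often called max-abs scaling. This is one plausible reading of the transformation used in the text. The sketch below assumes that reading.

```python
def max_abs_scale(values):
    # Divide every value by the largest absolute value, so results fall in [-1, 1]
    scale = max(abs(v) for v in values)
    if scale == 0:
        return [float(v) for v in values]  # all zeros: nothing to scale
    return [v / scale for v in values]


scaled = max_abs_scale([-5, 37, 100, 89, 78])
print(scaled)
```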

(e) Data mining


Data mining is the core process where a number of complex and intelligent methods are applied to extract patterns from data. The data mining process includes a number of tasks such as association, classification, prediction, clustering, time series analysis and so on.
(f) Pattern evaluation
Pattern evaluation identifies the truly interesting patterns representing knowledge, based on different types of interestingness measures. A pattern is considered interesting if it is potentially useful, easily understandable by humans, validates some hypothesis that someone wants to confirm, or is valid on new data with some degree of certainty.

(g) Knowledge representation


The information mined from the data needs to be presented to the user in an appealing way. Different knowledge
representation and visualization techniques are applied to provide the output of data mining to the users.
Benefits of data mining
1. Data mining techniques help companies obtain knowledge-based information.
2. Data mining helps organizations make profitable adjustments in operation and production.
3. Data mining is a cost-effective and efficient solution compared to other statistical data applications.
4. Data mining helps with the decision-making process.
5. It facilitates automated prediction of trends and behaviors as well as automated discovery of hidden patterns.
6. It can be implemented in new systems as well as existing platforms.
7. It is a speedy process, which makes it easy for users to analyze huge amounts of data in less time.


Disadvantages of data mining

1. There are chances that companies may sell useful information about their customers to other companies for money. For example, American Express has sold credit card purchases of their customers to other companies.
2. Many data mining analytics software packages are difficult to operate and require advanced training to work on.
3. Different data mining tools work in different manners due to the different algorithms employed in their design. Therefore, the selection of the correct data mining tool is a very difficult task.
4. Data mining techniques are not always accurate, and so they can cause serious consequences in certain conditions.

2.7 Analysis Methodologies


Data Mining Applications
Data mining is highly useful in the following domains :
Domain Types

1. Market Analysis and Management

2. Corporate Analysis and Risk Management

3. Fraud Detection

Fig. 2.7.1: Domain Types


Apart from these, data mining can also be used in the areas of production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.
2.7.1 Market Analysis and Management
Listed below are the various fields of market where data mining is used:
Customer Profiling : Data mining helps determine what kind of people buy what kind of products.
Identifying Customer Requirements : Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers.
Cross Market Analysis : Data mining finds associations/correlations between product sales.
Target Marketing : Data mining helps to find clusters of model customers who share the same characteristics such as interests, spending habits, income, etc.
Determining Customer purchasing pattern : Data mining helps in determining customer purchasing patterns.
Providing Summary Information : Data mining provides various multidimensional summary reports.
2.7.2 Corporate Analysis and Risk Management
Data mining is used in the following fields of the Corporate Sector:
Finance Planning and Asset Evaluation : It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
Resource Planning : It involves summarizing and comparing the resources and spending.
Competition : It involves monitoring competitors and market directions.
2.7.3 Fraud Detection
Data mining is also used in the fields of credit card services and telecommunication to detect frauds. In fraud telephone calls, it helps to find the destination of the call, duration of the call, time of the day or week, etc. It also analyzes the patterns that deviate from expected norms.
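One simple way to flag calls that deviate from expected norms is a z-score test on a single attribute such as call duration. This is an assumed technique chosen for illustration, not a method the text prescribes; the durations and the threshold of 2 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def flag_unusual(durations, threshold=2.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(durations)
    sigma = stdev(durations)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, d in enumerate(durations) if abs(d - mu) / sigma > threshold]


# Hypothetical call durations (minutes) for one customer
calls = [3, 4, 2, 5, 3, 4, 3, 60]
print(flag_unusual(calls))  # [7]
```

Because the mean and standard deviation are computed with the anomaly included, one large outlier inflates the spread and can hide smaller ones; median-based statistics are a common, more robust alternative.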

Fig. 2.7.2 : Phases of the CRISP-DM process (business understanding, data understanding, data preparation, modeling, evaluation, deployment)

1. Business understanding
In the business understanding phase :

First, it is required to understand the business objectives clearly and find out what the business needs are.
Next, we have to assess the current situation by finding the resources, assumptions, constraints and other
important factors which should be considered.
Then, from the business objectives and current situations, we need to create data mining goals to achieve
the business objectives within the current situation.
Finally, a good data mining plan has to be established to achieve both business and data mining goals. The
plan should be as detailed as possible.
2. Data understanding
First, the data understanding phase starts with initial data collection from the available data sources, to help us get familiar with the data. Some important activities, including data loading and data integration, must be performed in order to make the data collection successful.
Next, the "gross" or "surface" properties of the acquired data need to be examined carefully and reported. Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting, and visualization.
Finally, the data quality must be examined by answering some important questions such as "Is the acquired data complete?" and "Are there any missing values in the acquired data?"
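The two data quality questions above can be answered programmatically. A minimal sketch, assuming the acquired records are Python dictionaries with `None` marking a missing value; the attribute names and values are hypothetical.

```python
# Hypothetical acquired records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 2, "age": None, "income": 61000},
    {"customer_id": 3, "age": 29, "income": None},
    {"customer_id": 4, "age": 41, "income": 58000},
]


def missing_value_report(rows):
    """Count missing (None) values per attribute across all rows."""
    report = {}
    for row in rows:
        for attribute, value in row.items():
            report.setdefault(attribute, 0)
            if value is None:
                report[attribute] += 1
    return report


report = missing_value_report(records)
complete_rows = sum(1 for r in records if None not in r.values())
print(report)         # {'customer_id': 0, 'age': 1, 'income': 1}
print(complete_rows)  # 2
```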

3. Data preparation
The data preparation phase typically consumes about 90% of the time of the project. The outcome of the data preparation phase is the final data set.
Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. The data exploration task may be carried out at a greater depth during this phase to notice patterns based on business understanding.
4. Modeling
First, the modeling techniques to be used on the prepared dataset have to be selected. Next, test scenarios must be generated to validate the quality and validity of the model. Then, one or more models are created by running the modeling tool on the prepared dataset.
Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
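CRISP-DM does not fix how test scenarios are built. One common choice, assumed here, is a holdout split: part of the prepared data is kept aside so the model can be tested on records it never saw during training. The 25% test fraction, the seed and the function name `holdout_split` are arbitrary illustrative choices.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


dataset = list(range(20))  # stand-in for 20 prepared records
train, test = holdout_split(dataset)
print(len(train), len(test))  # 15 5
```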

5. Evaluation

In the evaluation phase, the model results must be evaluated in the context of the business objectives defined in the first phase. In this phase, new business requirements may be raised due to the new patterns that have been discovered in the model results or due to other factors.
Gaining business understanding is an iterative process in data mining. The go or no-go decision to move to the deployment phase must be made in this step.
6. Deployment
The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization.
In the deployment phase, the plans for deployment, maintenance, and monitoring have to be created for implementation and future support. From the project point of view, the final report of the project needs to summarize the project experiences and review the project to see what needs to be improved, so that lessons learned are recorded.
CRISP-DM offers a uniform framework for experience documentation and guidelines. In addition, CRISP-DM can be applied in various industries with different types of data.

2.8 What is Data Preparation ?


Data preparation (or data pre-processing) in this context means manipulation of data into a form suitable for further analysis and processing. It is a process that involves many different tasks and which cannot be fully automated. Many of the data preparation activities are routine, tedious, and time consuming. It has been estimated that data preparation accounts for 60%-80% of the time spent on a data mining project.
Data preparation is essential for successful data mining. Poor quality data typically result in incorrect and unreliable data mining results. Data preparation improves the quality of data and consequently helps improve the quality of data mining results. The well-known saying "garbage-in garbage-out" is very relevant to this domain.

Fig. 2.8.1 : Data preparation (cleaning, integration, selection and transformation) yielding prepared data, followed by data mining to obtain knowledge patterns

2.8.1 Data Validation


Data validation is about checking the information to ensure that it complements the data needs of the system. This removes the chances of errors. One of the many examples of data validation is a range check.
Data validation has nothing to do with what the user wants to input. Validation is about checking the input data to ensure it conforms to the data requirements of the system to avoid data errors.
An example of this is a range check to avoid an input number that is greater or smaller than the specified range.
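A minimal sketch of the range check described above, assuming the input is an age that must lie between 0 and 120 inclusive (hypothetical bounds). Inputs that are not numbers are rejected as well, since they cannot satisfy any numeric range.

```python
def range_check(value, low, high):
    """Return True if value lies within the inclusive range [low, high]."""
    return low <= value <= high


def validate_ages(raw_inputs, low=0, high=120):
    """Split raw age inputs into accepted and rejected lists."""
    accepted, rejected = [], []
    for item in raw_inputs:
        try:
            age = int(item)
        except (TypeError, ValueError):
            rejected.append(item)  # not a whole number, so reject it
            continue
        if range_check(age, low, high):
            accepted.append(age)
        else:
            rejected.append(age)  # a number, but outside the allowed range
    return accepted, rejected


accepted, rejected = validate_ages(["34", "-2", "150", "abc", "67"])
print(accepted)  # [34, 67]
print(rejected)  # [-2, 150, 'abc']
```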
2.8.2 Data Transformation
In the data transformation process, data are transformed from one format to another format that is more appropriate for data mining.
Some data transformation strategies
Data Transformation
Strategies

1. Smoothing

2. Aggregation

3. Generalization

4. Normalization

5. Attribute Construction

Fig. 2.8.2 : Data Transformation Strategies


1. Smoothing : Smoothing is a process of removing noise from the data.
2. Aggregation : Aggregation is a process where summary or aggregation operations are applied to the data.
3. Generalization : In generalization, low-level data are replaced with high-level data by climbing concept hierarchies.
4. Normalization : Normalization scales attribute data so as to fall within a small specified range, such as 0.0 to 1.0.
5. Attribute Construction : In attribute construction, new attributes are constructed from the given set of attributes.

A database or data warehouse may store terabytes of data, so it may take a very long time to perform data analysis and mining on such huge amounts of data.
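The five transformation strategies listed earlier in this section can be illustrated on small hypothetical data sets. This is a sketch under stated assumptions: smoothing uses equal-frequency bins replaced by their means, aggregation sums daily sales into monthly totals, generalization uses a hand-written city-to-country concept hierarchy, normalization is min-max scaling to 0.0 to 1.0, and the constructed attribute is area = width * height. All data and function names are hypothetical.

```python
from collections import defaultdict


# 1. Smoothing by bin means: split sorted values into equal-frequency bins
#    and replace every value by the mean of its bin.
def smooth_by_bin_means(sorted_values, bin_size):
    smoothed = []
    for start in range(0, len(sorted_values), bin_size):
        bin_values = sorted_values[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# 2. Aggregation: sum daily amounts into monthly totals (key "YYYY-MM").
def aggregate_monthly(daily_rows):
    totals = defaultdict(int)
    for date, amount in daily_rows:
        totals[date[:7]] += amount
    return dict(totals)


# 3. Generalization: replace low-level values by higher-level concepts.
def generalize(values, hierarchy):
    return [hierarchy[v] for v in values]


# 4. Min-max normalization into [new_min, new_max].
def min_max(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# 5. Attribute construction: derive a new attribute from existing ones.
def add_area(rooms):
    return [dict(room, area=room["width"] * room["height"]) for room in rooms]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
print(aggregate_monthly([("2024-01-05", 200), ("2024-01-20", 150), ("2024-02-03", 300)]))
# {'2024-01': 350, '2024-02': 300}
print(generalize(["Pune", "Lyon"], {"Pune": "India", "Mumbai": "India", "Lyon": "France"}))
# ['India', 'France']
print(min_max([200, 300, 400, 600]))
# [0.0, 0.25, 0.5, 1.0]
print(add_area([{"width": 3.0, "height": 4.0}]))
# [{'width': 3.0, 'height': 4.0, 'area': 12.0}]
```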
2.8.3 Data Reduction
Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume but still contains critical information.
Data reduction strategies
Types of Data Reduction Strategies

1. Data Cube Aggregation

2. Dimensionality Reduction

3. Data Compression

4. Numerosity Reduction

5. Discretisation and concept hierarchy generation

Fig. 2.8.3 : Types of data reduction strategies


1. Data cube aggregation : Aggregation operations are applied to the data in the construction of a data cube.
2. Dimensionality reduction : In dimensionality reduction, redundant attributes are detected and removed, which reduces the data set size.
3. Data compression : Encoding mechanisms are used to reduce the data set size.
4. Numerosity reduction : In numerosity reduction, the data are replaced or estimated by alternative, smaller data representations.
5. Discretisation and concept hierarchy generation : Raw data values for attributes are replaced by ranges or higher conceptual levels.
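Two of the strategies above, dimensionality reduction and discretisation, are sketched below on hypothetical data. Redundant attributes are detected here with a Pearson correlation threshold of 0.95, which is an assumed choice; the text does not specify how redundancy is measured. Discretisation maps raw ages to hypothetical range labels.

```python
from bisect import bisect_right
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant attribute; treat as uncorrelated
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep an attribute only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


def discretize(values, cut_points, labels):
    """Replace each raw value by the label of the range it falls into."""
    return [labels[bisect_right(cut_points, v)] for v in values]


columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # same information in another unit
    "weight_kg": [55, 72, 60, 80],
}
print(list(drop_redundant(columns)))  # ['height_cm', 'weight_kg']

ages = [5, 17, 25, 42, 70]
print(discretize(ages, [13, 20, 60], ["child", "teen", "adult", "senior"]))
# ['child', 'teen', 'adult', 'adult', 'senior']
```

Keeping the first attribute of each correlated group is a simple greedy rule; which member of the group to keep is a design choice the text leaves open.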

Review Questions

Q. 1 What are the different types of model? (Refer Section 2.2.1) (5 Marks)
Q. 2 Write short note on structure of mathematical model. (Refer Section 2.3) (5 Marks)
Q. 3 Explain classes of model. (Refer Section 2.4) (5 Marks)
Q. 4 Define Data Mining. (Refer Section 2.5) (2 Marks)
Q. 5 Write short note on Data Mining parameters. (Refer Section 2.5) (5 Marks)
Q. 6 Draw and explain architecture of data mining. (Refer Section 2.6) (5 Marks)
Q. 7 Write various applications of data mining. (Refer Section 2.6) (5 Marks)
Q. 8 Write short note on Corporate Analysis and Risk Management. (Refer Section 2.7.2) (5 Marks)
Q. 9 Write short note on fraud detection. (Refer Section 2.7.3) (5 Marks)
Q. 10 Draw and explain data preparation. (Refer Section 2.8) (5 Marks)
Q. 11 Write note on Data validation. (Refer Section 2.8.1) (5 Marks)
Q. 12 Explain data transformation with suitable diagram. (Refer Section 2.8.2) (5 Marks)
Q. 13 Write short note on data Reduction. (Refer Section 2.8.3) (5 Marks)
