Unit II - DW&DM

Multidimensional Data Model

The multidimensional data model is a method of organizing data in a database so that its contents are well arranged and grouped.
Unlike relational databases, which let users access data through record-level queries, the multidimensional data model lets users ask analytical questions related to market or business trends. Users receive answers to their requests rapidly, because the data can be assembled and examined comparatively fast.
OLAP (online analytical processing) and data warehousing use multidimensional databases, which present multiple dimensions of the data to users.
The model represents data in the form of data cubes. A data cube allows data to be modelled and viewed from many dimensions and perspectives. It is defined by dimensions and facts and is represented by a fact table. Facts are numerical measures, and fact tables contain the measures of the related dimension tables or the names of the facts.
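To make this concrete, here is a minimal Python (pandas) sketch of a fact table with two dimensions (time, item) and one measure (sales); all names and figures are illustrative assumptions, not values from this text:

import pandas as pd

# A minimal fact table: the dimension columns identify a cell of the
# cube, and the fact column holds the numerical measure for that cell.
fact_sales = pd.DataFrame({
    "time":  ["Q1", "Q1", "Q2", "Q2"],                # dimension
    "item":  ["Mobile", "Modem", "Mobile", "Modem"],  # dimension
    "sales": [605, 825, 680, 952],                    # fact (measure)
})

# The same facts viewed from two perspectives of the cube:
print(fact_sales.pivot(index="item", columns="time", values="sales"))
print(fact_sales.groupby("time")["sales"].sum())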

Multidimensional Data Representation

Working of a Multidimensional Data Model

A multidimensional data model is built by following a pre-decided sequence of steps.
Every project should follow these stages when building a multidimensional data model:
Stage 1 : Assembling data from the client : In the first stage, correct data is collected from the client. Software professionals usually explain to the client the range of data that can be handled with the selected technology and then collect the complete data in detail.
Stage 2 : Grouping different segments of the system : In the second stage, all the data is recognized and classified into the respective sections it belongs to, which also makes the model easier to apply step by step.
Stage 3 : Identifying the dimensions : The third stage forms the basis of the system's design. Here the main factors are recognized from the user's point of view. These factors are known as "dimensions".
Stage 4 : Preparing the factors and their respective qualities : In the fourth stage, the factors recognized in the previous step are used to identify their related qualities. These qualities are known as "attributes" in the database.
Stage 5 : Finding the facts among the factors and their qualities : In the fifth stage, the facts (the numerical measures) are separated from the factors collected so far. These facts play a significant role in the arrangement of a multidimensional data model.
Stage 6 : Building the schema to place the data, with respect to the information collected in the steps above : In the sixth stage, a schema is built on the basis of the data collected previously, as sketched below.
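As a hedged illustration of Stage 6, the sketch below creates a simple star schema in Python with sqlite3: one table per dimension (Stage 3), columns for their attributes (Stage 4), and a fact table holding the measures (Stage 5). All table and column names are assumptions made for the example:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_location (loc_id  INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- The fact table holds the numerical measure plus one foreign key
-- per dimension identified in the earlier stages.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    loc_id  INTEGER REFERENCES dim_location(loc_id),
    sales_rupees REAL
);
""")
print("star schema created")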
For example :
1. Consider a firm. Its revenue can be analyzed on the basis of different factors such as the geographical location of the firm's workplaces, the firm's products, the advertisements run, the time taken to develop a product, and so on.

2. Consider the data of a factory that sells products per quarter in Bangalore. The data is represented in the table given below :

[Figure: 2D factory data]

In the table above, the factory's sales for Bangalore are shown along the time dimension, which is organized into quarters, and the item dimension, which is sorted according to the kind of item sold. The facts here are represented in rupees (in thousands).
Now, if we want to view the sales data with a third dimension, location (such as Kolkata, Delhi, Mumbai), the data can still be laid out as a set of two-dimensional tables, one per location, as shown below:
[Figure: 3D data representation as 2D]

This data can conceptually be represented in three dimensions, as shown in the image below:

[Figure: 3D data representation]
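The same 2D-versus-3D layout can be reproduced with a short Python (pandas) sketch. The items, locations, and figures below are placeholders, since the original tables are not reproduced here:

import pandas as pd

# Hypothetical quarterly sales in thousands of rupees.
sales = pd.DataFrame({
    "time":     ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "item":     ["Keyboard", "Keyboard", "Mouse", "Mouse", "Keyboard", "Keyboard"],
    "location": ["Bangalore", "Bangalore", "Bangalore", "Bangalore", "Delhi", "Delhi"],
    "rupees":   [120, 150, 80, 90, 200, 110],
})

# 2D view: one location, items against quarters.
table_2d = sales[sales["location"] == "Bangalore"].pivot(
    index="item", columns="time", values="rupees")

# 3D represented as 2D: one 2D block per location, side by side.
table_3d = sales.pivot_table(index="item", columns=["location", "time"],
                             values="rupees")
print(table_2d)
print(table_3d)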

Advantages of a Multidimensional Data Model

The following are the advantages of a multidimensional data model :

 A multidimensional data model is easy to handle.
 It is easy to maintain.
 Its performance is better than that of normal databases (e.g. relational databases).
 The representation of data is better than in traditional databases, because multidimensional databases are multi-viewed and carry different types of factors.
 It is workable for complex systems and applications, unlike simple one-dimensional database systems.
 Its compatibility is a benefit for projects with limited bandwidth for maintenance staff.

Disadvantages of a Multidimensional Data Model

The following are the disadvantages of a multidimensional data model :

 The multidimensional data model is somewhat complicated in nature, and it requires professionals to recognize and examine the data in the database.
 While a multidimensional data model is at work, a system crash greatly affects its working.
 Owing to its complicated nature, the databases are generally dynamic in design.
 The path to the end product is complicated most of the time.
 Because such systems are complicated and consist of a large number of databases, they are very vulnerable when there is a security breach.
Data Warehouse Implementation

Data warehouse implementation involves the following steps:

1. Requirements analysis and capacity planning: The first process in data warehousing involves defining enterprise needs, defining the architecture, carrying out capacity planning, and selecting the hardware and software tools. This step involves consulting senior management as well as the different stakeholders.

2. Hardware integration: Once the hardware and software have been selected, they need to be put together by integrating the servers, the storage methods, and the user software tools.

3. Modelling: Modelling is a significant stage that involves designing the warehouse schema and views. This may involve using a modelling tool if the data warehouse is sophisticated.

4. Physical modelling: For the data warehouse to perform efficiently, physical modelling is needed. This involves designing the physical data warehouse organization, data placement, data partitioning, deciding on access techniques, and indexing.

5. Sources: The information for the data warehouse is likely to come from several data sources. This step involves identifying and connecting the sources using gateways, ODBC drivers, or other wrappers.

6. ETL: The data from the source systems needs to go through an ETL phase. Designing and implementing the ETL phase may involve selecting a suitable ETL tool vendor and purchasing and implementing the tools. This may include customizing the tool to suit the needs of the enterprise.

7. Populate the data warehouse: Once the ETL tools have been agreed upon, they need to be tested, perhaps using a staging area. Once everything is working adequately, the ETL tools may be used to populate the warehouse given the schema and view definitions.

8. User applications: For the data warehouse to be helpful, there must be end-user applications. This step involves designing and implementing the applications required by the end users.

9. Roll-out the warehouse and applications: Once the data warehouse has been populated and the end-client applications tested, the warehouse system and the applications may be rolled out for the user community to use.
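Steps 6 and 7 can be illustrated with a minimal ETL sketch in Python. The file name, column names, and table layout below are assumptions for the example; a real ETL pipeline would be far more elaborate:

import csv
import sqlite3

def etl(source_csv, warehouse_db):
    conn = sqlite3.connect(warehouse_db)
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(quarter TEXT, item TEXT, city TEXT, sales_rupees REAL)")
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):          # extract
            try:
                amount = float(row["sales"])   # transform: enforce type
            except (KeyError, ValueError):
                continue                       # transform: drop bad records
            conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                         (row["quarter"], row["item"], row["city"], amount))
    conn.commit()                              # load complete
    conn.close()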

Implementation Guidelines

1. Build incrementally: Data warehouses must be built incrementally. Generally, it is recommended that a data mart be created with one particular project in mind; once it is implemented, several other sections of the enterprise may also want to implement similar systems. An enterprise data warehouse can then be implemented in an iterative manner, allowing all data marts to extract information from the data warehouse.

2. Need a champion: A data warehouse project must have a champion who is willing to carry out considerable research into the expected costs and benefits of the project. Data warehousing projects require input from many units in an enterprise and therefore need to be driven by someone who can interact with people across the enterprise and actively persuade colleagues.

3. Senior management support: A data warehouse project must be fully supported by senior management. Given the resource-intensive nature of such projects and the time they can take to implement, a warehouse project calls for a sustained commitment from senior management.

4. Ensure quality: Only data that has been cleaned and is of a quality accepted by the organization should be loaded into the data warehouse.

5. Corporate strategy: A data warehouse project must fit the corporate strategy and business goals. The purpose of the project must be defined before the project begins.

6. Business plan: The financial costs (hardware, software, and peopleware), the expected benefits, and a project plan for a data warehouse project must be clearly outlined and understood by all stakeholders. Without such understanding, rumours about expenditure and benefits can become the only source of information, undermining the project.

7. Training: Data warehouse projects must not overlook training requirements. For a data warehouse project to be successful, the users must be trained to use the warehouse and to understand its capabilities.

8. Adaptability: The project should build in flexibility so that changes can be made to the data warehouse if and when required. Like any system, a data warehouse will need to change as the needs of the enterprise change.

9. Joint management: The project must be managed by both IT and business professionals in the enterprise. To ensure proper communication with stakeholders and that the project is targeted at assisting the enterprise's business, business professionals must be involved in the project along with technical professionals.

Data Warehouse Overview:

The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
An operational database undergoes frequent changes on a daily basis on account of the transactions that take place. Suppose a business executive wants to analyze previous feedback on any data, such as a product, a supplier, or any consumer data; the executive will have no data available to analyze, because the previous data has been updated by transactions.
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools help in interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for data analysis and online analytical processing.
Understanding a Data Warehouse
 A data warehouse is a database, which is kept separate from the organization's operational database.
 There is no frequent updating done in a data warehouse.
 It possesses consolidated historical data, which helps the organization analyze its business.
 A data warehouse helps executives organize, understand, and use their data to take strategic decisions.
 Data warehouse systems help in the integration of a diversity of application systems.
 A data warehouse system helps in consolidated historical data analysis.
Why a Data Warehouse is Separated from Operational Databases
A data warehouse is kept separate from operational databases for the following reasons −
 An operational database is constructed for well-known tasks and workloads such as searching particular records, indexing, etc. In contrast, data warehouse queries are often complex and present a general form of data.
 Operational databases support concurrent processing of multiple transactions. Concurrency control and recovery mechanisms are required for operational databases to ensure robustness and consistency of the database.
 An operational database query allows read and modify operations, while an OLAP query needs only read-only access to stored data.
 An operational database maintains current data. On the other hand, a data warehouse maintains historical data.
Data Warehouse Features
The key features of a data warehouse are discussed below −
 Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations; rather, it focuses on modelling and analysis of data for decision making.
 Integrated − A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
 Time Variant − The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view.
 Non-volatile − Non-volatile means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.
Note − A data warehouse does not require transaction processing, recovery, or concurrency controls, because it is physically stored separately from the operational database.
Data Warehouse Applications
As discussed before, a data warehouse helps business executives to organize, analyze, and use their data for
decision making. A data warehouse serves as a sole part of a plan-execute-assess "closed-loop" feedback
system for the enterprise management. Data warehouses are widely used in the following fields −

 Financial services
 Banking services
 Consumer goods
 Retail sectors
 Controlled manufacturing
Types of Data Warehouse
Information processing, analytical processing, and data mining are the three kinds of data warehouse applications, discussed below −
 Information Processing − A data warehouse allows us to process the data stored in it. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
 Analytical Processing − A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill-down, drill-up, and pivoting.
 Data Mining − Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.

Data Warehousing - OLAP


An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It allows managers and analysts to get insight into the information through fast, consistent, and interactive access to it. This section covers the types of OLAP servers, OLAP operations, and the difference between OLAP and OLTP.
Types of OLAP Servers
We have four types of OLAP servers −

 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers

Relational OLAP
ROLAP servers are placed between a relational back-end server and client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −

 Implementation of aggregation navigation logic.
 Optimization for each DBMS back end.
 Additional tools and services.

Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, storage utilization may be low if the data set is sparse. Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP. HOLAP servers allow storing large volumes of detailed information; the aggregations are stored separately in a MOLAP store.
Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.
OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will discuss OLAP operations in
multidimensional data.
Here is the list of OLAP operations −

 Roll-up
 Drill-down
 Slice and dice
 Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in either of the following ways −
 By climbing up a concept hierarchy for a dimension
 By dimension reduction
The following diagram illustrates how roll-up works.
 Roll-up is performed by climbing up the concept hierarchy for the dimension location.
 Initially the concept hierarchy was "street < city < province < country".
 On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
 The data is grouped into countries rather than cities.
 When roll-up is performed, one or more dimensions are removed from the data cube.
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways −
 By stepping down a concept hierarchy for a dimension
 By introducing a new dimension
The following diagram illustrates how drill-down works −
 Drill-down is performed by stepping down the concept hierarchy for the dimension time.
 Initially the concept hierarchy was "day < month < quarter < year".
 On drilling down, the time dimension is descended from the level of quarter to the level of month.
 When drill-down is performed, one or more dimensions are added to the data cube.
 It navigates the data from less detailed data to highly detailed data.
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.
 Here slice is performed for the dimension "time" using the criterion time = "Q1".
 It forms a new sub-cube over the remaining dimensions.
Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube shown involves three dimensions, based on the following selection criteria −

 (location = "Toronto" or "Vancouver")
 (time = "Q1" or "Q2")
 (item = "Mobile" or "Modem")

Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of the data. Consider the following diagram that shows the pivot operation.
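These operations can be imitated on a toy cube with pandas. The sketch below is illustrative only: the data values are made up, and drill-down is noted only in a comment because it requires data at a finer granularity than this toy cube stores:

import pandas as pd

# A toy cube with dimensions (location, time, item) and measure "sales".
df = pd.DataFrame({
    "location": ["Toronto", "Toronto", "Vancouver", "Vancouver",
                 "Toronto", "Vancouver"],
    "time":     ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "item":     ["Mobile", "Mobile", "Modem", "Modem", "Modem", "Mobile"],
    "sales":    [605, 680, 825, 952, 14, 400],
})

# Roll-up by dimension reduction: aggregate away the item dimension.
rollup = df.groupby(["location", "time"])["sales"].sum()

# Drill-down would go the other way (e.g. quarters to months) and needs
# the more detailed data to be available.

# Slice: fix one dimension with a single criterion (time = "Q1").
slice_q1 = df[df["time"] == "Q1"]

# Dice: select on two or more dimensions at once.
dice = df[df["location"].isin(["Toronto", "Vancouver"])
          & df["time"].isin(["Q1", "Q2"])
          & df["item"].isin(["Mobile", "Modem"])]

# Pivot (rotate): present the same data with the axes exchanged.
pivot = df.pivot_table(values="sales", index="item", columns="location",
                       aggfunc="sum")
print(rollup, slice_q1, dice, pivot, sep="\n\n")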

What is OLAM?

OLAM stands for Online Analytical Mining. It is also known as OLAP mining. It integrates online analytical processing with data mining, mining knowledge in multidimensional databases. There are several paradigms and architectures of data mining systems.
Data mining tools must work on integrated, consistent, and cleaned data, which requires costly pre-processing for data cleaning, data transformation, and data integration. A data warehouse constructed by such pre-processing is therefore a valuable source of high-quality information for both OLAP and data mining. Data mining can in turn serve as a valuable tool for data cleaning and data integration.
OLAM is particularly important for the following reasons −
High quality of data in data warehouses − Most data mining tools need to work on integrated, consistent, and cleaned data, which requires costly data cleaning, data integration, and data transformation as a pre-processing phase. A data warehouse constructed by such pre-processing serves as a valuable source of high-quality data for OLAP and data mining. Data mining can also serve as a valuable tool for data cleaning and data integration.
Available information processing infrastructure surrounding data warehouses − Comprehensive data processing and data analysis infrastructures have been, or will be, systematically constructed surrounding data warehouses. These include the accessing, integration, consolidation, and transformation of multiple heterogeneous databases, ODBC/OLE DB connections, Web accessing and service facilities, and reporting and OLAP analysis tools. It is prudent to make the best use of the available infrastructure instead of constructing everything from scratch.
OLAP-based exploratory data analysis − Effective data mining requires exploratory data analysis. A user will often want to traverse through a database, select portions of relevant data, analyze them at multiple granularities, and present knowledge/results in multiple forms.
Online analytical mining provides facilities for data mining on multiple subsets of data and at several levels of abstraction, by drilling, pivoting, filtering, dicing, and slicing on a data cube and on intermediate data mining results.
On-line selection of data mining functions − Users may not always know the specific type of knowledge they want to mine. By integrating OLAP with multiple data mining functions, online analytical mining gives users the flexibility to select desired data mining functions and swap data mining tasks dynamically.
On-Line Transaction Processing (OLTP) System in DBMS
An On-Line Transaction Processing (OLTP) system is a system that manages transaction-oriented applications. These systems are designed to support online transactions and to process queries quickly over the Internet.
For example, the POS (point of sale) system of any supermarket is an OLTP system.
Every industry today uses OLTP systems to record transactional data. The main concern of OLTP systems is to enter, store, and retrieve data. They cover all day-to-day operations, such as purchasing, manufacturing, payroll, and accounting, of an organization. Such systems have large numbers of users who conduct short transactions. They support simple database queries, so the response time to any user action is very fast.
The data acquired through an OLTP system is stored in a commercial RDBMS, which can be used by an OLAP system for data analytics and other business intelligence operations.
Some other examples of OLTP systems include order entry, retail sales, and financial transaction systems.
Advantages of an OLTP System:
 OLTP systems are user friendly and can be used by anyone having a basic understanding of databases.
 They allow users to perform operations like reading, writing, and deleting data quickly.
 They respond to user actions immediately, as they can process queries very quickly.
 These systems are the original source of the data.
 They help to administer and run fundamental business tasks.
 They help in widening the customer base of an organization by simplifying individual processes.
Challenges of an OLTP System:
 An OLTP system allows multiple users to access and change the same data at the same time, so it requires concurrency control and recovery mechanisms to avoid unexpected situations.
 The data acquired through OLTP systems is not suitable for decision making. OLAP systems are used for decision making or "what if" analysis.
Types of queries that an OLTP system can process:
An OLTP system is an online database-modifying system, so it supports queries that INSERT, UPDATE, and DELETE information in the database, as well as simple look-ups. Consider the POS system of a supermarket; below are sample queries that it can process −
 Retrieve the complete description of a particular product.
 Filter all products related to a particular supplier.
 Search for the record of a particular customer.
 List all products having price less than Rs 1000.
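These sample queries might look as follows in SQL, shown here through Python's sqlite3 module; the products table and its columns are assumptions made for this sketch:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
             "supplier TEXT, price REAL, description TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Soap', 'Acme', 45.0, 'Bath soap')")

# Retrieve the complete description of a particular product.
print(conn.execute("SELECT * FROM products WHERE id = 1").fetchone())

# Filter all products related to a particular supplier.
print(conn.execute("SELECT name FROM products WHERE supplier = 'Acme'").fetchall())

# List all products having price less than Rs 1000.
print(conn.execute("SELECT name, price FROM products WHERE price < 1000").fetchall())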

Difference between a data warehouse (OLAP) and an operational database (OLTP):

Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on Information out. | It focuses on Data in.
5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed is in tens.
12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB.
13 | These are highly flexible. | It provides high performance.

Frequent Pattern Mining in Data Mining


Frequent pattern mining in data mining is the process of identifying patterns or associations within a dataset that occur frequently. It is typically done by analyzing large datasets to find items or sets of items that appear together often.
Several different algorithms are used for frequent pattern mining, including:
1. Apriori algorithm: This is one of the most commonly used algorithms for frequent pattern mining. It uses a "bottom-up" approach to identify frequent itemsets and then generates association rules from those itemsets (a sketch is given below).
2. ECLAT algorithm: This algorithm uses a "depth-first search" approach to identify frequent itemsets. It is particularly efficient for datasets with a large number of items.
3. FP-growth algorithm: This algorithm uses a "compression" technique to find frequent patterns efficiently. It is particularly efficient for datasets with a large number of transactions.
Frequent pattern mining has many applications, such as market basket analysis, recommender systems, fraud detection, and many more.
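The following is a minimal, unoptimized sketch of the Apriori algorithm referenced in item 1 above; the transactions and the min_support value are made-up examples:

from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose relative support >= min_support."""
    n = len(transactions)
    # Count 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation from the items of frequent (k-1)-itemsets.
        items = sorted({i for s in frequent for i in s})
        counts = {}
        for cand in (frozenset(c) for c in combinations(items, k)):
            # Apriori property: prune if any (k-1)-subset is infrequent,
            # otherwise count support by scanning the transactions.
            if all(frozenset(sub) in frequent for sub in combinations(cand, k - 1)):
                counts[cand] = sum(1 for t in transactions if cand <= set(t))
        frequent = {s: c for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        k += 1
    return result

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
print(apriori(transactions, min_support=0.5))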

Advantages:

1. It can find useful information that is not visible through simple data browsing.
2. It can find interesting associations and correlations among data items.

Disadvantages:

1. It can generate a very large number of patterns.
2. With high dimensionality, the number of patterns can be so large that the results become difficult to interpret.

Data mining has different types of patterns, and frequent pattern mining is one of them. The concept was introduced for mining transaction databases. Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear frequently in a database. Frequent pattern mining is an analytical process that finds frequent patterns, associations, or causal structures in various kinds of databases. It aims to find the items that frequently occur together in transactions. From frequent patterns we can identify strongly correlated items and discover similar characteristics and associations among them. Frequent pattern mining can also serve as a starting point for further clustering and association analysis.
Frequent pattern mining can be done using association rules with particular algorithms such as Eclat and Apriori. It searches for recurring relationships in a data set and helps to find the inherent regularities in the data.
Association Rule Mining:
It is easy to derive association rules from frequent patterns:
 for each frequent pattern X, consider each non-empty proper subset Y ⊂ X;
 calculate the support and confidence of the rule Y → (X − Y);
if the confidence is greater than the threshold, keep the rule. Two algorithms that operate on this itemset lattice are
1. the Apriori algorithm
2. the Eclat algorithm
Apriori | Eclat
It performs "perfect" pruning of infrequent item sets. | It reduces memory requirements and is faster.
It requires a lot of memory (all frequent item sets are represented), and support counting takes very long for large transactions, so it is not always efficient in practice. | It stores a transaction (tid) list for each item set.

Two measures, support and confidence, characterize an association rule.

 Support: how often the rule occurs in the database, i.e., the fraction of transactions that contain X ∪ Y.
 Confidence: how often the rule is true in practice, i.e., the conditional probability that a transaction containing X also contains Y.
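Both measures can be computed directly, as in this small Python sketch; the transactions are made up for illustration:

def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, x, y):
    """Conditional probability that a transaction with X also contains Y."""
    return support(transactions, x | y) / support(transactions, x)

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 2/3, about 0.67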

Working principle (a simple point-of-sale application for a supermarket with a good range of products):
 The product data is entered into the database.
 The taxes and commissions are entered.
 A product is purchased and brought to the bill counter.
 The billing operator scans the product with the bar-code machine; the system matches the product in the database and shows its information.
 The bill is paid by the customer, who then receives the products.

Tasks in frequent pattern mining:

 Association
 Cluster analysis: frequent-pattern-based clustering is well suited for high-dimensional data; as the number of dimensions grows, subspace clustering is used.
 Data warehouse: iceberg cubes and cube gradients
 Broad applications
There are some concepts that improve the efficiency of these tasks.
Closed Pattern:

A closed pattern is a frequent pattern (it meets the minimum support criterion) all of whose super-patterns are less frequent than it; that is, no super-pattern has the same support.

Max Pattern:

A max pattern also meets the minimum support criterion (like a closed pattern), but none of its super-patterns is frequent. For example, if {a, b} and {a, b, c} both have support 3, then {a, b} is not closed; and if no superset of a frequent {a, b, c} is frequent, then {a, b, c} is a max pattern. Both kinds of patterns generate fewer patterns overall, and therefore they increase the efficiency of the task.

Applications of Frequent Pattern Mining:

Basket data analysis, cross-marketing, catalog design, sales campaign analysis, web log analysis, and DNA sequence analysis.
Issues of frequent pattern mining:
 flexibility and reusability for creating frequent patterns
 most of the algorithms used for mining frequent item sets do not offer flexibility for reuse
 much research is still needed to reduce the size of the derived patterns
Frequent pattern mining has several applications in different areas, including:
 Market Basket Analysis: This is the process of analyzing customer purchasing patterns in order to identify items that are frequently bought together. This information can be used to optimize product placement, create targeted marketing campaigns, and make other business decisions.
 Recommender Systems: Frequent pattern mining can be used to identify patterns in user behavior and preferences in order to make personalized recommendations.
 Fraud Detection: Frequent pattern mining can be used to identify abnormal patterns of behavior that may indicate fraudulent activity.
 Network Intrusion Detection: Network administrators can use frequent pattern mining to detect patterns of network activity that may indicate a security threat.
 Medical Analysis: Frequent pattern mining can be used to identify patterns in medical data that may indicate a particular disease or condition.
 Text Mining: Frequent pattern mining can be used to identify patterns in text data, such as keywords or phrases that frequently appear together in a document.
 Web Usage Mining: Frequent pattern mining can be used to analyze patterns of user behavior on a website, such as which pages are visited most frequently or which links are clicked most often.
 Gene Expression: Frequent pattern mining can be used to analyze patterns of gene expression in order to identify potential biomarkers for different diseases.
These are a few examples of the applications of frequent pattern mining. The list is not exhaustive, and the technique can be applied in many other areas as well.
Correlation Analysis in Data Mining

Correlation analysis is a statistical method used to measure the strength of the linear relationship between
two variables and compute their association. Correlation analysis calculates the level of change in one
variable due to the change in the other. A high correlation points to a strong relationship between the two
variables, while a low correlation means that the variables are weakly related.

Researchers use correlation analysis to analyze quantitative data collected through research methods like
surveys and live polls for market research. They try to identify relationships, patterns, significant
connections, and trends between two variables or datasets. There is a positive correlation between two
variables when an increase in one variable leads to an increase in the other. On the other hand, a negative
correlation means that when one variable increases, the other decreases and vice-versa.
Correlation is a bivariate analysis that measures the strength of association between two variables and the
direction of the relationship. In terms of the strength of the relationship, the correlation coefficient's value
varies between +1 and -1. A value of ± 1 indicates a perfect degree of association between the two variables.

As the correlation coefficient value goes towards 0, the relationship between the two variables will be
weaker. The coefficient sign indicates the direction of the relationship; a + sign indicates a positive
relationship, and a - sign indicates a negative relationship.
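For instance, the Pearson correlation coefficient can be computed with NumPy as below; the paired observations are hypothetical:

import numpy as np

# Hypothetical paired observations (e.g., advertising spend vs. sales).
x = np.array([10, 20, 30, 40, 50])
y = np.array([12, 24, 33, 46, 55])

# Pearson correlation coefficient r: the covariance of x and y divided by
# the product of their standard deviations; it always lies in [-1, +1].
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))  # close to +1, indicating a strong positive linear relationship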
