Seminar On Data Mining and Data Warehousing Concepts of Second Module Chapter Two

This document covers issues in proximity calculation for data mining and approaches to address them: how to handle attributes with different scales and correlations, and how to calculate proximity between objects with different attribute types. It presents the Mahalanobis distance as a way to account for correlation between attributes, and describes combining similarities for heterogeneous attributes by computing a weighted average of the per-attribute similarities according to each attribute's importance weight. It emphasizes selecting a proximity measure that accounts for differences in attribute scales, types, and weights.

Uploaded by

Ajay C Hiremath
Copyright
© All Rights Reserved

Seminar on Data Mining and Data Warehousing

Concepts of Second Module, Chapter Two

Ajay C H
USN- 2JH17CS004
2.4.6 Issues in Proximity Calculation

1. How to handle the case in which attributes have different scales and/or are correlated?

2. How to calculate proximity between objects that are composed of different types of attributes?

3. How to handle proximity calculation when attributes have different weights?
Standardization and Correlation for Distance Measures

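Mahalanobis Distance Formula:

$$\mathrm{mahalanobis}(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})\,\Sigma^{-1}\,(\mathbf{x} - \mathbf{y})^{T}$$

where $\Sigma^{-1}$ is the inverse of the covariance matrix of the data, and $\Sigma$ is the matrix whose $ij$-th entry is the covariance of the $i$-th and $j$-th attributes.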
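As a rough illustration of the formula, the sketch below computes $(\mathbf{x} - \mathbf{y})\,\Sigma^{-1}\,(\mathbf{x} - \mathbf{y})^{T}$ for two-attribute data using only the Python standard library. The function names and the covariance matrix (unit variances, covariance 0.6) are assumptions made for this example and are not taken from the slides. Many references take the square root of this quantity; the sketch does not.

```python
# Minimal sketch of the Mahalanobis distance for 2-attribute data.
# All names and the sample covariance values are illustrative assumptions.

def covariance_matrix(points):
    """Return the 2x2 sample covariance matrix of a list of (a, b) pairs."""
    n = len(points)
    mean_a = sum(p[0] for p in points) / n
    mean_b = sum(p[1] for p in points) / n
    s_aa = sum((p[0] - mean_a) ** 2 for p in points) / (n - 1)
    s_bb = sum((p[1] - mean_b) ** 2 for p in points) / (n - 1)
    s_ab = sum((p[0] - mean_a) * (p[1] - mean_b) for p in points) / (n - 1)
    return [[s_aa, s_ab], [s_ab, s_bb]]

def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis(x, y, cov):
    """Compute (x - y) * inverse(cov) * (x - y)^T, with no square root."""
    inv = inverse_2x2(cov)
    d0, d1 = x[0] - y[0], x[1] - y[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

def euclidean(x, y):
    """Ordinary Euclidean distance between two 2-attribute points."""
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

# Assumed covariance: unit variances, covariance 0.6 between the attributes.
cov = [[1.0, 0.6], [0.6, 1.0]]

# Two pairs of points with the same Euclidean separation:
# one pair lies along the direction of correlation, the other across it.
along = mahalanobis((2, 2), (-2, -2), cov)
across = mahalanobis((2, -2), (-2, 2), cov)
print(euclidean((2, 2), (-2, -2)), euclidean((2, -2), (-2, 2)))
print(along, across)
```

Both pairs are equally far apart in Euclidean terms, but the pair lying along the direction of correlation is closer under the Mahalanobis measure, because variation in that direction is expected in the data.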
Example 1: In Figure 2.4.1 there are 1000 points whose x and y attributes have a correlation of 0.6. The distance between the two large points at opposite ends of the long axis of the ellipse is 14.7 in terms of Euclidean distance, but only 6 with respect to Mahalanobis distance. In practice, computing the Mahalanobis distance is expensive, but it can be worthwhile for data whose attributes are correlated. If the attributes are relatively uncorrelated but have different ranges, then standardizing the variables is sufficient.
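The last point, standardizing attributes that have different ranges, can be sketched as follows. This is a minimal example assuming z-score standardization (subtract the mean, divide by the sample standard deviation); the slides do not say which standardization is meant, and the age and income values are made up for illustration.

```python
from statistics import mean, stdev

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the sample std dev."""
    m = mean(column)
    s = stdev(column)
    if s == 0:
        # A constant attribute carries no information for distance purposes.
        return [0.0 for _ in column]
    return [(v - m) / s for v in column]

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

# Hypothetical data: age in years and income in currency units.
ages = [25, 35, 45, 55]
incomes = [30000, 50000, 40000, 60000]

z_ages = standardize(ages)
z_incomes = standardize(incomes)

raw = list(zip(ages, incomes))
scaled = list(zip(z_ages, z_incomes))

# On raw values the income attribute dominates the distance;
# after standardization both attributes contribute on a comparable scale.
print(euclidean(raw[0], raw[1]))
print(euclidean(scaled[0], scaled[1]))
```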
 Combining Similarities for Heterogeneous Attributes
Algorithm 2.1: Similarities of heterogeneous objects
1. For the $k$-th attribute, compute a similarity, $s_k(\mathbf{x}, \mathbf{y})$, in the range [0, 1].
2. Define an indicator variable, $\delta_k$, for the $k$-th attribute as follows:
   $\delta_k = 0$ if the $k$-th attribute is an asymmetric attribute and both objects have a value of 0, or if one of the objects has a missing value for the $k$-th attribute;
   $\delta_k = 1$ otherwise.
3. Compute the overall similarity between the two objects using the following formula:
$$\mathrm{similarity}(\mathbf{x}, \mathbf{y}) = \frac{\sum_{k=1}^{n} \delta_k\, s_k(\mathbf{x}, \mathbf{y})}{\sum_{k=1}^{n} \delta_k} \qquad (1)$$
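The three steps of Algorithm 2.1 can be sketched in Python as below. This is an illustration under stated assumptions, not the algorithm's reference form: it assumes the overall similarity is the average of the per-attribute similarities over the attributes whose indicator is 1, represents a missing value as `None`, and uses hypothetical per-attribute similarity functions (`nominal_sim`, `numeric_sim`) chosen only for the example.

```python
def heterogeneous_similarity(x, y, sim_funcs, asymmetric):
    """Combine per-attribute similarities for two objects.

    x, y       : lists of attribute values (None marks a missing value)
    sim_funcs  : one function per attribute, each returning a value in [0, 1]
    asymmetric : one bool per attribute, True if the attribute is asymmetric
    """
    numerator = 0.0
    denominator = 0
    for xk, yk, sim, is_asym in zip(x, y, sim_funcs, asymmetric):
        # Step 2: the indicator is 0 when a value is missing ...
        if xk is None or yk is None:
            continue
        # ... or when an asymmetric attribute is 0 in both objects.
        if is_asym and xk == 0 and yk == 0:
            continue
        # Step 1: per-attribute similarity; indicator is 1 here.
        numerator += sim(xk, yk)
        denominator += 1
    if denominator == 0:
        raise ValueError("no attribute contributes to the similarity")
    # Step 3: average over the contributing attributes.
    return numerator / denominator

# Hypothetical per-attribute similarity functions.
def nominal_sim(a, b):
    """1 if the two nominal values match, else 0."""
    return 1.0 if a == b else 0.0

def numeric_sim(value_range):
    """Build a similarity for a numeric attribute with a known range."""
    return lambda a, b: 1.0 - abs(a - b) / value_range

# Attributes: colour (nominal), flag (asymmetric binary), temperature (numeric).
x = ["red", 0, 20.0]
y = ["red", 0, 30.0]
sims = [nominal_sim, nominal_sim, numeric_sim(40.0)]
asym = [False, True, False]

result = heterogeneous_similarity(x, y, sims, asym)
print(result)
```

In this example the asymmetric flag is 0 in both objects, so it is skipped and the result is the average of the two remaining similarities.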
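The formulas for proximity can be modified by weighting the contribution of each attribute. If the weights $w_k$ sum to 1, then equation (1) becomes
$$\mathrm{similarity}(\mathbf{x}, \mathbf{y}) = \frac{\sum_{k=1}^{n} w_k\, \delta_k\, s_k(\mathbf{x}, \mathbf{y})}{\sum_{k=1}^{n} w_k\, \delta_k} \qquad (2)$$
where $s_k(\mathbf{x}, \mathbf{y})$ is the similarity for the $k$-th attribute and $\delta_k$ is its indicator variable from Algorithm 2.1.
The definition of the Minkowski distance can also be modified as follows:
$$d(\mathbf{x}, \mathbf{y}) = \left( \sum_{k=1}^{n} w_k\, |x_k - y_k|^{r} \right)^{1/r} \qquad (3)$$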
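A minimal sketch of both weighted forms follows. It assumes the weighted similarity is normalized by the sum of $w_k \delta_k$ (some presentations divide by the sum of $\delta_k$ instead), and that the weighted Minkowski distance multiplies each term $|x_k - y_k|^r$ by $w_k$ before taking the $1/r$ power. The function names and sample values are illustrative only.

```python
def weighted_similarity(sims, deltas, weights):
    """Weighted average of per-attribute similarities.

    sims    : per-attribute similarities s_k, each in [0, 1]
    deltas  : indicator values delta_k (0 or 1)
    weights : attribute weights w_k
    """
    numerator = sum(w * d * s for s, d, w in zip(sims, deltas, weights))
    denominator = sum(w * d for d, w in zip(deltas, weights))
    if denominator == 0:
        raise ValueError("no attribute contributes to the similarity")
    return numerator / denominator

def weighted_minkowski(x, y, weights, r):
    """Minkowski distance with each attribute's term scaled by its weight."""
    total = sum(w * abs(a - b) ** r for a, b, w in zip(x, y, weights))
    return total ** (1.0 / r)

# Two attributes, the first three times as important as the second.
print(weighted_similarity([1.0, 0.5], [1, 1], [0.75, 0.25]))

# With all weights equal to 1 and r = 2 this is the Euclidean distance.
print(weighted_minkowski((0, 0), (3, 4), (1, 1), 2))

# With r = 1 it is a weighted city-block distance.
print(weighted_minkowski((0, 0), (3, 4), (0.5, 0.5), 1))
```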
2.4.7 Selecting the Right Proximity Measure
THANK YOU
