Chi Merge

The document discusses discretization, which is the process of converting quantitative (continuous) data into qualitative (categorical) data. It then focuses on ChiMerge, an algorithm that analyzes intervals of quantitative features using chi-square statistics to decide whether adjacent intervals should be merged, based on whether the output classification is independent of the feature intervals. The document provides the basic steps of ChiMerge, which involve sorting the data, defining initial intervals, and repeatedly merging adjacent intervals whose chi-square value is below a threshold.

Uploaded by

Rehman Ali

Discretization

Discretization is the process of converting quantitative (continuous) data into qualitative (categorical) data.
• Some data mining algorithms only accept categorical attributes (LVF, FINCO, Naïve
Bayes).
• The learning process is often less efficient and less effective when the data has only
quantitative features.
ChiMerge
ChiMerge is an automated discretization algorithm that analyzes the quality of multiple
intervals for a given feature by using χ² statistics.
Algorithm
• The algorithm determines similarities between distributions of data in two adjacent
intervals based on output classification of samples.
• If the conclusion of the χ² test is that the output class is independent of the feature's
intervals, then the intervals should be merged; otherwise, it indicates that the difference
between intervals is statistically significant, and no merger will be performed.
Three basic steps of ChiMerge
1. Sort the data for the given feature in ascending order.
2. Define initial intervals so that every value of the feature is in a separate interval.
3. Repeat until no χ² of any two adjacent intervals is less than the threshold value: compute χ² for each pair of adjacent intervals and merge the pair with the lowest χ².
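The three steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `chi2_pair` and `chimerge` and the sample data are hypothetical, intervals are reported as (smallest value, largest value) pairs rather than cut points, and cells whose expected count is zero are simply skipped, which is a simplifying assumption.

```python
from collections import Counter

def chi2_pair(a, b, classes):
    """Chi-square statistic for two adjacent intervals, given their class counts."""
    n = sum(a.values()) + sum(b.values())
    total = 0.0
    for row in (a, b):
        r = sum(row.values())              # samples in this interval
        for c in classes:
            expected = r * (a[c] + b[c]) / n
            if expected > 0:               # assumption: skip empty cells
                total += (row[c] - expected) ** 2 / expected
    return total

def chimerge(values, labels, threshold):
    classes = sorted(set(labels))
    # Steps 1 and 2: sort by value, one interval per distinct value.
    counts = {}
    for v, y in sorted(zip(values, labels), key=lambda p: p[0]):
        counts.setdefault(v, Counter())[y] += 1
    intervals = [((v, v), c) for v, c in counts.items()]
    # Step 3: merge the adjacent pair with the lowest chi-square
    # until no pair falls below the threshold.
    while len(intervals) > 1:
        scores = [chi2_pair(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break
        (lo, _), ca = intervals[i]
        (_, hi), cb = intervals[i + 1]
        intervals[i:i + 2] = [((lo, hi), ca + cb)]
    return [bounds for bounds, _ in intervals]

# Hypothetical data: two well-separated classes.
print(chimerge([1, 2, 3, 10, 11, 12], list("aaabbb"), 2.706))
# [(1, 3), (10, 12)]
```

In this made-up data set every pair of intervals within a class has χ² = 0 and is merged, while the final pair of intervals has χ² = 6, which is above the threshold, so the merging stops with two intervals.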
Formula of χ²

Simply

$$\chi^2 = \sum \frac{(\text{observed value} - \text{expected value})^2}{\text{expected value}}$$
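For two adjacent intervals, the sum runs over every combination of interval and class. The commonly used form for ChiMerge is given below; the symbols A, E, R, C, N and k are notation introduced here for illustration and do not appear in the text above.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
% A_{ij}: observed number of samples in interval i that belong to class j
% E_{ij}: expected number of samples in interval i, class j, under independence
% R_i:    number of samples in interval i
% C_j:    number of samples of class j in the two intervals combined
% N:      total number of samples in the two intervals
% k:      number of classes
```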
Example of ChiMerge

Suppose we have data on a sorted continuous feature F with corresponding classes K.

Step 1
Sort the values of the attribute you want to discretize.

Step 2
Start with every unique value of the attribute in its own interval.
Step 3
Begin calculating the chi-square test for every pair of adjacent intervals.

Applying the formula
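As an illustration with made-up counts (not the data of this example), take two adjacent intervals and two classes, where the first interval holds one sample of class 1 and the second holds one sample of class 2. Each interval total and each class total is 1, the overall total is 2, and so every expected value is (1 × 1) / 2 = 0.5:

```latex
\chi^2
= \frac{(1 - 0.5)^2}{0.5} + \frac{(0 - 0.5)^2}{0.5}
+ \frac{(0 - 0.5)^2}{0.5} + \frac{(1 - 0.5)^2}{0.5}
= 0.5 + 0.5 + 0.5 + 0.5
= 2
```

This value is then compared with the threshold to decide whether the two intervals are merged.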

With a threshold (significance level) of 0.1 and df = 1, the chi-square distribution table gives a critical value of 2.706, so merge if χ² < 2.706.
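The table value can be checked without a chart. The sketch below relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so it applies only to df = 1 (two adjacent intervals and two classes). The function name `chi2_critical_df1` is hypothetical.

```python
from statistics import NormalDist

def chi2_critical_df1(alpha):
    """Critical value of the chi-square distribution with 1 degree of freedom."""
    # P(Z^2 > c) = alpha  is equivalent to  P(|Z| > sqrt(c)) = alpha
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

print(round(chi2_critical_df1(0.1), 3))  # 2.706
```

For other degrees of freedom a chi-square table or a statistics library is needed.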
Step 4
Apply the same calculation to all remaining pairs of adjacent intervals.
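Step 6
Merge the pair of adjacent intervals with the smallest χ² value.

Step 7
Repeat until no more intervals can be merged, that is, until no pair of adjacent intervals has a χ² value below the threshold.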
