On Applications of Rough Sets Theory To Knowledge Discovery: Frida Coaquira
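For $X \subseteq U$, the upper approximation is the union of the similarity classes of its elements:
$$\overline{SIM_T}(X) = \bigcup_{x \in X} SIM_T(x)$$
The decision classes are $\{X_i : i = 1, \ldots, r(d)\}$, with
$$X_i = \{x \in U : d(x) = i\}$$
The positive region is the union of the lower approximations of the decision classes:
$$POS(SIM_T, \{d\}) = \bigcup_{i=1}^{r(d)} \underline{SIM_T}(X_i)$$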
Similarity measures
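Similarity for numeric attribute
$$S_a(v_i, v_j) = 1 - \frac{|v_i - v_j|}{a_{max} - a_{min}}$$
A threshold-based alternative:
$$|v_i - v_j| \le \alpha_a v_j + \beta_a$$
$\alpha_a$, $\beta_a$ are parameters; this measure is not symmetric.
Similarity for nominal attribute
$$S_a(v_i, v_j) = \begin{cases} 1 & \text{if } v_i = v_j \\ 0 & \text{otherwise} \end{cases}$$
A second measure $S_a(v_i, v_j)$ for nominal attributes is a sum over the decision classes $k = 1, \ldots, r(d)$ of terms built from $P(d = k, a = v_i)$, $P(d = k, a = v_j)$ and $r(d) \cdot P(d = k)$.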
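The numeric and nominal similarity measures on this slide can be sketched in a few lines. This is only an illustration: the function names and the sample values are hypothetical, and the threshold test assumes the form |v_i - v_j| <= alpha * v_j + beta.

```python
def sim_numeric(vi, vj, a_min, a_max):
    """Symmetric similarity 1 - |vi - vj| / (a_max - a_min) for a numeric attribute."""
    return 1.0 - abs(vi - vj) / (a_max - a_min)


def similar_threshold(vi, vj, alpha, beta):
    """Threshold test |vi - vj| <= alpha * vj + beta (assumed form).

    The bound depends on vj only, so swapping the arguments can change the answer.
    """
    return abs(vi - vj) <= alpha * vj + beta


def sim_nominal(vi, vj):
    """Similarity for a nominal attribute: 1 if the values are equal, 0 otherwise."""
    return 1 if vi == vj else 0


# Hypothetical values illustrating the asymmetry of the threshold test.
forward = similar_threshold(6, 10, alpha=0.5, beta=0)   # |6 - 10| <= 5
backward = similar_threshold(10, 6, alpha=0.5, beta=0)  # |10 - 6| <= 3
```

With these values `forward` is true and `backward` is false, which is the sense in which the threshold measure is not symmetric.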
Quality of approximation of classification
$$\gamma(SIM_T, \{d\}) = \frac{Card(POS(SIM_T, \{d\}))}{Card(U)}$$
This is the ratio of all correctly classified objects to all objects.
Relative Reduct
$R \subseteq A$ is a relative reduct for $SIM_A$ and $\{d\}$ iff
1) $POS(SIM_R, \{d\}) = POS(SIM_A, \{d\})$
2) for every proper subset $R' \subset R$, condition 1) is not true.
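A minimal sketch of the positive region and the quality of approximation under a similarity relation follows. The toy table, the attribute names `a1`, `a2`, and the tolerance-based similarity are all assumptions made for illustration. An object is taken to be in the positive region when its whole similarity class shares its decision value, which is the lower-approximation reading for a reflexive similarity.

```python
# Hypothetical toy data: two numeric condition attributes and a decision "d".
TABLE = {
    "u1": {"a1": 366, "a2": 70, "d": "ok"},
    "u2": {"a1": 369, "a2": 72, "d": "ok"},
    "u3": {"a1": 385, "a2": 95, "d": "ill"},
    "u4": {"a1": 371, "a2": 74, "d": "ill"},
}
TOLERANCE = {"a1": 5, "a2": 5}  # assumed per-attribute tolerances


def similar(attr, v, w):
    """Assumed similarity: values differ by at most the attribute's tolerance."""
    return abs(v - w) <= TOLERANCE[attr]


def sim_class(table, x, attrs):
    """SIM_T(x): objects similar to x on every attribute in attrs."""
    return {y for y in table
            if all(similar(a, table[x][a], table[y][a]) for a in attrs)}


def positive_region(table, attrs, decision="d"):
    """Objects whose entire similarity class has the same decision value."""
    return {x for x in table
            if all(table[y][decision] == table[x][decision]
                   for y in sim_class(table, x, attrs))}


def quality(table, attrs, decision="d"):
    """Card(POS) / Card(U)."""
    return len(positive_region(table, attrs, decision)) / len(table)


pos_all = positive_region(TABLE, ["a1", "a2"])
pos_a1 = positive_region(TABLE, ["a1"])
```

Here `u1`, `u2` and `u4` are mutually similar but carry different decisions, so only `u3` is in the positive region. The subset `["a1"]` yields the same positive region as both attributes together, so it satisfies condition 1) of the relative reduct definition on this toy table.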
Attribute Reduction
The purpose is to select a subset of attributes from the original
set of attributes to use in the rest of the process.
Selection criteria: Reduct concept description.
A reduct is the essential part of the knowledge, which defines
all basic concepts.
Other methods are:
Discernibility matrix (n × n)
Generating all combinations of attributes and then evaluating
the classification power or dependency coefficient
(complete search).
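The complete search can be sketched as follows, using the five-object table from the Decision rules slide. This is an illustrative sketch, not the author's implementation: it assumes the classical indiscernibility relation (equal attribute values) and takes the dependency coefficient to be the fraction of objects whose condition values determine the decision uniquely. The function names are hypothetical.

```python
from itertools import combinations

TABLE = {
    "O1": {"F1": 0, "F2": 1, "F3": 0, "F4": 2, "D": "L"},
    "O2": {"F1": 1, "F2": 1, "F3": 0, "F4": 2, "D": "H"},
    "O3": {"F1": 0, "F2": 0, "F3": 0, "F4": 1, "D": "L"},
    "O4": {"F1": 0, "F2": 1, "F3": 1, "F4": 0, "D": "M"},
    "O5": {"F1": 0, "F2": 0, "F3": 1, "F4": 3, "D": "L"},
}
FEATURES = ["F1", "F2", "F3", "F4"]


def dependency(table, attrs, decision="D"):
    """Fraction of objects whose values on attrs determine the decision uniquely."""
    decisions_by_key = {}
    for row in table.values():
        key = tuple(row[a] for a in attrs)
        decisions_by_key.setdefault(key, set()).add(row[decision])
    consistent = sum(
        1 for row in table.values()
        if len(decisions_by_key[tuple(row[a] for a in attrs)]) == 1
    )
    return consistent / len(table)


def find_reducts(table, attrs, decision="D"):
    """Enumerate subsets by increasing size; keep minimal ones preserving dependency."""
    full = dependency(table, attrs, decision)
    reducts = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if dependency(table, subset, decision) != full:
                continue
            # A subset containing an already found reduct is not minimal.
            if not any(set(r) <= set(subset) for r in reducts):
                reducts.append(subset)
    return reducts


reducts = find_reducts(TABLE, FEATURES)
```

On this table the search returns two reducts, `("F1", "F4")` and `("F1", "F2", "F3")`. The number of subsets grows exponentially with the number of attributes, which is why complete search is practical only for small attribute sets.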
Discretization Methods
The purpose is to develop an algorithm that finds a
consistent set of cut points which minimizes the number of
regions that are consistent.
Discretization methods based on rough set theory try to find
these cut points.
Given a set S of points P1, ..., Pn in the plane $\mathbb{R}^2$, partitioned into
two disjoint categories S1, S2, and a natural number T:
is there a consistent set of lines such that the partition of the
plane into regions defined by them consists of at most T
regions?
Consistent
Def. A set of cuts P is consistent with A (or A-consistent) iff
$\partial_A = \partial_{A^P}$, where $\partial_A$ and $\partial_{A^P}$ are the general decisions of A and $A^P$,
respectively.
Def. A set $P_{irr}$ of cuts is A-irreducible iff $P_{irr}$ is A-consistent
and any proper subfamily $P' \subset P_{irr}$ is not
A-consistent.
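The two definitions can be checked mechanically on a small example. The sketch below assumes axis-parallel cuts (one sorted list of cut values per attribute) and represents the general decision of an object as the set of decisions found among objects with the same description. The data and function names are hypothetical.

```python
from bisect import bisect_right


def generalized_decision(objects):
    """For each object, the set of decisions among objects with the same description."""
    by_desc = {}
    for desc, dec in objects:
        by_desc.setdefault(desc, set()).add(dec)
    return [frozenset(by_desc[desc]) for desc, _ in objects]


def apply_cuts(objects, cuts):
    """Replace each value by the index of its interval; cuts[i] must be sorted."""
    return [(tuple(bisect_right(cuts[i], v) for i, v in enumerate(desc)), dec)
            for desc, dec in objects]


def is_consistent(objects, cuts):
    """A-consistent: discretization leaves every general decision unchanged."""
    return generalized_decision(objects) == generalized_decision(apply_cuts(objects, cuts))


def is_irreducible(objects, cuts):
    """A-irreducible: consistent, and dropping any single cut breaks consistency.

    Checking single removals is enough because adding cuts to a consistent
    set can only refine the partition, so it stays consistent.
    """
    if not is_consistent(objects, cuts):
        return False
    for i, points in enumerate(cuts):
        for p in points:
            reduced = [[q for q in pts if (j, q) != (i, p)]
                       for j, pts in enumerate(cuts)]
            if is_consistent(objects, reduced):
                return False
    return True


# Four points in the plane in two categories, arranged like XOR.
POINTS = [((1, 1), 0), ((1, 3), 1), ((3, 1), 1), ((3, 3), 0)]
```

For these points one cut at 2 on each axis is consistent and irreducible, a single cut on the first axis alone is not consistent, and adding a redundant cut at 4 keeps consistency but loses irreducibility.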
Level of Inconsistency
Let B be a subset of A and
$$L_c = \frac{\sum_{i=1}^{n} |\underline{B} X_i|}{|U|}$$
where $\{X_i\}$, $i = 1, 2, \ldots, n$, is a classification of U:
$$X_i \cap X_j = \emptyset \ (i \ne j), \qquad \bigcup_{i} X_i = U$$
$L_c$ represents the percentage of instances which can be
correctly classified into class $X_i$ with respect to subset B.
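As a worked illustration, take the five-object table from the Decision rules slide and, as an assumption made only for this example, B = {F4}. The decision classes are X_L = {O1, O3, O5}, X_M = {O4}, X_H = {O2}, and the B-indiscernibility classes are {O3}, {O5}, {O1, O2}, {O4}. Assuming L_c sums the B-lower approximations over all classes:

```latex
\underline{B}X_L = \{O3, O5\}, \qquad
\underline{B}X_M = \{O4\}, \qquad
\underline{B}X_H = \emptyset

L_c = \frac{|\underline{B}X_L| + |\underline{B}X_M| + |\underline{B}X_H|}{|U|}
    = \frac{2 + 1 + 0}{5} = 0.6
```

O1 and O2 share F4 = 2 but have different decisions, so neither can be classified with certainty from F4 alone.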
Imputation Data
The rules of the system should have maximum consistency.
The relevant attributes for x are defined by
$$rel_R(x) = \{a \in R : a(x) \text{ is defined}\}$$
and the relation $R_c$ by
$$x R_c y \iff a(x) = a(y) \text{ for all } a \in rel_R(x) \cap rel_R(y)$$
x and y are consistent if $x R_c y$.
Example
Let x = (1,3,?,4), y = (2,?,5,4) and z = (1,?,5,4).
x and z are consistent: $x R_c z$.
x and y are not consistent: $x R_c y$ does not hold.
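The consistency relation for incomplete objects can be sketched directly from the definition. Missing values are represented here by `None`, and the function names are hypothetical.

```python
MISSING = None  # stands for the "?" entries in the example


def rel(x):
    """Indices of the attributes on which x is defined."""
    return {i for i, v in enumerate(x) if v is not MISSING}


def consistent(x, y):
    """x R_c y: x and y agree on every attribute defined for both."""
    return all(x[i] == y[i] for i in rel(x) & rel(y))


x = (1, 3, MISSING, 4)
y = (2, MISSING, 5, 4)
z = (1, MISSING, 5, 4)
```

Only the first and last attributes are defined for both x and z, and they agree there, so `consistent(x, z)` holds. x and y differ on the first attribute, so `consistent(x, y)` does not.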
Decision rules
Object  F1  F2  F3  F4  D  Rule
O3      0   0   0   1   L  R1
O5      0   0   1   3   L  R1
O1      0   1   0   2   L  R2
O4      0   1   1   0   M  R3
O2      1   1   0   2   H  R4
Rule1 if (F2=0) then (D=L)
Rule2 if (F1=0) then (D=L)
Rule3 if (F4=0) then (D=M)
Rule4 if (F1=1) then (D=H)
The algorithm should minimize the number of features
included in decision rules.
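One way to illustrate the goal of using few features per rule is a brute-force search for the shortest consistent rule covering a given object. This is a sketch under stated assumptions, not the algorithm behind the slide: a rule is taken to be consistent when every object matching its conditions has the rule's decision, and candidate condition sets are tried in order of increasing size. The function names are hypothetical.

```python
from itertools import combinations

FEATURES = ["F1", "F2", "F3", "F4"]
TABLE = {
    "O3": {"F1": 0, "F2": 0, "F3": 0, "F4": 1, "D": "L"},
    "O5": {"F1": 0, "F2": 0, "F3": 1, "F4": 3, "D": "L"},
    "O1": {"F1": 0, "F2": 1, "F3": 0, "F4": 2, "D": "L"},
    "O4": {"F1": 0, "F2": 1, "F3": 1, "F4": 0, "D": "M"},
    "O2": {"F1": 1, "F2": 1, "F3": 0, "F4": 2, "D": "H"},
}


def is_consistent(conditions, decision):
    """True if every object matching all conditions has the given decision."""
    covered = [row for row in TABLE.values()
               if all(row[f] == v for f, v in conditions.items())]
    return all(row["D"] == decision for row in covered)


def shortest_rule(obj):
    """Smallest set of feature tests, taken from obj's own values, that is consistent."""
    row = TABLE[obj]
    for size in range(1, len(FEATURES) + 1):
        for feats in combinations(FEATURES, size):
            conditions = {f: row[f] for f in feats}
            if is_consistent(conditions, row["D"]):
                return conditions, row["D"]
    return None


rules = {obj: shortest_rule(obj) for obj in TABLE}
```

On this table the search finds single-feature rules for O3 and O5 (F2=0), O4 (F4=0) and O2 (F1=1), while O1 needs two features (F1=0 and F3=0) because every single test on O1's values also covers an object with a different decision.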
THANK YOU!