
On Applications of Rough Sets Theory to Knowledge Discovery
Frida Coaquira
UNIVERSITY OF PUERTO RICO, MAYAGÜEZ CAMPUS
[email protected]
Introduction
One goal of Knowledge Discovery is to extract meaningful knowledge.
Rough Sets theory was introduced by Z. Pawlak (1982) as a mathematical tool for data analysis.
Rough sets have many applications in the field of Knowledge Discovery: feature selection, the discretization process, data imputation, and the creation of decision rules.
Rough sets have also been introduced as a tool to deal with uncertain knowledge in Artificial Intelligence applications.
Equivalence Relation
Let X be a set and let x, y, and z be elements of X. An equivalence relation R on X is a relation on X such that:
Reflexive Property: xRx for all x in X.
Symmetric Property: if xRy, then yRx.
Transitive Property: if xRy and yRz, then xRz.
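As a quick illustration (not from the original slides), a minimal Python sketch can check the three properties for a finite relation given as a set of pairs:

```python
# Minimal sketch: test the three equivalence-relation properties for a
# finite relation R on a set X, with R given as a set of (x, y) pairs.
def is_equivalence_relation(X, R):
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R
                     for (x, y) in R for (z, w) in R if y == z)
    return reflexive and symmetric and transitive

# Example: congruence mod 2 on {0, 1, 2, 3} is an equivalence relation.
X = {0, 1, 2, 3}
R = {(x, y) for x in X for y in X if x % 2 == y % 2}
print(is_equivalence_relation(X, R))  # True
```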
Rough Sets Theory
Let T = (U, A, C, D) be a decision system, where U is a non-empty finite set called the universe and A is a non-empty finite set of attributes; C and D are subsets of A, the condition and decision attribute subsets respectively.
Each attribute a ∈ A is a function a : U → V_a, where V_a is called the value set of a.
The elements of U are objects, cases, states, or observations. The attributes are interpreted as features, variables, characteristics, conditions, etc.
Indiscernibility Relation
Let P ⊆ A. The indiscernibility relation IND(P) is defined as follows:
IND(P) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ P}
IND(P) is an equivalence relation.
Indiscernibility Relation
The indiscernibility relation defines a partition of U.
For P ⊆ A, U/IND(P) denotes the family of all equivalence classes of the relation IND(P), called elementary sets.
Two other families of equivalence classes, U/IND(C) and U/IND(D), called the condition and decision equivalence classes respectively, can also be defined.
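As a sketch (the table and attribute names here are hypothetical), U/IND(P) can be computed by grouping objects with equal value signatures on P:

```python
# Sketch: compute the partition U/IND(P) of a decision table.
from collections import defaultdict

def partition(table, P):
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in P)].add(obj)
    return list(classes.values())

table = {'x1': {'a1': 1, 'a2': 0},
         'x2': {'a1': 1, 'a2': 0},
         'x3': {'a1': 1, 'a2': 2}}
print(partition(table, ['a2']))  # [{'x1', 'x2'}, {'x3'}]
```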
R-lower approximation
Let X ⊆ U and R ⊆ C, where R is a subset of condition features. The R-lower approximation of X is the set of all elements of U which can be classified with certainty as elements of X:
R_*(X) = ∪{ Y ∈ U/IND(R) : Y ⊆ X }
The R-lower approximation of X is a subset of X.
R-upper approximation
The R-upper approximation of X is the set of all elements of U whose equivalence classes intersect X:
R^*(X) = ∪{ Y ∈ U/IND(R) : Y ∩ X ≠ ∅ }
X is a subset of its R-upper approximation, which contains all objects that can possibly be classified as belonging to X.
The R-boundary set of X is defined as:
BN_R(X) = R^*(X) − R_*(X)
Representation of the approximation sets
If R_*(X) = R^*(X), then X is R-definable (the boundary set is empty).
If R_*(X) ≠ R^*(X), then X is rough with respect to R.
ACCURACY := Card(R_*(X)) / Card(R^*(X))
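The two approximations, the boundary, and the accuracy measure translate directly into set operations over the partition U/IND(R); a minimal sketch:

```python
# Sketch: approximations of X from a partition U/IND(R) (a list of sets).
def lower(partition, X):
    return {x for Y in partition if Y <= X for x in Y}

def upper(partition, X):
    return {x for Y in partition if Y & X for x in Y}

def boundary(partition, X):
    return upper(partition, X) - lower(partition, X)

def accuracy(partition, X):
    up = upper(partition, X)
    return len(lower(partition, X)) / len(up) if up else 1.0
```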
Decision Class
The decision d determines the partition CLASS_T(d) = {X_1, ..., X_{r(d)}} of the universe U, where X_k = {x ∈ U : d(x) = k} for 1 ≤ k ≤ r(d).
CLASS_T(d) will be called the classification of objects in T determined by the decision d.
The set X_k is called the k-th decision class of T.
Decision Class
This decision system has 3 classes. We represent the partition: lower approximation, upper approximation, and boundary set.
Rough Sets Theory
Let us consider U = {x1, x2, x3, x4, x5, x6, x7, x8} and the equivalence relation R with the equivalence classes X1 = {x1, x3, x5}, X2 = {x2, x4} and X3 = {x6, x7, x8}, which form a partition.
Let the classification C = {Y1, Y2, Y3} be such that Y1 = {x1, x2, x4}, Y2 = {x3, x5, x8}, Y3 = {x6, x7}.
Only Y1 has a non-empty lower approximation: R_*(Y1) = X2.
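The example can be checked with the same set comprehension used in the approximation sketch above:

```python
# Check: the R-lower approximation of Y1 is X2.
partition = [{'x1', 'x3', 'x5'}, {'x2', 'x4'}, {'x6', 'x7', 'x8'}]
Y1 = {'x1', 'x2', 'x4'}
lower_Y1 = {x for Y in partition if Y <= Y1 for x in Y}
print(lower_Y1)  # {'x2', 'x4'}, i.e. X2
```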
Positive region and Reduct
Positive region: POS_R(d), the positive region of the classification CLASS_T(d), is equal to the union of the lower approximations of all decision classes.
Reducts are defined as minimal subsets of condition attributes which preserve the positive region defined by the set of all condition attributes, i.e. a subset R ⊆ C is a relative reduct iff
1) POS_R(D) = POS_C(D),
2) for every proper subset R' ⊂ R, condition 1 is not true.
Dependency coefficient
The dependency coefficient is a measure of association between condition attributes A and a decision attribute d, defined by the formula:
γ(A, d) = Card(POS_A(d)) / Card(U)
where Card represents the cardinality of a set.
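A sketch of the formula on toy data (table and names hypothetical): a condition class belongs to the positive region exactly when all of its objects share one decision value.

```python
# Sketch: positive region POS_A(d) and dependency coefficient gamma(A, d).
from collections import defaultdict

def gamma(table, A, d):
    classes = defaultdict(set)            # U/IND(A)
    for obj, row in table.items():
        classes[tuple(row[a] for a in A)].add(obj)
    pos = {x for Y in classes.values()    # POS_A(d)
           if len({table[o][d] for o in Y}) == 1 for x in Y}
    return len(pos) / len(table)

table = {'x1': {'a1': 0, 'd': 'L'},
         'x2': {'a1': 0, 'd': 'L'},
         'x3': {'a1': 1, 'd': 'H'}}
print(gamma(table, ['a1'], 'd'))  # 1.0: a1 fully determines d
```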
Discernibility matrix
Let U = {x1, x2, x3, ..., xn} be the universe of a decision system. The discernibility matrix is defined by:
m_ij = {a ∈ C : a(x_i) ≠ a(x_j) and d(x_i) ≠ d(x_j) for some d ∈ D},  i, j = 1, 2, ..., n
where m_ij is the set of all attributes that classify objects x_i and x_j into different decision classes in the U/D partition.
CORE(C) = {a ∈ C : m_ij = {a} for some i, j}
Dispensable feature
Let R be a family of equivalence relations and let P ∈ R. P is dispensable in R if IND(R) = IND(R − {P}); otherwise P is indispensable in R.
CORE
The set of all indispensable relations in C will be called the core of C:
CORE(C) = ∩ RED(C), where RED(C) is the family of all reducts of C.
Small Example
Let U = {x1, x2, x3, x4, x5, x6, x7} be the universe set, C = {a1, a2, a3, a4} the condition feature set, and D = {d} the decision feature set.

      a1  a2  a3  a4   d
x1     1   0   2   1   1
x2     1   0   2   0   1
x3     1   2   0   0   2
x4     1   2   2   1   0
x5     2   1   0   0   2
x6     2   1   1   0   2
x7     2   1   2   1   1
Discernibility Matrix

      x1             x2            x3             x4             x5         x6
x2    -
x3    {a2,a3,a4}     {a2,a3}
x4    {a2}           {a2,a4}       {a3,a4}
x5    {a1,a2,a3,a4}  {a1,a2,a3}    -              {a1,a2,a3,a4}
x6    {a1,a2,a3,a4}  {a1,a2,a3}    -              {a1,a2,a3,a4}  -
x7    -              -             {a1,a2,a3,a4}  {a1,a2}        {a3,a4}    {a3,a4}
Example
Then, Core(C) = {a2}.
The partition produced by the core is
U/{a2} = {{x1, x2}, {x5, x6, x7}, {x3, x4}},
and the partition produced by the decision feature d is
U/{d} = {{x4}, {x1, x2, x7}, {x3, x5, x6}}.
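A sketch that recomputes the discernibility matrix and the core from the small example's table:

```python
# Sketch: discernibility matrix and core for the small example above.
from itertools import combinations

C = ['a1', 'a2', 'a3', 'a4']
rows = {  # columns a1, a2, a3, a4, d
    'x1': (1, 0, 2, 1, 1), 'x2': (1, 0, 2, 0, 1),
    'x3': (1, 2, 0, 0, 2), 'x4': (1, 2, 2, 1, 0),
    'x5': (2, 1, 0, 0, 2), 'x6': (2, 1, 1, 0, 2),
    'x7': (2, 1, 2, 1, 1),
}

matrix = {}
for xi, xj in combinations(rows, 2):
    if rows[xi][-1] != rows[xj][-1]:      # different decision classes only
        matrix[(xi, xj)] = {C[k] for k in range(4)
                            if rows[xi][k] != rows[xj][k]}

core = {a for m in matrix.values() if len(m) == 1 for a in m}
print(core)  # {'a2'}
```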
Similarity relation
A similarity relation on the set of objects is SIM_T(x) = {y ∈ U : y SIM_T x}; it contains all objects similar to x.
Lower approximation: for X ⊆ U,
SIM_T_*(X) = {x ∈ X : SIM_T(x) ⊆ X}
is the set of all elements of U which can be classified with certainty as elements of X.
Upper approximation:
SIM_T^*(X) = ∪_{x ∈ X} SIM_T(x)
SIM-positive region of the partition {X_i : i = 1, ..., r(d)}, where X_i = {x ∈ U : d(x) = i}:
POS_{SIM_T}({d}) = ∪_{i=1}^{r(d)} SIM_T_*(X_i)
Similarity measures
For a numeric attribute a:
S_a(v_i, v_j) = 1 − |v_i − v_j| / (a_max − a_min)
A threshold variant:
S_a(v_i, v_j) = 1 if v_i ≤ α_a·v_j + β_a, and 0 otherwise,
where α_a and β_a are parameters; this measure is not symmetric.
Similarity for a nominal attribute:
S_a(v_i, v_j) = Σ_{k=1}^{r(d)} [ P(d=k | a=v_i) · P(d=k | a=v_j) ] / [ r(d) · P(d=k) ]
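A sketch of the two attribute-value measures above, with alpha and beta as free parameters:

```python
# Sketch: similarity for a numeric attribute, plus the asymmetric
# threshold variant with parameters alpha and beta.
def similarity_numeric(v_i, v_j, a_min, a_max):
    return 1 - abs(v_i - v_j) / (a_max - a_min)

def similarity_threshold(v_i, v_j, alpha, beta):
    return 1 if v_i <= alpha * v_j + beta else 0

print(similarity_numeric(2.0, 5.0, 0.0, 10.0))   # 0.7
print(similarity_threshold(2.0, 5.0, 1.0, 1.0))  # 1
print(similarity_threshold(5.0, 2.0, 1.0, 1.0))  # 0 (not symmetric)
```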
Quality of approximation of classification
This is the ratio of all correctly classified objects to all objects:
γ_{SIM_T}({d}) = Card(POS_{SIM_T}({d})) / Card(U)
Relative Reduct
R ⊆ A is a relative reduct for SIM_A({d}) iff
1) POS_{SIM_R}({d}) = POS_{SIM_A}({d}),
2) for every proper subset of R, condition 1) is not true.
Attribute Reduction
The purpose is to select a subset of attributes from the original set of attributes to use in the rest of the process.
Selection criterion: the reduct concept description. A reduct is the essential part of the knowledge, which defines all basic concepts.
Other methods are:
the discernibility matrix (n × n), and
generating all combinations of attributes and then evaluating the classification power or dependency coefficient (complete search), as sketched below.
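A complete-search sketch, reusing the gamma() function from the dependency-coefficient sketch earlier: enumerate attribute subsets by size and keep the minimal ones that preserve the dependency of the full condition set.

```python
# Sketch: brute-force reduct search (exponential; for small C only).
from itertools import combinations

def reducts(table, C, d, gamma):
    full = gamma(table, C, d)
    found = []
    for k in range(1, len(C) + 1):
        for subset in combinations(C, k):
            if any(set(r) <= set(subset) for r in found):
                continue                  # already contains a reduct
            if gamma(table, list(subset), d) == full:
                found.append(subset)
    return found
```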
Discretization Methods
The purpose is to develop an algorithm that finds a consistent set of cut points which minimizes the number of regions.
Discretization methods based on Rough Set theory try to find these cut points.
The underlying decision problem: given a set S of points P1, ..., Pn in the plane R², partitioned into two disjoint categories S1, S2, and a natural number T, is there a consistent set of lines such that the partition of the plane into regions defined by them consists of at most T regions?
Consistent
Def. A set of cuts P is consistent with A (or A-consistent) iff ∂_A = ∂_{A^P}, where ∂_A and ∂_{A^P} are the generalized decisions of A and A^P respectively.
Def. A set of cuts P_irr is A-irreducible iff P_irr is A-consistent and no proper subfamily P' (P' ⊂ P_irr) is A-consistent.
Level of Inconsistency
Let B be a subset of A, and let
L_c = Σ_i Card(B_*(X_i)) / Card(U)
where the X_i, i = 1, 2, ..., n, form a classification of U, with X_i ∩ X_j = ∅ for i ≠ j and ∪_i X_i = U.
L_c represents the percentage of instances which can be correctly classified into class X_i with respect to the subset B.
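A sketch of L_c, taking U/IND(B) and the classes X_i as lists of sets:

```python
# Sketch: L_c = sum_i Card(B-lower(X_i)) / Card(U). Since the X_i are
# disjoint, their lower approximations are disjoint too, so the size of
# the union equals the sum of the individual sizes.
def level(b_partition, classes, n_objects):
    covered = {x for Xi in classes
               for Y in b_partition if Y <= Xi for x in Y}
    return len(covered) / n_objects
```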

Imputation Data
The rules of the system should have maximum consistency.
The relevant attributes for x are defined by
rel_R(x) = {a ∈ R : a(x) is defined},
and the relation R_c is defined by
x R_c y iff a(x) = a(y) for all a ∈ rel_R(x) ∩ rel_R(y).
x and y are consistent if x R_c y.
Example
Let x = (1, 3, ?, 4), y = (2, ?, 5, 4) and z = (1, ?, 5, 4).
x and z are consistent (x R_c z); x and y are not consistent.
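A sketch of the consistency check, representing '?' as None:

```python
# Sketch: x R_c y -- records agree wherever both are defined.
def consistent(x, y):
    return all(a == b for a, b in zip(x, y)
               if a is not None and b is not None)

x = (1, 3, None, 4)
y = (2, None, 5, 4)
z = (1, None, 5, 4)
print(consistent(x, z))  # True
print(consistent(x, y))  # False: 1 != 2 on the first attribute
```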
Decision rules

      F1  F2  F3  F4   D   Rule
O3     0   0   0   1   L   R1
O5     0   0   1   3   L   R1
O1     0   1   0   2   L   R2
O4     0   1   1   0   M   R3
O2     1   1   0   2   H   R4

Rule 1: if (F2 = 0) then (D = L)
Rule 2: if (F1 = 0) and (F4 = 2) then (D = L)
Rule 3: if (F4 = 0) then (D = M)
Rule 4: if (F1 = 1) then (D = H)
The algorithm should minimize the number of features included in the decision rules.
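A sketch that checks each rule above against the table; a rule is consistent when every object it covers has the rule's decision:

```python
# Sketch: rule consistency check against the decision table above.
table = {
    'O3': {'F1': 0, 'F2': 0, 'F3': 0, 'F4': 1, 'D': 'L'},
    'O5': {'F1': 0, 'F2': 0, 'F3': 1, 'F4': 3, 'D': 'L'},
    'O1': {'F1': 0, 'F2': 1, 'F3': 0, 'F4': 2, 'D': 'L'},
    'O4': {'F1': 0, 'F2': 1, 'F3': 1, 'F4': 0, 'D': 'M'},
    'O2': {'F1': 1, 'F2': 1, 'F3': 0, 'F4': 2, 'D': 'H'},
}

def is_consistent(conds, decision):
    return all(row['D'] == decision for row in table.values()
               if all(row[f] == v for f, v in conds.items()))

print(is_consistent({'F2': 0}, 'L'))           # Rule 1: True
print(is_consistent({'F1': 0, 'F4': 2}, 'L'))  # Rule 2: True
print(is_consistent({'F4': 0}, 'M'))           # Rule 3: True
print(is_consistent({'F1': 1}, 'H'))           # Rule 4: True
```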
References
[1] Gediga, G. and Düntsch, I. (2002) Maximum Consistency of Incomplete Data via Non-Invasive Imputation. Artificial Intelligence.
[2] Grzymala-Busse, J. and Siddhaye, S. (2004) Rough Set Approach to Rule Induction from Incomplete Data. Proceedings of IPMU 2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.
[3] Pawlak, Z. (1995) Rough Sets. Proceedings of the 1995 ACM 23rd Annual Conference on Computer Science.
[4] Tay, F. and Shen, L. (2002) A Modified Chi2 Algorithm for Discretization. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, May/June.
[5] Zhong, N. (2001) Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems, 16, 199-214, Kluwer Academic Publishers.

THANK YOU!
