Time-Series Data Analysis With Rough Sets
The preprocessed data was split into analysis (1665 objects) and validation (555 objects) sets. The analysis set was used for the rough set analysis, which acquired reducts and decision rules. Each decision rule had measurements of support, accuracy, and coverage. These rules were the primary output of the rough set analysis process. The decision rules were then validated by firing each individual rule against the validation data set, and the new support, accuracy, and coverage measures were observed for the validation data set. The goal is to find rules that are accurate representations of the data. Therefore, rules that have similar measures in both the analysis and validation sets should be considered stable and accurate.
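For illustration, the following minimal Python sketch shows one common way such per-rule measures can be computed on a decision table. The attribute names, the sample objects, and the exact definitions used here (support as the number of objects matching both the condition and the decision, accuracy as support divided by the objects matching the condition, coverage as support divided by the objects in the decision class) are assumptions for the sketch; the paper does not give its formulas.

    # Minimal sketch of per-rule measures on a decision table. An object is a
    # dict of condition-attribute values plus a "decision" label; a rule is a
    # condition (attribute -> value) together with a decision class.
    def rule_measures(objects, condition, decision):
        matched = [o for o in objects
                   if all(o.get(a) == v for a, v in condition.items())]
        in_class = [o for o in objects if o["decision"] == decision]
        support = sum(1 for o in matched if o["decision"] == decision)
        accuracy = support / len(matched) if matched else 0.0
        coverage = support / len(in_class) if in_class else 0.0
        return support, accuracy, coverage

    # Hypothetical objects and rule, for demonstration only.
    analysis_set = [
        {"price_move": "up",   "volume": "high", "decision": "increase"},
        {"price_move": "up",   "volume": "low",  "decision": "neutral"},
        {"price_move": "down", "volume": "high", "decision": "decrease"},
    ]
    rule_condition = {"price_move": "up", "volume": "high"}
    print(rule_measures(analysis_set, rule_condition, "increase"))  # (1, 1.0, 1.0)

The same measures can then be recomputed on the validation objects and compared with those from the analysis set to judge the stability of a rule.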
It is interesting to see the result of combining two decision classes. Two possible combinations can be performed: the combination of decrease and neutral, and the combination of increase and neutral. The combination of decrease and increase potentially offers no value. However, combining neutral with either increase or decrease can tell us whether a rule offers a large gain or loss while also allowing for a marginal change. Rule 1 is an example of a rule with measurements obtained through both the analysis and validation sets.

This transformation results in the disjunction of two decision classes instead of one. It is interesting to see that this rule can forecast either a large gain or a minimal loss or gain. This could be a powerful tool if the user so desired.
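As a sketch of what firing such a combined rule might look like, the snippet below evaluates a rule whose decision part is a disjunction of two decision classes. The class labels, condition attribute, and sample objects are hypothetical and only illustrate how merging classes raises the number of matching outcomes.

    # Accuracy of a rule of the form: condition -> (class_1 OR class_2).
    def disjunctive_accuracy(objects, condition, decision_classes):
        matched = [o for o in objects
                   if all(o.get(a) == v for a, v in condition.items())]
        hits = sum(1 for o in matched if o["decision"] in decision_classes)
        return hits / len(matched) if matched else 0.0

    validation_set = [
        {"trend": "up", "decision": "increase"},
        {"trend": "up", "decision": "neutral"},
        {"trend": "up", "decision": "decrease"},
    ]
    # A single decision class versus the merged class {increase, neutral}:
    print(disjunctive_accuracy(validation_set, {"trend": "up"}, {"increase"}))             # 0.33...
    print(disjunctive_accuracy(validation_set, {"trend": "up"}, {"increase", "neutral"}))  # 0.66...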
Each rule is applied to the validation data set corresponding to that of the analysis set. Information is collected regarding the support and accuracy of that rule in the new data.

The results shown in Table 3 indicate that for rule 1, the accuracy and coverage measures are extremely close according to the percentage of data used. This translates into a rule that is very stable. Rule 2, however, shows an accuracy measure that is almost twenty percentage points higher in the validation set than in the analysis set.

Table 4. Statistical Results

                   Analysis Set   Validation Set
Total objects          1665            555
Objects covered        1090            359
Min. support             48             18
Max. support            334            122
Average support        168.5           55.1
Min. accuracy         0.4884         0.5200
Max. accuracy         0.6875         0.8235
Average accuracy      0.6000         0.6291

Approximately 65.5% of distinct objects are covered by the ten rules in the analysis set. The same rules, when fired against the validation set, cover 64.7% of distinct objects in the validation set. By comparing the accuracy and support measures given in Table 3, the amount of change in accuracy and support between the analysis set and the validation set was computed for each rule; it is shown in Table 5.
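These coverage figures follow directly from the object counts in Table 4; a quick check in Python:

    # Fraction of distinct objects covered by the ten rules (from Table 4).
    print(f"analysis:   {1090 / 1665:.1%}")   # 65.5%
    print(f"validation: {359 / 555:.1%}")     # 64.7%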
Table 5. Change in Support and Accuracy

Rule   Change in support (%)   Change in accuracy (%)
 1            0.480                    0.242
 2            0.300                   16.374
 3           -1.922                    2.578
 4            0.360                   -7.639
 5           -0.420                   21.429
 6            0.541                   -2.354
 7           -0.781                   -2.787
 8           -1.081                   -5.098
 9           -0.541                    3.206
10           -0.661                    3.163

A large jump in accuracy from the analysis set to the validation set is observed for rules 2 and 5. Although the support values for these two rules do not change significantly between data sets, such a dramatic difference in accuracy may indicate that these two rules are unstable. Rule 3 saw a significant decrease in support from the analysis set to the validation set: whereas rule 3 covered 12.37% of objects in the analysis set, the same rule covered only 10.45% of objects in the validation set. Rule 1 is by far the strongest, most stable rule, showing negligible change in accuracy and support between data sets.
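The rule 3 figures suggest that the change-in-support column of Table 5 is expressed in percentage points of coverage; this interpretation is inferred from the numbers rather than stated in the paper, and the object counts below are likewise inferred from the quoted percentages.

    # Change in support as the difference in the percentage of objects covered.
    # Counts are inferred: 12.37% of 1665 analysis objects is about 206, and
    # 10.45% of 555 validation objects is about 58.
    def change_in_support(covered_analysis, total_analysis,
                          covered_validation, total_validation):
        return (100.0 * covered_validation / total_validation
                - 100.0 * covered_analysis / total_analysis)

    print(round(change_in_support(206, 1665, 58, 555), 2))   # -1.92, vs. -1.922 in Table 5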
In order to determine which rules are the strongest, one must compare the rules and how they react with data. To analyze the rules that have been obtained, a ranking system can be used. Individual rules are ranked according to their strength (support and accuracy) and stability (change in support and change in accuracy). Taking into account all the different types of rankings that were used, an overall rank can be determined. The results of this process can be seen in Table 6, where the headers R1, R2, R3 and R4 are rankings according to total support, total accuracy, change in support and change in accuracy, respectively. The final rank is determined by the previous four rankings for each rule.
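The paper does not spell out how the four rankings are combined into the final rank, but summing them and ranking the sums (with ties sharing a rank) reproduces the Final Rank column of Table 6 exactly, so that aggregation is sketched below in Python using the values from Table 6.

    # Final rank computed from the four per-rule rankings of Table 6
    # (R1: total support, R2: total accuracy, R3: change in support,
    #  R4: change in accuracy). Lower is better throughout.
    ranks = {                       # rule: (R1, R2, R3, R4)
        1: (1, 3, 4, 1),  2: (8, 1, 1, 9),  3: (4, 9, 10, 3),
        4: (10, 2, 2, 8), 5: (5, 6, 3, 10), 6: (2, 7, 5, 2),
        7: (7, 4, 8, 4),  8: (3, 8, 9, 7),  9: (6, 5, 5, 6),
        10: (9, 10, 7, 5),
    }

    sums = {rule: sum(r) for rule, r in ranks.items()}
    # Competition ranking: a rule's final rank is 1 plus the number of rules
    # with a strictly smaller rank sum, so tied rules (4 and 9) share a rank.
    final_rank = {rule: 1 + sum(1 for other in sums.values() if other < s)
                  for rule, s in sums.items()}

    print(final_rank)
    # {1: 1, 2: 3, 3: 8, 4: 4, 5: 7, 6: 2, 7: 6, 8: 9, 9: 4, 10: 10}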
Table 6. Ranking of Rules - Lower is Better

Rule   R1   R2   R3   R4   Final Rank
 1      1    3    4    1       1
 2      8    1    1    9       3
 3      4    9   10    3       8
 4     10    2    2    8       4
 5      5    6    3   10       7
 6      2    7    5    2       2
 7      7    4    8    4       6
 8      3    8    9    7       9
 9      6    5    5    6       4
10      9   10    7    5      10
According to the rankings shown in Table 6, rule 1 is suggested as the best, with low rankings for support, accuracy, change in support, and change in accuracy. Rule 10 is considered the worst of the ten rules based on its high rankings for total support and total accuracy. Rules 4 and 9 both have a final rank of four. This is because rule 4 has high rankings for total support and change in accuracy but low rankings for accuracy and change in support, while rule 9 has average rankings for each measure.

A user may give different weights to different rankings. For example, if a user wishes to rank rules according to stability, he or she may pay more attention to change in support and change in accuracy, so that those measures contribute more to the final rankings of rules than total support and total accuracy do. Here, equal weight was given to each measure.
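As a hypothetical illustration of such a weighting (the weights below are not from the paper), a stability-oriented user could aggregate the four ranking positions as a weighted sum rather than a plain sum:

    # Weighted aggregation of the four ranking positions (lower is better).
    # Weights are illustrative only; (1, 1, 1, 1) corresponds to the equal
    # weighting used in the paper, while (1, 1, 2, 2) emphasizes stability
    # (change in support and change in accuracy).
    def weighted_score(rule_ranks, weights=(1, 1, 2, 2)):
        return sum(w * r for w, r in zip(weights, rule_ranks))

    print(weighted_score((1, 3, 4, 1)))   # rule 1 -> 14
    print(weighted_score((8, 1, 1, 9)))   # rule 2 -> 29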
5. Conclusion

The process of time-series data analysis could include the use of rough set tools for knowledge discovery. Using time-series data from the New Zealand stock exchange, rough set analysis acquired reducts that were used to create rough rules. The rules, after being tested for accuracy, are used as a forecasting tool to predict future configurations of data. Rough sets succeed in this task since they are able to describe uncertain data using information derived from precise, certain data. The data should be prepared so that it contains attributes that have minimal loss of information when they are discretized.

The results obtained seem to be quite stable, with some rules performing beyond expectations, especially those rules that were obtained by merging two decision classes together. This may indicate that rough sets can be a powerful tool for forecasting time-series data that is composed of uncertain and redundant values. Acquiring reducts is beneficial to time-series data analysis, as it removes information that does not need to be present in order to discern objects or provide information about relationships between data.