2010 International Conference on Intelligent Computation Technology and Automation

Decision Tree Method in Financial Analysis of Listed Logistics Companies

Gu Yu, Guo Wenjuan
Beijing Wuzi University
Beijing, China
101149

Abstract-This paper first introduces the decision tree algorithm and the C5.0 algorithm in data mining. It then introduces financial analysis methods, the problems that need attention when applying them, and the process of selecting attributes. Finally, we study the financial ratios of listed logistics companies using SPSS Clementine 12.0. The accuracy of the resulting model is 95.83%.

Keywords-decision tree; listed logistics companies; profits; financial ratios

I. INTRODUCTION

With the rapid development of computer technology, various industries have accumulated large amounts of data, and the amount of data is increasing day by day. People are aware that these data hold a vast reservoir of knowledge, but it is impossible to tap that knowledge by relying on human understanding alone. The demand for powerful tools to do so gave rise to data mining. The term "data mining" was first used by Usama Fayyad in 1995 in Montreal, Canada, at the first International Conference on Knowledge Discovery and Data Mining. Technically, data mining, also known as knowledge discovery in databases, is the non-trivial process of obtaining valid, novel, potentially useful and ultimately understandable patterns from large amounts of data. Put simply, data mining extracts, or "mines", knowledge from large amounts of data.

Currently, data mining has preliminary applications in customer relationship management, product design, finance and securities, telecommunications, the military, biomedicine and other fields. In finance, applications of data mining focus on financial early-warning models. The financial early-warning model is certainly a key topic, but other areas of finance should not be ignored. In this paper, we use decision trees to analyze which financial ratios are strongly associated with the profit growth of listed logistics companies. We hope this paper will attract valuable comments and lead more scholars to apply data mining in other areas of finance.

II. DECISION TREE METHOD

A. Decision Tree Principles

The foundation of decision tree learning is the Concept Learning System (CLS) framework proposed by Hunt et al. in 1960. A decision tree is a flowchart-like tree structure in which each internal (non-leaf) node represents a test on an attribute, that is, a splitting attribute. The basic steps of building a decision tree classification model are as follows. First, we divide the sample data into training samples and test samples according to a chosen proportion. Second, we generate a decision tree model from the training samples. There

978-0-7695-4077-1/10 $26.00 © 2010 IEEE 1101


DOI 10.1109/ICICTA.2010.493

Authorized licensed use limited to: IEEE Xplore. Downloaded on November 03,2010 at 10:23:33 UTC from IEEE Xplore. Restrictions apply.
are two key points in model generation. One is the selection of split attributes. Attribute selection criteria include information gain, introduced by Quinlan, the information gain ratio, and the minimum Gini index. We usually want the tree to grow as fully as possible. However, although this increases accuracy on the training samples, it reduces accuracy on the test samples; this is the phenomenon known as over-fitting. The other key point is therefore to handle over-fitting through pruning, which is divided into pre-pruning and post-pruning. Third, we use the decision tree to classify the test samples and obtain useful conclusions.

B. C5.0 Algorithm

Commonly used decision tree algorithms include ID3, C4.5, C5.0, CART, CHAID, PUBLIC, SLIQ and SPRINT. The ID series of algorithms is the most influential internationally, and the C5.0 algorithm is based on it. The attribute selection metric of C5.0, the gain ratio, is calculated as follows.

Information entropy: Information entropy measures the overall uncertainty of an information source X. Suppose X is a set of sample data containing x samples, and the class label attribute takes n different values, which define n different classes $C_i$ $(i = 1, 2, \ldots, n)$. Suppose the number of samples in class $C_i$ is $x_i$, so that the probability that any sample belongs to $C_i$ is $p_i = x_i / x$. Then the information entropy is:

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Amount of information: Suppose the training data set D consists of two subsets, YD and ND, containing y and n samples respectively. Then the amount of information is:

$$I(y, n) = -\frac{y}{y+n}\log_2\frac{y}{y+n} - \frac{n}{y+n}\log_2\frac{n}{y+n}$$

Information expectation: If attribute A is used as the root of the decision tree and A has v values $\{a_1, a_2, \ldots, a_v\}$, it divides D into v subsets $\{D_1, D_2, \ldots, D_v\}$. Suppose $D_i$ contains $y_i$ samples belonging to YD and $n_i$ samples belonging to ND. Then the expected information required for the subsets is:

$$E(A) = \sum_{i=1}^{v} \frac{y_i + n_i}{y + n}\, I(y_i, n_i)$$

Information gain: The information gain obtained by using A as the root is:

$$\mathrm{Gain}(A) = I(y, n) - E(A)$$

Split information: The search strategy of the ID series of algorithms prefers shorter trees to longer ones, which can lead to inductive bias. For example, suppose a training data set has n samples and attribute A takes a different value for each sample. This attribute then has the largest information gain on the training data, because it predicts the target attribute of the training data perfectly, so it is selected as the decision attribute of the root node. The result is a very wide decision tree of depth one. When such a model is applied to the test samples, its performance will be poor. Therefore, to avoid this bias and make up for this shortcoming of the ID series of algorithms, we use split information. Split information is used to measure the breadth and
information is used to measure the breadth and

uniformity of data. The formula for split information is:

$$\mathrm{SplitInfo}(A) = -\sum_{i=1}^{v} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$

where $D_i$ is the subset of D in which attribute A takes its i-th value and v is the number of values of A. Finally, the gain ratio is the information gain of A divided by its split information:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}$$

In fact, C5.0 does more than change the metric by adding split information to deal with inductive bias; it can also discretize continuous attributes. More importantly, C5.0 is a classification algorithm that can be applied to large data sets, with improved execution efficiency and memory use. Since in practice we usually deal with large amounts of data, C5.0 has greater practical significance. Based on these advantages, we select the C5.0 algorithm to build the decision tree model.

III. FINANCIAL ANALYSIS

A. Introduction to Financial Analysis Methods

Financial analysis methods include trend analysis, ratio analysis and factor analysis.

Trend analysis compares the same indicator across two or more consecutive financial reports to determine the direction, amount and magnitude of its change, and thereby explains the trend in the enterprise's financial condition or operating results. Its advantage is that it is simple and intuitive. Its shortcomings are as follows: it requires comparing indicators from different periods, but the basis of calculation is sometimes inconsistent; it cannot exclude the impact of non-recurring items, so the data analyzed may not reflect normal operating conditions; and it does not single out indicators that have changed significantly or study the causes of the change, so we are unable to take measures to pursue profit and avoid loss.

Ratio analysis determines the level of economic activity by calculating various ratios. Because a ratio is a relative number, this approach turns non-comparable indicators into comparable ones. Ratio analysis is simple to calculate, and its results are relatively easy to interpret. It is not perfect, however, and the following issues need attention when applying it: the numerator and denominator used to calculate a ratio must be logically related (for example, by cause and effect) so that the ratio is informative, in other words, the ratio should have financial significance; and the numerator and denominator must be consistent in the time period and scope of calculation.

Factor analysis is based on the relationship between the indicator under analysis and its driving factors, and quantifies the direction and degree of each factor's impact. It can analyze both the combined impact of several factors on an economic indicator and the separate impact of a single factor, so it is widely used in financial analysis. In practice, attention should be paid to the following issues: the factors that make up the economic indicator must have an objective cause-and-effect relationship with it and reflect the underlying causes of differences in the indicator, otherwise the analysis loses its value; factors must be substituted in a fixed order that follows their interdependence, otherwise different results will be obtained; the chain of calculation must be maintained so that the sum of the factors' effects equals the total change in the indicator, which lets us fully explain the reasons for the change;

Make sure that each assumption is logical and has real economic significance.

B. Selection of Financial Indicators

In this paper, we use the ratio analysis method. We select 15 financial ratios from the analysis indicators that evaluate an enterprise's financial condition and operating results.

The selection process is as follows. First, we divide financial indicators into five categories: solvency analysis, operational capability analysis, profitability analysis, development capacity analysis and level of risk analysis. Second, we subdivide each category. Solvency analysis is subdivided into short-term solvency analysis and long-term solvency analysis. Operational capability analysis is subdivided into current assets turnover analysis, non-current assets turnover analysis and total assets turnover analysis. Profitability analysis is subdivided into viability analysis, assets profitability analysis and capital profitability analysis. Development capacity analysis is subdivided into profit growth capacity analysis, asset growth capacity analysis and capital growth capacity analysis. Level of risk analysis is subdivided into level of financial risk analysis and level of business risk analysis. This gives 13 capacity analysis indicators. Finally, we list the financial ratios corresponding to each capacity analysis indicator. Short-term solvency analysis and long-term solvency analysis each contain two financial ratios, so we obtain 15 financial ratios in total. They are listed in Table 1.

TABLE 1: FINANCIAL INDICATORS

Category                          Capacity analysis                      Financial ratio
--------------------------------  -------------------------------------  ---------------------------------
Solvency analysis                 Short-term solvency analysis           Current ratio
                                                                         Quick ratio
                                  Long-term solvency analysis            Asset-liability ratio
                                                                         Interest coverage ratio
Operational capability analysis   Current assets turnover analysis       Inventory turnover rate
                                  Non-current assets turnover analysis   Fixed asset turnover
                                  Total assets turnover analysis         Total asset turnover ratio
Profitability analysis            Viability analysis                     Net profit margin on sales
                                  Assets profitability analysis          Net profit margin on total assets
                                  Capital profitability analysis         Return on invested capital
Development capacity analysis     Profit growth capacity analysis        Net profit growth rate
                                  Asset growth capacity analysis         Growth rate of total assets
                                  Capital growth capacity analysis       Rate of capital accumulation
Level of risk analysis            Level of financial risk analysis       Financial leverage factor
                                  Level of business risk analysis        Degree of operating leverage

IV. MODEL

A. Sample selection:

According to the "Guidelines on the Industry Classification of Listed Companies" issued by the China Securities Regulatory Commission in 2001, the logistics industry belongs to the "transportation and warehousing industry". We then looked up each company's "operating range" and "core business" on the "Orient Securities" website. Finally, we selected 35 listed logistics companies whose main business is logistics as the study sample.

We obtained the financial ratios for 2007 and the net profits for 2007 and 2008 from the GuoTaiAn database. After removing companies with incomplete data, 24 logistics companies remained.

B. Analysis of the company's profit changes:

We enter each company's code and its net profits for 2007 and 2008 in an Excel table, then subtract the 2007 net profit from the 2008 net profit for each logistics company. If the result is positive, net profit grew from 2007 to 2008 and we label the company "Y". If the result is negative, net profit fell from 2007 to 2008 and we label the company "N".
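
The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the company codes and profit figures are hypothetical, the function name `label_growth` is our own, and the treatment of a zero difference (labeled "N" here) is an assumption, since the rule above only covers positive and negative results.

```python
def label_growth(profit_2007, profit_2008):
    """Return "Y" if net profit grew from 2007 to 2008, otherwise "N"."""
    # Assumption: a zero difference is labeled "N"; the text does not cover this case.
    return "Y" if profit_2008 - profit_2007 > 0 else "N"


# Hypothetical company codes and net profits (2007, 2008), for illustration only.
profits = {
    "C001": (120.0, 150.0),
    "C002": (80.0, 65.5),
    "C003": (-10.0, 4.0),
}

# Build the target column used later as the decision tree's output field.
labels = {code: label_growth(p07, p08) for code, (p07, p08) in profits.items()}
print(labels)
```

The resulting "Y"/"N" column plays the role of the "whether growth" target field in the modeling step.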

C. Modeling thought:

We use the financial indicators of the 24 logistics companies as attributes and whether profit rises in 2008 as the target. We then apply the C5.0 component of SPSS Clementine 12.0 to the data to generate a decision tree model, and identify the attributes with the largest information gain ratio; these attributes play an important role in whether profit rises in 2008. After that, we analyze the decision tree model to assess its accuracy.
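
To make the role of the information gain ratio concrete, the following minimal Python sketch computes the gain ratio of a binary split on one continuous attribute, following the entropy, information gain and split information definitions of Section II.B. It is not the Clementine C5.0 implementation, which includes further refinements. The data are hypothetical toy values rather than the paper's sample, and the choice of candidate thresholds (every observed value except the largest) is an assumption made for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    # Expected information after the split: size-weighted entropy of the subsets.
    expected = sum(len(part) / total * entropy(part) for part in (left, right))
    gain = entropy(labels) - expected
    # Split information: entropy of the subset sizes themselves.
    split_info = -sum(
        len(part) / total * math.log2(len(part) / total) for part in (left, right)
    )
    return gain / split_info


# Hypothetical toy data, NOT the paper's sample: one ratio per company and its Y/N label.
ratios = [5.0, 12.0, 20.0, 30.0, 45.0, 60.0]
growth = ["Y", "Y", "Y", "Y", "N", "N"]

# Assumption: candidate thresholds are the observed values except the largest.
best_ratio, best_threshold = max((gain_ratio(ratios, growth, t), t) for t in ratios[:-1])
print(best_threshold, round(best_ratio, 3))
```

On this toy data the threshold 30.0 separates the two classes perfectly, so its gain ratio is the largest. Repeating the computation for every attribute and keeping the one with the highest gain ratio is the selection step the text describes.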

Therefore, the whole process consists of two experiments: Experiment 1 builds the decision tree model, and Experiment 2 tests the model's accuracy.

Experiment 1: Build the decision tree model

Step one: Add an "EXCEL" node and use it to read the data. Step two: Add a "type" node and set the direction of "whether growth" to "output". Step three: Add the decision tree model "C5.0" (parameters: use partitioned data; output type: decision tree; mode: simple). Step four: Click "execute" to generate the result. The result is shown in Figure 1.

Figure 1: Decision tree model

From the decision tree model, we can see that the interest coverage ratio and the asset-liability ratio play an important role in whether net profit rises in 2008. These two attributes yield a higher information gain ratio than the other 13 attributes. Both belong to long-term solvency analysis, which also supports the reasonableness of the model from another angle. Further, of the 24 listed logistics companies, 15 companies achieved profit growth in 2008 and nine companies

saw their profits fall. From the tree we can extract the criteria for profit growth or decline in 2008. The criterion for profit growth in 2008 is an interest coverage ratio less than or equal to 37.236. There are two criteria for profit decline in 2008: one is an interest coverage ratio greater than 37.236, and the other is an asset-liability ratio greater than 0.440 and less than or equal to 0.553.

Experiment 2: Test the accuracy of the model

We add the generated "whether growth" model after the "type" node, then add an "analysis" node. After execution we get the following results:

TABLE 2: COMPARING $C-IS GROWTH AND WHETHER GROWTH

Correct    23    95.83%
Mistake     1     4.17%
Total      24

As shown in the table, of the 24 selected companies the model judges 23 correctly and only one incorrectly, so the correct rate of the decision tree model is 95.83%. This shows that the model has high accuracy.

V. CONCLUSION

In this paper, we analyze 15 financial ratios of listed logistics companies. We conclude that the interest coverage ratio and the asset-liability ratio play an important role in whether the next year's profit will rise. The model finds one split point for the interest coverage ratio, 37.236, and two split points for the asset-liability ratio, 0.440 and 0.553, and its correct rate is 95.83%. By monitoring these two financial indicators we can predict whether the next year's profit will rise or fall. If a decline is predicted, this should draw the attention of management, who should find the reason and take measures before the enterprise falls into financial distress. In addition, when using this model we should pay attention to the treatment of missing values and the prevention of over-fitting.

ACKNOWLEDGMENT

This paper was funded by the Beijing Municipal Education Commission project "Logistics cost research theory and methodology" (SM200810037006) and the Research and Innovation in Business Administration base of Beijing Wuzi University.
