Data Science Final Report

The final report from the University of Economics Ho Chi Minh City focuses on predicting bankruptcy risks in U.S. companies and proposing effective investment strategies using data science methodologies. It includes a detailed evaluation of the research process, methodologies, and results, highlighting the importance of data analysis in financial decision-making. The project acknowledges contributions from team members and emphasizes the guidance received from the supervising professor.
Data Science (University of Economics Ho Chi Minh City)


UNIVERSITY OF ECONOMICS HO CHI MINH CITY


SCHOOL OF ECONOMICS, LAW, AND PUBLIC ADMINISTRATION
DEPARTMENT OF ECONOMICS

FINAL TERM PAPER


TOPIC
PREDICTING THE RISK OF BANKRUPTCY IN U.S. COMPANIES AND
PROPOSING EFFECTIVE INVESTMENT STRATEGIES

Supervising Professor: Dr. Vo Van Hai


Subject: Data Science
Class– Course: IV0002 – 49
Course Code: 24C1INF50905917

Ho Chi Minh City, December 8, 2024



Term Paper
Subject: DATA SCIENCE
Topic: "PREDICTING THE RISK OF BANKRUPTCY IN U.S. COMPANIES AND PROPOSING EFFECTIVE INVESTMENT STRATEGIES"
Supervising Professor: Dr. Vo Van Hai
Class - Course: 24C1INF50905917
Actual Submission Date: December 8, 2024

WORK EVALUATION TABLE


No. 1
Full Name: Nguyen Ngoc Bao Tram
Student ID: 31231023258
Contribution Level: 25%
Assigned Tasks:
- Selecting the research topic
- Developing the framework outline
- Composing the content for Chapter 3
- Composing the content for Chapter 4
- Summarizing the research process
- Creating presentation slides
- Preparing the presentation content

No. 2
Full Name: Diep Minh Tuyen
Student ID: 31231023208
Contribution Level: 25%
Assigned Tasks:
- Selecting the research topic
- Developing the framework outline
- Composing the content for Chapter 2
- Composing the content for Chapter 4
- Summarizing the research process
- Translating the content into English
- Preparing the presentation content

No. 3
Full Name: Bui Anh Tuyet
Student ID: 31231027842
Contribution Level: 25%
Assigned Tasks:
- Selecting the research topic
- Developing the framework outline
- Writing the acknowledgment section
- Composing the content for Chapter 1
- Composing the content for Chapter 4
- Composing the content for Chapter 5
- Preparing the presentation content

No. 4
Full Name: Nguyen Ngoc Uyen Nhi
Student ID: 31231022586
Contribution Level: 25%
Assigned Tasks:
- Selecting the research topic
- Developing the framework outline
- Composing the content for Chapter 1
- Composing the content for Chapter 4
- Composing the content for Chapter 6
- Preparing the presentation content
- Editing the formatting of the content

SCORE: .....................

PROFESSOR’S FEEDBACK:
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................
.....................................................................................................................................................

Final Semester
Academic Year 2024


ACKNOWLEDGMENTS
First of all, our team would like to extend our sincere gratitude to our course
instructor, Dr. Võ Văn Hải, for imparting invaluable knowledge and accompanying us
throughout the process of completing this final project. Through his insightful
guidance and shared experiences, we were able to gain significant knowledge that
proved highly beneficial, not only during the project but also in our personal growth.
The lessons he delivered in class, coupled with the diligent research and efforts of all
team members, enabled us to successfully complete this final project. Nevertheless, we
are aware of the limitations in our knowledge and experience, and we acknowledge
that some errors may have occurred during the implementation process. Thus, we
eagerly welcome any constructive feedback from Dr. Hải to help us improve and excel
in future projects.
Additionally, we would like to express our deepest appreciation to the University
of Economics Ho Chi Minh City for including Data Science in the curriculum and
providing us with the opportunity to study and research this field. This course has
allowed us to acquire fascinating knowledge on critical issues related to data
processing and analysis, applying data-driven insights to practical decision-making in
various economic and social domains, particularly in the field of investment, which is
our primary area of focus. While studying this subject, we realized its importance and
novelty to all UEH students, which posed certain challenges in mastering and applying
it to real-world situations. However, under the mentorship and valuable insights of Dr.
Võ Văn Hải, we have been able to learn, understand, and consolidate the essential
knowledge required to carry out this project.
Finally, we would like to extend our heartfelt thanks to the individuals,
organizations, and experts who provided valuable resources and information, which
greatly enriched the data needed for this project. These contributions have been pivotal
in improving and elevating the quality of our research.
We sincerely thank you all!


TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I: INTRODUCTION
1. Context:
2. Relevance of the topic:
3. Research questions:
CHAPTER II: LITERATURE REVIEW
1. Related studies:
2. Limitations of previous studies and directions for innovation:
CHAPTER III: RESEARCH METHODOLOGY
1. Data:
1.1 Data source:
1.2 Data structure:
2. Algorithms and Analytical Tools
3. Proposed Research Model
CHAPTER IV: EXPERIMENTAL PROCEDURES
1. Data Preprocessing
2. Clustering Model
2.1 Clustering with Hierarchical Clustering method
2.2 Clustering with k-Means method:
2.3 Visualization:
2.4 Conclusion:
3. Classification models
3.1 Classification models
3.2 Classification models analysis
3.3 Evaluation methods for classification models
3.3.1 Model evaluation results from Test and Score
3.3.2 ROC Analysis Evaluation Method
3.3.3 Confusion matrix method
3.2 Data forecasting
3.3 Conclusion
4. Model evaluation
CHAPTER V: RESULTS
1. Analysis and Answers to Research Questions
CHAPTER VI: CONCLUSION
1. Research Results
2. Limitations of the Study
3. Future Research Directions
REFERENCES

LIST OF ABBREVIATIONS
1. GDP: Gross Domestic Product
2. FED: Federal Reserve System
3. SMEs: Small and Medium Enterprises
4. NYSE: New York Stock Exchange
5. NASDAQ: National Association of Securities Dealers Automated Quotations
6. SEC: Securities and Exchange Commission
7. SVM: Support Vector Machine
8. TPR: True Positive Rate
9. FPR: False Positive Rate
10. EBIT: Earnings Before Interest and Taxes
11. EBITDA: Earnings Before Interest, Taxes, Depreciation, and Amortization
12. CA: Classification Accuracy
13. AUC: Area Under the Curve
14. ROC: Receiver Operating Characteristic
15. TP: True Positive
16. TN: True Negative
17. FP: False Positive
18. FN: False Negative
19. USD: United States Dollar
20. S&P Global: Standard & Poor's Global


LIST OF FIGURES

Figure 1.1.1 Forecast of bankruptcy rates in 2021 compared to 2019 across countries worldwide
Figure 4.1.1 Dataset after being loaded into the File widget
Figure 4.1.2 Data Observed in the Data Table
Figure 4.1.3 Image of the American Bankruptcy Dataset Information (after randomly selecting 7,869 instances)
Figure 4.1.4 Data Preprocessing model
Figure 4.2.1 Input Data
Figure 4.2.2 Distances Tool Interface
Figure 4.2.3 Interface display in Hierarchical Clustering
Figure 4.2.4 Interface display in Hierarchical Clustering
Figure 4.2.5 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.6 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.7 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.8 Clustering results using Hierarchical Clustering
Figure 4.2.9 Data display at k-Means
Figure 4.2.10 Evaluation of k-Means through Silhouette plot
Figure 4.2.11 Evaluation of k-Means through Silhouette plot
Figure 4.2.12 Clustering results using k-Means
Figure 4.2.13 Visual illustration of dividing data into two clusters for comparison
Figure 4.2.14 Division into two clusters for comparison based on company operating time index
Figure 4.2.15 Comparison of two clusters based on retained earnings
Figure 4.2.16 Comparison of two clusters based on net income
Figure 4.2.17 Comparison of two clusters based on earnings before interest and tax
Figure 4.2.18 Comparison of two clusters based on inventory
Figure 4.2.19 Comparison of two clusters based on earnings before interest, tax, and depreciation
Figure 4.2.20 Comparison of two clusters based on total receivables
Figure 4.2.21 Clustering Model
Figure 4.3.1 Input data
Figure 4.3.2 Data in the Data Table
Figure 4.3.3 Data Table interface for training data sampling
Figure 4.3.4 Bankrupt dataset (70%)
Figure 4.3.5 Data Table interface for prediction data sampling
Figure 4.3.6 Bankrupt dataset (30%)
Figure 4.3.7 Data splitting model
Figure 4.3.8 Test and Score model for algorithm comparison
Figure 4.3.9 Test and Score results
Figure 4.3.10 ROC curve model at "alive" value of target variable
Figure 4.3.11 ROC curve model at "failed" value of target variable
Figure 4.3.12 Confusion matrix results of Tree with sample count
Figure 4.3.13 Confusion matrix results of Tree with prediction ratio
Figure 4.3.14 Confusion matrix results of SVM with sample count
Figure 4.3.15 Confusion matrix results of SVM with prediction ratio
Figure 4.3.16 Confusion matrix results of Logistic Regression with sample count
Figure 4.3.17 Confusion matrix results of Logistic Regression with prediction ratio
Figure 4.3.18 Model for evaluating classification methods
Figure 4.3.19 File interface of prediction dataset
Figure 4.3.20 Predictions interface
Figure 4.3.21 Predicted data in Data Table
Figure 4.3.22 Prediction model
Figure 4.4.1 Evaluation model divided into 5 parts
Figure 4.4.2 Evaluation model divided into 20 parts

LIST OF TABLES

Table 3.1.1 Description of variables
Table 4.3.1 Comparison of results from 3 tools based on the confusion matrix
Table 4.3.2 Prediction results of three tools using the confusion matrix method
Table 4.4.1 Comparison table of sample splitting ratios into 5 parts and 20 parts
Table 5.1.1 Key financial metrics

SUMMARY OF THE RESEARCH PROCESS

Execution Steps | Content | Timeline
Real-world Context | In recent years, the U.S. economy has faced significant challenges due to rising interest rates and the debt crisis. Companies need to quickly adopt new technologies to maintain their competitive edge and avoid negative impacts on economic development. | 09/11/2024
Defining the Research Topic | The application of data science to comprehensively and multidimensionally analyze companies' financial indicators to forecast their risk of bankruptcy, thereby proposing effective investment strategies. | 09/11/2024
Problem Statement | Do U.S. companies today face the risk of bankruptcy due to poor management of financial indicators, or are they also influenced by other factors that hinder their ability to continue operations? | 10/11/2024
Research Objective Definition | A comprehensive analysis to identify the underlying causes of the bankruptcy risk of companies. From there, propose solutions for effective investment strategies. | 14/11/2024
Finding data | Target: U.S. companies | 14/11/2024
Data Preprocessing | Conduct the process of data collection and preprocessing (if the dataset has errors or is incomplete), normalize or encode the data by category, and split it into training and testing datasets. | 15/11-18/11/2024
Data Preprocessing | Apply machine learning models on the Orange software to identify the factors influencing a company's bankruptcy risk. Evaluate based on clustering and classification models with the highest accuracy and suitability for the dataset. Then, use the most suitable model to forecast on the training dataset and finalize the research report. | 19/11-25/11/2024
Hypothesis Conclusion | Present findings on the factors affecting a company's bankruptcy risk. Do all the financial indicators in the dataset affect the bankruptcy risk of U.S. companies, and what investment strategies would be proposed? | 28/11/2024
Project Conclusion | Summarize the entire research process, providing insights into the ability to assess bankruptcy risk in U.S. companies based on financial indicators, and propose more effective investment strategies for the future. | 28/11/2024

CHAPTER I: INTRODUCTION
1. Context:
In recent years, the U.S. economy has faced significant volatility, creating a
complex environment for businesses. First and foremost, uneven economic growth
across sectors has become a major challenge. While GDP has continued to grow,
instability in sectors such as technology and finance has become more pronounced,
especially as companies in these sectors face increasing pressure from rising financial
costs. This is largely due to the Federal Reserve's monetary tightening policy, which
includes raising interest rates to curb inflation. Recently, the Fed raised the key interest
rate by 0.25 percentage points, marking the first rate hike in more than three years and
signaling a "hawkish" stance in monetary policy. This has raised concerns in the bond
market about a potential recession, led to higher borrowing costs, and significantly
impacted the liquidity of many businesses. Furthermore, an inverted yield curve is
often seen as a signal that investors are more concerned about the near future than the
long-term, causing short-term bond yields to rise higher than long-term bond yields.
In addition, the corporate debt crisis has become a serious threat. Small and
medium-sized enterprises (SMEs), which form the backbone of the U.S. economy, are
under increasing pressure from debt repayment in the context of tight cash flows. Data
show a rise in bankruptcy rates, particularly among companies with weak financial
structures, especially those reliant on short-term debt.
Moreover, technological transformation and global competition are crucial
factors affecting the survival of businesses. In the digital age, delayed adoption of new
technologies can cause companies to lose their competitive advantage, especially in
sectors such as e-commerce, manufacturing, and services. Companies that fail to keep
up with technological advancements often struggle to maintain market share, leading
to prolonged financial downturns.
As such, this volatile market environment highlights the urgent need for studies
that predict bankruptcy risks, particularly in a powerhouse like the United States. The
application of multivariate analysis models not only helps identify risk factors early
but also provides a scientific basis for businesses and policymakers to implement
timely solutions. Therefore, this research is not only academically valuable but also
has significant practical implications for risk reduction and supporting the sustainable
development of U.S. businesses.


Figure 1.1.1 Forecast of bankruptcy rates in 2021 compared to 2019 across countries
worldwide
Sources: National Statistics, Solunion, Euler Hermes, Allianz Research
2. Relevance of the topic:
Although the economic situation and the factors causing financial volatility have
been discussed in the context section, issues related to predicting and managing
bankruptcy risk for businesses remain a significant challenge, especially in the U.S.
market, where competition and changes in the business environment occur rapidly.
Traditional analytical methods have not provided a comprehensive view of the risk
factors affecting a company's survival. Therefore, the application of multivariate
analysis models, which allow for the simultaneous processing and evaluation of
multiple key financial variables, has become a potential solution for accurately
predicting bankruptcy risk. This not only helps investors and business managers
identify risk signals in a timely manner but also enables them to develop appropriate
business strategies. Additionally, this study clarifies the relationship between financial
factors and bankruptcy risk, opening opportunities for applying data science in
business management. By applying modern analytical methods, investors and
managers can make more accurate investment decisions, thereby improving business
performance and promoting sustainable development.
3. Research questions:
To achieve the research goal of predicting bankruptcy risk for U.S. companies
and proposing effective investment strategies, this study will focus on three main
questions.
Question 1: Which financial factors can accurately predict the bankruptcy risk of U.S.
companies, and how do they influence investment decisions?


The purpose of this question is to identify the most important financial factors
that can accurately predict the bankruptcy risk of U.S. companies. Answering this
question will clarify the relationship between financial indicators and investment
decisions, providing investors with the necessary information to make informed
decisions, reduce risk, and maximize profits.
Question 2: Which financial indicators are the most important in predicting the
bankruptcy risk of U.S. companies, and how does their importance vary by industry?
This question seeks to explore the financial indicators that play a decisive role in
predicting bankruptcy risk, while also analyzing how their significance changes across
different industries. The aim is to highlight the differences between industries and how
financial indicators can be effective in various industry contexts, helping investors
gain a detailed understanding of which factors should be prioritized when analyzing
companies in different sectors.
Question 3: How does forecasting whether a company will go bankrupt affect the
U.S. market economy?
This analysis will clarify the impact of forecasting bankruptcy risk on the U.S.
market economy, particularly the effect these forecasts have on investment decisions
and the stability of financial elements. The goal is to show that accurate forecasting
can enhance market stability, reduce financial risks, and facilitate better investment
decision-making, thus ensuring the sustainable development of the economy.

CHAPTER II: LITERATURE REVIEW


1. Related studies:
Research on predicting corporate bankruptcy has a long developmental history,
beginning with simple models based on traditional financial indicators and gradually
becoming more complex by incorporating various factors. Since the 1960s, many
researchers have put considerable effort into testing bankruptcy prediction across
different countries. The most fundamental work in the bankruptcy prediction field is
Beaver's empirical study (1966). He analyzed thirty financial ratios among failed and
surviving firms. Using univariate analysis, he found three financial ratios
(total debt/total assets, net income/total assets, and cash flow/total debt) to be
significant in determining the financial distress of a company.
Building on this foundation, Altman (1968) extended Beaver's work by
employing multivariate discriminant analysis on twenty-two financial variables with a
sample of 66 manufacturing companies (33 bankrupt and 33 non-bankrupt). The
discriminant analysis selected 5 variables and suggested two cut-off points: firms with
a Z-score greater than 2.99 fall into the "non-bankrupt" category, while firms with a
Z-score below 1.81 are all bankrupt [3].
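To illustrate how the two Z-score cut-offs are applied, the following Python sketch computes a Z-score and assigns a category. The five ratios and their weights follow the commonly cited form of Altman's (1968) model and are not listed in this report, so they should be read as an assumption. The function names, the "grey zone" label for scores between the two cut-offs, and the sample figures are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Z-score in the commonly cited Altman (1968) form.

    The weights below are an assumption taken from the widely quoted
    version of the model; they are not stated in this report.
    """
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    """Apply the cut-offs quoted in the text (2.99 and 1.81)."""
    if z > 2.99:
        return "non-bankrupt"
    if z < 1.81:
        return "bankrupt"
    return "grey zone"  # hypothetical label for scores between the cut-offs


# Hypothetical firm, all amounts in the same currency unit.
z = altman_z(working_capital=20, retained_earnings=30, ebit=15,
             market_value_equity=80, sales=120,
             total_assets=100, total_liabilities=50)
zone = z_zone(z)
```

Keeping the score and the zone rule in separate functions makes it easy to try other cut-offs without touching the ratio calculation.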


Following previous studies, Ohlson (1980) introduced logit models to predict
bankruptcy. The author developed the O-score using 9 accounting variables
representing 4 factors (current liquidity, company size, performance, and capital
structure) with a sample of 2,163 companies (105 bankrupt and 2,058 non-bankrupt)
over the 1970-1976 period. The model establishes a threshold of O > 0.038 to
distinguish between bankrupt and non-bankrupt enterprises, indicating an
improvement in the application of financial factors to forecast risk. The O-score helps
expand the scope of financial analysis, but like the Z-score it is limited by
not incorporating non-financial factors.
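To make the logit idea concrete, the sketch below converts a linear score into a bankruptcy probability with the logistic function and compares it with a cut-off. It assumes that the 0.038 cut-off is applied to the predicted probability. The coefficient values, feature names, and helper names are hypothetical placeholders chosen for illustration; they are not Ohlson's estimated coefficients.

```python
import math


def logit_probability(features, coefficients, intercept):
    """Logistic transform of a linear score: p = 1 / (1 + exp(-score))."""
    score = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def flag_bankrupt(probability, cutoff=0.038):
    """Flag a firm when its predicted probability exceeds the cut-off."""
    return probability > cutoff


# Hypothetical coefficients and firm ratios, for illustration only.
coefficients = {"total_liabilities_to_assets": 6.0,
                "net_income_to_assets": -2.4}
firm = {"total_liabilities_to_assets": 0.5,
        "net_income_to_assets": 0.05}

p = logit_probability(firm, coefficients, intercept=-6.0)
flagged = flag_bankrupt(p)
```

A low cut-off such as 0.038 flags many firms, which trades more false alarms for fewer missed bankruptcies.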
Meanwhile, Ming Xu and Chu Zhang (2008) offer a different perspective by
emphasizing the role of non-financial factors such as business history, development
orientation and macroeconomic fluctuations. They point out that financial indicators in
the two years prior to bankruptcy often do not differ significantly between bankrupt
and non-bankrupt enterprises, suggesting that non-financial factors play a greater role
in certain contexts.
In addition, Grice and Ingram (2001) re-examined the Z-score
model, showing that its predictive effectiveness varied depending on the context and
time period. This reflects that the application of financial models depends not only on
the methodology but also on the practical conditions at the time.
2. Limitations of previous studies and directions for innovation:
The studies above demonstrate that models such as Z-score and O-score provide
a solid foundation for predicting corporate bankruptcy. However, they still have
certain limitations. The advancement of data science can help build more complex
models, with the ability to analyze large datasets and uncover hidden relationships
between factors. Algorithms and models such as Logistic Regression, Decision Trees,
or SVM can aid in analyzing historical data and optimizing bankruptcy predictions.
Additionally, techniques like clustering can identify similar groups of companies,
which helps in analyzing potential patterns within the data. This opens up many new
research avenues, which could enhance the applicability and accuracy of bankruptcy
prediction models in more complex contexts.
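As a rough illustration of how clustering can group similar companies, the following pure-Python sketch implements a basic k-means loop. It is a simplified stand-in, not the Orange workflow used in this project: the starting centroids are simply the first k points, and the two features per firm (net income to assets, debt to assets) and their values are hypothetical.

```python
def kmeans(points, k, max_iterations=100):
    """Basic k-means on a list of equal-length numeric tuples.

    Simplification: the first k points are used as starting centroids,
    so the result is deterministic but depends on the input order.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c))
                         for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(column) / len(members)
                                for column in zip(*members)]
    return labels, centroids


# Hypothetical firms: (net income / total assets, total debt / total assets).
firms = [(0.10, 0.30), (-0.20, 0.90), (0.12, 0.35),
         (-0.15, 0.85), (0.08, 0.25), (-0.25, 0.95)]
labels, centroids = kmeans(firms, k=2)
```

In this toy data the profitable, low-debt firms end up in one cluster and the loss-making, high-debt firms in the other, which is the kind of pattern the clustering step is meant to surface.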

CHAPTER III: RESEARCH METHODOLOGY


1. Data:
1.1 Data source:
The dataset is called the US Company Bankruptcy Prediction Dataset and was obtained
from the Kaggle platform, with specific origin from Link
This is a completely new dataset that introduces the prediction of bankruptcy risk
for publicly listed U.S. companies on the New York Stock Exchange (NYSE) and
NASDAQ. The dataset aggregates accounting information for 8,262 different
companies, collected from 1999 to 2018.
According to the regulations of the U.S. Securities and Exchange Commission
(SEC), a company in the U.S. is considered bankrupt under two circumstances. The
first occurs when the company's management files for bankruptcy under Chapter 11 of
the Bankruptcy Code, indicating an intention to "restructure" the company. In this
case, the company's management retains the authority to run the daily operations of the
company, although major decisions must be approved by the bankruptcy court. The
second situation arises when management files for bankruptcy under Chapter 7 of the
Bankruptcy Code, signaling an intention to cease business operations entirely, leading
to the company's closure.
1.2 Data structure:
The dataset consists of 78,682 data rows, 20 features with 0% missing data, and 1
meta attribute, as follows:
Table 3.1.1 Description of variables
No.  Features      Description                                             Data type
1    company_name  Names of U.S. companies listed on the NYSE and NASDAQ   Text (meta)
2    status_label  Company operational status: (1) failed, (0) alive       Categorical
3    year          Year of company operation                               Numeric
4    X1            Current Assets                                          Numeric
5    X2            Cost of Goods Sold                                      Numeric
6    X3            Depreciation and Amortization                           Numeric
7    X4            EBITDA                                                  Numeric
8    X5            Inventory                                               Numeric
9    X6            Net Income                                              Numeric
10   X7            Total Receivables                                       Numeric
11   X8            Market Value                                            Numeric
12   X9            Net Sales                                               Numeric
13   X10           Total Assets                                            Numeric
14   X11           Total Long-term Debt                                    Numeric
15   X12           EBIT                                                    Numeric
16   X13           Gross Profit                                            Numeric
17   X14           Total Current Liabilities                               Numeric
18   X15           Retained Earnings                                       Numeric
19   X16           Total Revenue                                           Numeric
20   X17           Total Liabilities                                       Numeric
21   X18           Total Operating Expenses                                Numeric


Choosing "status_label" as the target variable for the dataset predicting the
bankruptcy risk of U.S. companies is a logical and critical decision in data analysis.
The "status_label" variable represents the operational status of a company, indicating
whether the company is facing bankruptcy risk or maintaining stable operations. This
not only enhances the accuracy of the predictive model but also provides valuable
information that helps managers, investors, and stakeholders make more effective
strategic decisions. As a result, they can minimize financial risks and optimize
business performance in a proactive and precise manner.
Furthermore, "status_label" is a clear classification variable, simplifying the
model training and evaluation process. Specifically, the target variable classifies
companies into two groups: (1) companies that have gone bankrupt and (0) companies
that remain operational, based on significant events such as filing for bankruptcy under
Chapter 11 or Chapter 7 of the Bankruptcy Code. Overall, selecting this variable as the
target will optimize resources and efforts in addressing the critical issue of detecting
and preventing the bankruptcy risk of a company. This not only protects the interests
of stakeholders but also supports the stability and sustainable development of the
economy.
2. Algorithms and Analytical Tools
To analyze the dataset predicting bankruptcy risk of U.S. companies with the
target variable "status_label" on Orange, the following algorithms and tools can be
used:
• Preprocess data: The data preprocessing step, which transforms the input data
into a form suitable for the subsequent steps. Its output is used as the input to
the later widgets, so preparing and cleaning the data here makes the following
steps easier to process.
• k-Means Clustering: An unsupervised clustering algorithm in which data is
divided into K groups based on the distance between data points and the
centroid of each cluster. Data points within the same cluster are similar to each
other, while different clusters are distinct. Each cluster is represented by a
central point called the centroid, and K is a predefined constant.
• Hierarchical Clustering: This method builds a hierarchical tree (a dendrogram,
that is, a tree diagram showing how data is grouped into clusters at different
levels) to describe how clusters form in a sequential manner.
• Silhouette Analysis for Clustering: A metric that assesses the quality of
clustering by indicating how well each data point fits its current cluster
compared to other clusters. It helps determine whether the clusters have been
clearly separated.
• Logistic Regression: A probabilistic model that predicts a discrete output
value from a set of input values (represented as a vector).
• Decision Tree: A tool for building predictive models, used to classify data and
generalize from given data in the field of data mining.
• SVM (Support Vector Machines): A supervised machine learning algorithm
widely used in classification and regression problems. The objective of SVM is
to represent data as vectors in a space, and then classify them into different
classes by constructing an optimal hyperplane in a multi-dimensional space that
separates the data classes.
• Test & Score: An analytical tool for testing and evaluating models on
datasets, helping compute and display the performance results of each model.
• Confusion Matrix: A crucial tool for classification models. It shows how the
predicted classes compare with the actual classes, which makes it possible to
assess the performance of algorithms, identify the errors made by the
classification model, and adjust decisions based on the evaluation results.
• ROC Analysis: A graphical tool widely used for evaluating the performance of
classification models. The curve is created by plotting the True Positive Rate
(TPR) against the False Positive Rate (FPR) at different thresholds.
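To make the Logistic Regression entry above more concrete, the following minimal Python sketch shows how a weighted sum of inputs can be turned into a probability and then into a discrete label. The weights, bias, input values, and the 0.5 threshold are illustrative assumptions; they are not values fitted to the bankruptcy dataset or taken from Orange.

```python
import math

def predict_probability(features, weights, bias):
    """Return P(class = 1) for one input vector using the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, bias, threshold=0.5):
    """Map the probability to a discrete label: 1 (failed) or 0 (alive)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical, already-normalized inputs and hand-picked weights.
example_features = [0.2, 0.7]
example_weights = [1.5, -2.0]
example_bias = 0.1

probability = predict_probability(example_features, example_weights, example_bias)
label = predict_label(example_features, example_weights, example_bias)
```

In a real model the weights and bias would be estimated from the training data rather than chosen by hand.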
3. Proposed Research Model
To model the prediction of a company’s operational status as either bankrupt or
still in normal operation, the research team proposes applying a binary classification
model combined with appropriate machine learning algorithms and analytical tools on
the Orange platform. The objective of this model is to use financial indicators and
operational characteristics of the company to predict the target variable "status_label,"
helping determine the company’s bankruptcy risk with the following detailed process:

• Data Collection and Preprocessing: The dataset is sourced from Kaggle and
undergoes necessary cleaning steps. First, missing values are handled by either
filling in missing information or removing incomplete samples, depending on
the impact of the missing data on the model’s performance. Next, the data is
normalized to bring financial features to a common value range, optimizing the
performance of the algorithms.
• Data Analysis and Visualization: Appropriate methods are used to identify
the most important features affecting the target variable "status_label," helping
reduce input dimensions and increase the model’s effectiveness. Additionally,
visualization tools such as scatter plots, box plots, and heatmaps in Orange are
applied to explore and gain a deeper understanding of the data’s characteristics.
• Machine Learning Models and Algorithms Used: Algorithms such as
Logistic Regression, Decision Tree, SVM, etc., are selected to optimize the
predictive capabilities of the binary classification model and evaluate the
performance of each model through tools like Test & Score, Confusion Matrix,
etc.
• Optimization of Model Selection: After evaluating the performance of the
proposed models, the team will select the optimal model for the prediction task.
This model will be capable of accurately classifying companies at risk of
bankruptcy, helping support investors and managers in making financial and
strategic decisions.
The proposed modeling process for the dataset helps build a system capable of
accurately and effectively predicting the bankruptcy risk of U.S. companies. From data
collection and preprocessing to the selection of the optimized model for analysis, each
stage significantly contributes to the final model’s performance. The optimized model
chosen not only provides accurate forecasting capabilities but also supports managers,
investors, and stakeholders in minimizing risks and maximizing financial benefits.
With this proposed process, we can promptly identify important financial indicators,
improving the competitiveness and sustainability of businesses.
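The normalization step in the process above can be sketched as follows. This is a minimal illustration that assumes min-max scaling to the range [0, 1]; the report does not state which normalization option was used in Orange, and the feature values below are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]

# Hypothetical values for two features on very different scales.
total_assets = [500.0, 1500.0, 2500.0]
net_income = [-20.0, 0.0, 60.0]

normalized_assets = min_max_normalize(total_assets)
normalized_income = min_max_normalize(net_income)
```

After scaling, both features lie in the same range, so distance-based methods such as k-Means or SVM are not dominated by the feature with the largest raw values.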

CHAPTER IV: EXPERIMENTAL PROCEDURES


1. Data Preprocessing
In the Orange software, select the File tool in the Data menu to import the
"Bankruptcy Prediction" dataset.

Figure 4.1.2 Dataset after being loaded into the File widget
To inspect the "Bankruptcy Prediction" dataset before preprocessing, connect the
File widget to a Data Table widget. The dataset can then be viewed by opening the
Data Table.

Figure 4.1.3 Data Observed in the Data Table

Comments: Based on the detailed information from the dataset, we can deduce the
following:

• Instances: The dataset contains 78,682 instances. This is a large dataset,
providing a comprehensive view of the variables being analyzed.
• No Missing Data: The dataset has no missing values, ensuring that all variables
are complete and consistent, making the analysis process more accurate and
reliable.
• Features: The dataset includes 19 features, such as company status, company
name, year of operation, and several indicators like current assets, cost of goods
sold, etc. This allows for multidimensional analysis and exploration of
relationships between these features.
• Numeric Outcome: The dataset contains numerical outcomes, which may be
related to important metrics such as revenue, profit, asset value, or other
continuous variables. This dataset may be suitable for regression models or
other predictive numerical analyses.
• Meta Attributes: The dataset contains 1 meta attribute (classification label),
which can aid in the analysis by focusing on specific groups, such as company
status ("alive").

Conclusion: This dataset appears well-organized and clean, making it a good
candidate for analysis to explore the relationships between financial indicators and
business performance.
Addressing the Large Sample Size: Due to the large size of the dataset (78,682
instances), clustering could not be performed on the full dataset. Therefore, the team
decided to randomly sample 10% of the original dataset for use in the project.
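The 10% random sample described above can be sketched in a few lines of Python. This is only an illustration: the function name, the fixed seed, and the choice to round the sample size up are assumptions (rounding up is used here because it gives 7,869 rows for 78,682 instances); Orange's own sampler may behave differently.

```python
import math
import random

def sample_row_indices(n_rows, percent, seed=0):
    """Pick a reproducible random subset of row indices without replacement."""
    sample_size = math.ceil(n_rows * percent / 100)
    rng = random.Random(seed)  # fixed seed so the same rows are drawn every run
    return sorted(rng.sample(range(n_rows), sample_size))

selected = sample_row_indices(n_rows=78_682, percent=10)
```

Fixing the seed matters in practice: without it, every run would cluster a different subset and the reported results could not be reproduced.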

Figure 4.1.4 Image of the American Bankruptcy Dataset Information (after randomly
selecting 7,869 instances)

Figure 4.1.5 Data Preprocessing model


2. Clustering Model
Clustering plays a crucial role in grouping companies with similar financial
characteristics and operational performance. Specifically, through clustering
algorithms such as k-Means or Hierarchical Clustering, companies can be classified
into groups based on factors such as debt-to-equity ratio, liquidity, operating cash
flow, and profitability indicators. The clustering results help identify groups of
companies with high, medium, or low bankruptcy risk, thereby assisting investors in
making appropriate investment decisions. Moreover, clustering also helps identify
potential groups for strategic investment, while suggesting improvement solutions for
companies at high risk, such as financial restructuring or cash flow optimization. This
method not only enhances predictive effectiveness but also optimizes the investment
capital allocation strategy.
2.1 Clustering with Hierarchical Clustering method
Steps:
Step 1: Import Data
• Use the File widget to import the Bankrupt dataset. Ensure that the data
is read in the correct format and check the input variables.

Figure 4.2.6 Input Data


Step 2: Calculate Distances
• Add the Distances widget to measure the distance between data points.
• In the Distances dialog box:
o Select Compare: Rows.
o In Distance Metrics, select the Euclidean method.

Figure 4.2.7 Distances Tool Interface


Step 3: Hierarchical Clustering
• Connect the Distances widget to Hierarchical Clustering.
• In Hierarchical Clustering:
o Select Linkage: Ward.
o Set Annotations to company_name to display the labels of the
data points.
o Adjust the Height Ratio to optimize the display of the
dendrogram.

Figure 4.2.8 Interface display in Hierarchical Clustering

Figure 4.2.9 Interface display in Hierarchical Clustering


Step 4: Assessing Clustering Metrics
• Connect a Silhouette Plot widget to Hierarchical Clustering to evaluate the
effectiveness of clustering.
• In the Silhouette Plot:
o Select Distance: Euclidean.
o Select Grouping: Cluster, to calculate the Silhouette score for
each cluster.
Step 5: Analyzing and Selecting the Optimal Number of Clusters

• Use the Silhouette Plot to observe the Silhouette scores for different
numbers of clusters:
o Perform clustering with 2 to 5 clusters and evaluate the number of
positive Silhouette scores in each case.
o Choose the number of clusters with the highest number of
positive Silhouette values, as well as the highest average
Silhouette score.
• Connect a Data Table widget to examine the details of the selected clusters.
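The selection criteria in Step 5 (number of positive Silhouette scores and average Silhouette score) can be illustrated with a small dependency-free Python sketch. It assumes the standard definition s = (b - a) / max(a, b) with Euclidean distance, where a is the mean distance to points in the same cluster and b is the mean distance to the nearest other cluster. The toy points and labels are hypothetical and unrelated to the dataset.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_scores(points, labels, distance=euclidean):
    """Return one Silhouette score per point: (b - a) / max(a, b)."""
    scores = []
    for i, point in enumerate(points):
        # Collect distances from this point to every other point, per cluster.
        by_cluster = {}
        for j, other in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(distance(point, other))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

# Toy data: two well-separated groups (hypothetical values).
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_labels = [0, 0, 0, 1, 1, 1]

toy_scores = silhouette_scores(toy_points, toy_labels)
positive_count = sum(score > 0 for score in toy_scores)
average_score = sum(toy_scores) / len(toy_scores)
```

Repeating this for each candidate number of clusters and comparing `positive_count` and `average_score` mirrors the comparison described in the step above. For large datasets a library implementation would be preferable, since this sketch compares every pair of points.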

Figure 4.2.10 Evaluation of the Hierarchical model through Silhouette plot

Figure 4.2.11 Evaluation of the Hierarchical model through Silhouette plot

Figure 4.2.12 Evaluation of the Hierarchical model through Silhouette plot


Result:
After applying Hierarchical Clustering, the team counted the data points with
positive Silhouette scores for each number of clusters: 7,775 for 2 clusters, 7,699 for
3 clusters, 7,829 for 4 clusters, and 7,387 for 5 clusters. Although 4 clusters produced
the highest number of positive scores, the distribution was highly imbalanced, with
significant disparities in cluster sizes, making segmentation inconsistent and less
meaningful. This uneven distribution also hindered the practical applicability of the
results. To ensure clear, consistent group separation, the team selected 2 clusters as
the optimal solution. This choice simplifies the analysis, enhances interpretability,
and facilitates the practical application of findings in investment strategies and
management decisions.

Figure 4.2.13 Clustering results using Hierarchical Clustering


2.2 Clustering with k-Means method:
Procedure:

Step 1: Connect Data to k-Means via Data Sampler


• Add the Data Sampler widget and connect it to the Bankrupt dataset:
o Use the Data Sampler to randomly select 60% of the original
dataset.
o Connect Data Sampler to the k-Means widget for clustering.
Step 2: Determine the Optimal Number of Clusters
• Analyze the results for up to 10 clusters. The team evaluated and compared
the Silhouette Scores for the different numbers of clusters and chose the
number with the highest Silhouette Score, which indicates clear and
consistent separation.

Figure 4.2.14 Data display at k-Means


Step 3: Evaluate Clustering with Silhouette Plot
• Connect a Silhouette Plot widget to k-Means to assess clustering
quality.
• In the Silhouette Plot dialog:
o Select Distance: Manhattan for distance calculation.
o Select Grouping: Cluster to visualize the Silhouette scores of each
cluster.

Figure 4.2.15 Evaluation of k-Means through Silhouette plot

Figure 4.2.16 Evaluation of k-Means through Silhouette plot


Step 4: Observe Detailed Results
• Connect a Data Table widget to the Silhouette Plot to review detailed
data of the selected clusters.
• Analyze the Silhouette Plot:
o Based on the results, dividing the data into 2 clusters was the
optimal solution, achieving an average Silhouette score of 0.947.
o Both clusters had predominantly positive Silhouette scores with
minimal negative values, demonstrating the effectiveness and
consistency of clustering using k-Means.
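For readers who want to see what the k-Means widget does conceptually, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes. The starting centroids are passed in explicitly as an assumption for reproducibility; this is not necessarily how Orange initializes them, and the toy points are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Toy data with two obvious groups (hypothetical values).
toy_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
assignments, centroids = k_means(toy_points, initial_centroids=[toy_points[0], toy_points[3]])
```

The resulting `assignments` can then be scored with a Silhouette measure to compare different values of K, as done in the steps above.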

Figure 4.2.17 Clustering results using k-Means


Result:
After applying both Hierarchical Clustering and k-Means clustering methods, the
team found that Hierarchical Clustering with 2 clusters provided the optimal results.
This was demonstrated by the high and stable Silhouette scores, with the majority of
data points showing positive scores, indicating clear and meaningful clustering. On the
other hand, although k-Means achieved reasonable Silhouette scores, it resulted in
uneven cluster distribution, which led to some clusters lacking clear separation,
making it difficult to define distinct groups. These findings highlight that Hierarchical
Clustering is more suitable for processing and analyzing the dataset, as it enables the
precise identification of groups of companies at risk of bankruptcy or with growth
potential. This allows the team to propose more targeted and effective investment
strategies tailored to each group’s specific characteristics, emphasizing the importance
of choosing the right clustering model aligned with the research objectives.
2.3 Visualization:
The clustering results using the Hierarchical Clustering method showed that
dividing the data into 2 clusters was the optimal choice for this problem. The team
visualized the data using tools such as Distributions and Box Plot to analyze key
attributes that significantly impact a company's ability to survive or go bankrupt,
helping to better identify the main influencing factors.

Figure 4.2.18 Visual illustration of dividing data into two clusters for comparison.

Compared to cluster C1, where the bankruptcy rate is 29.28%, cluster C2 has a
bankruptcy rate of 0%. To identify the factors causing this difference, the team
proceeded with further analysis and selected key indicators, including:
The company's operating period

Figure 4.2.19 Division into two clusters for comparison based on company operating
time index
• Cluster C1 consists of companies that operated from 2002 to 2011, while
Cluster C2 focuses on companies operating between 2009 and 2016. A notable
factor is the global financial crisis that occurred from 2007 to 2008, which
could have significantly impacted the companies in Cluster C1. Start-ups or
businesses that were active before the crisis may have struggled to adapt to the
dramatic changes in the financial environment. Specifically, these companies
might have relied on business models or funding sources that became obsolete
after the crisis, leading to difficulties and a higher bankruptcy rate.
• In contrast, companies in Cluster C2 emerged after the crisis and likely learned
from the mistakes of earlier companies. They may have adjusted their business
models and developed more cautious financial strategies, while also taking
advantage of the economic recovery to reduce bankruptcy rates. These
companies might have better seized market opportunities once the economy
stabilized after 2008. Additionally, companies in Cluster C2 may have operated
in a less competitive environment, with many other companies failing during
the crisis, or they may have been equipped with more modern technologies and
business strategies compared to companies in Cluster C1.

Retained Earnings

Figure 4.2.20 Comparison of two clusters based on retained earnings


• The retained earnings distribution for Cluster C1 shows significant variation,
with most values clustered near zero or negative, and some instances reaching
positive values. This reflects an unstable financial performance, with many
companies facing accumulated losses or an inability to accumulate profits. The
wide distribution in retained earnings also indicates considerable differences
among companies in terms of crisis response, financial management strategies,
and operational scale.
• In contrast, the retained earnings distribution for Cluster C2 is entirely positive,
with values ranging from 150,000 to 200,000. This suggests that companies in
this cluster maintain financial stability and consistent operational performance,
with no cases of accumulated losses. Although the retained earnings are not
particularly high, this may reflect a sustainable development strategy, focusing
on investment and growth rather than passive capital accumulation. This
approach enables companies to optimize resources and reinvest for further
growth.

Net Income

Figure 4.2.21 Comparison of two clusters based on net income


• Cluster C1 includes companies with uneven net income distribution, showing a
significant gap between companies, often in the negative or near-zero range.
Some companies achieve exceptional profits, while many others struggle,
leading to a high bankruptcy rate. This reflects an unstable and challenging
business environment, despite the potential for significant profits.
• In contrast, Cluster C2 consists of companies with net income ranging from
20,000 to 50,000, demonstrating high stability and uniformity among
businesses. This suggests a sustainable business strategy. The majority of these
companies operated between 2009 and 2016, a period of more favorable
economic conditions, allowing them to sustain long-term viability without
needing to achieve excessively high profits. This reflects stability and efficiency
in their operations.
EBIT

Figure 4.2.22 Comparison of two clusters based on earnings before interest and tax

• Cluster C1 consists of companies operating from 2002 to 2011, with low
financial performance as reflected by EBIT values that are mainly negative or
very low. The bankruptcy rate of this cluster is as high as 29.28%, reflecting the
difficulties in maintaining business operations. These challenges may stem from
limitations in adapting to market changes, as well as inefficient management
and business strategies. Additionally, the financial performance disparities
between companies suggest variations in management practices and their ability
to seize business opportunities.
• Cluster C2 focuses on companies operating between 2009 and 2016, with very
high EBIT values, indicating superior financial performance and operational
stability. The 0% bankruptcy rate in this cluster shows that these businesses not
only operate efficiently but also manage risks effectively. This may be
attributed to strong strategic management capabilities and the ability to leverage
resources for sustainable growth. The stark contrast between the two clusters
highlights the importance of business management capabilities and adaptability
to market conditions in maintaining and enhancing operational performance.
Inventory

Figure 4.2.23 Comparison of two clusters based on inventory


• The inventory levels of companies in Cluster C1 are very high, reflecting the
characteristics of industries that require large amounts of raw materials or
finished goods to be stored. However, this high inventory also presents
challenges in managing storage costs and carries the risk of goods depreciating
if not handled promptly. This could impact the operational efficiency and cash
flow of the businesses in this cluster.
• In contrast, Cluster C2 shows lower inventory values, suggesting more efficient
supply chain management or a business focus on areas that are less reliant on
inventory, such as services or made-to-order manufacturing. This optimization
helps reduce storage costs and financial risks, while also indicating stability in
the operational strategies of the companies within this cluster.
EBITDA

Figure 4.2.24 Comparison of two clusters based on earnings before interest, tax,
depreciation, and amortization
• The EBITDA of companies in Cluster C1 exhibits significant dispersion,
with a substantial portion in the negative range. This indicates difficulties in
generating profits before interest, tax, depreciation, and amortization expenses.
Such challenges may result from inefficient business strategies, high operational
costs, or negative impacts of unfavorable macroeconomic conditions.
Companies in this cluster are likely facing major financial and operational
management challenges.
• The EBITDA in Cluster C2 is concentrated at high levels, with no negative
values observed. This reflects better management capabilities, enabling
companies in this cluster to maintain efficient operations and achieve stable
profitability. These businesses may have effectively seized opportunities during
the economic recovery period and adjusted their strategies to optimize profits
before financial expenses.

Total Receivables

Figure 4.2.25 Comparison of two clusters based on total receivables


• Companies in Cluster C1 primarily have total receivables ranging
from 0 to 10,000, with the highest frequency observed near 7,000. This
indicates that most businesses in this cluster exhibit low and relatively uniform
receivable values. This concentration at lower levels may reflect a
cautious credit management strategy or the smaller operational scale of the
companies within this group.
• In contrast, total receivables in Cluster C2 mainly range from 10,000 to
20,000, with a distribution peak around 15,000. The lower frequency compared
to Cluster C1 suggests that fewer companies belong to this cluster, but they tend
to handle larger receivable amounts. This could indicate characteristics of larger
companies with higher credit-granting capacities or better cash flow
management to handle significant receivable values.
2.4 Conclusion:
From the above analysis, it is evident that the businesses in this study are divided
into two clusters with distinctly different financial and operational characteristics.
• Cluster C1: Represents small and medium-sized enterprises (SMEs) with
limited financial resources. These companies primarily operated before and
during the 2007–2008 global financial crisis. Due to the challenging economic
conditions, they experienced negative retained earnings, restricting their ability
to reinvest and grow. Most accounts receivable for these businesses are at low
levels, indicating limited access to credit and constrained cash flow.
• Cluster C2: Comprises larger enterprises that operated during the post-crisis
economic recovery period (2009–2016). These companies exhibit better
financial management, positive retained earnings, and effective business
strategies. This enabled them to accumulate profits and sustainably expand their
operations. The larger accounts receivable values reflect their higher transaction
volumes and stronger credit capabilities compared to companies in Cluster C1.
This study highlights the differences between business groups in terms of size,
financial capacity, and operational strategies, while also underscoring the impact of
macroeconomic contexts on businesses during distinct historical periods. The findings
not only provide a deeper understanding of the characteristics of each cluster but also
serve as a basis for recommending appropriate support policies. These policies can
help SMEs overcome challenges, improve competitiveness, and achieve sustainable
growth in the long term.

Figure 4.2.26 Clustering Model


3. Classification models
3.1 Classification models
Steps to execute:
Step 1: Build the Model
• Use the File widget to import the Bankrupt dataset.
• Choose status_label as the Target variable.

Figure 4.3.27 Input data


• Connect the File widget to a Data Table widget to view the detailed
data. The dataset contains 7,869 instances and 19 features, and there is
no missing data. The status_label variable is the Target variable.

Figure 4.3.28 Data in the Data Table


Step 2: Data Splitting
• Use the Data Sampler widget to split the data. Connect the Data Table
widget to the Data Sampler widget.
• Double-click on the Data Sampler widget, select Fixed proportion of
data, and set it to 70% of the processed Bankrupt dataset. Then, click
Sample Data and close the dialog box.

Figure 4.3.29 Data Table interface for training data sampling


• Connect the Data Sampler widget to a Data Table widget to view
the data. Rename it Bankrupt 70% (this is the training data
containing 70% of the data). Then, connect the Data Table widget to
the Save Data widget. Choose a folder to save the file and name it
Bankrupt 70% in *.csv format.

Figure 4.3.30 Bankrupt dataset (70%)


• The dataset includes 5,509 records (samples) with 19 features, one binary
target variable (2 values), and one meta attribute used to provide
information about the company name. There are no missing values in the
entire dataset.

• Double-click on the Data Sampler widget, set the Fixed proportion to 30%,
then click Sample Data, and close the dialog.

Figure 4.3.31 Data Table interface for prediction data sampling


• Connect the Data Sampler widget to a Data Table to view the data,
rename it Bankrupt 30%, and then connect the Data Table to the Save
Data widget. Save the file in the directory and name it Bankrupt 30%
in *.csv format. Finally, remove the target variable "status_label"
from the dataset.

Figure 4.3.32 Bankrupt dataset (30%)

• The dataset used for forecasting consists of 2,361 records (samples),
with 20 features and a meta attribute variable providing company names.
The dataset does not contain a target variable and has no missing values.
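The splitting steps above can be sketched in Python as follows. This is a hedged illustration and not the Orange workflow itself: it assumes the aim is a training set and a prediction set that do not overlap, so it shuffles once and cuts the shuffled list in two. The function names, the seed, and the rounding rule are assumptions, and the resulting sizes can differ by a row from the counts reported above depending on how the tool rounds.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle once, then cut, so the two parts are disjoint and cover all rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def drop_target(record, target="status_label"):
    """Return a copy of one record without the target column."""
    return {name: value for name, value in record.items() if name != target}

row_ids = range(7_869)  # stand-in for the 7,869 sampled instances
train_ids, predict_ids = split_rows(row_ids)

# Hypothetical record, used only to show the target being removed.
unlabeled = drop_target({"company_name": "C_1", "status_label": "alive", "X1": 511.3})
```

Cutting one shuffled list guarantees that no company-year row appears in both files, which keeps the later evaluation on the 30% file independent of the training data.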

Figure 4.3.33 Data splitting model


3.2 Classification models analysis
Steps to execute:
Step 1: Import Data
• Use the File widget to import the "Bankrupt 70%" dataset. Ensure that
the data is correctly read in the expected format and check the input
variables.
• Select the status_label variable as the target variable.
• Verify that the dataset satisfies two conditions: it contains one target
variable and has no missing or erroneous data.
Step 2: Use Models
• Add the following three widgets: Decision Tree, Logistic Regression,
and SVM (Support Vector Machine).
• Add the Test and Score widget to compare and evaluate the algorithms.
This will help identify the best performing algorithm and the most
accurate forecasting method among the three.

Figure 4.3.34 Test and Score model for algorithm comparison


• Connect Bankrupt 70%, Logistic Regression, SVM, and Tree to Test
and Score.
• Double-click on Test and Score to view the evaluation results.

Figure 4.3.35 Test and Score results


3.3 Evaluation methods for classification models
3.3.1 Model evaluation results from Test and Score
In the "Evaluation results for target" section of Test and Score, the quantitative results
of the three models (Logistic Regression, Tree, and SVM) can be compared metric by
metric. Overall, the Test and Score results show that the Tree model is the most
effective. Specifically:
 Accuracy:
The Tree classifier has the highest accuracy: Tree (0.852) > Logistic Regression
(0.736) > SVM (0.410). According to the CA method, Tree is the most suitable
classifier.

 Area Under the Curve:
With 5-fold cross-validation, the AUC of each model is: Logistic Regression = 0.821,
Tree = 0.780, SVM = 0.362. By AUC, Logistic Regression is the most suitable
classifier. However, taking the other metrics into account, the Tree model is more
effective overall.
 Precision:
The Tree model has the highest precision: Tree (0.848) > SVM (0.770) > Logistic
Regression (0.733). By precision, Tree is the most suitable classifier.
 Recall:
The Tree model has the highest recall: Tree (0.852) > Logistic Regression (0.736) >
SVM (0.410). By recall, Tree is the most suitable classifier.
 F1-Score (harmonic mean of precision and recall):
The Tree model has the highest F1-Score: Tree (0.848) > Logistic Regression (0.658)
> SVM (0.361). By F1-Score, Tree is the most suitable classifier.
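
To make the four scores concrete, the sketch below computes them from a confusion
matrix. It assumes the scores are averaged over both classes, weighted by class
size; under that assumption weighted recall is mathematically equal to CA, which
would explain why the two columns coincide above. The counts and the function
name weighted_scores are hypothetical and are not taken from the report.

```python
def weighted_scores(cm, labels):
    """cm[actual][predicted] holds counts.

    Returns CA plus precision, recall and F1 averaged over classes,
    each class weighted by its share of the samples.
    """
    total = sum(cm[a][p] for a in labels for p in labels)
    accuracy = sum(cm[c][c] for c in labels) / total
    precision = recall = f1 = 0.0
    for c in labels:
        tp = cm[c][c]
        actual = sum(cm[c][p] for p in labels)       # row sum
        predicted = sum(cm[a][c] for a in labels)    # column sum
        prec_c = tp / predicted if predicted else 0.0
        rec_c = tp / actual if actual else 0.0
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = actual / total
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    return {"CA": accuracy, "Precision": precision, "Recall": recall, "F1": f1}

# Hypothetical counts for illustration only.
labels = ["alive", "failed"]
cm = {
    "alive": {"alive": 850, "failed": 50},
    "failed": {"alive": 60, "failed": 40},
}
scores = weighted_scores(cm, labels)
print({name: round(value, 3) for name, value in scores.items()})
```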
3.3.2 ROC Analysis Evaluation Method

Figure 4.3.36 ROC curve model at alive value of target variable


Figure 4.3.37 ROC curve model at "failed" value of target variable


The most effective model is the one with the lowest False Positive (FP) Rate and the
highest True Positive (TP) Rate, that is, the one whose ROC curve comes closest to
the point (0, 1) on the graph. From the graph, the ROC curve of the Logistic
Regression classifier comes closest to that point.
Therefore, according to ROC analysis, Logistic Regression is the most suitable
classification method. However, the team will examine the remaining methods before
selecting the most appropriate model.
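
As an illustration of how a ROC curve and its AUC are obtained, the sketch below
builds the curve by lowering the decision threshold and computes AUC as the share
of (positive, negative) pairs that the model ranks correctly. The scores and labels
are hypothetical; Orange's own implementation is not reproduced here.

```python
def roc_points(scores, labels, positive):
    """(FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == positive)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(scores, labels, positive):
    """Probability that a random positive gets a higher score than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of "failed" for six companies.
y = ["failed", "failed", "alive", "failed", "alive", "alive"]
p = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(roc_points(p, y, "failed"))
print(auc(p, y, "failed"))
```

An AUC of 0.5 corresponds to random ranking, so a value below 0.5, such as the
0.362 reported for SVM, means the model ranks the two classes worse than chance.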
3.3.3 Confusion matrix method

Figure 4.3.38 Confusion matrix results of Tree with sample count


Figure 4.3.39 Confusion matrix results of Tree with prediction ratio

Figure 4.3.40 Confusion matrix results of SVM with sample count

Figure 4.3.41 Confusion matrix results of SVM with prediction ratio


Figure 4.3.42 Confusion matrix results of Logistic Regression with sample count

Figure 4.3.43 Confusion matrix results of Logistic Regression with prediction ratio

Table 4.3.2 Comparison of results from the three models based on the confusion matrix

Model                  TP      TN      FP      FN      FP + FN
Tree                   88.2%   76.4%   23.6%   11.8%   35.4%
SVM                    94.8%   32.1%   67.9%   5.2%    73.1%
Logistic Regression    73.7%   72.3%   27.7%   26.3%   54.0%

Based on the table above:


 TP (True Positive) - Predicted to still exist and the prediction is correct: SVM
(94.8%) > Tree (88.2%) > Logistic Regression (73.7%)

 TN (True Negative) - Predicted bankrupt and actually bankrupt: Tree (76.4%)
> Logistic Regression (72.3%) > SVM (32.1%)
 FP (False Positive - Type 1 Error) - Predicted bankrupt but actually still
exists: SVM (67.9%) > Logistic Regression (27.7%) > Tree (23.6%)
 FN (False Negative - Type 2 Error) - Predicted to still exist but actually
bankrupt: Logistic Regression (26.3%) > Tree (11.8%) > SVM (5.2%)
From the above observations, we can draw the following conclusions:
Table 4.3.3 Prediction results of three tools using the confusion matrix method

Model                  TP       TN
Tree                   4,692    817
SVM                    2,259    3,250
Logistic Regression    4,055    1,454

Commentary:
Based on the statistics, the Tree model is the most suitable method.
 Looking at the confusion matrix comparison results of the three methods:
 FP: Tree < Logistic Regression < SVM (comparing the FP percentage rates)
 FN: SVM < Tree < Logistic Regression (comparing the FN percentage rates)
 Looking at the total errors FP+FN, Tree has the smallest error rate at 35.4%.
From a business perspective, False Negative (FN) errors, i.e., predicting that a
company will survive when it has actually gone bankrupt, will have more severe
consequences than False Positive (FP) errors, i.e., predicting that a company has gone
bankrupt when it is still operational.
The reason is that False Negative errors can lead to businesses or stakeholders
maintaining a false belief in the viability of a business entity that can no longer
operate, resulting in significant financial losses. For example, investors, partners, or
banks may continue to provide funding or sign contracts with a company that can no
longer meet its financial obligations, leading to a widespread risk.
On the other hand, False Positive errors typically only result in missed
opportunities for collaboration or investment. While this can cause some damage, the
losses are usually recoverable by pursuing other business opportunities. Therefore,
considering the overall impact of the damages businesses may face, it is more
important to avoid False Negative errors, as their consequences are often more severe
and harder to remedy than False Positive errors.
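Thus, based on the confusion matrix method, the Tree model is the most suitable
classification (forecasting) method, and the team will use it for forecasting.
Although SVM has the lowest FN rate (5.2%), its FP rate of 67.9% is too high to be
practical, whereas Tree combines the second-lowest FN rate (11.8%) with the lowest
FP rate (23.6%) and the lowest total error (35.4%).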
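
The trade-off between the two error types can be made explicit with a weighted
error score. The sketch below uses the rates from Table 4.3.2 with the report's
own labels (FN = predicted to survive but actually bankrupt). The weight of 2 for
FN errors and the function name weighted_error are assumptions for illustration
only; the report does not quantify the relative cost.

```python
# Error rates (%) copied from Table 4.3.2, using the report's labels:
# FN = predicted to survive but actually bankrupt,
# FP = predicted bankrupt but actually still operating.
rates = {
    "Tree": {"FP": 23.6, "FN": 11.8},
    "SVM": {"FP": 67.9, "FN": 5.2},
    "Logistic Regression": {"FP": 27.7, "FN": 26.3},
}

def weighted_error(model_rates, fn_weight=2.0, fp_weight=1.0):
    """Total error with FN counted fn_weight times as heavily as FP (assumed weights)."""
    return fn_weight * model_rates["FN"] + fp_weight * model_rates["FP"]

ranking = sorted(rates, key=lambda m: weighted_error(rates[m]))
for model in ranking:
    print(model, round(weighted_error(rates[model]), 1))
```

With these figures Tree stays in first place until an FN error is weighted roughly
6.7 times as heavily as an FP error, at which point SVM overtakes it.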

Figure 4.3.44 Model for evaluating classification methods


3.4 Data forecasting
Steps to execute:
To apply the Tree model to predict the bankruptcy status of companies in the
United States through the Predictions widget, follow these steps:
Step 1: Add the File widget and import the Bankrupt 30% data file for
prediction.
Step 2: Choose the Bankrupt 30% file and change the Status_label variable to a
feature variable.

Figure 4.3.45 File interface of prediction dataset


Step 3: Choose Tree as the prediction model. Add the Predictions widget and
connect the test file (Bankrupt 30%) to it. Connect the training data
(Bankrupt 70%) to the Tree widget, then connect the Tree widget to
Predictions to generate the predictions.


Figure 4.3.46 Predictions interface


Step 4: Add the Data Table widget (rename it to Data Dự Báo, i.e. "Forecast
Data") and connect it to Predictions to view the detailed data. Choose Save
Data to save the data as a CSV (*.csv) file named Bankrupt_Dự báo.

Figure 4.3.47 Predicted data in Data Table


Based on the analysis of the "30% Testing" dataset using the Tree model,
2,360 data samples were classified: 1,700 businesses were predicted to still be
operating (72.1%) and 660 to have gone bankrupt (27.9%). The relatively high
proportion of surviving businesses indicates that many businesses in the economy or
industry are able to maintain operations, reflecting the stability and resilience of
the majority of businesses in the sample.
However, the 660 businesses predicted to be bankrupt (27.9%) remain a significant
figure, highlighting that a considerable number of businesses face substantial
challenges that prevent them from continuing their operations. Factors such as market
fluctuations, ineffective management, or macroeconomic conditions could play critical
roles in causing these bankruptcies.
Overall, the high proportion of surviving businesses (72.1%) is a positive sign, but
the bankruptcy rate underscores the importance of providing support, consulting, or
risk management solutions for vulnerable businesses to minimize future bankruptcy
risks. This also opens up opportunities for economic policies or financial strategies to
promote the sustainable development of businesses.

Figure 4.3.48 Prediction model


3.5 Conclusion
Based on the results from the classification models, the team concluded that the
Tree model is the most effective and suitable classification model. With a dataset
of 2,361 records, 20 features, and 2 meta attributes, the Tree model was used to
predict bankruptcy risk. The results show that the probability of "alive" is higher
than "failed", accounting for 90%. This suggests that the Tree model can help
businesses assess their situation and take corrective actions to reduce the risk
of bankruptcy.
4. Model evaluation
The stability of the model can be assessed by observing how the performance
metrics vary across folds: a stable model produces consistent results across
multiple partitions. The dataset was therefore divided into 5 folds and into 20
folds to check whether the two splits yield similar performance metrics. The
Evaluation Results show that both scenarios give consistent and reliable outcomes.
The only noticeable difference lies in the Area Under the ROC Curve (AUC), and it
is negligible. Therefore, it can be concluded that the Decision Tree classification
method is stable and gives the best results.
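
The idea of comparing 5 and 20 folds can be sketched as follows. The code only
illustrates how the data are partitioned and how the per-fold spread of a metric is
measured; as a simplification, it scores a fixed set of hypothetical predictions in
each fold instead of refitting a model, so it is not the procedure Orange runs.

```python
import random
from statistics import mean, pstdev

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def fold_accuracies(y_true, y_pred, k):
    """Accuracy inside each test fold (fixed predictions stand in for refitting)."""
    accs = []
    for _, test_idx in k_fold_indices(len(y_true), k):
        hits = sum(1 for j in test_idx if y_true[j] == y_pred[j])
        accs.append(hits / len(test_idx))
    return accs

# Hypothetical labels and predictions (87 of 100 correct).
y_true = ["alive"] * 80 + ["failed"] * 20
y_pred = ["alive"] * 75 + ["failed"] * 5 + ["alive"] * 8 + ["failed"] * 12

for k in (5, 20):
    accs = fold_accuracies(y_true, y_pred, k)
    print(k, "folds: mean =", round(mean(accs), 3), "spread =", round(pstdev(accs), 3))
```

A stable model shows a similar mean for both values of k; the spread across folds
is usually larger with 20 folds simply because each fold contains fewer samples.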


Figure 4.4.49 Evaluation model divided into 5 parts

Figure 4.4.50 Evaluation model divided into 20 parts


Table 4.4.4 Comparison table of sample splitting ratios into 5 parts and 20 parts

Metric                          5 folds   20 folds
Classification Accuracy (CA)    0.863     0.863
F1-Score (Harmonic Mean)        0.861     0.861
Precision                       0.860     0.860
Recall                          0.863     0.863
Area Under the Curve (AUC)      0.801     0.802

From the results presented in the table above, the 20-fold split performs
marginally better with the Decision Tree model, as indicated by a slightly higher
AUC than the 5-fold split (0.802 > 0.801), while all other performance metrics are
identical.
Based on the Confusion Matrix, the Decision Tree model further proves to be the
optimal choice, reinforcing its stability and suitability. This result highlights its
effectiveness over Logistic Regression and SVM in classification tasks.
By evaluating the implemented models, we can conclude that the Decision Tree
method is the most appropriate approach for predicting the likelihood of bankruptcy or
the continued operation of U.S. companies.


CHAPTER V: RESULTS
1. Analysis and Answers to Research Questions
Question 1: Which financial factors can accurately predict the bankruptcy risk of U.S.
companies, and how do they influence investment decisions?

Table 5.1.5 Key financial metrics


Based on the forecast, a company's operational status (Status_label),
categorized as “Alive” or “Failed”, can be accurately predicted through the
influence of key variables. Each financial factor acts as a piece of the overall
picture of a company’s financial health and operational performance, helping to
distinguish between stable companies and those at high risk of bankruptcy.
Regarding corporate bankruptcy risks, numerous studies worldwide have
explored this issue. Notably, Edward I. Altman (1968) analyzed financial indicators
using multiple discriminant analysis to predict corporate bankruptcy, and James A.
Ohlson (1980) investigated financial ratios and their predictive power for corporate
failure. Their research presented quantitative results predicting corporate failure
as evidence of bankruptcy events [5]. The key finding can be summarized as follows:
four major groups of factors were statistically shown to affect the probability of
corporate failure (within one year): (1) company size, (2) financial structure,
(3) operational efficiency, and (4) liquidity.
Furthermore, in the context of bankruptcy research, Evridiki Neophytou, Andreas
Charitou, and Chris Charalambous (2000) developed a failure prediction model for
industrial companies in the United Kingdom using logistic regression analysis. This
predictive model was able to forecast failure up to three years before the event
occurred. The results revealed that a model incorporating three financial variables —
profitability, operating cash flow, and financial leverage — could accurately predict
corporate failure with an overall accuracy rate of 83% one year prior to the event.


Building on insights from experts and combining clustering, classification
models, and the ranking of feature importance relative to the target variable, the
research team identified the following standout variables for their influence:
X15 - Retained Earnings: Retained earnings represent the portion of net income
that a company reinvests or reserves to address financial fluctuations rather than
distributing as dividends to shareholders. According to the Altman Z-Score Model,
retained earnings are one of the key indicators for assessing a company's financial
sustainability. Companies with high retained earnings typically have greater
reinvestment capacity, improved productivity, and reduced dependence on debt.
Conversely, low or negative retained earnings signal bankruptcy risk, particularly if
the company lacks financial resources to meet debt obligations or unexpected
expenses. In the Rank widget, retained earnings hold the highest rank, with a
Gain Ratio of 0.107, reflecting their significant impact on the target variable.
X6 - Net Income: Net income, the profit remaining after deducting all expenses,
taxes, and interest, directly reflects the core operational efficiency of a business. It is a
critical indicator of financial performance. With a Gain Ratio of 0.077 and a Gini
Index of 0.088, net income is ranked as the second most important feature for
predicting bankruptcy risk. High net income demonstrates stable operations and short-
term financial sustainability. Conversely, low or negative net income suggests
potential cash flow issues, increasing financial risk.
X12 - EBIT (Earnings Before Interest and Taxes): EBIT, with a Gain Ratio of
0.068 and a Gini Index of 0.077, serves as a vital measure of profitability derived from
a company's core business operations. Unlike net income, EBIT is not influenced by
financial costs or taxes, providing a clearer view of operational efficiency. High EBIT
indicates strong and sustainable business foundations, while low EBIT signals weak
performance, potentially leading to decline or bankruptcy.
X5 - Inventory: With a Gain Ratio of 0.056 and a Gini Index of 0.059, inventory
is another crucial factor in predicting bankruptcy risk. Excessive inventory may
indicate inefficient product turnover, leading to capital stagnation and reduced cash
flow, a particularly dangerous scenario for resource-constrained companies. On the
other hand, insufficient inventory may disrupt production and supply processes.
Effective inventory management is essential for ensuring financial stability.
X4 - EBITDA (Earnings Before Interest, Taxes, Depreciation, and
Amortization): EBITDA, with a Gain Ratio of 0.048 and a Gini Index of 0.054,
provides a comprehensive view of a company's profitability from core business
activities before accounting for non-operational factors such as interest, taxes, or
depreciation. High EBITDA suggests stable cash flow and strong debt repayment
capacity, while low EBITDA indicates difficulties in maintaining profitability from
core operations, increasing the risk of insolvency or bankruptcy.
Based on the rankings, the most important features are X15, X6, X12, X5, and
X4. These key factors should be prioritized when building an effective predictive
model, while lower-ranked features can be considered as supplementary inputs. These
indicators are closely linked to a company's financial health: analyzing them
together helps evaluate current operational performance and provides a robust
foundation for predicting bankruptcy risk. It also helps investors make informed
decisions and enables businesses to identify and adjust their strategies in a timely
manner, thereby establishing a solid financial foundation and enhancing their
ability to withstand market fluctuations.
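
The Gain Ratio and Gini scores quoted above measure how well a feature separates
"alive" from "failed" companies. The sketch below computes both for a single
threshold split on one numeric feature. The values, labels, threshold, and the
function name split_scores are hypothetical, and the way Orange discretizes
continuous features before ranking them is not reproduced here.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(values, labels, threshold):
    """Gain ratio and Gini decrease for the split value <= threshold vs. value > threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    parts = [part for part in (left, right) if part]
    info_gain = entropy(labels) - sum(len(part) / n * entropy(part) for part in parts)
    split_info = -sum(len(part) / n * log2(len(part) / n) for part in parts)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_decrease = gini(labels) - sum(len(part) / n * gini(part) for part in parts)
    return gain_ratio, gini_decrease

# Hypothetical retained-earnings values and outcomes for eight companies.
values = [-50, -20, -5, 10, 40, 80, 120, 200]
labels = ["failed", "failed", "failed", "alive", "failed", "alive", "alive", "alive"]
gain_ratio, gini_decrease = split_scores(values, labels, threshold=0)
print(round(gain_ratio, 3), round(gini_decrease, 3))
```

Both scores are zero for a feature that carries no information about the target
and grow as the split produces purer groups.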
Question 2: Which financial indicators are the most important in predicting the
bankruptcy risk of U.S. companies, and how does their importance vary by industry?
The most important financial indicators in predicting bankruptcy risk for U.S.
companies include liquidity, debt management, profitability, and operational
efficiency, but their significance varies across industries. For example, in the financial
sector, liquidity is prioritized, as more than 80% of financial institutions that went
bankrupt during the 2008-2009 crisis had a Liquidity Coverage Ratio below the
regulatory threshold (Federal Reserve, 2023). In the manufacturing industry,
operational efficiency plays a critical role, as companies with an Asset Turnover Ratio
below 0.5 are more likely to go bankrupt (McKinsey, 2021). For the retail sector,
profitability and cash flow are key factors, with 60% of retail businesses that went
bankrupt during the COVID-19 pandemic having a net profit margin below 2%
(National Retail Federation, 2022). Meanwhile, the technology sector places greater
emphasis on revenue growth and cash flow management rather than debt ratios, as
more than 70% of technology startups fail due to cash flow problems (CB Insights,
2023). This suggests that bankruptcy prediction models need to be adaptable to the
characteristics of each industry to enhance accuracy and effectiveness.
Question 3: What effect does forecasting whether a company will go bankrupt have
on the U.S. market economy?
Bankruptcy prediction is a crucial tool for mitigating the negative impacts on
the U.S. market economy [7]. The application of predictive models (such as decision
trees) not only helps businesses manage financial risks but also enables investors and
the government to make timely decisions, reducing the effects of mass bankruptcies on
the economy. According to a study by S&P Global (2021), the U.S. corporate
bankruptcy rate reached its highest level in more than a decade in 2020, with
approximately 600 large companies declaring bankruptcy, resulting in losses of up to
$250 billion to the economy. If companies can accurately predict their financial
condition and identify early signs of bankruptcy, they can take timely corrective
actions, such as debt restructuring or business strategy adjustments.
Moreover, bankruptcy prediction helps investors make more informed decisions
when selecting stocks or other assets, thereby minimizing losses in the stock market. A
study by Moody’s Analytics (2022)[8] indicates that, during financial crises, companies
with a lower bankruptcy rate are more likely to be chosen by investors, contributing to
market stability. On the other hand, accurately predicting bankruptcy risk not only
reduces risk but also opens up important opportunities to improve the economy. The
government and financial institutions can leverage data from these predictions to
develop more strategic policies to support businesses, not only through relief measures
but also by fostering innovation and restructuring. For example, through financial aid
packages, businesses can invest in new technologies, improve management processes,
and enhance competitiveness, turning a crisis into a development opportunity.
After the 2008 financial crisis, for instance, the U.S. not only rescued large
financial institutions but also encouraged technology startups and increased investment
in innovative sectors like renewable energy. These policies not only helped the
economy recover but also created millions of new jobs, laying the foundation for more
sustainable growth in the future.
In the field of data science, bankruptcy prediction models based on financial
indicators can also improve decision-making across various sectors. These models can
be applied not only to individual companies but also to the entire market, helping
financial regulators predict systemic risks. Machine learning algorithms and artificial
intelligence (AI) can analyze massive amounts of data from financial reports,
providing more accurate predictions of future bankruptcy risk. When widely applied,
these predictions help companies better manage their financial resources, protect
investors, and contribute to maintaining the stability of the U.S. economy in an ever-
changing economic environment.
Thus, bankruptcy prediction not only provides direct benefits to businesses but
also plays a vital role in stabilizing the U.S. market economy, enabling investors,
governments, and financial regulators to act early to minimize systemic risks, protect
financial markets, and maintain sustainable growth. Rather than viewing bankruptcy
solely as a risk, managing and predicting this risk can be transformed into an
opportunity for reform, enhancing economic efficiency and reshaping business models
in a more positive and sustainable direction.


CHAPTER VI: CONCLUSION


1. Research Results
This study not only focuses on identifying the financial factors that can
accurately predict bankruptcy risk for U.S. companies but also analyzes the impact of
these factors on investment decisions and the ability to attract capital. Key financial
indicators such as profitability ratios, debt-to-equity ratio, debt repayment capacity,
and free cash flow have been proven to be decisive factors in assessing a company's
bankruptcy risk. By using the Orange software and data from 1999 to 2018, the
financial forecasting model developed helps identify potential risks and supports
investors in making informed decisions, minimizing the risk of losses, and optimizing
investment strategies.
Economically, this research offers significant value in improving the decision-
making efficiency of investors and financial institutions. Accurate bankruptcy
prediction helps investors reduce financial risks and identify high-return investment
opportunities, thereby enhancing capital efficiency. Additionally, the study contributes
to improving a company's ability to attract investment, as a transparent and accurate
financial model increases investor confidence. Companies that can maintain financial
stability and overcome bankruptcy risks are more likely to attract investment,
especially during growth or crisis periods.
Bankruptcy prediction is a powerful early-warning tool and plays a crucial role in
various aspects of financial and business management. It is vital for safeguarding
investments, maintaining financial stability, making informed credit decisions, and
contributing to the overall health of the economy. Investors and businesses can use the
findings from this study to adjust their investment strategies and business development
plans, contributing to sustainable economic growth. Furthermore, implementing
financial forecasting models also enhances a company's competitiveness, creating
opportunities for different industries to develop and driving comprehensive economic
transformation.
2. Limitations of the Study
The current business environment is continuously changing and unpredictable,
with unexpected events such as global pandemics or trade conflicts that can
significantly impact the accuracy of predictive models if not updated in a timely
manner. A common issue is overfitting, where the model is too closely fitted to the
training data, resulting in inaccurate predictions when applied to new or changing data.
Conversely, underfitting occurs when the model is too simple and fails to identify key
factors, reducing forecasting effectiveness. To address overfitting, applying
regularization methods such as L1, L2, or dropout is essential to prevent the model
from learning too much from the training data without generalizing to real-world
situations.


Additionally, in the context of a rapidly changing business environment,
real-time data analysis has become crucial for monitoring and responding promptly to
market fluctuations. However, a major challenge is data processing delays, which can
reduce the accuracy and timeliness of predictions. To overcome this, the adoption of
advanced technologies to reduce latency in data processing is critical.
Nevertheless, the study has successfully identified key financial factors that can
predict bankruptcy risk, thereby enhancing the ability to analyze and support decision-
making for investors. The use of Orange software has proven effective in processing
and analyzing large datasets, improving the accuracy of predictive models. However,
for the model to become more comprehensive and applicable, further research is
needed, particularly by incorporating non-financial factors such as governance quality
and social factors, to improve the accuracy and depth of predictions in an ever-
changing market context.
3. Future Research Directions
To improve the accuracy and comprehensiveness of the model, it is essential to
incorporate additional data sources beyond traditional financial metrics, such as text
from annual reports, financial news articles, social media, and geographical data. To
address issues such as overfitting and underfitting, cross-validation on different
subsets of data should be employed to ensure the model’s effectiveness on new,
unseen data. Regularization techniques such as L1, L2, or dropout can also be
used to prevent the model from overfitting to the training data. Combining multiple
models (ensembling) can further increase prediction accuracy and stability. Additionally,
establishing a periodic retraining mechanism will ensure the model remains updated
with new information and reflects changes in the business environment. Particularly,
Transfer Learning can save time and resources by leveraging pre-trained models.
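
As a minimal sketch of the regularization idea mentioned above, the code below
fits a logistic regression by gradient descent with an optional L2 penalty. It
covers L2 only (not L1 or dropout), and the data, learning rate, penalty strength,
and the function name fit_logistic are hypothetical choices for illustration, not
settings used in this study.

```python
from math import exp, sqrt

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=2000):
    """Batch gradient descent on log-loss + (l2 / 2) * ||w||^2 (bias not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 term adds l2 * w to the gradient, shrinking the weights.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def norm(w):
    return sqrt(sum(wj * wj for wj in w))

# Hypothetical standardized features; y = 1 means "failed".
X = [[-2.0, 0.5], [-1.0, -0.3], [-0.5, 0.2], [0.5, -0.1], [1.0, 0.4], [2.0, -0.2]]
y = [1, 1, 1, 0, 0, 0]

w_plain, _ = fit_logistic(X, y, l2=0.0)
w_reg, _ = fit_logistic(X, y, l2=1.0)
print(round(norm(w_plain), 3), round(norm(w_reg), 3))
```

The penalized fit ends with smaller weights, which is the mechanism by which L2
regularization limits overfitting. For a decision tree, the analogous control
would be limiting depth or pruning rather than a weight penalty.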
Future development will focus on optimizing the techniques used and expanding
the research’s application. First, while classification and clustering techniques have
been implemented in this study, improving and tailoring classification algorithms to
better align with specific industry sectors will enhance the prediction accuracy and
efficiency. Second, integrating real-time data into the study could offer timely
forecasts, allowing investors and businesses to react swiftly to market fluctuations.
Third, expanding the research into other sectors, such as technology, retail, or financial
services, will enable the development of sector-specific bankruptcy prediction models,
further improving the accuracy of bankruptcy forecasts. These advancements will not
only improve the model’s precision and scalability but also address the practical needs
of a dynamic business environment, significantly contributing to financial risk
management and optimizing future investment strategies.


REFERENCES
1. Ministry of Finance of Vietnam. (2022). Details of the article. Retrieved
November 26, 2024, from
https://mof.gov.vn/webcenter/portal/ttncdtbh/pages_r/l/chi-tiet-tin?dDocName=MOFUCM227388
2. Vietnam Academy of Social Sciences. (2017). Vai trò của doanh nghiệp
vừa và nhỏ ở Hoa Kỳ trong giai đoạn hiện nay [The role of small and
medium-sized enterprises in the United States in the current period].
Retrieved November 26, 2024, from
https://thuvienkhxh-vass.contentdm.oclc.org/digital/collection/p20065coll33/id/2709/
3. Hoàng, V. (2020). Vận dụng mô hình Z-score trong dự báo khả năng phá
sản doanh nghiệp tại Việt Nam [Applying the Z-score model to forecast
corporate bankruptcy in Vietnam]. Học viện Ngân hàng. Retrieved November
26, 2024, from
https://hvnh.edu.vn/medias/tapchi/vi/07.2020/system/archivedate/8e4f152e_B%C3%A0i%20c%E1%BB%A7a%20T%C3%A1c%20gi%E1%BA%A3%20Ho%C3%A0ng%20Th%E1%BB%8B%20H%E1%BB%93ng%20V%C3%A2n.pdf
4. Sở Giao dịch Chứng khoán Hà Nội. (n.d.). ZSCORE - Mô hình dự báo khả
năng phá sản doanh nghiệp [ZSCORE - A model for forecasting corporate
bankruptcy]. Retrieved November 28, 2024, from
https://www.shs.com.vn/Terms/ZSCORE.aspx
5. Nguyễn, T. T. L. (2019). Các nhân tố ảnh hưởng đến rủi ro phá sản của
các doanh nghiệp niêm yết ngành Xây dựng tại Việt Nam [Factors affecting
the bankruptcy risk of listed construction companies in Vietnam]. Học viện
Ngân hàng. Retrieved November 28, 2024, from
https://hvnh.edu.vn/medias/tapchi/vi/07.2019/system/archivedate/B%C3%A0i%20c%C3%A1%BB%A7a%20ThS.Nguy%E1%BB%85n%20Th%E1%BB%8B%20Tuy%E1%BA%BFt%20Lan.pdf
6. Trương, T. T. D., & Lê, H. T. (2023). Ứng dụng phương pháp học máy
trong dự báo rủi ro phá sản của các doanh nghiệp Việt Nam [Applying
machine learning methods to forecast the bankruptcy risk of Vietnamese
enterprises]. Da Nang University. Retrieved November 28, 2024, from
https://scholar.dlu.edu.vn/thuvienso/bitstream/DLU123456789/195255/1/CTv60S3102023044.pdf
7. Uyển, N. T., & Trang, P. T. Q. (2020). Vai trò quan trọng của quản trị tài
chính đối với doanh nghiệp trước những thách thức và rủi ro trong bối
cảnh hội nhập kinh tế quốc tế [The important role of financial management
for enterprises facing challenges and risks in the context of international
economic integration]. Retrieved November 28, 2024, from
https://tusach.fph.gov.vn/upload/data/news/13-05-24/30.-ky-yeu-hoi-thao-khoa-hoc-giai-phap-quan-tri-tai-chinh-va-dau-tu.pdf#page=51
8. Moody's. (2024). Moody's Outlooks 2025: Clarity from complexity.
Retrieved November 28, 2024, from https://www.moodys.com/
