Data Science Final Report
Data Science (University of Economics Ho Chi Minh City)
Term Paper
Subject DATA SCIENCE
TOPIC “PREDICTING THE RISK OF BANKRUPTCY IN U.S. COMPANIES AND
PROPOSING EFFECTIVE INVESTMENT STRATEGIES”
Supervising Professor Dr. Vo Van Hai
Class– Course 24C1INF50905917
Actual Submission Date December 8, 2024
SCORE: .....................
PROFESSOR’S FEEDBACK:
Final Semester
Academic Year 2024
ACKNOWLEDGMENTS
First of all, our team would like to extend our sincere gratitude to our course
instructor, Dr. Võ Văn Hải, for imparting invaluable knowledge and accompanying us
throughout the process of completing this final project. Through his insightful
guidance and shared experiences, we were able to gain significant knowledge that
proved highly beneficial, not only during the project but also in our personal growth.
The lessons he delivered in class, coupled with the diligent research and efforts of all
team members, enabled us to successfully complete this final project. Nevertheless, we
are aware of the limitations in our knowledge and experience, and we acknowledge
that some errors may have occurred during the implementation process. Thus, we
eagerly welcome any constructive feedback from Dr. Hải to help us improve and excel
in future projects.
Additionally, we would like to express our deepest appreciation to the University
of Economics Ho Chi Minh City for including Data Science in the curriculum and
providing us with the opportunity to study and research this field. This course has
allowed us to acquire fascinating knowledge on critical issues related to data
processing and analysis, applying data-driven insights to practical decision-making in
various economic and social domains, particularly in the field of investment, which is
our primary area of focus. While studying this subject, we realized its importance and
novelty to all UEH students, which posed certain challenges in mastering and applying
it to real-world situations. However, under the mentorship and valuable insights of Dr.
Võ Văn Hải, we have been able to learn, understand, and consolidate the essential
knowledge required to carry out this project.
Finally, we would like to extend our heartfelt thanks to the individuals,
organizations, and experts who provided valuable resources and information, which
greatly enriched the data needed for this project. These contributions have been pivotal
in improving and elevating the quality of our research.
We sincerely thank you all!
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I: INTRODUCTION
1. Context:
2. Relevance of the topic:
3. Research questions:
CHAPTER II: LITERATURE REVIEW
1. Related studies:
2. Limitations of previous studies and directions for innovation:
CHAPTER III: RESEARCH METHODOLOGY
1. Data:
1.1 Data source:
1.2 Data structure:
The dataset consists of 78,682 data rows, 20 features with 0% missing data, and 1 meta attribute, as follows:
2. Algorithms and Analytical Tools
3. Proposed Research Model
CHAPTER IV: EXPERIMENTAL PROCEDURES
1. Data Preprocessing
2. Clustering Model
2.1 Clustering with Hierarchical Clustering method
2.2 Clustering with k-Means method:
2.3 Visualization:
2.4 Conclusion:
3. Classification models
LIST OF ABBREVIATIONS
1. GDP: Gross Domestic Product
2. FED: Federal Reserve System
3. SMEs: Small and Medium Enterprises
4. NYSE: New York Stock Exchange
5. NASDAQ: National Association of Securities Dealers Automated Quotations
6. SEC: Securities and Exchange Commission
7. SVM: Support Vector Machine
8. TPR: True Positive Rate
9. FPR: False Positive Rate
10. EBIT: Earnings Before Interest and Taxes
11. EBITDA: Earnings Before Interest, Taxes, Depreciation, and Amortization
12. CA: Classification Accuracy
13. AUC: Area Under the Curve
14. ROC: Receiver Operating Characteristic
15. TP: True Positive
16. TN: True Negative
17. FP: False Positive
18. FN: False Negative
19. USD: United States Dollar
20. S&P GLOBAL: Standard & Poor's Global
LIST OF FIGURES
Figure 1.1.1 Forecast of bankruptcy rates in 2021 compared to 2019 across countries worldwide
Figure 4.1.1 Dataset after being loaded into the File widget
Figure 4.1.2 Data Observed in the Data Table
Figure 4.1.3 Image of the American Bankruptcy Dataset Information (after randomly selecting 7,869 instances)
Figure 4.1.4 Data Preprocessing model
Figure 4.2.1 Input Data
Figure 4.2.2 Distances Tool Interface
Figure 4.2.3 Interface display in Hierarchical Clustering
Figure 4.2.4 Interface display in Hierarchical Clustering
Figure 4.2.5 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.6 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.7 Evaluation of the Hierarchical model through Silhouette plot
Figure 4.2.8 Clustering results using Hierarchical Clustering
Figure 4.2.9 Data display at k-Means
Figure 4.2.10 Evaluation of k-Means through Silhouette plot
Figure 4.2.11 Evaluation of k-Means through Silhouette plot
Figure 4.2.12 Clustering results using k-Means
Figure 4.2.13 Visual illustration of dividing data into two clusters for comparison
Figure 4.2.14 Division into two clusters for comparison based on company operating time index
Figure 4.2.15 Comparison of two clusters based on retained earnings
Figure 4.2.16 Comparison of two clusters based on net income
Figure 4.2.17 Comparison of two clusters based on earnings before interest and tax
Figure 4.2.18 Comparison of two clusters based on inventory
Figure 4.2.19 Comparison of two clusters based on earnings before interest, tax, and depreciation
Figure 4.2.20 Comparison of two clusters based on total receivables
Figure 4.2.21 Clustering Model
Figure 4.3.1 Input data
Figure 4.3.2 Data in the Data Table
Figure 4.3.3 Data Table interface for training data sampling
Figure 4.3.4 Bankrupt dataset (70%)
Figure 4.3.5 Data Table interface for prediction data sampling
Figure 4.3.6 Bankrupt dataset (30%)
Figure 4.3.7 Data splitting model
Figure 4.3.8 Test and Score model for algorithm comparison
Figure 4.3.9 Test and Score results
Figure 4.3.10 ROC curve model at "alive" value of target variable
Figure 4.3.11 ROC curve model at "failed" value of target variable
Figure 4.3.12 Confusion matrix results of Tree with sample count
Figure 4.3.13 Confusion matrix results of Tree with prediction ratio
Figure 4.3.14 Confusion matrix results of SVM with sample count
LIST OF TABLES
CHAPTER I: INTRODUCTION
1. Context:
In recent years, the U.S. economy has faced significant volatility, creating a
complex environment for businesses. First and foremost, uneven economic growth
across sectors has become a major challenge. While GDP has continued to grow,
instability in sectors such as technology and finance has become more pronounced,
especially as companies in these sectors face increasing pressure from rising financial
costs. This is largely due to the Federal Reserve's monetary tightening policy, which
includes raising interest rates to curb inflation. Recently, the Fed raised the key interest
rate by 0.25 percentage points, marking the first rate hike in more than three years and
signaling a "hawkish" stance in monetary policy. This stance has raised concerns in the bond
market about a potential recession, led to higher borrowing costs, and significantly
impacted the liquidity of many businesses. Furthermore, an inverted yield curve, in which
short-term bond yields rise above long-term bond yields, is often seen as a signal that
investors are more concerned about the near future than the long term.
In addition, the corporate debt crisis has become a serious threat. Small and
medium-sized enterprises (SMEs), which form the backbone of the U.S. economy, are
under increasing pressure from debt repayment in the context of tight cash flows. Data
show a rise in bankruptcy rates, particularly among companies with weak financial
structures, especially those reliant on short-term debt.
Moreover, technological transformation and global competition are crucial
factors affecting the survival of businesses. In the digital age, delayed adoption of new
technologies can cause companies to lose their competitive advantage, especially in
sectors such as e-commerce, manufacturing, and services. Companies that fail to keep
up with technological advancements often struggle to maintain market share, leading
to prolonged financial downturns.
As such, this volatile market environment highlights the urgent need for studies
that predict bankruptcy risks, particularly in a powerhouse like the United States. The
application of multivariate analysis models not only helps identify risk factors early
but also provides a scientific basis for businesses and policymakers to implement
timely solutions. Therefore, this research is not only academically valuable but also
has significant practical implications for risk reduction and supporting the sustainable
development of U.S. businesses.
Figure 1.1.1 Forecast of bankruptcy rates in 2021 compared to 2019 across countries
worldwide
Sources: National Statistics, Solunion, Euler Hermes, Allianz Research
2. Relevance of the topic:
Although the economic situation and the factors causing financial volatility have
been discussed in the context section, issues related to predicting and managing
bankruptcy risk for businesses remain a significant challenge, especially in the U.S.
market, where competition and changes in the business environment occur rapidly.
Traditional analytical methods have not provided a comprehensive view of the risk
factors affecting a company's survival. Therefore, the application of multivariate
analysis models, which allow for the simultaneous processing and evaluation of
multiple key financial variables, has become a potential solution for accurately
predicting bankruptcy risk. This not only helps investors and business managers
identify risk signals in a timely manner but also enables them to develop appropriate
business strategies. Additionally, this study clarifies the relationship between financial
factors and bankruptcy risk, opening opportunities for applying data science in
business management. By applying modern analytical methods, investors and
managers can make more accurate investment decisions, thereby improving business
performance and promoting sustainable development.
3. Research questions:
To achieve the research goal of predicting bankruptcy risk for U.S. companies
and proposing effective investment strategies, this study will focus on three main
questions.
Question 1: Which financial factors can accurately predict the bankruptcy risk of U.S.
companies, and how do they influence investment decisions?
The purpose of this question is to identify the most important financial factors
that can accurately predict the bankruptcy risk of U.S. companies. Answering this
question will clarify the relationship between financial indicators and investment
decisions, providing investors with the necessary information to make informed
decisions, reduce risk, and maximize profits.
Question 2: Which financial indicators are the most important in predicting the
bankruptcy risk of U.S. companies, and how does their importance vary by industry?
This question seeks to explore the financial indicators that play a decisive role in
predicting bankruptcy risk, while also analyzing how their significance changes across
different industries. The aim is to highlight the differences between industries and how
financial indicators can be effective in various industry contexts, helping investors
gain a detailed understanding of which factors should be prioritized when analyzing
companies in different sectors.
Question 3: What effect does forecasting whether a company will go bankrupt have on
the U.S. market economy?
This analysis will clarify the impact of forecasting bankruptcy risk on the U.S.
market economy, particularly the effect these forecasts have on investment decisions
and the stability of financial elements. The goal is to show that accurate forecasting
can enhance market stability, reduce financial risks, and facilitate better investment
decision-making, thus ensuring the sustainable development of the economy.
7 X4 EBITDA Numeric
8 X5 Inventory Numeric
Choosing "status_label" as the target variable for the dataset predicting the
bankruptcy risk of U.S. companies is a logical and critical decision in data analysis.
The "status_label" variable represents the operational status of a company, indicating
whether the company is facing bankruptcy risk or maintaining stable operations. This
not only enhances the accuracy of the predictive model but also provides valuable
information that helps managers, investors, and stakeholders make more effective
strategic decisions. As a result, they can minimize financial risks and optimize
business performance in a proactive and precise manner.
Furthermore, "status_label" is a clear classification variable, simplifying the
model training and evaluation process. Specifically, the target variable classifies
companies into two groups: (1) companies that have gone bankrupt and (0) companies
that remain operational, based on significant events such as filing for bankruptcy under
Chapter 11 or Chapter 7 of the Bankruptcy Code. Overall, selecting this variable as the
target will optimize resources and efforts in addressing the critical issue of detecting
and preventing the bankruptcy risk of a company. This not only protects the interests
of stakeholders but also supports the stability and sustainable development of the
economy.
2. Algorithms and Analytical Tools
To analyze the dataset predicting bankruptcy risk of U.S. companies with the
target variable "status_label" on Orange, the following algorithms and tools can be
used:
Preprocess data: This is the data preprocessing step, which transforms the raw
input data into output data suitable for the subsequent steps. The results of
this process are used as inputs for the later widgets, such as the clustering and
classification models, so that the following steps are easier to process.
k-Means Clustering: This is an unsupervised clustering algorithm where data
is divided into K groups based on the distance between data points and the
centroid of each cluster. Data points within the same cluster are similar to each
other, while different clusters are distinct, with each cluster represented by a
central point called the centroid, and K is a predefined constant.
Hierarchical Clustering: This method builds a hierarchical tree (dendrogram –
a tree diagram showing the process of grouping data into clusters at different
levels) to describe how clusters form in a sequential manner.
Silhouette Analysis for Clustering: This is a metric that assesses the quality of
clustering, indicating the degree of fit of each data point to its current cluster
compared to other clusters. It helps determine whether the clusters have been
clearly separated.
Logistic Regression: This is a probability model that predicts discrete output
values from a set of input values (represented as a vector).
Decision Tree: A tool for building predictive models, used to classify data and
generalize given data in the field of data mining.
SVM (Support Vector Machines): A supervised machine learning algorithm
widely used in classification and regression problems. The objective of SVM is
to represent data as vectors in a space, and then classify them into different
classes by constructing an optimal hyperplane in a multi-dimensional space that
separates the data classes.
Test & Score: An analytical tool for testing and evaluating models on
datasets, helping compute and display the performance results of the model.
Confusion Matrix: A crucial tool for classification models. It assesses the
performance of algorithms, identifies the errors made by the classification
model, and supports adjusting decisions based on the evaluation results.
ROC Analysis: A graphical tool widely used for evaluating the performance of
classification models. The curve is created by plotting the True Positive Rate
(TPR) against the False Positive Rate (FPR) at different thresholds.
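To make the ROC description concrete, the following is a minimal sketch in plain Python, not the Orange implementation. It assumes the positive class is coded as 1 (for example "failed") and that each company has a predicted score for that class. The function names and the toy data are hypothetical and are not taken from the report's dataset.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold.

    y_true: 1 = positive class (assumed here to be "failed"), 0 = negative class.
    scores: predicted score for the positive class.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

Each threshold yields one point on the curve; the closer the AUC is to 1, the better the scores separate the two classes.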
3. Proposed Research Model
To model the prediction of a company’s operational status as either bankrupt or
still in normal operation, the research team proposes applying a binary classification
model combined with appropriate machine learning algorithms and analytical tools on
the Orange platform. The objective of this model is to use financial indicators and
operational characteristics of the company to predict the target variable "status_label,"
helping determine the company’s bankruptcy risk with the following detailed process:
Data Collection and Preprocessing: The dataset is sourced from Kaggle and
undergoes necessary cleaning steps. First, missing values are handled by either
filling in missing information or removing incomplete samples, depending on
the impact of the missing data on the model’s performance. Next, the data is
normalized to bring financial features to a common value range, optimizing the
performance of the algorithms.
Data Analysis and Visualization: Appropriate methods are used to identify
the most important features affecting the target variable "status_label," helping
reduce input dimensions and increase the model’s effectiveness. Additionally,
visualization tools such as scatter plots, box plots, and heatmaps in Orange are
applied to explore and gain a deeper understanding of the data’s characteristics.
Machine Learning Models and Algorithms Used: Algorithms such as
Logistic Regression, Decision Tree, SVM, etc., are selected to optimize the
predictive capabilities of the binary classification model and evaluate the
performance of each model through tools like Test & Score, Confusion Matrix,
etc.
Optimization of Model Selection: After evaluating the performance of the
proposed models, the team will select the optimal model for the prediction task.
This model will be capable of accurately classifying companies at risk of
bankruptcy, helping support investors and managers in making financial and
strategic decisions.
The proposed modeling process for the dataset helps build a system capable of
accurately and effectively predicting the bankruptcy risk of U.S. companies. From data
collection and preprocessing to the selection of the optimized model for analysis, each
stage significantly contributes to the final model’s performance. The optimized model
chosen not only provides accurate forecasting capabilities but also supports managers,
investors, and stakeholders in minimizing risks and maximizing financial benefits.
With this proposed process, we can promptly identify important financial indicators,
improving the competitiveness and sustainability of businesses.
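The cleaning and normalization steps of the process above can be sketched as follows. This is an illustration under stated assumptions, not the team's Orange workflow: missing values are represented as None and incomplete rows are simply dropped, and min-max scaling is assumed as the normalization method because the report does not name a specific one. The function names and toy rows are hypothetical.

```python
def drop_incomplete(rows):
    """Remove rows that contain a missing value (represented here as None)."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1] (assumed normalization method)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical rows: two financial features per company, one value missing.
rows = [[100.0, -5.0], [300.0, None], [200.0, 15.0], [500.0, 5.0]]
clean = drop_incomplete(rows)
print(min_max_normalize(clean))
```

Bringing features to a common range matters most for distance-based methods such as k-Means and SVM, where a feature measured in large monetary amounts would otherwise dominate the distance.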
Figure 4.1.2 Dataset after being loaded into the File widget
To inspect the "Bankruptcy Prediction" dataset before preprocessing, drag and
drop it into the File widget and select Data Table. The dataset can be viewed by
clicking on the Data Table.
Comments: Based on the detailed information from the dataset, we can deduce the
following:
No Missing Data: The dataset has no missing values, ensuring that all variables
are complete and consistent, making the analysis process more accurate and
reliable.
Features: The dataset includes 19 features, such as company status, company
name, year of operation, and several indicators like current assets, cost of goods
sold, etc. This allows for multidimensional analysis and exploration of
relationships between these features.
Numeric Outcome: The dataset contains numerical outcomes, which may be
related to important metrics such as revenue, profit, asset value, or other
continuous variables. This dataset may be suitable for regression models or
other predictive numerical analyses.
Meta Attributes: The dataset contains 1 meta attribute (classification label),
which can aid in the analysis by focusing on specific groups, such as company
status ("alive").
Figure 4.1.4 Image of the American Bankruptcy Dataset Information (after randomly
selecting 7,869 instances)
Use the Silhouette Plot to observe the Silhouette scores for different
clusters:
o Perform clustering with 2 to 5 clusters and evaluate the number of
positive Silhouette scores in each case.
o Choose the number of clusters with the highest number of
positive Silhouette values, as well as the highest average
Silhouette score.
Drag the Data Table widget to examine the details of the selected clusters.
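The selection rule described in the steps above (count the positive Silhouette values and compare the average score) can be sketched in plain Python. This is a simplified illustration, not the Orange Silhouette Plot widget: it assumes Euclidean distance, at least two clusters, and a value of 0 for single-member clusters. The function names and toy points are hypothetical.

```python
from math import dist


def silhouette_values(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def summarize(points, labels):
    """Return (number of positive silhouette values, average silhouette)."""
    values = silhouette_values(points, labels)
    positive = sum(1 for v in values if v > 0)
    return positive, sum(values) / len(values)


# Hypothetical toy points with two candidate clusterings to compare.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(summarize(points, [0, 0, 0, 1, 1, 1]))  # two clusters
print(summarize(points, [0, 0, 1, 2, 2, 2]))  # three clusters
```

On this toy data the two-cluster labeling has more positive values and a higher average than the three-cluster labeling, so it would be the one chosen by the rule above.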
Figure 4.2.18 Visual illustration of dividing data into two clusters for comparison.
Compared to cluster C1, where the bankruptcy rate is 29.28%, cluster C2 has a
bankruptcy rate of 0%. To identify the factors causing this difference, the team
proceeded with further analysis and selected key indicators, including:
The company's operating period
Figure 4.2.19 Division into two clusters for comparison based on company operating
time index
Cluster C1 consists of companies that operated from 2002 to 2011, while
Cluster C2 focuses on companies operating between 2009 and 2016. A notable
factor is the global financial crisis that occurred from 2007 to 2008, which
could have significantly impacted the companies in Cluster C1. Start-ups or
businesses that were active before the crisis may have struggled to adapt to the
dramatic changes in the financial environment. Specifically, these companies
might have relied on business models or funding sources that became obsolete
after the crisis, leading to difficulties and a higher bankruptcy rate.
In contrast, companies in Cluster C2 emerged after the crisis and likely learned
from the mistakes of earlier companies. They may have adjusted their business
models and developed more cautious financial strategies, while also taking
advantage of the economic recovery to reduce bankruptcy rates. These
companies might have better seized market opportunities once the economy
stabilized after 2008. Additionally, companies in Cluster C2 may have operated
in a less competitive environment, with many other companies failing during
the crisis, or they may have been equipped with more modern technologies and
business strategies compared to companies in Cluster C1.
Retained Earnings
Net Income
Figure 4.2.22 Comparison of two clusters based on earnings before interest and tax
helps reduce storage costs and financial risks, while also indicating stability in
the operational strategies of the companies within this cluster.
EBITDA
Figure 4.2.24 Comparison of two clusters based on earnings before interest, tax, and
depreciation
The EBITDA index of companies in Cluster C1 exhibits significant dispersion,
with a substantial portion in the negative range. This indicates difficulties in
generating profits before interest, tax, and depreciation expenses. Such
challenges may result from inefficient business strategies, high operational
costs, or negative impacts of unfavorable macroeconomic conditions.
Companies in this cluster are likely facing major financial and operational
management challenges.
The EBITDA index in Cluster C2 is concentrated at high levels, with no negative
values observed. This reflects better management capabilities, enabling
companies in this cluster to maintain efficient operations and achieve stable
profitability. These businesses may have effectively seized opportunities during
the economic recovery period and adjusted their strategies to optimize profits
before financial expenses.
Total Receivables
operations. The larger accounts receivable values reflect their higher transaction
volumes and stronger credit capabilities compared to companies in Cluster C1.
This study highlights the differences between business groups in terms of size,
financial capacity, and operational strategies, while also underscoring the impact of
macroeconomic contexts on businesses during distinct historical periods. The findings
not only provide a deeper understanding of the characteristics of each cluster but also
serve as a basis for recommending appropriate support policies. These policies can
help SMEs overcome challenges, improve competitiveness, and achieve sustainable
growth in the long term.
Double-click on the Data Sample widget, set the Fixed proportion to 30%,
then click Sample Data, and close the dialog.
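For readers working outside Orange, a fixed-proportion sample can be sketched as below. This is a plain random split under assumptions, not the Data Sample widget itself: it does not stratify by the target variable, and the function name, the seed, and the toy rows are hypothetical.

```python
import random


def split_sample(rows, proportion, seed=0):
    """Randomly split rows into a sample of the given proportion and the remainder."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example: ten row identifiers, 30% drawn as the sample.
rows = list(range(10))
sample, remainder = split_sample(rows, 0.30)
print(len(sample), len(remainder))
```

Fixing the seed makes the split reproducible, which helps when the same partition has to be reused to compare several models.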
Figure 4.3.42 Confusion matrix results of Logistic Regression with sample count
Figure 4.3.43 Confusion matrix results of Logistic Regression with prediction ratio
Table 4.3.2 Comparison of results from 3 tools based on the confusion matrix
TP TN FP FN FP + FN
TP TN
Commentary:
Based on the statistics, it is evident that the Tree model is the most suitable method.
Looking at the confusion matrix comparison results of the three methods:
FP: Tree < Logistic Regression < SVM (comparing the FP percentage rates)
FN: SVM < Tree < Logistic Regression (comparing the FN percentage rates)
Looking at the total errors FP+FN, Tree has the smallest error rate at 35.4%.
From a business perspective, False Negative (FN) errors, i.e., predicting that a
company will survive when it has actually gone bankrupt, will have more severe
consequences than False Positive (FP) errors, i.e., predicting that a company has gone
bankrupt when it is still operational.
The reason is that False Negative errors can lead to businesses or stakeholders
maintaining a false belief in the viability of a business entity that can no longer
operate, resulting in significant financial losses. For example, investors, partners, or
banks may continue to provide funding or sign contracts with a company that can no
longer meet its financial obligations, leading to a widespread risk.
On the other hand, False Positive errors typically only result in missed
opportunities for collaboration or investment. While this can cause some damage, the
losses are usually recoverable by pursuing other business opportunities. Therefore,
considering the overall impact of the damages businesses may face, it is more
important to avoid False Negative errors, as their consequences are often more severe
and harder to remedy than False Positive errors.
Thus, based on the confusion matrix method, the Tree model is the most suitable
classification (forecasting) method, and the team will use this method for forecasting.
From the results presented in the table above, cross-validation with 20 folds
gives the best performance for the Decision Tree model, as indicated by
a slightly higher AUC value than with 5 folds (0.802 > 0.801), although
the overall performance metrics differ only minimally.
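The two ingredients of this comparison, splitting the data into k folds and scoring with AUC, can be sketched as follows. This is a simplified illustration under stated assumptions: the folds are contiguous and unshuffled, the helper names `kfold_indices` and `auc` are hypothetical, and the labels and scores are toy values rather than the report's data.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Test-fold sizes for 5 and 20 folds on 100 hypothetical samples.
sizes_5 = [len(test) for _, test in kfold_indices(100, 5)]
sizes_20 = [len(test) for _, test in kfold_indices(100, 20)]
print(sizes_5, sizes_20)

# Toy labels (1 = bankrupt) and predicted scores, for illustration only.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

More folds mean each model is trained on a larger share of the data but tested on smaller folds, so small AUC differences such as 0.802 versus 0.801 should be read with caution.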
Based on the Confusion Matrix, the Decision Tree model further proves to be the
optimal choice, reinforcing its stability and suitability. This result highlights its
effectiveness over Logistic Regression and SVM in classification tasks.
By evaluating the implemented models, we can conclude that the Decision Tree
method is the most appropriate approach for predicting the likelihood of bankruptcy or
the continued operation of U.S. companies.
CHAPTER V: RESULTS
1. Analysis and Answers to Research Questions
Question 1: Which financial factors can accurately predict the bankruptcy risk of U.S.
companies, and how do they influence investment decisions?
$250 billion to the economy. If companies can accurately predict their financial
condition and identify early signs of bankruptcy, they can take timely corrective
actions, such as debt restructuring or business strategy adjustments.
Moreover, bankruptcy prediction helps investors make more informed decisions
when selecting stocks or other assets, thereby minimizing losses in the stock market. A
study by Moody’s Analytics (2022)[8] indicates that, during financial crises, companies
with a lower bankruptcy risk are more likely to be chosen by investors, contributing to
market stability. In addition, accurately predicting bankruptcy risk not only
reduces losses but also opens up important opportunities to improve the economy. The
government and financial institutions can leverage data from these predictions to
develop more strategic policies to support businesses, not only through relief measures
but also by fostering innovation and restructuring. For example, through financial aid
packages, businesses can invest in new technologies, improve management processes,
and enhance competitiveness, turning a crisis into a development opportunity.
After the 2008 financial crisis, for instance, the U.S. not only rescued large
financial institutions but also encouraged technology startups and increased investment
in innovative sectors like renewable energy. These policies not only helped the
economy recover but also created millions of new jobs, laying the foundation for more
sustainable growth in the future.
In the field of data science, bankruptcy prediction models based on financial
indicators can also improve decision-making across various sectors. These models can
be applied not only to individual companies but also to the entire market, helping
financial regulators predict systemic risks. Machine learning algorithms and artificial
intelligence (AI) can analyze massive amounts of data from financial reports,
providing more accurate predictions of future bankruptcy risk. When widely applied,
these predictions help companies better manage their financial resources, protect
investors, and contribute to maintaining the stability of the U.S. economy in an ever-
changing economic environment.
Thus, bankruptcy prediction not only provides direct benefits to businesses but
also plays a vital role in stabilizing the U.S. market economy, enabling investors,
governments, and financial regulators to act early to minimize systemic risks, protect
financial markets, and maintain sustainable growth. Rather than viewing bankruptcy
solely as a risk, managing and predicting this risk can be transformed into an
opportunity for reform, enhancing economic efficiency and reshaping business models
in a more positive and sustainable direction.
REFERENCES
1. Ministry of Finance of Vietnam. (2022). Details of the article. Retrieved
November 26, 2024, from
https://mof.gov.vn/webcenter/portal/ttncdtbh/pages_r/l/chi-tiet-tin?
dDocName=MOFUCM227388
2. Vietnam Academy of Social Sciences. (2017). Vai trò của doanh nghiệp
vừa và nhỏ ở Hoa Kỳ trong giai đoạn hiện nay. Retrieved November 26,
2024, from
https://thuvienkhxh-vass.contentdm.oclc.org/digital/collection/p20065coll
33/id/2709/
3. Hoàng, V. (2020). Vận dụng mô hình Z-score trong dự báo khả năng phá
sản doanh nghiệp tại Việt Nam. Học viện Ngân hàng. Retrieved November
26, 2024, from
https://hvnh.edu.vn/medias/tapchi/vi/07.2020/system/archivedate/8e4f152
e_B%C3%A0i%20c%E1%BB%A7a%20T%C3%A1c%20gi%E1%BA
%A3%20Ho%C3%A0ng%20Th%E1%BB%8B%20H%E1%BB%93ng
%20V%C3%A2n.pdf
4. Sở Giao dịch Chứng khoán Hà Nội. (n.d.). ZSCORE - Mô hình dự báo khả
năng phá sản doanh nghiệp. Retrieved November 28, 2024, from
https://www.shs.com.vn/Terms/ZSCORE.aspx
5. Nguyễn, T. T. L. (2019). Các nhân tố ảnh hưởng đến rủi ro phá sản của
các doanh nghiệp niêm yết ngành Xây dựng tại Việt Nam. Học viện Ngân
hàng. Retrieved November 28, 2024, from
https://hvnh.edu.vn/medias/tapchi/vi/07.2019/system/archivedate/B
%C3%A0i%20c%C3%A1%BB%A7a%20ThS.Nguy%E1%BB%85n
%20Th%E1%BB%8B%20Tuy%E1%BA%BFt%20Lan.pdf
6. Trương, T. T. D., & Lê, H. T. (2023). Ứng dụng phương pháp học máy
trong dự báo rủi ro phá sản của các doanh nghiệp Việt Nam. Da Nang
University. Retrieved November 28, 2024, from
https://scholar.dlu.edu.vn/thuvienso/bitstream/DLU123456789/195255/1/
CTv60S3102023044.pdf
7. Uyển, N. T., & Trang, P. T. Q. (2020). Vai trò quan trọng của quản trị tài
chính đối với doanh nghiệp trước những thách thức và rủi ro trong bối
cảnh hội nhập kinh tế quốc tế. Retrieved November 28, 2024, from
https://tusach.fph.gov.vn/upload/data/news/13-05-24/30.-ky-yeu-hoi-thao-
khoa-hoc-giai-phap-quan-tri-tai-chinh-va-dau-tu.pdf#page=51
8. Moody's. (2024). Moody's Outlooks 2025: Clarity from complexity.
Retrieved November 28, 2024, from https://www.moodys.com/