Decision Tree and Random Forest Based Novel Unsteady Aerodynamics Modeling Using Flight Data
Engineering Notes
Ajit Kumar∗ and Ajoy Kanti Ghosh†
Indian Institute of Technology of Kanpur, Kanpur 208 016, Uttar Pradesh, India
DOI: 10.2514/1.C035034
High angle of attack and nonlinear aerodynamics modeling have been studied for several decades, and numerous reports have been published in this study area. Goman and Khrabrov devised a state space approach to model the unsteady aerodynamics [6]. Nelson and Pelletier have extensively reviewed the experimental information on flow structure over delta wings and also on complete aircraft configurations [7]. Fischenberg and Jategaonkar have presented the study of unsteady aerodynamics using quasi-steady stall model on C-160 transport aircraft [8]. Ghoreyshi and Cummings came up with a novel approach based on the time-dependent surrogate unsteady aerodynamics model performed under different maneuvers [9]. Peyada and Ghosh developed a neural-network-based novel
Downloaded by Indian Institute of Science on December 8, 2023 | http://arc.aiaa.org | DOI: 10.2514/1.C035034
models. In these methods, the algorithm learns information directly from the data, and so they are also known as machine learning techniques; they use observed flight data to construct a decision tree and a set of decision trees [25–29] to predict the unsteady aerodynamic force and moment coefficients.
The efficacy of the proposed methods is examined on ATTAS aircraft data by estimating the lift force coefficient (CL), drag force coefficient (CD), and pitching moment coefficient (Cm) using the measured flight data. Openly accessible flight test data of the research aircraft DLR-ATTAS were used to estimate the aircraft longitudinal force and moment coefficients [1] and to demonstrate the proposed algorithms for unsteady aerodynamics modeling from flight data. The flight test data were segregated into an input dataset (angle of attack, elevator deflection, pitch rate, and relative airspeed) and an output dataset (measured force and moment coefficients). These methods were trained and validated with real flight data: 70% of the data were used in training, and the remaining 30% were used for validation of the model. A statistical modeling technique was used to quantify the uncertainty in the model. The estimated nonlinear aerodynamic models were compared with maximum likelihood estimation (MLE)-predicted models. MLE has been a popular classical method for several decades. Estimated results from both the methods were found to be in close agreement with each other. The CART- and RF-estimated models of CL and CD are fairly close to the MLE-estimated results. The CART- and RF-estimated models of Cm have better correlation than the MLE-based estimated Cm. Statistical analysis of the presented results indicates that these machine-learning-based methods can be a viable alternative for this problem. Between CART and RF, RF was found to be slightly better, as expected of an ensemble that averages many trees. Further, these methods do not require solving the equations of motion; this advantage further motivates promising directions for future research on nonlinear modeling and identification.
This paper has the following contributions:
1) Development of CART- and RF-based models for the lift force, drag force, and pitching moment coefficients
2) Statistical analysis of the estimated results and comparison with the conventional MLE method
The paper is structured as follows. Section II explains the CART and RF methodology and its implementation on the problem. Section III presents the estimation results and discussion using the CART, RF, and MLE methods in detail, followed by conclusions.
II. Methodology: Nonparametric Nonlinear Aerodynamic Modeling
CART- and RF-based machine learning methods are applied to model the quasi-steady stall phenomena of the ATTAS research aircraft. These methods use a tree-based nonparametric approach to modeling from flight data. Nonparametric models are structureless models that, unlike parametric models, do not involve parametric equations or any physical parameters. Parametric models are defined as structured equations with parameters. The parameters in such a model have physical significance and contribute to the analysis of aerodynamic characterization. Although parametric models are very useful and give enough physical insight, they fall short of completely capturing the dynamics of the phenomena. Any structured or parametrized model will have some amount of inaccuracy depending on the severity and nonlinearity of the dynamics, which reduces the efficiency in modeling. On the other hand, CART and RF are data-driven modeling approaches that capture the complete dynamics by learning from the gathered flight data. These methods lack physical understanding but increase the modeling accuracy, which is helpful in designing control and guidance laws and fault detection algorithms. The next section describes how these methods are applied to estimate the longitudinal force and moment models under the quasi-stall condition using flight data gathered on the ATTAS aircraft.
A. Classification and Regression Tree
CART is a classification and regression method developed by Breiman et al. [26] in the early 1980s, which constructs the decision tree using observed data. Decision trees are usually represented by a set of questions that split the learning sample into smaller parts. The aim of this method is not only to find models that produce accurate predictions but also to extract knowledge intelligently. Tree-based methods stand as one of the most effective and useful methods capable of producing both reliable and understandable results. The following features make this approach quite attractive in practice:
1) Nonparametric and can model any complex relations between inputs and outputs, without any a priori assumption
2) Handles ordered or categorical variables or a mix of both
3) Robust to noise, outliers, or errors in labels
4) Easy to interpret, understand, and visualize
In CART, the regression tree is used when output variables are continuous, and the classification tree is used when output variables are categorical. In the case of a regression tree, the value assigned to each terminal node during training is the average response of the observations falling in that region, and splitting continues until the stopping criterion is met. The splitting process therefore results in fully grown trees. But fully grown trees are likely to overfit the data, leading to poor accuracy, and so pruning is done to avoid it. A simple example of regression-tree-based modeling is explained in [26] using the classical Boston housing dataset. Here the CART algorithm is demonstrated on quasi-stall modeling of the DLR-ATTAS aircraft using flight test data.
The CART algorithm can be understood by the following steps:
1) Label the data: Identify the target variable and input variables. In the current application, target variables are the coefficient of drag force (CD), coefficient of lift force (CL), and coefficient of the pitching moment (Cm), and input variables are angle of attack (α), elevator deflection (δe), pitch rate (q), and relative airspeed (V).
2) Best split: Identify the best split for each of the input variables. The variable value that produces the greatest separation in the target variable, called the split, is selected. Separation in regression trees is defined using the sum of the squared errors. Mean squared error (MSE) is used as the best split criterion in this application.
3) Repeat the process until the stopping criterion is met. The error tolerance is set as 10^-6.
4) Pruning of the tree
MSE has been used as the pruning criterion to avoid the overfitting issue. However, the CART algorithm is prone to disadvantages such as an unstable tree and splitting by only one variable. These issues are generally handled by combining many decision trees. RF is one such attempt toward this goal and is described in the next section.
B. Random Forest
The RF proposed by Breiman in 2001 is a statistical estimation-based nonparametric tool consisting of many decision trees. As the name suggests, it creates a forest by generating many decision trees. In general, the more trees there are in the forest, the more robust the prediction and thus the higher the accuracy. RF generates a forest of CART trees by sampling the data and variables randomly and iteratively. The RF prediction is the average of the predictions made by each tree in the ensemble for the given input data. Performance is checked by the following two measures [30–35]:
1) A measure of accuracy, by mean squared error
2) A measure of node impurity, by Gini index
The Gini index, which measures node impurity, is obtained by subtracting the sum of the squared probabilities of each class from one. It can be written as
\[ \mathrm{Gini} = 1 - \sum_{i=1}^{k} P_i^2 \]
where k is the number of classes and P_i is the probability of class i.
There are two main stages in the RF algorithm: the first is the creation of an RF; the second is to make a prediction from the forest created.
RF algorithm:
1) Randomly select “K” features from “m” total features, where K ≪ m
J. AIRCRAFT, VOL. 56, NO. 1: ENGINEERING NOTES 405
2) Among the “K” features, calculate the node “d” using the best split point
3) Split the node into daughter nodes using the best split
4) Repeat the first three steps until “l” number of nodes has been reached
5) Build the forest by repeating steps 1 to 4
Advantages of RF are that 1) it reduces the overfitting problem; 2) the same RF algorithm can be used for both classification and regression tasks; and 3) it can be used for identifying the most important features from the training dataset, in other words, feature selection.
Sets of regression trees were constructed using target and independent variables. Target and independent variables are the same as for the CART algorithm. The split criterion was the mean squared error.
III. Estimation Results and Discussion
Machine-learning-based CART and RF methods have been proposed to estimate the nonlinear aerodynamic coefficient model of the aircraft. These methods were examined on openly accessible flight test data of the research aircraft DLR-ATTAS, Germany [1], to estimate the lift force, drag force, and pitching moment coefficients. The MATLAB [35] software package has been used to implement these algorithms. Further, estimated results of force and moment coefficients have been compared with the conventional method based on MLE.
A. Nonlinear Aerodynamic Modeling Using CART- and RF-Based Novel Algorithm
CART- and RF-based methods have been examined on DLR-ATTAS quasi-steady stall data. Force and moment coefficients, CL, CD, and Cm, of the aircraft have been estimated using these data-driven models. Of the flight data, 70% have been used in training the model, and the remaining 30% have been used in validating the force and moment coefficient models.
The CART algorithm has been implemented with the following configurations:
1) Pruning criterion: MSE
2) Quadratic error tolerance: 10^-6
3) Split criterion: MSE
The RF algorithm has been implemented with the following configurations:
1) Ensemble with 100 trees
2) Tree method: Regression
3) Out-of-bag prediction to compute predicted class probabilities
4) Minimum number of observations per tree leaf: 5
5) Split criterion: MSE
Independent variables considered in both algorithms were the angle of attack, elevator deflection, pitch rate, and relative airspeed. Target variables considered were CL, CD, and Cm, which have been obtained by the following relations:
\[
\begin{aligned}
C_L &= -C_Z \cos\alpha + C_X \sin\alpha \\
C_D &= -C_X \cos\alpha - C_Z \sin\alpha \\
C_m &= \frac{I_Y \dot{q} - F_e\,(l_{tx}\sin\sigma_t + l_{tz}\cos\sigma_t) + r p\,(I_X - I_Z) + I_{XZ}\,(p^2 - r^2)}{0.5\,\rho V^2 S c}
\end{aligned}
\tag{1}
\]
where
\[
C_X = \frac{m a_x - F_e \cos\sigma_t}{0.5\,\rho V^2 S}, \qquad
C_Z = \frac{m a_z + F_e \sin\sigma_t}{0.5\,\rho V^2 S}
\tag{2}
\]
where
IY = moment of inertia about the y axis, kg·m2
q̇ = pitch acceleration, rad/s2
Fe = thrust force, N
ltx, ltz = location of the engine relative to the c.g.
σt = thrust inclination angle, deg
IX = moment of inertia about the x axis, kg·m2
IZ = moment of inertia about the z axis, kg·m2
IXZ = cross moment of inertia in the xz plane, kg·m2
p = roll rate, rad/s
r = yaw rate, rad/s
ρ = density, kg/m3
V = relative airspeed, m/s
m = mass, kg
S = reference area, m2
ax = axial acceleration, m/s2
az = normal acceleration, m/s2
Figure 1 shows the time history of the data used in both of these algorithms. The figure shows that the change in the elevator
Fig. 1 Time history plot for ATTAS quasi-stall data. [Panels plotted against time [s], 0 to 200 s: q [dps], θ [deg], α [deg], V [m/s], az [m/s2], ax [m/s2].]
deflection affects all the other presented flight variables in a similar variation pattern. The presented data were acquired at a sampling rate of 25 Hz. The gathered flight data have been segregated into target and independent variables for use in the proposed algorithms. The observed elevator deflection is around 8 deg in both directions. Axial and normal accelerations follow the change in the elevator deflection. Variations of 18 to −18 deg have been recorded. The change in pitch angle is approximately 15 deg, and a change of −4 to 4 dps has been observed in the pitch rate data.
Training data for the developed models of CL, CD, and Cm using CART and RF are presented in Fig. 2. These models were designed by judicious selection of training and validation data. The 70% training portion captures the dynamics and includes most of the variation, so both models were trained using 70% of the flight data gathered on the ATTAS aircraft. Further, these models are validated with the remaining 30% of the flight data. The CART- and RF-estimated models are validated against the measured coefficients and also against the model derived from MLE-estimated parameters and input variables. The validation data indicate that the models obtained from CART and RF are well trained.
Fig. 2 CART- and RF-based developed CL, CD, and Cm model for ATTAS research aircraft. [Panels CL, CD, and Cm plotted against time [s]; legend: Measured, CART Model, RF Model.]
\[ C_m = C_{m0} + C_{m\alpha}\,\alpha + C_{m\delta_e}\,\delta_e + C_{mq}\,\frac{q c}{2V} + C_{mX}\,(1 - X) \tag{3} \]
The flow separation point X in the above equation is estimated using the following relation in Eq. (4):
\[ \tau_1 \frac{dX}{dt} + X = \frac{1}{2}\left\{1 - \tanh\left[a_1\left(\alpha - \tau_2 \dot{\alpha} - \alpha^*\right)\right]\right\} \tag{4} \]
where
a1 is the airfoil static stall characteristic
τ1 is the transient time constant, s
τ2 is the hysteresis time constant, s
α* is the break point
Flight data gathering on dynamic stall is riskier and more tedious. The quasi-steady stall retains only the hysteresis time constant τ2. The transient effect is neglected by setting τ1 to zero, and this model is called the quasi-steady stall model. In this model, a1, τ2, and α* are generally adequate to capture the stall hysteresis. The following θstall vector is the set of aerodynamic and stall parameters that are estimated by minimizing the output error in the MLE framework.
\[ \theta_{stall} = \left[ C_{D0} \;\; e \;\; C_{L0} \;\; C_{L\alpha} \;\; C_{L\alpha M\alpha} \;\; C_{m0} \;\; C_{m\alpha} \;\; C_{mq} \;\; C_{m\delta_e} \;\; a_1 \;\; \alpha^* \;\; \tau_2 \;\; C_{DX} \;\; C_{mX} \right]^T \tag{5} \]
The θstall vector is estimated as CD0 = 0.0435 (1.00), e = 0.839 (0.82), CL0 = 0.158 (2.08), CLα = 3.298 (1.14), CLαMα = 9.07 (2.20), Cm0 = 0.05 (3.50), Cmα = −0.1763, Cmq = −6.146 (4.47), Cmδe = −0.391 (4.08), a1 = 23.716 (3.41), α* = 0.309 (0.35), τ2 = 24.025 c/V (1.46), CDX = 0.0792 (3.82), CmX = −0.1261 (3.98), with acceptable Cramer–Rao bounds. Numbers in parentheses represent the relative standard deviation. These data are obtained by running the MATLAB code given in Ref. [1]. Interested readers can find these details on p. 421, or p. 475 of the newer edition, of Ref. [1].
Comparison of the CART-, RF-, and MLE-estimated force and moment coefficients CL, CD, and Cm is presented in Figs. 3–5.
[Fig. 3 plot: CL; legend: Measured, CART predicted, RF predicted, MLE predicted.]
Fig. 4 Prediction correlation of lift force coefficient (CL) among CART, RF, and MLE. [Horizontal axis: CL measured.]
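The quasi-steady stall relations of Eqs. (3) and (4) can be illustrated with a short sketch. The Python below is not the MATLAB code of Ref. [1]; it is a minimal illustration that assumes τ1 = 0, so that Eq. (4) reduces to an algebraic expression for X. The function names, the dictionary keys, and the value of τ2 in seconds are hypothetical placeholders; only a1 = 23.716 and α* = 0.309 come from the estimates quoted in the text, and α* is assumed here to be in radians.

```python
import math


def separation_point(alpha, alpha_dot, a1, alpha_star, tau2):
    """Flow separation point X from Eq. (4) with tau1 = 0 (quasi-steady stall).

    X tends to 1 well below the break point alpha_star (attached flow)
    and to 0 well above it (separated flow).
    """
    return 0.5 * (1.0 - math.tanh(a1 * (alpha - tau2 * alpha_dot - alpha_star)))


def pitching_moment(alpha, delta_e, q, V, X, params, chord):
    """Pitching moment coefficient from Eq. (3).

    `params` is a hypothetical dict with keys Cm0, Cm_alpha, Cm_de, Cm_q, Cm_X.
    """
    return (
        params["Cm0"]
        + params["Cm_alpha"] * alpha
        + params["Cm_de"] * delta_e
        + params["Cm_q"] * q * chord / (2.0 * V)
        + params["Cm_X"] * (1.0 - X)
    )


if __name__ == "__main__":
    a1, alpha_star = 23.716, 0.309  # quoted estimates (alpha* assumed in rad)
    tau2 = 0.5                      # hypothetical hysteresis time constant, s
    # Same angle of attack, pitching up versus pitching down: hysteresis in X.
    x_up = separation_point(0.30, 0.05, a1, alpha_star, tau2)
    x_down = separation_point(0.30, -0.05, a1, alpha_star, tau2)
    print(f"X pitching up: {x_up:.3f}, X pitching down: {x_down:.3f}")
```

With a positive α̇ the argument of the tanh is reduced, so separation is delayed while pitching up and persists while pitching down, which is the hysteresis that τ2 is meant to capture.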
Fig. 5 Validation of drag force coefficient (CD) model using a CART, [Plot of CD against time [s]; legend: Measured, CART predicted, RF predicted, MLE predicted.]
Fig. 6 Prediction correlation of drag force coefficient (CD) among [Plot of CD predicted against CD measured; legend: CART, RF, MLE.]
Figure 3 shows the validation of the lift force coefficient (CL) using the proposed CART and RF methods against measured output data. The plot also shows the MLE-estimated CL, which is derived from the predicted aerodynamic parameters and input variables. The estimated and measured CL curves fit each other well.
Figure 4 shows the prediction correlation of the measured lift force coefficient (CL) with the CART, RF, and MLE predictions. The correlation coefficient of all these three methods with measured data is close to
Table 1 Prediction correlation and RMSE calculation for CART, RF, and MLE methods
Force/moment     CART                       Random forest              MLE
coefficient      Correlation   RMSE         Correlation   RMSE         Correlation   RMSE
CL               0.9908        0.0278       0.9930        0.0244       0.989818      0.029438
CD               0.9402        0.0100       0.9559        0.0086       0.948576      0.009290
Cm               0.8961        0.0058       0.9332        0.0047       0.466906      0.011350
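The two statistics reported in Table 1 can be computed as in the sketch below. It assumes that "prediction correlation" means the Pearson correlation coefficient between measured and predicted coefficients, which the text does not state explicitly. The function names and the sample values are illustrative only and are not flight data.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted samples."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def pearson_correlation(measured, predicted):
    """Pearson correlation coefficient (assumed meaning of 'prediction correlation')."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


if __name__ == "__main__":
    # Made-up validation samples standing in for a measured and a predicted CL.
    cl_measured = [0.82, 0.95, 1.10, 1.32, 1.51, 1.40]
    cl_predicted = [0.80, 0.97, 1.08, 1.35, 1.48, 1.42]
    print("correlation:", pearson_correlation(cl_measured, cl_predicted))
    print("RMSE:", rmse(cl_measured, cl_predicted))
```

Reporting both numbers is useful because they answer different questions: the correlation shows whether the prediction follows the shape of the measured signal, whereas the RMSE gives the typical size of the error in the units of the coefficient.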
comparison. The CART- and RF-estimated models for CL and CD were fairly close to the MLE-estimated results, and the estimated Cm had a higher correlation than the MLE estimate.
From the statistical analysis, it can be concluded that RF is slightly superior to CART and that these two methods can be seen as a viable alternative to MLE, which is a parametric approach to model
Fig. 8 Prediction correlation of pitching moment coefficient (Cm) among CART, RF, and ML. [Plot of Cm predicted against Cm measured; legend includes MLE Predicted.]
References
[1] Jategaonkar, R. V., Flight Vehicle System Identification: A Time Domain Methodology, AIAA Education Series, AIAA, Reston, VA, 2006, Chaps. 2, 12.
[2] Grauer, J. A., and Morelli, E. A., “Generic Global Aerodynamic Model for Aircraft,” Journal of Aircraft, Vol. 52, No. 1, 2014, pp. 13–20.
[3] Klein, V., and Morelli, E. A., Aircraft System Identification: Theory and Practice, AIAA, Reston, VA, 2006, pp. 351–382.
[4] Fischenberg, D., “Identification of an Unsteady Aerodynamics Stall Model from Flight Test Data,” RTO MP-11, DLR-Germany, Paper No. 17, 1999.
[5] Mehra, R. K., Stepner, D. E., and Tyler, J. S., “Maximum Likelihood Identification of Aircraft Stability and Control Derivatives,” Journal of Aircraft, Vol. 11, No. 2, 1974, pp. 81–89. doi:10.2514/3.60327
[6] Goman, M. G., and Khrabrov, A. N., “State-Space Representation of Aerodynamic Characteristics of an Aircraft at High,” Journal of
[18] Winter, M., and Breitsamter, C., Unsteady Aerodynamic Modeling Using Neuro-Fuzzy Approaches Combined with POD, Deutsche Gesellschaft für Luft-und Raumfahrt-Lilienthal-Oberth eV, Bonn, Germany, 2015.
[19] Morelli, E. A., and Klein, V., “Application of System Identification to Aircraft at NASA Langley Research Center,” Journal of Aircraft, Vol. 42, No. 1, 2005, pp. 12–25. doi:10.2514/1.3648
[20] De JesusMota, S., and Botez, R. M., “New Helicopter Model Identification Method Based on a Neural Network Optimization Algorithm and on Flight Test Data,” The Aeronautical Journal, Vol. 115, No. 1167, 2011, pp. 295–314. doi:10.1017/S0001924000005789
[21] Boely, N., Botez, R. M., and Kouba, G., “Identification of a Nonlinear F/A-18 Model by Use of Fuzzy Logic and Neural Network Methods,” Proceedings of the Institution of Mechanical Engineers, Part G, Journal of Aerospace Engineering, Vol. 225, No. 5, 2011, pp. 559–574.
[22] Kumar, A., and Ghosh, A. K., “Data-Driven Method Based Aerodynamic Parameter Estimation from Flight Data,” 2018 AIAA Atmospheric Flight Mechanics Conference, AIAA Paper 2018-0768, 2018.
[23] Saderla, S., Dhayalan, R., and Ghosh, A. K., “Non-Linear Aerodynamic Modelling of Unmanned Cropped Delta Configuration from Experimental Data,” The Aeronautical Journal, Vol. 121, No. 1237, 2017, pp. 320–340. doi:10.1017/aer.2016.124
[24] Nelles, O., Nonlinear System Identification from Classical approaches to Neural Networks and Fuzzy Models, Springer–Verlag, Berlin, 2001, Chaps. 1, 12.
[25] Loh, W. Y., “Classification and Regression Trees,” Encyclopedia of Statistics in Quality and Reliability, edited by F. Ruggeri, R. Kenett, and F. W. Faltin, Wiley, Chichester, U.K., 2011, pp. 315–323.
[26] Timofeev, R., Classification and Regression Trees (CART) Theory and Applications, Humboldt Univ., Berlin, 2004, Chap. 5.
[27] Khandelwal, M., Armaghani, D. J., Faradonbeh, R. S., Yellishetty, M., Majid, M. Z. A., and Monjezi, M., “Classification and Regression Tree Technique in Estimating Peak Particle Velocity Caused by Blasting,” Engineering with Computers, Vol. 33, No. 1, 2017, pp. 45–53. doi:10.1007/s00366-016-0455-0
[28] Breiman, L., Friedman, J. H., Olshen, R., and Stone, C. J., Classification and Regression Tree, Wadsworth Brooks/Cole Advanced Books & Software, Pacific, CA, 1984, Chap. 8.
[29] Razi, M. A., and Athappilly, K., “A Comparative Predictive Analysis of Neural Networks (NNs), Nonlinear Regression and Classification and Regression Tree (CART) Models,” Expert Systems with Applications, Vol. 29, No. 1, 2005, pp. 65–74. doi:10.1016/j.eswa.2005.01.006
[30] Antipov, E. A., and Pokryshevskaya, E. B., “Mass Appraisal of Residential Apartments: An Application of Random Forest for Valuation and a CART-Based Approach for Model Diagnostics,” Expert Systems with Applications, Vol. 39, No. 2, 2012, pp. 1772–1778. doi:10.1016/j.eswa.2011.08.077
[31] Loh, W.-Y., “Classification and Regression Trees,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1, No. 1, 2011, pp. 14–23.
[32] Verikas, A., Gelzinis, A., and Bacauskiene, M., “Mining Data with Random Forests: A Survey and Results of New Tests,” Pattern Recognition, Vol. 44, No. 2, 2011, pp. 330–349. doi:10.1016/j.patcog.2010.08.011
[33] Joshi, A., Monnier, C., Betke, M., and Sclaroff, S., “Comparing Random Forest Approaches to Segmenting and Classifying Gestures,” Image and Vision Computing, Vol. 58, Feb. 2017, pp. 86–95. doi:10.1016/j.imavis.2016.06.001
[34] Markham, I. S., Mathieu, R. G., and Wray, B. A., “Kanban Setting Through Artificial Intelligence: A Comparative Study of Artificial Neural Networks and Decision Trees,” Integrated Manufacturing Systems, Vol. 11, No. 4, 2000, pp. 239–246. doi:10.1108/09576060010326230
[35] MATLAB, The MathWorks, Inc., Natick, MA, 2016, p. 488.