04.2 DigitalSoilMappinginEarthEngine RandomForestRegression
May 2024
EXERCISE 4.2
Run a Random Forest
Regression
Introduction
The previous exercises have taught us how to 1) access Landsat Imagery to produce a seasonal
composite, 2) calculate spectral indices using Landsat spectral bands, and 3) use other GEE resources to
compile topographic and climatic data. In exercise 4.1 we used that information to run a Random Forest
classification in GEE, so our next step is to run a Random Forest regression.
The major difference between a classification and a regression is that in classifications the attribute we
are mapping is thematic and in regression the attribute is continuous.
The region of interest for this exercise is in Essex County, Vermont – all the necessary data will be
provided in the course folder. During this exercise we will use training data to model soil redox depth,
then produce useful accuracy assessment charts and statistics.
You will quickly realize that most of the code for this exercise is very similar to exercise 4.1. Even so,
pay attention to where the scripts you will be creating differ – this will make it easier to modify the code
for your own study area in the future.
Objectives
• Use DSM_Lib of functions to load in Landsat composite and stack predictor layers
• Run a Random Forest regression with an understanding of how function parameters affect the
model
• Conduct model assessment by interpreting useful statistics and figures
• Display final regression and legend on map
Required Data:
• VT_boundary.shp – shapefile representing example area of interest
• VT_pedons.shp – shapefile of training data for Essex County, Vermont
Prerequisites
• Completion of Exercises 1–3 (you can review code by accessing the 02_ExerciseCompleteScripts
folder in the course repository)
• Google Chrome installed on your machine
• An approved Google Earth Engine account
• Follow the links below to gain read access to the GEE code repositories we will refer to in the
script.
o Click here to gain access to the GTAC module repository
o Click here to gain access to the GTAC training repository
Persons with disabilities who require alternative means of communication for program information (e.g., Braille, large print,
audiotape, American Sign Language, etc.) should contact the responsible Agency or USDA's TARGET Center at (202) 720-2600
(voice and TTY) or contact USDA through the Federal Relay Service at (800) 877-8339. Additionally, program information may
be made available in languages other than English.
To file a program discrimination complaint, complete the USDA Program Discrimination Complaint Form, AD-3027, found online
at How to File a Program Discrimination Complaint and at any USDA office or write a letter addressed to USDA and provide in
the letter all of the information requested in the form. To request a copy of the complaint form, call (866) 632-9992. Submit
your completed form or letter to USDA by: (1) mail: U.S. Department of Agriculture, Office of the Assistant Secretary for Civil
Rights, 1400 Independence Avenue, SW, Washington, D.C. 20250-9410; (2) fax: (202) 690-7442; or (3) email:
[email protected].
To use the DSM_Lib library of functions, we must load in the library. Under the comment that reads “Load in Library of
functions”, paste:
i. Navigate to the ’FEATURES’ tab and explore the different attributes of the shapefile. Since
for this regression we are predicting soil redox depth, we are interested in the
‘REDOX_CM’ field.
ii. Click the blue IMPORT button at the bottom right corner to add it to your script.
3. When you go back to your script, you’ll see a new table has been added to your list of Imports
at the top. Change the name of the new shapefile from Table to VT_pedons.
i. Now you should have two imports: VT_boundary and VT_pedons.
var split = 0.7; // separate 70% for training, 30% for validation
var trainingData = datawithColumn.filter(ee.Filter.lt('random', split));
var validationData = datawithColumn.filter(ee.Filter.gte('random', split));
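The datawithColumn collection used in the split above is typically produced by adding a column of uniform random numbers to the point collection. A minimal sketch, assuming the collection at this stage is the imported VT_pedons (in the actual course script it may be the sampled version of the points, and the seed value here is illustrative):

```js
// Add a 'random' column of uniform values in [0, 1) to drive the split;
// the seed (42) makes the split reproducible
var datawithColumn = VT_pedons.randomColumn('random', 42);
```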
3. Good job! Now we’ve added in our training and validation data points for our regression.
Don’t forget to save your work as you go!!
Note: we set the scale for training data to 30 m – keeping it consistent with the predictor layer
reprojection we applied earlier.
var classifier = ee.Classifier.smileRandomForest(100)
.setOutputMode('REGRESSION')
.train({
features: training,
classProperty: 'REDOX_CM',
inputProperties: bands
});
ii. Notice how ‘setOutputMode’ is set to ‘REGRESSION’ this time. This command is the most
important for running different types of Random Forest models in GEE.
iii. Below you’ll see the documentation for our Random Forest model. This is how we know
how to set important parameters. For example, in our case, we’re setting numberOfTrees
= 100. Keep this information in mind if you want to customize your model in the future.
2. Finally, now we’re going to classify our image using the classifier we just created. Under the
comment that reads “Classify image”, paste:
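The classify call is short; as a sketch, assuming the stacked predictor image is named stack (the variable name in the course script may differ), it might look like:

```js
// Classify image: apply the trained regression classifier to the predictor
// stack; the single output band is named 'predicted'
var regression = stack.classify(classifier, 'predicted');
```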
// Add to map
var viz = {palette: palette, min: regressionMin.getNumber('predicted').getInfo(),
max: regressionMax.getNumber('predicted').getInfo()};
Map.addLayer(regression, viz, 'Regression');
i. As you can see, displaying this regression is a bit more complex than displaying the
classification. This is because the first part of this code calculates the appropriate min
and max values for our visualization – it simply finds and uses the highest and lowest
values of predicted redox depth. In the future, when you use this code to model a
different continuous variable, it will automatically choose acceptable values for your
visualization.
ii. The tileScale parameter adjusts the memory used to calculate the min and max values
used to display the regression appropriately on the map. If you run into memory issues on
this exercise or when adapting to use your own area, you can increase the value of this
parameter. You can learn more about using tileScale and debugging in the GEE guides.
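One way the regressionMin and regressionMax values might be computed is with reduceRegion, which is where tileScale comes in. A sketch, assuming the study-area geometry is named compositeArea as in the export block later in the script:

```js
// Compute the minimum and maximum predicted values over the study area;
// raise tileScale if you hit "memory limit exceeded" errors
var regressionMin = regression.reduceRegion({
  reducer: ee.Reducer.min(),
  geometry: compositeArea,
  scale: 30,
  maxPixels: 1e13,
  tileScale: 4
});
var regressionMax = regression.reduceRegion({
  reducer: ee.Reducer.max(),
  geometry: compositeArea,
  scale: 30,
  maxPixels: 1e13,
  tileScale: 4
});
```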
}
});
],
});
legend.add(legendTitle);
legend.add(panel);
Map.add(legend);
i. The “Classifier information” will print to the console and display values for variable
importance, the number of trees in the model, as well as the out-of-bag (OOB) error
estimate. The OOB error is another way of evaluating model performance, and gives the
mean error in predicting samples that were not included in a particular “bag” or decision
tree.
// Get variable importance
var dict = classifier.explain();
print("Classifier information:", dict);
var variableImportance = ee.Feature(null, ee.Dictionary(dict).get('importance'));
// Make chart, print it
var chart =
ui.Chart.feature.byProperty(variableImportance)
.setChartType('ColumnChart')
.setOptions({
title: 'Random Forest Variable Importance',
legend: {position: 'none'},
hAxis: {title: 'Bands'},
vAxis: {title: 'Importance'}
});
print(chart);
ii. Once you run your code, you should have something that looks like this:
3. Next, we’re going to make a histogram that shows how many pixels in our study area were
predicted at each redox depth. This is a useful visual to assess the distribution of the
predicted values. Under the comment that reads “**** Histogram of predicted redox depth
****”, paste:
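A histogram of the predicted image can be built with ui.Chart.image.histogram. A sketch, again assuming the study-area geometry is named compositeArea:

```js
// Histogram of predicted redox depth across the study area
var histogram = ui.Chart.image.histogram({
  image: regression,
  region: compositeArea,
  scale: 30
}).setOptions({
  title: 'Histogram of Predicted Redox Depth',
  hAxis: {title: 'Predicted redox depth (cm)'},
  vAxis: {title: 'Pixel count'},
  legend: {position: 'none'}
});
print(histogram);
```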
4. Finally, the last figure we’ll be making is a Predicted vs Observed scatterplot. These are useful
for seeing how well your model performed, because they take sample points from your regression
image (predicted values) and plot them against your training data (observed values). To make
this plot, paste the following under the comment that reads “**** Predicted vs Observed
Scatterplot – training data ****”:
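One way to build such a chart is to sample the regression image at the training points, so each feature carries both the observed REDOX_CM value and the predicted value, then chart one against the other. A sketch (variable names other than REDOX_CM and 'predicted' are assumptions):

```js
// Sample the regression image at the training points; each output feature
// keeps the observed REDOX_CM property and gains the 'predicted' band value
var predictedTraining = regression.sampleRegions({
  collection: trainingData,
  properties: ['REDOX_CM'],
  scale: 30
});

// Scatter observed (x) against predicted (y), with an R^2 trendline
var chartTraining = ui.Chart.feature
  .byFeature(predictedTraining, 'REDOX_CM', ['predicted'])
  .setChartType('ScatterChart')
  .setOptions({
    title: 'Predicted vs Observed - Training data',
    hAxis: {title: 'Observed redox depth (cm)'},
    vAxis: {title: 'Predicted redox depth (cm)'},
    trendlines: {0: {showR2: true, visibleInLegend: true}}
  });
print(chartTraining);
```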
Note: if you hover over the top right corner of this plot, you’ll be able to see the R^2 value as well. Here’s
what your plot should look like. Note that the R^2 value prints on the plot.
What would it look like to plot a similar chart for the classification? Would that be informative?
Part 6: Validation
However, we can’t truly get a good sense of how well our model performed by looking at our training data
alone. We’re going to perform similar assessments on our validation data to see how well our model did
on data that weren’t used to train it.
B. Compute RMSE
1. Next, we’ll compute the RMSE again, for the validation data. Copy and paste this code below
the comment that reads “**** Compute RMSE – validation ****”.
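RMSE is the square root of the mean squared difference between predicted and observed values. The course script computes it with Earth Engine reducers, but the formula itself can be illustrated in plain JavaScript (the sample values here are made up):

```javascript
// RMSE = sqrt( mean( (predicted - observed)^2 ) )
function rmse(predicted, observed) {
  var sumSq = 0;
  for (var i = 0; i < predicted.length; i++) {
    var d = predicted[i] - observed[i];
    sumSq += d * d;
  }
  return Math.sqrt(sumSq / predicted.length);
}

// Example: hypothetical predicted vs observed redox depths (cm)
console.log(rmse([30, 50, 70], [28, 55, 66]));  // ≈ 3.873
```

A lower RMSE on the validation points means the model generalizes better to locations it was not trained on.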
Part 7: Export
A. Choose export settings for Gmail vs Google Cloud Project
1. Now that you’ve created and evaluated your model, you can export it for future use—take it
to ArcMap or your favorite GIS.
i. The export command you will use will depend on whether you are using Earth Engine with
a personal Gmail account or a USDA account.
(a) If you are using a personal Gmail account, use the Export.image.toDrive() function.
(b) If you are using a USDA account, use the Export.image.toCloudStorage() function,
being sure to specify a cloud storage bucket.
(c) Both options are provided below for you. If you wish to use the
Export.image.toCloudStorage() function, simply delete the /* and */ before and
after the code block, in order to uncomment it.
2. Copy this code below the comment that reads “**** Export classification **** “.
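For reference, the Drive export (the default, uncommented option) might look like the following sketch, assuming the same variable names as the Cloud Storage block (the Drive folder name matches the one described later in this exercise):

```js
// If using a personal Gmail account: export to Google Drive
Export.image.toDrive({
  image: regression,
  description: exportName,
  folder: 'DigitalSoilMapping',  // created in your Drive if it doesn't exist
  fileNamePrefix: exportName,
  region: compositeArea,
  scale: 30,
  crs: crs,
  maxPixels: 1e13
});
```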
/*
// If using USDA acct: Export regression to Cloud Storage
Export.image.toCloudStorage({
  image: regression,
  description: exportName,
  bucket: cloudStorageBucket, // update with the name of your Cloud Storage bucket
  fileNamePrefix: exportName,
  region: compositeArea,
  scale: 30,
  crs: crs,
  maxPixels: 1e13
});
*/
3. After you run the script, the Tasks tab on the right side of the pane will turn orange, indicating
that there are export tasks that can be run.
6. In the window that pops up, you will see the export parameters. We have already specified
these in our script. Check to make sure everything looks ok, and click “Run.” The export may
take upwards of 10 minutes to complete, so be patient!
7. Notice that we are sending the export to a folder in your Google Drive called
“DigitalSoilMapping.” If this folder doesn’t already exist, this command will create it.
8. Navigate to your Google Drive, locate the DigitalSoilMapping folder, and click to open it.
9. Right click to download the file, which should be titled “Essex_VT_DSM_regression.tif”. Now,
you can open this up in your GIS of choice.
Part 8: Discussion
A. Get thinking!
1. Now that we’ve completed our random forest regression and have run our final script, we
should assess our results.
i. Looking through the console at the figures and statistics, is anything catching your
attention? Are you surprised by any of your results?
ii. Observe your mapped final regression – is anything standing out? Would you say this is a
“good” model? What can you do to improve it?
2. These are just a few questions to get the gears moving!
Congratulations! You have successfully completed this exercise. You have used a variety of
techniques to perform random forest regression in Google Earth Engine.