Practical 3 - Cropyield Forecasting - Exercise 5
Exercise 5: Regression Analysis with CST
1.0 Objective
2.0 Indicators tab
3.0 Options tab
1.0 Objective
The objective of this exercise is to use the regression analysis of the CST to
forecast yield.
Q2: How many indicators are now available in the “Available indicators” selection box?
Which indicator is now available and why?
Q4: Some indicators are listed as having missing values. The satellite indicators
(SPOT-VGT) in particular have many missing values. Can you explain this? Tip: have a look
at the “Data view” window in the trend analysis.
Add all indicators from the “Available indicators” selection box to the “Free
indicators” box using the >> button. When you add all indicators, the CgmsStatTool
shows the following window:
This is needed because the regression requires all indicators and the official yield
to be present for every year in the analysis. Press the “Synchronize” button to
synchronize all indicators over the same years. To see what changed, check the Time
trend analysis page.
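The effect of synchronizing can be illustrated with a minimal Python sketch. It assumes that synchronization means keeping only the years in which the official yield and every selected indicator have a value. The indicator names and all numbers are hypothetical and do not come from the CgmsStatTool.

```python
# Hypothetical data: year -> value; None marks a missing value.
official_yield = {2000: 2.1, 2001: 2.4, 2002: 1.9, 2003: 2.6}
indicators = {
    "rainfall_sum": {2000: 310.0, 2001: 355.0, 2002: 280.0, 2003: 400.0},
    "satellite_ndvi": {2000: None, 2001: 0.52, 2002: 0.47, 2003: 0.58},
}

def common_years(yield_series, indicator_series):
    """Return the sorted years in which the yield and every indicator have a value."""
    years = {year for year, value in yield_series.items() if value is not None}
    for series in indicator_series.values():
        years &= {year for year, value in series.items() if value is not None}
    return sorted(years)

synced_years = common_years(official_yield, indicators)
print(synced_years)  # the year 2000 drops out because the satellite value is missing
```

In this sketch a single indicator with a short record shortens the period for all indicators, which is why it is worth checking the Time trend analysis page after synchronizing.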
The CgmsStatTool can also compute the correlation matrix between the different
indicators. An indicator that is highly correlated (large absolute coefficient of
correlation) with another indicator has little additional value in the regression
analysis. A quick look at the correlation matrix will show you that all CGMS indicators
(01, 06, 08) are highly correlated. The sum of rainfall (05) and the satellite-derived
indicators (01, 02) are much less correlated.
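To make the idea of a correlation matrix concrete, here is a small Python sketch that computes Pearson correlation coefficients between indicator series. It is only an illustration under the assumption that the tool reports ordinary Pearson coefficients; the indicator names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict with the correlation between every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical indicator values for five synchronized years.
data = {
    "indicator_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "indicator_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # moves almost in step with indicator_a
    "indicator_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # largely unrelated to indicator_a
}

matrix = correlation_matrix(data)
for name in data:
    print(name, [round(matrix[name][other], 2) for other in data])
```

In this made-up example indicator_b adds little once indicator_a is in the model, while indicator_c carries mostly independent information.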
The options tab lists a number of options for regression models (see Figure 3). The
most fundamental choice to make here is between “Single free indicator” and “Best
subset selection”. In the first case only one indicator at a time is used to build the
regression model; in the second case the CgmsStatTool tests many combinations
of indicators to find the best fitting model.
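The difference between the two options can be sketched in Python. This is not the CgmsStatTool algorithm: it ignores the time trend, uses ordinary least squares, and ranks subsets by adjusted R², which is an assumption about the selection criterion. All names and numbers are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y):
    """Fit y = b0 + sum(b_j * x_j) by least squares and return R squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k indicators (assumed criterion)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical indicators and yields for eight years.
indicators = {
    "indicator_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "indicator_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "indicator_c": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0],
}
yields = [2.2, 2.1, 3.35, 3.15, 4.6, 5.7, 4.95, 6.15]
n = len(yields)

# "Single free indicator": one regression model per indicator.
single = {name: r_squared([col], yields) for name, col in indicators.items()}
best_single = max(single, key=single.get)

# "Best subset selection": try every combination of indicators.
best_subset, best_score = None, float("-inf")
for size in range(1, len(indicators) + 1):
    for names in combinations(indicators, size):
        r2 = r_squared([indicators[name] for name in names], yields)
        score = adjusted_r2(r2, n, size)
        if score > best_score:
            best_subset, best_score = names, score

print("best single indicator:", best_single, round(single[best_single], 3))
print("best subset:", best_subset, round(best_score, 3))
```

Plain R² never decreases when an indicator is added, so the sketch compares subsets with adjusted R² to avoid always preferring the largest subset.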
Figure 3: Explanation of functionality in the regression options window
Exercise 2: Single Free indicator analysis
Settings in the time trend analysis window:
Region: Limpopo
Crop: Total maize
Period: June-I (first decade of June)
Start/End year: 1987-2015
Continue in the Regression Tab:
Add all indicators to the “Free indicators” box.
Q3: A measure of the error on the crop yield prediction can be obtained from the
Root Mean Squared Error (RMSE) for prediction. What is the RMSE for the best
model and how does it compare to the None model?
What is the predicted yield?
Q4: Some of the cells with the t-values are marked either orange or red. What
does this mean?
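The RMSE for prediction mentioned in Q3 can be illustrated with a short Python sketch. It assumes a leave-one-out definition: each year is predicted by a model fitted on all other years. It also assumes that the None model is a model without any free indicator, approximated here by the mean yield of the other years. How the CgmsStatTool computes these values exactly is not stated in this exercise, and all numbers are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def loo_rmse(x, y, use_indicator=True):
    """Leave-one-out RMSE: predict each year from a model fitted on the other years."""
    errors = []
    for i in range(len(y)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if use_indicator:
            intercept, slope = fit_line(x_train, y_train)
            prediction = intercept + slope * x[i]
        else:
            # Assumed stand-in for the None model: mean yield of the other years.
            prediction = sum(y_train) / len(y_train)
        errors.append(y[i] - prediction)
    return sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical indicator values and yields for eight years.
indicator = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
yields = [2.2, 2.1, 3.35, 3.15, 4.6, 5.7, 4.95, 6.15]

rmse_model = loo_rmse(indicator, yields, use_indicator=True)
rmse_none = loo_rmse(indicator, yields, use_indicator=False)
print("RMSE with indicator:", round(rmse_model, 3))
print("RMSE without indicator:", round(rmse_none, 3))
```

An indicator is only useful for forecasting if its RMSE for prediction is clearly smaller than that of the model without the indicator.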
Task 2: Single Free indicator analysis
Repeat exercise 2, but set the period to April-III.
Q1: What is the best model and what is the R² value of this model?
Q2: What is the RMSE for the best model and how does it compare to the
None model?
Q3: What is the predicted crop yield (Prediction for target year)?
Q4: Is the predicted yield very different from the predicted yield in March-I?
We will now repeat exercise 2 with the “Best subset selection” option. With this
option we can try to build a better regression model and obtain a better
forecast with smaller error bounds.
Task: Repeat exercise 3 above, but now using yellow maize and the period FEB-II.