Project Description
This project will examine European call option pricing data on the S&P 500. A European call option
gives the holder the right (but not the obligation) to purchase an asset at a given time for a given price.
Valuing such an option is tricky because it depends on the future value of the underlying asset.
The Black-Scholes option pricing formula provides an approach for valuing such options. Let K denote
the strike price, i.e., the price one must pay to purchase the asset, and τ (tau) the time until the
expiration of the option. Suppose that the asset in question is currently trading at S, and has “volatility”
(i.e., risk or standard deviation) of σ. Finally, suppose that the annual risk-free interest rate is r. Then
the Black-Scholes formula states
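In its standard textbook form (no dividends, with τ measured in years to match the annual rate r), the value C of the call is given below. Here Φ denotes the standard normal cumulative distribution function, and d_1 and d_2 are the usual intermediate quantities; these three symbols are introduced for this statement and are not defined elsewhere in the text.

```latex
C = S\,\Phi(d_1) - K e^{-r\tau}\,\Phi(d_2),
\qquad
d_1 = \frac{\ln(S/K) + \left(r + \sigma^2/2\right)\tau}{\sigma\sqrt{\tau}},
\qquad
d_2 = d_1 - \sigma\sqrt{\tau}.
```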
Project summary
The 1997 Nobel Prize in Economics was awarded for the Black-Scholes formula because it works
remarkably well in practice. In this project, however, we will attempt to build statistical models
to perform the same task. When building your machine learning models (e.g., logistic regression,
KNN, etc.), you should pretend that you don’t know the Black-Scholes formula.
• BS: The Black-Scholes formula was applied to this data (using some σ) to get Cpred. If an option
has Cpred − C > 0, i.e., the prediction overestimated the option value, we label that option
(Over); otherwise, we label that option (Under).
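The labeling rule in the BS bullet can be sketched as follows. This is only an illustration of how such a label could be produced, not the procedure actually used to build the data set: the function names are hypothetical, the pricing function is the standard no-dividend Black-Scholes call price, and the value of σ used for the data is not given, so `sigma` is a free parameter here.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S: float, K: float, tau: float, r: float, sigma: float) -> float:
    """Standard Black-Scholes price of a European call (no dividends).

    S: current asset price, K: strike, tau: time to expiration in years
    (assumed unit), r: annual risk-free rate, sigma: volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def bs_label(c_pred: float, c_actual: float) -> str:
    """Return "Over" if the prediction exceeds the actual value, else "Under"."""
    return "Over" if c_pred - c_actual > 0 else "Under"


# Illustrative inputs only; these numbers are not taken from the project data.
c_pred = black_scholes_call(S=100.0, K=100.0, tau=1.0, r=0.05, sigma=0.2)
print(round(c_pred, 2), bs_label(c_pred, c_actual=10.0))
```

Note that a tie (Cpred − C = 0) falls under "Under", matching the "otherwise" branch of the rule.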
The test data set is similar, except it has only 1,120 options and is missing the Value and BS variables.
You can safely assume that the test data is of good quality, but you should check for missing and
erroneous entries in the training data.
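One way to screen the training data for missing and erroneous entries is sketched below. It is a minimal sketch under several assumptions: only the column names Value and BS come from this description, while S, K, tau and r are assumed names for the four predictors; the labels are assumed to be stored as the strings "Over" and "Under"; and the plausibility rules (positive S, K and tau, non-negative Value) are one reasonable choice rather than a prescribed one.

```python
import math

# "Value" and "BS" are named in the handout; "S", "K", "tau", "r" are ASSUMED names.
NUMERIC_COLS = ["Value", "S", "K", "tau", "r"]
POSITIVE_COLS = ["S", "K", "tau"]   # assumed to be strictly positive
VALID_LABELS = {"Over", "Under"}    # assumed spelling of the BS labels


def parse_number(raw):
    """Return raw as a finite float, or None if it is missing or not numeric."""
    try:
        x = float(raw)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def find_bad_rows(rows):
    """Map row index -> list of problems, for rows (dicts) with any problem."""
    problems = {}
    for i, row in enumerate(rows):
        reasons = []
        for col in NUMERIC_COLS:
            x = parse_number(row.get(col))
            if x is None:
                reasons.append(f"{col}: missing or non-numeric")
            elif col in POSITIVE_COLS and x <= 0:
                reasons.append(f"{col}: expected a positive number, got {x}")
            elif col == "Value" and x < 0:
                reasons.append(f"{col}: negative option value {x}")
        if row.get("BS") not in VALID_LABELS:
            reasons.append(f"BS: unexpected label {row.get('BS')!r}")
        if reasons:
            problems[i] = reasons
    return problems


# Made-up rows for illustration; real rows could come from csv.DictReader.
sample = [
    {"Value": "10.5", "S": "100", "K": "95", "tau": "0.5", "r": "0.03", "BS": "Over"},
    {"Value": "", "S": "100", "K": "95", "tau": "-1", "r": "0.03", "BS": "Maybe"},
]
for index, reasons in find_bad_rows(sample).items():
    print(index, reasons)
```

Flagged rows still need a decision (drop, correct, or impute), which is worth documenting in the report.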
The core idea of the project is to use the training data to build statistical/ML models with Value and BS as the response variables.
The other four variables will be used as the predictors. You will explore regression (for Value) and
classification (for BS) methods, regardless of whether we have covered them in the course. Ultimately,
you will select what you consider to be the most accurate approach and use it to make predictions for
Value and BS on the 1,120 options in the test data set. You will submit these two sets of predictions.
I will compare them with the actual Value and BS results on the test options (which I have), in terms
of out-of-sample R squared and classification error, respectively.
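The two evaluation criteria can be estimated on a held-out part of the training data before submitting. The sketch below uses common definitions and hypothetical function names; the exact formula used for grading is not stated here, so in particular the choice of the held-out mean as the R squared baseline is an assumption.

```python
def r_squared(y_true, y_pred):
    """Out-of-sample R squared: 1 - SS_res / SS_tot on held-out data.

    Assumes the baseline is the mean of the held-out values themselves.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def classification_error(labels_true, labels_pred):
    """Fraction of held-out options whose predicted label is wrong."""
    wrong = sum(t != p for t, p in zip(labels_true, labels_pred))
    return wrong / len(labels_true)


# Made-up numbers for illustration only.
print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
print(classification_error([1, 0, 1, 1], [1, 1, 1, 0]))
```

Higher R squared and lower classification error are better; a model that always predicts the held-out mean scores an R squared of 0 under this definition.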
For BS you must submit a column of 1’s and 0’s (not words or probabilities) with 1 corresponding to a
prediction of “Over” and 0 to a prediction of “Under”.
Submit your predictions for Value and BS in a CSV file with two columns (with Value and BS as the
column names). For example, group x should submit group x prediction.csv. Please follow this naming
convention. See the attached sample submission (group 0 prediction.csv).
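A minimal sketch of writing a file in the format described above, with a Value column and a BS column of 1's and 0's. The function name and the output path are hypothetical; use the file name required by the naming convention, and compare the result against the sample submission, since its exact layout is not reproduced here.

```python
import csv


def write_submission(path, value_preds, bs_preds):
    """Write predictions to a two-column CSV with header Value,BS.

    bs_preds must already be coded as 1 ("Over") or 0 ("Under").
    """
    if len(value_preds) != len(bs_preds):
        raise ValueError("Value and BS predictions must have the same length")
    if any(b not in (0, 1) for b in bs_preds):
        raise ValueError("BS predictions must be 1 or 0, not words or probabilities")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Value", "BS"])
        for value, label in zip(value_preds, bs_preds):
            writer.writerow([value, int(label)])


# Hypothetical path and made-up predictions, for illustration only.
write_submission("example_prediction.csv", [12.3, 0.85], [1, 0])
```

The check on `bs_preds` rejects probabilities such as 0.7, so thresholding has to happen before the file is written.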
Grading
The project will be graded out of 20 points. 14 points will be allocated to the project report, 5 points
will be allocated to the presentation, and 1 point will be allocated to on-time slide submission.
• Project Report
• 15-min Presentation (5pt): 5 excellent; 4 very good; 3 good; 1-2 below the bar.
2
Instructions for write-up:
You will submit a report that includes a list of summary statistics (EDA) you computed and the plots
you generated. (At least 2 EDA and 3 plots.) For each EDA and plot, please provide the following
explanations:
• Why? - State the rationale behind producing the specific EDA or plot.
• When and Where? - Specify the context in which you utilized the EDA or plot (e.g., dataset
summary, feature selection, evaluation, etc.).
• What? - Describe what information or insights are demonstrated by the EDA or plot.
• How? - Explain how the EDA or plot contributes to achieving your goal or objective in using it.
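As an illustration of the kind of summary statistics an EDA item might report, the sketch below computes basic descriptive statistics for one numeric column and the class balance for a label column. The function names and the sample values are hypothetical, and which statistics are worth reporting depends on the rationale you give for them.

```python
import statistics
from collections import Counter


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def class_proportions(labels):
    """Share of each label, e.g. to check the Over/Under balance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Made-up values for illustration only.
print(summarize([1.0, 2.0, 3.0, 4.0]))
print(class_proportions(["Over", "Under", "Over", "Over"]))
```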
Timeline:
1. Nov 6th: project proposal due (required if choosing your own project)
3. Nov 20th and Nov 27th: presentations in lecture (the order of presentations will be randomly
assigned to each group).
Example:
i) Figure:
[Figure: plot titled “Neural Network”; y-axis: errors (0.0 to 0.8); x-axis categories: error1, error23, error21, error31, error32, overall]
Figure 1: The figure is used for XXXX(your goal)XXX and appears at XXXX(the place)XXXX (e.g.,
feature selection step). It plots XXXX (the content) XXXX (e.g., the distributions of approximate errors
for the neural network approach and the NP-adjust classifier. “error1”, “error23”, “error21”, “error31”,
“error32”, “overall” correspond to R1⋆, R2⋆, P2(Ŷ = 1), P3(Ŷ = 1), P3(Ŷ = 2) and P(Ŷ ≠ Y), respectively.)
The plot shows that XXXX(the message)XXXX (e.g., the NP-adjust method has strong control over
error1 and error23 but has a slightly higher overall classification error. Therefore, ....).
ii) EDA:
Neural Network
Method        Error1   Error23   Error21   Error31   Error32   Overall
classical     0.403    0.153     0.370     0.404     0.304     0.520
NP-adjusted   0.164    0.087     0.666     0.683     0.141     0.552
Table 1: The table is used for XXXX(your goal)XXX and appears at XXXX(the place)XXXX. It reports
XXXX (the content) XXXX (e.g., the averages of approximate errors for ....). The results show that
XXXX(the message)XXXX.