
Group project SDSC5002

This project will examine European call option pricing data on the S&P 500. A European call option
gives the holder the right (but not the obligation) to purchase an asset at a given time for a given price.
Valuing such an option is tricky because it depends on the future value of the underlying asset.

The Black-Scholes option pricing formula provides an approach for valuing such options. Let K denote
the strike price, i.e., the price one must pay to purchase the asset, and τ (tau) the time until the
expiration of the option. Suppose that the asset in question is currently trading at S, and has “volatility”
(i.e., risk or standard deviation) of σ. Finally, suppose that the annual risk-free interest rate is r. Then
the Black-Scholes formula states

Cpred = S Φ(d1) − K e^(−rτ) Φ(d2)

where Cpred is the predicted option value,

d1 = [log(S/K) + (r + σ²/2) τ] / (σ √τ),    d2 = d1 − σ √τ,

and Φ(x) represents the probability that a standard normal random variable will take on a value less than or equal to x.
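
To make the formula concrete, here is a minimal numerical sketch in Python (assuming scipy is available; norm.cdf plays the role of Φ, and the example inputs are arbitrary illustrative values, not taken from the project data):

```python
import numpy as np
from scipy.stats import norm  # norm.cdf(x) = Phi(x)

def black_scholes_call(S, K, r, tau, sigma):
    """Black-Scholes value of a European call option."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

# Arbitrary illustrative inputs (not from the project data):
print(black_scholes_call(S=1400.0, K=1350.0, r=0.03, tau=0.5, sigma=0.2))
```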

Project summary
The 1997 Nobel Prize in Economics was awarded for the Black-Scholes formula because it works remarkably well in practice. However, in this project we are going to attempt to build statistical models to perform the same task. You should pretend that you don't know the Black-Scholes formula when building your machine learning models (e.g., logistic regression, KNN, etc.).

Datasets and goals:


You will find two data sets: option train.csv and option test wolabel.csv. The training data set has
information on 1,680 separate options. In particular, for each option, we have the following variables:

• Value (C): Current option value

• S: Current asset value

• K: Strike price of option

• r: Annual interest rate

• τ : Time to maturity (in years)

• BS: The Black-Scholes formula was applied to this data (using some σ) to obtain Cpred. If an option
has Cpred − C > 0, i.e., the prediction overestimated the option value, we label that option
"Over"; otherwise, we label it "Under".

The test data set is similar, except it has only 1,120 options and is missing the Value and BS variables.
You can safely assume that the test data is of good quality, but you should check for missing and
erroneous entries in the training data.
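
As a starting point for that check, here is a minimal sketch (assuming pandas, the file name option_train.csv, and the column names listed above; adjust the names to match the files actually provided):

```python
import pandas as pd

# Assumed file name; adjust to match the training file actually provided.
train = pd.read_csv("option_train.csv")

# Count missing entries per column.
print(train.isna().sum())

# Look for obviously erroneous entries, e.g., non-positive prices,
# strikes, or maturities (column names are assumed here).
print(train.describe())
print((train[["Value", "S", "K", "tau"]] <= 0).sum())
```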

The core idea of the project is to use the training data to build statistical/ML models with

1. Value as the response (i.e., a regression problem) and then

2. BS as the response (i.e., a classification problem).

The other four variables will be used as the predictors. You will explore regression (for Value) and classification (for BS) methods, regardless of whether we have covered them in the course. Ultimately you will select what you consider to be the most accurate approach and use it to make predictions for C and BS on the 1,120 options in the test data set. You will submit these two sets of predictions. I will compare these predictions against the actual Value and BS results on the test options (which I have), in terms of out-of-sample R squared and classification error, respectively.
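
To make the two evaluation criteria concrete, here is a minimal sketch of an out-of-sample evaluation on a held-out split (assuming scikit-learn and pandas, the file name option_train.csv, and the column names Value, BS, S, K, tau, r; the particular models are placeholders, not recommendations):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import r2_score, accuracy_score

train = pd.read_csv("option_train.csv")        # assumed file name
X = train[["S", "K", "tau", "r"]]              # assumed column names
y_value, y_bs = train["Value"], train["BS"]

# Hold out a validation set to mimic the out-of-sample evaluation.
X_tr, X_val, yv_tr, yv_val, yb_tr, yb_val = train_test_split(
    X, y_value, y_bs, test_size=0.2, random_state=0)

# Regression for Value, scored by out-of-sample R squared.
reg = RandomForestRegressor(random_state=0).fit(X_tr, yv_tr)
print("R^2:", r2_score(yv_val, reg.predict(X_val)))

# Classification for BS (Over/Under), scored by classification error.
clf = KNeighborsClassifier().fit(X_tr, yb_tr)
print("classification error:", 1 - accuracy_score(yb_val, clf.predict(X_val)))
```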

For BS you must submit a column of 1’s and 0’s (not words or probabilities) with 1 corresponding to a
prediction of “Over” and 0 to a prediction of “Under”.

Submit your predictions for Value and BS in a CSV file with two columns (with Value and BS as the
column names). For example, group x should submit group x prediction.csv. Please follow this naming
convention. See the sample submission attached (group 0 prediction.csv).
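
For reference, here is a minimal sketch of writing a submission in the expected two-column format (the predictions below are placeholders; the exact file name should follow the attached sample and your group number):

```python
import pandas as pd

# Placeholder predictions for illustration only; in practice these come from
# your chosen regression and classification models applied to the 1,120 test options.
value_pred = [0.0] * 1120   # predicted option values
bs_pred = [0] * 1120        # 1 = predicted "Over", 0 = predicted "Under"

submission = pd.DataFrame({"Value": value_pred, "BS": bs_pred})
# Name the file after your group, following the attached sample's convention.
submission.to_csv("group_x_prediction.csv", index=False)
```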

Grading
The project will be graded out of 20 points. 14 points will be allocated to the project report, 5 points
will be allocated to the presentation, and 1 point will be allocated to on-time slide submission.

• Project Report

– Write Up (10pt): See the next page for further instructions.


– Value Prediction (2pt): It is easy to reach an out-of-sample R squared of 90%. I will allocate 0 points
for < 90%, 1 point for between 90% and 94%, and 2 points for > 94%.
– BS Prediction (2pt): This problem is relatively easy. You should be able to get a classification
error of at most 10% on the test data. Hence, I will allocate 0 points for anything more than
10% (> 10%), 1 point for error rates between 8% and 10%, and 2 points for rates below 8% (< 8%).

• 15-min Presentation (5pt): 5 excellent; 4 very good; 3 good; 1-2 below the bar.

• On-time Slide Submission (1pt)

Choosing your own project (Optional)


You can select a dataset of your choice for the final project. If you choose this option, you don’t need
to complete the assigned project described above. However, you need to submit a (1-2 page) project
proposal in Week 10. The proposal will not be graded but is required for this option. The proposal
needs to describe the project idea, questions being examined, dataset being used, analysis pipeline, and
how the expected results are going to be impactful or useful. You should still submit a final project report
and give an in-class presentation with slides.
The final report (at least 6 pages) should contain not only the results but also a detailed explanation
of the data, the problem being addressed, the methods applied, the visualization results, and your
interpretation, to demonstrate your knowledge of EDA & visualization principles and techniques. The
project proposal and final report should be in PDF format.

Instructions for write-up:
You will submit a report that includes a list of the summary statistics (EDA) you computed and the plots
you generated (at least 2 EDA summaries and 3 plots). For each EDA item and plot, please provide the following
explanations:

• Why? - State the rationale behind producing the specific EDA or plot.

• When and Where? - Specify the context in which you utilized the EDA or plot (e.g., dataset
summary, feature selection, evaluation, etc.).

• What? - Describe what information or insights are demonstrated by the EDA or plot.

• How? - Explain how the EDA or plot contributes to achieving your goal or objective in using it.

Timeline:

1. Nov 6th: project proposal due (required if choosing your own project)

2. Nov 19th: slide submission

3. Nov 20th and Nov 27th: presentations in lecture (The order of presentations will be randomly
assigned to each group.)

4. Nov 30th: report and prediction submission

Example:
i) Figure:

[Figure: "Neural Network" boxplots of errors (error1, error23, error21, error31, error32, overall) for the original and NP-adjusted methods.]

Figure 1: The figure is used for XXXX(your goal)XXXX and appears at XXXX(the place)XXXX (e.g.,
the feature selection step). It plots XXXX(the content)XXXX (e.g., the distributions of approximate errors
for the neural network approach and the NP-adjusted classifier; "error1", "error23", "error21", "error31",
"error32", and "overall" correspond to R1⋆, R2⋆, P2(Ŷ = 1), P3(Ŷ = 1), P3(Ŷ = 2), and P(Ŷ ≠ Y), respectively).
The plot shows that XXXX(the message)XXXX (e.g., the NP-adjusted method has powerful control on
error1 and error23 but slightly higher overall classification error. Therefore, ....).

ii) EDA:

Neural Network
Method        Error1   Error23   Error21   Error31   Error32   Overall
classical     0.403    0.153     0.370     0.404     0.304     0.520
NP-adjusted   0.164    0.087     0.666     0.683     0.141     0.552

Table 1: The table is used for XXXX(your goal)XXXX and appears at XXXX(the place)XXXX. It reports
XXXX(the content)XXXX (e.g., the averages of the approximate errors for ....). The results show that
XXXX(the message)XXXX.
