Project Description

Uploaded by

hzeng0428

Group project SDSC5002

This project will examine European call option pricing data on the S&P 500. A European call option
gives the holder the right (but not the obligation) to purchase an asset at a given time for a given price.
Valuing such an option is tricky because it depends on the future value of the underlying asset.

The Black-Scholes option pricing formula provides an approach for valuing such options. Let K denote
the strike price, i.e., the price one must pay to purchase the asset, and τ (tau) the time until the
expiration of the option. Suppose that the asset in question is currently trading at S, and has “volatility”
(i.e., risk or standard deviation) of σ. Finally, suppose that the annual risk-free interest rate is r. Then
the Black-Scholes formula states

$$C_{pred} = S\,\Phi(d_1) - K e^{-r\tau}\,\Phi(d_2)$$

where $C_{pred}$ is the predicted option value,

$$d_1 = \frac{\log(S/K) + (r + \sigma^2/2)\,\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma\sqrt{\tau},$$

and Φ(x) represents the probability that a standard normal random variable will take on a value less than
or equal to x.
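As a rough illustration of the formula, the sketch below evaluates it in Python using only the standard library. It assumes the standard form of d1 with the σ²/2 term; the function names and the input values are hypothetical and are not taken from the project data.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S: float, K: float, r: float, sigma: float, tau: float) -> float:
    """Predicted European call value C_pred under the Black-Scholes formula."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Illustrative inputs only (assumed values, not from the data sets).
price = black_scholes_call(S=100.0, K=100.0, r=0.05, sigma=0.2, tau=1.0)
print(round(price, 4))  # approximately 10.4506
```

The volatility σ is not one of the variables supplied in the data, so any use of this function on the project data would require choosing σ separately.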

Project summary
The 1997 Nobel Prize in Economics was awarded for the Black-Scholes formula because it works
remarkably well in practice. However, in this project we will attempt to build statistical models
to perform the same task. When building your machine learning models (e.g., logistic regression,
KNN, etc.), you should pretend that you don’t know the Black-Scholes formula.

Datasets and goals:

You will find two data sets: option_train.csv and option_test_wolabel.csv. The training data set has
information on 1,680 separate options. In particular, for each option, we have the following variables:

• Value (C): Current option value

• S: Current asset value

• K: Strike price of option

• r: Annual interest rate

• τ : Time to maturity (in years)

• BS: The Black-Scholes formula was applied to this data (using some σ) to get Cpred. If an option
has Cpred − C > 0, i.e., the prediction overestimated the option value, we label that option
(Over); otherwise, we label it (Under).
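The labeling rule for BS can be written as a one-line function. This is only a sketch of the rule as stated above; the function name is hypothetical, and a tie (Cpred equal to C) falls under "otherwise" and is therefore labeled Under.

```python
def bs_label(c_pred: float, c: float) -> str:
    """Return "Over" when the Black-Scholes prediction exceeds the observed value."""
    return "Over" if c_pred - c > 0 else "Under"

# Illustrative values only.
print(bs_label(10.0, 9.0))   # Over
print(bs_label(9.0, 10.0))   # Under
```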

The test data set is similar, except it has only 1,120 options and is missing the Value and BS variables.
You can safely assume that the test data is of good quality, but you should check for missing and
erroneous entries in the training data.
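One possible way to screen the training data is sketched below with the standard csv module. The column names (Value, S, K, r, tau, BS), the sample rows, and the definition of "erroneous" (non-numeric entries, negative Value, S, K or tau, or a BS label other than Over/Under) are all assumptions made for illustration; the real file may use different headers and need different checks.

```python
import csv
import io
import math

NUMERIC_COLUMNS = ["Value", "S", "K", "r", "tau"]  # assumed column names
VALID_LABELS = {"Over", "Under"}

def find_bad_rows(rows):
    """Return (row_index, reason) pairs for rows with missing or suspicious entries."""
    problems = []
    for i, row in enumerate(rows):
        for col in NUMERIC_COLUMNS:
            raw = (row.get(col) or "").strip()
            if raw == "":
                problems.append((i, f"missing {col}"))
                continue
            try:
                x = float(raw)
            except ValueError:
                problems.append((i, f"non-numeric {col}"))
                continue
            if math.isnan(x):
                problems.append((i, f"missing {col}"))
            elif col != "r" and x < 0:
                # Assumption: prices and maturities cannot be negative.
                problems.append((i, f"negative {col}"))
        if (row.get("BS") or "").strip() not in VALID_LABELS:
            problems.append((i, "invalid BS label"))
    return problems

# Made-up rows standing in for the training file.
sample = io.StringIO(
    "Value,S,K,r,tau,BS\n"
    "21.6,431.6,420,0.034,0.08,Under\n"
    ",427.0,465,0.032,0.22,Over\n"
    "3.1,-440.2,455,0.031,0.15,Over\n"
    "9.4,445.0,450,0.030,0.30,Maybe\n"
)
rows = list(csv.DictReader(sample))
problems = find_bad_rows(rows)
print(problems)
```

How flagged rows are handled (dropped, corrected, or imputed) is a modeling decision that the report should justify.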

The core idea of the project is to use the training data to build statistical/ML models with

1. Value as the response (i.e., a regression problem) and then

2. BS as the response (i.e., a classification problem).

The other four variables will be used as the predictors. You will explore the regression (for Value) and
classification (for BS) methods, regardless of whether we have covered them in the course. Ultimately
you will select what you consider to be the most accurate approach and use it to make predictions for
C and BS on the 1,120 options in the test data set. You will submit these two sets of predictions. I will
compare these predictions with the actual Value and BS results on the test options (which I have),
in terms of out-of-sample R squared and classification error, respectively.
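The two evaluation measures can be estimated on a held-out part of the training data before submitting. The sketch below assumes the common definition of R squared that uses the mean of the held-out actual values as the baseline; the exact definition used for grading is not stated here, and the function names and numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def classification_error(actual, predicted):
    """Fraction of labels that were predicted incorrectly."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Illustrative held-out values only.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
err = classification_error([1, 0, 1, 1], [1, 0, 0, 1])
print(round(r2, 4), err)  # 0.98 0.25
```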

For BS you must submit a column of 1’s and 0’s (not words or probabilities) with 1 corresponding to a
prediction of “Over” and 0 to a prediction of “Under”.

Submit your predictions for Value and BS in a CSV file with two columns (with Value and BS as the
column names). For example, group x should submit group_x_prediction.csv. Please follow this naming
convention. See the sample submission attached (group_0_prediction.csv).
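A minimal sketch of producing a file in this shape with the standard csv module is shown below. The helper name and the example predictions are hypothetical, and the attached sample submission remains the authority on the exact layout.

```python
import csv
import io

def write_submission(handle, values, labels):
    """Write a two-column CSV (Value, BS) with BS coded as 1 = Over, 0 = Under."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Value", "BS"])
    for value, label in zip(values, labels):
        if label not in ("Over", "Under"):
            raise ValueError(f"unexpected label: {label!r}")
        writer.writerow([value, 1 if label == "Over" else 0])

# Written to an in-memory buffer here; illustrative predictions only.
buffer = io.StringIO()
write_submission(buffer, [12.5, 3.75], ["Over", "Under"])
submission_text = buffer.getvalue()
print(submission_text)
```

To create the actual file, pass a handle opened with `open(path, "w", newline="")`, where `path` follows the naming convention above.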

Grading
The project will be graded out of 20 points. 14 points will be allocated to the project report, 5 points
will be allocated to the presentation, and 1 point will be allocated to on-time slide submission.

• Project Report

– Write Up (10pt): See the next page for further instructions.

– Value Prediction (2pt): It is easy to get an out-of-sample R squared of 90%. I will allocate
0 points for < 90%, 1 point for between 90% and 94%, and 2 points for > 94%.
– BS Prediction (2pt): This problem is relatively easy. You should be able to get a classification
error of at most 10% on the test data. Hence, I will allocate 0 points for rates above
10% (>10%), 1 point for rates between 8% and 10%, and 2 points for rates below 8% (<8%).

• 15-min Presentation (5pt): 5 excellent; 4 very good; 3 good; 1-2 below the bar.

• On-time Slide Submission (1pt)
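The prediction thresholds above can be expressed as two small functions, which may help when checking a held-out estimate against the grading bands. Treating the endpoints of each "between" band as inclusive is an assumption, and the function names are hypothetical.

```python
def value_points(r2: float) -> int:
    """Points for Value prediction, given out-of-sample R squared as a fraction."""
    if r2 < 0.90:
        return 0
    if r2 <= 0.94:  # assumed inclusive upper end of the 1-point band
        return 1
    return 2

def bs_points(error: float) -> int:
    """Points for BS prediction, given the classification error as a fraction."""
    if error > 0.10:
        return 0
    if error >= 0.08:  # assumed inclusive lower end of the 1-point band
        return 1
    return 2

print(value_points(0.92), bs_points(0.05))  # 1 2
```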

Choosing your own project (Optional)

You can select a dataset of your choice for the final project. If you choose this option, you don’t need
to complete the assigned project described above. However, you need to submit a (1-2 page) project
proposal in Week 10. The proposal will not be graded but is required for this option. The proposal
needs to describe the project idea, questions being examined, dataset being used, analysis pipeline, and
how expected results are going to be impactful or useful. You should still submit a final project report
and give an in-class presentation with slides.
The final report (at least 6 pages) should contain not only the results but also a detailed explanation
of the data, problem being addressed, methods being applied, visualization results, and interpretation
to demonstrate your knowledge of EDA & visualization principles and techniques. The project proposal
and final report should be in PDF format.

Instructions for write-up:
You will submit a report that includes a list of summary statistics (EDA) you computed and the plots
you generated. (At least 2 EDA and 3 plots.) For each EDA and plot, please provide the following
explanations:

• Why? - State the rationale behind producing the specific EDA or plot.

• When and Where? - Specify the context in which you utilized the EDA or plot (e.g., dataset
summary, feature selection, evaluation, etc.).

• What? - Describe what information or insights are demonstrated by the EDA or plot.

• How? - Explain how the EDA or plot contributes to achieving your goal or objective in using it.

Timeline:

1. Nov 6th: project proposal due (required if choosing your own project)

2. Nov 19th: slide submission

3. Nov 20th and Nov 27th: presentations in lecture (The order of presentations will be
randomly assigned to each group.)

4. Nov 30th: report and prediction submission

Example:
i) Figure:

[Plot titled "Neural Network": y-axis "errors" (0.0 to 0.8); x-axis categories error1, error23, error21, error31, error32, overall; legend "method": original, NP-adjusted]

Figure 1: The figure is used for XXXX(your goal)XXX and appears at XXXX(the place)XXXX (e.g.,
feature selection step). It plots XXXX (the content) XXXX (e.g., the distributions of approximate errors
for the neural network approach and the NP-adjust classifier. “error1”, “error23”, “error21”, “error31”,
“error32”, “overall” correspond to R1⋆, R2⋆, P2(Ŷ = 1), P3(Ŷ = 1), P3(Ŷ = 2) and P(Ŷ ≠ Y), respectively.)
The plot shows that XXXX(the message)XXXX (e.g., the NP-adjust method has a powerful control on
error1 and error23 but has slightly higher overall classification errors. Therefore, ....).

ii) EDA:

Neural Network
Method        Error1   Error23   Error21   Error31   Error32   Overall
classical     0.403    0.153     0.370     0.404     0.304     0.520
NP-adjusted   0.164    0.087     0.666     0.683     0.141     0.552

Table 1: The table is used for XXXX(your goal)XXX and appears at XXXX(the place)XXXX. It reports
XXXX (the content) XXXX (e.g., the averages of approximate errors for ....). The results show that
XXXX(the message)XXXX.
