Advanced Topics in Data Science

2022-2023 Academic Year

Master of Research in Economics, Finance and Management

1. Description of the subject

● Advanced Topics in Data Science (Code: 32282)
● Total credits: 6 ECTS
● Workload: 150 hours
● Term: 1st
● Type of subject: Elective
● Department of Economics and Business
● Teaching team: David Rossell
2. Teaching guide

● Introduction
Statistical and machine learning techniques are having a deep effect on many
disciplines, including Economics. Owing to the growing number of applications that
combine data science with techniques popular in Economics, many world-leading
institutions have incorporated data science into their PhD programs. Examples include
the set of lectures in the NBER Summer Institute by Guido Imbens and Susan Athey (in
particular lecture 3, www.nber.org/econometrics_minicourse_2015/), the Big Data and
machine learning syllabus of the Harvard Economics PhD program
(https://locator.tlt.harvard.edu/course/colgsas-156429), the Chicago Booth PhD
courses on Bayesian inference, Big Data and Machine Learning, and the Bocconi PhD
courses on text analysis, econometrics of networks and causal analysis.

Data analysis methods are evolving to cope with increasingly challenging problems,
a case in point being high-dimensional situations where one considers a large
number of parameters or models. For instance, one may have a regression or factor
model where the number of covariates far exceeds the sample size, or may want to
simplify the interpretation of complex data via clustering, text or latent variable
analysis, or may want to predict an outcome using flexible algorithms. Engaging
effectively in such research, either from a methodological or applied perspective,
requires one to understand, and when needed modify or extend, such methodology.
Just as importantly, researchers need to communicate such ideas effectively to a
potentially non-expert audience.

The goal of this course is to introduce students to some foundations behind these
methods, with a certain emphasis on the Bayesian framework, penalized likelihood
and latent variable methods (e.g. as used in text analysis), to expose them to
and discuss the research literature, and to practice the skills needed for applying and
presenting novel research. The learning outcomes are an improved familiarity with
selected research topics in Statistics that are relevant for Data Science, at a level
sufficient to critically appraise, modify and apply novel methods, and improved oral
and written presentation skills. The course also intends to provide students with
applied data analysis skills useful for their MRes thesis and subsequent research work.

Pre-requisites: the course is designed for MRes students who are familiar with basic
Statistical inference, specifically linear regression and maximum likelihood estimation.
Basic R programming skills are beneficial, though examples and links to learning
resources will be provided for students who are not familiar with R.

● Contents

1. Foundations. We briefly review classical results from maximum likelihood
estimation and computational methods such as the bootstrap, and we introduce
the basic Bayesian paradigm for statistical inference, its use for model selection,
parameter estimation and prediction, and standard computational tools such as
Gibbs sampling or Metropolis-Hastings.
2. Foundations of variable selection in high-dimensional regression. We review the
fundamental penalized likelihood and Bayesian frameworks for linear regression
models with a large number of variables, and for treatment effect estimation in
such settings. We shall discuss the relative merits of current strategies such as
the LASSO, the adaptive LASSO and related penalties that help decouple variable
selection from prediction, as well as various Bayesian strategies to achieve good
performance in high dimensions. We will discuss theoretical and practical
considerations and computational strategies.
3. Beyond linear regression. We will extend the earlier strategies to settings where one
considers certain forms of causal inference, simple time series models,
generalized linear models, flexible models for count data, random forests and
Bayesian additive regression trees, or capturing non-linear relationships.
4. Mixture and flexible models. We shall move towards more flexible models enabled
by the use of latent variables. We will pay some attention to mixture models and
some non-parametric methods. We shall discuss applications to
mixture-of-regressions models to account for unobserved confounders that may bias
inference, to hidden Markov models for time series and to text data analysis.
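
As an illustration of the computational tools named in item 1, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler. It is not taken from the course materials. It is written in Python purely for illustration (the course itself refers to R), and the model (a normal mean with known noise standard deviation and a zero-mean normal prior), the function names and all default values are assumptions chosen to keep the example short.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalised log-posterior of a normal mean mu.

    Assumed toy model: y_i ~ N(mu, noise_sd^2), prior mu ~ N(0, prior_sd^2).
    """
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((y - mu) / noise_sd) ** 2 for y in data)
    return log_prior + log_lik


def metropolis_hastings(data, n_iter=5000, step_sd=0.5, seed=0):
    """Return n_iter random-walk Metropolis-Hastings draws of mu."""
    rng = random.Random(seed)
    mu = 0.0  # arbitrary starting value
    current_lp = log_posterior(mu, data)
    draws = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal centred at the current value.
        proposal = mu + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            mu, current_lp = proposal, proposal_lp
        draws.append(mu)  # a rejected proposal repeats the current value
    return draws
```

For this conjugate toy model the posterior mean is available in closed form, so the average of the draws after a burn-in period can be checked against it. In realistic models no such closed form exists, which is the reason for using a sampler in the first place.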

References. The books below provide a good introduction to a substantial part of
the topics covered in this course (and many others); however, we shall complement
them with a number of additional selected research manuscripts.

● Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B.
Rubin. Bayesian Data Analysis (3rd edition). CRC Press, 2013.

● Trevor Hastie, Robert Tibshirani, Martin Wainwright. Statistical Learning with Sparsity:
The Lasso and Generalizations. CRC Press, 2015.

● Peter Bühlmann, Sara van de Geer. Statistics for High-Dimensional Data: Methods,
Theory and Applications. Springer, 2011.

● Nils Lid Hjort, Chris Holmes, Peter Müller, Stephen G. Walker. Bayesian
Nonparametrics. Cambridge University Press, 2010.

● Sylvia Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer,
2006.

Some papers on data science and economics:

● Chernozhukov, V., A. Belloni, C. Hansen (2014), "High-Dimensional Methods and
Inference on Treatment and Structural Effects in Economics", Journal of Economic
Perspectives.

● Chernozhukov, V., D. Chen, A. Belloni, C. Hansen (2012), "Sparse Models and Methods
for Instrumental Regression, with an Application to Eminent Domain",
Econometrica

● Einav, L. and J. Levin (2014), “Economics in the age of big data,” Science, 346 (6210).

● Varian, H. (2014), “Big data: new tricks for econometrics,” Journal of Economic
Perspectives, 28 (2), 3-28.

● Teaching methodology

The course will be delivered through a combination of regular lectures, computer-based
seminars where students get hands-on experience with the data analysis methods
taught, and presentations by students (on published manuscripts chosen by the
students and on the final project).

● Assessment and Grading System

Assessment consists of oral presentations of 2 research papers of the student's
choice (40% of the final mark), some selected exercises from the seminar sessions
(10% of the final mark) and a written report on the final project (50% of the final
mark). The project will be chosen by the students but must be pre-approved by the
lecturer, and should involve the application, critical assessment or extension of the
research methods seen in class. The content can be theoretical, empirical, a practical
application or a combination of these.
