Chapter 1 Big data uc3m

The document provides an overview of statistical learning, categorizing it into supervised and unsupervised learning, and discusses its applications in predicting wages, stock market movements, and gene expression data. It also outlines the historical development of statistical learning methods from the 19th century to recent advancements. Additionally, it emphasizes the relevance of statistical learning across various disciplines and the importance of applying these methods to real-world problems.

Uploaded by

100473538

1. Overview 2. History 3. Premises

Big data for Business

CHAPTER 1: INTRODUCTION

Department of Statistics
Universidad Carlos III de Madrid

Bachelor in Business Administration

Bachelor in Finance and Accounting


An overview of Statistical Learning

• Statistical learning refers to a vast set of tools for understanding data.

• These tools can be classified as:

Supervised learning: involves building a statistical model for predicting, or estimating, an output based on one or more inputs.
Unsupervised learning: there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data.


Wage data

We examine a number of factors that relate to wages for a group of males from the Atlantic region of the United States.
[Figure: Wage plotted against Age (left), Year (center), and Education Level (right).]

The Wage data involves predicting a continuous or quantitative output value. This is often referred to as a regression problem.
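As a minimal sketch of what a regression problem looks like in code, the block below fits a straight line by least squares and uses it to predict a quantitative output. The numbers are hypothetical toy values, not the Wage data, and the helper name `fit_line` is an assumption introduced only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares fit with one input: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data (NOT the Wage data): an average wage per year.
year = [2003, 2004, 2005, 2006, 2007]
wage = [100.0, 102.0, 104.0, 106.0, 108.0]

intercept, slope = fit_line(year, wage)

# Predict the quantitative output for a new input value.
predicted_2009 = intercept + slope * 2009
print(f"slope = {slope}, predicted wage in 2009 = {predicted_2009}")
```

A straight line is the simplest possible choice here. The relationship between wage and age described for this data set rises and then falls, so it would need a more flexible model than this sketch.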


Wage data

• Wage as a function of age: on average, wage increases with age until about 60 years, at which point it begins to decline.

• Wage as a function of year: there is a slow but steady increase of approximately $10,000 in the average wage between 2003 and 2009.

• Boxplots display wage as a function of education, with 1 indicating the lowest level (no high school diploma) and 5 the highest level (an advanced graduate degree). On average, wage increases with the level of education.

Stock Market data

We examine a stock market data set that contains the daily movements
in the Standard & Poor’s 500 (S&P) stock index over a 5-year period
between 2001 and 2005.
[Figure: boxplots of the percentage change in the S&P for Yesterday, Two Days Previous, and Three Days Previous, grouped by Today’s Direction (Down, Up).]

The Stock Market data involves predicting a categorical or qualitative output value. This is often referred to as a classification problem.
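To make the idea of a classification problem concrete, here is a small sketch that predicts a qualitative label ("Up" or "Down") from one numeric input. It uses a nearest-class-mean rule, chosen here only because it is short; it is not a method taken from the slides. The data are hypothetical and deliberately well separated, unlike real stock returns, and the function names are assumptions for illustration.

```python
from statistics import mean

def fit_class_means(x, labels):
    """Return the mean input value observed for each class label."""
    groups = {}
    for xi, label in zip(x, labels):
        groups.setdefault(label, []).append(xi)
    return {label: mean(values) for label, values in groups.items()}

def predict(class_means, x_new):
    """Assign x_new to the class whose mean is closest to it."""
    return min(class_means, key=lambda label: abs(x_new - class_means[label]))

# Hypothetical toy data (NOT the S&P data): a previous-day percentage
# change and the direction observed on the following day.
lag1 = [-1.0, -0.5, -1.5, 1.0, 0.5, 1.5]
direction = ["Down", "Down", "Down", "Up", "Up", "Up"]

class_means = fit_class_means(lag1, direction)
prediction = predict(class_means, 0.8)
print(class_means, prediction)
```

The output of `predict` is a category rather than a number, which is what separates classification from regression.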

Stock Market data

• The left-hand panel displays two boxplots of the previous day’s percentage
changes in the stock index.

• The two plots look almost identical, suggesting that there is no simple strategy
for using yesterday’s movement in the S&P to predict today’s returns.

• The remaining panels, which display boxplots for the percentage changes 2 and
3 days previous to today, similarly indicate little association between past and
present returns.


Gene Expression data

We consider the NCI60 data set, which consists of 6,830 gene expression measurements for each of 64 cancer cell lines.
[Figure: two scatterplots of the cell lines, Z2 against Z1.]

Instead of predicting a particular output variable, we are interested in determining whether there are groups, or clusters, among the cell lines based on their gene expression measurements. This is often referred to as a clustering problem.
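A clustering problem can be sketched with k-means, one common clustering method; the slides do not say which method produced the groups in the figure, so this choice is an assumption. The points below are hypothetical two-dimensional toy values, not the NCI60 data, and the starting centers are supplied by hand so that the result is deterministic.

```python
def kmeans(points, centers, n_iter=10):
    """Plain k-means on 2-D points, starting from the given centers."""
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        assignments = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2
                              + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[k])  # leave an empty cluster's center as is
        centers = new_centers
    return assignments, centers

# Hypothetical toy data (NOT the NCI60 data): two visibly separate groups.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]

assignments, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(assignments)
print(centers)
```

No output labels are supplied anywhere in this sketch: the groups are found from the inputs alone, which is what makes the problem unsupervised.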

Gene Expression data

• Each point corresponds to one of the 64 cell lines. Left: There appear to be four groups of cell lines, which we have represented using different colors.

• The right panel shows the same points as the left panel, except that we have represented each of the 14 different types of cancer using a different colored symbol. Cell lines corresponding to the same cancer type tend to be nearby in the two-dimensional space.

A Brief History of Statistical Learning

• At the beginning of the 19th century, Legendre and Gauss developed the
method of least squares, now known as linear regression.
• In 1936, Fisher proposed linear discriminant analysis.
• In the 1940s, various authors put forth logistic regression.
• In the 1970s, Nelder and Wedderburn developed generalized linear models.
• By the 1980s, computing technology improved sufficiently that non-linear
methods were no longer prohibitive. Breiman, Friedman, Olshen and Stone
introduced classification and regression trees.
• In 1986, Hastie and Tibshirani proposed generalized additive models.
• Inspired by the advent of machine learning and other disciplines,
statistical learning emerged as a new subfield in statistics.
• In recent years, progress has been marked by the increasing availability of
powerful and relatively user-friendly software, like R.


Four premises

• Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences.

• Statistical learning should not be viewed as a series of black boxes.

• While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box!

• Interest is focused on applying statistical learning methods to real-world problems.
