INTRODUCTION TO DATA ANALYSIS
Rosal Jane G. Ruda-Bayor

WHAT IS DATA ANALYSIS?

Data Analysis
Data analysis is the process of inspecting, presenting and reporting data in a way that is useful to non-technical people. Data analysis uses probability and statistical tools to analyze data from a sample.

Probability and Statistics
While data analysts observe trends and patterns in data, statistics validates those theories using the scientific process.

Hence, data analysis acts as a translator between numbers and figures and the people who need to know about them.

INTRODUCTION TO STATISTICS

STATISTICS DEFINED
Statistics is defined as the “science of collecting, organizing,
summarizing, and analyzing information to draw a conclusion
or answer questions”. It provides a measure of confidence in any
conclusion.

Collection of information

Organization and summary of information

Analysis to draw conclusions

Statistics is also about where numbers come from and how closely they reflect reality.

Reports should be based on a measure of confidence.

STATISTICS: DEALING WITH DATA

INFORMATION = DATA

Data is defined as a “fact or proposition used to draw a conclusion or make a decision”. It can be numerical or nonnumerical. It describes characteristics of an individual.

• Data is POWERFUL: proper data analysis can be used to disprove unfounded claims.
• Data is multidimensional: a good statistical analysis knows how to deal with lurking variables.
• Data is varied: statistics helps us understand variability and its sources.

“In mathematics, when a problem is solved correctly, the results can be reported with 100% certainty. In statistics, the results do not have 100% certainty.”

Understanding concepts in probability, statistics and data analysis will give us the ability to analyze and criticize information.

SAMPLE AND POPULATION

Suppose you want to study the number of hours MSU-IIT students spend on social media
(Facebook, Twitter, TikTok, etc.).

You interviewed 150 students and asked them how much time they spend on social media every
day. The results indicate a mean of 4.25 hours per day with a standard deviation of 2.88 hours.

The population is the complete collection of subjects or things in which we are interested. It is the entire group to be studied.

A sample is a subset of the population. The size of the population is denoted N, and the size of the sample is denoted n, where n ≤ N.

STATISTIC AND PARAMETER

A parameter is a characteristic of a complete collection of subjects or things in which we are interested (the population). Parameters are often unknown and may need to be estimated from a statistic.

A statistic is a characteristic of a subset of the population of interest (a sample). Statistics are often used to estimate parameter values.

Parameters vs. statistics:
• Parameter = θ; statistic = θ̂
• Population proportion = p; sample proportion = p̂
• Population mean = μ; sample mean = x̄
• Population standard deviation = σ; sample standard deviation = s

Often, Greek letters are used to denote parameters, and “decorated” letters with a “hat” or a “bar” are statistics.
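To make the notation concrete, here is a minimal Python sketch using the standard-library `statistics` module. It assumes a simulated population of N = 1,000 students (hypothetical values, not real MSU-IIT data) and a hypothetical threshold of "more than 4 hours per day" for the proportion. The parameters μ, σ and p are computed from the whole population; the statistics x̄, s and p̂ are computed from a random sample of n = 150.

```python
import random
import statistics

# Simulated population (hypothetical values, NOT survey data): daily social-media
# hours for N = 1,000 students.
rng = random.Random(7)
population = [round(rng.uniform(0, 10), 1) for _ in range(1000)]

# Parameters: computed from the entire population.
mu = statistics.mean(population)        # population mean (mu)
sigma = statistics.pstdev(population)   # population standard deviation (sigma), divides by N
p = sum(h > 4 for h in population) / len(population)  # population proportion using more than 4 h/day

# Statistics: computed from a random sample of n = 150 students.
sample = rng.sample(population, 150)
x_bar = statistics.mean(sample)         # sample mean (x-bar)
s = statistics.stdev(sample)            # sample standard deviation (s), divides by n - 1
p_hat = sum(h > 4 for h in sample) / len(sample)      # sample proportion (p-hat)

print(f"mu    = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}   s     = {s:.2f}")
print(f"p     = {p:.3f}  p_hat = {p_hat:.3f}")
```

Drawing a different sample changes x̄, s and p̂, while μ, σ and p stay fixed for a given population; that is the practical difference between a statistic and a parameter.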

STATISTICAL INFERENCE
Statistical inference is the process of using information from a sample to form a conclusion about unknown population characteristics.

DESCRIPTIVE AND INFERENTIAL STATISTICS

DESCRIPTIVE
• Organizes, summarizes and presents the data in a meaningful manner
• Shown through graphs, charts and tables
• Describes data which is already known
• Tools: measures of central tendency (mean/median/mode)

INFERENTIAL
• Compares, tests, and predicts future outcomes or makes estimates
• Utilizes probability scores
• Tries to make a conclusion about the population that is beyond the data
• Tools: hypothesis test, ANOVA, goodness-of-fit test, etc.

Inferential statistics extends the results of your sample to your population. This generalization contains uncertainty because a sample cannot tell us everything about a population.
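The descriptive tools named above (mean, median and mode) are available in Python's standard-library `statistics` module. A minimal sketch on ten made-up values (hypothetical daily social-media hours, for illustration only):

```python
import statistics

# Hypothetical daily social-media hours for ten students (illustrative values only).
hours = [2, 3, 3, 4, 4, 4, 5, 6, 7, 9]

mean = statistics.mean(hours)      # arithmetic average
median = statistics.median(hours)  # middle value (average of the two middle values when n is even)
mode = statistics.mode(hours)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
# prints: mean = 4.7, median = 4.0, mode = 4
```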

CONCEPT CHECK
A survey of 100 individuals ages 18-65 showed that 63%
believe they are poor.

Population:
Sample:
Parameter:
Statistic:

PROCESS OF STATISTICS
1. Identify the research objective. A researcher must determine the questions he or she wants answered. The questions must clearly identify the population that is to be studied.
2. Collect the data needed. Collecting data on the whole population is impractical and expensive, so data are usually collected from a sample. However, appropriate data collection techniques must be followed.
3. Describe the data. Describe the data collected using numerical and visual tools. This gives us an overview of the data and can help us determine which statistical tools to use for inference.
4. Perform inference. Apply the appropriate techniques to extend the results obtained from the sample to the population, and report a level of reliability of the results.

QUALITATIVE AND QUANTITATIVE VARIABLES

Variables – characteristics of an individual within the population.

Variable Types
Qualitative
• Variables which are non-measurable characteristics of an individual.
• Classification based on some attribute or characteristic.
• Example: hair color, address, gender, rating
Quantitative
• Variables whose values result from counting or measuring something.
• Can be discrete or continuous.
• Example: weight, amount of rain, height, temperature

Variables are not constant; their values vary from one individual to another.

DISCRETE AND CONTINUOUS VARIABLES

Quantitative variables can be further classified into discrete or continuous.

Quantitative Variable Types
Discrete
• Has either a finite number of possible values or a countable number of possible values.
• Cannot take on every possible value between any two values.
• Value is determined from counting.
• Example: number of children in a family, number of students in a class
Continuous
• Has an infinite number of possible values that are not countable.
• Can take on every possible value between any two values.
• Value is determined from measurement.
• Example: weight, amount of rain, height

VARIABLES
• Qualitative
• Quantitative
  • Discrete
  • Continuous

LEVELS OF MEASUREMENT
To establish relationships between variables, researchers must observe the variables and record
their observations. This requires that the variables be measured. The process of measuring a
variable requires a set of categories called a scale of measurement and a process that classifies
each individual into one category.
Levels of Measurement

1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different. (Ex. eye color, brand)
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of the difference between two individuals. They allow the values to be arranged or ranked. (Ex. letter grade)
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale. It has the properties of the ordinal level of measurement, but the differences between values also have meaning. (Ex. temperature)
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements. (Ex. height, weight)
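The interval/ratio distinction can be checked with a little arithmetic. This Python sketch uses made-up temperatures and heights: a ratio of Celsius temperatures changes when the same temperatures are expressed in Fahrenheit (the zero point is arbitrary), while a ratio of heights is the same in centimeters and inches (zero means no height at all).

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

def cm_to_inches(cm: float) -> float:
    return cm / 2.54

# Interval scale: 20 °C is not "twice as warm" as 10 °C; the ratio depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)   # 68 / 50

# Ratio scale: 180 cm is twice 90 cm, and stays twice after converting to inches.
ratio_cm = 180 / 90
ratio_inches = cm_to_inches(180) / cm_to_inches(90)

print(f"temperature: {ratio_celsius:.2f} (Celsius) vs {ratio_fahrenheit:.2f} (Fahrenheit)")
print(f"height:      {ratio_cm:.2f} (cm) vs {ratio_inches:.2f} (inches)")
# prints:
# temperature: 2.00 (Celsius) vs 1.36 (Fahrenheit)
# height:      2.00 (cm) vs 2.00 (inches)
```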

OBSERVATIONAL AND EXPERIMENTAL STUDIES

EXAMPLE

Cellular Phones and Brain Tumors

❖ In a study by Benson et al. (2013), the researchers followed a sample of middle-aged women in the United Kingdom for 7 years. The researchers compared the women who never used a mobile phone with those who used one and found no significant difference in the incidence rate of brain tumors between the two groups.
❖ Researchers from the United States National Toxicology Program conducted a study to address the concern of brain tumor incidence due to radio-frequency radiation (RFR). Since it is unethical to purposely expose humans to a potential carcinogen, rats were used. 90 rats were randomly assigned to three possible groups: control group, GSM-modulated RFR and CDMA-modulated RFR. Although brain tumors occurred in Groups 2 and 3, their incidence was not statistically different from that of the control group.

OBSERVATIONAL STUDY VS. EXPERIMENT

Once the research objective is determined, the researcher develops a method for collecting data.

Basis of Collecting Data

An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study.

If a researcher randomly assigns the individuals in a study to groups, intentionally manipulates the value of an explanatory variable, controls other explanatory variables at fixed values, and then records the value of the response variable for each individual, the study is a designed experiment.

EXAMPLE

Do Flu Shots Benefit Seniors?

Researchers wanted to determine the long-term benefits of the
influenza vaccine on seniors aged 65 years and older by looking at
records of over 36,000 seniors for 10 years. The seniors were divided
into two groups. Group 1 consisted of seniors who chose to get a flu
vaccination shot, and group 2 consisted of seniors who chose not to get
a flu vaccination shot. After observing the seniors for 10 years, it was
determined that seniors who get flu shots are 27% less likely to be
hospitalized for pneumonia or influenza and 48% less likely to die
from pneumonia or influenza.

Source: Kristin L. Nichol, MD, MPH, MBA, James D. Nordin, MD, MPH, David B. Nelson, PhD, John P.
Mullooly, PhD, Eelko Hak, PhD. “Effectiveness of Influenza Vaccine in the Community-Dwelling
Elderly,” New England Journal of Medicine 357:1373–1381, 2007.

WHICH IS BETTER? OBSERVATIONAL OR EXPERIMENT

Several factors may have contributed to the results of the response variable.
Confounding in a study occurs when the effects of two or more explanatory variables are not
separated. Therefore, any relation that may exist between an explanatory variable and the response
variable may be due to some other variable or variables not accounted for in the study.

In our example, the lower hospitalization and death rates could be caused by factors other than the flu shot, such as race, gender, etc.

Confounding is often caused by a lurking variable. A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables considered in the study.

Observational studies do not allow a researcher to claim causation, only association.
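A small simulation can show how a lurking variable produces an association without causation. The Python sketch below is entirely hypothetical and does not use the Nichol et al. data: a made-up "health-conscious" trait raises the chance of getting a flu shot and, separately, lowers the chance of hospitalization, while the shot itself has no effect in the model. All probabilities are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)

# Each simulated senior: (health_conscious, got_shot, hospitalized).
# Hypothetical model: the lurking variable drives BOTH the shot and hospitalization;
# getting the shot has no effect on hospitalization at all.
seniors = []
for _ in range(36_000):
    health_conscious = rng.random() < 0.5
    got_shot = rng.random() < (0.8 if health_conscious else 0.3)
    hospitalized = rng.random() < (0.05 if health_conscious else 0.10)
    seniors.append((health_conscious, got_shot, hospitalized))

def hospitalization_rate(group):
    """Fraction of a group that was hospitalized."""
    return sum(hosp for _, _, hosp in group) / len(group)

shot = [s for s in seniors if s[1]]
no_shot = [s for s in seniors if not s[1]]
print(f"overall: shot {hospitalization_rate(shot):.3f} vs no shot {hospitalization_rate(no_shot):.3f}")

# Comparing within each level of the lurking variable: the apparent "benefit" disappears
# (the two rates differ only by chance).
for hc in (True, False):
    stratum = [s for s in seniors if s[0] == hc]
    with_shot = hospitalization_rate([s for s in stratum if s[1]])
    without_shot = hospitalization_rate([s for s in stratum if not s[1]])
    print(f"health_conscious={hc}: shot {with_shot:.3f} vs no shot {without_shot:.3f}")
```

Under these assumptions the shot group looks healthier overall (roughly 6% vs 9% hospitalized), yet within each stratum the rates are essentially equal. An analysis that ignored the lurking variable would wrongly credit the shot; random assignment in a designed experiment would break the link between the trait and the shot.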

WHICH IS BETTER? OBSERVATIONAL OR EXPERIMENT

Designed experiments are used whenever control of certain variables is possible and desirable.
This type of research allows researchers to identify certain cause and effect relationships
among variables in the study.

Reasons why observational studies are conducted instead of designed experiments:

• Ethics
• Greater timeliness, lower cost and broader range of patients

A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from that of a second explanatory variable in the study.

Main difference between lurking and confounding variables:

• Lurking variables are not considered in the study.
• Confounding variables are considered in the study, but their effects on the response variable cannot be separated from those of other explanatory variables.

TYPES OF OBSERVATIONAL STUDIES

Cross-sectional study – a type of study which collects information about individuals at a specific point in time or over a very short period of time.

Case-control study – These studies are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic may be matched with those who do not.
Disadvantages: accuracy of the information being recalled, truthfulness of the respondents

Cohort study – A cohort study first identifies a group of individuals to participate in the study (the
cohort). The cohort is then observed over a long period of time. During this period, characteristics
about the individuals are recorded and some individuals will be exposed to certain factors (not
intentionally) and others will not. At the end of the study the value of the response variable is recorded
for the individuals. It is prospective in nature.
Disadvantages: individuals may not continue in the study; expensive

Census – list of all individuals in a population along with certain characteristics of each individual.
SAMPLING METHODS

SAMPLING
Random sampling is the process of using chance to select individuals from a population to be
included in the sample.

If convenience is used to obtain a sample, the results of the survey are meaningless.

• Simple random sampling: every possible sample of size n has an equally likely chance of occurring.
• Stratified sampling: obtained by separating the population into nonoverlapping groups called strata, then obtaining a random sample from each stratum.
• Systematic sampling: obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
• Cluster sampling: obtained by selecting all individuals within a randomly selected collection or group of individuals.
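Simple random and stratified sampling can be sketched with Python's `random` module. This is a minimal illustration, assuming a hypothetical population of 300 student IDs tagged with a year level (1 to 3) as the stratifying variable, and equal allocation of 4 students per stratum; allocating in proportion to stratum size would be an equally valid choice.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Every subset of size n is equally likely: sample() draws without replacement."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Split the population into nonoverlapping strata, then take a simple random sample from each."""
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical population: 300 (student_id, year_level) pairs, 100 students per year level.
students = [(sid, sid % 3 + 1) for sid in range(300)]
rng = random.Random(2024)

srs = simple_random_sample(students, 12, rng)
strat = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=4, rng=rng)
print("SRS:       ", srs)
print("Stratified:", strat)
```

The simple random sample may, by chance, over- or under-represent a year level; the stratified sample guarantees 4 students from each.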

SYSTEMATIC SAMPLING

1. Approximate the population size, N.
2. Determine the sample size desired, n.
3. Compute N/n and round down to the nearest integer. This value is k.
4. Randomly select a number between 1 and k. Call this number p.
5. The sample will consist of the following individuals: p, p + k, p + 2k, …, p + (n − 1)k
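The five steps translate almost line for line into code. A minimal Python sketch, assuming the population is already available as an ordered list (so N is known exactly rather than approximated) and using a hypothetical list of 100 households:

```python
import random

def systematic_sample(population, n, rng=random):
    """Systematic sampling; positions are 1-based, as in the steps above."""
    N = len(population)                          # step 1: population size
    k = N // n                                   # step 3: N/n rounded down
    p = rng.randint(1, k)                        # step 4: random start between 1 and k (inclusive)
    positions = [p + i * k for i in range(n)]    # step 5: p, p+k, ..., p+(n-1)k
    return [population[pos - 1] for pos in positions]   # convert 1-based positions to list indices

households = [f"H{i:03d}" for i in range(1, 101)]   # hypothetical list, N = 100
chosen = systematic_sample(households, 8, random.Random(1))
print(chosen)
```

Because p ≤ k, the last position p + (n − 1)k is at most nk ≤ N, so the selection never runs past the end of the list.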

CLUSTER SAMPLING

Important questions to ask in cluster sampling

1. How do I cluster a population?
2. How many clusters do I sample?
3. How many individuals should be in each cluster?

If clusters are homogeneous, it is better to have more clusters with fewer individuals in each cluster.

Heterogeneous clusters likely resemble the heterogeneity of the population.
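A minimal Python sketch of cluster sampling, assuming a hypothetical population organized into 10 class sections (the clusters) of 30 students each. Whole clusters are chosen at random, and every individual in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly select whole clusters, then include every individual in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # sort the names so a seeded draw is reproducible
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 10 class sections (clusters) of 30 students each.
sections = {f"Section {c}": [f"S{c}-{i:02d}" for i in range(1, 31)] for c in range(1, 11)}
picked = cluster_sample(sections, 3, random.Random(5))
for name, members in picked.items():
    print(name, len(members))
```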

BIAS IN SAMPLING
If the results of the sample are not representative of the population, then the sample is biased.
Sources of Bias in Sampling
1. Sampling Bias – means that the technique used to obtain the individuals in the sample tends to favor one part of the population over another. This results in undercoverage, which occurs when the proportion of one segment of the population is lower in a sample than in the population.
2. Nonresponse Bias – exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. This happens if individuals selected do not respond or cannot be contacted. Callbacks and rewards can be used to counter nonresponse.
3. Response Bias – exists when the answers on a survey do not reflect the true feelings of the respondent.
a) Interviewer error – trained interviewers can help respondents be truthful
b) Misrepresented answers – some questions may result in misrepresentation (survey of salary, etc.)
c) Wording of questions – questions should be asked in a balanced form; avoid very vague questions
d) Order of questions – prior questions may affect the way respondents answer following questions
e) Type of question – open vs. closed questions
f) Data entry error
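Nonresponse bias can be demonstrated with a short simulation. In this hypothetical Python sketch (all rates are assumptions), 30% of a population supports a proposal and a proper simple random sample of 1,000 is drawn, but supporters are assumed to answer 80% of the time and non-supporters only 40% of the time. Estimating from respondents alone overstates support, even though the sample itself was random.

```python
import random

rng = random.Random(11)

# Hypothetical population of 10,000 people: True = supports the proposal (about 30%).
population = [rng.random() < 0.30 for _ in range(10_000)]

selected = rng.sample(population, 1_000)     # a proper simple random sample ...
# ... but, by assumption, supporters respond 80% of the time and non-supporters 40%.
responses = [v for v in selected if rng.random() < (0.80 if v else 0.40)]

true_p = sum(population) / len(population)
sample_p = sum(selected) / len(selected)
respondent_p = sum(responses) / len(responses)
print(f"population: {true_p:.3f}  full sample: {sample_p:.3f}  respondents only: {respondent_p:.3f}")
```

With these assumed response rates, respondents are expected to show about 0.24 / 0.52 ≈ 46% support, far above the true 30%, which is why callbacks to nonrespondents matter.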
DESIGN OF EXPERIMENTS

CHARACTERISTICS OF AN EXPERIMENT
An experiment is a controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable. Any combination of the values of the factors is called a treatment.

• A factor is a characteristic that differentiates each group or population. A factor can have two or more levels.
• The treatment is a combination of factors and/or levels of factors.
• The response is the measured outcome taken from the experimental units.
• Treatment combinations are applied to the experimental units.

A control group serves as a baseline treatment that can be compared with other treatments.

Replication – Replication is the repetition of an experiment on more than one individual.

Blinding – Blinding is a technique, used to avoid bias, in which the subject does not know whether he or she is receiving a treatment or a placebo.
Double-blinding – neither the researcher nor the subject knows who receives the placebo.
PRINCIPLES OF EXPERIMENTAL DESIGN

Replicate
• Replicate experimental units in each treatment group to estimate variability.
• More experimental units reduce chance variability.
• Replicate the overall experiment to validate results.

Randomize
• Use chance to assign experimental units to treatments.
• Reduces bias due to unknown sources of variation.

Control
• Minimize external sources of variation among experimental units such that the only source of variation is the treatment.
• Compare two or more treatments to better understand an effect.

If the experiment concludes that there are differences among treatment groups, then the results may be referred to as statistically significant. A result is statistically significant when the observed effect is so large that it would rarely occur by chance.
EXAMPLE
• A manufacturer of a coating formulation wants to know the effect of using a coating on the corrosion rate of metal roofing.
Identify the following for the above study: factor and level, treatment, experimental units, and response.

• Factor: Coating; Levels: coating, no coating
• Treatments: Treatment 1 = no coating; Treatment 2 = with coating
• Experimental units: metal roofing
• Response: corrosion rate

What possible confounding variable can you think of that may affect the results of the study? How about HUMIDITY?
EXAMPLE
• A manufacturer of a coating formulation wants to know the effect of using a coating on the corrosion rate of metal roofing. In order to account for humidity, the metal roofs were also subjected to an atmosphere with 20% humidity and one with 80% humidity.

• Factor 1: Coating; Levels: coating, no coating
• Factor 2: Humidity; Levels: 20%, 80%
• Experimental units: metal roofing
• Response: corrosion rate

Treatments:
• 20% humidity: Treatment 1 (no coating), Treatment 2 (with coating)
• 80% humidity: Treatment 3 (no coating), Treatment 4 (with coating)
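Since a treatment is any combination of one level from each factor, the treatments can be enumerated mechanically. A minimal Python sketch using `itertools.product` on the two factors of this example; listing humidity first is a choice made here so the numbering follows the slide's Treatment 1 to 4.

```python
from itertools import product

# Factors and their levels from the coating example. Humidity is listed first so that
# the numbering follows the slide (Treatments 1-2 at 20%, Treatments 3-4 at 80%).
factors = {
    "humidity": ["20%", "80%"],
    "coating": ["no coating", "with coating"],
}

# Every combination of one level per factor is a treatment: 2 x 2 = 4 treatments.
treatments = list(product(*factors.values()))

for number, levels in enumerate(treatments, start=1):
    description = ", ".join(f"{name} = {level}" for name, level in zip(factors, levels))
    print(f"Treatment {number}: {description}")
```

Adding a third factor (for example, a second coating brand) would multiply the number of treatments again, which is why multi-factor experiments grow quickly.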


COMPLETELY RANDOMIZED DESIGN

A completely randomized design is one in which each experimental unit is randomly assigned to a
treatment.
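A minimal Python sketch of a completely randomized design, assuming 12 hypothetical roofing panels and the two coating treatments. The units are shuffled and then dealt out to the treatments in turn, which keeps the group sizes balanced while leaving each unit's treatment entirely to chance.

```python
import random

def completely_randomized_design(units, treatments, rng=random):
    """Randomly assign each experimental unit to a treatment, with (near-)equal group sizes."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

# Hypothetical: 12 metal roofing panels assigned to the 2 coating treatments.
panels = [f"panel-{i:02d}" for i in range(1, 13)]
assignment = completely_randomized_design(panels, ["no coating", "with coating"], random.Random(3))
for panel in panels:
    print(panel, "->", assignment[panel])
```

Balancing the group sizes is a design choice; drawing each unit's treatment independently is also completely randomized but can produce unequal groups.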

RANDOMIZED BLOCK DESIGN

A randomized block design is used when units share an observed characteristic that may introduce unwanted variation. Units that are homogeneous with respect to this unavoidable characteristic are grouped into blocks, and a completely randomized experiment is conducted within each block. Example: testing different brands.

Treatments must be randomly assigned within each block to avoid confounding. If variables are confounded, their treatment effects cannot be distinguished from each other.
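A minimal Python sketch of randomizing within blocks, assuming hypothetical blocks (three suppliers of roofing panels) and a complete block design in which each block contains exactly one unit per treatment. A fresh random order of treatments is drawn separately for every block.

```python
import random

def randomized_block_design(blocks, treatments, rng=random):
    """Within each block, randomly assign the treatments to that block's units (one unit per treatment)."""
    assignment = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)                      # an independent random order in every block
        for unit, treatment in zip(units, order):
            assignment[unit] = (block, treatment)
    return assignment

# Hypothetical blocks: each supplier provides one panel per treatment.
blocks = {
    "supplier A": ["A1", "A2"],
    "supplier B": ["B1", "B2"],
    "supplier C": ["C1", "C2"],
}
plan = randomized_block_design(blocks, ["no coating", "with coating"], random.Random(8))
for unit, (block, treatment) in plan.items():
    print(block, unit, "->", treatment)
```

Because every treatment appears once in every block, supplier differences affect all treatments equally and cannot be confounded with the coating effect.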

MATCHED-PAIRS DESIGN
A matched-pairs design is an experimental design in which the experimental units are paired up. The pairs are selected so that they are related in some way (for example, the same person before and after a treatment, twins, husband and wife, the same geographical location, and so on). There are only two levels of treatment in a matched-pairs design.

EXAMPLE
An educational psychologist wants to determine whether listening to music has an effect on a student’s
ability to learn. Design an experiment to help the psychologist answer the question.
Approach: Use a matched-pairs design by matching students according to IQ and gender (just in case
gender plays a role in learning with music).
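One way to carry out the psychologist's design is sketched below in Python. It is a hypothetical illustration: the students, IQ scores and genders are invented, students are paired by sorting on IQ within each gender, and a coin flip decides which member of each pair studies with music. Any unpaired student left over in a gender group is simply dropped in this sketch.

```python
import random

def matched_pairs(students, rng=random):
    """Pair students of the same gender with adjacent IQs, then randomly pick which one gets music."""
    by_gender = {}
    for s in students:
        by_gender.setdefault(s["gender"], []).append(s)
    pairs = []
    for group in by_gender.values():
        group.sort(key=lambda s: s["iq"])            # neighbors in this order have similar IQs
        for a, b in zip(group[0::2], group[1::2]):   # an unpaired leftover student is dropped
            music, silence = (a, b) if rng.random() < 0.5 else (b, a)   # coin flip within the pair
            pairs.append({"music": music["name"], "no music": silence["name"]})
    return pairs

# Hypothetical students (names, IQs and genders are made up for illustration).
students = [
    {"name": "Ana", "iq": 102, "gender": "F"}, {"name": "Bea", "iq": 118, "gender": "F"},
    {"name": "Cara", "iq": 104, "gender": "F"}, {"name": "Dina", "iq": 117, "gender": "F"},
    {"name": "Eli", "iq": 95, "gender": "M"}, {"name": "Finn", "iq": 121, "gender": "M"},
    {"name": "Gio", "iq": 97, "gender": "M"}, {"name": "Hugo", "iq": 120, "gender": "M"},
]
pairs = matched_pairs(students, random.Random(4))
for pair in pairs:
    print(pair)
```

After the learning task, the analysis would compare the two scores within each pair (for example, the difference "music minus no music"), which removes much of the variation due to IQ and gender.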
NEXT TOPIC
Descriptive Statistics
