Notes Week 3

The document discusses descriptive statistics and analyzing quantitative data. It covers topics like measures of central tendency, variability, distributions, outliers, and the relationship between variables. Methods for summarizing, organizing, and drawing conclusions from data are presented.

Uploaded by

Queenie Millor
Copyright
© All Rights Reserved

Descriptive Statistics

● Statistics
- Discipline that deals with quantitative data:
- Collection
- Organization
- Analysis
- Presentation
- Drawing conclusions (inference)

Overview of Using Data: Definitions, Goals

● Data
- Facts and figures collected, analyzed, and summarized for presentation and interpretation

● Variables
- Characteristics or quantities of interest that can take on different values
● Observation
- Set of values corresponding to a set of variables
● Variation
- Differences in the values of a variable across observations
● Random Variables
- Uncertain variables: quantities whose values are not known with certainty
● Decision Variables
- Variables whose values are under the direct control of decision makers
TYPES OF DATA
● Population and sample data
Population: all elements of interest (often not feasible to collect in full)
Sample: subset of the population (can be gathered by random sampling)
● Quantitative and Categorical data
Quantitative data: numeric; arithmetic operations can be performed
Categorical data: arithmetic operations cannot be performed
● Cross-sectional and Time Series Data
Cross-sectional data: collected from several entities at the same point in time
Time series data: collected over several time periods
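A minimal Python sketch of drawing a simple random sample from a population, assuming a hypothetical population of 1,000 customer IDs (illustrative data only). `random.sample` draws without replacement, so every sampled element is distinct; the fixed seed only makes the draw reproducible.

```python
import random

# Hypothetical population: IDs of 1,000 customers (illustrative data only).
population = list(range(1, 1001))

# A seeded generator makes the draw reproducible for these notes.
rng = random.Random(42)

# Simple random sample of 50 customers, drawn without replacement.
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))  # 50 50 (all elements distinct)
```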
SOURCES OF DATA
● Experimental data
- Variable of interest is identified first
- One or more other variables are controlled or manipulated
- Goal: see how they influence the variable of interest
- Ex. COVID-19 vaccine
Variable of interest: protection from COVID-19
Variable controlled/manipulated: dosage level
● Non-experimental or Observational studies
- No attempt to control the variables of interest
- Consider time and cost: the cost of obtaining data should not exceed the savings from using the data to make better decisions
- Ex: surveys, observational studies
MODIFYING DATA IN EXCEL
● Sorting and Filtering data in Excel
- Sort: Select cells > DATA > Sort & Filter > Sort > Sort by _____ > Sort On ______ > Order ________ > OK
- Filter: Select cells > DATA > Sort & Filter > Filter > click the filter arrow
- Uncheck (Select All) to clear every box, then check the box for the data of interest
● Conditional Formatting of Data in Excel
- Select cells > HOME > Styles > Conditional Formatting > choose the preferred conditional formatting rule
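The same three spreadsheet operations can be sketched in plain Python for readers who work outside Excel. This is only an analogy, assuming hypothetical sales records (made-up data); it is not how Excel implements sorting, filtering, or conditional formatting.

```python
# Hypothetical sales records (illustrative data, not from the notes).
records = [
    {"region": "North", "sales": 1200},
    {"region": "South", "sales": 800},
    {"region": "East", "sales": 1500},
    {"region": "North", "sales": 950},
]

# Sort: like "Sort by sales, Order: Largest to Smallest".
by_sales_desc = sorted(records, key=lambda r: r["sales"], reverse=True)

# Filter: like checking only "North" under the filter arrow.
north_only = [r for r in records if r["region"] == "North"]

# Conditional formatting analogue: flag each row that meets a rule (sales > 1000).
flagged = [(r["region"], r["sales"], r["sales"] > 1000) for r in records]

for row in by_sales_desc:
    print(row["region"], row["sales"])
```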

MEASURES OF LOCATION
● Mean (Arithmetic mean)
- Average value for a variable
- Measure of central location
- Sample mean: x̄ = Σxᵢ / n (n = number of observations)
- Excel: AVERAGE(number1, [number2], …)
● Median
- Middle value when the data are arranged in ascending order (smallest to largest)
- Excel: MEDIAN(number1, [number2], …)
● Mode
- Value that occurs most frequently
- Excel: MODE.SNGL (one mode) or MODE.MULT (all modes, if there is more than one)
● Geometric mean
- Calculated by finding the nth root of the product of n values
- Ex: growth factors, analyzing growth rates in financial data
- Excel: GEOMEAN(number1, [number2], …)
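A small Python sketch of the measures of location, using the standard-library `statistics` module (Python 3.8+ for `multimode` and `geometric_mean`). The Excel pairings in the comments are approximate analogies, and the data and growth factors are illustrative.

```python
import statistics

data = [3, 5, 5, 7, 10]

mean = statistics.mean(data)        # like AVERAGE: 30 / 5 = 6
median = statistics.median(data)    # like MEDIAN: middle of sorted data = 5
mode = statistics.mode(data)        # like MODE.SNGL: 5
modes = statistics.multimode([1, 1, 2, 2, 3])  # like MODE.MULT: [1, 2]

# Geometric mean of yearly growth factors: +10%, +20%, -5%.
growth_factors = [1.10, 1.20, 0.95]
gmean = statistics.geometric_mean(growth_factors)  # like GEOMEAN
avg_growth_rate = gmean - 1  # average growth rate per period
```

The geometric mean is used for growth factors because it gives the constant factor which, applied every period, reproduces the same overall growth as the actual sequence of factors.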
MEASURES OF VARIABILITY
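● Range
- Largest value minus smallest value
- Excel: MAX(data)-MIN(data)
● Variance
- Measure of variability based on the squared deviations from the mean
- Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)
- Excel: VAR.S(number1, …) for a sample or VAR.P(number1, …) for a population
● Standard Deviation
- Positive square root of the variance
- Excel: STDEV.S(number1, …) for a sample or STDEV.P(number1, …) for a population
● Coefficient of Variation
- Indicates how large the standard deviation is relative to the mean
- (Standard deviation / Mean) × 100%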
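A Python sketch of the measures of variability using the `statistics` module. The sample and population versions differ only in the divisor (n − 1 versus n). The Excel comparisons in the comments are analogies, and the data are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)    # like MAX(data)-MIN(data): 9 - 2 = 7
s2 = statistics.variance(data)        # sample variance, like VAR.S (divides by n - 1)
sigma2 = statistics.pvariance(data)   # population variance, like VAR.P (divides by n)
s = statistics.stdev(data)            # sample standard deviation, like STDEV.S
sigma = statistics.pstdev(data)       # population standard deviation, like STDEV.P

# Coefficient of variation: sample standard deviation relative to the mean, in percent.
cv = s / statistics.mean(data) * 100
```

Here the mean is 5 and the squared deviations sum to 32. The sample variance is therefore 32/7 ≈ 4.57, and the population variance is 32/8 = 4.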

ANALYZING DISTRIBUTIONS
● Percentiles
- Value of a variable at which a specified (approximate) % of observations are below that value
- Approximately (100 – p)% of observations have values greater than the pth percentile
Excel: PERCENTILE.EXC(array, k)
array: data array
k: percentile (e.g. 0.20 for 20%)
● Quartiles
- Divide the data into 4 parts, each containing approximately ¼ (25%) of the observations
- Q1 = first quartile, 25th percentile
- Q2 = second quartile, 50th percentile (median)
- Q3 = third quartile, 75th percentile
Excel: QUARTILE.EXC(array, quart)
array: data array
quart: quartile (e.g. 1 for 1st quartile)
● Z-scores
- Measure the relative location of a value in a data set
- Help determine how far a value is from the mean, relative to the standard deviation
- zᵢ = (xᵢ − x̄) / s is the z-score for xᵢ (s = sample standard deviation)
Excel: STANDARDIZE(x, mean, standard_dev)
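A Python sketch of percentiles, quartiles, and z-scores. `percentile_exc` is a hypothetical helper. It assumes the exclusive rule (position = k(n + 1) in the sorted data, with linear interpolation), which is the convention Excel's PERCENTILE.EXC is understood to follow. Python's `statistics.quantiles` uses the same (n + 1) rule by default (`method="exclusive"`, Python 3.8+).

```python
import statistics

def percentile_exc(data, k):
    """Hypothetical helper: exclusive percentile, position = k * (n + 1),
    interpolating linearly between neighbouring sorted values."""
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1)          # 1-based position in the sorted data
    if not 0 < k < 1 or pos < 1 or pos > n:
        raise ValueError("k is outside the range this rule can estimate")
    lower = int(pos)           # whole part of the position
    frac = pos - lower         # fractional part used for interpolation
    if lower == n:
        return float(xs[-1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

def z_scores(data):
    """z_i = (x_i - sample mean) / sample standard deviation, like STANDARDIZE."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

data = [15, 20, 35, 40, 50]
p20 = percentile_exc(data, 0.20)   # 20th percentile: position 1.2 -> 16.0
q1, q2, q3 = (percentile_exc(data, p) for p in (0.25, 0.50, 0.75))
z = z_scores(data)
```

For these five values, the quartiles come out as 17.5, 35.0, and 45.0, matching `statistics.quantiles(data, n=4)`. The z-scores always sum to zero, because the deviations from the mean cancel out.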

EMPIRICAL RULE
- Applies to symmetric, bell-shaped distributions
- Can be used to determine the % of data values that are within a specified number of standard deviations of the mean
- For a bell-shaped distribution:
- Approx. 68% of data values: within 1 standard deviation of the mean
- Approx. 95% of data values: within 2 standard deviations of the mean
- Almost all (approx. 99.7%) of data values: within 3 standard deviations of the mean
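The empirical rule can be checked numerically on simulated data. This sketch assumes normally distributed values with mean 100 and standard deviation 15 (an arbitrary illustrative choice). It counts the share of values within k sample standard deviations of the sample mean. The simulated shares come out close to, but not exactly at, 68%, 95%, and 99.7%.

```python
import random
import statistics

def share_within(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return sum(abs(x - mean) <= k * s for x in data) / len(data)

# Simulated bell-shaped data (assumption: normal, mean 100, sd 15).
rng = random.Random(7)
data = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    print(k, round(share_within(data, k), 3))  # roughly 0.68, 0.95, 0.997
```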
IDENTIFYING OUTLIERS
● Extreme values
Unusually large values
Unusually small values
● Should be investigated to ensure data accuracy
● Possible reasons they exist
Incorrect recording of a data value
An observation that does not belong to the population was incorrectly included
● z-scores
Can be used to identify outliers
A value with a z-score < −3 or > +3 is treated as an outlier
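A Python sketch of the z-score rule for flagging outliers. The data are made up: 19 typical values and one suspicious entry, such as a possible recording error. The cutoff of 3 follows the rule stated in these notes.

```python
import statistics

def z_score_outliers(data, cutoff=3.0):
    """Return the values whose z-score is below -cutoff or above +cutoff."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > cutoff]

# Illustrative data: 19 typical values and one suspicious entry (50).
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
        11, 9, 10, 10, 12, 8, 10, 11, 9, 50]
print(z_score_outliers(data))  # [50]
```

With very small samples, no z-score can exceed 3, because the largest possible |z| with the sample standard deviation is (n − 1)/√n. This rule is therefore most informative for larger data sets. A flagged value should still be investigated, not deleted automatically.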

BOX PLOTS
- Graphical summary of the distribution of data
- Developed from the quartiles of a data set
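A box plot is drawn from a handful of summary numbers, which can be computed as in this Python sketch. `box_plot_summary` is a hypothetical helper. It assumes quartiles computed with `statistics.quantiles` (default "exclusive" method, Python 3.8+). Whisker and outlier conventions vary between software packages, so this sketch computes only the quartile-based box, the minimum and maximum, and the interquartile range (Q3 − Q1).

```python
import statistics

def box_plot_summary(data):
    """Hypothetical helper: min, Q1, median, Q3, max, and IQR for a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "min": min(data),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,   # length of the box: interquartile range
    }

summary = box_plot_summary([15, 20, 35, 40, 50])
print(summary)
```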

Measures of Association Between Variables


● Scatter Charts
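- Graph used for analyzing the relationship between two variables
● Covariance
- Descriptive measure of the linear association between two variables
- Excel: COVARIANCE.S(array1, array2) for a sample or COVARIANCE.P(array1, array2) for a population
● Correlation Coefficient
- Standardized measure of linear association; always between −1 and +1
- Near −1: strong negative linear relationship
- Near 0: little or no linear relationship
- Near +1: strong positive linear relationship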
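A Python sketch of sample covariance and the (Pearson) correlation coefficient, written out by hand so the formulas are visible. The data are made-up advertising and sales figures. Dividing the covariance by both standard deviations is what keeps the correlation between −1 and +1.

```python
import math

def sample_covariance(x, y):
    """Like COVARIANCE.S: sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the sample
    standard deviations (each computed as sqrt of a variable's covariance with itself)."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Illustrative data: advertising spend vs. sales (made-up numbers).
ads = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(round(sample_covariance(ads, sales), 3), round(correlation(ads, sales), 3))  # 1.5 0.775
```

The covariance of 1.5 depends on the units of both variables. The correlation of about 0.775 is unit-free and indicates a fairly strong positive linear relationship.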