Population – is the entire group of individuals you want to study, and a sample is a subset of that group.
Representative Sample – a subset of the population that has the same characteristics as the population.
Parameter – is a quantitative characteristic of the population that you are interested in estimating or
testing (such as a population mean or proportion).
Statistic – is a quantitative characteristic of a sample that often helps estimate or test the population
parameter (such as a sample mean or proportion).
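The difference between a parameter and a statistic can be illustrated with a small sketch. The population values, the sample size of 10, and the fixed random seed are all hypothetical choices made for illustration only.

```python
import random
import statistics

# Hypothetical population: the whole numbers 1 through 100
population = list(range(1, 101))

# Parameter: a characteristic of the entire population
population_mean = statistics.mean(population)

# Draw a simple random sample of 10 values (seed fixed so the run is repeatable)
rng = random.Random(42)
sample = rng.sample(population, 10)

# Statistic: the same characteristic computed from the sample only
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean}")
print(f"Sample mean (statistic):     {sample_mean}")
```

The sample mean will usually differ somewhat from the population mean, which is why it is treated as an estimate of the parameter rather than as the parameter itself.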
Data – a set of observations (a set of possible outcomes). Most data can be put into two groups:
• Categorical Variable – a variable that takes on values that are names or labels
• Numerical Variable – a variable that takes on values that are indicated by numbers
Descriptive Statistics – are summary values (such as a mean, median, or proportion) you get when you analyze a set of data.
Statistical Inference – refers to using your data (and its descriptive statistics) to make conclusions about
the population.
Table – contains quantitative data organized into rows and columns with categorical labels.
Graph – (or chart) is primarily used to show relationships among data and portrays values encoded as
visual objects (e.g., lines, bars, or points). Numerical values are displayed along the axes providing scales.
Probability – a number between zero and one (inclusive) that gives the likelihood that a specific event will
occur.
Proportion – (or percentage) the number of successes divided by the total number in the sample.
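As a minimal sketch of the proportion calculation, the snippet below counts successes in a sample and divides by the sample size. The function name `sample_proportion` and the response data are hypothetical.

```python
def sample_proportion(outcomes):
    """Return the number of successes divided by the total number in the sample."""
    if not outcomes:
        raise ValueError("sample must not be empty")
    successes = sum(1 for outcome in outcomes if outcome)
    return successes / len(outcomes)

# Hypothetical sample of 8 responses; True marks a success
responses = [True, False, True, True, False, True, False, True]

p_hat = sample_proportion(responses)
print(f"proportion = {p_hat:.3f} ({p_hat:.1%})")  # 5 successes out of 8
```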
• Data visualization involves producing images that communicate relationships among the represented data to
viewers of the images. It makes complex data more accessible, understandable, and usable.
• Data visualization is viewed as a branch of descriptive statistics by some, but also as a tool by
others.
Pie Charts – show you how a whole is divided into different parts
Line Charts – show you how numbers have changed over time; very useful when data points are connected
and expected to reveal trends
Box and Whisker Plot in Excel – is an exploratory chart used to show statistical highlights and the distribution
of the data set. This chart is used to show a five-number summary of the data. The five-number summary
consists of the minimum value, first quartile, median, third quartile, and maximum value.
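The five-number summary behind a box and whisker plot can be sketched in a few lines. The function name `five_number_summary` and the data are hypothetical, and the "inclusive" quartile method is an assumption: quartile conventions differ between tools, so the values may not match a given spreadsheet's output exactly.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, first quartile, median, third quartile, and maximum."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" is an assumed convention; other tools may differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

# Hypothetical data set
scores = [7, 1, 9, 3, 5, 2, 8, 4, 6]

summary = five_number_summary(scores)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

In a box and whisker plot, the box spans the first to the third quartile, the line inside the box marks the median, and the whiskers extend toward the minimum and maximum.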
4.3 CORRELATION
Correlation – denoted by r, measures the strength and direction of the linear association between two variables.
The value of r is always between -1 and 1 inclusive. The R-squared value, denoted by R², is called the Coefficient of
determination. It measures the proportion of variation in the dependent variable that can be attributed
to the independent variable. The value of R² is always between 0 and 1 inclusive.
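A minimal sketch of how r and R² can be computed by hand is given below. The function name `pearson_r` and the data (hours studied and test scores) are hypothetical. Squaring r to obtain R² holds for a simple linear regression with one independent variable.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (independent) and test score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
r_squared = r ** 2  # valid for one independent variable

print(f"r  = {r:.4f}")
print(f"R² = {r_squared:.4f}")
```

With this data, about 60% of the variation in the scores can be attributed to the hours studied.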
Interpreting Pearson r
Value of r Strength
One of the easiest ways to detect a potential multicollinearity problem is to look at a correlation
matrix and visually check whether any of the variables are highly correlated with each other.
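The correlation-matrix check can be sketched as follows. The predictor names, their values, and the 0.8 cutoff for "highly correlated" are hypothetical; the source does not state a threshold.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical independent variables for a house-price model
predictors = {
    "size": [50, 60, 70, 80, 90],
    "rooms": [2, 3, 3, 4, 5],
    "age": [30, 5, 20, 10, 25],
}

THRESHOLD = 0.8  # assumed cutoff for "highly correlated"

# Build the correlation matrix as a nested dictionary
names = list(predictors)
matrix = {
    a: {b: pearson_r(predictors[a], predictors[b]) for b in names}
    for a in names
}

# Print the matrix for a visual check
print(" " * 6 + "".join(f"{name:>8}" for name in names))
for a in names:
    print(f"{a:>6}" + "".join(f"{matrix[a][b]:8.2f}" for b in names))

# Flag each pair of different variables whose |r| exceeds the cutoff
flagged = [
    (a, b, matrix[a][b])
    for a, b in combinations(names, 2)
    if abs(matrix[a][b]) > THRESHOLD
]
for a, b, value in flagged:
    print(f"Potential multicollinearity: {a} and {b} (r = {value:.2f})")
```

Here only size and rooms are flagged, since the two variables move together closely while age is only weakly related to either.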
Regression Analysis – finds the equation of the line that best describes the relationship between two
variables to help make accurate or reliable predictions.
Another, more complex equation can be used to arrive at the estimates of the y-intercept and slope. A quicker
and more effective way to calculate them is by using Data Analysis.
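For readers who want to see the calculation itself, here is a minimal sketch using the standard least-squares formulas: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the y-intercept is the mean of y minus the slope times the mean of x. Whether these are the exact equations the notes refer to is an assumption, and the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x has no variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) and test score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")

# Use the fitted line to predict the score for 6 hours of study
predicted = intercept + slope * 6
print(f"Predicted score for 6 hours: {predicted:.2f}")
```

Predictions are most reliable for x values inside the range of the observed data; 6 hours lies just outside it, so the result should be read with caution.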