The document discusses statistical concepts including population, sample, parameter, statistic, variable, data, descriptive statistics, statistical inference, tables, graphs, probability, proportion, data visualization techniques, correlation, correlation matrix, simple regression analysis and finding the line of best fit using different methods.
4.1 REVIEW OF COMMON STATISTICAL TERMS

Population – is the entire group of individuals you want to study, and a sample is a subset of that group.

Representative Sample – a subset of the population that has the same characteristics as the population.

Parameter – is a quantitative characteristic of the population that you are interested in estimating or
testing (such as a population mean or proportion).

Statistic – is a quantitative characteristic of a sample that often helps estimate or test the population
parameter (such as a sample mean or proportion).

Variable – is a characteristic of interest for each person or object in a population.

• Categorical Variable – a variable that takes on values that are names or labels
• Numerical Variable – a variable that takes on values that are indicated by numbers

Data – a set of observations (a set of possible outcomes). Most data can be put into two groups:

• Qualitative – an attribute whose value is indicated by a label.
• Quantitative – an attribute whose value is indicated by a number. Quantitative data are either discrete or continuous:
  • Discrete data – the result of counting (e.g., the number of Covid-19 deaths in Cebu)
  • Continuous data – the result of measuring (e.g., weight of new-born babies)

Descriptive Statistics – are single summary values you get when you analyze a set of data.

• Mean – shows the arithmetic mean of the sample data.
• Standard Error – shows the standard error of the mean (the sample standard deviation divided by
the square root of the count; a measure of how far the sample mean is likely to be from the population mean)
• Median – shows the middle value in the data set (the value that separates the larger half of the
values from the smaller half of the values)
• Mode – shows the most common value in the data set
• Standard Deviation – shows the sample standard deviation measure for the data set.
• Sample Variance – shows the sample variance for the data set (the squared standard deviation)
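• Kurtosis – shows the kurtosis of the distribution (a measure of how heavy its tails are compared with a normal distribution)
• Skewness – shows the skewness of the data set’s distribution (a measure of its asymmetry)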
• Range – shows the difference between the largest and smallest values in the data set
• Minimum – shows the smallest value in the data set
• Maximum – shows the largest value in the data set
• Sum – adds all the values in the data set together to calculate the sum
• Count – counts the number of values in a data set
• Largest (X) – shows the X-th largest value in the data set
• Smallest (X) – shows the X-th smallest value in the data set
• Confidence Level (X) Percentage – shows the margin of error of the X% confidence interval for the
mean of the data set values
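
The measures listed above can be computed directly from a list of numbers. The following is a minimal sketch using only Python's standard library; the function name `describe` and the sample data are hypothetical and chosen for illustration, and kurtosis, skewness, and the confidence level are left out to keep the example short.

```python
import math
import statistics

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": statistics.mean(values),
        "standard_error": stdev / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": stdev,
        "sample_variance": statistics.variance(values),  # squared standard deviation
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }

# Hypothetical data, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The sample versions of the variance and standard deviation are used here because the list above refers to the sample standard deviation and sample variance.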

Statistical Inference – refers to using your data (and its descriptive statistics) to draw conclusions about
the population.

Table – contains quantitative data organized into rows and columns with categorical labels.

Graph – (or chart) is primarily used to show relationships among data and portrays values encoded as
visual objects (e.g., lines, bars, or points). Numerical values are displayed along the axes providing scales.

Probability – a number between zero and one (inclusive) that gives the likelihood that a specific event will
occur.

Proportion – (or percentage) the number of successes divided by the total number in the sample.
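
As a small illustration of the definition above, a proportion can be sketched as a single division. The function name `proportion` and the survey numbers are hypothetical.

```python
def proportion(successes, total):
    """Number of successes divided by the total number in the sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    return successes / total

# Hypothetical sample: 10 of 40 surveyed students own a bicycle.
p = proportion(10, 40)
print(f"proportion = {p}, percentage = {p * 100}%")
```

Like a probability, the result always lies between zero and one (inclusive).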

4.2 DATA VISUALIZATION

Data Visualization – is the graphic representation of data.

• It involves producing images that communicate relationships among the represented data to
viewers of the images. It makes complex data more accessible, understandable, and usable.
• Data visualization is viewed as a branch of descriptive statistics by some, but also as a tool by
others.

Bar Graphs – show numbers that are independent of each other.

Pie Charts – show you how a whole is divided into different parts.

Line Charts – show you how numbers have changed over time. They are very useful when data points are
connected and expected to reveal trends.

Excel Box and Whisker Plot

Box and Whisker Plot in Excel – is an exploratory chart used to show statistical highlights and the distribution
of the data set. This chart is used to show a five-number summary of the data. The five-number summary
consists of the “Minimum Value, First Quartile Value, Median Value, Third Quartile Value, and Maximum Value”.

• Minimum Value – the minimum or smallest value from the dataset
• First Quartile Value – the middle value between the minimum value and the median value (about 25% of the values lie at or below it)
• Median Value – the middle value of the dataset
• Third Quartile Value – the middle value between the median value and the maximum value (about 75% of the values lie at or below it)
• Maximum Value – the highest value of the dataset
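
The five-number summary can also be computed outside a chart. The sketch below uses `statistics.quantiles` from Python's standard library with `method="inclusive"`, which is an assumption: quartile conventions differ between tools, so the quartiles may not match a particular Excel chart setting. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, quartiles, and maximum of a list of numbers."""
    # quantiles() with n=4 returns the three cut points Q1, median, Q3.
    # The "inclusive" method is an assumed convention, not taken from the notes.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "maximum": max(values),
    }

# Hypothetical data, for illustration only.
data = [1, 3, 5, 7, 9, 11, 13]
box = five_number_summary(data)
print(box)
```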

4.3 CORRELATION

Correlation – denoted by r, measures the strength and direction of the linear association between two variables. The value
of r is always between -1 and 1 inclusive. The R-squared value, denoted by R², is called the coefficient of
determination. It measures the proportion of variation in the dependent variable that can be attributed
to the independent variable. The value of R² is always between 0 and 1 inclusive.
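
To make the definition concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The variable names and data are hypothetical, and R² is taken as the square of r, which holds for a regression with a single independent variable.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
r_squared = r ** 2  # coefficient of determination for one independent variable
print(f"r = {r:.3f}, R-squared = {r_squared:.3f}")
```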

Interpreting Pearson r

Value of r                      Strength
-1.0 to -0.5 or 0.5 to 1.0      strong relationship
-0.5 to -0.1 or 0.1 to 0.5      weak relationship
-0.1 to 0.1                     none or very weak
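
The interpretation guide above can be sketched as a small lookup function. The ranges share their end points (0.5 and 0.1 each appear in two rows), so this sketch assumes that exactly 0.5 counts as strong and exactly 0.1 counts as none or very weak. The function name is hypothetical.

```python
def describe_strength(r):
    """Map a Pearson r value to the strength label from the guide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1 inclusive")
    size = abs(r)  # strength depends only on the magnitude, not the sign
    if size >= 0.5:   # assumed boundary choice: 0.5 is treated as strong
        return "strong relationship"
    if size > 0.1:    # assumed boundary choice: 0.1 is treated as none or very weak
        return "weak relationship"
    return "none or very weak"

for value in (-0.9, -0.3, 0.0, 0.6):
    print(value, describe_strength(value))
```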

[Figure: scatter plot with correlation r = 0.0 and R-squared = 0.0. There is no association between the variables.]

[Figure: scatter plot with correlation r = -0.3. Small negative association.]

How to Read a Correlation Matrix

1. -1 indicates a perfect negative linear correlation between two variables.
2. 0 indicates no linear correlation between two variables.
3. 1 indicates a perfect positive linear correlation between two variables.

When to use a Correlation Matrix

1. A correlation matrix conveniently summarizes a dataset.
   A correlation matrix is a simple way to summarize the correlations between all variables in a
   dataset.
2. A correlation matrix serves as a diagnostic for regression.
   One key assumption of multiple linear regression is that no independent variable in the model is
   highly correlated with another independent variable in the model. When two independent variables
   are highly correlated, this results in a problem known as multicollinearity, and it can make it hard
   to interpret the results of the regression.

   One of the easiest ways to detect a potential multicollinearity problem is to look at a correlation
   matrix and visually check whether any of the variables are highly correlated with each other.
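
Both uses can be sketched in a few lines of plain Python: build the matrix, then list the pairs of variables whose correlation is high. The cut-off of 0.8, the function names, and the data are all assumptions made for illustration; the notes do not give a specific threshold.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

def highly_correlated_pairs(columns, threshold=0.8):
    """List variable pairs whose |r| meets an assumed threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Hypothetical data: two height columns carry nearly the same information.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 22, 41, 25, 36],
}
matrix = correlation_matrix(data)
flagged = highly_correlated_pairs(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
print("possible multicollinearity:", [(a, b) for a, b, _ in flagged])
```

Pairs reported by such a check are candidates for closer inspection before both variables are used as independent variables in the same regression.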

4.4 SIMPLE REGRESSION

Regression Analysis – finds the equation of the line that best describes the relationship between two
variables to help make accurate or reliable predictions.

The estimates of the y-intercept and slope can be worked out by hand with a more complex equation. A quicker
and more practical way to calculate them is by using Data Analysis.
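
For readers who want to see what such a calculation involves, the following sketch applies the ordinary least-squares formulas for a line y = intercept + slope · x. This is one standard way to find the line of best fit and is not necessarily the exact method the notes have in mind; the function name and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and y-intercept for a simple linear regression."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

# Hypothetical data: hours studied (x) and quiz scores (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # prediction for a new x value of 6
print(f"y = {intercept:.2f} + {slope:.2f}x, prediction at x = 6: {predicted:.2f}")
```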
