
Midterm Reviewer


Uploaded by

rosaljerdzjarro
Copyright
© All Rights Reserved

INTRODUCTION TO STATISTICS

Why Study Statistics?
1. Data is everywhere.
2. Statistical techniques are used to make many decisions that affect our lives.
3. No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.

THE NATURE OF STATISTICS
• Statistics is the branch of science which deals with the collection, presentation, organization, analysis and interpretation of data.
• Data is information, especially facts or numbers, collected to be examined, considered and used to help decision-making.
Example:
o The unemployment rate in the Philippines is 5.7%.
o The average lifespan of Filipinos is 69.09 years.
o In 2003, the Philippine forests comprised 7.2 million hectares. But in 2010, forest cover went down by 4.6 percent, to about 6.8 million hectares.

DEFINITION OF TERMS
• A population is a collection of all elements under consideration in a statistical study.
• e.g. ASCOT students, Asia
• A sample is a part or subset of the population.
• e.g. DFES, Philippines
• A parameter is a numerical characteristic of the population.
• e.g. The average daily allowance of ASCOT students is 100php.
• The population of Asia is 4.463B.
• A statistic is a numerical characteristic of the sample.
• e.g. The average daily allowance of DFES students is 200php.
• The population of the Philippines is 104.9 million.

AREAS OF STATISTICS
• Descriptive statistics consists of the collection, organization, summarization, and presentation of data. (e.g. mean, median, mode)
• Inferential statistics comprises those methods concerned with making predictions or inferences about a population using only the information gathered from a sample. (e.g. hypothesis testing)

VARIABLES AND MEASUREMENTS
• A variable is a characteristic or attribute of the elements in a collection which can assume different values for the different elements.
• Measurement is the process of determining the value or label of a variable for a particular experimental unit on which the variable is measured.

CLASSIFICATION OF VARIABLES
1. Qualitative (Categorical) - a variable that yields categorical responses
e.g. political affiliation, religion, course
2. Quantitative (Numerical) - a variable that takes on numerical values representing an amount or quantity.
e.g. weight, age, number of students, temperature

2 Types of Quantitative Variables
a. Discrete - a variable which can assume a finite or, at most, a countably infinite number of values; usually measured by counting or enumeration.
b. Continuous - a variable which can assume infinitely many values corresponding to the points of a line interval.

Levels of Measurement
1. The nominal level of measurement classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data.
e.g.
• A sample of college instructors classified according to subject taught (e.g., English, history, or mathematics)
• Classifying survey subjects as male or female is another example of nominal-level measurement.
2. The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
• For example, students might be ranked as superior, average, or poor.
• Other examples of ordinal data are letter grades (A, B, C, D, E, F).
3. The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
• Example: IQ is an example of such a variable. There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110.
• Temperature is another example of interval measurement, since there is a meaningful difference of 1°F between each unit, such as 72°F and 73°F.
4. The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
e.g. Ratio scales are those used to measure height, weight, area, and number of phone calls received.

Primary Data VS. Secondary Data
1. Primary Data - data measured and gathered by the researcher directly for a specific purpose.
e.g. surveys and focus group discussions
2. Secondary Data - data that are republished by a researcher for a different purpose or study other than its original intention.
e.g. Internet, newspaper articles, papers, theses of other people

Data Collection Methods
1. Survey Method - questions are asked to obtain information, either through self-administered questionnaires (sent by mail or as survey forms) or through personal interviews (by telephone or in focus group discussions).
2. Observation Method - makes possible the recording of behavior, but only at the time of occurrence. It may be participant or non-participant.
3. Experimental Method - data is obtained under controlled conditions, usually in laboratories.
4. Use of Existing Records - data from published materials like reports, personal files, historical records and government census are used.

THE FREQUENCY DISTRIBUTION TABLE
To describe situations, draw conclusions, or make inferences about events, the researcher must organize the data in some meaningful way. The most convenient method of organizing data is to construct a frequency distribution.

Raw Data
Raw data is the set of data in its original form (the original scores).
Example: Final grades of BS Forestry Students in Math.

ARRAY
An array is an arrangement of observations according to their magnitude, either in increasing or decreasing order.
Example: Scores of BS Agriculture Students in the Prelims in Math.

The Frequency Distribution Table
A frequency distribution is the organization of raw data in table form, using classes and frequencies. The frequency distribution table shows the number of items falling into each group.

2 Types of Frequency Distribution
• Categorical Frequency Distribution - used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
• Grouped Frequency Distribution - used when the range of data is large.
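A short Python sketch of an ARRAY (the raw observations sorted by magnitude) and of a categorical frequency distribution (a count of the observations in each category, here using `collections.Counter`). The scores and course labels are hypothetical and were made up only for illustration. The reviewer's own BS Forestry and BS Agriculture tables are not included in this text.

```python
from collections import Counter

# Hypothetical raw data (not from the reviewer), in the order it was collected.
raw_scores = [78, 92, 65, 88, 71, 92, 84, 59, 77, 90]
courses = ["Forestry", "Agriculture", "Forestry", "Education", "Agriculture",
           "Forestry", "Education", "Agriculture", "Forestry", "Agriculture"]

# ARRAY: the same observations arranged by magnitude.
array_ascending = sorted(raw_scores)
array_descending = sorted(raw_scores, reverse=True)
print("Array (increasing):", array_ascending)
print("Array (decreasing):", array_descending)

# Categorical frequency distribution: number of observations in each category.
freq = Counter(courses)
print(f"{'Course':<12}{'Frequency':>10}")
for course, f in freq.items():
    print(f"{course:<12}{f:>10}")
# The frequencies must add up to the total number of observations.
print(f"{'Total':<12}{sum(freq.values()):>10}")
```

A grouped frequency distribution, which is used when the range is large, also needs the classes described in the next section. A categorical one only needs one row per category.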
The Frequency Distribution Table: Definition of Terms
1. Unit measure - the smallest unit in which the data are recorded, determined by the number of decimal places the data contain (e.g., 1 for whole numbers, 0.1 for data with one decimal place).
2. Class frequency - the number of observations falling in the class.
3. Class interval - the numbers defining the class.
4. Class limits - the end numbers of the class.
5. Class boundaries - the true class limits. The lower class boundary (LCB) is defined as halfway between the lower class limit of the class and the upper class limit of the preceding class. The upper class boundary (UCB) is defined as halfway between the upper class limit of the class and the lower class limit of the next class.
6. Class size - the difference between the upper class boundaries of the class and the preceding class.
7. Class mark (CM) - the midpoint of a class interval.
8. Open-end class - a class that has no lower limit or upper limit.

Steps in Constructing a Frequency Distribution Table
1. Determine the number of items n from the raw data.
2. Determine the highest (max) and lowest (min) values.
3. Compute the range R = max - min.
4. Determine the number of classes K using the formula $K = \sqrt{n}$, where n is the number of observations.
5. Determine the approximate class size C' by the formula $C' = R/K$. Round this value to a number C that is consistent with the unit measure of the data, and use C as the class size.
6. List the class intervals, starting with the lowest value as the lower class limit of the 1st class. The other lower class limits are increased successively by the class size.
7. The upper class limit of the 1st class interval is one unit measure less than the lower class limit of the 2nd class interval. The succeeding upper class limits are increased by the class size.
8. Tally the frequencies for each class. Sum the frequencies and check the sum against the total number of observations.
9. Make sure that the 1st class interval (lowest class interval) contains the lowest value in the data set and the last class interval contains the highest value in the data set.

Example: Scores of BS Agriculture Students in the Prelims in Math with class size 6.

Homework: The ages of the top 50 wealthiest people in the world

MEASURES OF CENTRAL TENDENCY
• A measure of central tendency provides a very convenient way of describing a set of scores with a single number that describes the PERFORMANCE of the group.
• There are three commonly used measures of central tendency. These are the following:
– MEAN
– MEDIAN
– MODE

MEAN
It is the most commonly used measure of the center of data. It is also referred to as the "arithmetic average".

Computation of Sample Mean
$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Example: Sarah got a score of 76 in Math, 87 in English, 90 in Science, 86 in Filipino and 90 in Values Education. What is the mean of Sarah's scores?
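Below is a minimal Python sketch of the nine Steps in Constructing a Frequency Distribution Table. It is run on the ten scores from the Try exercise at the end of this reviewer, because the BS Agriculture example data are not included here. The steps leave two choices open, so this sketch makes two assumptions. First, $K = \sqrt{n}$ is rounded to the nearest whole number. Second, C' is rounded up, so the classes are certain to reach the highest value. The sketch also assumes whole-number data (unit measure 1). The function name `frequency_distribution` is hypothetical.

```python
import math

def frequency_distribution(data):
    """Grouped frequency distribution for whole-number data (unit measure = 1)."""
    unit = 1                                   # assumption: whole-number data
    n = len(data)                              # Step 1: number of items
    low, high = min(data), max(data)           # Step 2: lowest and highest values
    r = high - low                             # Step 3: range R = max - min
    k = round(math.sqrt(n))                    # Step 4: K = sqrt(n), rounded (assumption)
    c = math.ceil(r / k)                       # Step 5: C' = R/K, rounded up (assumption)
    rows = []
    lower = low                                # Step 6: 1st lower limit = lowest value
    while lower <= high:                       # Step 9: last class must contain the highest value
        upper = lower + c - unit               # Step 7: one unit below the next lower limit
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - unit / 2, upper + unit / 2),  # halfway to neighbouring limits
            "class_mark": (lower + upper) / 2,                   # midpoint of the interval
            "frequency": sum(lower <= x <= upper for x in data), # Step 8: tally
        })
        lower += c                             # next lower limit = previous one + class size
    # Step 8 check: the frequencies must add up to n.
    assert sum(row["frequency"] for row in rows) == n
    return c, rows

scores = [45, 44, 44, 43, 36, 31, 33, 37, 44, 40]   # scores from the Try exercise
c, table = frequency_distribution(scores)
print("Class size C =", c)
print(f"{'Limits':<10}{'Boundaries':<14}{'CM':>6}{'f':>4}")
for row in table:
    (lo, up), (lb, ub) = row["limits"], row["boundaries"]
    limits, bounds = f"{lo}-{up}", f"{lb}-{ub}"
    print(f"{limits:<10}{bounds:<14}{row['class_mark']:>6}{row['frequency']:>4}")
```

Here R = 45 - 31 = 14, K = 3, and C' = 14/3 ≈ 4.67, which is rounded up to C = 5. This gives the classes 31-35, 36-40 and 41-45.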
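The three measures listed above can be checked with Python's standard `statistics` module. This sketch applies them to Sarah's scores and also answers the Median and Mode examples that follow. The Mode example's scores are used exactly as listed. The weighted-mean function follows the WEIGHTED MEAN formula. Its grades and units are hypothetical, because Paolo Adade's GPA table is not reproduced here.

```python
import statistics

# Sarah's scores: Math, English, Science, Filipino, Values Education.
sarah = [76, 87, 90, 86, 90]

# Sample mean: x̄ = Σx / n
mean_by_formula = sum(sarah) / len(sarah)    # 429 / 5 = 85.8
print("Mean:", mean_by_formula)
print("Median:", statistics.median(sarah))
print("Mode(s):", statistics.multimode(sarah))

# Data from the Median and Mode examples in this reviewer.
median_ex1 = [13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 23, 29]   # n odd: middle score
median_ex2 = [14, 9, 5, 22, 23, 23, 40, 23, 14, 12, 23, 29]       # n even: average of two
mode_ex = [85, 89, 90, 89, 75, 79, 90, 82, 96, 99, 90]
print("Median example 1:", statistics.median(median_ex1))
print("Median example 2:", statistics.median(median_ex2))
print("Mode example:", statistics.multimode(mode_ex))

def weighted_mean(values, weights):
    """x̄ = Σ(x_i * w_i) / Σ w_i"""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grades and their units, only to illustrate the formula.
grades = [1.25, 2.0, 1.5]
units = [3, 5, 2]
print("Weighted mean (GPA):", weighted_mean(grades, units))
```

`statistics.multimode` returns a list, so a bimodal or trimodal distribution shows up as two or three values.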
WEIGHTED MEAN
The weighted mean of a data set is given by
$$\bar{x} = \frac{\sum x_i w_i}{\sum w_i}$$
where $w_i$ is the weight of each of the data points $x_i$.

Example: Find the Grade Point Average (GPA) of Paolo Adade for the first semester of the school year 2013-2014. Use the table below:

MEDIAN
• The median is the value that divides the scores in the distribution into two equal parts.
• Fifty percent (50%) of the scores lie below the median value and 50% lie above it.
• It is also known as the middle score or the 50th percentile.
Steps on how to find the Median
1. Arrange the scores (from lowest to highest or highest to lowest).
2. Take the middlemost score if n is an odd number. Take the average of the two middlemost scores if n is an even number.

Example: Find the median of the following data.
1. 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 23, 29
2. 14, 9, 5, 22, 23, 23, 40, 23, 14, 12, 23, 29

MODE
The mode or the modal score is the score or scores that occur most often in the distribution.
Types of Modes:
• Unimodal is a distribution of scores that consists of only one mode.
• Bimodal is a distribution of scores that consists of two modes.
• Trimodal is a distribution of scores that consists of three modes.
• Multimodal is a distribution of scores that consists of more than two modes.

Example: Ten ASCOT students took an exam in Mathematics and got the following scores:
85, 89, 90, 89, 75, 79, 90, 82, 96, 99, 90.
Find the mode.

GRAPHICAL REPRESENTATION OF DATA
1. Line Chart - a graphical presentation of data that is especially useful for showing trends over a period of time. It is also called a time-series graph if one variable is time.

Example: Ice Cream Sales
Day of the Week   Ice Cream Sales
Monday            $410
Tuesday           $440
Wednesday         $550
Thursday          $420
Friday            $610
Saturday          $790
Sunday            $770

2. Pie Chart - a circular graph that is useful in showing how a total quantity is distributed among a group of categories. The "pieces of the pie" represent the proportion of the total that falls into each category.

Example: Favorite Type of Movie
Type of Movie   No. of People
Comedy          4
Action          5
Romance         6
Drama           1
Sci-Fi          4
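To draw a pie chart by hand, each category's proportion of the total is converted into a central angle: proportion × 360°. Here is a small Python sketch of that arithmetic for the Favorite Type of Movie data. It prints a table and does not draw the chart.

```python
# Favorite Type of Movie data from the pie chart example.
movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "Sci-Fi": 4}

total = sum(movies.values())
print(f"{'Type of Movie':<15}{'People':>7}{'Percent':>9}{'Degrees':>9}")
for movie, count in movies.items():
    share = count / total          # proportion of the whole pie
    degrees = share * 360          # central angle of that slice
    print(f"{movie:<15}{count:>7}{share * 100:>8.0f}%{degrees:>9.1f}")
print(f"{'Total':<15}{total:>7}")
```

With 20 respondents, each person corresponds to 18° of the circle. The slices therefore run from 18° (Drama) to 108° (Romance), and together they add up to 360°.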
3. Bar Chart - consists of a series of rectangular bars. If the bars are arranged horizontally, the length of each bar represents the quantity or frequency for its category. If the bars are arranged vertically, the height of each bar represents the quantity.

Example:
Fruit        No. of People
Apple        35
Orange       30
Banana       10
Kiwifruit    25
Blueberry    40
Grapes       5

4. Pictorial Unit Chart - a pictorial chart in which each symbol represents a definite and uniform value.

Example:
Name   No. of Tennis Games Played
John   40
Sam    45
Mary   70
Alex   55

5. Scatterplots - consist of a series of data points located on the rectangular coordinate system. They are commonly used when we want to graphically inspect the trend of association between two variables (but not causation).
Example:

MEASURES OF DISPERSION

1. RANGE
• The range of a set of data values is the difference between the greatest data value and the least data value.
Example: Find the range of the numbers of ounces dispensed by Machine 1 and Machine 2 in the table below.

Soda Dispensed (ounces)
Machine 1   Machine 2
9.52        8.0
6.41        7.99
10.07       7.95
5.85        8.03
8.15        8.20

2. Standard Deviation
• The standard deviation is a measure of dispersion that is less sensitive to extreme values than the range.
• The standard deviation of a set of numerical data makes use of the amount by which each individual data value deviates from the mean.
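This short Python sketch applies the range definition to the Soda Dispensed table. For comparison, it also prints each machine's standard deviation from `statistics.stdev`. That function divides by n - 1, so treating each column as a sample is an assumption of this sketch, in line with the Procedure for Computing Standard Deviation. The helper name `data_range` is hypothetical.

```python
import statistics

# Ounces dispensed, from the Soda Dispensed table.
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.0, 7.99, 7.95, 8.03, 8.20]

def data_range(values):
    """Range = greatest data value - least data value."""
    return max(values) - min(values)

for name, data in [("Machine 1", machine_1), ("Machine 2", machine_2)]:
    print(f"{name}: range = {data_range(data):.2f} oz, "
          f"sample standard deviation = {statistics.stdev(data):.2f} oz")
```

Both measures show that Machine 2 dispenses much more consistently. Its range is 8.20 - 7.95 = 0.25 oz, compared with 10.07 - 5.85 = 4.22 oz for Machine 1.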
STANDARD DEVIATIONS FOR POPULATION AND SAMPLES
Population: If $x_1, x_2, x_3, \ldots, x_n$ is a population of n numbers with a mean of $\mu$, then the standard deviation of the population is
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
Sample: If $x_1, x_2, x_3, \ldots, x_n$ is a sample of n numbers with a mean of $\bar{x}$, then the standard deviation of the sample is
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Procedure for Computing Standard Deviation
1. Determine the mean of the n numbers.
2. For each number, calculate the deviation (difference) between the number and the mean of the numbers.
3. Calculate the square of each deviation and find the sum of these squared deviations.
4. If the data is a population, divide the sum by n. If the data is a sample, divide the sum by n - 1.
5. Find the square root of the quotient in Step 4.

Example 1: A sample of 6 students took an exam in Mathematics and got the following scores:
75, 78, 85, 90, 69, 92
Find the standard deviation.

Example 2: A sample of 8 batteries from each of 3 companies was tested. The results are shown in the table below.
Company        No. of Hours
EverSoBright   6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3
Dependable     6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2
Beacon         6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5

3. VARIANCE
• A statistic known as the variance is also used as a measure of dispersion.
• The variance for a given set of data is the square of the standard deviation of the data.

Notation for Standard Deviation and Variance
$\sigma$ is the standard deviation of a population
$\sigma^2$ is the variance of a population
$s$ is the standard deviation of a sample
$s^2$ is the variance of a sample

MEASURES OF RELATIVE POSITION
1. Z-Score
The number of standard deviations between a data value and the mean is known as the data value's z-score or standard score.

The graph below shows the download times for this movie using two different measures: the number of minutes a download time is from the mean, and the number of standard deviations the download time is from the mean.

The z-score for a given data value x is the number of standard deviations that x is above or below the mean of the data. The following formulas show how to calculate the z-score for a data value x in a population and in a sample.
$$\text{Population: } z_x = \frac{x - \mu}{\sigma}, \qquad \text{Sample: } z_x = \frac{x - \bar{x}}{s}$$

Example: Compare z-scores
Raul has taken two tests in his chemistry class. He scored 72 on the first test, for which the mean of all scores was 65 and the standard deviation was 8. He received a 60 on a second test, for which the mean of all scores was 45 and the standard deviation was 12. In comparison to the other students, did Raul do better on the first test or the second test?

Conclusion: Raul scored (72 - 65)/8 = 0.875 standard deviation above the mean on the first test and (60 - 45)/12 = 1.25 standard deviations above the mean on the second test. The z-scores indicate that, in comparison to his classmates, Raul scored better on the second test than he did on the first test.

Example:
A sample of 6 students took an exam in math: Mario got 75, Marlon got 78, Sophia got 85, Apple got 90, Aladdin got 69 and Albert got 92.
What is the z-score of Aladdin?
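A Python sketch of the five-step Procedure for Computing Standard Deviation and of the z-score formula, applied to the examples above. Example 1 and the Aladdin question use the same six scores. Each data set is treated as a sample (divide by n - 1), as the examples describe. The function names are hypothetical.

```python
import math

def standard_deviation(data, sample=True):
    """Five-step procedure: divide by n - 1 for a sample, by n for a population."""
    n = len(data)
    mean = sum(data) / n                      # Step 1: mean
    deviations = [x - mean for x in data]     # Step 2: deviation of each value
    ss = sum(d ** 2 for d in deviations)      # Step 3: sum of squared deviations
    divisor = n - 1 if sample else n          # Step 4: sample vs population
    return math.sqrt(ss / divisor)            # Step 5: square root

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 1 and the z-score example: the same six exam scores.
scores = {"Mario": 75, "Marlon": 78, "Sophia": 85, "Apple": 90, "Aladdin": 69, "Albert": 92}
values = list(scores.values())
x_bar = sum(values) / len(values)
s = standard_deviation(values)
print(f"Example 1: mean = {x_bar}, s = {s:.4f}, variance s^2 = {s ** 2:.2f}")
print(f"z-score of Aladdin = {z_score(scores['Aladdin'], x_bar, s):.4f}")

# Example 2: hours of battery life, one sample per company.
batteries = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable": [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon": [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}
for company, hours in batteries.items():
    print(f"Example 2, {company}: s = {standard_deviation(hours):.3f} hours")

# Compare z-scores: Raul's two chemistry tests (mean and SD given in the example).
print("Raul, test 1:", z_score(72, 65, 8))    # 0.875
print("Raul, test 2:", z_score(60, 45, 12))   # 1.25
```

All three battery samples have a mean of 7.0 hours. The standard deviation is what separates them: Dependable is the most consistent (s ≈ 0.72) and EverSoBright the least (s ≈ 1.33). Aladdin's score of 69 lies about 1.39 standard deviations below the sample mean of 81.5.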
2. Percentiles
pth percentile - A value x is called the pth percentile of a data set provided p% of the data values are less than x.

Percentile for a Given Data Value
Given a set of data and a data value x,
$$\text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$$

Example: Using Percentiles
In a recent year, the median annual salary for a physical therapist was 74,480php. If the 90th percentile for the annual salary of a physical therapist was 105,900php, find the percent of physical therapists whose annual salary was
a) more than 74,480php
b) less than 105,900php
c) between 74,480php and 105,900php

2. On a reading examination given to 900 students, Elaine's score of 602 was higher than the scores of 576 of the students who took the examination. What is the percentile for Elaine's score?

3. Quartiles
The three numbers Q1, Q2, and Q3 that partition a ranked data set into four (approximately) equal groups are called the quartiles of the data.

Example: For the data set below, the values Q1 = 11, Q2 = 29, and Q3 = 104 are the quartiles of the data.
2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91, 104, 159, 181, 312, 354
The quartile Q1 is called the first quartile. The quartile Q2 is called the second quartile; it is the median of the data. The quartile Q3 is called the third quartile.

Procedure for Finding Quartiles
1. Rank the data.
2. Find the median of the data. This is the second quartile, Q2.
3. The first quartile, Q1, is the median of the data values less than Q2. The third quartile, Q3, is the median of the data values greater than Q2.

Example: Use Medians to Find the Quartiles of a Data Set
The following table lists the calories per 100 milliliters of 25 popular sodas. Find the quartiles of the data.

Box-and-Whisker Plots
• A box-and-whisker plot (sometimes called a box plot) is often used to provide a visual summary of a set of data. A box-and-whisker plot shows the median, the first and third quartiles, and the minimum and maximum values of a data set.

Construction of a Box-and-Whisker Plot
1. Draw a horizontal scale that extends from the minimum data value to the maximum data value.
2. Above the scale, draw a rectangle (box) with its left side at Q1 and its right side at Q3.
3. Draw a vertical line segment across the rectangle at the median, Q2.
4. Draw a horizontal line segment, called a whisker, that extends from Q1 to the minimum, and another whisker that extends from Q3 to the maximum.

Example: Construct a Box-and-Whisker Plot
Construct a box-and-whisker plot for the data

Note:
• Box plots are popular because they are easy to construct and they illustrate several important features of a data set in a simple diagram.
• Note that from a box plot, we can easily estimate
  • the quartiles of the data,
  • the range of the data, and
  • the position of the middle half of the data, as shown by the length of the box.
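A Python sketch of the percentile formula, the Procedure for Finding Quartiles, and the five values a box-and-whisker plot needs. For Q1 and Q3, the sketch takes the values ranked below and above the median by position (the middle value is excluded when n is odd). This is an assumption: it matches "the data values less than Q2" whenever no other value ties with the median. All function names are hypothetical.

```python
def percentile_from_counts(count_below, total):
    """Percentile of a score = (number of data values less than x / total) * 100."""
    return 100 * count_below / total

def percentile_of_score(x, data):
    """Same formula, counting the values below x in a full data set."""
    return percentile_from_counts(sum(v < x for v in data), len(data))

def median(ranked):
    """Median of an already-ranked (sorted) list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 == 1 else (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(data):
    ranked = sorted(data)                 # Step 1: rank the data
    n = len(ranked)
    q2 = median(ranked)                   # Step 2: Q2 is the median
    lower_half = ranked[: n // 2]         # Step 3: values ranked below Q2...
    upper_half = ranked[(n + 1) // 2 :]   # ...and values ranked above Q2
    return median(lower_half), q2, median(upper_half)

def five_number_summary(data):
    """Minimum, Q1, Q2, Q3, maximum: the values a box-and-whisker plot displays."""
    q1, q2, q3 = quartiles(data)
    return min(data), q1, q2, q3, max(data)

print("Elaine's percentile:", percentile_from_counts(576, 900))

example = [2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91, 104, 159, 181, 312, 354]
print("Quartiles (Q1, Q2, Q3):", quartiles(example))
print("Min, Q1, Q2, Q3, Max:", five_number_summary(example))
print(f"Percentile of 104: {percentile_of_score(104, example):.1f}")
```

Elaine's score is at the 64th percentile (576/900 × 100). Plotting libraries such as matplotlib compute quartiles by interpolation and, by default, end the whiskers at 1.5 times the interquartile range instead of at the minimum and maximum. Their box plots can therefore differ slightly from the hand construction described above.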

Try.
A sample of 10 students took a 50-item exam in math and got the following scores.
Name of Student   Score      Name of Student   Score
Angela            45         Natsu             31
Robert            44         Lucy              33
Mario             44         Aang              37
Killua            43         Katara            44
Gon               36         Soka              40
1. Find the range.
2. Find the mean.
3. Find the median.
4. Find the mode.
5. Find the standard deviation.
6. Find the variance.
7. What is the z-score of Killua's score?
8. What is the percentile of Killua's score?
9. Find Q1, Q2, and Q3.
10. Make a box-and-whisker plot.

Quiz On Graphs:
https://forms.gle/ZwyMYRq33yu258Ci8
Module 4 Assessment:
https://forms.gle/qJmkhsZ2ogb42U8CA
Module 5 Assessment:
https://forms.gle/8FAMAZQM1cRqZsEk6
Module 6 Assessment:
https://forms.gle/D7BsDaqr3pyGhAxTA
Module 4 Quiz:
https://forms.gle/xD4Vp4b7QAGGqnZE6
Module 5 Quiz:
https://forms.gle/Hzf7qyc6tMVdhBrn6
Module 6 Quiz:
https://forms.gle/EVgF2LrUGqa1QaSw5
