Statistics Reviewer
Descriptive statistics consists of organizing and summarizing data. It describes data
through numerical summaries, tables, and graphs.
Inferential statistics uses methods that take a result from a sample, extend it to the population, and
measure the reliability of the result. One goal is to use statistics to estimate parameters.
[Figure: Process of statistics. The universe/population is described by a parameter; the sample of individuals drawn from it is described by a statistic.]
Note: If the entire population is studied, inferential statistics is not needed, because the descriptive
statistics already give the complete (100%) results for the population.
Distinction between qualitative and quantitative variables
Qualitative variables- variables that yield a categorical response. The value is a word or a code that represents a
class or category. What type? Ex: gender, state, hair color
Quantitative variables- take on numerical values representing an amount or quantity. How many, how
much, or how often? Ex: age, BMI, number of kids
Discrete- has either a finite number of possible values or a countable number of possible values. It takes on
distinct, countable values, so the value is obtained by counting.
Continuous- has an infinite number of possible values that are not countable. It takes on any value within
a range, and the number of possible values within that range is infinite, so the value is obtained by
measuring rather than counting.
Level of measurement
The level of measurement identifies which type of scale your data represent. This matters because
different statistics are appropriate for different scales of measurement.
Nominal level- data that consist of names, labels, or categories only. The data cannot be arranged in an
ordering scheme. Nominal scales have no numerical value. It is a categorical scale: a scale that classifies a
person or object into only one category. Ex: Method of payment (cash, check, debit card, credit card), Type
of school (public vs. private), Eye color (blue, green, brown)
Ordinal level- data that may be arranged in some order, but differences between data values either
cannot be determined or are meaningless. It is a scale that classifies and ranks, from highest to lowest or
lowest to highest, but does not indicate how much higher or how much better. Ex: Food preferences, rank of a
military officer, socioeconomic class (upper, middle, lower)
Interval level- measurement level that not only classifies and orders the measurements, but it also
specifies that the distances between each interval on the scale are equivalent along the scale from low
interval to high interval. A value of zero does not mean the absence of the quantity. Arithmetic
operations such as addition and subtraction can be performed on values of the variable. Ex:
Temperature on a Fahrenheit/Celsius thermometer, trait anxiety scores, IQ scores
Ratio level- has the properties of the interval level of measurement, and the ratios of the values of the
variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such
as multiplication and division can be performed on the values of the variable. Ex: Height and weight,
time, distance and speed
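The difference between the interval and ratio levels can be checked numerically. The sketch below is only an illustration: the helper name celsius_to_kelvin and the temperatures 10 and 20 are assumptions, not part of the reviewer. It shows that differences in Celsius are meaningful, while ratios are not, because zero Celsius is not the absence of temperature.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0  # illustrative temperatures in Celsius (assumed values)

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: 20/10 = 2 in Celsius, but the same two temperatures in Kelvin
# give a ratio of about 1.035, so "twice as hot" has no meaning in Celsius.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 3))
```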
Assessment
I. A research objective is presented. For each, identify the (A) population and (B) sample in
the study.
1. A polling organization contacts 2141 male university graduates who have a white-collar
job and asks whether or not they had received a raise at work during the past 4 months.
2. Every year the PSA releases the Current Population Report based on a survey of 50,000
households. The goal of this report is to learn the demographic characteristics, such as
income, of all households within the Philippines
II. Indicate whether the following statements require the use of descriptive or inferential
statistics.
1. A teacher wants to know the attitudes of all students towards abortion. Descriptive
2. A market analyst of a sales firm draws a chart showing the sales figures of a given
product for the period 2006-2007. Descriptive
3. A forecaster predicts the results of an election using the number of votes cast in 15 out
of 25 barangays. Inferential
4. Men are better in math than women. Inferential
5. Forty percent of the employees of an organization were recorded tardy for at least 15
working days. Descriptive
6. There are very few gender-related occupations. Inferential
7. An accountant predicts the accuracy rate of a client’s financial resources. Inferential
8. A quality control manager wishes to check production output. Descriptive
9. Records indicated that 75% of the faculty in the graduate school are doctoral degree
holders. Descriptive
10. There is no relationship between educational qualification of parents and academic
achievement of their children. Inferential
III. Identify the qualitative and quantitative variables and indicate the highest level of
measurement required in each. If quantitative, classify whether discrete or continuous.
1. Occupation Qualitative-Nominal
2. Number of government officials Quantitative-Discrete-Ratio
3. Favorite color Qualitative-Nominal
4. Temperature in Celsius degrees Quantitative-Continuous-Interval
5. Type of school Qualitative-Nominal
6. Volume of mineral water sold daily Quantitative-Continuous-Ratio
7. Employee number Qualitative-Nominal
8. Civil status Qualitative-Nominal
9. Zip code numbers Qualitative-Nominal
10. Brands of soft drinks Qualitative-Nominal
11. Socioeconomic status Qualitative-Ordinal
12. Employment status Qualitative-Nominal
13. Number of vehicles registered Quantitative-Discrete-Ratio
14. Jersey Number Qualitative-Nominal
15. Number of employees collecting retirement benefits from GSIS Quantitative-Discrete-Ratio
Data collection is the process of gathering and measuring information on variables of interest, in an
established systematic fashion that enables one to answer stated research questions, test hypotheses,
and evaluate outcomes.
Data collection should be planned and carried out properly in order to arrive at reliable data. If more than one
person is involved in the data collection but the data collectors do not follow consistent data collection
practices, they can end up with data that have different units, collection processes, and variable names.
3. Determine the method to be used in data gathering and define the comprehensive data collection
points.
5. Collect data
Primary Sources- provide a first-hand account of an event or time-period and are authoritative. They
represent original thinking, reports on discoveries or events, or they can share new information. Often
these sources are created at the time the events occurred, but they can also include sources that are
created later. They are usually the first formal appearance of original research.
Secondary Sources- offer an analysis, interpretation or a restatement of primary sources and are
considered to be persuasive. They often involve generalization, synthesis, interpretation, commentary,
or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to
describe or explain primary sources.
1. Direct personal interviews - The researcher has direct contact with the interviewee. The
researcher gathers information by asking questions to the interviewee
2. Indirect/Questionnaire Method - The researcher gathers information through written responses to a
prepared set of questions, without direct contact with the respondent.
How to construct a research questionnaire:
2. Pre-existing questionnaires- in the course of your literature review, pay careful attention to how
others are measuring the concept you want to measure. They may have already tested the reliability
and validity of a measure.
An open-ended question is a type of question that does not include response categories. This type of
question is usually appropriate for collecting subjective data.
A closed-ended question is a type of question that includes a list of response categories from which the
respondent will select an answer. This type of question is usually appropriate for collecting objective
data. Forms of closed-ended questions:
a. two-way questions
b. multiple-choice questions
c. checklist
d. ranking
Contingency questions (filter questions)- a special type of closed-ended question that applies only to a
subgroup of respondents.
4. Consider the audience
C. Clarify details
D. Avoid double-barreled questions. Example of a double-barreled item: Nicki Minaj is fun and helpful.
E. Avoid double-negative wording. Example of double-negative wording: Jerikamae did not meet your expected
customer service
Type of scales
G. Avoid hidden assumptions and contingencies. Ex: "In the past weeks, how often did you skip school?" assumes that the respondent skipped school.
6. Ordering
Funnel Sequence - questions take a progressively narrower scope, moving from general to specific. It is
used to ascertain something about the respondent’s frame of reference on a topic and to prevent
specific questions from biasing the respondent’s initial overall view.
Inverted Funnel Sequence - specific questions on a topic are asked first, and these eventually lead to a
more general question, so that the respondent thinks through his or her attitude before reaching an
overall evaluation. The interviewer begins with narrow, closed-ended questions and gradually moves to
broader, open-ended questions.
■ proofread
Review and pilot test the survey. Talk through the survey questions with potential respondents, ask
colleagues to review them, and/or select a few potential respondents and ask them to complete the
survey and provide feedback on the content.
3. A focus group is a group interview of approximately six to twelve people who share similar
characteristics or common interests. A facilitator guides the group based on a predetermined set of
topics.
4. Experiment is a method of collecting data where there is direct human intervention on the conditions
that may affect the values of the variable of interest. Bear in mind that the experimental method has
several limitations that you should be aware of:
- Ethical, moral, and legal concerns
- Unrealistic controlled environments
- Inability to control for all variables
1. Level of precision- also called sampling error, the level of precision is the range in which the
true value of the population is estimated to be.
2. Confidence level- a statistical measure of the number of times out of 100 that results can
be expected to fall within a specified range. For example, a confidence level of 90% means
that results are expected to fall within that range 90% of the time.
The critical value can be found in the t-table (one-tailed), using α = (1 − confidence level)/2 and df = sample size − 1.
3. Degree of variability- depending upon the target population and attributes under consideration,
the degree of variability varies considerably. The more heterogeneous a population is, the larger
the sample size is required to get an optimum level of precision.
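• Estimating the Mean or Average- The sample size required to estimate the population mean µ with
a given level of confidence and a specified margin of error e is given by
n = (Z·σ / e)², where Z is the z-score corresponding to the level of confidence and σ is the population standard deviation.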
Example data set: 46, 69, 32, 60, 52, 41; mean x̄ = (46+69+32+60+52+41)/6 = 300/6 = 50
x     x − x̄    (x − x̄)²
46    -4       16
69    19       361
32    -18      324
60    10       100
52    2        4
41    -9       81
Sum            886
s = √(886 / (6 − 1)) = 13.31 (sample standard deviation)
σ = √(886 / 6) = 12.15 (population standard deviation)
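The two standard deviations for this data set can be reproduced with Python's standard statistics module. This is a minimal sketch for checking the arithmetic; the variable names are chosen here for illustration.

```python
import statistics

data = [46, 69, 32, 60, 52, 41]

mean = statistics.mean(data)                  # 300 / 6
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

s = statistics.stdev(data)       # sample standard deviation, divides by n - 1
sigma = statistics.pstdev(data)  # population standard deviation, divides by n

print(mean, sum_sq, round(s, 2), round(sigma, 2))
```

The only difference between the two results is the divisor: n − 1 when the data are a sample, n when they are the whole population.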
Example of estimating the mean or average for determining sample size:
A soft drink machine is regulated so that the amount of drink dispensed is approximately
normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if
we wish to be 95% confident that our sample mean will be within 0.03 ounce from the true mean.
Solution: The z-score for a 95% confidence level in the z-table is 1.96, so n = (1.96 × 0.5 / 0.03)² ≈ 1067.11, which is rounded up to n = 1068.
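The soft drink machine example can be checked with a few lines of Python. This is a sketch, not part of the original reviewer: the function name sample_size_for_mean is hypothetical, and it assumes the usual formula n = (Z·σ / e)² with the result rounded up to the next whole number.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """Sample size to estimate a mean: n = (z * sigma / e)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)

# Values from the example: 95% confidence (z = 1.96), sigma = 0.5 oz, e = 0.03 oz.
n = sample_size_for_mean(z=1.96, sigma=0.5, e=0.03)
print(n)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.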
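• Estimating proportion (infinite population)- the sample size required to obtain a confidence interval
for p with specified margin of error e is given by
n = Z²·p(1 − p) / e²
Where: Z is the z-score corresponding to the level of confidence, e is the level of precision, and p is the
population proportion. There is a dilemma in this formula: p is the very quantity the study is trying to
estimate. When no prior estimate is available, p = 0.5 is used, because it gives maximum variability and
therefore the largest required sample size.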
Suppose we are doing a study on the inhabitants of a large town and want to find out how many
households serve breakfast in the mornings. We don’t have much information on the subject to begin
with, so we’re going to assume that half of the families serve breakfast: this gives us maximum
variability. So p = 0.5. We want 99% confidence and at least 1% precision.
Solution: The z-score for a 99% confidence level in the z-table is 2.58, so n = (2.58)²(0.5)(0.5) / (0.01)² = 16,641 households.
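The breakfast example can be worked through in Python as a check. This is a sketch under stated assumptions: the function name sample_size_for_proportion is hypothetical, and the formula assumed is n = Z²·p(1 − p) / e² for a large (effectively infinite) population.

```python
import math

def sample_size_for_proportion(z, p, e):
    """Sample size to estimate a proportion: n = z^2 * p * (1 - p) / e^2, rounded up."""
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round to 6 decimals first so floating-point noise does not push
    # an exact answer up by one when the ceiling is taken.
    return math.ceil(round(raw, 6))

# Values from the example: 99% confidence (z = 2.58), p = 0.5, e = 0.01.
n = sample_size_for_proportion(z=2.58, p=0.5, e=0.01)
print(n)

# Any other p gives a smaller n, which is why p = 0.5 is the safe default.
print(sample_size_for_proportion(z=2.58, p=0.3, e=0.01))
```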
• Slovin’s Formula- Slovin’s formula is used to calculate the sample size n given the population size N and
margin of error e. It is computed as
n = N / (1 + N·e²)
A researcher plans to conduct a survey about food preference of BS Stat students. If the population of
students is 1000, find the sample size if the error is 5%.
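A short Python sketch can carry this example through. The function name slovin is hypothetical, and the formula assumed is the common statement of Slovin's formula, n = N / (1 + N·e²), with the result rounded up.

```python
import math

def slovin(population_size, error):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole unit."""
    raw = population_size / (1 + population_size * error ** 2)
    # Round to 6 decimals first to absorb floating-point noise before the ceiling.
    return math.ceil(round(raw, 6))

# Values from the example: N = 1000 students, e = 5%.
n = slovin(population_size=1000, error=0.05)
print(n)  # 1000 / 3.5 = 285.71..., rounded up
```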
1. Reduced Cost
4. Greater Scope
5. Convenience
6. Necessity
7. Ethical Considerations
1. Probability Sample - Samples are obtained using some objective chance mechanism, thus involving
randomization. They require the use of a complete listing of the elements of the universe called the
sampling frame. The probabilities of selection are known. They are generally referred to as random
samples. They allow drawing of valid generalizations about the universe/population.
2. Non-probability Sample - Samples are obtained haphazardly, selected purposively, or taken as
volunteers. The probabilities of selection are unknown. They should not be used for statistical inference.
Simple Random Sampling- the most basic method of drawing a probability sample. It assigns equal
probabilities of selection to each possible sample of the same size and results in a simple random sample.
Procedure
3. The sample will consist of the elements corresponding to the number selected
Advantage: It is very simple and easy to use.
Disadvantage: It needs a list of all elements in the population. The sample size must be very large for a
heterogeneous population in order to get reliable results. The sample chosen may be distributed over a
wide geographic area, which leads to high transportation cost.
When to use: This is preferable if the population is not widely spread geographically. It is also
more appropriate if the population is more or less homogeneous with respect to the
characteristic under study.
Simple random samples
3. Preplan how to select a sequence of digits from the table so that no bias enters into the selection
process.
6. Select as samples those items in the lot corresponding to the random numbers.
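The same selection can be sketched in Python, with a pseudo-random number generator standing in for the table of random numbers. The function name simple_random_sample, the frame of 100 numbered elements, and the seed are all assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)     # a seed makes the draw reproducible
    return rng.sample(frame, n)   # sampling without replacement

frame = list(range(1, 101))       # hypothetical sampling frame: elements numbered 1..100
sample = simple_random_sample(frame, 10, seed=1)
print(sorted(sample))
```

The sampling frame must list every element of the population, which is the main practical requirement of this design.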
Stratified Random Sampling- in stratified sampling the population is partitioned into groups, called
strata, and sampling is performed separately within each stratum.
Procedure
Advantage: Estimates are more reliable compared to SRS of the same sample size if the population has
been divided into strata whose elements are homogeneous within each stratum, but the strata are very
different from each other. Estimation of parameters for each subpopulation is easier when compared to
other sampling methods. It can facilitate the administration and supervision of data collection, especially
if the stratification variable is a geographic subdivision.
Disadvantage: It needs a list of all elements of the population, including their values on the stratification
variable. High transportation cost if elements are widely spread geographically, unless there are field
offices in each geographic area.
When to use: If the population is heterogeneous with respect to the characteristic under study. If we want to
perform separate analysis for certain subpopulations. If we wish to facilitate the administration of the
collection of data.
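A stratified draw can be sketched as a simple random sample taken separately inside each stratum. The example below is an illustration only: the function name, the year-level strata, and the use of proportional allocation (each stratum contributes in proportion to its size) are assumptions, since the notes do not specify an allocation rule.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation (assumed). With awkward stratum sizes the
        # rounded allocations may not add up exactly to n.
        n_h = round(n * len(members) / total)
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population of 100 student IDs split into three strata.
strata = {
    "first year": list(range(0, 50)),
    "second year": list(range(50, 80)),
    "third year": list(range(80, 100)),
}
sample = stratified_sample(strata, n=10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```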
Systematic Random Sampling- relies on arranging the target population according to some ordering
scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling
involves a random start and then proceeds with the selection of every kth element from then onwards.
In this case, k= (population size/sample size). It is important that the starting point is not automatically
the first in the list but is instead randomly chosen from within the first to the kth element in the list.
Procedure
4. Get the rest of the elements in the sample by taking every kth element from the random start
Advantage: Identifying the units in the sample is easy. The design does not require a list of all elements
in the population. The sample is distributed evenly over the entire population. It gives more reliable
estimates than simple random sampling when the arrangement of the elements in the sampling frame is
according to magnitude.
Disadvantage: Estimates may not be reliable when there are periodic regularities in the list. It requires
information on the arrangement of the elements in the sampling frame to determine the reliability of
the estimates
When to use: If there is no available list of elements in the population. If arrangement of the elements
in the sampling frame is according to magnitude
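The random start and the every-kth selection can be sketched in Python. The function name systematic_sample and the frame of 100 numbered elements are hypothetical, and the sketch assumes k is taken as the whole-number part of population size divided by sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start between 1 and k, then take every kth element."""
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval k = N / n (integer part)
    start = rng.randint(1, k)      # random start within the first k elements
    # Positions start, start + k, ..., start + (n - 1) * k (1-based).
    positions = range(start, start + k * n, k)
    return [frame[pos - 1] for pos in positions]

frame = list(range(1, 101))        # hypothetical ordered list of 100 elements
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```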
Cluster Sampling- a sampling procedure where the sampling unit consists of a group of
elements called a cluster. In simple one-stage cluster sampling, the clusters are selected using simple
random sampling.
Advantages: The design needs only a list of clusters and not a list of elements. Transportation and listing
costs are usually lower.
Disadvantages: Estimates are usually less reliable when compared to other sampling designs. It is not
cost-efficient if the clusters are large and the elements are homogeneous with respect to the characteristic
under study.
When to use: If there is no available list of elements. If cost is more important than reliability of the
estimates.
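One-stage cluster sampling can be sketched as a simple random sample of clusters, followed by taking every element in the chosen clusters. The function name and the barangay and household labels below are made up for illustration.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Select m clusters by simple random sampling and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # only a list of clusters is needed
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical clusters: households grouped by barangay.
clusters = {
    "Barangay A": ["A1", "A2", "A3"],
    "Barangay B": ["B1", "B2"],
    "Barangay C": ["C1", "C2", "C3", "C4"],
    "Barangay D": ["D1", "D2"],
}
chosen, elements = one_stage_cluster_sample(clusters, m=2, seed=1)
print(chosen, elements)
```

Because whole clusters are taken, the final number of elements depends on which clusters are drawn.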
Non-probability sampling
1. Accidental sampling
2. Quota sampling
3. Convenience sampling
4. Purposive sampling
5. Judgment sampling
1. Non-sampling Error
1. Non-responses
2. Interviewer Error
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias
2. Sampling Error
- Error that results from taking one sample instead of examining the whole population.
- Error that results from using sampling to estimate information regarding a population.
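Sampling error can be illustrated as the gap between a sample statistic and the population parameter. The sketch below is an assumption-laden illustration: it treats the six values 46, 69, 32, 60, 52, 41 as a whole population, and the function name sampling_error and the sample size of 3 are chosen here for the example.

```python
import random
import statistics

population = [46, 69, 32, 60, 52, 41]   # treated here as the whole population
mu = statistics.mean(population)         # population parameter (mean)

def sampling_error(sample):
    """Difference between the sample mean and the population mean."""
    return statistics.mean(sample) - mu

rng = random.Random(0)
sample = rng.sample(population, 3)       # one sample of size 3
print(sample, sampling_error(sample))

# Examining the whole population leaves no sampling error.
print(sampling_error(population))
```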