
Statistics Reviewer

Uploaded by

Ahlea Labay
Copyright
© © All Rights Reserved
Statistics Reviewer

Statistics is the science of collecting, organizing, summarizing, and analyzing information (data) to draw conclusions or answer questions (the descriptive side). In addition, statistics is about providing a measure of confidence in any conclusion (the inferential side).

Data- factual information used as basis for reasoning, discussion, or calculation

Universe is the set of all entities under study.

Population is the set of all possible values of the variable.

An individual is a person or object that is a member of the population being studied.

A sample is a subset of the population.

A statistic is a numerical summary of a sample.

Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data
through numerical summaries, tables, and graphs.

A parameter is a numerical summary of a population

Inferential statistics uses methods that take a result from a sample, extend it to the population, and
measure the reliability of the result. One goal is to use statistics to estimate parameters

[Diagram labels: Universe, Sample, Population, Individuals, Parameter, Statistic]
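
The difference between a parameter and a statistic can be illustrated with a small sketch. The population of ages below and the seed are hypothetical values chosen only for illustration; they do not come from these notes.

```python
import random
import statistics

# Hypothetical population: the ages 18 to 65, one value per individual.
population = list(range(18, 66))

# Parameter: a numerical summary of the whole population.
parameter = statistics.mean(population)

# Statistic: the same summary computed from a sample only.
rng = random.Random(1)               # fixed seed so the run is repeatable
sample = rng.sample(population, 10)  # 10 individuals drawn without replacement
statistic = statistics.mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The statistic changes from sample to sample, while the parameter stays fixed. Inferential statistics measures how reliable the statistic is as an estimate of the parameter.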

Process of statistics

1. Identify the research objective- questions that identify the population to be studied
2. Collect the information needed to answer the questions- look at a sample and perform data collection
3. Organize and summarize the information- obtain an overview of the data and decide which statistical methods are appropriate for them
4. Draw conclusions from the information- the information collected is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Note: If the entire population is studied, inferential statistics is not needed, because descriptive statistics already describe 100% of the population.
Distinction between qualitative and quantitative variables

Variables- the characteristics of the individuals being studied

Qualitative variables- variables that yield a categorical response. The value is a word or a code that represents a class or category. What type? Ex: gender, state, hair color

Quantitative variables- take on numerical values representing an amount or quantity. How many, how
much, or how often? Ex: age, BMI, number of kids

Distinction between discrete and continuous variables

Quantitative variables may be classified into discrete and continuous

Discrete- has either a finite number of possible values or a countable number of possible values. It takes on distinct, countable values, so its value is obtained by counting.

Continuous- has an infinite number of possible values that are not countable. It takes on any value within a range, and the number of possible values within that range is infinite, so its value is obtained by measuring rather than counting.

Level of measurement

The level of measurement identifies the type of scale your data represent. It matters because different statistics are appropriate for different scales of measurement.

Nominal level- data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme, and nominal scales have no numerical value. It is a categorical scale that classifies a person or object into only one category. Ex: Method of payment (cash, check, debit card, credit card), Type of school (public vs. private), Eye color (blue, green, brown)

Ordinal level- data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. It is a scale that classifies and ranks, from highest to lowest or lowest to highest, but does not indicate how much higher or how much better. Ex: Food preferences, Rank of a military officer, Socioeconomic class (upper, middle, lower)

Interval level- a measurement level that not only classifies and orders the measurements but also specifies that the distances between adjacent points on the scale are equal all along the scale, from low to high. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable. Ex: Temperature on a Fahrenheit/Celsius thermometer, Trait anxiety scores, IQ scores

Ratio- It has the properties of the interval level of measurement and the ratios of the values of the
variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such
as multiplication and division can be performed on the values of the variable. Ex: Height and weight,
Time, Distance and speed
Assessment

I. A research objective is presented. For each, identify the (A) population and (B) sample in
the study.

1. A polling organization contacts 2141 male university graduates who have a white-collar
job and asks whether or not they had received a raise at work during the past 4 months.

A. Male university graduates who have a white-collar job


B. 2141 male university graduates who have a white-collar job

2. Every year the PSA releases the Current Population Report based on a survey of 50,000
households. The goal of this report is to learn the demographic characteristics, such as
income, of all households within the Philippines

A. All households within the Philippines


B. 50,000 households within the Philippines

II. Indicate whether the following statements require the use of descriptive or inferential
statistics.

1. A teacher wants to know the attitudes of all students towards abortion. Descriptive
2. A market analyst of a sales firm draws a chart showing the sales figures of a given
product for the period 2006-2007. Descriptive
3. A forecaster predicts the results of an election using the number of votes cast in 15 out
of 25 barangays. Inferential
4. Men are better in math than women. Inferential
5. Forty percent of the employees of an organization were recorded tardy for at least 15
working days. Descriptive
6. There are very few gender-related occupations. Inferential
7. An accountant predicts the accuracy rate of a client’s financial resources. Inferential
8. A quality control manager wishes to check production output. Descriptive
9. Records indicated that 75% of the faculty in the graduate school are doctoral degree
holders. Descriptive
10. There is no relationship between educational qualification of parents and academic
achievement of their children. Inferential

III. Identify the qualitative and quantitative variables and indicate the highest level of
measurement required in each. If quantitative, classify whether discrete or continuous.

1. Occupation: Qualitative-Nominal
2. Number of government officials: Quantitative-Discrete-Ratio
3. Favorite color: Qualitative-Nominal
4. Temperature in Celsius degrees: Quantitative-Continuous-Interval
5. Type of school: Qualitative-Nominal
6. Volume of mineral water sold daily: Quantitative-Continuous-Ratio
7. Employee number: Qualitative-Nominal
8. Civil status: Qualitative-Nominal
9. Zip code numbers: Qualitative-Nominal
10. Brands of soft drinks: Qualitative-Nominal
11. Socioeconomic status: Qualitative-Ordinal
12. Employment status: Qualitative-Nominal
13. Number of vehicles registered: Quantitative-Discrete-Ratio
14. Jersey number: Qualitative-Nominal
15. Number of employees collecting retirement benefits from GSIS: Quantitative-Discrete-Ratio
Data collection is the process of gathering and measuring information on variables of interest, in an
established systematic fashion that enables one to answer stated research questions, test hypotheses,
and evaluate outcomes.

Data collection should be planned and carried out properly to arrive at reliable data. If more than one person is involved in the data collection and the data collectors do not follow consistent data collection practices, they can end up with data that have different units, collection processes, and variable names.

Consequences from improperly collected data

• Inability to answer research questions accurately.

• Inability to repeat and validate the study.

• Distorted findings resulting in wasted resources.

• Misleading other researchers to pursue fruitless avenues of investigation.

• Compromising decisions for public policy.

• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data

2. Determine the data needed based on the set objectives.

3. Determine the method to be used in data gathering and define the comprehensive data collection
points.

4. Design data gathering forms to be used.

5. Collect data

Primary Sources- provide a first-hand account of an event or time-period and are authoritative. They
represent original thinking, reports on discoveries or events, or they can share new information. Often
these sources are created at the time the events occurred, but they can also include sources that are
created later. They are usually the first formal appearance of original research.

Secondary Sources- offer an analysis, interpretation or a restatement of primary sources and are
considered to be persuasive. They often involve generalization, synthesis, interpretation, commentary,
or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to
describe or explain primary sources.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The
researcher gathers information by asking questions to the interviewee
2. Indirect/Questionnaire Method - The researcher gathers information through the respondents' written answers to a prepared set of questions (a questionnaire), without direct contact with the respondent.
How to construct a research questionnaire:

1. Determine the purpose:

A. What do I need to know?

B. Why do I need to know it?

C. What will happen as a result of this questionnaire?

2. Pre-existing questionnaires- in the course of your literature review, pay careful attention to how
others are measuring the concept you want to measure. They may have already tested the reliability
and validity of a measure.

3.1 Domain of questions: Behavior, Beliefs, Evaluation


3.2 Types of questions:

An open-ended question is a type of question that does not include response categories. This type of
question is usually appropriate for collecting subjective data.

A closed-ended question is a type of question that includes a list of response categories from which the respondent will select his or her answer. This type of question is usually appropriate for collecting objective data.

a. two-way questions

b. multiple-choice questions

c. checklist

d. ranking

e. rating scale (odd-numbered or even-numbered)

Contingency questions (filter questions)- a special type of closed-ended question that applies only to a subgroup of respondents
4. Consider the audience

A. Who should I ask?

B. Choose an appropriate data collection method.

■ Mailed ■ Telephone ■ Personal (face-to-face) interview ■ Web-based

5. Write the questions. (Dos and don’ts)

A. Watch your ranges

B. Avoid abstract terms or jargon

C. Clarify details

D. Avoid double-barreled questions. Example of a double-barreled item: Nicki Minaj is fun and helpful.

E. Avoid double-negative wording. Example of double-negative wording: Jerikamae did not meet your expected customer service

F. Use of appropriate scale

Type of scales

G. Avoid hidden assumptions and contingencies. Ex: In the past weeks, how often did you skip school?

6. Ordering

Funnel Sequence - questions move from general to specific, with a progressively narrower scope. It is used to ascertain something about the respondent’s frame of reference on a topic and to prevent specific questions from biasing the respondent’s initial overall view.

Inverted Funnel Sequence - specific questions on a topic are asked first, and these eventually lead to a more general question. This helps the respondent think through his or her attitude before reaching an overall evaluation. The interviewer begins with narrow, closed-ended questions and gradually moves to broader, open-ended questions.

7. Cover letter, instructions, and layout:

A. contain some instructions for responding to questions

■ Include the title

■ Consider using a “booklet” format so it stands out from just “paper.”

■ Use quality reproduction

■ proofread

■ Limit matrix/grid questions

8. Pretest and validation

Review and pilot test the survey. Talk through the survey questions with potential respondents, ask
colleagues to review them, and/or select a few potential respondents and ask them to complete the
survey and provide feedback on the content.

3. A focus group is a group interview of approximately six to twelve people who share similar
characteristics or common interests. A facilitator guides the group based on a predetermined set of
topics.

4. Experiment is a method of collecting data where there is direct human intervention on the conditions
that may affect the values of the variable of interest. Bear in mind that the experimental method has
several limitations that you should be aware of. - Ethical, moral, and legal Concerns - Unrealistic
Controlled Environments - Inability to Control for All Variables

5. Observation is a method of collecting data on the phenomenon of interest by recording the observations made about the phenomenon as it actually happens. It involves collecting open-ended or closed-ended information without asking questions.

The secondary data can be collected by the following five methods:

1. Published report on newspaper and periodicals.

2. Financial Data reported in annual reports.

3. Records maintained by the institution.

4. Internal reports of the government departments.

5. Information from official publications.


Sample size- an appropriate sample size gives valid and accurate results while remaining economical. It is denoted by “n”.

Choosing of sample size depends on non-statistical considerations and statistical considerations.

• Non-statistical considerations – It may include availability of resources, manpower, budget, ethics, and sampling frame.

• Statistical considerations – It will include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:

1. Level of precision- also called the sampling error, it is the range in which the true value of the population is estimated to be.
2. Confidence level- a statistical measure of the number of times out of 100 that results can be expected to fall within a specified range. For example, a confidence level of 90% means that about 90 out of 100 samples drawn in the same way would give results within that range.

To find the right z – score to use, refer to the table:

Desired confidence level    Z-score
80%                         1.28
85%                         1.44
90%                         1.65
95%                         1.96
99%                         2.58

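These critical values can also be read from a one-tailed t-table, using α = (1 − confidence level)/2 and df = sample size − 1. The z-scores above correspond to the df = ∞ row of the t-table.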

3. Degree of variability- depending upon the target population and attributes under consideration,
the degree of variability varies considerably. The more heterogeneous a population is, the larger
the sample size is required to get an optimum level of precision.
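
The z-scores listed for each confidence level can be reproduced with Python's standard library. This is a minimal sketch, assuming a two-sided interval so that α/2 is left in each tail; the function name `z_score` is made up for illustration.

```python
from statistics import NormalDist

def z_score(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1 - confidence_level
    # Leave alpha/2 in the upper tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_score(level):.3f}")
```

The value for 90% comes out near 1.645, which printed tables commonly round to 1.65.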

Methods in Determining the Sample Size

• Estimating the Mean or Average- The sample size required to estimate the population mean µ at a given level of confidence with a specified margin of error e is given by

$$n = \left(\frac{Z\sigma}{e}\right)^2$$

Z is the z-score corresponding to the level of confidence.

σ is the population standard deviation.

e is the level of precision.

When σ is unknown, it is common practice to conduct a preliminary survey to determine s and use it as an estimate of σ, or to use results from previous studies to obtain an estimate of σ. When using this approach, the size of the sample should be at least 30. The formula for the sample standard deviation s is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Example data set: 46, 69, 32, 60, 52, 41, mean= (46+69+32+60+52+41)/6= 50

x      $x - \bar{x}$    $(x - \bar{x})^2$
46     -4               16
69     19               361
32     -18              324
60     10               100
52     2                4
41     -9               81
Sum                     886

$$s = \sqrt{\frac{886}{6 - 1}} \approx 13.31$$

$$\sigma = \sqrt{\frac{886}{6}} \approx 12.15$$
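
The hand computation for this data set can be checked with a short sketch. It uses only the six values from the example; the variable names are chosen for illustration.

```python
import math
import statistics

data = [46, 69, 32, 60, 52, 41]

mean = statistics.mean(data)                      # 300 / 6
squared_devs = [(x - mean) ** 2 for x in data]    # one squared deviation per value
sum_sq = sum(squared_devs)                        # total of the last table column

s = math.sqrt(sum_sq / (len(data) - 1))   # sample SD: divide by n - 1
sigma = math.sqrt(sum_sq / len(data))     # population SD: divide by n

print(mean, sum_sq, round(s, 2), round(sigma, 2))
```

The standard library functions `statistics.stdev(data)` and `statistics.pstdev(data)` return the same two values directly.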
Example of estimating the mean or average for determining sample size:

A soft drink machine is regulated so that the amount of drink dispensed is approximately
normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if
we wish to be 95% confident that our sample mean will be within 0.03 ounce from the true mean.

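Solution: The z-score for a 95% confidence level in the z-table is 1.96, σ = 0.5, and e = 0.03.

$$n = \left(\frac{(1.96)(0.5)}{0.03}\right)^2 \approx 1067.11$$

Therefore, rounding up to the next whole number, we need a sample of 1068 for our study.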
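
The soft drink machine example can be computed as follows. This is a sketch, assuming the usual convention of rounding the result up to the next whole number so that the margin of error is not exceeded; the raw value here is about 1067.11. The function name is hypothetical.

```python
import math

def sample_size_for_mean(z: float, sigma: float, e: float) -> int:
    """n = (z * sigma / e)^2, rounded up to a whole number of observations."""
    n = (z * sigma / e) ** 2
    # Round off floating-point noise first, then round up.
    return math.ceil(round(n, 9))

print(sample_size_for_mean(z=1.96, sigma=0.5, e=0.03))
```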

• Estimating proportion (infinite population)- The sample size required to obtain a confidence interval for p with a specified margin of error e is given by

$$n = \frac{Z^2\, p(1 - p)}{e^2}$$

Where: Z is the z-score corresponding to the level of confidence. e is the level of precision. p is the population proportion.

There is a dilemma in this formula: it requires p, the very quantity being estimated. In practice, p is either taken from a preliminary estimate or simply replaced by 0.5, which gives the maximum variability and therefore the largest sample size.

Example of estimating proportion and determining the sample size:

Suppose we are doing a study on the inhabitants of a large town and want to find out how many
households serve breakfast in the mornings. We don’t have much information on the subject to begin
with, so we’re going to assume that half of the families serve breakfast: this gives us maximum
variability. So p = 0.5. We want 99% confidence and at least 1% precision.

Solution: The z-score for a 99% confidence level in the z-table is 2.58, p = 0.5, and e = 0.01.

$$n = \frac{(2.58)^2 (0.5)(1 - 0.5)}{(0.01)^2} = 16641$$

Therefore, we need a sample of 16641 for our study.
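
The breakfast example can be sketched in code as well. The function name and the default p = 0.5 (maximum variability) are assumptions made for illustration, and the result is rounded up to a whole number.

```python
import math

def sample_size_for_proportion(z: float, e: float, p: float = 0.5) -> int:
    """n = z^2 * p * (1 - p) / e^2, rounded up to a whole number."""
    n = z ** 2 * p * (1 - p) / e ** 2
    # Round off floating-point noise first, then round up.
    return math.ceil(round(n, 9))

print(sample_size_for_proportion(z=2.58, e=0.01, p=0.5))
```

Using p = 0.5 gives the largest possible sample size for a given z and e, so it is the safe choice when no preliminary estimate of p is available.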

• Slovin’s Formula- Slovin’s formula is used to calculate the sample size n given the population size and the margin of error. It is computed as

$$n = \frac{N}{1 + Ne^2}$$

Where: N is the total population. e is the level of precision.

Example of slovin’s formula for determining sample size:

A researcher plans to conduct a survey about food preference of BS Stat students. If the population of
students is 1000, find the sample size if the error is 5%.

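Solution:

$$n = \frac{1000}{1 + 1000(0.05)^2} = \frac{1000}{3.5} \approx 285.71$$

Therefore, rounding up, we need a sample of 286 for the study.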
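
Slovin's formula for the BS Stat example can be written as a small function. This is a sketch with a hypothetical function name, rounding the result up to a whole number.

```python
import math

def slovin(N: int, e: float) -> int:
    """n = N / (1 + N * e^2), rounded up to a whole number."""
    n = N / (1 + N * e ** 2)
    # Round off floating-point noise first, then round up.
    return math.ceil(round(n, 9))

print(slovin(N=1000, e=0.05))
```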


Samples

Why do we use samples?

1. Reduced Cost

2. Greater Speed or Timeliness

3. Greater Efficiency and Accuracy

4. Greater Scope

5. Convenience

6. Necessity

7. Ethical Considerations

Two Types of Samples

1. Probability Sample - Samples are obtained using some objective chance mechanism, thus involving
randomization. They require the use of a complete listing of the elements of the universe called the
sampling frame. The probabilities of selection are known. They are generally referred to as random
samples. They allow drawing of valid generalizations about the universe/population.

2. Non - probability Sample - Samples are obtained haphazardly, selected purposively or are taken as
volunteers. The probabilities of selection are unknown. They should not be used for statistical inference.

Basic sampling techniques

Simple Random Sampling- The most basic method of drawing a probability sample. It assigns equal probabilities of selection to each possible sample and results in a simple random sample.

Procedure

1. List the elements and number them from 1 to N

2. Select n numbers from 1 to N, using a randomization mechanism

3. The sample will consist of the elements corresponding to the number selected

Advantage: It is very simple and easy to use.

Disadvantage: The sample chosen may be distributed over a wide geographic area. It needs a list of all elements in the population. The sample size must be very large for a heterogeneous population in order to get reliable results. Transportation cost is high if elements are widely spread geographically.

When to use: This is preferable if the population is not widely spread geographically. It is also more appropriate if the population is more or less homogeneous with respect to the characteristic under study.

Simple random samples

1. Assign a number to each item in the lot.

2. Consult a table of random numbers. The lottery method may also be used.

3. Preplan how to select a sequence of digits from the table so that no bias enters into the selection
process.

4. Select a random number in the preplanned pattern.

5. Arrange the random numbers consecutively in numerical order.

6. Select as samples those items in the lot corresponding to the random numbers.
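
The simple random sampling procedure can be sketched with a pseudo-random number generator standing in for the table of random numbers or the lottery method. The function name, the item labels, and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    N = len(population)
    # Steps 1-2: number the elements 1..N and pick n distinct numbers at random.
    numbers = rng.sample(range(1, N + 1), n)
    # Arrange the selected numbers in numerical order.
    numbers.sort()
    # Step 3: the sample is the set of elements with the selected numbers.
    return [population[i - 1] for i in numbers]

lot = [f"item-{i}" for i in range(1, 51)]   # hypothetical lot of 50 items
print(simple_random_sample(lot, n=5, seed=42))
```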

Stratified Random Sampling- in stratified sampling the population is partitioned into groups, called
strata, and sampling is performed separately within each stratum.

Procedure

1. Divide the population into nonoverlapping strata

2. Obtain a simple random sample from each stratum

3. The sample consists of the selected sample in all strata

Advantage: Estimates are more reliable compared to SRS of the same sample size if the population has been divided into strata with homogeneous elements, but the strata are very different from each other. Estimation of parameters for each subpopulation is easier compared to other sampling methods. It can facilitate the administration and supervision of data collection, especially if the stratification variable is a geographic subdivision.

Disadvantage: It needs a list of all elements of the population, including their values on the stratification variable. Transportation cost is high if elements are widely spread geographically, unless there are field offices in each geographic area.

When to use: If the population is heterogeneous with respect to the characteristic under study. If we want to perform separate analyses for certain subpopulations. If we wish to facilitate the administration of the collection of data.

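
The stratified sampling procedure can be sketched as follows. The notes do not say how many elements to take from each stratum, so the per-stratum sizes below are an assumption supplied by the caller; the function name, student labels, and seed are also hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, n_per_stratum, seed=None):
    """Simple random sample taken separately within each stratum."""
    rng = random.Random(seed)
    # Step 1: divide the population into nonoverlapping strata.
    strata = defaultdict(list)
    for element in elements:
        strata[stratum_of(element)].append(element)
    # Steps 2-3: draw an SRS from each stratum and combine the results.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

# Hypothetical population: 60 first-year and 40 second-year students.
students = [(f"S{i:03d}", "first year" if i <= 60 else "second year")
            for i in range(1, 101)]
sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum={"first year": 6, "second year": 4},
                           seed=7)
print(sample)
```

Here the sizes 6 and 4 follow the 60:40 split of the population, which is one common way to allocate the sample across strata.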
Systematic Random Sampling- relies on arranging the target population according to some ordering
scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling
involves a random start and then proceeds with the selection of every kth element from then onwards.
In this case, k= (population size/sample size). It is important that the starting point is not automatically
the first in the list but is instead randomly chosen from within the first to the kth element in the list.

Procedure

1. Assign a unique number from 1 to N to each element of the population.

2. Determine the sampling interval k.

3. Obtain the first element in the sample using a randomization mechanism.

4. Get the rest of the elements in the sample by taking every kth element from the random start

Advantage: Identifying the units in the sample is easy. The design does not require a list of all elements
in the population. The sample is distributed evenly over the entire population. It gives more reliable
estimates than simple random sampling when the arrangement of the elements in the sampling frame is
according to magnitude.

Disadvantage: Estimates may not be reliable when there are periodic regularities in the list. It requires information on the arrangement of the elements in the sampling frame to determine the reliability of the estimates.

When to use: If there is no available list of elements in the population. If arrangement of the elements
in the sampling frame is according to magnitude
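
The systematic sampling procedure can be sketched as follows, assuming the population is already in its ordered list and that k is the whole-number part of population size divided by sample size. The function name and the seed are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start within the first k elements, then every kth element."""
    rng = random.Random(seed)
    N = len(population)
    k = N // n                    # sampling interval (assumes N >= n)
    start = rng.randint(1, k)     # random start between 1 and k, inclusive
    # Positions start, start + k, start + 2k, ... for n elements in total.
    positions = range(start, start + k * n, k)
    return [population[p - 1] for p in positions]

ordered_list = list(range(1, 101))   # elements numbered 1 to 100
print(systematic_sample(ordered_list, n=10, seed=3))
```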

Cluster Sampling- a sampling procedure where the sampling unit is a group of elements called a cluster. In simple one-stage cluster sampling, the clusters are selected using simple random sampling.

Advantages: The design needs only a list of clusters and not a list of elements. Transportation and listing costs are usually lower.

Disadvantages: Estimates are usually less reliable compared to other sampling designs. It is not cost-efficient if the clusters are large and the elements are homogeneous with respect to the characteristic under study.

When to use: If there is no available list of elements. If cost is more important than reliability of the estimates.
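
One-stage cluster sampling can be sketched as follows, assuming that every element in a selected cluster is included in the sample. The barangay names, household labels, function name, and seed are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Select m whole clusters by simple random sampling."""
    rng = random.Random(seed)
    # Only a list of cluster names is needed, not a list of elements.
    chosen = rng.sample(sorted(clusters), m)
    # Every element of each chosen cluster enters the sample.
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: households grouped by barangay.
clusters = {
    "Barangay A": ["A1", "A2", "A3"],
    "Barangay B": ["B1", "B2"],
    "Barangay C": ["C1", "C2", "C3", "C4"],
    "Barangay D": ["D1", "D2"],
}
print(one_stage_cluster_sample(clusters, m=2, seed=11))
```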

How sampling is done

1. What questions are being asked of the data?

2. Determine the frequency of sampling.

3. Determine the actual frequency times.

4. Select the subgroup (sample) size

Non-probability sampling

1. Accidental sampling
2. Quota sampling
3. Convenience sampling
4. Purposive sampling
5. Judgment sampling

Sources of Errors in Sampling

1. Non-sampling Error

- Errors that result from the survey process.

- Any errors that cannot be attributed to the sample-to-sample variability.

Sources of Non-Sampling Error

1. Non-responses

2. Interviewer Error

3. Misrepresented Answers

4. Data entry errors

5. Questionnaire Design

6. Wording of Questions

7. Selection Bias

2. Sampling Error

- Error that results from taking one sample instead of examining the whole population.

- Error that results from using sampling to estimate information regarding a population.
