
Data Analysis

The document discusses statistical analysis and experimentation. It covers topics like descriptive statistics, inferential statistics, data collection methods, sampling techniques, surveys, hypothesis testing, research design, and experiment design. Statistical analysis is used to investigate trends, patterns and relationships in quantitative data through careful planning and valid conclusions.

Uploaded by

Audrey Margallo

DATA ANALYSIS

Statistics
- May be defined as the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to be able to draw judgments or conclusions that help in decision-making.

Statistics is divided into two main divisions:
• Descriptive Statistics deals with the procedures that organize, summarize, and describe quantitative data.
• Inferential Statistics deals with making a judgment or a conclusion about a population based on the findings from a sample that is taken from the population.

Population or Universe
- Refers to all of the subjects under a particular study.

Sample
- Any subset of a population.

Data
- Information collected on some characteristics of a population or sample.

Ungrouped Data (Raw Data)
- Data which are not organized in any specific way. They are simply the collection of data as they are gathered.

Grouped Data
- Raw data organized into groups or categories with corresponding frequencies.

Parameter
- Descriptive measure of a characteristic of a population.

Statistic
- Measure of a characteristic of a sample.

Constant
- Characteristic or property of a population or sample which is common to all members of the group.

Variable
- A characteristic or attribute that can assume different values.

ILLUSTRATION
- A subset of the population (sample) is taken for a particular study. The data gathered are analyzed in order to draw conclusions about the population. Since not all of the subjects in the population are taken for the study, there will be variation or uncertainty within the data gathered. The role of probability is to quantify these uncertainties and increase the strength or confidence in the conclusions.

Methods: Data Collection
Data can be collected in a variety of ways. One of the most common methods is through the use of surveys. Surveys can be done by using a variety of methods. Using samples saves time and money and in some cases enables the researcher to get more detailed information about a particular subject.

Samples cannot be selected in haphazard ways because the information obtained might be biased. To obtain samples that are unbiased, statisticians use four basic methods of sampling.

Methods: Random Sampling
Random samples are selected by using chance methods or random numbers.
Example:
Place numbered cards in a bowl, mix them thoroughly, and select as many cards as needed.

Methods: Systematic Sampling
Researchers obtain systematic samples by numbering each subject of the population and then selecting every nth subject.
Example:
Suppose there are 200 subjects in a population and a sample of 20 subjects is needed. Since 200 / 20 = 10, every 10th subject is selected, with the first subject chosen at random from among the first 10.

Methods: Stratified Sampling
Researchers obtain stratified samples by dividing the population into groups called strata according to some characteristic that is important to the study, then sampling from each group.
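The random, systematic, and stratified sampling methods described in these notes can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the population of 200 numbered subjects, and the odd/even strata are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n subjects purely by chance, like drawing numbered cards from a bowl."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k subjects, then take every k-th subject."""
    k = len(population) // n            # sampling interval, e.g. 200 // 20 = 10
    rng = random.Random(seed)
    start = rng.randrange(k)            # random starting index from 0 to k - 1
    return population[start::k][:n]

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

subjects = list(range(1, 201))          # hypothetical population of 200 numbered subjects
random_20 = simple_random_sample(subjects, 20, seed=1)
systematic_20 = systematic_sample(subjects, 20, seed=1)
# Hypothetical strata: odd-numbered and even-numbered subjects
stratified = stratified_sample(
    subjects, lambda s: "odd" if s % 2 else "even", per_stratum=10, seed=1
)

print(sorted(random_20))
print(systematic_20)
print(stratified)
```

The fixed seed only makes the sketch repeatable; in a real study the draw would not be seeded this way.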
PLANNING AND CONDUCTING SURVEYS
There are different ways of conducting a survey.
• Telephone Surveys
• Mailed Questionnaire Surveys
• Personal Interview

Surveys can take different forms. They can be used to ask only one question or they can ask a series of questions. We can use surveys to test out people's opinions or to test a hypothesis.

When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method.
4. Decide what questions you will ask, in what order, and how to phrase them.
5. Conduct the interview and collect the data.
6. Analyze the results by making graphs and drawing conclusions.

The products and processes in the engineering and scientific disciplines are mostly derived from experimentation.

Experiment
- Is a series of tests conducted in a systematic manner for a better understanding of an existing process or to explore a new product or process.

If time and resources were infinite, there would be no need for designing experiments. In production and quality control, we want to control the error and learn as much as we can about the process or the underlying theory with the resources at hand.

DESIGNING OF EXPERIMENTS
The practical steps needed for planning and conducting an experiment are somewhat similar to the scientific method. These include:
1. Recognition and statement of the problem
2. Choice of factors, levels, and ranges
3. Selection of the response variables
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions and making recommendations

Statistical Analysis
- Investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.
- To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.
- After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

5 STEPS: STATISTICAL ANALYSIS
Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses
The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

Statistical Hypothesis
- Formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

Null Hypothesis
- Always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Examples: Statistical hypotheses to test an effect

Null Hypothesis:
A 5-minute meditation exercise will have no effect on math test scores in teenagers.

Alternative Hypothesis:
A 5-minute meditation exercise will improve math test scores in teenagers.
PLANNING RESEARCH DESIGN
Research design
- Overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.
- First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

Experimental design
- Assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.

Correlational design
- Explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.

Descriptive design
- Study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

VARIABLES (CORRELATIONAL)
Parametric correlation test
- Can be used for quantitative data.

Non-parametric correlation test
- Can be used if one of the variables is ordinal.

5 STEPS: STATISTICAL ANALYSIS
Step 2: Collect Data from a Sample

In most cases, it's too difficult or expensive to collect data from every member of the population you're interested in studying. Instead, you'll collect data from a sample.

Statistical Analysis
- Allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis
There are two main approaches to selecting a sample.
• Probability sampling: every member of the population has a chance of being selected for the study through random selection.
• Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

Population
- Entire group that you want to draw conclusions about.

Sample
- Specific group that you will collect data from. The size of the sample is always less than the total size of the population.

POPULATION VS SAMPLE
In research, a population doesn't always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organizations, countries, species, organisms, etc.

REASONS FOR SAMPLING
• Necessity: Sometimes it's simply not possible to study the whole population due to its size or inaccessibility.
• Practicality: It's easier and more efficient to collect data from a sample.
• Cost-effectiveness: There are fewer participant, laboratory, equipment, and researcher costs involved.
• Manageability: Storing and running statistical analyses on smaller datasets is easier and more reliable.

SAMPLING ERROR
- Difference between a population parameter and a sample statistic. For example, in a study of political attitudes among undergraduate students in the Netherlands, the sampling error is the difference between the mean political attitude rating of your sample and the true mean political attitude rating of all undergraduate students in the Netherlands.
- Sampling errors happen even when you use a randomly selected sample. This is because random samples are not identical to the population in terms of numerical measures like means and standard deviations.
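To make the idea of sampling error concrete, the sketch below builds a made-up population, draws one random sample, and compares the sample mean (a statistic) with the population mean (a parameter). The population of 10,000 ratings on a 1 to 7 scale and the sample size of 100 are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 attitude ratings on a 1-7 scale
population = [rng.randint(1, 7) for _ in range(10_000)]
population_mean = statistics.mean(population)     # the parameter

sample = rng.sample(population, 100)              # a simple random sample
sample_mean = statistics.mean(sample)             # the statistic

sampling_error = sample_mean - population_mean
print(f"population mean = {population_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
print(f"sampling error  = {sampling_error:+.3f}")
```

Rerunning with a different seed gives a different sample and a different error, which is the point: the error is rarely exactly zero even though the sample is random.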
5 STEPS: STATISTICAL ANALYSIS
Step 3: Summarize your data with descriptive statistics

Once you've collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data
There are various ways to inspect your data, including the following:
• Organizing data from each variable in frequency distribution tables.
• Displaying data from a key variable in a bar chart to view the distribution of responses.
• Visualizing the relationship between two variables using a scatter plot.
By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:
• Mode: The most popular response or value in the data set.
• Median: The value in the exact middle of the data set when ordered from low to high.
• Mean: The sum of all values divided by the number of values.

Calculate measures of variability
Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:
• Range: the highest value minus the lowest value of the data set.
• Interquartile range: the range of the middle half of the data set.
• Standard deviation: the average distance between each value in your data set and the mean.
• Variance: the square of the standard deviation.

DESCRIPTIVE STATISTICS
Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (for example, age) or the relation between two variables (for example, age and creativity).

The next step is inferential statistics, which help you decide whether your data confirm or refute your hypothesis and whether it is generalizable to a larger population.

TYPES: DESCRIPTIVE STATISTICS
There are 3 main types of descriptive statistics:
• The distribution concerns the frequency of each value.
• The central tendency concerns the averages of the values.
• The variability or dispersion concerns how spread out the values are.

FREQUENCY DISTRIBUTION
A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages.

MEASURES OF CENTRAL TENDENCY
Estimate the center, or average, of a data set. The mean, median, and mode are 3 ways of finding the average.

Data Set: 15, 3, 12, 0, 24, 3
Total Number of Responses: 6
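As a small illustration of a frequency distribution, the sketch below tabulates the six-response data set from these notes (15, 3, 12, 0, 24, 3), showing each value with its count and its percentage of all responses. The table layout is an assumption made for the example.

```python
from collections import Counter

data = [15, 3, 12, 0, 24, 3]      # the data set given in the notes
n = len(data)                     # total number of responses

frequency = Counter(data)         # maps each value to how often it occurs

print("value  count  percent")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5}  {count:>5}  {100 * count / n:>6.1f}%")
```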
MEAN
It is the most commonly used method for finding the average.
To find the mean, simply add up all response values and divide this sum by the total number of responses. The total number of responses or observations is called N.

MEDIAN
It is the value that's exactly in the middle of a data set.
To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

MODE
It is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.
To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

MEASURES OF VARIABILITY
Give you a sense of how spread out the response values are. The range, standard deviation, and variance each reflect different aspects of spread.

RANGE
It gives you an idea of how far apart the most extreme response scores are. To find the range, subtract the lowest value from the highest value.

STANDARD DEVIATION
The average amount of variability in your data set. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:
1. List each score and find their mean.
2. Subtract the mean from each score to get the deviation from the mean.
3. Square each of these deviations.
4. Add up all of the squared deviations.
5. Divide the sum of the squared deviations by N - 1.
6. Find the square root of the number you found.

VARIANCE
The average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread out the data, the larger the variance is in relation to the mean.
To find the variance, simply square the standard deviation. The symbol for variance is s^2.
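The measures above can be worked through on the data set used in these notes (15, 3, 12, 0, 24, 3). The sketch below follows the six standard deviation steps by hand and then cross-checks against Python's `statistics` module; the variable names are assumptions made for the example.

```python
import math
import statistics

data = [15, 3, 12, 0, 24, 3]          # the data set from the notes
n = len(data)                         # N = 6

# Measures of central tendency
mean = sum(data) / n                  # 57 / 6 = 9.5
median = statistics.median(data)      # middle of 0, 3, 3, 12, 15, 24 -> (3 + 12) / 2 = 7.5
modes = statistics.multimode(data)    # [3]; a list, since a data set can have several modes

# Range: highest value minus lowest value
value_range = max(data) - min(data)   # 24 - 0 = 24

# Standard deviation, following the six steps
deviations = [x - mean for x in data]         # step 2: deviation from the mean
squared = [d ** 2 for d in deviations]        # step 3: square each deviation
sum_of_squares = sum(squared)                 # step 4: 421.5
variance = sum_of_squares / (n - 1)           # step 5: 421.5 / 5 = 84.3
std_dev = math.sqrt(variance)                 # step 6: about 9.18

print(mean, median, modes, value_range)
print(round(variance, 2), round(std_dev, 2))

# Cross-check against the library's sample standard deviation (also divides by N - 1)
assert math.isclose(std_dev, statistics.stdev(data))
```

Dividing by N - 1 rather than N gives the sample variance, which matches step 5 in the list above.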
UNIVARIATE DESCRIPTIVE STATISTICS
It focuses on only one variable at a time. It's important to examine data from each variable separately using multiple measures of distribution, central tendency, and spread. Programs like SPSS and Excel can be used to easily calculate these.

BIVARIATE DESCRIPTIVE STATISTICS

Bivariate analysis
- Simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis
- Same as bivariate analysis but with more than two variables.

CONTINGENCY TABLE
Each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities).

SCATTER PLOTS
- Is a chart that shows you the relationship between two or three variables. It's a visual representation of the strength of a relationship.
- In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

5 STEPS: STATISTICAL ANALYSIS
Step 4: Test Hypotheses or Make Estimates with Inferential Statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.
• Estimation: calculating population parameters based on sample statistics.
• Hypothesis testing: a formal process for testing research predictions about the population using samples.

ESTIMATION
• Point estimate: a value that represents your best guess of the exact parameter.
• Interval estimate: a range of values that represent your best guess of where the parameter lies.
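A point estimate and an interval estimate of a population mean can be sketched as follows. The sample of 40 scores is generated data, and the 95% interval uses a normal (z) approximation; both are assumptions made for illustration. With a small sample, a t-based interval would usually be preferred.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(7)    # fixed seed so the sketch is repeatable

# Hypothetical sample of 40 test scores (generated, not real data)
sample = [round(rng.gauss(75, 10)) for _ in range(40)]
n = len(sample)

# Point estimate: the sample mean is the single best guess for the population mean
point_estimate = statistics.mean(sample)

# Interval estimate: point estimate plus or minus a margin of error
standard_error = statistics.stdev(sample) / math.sqrt(n)
confidence = 0.95
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95% confidence
margin = z * standard_error
interval_estimate = (point_estimate - margin, point_estimate + margin)

print(f"point estimate:    {point_estimate:.2f}")
print(f"interval estimate: {interval_estimate[0]:.2f} to {interval_estimate[1]:.2f}")
```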
If your aim is to infer and report population characteristics from sample data, it's best to use both point and interval estimates in your paper.

HYPOTHESIS TESTING
Using data from a sample, you can test hypotheses about relationships between variables in the population.

Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

STATISTICAL TESTS
Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:
• A test statistic tells you how much your data differ from the null hypothesis of the test.
• A p value tells you the likelihood of obtaining your results, or more extreme ones, if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:
• Comparison tests assess group differences in outcomes.
• Regression tests assess cause-and-effect relationships between variables.
• Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

PARAMETRIC TESTS
Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).
• Simple linear regression includes one predictor variable and one outcome variable.
• Multiple linear regression includes two or more predictor variables and one outcome variable.

COMPARISON TESTS
Comparison tests usually compare the means of groups. These may be the means of different groups within a sample, the means of one sample group taken at different times, or a sample mean and a population mean.
• A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
• A z test is for exactly 1 or 2 groups when the sample is large.
• An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:
• If you have only one sample that you want to compare to a population mean, use a one-sample test.
• If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
• If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
• If you expect a difference between groups in a specific direction, use a one-tailed test.
• If you don't have any expectations for the direction of a difference between groups, use a two-tailed test.

The only parametric correlation test is Pearson's r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

5 STEPS: STATISTICAL ANALYSIS
Step 5: Interpret Your Results

The final step of statistical analysis is interpreting your results.

Statistical significance
- Main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.
- Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.
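How a test statistic and a p value lead to a significance decision can be sketched with a one-sample, two-tailed z test. All of the numbers are hypothetical (100 teenagers averaging 78 on a math test, against a null-hypothesis mean of 75 with a known standard deviation of 10), and the function name is an assumption for this example.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, population_sd, n):
    """Two-tailed one-sample z test; returns (test statistic, p value)."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a result at least this extreme if the null hypothesis is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical study: large sample, so a z test is used rather than a t test
z, p = one_sample_z_test(sample_mean=78, null_mean=75, population_sd=10, n=100)

alpha = 0.05                      # conventional significance level
significant = p < alpha           # reject the null hypothesis only if p is below alpha

print(f"z = {z:.2f}, p = {p:.4f}, significant at {alpha}: {significant}")
```

A one-tailed version would use only the upper tail, `1 - NormalDist().cdf(z)`, which suits a directional prediction such as "meditation will improve scores".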
DECISION ERRORS
Type I and Type II errors are mistakes made in research conclusions.
• Type I error means rejecting the null hypothesis when it's actually true.
• Type II error means failing to reject the null hypothesis when it's false.
You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power. However, there's a trade-off between the two errors, so a fine balance is necessary.

TABULAR AND GRAPHICAL METHODS

Summarizing Qualitative Data
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart

FREQUENCY DISTRIBUTION
- Is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
- The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

RELATIVE FREQUENCY DISTRIBUTION
- The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
- A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

PERCENT FREQUENCY DISTRIBUTION
- The percent frequency of a class is the relative frequency multiplied by 100.
- A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

BAR GRAPH
- A bar graph is a graphical device for depicting qualitative data.
- On the horizontal axis we specify the labels that are used for each of the classes.
- A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
- Using a bar of fixed width drawn above each class label, we extend the height appropriately.
- The bars are separated to emphasize the fact that each class is a separate category.

PIE CHART
- The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
- First draw a circle, then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
- Since there are 360 degrees in a circle, a class with a relative frequency of 0.25 would consume 0.25(360) = 90 degrees of the circle.
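The frequency, relative frequency, and percent frequency of each class, along with the pie chart sector angle (relative frequency times 360), can be computed as in the sketch below. The quality ratings are made-up qualitative data used only for illustration.

```python
from collections import Counter

# Hypothetical qualitative data: eight quality ratings
ratings = ["Good", "Poor", "Good", "Excellent", "Good", "Average", "Poor", "Good"]
n = len(ratings)

frequency = Counter(ratings)      # frequency distribution: class -> count

summary = {}
for label, count in frequency.items():
    relative = count / n          # fraction of all items in this class
    summary[label] = {
        "frequency": count,
        "relative": relative,
        "percent": relative * 100,        # percent frequency
        "pie_degrees": relative * 360,    # sector angle in a pie chart
    }

print(f"{'class':<10}{'freq':>5}{'rel':>7}{'pct':>7}{'deg':>7}")
for label, row in summary.items():
    print(f"{label:<10}{row['frequency']:>5}{row['relative']:>7.2f}"
          f"{row['percent']:>7.1f}{row['pie_degrees']:>7.1f}")
```

Here the "Poor" class has a relative frequency of 0.25, so its sector spans 90 degrees, matching the pie chart arithmetic in the notes.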
DOT PLOT
- One of the simplest graphical summaries of
data is a dot plot.
- A horizontal axis shows the range of data
values.
- Then each data value is represented by a dot
placed above the axis.

HISTOGRAM
- Another common graphical presentation of
quantitative data is a histogram.
- The variable of interest is placed on the
horizontal axis.
- A rectangle is drawn above each class interval
with its height corresponding to the interval’s
frequency, relative frequency, or percent
frequency.
- Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent
classes.
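The class frequencies that a histogram plots can be tallied as in the sketch below, which prints a rough text version of the rectangles. The data values, the class width of 10, and the starting point of 50 are assumptions made for the example.

```python
# Hypothetical quantitative data
values = [52, 57, 61, 64, 68, 71, 73, 75, 79, 82, 88, 95]

class_width = 10      # width of each class interval (assumed)
lowest_limit = 50     # lower limit of the first class (assumed)

# Count how many values fall in each class interval
counts = {}
for v in values:
    class_start = lowest_limit + ((v - lowest_limit) // class_width) * class_width
    counts[class_start] = counts.get(class_start, 0) + 1

# Text histogram: adjacent classes, bar length equal to the class frequency
for class_start in sorted(counts):
    label = f"{class_start}-{class_start + class_width - 1}"
    print(f"{label:>7} | {'#' * counts[class_start]}")
```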
OGIVE
- An ogive is a graph of a cumulative distribution.
- The data values are shown on the horizontal axis.
- Shown on the vertical axis are the:
• Cumulative frequencies
• Cumulative relative frequencies
• Cumulative percent frequencies
- The frequency (one of the above) of each class is plotted as a point.
- The plotted points are connected by straight lines.

EXPLORATORY DATA ANALYSIS
- The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
- One such technique is the stem-and-leaf display.

STEM-AND-LEAF DISPLAY
- A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
- It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
- The first digits of each data item are arranged to the left of a vertical line.
- To the right of the vertical line we record the last digit for each item in rank order.
- Each line in the display is referred to as a stem.
- Each digit on a stem is a leaf.
- Leaf units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to equal 1.
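A stem-and-leaf display can be built as in the sketch below. The function name and the scores are hypothetical, and the sketch assumes non-negative whole numbers with a leaf unit of 1, 10, 100, and so on; it does not handle fractional leaf units such as 0.1.

```python
def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves]} with leaves in rank order.

    Assumes non-negative whole numbers and a whole-number leaf unit;
    digits smaller than the leaf unit are dropped.
    """
    display = {}
    for value in sorted(values):
        scaled = int(value) // leaf_unit      # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)       # last remaining digit is the leaf
        display.setdefault(stem, []).append(leaf)
    return display

scores = [68, 72, 75, 81, 83, 83, 90, 97, 102]    # hypothetical data, leaf unit 1
display = stem_and_leaf(scores)

# Print every stem in range, so empty stems would still show the shape
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

With `leaf_unit=10`, a value such as 1565 is recorded as stem 15 and leaf 6, and the final digit is dropped.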
CROSSTABULATIONS AND SCATTER
DIAGRAMS
- Thus far we have focused on methods that are
used to summarize the data for one variable at
a time.
- Often a manager is interested in tabular and
graphical methods that will help understand
the relationship between two variables.
- Cross tabulation and a scatter diagram are two
methods for summarizing the data for two (or
more) variables simultaneously.
CROSSTABULATION
- Is a tabular method for summarizing the data
for two variables simultaneously
- Cross tabulation can be used when:
• One variable is qualitative and the other is quantitative
• Both variables are qualitative
• Both variables are quantitative
- The left and top margin labels define the
classes for the two variables
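A crosstabulation of two qualitative variables can be tallied as in the sketch below. The records (a quality rating paired with a price range) are made-up data; the rows and columns play the role of the left and top margin labels.

```python
from collections import Counter

# Hypothetical records: (quality rating, price range)
records = [
    ("Good", "Low"), ("Good", "High"), ("Poor", "Low"),
    ("Good", "Low"), ("Excellent", "High"), ("Poor", "Low"),
]

cells = Counter(records)                       # count for each (row, column) pair
rows = sorted({rating for rating, _ in records})
cols = sorted({price for _, price in records})

# Header row: top margin labels plus a total column
print(f"{'':<10}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]     # missing pairs count as 0
    print(f"{r:<10}" + "".join(f"{x:>6}" for x in counts) + f"{sum(counts):>7}")
```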
SCATTER DIAGRAM
- A scatter diagram is a graphical presentation
of the relationship between two quantitative
variables
- One variable is shown on the horizontal axis
and the other variable is shown on the vertical axis.
- The general pattern of the plotted points
suggests the overall relationship between the
variables
