FET 401 Week 8 Lecture Note

1) The document discusses data analysis and descriptive statistics, including defining data analysis, variables, and types of statistics. 2) Descriptive statistics are used to summarize and describe data through measures of central tendency, dispersion, frequency distributions and charts. 3) SPSS software can be used to conduct descriptive statistics on data, including calculating means, standard deviations, ranges and creating frequency tables.

Uploaded by

Johnpraise
Copyright
© All Rights Reserved

MODULE 4: DATA ANALYSIS AND RESULTS PRESENTATION

WEEK 8: DATA PREPARATION AND DESCRIPTIVE STATISTICS


Data Analysis
Data analysis is the critical examination of assembled and grouped data in order to study the
characteristics of the object under study and to determine the pattern of relationships among
the variables relating to it. It involves categorizing, ordering, manipulating, and summarizing
data. Its purpose is to reduce large quantities of raw data to a manageable and interpretable
form so that the characteristics of situations, events, and people can be richly described and
the relations among variables studied and interpreted. Analysis and interpretation of data are
crucial aspects of research. The activities of data analysis in research cannot be separated
from statistics: scoring, categorizing, ordering, manipulating, summarizing, and interpreting
data all involve the use of statistics.

What is Statistics?
Statistics is the set of quantitative methods for describing, analyzing, and drawing inferences
(conclusions) from data. Using statistics, one can infer population characteristics on the basis
of sample observations. No meaningful conclusion can be drawn by merely looking at
experimental data; hence, appropriate statistical techniques are used to draw meaningful
inferences. Thus, statistics provides the know-how for the scientific collection, compilation,
and analysis of data.
Practitioners need to understand statistics:
• To know how to properly present and describe information,
• To know how to draw conclusions about large populations based only on information
obtained from samples,
• To know how to solve key industry-related problems and make sensible, valid, and
reliable decisions on the basis of the statistical analysis conducted.

Types of Statistics: There are two general types or categories of statistics that are often referred
to when making a statistical decision or working on a statistical problem:

1. Descriptive Statistics: statistical procedures used for summarizing, organizing,
graphing and describing data. Descriptive statistics utilize numerical and graphical
methods to look for patterns in a data set, to summarize the information revealed in a
data set, and to present the information in a convenient form that individuals can use to
make decisions. The main goal of descriptive statistics is to describe a data set. Thus,
the class of descriptive statistics includes both numerical measures (e.g. the mean or the
median) and graphical displays of data (e.g. pie charts or bar graphs).

2. Inferential Statistics: statistical procedures that allow one to draw inferences about the
population on the basis of sample data. They are typically expressed as tests of significance,
which test relationships and differences. Inferential statistics use sample data to make
estimates, decisions, predictions, or other generalizations about a larger set of data.
Examples of inferential statistics include the z-statistic and the t-statistic.
Variables: A variable is any characteristic that is measured or captured in a dataset – any
factor/issue under investigation: e.g., tourist arrivals per year, expenditure, tourist satisfaction,
nationality, gender, number of nights in destination, perceived quality, likelihood to return to the
destination, etc.

The variables can be measured in one of three general ways: categorical, discrete and
continuous as discussed in our previous class.

Variables can also be classified as independent or dependent variables. Independent variables
are also called explanatory variables and are examined to see how they explain, predict, or
influence other variables – called dependent variables. Dependent variables are the variables
that are believed to be influenced by independent variables.

Data in statistics are sometimes classified according to how many variables are in a particular
study. For example, “height” might be one variable and “weight” might be another variable.
Depending on the number of variables being examined, the analysis may be univariate,
bivariate, or multivariate.

• Univariate analysis is the analysis of one (“uni”) variable.
• Bivariate analysis is the analysis of exactly two variables.
• Multivariate analysis is the analysis of more than two variables.

SPSS Overview

1. Data View
• Used to display data
• Columns represent variables
• Rows represent individual units or groups of units that share common values of
variables
2. Variable View
• Used to display information on the variables in the dataset
• TYPE: Sets the data type of the variable (e.g., numeric, string, date), which controls how
its values are displayed
• LABEL: Allows for a longer description of the variable name
• VALUES: Allows for longer descriptions of the variable levels
• MEASURE: Allows choice of measurement scale
3. Output View
• Displays the results of analyses and graphs

Entering Variables in SPSS

Importing data from Excel
• Select File > Open > Data
• Choose Excel as the file type
• Select the file you want to import
• Then click Open

Descriptive Statistics
Descriptive statistics describe and summarize data. They can be used to describe a single
variable (univariate analysis) or more than one variable (bivariate/multivariate analysis).

Descriptive analysis explores each variable in a data set. It looks at the range of values as well
as their central tendency, and it describes the pattern of responses for each variable. Patterns
in univariate data can be described with measures of central tendency (mean, median, and
mode) and measures of dispersion (variability): range, variance, minimum, maximum,
quartiles (including the interquartile range), and standard deviation.
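
The measures named above can be sketched with Python's standard `statistics` module. The nine ages below are hypothetical, and the quartile cut points depend on the interpolation method, so other packages may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ages of nine students (illustrative values only).
ages = [19, 20, 20, 21, 21, 22, 23, 25, 32]

# Measures of central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)  # every mode, in order of first appearance

# Measures of dispersion
minimum, maximum = min(ages), max(ages)
value_range = maximum - minimum
variance = statistics.variance(ages)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(ages)      # square root of the sample variance
q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartile cut points
iqr = q3 - q1                         # interquartile range

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"range={value_range} variance={variance:.2f} sd={std_dev:.2f} IQR={iqr}")
```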

Example: A researcher wants to describe the pattern and summarize the main features of
400-level students of the Faculty of Engineering, Lead City University. He collected and
analyzed the following data.
Data: LCU FENG DATA.xls (available in the Sample Data folder).

In SPSS, the Descriptives procedure computes a select set of basic descriptive statistics for one
or more continuous numeric variables. In all, the statistics it can produce are:
• N valid responses (Number of valid responses/samples)
• Mean
• Sum
• Standard deviation
• Variance
• Minimum
• Maximum
• Range
• Standard error of the mean (or S.E. mean)
• Skewness
• Kurtosis
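
For readers who want to see how the less familiar items in this list are calculated, here is a minimal Python sketch. The function name `describe` is hypothetical, and the skewness and kurtosis lines use the common bias-adjusted sample formulas; it is an assumption of this sketch that these match the values SPSS prints.

```python
import math

def describe(values):
    """Return the statistics listed above for a list of numbers (needs n > 3)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sums of powered deviations from the mean
    m2 = sum((x - mean) ** 2 for x in values)
    m3 = sum((x - mean) ** 3 for x in values)
    m4 = sum((x - mean) ** 4 for x in values)
    variance = m2 / (n - 1)  # sample variance
    sd = math.sqrt(variance)
    # Bias-adjusted sample skewness and excess kurtosis
    skewness = n * m3 / ((n - 1) * (n - 2) * sd ** 3)
    kurtosis = (n * (n + 1) * m4 / ((n - 1) * (n - 2) * (n - 3) * sd ** 4)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n, "mean": mean, "sum": total,
        "std_dev": sd, "variance": variance,
        "minimum": min(values), "maximum": max(values),
        "range": max(values) - min(values),
        "se_mean": sd / math.sqrt(n),  # standard error of the mean
        "skewness": skewness, "kurtosis": kurtosis,
    }

stats = describe([1, 2, 3, 4, 5])  # small symmetric example
print(stats)
```

Because the example data are symmetric, the skewness comes out as zero; the negative kurtosis reflects a flatter-than-normal spread of values.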

Steps to conduct Descriptive Statistics in SPSS
Running the Procedure
1. Click Analyze > Descriptive Statistics > Descriptives (or Frequencies).
2. Add the variables to the Variables box.
3. Click OK when finished

To run the Descriptives procedure, select Analyze > Descriptive Statistics > Descriptives.

The Descriptives window lists all of the variables in your dataset in the left column. To select
variables for analysis, click on the variable name to highlight it, then click on the arrow button
to move the variable to the column on the right. Alternatively, you can double-click on the name
of a variable to move it to the column on the right.

Outputs

Descriptive Statistics
Variable                          N     Range   Minimum   Maximum   Mean      Std. Deviation
Age_of_Students_Years             102   13.00   19.00     32.00     22.0098   2.86493
Height_of_Students_feets_Inches   102    2.20    4.90      7.10      5.7569   0.56857
Valid N (listwise)                102

Interpretation
Here we see a side-by-side comparison of the descriptive statistics for the two numeric variables.
This allows us to quickly make the following observations about the data:
• The maximum age and height observed among the 400-level FENG students are 32 years
and 7 feet 1 inch, respectively.
• The minimum age and height observed among the 400-level FENG students are 19 years
and 4 feet 9 inches, respectively.
• The mean age and height are about 22 years and approximately 5 feet 8 inches,
respectively.

Frequencies
Output
Statistics
                 Age_of_Students_Years   Height_of_Students_feets_Inches
N Valid          102                     102
N Missing        0                       0
Mean             22.0098                 5.7569
Median           21.0000                 5.8000
Mode             20.00                   5.20 (a)
Std. Deviation   2.86493                 0.56857
Range            13.00                   2.20
Minimum          19.00                   4.90
Maximum          32.00                   7.10
a. Multiple modes exist. The smallest value is shown.
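
As a cross-check, the age column of this table can be reproduced outside SPSS. The Python sketch below expands the counts reported in the Age_of_Students_Years frequency table of this section into one value per student and recomputes the summary measures; only the variable names are invented here.

```python
import statistics

# Age counts copied from the Age_of_Students_Years frequency table.
age_counts = {19: 18, 20: 20, 21: 19, 22: 12, 23: 5, 24: 5, 25: 11, 26: 9, 32: 3}

# Expand the counts back into one value per student.
ages = [age for age, count in age_counts.items() for _ in range(count)]

n = len(ages)
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = min(statistics.multimode(ages))  # smallest mode if several exist
std_dev = statistics.stdev(ages)        # sample standard deviation
value_range = max(ages) - min(ages)

print(f"N={n} mean={mean:.4f} median={median} mode={mode} "
      f"sd={std_dev:.5f} range={value_range}")
```

The recomputed values agree with the SPSS output for age (N = 102, mean 22.0098, median 21, mode 20, standard deviation 2.86493, range 13).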

Frequency Table
Age_of_Students_Years
Valid value   Frequency   Percent   Valid Percent   Cumulative Percent
19.00         18          17.6      17.6            17.6
20.00         20          19.6      19.6            37.3
21.00         19          18.6      18.6            55.9
22.00         12          11.8      11.8            67.6
23.00         5           4.9       4.9             72.5
24.00         5           4.9       4.9             77.5
25.00         11          10.8      10.8            88.2
26.00         9           8.8       8.8             97.1
32.00         3           2.9       2.9             100.0
Total         102         100.0     100.0

Height_of_Students_feets_Inches
Valid value   Frequency   Percent   Valid Percent   Cumulative Percent
4.90          4           3.9       3.9             3.9
5.00          9           8.8       8.8             12.7
5.10          5           4.9       4.9             17.6
5.20          10          9.8       9.8             27.5
5.30          10          9.8       9.8             37.3
5.40          1           1.0       1.0             38.2
5.60          3           2.9       2.9             41.2
5.70          7           6.9       6.9             48.0
5.80          6           5.9       5.9             53.9
5.90          8           7.8       7.8             61.8
6.00          2           2.0       2.0             63.7
6.10          6           5.9       5.9             69.6
6.20          8           7.8       7.8             77.5
6.30          8           7.8       7.8             85.3
6.40          4           3.9       3.9             89.2
6.50          2           2.0       2.0             91.2
6.60          3           2.9       2.9             94.1
6.70          4           3.9       3.9             98.0
7.10          2           2.0       2.0             100.0
Total         102         100.0     100.0
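
The Percent and Cumulative Percent columns follow directly from the frequencies. The Python sketch below rebuilds them from the height counts in the table above and also lists every modal value; the variable names are illustrative.

```python
# Height counts copied from the Height_of_Students_feets_Inches frequency table.
height_counts = {
    4.9: 4, 5.0: 9, 5.1: 5, 5.2: 10, 5.3: 10, 5.4: 1, 5.6: 3, 5.7: 7,
    5.8: 6, 5.9: 8, 6.0: 2, 6.1: 6, 6.2: 8, 6.3: 8, 6.4: 4, 6.5: 2,
    6.6: 3, 6.7: 4, 7.1: 2,
}

total = sum(height_counts.values())
rows = []
cumulative = 0
for value in sorted(height_counts):
    count = height_counts[value]
    cumulative += count
    percent = 100 * count / total                # share of all cases
    cumulative_percent = 100 * cumulative / total  # share at or below this value
    rows.append((value, count, round(percent, 1), round(cumulative_percent, 1)))

# Every value that attains the highest frequency is a mode.
top = max(height_counts.values())
modes = [value for value in sorted(height_counts) if height_counts[value] == top]

for row in rows:
    print(row)
print("modes:", modes)
```

Valid Percent equals Percent here because no values are missing. Two heights share the highest frequency, which is why the Statistics table notes that multiple modes exist and shows only the smallest.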

Bar Chart
A bar chart (bar diagram) is a graph in which rectangular bars are drawn with lengths
proportional to the values they represent. The bars can be drawn vertically or horizontally.
A bar chart is used to compare the magnitudes of discrete groups, where the quantity
compared may be measured on either a discrete or a continuous scale.

Example of a simple bar chart


A quality engineer for an automotive supply company wants to decrease the number of car door
panels that are rejected because of paint flaws. As part of the initial investigation, the engineer
creates a bar chart to compare the counts of paint flaws.

Data: PaintFlaws.xls (available in the Sample Data folder).

Procedure
1. Click Graphs > Legacy Dialogs > Bar
2. Click Define
3. Select the variable for which you wish to create a bar chart, and move it into the
“Category Axis” box.
4. Select “Titles” to add a title (Optional)
5. Click Continue after you have added a title
6. Click OK
7. Your bar chart will appear in the SPSS viewer window

Output

Interpretation
This bar chart shows that Peel is the most common paint flaw and that Smudge and Other are
the least common paint flaws.
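
The counting behind a simple bar chart can be sketched in a few lines of Python. The records below are hypothetical (they are not the contents of PaintFlaws.xls, and the "Scratch" category is invented for illustration); each bar is drawn as a row of `#` characters whose length equals the count.

```python
from collections import Counter

# Hypothetical inspection records (illustrative only).
flaws = ["Peel"] * 10 + ["Scratch"] * 6 + ["Smudge"] * 2 + ["Other"] * 2

counts = Counter(flaws)  # category -> number of occurrences

# One text bar per category, longest first; bar length equals the count.
bars = {category: "#" * count for category, count in counts.most_common()}
for category, bar in bars.items():
    print(f"{category:<8} {bar} ({counts[category]})")

most_common_flaw = counts.most_common(1)[0][0]
```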

Example of a Clustered bar chart
A researcher wants to describe the pattern and summarize the main features of 400-level students
of the Faculty of Engineering, Lead City University. He collected and analyzed the following data.
Data: LCU FENG DATA.xls (available in the Sample Data folder).

Procedure
1. Click Graphs > Legacy Dialogs > Bar
2. Select “Clustered” and “Summaries for groups of cases”
3. Click Define
4. Select the variable you wish to display on the horizontal axis, and move it into the
“Category Axis” box
5. Select the second variable, and move it to the “Define Clusters by” box
6. Select your desired option under “Bars Represent”
7. Select “Titles” to add a title (Optional)
8. Click OK

Output

Interpret the results
The Civil and Electrical Engineering departments have the highest numbers of second-class
upper degrees, while Mechanical Engineering has the highest number of second-class lower
degrees. No student in the Mechanical Engineering department currently has a pass degree.
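
A clustered bar chart displays the counts of a two-way cross-tabulation: one cluster per category of the first variable and one bar per category of the second. The Python sketch below builds such a table from hypothetical (department, degree class) records; the records are invented for illustration and are not the LCU FENG data.

```python
from collections import Counter

# Hypothetical (department, degree class) records (illustrative only).
records = [
    ("Civil", "Second Class Upper"), ("Civil", "Second Class Upper"),
    ("Civil", "Second Class Lower"),
    ("Electrical", "Second Class Upper"), ("Electrical", "Second Class Upper"),
    ("Mechanical", "Second Class Lower"), ("Mechanical", "Second Class Lower"),
    ("Mechanical", "Second Class Upper"),
]

cell_counts = Counter(records)  # (department, class) -> count
departments = sorted({dept for dept, _ in records})    # clusters on the category axis
degree_classes = sorted({cls for _, cls in records})   # bars within each cluster

for dept in departments:
    row = ", ".join(f"{cls}: {cell_counts[(dept, cls)]}" for cls in degree_classes)
    print(f"{dept}: {row}")
```

Combinations that never occur, such as Electrical with Second Class Lower in this made-up data, are reported as 0 and would appear as missing bars.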

Pie Chart
A pie chart is a circular graph divided into slices. The slices are proportional to the fraction
of the whole in each category; in other words, the size of each slice reflects the size of that
category relative to the group as a whole. The entire “pie” represents 100% of the whole, while
the “slices” represent portions of it.
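
The proportionality can be made concrete: each slice's share is its count divided by the total, and its angle is that share of 360 degrees. The counts in this Python sketch are hypothetical.

```python
# Hypothetical category counts (illustrative only).
category_counts = {"Peel": 10, "Scratch": 6, "Smudge": 2, "Other": 2}

total = sum(category_counts.values())
slices = {
    name: {
        "percent": 100 * count / total,  # share of the whole pie
        "angle": 360 * count / total,    # central angle in degrees
    }
    for name, count in category_counts.items()
}

for name, info in slices.items():
    print(f"{name:<8} {info['percent']:5.1f}%  {info['angle']:6.1f} degrees")
```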

Example of Pie Chart


A quality engineer for an automotive supply company wants to decrease the number of car door
panels that are rejected because of paint flaws. As part of the initial investigation, the engineer
creates a pie chart to compare the counts of flaws in each category.
Data: PaintFlaws.xls (available in the Sample Data folder).

Procedure
1. Click Graphs > Legacy Dialogs > Pie
2. Select “Summaries for groups of cases”
3. Click Define
4. Click “Reset” (recommended)
5. Move the variable for which you are creating a pie chart into the “Define slices by” box
6. Select your desired option under “Slices Represent”
7. Select “Titles” to add a title (recommended)
8. Click “OK”

Output

Interpret the results


This pie chart shows that Peel is the most common paint flaw and that Smudge and Other are
the least common paint flaws.

The Scattergram/Scatterplot
A scattergram (scatterplot) is a visual expression of the correlation between two variables; it
shows the pattern of their relationship. It is obtained by plotting the paired data on X and Y
axes, with the independent variable on the X axis and the dependent variable on the Y axis.
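
The correlation coefficient that a scatterplot visualizes can be computed directly from the paired data. Below is a minimal Python sketch of Pearson's r; the function name `pearson_r` and the BMI and body fat values are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (illustrative only).
bmi = [18.0, 20.0, 22.0, 24.0, 26.0]      # X: independent variable
body_fat = [20.0, 24.0, 27.0, 31.0, 36.0]  # Y: dependent variable

r = pearson_r(bmi, body_fat)
print(f"r = {r:.3f}")
```

Values of r near +1 correspond to points rising along a straight line, values near -1 to points falling along a line, and values near 0 to no linear pattern.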

Example of a simple scatterplot


A medical researcher studies obesity in adolescent girls. Because body fat percentage is difficult
and expensive to measure directly, the researcher wants to determine whether the body mass
index (BMI)—a measurement that is easy to take—is a good predictor of body fat percentage.
The researcher collects BMI, body fat percentage, and other personal variables of 92 adolescent
girls.

As part of the initial investigation, the researcher creates a scatterplot of the body fat percentage
vs. BMI to evaluate the relationship between the two variables.

Procedure
1. Open the sample data, BodyFatPercentage.
2. Choose Graphs > Legacy Dialogs > Scatter/Dot > Simple.
3. Under Y variables, enter %Fat.
4. Under X variables, enter BMI.
5. Click OK.

Output

Output with Regression

Interpret the results

The scatterplot of the BMI and body fat data shows a strong positive and linear relationship
between the two variables. Body mass index (BMI) may be a good predictor of body fat
percentage.

Example of a scatterplot (with regression and groups)


A quality engineer for a camera manufacturer wants to shorten the flash recovery time. Flash
recovery time is the least amount of time that is required between flashes. The engineer wants to
determine whether a relationship exists between the voltage that remains in the camera battery
immediately after a flash and the flash recovery time. The engineer also wants to determine
whether there are differences in flash recovery time between old and new formulations of the
battery. The engineer collects random samples of batteries made with the old and new
formulations. The engineer measures the volts remaining immediately after a flash and the flash
recovery time for each.

As part of the initial investigation, the engineer creates a scatterplot of volts remaining after a
flash versus flash recovery time, grouped by battery formulation, to assess the relationship
between the two variables for the two formulations.
Data: FlashRecoveryTime.xls (available in the Scatter Plot Sample Data folder).

Procedure
1. Open the sample data, FlashRecoveryTime.
2. Choose Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define.
3. Under Y variables, enter Flash Recovery.
4. Under X variables, enter Volts After.
5. In Set Markers by, enter the categorical grouping variable, Formulation.
6. Click OK.

Outputs

Interpret the results
The scatterplot shows a negative linear relationship between the volts remaining after the flash
and the flash recovery time: as the voltage remaining after the flash increases, the recovery
time decreases. The new formulation appears to require a shorter flash recovery time than the
old formulation.
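
A scatterplot with regression lines by group amounts to fitting a separate least-squares line for each group. The Python sketch below does this for two groups; the function name `fit_line` and all voltage and recovery-time values are hypothetical and are not taken from FlashRecoveryTime.xls.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (volts after flash, recovery time) data per formulation.
data = {
    "Old": ([0.8, 1.0, 1.2, 1.4], [6.0, 5.5, 5.0, 4.5]),
    "New": ([0.8, 1.0, 1.2, 1.4], [5.0, 4.5, 4.0, 3.5]),
}

# One fitted line per group, as in a grouped scatterplot.
fits = {group: fit_line(volts, time) for group, (volts, time) in data.items()}

for group, (slope, intercept) in fits.items():
    print(f"{group}: recovery = {intercept:.2f} + ({slope:.2f}) * volts")
```

In this made-up data both slopes are negative, and the line for the new formulation lies below the line for the old one, which is the pattern the interpretation above describes.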
