FET 401 Week 8 Lecture Note
What is Statistics?
Statistics is a set of quantitative methods for describing, analyzing, and drawing inferences
(conclusions) from data. Using statistics, one can infer population characteristics on the basis
of sample observations. No meaningful conclusion can be drawn by merely looking at
experimental data; hence, appropriate statistical techniques are used to draw meaningful
inferences. Statistics thus provides the know-how for collecting, compiling, and analyzing
data scientifically.
Practitioners need to understand statistics:
To know how to properly present and describe information,
To know how to draw conclusions about large populations based only on information
obtained from samples,
To know how to solve key industry-related problems and make sensible, valid, and
reliable decisions on the basis of the statistical analysis conducted.
Types of Statistics: There are two general types or categories of statistics that are often referred
to when making a statistical decision or working on a statistical problem:
1. Descriptive Statistics: statistical procedures used for summarizing, organizing,
graphing and describing data. Descriptive statistics utilize numerical and graphical
methods to look for patterns in a data set, to summarize the information revealed in a
data set, and to present the information in a convenient form that individuals can use to
make decisions. The main goal of descriptive statistics is to describe a data set. Thus,
the class of descriptive statistics includes both numerical measures (e.g. the mean or the
median) and graphical displays of data (e.g. pie charts or bar graphs).
2. Inferential Statistics: statistical procedures that allow one to draw inferences about the
population on the basis of sample data. They are commonly expressed as tests of significance
(tests of relationships and differences). Inferential statistics use sample data to make
estimates, decisions, predictions, or other generalizations about a larger set of data.
Examples of inferential statistics include the z-statistic and the t-statistic.
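As an illustration of an inferential statistic, the sketch below computes a one-sample t-statistic, t = (sample mean - hypothesized mean) / (s / sqrt(n)). The five ages and the hypothesized mean of 20 are made-up values for illustration only; they are not taken from the course data.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t-statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
    standard_error = sd / math.sqrt(n)   # standard error of the mean
    return (mean - mu0) / standard_error


# Hypothetical sample of student ages and a hypothetical population mean.
sample_ages = [19, 20, 21, 22, 23]
t_value = one_sample_t(sample_ages, mu0=20)
print(f"t = {t_value:.3f} with {len(sample_ages) - 1} degrees of freedom")
```

The resulting t value would then be compared with a critical value from the t distribution to decide whether the difference is statistically significant.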
Variables can be measured in one of three general ways: categorical, discrete, or
continuous, as discussed in our previous class.
Data in statistics is sometimes classified according to how many variables are in a particular
study. For example, “height” might be one variable and “weight” might be another variable.
Depending on the number of variables being examined, the data may be univariate,
bivariate, or multivariate.
Univariate analysis is the analysis of one (“uni”) variable.
Bivariate analysis is the analysis of exactly two variables.
Multivariate analysis is the analysis of more than two variables.
SPSS Overview
1. Data View
• Used to display the data
• Columns represent variables
• Rows represent individual units (cases), or groups of units that share common values of
the variables
2. Variable View
• Used to display information about the variables in the dataset
• TYPE: sets how values are stored and displayed (e.g. numeric, string, date)
• LABEL: allows a longer description of the variable name
• VALUES: allows longer descriptions (value labels) of the variable's levels
• MEASURE: sets the measurement scale (nominal, ordinal, or scale)
3. Output View
• Displays the results of analyses and graphs
Entering Variables in SPSS
Importing data from Excel
1. Select File > Open > Data
2. Choose Excel as the file type
3. Select the file you want to import
4. Click Open
Descriptive Statistics
Descriptive statistics describe and summarize data. Descriptive statistics can be used to describe
a single variable (univariate analysis) or more than one variable (bivariate/multivariate
analysis).
Descriptive analysis explores each variable in a data set. It looks at the range of the values, as
well as their central tendency, and describes the pattern of responses for each variable.
Patterns in univariate data can be described using measures of central tendency (mean,
median, and mode) and measures of dispersion or variability (range, minimum, maximum,
variance, standard deviation, and quartiles, including the interquartile range).
Example: A researcher wants to describe the pattern and summarize the main features of 400 L
students of the Faculty of Engineering, Lead City University. He collected and analyzed the
following data.
Data: LCU FENG DATA.xls (available in the Sample Data folder).
In SPSS, the Descriptives procedure computes a select set of basic descriptive statistics for one
or more continuous numeric variables. In all, the statistics it can produce are:
• N valid responses (Number of valid responses/samples)
• Mean
• Sum
• Standard deviation
• Variance
• Minimum
• Maximum
• Range
• Standard error of the mean (or S.E. mean)
• Skewness
• Kurtosis
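The statistics listed above can be sketched in plain Python as follows. This is a minimal illustration, not SPSS's own code: the skewness and kurtosis lines use the bias-adjusted sample formulas that statistical packages commonly report, and treating them as identical to SPSS output is an assumption. The function name `describe` and the sample values are hypothetical.

```python
import math


def describe(values):
    """Return basic descriptive statistics for a list of numbers (needs n >= 4)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    deviations = [x - mean for x in values]
    variance = sum(d ** 2 for d in deviations) / (n - 1)   # sample variance
    sd = math.sqrt(variance)
    z3 = sum((d / sd) ** 3 for d in deviations)
    z4 = sum((d / sd) ** 4 for d in deviations)
    # Bias-adjusted sample skewness and excess kurtosis.
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n,
        "mean": mean,
        "sum": total,
        "std_dev": sd,
        "variance": variance,
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "se_mean": sd / math.sqrt(n),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical sample of five student ages.
summary = describe([19, 20, 21, 22, 23])
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

A perfectly symmetric sample such as this one has zero skewness, which gives a quick sanity check on the formula.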
Steps to conduct Descriptive Statistics in SPSS
Running the Procedure
1. Click Analyze > Descriptive Statistics > Descriptives (or Frequencies).
2. Add the variables to the Variables box.
3. Click OK when finished.
To run the Descriptives procedure, select Analyze > Descriptive Statistics > Descriptives.
The Descriptives window lists all of the variables in your dataset in the left column. To select
variables for analysis, click on the variable name to highlight it, then click on the arrow button
to move the variable to the column on the right. Alternatively, you can double-click on the name
of a variable to move it to the column on the right.
Outputs
Descriptive Statistics
                                   N    Range  Minimum  Maximum     Mean  Std. Deviation
Age_of_Students_Years            102    13.00    19.00    32.00  22.0098         2.86493
Height_of_Students_feets_Inches  102     2.20     4.90     7.10   5.7569          .56857
Valid N (listwise)               102
Interpretation
Here we see a side-by-side comparison of the descriptive statistics for the two numeric variables.
This allows us to quickly make the following observations about the data:
• The maximum age and height observed among 400 L FENG students are 32 years and 7
feet 1 inch, respectively.
• The minimum age and height observed among 400 L FENG students are 19 years and 4
feet 9 inches, respectively.
• The mean age and height were approximately 22 years and 5 feet 8 inches, respectively.
Frequencies
Output
Statistics
                  Age_of_Students_Years   Height_of_Students_feets_Inches
N Valid                             102                               102
N Missing                             0                                 0
Mean                            22.0098                            5.7569
Median                          21.0000                            5.8000
Mode                              20.00                             5.20a
Std. Deviation                  2.86493                            .56857
Range                             13.00                              2.20
Minimum                           19.00                              4.90
Maximum                           32.00                              7.10
a. Multiple modes exist. The smallest value is shown.
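As a cross-check, the age statistics in this output can be recomputed from the counts in the age frequency table using Python's standard `statistics` module. The counts are copied from the output; the variable names are only illustrative. The second part shows why SPSS flags the height mode: two height values tie for the highest frequency, and the smaller one is reported.

```python
import statistics

# Frequencies copied from the Age_of_Students_Years frequency table.
age_counts = {19: 18, 20: 20, 21: 19, 22: 12, 23: 5, 24: 5, 25: 11, 26: 9, 32: 3}
ages = [age for age, count in age_counts.items() for _ in range(count)]

print("N              :", len(ages))
print("Mean           :", round(statistics.mean(ages), 4))
print("Median         :", statistics.median(ages))
print("Mode           :", statistics.mode(ages))
print("Std. Deviation :", round(statistics.stdev(ages), 5))
print("Range          :", max(ages) - min(ages))

# Frequencies copied from the Height_of_Students_feets_Inches frequency table.
height_counts = {
    4.9: 4, 5.0: 9, 5.1: 5, 5.2: 10, 5.3: 10, 5.4: 1, 5.6: 3, 5.7: 7,
    5.8: 6, 5.9: 8, 6.0: 2, 6.1: 6, 6.2: 8, 6.3: 8, 6.4: 4, 6.5: 2,
    6.6: 3, 6.7: 4, 7.1: 2,
}
heights = [h for h, count in height_counts.items() for _ in range(count)]

# multimode returns every value that ties for the highest frequency.
height_modes = statistics.multimode(heights)
print("Height modes   :", height_modes, "-> smallest:", min(height_modes))
```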
Frequency Table
Age_of_Students_Years
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid  19.00          18      17.6            17.6                 17.6
       20.00          20      19.6            19.6                 37.3
       21.00          19      18.6            18.6                 55.9
       22.00          12      11.8            11.8                 67.6
       23.00           5       4.9             4.9                 72.5
       24.00           5       4.9             4.9                 77.5
       25.00          11      10.8            10.8                 88.2
       26.00           9       8.8             8.8                 97.1
       32.00           3       2.9             2.9                100.0
       Total         102     100.0           100.0
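The Percent and Cumulative Percent columns follow directly from the frequencies: each percent is the count divided by the total, and the cumulative percent is the running total of counts divided by the total. A minimal sketch, using the age counts from the table above (the helper name `frequency_table` is hypothetical):

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running_total = 0
    for value in sorted(counts):
        running_total += counts[value]
        rows.append((value,
                     counts[value],
                     100 * counts[value] / n,
                     100 * running_total / n))
    return rows


# Rebuild the raw age values from the frequencies in the table.
age_counts = {19: 18, 20: 20, 21: 19, 22: 12, 23: 5, 24: 5, 25: 11, 26: 9, 32: 3}
ages = [age for age, count in age_counts.items() for _ in range(count)]

table = frequency_table(ages)
print(f"{'Age':>5} {'Frequency':>10} {'Percent':>8} {'Cumulative':>11}")
for value, freq, pct, cum_pct in table:
    print(f"{value:>5} {freq:>10} {pct:>8.1f} {cum_pct:>11.1f}")
```

Because there are no missing values in this data set, Percent and Valid Percent are identical.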
Height_of_Students_feets_Inches
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid   4.90           4       3.9             3.9                  3.9
        5.00           9       8.8             8.8                 12.7
        5.10           5       4.9             4.9                 17.6
        5.20          10       9.8             9.8                 27.5
        5.30          10       9.8             9.8                 37.3
        5.40           1       1.0             1.0                 38.2
        5.60           3       2.9             2.9                 41.2
        5.70           7       6.9             6.9                 48.0
        5.80           6       5.9             5.9                 53.9
        5.90           8       7.8             7.8                 61.8
        6.00           2       2.0             2.0                 63.7
        6.10           6       5.9             5.9                 69.6
        6.20           8       7.8             7.8                 77.5
        6.30           8       7.8             7.8                 85.3
        6.40           4       3.9             3.9                 89.2
        6.50           2       2.0             2.0                 91.2
        6.60           3       2.9             2.9                 94.1
        6.70           4       3.9             3.9                 98.0
        7.10           2       2.0             2.0                100.0
       Total         102     100.0           100.0
Bar Chart
A bar diagram is a graph in which rectangular bars are drawn with lengths proportional to
the values they represent. The bars can be drawn vertically or horizontally.
A bar diagram is used to compare the magnitudes of discrete groups, whether the
quantity being compared is measured on a discrete or a continuous scale.
Data: PaintFlaws.xls (available in the Sample Data folder).
Procedure
1. Click Graphs > Legacy Dialogs > Bar
2. Click Define
3. Select the variable for which you wish to create a bar chart, and move it into the
“Category Axis” box.
4. Select “Titles” to add a title (Optional)
5. Click Continue after you have added a title
6. Click OK
7. Your bar chart will appear in the SPSS viewer window
Output
Interpretation
This bar chart shows that Peel is the most common paint flaw and that Smudge and Other are
the least common paint flaws.
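What the bar chart summarizes is simply the count of cases in each category. The sketch below tallies a made-up list of paint flaws and prints a text bar for each; the records (including the "Scratch" category) are hypothetical and are not the contents of PaintFlaws.xls.

```python
from collections import Counter

# Hypothetical flaw records, one entry per inspected defect.
flaws = ["Peel", "Scratch", "Peel", "Smudge", "Peel",
         "Scratch", "Other", "Peel", "Scratch"]

counts = Counter(flaws)

# One text bar per category, longest first; bar length equals the count.
for category, count in counts.most_common():
    print(f"{category:<8} {'#' * count} ({count})")

most_common_flaw = counts.most_common(1)[0][0]
print("Most common flaw:", most_common_flaw)
```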
Example of a Clustered bar chart
A researcher wants to describe the pattern and summarize the main features of 400 L students of
the Faculty of Engineering, Lead City University. He collected and analyzed the following data.
Data: LCU FENG DATA.xls (available in the Sample Data folder).
Procedure
1. Click Graphs > Legacy Dialogs > Bar
2. Select “Clustered” and “Summaries for groups of cases”
3. Click Define
4. Select the variable you wish to display on the horizontal axis, and move it into the
“Category Axis” box
5. Select the second variable, and move it to the “Define Clusters by” box
6. Select your desired option under “Bars Represent”
7. Select “Titles” to add a title (Optional)
8. Click OK
Output
Interpret the results
The civil and electrical engineering departments have the highest number of second-class
upper degrees, while mechanical engineering has the highest number of second-class lower
degrees. No student in the mechanical engineering department currently has a pass degree.
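A clustered bar chart plots the counts of a two-way cross-tabulation: one bar for every combination of the category-axis variable and the cluster variable. The sketch below builds such a table from a few made-up (department, degree class) records; the records are hypothetical and do not reproduce LCU FENG DATA.xls.

```python
from collections import Counter

# Hypothetical (department, degree class) records, one per student.
records = [
    ("Civil", "Second Class Upper"), ("Civil", "Second Class Upper"),
    ("Civil", "Second Class Lower"), ("Civil", "Pass"),
    ("Electrical", "Second Class Upper"), ("Electrical", "Second Class Upper"),
    ("Electrical", "Second Class Lower"),
    ("Mechanical", "Second Class Upper"), ("Mechanical", "Second Class Lower"),
    ("Mechanical", "Second Class Lower"), ("Mechanical", "Second Class Lower"),
]

crosstab = Counter(records)   # counts for each (department, class) pair
departments = sorted({dept for dept, _ in records})
degree_classes = sorted({cls for _, cls in records})

# Each department is one cluster; each degree class is one bar in the cluster.
for dept in departments:
    print(dept)
    for cls in degree_classes:
        count = crosstab[(dept, cls)]   # a Counter returns 0 for unseen pairs
        print(f"  {cls:<20} {'#' * count} ({count})")
```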
Pie Chart
A pie chart displays data as a circle divided into slices. The size of each slice is
proportional to the fraction of the whole that falls in that category. The entire "pie"
represents 100% of the whole, while the "slices" represent portions of the whole.
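The proportionality described above can be made concrete: each slice's percentage is its count divided by the total, and its angle is the same fraction of 360 degrees. A minimal sketch with made-up department counts (the function name `pie_slices` and the numbers are hypothetical):

```python
def pie_slices(counts):
    """Map each category to (percent of whole, slice angle in degrees)."""
    total = sum(counts.values())
    return {category: (100 * count / total, 360 * count / total)
            for category, count in counts.items()}


# Hypothetical number of students per department.
department_counts = {"Civil": 40, "Electrical": 35, "Mechanical": 25}

slices = pie_slices(department_counts)
for category, (percent, angle) in slices.items():
    print(f"{category:<11} {percent:5.1f}%  {angle:6.1f} degrees")
```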
Procedure
1. Click Graphs > Legacy Dialogs > Pie
2. Select “Summaries for groups of cases”
3. Click Define
4. Click “Reset” (recommended)
5. Move the variable for which you are creating a pie chart into the “Define slices by” box
6. Select your desired option under “Slices Represent”
7. Select “Titles” to add a title (recommended)
8. Click “OK”
Output
The Scattergram/Scatterplot
The scattergram (scatterplot) is a visual expression of the correlation between two variables.
It shows the pattern of the relationship between them and is obtained by plotting the paired
data on X and Y axes, with the independent variable on the X axis and the dependent
variable on the Y axis.
As part of the initial investigation, the researcher creates a scatterplot of the body fat percentage
vs. BMI to evaluate the relationship between the two variables.
Procedure
1. Open the sample data, BodyFatPercentage.
2. Choose Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define.
3. Under Y Axis, enter %Fat.
4. Under X Axis, enter BMI.
5. Click OK.
Output
Output with Regression
The scatterplot of the BMI and body fat data shows a strong positive and linear relationship
between the two variables. Body mass index (BMI) may be a good predictor of body fat
percentage.
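The strength of the linear relationship seen in a scatterplot is quantified by the Pearson correlation coefficient, and the fitted regression line is the least-squares line through the points. The sketch below computes both from first principles; the BMI and body fat values are made up for illustration and are not the BodyFatPercentage sample data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired observations: BMI (x) and body fat percentage (y).
bmi = [18.5, 20.1, 22.4, 24.0, 26.3, 28.9]
fat = [19.0, 22.5, 26.0, 29.5, 33.0, 38.5]

r = pearson_r(bmi, fat)
slope, intercept = least_squares_line(bmi, fat)
print(f"r = {r:.3f}")
print(f"fitted line: %Fat = {slope:.2f} * BMI + {intercept:.2f}")
```

A value of r close to +1 corresponds to the strong positive linear pattern described above; a value close to -1 would indicate a strong negative one.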
As part of the initial investigation, the engineer creates a scatterplot of volts remaining after a
flash versus flash recovery time, grouped by battery formulation, to assess the relationship
between the two variables for the two formulations.
Data: FlashRecoveryTime.xls (available in the Scatter Plot Sample Data folder).
Procedure
1. Open the sample data, FlashRecoveryTime.
2. Choose Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define.
3. Under Y Axis, enter Flash Recovery.
4. Under X Axis, enter Volts After.
5. In Set Markers by, enter the categorical grouping variable, Formulation.
6. Click OK.
Outputs
Interpret the results
The scatterplot shows a negative linear relationship between the volts after and the flash
recovery time: as the voltage remaining after the flash increases, the recovery time
decreases. The new formulation appears to require a shorter flash recovery time than the
old formulation.
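The same reading can be checked numerically by splitting the data by formulation and summarizing each group separately. The sketch below groups made-up (formulation, volts after, recovery time) records, then reports each group's mean recovery time and the slope of its least-squares line; the values are hypothetical and are not the FlashRecoveryTime sample data.

```python
from collections import defaultdict

# Hypothetical records: (formulation, volts after flash, flash recovery time).
records = [
    ("New", 1.0, 5.0), ("New", 1.2, 4.6), ("New", 1.4, 4.1),
    ("Old", 1.0, 6.2), ("Old", 1.2, 5.8), ("Old", 1.4, 5.1),
]

# Split the paired observations by the grouping variable.
groups = defaultdict(list)
for formulation, volts, recovery in records:
    groups[formulation].append((volts, recovery))


def slope_of(points):
    """Slope of the least-squares line through (x, y) points."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


summary = {}
for formulation, points in groups.items():
    mean_recovery = sum(y for _, y in points) / len(points)
    summary[formulation] = (mean_recovery, slope_of(points))
    print(f"{formulation}: mean recovery = {mean_recovery:.2f}, "
          f"slope = {summary[formulation][1]:.2f}")
```

A negative slope in both groups matches the downward trend in the plot, and a lower mean recovery time for one group corresponds to its points sitting lower on the Y axis.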