
Lecture 8 Statistics II Managing and Analysing Quantitative Data_2024


03SOC5001

SOCIAL RESEARCH METHODS I SEPTEMBER 2024

Statistics II: Managing and Analysing Quantitative Data
Dr. Nick Tse
MINT Certified Trainer, Member of MINT, CCoun, CGC, Associate Fellow of HKPCA, R.S.W.
BSW HKU, MSc International Addiction Studies KCL, Ph.D. HKU
SEPTEMBER 2022

Tasks in Data Management and Analysis
Coding data
Creating rules and codes for different values of data collected

Entering data
Entering values of different data fields of all the data records collected

Cleaning data
Identifying and correcting errors in the data input process

Analysing data
Simplifying and making sense of the data

Coding Data
Codebook
a document containing information about each of the variables of all your
records in your dataset
Example: Codebook for eSFQ

VarName   VarType   Value Label
St_ID     String
Module    String
Lecturer  String
Q1        Ordinal   1 - Strongly disagree
                    2 - Disagree
                    3 - Neutral
                    4 - Agree
                    5 - Strongly agree
                    99 - Missing value
Q2        Ordinal   Same as Q1
Q3        Nominal   1 - Yes
                    2 - No
                    99 - Missing value

String variables are variables that hold zero or more text characters. String values are always treated as text, even if they contain only numbers.
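
A codebook like the eSFQ example can also be kept in machine-readable form, so that stored codes are translated into value labels the same way every time. The following is a minimal Python sketch; the dictionary layout and the helper name `label_for` are illustrative assumptions, not part of the eSFQ materials.

```python
# Hypothetical machine-readable version of the eSFQ codebook.
LIKERT_LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
    99: "Missing value",
}

CODEBOOK = {
    "St_ID": {"type": "String", "labels": None},
    "Module": {"type": "String", "labels": None},
    "Lecturer": {"type": "String", "labels": None},
    "Q1": {"type": "Ordinal", "labels": LIKERT_LABELS},
    "Q2": {"type": "Ordinal", "labels": LIKERT_LABELS},  # same as Q1
    "Q3": {"type": "Nominal", "labels": {1: "Yes", 2: "No", 99: "Missing value"}},
}


def label_for(variable, value):
    """Return the value label of a coded variable, or plain text for strings."""
    labels = CODEBOOK[variable]["labels"]
    if labels is None:
        return str(value)  # string values are always treated as text
    return labels[value]


print(label_for("Q1", 4))         # Agree
print(label_for("Q3", 99))        # Missing value
print(label_for("St_ID", 12345))  # 12345, returned as text
```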

Data Entry
1. Codebook and Training

2. Data Validation
Data validation refers to the process of ensuring the accuracy and quality of data.

3. Missing Data

4. Keeping a log

5. Automate to minimize human error
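
Steps 2 to 5 can be partly automated. The sketch below is one possible approach, assuming each record is a Python dictionary and that the allowed codes follow the eSFQ codebook; the names `ALLOWED`, `MISSING`, `validate_record` and `log` are hypothetical.

```python
# Allowed codes per variable (assumed from the eSFQ codebook); 99 marks missing data.
ALLOWED = {
    "Q1": {1, 2, 3, 4, 5, 99},
    "Q2": {1, 2, 3, 4, 5, 99},
    "Q3": {1, 2, 99},
}
MISSING = 99


def validate_record(record):
    """Return a list of problems found in one data record."""
    problems = []
    for variable, allowed in ALLOWED.items():
        if variable not in record:
            problems.append(f"{variable}: field not entered")
        elif record[variable] not in allowed:
            problems.append(f"{variable}: invalid code {record[variable]!r}")
        elif record[variable] == MISSING:
            problems.append(f"{variable}: missing value")
    return problems


# Hypothetical records; the second one contains a typing error (55 instead of 5).
records = [
    {"St_ID": "A001", "Q1": 4, "Q2": 5, "Q3": 1},
    {"St_ID": "A002", "Q1": 55, "Q2": 99, "Q3": 2},
]

# Keep a log of every problem so that corrections can be traced later.
log = [(r["St_ID"], p) for r in records for p in validate_record(r)]
for student_id, problem in log:
    print(student_id, problem)
```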

Data Cleaning
Garbage in, garbage out!
Eyeball checking
Set standards for data cleaning
Who does the checking
Percentage of records to be checked
Error thresholds for remedial actions, etc.
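
The standards above can be made concrete with a short script. This is only a sketch under assumed settings: the 10% checking rate, the 5% error threshold and the function names are hypothetical examples, not values prescribed by the lecture.

```python
import random


def sample_for_checking(record_ids, percentage, seed=None):
    """Randomly pick the given percentage of records for eyeball checking."""
    rng = random.Random(seed)
    k = max(1, round(len(record_ids) * percentage / 100))
    return sorted(rng.sample(list(record_ids), k))


def needs_remedial_action(errors_found, records_checked, threshold):
    """Return True when the observed error rate exceeds the agreed threshold."""
    return errors_found / records_checked > threshold


record_ids = list(range(1, 201))  # 200 hypothetical records
to_check = sample_for_checking(record_ids, percentage=10, seed=1)
print(len(to_check))  # 20 records to be checked by eye

# Suppose 3 of the 20 checked records contain errors (15% error rate).
print(needs_remedial_action(3, len(to_check), threshold=0.05))  # True
```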

Example of Eyeball Checking

Descriptive Statistics

Frequency Distributions

Measures of Central Tendency

Mode
Median
Mean

Measures of Variability: measures that describe how far a set of data departs from its centre and how spread out it is

Range
Standard Deviation

Inferential Statistics

Statistics that Determine Associations


Chi-Square (Χ²) Tests
A chi-square test is a hypothesis test that is used when you want to determine whether there is a relationship between two categorical variables.

Correlation
Pearson Correlation
Spearman's Rank Correlation

Statistics that Determine Differences


Dependent t Test
Independent t Test
One-way Analysis of Variance

Chi-Square (Χ²) Tests

Methods of Analysing Quantitative Data

Univariate analysis
A statistical analysis that considers only one factor or variable at a time

Bivariate analysis
An analysis of the relationship between two variables
E.g., scatterplots, correlation, t-test, simple linear regression

Multivariate analysis
An analysis of the relationship between more than two variables

Univariate Analysis
Frequency Distribution
tabulating the distribution of cases in terms of number of counts and percentages

Measures of Central Tendency


Mean - arithmetic average of a variable
Median - the point where half of the cases are higher and half are lower
Mode - the most frequent or common score

Measures of Dispersion
Range - the distance between the highest and the lowest scores (e.g., 1 - 3 vs 1 - 5)
Percentiles - the position of a score within the distribution (e.g., 97th percentile)
Standard deviation - the typical distance between scores and the mean
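
These univariate summaries can be computed with Python's standard library. The sketch below uses made-up responses to a 5-point item; the data and variable names are hypothetical.

```python
from collections import Counter
import statistics

scores = [4, 5, 3, 4, 2, 4, 5, 1, 4, 3]  # hypothetical responses

# Frequency distribution: count and percentage for each value.
counts = Counter(scores)
n = len(scores)
frequency_table = {v: (counts[v], 100 * counts[v] / n) for v in sorted(counts)}
for value, (count, percent) in frequency_table.items():
    print(value, count, f"{percent:.0f}%")

# Measures of central tendency.
mean = statistics.mean(scores)      # 3.5
median = statistics.median(scores)  # 4.0
mode = statistics.mode(scores)      # 4

# Range, reported as the lowest and highest score.
score_range = (min(scores), max(scores))  # (1, 5)
print(mean, median, mode, score_range)
```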

Standard Deviation
The standard deviation is a measure of how spread out the data are

Formula

Interpretation

SD calculator: https://www.calculator.net/standard-deviation-calculator.html
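
The formula on the original slide is not reproduced here. As a sketch of the standard definition, the population standard deviation is the square root of the average squared distance from the mean, SD = sqrt(Σ(x − mean)² / N); the sample version divides by N − 1 instead of N. The data below are hypothetical.

```python
import math
import statistics


def population_sd(values):
    """Square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5

print(population_sd(scores))      # 2.0
print(statistics.pstdev(scores))  # same result from the standard library
print(statistics.stdev(scores))   # sample SD (divides by N - 1), slightly larger
```

A larger SD means the scores are more spread out around the mean; an SD of 0 means every score is identical.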

Bivariate Analysis
Cross Tabulation
Cases are organized in the table on the basis of two variables at the same time

Bivariate statistics
Chi-square
Point-biserial correlation
Pearson correlation coefficient
Spearman rank correlation coefficient
t-test
ANOVA

Correlation coefficient
The correlation between two variables, often measured as a correlation
coefficient, indicates the strength and direction of the linear
relationship between two random variables.

Denoted by r, the value of the correlation coefficient lies between -1 and +1.

The closer r is to 0, the weaker the linear relationship between the two variables; the closer r is to -1 or +1, the stronger it is.

Types:
Pearson Product Moment Correlation Coefficient
Spearman Rank Correlation Coefficient
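
The two types can be compared on the same data. The sketch below computes Pearson's r from its definition and Spearman's coefficient as Pearson's r applied to ranks (tied values share their average rank). The function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


hours = [1, 2, 3, 4, 5]    # hypothetical study hours
marks = [1, 4, 9, 16, 25]  # rises with hours, but not in a straight line

print(round(pearson_r(hours, marks), 3))  # 0.981
print(spearman_rho(hours, marks))         # 1.0
```

Spearman's coefficient is exactly 1 here because the ordering of the two variables agrees perfectly, while Pearson's r is slightly below 1 because the relationship is not a straight line.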

Chi-square
A test statistic that is calculated as the sum, over all categories, of the squared
differences between observed and expected values, each divided by the expected value

The chi-square statistic tests the fit between a theoretical frequency
distribution and a frequency distribution of observed data for which each
observation may fall into one of several classes.
Formula : χ² = Σ (O − E)² / E, where O = observed, E = expected.
Example:
Question : Is the coin fair if, in 100 tosses, 70 are heads and 30 are tails?
Results : χ² = (70 − 50)²/50 + (30 − 50)²/50 = 16 (p < .001)
Conclusion : If the coin were fair, the chance of seeing results that deviate this much from
the expected 50/50 split would be very small. This is statistically significant evidence of an unfair coin.
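
The coin example can be checked in a few lines. This sketch uses only the standard library; the p-value shortcut via `math.erfc` holds only for 1 degree of freedom (two categories), and the function names are hypothetical.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [70, 30]  # heads, tails in 100 tosses
expected = [50, 50]  # what a fair coin would give on average

chi2 = chi_square_statistic(observed, expected)
p = p_value_df1(chi2)
print(chi2)       # 16.0
print(p < 0.001)  # True
```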
Chi-square - Example
Chi-square enables us to discover whether there is a relationship or
association between two categorical (nominal) variables.
Example:
Question :
Is there a relationship between smoking cigarettes and drinking alcohol among students?
Data :
No. of students who smoke and drink: 50
No. of students who smoke but do not drink: 20
No. of students who do not smoke but drink: 15
No. of students who neither smoke nor drink: 25
Results : χ² = 12.121 (p < .001)
Conclusion : If smoking and drinking were unrelated, the probability of getting a χ² of this magnitude would be very low. There is an
association between smoking and drinking among the students.
Calculator: https://www.socscistatistics.com/tests/chisquare2/default2.aspx
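
The reported statistic can be reproduced from the four counts. In this sketch the expected count for each cell is (row total × column total) / N, and no continuity correction is applied; the function name is hypothetical, and the `math.erfc` p-value shortcut is valid only because a 2 × 2 table has 1 degree of freedom.

```python
import math


def chi_square_independence(table):
    """Chi-square test of independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: smoke / do not smoke. Columns: drink / do not drink.
table = [[50, 20],
         [15, 25]]

chi2, df = chi_square_independence(table)
p = math.erfc(math.sqrt(chi2 / 2))  # valid only when df == 1
print(round(chi2, 3), df)  # 12.121 1
print(p < 0.001)           # True
```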

T-test

The t-test assesses whether the means of two groups are statistically
different from each other (IV = Nominal; DV = Scale)

Types of t-test
Paired t-test
Independent sample t-test

Calculator: https://www.graphpad.com/quickcalcs/ttest1.cfm
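
The two types differ in how the t statistic is built. The sketch below computes only the t value and degrees of freedom (the p-value is then read from a t table or a calculator). The independent version assumes equal variances (pooled); all data and function names are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-test: test the mean of the within-person differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def independent_t(group_a, group_b):
    """Independent-samples t-test with a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, na + nb - 2


# Same five people measured twice (hypothetical).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)  # 4.472 4

# Two separate groups of five people (hypothetical).
class_a = [70, 75, 80, 85, 90]
class_b = [60, 65, 70, 75, 80]
t_ind, df_ind = independent_t(class_a, class_b)
print(round(t_ind, 3), df_ind)  # 2.0 8
```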

Paired T-test - Example
Case: 10 students with mild ADHD problems
participated in a programme aimed at helping them
improve their attention span. Their
average attention span before and after the
programme is listed in the Table on the right.

Question: Is there a significant difference in the
attention span of the participants before and after
the programme?
Analysis : Paired t-test is used.
Results: The paired t-test results show that the
value of t is 3.543957. The value of p is .00628.
The result is significant at p < .05.
Independent T-test - Example
Case: 20 smarter students are allocated to Class A and another 20 weaker
students to Class B. The school principal needs to decide whether extra resources are needed
to help the students of the weaker class. A test was administered to students
in both classes to confirm whether Class B is the weaker one. The test results are
presented in the Table on the right.

Question: Is there a significant difference in the academic achievement of the
students in the two classes?

Analysis : Independent t-test is used.

Results: The 15 students in Class A (M = 71.3, SD = 9.5), compared to the 15
students in Class B (M = 60.13, SD = 12.32), demonstrated
significantly higher test results, t(28) = 2.73, p = .005301. The result is significant
at p < .05.

Analysis of Variance (ANOVA)
One-way analysis of variance

A statistical method for making simultaneous comparisons between two or more
means; a statistical method that yields values that can be tested to determine
whether a significant relationship exists between variables

Types of ANOVA
Independent ANOVA vs related ANOVA
One-way vs Two-way

F = between-groups variance / within-groups variance

Example: examining the relationship between education level (factor) and salary (dependent variable)
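
The F ratio can be computed by hand for a small example. The sketch below follows the education and salary example with made-up monthly salaries (in thousands) for three education levels; the figures and the function name are hypothetical.

```python
def one_way_anova_f(groups):
    """Return F, between-groups df and within-groups df for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: how far each score lies from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


secondary = [14, 15, 16]
bachelor = [15, 16, 17]
postgraduate = [19, 20, 21]

f, df_between, df_within = one_way_anova_f([secondary, bachelor, postgraduate])
print(f, df_between, df_within)  # 21.0 2 6
```

A large F means the group means differ by much more than the variation inside each group would suggest.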

Other Statistics

Consult a basic textbook on statistics
