PSYC2009 Term 1 Notes
a. Experimental research – systematic attempt to manipulate theoretically relevant variables and to examine the effect
of these manipulations on the outcome variable (Haslam & McGarty)
- Purpose: cause and effect
Active intervention – attempt to make some change, monitor the impact of the change
Manipulation of variables
- Experimental and control groups
Experimental group – participants who are subjected to the manipulation/treatment
Control group – participants who are not subjected to the treatment
- Independent variable (IV) – manipulated systematically
- Dependent variable (DV) – the effect of the manipulation (measured after the manipulation)
Causal inference [process of reaching conclusions about effect of one variable on another, or the outcome of such
a process] (and internal validity) can be drawn from well-designed experiments – when interpretation is correct
In order to have complete confidence – conditions must be identical in every respect except for differences in
treatment (IV)
Must equate conditions in experiment
Random assignment – allocate participants randomly to different conditions
Representative sample of population you are trying to research
Confounding - unintentionally manipulating variables – quite common
- Between-subjects design – different people exposed to different levels of the IV
- Within-subjects design – levels of the IV differ at different stages within the same participants
- Advantages
Causal inferences can be made – either the manipulation of the independent variable or chance is responsible for the observed effect
Experimental control over what they investigate and causal interpretation
- Disadvantages
Practicality
Reactivity effects (participants can become hyper-aware that they are being studied or manipulated)
Informing people of what is being tested has the effect of altering their likely “natural” behaviour
Validity (ecological and internal)
Much harder to figure out why something happened than that it happened
Whatever choice a researcher makes – none of them provides a pure or direct measure of the variables in which researchers
are interested.
- Normally the research process involves taking many observations
- Taking large samples helps to reduce uncertainty about conclusions – ensure that the sample is drawn in a way that does not bias results
Random sample – participants are drawn at random from a sub-population
If wish to generalise findings – ensure that sample is a random sample of any sub-population and that sub-
population is representative of broader population on characteristics that we want to generalise
Designing a survey
- Interviews and surveys should approximate a conversation
Beginning, middle and end
Establish common ground and motivations for the researcher and participant
Begin with less intimate topics (such as demographics, age) and move to more intimate or difficult areas through the
middle, then end with safer topics
Topical consistency between questions
Announcements should be used to indicate shifts in topics where necessary
Topics should have a sequence or logical order where possible
Should always end with thanking the participant for their time and effort
Things to do:
- Make sure your question applies to everyone or be clear which to skip if not applicable
- Response options must be exhaustive and mutually exclusive – so all participants will fit into one box and no others
(appropriate option for all)
Measurement error
- Statistical deviation from the true value caused by the measurement procedure
- Observed score = true score +/- measurement error
- Error can be systematic or random
- Reducing measurement error
Standardising administration conditions with clear instructions and questions (e.g. self-administered)
Minimise potential demand characteristics (i.e. train your interviewers)
Use multiple procedures for “fuzzy” constructs and reliable scales
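A minimal sketch (Python, with made-up numbers) of the observed-score equation above – the observed score is the true score plus a systematic component and a random component:

import random

true_score = 50
systematic_error = 2                         # e.g. a consistently lenient marker
observed_scores = [true_score + systematic_error + random.gauss(0, 3)
                   for _ in range(5)]        # random error drawn from N(0, SD = 3)
print(observed_scores)
# Random error tends to average out over repeated measurements; systematic error does not.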
Representativeness
Depends on:
- Adequate sampling frame
- Sampling method
- Sample size
- Response rate (influenced by)
Potential participants' level of interest (self-selection biases may occur)
Rewards
Accompanying letter/introduction
Layout and design
Colour of paper
Reminders/follow-up calls
Ease of returning survey – e.g. self-addressed stamped envelope included
Sampling process
1. Identify – target population and sampling frame
2. Select – a sampling method
3. Calculate – your sample size
4. Maximise – the return rate
Sampling methods
- A range of sampling methods are used in psychological research
- Tend to trade off costs (money, labour and time) against precision (randomness/accuracy)
- Various compromises inherent in the different sampling methods
- Precise estimates of population characteristics are not always an important requirement of research
- The sampling methods differ largely due to whether
Randomisation is employed
The sampling is organised for the convenience of the researcher
The random sampling is restricted to ensure better representativeness
- Response rates
Describe the proportion of people in the selected sample who contribute information to the study; affected by a number of factors:
Refusing to take part
Failure to contact
Inadequacies in sampling frame
Probability sampling
- Each member of the population has an equal chance of being selected
- Occurs by random chance
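A minimal sketch (Python) of simple random selection from a hypothetical sampling frame of 1000 ID numbers, so that every member has an equal chance of being chosen:

import random

sampling_frame = list(range(1, 1001))        # hypothetical frame of 1000 ID numbers
sample = random.sample(sampling_frame, 50)   # 50 members selected at random, without replacement
print(sample[:10])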
4. Cluster sampling
- Confines research to a limited number of relatively compact geographical areas – reducing travel and related costs to a certain
extent
- Choose locations that reflect key features of the diversity of the geographical area – e.g. a large city, a market town, a village,
and a rural station
- Clusters may be chosen at random but usually are not
- Proportions of people – it is possible to weight the sub-samples (clusters) in terms of their frequency in the geographical area
- Cases are selected from the clusters by random sampling
Non-probability sampling
- Useful for exploratory research or case studies
- Can get a large sample size quickly
- Sample subject to selection bias – might not be representative
5. Convenience sampling
- Whoever is convenient (nearby/available), rather than choosing at random
- Cheaper and faster
- Subject to sampling bias
- Very casual way
- When psychologists are studying basic psychological processes (e.g. memory or perception) common throughout all people,
there may be compelling reasons for thinking that a convenience sample is adequate
6. Quota sampling
- Allowing anyone to participate until quotas are full
- E.g. set % per age bracket, location, to match the census
- Good distribution, cheap
- Interviewer approaches people who appear to fit in categories specified by the interviewer’s employer
- Proportions depend on the purpose of the research
- Does not randomly select people in the categories – finds people who fit the category – no random selection
7. Purposive sampling
- Specific reason for choosing participants (e.g. typical respondents)
8. Snowball sampling
- Asking participants to recommend other people to try and recruit
- Great for difficult-to-access populations (e.g. illegal immigrants, drug users)
- More and more cases are gathered as the research proceeds
- Useful for categories of individuals that are relatively uncommon, or where there is no obvious source of suitable people of the desired type
- Relies on probability that people of a particular sort are most likely to know other people of that sort
Biases
Sampling bias
- Sample is not representative of the target population
Non-sampling biases
- Problems with measurement tools (reliability & validity) or response biases
Measuring variables:
- Measurements are approximate
- Psychological concepts hard to define precisely
Standard measures:
- Well-established measuring techniques
- Generally effective ways of measuring particular variables
- Attractive to researchers – avoid time and effort of creating new measures, available commercially
- Advantage – make comparisons with previous research findings easier
- Disadvantages
If everyone uses the same measure, it is harder to spot difficulties with that specific measure
Better to use variety of measures
May not be tailored for the particular group being studied
Important to find evidence of reliability and validity
Reliability
- Two types
- Test-retest reliability – measuring something at two different points in time
Stability of a variable measured over time
- Inter-item reliability – consistency in responses to slightly different measures of the same thing at the same time
Inter-item – when a set of questions is being assessed
Inter-judge/inter-rater – when a group of people is doing the assessing
Validity
- Whether a measure actually measures what it is supposed to measure
- No fixed link between reliability and validity
- Four types
Face validity – the researcher's impression that the measure appears to be measuring what it is supposed to be measuring
Convergent validity – the extent to which a measure is strongly related to other measures expected or known to measure the same thing
Discriminant validity – the extent to which a measure does not measure the things it is not supposed to measure
Construct validity – the degree to which a measure responds in the way it is supposed to respond theoretically (the measure discriminates between two distinctly different things)
- Good reliability is not a requirement of a valid measure of an unstable variable
- Good reliability is necessary when creating a valid measure of a stable variable
- Validity (like reliability) is often reported in terms of variants of the correlation coefficient
Definitions of centre
- Balance scale – mean
The point at which the distribution is in balance
E.g. if each number weighs 1 pound where would it be possible to balance?
- Smallest absolute deviation – median
Sum of the absolute deviations (differences)
The centre of a distribution is the number for which the sum of the absolute deviations is smallest
- Smallest squared deviation – mean
Sum of squared differences (deviations)
Target that minimises the sum of squared deviations provides another useful definition of central tendency
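A minimal sketch (Python, made-up scores) of these two definitions – the sum of absolute deviations is smallest at the median, and the sum of squared deviations is smallest at the mean:

from statistics import mean, median

scores = [2, 3, 4, 9, 16]

def sum_abs_dev(target):
    return sum(abs(x - target) for x in scores)

def sum_sq_dev(target):
    return sum((x - target) ** 2 for x in scores)

for candidate in [median(scores), mean(scores), 5, 10]:
    print(candidate, sum_abs_dev(candidate), sum_sq_dev(candidate))
# Absolute deviations are minimised at the median (4); squared deviations at the mean (6.8).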
Central tendency
- Statistics which represent “centre” of a frequency distribution
Mode – most frequent
Median – 50th percentile (middle score if scores are arranged from smallest to largest)
Mean – average
- How do you know which one to use?
Depends on type of data (LOM) and shape of distribution (especially skewness)
- Reporting more than one might be appropriate
Mode (mo)
- Most common score – highest point in a frequency distribution, most common response
- Suitable for all levels of data (might not be meaningful for ratio/continuous data)
- Not affected by outliers
- Check frequencies and bar graph to see whether it is useful
- Most frequently occurring value in the dataset
- Continuous data – frequency of each value is one since no two scores will be exactly the same
Normally computed from a grouped frequency distribution
- Mode not frequently used for continuous data – still important
Median (mdn)
- The mid-point of the distribution – quartile 2, 50th percentile
- Same number of scores above median as below it
- Not badly affected by outliers
- Might not represent central tendency if data is skewed
- If median useful, other percentiles might be worth reporting
- When odd number of values – median is middle number
- When even number of values – median is the mean of the two middle numbers
Mean (arithmetic)
- Average score
- Calculated by summing all scores and dividing by number of scores
- Used for normally distributed ratio or interval data
- Sensitive to extreme scores or outliers
- Sometimes inappropriate (e.g. bimodal distribution – the mean can describe a value at which there may be no actual scores)
- Sum of the numbers divided by the number of values
- Other means exist – geometric mean, harmonic mean, and others – but when referring to a “mean” almost always referring to
the arithmetic mean
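A minimal sketch (Python, made-up scores) showing all three measures on the same data, including how an outlier pulls the mean but not the median or mode:

from statistics import mean, median, mode

scores = [3, 5, 5, 6, 7, 9, 30]   # 30 is an outlier

print(mode(scores))     # 5 – most frequent value
print(median(scores))   # 6 – middle score, little affected by the outlier
print(mean(scores))     # ~9.29 – pulled upwards by the outlier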
Distribution
- Measures shape, spread and dispersion of data, as well as deviations from central tendency
- How do you decide which statistics to use?
- Non-parametric
Minimum/maximum
Range – highest score minus lowest score – extremely sensitive to outliers
Percentiles – 75th percentile (upper hinge) and 25th percentile (lower hinge), IQR (H-spread)
- Parametric
Standard deviation
Skewness
Kurtosis
Variance
- Average squared distance from the mean
- Closer together results = smaller variance
- Farther apart results = larger variance
- Dividing by (N – 1) makes the variance slightly larger and provides a better estimate of the population variance
s² = Σ(X − X̄)² / (N − 1)
Standard deviation (SD)
- Standard deviation is the square root of the variance
- Use for normally distributed interval or ratio data
- Affected by outliers
s = √[ Σ(X − X̄)² / (N − 1) ]
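A minimal sketch (Python, made-up scores) of the two formulas above, using the N – 1 denominator:

import math

scores = [4, 6, 7, 9, 14]
n = len(scores)
m = sum(scores) / n                                       # arithmetic mean = 8.0

variance = sum((x - m) ** 2 for x in scores) / (n - 1)    # 14.5
sd = math.sqrt(variance)                                  # ~3.81
print(m, variance, sd)
# statistics.variance(scores) and statistics.stdev(scores) give the same results,
# as they also divide by N - 1.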
Options for nominal LOM
- Nominal data = labelled categories
- Can describe data – but it is a little different
Which is most frequent?
Which is least frequent?
What are the frequencies?
Percentages?
Cumulative percentages?
Ratios (e.g. twice as many females as males)
Skewness
- Measure of the lean of the distribution (symmetry/asymmetry)
- Look for the tail (where there are fewer values)
Tail to right = positive skew (peak pointing towards lower numbers)
Tail to left = negative skew (peak pointing towards higher numbers)
- What causes skew?
Outliers
Floor effects – everyone gets low scores, few high scores
Ceiling effects – everyone gets high scores, few low scores
Check chart to see
- Skewed data is not always a mistake – e.g. depression scores
Kurtosis
- How flat vs peaked the distribution of the data is
Peaked data = positive kurtosis
Flat data = negative kurtosis
- A distribution can look more peaked or flat depending on how the graph is set up (the X and Y axes), so add a normal curve to judge kurtosis visually
- Normal curve = mesokurtic
- Peaked curve = leptokurtic
- Flat curve = platykurtic
- How to judge how severe skewness and kurtosis are in a distribution
Check histogram
Are there outliers? Deal with them
Run skewness/kurtosis analyses (will give you value and significance for the test)
Rule of thumb: skewness and kurtosis values between -1 and +1 are generally “normal enough” to meet assumptions for parametric inferential statistics, but many use +/- 2.5 as the cut-off (see the sketch after this list)
The significance test for skewness tends to be overly sensitive
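A minimal sketch (Python, assuming SciPy is available) of the numerical checks above on made-up data:

from scipy.stats import skew, kurtosis

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # high outlier -> tail to the right

print(skew(scores))        # positive value indicates positive skew
print(kurtosis(scores))    # excess kurtosis (0 = mesokurtic, like the normal curve)
# Values roughly between -1 and +1 are usually "normal enough" for parametric tests.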
Bounded scales
- Sometimes highest or lowest response option is a “censored score” – we don’t know the participant’s exact score
- E.g. individuals who timed out on a task don’t have the exact time taken recorded
- Can look like truncation
If data non-normal:
- Non-parametric descriptive statistics
Min/max
Range
Percentiles
Quartiles
Q1 (25th percentile)
Q2 (median)
Q3 (75th percentile)
Interquartile range (Q3 – Q1) – middle 50%
- Ways to fix non-normal distribution
Use transformations to convert data to normal
Allows you to do more powerful tests (i.e. parametric)
Lose original metric – complicates interpretation
Steps to take
- What is the purpose of the graph?
To make large amounts of data coherent?
To present many numbers in a small space?
To encourage eye to make comparisons?
- Select type of graph to use
- Draw and modify graph to be clear, non-distorting and well-labelled
Helps to maximise clarity and minimise clutter
Show the data – avoid distortion
Types of graphs
- Non-parametric (nominal/ordinal data)
Bar graph
Pie chart
- Parametric (normally distributed interval or ratio)
Histogram
Stem and leaf plot
Box plot
Pie chart
- Can display same information as bar chart
- Disadvantages
Harder to read
Difficult to show small values or small differences
Rotating chart and position of slices influences perception (rotating = bad)
Histograms
- Continuous-type data (Likert with more than 5 categories, ratio data)
- X-axis needs a “happy medium” number of categories (not too many, not too few)
- Show shape
- For continuous placed in class intervals
- Best-suited for large amounts of data
- Create frequency table with intervals to group continuous values – width about 10 is good
- Can be based on relative frequencies instead of actual frequencies – show proportion of scores in each interval – rather than
number of scores
Y axis runs from 0 to 1
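A minimal sketch (Python with NumPy, made-up scores) of the frequency table behind a histogram, grouping continuous values into class intervals of width 10 and converting counts to relative frequencies:

import numpy as np

scores = [12, 15, 18, 22, 25, 27, 31, 33, 38, 41, 44, 52, 55, 61]

counts, edges = np.histogram(scores, bins=np.arange(10, 71, 10))   # intervals of width 10
rel_freq = counts / counts.sum()                                   # proportions sum to 1

print(edges)      # class-interval boundaries
print(counts)     # frequency in each interval
print(rel_freq)   # relative frequencies, for a Y axis that runs from 0 to 1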
Box plots
- Use this for interval and ratio data
- Shows min and max, median, quartiles and outliers
- Good for screening data, comparing variables
- Can get messy (information overload)
- The upper whisker limit (fence) here is calculated as Q3 + (1.5*IQR)
- The lower whisker limit (fence) here is calculated as Q1 – (1.5*IQR); scores beyond these fences are plotted as outliers (see the sketch after this list)
- Can use for comparing variables
- Depicting differences between distributions
- Whiskers above and below each box to give additional information about the spread of data
Vertical lines + horizontal line on the end
Drawn from upper and lower hinges to upper and lower adjacent values
- Mean score indicated with a plus sign
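A minimal sketch (Python with NumPy, made-up scores) of the quantities a box plot is built from – quartiles, IQR, whisker limits and outliers:

import numpy as np

scores = np.array([3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 30])   # 30 is an outlier

q1, q2, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr    # whisker limit above the box
lower_fence = q1 - 1.5 * iqr    # whisker limit below the box

outliers = scores[(scores > upper_fence) | (scores < lower_fence)]
print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)   # outliers -> [30]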
Line graph
- Alternative to histogram
- Implies continuity (e.g. over time)
- Show multiple lines for different information
- Tops of the bars represented by points joined by lines (the rest of the bar is suppressed)
- Appropriate only when X and Y-axis display ordered (rather than qualitative) variables
- Misleading to use a line graph when the X-axis contains merely qualitative variables
Frequency polygons
- Graphical device for understanding the shapes of distributions
- Especially helpful for comparing sets of data displaying cumulative frequency distributions
- X-axis representing the values of the scores in your data
- Mark the middle of each class interval with a tick mark – label it with the middle value represented by the class
- Y-axis to indicate the frequency of each class
- Class interval at the height corresponding to its frequency
- Connect the points – one class interval below the lowest value in your data and one above the highest value – graph will touch
x-axis on both sides
- Easily discern shape of the distribution from this
Scatter plot
- Shows relationship between two variables
Types of probabilities
1. A priori, or theoretical
- Probability assigned as we do with dice or coins, on a logical basis
- Theoretical probabilities often are assumed or defined in human science research
- An issue of some controversy is when one can do this – it can be quite dangerous to assume that two events are equally probable simply because we don't know what their probabilities are
2. Frequentist/empirical
- Dominant definition of probability in statistics – the relative frequency of an event out of the number of opportunities for that event to occur
- E.g. If you want to find out whether a coin is unbiased or not – throw it lots of times and record the results (number of
heads/number of throws)
The greater the number of throws, the more stable (and valid) the result – a key concept in probabilistic sampling: the bigger the sample, the more precise the estimate of a population parameter
3. Bayesian, or subjective
- Frequentist probability refers to how often a particular event occurs in a long run of trials
- Bayesian = subjective plausibility of the occurrence of some event in the data
Places the probability of some event in the degree of belief of an observer
- E.g. the probability that X committed a particular crime
Frequentist – the relative frequency with which people with the characteristics of X commit crimes
Bayesian – the degree of belief I can have that X committed the crime, given a prior belief about the likelihood of X
committing crimes and the evidence that X committed the crime
How to use a frequency table if a problem is given to you in probabilistic format
- Suppose on an exam you are given the following information:
P(married now) = 0.45
P(married before) = 0.32
P(married now and married before) = 0.16
- And the question is: what is P(not married before and married now)?
1. Multiply all the probabilities by 100 and think of a group of 100 people. So:
45 are married now
32 were married before
16 are married now and were married before
2. Set up a table for the two events and their opposites
Put the numbers in the appropriate cells in this table, including your total of 100 people
3. Then use arithmetic to fill in the remaining cells and divide by 100: of the 45 married now, 16 were also married before, so 45 – 16 = 29 are married now but were not married before, giving 29/100 = 0.29
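The same arithmetic as a minimal sketch (Python), using the rule P(A and not B) = P(A) – P(A and B):

p_now = 0.45       # P(married now)
p_before = 0.32    # P(married before)
p_both = 0.16      # P(married now and married before)

# Of the 45 per 100 married now, 16 were also married before,
# so 45 - 16 = 29 are married now but not married before.
p_now_not_before = p_now - p_both
print(p_now_not_before)   # 0.29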
Probability distributions
- In general terms – probability distribution describes the a priori likelihood of an outcome – assuming particular characteristics
(parameters) of the population from which the sample is being drawn
- In a probability distribution figure – the X axis describes outcome space (all possible outcomes for discrete variables, and a
mean with standard deviations around it for continuous variables), Y axis describes the probability mass function (for discrete
variables) or the probability density function (for continuous variables)
- Probability mass function – for each outcome what is the probability of achieving that outcome
- Probability density function – gives the relative likelihood of values around each point (for continuous variables, probabilities correspond to areas under the curve rather than to single points)
Normal distribution – continuous probability distribution used for continuous random variables
- Particularly widely used in psychology
- The central limit theorem states that the average of many independent observations of a random variable is approximately normally distributed, whatever the shape of the original distribution
- This means that measurement error and physical processes that are the sum of many independent processes are normally
distributed e.g. height
- Unlike binomial distribution, normal distribution has the same symmetric bell shape no matter the value of its parameters (in
this case, mean and standard dev)
- This means that:
All we need to know of a variable that is normally distributed is its mean and standard deviation; and
We can convert any raw score from a normally distributed variable into a standardised score that allows comparisons
between observations and the theoretical distribution
- Z-score = (X – mean)/SD
- A normal distribution in Z-score units is a standard normal distribution – mean = 0, SD = 1
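A minimal sketch (Python, assuming SciPy; made-up mean and SD) of converting a raw score to a Z-score and reading off probabilities from the standard normal distribution:

from scipy.stats import norm

mean, sd = 100, 15          # hypothetical population values (e.g. an IQ-style scale)
x = 130

z = (x - mean) / sd         # Z = 2.0
print(z)
print(norm.cdf(z))          # ~0.977 -> about 97.7% of scores fall below Z = 2
print(1 - norm.cdf(z))      # ~0.023 -> about 2.3% fall above it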
The t distribution is a continuous probability distribution, used to represent the sampling distribution of a population parameter,
the mean
- If you take lots of observations from a population, you can derive an estimate of the population mean and the population
standard deviation
- The larger the sample size, the more these estimates will approximate the “true” population values
- Importantly, however, each sample drawn from a population will have a slightly different estimate of these parameters (e.g. of the mean) – this sample-to-sample variation is the sampling distribution of the mean
- t = mean difference/standard error
- It turns out there is a relationship between the SD from a sample and the standard error of the mean: SE = SD / √n
- Given that t = mean difference/SE, this means that the shape of a t distribution differs as a function of sample size (specifically, degrees of freedom (df)) – unlike the Z distribution
- For t, then, we need to know three parameters: mean difference, SE, and df
- The t distribution can be used to determine whether means from two samples are drawn from the same population
Going back to a variation of the bus example, imagine a sample of 6 buses is 51.83 seconds late on average, while a second
sample of 6 buses is 16.17 seconds late on average, and the SE is 14.29
What is the probability that they are from the same population?
df = n1 + n2 – 2
mean difference = 51.83 – 16.17 = 35.66
t(10) = mean difference/SE = 35.66/14.29 = 2.495
Look up the t-distribution table – p = 0.016 (one-tailed)
If both samples really came from the same population, a mean difference this large would be expected only about 1.6% of the time
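A minimal sketch (Python, assuming SciPy) reproducing the bus calculation above; t.sf gives the one-tailed upper-tail probability:

from scipy.stats import t

mean_diff = 51.83 - 16.17          # 35.66
se = 14.29
df = 6 + 6 - 2                     # 10

t_value = mean_diff / se           # ~2.495
p_one_tailed = t.sf(t_value, df)   # ~0.016
print(t_value, p_one_tailed)
# Doubling this gives the two-tailed value (~0.03) if no direction was predicted.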
F-distribution = a continuous probability distribution, used to represent the ratio of two variances
- Always positive because variances are always positive
- The F value is almost always greater than or equal to 1, because F values are usually calculated by dividing the larger variance by the smaller variance (see ANOVA)
- F-distribution is positively skewed
- Shape of the F-distribution is determined by 2 parameters, df associated with variance 1 and df associated with variance 2
- Can be used to determine if two variances are equal or not
F(df1, df2) = Var1/Var2
If F = 1, the variances are equal
- E.g. imagine the variance in height for one group of 11 people is 49cm² and the variance in height for another group of 16 people is 14cm² – is the variance of group 1 greater than that of group 2? What is the probability that they are from the same population?
F(10, 15) = 49/14 = 3.5
Look up the F table – p = 0.014
If both samples came from populations with equal variance, a variance ratio this large would be expected only about 1.4% of the time
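A minimal sketch (Python, assuming SciPy) reproducing the variance-ratio calculation above:

from scipy.stats import f

var1, var2 = 49, 14
df1, df2 = 11 - 1, 16 - 1          # n - 1 for each group

f_value = var1 / var2              # 3.5
p = f.sf(f_value, df1, df2)        # upper-tail probability, ~0.014
print(f_value, p)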
APA publication manual recommendations about effect sizes, confidence intervals and power
- 2001 (APA 5th edition) recommended reporting of effect sizes, power, etc.
- 2009 (APA 6th edition) further strengthened the requirements to use NHST as a starting point and to also include effect sizes, confidence intervals and power
- American Statistical Association (2016)
“Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p < 0.05”) for
justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not
immediately become “true” on one side of the divide and “false” on the other.”
Statement on significance testing and p-values
P-values can indicate how incompatible the data are with a specified statistical model
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were
produced by random chance alone
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a
specific threshold
Proper inference requires full reporting and transparency
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis
Recommendations to follow when using NHST
- Use traditional null hypothesis significance testing
- Also use complementary techniques (effect sizes and confidence intervals)
- Emphasise practical significance
- Recognise merits and shortcomings of each approach
- New statistics – aim to increase the transparency, completeness, and accuracy of research findings
Confidence interval – a range of values that contains a specified percentage of the sampling distribution of that statistic
Confidence level – the specific percentage above
- Intervals used to make inferences about the plausible values of a parameter on the basis of a sample estimate
- Plausible hypothetical value of a population statistic – one that lies within the confidence interval (inside it)
- Implausible hypothetical value of a population statistic – one that lies outside the confidence interval
- Model error – difference between a hypothetical value of a population statistic and the true value of that statistic
Sample statistical value = hypothetical population statistic value + model error + sampling error
- The best possible model is one that has no model error
- A plausible model is one for which 0 is a plausible value for model error; that is, a plausible model predicts plausible values
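A minimal sketch (Python, assuming SciPy; made-up sample) of a 95% confidence interval for a sample mean, built from the mean, the standard error and the critical t value:

import math
from scipy.stats import t

scores = [12, 15, 9, 14, 11, 13, 10, 16]
n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
se = sd / math.sqrt(n)

t_crit = t.ppf(0.975, n - 1)                 # two-tailed 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)
print(mean, ci)
# Hypothetical population means inside this interval are plausible; values outside it are not.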