PSYC2009 Term 1 Notes

This document provides an overview of quantitative research methods in psychology, including the scientific method, ethics in research, and different types of research designs. It discusses the ideals of universalism, organized skepticism, communalism, disinterestedness, and honesty in scientific research. It also outlines ethical principles regarding informed consent, confidentiality, anonymity, privacy, deception, stress/discomfort, and publishing research. Finally, it describes the key features and advantages/disadvantages of experimental, quasi-experimental, and non-experimental research designs.

PSYC2009 – Quantitative Methods in Psychology Lecture (and Reading) Notes

Lecture 1: Introduction to research

1.1. The scientific method: ideals and assumptions

Universalism – evidence-based claims judged on merit of evidence


- Regardless of scientist’s gender, ethnicity, political affiliation or other characteristics
Organised scepticism – all evidence-based findings are provisional
- Careful scrutiny of findings – no results accepted other than provisionally, even when subject to replication by others
Communalism – all scientific findings (& processes) should be made transparent
- Publicly available for scrutiny, shared freely with everyone
Disinterestedness – researchers should approach their own research questions (and those received from others) impartially
- Considered and tested alternative ideas on same footing – others could repeat tests (despite their opposing views)
and arrive at same conclusion
Honesty – don’t lie, deceive, conceal  a continuum
- Cheating/dissembling may result in banishment or ostracism

1.2. Ethics in research

Sources of research ethics:


- National Statement on Ethical Conduct in Human Research (AUS)
- University ethics committees
- British Psychological Society – four ethical principles:
 Respect – valuing dignity and worth of all persons, respecting rights to privacy and self-determination
 Competence – maintenance & development of high competence, functioning optimally within limits of knowledge,
skill, training, education and experience
 Responsibility – value responsibilities to clients, avoiding harm, prevention of misuse/abuse of their research
 Integrity – value honesty, accuracy, clarity and fairness in their interactions, promote integrity
- American Psychological Association

Confidentiality, anonymity and privacy


- Confidentiality – not publishing identifiable data from participants/clients
- Anonymity – keeping participants’ identity protected in publications
- Privacy – people’s expectations that personal lives are not impacted by research participation
Informed consent
- Nature and purpose of study, how it will be used, any risks/benefits disclosed
- Right to withdraw/not participate – importance of consent
 Giving full information as to likely level of discomfort, voluntary nature of exercise, right to withdraw
 Reminder of this right throughout the procedure where discomfort is higher than anticipated
 Terminate procedure where levels are substantially higher than anticipated and/or participant is obviously
disturbed to an unacceptable level
Involuntary participation
- Unwitting participants/uninformed participants (e.g. watching covertly – social norm observation), can border on
breaching privacy rights
Deception
- If a study is deceptive and cannot be altered to avoid this, debrief participants so that they fully consent to their information being used
- Menges (1973) – reviewed 1000 American studies, 80% involved deception, only 3% given complete information about
independent variable
- Consult with disinterested colleagues as to seriousness of deception – feasible alternatives?
- Possible to obtain permission to deceive from participants when they sign up for general research participation – not
often used
- Role-play can be an alternative – can be just as effective if done correctly
Debriefing
- Responsibility in all research studies to debrief participant – attempting to make sure participant feels the same about
themselves before/after experiment
- Reason and Rowan (1981) – “Good research means never having to say you are sorry”
Mental and physical stress and discomfort
- Debriefing with participant
- Is it necessary to cause this stress?
- Who can they speak to if necessary – contact information
- Follow-up to ensure participant is ok/reassess for next participant
- Any research where discomfort might be expected – seek opinion and advice from professional colleagues before going
ahead
Where and what to publish?
- Bottom-drawer filing (leaving results unpublished) – can be detrimental to participants’ wishes – they want their work to be used, their effort seen and heard
- Predatory journals – charge a publishing fee and are not legitimate outlets
- Issues are not just to do with directly harming the participant
- Cannot claim to have demonstrated an effect and withhold raw data or information on procedures and samples used
- Cannot allow the results of work to affect people’s lives before research has been verified, evaluated and replicated
- Honour the scientific endeavour over the attractions of media exposure and fame
- Exercise integrity over sources of funding for research (e.g. Pioneer Fund – six-figure sums to racist research – white
supremacy)
Special power of the investigator
- Obliged to give participant every chance to not participate before/during experiment
- Working against this is position of influence, prestige and power of investigator
- Look at the way consent is obtained from detained persons
- Special circumstances of children and adults with understanding/communication impairments

1.3. Types of research

a. Experimental research – systematic attempt to manipulate theoretically relevant variables and to examine the effect
of these manipulations on the outcome variable (Haslam & McGarty)
- Purpose: cause and effect
 Active intervention – attempt to make some change, monitor the impact of the change
 Manipulation of variables
- Experimental and control groups
 Experimental group – participants who are subjected to the manipulation/treatment
 Control group – participants who are not subjected to the treatment
- Independent variable (IV) – manipulated systematically
- Dependent variable (DV) – effect of the manipulation (measured after manipulation)
 Causal inference [process of reaching conclusions about effect of one variable on another, or the outcome of such a process] (and internal validity) can be drawn from well-designed experiments – when interpretation is correct
 In order to have complete confidence – conditions must be identical in every respect except for differences in
treatment (IV)
 Must equate conditions in experiment
 Random assignment – allocate participants randomly to different conditions
 Representative sample of population you are trying to research
 Confounding - unintentionally manipulating variables – quite common
- Between-subjects design – different people exposed to different levels of the IV
- Within-subjects design – levels of the IV differ at different stages within the same participants
- Advantages
 Causal inferences can be made – any difference in the DV is attributable either to the manipulation of the independent variable or to chance
 Experimental control over what they investigate and causal interpretation
- Disadvantages
 Practicality
 Reactivity effects (participants can become hyper-aware of the manipulation)
 Informing people of what is being tested has the effect of altering their likely “natural” behaviour
 Validity (ecological and internal)
 Much harder to figure out why something happened than that it happened

b. Quasi-experimental research – no manipulation (IV not/can’t be manipulated)


- Identical to experimental but  no random assignment
- Quite common method
- Independent variable is not manipulated
- Assignments to groups on basis of pre-existing differences on the IV
- E.g. separation from partners causing reduction in feelings in older versus younger people
- Advantages
 Study variables that cannot be manipulated
 Good ecological validity
- Disadvantages
 Causal inferences cannot always be drawn because experiments are not as controlled
 May not be valid to conclude that any variation that is ultimately observed in the DV has been caused by
variation in the IV
 Serious drawback
 Extraneous variables – something else may be affecting the group
 Adversely affect researcher’s ability to interpret findings correctly – pose serious threats to validity
 Often dealt with by using multiple measures or conducting completely controlled experiments (experimental
method) where possible

c. Non-experimental research – “survey/correlation research” – good for naturally occurring patterns


- Common to lots of social scientific research
- E.g. opinion polls, market research, performance evaluations, censuses
- Conducted anywhere from structured telephone interviews to open-ended questionnaires
- Obtains information about a number of different variables and identifies relationships – methodology used to identify
extent and nature of associations (correlations)
- Advantages
 Ease of admin (faster, cheaper, more convenient for participants) – easy to administer anywhere
 Can examine relationships between variables that would be extremely difficult to isolate experimentally (e.g.
wealth, sexual preference, health, socio-economic status)
 Many variables can be tested – at same time in environment where they occur
 Make and test predictions about the relationship between particular variables
 This descriptive and predictive information can be very useful for a range of people
 Used by makers of social policy, consumer organisations, advertising, gov. depts., employers, etc.
- Disadvantages
 Applicability of the findings may be limited to the group of people that is actually studied – make sure sample is
representative of the whole population
 Generalisability
 Causal inferences (correlation does not equal causation – often overlooked) – researchers often keen to provide
explanations over descriptions
 Does not mean that researchers should conduct experiments instead
 Experimentation is not a substitute for careful observation and description of behaviour – there is an important place for both procedures in the field

d. Case study research – used when phenomenon under study is rare


- Small sample size (sometimes n = 1)
- Some areas of psychology – participants are extremely hard to come by
- Special instance of the methods already considered above – experimentally manipulating a given variable over time
- Advantages
 In-depth exploration of theoretical ideas
 Experimental case studies – make appropriate inferences about causality – types of manipulation may be limited
 Observational case studies – good opportunity for detailed examination of significant psychological
phenomenon/event, do not allow for conclusive statements
 Can help to refute theories
- Disadvantages
 Conclusive statements hard to make
 Findings can be unreliable due to peculiarities of particular case – generally keen to replicate findings wherever
possible

From reading: Chapter 3, Research Methods (Haslam and McGarty, 1998)


Two types of psychological phenomenon:
- Stable psychological characteristics – e.g. intelligence, personality or certain beliefs
- Dynamic mental processes – e.g. memory, mood or judgement
 On-the-spot measurement (generally) – compare with others obtained under different conditions
 Administer particular psychological tests – do this to make statements about positions on the dimension in
question (e.g. narcissism, depression, verbal reasoning)
Three main measurement options (and one other one):
- Behavioural measures – study particular forms of behaviour, making inferences about psychological phenomena that
caused or contributed to them
 Usually tailored to specific question
- Self-report measures – asking about feelings, thoughts, reactions to a given issue
 Provide the possibility for people to reflect consciously on relevant thoughts or behaviours
 Can be used to supplement other behavioural measures or as standalone data
 Gain insight into processes and statistics
 Relatively cheap and easy to administer (usually survey method w large samples)
- Physiological measures – measures of mental traces and responses
 Generally used to examine things that are directly associated with particular forms of mental activity
 E.g. measuring heart rate, galvanic skin response - for anxiety or arousal
 E.g. blood flow or electrical activity in the brain – neural structures involved in particular forms of mental activity
 Can inform about enduring mental states – e.g. brain damage, low arousal
 Difficult and expensive to obtain (usually experimental method w small samples)
- Behavioural trace measures – obtaining physical traces of behaviour to inform about what people are currently doing or
have done in past
 Widely employed in some areas (e.g. environmental, consumer and organisational psychology)
 E.g. amount of electricity being used in a city – can infer if people are awake

Whatever choice a researcher makes – none of them provides a pure or direct measure of the variables in which researchers
are interested.
- Normally the research process involves taking many observations
- Taking large samples helps to reduce uncertainty about conclusions – ensure that the sample is drawn in a way that does not bias results
 Random sample – obtain participants from a sub-population
 If wish to generalise findings – ensure that sample is a random sample of any sub-population and that sub-
population is representative of broader population on characteristics that we want to generalise

From reading: Statistics with confidence (Smithson, 1999)


Paths to knowledge or belief
- Researchers’ main goals = adding to what we know or believe and correcting erroneous beliefs
- Personal experience – for many it is synonymous with “reality-testing”
 Without adequate precautions and comparisons with other experiences, may be led astray
 May not comprise representative sampling
 Sometimes overgeneralise on the basis of our experiences
 Crucial component of scientific methods – Empirical methods
 Based on first-hand experiences
 Ascribes superior truth-status to things that have been directly observed or manipulated over things that
cannot be observed or manipulated
 Stance of impartiality (disinterestedness) towards all competing opinions or theories
 Scientist should take precautions in research so that someone with diff. values and opinions could repeat their
investigations and arrive at the same conclusions
- Authority or consensus – relies on second/third hand accounts of other’s experiences
 Authorities = sources with high status
 When relevant authorities agree on a proposition – the consensus makes the proposition appear beyond doubt
 Norms – usual or expected practice

Lecture 2: Sampling and Measurement

2.1. Designing surveys


Why do we research?
- Exploratory purposes – discover + explore
- Descriptive – describing phenomena
- Explanatory – explaining phenomena
- Predictive – make accurate predictions about what might happen next

Designing a survey
- Interviews and surveys should approximate a conversation
 Beginning, middle and end
 Establish common ground and motivations for the researcher and participant
 Begin with less intimate topics (such as demographics, age) and move to more intimate or difficult areas through the
middle, then end with safer topics
 Topical consistency between questions
 Announcements should be used to indicate shifts in topics where necessary
 Topics should have a sequence or logical order where possible
 Should always end with thanking the participant for their time and effort

Concepts and their measurements


- “Concepts express common elements in the world” – e.g. self-esteem, personality, intelligence, environmental attitudes
- Concept needs to be operationally defined (by creating a working definition) in order to be systematically researched
 The art of making a fuzzy concept measurable
 Multi-item measures should be used to assess related but distinct aspects of a concept
- Psychological constructs include felt emotions, perceptions, attitudes and are measured using the average (composite score)
of a number of correlated survey items

Steps to take when writing survey items


1. Brainstorm indicators of a concept – find already used measures (often useful for comparative work in discussion phase)
2. Define the concept – what does it measure, what doesn’t it measure?
3. Draft measurement items
4. Examine psychometric properties – how precise are the measures?
5. Redraft/redefine and re-test

Objective vs. subjective measures


- Objective
 A true answer exists
 A participant theoretically can answer these questions accurately
 Observations that all observers will agree about – even detached observers
 Easily measured or quantified – and can be measured precisely
 Generally refer to number, size or duration of specific units
- Subjective
 Asking about personal perceptions
 No “true”/factual answer, instead many possible answers
 Participant cannot answer these questions accurately
 Observations that are idiosyncratic to a particular observer – not shared by all observers
 Researchers generally avoid subjective measurements – though subjectivity may be their important characteristic
 Measures of concepts rather difficult to define and on which it is difficult to get much agreement
 Likely to differ considerably from person to person
- Objective + subjective
 Rarely are measures completely objective or completely subjective
 Complete objectivity – relatively simple things
 Trade-off between degree of objectivity and complexity of psychological variables investigated
 Objectivity can generally refer to the accuracy with which data is transcribed
 E.g. multiple choice questions described as objective

Open-ended vs. closed-ended


- Not a characteristic of a particular research method – dimension along which data collection procedures can vary
- Open-ended (unstructured)
 Empty space for answer
 Useful for gathering rich info
 Good for descriptive/explanatory work
 More difficult and subjective to analyse
 Time consuming
 Employed more when exploring issues not previously widely researched
 Allow for richness of detail
 Richer data but researcher needs to be able to take advantage of the richness otherwise effort wasted
 Particularly suited to exploratory and pilot studies
 Structured after the data are collected
- Closed-ended (structured)
 When there is a degree of clarity about nature of variables measured and likely range of different answers
 Pre-set response options
 Good for hypothesis testing
 Easy, quick, and objective to analyse – can be pre-coded
 Participants are restricted in their options and so may potentially lose important information
 May be demotivating for respondents who wish to express themselves precisely about an issue but are not given the opportunity to do so
 Structured before data collected

2.2. Response formats – for closed response questions

Dichotomous – choose between two response options (2 options)
- Simplest type of quantification (categorical/binary data)
- May not discriminate very well between people in all circumstances
- E.g. Do you have any pets? (yes/no)

Multichotomous – choose one from multiple response options (no limit on options)
- E.g. Which pet is most appealing to you? (dog/cat/rodent)

Multiple response – choose all that apply (no limit on options)
- E.g. Which pets do you have? (dog/cat/rodent)

Ranking – measures the relative importance of several items (no limit on options)
- Example: self-explanatory

Verbal frequency – measures the frequency of something occurring, in words (generally 4 options but can be more)
- People can be a little more precise as to where they fall on an issue
- More common than numerical scales – require less explanation and do not require the participant to remember what each number means
- E.g. over the last 12 months, how often have you thought about breaking up with your partner? Options: all the time, fairly often, occasionally, never, N/A

Likert scale – measures strength of feeling or perception (between 3-9 options, usually)
- Participants choose one option from equally spaced intervals
- Scale should balance sensitivity and reliability (basic rule: 7 +/- 2 options)
- E.g. recreational marijuana use should be legalised. Options: disagree, slightly disagree, neutral, slightly agree, agree

Semantic differential – put two words at opposite ends of a scale with interval marks (no limit – generally around 7)
- E.g. introvert to extrovert

Graphical – mark your response with a cross on a line (no limit)
- E.g. introvert to extrovert

Non-verbal – rate feeling through images, mostly for engaging with children (generally 5 options but again no limit)
- E.g. rate your feeling about what happened to the toy

2.3. Designing good survey questions

Things to do:
- Make sure your question applies to everyone or be clear which to skip if not applicable
- Response options must be exhaustive and mutually exclusive – so all participants will fit into one box and no others
(appropriate option for all)

Survey items should avoid:


- Requiring the respondent to:
 Keep track of too much information
 Perform mental calculations
 Recall things that are difficult or too far in the past
 Absorb new information or skills on the spot
- Making questions unsuitable for the age or ability range being studied
- Complex or technical language
- Asking opinions on topics with which the respondent has no familiarity or knowledge
- Double-barrelled questions
 Questions that contain more than one concept/purpose
 Hard to know which part they are responding to
 Simplify or split into questions separately
- Double-negative questions
 E.g. “I never have to worry about not passing my courses”
 Can also be created in the way participants respond
 E.g. “I have never felt depressed”
 Responding “disagree” creates a double-negative and is confusing
- Leading questions
 Questions that suggest the answer the researcher is looking for
 E.g. “What dangers do you see with fireworks?”
- Loaded questions
 Questions that suggest socially desirable answers or are emotionally charged
 E.g. “Do you support a ban on fireworks to improve animal welfare?”

Measurement error
- Statistical deviation from the true value caused by the measurement procedure
- Observed score = true score +/- measurement error
- Error can be systematic or random
- Reducing measurement error
 Standardising administration conditions with clear instructions and questions (e.g. self-administered)
 Minimise potential demand characteristics (i.e. train your interviewers)
 Use multiple procedures for “fuzzy” constructs and reliable scales
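To make the observed = true +/- error model above concrete, here is a minimal Python sketch; the true score, error SD and sample size are invented for illustration:

import random
import statistics

random.seed(1)
true_score = 25.0   # hypothetical "true" value of the construct
error_sd = 3.0      # purely random measurement error (no systematic error)

# observed = true score + random error, for 1,000 repeated measurements
observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]

print(round(statistics.mean(observed), 2))   # close to 25.0 - random error averages out
print(round(statistics.stdev(observed), 2))  # close to 3.0 - the error SD

With systematic error the mean itself would be shifted, which is why standardised administration conditions matter.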

Reliability and validity


- Ways of evaluating accuracy of psychological tests and measures
- Reliability – consistency of
 Items within measure
 Measure over time
- Validity – whether measure actually measures the intention

2.4. Sampling overview

Target population = who do you want to generalise to?


Sampling frame = who has a chance of being selected?
Sample = who was selected and responded
- Small selection from a population
- Small selection of all the scores on a variable (alternate meaning)
- Very much neglected aspect of psychological research
Representativeness = to what extent is the sample a good indicator of the target population

Why not select the entire target population?


- Sampling reduces time, cost and sample size (n)
- If sample is representative, then you can still draw inferences about the target population from the data

Representativeness
Depends on:
- Adequate sampling frame
- Sampling method
- Sample size
- Response rate (influenced by)
 Potential participants’ level of interest (self-selection biases may occur)
 Rewards
 Accompanying letter/introduction
 Layout and design
 Colour of paper
 Reminders/follow-up calls
 Ease of returning survey – e.g. self-addressed stamped envelope included

Sampling process
1. Identify – target population and sampling frame
2. Select – a sampling method
3. Calculate – your sample size
4. Maximise – the return rate

Sampling methods
- A range of sampling methods are used in psychological research
- Tend to trade off costs (money, labour and time) against precision (randomness/accuracy)
- Various compromises inherent in the different sampling methods
- Precise estimates of population characteristics are not always an important requirement of research
- The sampling methods differ largely due to whether
 Randomisation is employed
 The sampling is organised for the convenience of the researcher
 The random sampling is restricted to ensure better representativeness
- Response rates
 Describe the proportion of people in the selected sample who contribute info to the study; reduced by a number of factors:
 Refusing to take part
 Failure to contact
 Inadequacies in sampling frame

Probability sampling
- Each member of the population has a known, non-zero chance of being selected (an equal chance in simple random sampling)
- Occurs by random chance

1. Simple random sampling (SRS)


- “Classic” approach
- Everyone in the target population has an equal chance of selection
- Need to have a list of all of the members of the population in question
- Random samples are subsets or subgroups of the population selected in a way which gives each member of the population an
equal chance of being selected
- Any other selection method is biased – some cases have a better chance of being selected than others
- Involves a degree of rigour and organisation – not a haphazard process but a carefully regulated one
- Must involve a demonstrably random selection process
- Each member numbered and appropriate size sample selected by generating random numbers
- Geographically spread-out locations – random sampling may result in too much travelling time to collect data – may be
uneconomical as people may not be available at first contact
- Random samples may be very unrepresentative of the population
- Random sampling will yield representative outcomes only in the long run
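A minimal sketch of the numbering-and-random-selection procedure described above, using Python's standard library (the population size and sample size are made up):

import random

random.seed(42)                            # for a reproducible example
population = list(range(1, 501))           # members numbered 1 to 500
sample = random.sample(population, k=50)   # 50 members drawn without replacement

# every member had an equal chance (50/500) of being selected
print(sorted(sample)[:10])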

2. Systematic random sampling


- Respondents are selected from a list at regular intervals
- E.g. every 5th person on the list

3. Stratified random sampling


- Sub-divide population into strata (e.g. gender, age, location), randomly select using SRS from each stratum
- Improves representativeness
- E.g. phone interviews using post-code strata
- Modified version of random sampling – ensures the final sample includes people ranging in characteristics and in the desired proportions
- Good compromise between randomness and precision
- Does nothing to make data collection more efficient
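A sketch of stratified random sampling with proportional allocation: SRS within each stratum, with stratum sample sizes matched to (made-up) population proportions:

import random

random.seed(42)
strata = {                          # hypothetical member IDs per age stratum
    "18-29": list(range(0, 400)),
    "30-49": list(range(400, 750)),
    "50+":   list(range(750, 1000)),
}
total = sum(len(members) for members in strata.values())
sample_size = 100

sample = []
for name, members in strata.items():
    k = round(sample_size * len(members) / total)   # proportional allocation
    sample.extend(random.sample(members, k))        # SRS within the stratum
print(len(sample))   # 100: 40 + 35 + 25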

4. Cluster sampling
- Confines research to a limited number of relatively compact geographical areas – reducing travel and related costs to a certain
extent
- Choose locations that reflect key features of the diversity of the geographical area – e.g. a large city, a market town, a village,
and a rural station
- Clusters may be chosen at random but usually are not
- Proportions of people – it is possible to weight the sub-samples (clusters) in terms of their frequency in the geographical area
- Cases are selected from the clusters by random sampling

Non-probability sampling
- Useful for exploratory research or case studies
- Can get a large sample size quickly
- Sample subject to selection bias – might not be representative

5. Convenience sampling
- Whoever is convenient (nearby/available), rather than choosing at random
- Cheaper and faster
- Subject to sampling bias
- Very casual way
- When psychologists are studying basic psychological processes (e.g. memory or perception) common throughout all people,
there may be compelling reasons for thinking that a convenience sample is adequate

6. Quota sampling
- Allowing anyone to participate until quotas are full
- E.g. set % per age bracket, location, to match the census
- Good distribution, cheap
- Interviewer approaches people who appear to fit in categories specified by the interviewer’s employer
- Proportions depend on the purpose of the research
- Does not randomly select people in the categories – finds people who fit the category – no random selection

7. Purposive sampling
- Specific reason for choosing participants (e.g. typical respondents)

8. Snowball sampling
- Asking participants to recommend other people to try and recruit
- Great for difficult-to-access populations (e.g. illegal immigrants, drug users)
- More and more cases are gathered as the research proceeds
 Useful for categories of individuals that are relatively uncommon or where there is no obvious source of suitable people of the desired type
- Relies on probability that people of a particular sort are most likely to know other people of that sort

Biases

Sampling bias
- Sample is not representative of the target population

Non-sampling biases
- Problems with measurement tools (reliability & validity) or response biases

Yea- and nay- saying (acquiescence bias)


- Fix by including pro-trait and con-trait items
- Put half in a positive direction, other half in negative direction

Participant fatigue and order effects


- Fix by counter-balancing (different participants get different orders)

Social desirability bias


- Better-than-average effect: sometimes this is what you are trying to measure
- Include objective measures if possible
- Reassure participants that best answer = their honest opinion, and responses are anonymous (if true)
- Demand characteristics
 Disguise aim of the research from participants (and disclose it after they have completed the study)
 Researcher blind to which participants are in which group

Level of measurement (LOM)


- Discrete – can take only certain values, usually a small number
- Continuous – can take value over any range (theoretically infinite)
 No absolute rules about discrete/continuous, consider all aspects very carefully – measurements should be precise
enough that there is some discrimination between people
- Nominal – attributes are named (e.g. dog person, cat person)
- Ordinal – attributes are ordered (e.g. rankings, positions in a race)
- Interval – distance is meaningful (exact distances between attributes) (e.g. Likert scale when number of options is >= 5 and
response options are mirrored)
- Ratio – absolute zero is meaningful (e.g. height, age, weight, time, # times events occur)
 Each level has the properties of the levels before + something additional

Why should we care about the LOM?


- Analytical procedures used for different levels of data
- Apply more powerful statistics to higher levels
- You can recode to a lower LOM, but never a higher one

From reading: HC 2000 – Chapter 4: Measuring Variables

Measuring variables:
- Measurements are approximate
- Psychological concepts hard to define precisely

Standard measures:
- Well-established measuring techniques
- Generally effective ways of measuring particular variables
- Attractive to researchers – avoid time and effort of creating new measures, available commercially
- Advantage – make comparisons with previous research findings easier
- Disadvantages
 If everyone uses the same measure, it is harder to spot difficulties within that measure
 Better to use variety of measures
 May not be tailored for the particular group being studied
 Important to find evidence of reliability and validity

Reliability
- Two types
- Test-retest reliability  measuring something at two different points in time
 Stability of a variable measured over time
- Inter-item reliability  consistency in response to slightly different measures of same thing at same time
 Inter-item – when set of questions being assessed
 Inter-judge/inter-rater – when group of people doing the assessing

Validity
- Whether a measure actually measures what it is supposed to measure
- No fixed link between reliability and validity
- Four types
 Face validity  researcher’s impression that measure appears to be measuring what it is supposed to be measuring
 Convergent validity  extent to which a measure is strongly related to other measures expected or known to measure
the same thing
 Discriminant validity  extent that a measure does not measure the things it is not supposed to measure
 Construct validity  degree to which a measure responds in the way it is supposed to respond theoretically (measure
discriminates between two distinctly different things)
- Good reliability is not a requirement of a valid measure of an unstable variable
- Good reliability is necessary when creating a valid measure of a stable variable
- Often reported (reliability included) in terms of variants of the correlation coefficient

Ways of measuring in psychology:


- Scientific instrumentation is commonplace in certain areas of psychology
 Numerous different forms
 EEG machines in clinical and sleep research
 Brain scanners in psychophysiological applications
 Plethysmography to measure the erections of sexual predators
 Computers
- Certain branches of psychology  observations of behaviour
- Interviews
 Some highly structured and demand precise answers to precise questions
 Some much more open ended and unstructured
 Style dependent on a range of factors – extent to which the subject matter has been clearly defined & extent that the
purpose of the research is to explore issues
- Self-completion questionnaire
 Scale = tries to measure complex psychological variables using a variety of questions on the same concept
 Value of a measurement technique lies in the evidence of its reliability and validity
 Advantages
 Variables can be measured which are otherwise difficult or impossible to measure
 Measure fairly complex thoughts, feelings and beliefs
 Cheap and not too time-consuming
 Substantial amounts of information may be collected
 Answers do not have to be classified by the researcher
 By combining several items in these questionnaires
 Inter-item reliability can be assessed
 Assess the variable in different circumstances
 Increase our ability to discriminate between people
 Disadvantages
 Studied solely through words of individuals who may be unable to report accurately on beliefs and behaviour
 No check on the accuracy of individual’s claims
 Some aspects of life lengthy and detailed study required – need to take great care to elicit information from
participants
 May collect data just in case it can be used
 Little opportunity to clarify or expand answers
 May alienate people because of superficiality or difficulty of answering
- Scoring
 A score is merely a number that indicates the extent to which an individual shows a particular characteristic
 Binary response – yes=1 and no=2
 Five-point response format – agree very much=5, agree=4, neither agree nor disagree=3, disagree=2, disagree very
much=1
 Answers can be weighted to give more emphasis to certain replies
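A small sketch of this kind of scoring in Python, including a reverse-scored (negatively keyed) item of the sort used to control yea-saying; the items and responses are invented:

# 1 = disagree very much ... 5 = agree very much
responses = {"item1": 5, "item2": 4, "item3": 2}
negatively_keyed = {"item3"}   # worded in the opposite direction

scored = {
    item: (6 - value) if item in negatively_keyed else value   # reverse: 1<->5, 2<->4
    for item, value in responses.items()
}
composite = sum(scored.values()) / len(scored)   # composite (mean) scale score
print(scored, composite)   # {'item1': 5, 'item2': 4, 'item3': 4} 4.33...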

Lecture 3: Descriptive statistics and graphing

3.1. Levels of measurement of statistics

Levels of measurement review


- Nominal – attributes are named (e.g. dog person, cat person)
- Ordinal – attributes are ordered (e.g. rankings, positions in a race)
- Interval – distance is meaningful (exact distances between attributes) (e.g. Likert scale when number of options is >= 5 and
response options are mirrored)
- Ratio – absolute zero is meaningful (e.g. height, age, weight, time, # times events occur)
 Each level has the properties of the levels before + something additional
- Determines the types of statistics you can use
 Pay attention to LOM of your data
 Determines what kinds of descriptive statistics, graphs and inferential statistics you can use

Descriptive vs. inferential statistics


- Descriptive – describing the data
- Inferential – making inferences about the population based on the sample

Parametric vs. non-parametric statistics


- Non-parametric – categorical and ordinal dependent variables
- Interval and ratio dependent variables are more complicated
 If distribution = normal  parametric
 If distribution = non-normal  non-parametric
- Parametric statistics
 Estimate the parameters of a population based on the normal distribution
 Mean, standard deviation, skewness, kurtosis
 More powerful
 More sensitive
 More assumptions – normal distribution
 Vulnerable to violations of assumptions – less robust
- Non-parametric statistics
 Do not assume sampling from a population which is normally distributed
 Median, frequencies
 Less powerful
 Less sensitive
 Fewer assumptions – do not assume normal distribution
 Less vulnerable to assumption violations – more robust

3.2. Descriptive statistics

What do you want to describe?


- Central tendency – frequency, mode, median, mean
- Distribution
 Spread (dispersion): minimum/maximum value, range, percentiles, variance, standard deviation
 Shape: skewness, kurtosis

Definitions of centre
- Balance scale – mean
 The point at which the distribution is in balance
 E.g. if each number weighs 1 pound where would it be possible to balance?
- Smallest absolute deviation – median
 Sum of the absolute deviations (differences)
 The centre of a distribution is the number for which the sum of the absolute deviations is smallest
- Smallest squared deviation – mean
 Sum of squared differences (deviations)
 Target that minimises the sum of squared deviations provides another useful definition of central tendency

Central tendency
- Statistics which represent “centre” of a frequency distribution
 Mode – most frequent
 Median – 50th percentile (middle score if scores are arranged from smallest to largest)
 Mean – average
- How do you know which one to use?
 Depends on type of data (LOM) and shape of distribution (especially skewness)
- Reporting more than one might be appropriate

What are the most appropriate measures for each LOM?


LOM      | Mode, frequency and percentages | Median              | Mean
Nominal  | Yes                             | No                  | No
Ordinal  | Yes                             | If it is meaningful | No
Interval | Yes                             | Yes                 | Yes
Ratio    | If it’s meaningful              | Yes                 | Yes

Mode (mo)
- Most common score – highest point in a frequency distribution, most common response
- Suitable for all levels of data (might not be meaningful for ratio/continuous data)
- Not affected by outliers
- Check frequencies and bar graph to see whether it is useful
- Most frequently occurring value in the dataset
- Continuous data – frequency of each value is one since no two scores will be exactly the same
 Normally computed from a grouped frequency distribution
- Mode not frequently used for continuous data – still important

Frequencies (f) and percentages (%)


- Number of responses in each category
- Percentage of responses in each category
- Shown using frequency table (to summarise) or bar/pie chart (to visualise)

Median (mdn)
- The mid-point of the distribution – quartile 2, 50th percentile
- Same number of scores above median as below it
- Not badly affected by outliers
- Might not represent central tendency if data is skewed
- If median useful, other percentiles might be worth reporting
- When odd number of values – median is middle number
- When even number of values – median is the mean of the two middle numbers

Mean (arithmetic)
- Average score
- Calculated by summing all scores and dividing by number of scores
- Used for normally distributed ratio or interval data
- Sensitive to extreme scores or outliers
- Sometimes inappropriate (e.g. in a bimodal distribution the mean may describe a value where there are actually no scores)
- Sum of the numbers divided by the number of values
- Other means exist – geometric mean, harmonic mean, and others – but when referring to a “mean” almost always referring to
the arithmetic mean
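The three centres computed on a small made-up set of scores with Python's standard library:

import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 9]
print(statistics.mode(scores))     # 5   - most frequent value
print(statistics.median(scores))   # 4.5 - even n, so mean of the two middle scores (4 and 5)
print(statistics.mean(scores))     # 4.5 - sum (36) divided by number of scores (8)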

Distribution
- Measures shape, spread and dispersion of data, as well as deviations from central tendency
- How do you decide which statistics to use?
- Non-parametric
 Minimum/maximum
 Range – highest score minus lowest score – extremely sensitive to outliers
 Percentiles – 75th percentile (upper hinge) and 25th percentile (lower hinge), IQR (H-spread)
- Parametric
 Standard deviation
 Skewness
 Kurtosis

What are the most appropriate measures for each LOM?

LOM      | Min/max and range | Percentile          | Variance/SD
Nominal  | No                | No                  | No
Ordinal  | Yes               | If it is meaningful | No
Interval | Yes               | Yes                 | Yes
Ratio    | Yes               | Yes                 | Yes

Variance
- Average squared distance from the mean
- Closer together results = smaller variance
- Farther apart results = larger variance
- Dividing by (N – 1) makes the sample variance slightly larger and provides a better estimate of the population variance

s² = Σ(X − X̄)² / (N − 1)
Standard deviation (SD)
- Standard deviation is the square root of the variance
- Use for normally distributed interval or ratio data
- Affected by outliers

s = √[ Σ(X − X̄)² / (N − 1) ]
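Writing the two formulas out directly and checking them against the standard library (which uses the same N − 1 sample formulas):

import math
import statistics

x = [4, 8, 6, 5, 7]
mean = sum(x) / len(x)                                   # 6.0
var = sum((xi - mean) ** 2 for xi in x) / (len(x) - 1)   # 10 / 4 = 2.5
sd = math.sqrt(var)                                      # ~1.58

assert math.isclose(var, statistics.variance(x))
assert math.isclose(sd, statistics.stdev(x))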
Options for nominal LOM
- Nominal data = labelled categories
- Can describe data – but it is a little different
 Which is most frequent?
 Which is least frequent?
 What are the frequencies?
 Percentages?
 Cumulative percentages?
 Ratios (e.g. twice as many females as males)

Options for ordinal data


- Ordinal data = data tells you about order but not distance (e.g. ranked options)
- Descriptives approach same for nominal but can also use percentiles (including the median)

Options for interval data


- Interval data = order and distances but zero is arbitrary/not meaningful
- Describe in a number of ways:
 Central tendency – mode, median, mean
 Shape/spread – min/max, range, standard dev., skewness, kurtosis
- Interval = discrete data – but often treated as continuous (especially if intervals > 5)

Options for ratio data


- Ratio = numbers convey order and distance with a meaningful zero
- Same descriptive for interval but can also use ratios (e.g. group A twice as tall as group B)

3.3. Properties of the normal distribution

Four moments of a normal distribution:


- Mean – average
- Standard dev. – the spread around the mean
- Kurtosis – how high or flat the peak is
- Skew – how much and in which direction the distribution leans

Normal distribution allows us to estimate where our points lie


- Using standard deviation to show where most points are
- General rule – if data normally distributed
 68% within 1 SD of mean
 95% within 2 SD
 99.7% within 3 SD
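The general rule can be checked against the normal CDF with Python's standard library:

from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)           # proportion within k SDs of the mean
    print(f"within {k} SD: {within:.3f}")   # 0.683, 0.954, 0.997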

Skewness
- Measure of the lean of the distribution (symmetry/asymmetry)
- Look for the tail (where there are fewer values)
 Tail to right = positive skew  peak pointing towards lower numbers
 Tail to left = negative skew  peak pointing towards higher numbers
- What causes skew?
 Outliers
 Floor effects – everyone gets low scores, few high scores
 Ceiling effects – everyone gets high scores, few low scores
 Check chart to see
- Skewed data is not always a mistake – e.g. depression scores
Kurtosis
- How flat vs peaked the distribution of the data is
 Peaked data = positive kurtosis
 Flat data = negative kurtosis
- A distribution can look more peaked or flat depending on how the graph is set up (the X and Y axes), so add a normal curve to judge kurtosis visually
- Normal curve = mesokurtic
- Peaked curve = leptokurtic
- Flat curve = platykurtic
- How to judge how severe skewness and kurtosis is in a distribution
 Check histogram
 Are there outliers? Deal with them
 Run skewness/kurtosis analyses (will give you value and significance for the test)
 Rule of thumb: skewness and kurtosis values between -1 and +1 are generally “normal enough” to meet assumptions for
parametric inferential statistics, but many use +/- 2.5 as the cut off
 The significance test for skewness tends to be overly sensitive
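A sketch of the rule-of-thumb check, assuming SciPy is available (scipy.stats reports excess kurtosis, so normal = 0, matching the convention above); the scores are made up and positively skewed:

from scipy.stats import kurtosis, skew

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

g1 = skew(scores)
g2 = kurtosis(scores)   # excess kurtosis (fisher=True is the default)
print(g1, g2)
print(all(-1 <= v <= 1 for v in (g1, g2)))   # "normal enough" by the +/-1 rule?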

3.4. Non-normal distributions

How many peaks are there?


- One peak = unimodal
- Two peaks = bimodal
- More than two peaks = multi-modal

Bounded scales
- Sometimes highest or lowest response option is a “censored score” – we don’t know the participant’s exact score
- E.g. individuals who timed out on a task don’t have the exact time taken recorded
- Can look like truncation

How does skew affect measures of central tendency?


- In normal distribution (symmetrical)  mean = median = mode
 Mean and median are the same
- Positive skew  mode < median < mean
 Mean is typically higher than the median (except sometimes in bimodal distributions)
- Negative skew  mean < median < mode
 Mean is typically lower than the median

If data non-normal:
- Non-parametric descriptive statistics
 Min/max
 Range
 Percentiles
 Quartiles
 Q1 (25% percentile)
 Q2 (median)
 Q3 (75% percentile)
 Interquartile range (Q3 – Q1) – middle 50% (computed in the sketch after this list)
- Ways to fix non-normal distribution
 Use transformations to convert data to normal
 Allows you to do more powerful tests (i.e. parametric)
 Lose original metric – complicates interpretation
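As flagged above, a quick sketch of these non-parametric descriptives using statistics.quantiles (the scores are made up and positively skewed):

import statistics

scores = [3, 5, 7, 8, 9, 12, 13, 14, 18, 21, 45]

q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
iqr = q3 - q1                                    # spread of the middle 50%
print(min(scores), max(scores))                  # 3 45
print(q1, q2, q3, iqr)                           # Q2 is the median (12)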

3.5. Graphical techniques

Steps to take
- What is the purpose of the graph?
 To make large amounts of data coherent?
 To present many numbers in a small space?
 To encourage eye to make comparisons?
- Select type of graph to use
- Draw and modify graph to be clear, non-distorting and well-labelled
 Helps to maximise clarity and minimise clutter
 Show the data – avoid distortion
Types of graphs
- Non-parametric (nominal/ordinal data)
 Bar graph
 Pie chart
- Parametric (normally distributed interval or ratio)
 Histogram
 Stem and leaf plot
 Box plot

Bar chart/bar graph


- Compare heights of bars to see which is most common, least common
- Discrete data on x-axis
- Y-axis can show frequencies, percentages or means
- To better show data
 Collapse x-axis if too many categories
 Beware – truncating y-axis exaggerates differences
 Can add data levels (values for each bar)

Pie chart
- Can display same information as bar chart
- Disadvantages
 Harder to read
 Difficult to show small values or small differences
 Rotating chart and position of slices influences perception (rotating = bad)

Histograms
- Continuous-type data (Likert with more than 5 categories, ratio data)
- X-axis needs happy medium number of categories
- Show shape
- For continuous placed in class intervals
- Best-suited for large amounts of data
- Create frequency table with intervals to group continuous values – width about 10 is good
- Can be based on relative frequencies instead of actual frequencies – show proportion of scores in each interval – rather than
number of scores
 Y axis runs from 0 to 1

Stem and leaf plots


- Use it to graph ordinal, interval, and ratio data (if rounded to whole numbers)
- Contains all data and presents in a visual way similar to bar graph
- Rule of thumb: up to 200 data points is acceptable
- Best-suited for small to moderate amounts of data – data not too numerous
- Clarify shape of distribution
- Useful for comparing distributions – two distributions are placed back to back along a common column of stems = “back-to-
back stem and leaf display”
- Can use negative numbers as part of a stem

Box plots
- Use this for interval and ratio data
- Shows min and max, median, quartiles and outliers
- Good for screening data, comparing variables
- Can get messy (information overload)
- Upper whisker limit (“maximum”) here is calculated as Q3 + (1.5 × IQR)
- Lower whisker limit (“minimum”) here is calculated as Q1 – (1.5 × IQR) – see the sketch after this list
- Can use for comparing variables
- Depicting differences between distributions
- Whiskers above and below each box to give additional information about the spread of data
 Vertical lines + horizontal line on the end
 Drawn from upper and lower hinges to upper and lower adjacent values
- Mean score indicated with a plus sign
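A sketch of the whisker/outlier rule referred to above (Tukey's fences): points beyond Q1 − 1.5×IQR or Q3 + 1.5×IQR are plotted individually as outliers; the scores are invented:

import statistics

scores = [12, 14, 15, 15, 16, 17, 18, 19, 22, 41]
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in scores if x < lower or x > upper]
print(lower, upper, outliers)   # 41 falls outside the upper fence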

Line graph
- Alternative to histogram
- Implies continuity (e.g. over time)
- Show multiple lines for different information
- Tops of the bars represented by points joined by lines (the rest of the bar is suppressed)
- Appropriate only when X and Y-axis display ordered (rather than qualitative) variables
- Misleading to use a line graph when the X-axis contains merely qualitative variables

Frequency polygons
- Graphical device for understanding the shapes of distributions
- Especially helpful for comparing sets of data and for displaying cumulative frequency distributions
- X-axis representing the values of the scores in your data
- Mark the middle of each class interval with a tick mark – label it with the middle value represented by the class
- Y-axis to indicate the frequency of each class
- Class interval at the height corresponding to its frequency
- Connect the points – one class interval below the lowest value in your data and one above the highest value – graph will touch
x-axis on both sides
- Easily discern shape of the distribution from this

Scatter plot
- Shows relationship between two variables

3.6. Graphical integrity

Tufte (1983) on graphical integrity


- Lie factor = size of effect in graph/size of effect in data
- Lying sometimes intentional, sometimes not
- Misleading graphs are those that:
 Use area or perspective in misleading ways
 Leave out important context

Lecture 4: Probability – The basics

4.1. What is probability?


- The weight or magnitude of authority held by a witness in a legal case; the likelihood that some event occurred if claimed by
people of different degrees of authority
- Its modern meaning retains this notion of weight or magnitude, and the notion of likelihood that some event occurred, but in
relation to empirical evidence
- A branch of mathematics concerned with estimating the likelihood that some event occurred, expressed as the magnitude of
the degree of certainty/possibility
- The probability of an event is expressed as a number between 0 and 1, where 0 is impossibility and 1 is certainty
- Can be transformed into a percentage by multiplying by 100

Why do we need to know about probability?


1. Anything “random” involves probabilities. Inferential statistics are founded on probability
 Randomness: a numeric sequence that contains no discernible patterns or regularities
 Inferential statistics: the use of numbers to estimate what some population is like
 In practice, inferential statistics often rely on estimating the probability that some outcome is different from randomness
2. Understanding risk
 Assessments of risk are based on conditional probabilities of events
 Conditional probability: the likelihood of some event, given some other information  of the subset of events with
characteristics A, what is the probability of B
 E.g. the risk of suffering a heart attack within the year if someone is 25 years old, versus 75 years old
3. Co-occurrences of uncertain events
 Consider the probability of a child developing ADHD, vs. developing aggression problems, vs. developing both
 Compound and conditional probabilities allow us to determine whether two events are connected
 Compound probability (AND): the likelihood of two independent events co-occurring  the likelihood of A
multiplied by the likelihood of B, when A and B are independent
 Compound probability (OR): the likelihood of one event or another occurring  the likelihood of A plus the likelihood of B, when A and B are mutually exclusive

4.2. Types of probabilities and rules for probabilities

Types of probabilities
1. A priori, or theoretical
- Probability assigned as we do with dice or coins, on a logical basis
- Theoretical probabilities often are assumed or defined in human science research
- An issue of some controversy when one can do this – can be quite dangerous to assume that two events are equally probable simply because we don’t know what their probabilities are
2. Frequentist/empirical
- Dominant definition of probability in statistics  relative frequency of an event out of the number of opportunities for that
event to occur
- E.g. If you want to find out whether a coin is unbiased or not – throw it lots of times and record the results (number of
heads/number of throws)
 The greater the number of throws, the more stable (and valid) the result  key concept in probabilistic sampling: the
bigger the sample, the more precise the estimate of a population parameter
3. Bayesian, or subjective
- Frequentist probability refers to how often a particular event occurs in a long run of trials
- Bayesian = subjective plausibility of the occurrence of some event in the data
 Places the probability of some event in the degree of belief of an observer
- E.g. the probability that X committed a particular crime
 Frequentist – the relative frequency with which people with the characteristics of X commit crimes
 Bayesian – the degree of belief I can have that X committed the crime, given a prior belief about the likelihood of X
committing crimes and the evidence that X committed the crime

Rules for probabilities


- Suppose we have a group of 20 people, and we know the following things about them:
 7 are smokers
 9 are under 30 years of age
 4 are both smokers and under 30 years of age
- Total probability rule
 Probabilities must add up to 1 over all possible outcomes
 E.g. (from above)
 P(Smoker) = 7/20
 P(Non-smoker) = 13/20
 P(S) + P(NS) = 7/20 + 13/20 = 1
- Addition rule
 Ensuring that no double-counting occurs
 E.g. (from above) How many people are either a smoker or under 30?
 If we add smokers and under-30’s, we get 7+9=16  double-counts the 4 who smoke and are under 30
 Need to subtract these 4 individuals: 7+9-4 = 12
 Probabilities work in the same way:
 P(S) = 7/20
 P(U) = 9/20
 P(S and U) = 4/20
 So now we can get: P(S or U) = P(S) + P(U) – P(S and U)
 = 7/20 + 9/20 – 4/20 = 12/20
 We can generalise to derive the addition rule:
 P(A or B) = P(A) + P(B) – P(A and B)
 If P(A and B) = 0, they are said to be exclusive
 In this case, P(A or B) = P(A) + P(B)
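The smoker example as arithmetic in Python – the subtraction removes the double-counted people who are both smokers and under 30:

n = 20
p_s = 7 / n           # P(Smoker)
p_u = 9 / n           # P(Under 30)
p_s_and_u = 4 / n     # P(Smoker and Under 30)

p_s_or_u = p_s + p_u - p_s_and_u   # addition rule
print(p_s_or_u)                    # 12/20 = 0.6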

4.3. Calculating compound probabilities


- Thinking about the smokers-under-30 problem (above)
1. Lay out a table that cross-classifies both sets of options – put numbers in appropriate cells

          | Smokers | Non-smokers | Totals
Under 30s | 4       |             | 9
30+       |         |             |
Totals    | 7       |             | 20

2. Then fill in the rest of the table using logic and arithmetic to find the “missing” numbers:

          | Smokers | Non-smokers | Totals
Under 30s | 4       | 5           | 9
30+       | 3       | 8           | 11
Totals    | 7       | 13          | 20

Now calculating compound probabilities is easy: P(S and U) = 4/20


- Don’t even have to remember the addition rule!
- Just add up the numbers in the cells that are either S or U or both: P(S or U) = (4 + 5 + 3)/20 = 12/20
- P(NS or NU) = (5 + 3 + 8)/20 = 16/20

How to use the table above if problem given to you in probabilistic format
- Suppose on an exam you are given the following information:
 P(married now) = 0.45
 P(married before) = 0.32
 P(married now and married before) = 0.16
- And the question is: What’s P(not married before and married now)?

1. Multiply all the probabilities by 100 and think of a group of 100 people. So:
 45 are married now
 32 were married before
 16 are married now and were married before
2. Set up a table for the two events and their opposites
 Put the numbers in the appropriate cells in this table, including your total of 100 people
3. Then add up the numbers in the cells that match the question and divide by 100: 29/100 = 0.29

                   | Married now | Not married now | Totals
Married before     | 16          | 16              | 32
Not married before | 29          | 39              | 68
Totals             | 45          | 55              | 100

P(not married before and married now) = 29 people out of 100, so the probability is 29/100 = 0.29
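The same worksheet as code: scale the probabilities up to 100 people, fill in the missing cells by subtraction, then read off the answer:

n = 100
married_now, married_before, both = 45, 32, 16

now_not_before = married_now - both                      # 29: married now, not before
before_not_now = married_before - both                   # 16: married before, not now
neither = n - (both + now_not_before + before_not_now)   # 39: neither

print(now_not_before / n)   # P(not married before and married now) = 0.29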

4.4. Conditional probabilities: rules and calculations

Conditional probabilities and risk


- Conditional probability refers to the probability of some event/characteristics, assuming a priori some other
event/characteristic
- Risk factor: a characteristic that makes one more likely to have some (negative) outcome than someone who does not have
that characteristic (e.g. age is a risk factor for severe COVID-19)
- Conditional probabilities enable us to identify what is a risk factor and what is not
Rules for conditional probabilities
- Conditional probabilities have to be treated according to the following rules
- Denote “probability of A conditional on B having occurred” = P(A|B)
 Vertical bar means “conditional on”
- Rules are
 P(A|B) = P(A and B)/P(B) and
 P(B|A) = P(A and B)/P(A)
- E.g. smoker example once more
 Conditional on being under 30, what is the probability of selecting a smoker?
 P(S|U) = P(S and U)/P(U) = 4/9 = 0.44
 And for the probability of being under 30, given smoking:
 P(U|S) = P(S and U)/P(S) = 4/7 = 0.57
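The same calculations in a short Python sketch (illustrative only):

```python
n = 20
p_s_and_u = 4 / n      # P(Smoker and Under 30)
p_u = 9 / n            # P(Under 30)
p_s = 7 / n            # P(Smoker)

print(p_s_and_u / p_u)   # P(S|U) = 4/9 ≈ 0.44
print(p_s_and_u / p_s)   # P(U|S) = 4/7 ≈ 0.57
```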

Calculating conditional probabilities


- Yancovitz et al. (1991) report a trial of heroin users comparing methadone with a wait-list control group. The table below shows how many in each group had positive versus negative urine tests after one month
- Is the probability of a negative test greater for the Methadone than for the wait-list group?
- We want to find out if P(N|M) is larger than P(N|W)

            Positive   Negative   Totals
Methadone      22         53        75
Wait-list      56         38        94
Totals         78         91       169

1. Recall that P(N|M) = P(N and M)/P(M)


 Identify the cell containing N and M and the cell containing M.
 N and M = 53
 M = 75
2. Divide the N and M cell by the M row total
 P(N|M) = 53/75 = 0.707
3. Recall that P(N|W) = P(N and W)/P(W)
 Identify the cell containing N and W and the cell containing W.
 N and W = 38
 W = 94
4. Divide the N and W cell by the W row total
 P(N|W) = 38/94 = 0.404
- Therefore P(N|M) > P(N|W): a negative urine test is more likely in the methadone group, suggesting methadone treatment reduces heroin use
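A sketch of the same comparison in Python, using the cell counts from the table:

```python
# Negative tests within each group (Yancovitz et al., 1991)
neg_methadone, methadone_total = 53, 75
neg_waitlist, waitlist_total = 38, 94

print(neg_methadone / methadone_total)   # P(N|M) ≈ 0.707
print(neg_waitlist / waitlist_total)     # P(N|W) ≈ 0.404
```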
4.5. Conditional probabilities and testing errors
- Diagnostic tests – real world problem
 No test is perfectly accurate
 E.g. a test that is intended to diagnose the early onset of a particular disease
 How could we evaluate the accuracy of this test?
 What if a medical researcher told us “this test is 95% accurate!”
 How would we critically assess the validity of this statement?

1. Identify how the test could go wrong


 Two ways any test could be mistaken
 Test is positive when the person doesn’t have the disease
 Test is negative when the person really does have the disease

              Pos test    Neg test    Totals
Disease       Correct     False neg   Disease
No disease    False pos   Correct     No disease
Totals        Pos test    Neg test    All

2. Now suppose we have conducted a large-scale testing study and are able to fill in this table with cases as follows:

             Pos test   Neg test   Totals
Disease         428         22       450
No disease      592       9458     10050
Totals         1020       9480     10500

- At this point we need to set up a way of specifying what we mean by accuracy


- There are two ways to assess accuracy and they turn out to correspond to two different conditional probabilities
- Starting with positive test results, consider:
 P(P|D) = 428/450 = 0.951
 Probability of a positive test result if the person has the disease  sensitivity (95.1% in this case)
 P(D|P) = 428/1020 = 0.420
 Probability of a person having the disease given a positive test  diagnosticity (42.0% in this case)
 Why is there a difference? There are so few with the disease, and so many without the disease (450 vs 10050)
- Now with negative test results (where C = clear of the disease), consider:
 P(N|C) = 9458/10050 = 0.941
 Probability of a negative test result if the person is clear of the disease  specificity (94.1% in this case)
 P(C|N) = 9458/9480 = 0.998
 Probability of a person being clear of the disease given a negative test  negative predictive value (99.8% in this case)
- What credibility should we give to a statement like – “This test is 95% accurate!”?
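All four accuracy measures computed in a short Python sketch from the table above (the labels follow the standard usage noted in the bullets):

```python
# Cell counts from the large-scale testing study above
disease_pos, disease_total = 428, 450            # positive tests among the diseased
no_disease_neg, no_disease_total = 9458, 10050
pos_total, neg_total = 1020, 9480

sensitivity = disease_pos / disease_total        # P(P|D) ≈ 0.951
diagnosticity = disease_pos / pos_total          # P(D|P) ≈ 0.420
specificity = no_disease_neg / no_disease_total  # P(N|C) ≈ 0.941
neg_pred_value = no_disease_neg / neg_total      # P(C|N) ≈ 0.998
print(sensitivity, diagnosticity, specificity, neg_pred_value)
```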

Lecture 5: Probability distributions

5.1. Brief recap of probability


- A branch of mathematics concerned with estimating the likelihood that some event will occur, expressed as a degree of certainty/possibility
- The probability of an event is expressed as a number between 0 and 1, where 0 is impossibility and 1 is certainty
- Can be transformed into a percentage by multiplying it by 100
- Thus far we have been exploring observed probabilities

From describing data to making inferences


- Observed probabilities such as these allow us to describe data but not to make inferences
- To make inferences, we need theoretical probability distributions – a priori distributions that assume that the phenomenon of
interest is characterised by particular parameters
- Parameters – those characteristics of the population from which we are sampling that define the shape of the theoretical
distribution
- Theoretical distributions allow us to estimate the likelihood of an observed event or set of events

Inferring from theoretical distributions


- E.g. imagine you flip a two-sided coin 8 times and get the following sequence: HHHHHHHT
 The observed probability of getting Heads here is P(H) = 7/8 = 0.875  descriptive
 Let’s say now we want to know how likely it would be to get such a sequence if the coin was unbiased
 We’re now asking: what is the probability of getting P(H) = 0.875, if the underlying probability of getting Heads in the
world is P(H) = ½ = 0.5  inferential
 To answer this question we need a theoretical probability distribution

There are many kinds of probability distribution


- The level of measurement, whether variable is discrete or continuous, the nature of the scale (e.g. bounded/unbounded), and
other factors, determine the appropriate theoretical probability distribution to use
 Binomial distribution
 Normal distribution
 t-distribution
 F-distribution
 Chi-squared distribution
 Poisson distribution
 Negative binomial distribution
 Beta distribution, etc.

Probability distributions
- In general terms – probability distribution describes the a priori likelihood of an outcome – assuming particular characteristics
(parameters) of the population from which the sample is being drawn
- In a probability distribution figure – the X axis describes outcome space (all possible outcomes for discrete variables, and a
mean with standard deviations around it for continuous variables), Y axis describes the probability mass function (for discrete
variables) or the probability density function (for continuous variables)
- Probability mass function – gives, for each discrete outcome, the probability of obtaining that outcome
- Probability density function – gives the height of the curve at each point; for continuous variables, probabilities correspond to areas under the curve rather than to single points

5.2. Binomial distribution

Binomial distribution is a discrete probability distribution


- Describes the number of successes in a sequence of independent events/trials, each with a possible yes/no outcome
- Success (yes) is defined by P; e.g. P(H)
- Failure (no) is defined by q = 1 – P; e.g. P(T) = 1 – P(H)
- Binomial distribution is defined by the parameters P and n  P is the underlying a priori probability of the event, and n is the
number of trials

Logic of the binomial distribution


- The shape of the binomial distribution logically arises from the multiplication and addition rules
- P(X and Y) for independent events is equal to P(X)*P(Y); P(X or Y) for mutually exclusive events is equal to P(X) + P(Y)
- So imagine we’re flipping a coin again, and it’s unbiased, such that P(H) = 0.5
- What is the probability of (H, H) from 2 trials?
 P(H, H) = P(H)*P(H) = 0.5 x 0.5 = 0.25
- What is the probability of (H, H, H) from 3 trials?
 P(H, H, H) = P(H)*P(H)*P(H) = 0.5 x 0.5 x 0.5 = 0.125
- From this logic we can build up a binomial distribution, given particular P (underlying probability of yes vs. no) and n
(number of trials)
- So, we can generalise to a rule that states that the shape of a binomial distribution will differ depending on P and n
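This logic can be made concrete by enumerating all sequences, as in the minimal Python sketch below (multiplication rule within a sequence, addition rule across mutually exclusive sequences):

```python
from itertools import product

p, n = 0.5, 3                                # P(H) and number of flips
pmf = {k: 0.0 for k in range(n + 1)}
for seq in product("HT", repeat=n):          # all 2**n possible sequences
    prob = 1.0
    for flip in seq:                         # multiplication rule (independent flips)
        prob *= p if flip == "H" else 1 - p
    pmf[seq.count("H")] += prob              # addition rule (exclusive sequences)
print(pmf)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```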

Using the binomial distribution to make inferences


- We can use the binomial distribution as a model of reality for any procedure that has two outcomes
- We can ask whether particular probabilities appear plausible in view of data we’ve obtained
- Imagine you have a crush on someone who typically responds to 1 out of every 5 texts they receive from friends  P(R) =
1/5
- Imagine you’d like to suss out if you’re more than a friend, given they’ve responded to 8 out of your last 15 text messages
- How confident can you be that you’re different from other friends?
 A reasonable way of addressing this question is to ask how likely it would be for you to receive 8 responses out of 15
text messages, if the probability of your crush responding was still 1/5.
 The binomial distribution table shows this outcome is very unlikely under that model: P ≈ 0.0035
 If we compare that to how likely it would be to get 8 out of 15 responses if the true “response rate” was 8/15, we can see
how much less plausible the P = 1/5 model is
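A sketch of this check using scipy (assuming scipy is available; the numbers match the table values in the notes):

```python
from scipy import stats

# Probability of exactly 8 responses out of 15 under the "just a friend" model
print(stats.binom.pmf(8, 15, 1/5))     # ≈ 0.0035

# Compare with a model whose response rate matches the observed 8/15
print(stats.binom.pmf(8, 15, 8/15))    # ≈ 0.21, far more plausible
```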

5.3. The normal distribution

Normal distribution – continuous probability distribution used for continuous random variables
- Particularly widely used in psychology
- The central limit theorem states that the sum (or average) of many independent observations of a random variable is approximately normally distributed, regardless of the variable’s own distribution
- This means that measurement error and physical processes that are the sum of many independent processes are normally
distributed  e.g. height
- Unlike binomial distribution, normal distribution has the same symmetric bell shape no matter the value of its parameters (in
this case, mean and standard dev)
- This means that:
 All we need to know of a variable that is normally distributed is its mean and standard deviation; and
 We can convert any raw score from a normally distributed variable into a standardised score that allows comparisons
between observations and the theoretical distribution
- Z-score = (X – mean)/SD
- A normal distribution in Z-score units is a standard normal distribution  mean = 0, SD = 1

Using the standard normal distribution


- Can use the standard normal distribution to ask similar kinds of probability questions as we did with the Binomial distribution
(but with continuous variables)
- E.g. imagine you want to catch a bus on Barry Drive but you arrive 1 minute and 44 seconds later than the scheduled time, and you want to know if the bus might still come, or if it has already been  looking at data from the past 10 years, you find that on average the bus arrives 34 seconds late, with a standard deviation of 76 seconds
 Using standard normal distribution, 95% of points are within 2 standard deviations of the mean. This means that 95% of
the times the bus has ever come are between 118 seconds early, and 186 seconds late.
 So what you want to know is where does 1 minute 44 seconds fall in the standard normal distribution.
 1 min 44 seconds = 104 seconds
 Z-score = (104-34)/76 = 0.92 (2 dp.)
 Look this up in the Z table  0.8212 of the curve is below 1 min 44 sec, and 0.1788 of the curve is above 1 min 44 sec
 Remember these areas under the curve correspond to probabilities (not densities at a single point)
 So there is a probability of 0.8212 that the bus has already passed, and a probability of 0.1788 that it is still yet to
come.
- We can also use the standard normal distribution to estimate how late the bus is, given the probability that it has already
passed. So, imagine your friend now tells you that there is a 33% chance that the bus might still come: how many
minutes/seconds late is the bus?
 Look up the Z table (looking for the Z value with an area of 0.33 beyond it)
 We find this is a Z-score of 0.44
 0.44 = (X-34)/76
 X = (0.44*76) + 34 = 67.44  the bus is approximately 1 minute and 7 seconds late
- Can also use the standard normal distribution to decide whether an observation is from the same population as that from
which the mean and SD are derived or not  a one-sample Z-test
- Imagine now that a passer-by tells you that they saw a bus drive past 1 minute 56 seconds EARLY. Your question is now: is
this the ACTION bus I’m waiting for, or was that maybe a Deane’s bus?
 Calculate Z-score for the early bus  Z = (-116-34)/76 = -1.97
 Look up a Z-table for -1.97
 The area below Z is 0.024, a very small probability
 This can be interpreted as a small probability that the bus the passer-by saw was from the same population as the bus
you’re waiting for  it was unlikely to be an ACTION bus, so maybe it was a Deane’s bus
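All three bus questions can be answered with scipy’s normal distribution functions; a minimal sketch (assuming scipy):

```python
from scipy import stats

mean, sd = 34, 76                            # seconds late, from 10 years of data

# Q1: you arrive 104 s after schedule - has the bus already been?
print(stats.norm.cdf(104, mean, sd))         # ≈ 0.821, probably already passed
print(stats.norm.sf(104, mean, sd))          # ≈ 0.179, might still come

# Q2: if there is a 33% chance it is still to come, how late is it running?
z = stats.norm.isf(0.33)                     # ≈ 0.44
print(mean + z * sd)                         # ≈ 67 s, about 1 min 7 s late

# Q3: one-sample Z-test for the bus seen 116 s early
z_early = (-116 - mean) / sd                 # ≈ -1.97
print(stats.norm.cdf(z_early))               # ≈ 0.024, unlikely to be an ACTION bus
```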

5.4. The t distribution

The t distribution is a continuous probability distribution, used to represent the sampling distribution of an estimate of a population parameter: the mean
- If you take lots of observations from a population, you can derive an estimate of the population mean and the population
standard deviation
- The larger the sample size, the more these estimates will approximate the “true” population values
- Importantly, however, each sample drawn from a population will have a slightly different estimate of these parameters (e.g. the mean  sampling distribution of the mean)
- t = mean difference/standard error (SE)
- It turns out there is a relationship between the SD from a sample and the standard error of the mean  SE = SD/√n
- Given that t = mean difference/SE, this means that the shape of a t distribution differs as a function of sample size (specifically, degrees of freedom, df), unlike the Z distribution
- For t, then, we need to know three quantities: the mean difference, the SE, and the df
- The t distribution can be used to determine whether means from two samples are drawn from the same population
 Going back to variation of bus example, imagine a sample of 6 buses is 51.83 seconds late on average, while a second
sample of 6 buses is 16.17 seconds late on average, and the SE is 14.29
 What is the probability that they are from the same population?
 df = n1 + n2 – 2
 mean difference = 51.83 – 16.17 = 35.66
 t(10) = mean difference/SE = 35.66/14.29 = 2.495
 Look up t-distribution table  p = 0.016
 There is only a 1.6% probability that they are from the same population
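The same two-sample comparison in scipy (note the tabled value p = 0.016 corresponds to a one-tailed probability):

```python
from scipy import stats

mean_diff = 51.83 - 16.17        # 35.66 seconds
se = 14.29
df = 6 + 6 - 2                   # 10

t_stat = mean_diff / se          # ≈ 2.495
print(stats.t.sf(t_stat, df))    # ≈ 0.016, one-tailed
```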

5.5. The F distribution

F-distribution = a continuous probability distribution, used to represent the ratio of two variances
- Always positive because variances are always positive
- Calculated F values are almost always greater than or equal to 1, because they are usually obtained by dividing the larger variance by the smaller variance (see ANOVA)
- F-distribution is positively skewed
- Shape of the F-distribution is determined by 2 parameters, df associated with variance 1 and df associated with variance 2
- Can be used to determine if two variances are equal or not
 F(df1, df2) = Var1/Var2
 If F = 1, the variances are equal
- E.g. imagine the variance in height for one group of 11 people is 49 cm² and the variance in height for another group of 16 people is 14 cm²  is the variance of group 1 greater than that of group 2? What is the probability that they are from the same population?
 F(10, 15) = 49/14 = 3.5
 Look up F table  p = 0.014
 There is only a 1.4% probability that the sample variances are from the same population
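The corresponding scipy calculation (a sketch, assuming scipy):

```python
from scipy import stats

f_stat = 49 / 14                      # ratio of the two sample variances = 3.5
df1, df2 = 11 - 1, 16 - 1             # df for each variance
print(stats.f.sf(f_stat, df1, df2))   # ≈ 0.014, as in the notes
```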

Lecture 6: Confidence intervals and significance testing

6.1. Null hypothesis significance testing

- Null hypothesis (H0) – there is no effect


- Alternative hypothesis (Ha) – there is an effect
- We calculate an effect size and p-value from our sample data to make inferences about the likelihood of an effect in the
population
- The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis

How small does the p-value need to be?


- Referred to as the critical alpha
- Usually use the threshold of p < 0.05 in psychology
- Because this is a probability-based decision, we tolerate some false positives

When we get p-values, what can we do with them?


- Relationships: correlations, regressions
- Differences between groups: t-tests, ANOVA

6.2. Criticisms of null hypothesis significance testing

- The null hypothesis is rarely true


- Significance testing provides
 A binary decision (yes/no)
 But mostly we are interested in the size of the effect – how much of an effect is there?
- Statistical vs. practical significance
 Practical significance – whether the difference is large enough to be of value in a real world sense:
 Is an effect worth being concerned about?
 Is the effect noticeable or worthwhile?
 E.g. a 5% increase in well-being probably starts to have practical value
- Significance is a function of the effect size, sample size, and critical alpha

APA publication manual recommendations about effect sizes, confidence intervals and power
- 2001 (APA 5th edition) recommended reporting of effect sizes, power, etc.
- 2009 (APA 6th edition) further strengthened the requirements: use NHST as a starting point and also include effect sizes, confidence intervals and power
- American Statistical Association (2016)
 “Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p < 0.05”) for
justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not
immediately become “true” on one side of the divide and “false” on the other.”
 Statement on significance testing and p-values
 P-values can indicate how incompatible the data are with a specified statistical model
 P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were
produced by random chance alone
 Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a
specific threshold
 Proper inference requires full reporting and transparency
 A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
 By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis
Recommendations to follow when using NHST
- Use traditional null hypothesis significance testing
- Also use complementary techniques (effect sizes and confidence intervals)
- Emphasise practical significance
- Recognise merits and shortcomings of each approach
- New statistics – aim to increase the transparency, completeness, and accuracy of research findings

6.3. Confidence intervals

Confidence interval – a range of values that contains a specified percentage of the sampling distribution of that statistic
Confidence level – the specific percentage above
- Intervals used to make inferences about the plausible values of a parameter on the basis of a sample estimate

When to use confidence intervals


- Around effect sizes
- Around estimates (e.g. means – this is what we focus on today)
- The confidence interval for the mean consists of a range of values starting somewhere below the sample mean and ending the
same distance above it. The range of values includes “plausible” values for the population mean

Calculating confidence intervals


- We can think of the sample mean as consisting of the population mean plus sampling error
X̄ = μ + e
- Note that the error (e) can be positive or negative
- The law of large numbers
 Sample means tend to be closer to the population mean’s value as we take larger samples
 For sufficiently large samples, the sampling distribution of a mean may be closely approximated by a normal
distribution with a mean of μ and a standard deviation equal to the standard error of the mean
- Small samples
 The ratio of (sample mean − population mean) to the standard error has a t distribution as a sampling distribution
 The ratio is called a t statistic
- To calculate confidence intervals, we need to understand the t distribution
- The t distribution is quite similar to a normal distribution, but the t distribution has heavier tails
- This means that confidence intervals based on the t distribution will be wider than they would be if they were based on the
normal distribution
- As sample size increases
 This difference between the t and normal distributions decreases until it is of no practical importance
 E.g. by around N = 100 they are practically indistinguishable

6.4. Calculating confidence intervals


Step 1: Collect a sample of N observations
Step 2: Calculate the sample mean, standard deviation and standard error
- Standard error of the mean is as given above (see Lecture 5 – the t distribution): SE = SD/√n
- Standard error is one component of the half-width w, so the larger the standard error the bigger w is
Step 3: Calculate critical alpha α
- Set a desired confidence level (e.g. 95%)
- Convert the confidence level into a proportion and subtract from 1 to get the proportion of the sampling distribution to be
excluded from the confidence interval (e.g. 1- 0.95 = an α of 0.05)
Step 4: Look up the two-tailed t value that corresponds to N-1 degrees of freedom at your set α
Step 5: Calculate half width of the confidence interval
- Half-width is calculated by multiplying the t-value and the standard error together
w = t(α/2) × SE, where SE = SD/√n
Step 6: Write confidence interval statement
Pr(X̄ − w < μ < X̄ + w) = 1 − α
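Steps 1–6 in a minimal Python sketch (the data here are made up for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([28, 41, 35, 52, 30, 44, 38, 47])      # Step 1: hypothetical sample
n = len(x)
mean = x.mean()                                     # Step 2: mean, SD and SE
sd = x.std(ddof=1)
se = sd / np.sqrt(n)
alpha = 1 - 0.95                                    # Step 3: 95% confidence
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)       # Step 4: two-tailed t value
w = t_crit * se                                     # Step 5: half-width
print(f"95% CI: [{mean - w:.2f}, {mean + w:.2f}]")  # Step 6: CI statement
```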
Properties of confidence intervals
- The locations and widths of confidence intervals vary due to sampling error and sample size
- If we repeated a sampling procedure a very large number of times, the percentage of intervals containing the true mean would home in on the confidence level (e.g. 95%)
- Technically speaking, the confidence level refers to the expected percentage of times that a confidence interval would contain
the population value of the statistic being estimated, under repeated random sampling
- Greater confidence (e.g. 95% vs. 90%) entails wider intervals, and therefore a vaguer statement. There is a trade-off between
confidence and informativeness
- Larger sample sizes entail more precise (narrower) intervals
- When N is large
 As sample size increases, the t-distribution more closely approximates the normal distribution
 For large sample sizes, some textbooks will suggest using the normal (Z) distribution for calculating confidence
intervals around the mean

6.5. Confidence intervals in decision making

- Plausible hypothetical value of a population statistic – one that lies within the confidence interval (inside it)
- Implausible hypothetical value of a population statistic – one that lies outside the confidence interval
- Model error – difference between a hypothetical value of a population statistic and the true value of that statistic
 Sample statistical value = hypothetical population statistic value + model error + sampling error
- The best possible model is one that has no model error
- A plausible model is one for which 0 is a plausible value for model error. That is, a plausible model predicts plausible values
Confidence intervals used as decision-making tests


- Decide on the confidence level that you wish to use for distinguishing plausible from implausible values of the population
statistic concerned
- For any hypothesis, ascertain its hypothetical (predicted) values of the population statistic
- On the basis of an appropriate random sample, construct a confidence interval for the population statistic
- If all the predicted values associated with a hypothesis lie outside the confidence interval (i.e. all of the predicted values are
implausible), then the hypothesis may be rejected. Otherwise we cannot reject it
- “Failing to reject” a hypothesis does not mean that we believe the hypothesis – it means it is plausible
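The decision test itself is a simple range check; a sketch with illustrative numbers:

```python
# Illustrative 95% CI for a mean (e.g. from a sketch like the one in 6.4)
lower, upper = 32.45, 46.30
predicted = 50                      # a hypothesis' predicted population mean

if lower < predicted < upper:
    print("Plausible value: cannot reject the hypothesis")
else:
    print("Implausible value: the hypothesis may be rejected")
```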
