Mean (average) = The sum of all of the values divided by the number of
values. Least robust.
Median (“middle” value) = The value that is in the middle when all of the
values are arranged in ascending order. If there is an even number of
values there is no single middle value, so you take the average of the two
middle values. Robustness is in between mean and mode.
Mode (most common value) = The value that appears the highest number
of times. Most robust.
If you are given the list of values 1, 3, 3, 5, 7, 10, what are the mean,
median, and mode?
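If you want to check your answers, here is a quick sketch using Python's built-in statistics module, with the list of values from the question:

```python
import statistics

values = [1, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # (1 + 3 + 3 + 5 + 7 + 10) / 6 = 29/6, about 4.83
median = statistics.median(values)  # even count, so average of the two middle values (3 and 5) = 4.0
mode = statistics.mode(values)      # 3 appears twice, more than any other value

print(mean, median, mode)
```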
Robustness
The above question illustrates how robust, or resistant to change by an
extreme value, the three measures of central tendency are. You can see
that by adding one extreme value (an outlier) the mean changes a lot
while the mode doesn't change at all. This is because the mean is the least
robust of the three measures and the mode is the most robust. The median
is less robust than the mode, but more robust than the mean.
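The outlier effect can be sketched in Python, assuming we tack a hypothetical extreme value of 100 onto the original list:

```python
import statistics

values = [1, 3, 3, 5, 7, 10]
with_outlier = values + [100]  # add one extreme value (an outlier)

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(mean_shift)    # mean jumps by about 13.6 -- least robust
print(median_shift)  # median moves by only 1.0 -- in between
print(statistics.mode(values), statistics.mode(with_outlier))  # mode stays at 3 -- most robust
```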
On Step 1 you may also be asked to compare mean, median, and mode
in certain situations based on a histogram or set of values. For example,
the answer looks like “mean is greater than mode” rather than a precise
numerical answer. In most of these cases the data is skewed significantly
in one direction and is not normally distributed.
Standard Deviation
Standard deviation (Greek symbol σ) measures how much the values in a
data set differ from the mean. In other words, standard deviation
measures dispersion or variability in a set of values. A data set with
mostly similar values has a small standard deviation, while a data set with
very different values has a large standard deviation.
Standard deviation changes with changes in sample size (number of
values or participants). With small sample sizes random chance has a
bigger impact and therefore standard deviation for a small sample size is
generally larger. Studies with more values generally have smaller
standard deviations as chance plays less of a role.
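As a rough illustration with made-up numbers, here is how dispersion drives the standard deviation, again using Python's statistics module:

```python
import statistics

similar = [24, 25, 25, 26]        # tightly clustered values
spread_out = [5, 15, 25, 35, 45]  # widely dispersed values

sd_small = statistics.pstdev(similar)     # about 0.71 -- small dispersion
sd_large = statistics.pstdev(spread_out)  # about 14.14 -- large dispersion

print(sd_small, sd_large)
```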
Now that you are done with this video you should check out the next video in
the Biostats & Epidemiology for the USMLE Step 1 section which covers 2 by 2
Tables, False Positive, False Negative, True Positive & True Negative
2×2 TABLE, FALSE POSITIVE,
FALSE NEGATIVE, TRUE POSITIVE
& TRUE NEGATIVE
Two-by-Two Tables
Questions involving TP, FP, TN, and FN will usually have a two-by-two
table. Sometimes they will give you the actual table and other times they
will give you all of the data for the table in sentence form and you have to
make the table for yourself. You may have learned to refer to the boxes in
a two-by-two table as A, B, C & D. I am going to strongly recommend you
not do this. First off, those letter labels have no meaning. It is therefore
more likely for you to get confused and make a dumb mistake.
Additionally, the top left box in a two-by-two table may not always
represent true positive. Sometimes, test writers will mix up the order of
the columns and/or rows. I suggest using TP, TN, FP, and FN instead.
Sometimes they will give you an extra row and column that has totals.
Don’t let that throw you off. It is still a two-by-two table even though there
are 3 columns and 3 rows. They just try to save you a step of calculation
by giving you the row and column totals.
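Here is a sketch, with hypothetical counts, of what a two-by-two table with the extra totals row and column looks like, labeled with TP/FP/FN/TN rather than A/B/C/D:

```python
# Hypothetical counts for illustration only
TP, FP, FN, TN = 80, 30, 20, 70

# Rows = test result, columns = true disease status, plus totals
print(f"{'':>8}{'Disease+':>10}{'Disease-':>10}{'Total':>8}")
print(f"{'Test+':>8}{TP:>10}{FP:>10}{TP + FP:>8}")
print(f"{'Test-':>8}{FN:>10}{TN:>10}{FN + TN:>8}")
print(f"{'Total':>8}{TP + FN:>10}{FP + TN:>10}{TP + FP + FN + TN:>8}")
```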
Now that you have finished this video on 2 by 2 tables you should check out the
next video in the Biostatistics & Epidemiology section which covers Sensitivity,
Specificity & Confirmatory Tests
SENSITIVITY, SPECIFICITY &
SCREENING TESTS
Before you watch this video you should really watch the previous video which
covers Two by Two Tables, TP, TN, FP & FN. That video lays the foundation for
this video so it may be tough to watch this one by itself.
Sensitivity & Specificity
Sensitivity (Sen) & Specificity (Spec) are used to evaluate the validity of
laboratory tests (not the results of the tests). Basically, you use sensitivity and
specificity to determine whether or not to use a certain test or to
determine what situations a certain test would work best in. It is important
to note that Sen and Spec are fixed for a certain test as long as you don’t
change the cutoff point. Therefore, Sensitivity & Specificity are not affected
by changing prevalence. Both are given as a percentage ranging from 0%
to 100%.
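With hypothetical two-by-two counts, sensitivity and specificity can be sketched as:

```python
# Hypothetical counts from a two-by-two table
TP, FP, FN, TN = 80, 30, 20, 70

sensitivity = TP / (TP + FN)  # of everyone WITH the disease, the fraction the test catches
specificity = TN / (TN + FP)  # of everyone WITHOUT the disease, the fraction correctly negative

print(f"Sensitivity = {sensitivity:.0%}")  # 80%
print(f"Specificity = {specificity:.0%}")  # 70%
```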
In the real world you never have a test that is 100% Sen and 100% Spec.
We are usually faced with a decision to use a test with high Sen (and
lower spec) or high Spec (and lower Sen). Usually a test with high
sensitivity is used as the Initial Screening Test. Those that receive a
positive result on the first test will be given a second test with high
specificity that is used as the Confirmatory Test. In these situations you
need both tests to be positive to get a definitive diagnosis. Getting a
single positive reading is not enough for a diagnosis as the individual
tests have either a high chance of FP or a high chance of FN. For
example, HIV is diagnosed using 2 tests. First an ELISA screening test is
used and then a confirmatory Western Blot is used if the first test is
positive.
There are also specific situations where having a high specificity or
sensitivity is really important. Consider that you are trying to screen
donations to a blood bank for blood borne pathogens. In this situation you
want a super high sensitivity, because the drawbacks of a false negative
(spreading disease to a recipient) are way higher than the drawbacks of a
false positive (throwing away 1 blood donation). Now consider you are
testing a patient for the presence of a disease. This particular disease is
treatable, but the treatment has very serious side effects. In this case you
want a test that has high specificity, because there are major drawbacks
to a false positive.
Now that you have finished this video you should check out the next video in
the Biostats & Epidemiology section which covers the calculation of Predictive
Value Positive & Negative (PPV & NPV). That video has some mnemonics and
concepts that also apply to this video.
POSITIVE & NEGATIVE PREDICTIVE
VALUE (PPV & NPV)
Before you watch this video you should really watch the previous videos which
cover Two by Two Tables, TP, TN, FP & FN as well as Sensitivity & Specificity.
Those videos lay the foundation for this video so it may be tough to watch this
one by itself.
Now that you are done with this video you should check out the next video in
the Biostatistics & Epidemiology section which covers the definitions &
calculations of Prevalence & Incidence
INCIDENCE, PREVALENCE & CASE-
FATALITY RATE
Incidence is the number of new cases over a period of time divided by the
population at risk, while prevalence is the total number of existing cases at
a point in time divided by the total population. To illustrate the differences,
here is how you would calculate the incidence and prevalence of chicken
pox in my home town of Boca Raton.
In most cases, incidence and prevalence are directly proportional. When
one goes up the other one goes up and vice versa. This intuitively makes
sense. If you have more new cases of diabetes within a given year then
you are likely to have a higher total number of people with diabetes at any
particular point during that year. However, Prevalence and Incidence are
not always directly proportional and test makers like to focus on these
situations. Most of these situations include a change in the duration of the
disease. Duration of a disease is the time from when a patient is
diagnosed until they are cured or die. When duration is held constant,
prevalence and incidence are directly proportional. The relationship
between prevalence, incidence and duration of disease can be illustrated
with a simple formula.
Prevalence = Incidence * Duration
This relationship makes sense if you think about extreme
examples. Consider a situation where there are 100 new cases of a
disease per year but the disease only lasts one day. Annual incidence will
be higher than prevalence as at any particular moment there is likely only
going to be at most 1 person with the disease. Now consider a disease
that has 100 new cases a year and the disease lasts for 40 years. The
point prevalence is going to be higher than annual incidence, as at any
given point you have the 100 or so newly diagnosed patients from this
year plus people that have been diagnosed over the last 40 years that do
not contribute to incidence.
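The formula and the two extreme examples can be sketched with made-up numbers:

```python
def point_prevalence(incidence_per_year, duration_years):
    # At steady state, prevalence = incidence * duration
    return incidence_per_year * duration_years

# 100 new cases per year, disease lasts one day:
print(point_prevalence(100, 1 / 365))  # about 0.27 -- at most ~1 case at any moment

# 100 new cases per year, disease lasts 40 years:
print(point_prevalence(100, 40))       # about 4000 prevalent cases at any moment
```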
Confounding
Confounding is when the study results are distorted by some factor other
than the variable(s) being studied. It appears that there is a relationship
between the exposure and health outcome based on the results, but there
is not really a relationship. Some factor other than what is being studied is
distorting the results. A confounder is a characteristic that is common to
the exposure and the health outcome. Rather than A causing B, C is
associated with A and B. In this example C is the confounder. If you
removed C completely, A and B would not be associated. The problem
with confounders is that an unwise researcher may come to the
conclusion that there is a causal relationship between the exposure and
outcome if he or she does not recognize the confounder.
In research, you would ideally like to be able to show that your variable of
interest caused the observed difference in outcomes. For example, you
want to be able to show that your treatment leads to fewer cases of
disease in the study population. If the treatment and placebo groups
aren’t similar to begin with you can’t come to this conclusion. If the groups
are different at the start of the study you can’t be sure if the observed
differences at the end are due to your treatment or some sort of
predisposing factor that was present to differing degrees in the study
groups.
For example, you can’t learn much if the group receiving your treatment
has an average age of 25 and the group receiving the placebo has an
average age of 75. In this case, your results are being confounded by the
difference in age.
Obviously, when you are creating a research study you want the different
groups to be similar in age, gender, ethnic diversity, socio-economic
factors and lifestyle factors. However, having groups that are similar in
only these types of known prognostic variables is not enough. You also
need the different groups to be similar in characteristics you aren’t even
sure affect the disease process. There could be some type of risk factor
that has not yet been identified as being pivotal to disease development.
You want your groups to be similar with regard to this unknown factor too.
How can you make two groups similar based on an infinite list of
potentially important factors that haven’t even been identified yet? The
answer is randomization.
Randomization
Randomization is just the process of selecting from a group in a fashion
that makes all possibilities equally likely to be selected. To illustrate this
point imagine you have a deck of playing cards. If you take a deck of
cards straight out of the box and pick the top card you are not getting a
random selection. It could be a new deck of cards in which the highest
card is likely on top or you could have last played a game like solitaire
that puts the cards in a particular order. However, if you shuffle the deck
thoroughly before selecting the top card the chances of getting all the
cards are equal. In research studies, randomization is like shuffling the
patients before assigning them to different groups so each patient has an
equal chance of being in the different groups.
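The shuffling analogy can be sketched in Python (the patient labels are made up):

```python
import random

patients = [f"patient_{i}" for i in range(10)]

random.shuffle(patients)   # the "shuffle the deck" step
treatment = patients[:5]   # first half of the shuffled list
placebo = patients[5:]     # second half

print(treatment)
print(placebo)
```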
Stratification
Sometimes randomization is not enough on its own. More often than not
you will get an equal distribution between groups for characteristics such
as gender, but there is still a chance that you will get more males than
females in one group. This is especially true if the sample size is small. If
you know that gender is an extremely important prognostic factor for your
disease (like if you were studying the frequency of an X-linked genetic
disease) you don’t want to take the chance that this could happen. The
way to avoid this is called Stratification. In Stratification you first divide
your population by a particular characteristic and then you randomize.
You can think about stratification as randomization that is balanced with
regard to one particularly important factor.
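A minimal sketch of the two steps, assuming sex is the stratification variable and the participant list is hypothetical:

```python
import random

# Hypothetical participants tagged by sex, the stratification variable
participants = [("p1", "M"), ("p2", "F"), ("p3", "M"), ("p4", "F"),
                ("p5", "M"), ("p6", "F"), ("p7", "M"), ("p8", "F")]

groups = {"treatment": [], "placebo": []}
for sex in ("M", "F"):                    # step 1: divide by the characteristic
    stratum = [p for p, s in participants if s == sex]
    random.shuffle(stratum)               # step 2: randomize within each stratum
    half = len(stratum) // 2
    groups["treatment"] += stratum[:half]
    groups["placebo"] += stratum[half:]

print(groups)  # each group is guaranteed 2 males and 2 females
```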
Crossover Study
Crossover Studies are experimental studies that have the participants
“switch groups” part way through the study. For example, patients that
started with the placebo switch to getting the treatment halfway through
the study while those that started with the treatment get the placebo after
the halfway point. In this study design there is no separate control group
as participants act as their own controls.
COHORT, CASE-CONTROL, META-
ANALYSIS & CROSS-SECTIONAL
STUDY DESIGNS
Before you watch this video you should check out the 2 previous videos in
the Biostatistics & Epidemiology section which cover Validity & Bias as well
as Confounding & Types of Bias. Those videos have principles that will be
applied to this video on Types of Study Design.
Hierarchy of Evidence
Based on the types of bias that are inherent in some study designs we
can rank different study designs based on their validity. The types of
research studies at the top of the list have the highest validity while those
at the bottom have lower validity. In most cases if 2 studies on the same
topic come to different conclusions, you assume the trial of the more valid
type is correct. However, this is not always the case. Any study design
can have bias. A very well designed and executed cohort study can yield
more valid results than a clinical trial with clear deficiencies.
Meta-Analysis
Meta-analysis is the process of taking results from multiple different studies
and combining them to reach a single conclusion. Doing this is sort of like
having one huge study with a very large sample size and therefore meta-
analysis has higher power than individual studies.
Case-Control Study
Case-Control Studies are retrospective and observational. You first identify
people who have the health outcome of interest. Then you carefully select
a group of controls that are very similar to your diseased population
except they don’t have that particular disease. Then you try to determine
whether or not the participants from each group had a particular exposure
in the past. I remember this by thinking that in a case control study
you start off knowing whether a person is diseased (a case) or not diseased
(a control). There isn’t a huge difference between retrospective cohort
and case-control. You are basically doing the same steps but in a slightly
different order. However, the two study designs are used in different
settings. As we will see later, the groups in a case-control study are
compared using the Odds Ratio (OR), since a case-control design cannot
measure incidence directly.
Case-Series
A Case-Series is a small collection of individual cases. It is an
observational study with a very small sample size and no control group.
Basically you are just reviewing the medical records for a few people with
a particular exposure or disease. A study like this is good for very rare
exposures or diseases. Obviously the small sample size and lack of a
control group limits the validity of any conclusions that are made, but in
certain situations this is the best evidence that is available.
Cross-sectional Study
Cross Sectional Studies are different from the others we have discussed.
While the other studies measure the incidence of a particular health
outcome over time, a cross-sectional study measures Prevalence. In this
observational study the prevalence of the exposure and the health
outcome are measured at the same time. You are basically trying to
figure out how many people in the population have the disease and how
many people have the exposure at one point in time. It is hard to
determine an association between the exposure and disease just from
this information, but you can still learn things from these studies. If the
exposure and disease are both common in a particular population it may
be worth investing more resources to do a different type of study to
determine whether or not there is a causal relationship.
DEFINITION AND CALCULATION OF
ODDS RATIO & RELATIVE RISK
So the probability of rolling a 4 on one attempt with a six-sided die is
1/6. The odds of rolling a 4 are 1/5. Here is another example. If 13 people
of a 60 person sample have lung cancer, the probability of a person in that
group having lung cancer is 13/60 and the odds of a person in that group
having lung cancer is 13/47.
When we are talking about common events the difference between odds
and probability is large. For example, flipping a coin gives you pretty
different numbers: the odds of getting heads are 1/1 while the probability
of getting heads is 1/2. However, as an event gets more and more rare
the difference between odds and probability gets very small. Pretend
there is a drawing with one winner and 10,000 people entered. The odds
of winning are 1/9,999 (~0.0001) and the probability of winning is 1/10,000
(0.0001). In this case, odds and probability are essentially identical.
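The odds-versus-probability comparison can be sketched with a one-line conversion, odds = p / (1 − p):

```python
def odds_from_probability(p):
    # Convert a probability into odds
    return p / (1 - p)

# Common event: big gap between odds and probability
print(odds_from_probability(1 / 2))      # coin flip: probability 0.5, odds 1.0 (i.e., 1/1)

# Rare event: odds and probability converge
print(odds_from_probability(1 / 10000))  # drawing: probability 0.0001, odds 1/9,999
```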
Ratio
I remember this by thinking about a group of pirates (group = cohort) all
saying “aRRrrr!”. That reminds you that cohort studies use RR and the
“other one” uses OR.
Now that we understand the research setting for each term we can
redefine RR & OR. I should note that I think memorizing these definitions
is unnecessary because if you understand the simpler definitions you
should be able to create these based on the scenario presented in the
question.
An RR or OR of 1 means there is no difference between the two groups
being compared with respect to what you are measuring. In this case the
treatment or risk factor being studied has no effect on the rate of outcome
development. Similarly, an OR or RR of 2 means whatever you are
measuring is two times as likely to occur in the group being studied when
compared with the control group. A value of 0.5 means it is half as likely,
and so on.
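As a sketch with hypothetical cohort-style counts, here is how RR and OR would each be calculated from the same two-by-two data:

```python
# Hypothetical counts:
#             diseased   healthy
# exposed         30        70
# unexposed       10        90
a, b, c, d = 30, 70, 10, 90

risk_exposed = a / (a + b)            # 30/100 = 0.30
risk_unexposed = c / (c + d)          # 10/100 = 0.10
RR = risk_exposed / risk_unexposed    # relative risk = 3.0

odds_exposed = a / b                  # 30/70
odds_unexposed = c / d                # 10/90
OR = odds_exposed / odds_unexposed    # odds ratio = (30*90)/(70*10), about 3.86

print(RR, OR)
```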
Later in the chapter we will cover how confidence intervals are applied to
RR & OR.
ERRATA: At about the 3:00 mark the slide says “10,00” when it is really
supposed to say “10,000.” I added a pop up box to fix it. Thanks to Mehdi
Hedjazi for pointing this typo out in a youtube comment!
NUMBER NEEDED TO TREAT &
ABSOLUTE RISK REDUCTION
Before you watch this video you should check out the previous video in
the Biostats & Epidemiology section which covers the related topics
of Probability, Odds, RR & OR. That video lays the foundation for this video so it
may be difficult to start with this one.
ARR = probability of disease among those without treatment – probability
of disease among the treated. AR = probability of disease among the
exposed – probability of disease among the non-exposed.
I’m never able to keep the two formulas straight and the difference isn’t
that important so I don’t even try. I just remember it like this:
ARR or AR = higher probability – lower probability
Number needed to treat (NNT) = 1 / absolute risk reduction
Number needed to harm (NNH) = 1 / attributable risk
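These formulas can be sketched with made-up event rates:

```python
# Hypothetical event rates
p_control = 0.20   # probability of disease without treatment (the higher probability)
p_treated = 0.15   # probability of disease with treatment (the lower probability)

ARR = p_control - p_treated  # higher probability minus lower probability = 0.05
NNT = 1 / ARR                # treat 20 people to prevent one case

print(ARR, NNT)
```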
P-VALUE, STATISTICAL
SIGNIFICANCE & TYPES OF ERROR
Type 1 Error = incorrectly rejecting the null hypothesis. Researcher says there
is a difference between the groups when there really isn’t. It can be thought of
as a false positive study result. Type I Error is related to p-Value and alpha.
You can remember this by thinking that α is the first letter of the Greek alphabet.
Type 2 Error = fail to reject null when you should have rejected the null
hypothesis. Researcher says there is no difference between the groups when
there is a difference. It can be thought of as a false negative study result. The
probability of making a Type II Error is called beta. You can remember this by
thinking that β is the second letter in the greek alphabet.
Usually we focus on the null hypothesis and type 1 error, because the
researchers want to show a difference between groups. If there is any
intentional or unintentional bias it more likely exaggerates the differences
between groups based on this desire.
Power increases as you increase sample size, because you have more
data from which to make a conclusion. Power also increases as the effect
size, or actual difference between the groups, increases. If you are trying
to detect a huge difference between groups it is a lot easier than
detecting a very small difference between groups. Increasing the
precision (or decreasing standard deviation) of your results also increases
power. If all of the results you have are very similar it is easier to come to
a conclusion than if your results are all over the place.
p-Value Definition:
p-value is the probability of obtaining a result at least as extreme as the
current one, assuming that the null hypothesis is true. Imagine we did a
study comparing a placebo group to a group that received a new blood
pressure medication and the mean blood pressure in the treatment group
was 20 mm Hg lower than the placebo group. Assuming the null
hypothesis is correct, the p-value is the probability of observing a
difference between the group averages of at least 20 mm Hg just by
chance.
Now you have probably picked up on the fact that I keep adding the
caveat that this definition of the p-value only holds true if the null
hypothesis is correct (AKA if there is no real difference between the groups).
However, don’t let that throw you off. You just assume this is the case in
order to perform this test because we have to start from somewhere. It is
not as if you have to prove the null hypothesis is true before you utilize
the p-value.
Statistical hypothesis testing is how we test the null hypothesis. For the
USMLE Step 1 Medical Board Exam all you need to know is when to use
the different tests. You don’t need to know how to actually perform them.
A Confidence Interval (CI) is the range of values the true value in the
population is expected to fall within based on the study results. The
results we receive in any study do not perfectly mirror the overall
population and the confidence interval lets us get a better idea of what the
results in the overall population might be. The confidence interval is
based on a certain level of confidence. Don't get this confused with the
value in the sample population. If you measure the BMI of 100 people in
your study population and the mean is 25, then you are very confident that
the actual mean BMI in that group is 25. The confidence interval only
comes into play when you try to extrapolate your study results to other
situations (like to the population overall).
If you have a 95% confidence interval (which is most common) that
means there is a 95% chance that the true value lies somewhere in the
confidence interval. You can also alter the width of the confidence interval
by selecting a different percentage of confidence. 90% & 99% are also
commonly used. A 99% confidence interval is wider (has more values) than
a 95% confidence interval, and a 90% confidence interval is the narrowest
of the three.
The width of the CI changes with changes in sample size. The width of
the confidence interval is larger with small sample sizes. You don’t have
enough data to get a clear picture of what is going on so your range of
possible values is wider. Imagine your study on a group of 10 individuals
shows an average shoe size of 9. If based on the results you are 95%
sure that the actual average shoe size for the entire population is
somewhere in between 6 and 12, then the 95% CI is 6-12. Based just on
your results you don’t really know what the average in the population is,
because your study population is a very small sliver of the overall
population. Now if you repeat the study with 10,000 individuals and you
get an average shoe size of 9 the confidence interval is going to be
smaller (something like 8.8 to 9.3). Here you have a much larger sample
size and therefore your results give you a much clearer idea of what is
going on with the entire population. Therefore, your 95% CI shrinks. The
width of the confidence interval decreases with an increasing sample size
(n). This is sort of like the standard deviation decreasing with an
increased sample size.
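Using the shoe-size example and a normal-approximation formula (mean ± 1.96 × SD/√n, with a made-up SD of 2), you can watch the interval shrink as n grows:

```python
import math

def ci_95(mean, sd, n):
    # Normal-approximation 95% CI for a mean: mean +/- 1.96 * sd / sqrt(n)
    margin = 1.96 * sd / math.sqrt(n)
    return (mean - margin, mean + margin)

print(ci_95(9, 2, 10))     # small n  -> wide interval, roughly (7.8, 10.2)
print(ci_95(9, 2, 10000))  # large n -> narrow interval, roughly (8.96, 9.04)
```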
Confidence intervals are often applied to RR & OR. For example, the
odds ratio might be 1.2, but you aren’t sure how much of an impact
chance had on determining that value. Therefore, instead of just reporting
the value of 1.2 you also report a range of values where the true value in
the population is likely to lie. So we would report something like the odds
ratio is 1.2 and we are 95% confident that the true value within the overall
population is somewhere between .9 and 1.5.
For both relative risk (RR) and odds ratio (OR), the “line of no difference”
is 1. So an RR or OR of 1 means there is no difference between the two
groups being compared with respect to what you are measuring. This is
because RR and OR are ratios and a value divided by itself is 1. If the
95% confidence interval of the RR or OR includes the value 1, that means
it is possible the true value is 1 and there is no difference between
groups. If that is the case, we say the null hypothesis cannot be rejected
or that there is no statistically significant difference shown. This is the
same thing as saying the p-value is greater than .05.
If you are comparing the averages between groups, we apply the confidence
interval to the difference between the groups (the mean of one group minus
the mean of the other group). In this case the line of no difference would be 0. So if the
confidence interval for the difference between the means crosses 0, the
results are not statistically significant.