
VARIABLES

Kerlinger (1986): "A variable is a property that takes on different values. It is a symbol to which numerals or values are assigned."

Best and Kahn (2003): "A variable is any characteristic or quality that varies among the members of a particular group."

In biostatistics, variables are classified into different types based on their characteristics and the kind of data they represent. These classifications help determine the appropriate statistical methods for analysis. The main types of variables in biostatistics are:

### 1. **Categorical Variables**

Categorical variables represent data that can be divided into distinct groups
or categories. They can be further classified into nominal and ordinal
variables.

- Nominal Variables: These variables have categories without any
intrinsic ordering. Examples include blood type (A, B, AB, O), gender
(male, female), and race (Caucasian, Asian, African American, etc.).
- Ordinal Variables: These variables have categories with a meaningful
order, but the intervals between the categories are not necessarily
equal. Examples include stages of cancer (Stage I, Stage II, Stage III,
Stage IV) and satisfaction ratings (very satisfied, satisfied, neutral,
dissatisfied, very dissatisfied).

### 2. **Numerical Variables**

Numerical variables represent data that can be measured on a numerical
scale. They can be further classified into discrete and continuous variables.

- Discrete Variables: These variables represent countable data,
usually integers. Examples include the number of patients in a study,
the number of infections, and the number of hospital visits.
- Continuous Variables: These variables represent data that can take
any value within a range and can be measured with high precision.
Examples include height, weight, blood pressure, cholesterol levels,
and age.

### 3. **Binary Variables**

Binary variables, also known as dichotomous variables, are a special type of
categorical variable with only two categories or levels. Examples
include the presence or absence of a disease (yes/no), gender (male/female),
and survival status (alive/deceased).

### 4. **Interval Variables**

Interval variables are numerical variables that have equal intervals between
values, but no true zero point. Examples include temperature in Celsius or
Fahrenheit, where the difference between 20°C and 30°C is the same as
between 30°C and 40°C, but 0°C does not indicate an absence of
temperature.

### 5. **Ratio Variables**

Ratio variables are numerical variables that have equal intervals between
values and a true zero point, which allows for meaningful ratios. Examples
include height, weight, age, and income. For instance, a weight of 0 means
no weight, and a weight of 10 kg is twice as much as 5 kg.
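The interval/ratio distinction can be illustrated in a few lines of Python; the temperatures and weights below are made-up values for illustration:

```python
# Interval scale (Celsius): equal intervals, but no true zero,
# so ratios of values are not meaningful.
c1, c2 = 10.0, 20.0
print(c2 / c1)  # 2.0, yet 20 °C is not "twice as hot" as 10 °C

# Converting to Kelvin (a true zero) exposes the physically meaningful ratio.
k1, k2 = c1 + 273.15, c2 + 273.15
print(round(k2 / k1, 3))  # ≈ 1.035

# Ratio scale (weight in kg): a true zero makes ratios meaningful.
w1, w2 = 5.0, 10.0
print(w2 / w1)  # 2.0: 10 kg really is twice as heavy as 5 kg
```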

### 6. **Confounding Variables**

- Variables that affect both the independent and dependent variables (e.g., age, gender).

### 7. **Moderator Variables**

- Variables that interact with the independent variable to affect the dependent variable (e.g., drug interactions).

### 8. **Time Variables**

- Variables that measure time (e.g., follow-up time, survival time).

### 9. **Repeated Measures Variables**

- Variables measured multiple times for each individual (e.g., in longitudinal studies).
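These classifications map directly onto data types in analysis software. A minimal sketch using pandas (the patient records below are hypothetical):

```python
import pandas as pd

# Hypothetical patient records illustrating the variable types above.
df = pd.DataFrame({
    "blood_type": ["A", "O", "B", "AB"],        # nominal (no order)
    "cancer_stage": ["I", "III", "II", "IV"],   # ordinal (ordered categories)
    "hospital_visits": [2, 0, 5, 1],            # discrete (counts)
    "weight_kg": [70.5, 82.1, 64.0, 90.3],      # continuous (ratio scale)
})

# Nominal: unordered categorical
df["blood_type"] = pd.Categorical(df["blood_type"])
# Ordinal: ordered categorical, so comparisons like < and > are meaningful
df["cancer_stage"] = pd.Categorical(
    df["cancer_stage"], categories=["I", "II", "III", "IV"], ordered=True
)

print(df.dtypes)
```

Encoding the ordering explicitly matters because it tells downstream methods (e.g., ordinal regression or sorting) that Stage IV > Stage I, while blood types remain unranked labels.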

NORMAL DISTRIBUTION
Normal distribution, also known as the Gaussian distribution, is a probability
distribution that appears as a “bell curve” when graphed. The normal
distribution describes a symmetrical plot of data around its mean value,
where the width of the curve is defined by the standard deviation.
Formula:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

where

x = the value of the variable or data point being examined, and f(x) its probability density

μ = the mean

σ = the standard deviation
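Given these definitions, the density can be coded directly; a minimal sketch in Python:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a normal distribution with mean mu and SD sigma."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

print(round(normal_pdf(0.0), 4))  # peak of the standard normal: 0.3989
print(round(normal_pdf(1.0), 4))  # one SD from the mean: 0.242
```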

The properties of normal distributions:

- The mean, median and mode are exactly the same.
- The distribution is symmetric about the mean: half the values fall below the mean and half above it.
- The distribution can be described by two values: the mean and the standard deviation.
- The standard normal distribution has a mean of 0 and a standard deviation of 1. Every normal distribution has zero skew and a kurtosis of 3.

Empirical rule

- Around 68.3% of values lie within 1 standard deviation of the mean.
- Around 95.4% of values lie within 2 standard deviations of the mean.
- Around 99.7% of values lie within 3 standard deviations of the mean.
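The empirical rule is easy to check by simulation; a sketch with NumPy (the mean, SD, sample size, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=100_000)  # arbitrary mean and SD

mean, sd = sample.mean(), sample.std()
for k in (1, 2, 3):
    share = np.mean(np.abs(sample - mean) <= k * sd)
    print(f"within {k} SD: {share:.1%}")  # ≈ 68.3%, 95.4%, 99.7%
```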

Central Limit Theorem:

The Central Limit Theorem (CLT) states that the sum (or average) of a large
number of independent, identically distributed variables will be
approximately normally distributed, regardless of the original distribution of
the variables. This is crucial for inferential statistics, as it allows for the use
of normal distribution assumptions in hypothesis testing and confidence
interval estimation.
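A quick simulation makes the CLT concrete: averages of draws from a strongly right-skewed exponential distribution come out approximately normal (the sample sizes and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# 10,000 samples, each the mean of n = 50 exponential draws (mean 1, SD 1).
draws = rng.exponential(scale=1.0, size=(10_000, 50))
sample_means = draws.mean(axis=1)

# Despite the skewed parent distribution, the means cluster symmetrically
# around 1.0 with spread ≈ 1 / sqrt(50) ≈ 0.141, as the CLT predicts.
print(round(sample_means.mean(), 3), round(sample_means.std(), 3))
```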

Applications in Biostatistics:

- Summarizing data using the mean and standard deviation.

- Conducting hypothesis tests and constructing confidence intervals for population parameters.

- Assuming normally distributed errors to make inferences about relationships between variables (e.g., in linear regression).

- Using normal distribution properties to monitor and control processes.

Examples

- **Height and Weight**: Often, the distribution of heights and weights in a population approximates a normal distribution.

- **Blood Pressure**: The distribution of blood pressure readings in a large population can often be modeled as a normal distribution.
SKEWNESS

Skewness is a statistical measure that describes the asymmetry of a
distribution. It indicates whether the data points are spread out more to the
left or the right of the mean. There are three types of skewness:

Positive Skewness (Right Skewness): When the tail on the right side of
the distribution is longer or fatter than the left side. The bulk of the values lie
to the left of the mean, and the mean is typically greater than the median.

Negative Skewness (Left Skewness): When the tail on the left side of the
distribution is longer or fatter than the right side. The bulk of the values lie to
the right of the mean, and the mean is typically less than the median.

Zero Skewness (Symmetry): When the distribution is symmetrical, the
tails on both sides are balanced, indicating that the mean and median are
equal.
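All three cases can be verified numerically; a sketch using scipy.stats (the exponential sample is just a convenient skewed example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
right_skewed = rng.exponential(scale=2.0, size=50_000)  # long right tail
left_skewed = -right_skewed                             # mirrored: long left tail
symmetric = rng.normal(size=50_000)

print(stats.skew(right_skewed))  # > 0: mean pulled above the median
print(stats.skew(left_skewed))   # < 0: mean pulled below the median
print(stats.skew(symmetric))     # ≈ 0
```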
**Importance of Skewness:**

1. Understand the shape of the data distribution

2. Identify outliers and their impact

3. Make informed decisions in finance, risk assessment, and quality control

**Limitations of Skewness:**

1. Skewness does not provide information about the peakedness or tail
heaviness of the distribution.
2. Skewness can be highly affected by outliers, leading to potential
misinterpretation.
3. It only describes the direction of asymmetry, not its impact on data
interpretation.

POSITIVE VS NEGATIVE SKEWNESS

1. **Direction of the Tail**:

- **Positive Skewness (Right-Skewed)**: The tail extends to the right, indicating the presence of outliers that are larger than most of the data.

- **Negative Skewness (Left-Skewed)**: The tail extends to the left, indicating the presence of outliers that are smaller than most of the data.

2. **Shape of the Distribution**:

- **Positive Skewness**: The peak (mode) is to the left of the center, with the right tail being longer.

- **Negative Skewness**: The peak (mode) is to the right of the center, with the left tail being longer.

3. **Impact on Descriptive Statistics**:

- **Positive Skewness**: High-value outliers in the right tail inflate the standard deviation and variance.

- **Negative Skewness**: Low-value outliers in the left tail likewise inflate the standard deviation and variance.

Here are nine key differences between positive and negative skewness:

1. Direction of the tail:

- Positive skew: Long tail extends to the right

- Negative skew: Long tail extends to the left

2. Position of the mean:

- Positive skew: Mean is greater than median

- Negative skew: Mean is less than median

3. Relation to mode:

- Positive skew: Mode < Median < Mean

- Negative skew: Mean < Median < Mode

4. Concentration of data:

- Positive skew: Most data concentrated on the left

- Negative skew: Most data concentrated on the right

5. Outliers:

- Positive skew: Outliers tend to be on the high end

- Negative skew: Outliers tend to be on the low end

6. Common examples:

- Positive skew: Income distributions, reaction times

- Negative skew: Age at death, exam scores on easy tests

7. Shape analogy:

- Positive skew: "Right-tailed" or "right-leaning"

- Negative skew: "Left-tailed" or "left-leaning"

8. Mathematical representation:

- Positive skew: Skewness coefficient > 0

- Negative skew: Skewness coefficient < 0

9. Effect on standard deviation:

- Positive skew: High-end outliers tend to inflate the standard deviation

- Negative skew: Low-end outliers likewise inflate the standard deviation

KURTOSIS
Kurtosis is a statistical measure that describes the shape of a distribution’s
tails in relation to its overall shape, specifically the “tailedness” or the
propensity for producing outliers. It provides insight into the data’s
distribution, indicating how much of the data is in the tails and the peak of
the distribution.

1. **Types of Kurtosis**:

- Mesokurtic (Kurt = 3.0): This is the kurtosis of a normal distribution, with an excess kurtosis of zero. It represents a moderate level of tail thickness and peak height.

- Leptokurtic (Kurt > 3.0): Distributions with positive excess kurtosis are called leptokurtic. These have fatter tails and a sharper peak than the normal distribution, indicating more frequent extreme values. While a leptokurtic distribution may be "skinny" in the center, it also features "fat tails."

- Platykurtic (Kurt < 3.0): Distributions with negative excess kurtosis are called platykurtic. These have thinner tails and a flatter peak than the normal distribution, indicating fewer extreme values.
Excess Kurtosis

Excess kurtosis compares the kurtosis of a distribution against the kurtosis of a normal distribution, which equals 3. It is therefore found using the formula below:

Excess Kurtosis = Kurtosis − 3
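Many software packages report excess kurtosis directly; for example, scipy.stats.kurtosis subtracts 3 by default. A short numerical check (the distributions below are chosen purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal = rng.normal(size=100_000)            # mesokurtic: excess ≈ 0
heavy = rng.standard_t(df=5, size=100_000)   # leptokurtic: excess > 0 (fat tails)
flat = rng.uniform(-1, 1, size=100_000)      # platykurtic: excess ≈ -1.2

# stats.kurtosis returns EXCESS kurtosis (kurtosis - 3) by default.
for name, x in [("normal", normal), ("t(5)", heavy), ("uniform", flat)]:
    print(name, round(stats.kurtosis(x), 2))
```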

**Importance of Kurtosis:**

1. Tail Extremity Analysis

2. Risk Assessment
3. Distribution Shape Insight

**Limitations of Kurtosis:**

1. Insensitive to Mean and Variance

2. Complex Interpretation

3. Outlier Sensitivity
