Statistics theory (Soyaib)
2020:
Binomial Distribution: A random variable X is said to follow a binomial distribution if it takes only the non-negative
integer values 0, 1, ..., n and its probability mass function is given by
P(X = x) = C(n, x) p^x q^(n - x), x = 0, 1, ..., n, where q = 1 - p and C(n, x) = n! / (x! (n - x)!).
It is characterized by two parameters: the number of trials n and the probability of success p on each trial.
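As a small illustration, the binomial probability mass function P(X = x) = C(n, x) p^x (1 - p)^(n - x) can be evaluated
directly. The sketch below assumes hypothetical values n = 10 and p = 0.3, which are not taken from the notes, and checks
that the probabilities sum to 1 and that the mean equals np.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X following a binomial distribution with parameters n and p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example values (assumptions for illustration only).
n, p = 10, 0.3
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probabilities)                                 # sums to 1 up to rounding
mean = sum(x * pr for x, pr in enumerate(probabilities))   # equals n * p up to rounding
```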
Median: In statistics, the median is the middle value in a set of ordered data. To calculate the median, we first order
the data from least to greatest. If the data set has an odd number of values, the median is the middle value. If the data
set has an even number of values, the median is the average of the two middle values.
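The rule above can be sketched in a few lines of Python. The function name and the sample data are illustrative
assumptions, not part of the original notes.

```python
def median(values):
    """Return the median: the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of the two middle values


odd_result = median([7, 1, 5])       # ordered data: 1, 5, 7
even_result = median([8, 2, 6, 4])   # ordered data: 2, 4, 6, 8
```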
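Mode: The mode is the most frequent value in a data set. There can be one mode, two modes, or even more modes in
a data set. A data set with one mode is said to be unimodal, one with two modes bimodal, and one with more than two
modes multimodal. If no value occurs more often than the others, the data set has no mode.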
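A possible way to find every mode of a data set is to count how often each value occurs. The sketch below adopts one
common convention as an assumption: if no value occurs more than once, the data set is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values (empty if nothing repeats)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # convention assumed here: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([2, 3, 3, 5])        # a single most frequent value
two_modes = modes([1, 1, 2, 2, 3])    # two values tie for most frequent
no_mode = modes([1, 2, 3])            # every value occurs once
```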
Sample space: A sample space is the set of all possible outcomes of an experiment. For example, the sample space for
flipping a coin is {heads, tails}. The sample space for rolling a die is {1, 2, 3, 4, 5, 6}.
Ex: Suppose you have a bag containing 3 red balls and 2 blue balls. Treating each ball as a separate, equally likely
outcome, the sample space for drawing a ball from the bag is {red, red, red, blue, blue}.
Conditional probability: Conditional probability is the probability of an event happening given that another event has
already happened. It is denoted by P(A|B), where A is the event that we are interested in and B is the event that has
already happened.
Ex: Suppose we draw a ball from the bag in the previous example, it is red, and we do not put it back. What is the
probability that the next ball drawn will also be red? The sample space for the first draw was {red, red, red, blue, blue},
but one red ball has now been removed, so the sample space for the second draw is {red, red, blue, blue}. The
probability of drawing a red ball from this sample space is 2/4, or 1/2.
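The conditional probability for the bag of 3 red and 2 blue balls can be checked by listing every ordered pair of draws
and applying P(A|B) = P(A and B) / P(B). This sketch assumes the first ball is not returned to the bag and that every
ball is equally likely to be drawn; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

bag = ["red", "red", "red", "blue", "blue"]

# Every ordered pair of distinct balls (first draw, second draw); all pairs are equally likely.
draws = list(permutations(range(len(bag)), 2))

first_red = [(i, j) for i, j in draws if bag[i] == "red"]    # event B: first ball is red
both_red = [(i, j) for i, j in first_red if bag[j] == "red"]  # event A and B: both balls are red

# P(second red | first red) = P(both red) / P(first red)
p_second_red_given_first_red = Fraction(len(both_red), len(first_red))
```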
Event: An event in statistics is a set of outcomes in a sample space. For example, the event "drawing a red ball" from
the bag in the previous example is a set that contains the outcomes {red, red, red}.
Ex: Suppose we roll a die twice and record the results. The sample space for this experiment is {(1, 1), (1, 2), ..., (6, 6)}.
The event "rolling a double" is a set that contains the outcomes {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.
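The sample space and the event for the two-dice experiment can be enumerated directly. Assuming both dice are fair, so
that all 36 outcomes are equally likely, the probability of the event is the number of outcomes in it divided by the
size of the sample space.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first roll, second roll).
sample_space = list(product(range(1, 7), repeat=2))

# Event "rolling a double": both rolls show the same number.
doubles = [(a, b) for a, b in sample_space if a == b]

# Valid only under the assumption that all outcomes are equally likely.
p_double = Fraction(len(doubles), len(sample_space))
```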
The Poisson distribution is also important because it is mathematically tractable, meaning that it can be easily analyzed and
manipulated. This makes it a valuable tool for scientists and engineers who need to model complex systems.
Here are some specific examples of how the Poisson distribution is used in the real world:
• Insurance companies use the Poisson distribution to model the number of claims they are likely to receive in a given
period of time. This information helps them to set rates and develop risk management strategies.
• Traffic engineers use the Poisson distribution to model the number of cars that will arrive at an intersection in a given
period of time. This information helps them to design traffic signals and other traffic control measures.
• Quality control engineers use the Poisson distribution to model the number of defects in a batch of products. This
information helps them to develop quality control procedures and identify areas where the manufacturing process can
be improved.
• Biologists use the Poisson distribution to model the number of mutations that will occur in a DNA sequence. This
information helps them to study the evolution of life and to develop new diagnostic and therapeutic tools.
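Each of the counts in the examples above can be given a probability once an average rate λ is fixed, using
P(X = k) = e^(-λ) λ^k / k!. As a minimal sketch, assume a hypothetical average of 2 events per period; this rate is an
illustrative assumption and does not come from the notes.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X following a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 2.0  # hypothetical average number of events per period

p_zero = poisson_pmf(0, lam)                                    # probability of no events
p_at_most_three = sum(poisson_pmf(k, lam) for k in range(4))    # probability of 0, 1, 2 or 3 events
```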
However, the normal distribution also has some limitations. Here are a few:
• It is not always appropriate for modeling data that is skewed or has outliers. Skewed data means that
the distribution is not symmetrical, with more values on one side of the mean than the other. Outliers are
values that are far from the mean.
• The normal distribution assumes that the data is continuous. This means that the values can take on any
value within a certain range. However, some data is discrete, meaning that the values can only take on certain
values.
• The normal distribution is not always the most accurate model for data. In some cases, other
distributions may fit the data better.
Comment on the following: the mean of a binomial distribution is 3 and the variance is 4
The mean of a binomial distribution is equal to the product of the number of trials and the probability of success on
each trial. The variance of a binomial distribution is equal to the product of the number of trials, the probability of
success on each trial, and the probability of failure on each trial.
Therefore, if the mean of a binomial distribution is 3 and the variance is 4, then we must have:
np = 3
npq = 4
where n is the number of trials, p is the probability of success on each trial and q is the probability of failure on each
trial.
Dividing the second equation by the first gives q = npq / np = 4/3. This is impossible, because q is a probability and
must lie between 0 and 1. Since q < 1 whenever p > 0, the variance npq of a binomial distribution is always smaller
than its mean np.
Therefore, no binomial distribution can have mean 3 and variance 4, and the statement is incorrect.
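Whether a given mean and variance can belong to a binomial distribution can be checked by solving np = mean and
npq = variance: q = variance / mean must be a probability below 1, and n = mean / p must be a whole number. The helper
below is a hypothetical sketch of that check; the second example (variance 3/4) is an added illustration, not a value
from the question.

```python
from fractions import Fraction


def binomial_parameters(mean, variance):
    """Return (n, p) if a binomial distribution has this mean and variance, else None."""
    mean, variance = Fraction(mean), Fraction(variance)
    if mean <= 0 or variance < 0:
        return None
    q = variance / mean        # from npq / np
    if q >= 1:
        return None            # q is a probability, and q < 1 is needed for a positive mean
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        return None            # the number of trials must be a whole number
    return int(n), p


impossible = binomial_parameters(3, 4)               # q would be 4/3
feasible = binomial_parameters(3, Fraction(3, 4))    # q = 1/4, p = 3/4, n = 4
```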
Correlation is a measure of the linear relationship between two variables. It is calculated using a formula that takes
into account the means, standard deviations, and covariance of the two variables. Correlation can range from -1 to
1, with a value of 0 indicating no linear relationship and a value of 1 indicating a perfect positive linear relationship. A
value of -1 indicates a perfect negative linear relationship.
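A minimal sketch of the calculation, using the covariance and the spread of each variable about its mean, is shown
below. The data are hypothetical and chosen so that the relationship is perfectly linear.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # perfectly increasing line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # perfectly decreasing line
```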
Regression is a statistical technique that can be used to model the relationship between two variables. It is used to
predict the value of one variable (the dependent variable) based on the value of the other variable (the independent
variable). Regression analysis can be used to identify the factors that contribute to a particular outcome and to make
predictions about future outcomes.
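For one independent variable, the simplest form is a straight line y = a + bx fitted by least squares, where
b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄. The sketch below uses hypothetical data that lie exactly on a line,
so the fitted values are easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]        # independent variable (hypothetical data)
ys = [3, 5, 7, 9, 11]       # dependent variable, here exactly y = 1 + 2x
slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6   # prediction for a new value x = 6
```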
2019:
Mention the use of standard deviation:
• Quality control: Standard deviation can be used to monitor the quality of products and processes. For
example, a manufacturer might use standard deviation to track the variation in the weight of their products.
• Financial analysis: Standard deviation can be used to measure the risk of an investment. For example, an
investor might use standard deviation to compare the risk of two different stocks.
• Scientific research: Standard deviation is used to measure the variability of experimental results. For
example, a scientist might use standard deviation to compare the results of two different treatment groups.
Here are some specific examples of how standard deviation is used in the real world:
• A pharmaceutical company uses standard deviation to ensure that the amount of active ingredient in their
drugs is consistent from batch to batch.
• A financial analyst uses standard deviation to determine how risky it is to invest in a particular company.
• A teacher uses standard deviation to assess the performance of their students on a test.
• A scientist uses standard deviation to determine if the results of their experiment are statistically significant.
Coefficient of variation: The coefficient of variation (CV), also known as the relative standard deviation, is a
statistical measure of the dispersion of data points around the mean, relative to the mean. It is defined as the ratio of
the standard deviation to the mean, CV = standard deviation / mean, and is often expressed as a percentage.
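Because the coefficient of variation divides the standard deviation by the mean, two data sets with the same standard
deviation can have very different relative spread. The sketch below uses hypothetical data and assumes the sample
standard deviation (divisor n - 1); the population version would work the same way.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Both hypothetical data sets have a standard deviation of 2, but different means.
cv_small_mean = coefficient_of_variation([10, 12, 14])
cv_large_mean = coefficient_of_variation([100, 102, 104])
```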
• Rigidly defined: The definition of the average should be clear and unambiguous.
• Easy to calculate and understand: The average should be easy to calculate and understand, both for
laypeople and for statisticians.
• Based on all items: The average should be based on all of the items in the data set.
• Suitable for further algebraic treatment: The average should be amenable to further algebraic
treatment, such as addition, subtraction, multiplication, and division.
• Stable: The average should be stable, meaning that it should not be unduly affected by small changes
in the data set.
• Resistant to outliers: The average should be resistant to outliers, meaning that it should not be overly
influenced by extreme values in the data set.
• Uniquely defined: The average should be uniquely defined for a given data set, meaning that there
should be only one "correct" average for a given data set.
Difference between raw moment and central moment in group data:
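As a hedged sketch of the usual textbook definitions: for grouped data with class midpoints x_i, frequencies f_i and
N = Σ f_i, the r-th raw moment is taken about the origin (or an arbitrary point A), Σ f_i (x_i - A)^r / N, while the
r-th central moment is taken about the mean, Σ f_i (x_i - x̄)^r / N. The first central moment is therefore always 0 and
the second is the variance. The function names and the frequency table below are hypothetical.

```python
def grouped_raw_moment(midpoints, freqs, r, origin=0.0):
    """r-th moment of grouped data about `origin` (a raw moment when origin is not the mean)."""
    total = sum(freqs)
    return sum(f * (x - origin) ** r for x, f in zip(midpoints, freqs)) / total


def grouped_central_moment(midpoints, freqs, r):
    """r-th moment of grouped data about its mean."""
    mean = grouped_raw_moment(midpoints, freqs, 1)
    return grouped_raw_moment(midpoints, freqs, r, origin=mean)


midpoints = [5, 15, 25]   # hypothetical class midpoints
freqs = [2, 5, 3]         # hypothetical class frequencies

raw_1 = grouped_raw_moment(midpoints, freqs, 1)           # the mean
raw_2 = grouped_raw_moment(midpoints, freqs, 2)
central_1 = grouped_central_moment(midpoints, freqs, 1)   # zero by construction
central_2 = grouped_central_moment(midpoints, freqs, 2)   # the variance
```

The two kinds of moment are linked; for example, the second central moment equals raw_2 - raw_1².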
Statistical estimate: A statistical estimate is a value that is calculated from a sample and used to estimate
the value of a population parameter.
Normal Distribution: The normal distribution is a continuous probability distribution that is symmetrical
around the mean, with most of the values concentrated around the center and the tails extending indefinitely in both
directions while getting ever closer to the horizontal axis.
The null hypothesis and alternative hypothesis are two competing hypotheses that researchers weigh the
evidence for and against using a statistical test.
• Null hypothesis: The average height of men and women is the same.
• Alternative hypothesis: The average height of men is greater than the average height of
women.
2018:
Explain skewness and kurtosis:
• Positive skewness occurs when the tail of the distribution extends to the right. This means that there are more values
on the left side of the mean than on the right side of the mean.
• Negative skewness occurs when the tail of the distribution extends to the left. This means that there are more values
on the right side of the mean than on the left side of the mean.
• Positive kurtosis (more precisely, positive excess kurtosis; a leptokurtic distribution) occurs when the distribution has
heavier tails than a normal distribution, usually together with a sharper peak. This means that extreme values far
from the mean are more likely than under a normal distribution.
• Negative kurtosis (a platykurtic distribution) occurs when the distribution has lighter tails than a normal distribution,
usually together with a flatter peak. This means that extreme values are less likely than under a normal distribution.
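Both quantities can be computed from central moments. A common choice, assumed here, is the moment coefficient of
skewness m3 / m2^(3/2) and the excess kurtosis m4 / m2² - 3, which is 0 for a normal distribution. The data sets are
hypothetical.

```python
def central_moment(values, r):
    """r-th central moment of ungrouped data (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** r for v in values) / n


def skewness(values):
    """Moment coefficient of skewness: positive when the right tail is longer."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]                 # symmetric about 3
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]    # one large value stretches the right tail

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
kurt_symmetric = excess_kurtosis(symmetric)
```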
• Simple correlation: Simple correlation measures the linear relationship between two variables. It ranges from -1
to 1, where -1 is a perfect negative correlation, 0 is no correlation, and 1 is a perfect positive correlation.
• Rank correlation: Rank correlation measures the association between two rankings of the same set of items. It also
ranges from -1 to 1, where the same interpretations apply.
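One widely used rank correlation is Spearman's coefficient, 1 - 6 Σd² / (n(n² - 1)), where d is the difference between
the two ranks given to the same item. The sketch below assumes there are no tied ranks; the rankings by two judges are
hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items (no ties assumed)."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length with at least two items")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho_similar = spearman_rho(judge_1, judge_2)            # rankings mostly agree
rho_identical = spearman_rho(judge_1, judge_1)          # rankings agree completely
rho_reversed = spearman_rho(judge_1, [5, 4, 3, 2, 1])   # rankings are exact opposites
```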
2017:
Histogram: A bar chart that shows the distribution of continuous data.
Ogive curve: A graph that shows the cumulative frequency of a distribution.
Frequency polygon: A line graph that connects the midpoints of the tops of the bars of a histogram.
Pie chart: A circular graph that shows the proportion of a whole that each category represents.
Simple event: A simple event is an event that cannot be broken down into any smaller events.
• Example: Flipping a coin and getting heads is a simple event.
Mutually exclusive event: Two events are mutually exclusive if they cannot both happen at the same time.
• Example: In a single coin flip, the events "heads" and "tails" are mutually exclusive, because both cannot occur on
the same flip.
Conditional probability: The probability of event A happening given that event B has already happened.
• Example: The probability of drawing a red card from a deck of cards after drawing a black card is a conditional
probability.
• Finance: The normal distribution is used to model the returns on stocks and bonds, as well as the risk
of investments.
• Quality control: The normal distribution is used to monitor the quality of production processes and to identify
outliers.
• Medicine: The normal distribution is used to design clinical trials and to analyze data from medical studies.
• Science: The normal distribution is used to analyze data from experiments and to make inferences about
populations.
The normal distribution is foundational to many statistical tools and models and has a wide range of applications in
diverse fields due to its bell-shaped curve and its symmetry around the mean.
2016:
Write down three properties of i) Negative Binomial distribution ii) Normal distribution
Normal Distribution:
2015:
Point estimate: A point estimate is a single value that is used to estimate the population parameter.
Interval estimate: An interval estimate is a range of values that is likely to contain the population parameter.
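The two kinds of estimate can be illustrated with a sample mean and an approximate 95% confidence interval for the
population mean. This is a sketch under stated assumptions: the sample is hypothetical, and the critical value 1.96
comes from the normal approximation, which suits large samples; for a small sample like this one a t critical value
would usually be preferred.

```python
from math import sqrt
from statistics import mean, stdev

sample = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical measurements

# Point estimate: a single value for the population mean.
point_estimate = mean(sample)

# Interval estimate: point estimate plus or minus a margin of error.
standard_error = stdev(sample) / sqrt(len(sample))
z = 1.96                                     # assumed normal critical value for 95% confidence
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error
```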
• Type of data: Histograms are typically used for continuous data, while frequency polygons can be used for
both continuous and discrete data.
• Visual representation: Histograms use bars to represent the frequency of each class, while frequency
polygons use points joined by lines to represent the frequency of each class.
• Shape of the distribution: Histograms are better for displaying the shape of the distribution, while frequency
polygons are better for displaying the trends and patterns in the data.
• Mean: The mean is the most commonly used measure of central tendency. It is calculated by adding
up the values of all the data points and then dividing by the number of data points.
• Median: The median is the middle value in a data set that has been ordered from highest to lowest or
lowest to highest.
• Mode: The mode is the most frequent value in a data set.
- 2101106