Introduction To Statistics

Statistics is used widely in science, business, and research to analyze data and draw conclusions. It involves collecting data, summarizing it, and identifying relationships and sources of variability. Key statistical concepts include populations, samples, randomization, and probability models. These concepts help account for uncertainty and allow inferences to be made from samples to populations. Statistics is applied in many fields like manufacturing quality control, social science research, drug discovery, and political polling. It enables evidence-based decision making from quantitative data.

Uploaded by

hsbq6s4csz
Copyright
© All Rights Reserved

Introduction to Statistics

Samatrix Consulting Pvt Ltd


What is Statistics
• Statistics is both a science and an art.
• Statistics helps us make data-driven decisions.
• Using statistics, we can relate data to the questions that interest us.
• Statistics is about gathering the data, summarizing the data, and
displaying the data.
• It helps us shed light on the question of interest and draw answers to
the questions based on quantitative evidence.
• Statistical techniques are the standard ways of turning data from
an accumulation of numbers into useful information.
Statistics in Real Life
• In real life, almost everyone deals with data.
• Corporate presidents, marketing managers, medical and drug
researchers, social scientists, and biologists use statistical techniques
to summarize and present the data.
Variability in Data
• Not all numbers are the same.
• No group of people is all the same weight.
• Not all the manufactured parts in a batch are identical.
• Not all the students in an exam score equal marks. They differ from each
other. We need to know how much they differ.
• Using variability, a key concept of statistics, we can answer these questions.
• We use measures such as the standard deviation, variance, and range to quantify
not-the-sameness.
• Such metrics are very useful for comparing how spread out different
groups are.
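The measures of spread named above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the two groups of exam marks are made-up numbers, and `spread_summary` is a hypothetical helper name. Both groups have the same mean, so only the spread tells them apart.

```python
import statistics

def spread_summary(values):
    """Return the range, sample variance, and sample standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),      # square root of the variance
    }

# Made-up exam marks; both groups have a mean of 70
group_a = [68, 69, 70, 71, 72]
group_b = [50, 60, 70, 80, 90]

summary_a = spread_summary(group_a)
summary_b = spread_summary(group_b)
print(summary_a)
print(summary_b)
```

Group B has the larger range, variance, and standard deviation, which matches the intuition that its marks differ more from each other.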
Uncertainty in Data
• Due to variability, the data always contain uncertainty.
• The selection of items to be measured as well as the measurement
process may lead to uncertainty.
• The statistical tools allow us to draw general conclusions from the
data despite the uncertainty in the data.
• The data analysis depends on the way the data were gathered.
• We need probability models to explain the inherent uncertainty in the
data.
Mathematics vs Statistics
• Even though Statistics uses many mathematical tools such as algebra,
calculus, linear algebra etc., Statistics is not purely mathematics.
• Mathematical problems are well defined and have specific
answers, but statistical problems such as data interpretation
are often not well defined.
• The same decision problem can yield different answers, depending
on the assumptions made about the data and how it was collected.
Relationships in Data
Observed variables
• Suppose we have two variables that we have observed and some
additional variables that we have not observed.
• The two observed variables, 𝑋 and 𝑌, appear to have a relationship.
• Higher values of 𝑋 go with higher values of 𝑌. At the same
time, lower values of 𝑋 go with lower values of 𝑌.
• We say that the relationship between the two variables is positive.
• If higher values of 𝑋 go with lower values of 𝑌, we say that
the relationship is negative.
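One common way to check the direction of a relationship is the sign of the sample covariance. This is a minimal sketch, assuming made-up observations; `sample_covariance` is a hypothetical helper written for this illustration, and the slides themselves do not name a specific measure.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Made-up observations
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 6]    # tends to rise as x rises
y_falling = [6, 4, 5, 4, 2]   # tends to fall as x rises

cov_positive = sample_covariance(x, y_rising)
cov_negative = sample_covariance(x, y_falling)
print(cov_positive > 0, cov_negative < 0)
```

A positive covariance corresponds to a positive relationship and a negative covariance to a negative one. The sign says nothing about why the variables move together.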
Causal Association
• Now, we would like to determine the reasons behind the association.
• There could be several possible explanations.
• One of the explanations is that the association is causal.
• This means that 𝑋 might be the cause of 𝑌, or vice versa.
Lurking Variable
• There might be an unidentified third variable 𝑍 that has a causal effect
on both 𝑋 and 𝑌.
• In that case, 𝑋 and 𝑌 are not in a direct causal relationship.
• 𝑋 and 𝑌 are associated because each has its own causal relationship with
𝑍, known as the lurking variable.
• The relationship between the two variables is due to a third variable,
the lurking variable, which hides in the background and affects the data.
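A small simulation can make the lurking variable concrete. The sketch below is an illustration under stated assumptions: the data are simulated, the coefficients (2 and unit noise) are arbitrary choices, and `pearson` is a hypothetical helper. 𝑋 and 𝑌 are both generated from 𝑍 and never from each other, yet they come out strongly correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Z is the lurking variable; X and Y each depend on Z but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [2 * zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)

# Subtract Z's contribution, which is known here only because we simulated it.
x_without_z = [xi - 2 * zi for xi, zi in zip(x, z)]
y_without_z = [yi - 2 * zi for yi, zi in zip(y, z)]
r_without_z = pearson(x_without_z, y_without_z)

print(f"corr(X, Y) = {r_xy:.2f}, after removing Z = {r_without_z:.2f}")
```

Once the contribution of 𝑍 is removed, the correlation between what is left of 𝑋 and 𝑌 is close to zero. In real data 𝑍 is usually not observed, so this subtraction is not available.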
Confounded Variables
• Another explanation is that the association is due to both the causal
effect and the lurking variable.
• In certain situations, both a causal effect and lurking variable may
contribute to the association.
• Variables whose association reflects both a causal effect and the effect
of a lurking variable are called confounded variables.
Reasons for association
• For the statistical analysis, we need to determine the possible reasons
for the association.
• If the reason is the causal effect, our next goal should be to determine
the size of the effect.
• If the reason for the association is a causal effect, confounded with
the effect of a lurking variable, then our next goal should be to
determine the sizes of both the effects.
Cause and Effect Relationship
• The scientific method searches for cause-and-effect relationships between two
variables: the experimental variable and the outcome variable.
• It asks how changes in the experimental variable affect the outcome variable.
• To establish a relationship, scientific modeling helps develop mathematical
models.
• In the modeling process, we need to isolate the experiment from
outside factors.
• In the fields of physics and chemistry, the outside factors could be
easily identified and controlled.
• Thus, there were no lurking variables.
• However, in biology, medicine, engineering, technology, and the social sciences,
we cannot identify and control all external factors.
• Hence, we need a different way to control such outside factors.
Role of Statistics in Scientific Methods
Variability in Data
• Statistical methods can be used when there is variability in the data.
• The probability models can be used in situations where the relevant
outside factors cannot be identified.
• Factors that cannot be identified cannot be controlled directly,
yet they can affect the data. This lack of direct control can lead to the
wrong interpretation of the experimental results and data.
Randomization
• The statistical idea of randomization has been developed to deal with uncontrolled
factors.
• According to the statistical theory of the design of experiments, randomization involves
randomly allocating the experimental units across the treatment groups.
• In the randomization process, each participant has the same chance of being assigned to
either the intervention or the control group.
• For example, if we want to compare the effects of a new drug with those of a standard
drug, we should randomly assign each patient either to the new drug or to the standard
drug.
• Such processes of randomization reduce the confounding effect.
• It averages out the effect of confounding variables and helps us control external factors
statistically.
• Using randomization, we can measure the amount of uncertainty that remains using the
probability model.
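The random assignment described above can be sketched in a few lines. This is one simple scheme (shuffle, then split in half), not the only way to randomize; the patient IDs are hypothetical and `randomize` is a helper name chosen for this illustration.

```python
import random

def randomize(units, rng):
    """Shuffle the units and split them into two groups of (nearly) equal size."""
    shuffled = list(units)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the allocation can be reproduced
patients = [f"patient_{i:02d}" for i in range(1, 21)]  # hypothetical IDs

new_drug, standard_drug = randomize(patients, rng)
print("New drug:     ", sorted(new_drug))
print("Standard drug:", sorted(standard_drug))
```

Because the shuffle ignores everything about the patients, any unidentified factor (age, diet, severity) is equally likely to land in either group, which is how randomization averages out confounding effects.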
Population and Sample
• The concept of the statistical population is very significant.
• The statistical population is the entire group of objects or people of interest.
• It contains all the possible values of observations.
• We can prepare a data set that consists of observations that we have
taken from a sample of the population.
• In order to infer the population parameters from the sample
statistics, the sample must be representative of the population.
• To get the representative sample, we must choose the sample
randomly.
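The step from a sample statistic to a population parameter can be illustrated with simulated data. The sketch below assumes a hypothetical population of 20,000 heights in centimetres; in practice the full population is not available, which is exactly why we sample.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 heights in cm (simulated, not real data)
population = [rng.gauss(170, 10) for _ in range(20_000)]
population_mean = statistics.fmean(population)   # the parameter

# Simple random sample drawn without replacement
sample = rng.sample(population, 500)
sample_mean = statistics.fmean(sample)           # the statistic

print(f"population mean = {population_mean:.1f} cm")
print(f"sample mean     = {sample_mean:.1f} cm")
```

With a random sample, the sample mean lands close to the population mean. A sample chosen in a non-random way (for example, only the first 500 people who volunteer) carries no such guarantee.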
Current applications of statistics
Manufacturing
• A well-known lightbulb manufacturer produces
approximately 500,000 bulbs per day. To control quality, the
defect rate of the bulbs needs to be checked. One option
would be to test every bulb, but that would be very costly. An
alternative approach would be to select 1,000 bulbs from the daily
production of 500,000 and test each of them. The defect
rate of the complete daily production can then be estimated using the
defect rate of the sample.
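The lightbulb example can be sketched as follows. The true defect rate of 2% is an assumption made up for this simulation, and the standard-error formula is the usual textbook approximation for a sample proportion, added here for illustration rather than taken from the slides.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

DAILY_PRODUCTION = 500_000
SAMPLE_SIZE = 1_000
TRUE_DEFECTS = 10_000  # hypothetical: 2% of the day's bulbs are defective

# True marks a defective bulb, False a good one.
production = [True] * TRUE_DEFECTS + [False] * (DAILY_PRODUCTION - TRUE_DEFECTS)

# Test only a random sample of 1,000 bulbs instead of all 500,000.
sample = rng.sample(production, SAMPLE_SIZE)
estimated_rate = sum(sample) / SAMPLE_SIZE

# Approximate standard error of a sample proportion
standard_error = math.sqrt(estimated_rate * (1 - estimated_rate) / SAMPLE_SIZE)

print(f"estimated defect rate = {estimated_rate:.3f} (+/- about {2 * standard_error:.3f})")
```

The estimate comes from testing 0.2% of the production, and the standard error gives a rough sense of how far it may sit from the true rate.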
Social Science
• An investigator wants to find out whether individuals who quit smoking
subsequently gain weight. The investigator draws a random sample
of 200 individuals who had participated in programs to quit smoking.
Each individual in the sample was weighed at the beginning of the
experiment and again after a gap of one year. The average increase in
weight of the participants was 5 pounds. Based on this evidence, the
researcher concluded that the claim was valid.
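The calculation behind an "average increase" can be sketched with before-and-after measurements. The six weights below are invented for illustration (the study's actual data are not given), chosen so that the mean change is 5 pounds. The standard error is an added textbook quantity showing how uncertain that average is.

```python
import math
import statistics

# Hypothetical weights in pounds for six participants (not the study's data)
weight_start = [150, 160, 170, 180, 190, 200]
weight_after_year = [153, 167, 174, 186, 194, 206]

# Change for each participant: weight after one year minus weight at the start
changes = [after - before for before, after in zip(weight_start, weight_after_year)]

mean_change = statistics.mean(changes)
standard_error = statistics.stdev(changes) / math.sqrt(len(changes))

print(f"mean change = {mean_change} lb, standard error = {standard_error:.2f} lb")
```

A mean change that is large compared with its standard error supports the claim of weight gain. A mean change of similar size to the standard error would not.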
Drug Discovery
• Statisticians play an important role in the discovery of a new
pharmaceutical drug. After the candidate drug passes the initial test
and proves to be effective for the alleviation of a condition, it needs
to be checked for other parameters such as side effects and
interactions with other drugs. In the next stage, statisticians help to
find the optimum quantity and spacing of the new pharmaceutical
drug. After thorough and successful testing in the lab, the drug is
tested on human subjects.
Politics
• One application of statistics is in the opinion and exit polls
conducted by various agencies and media houses. They try to
predict how voters are going to vote in upcoming elections. They
cannot contact every voter in all the constituencies. Instead, they sample
the opinions of a smaller number of voters in each constituency to
estimate the vote share each political party may get. In this way,
they are able to estimate the number of votes each party may get at the
constituency, state, or country level.
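Moving from constituency-level samples to a state-level estimate can be sketched as a weighted average. Everything in this example is hypothetical: the constituency names, voter counts, party names, and responses are invented, and weighting by number of voters is one reasonable choice rather than a description of how any particular agency works.

```python
# Hypothetical poll: number of voters and sampled responses per constituency
constituencies = {
    "North": {"voters": 100_000, "responses": {"Party P": 60, "Party Q": 40}},
    "South": {"voters": 300_000, "responses": {"Party P": 30, "Party Q": 70}},
}

def constituency_shares(responses):
    """Estimated vote share of each party from the sampled responses."""
    total = sum(responses.values())
    return {party: count / total for party, count in responses.items()}

def state_shares(constituencies):
    """Combine constituency estimates, weighting each by its number of voters."""
    total_voters = sum(c["voters"] for c in constituencies.values())
    combined = {}
    for c in constituencies.values():
        weight = c["voters"] / total_voters
        for party, share in constituency_shares(c["responses"]).items():
            combined[party] = combined.get(party, 0.0) + weight * share
    return combined

estimate = state_shares(constituencies)
for party, share in estimate.items():
    print(f"{party}: {share:.1%}")
```

Weighting matters here because both constituencies contributed 100 responses but South has three times as many voters. Simply pooling all 200 responses would overstate the influence of North.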
Thanks
Samatrix Consulting Pvt Ltd
