Statistics is used widely in science, business, and research to analyze data and draw conclusions. It involves collecting data, summarizing it, and identifying relationships and sources of variability. Key statistical concepts include populations, samples, randomization, and probability models. These concepts help account for uncertainty and allow inferences to be made from samples to populations. Statistics is applied in many fields like manufacturing quality control, social science research, drug discovery, and political polling. It enables evidence-based decision making from quantitative data.
What is Statistics
• Statistics is both a science and an art.
• Statistics helps us make data-driven decisions.
• Using statistics, we can relate data to the questions of interest.
• Statistics is about gathering, summarizing, and displaying data.
• It helps us shed light on the question of interest and answer it based on quantitative evidence.
• Statistical techniques are the standard ways of turning data from an accumulation of numbers into useful information.

Statistics in Real Life
• In real life, almost everyone deals with data.
• Corporate presidents, marketing managers, medical and drug researchers, social scientists, and biologists use statistical techniques to summarize and present data.

Variability in Data
• Not all numbers are the same.
• No group of people is all the same weight.
• The manufactured parts in a batch are not all identical.
• The students taking an exam do not all score equal marks. They differ from each other, and we need to know how much they differ.
• Variability, a key concept of statistics, lets us answer this question.
• We use measures such as the standard deviation, variance, and range to quantify this not-the-sameness.
• Such metrics are very useful for comparing groups.

Uncertainty in Data
• Because of variability, data always contain uncertainty.
• Both the selection of items to be measured and the measurement process may introduce uncertainty.
• Statistical tools allow us to draw general conclusions from the data despite this uncertainty.
• The appropriate data analysis depends on the way the data were gathered.
• We need probability models to describe the inherent uncertainty in the data.

Mathematics vs Statistics
• Even though statistics uses many mathematical tools such as algebra, calculus, and linear algebra, statistics is not purely mathematics.
• Mathematical problems are well defined and have specific answers, but statistical problems such as data interpretation are not well defined.
• The same decision problem can have different answers, depending on the assumptions made about the data and how they were collected.

Relationships in Data

Observed Variables
• Suppose we have two variables that we have observed and some additional variables that we have not observed.
• The two observed variables, 𝑋 and 𝑌, appear to have a relationship.
• If higher values of 𝑋 tend to occur with higher values of 𝑌, and lower values of 𝑋 with lower values of 𝑌, we say that the relationship between the variables is positive.
• If higher values of 𝑋 tend to occur with lower values of 𝑌, we say that the relationship is negative.

Causal Association
• Now, we would like to determine the reasons behind the association.
• There could be several possible explanations.
• One explanation is that the association is causal.
• This means that 𝑋 might be the cause of 𝑌, or vice versa.

Lurking Variable
• There might be an unidentified third variable 𝑍 that has a causal effect on both 𝑋 and 𝑌.
• In this case, 𝑋 and 𝑌 are not in a direct causal relationship.
• 𝑋 and 𝑌 are associated because each has its own causal relationship with 𝑍, which is known as the lurking variable.
• The relationship between the two variables is then due to a third variable, the lurking variable, that is hiding in the background and affecting the data.

Confounded Variables
• Another explanation is that the association is due to both a causal effect and a lurking variable.
• In certain situations, both a causal effect and a lurking variable may contribute to the association.
• Variables whose association combines a causal effect with the effect of a lurking variable are called confounded variables.

Reasons for Association
• For the statistical analysis, we need to determine the possible reasons for the association.
• If the reason is a causal effect, our next goal should be to determine the size of the effect.
• If the reason is a causal effect confounded with the effect of a lurking variable, our next goal should be to determine the sizes of both effects.

Cause and Effect Relationship
• The scientific method searches for cause and effect relationships between two variables: the experimental variable and the outcome variable.
• The aim is to learn how changes in the experimental variable affect the outcome variable.
• To establish the relationship, scientific modeling develops mathematical models.
• In the modeling process, we need to isolate the experiment from outside factors.
• In physics and chemistry, the small number of outside factors can usually be identified and controlled.
• Thus, lurking variables can largely be ruled out.
• However, in biology, medicine, engineering, technology, and the social sciences, we cannot identify and control all external factors.
• Hence, we need a different way to control such outside factors.

Role of Statistics in Scientific Methods

Variability in Data
• Statistical methods can be used when there is variability in the data.
• Probability models can be used in situations where the relevant outside factors cannot be identified.
• Factors that cannot be identified cannot be controlled directly, yet they can still affect the data. This lack of direct control can lead to a wrong interpretation of the experimental results and data.

Randomization
• The statistical idea of randomization was developed to deal with uncontrolled factors.
• In the statistical theory of the design of experiments, randomization means randomly allocating the experimental units across the treatment groups.
• In the randomization process, each participant has the same chance of being assigned to either the intervention or the control group.
• For example, if we want to compare the effects of a new drug with those of a standard drug, we should randomly assign each patient to either the new drug or the standard drug.
• Such randomization reduces the confounding effect.
• It averages out the effect of confounding variables and helps us control external factors statistically.
• With randomization, we can use the probability model to measure the amount of uncertainty that remains.

Population and Sample
• The concept of the statistical population is very significant.
• The statistical population is the entire group of objects or people under study.
• It contains all the possible values of the observations.
• We can prepare a data set that consists of observations taken from a sample of the population.
• In order to infer the population parameters from the sample statistics, the sample must be representative of the population.
• To get a representative sample, we must choose the sample randomly.

Current Applications of Statistics

Manufacturing
• A well-known lightbulb manufacturer produces approximately 500,000 bulbs per day. To control quality, the defect rate of the bulbs needs to be checked. One option would be to test every bulb, but that would be very costly. An alternative approach is to select 1,000 bulbs from the daily production of 500,000 and test each of those 1,000 bulbs. The defect rate of the complete daily production can then be estimated from the defect rate of the sample.

Social Science
• An investigator wants to find out whether individuals who quit smoking subsequently gain weight. The investigator draws a random sample of 200 individuals who had participated in programs to quit smoking. Each individual in the sample was weighed at the beginning of the experiment and again after a gap of one year. The average increase in weight of the participants was 5 pounds. Based on this evidence, the researcher concluded that the claim was valid.
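
The sampling idea in the manufacturing example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the true defect rate of 2% is hypothetical (the slides give no defect rate), the day's production is simulated, and the function name `estimate_defect_rate` is invented for this sketch. It draws a simple random sample of 1,000 bulbs from 500,000, estimates the defect rate, and attaches a rough 95% interval based on the normal approximation.

```python
import math
import random


def estimate_defect_rate(production, sample_size, rng):
    """Estimate a batch's defect rate from a simple random sample.

    `production` is a list of booleans (True = defective bulb).
    Returns the sample defect rate and its approximate standard error.
    """
    # Simple random sample without replacement: every bulb is equally likely
    # to be chosen, which is what makes the sample representative.
    sample = rng.sample(production, sample_size)
    p_hat = sum(sample) / sample_size

    # Standard error of a proportion, with a finite population correction
    # because we sample without replacement from a finite batch.
    n_total = len(production)
    fpc = math.sqrt((n_total - sample_size) / (n_total - 1))
    std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size) * fpc
    return p_hat, std_err


rng = random.Random(42)   # fixed seed so the sketch is reproducible
TRUE_RATE = 0.02          # hypothetical value, not taken from the slides
DAILY_PRODUCTION = 500_000
SAMPLE_SIZE = 1_000

# Simulated day of production; in practice these would be real test results.
production = [rng.random() < TRUE_RATE for _ in range(DAILY_PRODUCTION)]

p_hat, std_err = estimate_defect_rate(production, SAMPLE_SIZE, rng)

# Rough 95% interval using the normal approximation.
low = max(0.0, p_hat - 1.96 * std_err)
high = min(1.0, p_hat + 1.96 * std_err)

print(f"Estimated defect rate: {p_hat:.3%}")
print(f"Approximate 95% interval: {low:.3%} to {high:.3%}")
```

Testing 1,000 bulbs instead of 500,000 gives an estimate with a quantified margin of uncertainty rather than an exact answer, which is the trade-off the example describes.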

Drug Discovery
• Statisticians play an important role in the discovery of a new pharmaceutical drug. After the candidate drug passes the initial tests and proves effective in alleviating a condition, it needs to be checked for other parameters such as side effects and interactions with other drugs. In the next stage, statisticians help find the optimum quantity and spacing of doses of the new drug. After thorough and successful testing in the lab, the drug is tested on human subjects.

Politics
• One application of statistics is in the opinion and exit polls conducted by various agencies and media houses, which use them to predict how voters are going to vote in upcoming elections. They cannot contact every voter in every constituency. Instead, they sample the opinions of a smaller number of voters in each constituency to estimate the vote share each political party may get. In this way, they are able to estimate the number of votes each party may get at the constituency, state, or country level.

Thanks
Samatrix Consulting Pvt Ltd