Theory and Formula
oN" 6 B ® su PPOs ke © We take two positive numbers 4 and 16. Arithmetic mean = 4416 . 20 = 10 ae shows that A> G>H. Lhe equality sign holds only i ou : Js only iffall the mumbers Xj, Xap «++» Xy are identical. Gi) Taking the same figures 4 and 16 oe G? = (8) = 64 AH = Arith, Mean x Harmonic Mean 10x64 = 64 Hence, G’=AH Arithmetic mean ‘A measure of central tendency calculated by dividing the sum of observations by the number of observations in the data set. Bimodal distribution A distribution that has two modes. : Fractiles that divide the data into ten equal parts. ‘A measure of central tendency computed by taking the nt root of the product of n observations. The reciprocal of the arithmetic mean of the reciprocals of indi- vidual observations. Lower quartile or first The value in a ranked data set such that one-fourth of the measur ments are below this value and three-fourths are above it. Deciles Geometric mean Harmonic mean quartile Measures of central Measures that describe the centre of a distribution. The me tendency median and mode are three measures of central tendency. Median The value of the middle item in a data set arranged in an ascending ‘or a descending order. It divides the data set into two equal parts ‘Mode The value that has the maximum frequency in the data set. Multimodal distribution ‘A distribution that has more than two modes. . , ‘Values that are either very small or very large as compared (0 Outliers majority of the values in a data set, 5 Fractiles that divide a ranked data set into hundred equal par Percentiles It the square root of the arithmetic mean of the squares: Quadratic mean Fractiles that divide a ranked dataset into four equal pats OuartilesMeasures of Central Tendency 117 It is the same as the median that divides a ranked data set into two equal parts. A distribution that has only one mode, Third of the three quartiles that divide a ranked data set into four rartile equal parts, About three-fourths of the values in a data set are smaller than the value of the third quartile and about one-fourth above it. ‘An average in which each item in the data is weighted depending on its importance in the total series. Second quartile Pnimodal per quartile or third LIST OF FORMULAE =k |. Population arithmetic mean of individual observations: 1 Sample arithmetic mean of individual observations: xe a : 3. Sample arithmetic mean of a discrete series: . Sample arithmetic mean of grouped data: where m stands for mid-points. Sample arithmetic mean by the short-cut method: ¥= At Bi where 4 stands for the arbitrary mean and d stands for the deviation from arbitrary mean. where d’ stands for the deviations divided by the common factor C, which is used to simplify calculations. Sample arithmetic mean by the step-deviation method: ¥= A+ Weighted mean: %, = a where w stands for the weight. mx + % Combined mean of two serjes: ¥ nm +My This can be generalised for any number of series. Geometric mean of individual observations; GM = 4x, +42 ° log x, + log x2 tothe) = Anta ; 7 . Geometric mean of a frequency distribution: GM = Antilog (ee) , Where .x is mid-point—_ & Business Statistics H . n ‘atmonic mean for grouped data: HM = Tia) ae “yh Median in a data array: M = Size of (“4) item where nis the number of items in the data array. Where the series consists of an even num, items, the median is the average of the two middle items. 1-4 et of Median in a grouped series: M = + (m-c) Where m is the size of the middle item, ie. 
GLOSSARY

Arithmetic mean: A measure of central tendency calculated by dividing the sum of observations by the number of observations in the data set.
Bimodal distribution: A distribution that has two modes.
Deciles: Fractiles that divide the data into ten equal parts.
Geometric mean: A measure of central tendency computed by taking the nth root of the product of n observations.
Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of individual observations.
Lower quartile or first quartile: The value in a ranked data set such that one-fourth of the measurements are below this value and three-fourths are above it.
Measures of central tendency: Measures that describe the centre of a distribution. The mean, median and mode are three measures of central tendency.
Median: The value of the middle item in a data set arranged in an ascending or a descending order. It divides the data set into two equal parts.
Mode: The value that has the maximum frequency in the data set.
Multimodal distribution: A distribution that has more than two modes.
Outliers: Values that are either very small or very large as compared to the majority of the values in a data set.
Percentiles: Fractiles that divide a ranked data set into hundred equal parts.
Quadratic mean: The square root of the arithmetic mean of the squares.
Quartiles: Fractiles that divide a ranked data set into four equal parts.
Second quartile: It is the same as the median, which divides a ranked data set into two equal parts.
Unimodal distribution: A distribution that has only one mode.
Upper quartile or third quartile: The third of the three quartiles that divide a ranked data set into four equal parts. About three-fourths of the values in a data set are smaller than the value of the third quartile and about one-fourth are above it.
Weighted mean: An average in which each item in the data is weighted depending on its importance in the total series.

LIST OF FORMULAE

1. Population arithmetic mean of individual observations: $\mu = \frac{\sum x}{N}$
2. Sample arithmetic mean of individual observations: $\bar{x} = \frac{\sum x}{n}$
3. Sample arithmetic mean of a discrete series: $\bar{x} = \frac{\sum fx}{n}$, where f is the frequency and $n = \sum f$.
4. Sample arithmetic mean of grouped data: $\bar{x} = \frac{\sum fm}{n}$, where m stands for the mid-points of the classes.
5. Sample arithmetic mean by the short-cut method: $\bar{x} = A + \frac{\sum d}{n}$, where A stands for the arbitrary mean and d for the deviation from the arbitrary mean.
6. Sample arithmetic mean by the step-deviation method: $\bar{x} = A + \frac{\sum d'}{n} \times C$, where d' stands for the deviations divided by the common factor C, which is used to simplify calculations.
7. Weighted mean: $\bar{x}_w = \frac{\sum wx}{\sum w}$, where w stands for the weight.
8. Combined mean of two series: $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$. This can be generalised for any number of series.
9. Geometric mean of individual observations: $GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \text{Antilog}\left(\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right)$
10. Geometric mean of a frequency distribution: $GM = \text{Antilog}\left(\frac{\sum f \log x}{n}\right)$, where x is the mid-point.
11. Harmonic mean for grouped data: $HM = \frac{n}{\sum (f/x)}$
12. Median in a data array: M = size of the $\frac{n+1}{2}$th item, where n is the number of items in the data array. Where the series consists of an even number of items, the median is the average of the two middle items.
13. Median in a grouped series: $M = l_1 + \frac{l_2 - l_1}{f}(m - c)$, where m is the size of the middle item, i.e. the $\frac{n+1}{2}$th item, $l_2$ and $l_1$ are respectively the upper and lower limits of the class in which the median lies, f is the frequency of the class in which the median lies, and c is the cumulative frequency of the class preceding the one in which the median lies.
14. Lower quartile: $Q_1 = l_1 + \frac{l_2 - l_1}{f}\left(\frac{n}{4} - c\right)$
15. Upper quartile: $Q_3 = l_1 + \frac{l_2 - l_1}{f}\left(\frac{3n}{4} - c\right)$
16. Decile, say the 2nd decile: $D_2 = l_1 + \frac{l_2 - l_1}{f}\left(\frac{2n}{10} - c\right)$
17. Percentile, say the 10th percentile: $P_{10} = l_1 + \frac{l_2 - l_1}{f}\left(\frac{10n}{100} - c\right)$
18. Mode: $Mo = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i$, where $f_1$ = frequency of the class in which the mode lies, $f_0$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class succeeding the modal class, and i = class-interval.
19. Compound interest formula: $P_n = P_0(1 + r)^n$, where $P_n$ = value of the investment at the end of the nth year, $P_0$ = initial investment, r = annual rate of interest and n = number of years.
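The interpolation formulas for the median and the mode of a grouped series, and the compound-interest formula above, translate directly into code. The sketch below assumes a small illustrative frequency table; the class limits, frequencies and investment figures are made up for the example.

```python
# Illustrative grouped frequency table: classes 0-10, 10-20, 20-30, 30-40
lower = [0, 10, 20, 30]           # lower class limits l1
upper = [10, 20, 30, 40]          # upper class limits l2
freq  = [5, 12, 18, 7]            # class frequencies f
n = sum(freq)                     # total number of observations

# Median: M = l1 + (l2 - l1)/f * (m - c), with m = (n + 1)/2
m = (n + 1) / 2
cum, k = 0, 0
while cum + freq[k] < m:          # locate the median class
    cum += freq[k]
    k += 1
median = lower[k] + (upper[k] - lower[k]) / freq[k] * (m - cum)

# Mode: Mo = l1 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i
k = freq.index(max(freq))         # modal class (not the first or last class here)
f1, f0, f2 = freq[k], freq[k - 1], freq[k + 1]
i = upper[k] - lower[k]
mode = lower[k] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i

print(round(median, 2), round(mode, 2))

# Compound interest: Pn = P0 * (1 + r)^n
P0, r, years = 10000, 0.08, 5
print(round(P0 * (1 + r) ** years, 2))
```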
Measures of Dispersion

[A worked example carried over from the preceding page gives a series of five individual observations and asks that the numbers be converted into standard scores; the figures are not legible here.]

GLOSSARY

Coefficient of variation: A measure of relative variability that expresses the standard deviation as a percentage of the mean.
Dispersion: The spread or variability in a set of data.
Interquartile range: The difference between the values of the first and the third quartiles.
Mean deviation: A measure of dispersion that gives the average absolute difference (i.e. ignoring plus and minus signs) between each item and the mean.
Measures of dispersion: Measures that give the spread of a distribution.
Quartile deviation or semi-interquartile range: A measure of dispersion that is obtained by dividing the difference between the upper and the lower quartiles by two.
Range: The difference between the largest and the smallest values in a data set.
Standard deviation: The square root of the variance in a series. It shows how the data are spread out.
Standard score: The transformation of an observation by subtracting the mean and then dividing by the standard deviation. Thus, an observation is expressed in standard deviation units above or below the mean.
Standardised variable: A variable that expresses the x value of interest in terms of the number of standard deviations it is away from (that is, above or below) the mean. It is also known as the standardised normal variable.
Statistic: A summary measure calculated for sample data.
Variance: The average squared deviation between the mean and each item in a series.

LIST OF FORMULAE

1. Range = L − S, where L = value of the largest item and S = value of the smallest item.
2. Coefficient of range = $\frac{L - S}{L + S}$
3. Interquartile range = $Q_3 - Q_1$, where $Q_3$ and $Q_1$ are the upper and lower quartiles, respectively.
4. Semi-interquartile range or quartile deviation = $\frac{Q_3 - Q_1}{2}$
5. Coefficient of semi-interquartile range or quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
6. Mean deviation = $\frac{\sum |x|}{N}$, where |x| stands for deviations from the mean ignoring plus and minus signs.
7. Mean deviation in a grouped frequency distribution = $\frac{\sum f|d|}{N}$, where |d| stands for deviations from the mean ignoring plus and minus signs.
8. Variance of x: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
9. Standard deviation of ungrouped data: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{\sum d^2}{N}}$, where d = deviation from the mean.
10. Standard deviation in a grouped series: $\sigma = \sqrt{\frac{\sum fd^2}{N}}$, where $d = x - \mu$.
11. Standard deviation using the arbitrary mean: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where $d = x - A$ (A being the arbitrary mean).
12. Standard deviation by the step-deviation method: $\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times C$, where d' stands for deviations divided by C, the class-interval. C is used to simplify the calculations.
13. Combined standard deviation of two series: $\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$, where $\sigma_1$ = standard deviation of the first group and $\sigma_2$ = standard deviation of the second group; $n_1$ and $n_2$ are the numbers of observations in group 1 and group 2, respectively; $d_1 = \bar{x}_1 - \bar{x}_{12}$ and $d_2 = \bar{x}_2 - \bar{x}_{12}$.
14. Coefficient of variation: $CV = \frac{\sigma}{\mu}(100)\%$. A relative measure of dispersion, which is free from the unit of measurement. It enables us to compare two or more distributions having different units of measurement.
15. Standardised variable: $Z = \frac{x - \mu}{\sigma}$. This is the standard score of an observation x in which we are interested. It shows the number of standard deviations the observation lies below or above the mean.

Skewness, Moments and Kurtosis

GLOSSARY

Bowley's measure of skewness: A measure of skewness based on quartile values. It varies between ±1.
Karl Pearson's measure of skewness: The difference between the mean and the mode divided by the standard deviation of a given data set.
Kelly's measure of skewness: A measure of skewness based on percentiles.
Kurtosis: The degree of 'peakedness' or 'flatness' of a frequency polygon.
Leptokurtic curve: A distribution in which most of the observations are concentrated near the mode and in the tails.
Mesokurtic curve: A distribution that is less peaked than a leptokurtic curve.
Moments: A concept that indicates different aspects of a given distribution. By using moments, we can measure the central tendency of a series, its dispersion or variability, its skewness and the peakedness of the curve.
Negative skewness: When more observations lie to the right of the mean, the longer tail of the distribution extends to the left.
Platykurtic curve: A 'flat' distribution, like a table or plateau.
Positive skewness: When more observations lie to the left of the mean, the longer tail of the distribution extends to the right.
Skewness: The extent of non-symmetry or 'lop-sidedness' of a distribution.
Symmetrical curve: A 'bell-shaped' curve.

LIST OF FORMULAE

1. Coefficient of skewness (Karl Pearson's formula): (i) $\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$; (ii) $\frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$. The second formula is to be used when the mode is ill-defined.
2. Skewness (Bowley's measure): $Sk_B = (Q_3 - Q_2) - (Q_2 - Q_1) = Q_3 + Q_1 - 2Q_2$, where $Q_1$ = lower quartile and $Q_3$ = upper quartile.
3. Coefficient of skewness (Bowley): $Sk_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$, where M is the median.
4. Kelly's coefficient of skewness (based on percentiles): $Sk = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$, where $P_{90}$ = value of the 90th percentile, $P_{10}$ = value of the 10th percentile and $P_{50}$ = value of the median.
5. Coefficient of skewness based on moments: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
6. Moments with the mean as the origin: $\mu_1 = \frac{\sum (x - \bar{x})}{N}$, $\mu_2 = \frac{\sum (x - \bar{x})^2}{N}$, $\mu_3 = \frac{\sum (x - \bar{x})^3}{N}$, $\mu_4 = \frac{\sum (x - \bar{x})^4}{N}$
7. Coefficient of kurtosis: $\beta_2 = \frac{\mu_4}{\mu_2^2}$, where $\beta_2 = 3$ for a mesokurtic distribution, $\beta_2 > 3$ for a leptokurtic distribution and $\beta_2 < 3$ for a platykurtic distribution.
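For ungrouped data, the dispersion and shape measures above reduce to a few lines of arithmetic. A minimal sketch follows; the sample data are invented for illustration and, as in the formula list above, the population-style divisor N is used.

```python
from math import sqrt

xs = [6, 7, 9, 12, 14, 14, 15, 19]
N = len(xs)
mean = sum(xs) / N

# Central moments about the mean (divisor N, as in the formula list)
m2 = sum((x - mean) ** 2 for x in xs) / N          # variance
m3 = sum((x - mean) ** 3 for x in xs) / N
m4 = sum((x - mean) ** 4 for x in xs) / N
sd = sqrt(m2)

cv = sd / mean * 100                               # coefficient of variation (%)

# Karl Pearson's coefficient of skewness, 3(mean - median)/sd form
xs_sorted = sorted(xs)
median = (xs_sorted[N // 2 - 1] + xs_sorted[N // 2]) / 2 if N % 2 == 0 else xs_sorted[N // 2]
sk_pearson = 3 * (mean - median) / sd

beta1 = m3 ** 2 / m2 ** 3                          # moment measure of skewness
beta2 = m4 / m2 ** 2                               # kurtosis; 3 for a mesokurtic curve

print(round(sd, 3), round(cv, 2), round(sk_pearson, 3), round(beta1, 3), round(beta2, 3))
```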
Probability

As this is a case of non-mutually exclusive events, the following formula is used:
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = \frac{55}{250} + \frac{150}{250} - \frac{30}{250} = \frac{55 + 150 - 30}{250} = \frac{175}{250} = 0.70$

Example: Five out of 100 items produced on machine A and one out of 100 items produced on machine B are found to be defective. An item drawn at random from the items produced by A and B is found to be defective. What is the probability that this item has been made on machine A, assuming both machines produced an equal number of items?

Solution: This problem can be worked out using Bayes' theorem. The necessary calculations are shown in the following table.

Event        Prior P(Ei)    Conditional P(A/Ei)    Joint P(EiA)    Posterior P(Ei/A)
Machine A    0.5            0.05                   0.025           0.83
Machine B    0.5            0.01                   0.005           0.17
                                                   0.030           1.00

On the basis of the above table, the probability that the defective item has been made on machine A is 0.83.
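The same posterior probabilities can be reproduced with a short Bayes' theorem calculation; this is only a sketch of the table above, with the prior and conditional probabilities taken from the example.

```python
# Priors: both machines produce an equal number of items
prior = {"A": 0.5, "B": 0.5}
# Conditional probabilities of a defective item, given the machine
p_def_given = {"A": 0.05, "B": 0.01}

# Joint probabilities P(machine and defective)
joint = {m: prior[m] * p_def_given[m] for m in prior}
p_def = sum(joint.values())                     # total probability of a defective item

posterior = {m: joint[m] / p_def for m in joint}
print(round(posterior["A"], 2), round(posterior["B"], 2))   # 0.83 0.17
```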
GLOSSARY

A priori probability: A probability estimate made prior to receiving new information.
Bayes' theorem: A rule that is used for revising the probabilities of events after having obtained more information.
Classical probability rule: The method of assigning probabilities to outcomes or events of an experiment with equally likely outcomes.
Collectively exhaustive events: A list of events that represents all the possible outcomes of an experiment.
Compound (or composite) event: An event that contains more than one outcome of an experiment.
Conditional probability: The probability of event B occurring, given that event A has occurred.
Dependent events: When the occurrence of one event affects the probability of the occurrence of the other, the two events are said to be dependent events.
Event: One of the possible outcomes of an experiment.
Experiment: A process that results in an event.
Independent events: Two events for which the occurrence of one does not change the probability of the occurrence of the other.
Joint probability: The probability of two or more events occurring together or in succession.
Marginal probability: The probability of one event without consideration of any other event.
Mutually exclusive events: Two or more events that cannot occur together.
Outcome: The result of the performance of an experiment.
Posterior probability: A probability that has been revised on the basis of new information that has become available.
Probability: A numerical measure of the likelihood that a specific event will occur.
Relative frequency of occurrence: The proportion of times that an event occurs in a very large number of trials.
Sample space: The set of all sample points or outcomes of an experiment.
Statistical dependence: The condition when the probability of a certain event is dependent on the occurrence of some other event.
Statistical independence: The condition when the occurrence of one event does not have any effect on the occurrence of another event.
Subjective probability: The probability assigned to an event by a person on the basis of his judgment as well as the information available with him.
Tree diagram: A diagram in which each outcome of an experiment is represented by a branch of a tree.
Venn diagram: A diagram showing the sample space in the form of a rectangle and events as portions of that rectangle.

LIST OF FORMULAE

1. Probability of event A = P(A). A single probability refers to the probability of one particular event and is called the marginal probability.
2. $P(A) \ge 0$. The probability of any event (in this case event A) cannot be negative.
3. $P(A) + P(B) + P(C) + \cdots = 1$, where A, B, C, … are mutually exclusive and collectively exhaustive events. The sum of the probabilities of all possible mutually exclusive events is unity.
4. Relative frequency probability: $P(A) = \frac{a}{n}$, where a is the number of times the event A (say, getting a head) occurs and n is the number of times the experiment is performed.
5. P(A or B) = P(A) + P(B). The probability of either of two mutually exclusive events is the sum of their probabilities.
6. P(A or B) = P(A) + P(B) − P(AB). When the events are not mutually exclusive, the probability of either A or B is the sum of the two probabilities minus the probability of A and B appearing together.
7. P(AB) = P(A) × P(B), where P(AB) is the joint probability of events A and B, P(A) is the marginal probability of event A and P(B) is the marginal probability of event B. The probability of two independent events occurring together or in succession is the product of their probabilities.
8. P(B/A) = P(B). In the case of independent events, the conditional probability of event B, given the occurrence of event A, is simply the probability of event B.
9. P(B/A) = P(AB)/P(A) and P(A/B) = P(AB)/P(B). In the case of statistically dependent events, the conditional probability of event B, given the occurrence of event A, is equal to the joint probability of events A and B divided by the marginal probability of event A. The same rule applies to the second term, P(A/B).
10. P(AB) = P(A/B) × P(B) and P(BA) = P(B/A) × P(A). In the case of statistically dependent events, the joint probability of events A and B is equal to the conditional probability of one event, given the other, multiplied by the marginal probability of the other event.
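The addition, multiplication and conditional-probability rules in the list above can be checked numerically; the marginal and joint probabilities used below are arbitrary illustrations.

```python
# Illustrative marginal and joint probabilities
P_A, P_B = 0.40, 0.30
P_A_and_B = 0.12            # here P(AB) = P(A) * P(B), so A and B are independent

# Addition rule for events that are not mutually exclusive
P_A_or_B = P_A + P_B - P_A_and_B
print(round(P_A_or_B, 2))                              # 0.58

# Conditional probabilities
P_B_given_A = P_A_and_B / P_A
P_A_given_B = P_A_and_B / P_B
print(round(P_B_given_A, 2), round(P_A_given_B, 2))    # 0.3 0.4 -> equal to the marginals, hence independent

# Multiplication rule recovers the joint probability
print(round(P_A_given_B * P_B, 2), round(P_B_given_A * P_A, 2))   # 0.12 0.12
```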
9.1 Given below are ten statements. Indicate in each case whether it is true or false.
(a) If one event is not affected by the outcome of another event, the two events are said to be mutually exclusive.
(b) An unconditional probability is also known as a marginal probability.
(c) The sample space is a set of all possible outcomes of an experiment.
(d) In the classical approach to probability, one can state the outcome of an event.
(e) The relative frequency of occurrence approach offers the greatest flexibility in calculating the probability of an event.
(f) If P(A/B) = P(B), then A and B are said to be statistically independent events.
(g) P(AB) is used to denote a marginal probability.
(h) A subjective probability is just an intelligent guess regarding the occurrence of an event.
(i) When two events are not mutually exclusive, P(A or B) is the summation of P(A) and P(B).
(j) The probability of two or more independent events occurring together is the … of their marginal probabilities.

Probability Distributions

GLOSSARY

Bernoulli trial: One repetition of a binomial experiment; also called a trial.
Binomial distribution: The probability distribution that gives the probability of x successes in n trials, when the probability of success is p for each trial of a binomial experiment.
Continuity correction factor: The addition of 0.5 to, and subtraction of 0.5 from, the x value, where x is the number of successes in n trials. It is a method of converting a discrete variable into a continuous variable.
Continuous probability distribution: A probability distribution in which the variable can take on any value within a given range.
Continuous random variable: A random variable that can assume any value within a given range.
Discrete probability distribution: A probability distribution in which the variable takes on only a limited number of values that can be listed.
Discrete random variable: A random variable that can take only a limited number of values, which are countable.
Normal distribution: A symmetrical distribution with a single peak that flattens out at the extremes. The two tails never touch the horizontal axis.
Poisson distribution: Like the binomial, but unlike the normal, it is a discrete probability distribution that gives the probability of x (successes) in an interval. It is appropriate when the probability of x is very small and n is large.
Probability distribution: A distribution of the probabilities associated with each of the values of a random variable. It is a theoretical distribution and is used to represent a population.
Random variable: A variable that assumes a unique numerical value for each of the outcomes in the sample space of a probability experiment.
Standard normal distribution: A normal probability distribution which has its mean as zero and its standard deviation as 1.

LIST OF FORMULAE

1. The binomial formula — probability of r successes in n Bernoulli trials: $P(r) = \frac{n!}{r!(n-r)!}\, p^r q^{n-r}$, where r = number of successes desired, n = number of trials, p = probability of success and q = probability of failure (q = 1 − p).
2. Mean of a binomial distribution: $\mu = np$ — the number of trials multiplied by the probability of success.
3. Standard deviation of a binomial distribution: $\sigma = \sqrt{npq}$ — the square root of the product of three terms: the number of trials, the probability of a success and the probability of a failure.
4. Poisson formula: $P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$. The probability of x occurrences is equal to λ raised to the power x, multiplied by e (which is equal to 2.71828) raised to the power −λ; the resulting numerator is divided by x factorial.
5. Poisson distribution as an approximation of the binomial: $P(x) = \frac{(np)^x e^{-np}}{x!}$. The mean of the Poisson distribution (λ) has been substituted by the mean of the binomial distribution (np). The approximation is good when $n \ge 20$ and $p \le 0.05$.
6. $z = \frac{x - \mu}{\sigma}$, where x = value of the random variable in which we are interested, μ = mean of the distribution of this variable, σ = its standard deviation and z = number of standard deviations from x to the mean of this distribution.
7. $z = \frac{x - np}{\sqrt{npq}}$ — normal approximation of the binomial distribution. Here μ has been substituted by np, the mean of the binomial distribution, and σ has been substituted by $\sqrt{npq}$, the standard deviation of the binomial distribution.

10.1 Given below are ten statements. Indicate in each case whether it is true or false.
(a) A distribution where the mean and the median have different values is not a normal distribution.
(b) The right and left tails of the normal curve always touch the horizontal axis.
(c) In a Bernoulli process, the probability of the outcome of any trial (toss) need not be fixed over time.
(d) In a Bernoulli process the trials must always be statistically independent.
(e) The standard deviation of a binomial distribution is $\sqrt{npq}$.
(f) The Poisson distribution is not a discrete probability distribution.
(g) The value of a random variable can be predicted in advance, even before the occurrence of an event.
(h) A binomial distribution need not be symmetrical when the probability of success in a Bernoulli trial is p = 0.5.
(i) A normal curve is bell-shaped and has a single peak.
(j) In the formula used in the Poisson distribution, the symbol λ stands for the mean.

Example (normal distribution):
(i) $z = \frac{820 - 800}{15} = \frac{20}{15} = 1.33$, which gives a tail area of 0.0918.
(ii) Again, $z = \frac{830 - 800}{15} = \frac{30}{15} = 2$; z = 2 gives a tail area of 0.0228, so 0.5 − 0.0228 = 0.4772. As this applies to each half of the normal curve, the required probability is 0.4772 + 0.4772 = 0.9544.
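The binomial, Poisson and normal formulas above can be evaluated with the standard library alone. The sketch below is illustrative: the binomial parameters n and p are invented, while the 800/15 figures are taken from the worked example.

```python
from math import comb, exp, factorial, sqrt, erf

def binomial_pmf(r, n, p):
    # n!/(r!(n-r)!) * p^r * q^(n-r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(x, lam):
    # lam^x * e^(-lam) / x!
    return lam ** x * exp(-lam) / factorial(x)

def normal_cdf(x, mu, sigma):
    # normal cumulative probability via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Poisson as an approximation of the binomial (n large, p small)
n, p, r = 200, 0.02, 3
print(round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, n * p), 4))

# Normal probability within +/- 30 of the mean 800 with sigma = 15 (z = +/- 2)
print(round(normal_cdf(830, 800, 15) - normal_cdf(770, 800, 15), 4))
# ~0.9545 here; the table-based value above is 0.4772 + 0.4772 = 0.9544
```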
Sampling and Sampling Distributions

GLOSSARY

Area sampling: A form of cluster sampling in which areas such as census tracts and blocks form the primary sampling units. The population is divided into mutually exclusive areas using maps; a random sample of the areas is then selected.
Census: A measurement of each element in a group or population of interest.
Central limit theorem: A theorem that states that as the sample size increases, the distribution of sample means tends to take the form of a normal distribution.
Cluster sampling: A sample design in which a cluster of elements is the primary sampling unit instead of individual elements in the population.
Convenience sample: A sample selected by the researcher on the basis of his convenience.
Finite population correction factor: A correction factor used while determining the sample size from a finite population. The usual practice is to apply it when the sample is more than 5 per cent of the population. It is also known as the Finite Population Multiplier.
Finite population: A population having a stated or limited size.
Infinite population: A population that is exceptionally large in size; as such, it is impossible to cover all the elements comprising it.
Judgment sample: A non-probability sample based on the judgment of the researcher, who thinks that the sample respondents thus selected would contribute to answering the research question.
Multistage sampling: A sample design in which a sample is drawn in two or more stages sequentially. The sampling unit in each stage tends to be different.
Non-sampling error: An error that occurs in the collection, recording, tabulation and computation of data.
Parameter: A quantity that remains constant in each case considered but varies in different cases.
Precision: The desired size of the confidence interval when a population parameter is to be estimated. The concept is also useful in determining sample size.
Quota sample: A non-probability sample that contains a pre-specified quota of certain characteristics of a population.
Random sample: A sample that assigns some chance to each element of the population of being selected in the sample. It is also known as a probability sample.
Representative sample: A sample that represents the characteristics of the population as closely as possible.
Sample: A subset or some part of a population.
Sampling distribution: For a given population, a probability distribution of all the possible values that a statistic may take on for a given sample size.
Sampling distribution of the mean: The probability distribution of all the values of the mean calculated from all possible samples of the same size selected from a population.
Sampling error: The difference between the population parameter and the observed probability sample statistic.
Sampling fraction: The proportion of the number of elements included in a sample to the total number of elements contained in a population.
Sampling with replacement: A sampling procedure in which sample items are returned to the population; as a result, there is a possibility of their being chosen again in the sample.
Sampling without replacement: A sampling procedure in which sample items are not returned to the population; as a result, none of these can be selected in the sample again.
Simple random sample: A probability sampling procedure where each element of the population has an equal chance of being selected.
Standard error of the mean: The standard deviation of the sampling distribution of the mean. It is calculated by dividing the population standard deviation by the square root of the sample size.
Standard error: The standard deviation of the sampling distribution of a statistic.
Statistic: A measure or characteristic of a sample.
Statistical inference: The process of deriving conclusions about the population using the information contained in the sample.
Strata: Groups within a population formed in such a way that each group is relatively homogeneous, but wider variability exists between the separate groups.
Stratified sampling: A probability sampling method in which sub-samples are drawn from two or more strata comprising the population. Each stratum is more or less homogeneous.
Systematic sampling: A sampling method in which a sample is drawn in such a way that it is systematically spread over all the elements of the population.
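The standard error of the mean and the finite population correction, both given in the formula list that follows, can be sketched in a few lines; the population figures below are purely illustrative.

```python
from math import sqrt

sigma = 12.0       # population standard deviation (illustrative)
N = 1000           # population size
n = 100            # sample size

se_infinite = sigma / sqrt(n)                 # sigma_xbar for an infinite population
fpc = sqrt((N - n) / (N - 1))                 # finite population correction factor
se_finite = se_infinite * fpc                 # sigma_xbar for a finite population

print(round(se_infinite, 3), round(fpc, 4), round(se_finite, 3))

# The correction matters here because the sampling fraction n/N = 0.10 exceeds 0.05
print(n / N)
```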
LIST OF FORMULAE

1. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where $\sigma_{\bar{x}}$ = standard error of the sample mean, σ = standard deviation of the population and n = number of elements or units in the sample.
2. Standard error of the sample mean when the population is finite: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$, where N = size of the population and n = number of elements or units in the sample.
3. Finite population correction factor = $\sqrt{\frac{N - n}{N - 1}}$. When the sampling fraction n/N is less than 0.05, i.e. when the sample is small in relation to the population, this multiplier need not be used.
4. The Z value for $\bar{x}$: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, where $\bar{x}$ = sample mean, $\sigma_{\bar{x}}$ = standard error of the sample mean and Z = number of standard errors from $\bar{x}$ to the population mean. Once the Z value has been obtained, the standard normal probability distribution table (see Appendix Table 1) can be used; the table is organised in terms of standard units or Z values.
5. Sample proportion: $\bar{p} = \frac{x}{n}$, where x = number of elements in the sample that possess a specific characteristic and n = number of elements or units in the sample.
6. Mean of the sample proportion: $\mu_{\bar{p}} = p$
7. Standard error of the sample proportion: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{pq}{n}}$, where q = 1 − p.
8. The Z value for the sample proportion $\bar{p}$: $Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{p(1 - p)/n}}$, where $\bar{p}$ = sample proportion, p = population proportion and $\sigma_{\bar{p}}$ = standard error of the sample proportion.

11.1 Given below are twelve statements. Indicate in each case whether the statement is true or false:
(a) A parameter is a characteristic of a sample.
(b) In a random sample every element in the population has an equal chance of being selected.
(c) A cluster sample is a non-random sample.
(d) The standard error is different from the standard deviation of the distribution of sample means.
(e) A stratified random sampling is one where the population is divided into mutually exclusive and mutually exhaustive strata.
(f) As the sample size n increases, the standard error $\sigma_{\bar{x}}$ does not necessarily decrease.
(g) Judgment sampling is not a representative sample.
(h) The standard error of the mean $\sigma_{\bar{x}}$ decreases in direct proportion to the sample size n.
(i) The proportion of the sample size to the population size is known as the sampling fraction.
(j) A theoretical sampling distribution implies that all the samples of a given size are considered.
(k) The precision of a sample depends on the proportion of the population sampled.
(l) If n is relatively very small as compared to N, then the finite population correction factor need not be used.
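A short sketch of the sample-proportion formulas above; the population proportion, sample size and sample count are illustrative.

```python
from math import sqrt

p = 0.40            # population proportion (illustrative)
n = 250             # sample size
x = 112             # number of sample elements with the characteristic

p_bar = x / n                              # sample proportion
se_p = sqrt(p * (1 - p) / n)               # standard error of the sample proportion
z = (p_bar - p) / se_p                     # number of standard errors from p

print(round(p_bar, 3), round(se_p, 4), round(z, 2))
```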
Estimation

GLOSSARY

Confidence interval: A specified range of numbers within which a population parameter is likely to fall.
Confidence level: Denoted by (1 − α)100 per cent, a confidence level states how much confidence we have that the true population parameter lies within a confidence interval.
Confidence limits: The upper and lower boundaries of a confidence interval.
Consistent estimator: An estimator that gives values more closely approaching the population parameter as the sample size increases.
Degrees of freedom: The number of values in a sample that can be freely specified once something about the sample is known.
Efficient estimator: An estimator that has a smaller standard error as compared to some other estimator of the population parameter.
Estimate: The value of a sample statistic that is used to find a corresponding population parameter.
Estimation: A procedure for assigning a value or values to a population parameter based on the data collected from a sample.
Estimator: A sample statistic that is used to estimate a population parameter.
Interval estimate: The estimate of an interval in which an unknown population characteristic is expected to lie for a given level of significance.
Method of maximum likelihood: A method that provides estimators with desirable properties, such as efficiency, consistency and sufficiency. It usually does not give unbiased estimators.
Parameter: The numerical value of a summary measure in the population, such as the mean μ or the standard deviation σ.
Point estimate: The value of a sample statistic pertaining to the corresponding population parameter.
Student's t distribution: A probability distribution used when the population standard deviation is unknown and the sample size is n < 30.
Sufficient estimator: An estimator that uses all the available data pertaining to a parameter.
Unbiased estimator: When the expected value of the statistic used as an estimator is equal to the population parameter to be estimated, the estimator is said to be unbiased.

LIST OF FORMULAE

1. Point estimate of the population mean: the sample mean $\bar{x} = \frac{\sum x}{n}$.
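As an illustration of interval estimation, a 95 per cent confidence interval for the population mean can be computed as below. The sample figures and the use of the large-sample z value 1.96 are our assumptions, not the book's.

```python
from math import sqrt

x_bar = 52.3        # sample mean (illustrative)
s = 8.1             # sample standard deviation (illustrative)
n = 64              # sample size (large, so the z value is used)

se = s / sqrt(n)                    # estimated standard error of the mean
z = 1.96                            # z value for a 95% confidence level

lower, upper = x_bar - z * se, x_bar + z * se
print(round(lower, 2), round(upper, 2))    # 95% confidence limits
```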
Testing Hypotheses

However, there may be certain situations when it is difficult to specify alternatives to the null hypothesis, H0, that have practical importance. All the same, this difficulty does not take away from the utility of hypothesis tests. In a way, it suggests that we have to be very careful before rejecting the null hypothesis on the basis of insufficient information. If we feel that the data are insufficient, we should prefer to obtain additional data and then apply the test.

We should also note that our choice in favour of a one-tail or a two-tail test will depend on the alternative value of the parameter that we are trying to detect. Suppose that our parameter of interest is μ. In case we were to incur a heavy financial loss if μ were greater than μ0, but not otherwise, we would focus our attention on the detection of values of μ greater than μ0. In that case, we would use a right-tail test to reject the hypothesis. In case we are interested in detecting values of μ that are either greater than or less than μ0, we would use a two-tail test. By applying such reasoning, we can choose an appropriate test — a right-tail, a left-tail or a two-tail test.

Another point to note is that sometimes we may find that the assumptions upon which a test is based are not valid. In such a case, it would be wrong to use the tests discussed in this chapter. In order to overcome this problem, we may use an appropriate non-parametric test. Such tests are known as distribution-free tests and have few assumptions, if any. We shall discuss a major non-parametric test (chi-square) in Chapter 15 and a number of other non-parametric tests in Chapter 20.

GLOSSARY

Alpha (α): The significance level of a test of hypothesis; it denotes the probability of rejecting a null hypothesis when it is actually true. In other words, it is the probability of committing a Type I error.
Alternative hypothesis: A hypothesis that takes a value of a population parameter different from that used in the null hypothesis.
Beta (β): The probability of not rejecting a null hypothesis when it actually is false. In other words, it is the probability of committing a Type II error.
Critical region: The set of values of the test statistic that will cause us to reject the null hypothesis.
Critical value: The 'first' (or 'boundary') value in the critical region.
Decision rule: If the calculated test statistic falls within the critical region, the null hypothesis H0 is rejected. In contrast, if the calculated test statistic does not fall within the critical region, the null hypothesis is not rejected.
F-distribution: A continuous distribution that has two parameters (df for the numerator and df for the denominator). It is mainly used to test hypotheses concerning variances.
F-ratio: In ANOVA, it is the ratio of the between-column variance to the within-column variance.
Hypothesis: An unproven proposition or supposition that tentatively explains a phenomenon.
Null hypothesis: A statement about the status quo concerning a population parameter that is being tested.
One-tail test: A statistical hypothesis test in which the alternative hypothesis is specified such that only one direction of the possible values of a parameter is considered.
Power of the hypothesis test: The probability of rejecting the null hypothesis when it is false.
Significance level: The value of α that gives the probability of rejecting the null hypothesis when it is true. This gives rise to a Type I error.
Test criteria: Criteria consisting of (i) specifying a level of significance α, (ii) determining a test statistic, (iii) determining the critical region, and (iv) determining the critical value(s).
Test statistic: The value of Z or t calculated for a sample statistic such as the sample mean or the sample proportion.
Two-tail test: A statistical hypothesis test in which the alternative hypothesis is stated in such a way that it includes both the higher and the lower values of a parameter than the value specified in the null hypothesis.
Type I error: An error caused by rejecting a null hypothesis that is true.
Type II error: An error caused by failing to reject a null hypothesis that is not true.

LIST OF FORMULAE

1. The test statistic Z for $\bar{x}$ when σ is known: $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
2. The test statistic t for a small sample when σ is unknown and n < 30: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
3. The test statistic z for $\bar{p}$ (a proportion) in a large sample: $z = \frac{\bar{p} - p}{\sqrt{pq/n}}$, where $\bar{p}$ = sample proportion and p = population proportion.
4. z for a test concerning the difference between two population means: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$. If $\sigma_1$ and $\sigma_2$ are unknown, then $s_1$ and $s_2$ are used.
5. z for a test concerning the difference between two population proportions: $z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})(1/n_1 + 1/n_2)}}$. When $H_0: p_1 = p_2$, the test statistic z is $z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}}$, where the pooled sample proportion for the two samples is $\hat{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$ and $\hat{q} = 1 - \hat{p}$.
6. The power of a statistical test, 1 − β, is the ability of the test to perform as required. The greater the value of 1 − β, the better is the decision.
7. The test statistic t when paired observations are involved: $t = \frac{\bar{d}}{s_d/\sqrt{n}}$, where $d = x_1 - x_2$ and $s_d$ = estimate of the standard deviation of the differences.
8. F statistic = $s_1^2/s_2^2$, where $s_1^2$ = variance of sample 1 and $s_2^2$ = variance of sample 2.
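A sketch of the first three test statistics in the list above, computed for made-up sample figures:

```python
from math import sqrt

# (1) z test for the mean when sigma is known
x_bar, mu0, sigma, n = 204.5, 200.0, 16.0, 64
z_mean = (x_bar - mu0) / (sigma / sqrt(n))

# (2) t test for the mean when sigma is unknown and n < 30
x_bar, mu0, s, n = 12.9, 12.0, 2.1, 16
t_mean = (x_bar - mu0) / (s / sqrt(n))

# (3) z test for a proportion in a large sample
p_bar, p0, n = 0.56, 0.50, 400
z_prop = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)

print(round(z_mean, 2), round(t_mean, 2), round(z_prop, 2))
# Decision rule: reject H0 when the test statistic falls in the critical region,
# e.g. |z| > 1.96 for a two-tail test at the 0.05 level of significance.
```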
The critical value of χ² at the 0.05 level of significance for 12 df is 21.026. At the 0.01 level of significance, the critical value of χ² is 26.217. As the calculated value of χ² is less than the critical value at both the 0.05 and 0.01 levels of significance, H0 cannot be rejected at either level.

PRECAUTIONS ABOUT USING THE CHI-SQUARE TEST

In order to use a chi-square hypothesis test properly, one has to be extremely careful and keep in mind certain precautions. First, the sample size should be large enough. If the expected frequencies are too small, the value of χ² gets over-estimated. This will result in the rejection of the null hypothesis in several cases where it should not be rejected. To overcome this problem, we must ensure that the expected frequency in any cell of a contingency table is not less than 5. In case the expected frequency is below 5 in more than one cell, we can combine these cells to obtain an expected frequency of at least 5. Another point to note is that the calculations must be made with the actual numbers in each cell and not with proportions or percentages. If proportions or percentages were used, the theoretical distribution would not be applicable.

When the calculated value of χ² turns out to be more than the critical or theoretical value at a predetermined level of significance, we reject the null hypothesis. In contrast, when the χ² value is less than the critical or theoretical value, the null hypothesis is not rejected. However, when the χ² value turns out to be zero, we have to be extremely careful to confirm that there really is no difference between the observed and the expected frequencies. Such a situation may sometimes arise on account of a faulty manner of collecting the data.

In most cases, χ² problems involve simple calculations. However, for large sets of data the chi-square test involves very extensive calculations. In all such cases, a computer should be used; several statistical packages contain routines for carrying out chi-square tests.

GLOSSARY

Chi-square distribution: A distribution with the degrees of freedom as its only parameter. It is skewed to the right for small degrees of freedom, but when the degrees of freedom are large, it looks like a normal curve.
Chi-square test: A statistical technique used to test significance in the analysis of frequency distributions.
Contingency table: A table having rows and columns wherein each row corresponds to a level of one variable and each column to a level of another variable. The frequencies with which each variable combination has occurred are contained in the body of the table.
Degrees of freedom: The number of elements that can be chosen freely.
Expected frequencies: The frequencies for the different categories of a multinomial experiment, or for the different cells of a contingency table, that are expected to occur on the assumption that the given null hypothesis is true.
Goodness-of-fit test: A statistical test, involving the chi-square statistic, of how well an observed frequency distribution conforms to an expected pattern.
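The chi-square statistic for a small contingency table can be computed directly, as in the sketch below. The observed frequencies are invented, and the critical value quoted in the comment (3.841 for 1 df at the 0.05 level) is read from a chi-square table.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # expected frequency
            chi2 += (o - e) ** 2 / e
    return chi2

# Illustrative 2 x 2 table (all expected frequencies are comfortably above 5)
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square(observed)
df = (2 - 1) * (2 - 1)
print(round(chi2, 3), df)     # compare with the table value 3.841 at the 0.05 level for 1 df
```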
Correlation

LIST OF FORMULAE

1. Coefficient of correlation with original data: $r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$
2. Coefficient of correlation when deviations are taken from the means: $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
3. Coefficient of rank correlation between two ranked variables: $r_s = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$, where $r_s$ stands for Spearman's rank correlation, d for the difference between the ranks of the two variables and N for the number of paired observations.

17.1 Given below are fifteen statements. Indicate in each case whether the statement is true or false:
(a) An r² value close to zero is an indication of a strong relationship between X and Y.
(b) An r² value measures how strong the relationship between X and Y is, provided it is linear.
(c) If one variable is increasing while the other is declining, then there is an inverse correlation between the two variables.
(d) The coefficient of correlation must always be between zero and +1.
(e) Correlation indicates a causal relationship between the two variables.
(f) There is no difference between the coefficient of correlation and the coefficient of determination.
(g) A scatter diagram can give us a broad idea of whether the two variables are related.
(h) If r = 0.7, it represents 70 per cent of the total variation in Y.
(i) A spurious correlation indicates that the two variables are related, but in reality there is no common link between the two.
(j) If two series X and Y plotted on a graph move in opposite directions, then there is an absence of correlation.
(k) If one variable is constant in the two series X and Y, then the coefficient of correlation is zero.
(l) Correlation analysis is a method of obtaining the equation that represents the relationship between two variables.
(m) As the value of r decreases from its maximum value of 1, there is a sharp fall in the coefficient of determination.
(n) Spearman's r is a distribution-free measure of correlation.
(o) Rank correlation can be applied both to individual observations and to a grouped frequency distribution.

Multiple Choice Questions (17.2 to 17.12)

17.2 A scatter diagram is
(a) a statistical test  (b) linear  (c) curvilinear  (d) a graph
17.3 Which of the following correlation coefficients shows the highest degree of correlation?
(a) 0.9  (b) 0.95  (c) −0.89  (d) −1  (e) 1  (f) Both (d) and (e)
17.4 Which of the following correlation coefficients shows the lowest degree of correlation?
(a) −0.95  (b) 0.75  (c) 0.38  (d) …
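The product-moment and rank correlation formulas given at the start of this section can be checked with a small data set; the values below are arbitrary.

```python
from math import sqrt

X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]
N = len(X)

# Pearson's r from the raw-score formula
num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = sqrt(N * sum(x * x for x in X) - sum(X) ** 2) * sqrt(N * sum(y * y for y in Y) - sum(Y) ** 2)
r = num / den

# Spearman's rank correlation: r_s = 1 - 6*sum(d^2) / (N(N^2 - 1))
def ranks(v):
    # rank positions 1..N (no tied values in this example)
    order = sorted(range(len(v)), key=lambda i: v[i])
    rk = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        rk[i] = rank
    return rk

d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(X), ranks(Y)))
r_s = 1 - 6 * d2 / (N * (N ** 2 - 1))

print(round(r, 3), round(r_s, 3))   # both close to -1 for this inversely related data
```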
The rire ett ‘computers has greatly facilitated statisticians to handle complex problems of rmultivarints ze a fact, many specific programmes based on the requirements of statisticians are available. iis , i cts of independent variables on the de- cordingly, a given change in © regardless of the size of the ‘orrelation is enormous, ty few persons are well-versed with et ae The positive square root of R?. ‘lation, pint of multiple The proportion of total sum of squares (SST) that is explained by the determination, ®) multiple regression model. It measures how well the multiple re- gression fits the given data. (onputed F ratio A statistic used to test the significance of the regression as a whole. (onputed t A statistic used for testing the significance of an independent variable. Sede! A general mathematical relationship relating a dependent variable (¥) to independent variables X,, X>, X3, -... Xj. Ihiticollinearity A statistical problem in multiple-regression analysis arising from the existence of correlation between two or more independent vari- ables. It reduces the reliability of regression coefficients. laliple regression A technique of analysing data, which simultaneously investigates the effect of two or more independent variables on a dependent variable. It is, thus, an extension of the simple regression technique. arial correlation It is the correlation between two variables while the remaining vari- t ables are held constdnt. : S ,542 Business Statistics ‘Partial regression Ina multiple regression, these are the coefficients of independen! coefficients variables. The name ‘partial’ suggests that each one of then sures the effect of the corresponding independent variable cham dependent variable when the remaining independent variables °° held constant. S ate Standard error of a . Ameasure of uncertainty regarding the actual value ofa Tegressi regression coefficient coefficient. Ssion, ~ . Multiple linear regression mod Y= at bX + bX) + b3X3+ ... + bX, 2. Estimated multiple regression model: ¥ =a tb X/+ BX + byXy +... + WX 3. Normal equations when two independent variables are involved: LY=na + by UX;+ BEX, EX|Y = GEX, + DEX? + BEX Xy EX,Y = aZX;+ DiEX,X, + bZXF 4. Standard error of estimate: where n—k—1 stands for degrees of freedom. 5. Coefficient of multiple determination: R?= SSR » SST Ee ee where SSR =£(¥ — Y)’ and SST=Z(¥- Y)* ™ 6. Value of the test statistic F for the test of overall significance of the multiple regression model: tk 1 pe __SSRIk i SSE/(n—k—1) where SSR stands for regression sum of squares (ie. the explained part) and SSE for error sum of squares (je. the unexplained part), k for numerator degrees of freedom and. n— k—1 for denominator degrees of freedom. 18.1 Given below are twelve statements. Indicate in each case whether the statement is true, or false: : (a) The multiple linear regression is not a superior analytical tool compared to the simple linear regression. i590 Business Statistics (Contd) \ Q 364 104.0 ; a, 348 100.0 yee ~ 1982 a 342 94.8 mato : 350 101.2 60.74 & 381 104.0 345 95, a 345 100.0 ee a5 1983 Q 364 re me a 390 101.2 3.97 a 401 104.0 its 2a y 100.0 55g a 385 aes GaP ED The following data relate to an annual trend ¥ = 462 + 27.80 Origin : July 1, 2007 Xin terms of years Y in terms of Rupees in million You are asked to convert this annual trend equation into monthly terms. 
Solution ¥ =462 + 27.8X - Since, this is an annual trend equation, dividing these values by 12, we get Again, 0.193 ‘As it is monthly trend equation, the origin should be shifted from July 1, 2007 to July 15,2007, ie to the middle of the month. Hence, 7 ¥ = 38.5 + (0.193)(0.5) or 38.5 + 0.0965 Thus, we can now write, ¥ =38.5 + 0.0965X Origin : July 15, 2007 X in terms of months Y in terms of Rupees in million OTS Cyelical component or Ina time series, fluctuations around the trend line that last for fluctuations more than one year. Deseasonalisation A statistical process by which the seasonal variation from a tine series is eliminated. Forecasting Predicting the expected value of an item or variable of interest