Graduate School Notre Dame University Cotabato City

What is statistics? Statistics is a scientific body of knowledge that deals with the collection, organization or presentation, analysis, and interpretation of data. Collection refers to the gathering of information or data. Organization or presentation involves summarizing data or information in textual, graphical, or tabular form. Analysis involves describing the data by using statistical methods and procedures. Interpretation refers to the process of drawing conclusions based on the analyzed data.

Application of Statistics

Statistics is important in our lives. It has many applications in various disciplines and in real life.

In Business: A business firm collects data or information from its everyday operations. Statistics is used to summarize and describe those data, such as the amounts of sales, expenditures, and production, so that management can understand and determine the status of the firm. Data that have been organized and analyzed give management baseline data for making sound decisions about the operation of the business.

In Education: Through statistical tools, a teacher can determine the effectiveness of a particular teaching method by analyzing the test scores obtained by the students. The results of such a study may be used to improve teaching-learning activities.

In Psychology: Psychologists use statistical procedures and tools to interpret aptitude tests, IQ tests, and other psychological tests meaningfully.

In Politics and Government: Public opinion and election polls are commonly used to assess the opinions or preferences of the public regarding issues or candidates of interest. Statistics plays an important role in conducting surveys or interviews for that purpose.

In Medicine: Statistics is also used to determine the effectiveness of new drug products in treating a particular type of disease. To illustrate, suppose a drug company wants to test the effectiveness of its new drug product in treating tuberculosis.
An experiment or a clinical trial is conducted. Ten tuberculosis patients are treated with the new drug product and another ten are treated with the existing drug. The results are analyzed statistically to find out whether the new product is more effective in treating tuberculosis.

In Agriculture: Through statistical tools, an agriculturist can determine the effectiveness of a new fertilizer on the growth of plants or crops. Moreover, crop production and yield can be better analyzed through the use of statistical methods.

In Entertainment: The most popular actresses and actors can be determined using surveys. The ratings given by the board of judges in a beauty contest are analyzed statistically. Interviews are used to determine the most widely viewed television show. The top-grossing movies of the year are reported based on the statistical records of movie houses. All these activities involve the use of statistics.

In Everyday Life: The number of cars passing through a street or highway is recorded so that traffic enforcers can manage traffic efficiently. Even counting the pedestrians crossing a street, the people entering a warehouse or a department store, or the people playing video games involves the use of statistics. In short, statistics is found and used in everyday life.

Business Statistics Comprehensive Examination Review

Categories of Statistics

Descriptive Statistics is a statistical procedure concerned with describing the characteristics and properties of a group of persons, places, or things. Examples:
- stating how many are poor and how many are rich
- stating how many are literate and how many are illiterate
- stating how many fall into various categories of age, height, civil status, IQ, and many more

Generally, descriptive statistics involves gathering, organizing, presenting, and describing data.

Inferential Statistics is a statistical procedure used to draw inferences about the properties or characteristics of a large group of people, places, or things on the basis of information obtained from a small portion of that group. Examples:
- a pre-election survey of 1,000 eligible voters
- predicting the future sales of a company based on its present sales
- a market vendor investigating the most popular brand of vinegar in the market

Inferential statistics involves analyzing data so that meaningful interpretations or conclusions about a large group of people can be formulated.

Terminologies in Statistics
- Population refers to a large collection of objects, persons, places, or things.
- Sample is a small portion or part of a population.
- Parameter is any numerical or nominal characteristic of a population.
- Statistic is any numerical value describing a characteristic of a sample.
- Census: information is gathered from all the units in the population.
- Survey: only a part of the population (a sample) is used to obtain the data.
- Data (singular: datum) are facts, or a set of information or observations under study; any recorded event.
- Qualitative data are data whose values express attributes or qualities rather than amounts. These are sometimes called categorical data.
- Quantitative data are data that are numerical in nature, obtained from counting or measuring.
- Primary data are data acquired directly from the original source.
- Secondary data are data that are not primary, that is, data obtained from sources that collected them first (e.g., published records or reports).
- Variable is a characteristic or property of a population or sample that makes its members differ from one another.
- Discrete variable is one that can assume only a countable number of values (e.g., 1, 2, 3).
- Continuous variable is one that can assume any value within a specified interval (e.g., 1.1, 1.2, 1.3, and every value in between).
- Dependent variable is a variable that is affected or influenced by another variable.
- Independent variable is one that affects or influences the dependent variable.
- Constant is a property or characteristic of a population or sample that makes the members of the group similar to each other.
- Dichotomy: a variable with only two categories or levels.
- Polytomy: a variable with three or more categories.

Types of variables:
- Univariate involves only one variable.
- Bivariate involves two variables at the same time.
- Multivariate deals with three or more variables at the same time.

Variables can also be classified by level of measurement:
- Nominal: can be classified into two or more categories whose only purpose is identification (e.g., gender, civil status).
- Ordinal: grouped according to the rank or order of the categories (e.g., ranking of contestants in a contest).
- Interval: the zero point is arbitrary (e.g., temperature).
- Ratio: there is a true zero (e.g., height, weight).

Data Gathering Techniques / Sampling Design and Sampling Techniques

Collecting data is the first step in conducting a study or research. Data may be primary or secondary. The following are the various ways of collecting or gathering data:

1. The Direct or Interview Method (Questionnaire)
2. The Indirect or Questionnaire Method (Self-Administered Questionnaire)
3. The Registration Method (Vital Documents: Birth Certificate, etc.)
4. The Experimental Method (cause-and-effect relationship)

Determining the sample size: Slovin's formula

$n = \dfrac{N}{1 + Ne^2}$

where n = sample size, N = population size, and e = margin of error.

Examples:
1. N = 10,000 at e = 10%: $n = \dfrac{10{,}000}{1 + (10{,}000)(0.10)^2} = \dfrac{10{,}000}{101} \approx 99$
2. N = 10,000 at e = 5%: $n = \dfrac{10{,}000}{1 + (10{,}000)(0.05)^2} = \dfrac{10{,}000}{26} \approx 385$

From a population of 10,000 and a margin of error of 10%, we need to draw a sample of 99. From a population of 10,000 and a margin of error of 5%, we need to draw a sample of approximately 385.
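The calculation above can be scripted as a small check. This is a minimal Python sketch. The function name `slovin_sample_size` is hypothetical. The function rounds to the nearest whole number so that it reproduces the figures above (99 and 385). Some texts round up instead, which would give 100 in the first case.

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2), rounded to the nearest whole number."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("N must be positive and e must lie strictly between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)  # nearest whole number, matching 99 and 385 in the examples


for e in (0.10, 0.05):
    print(f"N = 10,000, e = {e:.0%}: n = {slovin_sample_size(10_000, e)}")
```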

Sampling can be:
- Probability sampling: random samples in which every element of the population has a known, nonzero chance of being selected (an equal chance in simple random sampling).
- Non-probability sampling: selection is judgmental, so the sample may be biased.

Types of probability sampling
1. Simple Random Sampling (lottery, raffle draw, etc.)
2. Stratified Random Sampling
   a. Simple stratified (Slovin's formula)
   b. Proportional stratified, or percent proportion (n/N)
   Example: N = 1,000 students (300 males, 700 females), n = 200

   Stratum    Population   Proportion (n/N)   Sample
   Males          300            0.20             60
   Females        700            0.20            140
   Total        1,000            0.20            200

3. Systematic Sampling: after randomly drawing the first element, select every kth element of the population, where k = N/n.
4. Cluster Sampling
   Example: N = 20 hospitals with 800 nurses, n = 200 nurses
   Number of nurses per hospital: X = 40
   Required number of clusters: Y = n/X = 200/40 = 5 hospitals
   Therefore, select 5 of the 20 hospitals (40 nurses each).
5. Multi-Stage Sampling: a combination of several sampling techniques
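The proportional stratified, systematic, and cluster examples above can be sketched in Python. The function names are hypothetical and the random seed is arbitrary. The population in the systematic example is simply numbered 1 to 1,000 as an illustration. Rounding each stratum's share, as done here, can make the strata totals differ from n by one in other data sets.

```python
import random


def proportional_allocation(strata: dict, sample_size: int) -> dict:
    """Proportional stratified allocation: each stratum gets N_h * (n / N)."""
    fraction = sample_size / sum(strata.values())  # n/N, e.g. 200/1,000 = 0.20
    return {name: round(size * fraction) for name, size in strata.items()}


def systematic_sample(population: list, sample_size: int, rng: random.Random) -> list:
    """Every kth element (k = N // n), starting from a random position among the first k."""
    k = len(population) // sample_size
    start = rng.randrange(k)
    return population[start::k][:sample_size]


def cluster_sample(clusters: list, required_n: int, cluster_size: int, rng: random.Random) -> list:
    """Pick Y = n / X whole clusters at random (each cluster assumed to hold X members)."""
    num_clusters = required_n // cluster_size  # e.g. 200 // 40 = 5 hospitals
    return rng.sample(clusters, num_clusters)


rng = random.Random(42)  # fixed seed so runs are reproducible
print(proportional_allocation({"Males": 300, "Females": 700}, 200))
print(systematic_sample(list(range(1, 1001)), 200, rng)[:5])
print(cluster_sample([f"Hospital {i}" for i in range(1, 21)], 200, 40, rng))
```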

Types of Non-probability sampling 1. Convenience sampling 2. Quota sampling 3. Purposive sampling 4. Snowball sampling 5. Accidental or incidental sampling

Data Organization and Data Presentation

Data must be presented in an organized and systematic way so that their significant characteristics can be easily seen. Data can be presented in three forms: textual, tabular, and graphical.

Data can be classified as:
- Ungrouped data: data that are not organized or, if arranged, are only sorted from highest to lowest or from lowest to highest.
- Grouped data: data that are organized and arranged into different classes or categories.

Textual: Ungrouped data can be presented in textual form, as in paragraph form. This involves enumerating the important characteristics, emphasizing significant figures, and identifying important features of the data.

Tabular: Data are presented using tables (frequency distribution tables, etc.). When data are organized in tables, important features can be readily understood and comparisons can be easily made. Thus, a table shows complete information about the data. A frequency distribution table is a table that shows the data arranged into different classes and the number of cases that fall into each class.

Graphical: Data are presented through graphs, diagrams, charts, etc. A graph adds life and beauty to one's work, but more than this, it facilitates comparison and interpretation without going through the numerical data. Examples: bar chart, histogram, pie chart, trend line, etc.
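A frequency distribution table counts how many cases fall into each class. The sketch below is a minimal Python illustration. It assumes integer data and equal-width classes. The function name is hypothetical. The demonstration borrows Juan dela Cruz's travel times from the mean example later in this section.

```python
from collections import Counter


def frequency_table(values, lower, width, num_classes):
    """Return (class interval, frequency) rows for integer data.

    Class i covers lower + i*width up to lower + (i+1)*width - 1.
    Values below `lower` or beyond the last class are not counted.
    """
    counts = Counter((v - lower) // width for v in values)
    rows = []
    for i in range(num_classes):
        lo = lower + i * width
        rows.append((f"{lo}-{lo + width - 1}", counts[i]))  # Counter gives 0 for empty classes
    return rows


travel_minutes = [60, 45, 50, 53, 47]
for interval, freq in frequency_table(travel_minutes, lower=45, width=5, num_classes=4):
    print(f"{interval:>7}  {freq}")
```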

Measures of Central Tendency/Location

It is useful to define numerical measures that describe important features of the data. Any measure indicating the center of a set of data, arranged in increasing or decreasing order of magnitude, is called a measure of central location or a measure of central tendency. The most commonly used measures of central location are the mean, median, and mode. These are numerical descriptive measures.
- Mean/Arithmetic mean. The mean of a set of values or measurements is the sum of all the measurements divided by the number of measurements in the set.
- Median. It is the middle value of a given set of measurements, provided that the values are arranged in an array. An array is an arrangement of values in increasing or decreasing order.
- Mode. It is the value that occurs most frequently in a set of measurements or values.

For ungrouped data

Sample mean: $\bar{X} = \dfrac{\sum X}{n}$

where: $\bar{X}$ = mean, $\sum X$ = sum of the measurements or values, n = number of measurements

Example: Below are the travel times Juan dela Cruz spent going to work the previous week. Compute the mean and interpret the result.

Day         Time spent traveling
Monday      60 min.
Tuesday     45 min.
Wednesday   50 min.
Thursday    53 min.
Friday      47 min.
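The travel-time example can be worked out in a few lines of Python. This is a minimal sketch that applies the sample-mean formula directly and cross-checks it with the standard library's `statistics.mean`. The total is 255 minutes over 5 days, so the mean is 51 minutes. In other words, Juan spent about 51 minutes traveling to work on a typical day that week.

```python
from statistics import mean

travel_minutes = {"Monday": 60, "Tuesday": 45, "Wednesday": 50, "Thursday": 53, "Friday": 47}

x_bar = sum(travel_minutes.values()) / len(travel_minutes)  # sum of X divided by n
print(f"Mean travel time: {x_bar} minutes")
print(f"Same result from statistics.mean: {mean(list(travel_minutes.values()))}")
```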

Weighted arithmetic mean: $\bar{X} = \dfrac{\sum XW}{\sum W}$

where: $\bar{X}$ = mean, X = measurements or values, W = weights
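Applied to the grades and units in the example that follows, the weighted mean divides the sum of grade × units by the total units. Below is a minimal Python sketch. For comparison, it also computes the unweighted mean, which ignores the units.

```python
# (grade, units) for each subject in the weighted-mean example
grades = {"English": (95, 5), "Statistics": (90, 5), "Filipino": (92, 3), "History": (83, 3)}

weighted_sum = sum(grade * units for grade, units in grades.values())  # sum of X*W
total_units = sum(units for _, units in grades.values())  # sum of W
weighted_mean = weighted_sum / total_units
simple_mean = sum(grade for grade, _ in grades.values()) / len(grades)

print(f"Weighted mean = {weighted_sum}/{total_units} = {weighted_mean}")
print(f"Unweighted mean = {simple_mean}")
```

The weighted mean (1,450/16 = 90.625) is higher than the unweighted mean (90.0) because the two 5-unit subjects carry the higher grades.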

Example (weighted mean):

Subject      Grade   Units
English        95      5
Statistics     90      5
Filipino       92      3
History        83      3
Total                 16

Median and mode examples: 2, 3, 3, 5, 1, 5, 3, 6, 12, 12, 24, 24, 13, 13, 15, 15, 18, 18, 23

For grouped data

Example:

Class Interval   Class Mark (Xi)   Frequency (fi)   Cumulative Frequency (<cf)   fiXi
22-28                 25                 7                     7                 175
29-35                 32                 6                    13                 192
36-42                 39                 5                    18                 195
43-49                 46                 3                    21                 138
50-56                 53                 2                    23                 106
57-63                 60                 2                    25                 120
Total                               N = 25                                       926

Mean

$\bar{X} = \dfrac{\sum f_i X_i}{N} = \dfrac{926}{25} = 37.04$

Median

$\text{Median} = l_m + c\left[\dfrac{N/2 - f_{m-1}}{f_m}\right]$


Where:
- $l_m$ = lower class boundary of the median class
- c = class size
- N = total frequency
- $f_{m-1}$ = less-than cumulative frequency of the class just above the median class
- $f_m$ = frequency of the median class

Steps to follow:
1. Construct the less-than cumulative frequency column.
2. Determine the median class. This is the class interval whose less-than cumulative frequency first reaches one-half of the total frequency, N/2.
3. Use the formula to find the median.

Mode

$Mo = l_{mo} + c\left[\dfrac{f_{mo} - f_a}{2f_{mo} - f_a - f_b}\right]$

Where:
- $l_{mo}$ = lower class boundary of the modal class
- c = class size
- $f_{mo}$ = frequency of the modal class
- $f_a$ = frequency of the class above the modal class
- $f_b$ = frequency of the class below the modal class
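The grouped-data formulas for the mean, median, and mode can be applied to the class-interval example above. This is a minimal Python sketch with two assumptions. First, class boundaries are taken as the lower class limit minus 0.5. Second, a missing neighbouring class (here, the class above the first interval) is treated as having frequency 0. With these assumptions the median is 28.5 + 7(12.5 - 7)/6 ≈ 34.92 and the mode is 21.5 + 7(7/8) = 27.625.

```python
# Grouped-data example: class intervals 22-28, 29-35, ..., 57-63
lower_limits = [22, 29, 36, 43, 50, 57]
freqs = [7, 6, 5, 3, 2, 2]
c = 7  # class size
class_marks = [lo + (c - 1) / 2 for lo in lower_limits]  # 25, 32, ..., 60
N = sum(freqs)

grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / N

# Median class: first class whose less-than cumulative frequency reaches N/2.
cum = 0
for i, f in enumerate(freqs):
    if cum + f >= N / 2:
        break
    cum += f  # cum ends as f_(m-1), the cumulative frequency above the median class
lm = lower_limits[i] - 0.5  # lower class boundary
grouped_median = lm + c * (N / 2 - cum) / freqs[i]

# Modal class: highest frequency; fa/fb are its neighbours (0 if absent).
m = freqs.index(max(freqs))
fa = freqs[m - 1] if m > 0 else 0
fb = freqs[m + 1] if m + 1 < len(freqs) else 0
lmo = lower_limits[m] - 0.5
grouped_mode = lmo + c * (freqs[m] - fa) / (2 * freqs[m] - fa - fb)

print(f"mean = {grouped_mean}, median = {grouped_median:.2f}, mode = {grouped_mode}")
```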

Other Measures of Location
- Percentile. Percentiles are values that divide a set of observations into 100 equal parts. These values, denoted by P1, P2, ..., P99, are such that 1% of the data falls below P1, 2% falls below P2, ..., and 99% falls below P99.
- Decile. Deciles are values that divide a set of observations into 10 equal parts. These values, denoted by D1, D2, ..., D9, are such that 10% of the data falls below D1, 20% falls below D2, ..., and 90% falls below D9.
- Quartile. Quartiles are values that divide a set of observations into 4 equal parts. These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.

Measures of Variability/Dispersion

The three measures of central location do not by themselves give an adequate description of the data. We also need to know how the observations spread out from the average. It is quite possible for two sets of observations to have the same mean or median yet differ considerably in the variability of their measurements about the average. A measure of dispersion:
- measures the amount of spread of the values about a central point;
- is a supplement to the measures of central tendency;
- helps determine whether the data are homogeneous or heterogeneous.

The smaller the value of the measure of dispersion, the more representative the measure of central tendency.
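Returning to the measures of location listed above, quartiles, deciles, and percentiles can be computed with Python's `statistics.quantiles` (Python 3.8+). This sketch uses its "exclusive" method, which places the pth quantile at position p(n + 1) of the array. Other textbooks use slightly different position rules, which can give different values on small data sets. The data are the nine values from the kurtosis example later in this section.

```python
from statistics import quantiles

data = [2, 3, 3, 4, 4, 4, 5, 5, 6]

q1, q2, q3 = quantiles(data, n=4, method="exclusive")  # Q1, Q2 (median), Q3
deciles = quantiles(data, n=10, method="exclusive")  # D1 ... D9
percentiles = quantiles(data, n=100, method="exclusive")  # P1 ... P99

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"D5 = {deciles[4]}, P50 = {percentiles[49]}")
```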
Range. The range of a set of data is the difference between the largest and the smallest number in the set.

Mean Absolute Deviation (MAD). The MAD is the average of the absolute deviations of the observations from the mean.

$MAD = \dfrac{\sum |X - \bar{X}|}{N}$

Where: X = a value or score from the raw data, $\bar{X}$ = mean, N = total number of cases

Variance. The variance is the average of the squared deviations from the mean.

For ungrouped data:
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$ (population)
$s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$ (sample)

For grouped data (X = class mark, f = class frequency):
$\sigma^2 = \dfrac{\sum f(X - \mu)^2}{N}$ (population)
$s^2 = \dfrac{\sum f(X - \bar{X})^2}{N - 1}$ (sample)
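The ungrouped-data measures of dispersion can be checked on Juan's travel times from the mean example. Below is a minimal Python sketch that computes the range, MAD, and both variances from the formulas above. It cross-checks the variances against `statistics.pvariance` (divisor N) and `statistics.variance` (divisor N - 1). For these data the deviations from the mean of 51 are 9, -6, -1, 2, and -4. This gives a range of 15, a MAD of 4.4, σ² = 138/5 = 27.6, and s² = 138/4 = 34.5.

```python
from statistics import pvariance, variance

times = [60, 45, 50, 53, 47]  # Juan's travel times in minutes
n = len(times)
x_bar = sum(times) / n

value_range = max(times) - min(times)
mad = sum(abs(x - x_bar) for x in times) / n  # mean absolute deviation
ss = sum((x - x_bar) ** 2 for x in times)  # sum of squared deviations
pop_var = ss / n  # sigma^2: divide by N
samp_var = ss / (n - 1)  # s^2: divide by N - 1

# Cross-check against the standard library.
assert abs(pop_var - pvariance(times)) < 1e-9 and abs(samp_var - variance(times)) < 1e-9
print(f"range = {value_range}, MAD = {mad}, population variance = {pop_var}, sample variance = {samp_var}")
```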

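Standard Deviation. The standard deviation is the square root of the variance (σ for a population, s for a sample). Coefficient of Variation. The standard deviation by itself does not tell us much about the relative variability of a single set of data. A more appropriate measure is the coefficient of variation, defined by $V = \dfrac{s}{\bar{X}} \times 100\%$ (sample) or $V = \dfrac{\sigma}{\mu} \times 100\%$ (population),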

which expresses the standard deviation as a percentage of the mean. Since V is a measure of relative variation expressed as a percent, the coefficient of variation can be used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement.

Symmetry and Skewness

A distribution is said to be symmetric if it can be folded along a vertical axis so that the two sides coincide. A distribution that lacks symmetry with respect to a vertical axis is said to be skewed.

Symmetry. A distribution of scores may be symmetrical or asymmetrical. Imagine constructing a histogram centered on a piece of paper and folding the paper in half the long way. If the distribution is symmetrical, the part of the histogram on the left side of the fold is the mirror image of the part on the right side. If the distribution is asymmetrical, the two sides are not mirror images of each other. The normal distribution is an example of a truly symmetric distribution. Asymmetric distributions are more commonly found.

Skewness. An asymmetric distribution is either positively skewed or negatively skewed. A distribution is positively skewed if the scores tend to cluster toward the lower end of the scale (the smaller numbers), with increasingly fewer scores toward the upper end (the larger numbers). A negatively skewed distribution is exactly the opposite: most of the scores occur toward the upper end of the scale, while increasingly fewer scores occur toward the lower end. An example of a negatively skewed distribution is age at retirement. Most people retire in their mid-60s or older, with increasingly fewer retiring at increasingly earlier ages.

Normal Curve: The normal distribution is a distribution with a bell-shaped appearance. In a normal distribution, mean = median = mode.
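Returning to the coefficient of variation defined earlier in this section, the sketch below computes the sample V for Juan's travel times. It then repeats the calculation with the same times converted to hours. Because V is a ratio, the change of unit leaves it unchanged (about 11.5% here). This is why V can compare data sets measured in different units. The function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(data) -> float:
    """Sample coefficient of variation V = s / X-bar, expressed as a percentage."""
    return stdev(data) / mean(data) * 100


minutes = [60, 45, 50, 53, 47]  # Juan's travel times
hours = [m / 60 for m in minutes]  # the same times in a different unit

print(f"V (minutes) = {coefficient_of_variation(minutes):.1f}%")
print(f"V (hours)   = {coefficient_of_variation(hours):.1f}%")
```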

Non-Normal Curves If the data are skewed to the right, the mean will be greater than the median; and for skewed to the left, the mean is less than the median.

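Pearsonian Coefficient of Skewness

$SK = \dfrac{3(\bar{X} - Md)}{s}$ or $SK = \dfrac{3(\mu - Md)}{\sigma}$

where Md = median.
- SK ranges from -3 to +3.
- Normal distribution: SK = 0
- Positively skewed/skewed to the right: SK is positive
- Negatively skewed/skewed to the left: SK is negative

Examples:

Number of correct answers   1   2   3   4   5   6   7
f (Example 1)               1   2   4   5   4   2   1
f (Example 2)               0   0   1   2   4   9   3
f (Example 3)               3   9   4   2   1   0   0

Example 1 is symmetric (mean = median = 4, so SK = 0), Example 2 is negatively skewed, and Example 3 is positively skewed.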
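The Pearsonian coefficient can be computed for the three frequency distributions in the examples above, taken in the order listed. Below is a minimal Python sketch. It expands each frequency table into raw scores and then applies SK = 3(X̄ - Md)/s, using the sample standard deviation for s. The helper names are hypothetical.

```python
from statistics import mean, median, stdev


def expand(values, freqs):
    """Turn a frequency table into the list of raw scores it represents."""
    return [v for v, f in zip(values, freqs) for _ in range(f)]


def pearson_sk(data) -> float:
    """Pearsonian coefficient of skewness SK = 3 (X-bar - Md) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)


scores = range(1, 8)  # number of correct answers, 1 to 7
examples = {
    "Example 1": [1, 2, 4, 5, 4, 2, 1],
    "Example 2": [0, 0, 1, 2, 4, 9, 3],
    "Example 3": [3, 9, 4, 2, 1, 0, 0],
}
for name, freqs in examples.items():
    print(f"{name}: SK = {pearson_sk(expand(scores, freqs)):+.2f}")
```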

Kurtosis

Curves of distributions may have the same coefficient of skewness yet still differ significantly in other respects. Measures of central tendency, variability, and skewness tell us nothing about the peakedness or flatness of a distribution. If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or short and broad? You can get some idea of this from the histogram, but a numerical measure is more precise. The height and sharpness of the peak relative to the rest of the data are measured by a number called kurtosis. Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak. This occurs because higher kurtosis means more of the variability is due to a few extreme differences from the mean, rather than many modest differences from the mean. The reference standard is the normal distribution, which has a kurtosis of 3. For this reason, the excess kurtosis is often reported instead: excess kurtosis = kurtosis - 3.

A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.
A distribution with kurtosis < 3 (excess kurtosis < 0) is called platykurtic. Compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner.

A distribution with kurtosis > 3 (excess kurtosis > 0) is called leptokurtic. Compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.

(Figure: leptokurtic, normal, and platykurtic curves.)

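$Ku = \dfrac{\sum (X - \bar{X})^4}{N s^4}$ (for ungrouped data)

$Ku = \dfrac{\sum f(X_m - \bar{X})^4}{N s^4}$ (for grouped data)

Where:
- Ku = kurtosis
- X = raw data value
- $\bar{X}$ = mean
- $X_m$ = class mark
- f = class frequency
- $s^4$ = square of the variance, $(s^2)^2$
- N = size of the data set

Example: 2, 3, 3, 4, 4, 4, 5, 5, 6

Answer: With $\bar{X} = 4$, $\sum (X - \bar{X})^4 = 36$, and $s^2 = 12/8 = 1.5$, $Ku = \dfrac{36}{9 \times 1.5^2} \approx 1.78$ (platykurtic).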
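The kurtosis formula above can be evaluated for the nine-value example. Below is a minimal Python sketch. It assumes that s² is the sample variance (divisor N - 1), which gives about 1.78. If the population variance (divisor N) is used instead, Ku = 2.25. Either way the value is below 3, so the distribution is platykurtic.

```python
from statistics import mean, variance


def kurtosis(data) -> float:
    """Ku = sum((X - X-bar)^4) / (N * s^4), with s^2 the sample variance (divisor N - 1)."""
    x_bar = mean(data)
    s4 = variance(data) ** 2
    return sum((x - x_bar) ** 4 for x in data) / (len(data) * s4)


data = [2, 3, 3, 4, 4, 4, 5, 5, 6]
ku = kurtosis(data)
label = "leptokurtic" if ku > 3 else "platykurtic" if ku < 3 else "mesokurtic"
print(f"Ku = {ku:.3f} ({label})")
```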

Normal Distribution

Properties of the normal curve:
- Mode = Median = Mean.
- It is symmetrical about the mean.
- The tails or ends are asymptotic to the horizontal axis (they approach it but never touch it).
- The total area under the normal curve equals 1 (100%).

The standard normal distribution is a normal distribution with μ = 0 and σ = 1.

The z-score measures how many standard deviations a particular value lies above or below the mean.

Formula: $z = \dfrac{X - \mu}{\sigma}$

where X = raw score, μ = mean, σ = standard deviation

Example: Scores of Mario

Subject   Mean (μ)   Std. dev. (σ)   Mario's score (X)   z-score
Math         80            5                85              1.0
English      75            5                82              1.4

Since his z-score is higher in English, Mario performed better in English than in Mathematics.
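The z-score comparison for Mario can be written as a short Python sketch. The dictionary layout and function name are illustrative only. Each subject maps to (score, mean, standard deviation), and the subject with the higher z-score is the one where he did relatively better.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mario = {"Math": (85, 80, 5), "English": (82, 75, 5)}  # (score, mean, standard deviation)
z = {subject: z_score(*values) for subject, values in mario.items()}
print(z)
print("Relatively better in:", max(z, key=z.get))
```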

Areas Under the Normal Curve

Refer to the table of areas under the normal curve.

Problem: The hourly wages of 500 employees are normally distributed with μ = P100 and σ = P10 (n = 500).
a. Find the percentage of workers whose hourly wages are:
   1. from P100 to P110
   2. from P110 to P130
   3. from P80 to P90
b. Find the minimum hourly wage of the upper 10% of the workers.
c. Find the maximum hourly wage of the lower 20% of the workers.
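The wage problem can be checked numerically with Python's `statistics.NormalDist` (Python 3.8+) in place of the printed table. This sketch assumes wages are normal with μ = 100 and σ = 10, as the problem states. Because it uses exact z values rather than the table's two-decimal z = 1.28, part b comes out near P112.82 rather than P112.80.

```python
from statistics import NormalDist

wages = NormalDist(mu=100, sigma=10)  # hourly wages in pesos


def pct_between(a: float, b: float) -> float:
    """Percentage of workers whose hourly wage falls between a and b."""
    return 100 * (wages.cdf(b) - wages.cdf(a))


print(f"a.1 P100-P110: {pct_between(100, 110):.2f}%")
print(f"a.2 P110-P130: {pct_between(110, 130):.2f}%")
print(f"a.3 P80-P90:   {pct_between(80, 90):.2f}%")
print(f"b. minimum wage of the upper 10%: P{wages.inv_cdf(0.90):.2f}")
print(f"c. maximum wage of the lower 20%: P{wages.inv_cdf(0.20):.2f}")
```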

Solution:

a.1. Compute the values of z using $z = \dfrac{X - \mu}{\sigma}$:
$z_1 = \dfrac{100 - 100}{10} = 0$ (from the table: 0.5000)
$z_2 = \dfrac{110 - 100}{10} = 1.0$ (from the table: 0.8413)
Area = 0.8413 - 0.5000 = 0.3413, so about 34.13% of the workers earn P100 to P110 per hour.

b. To compute the value of X, transpose the formula: $X = z\sigma + \mu$. In the table, find the cumulative area closest to 0.90 (the upper 10% of the workers lie above the 90th percentile). An area of 0.90 corresponds to z = 1.28.
X = (1.28)(10) + 100 = P112.80 (the minimum hourly wage of the upper 10% of the workers)

Simple Test of Hypothesis

If you see a handsome guy or a beautiful lady, what conclusion do you draw about the whole family? You may say, "The whole family might be all good-looking!" Without realizing it, you are already making an inference or generalization about the entire family. But how would you know whether what you are thinking is correct? You would need to see all the members of the family, and only then could you say, "I am right!"

Now, suppose that you want to know whether male students really outperform female students in mathematics. Do you have a way of finding all the males and females out there to compare their grades in mathematics? Do you have enough resources to go out and gather all the data needed to verify your theory? When problems like these arise, you can take a representative sample of the population and then make an inference or generalization about the population based on the results of your study of the sample. This process is called hypothesis testing.

Hypothesis Testing is the process of making an inference or generalization about a population based on the results of a study of samples. When you make guesses, as when you say that the entire family might be all good-looking, you are already formulating a hypothesis. Such a statement is referred to as a statistical hypothesis, which is still subject to verification.

Statistical Hypothesis is a guess or prediction made by a researcher regarding the possible outcome of the study. There are two types of statistical hypothesis:

1. Null hypothesis (Ho): It is the hypothesis to be tested, which one usually hopes to reject. It states equality, or no significant difference or relationship between variables.
2. Alternative hypothesis (Ha): It generally represents the idea that the researcher wants to prove.

In doing research, the first thing you should have is a research problem. From there, you can formulate your null and alternative hypotheses.

Example of a research problem: Performance in mathematics of first-born and last-born children

In this problem, you are comparing the performance in mathematics of two groups, namely:
A: First-born children
B: Last-born children

Since the null hypothesis always states that there is no significant difference between the groups being compared, it should be stated as follows:
(Ho): There is no significant difference in the performance in mathematics between first-born and last-born children.

After stating the null hypothesis, the alternative hypothesis can be formulated. It should be the opposite of the null hypothesis. In this problem, the alternative hypothesis should be stated as follows:
(Ha): There is a significant difference in the performance in mathematics between first-born and last-born children.

Stating the null hypothesis is not much of a problem, since we just have to express that the groups being compared are equal or have no difference at all. For the alternative hypothesis, however, we have to consider three possibilities:
1. there is a difference between the groups being compared;
2. one group is superior to the other;
3. one group is inferior to the other.

The way the alternative hypothesis is stated determines the type of hypothesis test to be used. If the null hypothesis is rejected, the alternative is accepted; if the null hypothesis is accepted, the alternative hypothesis is rejected. For this reason, all possible values of the population parameter that are not included in the null hypothesis should be included in the alternative hypothesis.

Rejecting the null hypothesis means that the sample provides enough evidence against it. Accepting the null hypothesis does not mean that it is true, only that we do not have enough evidence to reject it. We may have an insufficient number of samples, there may be an error in sampling, or the requirements or assumptions of the test used may not have been met.
Types of Hypothesis Testing To test the hypothesis, we may use a one-tailed test or a two-tailed test depending on the alternative hypothesis.

1. One-tailed test: It is a directional test with the region of rejection lying in either the left or the right tail of the normal curve.

a. Right directional test: The region of rejection is in the right tail. It is used when the alternative hypothesis uses comparatives such as greater than, higher than, better than, superior to, exceeds, etc.

(Figure: normal curve with the acceptance region on the left and the rejection region in the right tail.)

b. Left directional test: The region of rejection is in the left tail. It is used when the alternative hypothesis uses comparatives such as less than, smaller than, inferior to, lower than, below, etc.

(Figure: normal curve with the rejection region in the left tail and the acceptance region to its right.)

2. Two-tailed test: It is a non-directional test with the region of rejection in both tails of the normal curve. It is used when the alternative hypothesis uses words such as not equal to, significantly different, etc.

(Figure: normal curve with the acceptance region in the middle and a rejection region in each tail.)

Research Problem: Performance in mathematics of first-born and last-born children

If the researcher believes that first-born children perform better in mathematics than last-born children, then the null and alternative hypotheses could be stated as follows:
Ho: First-born children perform equally well in mathematics as last-born children.
Ha: First-born children perform better in mathematics than last-born children.
Notice:
- the null hypothesis implies that the first-born children's performance in mathematics is the same as that of the last-born; in other words, their performances in mathematics are equal.
- the alternative hypothesis requires a one-tailed test, specifically a right directional test.

If the researcher just wants to know whether there is a difference between their performances, then the null and alternative hypotheses could be stated as follows:
Ho: There is no significant difference in the performance in mathematics between first-born and last-born children.
Ha: There is a significant difference in the performance in mathematics between first-born and last-born children.
Notice:
- as before, the null hypothesis states that their performances are equal.
- the alternative hypothesis implies that they are not equal and requires a two-tailed (non-directional) test.

Examples:
1. It is known that in our school canteen, the average waiting time for a customer to receive and pay for an order is 20 minutes. Additional personnel have been added, and now the management wants to know whether the average waiting time has been reduced.

2. A teacher wants to know if there is a significant difference in the performance in Statistics between morning and afternoon classes.

Note:
1. The null hypothesis (Ho) always expresses equality (=).
2. The alternative hypothesis (Ha) can be expressed in a form that involves >, <, or ≠.
Types of Errors:
- Type I error (α) is the error committed when the null hypothesis is rejected when in fact it is true and the alternative hypothesis is false.
- Type II error (β) is the error committed when the null hypothesis is accepted when in fact it is false and the alternative hypothesis is true.

Level of Significance of a Test

The probability of committing a Type I error is designated by alpha (α), while the probability of committing a Type II error is designated by beta (β). Alpha is the size (probability) of the rejection region when Ho is true, while beta is the probability that the test statistic falls in the acceptance region when Ha is true. The most popular levels of significance are α = 0.05 and α = 0.01. If we want an even smaller probability of committing a Type I error, we can set α at a value smaller than 0.01. An α = 0.05 means:

- that if Ho is true and many different sets of samples were taken from the population, about 95% of them would lead us to accept Ho and about 5% would lead us to reject it;
- that we accept about 5 chances in 100 of rejecting the null hypothesis when it should be accepted;
- that, in this sense, the test is carried out at a 95% confidence level.

When Ho is rejected at α = 0.05, the result is said to be significant. When it is rejected at α = 0.01, the result is said to be highly significant. The level of significance is set at the beginning of the test so that the researcher will not be tempted to change it when the result does not conform to the desired outcome. The following steps are suggested in testing a hypothesis:

1. Formulate Ho and Ha.
2. Set the level of significance α, then determine the type of hypothesis test (one-tailed or two-tailed) and the tabular value.
3. Set the criterion (when to reject Ho). Determine and compute the test statistic.
   |test statistic or computed value| > tabular value: reject Ho, accept Ha
   |test statistic or computed value| < tabular value: accept Ho, reject Ha
4. Make your decision.
5. Formulate your conclusion.


Z-table (critical values of Z)

Level of significance (α)    One-tailed    Two-tailed
0.05                         1.65          1.96
0.025                        1.96          2.24
0.01                         2.33          2.58
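The tabular values above can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is hypothetical. For a one-tailed test all of α lies in one tail, while for a two-tailed test α/2 lies in each tail.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int) -> float:
    """Critical (tabular) value of Z for level of significance alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 (one-tailed) or 2 (two-tailed)")
    # Area to the left of the critical value is 1 - alpha (one-tailed)
    # or 1 - alpha/2 (two-tailed).
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.025, 0.01):
    one = round(z_critical(alpha, 1), 3)
    two = round(z_critical(alpha, 2), 3)
    print(f"alpha = {alpha}: one-tailed {one}, two-tailed {two}")
```

The unrounded one-tailed value for α = 0.05 is about 1.645, which the table rounds to 1.65.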

Types of Test Statistics for Hypothesis Tests Concerning Means

1. The Z-test (used when n is large, i.e., n ≥ 30)

a. Z-test for comparing a hypothesized mean and a sample mean

$$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$$

Example: The average score in the entrance examination in Mathematics at ABC High School is 80 with a standard deviation of 10. A random sample of 40 students was taken from this year's examinees and was found to have a mean score of 84.
a. Is there a significant difference between the known mean and the sample mean? Test at α = 0.05.
b. Does this indicate that this year's batch is better in mathematics than the previous batches?

Solution (a):
Ho: There is no significant difference between the average score of ABC HS examinees (μ = 80) and the mean score of the 40 sample students (X̄ = 84).
Ha: There is a significant difference between the average score of ABC HS examinees and the mean score of the 40 sample students.
α = 0.05, two-tailed; Z-tabular = 1.96
Z-computed = (84 − 80)(√40)/10 = 2.53
Since Z-computed > Z-tabular (2.53 > 1.96), reject Ho and accept Ha.

b. Z-test for comparing two sample means

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

Example: The principal of ABC High School wants to know which batch of students performed better in English. He took a random sample of 40 students from last year's batch and found it to have a mean final grade of 83 with a standard deviation of 7. Fifty students from this year's batch were randomly taken, and they were found to have a mean final grade of 86 with a standard deviation of 10. Does this indicate that last year's batch is poorer in English than this year's batch? Test at α = 0.01.

Ho: The mean grade of last year's batch is equal to that of this year's batch.
Ha: The mean grade of last year's batch is lower than that of this year's batch.
α = 0.01, one-tailed; Z-tabular = 2.33
Z-computed = (83 − 86)/√(7²/40 + 10²/50) = −3/√3.225 ≈ −1.67
Since |Z-computed| < Z-tabular (1.67 < 2.33), accept Ho: at α = 0.01, the data do not show that last year's batch is poorer in English.
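Both Z-tests can be checked with a short Python sketch. The helper names (`z_one_sample`, `z_two_sample`, `reject_null`) are hypothetical. `reject_null` compares the computed Z with the critical value on the side named by Ha, which agrees with the |computed| > tabular rule whenever the computed value falls on that side. Part (b) of the entrance-examination example does not state α, so the right-tailed check below assumes α = 0.05 carries over from part (a).

```python
import math
from statistics import NormalDist


def z_one_sample(x_bar, mu, sigma, n):
    """Z for comparing a sample mean with a hypothesized mean."""
    return (x_bar - mu) * math.sqrt(n) / sigma


def z_two_sample(x1, s1, n1, x2, s2, n2):
    """Z for comparing two sample means (large samples)."""
    return (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def reject_null(z, alpha, tail):
    """tail: 'two' (Ha: not equal), 'right' (Ha: >), or 'left' (Ha: <)."""
    nd = NormalDist()
    if tail == "two":
        return abs(z) > nd.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > nd.inv_cdf(1 - alpha)
    if tail == "left":
        return z < -nd.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")


# Entrance examination: mu = 80, sigma = 10, n = 40, sample mean = 84
z1 = z_one_sample(84, 80, 10, 40)
print(round(z1, 2), reject_null(z1, 0.05, "two"), reject_null(z1, 0.05, "right"))
# 2.53 True True

# English grades: last year (83, 7, 40) vs this year (86, 10, 50)
z2 = z_two_sample(83, 7, 40, 86, 10, 50)
print(round(z2, 2), reject_null(z2, 0.01, "left"))
# -1.67 False
```

Under the stated assumption, the right-tailed result suggests this year's batch did better in mathematics at α = 0.05, while the English comparison is not significant at α = 0.01.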

Other Statistical Tools

Many of us are interested in investigating possible relationships between two or more things. We sometimes wonder why people with high earnings also have high expenditures. A teacher may want to know how well the scores of first-year students in the entrance examination relate to their class performance. A businessman may want to know the relationship between advertising cost and earnings or profit. Correlation and regression are two related statistical tools. We use correlation to determine whether a relationship exists between two variables. On the other hand, we use regression to predict the value of one variable from our knowledge of the other variable.

Correlation

Have you encountered situations where two quantitative variables tend to be related to each other? For example, income and educational attainment may be related: a person with high educational attainment is likely to have a high income. Also, high income might be associated with high expenditures. Correlation analysis is a method used to measure the strength of the relationship between two or more variables.

Correlation analysis is a very powerful tool for determining the relationship between variables. However, the existence of a correlation does not necessarily imply a direct causal relation. By a direct causal relation, we mean that if X and Y are correlated, then X is partly the cause of Y or Y is partly the cause of X. Examples of correlations that suggest a causal relation are the correlation between income and savings, and between extent of fatigue and performance in a speed test. We believe that the higher our income, the more likely we are to have big savings. Although other factors can affect the relationship between these variables, we feel intuitively that a correlation really exists between them.

Types of Correlation

1. Positive Correlation: when high scores in one variable are associated with high scores in the second variable. This is also true when low scores in one variable are associated with low scores in the other. Thus, there is a direct relationship between positively correlated variables. Also, in a positive correlation, the points on the scatter diagram closely follow a straight line rising to the right.

2. Negative Correlation: when high scores in one variable are associated with low scores in the second variable, and vice versa. Here, the points on the scatter diagram closely follow a straight line falling to the right, and thus a negative correlation exists between the variables under consideration.

3. Zero Correlation: when scores in one variable are associated with neither systematically high nor systematically low scores in the other variable. The points on the scatter diagram are spread in a random manner when this relationship exists.

The relationship between two variables may also be described by its magnitude or strength. In terms of strength, a correlation may be perfect, high, moderate, or low. In a perfect correlation, all points on the scatter diagram lie on a straight line.

Correlation Coefficient (r)

The scatter diagram gives us only limited information about the relationship between two variables. Fortunately, there are descriptive measures that tell us the magnitude or degree of the relationship. This descriptive measure is a single number called the correlation coefficient, usually denoted by r. The value of r ranges from −1 to +1; hence, no value of r can exceed +1 or fall below −1. The value of r can be interpreted using the correlation scale. If r is +1, there is a perfect positive correlation. If r is −1, there is a perfect negative correlation. If r is zero, there is no correlation. The correlation scale is shown below.

Value of r (absolute value)    Interpretation
0.80 to 1.00                   High correlation
0.60 to 0.79                   Moderately high correlation
0.40 to 0.59                   Moderate correlation
0.20 to 0.39                   Low correlation
0 to 0.19                      Negligible correlation
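The correlation scale can be applied mechanically. The following is a minimal Python sketch; the function name `interpret_r` is hypothetical. It treats the scale's ranges as lower thresholds on |r|, so a value such as 0.795 that falls between the printed ranges is assumed to belong to the lower band. The sign of r supplies the direction.

```python
def interpret_r(r: float) -> str:
    """Describe r using the correlation scale (thresholds applied to |r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.80:
        strength = "high"
    elif size >= 0.60:
        strength = "moderately high"
    elif size >= 0.40:
        strength = "moderate"
    elif size >= 0.20:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {direction} correlation"


print(interpret_r(0.78))   # moderately high positive correlation
print(interpret_r(-0.45))  # moderate negative correlation
```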


If r is +0.78, then there is a moderately high positive correlation between the variables. Thus, the sign of r indicates the direction of the correlation, while the absolute value of r indicates its extent or magnitude.

Pearson Correlation Coefficient

The most popular and widely used correlation coefficient is the Pearson product-moment correlation coefficient, or simply Pearson r. If we let X and Y be the variables we are investigating, then the formula for finding the correlation coefficient is

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

where:
n = number of paired observations
ΣXY = the sum of the products of X and Y
ΣX² = the sum of the squared values of X
ΣY² = the sum of the squared values of Y
ΣX = the sum of the values of X
ΣY = the sum of the values of Y

Example: A personnel manager would like to know if there is a relationship between knowledge factors and practical factors of a training course. The following scores were obtained by six trainees on the knowledge factors, X, and the practical factors, Y, in a training course.

Trainee                  1    2    3    4    5    6
Knowledge Factors (X)    2    4    4    5    7    8
Practical Factors (Y)    4   10    8    8   14   16

Solution: Construct the table as shown below.

Trainee     X     Y     XY     X²     Y²
1           2     4      8      4     16
2           4    10     40     16    100
3           4     8     32     16     64
4           5     8     40     25     64
5           7    14     98     49    196
6           8    16    128     64    256
Sum (Σ)    30    60    346    174    696
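As a cross-check on the hand computation, the column sums and Pearson r for the training-course data can be computed in a few lines of Python. This is a sketch; the helper names `column_sums` and `pearson_r` are hypothetical, and the code follows the computational formula for r given above.

```python
import math

x = [2, 4, 4, 5, 7, 8]      # knowledge factors (X)
y = [4, 10, 8, 8, 14, 16]   # practical factors (Y)


def column_sums(x, y):
    """Return n, ΣX, ΣY, ΣXY, ΣX², ΣY² (the 'Sum' row of the table)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    return (
        len(x),
        sum(x),
        sum(y),
        sum(a * b for a, b in zip(x, y)),
        sum(a * a for a in x),
        sum(b * b for b in y),
    )


def pearson_r(x, y):
    """Pearson r from the column sums."""
    n, sx, sy, sxy, sx2, sy2 = column_sums(x, y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / denominator


print(column_sums(x, y))              # (6, 30, 60, 346, 174, 696)
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))  # 0.9583 0.9184
```

The unrounded r² is 0.9184. Squaring the rounded value 0.96 gives 0.9216 instead, and both round to 0.92.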

After completing the table, substitute the obtained values in the formula:

$$r = \frac{6(346) - (30)(60)}{\sqrt{\left[6(174) - (30)^2\right]\left[6(696) - (60)^2\right]}} = \frac{276}{\sqrt{(144)(576)}} = \frac{276}{288} \approx 0.96$$

r ≈ 0.96 (high positive correlation)

Coefficient of Determination

If we square the value of the Pearson r, we get another descriptive measure called the coefficient of determination, denoted by r². The coefficient of determination tells us the proportion of the variation in Y that can be accounted for by the variation in X. In the previous example, the correlation coefficient r between knowledge factors (X) and practical factors (Y) is 0.96, so r² = (0.96)² ≈ 0.92. This means that about 92% of the variation in practical factors (Y) can be accounted for by the variation in knowledge factors (X).

Regression Analysis

Linear Regression: If two variables are correlated, then it is possible to predict or estimate the value of one variable from knowledge of the other. For example, sales may be treated as the independent variable (X) and profit as the dependent variable (Y). Suppose the sales and price of a product are correlated; then we can predict the future sales of the product in terms of its price. If income and expenditure are correlated, then we can predict expenditure in terms of income. Problems concerning prediction, estimation, and forecasting can be solved using regression analysis. Regression analysis deals with the estimation of one variable based on the changes or movements of the other variable. If X and Y are correlated variables and we want to predict or estimate the value of Y given the value of X, we have to find the regression equation that describes the relationship between the two variables. In general, the regression equation can be represented by Y = a + bX, where a and b are constants and b ≠ 0.


If we know the values of the constants a and b, then we can solve for Y, the dependent variable for any given value of X, the independent variable. Thus, we can predict the value of Y in terms of X by finding the values of a and b in the regression equation.

The constants a and b can be found using the following formulas:

$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b\bar{X}$$

where: Ȳ = average of Y, X̄ = average of X

From the previous example, solving for the values of a and b:

$$b = \frac{6(346) - (30)(60)}{6(174) - (30)^2} = \frac{276}{144} = 1.9167$$

$$a = \frac{60}{6} - \left(\frac{276}{144}\right)\left(\frac{30}{6}\right) = 10 - 9.5833 = 0.4167$$

Then the regression equation is Y = 0.4167 + 1.9167X, and the line can be plotted together with the data.

[Figure: scatter diagram of the training-course data with the fitted line y = 1.9167x + 0.4167, R² = 0.9184]

Note: To use the equation for projection, substitute the value of the independent variable (X), then compute the corresponding value of the dependent variable (Y) algebraically.
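The least-squares constants and a projection can be computed directly from the formulas for a and b. This is a minimal Python sketch; the function name `fit_line` is hypothetical, and the projected knowledge score X = 6 is an illustrative value, not part of the original example.

```python
def fit_line(x, y):
    """Constants a and b of Y = a + bX using the least-squares formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sx2 = sum(p * p for p in x)
    denominator = n * sx2 - sx ** 2
    if denominator == 0:
        raise ValueError("X has no variation; the slope is undefined")
    b = (n * sxy - sx * sy) / denominator
    a = sy / n - b * (sx / n)  # a = mean(Y) - b * mean(X)
    return a, b


x = [2, 4, 4, 5, 7, 8]      # knowledge factors (X)
y = [4, 10, 8, 8, 14, 16]   # practical factors (Y)
a, b = fit_line(x, y)
print(round(a, 4), round(b, 4))  # 0.4167 1.9167

# Projection for a hypothetical trainee with a knowledge score of X = 6
print(round(a + b * 6, 2))       # 11.92
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` should return the same slope and intercept and can serve as an independent check.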
