The document provides an overview of various methods for summarizing categorical (qualitative) data and quantitative data, including frequency distributions, relative frequency distributions, percent frequency distributions, bar graphs, pie charts, dot plots, histograms, cumulative distributions, ogives, stem-and-leaf displays, and cross-tabulations. It also discusses scatter diagrams as a method for summarizing the relationship between two variables. Examples are provided to illustrate each method using sample categorical and quantitative data sets.
Types of data

Categorical data (qualitative data)
Categorical data use labels or names to identify an attribute of each element. They use either the nominal or the ordinal scale of measurement and may be nonnumeric or numeric. Statistical analysis of categorical data is rather limited. A categorical variable is a variable with categorical data.
Examples: car type (sedan, sports car, SUV, minivan, MPV, and so on); method of payment (cash, credit card, check).
[Table: data set for mutual funds]

Quantitative data
Quantitative data are always numeric and use either the interval or the ratio scale of measurement. Ordinary arithmetic operations are meaningful only with quantitative data. A quantitative variable is a variable with quantitative data.
Quantitative data are discrete if they are countable, that is, collected by counting (e.g., the number of items). They are continuous if they are collected by measuring and are expressed on a continuous scale (e.g., the time to failure of a pump component).

Question: what type of data are these?
Types of ships: categorical.
ITS students collecting data on the number of ships entering the inner channel of the Surabaya West Access Channel on a particular day: quantitative, discrete.
The time until a fuel oil simplex filter gets clogged: quantitative, continuous.

Tabular and graphical methods for summarizing data

Summarizing categorical (qualitative) data
Methods: frequency distribution, relative frequency and percent frequency distributions, bar graph, pie chart.

Frequency distribution
A frequency distribution is a tabular summary of data showing the frequency (number) of items in each of several nonoverlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
[Example: Marada Inn]

Relative frequency distribution
The relative frequency of a class is the fraction or proportion of the total number of data items that belong to the class, that is, the class frequency divided by the number of items n.

Percent frequency distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
[Example: Marada Inn, relative frequency and percent frequency distributions]

Bar graph
A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution. On the horizontal axis we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the vertical axis. Using a bar of fixed width drawn above each class label, we extend the height of the bar to the frequency, relative frequency, or percent frequency of that class. The bars are separated to emphasize the fact that each class is a separate category.
[Figure: bar graph, Marada Inn]

Pie chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of 0.25 would consume 0.25(360) = 90 degrees of the circle.
[Figure: pie chart, Marada Inn]

Example: Marada Inn
Insights gained from the preceding pie chart:
One-half of the customers surveyed gave Marada a quality rating of "above average" or "excellent" (looking at the left side of the pie). This might please the manager.
For each customer who gave an "excellent" rating, there were two customers who gave a "poor" rating (looking at the top of the pie). This should displease the manager.
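The frequency, relative frequency, and percent frequency distributions described above can be computed in a few lines. The sketch below is a minimal Python version; the ratings list and the helper name `frequency_tables` are hypothetical (the Marada Inn survey responses appear only as slide images), and the class order is passed in explicitly so that an ordinal scale keeps its natural order.

```python
from collections import Counter

# Hypothetical quality ratings (made up for illustration; not the Marada Inn data).
ratings = [
    "Excellent", "Above Average", "Average", "Poor", "Above Average",
    "Below Average", "Average", "Poor", "Excellent", "Above Average",
]

# Ordinal classes in their natural order.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]


def frequency_tables(data, classes):
    """Return (class, frequency, relative frequency, percent frequency) rows."""
    counts = Counter(data)
    n = len(data)
    rows = []
    for label in classes:
        f = counts[label]               # Counter gives 0 if the class never occurs
        relative = f / n                # relative frequency = frequency / n
        rows.append((label, f, relative, 100 * relative))  # percent = relative * 100
    return rows


table = frequency_tables(ratings, classes)
for label, f, relative, percent in table:
    print(f"{label:<14}{f:>3}{relative:>7.2f}{percent:>7.1f}%")
```

Because every class is listed explicitly, classes with zero observations still appear in the table, which keeps the classes exhaustive.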
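The pie-chart rule above (sector angle = relative frequency × 360 degrees) and the separated bars of a bar graph can be sketched as follows. Both helper names are hypothetical, and the text bar graph is only a crude stand-in for a plotted chart.

```python
def pie_sector_degrees(relative_frequencies):
    """Convert each class's relative frequency into a pie-chart sector angle."""
    total = sum(relative_frequencies.values())
    # Relative frequencies should sum to 1; otherwise the sectors won't fill the circle.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"relative frequencies sum to {total}, not 1")
    # A circle has 360 degrees, so each sector is (relative frequency) * 360.
    return {label: rel * 360 for label, rel in relative_frequencies.items()}


def text_bar_graph(frequencies, symbol="#"):
    """Draw one bar per class on its own line, so the categories stay separated."""
    width = max(len(label) for label in frequencies)
    return "\n".join(
        f"{label:<{width}} | {symbol * count}" for label, count in frequencies.items()
    )


# A class with relative frequency 0.25 gets 0.25 * 360 = 90 degrees.
print(pie_sector_degrees({"A": 0.25, "B": 0.50, "C": 0.25}))
print(text_bar_graph({"Poor": 2, "Average": 5, "Excellent": 3}))
```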
Example: five popular soft drinks
[Slides: relative and percent frequency distributions; bar chart and pie chart]

Summarizing quantitative data
Methods: frequency distribution, relative frequency and percent frequency distributions, dot plot, histogram, cumulative distributions, ogive.

Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken, and the costs of parts, rounded to the nearest dollar, are listed below.
[Table: parts costs for the 50 sampled invoices]

Frequency distribution: guidelines for selecting the number of classes
Use between 5 and 20 classes. Data sets with a larger number of elements usually require a larger number of classes; smaller data sets usually require fewer classes. An approximate formula for the number of classes k may also be used:
$$\text{range} = \text{largest data value} - \text{smallest data value}$$
$$k = 1 + 3.3\log_{10} n, \quad \text{where } n \text{ is the number of observations}$$
$$\text{class interval} = \frac{\text{range}}{k}$$

Frequency distribution: guidelines for selecting the width of classes
Use classes of equal width.
$$\text{approximate class width} = \frac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}}$$
[Example: Hudson Auto Repair, frequency distribution slides]

Insights gained from the percent frequency distribution:
Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.

Dot plot
One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the range of data values, and each data value is represented by a dot placed above the axis.
[Figure: dot plot, Hudson Auto Repair]

Histogram
Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis, and the frequency, relative frequency, or percent frequency is placed on the vertical axis.
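The class-selection guidelines above can be combined into a small helper. This is a sketch under two assumptions not stated on the slides: the logarithm in k = 1 + 3.3 log n is taken to base 10 (as in Sturges' rule), and k is rounded to the nearest integer and then clamped to the recommended 5 to 20 classes. The helper name `suggest_classes` is hypothetical; in practice the resulting width is usually rounded up to a convenient value such as 10.

```python
import math


def suggest_classes(data, min_classes=5, max_classes=20):
    """Suggest a number of classes k and an approximate class width.

    Assumes log base 10 in k = 1 + 3.3 log n, rounds k to the nearest
    integer, then clamps it to the 5-20 guideline.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    data_range = max(data) - min(data)          # range = largest - smallest
    k = round(1 + 3.3 * math.log10(n))          # approximate number of classes
    k = max(min_classes, min(k, max_classes))   # stay within 5..20 classes
    width = data_range / k                      # approximate class width
    return k, width


k, width = suggest_classes(list(range(1, 51)))  # 50 hypothetical observations
print(k, width)
```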
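Once a lower limit and a class width are chosen, each observation is counted into exactly one nonoverlapping class. The sketch below assumes integer data (like costs rounded to the nearest dollar) and integer class limits such as 50-59 and 60-69; the cost values and the helper name are hypothetical, not the Hudson Auto Repair invoices.

```python
def class_frequency_distribution(data, lower, width, k):
    """Count integer data into k equal-width classes starting at `lower`.

    Class i runs from lower + i*width to lower + (i+1)*width - 1,
    e.g. 50-59, 60-69, ... for lower=50 and width=10.
    """
    n = len(data)
    rows = []
    for i in range(k):
        lo = lower + i * width
        hi = lo + width - 1                          # upper class limit
        f = sum(1 for x in data if lo <= x <= hi)    # class frequency
        rows.append((f"{lo}-{hi}", f, 100 * f / n))  # label, frequency, percent
    # Classes must be exhaustive: every observation falls in some class.
    if sum(f for _, f, _ in rows) != n:
        raise ValueError("some data values fall outside the classes")
    return rows


# Hypothetical parts costs in dollars (not the actual Hudson Auto Repair invoices).
costs = [52, 61, 64, 68, 71, 73, 75, 77, 79, 84, 88, 93, 97, 101, 105]
for label, f, pct in class_frequency_distribution(costs, lower=50, width=10, k=6):
    print(f"${label:<8} {f:>3} {pct:6.1f}%")
```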
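A dot plot places one dot above the axis for each data value, stacking repeated values. A crude text version, assuming integer data and one character column per integer value (the function name `text_dot_plot` is hypothetical), is:

```python
from collections import Counter


def text_dot_plot(values, dot="o"):
    """Return the rows of a text dot plot, top row first, axis last."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    height = max(counts.values())
    rows = []
    # Build from the tallest stack down, so repeated values stack upward.
    for level in range(height, 0, -1):
        rows.append("".join(dot if counts[v] >= level else " " for v in range(lo, hi + 1)))
    rows.append("-" * (hi - lo + 1))   # horizontal axis spanning the data range
    return rows


for row in text_dot_plot([3, 5, 5, 6, 8, 8, 8]):
    print(row)
```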
In a histogram, a rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between the rectangles of adjacent classes.
[Figure: histogram, Hudson Auto Repair]

Cumulative distributions
The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class.
The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class.
The cumulative percent frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class.
[Example: Hudson Auto Repair, cumulative distributions]

Ogive
An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis; the vertical axis shows the cumulative frequencies, the cumulative relative frequencies, or the cumulative percent frequencies. The cumulative value (one of the above) of each class is plotted as a point at the upper end of the class, and the plotted points are connected by straight lines.

Example: Hudson Auto Repair (ogive)
Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on. These gaps are eliminated by plotting points halfway between the class limits. Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
[Figure: Hudson Auto Repair, ogive with cumulative percent frequencies]

Exploratory data analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

Stem-and-leaf display
A stem-and-leaf display shows both the rank order and the shape of the distribution of the data. It is similar to a histogram turned on its side, but it has the advantage of showing the actual data values.
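Returning to the cumulative distributions described above: all three follow from running totals of the class frequencies. In this sketch the classes are hypothetical (upper limit, frequency) pairs in ascending order, and `cumulative_distribution` is an illustrative name.

```python
from itertools import accumulate


def cumulative_distribution(classes):
    """classes: (upper_limit, frequency) pairs sorted by upper limit.

    Returns (upper_limit, cumulative frequency, cumulative relative frequency,
    cumulative percent frequency) for each class, i.e. counts of items with
    values less than or equal to the class's upper limit.
    """
    n = sum(f for _, f in classes)
    running = accumulate(f for _, f in classes)   # running totals of frequencies
    return [
        (upper, cf, cf / n, 100 * cf / n)
        for (upper, _), cf in zip(classes, running)
    ]


# Hypothetical class frequencies for classes 50-59, 60-69, ..., 100-109.
table = cumulative_distribution([(59, 1), (69, 3), (79, 5), (89, 2), (99, 2), (109, 2)])
for upper, cf, crf, cpf in table:
    print(f"<= {upper:>3}: {cf:>2}  {crf:.3f}  {cpf:5.1f}%")
```

The last row always reaches the total count, a relative frequency of 1, and 100%.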
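The ogive adjustment described for the Hudson Auto Repair data (plotting each class halfway between its upper limit and the next lower limit, e.g. 59.5 for the 50-59 class) can be turned into a list of points to connect with straight lines. The starting point at 0% half a unit below the first lower limit is a common convention assumed here, not something stated on the slides; the frequencies and the helper name are hypothetical.

```python
def ogive_points(classes):
    """classes: (lower, upper, frequency) triples with integer class limits.

    Returns (x, cumulative percent frequency) points. Each class is plotted at
    its upper limit + 0.5 (59.5 for the 50-59 class), closing the one-unit gaps
    between integer class limits.
    """
    n = sum(f for _, _, f in classes)
    # Assumed convention: start the curve at 0% half a unit below the first class.
    points = [(classes[0][0] - 0.5, 0.0)]
    running = 0
    for _, upper, f in classes:
        running += f                              # cumulative frequency so far
        points.append((upper + 0.5, 100 * running / n))
    return points


# Hypothetical frequencies for classes 50-59, 60-69, ..., 100-109.
classes = [(50, 59, 1), (60, 69, 3), (70, 79, 5), (80, 89, 2), (90, 99, 2), (100, 109, 2)]
for x, pct in ogive_points(classes):
    print(f"{x:6.1f} {pct:6.1f}%")
```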
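A stem-and-leaf display can be sketched by scaling each value by the leaf unit, taking the last digit as the leaf and the leading digits as the stem, and listing the leaves in rank order. The helper below is hypothetical; it assumes nonnegative data and also covers the leaf units and the stretched variant (leaves 0-4 and 5-9 on separate lines) discussed in the stem-and-leaf slides. The sample costs are made up.

```python
def stem_and_leaf(data, leaf_unit=1, stretched=False):
    """Return the lines of a stem-and-leaf display for nonnegative data.

    Each value is divided by leaf_unit and rounded to an integer; its last digit
    is the leaf and the remaining digits form the stem. With stretched=True,
    each stem gets two lines: leaves 0-4 first, then leaves 5-9.
    """
    scaled = sorted(round(x / leaf_unit) for x in data)   # rank order
    if scaled[0] < 0:
        raise ValueError("this sketch assumes nonnegative data")
    first, last = scaled[0] // 10, scaled[-1] // 10
    width = len(str(last))
    lines = []
    for stem in range(first, last + 1):                    # keep empty stems
        leaves = [v % 10 for v in scaled if v // 10 == stem]
        if stretched:
            groups = [[d for d in leaves if d <= 4], [d for d in leaves if d >= 5]]
        else:
            groups = [leaves]
        for group in groups:
            lines.append(f"{stem:>{width}} | " + "".join(map(str, group)))
    return lines


costs = [52, 61, 64, 68, 71, 73, 75, 77, 79, 84, 88, 93, 97, 101, 105]  # hypothetical
print("\n".join(stem_and_leaf(costs)))
print("\n".join(stem_and_leaf(costs, stretched=True)))
print("\n".join(stem_and_leaf([1.2, 1.5, 2.3], leaf_unit=0.1)))  # "1 | 25" means 1.2 and 1.5
```

Rounding after dividing by the leaf unit guards against floating-point results such as 1.2 / 0.1 landing just below 12.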
In a stem-and-leaf display, the first digits of each data item are arranged to the left of a vertical line, and the last digit of each item is recorded to the right of the vertical line in rank order. Each line in the display is referred to as a stem, and each digit on a stem is a leaf.
[Example: Hudson Auto Repair, stem-and-leaf display]

Stretched stem-and-leaf display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit (or digits). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second corresponds to leaf values of 5-9.
[Example: Hudson Auto Repair, stretched stem-and-leaf display]

Stem-and-leaf display: leaf units
A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1.
[Examples: leaf unit = 0.1; leaf unit = 10]

Crosstabulations and scatter diagrams
Thus far we have focused on methods that summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that help in understanding the relationship between two variables. Crosstabulation and the scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.

Crosstabulation
A crosstabulation is a tabular method for summarizing the data for two variables simultaneously. It can be used when one variable is qualitative and the other is quantitative, when both variables are qualitative, or when both variables are quantitative. The left and top margin labels define the classes for the two variables.

[Example: Finger Lakes Homes]
Crosstabulation: the Finger Lakes Homes example counts the number of homes sold over the past two years for each style and price class.
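A crosstabulation of two categorical variables is a count of each (row class, column class) pair plus margin totals. In the sketch below the home records are hypothetical (not the Finger Lakes Homes sample) and `crosstab` is an illustrative helper; a quantitative variable such as price would first be grouped into classes, as with the two price classes used here.

```python
from collections import Counter


def crosstab(pairs, row_classes, col_classes):
    """Count (row, column) pairs and add row and column totals."""
    counts = Counter(pairs)
    table = {}
    for r in row_classes:
        row = {c: counts[(r, c)] for c in col_classes}
        row["Total"] = sum(row.values())          # row margin
        table[r] = row
    # Column margins, including the grand total.
    table["Total"] = {
        c: sum(table[r][c] for r in row_classes) for c in list(col_classes) + ["Total"]
    }
    return table


styles = ["Split-Level", "A-Frame"]
prices = ["<= $99,000", "> $99,000"]
# Hypothetical (style, price class) records, one per home sold.
homes = [
    ("Split-Level", "<= $99,000"), ("Split-Level", "<= $99,000"),
    ("Split-Level", "> $99,000"), ("A-Frame", "<= $99,000"),
    ("A-Frame", "> $99,000"), ("A-Frame", "> $99,000"),
]
for style, row in crosstab(homes, styles, prices).items():
    print(f"{style:<12}", row)
```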
[Table: Finger Lakes Homes, number of homes sold by style and price]

Example: Finger Lakes Homes
Insights gained from the preceding crosstabulation:
The greatest number of homes in the sample (19) are of the split-level style and priced at or below $99,000.
Only three homes in the sample are of the A-Frame style and priced above $99,000.

Crosstabulation: row or column percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
[Example: Finger Lakes Homes, row and column percentages]

Scatter diagram
A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables.
[Figures: scatter diagram; Example: Panthers Football Team]

Example: Panthers Football Team
The preceding scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored: more points scored are associated with more interceptions. The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
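The row and column percentages mentioned for crosstabulations divide each cell by its row total or by its column total. A minimal sketch, assuming the table is a dict of dicts of counts without margin totals and that no row or column total is zero; the counts and helper names are hypothetical.

```python
def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * count / total for col, count in cells.items()}
    return result


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {
        row: {c: 100 * cells[c] / col_totals[c] for c in columns}
        for row, cells in table.items()
    }


counts = {  # hypothetical home counts by style (rows) and price class (columns)
    "Split-Level": {"<= $99,000": 3, "> $99,000": 1},
    "A-Frame": {"<= $99,000": 1, "> $99,000": 3},
}
print(row_percentages(counts))
print(column_percentages(counts))
```

Row percentages answer "within this style, how are prices distributed?", while column percentages answer "within this price class, how are styles distributed?".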
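A scatter diagram places one mark per observation at its (x, y) position. As a rough text-only sketch (hypothetical helper and data, not the Panthers Football Team figures), points can be scaled onto a character grid; an upward-sloping band of marks suggests a positive relationship.

```python
def text_scatter(points, width=20, height=10, mark="*"):
    """Place (x, y) points on a character grid; y increases upward."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate linearly onto the grid (0 if all values are equal).
        col = round((x - x0) / (x1 - x0) * (width - 1)) if x1 > x0 else 0
        row = round((y - y0) / (y1 - y0) * (height - 1)) if y1 > y0 else 0
        grid[height - 1 - row][col] = mark       # row 0 of the grid is the top
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical (interceptions, points scored) pairs.
games = [(1, 14), (3, 24), (2, 17), (1, 7), (3, 30)]
print("\n".join(text_scatter(games)))
```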