13-Module 5 - ROC Curve Analysis - Introduction and Motivation-26-09-2023
MAT1031 Bio-Statistics, Module 5: ROC Curve Analysis

ROC Curve Analysis

Receiver-Operating Characteristic (ROC) Analysis. A ROC curve is a plot of the true positive rate (sensitivity) as a function of the false positive rate (100 − specificity) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between two diagnostic groups (diseased/normal).

ROC Curve Analysis (Continuation)

The ROC curve is a graph showing the true positive rate on the vertical axis and the false positive rate on the horizontal axis as the classification threshold t varies. It is a single curve summarizing the information in the cumulative distribution functions of the scores of the two classes. One can think of it as a complete representation of classifier performance across all choices of the classification threshold t.

ROC Curve Analysis (Continuation)

The diagnostic performance of a test, that is, the accuracy of a test in discriminating diseased cases from normal cases, is evaluated using Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993). ROC curves can also be used to compare the diagnostic performance of two or more laboratory or diagnostic tests (Griner et al., 1981). When you consider the results of a particular test in two populations, one population with a disease and the other population without the disease, you will rarely observe a perfect separation between the two groups. Instead, the distributions of the test results will overlap, as shown in the figure.
ROC Curve Analysis (Continuation)

[Figure: overlapping distributions of test results for the diseased and normal populations, with a criterion (cut-off) value marked on the test-result axis.]

ROC Curve Analysis (Continuation)

For every possible cut-off point or criterion value you select to discriminate between the two populations, there will be some cases with the disease correctly classified as positive (TP = true positive fraction), but some cases with the disease will be classified as negative (FN = false negative fraction). On the other hand, some cases without the disease will be correctly classified as negative (TN = true negative fraction), but some cases without the disease will be classified as positive (FP = false positive fraction).

Schematic Outcomes of a Test

[Figure: 2 × 2 table of test outcomes: true positive (TP), false positive (FP), false negative (FN), true negative (TN).]

Sensitivity and Specificity

Sensitivity = TP / (TP + FN),    Specificity = TN / (TN + FP).

Sensitivity and Specificity

Suppose that t is the value of the threshold T in a particular classification rule, so that an individual is allocated to population P if its classification score s exceeds t, and otherwise to population N. In order to assess the efficacy of this classifier we need to calculate the probability of making an incorrect allocation. Such a probability tells us the rate at which future individuals requiring classification will be misallocated. More specifically, we can define four probabilities and their associated rates for the classifier:
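The sensitivity and specificity formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the module; the counts and the helper names `sensitivity` and `specificity` are made up for the example.

```python
# Sketch: sensitivity and specificity from the four outcome counts
# of a diagnostic test at one cut-off. Counts are illustrative.

def sensitivity(tp, fn):
    """TP / (TP + FN): proportion of diseased cases called positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP): proportion of healthy cases called negative."""
    return tn / (tn + fp)

# Hypothetical results: 90 diseased (80 detected), 100 healthy (95 cleared).
sens = sensitivity(tp=80, fn=10)
spec = specificity(tn=95, fp=5)
print(sens, spec)
```

Moving the cut-off trades one quantity against the other: a lower threshold raises sensitivity but lowers specificity, which is exactly the trade-off the ROC curve displays.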
1. the probability that an individual from P is correctly classified, i.e. the true positive rate tp = p(s > t | P);
2. the probability that an individual from N is misclassified, i.e. the false positive rate fp = p(s > t | N);
3. the probability that an individual from N is correctly classified, i.e. the true negative rate tn = p(s ≤ t | N); and
4. the probability that an individual from P is misclassified, i.e. the false negative rate fn = p(s ≤ t | P).

Sensitivity and Specificity

Given probability densities p(s|P) and p(s|N), numerical values lying between 0 and 1 can be obtained readily for these four rates, and this gives a full description of the performance of the classifier. Clearly, for good performance, we require high "true" rates and low "false" rates. However, this is for a particular choice of threshold t, and the best choice of t is not generally known in advance but must be determined as part of the classifier construction. Varying t and evaluating all four quantities above will clearly give full information on which to base this decision and hence to assess the performance of the classifier. But since tp + fn = 1 and fp + tn = 1, we do not need all four quantities: tp and fp carry all the information. The ROC curve provides a more easily digestible summary. It is the curve obtained on varying t, where fp is the value on the horizontal axis (abscissa) and tp the value on the vertical axis (ordinate).

Sensitivity and Specificity

Let us consider the extremes. The classifier will be least successful when the two populations are exactly the same, so that p(s|P) = p(s|N) = p(s), say. In such a case the probability of allocating an individual to population P is the same whether that individual has come from P or from N, the exact value of this probability depending on the threshold value t. So in this case, as t varies, tp will always equal fp, and the ROC curve will be the straight line joining the points (0, 0) and (1, 1).
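The construction just described, tracing (fp, tp) pairs as the threshold t sweeps over the scores, can be sketched empirically. This is an illustrative sketch; the scores and the helper name `roc_points` are made up, and the rule "allocate to P when s ≥ t" stands in for the text's s > t at the observed score values.

```python
# Sketch: trace an empirical ROC curve by sweeping the threshold t
# over all observed scores and recording (fp, tp) pairs.

def roc_points(scores_p, scores_n):
    thresholds = sorted(set(scores_p) | set(scores_n), reverse=True)
    pts = [(0.0, 0.0)]  # t above every score: nothing allocated to P
    for t in thresholds:
        tp = sum(s >= t for s in scores_p) / len(scores_p)
        fp = sum(s >= t for s in scores_n) / len(scores_n)
        pts.append((fp, tp))
    return pts

scores_p = [3, 5, 6, 8]   # illustrative scores from population P
scores_n = [1, 2, 4, 7]   # illustrative scores from population N
pts = roc_points(scores_p, scores_n)
```

As the text notes, only tp and fp are needed: tn and fn follow from tp + fn = 1 and fp + tn = 1.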
This line is usually called the chance diagonal, as it represents essentially random allocation of individuals to one of the two populations. The figure shows three such curves plus the chance diagonal. The solid curve corresponds to the best classifier, because at any fp value it has a higher tp than all the others, while at any tp value it has a lower fp.

Sensitivity and Specificity

[Figure: three ROC curves plus the chance diagonal; tp on the vertical axis, fp on the horizontal axis.]

ROC - Area Under the Curve (AUC)

The ROC curve is plotted with TPR on the y-axis against FPR on the x-axis. The area under a receiver operating characteristic (ROC) curve, abbreviated as AUC, is a single scalar value that measures the overall performance of a binary classifier (Hanley and McNeil, 1982). The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between the classes.

Area under ROC Curve - AUC

AUC measures the entire two-dimensional area underneath the ROC curve from (0, 0) to (1, 1). The AUC value lies within the range [0.0, 1.0]: an uninformative (random) classifier has an AUC of 0.5, while the maximum value corresponds to a perfect classifier (i.e., one with a classification error rate of zero). As AUC ranges in value from 0 to 1, a model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.

Area under ROC Curve - AUC

AUC is desirable for the following two reasons:
• AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
• AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of which classification threshold is chosen.
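Scale-invariance can be checked numerically using the rank interpretation of AUC as p(S_P > S_N) (Hanley and McNeil, 1982): a strictly increasing rescaling of the scores leaves every pairwise comparison, and hence the AUC, unchanged. A sketch with made-up scores and a hypothetical helper name `auc_by_ranks`:

```python
# Sketch: AUC via the rank interpretation, and its invariance under a
# strictly increasing transformation of the scores (here, exp).
import math

def auc_by_ranks(scores_p, scores_n):
    wins = sum(sp > sn for sp in scores_p for sn in scores_n)
    ties = sum(sp == sn for sp in scores_p for sn in scores_n)
    return (wins + 0.5 * ties) / (len(scores_p) * len(scores_n))

scores_p = [0.8, 1.5, 2.2, 3.0]  # illustrative scores from P
scores_n = [0.1, 0.7, 1.9, 2.5]  # illustrative scores from N
a1 = auc_by_ranks(scores_p, scores_n)
a2 = auc_by_ranks([math.exp(s) for s in scores_p],
                  [math.exp(s) for s in scores_n])  # monotone rescaling
```

The same invariance is proved formally as Property 2 of the ROC later in this module.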
Area under ROC Curve - AUC

The AUC is a robust overall measure for evaluating the performance of score classifiers because its calculation relies on the complete ROC curve and thus involves all possible classification thresholds. The AUC is typically calculated by adding the areas of successive trapezoids below the ROC curve.

Area under ROC Curve - AUC

The figure shows the ROC curves for two score classifiers A and B. In this example, classifier A has a larger AUC value than classifier B.

Properties of the ROC

To study some of the properties of the ROC, let us define it in familiar mathematical notation as the curve y = h(x), where y is the true positive rate tp and x is the false positive rate fp. The points (x, y) on the curve are determined by the classification score threshold t, and x and y can be written more precisely as functions of this parameter, viz. x(t) = p(s > t | N) and y(t) = p(s > t | P). However, we will only use this expanded notation when the presence of the parameter needs to be emphasized.

Property 1. y = h(x) is a monotone increasing function in the positive quadrant, lying between y = 0 at x = 0 and y = 1 at x = 1.

Proof: Consideration of the way that the classifier scores are arranged shows that both x(t) and y(t) increase and decrease together as t varies. Moreover, lim_{t→∞} x(t) = lim_{t→∞} y(t) = 0 and lim_{t→−∞} x(t) = lim_{t→−∞} y(t) = 1, which establishes the result.

Property 2. The ROC curve is unaltered if the classification scores undergo a strictly increasing transformation.

Proof: Suppose that U = φ(S) is a strictly increasing transformation, i.e. S₁ > S₂ ⇒ U₁ = φ(S₁) > U₂ = φ(S₂). Consider the point on the ROC curve for S at threshold value t, and let v = φ(t).
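The trapezoid summation just described can be sketched in a few lines. The ROC points below are illustrative, not taken from the module's figure, and the helper name `auc_trapezoid` is ours.

```python
# Sketch: AUC by summing successive trapezoid areas below a
# piecewise-linear ROC curve given as (fp, tp) points.

def auc_trapezoid(points):
    """points: (fp, tp) pairs ordered by fp, from (0, 0) to (1, 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # area of one trapezoid
    return area

roc = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.25), (0.25, 0.75),
       (0.5, 0.75), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]
print(auc_trapezoid(roc))  # 0.75
```

Vertical segments (where fp does not change) contribute zero area, so the staircase shape of an empirical ROC curve is handled automatically.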
Then it follows that

p(U > v | P) = p(φ(S) > φ(t) | P) = p(S > t | P),
p(U > v | N) = p(φ(S) > φ(t) | N) = p(S > t | N),

so that the same point exists on the ROC curve for U. Applying the reverse argument to each point on the ROC curve for U establishes that the two curves are identical.

Property 3. Provided that the slope of the ROC curve at the point with threshold value t is well defined, it is given by

dy/dx = p(t | P) / p(t | N).

Proof: First note that

y(t) = p(S > t | P) = ∫_t^∞ p(s | P) ds,  so  dy/dt = −p(t | P).

Moreover,

x(t) = p(S > t | N) = ∫_t^∞ p(s | N) ds,  so  dx/dt = −p(t | N).

Also dt/dx = 1 / (dx/dt), so

dy/dx = (dy/dt)(dt/dx) = p(t | P) / p(t | N),

which establishes the result.

Area under the ROC curve

Probably the most widely used summary index is the area under the ROC curve, commonly denoted AUC and studied by Green and Swets (1966), Bamber (1975), Hanley and McNeil (1982), and Bradley (1997), among others. Simple geometry establishes the upper and lower bounds of AUC: for the case of perfect separation of P and N, AUC is the area under the upper borders of the ROC (i.e., the area of a square of side 1), so the upper bound is 1.0, while for the case of random allocation AUC is the area under the chance diagonal (i.e., the area of a triangle whose base and height are both equal to 1), so the lower bound is 0.5. For all other cases, the formal definition is

AUC = ∫₀¹ y(x) dx.

In other words, if S_P and S_N are the scores allocated to randomly and independently chosen individuals from P and N respectively, then

AUC = p(S_P > S_N).

To prove this result, we start from the definition of AUC given above and change the variable of integration from the fp rate x to the classifier threshold t. From the proof of Property 3 of the ROC given above, we first recollect that y(t) = p(S > t | P), x(t) = p(S > t | N), and dx/dt = −p(t | N), and we also note that x → 0 as t → ∞ and x → 1 as t → −∞. Hence

AUC = ∫₀¹ y(x) dx
    = −∫_∞^{−∞} y(t) p(t | N) dt   (on changing the variable of integration)
    = ∫_{−∞}^{∞} p(S > t | P) p(t | N) dt   (from the results above)
    = ∫_{−∞}^{∞} p(S_P > t) p(S_N = t) dt
    = p(S_P > S_N)   (by total probability),

as required.
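The identity AUC = p(S_P > S_N) can be verified numerically for a small sample: the fraction of (P, N) score pairs won by the P score equals the trapezoidal area under the empirical ROC curve. This is an illustrative sketch with made-up scores.

```python
# Sketch: check AUC = p(S_P > S_N) against the trapezoidal area under
# the empirical ROC curve for the same (tie-free) scores.

scores_p = [3, 5, 6, 8]   # illustrative scores from P
scores_n = [1, 2, 4, 7]   # illustrative scores from N

# Rank-based estimate: fraction of (P, N) pairs won by the P score.
pairs = [(sp, sn) for sp in scores_p for sn in scores_n]
auc_rank = sum(sp > sn for sp, sn in pairs) / len(pairs)

# Trapezoidal area under the empirical ROC curve for the same scores.
thresholds = sorted(set(scores_p) | set(scores_n), reverse=True)
pts = [(0.0, 0.0)] + [
    (sum(s >= t for s in scores_n) / len(scores_n),
     sum(s >= t for s in scores_p) / len(scores_p))
    for t in thresholds
]
auc_trap = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

print(auc_rank, auc_trap)  # both 0.75 for this sample
```

With ties in the scores, the rank estimate conventionally counts each tied pair as 1/2, matching the trapezoid rule applied across the tied threshold.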
2.5 The binormal model

The normal probability distribution has long formed a cornerstone of statistical theory. It is used as the population model for very many situations where the measurements are quantitative, and hence it underpins most basic inferential procedures for such measurements. The reasons for this are partly because empirical evidence suggests that many measurements taken in practice do actually behave roughly like samples from normal populations, but also partly because mathematical results such as the central limit theorem show that the normal distribution provides a perfectly adequate approximation to the true probability distribution of many important statistics. The normal model is thus a "standard" against which any other suggestion is usually measured in common statistical practice.

Likewise, for ROC analysis, it is useful to have such a standard model which can be adopted as a first port of call, in the expectation that it will provide a reasonable analysis in many practical situations, and against which any specialized analysis can be judged. Such a benchmark is provided by the binormal model, in which we assume the scores S of the classifier to have a normal distribution in each of the two populations P and N. This model will always be "correct" if the original measurements X have multivariate normal distributions in the two populations and the classifier is a linear function of the measurements of the type first derived by Fisher (1936), as is shown in any standard multivariate textbook (e.g., Krzanowski and Marriott, 1995, pp. 29-30). However, it is also approximately correct for a much wider set of measurement populations and classifiers. Moreover, as we shall see later in this section, this class is even wider in the specific case of ROC analysis. First, however, let us explore some of the consequences of the binormal assumption.

To be specific, we assume that the distributions of the scores S are normal in both populations, with means µ_P and µ_N
and standard deviations σ_P and σ_N in P and N respectively. In accord with the convention that large values of S are indicative of population P and small ones indicative of population N, we further assume that µ_P > µ_N, but we place no constraints on the standard deviations. Then (S − µ_P)/σ_P has a standard normal distribution in P, and (S − µ_N)/σ_N has a standard normal distribution in N.

Suppose that the fp rate is x, with corresponding classifier threshold t. Then

x(t) = p(S > t | N) = p(Z > (t − µ_N)/σ_N),

where Z has a standard normal distribution. Thus

x(t) = p(Z < (µ_N − t)/σ_N)   (by symmetry of the normal distribution),

so x(t) = Φ((µ_N − t)/σ_N), where Φ(·) is the standard normal cumulative distribution function (cdf). Thus if z_x = Φ⁻¹(x) is the value of Z giving rise to this cdf value, then z_x = (µ_N − t)/σ_N and hence

t = µ_N − σ_N z_x = µ_N − σ_N Φ⁻¹(x).

Hence the ROC curve at this fp rate is

y(x) = p(S > t | P) = p(Z > (t − µ_P)/σ_P) = Φ((µ_P − t)/σ_P),

and on substituting for the value of t from above we obtain

y(x) = Φ( (µ_P − µ_N + σ_N Φ⁻¹(x)) / σ_P ).

Thus the ROC curve is of the form y(x) = Φ(a + b Φ⁻¹(x)), or equivalently Φ⁻¹(y) = a + b Φ⁻¹(x), where a = (µ_P − µ_N)/σ_P and b = σ_N/σ_P. It follows from the earlier assumptions that a > 0, while b is clearly nonnegative by definition. The former is known as the intercept of the binormal ROC curve, and the latter as its slope.

[Figure: three ROC curves derived from simple binormal models.] The top curve (dotted line) is for the case µ_N = 0, µ_P = 4, σ_N = σ_P = 1: a mean difference in classification scores of 4 standardized units gives virtually complete separation of the two normal populations, so the ROC curve is very close to the best possible one. The middle curve (solid line) is for µ_N = 0, µ_P = 2, σ_N = σ_P = 1, and the reduction in the separation of the means to 2 standardized units is reflected in the poorer ROC curve.
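The binormal form y(x) = Φ(a + b Φ⁻¹(x)) is easy to evaluate with the standard library's `statistics.NormalDist` (Python 3.8+), which supplies both Φ (`cdf`) and Φ⁻¹ (`inv_cdf`). A sketch using the middle curve's parameters from the figure (µ_N = 0, µ_P = 2, σ_N = σ_P = 1); the function name `binormal_roc` is ours.

```python
# Sketch: evaluate the binormal ROC curve y(x) = Phi(a + b * Phi^{-1}(x)).
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf is Phi, Z.inv_cdf is Phi^{-1}

def binormal_roc(x, mu_p, mu_n, sigma_p, sigma_n):
    a = (mu_p - mu_n) / sigma_p   # intercept of the binormal ROC
    b = sigma_n / sigma_p         # slope of the binormal ROC
    return Z.cdf(a + b * Z.inv_cdf(x))

xs = [0.05, 0.1, 0.25, 0.5]  # fp rates strictly inside (0, 1)
ys = [binormal_roc(x, mu_p=2, mu_n=0, sigma_p=1, sigma_n=1) for x in xs]
```

Note that Φ⁻¹ is undefined at x = 0 and x = 1, so the endpoints of the curve are handled as limits rather than evaluated directly.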
Finally, the bottom curve (dashed line) has the same values as the middle one except for σ_P = 2; the further deterioration in performance is caused by the imbalance in the standard deviations of the two populations, the higher standard deviation in population P effectively diluting the difference in the population means. It should also be mentioned at this point that if one distribution has a sufficiently large standard deviation relative to that of the other, and to the difference between the population means, then the ROC curve will dip below the chance diagonal.

One very useful consequence of this model is that its AUC can be derived very easily and has a very simple form. We recall that AUC = p(S_P > S_N) = p(S_P − S_N > 0) for independent S_P and S_N. But standard theory tells us that if S_P ~ N(µ_P, σ_P²) independently of S_N ~ N(µ_N, σ_N²), then S_P − S_N ~ N(µ_P − µ_N, σ_P² + σ_N²). Hence, if Z denotes a standard normal random variable,

AUC = p(Z > −(µ_P − µ_N)/√(σ_P² + σ_N²)) = Φ( (µ_P − µ_N)/√(σ_P² + σ_N²) ) = Φ( a/√(1 + b²) ).

Kullback-Leibler Divergence (KLD)

KLD (Kullback and Leibler, 1951). To measure the difference between two probability distributions over the same variable, a measure called the Kullback-Leibler divergence, or simply the KL divergence, has been popularly used in the data-mining literature. The concept originated in probability theory and information theory. The KL divergence, which is closely related to relative entropy, information divergence, and information for discrimination, is a non-symmetric measure of the difference between two probability distributions p(x) and q(x). Specifically, the Kullback-Leibler (KL) divergence of q(x) from p(x), denoted D_KL(p(x) || q(x)), is a measure of the information lost when q(x) is used to approximate p(x). For discrete probability distributions p(x) and q(x) with p(x) > 0 and q(x) > 0, each summing to 1 over the domain X, D_KL(p(x) || q(x)) is defined as

D_KL(p(x) || q(x)) = Σ_{x ∈ X} p(x) ln( p(x) / q(x) ).

KLD (Continuation)

The KL divergence measures the expected number of extra bits (or nats, when the natural logarithm is used) required to code samples from p(x) when using a code based on q(x), rather than using a code based on p(x). Typically p(x) represents the "true" distribution of data or observations, or a precisely calculated theoretical distribution. The measure q(x) typically represents a theory, model, description, or approximation of p(x).
"The continuous versio of the KE divergence Dake) Although the KL divergence measures the “distance” Ietween two dst bation, is uot distance mere, This is becatne that the KL divergence ist mete tansure, Te bs not symmtricr the KL from plc) 0 els) 8 severally wot the sume asthe KL fom q(x) to fx). Furthermore, 1 a fy triangular inequality: Nevertheless, Dier(PI}Q) i 4 non Dye(PIIQ) 20 ane Dye PI1Q) =0 if al oly i P= Q. Notice that attention should be pad when computing the KL divergence. Wi nn Fino Plagp =0. Homer, when p #0 tnt q = 0, Dx (pla) i let as a This met tat if ome event predicts its awolutely imperil (Le, gle) = 0), then the two distributions ar note diferent, KLD - Example Dietibston Distibtion a Binomial wipe 04, N=? oan | Bk Dwnmammate)| 4 E IKLD - Example (Continuation) Relative entropies Da. (P || @) and Dic.(Q | P) are calculated asfllows. Da(P 1) = FP in Z2) (228) 12, (22708) | 4 (4728 a(Fa)* "(aa )*s"(4s) = 3 (62In(2) + 551n(8) ~s01n(6)) m 0.0852000 Prat Ql P)= Sate 0 22 ) -10(@8) 405) 40(B) i =F AAln() — 6tn(9) + 61n(6)) = 0.087455 KLD Expres: For conic inter varie me Be A Nao) fr eer a Na) fe im th pando dente he ops ff, a ‘ett enna, opie) Then arlene at ‘he KLDs me a f)= A) for the spams bi-Nomal ROC curve i partic, My fda dltoh d=! ns for Bi-Normal ROC Model IKLD for Bi-Normal ROC Model (Continuation) ‘The KLDs ae now sand we can write these os No fy= eds KLD for Bi-Normal ROC Model (Continuation) Figure, alysis of @biNormal ROC curve. The gph shows the Kulfhok-Leibler ivergencs 15) (he oli ine) aod 1) (he dashed ite) fo two Normal desis: i) for cases hs = 3-4 ad said over ange tat ices y= and efor evils has = 20 and oy = 1 Wham 1 6) ~ A) a the corresponding ROC carve symm about he neti digs When 236 = 1-16) sf 298 the conesponting ROC curve is TPP: when 23 IVs)” Mf ad be csrespnding ROC cuve is INP %
