
AK Softwaretechnologie 1: Institute For Software Technology (IST) Graz University of Technology

This document summarizes an empirical evaluation of an L-debugger and describes an approach to automated program repair using genetic programming. The L-debugger was tested on various L programs and test suites, with the Ochiai coefficient always locating errors and the Jaccard and Tarantula coefficients sometimes failing. An automated repair approach using genetic programming evolves new program variants using test cases until one passes. It makes changes based on existing statements and only modifies error-related code regions. Experiments showed it can repair programs but more test cases are needed for general fixes.

Copyright
© Attribution Non-Commercial (BY-NC)

Institute for Software Technology (IST) Graz University of Technology

AK Softwaretechnologie 1
716.174 and 716.175

Practical Exercises Task 3

Josef Wachtler, 0730659

Last update: January 12, 2012

Contents
1 Introduction
2 Empirical Evaluation of the L-Debugger
2.1 L-Programs and Testsuites
2.2 Results
3 Automatically Finding Patches Using Genetic Programming
3.1 Basic Concepts
3.2 Algorithm Details
3.2.1 Program Representation
3.2.2 Genetic Programming
3.2.3 Genetic Operators
3.3 Experiments
4 Conclusion

1 Introduction
This document presents the results of Task 3 of the lecture AK St1 [AKSt1]. First, it documents an empirical evaluation of the debugger for the programming language L that was developed within Task 2: the L-programs and test suites used are shown, and the obtained results are explained and evaluated. The second part of this document describes an automated program repair approach using genetic programming [Weimer2009] by summarizing a detailed report on this algorithm.

2 Empirical Evaluation of the L-Debugger


This part presents an empirical evaluation (Task 3) of the L-Debugger that was developed within Task 2 [AKSt1]. The debugger was run on several test programs and test suites (see Section 2.1), and the obtained results are compared and evaluated in Section 2.2. The L-Debugger uses the spectrum-based debugging approach [Wotawa2011] [Abreu2006] [Chen2002] [Jones2005], which is based on the assumption that an erroneous statement is executed more often in failing test runs than in passing ones. The approach computes three rankings of the statements using three different coefficients (Jaccard, Tarantula and Ochiai). The computed coefficients indicate how likely a statement is to be erroneous. The statements/lines with the highest coefficient therefore form the primary error set, i.e. the most likely location of the error, which the debugger reports as its result. Consequently, the quality of the result depends on the one hand on the coefficients and their success rate, and on the other hand on the size of the primary error set relative to the size of the program.
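The three coefficients are usually defined over four counters per statement: how many failing and passing runs execute it, and how many do not. The following C sketch computes them with the standard formulas from the spectrum-based fault localization literature (e.g. [Abreu2006]). The type and function names are hypothetical, and the code is not the L-Debugger's implementation, which this report does not show.

```c
/* Spectrum of one statement (hypothetical names):
 *   n_ef: failing runs that execute the statement
 *   n_ep: passing runs that execute the statement
 *   n_nf: failing runs that do not execute it
 *   n_np: passing runs that do not execute it */
typedef struct {
    int n_ef, n_ep, n_nf, n_np;
} Spectrum;

/* Newton iteration for the square root; used instead of sqrt() from
 * <math.h> only so that this sketch links without -lm. */
static double square_root(double x) {
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0; /* start at or above the root */
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Jaccard = n_ef / (n_ef + n_nf + n_ep) */
double jaccard(Spectrum s) {
    int denom = s.n_ef + s.n_nf + s.n_ep;
    return denom == 0 ? 0.0 : (double)s.n_ef / denom;
}

/* Tarantula = fail_ratio / (fail_ratio + pass_ratio), where
 * fail_ratio = n_ef / (n_ef + n_nf) and pass_ratio = n_ep / (n_ep + n_np) */
double tarantula(Spectrum s) {
    int failed = s.n_ef + s.n_nf;
    int passed = s.n_ep + s.n_np;
    double fail_ratio = failed == 0 ? 0.0 : (double)s.n_ef / failed;
    double pass_ratio = passed == 0 ? 0.0 : (double)s.n_ep / passed;
    double denom = fail_ratio + pass_ratio;
    return denom == 0.0 ? 0.0 : fail_ratio / denom;
}

/* Ochiai = n_ef / sqrt((n_ef + n_nf) * (n_ef + n_ep)) */
double ochiai(Spectrum s) {
    double denom = square_root((double)(s.n_ef + s.n_nf) * (s.n_ef + s.n_ep));
    return denom == 0.0 ? 0.0 : s.n_ef / denom;
}
```

For a statement that is executed by the single failing run and by one of four passing runs (`Spectrum {1, 1, 0, 3}`), the sketch returns 0.5 for Jaccard, 0.8 for Tarantula and about 0.707107 for Ochiai. Sorting the statements of a program by one of these functions in descending order yields one of the three rankings described above.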

2.1 L-Programs and Testsuites


To evaluate the functionality of the debugger, it was tested with several L-programs and test suites. Table 1 lists them with a short description and shows the number of lines of code (LOC) and the number of the erroneous line. All test files can be found in testdata.zip.

2.2 Results
Table 2 presents the results of the performed tests in compact form. For each test, the number of lines of code, the coefficient values and some properties of the primary error sets are shown. The coefficient values are the highest values computed by the debugger, provided that they address the erroneous statement. If a coefficient did not locate the erroneous statement, the corresponding cell contains "wrong result".

| Test suite | L-Program | Description | LOC | Error in line |
|---|---|---|---|---|
| Divide.TC (7 test cases) | - | Divides two numbers incrementally. | | |
| | Divide F1.L | Wrong sign correction of the divisor. | 26 | 12 |
| | Divide F2.L | Result counter is decremented instead of incremented. | 26 | 20 |
| | Divide F3.L | Wrong start value of the result counter. | 26 | 17 |
| Multiplication.TC (10 test cases) | - | Multiplies two numbers by adding one factor n times. | | |
| | Multiplication F1.L | Wrong sign computation. | 41 | 40 |
| | Multiplication F2.L | The factor is multiplied instead of added. | 41 | 35 |
| | Multiplication F3.L | Wrong sign correction of the second factor. | 41 | 10 |
| ProdSum.TC (5 test cases) | - | Operates on array values. | | |
| | ProdSum F1.L | Wrong array size used. | 23 | 12 |
| | ProdSum F2.L | Wrong start value of the sum variable. | 23 | 4 |
| adder.ts (6 test cases) | - | Adds two numbers incrementally. | | |
| | adder E1.l | Wrong counter start value. | 10 | 2 |
| | adder E2.l | Wrong counter incrementation. | 10 | 7 |
| | adder E3.l | Wrong loop condition. | 10 | 4 |
| | adder E4.l | Loop stop condition too high. | 10 | 4 |
| sort.ts (7 test cases) | - | Sorts an array with BubbleSort. | | |
| | sort E1.l | Wrong swap start value. | 31 | 3 |
| | sort E2.l | Wrong array length used for the inner loop. | 31 | 11 |
| | sort E3.l | Wrong condition of the main loop. | 31 | 5 |
| gcd.ts (5 test cases) | - | Computes the greatest common divisor. | | |
| | gcd E1.l | Wrong condition of the loop. | 13 | 4 |
| | gcd E2.l | Wrong result assignment in the loop. | 13 | 9 |

Table 1: The performed tests.

For each coefficient, the primary error set columns list the number of lines with the highest coefficient (Size) and its proportion of the complete program (%-Code). For instance, if a program has 26 LOC and the debugger narrows the location of the error down to 2 lines, the developer has to check 7.69% of all lines of the program. Hence, the quality of the debugger increases as the size of the primary error set decreases. Based on the obtained test results, the following observations can be made:
- The Ochiai coefficient always locates the erroneous statement/line.
- The Jaccard coefficient does so as well, but with lower coefficient values (e.g. adder E3.l).
- The Tarantula coefficient sometimes fails completely (e.g. adder E1.l).
- A well-sized primary error set is achieved if the execution traces differ for a large number of test cases (e.g. Multiplication F1.L).
- If the execution trace is always the same or the error does not influence it, all statements are considered to be erroneous (e.g. gcd E2.l).
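The Size and %-Code columns can be derived directly from the values of one coefficient. The C sketch below is a hypothetical helper, not part of the L-Debugger: it counts the lines that share the highest value and checks whether a known erroneous line is among them, which is how this sketch interprets a "wrong result" entry in Table 2. The tie tolerance `TIE_EPS` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Primary error set of one coefficient (hypothetical type). */
typedef struct {
    size_t size;    /* number of lines sharing the highest value */
    double percent; /* size relative to the program's LOC, in % */
} PrimarySet;

/* Values this close are treated as ties (assumed tolerance), so rounding
 * noise does not split lines with identical spectra. */
#define TIE_EPS 1e-12

static double max_coefficient(const double *coeff, size_t loc) {
    assert(loc > 0);
    double best = coeff[0];
    for (size_t i = 1; i < loc; i++)
        if (coeff[i] > best)
            best = coeff[i];
    return best;
}

/* coeff[i] holds the coefficient of line i + 1. */
PrimarySet primary_error_set(const double *coeff, size_t loc) {
    double best = max_coefficient(coeff, loc);
    PrimarySet p = { 0, 0.0 };
    for (size_t i = 0; i < loc; i++)
        if (coeff[i] >= best - TIE_EPS)
            p.size++;
    p.percent = 100.0 * (double)p.size / (double)loc;
    return p;
}

/* True if the known erroneous line (1-based) lies in the primary error set. */
bool locates_error(const double *coeff, size_t loc, size_t error_line) {
    assert(error_line >= 1 && error_line <= loc);
    return coeff[error_line - 1] >= max_coefficient(coeff, loc) - TIE_EPS;
}
```

For the example above, a 26-line program in which two lines share the top value gives a set size of 2 and a %-Code of about 7.69%.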

3 Automatically Finding Patches Using Genetic Programming


This section summarizes an automated program repair approach using genetic programming that was presented in [Weimer2009]. First, the basic concepts and some details of the algorithm are discussed; then some experiments with this approach are shown. The importance of such an algorithm is underlined by reports claiming that software maintenance accounts for 90% of the total costs of a typical software project [Seacord2003]. The major parts of these costs are caused by modifying existing code and repairing defects [Ramamoothy1996]. This makes an algorithm that repairs program defects automatically particularly valuable.

3.1 Basic Concepts


This approach repairs off-the-shelf legacy applications and requires neither difficult program annotations nor special coding practices. It takes as input a program, a set of passing (positive) test cases and a failing (negative) test case. It then uses genetic programming to evolve new variants of the program until one is found that passes both the positive test cases and the failing test case.

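| L-Program | LOC | Jaccard | Tarantula | Ochiai | Jaccard set | Tarantula set | Ochiai set |
|---|---|---|---|---|---|---|---|
| Divide F1.L | 26 | 1 | 1 | 1 | 2 (7.69%) | 2 (7.69%) | 2 (7.69%) |
| Divide F2.L | 26 | 1 | 1 | 1 | 2 (7.69%) | 6 (23.08%) | 2 (7.69%) |
| Divide F3.L | 26 | 1 | 0 | 1 | 6 (23.08%) | 26 (100%) | 6 (23.08%) |
| Multiplication F1.L | 40 | 0.8 | wrong result | 0.894427 | 7 (17.5%) | - | 7 (17.5%) |
| Multiplication F2.L | 40 | 0.5 | 0.8 | 0.707107 | 2 (5%) | 2 (5%) | 2 (5%) |
| Multiplication F3.L | 40 | 0.8 | wrong result | 0.894427 | 7 (17.5%) | - | 7 (17.5%) |
| ProdSum F1.L | 23 | 0.8 | 0.5 | 0.894427 | 10 (43.48%) | 10 (43.48%) | 10 (43.48%) |
| ProdSum F2.L | 23 | 1 | 0 | 1 | 10 (43.48%) | 23 (100%) | 10 (43.48%) |
| adder E1.l | 10 | 0.5 | wrong result | 0.707107 | 3 (30%) | - | 3 (30%) |
| adder E2.l | 10 | 0.333333 | 0.714286 | 0.577350 | 2 (20%) | 2 (20%) | 2 (20%) |
| adder E3.l | 10 | 0.5 | 0.5 | 0.707107 | 3 (30%) | 3 (30%) | 3 (30%) |
| adder E4.l | 10 | 1 | 0 | 1 | 10 (100%) | 10 (100%) | 10 (100%) |
| sort E1.l | 31 | 0.714286 | 0.5 | 0.845154 | 6 (19.35%) | 6 (19.35%) | 6 (19.35%) |
| sort E2.l | 31 | 0.428571 | wrong result | 0.654654 | 10 (32.26%) | - | 10 (32.26%) |
| sort E3.l | 31 | 0.285714 | 0.5 | 0.534522 | 4 (12.9%) | 4 (12.9%) | 4 (12.9%) |
| gcd E1.l | 13 | 0.8 | wrong result | 0.894427 | 3 (23.08%) | - | 3 (23.08%) |
| gcd E2.l | 13 | 0.8 | 0.5 | 0.894427 | 13 (100%) | 13 (100%) | 13 (100%) |

Table 2: The results of the tests. The coefficient columns give the highest coefficient value; each set column gives the size of the primary error set (Size) and, in parentheses, its share of the program (%-Code).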

Genetic programming (GP) is a method that borrows the principles of biological evolution and transfers them to computer science. It maintains a population of program variants and uses computational analogs of biological mutation and crossover to produce new variants of a program. The suitability of each variant for continued evolution is evaluated with a fitness function. Because GP would otherwise have to search an infinite space to find a correct program, two restrictions are necessary:

1. Changes should be based on statements that already exist elsewhere in the program, i.e. only existing statements are inserted.
2. The genetic operations should only operate on regions of the program that lie on the execution path producing the error, and not on the path of a correct, passing execution.

For a better understanding, let us consider a small example. Listing 1 shows a C function that computes the greatest common divisor of two integers. It fails if a is zero and b is positive. This error can be captured by the test case gcd(0, 55), whose expected result is 55 followed by normal termination.

Listing 1: A C function that computes the greatest common divisor has a bug: it loops forever if a is zero and b is positive [Weimer2009].

     1  /* requires: a >= 0, b >= 0 */
     2  void gcd(int a, int b) {
     3      if (a == 0) {
     4          printf("%d", b);
     5      }
     6      while (b != 0)
     7          if (a > b)
     8              a = a - b;
     9          else
    10              b = b - a;
    11      printf("%d", a);
    12      exit(0);
    13  }

If the algorithm is applied with only this one test case, it might produce something like gcd_2, shown in Listing 2. gcd_2 delivers the correct result for this test case but fails for others: for example, gcd_2(1071, 1029) prints 1029 instead of the expected 21. This shows why a set of passing test cases must be provided as well.

Listing 2: A repaired gcd program that is produced by the algorithm if only one failing test case is used [Weimer2009].

    void gcd_2(int a, int b) {
        printf("%d", b);
        exit(0);
    }
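To see why the passing test cases matter, the variants can be run against the two test cases from the text: gcd(0, 55) expecting 55, and gcd(1071, 1029) expecting 21. The C sketch below is a hypothetical rewrite for testing purposes only: instead of printing and calling `exit(0)`, each variant records its output in a `Run` struct, and the endless loop of Listing 1 is cut off after an assumed step budget (`MAX_STEPS`) that plays the role of a test timeout. `gcd_repaired` models only the essential change of the primary repair in Listing 3 (an `exit(0)` after printing b).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed step budget that stands in for a test timeout. */
#define MAX_STEPS 100000L

/* One simulated run: captured output and whether exit(0) was reached. */
typedef struct {
    char out[64];
    bool terminated;
} Run;

/* Stand-in for printf("%d", v): append v to the captured output. */
void emit(Run *r, int v) {
    size_t len = strlen(r->out);
    snprintf(r->out + len, sizeof r->out - len, "%d", v);
}

/* Lines 6-12 of Listing 1: subtractive loop, final print, exit(0).
 * If the budget runs out, the run stays unterminated (a timeout). */
void loop_print_exit(Run *r, int a, int b) {
    for (long steps = 0; b != 0; steps++) {
        if (steps >= MAX_STEPS)
            return;
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    emit(r, a);
    r->terminated = true;
}

/* Listing 1 (faulty): prints b when a == 0, then falls into the loop. */
Run gcd_listing1(int a, int b) {
    Run r = { "", false };
    if (a == 0)
        emit(&r, b);
    loop_print_exit(&r, a, b);
    return r;
}

/* Listing 2: the variant obtained from the failing test case alone. */
Run gcd_listing2(int a, int b) {
    (void)a; /* unused, exactly as in Listing 2 */
    Run r = { "", false };
    emit(&r, b);
    r.terminated = true;
    return r;
}

/* Listing 1 plus exit(0) after printing b. */
Run gcd_repaired(int a, int b) {
    Run r = { "", false };
    if (a == 0) {
        emit(&r, b);
        r.terminated = true;
        return r;
    }
    loop_print_exit(&r, a, b);
    return r;
}

/* A test case passes if the run terminates with exactly the expected output. */
bool passes(Run (*variant)(int, int), int a, int b, const char *expected) {
    Run r = variant(a, b);
    return r.terminated && strcmp(r.out, expected) == 0;
}
```

With this harness, `passes(gcd_listing1, 0, 55, "55")` is false because the run never terminates, `passes(gcd_listing2, 0, 55, "55")` is true while `passes(gcd_listing2, 1071, 1029, "21")` is false (it prints 1029), and `gcd_repaired` passes both test cases.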

To limit the regions of the code that can be changed, only statements that occur on the execution path of the failing test case but not on the paths of the passing test cases are evolved. In the example, the passing test case visits lines 2-3 and 6-13 of Listing 1, and the failing test case visits lines 2-5, 6-7 and 9-10, so lines 4-5 are suitable for changing. Because the number of statements that could be inserted by the genetic operations is still huge, the algorithm additionally assumes that the faulty behavior is handled correctly at a different location in the code. Consequently, the mutation operator, which can insert, modify or delete code, only inserts statements that already exist in the program.

Listing 3: The primary repair [Weimer2009].

     1  void gcd_3(int a, int b) {
     2      if (a == 0) {
     3          printf("%d", b);
     4          exit(0);        // inserted
     5          a = a - b;      // inserted
     6      }
     7      while (b != 0)
     8          if (a > b)
     9              a = a - b;
    10          else
    11              b = b - a;
    12      printf("%d", a);
    13      exit(0);
    14  }

With these restrictions and a suitable number of test cases, GP produces a working variant (see Listing 3), called the primary repair. Because GP tends to produce unnecessary statements (line 5 of Listing 3, which is unreachable after exit(0)), a final post-processing step is required: the final repair is a minimal subset of the changes between the faulty program and the primary repair that still passes all test cases.
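The report states that the final repair is a minimal subset of the changes that still passes all test cases, but not how this subset is found. The C sketch below uses a simple greedy reduction as an assumption; it is not necessarily the minimization procedure used by Weimer et al. Changes are encoded as bits of a mask (at most 32 here), and the caller supplies an oracle that reruns the test suite for a given subset. `PassesAll`, `minimize_changes` and `requires_changes` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Oracle supplied by the caller (hypothetical interface): does the faulty
 * program with exactly the changes in `mask` applied pass all test cases? */
typedef bool (*PassesAll)(uint32_t mask, void *ctx);

/* Greedy reduction of a primary repair. A change is dropped whenever the
 * remaining changes still pass every test case; passes repeat until nothing
 * can be dropped, so the result is 1-minimal (removing any single remaining
 * change breaks at least one test case). */
uint32_t minimize_changes(uint32_t changes, PassesAll passes_all, void *ctx) {
    uint32_t kept = changes;
    bool changed;
    do {
        changed = false;
        for (int i = 0; i < 32; i++) {
            uint32_t bit = UINT32_C(1) << i;
            if ((kept & bit) && passes_all(kept & ~bit, ctx)) {
                kept &= ~bit;
                changed = true;
            }
        }
    } while (changed);
    return kept;
}

/* Toy oracle for illustration: the tests pass exactly when every change in
 * *(const uint32_t *)ctx is present. */
bool requires_changes(uint32_t mask, void *ctx) {
    uint32_t required = *(const uint32_t *)ctx;
    return (mask & required) == required;
}
```

In the gcd example, bit 0 could stand for the inserted `exit(0)` and bit 1 for the inserted `a = a - b;`. With an oracle under which the tests pass exactly when bit 0 is present (`required = 0x1`), `minimize_changes(0x3, requires_changes, &required)` keeps only the `exit(0)`.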

3.2 Algorithm Details


This section describes the algorithm presented above in a more general way and in more detail. Figure 1 shows the complete algorithm in high-level pseudocode. It takes a faulty program, a set of passing test cases and a set of failing test cases as input and returns a repaired variant of the program. First, the algorithm performs some preprocessing and constructs a suitable representation of the program (lines 1-3): the paths visited by the test cases are computed and combined with weights (see Section 3.2.1). Line 4 then constructs an initial population of program variants. The main GP loop, which searches for a program variant that passes all test cases, is given in lines 5-16 (see Section 3.2.2).

In this loop, line 6 removes all variants that fail every test case, and line 9 takes a random sample from the remaining population, favoring variants that pass more test cases. Line 10 then performs the crossover and line 14 applies the mutation operator (see Section 3.2.3). The resulting population is the input for the next iteration of the main loop. Once a program variant that passes all test cases is found, the loop terminates; finally, post-processing eliminates unneeded statements and the repaired program is returned.

Figure 1: The complete algorithm in high-level pseudocode [Weimer2009].
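Line 9 of Figure 1 samples the population while favoring variants that pass more test cases. One common way to do this is fitness-proportional ("roulette-wheel") selection, sketched below in C. The report does not say which sampling scheme Weimer et al. use, so treat this as an assumption. As a hypothetical design choice, the function takes the uniform random draw `r` as a parameter, which keeps it deterministic and easy to test. A variant with fitness 0 is never chosen, which has the same effect as removing variants that fail every test case (line 6).

```c
#include <assert.h>
#include <stddef.h>

/* Fitness-proportional selection: variant i is chosen with probability
 * fitness[i] / sum(fitness). `r` is a uniform random draw in [0, 1).
 * Returns the chosen index, or -1 if every fitness is 0. */
int roulette_select(const double *fitness, size_t n, double r) {
    assert(r >= 0.0 && r < 1.0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(fitness[i] >= 0.0);
        total += fitness[i];
    }
    if (total <= 0.0)
        return -1;

    double target = r * total; /* position on the "wheel" */
    double acc = 0.0;
    int last = -1;
    for (size_t i = 0; i < n; i++) {
        if (fitness[i] <= 0.0)
            continue; /* zero-fitness variants occupy no slice */
        acc += fitness[i];
        last = (int)i;
        if (target < acc)
            return (int)i;
    }
    return last; /* guards against rounding when r is very close to 1 */
}
```

Sampling a whole new population simply means calling `roulette_select` repeatedly with fresh random draws.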

3.2.1 Program Representation

The algorithm represents each program variant as a pair consisting of the abstract syntax tree of all statements of the program and a weighted path through that program. This path is a list of pairs, each of which assigns a weight to a statement. The weight is based on the occurrence of the statement in the different test cases.

3.2.2 Genetic Programming

Genetic programming is a stochastic search method that maintains a population of program variants and applies crossover and mutation to it to produce a new generation. A fitness function decides whether a new variant is suitable to become a member of the new generation. In this algorithm, the fitness function computes a weighted sum of the test cases passed by the variant in question. The fitness function also serves as the termination condition of the main GP loop.
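The C sketch below illustrates the weighted path and the fitness function. All numeric weights are assumptions chosen for illustration, not values given in this report: a statement visited only by the failing test case gets full weight, a statement also visited by a passing test case gets a small weight, and a statement visited only by passing test cases is left off the path. Applied to Listing 1, lines 4-5 would get weight 1.0, lines 2-3, 6-7 and 9-10 weight 0.1, and lines 8 and 11-13 would not be on the weighted path. The per-test fitness weights are likewise assumed.

```c
#include <stdbool.h>

/* Assumed path weights (illustrative values, not given in the report). */
#define W_FAILING_ONLY 1.0 /* visited by the failing test case only */
#define W_SHARED 0.1       /* visited by failing and passing test cases */

/* Weight of a statement on the weighted path; 0.0 means the statement is
 * left off the path because no failing test case visits it. */
double path_weight(bool on_failing_path, bool on_passing_path) {
    if (!on_failing_path)
        return 0.0;
    return on_passing_path ? W_SHARED : W_FAILING_ONLY;
}

/* Assumed per-test weights: a passed failing (negative) test case counts
 * more than a passed positive one. */
#define W_POS_TEST 1.0
#define W_NEG_TEST 10.0

/* Fitness: weighted sum of the test cases a variant passes. */
double fitness(int passed_pos, int passed_neg) {
    return W_POS_TEST * passed_pos + W_NEG_TEST * passed_neg;
}

/* Termination check of the main GP loop: the variant reaches the maximum
 * possible fitness, i.e. it passes every positive and negative test case. */
bool is_repair(int passed_pos, int n_pos, int passed_neg, int n_neg) {
    return fitness(passed_pos, passed_neg) >= fitness(n_pos, n_neg);
}
```

Because all weights are positive, a variant has fitness 0 exactly when it passes no test case at all.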

3.2.3 Genetic Operators

Mutation operator: Figure 2 shows the pseudocode of the mutation operator, which operates only on statements on the weighted path. The path weights determine which statements are mutated. There are three possible ways to mutate a statement: it can be deleted, it can be swapped with another statement, or a different statement can be inserted after it. The mutation action is chosen randomly.

Crossover operator: Figure 3 shows the pseudocode of the crossover operator used by the algorithm. It only crosses over statements along the weighted path: it chooses a cutoff point and swaps all statements that follow it.

Figure 2: The mutate operation [Weimer2009].

Figure 3: The crossover operation [Weimer2009].
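A minimal C sketch of both operators follows, under the hypothetical assumption that a variant's weighted path is stored as an array of statement ids (real implementations manipulate AST nodes). To keep the functions deterministic, the caller passes in the random choices: the position to mutate (which it would draw with probability proportional to the path weights), the kind of mutation, the swap partner, and, for insertion, a donor statement taken from elsewhere in the program. `MutationKind`, `mutate` and `crossover` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MUT_DELETE, MUT_SWAP, MUT_INSERT_AFTER } MutationKind;

/* Apply one mutation to stmts[at] and return the new length.
 * `other`: position to swap with (MUT_SWAP only).
 * `donor`: id of an existing statement to insert (MUT_INSERT_AFTER only);
 * the array must have room for one more element (len < cap). */
size_t mutate(int *stmts, size_t len, size_t cap, size_t at,
              MutationKind kind, size_t other, int donor) {
    assert(at < len);
    switch (kind) {
    case MUT_DELETE:
        for (size_t i = at; i + 1 < len; i++)
            stmts[i] = stmts[i + 1]; /* close the gap */
        return len - 1;
    case MUT_SWAP: {
        assert(other < len);
        int tmp = stmts[at];
        stmts[at] = stmts[other];
        stmts[other] = tmp;
        return len;
    }
    case MUT_INSERT_AFTER:
        assert(len < cap);
        for (size_t i = len; i > at + 1; i--)
            stmts[i] = stmts[i - 1]; /* make room after position `at` */
        stmts[at + 1] = donor;
        return len + 1;
    }
    return len;
}

/* One-point crossover: exchange all statements from position `cut` onward
 * between two parents of equal length. */
void crossover(int *p, int *q, size_t len, size_t cut) {
    for (size_t i = cut; i < len; i++) {
        int tmp = p[i];
        p[i] = q[i];
        q[i] = tmp;
    }
}
```

For example, inserting statement 99 after position 0 of {10, 20, 30} yields {10, 99, 20, 30}, and a crossover of {1, 2, 3, 4} and {5, 6, 7, 8} at cut 2 yields {1, 2, 7, 8} and {5, 6, 3, 4}.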

3.3 Experiments
To evaluate the presented approach to automated program repair, experiments were performed on a set of open-source benchmark programs. Table 3 lists them together with their faults. For each program, one failing test case and a small number of passing test cases were used. The results of the experiments are summarized in Table 4. The column Initial Repair shows the average performance per trial: Time (the average time of each successful trial), fitness (the average number of fitness evaluations in a successful trial), Success (the number of trials that resulted in a repair) and Size (the average number of changed lines). Minimized Repair presents the same information after the minimizing post-processing step. In addition to these results, the experiments showed a tradeoff between a fast repair and the number of test cases. On the one hand, a small number of test cases results in a rapid but possibly incomplete repair, because the algorithm aggressively deletes erroneous statements that are not exercised by the test runs. On the other hand, a larger number of test cases that covers more parts of the program leads to a higher-quality repair but also increases the repair time.

Table 3: The benchmark programs used in the experiments [Weimer2009].

Table 4: The results of the experiments [Weimer2009].

4 Conclusion
In conclusion, debugging computer programs is difficult work that accounts for a major part of the total costs of a software project. Therefore, more or less automated error detection and correction methods are highly desirable. The spectrum-based debugging approach is often able to narrow down the number of possibly erroneous statements. However, the resulting set of suspicious lines can still be large, so it is advisable to combine it with a further debugging approach to increase the diagnostic accuracy.


Furthermore, an automated program repair approach using genetic programming was summarized in Section 3. It shows that the concepts of biological evolution also fit into the context of automatically repairing programs.

References
[AKSt1] Franz Wotawa. Lecture Ausgewählte Kapitel aus Softwaretechnologie 1 (AK St1), practical exercises, October 2011.

[Abreu2006] Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. On the accuracy of spectrum-based fault localization. In Proceedings TAIC PART'07, pages 89-98. IEEE, 2006.

[Chen2002] M. Y. Chen, E. Kiciman, E. Fratkin, A. Fox, and E. Brewer. Pinpoint: Problem determination in large, dynamic internet services. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), pages 595-604. IEEE Computer Society, Washington, DC, USA, 2002.

[Jones2005] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique. In Proceedings ASE'05, pages 273-282. ACM Press, 2005.

[Ramamoothy1996] C. V. Ramamoorthy and W.-T. Tsai. Advances in software engineering. IEEE Computer, 29(10):47-58, 1996.

[Seacord2003] R. C. Seacord, D. Plakosh, and G. A. Lewis. Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices. Addison-Wesley, 2003.

[Weimer2009] Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. Automatically Finding Patches Using Genetic Programming, 2009.

[Wotawa2011] Franz Wotawa. A brief introduction to model-based software debugging, October 2011.
