Software Testing: Code Coverage
Grid‐Tools White Paper Series
Practical approaches to improving your testing
by maximising code coverage in complex database
and SOA environments
By Huw Price
Introduction
Most testers have experience of test automation
tools such as QTP, Facilita, etc. and, to a greater or lesser degree, of managing the data inputs and outputs
from complex tests. Some testers have exposure to code coverage toolsets, while all are familiar with the
concepts. Many academics work on code coverage tools and analytic techniques to improve software design
(indeed, it seems to be the main research area of a large part of academe; see http://crest.dcs.kcl.ac.uk/,
run by Dr Mark Harman of KCL). However, few companies have an end‐to‐end integrated approach to
maximizing their code coverage during testing.
Before continuing, it is worth noting a few practical issues:
Most applications are built using disparate modern and legacy technologies for which code
coverage tools do not always exist.
Setting up code coverage can be time consuming and may not be worth the effort.
Specifications and documentation are not always of a high standard.
Testing is usually done at the end of a cycle and is not embedded into a development lifecycle.
This paper will cover some of the techniques and tools used to build the most accurate input to tests, as well
as the practical methods and tools for creating the data to run these tests. Simply building an optimized set of
inputs is just the start of a solution. Testers need to be able to explode these tests out into real data in
complex databases and SOA environments.
www.grid‐tools.com UK +44 01865 988542 US: +1 866 555 5555 info@grid‐tools.com 1
Code or Functional Coverage
Before starting, it is worth considering the differences between full code coverage and full functional
coverage. There is a lively debate as to whether 100% code coverage is a) attainable and b) necessary. As
an ex‐programmer, I would suggest that 100% code coverage is not really attainable and, in fact, the
better the programmer, the LESS likely it is to be attained. A simple example of this is an error trap: most
code will look up data. In a Customer Credit Limit check in an authorization module, for example, if the
customer is not found, an error will be raised. In a real suite of programs, the chances of the customer
record being deleted as you move from one program to another are extremely small, and such a deletion
would cause much larger issues than the failure of one line of code. Testing this particular error trap (and
there could be hundreds in one program) would be, in my opinion, a waste of time. Testing a standard
error trap, however, would be worthwhile: forcing a failure of one or two missing data records to test the
overall effect of missing data is worth the effort.
So, should 100% code coverage be the goal? In my opinion, no: based on research by Richard Bender,
90% code coverage would be the maximum that can be expected. A more effective and realistic goal,
however, is 100% functional coverage; in other words, all of the specified requirements are satisfied by
the tests. In this article I will refer to code coverage, but the techniques are similar for both goals.
White or Black Box Testing
Code coverage assumes you have access to the code and can see all the paths through it. Black box
testing reverses this: you only have control over the input and the ability to monitor any output.
Code coverage tends to be used in component testing, that is, where you have discrete sections of code
with defined inputs and outputs. In a more typical test scenario, larger groups of components need to be
tested as one unit. With the advent of complex SOA environments, complete end‐to‐end tests need to be
designed, and harder testing tasks implemented. In addition, testers need to balance the need for complete
coverage against limited time to test. The key challenge is to be more efficient and more effective.
From a practical point of view, having access to the complete code gives you very useful information that
can be used in designing tests. However, a balanced approach is needed, using both black and white box
testing. From a tester’s point of view, this is all “just testing”, with more or less information.
Trillions of Combinations
Hardware engineering tends to have far greater success at eliminating failures than software; the reasons
for this would be a digression in this article. One of the techniques used very successfully, however, is
optimizing the test inputs. A simple group of on/off switches that can be set in particular orders quickly
grows to trillions of combinations. Hardware engineers have developed techniques to eliminate redundant
tests, identify key relationships and make the very large numbers small. Applying some of these
techniques to software means the amount of code exercised can be significantly increased whilst keeping
the combinations of test inputs to a minimum.
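To make the arithmetic concrete, a few lines are enough to show the explosion (the figures here are simple powers of two, not drawn from any particular device):

```python
# A quick illustration of how fast switch combinations explode, and why
# hardware-style input optimization matters.
switches = 7                 # the overdraft example later uses 7 Y/N flags
print(2 ** switches)         # 128 exhaustive combinations

# Each extra switch doubles the space; 40 switches already exceed a trillion.
print(2 ** 40)               # 1099511627776
```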
Cause and Effect
One very useful methodology, as outlined by Richard Bender, is to identify cause and effect actions. This is
where combinations of input cause either explicit actions or intermediate “nodes” to be set. The
identification of these cause and effect graphs is an effective way of optimizing test inputs.
Figure 1 – Cause Effect Diagram (inputs A and B feed an Or node producing C; inputs D and E feed an Or node producing F; C and F combine through an And node to produce the effect G)
In this case, node C can be turned on by A or B, and node F by D or E, so varying the combinations of A
and B (or of D and E) will have little effect on the result G. These nodes can be identified within specific
programs or at a higher level within the application. For example, a customer may have a low credit score.
Setting this as a discrete node as part of the test input, rather than having to set up multiple customers,
some of whom have a low credit score, will reduce the test inputs for upstream testing.
Identifying these cause and effect relationships depends on clear, unambiguous specifications. A more
subtle and difficult task for the tester is to identify which of these relationships are important, i.e. which
ones to ignore. The tester will also have to decide which of the many data inputs need to be included as
“core” to the testing; some data elements may not be that important and can be left out.
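A minimal sketch of the graph in Figure 1, assuming the reading C = A or B and F = D or E:

```python
def effect_g(a, b, d, e):
    """Evaluate the cause-effect graph of Figure 1 (a minimal sketch)."""
    c = a or b      # Or node: C fires when A or B is set
    f = d or e      # Or node: F fires when D or E is set
    return c and f  # And node producing the effect G

# Once one input of each Or pair is set, swapping which one is set has no
# effect on G -- which is why intermediate nodes shrink the test set.
assert effect_g(True, False, True, False) == effect_g(False, True, False, True)
print(effect_g(True, False, False, False))  # False: F never fires
```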
Requirements Parsing
While access to the code is a useful way of identifying these “nodes”, clear requirements are essential to
building test input. A simple AND or OR out of place, or not clearly defined, can cause confusion all the
way along the development lifecycle. In the code coverage example below, a comma out of place could
create an ambiguous definition.
If the customer is a business client or a preferred personal client and they have a
checking account, $100,000 or more in deposits, no overdraft protection and
fewer than 5 overdrafts in the last 12 months, set up free overdraft protection.
Else, do not give overdraft protection
Code Coverage – Overdraft Protection Example
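The quoted rule can also be captured directly in code. This is one possible reading of it (the parameter names are my own), mirroring the INT‐1/INT‐2 intermediate nodes that appear later in the RBT analysis:

```python
def free_overdraft_protection(business, preferred, checking,
                              deposits, has_protection, overdrafts_12m):
    """One possible reading of the overdraft rule quoted above.

    The ambiguity the paper warns about is real: this assumes the OR
    binds only the two client types, with the remaining conditions ANDed.
    """
    eligible_client = business or preferred                 # INT-1
    eligible_account = (checking
                        and deposits >= 100000
                        and not has_protection
                        and overdrafts_12m < 5)             # INT-2
    return eligible_client and eligible_account

# A preferred personal client with a qualifying checking account:
print(free_overdraft_protection(False, True, True, 150000, False, 3))  # True
```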
Coverage Worked Example
If a specification is not clear and is not validated prior to development, the following can happen:
Figure 2 – Ambiguous Specifications
A very useful technique for validating specifications is to use text parsing and analysis tools that can
quickly identify ambiguous definitions early in the software development life cycle.
The output from the clear text parsers can be validated and is a valuable resource as input to test case
generators such as BenderRBT.
The clear text can be parsed and synthesized; for example, our overdraft example would be parsed to the
following:
Figure 3 – BenderRBT Requirements Synthesis Output
The Code to be Tested
Once the requirements have been validated, the
developers can begin coding or, more likely,
amending an existing system.
Begin
select nvl(count(*),0)
into wk_ocount
from bank_account a,
bank_account_overview v
where ba_bc_id = wk_customer.bc_id and
v.bo_ba_id = a.ba_id and
v.bo_min_bal < 0;
exception
when no_data_found then
wk_ocount := 0;
end;
wk_int1 := 'N';
if wk_customer.bc_prefered = 'Y' or wk_business = 'Y' then
dbms_output.put_line('line 5');
wk_int1 := 'Y';
end if;
wk_int2 := 'N';
if wk_checking = 'Y' and
wk_customer.ba_total_holdings >= 100000 and
wk_overdrawn = 'N' and
wk_ocount < 5 and
wk_protect = 'N' then
dbms_output.put_line('line 6');
wk_int2 := 'Y';
end if;
if wk_int1 = 'Y' and wk_int2 = 'Y' then
dbms_output.put_line('line 7');
wk_protect := 'Y';
else
dbms_output.put_line('line 8');
wk_protect := 'N';
end if;
Figure 4 – PL/SQL code to implement overdraft protection
Building Test Input
Once the requirements have been clearly defined, the design of the test inputs can begin. There are
numerous algorithms, tools and services that can help you with this task. I will use our overdraft example
and compare four different methods:
All code combinations
All Pairs
Jenny
Bender‐RBT
A strong contender I did not include is the web service at http://www.testcover.com. I would
recommend that testers take a look at this site and sign up for a trial, as it offers a service that is
continually updated with the latest academic research. Other practical techniques include:
Equivalence Partitioning
Boundary Value analysis
Bender RBT
Figure 5 – Bender RBT cause effect builder
As you can see in Figure 5, the clear specification of AND and OR is easy to identify and verify. The
identification of intermediate nodes is crucial to the creation of smaller sets of test input.
The output of the RBT tool is a set of tests that can be used as input to test the process.
Figure 6 – RBT definition and coverage matrix
As you can also see from Figure 6, the tests take advantage of the INT‐1 intermediate node to reduce the
set of tests. In addition, the production of an expected result will reduce the time spent checking any results.
Figure 7 – RBT Summary statistics
Reports of estimated coverage are invaluable when gathering code coverage statistics.
Once the test inputs have been defined, they can be fed into tools such as Datamaker to generate the
physical data, either as input to capture replay tools, directly into databases, or as flat files which are fed
into applications.
Figure 8 – RBT test cases captured in Datamaker ready to generate the physical data
Test data input was also generated using three other methods: All Pairs, Jenny and all code combinations.
All Pairs
All Pairs, or pairwise testing, is a combinatorial testing method in which, for each pair of input parameters,
all possible discrete combinations of those parameters are tested. Tools such as Datamaker will
automatically generate these combinations for you.
Figure 9 – All Pairs test cases captured in Datamaker ready to generate the physical data
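The pairwise idea can be illustrated with a toy greedy generator. This is a minimal sketch, not the algorithm Datamaker or Jenny actually use, and the flag names are illustrative stand‐ins for the seven overdraft variables:

```python
from itertools import combinations, product

def greedy_pairwise(levels):
    """Tiny greedy pairwise generator: returns full assignments such that
    every value pair of every two parameters appears in at least one row."""
    names = list(levels)
    # All (parameter-index pair, value pair) combinations still to cover.
    uncovered = {((i, j), (v1, v2))
                 for i, j in combinations(range(len(names)), 2)
                 for v1 in levels[names[i]] for v2 in levels[names[j]]}
    candidates = list(product(*(levels[n] for n in names)))
    rows = []
    while uncovered:
        # Pick the candidate row covering the most still-uncovered pairs.
        def gain(row):
            return sum(1 for (i, j), (v1, v2) in uncovered
                       if row[i] == v1 and row[j] == v2)
        best = max(candidates, key=gain)
        rows.append(dict(zip(names, best)))
        uncovered = {p for p in uncovered
                     if not (best[p[0][0]] == p[1][0]
                             and best[p[0][1]] == p[1][1])}
    return rows

# Seven Y/N flags, as in the overdraft example:
flags = {f: ["Y", "N"] for f in ["business", "preferred", "checking",
                                 "deposits_ok", "overdrawn", "ocount_ok",
                                 "protected"]}
suite = greedy_pairwise(flags)
print(len(suite))  # far fewer rows than the 128 exhaustive combinations
```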
Jenny
Jenny is a tool for generating regression tests. It will cover most of the interactions with far fewer test
cases. It can guarantee pairwise testing of all features that can be used together, and it can avoid those
feature combinations that cannot.
Figure 10 – Jenny test cases captured in Datamaker ready to generate the physical data
All code combinations
All combinations of variables are defined and, in this case, this is manageable. If you add a few more
variables, however, the number of combinations would grow rapidly.
Figure 11 – All code combinations generated and captured in Datamaker ready to generate the physical data
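For comparison, exhaustively enumerating the same seven Y/N flags (names are illustrative) is a one‐liner, and shows where the 128 figure comes from:

```python
from itertools import product

# Every combination of seven binary flags: 2**7 = 128 test cases.
variables = ["business", "preferred", "checking", "deposits_ok",
             "overdrawn", "ocount_ok", "protected"]
all_cases = [dict(zip(variables, values))
             for values in product("YN", repeat=len(variables))]
print(len(all_cases))  # 128

# Three more flags would already push the suite past a thousand cases:
print(2 ** 10)  # 1024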
Building the Physical Data
Once you have decided on your test cases, you need to
prepare the physical data to test your application. The
creation of data to satisfy the tests usually takes 3 to 5 times
more effort than designing the test cases themselves.
Populating data directly into databases, via APIs, or directly
into disparate applications is hard work and must be factored
in. In our case, we use Datamaker to populate the data
directly into the database; in our example, the data is held in
an Oracle database and is spread across multiple tables.
As the program we are testing is a batch program, the tester
needs to build the data to test the code. The data must
contain the combinations of attributes we have identified
from our test analysis. These vary from 6 test cases for RBT
to 128 for all combinations.
As a tester tasked with testing our code, you are immediately confronted with the problem of “how do I
create these test cases?”. The options you have are:
Creating the data using the application
You could enter the data by hand for each test case, which takes time. Creating an account and entering
transactions such that it has been overdrawn more than five times in the last year could be a very complex
task, and a similar level of complexity could exist for all variables. In addition, the tester would have to
understand how to use the application in areas they are not familiar with. In our example, the credit control
testing team would have to run the banking batch transaction system to simulate overdrawn transactions.
Searching for data that matches your criteria
Many sites copy production data into development and testing environments. Nowadays this opens up all
sorts of data protection issues; however, it is still a very common practice. Once you have existing data,
you can construct queries to search for the combinations of data that match your test cases. If you are
lucky you will find them all very quickly. The problem with this technique is that you are, in effect,
re‐coding the logic the developer created in order to track down the data. For complex queries this may
be difficult, and the queries may be slow due to the size of the database. In addition, it is likely you will not
find all of your test cases: by its very nature, increasing code coverage takes you to data that is very rare,
as the majority of production data is very similar.
Hacking existing data
A very common technique is to find an account holder who is close to our first test case, then go in and
edit the data to match the first criteria. This process can then be repeated, either by running the batch
program each time and checking the result for the account holder, or by tracking down similar account
holders and directly editing a different account holder for each of our test cases.
In our example, the seven variables are spread out across six different tables; in the real world, however,
this is likely to be larger.
Figure 12 –Customers have multiple accounts of different types with a summary of monthly activity
Generating Data
This is by far the most effective method of creating perfect
test data. The advantage is that you produce exactly the
kind of data you need, ensuring accuracy and the right
spread of data. In addition, the next time you need to test,
you will already have test data that can be modified slightly
and used with slightly different combinations of data
criteria.
For our case, we follow a few simple steps:
1. Copy a customer into the Datamaker repository. This
will be used as the basis for the generation of all
required test data cases.
2. Create variables on which you need to pivot the data,
i.e. variables that control what changes as the data is
generated. In our case we have created variables that
map directly to our test case design.
Figure 13 – Create substitution variables for each test case variable; these can be changed when each
test case is generated
3. For each of the variables, change our data template (the original copied customer, which has been
turned into a generic data object) to affect the columns in the data that control our desired data
effects. For example, if the variable BANKOVERDRAWN is set to Y, place a ‘‐’ (minus) sign in front
of the column BA_CURRENT_BALANCE.
Figure 14 – In Datamaker, the column BA_CURRENT_BALANCE will become negative if BANKOVERDRAWN is set to Y
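A generic sketch of this "pivoting" step (not Datamaker itself; the PREFERRED variable and the helper function are my own illustrations, while BANKOVERDRAWN, BC_PREFERED and BA_CURRENT_BALANCE come from the example):

```python
# A copied template row, turned into a generic data object.
template = {"BA_CURRENT_BALANCE": 2500.00, "BC_PREFERED": "N"}

def apply_case(template, case):
    """Return a new data row with the test-case variables applied."""
    row = dict(template)
    if case.get("BANKOVERDRAWN") == "Y":
        # Place a minus sign in front of the balance, as in Figure 14.
        row["BA_CURRENT_BALANCE"] = -abs(row["BA_CURRENT_BALANCE"])
    if case.get("PREFERRED") == "Y":       # hypothetical second variable
        row["BC_PREFERED"] = "Y"
    return row

print(apply_case(template, {"BANKOVERDRAWN": "Y"}))  # balance becomes -2500.0
```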
4. Once you are happy with the definitions, generate a few test cases by hand and check that the
results look okay in the application.
Figure 15 – A single test case data generation in Datamaker: the column BA_CURRENT_BALANCE will become negative if
BANKOVERDRAWN is set to Y
5. Use each test case input to drive each publish. In this case we have cleared down the data in the
schema as part of the publish.
Figure 16 –The RBT test case physical data creation using Datamaker
A similar approach could be used when generating data into XML or flat files.
Figure 17 – XML format data created by Datamaker based on the RBT test case design
Expected Results
With products such as RBT, it is possible to derive the
expected results from the test conditions. The cause and
effect analysis will be able to produce an expected result.
With Datamaker we have taken this result and included it in
the data publishes. This allows the result to be easily checked
against the actual result the program created. We used the
column BC_NOTES to hold a value of EXPECTED=Y or
EXPECTED=N; it is then easy to run a query to compare
expected against actual.
If a column is not available in any of the tables, you can use
other techniques such as updateable views, triggers, etc to
track and check these results.
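As a sketch of that comparison (the helper and the query are assumptions; only the BC_NOTES column and the EXPECTED=Y/N convention come from the example above):

```python
def mismatches(rows):
    """rows: iterable of (bc_id, bc_notes, actual_protect) tuples,
    e.g. as fetched with a query along the lines of:
      SELECT bc_id, bc_notes, bc_protect FROM bank_customer
    Returns the ids whose actual result differs from the expected one."""
    bad = []
    for bc_id, notes, actual in rows:
        expected = "Y" if "EXPECTED=Y" in notes else "N"
        if expected != actual:
            bad.append(bc_id)
    return bad

print(mismatches([(1, "EXPECTED=Y", "Y"), (2, "EXPECTED=N", "Y")]))  # [2]
```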
With most test case design algorithms, the expected results
are not generated. This means that the expected results will
have to be calculated by hand: a very strong reason to try to
reduce the number of test cases.
Code Coverage Results for Overdraft Protection
After generating the data directly into the database using
Datamaker for each of the test case combinations, the code was
run and code coverage statistics gathered.
Method              Number of test cases   Expected results   100% code coverage
All Pairs                    8
Jenny                        8
All Combinations           128
RBT                          6
Interestingly, the RBT methodology and tool outperformed all
other methods. This is not that surprising, as the use of the
internal code “nodes” allows the test cases to be significantly
optimized. It would be possible to use triple or quadruple
combinations of inputs rather than the pairs used by Jenny and
All Pairs, although this would increase the number of combinations.
Based on our practical observations of larger, more realistic test scenarios, we would expect All Pairs data
inputs to result in about 25% to 50% coverage. That is not a bad start; however, without adding more
sophisticated techniques such as Equivalence Partitioning and cause and effect mapping, it will be very
difficult to attain the goal of 100% coverage.
Summary
Increasing code coverage is the route to improved testing
Validate requirements for ambiguity; use text parsing tools to validate clear text
Identify cause and effect relationships
Propagate errors to an observable point
Use advanced tools to generate minimum sets of test input combinations
Do not use copies of production data in your testing; they are too large and will not contain the
specific test cases you need for maximum code coverage
Modifying existing data is error prone, time consuming and not repeatable
Model existing data to generate and pivot values to create the perfect data for testing
References
Thanks to Richard Bender of Bender RBT Inc. for his input to this white paper. For more information on
BenderRBT see www.BenderRBT.com. See also www.testcover.com (Testcover.com, LLC) for excellent
information on test coverage, as well as the in‐depth articles on test design written by Hans Schaefer.
About Grid‐Tools Ltd
Grid‐Tools are specialists in test data creation, test data
management and information lifecycle management. Their
experienced personnel have been writing and developing
solutions for large companies in both the private and public
sectors for over 15 years. The Grid‐Tools Datamaker Suite
includes a wide range of tools for test data management
including such innovative products as Datamaker
(a revolutionary tool that creates and publishes quality test
data from production environments for development and
testing and places this data in a central repository), DataShrink (for subsetting and shrinking databases),
Data Test Professional (for managing the data feeding performance tools) and Data Archive (providing a
different, more efficient approach to archiving).
Within a short span of time, Grid‐Tools have picked up significant momentum and an impressive list of well
known and respected customers and strategic partners world‐wide.
The Grid‐Tools methodology uses a “data‐centric” approach to testing, whereby the focus is on ensuring
that the test data you are using is of the right quality for successful testing.
About Huw Price
Over a 20‐year career, Huw Price has been the lead
technical architect for several US and European software
companies. Specializing in test automation tools, he has
launched numerous innovative products which have
re‐cast the testing model used in the software industry.
As well as building new products, he has become a serial
entrepreneur building‐up three software companies and
selling their technology to larger, less agile competitors.
Huw has provided high‐level architecture design support
to multinational banks, major utility vendors and health
care providers. A strong believer in balancing pragmatism
with a visionary approach, he has been able to rapidly
bring new products to market while maintaining strong
quality control.
Huw’s newest venture, Grid‐Tools, has quickly redefined how large organizations need to approach their
testing strategy. Grid‐Tools has introduced a strong data‐centric approach to testing, launching new
concepts such as “Data Objects”, “Data Inheritance” and “A Central Test Data Warehouse”. Currently
working with leading edge testing companies such as Fiorano, Facilita and Emprix, Grid‐Tools are building
on the strategy of improving automation tools and associated structured methodologies.