Case Study
The Mobile project uses a set of project-specific coding standards derived from
general Samsung coding standards. The Mobile project standards could be rewritten
by taking into account the language variances and other development constraints.
For example, some compilers used in the Mobile project do not support exception
handling. Therefore, a constructor cannot report a resource allocation failure by
throwing an exception. The Mobile project solves this problem by
employing the well-known two-phase object construction technique described in the
Mobile project coding standards: Divide object initialization into an object
allocation phase and a resource allocation phase, so that a failure is returned as a
value instead of being thrown (see Listing One).
#include <cstdio>

// 'result' and FAIL are assumed to come from the project's framework headers.
class ResourceManager
{
public:
    ResourceManager();   // phase 1: allocate only the object
    result Construct();  // phase 2: allocate resources;
                         // 'result' contains the error code
};

int main()
{
    // Two-phase construction
    ResourceManager aObject;
    if (aObject.Construct() == FAIL)
        printf("Resource allocation failed\n");
}
Listing One
Additionally, the Mobile project requires rigid conformance to coding standards. This
project mainly targets a software framework to be used by other developers; it
must be consistent and well organized to facilitate software development on this
framework. The more project constraints there are, the greater the necessity for an
automated tool. That's why the Mobile project adopted a coding standards checker.
One of C++test's distinct features is its GUI-based rules. Figure 1 shows the GUI
rule description for the rule "Each global variable must be initialized." Whenever a
"global variable" is found while parsing the source code, the rule evaluates logic
components. A violation is reported if any of the following conditions are not met:
The GUI interface simplifies rule creation. Most C++ code checkers require scripting
for rule creation; this is difficult and requires more C++ programming knowledge.
GUI-based rules can be easily understood and implemented because the available
conditions are shown graphically. Having GUI-based rules could compromise
extensibility because only predefined nodes and condition sets can be selected in a
GUI. However, C++test provides Python scripting support to ensure extensibility.
Our selection criteria included product usability features, but not specific details
(rule selection, rule execution, and scalability). We established these criteria from
our experiences with previous coding standards checkers, feedback from project
development teams, and product evaluations.
Direct benefits refer to how reducing the number of violations improved code
quality. Indirect benefits are other unexpected developer benefits.
Figure 2 shows the overall trend for the number of coding standards violations. To
eliminate the effects of rule evolution, we use the latest rule set for all checking.
After a relatively steady number of violations between Nov-04 and Jan-05, there
was a violation reduction of approximately 1/8 in Feb-05. There was an unexpected
increase in violations between Feb-05 and Mar-05, but this was acceptable because
the target project was evolving continuously.
Figure 2: The overall number of violations/KLOC.
Figure 3 shows the trend for the number of violations whose corrections impact only
the current module. Violations decreased 6.6 percent since the date of checker
deployment. This data does not cover comment rules, which usually don't have
large change impacts but do require significant change efforts.
Figure 4 shows the trend for the number of violations whose corrections impact
more than the current module. Violations decreased 19.6 percent since the date of
checker deployment.
Another indirect benefit was the removal of unrealistic coding standards items.
When we applied the coding standards rules to the Mobile project, we found that
most developers did not obey the rule "Do not exceed 80 columns per line." This
rule was developed because some old development environments had 80-column
displays. We examined our development environments and concluded that in our
situation, such limited development environments are rarely used and longer
columns are preferred. Consequently, we changed the column limitation from 80 to
150 columns per line. This kind of coding standards modification has improved
developer buy-in to coding standards compliance.
Lessons Learned
Until recently, our checking involved only the QA team checking coding standards
compliance at the project level. This was effective for tracking defect trends and
maintaining the rule set. However, there were several drawbacks.
First, the reported violation sources do not always match the code in the
development environment. Developers don't usually work directly on a centralized
source-code repository; they write and modify code in a private area, then copy or
check in the source code to the centralized repository. If the development source code
differs from the code that QA is testing, it can be difficult for developers to identify
and repair the source of reported violations.
For these reasons, we decided to have the coding standards checker run by
developers as well as the QA team.
Developers who do not carefully follow standards may produce code with many
violations, and are often reluctant to correct the violations. This is especially
problematic with violations from identifier- or design-related rules, which are
difficult to correct in later development phases because making such corrections
can have a large, project-wide impact. We recommend checking coding standards
conformance from the early coding phase so that the violated code is corrected
before it is propagated out of the module.
One reason why the coding standards checking effort with our previous tool failed
was the lack of rule maintenance at the organization level.
After the initial rule set is established, rules should be customized continuously
because some rules might not apply to particular projects and development style
might vary across development domains. A rule set should be maintained
throughout each project's operations as the project constraints evolve.
Conclusion
We have applied the checker to several projects in our organization and achieved
significant code-quality improvements in each case. We plan to apply it to
additional projects and analyze the effects on code quality.
Case Study: Histograms
Suppose that you are presented with a large body of text. Is it possible to tell interesting things
about the authors from the typical length of their sentences, their words, and so on? Some
literary detectives think so; in this case study, your job is to collect these statistics and present
them as a bar graph, otherwise known as a histogram. You already have one tool for the job—
C++ input streams, which read in strings separated by spaces. C++ input streams are well
suited to this task because they are not line based; text is naturally parsed into tokens, which
are usually (but not always) words. For example, "He said, let's go." is read as {"He"
"said," "let's" "go."}. So when you read in a token, you need to look at the end of each
word for punctuation. The basic strategy is to read each word and punctuation character and to
increment word and sentence frequency tables. These tables don't have to be very long, so
arrays will work well.
First, the arrays need to be initialized to zero (never assume that this is already true!). The
standard algorithm fill_n() is easier to use in this case than an explicit for loop:
int word_lengths[MAXWORD];
int sentence_lengths[MAXSENTENCE];
void init_arrays()
{
fill_n(word_lengths,MAXWORD,0);
fill_n(sentence_lengths,MAXSENTENCE,0);
}
Here is collect_file_stats(), which reads each word of the text using >>. Then it looks at
the last character of the word, which is word[len-1] because strings are like arrays. If this
character is punctuation, it must be removed from the string. The word length frequency table
can be updated with a single statement: ++word_lengths[len]. The next few lines count
words in sentences.
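The steps just described could be sketched as follows. This is only an illustration under stated assumptions, not the original listing: the function reads from any std::istream rather than opening a file, the table sizes MAXWORD and MAXSENTENCE are made-up values, and a sentence is assumed to end at '.', '!' or '?'.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>   // std::istringstream is handy for trying this out
#include <string>

const int MAXWORD = 32;        // assumed table sizes, not from the text
const int MAXSENTENCE = 128;

int word_lengths[MAXWORD];
int sentence_lengths[MAXSENTENCE];

// Reads whitespace-separated tokens and updates both frequency tables.
void collect_file_stats(std::istream& in)
{
    std::fill_n(word_lengths, MAXWORD, 0);
    std::fill_n(sentence_lengths, MAXSENTENCE, 0);

    std::string word;
    int words_in_sentence = 0;
    while (in >> word) {
        int len = static_cast<int>(word.size());
        char last = word[len - 1];      // tokens read with >> are never empty
        if (std::ispunct(static_cast<unsigned char>(last))) {
            word.erase(len - 1);        // strip the trailing punctuation
            --len;
        }
        if (len > 0) {
            ++words_in_sentence;
            if (len < MAXWORD) ++word_lengths[len];
        }
        if (last == '.' || last == '!' || last == '?') {   // assumed sentence enders
            if (words_in_sentence < MAXSENTENCE)
                ++sentence_lengths[words_in_sentence];
            words_in_sentence = 0;
        }
    }
}
```

Passing a std::istringstream holding "He said, let's go." counts two words of length 2, one of length 4, one of length 5, and one sentence of four words.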
Note that this method is just not going to work with word_lengths, since the values are very
large. These values need to be scaled by choosing a maximum value for the display (say 60
characters across) and generating an array within those bounds. The maximum value of the
input data is found as before, and the input data is multiplied by the scaling factor, which is just
the ratio of the desired maximum value to the actual maximum value.
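A minimal sketch of that scaling step, assuming a hypothetical helper named scale_values() that returns the scaled copy as a vector (the original may well have written into a second array instead):

```cpp
#include <algorithm>
#include <vector>

// Returns a copy of values[0..n-1] scaled so the largest entry becomes display_max.
std::vector<int> scale_values(const int values[], int n, int display_max)
{
    std::vector<int> scaled;
    if (n <= 0) return scaled;
    scaled.assign(n, 0);

    int actual_max = *std::max_element(values, values + n);
    if (actual_max <= 0) return scaled;          // nothing to scale

    // Scaling factor: desired maximum divided by actual maximum.
    double factor = static_cast<double>(display_max) / actual_max;
    for (int i = 0; i < n; ++i)
        scaled[i] = static_cast<int>(values[i] * factor);
    return scaled;
}
```

With a display width of 64, the values 0, 128, 256 and 64 become 0, 32, 64 and 16.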
How can you create a histogram that is upright? You start with the maximum value as a level,
and then you run along the array, comparing values to the level. If the values are greater than or
equal to the level, you write out a string, and if they are less than the level, you write out spaces:
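One way to sketch this level-by-level approach is shown here. The function name upright_histogram() and the choice to build a string rather than write straight to cout are assumptions made so that the result is easy to inspect:

```cpp
#include <algorithm>
#include <string>

// Builds an upright histogram, one text row per level, highest level first.
std::string upright_histogram(const int values[], int n)
{
    std::string out;
    if (n <= 0) return out;

    int top = *std::max_element(values, values + n);
    for (int level = top; level >= 1; --level) {
        for (int i = 0; i < n; ++i)
            out += (values[i] >= level) ? "* " : "  ";   // bar or blank
        out += '\n';
    }
    return out;
}
```

Each column is two characters wide, so the bars line up whatever the values are; the values should already be scaled to a sensible height.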
This example hardly exploits the supercharged graphics of modern machines, of course. You
can also create histograms by using the Turtle Graphics interface of UnderC for Windows.
Appendix B, "A Short Library Reference," gives you all the information you need about using
Turtle Graphics for now. I will be discussing it further in Chapter 4, "Programs and Libraries."
Containers
Arrays have several disadvantages. Earlier in this chapter we discussed their lack of size
information, which means you must use two arguments to pass an array to a function. It also
means that you cannot check an array index at runtime to see whether it's out of bounds. It is
easy to crash a program by using the wrong index; what is perhaps worse—because the
program seems to work—is that memory can be silently overwritten. All C programmers will tell
you that these are some of the worst bugs to solve. Built-in arrays are also inflexible in that they
have a fixed size that must be a constant. Although it is very fast to access array data randomly,
insertions and deletions are slow.
The standard library defines a number of container types. A container holds a number of
elements, like an array, but it is more intelligent. In particular, it has size information and is
resizable. We will discuss three kinds of standard containers in the following sections: vector,
which is used like a built-in array, but is resizable; list, which is easy to insert elements into;
and map, which is an associative array. That is, it associates values of one type with values of
another type.
Resizable Arrays: std::vector
You use the vector container type the same way you use an ordinary array, but a vector can
grow when required. The following is a vector of 10 ints:
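A minimal sketch of such declarations, using the names vi and v2 that appear in the text. The helper functions are hypothetical and exist only to keep the sketch self-contained; the values 1 to 10 are chosen to match the output shown for vi later on.

```cpp
#include <vector>

// Returns a vector of 10 ints holding 1..10.
std::vector<int> make_vi()
{
    std::vector<int> vi(10);                 // the size goes in parentheses
    for (int i = 0; i < 10; ++i) vi[i] = i + 1;
    return vi;
}

// Returns a vector declared without a size.
std::vector<int> make_v2()
{
    std::vector<int> v2;                     // no size given: starts with zero elements
    return v2;
}
```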
vector is called a parameterized type. The type in angle brackets (<>) that must follow the
name is called the type parameter. vector is called parameterized because each specific type
(vector<int>, vector<double>, vector<string>, and so on) is built on a specific base type,
like a built-in array. In Chapter 10, "Templates," I will show you how you can build your own
parameterized types, but for now it's only important that you know how to use them.
vi is a perfectly ordinary object that behaves like an array. That is, you can access any element
very quickly using an index; this is called random access. Please note that the initial size (what
we would call the array dimension) is in parentheses, not in square brackets. If there is no size
(as with v2) then the vector is initially of size zero. It keeps its own size information, which you
can access by using the size() method. In C++98 you cannot initialize a vector in the same way as an
array (with a list of numbers; C++11 added this), but you can assign vectors to each other. The statement v2 =
vi actually causes all the elements of vi to be copied into v2. A vector variable behaves just
like an ordinary variable, in fact. You can pass the vi vector as an argument to a function, and
you won't need to pass the size, as in the following example:
void show_vect(vector<int> v)
{
for(int i = 0; i < v.size(); i++) cout << v[i] << ' ';
cout << endl;
}
;> show_vect(vi);
1 2 3 4 5 6 7 8 9 10
You can resize the vector vi at any point. In the following example the elements of vi are
initialized to random numbers between 0 and 99. (n % 100 will always be in that range). vi is
then resized to 15 elements:
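The missing example could look something like this sketch. The wrapper function fill_and_resize() is hypothetical, and std::rand() stands in for whatever random source the original used:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fills a 10-element vector with values in 0..99, then grows it to 15 elements.
std::vector<int> fill_and_resize()
{
    std::vector<int> vi(10);
    for (std::size_t i = 0; i < vi.size(); ++i)
        vi[i] = std::rand() % 100;   // n % 100 is always in the range 0..99
    vi.resize(15);                   // old values are kept; new elements are 0
    return vi;
}
```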
You can resize the vi vector without destroying its values, but this can sometimes be quite a
costly operation because the old values must be copied. Note that vectors are passed to
functions by value, not by reference. Remember that passing by value involves making a copy
of the whole object. In the following example, the function try_change() tries to modify its
argument, but doesn't succeed. Earlier in this chapter ("Passing Arrays to Functions") you saw a
similar example with built-in arrays, which did modify the first element of its array argument.
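A small sketch of that behavior. The function try_change() is named in the text; its body and the helper first_after_try_change() are assumptions made for illustration:

```cpp
#include <vector>

// Receives its own copy of the vector, so the caller's vector is untouched.
void try_change(std::vector<int> v)
{
    v[0] = 999;
}

// Hypothetical helper: reports the caller's first element after the call.
int first_after_try_change()
{
    std::vector<int> vi(3, 7);   // three elements, all equal to 7
    try_change(vi);              // modifies only the copy
    return vi[0];                // still 7
}
```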
At this point, you may be tired of typing vector<int>. Fortunately, C++ provides a shortcut.
You can create an alias for a type by using the typedef statement. The form of
the typedef statement is just like the form of a declaration, except the declared names are not
variables but type aliases. You can use these typedef names wherever you would have used
the original type. Here are some examples of how to use typedef, showing how the resulting
aliases can be used instead of the full type:
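For instance, the alias VI used in the discussion might be set up as in this sketch. The second alias VS and the function total() are hypothetical additions:

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<int> VI;            // VI is now another name for vector<int>
typedef std::vector<std::string> VS;    // hypothetical second alias

// The alias can appear anywhere the full type could.
int total(VI v)
{
    int sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}
```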
Think of typedef names as the equivalent of constants. Symbolic constants make typing easier
(typing pi to 12 decimal places each time is tedious) and make later changes easier because
there is only one statement to be changed. In the same way, if I consistently use VI throughout
a large program, then the code becomes easier to type (and to read). If I later decide to use
some other type instead of vector<int>, then that change becomes straightforward.
As you have learned, passing a vector (or any standard container) to a function involves a full
copy of that vector. This can make a noticeable difference to a program's performance if the
function is called enough times. You can mark an argument so that it is passed by reference, by
placing an ampersand (&) after its type. You can further insist that it remains constant, as we did earlier
in the chapter for arrays and as shown in the following example:
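A sketch of both forms of reference passing; the function names sum_all() and double_all() are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// const reference: no copy is made, and the function cannot modify v.
long sum_all(const std::vector<int>& v)
{
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Plain reference: the caller's vector really is changed.
void double_all(std::vector<int>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= 2;
}
```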
Generally, you should pass vectors and other containers by reference; if you need to make a
copy, it's best to do it explicitly in the function and make such reference arguments const,
unless you are going to modify the vector. When experienced programmers see something
passed by reference, they assume that someone is going to try to change it. So the preferred
way of passing containers is by const reference, as in the preceding example. You can always
use the typedef names to make things look easier on the eye, as shown here:
The standard string is very much like a vector<char>, and it is considered an "almost
container." Strings can also be indexed like arrays, so if s is a string, then s[0] would be the
first character (not substring), and s[s.size()-1] would be the last character.
Linked Lists: std::list
vectors have strengths and weaknesses. As you have seen, any insertion requires moving
elements, so if a vector contained several million elements (and why not?), insertion could be
unacceptably slow. Although vectors grow automatically, that process can also be slow
because it involves copying all the elements in the vector.
Lists are also sequences of elements, but they are not accessed randomly, and they are
therefore not like arrays. Starting with an empty list, you append values by using push_back(),
and you insert values at the front of the list by using push_front(). back() and front() give
the current values at each end. To remove values from the ends, you
use pop_front() and pop_back(). The following is an example of creating a list:
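A minimal sketch of building a list with these operations; the contents and the function name make_list() are assumptions:

```cpp
#include <list>
#include <string>

std::list<std::string> make_list()
{
    std::list<std::string> ls;   // starts empty
    ls.push_back("two");         // list: two
    ls.push_back("three");       // list: two three
    ls.push_front("one");        // list: one two three
    return ls;
}
```

After this, front() gives "one" and back() gives "three"; pop_front() would leave "two" at the front.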
You can remove from a list all items with a certain value. After the remove operation, the list
contains only "two":
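The remove operation can be sketched like this. The starting contents are an assumption, chosen so that only "two" is left afterward:

```cpp
#include <list>
#include <string>

std::list<std::string> remove_demo()
{
    std::list<std::string> ls;
    ls.push_back("one");
    ls.push_back("two");
    ls.push_back("one");
    ls.remove("one");            // removes every element equal to "one"
    return ls;
}
```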
Associative Arrays: std::map
In mathematics, a map takes members of some input set (say 0..n-1) to another set of values;
a simple example would be an array. The standard C++ map is not restricted to contiguous (that
is, consecutive) values like an array or a vector, however. Here is a simple map from int to int:
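A sketch of such a map follows. The keys 4 and 88 and the stored values are assumptions, picked only so that an equivalent array would need at least 89 elements:

```cpp
#include <map>

std::map<int, int> make_map()
{
    std::map<int, int> mii;
    mii[4] = 2;      // keys do not have to be consecutive
    mii[88] = 7;
    return mii;      // holds exactly two entries
}
```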
You access maps the same way you access arrays, but the key values used in the subscripting
don't have to cover the full range. To create the map in the preceding example by using arrays,
you would need at least 89 elements in the array, whereas the map needs only 2. If you consider
a map of phone numbers and contact names, you can see that an ordinary array is not an
option. maps become very interesting when the key values are non-integers; we say that they
associate strings with values, and hence they are often called associative arrays. Typically,
a map is about as fast as a binary search.
Something that is important to note about maps is that they get bigger if you are continuously
querying them with different keys. Say you are reading in a large body of text, looking for a few
words. If you are using array notation, each time you look up a value in the map, the map gets
another entry. So a map of a few entries can end up with thousands of entries, most of which
are trivial. Fortunately, there is a straightforward way around this: You can use the
map's find() method. First, you can define some typedef names to simplify things:
The find() method returns a map iterator, which either refers to an existing item or is equal to
the end of the map.
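A sketch of a lookup that never adds entries. The alias names MSI and IMSI and the function lookup() are assumptions, as is the convention of returning -1 for a missing key:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> MSI;
typedef MSI::const_iterator IMSI;

// Returns the stored value, or -1 if the key is absent; the map is not changed.
int lookup(const MSI& m, const std::string& key)
{
    IMSI it = m.find(key);
    if (it == m.end()) return -1;   // not found: iterator equals the end of the map
    return it->second;              // found: second is the value, first is the key
}
```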
Maps are some of the most entertaining goodies in the standard library. They are useful tools,
and you can use them to write very powerful routines in just a few lines. Here is a function that
counts word frequencies in a large body of text (tested, in this case, on the first chapter of Conan
Doyle's Hound of the Baskervilles, courtesy of Project Gutenberg):
This example uses the shorthand for opening a file, and it assumes that the file will always exist.
The real fun happens on the fourth line in this example. For each word in the file, you increment
the map's value. If a word is not originally present in the map, msi[word] creates a new
entry with the value zero, which is then incremented. Otherwise, the existing value is incremented. Eventually, msi will contain all
unique words, along with the number of times they have been used. This example is the first bit
of code in this book that really exercises a machine. The UnderC implementation is too slow for
analyzing large amounts of text, but Chapter 4, "Programs and Libraries," shows how to set up
a C++ program that can be compiled into an executable program.
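The word-counting idea described above can be sketched as follows. This is not the original listing: it takes a std::istream instead of opening a file by name, so its line numbering differs from the discussion, and the function name count_words() is an assumption. The map name msi follows the text.

```cpp
#include <map>
#include <sstream>   // std::istringstream is handy for trying this out
#include <string>

typedef std::map<std::string, int> MSI;

// Counts how often each whitespace-separated word occurs in the stream.
MSI count_words(std::istream& in)
{
    MSI msi;
    std::string word;
    while (in >> word)
        ++msi[word];     // a new word is inserted with value 0, then incremented
    return msi;
}
```

To count words in a file, pass a std::ifstream that has been opened successfully.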
Often you are given input without any idea of how many numbers to expect. If you
use push_back(), the vector automatically increases in size to accommodate the new numbers.
So the function read_some_numbers() will read an arbitrary number of integers and add them
to the end of the vector.
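A sketch of read_some_numbers(), assuming it takes the input stream and the vector as reference arguments and returns the count of numbers read (the original signature is not shown):

```cpp
#include <sstream>   // std::istringstream is handy for trying this out
#include <vector>

// Appends every integer that can be read from in to the end of v.
int read_some_numbers(std::istream& in, std::vector<int>& v)
{
    int n = 0;
    int count = 0;
    while (in >> n) {        // stops at end of input or at the first non-number
        v.push_back(n);      // the vector grows as needed
        ++count;
    }
    return count;
}
```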
A queue, on the other hand, operates in first-in, first-out (FIFO) fashion, similarly to a line of
waiting people, who are served in first come, first served order. A vector is not a good
implementation of a queue because inserting at the front causes all entries to shuffle
along. lists, however, are good candidates for queuing. You add an item to a queue by
using push_front(), and you take an item off the end by using pop_back(). Queues are
commonly used in data communications, where you can have data coming in faster than it can
be processed. So incoming data is buffered—that is, kept in a queue until it is used or the buffer
overflows. The good thing about a list is that it never overflows, although it can underflow,
when someone tries to take something off an empty queue; therefore, it is important to check
size. Graphical user interface systems such as Windows typically maintain a message queue,
which contains all the user's input. So it is possible to type faster than a program can process
keystrokes.
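A list-based queue along these lines might be sketched as follows. The alias Queue and the functions enqueue() and dequeue() are hypothetical names; dequeue() checks for an empty queue to guard against underflow:

```cpp
#include <list>

typedef std::list<int> Queue;

// Adds an item at the front of the list.
void enqueue(Queue& q, int value)
{
    q.push_front(value);
}

// Takes the oldest item off the back; returns false if the queue is empty.
bool dequeue(Queue& q, int& value)
{
    if (q.empty()) return false;   // underflow check
    value = q.back();
    q.pop_back();
    return true;
}
```

Items come out in the order they went in, which is the first-in, first-out behavior described above.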
Case Study: Array Class
Pointer-based arrays have a number of problems. For example, a program can easily “walk off”
either end of an array, because C++ does not check whether subscripts fall outside the range of
an array (the programmer can still do this explicitly though). Arrays of size n must number their
elements 0, ..., n – 1; alternate subscript ranges are not allowed. An entire non-char array
cannot be input or output at once; each array element must be read or written individually. Two
arrays cannot be meaningfully compared with equality operators or relational operators (because
the array names are simply pointers to where the arrays begin in memory and, of course, two
arrays will always be at different memory locations). When an array is passed to a general-
purpose function designed to handle arrays of any size, the size of the array must be passed as
an additional argument. One array cannot be assigned to another with the assignment
operator(s) (because array names are const pointers and a constant pointer cannot be used on
the left side of an assignment operator). These and other capabilities certainly seem like
“naturals” for dealing with arrays, but pointer-based arrays do not provide such capabilities.
However, C++ does provide the means to implement such array capabilities through the use of
classes and operator overloading.
In this example, we create a powerful array class that performs range checking to ensure that
subscripts remain within the bounds of the Array. The class allows one array object to be
assigned to another with the assignment operator. Objects of the Array class know their size, so
the size does not need to be passed separately as an argument when passing an Array to a
function. Entire Arrays can be input or output with the stream extraction and stream insertion
operators, respectively. Array comparisons can be made with the equality operators == and !=.
This example will sharpen your appreciation of data abstraction. You will probably want to
suggest other enhancements to this Array class. Class development is an interesting, creative
and intellectually challenging activity, always with the goal of “crafting valuable classes.”
The program of Figs. 11.6–11.8 demonstrates class Array and its overloaded operators. First
we walk through main (Fig. 11.8). Then we consider the class definition (Fig. 11.6) and each of
the class’s member-function and friend-function definitions (Fig. 11.7).
The program begins by instantiating two objects of class Array—integers1 (Fig. 11.8, line 12)
with seven elements, and integers2 (Fig. 11.8, line 13) with the default Array size—10
elements (specified by the Array default constructor’s prototype in Fig. 11.6, line 15). Lines 16–
18 use member function getSize to determine the size of integers1 and output integers1,
using the Array overloaded stream insertion operator. The sample output confirms that
the Array elements were set correctly to zeros by the constructor. Next, lines 21–23 output the
size of Array integers2 and output integers2, using the Array overloaded stream insertion
operator.
Using the Overloaded Stream Extraction Operator to Fill an Array
Line 26 prompts the user to input 17 integers. Line 27 uses the Array overloaded stream
extraction operator to read these values into both arrays. The first seven values are stored
in integers1 and the remaining 10 values are stored in integers2. Lines 29–31 output the two
arrays with the overloaded Array stream insertion operator to confirm that the input was
performed correctly.
integers1 != integers2
The equal sign in the preceding statement is not the assignment operator. When an equal sign
appears in the declaration of an object, it invokes a constructor for that object. This form can be
used to pass only a single argument to a constructor.
Next, line 57 uses the overloaded equality operator (==) to confirm that
objects integers1 and integers2 are indeed identical after the assignment.
Line 61 uses the overloaded subscript operator to refer to integers1[ 5 ]—an in-range
element of integers1. This subscripted name is used as an rvalue to print the value stored
in integers1[ 5 ]. Line 65 uses integers1[ 5 ] as a modifiable lvalue on the left side of an
assignment statement to assign a new value, 1000, to element 5 of integers1. We will see
that operator[] returns a reference to use as the modifiable lvalue after the operator confirms
that 5 is a valid subscript for integers1.
Interestingly, the array subscript operator [] is not restricted for use only with arrays; it also
can be used, for example, to select elements from other kinds of container classes, such as
linked lists, strings and dictionaries. Also, when operator[] functions are defined, subscripts no
longer have to be integers—characters, strings, floats or even objects of user-defined classes
also could be used. In Chapter 23, Standard Template Library (STL), we discuss
the STL map class that allows noninteger subscripts.
Array Class Definition
Now that we have seen how this program operates, let us walk through the class header (Fig.
11.6). As we refer to each member function in the header, we discuss that function’s
implementation in Fig. 11.7. In Fig. 11.6, lines 35–36 represent the private data members of
class Array. Each Array object consists of a size member indicating the number of elements in
the Array and an int pointer, ptr, that points to the dynamically allocated pointer-based array
of integers managed by the Array object.
Lines 12–13 of Fig. 11.6 declare the overloaded stream insertion operator and the overloaded
stream extraction operator to be friends of class Array. When the compiler sees an expression
like cout << arrayObject, it invokes global function operator<< with the call
operator<<( cout, arrayObject )
When the compiler sees an expression like cin >> arrayObject, it invokes global
function operator>> with the call
operator>>( cin, arrayObject )
We note again that these stream insertion and stream extraction operator functions cannot be
members of class Array, because the Array object is always mentioned on the right side of the
stream insertion operator and the stream extraction operator. If these operator functions were to
be members of class Array, the following awkward statements would have to be used to output
and input an Array:
arrayObject << cout;
arrayObject >> cin;
Such statements would be confusing to most C++ programmers, who are familiar
with cout and cin appearing as the left operands of << and >>, respectively.
Function operator<< (defined in Fig. 11.7, lines 127–144) prints the number of elements
indicated by size from the integer array to which ptr points. Function operator>> (defined in
Fig. 11.7, lines 118–124) inputs directly into the array to which ptr points. Each of these
operator functions returns an appropriate reference to enable cascaded output or input
statements, respectively. Note that each of these functions has access to
an Array’s private data because these functions are declared as friends of class Array. Also,
note that class Array’s getSize and operator[] functions could be used
by operator<< and operator>>, in which case these operator functions would not need to
be friends of class Array. However, the additional function calls might increase execution-time
overhead.
Array Default Constructor
Line 15 of Fig. 11.6 declares the default constructor for the class and specifies a default size of
10 elements. When the compiler sees a declaration like line 13 in Fig. 11.8, it invokes
class Array’s default constructor (remember that the default constructor in this example actually
receives a single int argument that has a default value of 10). The default constructor (defined
in Fig. 11.7, lines 18–25) validates and assigns the argument to data member size, uses new to
obtain the memory for the internal pointer-based representation of this array and assigns the
pointer returned by new to data member ptr. Then the constructor uses a for statement to set
all the elements of the array to zero. It is possible to have an Array class that does not initialize
its members if, for example, these members are to be read at some later time; but this is
considered to be a poor programming practice. Arrays, and objects in general, should be
properly initialized and maintained in a consistent state.
Array Copy Constructor
Line 16 of Fig. 11.6 declares a copy constructor (defined in Fig. 11.7, lines 29–36) that initializes
an Array by making a copy of an existing Array object. Such copying must be done carefully to
avoid the pitfall of leaving both Array objects pointing to the same dynamically allocated
memory. This is exactly the problem that would occur with default memberwise copying, if the
compiler is allowed to define a default copy constructor for this class. Copy constructors are
invoked whenever a copy of an object is needed, such as in passing an object by value to a
function, returning an object by value from a function or initializing an object with a copy of
another object of the same class. The copy constructor is called in a declaration when an object
of class Array is instantiated and initialized with another object of class Array, as in the
declaration in line 41 of Fig. 11.8.
The argument to a copy constructor should be a const reference to allow a const object
to be copied.
The copy constructor for Array uses a member initializer (Fig. 11.7, line 30) to copy the size of
the initializer Array into data member size, uses new (line 32) to obtain the memory for the
internal pointer-based representation of this Array and assigns the pointer returned by new to
data member ptr. Then the copy constructor uses a for statement to copy all the elements of
the initializer Array into the new Array object. Note that an object of a class can look at
the private data of any other object of that class (using a handle that indicates which object to
access).
If the copy constructor simply copied the pointer in the source object to the target object’s
pointer, then both objects would point to the same dynamically allocated memory. The first
destructor to execute would then delete the dynamically allocated memory, and the other
object’s ptr would be undefined, a situation called a dangling pointer—this would likely
result in a serious runtime error (such as early program termination) when the pointer was
used.
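To make the difference concrete, here is a sketch of a copy constructor that allocates its own memory. IntBuffer is a hypothetical stand-in for class Array, not the code of Figs. 11.6 and 11.7, and the at() accessor is added only so the sketch can be exercised:

```cpp
// Hypothetical stand-in for class Array; illustrates deep copying only.
class IntBuffer
{
public:
    explicit IntBuffer(int n = 10)
        : size(n > 0 ? n : 10), ptr(new int[size])
    {
        for (int i = 0; i < size; ++i) ptr[i] = 0;
    }

    // Deep copy: new memory is obtained and the elements are copied one by one,
    // so the two objects never share the same dynamically allocated array.
    IntBuffer(const IntBuffer& other)
        : size(other.size), ptr(new int[size])
    {
        for (int i = 0; i < size; ++i) ptr[i] = other.ptr[i];
    }

    ~IntBuffer() { delete [] ptr; }   // each object releases only its own memory

    int getSize() const { return size; }
    int& at(int i) { return ptr[i]; } // unchecked access, for illustration only

private:
    IntBuffer& operator=(const IntBuffer&);  // assignment left out of this sketch
    int size;   // declared before ptr so that it is initialized first
    int* ptr;
};
```

Changing an element of a copy leaves the original untouched, and both destructors can run safely.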
Array Destructor
Line 17 of Fig. 11.6 declares the destructor for the class (defined in Fig. 11.7, lines 39–42). The
destructor is invoked when an object of class Array goes out of scope. The destructor
uses delete [] to release the memory allocated dynamically by new in the constructor.
getSize Member Function
Line 18 of Fig. 11.6 declares function getSize (defined in Fig. 11.7, lines 45–48) that returns
the number of elements in the Array.
Line 20 of Fig. 11.6 declares the overloaded assignment operator function for the class. When
the compiler sees the expression integers1 = integers2 in line 49 of Fig. 11.8, the compiler
invokes member function operator= with the call
integers1.operator=( integers2 )
1. Note that new could fail to obtain the needed memory. We deal with new failures in Chapter
16, Exception Handling.
2. Once again, new could fail. We discuss new failures in Chapter 16.
did not test for this case, operator= would delete the dynamic memory associated with
the Array object before the assignment was complete. This would leave ptr pointing to memory
that had been deallocated, which could lead to fatal runtime errors.
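A sketch of an assignment operator that guards against self-assignment is shown here. SafeArray is a hypothetical class written for this illustration, not the listing of Fig. 11.7, and its subscript operator is unchecked to keep the sketch short:

```cpp
// Hypothetical class; shows only the self-assignment test in operator=.
class SafeArray
{
public:
    explicit SafeArray(int n = 10)
        : size(n > 0 ? n : 10), ptr(new int[size])
    {
        for (int i = 0; i < size; ++i) ptr[i] = 0;
    }

    SafeArray(const SafeArray& other)
        : size(other.size), ptr(new int[size])
    {
        for (int i = 0; i < size; ++i) ptr[i] = other.ptr[i];
    }

    ~SafeArray() { delete [] ptr; }

    SafeArray& operator=(const SafeArray& right)
    {
        if (&right != this) {              // skip the work on self-assignment
            if (size != right.size) {      // reallocate only when sizes differ
                delete [] ptr;
                size = right.size;
                ptr = new int[size];
            }
            for (int i = 0; i < size; ++i) ptr[i] = right.ptr[i];
        }
        return *this;                      // enables x = y = z
    }

    int getSize() const { return size; }
    int& operator[](int i) { return ptr[i]; }   // unchecked, for brevity

private:
    int size;
    int* ptr;
};
```

Without the test on &right, assigning an object to itself could release the very memory the copy loop is about to read.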
Not providing an overloaded assignment operator and a copy constructor for a class when
objects of that class contain pointers to dynamically allocated memory is a logic error.
It is possible to prevent one object of a class from being assigned to another. This is done by
declaring the assignment operator as a private member of the class.
It is possible to prevent class objects from being copied; to do this, simply make both the
overloaded assignment operator and the copy constructor of that class private.
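A sketch of a class whose objects cannot be copied or assigned, using the private-member technique described above. The class name Unique is hypothetical. The two constants at the end assume a C++11 compiler, since they use the <type_traits> header; C++11 also allows the same effect to be written with "= delete".

```cpp
#include <type_traits>

class Unique
{
public:
    Unique() {}

private:
    Unique(const Unique&);             // declared private and never defined
    Unique& operator=(const Unique&);  // likewise
};

// Compile-time checks (C++11): both evaluate to false for Unique.
const bool unique_is_copyable   = std::is_copy_constructible<Unique>::value;
const bool unique_is_assignable = std::is_copy_assignable<Unique>::value;
```

Any attempt to copy or assign a Unique object from outside the class is rejected by the compiler.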
Line 21 of Fig. 11.6 declares the overloaded equality operator (==) for the class. When the
compiler sees the expression integers1 == integers2 in line 57 of Fig. 11.8, the compiler
invokes member function operator== with the call
integers1.operator==( integers2 )
Lines 30 and 33 of Fig. 11.6 declare two overloaded subscript operators (defined in Fig. 11.7 at
lines 88–99 and 103–114, respectively). When the compiler sees the
expression integers1[ 5 ] (Fig. 11.8, line 61), the compiler invokes the appropriate
overloaded operator[] member function by generating the call
integers1.operator[]( 5 )
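A pair of range-checked subscript operators could be sketched as follows. CheckedArray is a hypothetical class, and throwing std::out_of_range on a bad subscript is an assumption of this sketch; the listing in Fig. 11.7 may handle the error differently.

```cpp
#include <stdexcept>

// Hypothetical class; shows only the two overloaded subscript operators.
class CheckedArray
{
public:
    explicit CheckedArray(int n = 10)
        : size(n > 0 ? n : 10), ptr(new int[size])
    {
        for (int i = 0; i < size; ++i) ptr[i] = 0;
    }

    ~CheckedArray() { delete [] ptr; }

    // Non-const version: returns a reference usable as a modifiable lvalue.
    int& operator[](int subscript)
    {
        check(subscript);
        return ptr[subscript];
    }

    // const version: returns a copy of the element, so it is usable only as an rvalue.
    int operator[](int subscript) const
    {
        check(subscript);
        return ptr[subscript];
    }

private:
    void check(int subscript) const
    {
        if (subscript < 0 || subscript >= size)
            throw std::out_of_range("subscript out of range");
    }

    CheckedArray(const CheckedArray&);             // copying left out of this sketch
    CheckedArray& operator=(const CheckedArray&);
    int size;
    int* ptr;
};
```

The compiler picks the const version automatically when the object being subscripted is const.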
The compiler creates a call to the const version of operator[] (Fig. 11.7, lines 103–114) when
the subscript operator is used on a const Array object. For example, if const object z is
instantiated with the statement
const Array z( 5 );