VisualC and CPP
"The business side is easy—easy! ...if you're any good at math at all, you understand
business. It's not its own deep, deep subject. It's not like C++."
There are two sample applications associated with this technical article.
Abstract
Many developers ask the question, "Do I need to overload the new operator for
Windows™–based applications?" when they start programming in C++ with the
Microsoft® C/C++ version 7.0 compiler. These developers want to conserve selectors
while allocating memory from the global heap. Fortunately, the C/C++ version 7.0 run-
time library allows developers to reduce selector consumption without overloading the
new operator.
This article examines the behavior of the new operator in a Windows-based program. It
provides an overview of new, discusses whether you should overload new, examines the
C++ ambient memory model, and discusses large-model C++ programming and
dynamic-link library (DLL) ownership issues.
Two sample applications, newopr and owner, illustrate the concepts in this technical
article. A bibliography of suggested reading material is included at the end of the article.
Overview
This section provides an overview of the new operator, the _fmalloc function, and the
_nmalloc function.
new
The new operator calls malloc directly. In small or medium model, it calls the near
version of malloc, which is _nmalloc. In large model, it calls _fmalloc.
Alarms are probably ringing in the heads of experienced programmers. In the past,
Microsoft has recommended against using malloc because it was incompatible with
Windows real mode. In C/C++ version 7.0, malloc is designed for Windows protected-
mode programming, and real mode is no longer a concern in Microsoft® Windows™
version 3.1. In most cases, calling _fmalloc is now better than calling GlobalAlloc
directly.
_fmalloc
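_fmalloc uses a subsegment allocation scheme: it satisfies many small requests from a single
global segment, so those objects share one selector. Reducing selectors is particularly
important for C++ programs. Most programs allocate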
lots of small objects on the heap. If new called GlobalAlloc directly, each small object
would use a selector, and the program would reach the system limit of 8192 selectors
(4096 in standard mode) too quickly.
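A rough count shows why this matters. The sketch below is ordinary portable C++, not Win16 code. It compares one selector per object with packing whole objects into segments. The 65,510-byte segment capacity is the _HEAP_MAXREQ value from malloc.h. Ignoring per-block heap overhead is a simplifying assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Selector limit in enhanced mode, as quoted above (4096 in standard mode).
const std::size_t kEnhancedModeSelectorLimit = 8192;

// If new called GlobalAlloc directly, every object would need its own selector.
std::size_t selectorsPerObject(std::size_t objectCount) {
    return objectCount;
}

// With subsegment allocation, whole objects are packed into segments of at
// most segmentCapacity bytes; an object never straddles two segments.
// Per-block heap overhead is ignored here (a simplifying assumption).
std::size_t selectorsSuballocated(std::size_t objectCount,
                                  std::size_t objectBytes,
                                  std::size_t segmentCapacity) {
    assert(objectBytes > 0 && objectBytes <= segmentCapacity);
    std::size_t perSegment = segmentCapacity / objectBytes;  // whole objects per segment
    return (objectCount + perSegment - 1) / perSegment;      // round up
}
```

Take 10,000 objects of 20 bytes each. selectorsPerObject gives 10,000, which is over the 8192 limit. selectorsSuballocated(10000, 20, 65510) gives 4.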
_nmalloc
Although _fmalloc is fine and dandy, _nmalloc is not nearly (no pun intended) as
sophisticated. _nmalloc allocates fixed memory with LocalAlloc directly, which may
result in memory fragmentation in the local heap. _nmalloc performs no subsegment
allocation scheme, and the local heap must share a maximum of 64K with the stack and
static data.
Here's another gotcha: _nmalloc is the default for the new operator in the medium and
small models. _nmalloc allocates its memory from the local heap and must share the
heap with the static data and stack—so a lot of things compete for only 64K of space. It is
rather easy to run out of memory in the local heap. For example, a simple phone book
that requires 200 bytes of data per entry could store fewer than 330 names: 65,536 / 200
is about 327, and the stack and static data reduce that number further.
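The phone-book arithmetic can be checked with a short sketch in portable C++. The reservedBytes parameter stands for the stack and static data. Any particular value for it, such as the 8K used below, is an assumption chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on fixed-size entries that fit in a 64K data segment once
// reservedBytes (stack plus static data, an assumed figure) are set aside.
std::size_t maxEntries(std::size_t heapBytes, std::size_t entryBytes,
                       std::size_t reservedBytes) {
    assert(entryBytes > 0);
    if (reservedBytes >= heapBytes)
        return 0;                                      // no room left at all
    return (heapBytes - reservedBytes) / entryBytes;   // whole entries only
}
```

maxEntries(65536, 200, 0) returns 327. Reserving an assumed 8K, as in maxEntries(65536, 200, 8192), drops the bound to 286.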
Heap Walker can help you determine the source of memory allocation. Memory allocated
with LocalAlloc (through _nmalloc) expands the segment labeled DGroup. Memory
allocated with GlobalAlloc (through _fmalloc) is labeled as a private segment.
For more information on _fmalloc, see the "Allocating Memory the Old-Fashioned Way:
_fmalloc and Applications for Windows" technical article on the Microsoft Developer
Network CD (Technical Articles, C/C++ Articles).
Overloading the new Operator
Many developers want to overload the new operator as soon as they learn that new calls
_nmalloc. You can overload the new operator to perform specialized memory
management, but overloading new to call _fmalloc instead of _nmalloc will not work.
The new operator has four versions. In this article, we are concerned only with the
following two:
In small and medium models, the compiler calls the near version of the new operator, and
this version then calls _nmalloc. If we overload this function so that it calls _fmalloc,
we get a far-to-near pointer conversion error:
A memory management scheme that overloads the near version of the new operator can
return only near pointers, so using GlobalAlloc or GlobalAllocPtr will not work either.
Overloading the new operator to call _fmalloc instead of _nmalloc is obviously not the
answer.
You can think of the ambient memory model as the default memory model. Normally, the
ambient memory model of a class is identical to the data model you specify at
compilation time. If the data model is near (for example, in small or medium models), the
ambient memory model is near. You can specify the ambient memory model for a class
explicitly by using __near or __far; for example:
Using the new operator on the CFoo class, as defined above, allocates the CFoo object
on the global heap using _fmalloc.
Note The ambient memory model of a class must be identical to the memory model of
all of its base classes. For example, if your class inherits from a Microsoft Foundation
class, your class must have the same memory model as the Foundation class. In the
small and medium memory models, the ambient memory model of a Foundation class is
near. We discuss the large model in the "Large-Model Programs" section.
class CBar{
};
void main()
{
CBar __far *pBar = new __far CBar ;
}
At first glance, the code above looks very straightforward. However, nonstatic member
functions have a hidden parameter called the this pointer. It is through the this pointer
that an object instance references its data. If the member function is near, it expects the
this pointer to be near. A far this pointer results in an error because a far pointer cannot
be converted to a near pointer.
The following code generates an error because the compiler cannot find a default
constructor that accepts a far this pointer:
class CBar{
public:
CBar();
};
CBar::CBar()
{
}
void main()
{
CBar __far *pBar = new __far CBar ;
// ERROR C2512: 'CBar': An appropriate
// default constructor is not available.
}
To compile the code above, you must overload the constructor based on the addressing
type of the this pointer. This results in the following correct code:
class CBar{
public:
CBar();
CBar() __far ;
// Overload the constructor to take far this pointers.
};
CBar::CBar()
{
}
// Overloaded constructor.
CBar::CBar() __far
{
}
void main()
{
CBar __far *pBar = new __far CBar ;
}
Only member functions that are actually called through a far pointer need to be
overloaded or declared __far:
class CBar{
private:
int value;
void buildIt() __far {} // Must be far: CBar() __far calls it.
// A near this pointer converts to far, so CBar() can call it too.
public:
CBar();
CBar() __far ;
// Overload the constructor to take far this pointers.
};
CBar::CBar()
{
buildIt() ;
}
// Overloaded constructor.
CBar::CBar() __far
{
buildIt() ;
}
void main()
{
CBar *npBar = new CBar ; // Allocated in default data segment.
}
The use of the __far modifier can make programs very difficult to understand and debug.
For example, let's assume that the following code is compiled in the small or medium
memory models:
class CBar {
public:
CBar() ;
~CBar() ;
...
};
main()
{
CFoo anotherFoo; // Allocated on stack
// (default data segment).
npFoo = &aFoo;
// Error : Cannot convert from a far pointer to a near pointer.
}
You can see how complex an application can get when it mixes near objects and far
objects.
Again, Heap Walker can help you determine whether memory is being allocated in the
default data segment or in the global heap.
For additional information on the new operator, see Chapter 5 of the Microsoft C/C++
version 7.0 Programming Techniques manual on the Microsoft Developer Network CD.
Large-Model Programs
As we discussed in the previous section, mixing near and far addressing is even more of a
nightmare in C++ than it is in C and can offset many C++ benefits such as ease of
maintenance and readability. The solution is to use the large model.
Although the large model has not been recommended in the past, the combination of
Microsoft C/C++ version 7.0 and Windows version 3.1 now makes large model the
memory model of choice.
When a C or C++ program is compiled with the large memory model, malloc is mapped
to its model-independent or far version known as _fmalloc. Because the new operator
calls malloc, heap objects are allocated in global memory.
The two issues associated with using the large model involve speed and creating multiple
instances. The time you save by not worrying whether an object is near or far can be used
to run a profiler and to optimize the application, thus compensating for any speed losses
caused by the large model.
Multiple Instances
The new /Gx option in C/C++ version 7.0 simplifies the creation of multiple-instance,
large-model applications. Make sure to use the following compiler options:
/Gt65500 /Gx
Programs with multiple read/write data segments cannot have multiple instances. By
default, the Microsoft C compiler places initialized and uninitialized static data in two
separate segments. The compiler places each static object that is larger than or equal to
32,767 bytes into its own segment. The /Gx and /Gt options override this behavior.
The /Gx option forces all initialized and uninitialized static data into the same segment.
The /Gt[n] option places any object larger than n bytes in a new segment. (n is optional,
as indicated by the square brackets.) If n is not specified, it defaults to 256 bytes. If n is
large (for example, 65,500 bytes), most objects remain in the default data segment.
Because a multiple-instance application can have only one read/write data segment, the
application is limited to 64K for all statics, the local heap, and the stack. However, C++
promotes the use of the heap through the new operator, which allocates memory from the
global heap instead of the local heap in the large model, so the 64K local heap limit
should not be a problem. Moreover, multiple-instance, small-model and medium-model
applications also have only one read/write data segment.
Warning A bug in Microsoft C/C++ version 7.0 causes the compiler to place
uninitialized global instances of classes and structures in a far data segment
(FAR_DATA) when the /Gx option is used, resulting in two data segments. For this
reason, you must declare global class objects and structures as near.
To illustrate, most Microsoft Foundation Class Library programs have a global object
declared as follows:
CTheApp theApp;
To get multiple instances of this program, you must change the line to:
CTheApp __near theApp;
The EXEHDR utility determines the number of data segments a program contains. In the
sample EXEHDR output below, the lines of interest are those that describe the data
segments; the DGROUP line shows that segment 4 is the default data segment.
Module: NEWOPR
Description: newopr - demonstrates new operator in
medium v. large model
Data: NONSHARED
Initial CS:IP: seg 1 offset e392
Initial SS:SP: seg 4 offset 0000
Extra stack allocation: 1000 bytes
DGROUP: seg 4
Heap allocation: 0400 bytes
Application type: WINDOWAPI
Runs in protected mode only
Exports:
ord seg offset name
1 1 e358 _AFX_VERSION exported
2 1 f718 ___EXPORTEDSTUB exported
The MAP file helps determine the data that is placed in the FAR_DATA segment instead
of the default data segment. To get a MAP file, be sure to specify a MAP filename and the
/MAP option on the link line. In the sample MAP file below, the lines of interest are the
entries for spaceholder and theApp, which are placed in segment 0003 rather than in
segment 0004, the DGROUP segment reported by EXEHDR.
0001:7C52 ??0CArchive@@REC@PEVCFile@@IHPEX@Z
.
.
.
0003:0000 ?spaceholder@@3VCObArray@@E
0001:7BF2 ?Store@CRuntimeClass@@RECXAEVCArchive@@@Z
0002:01F0 ?TextOut@CDC@@RECHHHPFDH@Z
0003:000E ?theApp@@3VCTheApp@@E
.
.
.
Address Publics by Value
.
.
.
0002:1380 ?GetStartPosition@CMapPtrToWord@@RFCPEXXZ
0003:0000 ?spaceholder@@3VCObArray@@E
0003:000E ?theApp@@3VCTheApp@@E
0004:0004 rsrvptrs
.
.
.
0004:1FEE __end
0004:1FEE _end
Multiple-instance, large-model programs that use the Microsoft Foundation classes must
build special versions of the Microsoft Foundation Class Library using the /Gt and /Gx
options. Use the following command line:
Warning This variant of the Microsoft Foundation Class Library has not been tested
extensively by Microsoft.
For additional information on using large-model programs with Windows, see the
"Programming at Large" technical article on the Microsoft Developer Network CD
(Technical Articles, C/C++ Articles).
The best way to use newopr is to compile it in medium model, run it, and examine the heap
with Heap Walker. Run NMAKE with the CLEAN option, and then compile it in large model.
Run the large-model version, and re-examine the heap with Heap Walker.
• If the application owns the memory, exiting the application releases the memory.
• If the DLL owns the memory, unloading the DLL from memory releases the
memory.
The key point here is that memory owned by a DLL (for example, GMEM_SHARE) can
exist even after your application exits. The Smart Alloc sample application, which
accompanies "Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for
Windows," illustrates this issue.
A DLL owns the memory allocated as GMEM_SHARE from within the DLL (in C++ or
C). A DLL also owns the memory allocated by new in the DLL. Determining when and
where memory is allocated can become very confusing in C++.
The code samples below are from the owner sample application and its associated
OWNERDLL.DLL.
CContainedClass aContainedClass ;
char aBuffer[1024] ;
char *aString ;
} ;
CFooInDLL::CFooInDLL()
{
aString = new char[1024] ;
}
The .EXE for the program contains the following code fragment:
// Code in .EXE
CFooInDLL aFoo;
void somefunc()
{
aFoo.yourString() ; // Now application owns aString.
aFoo.myString() ; // Now DLL owns aString.
aFoo.yourString() ; // Now application owns aString.
}
Given these code fragments (where the object is defined in a DLL and declared in an
application), the following rules apply:
• The application owns the memory for objects declared in the application.
• Space for an object and its contained objects is allocated where the object is
declared.
• The process that executes the new operator owns the memory for the object (see
figure below).
• The CFooInDLL constructor calls the new operator to allocate space for
aString; therefore, the DLL owns the memory for aString.
• myString executes inside the DLL; therefore, the DLL owns memory
allocated by myString.
• The debug versions of Foundation classes track the allocation of memory. An
assertion in the Microsoft Foundation Class Library MEMORY.CPP source file will
fail when yourString tries to free memory allocated by the DLL. Therefore, the
retail versions of owner and OWNERDLL run fine, but the debug versions fail.
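The kind of check that trips the debug build can be modeled in a few lines. The sketch below is a hypothetical tracker, not MFC's code. It tags each allocation with the module that made it and reports a free from any other module, which is the mismatch yourString hits when it frees memory the DLL allocated. The class name and the module strings are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical ownership tracker (not the MFC implementation): records which
// module allocated each block and flags frees made by a different module.
class OwnershipTracker {
public:
    void recordAlloc(const void* p, const std::string& owner) {
        assert(p != nullptr);
        owners_[p] = owner;
    }
    // Returns true if 'freer' owns p and the record is removed; false models
    // the mismatch that a debug build would report with an assertion.
    bool recordFree(const void* p, const std::string& freer) {
        auto it = owners_.find(p);
        if (it == owners_.end() || it->second != freer)
            return false;
        owners_.erase(it);
        return true;
    }
private:
    std::map<const void*, std::string> owners_;
};
```

A retail build performs no such bookkeeping, so the same sequence of calls passes silently there.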
In most cases, it is best to design classes exported from a DLL so that memory ownership
will not bounce between the application and the DLL. Using the debug versions of the
Foundation class libraries helps track this problem.
The problem of determining memory ownership is just one more reason not to export
C++ class interfaces from a DLL. In most cases, it is much better to export a C interface
from a DLL.
Conclusion
There is no need to overload the new operator to make it compatible with the Windows
environment. The new operator calls malloc. The model-independent version of malloc,
_fmalloc, is designed to manage subsegment allocation under Windows.
Bibliography
The following technical articles on the Microsoft Developer Network CD (Technical
Articles, C/C++ Articles) are good sources of information on memory management in
C++:
• "Programming at Large"
We also recommend the Microsoft C/C++ version 7.0 Programming Techniques manual,
also available on the Microsoft Developer Network CD. Chapter 5 of this manual
discusses memory management in C++.
Allocating Memory the Old-Fashioned Way: _fmalloc and Applications for Windows
Dale Rogerson
Microsoft Developer Network Technology Group
Abstract
One of the most shocking things that a first-time programmer for Windows has to learn is
not to use malloc but to use special Microsoft® Windows™ memory allocation functions
such as GlobalAlloc, GlobalReAlloc, GlobalLock, GlobalUnlock, and GlobalFree.
The reasons for requiring special memory allocation functions have mostly gone away
with the demise of real mode. In fact, Microsoft C/C++ version 7.0 brings us almost full
circle, because the preferred method for memory allocation is the large-model version of
malloc or _fmalloc. Even the C startup code now uses malloc to allocate space for the
environment.
This article discusses the behavior of malloc supplied with Microsoft C/C++ version 7.0.
The article focuses on programming for the protected modes—standard and enhanced—
of Microsoft Windows version 3.1. The following topics are discussed:
The information for this article was gleaned from the C/C++ version 7.0 compiler run-
time library source code.
To interactively explore the behavior of _fmalloc, the Smart Alloc (SMART.EXE)
sample application is provided. Smart Alloc is best used in conjunction with Heap
Walker, which shows the exact state of the global segments allocated. Segments allocated
with GlobalAlloc (or _fmalloc) are listed by Heap Walker as having a type designation
of "Private." Smart Alloc has a dynamic-link library (DLL) that intercepts all calls to
GlobalAlloc, GlobalFree, and GlobalReAlloc made by Smart Alloc or the C run-time
library and prints messages with OutputDebugString to the debugging console. It is
usually most convenient to use DBWIN.EXE to view these messages.
Microsoft C/C++ version 7.0 was designed to develop protected-mode applications for
Windows. In protected mode, there is no penalty for locking a memory handle and
leaving it locked. It is not even necessary to retain the handle returned from GlobalAlloc,
because the GlobalHandle function returns the handle to a selector returned from
GlobalLock. Macros defined in WINDOWSX.H simplify the process of getting a pointer
to a block of memory. The GlobalAllocPtr and GlobalFreePtr macros automatically
lock and unlock a memory block.
Microsoft C/C++ version 7.0 takes advantage of the new freedom allowed by protected
mode. _fmalloc can now leave memory blocks locked with no penalty under the two
protected modes of Windows version 3.x.
For example, take a flat file database that reads in a list of names and addresses from the
hard disk and puts them in a binary tree. If GlobalAlloc is called for each name and
address, this program would not be able to store more than 4096 names. Many companies
have more than 4096 employees. In fact, the actual number of available selectors is far
less than 8192 because all Windows-based applications and libraries must share from the
same pool of selectors.
In the next call, _fmalloc first tries to satisfy the request without allocating any memory.
If this is not possible, it attempts to do a GlobalReAlloc instead of a GlobalAlloc. This
reduces the number of selectors used by the program. If the segment size must grow
larger than the _HEAP_MAXREQ constant defined in malloc.h to meet the allocation
request, GlobalAlloc is called again. _HEAP_MAXREQ is defined to be 0x0FFE6 or
65,510 bytes. This leaves enough room for the overhead needed to manage the heap and
not have any memory crossing a segment boundary. If more than _HEAP_MAXREQ
memory is requested, the _fmalloc call returns a null pointer.
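The decision order described above can be summarized in a small model. This is a sketch under stated assumptions, not the run-time library source. Overhead, alignment, and fragmentation are ignored, and each segment's free space is treated as one contiguous run at its end. Only the 0xFFE6 limit comes from the text. Segment, Action, and chooseAction are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kHeapMaxReq = 0xFFE6;  // 65,510 bytes, per the text

enum class Action { UseFreeSpace, GrowSegment, NewSegment, ReturnNull };

struct Segment {
    std::size_t size;  // bytes currently allocated to the segment
    std::size_t used;  // bytes handed out (overhead ignored)
};

// Hypothetical model of the order _fmalloc tries things in.
Action chooseAction(const std::vector<Segment>& segs, std::size_t request) {
    if (request > kHeapMaxReq)
        return Action::ReturnNull;            // too big for any segment
    for (const Segment& s : segs) {
        assert(s.used <= s.size);
        if (s.size - s.used >= request)
            return Action::UseFreeSpace;      // no new memory needed
    }
    for (const Segment& s : segs)
        if (s.used + request <= kHeapMaxReq)
            return Action::GrowSegment;       // GlobalReAlloc, same selector
    return Action::NewSegment;                // GlobalAlloc, one more selector
}
```

Only the NewSegment outcome costs a selector, and that is where the savings over calling GlobalAlloc for every block come from.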
Figure 1 illustrates how _fmalloc satisfies several memory requests with one segment
consuming only one selector when the requested blocks are less than _HEAP_MAXREQ.
Each call to GlobalAlloc, on the other hand, uses up a selector.
Figure 2. _fmalloc Subsegment Allocation
Figure 2 shows how _fmalloc allocates a new segment when it cannot satisfy a request
with the old segment because the requested block would cause the segment to grow larger
than _HEAP_MAXREQ. Notice how neither GlobalAlloc nor _fmalloc allocates exactly
the number of bytes that are requested. Both functions have some overhead. The current
version of _fmalloc requires 22 bytes of overhead on top of the overhead of GlobalAlloc.
It also defines the smallest segment size to be 26 bytes. Future versions of _fmalloc may
require more or less overhead. _fmalloc also returns a pointer that is guaranteed to be
aligned on double-word boundaries.
The amount of memory that _fmalloc initially allocates to a new segment is rounded up
to the nearest 4K boundary. If 4070 bytes (4096 - 26) or fewer are requested, 4K is
allocated. If 4071 bytes (4096 - 26 + 1) are requested, 8K is allocated. This behavior differs from the
explanation in the Microsoft C/C++ version 7.0 Run-Time Library Reference, which
states that the initial requested size for a segment is just enough to satisfy the allocation
request.
When _fmalloc can satisfy a request by growing the segment, it calls GlobalReAlloc.
The global variable _amblksiz determines the amount by which the segment is grown.
_fmalloc will grow the segment in enough multiples of _amblksiz to satisfy the request.
The default value of _amblksiz is 4K for Windows, instead of the 8K used by MS-
DOS®. You can set _amblksiz to any value, but _fmalloc rounds it up to the nearest
whole power of two before it is used.
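The sizing rules above can be written out as arithmetic. The sketch reproduces the thresholds the text gives: 4070 bytes get 4K and 4071 bytes get 8K. It also applies the power-of-two rounding of _amblksiz. Treating the 26 bytes as a fixed per-segment addition is an assumption that matches those thresholds. It is not a claim about how the run-time library stores its overhead, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSegmentOverhead = 26;  // figure quoted in the text
const std::size_t kPage = 4096;

// Initial size of a new segment: request plus overhead, rounded up to 4K.
std::size_t initialSegmentSize(std::size_t request) {
    return (request + kSegmentOverhead + kPage - 1) / kPage * kPage;
}

// _amblksiz is rounded up to a whole power of two before it is used.
std::size_t roundUpPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Growth via GlobalReAlloc: enough whole multiples of the rounded
// _amblksiz value to cover the shortfall.
std::size_t growthAmount(std::size_t shortfall, std::size_t amblksiz) {
    assert(amblksiz > 0);
    std::size_t block = roundUpPow2(amblksiz);
    return (shortfall + block - 1) / block * block;
}
```

With _amblksiz set to 3000, roundUpPow2 yields 4096. A 5000-byte shortfall therefore grows the segment by 8192 bytes.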
The sample application, Smart Alloc (SMART.EXE), can be used to explore the behavior
of _fmalloc in detail. Examine Smart Alloc's Help file for more information on using it.
Try allocating 1 byte of memory. _fmalloc calls GlobalAlloc with a size of 4K. Try
allocating 4070 bytes and 4071 bytes. Smart Alloc also lets you experiment with different
values of _amblksiz.
The frugal behavior of _fmalloc makes it suited to allocating bunches of small memory
objects. However, as will be shown in the next section, _fmalloc is not suitable for all
uses.
Note In Figures 3 through 7, it is possible for Selector 3 to have a lower or higher value
than Selector 1. The number indicates in what order the selectors were allocated.
In Figure 3, the last block allocated has been freed. However, its memory is not returned
to the system.
In Figure 4, the first and fourth blocks of memory are freed in addition to Block 5. Again,
no memory is returned to Windows with a GlobalFree. If _fmalloc returned the memory
for the first block to Windows, the pointer to Block 2 would have to change. It would be
possible for _fmalloc to GlobalReAlloc the memory associated with Selector 2 and
GlobalFree the memory associated with Selector 3. This can be accomplished with the
C/C++ run-time library, as will be explained in conjunction with Figure 7.
In Figure 5, a new block has been allocated. Because this block is half the size of the
previous first block, _fmalloc places it in this empty block of Selector 1.
In Figure 6, another block of memory is allocated. This time it is twice the size of the
previous blocks of memory. Because this block is too large to fit into the heap associated
with Selector 2, the memory associated with Selector 3 is reallocated to hold it.
Figure 7. Figure 4 Followed by _heapmin
If memory is set up as in Figure 4, calling _heapmin will leave memory in the state
shown by Figure 7. _heapmin performs the following actions to achieve this state:
• Selector 2's memory is GlobalReAlloc'ed to remove the freed block and padding.
To recreate the previous examples with Smart Alloc, use 22,000 bytes for the size x. It is
important to note that Smart Alloc sorts allocated memory by handle (that is, selector)
and not by the order in which it was allocated.
In addition to _heapmin, the C compiler run-time library contains many other functions
to help manage the heap created by _fmalloc. Descriptions of these functions are in the
Microsoft C/C++ version 7.0 Run-Time Library Reference. Like _heapmin, most of these
functions are unique to C/C++ version 7.0 and are not ANSI C compatible. Below is a list
of these unique functions:
Reallocation functions:
_fexpand    Expands or shrinks a block of memory without moving its location.
_frealloc   Reallocates a block to a new size. Might move the block of memory.
_heapadd    Adds memory to a heap.
_heapmin    Releases unused memory in a heap.
Information functions:
_fmsize     Returns size of an allocated block.
_fheapwalk  Returns information about each entry in a heap.
Debugging functions:
_fheapset   Fills free heap entries with a specified value.
All programmers who decide to use _fmalloc must be aware that _ffree does not return
memory to the operating system. For example, an application might read in an entire text
file and display it on the screen. Let's say that the application keeps a linked list of lines
and mallocs the memory for each line in the file. If the user selects a large file of about 1
megabyte (MB), the application allocates at least 1 MB of memory. The user then closes
the file. The application faithfully calls _ffree for each line in the file. Even though the
application does not need the memory, it is still hogging it from the system. This
application needs to call _heapmin or one of the other heap management functions.
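The difference between freeing a block and returning memory can be shown with a toy heap. The sketch is hypothetical and much simpler than the real run-time library. It carves blocks off the end of one segment and never reuses freed blocks. Freeing only marks a block, as _ffree does. A heapMin call trims trailing free blocks, loosely standing in for _heapmin shrinking a segment with GlobalReAlloc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one subsegment heap (hypothetical; no block reuse).
class ToyHeap {
public:
    // Carves a block off the end of the segment; returns its index.
    std::size_t alloc(std::size_t bytes) {
        blocks_.push_back(Block{bytes, false});
        segmentBytes_ += bytes;
        return blocks_.size() - 1;
    }
    // Like _ffree: marks the block free, but the segment keeps its size.
    void freeBlock(std::size_t index) {
        assert(index < blocks_.size());
        blocks_[index].isFree = true;
    }
    // Loosely like _heapmin: releases free blocks at the end of the segment.
    void heapMin() {
        while (!blocks_.empty() && blocks_.back().isFree) {
            segmentBytes_ -= blocks_.back().bytes;
            blocks_.pop_back();
        }
    }
    std::size_t segmentBytes() const { return segmentBytes_; }
private:
    struct Block { std::size_t bytes; bool isFree; };
    std::vector<Block> blocks_;
    std::size_t segmentBytes_ = 0;
};
```

After every block has been freed, segmentBytes() still reports the full size until heapMin runs. That mirrors the text viewer that keeps holding its megabyte after the file is closed.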
Why doesn't _ffree call GlobalFree? There are two main reasons:
Note All memory (freed and unfreed) is returned to the system as part of the Windows
kernel's normal clean-up process when the application exits.
The GMEM_SHARE flag tells Windows that this memory is going to be shared by
several programs. The most immediate consequence of using GMEM_SHARE in a DLL
is that the memory will not be released until the DLL is unloaded from memory. The DLL
is not always unloaded from memory when the application that loads it exits. Because
multiple applications, or multiple instances of one application, can use the same DLL,
the DLL and its memory are not unloaded until all applications using the DLL have exited.
• If an application allocates memory and does not free it, the memory is freed by
Windows when the application exits.
If a programmer is not careful, the use of _fmalloc in a DLL can lead to large pools of
allocated but unneeded memory. It is usually best to use the GMEM_SHARE flag only
when memory must be shared or must exist for the lifetime of the DLL. This means that,
in many cases, GlobalAlloc should be used instead of _fmalloc in a DLL.
Remember, calling _ffree does not generate a call to GlobalFree. Even if the DLL is
freeing memory before it returns to the application, memory can be wasted. Refer to the
previous section on _ffree for more information.
The situations listed above can be demonstrated by using the Smart Alloc sample
application. Perform the following steps:
3. GlobalAlloc 1000 bytes of movable memory from a DLL. (See the Smart Alloc
help file for details on how to do this.)
4. Walk the global heap using Heap Walker and examine the listing. The above
memory should be owned by Smart Alloc. It will differ slightly in size due to the
overhead and padding performed by GlobalAlloc.
6. Walk the global heap using Heap Walker and examine the listing. The memory
allocated in step 5 should be owned by SMARTDLL.DLL. It will differ slightly in
size due to the overhead and padding performed by GlobalAlloc.
7. Run a second instance of Smart Alloc. Do not exit the first instance.
8. GlobalAlloc 3000 bytes of movable memory from a DLL using the second
instance of Smart Alloc.
9. GlobalAlloc 4000 bytes of shared memory from a DLL using the second instance
of Smart Alloc.
10. Walk the global heap in Heap Walker and examine the listing. The memory
allocated in steps 8 and 9 should be owned and allocated like the memory
allocated by the first instance in steps 3 and 5. In fact, the memory allocated in
step 9 will be allocated in the same segment as the memory allocated for the first
instance of Smart Alloc in step 5.
Figures 8 and 9 illustrate the above sequence. Figure 8 illustrates the state of memory
after executing steps 1 through 10 in the list above.
Remember that _fmalloc called from a DLL allocates memory with the GMEM_SHARE option set.
• Allocate more than 64K. GlobalAlloc takes a DWORD, while _fmalloc takes a
size_t, which is an unsigned int. _halloc can also be used to allocate more than
64K in a block of memory.
• Allocate fixed memory, discardable memory, or memory with the other GMEM_*
attributes.
Although most programmers do not think of general protection faults as a positive thing,
they can be helpful in locating where a program writes outside of a memory block.
Because _fmalloc returns a pointer to a block inside a larger segment, a program can write
past the end of the block without writing past the end of the segment, and such an overrun
does not cause a general protection fault.
Conclusion
In most cases, _fmalloc and _ffree utilize system resources better than directly calling
GlobalAlloc and GlobalFree. The subsegment allocation scheme used by _fmalloc
reduces the number of selectors needed and also reduces the amount of system overhead.
Also keep in mind that calling _fmalloc from a DLL allocates memory with the
GMEM_SHARE attribute set, which is usually not what is wanted because memory is
not freed until the DLL is unloaded.
Abstract
Microsoft® Windows™ version 3.1 has over 30 callback functions that applications can
use to enumerate objects, hook into the hardware, and perform a variety of other
activities. Due to the prevalence of callbacks, it is only natural to want to handle
callbacks with C++ member functions. However, callbacks are prototyped as C functions
and, therefore, do not associate data with operations on that data, making the handling of
callbacks less straightforward when you use C++ than it initially might appear.
This article explains why normal member functions cannot be used as callback functions,
gives several techniques for handling callbacks, and illustrates these techniques with code
fragments. The code fragments are included as the CALLB sample program on the
Microsoft Developer Network CD.
The article and source code are targeted toward Microsoft C/C++ version 7.0, but the
ideas presented apply to all C++ compilers, including those by Borland and Zortech.
The reader should be familiar with Windows callbacks and with C++. A bibliography is
supplied at the end of the article.
When Windows calls the EnumObjectsProc function, it passes the two parameters—
lpLogObject and lpData—to the function.
The following code attempts to set up a member function as a callback. The code
compiles and links successfully but causes a protection fault at run time.
// See CProg1.cpp
// Run nmake -fmake1
class CProg1 {
private:
int nCount ;
// Incorrect callback declaration
// Use a static or nonmember function.
int FAR PASCAL EXPORT
EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
// Constructor
CProg1() : nCount(0) {};
// Member function
void enumIt(CDC& dc) ;
};
// Callback handler
int FAR PASCAL EXPORT
CProg1::EnumObjectsProc( LPSTR lpLogObject, LPSTR pData)
{
// Process the callback.
nCount++ ;
MessageBeep(0) ;
return 1 ;
}
The reason for the above error and protection fault is that C++ member functions have a
hidden parameter known as the this pointer. C++ is able to associate a function with a
particular instance of an object by means of the this pointer. When C++ compiles the
following line:
The last parameter, (CDC*) &dc, is the this pointer. Member functions access an object's
data through the this pointer. C++ handles the this pointer implicitly when accessing
member data. In the CProg1::enumIt function, the line:
nCount = 0 ;
is interpreted by the compiler as:
this->nCount = 0 ;
Windows passes only two parameters to EnumObjectsProc. It does not call functions
through objects and cannot send a this pointer to the callback function. However, as
compiled above, EnumObjectsProc expects three parameters instead of two. The result
is that a random value on the stack is used as the this pointer, causing a crash. To handle
EnumObjectsProc as a member function, the compiler must be told not to expect a this
pointer as the last parameter.
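The hidden parameter can be made concrete with a portable C++ sketch. This is not the CProg1 code, and CCounter and bumpExplicit are hypothetical names. The nonmember function spells out, as an explicit argument, the object address that the member function receives implicitly. The static_assert shows that standard C++ refuses to treat a pointer to a nonstatic member function as a plain function pointer. That plain function pointer is the only form a caller like Windows, which supplies just its own two arguments, can use.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class used only to illustrate the hidden this parameter.
class CCounter {
public:
    int nCount = 0;
    void bump() { nCount++; }  // compiled as this->nCount++
};

// The same operation as a nonmember function: the object address that
// bump() gets implicitly is passed here as an ordinary argument.
void bumpExplicit(CCounter* self) { self->nCount++; }

// A pointer to a nonstatic member function is not an ordinary function
// pointer; a call through it still needs an object to supply this.
static_assert(!std::is_convertible<void (CCounter::*)(),
                                   void (*)(CCounter*)>::value,
              "member function pointers are not plain function pointers");
```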
• Nonmember functions
Nonmember Functions
A nonmember function is not part of a C++ class and, therefore, does not have a this
pointer. A nonmember function does not have access to the private or protected members
of a class. However, a nonmember function that a class declares as a friend can access
that class's private and protected members. Using nonmember functions to handle a
callback is similar to handling a callback in C.
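The nonmember approach can be sketched in portable C++. The sketch assumes a hypothetical enumerator, EnumerateItems, that (like EnumObjects) takes a plain function pointer and a caller-supplied data pointer; CCounter and CountItemProc are illustrative names, not part of the original samples.

```cpp
#include <string>
#include <vector>

// Hypothetical C-style callback type: no hidden this pointer, just the
// current item and a caller-supplied data pointer.
typedef int (*ItemProc)(const std::string& item, void* data);

// Hypothetical enumerator standing in for an API such as EnumObjects.
// It stops early if the callback returns 0.
inline void EnumerateItems(const std::vector<std::string>& items,
                           ItemProc proc, void* data) {
    for (const std::string& item : items) {
        if (proc(item, data) == 0) break;
    }
}

// Declared before the class so that CCounter::enumIt can name it.
int CountItemProc(const std::string& item, void* data);

class CCounter {
private:
    int nCount;
    // Friendship lets the nonmember callback update nCount.
    friend int CountItemProc(const std::string& item, void* data);
public:
    CCounter() : nCount(0) {}
    int count() const { return nCount; }
    void enumIt(const std::vector<std::string>& items) {
        // A nonmember function converts directly to a plain function pointer.
        EnumerateItems(items, CountItemProc, this);
    }
};

// Nonmember callback: it has no this pointer, so the object must be
// reached explicitly through the data pointer.
int CountItemProc(const std::string& /*item*/, void* data) {
    CCounter* pCounter = static_cast<CCounter*>(data);
    pCounter->nCount++;
    return 1;  // nonzero: keep enumerating
}
```

Because CountItemProc is a friend, it may update the private nCount, but only through an explicit object pointer.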
Static member functions are class member functions that do not receive this pointers. As
a result:
• An object does not have to be created before a static member function is called or
static member data is accessed.
• The class scope operator can access static members without an object, for
example:
CFoo::someFunc(someValue) ;
• A static member function cannot access a nonstatic member of its class without an
object instance. In other words, all object access must be explicit, such as:
object.nonStatFunc(someValue) ;
// NOT: nonStatFunc(someValue) ;
ptrObject->nonStatFunc(someValue) ;
// NOT: nonStatFunc(someValue) ;
The last point above is the kicker. Unlike a nonstatic member function, a static member
function is not bound to an object. A static function cannot implicitly access nonstatic
members.
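The rules above can be checked with a small sketch. CFoo, someFunc, and nonStatFunc follow the names used in the list; statValue, nValue, setThrough, getStat, and get are hypothetical additions so the effects can be observed.

```cpp
class CFoo {
private:
    static int statValue;   // one copy shared by all CFoo objects
    int nValue;             // one copy per CFoo object
public:
    CFoo() : nValue(0) {}

    // No this pointer: may touch static members directly.
    static void someFunc(int someValue) {
        statValue = someValue;
        // nValue = someValue;   // error: no object to supply nValue
    }

    void nonStatFunc(int someValue) { nValue = someValue; }

    // A static function reaches nonstatic members only through an
    // explicit object.
    static void setThrough(CFoo& object, int someValue) {
        object.nonStatFunc(someValue);
    }

    static int getStat() { return statValue; }
    int get() const { return nValue; }
};

int CFoo::statValue = 0;   // static data must be defined once
```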
For more information on static member functions, see the bibliography at the end of this
article.
In some cases, object pointers are unnecessary because the callback does not need to
access member data. In these cases, the callback operates only on static data. The
following code fragment demonstrates the technique.
// See CProg3.cpp
// Run nmake -fmake3
class CProg3 {
private:
static int statCount ;
int nCount ;
// Use a static member function for callbacks.
static int FAR PASCAL EXPORT
EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
public:
// Constructor
CProg3() : nCount(0) {};
// Member function
void enumIt(CDC& dc) ;
};
// Define statics.
int CProg3::statCount = 0 ;
// Callback handler
int FAR PASCAL EXPORT
CProg3::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
// Process the callback.
statCount++;
// nCount++; This line would cause an error if not commented.
MessageBeep(0) ;
return 1 ;
}
Note that all objects of the CProg3 class above will share the statCount variable. Whether
this is good or bad depends on what the application is trying to accomplish. The
following code fragment illustrates how the outcome might not be what is expected.
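The sharing effect can be seen in a short sketch modeled loosely on CProg3. CShared and callbackProc are hypothetical names; callbackProc stands in for a static callback that Windows would invoke once per item enumerated.

```cpp
// Modeled on CProg3: the callback is static, so it can update only
// static data, and static data has a single copy for the whole class.
class CShared {
private:
    static int statCount;
public:
    // Stand-in for the static callback.
    static int callbackProc() {
        ++statCount;
        return 1;
    }
    // Simulates an enumeration that calls the callback nItems times.
    void enumIt(int nItems) {
        for (int i = 0; i < nItems; ++i) callbackProc();
    }
    static int total() { return statCount; }
};

int CShared::statCount = 0;
```

If one object enumerates three items and another enumerates two, both report a total of 5 rather than 3 and 2.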
There are several ways to avoid the sharing of data between instances of a class. The next
sections describe techniques that link the callback function to a particular object by
providing a pseudo-this pointer.
The main reason to have a callback as a member function is for accessing class members
unique to a particular object (that is, nonstatic members). A callback member function
must be a static function and, therefore, can only access static members without using "."
or "->".
The next listing shows how to use a static member variable to pass an object's this
pointer to the callback. The callback can then use the pointer to access object members.
To simplify the code, the callback calls a helper function that performs all the work. The
helper function is nonstatic and can implicitly access member data through its this
pointer.
// See CProg5.cpp
// Run nmake -fmake1
class CProg5 {
private:
int nCount ;
// Use a static variable to pass the this pointer.
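static CProg5 * pseudoThis ;
// Use a static member function for callbacks.
static int FAR PASCAL EXPORT
EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
// Use a nonstatic member function as a helper.
int EnumObjectsHelper( LPSTR lpLogObject, LPSTR lpData) ;
public:
CProg5() : nCount(0) {};
void enumIt(CDC& dc) ;
};
// Define statics.
CProg5 * CProg5::pseudoThis = NULL ;
// Callback handler
int FAR PASCAL EXPORT
CProg5::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
if (pseudoThis != (CProg5 *)NULL)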
return pseudoThis->EnumObjectsHelper(lpLogObject, lpData) ;
else
return 0 ;
}
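A portable sketch of the static-pointer technique follows, assuming a hypothetical callback API, EnumerateNames, that passes no user data. CNames, EnumProc, and EnumHelper are illustrative stand-ins for CProg5, EnumObjectsProc, and EnumObjectsHelper.

```cpp
#include <string>
#include <vector>

// Hypothetical callback API with NO user-data parameter.
typedef int (*NameProc)(const std::string& name);

inline void EnumerateNames(const std::vector<std::string>& names,
                           NameProc proc) {
    for (const std::string& name : names) {
        if (proc(name) == 0) break;
    }
}

class CNames {
private:
    int nCount;
    static CNames* pseudoThis;   // the object currently being served

    // Static callback: matches NameProc because it has no this pointer.
    static int EnumProc(const std::string& name) {
        if (pseudoThis != nullptr)
            return pseudoThis->EnumHelper(name);
        return 0;                // no object registered: stop
    }
    // Nonstatic helper: accesses members through its implicit this.
    int EnumHelper(const std::string& /*name*/) {
        ++nCount;
        return 1;
    }
public:
    CNames() : nCount(0) {}
    int count() const { return nCount; }
    void enumIt(const std::vector<std::string>& names) {
        pseudoThis = this;       // objects must take turns using the callback
        EnumerateNames(names, EnumProc);
        pseudoThis = nullptr;
    }
};

CNames* CNames::pseudoThis = nullptr;
```

Setting pseudoThis immediately before the call and clearing it afterward is the coordination the text refers to; a callback that fires later, or two objects enumerating at once, would defeat it.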
While the above technique works fine in many cases, the objects must coordinate the use
of the callback. For callbacks (such as EnumObjects) that do their work and then exit,
coordination is not much of a problem. For other callbacks, it may be. The techniques
described in the next two sections require less coordination but work only with certain
callbacks.
A close examination of the EnumObjects function reveals that it has an extra 32-bit
parameter, lpData, for supplying data to the callback routine. This is a great place to pass
a pointer to an object. The following overworked sample demonstrates this technique.
// See CProg6.cpp
// Run nmake -fmake1
class CProg6 {
private:
int nCount ;
// Use a static member function for callbacks.
static int FAR PASCAL EXPORT
EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData) ;
// Use a nonstatic member function as a helper.
int EnumObjectsHelper( LPSTR lpLogObject) ;
public:
CProg6() : nCount(0) {};
void enumIt(CDC& dc) ;
};
// Callback handler
int FAR PASCAL EXPORT
CProg6::EnumObjectsProc( LPSTR lpLogObject, LPSTR lpData)
{
CProg6 * pseudoThis = (CProg6 *)lpData ;
if ( pseudoThis != (CProg6 *)NULL )
return pseudoThis->EnumObjectsHelper(lpLogObject) ;
else
return 0 ;
}
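The same idea can be sketched without Windows, assuming a hypothetical EnumerateItems function that forwards a void* data pointer to the callback, as EnumObjects forwards lpData. CItems, EnumProc, and EnumHelper are illustrative names.

```cpp
#include <string>
#include <vector>

// Hypothetical C-style API that passes caller-supplied data through
// to the callback.
typedef int (*ItemProc)(const std::string& item, void* data);

inline void EnumerateItems(const std::vector<std::string>& items,
                           ItemProc proc, void* data) {
    for (const std::string& item : items) {
        if (proc(item, data) == 0) break;
    }
}

class CItems {
private:
    int nCount;
    // Static member: no hidden this pointer, so it matches ItemProc.
    static int EnumProc(const std::string& item, void* data) {
        CItems* pseudoThis = static_cast<CItems*>(data);
        return pseudoThis != nullptr ? pseudoThis->EnumHelper(item) : 0;
    }
    // Nonstatic helper: full access to this object's members.
    int EnumHelper(const std::string& /*item*/) {
        ++nCount;
        return 1;
    }
public:
    CItems() : nCount(0) {}
    int count() const { return nCount; }
    void enumIt(const std::vector<std::string>& items) {
        EnumerateItems(items, EnumProc, this);   // this travels as the data
    }
};
```

No static pointer is shared, so each object's enumeration is independent of the others.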
This technique will, of course, only work with callbacks that take application-supplied
data. The following list shows those callbacks:
• EnumChildProc
• EnumChildWindows
• EnumFontFamProc
• EnumFontFamilies
• EnumFontsProc
• EnumMetaFileProc
• EnumObjectsProc
• EnumPropFixedProc
• EnumPropMovableProc
• EnumTaskWndProc
• EnumWindowsProc
• LineDDAProc
Keeping a Pointer in a Collection Indexed by a Return Value
Another technique for linking an object pointer with a callback uses the return value of
the function that sets up the callback. This return value is used as an index into a
collection of object pointers.
In the following example, SetTimer sets up a TimerProc callback and returns a unique
timer ID. The timer ID is passed to TimerProc each time the function is called. The
CTimer class uses the timer ID to find the object pointer in a CMapWordToPtr
collection. The CTimer class is an abstract class designed to be inherited by other classes.
// See CTimer.h
// Run nmake -ftmake
// Declaration
class CTimer
{
private:
UINT id ;
static CMapWordToPtr timerList ;
static void stopTimer(int id) ;
static void FAR PASCAL EXPORT
timerProc(HWND hwnd, UINT wMsg, UINT timerId, DWORD dwTime);
protected:
virtual void timer(DWORD dwTime) = 0 ;
public:
// Constructor
CTimer() : id(0) {};
// Destructor (virtual, because CTimer is a base class)
virtual ~CTimer() {stop();};
// Use
BOOL start(UINT msec) ;
void stop() ;
};
// Define statics.
CMapWordToPtr CTimer::timerList ;
// Implementation
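The map lookup at the heart of CTimer can be sketched in portable C++. This is a stand-in built on assumptions, not the original implementation: FakeTimerService and timerService are hypothetical replacements for SetTimer, KillTimer, and Windows calling TimerProc; std::map replaces CMapWordToPtr; and start takes no interval because the fake service fires only on request.

```cpp
#include <map>

// Hypothetical timer service: start() hands out a unique ID and fire()
// calls the callback with that ID, as Windows passes the timer ID.
typedef void (*TimerCallback)(unsigned id, unsigned long dwTime);

class FakeTimerService {
public:
    unsigned start(TimerCallback cb) {
        unsigned id = nextId_++;
        callbacks_[id] = cb;
        return id;
    }
    void stop(unsigned id) { callbacks_.erase(id); }
    void fire(unsigned id, unsigned long dwTime) {
        auto it = callbacks_.find(id);
        if (it != callbacks_.end()) it->second(id, dwTime);
    }
private:
    unsigned nextId_ = 1;
    std::map<unsigned, TimerCallback> callbacks_;
};

inline FakeTimerService& timerService() {
    static FakeTimerService service;
    return service;
}

// Abstract base: derived classes override timer().
class CTimer {
public:
    CTimer() : id(0) {}
    virtual ~CTimer() { stop(); }
    void start() {
        stop();
        id = timerService().start(&CTimer::timerProc);
        timerList[id] = this;            // index this object by its timer ID
    }
    void stop() {
        if (id != 0) {
            timerService().stop(id);
            timerList.erase(id);
            id = 0;
        }
    }
    unsigned timerId() const { return id; }
protected:
    virtual void timer(unsigned long dwTime) = 0;
private:
    unsigned id;
    static std::map<unsigned, CTimer*> timerList;
    // Static callback: finds the object from the ID it receives.
    static void timerProc(unsigned timerId, unsigned long dwTime) {
        auto it = timerList.find(timerId);
        if (it != timerList.end()) it->second->timer(dwTime);
    }
};

std::map<unsigned, CTimer*> CTimer::timerList;
```

Only the object whose ID matches receives the tick, and stopping a timer removes its map entry so stale IDs are ignored.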
Conclusion
Static member functions are used in C++ to handle callbacks because they do not have
this pointers. Callback functions are not designed to accept this pointers. Because static
member functions do not have this pointers and, in many cases, it is desirable to have
access to an object, this article has suggested four ways of providing the static member
function with a this pointer.
Bibliography
For more information on C++ topics such as the this pointer, friend functions, or static
functions, see:
• Microsoft C/C++ version 7.0 C++ Class Libraries User's Guide, Microsoft
Corporation, 1991.
• Norton, Peter and Paul Yao. Peter Norton's Windows 3.0 Power Programming
Techniques. Bantam Computer Books, 1990.
The files for the Back sample application accompany this technical article.
Abstract
WinMain, GlobalAlloc, and mixed-model programming—these are just some of the
conventions C programmers had to accept when they started programming for the
Microsoft® Windows™ operating system. Microsoft C/C++ version 7.0 can now hide
these conventions so that programmers can use standard C practices; applications thus
become much easier to develop and port. This article provides an overview of
programming conventions that C/C++ programmers no longer need and a discussion of
the new programming practices in C/C++ version 7.0. A bibliography of suggested
reading material is included at the end of this article.
Note: The information in this article is valid only for Microsoft Windows version 3.x
standard and enhanced modes.
Introduction
The Microsoft® C/C++ version 7.0 compiler and run-time libraries were designed for the
Microsoft Windows™ operating system. For this reason, programmers no longer have to
follow many of the conventions that differentiated Windows-based programs from MS-
DOS®–based programs. For example, C/C++ programmers can now use:
Single Instances
The behavior of Microsoft C version 6.0 was one reason why programmers were
reluctant to use the large model. C version 6.0 built large-model applications with
multiple read/write data segments. Windows forces an application that uses multiple
read/write data segments to be single instance; therefore, applications built by C version
6.0 would run only single instance.
If you want to build a single-instance application, the Microsoft C/C++ compiler's large
model gives it to you for free. There is no need to check hPrevInstance—Windows does
all the work for you, including putting up an informative dialog box that tells the user that
only one instance can run.
Note If you are not using the Microsoft C/C++ compiler, you should check the
documentation for your C compiler to see which options will generate multiple read/write
data segments.
Multiple Instances
For more information, see the "Programming at Large" and "Allocating Memory the
Newfangled Way: The new Operator" technical articles on the Microsoft Developer
Network CD (Technical Articles, C/C++ Articles).
Performance
It is preferable to optimize code using portable techniques. If you spend a week making
functions near and using other optimizations specific to a segmented architecture, the
optimizations (and your week of work) will be lost when you port the code to Windows
NT™. Instead, you could spend the week reworking the algorithms used in the code that
is executed the most. These improvements will impact performance more significantly
than which language, compiler, or compiler options you use.
However, if your marketing department changes specifications faster than an 80486 can
prefetch an instruction, algorithms often change overnight. In this situation, a
programmer must often use the compiler (sometimes blindly) to try to speed up code
instead of optimizing the code itself.
Why would a program want to use main? Possibly for portability or to use a common
source between Windows and MS-DOS or UNIX®. Using main also allows
programmers to build upon their MS-DOS knowledge for handling the command line and
the environment.
Not to be outdone by any old application, DLLs can also use main instead of LibMain as
an entry point. However, in C/C++ version 7.0, the Windows libraries include a default
LibMain, so most DLLs will not need a main or LibMain function. This is covered later
in the "Using DLLs" section.
The above information is documented in the DETAILS.TXT file, which is provided with the
Microsoft C/C++ compiler. Those interested in reading the code should look in the
SOURCE\STARTUP\WIN directory for the STUBMAIN.ASM and CRT0.ASM files.
Getting to hInstance
The careful reader will be wondering where the program is going to get its instance
handle. Why, from _hInstance, of course! _hInstance is an undocumented feature of the
C/C++ startup code.
When Windows calls the startup code, the instance handle is passed to the startup code in
the DI register, as documented in the Microsoft Windows version 3.1 Software
Development Kit (SDK) Programmer's Reference, Volume 1: Overview, in Chapter 22.
The instance handle is then placed in a global variable called _hInstance. To access this
variable, you must declare it first:
The startup code also includes the following global variables for the other parameters
normally passed to WinMain:
• _hPrevInstance
• _lpszCmdLine
• _cmdShow
The parameters passed to a DLL are different from parameters passed to an application.
The following global variables are defined in the startup code for a DLL:
• _hModule
• _lpszCmdLine
• _wDataSeg
• _wHeapSize
A quick look into the startup code uncovered the above information. The startup code is
included with Microsoft C/C++ version 7.0; look in the SOURCE\STARTUP directory.
For more information on the startup code and what it does, see "A Comprehensive
Examination of the Microsoft C Version 6.0 Startup Code" in the Microsoft Systems
Journal, Vol. 7, No. 1, on the Microsoft Developer Network CD. The article examines C
version 6.0 startup code for MS-DOS, but most of the information is also valid for
version 7.0. This article explains the work the startup code must perform and provides
background information for reading the source code.
Note The startup source code is subject to change between compiler releases. The
inclusion of specific startup variables or functions is not guaranteed in future releases.
_fmalloc manages its own heap on top of the Windows global heap. When _fmalloc is
called, it first checks whether it can satisfy the memory request by simply returning a
pointer to an unused block inside its heap. If it can't, _fmalloc takes one of the following
actions:
• If no memory has been allocated yet, or the 64K limit of the current segment has been
reached, _fmalloc allocates a new segment with GlobalAlloc.
You use _ffree to free the memory blocks allocated by _fmalloc. However, _ffree does
not call GlobalFree. Instead, _ffree marks a block as unused, and _fmalloc tries to
satisfy future requests for memory with these unused blocks by reusing them. The
_heapmin function releases unused blocks back to Windows.
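The behavior described above can be modeled with a small suballocator. This is a simplified sketch under stated assumptions, not the _fmalloc algorithm: SubHeap, allocate, release, and segmentCount are hypothetical names; heap arrays stand in for 64K segments obtained with GlobalAlloc; freed blocks are reused first-fit without splitting or coalescing; and nothing here models _heapmin.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class SubHeap {
public:
    static constexpr std::size_t kSegmentSize = 64 * 1024;

    void* allocate(std::size_t size) {
        if (size == 0 || size > kSegmentSize) return nullptr;
        // 1. Reuse an unused block that is large enough (first fit).
        for (Block& b : blocks_) {
            if (!b.inUse && b.size >= size) {
                b.inUse = true;
                return b.ptr;
            }
        }
        // 2. Otherwise carve a block from the current segment, getting a
        //    new segment (the GlobalAlloc step) if none exists or it is full.
        if (segments_.empty() || used_ + size > kSegmentSize) {
            segments_.push_back(std::make_unique<char[]>(kSegmentSize));
            used_ = 0;
        }
        char* p = segments_.back().get() + used_;
        used_ += size;
        blocks_.push_back(Block{p, size, true});
        return p;
    }

    // Like _ffree: mark the block unused; the memory stays in the heap.
    void release(void* p) {
        for (Block& b : blocks_) {
            if (b.ptr == p) { b.inUse = false; return; }
        }
    }

    std::size_t segmentCount() const { return segments_.size(); }

private:
    struct Block { char* ptr; std::size_t size; bool inUse; };
    std::vector<std::unique_ptr<char[]>> segments_;
    std::vector<Block> blocks_;
    std::size_t used_ = 0;   // bytes handed out from the newest segment
};
```

Many small requests share one segment, which is what keeps selector use low.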
For more information on using malloc in a Windows-based program, see the "Allocating
Memory the Old-Fashioned Way: _fmalloc and Applications for Windows" technical
article on the Microsoft Developer Network CD (Technical Articles, C/C++ Articles).
What makes this possible is the GlobalHandle function, which takes a pointer and
returns the handle to it. GlobalHandle removes the need for saving and tracking handles,
resulting in incredible savings in time, memory, and complexity.
• GlobalPtrHandle
• GlobalLockPtr
• GlobalUnlockPtr
• GlobalReAllocPtr
If these macros were C functions, they would be prototyped as follows:
For the curious, here are the definitions of GlobalAllocPtr and GlobalFreePtr:
#define GlobalAllocPtr(flags, cb) \
(GlobalLock(GlobalAlloc((flags), (cb))))
#define GlobalFreePtr(lp) \
(GlobalUnlockPtr(lp),(BOOL)GlobalFree(GlobalPtrHandle(lp)))
Using DLLs
Microsoft C/C++ version 7.0 run-time libraries provide better support for building DLLs.
Two changes that simplify building DLLs are:
• A default LibMain function.
• A default WEP function.
Most of the information for this section can be found in the DETAILS.TXT file, which is
included with the Microsoft C/C++ compiler.
Note The library files that do not include the C run-time functions (for example,
xNOCRTDW.LIB, where x is the memory model) do not have a default LibMain or
WEP function. You must provide your own LibMain and WEP functions if you use
these libraries.
LibMain
Many DLLs are collections of functions that do not need to perform initialization and
therefore do nothing in the LibMain function. If a function does not do anything, it
would be nice if the developer did not have to worry about it. The C run-time libraries
now include a version of LIBENTRY.OBJ and a default LibMain function. So, if the
DLL links to the C run-time functions, it does not have to link to LIBENTRY.OBJ or
provide its own LibMain function.
WEP
It is no longer necessary to include a dummy WEP function in your DLL code. The C
run-time libraries now include a default version of the WEP function. The default WEP
performs the following functions:
1. Calls the optional user termination function _WEP (see next section).
Placing WEP in a fixed segment ensures that it will exist in memory in case of error. For
proper placement of WEP, include the following lines in the .DEF file:
EXPORTS
WEP @1 RESIDENTNAME
The source code for the default WEP function is included with the Microsoft C/C++
version 7.0 compiler. Look in the SOURCE\STARTUP\WIN directory for a file called
WEP.ASM.
_WEP
To add your own processing to the default WEP function, add a _WEP function to your
DLL. (Note the leading underscore character.) Here is an example:
// Put _WEP code into same fixed segment as the WEP function.
#pragma alloc_text(WEP_TEXT, _WEP)
The _WEP function is optional; use this function for cleanup tasks that you want done
when the DLL is unloaded. If you do not provide a _WEP function, the default WEP
function calls the default _WEP function, which simply returns a one (1). To verify for
yourself, check the source in the STUBWEP.ASM file included with the Microsoft
C/C++ compiler in the SOURCE\STARTUP\WIN directory.
• Do not use deep stacks (that is, avoid recursion and deeply nested function calls).
Building DLLs becomes much easier with the default WEP and LibMain functions. It is
almost possible to cut functions from an application and simply recompile them to get a
DLL. Using the large model for both the DLL and the application simplifies this process.
To display the output, the MS-DOS version of Back uses printf and the Windows version
uses trace, which is a function exported from a DLL called TRACE.DLL. trace performs
printf-like printing to the debug monitor. It demonstrates how to export a CDECL
variable argument function from a DLL and shows how simple a DLL can be.
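A portable sketch of a printf-like trace function shows the variable-argument mechanics. It is hypothetical, not the contents of TRACE.C: a global string, g_traceLog, stands in for the debug monitor, and the 256-byte buffer is an arbitrary choice. A variable-argument function must use the C calling convention because only the caller knows how many arguments it pushed and must remove them from the stack; a PASCAL function removes its own arguments, which it cannot do for a variable list.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the debug monitor.
static std::string g_traceLog;

// printf-like: formats its variable argument list into a buffer.
int trace(const char* format, ...) {
    char buffer[256];                     // fixed size for the sketch
    va_list args;
    va_start(args, format);
    int n = std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    if (n > 0) g_traceLog += buffer;      // output longer than 255 chars is truncated
    return n;                             // length the full output would have
}
```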
The BACK.C and TRACE.C files are included with the Back sample application for this
article.
Conclusions
Microsoft C/C++ version 7.0 introduces new programming practices that facilitate the
development of applications for Windows version 3.1 in protected mode. Programmers
can now:
• Use the large memory model.
• Use _fmalloc.
You no longer need a dummy WEP function because the C/C++ run-time
libraries include a default WEP function. To add your own exit processing, use a
_WEP function.
Bibliography
Technical Articles
All of the articles below are available on the Microsoft Developer Network CD
(Technical Articles, C/C++ Articles):
• "Programming at Large"
Product Documentation
On the Microsoft Developer Network CD, you can find the following books under
C/C++ 7.0 in the Product Documentation section of the Source index:
• Programming Techniques
Other
February 1992
Abstract
This article explains how Microsoft® COBOL programs can pass parameters to and
receive parameters from Microsoft C programs. It assumes you have a basic
understanding of the COBOL and C languages.
• Microsoft COBOL Professional Development System (PDS) versions 4.0 and 4.5
for MS-DOS® and OS/2®
The C interface to COBOL utilizes the standard C extern statement. The following are the
recommended steps for using this statement to execute a mixed-language CALL from C:
1. In the C code, include an extern statement for each COBOL routine CALLed. The
extern statement should be at the beginning of the C program, before any CALLs
to the COBOL routine.
Note: When compiling, if the /Gc compiler directive is used (the /Gc option
causes all functions in the module to use the FORTRAN/Pascal naming and
CALLing conventions), then the cdecl keyword should be used when the COBOL
function is declared (because COBOL uses the C CALLing convention, not the
Pascal CALLing convention).
3. Once a routine has been properly declared with an extern statement, CALL it just
as you would CALL a C function.
4. If passing structures between COBOL and C, compile the C routine with the /Zp1
compiler option to pack structure members.
C Arguments
The default for C is to pass all arrays by reference (near or far, depending on the memory
model) and all other data types by value. C uses far data pointers for compact, large, and
huge models, and near data pointers for small and medium models.
Arrays can be passed by value only if they are declared as the only member of a structure.
The following example passes all 100 bytes of x directly to the C function test():
Note: To pass a pointer to an object, prefix the parameter in the CALL statement with
&. To receive a pointer to an object, prefix the parameter's declaration with *. In the latter
case, this may mean adding a second * to a parameter that already has an *. For example,
to receive a pointer to a pointer, change the declaration
int *ptr;
to
int **ptr;
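The effect of the second * can be seen in a short sketch; pointAtByValue and pointAtByReference are hypothetical names chosen for illustration.

```cpp
// Receives a pointer by value: the function gets a copy of the
// pointer, so reassigning it does not affect the caller.
void pointAtByValue(int* ptr, int* target) {
    ptr = target;      // changes only the local copy
    (void)ptr;
}

// Receives a pointer to the pointer: the extra * lets the function
// change which object the caller's pointer refers to.
void pointAtByReference(int** ptr, int* target) {
    *ptr = target;
}
```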
Near reference is the default for passing pointers in small and medium model C. Far
reference is the default for the compact, large, and huge models.
Note All C programs that are linked with COBOL must be compiled with the large
memory model.
The COBOL to C interface does not support near heap in the C run time. This means you
should not use the function calls that access near heap in your C programs. This includes
the following functions:
• _nfree()
• _nheapchk()
• _nheapset()
• _nheapwalk()
• _nmalloc()
• _nmsize()
To work around this, compile and link with C as the initial program. After the main C
program begins, the COBOL routine can be CALLed. The COBOL code can then CALL
back and forth with C. Since the C support modules are not used, there are no special
restrictions on the near heap functions.
C stores strings as simple arrays of bytes (like COBOL) but also uses a null character
[ASCII NULL (0)] as the delimiter to show the end of the string. For example, consider
the string declared as follows:
char str[] = "String of text";
When passing a string from COBOL to C, the string will normally not have a NULL
appended to the end. Because of this, none of the C routines that deal directly with a
string (printf, sprintf, scanf, and so on) can be used with these strings unless a NULL is
appended to the end.
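On the C side, one fix is to copy the field into a buffer one byte longer and add the terminator. A minimal sketch, assuming the C routine knows the field length (a COBOL string carries no length or NULL); cobolToCString is a hypothetical helper name. Trailing spaces from the PIC X padding are kept.

```cpp
#include <cstddef>
#include <cstring>

// Copies a fixed-length COBOL PIC X field (not NULL-terminated) into
// out and appends the NULL that C string functions need.
// Returns false if out is too small.
bool cobolToCString(const char* field, std::size_t fieldLen,
                    char* out, std::size_t outSize) {
    if (outSize < fieldLen + 1) return false;
    std::memcpy(out, field, fieldLen);
    out[fieldLen] = '\0';    // the terminator COBOL did not supply
    return true;
}
```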
A NULL can be put at the end of a COBOL string by using the following declaration:
01 CSTRING.
05 THE_STRING PIC X(10).
05 FILLER PIC X VALUE x"00".
Several compile and link options need to be used when interfacing C and COBOL. The
standard C compile line is as follows:
CL /c /Aulf CProgName.c
Option Description
/c Compiles without linking (produces only an .OBJ file).
/Aulf Sets up a customized large memory model:
  u SS not equal to DS; DS is reloaded on function entry.
  l Far (32-bit) code pointers, as in large model.
  f Far (32-bit) data pointers, as in large model.
For MS-DOS®
For OS/2®
For DOS
For OS/2
Note that the order in which the libraries are specified in the LINK line is important.
Microsoft® COBOL versions 4.0 and 4.5 introduced the shared run-time system.
Although it is generally more useful to link your applications using the static run-time
system (LCOBOL.LIB), you may also choose to link the applications with the shared
run-time library (COBLIB.LIB) to take advantage of its more efficient methods of
utilizing memory. In order to do this and link your applications to Microsoft C, you must
SET the COBPOOL environment variable as referenced in the Microsoft COBOL
Operating Guide.
Common Pitfalls
This list supplies a simple checklist to go over when you encounter problems doing
mixed-language programming:
• Make certain the version numbers of the two languages are compatible. Microsoft
COBOL versions 4.0 and 4.5 are compatible with the C versions 5.1 and 6.x.
• Use the /NOD switch when LINKing to avoid duplicate definition errors. If
duplicate definition errors still occur, use the /NOE switch in addition to the
/NOD switch when LINKing.
• Make certain the C program is compiled in the large memory model and the /Aulf
compile options are used.
• If passing structures (records) to and from COBOL, use the /Zp1 compile option.
(/Zp1 means that structure members will be packed on one-byte boundaries.)
• When COBOL is the main module and there are some C functions that are not
working correctly, make the C routine the main routine and then CALL the
COBOL routine. The COBOL routine can then in turn CALL back into the C
routines. When this method is used, the COBOL/C support modules do not have
to be used. This can correct some incompatibilities.
Batch Files
The following batch files can be helpful when using the sample programs below. The
CBC6.BAT file sets up your environment table, but treat it as a convenience rather than a
necessity: ideally, these settings should already be in your environment (for example, in
your AUTOEXEC.BAT file) when you use both languages together.
CBC6.BAT
REM THIS BATCH FILE SHOULD CONFIGURE THE ENVIRONMENT TABLE TO ENABLE
REM YOU TO COMPILE BOTH THE C AND COBOL APPLICATIONS UNDER MS-DOS
REM CORRECTLY.
REM
REM PLEASE LOOK CLOSELY AT THE ENVIRONMENT SETTINGS AND CHANGE THOSE
REM NECESSARY IN YOUR OWN AUTOEXEC.BAT FILE.
REM
REM NOTE: IF, AFTER INVOKING THIS BATCH FILE, YOU SEE THE MESSAGE
REM "OUT OF ENVIRONMENT", YOU WILL HAVE TO INCREASE THE AMOUNT OF
REM ENVIRONMENT TABLE SPACE. PLEASE SEE YOUR MS-DOS MANUAL UNDER THE
REM HEADING COMMAND.COM FOR INSTRUCTIONS ON HOW TO DO THIS.
REM
SET LIB=C:\COBOL\LIB;C:\C600\LIB
SET INCLUDE=C:\C600\INCLUDE;C:\COBOL\SOURCE
SET HELPFILES=C:\C600\HELP\*.HLP
SET INIT=C:\C600\INIT;C:\COBOL\INIT
PATH=C:\COBOL\BINB;C:\COBOL\BINR;C:\C600\BINB;C:\C600\BIN;C:\DOS
RUN.BAT
REM THIS BATCH FILE CAN BE USED TO COMPILE AND LINK BOTH THE C AND
REM COBOL APPLICATIONS FOR MS-DOS.
REM
REM THOSE PROGRAMS THAT REQUIRE A DIFFERENT METHOD OF COMPILING AND
REM LINKING WITHIN THE SCOPE OF THIS APPLICATION NOTE WILL BE NOTED.
REM
REM TO INVOKE THIS BATCH FILE, YOU MUST ENTER THE BATCH FILE NAME,
REM FOLLOWED BY THE C PROGRAM NAME (WITH NO EXTENSION), FOLLOWED BY
REM THE COBOL PROGRAM NAME. FOR EXAMPLE:
REM
REM RUN <C PROGRAM NAME> <COBOL PROGRAM NAME>
REM
cl /c /Aulf %1.c
COBOL %2;
LINK %2 %1 MFC6INTF C6DOSIF C6DOSLB,,,LCOBOL COBAPI LLIBCER /NOD/NOE;
RUN_C.BAT
REM THIS BATCH FILE CAN BE USED TO COMPILE AND LINK UNDER MS-DOS
REM ONLY WHEN THE SAMPLE C CODE IS CALLING A COBOL PROCEDURE.
REM
REM THOSE PROGRAMS THAT REQUIRE A DIFFERENT METHOD OF COMPILING AND
REM LINKING WITHIN THE SCOPE OF THIS APPLICATION NOTE WILL BE NOTED.
REM
REM TO INVOKE THIS BATCH FILE, YOU MUST ENTER THE BATCH FILE NAME,
REM FOLLOWED BY THE C PROGRAM NAME (WITH NO EXTENSION), FOLLOWED BY
REM THE COBOL PROGRAM NAME. FOR EXAMPLE:
REM
REM RUN_C <C PROGRAM NAME> <COBOL PROGRAM NAME>
REM
cl /c /Aulf %1.c
COBOL %2;
LINK %1 %2,,,LLIBCER LCOBOL COBAPI/NOD/NOE;
WINRUN.BAT
REM THIS BATCH FILE IS USED TO COMPILE AND LINK THE QUICKWIN
REM APPLICATION PROGRAM DEMONSTRATED IN THIS DOCUMENT. THIS IS A
REM SPECIALIZED BATCH FILE. IT HAS BEEN CREATED SPECIFICALLY FOR THE
REM SAMPLE PROGRAM PRESENTED.
REM TO CREATE A GENERIC BATCH FILE, CHANGE ALL OCCURRENCES OF CDLL AND
REM TEST TO %1 AND %2 RESPECTIVELY.
REM
cl /ML /Gs /c /Zi CDLL.C
LINK CDLL+LIBENTRY,CDLL.DLL,CDLL.MAP/MAP,LDLLCEW+LIBW/NOE/NOD,CDLL/CO;
IMPLIB CDLL.LIB CDLL.DLL
COPY CDLL.DLL C:\
COBOL TEST TARGET(286);
LINK CBLWINC+TEST+ADIS+ADISINIT+ADISKEY,TEST.EXE,,LIBW+LLIBCEW+LCOBOL+COBAPIDW+CDLL.LIB,TEST.DEF/NOE/NOD;
Sample Code
The following sample code demonstrates how to pass common numeric types to a C
routine by reference and by value.
COBNUMS.CBL
CFUNC.C
#include <stdio.h>
void CFunc(int *RefInt, long *RefLong, int ValInt, long ValLong)
{
printf("By Reference: %i %li\r\n", *RefInt, *RefLong);
printf("By Value : %i %li\r\n", ValInt, ValLong);
*RefInt = 321;
*RefLong = 987654;
}
OUTPUT
The following sample code demonstrates how to pass an alphanumeric string from C to
COBOL.
_COBPROG.CBL
program-id. "_cobprog".
data division.
linkage section.
01 field1 pic x(6).
C.C
#include <stdio.h>
extern cdecl cobprog(char *Cptr);
char Cptr[] = "ABCDEF";
void main() {
cobprog(Cptr);
}
Output
The following sample code demonstrates how to pass a record from COBOL to a C data
struct.
STRUCT.CBL
procedure division.
call "C_FUNCTION1" using by reference rec-1.
display "CBL varC--> " varC1.
display "CBL varC--> " varC2.
display "CBL varC--> " varC3.
display "CBL varC--> " varC4.
display "CBL varC--> " varC5.
display "CBL var1--> " var1.
display "CBL var2--> " var2.
stop run.
STRUCTC.C
#include <stdio.h>
struct struct1 {
unsigned char var1[8];
unsigned char var2[12];
unsigned int var3[5];
};
OUTPUT
2
3
4
5
1
HELLO
W O R LD
CBL VARC--> 00001
CBL VARC--> 00002
CBL VARC--> 00003
CBL VARC--> 00004
CBL VARC--> 00005
CBL VAR1--> HELLO
CBL VAR2--> W O R LD
The following sample code demonstrates how to pass a struct from C to a COBOL
record.
_COBPROC.CBL
identification division.
environment division.
data division.
working-storage section.
01 Integer pic 9(4).
01 Long pic 9(8).
linkage section.
01 CobRec.
03 COBInt pic s9(4) comp-5 value zero.
03 COBLong pic s9(8) comp-5 value zero.
03 COBString pic x(21) value spaces.
procedure division using CobRec.
move COBInt to Integer.
move COBLong to Long.
display "Integer from C: " Integer.
display "Long integer from C: " Long.
display "String from C: " COBString.
exit program.
STRUCT2C.C
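#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>
struct CobRec // defines data type CobRec
{
unsigned int varInt; // integer variable
unsigned long varLong; // long int
char szString[21]; // string variable
};
/* COBOL routines are cdecl; this means the name must be prefixed
* with '_'. Alternatively, you can manually reverse the
* parameters.
*/
extern far cdecl COBPROC(struct CobRec *cPtr);
main()
{
struct CobRec *cPtr; // declare pointer to struct
/* Allocate and fill the record before passing it; an
 * uninitialized pointer would hand COBPROC a random address.
 * The values below are illustrative only.
 */
cPtr = (struct CobRec *) malloc(sizeof(struct CobRec));
if (cPtr == NULL)
return 1;
cPtr->varInt = 100;
cPtr->varLong = 100000L;
memset(cPtr->szString, ' ', sizeof(cPtr->szString)); // COBOL pads with spaces
memcpy(cPtr->szString, "Hello from C", strlen("Hello from C")); // no NULL
printf("\n\n\n");
COBPROC ( cPtr);
free(cPtr);
return 0;
}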
OUTPUT
The following sample code demonstrates how to pass an array of integers from COBOL
to C.
INTARRAY.CBL
#include <stdio.h>
void CProc(int IntTable[5]) {
int count;
for (count = 0; count < 5; count++)
printf("Array [%d]: %d\n", count, IntTable[count]);
}
OUTPUT
Array [0]: 1
Array [1]: 2
Array [2]: 3
Array [3]: 4
Array [4]: 5
The following sample code demonstrates how to pass a two-dimensional array of long
integers from COBOL to C.
LINT.CBL
procedure division.
perform varying I1 from 1 by 1 until I1 > 2
perform varying J1 from 1 by 1 until J1 > 3
move J1 to the-table(I1, J1)
end-perform
end-perform.
call "_CProc" using t-table.
stop run.
LINTC.C
#include <stdio.h>
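void CProc(long IntTable[2][3]) {
int i, j;
for (i = 0; i < 2; i++)
for (j = 0; j < 3; j++)
printf("Array [%d,%d]: %ld\n", i, j, IntTable[i][j]);
}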
OUTPUT
Array [0,0]: 1
Array [0,1]: 2
Array [0,2]: 3
Array [1,0]: 1
Array [1,1]: 2
Array [1,2]: 3
The following sample code demonstrates how to pass a two-dimensional array of records
from C to COBOL.
_COBPROC.CBL
program-id. "_CobProc".
data division.
working-storage section.
01 I1 pic 9.
01 J1 pic 9.
linkage section.
01 the-table.
02 t-table occurs 2 times.
05 t-field occurs 3 times.
10 field1 pic 9(4) comp-5.
10 field2 pic x(6).
2DRECS.C
#include <stdio.h>
void main() {
int i, j;
OUTPUT
table[1][1]: 00000 [0][0]
table[1][2]: 00001 [0][1]
table[1][3]: 00002 [0][2]
table[2][1]: 00000 [1][0]
table[2][2]: 00001 [1][1]
table[2][3]: 00002 [1][2]
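The OUTPUT above appears to pair each COBOL subscript, such as table[1][1], with the matching C index, such as [0][0]. COBOL subscripts start at 1 and C indices at 0, and both languages store a nested table row by row. The mapping can be sketched as follows, assuming a 2-by-3 table of long values (a simplification of the records in 2DRECS.C); cobolElement and cobolOffset are hypothetical names.

```cpp
#include <cstddef>

constexpr std::size_t kRows = 2;   // COBOL: occurs 2 times
constexpr std::size_t kCols = 3;   // COBOL: occurs 3 times

// Returns the C element that corresponds to COBOL element (i, j),
// where i and j are 1-based COBOL subscripts.
inline long& cobolElement(long (&table)[kRows][kCols],
                          std::size_t i, std::size_t j) {
    return table[i - 1][j - 1];
}

// Position of COBOL element (i, j), counted in elements from the
// start of the table, given row-by-row storage.
constexpr std::size_t cobolOffset(std::size_t i, std::size_t j) {
    return (i - 1) * kCols + (j - 1);
}
```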
The following sample code demonstrates how to pass integers by reference from COBOL
to C.
COBINT.CBL
CINT.C
tmp = *var1;
*var1 = *var2;
*var2 = tmp;
return;
}
OUTPUT
The following sample code demonstrates how to pass an integer from COBOL to C.
CBLINT.CBL
working-storage section.
01 pass-var pic 9(4) comp-5 value 3.
procedure division.
call "_Circum" using by value pass-var.
display "Radius of circle: " pass-var.
display "Circumference of circle: " return-code.
stop run.
C.C
#include <stdio.h>
OUTPUT
The following sample code demonstrates passing a long integer from COBOL to C.
LINT.CBL
$set rtncode-size(2)
working-storage section.
01 pass-var pic 9(4) comp-5.
procedure division.
display "Radius of circle?".
accept pass-var.
call "_Area" using by value pass-var.
display "Area of circle: " return-code.
stop run.
LINTC.C
#include <stdio.h>
long Area(int Radius) {
float cir;
cir = 3.14159 * Radius * Radius;
return((long) cir);
}
OUTPUT
Radius of circle?
1
Area of circle: +0003
The following sample code demonstrates how to pass a string from COBOL to C.
COBSTR.CBL
CSTR.C
#include <ctype.h>
OUTPUT
This is what is passed: Replace this
This is what comes back: REPLACE THIS
The following samples demonstrate how to call a C 6.x routine from a COBOL 4.5
program, where the C function, in turn, spawns another COBOL 4.5 executable.
Note: The COBOL program titled COB2.CBL must be compiled and linked as a stand-
alone executable module. Use the following lines to compile and link this program:
COBOL COB2;
LINK COB2,,,LCOBOL COBAPI/NOE/NOD;
MAIN.CBL
PCEXEC.C
#include <stdio.h>
#include <process.h>
pcexec (commandL)
char far commandL[125];
{
printf ("Prior to C call of COB2.EXE \n");
spawnl (P_WAIT, "COB2.EXE", "COB2", "spawnl", NULL);
printf("After C call to COB2.EXE \n");
}
COB2.CBL
OUTPUT
In COBOL program 1
Prior to C call of COB2.EXE
Inside COBOL program 2
After C call to COB2.EXE
End of COBOL program 1
The following samples demonstrate how a COBOL 4.5 Quickwin application can call a
Windows™-based DLL written in C 6.x.
MAIN.CBL
working-storage section.
77 Var1 pic 9(4) comp-5.
77 Char pic x.
procedure division.
move 1 to Var1.
display "Prior to DLL call: " at 0101
display Var1 at 0120.
call 'cdll' using by reference Var1.
display "After DLL call: " at 0201.
display Var1 at 0217.
call "cbl_read_kbd_char" using Char.
stop run.
MAIN.DEF
CDLL.C
#include <windows.h>
int FAR PASCAL LibMain(HANDLE hInstance,
WORD wDataSeg,
WORD cbHeapSize,
LPSTR lpszCmdLine)
{
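// Additional DLL initialization goes here.
// LocalInit, called from the DLL startup code, leaves the data
// segment locked; unlock it if the DLL has a local heap.
if (cbHeapSize != 0)
UnlockData(0);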
return (1);
}
CDLL.DEF
LIBRARY cdll
DESCRIPTION 'C DLL FOR WINDOWS 3.0'
EXETYPE WINDOWS
STUB 'WINSTUB.EXE'
CODE PRELOAD MOVEABLE DISCARDABLE
DATA PRELOAD MOVEABLE SINGLE
HEAPSIZE 0
EXPORTS Cdll @1
WEP @2 RESIDENTNAME
Abstract
This article explains how the Microsoft® overlay virtual environment (MOVE) helps
overcome memory limitations for programs that run in the MS-DOS® operating system.
The article compares MOVE technology to conventional overlays and to paged virtual
memory systems, and explains the basics of the technology.
Introduction
Along with death and taxes, all programmers eventually share another misery:
insufficient memory. Since the beginning of their profession, programmers have needed
to cram too-big programs into too-little random-access memory (RAM). Programmers for
MS-DOS® are further restricted by the infamous 640K limit; a program running on a 4
MB computer, for example, can directly execute only in the first 640K of RAM. Many
techniques have been employed to overcome this limitation: optimizing compilers,
interpreters, MS-DOS extenders, and so on. The most commonly used technique,
overlays, is also one of the most cumbersome to use. The new Microsoft overlay virtual
environment (MOVE) is a significant advance over previous overlay methods. MOVE is
both easier to use and more effective than conventional overlay systems.
In many ways, the MOVE technology combines the benefits of overlays and virtual
memory. Some of the advantages of MOVE over conventional overlays are:
• The MOVE system keeps multiple overlays in memory at the same time. This
makes devising efficient overlay structures much easier.
• MOVE supports pointers to functions. You do not need to modify your source
code.
• The memory allocated for overlays can be set at program startup. Your program
can adapt to different memory situations.
The MOVE technology can be used only in MS-DOS operating system programs.
Programs in the Microsoft Windows™ graphical environment automatically take
advantage of a similar mechanism built into Windows.
The next three sections cover the basics of conventional overlays and virtual memory. If
you're already familiar with these concepts, you can skip ahead to "MOVE Basics."
Overlay Basics
If you're not using overlays or other techniques, your program size cannot exceed
available memory. When loading your program, MS-DOS copies the program's code and
data segments into memory, starting at the first available memory location and continuing
to the end of the program (see Figure 1).
When you use overlays, the linker automatically includes a routine called the overlay
manager in your program's EXE file. When the program calls a function located in
another overlay, the overlay manager loads the necessary overlay into memory,
overwriting the previous overlay (Figure 2).
This way, a program can be many times larger than available memory; it only needs
sufficient memory to hold the root and the largest overlay. In some overlay systems the
overlays are included within the EXE file, whereas in others the overlays are separate
files, usually with the OVL extension. You need not keep track of which overlay is in
memory or which function is in which overlay; the overlay manager automatically
handles loading the appropriate overlay when necessary.
Well, if overlays sound too good to be true, you're right; they have some drawbacks. They
slow your program down, sometimes considerably. All that reloading of overlays from
the disk can gum up the works. Reading an instruction from an overlay on the disk can be
several thousand times slower than reading the instruction from an already-loaded
overlay, so the speed of your program depends heavily on how the overlays are
structured. Ideal candidates for overlays are functions that are called only once during a
program's execution, like initialization or error-handling routines. Routines that are used
together should be grouped into the same overlay so that multiple overlays needn't be
loaded to accomplish a task. The worst situation is caused by a tight inner loop calling
routines in two different overlays. In cases like this, the computer spends more time
loading overlays from disk than executing instructions. This phenomenon, called
thrashing, is accompanied by grinding from your user's hard disk and groaning from your
users.
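A toy model makes the thrashing effect concrete. The C sketch below is a simplification, not a real overlay manager. It counts disk loads for a conventional overlay system that has room for only one overlay at a time. The function name and the encoding of a call sequence as an array of overlay numbers are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a conventional overlay manager with a single overlay area:
 * a call into a different overlay overwrites the resident one and costs a
 * disk load. Overlay numbers are positive; 0 means the area is empty. */
int count_conventional_loads(const int *calls, size_t ncalls)
{
    int resident = 0;   /* overlay currently in the overlay area */
    int loads = 0;      /* number of disk reads */
    size_t i;

    for (i = 0; i < ncalls; i++) {
        assert(calls[i] > 0);
        if (calls[i] != resident) {
            resident = calls[i];   /* overwrite the previous overlay */
            loads++;
        }
    }
    return loads;
}
```

In a loop that alternates between routines in overlays 1 and 2, eight calls cost eight disk loads. If both routines are grouped into overlay 1, the same eight calls cost a single load.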
Root      Overlays
MAIN.C    1: DATABASE.C
          2: DATAFORM.C
          3: DATEUTIL.C
          4: INIT.C
          5: PRINTER.C
          6: STRUTIL.C
Root          Overlays
MAIN.C        1: DATABASE.C (except DatabaseInit)
DATEUTIL.C    2: DATAFORM.C
STRUTIL.C     3: INIT.C
                 (plus DatabaseInit from DATABASE.C)
                 (plus PrinterInit from PRINTER.C)
              PRINTER.C (except PrinterInit)
Producing a good overlay structure requires lengthy and tedious trial-and-error work. As
new capabilities are added to your program, the structure quickly becomes obsolete.
Programmers working on a large system that contains hundreds of source files and
thousands of functions often spend as much time tuning the overlay structure as they do
writing code.
All addresses used in a VM program are virtual addresses. The computer's virtual
memory manager maps virtual page addresses to the physical addresses of memory.
When a program needs a virtual memory page that is not mapped to a physical page in
memory, the virtual memory manager copies the contents of that page from disk to a page
of physical memory. The operating system maps the virtual address of the page to the
physical address of the page's contents. This way, when the program reads from a
particular virtual address, the computer's VM mapping scheme ensures that the program
reads from the appropriate physical page. Physical memory need not hold all of a
program's pages at once. The more physical pages available, the less disk activity is
needed and the faster the program runs. The operating system's VM manager handles
loading pages from the disk, swapping modified pages to the disk, and translating virtual
addresses to physical addresses.
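The translation step can be illustrated with a small C model of a page table. Everything here is an assumption for illustration: 4K pages, a 16-page virtual address space, and a "disk load" that simply assigns the next free physical frame. A real VM manager also evicts pages when physical memory runs out; this sketch omits eviction.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumed 4K pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                 /* assumed tiny virtual address space */
#define NOT_MAPPED (-1)

static int page_table[NUM_PAGES];      /* virtual page -> physical frame */
static int next_free_frame;
static int page_faults;

/* Mark every virtual page as "on disk" (not mapped to a frame). */
void vm_init(void)
{
    unsigned i;

    for (i = 0; i < NUM_PAGES; i++)
        page_table[i] = NOT_MAPPED;
    next_free_frame = 0;
    page_faults = 0;
}

int vm_fault_count(void)
{
    return page_faults;
}

/* Translate a virtual address into a physical address. */
uint32_t vm_translate(uint32_t vaddr)
{
    uint32_t vpage  = vaddr >> PAGE_SHIFT;        /* which virtual page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* position within the page */

    assert(vpage < NUM_PAGES);
    if (page_table[vpage] == NOT_MAPPED) {
        /* Page fault: a real manager would copy the page from disk here.
           This sketch just hands out the next free frame (no eviction). */
        page_faults++;
        page_table[vpage] = next_free_frame++;
    }
    return ((uint32_t) page_table[vpage] << PAGE_SHIFT) | offset;
}
```

Only the first access to a page faults. Later accesses to any address in that page are translated through the table without disk activity.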
Virtual memory has several advantages over overlays. First, it does not require
programmer effort and eliminates the tedious process of creating overlay structures.
Second, the program performs efficiently regardless of the amount of memory the user's
computer contains. Most of the program's execution time is spent in a small fraction of
the code. As the program executes, pages containing this core code replace pages with
less critical code. The set of pages that make up the often-used code is called the
program's working set. If the working set can fit in the computer's physical memory, the
program executes efficiently and swaps pages only occasionally for infrequently used
routines. If the working set cannot fit in the computer's memory, the computer thrashes,
spending more time loading code from the disk than executing the program.
Of course, VM is no panacea either. First, the virtual memory manager and the address
translation scheme must be part of the computer hardware. The more powerful members
of the Intel® CPU family, particularly the 80386 and higher, support paged address translation.
Less powerful CPUs, however, do not support this feature. Second, the virtual memory
manager must realistically be an integral part of the operating system. MS-DOS does not
support virtual memory.
MOVE Basics
Microsoft's new MOVE overlay technology has the best of both the overlay and virtual
memory worlds. MOVE is an overlay system but has significant advantages over
conventional overlays. Unlike conventional overlays, MOVE allows more than one
overlay to reside in memory simultaneously. Like virtual memory, the MOVE memory
manager keeps resident as many overlays as will fit. Each overlay need not fully cover a
single task; two or three overlays can cooperate to complete the task. When loading a
new overlay, MOVE discards the least recently used (LRU) overlay. If there is still
insufficient room for the new overlay, MOVE discards the next least recently used
overlay, and so on.
With MOVE you can make your overlays smaller and more modular, letting the LRU
algorithm determine which overlays stay in memory. Some of your overlays may remain
in memory because they are needed for the normal operation of the program. This
working set of overlays is similar to the working set of pages in a virtual memory system.
Like virtual memory, MOVE programs naturally configure themselves for efficient
operation on a given computer. Unlike virtual memory, however, you are not limited to
fixed-size pages; you can group functions for better control. For example, if function A is
called each time function B is called and only when function B is called, you can group A
and B in the same overlay to save the disk time of loading them separately.
MOVE Mechanics
You don't need to modify your C source code to create a MOVE application, but you do
need to modify your CL and LINK command lines. These changes are described in the
"Creating Overlaid Programs" section.
Like a nonoverlaid program, a MOVE application has a single EXE file. The EXE file
contains the root and all overlays. The file also contains the overlay manager routines
(about 5K), which are automatically added by the linker. When a MOVE application is
launched, the program's startup routine allocates a memory area to store the overlays.
This area, called the overlay heap, is distinct from the regular heap used for malloc.
When your application calls a function in an overlay that is not currently loaded in RAM,
the MOVE manager must read the overlay from disk and copy its contents to the overlay
heap before program execution can continue. If the heap does not have enough free space
to hold the requested overlay, the MOVE manager discards one or more of the currently
resident overlays. The least recently used overlay is discarded first. Because overlays can
vary in size, the MOVE manager may have to discard multiple overlays to make
sufficient room for the requested overlay.
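The discard loop described above can be modeled in a few lines of C. This is a toy simulation under stated assumptions, not MOVE's implementation. Sizes are in bytes, at most 16 overlays exist, and a per-heap call counter stands in for the time of last use. All names are hypothetical. When free space is too small, the sketch repeatedly discards the resident overlay with the oldest last-use value until the requested overlay fits.

```c
#include <assert.h>

#define MAX_OVERLAYS 16   /* assumed upper bound for this sketch */

/* Toy model of an LRU overlay heap; not the real MOVE manager. */
struct ovl_heap {
    unsigned capacity;                    /* heap size, in bytes */
    unsigned used;                        /* bytes held by resident overlays */
    unsigned size[MAX_OVERLAYS];          /* size of each overlay */
    int resident[MAX_OVERLAYS];           /* nonzero if the overlay is loaded */
    unsigned long last_use[MAX_OVERLAYS]; /* counter value at last call */
    unsigned long clock;                  /* incremented on every call */
    int disk_loads;
    int discards;
};

/* Discard the least recently used resident overlay. */
static void discard_lru(struct ovl_heap *h)
{
    int i, victim = -1;

    for (i = 0; i < MAX_OVERLAYS; i++)
        if (h->resident[i] &&
            (victim < 0 || h->last_use[i] < h->last_use[victim]))
            victim = i;
    assert(victim >= 0);
    h->resident[victim] = 0;
    h->used -= h->size[victim];
    h->discards++;
}

/* Simulate a call into overlay n. */
void ovl_call(struct ovl_heap *h, int n)
{
    assert(n >= 0 && n < MAX_OVERLAYS);
    assert(h->size[n] <= h->capacity);    /* heap must hold the largest overlay */
    if (!h->resident[n]) {
        while (h->capacity - h->used < h->size[n])
            discard_lru(h);               /* may discard several overlays */
        h->resident[n] = 1;
        h->used += h->size[n];
        h->disk_loads++;
    }
    h->last_use[n] = ++h->clock;
}
```

With a 10-byte heap holding three 3-byte overlays, a call to a 9-byte overlay discards all three, oldest first, before the new overlay is loaded.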
If your program is running on a computer with EMS or XMS memory, the MOVE
manager can create an overlay cache for copying discarded overlays. The program cannot
execute overlays directly from this cache because the cache resides above the 640K limit.
If a discarded overlay is needed again, the manager copies it from the overlay cache to
the overlay heap rather than reading it from the disk. Because reading from the cache is
much faster than reading from the disk, the space for your working set is effectively the
cache size plus the heap size. The overlay manager routines maintain the overlay cache
with an LRU algorithm in a manner similar to the overlay heap.
At program startup, the MOVE manager attempts to allocate an overlay heap equal to the
sum of the program's three largest overlays. If space is insufficient or the program has
fewer than four overlays, MOVE allocates a heap the size of the largest overlay. The
remaining free memory is retained for the conventional (malloc) heap. (This is the
default initialization behavior; you can replace it with another scheme.)
If the program is running on a computer with EMS or XMS memory, the MOVE manager
attempts to allocate an overlay cache three times the size of the overlay heap. If there is
not enough memory for a cache this size, all available EMS or XMS memory is used.
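The default sizing rules described above can be sketched in C. This mirrors the prose only; it is not the MOVEINIT.C source. The function names are hypothetical, and all sizes are assumed to be in one common unit (paragraphs, for instance). The case where even the largest overlay does not fit, which ends the program with a run-time error, is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Heap request: the sum of the three largest overlays, or the size of the
 * largest overlay if there are fewer than four overlays or `avail` (free
 * conventional memory) cannot hold the sum. */
unsigned requested_heap(const unsigned *ovl_size, size_t novl, unsigned avail)
{
    unsigned first = 0, second = 0, third = 0, sum3;
    size_t i;

    assert(novl > 0);
    for (i = 0; i < novl; i++) {          /* track the three largest sizes */
        unsigned s = ovl_size[i];
        if (s > first) { third = second; second = first; first = s; }
        else if (s > second) { third = second; second = s; }
        else if (s > third) { third = s; }
    }
    sum3 = first + second + third;
    if (novl < 4 || sum3 > avail)
        return first;                     /* fall back to the largest overlay */
    return sum3;
}

/* Cache request: three times the heap, or all available EMS/XMS if less. */
unsigned long requested_cache(unsigned heap, unsigned long ems_xms_avail)
{
    unsigned long want = 3UL * heap;

    return want <= ems_xms_avail ? want : ems_xms_avail;
}
```

For overlays of 100, 400, 200, 300, and 50 units with ample memory, the heap request is 900 and the cache request is 2700.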
When the MOVE manager discards an overlay from the heap, it does not copy the
overlay to the cache if a copy of the overlay is already in the cache.
Individual overlays can be up to 64K in size but are usually much smaller. Overlays can
be individual OBJ files, as in a conventional overlay system, or they may contain a list of
functions. With large overlays, your program's performance will suffer the problems
associated with conventional overlays. Your overlays should be large enough to justify
the time it takes to load them from disk. Specifics vary depending on your program, and
experimentation will help you find the optimal overlay size and organization. For most
programs, an optimal overlay size is about 4K.
If your overlaid program temporarily needs the EMS or XMS memory occupied by the
cache, you can use the MOVE application programming interface (API) _movepause
function to release the cache memory and _moveresume to restore the cache. This is
particularly useful if your program spawns another program that needs EMS or XMS
memory to function. The MOVE API functions are described in Appendix A.
EXETYPE DOS
FUNCTIONS:1 _database
FUNCTIONS:2 _dataform
FUNCTIONS:3 _init
FUNCTIONS:3 _printer
For more information on the syntax of DEF files, see "Creating Overlaid MS-DOS
Programs" and "Creating Module Definition Files" in the C/C++ Environment and Tools
manual.
MOVE gives you control over the placement of individual functions. Instead of moving a
function's source code physically to another file, you specify the function in a
FUNCTIONS statement in your application's DEF file. A function can be specified in
this way only if it is a packaged function. Functions can be packaged by specifying the
/Gy switch during compilation. For more information on packaging functions, see "CL
Command Reference" and "Creating Overlaid MS-DOS Programs" in the C/C++
Environment and Tools manual.
You can modify some of the characteristics of the MOVE manager. For example, you can
change the amount of memory MOVE allocates for the overlay heap and cache by
changing the constants and heuristics in the MOVEINIT.C file. For more information, see
"Creating Overlaid MS-DOS Programs" in the C/C++ Environment and Tools manual.
Appendix A: The MOVE API
The MOVE API is provided in a library called MOVE.LIB. This library is a component
of the C combined libraries for medium and large models. (Another form of the library,
MOVETR.LIB, also contains the MOVE API; see Appendix C.) The MOVE API is
declared in the MOVEAPI.H file, which is available on disk. This appendix describes
MOVE routines and functionality.
MOVE begins an overlaid program with a call to _moveinit, which calculates the heap
and cache needed for the overlays and allocates memory for the heap and cache.
You can use the default _moveinit function provided in MOVE.LIB, or you can write
your own version of _moveinit and link it to your program. The source code for the
default _moveinit function is available in the MOVEINIT.C file.
The _moveinit call occurs before the call to _astart that begins a C program and
performs initialization. For this reason, do not call C run-time routines from any version
of _moveinit.
• _movesetheap
• _movegetcache
• _movesetcache
The functions are described in the sections below. In addition, LINK creates several
variables that begin with $$; these variables are described in the "LINK Variables"
section.
Heap Allocation
where:
maxovl is the maximum number of overlays. The $$COVL variable always contains
this value.
minheap is the minimum heap size, specified in 16-byte paragraphs. The heap must
be at least the size of the largest overlay. To calculate overlay sizes, use
$$MPOVLSIZE as in MOVEINIT.C.
reqheap is the requested heap size, specified in 16-byte paragraphs. The default
_moveinit function requests the sum of the sizes of the three largest
overlays.
MOVE attempts to allocate the requested amount of memory. If that much memory is not
available, MOVE tries to allocate as much as possible. If the amount of available memory
is less than the minimum heap requested, MOVE ends the program and issues a run-time
error.
Cache Allocation
The _movegetcache function determines the amount of memory available for a cache.
where:
The _movesetcache function allocates expanded and extended memory for an overlay
cache.
where:
The _movesetcache function sets the following global variables when the overlay cache
is allocated:
The _moveckbxms variable is set to the size of the allocated extended memory. The
_moveckbems variable is set to the size of the allocated expanded memory.
You can temporarily release and then restore the memory allocated for the overlay cache.
This is useful when your program spawns another program that uses extended or
expanded memory or when you want to prepare for a possible abnormal exit from your
program.
The _movepause function frees the cache memory and closes the executable file.
The _moveresume function reallocates memory for the overlay cache and reopens the
executable file.
The _movepause function reads the value in _movefpause and sets _movefpaused to reflect
the action that _movepause took. Before you call _movepause, set the __MOVE_PAUSE_DISK
flag in _movefpause to close the file, the __MOVE_PAUSE_CACHE flag to free the cache, or
both, as in:
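_movefpause |= __MOVE_PAUSE_DISK;    /* request: close the executable file */
_movefpause |= __MOVE_PAUSE_CACHE;   /* request: free the overlay cache */
_movepause();                        /* carry out the requested actions */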
The _moveresume function reads the value in _movefpaused and then clears
_movefpaused. The overlays that were in the heap and cache are not restored. Therefore,
after a call to _moveresume, the program may at first run slightly more slowly as it
makes calls to routines in overlays.
LINK Variables
To use these variables, set them to strings that represent the desired settings. Each string
must consist of exactly four hexadecimal digits.
Create a tracing version of your program as described in the following sections. When
you run your program, the tracing functions create a binary file called MOVE.TRC in the
directory from which the program is run. After your program ends, use TRACE to read
MOVE.TRC. If the tracing results indicate that some functions cause overlays to be
swapped frequently, you can reorganize the functions in the overlays by using statements
in the module definition file.
By default, tracing is in effect during the entire run of your program. You do not need to
make any changes in your program to enable tracing. However, MOVETR.LIB provides
two functions that you can use to turn tracing on and off within your program.
This function opens the MOVE.TRC file and activates tracing. During tracing,
information about overlay behavior is written to MOVE.TRC. The default _moveinit
function calls _movetraceon at the start of the program if MOVE_PROF is defined; this
definition is in MOVETR.LIB.
The tracing functions are declared in MOVEAPI.H. They are defined only in
MOVETR.LIB.
Running TRACE
The tracefile is the MOVE.TRC file created during a tracing session. You can specify a
path with the filename. If tracefile is not specified, TRACE looks in the current directory
for a file called MOVE.TRC.
An option is preceded by an option specifier, either a forward slash (/) or a dash (–).
Options are not case sensitive. An option can be abbreviated to its initial letter. Options
can appear anywhere on the command line.
TRACE Output
TRACE displays information on the tracing session to the standard output device. You
can use the redirection operator (>) to save the output in a file. The output is in table
format. Each line of output represents an interoverlay transaction. A line of information is
organized into the following fields:
• The overlay to which to return from the current transaction. (If blank, the overlay
in the previous line is implied.)
• The physical return address in segment:offset form. (If blank, the address in the
previous line is implied.)
• Invalid
• The overlay that is the object of the transaction.
• Return.
When you run TRACE with the /SUM option, TRACE displays a summary of overlay
performance to the standard output device. The full session is not displayed. You can use
the redirection operator (>) to save the output in a file. The summary information is
organized into the following fields.
OVERALL
HEAP
CACHE
TRACE Errors
The string specified with the /EXE option was not a valid filename.
The /EXE option must be followed by a colon and a filename, with no spaces in between.
TR1007 Unrecognized option
The command line contained an option specifier, either a forward slash (/) or a dash (–),
followed by a string that was not recognized as a TRACE option.
• A trace file was specified on the command line, but the specified file does not
exist.
• No trace file was specified on the command line and TRACE assumed a trace file
called MOVE.TRC, but MOVE.TRC does not exist.
TR1011 Error opening/reading .EXE file
TRACE either failed to find the executable file specified with /EXE or encountered an
error while opening the file.
TR1012 Out of memory
The available memory is insufficient for the size of the program being traced.
The debugging information contained in the executable file was not packed using
CVPACK version 4.0.
TRACE could not find a function name to display. TRACE continues to generate output
without displaying the function name.
Function names are displayed when the /EXE option is specified. Either the executable
file contains corrupt debugging information or a module in the executable file was
compiled without the /Zi option for including debugging information.
TRACE could not find a symbol to correspond to a given physical address. A module
may have been compiled without the /Zi option for including debugging information.
Mr. Rogerson is widely known for having reported the largest number of duckbilled
platypus sightings in the greater Seattle area.
The Zusammen sample application accompanies this technical article.
Abstract
One of the key issues in the development and design of commercial applications is
optimization—how to make an application run quickly while taking up as little memory
as possible. Although optimization is a goal for all applications, the Microsoft®
Windows™ graphical environment presents some unique challenges. This article
provides tips and techniques for using the Microsoft C version 6.0 and C/C++ version 7.0
compilers to optimize applications for Windows. It discusses the following optimization
techniques:
• If your application runs in real mode, always optimize for size. Memory is the
limiting resource in real mode; an application that uses too much memory runs more
slowly and leaves less memory for everything else.
• Memory is not as scarce in protected mode (that is, in standard and enhanced
modes) as it is in real mode, so you must decide whether to optimize for speed or
for size. However, as users start running multiple programs simultaneously,
memory becomes scarce. The rule of thumb for both Windows and other
operating environments is to optimize for speed the 10 percent of the code that runs 90
percent of the time. Tools such as the Microsoft Source Code Profiler help
determine where optimizations should be made.
Note The Microsoft C version 6.0 compiler prefixes most function modifiers with a
single underscore (_), for example, _loadds, _export, _near, _far, _pascal, and
_cdecl. The Microsoft C/C++ version 7.0 compiler uses two underscores (__) for ANSI
C compatibility but recognizes the single underscore for backward compatibility. This
article uses C version 6.0 compiler syntax except when discussing features available only
in C/C++ version 7.0.
The make files for Zusammen and Picker are combined for simplicity. All functions are
classified as local, global, entry point, or DLL entry point and declared with an
appropriate #define statement, for example:
For demonstration purposes, the symbols are defined in the make files. Using symbols
facilitates switching memory models and optimizing applications. You can also port
applications to flat-model environments easily by using #define NEAR and #define FAR
(from WINDOWS.H) instead of __near and __far. Some possibilities are:
or:
The Solution
Tables 1 through 3 show options recommended for general use. These options can be
used as defaults in make files because they do not require changes to the source code to
compile correctly. Each table shows the options for building an application and a DLL
and differentiates between the debugging (development) phase and the released product.
The options in Table 1 apply to applications or libraries that run in real mode; the options
in Tables 2 and 3 apply to applications or libraries that run only in protected mode. Table
3 is for C/C++ version 7.0 use only.
The developer must choose either the /Ot option to optimize for speed (time) or the /Os
option to optimize for size. The C version 6.0 compiler defaults to /Ot. The C/C++
version 7.0 compiler defaults to /Od, which disables all optimizations and enables fast
compiling (/f).
The /Oa and /Ow options do not appear in the tables; both options assume no aliasing
and require that the C source meet certain conditions to work properly. These two options
are discussed in the "Aliasing and Windows" section. In general, use /Ow instead of /Oa
for Windows-based applications. You can turn the no-aliasing assumption on and off
using #pragma optimize with the a or w switch.
Another option that is not included in the tables is the optimized prolog/epilog option
/GW. In C version 6.0, this option generates code that does not work in real mode; it is
fixed in C/C++ version 7.0. For backward compatibility, the C/C++ version 7.0 /Gq
option generates the same prolog/epilog as the C version 6.0 /GW switch. Although the
fixed /GW option results in a smaller prolog for non-entry-point functions, better
optimizations are available for protected-mode applications, as discussed in the next
section.
Table 1. Compiler Options for Real Mode (C 6.0 and C/C++ 7.0)
If your application runs only in protected mode, you can use the additional optimization
options shown in the second row of Table 2. Make1 demonstrates the use of these
options, which are safe for all modules in a protected-mode application.
You can realize additional savings in space and time by compiling modules without entry
points separately from those with entry points. Use the options in the third row of Table 2
for modules without entry points. Make2 demonstrates the use of both sets of options.
The Zusammen sample application is already set up with far calls and entry points in
separate C files. This application should run only in protected mode, so you should
compile with the resource compiler (RC) /T option to ensure that the application never
runs in real mode.
DLLs can benefit from the techniques presented in the "Optimized DLL Prolog and
Epilog" section. These techniques work with both C version 6.0 and C/C++ version 7.0.
Table 2. Compiler Options for Protected Mode Only (C 6.0 and C/C++ 7.0)
The C/C++ version 7.0 compiler includes special optimizations for protected-mode
Windows programs (see Table 3): /GA (for applications), /GD (for DLLs), and /GEx (to
customize the prolog). These options reduce the overhead that the prolog/epilog code
causes. The /GA and /GD options add the
prolog and epilog code only to far functions marked with __export instead of compiling
all far functions with the extra code. With __export, entry points need not be placed in a
separate file as required by C version 6.0.
Applications that do not mark far functions with __export can use the /GA /GEf or /GD
/GEf options to generate the prolog/epilog code for all far functions. /GEe causes the
compiler to export the functions by emitting a linker EXPDEF record. By default, /GD
emits the EXPDEF record but /GA does not. Applications compiled with /GA usually do
not need the EXPDEF record. Only real-mode applications need /GEr and /GEm;
protected-mode applications have no use for these options. The following options
generate equivalent prolog/epilog code:
The /Oi option replaces often-used C library functions with equivalent inline versions.
This replacement saves time by removing the function overhead but increases program
size because it expands the functions.
In C version 6.0, the /Oi option is not recommended for general use because it causes
bugs in some situations, especially when DS != SS. Using #pragma intrinsic to
selectively optimize functions reduces the chance of encountering a bug.
The ZUSAMMEN.C module of the sample application demonstrates the use of #pragma
intrinsic. Although this particular use does not drastically increase program speed, it does
demonstrate the right ideas: it speeds up WM_PAINT processing and is applied to a
function that is called three times per WM_PAINT message. The best savings occur when
the intrinsic function is in a loop or is called frequently.
The /Zp option controls storage allocation for structures and structure members. To save
as much memory as possible, Windows packs all structures on a 1-byte boundary.
Although this saves memory, it can result in performance degradation. Intel® processors
work more efficiently when word-sized data is placed in even addresses. An application
must pack Windows structures to communicate successfully with Windows, but it need
not pack its own structures. Because Windows structures are prevalent, it is better to
compile with the /Zp option and use #pragma pack on internal data structures. Passing
an improperly packed structure to Windows can lead to problems that are difficult to
debug. Both Zusammen and Picker use #pragma pack on their internal data structures.
(See the FRAME.H, APP.H, and PACK_DLL.H modules.)
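The following C sketch illustrates the layout difference. In the article's scheme, the module is compiled with /Zp and #pragma pack restores word alignment for internal structures. Here #pragma pack(1) stands in for /Zp so that the example does not depend on compiler switches. The structure and function names are hypothetical. Microsoft compilers accept #pragma pack, and so do GCC and Clang. The offsets noted in the comments assume a 2-byte short.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for compiling with /Zp: members packed on 1-byte boundaries,
   the layout Windows expects for its own structures. */
#pragma pack(1)
struct win_style {
    char  flag;
    short value;        /* starts at offset 1, an odd address */
};

/* Internal structure: word-align members for faster access on Intel CPUs. */
#pragma pack(2)
struct internal {
    char  flag;
    short value;        /* padded to start at offset 2 */
};
#pragma pack()          /* restore the default packing */

/* Report the member offsets so the two layouts can be compared. */
size_t win_style_value_offset(void) { return offsetof(struct win_style, value); }
size_t internal_value_offset(void)  { return offsetof(struct internal, value); }
```

The packed structure saves a byte, but its short member sits at an odd address. The internal structure spends one byte of padding to keep the member word-aligned.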
All Windows-based programs should be compiled at warning level 3. You can fix many
hard-to-detect bugs by removing the warnings that appear during compilation. It is less
expensive to fix a warning message than to ship a bug fix release to unsatisfied users. All
applications should be run in Windows debug mode before release.
It is often easier to turn off optimizations to debug a module. Some optimizations can
introduce bugs into (or remove bugs from) otherwise correct programs. For this reason,
an application must be fully tested with release options, and all developers and testers
should be aware of the options used.
By default, the compiler generates code to "check the stack"; that is, each time a function
is called, chkstk (actually _aNchkstk) compares the available stack space with the
additional amount the function needs. If the function requires more space than is
available, the program generates a run-time error message. Table 4 (below, under
"Examining the Prolog and Epilog Code") shows the call to chkstk, which is removed by
compiling with /Gs. Stack checking adds significant overhead, so it is usually disabled
with the /Gs option after sufficient testing. It is usually a good idea to reenable stack
checking on recursive functions with the check_stack pragma.
The C header files use /D_WINDOWS and /D_WINDLL to determine the correct
prototypes and typedefs to include. /D_WINDLL ensures that using an invalid library
function in a DLL generates an error. The C/C++ version 7.0 compiler /GA option
automatically sets /D_WINDOWS; the /GD option sets both /D_WINDOWS and
/D_WINDLL.
Because /Gw adds the extra code only to far functions, reducing the number of far
functions is a good way to trim program size. In the small memory model, all functions
are near unless explicitly labeled as far, so reducing far calls is not a problem. In the
medium memory model, all functions default to far and therefore receive the extra prolog
and epilog code. In C version 6.0, you can use two methods to reduce this overhead:
• Organize source modules. Label all functions explicitly as either near or far, and
compile with the medium model.
C/C++ version 7.0 users do not need either of these methods; they can use the /GA and
/GD options to add prolog/epilog code only to far functions marked with __export. Other
far functions are compiled without additional overhead. To add the prolog and epilog
code to all far functions, use /GA /GEf or /GD /GEf.
To reduce the number of far calls, you must organize source modules carefully. Each
module is divided into internal functions and external functions. Internal functions are
called only from within the module; external functions are called from outside the
module. This arrangement lets you mark internal functions near and external functions
far.
The Zusammen sample application is arranged in this manner. Each module has a header
file that prototypes all external functions as far. Each source file prototypes its internal
functions as near because they are not needed outside the module.
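The layout can be sketched for one hypothetical module (the Counter names are illustrative, not taken from Zusammen). NEAR and FAR normally come from WINDOWS.H; here they are assumed to be empty when not already defined, so the sketch also builds with a modern flat-model compiler. Marking internal functions static as well is an addition that lets the compiler enforce the internal/external split.

```c
#include <assert.h>

/* On a 16-bit compiler these come from WINDOWS.H; empty elsewhere. */
#ifndef NEAR
#define NEAR
#endif
#ifndef FAR
#define FAR
#endif

/* ---- counter.h: external functions, prototyped far ---- */
void FAR Counter_Reset(void);
int FAR Counter_Next(void);

/* ---- counter.c: internal functions, prototyped near ---- */
static int NEAR Counter_Clamp(int n);

static int s_count;   /* module-private state */

static int NEAR Counter_Clamp(int n)
{
    assert(n >= 0);
    return (n > 100) ? 100 : n;   /* hypothetical upper limit */
}

void FAR Counter_Reset(void)
{
    s_count = 0;
}

int FAR Counter_Next(void)
{
    s_count = Counter_Clamp(s_count + 1);   /* near call inside the module */
    return s_count;
}
```

Calls between functions inside counter.c are near calls; only calls that cross the module boundary pay for a far call.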
For large applications, you can use a tool such as MicroQuill's Segmentor to determine
the best segmentation to use. You can also organize source modules manually, but the
process must be repeated whenever the source file changes.
Another method for reducing far call overhead is to use the FARCALLTRANSLATION
and PACKCODE linker options. This method works only for protected-mode
applications and should not be used in real mode. PACKCODE combines code
segments. You can specify the maximum size of a packed segment on the command line
(for example, /PACKCODE:8000). The default size limit is 65530 bytes. C/C++ version 7.0
turns PACKCODE on by default for all segmented executables. If a far function is called
from the same segment, FARCALLTRANSLATION replaces the far call with a near
call. Because PACKCODE places more functions in the same segment, it gives
FARCALLTRANSLATION more far calls to convert.
Mixed-model programming
In mixed-model programming, the small model acts as the base. All far functions are
explicitly labeled as in the previous method. Each module is compiled with the /NT
option, which places the module in a different segment, for example:
Because the small model is used, all other functions default to near, and presto! No far
call overhead. The SDK Multipad sample application uses this method for compiling,
although many of its near functions are labeled as such. Make3 compiles Zusammen
using this method.
In practice, this method does not save much work—it only eliminates the need to label
near functions explicitly. However, labeling near functions is useful for documenting
local and global functions.
In mixed-model programming, only functions in the default _TEXT code segment can
call the C run-time library. Multipad avoids this limitation by not calling any C run-time
library functions. Mixed-model programming uses the small-model C library, which is
placed in the _TEXT segment. Because these library routines are based in small model,
they assume that all code is near. If a C library function is called from a different segment, a
linker fixup error occurs because the linker cannot resolve a near jump into another
segment. There is no convenient way to avoid this restriction.
Because the C run-time library is not used, you need not link to it. The Windows version
3.1 SDK includes libraries named xNOCRTW.LIB that do not contain any C run-time
functions. Each memory model has one such library containing the minimum amount of
code needed to resolve all compiler references. Using this library saves about 1.5K from
the _TEXT code segment size and about 500 bytes from the default data segment size.
Linking time also improves slightly. When using the xNOCRTW.LIB libraries, note that
some seemingly ordinary operations, such as long multiplication, may be implemented
by routines in the standard C libraries.
Decreasing the number of far functions is only part of the battle. Not all far functions
need the full prolog and epilog code, as the existence of the /GW, /GA, and /GD options
shows. The C/C++ version 7.0 /GA and /GD options provide the best achievable
optimizations of the prolog and epilog code. The C version 6.0 /GW option provides an
optimized version of the prolog/epilog code for far functions that are not entry points.
However, when armed with a little knowledge, the C version 6.0 compiler user can
generate better results for protected-mode applications than those the /GW option
provides, as discussed in the following sections.
The prolog/epilog code sets the DS register to the correct value to compensate for the
existence of multiple data segments and their movements. The second column of Table 4
shows the assembly-language listing of the prolog/epilog code that every far function
receives when it is compiled with /Gw. The last column shows the prolog/epilog code
that near functions receive. This is the same code that far functions contain when they are
not compiled with /Gw.
C/C++ version 7.0 provides additional optimizations for real mode, even if you use the
/Gw and /GW options. These optimizations include:
• Using mov ax,ds instead of a push/pop sequence in the Preamble phase.
• Using lea sp, WORD PTR -2[bp] for the Release Frame phase.
Most of the prolog/epilog code is not needed in protected mode but is essential for real
mode. The /GW option does not generate the push ds instruction that all far functions
require in real mode to save the data segment; for this reason, /GW does not work in real
mode. Not much can be done to optimize the prolog/epilog code that C version 6.0
generates for real-mode applications, so this article focuses on optimization in protected
mode only. For more information on what happens during real mode, see Programming
Windows by Charles Petzold (Redmond, Wash.: Microsoft Press, 1990). For the compiler
writer's viewpoint, see the Windows version 3.1 SDK Help file.
The order of phases in the C/C++ version 7.0 compiler options /GA and /GD differs
slightly from that of /Gw: The Alloc Frame phase occurs before the Save DS and Load
DS phases (when compiling without /G2). As a result, the /GA and /GD options remove
the two dec bp instructions from the Release Frame phase. The compiler output for the
/GA and /GD options is shown in Table 6.
The Mark Frame and Unmark Frame phases are not needed during protected mode and
can be ignored. The prolog/epilog code for a near function and the prolog/epilog code
compiled with /Gw differ in four phases: Preamble, Save DS, Load DS, and Restore DS.
The other phases—Link Frame, Alloc Frame, Release Frame, and Unlink Frame—are the
same; they set up the stack frame for the function. (See Figure 1.)
The compiler generates code to access the parameters passed to the function using
positive offsets to BP ([BP + XXXX]). Negative offsets from BP ([BP – XXXX]) access
the function's local variables. This happens for all C functions—near functions, far
functions, and functions compiled with the /Gw option.
Because protected mode requires at least an 80286 processor, you should take
advantage of the 80286-specific instructions by compiling with the /G2 option. Two of
these instructions, enter and leave, are relevant to our current discussion. Enter
performs the same function as Link Frame and Alloc Frame, and leave performs the
same function as Release Frame and Unlink Frame. Table 7 shows the prolog/epilog
code for near and far functions compiled with the /G2s option and without the /Gw option.
Unfortunately, the /Gw option overrides the /G2 option in C version 6.0 and generates the
prolog/epilog code without the enter and leave instructions. The C/C++ version 7.0
compiler corrects this limitation; it generates Windows prolog/epilog code with the enter
and leave instructions when it compiles with /GA or /GD and /G2. Table 8 shows the
prolog/epilog code for functions compiled with C/C++ version 7.0 options.
Table 8. Assembly Listing of Prolog and Epilog Code for C/C++ 7.0 (Protected Mode Only)
The Preamble, Save DS, Load DS, and Restore DS phases exist only when you compile a
far function with a Windows option (/Gw, /GW, /GD, or /GA). Programs developed for
Windows, unlike those developed for MS-DOS, can have multiple instances, each with
its own movable default data segment. When control is transferred from Windows to an
application or from an application to a DLL, a mechanism is needed for changing DS to
point to the correct default data segment. This mechanism consists of the prolog/epilog
code, the Windows program loader, the EXPORTS section of the DEF file (or _export),
and the MakeProcInstance function.
Nothing seems to happen in the Preamble, Save DS, and Load DS phases:
The Windows program loader magically changes the Preamble phase of the prolog. The
loader first examines the list of exported functions when it loads a program. When it finds
an entry-point function with the /Gw preamble, it changes the preamble. If the function is
not exported or the preamble is different, the loader leaves it alone, and DS retains its
value. For example, in Client_Initialize the DS register does not need to be changed, so
it is not.
If the function is part of a single-instance application, the value can be set directly
because single-instance applications have only one data segment. Because DLLs are
always single instance, they belong to this group. AX is set directly to DGROUP. In the
Load DS phase, DS is loaded with the DGROUP value from AX, resulting in a correct
DS value for the function.
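The loader rules described in the last two paragraphs can be summarized as a small decision function. This is a model for illustration, not loader code; the enum names are hypothetical labels. The multiple-instance case, in which the patched preamble leaves AX for an instance thunk to supply, anticipates the callback discussion that follows.

```c
#include <assert.h>

typedef enum {
    PREAMBLE_STANDARD,        /* /Gw preamble: copies DS into AX            */
    PREAMBLE_AX_FROM_CALLER,  /* patched: AX is whatever the caller loaded
                                 (an instance thunk)                         */
    PREAMBLE_AX_DGROUP,       /* patched: loads AX with DGROUP directly      */
    PREAMBLE_OTHER            /* any other preamble (FixDS, _loadds, ...)   */
} Preamble;

/* Returns the preamble a function ends up with after loading. */
Preamble LoaderPatch(Preamble p, int isExported, int singleDataSegment)
{
    assert(p <= PREAMBLE_OTHER);
    if (!isExported || p != PREAMBLE_STANDARD)
        return p;   /* not an entry point, or not recognized: left alone */
    /* DLLs and single-instance applications have exactly one DGROUP. */
    return singleDataSegment ? PREAMBLE_AX_DGROUP : PREAMBLE_AX_FROM_CALLER;
}
```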
That leaves callbacks such as those used with the EnumFontFamilies function. You can
set up an EnumFontFamilies callback as follows:
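FARPROC lpCallBack;
// FontFamilyProc is a placeholder name for your exported callback.
lpCallBack = MakeProcInstance((FARPROC)FontFamilyProc, hInstance);
EnumFontFamilies(hdc, NULL, (FONTENUMPROC)lpCallBack, 0L);
FreeProcInstance(lpCallBack);
MakeProcInstance creates an instance thunk that contains code similar to the following:
mov ax,XXXX
jmp <actual function address> ;jump to actual function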
The return value of MakeProcInstance is the address of the instance thunk. This address
is passed to EnumFontFamilies, which calls the instance thunk instead of the function
itself. The instance thunk loads AX with the current address of the data segment and then
jumps to the function, whose prolog loads DS from AX. In real mode, Windows updates the
address stored in the thunk each time it moves the data segment. And presto! chango! DS
has the correct value.
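To make the role of the instance thunk concrete, here is a portable toy model in which a thunk is a small structure instead of generated machine code. The per-instance pointer stands in for the data-segment value loaded into AX, and MakeThunk plays the part of MakeProcInstance. All names are hypothetical; this models the idea and is not how Windows implements thunks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct InstanceData {
    int fontCount;               /* per-instance state (the "DGROUP") */
} InstanceData;

typedef int (*CallbackFn)(InstanceData *ds, const char *faceName);

/* The "thunk": which data to use, plus where to jump. */
typedef struct Thunk {
    InstanceData *ds;            /* plays the role of "mov ax,XXXX" */
    CallbackFn target;           /* plays the role of "jmp <function>" */
} Thunk;

/* Shared code: a single copy serves every instance. */
int CountFont(InstanceData *ds, const char *faceName)
{
    (void)faceName;
    ds->fontCount++;
    return 1;                    /* nonzero: continue enumerating */
}

/* Plays the role of MakeProcInstance: binds code to one instance. */
Thunk MakeThunk(CallbackFn fn, InstanceData *ds)
{
    Thunk t;
    assert(fn != NULL && ds != NULL);
    t.ds = ds;
    t.target = fn;
    return t;
}

/* Plays the role of an enumerator such as EnumFontFamilies: it sees only
   the thunk and never knows which instance's data is being used. */
int EnumerateFaces(const Thunk *t, const char *const *faces, int count)
{
    int i;
    for (i = 0; i < count; i++)
        if (!t->target(t->ds, faces[i]))
            break;
    return i;                    /* callbacks that returned nonzero */
}
```

Two thunks built from the same CountFont update different InstanceData structures, which is the job MakeProcInstance does for a multiple-instance application.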
• An application cannot call an exported far function directly; it must use the result
of MakeProcInstance as a function pointer instead.
• DLLs should not call MakeProcInstance on any exported far function that
resides inside the DLL.
So far, we have not discussed the SS stack segment register. The prolog code does not set
SS anywhere. This must mean that the Windows Task Manager sets SS before the
function is executed. Because a Windows-based application is not normally compiled
with the /Au or /Aw option, SS == DS. So there is no reason why DS cannot be loaded
simply from SS.
Instead of copying DS into AX, FixDS modifies the prolog to put SS into AX, which is
eventually placed in DS (see the fourth column of Table 10). This preamble differs from
the standard Windows preamble, so the Windows loader does not modify it.
The C/C++ version 7.0 compiler extends the ideas of FixDS by letting the programmer
specify where DS gets its value. You can use the /GEx option in conjunction with the
/GA and /GD options to load DS. The following options are available:
• /GEd—Load DS from DGROUP. This is the default behavior for /GD and is
useful for DLLs, as explained in the next section.
• /GEs—Load DS from SS. This is equivalent to FixDS and is the default behavior
for /GA.
When you compile an application with /GA, the functions marked with __export are not
really exported (you can look at the exported functions with EXEHDR). If you compile
the program with /GA /GEe, the EXEHDR listing shows all exported functions. A
program that you compile with /GA loads DS from SS and does not need to export its
entry points, as mentioned above. A program compiled with /GA /GEa should normally
be compiled with /GEe.
The /GD and /GA options work differently. The /GD option exports functions marked
with __export. To stop the compiler from exporting functions in a DLL, use /GA /GEd
/D_WINDLL /Aw instead of /GD.
Although the previous recommendations (excluding FixDS) work fine with DLLs, a
better optimization method exists. To optimize a DLL with C version 6.0, compile all
DLL modules with the options listed in Table 2 for modules without entry points:
This compilation does not generate prolog or epilog code because the /Gw option is not
used. To load DS correctly, mark all entry-point functions with _loadds, and list the
functions that the client application calls in the EXPORTS section of the DEF file. The
_loadds modifier changes the prolog/epilog code to match the second column of Table 10.
_loadds basically adds the same lines that the Windows program loader places in the
Preamble for a DLL. See Make5 for an example of this method. Again, this is for
protected-mode-only applications.
The /GD option in C/C++ version 7.0 defaults to loading DS from the default data
segment (see the third column of Table 10). The /GD option also sets _WINDLL and
/Aw.
Notice that the compiler options include /Aw but not /Au. The /Aw option informs the
compiler that DS != SS. The /Au option is equivalent to /Aw and a _loadds on every
function, far and near. This is not an optimization because even near functions receive the
three lines of code that set up the DS register.
Using _loadds does not work for applications that have multiple instances and therefore
multiple DGROUPs. It does, however, work for single-instance applications. A single-
instance application need not export functions because the application passes function
addresses to Windows. The application should make sure that another instance cannot
start by checking the value of hPrevInstance in WinMain. If a second instance did start,
Windows would create a new data segment for it, but the code still contains hard-coded
pointers to the first data segment.
The application should also set up a single data segment in the DEF file as:
Otherwise, the _loadds function modifier will generate warnings. There is no need to use
MakeProcInstance because the _loadds function modifier sets up the DS register
correctly.
In the previous examples, the functions are exported in the DEF file. You can also use the
_export keyword to export DLL functions. This method has some drawbacks, depending
on the method you use to link the application with the DLL. There are three methods:
Including an IMPORTS line in the DEF file of the application, for example:
IMPORTS
PICKER.Picker_Do
although inconvenient for DLLs with many functions, allows you to rename functions,
for example:
IMPORTS
PickIt = PICKER.Picker_Do
Now the application can call PickIt instead of Picker_Do. This is useful when DLLs
from different vendors use the same function name and when you import a function
directly by its ordinal number. The linker gives each exported function an ordinal number
to speed up linking by eliminating the need to search for the function. You can override
the default ordinal number by specifying a number after an "at" sign (@) in the DLL's
DEF file, for example:
; DLL .DEF
EXPORTS
Picker_Do @1
An application can import this function with the following DEF file entry:
; Apps .DEF
IMPORTS
PickIt = PICKER.1
Most programmers use the IMPLIB utility instead of an IMPORTS line in their DEF
files. IMPLIB takes the DEF file of a DLL or, if _export is used, takes the DLL itself and
builds a LIB file. The application links with the LIB file to resolve the calls to the DLL.
Therefore, the IMPORTS line is not needed.
One of the drawbacks of _export is that it assumes linking by name instead of linking by
ordinal number. As a result, the linker assigns the function an ordinal number of its own
choosing and places the function name in the Resident Name Table.
The linker is not likely to assign the same number each time it links the program. For
example, the output of the EXEHDR program for a program with two exported functions
may originally look like this:
Exports:
ord seg offset name
1 1 07a1 WEP exported, shared data
4 1 0e06 ___EXPORTEDSTUB exported, shared data
3 1 00ac PICKER_OLDDLGPROC exported, shared data
2 1 0061 PICKER_DO exported, shared data
Adding a third exported function to the program may change all the ordinals in the
EXEHDR output, for example:
Exports:
ord seg offset name
1 1 07a1 WEP exported, shared data
3 1 0e06 ___EXPORTEDSTUB exported, shared data
4 1 0f00 NewFunction exported, shared data
2 1 00ac PICKER_OLDDLGPROC exported, shared data
5 1 0061 PICKER_DO exported, shared data
Applications that use any method of ordinal linking must now be recompiled to use the
new ordinals. You may also have to recompile if you use the EXPORTS statement without
explicitly giving ordinal numbers. Having to recompile an application each time the DLL
changes offsets many of the advantages of using DLLs.
Linking by name also results in function names being placed in the Resident Name Table,
which is an array of function addresses indexed by function name. The Resident Name
Table stays in memory for the life of the DLL. When linking by ordinal number, the
function names reside on disk in the Non-Resident Name Table while an array of function
addresses indexed by ordinal number resides in memory.
For a large DLL, the Resident Name Table could consume a significant amount of
memory. Also, linking by name is much slower than linking by ordinal number because
Windows must perform a series of string comparisons to find the function in the table.
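The cost difference between the two lookups can be seen in a toy model (not the real Windows loader): exported addresses sit in an array indexed by ordinal, while a name lookup must first search a name table with string comparisons. The table contents follow part of the first EXEHDR listing above; the function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <string.h>

typedef void (*ExportFn)(void);

static void Wep(void) {}
static void PickerDo(void) {}
static void PickerOldDlgProc(void) {}

/* Entry table: the ordinal is the index (slot 0 is unused). */
static const ExportFn entryTable[] = { 0, Wep, PickerDo, PickerOldDlgProc };
#define ENTRY_COUNT (sizeof entryTable / sizeof entryTable[0])

/* Resident name table: names must be searched one by one. */
typedef struct { const char *name; unsigned ordinal; } NameEntry;
static const NameEntry residentNames[] = {
    { "WEP", 1 }, { "PICKER_DO", 2 }, { "PICKER_OLDDLGPROC", 3 }
};
#define NAME_COUNT (sizeof residentNames / sizeof residentNames[0])

/* By ordinal: a bounds check and one array index. */
ExportFn LookupByOrdinal(unsigned ordinal)
{
    return (ordinal > 0 && ordinal < ENTRY_COUNT) ? entryTable[ordinal] : 0;
}

/* By name: one string comparison per entry until a match is found. */
ExportFn LookupByName(const char *name, unsigned *comparisons)
{
    size_t i;
    assert(name != NULL && comparisons != NULL);
    *comparisons = 0;
    for (i = 0; i < NAME_COUNT; i++) {
        ++*comparisons;
        if (strcmp(residentNames[i].name, name) == 0)
            return LookupByOrdinal(residentNames[i].ordinal);
    }
    return 0;   /* not found */
}
```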
Run-time dynamic linking occurs when a function call is resolved at run time instead of
load time. For example:
HINSTANCE hLib ;
FARPROC lpfnPick ;
// Get library handle; values below HINSTANCE_ERROR are error codes.
hLib = LoadLibrary("PICKER.DLL") ;
if (hLib >= HINSTANCE_ERROR)
{
   // Get address of function.
   lpfnPick = GetProcAddress(hLib, "Picker_Do") ;
   // Call the function if it was found.
   if (lpfnPick != NULL)
      (*lpfnPick) (hwnd, &aPicker ) ;
   // Free the library.
   FreeLibrary( hLib) ;
}
Linking by name does not use the ordinal number of the function. When linking by name
it is much faster to have the function name in the Resident Name Table.
However, using ordinal numbers is still faster and uses less memory. For example:
#define PICKER_DO 3
HINSTANCE hLib ;
FARPROC lpfnPick ;
// Get library handle; values below HINSTANCE_ERROR are error codes.
hLib = LoadLibrary("PICKER.DLL") ;
if (hLib >= HINSTANCE_ERROR)
{
   // Get address of function by ordinal number.
   lpfnPick = GetProcAddress(hLib, MAKEINTRESOURCE(PICKER_DO)) ;
   // Call the function if it was found.
   if (lpfnPick != NULL)
      (*lpfnPick) (hwnd, &aPicker ) ;
   // Free the library.
   FreeLibrary( hLib) ;
}
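Passing MAKEINTRESOURCE(PICKER_DO) works because GetProcAddress can tell an ordinal from a string: an ordinal is passed with the ordinal in the low-order word and zero in the high-order (selector) word, and no valid far string pointer has a zero selector. The sketch below models a 16:16 far pointer as a 32-bit integer; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 16:16 far address modeled as a 32-bit value:
   high word = selector, low word = offset. */
typedef uint32_t FarAddr;

#define MAKE_FAR(sel, off) \
    ((FarAddr)(((uint32_t)(uint16_t)(sel) << 16) | (uint16_t)(off)))
#define SELECTOR_OF(a) ((uint16_t)((a) >> 16))
#define OFFSET_OF(a)   ((uint16_t)((a) & 0xFFFFu))

/* Like MAKEINTRESOURCE: ordinal in the low word, zero in the high word. */
FarAddr MakeOrdinal(unsigned int ordinal)
{
    assert(ordinal > 0 && ordinal <= 0xFFFFu);
    return MAKE_FAR(0, ordinal);
}

/* How a callee can tell the two kinds of argument apart. */
int IsOrdinal(FarAddr a)
{
    return SELECTOR_OF(a) == 0;
}
```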
The fastest, most flexible method, regardless of the linking method you use, is to
explicitly list the functions with ordinal numbers in the EXPORTS section of the DEF
file. The C/C++ version 7.0 /GD option encourages the use of __export to mark entry
points. If you use this option, we recommend that you also list every function that an
application calls, with an explicit ordinal number, in the EXPORTS section of the DEF file.
DS != SS issues
Some problems can arise within a DLL because DS != SS. A common problem occurs
when a DLL calls the standard C run-time library. For example, if you compile the
following code with the /Aw option:
void Foo()
{
char str[10]; // allocates str on stack,
strcpy(str,"BAR"); // passing the far pointer as a
// near pointer
}
the compiler generates a near/far mismatch error because strcpy expects str to be in the
default data segment (a near pointer). However, str is allocated on the stack (making it a
far pointer) because the stack segment does not equal the data segment. The following
examples show how to avoid this situation.
• You can place the array in the data segment by making it static:
void Foo2()
{
static char str[10]; // allocate str in data segment
strcpy(str,"BAR");
}
• You can place the array in the data segment by making it global:
char str[10]; // allocate str in data segment

void Foo3()
{
strcpy(str,"BAR");
}
• Instead of linking with the small-model version of strcpy, you can use the large-
model (also called the model-independent) version:
void Foo4()
{
char str[10];
_fstrcpy(str,"BAR"); // accept far pointers
}
Because _fstrcpy expects far pointers, the compiler converts each argument to a far
pointer, so the address of str keeps its stack segment.
• You can also use the following functions from the Windows library:
• lstrcat
• lstrcmp
• lstrcmpi
• lstrcpy
• lstrlen
• wsprintf
• wvsprintf
void Foo4()
{
char str[10];
lstrcpy(str,"BAR"); // accept far pointers
}
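void Foo5()
{
char str[10]; // allocated on stack
char *pstr ; // near pointer based on DS
pstr = str ; // only the offset of str is kept
strcpy(pstr,"BAR"); // writes to DS:offset, not SS:offset
}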
In this example, pstr is set to the offset of str, and the segment is lost because pstr is a
near pointer. Declaring pstr a far pointer eliminates this problem. However, you cannot
pass a far pointer to strcpy so you must use _fstrcpy, which results in the following
corrected code:
void Foo6()
{
char str[10];
char FAR *pstr ; // far pointer
pstr = str ; // keeps both segment and offset
_fstrcpy(pstr,"BAR");
}
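Alternatively, make str static so that a near pointer is sufficient:
void Foo7()
{
static char str[10]; // DS-based pointer
char *pstr ;
pstr = str ; // str is in the default data segment
strcpy(pstr,"BAR");
}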
What happens if the C run-time function does not have a far version? For example, in the
Picker DLL, the picker_OnMouseUp function calls _splitpath, which requires near
pointers. Using static or global structures poses problems for multiple applications that
use Picker simultaneously. To avoid these problems, Picker allocates memory from the
local heap with the LocalAlloc(LMEM_FIXED,size) function, which returns a local
pointer. This is exactly what Picker needs to call _splitpath.
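The segment problem behind the Foo examples can be modeled in portable C. In this toy model (not real x86 code), a far pointer carries a segment and an offset, while a near pointer carries only an offset that is always interpreted relative to DS. The segment values are arbitrary assumptions, chosen so that DS != SS as in a DLL.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t seg, off; } FarPtr;
typedef uint16_t NearPtr;

enum { DS_VALUE = 0x1000, SS_VALUE = 0x2000 };  /* a DLL: DS != SS */

/* Widening a near pointer: the segment is assumed to be DS. */
FarPtr NearToFar(NearPtr p)
{
    FarPtr f;
    f.seg = DS_VALUE;
    f.off = p;
    return f;
}

/* Address of a local array: it lives in the stack segment. */
FarPtr AddressOfLocal(uint16_t stackOffset)
{
    FarPtr f;
    f.seg = SS_VALUE;
    f.off = stackOffset;
    return f;
}

/* Models "pstr = str" with a near pstr: only the offset survives,
   and every later use reinterprets it relative to DS. */
FarPtr ThroughNearPointer(FarPtr p)
{
    NearPtr n;
    assert(p.seg != 0);        /* a real far address, not NULL */
    n = p.off;                 /* the segment is silently dropped */
    return NearToFar(n);
}

int SameAddress(FarPtr a, FarPtr b)
{
    return a.seg == b.seg && a.off == b.off;
}
```

A static or global array (Foo2, Foo3, Foo7) already has a DS-relative address, so the round trip through a near pointer is harmless; a stack array (Foo5) does not survive it, which is why Foo6 keeps a far pointer.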
Summary
• Be sure that all pointers you pass to a DLL are far pointers.
• /Au introduces a considerable amount of overhead; use /Aw and _loadds instead.
• _cdecl is the default C calling convention and is slightly slower than PASCAL
and _fastcall.
• _fastcall is the fastest method. It places some of the parameters in registers but
does not support variable argument functions and cannot be used with _export or
PASCAL, so entry points cannot use the _fastcall modifier. Under C/C++ version
7.0, the __fastcall modifier can conflict with the Windows prolog/epilog code if
used in the following combinations.
Because the C run-time library is compiled with the _cdecl convention, you must include
header files such as STDLIB.H and STRING.H when you use a different calling
convention. These header files explicitly mark each function as _cdecl to simplify
changing the default convention. When you use a third-party library, you may have to add
the _cdecl function modifier to the header files.
You can use any calling convention as the default convention for applications, as long as
you declare all entry points FAR PASCAL and declare the WinMain function PASCAL.
Marking callback functions as PASCAL is usually safer, even if you use the /Gc Pascal
convention option, because it avoids problems if the calling convention changes
inadvertently. It is also a good form of code commenting.
A DLL, unlike an application, can use any calling convention, even for application-called
entry points. An application that calls a DLL must know which calling convention the
DLL expects and must use that convention.
A DLL may need to implement a variable argument function. Because _cdecl is the only
convention that supports variable arguments, such a function must use the _cdecl
convention instead of the PASCAL convention.
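A portable sketch of such a function, using standard stdarg.h and vsnprintf, follows. It has the shape of the DebugPrint function used in the caveats below but adds a buffer-size parameter, which is an assumption of this sketch. In a 16-bit DLL, the far-pointer macros from wstdarg.h shown below would take the place of the standard va_list macros.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* On 16-bit Windows this must be _cdecl: only the caller knows how many
   bytes of arguments it pushed, so the caller removes them. */
int DebugPrint(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL && size > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* formats into buf */
    va_end(ap);
    return n;   /* length of the full formatted text */
}
```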
Note the following caveats when using variable argument lists in DLLs:
• The variable argument macros from STDARG.H use the default pointer size to
point to the arguments that are on the stack. In the small or medium model, the
pointers are near pointers. Because DS != SS, these pointers do not point to the
correct value and must be changed to far pointers before you can use these
macros, as shown in the modified STDARG.H below:
/****************************************************************
* File: wstdarg.h
* Remarks: Macro definitions for variable argument lists
* used in DLLs.
****************************************************************/
typedef char _far *wva_list ;

#define wva_start( ap, v ) (ap = (wva_list) &v + sizeof( v ))
#define wva_arg( ap, t ) (((t _far *)(ap += sizeof( t )))[-1])
#define wva_end( ap ) (ap = NULL)
• When passing arguments by reference, always use far pointer declarations. The
compiler synthesizes far pointers by pushing the DS and the offset of the memory
location onto the stack. This provides the DLL with the proper information to
access the application's data segment.
• Because functions with variable arguments are defined using _cdecl, pointer
arguments that are not declared in the parameter list must be typecast in the
function call; otherwise, the omission of the function parameter prototype causes
unpredictable results. For example:
void FAR _cdecl DebugPrint( LPSTR lpStr, LPSTR lpFmt, ... ) ;
DebugPrint( szValue, "%s, value passed: %d\r\n",
(LPSTR) "DebugPrint() called", (int) 10 ) ;
• When you import or export a _cdecl function, you must declare it with an underscore (_)
prefix in the DEF file. You must also preserve case sensitivity in the function
name. For example, you can export the function above as follows:
EXPORTS
WEP @1 RESIDENTNAME
_DebugPrint @2
• _cdecl functions must either be linked by ordinal number or have all-uppercase
names.
Unlike Pascal functions, which are converted to uppercase before they are
exported, _cdecl functions retain their case when exported. The Windows
dynamic-linking mechanism always converts function names to uppercase before
it looks in the DLL for the function. However, functions exported from a DLL are
expected to be in uppercase and are not converted. The result is a comparison
between an uppercase function name and a mixed-case function name. This
comparison, of course, fails. The solution is to declare the function name all-
uppercase or to link by ordinal number and avoid the whole comparison problem.
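A toy model of that failed comparison (not the actual loader code) follows: the requested name is converted to uppercase, while exported names are compared exactly as stored. _DEBUGDUMP is a hypothetical all-uppercase _cdecl export added for contrast.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *const exportedNames[] = {
    "WEP",          /* PASCAL: stored in uppercase */
    "_DebugPrint",  /* _cdecl: case preserved */
    "_DEBUGDUMP"    /* hypothetical _cdecl export, declared all-uppercase */
};

/* Returns the 1-based position of the export, or 0 if it is not found. */
int FindExport(const char *requested)
{
    char upper[64];
    size_t i, len;

    assert(requested != NULL);
    len = strlen(requested);
    assert(len < sizeof upper);
    /* Loader side: convert the requested name to uppercase
       (the loop also copies the terminating '\0'). */
    for (i = 0; i <= len; i++)
        upper[i] = (char)toupper((unsigned char)requested[i]);
    /* DLL side: exported names are compared exactly as stored. */
    for (i = 0; i < sizeof exportedNames / sizeof exportedNames[0]; i++)
        if (strcmp(exportedNames[i], upper) == 0)
            return (int)i + 1;
    return 0;
}
```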
If the DLL will be used with different languages such as Visual Basic, Borland C++,
Microsoft Excel, Zortech C++, or Microsoft FORTRAN, you should use the PASCAL
convention. The registers used by the _fastcall convention can change between compiler
versions and are not compatible between compilers by different vendors.
In the following fragment,
int i ;
int *p ;
p = &i ;
pointer p is an alias of variable i. You can use aliases to perform tasks while keeping the
original pointer around, for example:
// No error checking.
int i ;
// Get a pointer.
LPSTR ptr = (LPSTR)GlobalLock(GlobalAlloc(GHND,1000));
LPSTR ptr_alias = ptr ; // alias the pointer
for ( i = 0 ; i < 1000 ; i++)
   *(ptr_alias++) = foo(i) ; // use the alias
GlobalFreePtr(ptr); // unlock and free the memory (WINDOWSX.H)
Although aliasing is a common and acceptable practice, the compiler can improve
optimizations if it can assume that there is no aliasing, because it can place more memory
locations into registers. By default, the compiler uses registers:
The /Ow and /Oa options signal the compiler that it has more freedom to place variables
or memory locations into registers; these options do not cause the compiler to keep
variables in registers.
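The following sketch shows why the compiler cannot assume no aliasing by default. If a and b may point to the same object, *a must be reread after the store through b. A compiler told that no aliasing occurs (as with /Oa) could keep *a in a register and return 1 even when a and b are the same pointer; a standard C compiler must produce the aliasing-safe result noted in the comment.

```c
#include <assert.h>

int StoreBoth(int *a, int *b)
{
    assert(a && b);
    *a = 1;
    *b = 2;          /* overwrites *a if a == b */
    return *a;       /* 2 when a and b alias, otherwise 1 */
}
```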
The global register allocation option /Oe, on the other hand, allocates register storage to
variables, memory locations, or common subexpressions. Instead of using registers only
for temporary storage or for producing intermediate results, the /Oe option places the
most frequently used variables into registers. For example, /Oe places a window handle,
hWnd, in a register if a function is likely to use hWnd repeatedly.
Because the no-aliasing options increase the compiler's opportunities to place a variable
in a register, it makes sense to use these options with /Oe. In many cases, the /Ow and
/Oa options do not optimize without the /Oe option. In some cases, you can eliminate
problems with /Ow or /Oa by turning off /Oe optimization.
Using /Ow Instead of /Oa
What is the difference between /Ow (Windows version) and /Oa? Basically, /Ow is a
relaxed version of /Oa. It assumes aliasing will occur across function calls, so a memory
location placed in a register is reloaded after a function call. For example, in:
void foobar( int * p)
{
// Compiler puts the value that p points to into a register.
*p = 5 ;
foo() ;
// If compiled with /Ow, the compiler reloads the register
// with *p.
(*p)++ ;
}
the compiler places the value that p points to in a register. If the /Ow option is set, the
compiler reloads the register after the call to foo. If the /Oa option is set, *p is not
reloaded after the function call. Thus, /Ow tells the compiler to forget everything about
pointed-to values after function calls.
Compiling the code fragment above with /Ox and /Oa results in the following code:
Notice how the compiler optimized away the last line, which increments the value that p points to.
Compiling the code with /Ox and /Ow results in the following correct version:
To understand the benefit this technique adds to a Windows-based program, look at the
following code fragment:
If you compile this code fragment with /Oa and C version 6.0, Bar is never called. If you
use C/C++ version 7.0, Bar is called. The C version 6.0 compiler assumes that ach does
not change in the SendMessage call and optimizes away the if block because it assumes
ach[0] is still zero. If you compile the code with /Ow, the compiler expects ach to change
after any function call, including the call to SendMessage.
The C version 6.0 compiler appears to be pretty dumb—it does not realize that the ach
pointer was passed to SendMessage. However, as far as the compiler can tell, a LONG
was passed, not the pointer. If a pointer had been passed, /Oa would have worked. For
example, in the following code:
//Pass a pointer.
SomeFunc(hwnd,(LPSTR)ach, sizeof(ach));
if (ach[0] != 0)
{
Bar(ach);
}
}
the compiler knows that the pointer is being passed and can be changed. This problem
can occur in any function that takes a pointer as a DWORD (lparam) or a WORD
(wparam). The C/C++ version 7.0 compiler corrects this behavior.
You can also solve this problem by simply declaring ach volatile. This tells the compiler
that ach can change at any time, so the compiler reads it from memory on every access
instead of keeping a copy in a register. However, /Ow usually generates better code than
using the volatile keyword.
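The SendMessage case can be modeled portably. FakeSendMessage is a hypothetical stand-in that receives the buffer address as an integer, as an lParam would, and copies text into it. Nothing in its parameter types says that ach can change, yet it does; a correct compiler must reread ach[0] after the call, which is what /Ow or volatile guarantees under C version 6.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SendMessage(hwnd, WM_GETTEXT, ...): the
   buffer address arrives as an integer, as an lParam would. */
long FakeSendMessage(intptr_t lParam, unsigned cchMax)
{
    char *buf = (char *)lParam;
    assert(buf != NULL && cchMax > 0);
    strncpy(buf, "Picker", cchMax - 1);
    buf[cchMax - 1] = '\0';
    return (long)strlen(buf);
}

int barCalls;   /* counts calls to Bar */

void Bar(const char *s)
{
    (void)s;
    barCalls++;
}

void CheckText(void)
{
    char ach[10];
    ach[0] = 0;
    FakeSendMessage((intptr_t)ach, sizeof ach);
    if (ach[0] != 0)    /* must be reread after the call */
        Bar(ach);
}
```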
Although /Ow is the easiest solution, the code it generates is not as efficient as the code
/Oa generates, as illustrated by the hWnd window handle in the previous example.
Window handles are commonly used in functions. They are perfect examples of variable
types that are meant to be placed into registers; however, with the /Ow option they are
reloaded after any function call. Using #pragma optimize at strategic locations to turn
/Ow and /Oa off prevents problems associated with reloading. A profiler can help
determine the placement of such statements.
The STRICT macros defined in the Windows version 3.1 SDK WINDOWS.H file also
reduce the need for the /Ow option. WINDOWSX.H includes macros that make most
window functions type-safe, so a pointer is passed as a pointer instead of as a LONG.
The STRICT macros can make an application more robust and should be
used even if the /Oa option is not in effect.
Undocumented "features" are rarely necessary or useful, with the exception of file
functions such as _lcreate that were not documented before Windows version 3.x. For
example, an undocumented feature that saves neither time nor effort is demonstrated by
the following code segment.
h2 = LocalAlloc(LMEM_MOVEABLE, cb);
if (*p == 0)
{
// Do something.
}
You should not use this undocumented feature for two reasons:
• Future versions of Windows will have a flat memory model and will not support
this type of memory accessing.
• The code will not compile as expected if you use the /Oa option. The p pointer is
not passed to the LocalAlloc function; therefore, the compiler assumes that p will
not change as a result of this function call. The programmer has tried to outsmart
the compiler by dereferencing the pointer again after the function call, so the
program appears to be safe. Not quite.... The compiler removes the second
dereference statement because it assumes that p did not change as a result of the
function call; this is exactly what the person who had to support the code would
do.
• Use #pragma optimize to selectively turn the /Ow option on and off. You can
also turn /Oe off.
• Use the volatile keyword to ensure that variables are not placed in registers.
Programming at Large
Dale Rogerson
Microsoft Developer Network Technology Group
Abstract
Microsoft® Windows™ version 3.1 signals the death of Windows real mode. With the
release of Windows version 3.1, only standard and enhanced modes are supported. The
end of real mode is the beginning of new programming freedoms, such as writing large-
model applications.
This article explains why the large model is valid for protected-mode applications and
discusses solutions to two limitations of large-model applications: single instances and
the Windows version 3.0 page-locking bug.
In protected mode, the processor provides a mechanism, the segment selector, that
removes the need to track and update individual pointers. All far pointers in protected
mode consist of a 16-bit segment selector and a 16-bit segment offset. The segment
selector does not refer directly to a physical address; instead, it indexes into a table. The
value in this table is a segment address. When a segment moves, the segment selector
does not change, but the value in the table is updated. The maintenance of the segment
selector and the selector tables is supported directly by the Intel® 80x86 microprocessor.
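The table lookup can be modeled in a few lines of C. This is a simplified sketch, not the processor's real descriptor format. `descriptor_base`, `make_far`, `far_to_address`, and `move_segment` are hypothetical names, and each table entry stores only a base address. The selector layout is the standard 80286/80386 one: index in bits 15 through 3, table indicator in bit 2, and requested privilege level in bits 1 and 0.

```c
#include <stdint.h>

/* Hypothetical descriptor table: one base address per entry.  Real
   descriptors also hold a limit and access rights.  8192 entries is the
   most a 13-bit index can address. */
#define TABLE_ENTRIES 8192
static uint32_t descriptor_base[TABLE_ENTRIES];

/* A far pointer: 16-bit selector in the high word, 16-bit offset low. */
typedef uint32_t FARPTR;

static FARPTR make_far(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 16) | offset;
}

/* The selector's index (bits 15..3) picks the table entry; the table
   indicator and privilege bits are ignored in this model. */
static uint32_t far_to_address(FARPTR fp)
{
    uint16_t selector = (uint16_t)(fp >> 16);
    uint16_t offset = (uint16_t)(fp & 0xFFFFu);
    return descriptor_base[selector >> 3] + offset;
}

/* Moving a segment updates only its table entry; far pointers that
   hold the selector remain valid without being touched. */
static void move_segment(uint16_t selector, uint32_t new_base)
{
    descriptor_base[selector >> 3] = new_base;
}
```

For example, selector 0x17 is table index 2 in the LDT at privilege level 3. After `move_segment`, the same far pointer resolves to the new location.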
While the segment selector solves many of the old problems caused by using the large
model, it does not resolve two limitations. One limitation requires applications with
multiple data segments to have only a single instance. The other limitation is a bug in
Windows version 3.0 that caused multiple data segments to be page-locked in memory.
These limitations do not affect dynamic-link libraries.
Single Instances
Windows version 3.1 cannot run multiple instances of applications with multiple read-
write data segments. If a large-model application has a single read-write data segment, it
can run multiple instances. A read-only segment can also be safely shared by multiple
instances because the instances cannot change the segment. Most large-model
applications, however, have multiple data segments and, therefore, cannot run multiple
instances.
While there are several methods for getting only one data segment in a large-model
program, remember that the application can then have only 64 kilobytes (K) of static
data, local heap, and stack combined. This is the same as a medium-model program. For
this reason, when porting from a flat-model 32-bit environment, it is probably best to use
a compiler that supports development of 32-bit applications under Windows. These
compilers, such as Watcom C 9.0, MetaWare 32-Bit Windows Application Development
Kit, or MicroWay NPD C-386, use WINMEM32.DLL to get a full 32-bit flat memory
model.
The Reason
In a multiple-instance application, all instances share the same code segments but have
unique default data segments. Small- and medium-model applications have only one data
segment. Most large-model applications have multiple data segments, but the current
Windows kernel cannot resolve fixups to multiple data segments. Consider the following
code fragment found in large-model applications that establishes the DS register:
mov ax,_data_01
mov ds,ax
This code is shared by all instances of the application. When the code is loaded,
_data_01 can hold only one value. Windows has no way to associate other data segments
with a given instance of an application.
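The single-value problem can be shown with a toy model. In this sketch, `fixup_data_01`, `Instance`, `load_instance`, and `ds_after_mov` are hypothetical. The model ignores how Windows gives each instance its own default data segment and tracks only the extra segment referenced by _data_01.

```c
#include <stdint.h>

typedef uint16_t SELECTOR;

/* Hypothetical model of the shared code segment: the operand of
   "mov ax,_data_01" is one fixup slot, patched when the code loads. */
static SELECTOR fixup_data_01;
static int code_loaded;

/* Hypothetical per-instance state: each instance would need its own
   copy of every read-write data segment. */
typedef struct {
    SELECTOR data_01;
} Instance;

/* Only the first load patches the fixup, because later instances
   reuse the code segment that is already in memory. */
static void load_instance(const Instance *inst)
{
    if (!code_loaded) {
        fixup_data_01 = inst->data_01;
        code_loaded = 1;
    }
}

/* The value every instance gets in DS after executing the shared code. */
static SELECTOR ds_after_mov(void)
{
    return fixup_data_01;
}
```

A second instance that runs the shared code ends up with the first instance's segment. This is why the loader refuses to start it.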
The program loader determines if only one instance is allowed after examining the .EXE
header. If it discovers more than one data segment, it limits an application to one
instance. If an application has less than 64K of data, stack, and local heap, it is possible to
collapse the data into one data segment.
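The loader's decision can be summarized as a small function. This is a sketch under stated assumptions. `SegmentInfo` and `allows_multiple_instances` are hypothetical, and the real .EXE header encodes these properties as flag bits rather than a struct. The sketch counts only read-write data segments because, as noted earlier, instances can safely share a read-only segment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one entry in the .EXE segment table. */
typedef struct {
    bool is_data;    /* data segment rather than code */
    bool read_only;  /* READONLY data cannot be changed by any instance */
} SegmentInfo;

/* True if at most one read-write data segment exists, the condition
   under which multiple instances can run. */
static bool allows_multiple_instances(const SegmentInfo *segs, size_t count)
{
    size_t writable_data = 0;
    for (size_t i = 0; i < count; i++)
        if (segs[i].is_data && !segs[i].read_only)
            writable_data++;
    return writable_data <= 1;
}
```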
To get multiple instances, there must be only one read-write data segment. Under
Microsoft C/C++ version 7.0, follow these guidelines to allow for multiple instances:
• Do not use /ND to name extra data segments unless the segment is READONLY.
• Use /Gt65500 /Gx to force all data into the default data segment.
All of the above guidelines apply to Microsoft C version 6.0, except for the last one.
Microsoft C version 6.0 and C/C++ version 7.0 will usually generate two read-write data
segments. One is for initialized static data (DATA). The other one (FAR_BSS) is for
uninitialized static data. The Borland® C compilers default to generating only one data
segment. The existence of multiple data segments for a program called
SOMEPROG.EXE can be verified by the following command:
Microsoft C version 6.0 does not have the /Gx option to stop the generation of FAR_BSS
and to combine initialized and uninitialized data. While there are ways to stop the
creation of FAR_BSS with C version 6.0, in most cases it is easier to use C/C++ version
7.0. To eliminate FAR_BSS with C version 6.0, use either of the following methods:
• Initialize all uninitialized static variables, and mark all extern variables as NEAR.
• Mark all variables as NEAR, forcing the variables into the DATA segment.
For large programs, these ways of eliminating FAR_BSS can be very time-consuming.
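Both methods can be sketched as declarations. The variable names are hypothetical. The `#define NEAR` fallback is an assumption added only so the fragment compiles outside the Windows SDK, where WINDOWS.H is not present to define NEAR. With that fallback, the placement effect is lost.

```c
/* Assumption for illustration: outside the Windows SDK, NEAR is not
   defined, so define it as nothing to let the fragment compile. */
#ifndef NEAR
#define NEAR
#endif

/* Method 1: give every static variable an explicit initializer, and
   mark extern declarations NEAR (all names hypothetical). */
static int cRedraws = 0;
static char szTitle[32] = "";
extern int NEAR cSharedCount;

/* Method 2: mark variables NEAR so that they are placed in the DATA
   segment instead of FAR_BSS. */
int NEAR cInstances = 0;
long NEAR lTotalBytes;

static int bump_instances(void)
{
    return ++cInstances;
}
```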
The big problem with all methods for gaining multiple instances is that the application
still has only one read-write data segment. It does not have more data space than a
medium- or small-model program. A large-model program can have either multiple
instances or multiple read-write data segments, but not both.
Page-Lock Fix
1. Compile your application normally, and generate a map file during linking.
Examine the map file and find the names of the FAR_DATA and FAR_BSS
segments.
2. Write one or more assembly language routines that will return handles to the
FAR_DATA and FAR_BSS segments found in step 1. The following functions
return handles to the data segments named MYSEGMENT and FAR_BSS:
title simhan.asm
;****************************************************************
?PLM=1 ; PASCAL calling convention is DEFAULT
?WIN=1 ; Windows calling convention
; Use 386 code?
.MODEL LARGE
include cmacros.inc
sBegin DATA
sEnd DATA
MYSEGMENT SEGMENT MEMORY 'FAR_DATA'
MYSEGMENT ENDS
FAR_BSS SEGMENT MEMORY 'FAR_BSS'
FAR_BSS ENDS
sBegin CODE
assumes CS,CODE
assumes DS,DATA
;**************************************************************
cProc gethandle,<PUBLIC,FAR,PASCAL>
cBegin
mov ax,MYSEGMENT ; return the MYSEGMENT selector in AX
cEnd gethandle
;**************************************************************
cProc gethandle2,<PUBLIC,FAR,PASCAL>
cBegin
mov ax,FAR_BSS ; return the FAR_BSS selector in AX
cEnd gethandle2
sEnd CODE
end
3. Add a call to the following function in your application's InitInstance function
after testing the success of your CreateWindow call:
#include <windows.h>

// Routines defined in SIMHAN.ASM (FAR PASCAL, as declared by cProc)
HGLOBAL FAR PASCAL gethandle(void);
HGLOBAL FAR PASCAL gethandle2(void);

void unlockExtra(HGLOBAL hExtraSeg);

void unlockAll()
{
    // This fix is only needed for Windows version 3.0, so check
    // the version.
    if (LOWORD(GetVersion()) == 0x0003)
    {
        // Un-pagelock MYSEGMENT
        unlockExtra(gethandle());
        // Un-pagelock FAR_BSS
        unlockExtra(gethandle2());
    }
}

void unlockExtra(HGLOBAL hExtraSeg)
{
    BOOL fRet;
    // Unfix segment in logical memory
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
    // Only discardable memory can be GlobalPageUnlock'ed
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_DISCARDABLE);
    // Unfix in physical (protected mode) memory
    GlobalPageUnlock(hExtraSeg);

    // Reset the lock count to 0 because Windows happens to lock
    // it multiple times.
    do {
        fRet = GlobalUnlock(hExtraSeg);
    } while (fRet);

    // Modify the flags to moveable
    GlobalReAlloc(hExtraSeg, 0, GMEM_MODIFY | GMEM_MOVEABLE);
}
4. Modify your make file to assemble and link your procedures that return handles to
your fixed data segments.
5. Recompile your program, and check results using the Microsoft Windows 80386
Debugger (WDEB386.EXE).
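The version test in unlockAll relies on how GetVersion packs the version number into its low-order word. The major version is in the low byte and the minor version is in the high byte. Windows 3.0 therefore reports 0x0003, and Windows 3.1 reports 0x0A03. The following portable sketch shows that decoding. `major_version`, `minor_version`, and `is_windows_30` are hypothetical helpers, not Windows functions.

```c
#include <stdint.h>

/* Major version: low byte of GetVersion's low-order word. */
static unsigned major_version(uint16_t lowWord)
{
    return lowWord & 0xFFu;
}

/* Minor version: high byte of GetVersion's low-order word. */
static unsigned minor_version(uint16_t lowWord)
{
    return (unsigned)(lowWord >> 8);
}

/* Equivalent to the LOWORD(GetVersion()) == 0x0003 test. */
static int is_windows_30(uint16_t lowWord)
{
    return major_version(lowWord) == 3 && minor_version(lowWord) == 0;
}
```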
It is a good idea to test the fix under Windows version 3.0. To do so, you need a program
that reports the page-lock status of segments. Microsoft CodeView® for Windows and the
3.0 version of the Windows Heap Walker utility do not report the page-lock status, and the
3.1 version of Heap Walker does not run reliably under Windows version 3.0. WDEB386,
however, does report the page-lock status of segments.
To get page-lock information with WDEB386, do the following:
1. Run WDEB386.EXE.
2. Issue the DL selector command to dump the local descriptor table (LDT) entry
for the selector in which you are interested.
3. Take the base linear address from the output of the DL command, and issue the
.ML linear address command.
4. Take the PFT address from the output of the .ML command, and issue the .MS PFT
address command. This lists the lock count for that page.
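Because .MS reports the lock count for a single page, a segment that crosses a page boundary has more than one count to check. The sketch below assumes 80386 paging with 4K pages and uses hypothetical helper names. It computes how many pages a segment touches from its base linear address and limit. Repeating the .ML and .MS steps for each page is an inference from the steps above, not a procedure the text states.

```c
#include <stdint.h>

/* Assumes 80386 paging with 4K pages: the page number of a linear
   address is its upper 20 bits. */
#define PAGE_SHIFT 12

/* Page number containing a segment's base linear address. */
static uint32_t first_page(uint32_t base)
{
    return base >> PAGE_SHIFT;
}

/* Number of pages a segment touches.  limit is the offset of the
   segment's last byte, as a descriptor stores it. */
static uint32_t pages_spanned(uint32_t base, uint32_t limit)
{
    uint32_t last = base + limit;
    return (last >> PAGE_SHIFT) - (base >> PAGE_SHIFT) + 1;
}
```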
Words of Warning
It is important to keep the following points in mind when deciding to use the large model:
• A bug in Microsoft C/C++ version 7.0 causes C++ objects to be placed outside the
default data segment, ignoring the /Gx compiler option. To avoid this bug, declare
the object as NEAR. For example:
CTheApp NEAR theApp;
• To get multiple-instance large-model Microsoft Foundation Class (MFC)
applications, you must build a special variant of the large-model libraries. Use the
following make line:
nmake MODEL=L TARGET=W DEBUG=1 OPT="/Gt65500 /Gx"
The above variant of the MFC library has not been extensively tested.
On a more positive note, large-model DLLs work very well. Large-model code does not
assume that SS equals DS, which is exactly the situation in a DLL, where the library's data
segment differs from the caller's stack segment. Also, a DLL always has a single
instance. The Microsoft Foundation Class Library recommends using the large model for DLLs.