Unix and C/C++ Runtime Memory Management For Programmers: Excellent Site That Contains Notes About Memory Management
Excellent site that contains notes about memory management: http://www.zathras.de/angelweb/howmemorymanagementworks.htm
4. Heap Management With malloc() and free()
   1. Usage Of malloc()
   2. Usage Of free()
   3. Freeing The Same Chunk Twice
5. C++ Runtime Memory Management
   1. Heap Management with new and delete
      1. Usage Of new
      2. Usage Of delete
      3. new Vs. new[]
      4. delete Vs. delete[]
   2. Inter-mixing C++ And C Memory Managers
   3. The Odds Of References
6. Tools For C/C++ Memory Problems Resolving
   1. Free Libraries For Locating Memory Leaks
   2. Free Libraries For Identifying Memory Corrupting Code
   3. "Must-Have" Free Tools For Many Memory Problems Types - Linux/x86
   4. "Must-Have" Commercial Tools For Many Memory Problems Types
The operating system's part of memory management would include virtual memory management, paging services, memory protection and the like.
Machine Architectures - Memory, Cache, Registers

Basically, computers have two main locations for storing data - physical memory and registers. Memory is usually large, and is made of a linear set of cells, also called 'bytes' or 'words', depending on their size. Each memory cell is accessed using an address, and the memory does not have to be consecutive. On various architectures, certain parts of the memory are used to access devices (this is called memory-mapped I/O), and other parts of memory might not be mapped into any physical memory at all. In the old days, a programmer had to take care of this in their program. Modern operating systems, however, hide this information from the user, as we'll see later.

Cache is a small chunk of memory, stored either directly in the CPU (level 1 cache) or on the motherboard (level 2 cache). Its purpose is to store a copy of recently used parts of the main memory, in a location that can be accessed much faster - either because it's in the CPU, or because it's made of a faster (and more expensive) type of memory. Usually, the cache is hidden from our programs by the hardware - a programmer normally needs to worry about the cache only when writing very low-level memory management code inside the operating system's kernel.

Registers are storage cells located directly inside the CPU, with very fast access to its internals (the arithmetic-logic unit, for example). They can be accessed much faster than memory cells, and are often used to store data that is needed for a short calculation, such as the contents of local variables in a function, or intermediate results of arithmetic calculations. The keyword 'register', when used when defining a local variable in C, serves as a hint to the compiler to assign that variable to a register, rather than to a memory cell. Note that with current compilers and their optimizers, it's often better to let the compiler decide which variables are best kept in registers.
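As a small illustration, here is a minimal C sketch of the 'register' hint (the function and names here are just for the example, and the compiler is free to ignore the hint entirely):

#include <stdio.h>

/* 'register' merely hints that 'total' and 'i' are good candidates for
   CPU registers. the compiler may ignore the hint, and note that taking
   the address of a 'register' variable is not allowed. */
int sum(const int* arr, int n)
{
    register int total = 0;
    register int i;

    for (i = 0; i < n; i++)
        total += arr[i];
    return total;
}

int main()
{
    int data[4] = { 1, 2, 3, 4 };

    printf("sum - %d\n", sum(data, 4));
    return 0;
}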
Virtual Memory

In order to ease the programmer's life, as well as to allow a program to access more memory than might be physically installed in the machine, most modern operating systems support a virtual memory system. Virtual memory works by having the operating system (in tight cooperation with the CPU) map virtual addresses used by a process into physical addresses. When a process allocates a new block of memory, the operating system assigns this block to some section (called a 'page') of physical memory, using a virtual memory translation table. Each access to a memory location by the process is translated into an access to the matching physical memory page. In order to avoid slowing down the program, this translation is done directly by the CPU.
Memory Protection

As we have seen, the operating system uses a translation table to map virtual memory to physical memory. Taking this a step further, the system can use a different translation table for each process, thereby giving each process its own address space. This means that the same virtual memory address, used by two different processes, will be mapped into two different physical memory addresses. Hence one process cannot access the contents of the memory used by another process, and one process corrupting its own memory won't interfere with the memory of any other process in the system. This feature is known as "memory protection", and is used by most modern operating systems (including all Unix variants).
Run-time Management Of Virtual Memory

Some of the virtual memory sections might be mapped to no physical memory page. When a process tries to access a memory cell in such a section, the CPU identifies a page fault, and invokes an operating system routine that needs to handle this fault. Most operating systems use this feature to store part of the virtual memory sections on disk, rather than in RAM, thereby allowing the program to use an address space larger than the physical RAM of the machine. When a page fault occurs, the operating system loads the contents of the faulted virtual memory section from the disk, copies it into a free physical memory page, and updates the virtual memory translation table, so the CPU will know to map the virtual memory section to the new physical RAM page.

During the search for a free page, it might be that no free page is found. In such a case, the operating system takes the contents of a busy physical memory page, copies it to the hard disk, and uses this memory page to load the data of the desired virtual memory section. The virtual memory section that was previously mapped to this physical memory page is now marked as paged out. As you can see, the algorithm used to decide which memory page should be taken when there are no free pages has to be efficient. With a bad algorithm, a process that accesses two virtual memory sections alternately might cause the system to keep paging these sections in and out. Since access to the disk is much, much slower than access to RAM, this would slow down the system tremendously. A common algorithm used to find candidates for paging out is called LRU - Least Recently Used. In this algorithm, the memory page that was used least recently is the page whose contents will be paged out. This algorithm works well due to the locality principle - if a process accessed a given memory page, it is likely to access the same page, or one near it, on the next instruction.
SIGSEGV This! SIGBUS That!

We saw that some virtual memory sections aren't mapped to physical memory. While some of them were simply paged out, others were never allocated by the process. When a process starts running, its virtual memory table is small. As it allocates more memory pages, the table grows. However, if the process tries to access a virtual memory address in a section it hasn't allocated yet, the operating system has nowhere to bring this page from. The designers of the Unix system decided that this situation indicates a program bug, and thus, instead of making an automatic allocation of a memory page in such a case, they chose to send a signal to the process. This signal is the SEGV signal (or SIGSEGV), and its default signal handler prints out a "Segmentation violation - core dumped" message, and dumps the memory image of the process into a file named 'core' in the process's current directory.

Another way to cause a 'segmentation violation' is trying to access an illegal location of virtual memory. Because many invalid pointer problems occur with very low pointer values, the operating system does not allow a process to allocate a memory page for the virtual memory section beginning with the virtual address '0'. This is what causes programs to receive a SEGV signal when trying to dereference a NULL pointer (NULL on _most_ machine architectures is defined as '0').

What about a BUS (or SIGBUS) signal? This signal is sent to a program that tries to access a non-aligned pointer. For instance, on many machine architectures, access to 'long' (4 byte) numbers must be done using a memory address that is divisible by 4. Trying to access such an entity using an address that does not abide by this rule will cause the CPU to emit a trap. The operating system's kernel catches this trap, and then sends a BUS signal to the program. The default signal handler for this signal emits a "Bus error - core dumped" message, and dumps the memory contents to a 'core' file, much like the handler for the SEGV signal does.
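A minimal sketch of the NULL-pointer case follows; the exact message is platform-dependent (and a misaligned-access SIGBUS is harder to demonstrate portably, since x86 CPUs tolerate misaligned accesses by default):

#include <stdio.h>

int main()
{
    int* p = NULL;

    printf("about to dereference a NULL pointer...\n");
    fflush(stdout);

    /* the page at address '0' is deliberately left unmapped, so this
       write faults, and the kernel turns the fault into a SEGV signal,
       whose default handler dumps core. */
    *p = 42;

    printf("never reached\n");
    return 0;
}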
Load On Demand

As we have seen, the efficiency of virtual memory is based on the locality principle. The same principle can be used when loading programs into memory. There is no point in loading the entire executable file during application startup - large parts of the code are likely never to be executed during the process's lifetime. Furthermore, loading all of the code at once means it will take more time until the process can start running. Thus, what Unix systems do is just allocate the page table entries, and mark the pages as "load on demand". When the process tries to access a virtual memory cell in a page that wasn't loaded yet, a page fault occurs, causing the operating system to load the contents of this specific page into memory. The code part of shared libraries is treated in a similar manner.

This feature, together with some properties of file management, explains the following phenomenon: if you have a process executing program file 'foo', and you then compile a new version of 'foo' and copy it on top of the original 'foo' file - the process is likely to crash very quickly. The reason is that the process is likely to access a virtual memory cell of a page whose contents were not loaded yet. This will cause the system to load the page from the new binary file - which is most likely different from the original binary file. Thus, the program will be executing a random machine instruction, instead of the machine instruction it was supposed to perform, most likely resulting in a process crash.

There are several options for avoiding the above scenario. One is killing all processes currently executing the program's binary, before updating it. A better method is to rename the old binary file to a new name (e.g. foo.old) and then copy the new binary to 'foo'. This works because the name of a file is relevant only when a process tries to open it. After the file was opened, the process accesses it via its disk location (i-node number), which is different for the old and the new files. Actually, you may even erase the old binary file without renaming it - the file will be removed from the directory, but not from the disk. It will be removed from the disk only when all processes keeping it open (e.g. processes executing this binary program file) close it (or exit). Note that this is NOT true if the binary file is accessed via NFS, only if it is accessed on a local file system. NFS file accesses always use the file name, whether for open operations or for read operations.
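The 'erase a file while it is still open' behaviour can be demonstrated with a short C sketch (the file name 'demo.txt' is just for the example):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    char buf[6] = { 0 };
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "hello", 5);

    /* remove the directory entry - the i-node (and the file's data)
       survives as long as at least one process keeps the file open. */
    unlink("demo.txt");

    lseek(fd, 0, SEEK_SET);
    read(fd, buf, 5);
    printf("still readable after unlink: %s\n", buf);   /* prints "hello" */

    close(fd);   /* the last reference is gone - now the disk space is freed */
    return 0;
}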
Read-Only Memory

Some of the virtual memory space used by a process never needs to change during the process's life. This fact can be used to optimize the handling of such memory pages by the paging code of the operating system. For example, executable memory pages are read-only pages, and are marked so by the system. Trying to write into them will fail, and generate a CPU trap, which will be caught by the operating system, which in turn will send a SEGV signal to the process. When the system needs to page out read-only code pages of this sort, it doesn't even have to write them to disk - it already has a copy of them inside the binary executable (or the shared library) they were taken from. Thus, paging these pages out just requires marking them as free. The process can mark other memory pages as read-only too (see the man page for the mmap() system call, for example). This can be useful to make sure no part of the code tries to modify the contents of that memory - useful for tracking down some bugs.
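Here is a hedged sketch of marking a page read-only at run time, using mmap() and mprotect(); it assumes a 4096-byte page and the widely available MAP_ANONYMOUS flag (portable code should query the page size with sysconf(_SC_PAGESIZE)):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main()
{
    /* map one anonymous, initially writable page (4096 bytes assumed) */
    char* page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    strcpy(page, "immutable from here on");

    /* revoke write permission - any further store into this page
       will trap, and the process will receive a SEGV signal. */
    if (mprotect(page, 4096, PROT_READ) < 0) {
        perror("mprotect");
        return 1;
    }

    printf("%s\n", page);   /* reading is still fine */
    /* page[0] = 'X'; */    /* un-commenting this line crashes the program */

    munmap(page, 4096);
    return 0;
}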
Shared Memory

Another feature supported by Unix systems is shared memory. Using this feature, virtual memory sections of different processes are marked as shared, and thus point to the same physical memory pages. When one process changes the contents of this memory, the other processes immediately see the new contents. Of course, this means that access to this shared memory should be synchronized among the processes sharing it, using some locking mechanism.
Writable memory pages can be shared by processes using various system APIs, such as the System-V shared memory API (see the man pages of shmget() and shmat() for details), or using the mmap() system call. Note that a page may be shared between more than 2 processes, and some of these processes might only require read-only access to these shared memory pages. Read-only shared memory is easier to manage. One place where sharing read-only memory pages comes in very handy is the sharing of executable code coming from program binary files and from shared libraries. The system automatically takes care of that - if a process tries to execute a program that is already loaded into memory due to being executed by another process, the memory pages holding the binary code will be mapped into the virtual address space of the new process, and thus the overall memory usage of the system will not increase (other than the memory needed to hold the translation table for the new process - this table is not shared between processes). Note that dynamic data won't be shared by processes in this manner, since it is stored in writable memory pages. This is, of course, the expected behaviour - you wouldn't want one process changing a variable's contents to affect another process running the same program, would you?
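A hedged sketch of the System-V API mentioned above - create a segment, attach it, and clean up (in a real program, a second process, or a child created with fork(), would attach the same segment id in order to see the data):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main()
{
    /* create a private 4096-byte shared memory segment... */
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0) {
        perror("shmget");
        return 1;
    }

    /* ...and attach it to our address space. */
    char* mem = shmat(id, NULL, 0);
    if (mem == (char*)-1) {
        perror("shmat");
        return 1;
    }
    strcpy(mem, "visible to every process attached to this segment");
    printf("segment %d says: %s\n", id, mem);

    shmdt(mem);                   /* detach from our address space */
    shmctl(id, IPC_RMID, NULL);   /* mark the segment for removal  */
    return 0;
}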
Memory Management For Related Processes

A special case of shared memory handling is handled by the system when a process creates a new child process using the fork() system call. This call creates a new process that contains an almost identical copy of the parent process's memory. The operating system makes a copy of the virtual memory translation table for the child, but current Unix implementations do not make a copy of the pages themselves. Instead, they mark all the pages (now shared by the parent and child processes) as "copy-on-write". This means that as long as both processes only read from a given page, they access the same physical page. However, when one of these processes tries to write into such a shared memory page, the system notes the "copy-on-write" flag, creates a new copy of the page, and inserts that copy into the translation table of the writing process. Then, the write is actually performed on the new copy of the page. This mechanism allows the fork() operation to work much faster than if all of the contents of the parent process's virtual memory had to be copied.

One may ask why making this copy is useful in the first place. For the executable code pages this is obvious, as the child process starts to run the same code that the parent process used. Regarding the data pages, this allows the parent process to pass some data to the child process that it will require during its run, such as file descriptors used for communications between the parent process and the child process. This means that a call to fork() will be most efficient if done before the parent process has allocated large amounts of memory, i.e. as close as possible to the parent process's own initialization. Note that some operating systems support a version of the system call, named vfork(), that does not even make a copy of the translation table. This system call is not very portable - see the relevant man page for more info.
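Copy-on-write is easy to observe from user space - in the minimal sketch below, the child inherits the parent's data, but its write affects only its own copy of the page:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
    int value = 1;   /* inherited by the child across fork() */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* child: this write triggers copy-on-write - the kernel gives
           the child its own copy of the page, leaving the parent's
           page untouched. */
        value = 2;
        printf("child sees value - %d\n", value);    /* 2 */
        exit(0);
    }

    wait(NULL);
    printf("parent sees value - %d\n", value);       /* still 1 */
    return 0;
}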
Memory Managers
Memory managers are software libraries that perform memory allocation and deallocation on behalf of the program. Every programming language comes with a runtime environment that includes at least one memory manager. Since the programming languages we deal with tend to be general purpose, the memory manager is also general purpose - it does not assume anything about the pattern of allocations, the sizes of the memory chunks allocated, and the like. Therefore, it is possible to find an allocation pattern that will cause a given memory manager to perform poorly. Thus, having an idea of how a memory manager works allows us to write programs that make more efficient use of memory, in terms of both speed and overall memory usage.
Memory Alignment

One property of all memory managers is that they return memory chunks which are aligned best for the architecture of the machine on which they run. Alignment means that the memory chunks returned to the user begin on an address that is divisible by the size of a word of the CPU they are running on. This size is often 4 bytes (for 32-bit architectures) or 8 bytes (for 64-bit architectures). Further, the size of the returned chunks is also kept to a multiple of the word size. This ensures that the returned memory blocks can be used to store the largest data type the machine supports (e.g. a long integer). The price of this mechanism is that memory is often wasted. For example, suppose that I have a machine with a word size of 4 bytes, and that I want to store a string whose length is 5 bytes (including the terminating null character, if we're referring to a C language string). The memory manager will give me a memory chunk whose size is 8 bytes (2 words). I use 5 bytes, and 3 are wasted. Thus, general-purpose memory managers cause a lot of memory waste for programs that allocate many small memory chunks. We will discuss methods of overcoming this problem when we discuss specialized memory managers. We will also see how this affects issues with pointers and memory smearing (corruption of memory contents).
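The rounding itself is a classic bit trick; a minimal sketch, assuming a 4-byte word as in the example above:

#include <stdio.h>

/* round a requested size up to the next multiple of the word size -
   the kind of rounding a general-purpose memory manager performs. */
#define WORD_SIZE 4    /* assuming a 32-bit machine, as in the text */
#define ALIGN(sz) (((sz) + WORD_SIZE - 1) & ~(WORD_SIZE - 1))

int main()
{
    /* a request for 5 bytes yields an 8-byte chunk: 5 used, 3 wasted */
    printf("request 5 bytes -> chunk of %d bytes\n", ALIGN(5));
    return 0;
}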
A Memory Manager's Algorithm

Various algorithms have been devised over the years for implementing efficient memory managers for general-purpose programming. We will present here a simple algorithm that is easy to follow, so you'll get a feel for how a memory manager works.
Linked List Of Memory Chunks
Our memory manager keeps a linked list of free memory chunks. Inside each chunk, the first few bytes are used for storing the chunk's size (including the space used by the memory manager to keep the chunk's size), and the address of the next chunk. This means that the size of a chunk must be at least large enough to store one number and one address (e.g. 8 bytes, on a 32-bit architecture). Here is how a single memory chunk in the list looks:
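A minimal C sketch of such a chunk, assuming a 32-bit machine (so each field occupies 4 bytes):

#include <stddef.h>

/* the smallest possible chunk is 8 bytes: 4 for the size, 4 for the
   'next' pointer. the user data of an allocated chunk starts right
   after the size field. */
struct chunk {
    size_t size;            /* total size of this chunk, header included */
    struct chunk* next;     /* following free chunk in the sorted list   */
};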
The memory manager is normally implemented as a library, which means it is linked into our process's executable binary. This allows the memory manager to work in the address space of our process, so it can indeed allocate memory that our process will be able to access. Also, because it works in virtual memory, the memory manager sees a large and consecutive address space. However, the linked list does not have to be consecutive, for reasons we'll soon see.
Allocating A Chunk
When the user asks to allocate a memory chunk of a given size, the memory manager scans the list until it finds the first memory chunk that is at least as large as what the user asked for. This chunk is then removed from the linked list, and the manager logically splits it into three sections. In the first (small and fixed-size) section, it writes the size of the allocated chunk. The second section is the chunk whose address will be returned to the user. The third section contains the part of the chunk that goes beyond what the user required - this third section is turned into a new (free) chunk, and is added back into the linked list, in place of the original chunk. Note that if the original chunk fit the exact size the user asked for, no third chunk will be left to be linked back into the list. Thus, assuming an unsigned long is used to represent a chunk's size, an allocation of X bytes actually consumes X bytes plus the size of memory needed to store an unsigned long number (on a 32-bit architecture, an unsigned long occupies 4 bytes of memory).

Let's see how the above list looks after a user asked to allocate a chunk of memory with a size of 62 bytes. Since the machine's word size is 4 bytes, the manager will actually allocate a memory chunk of 64 bytes. In fact, since the manager needs to prepend the chunk's size in front of the chunk, it will actually allocate a 68-byte block - 4 bytes to store the size, and 64 bytes to return to the user.
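Here is a hedged sketch of this first-fit scan-and-split logic, reusing the chunk structure shown earlier (the names my_malloc and free_list are illustrative, not a real allocator's API):

#include <stddef.h>

struct chunk { size_t size; struct chunk* next; };

static struct chunk* free_list;   /* sorted list of free chunks */

void* my_malloc(size_t wanted)
{
    /* round up to a word multiple and add room for the size field */
    size_t need = ((wanted + 3) & ~(size_t)3) + sizeof(size_t);
    struct chunk** pp;

    for (pp = &free_list; *pp != NULL; pp = &(*pp)->next) {
        struct chunk* c = *pp;
        if (c->size < need)
            continue;                   /* too small - keep scanning */
        if (c->size >= need + sizeof(struct chunk)) {
            /* split: the unused tail becomes a new free chunk... */
            struct chunk* rest = (struct chunk*)((char*)c + need);
            rest->size = c->size - need;
            rest->next = c->next;
            *pp = rest;
            c->size = need;
        } else {
            /* ...or hand out the whole chunk, if the tail is too small
               to hold a chunk header of its own. */
            *pp = c->next;
        }
        return (char*)c + sizeof(size_t);  /* user memory starts past the size */
    }
    return NULL;   /* no fit - a real manager would call brk()/sbrk() here */
}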
Freeing A Chunk
Freeing a chunk is done by reversing the above process, with a few optimizations. The user gives the memory manager the address of the chunk it had previously got from the manager. The manager looks at the 4 bytes right before this address for the size of the chunk. It then inserts the chunk back into its place in the linked list, checking whether it can merge it with consecutive chunks already found in the list. The reason for this merging is to avoid memory fragmentation - that is, to avoid turning one large memory chunk into several small ones after a set of allocation and free operations. This potential fragmentation would cause allocation requests for large chunks to be unable to use the freed chunks, if the manager didn't notice that those chunks were actually consecutively placed in memory. This chunk-merging operation also implies that the linked list is better stored in a sorted manner, so that it is easy to find the neighbours of a chunk in the list.

In our specific case, if the user asks to free the memory chunk at address 0x2018, the memory manager will look at the 4 bytes before this chunk, seeing that the chunk's size is 68 bytes. Then, a scan of the linked list will show that this chunk belongs after the 1st chunk in the list. Calculating the chunk's end address will show that it is adjacent to the 3rd chunk in the list, and so the manager will merge these 2 chunks into a single chunk, and store this chunk in the linked list. The "next chunk" pointer in the previous list item will of course be updated to point to this merged chunk, and the resulting list will be the same as before the allocation we've shown above. Note that in actual cases, the order of memory allocation and free operations is arbitrary, so a freed chunk might be merged with a chunk it wasn't part of initially. It might also merge with two chunks - one on each side.
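And a matching sketch of the free operation, continuing the my_malloc() sketch above - insert the chunk into the address-sorted list, and merge it with whichever neighbours are adjacent in memory:

#include <stddef.h>

/* same declarations as in the allocation sketch above */
struct chunk { size_t size; struct chunk* next; };
static struct chunk* free_list;

void my_free(void* ptr)
{
    struct chunk* c = (struct chunk*)((char*)ptr - sizeof(size_t));
    struct chunk* prev = NULL;
    struct chunk* cur = free_list;

    /* find the insertion point, keeping the list sorted by address */
    while (cur != NULL && cur < c) {
        prev = cur;
        cur = cur->next;
    }

    /* merge with the right-hand neighbour if it starts where we end */
    if (cur != NULL && (char*)c + c->size == (char*)cur) {
        c->size += cur->size;
        c->next = cur->next;
    } else {
        c->next = cur;
    }

    /* merge with the left-hand neighbour, or just link 'c' in */
    if (prev != NULL && (char*)prev + prev->size == (char*)c) {
        prev->size += c->size;
        prev->next = c->next;
    } else if (prev != NULL) {
        prev->next = c;
    } else {
        free_list = c;
    }
}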
Algorithm's Efficiency
The usage of linked lists is not considered efficient in general. However, for short lists, this is far more efficient than storing sorted binary trees or similar structures, due to the overhead of the balancing operations performed on those trees. When dealing with memory allocation, lists tend to be short. Initially, the list contains a single large memory chunk. The first allocation splits this chunk in two, returning the first part to the user, and leaving a single (smaller) chunk on the list. The same goes for the next allocations. When the user begins to free memory, chunks return to the pool, often causing some fragmentation. However, these same chunks are likely to be used again on the next allocation, reducing the size of the list again. Thus, for common patterns of memory allocation, a linked list turns out to be quite efficient, both in speed and in use of memory.
More Info On Memory Management Algorithms
The issue of memory management algorithms has been extensively researched over the years, and a lot of information about it can be found on the Web. For extensive coverage of memory management, look at the following 1995 postscript article, which provides a survey of dynamic storage allocation techniques, from the University of Texas, USA.
We have claimed that the memory manager begins with one chunk on its list of free memory chunks. This chunk has to be taken from some place - this is the role of the operating system's virtual memory manager. As you remember, the memory the process accesses is actually virtual memory, and most virtual memory sections are not mapped anywhere. This is done for efficiency - when a process's virtual address space has a size of several gigabytes, it would be wasteful to allocate a map to handle this amount of virtual memory, especially since most processes use a much smaller amount of memory. The way the operating system handles this situation is by not allocating any memory for the process, unless it asks for it. Of course, the program's binary and shared library sizes can be known in advance, but memory used to store data is of an arbitrary size. When the process requires more memory to store data pages, it uses the brk() system call (or sbrk()) to ask the system to enlarge its allocated address space. The system then adds entries to the translation map for this process, and the pages themselves will be mapped into physical memory when they are accessed.

Actually, this system call interface is hardly ever used by programmers directly. The memory manager of the runtime environment of our programming language is the one that uses this interface. Thus, whenever it runs out of free memory chunks on its list, it calls the brk() system call to allocate more memory pages, and adds the new block to its chunk list. The amount of memory it asks the system to allocate can vary. Asking for sizes that are too small will cause many calls to brk(), which is time consuming. Allocating a block that is too large will take the system time, to update the virtual translation map for the process.

Note: there is usually no system call that can be used to return unused memory back to the operating system. When you free memory, the memory manager cannot return this memory to the operating system. If this freed memory is never used again, it will eventually be paged out to disk - but it won't be freed until the process exits. Thus, the amount of virtual memory used by a process always increases - never decreases. That is, unless...:

Note 2: the above rule has an exception - shared memory, and mmap()-ed memory. Shared memory is counted as part of the memory space used by the process. Thus, when the process attaches to a shared memory chunk, its 'accounted' virtual address space size is increased by the amount of space in the shared memory segment. When the process releases a shared memory chunk, the process's 'accounted' virtual address space size is decreased. The same goes for memory mapped using the mmap() system call.
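Moving the program break can be watched directly with sbrk(); a hedged sketch (sbrk() is considered legacy - on glibc it may require a feature-test macro such as _DEFAULT_SOURCE, and modern managers often use mmap() instead - but it illustrates the mechanism):

#define _DEFAULT_SOURCE   /* expose sbrk() on glibc */
#include <stdio.h>
#include <unistd.h>

int main()
{
    void* before = sbrk(0);        /* sbrk(0) returns the current break */
    if (sbrk(4096) == (void*)-1) { /* grow the data segment by one page */
        perror("sbrk");
        return 1;
    }
    void* after = sbrk(0);

    printf("program break moved from %p to %p\n", before, after);
    return 0;
}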
Specialized Memory Managers

Since general memory managers are... well - general, it makes sense to implement specialized memory managers in cases where the general memory manager causes too much overhead (in CPU usage, or in memory used due to fragmentation).
A good example is when a program needs to allocate many (hundreds of thousands, or millions of) memory chunks, all of the same size. This can be done by allocating a large block (whose size is a multiple of the chunk's size) and slicing it into a set of such chunks. This allows both for faster allocation, and for less memory usage overhead - since we don't need to keep the size of each chunk when we allocate it - they are all of the same size, after all.
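A hedged sketch of such a fixed-size pool - carve one big malloc()-ed block into equal chunks, and keep the free ones on a singly-linked list threaded through the chunks themselves, so no per-chunk size header is needed (pool_init, pool_alloc and pool_free are illustrative names):

#include <stdlib.h>

#define CHUNK_SIZE  32       /* must be able to hold at least one pointer */
#define POOL_CHUNKS 100000

static void* pool_head;      /* head of the free-chunk list */

int pool_init(void)
{
    char* block = malloc((size_t)CHUNK_SIZE * POOL_CHUNKS);
    long i;

    if (block == NULL)
        return -1;
    /* slice the block into chunks and push each onto the free list */
    for (i = 0; i < POOL_CHUNKS; i++) {
        void** chunk = (void**)(block + i * CHUNK_SIZE);
        *chunk = pool_head;
        pool_head = chunk;
    }
    return 0;
}

void* pool_alloc(void)       /* O(1): pop the head of the list */
{
    void** chunk = pool_head;
    if (chunk == NULL)
        return NULL;
    pool_head = *chunk;
    return chunk;
}

void pool_free(void* p)      /* O(1): push the chunk back onto the list */
{
    *(void**)p = pool_head;
    pool_head = p;
}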
Memory Manager's Interaction With Program's Code

As we have seen, the memory manager keeps its own data structures in the address space of the process for which it works. This means that if the process overwrites memory out of bounds (e.g. referencing a location right before an array it has allocated using the manager), the data structures of the memory manager itself may be corrupted. This would explain cases where a program we write crashes, and the stack shows it was in the middle of a call to one of the memory manager's functions (e.g. malloc() or free() in C, new or delete in C++). Note that these kinds of bugs are not always easy to track - the memory corruption might have occurred in a completely different part of the code, and just looking at the functions on the stack and the code they execute often won't help in locating the cause of the problem. The cause of such a problem could be that our program had overwritten a location that stored the size of a chunk, so that when the chunk was freed, the memory manager reclaimed a larger chunk than had originally been allocated. The manager later allocated parts of this chunk for other purposes, and thus we get the same chunk of memory used by different parts of our code for different purposes. Or a bug in our code might corrupt the list of free memory chunks, causing one of its pointers to contain an invalid value (e.g. pointing outside the process's address space).
A C program is laid out in memory in several separate segments - a code segment, a stack segment and a data segment. These segments are laid out inside the virtual address space of the process. The code segment holds the executable code of the program (both the code coming from the program's binary itself, and that of any shared libraries it loads). All memory pages of the code segment are marked as read-only, and are shared with any other process using the same program file and/or shared library files.

Note: not all pages are shared in this way, due to 'relocation' issues. These are cases where one part of the program needs to know the address of another part, but the exact address will only be known at run-time. This address might be different for different processes (since they load a different set of shared libraries, for example), and thus cannot be written in a shared page. The dynamic loader (which is responsible for handling such situations, among other things) uses "indirect jumps" - it stores the non-shared pages that contain these varying addresses, and the program's code is compiled in a way that makes it read the addresses from these non-shared pages. If this were not done, complete pages of executable code would need to be copied, just to change one or two locations holding such addresses. For a clearer and more complete explanation of relocations and how exactly they are handled, please refer to Ulrich Drepper's home page, and in particular to his How to write shared libraries paper.

The stack segment is used to hold the contents of the stack of the process. The stack size is defined at process startup, and cannot be increased while the process runs. For a multithreaded process, several stacks will exist in the stack segment. The data segment occupies the space from the stack segment to the last address allocated by the process using the brk() system call. This segment can grow as the process runs and needs more virtual memory.
The Stack And Local Variables

The stack is used by the process to store the chain of functions which are currently in the middle of execution. Each function call causes the process to add an execution frame to the top of the stack. This execution frame contains the contents of the CPU registers as they were before the function call was made, the return address (i.e. where in the code this function was invoked, so we can return there when the function returns), the parameters passed to the function, and all local variables of the function. Thus, when we have function 'main' calling function 'a', which calls function 'b', we will have 3 frames on the stack. The 'allocation' of a stack frame does not require performing real memory allocation - the space for the stack was reserved during process startup. Instead, this is done by merely marking a part of the stack as containing the frame, and copying data into that frame. Note that local variables are "allocated" by simply advancing the stack pointer beyond their location on the stack, so allocating local variables takes a fixed amount of time, no matter their size. When a function returns, its stack frame is freed by simply modifying the stack pointer to point below its frame.

The local variables are not initialized - they just contain whatever values were previously placed in the memory locations these variables occupy. Consider a situation in which a function was called, and its variables were used and given some values. Later the function returned, and its frame was released. Then, another function was called. The stack frame of this new function is located in the same place in memory as the frame of the former function, so the new function's local variables will get the values that were left there by the local variables of the former function. This explains why the values of uninitialized local variables are often neither 0, nor look like total garbage. Since all local variables are stored in consecutive memory on the stack, if we take a pointer to such a variable, and later write right above or right below this pointer, we'll be modifying the contents of another local variable, or even the function's return address, if we're "lucky". For example, consider the following code:
#include <stdio.h>

void foo()
{
    int numbers[2];
    int j;

    j = 2;
    printf("j - %d\n", j);
    numbers[2] = 3;   /* oops - this writes just beyond the 'numbers' array */
    printf("j - %d\n", j);
}
During the execution of this function, we first assign 2 to 'j', and thus the first print command will show "j - 2". Then, we try to assign a value to 'numbers[2]'. However, the 'numbers' array only has 2 cells - 0 and 1. Writing into subscript '2' of this array will cause us to write just beyond the array (which is fully located on the stack). The variable 'j' just happens to be stored in that location in memory, and thus the value '3' will actually be assigned to 'j'. Our second print command will thus show "j - 3". Note that this assumes the variables are stored in memory in the same order as they were declared inside the function's code. With some compilers this might not be the case, and the out-of-range assignment might overwrite a different variable, or a part of the stack that does not hold variables, leading to other unexpected results.

Note: local variables (as well as function parameters) might be stored in registers, rather than on the stack. This could be either because we used the 'register' keyword when declaring these variables, or because the compiler's optimizer chose to place a variable in a register. Of course, such variables cannot be overwritten by stack overflows.
So what really is a pointer?

As we are taught in any C programming course, a pointer is an entity that holds the address of another entity - a variable, or some memory address. Normally, we can treat a pointer as a variable pointing to another variable. This is important, since it implies that the pointer, by itself, is a piece of data stored in memory. The only thing differentiating a pointer from a simple number is how we treat the pointer's value. Thus, a pointer is a memory cell (or a register) which contains a number that we happen to interpret as the address of another memory cell. Let's look at a small function that uses pointers, and see how its memory is laid out:
void hello_pointer()
{
    char data[8] = "abcdefgh";
    char* p = data;
    int i;

    /* 1 */
    for (i = 0; i < 9; i++, p++) {
        *p = 'a';
    }
    /* 2 */
    *p = 'b';
}
Here is the layout of the stack during this function call, at the location marked with '/* 1 */':
As you can see, 'data' occupies 8 characters (memory cells). 'p' occupies 4 memory cells (assuming a pointer is 4 bytes long - which is true for 32-bit architectures, such as the Intel/AMD 80386 and compatible CPUs). We ignore 'i' for now. As you can see, 'p' contains the address of 'data[0]' (that is, of the first byte of the array 'data') before the loop runs. While we execute the loop above, we modify the contents of the memory cell that 'p' points to, to contain the ascii code of 'a', and then advance 'p' to point to the next byte. This is all nice and dandy up to the 8th iteration, in which 'i' equals '7', and 'p' contains the address of 'data[7]':
Then things begin to get ugly. On the next iteration, 'p' is advanced again, and points to 'data[8]'. But wait - the array only contains 8 bytes, which are indexed 0 through 7. Pointing to 'data[8]' means pointing one byte after the end of the 'data' array. And what do we find there? That's right - the first byte of 'p' itself. Thus, after the next iteration, we will have the following contents on the stack:
As you can see, 'p' actually pointed to the location in memory in which its own contents reside, and thus, by writing into '*p', we modified 'p' itself to contain some random address (0x61, in hex, is the ascii code of 'a'). Read this line again please, to be sure you understood it. When we try to execute the command after the loop (*p = 'b'), we will smear some random location in memory (possibly even unmapped memory), leading to various random effects - either smearing another memory cell, or, in case the address now in 'p' points to an unmapped memory page, causing the program to crash with a SEGV signal. Now, what would have happened if the source code of the previous example looked like this:
void hello_pointer()
{
    char data[10] = "abcdefghij";
    char* p = data;
    int i;

    /* 1 */
    for (i = 0; i < 11; i++, p++) {
        *p = 'a';
    }
    /* 2 */
    *p = 'b';
}
As you can see, the 'data' array here occupies 10 bytes, and since 'p' has to be aligned to a 4-byte boundary, there are 2 unused bytes between 'data[9]' and 'p' (the contents of these bytes is, of course, undefined, since there is no need to initialize them). This time, our 'off-by-1' error in the loop will cause us to overwrite the contents of an otherwise unused memory cell. Thus, we will not even notice this kind of memory smear, until at some point in the future we modify the size of the 'data' array, or add a 'short' (2 bytes) variable between 'data' and 'p'.

Pointers And Assembly Language

Note: this section will be readable if you know assembly. If not - skip to the heap management section. As you know, assembly (and of course machine code) uses several addressing modes to access the operands of commands. Direct addressing assumes that an operand is found in a register. Indirect addressing assumes that the register contains the address of a memory cell (or memory word) that contains the data. As you can see, indirect addressing treats the register as a pointer. Memory-based indirect addressing assumes that the command contains the address of a memory cell, whose contents is the address of another memory cell, which contains the operand. This kind of addressing can be used to handle pointers - after all, pointers are just variables, and variables are just addresses of memory cells.
To illustrate this point, let's see a small fragment of C source code, and how it translates into assembly code. We will compile this code using 'gcc -S <filename>' to get the assembly code. First, here is the C code:
int main()
{
    char c;
    char* p = &c;

    *p = 'b';
    return 0;
}
And here is the resulting assembly code (as generated on linux, in the AT&T assembly format). Just a few notes about this assembly variant:
- In 2-operand commands, the first operand is the 'source', the second is the 'target'.
- Register names are prefixed with '%', like '%eax' and '%ebp'.
- Indirect addressing is denoted by surrounding a register with parentheses, as in '(%eax)'. To add a displacement (an offset), it is prefixed with a number, as in '4(%eax)'.
- Numeric arguments are prefixed with '$', like '$98' to denote the numeric value '98'.
- Register '%esp' is the stack pointer.
Regarding the stack, the variables on it are stored 'backwards', and the stack grows into lower memory. This means that the variables are stored in the following manner (note that '%esp' marks where the stack pointer points _before_ allocating the variables on the stack):

  bottom of stack             padding            %esp
        |                    ____|____             |
        v                   v    v    v            v
  ---------------------------------------------------
  ...  | p | p | p | p |    |    |    | c |            -> towards higher memory
  ---------------------------------------------------
         ^
         |
     -8(%esp)
	.version "01.01"
gcc2_compiled.:
.text
	.align 4
.globl main
	.type main,@function
# next comes the 'main' function.
main:
	pushl %ebp           # the following lines initialize register '%ebp'
	movl %esp,%ebp       # to point to the top of the stack frame, where
	                     # variables 'c' and 'p' are stored.
	subl $8,%esp         # then we update '%esp' (the stack pointer) to
	                     # the bottom of the stack frame.
	leal -1(%ebp),%edx   # we want 'p' to contain the address of 'c'.
	                     # since 'c' is a single char, we need to take an
	                     # offset of '1' from the top of the stack frame.
	                     # we store this address in register '%edx'.
	                     # leal = Load Effective Address (Long).
	movl %edx,-8(%ebp)   # then we move that address into the location
	                     # on the stack where 'p' is stored.
	movl -8(%ebp),%eax   # 'p' is stored on the stack, at location '%ebp'
	                     # minus 8 bytes. we use indirect addressing, in
	                     # order to fetch the contents of the memory word
	                     # ('l') whose address is the contents of '%ebp',
	                     # minus 8. so now we have the contents of 'p'
	                     # (which is the address of 'c') in register '%eax'.
	movb $98,(%eax)      # and then we copy the value '98' (the ascii code
	                     # of 'b') into where '%eax' points.
	xorl %eax,%eax       # that's it. from here on, it's the code used to
	jmp .L1              # return '0' from the function.
	.p2align 4,,7
.L1:
	leave
	ret
	# and some more assembly mumbo-jumbo.
.Lfe1:
	.size main,.Lfe1-main
	.ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
If you don't get it all at first glance - stare at the code a while longer. It will become clear with enough staring ;)

Heap Management With malloc() and free()
When programming in C, we normally use the standard memory allocator that comes with the C library, to allocate memory from the heap (the data segment). This memory allocator defines functions such as malloc() and free(), for memory allocation handling.
Usage Of malloc()
When we want to allocate a chunk of memory from the heap, we use the malloc() function. This function accepts a numeric parameter denoting the size of the memory chunk we want, and returns a pointer to a memory chunk with the following properties:
- The size of the returned chunk is usually enlarged to fit some 'rounded' size for which the memory manager is optimized. For instance, if we allocate an array of 10 chars (10 bytes), the manager will most likely pad the chunk so it contains a whole number of words, thus returning a pointer to a 12-byte chunk (three 4-byte words). We should NOT rely on this behaviour, and not try to use more bytes than we asked for.
- The starting address of the returned chunk is aligned to the size of the machine's "native" data, or rather to the size of the largest 'native' C language object (usually a 'long long' or a 'double'). This is required because the malloc() function does not know what we wish to do with this memory chunk (it returns a char pointer), so it must return memory that may be used to store any type of object.
The memory chunk returned by malloc() is assigned to a pointer variable, and cast to the type of object this pointer points to. Since this is just a general memory chunk, we may later cast it into other types without a problem - as long as the chunk is large enough to hold an object of that type. Here is an example:
char* p_c;
int* p_i;

p_c = malloc(8);
strcpy(p_c, "bla");
printf("%s\n", p_c);

p_i = (int*)p_c;
(*p_i) = 4;
printf("%d\n", *p_i);

printf("p_i is '%p', p_c is '%p'\n", p_i, p_c);
We allocated memory for use as a char pointer (a C "string"), and used it like that. We then cast this memory into an int pointer, assigned a value to it, and printed its contents as if it was an integer. Finally, with the last printf() call, we printed out the address that p_i and p_c contain, to show that both point to the same memory chunk (note - the use of '%p' in the format of printf() tells printf() to print the address that the variable contains, rather than the value stored in that address).
Usage Of free()
When we want to free a memory chunk previously allocated by malloc(), we use the free() function. This function accepts a pointer to a previously allocated memory chunk, and frees it - that is, adds it to the list of free memory chunks that may be re-allocated. Several notes about free():
- The size of the chunk was stored by malloc() in its memory map, and that is how free() knows how many bytes to free.
- The freed memory is not cleared or erased in any manner. This is why accessing memory that was just freed often does not cause a crash - any data in it is still the same as before free() was called.
- The free() function cannot nullify pointers to the given memory chunk that might still exist in our program. After we call free(), it is up to us (the programmers) not to try to dereference pointers that still point to that memory chunk. Such pointers are known as 'dangling pointers' - they point to memory that was already freed, and thus should NOT be dereferenced again, unless they are assigned the address of a different (not-freed) memory chunk.
As you can see, free() only marks the memory chunk as free - there is no enforcement of this freeing operation. Accessing memory chunks that were previously freed is the cause of many memory errors, for novices and experienced programmers alike. A good practice is to always nullify a pointer that was just freed, as in:
char* p = malloc(10);
strcpy(p, "hello");
printf("%s\n", p);
free(p);
p = NULL;
This setting of the pointer to NULL wastes a little time when the program runs, but can save a lot of time when debugging the code. In most non-time-critical applications, the time wasted during runtime is negligible relative to the time saved during application development and debugging.
Freeing The Same Chunk Twice
One might ask - what if I accidentally free a memory chunk twice? This kind of bug could occur, for example, if we free a data pointer but don't nullify it, and during cleanup later we see it's not NULL, and try to free it again. To understand what this kind of error might cause, we should remember how the memory manager normally works. Often, it stores the size of the allocated chunk right before the chunk itself in memory. If we freed the memory, this memory chunk might have been allocated again by another malloc() request, and thus this double-free will actually free the wrong memory chunk - causing us to have a dangling pointer somewhere else in our application. Such bugs tend to show themselves much later than the place in the code where they occurred. Sometimes we don't see them at all, but they still lurk around, waiting for an opportunity to rear their ugly heads.

Another problem that might occur is that the double-free is done after the freed chunk was merged together with neighbouring free chunks to form a larger free chunk, and the larger chunk was then re-allocated. In such a case, when we try to free() our chunk for the 2nd time, we'll actually free only part of the memory chunk that the application is currently using. This will cause even more unexpected problems.
Heap Management with new and delete

Memory management in C++ is done using the new and delete operators, and their big cousins, new[] and delete[].
Usage Of new
In order to allocate memory in a C++ program, we use the new operator. This operator has two modes of operation: one when allocating memory for a native type (char, int, float...), and the other when allocating memory for an object instance. When allocating memory for a native type, new allocates memory for the object and returns a pointer to it. Because new is an operator, not a function, it returns a pointer of the proper type, rather than a pointer to char, as the malloc() C function does. In addition, new may be instructed to initialize the allocated memory with a given value, suitable for the allocated type. Here is an example of using new with native types:
// allocate an 'int', and store its address in 'p_i'.
int* p_i = new int;

// allocate an 'int' initialized to '3', and store its address in 'p_i2'.
int* p_i2 = new int(3);

As you can see, new may accept either just a type, or a type and a value. If it accepts just a type, it allocates the memory for that type and returns a pointer to it. If it accepts both a type and a value, it allocates the memory for the type, initializes it to contain the given value, and returns a pointer to the allocated memory chunk. When new is instructed to allocate memory for an object instance, it allocates the memory, and then invokes one of the constructors of the object's class to initialize the instance. The exact constructor chosen is the one matching the parameters passed to new. Let's see an example:
// let's define a class with two constructors.
class A {
public:
    A()        { i = 0; }
    A(int num) { i = num; }
    ~A() {}
private:
    int i;
};

// and now let's initialize a few instances:

// create an instance on the stack. this does NOT invoke new, but
// does invoke the default constructor - A().
A a1;

// create an instance on the heap. This invokes new, which further
// invokes the default constructor - A().
A* p_a2 = new A;

// create another instance on the heap. This invokes new, which further
// invokes the constructor that gets one int parameter - A(int num).
A* p_a3 = new A(5);
As you can see, there is a strong separation between the allocation of memory and the invocation of the object's constructor. When allocating an instance on the stack (a local variable), we don't need to use new, since the memory of the stack is already allocated - we only need to call the object's constructor. When we allocate an instance on the heap, we call new to do that, and then the matching constructor is invoked as well.
Usage Of delete
When we want to free memory (and object instances) that were allocated using the new operator, we use the delete operator. This operator has two modes of operation, similar to the new operator. For native types, delete just frees the memory previously allocated by new. For object instances, it first invokes the destructor of the object, and then frees the memory itself. Let's see a few examples:
// let's first allocate an integer...
int* p_i = new int;

// ...and then immediately free it. Note that the type of the pointer
// could determine the size of memory to free. However, the amount of
// memory was more likely stored when the object was allocated, as was
// the case when allocating memory with malloc().
delete p_i;

// any attempt to access '*p_i' now is a dereference of a dangling
// pointer. let's hope no one does that ;)

// now, let's define a class A. This class contains a pointer to an int,
// which is allocated in the constructor, and freed in the destructor.
class A {
public:
    A()  { p_i = new int(0); }
    ~A() { delete p_i; }
private:
    int* p_i;
};

// let's allocate an object of class A.
A* p_a = new A;

// now let's free this object. The destructor is first invoked, which
// will free the 'p_i' member, and only then the memory of the object
// itself is freed.
delete p_a;
Note: when delete is invoked for an object, the compiler needs to know which destructor to invoke (if that object is of a class that inherits from another class, for example). Choosing the destructor is done using the same method used for choosing which function to invoke on an object, when dealing with virtual vs. non-virtual functions. For example, let's suppose that we have two classes, Parent and Child, where Child inherits from Parent. Let's see two code chunks that allocate a Child instance, and then free it.
// Define class Parent.
class Parent {
public:
    Parent()  { p_parent = new int(0); }
    ~Parent() { delete p_parent; }
private:
    int* p_parent;
};

// Define class Child.
class Child : public Parent {
public:
    Child()  { p_child = new int(1); }
    ~Child() { delete p_child; }
private:
    int* p_child;
};

// Let's allocate an object of class Child.
Child* p_obj1 = new Child;

// Let's delete the object now. Since the pointer is of class 'Child*',
// the destructor invoked is '~Child()' - which is what we expect to
// occur. Of course, after '~Child()' is invoked, the parent class's
// destructor, '~Parent()', is also invoked - the compiler handles the
// code to do that. Thus, both the int pointed to by 'p_child' and the
// int pointed to by 'p_parent' will be freed, in this order.
delete p_obj1;

// Now, let's allocate another object of class Child. Due to polymorphism
// rules, we may store its pointer in a 'Parent*' pointer:
Parent* p_obj2 = new Child;

// Let's delete the object now. Because we free it via a pointer of
// type 'Parent*', the compiler cannot know that the object is in fact
// a 'Child' object, and thus will invoke the '~Parent()' destructor,
// instead of invoking the '~Child()' destructor. The '~Child()'
// destructor won't be invoked, and thus the int pointed to by 'p_child'
// will not be deleted - this memory will leak.
delete p_obj2;

Let's now see how to fix the Parent class definition, to avoid the memory leak problem:
// Define class Parent, this time with a virtual destructor.
class Parent {
public:
    Parent()  { p_parent = new int(0); }
    virtual ~Parent() { delete p_parent; }
private:
    int* p_parent;
};

// Define class Child. Its destructor overrides Parent's virtual
// destructor, so it may itself be written as 'virtual ~Child() {...}'.
class Child : public Parent {
public:
    Child()  { p_child = new int(1); }
    virtual ~Child() { delete p_child; }
private:
    int* p_child;
};

// Let's allocate an object of class Child.
Child* p_obj1 = new Child;

// Let's delete the object now. Since the pointer is of class 'Child*',
// the destructor invoked is '~Child()', followed by '~Parent()', just
// as before.
delete p_obj1;

// Now, let's allocate another object of class Child, and store its
// pointer in a 'Parent*' pointer:
Parent* p_obj2 = new Child;

// Let's delete the object now. We free the object via a pointer of
// type 'Parent*', but the destructor of this class is declared virtual,
// so the runtime environment identifies the actual run-time type of the
// object, which is 'Child', and invokes the destructor of class 'Child'.
// Thus, both destructors - '~Child()' and '~Parent()' - will be invoked,
// avoiding the memory leak we saw previously.
delete p_obj2;
As you can see, the matter of virtual destructors can be quite confusing, and yet it pertains a lot to memory leaks (and a host of other problems). The compiler will most likely not warn us about this kind of bug - we'll need to find it ourselves. Thus, it is good practice to always define the destructor as virtual, and avoid this problem altogether. This does slow down the invocation of the destructor a little, but that is relevant only in extremely time-critical applications.
new Vs. new[]
When we wish to allocate an array of objects, we need to use the new[] operator, rather than the new operator. The new[] operator allocates an array of elements of a given type, invoking the default constructor for each of them (or performing no construction, if the array is of native data types, such as int or float). Here are a few examples of using new[]:
// allocate an array of 5 integers, indexed 0-4.
// store its address in 'p_a'.
int* p_a = new int[5];

// if we want to initialize the array's data, we could do something like:
for (int i = 0; i < 5; i++) {
    p_a[i] = 0;
}

// let's create an array of objects of class A. the default constructor,
// A::A(), will be invoked for each object in this array.
A* a_vector = new A[10];

delete Vs. delete[]
When we want to delete the contents of an array allocated with new[], we need to use the delete[] operator. Here is how it's done with the previously allocated arrays:
// delete the array of integers.
delete[] p_a;

// delete the array of objects. this will first invoke the destructor
// for each element in the array, and only then de-allocate the memory
// occupied by the entire array.
delete[] a_vector;
Note that the compiler (or runtime library) cannot know by itself whether to use the delete or the delete[] operator - we need to tell it that. This is true, because an 'int*' could either point to a single integer, allocated using 'new int', or an array of integers, allocated using 'new int[7]' (for instance). Imagine that we have used delete instead of delete[]:
// allocate an array of integers.
int* p_i = new int[5];

// delete the array. Oops - we used delete instead of delete[].
// with a proper memory manager, the underlying memory manager's
// data structures will know the size of the allocated array,
// and thus free the whole array, not just a single int.
// however, there is no guarantee that this will work with
// a different memory manager.
delete p_i;

// now let's allocate an array of objects:
A* a_vector = new A[10];

// let's delete the array. Oops again - we used delete instead of
// delete[]. this time, however, the problem is more apparent - only
// the destructor for the first element of the array is invoked. the
// destructors for the rest of the objects in the array are not invoked.
// If those objects contain pointers to memory they need to de-allocate,
// this memory will now leak and be lost.
delete a_vector;
A similar problem (but of a different nature) can occur if we allocate an object using new, and later de-allocate it using delete[]. This time, instead of a memory leak, we may get memory corruption, since the runtime library might try to invoke a destructor for an entire array, while we only allocated a single element. This won't necessarily happen in practice: the memory manager may know that the block of memory to be deleted has a size suitable for a single object, and thus destruct only a single object. However, counting on that is poor practice. For example:
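Here is a hedged sketch of this reverse mismatch (using class A as in the earlier examples):

// allocate a single object with new...
A* p_single = new A;

// ...but de-allocate it with delete[]. Oops - the runtime library may
// look for an element count that was never stored for this block; the
// behaviour is undefined, and on some implementations this corrupts
// memory.
delete[] p_single;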
Inter-mixing C++ And C Memory Managers
The C++ language is actually an extension of the C language, and programs written in C++ are often linked with libraries written in C. This implies that the implementations of the memory managers for C and C++ must be able to work together. On various systems, this is done by making new and delete actually use malloc() and free() for memory management: when we invoke new, it uses malloc() to allocate the memory, and then invokes the constructor for the allocated object. Because of this, accidental intermixing of new with free(), or of malloc() with delete, might go unnoticed, especially when performed on native data types. However, different implementations of the C++ memory manager might not work this way. Thus, allocating memory with one memory manager, and freeing it with the other, could cause memory corruption. One may simply decide to use only the C++ memory manager when writing C++ programs. However, this solution does not hold up if we use C libraries that were compiled with a C compiler and use the C memory manager. For example, if we use the standard C library's strdup() function to copy a C language string (a char pointer, actually), we need to be careful to free the copy with free(), not with delete.
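A minimal sketch of the strdup() caveat just mentioned (assuming a POSIX system, where strdup() is available):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    // strdup() allocates its copy with the C memory manager (malloc()).
    char* copy = strdup("hello");
    if (copy == NULL)
        return 1;
    printf("%s\n", copy);

    // correct: the copy goes back to free(), the C de-allocator.
    free(copy);

    // wrong (do NOT do this): 'delete[] copy;' would hand a malloc()ed
    // block to the C++ memory manager, which may corrupt the heap on
    // implementations where the two managers differ.
    return 0;
}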
The Odds Of References

A reference (in C++) is a variable that contains the address of another variable, but is not actually treated like a pointer. A reference may be set to refer to some variable only when it is defined (either as a local variable, or as a parameter of a function) - and from then on, accessing the reference actually accesses the referenced variable. For example:
// let's define an int pointer, and initialize it.
int* num = new int(5);

// let's define a reference to the int that the pointer points to.
int& num_ref = *num;

// let's change the contents of the number that num_ref references.
num_ref = 7;

// the following command will now print '7'.
cout << *num << endl;
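As a small illustration of the syntax point discussed next (the struct and function names here are ours, for illustration only):

// member access through a pointer uses '->'; through a reference, '.'.
struct Point { int x; int y; };

void via_pointer(Point* p)   { p->x = 1; }
void via_reference(Point& p) { p.x = 1; }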
If a reference behaves like a pointer, except for the syntax, why is it used? In most situations, a reference is used to make the syntax simpler: when we have an object, instead of accessing its members or methods with '->', we access them with '.'. Another reason for using references is that they can't become 'invalid' like pointers - once a reference references a given object, it cannot be made to reference another object, or to reference NULL. But is this really the case? Let's look at the following piece of code:
// this function accepts a reference to an integer, and modifies its value.
void f(int& num)
{
    num = 3;
}

// let's have a number, and pass it to the above function:
int num1 = 1;

// the function modifies the contents of 'num1' to '3',
// because it is passed by reference, not by value.
f(num1);

// let's have another number, this time a pointer.
int* p_num2;

// oops - we accidentally set this pointer to NULL.
p_num2 = NULL;

// let's modify the number p_num2 points to.
// oops....
f(*p_num2);
Once the last line of code in the above example is executed, the function holds a parameter that is a reference made from a NULL pointer. The code inside the function can NOT check that it received a proper value, since a reference cannot be tested as an address - only the caller could perform this validity test. This kind of coding defeats safe programming practices (namely, testing pointers for NULL before dereferencing them). One may wonder why the program crashes only inside the function, not at the call to the function, where '*p_num2' clearly appears to access a NULL pointer. The reason is that just writing '*p_num2' here does not really dereference the pointer - it does not access the contents of what p_num2 points to. Looking at the assembly code reveals that this line merely loads the address stored in the pointer variable (NULL) into a register, in order to pass it to the function. The only code that actually accesses the referenced variable is found inside the function f(). In a real program, the assignment of NULL into a pointer that is later passed as a reference won't occur just before the function call (obviously), but rather much earlier, in a different part of the code.
The moral here is that one should use references only where they clearly cannot be formed from a NULL (or otherwise invalid) pointer at runtime. If they might be, use a pointer instead, and check its validity before dereferencing.
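As a hedged sketch of that advice (f_checked is an illustrative name), the function from the example above could take a pointer and perform the test itself:

// a pointer-based variant of f(): the callee can now detect an
// invalid (NULL) argument by itself, instead of crashing.
void f_checked(int* num)
{
    if (num == NULL)
        return;     // refuse to dereference an invalid pointer
    *num = 3;
}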
Free Libraries For Locating Memory Leaks

There are many free memory-leak locating libraries. They all work by replacing the malloc family of functions (malloc, free, calloc and friends) with functions that perform the allocation while keeping some bookkeeping. In order to detect memory leaks, you need to store a list of all memory chunks that were allocated, delete from this list any memory chunk that is freed, and, when the program exits, look at all the blocks still found on the list - they were never freed, and thus represent memory leaks. Of course, you need to do more than this - such as store the stack trace during each memory chunk's allocation, so the programmer will be able to find where in the code the memory was allocated. Note that not all leaks are 'important' to deal with. For instance, if you allocate a chunk of memory during program startup, and allocate it only once, then freeing it isn't strictly necessary. However, if you don't free it, your malloc tracking log will be cluttered with messages about such '1-time leaks', and you'll find it hard to identify the real leaks. Also, if you ever change your 'main' into a function that gets invoked several times (e.g. when turning a program into a library, for re-use), that '1-time leak' will turn into a real memory leak. 'dmalloc' is an example of a library that identifies memory leaks. It keeps a full log of memory allocations, and prints information out when the program exits. If your program never exits (e.g. a daemon), modify it to run a given amount of iterations and then exit, just for memory leak detection. 'dmalloc' may be found at http://dmalloc.com.
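To make the bookkeeping idea concrete, here is a minimal sketch of such a wrapper. The names xmalloc/xfree/report_leaks and the list layout are illustrative only; a real library such as dmalloc also records stack traces, wraps calloc()/realloc(), and much more:

#include <stdio.h>
#include <stdlib.h>

/* one record per live allocation */
struct alloc_rec {
    void *ptr;
    size_t size;
    struct alloc_rec *next;
};

static struct alloc_rec *alloc_list = NULL;

void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        struct alloc_rec *rec = (struct alloc_rec *)malloc(sizeof(*rec));
        if (rec != NULL) {
            rec->ptr = p;
            rec->size = size;
            rec->next = alloc_list;    /* push onto the list */
            alloc_list = rec;
        }
    }
    return p;
}

void xfree(void *p)
{
    struct alloc_rec **link = &alloc_list;
    while (*link != NULL && (*link)->ptr != p)
        link = &(*link)->next;
    if (*link != NULL) {               /* unlink the matching record */
        struct alloc_rec *rec = *link;
        *link = rec->next;
        free(rec);
    }
    free(p);
}

void report_leaks(void)                /* register with atexit() */
{
    struct alloc_rec *rec;
    for (rec = alloc_list; rec != NULL; rec = rec->next)
        fprintf(stderr, "leak: %lu bytes at %p\n",
                (unsigned long)rec->size, rec->ptr);
}

In a real program, one would register report_leaks() with atexit() at startup, and route allocations through the wrappers (via macros, or by interposing on the real malloc/free at link time).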
Free Libraries For Identifying Memory Corrupting Code

Memory corrupting code comes in various forms:

Reading the contents of uninitialized memory;
Writing past the end of an allocated memory chunk, or a little before it;
Writing into (or reading from) a freed memory chunk, which might have been re-allocated since.

All sorts of odd behaviour can result from these types of bugs - you cannot even predict what such bugs will look like. But you can look for them using various libraries. Techniques for identifying such problems include placing markings before and after allocated memory chunks, and then verifying that they were not modified by the code; or initializing all memory chunks with a given pattern of data, and then making sure that when they are first read, their contents do not equal this pattern - if they do, it means the memory was read before being initialized. Sometimes, marking memory pages read-only or no-access (using operating-system specific interfaces) can help identify invalid accesses. Electric Fence is a library that uses read-only page markings in this manner. It replaces malloc/free, and allocates each chunk of memory right before a read-only memory page. Any attempt to write into this page (which means writing past the end of an allocated memory chunk) will cause a SIGSEGV, and the programmer will be able to identify the bug in this manner. The library uses a few other techniques similar to this one. It may be downloaded from ftp://ftp.perens.com/pub/ElectricFence/. A similar library is GNU Checker. Find it at http://www.gnu.org/software/checker/checker.html.
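As a minimal sketch of the 'markings around chunks' technique (the guarded_malloc/guarded_free names and the magic value are illustrative; real libraries do considerably more):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD 0xDEADBEEFUL

/* chunk layout: [size][front guard][user data][rear guard] */

void *guarded_malloc(size_t size)
{
    unsigned long g = GUARD;
    unsigned char *raw = (unsigned char *)malloc(
        sizeof(size) + sizeof(g) + size + sizeof(g));
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size));                             /* remember size */
    memcpy(raw + sizeof(size), &g, sizeof(g));                    /* front guard   */
    memcpy(raw + sizeof(size) + sizeof(g) + size, &g, sizeof(g)); /* rear guard    */
    return raw + sizeof(size) + sizeof(g);                        /* user pointer  */
}

void guarded_free(void *p)
{
    unsigned char *raw = (unsigned char *)p - sizeof(unsigned long) - sizeof(size_t);
    size_t size;
    unsigned long front, rear;

    memcpy(&size, raw, sizeof(size));
    memcpy(&front, raw + sizeof(size), sizeof(front));
    memcpy(&rear, (unsigned char *)p + size, sizeof(rear));
    if (front != GUARD || rear != GUARD)
        fprintf(stderr, "guarded_free: guard damaged near %p\n", p);
    free(raw);
}

A write even one byte past the end of the user data damages the rear guard, and guarded_free() will report it.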
"Must-Have" Free Tools For Many Memory Problems Types - Linux/x86 One application that deals with most types of memory management problems (memory leaks, memory smearing, etc), is called Valgrind. This application, which is free software (GPL), was developed to track down a lot of such problems, without requiring any change to be made to the tested application. What it does is replace not only the memory management functions (malloc(), free() and friends), but also re-tune all of the program's source code, and place checks for memory access correctness before and after each access to a memory area. This allows it to notice memory access bugs as soon as they occur, and in real-time. Valgrind is currently available only for the Linux operating system, and for the x86 processors family. However, nothing in its design should prevent it from being ported, in the future, to other operating systems, and other types of CPUs. Finally, using valgrind is relatively easy - you install it, and then run your application (assuming it is dynamically linked - which is the case for most applications) using valgrind, giving valgrind a few flags to tell it what kind of errors to look for, and how to tune its work. Valgrind will slow down your application (sometimes even considerably), but for debugging, it will work much faster then trying to find the bugs on your own. For more information about valgrind, please visit valgrind's home page.
"Must-Have" Commercial Tools For Many Memory Problems Types Because memory error bugs can be so hard to find, and since the existing free tools have limitations (or did not exist in the past), various companies have developed commercial tools that check for memory leaks, memory smearing, reading un-initialized memory, and so on. These tools tend to handle a much broader types of bugs than the free tools, and have some GUI to activate them. They are also more advanced regarding checking of multi-threaded and multi-process applications. One of these tools is called 'purify', currently sold by rational software. This tool can identify all sorts of memory problems, and there is no need to re-start the application after each problem is encountered - problems are reported as they are occuring, and unless there's a NULL pointer dereference, the program keeps running. Thus, in a single run, one may identify a large set of problems. It costs few thousands dollars for a 'floating' license, but its worth every dime. For more info look at http://www.rational.com/products/pqc/index.jsp. Another product similar to purify is called insure++, from parasoft. It has features similar to what purify provides, thought it is more cumbersome to operate. Its one greatest advantage is that it has a Linux port (rational software holds back doing such a port of purify for quite a while). For more info about insure++ look at http://www.parasoft.com/products/insure/.
This document is copyright (c) 2001-2002 by guy keren. The material in this document is provided AS IS, without any expressed or implied warranty, or claim of fitness for a particular purpose. Neither the author nor any contributors shall be liable for any damages incurred directly or indirectly by using the material contained in this document. Permission to copy this document (electronically or on paper, for personal or organizational internal use) or to publish it on-line is hereby granted, provided that the document is copied as-is, this copyright notice is preserved, and a link to the original document is written in the document's body, or in the page linking to the copy of this document. Permission to make translations of this document is also granted, under these terms, assuming the translation preserves the meaning of the text, the copyright notice is preserved as-is, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.
For any questions about the document and its license, please contact the author.
Under Solaris 2, confusingly, there are several sets of the basic functions; which set you get depends on which library you link against. They are the bsdmalloc routines, the malloc(3C) functions and the malloc(3X) functions. They are all ANSI compliant; the 3X routines are space efficient, the bsdmalloc routines are fast, and the 3C routines are a compromise. Each group includes some extra non-ANSI functionality that may be of interest. The 3C group is the default.

2. Programming Examples

Here's an example of a program that creates an array and fills it with the character 'x'. The program also reports how much CPU time the various operations took.
#include <stdio.h>   /* printf(), scanf() - missing from the original listing */
#include <stdlib.h>
#include <time.h>

main()
{
    int i, size;
    char *base;
    clock_t start, end;

    printf("Enter size of array to create ");
    scanf("%d", &size);
    start = clock();
    base = (char *)calloc(size, sizeof(char));
    end = clock();
    if (base != NULL) {
        printf("Array of size %d created OK at address %p\n", size, base);
        printf("Allocation took %d milliseconds\n",
               (int)((end - start) * 1E3 / CLOCKS_PER_SEC));
    } else {
        printf("Not enough memory\n");
        exit(1);
    }
    start = clock();
    for (i = 0; i < size; i++)
        base[i] = 'x';
    end = clock();
    printf("Filling the array took %d milliseconds\n",
           (int)((end - start) * 1E3 / CLOCKS_PER_SEC));
}
The value returned from the function calloc() is cast to a suitable type. This is common practice when using this function: its return value is a generic pointer (void *), and the cast documents the intended type. (Strictly speaking, ANSI C converts void * to other object pointer types automatically, so the cast is required only if the code is also to be compiled as C++.)
The value returned by calloc() is stored in the variable base. The value will, of course, be the start or base address of the allocated area of memory. The relationship between pointers and array notation in C allows the use of array notation when the array is filled. Timing information here is determined using the ANSI C function clock(); this does not have particularly fine resolution. You may find higher resolution timers available on your system.
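As a hedged sketch of such an alternative, most POSIX systems provide gettimeofday(), which usually offers microsecond resolution (actual granularity is system-dependent):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end;
    long usec;

    gettimeofday(&start, NULL);
    /* ... the work to be timed ... */
    gettimeofday(&end, NULL);

    usec = (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
    printf("Elapsed: %ld microseconds\n", usec);
    return 0;
}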
Here are some results from running the program to create arrays of 1, 10, 100 and 1000 Megabytes. You may find it interesting to repeat these tests on different systems.
bash$ a.out
Enter size of array to create 1000000
Array of size 1000000 created OK at address 20be0
Allocation took 70 milliseconds
Filling the array took 130 milliseconds
bash$ a.out
Enter size of array to create 10000000
Array of size 10000000 created OK at address 20be0
Allocation took 700 milliseconds
Filling the array took 1390 milliseconds
bash$ a.out
Enter size of array to create 100000000
Array of size 100000000 created OK at address 20be0
Allocation took 7020 milliseconds
Filling the array took 13900 milliseconds
bash$ a.out
Enter size of array to create 1000000000
Not enough memory
The overheads involved in the creation of the arrays should be noted, as should the time taken to fill them. It is tempting, when using calloc(), to create instances of structures one by one as required by the program logic; this may not be wise, as the overheads involved are significant. A useful alternative is to write some sort of wrapper function that maintains a "pool" of such structures and calls calloc() only when the "pool" is empty. Here's some typical code.
#define POOLSIZE 100

struct x *new()
{
    static int nfree = 0;
    static struct x *base;

    if (!nfree) {
        base = (struct x *)calloc(POOLSIZE, sizeof(struct x));
        if (base == NULL)
            return NULL;
        nfree = POOLSIZE;
    }
    /* hand out the next free structure from the pool
       (this closing step was missing from the original listing) */
    nfree--;
    return base++;
}
A routine that creates new instances of memory-consuming objects on demand in this fashion is often called a constructor. A common practice is to use structures to store information about objects; this will typically include descriptive text. The descriptive text needs to be stored somewhere, and it is very common practice to calloc() a suitable area of memory for it. Here's a modified version of the "new" function shown above that does this. This time we're not using a pool of structures but allocating them one by one; this may be worth the overheads if they are going to be subsequently released.
struct x {
    char *text;
    int   value;
};
struct x *new(char *s)
{
    struct x *xptr;
    char *sptr;

    sptr = (char *)calloc(strlen(s) + 1, sizeof(char));
    if (sptr == NULL)
        return NULL;
    strcpy(sptr, s);
    xptr = (struct x *)calloc(1, sizeof(struct x));
    if (xptr == NULL) {
        free(sptr);    /* avoid leaking the string on failure */
        return NULL;
    }
    xptr->text = sptr;
    return xptr;
}
Again there are several interesting points here:

o The allocated space for the copy of the value is its string length plus one, to allow for the string-terminating NUL.
o The value is copied to its final storage place using the strcpy() function.
o If there is insufficient memory for the struct x, the already allocated memory for the value is released before returning from the constructor.

3. Releasing memory space, leaks and garbage collection
If instances of struct x are being created by the code shown in the previous section, it is very tempting to dispose of them thus.
struct x *instance;
/* ... instance created via new() and used ... */
free(instance);
This is wrong: although it does release the memory occupied by the instance of a struct x, it does not release the memory used to store the copy of the associated text. The address of that memory is lost and it is no longer accessible, but the memory management routines have no way of knowing this. This is a classic example of a memory leak. The programmer should have written:

free(instance->text);
free(instance);
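In fact, the corrected pair of free() calls can be packaged as a function of its own; here is a minimal sketch (the name delete_x is illustrative):

#include <stdlib.h>

struct x {          /* as defined earlier */
    char *text;
    int   value;
};

void delete_x(struct x *instance)
{
    if (instance == NULL)
        return;
    free(instance->text);   /* release the calloc'd copy of the text first */
    free(instance);         /* then the structure itself */
}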
Coding a specific destructor function such as this, to go with the constructor (here called new()), is probably better practice. It does not, of course, deal with the type of memory leak caused by creating an instance of something to deal with an input (or some part of processing an input) and then not destroying it, perhaps because processing took some unusual path through the program. C++ automatically destroys objects when they pass out of scope, but this is of limited use if the object lives on the heap and only a pointer to it goes out of scope. In some programming languages, such as Java, the memory management facilities include a facility known as garbage collection. This requires that the memory management facilities periodically examine the whole data space of the program, looking for pointers into the arena and checking that all objects in the arena are actually targets of pointers. It further requires that the language support comparatively strong typing, so that such pointers can be readily identified. The overheads involved are significant, although Java implementations can run garbage collection concurrently with normal activity, so that collection proceeds while the program is waiting for input. The normal memory management functions supplied with C compilers do not provide garbage collection.

4. Debugging

Debugging a program that uses calloc() and free() extensively can be difficult. There are two types of error that need to be tracked down:
1. Those caused by over-writing the bounds of an allocated memory block. This is almost identical to the well-known C problem of "running off" the end of an array. The resultant damage to the arena will often cause an almost immediate crash.

2. Memory leaks. This is a longer-term problem that simple testing with a small amount of data is unlikely to reveal. Testing against large volumes of data is important.
Under Solaris 2, dbx has some facilities to deal with such problems, although they require linkage against a special set of allocation routines. They can be used periodically to display information about the arena. Solaris also provides a library function, mallinfo(), that provides access to the arena information; this can be wrapped in a simple print function and included at suitable points in the program, or used in conjunction with assertions about the various values (i.e. that the values at the end of execution of a piece of code are the same as those at the start). Here's an example of the sort of code that might be used.
#include <malloc.h>   /* mallinfo(), on systems that provide it */
#include <assert.h>

struct mallinfo old, new;
...
old = mallinfo();
...
new = mallinfo();
assert(old.uordblks == new.uordblks);
...
If the number of uordblks has changed (probably indicating that some memory has been allocated but not released), the failed assertion will cause a core dump, which can then be examined using the debugger. The usefulness of this approach is slightly restricted by the lack of information about the significance of the various items of arena information reported by mallinfo().

2. Addresses

The addresses returned by calloc() are fairly arbitrary. If two or more objects are created by successive calls to calloc(), they will not, in general, occupy successive locations, as this example shows.
#include <stdio.h>    /* printf() - missing from the original listing */
#include <stdlib.h>   /* calloc() */
#include <time.h>

main()
{
    struct tm *t[3];
    int i, s = sizeof(struct tm);

    for (i = 0; i < 3; i++) {
        t[i] = (struct tm *)calloc(1, s);
    }
    printf("Size of struct tm = %x\n", s);
    for (i = 0; i < 3; i++)
        printf("Instance %d base address %x\n",   /* %p is the portable choice */
               i, t[i]);
}
The program allocates memory for three instances of a struct tm and reports the addresses. Here is the output.
bash$ a.out
Size of struct tm = 24 Instance 0 base address 209e8 Instance 1 base address 20a18 Instance 2 base address 20a48
This shows that the size of a struct tm is 24 (hex) bytes, whereas the difference between the addresses of successive instances is 30 (hex); this suggests an overhead of 12 (decimal) bytes per allocated piece of memory. It is also important to realise that the addresses associated with allocated memory objects are absolute within the program's address space. This means that the following practices are likely to cause severe problems.
o Linked lists within arrays

It is tempting to create an array (or "pool") of objects via a single call to calloc(), for reasons of efficiency, in preference to multiple single calls, and then to build a linked list (or other data structure) within the array. This is perfectly safe until all the objects of the array have been used; then there is a temptation to use the function realloc() to expand the array. realloc() will preserve the contents of the first part of the array and attach extra space at the end; however, as the ANSI standard makes quite clear, the allocated space may have been moved. Moving the array without changing the pointers that define the linked list destroys the linked list, as the pointers no longer point to the correct place. (One defensive alternative, sketched below, is to link the list by array index rather than by pointer, so that a move by realloc() is harmless.)
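Here is a hedged sketch of that index-linked alternative (struct node and the field names are illustrative only):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    int next;          /* index of the next node; -1 terminates the list */
};

int main(void)
{
    int capacity = 4;
    struct node *pool = (struct node *)calloc(capacity, sizeof(struct node));
    if (pool == NULL)
        return 1;

    /* build a two-element list, linked by index rather than by pointer */
    pool[0].value = 10; pool[0].next = 1;
    pool[1].value = 20; pool[1].next = -1;

    /* grow the pool; the block may move, but the index links survive */
    capacity *= 2;
    pool = (struct node *)realloc(pool, capacity * sizeof(struct node));
    if (pool == NULL)
        return 1;

    for (int i = 0; i != -1; i = pool[i].next)
        printf("%d\n", pool[i].value);

    free(pool);
    return 0;
}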
o Saving data structures

Having constructed a possibly elaborate and expensive-to-build linked data structure in calloc'd space, it is tempting to write it to a file for future use by the program. Since there is no control over the absolute positioning of calloc'd space, this will not work unless, by chance, the new area is positioned in the same place as the original. Similar constraints would apply if the structure were created in ordinary "compile time allocated" memory and the program were still being developed. The best way of creating such persistent structures is to use the mmap() function to map a file into memory space at a specific location. This also has the advantage that there is no need to perform file operations to read in the persistent structure.
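Here is a hedged sketch of that approach (the file name, address and size are illustrative; MAP_FIXED behaviour is highly platform-dependent, so treat this as a sketch only):

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_ADDR ((void *)0x40000000)   /* assumed-free address range */
#define REGION_SIZE 4096

int main(void)
{
    int fd = open("structures.dat", O_RDWR);   /* hypothetical data file */
    if (fd < 0)
        return 1;

    /* MAP_FIXED forces the mapping to REGION_ADDR, so absolute
       pointers previously saved in the file point to the right
       places again. */
    void *base = mmap(REGION_ADDR, REGION_SIZE,
                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                      fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return 1;
    }

    /* ... use the persistent structures at 'base' ... */

    munmap(base, REGION_SIZE);
    close(fd);
    return 0;
}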