Garbage collection is a form of automatic memory management in which a garbage collector reclaims the memory used by objects that the application will never access again. The technique was invented by John McCarthy around 1959 to solve the problems of manual memory management in Lisp, a programming language. Its purpose is to reuse space that a running application once needed: the application, or mutator, has finished with the objects occupying that space and has no further use for it, so the collector reclaims the inaccessible memory.
Because garbage collection is a language-based feature, garbage collection systems have developed alongside the languages themselves, with each language acquiring its own collectors. Many languages require garbage collection, either as part of the language specification (Java, C#) or effectively for practical implementation (formal languages such as lambda calculus). These are said to be garbage-collected languages. Other languages, such as C and C++, were designed for use with manual memory management but have garbage-collected implementations.
Garbage collection brings numerous software engineering advantages, but it interacts poorly with virtual memory managers. The popularity of languages like Java and C# is due in part to garbage collection. However, garbage collection requires considerably more memory than explicit memory management, so it needs more RAM, and fewer garbage-collected applications fit in a given amount of RAM. The alternative is to let the heap spill to disc, using disc space in place of physical memory. Performance then degrades because disc access is far more expensive than main memory access, requiring almost six times more energy. When paging occurs, pauses can stretch to tens of seconds or even minutes. Even when main memory is large enough to hold an application’s working set, a heap collection may later induce paging. Most existing garbage collectors touch pages without regard to whether those pages are resident in memory, and during a full-heap collection they visit more pages than those in the application’s working set.
Garbage collection often disrupts virtual memory management and destroys the information the virtual memory manager gathers to track reference history. This is perhaps the most widely known undesirable behavior of garbage collectors, but it has been tackled only indirectly, through generational garbage collectors, which focus collection effort on short-lived objects. Because such objects have a low survival rate, generational collection reduces the frequency of full-heap garbage collections. However, when a generational collector eventually performs a full-heap collection, it triggers paging. This problem has led to a number of workarounds. One standard way to avoid paging is to size the heap so that it never exceeds the size of available physical memory. However, choosing an appropriate size statically is impossible on a multi-programmed system, where the amount of available memory changes. Another possible approach is over-provisioning systems with memory, but high-speed, high-density RAM remains expensive. It is also generally impractical to require that users purchase more memory in order to run garbage-collected applications. Furthermore, even in an over-provisioned system, just one unanticipated workload exceeding available memory can render a system unresponsive. These problems have led some to recommend that garbage collection be used only for small applications with minimal memory footprints.
In a distributed application environment, where users want to retrieve data seamlessly, developers need to understand the needs of the user as well as the resources and other constraints of limited devices. Memory is one of the biggest issues for mobile device applications; developers therefore need to understand the garbage collection mechanism in order to make their applications more efficient and reliable.
Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches, and there are other techniques being studied (such as region inference) to solve the same fundamental problem. Note that there is an ambiguity of terms, as theory often uses the terms manual garbage-collection and automatic garbage-collection rather than manual memory management and garbage-collection, and does not restrict garbage-collection to memory management, rather considering that any logical or physical resource may be garbage-collected.
The basic principle of how a garbage collector works is:
- Determine what data objects in a program will not be accessed in the future
- Reclaim the resources used by those objects
By making manual memory deallocation unnecessary (and typically impossible), garbage collection frees the programmer from having to worry about releasing objects that are no longer needed, which can otherwise consume a significant amount of design effort. It also aids programmers in their efforts to make programs more stable, because it prevents several classes of runtime errors. For example, it prevents dangling pointer errors, where a reference to a deallocated object is used. (The pointer still points to the location in memory where the object or data was, even though the object or data has since been deleted and the memory may now be used for other purposes, creating a dangling pointer.)
Many computer languages require garbage collection, either as part of the language specification (e.g. C# and most scripting languages) or effectively for practical implementation (e.g. formal languages like lambda calculus); these are said to be garbage-collected languages. Other languages were designed for use with manual memory management, but have garbage-collected implementations (e.g. C, C++). Newer Delphi versions support garbage-collected dynamic arrays, long strings, variants and interfaces. Some languages, like Modula-3, allow garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects. Others, like D, are garbage-collected but allow the user to manually delete objects and also to disable garbage collection entirely when speed is required. In any case, it is far easier to implement garbage collection as part of the language’s compiler and runtime system, but post hoc GC systems exist, including ones that do not require recompilation. The garbage collector will almost always be closely integrated with the memory allocator.
1.1 Definition of Garbage Collector
The name “Garbage Collection” implies that objects that are no longer needed by the program are garbage and can be thrown away. Garbage collection is the process of collecting all unused nodes and returning them to available space. This process is carried out in two phases. In the first phase, known as the marking phase, all the nodes in use are marked. In the second phase, all the unmarked nodes are returned to the available space list. When variable-size nodes are in use, memory must also be compacted so that all free nodes form a contiguous block of memory; in that case the second phase is known as memory compaction. Compaction of disk space to reduce average retrieval time is desirable even for fixed-size nodes.
A garbage collection algorithm identifies the objects that are live. An object is live if it is referenced by a predefined variable called a root, or if it is referenced by a variable contained in a live object. Non-live objects, which cannot be reached through any chain of references from a root, are considered garbage. Objects and references can be considered a directed graph; live objects are those that are reachable from the root. Fig. 1 shows how garbage collection works.
Objects in blue squares are reachable from the root, but objects in red are not. An object may refer to a reachable object and still be unreachable itself.
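The reachability rule above can be illustrated with a small graph traversal. This is only a sketch: the object names and the dictionary representation of the heap are assumptions made for the example, not part of any particular collector.

```python
def reachable(roots, references):
    """Return the set of objects reachable from `roots`.

    `references` maps each object name to the names it points to
    (a hypothetical stand-in for the object graph).
    """
    live = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in live:
            continue
        live.add(obj)
        pending.extend(references.get(obj, ()))
    return live


# Hypothetical heap: E points at the live object B, yet nothing live points
# at E, so E (and D, which points at E) is still garbage.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["B"]}
live = reachable(["A"], heap)
garbage = set(heap) - live
```

Here `live` contains A, B and C, while D and E are garbage even though E refers to a reachable object.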
1.2 Basics of Garbage Collector Algorithms
There are three basic garbage collector algorithms available.
Reference Counting: Each object keeps a count of the number of references to it, and the garbage collector reclaims the object’s memory when the count reaches zero.
Mark and Sweep: The mark and sweep algorithm is also known as a tracing garbage collector. In the mark phase, the garbage collector marks all accessible objects; in the sweep phase, it scans through the heap and reclaims all unmarked objects.
The figure shows the operation of the mark and sweep garbage collector algorithm. Fig. a shows the conditions before the garbage collector begins. Fig. b shows the effect of the mark phase; all live objects are marked at this point. Fig. c shows the effect after the sweep has been performed.
Compact: Compaction shuffles all the live objects in memory so that the free entries form large contiguous chunks.
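The mark, sweep and compact steps described above might be sketched as follows. The heap is modeled as a Python list of slots, and the `Obj` class and slot layout are assumptions chosen for illustration; a real collector works on raw memory and must also update pointers when it moves objects.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # outgoing references
        self.marked = False


def mark(roots):
    """Mark phase: flag every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: free unmarked slots and clear marks on survivors."""
    for i, obj in enumerate(heap):
        if obj is None:
            continue
        if obj.marked:
            obj.marked = False
        else:
            heap[i] = None       # slot returns to the free list


def compact(heap):
    """Slide live objects to the front so free slots are contiguous."""
    survivors = [obj for obj in heap if obj is not None]
    heap[:] = survivors + [None] * (len(heap) - len(survivors))


c = Obj("c")
b = Obj("b", [c])
a = Obj("a", [b])
d = Obj("d")
e = Obj("e", [d])            # e and d are unreachable from the root a
heap = [a, d, b, e, c]

mark([a])
sweep(heap)
after_sweep = [o.name if o else None for o in heap]
compact(heap)
after_compact = [o.name if o else None for o in heap]
```

After the sweep the free slots are interleaved with live objects; after compaction they form a single run at the end of the heap.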
1.3 Problem Statement
The performance evaluations in this thesis were conducted with three major goals: to make controlled comparisons so that the performance effects of isolated parameters can be determined, to allow easy exploration of the design space so that parameters of interest can be quickly evaluated, and to provide information about parts of the design space that are not easily implementable. As with other experimental sciences, hypotheses about performance can only be tested if experimental conditions are carefully controlled. For example, to accurately compare non-incremental with incremental copying garbage collection, other algorithm parameters, such as semi-space size, promotion policy, allocation policy, and copying policy must be held constant. Furthermore, the Lisp systems in which the algorithms are implemented must be identical. Comparing incremental collection on a Lisp machine to stop-and-copy collection on a RISC workstation would provide little information.
A second characteristic of an effective evaluation method is its ability to allow easy exploration of the space of design possibilities. In the case of garbage collection evaluation, new algorithms should be easy to specify, parameterize, and modify. Parameters that govern the behavior of the algorithms should be easy to introduce and change. Examples of such parameters include semi-space size, physical memory page size, promotion policy, and the number of bytes in a pointer.
A good evaluation method will answer questions about systems that do not exist or are not readily implementable. If technology trends indicate certain systems are likely to be of interest, performance evaluation should help guide future system design. In the case of garbage collection, several trends have already been noted. In particular, garbage collection evaluation techniques may help guide computer architects in building effective memory system configurations. In the case of multiprocessors, evaluation methods that predict an algorithm’s performance without requiring its detailed implementation on a particular multiprocessor will save much implementation effort. If a technique for evaluating garbage collection algorithms can provide these capabilities, then a much broader understanding of the performance tradeoffs inherent in each algorithm is possible.
2. GARBAGE COLLECTION ALGORITHMS
Garbage collection provides a solution where storage reclamation is automatic. This section provides an overview of the simplest approaches to garbage collection, and then discusses the two forms of garbage collection most relevant to this dissertation: generational collection and conservative collection.
2.1 Simple Approaches
All garbage collection algorithms attempt to de-allocate objects that will never be used again. Since they cannot predict future accesses to objects, collectors make the simplifying assumption that any object that is accessible to the program will indeed be accessed and thus cannot be de-allocated. Thus, garbage collectors, in all their variety, always perform two operations: identify unreachable objects (garbage) and then de-allocate (collect) them.
Reference-counting collectors identify unreachable objects and de-allocate them as soon as they are no longer referenced (Collins 1960, Knuth 1973). Associated with each object is a reference count that is incremented each time a new pointer to the object is created and decremented each time one is destroyed. When the count falls to zero, the reference counts for immediate descendants are decremented and the object is de-allocated. Unfortunately, reference-counting collectors are expensive because the counts must be maintained, and it is difficult to reclaim circular data structures using only local reachability information.
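A minimal sketch of reference counting, including the circular-structure problem just mentioned, is given below. The `RcObject` class, the `retain`/`release` helpers and the set used to stand in for the heap are all hypothetical names introduced for this example.

```python
class RcObject:
    def __init__(self, name, heap):
        self.name = name
        self.count = 0        # number of pointers to this object
        self.fields = {}
        self.heap = heap      # set of names of allocated objects
        heap.add(name)


def retain(obj):
    """A new pointer to `obj` was created."""
    obj.count += 1


def release(obj):
    """A pointer to `obj` was destroyed; free it when the count hits zero."""
    obj.count -= 1
    if obj.count == 0:
        children = list(obj.fields.values())
        obj.fields.clear()
        obj.heap.discard(obj.name)       # de-allocate
        for child in children:           # decrement immediate descendants
            release(child)


def set_field(obj, field, target):
    """Store a pointer, adjusting the counts of the new and old targets."""
    retain(target)
    old = obj.fields.get(field)
    obj.fields[field] = target
    if old is not None:
        release(old)


def demo(make_cycle):
    heap = set()
    x = RcObject("x", heap)
    y = RcObject("y", heap)
    retain(x)                  # a root variable points at x
    set_field(x, "next", y)
    if make_cycle:
        set_field(y, "back", x)
    release(x)                 # the root variable goes away
    return heap


acyclic_leftover = demo(make_cycle=False)
cyclic_leftover = demo(make_cycle=True)
```

Without the cycle both objects are freed as soon as the root pointer disappears. With the cycle, each object keeps the other's count at one, so neither is ever reclaimed even though both are unreachable.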
Mark-sweep collectors are able to reclaim circular structures by determining information about global reachability (Knuth 1973, McCarthy 1960). Periodically (e.g. when a memory threshold is exhausted), the collector marks all reachable objects and then reclaims the space used by the unmarked ones. Mark-sweep collectors are also expensive because every dynamically allocated object must be visited, the live ones during the mark phase and the dead ones during the sweep phase. On systems with virtual memory where the program address space is larger than primary memory, visiting all these objects may require the entire contents of dynamic memory be brought into primary memory each time a collection is performed. Also, after many collections, objects become scattered across the address space because the space reclaimed from unreachable objects is fragmented into many pieces by the remaining live objects. Explicit de-allocation also suffers from this problem. Scattering reduces reference locality and ultimately increases the size of primary memory required to support a given application program.
Copying collectors provide a partial solution to this problem (Baker 1978, Cohen 1981). These algorithms mark objects by copying them to a separate contiguous area of primary memory. Once all the reachable objects have been copied, the entire address space consumed by the remaining unreachable objects is reclaimed at once; garbage objects need not be swept individually. Because in most cases the ratio of live to dead objects tends to be small (by selecting an appropriate collection interval), the cost of copying live objects is more than offset by the drastically reduced cost of reclaiming the dead ones. As an additional benefit, spatial locality is improved as the copying phase compacts all the live objects. Finally, allocation of new objects from the contiguous free space becomes extremely inexpensive. A pointer to the beginning of the free space is maintained; allocation consists of returning the pointer and incrementing it by the size of the allocated object.
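The copying scheme and pointer-bump allocation described above can be sketched as follows. This is an illustrative model, not a real allocator: each object is assumed to occupy exactly one slot, pointers are represented by a hypothetical `Ref` wrapper so they can be told apart from integers, and the to-space is allocated afresh at each collection.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    addr: int   # index of the target object in the active space


class SemiSpaceHeap:
    def __init__(self, size):
        self.size = size
        self.space = [None] * size   # the active semi-space
        self.free = 0                # pointer to the start of free space

    def alloc(self, fields):
        """Bump allocation: return the free pointer and advance it."""
        if self.free >= self.size:
            raise MemoryError("semi-space full; collect first")
        addr = self.free
        self.space[addr] = list(fields)
        self.free += 1               # every object is one slot in this sketch
        return Ref(addr)

    def collect(self, roots):
        """Copy everything reachable from `roots`; return the updated roots."""
        to_space = [None] * self.size
        forward = {}                 # old address -> new address
        free = 0

        def copy(ref):
            nonlocal free
            if ref.addr not in forward:
                forward[ref.addr] = free
                to_space[free] = list(self.space[ref.addr])
                free += 1
            return Ref(forward[ref.addr])

        new_roots = [copy(r) for r in roots]
        scan = 0
        while scan < free:           # copied objects double as the work queue
            obj = to_space[scan]
            for i, field in enumerate(obj):
                if isinstance(field, Ref):
                    obj[i] = copy(field)
            scan += 1
        # Flip: the old space, garbage included, is abandoned in one step.
        self.space, self.free = to_space, free
        return new_roots


heap = SemiSpaceHeap(8)
leaf = heap.alloc([42])
junk1 = heap.alloc([1])
node = heap.alloc([leaf, 7])
junk2 = heap.alloc([node])           # garbage that points at a live object
(root,) = heap.collect([node])
```

After the collection the two live objects sit next to each other at the start of the new space, the two garbage objects were never visited, and the next allocation simply continues from `heap.free`.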
But copying collectors are not a panacea: they cause disruptive pauses, and they can only be used when pointers can be reliably identified. Long pauses occur when a large number of reachable objects must be traced at each collection. Generational collectors reduce tracing costs by limiting the number of objects traced (Lieberman & Hewitt 1983, Moon 1984, Ungar 1984). Precise runtime type information available for languages such as LISP, ML, Modula, and Smalltalk allows pointers to be reliably identified. However, for languages such as C or C++, copying collection is difficult to implement because the lack of runtime type information prevents pointer identification. One solution is to have the compiler provide the necessary information (Diwan, Moss & Hudson 1992). Conservative collectors provide a solution when such compiler support is unavailable (Boehm & Weiser 1988).
2.2 Generational Collection
For best performance, a collector should minimize the number of times each reachable object is traced during its lifetime. Generational collectors exploit the experimental observation that old objects are less likely to die than young ones by tracing old objects less frequently. Since most of the dead objects will be young, only a small fraction of the reclaimable space will remain unreclaimed after each collection and the cost of frequently retracing all the old objects is saved. Eventually, even the old objects will have to be traced to reclaim long lived dead objects. Generational collectors divide the memory space into several generations where each successive older generation is traced less frequently than the younger generations. Adding generations to a copying collector reduces scavenge time pauses because old objects are neither copied nor traced on every collection.
Generational collectors can avoid tracing objects in the older generation when pointers from older objects to younger objects are rare. Tracing the old objects is especially expensive when they are in paged-out virtual memory on disc. This cost increases as the older generations become significantly larger than younger ones, as is typically the case. One way implementations of generational collectors reduce tracing costs is to segregate large objects that are known not to contain pointers into a special untraced area (Ungar & Jackson 1992). Another way to reduce costs is to maintain forward-in-time intergenerational pointers explicitly in a collector data structure, the remembered set, which becomes an extension of the root set. When a pointer to a young object is stored into an object in an older generation, that pointer is added to the remembered set for the younger generation. Tracking such stores is called maintaining the write barrier. Stores from young objects to old ones are not explicitly tracked. Instead, whenever a given generation is collected, all younger generations are also collected. The write barrier is often maintained by using virtual memory to write-protect pages that are eligible to contain such pointers (Appel, Ellis & Li 1988). Another method is to use explicit inline code to check for such stores. Such a check may be implemented by the compiler, but other approaches are possible. For example, a post-processing program may be able to recognize pointer stores in the compiler output and insert the appropriate instructions.
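The inline-check form of the write barrier and its remembered set might look like the sketch below. It assumes only two generations, and the `GenObj` class, the `write_field` helper and the slot-based remembered set are hypothetical choices made for the example; page-protection barriers work quite differently.

```python
YOUNG, OLD = 0, 1


class GenObj:
    def __init__(self, name, generation=YOUNG):
        self.name = name
        self.generation = generation
        self.fields = {}


# (object, field) slots in old objects that were assigned young targets.
remembered_set = set()


def write_field(obj, field, target):
    """Inline write barrier: record stores of young pointers into old objects."""
    if obj.generation == OLD and target.generation == YOUNG:
        remembered_set.add((obj, field))
    obj.fields[field] = target


def minor_collect(roots):
    """Trace the young generation only; old objects are never scanned."""
    stack = list(roots)
    # The remembered set extends the root set.
    stack.extend(obj.fields[field] for obj, field in remembered_set)
    live = set()
    while stack:
        obj = stack.pop()
        if obj.generation == YOUNG and obj not in live:
            live.add(obj)
            stack.extend(obj.fields.values())
    return live


old = GenObj("old", OLD)
y1, y2, y3 = GenObj("y1"), GenObj("y2"), GenObj("y3")
write_field(old, "child", y1)   # old-to-young store: recorded
write_field(y1, "next", y2)     # young-to-young store: not recorded
write_field(y3, "up", old)      # young-to-old store: not recorded
survivors = sorted(o.name for o in minor_collect(roots=[old]))
```

The young object `y1` survives only because the barrier recorded the store into `old`; the collector never had to look inside `old` itself. `y3` is reclaimed, since nothing live refers to it.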
Designers of generational collectors must also establish the size, collection and promotion policies for each generation and decide how many generations are appropriate. The collection policy determines when to collect; the number of generations, their size, and the promotion policy determine what is collected.
The collector must determine how frequently to scavenge each generation; more frequent collections reduce memory requirements at the expense of increased CPU time because space is reclaimed sooner but live objects are traced more frequently. As objects age, they must be promoted to older generations to reduce scavenge costs. Promoting a short-lived object too soon may waste space because the object may be reclaimed long after it becomes unreachable; promoting a long-lived object too late wastes CPU time as that object is traced repeatedly. The space required by each generation is strongly influenced by the promotion and scavenge policies. If the promotion policy of a generational collector is chosen poorly, then tenured garbage will cause excessive memory consumption. Tenured garbage occurs when many objects that are promoted to older generations die long before the generation is scavenged. This problem is most acute with a fixed-age policy that promotes objects after a fixed number of collections. Ungar and Jackson devised a policy that uses object demographics to delay promotion of objects until the collector’s scavenge costs require it (Ungar & Jackson 1992).
Because generational collectors trade CPU time spent maintaining the remembered sets for a reduced scavenge time, their success depends upon many aspects of program behavior. If objects in older generations consume lots of storage, their lifetimes are always long, they contain few pointers to young objects, pointer stores into them are rare, and many objects die at a far younger age, then generational collectors will be very effective. However, even generational collectors must still occasionally do a full collection, which can cause long delays for some programs. Often, however, collectors provide tuning mechanisms that must be manipulated directly by the end user to optimize performance for each of their programs (Apple Computer Inc. 1992, Symbolics Inc. 1985, Xerox Corp. 1983). Generational collectors have been implemented successfully in prototyping languages, such as LISP, Modula-3, Smalltalk and PCedar. These languages share the characteristic that pointers to objects are readily identifiable, or hardware tags are used to identify pointers. When pointers cannot be identified, copying collectors cannot be used, for when an object is copied all pointers referring to it must be changed to reflect its new address. If a pointer cannot be distinguished from other data, then its value cannot be updated because doing so may alter the value of a variable. The existing practice in languages such as C and C++, which prevent reliable pointer identification, has motivated research into conservative non-copying collectors.
2.3 Conservative Collection
Conservative collectors may be used in language systems where pointers cannot be reliably identified (Boehm & Weiser 1988). Indeed, an implementation already exists that allows a C programmer to retrofit a conservative garbage collector to an existing application (Boehm 1994). This class of collectors makes use of the surprising fact that values that look like pointers (ambiguous pointers) usually are pointers. Misidentified pointers result in some objects being treated as live when, in fact, they are garbage. Although some applications can exhibit severe leakage (Boehm 1993, Wentworth 1990), usually only a small percentage of memory is lost because of conservative pointer identification.
Imprecise pointer identification causes two problems: valid pointers to allocated objects may not be recognized (derived pointers), or non-pointers may be misidentified as pointers (false pointers). Both cases turn out to be critical concerns for collector implementers.
A derived pointer is one that does not contain the base address of the object to which it refers. Such pointers are typically created by optimizations made either by a programmer or a compiler and occur in two forms. Interior pointers are ones that point into the middle of an object; array elements and fields of a record are common examples (BGS 94). In the second form, an object that has no pointer into it from anywhere is still reachable. For example, an array whose lowest index is a non-zero integer may only be reachable from a pointer referring to index zero. Here the problem is that a garbage collector may mistakenly identify an object as unreachable because no explicit pointers to it exist.
With the exception of interior pointers, which are more expensive to trace, compiler support is required to solve this problem no matter what collection algorithm is used. In practice, it turns out that compiler optimizations have not been a problem yet (June 1995), because enabling sophisticated optimizations often breaks other code in the user’s program, so such optimizations are not used with garbage-collected programs in practice (Boehm 1995b). Such support has been studied by other researchers and will not be discussed further in this dissertation (Boehm 1991, Diwan, Moss & Hudson 1992, Ellis & Detlefs 1994).
False pointers exist when the type (whether it is a pointer or not) of an object is not available to the collector. For example, if the value contained in an integer variable corresponds to the address of an allocated but unreachable object, a conservative collector will not de-allocate that object. A heuristic called blacklisting reduces this problem by not allocating new objects from memory that corresponds to previously discovered false pointers (Boehm 93). But even when the type is available, false pointers may still exist. For example, a pointer may be stored into a compiler-generated temporary (in a register or on the stack) that is not overwritten until long after its last use. While memory leakage caused by the degree of conservatism chosen for a particular collector is still an area of active research, it will not be discussed further in this dissertation except in the context of costs incurred by the conservative collector’s pointer-finding heuristic.
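The effect of a false pointer can be illustrated with a toy conservative marker. The addresses, object sizes and the dictionary used as the heap are invented for the example; the only rule taken from the text is that any word whose value falls inside an allocated object is treated as a pointer to it.

```python
def find_object(word, heap):
    """Return the base address of the allocated object containing `word`."""
    for base, (size, _words) in heap.items():
        if base <= word < base + size:   # interior pointers count too
            return base
    return None


def conservative_mark(root_words, heap):
    """Mark every object that some word appears to point at."""
    marked = set()
    pending = list(root_words)
    while pending:
        word = pending.pop()
        base = find_object(word, heap)
        if base is not None and base not in marked:
            marked.add(base)
            pending.extend(heap[base][1])   # scan the object's words
    return marked


# Hypothetical heap: base address -> (size in bytes, words stored inside).
heap = {
    0x1000: (16, [0x2000, 5]),   # holds a genuine pointer and an integer
    0x2000: (16, [0, 0]),
    0x3000: (16, [7, 9]),        # unreachable: nothing really points here
}

precise = conservative_mark([0x1000], heap)
# The stack also holds the integers 0x3008 and 12; 0x3008 happens to fall
# inside the object at 0x3000, so it acts as a false pointer.
conservative = conservative_mark([0x1000, 0x3008, 12], heap)
leaked = conservative - precise
```

The object at `0x3000` is retained solely because an integer looked like an address inside it, which is the leakage the text describes.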
Not only can false pointers cause memory leakage, but they also preclude copying. When a copying collector finds a reachable object, it creates a new one, copies the contents of the old object into it, deletes the original object, and overwrites all pointers to the old object with the address of the new object. If the overwritten value was not a pointer, but instead was the value of a variable, this false pointer cannot be altered by the collector. This problem can be partly solved by moving only objects that are not referenced through false pointers, as in Bartlett’s Mostly Copying collection algorithm (Bartlett 1990).
If true pointers cannot be recognized, then the collector may not copy any objects after they are created. One of the chief advantages of copying collectors, reference locality, is lost (Moon 1984). A conservative collector can also cause a substantial increase in the size of a process’s working set as long lived objects become scattered over a large number of pages. Memory becomes fragmented as the storage freed from dead objects of varying sizes becomes interspersed with long lived live ones. This problem is no different than the one faced by traditional explicit memory allocation systems such as malloc/free in widespread use in the C and C++ community. Solutions to this problem may be readily transferable between garbage collection and explicit memory allocation algorithms.
The trace or sweep phases of garbage collection, which are not present in explicit memory allocation systems, can dramatically alter the paging behavior of a program. Implementations of copying collectors already adjust the order in which reachable objects are traced during the mark phase to minimize the number of times each page must be brought into main memory. Zorn has shown that isolating the mark bits from the objects in a Mark-Sweep collector and other improvements also reduce collector-induced paging. Generational collectors also dramatically reduce the number of pages referenced (Moon 1984).
Even though generational collectors reduce pause times, work is also being done to make garbage collection suitable for the strict deadlines of real-time computing. Baker (Baker 1978) suggested incremental collection, which interleaves collection with the allocating program (mutator) rather than stopping it for the entire duration of the collection. Each time an object is allocated, the collector does enough work to ensure the current collection completes before another one is required.
Incremental collectors must ensure that traced objects (those that have already been scanned for pointers) are not altered, for if a pointer to an otherwise unreachable object is stored into a previously scanned object, that pointer will never be discovered and the object, which is now reachable, will be erroneously reclaimed. Although originally maintained by a read barrier (Baker 78), this invariant may also be maintained by a write barrier. The write barrier detects when a pointer to an untraced object is stored into a traced one, which is then retraced. Notice that this barrier may be implemented by the same method as the one for the remembered set in generational collectors; only the set of objects monitored by the barrier changes. Nettles and O’Toole (Nettles & O’Toole 93) relaxed this invariant in a copying collector by using the write barrier to monitor stores into threatened objects and altering their copies before de-allocation. Because incremental collectors are often used where performance is critical, any technology to improve write barrier performance is important to these collectors. Conversely, high-performance collection of any type is more widely useful if designed so it may be easily adapted to become incremental. This dissertation will not explicitly discuss incremental collection further, but keep in mind that write barrier performance applies to incremental as well as generational collectors.
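The invariant and the retracing write barrier can be sketched with the usual three-colour abstraction (white for untraced, grey for queued, black for traced). The colour names, the `Node` and `IncrementalMarker` classes and the step budget are assumptions made for this illustration rather than details of any collector cited above.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE        # untraced


class IncrementalMarker:
    def __init__(self, roots):
        self.grey = []
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if obj.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)

    def step(self, budget=1):
        """Scan up to `budget` objects; return True once marking is done."""
        while budget > 0 and self.grey:
            obj = self.grey.pop()
            for child in obj.refs:
                self._shade(child)
            obj.color = BLACK     # traced
            budget -= 1
        return not self.grey

    def write(self, obj, target):
        """Write barrier: a traced object receiving a pointer to an
        untraced one is queued to be retraced."""
        obj.refs.append(target)
        if obj.color == BLACK and target.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)


def run(use_barrier):
    a, b, c = Node("a"), Node("b"), Node("c")
    a.refs = [b]
    b.refs = [c]
    marker = IncrementalMarker([a])
    marker.step(1)                # a is now traced; b is queued
    # The mutator moves the only pointer to c from b into the traced a.
    if use_barrier:
        marker.write(a, c)
    else:
        a.refs.append(c)
    b.refs.remove(c)
    while not marker.step(1):
        pass
    return c.color


with_barrier = run(use_barrier=True)
without_barrier = run(use_barrier=False)
```

Without the barrier the reachable object `c` is still white when marking finishes and would be reclaimed in error; with the barrier, `a` is retraced and `c` is found.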
2.4 Related Work
This dissertation combines and expands upon the work done by several key researchers. Xerox PARC developed a formal model and the concept of explicit threatened and immune sets. Ungar and Jackson developed a dynamic promotion policy. Hosking, Moss and Stefanovic compared the performance of various write barriers for precise collection, and Zorn showed that inline write barriers can be quite efficient. I shall now describe each of these works and then introduce the key contributions this dissertation will make and how they relate to the previous work.
2.4.1 Theoretical Models and Implementations
Researchers at Xerox PARC have developed a powerful formal model for describing the parameter spaces for collectors that are both generational and conservative. A garbage collection becomes a mapping from one storage state to another. They show that storage states may be partitioned into threatened and immune sets. The method of selecting these sets induces a specific garbage collection algorithm. A pointer augmentation provides the formalism for modeling remembered sets and imprecise pointer identifications. Finally, they show how the formalism may be used to combine any generational algorithm with a conservative one. They used the model to design and then implement two different conservative generational garbage collectors. Their Sticky Mark Bit collector uses two generations and promotes objects surviving a single collection. A refinement of this collector (Collector II) allows objects allocated beyond an arbitrary point in the past to be immune from collection and tracing. This boundary between old objects, which are immune, and the new objects, which are threatened, is called the threatening boundary. More recently, these authors have received a software patent covering their ideas.
Until now, Collector II was the only collector that made the threatening boundary an explicit part of the algorithm. It used a fixed threatening boundary and time scale that advanced only one unit per collection. This choice was made to allow an easy comparison with a non-generational collector, not to show the full capability of such an idea.
Both collectors show that the use of two generations substantially reduces the number of pages referenced by the collector during each collection. However, these collectors exhibited very high CPU overhead; the generational collectors frequently doubled the total CPU time. In later work, they implemented a Mostly Parallel concurrent two-generation conservative Sticky Mark Bit collector for the PCedar language. This combination substantially reduced pause times for collection compared to a simple full-sweep collector for the two programs they measured. These collectors used page protection traps to maintain the write barrier. They did so by write-protecting the entire heap address space and installing a trap handler to update a dirty bit for the first write to each page. Pause times were reduced by conducting the trace in parallel with the mutator. Once the trace was complete, they stopped the mutator and retraced objects on all pages that were flagged as dirty. All their collectors shared the limitation that, once promoted to the next generation, objects were only reclaimed when a full collection occurred, so scavenger updates to the remembered set were not addressed. Tenured garbage could only be reclaimed by collecting the entire heap. My work extends theirs by exploiting the full power of their model to dynamically update the threatening boundary at each collection rather than relying only upon a simple fixed-age or full-collection policy.
2. 4. 2. Feedback-Mediation
Ungar and Jackson measured the effect of a dynamic promotion policy, Feedback Mediation, upon the amount of tenured garbage and pause times for four six-hour Smalltalk sessions (UJ). They observed that object lifetime distributions are irregular and that object lifetime demographics can change during execution of the program. This behavior affects a fixed-age tenuring policy in two ways: it causes long pause times when a preponderance of young objects leads to too little tenuring, and excessive garbage when old objects lead to too much tenuring.
They attempted to solve this problem using two different approaches. First, they placed pointer-free objects (bitmaps and strings) larger than one kilobyte into a separate area; this approach was effective because such objects need not be traced and are expensive to trace and copy. Second, they devised a dynamic tenuring policy that used feedback mediation and demographic information to alter the promotion policy so as to limit pause times. Rather than promoting objects after a fixed number of collections, feedback mediation only promoted objects when a pause time constraint was exceeded because a high percentage of data survived a scavenge and would be costly to trace again. To determine how much to promote, they maintained object demographic information as a table containing the number of bytes surviving at each age (where age is the number of scavenges). The tenuring threshold was then set so the next scavenge would likely promote the number of bytes necessary to reduce the size of the youngest generation to the desired value.
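The threshold selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ungar and Jackson's implementation: the function name, the table layout `bytes_by_age`, and the parameter `desired_survivor_bytes` are hypothetical, and it is assumed that every object whose age is at or above the threshold is promoted at the next scavenge.

```python
def choose_tenuring_threshold(bytes_by_age, desired_survivor_bytes):
    """Pick the largest age threshold (that is, the least promotion) such that
    promoting every object of age >= threshold leaves at most
    desired_survivor_bytes in the youngest generation.

    bytes_by_age[a] is the number of bytes that have survived a scavenges
    (hypothetical layout of the demographic table).
    Returns None when no promotion is needed.
    """
    total = sum(bytes_by_age)
    if total <= desired_survivor_bytes:
        return None  # pause-time constraint is met, so nothing is tenured
    remaining = total
    # Promote the oldest age groups first until enough bytes have left.
    for age in range(len(bytes_by_age) - 1, -1, -1):
        remaining -= bytes_by_age[age]
        if remaining <= desired_survivor_bytes:
            return age
    return 0  # only reached for a negative target: promote everything
```

For example, with 400, 300, 200, and 100 bytes surviving at ages 0 through 3 and a target of 600 bytes, the sketch selects a threshold of 1, so that 600 bytes are promoted and 400 remain.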
Their collector appears similar to Collector II in that it uses an explicit threatening boundary, but differs because it does so for promotion only, not for selecting the immune set directly. My work extends theirs by allowing objects to be demoted. Their object promotion policies can be modeled by advancing the threatening boundary by an amount determined by the demographic information each time the pause time constraint is exceeded. I extend this policy by moving the threatening boundary backward in time to reclaim the tenured garbage that was previously promoted.
Hanson implemented a movable threatening boundary for a garbage collector for the SNOBOL-4 programming language. After each collection, surviving objects were moved to the beginning of the allocated space and the remaining (now contiguous) space was freed. Allocation subsequently proceeded in sequential address order from the free space. After the mark phase, and before the sweep phase, the new threatening boundary was set to the address of the lowest unmarked object found by a sequential scan of memory. This action corresponds to a policy of setting the threatening boundary to the age of the oldest unmarked object before each sweep. His scheme is an optimization of a full copying garbage collector that saves the cost of copying long-lived objects. His collector must still mark and sweep the entire memory space.
2. 4. 3. Write Barrier Performance
Hosking, Moss, and Stefanović at the University of Massachusetts evaluated the relative performance of various inline write barrier implementations for a precise copying collector using five Smalltalk programs. They developed a language-independent garbage collector toolkit for copying, precise, generational garbage collection which, like Ungar and Jackson’s collector, maintains a large object space. They compared the performance of several write barrier implementations (card marking using either inline store checks or virtual memory, and explicit remembered sets) and presented a breakdown of scavenge time for each write barrier and program. Their research showed that maintaining the remembered sets explicitly outperformed other approaches in terms of CPU overhead for Smalltalk.
Zorn (Zor) showed that an inline write barrier exhibited lower than expected CPU overheads compared with using operating system page protection traps to maintain a virtual memory write barrier. Specifically, he concluded that carefully designed inline software tests appear to be the most effective way to implement the write barrier and result in overheads of 2-6%.
In separate work, he showed properly designed mark-sweep collectors can significantly reduce the memory overhead for a small increase in CPU overhead in large LISP programs. These results support the notion that using an inline write barrier and non-copying collection can improve performance of garbage collection algorithms.
Ungar and Jackson’s collector provided a powerful tool for reducing the creation rate of tenured garbage by adjusting the promotion policy dynamically. I take this policy a step further and adjust the generation boundary directly instead. PARC’s Collector II maintains such a threatening boundary, but they measured only the case where the time of the last collection was considered. I alter the threatening boundary dynamically before each scavenge which, unlike Ungar and Jackson’s collector, allows objects to be untenured, and hence further reduces memory overhead due to tenured garbage. Unlike the authors of other generational garbage collection algorithms, I have adopted PARC’s notation for immune and threatened sets, which simplifies the specification of my collector relative to other generational collectors. In order to avoid compiler modifications, previous conservative collectors have used page protection calls to the operating system for maintaining the write barrier. Recent work has shown that program binaries may be modified without compiler support. Tools exist, such as QPT, Pixie, and ATOM, that alter the executable directly to do such tasks as trace generation and profiling. The same techniques may be applied to generational garbage collectors to add an inline write barrier by inserting explicit instructions to check for pointer stores into the heap.
Previous work has only evaluated inline write barriers for languages other than C, e.g. LISP, Smalltalk, and Cedar. I evaluate the costs of using an inline write barrier for compiled C programs. Generational copying collectors avoid destroying the locality of the program by compacting objects; conservative, non-copying collectors cannot do this compaction. Even so, Zorn showed that mark-sweep collectors can perform well, and malloc/free systems have been working in C and C++ for years with the same problem. In previous work I have examined the effectiveness of using the allocation site to predict short-lived objects. For the five C programs measured in that paper, typically over [...]% of all objects were short-lived and the allocation site often predicted over 80% of them. In addition, over 40% of all dynamic references were to predictable short-lived objects. By using the allocation site and object size to segregate short-lived objects into a small (64 K-byte) arena, short-lived objects can be prevented from fragmenting memory occupied by long-lived ones. Because most references are to short-lived objects now contained in a small arena, the reference locality is significantly improved. In this document, I will discuss new work based upon lifetime prediction and store behavior to show future opportunities for applying the prediction model.
The same could be said of designs for complex software systems. The designer’s task is to choose the simplest dynamic storage allocation system that meets the application’s needs. Which system is chosen ultimately depends upon program behavior. The designer chooses an algorithm, data structure, and implementation based upon the anticipated behavior and requirements of the application. Data of known size that lives for the entire duration of the program may be allocated statically. Stack allocation works well for the stack-like control flow of subroutine invocations. Program portions that allocate only fixed-size objects lead naturally to the idea of using explicit free lists to minimize memory fragmentation. The observation that the survival rate of objects is lower for the youngest ones motivated the implementation of generational garbage collection. In all cases, observing the behavior of the program resulted in innovative solutions. All the work presented in this dissertation is based upon concrete measurements of program behavior. Program behavior is often the most important factor in deciding what algorithm or policy is most appropriate. While I present measurements in the context of the above three contributions, they are presented in enough detail to allow current and future researchers to gain useful insight from the behavior measurements themselves. Specifically, I present material about the store behavior of C programs which has not previously appeared elsewhere.
- Implementation of Garbage Collection Algorithm
Any type of dynamic storage allocation system imposes both CPU and memory costs. The costs often strongly affect the performance of the system and pass directly to the purchaser of the hardware as well as to software project schedules. Thus, the selection of the appropriate storage management technique will often be determined primarily by its costs. This chapter will discuss the implementation model for garbage collection so that the experimental methods and results to follow may be evaluated properly. I will proceed from the simplest storage allocation strategies to the more complex strategies, adding refinements and describing their costs as I proceed. For each strategy, I will discuss the outline of the algorithm and data structures, and then I will provide details of the CPU and memory costs. Initially, explicit storage allocation costs will be discussed to provide a context and motivation for the costs of the simplest garbage collection algorithms: mark-sweep and copy. Lastly, the more elaborate techniques of conservative and generational garbage collection are discussed.
3. 1 Explicit Storage Allocation
Explicit dynamic storage allocation (DSA) provides two operations to the programmer: allocate and de-allocate. Allocate creates uninitialized contiguous storage of the required size for a newly allocated object and returns a reference to that storage. De-allocate takes a reference to an object and makes its storage available for future allocation by adding it to a free list data structure (objects in the free list are called de-allocated objects). A size must be maintained for each allocated object so that de-allocate can update the free list properly. Allocate gets new storage either from the free list or by calling an operating system function. Allocate searches the free list first. If an appropriately sized memory segment is not available, allocate either breaks up an existing segment from the free list (if available) or requests a large segment from the operating system and adds it to the free list.
Correspondingly, de-allocate may coalesce segments with adjacent addresses into a single segment as it adds new entries to the free list (boundary tags may be added to each object to make this operation easier). The implementation is complicated slightly by alignment constraints of the CPU architecture, since the storage must be appropriately aligned for access to the returned objects. The costs of this strategy, in terms of CPU and memory overhead, depend critically upon the implementation of the free list data structure and the policies used to modify it. The CPU cost of allocation depends upon how long it takes to find a segment of the specified size in the free list (if present), possibly fragment it, remove it, and return the storage to the program. The CPU cost of deallocation depends upon the time to insert a segment of the specified address and size into the free list and coalesce adjacent segments. The total CPU overhead depends upon the allocation rate of the program, as measured by the ratio of the total number of instructions required by the allocation and deallocation routines to the total number of instructions executed.
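The allocate and de-allocate operations described above can be illustrated with a small simulation. The sketch below is only one possible arrangement, assuming a first-fit policy over an address-ordered free list and a simulated address range; the class name `FreeListAllocator` and its fields are hypothetical, and alignment and boundary tags are ignored.

```python
class FreeListAllocator:
    """First-fit allocator over a simulated address range [0, heap_size)."""

    def __init__(self, heap_size):
        self.free = [(0, heap_size)]   # address-ordered (start, size) segments
        self.sizes = {}                # start address -> size of allocated object

    def allocate(self, size):
        for i, (start, seg_size) in enumerate(self.free):
            if seg_size >= size:
                if seg_size == size:
                    del self.free[i]                                 # exact fit
                else:
                    self.free[i] = (start + size, seg_size - size)   # split segment
                self.sizes[start] = size
                return start
        return None  # a real allocator would request more memory from the OS here

    def deallocate(self, addr):
        size = self.sizes.pop(addr)    # the recorded size tells us what to free
        self.free.append((addr, size))
        self.free.sort()
        merged = []
        for start, seg_size in self.free:
            if merged and merged[-1][0] + merged[-1][1] == start:
                prev_start, prev_size = merged[-1]
                merged[-1] = (prev_start, prev_size + seg_size)      # coalesce neighbors
            else:
                merged.append((start, seg_size))
        self.free = merged
```

Freeing two non-adjacent objects leaves two separate segments, so a request larger than either one fails even though enough total space is free. This is the external fragmentation that coalescing can only partly repair.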
The memory overhead consists entirely of space consumed by objects in the free list waiting to be allocated (external fragmentation) (Ran), assuming that internal fragmentation and the space consumed by the size fields and boundary tags are negligible.
Internal fragmentation is caused by objects that were allocated more storage than required, either to meet alignment constraints or to avoid creating too small a free space element; careful tuning is often done to the allocator to minimize this internal fragmentation.
The data structure required to maintain the free list may often be ignored because it can be stored in the free space itself. The amount of storage consumed by items in the free list depends highly upon the program behavior and upon the policy used by allocate to select among multiple eligible candidates in the free list. For example, if the program interleaves creation of long-lived objects with many small short-lived ones and then later creates large objects, most of the items in the free list will be unused. Memory overheads (as measured by the ratio of the size of the free space to the total memory required) of thirty to fifty percent are not unexpected (Knu), which leaves much room for improvement (CL).
The total memory overhead depends upon the size of the free space as compared to the total memory required by the program. This free list overhead is the proper one to use for comparing explicit dynamic storage allocation space overheads to those of garbage collection algorithms, since garbage collection can be considered to be a form of deferred deallocation. Often, both the CPU and memory costs of explicit deallocation are unacceptably high. Programmers often write specific allocation routines for objects of the same size and maintain a free list for those objects explicitly, thereby avoiding both memory fragmentation and high CPU costs to maintain the free list. But, as the number of distinct object sizes increases, the space consumed by the multiple free lists becomes prohibitive. Also, the memory savings depend critically upon the programmer’s ability to determine as soon as possible when storage is no longer required. When allocated objects may have more than one reference to them (object sharing), high CPU costs can occur as code is invoked to maintain reference counts. Memory can be wasted by circular structures or by storage that is kept live longer than necessary to ensure program correctness.
3. 2 Mark-Sweep Garbage Collection
Mark-sweep garbage collection relieves the programmer from the burden of invoking the deallocate operation; the collector performs the deallocation. In the simplest case, there is assumed to be a finite fixed upper bound on the amount of memory available to the allocate function. When the bound is exceeded, a garbage collector is invoked to search for and deallocate objects that will never be referenced again. The mark phase discovers reachable objects, and the sweep phase deallocates all unmarked objects. A set of mark bits is maintained, one mark bit for each allocated object. A queue is maintained to record reachable objects that have not yet been traced. The algorithm proceeds as follows. First, the queue is empty, all the mark bits are cleared, and the search for reachable objects begins by adding to the queue all roots, that is, statically allocated objects, objects on the stack, and objects pointed to by CPU registers. As each object is removed from the queue, its contents are scanned sequentially for pointers to allocated objects. As each pointer is discovered, the mark bit for the object being pointed to is tested and set and, if it was previously unmarked, the object is queued. The mark phase terminates when the queue is empty. Next, during the sweep phase, the mark bit for each allocated object is examined and, if clear, deallocate is called with that object.
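The mark and sweep phases just described can be sketched over a toy object graph. This is a minimal illustration, assuming the heap is represented as a dictionary from object identifiers to the identifiers they point to; the function name `mark_sweep` and that representation are hypothetical, and the sweep simply returns the objects on which deallocate would be called.

```python
from collections import deque

def mark_sweep(heap, roots):
    """heap maps an object id to the list of object ids it points to.
    Returns the set of ids that the sweep phase would deallocate."""
    marked = {obj: False for obj in heap}      # clear all mark bits
    queue = deque()
    for obj in roots:                          # enqueue the roots
        if not marked[obj]:
            marked[obj] = True
            queue.append(obj)
    while queue:                               # mark phase ends when queue is empty
        obj = queue.popleft()
        for child in heap[obj]:                # scan contents for pointers
            if not marked[child]:              # test and set the mark bit
                marked[child] = True
                queue.append(child)
    # Sweep phase: every object whose mark bit is still clear is garbage.
    return {obj for obj in heap if not marked[obj]}
```

Because reachability is computed from the roots, an unreachable circular structure is reclaimed even though its members still point to each other.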
As a refinement, the implementor may use a set instead of a queue and may choose an order other than first-in-first-out for removing elements from the set. Mark-sweep collection adds CPU costs over explicit DSA for clearing the mark bits and, for each reachable object, setting the mark bit, enqueuing, scanning, and dequeuing. In addition, the mark bit must be tested for each allocated object, and each unreachable object must be located and de-allocated. Deferred sweeping may be used to reduce the length of pauses caused when the collector interrupts the application. For deferred sweep, the collector resumes the program after the mark phase. Subsequent allocate requests test mark bits, deallocating unmarked objects until one of the required size is found. Deferred sweeping should be completed before the next collection is invoked, since starting a collection when memory is available is probably premature. The first component of the memory cost for mark-sweep is the same as for explicit deallocation where the deallocation for each object is deferred until the next collection. This cost can be very significant, often one and one half to three times the memory required by explicit deallocation. In addition to the size, a mark bit must be maintained for each allocated object.
Memory for the queue that maintains the set of objects to be traced must be managed by clever means to avoid becoming excessive. A brute force technique to handle queue overflow is to discard the queue and restart the mark phase without clearing the previously set mark bits. If at least one mark bit is set before the queue is discarded, the algorithm will eventually terminate. Virtual memory makes it attractive to collect more frequently than each time the entire virtual address space is exhausted. The frequency of collection affects both the CPU and memory overhead. As collections occur more frequently, the memory overhead is reduced because unreachable objects are deallocated sooner, but the CPU overhead rises as objects are traced multiple times before they are deallocated. The two degenerate cases are interesting. Collecting at every allocation uses no more storage than explicit deallocation, but at the maximal CPU cost; no collection at all has the minimum CPU overhead of explicit deallocation with a zero-cost deallocate operation, but consumes the most memory. The latter case may often be the best for short-lived programs that must be composed rapidly.
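Deferred sweeping can be pictured as moving the sweep loop into allocate. The following is a rough sketch under simplifying assumptions: the class name `DeferredSweepHeap` is hypothetical, objects are given as (address, size) pairs, and a freed segment is reused whole rather than split.

```python
class DeferredSweepHeap:
    """After the mark phase, sweeping is done incrementally by allocate."""

    def __init__(self, objects, marked):
        self.objects = objects    # list of (address, size) for allocated objects
        self.marked = marked      # set of addresses the mark phase found reachable
        self.cursor = 0           # next object whose mark bit is still untested
        self.free = []            # (address, size) segments swept by earlier requests

    def allocate(self, size):
        # Reuse storage swept by an earlier request, if any segment fits.
        for i, (addr, seg_size) in enumerate(self.free):
            if seg_size >= size:
                del self.free[i]
                return addr
        # Otherwise continue the sweep where the last request stopped.
        while self.cursor < len(self.objects):
            addr, obj_size = self.objects[self.cursor]
            self.cursor += 1
            if addr in self.marked:
                continue                      # reachable object: keep it
            if obj_size >= size:
                return addr                   # deallocated and immediately reused
            self.free.append((addr, obj_size))
        return None                           # sweep complete: a new collection is due
```

Each request pays only for the part of the sweep it needs, which spreads the sweep cost across allocations instead of adding it to the collection pause.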
The designer of the collector must tune the collection interval to match the resources available. Although this dissertation will not discuss it further, policies for setting the collection interval are an interesting topic in their own right, and there is much room for future research. As mentioned earlier, during explicit dynamic storage deallocation, fragmentation can consume a significant portion of available memory, especially for systems that have high allocation and deallocation rates of objects of a wide variety of sizes and lifetimes. Other researchers have observed that the vast majority of objects have very short lifetimes, under one megabyte of allocation or a few million instructions. This observation motivates two other forms of garbage collection, copying collection, which reduces fragmentation and sweep costs, and generational collection, which reduces trace times for each collection.
3. 3 Copying Garbage Collection
Copying garbage collection marks objects by copying them to a separate empty address space, to-space. Mark bits are unnecessary because an address in to-space implicitly marks the object as reachable. After each object is copied, the address of the newly copied object is written into the old object’s storage. The presence of this forwarding pointer indicates a previously marked object that need not be copied each subsequent time the object is visited. As each object is copied or a reference to a forwarding pointer is discovered, the collector overwrites the original object reference with the address of the new copy. The sweep phase does not require examining mark bits or explicit calls to de-allocate each unmarked object. Instead, the unused portion of to-space and the entire old address space, from-space, become the new free list, new-space.
Allocation from new-space becomes very inexpensive: incrementing an address, testing it for overflow, and returning the previous address. Collection occurs each time the test indicates overflow of the size of to-space. No explicit free list management is required. Copying collection adds CPU overhead for the copying of the contents of each of the reachable objects. Memory overhead is added for maintaining a copy in to-space during the collection, but fragmentation is eliminated because copying makes the free list a contiguous new-space. To-space may be kept small by ensuring that the survival rate is kept low by increasing the collection interval. Copying collection can only be used where pointers can be reliably identified. If a value that appears to point to an object is changed to reflect the updated object’s address and that value is not a pointer, the program semantics would be altered.
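A copying collection can be sketched as follows. The traversal order is not specified in the text; this illustration assumes a Cheney-style breadth-first scan of to-space. The function name and data layout are hypothetical: pointer fields are Python integers, non-pointer values are strings, and forwarding pointers are kept in a side table rather than written into the old object's storage.

```python
def copy_collect(from_space, roots):
    """from_space maps an address to a list of fields; an int field is a
    pointer into from_space and a str field is a non-pointer value.
    Returns (to_space, new_roots), where an index in to_space is the new address."""
    to_space = []          # contiguous: survivors are packed with no gaps
    forwarding = {}        # old address -> new address (forwarding pointers)

    def forward(addr):
        if addr not in forwarding:              # first visit: copy the object
            forwarding[addr] = len(to_space)    # bump allocation in to-space
            to_space.append(list(from_space[addr]))
        return forwarding[addr]                 # later visits reuse the new address

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):                 # scan copied objects in order
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, int):          # relies on precise pointer identification
                obj[i] = forward(field)         # overwrite reference with new address
        scan += 1
    return to_space, new_roots
```

Unreachable objects are never visited, so the cost is proportional to the surviving data, and the survivors end up contiguous at the start of to-space.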
3. 4 Conservative Garbage Collection
Unlike copying collectors, conservative collectors may be used in languages where pointers are difficult to identify reliably. Conservative collectors are conservative in two ways: they assume that values are pointers for the purposes of determining whether an object is reachable, and that values are not pointers when considering an object for movement. They will not deallocate any object (or its descendants) referenced only by a value that appears to be a pointer, and they will not move an object once it has been allocated. Conservative garbage collection requires a pointer finding heuristic to determine which values will be considered potential pointers. More precise heuristics avoid unnecessarily retained memory caused by misidentified pointers, at the cost of additional memory and CPU overhead. The heuristic must maintain all allocated objects in a data structure that is accessed each time a value is tested for pointer membership. The test takes a value that appears to be an address and returns true if the value corresponds to an address pointing into a currently allocated object. This test will occur for each value contained in each traced root or heap object during the mark phase.
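One simple way to realize the pointer membership test is a sorted table of allocated objects searched by binary search. The sketch below is an assumption made for illustration, not the data structure of any particular collector; the class name `PointerFinder` and its methods are hypothetical.

```python
import bisect

class PointerFinder:
    """Tracks allocated objects so arbitrary values can be tested as pointers."""

    def __init__(self):
        self.starts = []     # sorted start addresses of allocated objects
        self.sizes = {}      # start address -> object size

    def insert(self, start, size):
        """Called at each allocation."""
        bisect.insort(self.starts, start)
        self.sizes[start] = size

    def remove(self, start):
        """Called at each deallocation."""
        self.starts.remove(start)
        del self.sizes[start]

    def find_object(self, value):
        """Return the start of the allocated object containing value,
        or None if value does not point into any allocated object."""
        i = bisect.bisect_right(self.starts, value) - 1
        if i < 0:
            return None
        start = self.starts[i]
        return start if value < start + self.sizes[start] else None
```

During the mark phase every word in a root or heap object would be passed to `find_object`. An integer that merely happens to fall inside an allocated object is treated as a pointer, which is the source of unnecessarily retained memory.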
The precise cost of the heuristic depends highly upon the architecture of the computer, operating system, language, compiler, runtime environment, and the program itself. The Boehm collector usually requires a number of instructions on the DEC Alpha to map a value to the corresponding allocated object descriptor. In addition to the trace cost, CPU overhead is incurred to insert an object into the pointer finding data structure at each allocation, and to remove it at each deallocation. As with mark-sweep, deferred sweep may be used.
In addition to the memory for the mark bits previously mentioned for mark-sweep, conservative collectors require space for the pointer finding data structure. On the DEC Alpha, the Boehm collector uses a two-level hash table to map addresses to a page descriptor. All objects on a page are the same size. Six pointer-sized words per virtual-memory-sized page are required. The space for page descriptors is interleaved throughout dynamically allocated memory in pages that are never deallocated.
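A two-level lookup from an address to a page descriptor can be sketched as below. This is not the Boehm collector's code: the page size, the bit split, the class name `PageTable`, and the descriptor layout are all assumptions chosen for illustration. The only property taken from the text is that every object on a page has the same size.

```python
PAGE_BITS = 13          # assumed page size of 8 KiB
LOW_BITS = 10           # assumed number of page-number bits per second-level table

class PageTable:
    def __init__(self):
        self.top = {}   # high bits of the page number -> second-level table

    def add_page(self, page_start, object_size):
        """Register a page (page_start must be page aligned)."""
        page = page_start >> PAGE_BITS
        second = self.top.setdefault(page >> LOW_BITS, {})
        second[page & ((1 << LOW_BITS) - 1)] = {"start": page_start,
                                                "object_size": object_size}

    def object_base(self, value):
        """Return the base address of the object slot containing value,
        or None if value does not fall on a page known to the collector."""
        page = value >> PAGE_BITS
        second = self.top.get(page >> LOW_BITS)
        if second is None:
            return None
        desc = second.get(page & ((1 << LOW_BITS) - 1))
        if desc is None:
            return None
        # Uniform object size per page turns the search into simple arithmetic.
        offset = value - desc["start"]
        return desc["start"] + (offset // desc["object_size"]) * desc["object_size"]
```

A real collector would additionally check that the slot is currently allocated before treating the value as a pointer.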
3. 5 Generational Garbage Collection
Recall that generational garbage collectors attempt to reduce collection pauses by partitioning memory into one or more generations based upon the allocation time of an object; the youngest objects are collected more frequently than the oldest. Objects are assigned to generations and are promoted to older generations as they age, and a write barrier is used to maintain the remembered set for each generation. The memory overhead consists of generation identifiers, tenured garbage, and the remembered set. Also, partition fragmentation can increase memory consumption for copying generational collectors when the memory space reserved for one generation cannot be used for the other generation. The CPU overhead consists of costs for promoting objects, the write barrier, and updating the remembered set. Each of these costs is discussed in this section. An understanding of them is required to evaluate the results presented in the experimental chapters later in this dissertation. The collector must keep track of which generation each object belongs to. For copying collectors, the generation is encoded by the object’s address. For mark-sweep collectors, the generation must be maintained explicitly, usually by clustering objects into blocks of contiguous addresses and maintaining a word in the block encoding the generation to which all objects within the block belong.
As objects age, they may be promoted to older generations either by copying or by changing the value of the corresponding generation field. Tenured garbage is memory overhead that occurs in generational collectors when objects in promoted generations are not collected until long after they become unreachable. In a sense, all garbage collectors generate tenured garbage from the time objects become unreachable until the next collection, and memory leaks are the tenured garbage of explicit dynamic storage allocation systems. In a generational collector, such garbage persists until the next scavenge of the generations containing it. One of the central research contributions of this dissertation is to quantify the amount of tenured garbage for some applications, to show how it may be reduced, and to show how that reduction can impact total memory requirements.
In order to avoid tracing objects in generations older than the one currently being collected, a data structure, called the remembered set, is maintained for each generation. The remembered set contains the locations of all pointers into a generation from objects outside that generation. The remembered set is traced along with the root set when the scavenge begins. PARC’s formal model called the remembered set a pointer augmentation, and each element of the set was called a rescuer. This additional tracing guarantees that the collector will not erroneously collect objects in the younger, traced generation that are reachable only through indirection through the older, untraced generations. CPU overhead occurs during the trace phase in adding the appropriate remembered set to the roots, and in scanning each object pointed to from the remembered set.
A heuristic to reduce the size and memory overhead of the remembered set is often (indeed, universally) used: only pointers from generations older than the scavenged generation are recorded, but at the cost of requiring all younger generations to be traced. This heuristic makes a time-space trade-off, accepting increased CPU overhead for tracing younger generations in order to reduce the size of the remembered set, based upon the assumption that forward-in-time pointers (pointers from older objects to younger ones) are rare. If objects containing pointers are rarely overwritten after being initialized, then the assumption would appear to be justified; however, empirical evidence for this assumption is often lacking in the literature when generational garbage collection is used in a specific language environment. Still, collecting all younger generations does have the advantage of reducing circular structures crossing generation boundaries. The write barrier adds pointers to the remembered set as they are created by the application program. Each store that creates a pointer into a younger generation from an older one inserts that pointer into the remembered set. The write barrier may be implemented either by an explicit inline instruction sequence or by virtual memory page protection traps. The CPU cost of the instruction sequence consists of instructions inserted at each store. The sequence tests for creation of a forward-in-time intergenerational pointer and inserts the address of each such pointer into the remembered set.
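The inline form of the write barrier amounts to a test wrapped around every pointer store. The sketch below illustrates that test under assumed conventions: generation 0 is taken to be the youngest, a location is represented as an (object, field) pair, and the class and method names are hypothetical.

```python
class Heap:
    def __init__(self):
        self.generation = {}       # object id -> generation number (0 is youngest)
        self.fields = {}           # object id -> dict of field name -> value
        self.remembered = {}       # generation -> set of (object id, field name)

    def new_object(self, obj, generation):
        self.generation[obj] = generation
        self.fields[obj] = {}

    def store(self, obj, field, target):
        """Perform the store, then run the inline write barrier check."""
        self.fields[obj][field] = target
        src_gen = self.generation[obj]
        dst_gen = self.generation.get(target)   # None when target is not an object
        # Record only forward-in-time pointers: older object -> younger object.
        if dst_gen is not None and src_gen > dst_gen:
            self.remembered.setdefault(dst_gen, set()).add((obj, field))
```

Stores from younger to older objects, and stores of non-pointer values, fall through the test without touching the remembered set, which is why the per-store cost can stay small. This sketch never removes an entry when a recorded field is later overwritten; such stale entries would have to be filtered out when the remembered set is traced.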
The virtual memory CPU cost consists of delays caused by page write-protect traps used to field the first store to each page in an older generation since the last collection of that generation. Because the cost of page protection traps can be significant (on the order of microseconds), there is motivation for investigating the use of an explicit instruction sequence for the write barrier. When three or more generations exist, updating the remembered sets requires the capability to delete entries. The collector must ensure that unreachable objects discovered and deallocated from scavenged generations are removed from the remembered sets. A crude, but correct, approach is to delete all pointers from the remembered sets for the scavenged generations and then add them back as the trace phase proceeds. Consider an n-generation collector containing generations 0, the youngest, to n-1, the oldest. Before initiating the trace phase, suppose we decide to collect generations k and younger for some k such that 0 <= k < n. We delete from the remembered set for each generation t such that 0 <= t <= k all pointers from generations s such that t < s <= k. As the trace proceeds, any pointer traced that crosses one or more generation boundaries from an older generation s to a younger generation t is then added to the remembered set for the target generation t.
Another approach is to explicitly remove from each generation’s remembered set all entries corresponding to pointers contained in each object as it is scanned. This deletion can occur during the mark phase or as each object is deallocated during the (possibly deferred) sweep phase. The recent literature is not very precise about this, presumably because currently only generational collectors that use two generations are common. In this case, only one remembered set exists (for the younger generation) and it is completely cleared only when a full collection occurs, so precise remembered set update operations are not required.
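The crude delete-and-rebuild approach can be sketched as two small routines. The numbering convention is an assumption made for this sketch: generations run from 0 (youngest) upward, and generations 0 through k are being collected. The function names and the representation of an entry as a (source generation, location) pair are hypothetical.

```python
def clear_scavenged_entries(remembered, k):
    """remembered[t] is a set of (source_generation, location) pairs describing
    pointers into generation t. For every scavenged generation t <= k, drop the
    entries whose source generation s is also scavenged (t < s <= k). Entries
    from generations older than k are kept because those objects are not traced."""
    for t in list(remembered):
        if t <= k:
            remembered[t] = {(s, loc) for (s, loc) in remembered[t] if s > k}

def record_traced_pointer(remembered, s, t, loc):
    """Called during the trace for a pointer stored at loc in generation s that
    refers to an object in generation t; only older-to-younger pointers are kept."""
    if s > t:
        remembered.setdefault(t, set()).add((s, loc))
```

Entries that pointed from objects found unreachable are simply never added back, so the rebuilt sets contain no references from dead objects in the scavenged generations.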
- EVALUATION OF GARBAGE COLLECTION ALGORITHMS
3. 1 Write-Barrier for C
3. 2 Garbage Collection for C++
A number of different mechanisms for adding automatic memory reclamation (garbage collection) to C++ have been considered over the years:
- Smart-pointer-based approaches which recycle objects no longer referenced via special library-defined replacement pointer types. Boost shared_ptrs (in TR1, see N1450=03-0033) are the most widely used example. The underlying implementation