// "sgc.h" Header file for Schiermers Garbage Collector // Include this file to acces the collector. /* SSSSSS GGGGGG CCCCCC SSSSSSSSSS GGGGGGGGGG CCCCCCCCCC SS SS GG GG CC CC SS GG CC SS GG CC SS GG CC SSSSS GG CC SSSSS GG GGGGG CC SS GG GGGGG CC SS GG GG CC SS GG GG CC SS SS GG GG CC CC SSSSSSSSSS GGGGGGGGGG CCCCCCCCCC SSSSSS GGGGGG CCCCCC Rel 0.2 (BETA), September 1999. (C) Finn Schiermer Andersen, schier@diku.dk Root-Finding code by Hans Boehm, boehm@sgi.com, used with permission. SGC is an almost-general-purpose storage manager for C++ programs. SGC uses garbage collection but supports usual new/delete semantics as well. SGC is aimed at, and works best with applications with recurring allocation patterns, many small objects, and no objects larger than 1Mb. SGC is "open source software". You may use and redistribute it freely, provided this notice is retained AND any use or distribution is accompanied by the source code or with an offer detailing where to obtain the code for free. Consult "http://www.opensource.dk/mirror/" for further details about the open source concept and its implications. General Disclaimer: This product is surely broken. It will almost certainly not fit your needs in any way, and is totally useless. If it screws up your application, its your own problem, and the author cannot be held responsible for anything at all. That aside: SGC is both fast and flexible, routinely beating GNUs built in new/delete on both Alpha/OSF1 and Linux/X86 in terms of speed. SGC uses typically 30-80% more memory when used as a traditional new/delete provider, and 80-300% more memory when used as a garbage collector. SGC is currently operational at the following platforms - Linux/x86 ELF platforms, gnu g++ 2.7.2 - DEC OSF1/Alpha, gnu g++ 2.8 SGC does not work with multithreading. */ #ifndef __GC__ #define __GC__ #include /* THE SHORT VERSION of life, the universe and everything... ========================================================= SGC provides the following allocators new(node) Allocate a collectable memory area new(leaf) As new(node), but without pointers to collectable areas new(fixedNode) As new(node), but size of area is known at compile time new(fixedLeaf) As new(leaf), but size of area is known at compile time as well as their corresponding [] versions. and... gc_delete(void*) can be used to release any kind of memory area immediately. Release of memory should be considered an optimization. Call of gc_delete for an object does not invoke its destructor. - Operators new(node) and new(fixedNode) returns a zeroed memory area. - Allocated areas should be relatively small, less than 256Kb is best. (This restriction is motivated in great detail later) Dont hide pointers by shifting, reversing, casting to non-pointer types or representing as distances from other pointers. Such tricks will cause SGC to loose track of live memory, recycle it, and usually crash. Dont use global objects requiring dynamic allocation during initialisation, since initialisation of such objects are unordered with respect to the actual initialisation of SGC itself. Note that destructors are NOT invoked when SGC recycles memory. Use finalization instead. Derive from class finalized and implement any cleanup in the virtual function "void finalize()". Do not rely on any particular order of finalization. Do not use finalization to control any user visible behaviour (e.g. closing of windows). SGC coexists peacefully with the built-in new and delete. Objects allocated using standard new are not scanned for pointers, and consequently garbage collectable areas only accessible from objects allocated through new will be recycled. Global new and delete may be replaced by new(node) and gc_delete by including "catchNew.o" in the link. The delete operator will still invoke destructors. THE LONGER VERSION of life, the universe and everything... ========================================================== GARBAGE COLLECTION and C++. A short introduction ------------------------------------------------ C++ is not designed with garbage collection in mind. Compilers may generate code that are not safe for garbage collection, and programmers are free to disguise pointers any way they like. In practice compilers seem to work well with collectors, though, and the restrictions placed on programmers in order to work with garbage collection are few and easily adopted. Pointers The collector has no acces to type information, and therefore does not know the exact location of pointers. It simply assumes that any bit pattern, that could potentially be a pointer, is exactly that. This approach is sometimes called "conservative" garbage collection. Pointers are freely assignable as usual in C++. Only naturally aligned objects are considered pointers. Thus, you may hide pointers for the collector by shifting them, representing them as distances from another pointer, inverting the bits, and in many other ways. Hidden pointers are not detected by the collector, and an object that is only reachable through hidden pointers is recycled by the collector. Blacklisting and large objects Since any bit pattern resembling a pointer is considered as such, a naive collector may fail to release a considerable amount of dead memory. To minimize this problem SGC uses a method developed by Boehm et.al. called "blacklisting". When the collector detects an unallocated, but reachable, object, it is blacklisted. Blacklisted objects cannot be allocated. In practice, this approach bounds the amount of retained dead memory to a small and constant portion of the heap. Unfortunately, blacklisting leads to small chunks of unusable memory scattered accros the heap. This causes fragmentation and in practice limits the amount of memory one can possibly allocate in a single chunk. There are no easy way around this. SGC supports allocation of large objects from the usual non-collectable free store. If needed, such objects can be indirectly accessed and administered by a smaller garbage collected object. INTERFACE --------- SGC provides several allocation functions optimized for different behaviour. The operator new(node) should be used for general allocation: new(node) Allocate a collectable area possibly containing pointers to collectable objects. Performance can be enhanced by using special variants for certain situations. The special situations are supported by: new(leaf) Allocates a node without pointers to collectable objects new(fixedNode) Allocates a small node with compiler determined size new(fixedLeaf) Allocates a small pointerfree node with compiler det. size. Using new(fixedNode) and new(fixedLeaf) will only be a win for small objects, however. Both variants are inline and causes significant increase in code size if used carelessly, and especially if used when the compiler *cannot* determine the object size. The function gc_delete(void*) causes immediate release of memory. This may lower the memory requirements of the collector substantially, but it usually slows down the application as well. For most applications the garbage collector will be faster on its own. delete causes invocation of the objects destructor as usual. SGC does not invoke destructors on reclaimed objects. Instead it supports finalization. Objects inherited from class 'Finalized' must supply a method 'finalize()' which will be invoked, when the object is to be reclaimed. This method can be used to release external ressources, such as file handles etc. Finalization is _not_ cheap and must be used with care. When a finalizer executes, all objects reachable from the finalizer will be available. Dependencies among finalizers will _not_ be taken into account when finalizers are scheduled. Finalization is asynchronous and may be delayed arbitrarily. Finalization is guaranteed to occur, if not before, then at application exit. A Finalizer may choose to revive its object by making it once again reachable. This is considered _a_ _bad_ _thing_. In any case, finalization is done _only_ once. Calling operator new from within a finalizer has undefined (and usually catastrophic) effects. Objects derived from class Finalized, but allocated as uncollectable, e.g. using normal new or as local variables will not be finalized. Large objects (more than 1M, perhaps more than 256K) should be explicitly managed by the standard C++ mechanisms, since they are not handled well by the collector. In most circumstances you can hide the administration of non-collectable objects using handles. Handles are small garbage collected objects which forward any request to the explicitly administered objects. The handles must then use finalization to release the big objects. A note on initialisation order: The collector is initialised as a global object. This guarantees that the collector is operational when main() is invoked. Order of initialisation among global objects are implementation dependant, and therefore the collector may be initialised _after_ other global objects. SGC and memory usage SGC relies on a good virtual memory system to work well. SGC maps in any memory it'll ever use (exact amount decided at installation) during initialisation. It never releases memory. It decreases the actual portion of the VM that is frequently active, and expects the OS to release physical memory if needed by other applications. This technique causes tools such as 'top' to report far too high memory usage. On Linux this works very well. On some systems it may force excessive allocation of swap space. When using traditional new/delete, it is acceptable to allocate far more memory than the actually available physical memory, relying on the OS to swap out the infrequently used portions. With a tracing collector like SGC this is not an option, since every live (non-leaf) memory cell will be visited during every collection. Therefore the max heap size is limited at installation time to avoid trashing. The cost of garbage collection Garbage collection is surprisingly cheap in terms of CPU-time, but may imply a drastic increase in memory consumption. SGC is optmized for speed, and typically it uses 3-4 times as much memory as a traditional new/delete environment. For the applications tested so far SGC is clearly faster than GNUs new/delete. SGC is a mark-and-sweep collector. During marking the application stops while the collector scans live objects. Depending on your system, SGC will visit from 15 to 250Mb pr second. See below for a detailed amortized cost analysis In many applications large amount of data are known to be free of pointers, e.g. bitmaps, and by allocating such objects using special variants of new the cost of garbage collection may be decreased. Many applications has a rapid turnover of small objects with compiler determined size. SGC offers specially optimized variants of new to handle such objects. Destructors are not invoked when objects are reclaimed and recycled. Usually a destructors sole purpose is to delete other objects, and this task is better left to the collector. The cost of destructor invocation, especially for virtual destructors, is substantial. For objects administering external ressources, e.g. file handles, finalization is supported. Finalization guarantees the invocation of a special method on the object after it is dead but before it is reclaimed. Finalized objects can be used to administer memory for objects allocated on the non-collected heap. To aid in designing and choosing the best allocation schemes, a detailed analysis of the amortized cost of allocation with SGC follows. The cost is expressed in terms of the following primitive operations: - enQ, enqueue operation. Insertion of a block in a singly linked list. - deQ, dequeue operation. Removal of a block from a singly linked list. - sweep, sweeping of an object. This consist of a few bit manipulations and pointer dereferences. - alloc, allocation. This is similar in cost to sweep. - map, mapping a size to a free list. This is proportional to log2 of the objects size. Very fast for small objects. - clear, clearing a word. One memory reference. - trace, tracing a word. Requires two, occasionally four memory references N is the size of the object. L is the average amount of live memory, H is the heap size. R is explained below. operator cost (amortized) new(node) map + enQ + deQ + (1+R)*sweep + alloc + N*clear + R*N*trace new(fixedNode) enQ + deQ + (1+R)*sweep + alloc + N*clear + R*N*trace new(leaf) map + enQ + deQ + (1+R)*sweep + alloc new(fixedLeaf) enQ + deQ + (1+R)*sweep + alloc For comparison a class specific free list (class specific new & delete) costs of enQ+deQ. For typical values of R (see below) this is at least one third (perhaps half) of the cost of new(fixedLeaf). Thus, you can win *at most* a factor of three rolling your own new/delete compared to sgc's new(fixedLeaf). For non-leafs the size of the object is important. For larger objects, in many scenarios, the cost of clearing the object at allocation time is the single most important component of the cost. Explicitly deleting carries a very small constant cost, almost nothing. (well, actually deleting consists of enQ+sweep, but it also reliefs a following new operation of the same operations, so....). The variable R depends on the average amount of live data in the heap. Calling the heap size H and the amount of live data L: R = (L/(H-L)), which for typical L/H ratios is: L/H R R+1 0.33 0.5 1.5 0.5 1 2 0.66 2 3 0.80 4 5 In the default configuration SGC tries to keep R at or below 1 by expanding the heap if needed. Storage requirements may be significantly reduced by choosing an upper limit of say 0.66. Upper limits above 0.70 will usually cause unacceptable CPU overhead. Further, additional overhead is incurred when the heap needs to be defragmented. This is somewhat harder to pin down. It depends heavily on the allocation pattern of the application, how often defragmentation is needed. The cost of defragmentation is substantial, perhaps doubling or trippling the cost of every allocation. Defragmentation is done by first aggressively combining blocks and later splitting them as needed. The L/H ratio can be controlled using the call: void gc_limits(double lower, double upper) where lower minimal L/H ratio. The heap is expanded as needed to maintain the ratio. upper maximal L/H ratio. The heap is contracted as needed to maintain the ratio. you should choose lower < upper :-) The default L/H limits are 0.25 < L/H < 0.5 void gc_leafWeight(double leafWeight) where leafWeight indicates the weight leafs should be assigned when the desired heapsize is computed. This defaults to 0.3, meaning that leafs only counts as 30% of ordinary nodes. Increasing causes higher memory consumption and fewer collections. Finally, the size of the static root set: the data segment, data from linked libraries etc also influences the size of the heap. The cost of scanning those areas for pointers is amortized across all allocations by increasing the heap size. This leads to bigger heaps than what you would expect, especially for programs using very little live heap space. (Even "hello world" often includes libraries with 100K of data area, or more, which sets a minimal heap size of about 200Kb for any application) */ void gc_limits(double lower, double upper); void gc_leafWeight(double leafWeight); // dummy classes for use with placement syntax class NODE {}; class LEAF {}; class FIXED_NODE {}; class FIXED_LEAF {}; // dummy instances for use with placement syntax extern NODE node; extern LEAF leaf; extern FIXED_NODE fixedNode; extern FIXED_LEAF fixedLeaf; // operator new variants inline void* operator new(size_t size, NODE&); inline void* operator new(size_t size, FIXED_NODE&); inline void* operator new(size_t size, LEAF&); inline void* operator new(size_t size, FIXED_LEAF&); inline void gc_delete(void*&); #ifndef OLDCPP inline void* operator new[](size_t size, NODE&); inline void* operator new[](size_t size, FIXED_NODE&); inline void* operator new[](size_t size, LEAF&); inline void* operator new[](size_t size, FIXED_LEAF&); #endif // Virtual base class for finalization. Derive from this class to get // finalization support. class Finalized { protected: Finalized(); virtual ~Finalized(); public: unsigned long finalizerEntry; // _DONT_ tamper with this virtual void finalize() = 0; }; extern void gc_release(void* p); inline void gc_delete(void*& p) { gc_release(p); p=0; }; /* --------------------------------------------------------------------------- IMPLEMENTATION FOLLOWS -- WARNING! WARNING! WARNING! -- SPIFFY CODE AHEAD!! --------------------------------------------------------------------------- */ // LOG_MIN_ENTRY is log to minimal allocation unit. Should be 4 on a // 32-bit arch, 5 on a 64-bit arch. #define LOG_MIN_ENTRY 5 #define MAX_SZ 31 #define SET_ALLOC(buddy) (*buddy) |= (1<<7) #define SET_LEAF(buddy) (*buddy) |= (1<<6) // A _very_ simple singly-linked sentinel-terminated list. // This class implements the collectors free-lists. struct QLink { QLink* next; char* buddy; }; extern QLink sentinel; struct Q { QLink *head; unsigned long puts; unsigned long gets; public: inline int notEmpty() { return head!=0; } inline QLink *get() { QLink *t= head; head = t->next; gets++; return t; }; inline QLink *revertPut() { QLink *t = head; head = t->next; puts--; return t; }; inline void undoGet() { gets--; }; inline void undoRevertPut() { puts++; }; inline int isValid(QLink* p) { return (p!=&sentinel); }; inline void put(QLink *p) { p->next = head; head = p; puts++; }; inline void reset() { puts = gets = 0; head = &sentinel; }; inline unsigned long putted() { return puts; }; inline unsigned long getted() { return gets; }; inline unsigned long entries(){ return puts-gets; }; }; extern Q qs[MAX_SZ+1]; // qs holds the size-specific free-lists extern void* newNode(size_t size); extern void* newLeaf(size_t size); extern void* newNoGC(size_t size); typedef long long byte8; inline void* operator new(size_t size, NODE& dummy) { return newNode(size); }; inline void* operator new(size_t size, LEAF& dummy) { return newLeaf(size); }; inline void* operator new(size_t size, FIXED_NODE& dummy) { if (size <= (1<buddy); byte8* d = (byte8*) p; d[0] = 0; d[1] = 0; return p; } else if (size <= (1<<(1+LOG_MIN_ENTRY))) { void *p = qs[1].get(); if (p==&sentinel) { qs[1].undoGet(); return newNode(size); }; SET_ALLOC(((QLink*)p)->buddy); byte8* d = (byte8*) p; d[0] = 0; d[1] = 0; d[2] = 0; d[3] = 0; return p; } else return newNode(size); }; inline void* operator new(size_t size, FIXED_LEAF& dummy) { if (size <= (1<buddy); SET_LEAF(((QLink*)p)->buddy); return p; } else if (size <= (1<<(1+LOG_MIN_ENTRY))) { void *p = qs[1].get(); if (p==&sentinel) { qs[1].undoGet(); return newLeaf(size); }; SET_ALLOC(((QLink*)p)->buddy); SET_LEAF(((QLink*)p)->buddy); return p; } else return newLeaf(size); }; #ifndef OLDCPP inline void* operator new[](size_t size, FIXED_NODE& dummy) { return operator new(size,dummy); } inline void* operator new[](size_t size, NODE& dummy) { return operator new(size,dummy); } inline void* operator new[](size_t size, LEAF& dummy) { return operator new(size,dummy); } inline void* operator new[](size_t size, FIXED_LEAF& dummy) { return operator new(size,dummy); } #endif #endif