One of the typical problems with implementing an allocator is that all sorts of memory failures in programs get attributed to the allocator. That’s because messing up heap memory somewhere in a program or library usually messes up the allocator state even if it is implemented correctly.

For several months now, we have been trying to track down and fix a very nasty memory corruption issue in Beast (#340437). It could be narrowed down to a rare, racy failure that triggers with GSlice enabled, but not with G_SLICE=always-malloc (http://developer.gnome.org/doc/API/2.0/glib/glib-running.html#G_SLICE) set. Valgrind’s Memcheck didn’t help with this particular Beast bug and cannot possibly track all invalid GSlice uses.

So I sat down to implement a slice address and size validator to catch the most frequent GSlice misuses. The debugging validator consists of a flat fixed-size hashing tree with fairly big prime-sized nodes and binary searchable arrays to manage collisions in the hash buckets. This structure avoids circular dependencies with GSlice (which GHashTable or GTree would introduce) and is still reasonably easy to implement.
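To make the bucket idea more concrete, here is a minimal sketch of an address and size validator in plain C. It is not the GLib code: the names `Entry`, `Bucket`, `validator_track()` and `validator_release()` and the bucket count `N_BUCKETS` are made up for illustration, and it uses a single level of buckets instead of a tree. Each bucket is an array kept sorted by address so that lookups are a binary search, and the bookkeeping memory comes straight from realloc(3) so that it never calls back into the allocator being checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One tracked block: its address and the size it was handed out with. */
typedef struct {
    uintptr_t addr;
    size_t    size;
} Entry;

/* One hash bucket: an array of entries kept sorted by address. */
typedef struct {
    Entry  *entries;
    size_t  count;
} Bucket;

#define N_BUCKETS 509 /* hypothetical prime bucket count, not GLib's value */

static Bucket buckets[N_BUCKETS];

static Bucket *bucket_for(uintptr_t addr)
{
    return &buckets[addr % N_BUCKETS];
}

/* Binary search: index of the first entry whose address is >= addr. */
static size_t lower_bound(const Bucket *b, uintptr_t addr)
{
    size_t lo = 0, hi = b->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b->entries[mid].addr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Record a block. Returns 0 on success, -1 if the address is already
 * tracked or the bookkeeping array could not be grown. */
int validator_track(void *block, size_t size)
{
    uintptr_t addr = (uintptr_t) block;
    Bucket *b = bucket_for(addr);
    size_t i = lower_bound(b, addr);
    if (i < b->count && b->entries[i].addr == addr)
        return -1;
    Entry *grown = realloc(b->entries, (b->count + 1) * sizeof(Entry));
    if (!grown)
        return -1;
    b->entries = grown;
    /* Shift the tail up by one slot to keep the array sorted. */
    memmove(&grown[i + 1], &grown[i], (b->count - i) * sizeof(Entry));
    grown[i].addr = addr;
    grown[i].size = size;
    b->count++;
    return 0;
}

/* Check a release. Returns 0 if address and size match (the entry is
 * removed), -1 for an unknown address, -2 for a size mismatch. */
int validator_release(void *block, size_t size)
{
    uintptr_t addr = (uintptr_t) block;
    Bucket *b = bucket_for(addr);
    size_t i = lower_bound(b, addr);
    if (i >= b->count || b->entries[i].addr != addr)
        return -1;
    if (b->entries[i].size != size)
        return -2;
    /* Close the gap left by the removed entry. */
    memmove(&b->entries[i], &b->entries[i + 1],
            (b->count - i - 1) * sizeof(Entry));
    b->count--;
    return 0;
}
```

In this sketch, tracking a block with size 8 and then releasing it with size 12 returns -2. That is the kind of mismatch a size validator is meant to flag.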

The hashing exploits the fact that allocator block addresses are approximately randomly distributed in the lower bits but fairly sequential in the higher bits, so the structures are reasonably space efficient and can grow incrementally with every new megabyte handed out by malloc(3)/memalign(3). This also keeps hash collisions at an acceptably low average for allocations within a 4GB address range. Using a single global structure keeps GSlice from scaling well across multiple threads, but I think that’s fair enough for a debugging-mode validator. Soon after an early version of this validator was up and running, the cause of the above Beast bug was found and could be fixed.
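The split between high and low address bits can be sketched as a two-level index. The constants and the function names `trunk_index()` and `branch_index()` below are assumptions chosen for illustration, not the values or names used in GLib: the high bits (which megabyte an address lies in) select a top-level slot, and the low bits select a bucket inside it.

```c
#include <stddef.h>
#include <stdint.h>

/* All three constants are illustrative assumptions, not GLib's values. */
#define TRUNK_COUNT  4093u            /* prime number of top-level slots */
#define BRANCH_COUNT 509u             /* prime number of buckets per slot */
#define TRUNK_EXTENT (1024u * 1024u)  /* address range per slot: 1 MB */

/* High bits: the megabyte an address falls into picks the top-level
 * slot, so a new slot is only touched when a new megabyte is used. */
size_t trunk_index(uintptr_t addr)
{
    return (size_t) ((addr / TRUNK_EXTENT) % TRUNK_COUNT);
}

/* Low bits: these vary almost randomly from block to block, so the
 * remainder modulo a prime spreads blocks across the buckets. */
size_t branch_index(uintptr_t addr)
{
    return (size_t) (addr % BRANCH_COUNT);
}
```

With these particular constants, the 4093 top-level slots of 1 MB each cover just under 4GB before top-level indices start to repeat. Beyond that range, blocks from different megabytes would begin to share slots in this sketch.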

    $ G_SLICE=debug-blocks testcase
    GSlice: MemChecker: attempt to release block with invalid size: 0x80527e0 size=8 invalid-size=12
    Aborted

The faulty code was mixing up GSList and GList nodes, which couldn’t be caught by the compiler because the use of object data and timeout data forces void* casts and gives up type safety:

    void *slist = g_slist_alloc(); // void* gives up type-safety
    g_list_free (slist);           // corruption: sizeof (GSList) != sizeof (GList)

At this point, the validator is in GLib SVN and can be enabled with G_SLICE=debug-blocks (https://developer.gnome.org/doc/API/2.0/glib/glib-running.html#G_SLICE).

Another issue that some people might run into with the recent GSlice code is:

    ***MEMORY-WARNING***: GSlice: g_thread_init() must be called before all
    other GLib functions; memory corruption due to late invocation of
    g_thread_init() has been detected; this program is likely to crash,
    leak or unexpectedly abort soon...

The correct workaround for that is to add an early g_thread_init() call to your program, before any other GLib function is used. More details can be found in the gtk-devel email “bugs regarding late g_thread_init() calls”.
