GSlice considerations and possible improvements

Tim Janik


The paper Mesh: Compacting Memory Management for C/C++ Applications is about compacting memory by moving allocations, even though their pointers are exposed to the application. The idea is to merge allocation blocks from different pages whose allocations do not overlap at page offsets, and then let multiple virtual page addresses point to the same physical page. Some have asked about its applicability to the GSlice allocator. Well:
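The "not overlapping at page offsets" condition can be pictured with one occupancy bit per object slot in a page. The sketch below is only an illustration of that check, not code from the Mesh allocator or from GSlice; the names SlotBitmap, pages_meshable and pages_mesh are made up here, and a page is assumed to hold at most 64 slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: bit i is set when object slot i of a page is in use. */
typedef uint64_t SlotBitmap;

/* Two pages can be merged only if no slot offset is occupied in both. */
static bool pages_meshable(SlotBitmap a, SlotBitmap b)
{
    return (a & b) == 0;
}

/* Occupancy of the single physical page that both virtual pages
 * would point to after merging. */
static SlotBitmap pages_mesh(SlotBitmap a, SlotBitmap b)
{
    assert(pages_meshable(a, b));
    return a | b;
}
```

For example, a page using slots 0 and 2 (0x5) can be merged with one using slots 1 and 3 (0xA), but not with one that also uses slot 2 (0x4). The remapping of the virtual pages onto the shared physical page is left out of this sketch.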

The GSlice allocator doesn’t suffer from the catastrophic fragmentation described in the paper’s introduction. Basically, it has per-thread caches (magazines, on top of a slab allocator) for objects of the same size, and it allocates and recycles objects of only one size from a single memory page, which prevents large fragmentation due to widely varying object sizes. The slab paper treats extensively how this reduces internal and external fragmentation, which sizes lead to which trade-offs, etc. This has been one of the main contributions of this allocator type.
Regarding performance: the GLibc malloc() allocator performs very well and has in fact always been on par with GSlice. GSlice often had a speed advantage over malloc on non-GLibc systems, but the main reason to use it is that it can be much more memory efficient, because the caller passes the slice size back at release time. That means the GSlice allocator doesn’t have to store boundary tags on each slice, whereas malloc stores two size_t fields before each memory block that hold the size information needed at release.
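To illustrate why passing the size at release time saves memory, here is a minimal sketch of a size-class free-list allocator that keeps no per-block header. It is not the GSlice implementation (no magazines, no slabs, no thread safety, chunks are never returned); slice_alloc, slice_free and the constants are hypothetical names chosen for this example. Only the argument order of slice_free is borrowed from GLib's g_slice_free1 (block_size, mem_block).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants for this sketch, not GSlice's real parameters. */
#define SLICE_ALIGN   16   /* size classes are multiples of 16 bytes */
#define SLICE_CLASSES 32   /* covers requests of up to 512 bytes */
#define CHUNK_SLICES  64   /* slices carved out per refill */

/* A free slice only stores the link to the next free slice of its class. */
typedef struct FreeSlice { struct FreeSlice *next; } FreeSlice;

static FreeSlice *free_lists[SLICE_CLASSES];

/* Map a byte size to its class index: 1..16 -> 0, 17..32 -> 1, ... */
static size_t slice_class(size_t size)
{
    return (size + SLICE_ALIGN - 1) / SLICE_ALIGN - 1;
}

static void *slice_alloc(size_t size)
{
    if (size == 0 || size > (size_t)SLICE_ALIGN * SLICE_CLASSES)
        return NULL;
    size_t cls = slice_class(size);
    size_t slice_size = (cls + 1) * SLICE_ALIGN;
    if (!free_lists[cls]) {
        /* Refill: cut one chunk into equally sized slices, without headers. */
        char *chunk = malloc(slice_size * CHUNK_SLICES);
        if (!chunk)
            return NULL;
        for (size_t i = 0; i < CHUNK_SLICES; i++) {
            FreeSlice *s = (FreeSlice *)(chunk + i * slice_size);
            s->next = free_lists[cls];
            free_lists[cls] = s;
        }
    }
    FreeSlice *s = free_lists[cls];
    free_lists[cls] = s->next;
    return s;
}

/* The caller supplies the size, so the right free list is found
 * without reading any metadata stored next to the block. */
static void slice_free(size_t size, void *mem)
{
    assert(size > 0 && size <= (size_t)SLICE_ALIGN * SLICE_CLASSES);
    size_t cls = slice_class(size);
    FreeSlice *s = mem;
    s->next = free_lists[cls];
    free_lists[cls] = s;
}
```

A caller would pair slice_alloc(24) with slice_free(24, ptr). The price of dropping the header is that a wrong size at release time silently puts the block on the wrong list.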
True to the original slab magazine paper, GSlice tries to amortize overhead and reduce thrashing, so it will only recycle memory pages back to the kernel after a timeout of several seconds, a fact that is not reflected by the benchmarks in the Gitlab bug 1079 - Drop the GSlice allocator. In any case, both the GSlice and GLibc allocators maintain per-thread caches and tend to favor performance over coalescing, so it’d be interesting to measure whether the MESH allocator can keep up with the two performance-wise.
The MESH allocator potentially increases pressure on the TLB (Translation Lookaside Buffer - a cache for the results of walking the several layers of page tables that are needed to locate a physical memory page from a virtual memory page address), which affects application access times to the memory regions handed out by the allocator. This is an area where GSlice could be improved by replacing the system page allocator used by the slab allocator in the following ways:
1. Using posix_memalign() at page boundaries still requires the system allocator to allocate pages plus boundary tags. The slab allocator could be put to much better use if page allocation overhead and fragmentation at the page level were reduced by having GSlice administer its own page pool via mmap() directly.
2. Application access to the memory regions provided by GSlice could be made significantly faster by using 2MB page sizes via MAP_HUGETLB, essentially removing one layer of indirection from physical memory accesses. Unfortunately, many distributions still default to 0 allowed huge pages. Run cat /proc/sys/vm/nr_hugepages to find out how many your system provides.
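The two points above could be combined roughly as follows. This is a Linux-specific sketch under stated assumptions, not a patch for gslice.c: page_pool_map and page_pool_unmap are hypothetical helpers, the 2MB huge page size is assumed rather than queried, and the code falls back to regular pages when no huge pages are available (for example when nr_hugepages is 0).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed huge page size; a real implementation should query the system. */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Hypothetical helper: map `length` bytes of anonymous memory for a page
 * pool, preferring huge pages. *used_hugetlb (if non-NULL) reports which
 * kind of mapping was obtained. Returns NULL on failure. */
static void *page_pool_map(size_t length, int *used_hugetlb)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *mem = MAP_FAILED;
#ifdef MAP_HUGETLB
    /* Huge page mappings must be a multiple of the huge page size. */
    if (length > 0 && length % HUGE_PAGE_SIZE == 0)
        mem = mmap(NULL, length, prot, flags | MAP_HUGETLB, -1, 0);
#endif
    if (used_hugetlb)
        *used_hugetlb = (mem != MAP_FAILED);
    if (mem == MAP_FAILED)   /* fall back to regular pages */
        mem = mmap(NULL, length, prot, flags, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Hand the whole mapping back to the kernel. */
static void page_pool_unmap(void *mem, size_t length)
{
    if (mem)
        munmap(mem, length);
}
```

A slab allocator built on top of this would carve its slab pages out of the returned region instead of calling posix_memalign() for each one, so no boundary tags are needed at the page level.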

As an aside, the paper links cited in gslice.c have stopped working, but a search for the paper titles can still turn up downloadable PDFs.
