Now, GSlice is supposed to be faster than memchunks, yes.
And its supposed to scale far better across multiple threads.
In the long run it is also meant to be more efficient in terms of memory consumption.
However, that is mostly due to the fact that many code portions which use GMemChunk are keeping their own never-freeing trash stacks. Such code should now be migrated to the GSlice API and the trash stacks should be removed. GSlice maintains its own working set memory in per-thread trash stacks. Home grown trash stacks will just waste memory and clutter up the cache lines.
Also, using separate GMemChunks prevents chunks of equal sizes from being shared across a program, this again wastes memory and clutters up the cache lines.
Because of the long-term wastage that GMemChunk application tends to build up, significant memory savings from GSlice are actually only to be expected for longer running programs which certainly is not a scenario met directly after N770 boot up ;)
That being said, the original check-in of the slab allocator in the GSlice code does behave a bit greedy actually. Basically, it allocates a new page (4096 bytes on IA32) per every different size, memory chunks are allocated at (chunk sizes are aligned to 8 bytes on IA32). That means, initially allocating 8 + 16 + 24 + 32 + 40 bytes in separate chunks does require opening up of 5 caches, so it uses 5 * 4096 = 20KB already.
I’ve tuned the most recent CVS version now to do more economical caching, so the above scenario now ends up at roughly the power-of-2 sums of 8*8, 16*8, 24*8, 32*8 and 40*8, which is approximately 1.6KB and can all be allocated from a single memory page. While this is a significant memory saver, it also does have some performance impacts. However, in all test scenarios on my machine, the GSlice performance didn’t drop by more than 5%. That’s probably bearable, considering how significant the savings are.