Jan 25 2013
 


Performance of a C++11 Signal System

First, a quick intro for the uninitiated: signals in this context are structures that maintain lists of callback functions with arbitrary arguments, plus assorted reentrant machinery to modify the callback lists and to call the callbacks. These allow customization of object behavior in response to signal emissions by the object (i.e. notifying the callbacks by means of invocations).
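
To make the idea concrete, here is a minimal sketch of such a structure. It is not the implementation discussed in this post: the name TinySignal is made up for illustration, and the sketch leaves out reentrancy, disconnection and return values entirely.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, minimal signal: a list of callbacks sharing one argument list.
// No reentrancy protection, no disconnect, no return values.
template<typename... Args>
class TinySignal {
  std::vector<std::function<void (Args...)>> handlers_;
public:
  // Register a callback; any callable with a matching signature works.
  void connect (std::function<void (Args...)> handler)
  {
    handlers_.push_back (std::move (handler));
  }
  // Notify all registered callbacks in connection order.
  void emit (Args... args) const
  {
    for (const auto &handler : handlers_)
      handler (args...);
  }
};

// Usage: two handlers observe the same two emissions.
inline int tiny_signal_example ()
{
  TinySignal<int> value_changed;
  int sum = 0, calls = 0;
  value_changed.connect ([&sum] (int v) { sum += v; });
  value_changed.connect ([&calls] (int) { ++calls; });
  value_changed.emit (20);
  value_changed.emit (22);
  assert (calls == 2);
  return sum; // 20 + 22
}
```

Iterating over a plain vector like this breaks as soon as a handler connects or disconnects during emit(), which is exactly the reentrancy problem a real signal system has to solve.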

Over the years, I have rewritten each of GtkSignal, GSignal and Rapicorn::Signal at least once, but most of that was a long time ago, some of it more than a decade. With the advent of lambdas, variadic template argument lists and std::function in C++11, it became time for me to dive into rewriting a signal system once again.

So for the task at hand, which is mainly to update the Rapicorn signal system to something that fits in nicely with C++11, I’ve settled on the most common signal system requirements:

  • Signals need to support arbitrary argument lists.
  • Signals need to provide single-threaded reentrancy, i.e. it must be possible to connect and disconnect signal handlers and re-emit a signal while it is being emitted in the same thread. This one is absolutely crucial for any kind of callback list invocation that’s meant to be remotely reliable.
  • Signals should support non-void return values (of little importance in Rapicorn but widely used elsewhere).
  • Signals can have return values, so they should support collectors (i.e. GSignal accumulators or boost::signal combiners) that control which handlers are called and what is returned from the emission.
  • Signals should have only moderate memory impact on class instances, because at runtime many instances that support signal emissions will actually have 0 handlers connected.
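
The collector requirement is probably the least familiar one, so here is a minimal sketch of the idea. The names CollectUntilFalse and emit_collected are hypothetical and the handler signature is fixed to bool (int) for brevity; the sketch only illustrates how a collector can decide whether an emission continues and what it returns.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical collector: remembers the last return value and stops the
// emission as soon as a handler returns false.
struct CollectUntilFalse {
  bool last = true;
  // Called with each handler's return value; returning false ends the emission.
  bool operator() (bool handler_result) { last = handler_result; return last; }
  bool result () const { return last; }
};

using Handler = std::function<bool (int)>;

// Hypothetical emission loop: the collector sees every return value and
// controls whether further handlers are called.
template<class Collector>
bool emit_collected (const std::vector<Handler> &handlers, int arg)
{
  Collector collector;
  for (const auto &handler : handlers)
    if (!collector (handler (arg)))
      break;
  return collector.result();
}

// Usage: the second handler returns false, so the third one is never called.
inline int collector_example ()
{
  int calls = 0;
  std::vector<Handler> handlers = {
    [&calls] (int v) { ++calls; return v > 0; },
    [&calls] (int)   { ++calls; return false; },
    [&calls] (int)   { ++calls; return true; },
  };
  const bool result = emit_collected<CollectUntilFalse> (handlers, 1);
  assert (result == false);
  return calls;
}
```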

For me, the result is pretty impressive. With C++11, a simple signal system that fulfills all of the above requirements can be implemented in less than 300 lines in a few hours, without the need to resort to any preprocessor magic, scripted code generation or libffi.

I say “simple”, because over the years I’ve come to realize that many of the bells and whistles as implemented in GSignal or boost::signals2 don’t matter much in my practical day to day programming, such as the abilities to block specific signal handlers, automated tracking of signal handler argument lifetimes, emission details, restarts, cancellations, cross-thread emissions, etc.

Beyond the simplicity that C++11 allows, it’s of course the performance that is most interesting. The old Rapicorn signal system (C++03) comes with its own set of callback wrappers named “slot”, which support between 0 and 16 arguments and essentially mimic std::function. The new C++11 std::function implementation, in contrast, is opaque to me and supports an unlimited number of arguments, so I was especially curious to see the performance of a signal system based on it.

I wrote a simple benchmark that just measures the times for a large number of signal emissions with negligible time spent in the actual handler.

I.e. the signal handler just does a simple uint64_t addition and returns. While the scope of this benchmark is clearly very limited, it serves quite well to give an impression of the overhead associated with the emission of a signal system, which is the most common performance relevant aspect in practical use.
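
For illustration, a benchmark of this kind can be sketched as follows. This is not the benchmark code used for the numbers in this post: the names bench_counter, add_handler and ns_per_call are assumptions, and the sketch times calls through a bare std::function rather than through a full signal.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical workload: one 64-bit addition per call.
static std::uint64_t bench_counter = 0;
static void add_handler (std::uint64_t amount) { bench_counter += amount; }

// Time `iterations` calls through std::function and return the average
// number of nanoseconds per call.
inline double ns_per_call (std::uint64_t iterations)
{
  assert (iterations > 0);
  std::function<void (std::uint64_t)> callback = add_handler;
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; i++)
    callback (1);
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Calling ns_per_call (1000000) mirrors the million emissions mentioned in the comments; the absolute numbers depend heavily on compiler, optimization flags and CPU.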

Without further ado, here are the results of the time spent per emission (less is better) and memory overhead for an unconnected signal (less is better):

Signal System            Emit() in nanoseconds   Static Overhead (bytes)   Dynamic Overhead (bytes)
GLib GSignal             341.156931ns            0                         0
Rapicorn::Signal, old    178.595930ns            64                        0
boost::signals2           92.143549ns            24                        400 (=265+7+8*16)
boost::signal             62.679386ns            40                        392 (=296+6*16)
Simple::Signal, C++11      8.599794ns            8                         0
Plain Callback             1.878826ns            –                         –

 

Here, “Plain Callback” indicates the time spent on the actual workload, i.e. without any signal system overhead. All of this was measured on an Intel Core i7 at 2.8GHz. Considering the workload, the performance of the C++11 signals is probably close to ideal, and I’m more than happy with it. I’m also seriously impressed with the speed that std::function allows for; I was originally expecting its overhead to be at least an order of magnitude larger.

The memory overhead is accounted for on a 64bit platform, for a signal with 0 connections after its constructor has been called. The “static overhead” is what’s usually embedded in a C++ instance, the “dynamic overhead” is what the embedded signal allocates with operator new in its constructor (the size calculations correspond to effective heap usage, including malloc boundary marks).
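
A minimal sketch of how a static overhead of one pointer and a dynamic overhead of 0 can come about: the class below embeds nothing but a pointer and defers all allocation to the first connection. LazySignal is a hypothetical name and the layout is an assumption made for illustration, not the layout of any of the measured systems.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical layout: the embedded part is a single pointer, and the
// handler list is only allocated when the first handler is connected.
class LazySignal {
  std::vector<std::function<void ()>> *handlers_ = nullptr;
public:
  LazySignal () = default;
  LazySignal (const LazySignal&) = delete;
  LazySignal& operator= (const LazySignal&) = delete;
  ~LazySignal () { delete handlers_; }
  void connect (std::function<void ()> handler)
  {
    if (!handlers_)
      handlers_ = new std::vector<std::function<void ()>>();
    handlers_->push_back (std::move (handler));
  }
  void emit () const
  {
    if (handlers_)
      for (const auto &handler : *handlers_)
        handler();
  }
  bool has_allocation () const { return handlers_ != nullptr; }
};

// The static overhead per instance is exactly one pointer.
static_assert (sizeof (LazySignal) == sizeof (void*), "one pointer of static overhead");

// Usage: nothing is allocated until the first connection.
inline int lazy_signal_example ()
{
  LazySignal sig;
  int calls = 0;
  sig.emit(); // no handlers, nothing to do
  assert (!sig.has_allocation());
  sig.connect ([&calls] { ++calls; });
  sig.emit();
  assert (sig.has_allocation());
  return calls;
}
```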

The reason GLib’s GSignal has 0 static and 0 dynamic overhead is that it keeps track of signals and handlers in a hash table and sorted arrays, which only consume memory per (instance, signal, handler) triplet, i.e. instances without any signal handlers really have 0 overall memory impact.
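
The side-table idea can be sketched as below. This is not GLib’s actual data structure: the names handler_table, table_connect and table_emit are hypothetical, and a std::map keyed by (instance address, signal id) stands in for GSignal’s hash table and sorted arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical side table: handlers live outside the instances, keyed by
// (instance address, signal id). Instances carry no signal members at all.
using SignalKey = std::pair<std::uintptr_t, int>;
using HandlerList = std::vector<std::function<void ()>>;

static std::map<SignalKey, HandlerList> handler_table;

inline SignalKey make_key (const void *instance, int signal_id)
{
  return SignalKey (reinterpret_cast<std::uintptr_t> (instance), signal_id);
}

inline void table_connect (const void *instance, int signal_id, std::function<void ()> handler)
{
  handler_table[make_key (instance, signal_id)].push_back (std::move (handler));
}

inline void table_emit (const void *instance, int signal_id)
{
  const auto it = handler_table.find (make_key (instance, signal_id));
  if (it == handler_table.end())
    return; // no handlers, and no memory was ever spent on this instance
  for (const auto &handler : it->second)
    handler();
}

struct Widget { int value = 0; }; // no embedded signal state
static_assert (sizeof (Widget) == sizeof (int), "signals add nothing to the instance");

// Usage: only the connected (instance, signal) pair occupies a table entry.
inline int side_table_example ()
{
  Widget a, b;
  const int CLICKED = 1;
  int calls = 0;
  table_connect (&a, CLICKED, [&calls] { ++calls; });
  table_emit (&a, CLICKED);
  table_emit (&b, CLICKED); // b has no entry, nothing happens
  assert (handler_table.size() == 1);
  handler_table.clear();
  return calls;
}
```

The price of this layout is a table lookup on every emission, which is one plausible contributor to the emission time in the table above.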

Summary:

  • If you need inbuilt thread safety plus other bells and can spare lots of memory per signal, boost::signals2 is the best choice.
  • For tight scenarios without any spare byte per instance, GSignal will treat your memory best.
  • If you just need raw emission speed and can do without the extra whistles, the C++11 single-file simplesignal.cc excels.

For the interested, the brief C++11 signal system implementation can be found here: simplesignal.cc
The API docs for the version that went into Rapicorn are available here: aidasignal.hh

PS: In retrospect I need to add that, in this day and age, the better trade-off for GLib could be one or two pointers consumed per instance and signal, if those allowed emission optimizations by a factor of 3 to 5. However, given its complexity and the number of wrapping layers involved, this might be hard to accomplish.


  17 Responses to “Performance of a C++11 Signal System”

  1. Have you looked at http://www.reddit.com/r/cpp/comments/16zfn9/implementation_of_delegates_in_c11/ ?

    I’m asking because std::function is quite a complex beast and is the most generic solution with most features, but also the slowest because of the type erasure needed. Against the plain callback, it was also 8 times slower.
    If you leave out binding arguments, you could use one of the delegate libraries discussed there, which are a few times faster than std::function.

    • Interesting you mention that. The Simple::slot() implementation that I’m providing to connect an instance and class method pointer to a signal provides delegate semantics. I’ve simply used a C++11 lambda wrapper to bind the first argument instead of rolling another class type.
      The reason std::function and type erasure is needed for the signal interface is that any kind of functor can be connected and called during an emission, C style function pointers, lambdas or callable objects.
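
      A minimal sketch of such a lambda wrapper, with the hypothetical name bind_method (this is not the actual Simple::slot() code): it binds an instance and a member function pointer into a std::function, so that methods can be connected like any other callable.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: wrap an instance plus member function pointer in a
// lambda and store it in a std::function. The instance is captured by
// reference, so it must outlive the returned callable.
template<class Instance, class Class, class R, class... Args>
std::function<R (Args...)> bind_method (Instance &object, R (Class::*method) (Args...))
{
  return [&object, method] (Args... args) { return (object.*method) (args...); };
}

struct Counter {
  int total = 0;
  int add (int amount) { total += amount; return total; }
};

// Usage: the bound method is called like a plain function.
inline int bind_method_example ()
{
  Counter counter;
  std::function<int (int)> handler = bind_method (counter, &Counter::add);
  handler (40);
  const int result = handler (2);
  assert (counter.total == 42);
  return result;
}
```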

  2. Are there any differences, cache-wise, that could account for this? I ask because AFAIK a remote L3 cache hit on a 3GHz i7 is 100ns…

    • Modern CPUs like the i5 or i7 provide 32kB or 64kB of 1st level cache. The benchmark emission I’ve used has a single function that does a 64bit integer addition, and the emission is then performed a million times. I.e. the user code doesn’t consume a vast amount of memory, and all signal systems should be able to keep track of a single handler connection with less than 1kB of memory.
      So I’m pretty sure caching doesn’t play a significant role in this particular benchmark: after the first emission, all needed data is present in the 1st level cache and all branches can be predicted from the last path taken.

  3. For more details, the corresponding G+ posting has a treatment of the GSignal overhead in the comments section: https://plus.google.com/110781595661917803096/posts/ZjDxgUkrv7v

  4. Which standard library implementation of std::function did you use? libstdc++ always does dynamic memory allocation and I’ve yet to see a compiler remove the indirection. libc++’s implementation uses the small object optimization and I’ve seen compilers remove the indirection and inline through it.

    • I think that was libstdc++6-4.7, packaged with g++-4.7 on Ubuntu. Note that the dynamic allocation is not relevant for the performance during emission, as that doesn’t create/destroy function objects. And it’s also not relevant for the memory usage in the above table as that addresses the memory overhead for signals without any handler connection.

  5. I was looking at your source and I think that you could make your signals a little faster by changing this

    CollectorResult
    emit (Args... args)
    {
      Collector collector;
      if (!callback_ring_)
        return collector.result();
      SignalLink *link = callback_ring_;
      link->incref();
      do
        {
          if (link->function != NULL)
            {
              const bool continue_emission = this->invoke (collector, link->function, args...);
              if (!continue_emission)
                break;
            }
          SignalLink *old = link;
          link = old->next;
          link->incref();
          old->decref();
        }
      while (link != callback_ring_);
      link->decref();
      return collector.result();
    }

    to use perfect forwarding

    CollectorResult
    emit (Args&&... args)
    {
      Collector collector;
      if (!callback_ring_)
        return collector.result();
      SignalLink *link = callback_ring_;
      link->incref();
      do
        {
          if (link->function != NULL)
            {
              const bool continue_emission = this->invoke (collector, link->function, std::forward<Args> (args)...);
              if (!continue_emission)
                break;
            }
          SignalLink *old = link;
          link = old->next;
          link->incref();
          old->decref();
        }
      while (link != callback_ring_);
      link->decref();
      return collector.result();
    }

    and it will retain the lvalue/rvalue nature of the function arguments. Just an idea; you may have thought of this already and decided not to use it.

    • Thanks for the suggestion Aaron.
      If you actually modify the code according to your suggestion and test it, you’ll notice that it fails.
      That’s because the std::string signal argument of sig1 is not properly constructed/converted from the (const char*) arguments passed in with perfect forwarding.
      At calling emit() time, parameter conversion/construction of the signal argument types from the caller argument types needs to occur, that’s why this is the one place that cannot use forwarding.
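
      A minimal sketch of the by-value signature, using a hypothetical ByValueSignal that is far simpler than the real code: because Args are fixed by the signal type, a (const char*) passed by the caller is converted to std::string once, at the call to emit(), and every handler then receives an intact value.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, stripped-down signal with a by-value emit().
template<typename... Args>
struct ByValueSignal {
  std::vector<std::function<void (Args...)>> handlers;
  // Args come from the signal type, so caller arguments are converted to
  // the signal argument types when emit() is called.
  void emit (Args... args)
  {
    for (const auto &handler : handlers)
      handler (args...); // args are lvalues here, so nothing is moved away
  }
};

// Usage: both handlers see the converted std::string argument.
inline std::string by_value_example ()
{
  ByValueSignal<std::string, int> sig;
  std::string seen;
  sig.handlers.push_back ([&seen] (std::string s, int n) { seen += s + std::to_string (n); });
  sig.handlers.push_back ([&seen] (const std::string &s, int) { seen += "/" + s; });
  sig.emit ("huhu", 4); // const char* converts to std::string at the call
  assert (!seen.empty());
  return seen;
}
```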

  6. Thanks for the code! For anybody interested, in order to use it in Windows/VS2013, you just need to:
    – replace with
    – replace NULL with nullptr

  7. Thank you for the great post! I always used boost::signals2, but it has too much overhead.

  8. Is there any complete example for using your Simple Signal APIs?

  9. Is Simple::Signal cross-platform, can I use it on iOS?

    • Feel free to go ahead and report the result here. If your compiler is reasonably up to par with C++ standards, it should work.

  10. Thank you for this smart piece of code!

    clang reports a warning:

    'CollectorInvocation' defined as a struct template here but previously declared as a class template [-Wmismatched-tags]
    struct CollectorInvocation {
    ^
    did you mean struct here?
    template class CollectorInvocation;
    ^~~~~
    struct

  11. I experienced some multithreading issues. So I converted ProtoSignal::refCount from an ‘int’ to a ‘std::atomic<int>’.
    Additionally I added a mutex to protect each ‘callback_ring_’ access. In my case I used a ‘multiple read’, ‘single write’ mutex from boost, but a normal std::mutex will do it as well.


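    // Assumed headers: the snippet needs boost::shared_mutex and the lock templates.
    #include <boost/thread/shared_mutex.hpp>
    #include <boost/thread/locks.hpp>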

    boost::shared_mutex callback_ring_mutex_;

    void ensure_ring ()
    {
      boost::upgrade_lock<boost::shared_mutex> sharedLock (callback_ring_mutex_); // read only
      if (!callback_ring_)
        {
          boost::upgrade_to_unique_lock<boost::shared_mutex> uniqueLock (sharedLock); // upgrade to write access
          callback_ring_ = new SignalLink (CbFunction()); // ref_count = 1
          callback_ring_->incref(); // ref_count = 2, head of ring, can be deactivated but not removed
          callback_ring_->next = callback_ring_; // ring head initialization
          callback_ring_->prev = callback_ring_; // ring tail initialization
        }
    }

    Add the mutex wherever callback_ring_ is accessed: ‘ProtoSignal()’, ‘operator+=()’, ‘operator-=()’, ‘emit()’.

  12. Thanks for the example code.

    I’ve taken the liberty of modifying your version with the intention of:

    – understanding it better.
    – simplifying it.
    – tailoring it to my needs and tastes.
    – making it marginally more flexible and robust.

    A side effect of this is that, according to your benchmark functions, my modified version is about twice as fast. It can be found at:

    http://www.intraterrestrial.com/users/naptrel/sniks/simplesignal2.cpp
