Jun 27 2015
 

Yesterday I spent some 14+ hours on getting a templated undo method wrapper going.
Just to throw it all away this morning.

Here’s what I was trying to achieve. The C version of BEAST implements undo as follows:

// bse_track_remove_tick():
BseTrack *track;
uint tick;
BsePart *part;
bse_item_push_undo_proc (track, "insert-part", tick, part);

That is, it queues an undo step that, if executed, will call the “insert-part” procedure
on a BseTrack object that inserts a BsePart object at a ‘tick’.
This all happens through a varargs interface with lots of magic behind the scenes. In
particular the reference to ‘part’ is tricky. Future modifications to the BseTrack (or
project) may cause the removal and destruction of the BsePart object involved here.
While the execution of future undo steps will re-create a BsePart to be inserted here
before the step at hand is executed, the ‘part’ object pointer will have to be changed
to the re-created one instead of the destroyed one.
To achieve this, bse_item_push_undo_proc() internally converts the ‘part’ pointer into
a serializable descriptor string that allows re-identifying the BsePart object, and the
undo machinery resolves that descriptor before “insert-part” is called.

Now on to C++. I wanted the new counterpart in the C++ version of Beast to look like:

// TrackImpl::remove_tick():
TrackImpl *this;
const uint tick;
PartImpl &part;
push_undo ("Remove Tick", *this, &TrackImpl::insert_part, tick, part);

But…

Under the hood that means push_undo() (which is a template method on ItemImpl, a base type of TrackImpl) needs to process its variable argument list to:

  • A) Put each argument into a wrapper structure and store away the argument list (i.e. std::tuple<Wrapper<Args>...>).
  • B) Special case the wrapper structure for objects to store a descriptor internally (i.e. template specialisation on Wrapper<Arg> for Arg=ItemImpl& or derived).
  • C) Copy the wrapped argument list into a closure to be called when the undo step is executed.
  • D) When the closure is called, “unwrap” each of the wrapped arguments to yield its original type (i.e. construct a std::tuple<Args...> from std::tuple<Wrapper<Args>...>).
  • E) When unwrapping an object, resolve the descriptor stored internally (i.e. put more magic into Wrapper<Arg> to yield a valid Arg& object).
  • F) Construct a variable argument call to &TrackImpl::insert_part(...) (i.e. apply a C++ argument pack).

In short, I got A, B, C, D, F working after significant effort.
A is fairly straightforward with C++11 variadic templates. C can be accomplished with a C++11 lambda capture list, and F involves copying over std::integer_sequence from C++14 and hacking the proposed std::apply() template to support instance + method calls. Finally, D can be implemented in a similar fashion to F.
What’s left is B and E, i.e. writing a wrapper that will store and yield ordinary arguments such as int or std::string and convert ItemImpl& derived types back and forth between a string representation.
Probably laborious but doable — or so I thought.
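
As an illustration of step F, here is a minimal sketch of calling a method with arguments stored in a std::tuple. It assumes a C++14 compiler that ships std::index_sequence (so nothing needs to be copied over), and the names Track, apply_method and apply_method_impl are made up for this example; it is not the code I wrote that day.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for TrackImpl, it only records the call it receives.
struct Track {
  std::string log;
  void
  insert_part (unsigned tick, const std::string &part)
  {
    log += std::to_string (tick) + ":" + part;
  }
};

// Expand the tuple elements into the argument list of a method call.
template<class Obj, class Method, class Tuple, std::size_t... I> void
apply_method_impl (Obj &obj, Method method, Tuple &args, std::index_sequence<I...>)
{
  (obj.*method) (std::get<I> (args)...);
}

// Call (obj.*method) (args...) with the arguments taken from a std::tuple.
template<class Obj, class Method, class... Args> void
apply_method (Obj &obj, Method method, std::tuple<Args...> &args)
{
  apply_method_impl (obj, method, args, std::index_sequence_for<Args...>());
}

// Usage: store the arguments first, apply them later.
inline std::string
demo_apply_method ()
{
  Track track;
  std::tuple<unsigned, std::string> args (384u, "part-1");
  apply_method (track, &Track::insert_part, args);
  assert (track.log == "384:part-1");
  return track.log;
}
```

The index sequence is what turns the tuple back into a parameter pack; an unwrapping step like D could use the same expansion with a conversion applied to each std::get<I>().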

It turns out that because of all the argument and tuple packing hassle (template recursion, integer sequencing and more) involved in implementing A, D and F, it would be hard to pass the needed serialization context into Wrapper<>. What’s much worse, g++-4.9 started to choke on template errors during the Wrapper<> development, aborting with “confused by earlier errors” after pages and pages of template error messages. clang++-3.4 isn’t yet capable of processing the C++11 used by Rapicorn, so it was of no help here either (I plan another attempt at porting my C++11 code to be clang++ compatible once I get my hands on a newer clang++ version).
In the end, I gave up in the middle of E after an overlong day, with everything else accomplished. g++-4.9 choking was a major letdown, but probably even more important is this: I had the necessary state of mind to process multiple pages of template error messages yesterday, but the same cannot be expected of every future push_undo() user whenever a push_undo() argument mismatches.

This morning, I threw away yesterday’s templating excess and within an hour got an alternative interface to work:

// undoing part removal needs an undo_descriptor because future
// deletions may invalidate and recreate the part object
TrackImpl *this;
const uint tick;
PartImpl &part;
UndoDescriptor<PartImpl> part_descriptor = undo_descriptor (part);
auto lambda = [tick, part_descriptor] (TrackImpl &self) {
  PartImpl &part = self.undo_resolve (part_descriptor);
  self.insert_part (tick, part);
};
push_undo ("Remove Tick", *this, lambda);

That is, this interface is fully type-safe, but the ‘part’ wrapping has to be done manually, which involves writing a small lambda around TrackImpl::insert_part(). If any argument of the lambda or push_undo() calls is erroneous, the compiler will point at a single failing variable assignment in the implementation of push_undo<>() and list the mismatching arguments.
That is much more digestible than multiple template recursion error pages, so it’s a plus on the side of future maintenance.

The short version of push_undo<>() that takes a method pointer instead of a lambda is still available for implementing undo steps that don’t involve object references, incidentally covering the majority of uses.
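
To make the lambda-based variant more concrete, here is a small self-contained sketch of how such an undo stack could hang together. Everything in it is an assumption made for illustration: Part, Track, the string ids and the simplified push_undo() signature (without the *this argument) are stand-ins, not the Beast implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PartImpl.
struct Part {
  std::string id;       // stable identifier that survives re-creation
};

// Serializable description of an object instead of a pointer to it.
template<class T>
struct UndoDescriptor {
  std::string id;
};

inline UndoDescriptor<Part>
undo_descriptor (const Part &part)
{
  return UndoDescriptor<Part> { part.id };
}

// Hypothetical stand-in for TrackImpl.
struct Track {
  using UndoFunc = std::function<void (Track&)>;
  std::map<std::string, Part> parts;        // current objects, looked up by id
  std::map<unsigned, std::string> ticks;    // tick -> id of the part at that tick
  std::vector<std::pair<std::string, UndoFunc>> undo_stack;

  // Map a descriptor back to the object that currently carries its id.
  Part&
  undo_resolve (const UndoDescriptor<Part> &descriptor)
  {
    auto it = parts.find (descriptor.id);
    assert (it != parts.end());
    return it->second;
  }
  void
  insert_part (unsigned tick, Part &part)
  {
    ticks[tick] = part.id;
  }
  // A mismatching lambda fails to compile at this one UndoFunc conversion.
  void
  push_undo (const std::string &blurb, const UndoFunc &func)
  {
    undo_stack.push_back (std::make_pair (blurb, func));
  }
  void
  remove_tick (unsigned tick)
  {
    auto it = ticks.find (tick);
    if (it == ticks.end())
      return;
    const UndoDescriptor<Part> part_descriptor = undo_descriptor (parts.at (it->second));
    auto lambda = [tick, part_descriptor] (Track &self) {
      Part &part = self.undo_resolve (part_descriptor);
      self.insert_part (tick, part);
    };
    ticks.erase (it);
    push_undo ("Remove Tick", lambda);
  }
  // Execute and drop the most recent undo step.
  void
  undo ()
  {
    if (undo_stack.empty())
      return;
    UndoFunc func = undo_stack.back().second;
    undo_stack.pop_back();
    func (*this);
  }
};
```

The point of the descriptor is that the closure never holds a Part reference: if the part is destroyed and re-created between remove_tick() and undo(), the lookup in undo_resolve() still finds the current object.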

Aug 05 2014
 

Map Jan-Dec to 1-12

In a time-critical section of a recent project, I came across having to optimize the conversion of three-letter US month abbreviations (as commonly found in log files) to integers in C++. That is, for “Jan” yield 1, for “Feb” yield 2, etc., for “Dec” yield 12.

In C++ the simplest implementation probably looks like the following:

std::string string; // input value
std::transform (string.begin(), string.end(), string.begin(), ::toupper);
if (string == "JAN") return 1;
if (string == "FEB") return 2;
// ...
if (string == "DEC") return 12;
return 0; /* mismatch */

In many cases this is fast enough. The running time is linear in the number of months and depends on the actual value being looked up. But for an optimized inner loop I needed something faster, ideally with a running time independent of the actual input value and avoiding branch mispredictions where possible. I could take advantage of a constrained input set, which means the ‘mismatch’ case is never hit in practice.

To summarize:

  • Find an integer from a fixed set of strings.
  • Ideal runtime is O(1).
  • False positives are acceptable, false negatives are not.

That actually sounds a lot like using a hash function. After some bit fiddling, I ended up using a very simple and quick static function that yields a result instantly but may produce false positives. I also had to get rid of using std::string objects to avoid allocation penalties. This is the result:


static constexpr const char l3month_table[] = {
  12, 5, 0, 8, 0, 0, 0, 1, 7, 4, 6, 3, 11, 9, 0, 10, 2
}; // 17 elements

/// Look up the month from 3 letters; returns 0 for roughly 30% of invalid inputs.
static constexpr inline unsigned int
l3month (const char *l3str)
{
  return l3month_table[((l3str[1] & ~0x20) + (l3str[2] & ~0x20)) %
                       (sizeof (l3month_table) / sizeof (l3month_table[0]))];
}

The hash function operates only on the last 2 ASCII letters of the 3-letter month abbreviations, as these two are sufficient to distinguish between all 12 cases and turn out to yield good hash values. The expression (letter & ~0x20) removes the lowercase ASCII bit, so upper and lower case letters are treated the same without using a potentially costly if-branch. Adding the uppercased ASCII letters modulo 17 yields unique results, so this simple hash value is used to index a 17 element hash table to produce the final result.
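
To double check the uniqueness claim, the following sketch recomputes the hash for all twelve abbreviations. The table values are copied from l3month_table; the helper names month_slots, month_hash, month_lookup and check_all_months are made up for this check and the modulus 17 is spelled out literally.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from l3month_table, indexed by (2nd letter + 3rd letter) % 17.
static constexpr unsigned char month_slots[17] = {
  12, 5, 0, 8, 0, 0, 0, 1, 7, 4, 6, 3, 11, 9, 0, 10, 2
};

// Sum of the uppercased 2nd and 3rd letters, modulo the table size.
constexpr unsigned int
month_hash (const char *s)
{
  return ((s[1] & ~0x20) + (s[2] & ~0x20)) % 17;
}

constexpr unsigned int
month_lookup (const char *s)
{
  return month_slots[month_hash (s)];
}

// Worked example: "Jan" -> 'A' (65) + 'N' (78) = 143, 143 % 17 = 7, slot 7 holds 1.
static_assert (month_hash ("Jan") == 7, "hash of Jan");
static_assert (month_lookup ("jan") == 1, "lowercase input maps the same way");

// Assert that every abbreviation maps to its month number.
inline void
check_all_months ()
{
  const char *const names[12] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
  for (std::size_t i = 0; i < 12; i++)
    assert (month_lookup (names[i]) == i + 1);
}
```

Invalid inputs either land in an empty slot ("Foo" hashes to slot 5 and yields 0) or collide with a real month ("xyz" hashes to slot 9 and is reported as 4), which is the false positive case accepted above.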

In effect, this perfectly detects all 12 month names in 3 letter form. Since 5 of the 17 table slots are empty, an invalid month name has an almost 30% chance of being caught (assuming invalid inputs hash evenly across the slots), in which case 0 is returned – useful for debugging or assertions if input contracts are broken.

As far as I know, the function is as small as possible given the constraints. Can anyone simplify it further?