Aug 052014

Map Jan-Dec to 1-12

In a time critical section of a recent project, I came across having to optimize the conversion of three digit US month abbreviations (as commonly found in log files) to integers in C++. That is, for “Jan” yield 1, for “Feb” yield 2, etc, for “Dec” yield 12.

In C++ the simplest implementation probably looks like the following:

std::string string; // input value
std::transform (string.begin(), string.end(), string.begin(), ::toupper);
if (string == "JAN") return 1;
if (string == "FEB") return 2;
// ...
if (string == "DEC") return 12;
return 0; /* mismatch */

In many cases the time required here is fast enough. It is linear in the number of months and depending on the actual value being looked up. But for an optimized inner loop I needed something faster, ideally with running time independent of the actual input value and avoiding branch misses where possible. I could take advantage of a constrained input set, which means the ‘mismatch’ case is never hit in practice.

To summarize:

  • Find an integer from a fixed set of strings.
  • Ideal runtime is O(1).
  • False positives are acceptable, false negatives are not.

That actually sounds a lot like using a hash function. After some bit fiddling, I ended up using a very simple and quick static function that yields a result instantly but may produce false positives. I also had to get rid of using std::string objects to avoid allocation penalties. This is the result:

static constexpr const char l3month_table[] = {
  12, 5, 0, 8, 0, 0, 0, 1, 7, 4, 6, 3, 11, 9, 0, 10, 2
}; // 17 elements

/// Lookup month from 3 letters, with 30% chance returns 0 for invalid inputs.
static constexpr inline unsigned int
l3month (const char *l3str)
  return l3month_table[((l3str[1] & ~0x20) + (l3str[2] & ~0x20)) %
                       sizeof (l3month_table) / sizeof (l3month_table[0])];

The hash function operates only on the last 2 ASCII letters of the 3-letter month abbreviations, as these two are sufficient to distinguish between all 12 cases and turn out to yield good hash values. The expression (letter & ~0x20) removes the lowercase ASCII bit, so upper and lower case letters are treated the same without using a potentially costly if-branch. Adding the uppercased ASCII letters modulo 17 yields unique results, so this simple hash value is used to index a 17 element hash table to produce the final result.

In effect, this perfectly detects all 12 month names in 3 letter form and has an almost 30% chance of catching invalid month names, in which case 0 is returned – useful for debugging or assertions if input contracts are broken.

As far as I know, the function is as small as possible given the constraints. Anyone can simplify it further?

Jul 142007

For Rapicorn i need a simple and nicely readable way to special case code on various strings, which isn’t a problem in most scripting languages but is not provided by C/C++. Switching on strings turns out to be a widely investigated topic: SOS (agay), TheScript, JavaBug (CNFW), CodeGuru, CLC-FAQ and google has lots more.

Now here goes my approach to switching on strings for C/C++, caveats upfront:
– The strings may only consist of identifier chars: [A-Za-z_0-9];
– Convenient application is based on a GNU Preprocessor feature: -imacros;
– Macro implementations use a GCC-ism: Statement Expressions;
– The convenient command line variant uses a bash-ism: Process Substitution;
– One compile-time script is needed, currently implemented in python (though it could be written in any language).

With this out of the way, let’s dive into the code. This is the syntax:

	switch (SOSID_LOOKUP (sample_string))
          case SOSID (hello): printf ("Hello ");   break;
          case SOSID (world): printf ("World! ");  break;
          case 0: default:    printf ("unknown "); break;

This is how the code needs to be compiled: gcc -Wall -O2 -imacros <( <input.c) input.c

The actual implementation is fairly simple, extracts all identifiers from SOSID() statements and generates a handful of macro definitions: SOSID(), SOSID_LOOKUP(), SOSID_LOOKUP_ICASE(), ... Plus input specific macros: SOSID__hello, SOSID__world, ... GCC will pick up those definitions via the -imacros option. That is enough to implement SOSID_LOOKUP() so that it can yield the integer IDs that the SOSID(*)/SOSID__* statements expand to for any string that corresponds to one of the SOSID(identifiers) occurring in the source file.

call_switch ("hello"); call_switch ("foo0815"); call_switch ("world");
Hello unkown World!

The lookups are implemented via a binary search and the strings are sorted in a way so that binary searches for case sensitive and case insensitive lookups are both possible. That means for large string lists, the lookups which are of complexity O(ld N) actually have a performance advantage over lengthy if..else if..else constructs with O(N) complexity.
The code is available as a single script at the moment:

Note that the bash-ism, imacros-use and GCC-isms are not mandatory, ordinary temporary files can be generated by the script and be included instead and it could be adapted to generate portable inline functions. The identifier-chars restriction is a hard one though. It comes from the fact that SOSID(ident) must yield valid C/C++ integers, and that can only be accomplished by token pasting. Depending on GCC is good enough for Rapicorn, so people will have to express active interest in this, in order for me to make the script more portable.

Nov 242006

Last week i got a bug report where activating “Help/Quick Start”, “Help/FAQ” and any other web browser related menu item in Beast would do nothing.
That of course completely defeats the purpose of writing help documents and offering help via the menu.
Luckily, we were able to get browser launching debugging print outs in that scenario, and what happened was:
* Beast started up gnome-open with the help URL.
* Since gnome-open was present on the system and could be started, browser launching was considered successful and no error dialog had to be displayed.
* gnome-open in turn would error out (for unknown reasons):

	Error showing URL: There was an error launching the default action command associated with this location.

The problem here is clearly that Beast wrongly assumed launching a browser or script successfully would be enough. I’ve since reworked the logic so that:
* Browser launch scripts/programs are now always started in the foreground to verify their respective exit codes.
* A correctly functioning browser launcher now has to reliably start the browser in the background and has to provide a correct exit code.
* Graphical browser programs are started in the background, because browsers can reasonably be expected to show a URL once started successfully. Besides that, the exit code depends on other events during a browsing session anyway and it also is only provided at the end of a browsing session.
* If no browser launching script with 0 exit code could be started and no browser program could be started successfully in the background, a descriptive error dialog is shown, including the target URL.

I had to test out a bunch of browser launchers to get this to work reliably, and some of the launchers that fell short to be usable really came at a surprise:
* gnome-open (GNOME) – the browser is opened in the background, and if no browser could successfully be launched a non-0 exit code is provided.
* exo-open (XFCE) – browser is opened in the background, provides correct exit code.
* kfmclient (KDE) – browser is opened in the background, provides correct exit code.
* gnome-moz-remote (old GNOME) – browser is opened in the background, provides correct exit code.
* browser-config (Gentoo) – this script goes through some lengths to make sure all browsers are opened in the background, but then it always quits with an exit code of 0, even if no browser was configured and launched.
* xdg-open (Portland) – provides correct exit code, but the browsers are unpredictably launched in the foreground or background (usually depending on whether a running browser instance already exists or not).
* sensible-browser (Debian) – provides correct exit code, but unpredictably opens in foreground or background.
* htmlview (Red Hat) – provides correct exit code, but unpredictably opens in foreground or background.

In the end, the Linux distribution launchers that i was really hoping for to get user configuration right (honor $BROWSER) and work reliably across desktops failed short to do so (browser-config, xdg-open, sensible-browser and htmlview), while all the desktop project launchers fullfiled the required needs (gnome-open, exo-open and kfmclient). For everyone interested, the current browser launching code used in Beast now can be found here:
Beast: birnet/ – scroll to “url_test_show“.

Jun 302005

Majorly improved the speed of rectangle shading in Rapicorn, here’s a quick-and-dirty random number generator which is still good enough for dithering:

	inline uint32 quick_rand32 (void)
	  static uint32 accu = 2147483563;
	  accu = 1664525 * accu + 1013904223;
	  return accu;


This has been the major speed-up-knob, other aspects were as confusing as profiling often gets, e.g. -ffast-math actually slowed down the rectangle shader by 13%.