Dec 27 2016
 

33c3

It’s been a while since the last update, so with the new year around the corner, and while sitting in c-base @ 33c3, I’ll do my best to sum up what’s been going on in Rapicorn and Beast development since the last releases.

Both projects now make use of extended instruction sets (SIMD) that have been present in CPUs for the last 8–10 years, such as MMX, SSE, SSE2, SSE3 and CMPXCHG16B. Both projects also support easy test builds in Docker images now, which makes automated testing for different Linux distributions from travis-ci much simpler and more reproducible. Along the way, both were finally fixed up to fully support clang++ builds, although clang++ still throws a number of warnings. This means we can now use clang++ based development and debugging tools! A lot of old code that became obsolete or always remained experimental could be removed (and still is being removed).

Beast got support for using multiple CPU cores in its synthesis engine; we are currently testing the performance improvements and stability of this addition. Rapicorn gained some extra logic to allow main loop integration with a GMainContext, which allows Beast to execute a Gtk+ and a Rapicorn event loop in the same thread.

Rapicorn widgets now always store coordinates relative to their parents, and always buffer drawings in per-widget surfaces. This allowed major optimizations of the size negotiation process, so renegotiations can now operate in a much more fine-grained way. The widget states also got an overhaul, and XML nodes now use declare="…" attributes when new widgets are composed. Due to some rendering changes, our librsvg modifications became obsolete, so librapicorn now links against a preinstalled librsvg. RadioButton, ToggleButton, SelectableItem and new painter widgets got added, as well as a few convenience properties.

After setting up an experimental Rapicorn build with Meson, we got some new ideas to speed up and improve the autotools based builds. That is, I managed to do a full Rapicorn build with Meson and compare it to autotools + GNU Make. It turns out Meson had two significant speed advantages:

  1. Meson builds files from multiple directories in parallel;
  2. Meson configuration happens a lot faster than what the autoconf scripts do.

Meson also has (or had) a lot of quirks (examples: #785, #786, #753) and wasn’t really easier to use than our GNU Make setup, at least for me, given that I know GNU Make very well. The number one advantage of Meson was overcome by migrating Rapicorn to a non-recursive Makefile (I find dependencies can still be expressed much better in Make than in Meson), since parallel GNU Make can be just as fast as Ninja for small to medium sized projects.

The number two issue is harder to beat though. Looking at our configure.ac file, there were a lot of shell and compiler invocations I could remove, simply by taking the same shortcuts that Meson does, e.g. detecting clang or gcc and then deriving a batch of compiler flags, instead of testing compiler support for each flag individually. Executing ./configure takes ca. 3 seconds now, which isn’t too bad for infrequent invocations. The real culprit is autoreconf though, which takes over 12 seconds to regenerate everything after a configure.ac or related change (briefly looking into that, it seems aclocal takes longer than autoconf, automake, autoheader and libtoolize together).

 

PS: I’m attending 33C3 in Hamburg at the moment, so drop me a line (email or twitter) if you’re around and would like to chat over coffee.

Oct 14 2015
 

ci-build-passing

I’ve spent the last week setting up Rapicorn and Beast with travis-ci.org, a free continuous integration service for GitHub. Since travis is only available for GitHub, this means the Beast Git repository (hosted on git.gnome.org) had to be moved (cloned) to GitHub.

Luckily, Git allows pushing to multiple remotes:

git remote add all git@github.com:tim-janik/beast.git
git remote set-url --add --push all git@github.com:tim-janik/beast.git
git remote set-url --add --push all ssh://timj@git.gnome.org/git/beast
git remote show all
* remote all
  Fetch URL: git@github.com:tim-janik/beast.git
  Push  URL: git@github.com:tim-janik/beast.git
  Push  URL: ssh://timj@git.gnome.org/git/beast
  HEAD branch: master
  Remote branch:
    master new (next fetch will store in remotes/all)
  Local ref configured for 'git push':
    master pushes to master (up to date)

Now the following push will update both repositories:

git push all master

Also, ‘git push’ can be configured to push to ‘all’ instead of ‘origin’ by default:

git checkout master && git branch -u all/master
git push 
 To git@github.com:tim-janik/beast.git
  038d442..22c807a master -> master
 To ssh://timj@git.gnome.org/git/beast
  038d442..22c807a master -> master

The repos now contain a file .travis.yml that includes the complete build instructions; these need to be kept up to date if any of the build dependencies change.

By default, travis-ci sets up Ubuntu 12.04 boxes for the continuous builds, but that’s way too old for most dependencies. Luckily there’s a beta program available for Ubuntu 14.04 ‘trusty’, which can be selected with “dist: trusty”. The g++-4.8 compiler on trusty is still too old to build Beast, so the CI setup currently installs g++-5 from ppa:ubuntu-toolchain-r/test.
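For illustration, the relevant parts of such a CI setup might look roughly like this (a sketch only; the actual .travis.yml files in the repositories are authoritative, and the build commands here are assumptions):

```yaml
dist: trusty          # opt into the Ubuntu 14.04 beta environment
language: cpp
addons:
  apt:
    sources:
      - ubuntu-toolchain-r-test   # PPA alias providing g++-5
    packages:
      - g++-5
script:
  - ./configure CXX=g++-5
  - make && make check
```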

As a result, we now have automated test builds running on travis for the Rapicorn and Beast repositories, triggered by each push. After each build, the build bot reports success to the #beast IRC channel, and the current status can also be found via the “Build Status” buttons on GitHub: Rapicorn, Beast.

Jul 02 2015
 

Rapicorn 'visitor' branch

Trying to keep it up, here’s an update on recent developments in Rapicorn and Beast.

Git Branches

For now, Rapicorn and Beast are using Git branches the following way:

  • Topic branches are created for each change. Where possible, commits should compile and pass all tests (i.e. pass make check installcheck).
  • Once completed, topic branches are merged into the master branch. For intermediate merges of huge branches, I’ve recently been adding [ongoing] to the merge commit message. As an aside, branch merges should probably be more elaborate in the future to make devlog articles easier to write and potentially more accurate.
  • The master branch must always compile and pass all tests.
  • OpenHub: The OpenHub repo links have been adjusted to point at Rapicorn’s and Beast’s master branch. Because of problems with spammers and a corresponding reimplementation, code statistics updates on the OpenHub platform are currently stalled, however.
    https://www.openhub.net/p/beast-bse
    https://www.openhub.net/p/rapicorn

Hello and goodbye clang++

Rapicorn C++11 code currently compiles with g++-4.7 and upwards. An initial attempt was made at making the C++11 code compile with clang++-3.4 but the incompatibilities are currently too numerous. A few good fixes have come out of this and are merged into master now, but further work on this branch probably has to wait for a newer clang++ version.

New Widgets

Rapicorn is growing more widgets that implement state rendering via SVG element matching. Recent additions are:

  • LayerPainter – A container that allows rendering widgets on top of each other.
  • ElementPainter – A container that displays state dependent SVG image elements.
  • FocusPainter – An ElementPainter that decorates its child according to focus changes.

IDL Improvements

Several changes around Rapicorn’s IDL compiler and support code made it into master recently:

  • The IDL layer got bind() and connect() methods (on the ObjectBroker interface). This models the IDL setup phase after the zeromq API. Beast makes use of this when setting up IDL interface layers in the UI and in BSE.
  • The Python binding was rewritten using Cython. Instead of invoking a heap of generated Python glue code and talking to the message passing interfaces directly, the Python binding now sits on top of the C++ binding. This makes the end result operate much faster, is less complex on the maintenance side, and is more functional with regard to the Python API offered. As an added bonus, it also eases testing of the C++ bindings.
  • And just to prove the previous point, the new Cython port uncovered a major issue lurking in the C++ IDL handling of objects in records and sequences. At least since the introduction of remote reference counting, client side object handles and server side object references are implemented and treated in fundamentally different ways. This requires records (struct) and sequences (std::vector) to have separate implementation types on the client and server sides. Thus, the client and server types are now prefixed with ClnT_ and SrvT_ respectively. Newly generated typedef aliases are hiding the prefixes from user code.
  • IDL files don’t need ‘= 0’ postfixes for methods any more. After all, generating non-virtual methods wasn’t really used anyway.
  • The Enum introspection facilities got rewritten, so things like the Enum name are also accessible now. This area probably isn’t fully finished yet; a more versatile API is still needed for future Any integration.
  • Auxiliary information for properties is now accessible through an __aida_aux_data__() method on generated interfaces.
  • Generated records now provide a template method __accept__<>(Visitor) to visit all record fields by value reference and name string. Exemplary visitor implementations are provided to serialize/deserialize records to XML and INI file formats.

BEAST Developments

For the most part, changes in Beast are driving or chasing Rapicorn at the moment. This means that often the tip of Rapicorn master is required to build Beast’s master branch. Here is why:

  • Beast now uses RAPIDRES(1) to embed compressed files. Rapicorn::Blob and Rapicorn::Res make these accessible.
  • Beast now makes use of Rapicorn’s IDL compiler to generate beastrc config structures and to add a new ‘Bse‘ IDL layer into libbse that allows the UI code to interface with Bse objects via C++ interfaces. Of course, lots of additional porting work is needed to complete this.
  • Beast procedures (a kind of ‘remote method’ implemented in C with lots of boilerplate code) are now being migrated to C++ methods one by one, which greatly simplifies the code base, but also causes lots of laborious adaptations at the call sites, in the UI and in the undo system. An excursion into the changes this brings for the undo implementation is provided in DevLog: A day of templates.
  • The GParamSpec introspection objects for properties that Beast uses for GUI generation can now be constructed from __aida_aux_data__()  strings, which enabled the beastrc config structure migration.
  • An explanatory file HACKING.md was added which describes the ongoing migration efforts and provides help in accessing the object types involved.

What’s next?

For the moment, porting the object system in Beast from GObject to IDL based C++11 interfaces and related procedure, signal and property migrations is keeping me more than busy. I’ll try to focus on completing the majority of work in this area first. But for outlooks, adding a Python REPL might make a good followup step. 😉

Jun 27 2015
 

Yesterday I spent some 14+ hours on getting a templated undo method wrapper going.
Just to throw it all away this morning.

Here’s what I was trying to achieve. The C version of BEAST implements undo as follows:

// bse_track_remove_tick():
BseTrack *track;
uint tick;
BsePart *part;
bse_item_push_undo_proc (track, "insert-part", tick, part);

That is, it queues an undo step that, if executed, will call the “insert-part” procedure
on a BseTrack object, which inserts a BsePart object at a ‘tick’.
This all happens through a varargs interface with lots of magic behind the scenes. In
particular, the reference to ‘part’ is tricky. Future modifications to the BseTrack (or
project) may cause the removal and destruction of the BsePart object involved here.
While the execution of future undo steps will re-create a BsePart to be inserted here
before the step at hand is executed, the ‘part’ object pointer will have to be changed
to the re-created one instead of the destroyed one.
To achieve this, bse_item_push_undo_proc() internally converts the ‘part’ pointer into
a serializable descriptor string that allows the BsePart object to be re-identified,
and the undo machinery resolves that descriptor before “insert-part” is called.

Now on to C++. I wanted the new counterpart in the C++ version of Beast to look like:

// TrackImpl::remove_tick():
TrackImpl *this;
const uint tick;
PartImpl &part;
push_undo ("Remove Tick", *this, &TrackImpl::insert_part, tick, part);

But…

Under the hood that means push_undo() (which is a template method on ItemImpl, a base type of TrackImpl) needs to process its variable argument list to:

  • A) Put each argument into a wrapper structure and store away the argument list (i.e. std::tuple<Wrapper<Args>…>).
  • B) Special case the wrapper structure for objects to store a descriptor internally (i.e. template specialisation on Wrapper<Arg> for Arg=ItemImpl& or derived).
  • C) Copy the wrapped argument list into a closure to be called when the undo step is executed.
  • D) When the closure is called, “unwrap” each of the wrapped arguments to yield its original type (i.e. construct a std::tuple<Args…> from std::tuple<Wrapper<Args>…>).
  • E) When unwrapping an object, resolve the descriptor stored internally (i.e. put more magic into Wrapper<Arg> to yield a valid Arg& object).
  • F) Construct a variable argument call to &TrackImpl::insert_part(…) (i.e. apply a C++ argument pack).

In short, I got A, B, C, D and F working after significant efforts.
A is somewhat straightforward with C++11 variadic template arguments. C can be accomplished with a C++11 lambda capture list, and F involves copying std::integer_sequence over from the C++14 proposals and hacking its std::apply() template to support instance + method calls. Lastly, D can be implemented in a fashion related to F.
What was left was finishing B and E, i.e. writing a wrapper that stores and yields ordinary arguments such as int or std::string, and converts ItemImpl&-derived types back and forth to a string representation.
Probably laborious but doable — or so I thought.

It turns out that because of all the argument and tuple packing hassle (template recursion, integer sequencing and more) involved in implementing A, D and F, it would have been hard to pass the needed serialization context into Wrapper<>. And what’s much worse, g++-4.9 started to choke on template errors during the Wrapper<> development, aborting with “confused by earlier errors” after pages and pages of template error messages. clang++-3.4 isn’t yet capable of processing the C++11 code used by Rapicorn, so it wasn’t of help here either (I plan another attempt at porting my C++11 code to be clang++ compatible once I get my hands on a newer clang++ version).
So in the end, I gave up after an overlong day in the middle of E, everything else having been accomplished. The g++-4.9 choking was a main letdown, but probably even more important: while I had the necessary state and mood to process multiple pages of template error messages yesterday, the same cannot be expected of every push_undo() user in the future if any push_undo() argument ever mismatches.

This morning, I threw away yesterday’s templating excess and within an hour got an alternative interface to work:

// undoing part removal needs an undo_descriptor because future
// deletions may invalidate and recreate the part object
TrackImpl *this;
const uint tick;
PartImpl &part;
UndoDescriptor<PartImpl> part_descriptor = undo_descriptor (part);
auto lambda = [tick, part_descriptor] (TrackImpl &self) {
  PartImpl &part = self.undo_resolve (part_descriptor);
  self.insert_part (tick, part);
};
push_undo ("Remove Tick", *this, lambda);

That is, this interface is fully type-safe, but the ‘part’ wrapping has to be done manually, which involves writing a small lambda around TrackImpl::insert_part(). If any argument of the lambda or the push_undo() call is erroneous, the compiler will point at a single failing variable assignment in the implementation of push_undo<>() and list the mismatching arguments.
That is much more digestible than multiple pages of template recursion errors, so it’s a plus for future maintenance.

The short version of push_undo<>() that takes a method pointer instead of a lambda is still available for implementing undo steps that don’t involve object references, incidentally covering the majority of uses.

May 26 2015
 

A good while ago at a conference, I got into a debate over the usefulness of TLS (thread-local storage of variables) in performance critical code. Allegedly, TLS was too slow for practical use, especially in shared libraries.

TLS can be quite useful for context sensitive APIs, here’s a simple example:

push_default_background (COLOR_BLUE);
auto w = create_colorful_widget(); // gets blue background
pop_default_background();

For a single threaded program, the above push/pop functions can keep the default background color for widget creation in a static variable. But to allow concurrent widget creation from multiple threads, that variable will have to be managed per-thread, so it needs to become a thread local variable.

Another example is GSlice, a memory allocator that keeps per-thread allocation caches (magazines) for fast successive allocation and deallocation of equally sized memory chunks. While operating within the cache size, only thread-local data needs to be accessed to release and reallocate memory chunks, so no synchronization operations with other threads are needed that could degrade performance.

GCC (I’m using 4.9.1 here), GLibc (2.19), et al. have seen a lot of improvements since, so I thought I’d dig out an old benchmark and evaluate how TLS does nowadays. To test the shared library case in particular, I’ve written the benchmark as a patch against Rapicorn and posted it here: thread-local-storage-benchmark.diff.

The following table lists the best results from multiple benchmark runs. The numbers shown are the times for 2 million function calls to fetch a (TLS) pointer of each kind (plus some benchmarking overhead), on a Core-i7 CPU @ 2.80GHz in 64bit mode:

Local pointer access (no TLS):                0.003351 seconds
Shared library TLS pointer access:            0.003741 seconds
Static pointer access (no TLS):               0.004450 seconds
Executable global TLS pointer access:         0.004735 seconds
Executable function-local TLS pointer access: 0.004828 seconds

The greatest timing variation in these numbers is just over thirty percent (30.6%). In realistic scenarios, the time needed for pointer accesses is influenced by a lot of other, more dominant factors, like code locality and data cache misses.

So while it might have been true that TLS had some performance impacts in its infancy, with a modern tool chain on AMD64 Linux, performance is definitely not an issue with the use of thread-local variables.

Here is the breakdown in nanoseconds per pointer access call:

TLS Benchmark

Let me know if there are other platforms that don’t perform as well.

May 05 2015
 

Giving in to persistent nagging from Stephen and Stefan about progress updates (thanks guys), I’ll cherry pick some of the branches recently merged into Rapicorn devel for this post. We’ll see if I can keep posting updates more regularly in the future… 😉

Interactive Examples

Following an idea Pippin showed me for his FOSDEM talk, I’ve implemented a very simple script (merged with the ‘interactive-examples’ branch) that restarts an example program whenever any file in a directory hierarchy changes. This allows “live” demonstrations of widget tree modifications in source code, e.g.:

cd rapicorn/
misc/interactive.sh python ./docs/tutorial/tuthello.py &
emacs ./docs/tutorial/tuthello.py
# modify and save tuthello.py

Every time a modification is saved, tuthello.py is restarted, so the test window it displays “appears” to update itself.

Shared_ptr widgets

Last weekend, I also pushed the make_shared_widgets branch to Rapicorn.

A while ago, we started to use std::shared_ptr<> to maintain widget reference counts instead of the hand-crafted ref/unref functions that used atomic operations. After several cleanups, we can now also use std::make_shared() to allocate the same memory block for storing the reference count and widget data. Here is an image (originals by Herb Sutter) demonstrating it:

make_shared_widgets

The hand-optimized atomic operations we used previously had some speed advantages, but using shared_ptr was needed to properly implement remote reference counting.

Resources

Since 2003 or so, Beast and later Rapicorn have had the ability to turn any resource file, e.g. PNG icons, into a stream of C char data to be compiled into a program data section for runtime access. The process was rather unordered and ad hoc though, i.e. any source file could include char data generated that way, but each case needed its own make rules and support code to access, uncompress and use that data. Lately I did a survey across other projects of how they go about integrating resource files, and simplified matters in Rapicorn based on the inspiration I got.
With the merge of the ‘Res’ branch, resource files like icons and XML files have now all been moved under the res/ directory. All files under this subdir are automatically compressed and compiled into the Rapicorn shared library and are accessible through the ‘Res’ resource class. Example:

Blob data = Res ("@res icons/example.png");

Blob objects can be constructed from resources or memory-mapped files; they provide size() and data() methods and are automatically memory managed.

New eval syntax

In the recently merged ‘factory-eval-syntax’ branch, we’ve changed the expression evaluation syntax for UI XML files to the following:

<label markup-text="@eval label_variable"></label>

Starting attribute values with ‘@’ has precedent on other platforms and is also useful in other contexts like resources, which allows us to reduce the number of syntax special cases for XML notations.

Additionally, the XML files now support property element syntax, e.g. to set the ‘markup_text’ property of a Label:

<Label>
    <Label.markup-text> Multiline <b>Text</b>... </Label.markup-text>
</Label>

This markup is much more natural for complex property values and also has precedent on other platforms.

What’s next

I’m currently knee deep in the guts of new theming code, the majority of which has just started to work but some important bits still need finishing. This also brings some interesting renovation of widget states, which I hope to cover here soon. As always, the Rapicorn Task List contains the most important things to be worked on next. Feedback on missing tasks or opinions on what to prioritize are always appreciated.

Dec 31 2014
 

In the true tradition of previous years, this year’s 31c3 in Hamburg revealed another bummer about surveillance capacities:

The brief summary is that viable attacks are available to surveillance agencies for PPTP, IPSEC, SSL/TLS and SSH.
New papers reveal that as of 2012, OTR and PGP seem to have resisted decryption attempts.

A related “Spiegel” article provides more details and the leaked papers that contain this information: Inside The Nsa War On Internet Security.

Several vulnerabilities regarding SSL/TLS have been discovered and fixed in the years since these papers were created. But at the very least, state agencies retain the possibility to decrypt individual connections with fake certificates via man-in-the-middle attacks.

Claiming decryption of SSH caught me by surprise though; it’s a tool deeply ingrained into my daily workflow.

At the conference, I got a chance to discuss this with Jacob after studying some of the Spiegel revelations, and since I’ve been asked about this so much, I’ll wrap it up here:

  • The cited papers put an emphasis on breaking other crypto protocols like PPTP and IPSEC. Those, and even SSL, enjoy much more focus than SSH attack possibilities.
  • Clearly, good attacks are possible against password protected sessions, given lots of computation power or (targeted) password collection databases.
  • Also, 768-bit RSA keys are probably breakable by surveillance agencies nowadays, and 1024-bit keys could be within reach, based on revelations about their processing capacities.
  • Even 2048-bit keys could become approachable given future advances in mathematical attacks, or weak random number generators used for key generation, as was the case in Debian in 2008 (CVE-2008-0166).
  • Additionally, there always remains the possibility of an undiscovered SSH implementation bug or protocol flaw that’s exploitable for agencies.

Fact is, we don’t yet know enough details about all the possible attack surfaces against SSH available to the agencies, and we badly need more information to know which infrastructure components remain safe and reliable for our day to day work. However, we do have an idea about the weak spots that should be avoided.

My personal take away is this:

  • Never allow password-based SSH authentication:
    /etc/ssh/sshd_config: PasswordAuthentication no
  • Use only 4096-bit keys for SSH authentication; I have been doing this for more than 5 years and performance has not been a problem:
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_HOSTNAME -C account@HOSTNAME
  • Turn to PGP and OTR for useful encryption.

Have a happy new year everyone…