Tim Janik

Tim Janik studied computer science at the University of Hamburg, is a Free Software and Open Source author, advocate, speaker and contributor to various open projects. For more information see the Biography of Tim Janik. Tim Janik is available for professional consulting around Free Software through the Lanedo Website.

Dec 312014

In the true tradition of previous years, this years 31c3 in Hamburg revealed another bummer about surveillance capacities:

The brief summary is that viable attacks are available to surveillance agencies for PPTP, IPSEC, SSL/TLS and SSH.
New papers reveal that as of 2012, OTR and PGP seem to have resisted decryption attempts.

A related “Spiegel” article provides more details and the leaked papers that contain this information: Inside The Nsa War On Internet Security.

Several vulnerabilities regarding SSL/TLS have been discovered and fixed in the past years since these papers were created. But at the very least, for state agencies the possibility remains to decrypt individual connections with fake certificates via man-in-the-middle-attacks.

Claiming decryption of SSH caught me by surprise though, it’s a tool deeply ingrained into my daily workflow.

At he conference, I got a chance to discuss this with Jacob after studying some of the Spiegel revelations and since I’ve been asked about this so much I’ll wrap it up here:

  • The cited papers put an emphasis on breaking other crypto protocols like PPTP and IPSEC. That and even SSL enjoy much more focus than SSH attack possibilities.
  • Clearly, good attacks are possible against password protected sessions, given lots of computation power or (targeted) password collection databases.
  • Also 768bit RSA keys are probably nowadays breakable by surveillance agencies and 1024 bit key could be within reach based on revelations about their processing capacities.
  • Even 2048 bit keys could become approachable given future advances in mathematical attacks or weak random number generators used for key generation as was the case in Debian 2008 (CVE-2008-0166).
  • Additionally, there always remains the possibility of an undiscovered SSH implementation bug or protocol flaw that’s exploitable for agencies.

Fact is, we don’t yet know enough details about all possible attack surfaces against SSH available to the agencies and we badly need more information to know what infrastructure components remain save and reliable for our day to day work. However we do have an idea about the weak spots that should be avoided.

My personal take away is this:

  • Never allow password based SSH authentication ever:
    /etc/ssh/sshd_config: PasswordAuthentication no
  • Use 4096bit keys for SSH authentication only, I have been doing this for more than 5 years and performance has not been a problem:
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_HOSTNAME -C account@HOSTNAME
  • Turn to PGP and OTR for useful encryption.

Have a happy new year everyone…

Dec 122014
Caption Text

Miller, Gary – Wikimedia Commons

In the last months I finally completed and merged a long standing debt into Rapicorn. Ever since the Rapicorn GUI layout & rendering thread got separated from the main application (user) thread, referencing widgets (from the application via the C++ binding or the Python binding) worked mostly due to luck.

I investigated and researched several remote reference counting and distributed garbage collection schemes and many kudos go to Stefan Westerfeld for being able to bounce ideas off of him over time. In the end, the best solution for Rapicorn makes use of several unique features in its remote communication layer:

  1. Only the client (user) thread will ever make two way calls into the server (GUI) thread, i.e. send a function call message and block for a result.
  2. All objects are known to live in the server thread only.
  3. Remote messages/calls are strictly sequenced between the threads, i.e. messages will be delivered and processed sequentially and in order of arrival.

This allows the following scheme:

  1. Any object reference that gets passed from the server (GUI) thread into the client (user) thread enters a server-side reference-table to keep the object alive. I.e. the server thread assumes that clients automatically “ref” new objects that pass the thread boundary.
  2. For any object reference that’s received by a client thread, uses are counted separately on the client-side and once the first object becomes unused, a special message is sent back to the server thread (SEEN_GARBAGE).
  3. At any point after receiving SEEN_GARBAGE, the server thread may opt to initiate the collection of remote object references. The current code has no artificial delays built in and does so immediately (thresholds for delays may be added in the future).
  4. To collect references, the serer thread swaps its reference-table for an empty one and sends out a GARBAGE_SWEEP command.
  5. Upon receiving GARBAGE_SWEEP, the client thread creates a list of all object references it received in the past and for which the client-side use count has dropped to zero. These objects are removed from the client’s internal bookkeeping and the list is sent back to the server as GARBAGE_REPORT.
  6. Upon receiving a GARBAGE_REPORT, corresponding to a previous GARBAGE_SWEEP command, the sever thread has an exact list of references to purge from its previously detached reference-table. Remaining references are merged into currently active table (the one that started empty upon GARBAGE_SWEEP initiation). That way, all object references that have been sent to the client thread but are now unused are discarded, unless they have meanwhile been added into the newly active reference-table.

So far, the scheme works really well. Swapping out the server side reference-tables copes properly with the most tricky case: A (new) object reference traveling from the server to the client (e.g. as part of a get_object() call result), while the client is about to report this very reference as unused in a GARBAGE_REPORT. Such an object reference will be received by the client after its garbage report creation and treated as a genuinely new object reference arriving, similarly to the result of a create_object() call. On the server side it is simply added into the new reference table, so it’ll survive the server receiving the garbage report and subsequent garbage disposal.

The only thing left was figuring out how to automatically test that an object is collected/unreferenced, i.e. write code that checks that objects are gone…
Since we moved to std::shared_ptr for widget reference counting and often use std::make_shared(), there isn’t really any way too generically hook into the last unref of an object to install test code. The best effort test code I came up with can be found in testgc.py. It enables GC layer debug messages, triggers remote object creation + release and then checks the debugging output for corresponding collection messages. Example:

TestGC-Create 100 widgets... TestGC-Release 100 widgets...
GCStats: ClientConnectionImpl: SEEN_GARBAGE (aaaa000400000004)
GCStats: ServerConnectionImpl: GARBAGE_SWEEP: 103 candidates
GCStats: ClientConnectionImpl: GARBAGE_REPORT: 100 trash ids
GCStats: ServerConnectionImpl: GARBAGE_COLLECTED: \
  considered=103 retained=3 purged=100 active=3

The protocol is fairly efficient in saving bandwitdh and task switches: ref-messages are implicit (never sent), unref-messages are sent only once and support batching (GARBAGE_REPORT). What’s left is the two tiny messages that initiate garbage collection (SEEN_GARBAGE) and synchronize reference counting (GARBAGE_SWEEP). As I hinted earlier, in the future we can introduce arbitrary delays between the two to reduce overhead and increase batching, if that ever becomes necessary.

While its not the most user visible functionality implemented in Rapicorn, it presents an important milestone for reliable toolkit operation and a fundament for future developments like remote calls across process or machine boundaries.

Merge 14.10.0 – Includes remote refernce counting,

Merge shared_ptr-reference-counting  – post-release.

Oct 222014

Poodle by Heather Hales

In my previous post Forward Secrecy Encryption for Apache, I’ve described an Apache SSLCipherSuite setup to support forward secrecy which allowed TLS 1.0 and up, avoided SSLv2 but included SSLv3.

With the new PODDLE attack (Padding Oracle On Downgraded Legacy Encryption), SSLv3 (and earlier versions) should generally be avoided. Which means the cipher configurations discussed previously need to be updated.

I’ll first recap the configuration requirements:

  • Use Perfect Forward Secrecy where possible.
  • Prefer known strong ciphers.
  • Avoid RC4, CRIME, BREACH and POODLE attacks.
  • Support browsing down to Windows XP.
  • Enable HSTS as a bonus.

The Windows XP point is a bit tricky, since IE6 as shipped with XP originally only supports SSLv3, but later service packs brought IE8 which at least supports TLS 1.0 with 3DES.

Here’s the updated configuration:

SSLEngine On
SSLProtocol All -SSLv2 -SSLv3
SSLHonorCipherOrder on
# Prefer PFS, allow TLS, avoid SSL, for IE8 on XP still allow 3DES
# Prevent CRIME/BREACH compression attacks
SSLCompression Off
# Commit to HTTPS only traffic for at least 180 days
Header add Strict-Transport-Security "max-age=15552000"

Last but not least, I have to recommend www.ssllabs.com again, which is a great resource to test SSL/TLS setups. In the ssllabs, the above configuration yields an A-rating for testbit.eu.

UPDATE: The above configuration also secures HTTPS connections against the FREAK (CVE-2015-0204) attack, as can be tested with the following snippet:

openssl s_client -connect testbit.eu:443 -cipher EXPORT

Connection attempts to secure sites should result in a handshake failure.

UPDATE: Meanwhile, the Mozilla Foundation provides a webserver configuration generator that almost guarantees an A+ rating on ssllabs: Generate Mozilla Security Recommended Web Server Configuration Files.

Aug 052014

Map Jan-Dec to 1-12

In a time critical section of a recent project, I came across having to optimize the conversion of three digit US month abbreviations (as commonly found in log files) to integers in C++. That is, for “Jan” yield 1, for “Feb” yield 2, etc, for “Dec” yield 12.

In C++ the simplest implementation probably looks like the following:

std::string string; // input value
std::transform (string.begin(), string.end(), string.begin(), ::toupper);
if (string == "JAN") return 1;
if (string == "FEB") return 2;
// ...
if (string == "DEC") return 12;
return 0; /* mismatch */

In many cases the time required here is fast enough. It is linear in the number of months and depending on the actual value being looked up. But for an optimized inner loop I needed something faster, ideally with running time independent of the actual input value and avoiding branch misses where possible. I could take advantage of a constrained input set, which means the ‘mismatch’ case is never hit in practice.

To summarize:

  • Find an integer from a fixed set of strings.
  • Ideal runtime is O(1).
  • False positives are acceptable, false negatives are not.

That actually sounds a lot like using a hash function. After some bit fiddling, I ended up using a very simple and quick static function that yields a result instantly but may produce false positives. I also had to get rid of using std::string objects to avoid allocation penalties. This is the result:

static constexpr const char l3month_table[] = {
  12, 5, 0, 8, 0, 0, 0, 1, 7, 4, 6, 3, 11, 9, 0, 10, 2
}; // 17 elements

/// Lookup month from 3 letters, with 30% chance returns 0 for invalid inputs.
static constexpr inline unsigned int
l3month (const char *l3str)
  return l3month_table[((l3str[1] & ~0x20) + (l3str[2] & ~0x20)) %
                       sizeof (l3month_table) / sizeof (l3month_table[0])];

The hash function operates only on the last 2 ASCII letters of the 3-letter month abbreviations, as these two are sufficient to distinguish between all 12 cases and turn out to yield good hash values. The expression (letter & ~0x20) removes the lowercase ASCII bit, so upper and lower case letters are treated the same without using a potentially costly if-branch. Adding the uppercased ASCII letters modulo 17 yields unique results, so this simple hash value is used to index a 17 element hash table to produce the final result.

In effect, this perfectly detects all 12 month names in 3 letter form and has an almost 30% chance of catching invalid month names, in which case 0 is returned – useful for debugging or assertions if input contracts are broken.

As far as I know, the function is as small as possible given the constraints. Anyone can simplify it further?

Apr 152014


The basic need to encrypt digital communication seems to be becoming common sense lately. It probably results from increased public awareness about the number of parties involved in providing the systems required (ISPs, backbone providers, carriers, sysadmins) and the number of parties these days taking an interest in digital communications and activities (advertisers, criminals, state authorities, voyeurs, …). How much to encrypt and to what extend seems to be harder to grasp though.

Default Encryption

A lot of reasons exist why it’s useful to switch to encryption of content and all data transfers by default (Ed). This has been covered elsewhere in more depth, just a very quick summary could be the following:

  • Not encrypting leads to exposing all online behavior to super easy surveillance by local, foreign and alien authorities, criminals or nosy operators.
  • Rare use of encryption makes it effectively an alert signal for sensitive messages (e.g. online banking) and an alert signal for interesting personalities (e.g. movement organizers).
Forward Secrecy

Even with good certificates in place and strong encryption algorithms like AES, there’re good reasons to enable use of perfect forward secrecy (PFS) for encrypted connections. One is that PFS means each connection uses a securely generated, separate new encryption key, so that recordings of the connection traffic cannot be decyphered in the future when the server certificate is acquired by an attacker. For the same reason PFS is also useful to mitigate security issues like heartbleed, since the vast majority of generated connection keys are not affected by occasional memory leaks, in contrast to the permanent server certificate that is likely to be exposed by even rare memory leaks.

HTTPS On Subdomains

I investigated the administrative side of things for default encryption of all testbit.eu traffic, which also hosts other sites like rapicorn.org. It turned out a number of measures need to be taken to yield an even mildly acceptable result:

  • Many browsers out there still only support encrypted HTTPS connections to just one virtual host per IP address (SSL & vhosts).
  • The affordable (or free) certificates signed by certificate authorities usually just allow one subdomain per certificate (e.g. a free StartSSL certificate covers just www.example.com and example.com).
Site Arrangements

So here’s the list of steps I took to improve privacy on testbit.eu:

  1. I moved everything that was previously hosted on rapicorn.org or beast.testbit.eu into testbit.eu/... subdirectories (leaving a bunch of redirects behind).
  2. Then I obtained a website certificate from startssl.com. I could get my 4096bit certificate signed from them free of charge within a matter of hours. CAcert seems to be the only free alternative, but it’s not supported by major browsers and isn’t even packaged by Debian now.
  3. Having the certificate, I setup my apache to a) reject known-to-be-broken encryption SSL/TLS settings; b) allow a few weak encryption variants to still support old XP browsers; c) give preference to strong encryption schemes with perfect-forward-secrecy.
  4. Last, I setup automatic redirection for all incoming traffic from HTTP to HTTPS on testbit.eu.
Apache PFS Configuration

The Apache configuration bits for TLS with PFS look like the following:

# SSL version 2 has been widely superseded by version 3 or TLS
SSLProtocol All -SSLv2
# Compression is rarely supported and vulnerable, see CRIME attack
SSLCompression Off
# Preferred Cipher suite selection favoring PFS
# Enable picking ciphers in the above order
SSLHonorCipherOrder on

Even stricter variants would be disabling 3DES and SSLv3. But it turns out that IE8 on XP still needs 3DES. Also Firefox-25 on Ubuntu-1304 still needs SSLv3. I still mean to support both for a few more months. This is the stricter configuration:

SSLProtocol All -SSLv3 -SSLv2
Apache HSTS Configuration

Sites that serve all content via HTTPS, might also decide to enforce secure connections with HTTP Strict Transport Security (HSTS), a recent security policy mechanism that prevents security downgrades of web connections and hijacking of web sessions. Apache needs the headers module enabled and a simple configuration line:

# Commit to HTTPS only traffic for at least 180 days
Header add Strict-Transport-Security "max-age=15552000"
Testing Testbit

Using this setup, I’m now getting a green A- at SSL Labs for Testbit: SSL Test.
Further references and tips can be found at BetterCrypto.org.

I’m happy about additional input to improve encryption and its use throughout the web, so tell me what I’ve been missing!

UPDATE: A newer post addresses setup changes to avoid the POODLE attack: Apache SSLCipherSuite without POODLE.

Nov 132013

Reblogged from the Lanedo GmbH blog:

Tobin Daily Visits Screenshot

Tobin Daily Visits Screenshot

During recent weeks, I’ve started to create a new tool “Tobin” to generate website statistics for a number of sites I’m administrating or helping with. I’ve used programs like Webalizer, Visitors, Google Analytics and others for a long time, but there’re some correlations and relationships hidden in web server log files that are hard or close to impossible to visualize with these tools.

The Javascript injections required by Google Analytics are not always acceptable and fall short of accounting for all traffic and transfer volume. Also some of the log files I’m dealing with are orders of magnitudes larger than available memory, so the processing and aggregation algorithms used need to be memory efficient.

Here is what Tobin currently does:

  1. Input records are read and sorted on disk, inputs are filtered for a specific year.
  2. A 30 minute window is used to determine visits via unique IP-address and UserAgent.
  3. Hits, such as images, CSS files, WordPress resource files, etc are filtered to derive page counts.
  4. Statistics such as per hour accounting and geographical origin are collected.
  5. Top-50 charts and graphs are generated in an HTML report from the collected statistics.

There is lots of room for future improvements, e.g. creation of additional modules for new charts and graphs, possibly accounting across multiple years, use of intermediate files to speed up processing and more. In any case, the current state works well for giant log files and already provides interesting graphs. The code right now is alpha quality, i.e. ready for a technical preview but it might still have some quirks. Feedback is welcome.

The tobin source code and tobin issue tracker are hosted on Github. Python 2.7 and a number of auxiliary modules are required. Building and testing works as follows:

$ git clone https://github.com/tim-janik/tobin.git
$ make # create tobin executable
$ ./tobin -n MySiteName mysite.log
$ x-www-browser logreport/index.html

Leave me a comment if you have issues testing it out and let me know how the report generation works for you. 😉

Sep 022013

London Southwark Bridge

Next Friday I’ll be giving a talk on Open Source In Business at the Campus Party Europe conference in the O2 arena, London. The talk is part of the Free Software Track at 14:00 on the Archimedes stage. I’m there the entire week and will be happy to meet up, so feel free to drop me a line in case you happen to be around the venue as well.

The picture above shows the London Southwark Bridge viewed from Millenium Footbridge with the London Tower Bridge in the background. Last week’s weather has been exceptional for sightseeing in London, so there were loads of great chances to visit that vast number of historical places and monuments present in London. My G+ stream has several examples of the pictures taken. I honestly hope the good weather continues and allows everyone to enjoy a great conference!

Update: The slides and video of the talk are now online here: Open Source In Business