Jul 25 2013


During a conference some while ago, Jacob Appelbaum gave a talk on the usefulness of the Tor project, allowing you to browse anonymously, liberating speech online, enabling web access in censored countries, etc.

Jacob described how the anonymizing Tor network consists of many machines worldwide that run the Tor software and use encryption to route internet traffic, anonymizing it along the way. The traffic then leaves the network at some random host so the original sender cannot be traced back. These hosts are called “exit nodes”.

At the end of his talk, he prompted the audience:
Why don’t you run an exit node yet?
I had been using Tor on and off in the past, and while I couldn’t agree more with the privacy goals and anti-censorship measures outlined, I had never set up an exit node to help the network. And I do admin quite a number of hosted machines that have idle bandwidth available…

It took me a while to get round to it, but some months after that I started to set up my first exit node on a hosted virtual server. It took a while to get it all going: I made sure I read up on the legal implications of running it in Germany, set up disclaimers on the host for people checking its port 80, etc. After half a day or so, I had it going, watched in the logs how it connected to the network and… let it run.

Traffic came in slowly at first, but after 1 or 2 days, the node’s presence had propagated through the net and it started to max out the CPU and bandwidth limits as configured. So far so good, I was happy to be helping people all over the world browse the net anonymously and especially helping folks in countries with internet censorship to access the whole net. Great!

Or so I thought at least. It took only 5 or so days for me to get an official notice to cease network activity on this host immediately. Complaints about copyright infringement were cited as the reason. It turned out that the majority of the “liberating” traffic I was relaying was torrenting of copyrighted material. I had checked the Tor guidelines in advance, which correctly outline that in Germany the TMG (Telemediengesetz, the German Telemedia Act) paragraphs §8 and §15 actually protect me as a traffic router from liability for the actual traffic contents, so initially I assumed I’d be fine in case of claims.

It turned out the notice had a twist to it. It was actually my virtual server provider who sent that notice on behalf of a complaining party and argued that I was in violation of their general terms and conditions for purchasing hosting services. Checking those, the conditions read:
Use of the server to provide anonymity services is excluded.
Regardless of the TMG, I was in violation of the hosting provider’s terms and conditions, which allowed premature termination of the hosting contract. At that point I had no choice but to stop the Tor services on this hosting instance.

All in all a dissatisfying experience, but at least I could answer Jacob’s question now:
I’m not running an exit node because it’s not uncommon for German providers to exclude the use of anonymity services on the merits.
I actually got back to Jacob by email and suggested that a note be added to the TorExitGuidelines wiki page so future contributors know to check the terms and conditions of their hosting services. It seems my request has been ignored to this day, for one reason or another.

I’d still like to support the Tor network however, so for all savvy readers out there, I’m asking:

  • Do you have any provider recommendations where running Tor exit nodes is not an issue? (In Germany perhaps?)
  • Is it at all feasible to be running Tor exit nodes in Germany without having to set a legal budget aside to defend yourself against claims?

Jul 16 2013

In the last few days I finished reading the “Black Swan” by Nassim Nicholas Taleb. Around last January I saw Günther Palfinger mentioning it in my G+ stream, looked it up and bought it.

At first, the book seemed to present some interesting ideas on error statistics, and the first 20 or 30 pages give good examples of conscious knowledge we possess but don’t apply in everyday actions. Not having a trading history like the author, I found reading further until around page 100 to be a bit of a drag. Luckily I kept on, because after that Taleb finally started to get interesting for me.

Once upon a time…
One of the lectures I attended at university touched on black box analysis (in the context of modelling and implementation for computer programs). At first, of course, the usual and expected or known input/output behavior is noted, e.g. calculations it may perform, pattern recognition or any other domain-specific function. But in order to find hints about how it’s implemented, short of inspecting the guts, which a black box won’t allow for, one needs to look at error behavior, i.e. examine the outputs in response to invalid/undefined/distorted/erroneous/unusual inputs and the associated response times. For a simple example, read a sheet of text and start rotating it while you continue reading. For untrained people, reading speed slows down as the rotation angle increases, indicating that the brain engages in counter-rotation transformations whose cost grows linearly with the angle.

At that point I started to develop an interest in error analysis and research around that field, which led me to work on “error-friendliness” in technological or biological systems and to studies on human behavior that imply corollaries like:

  • To enable speedy and efficient decision making, humans generally rely on heuristics.
  • Displaying heuristic behavior, people must make errors by design. So trying to eliminate or punish all human error is futile; aiming for robustness and learning from errors instead is much better.
  • Perfectionism is anti-evolutionary; it is a dead end not worth striving for. Something “perfect” lacks flexibility, creativity and robustness, and cannot be improved upon.

A Black Swan?
Now “Black Swan” defines the notion of a high-impact, low-probability event, e.g. occurring in financial trading, people’s wealth or popularity – events from an extreme realm. That’s in contrast to normally distributed encounters like outcomes of a dice game, people’s body size or the number of someone’s relatives – encounters from a mediocre realm.

From Mediocre…
Here’s a short explanation for the mediocre realms. Rolling a regular die will never give a number higher than 6, no matter how often it’s thrown. In fact, the more it’s thrown, the more evenly its numbers are distributed and the clearer its average emerges. Measuring people’s weight or number of relatives shows a similar pattern to throwing a die: the more measurements are taken, the more certain the average becomes. Any new encounter is going to have less and less impact on the average of the total as the number of measurements increases.

To Extreme…
On the other hand there are the extreme realms. In trading or wealth or popularity, a single encounter can outweigh the rest of the distribution by several orders of magnitude. Most people have an annual income of less than $100k, but the tiny fraction of society that earns more in annual income possesses more than 50% of the entire distribution of wealth. A similar pattern exists with popularity, only very few people are so popular that they’re known by hundreds of thousands or maybe millions of people. But only very very few people are super popular so they’re known by billions. Averaging over a given set only works for so long, until a high-impact “outlier” is encountered that dominates the entire distribution. Averaging the popularity of hundreds of thousands of farmers, industrial workers or local mayors cannot account for the impact on the total popularity distribution by the encounter of a single Mahatma Gandhi.
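The difference between the two realms can be made tangible with a small simulation. The following is only a sketch under assumptions of mine: it uses die rolls for the mediocre realm and a Pareto distribution as a stand-in for the extreme realm, and the shape parameter 1.1, the sample size and the helper names are illustrative choices, not figures from the book.

```python
import random

def share_of_largest(samples):
    """Fraction of the total that the single largest sample contributes."""
    return max(samples) / sum(samples)

def simulate(n=100_000, seed=42):
    """Draw n samples from a mediocre and from an extreme realm."""
    rng = random.Random(seed)
    # Mediocre realm: rolls of a fair six-sided die, bounded between 1 and 6.
    dice = [rng.randint(1, 6) for _ in range(n)]
    # Extreme realm: heavy-tailed Pareto draws. The shape 1.1 is an assumed,
    # purely illustrative value.
    wealth = [rng.paretovariate(1.1) for _ in range(n)]
    return dice, wealth

if __name__ == "__main__":
    dice, wealth = simulate()
    print("average die roll:", sum(dice) / len(dice))
    print("share of largest die roll:", share_of_largest(dice))
    print("share of largest Pareto draw:", share_of_largest(wealth))
```

No single die roll can ever contribute more than 6 divided by the number of rolls to the total, whereas a single Pareto draw can account for a visible fraction of the whole sum, and that fraction changes noticeably from seed to seed.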

On Errors
Taleb spends a lot of time in the book condemning the application of the Gauss distribution in fields that are prone to extreme encounters, especially economics. Rightfully so, but I would have enjoyed learning more about examples of fields that are from the extreme realms and not widely recognized as such. The crux of the inapplicability of the Gauss distribution in the extreme realms lies in two things:

  1. Small probabilities are not accurately computable from sample data, at least not accurately enough to allow for precise decision making. The reason is simple, since the probabilities of rare events are very small, there simply cannot be enough data present to match any distribution model with high confidence.
  2. Rare events that have huge impact, enough impact to outweigh the cumulative effect of all other distribution data, are fundamentally non-Gaussian. Fractal distributions may be useful to retrofit a model to such data, but don’t allow for accurate predictability. We simply need to integrate the randomness and uncertainty of these events into our decision making process.
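The first point can be illustrated with a bit of arithmetic. This is a minimal sketch that assumes the friendliest possible case, namely independent observations of an event with a fixed true probability p; the function name is mine. Under that assumption the observed frequency has a standard error of sqrt(p(1-p)/n), so relative to p itself the error is sqrt((1-p)/(p·n)).

```python
import math

def relative_standard_error(p, n):
    """Relative standard error of the observed frequency of an event with
    true probability p, estimated from n independent observations."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt((1.0 - p) / (p * n))

if __name__ == "__main__":
    for p in (0.5, 0.01, 0.0001):
        print(p, relative_standard_error(p, 1000))
```

With 1000 observations, an event with probability 0.5 is pinned down to within roughly 3%, while for a one-in-ten-thousand event the relative error exceeds 300%. In the extreme realms the situation is likely worse still, since the assumption of a fixed, known type of distribution is itself in question.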

Aggravation in the Modern Age
Now Taleb very forcefully articulates what he thinks about economists applying mathematical tools from the mediocre realms (Gauss distribution, averaging, disguising uncertain forecasts as “risk measurements”, etc.) to extreme-realm encounters like trade results, and if you look for that, you’ll find plenty of well-pointed criticism in the book. But what struck me as very interesting, and as a new insight in an analytical sense, is that our trends towards globalisation and high interconnectedness, which yield ever bigger entities (bigger corporations, bigger banks, quicker and greater popularity, etc.), are building up the potential for rare events to have higher and higher impacts. E.g. an eccentric pop song can make you much more popular on the Internet these days than TV could have done for you 20 years ago. A small number of highly interconnected banks have become so big these days that they “cannot be allowed to fail”.

We are all Human
Considering how humans essentially function as heuristic and not precise systems (and for good reasons), every human will inevitably commit mistakes and errors at some point and to a lesser or larger degree. Now, admitting we all err once in a while, making a small miscalculation during grocery shopping, buying a family house, budgeting a 100-person company, leading a country of many millions or operating a multi-trillion currency reserve bank has of course vastly different consequences.

What really got me
So the increasing centralisation and growth of giant entities ensures that today’s and future miscalculations are disproportionately amplified. In addition, use of the wrong mathematical tools ensures that miscalculations won’t be small and won’t be rare; their frequency is likely to increase.

Notably, global connectedness alters the conditions for Black Swan creation, increasing both frequency and impact, whether positive or negative. It’s as if our modern society were trying to balance a growing upside-down pyramid of large, ever-increasing items on top of its head. At some point it must collapse and that’s going to hurt, a lot!

Take Away
The third edition of the book closes with essays and commentary that Taleb wrote after the first edition and in response to critics and curious questions. I’m always looking to relate things to practical applications, so I’m glad I got the third edition and can provide my personal highlights to take away from Taleb’s insights:

  1. Avoid predicting rare events
    The frequency of rare events cannot be estimated from empirical observation because of their very rareness (i.e. calculation error margin becomes too big). Thus the probability of high impact rare events cannot be computed with certainty, but because of the high impact it’s not affordable to ignore them.
  2. Limit Gauss distribution modeling
    Application of the Gauss distribution needs to be limited to modelling mediocre realms (where significant events have a high enough frequency and rare events have insignificant impact); it’s unfortunately too broadly abused, especially in economics.
  3. Focus on impact but not probability
    It’s not useful to focus on the probability of rare events since that’s uncertain. It’s useful to focus on the potential impact instead. That can mean to identify hidden risks or to invest small efforts to enable potentially big gains. I.e. always consider the return-on-investment ratio of activities.
  4. Rare events are not alike (atypical)
    Since the probability and accurate impact of remote events are not computable, reliance on rare impacts of a specific size or around specific times is doomed to fail you. Consequently, beware of others making related predictions and/or others relying on them.
  5. Strive for variety in your endeavors
    Avoiding overspecialization, learning to love redundancy as well as broadening one’s stakes reduces the effect any single “bad” Black Swan event can have (increases robustness) and variety might enable some positive Black Swan events as well.

What’s next?
The Black Swan idea sets the stage for further investigations, especially investigation of new fields for applicability of the idea. Fortunately, Nassim Taleb continues his research work and has meanwhile published a new book “Antifragile – Things that Gain from Disorder”. It’s already lying next to me while I’m typing and I’m happily looking forward to reading it. 😉

The notion of incomputable, rare but consequential events or “errors” is so ubiquitous that many other fields should benefit from applying “Black Swan” or Antifragile classifications and the corresponding insights. Nassim’s idea to increase decentralization on the state level to combat the escalation of error potentials at centralized institutions has very concrete applications at the software project management level as well. In fact, the Open Source Software community has long benefited from decentralized development models and, through natural organization, has avoided creating the giant pitfalls that occur with top-down waterfall development processes.

Algorithms may be another field where the classifications could be very useful. Most computer algorithm implementations are fragile due to heavy optimization for efficiency. Identifying these can help in making implementations more robust, e.g. by adding checks for inputs and defining sensible fallback behavior in error scenarios. Identifying and developing new algorithms with antifragility in mind should be most interesting, however. Good examples are all sorts of caches (they adapt according to request rates and serve cached bits faster), or the training of pattern recognition components, where the usefulness rises and falls with the variety and size of the input data sets.
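To make the two ideas from the previous paragraph a little more concrete, here is a small sketch. The function names and the choice of a mean computation are mine and purely illustrative: the first function checks its inputs and falls back to a default instead of failing, and the second uses the standard library’s `functools.lru_cache` as a stand-in for a cache that serves repeated requests faster.

```python
import math
from functools import lru_cache

def _is_usable(value):
    """True only for finite numbers; anything else is treated as bad input."""
    try:
        return math.isfinite(value)
    except (TypeError, OverflowError):
        return False

def robust_mean(values, fallback=0.0):
    """Mean of the finite numbers in values, or fallback if none remain."""
    usable = [v for v in values if _is_usable(v)]
    if not usable:
        return fallback
    return sum(usable) / len(usable)

@lru_cache(maxsize=128)
def slow_square(n):
    """Stand-in for an expensive computation; repeated requests for the
    same argument are answered from the cache."""
    return n * n

if __name__ == "__main__":
    print(robust_mean([1.0, 2.0, float("nan"), "oops", None]))
    slow_square(12)
    slow_square(12)
    print(slow_square.cache_info())
```

Whether silently skipping bad inputs is the sensible fallback depends on the application; in many cases logging or reporting the rejected values is the safer design.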

The book “Black Swan” is definitely a highly recommended read. However, make sure you get the third edition, which has lots of very valuable material added at the end, and don’t hesitate to skip a chapter or two if you find the text too involved or sidetracking every once in a while. Taleb himself gives advice in several places in the third edition about sections readers might want to skip over.

Have you read the “Black Swan” also or heard of it? I’d love to hear if you’ve learned from this or think it’s all nonsense. And make sure to let me know if you’ve encountered Black Swans in contexts that Nassim Taleb has not covered!

Mar 31 2011
Multitasking Mind

(Image: Salvatore Vuono)


The self-deceiving assumption of effective human multitasking.


People often tell me they are good at multitasking, i.e. handling multiple things at once and performing well at doing so. However, the human brain can only make a single conscious decision at a time. To understand this, we need to consider that making a conscious decision requires attention, and the very concept of attention means activating the information contexts relevant to an observation or decision while inhibiting other, irrelevant information.

The suppression involved in attention control makes it harder for us to continue with a previously executed task. This is why interruptions such as an incoming call, an SMS or a doorbell affect our workflows so badly. Even just deciding whether to take a call already diverts attention.

Relatedly, processing emails or surfing while talking to someone on the phone results in bad performance on both tasks, because the attention required for each necessarily suppresses resources needed by the other. Now some actions don’t suffer from this competition: we can walk and breathe or balance ourselves fine while paying full attention to a conversation. That’s because we have learned early on in our lives to automate these seemingly mundane tasks, so they don’t require our conscious attention at this point.

Studies [1] [2] have shown time and again that working on a single task in isolation, avoiding frequent context switches, yields vastly better results in a shorter time frame. This can be further optimized by training in concentration techniques, such as breath meditation, autogenic training or muscle relaxation.

Here’s a number of tips that will help to put these findings to practical use:

  1. Let go of the idea of permanent reachability; nothing is so urgent that it cannot wait the extra hour to be handled efficiently.
  2. Make up your own mind about when to process emails, SMS, IM, news, voice messages.
  3. Start growing a habit of processing things in batches, e.g. walk through a list of needed phone calls in succession, compose related replies in batches, first queue and later process multiple pending reviews at once, queue research tasks and walk through them in a separate browsing session, etc.
  4. Enforce non-availability periods where you cannot be interrupted and may concentrate on tasks of your choice for an extended period.
  5. Schedule phone meetings in advance, ensure everyone has an agenda at hand for the meeting to avoid distractions (Don’t Call Me, I Won’t Call You).
  6. Deliberately schedule relaxation phases, e.g. take a 5-minute break off the screen per hour, ideally moving and walking around; rest breaks are needed after 90 minutes at the latest.

Nov 08 2005

Often I have heard “Just trust me on this!” or similar requests during a technical discussion, and even though I would prefer to please the other person, I simply must say “No” to this kind of request, whether in Bugzilla, on mailing lists, on IRC or in face-to-face discussions.
“Trust Me” simply is not a valid technical argument.
Now don’t get me wrong, I don’t want to attack anyone personally, and on the social level, you probably want to behave the other way around more often than not (most girlfriends don’t appreciate “I won’t trust you.”, as far as my experience goes at least), but not when you’re trying to maintain reproducible results, often according to the scientific method. Many programming errors are based on false, untested assumptions, and assumptions can be false for an unbelievable variety of reasons; relying on “Trust Me” is just one of them 😉
Conversely, I certainly would love everyone in the world to trust me on every (technical) fact I present, but over time I had to realize that I can’t even trust myself on every measure, so it would be ridiculous to ask others to do so. Completely valid, long-standing assumptions can be broken by such simple things as a package upgrade, a software fix, a processor variant switch, the hard disk fill level, a timer wraparound or specification changes that I haven’t heard of ;-[]
And that was only warming up the list.
So over the years I have tried to make a habit of basing every programming or design decision I make on a solid basis of facts, each of which I have checked at least once, heard a sound argument for or seen a reasonable proof for (and I usually should be able to dig up the check for it, or the relevant information resource).
This does render “Trust Me!” a completely irrelevant statement in technical arguments though, and with this blog entry I want to apologize for every time I had to respond “No” to such a request and plead for an attempt at understanding why I do this. I don’t mean to offend anyone with it, but to improve the software I’m producing for other people; so to a certain extent, I feel I have the obligation to question the reasoning and facts I’m supposed to operate with.
Thank you for reading this.