Nov 23, 2012
 

For a while now, I’ve been maintaining my todo lists as backlogs in a MediaWiki installation. I regularly derive sprints from these backlogs for my current task lists. This means identifying important or urgent items that can be addressed next; for really large backlogs this can be quite tedious.

A SpecialPage extension that I’ve recently implemented now helps me through the process. Using it, I automatically get a filtered list of all “IMPORTANT:”, “URGENT:” or otherwise classified list items. The special page can be used on its own or transcluded from another wiki page like a template. The extension page at mediawiki.org has more details.

The MediaWiki extension page is here: http://www.mediawiki.org/wiki/Extension:ListItemFilter

The GitHub page for downloads is here: https://github.com/tim-janik/ListItemFilter
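
To illustrate the kind of filtering the special page performs, here is a minimal Python sketch; the marker keywords and the wikitext handling below are simplified assumptions of mine, while the actual extension is implemented as a MediaWiki SpecialPage in PHP.

# Hypothetical sketch of filtering classified wiki list items;
# this is not the ListItemFilter code, just the underlying idea.
import re

MARKERS = ("IMPORTANT:", "URGENT:")     # assumed example classifications

def filter_list_items(wikitext, markers=MARKERS):
    """Return all wiki list items ('*' or '#' lines) that carry one of the markers."""
    matches = []
    for line in wikitext.splitlines():
        line = line.strip()
        if re.match(r"[*#]+", line) and any(m in line for m in markers):
            matches.append(line)
    return matches

backlog = """
* IMPORTANT: prepare the next release notes
* refactor the build scripts
# URGENT: fix the login regression
# tidy up the wiki sidebar
"""
print("\n".join(filter_list_items(backlog)))
# prints:
# * IMPORTANT: prepare the next release notes
# # URGENT: fix the login regression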

May 13, 2011
 
Wiki↠HTML↠Man

 

What’s this?
Wikihtml2man is an easy-to-use converter that parses HTML sources, normally originating from a MediaWiki page, and generates Unix manual page sources from them (it is also referred to as an html2man or wiki2man converter). It allows developing project documentation online, e.g. by collaborating in a wiki. It is released as free software under the GNU GPLv3. Technical details are given in its manual page: Wikihtml2man.1.

Why move documentation online?
Google turns up a few alternative implementations, but none seem to be designed as a general-purpose tool. With the ubiquitous presence of wikis on the web these days and the ease of content authoring they provide, we’ve decided to move manual page authoring online for the Beast project. Using MediaWiki, manual pages turn out to be very easy to create in a wiki; all that’s then needed is a backend tool that can generate manual page sources from a wiki page. Wikihtml2man provides this functionality based on the HTML generated from wiki pages: it can convert a prerendered HTML file or download the wiki page from a specific URL. HTML has been chosen as the input format to support arbitrary wiki features like page inclusion or macro expansion, and to potentially allow page generation from wikis other than MediaWiki. Since wikihtml2man is based purely on HTML input, it is of course also possible to write the manual page in raw HTML, using tags such as h1, strong, dt, dd, li, etc., but that’s really much less convenient than using a regular wiki engine.
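
To give a rough idea of the mapping involved (this is a simplified sketch of mine, not the actual wikihtml2man code), a few HTML elements translate fairly directly into man page (roff) macros:

#!/usr/bin/env python3
# Simplified sketch of the HTML -> roff mapping idea; the real wikihtml2man
# handles many more elements, nesting and escaping than shown here.
from html.parser import HTMLParser

class TinyHtml2Man(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
    def handle_starttag(self, tag, attrs):
        if tag == 'h1':
            self.out.append('\n.SH ')             # h1 becomes a section heading
        elif tag == 'strong':
            self.out.append(r'\fB')               # switch to bold font
        elif tag == 'li':
            self.out.append('\n.IP \\(bu 2\n')    # bulleted list item
    def handle_endtag(self, tag):
        if tag == 'strong':
            self.out.append(r'\fR')               # switch back to the regular font
    def handle_data(self, data):
        self.out.append(data)

parser = TinyHtml2Man()
parser.feed('<h1>NAME</h1> <strong>tool</strong> - do something useful '
            '<ul><li>first item</li><li>second item</li></ul>')
print(''.join(parser.out))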

What are the benefits?
For Beast, the benefits of moving some project documentation into an online wiki are:

  • We increase editability by lifting review requirements.
  • We get quicker edit/view turnarounds, e.g. through the page preview functionality of wikis.
  • We can assimilate documentation contributions from non-programmers.
  • Easier editing may lead to richer and possibly better or more frequently updated documentation.
  • Other projects also seem to make good progress by opening up parts of their development to online web interfaces, like Pootle translations, Transifex translations or PHP.net User Notes.

What are the downsides?
We have only recently moved our pages online and still need to gather some experience with the process. So far, the possible downsides we see are:

  • Sources and documentation can more easily get out of sync if they don’t reside in the same tree. We hope to mitigate this by updating the documentation more frequently.
  • Confusion about revision synchronization, with the source code using a different versioning system than the online wiki. We are currently pondering automated re-integration into the tree to counteract this problem.

How to use it?
Here’s wikihtml2man in action, converting its own manual page and rendering it through man(1):

  wikihtml2man.py http://testbit.eu/Wikihtml2man.1?action=render | man -l -

Where to get it?
Release tarballs shipping wikihtml2man are kept here: http://dist.testbit.eu/testbit-tools/.
Our Tools page contains more details about the release tarballs.

Have feedback or questions?
If you can put wikihtml2man to good use, have problems running it, or have other ideas about it, feel free to drop me a line. Alternatively, you can add your feedback and feature requests to the Feature Requests page (a forum will be created if there’s any actual demand).

What’s related?
We would also like to hear from other people involved in projects that are using or considering wikis to build production documentation online (e.g. in a manner similar to Wikipedia). If you are doing something similar, please leave a comment and tell us about it.

See Also

  1. New Beast Website – using html2wiki
  2. The Beast Documentation Quest – looking into documentation choices

Feb 09, 2011
 
 

MediaWiki is a pretty fast piece of software out of the box. It’s written in PHP and covers a lot of features, so it can’t serve pages in zero time, but it’s reasonably well written and allows the use of PHP accelerators or caches in most cases. Since it’s primarily developed for Wikipedia, it’s optimized for high-performance deployments; caching support is available for Squid, Varnish and plain files.

For small-scale use cases like private or intranet hosts, running MediaWiki uncached works fine. But once it’s exposed to the Internet, regularly crawled and potentially linked from other popular sites, serving only a handful of pages per second quickly becomes insufficient. A very simple but effective measure in this scenario is enabling Apache’s mod_disk_cache.

Here’s a sample benchmark for the unoptimized case:

$ ab -kt3 http://testbit.eu/Sandbox
Time taken for tests:   3.33173 seconds
Total transferred:      301743 bytes
Requests per second:    6.26 [#/sec] (mean)
Time per request:       159.641 [ms] (mean)
Transfer rate:          96.93 [Kbytes/sec] received

Now we configure mod_disk_cache in apache2.conf:

CacheEnable   disk /
CacheRoot     /var/cache/apache2/mod_disk_cache/

And enable it in Apache:

$ a2enmod disk_cache
Enabling module disk_cache.
Run '/etc/init.d/apache2 restart' to activate new configuration!

This in itself is not enough to enable caching of MediaWiki pages, however; the reason lies in some of the HTTP headers it sends:

$ wget -S --delete-after -nd http://testbit.eu/Sandbox
--2011-02-09 00:48:21--  http://testbit.eu/Sandbox
  HTTP/1.1 200 OK
  Date: Tue, 08 Feb 2011 23:48:21 GMT
  Vary: Accept-Encoding,Cookie
  Expires: Thu, 01 Jan 1970 00:00:00 GMT
  Cache-Control: private, must-revalidate, max-age=0
  Last-Modified: Tue, 08 Feb 2011 03:24:32 GMT
2011-02-09 00:48:21 (145 KB/s) - `Sandbox' saved [14984/14984]

The Expires: and Cache-Control: headers both prevent mod_disk_cache from caching the contents.

A small patch against MediaWiki-1.16 fixes that by removing Expires: and adding s-maxage to Cache-Control:, which allows caches to serve “stale” page versions that are only mildly outdated (a few seconds).
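
To illustrate why the original headers defeat the cache while the patched ones do not, here is a small Python sketch of how a shared cache such as mod_disk_cache interprets the Cache-Control directives; it is my own simplified model of the relevant HTTP caching rules, not code from Apache or MediaWiki, and it ignores the Expires header entirely.

# Simplified model of shared-cache freshness for the Cache-Control values shown
# below; real caches implement considerably more of the HTTP caching rules.

def shared_cache_ttl(cache_control):
    """Return the number of seconds a shared cache may reuse the response."""
    directives = {}
    for part in cache_control.split(','):
        name, _, value = part.strip().partition('=')
        directives[name] = value
    if 'private' in directives or 'no-store' in directives:
        return 0                             # shared caches must not store the response
    if 's-maxage' in directives:
        return int(directives['s-maxage'])   # s-maxage overrides max-age for shared caches
    if 'max-age' in directives:
        return int(directives['max-age'])
    return 0

# Unpatched MediaWiki: 'private' plus max-age=0 rules out any shared caching.
print(shared_cache_ttl('private, must-revalidate, max-age=0'))    # -> 0
# Patched MediaWiki: shared caches may serve the page for up to 3 seconds.
print(shared_cache_ttl('s-maxage=3, must-revalidate, max-age=0')) # -> 3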

mw-disk-cache-201101.diff

With the patch, the headers changed as follows:

$ wget -S --delete-after -nd http://testbit.eu/Sandbox
--01:03:03--  http://testbit.eu/Sandbox
 HTTP/1.1 200 OK
 Date: Wed, 09 Feb 2011 00:03:03 GMT
 Vary: Accept-Encoding,Cookie
 Cache-Control: s-maxage=3, must-revalidate, max-age=0
 Last-Modified: Tue, 08 Feb 2011 03:24:32 GMT
01:03:03 (386.21 MB/s) - `Sandbox' saved [14984/14984]

Upon inspection, there’s no Expires: header anymore and Cache-Control: has been adapted as described. Let’s now rerun the benchmark:

$ ab -kt3 http://testbit.eu/Sandbox
Time taken for tests:   3.5511 seconds
Total transferred:      38621189 bytes
Requests per second:    831.14 [#/sec] (mean)
Time per request:       1.203 [ms] (mean)
Transfer rate:          12548.95 [Kbytes/sec] received

That looks good: 831 requests per second instead of 6!

Utilizing mod_disk_cache with MediaWiki can easily increase the number of possible requests per second by more than a factor of one hundred for anonymous accesses. The caching behavior of the above patch can also be enabled for logged-in users by adding this setting to MediaWiki’s LocalSettings.php:

$wgCacheLoggedInUsers = true;

I hope this helps people out there speed up their MediaWiki installations as well. Happy tuning! 😉

Feb 05, 2011
 
Last week the Beast project went live with a new website that has been in the making since December:


beast.testbit.eu
The old website was several years old; adding or changing content was very cumbersome and bottlenecked on Stefan or me. All edits had to go into the source code repository, adding content pages meant editing web/Makefile.am, and changing a menu entry required the entire site to be rebuilt and re-synced. Also, beast.gtk.org went offline for several weeks due to hosting problems at UC Berkeley.
So in the last few weeks the Beast website has been gradually moved from beast.gtk.org to beast.testbit.eu and to a different hosting service that has more resources available. Over the last few years, I’ve gained experience with Plone, Drupal, DokuWiki, Confluence, a Beast-specific markup parser, Joomla, WordPress, etc. They all have their upsides and downsides, and while I prefer WordPress for my own blog, I’ve settled on MediaWiki for the new Beast website.
Running the new site entirely as a wiki makes the contents easily accessible to everyone willing to contribute, and MediaWiki’s markup is what most people already know or are likely to learn in the future. MediaWiki must be the most thoroughly tested collaborative editing tool available; it turns out to be impressively feature-rich compared to other wiki engines, has a rich set of extensions and scripting facilities, and, thanks to Wikipedia’s weight, a reliable maintenance future.
Much of the previously hand-crafted code used for site generation and operation becomes obsolete with the migration, like the screenshot gallery PHP snippets. The entire build-from-source process can be eliminated, and running a dedicated Beast theme on MediaWiki allows editing the menu structure in a wiki page.
Also, MediaWiki allows running multiple front ends under different domains and with different themes on the same wiki database, which allowed me to merge the Beast site and testbit.eu to reduce maintenance.
A small set of patches and extensions was used to tune MediaWiki to the site’s needs:
  • Enhancing the built-in search logic so that it automatically falls back to partial matches when a search yields no results (see the sketch after this list).
  • Adjusting Expires/Cache-Control headers to utilize mod_disk_cache – this increases the number of possible requests per second by more than a factor of one hundred.
  • Adding support for [[local:/path/to/download/area]] URLs, to refer to downloadable files from within wiki pages.
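
To illustrate the search fallback mentioned in the first point, here is a small Python sketch of the idea; the actual change is a patch to MediaWiki’s PHP search code, and the page titles below are just made-up examples.

# Hypothetical sketch of "fall back to partial matches when nothing is found";
# the real logic lives inside MediaWiki's PHP search code.

PAGES = ["Wikihtml2man.1", "Beast Documentation Quest", "Sandbox"]  # example titles

def exact_search(query):
    """Exact title matches only."""
    return [p for p in PAGES if p.lower() == query.lower()]

def partial_search(query):
    """Substring matches as a weaker fallback."""
    return [p for p in PAGES if query.lower() in p.lower()]

def search(query):
    """Try exact matches first; resort to partial matches for empty result lists."""
    return exact_search(query) or partial_search(query)

print(search("Sandbox"))        # exact hit -> ['Sandbox']
print(search("Documentation"))  # no exact hit -> ['Beast Documentation Quest']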
It took a while to migrate the contents into MediaWiki format, as some files had to be converted from a very old ErfurtWiki installation, some came from the source code repository, and some were available only in HTML. Big kudos to David Iberri; his online html2wiki converter (html2wiki on CPAN) has been a huge help in the process.
I hope the new site is well received, have fun with it!