Nov 132013
 

Reblogged from the Lanedo GmbH blog:

Tobin Daily Visits Screenshot

Tobin Daily Visits Screenshot

During recent weeks, I’ve started to create a new tool “Tobin” to generate website statistics for a number of sites I’m administrating or helping with. I’ve used programs like Webalizer, Visitors, Google Analytics and others for a long time, but there’re some correlations and relationships hidden in web server log files that are hard or close to impossible to visualize with these tools.

The Javascript injections required by Google Analytics are not always acceptable and fall short of accounting for all traffic and transfer volume. Also some of the log files I’m dealing with are orders of magnitudes larger than available memory, so the processing and aggregation algorithms used need to be memory efficient.

Here is what Tobin currently does:

  1. Input records are read and sorted on disk, inputs are filtered for a specific year.
  2. A 30 minute window is used to determine visits via unique IP-address and UserAgent.
  3. Hits, such as images, CSS files, WordPress resource files, etc are filtered to derive page counts.
  4. Statistics such as per hour accounting and geographical origin are collected.
  5. Top-50 charts and graphs are generated in an HTML report from the collected statistics.

There is lots of room for future improvements, e.g. creation of additional modules for new charts and graphs, possibly accounting across multiple years, use of intermediate files to speed up processing and more. In any case, the current state works well for giant log files and already provides interesting graphs. The code right now is alpha quality, i.e. ready for a technical preview but it might still have some quirks. Feedback is welcome.

The tobin source code and tobin issue tracker are hosted on Github. Python 2.7 and a number of auxiliary modules are required. Building and testing works as follows:

$ git clone https://github.com/tim-janik/tobin.git
$ make # create tobin executable
$ ./tobin -n MySiteName mysite.log
$ x-www-browser logreport/index.html

Leave me a comment if you have issues testing it out and let me know how the report generation works for you. ūüėČ

Sep 022013
 

London Southwark Bridge

Next Friday I’ll be giving a talk on¬†Open Source In Business at the Campus Party Europe conference in the¬†O2 arena, London. The talk is part of the Free Software Track¬†at 14:00 on the Archimedes stage.¬†I’m there the entire week and will be happy to meet up, so feel free to drop me a line in case you happen to be around the venue as well.

The picture above shows the London¬†Southwark Bridge¬†viewed from Millenium Footbridge with the London Tower Bridge in the background. Last week’s weather has been exceptional for sightseeing in London, so there were loads of great chances to visit that vast number of historical places and monuments present in London. My G+ stream has several examples of the pictures taken. I honestly hope the good weather continues and allows everyone to enjoy a great conference!

Update: The slides and video of the talk are now online here: Open Source In Business