Traffic analysis tools for websites
Data for Breakfast
If you are looking for a replacement for Google Analytics to study website data, you can choose from a few free alternatives. In this article, we look at Piwik, Open Web Analytics, and eAnalytics.
Admins who wanted details of the visitors to their websites in the early years of the Internet had to laboriously read the web server's logs. The first log file analysis applications appeared 20 years ago. Analog [1], Webalizer [2] and AWStats [3], which date from this period, are still occasionally in use (see the "Simple Web Analytics Tools" box).
Simple Web Analytics Tools
Many system administrators are quite happy with the simpler, resource-friendly log evaluations provided by statistics tools.
The oldest open source tools include Analog, developed in 1995, and Webalizer, first released in 1997. Both applications are still regularly updated today. The tools evaluate the logs several times a day, when run by the admin or a cron job. AWStats is also a simple analysis program. It has generated statistics about web page visits since 2000 and is still under active development. The script, implemented entirely in Perl, analyzes the logfiles of web, mail, and FTP servers and produces its reports as HTML pages. Simple bar charts graphically enhance the results.
GoAccess [6] (Figure 1) gives the admin the ability to output and continuously update analyses in real time in a terminal or in a browser. GoAccess can handle virtually any log format used by Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, and others.
In 2005, Google launched Google Analytics (GA) [4], a website analysis service that is hugely popular today. Open source tools such as Piwik [5] picked up on this trend towards graphical web analytics, but moved their focus to the customer's own server.
With the help of web analytics, site operators collect and evaluate data on the surfing habits of their visitors. The access data are of interest not only for commercial reasons; the companies behind the sites also often seek to better understand their customers and their interests. As a rule, the better an operator knows the visitors and their preferences, the better the operator can optimize its offerings to suit the target group.
Good to Know
Site operators are often interested in where the visitors come from, what they are looking for, what items they click on, and how long they remain on the site. It can also be useful to know when they leave the site. Admins want to know what browsers and operating systems visitors to the site use, which files and documents they download and with what bandwidth, and how many visitors subscribe to newsletters or RSS feeds.
Web shop operators are interested in how many visitors add goods to their shopping carts and how many of them then go on to complete the purchase. If a website hosts advertising for third parties, web analysis is essential, because access figures and similar factors determine the prices for advertisers.
Open Access
The market offers many different web analytics tools. They include around 150 commercial, typically proprietary applications aimed at larger corporate websites. There are also some free tools, a number of which are open source. This article looks at Piwik, Open Web Analytics [7], and eAnalytics [8] (Table 1).
Table 1
Three Web Statistics Tools at a Glance
| | Piwik | Open Web Analytics (OWA) | eAnalytics |
|---|---|---|---|
| Platforms | Cross-Platform | Cross-Platform | Debian/Ubuntu |
| License | GPLv3 and others | GPLv2 | AGPLv3 |
| Under development since | 2009 | 2009 | 2011 |
| Language | PHP | PHP | Java and others |
| Methods | JavaScript tags, log analysis, tracking pixels | JavaScript tags, log analysis, tracking pixels | eAnalytics tag, tracking pixels |
| Functions | Visitors (visitors, unique visitors), operating system, browser version, downloads, IP address (pseudonymization capable), geolocation by city, page impressions, referrer, plugins | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address, geolocation by country, page impressions, referrer, heat maps | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address (pseudonymization capable, can be switched off), geolocation by city, page impressions, referrer, plugins |
From a technical point of view, web analytics tools either process the web server's logfiles or evaluate special tags integrated into the HTML pages, giving the admin statistics and graphics for a quick overview and access to all necessary key indicators. Whereas the server-based method analyzes the logfiles of the web server, developers using the client-based variant add tracking pixels to the source code of the web page to determine the key indicators.
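To illustrate the server-based method, the following minimal sketch counts page impressions, distinct client IP addresses, and referrers from a handful of log lines. It assumes the web server writes the common "combined" log format; the `summarize` function and the `SAMPLE` lines are invented for this illustration and are not taken from any of the tools discussed here.

```python
import re
from collections import Counter

# Assumption: the server writes the Apache/Nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count successful page requests, distinct client IPs, and referrers."""
    pages = Counter()
    referrers = Counter()
    clients = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["status"] != "200":
            continue  # only count successfully delivered pages
        pages[match["path"]] += 1
        clients.add(match["ip"])
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return {
        "impressions": sum(pages.values()),
        "unique_ips": len(clients),
        "pages": pages,
        "referrers": referrers,
    }


# Hypothetical sample data using documentation-only IP ranges.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2016:13:55:36 +0200] "GET /index.html HTTP/1.1" '
    '200 2326 "http://example.org/start" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2016:13:56:01 +0200] "GET /about.html HTTP/1.1" '
    '200 1200 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2016:14:02:11 +0200] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "curl/7.50"',
    '198.51.100.7 - - [10/Oct/2016:14:02:12 +0200] "GET /missing HTTP/1.1" '
    '404 310 "-" "curl/7.50"',
]

if __name__ == "__main__":
    report = summarize(SAMPLE)
    print(report["impressions"], report["unique_ips"])
    print(report["pages"].most_common(3))
```

Counting distinct IP addresses is only a rough stand-in for unique visitors, because several people can share one address and one person can appear under several.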
Although neither of the two methods fully represents the actual traffic of a website, the client-based system of counting pixels, combined with the controversial use of cookies, is currently slightly ahead in terms of accuracy.
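The client-based method can be sketched in a similar way. When the browser requests an invisible 1x1 image, the server records the details passed along with the request and returns the image. The function below is a simplified assumption of how such an endpoint might work; the `/pixel.gif` path, the `page` and `ref` query parameters, and the `handle_pixel_request` function are hypothetical and do not reflect the actual interfaces of Piwik, Open Web Analytics, or eAnalytics.

```python
import base64
from urllib.parse import urlparse, parse_qs

# A 1x1 transparent GIF, the classic "counting pixel".
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(url, headers, hits):
    """Record one hit from a pixel request and return the response parts.

    `url` is the requested path including the query string, `headers` is a
    dict of request headers, and `hits` is a list that collects the records.
    """
    query = parse_qs(urlparse(url).query)
    hits.append({
        # Hypothetical parameter names chosen for this sketch.
        "page": query.get("page", [""])[0],
        "referrer": query.get("ref", [""])[0],
        "user_agent": headers.get("User-Agent", ""),
    })
    response_headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL_GIF)),
        # Ask the browser not to cache, so repeat views are counted as well.
        "Cache-Control": "no-store",
    }
    return 200, response_headers, PIXEL_GIF


if __name__ == "__main__":
    recorded = []
    status, _, body = handle_pixel_request(
        "/pixel.gif?page=%2Fshop%2Fcart&ref=https%3A%2F%2Fexample.org%2F",
        {"User-Agent": "Mozilla/5.0"},
        recorded,
    )
    print(status, len(body), recorded)
```

In a page, such a pixel would typically be embedded as an image tag or requested by a JavaScript tag that fills in the query parameters.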
Privacy Issues
Because they evaluate cookies and store the visitors' IP addresses, web analytics tools always face a difficult legal situation. For example, Germany's Telemedia Act (TMG) [9] allows you to create user profiles for the purposes of advertising and market research, provided the user does not object. Such a profile is only allowed to contain an anonymized IP address in addition to the data on the use of the website. To this end, IP addresses are typically truncated automatically.
The TMG also requires the service provider to inform the user in a privacy statement on the website of whether, to what extent, and for what purpose it processes the IP address. In addition, the TMG stipulates that users must have an option to object to the creation of user profiles.
Probably the most controversial and at the same time most successful tool for website traffic analysis is the Google Analytics online service, which was launched in 2005 and which 50 percent of all websites employ. It is clearly the top dog. In contrast to the applications covered in this article, the data collected by GA leaves users' computers and heads to the United States, where data protection provisions are not as stringent as in Germany and the rest of Europe.
For example, GA delivers the unabridged IP address to the parent company. Also, website visitors may not necessarily be informed of the fact that Google is collecting their data. Browser add-ons like Ghostery or NoScript can disable GA [10] to provide protection against unwanted data collection. GA doesn't cost anything up to a traffic volume of 10 million hits a month, but it only delivers certain data following a 24-hour delay. Also, the user has to agree to Google using the data for its own purposes.
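The truncation mentioned above can be sketched with Python's standard `ipaddress` module. The prefix lengths used here (keeping 24 bits of an IPv4 address and 48 bits of an IPv6 address) are assumptions chosen for illustration, and the `truncate_ip` function is hypothetical; whether a given prefix length satisfies the legal requirements is a question for legal counsel, not for this sketch.

```python
import ipaddress


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an IP address before it is stored.

    The default prefix lengths are illustrative assumptions only.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network() mask off the host bits for us.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("203.0.113.77"))           # 203.0.113.0
    print(truncate_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

Truncating the address before writing it to the database or logfile means the full address never ends up in stored data.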