Setting Up GoAccess To Analyse Apache Logs

On a Debian machine serving websites with Apache, we install and configure GoAccess to monitor website activity.

UPDATE: This tutorial is outdated, please refer to the new tutorial: https://arnaudr.io/2020/08/10/goaccess-14-a-detailed-tutorial/

Introduction

Time to monitor my VPS a little and see if someone's reading my blogs!

I started self-hosting my blogs almost a year ago, and today I still have no idea if someone is actually reading them! Of course I'm a bit curious, and I wish I knew. But I never found an easy way to do that.

I never found one until recently, when I met a fellow blogger who wrote some articles on the matter. Among the different solutions, I settled on the easiest one: GoAccess.

GoAccess works by analysing the HTTP server logs, building some statistics, and spitting the result out in a nice HTML page. It's a zero-hassle solution. Exactly what I was looking for!

The original article is available here, and it's more complete than mine. Go read it!
http://freedif.org/goaccess-bandwidth-statistics-by-folderurl-with-virtualhost/

Install

We will install GoAccess from the GoAccess Debian repository, so that we enjoy the latest cool features.

The various ways to install GoAccess are documented on the official webpage:
https://goaccess.io/download

The commands below need root privileges.

$ echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/goaccess.list
$ wget -O - https://deb.goaccess.io/gnugpg.key | apt-key add -
$ apt-get update

If you want to install GoAccess without Tokyo Cabinet storage support, type:

$ apt-get install goaccess

With Tokyo Cabinet storage support:

$ apt-get install goaccess-tcb

Either one gives you this output:

$ goaccess --version
GoAccess - 1.0.2.
For more details visit: http://goaccess.io
Copyright (C) 2009-2016 by Gerardo Orellana

Configure websites for monitoring

GoAccess works by scanning the logs of your HTTP server.

All we have to do is to ensure the log configuration of the HTTP server is OK. I assume you host a few websites with Apache, and each one of them is a virtual host.

In such a situation, what we want is to have one logfile per virtual host. This is easily achieved with the CustomLog directive.

$ vi /etc/apache2/sites-enabled/website1.conf
<VirtualHost *:80>
    ...
    ErrorLog  ${APACHE_LOG_DIR}/website1_vhosts_error.log
    CustomLog ${APACHE_LOG_DIR}/website1_vhosts_access.log combined
</VirtualHost>

That's all there is to do, apart from reloading Apache so that it picks up the change!
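If you host several sites, it's easy to forget one. Here is a small sketch that lists the CustomLog target of every virtual host file, so you can check that each site really has its own logfile. The function name list_access_logs is a hypothetical helper made up for this example, and the demo runs on a throwaway directory rather than on /etc/apache2/sites-enabled.

```shell
#!/bin/sh
# Print "<vhost file>: <access log path>" for every CustomLog directive found.
list_access_logs() {
    dir=$1
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        awk -v name="$(basename "$conf")" \
            '$1 == "CustomLog" { print name ": " $2 }' "$conf"
    done
}

# On the server itself you would run:
#   list_access_logs /etc/apache2/sites-enabled

# Demo on a throwaway directory so the sketch runs anywhere.
demo=$(mktemp -d)
cat > "$demo/website1.conf" <<'EOF'
<VirtualHost *:80>
    ErrorLog  ${APACHE_LOG_DIR}/website1_vhosts_error.log
    CustomLog ${APACHE_LOG_DIR}/website1_vhosts_access.log combined
</VirtualHost>
EOF
list_access_logs "$demo"
rm -r "$demo"
```

A virtual host file that prints nothing has no CustomLog of its own, and two files printing the same path are sharing a logfile.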

Configure GoAccess

Let's have a look at the configuration file.

$ vi /etc/goaccess.conf

To match the Apache configuration mentioned above, let's uncomment the following lines:

time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"

That's it! GoAccess is ready to parse Apache logs now.
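To see what that log-format line is matching, here is a rough sketch that pulls the same fields out of one log line with awk. The sample line is made up for illustration, and show_fields is a hypothetical helper, not part of GoAccess; the specifiers printed on the left are the ones from the configuration above.

```shell
#!/bin/sh
# A made-up line in Apache "combined" format, for illustration only.
sample='203.0.113.7 - - [10/Oct/2016:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "http://example.org/" "Mozilla/5.0"'

# Splitting on double quotes gives: prefix, request, status and size,
# referrer, a lone space, and the user agent.
show_fields() {
    printf '%s\n' "$1" | awk -F'"' '{
        split($1, head, " ")          # head[1] = host, head[4] = "[date:time"
        split($3, tail, " ")          # tail[1] = status, tail[2] = size
        stamp = substr(head[4], 2)    # drop the leading "["
        print "%h  host:     " head[1]
        print "%d  date:     " substr(stamp, 1, 11)
        print "%t  time:     " substr(stamp, 13)
        print "%r  request:  " $2
        print "%s  status:   " tail[1]
        print "%b  size:     " tail[2]
        print "%R  referrer: " $4
        print "%u  agent:    " $6
    }'
}

show_fields "$sample"
```

The date comes out as 10/Oct/2016 and the time as 13:55:36, which is exactly the shape that date-format and time-format describe.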

Run GoAccess

Nothing easier!

goaccess -f /var/log/apache2/website1_vhosts_access.log -o /tmp/goaccess.html

With this command, GoAccess generates a static HTML page containing all the statistics, nicely ordered and displayed. Just open this page in your web browser and enjoy.

Run GoAccess with persistent logs

You will find out quickly that just parsing a log file is limited, because log files are rotated regularly (on Debian, logrotate takes care of that). Soon enough, a log file is moved and replaced by a fresh, empty file. As a result, GoAccess will only be able to display statistics over a short period, maybe one day.
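To check how much history the current logfile really holds, you can print the timestamps of its first and last entries. This is only a sketch: log_span is a hypothetical helper, it assumes the combined format used above, and the demo runs on two made-up lines.

```shell
#!/bin/sh
# Print the timestamps of the first and last requests found in an access log.
# Assumes the combined format, where field 4 looks like "[10/Oct/2016:13:55:36".
log_span() {
    awk 'NR == 1 { first = substr($4, 2) }
         { last = substr($4, 2) }
         END { if (NR > 0) print first " -> " last }' "$1"
}

# On the server itself you would run:
#   log_span /var/log/apache2/website1_vhosts_access.log

# Demo on two made-up lines so the sketch runs anywhere.
demo=$(mktemp)
cat > "$demo" <<'EOF'
203.0.113.7 - - [10/Oct/2016:06:25:04 +0200] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.23 - - [10/Oct/2016:21:48:17 +0200] "GET /about.html HTTP/1.1" 200 1184 "-" "Mozilla/5.0"
EOF
log_span "$demo"
rm "$demo"
```

Whatever falls outside that span has already been rotated away, and a report built from this file alone will not include it.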

To overcome this limitation, GoAccess can keep its own database if it's compiled with Tokyo Cabinet storage support (I'm sure you wondered what the hell that Tokyo thing was...). With that, GoAccess gains a memory of its own!

Then you just need to use it with a different set of options (see PROCESSING LOGS INCREMENTALLY in the man page for additional information).

The right command is then:

goaccess --load-from-disk --keep-db-files       \
    -f /var/log/apache2/website1_vhosts_access.log  \
    -o /tmp/goaccess.html

Choose where to store this output

OK, now there are two things we still need to do. Assuming you run GoAccess on a remote server, you want the result to be accessible from outside. You have plenty of ways to do that.

For example, you could put the report at the root of your website named website1, under the name goaccess.html, and then you could get your stats easily by connecting to the address http://www.website1.com/goaccess.html or similar. You get the idea?

Another way is to have a dedicated directory where you put all your statistics. That's the solution I use. My Apache server serves a default page located in /srv/www/default when someone connects to my bare domain name. So I just added a sub-directory named goaccess, and that's where I put the result.

You can then see the stats for this blog at the following address:
https://goaccess.arnaudr.io/

Automate the process to keep the stats updated

Now the only thing left to do is to automate the process, and to have GoAccess generate a new report on a regular basis. We can schedule that hourly with cron.

crontab -e

Then add a line like this one (change the paths according to your setup, of course).

@hourly goaccess --load-from-disk --keep-db-files -f /var/log/apache2/website1_vhosts_access.log -o /srv/www/default/goaccess/website1.html
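One possible refinement, if you don't want visitors to ever load a half-written page: generate the report under a temporary name and rename it once GoAccess is done. The sketch below is an assumption on my part rather than anything GoAccess requires. update_report and the GOACCESS variable are hypothetical names, and the temporary file keeps its .html suffix on the assumption that GoAccess picks the output format from the file extension.

```shell
#!/bin/sh
# Regenerate one report, then swap it into place in a single step.
# GOACCESS can be overridden to point at another binary or a test stub.
GOACCESS=${GOACCESS:-goaccess}

update_report() {
    log=$1
    out=$2
    tmp="${out%.html}.new.html"   # same directory, so the final mv is a plain rename
    if $GOACCESS --load-from-disk --keep-db-files -f "$log" -o "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"              # leave the previous report untouched on failure
        return 1
    fi
}

# Only act when called with a logfile and an output path.
if [ "$#" -eq 2 ]; then
    update_report "$1" "$2"
fi
```

Saved as, say, /usr/local/bin/update-report (a made-up path) and made executable, it would be called from cron like this:

@hourly /usr/local/bin/update-report /var/log/apache2/website1_vhosts_access.log /srv/www/default/goaccess/website1.html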

Done!