If you rely on your website statistics software to generate visitor reports for your management team, there are a few things you should be aware of.
Did you know?
No two web statistics software packages are the same, and they don’t produce the same results. Ideally all website statistics packages would report the same figures, but in practice they differ because of a number of factors:
- How the data is collected
- How the software determines what a visitor is
- Filtering
How website data is collected
There are two commonly used methods of collecting data for reporting:
- Web server log files
- JavaScript-based ‘tagging’
The issue is that each of these methods has its problems.
Web Server Log Files
Website visitor data collection in log files relies on the end user requesting a web page and that web page being delivered by the web server. It sounds logical that every page view would be recorded in the log files. That is, until we factor in proxy servers and the web browser cache.
Large Internet Service Providers (ISPs) and some corporate networks run a proxy server that temporarily saves a copy of a web page and makes it available to the customers within that network. When the first customer from that network requests the page, your web server delivers it and records the transaction in the log file. When a second or subsequent visitor requests the page, they get it from the proxy server and not from your web server, so nothing is likely to be recorded about the event in the web server log files. A similar thing happens with your own web browser. Once you have viewed a page for the first time, depending on the settings of your web browser, later views of that page may be served from the browser cache without contacting the web server at all.
In simple terms, if the web server doesn’t record the event, then this is missing from your statistics reports.
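As a rough illustration of what a log file tool has to work with, the sketch below counts page requests from a few log lines. It assumes the server writes the common Apache/NCSA "combined" log format; the sample lines, paths and the function name `count_page_requests` are made up for the example. Only requests that reached the web server appear in the input, so a page served from a proxy or browser cache never shows up in the count.

```python
import re
from collections import Counter

# Assumption: the server writes the Apache/NCSA "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_page_requests(lines):
    """Count requests per path for responses with status 200 or 304."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match.group("status") in ("200", "304"):
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample data, using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2011:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2011:13:58:02 +0000] "GET /about.html HTTP/1.1" 200 1187 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2011:14:01:12 +0000] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

print(count_page_requests(SAMPLE_LOG))
```

Whether a tool counts the 304 ("not modified") response as a page view is itself a design choice, and is one more way two log file tools can disagree.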
JavaScript-Based ‘Tagging’
The JavaScript method overcomes the log file proxy server problem by having the web page itself, once loaded in the visitor’s browser, trigger the event that records the transaction directly with the web analytics software. Each provider does this differently, and a number of factors can influence what is shown in the reports, such as:
- Visitors who do not have JavaScript enabled
- How cookies are handled by the web reporting application
- Where the script is placed. It is safest at the bottom of the web page code, but there it will record fewer visits than if placed at the top of the page, because some visitors leave a web page before the JavaScript code has finished executing.
Again, as with log file data collection, if an event isn’t recorded then it will be missing from the reports.
Detecting visitors and sessions
Every website analysis application uses rules to determine what counts as a visit or session on the website. Generally they all use a timeout value (usually 30 minutes of inactivity) so that very long sessions are either not recorded or are treated as multiple visits.
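The timeout rule can be sketched in a few lines. This is a minimal illustration rather than how any particular product works: the function name `count_sessions` and the sample timestamps are invented for the example, and it assumes the hits have already been attributed to a single visitor.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count visits for one visitor: a gap longer than `timeout` starts a new visit."""
    sessions = 0
    previous = None
    for current in sorted(timestamps):
        if previous is None or current - previous > timeout:
            sessions += 1
        previous = current
    return sessions


# Hypothetical page views from one visitor on one morning.
visitor_hits = [
    datetime(2011, 10, 10, 9, 0),
    datetime(2011, 10, 10, 9, 10),
    datetime(2011, 10, 10, 9, 50),   # 40 minutes after the previous hit
    datetime(2011, 10, 10, 10, 5),
]

print(count_sessions(visitor_hits))                         # 2
print(count_sessions(visitor_hits, timedelta(minutes=60)))  # 1
print(count_sessions(visitor_hits, timedelta(minutes=5)))   # 4
```

The same four page views produce one, two or four visits depending only on the timeout, which is why two tools with different session rules report different visit totals for identical traffic.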
Visitor detection
Usually a web log file tool will offer a choice of how a visitor should be calculated. The options fall into two commonly used groups:
- IP address based
- Cookie based
Within these groups, each statistics software product has subtle variations in how visitors are calculated. The general industry recommendation is to use a cookie-based tracking method with a visitor’s session set to time out after 30 minutes of inactivity.
The most commonly used visitor detection methods are:
IP address [1] – Usually combined with the web browser’s signature. This is the least accurate method; however, it is commonly the default method used by web server log file analysis tools.
Cookies [2] – This is more accurate; however, visitors do delete cookies or may choose not to accept them. It is commonly considered the best practicable method to detect visitors.
[1] An IP address is a numerical label assigned to each device (PC, server, etc.).
[2] A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is used by an origin website to send state information to a user’s browser and by the browser to return that state information to the origin site.
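To show why the two methods disagree, here is a small sketch that counts unique visitors both ways over the same hits. The data, the field names (`ip`, `agent`, `cookie_id`) and both function names are hypothetical; real tools apply more elaborate rules.

```python
def count_visitors_by_ip(hits):
    """Treat each distinct (IP address, browser signature) pair as one visitor."""
    return len({(hit["ip"], hit["agent"]) for hit in hits})


def count_visitors_by_cookie(hits):
    """Treat each distinct cookie identifier as one visitor."""
    return len({hit["cookie_id"] for hit in hits})


# Hypothetical hits: three people share one proxy IP and browser,
# and a fourth person appears from two different IP addresses.
sample_hits = [
    {"ip": "203.0.113.5", "agent": "Firefox", "cookie_id": "a1"},
    {"ip": "203.0.113.5", "agent": "Firefox", "cookie_id": "b2"},
    {"ip": "203.0.113.5", "agent": "Firefox", "cookie_id": "c3"},
    {"ip": "198.51.100.7", "agent": "Chrome", "cookie_id": "d4"},
    {"ip": "192.0.2.44", "agent": "Chrome", "cookie_id": "d4"},
]

print(count_visitors_by_ip(sample_hits))      # 3
print(count_visitors_by_cookie(sample_hits))  # 4
```

The IP-based count merges the three people behind the shared address into one visitor and splits the roaming visitor into two. The cookie-based count avoids both errors here, but would over-count anyone who deletes their cookies between visits.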
Filtering unwanted visitor statistics
Depending on how a reporting/analytics tool is implemented or designed it may make use of one or more filters to exclude certain information from the reports.
The reason for this is that a large percentage of traffic to a website is likely to be of little interest to a business owner or manager. Search engines, for instance, send automated software programs to index a website on a regular basis, and email harvesters repeatedly crawl sites looking for email addresses. Many automated applications perform functions such as these, and their traffic should be discounted from any reporting. Not all log file based software products filter their results, so some include this traffic in their reports.
Products such as Google Analytics that rely on JavaScript tend not to have automatic filtering; however, one or more filters may have been applied when the system was configured.
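A very simple filter of this kind might look like the sketch below. It assumes automated programs identify themselves in the browser signature (the User-Agent string), which well-behaved search engine crawlers generally do and email harvesters often do not. The marker list, the function names and the "ExampleHarvester" signature are illustrative only.

```python
# Illustrative, deliberately incomplete list of substrings that
# self-identifying crawlers tend to include in their User-Agent.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent):
    """Return True if the User-Agent contains one of the known markers."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def filter_human_hits(hits):
    """Keep only the hits whose User-Agent does not look automated."""
    return [hit for hit in hits if not is_probable_bot(hit["agent"])]


sample_traffic = [
    {"path": "/", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"path": "/", "agent": "Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0"},
    {"path": "/contact", "agent": "ExampleHarvester/1.0"},  # hypothetical harvester
]

for hit in filter_human_hits(sample_traffic):
    print(hit["path"], hit["agent"])
```

The search engine crawler is removed, but the hypothetical harvester slips through because nothing in its signature matches the list. Each product maintains its own rules, so the traffic that survives filtering differs from one tool to the next.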
So are my statistics accurate?
Unfortunately it isn’t simple to decide which is the most accurate method or product, and there is no definitive answer to this question. Even a slight variation in any of the data collection factors can make your reports differ considerably between software products. You will get different answers depending on who you speak to, but generally the industry recommendation is to look at statistical trends rather than actual figures. When presenting figures to your management team, emphasise that website statistical figures cannot be 100% accurate and should only be used as a way of gauging visitor trends.
If you want to measure actual figures, use one tool and don’t change it. If you do need to change tools or use multiple tools, plan a transition between the old and the new numbers.
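The difference between figures and trends can be shown with a little arithmetic. The weekly visit counts below are invented purely for illustration and do not come from any real product; `percent_changes` is a hypothetical helper.

```python
def percent_changes(weekly_visits):
    """Week-over-week change in percent, rounded to one decimal place."""
    return [
        round(100 * (current - previous) / previous, 1)
        for previous, current in zip(weekly_visits, weekly_visits[1:])
    ]


# Made-up weekly visit totals for the same site from two different tools.
tool_a_visits = [1000, 1100, 1210]
tool_b_visits = [800, 880, 968]

print(percent_changes(tool_a_visits))  # [10.0, 10.0]
print(percent_changes(tool_b_visits))  # [10.0, 10.0]
```

In this contrived case the two tools never agree on the absolute number of visits, yet both report the same 10% weekly growth. Real data will not line up this neatly, but the trend is usually far more comparable between tools than the raw totals.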
Here at Bigwave Media we use a combination of log file analysis tools and script-based tools. If you’d like us to advise you on any area of your website management just drop us a line at [email protected].