Analytics Click Fraud: Extracting Bots and Spiders

There is currently no 100% accurate way to extract all bots and spiders from statistical analysis.

Analytics packages are very good tools for acquiring visitor information.  Vendors in this space include Google Analytics, Omniture's SiteCatalyst, Coremetrics, and WebTrends.

Many analytics providers lead users to believe their products strip out all bots and spiders by default.  Some do attempt to strip this traffic out, but the accuracy is imperfect.  Most of these tools rely on JavaScript for tracking, and the claim is that bots and spiders do not execute JavaScript.  This is not the case for all bots.  The other issue is that some tools offer noscript tracking (Omniture is one example), which lets analysts see traffic from users who have turned JavaScript off.  Because the noscript fallback is a plain image request, any bot that fetches images can trigger it as well.
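As a generic illustration, a page tagged for analytics typically emits something like the following.  The file names and parameters here are hypothetical, not any particular vendor's actual snippet:

<?php
// Hypothetical PHP footer that emits a JavaScript tracking tag plus a
// noscript image fallback. A bot that never executes analytics.js but
// does fetch images will still request track.gif and be counted.
echo '<script type="text/javascript" src="/analytics.js"></script>' . "\n";
echo '<noscript><img src="/track.gif?page=home" width="1" height="1" alt="" /></noscript>' . "\n";
?>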

To get accurate reporting on true "human traffic", you need to know each visitor's IP address and user agent.  In PHP these are available in the $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REMOTE_ADDR'] variables.  You can then strip the IP addresses or user agents of known bots/spiders out of reporting.  Lists of known bots and spiders are available from several sources; here is one: http://www.user-agents.org/index.shtml.
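As a minimal sketch of this approach, the PHP below checks each visitor against a short, hard-coded bot list and only fires the tracking tag for presumed human traffic.  The user-agent fragments, the IP address, and the tag markup are illustrative only; a real deployment would load a maintained list such as the one linked above.

<?php
// Minimal bot-filtering sketch. The lists below are illustrative samples;
// in practice, load a maintained list (e.g. from user-agents.org).
$botAgentFragments = array('googlebot', 'bingbot', 'slurp', 'crawler', 'spider');
$botIps = array('66.249.66.1'); // hypothetical example address

function isKnownBot($agent, $ip, $fragments, $ips)
{
    $agent = strtolower($agent);
    foreach ($fragments as $fragment) {
        if (strpos($agent, $fragment) !== false) {
            return true; // user agent matches a known bot pattern
        }
    }
    return in_array($ip, $ips, true); // or the IP is on the bot list
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$ip    = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

if (!isKnownBot($agent, $ip, $botAgentFragments, $botIps)) {
    // Presumed human: emit the tracking tag (or log the hit server-side).
    echo '<script type="text/javascript" src="/analytics.js"></script>';
}
?>

The same check can also be run after the fact against exported log data, which is often easier than filtering at page-render time.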

This method of cleansing will never be 100% accurate because new bots and spiders pop up every day.  It is possible to clean analytics data with these methods, but it must be understood that analytics tools should, in general, be used for trending rather than for exact counts.

Contact us if you want further info.
