Click Fraud Detection Concepts March 30, 2006

Well, I’ve finally gotten around to writing my first post, and hopefully, it’s not my last. Quite frankly, detecting click fraud is very simple, yet very difficult at the same time. Mo gave you the background in his last post, and now I’ll flesh out the details, and hopefully get around to publishing the source pretty soon.

Let us consider only the click fraud that comes from hitbots, or automated programs. There are a bunch of factors that can be examined to calculate whether a click is fraudulent or not, and most of these are statistical factors. For instance: whether the click came from a proxy, whether the search term was used n% of the time, the number of clicks from a certain range of IP addresses, etc, etc. We decided to go about things a different way, and that was to examine the question, “What can a human do that a bot can’t?”

Well, we’re not really going to go into that. What we did come up with is that most bots in use at the time weren’t that good, since they didn’t do full emulation: javascript, accurate mouse movement, and click events. Bots that didn’t use javascript were automatically filtered out. That includes the easy-to-automate scripts built on wget, curl, perl, etc. These bots use large proxy lists and fake their user-agents, so analysis of server-side logs won’t really tell you anything unless you do lengthy statistical analysis. But the way we figure it, it’s pretty rare for a browser to be running IE / Mozilla / Safari and not have JS enabled. Once upon a time we actually had stats for this, but I’d still guess it’s less than 1%. That was a long digression.
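To make the filtering idea concrete, here’s a minimal sketch of how that server-side filter might look. The names and the log format are my illustration, not the actual software: each served click gets an ID, the injected javascript reports that ID back via a beacon request, and any click that never produces a beacon came from something that didn’t run JS.

```javascript
// Hypothetical sketch: flag clicks whose javascript beacon never arrived.
// clickIds: IDs of clicks served; beaconIds: IDs reported back by the JS.
function flagNoJsClicks(clickIds, beaconIds) {
  const seen = new Set(beaconIds);
  return clickIds.filter((id) => !seen.has(id));
}

// Click "b" never fired its beacon, so it is flagged as a likely bot.
console.log(flagNoJsClicks(["a", "b", "c"], ["a", "c"])); // → [ 'b' ]
```

A wget or curl script would show up in the flagged list immediately, with no statistical analysis required.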

So that now leaves us with bots that do have JS emulation. This would include Internet Explorer automation, etc. Assuming the person doing the click fraud has a zombie network (yes, we have witnessed this), IP-based detection becomes close to impossible. So, on to how one would detect a zombie click bot (and yes, I just made that term up).

The Algorithm

1. The browser loads page A, with all the links that need to be protected going through the binary; they will be redirected to the correct URL later on.
2. In page A, one calls the binary with an SSI/PHP include. The binary writes some javascript into the page along with a token.
3. Now for an ajax-type step: the javascript makes another call to the server, passing the token back. There’s a time limit on the token making this round trip. We do this to prevent easy automation.
4. If the token is correct, more javascript is sent to the browser, this time to modify the page behaviour. Every time the user clicks on any item, a key written into the javascript is passed back to the server.
5. The binary examines the key; if it is correct, it lets the surfer go on to the URL specified, otherwise, depending on the settings, it redirects them somewhere else.

The crux of this is the javascript, since it would be difficult to brute force. Of course, one could still write a bot to do all the js parsing, but let’s assume for a minute that we make it reasonably annoying. If a human comes to the page, they will not notice the javascript at all, or any change in page behaviour. A javascript-capable bot will pass steps 1, 2, and 3. However, the javascript test can be made arbitrarily difficult. Currently, it examines the events that are fired on a click: a click consists of a mousedown event, a mouseup event, and an onclick event. Most bots don’t bother emulating this, though it can be done. Fortunately for us, the javascript can be made arbitrarily difficult (though a great deal of testing is required), and the algorithm in concept can’t be brute-forced (though its present implementation has several weaknesses).
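The token round trip in steps 2 and 3 can be sketched as a pair of server-side functions. This is my own illustration of the idea, not the binary’s actual code: the function names, the Map-based store, and the 5-second time limit are all assumptions, and timestamps are passed in explicitly just to keep the sketch testable.

```javascript
// Hedged sketch of steps 2-3: hand out a token with the injected
// javascript, then require it back via the ajax call within a time limit.
const TOKEN_TTL_MS = 5000; // assumed time limit for the round trip

function issueToken(store, now) {
  const token = Math.random().toString(36).slice(2); // opaque token
  store.set(token, now); // remember when it was issued
  return token;
}

function validateToken(store, token, now) {
  const issued = store.get(token);
  if (issued === undefined) return false; // unknown or forged token
  store.delete(token); // single use: a replayed token fails
  return now - issued <= TOKEN_TTL_MS; // too slow suggests automation
}
```

A scraper that fetches the page but never runs the javascript never echoes the token back, and one that replays an old token, or takes too long, fails validation.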

And that is the essence of detecting click fraud without having to do statistical analysis, i.e. finding out what a human can do easily that a bot cannot.
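The click-event test described above can be sketched in a few lines. A genuine browser click fires mousedown, then mouseup, then click, in that order (per the DOM event model); the validator shape below is my assumption, checking only that the recorded event types contain that sequence in order.

```javascript
// Sketch of the event-sequence test: require mousedown, mouseup, click
// to appear in order among the event types the javascript recorded.
function looksLikeRealClick(recordedTypes) {
  const expected = ["mousedown", "mouseup", "click"];
  let i = 0;
  for (const t of recordedTypes) {
    if (t === expected[i]) i += 1;
  }
  return i === expected.length;
}

console.log(looksLikeRealClick(["mousedown", "mouseup", "click"])); // true
console.log(looksLikeRealClick(["click"])); // false: bare synthetic click
```

A bot that simply dispatches a click event, or calls a link’s handler directly, fails this check unless it bothers to emulate the full sequence.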

Comments

1. Bjorn Stannek Persson - April 17, 2006

This was a very interesting article, and I’m looking forward to seeing your software in action as well. The simpler way of doing it, not showing the ads more than once or twice for the same IP within a set timeframe or for a specified number of pageviews, suddenly seems so obsolete, but it does defeat the human side of click fraud. How does your software tackle that side of it?

2. TJ - April 18, 2006

It doesn’t handle any of the statistical analysis, so if there is a person in India clicking on links, it doesn’t take care of that. The software deals only with bots; the human side requires analyzing a lot of stats.