2010 Link-Referral log SPAM
Link/Referrer SPAM: This seems to have started in early 2010 (or at least that is when I first noticed it). When reviewing server logs I was seeing strange referrers pointing at my site: the referring domains had no connection to the site content and made little or no sense.
- The requests fetched only a single URL from my site (indicating non-browser behaviour) and claimed to come from strange domains; when I investigated I found no connection between those domains and my site or its content.
- After several weeks I found patterns in the IP addresses as well as domains being pushed.
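The pattern described above (one URL fetched, with a referrer from an unrelated domain) can be pulled out of an access log with a few lines of Python. This is only a rough sketch: it assumes the Apache "combined" log format, and the function name, sample log lines and host names are made up for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumes the Apache "combined" log format; adjust if your LogFormat differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

def suspicious_hits(lines, own_host):
    """Return {ip: [referrer hosts]} for clients that fetched exactly one
    URL and sent a referrer pointing at some other site."""
    paths = defaultdict(set)
    referers = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        paths[m["ip"]].add(m["path"])
        host = urlparse(m["referer"]).hostname  # None for "-" or empty values
        if host and host != own_host:
            referers[m["ip"]].add(host)
    return {ip: sorted(referers[ip])
            for ip in paths
            if len(paths[ip]) == 1 and referers[ip]}

# Made-up sample: one single-hit client with an unrelated referrer,
# one ordinary browser that also fetched a stylesheet.
SAMPLE = [
    '203.0.113.7 - - [10/Mar/2010:10:00:00 +0000] "GET /post.html HTTP/1.1" '
    '200 512 "http://pills.example/buy" "Mozilla/4.0"',
    '198.51.100.4 - - [10/Mar/2010:10:00:05 +0000] "GET /post.html HTTP/1.1" '
    '200 512 "http://www.example.org/" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Mar/2010:10:00:06 +0000] "GET /style.css HTTP/1.1" '
    '200 90 "http://www.example.org/post.html" "Mozilla/5.0"',
]

print(suspicious_hits(SAMPLE, "www.example.org"))
# prints {'203.0.113.7': ['pills.example']}
```

A single hit is only a hint, not proof; a real visitor who follows a link and leaves immediately looks the same in the log, so treat the output as a list of candidates to review.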
After a bit of research it seems that Web SPAMMERs started using this technique several years ago. It grew even more when blogging software (e.g. Wordpress, Drupal) became popular AND when bloggers (and other web masters) decided that it was a good idea to publish their web server logs, which contained the web addresses of the SPAMMERs’ customers. At this point I am guessing that the SPAMMERs are simply hitting every web site possible in the hope that the numbers game will yield results that pay the bills.
Why are companies/people doing this?
The motivation is simple:
- some search engines include link statistics as a ranking factor - the more links you have to your site the higher your ranking might be
- blog software usually includes an auto-referral component which shows who is linking to your blog (hopefully you are managing this and selectively allowing/showing such links)
- any web site that displays web server logs is most likely contributing to (encouraging) this problem (the SPAMMERs hit your site, you share the information and the Spiders pick up the junk information)
- combine the above and companies emerged/evolved to provide marketing referral services (you know, those emails you get where you are promised that your page will be in the #1 slot on search results…)
My solution to the problem is to block both the domains being pushed and the IP space being used to push the SPAM. Over a relatively short period of time the major players are easy to spot (they use the same or related IP space). No, I won’t mention any domains using these services, and I also will not mention the IP space being used.
A mod_security solution:
- create a blacklist of IP addresses
- create a blacklist of domains using these services
- create rules to block access for the blacklisted IP addresses and domains
Example Mod_security rules
######## Block 'bad' referrers
SecAction "phase:1,pass,nolog,setvar:tx.REFERER='/%{REQUEST_HEADERS.Referer}/'"
# check referrer var
SecRule TX:REFERER "@pmFromFile blacklist.refer.txt" "phase:1,t:lowercase,log,deny,msg:'Blacklist-Refer'"
########
The SecAction line above stores the referrer sent with the current request in a variable. The SecRule line checks the referrer against the patterns in the file ‘blacklist.refer.txt’. If a match is found then access is denied. The ‘blacklist.refer.txt’ file contains one pattern per line; it can contain as many entries as needed, and yes, a large file may impact server performance. The good news: once you install and verify the rule you only need to add entries to the blacklist and restart the web server to put your new patterns into use. Care and testing are advised since the patterns are matched anywhere in the referrer, so you could easily end up blocking non-spammers…
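One way to do the testing suggested above is to run a candidate blacklist against referrers you know to be legitimate before installing it. The sketch below does not call mod_security; it only approximates the rule as a case-insensitive substring search. The function name, patterns and URLs are invented for illustration.

```python
def false_positives(patterns, good_referrers):
    """Return (referrer, pattern) pairs where a blacklist pattern would
    also match a referrer you want to keep."""
    hits = []
    for ref in good_referrers:
        # Lower-case and wrap in slashes, as the SecAction line does.
        wrapped = "/" + ref.lower() + "/"
        for pat in patterns:
            pat = pat.strip().lower()
            if pat and pat in wrapped:  # phrase match = substring search
                hits.append((ref, pat))
    return hits

# Hypothetical blacklist entries and known-good referrers.
patterns = ["cheap-pills.example", "casino"]
good = [
    "http://www.example.org/",
    "http://travel.example/monte-carlo-casino-tour",
]

print(false_positives(patterns, good))
# prints [('http://travel.example/monte-carlo-casino-tour', 'casino')]
```

Here the short pattern ‘casino’ would block a harmless travel page, which suggests listing full domain names rather than keywords.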
######## Block 'bad' IP addresses or IP ranges
SecAction "phase:1,pass,nolog,setvar:tx.REMOTE_ADDR='/%{REMOTE_ADDR}/'"
# check IP var
SecRule TX:REMOTE_ADDR "@pmFromFile blacklist.ip.txt" "phase:1,deny,msg:'Blacklist_IP'"
########
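As with the previous mod_security rule pair, the SecAction line above first sets a variable to the IP address of the visiting connection (wrapped in slashes), and the SecRule line then compares that value against the known bad guys in the file ‘blacklist.ip.txt’. If there is a match then the connection is denied. Because the patterns are matched as substrings, the slashes let you write an entry such as ‘/192.0.2.10/’ to match exactly one address, or ‘/192.0.2.’ to match a whole range by its prefix.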
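The effect of wrapping the address in slashes is easier to see in a small simulation. This is not mod_security itself, just a Python sketch of substring matching; it assumes the blacklist entries are written with the same slashes, and the addresses come from the reserved documentation ranges.

```python
def ip_is_blacklisted(remote_addr, entries):
    """Mimic the rule pair: wrap the address in slashes, then look for
    any blacklist entry as a substring of the wrapped value."""
    wrapped = f"/{remote_addr}/"
    return any(entry in wrapped for entry in entries if entry)

# Hypothetical blacklist: one exact address and one range prefix.
entries = ["/192.0.2.10/", "/198.51.100."]

print(ip_is_blacklisted("192.0.2.10", entries))     # True: exact entry
print(ip_is_blacklisted("192.0.2.100", entries))    # False: trailing slash stops the partial match
print(ip_is_blacklisted("198.51.100.77", entries))  # True: covered by the prefix entry

# Without the slashes, a bare entry would also hit longer addresses.
print("192.0.2.10" in "192.0.2.100")                # True: the unwanted match
```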
Once you identify the repeat offenders (i.e. the same or similar REFERER values or the same or related IP addresses) you can create/install a cron process to review your server logs for new/additional SPAMMING attempts and then update your blacklists. If you are using mod_security, remember that you must restart your web server for the changes to take effect.
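A cron job for that review could look roughly like the following. It is a minimal sketch, not the script I run: it assumes the Apache "combined" log format, and the function name, the `min_hits` threshold and the sample data are all hypothetical. It only reports candidates; adding them to the blacklist is left as a manual, reviewed step.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Picks the referrer field out of a combined-format line:
# ... "request" status bytes "referrer" "user-agent"
REFERER_FIELD = re.compile(r'" \d{3} \S+ "([^"]*)"')

def new_referrer_candidates(log_lines, own_host, blacklist, min_hits=3):
    """Return off-site referrer hosts seen at least min_hits times that
    are not yet listed (as exact host names) in the blacklist."""
    counts = Counter()
    for line in log_lines:
        m = REFERER_FIELD.search(line)
        if not m:
            continue
        host = urlparse(m.group(1)).hostname  # None for "-" or empty values
        if host and host != own_host:
            counts[host] += 1
    known = {entry.strip().lower() for entry in blacklist}
    return sorted(h for h, n in counts.items()
                  if n >= min_hits and h not in known)

def make_line(ip, referer):
    """Build a made-up combined-format log line for the demo."""
    return (f'{ip} - - [10/Mar/2010:10:00:00 +0000] "GET / HTTP/1.1" '
            f'200 512 "{referer}" "-"')

demo_lines = (
    [make_line(f"203.0.113.{i}", "http://pills.example/buy") for i in range(3)]
    + [make_line(f"198.51.100.{i}", "http://casino.example/") for i in range(3)]
    + [make_line("192.0.2.9", "http://friend.example/links")]
)

print(new_referrer_candidates(demo_lines, "www.example.org", ["casino.example"]))
# prints ['pills.example']
```

In a real cron job you would read the lines from your access log and the existing entries from ‘blacklist.refer.txt’ instead of the demo lists.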
Like many IT issues, dealing with this problem requires some time and, in this case, some skill with pattern matching (e.g. Regular Expressions). Of course you could just ignore it (as long as you are not sharing your web logs or allowing un-audited site linking…)
There are other approaches but they require manual editing of web-related access files – automation is probably a better solution.
Some references (indicating that this is an old problem)
- http://en.wikipedia.org/wiki/HTTP_referrer
- http://en.wikipedia.org/wiki/Referrer_spam
- http://www.wired.com/culture/lifestyle/news/2002/10/56017
- http://lorelle.wordpress.com/2007/08/07/battling-referrer-spam/
Note that the mod_security rules on this page are simple examples that may not be appropriate for your site(s) – your mileage will vary… and yes, I may be available if you need a consultant to assist you in optimizing your Apache web server.