Is Google-Bot gamed?**

By Dale Reagan | March 17, 2011

Is web-SPAM a ‘bot’ problem, an ‘indexing problem’, a content provider problem, or a user problem?

NOTE – this is probably a problem for any search engine – it just so happens that a Google-bot log entry caught my eye in this instance…

Imagine your doctor NOT treating you for a diagnosed condition with a known remedy because it is not a perfect solution? This appears to be the current approach used by search engines… The search engines want users to:

So, WHO IS RESPONSIBLE for the CONTENT produced by a SEARCH ENGINE?

I will stand up and say that the search engines are responsible for their content in the same manner that I am responsible for mine.

This is, perhaps, a different view and, as with any INTJ, I will simply state my position, knowing that most folks do not think in this fashion (my opinions are based on reason, logic, experience, experimentation/replication, reflection, etc.).   Pausing for reader reflection… 🙂

Is it really my problem to figure out how to convince a bot/engine that I am about quality content? (No…)  It is their problem to figure out whether I provide quality or not…  IMO, any other view is a short-term approach, based only on a traffic-driven revenue model, that is eventually destined to fail.

The NEXT search engine will figure out how to do this without all of the current foolishness tied to the social thing/stuff, which, on reflection, tends to lead to biased, very small result sets (you are limited by your ‘circle’), i.e.:

So, what about the recent Google Panda Update (Q1, 2011), which supposedly targeted content farms (removing or re-ranking them in search results)? It’s a bit late and, based on the example below, simply not enough. That provides an opportunity for newcomers like DuckDuckGo.com and Blekko.com, which seem to be making headway as search engine solutions by simply not including (or by retaining/presenting less of) the junk/SPAM that Google brings to you in search results – but they seem to be relying on the social thing…  Is that enough?

BTW – a web server 404 entry started this post – I saw an error in my logs so I visited the page – I find it curious that Google-Bot is fooled by what appears to be a Web-SPAM site/domain. When I checked, the IP address below was in Google IP-Space so I can only assume that it is a legit Google-Bot hitting my server.
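Checking that an address sits in Google IP space is one test; Google also documents a reverse-then-forward DNS check for verifying its crawler. A minimal Python sketch of that check follows. The function name, the injectable lookup parameters, and the exact host-name suffix list are assumptions made for illustration, and the suffix list may need updating.

```python
import socket

# Host-name suffixes assumed to identify Google crawlers (an assumption;
# check Google's current documentation before relying on this list).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_probable_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the host name, then forward-resolve it back."""
    # Default to the system resolver; callers may inject fakes for offline use.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # host name is not under an expected Google domain
    try:
        # The forward lookup must map the host name back to the same address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_probable_googlebot(ip_from_log)` with the address taken from a log entry performs live DNS lookups. A `False` result only means the check could not confirm the bot, not that the request was necessarily forged.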

A quick Google search for ‘site:bad_domain’ shows ~16K links – yep, looks like Google is being gamed by yet another non-original, possibly-stolen-content site…

~ A month later, another example:

The above site does not allow access to ‘out-siders’ or even appear to offer any real, visible ‘value’ – so, are they borrowing my content? Is it another content farm?  A Google search for ‘site:above_domain’ shows ~2100+ pages – if the site is ‘closed’ then, IMO, Google should NOT be indexing or at least showing search results for such sites.

I venture forth – ‘testing’ – yes, I clicked on one of the Google search results and without any real surprise, the browser ‘jumped’ through several ‘webs’ (including a visit to Russia, and most likely hitting an Adword or two…)   At this point the site appears to be an ad farm – Whoops! did Google forget to remove the ‘ad farms’ when removing the ‘content farms’ or did they simply miss these types of sites?

So, if your customers are paying for all these clicks then it’s not a problem, right? Or, phrased differently, is this now the cost of customer acquisition?

Here is data from my cache server (I have inserted random spaces in domain names – the IP addresses were not altered):

TCP_MISS/200 742 GET http://dhi vote .com/30.php? – DIRECT/69.89.31.204 text/html
TCP_MISS/302 788 GET http://cou nter.ya dro .ru/hit;newtraffic? – DIRECT/88.212.196.102 text/html
TCP_MISS/200 565 GET http://counter.yadro .ru/hit;newtraffic? – DIRECT/88.212.196.102 image/gif
TCP_MISS/302 561 GET http://fast -pc-scan.cw .cm/t/ – DIRECT/46.252.134.5 text/html
TCP_MISS/302 508 GET http://www.offerss uperior .com/rl_cmprwm.php? – DIRECT/207.226.177.42 text/html
TCP_MISS/200 710 GET http://offer sadvance .com/sc.php? – DIRECT/207.226.177.41 text/html
TCP_MISS/200 669 POST http://offersadvance .com/sc.php? – DIRECT/207.226.177.41 text/html
TCP_MISS/302 466 POST http://offer sadvance .com/sc.php? – DIRECT/207.226.177.41 text/html
TCP_MISS/302 405 GET http://www.offers superior .com/rl_cmprwm.php? – DIRECT/207.226.177.42 text/html
TCP_MISS/200 1251 GET http://click .eyk .net/ez/ckkalnpkkzexe/subid=8749z74z1z11drj5 – DIRECT/207.250.205.14 text/html
TCP_MISS/200 1289 GET http://clic k.eyk .net/favicon.ico – DIRECT/207.250.205.14 image/x-icon
TCP_MISS/302 750 GET http://ww w.track imizer  com/c5cc17e395d3049b03e0f1ccebb02b4d/MBA/CD11784 – DIRECT/38.101.10.147 text/html
TCP_MISS/200 7838 GET http://www. americanpri zepane l. com/aseg-144? – DIRECT/38.101.10.147 text/html
TCP_MISS/200 72505 GET http://www.am ericanprizepanel .com/content/jScript/jQuery.js – DIRECT/38.101.10.147 application/x-javascript
TCP_MISS/200 17953 GET http://media.am erica nprizepanel .com//content/landers/walmart/lp6/images/background.jpg – DIRECT/74.53.184.178 image/jpeg
TCP_MISS/200 23542 GET http://media.am ericanp rizepanel .com//content/landers/walmart/lp6/images/continue.jpg – DIRECT/74.53.184.178 image/jpeg
TCP_MISS/200 62231 GET http://media.ameri canprizepanel .com//content/landers/walmart/lp6/images/congrads.jpg – DIRECT/74.53.184.178 image/jpeg
TCP_MISS/200 66809 GET http://media.am ericanprizepanel. com//content/landers/walmart/lp6/images/input_box.jpg – DIRECT/74.53.184.178 image/jpeg
TCP_MISS/200 70004 GET http://media.americanp rizepanel .com//content/landers/walmart/lp6/images/card.jpg – DIRECT/74.53.184.178 image/jpeg
TCP_MISS/200 315 GET http://www.ameri canprizep anel. com/favicon.ico – DIRECT/38.101.10.147 image/x-icon

What about the IP addresses used above? Here is the GeoIP info (approximate, probably correct at this time, but subject to change):

46.252.134.5     |  LV, Rezekne
88.212.196.102   |  RU, 48, Moscow
38.101.10.147    |  US, AZ, Phoenix
207.250.205.14   |  US, MN, Eden Prairie
74.53.184.178    |  US, TX, Houston
69.89.31.204     |  US, UT, Provo
207.226.177.41   |  US, VA, Herndon
207.226.177.42   |  US, VA, Herndon

So we seem to have an international network (at least three countries) of web sites (I will guess that the first two or three domains are all managed/owned by the same ‘company’), connected in such a manner as to generate ad revenue, with some connection to Wal-Mart (in this case a Wal-Mart coupon was presented – I will guess that Wal-Mart is also a victim in this flow).

Assuming no errors in my GeoIP data and then following the IP-Country-Chain we see:

  1. USA server for 1st link  (69.89.31.204 – provided by Google Search results)
  2. then Russia (88.212.196.102)
  3. then Latvia (46.252.134.5)
  4. back to USA (207.226.177.42)
  5. remainder in USA
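The chain above can be reproduced from the cache log with a few lines of Python. This is only a sketch: the `GEOIP` table is copied by hand from the GeoIP list in this post (a real check would query a GeoIP database), the sample is the first five log lines as shown (with the inserted spaces left in), and the names `PEER_RE` and `country_chain` are made up for illustration.

```python
import re
from itertools import groupby

# Country codes taken from the GeoIP list in this post (subject to change).
GEOIP = {
    "46.252.134.5": "LV",
    "88.212.196.102": "RU",
    "38.101.10.147": "US",
    "207.250.205.14": "US",
    "74.53.184.178": "US",
    "69.89.31.204": "US",
    "207.226.177.41": "US",
    "207.226.177.42": "US",
}

# The cache log records the server actually contacted as DIRECT/<ip>.
PEER_RE = re.compile(r"DIRECT/(\d{1,3}(?:\.\d{1,3}){3})")

SAMPLE = [
    "TCP_MISS/200 742 GET http://dhi vote .com/30.php? – DIRECT/69.89.31.204 text/html",
    "TCP_MISS/302 788 GET http://cou nter.ya dro .ru/hit;newtraffic? – DIRECT/88.212.196.102 text/html",
    "TCP_MISS/200 565 GET http://counter.yadro .ru/hit;newtraffic? – DIRECT/88.212.196.102 image/gif",
    "TCP_MISS/302 561 GET http://fast -pc-scan.cw .cm/t/ – DIRECT/46.252.134.5 text/html",
    "TCP_MISS/302 508 GET http://www.offerss uperior .com/rl_cmprwm.php? – DIRECT/207.226.177.42 text/html",
]


def country_chain(log_lines):
    """Return the sequence of countries visited, collapsing consecutive repeats."""
    ips = [m.group(1) for line in log_lines if (m := PEER_RE.search(line))]
    countries = [GEOIP.get(ip, "??") for ip in ips]  # "??" marks an unknown IP
    return [country for country, _ in groupby(countries)]


print(country_chain(SAMPLE))  # ['US', 'RU', 'LV', 'US']
```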

In my opinion such tactics as shown above are examples of search-engine theft; any site that replicates any of my content or even links to my content needs to be involved in my content OR IT IS A SPAM web site.  If such sites are getting Google placement ‘bumps’ by including my content then WHERE IS MY REVENUE?

So, how challenging would it be to create an algo to check for some scenario as listed above?  Five approaches have already crossed my thoughts
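As a rough illustration of how simple a first-pass check could be, here is a minimal Python sketch. It is not necessarily one of the approaches mentioned above, and everything in it is an assumption: the `Hop` record, the function name, and the thresholds are hypothetical and would need tuning against real data. It flags a click-through whose chain of requests has several redirects, crosses country borders, and touches several distinct hosts.

```python
from dataclasses import dataclass

REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class Hop:
    status: int   # HTTP status returned for this request
    host: str     # host name that served it
    country: str  # country code from a GeoIP lookup


def looks_like_redirect_farm(hops, max_redirects=2, max_countries=1, max_hosts=2):
    """Flag chains that exceed every threshold (all thresholds are guesses)."""
    redirects = sum(1 for hop in hops if hop.status in REDIRECT_CODES)
    countries = {hop.country for hop in hops}
    hosts = {hop.host for hop in hops}
    return (
        redirects > max_redirects
        and len(countries) > max_countries
        and len(hosts) > max_hosts
    )


# Shape of the chain described in this post, with placeholder host names.
chain = [
    Hop(200, "landing.example", "US"),
    Hop(302, "counter.example", "RU"),
    Hop(302, "scan.example", "LV"),
    Hop(302, "offers.example", "US"),
]
print(looks_like_redirect_farm(chain))  # True
```

A single page served directly (one hop, no redirects) is not flagged. A crawler that followed each indexed link this way could at least queue the flagged ones for closer review.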

To Solve the Web-SPAM problem, maybe Google can book some time with Watson? (the new, IBM super-Jeopardy-playing-problem-solving-computer…)

Of course, the sad part is that Google appears unwilling to solve the web-spam problem in their own ‘content’ unless they can figure out an algo to solve it for them.   Imagine your doctor NOT treating you for a diagnosed condition with a known remedy because it is not a perfect solution?

So are Adwords gamed? (Google supposedly does not bill for all clicks – they factor in an ‘error rate’.)

Solutions/Suggestions for improvement

*TERMS_WHICH_LINK_TO_ME = terms for which my web site(s) may have good/high SEO rankings; I have previously reported this type of issue to search engines, and it continues to be a problem (I don’t recall any feedback on my suggestions – those algos are simply not very conversational… I did hear a rumour that people worked at at least some of the SEs.)

** Google provides many wonderful and useful products and services, but, IMO, a company with such an abundance of resources and talent is falling short by not leading the way (in areas other than search traffic/company profits); I encourage a review of the writings of Deming.  So, is business the business of business? If Google does not clean up its data then one conclusion that might be made is that there is some level of collusion with the SPAMMERS – they seem to be feeding each other…

Topics: Business Blogging, Internet Search, Long Tail Search, Search Engine Optimization, Virtual-Cloud Computing, Web Problem Solving, Web Technologies