Google has initiated several questionable algorithm changes over the last few years: sandbox, domain age, links aging, duplicate content, etc. These changes have frustrated many, but in the commotion over what they have done to our wallets, we have rarely considered whether or not they were in the best interests of searchers. I would like to comment on some of these. Please read the whole thing before you respond; there is a lot to be said.
(1) Duplicate Content
This is perhaps the greatest deceit committed by Google over the last few years. They have convinced us all that duplicate content (duplicated across multiple sites or on the same site) is not valuable. This is simply not the case...
-> Duplicate content between multiple sites is actually evidence of its value. If an article is replicated across several web sites, it is evidence of its value to the community. Google's intention is to only show one of those results - which is a huge deceit committed against the searcher. He or she is led to believe that the article listed #3 is alone, even though it has been reprinted by 50 other publications. Duplicate content is representative of a beautiful, hand-audited network of quality content. The very essence of Google was this concept of links being like book citations. Book citations include duplicate content. They copy valuable content and cite the source. This is exactly what search engines NEED to be useful!
-> Duplicate content on the same site is far more representative of providing value to the user than of trying to dupe the search engines. A user-friendly ecommerce web site should present a terms of service, privacy statement, refund description, contact information, etc. on every product ordering page. Users should not have to click off to separate pages of the site to search out what the site's policy is on all of these issues. But, because these policies are uniform, this kind of information overshadows the unique content on a page and often triggers a duplicate content filter, causing the product pages to become supplementals.
-> Duplicate content is often a side effect of a comprehensive site. A site with 25 sections in its navigation bar, parts of it duplicated across the top and bottom for usability, plus breadcrumbs, is cannon fodder for a duplicate content filter. Webmasters are relegated to hiding these sections in JavaScript, Flash, or image maps to avoid the penalty, but in the end do great damage to usability, especially for the disabled.
-> Unique content is often totally valueless. Take for example a site that sells car parts. They have windshield wipers specific to every make and model of car released in the United States over the last 10 years. How many different ways does Google expect a site to say "These are the windshield wipers for a 1990 Dodge Caravan"? What ends up occurring is these pages get duplicate content penalties and, instead, sites that offer a smaller number of products end up ranking higher. Thus, users get placed onto sites where they are less likely to find other products that might fit their vehicle.
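To make the windshield wiper case concrete, here is a minimal sketch of one well-known way to measure near-duplication: compare the sets of overlapping three-word "shingles" on two pages using Jaccard similarity. This is only an illustration and not a description of Google's actual filter, which is not public. The function names, the boilerplate text, and the second product are all made up for the example.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical policy text repeated on every product page.
BOILERPLATE = (
    "terms of service apply to every order . "
    "refunds are accepted within thirty days . "
    "contact us by phone or email with any question"
)


def product_page(year, make, model, boilerplate=BOILERPLATE):
    """Build a toy product page: one unique sentence plus shared policy text."""
    unique = f"These are the windshield wipers for a {year} {make} {model} ."
    return f"{unique} {boilerplate}".strip()


page_a = product_page(1990, "Dodge", "Caravan")
page_b = product_page(1991, "Ford", "Taurus")

# Similarity of the full pages, policies included.
with_policies = jaccard(shingles(page_a), shingles(page_b))

# Similarity of the same pages with the policy text stripped out.
without_policies = jaccard(
    shingles(product_page(1990, "Dodge", "Caravan", boilerplate="")),
    shingles(product_page(1991, "Ford", "Taurus", boilerplate="")),
)

print(f"with policies:    {with_policies:.2f}")
print(f"without policies: {without_policies:.2f}")
```

In this toy setup the two pages describe different products, yet adding the same policy text to both pushes their measured similarity up considerably. Any filter built on a measure like this would see the user-friendly version of the page as the more "duplicate" one.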
The true problem with duplicate content is when it is combined with other more nefarious search engine strategies, such as cloaking or doorway pages. Duplicate (on-site) content that stands alone is far less likely to be nefarious or created with the intent of duping the search engines.
Once again, what is the result of product pages not ranking because of duplicate content? Buy your way into Google AdWords.
(2) Domain Aging
A general trend noted across the board has been the value of old domains. Across every keyword field that my company has tested over the last 3 years, the relationship between allinanchor, allintext, and allintitle to SERPs has decreased while the relationship between Domain Age and SERPs has dramatically increased.
Google should know that age is rarely a good determinant of value. Its founders were fresh out of Stanford when they began. If we applied the same filter to them, Sergey would still be working the college computer labs to pay off his educational loans. This has been greatly detrimental in fields which are research and innovation based, such as the entire technology industry and health industry. Because age is such a powerful factor, sites which are relaying new, useful information do not rank for quite some time. The information that is easily accessible tends to be archaic and generic.
This is particularly noticeable among niche, secondary, or tertiary non-competitive keywords where only a handful of quality results exist. The sites that tend to retain the top positions are product pages from very old sites (the first to list on the web). A particular former client created a site from scratch in 2004 which contained over 500 articles specific to a niche topic. Each article was unique, written by doctors, and absolutely useful. The site gained a large number of quality inbound links naturally. The site's on-site SEO was impeccable. Subsequently, the site ranked in the top 10 for allinanchor, allintext, and allintitle. It is now 2006. This site continues to rank #284 for the niche keyword in Google. #1 in Yahoo. #1 in MSN. For the top 100 keywords that my clients are targeting, you will not find 1 website created after 2003 in the top 10 - regardless of allinanchor, allintitle, and allintext rankings.
The result: just what Google wants. The client faithfully pays Google $1.13 per click for this term.
I like to think of search engines as mega Democracies. Each web page / site is a voter. They can vote as many times as they like, but the more they vote, the less their votes count (number of outbound links). If they know the candidate personally (same IP, same website, same Class C), their vote counts less. If they are experts on an issue, their vote counts more (theming). Domain and links aging though? They are incumbency, the bane of Democratic existence. They are the popularity contest that kept the smart transfer student from ever becoming Student Body President. They are the George W. Bushes who get nominated because upwards of 10% of voters think he is his father. The result: everyone has to pay the marketer to get their name out to fight the incumbent - the marketer is AdWords.
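The voting analogy can be written down as a toy scoring function. This is only a sketch of the analogy itself, not of any real search engine's formula: the class name, the field names, and both multipliers (`related_discount`, `theme_bonus`) are hypothetical values picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    """One linking page, described by the three factors in the analogy."""
    outbound_links: int   # how many votes this page casts in total
    same_network: bool    # same IP, same website, or same Class C as the target
    on_topic: bool        # the linking page is an "expert" on the theme


def vote_weight(vote, related_discount=0.25, theme_bonus=2.0):
    """Weight of a single vote. Both multipliers are made-up illustration values."""
    # The more a page votes, the less each vote counts.
    weight = 1.0 / max(vote.outbound_links, 1)
    # Voters who know the candidate personally count for less.
    if vote.same_network:
        weight *= related_discount
    # Experts on the issue count for more.
    if vote.on_topic:
        weight *= theme_bonus
    return weight


def score(votes):
    """Total score of a page: the sum of all weighted votes it receives."""
    return sum(vote_weight(v) for v in votes)


votes = [
    Vote(outbound_links=1, same_network=False, on_topic=False),
    Vote(outbound_links=4, same_network=True, on_topic=False),
    Vote(outbound_links=2, same_network=False, on_topic=True),
]
print(score(votes))
```

Notice that nothing in this model looks at how old the voter or the candidate is. Domain and links aging would be an extra multiplier bolted on top of it, and that multiplier is the incumbency advantage described above.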
(3) The Sandbox
While there are, and always will be, questions about what exactly the sandbox is, how long it lasts, and how to avoid it, I believe that there are some serious value questions which Google must answer in justifying its usage. For our purposes here, the sandbox is generally considered to be a method of preventing new sites from acquiring links abnormally fast and moving to the top of the search engines. Ostensibly, this is simply to prevent sites from using aggressive SEO tactics to launch new sites to the top of the SERPs.
I use the word "ostensibly" because the sandbox simply does not accomplish anything that could not be more accurately accomplished using traditional link verification methods. Filtering out link spamming techniques and using percentages of outbound links relative to inbound links are simple techniques which would counteract the majority of these attempts to boost rankings quickly. Improving these methods would exclude illegitimately acquired links.
Links-aging, often identified as the major culprit behind the sandbox, is simply not an accurate method of determining the value of a link. On the contrary, if the other techniques mentioned above are applied, the only links remaining to be questioned by a links-aging filter are those legitimately acquired. Thus, a links-aging filter tends to only harm sites with a large number of legitimately acquired links, such as phenomenon sites. This has been documented in several cases (Christopher Walken 2008, for example) and has driven users on many occasions to use Yahoo and MSN to find phenomenon web sites.
What the "sandbox" and "links-aging" filters have accomplished, though, is encouraging new web site owners to use AdWords to promote their site through Google's search engine. Unlike the natural search results, deep enough pockets can secure a top ranking for any keyword in 1 day.
Conclusions
Generally speaking, an accurate linking-scheme filter is all that is needed to make a good search engine algorithm. The beauty of Google's original vision is that it turned the entire web into a giant Democracy - a hand-audited system of determining the value of a website. All Google has to do is remove voter fraud.
The filters mentioned above are crude shortcuts that have improved quality marginally at best while excluding a large amount of quality content and ignoring a large number of new quality web sites. Their impact on AdWords revenue (just looking at my own clients' participation) is far greater than their improvement of search relevancy and importance.
Google needs to focus on its original methods. A good link-scheme filter would encourage SEOs to do the only thing they can to get good, natural back links - write good content with open-republication policies (Yes! Encouraging other webmasters to duplicate the content with links back to the original - think quotes in an academic publication! This is good Google! Not Bad!). This is where SEOs are valuable to Google, to the Search User, and to the Web Site Owner.






