The site, http://bit.ly/WB3S68, was affected by Penguin. The site owner wanted to start over and did the following to create a new site (http://bit.ly/11xqetB):
-Used the Google URL removal tool to remove the old site from the index.
-Used robots.txt on the old site to disallow search engines.
-When the old site was gone from the index, put the new site online. The new site has a different home page from the old one, but the inner pages are all the same.
A couple of weeks later, according to WMT, the new site suddenly had 10,000+ links that had previously pointed at the old site's home page. Underneath each link is a line saying "Via this intermediate link: [old site]".
There was definitely never any redirect from the old site to the new one.
I figure the problem is related to what is described at http://dejanseo.com.au/mind-blowing-hack/, where Dejan SEO noticed that if you copied a page completely, Google would show that page's links in your WMT.
It turns out that when you use Google's URL removal tool and choose to remove the whole site, it removes the site from the index but not from the cache. When I searched for "webcache.googleusercontent.com/[old site]", it displayed the cache for the new site! So I went in and removed each individual URL from the cache using the tool (you only get the option to remove from the cache if you remove individual URLs).
Now, 2 days later, the cache search for the old site returns a 404. Yet according to WMT, all 10,000+ links to the old site are still pointing at the new one.
My guess is that these links will disappear as they get recrawled, but I would have expected at least some of them to be gone by now. I know WMT is slow to update, and I'm hoping that's why we're still seeing them.
Questions:
1. What do you guys think of these "via another site" links appearing in WMT? Do you think Google is passing link juice through them? In the Dejan SEO article, these via links appeared even when a low-PageRank site copied a high-PageRank site's article, so I'm guessing the links aren't really passing juice and therefore don't really count as pointing to your site.
2. I know no one can answer this for sure, but how likely do you think it is that this site will be affected by Penguin if Penguin refreshes soon?