Nov 21st, 2012, 03:57 AM
How to remove a very LARGE number of links from the Google index
I work as an SEO consultant for a large website. The main content of the site is pages for objects (accommodation, restaurants, etc.).
Because of bad SEO we ended up with literally hundreds of thousands of pages indexed. We repaired this in the summer - all the bad URLs now 301 redirect, and nobody links to that content anymore. After half a year nothing has changed; the pages are still in Google's index. I googled around and found 3 possible solutions:
1. Use the Google URL removal tool. It was designed to remove only a few pages, and according to Google it should be used only for urgent removals. I could write a script that automatically submits pages to the removal tool, but that is against their policies (if the URLs were all in one folder it would be possible to delete the whole folder at once, but they are not).
2. Define rules in robots.txt. Google understands wildcards in robots.txt, so I can define rules covering these URLs (see the sketch after this list). The problem with this solution is that Google will stop crawling the pages, but they will remain in the index; I couldn't find an answer on when, or whether, Google ever removes them. I put a few of my bad pages in robots.txt a month ago and they are still indexed.
3. Put links back on my site - Google will find them, follow them, hit the 301s and deindex them. But I am not sure whether this would harm my site.
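For reference, this is the kind of wildcard rule I mean in option 2 (a minimal sketch - the query parameter and filename pattern are made-up placeholders, since the real URLs don't share a folder):

```
User-agent: Googlebot
# Block any URL carrying the duplicate-generating parameter, on any path
Disallow: /*?variant=
# $ anchors the match to the end of the URL
Disallow: /*.old.html$
```

As noted above, Disallow only stops crawling; Google can't recrawl a blocked URL to discover it's gone, which may be why blocked pages linger in the index.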
I am at a loss here. Do you know of a better solution?
Nov 30th, 2012, 10:20 PM
I am sorry that no one has replied to this, but your description of the problem is highly confusing - apparently not only to me.
I understand that you have a bunch of pages indexed that you do not want indexed. If I understand correctly, you have removed the content of those pages and 301-redirected their URLs (is that what you mean by "bad links"?). So why is it a problem that those pages are still indexed? As I see it, the worst thing that can happen is that people find you in search via those pages, click on the search result, and get 301-redirected to some better page, no? It's hard to fathom how having *more* pages indexed can hurt you.
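If you want to sanity-check that those URLs really return 301s, here's a minimal sketch in Python using the requests library (the URL is a made-up placeholder):

```python
# Verify a redirect without following it - a quick sketch, not the OP's setup.
import requests

r = requests.head("http://www.example.com/old-object-page", allow_redirects=False)
print(r.status_code, r.headers.get("Location"))  # expect 301 plus the target URL
```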
If there is a specific reason (please explain it?) and you *must* have those pages deindexed, one thing to try would be to generate a fresh XML sitemap and submit it via Webmaster Tools, then wait a while.
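Generating that sitemap is easy to script, by the way. A minimal sketch in Python (the URL list and output filename are hypothetical placeholders for whatever your CMS exports):

```python
# Write a bare-bones XML sitemap - a sketch, not production code.
from xml.sax.saxutils import escape

urls = [
    "http://www.example.com/accommodation/hotel-a",  # placeholder URLs
    "http://www.example.com/restaurants/place-b",
]

with open("sitemap.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        f.write("  <url><loc>%s</loc></url>\n" % escape(url))
    f.write("</urlset>\n")
```

List only the URLs you *do* want indexed; the idea is to give Google a clean picture of the site.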
Dec 1st, 2012, 03:53 PM
I had a similar problem, with about 50k pages of low-quality, near-duplicate content being indexed that shouldn't have been. My fault - they found their way into my sitemap, and I should have been more careful. To make it worse, they were short-lived pages, and as they died my 404s skyrocketed; it took me about 6 weeks to clear out the 404s in WMT. All the junk pages have been 301-redirected to the nearest similar page, and I set a robots meta tag of noindex, nofollow, noarchive on each of them (snippet below). I of course corrected my sitemap. A couple of weeks after all that, I used the removal tool to remove the directories these pages fell into. The junk pages are out of the SERPs now, but WMT still counts them among my indexed pages.
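For anyone following along, the tag I mean is the standard robots meta tag in the <head> of each junk page:

```html
<meta name="robots" content="noindex, nofollow, noarchive">
```

One caveat: Googlebot has to be able to crawl a page to see this tag, so it only works on pages that are *not* also blocked in robots.txt.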
Strangely, even though WMT updates my number of indexed pages every Sunday, the number only changes every other week, so I am waiting 2 weeks at a time hoping to see more of these junk pages deindexed. The count only drops by 3-5k each time. In the same way that you are only allowed to clear out 1,000 404 listings a day, I wonder if there is an invisible limit on the number or percentage of pages that can be deindexed within a certain period.
If there is a faster way to get them deindexed, I would love to know it. It's taken 5 weeks to see a drop of 16k, and I'd like to see another 30-40k gone.
I have not seen much in the way of positive ranking results from my efforts yet. Search query keywords and positions have started to improve, but impressions are still down and WMT shows only 5 clicks a day, never more. I realize that's probably a rounded estimate, but it's always the same - it's as if I'm only allowed x visitors a day via G.
Dec 3rd, 2012, 04:47 AM
Actually, you understood it quite well. To answer your points:
1. I want them deindexed because they were near-duplicate content and they are messing up my SERPs. Because of these duplicates my pages rank 2-3x lower.
2. The sitemap solution is not working, at least for my site. I created a sitemap for those pages 3 months ago, but it didn't help - pages from that sitemap were still indexed 2 weeks ago.
3. It seems the robots.txt solution is working: 3 weeks ago I defined wildcard rules covering 100k of my pages, and they have begun to disappear from the Google index.