#1
  1. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Aug 2005
    Posts
    91
    Rep Power
    13

    Deindexed pages?


    I just found that I've gone from having about 1200 indexed pages in MSN and Yahoo to having barely 200. Is there any way at all to figure out why?

    Also, some of my indexed pages in Yahoo shouldn't be getting indexed. This is not a robots.txt issue, though: I need to figure out which of my pages is linking to the page I don't want indexed. For example,
    http://www.domain.com/a/b?c=d yields a legitimate (non-404) page, BUT not one that I want indexed. I need to see how the robot got there (from which page on my site). Is there some combination of allinurl:, domain:, etc. I can use to see which page is pointing at the offending page?

    Thanks for any help
    Noah Gilbert
    http://www.osidealive.com
  2. #2
  3. No Profile Picture
    Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Aug 2005
    Location
    down by the seside.net
    Posts
    153
    Rep Power
    13
    Originally Posted by baal32
    ... http://www.domain.com/a/b?c=d yields a legitimate (non-404) page BUT not one that I want indexed.
    You need a little bit of white-hat cloaking :-). Put a small pre-processor on all your pages that checks for spider visits (by IP address and user agent), inspects the URL parameters, and 301-redirects to the same page with optimized parameters (e.g. /a/b?c=d&e=f&g=h is not good, so you 301-redirect from there to /a/b/?e=f ). In time, Google will remove the "bad" URL and replace it with the new one. It takes a bit of work, but if you get "bad" links from elsewhere (or even have them on your own site), Google is bound to find and index them, possibly harming you through "duplicate content".

    Check out Sebastian's site; he has some more information and ideas on this: http://smart-it-consulting.com (too lazy to find the exact URL)

    Cheers!
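
    For what it's worth, the parameter clean-up described above could look roughly like the sketch below. It only covers the "check the parameters and 301-redirect" part. The names `ALLOWED_PARAMS`, `canonical_url` and `redirect_for` are made up for illustration, the whitelist is an assumption, and the sketch treats every visitor the same instead of detecting spiders. It also leaves the path untouched, so it does not add the trailing slash shown in the example URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the only query parameters worth keeping in a URL.
ALLOWED_PARAMS = {"e"}


def canonical_url(url: str) -> str:
    """Return the URL with every non-whitelisted query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in ALLOWED_PARAMS
    ]
    # Drop the fragment as well; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_for(url: str):
    """Return (301, target) if the request should be redirected, else None."""
    target = canonical_url(url)
    if target != url:
        return 301, target
    return None
```

    In a real pre-processor you would call something like `redirect_for` at the top of every page and send a `Location` header with the returned target whenever it is not `None`.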
  4. #3
  5. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Aug 2005
    Posts
    91
    Rep Power
    13

    Actually


    The problem actually is a little weirder.

    To be exact, Yahoo has this page cached:
    http://www.osidealive.com/visitors/item/55
    but this page shouldn't exist

    the correct page would be this
    http://www.osidealive.com/dining/item/55

    As you can see, they're basically the same page. (As you've probably guessed, the pages are dynamically generated, and the /item/55 portion is actually more 'important' to the page generation than /dining/ vs. /visitors/.)

    The thing is, you shouldn't be able to get to
    http://www.osidealive.com/visitors/item/55 except by manually typing it in, so somewhere I have a scripting error that's creating this link. I need to find where that is occurring...

  6. #4
  7. No Profile Picture
    Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Aug 2005
    Location
    down by the seside.net
    Posts
    153
    Rep Power
    13
    If you think the link is coming from your own site, push the site through a crawler like my GSiteCrawler ( http://johannesmueller.com/gs/ ). If it's a normal link, the crawler will find it and tell you where you first linked to that URL (it only finds the first link, so run the site through it again if you want to be sure you get them all). You'll probably find some other "bad" links you didn't know about; at least I find them all the time on my sites :-))

    Cheers
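
    To show the general idea (this is not how GSiteCrawler works internally, just a sketch), here is a breadth-first walk over a site's links that reports every page linking to a given URL. It assumes you already have the links per page in a dict; the function name `find_referrers` and the sample `SITE` map are invented for illustration, and a real crawler would fetch and parse each page instead.

```python
from collections import deque


def find_referrers(links, start, target):
    """Walk the link graph breadth-first from `start` and return every
    reachable page that links directly to `target`, in discovery order."""
    seen = {start}
    queue = deque([start])
    referrers = []
    while queue:
        page = queue.popleft()
        for href in links.get(page, []):
            if href == target and page not in referrers:
                referrers.append(page)
            if href not in seen:
                seen.add(href)
                queue.append(href)
    return referrers


# Hypothetical link map: page -> URLs it links to.
SITE = {
    "/": ["/dining", "/visitors"],
    "/dining": ["/dining/item/55"],
    "/visitors": ["/visitors/item/55", "/dining/item/55"],
}

print(find_referrers(SITE, "/", "/visitors/item/55"))
```

    With this sample map, the only page that links to /visitors/item/55 is /visitors, so that is the template you would go and check for the scripting error.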
