#1
  1. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Dec 2007
    Posts
    42
    Rep Power
    12

    Crawl errors and broken links: GWT reports "wrong" URLs


    Hi guys,
    As part of my SEO efforts, I work to reduce the number of crawl errors on my site.
    I go to Webmaster Tools and download the list of links that are returning a "page not found" error.
    What's really confusing me is that, of the total number of crawl errors listed, only a fraction are actually valid broken links (after I cross off the inbound links from external sources).

    I am surprised to see that Google Webmaster Tools is listing Flash parameters (Google shouldn't even be able to read Flash, right?), and at the number of HitBox and JavaScript parameters that are getting listed as effectively broken links.

    (I look up the source page of the crawl error, check the source code, and see that the link string is a parameter of some kind, not an actual link.)

    Now, while I am glad to realize that I am not serving my clients bad paths, I am just wondering whether anyone else is seeing anything similar. (And naturally, is there any way to avoid it?)

    And surely having 1000+ not-founds on a large site is not good, so if only 500 of those are real, that's a big difference.

    Sam
    Last edited by sambkk; Jun 8th, 2010 at 02:49 AM.
  2. #2
  3. No Profile Picture
    Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Aug 2007
    Location
    England, Great Britain, United Kingdom
    Posts
    166
    Rep Power
    15
    Have you checked your server logs to see whether Googlebot was given a 404 at 2 AM, when you "thought" your site was up and running?
    When GWT reports crawl errors, it's because there were errors while crawling, not just because there are currently broken links: they were broken when Googlebot tried to crawl them.
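    A quick way to cross-check this against the logs (a minimal sketch: it assumes the Apache/nginx "combined" log format, and the two sample lines below are fabricated) is to pull out every request where the user agent contains "Googlebot" and the status is 404:

```javascript
// Scan combined-format access log lines for Googlebot requests that got a 404.
const LINE_RE = /^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;

function googlebot404s(lines) {
  const hits = [];
  for (const line of lines) {
    const m = LINE_RE.exec(line);
    // m[1] = requested path, m[2] = status code, m[3] = user agent
    if (m && m[2] === "404" && m[3].includes("Googlebot")) {
      hits.push(m[1]);
    }
  }
  return hits;
}

// Two fabricated example log lines:
const sample = [
  '66.249.66.1 - - [08/Jun/2010:02:13:04 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '10.0.0.5 - - [08/Jun/2010:02:13:05 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
];
console.log(googlebot404s(sample)); // [ '/old-page.html' ]
```

    Running the real access log through something like this gives the list of paths Googlebot actually failed on, which can then be compared against the GWT report.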
  4. #3
  5. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Dec 2007
    Posts
    42
    Rep Power
    12
    Hi,
    Thanks for the reply, KillJoy.
    I understand what you mean, but that's not the problem. The problem is that Google Webmaster Tools is reporting URL strings as 404 errors that are not links at all.
    They are HitBox and JavaScript parameters, even Flash parameters. Obviously all kinds of URLs are getting reported: some are valid 404s, and perhaps some fall under the situation you are referring to, but a good 20-30% at least are not really 404s. (I have traced them down and looked at countless examples where a page is reported to contain a broken link, and the only place on that page that references the supposedly bad URL is a HitBox or similar parameter.)

    On the Google Webmasters forum I found some references suggesting that XHTML pages failing to validate could be the root of the problem. It makes some sense, and I will run some tests, but I only work on the SEO of the site(s); others handle development and maintenance, and getting all the pages to validate is going to be a massive task, politically and "bureaucratically".
    So if that turns out to be the issue, my ability to address it is limited by "the challenges of a large team", but I hope this may then help others who could be in a similar situation.
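    For anyone hitting the same thing: the usual explanation is that when a page does not validate, a parser can fall back to reading inline script content as ordinary markup, so URL-like strings inside it get picked up as link candidates. A sketch of the commonly suggested workaround (the paths here are made-up examples, not from Sam's site) is to wrap inline scripts in CDATA or move them to external files:

```html
<!-- Before: a URL-like string sits in raw inline script, where a lenient
     parser of an invalid page may extract it as a link -->
<script type="text/javascript">
  var hbx = "/some/tracking/path;param=1";
</script>

<!-- After: CDATA keeps the script body out of the markup parse... -->
<script type="text/javascript">
//<![CDATA[
  var hbx = "/some/tracking/path;param=1";
//]]>
</script>

<!-- ...or, better still, move the code to an external file -->
<script type="text/javascript" src="/js/tracking.js"></script>
```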
    Will keep you posted.

    Sam
    Last edited by sambkk; Jun 14th, 2010 at 02:32 AM.
  6. #4
  7. No Profile Picture
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Apr 2010
    Posts
    16
    Rep Power
    0
    Do you use a robots.txt to tell bot where to crawl?
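    (For reference, a minimal robots.txt sketch of how parameter-style URL patterns are typically excluded without touching real pages; the paths are hypothetical examples, and the * and $ wildcards are the pattern support Google documents for Googlebot:)

```
User-agent: *
# Keep crawlers away from tracking-parameter URLs, not the pages themselves
Disallow: /*?hbx=
Disallow: /*.swf$
```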
  8. #5
  9. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Dec 2007
    Posts
    42
    Rep Power
    12
    Hi Weathor,
    The errors are getting listed on pages that I want the crawlers to get to,
    so blocking them from the crawlers is not a good idea.
    It's really odd. Some of the errors getting reported are, for example, within
    Code:
    <script language="javascript"> ..stuff here.. </stuff>
    tags.

    Sam
  10. #6
  11. No Profile Picture
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Apr 2010
    Posts
    16
    Rep Power
    0
    Originally Posted by sambkk
    Hi Weathor,
    The errors are getting listed on pages that I want the crawlers to get to,
    so blocking them from the crawlers is not a good idea.
    It's really odd. Some of the errors getting reported are, for example, within
    Code:
    <script language="javascript"> ..stuff here.. </stuff>
    tags.

    Sam
    Of course, you know that this should be <script language="javascript"> ..stuff here.. </script>

    Maybe there is a document.location.replace in the JavaScript and the URL it builds is broken?
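    (For anyone following along, this is the kind of thing meant: if a redirect target is built from script variables, a crawler that scrapes URL-like strings out of script text can end up requesting garbage. The path and function name here are made-up examples:)

```javascript
// Build the redirect target; if `section` is ever undefined, the resulting
// path is garbage, and a crawler extracting URLs from the script will 404 on it.
function redirectTarget(section) {
  return "/app/" + section + "/home";
}

console.log(redirectTarget(undefined)); // "/app/undefined/home"

// In the page this would be used as:
// document.location.replace(redirectTarget(currentSection));
```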
  12. #7
  13. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Dec 2007
    Posts
    42
    Rep Power
    12
    Hi,

    Yes, it was meant to be a </script> tag...
    Anyway, I managed to get the wheels turning, and the dev guys are going to test fixing some validation errors for me, to see if that reduces the error count.
    Luckily, many of the fixes can be done at the template level, so with any luck the errors will drop overnight.

    S
