#1
Crawler shows 404 for existing pages
Hello,
I use DotNetNuke for my web site, but I have a strange problem. None of the pages created as DNN pages are indexed by Google, although they exist. Interestingly, Live Search (MSN) has them indexed but Yahoo doesn't. The error I see in Google's Webmaster Tools is 404 (Not Found), but when I click on the link the page loads fine. Does anybody have any idea why Google is not indexing them? Here is a link to my sitemap (URL address blocked: See forum rules); the pages that are not indexed include, for example, Search Properties, FAQs and the articles.
Thanks
tw
#2
|
||||
|
||||
|
I sometimes get this same problem with Google: 404 Not Found on 120 links. In my case I believe it's a bandwidth issue.
#3
|
|||
|
|||
|
Quote:
Hmm... I thought the same, or possibly that the pages are marked as spam for whatever reason. My problem is that I don't have a lot of traffic right now, and it seems the application needs some time to "boot" before it starts serving pages quickly. If Google only comes by once in a while and crawls a single page, I won't have much success.
Thanks!
tw
#4
|
||||
|
||||
|
Quote:
Yeah, it seems like Google can overwhelm my server.
#5
|
|||
|
|||
|
If you suspect that Google demands too much from your server, you can always reduce the crawling speed in the Webmaster Tools.
On my sites, Google frequently reports crawl errors for pages that do in fact exist. But with tens of thousands of pages or more and the occasional server instability, I'm not surprised if Google misses a page once in a while. I trust that the bot comes back later and is happy to rediscover the page ;-)
#6
|
||||
|
||||
|
If you think your site has trouble "booting" up, as you call it, after a period of idle time, here is what you should do:
Write or set up a spider that requests each page at an interval just shorter than the cache timeout period. Say you have a Java web app set to the standard cache timeout of 20 minutes: just have your spider hit every page every 19 minutes (I'd stagger the hits so that you hit 1 in 19 pages every minute). That way your whole site is always cached, so when Googlebot drops by, your site is nice and speedy.
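A rough sketch of the kind of spider described above, in Python using only the standard library. The function names (`stagger_schedule`, `due_urls`, `warm`, `run`) and the 19-minute default are made up for illustration, and the URL list is something you would have to supply yourself (for example from your sitemap). Treat it as a starting point, not a finished tool.

```python
import time
import urllib.error
import urllib.request


def stagger_schedule(urls, period_minutes):
    """Give each URL a minute slot in [0, period_minutes) so hits are spread out."""
    return {url: index % period_minutes for index, url in enumerate(urls)}


def due_urls(schedule, minute, period_minutes):
    """Return the URLs whose slot matches the current minute of the cycle."""
    slot_now = minute % period_minutes
    return [url for url, slot in schedule.items() if slot == slot_now]


def warm(url, timeout=30):
    """Request one page; return its HTTP status, or None if nothing came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # timeout, DNS failure, refused connection, ...


def run(urls, period_minutes=19):
    """Loop forever, hitting each URL once per period (assumed < cache timeout)."""
    schedule = stagger_schedule(urls, period_minutes)
    minute = 0
    while True:
        for url in due_urls(schedule, minute, period_minutes):
            print(url, warm(url))
        minute += 1
        time.sleep(60)
```

You would start it with something like `run(["http://www.example.com/", "http://www.example.com/faq"])`, where the example.com addresses are placeholders. With 19 slots, roughly 1 in 19 pages gets requested each minute, and every page is requested once per 19-minute cycle.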
#7
|
|||
|
|||
|
Quote:
When the answer from the web server comes very slowly, you never get a 404 code. The 404 code is sent when the web server has determined that the URL you asked for does not exist. If the pages really exist, I would move to a better host or a better CMS. Jean-Luc
#8
|
||||
|
||||
|
I don't get it, I'm just running a simple LAMP server.
#9
|
||||
|
||||
|
Quote:
I think the problem is with your hosting. If requests to your server time out when Googlebot comes to crawl the site, Googlebot cannot find your pages and reports them as 404 Not Found, so your site is not crawled properly.
#10
|
|||
|
|||
|
A properly configured server will not answer with 404 if it does not have the time to answer a request.
404 means "I had enough time to complete your request, but what you were looking for does not exist here."
Jean-Luc
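To see which case applies to a given page, you can request it yourself and look at the status code rather than at what the browser displays. Below is a minimal Python sketch along those lines. The names `fetch_status` and `classify` are invented for this example, and sending a Googlebot-style User-Agent header only imitates that one header; it is an assumption that the server treats such a request the same way it treats the real crawler. Note that `urlopen` follows redirects, so the status returned is that of the final page.

```python
import urllib.error
import urllib.request

# User-Agent string published for Googlebot; using it here only mimics the header.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, timeout=10, user_agent=GOOGLEBOT_UA):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server did answer, with an error status
    except OSError:
        return None  # timeout or connection failure: no status at all


def classify(status):
    """Turn a status code (or None) into a short diagnosis."""
    if status is None:
        return "no response: timeout or connection error, not a 404"
    if status == 404:
        return "404: the server answered and says the URL does not exist"
    if 200 <= status < 300:
        return "ok"
    if 500 <= status < 600:
        return "server error"
    return "other"
```

For example, `print(classify(fetch_status("http://www.example.com/faq")))`, with the example.com address standing in for one of your own pages. If this prints the 404 line for a page that loads fine in the browser, the application itself is sending the 404 status along with the page content, which is a different problem from a slow server.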