Thread: GoogleBot FAQ

    #1
  1. No Profile Picture
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2003
    Location
    B.C., Canada
    Posts
    37
    Rep Power
    17

    GoogleBot FAQ


    Sorry guys, I was out of town for 2 weeks.
    Here is some basic info for people who want to know more about Googlebot. I got most of the info off the official page below.

    Official page: http://www.google.com/bot.html
    Robotstxt information regarding Googlebot: http://www.robotstxt.org/wc/active/html/googlebot.html

    1. What's the name of the Google spider?
    2. What's the difference between Deepbot and Freshbot?
    3. What's the User Agent for Googlebot?
    4. How do you tell the difference between the deepbot and the freshbot?
    5. When does Google use the different spiders?
    6. How do I see if I have been spidered?
    7. I haven't got access to my log file, how do I see if Googlebot spiders my pages?
    8. I have changed my DNS/IP and Googlebot doesn't come anymore.
    9. How do I get Freshbot to visit my site?
    10. Freshbot has been to my pages, what happens then?
    11. My site was down during some parts of the deep crawl, what happens now?
    12. What is a spider trap and how do I prevent it?
    13. Should I include Googlebot in my traffic reports?
    14. How do I get Googlebot to spider my dynamic (URL) pages?
    15. How do I prevent Googlebot from spidering my site/page/graphics?
    16. I've been deep spidered, what now?
    17. Which IP does Froogle spider from?
    18. Googlebot spiders both my http://domain.com and http://www.domain.com, what should I do?
    19. Does Googlebot crawl AdWords?

    ______________________________________________________________________________________

    1. What's the name of the Google spider?
    Google calls its spider "Googlebot". Whether it's male or female, we don't know.

    2. What's the difference between Deepbot and Freshbot?
    This is very well described in:
    Google Updates and Everflux, the Monthly Mid-Cycle Changes
    It's a really good thread if you are not familiar with the concepts of Deep Crawl and Fresh Crawl.

    3. What's the User Agent for Googlebot?
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    This appears for both Fresh Crawls and Deep Crawls.

    4. How do you tell the difference between the deepbot and the freshbot?
    The deepbot and the freshbot use different IPs.
    The deepbot uses IPs which start with 216.*
    and the freshbot uses IPs which start with 64.*

    5. When does Google use the different spiders?
    The deepbot is sent out after each update; it normally takes a few days before it appears.
    It can continue to spider your site for many days afterwards; for most sites it visits within a 2-7 day period.
    The first thing it will request is the robots.txt file. It may take days before it comes back, because
    Google schedules its spidering and crawling, mostly to spread out
    the heavy load which Googlebot can cause when it requests pages.

    Page Rank vs. number of deep crawl listings.

    6. How do I see if I have been spidered?
    The easiest way is to search for "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" in your logfile.
    Since both deepbot and freshbot use this User Agent, you will need to look at the IP to see if it's the deepbot or the freshbot.
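
    A rough sketch of that logfile check in Python. It assumes the client IP is the first field on each log line (as in the common Apache formats) and uses the 216.* / 64.* rule of thumb from question 4. The function name and the sample log lines are made up for illustration.

```python
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

def classify_googlebot_hits(log_lines):
    """Return (ip, crawler) pairs for every line that contains the Googlebot User Agent."""
    hits = []
    for line in log_lines:
        if GOOGLEBOT_UA not in line:
            continue
        ip = line.split()[0]  # assumption: the client IP is the first field
        if ip.startswith("216."):
            crawler = "deepbot"
        elif ip.startswith("64."):
            crawler = "freshbot"
        else:
            crawler = "unknown"
        hits.append((ip, crawler))
    return hits

# Hypothetical log lines; the IPs are placeholders, not real Googlebot addresses.
sample_log = [
    '216.0.0.1 - - [01/Mar/2003:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "' + GOOGLEBOT_UA + '"',
    '64.0.0.1 - - [02/Mar/2003:11:00:00 +0000] "GET /a.html HTTP/1.0" 200 512 "-" "' + GOOGLEBOT_UA + '"',
    '10.0.0.5 - - [02/Mar/2003:11:05:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]

for ip, crawler in classify_googlebot_hits(sample_log):
    print(ip, crawler)
```

    In practice you would pass an open logfile instead of the sample list, since iterating over a file yields one line at a time.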

    7. I haven't got access to my log file, how do I see if Googlebot spiders my pages?
    Use a server side include (e.g. Apache XSSI, PHP or ASP) to embed or call a script that checks for
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" as the USER_AGENT,
    or "crawl*.googlebot.com" or "crawler*.googlebot.com" as the HOST. You cannot use an image or JavaScript based tracker because Googlebot won't trigger it.
    Also take a look at the different threads in the "Google News" forum; it's often mentioned there when Google starts deep spidering.
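
    The check such a script performs could look roughly like this in Python. This is only a sketch: how you obtain the User Agent and the remote host depends on your server setup, and the function name and example hostnames are hypothetical.

```python
from fnmatch import fnmatchcase

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
# Host patterns quoted in the FAQ answer above.
HOST_PATTERNS = ("crawl*.googlebot.com", "crawler*.googlebot.com")

def is_googlebot(user_agent, remote_host=""):
    """Return True if the User Agent or the remote host name looks like Googlebot."""
    if user_agent == GOOGLEBOT_UA:
        return True
    host = remote_host.lower()
    return any(fnmatchcase(host, pattern) for pattern in HOST_PATTERNS)

# Hypothetical values, for illustration only.
print(is_googlebot(GOOGLEBOT_UA))
print(is_googlebot("Mozilla/4.0", "crawl1.googlebot.com"))
print(is_googlebot("Mozilla/4.0", "dialup.example.com"))
```

    A real script would then write the hit (time, page, IP) to a file or database you do have access to.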

    8. I have changed my DNS/IP and Googlebot doesn't come anymore.
    Recent thread about this: google not liking new IPs?
    Also from the Google Knowledge base:
    Googlebot How long does Google cache IP's?
    From it:

    Personally I prefer to keep a site on both IPs for a month. This
    required a helpful Web host if you don't run your own equipment. (-ciml)



    I think this is a very good tip.

    9. How do I get Freshbot to visit my site?
    This is also covered in the excellent
    Google Updates and Everflux, the Monthly Mid-Cycle Changes
    The easiest way is to have good inbound links, and it helps if they come from pages with a higher PageRank.
    Changing the content on your site also helps. I would say these are
    the 2 most important factors for getting the freshbot to visit your site.

    10. Freshbot has been to my pages, what happens then?
    Freshbot visits for a number of different reasons.
    The best way to get pages spidered by the Freshbot is described in

    Google Updates and Everflux, the Monthly Mid-Cycle Changes
    Another tip was suggested by GoogleGuy:
    Are you using If Modified Since?
    If your site is completely new and gets spidered by the Freshbot before it's indexed in the monthly update,
    it may fall out of the index. From my experience, if you haven't changed anything on the pages
    it normally drops out after around 5-7 days.
    This is not a "general" rule though; it depends on a lot of other factors too.
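
    For anyone wondering what "If Modified Since" means in practice: a client can send an If-Modified-Since header with the date of its last copy, and the server can answer 304 (Not Modified) instead of sending the whole page again. Below is a minimal sketch of that decision in Python. The function name is made up, and this only shows the general HTTP mechanism, not anything specific to how Googlebot uses it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the page has not changed since the header date, otherwise 200."""
    if not if_modified_since:
        return 200  # no header sent: serve the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

# Hypothetical page last changed on 1 March 2003 at 12:00 UTC.
page_changed = datetime(2003, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status("Sat, 01 Mar 2003 12:00:00 GMT", page_changed))
print(conditional_status("Fri, 28 Feb 2003 12:00:00 GMT", page_changed))
```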

    11. My site was down during some parts of the deep crawl, what happens now?
    GoogleGuy reported last year that they increased support for re-spidering sites which
    had problems during the "normal" deep spidering. Vitaplease made a very interesting comment in:
    Google page dropping question

    I think basically I'm asking... does Google only drop a page if it can't reach it?



    I asked a Google rep what would happen if a site was down for a day during the deep crawl. Would Googlebot come back?
    He asked what Pagerank? I said 6, he said no problem, the site should be revisited.
    (that was last October during a pubconference)




    12. What is a spider trap and how do I prevent it?
    A spider trap is when a spider re-spiders the same page over
    and over again; you can compare it to a maze (labyrinth).
    The biggest problem with spider traps is often the amount of bandwidth and server load they put on the site which is being spidered.
    It also creates a problem for Googlebot, which spiders the same page over
    and over again even though the content is almost always the same.
    The best known spider trap is Session IDs. A Session ID is often used to keep track
    of visitors, and some sites put a unique ID in the URL.
    Each user gets a unique ID and it's often requested from each page.
    The problem here is that when Googlebot comes to the page, it spiders the page and
    then leaves. It comes back to another page and finds a link to the same page, but since it has been given
    a different session ID now, the link shows up as another URL. This is one of the reasons
    why Googlebot is very, very careful when it spiders pages which use the query string "ID=".
    I've seen and heard of many cases where the same page has been spidered over
    1000 times, and sometimes it's been indexed as many times as it's been spidered. Most
    search engines have very advanced duplicate filters which remove the duplicates and select one URL.
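
    To make the Session ID problem concrete, here is a small Python sketch that strips an "ID=" parameter from URLs. Two links to the same page with different session IDs look like two URLs until the ID is removed. The parameter name, function name and example URLs are assumptions for illustration; this is not how Google's duplicate filter actually works.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the session ID travels in a query parameter named "ID" (any case).
SESSION_PARAMS = {"id"}

def strip_session_id(url):
    """Return the URL with session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# The same page as seen on two visits, each with a fresh (made-up) session ID.
visits = [
    "http://example.com/page.php?ID=abc123&cat=2",
    "http://example.com/page.php?ID=xyz789&cat=2",
]
print(len(set(visits)))                              # distinct URLs as crawled
print(len({strip_session_id(u) for u in visits}))    # distinct URLs once the ID is gone
```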

    13. Should I include Googlebot in my traffic reports?
    The general suggestion is NO; Googlebot is not a real human being visiting your site.
    One quote which I often use is "Don't build pages for search engines, build them for users. It's
    the user who will buy things off your website, not search engines, and search engines want to
    generate the best results for the user, and therefore try to think as a user when they rank their results."

    14. How do I get Googlebot to spider my dynamic (URL) pages?
    First thing is, do you need to have a dynamic URL?
    There are very many things you can do to get a dynamic site spidered, and the support for spidering
    dynamic URLs seems to get better each day.
    Always try to avoid using Session IDs in the URL; this is the ultimate killer when
    it comes to getting Googlebot to spider your dynamic URLs.
    Also try to avoid using the query string "ID=". Since this is the most commonly
    used query string when it comes to presenting Session IDs, Googlebot seems to
    raise a "flag" each time it sees it in the URL. This question has been covered quite a
    few times, but the answer seems to change over time:
    If I Use PHP Will Google Still Like Me?
    Googlebot & Dynamic Pages
    Does Google index dynamic content?

    PageRank seems to play a major role when it comes to dynamic URLs and spidering: the more
    PageRank, the greater the chance that the URLs will be spidered, and the deeper Googlebot will go.

    15. How do I prevent Googlebot from spidering my site/page/graphics?
    Googlebot obeys the robots.txt standard. To prevent Googlebot from spidering your site, you
    can put this in your robots.txt file (it should be placed in the HTTP root directory):

    User-agent: Googlebot
    Disallow: /

    To prevent Google from indexing your images, use:

    User-agent: Googlebot-Image
    Disallow: /

    It's also described officially at Google.com: No Index tags
    Also try using the Search Engine World Robots.txt Validator
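
    If you want to test rules like these yourself, Python's standard library has a robots.txt parser. A small sketch, assuming the two rule sets above are checked one at a time; the helper name and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""

BLOCK_IMAGES = """\
User-agent: Googlebot-Image
Disallow: /
"""

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed(BLOCK_GOOGLEBOT, "Googlebot", "http://example.com/page.html"))
print(allowed(BLOCK_GOOGLEBOT, "SomeOtherBot", "http://example.com/page.html"))
print(allowed(BLOCK_IMAGES, "Googlebot-Image", "http://example.com/logo.gif"))
print(allowed(BLOCK_IMAGES, "Googlebot", "http://example.com/page.html"))
```

    Keep in mind this only tells you how Python's parser reads the file; a real crawler may interpret edge cases differently.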

    16. I've been deep spidered, what now?
    If you have been deep spidered, chances are that you will appear in the
    upcoming "Google Update". This happens about once a month:
    Google Update Chart.
    Be advised though that your site may not appear in the next update for a lot of reasons, most
    of which only Google knows and will probably not tell you.
    For the PageRank to be correctly calculated from the incoming links, I advise waiting up to 2 updates.
    Also, not all inbound links may show up with the link: command
    (the rule seems to be that only pages which have a PageRank of 4 and above will show up).

    17. Which IP does Froogle spider from?
    From what I know, Froogle uses the 64.* IPs and the same User Agent.

    18. Googlebot spiders both my http://domain.com and http://www.domain.com, what should I do?
    The best thing to do is to pick one of the URLs and use a 301 redirect (permanent redirect) from the other.
    If you don't know what it is or how to use it, try this:

    Google search for: 301 redirect
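
    The logic behind such a redirect, sketched in Python. It assumes you picked www.domain.com as the one URL to keep; the function name is made up, and on a real site you would normally do this in the web server configuration instead of in a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: www.domain.com is the host you decided to keep.
CANONICAL_HOST = "www.domain.com"

def canonical_redirect(url):
    """Return (301, new_location) if url is on another host, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.netloc.lower() == CANONICAL_HOST:
        return None
    # Keep the path and query string, swap only the host.
    return 301, urlunsplit(parts._replace(netloc=CANONICAL_HOST))

print(canonical_redirect("http://domain.com/page.html?x=1"))
print(canonical_redirect("http://www.domain.com/page.html?x=1"))
```

    The 301 status tells the spider the move is permanent, so it should end up requesting only the www version.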

    19. Does Googlebot crawl AdWords?
    From: Google crawls URLs in adwords?
    I was talking with Google and they do not crawl the adwords links.


    So buying AdWords listings doesn't seem to help you get spidered.

    Thanks
    Darkroom (SEO)
  2. #2
  3. No Profile Picture
    Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Jan 2003
    Posts
    475
    Rep Power
    17
    Wow, a nice LONG post. It is very useful though.

    -Josh
  4. #3
  5. Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Dec 2002
    Posts
    77
    Rep Power
    17
    Outstanding FAQ - Well Done !!!
  6. #4
  7. No Profile Picture
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Mar 2003
    Posts
    6
    Rep Power
    0
    It looks LIKE http://www.webmasterworld.com/forum3/9213.htm
  8. #5
  9. Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jan 2003
    Posts
    80
    Rep Power
    17
    nice post!
  10. #6
  11. Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2003
    Posts
    14
    Rep Power
    0

    Re: GoogleBot FAQ


    Originally posted by "darkroom"


    4. How do you tell the difference between the deepbot and the freshbot.
    The deepbot and the freshbot uses different IPs.
    The Deepbot uses IPs which run from 216.*
    and the Freshbot uses IPs which start with 64.*
    And what about the difference between Crawler and Crawl?

    Look at this IP range:
    http://www.maxhoo.com/crawl.shtm

    Aren't the Crawl hosts always "Deep Crawlers"? Yes or no?

    kendos
  12. #7
  13. Moderator
    SEO Chat Good Citizen (1000 - 1499 posts)

    Join Date
    Jan 2003
    Location
    Madrid, Spain
    Posts
    1,382
    Rep Power
    18
    Originally posted by "Darrin Ward"

    Originally Posted by WebRankInfo
    It looks LIKE http://www.webmasterworld.com/forum3/9213.htm
    Hummmm, I'll leave it up to one of the mods to decide what to do in this case. It's an informative piece of info though.
    When I originally saw this post I PM'd darkroom to ask him to at least credit the source of the original post. Since this is now clear from the thread I think we can leave it, since it IS good info. Although I don't like the fact that he posts as if it were his own info.
    i got most of the info off the official page below....
    Yeah, right.

    Gringo.
