#1
  1. Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2004
    Posts
    10
    Rep Power
    0

    A couple questions for the experts: Robots.txt and URL rewriting


    I have recently been advised to include a robots.txt file in one of my Web sites. As I understand it, the robots.txt file tells the Google spider/bot not to "spider" certain pages. My question is: what pages should I include in the robots.txt?

    Also, regarding URL rewriting: what is the best and most acceptable way of doing this? I was told that some techniques are better than others. I know of two ways to rewrite: .htaccess and mod_rewrite. Is one preferred? Does it matter? Also, when rewriting www.mysite.com/index.php?id=324, should it be www.mysite.com/index/324 or www.mysite.com/company/?

    Thanks in advance!

    -Ironchef

    On another subject: does anyone know about Reverse DNS? What is it? And how would I accomplish it?
  2. #2
  3. Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Sep 2004
    Posts
    57
    Rep Power
    15
    My question is what pages should I include in the robots.txt?
    It depends on what you want.

    If there are pages that you don't want to show in search results (private pages, poor content pages, etc.), add their paths to the robots.txt file.

    If you want spiders to crawl all pages, just create a robots.txt that disallows nothing:

    Code:
       User-agent: *
       Disallow:
    Check the file with a robots.txt checker to ensure its format is valid.
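If no online checker is handy, a rough local check is possible with Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper name `can_crawl`, the `/private/` path and the sample URLs are assumptions made up for the example, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The "disallow nothing" file from the post above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Hypothetical variant that keeps a private directory out of the crawl.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    page = "http://www.mysite.com/private/notes.html"
    print(can_crawl(ALLOW_ALL, page))      # empty Disallow blocks nothing
    print(can_crawl(BLOCK_PRIVATE, page))  # path falls under /private/
```

An empty `Disallow:` line permits everything, while `Disallow: /private/` blocks any path that starts with that prefix.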


    Also, regarding URL rewriting: what is the best and most acceptable way of doing this?
    I do it with .htaccess and mod_rewrite. It's a good method.
  4. #3
  5. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Jul 2004
    Location
    Ritzville, Washington State, USA
    Posts
    162
    Rep Power
    15

    Cases vary.


    Originally Posted by ironchef
    I have recently been advised to include a robots.txt file in one of my Web sites. As I understand it, the robots.txt file tells the Google spider/bot not to "spider" certain pages. My question is: what pages should I include in the robots.txt?
    A robots.txt file can block particular directories or even particular files; but it can also block by robot. The commonest use of robots.txt is to keep certain "evil" robots out, such as known email harvesters (though, in truth, evil bots often ignore the protocol anyway). Another common use is to block a particular "referrer" file (typically a php script) so as to "hide" certain outbound links from the search engines (one links to the referrer with the URL as a passed parameter in the link call, but the SE bots cannot follow past the referrer, which simply passes on the link to its intended target).
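As a rough illustration of both uses, here is a sketch that feeds a sample robots.txt to Python's standard `urllib.robotparser`. The robot name `EmailHarvester`, the referrer script `/out.php` and the URLs are hypothetical, chosen only for the example; and, as noted, a badly behaved bot can simply ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named robot is shut out entirely, and every
# other robot is kept away from a redirect ("referrer") script.
ROBOTS_TXT = """\
User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow: /out.php
"""


def allowed(agent, url):
    """Return True if a robot honoring ROBOTS_TXT may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.nowhere.com"
    print(allowed("EmailHarvester", site + "/index.html"))
    print(allowed("Googlebot", site + "/out.php?url=http://example.com/"))
    print(allowed("Googlebot", site + "/324.html"))
```

The first group applies only to the named robot; the `*` group covers every robot that has no group of its own.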

    Also, regarding URL rewriting: what is the best and most acceptable way of doing this? I was told that some techniques are better than others. I know of two ways to rewrite: .htaccess and mod_rewrite. Is one preferred? Does it matter?
    Whoa, Nelly. The mod_rewrite module, part of the Apache server-software package, is normally accessed through entries in a local .htaccess file; those are not two distinct things. (One can use mod_rewrite or mod_alias, depending on the complexity of the needed redirecting.)

    There is no "should be": you make it what seems best. Perhaps the simplest form would be:
    www.nowhere.com/324.html
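To show the mapping a rewrite rule would perform for a URL of that form, here is a small sketch in Python rather than Apache configuration. It is not mod_rewrite itself: the function name `rewrite` and the target script `/index.php?id=...` (taken from the original question) are assumptions for illustration, and the real rule would live in the server's own configuration syntax.

```python
import re

# Public form: /<digits>.html, e.g. /324.html
PRETTY_URL = re.compile(r"/(\d+)\.html")


def rewrite(path):
    """Map a public path such as /324.html to the internal script URL.

    Returns None when the path does not match the pattern, in which case
    the request would be served as-is.
    """
    match = PRETTY_URL.fullmatch(path)
    if match is None:
        return None
    return "/index.php?id=" + match.group(1)


if __name__ == "__main__":
    print(rewrite("/324.html"))    # /index.php?id=324
    print(rewrite("/about.html"))  # None
```

The visitor and the search engine see only the short address; the server quietly hands the captured number to the script.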
    --
    Cordially,
    Eric Walker
    OmniKnow, the online encyclopedia, and
    SEO Tools, Toys, and Packages
