#1

    Regarding Robots.txt


    Greetings!

    Could anyone please tell me the reason for using a robots.txt file? If we want to hide certain pages from bots, then why keep such pages on the site in the first place? Wouldn't it be better to get rid of those pages?

    Also, what kinds of pages do we put in the robots.txt file?


    Thanks
    Richa Kaushal
#2
    Unlike what you said, robots.txt will not "hide" pages from the bots that crawl your site. It is a public file, and anyone can see what you're placing there. By listing pages in your robots.txt, you are telling robots which pages on your site they may or may not crawl. Keep in mind that these rules are a request, not an enforcement mechanism: not all robots will listen, but friendly robots such as those of the major search engines will usually follow your directions. You can list files or directories in robots.txt for any number of reasons; it is mostly up to you, and they are generally pages you don't want search engines to crawl. (Strictly speaking, robots.txt controls crawling rather than indexing, so a blocked URL can still appear in search results if other pages link to it.) Some common types of pages that are put into robots.txt are:

    Login and Shopping Cart pages - necessary pages to have on your site, but they can also produce undesirable URLs through session IDs
    Duplicate content from URLs created by your CMS - URLs generated by parameters (often from search functions), and the tag pages that blogs and WordPress sites often have, which create duplicate content
    Registration Pages
    Admin Files
    cgi-bin files
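To make this concrete, a robots.txt covering the kinds of pages listed above might look like the file embedded in the Python sketch below. The domain example.com and the paths (/login/, /cart/, /register/, /admin/, /cgi-bin/, /tag/) are hypothetical; your own site's paths will differ. The sketch uses Python's standard urllib.robotparser to check URLs the way a well-behaved crawler would before fetching them. The check lives in the crawler's own code, so a bot that never runs it is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
Disallow: /cart/
Disallow: /register/
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tag/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """A well-behaved crawler asks before fetching; a rude one simply skips this check."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in [
        "https://example.com/products/blue-widget",
        "https://example.com/cart/checkout",
        "https://example.com/tag/widgets",
        "https://example.com/cgi-bin/form.cgi",
    ]:
        print(url, "->", "crawl" if polite_fetch_allowed(rules, url) else "skip")
```

Run as a script, it prints "crawl" for the product page and "skip" for the cart, tag and cgi-bin URLs. The bot name ExampleBot is made up; since the file only has a "User-agent: *" group, any bot name gets the same answer.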

    The list can go on and on and can change with the needs of your site, so without seeing the site it's hard to say what you should or should not place in the file. Think of robots.txt as a broadsword rather than a precision knife: a small error could block your whole site or very important parts of it. This means it may not always be the best solution to your problems.
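The "broadsword" point is easy to demonstrate. In this sketch (again using Python's urllib.robotparser, with hypothetical example.com URLs and a made-up /p/ folder), one character separates a file that blocks nothing from one that blocks everything, and because rules match by path prefix, a rule missing its trailing slash can catch far more than intended.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


HOME = "https://example.com/"
PRODUCT = "https://example.com/products/blue-widget"

# An empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# A single "/" blocks the entire site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Rules match by prefix, so "/p" (meant for a hypothetical /p/ folder)
# also blocks everything under /products/.
TOO_BROAD = "User-agent: *\nDisallow: /p\n"

for name, rules in [("ALLOW_ALL", ALLOW_ALL), ("BLOCK_ALL", BLOCK_ALL), ("TOO_BROAD", TOO_BROAD)]:
    # Columns: may the home page be crawled? may the product page be crawled?
    print(name, allowed(rules, HOME), allowed(rules, PRODUCT))
```

It prints True True for ALLOW_ALL, False False for BLOCK_ALL, and True False for TOO_BROAD, where the product page was blocked by accident.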

    Check out The Web Robots Pages for more information about robots.txt

    Hope that helps some.
#3
    Originally Posted by kevin.w
    (quote of post #2)
    Thanks, Kevin. I got a few gold nuggets from it; thank you for sharing that information.
    I would really appreciate it if you could tell me what admin files and cgi-bin files typically contain.
#4
    I'm glad to hear that you got some useful information out of that! I'd just like to reaffirm that the pages and folders I discussed above were only examples of the kinds of pages and folders commonly disallowed through robots.txt; what should be disallowed is different for every site. As for the cgi-bin folder: historically it held CGI scripts, small programs the web server runs to handle things like form submissions. (Please note that this is a very simple explanation; I do not claim to be a programmer with extensive knowledge of these.) It is commonly blocked for security reasons. This may not apply to your site, since PHP and ASP have become more standard for server-side scripting. Admin folders can contain login and administration pages, as on a WordPress site (/wp-admin/ is the admin folder for WordPress sites); these are also commonly blocked for security reasons. Like I said, these may not be an issue for your site, as each site is different. A good way to see what could be showing up in Google for those folders is to do a site: search.

    For example,
    site:yoursitehere.com admin
    site:yoursitehere.com cgi-bin
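On the security side, keep in mind that robots.txt is public, so disallowing /wp-admin/ or /cgi-bin/ also tells anyone who reads the file where those folders are; it asks crawlers to stay away but does not protect anything. The short Python sketch below uses a hypothetical robots.txt for a WordPress site and a deliberately simplified parser (not a full implementation of the robots.txt format) to show how easily the disallowed paths can be read back out. Actual protection for admin areas has to come from logins and server permissions.

```python
# Hypothetical robots.txt text; in practice anyone can download a site's
# file from /robots.txt at the root of the domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value (simplified: ignores user-agent groups)."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        field, sep, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow blocks nothing, so skip it.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(ROBOTS_TXT))
```

Here the printed list is ['/wp-admin/', '/cgi-bin/'], which is exactly the information the site owner may have hoped to keep quiet.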

    I hope that helps answer your question and clarify things a little more.
