#1
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Oct 2018
    Posts
    16
    Rep Power
    0

    Robots.txt file index homepage only


    Hello - We are close to launching a new site. Right now I only want the homepage indexed and the rest of the pages ignored. Can you please confirm that the robots.txt file below is correct?

    user-agent: *
    Allow: /$
    Disallow: /
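
For anyone wanting to sanity-check rules like these, here is a minimal sketch of how they would be evaluated under Google-style matching, where $ anchors the end of the URL path and the longest matching pattern wins (Allow winning ties). The helper names rule_to_regex and is_allowed are made up for illustration, and this is a simplification, not any crawler's actual implementation. Python's standard urllib.robotparser is avoided here because, as far as I can tell, it does plain prefix matching and does not treat $ specially.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$') into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs for one user-agent group.

    The longest matching pattern wins; on equal length, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

# The two rules from the question.
RULES = [("Allow", "/$"), ("Disallow", "/")]

for sample in ["/", "/products", "/index.html", "/?ref=1"]:
    print(sample, "->", "allowed" if is_allowed(RULES, sample) else "blocked")
```

Under these assumptions only the bare / is crawlable. Note that /index.html and /?ref=1 would be blocked as well, since neither ends right after the slash.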
  2. #2
    Moderator
    SEO Chat Scholar (3000 - 3499 posts)

    Join Date
    Sep 2016
    Location
    USA
    Posts
    3,046
    Rep Power
    3634
    Using robots.txt to block Google from your inner pages is not the correct way to keep them out of the index. Why? Because robots.txt only stops crawling: if a link to an inner page is ever seen elsewhere on the web, Google can still index that URL.

    The most effective way to prevent all inner pages from being indexed is to use HTTP response headers.

    Send the header X-Robots-Tag: noindex, nofollow with any page and Google will neither index it nor follow its links. On Apache this is done with Header set X-Robots-Tag "noindex, nofollow" in the .htaccess file (Linux servers); on IIS (Microsoft servers) the equivalent goes in web.config.

    Then your robots.txt file becomes simple

    Code:
    User-agent: * 
    Disallow:
    sitemap: https://yourdomain.com/sitemap.xml
    Then when you're ready to have Google index your site, just remove the X-Robots-Tag headers.
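
To confirm that a page really is sending the header, something along these lines could be used. It is only a sketch: the function names parse_x_robots_tag and blocks_indexing are hypothetical, the headers are passed in as a plain dict, and bot-specific forms such as "googlebot: noindex" are deliberately not handled.

```python
def parse_x_robots_tag(value):
    """Split an X-Robots-Tag header value into a set of lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}

def blocks_indexing(headers):
    """Return True if the response headers carry noindex (or the equivalent 'none')."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-robots-tag":
            directives = parse_x_robots_tag(value)
            if "noindex" in directives or "none" in directives:
                return True
    return False

# Example header sets, written by hand for illustration.
inner_page = {"Content-Type": "text/html", "X-Robots-Tag": "noindex, nofollow"}
home_page = {"Content-Type": "text/html"}

print(blocks_indexing(inner_page))
print(blocks_indexing(home_page))
```

The header dict could come from any HTTP client; the check itself does not depend on how the response was fetched.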
  4. #3
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Nov 2014
    Location
    Edison, New Jersey
    Posts
    42
    Rep Power
    52
    No, robots.txt does not have to limit indexing to the homepage. You can reference a sitemap.xml in robots.txt, so that all your pages can be found and indexed through it.
  6. #4
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2018
    Posts
    32
    Rep Power
    18
    Playing around with robots.txt is no joke - I would avoid that if possible. Another option would be to consider using noindex tags but that too is dangerous. Perhaps the answer is not to launch the site until you're ready.
  8. #5
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Oct 2018
    Posts
    16
    Rep Power
    0
    Thanks everyone for the feedback. I think that I will go with what KnowOneSpecial recommended.

  10. #6
  11. SEO Since 97
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    8,764
    Rep Power
    5665
    Originally Posted by vinukum
    The robots.txt file, also known as the robots exclusion protocol or standard, is a text file that tells web robots which pages on your site to crawl. It also tells web robots which pages not to crawl.
    The question isn't asking for the definition, so I deleted your post.
  12. #7
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Oct 2018
    Posts
    16
    Rep Power
    0
    ppres74 - I wish I could avoid this. We have a wholesale site and an eCommerce site. I want the wholesale site to show up for certain keywords, but only its homepage, because the other pages, if indexed, will compete with our eCommerce site: they are the same apart from product pricing.

    KnowOneSpecial - Just so that I understand, I would need to add Header set X-Robots-Tag "noindex, nofollow" to all the pages except the homepage, correct? Where on the page would this best be added? Then for robots.txt I just need to submit this, as you stated above?

    Code:
    User-agent: *
    Disallow:
    Sitemap: https://pawholesale.com/sitemap.xml

    We will never want to have Google index the inside pages of the site.

    Thanks.
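
One way to double-check that a robots.txt like the one in this post blocks nothing (which is what lets crawlers fetch the inner pages and see the noindex header) is Python's standard urllib.robotparser. This is just a sketch; the path /some-inner-page is a made-up example, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the post, parsed locally (no network access).
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://pawholesale.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow means nothing is blocked for any user agent.
home_ok = parser.can_fetch("Googlebot", "https://pawholesale.com/")
inner_ok = parser.can_fetch("Googlebot", "https://pawholesale.com/some-inner-page")

# Sitemap lines are collected separately from the allow/disallow rules.
sitemaps = parser.site_maps()

print(home_ok, inner_ok, sitemaps)
```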
    Last edited by KnowOneSpecial; Feb 4th, 2019 at 01:01 PM. Reason: Removed active link
  14. #8
    Moderator
    SEO Chat Scholar (3000 - 3499 posts)

    Join Date
    Sep 2016
    Location
    USA
    Posts
    3,046
    Rep Power
    3634
    If you are on a Linux host, two lines of code will do it.

    Try the first line below; I think I got the pattern correct. If not, use the second line, in which case you would have to include a SetEnvIf for each folder you have in the URL.


    # Set the environment variable ROBOTS_NOINDEX for every request
    # that is not for the home page. Request_URI holds the URL path
    # (not the host name), and SetEnvIf has no "!" negation or [NC]
    # flag, so match any path with something after the leading "/".
    SetEnvIf Request_URI "^/.+" ROBOTS_NOINDEX

    # use the next line if you have issues with the above
    #SetEnvIf Request_URI ^/your/example/url/ ROBOTS_NOINDEX

    # Send custom header if environment variable is set
    Header set X-Robots-Tag "noindex, nofollow" env=ROBOTS_NOINDEX
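
SetEnvIf compares Request_URI against the URL path rather than the host name, so a candidate pattern can be tried against sample paths before it goes live. The sketch below assumes the pattern ^/.+ as one way to express "anything except the home page" and uses Python's re module as a stand-in for Apache's regex engine; the sample paths and the name gets_noindex are hypothetical.

```python
import re

# Candidate pattern: a slash followed by at least one more character.
NOT_HOME = re.compile(r"^/.+")

def gets_noindex(path):
    """Return True if a request path would have ROBOTS_NOINDEX set by this pattern."""
    return bool(NOT_HOME.search(path))

for sample in ["/", "/products/widget", "/about", "/index.html"]:
    print(sample, "-> noindex" if gets_noindex(sample) else "-> indexable")
```

One thing this shows: if the home page is also reachable as /index.html, that URL would get the noindex header too, so it may need its own exception.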
    Last edited by KnowOneSpecial; Feb 4th, 2019 at 01:41 PM.
  16. #9
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Oct 2018
    Posts
    16
    Rep Power
    0
    Thanks - We are not on a Linux host. We use an Amazon AWS server. Is the code still the same for that? Thanks.
  18. #10
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Oct 2018
    Posts
    16
    Rep Power
    0
    Sorry, I got that wrong. We are on an Apache web server owned by NetSuite. Is the code the same for that?
  20. #11
    Moderator
    SEO Chat Scholar (3000 - 3499 posts)

    Join Date
    Sep 2016
    Location
    USA
    Posts
    3,046
    Rep Power
    3634
    Yes, the code example I provided is for Linux servers, so you should be able to just copy and paste it.
