Search Engine Optimization
 
SEO Chat Forums > Search Engine Strategies > Search Engine Optimization

#1 mphuneko, November 15th, 2004, 02:09 PM
robots.txt help

If I want to exclude ONLY my cgi-bin from being spidered, would I just create a file called robots.txt with this in it:

User-agent: *
Disallow: /cgi-bin/

Is this correct? Would this still allow every other page to be spidered freely?

#2 Wit, November 15th, 2004, 02:22 PM
Yup, you are absolutely correct; there's no need to do any more than that. (Just put your robots.txt file in the root of your domain.)

P.S.: Keep in mind that this WON'T deter any of the "bad" bots or email harvesters. robots.txt is only a request that well-behaved crawlers choose to honor.
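One way to sanity-check a rule like this before uploading it is Python's standard-library `urllib.robotparser`, which applies the same prefix matching that well-behaved crawlers use for Disallow lines. This is only a sketch: `www.example.com` and the sample paths are placeholders, and real crawlers (Googlebot included) use their own parsers, so treat the result as an approximation of how they behave.

```python
from urllib.robotparser import RobotFileParser

# The exact file proposed in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

# Placeholder site and paths; only the path part is compared against the rules.
for path in ("/", "/products/widgets.html", "/cgi-bin/formmail.pl"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Only the URL under /cgi-bin/ comes back as blocked; every other path is allowed, because a Disallow line only matches URLs whose path starts with the given prefix.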

#3 4Comparison, November 15th, 2004, 03:03 PM
Another robots question.

Looks like a good place to drop this question.

Do the spiders honor DISALLOW statements and completely ignore everything in a disallowed directory?

For example, if you are running an affiliate site and want to prevent any PR leakage, can the outbound-link pages be put in a disallowed directory so that Googlebot will not even access them? Or will it still read those pages for PR purposes, just not for the SERPs?

Would noindex or nofollow achieve the same result?

#4 Wit, November 15th, 2004, 05:38 PM
Googlebot is very civilised. It will obey all the rules you describe, including a <meta ... noindex,nofollow> tag used instead of a robots.txt file, provided you don't make mistakes or give it ambiguous instructions, e.g. a robots.txt with both a User-agent: * section and a User-agent: googlebot section in it. Googlebot might interpret that in a way you wouldn't expect.
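A likely source of the surprise mentioned here: under the robots.txt convention Google follows, a crawler that finds a group naming it specifically obeys only that group and ignores the `User-agent: *` rules entirely. Python's `urllib.robotparser` behaves the same way, so it can illustrate the effect. The file, paths, and the second bot name ("Slurp") are a made-up example, not anyone's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with a generic group AND a googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Slurp"):
    for path in ("/cgi-bin/form.pl", "/private/notes.html"):
        allowed = parser.can_fetch(agent, "http://www.example.com" + path)
        # Googlebot matches its own group, so the "*" rules never apply to it.
        print(f"{agent:10} {path:22} {'allowed' if allowed else 'blocked'}")
```

Googlebot ends up allowed into /cgi-bin/ while every other bot is kept out of it. If Googlebot should also stay out of /cgi-bin/, that Disallow line has to be repeated inside the googlebot group.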

#5 4Comparison, November 16th, 2004, 12:31 PM
Thanks for the response.
The reason for asking was that my logs showed spiders reaching pages through links on pages meta-tagged as noindex, nofollow.
I have since moved the outbound-link pages into a disallowed directory, and the hits have stopped.
From this experience, they may not index a noindex page, but they may still consider its content and outbound links for PR purposes.
And they appear not to even load a page in a disallowed directory.
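The difference described here comes down to ordering: a crawler checks robots.txt before it requests a URL, but it can only see a `<meta name="robots">` tag after it has downloaded the page. The sketch below models a hypothetical polite crawler with Python's standard library; `decide`, `ExampleBot`, the `/links/` directory and the sample pages are all invented for illustration, and the sketch says nothing about how any search engine actually computes PR.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for item in (attrs.get("content") or "").split(","):
                self.directives.add(item.strip().lower())


def decide(robots, url, fetch):
    """Hypothetical polite-crawler policy: robots.txt first, meta tags second."""
    if not robots.can_fetch("ExampleBot", url):
        return "skipped"  # disallowed: the page is never requested at all
    meta = MetaRobots()
    meta.feed(fetch(url))  # the page must be downloaded before its meta tags can be read
    index = "noindex" not in meta.directives
    follow = "nofollow" not in meta.directives
    return f"fetched (index={index}, follow={follow})"


robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /links/"])

# Invented pages standing in for real HTTP responses.
PAGES = {
    "http://www.example.com/links/partners.html": "<p>outbound links</p>",
    "http://www.example.com/deals.html":
        '<head><meta name="robots" content="noindex,nofollow"></head>',
}

for url in PAGES:
    print(url, "->", decide(robots, url, PAGES.get))
```

The page in the disallowed directory is never fetched, while the noindex,nofollow page still has to be downloaded before the crawler can learn that it should neither index it nor follow its links.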

#6 mhdeaton, November 25th, 2004, 02:38 AM
What happens if you don't place this robots.txt file at all? I noticed a reference to robots.txt in my traffic analyzer, listed as a referrer, like what I would see when someone found me through a keyword on Yahoo. What does that mean?

#7 tstolber, November 25th, 2004, 05:09 AM
Hey

I am pretty sure that if you have a Disallow on a directory, the spiders still follow the links but don't index them.
This is usually for members-only content or other content you don't want made public.

Some bots ignore it totally. All good bots will request a robots.txt file from the root of your domain; if you don't have one, that request simply generates a 404 error on the server, and the bot will index the site regardless.
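To see this for yourself, and to make sense of robots.txt showing up in a traffic report, you can look for requests to /robots.txt in the raw access log. This sketch assumes Apache's "combined" log format; the sample lines are made up, and the IP addresses come from the reserved documentation ranges.

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def robots_txt_requests(lines):
    """Count (user agent, status) pairs for every request of /robots.txt."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("path") == "/robots.txt":
            hits[(m.group("agent"), m.group("status"))] += 1
    return hits


# Made-up sample lines; a 404 status means the site has no robots.txt.
SAMPLE_LOG = [
    '192.0.2.1 - - [25/Nov/2004:02:38:00 -0500] "GET /robots.txt HTTP/1.0" 404 209 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [25/Nov/2004:02:38:05 -0500] "GET / HTTP/1.0" 200 5120 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.9 - - [25/Nov/2004:03:10:12 -0500] "GET /robots.txt HTTP/1.0" 404 209 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]

for (agent, status), count in robots_txt_requests(SAMPLE_LOG).items():
    print(count, status, agent)
```

Each line printed is a bot checking for the file before crawling; the ordinary page request from the same bot is not counted.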

You can use .htaccess to block certain IPs and user agents. You could also cloak some pages if you really wanted them not to be found, but that could get you into trouble, and I don't recommend it.
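Unlike robots.txt, blocking at the server is enforced rather than merely requested. In .htaccess this is typically done with Apache directives such as `SetEnvIfNoCase User-Agent` combined with `Deny from`; the sketch below expresses the same idea as a small Python WSGI middleware instead, so it is a swapped-in technique, not .htaccess itself. The blocklist entries are hypothetical examples, and keep in mind that a client can send any user-agent string it likes.

```python
BLOCKED_AGENT_WORDS = ("emailsiphon", "emailcollector")  # hypothetical blocklist
BLOCKED_IPS = {"192.0.2.66"}  # documentation-range IP, purely illustrative


def block_bad_bots(app):
    """Wrap a WSGI app so blocked IPs/user agents get 403 before the app runs."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        ip = environ.get("REMOTE_ADDR", "")
        if ip in BLOCKED_IPS or any(word in agent for word in BLOCKED_AGENT_WORDS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_bots(site)


def status_for(environ):
    """Run the wrapped app on a fake request and return the HTTP status line."""
    statuses = []
    app(environ, lambda status, headers: statuses.append(status))
    return statuses[0]


print(status_for({"HTTP_USER_AGENT": "EmailSiphon", "REMOTE_ADDR": "198.51.100.7"}))
print(status_for({"HTTP_USER_AGENT": "Googlebot/2.1", "REMOTE_ADDR": "198.51.100.7"}))
```

The harvester-style request is refused with 403 before the site code runs, while a normal crawler is served as usual.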

There is no downside to not having the file; it won't make any difference to search engines. It's just like them ringing the bell to see if they are allowed in: with no robots.txt file, you have left the door open for them. It won't affect SERPs or indexing in the slightest, but the file can give you more control if you require it.
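Python's `urllib.robotparser` follows the same convention when it fetches the file itself: a 4xx response such as 404 is treated as "no restrictions" (CPython treats 401 and 403 as "disallow everything" instead, and other crawlers may draw that line differently). The sketch fakes the server's 404 with `unittest.mock` so it runs without a network connection; the URL and the helper name `parser_after_status` are placeholders.

```python
import io
import urllib.error
from unittest import mock
from urllib.robotparser import RobotFileParser


def parser_after_status(code):
    """Simulate fetching robots.txt when the server answers with an HTTP error."""

    def fake_urlopen(url, *args, **kwargs):
        # Stand-in for the network: every request fails with the given status.
        raise urllib.error.HTTPError(url, code, "simulated", None, io.BytesIO())

    parser = RobotFileParser("http://www.example.com/robots.txt")
    with mock.patch("urllib.request.urlopen", fake_urlopen):
        parser.read()  # read() decides what the error status means
    return parser


missing = parser_after_status(404)  # no robots.txt on the server
print(missing.can_fetch("Googlebot", "http://www.example.com/cgi-bin/x.pl"))
```

With the file missing, even /cgi-bin/ is reported as fetchable: the door really is left open.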

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 4 hosted by Hostway