#1
robots.txt help
If I want to exclude ONLY my cgi-bin from being spidered, would I just create a file called robots.txt with this in it:

    User-agent: *
    Disallow: /cgi-bin/

Is this correct? Would this still allow every other page to be spidered freely?
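One way to sanity-check a rule set like this before uploading it is Python's standard urllib.robotparser module. The sketch below is only an illustration: www.example.com, the file names, and the "AnyBot" user agent are made up for the check, and real crawlers may treat edge cases differently from this parser.

```python
from urllib import robotparser

# The two lines proposed in the question.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Hypothetical URLs and bot name, used only for this check.
cgi_allowed = rp.can_fetch("AnyBot", "http://www.example.com/cgi-bin/form.pl")
page_allowed = rp.can_fetch("AnyBot", "http://www.example.com/products/index.html")

print("cgi-bin allowed:", cgi_allowed)
print("other page allowed:", page_allowed)
```

If the rules behave as intended, only the cgi-bin URL should come back as not allowed.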
#2
Yup, you are absolutely correct, no need to do any more than that. (Just put your robots.txt file in the root of your domain.)
P.S.: Keep in mind that this WON'T deter any of the "bad" bots or email harvesters.
#3
Looks like a good place to drop this question.
Do the spiders honor the Disallow statements and completely ignore all disallowed items? For example, if you are running an affiliate site and want to prevent any PR leaking out, can the outbound link pages be put in a disallowed directory so that GBot will not even access these pages, or will it read the pages for PR purposes but not for SERPs? Would noindex or nofollow achieve the same results?
#4
Gbot is very civilised. It will obey all the rules you describe, including <meta ... noindex,nofollow> instead of a robots.txt file, provided you don't make mistakes or give it ambiguous instructions, e.g. a robots.txt with both User-agent: * and User-agent: googlebot in it. G-bot might interpret that in a way you wouldn't expect: it follows only the group that names it and ignores the * group, so rules listed only under * no longer apply to it.
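The mixed case (a User-agent: * group plus a User-agent: googlebot group) can be tried out with Python's standard urllib.robotparser. This is a rough sketch, not a statement of how Googlebot itself is implemented: the /links/ directory, www.example.com, and "OtherBot" are invented for the example, and the parser only approximates what a real crawler does.

```python
from urllib import robotparser

# A robots.txt that mixes a catch-all group with a googlebot group.
rules = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: googlebot
Disallow: /links/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

base = "http://www.example.com"  # hypothetical site

# The googlebot group is the only one applied to Googlebot here,
# so the /cgi-bin/ rule from the * group does not reach it.
googlebot_cgi = rp.can_fetch("Googlebot", base + "/cgi-bin/form.pl")
googlebot_links = rp.can_fetch("Googlebot", base + "/links/partners.html")

# Any other bot falls back to the * group.
other_cgi = rp.can_fetch("OtherBot", base + "/cgi-bin/form.pl")
other_links = rp.can_fetch("OtherBot", base + "/links/partners.html")

print(googlebot_cgi, googlebot_links, other_cgi, other_links)
```

In this parser, a bot that has its own group is never checked against the * group, so any rule meant for everyone has to be repeated inside the googlebot group as well.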
#5
Thanks for the response.
The reason for asking was that my logs show spiders hitting through links on pages meta tagged as noindex, nofollow. I have since moved the outbound link pages into a disallowed directory and the hits have stopped. From this experience, they may not index a noindex page, but they may still consider its content and outbound links for PR purposes. And they appear not to even load a page in a disallowed directory.
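A possible explanation for the difference in the logs: a meta tag lives inside the page, so a spider has to load the page before it can see the tag, whereas robots.txt is consulted before any page is requested. The sketch below shows the kind of parsing a crawler would need to do after fetching a page. It uses Python's standard html.parser; the class name, the helper function, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Hypothetical helper: return the robots meta directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page: the directives are only known once the HTML has been downloaded.
page = (
    '<html><head><meta name="robots" content="noindex,nofollow"></head>'
    '<body><a href="http://www.example.com/">outbound link</a></body></html>'
)
directives = robots_directives(page)
print(sorted(directives))
```

So a hit in the logs on a noindex, nofollow page does not by itself show the tag was ignored. What the spider does with the content after reading the tag is not something this sketch can tell you.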
#6
What if you don't place this robots.txt file, then what happens? I noticed a reference to robots.txt in my traffic analyzer, listed as a referrer, like I would see when someone found me with a keyword on Yahoo?
#7
Hey
I am pretty sure that if you have a disallow on a directory, the spiders still follow the links but don't index them. This is usually for members-only content or other content you don't want made public. Some bots ignore it totally.

All good bots will request a robots.txt file from the root of your domain. If you don't have one, the request will simply generate a 404 error on the server and they will index the site regardless. You can use .htaccess to block certain IPs and user agents. You could also cloak some pages if you really didn't want them to be found, but that could get you into trouble and I don't recommend it.

There is no downside to not having the file, and it won't make any difference to search engines. It's just like them ringing the bell to see if they are allowed in: with no robots.txt file, you have left the door open for them. It won't affect SERPs or indexing in the slightest. Having one can give you more control if you require it.
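The "ringing the bell" behaviour can be sketched in a few lines of Python with the standard urllib.robotparser. This is a simplified illustration under stated assumptions, not how any particular search engine works: rules_from_response is a hypothetical helper, the URL and "AnyBot" are invented, and only the 404 case is treated specially.

```python
from urllib import robotparser


def rules_from_response(status, body=""):
    """Hypothetical helper: build crawl rules from a /robots.txt response."""
    rp = robotparser.RobotFileParser()
    if status == 404:
        # No file on the server: parse an empty rule set, so nothing is disallowed.
        rp.parse([])
    else:
        # Other status codes are not handled specially in this sketch.
        rp.parse(body.splitlines())
    return rp


# Made-up responses: one site without a robots.txt, one with a cgi-bin rule.
missing = rules_from_response(404)
present = rules_from_response(200, "User-agent: *\nDisallow: /cgi-bin/\n")

url = "http://www.example.com/cgi-bin/form.pl"
missing_allows = missing.can_fetch("AnyBot", url)
present_allows = present.can_fetch("AnyBot", url)

print("no robots.txt, may fetch:", missing_allows)
print("with robots.txt, may fetch:", present_allows)
```

The request for /robots.txt is also why the file name turns up in traffic logs even on sites that never created one.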