#1
Another newbie question: What is a robots.txt file?
Hello. This is my third post, and I am new to SEO.
What is a robots.txt file? Is it necessary to have one on a website? Can somebody refer me to a previous thread that already discussed this, or perhaps a URL that covers it?
Thank you.
gerardo
#2
It tells robots what to do. Get every file? Don't get certain pages? Don't make a cache? Get the index but not other files?
I don't use one because I want all my files to get listed.
#3
Shouldn't you have an empty robots.txt file in that case? Not having one seems to keep generating 404 errors.
#4
So the robots.txt file is a set of separate, special instructions telling spider bots which pages to crawl and which to leave alone...
Where does one get the source of this file? And yes, what about the 404 errors?
#5
Hi gerardo!
I have been looking for information on the robots.txt file too, and I believe I found a nice source for it: http://www.searchengineworld.com/robots/robots_tutorial.htm
Search engines look for the robots.txt file first, so if they don't find it, you will see a 404 error in your log file (if I understand it right).
Hope it helps,
stefan
#6
Thank you to all those who have responded...
I am beginning to understand a bit of what a robots.txt file does. Thanks, mauri, for the URL you referred me to; it is a good resource. I copied and pasted the opening paragraphs for easy reference, as I have further things to clarify. Here is the copy, and my questions follow:

Robots.txt Tutorial

Search engines will look in your root domain for a special file named "robots.txt" (http://www.mydomain.com/robots.txt). The file tells the robot (spider) which files it may spider (download). This system is called The Robots Exclusion Standard.

The format for the robots.txt file is special. It consists of records. Each record consists of two fields: a User-agent line and one or more Disallow: lines. The format is:

<Field> ":" <value>

The robots.txt file should be created in Unix line ender mode! Most good text editors will have a Unix mode or your FTP client *should* do the conversion for you. Do not attempt to use an HTML editor that does not specifically have a text mode to create a robots.txt file.

Based on the above explanation:
- If I do not have a robots.txt file FTP'd to the same folder, will my page(s) still be indexed?
- Is the robots.txt file NECESSARY (assuming I want everything spidered)?
- If I have this meta tag in the HEAD section of my HTML document: <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">, this would ensure (?) that my page gets indexed, as logic would say. Is this a MUST?
- If the above meta tag is used, does this mean a SEPARATE robots.txt file is NOT NEEDED, as the meta tag will do the job?

Kindly bear with me. Anybody feel free to respond. And again, thank you.
gerardo
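
The record format quoted from the tutorial can be illustrated with a small parser. This is only a minimal sketch in Python, assuming that records are separated by blank lines and that `#` starts a comment. The function name `parse_records`, the sample paths, and the `BadBot` agent are made up for illustration and do not come from the tutorial.

```python
# Hypothetical sample file: two records, each a User-agent line
# followed by one or more Disallow lines.
SAMPLE = """\
# hypothetical example file
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""


def parse_records(text):
    """Split robots.txt text into records of user agents and disallowed paths."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            current = None  # assumption: a blank line ends the current record
            continue
        # Each line has the form <Field> ":" <value>.
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a field line; ignore it
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            # Start a new record unless we are still collecting agent names.
            if current is None or current["disallow"]:
                current = {"user_agents": [], "disallow": []}
                records.append(current)
            current["user_agents"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

Real crawlers handle more fields and edge cases than this; the sketch only mirrors the two fields the tutorial excerpt describes.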
#7
If you want everything to be spidered and indexed, you don't need a robots.txt file.
However, to stop meaningless entries in the error log, you may want a robots.txt file that allows everything. Sample entry:
User-agent: *
Disallow:
This says that no robot is disallowed from anything.
Cheers,
__________________
theBear
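
One way to check what the allow-all entry above means in practice is Python's standard `urllib.robotparser` module. This is just a sketch: the helper name `is_allowed` and the agent name `ExampleBot` are invented for the example, and the domain is the placeholder one from the tutorial excerpt.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from the post above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(ALLOW_ALL, "ExampleBot", "http://www.mydomain.com/index.html"))
```

For comparison, changing the second line to `Disallow: /` should make the same check fail for every path on the site.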
#8
Quote:
You absolutely NEED a robots.txt, even if it is empty. Some search engines will not go any further if the robots.txt is not there.
Quote:
YES
Quote:
Yes again.
Quote:
No. Use both. The robots.txt can do much more; they are two different things.
BUT: please do realise that the robots.txt is visible not only to search engines but also to anyone who cares to look at it. So if you are blocking folders like /Administration or /secret, people easily get to know your page structure and may do some harm. That's the reason why I always advise using an empty robots.txt.
Regards,
__________________
Birthe
#9
Quote:
So where do you put instructions to exclude directories?
HWG
www.dbdinc.com
#10
Quote:
You would use the robots.txt for that. But keep the above in mind, and also keep in mind that not all spiders "listen" to the robots.txt.
Besides, if you haven't linked your to-be-excluded directories from your normally accessible pages, the spider will not find them anyway.
Regards,
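
To make the directory exclusion concrete, here is a rough sketch of how a rule-abiding crawler might filter its URLs, again using Python's standard `urllib.robotparser`. The `/private/` directory, the `polite_urls` helper, and the `ExampleBot` name are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all robots out of /private/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_urls(robots_txt, agent, urls):
    """Return only the URLs a rule-abiding crawler would fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://www.mydomain.com/index.html",
        "http://www.mydomain.com/private/report.html",
    ]
    print(polite_urls(ROBOTS_TXT, "ExampleBot", candidates))
```

Note that the check happens on the crawler's side. A spider that never runs a check like this will fetch everything, which is the point made above about spiders that do not "listen".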
#11
Quote:
If that's the case, the bot is busted and should be reported to its owner.
Cheers.
#12
gerardo,
Open Notepad and type:
User-agent: *
Disallow:
Save it as a text document named robots (robots.txt) and upload it to your public_html folder, where your index page is. This file allows all robots to visit all files, because the wildcard "*" specifies all robots.

stuijts said:
BUT: please do realise that the robots.txt is visible not only to search engines but also to anyone who cares to look at it. So if you are blocking folders like /Administration or /secret, people easily get to know your page structure and may do some harm. That's the reason why I always advise using an empty robots.txt.

I'm not too sure about this, as other people say that some robots read a blank robots.txt file as an instruction not to spider. As stuijts also says, people can see your page structure, but with the wildcard you are OK.
So I believe the wildcard helps you reduce the robots.txt errors in your log file. Otherwise, it's only needed if you want some pages not to be spidered.
Regards, hope it helps,
mauri
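
As an alternative to Notepad, the same two-line file can be written by a short script, which also takes care of the Unix line endings the tutorial asks for. This is only a sketch: the `write_robots_txt` helper and the target directory are hypothetical, and you would still upload the result to your web root yourself.

```python
from pathlib import Path

# The allow-all rules from the post above.
ALLOW_ALL_LINES = ["User-agent: *", "Disallow:"]


def write_robots_txt(directory, lines):
    """Write robots.txt into `directory` using Unix (LF) line endings."""
    path = Path(directory) / "robots.txt"
    # newline="\n" stops Python from converting line endings to CRLF on Windows.
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return path


if __name__ == "__main__":
    # Assumption: writes into the current directory.
    print(write_robots_txt(".", ALLOW_ALL_LINES))
```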
#13
Quote:
You can also trap requests in the server exits (on some servers) and block them that way.
Cheers,
#14
Thank you, everybody. You are all kind and generous with your responses and advice.
Thank you.
gerardo
Viewing: SEO Chat Forums > Other > HTML Coding > another newbie question: What is a robots txt file?