HTML Coding
 
#1  gerardo, April 14th, 2003, 08:04 PM
another newbie question: What is a robots txt file?

hello. this is my third posting and i am new to SEO.

what is a robots txt file?

does it need to be inside the web page itself, or ftp'd as a separate file in the same folder, so that a site can be seen by the googlebots in the deepcrawl (in my microsite's case)?

can somebody refer me to the previous thread that discussed this already? or perhaps, a URL address that discussed this?

thank you.



gerardo

#2  Nintendo, April 14th, 2003, 10:05 PM
That tells robots what to do. Get every file? Don't get certain pages? Don't make a cache? Get index but not other files?

I don't use them because I want all my files to get listed.

#3  seo_rat, April 15th, 2003, 01:20 AM
Shouldn't you have an empty robots.txt file in that case, since not having one seems to keep generating 404 errors?

#4  gerardo, April 15th, 2003, 08:36 AM
so the robots txt file then is a set of separate/special instructions telling spiderbots which pages to crawl and which to leave alone...

where does one get the source of this file? and yes, what about the 404 errors?

#5  mauri, April 15th, 2003, 08:50 AM
hi gerardo!
i have been looking for information on the robots txt file too, and i believe i found a nice source for it:
http://www.searchengineworld.com/robots/robots_tutorial.htm

search engines look for the robots txt file first, so if they don't find it you will see a 404 error in your logfile (if i understand it right)
hope it helps
stefan
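
A small illustration of where that request goes: crawlers ask for robots.txt at the root of the host, not in the folder a page happens to live in, so the 404 in the log is for /robots.txt. The sketch below is Python and only builds the URL; the function name and the microsite path are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL a crawler would request for robots.txt, given any page URL."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page inside a sub-folder of the site.
print(robots_txt_url("http://www.mydomain.com/microsite/page.html"))
```

So a file uploaded only into a sub-folder such as /microsite/ would not be the one the robots look for.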

#6  gerardo, April 15th, 2003, 01:20 PM
thank you to all those who have responded...

i am beginning to understand a bit of what the robots text file does. thanks mauri for the URL you referred me to. that is a good resource.

i copied and pasted the opening paragraphs for easy reference as i have further things to clarify. here is the copy and my questions will follow:


Robots.txt Tutorial


Search engines will look in your root domain for a special file named "robots.txt" (http://www.mydomain.com/robots.txt). The file tells the robot (spider) which files it may spider (download). This system is called, The Robots Exclusion Standard.

The format for the robots.txt file is special. It consists of records. Each record consists of two fields : a User-agent line and one or more Disallow: lines. The format is:


<Field> ":" <value>


The robots.txt file should be created in Unix line ender mode! Most good text editors will have a Unix mode or your FTP client *should* do the conversion for you. Do not attempt to use an HTML editor that does not specifically have a text mode to create a robots.txt file.



based on the above explanation,

- if i did not have any robots.txt file ftp'd to the same folder, will my page(s) still be indexed?

- is the robots.txt file NECESSARY (assuming i want everything spidered)?

- if i have this meta tag in the HEAD section of my HTML document:

<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">

this would ensure (?) my page getting indexed as logic would say. is this a MUST?

- if the above meta tag is used, does this mean a SEPARATE robots.txt file is NOT NEEDED as the meta tag will do the function?


kindly bear with me. anybody feel free to respond. and again, thank you.



gerardo
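
To make the difference concrete: the meta tag is read page by page, out of the HTML itself, while robots.txt is one file for the whole site. Below is a rough Python sketch of how a robot might pull the directives out of a page. The class and function names are invented for this example, and real crawlers will differ. As far as is generally documented, INDEX,FOLLOW is what robots do by default anyway, so the tag mostly matters when it says NOINDEX or NOFOLLOW.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of a <meta name="robots"> tag, if the page has one."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html: str) -> list:
    """Return the lower-cased robots directives found in the page, or [] if none."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


print(robots_directives('<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">'))
```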

#7  theBear, April 15th, 2003, 01:29 PM
If you wish everything to be spidered and indexed etc, you don't need a robots.txt file.

However, to stop meaningless entries in the error log, you may want a robots.txt file that allows everything.

Sample entry:

User-agent: *
Disallow:


This says every robot is disallowed nothing, i.e. everything may be spidered.

Cheers,
__________________
theBear
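
One way to check that reading of the sample entry is Python's standard urllib.robotparser module, which applies the same rules a well-behaved robot would. This is only a sketch: the helper name can_fetch and the example URL are assumptions for illustration, and the result says nothing about how any particular search engine is implemented.

```python
from urllib.robotparser import RobotFileParser

# The allow-everything file from the post above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL; both the allow-all file and a completely empty file permit it.
page = "http://www.mydomain.com/any/page.html"
print(can_fetch(ALLOW_ALL, "Googlebot", page))
print(can_fetch("", "Googlebot", page))
```

With this parser, an empty file and the two-line file behave the same way; the two-line version just states the intent explicitly.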

#8  stuijts, April 15th, 2003, 01:37 PM
Quote:
- if i did not have any robots.txt file ftp'd to the same folder, will my page(s) still be indexed?

You absolutely NEED a robots.txt, even if it's just empty. Some SEs will not go further if the robots.txt is not there.

Quote:
- is the robots.txt file NECESSARY (assuming i want everything spidered)?

YES

Quote:
- if i have this meta tag in the HEAD section of my HTML document:
<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
this would ensure (?) my page getting indexed as logic would say. is this a MUST?

Yes again.

Quote:
- if the above meta tag is used, does this mean a SEPARATE robots.txt file is NOT NEEDED as the meta tag will do the function?

No. Use both. The robots.txt can do much more; they are two different pairs of shoes.

BUT: please do realise that the robots.txt is visible not only to SEs, but also to anyone who wants to look at it. So if you are blocking folders like /Administration or /secret, people can easily learn your site structure and maybe do some harm.
That's the reason why I always advise using an empty robots.txt

Regards,
__________________
Birthe

#9  hgoldman, April 15th, 2003, 02:17 PM
Quote:
That's the reason why I always advise using an empty robots.txt


So where do you put instructions to exclude directories?


HWG
www.dbdinc.com

#10  stuijts, April 15th, 2003, 02:20 PM
Quote:
So where do you put instructions to exclude directories?

You would use the robots.txt for it.

But realise the above.
And then realise, not all spiders "listen" to the robots.txt

On top of that, if you haven't linked to your to-be-excluded directories from your normally accessible pages, the spider will not find them anyway.

Reg,
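
Both points can be seen in a short Python sketch, using the folder names from the earlier post as the example. The helper names listed_paths and is_allowed are invented here, and listed_paths is a deliberately naive reader that ignores comments and User-agent grouping. The file blocks polite robots from those folders, and at the same time lists the folder names for anyone who opens it.

```python
from urllib.robotparser import RobotFileParser

# Example file excluding the two folders mentioned earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Administration/
Disallow: /secret/
"""


def listed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow value, as any human reader of the file could."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a robot that obeys robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(listed_paths(ROBOTS_TXT))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.mydomain.com/secret/plan.html"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.mydomain.com/index.html"))
```

Nothing here stops a robot that simply ignores the file, which is the second caveat above.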

#11  theBear, April 15th, 2003, 02:22 PM
Quote:
Originally posted by "stuijts"

Quote:
- if i did not have any robots.txt file ftp'd to the same folder, will my page(s) still be indexed?

You absolutely NEED a robots.txt, even if it's just empty. Some SEs will not go further if the robots.txt is not there.


If that's the case the bot is busted and should be reported to its owner.

Cheers.

#12  mauri, April 15th, 2003, 02:22 PM
gerardo,
open Notepad and type:

User-agent: *
Disallow:

and save it as a plain text document named robots.txt

then upload it to your public html folder, where you have your index page.

this file allows all robots to visit all files, because the wildcard "*" specifies all robots.


stuijts says that:
BUT: please do realise that the robots.txt is visible not only to SEs, but also to anyone who wants to look at it. So if you are blocking folders like /Administration or /secret, people can easily learn your site structure and maybe do some harm.
That's the reason why I always advise using an empty robots.txt

i'm not too sure about this, as other people say that a blank robots txt file makes some robots think they should not spider the site.

but as stuijts says too, people can see your page structure; with the wildcard you are ok.
so i believe the wildcard file helps you reduce the robots txt errors in your logfile. otherwise it's only needed if you want some pages not to get spidered.
regards
hope it helps
mauri
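
For anyone who would rather not depend on Notepad's line endings (the tutorial quoted earlier asks for Unix line enders), here is a small Python sketch that writes the same two-line file with plain "\n" endings. The function name is made up for this example, and the target folder is whatever local copy of the site's top-level folder you upload from.

```python
from pathlib import Path


def write_allow_all_robots(directory: str) -> Path:
    """Write an allow-everything robots.txt with Unix line endings into directory."""
    target = Path(directory) / "robots.txt"
    # newline="\n" stops Python from translating "\n" into "\r\n" on Windows.
    with open(target, "w", encoding="ascii", newline="\n") as handle:
        handle.write("User-agent: *\nDisallow:\n")
    return target
```

Calling write_allow_all_robots("public_html") would create public_html/robots.txt, assuming a folder with that (hypothetical) name already exists; the file still has to be uploaded to the root of the site.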

#13  theBear, April 15th, 2003, 02:24 PM
Quote:
Originally posted by "stuijts"

Quote:
So where do you put instructions to exclude directories?

You would use the robots.txt for it.

But realise the above.
And then realise, not all spiders "listen" to the robots.txt

On top of that, if you haven't linked to your to-be-excluded directories from your normally accessible pages, the spider will not find them anyway.

Reg,


You can also trap requests in the server exits (on some servers) and block them that way.

Cheers,

#14  gerardo, April 15th, 2003, 06:21 PM
thank you everybody. you are all kind and generous with your responses and advice.

thank you.



gerardo
