-
Jan 4th, 2013, 01:48 AM
#1
Is robots.txt required in my case?
I want web spiders to crawl the whole website, and I have no XML sitemap to reference in robots.txt. The only thing I need is to block bad bots, but if they are bad, they will simply ignore robots.txt. Is robots.txt required in this case?
Second question: will genuine web spiders like Googlebot still look for robots.txt if the site doesn't have one? I need to differentiate bad bots from genuine spiders.
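On telling genuine spiders from impostors: a user-agent string can be faked, so one commonly recommended check is a reverse DNS lookup on the visiting IP followed by a forward lookup on the resulting hostname. The sketch below is only an illustration of that idea. The function name `is_genuine_crawler`, the `reverse`/`forward` parameters, and the suffix list are assumptions made for this example; the suffixes should be re-checked against the search engine's own documentation.

```python
import socket

# Hostname suffixes assumed for Google's crawlers; verify against Google's docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a trusted hostname
    that in turn resolves back to the same ip."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:
        return False                       # no PTR record or lookup failure
    if not hostname.lower().endswith(suffixes):
        return False                       # hostname is not under a trusted domain
    try:
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses                 # forward lookup must confirm the ip
```

The resolver functions are passed in as parameters so the check can be tried with fake lookups before pointing it at real traffic. DNS lookups are slow, so results would normally be cached per IP.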
-
Jan 4th, 2013, 01:55 AM
#2
What sort of 'bad bots' are you concerned about?
-
Jan 4th, 2013, 02:38 AM
#3

Originally Posted by dzine:
What sort of 'bad bots' are you concerned about?
Bots that eat up your bandwidth unnecessarily.
-
Jan 4th, 2013, 05:25 AM
#4
Do you have any examples? I have found that hardly any bots of that kind actually DO obey robots.txt.
Also, I must admit that I'm not all that concerned with my bandwidth 
If you want to limit crawling to only a small number of bots, say: Google + Bing/Yahoo + Ask + Baidu + Yandex, then you could do that with a robots.txt file. But it still wouldn't keep the 'bad bots' out -- if said bots ignore the robots.txt file.
You'd have to check each one.
What you could do is:
- check your logs to see which bot is eating a lot of your bandwidth without sending you any good visitors
- check online if that bot obeys the robots.txt protocol
- if so: block it specifically
- if not: try to block its IP address(es) using .htaccess ("allow/deny") or something similar
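The first step in the list above (finding which bot uses the most bandwidth) could be sketched roughly as follows. This assumes the server writes Apache-style Combined Log Format; other servers use different layouts, so the regular expression would need adjusting. The function name `bandwidth_by_agent` and the sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# Combined Log Format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bandwidth_by_agent(lines):
    """Sum hits and bytes per user-agent string."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        size = match.group("bytes")
        entry = totals[match.group("agent")]
        entry["hits"] += 1
        entry["bytes"] += 0 if size == "-" else int(size)
    return dict(totals)


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [04/Jan/2013:01:48:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [04/Jan/2013:01:48:02 +0000] "GET /a HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [04/Jan/2013:01:49:00 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    report = bandwidth_by_agent(SAMPLE)
    # Heaviest consumers first.
    for agent, stats in sorted(report.items(),
                               key=lambda kv: kv[1]["bytes"], reverse=True):
        print(f'{agent}: {stats["hits"]} hits, {stats["bytes"]} bytes')
```

To run it on a real log, pass an open file object instead of `SAMPLE`. Grouping by `ip` rather than `agent` would help with the last step, where a bot has to be blocked by address.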
-
Jan 4th, 2013, 06:02 AM
#5
I think robots.txt on its own is not a reliable safeguard for the whole website.
-
Jan 4th, 2013, 07:21 AM
#6
My log report lists: Unknown robot (identified by 'bot/' or 'bot-'), Unknown robot (identified by 'robot'), Unknown robot (identified by 'spider'), Unknown robot (identified by 'crawl'). I have no idea who these are or how to block them in the robots.txt file.
It's a hell of a job to analyze the log file and find the bad bots. I have some issues that I will post in a new thread.
I also need to discuss what the alternative to .htaccess is on a Windows server.
-
Jan 4th, 2013, 07:24 AM
#7
But my original question still remains: is robots.txt required in my case? I am talking about bad bots, not Google, Yahoo, or Ask.
-
Jan 4th, 2013, 08:38 AM
#8
Only you can tell if it's necessary.
Personally I wouldn't bother. But if you really want to keep ALL bots out EXCEPT for a few trusted ones, then by all means do so using robots.txt
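For anyone who does go the allowlist route, here is a rough sketch of what such a file might look like, checked with Python's standard `urllib.robotparser`. Only two crawlers are listed; the others mentioned earlier in the thread would be added the same way, using whatever user-agent token each engine documents. The helper `allowed` is invented for this example, and the file only affects bots that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Allow two named crawlers everywhere, ask everyone else to stay out.
# An empty "Disallow:" means nothing is disallowed for that agent.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent, path="/", robots_txt=ROBOTS_TXT):
    """Report whether a well-behaved bot named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name in ("Googlebot", "bingbot", "SomeUnknownBot"):
        print(name, allowed(name))
```

This only shows how a compliant parser reads the file. A bot that ignores robots.txt is not slowed down by it at all.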
-
Jan 4th, 2013, 11:00 AM
#9
Can I block these in robots.txt?
Unknown robot (identified by 'bot/' or 'bot-'), Unknown robot (identified by 'robot'), Unknown robot (identified by 'spider'), Unknown robot (identified by 'crawl')
-
Jan 4th, 2013, 04:33 PM
#10
I think you cannot. As far as I know, one cannot use 'wildcards' for parts of the bots' names.
How much bandwidth are these bots costing you, by the way?
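If matching on name fragments is what you are after, it would have to happen on the server side rather than in robots.txt. A minimal sketch of that kind of check, assuming you can run code against the User-Agent header or against your logs: the fragments are the ones from the log report, while the trusted tokens and the function name `classify_agent` are made up for this example.

```python
# Fragments taken from the log report's "unknown robot" categories.
BOT_FRAGMENTS = ("bot/", "bot-", "robot", "spider", "crawl")

# Hypothetical list of crawler tokens to let through even though
# they also contain one of the fragments above.
TRUSTED_TOKENS = ("googlebot", "bingbot")


def classify_agent(user_agent):
    """Classify a User-Agent header by case-insensitive substring match."""
    ua = user_agent.lower()
    if any(token in ua for token in TRUSTED_TOKENS):
        return "claims-trusted"   # the header can be faked, so only a claim
    if any(fragment in ua for fragment in BOT_FRAGMENTS):
        return "unknown-bot"
    return "other"


if __name__ == "__main__":
    for ua in ("SomeBot/2.0", "ExampleSpider 1.0",
               "Mozilla/5.0 (Windows NT 6.1) Firefox/17.0"):
        print(ua, "->", classify_agent(ua))
```

Because the header is supplied by the client, a result of "claims-trusted" says nothing about who is really behind the request.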
-
Jan 5th, 2013, 12:23 AM
#11

Originally Posted by dzine:
I think you cannot. As far as I know, one cannot use 'wildcards' for parts of the bots' names.
How much bandwidth are these bots costing you, by the way?
For December 2012 (1 month):
Unknown robot (identified by 'bot/' or 'bot-'): 6994 hits, 80.34 MB bandwidth, last visit 25 Dec 2012 - 00:52
The site is hosted on a Windows shared hosting server.