HTML Coding
 
#1
July 9th, 2004, 11:03 AM
timhagger
Why do this with Robots.txt

I am researching some companies that come in the top search results and
noticed that one of them had this in their robots.txt file:
User-agent:*
Disallow: /google/
Disallow: /mirago/
Disallow: /overture/
Disallow: /looksmart/
Disallow: /looksmart2/
Disallow: /yell/
Disallow: /thomson/
Disallow: /discounts/

Could anyone explain why they would do this?
I know that it stops engines from crawling those directories;
I just found it interesting that the directories are named after search engines
themselves. It's not some dark secret robot trick, is it?

Also, what is the correct syntax for a robots.txt file that allows all
engines to crawl your site? One of the posts in here said it was:

User-agent: *
Disallow:

But this resulted in my site NOT being crawled
(i.e. no title or description in Google now!)
#2
July 9th, 2004, 11:07 AM
Mano70
See this post; I gave you the answer there, with links. The advice you got was wrong.
http://forums.seochat.com/showthrea...96078#post96078

User-agent: *
Disallow:

tells the spider that it can crawl whatever it wants to.
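
If you want to check that yourself, Python's standard-library urllib.robotparser can evaluate the rules. This is only a minimal sketch: the helper name can_crawl, the example.com URL and the BLOCK_ALL contrast case are made up for illustration and do not come from the linked post.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty Disallow value blocks nothing, so everything may be crawled.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Contrast case (assumption, for comparison only): a lone slash blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    page = "http://example.com/any/page.html"  # hypothetical URL
    print(can_crawl(ALLOW_ALL, "Googlebot", page))
    print(can_crawl(BLOCK_ALL, "Googlebot", page))
```

The one-character difference between the two files is easy to get wrong, so it is worth checking which of them actually ended up on the server.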

#3
July 10th, 2004, 04:37 AM
ih8google
Cheater...

Quote:
Originally Posted by timhagger
I am researching some companies that come in the top search results and
noticed that one of them had this in their robots.txt file:
User-agent:*
Disallow: /google/
Disallow: /mirago/
Disallow: /overture/
Disallow: /looksmart/
Disallow: /looksmart2/
Disallow: /yell/
Disallow: /thomson/
Disallow: /discounts/

Could anyone explain why they would do this?
I know that it stops engines from crawling those directories;
I just found it interesting that the directories are named after search engines
themselves. It's not some dark secret robot trick, is it?


Yep, a cheater, if I had to guess...

In index.html or whatever, it's probably something like this:

PHP Code:
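<?php
// Cloaking guess: serve a different page when the user agent mentions "google".
// eregi() was removed in PHP 7 and $HTTP_USER_AGENT relied on register_globals,
// so this uses stripos() and $_SERVER instead.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($userAgent, 'google') !== false) {
    include 'google/index.html';
} else {
    include 'site/index.html';
}
?>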


So it probably is
Quote:
some dark secret robot trick



Dan

P.S. Report it to Google ;)

#4
October 19th, 2004, 05:25 PM
yorganic
Tim, I noticed no one had given you what would appear to be the proper answer, which is this:

All the engines listed provide some form of PPC service. Setting up a page or directory on your site for each engine lets the webmaster monitor the hits received from each PPC service and analyse them with a log file analyser. The pages within those directories would be copies of your home page, or redirects to it, so the user is none the wiser and is not distracted from their experience.

One of the problems with PPC is that every service has a different reporting system. If you monitor what is hitting your site using software under your own control, you can validate what the PPC services are charging you, watch for competitors burning through your budget by clicking your PPC adverts, and spot time-of-day and day-of-week patterns.
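
As a rough illustration of that kind of in-house monitoring, here is a small Python sketch that counts hits per PPC landing directory. It assumes the server writes Common Log Format lines. The function name count_ppc_hits and the sample log lines are invented for the example, and /discounts/ is left out of the set because it is not an engine name.

```python
import re
from collections import Counter

# Directory names taken from the robots.txt quoted in the first post.
PPC_DIRS = {"google", "mirago", "overture", "looksmart", "looksmart2", "yell", "thomson"}

# Captures the first path segment of the request, e.g. "GET /google/ HTTP/1.1" -> google
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) /([^/ ?"]+)')


def count_ppc_hits(log_lines):
    """Count requests whose first path segment is one of the PPC directories."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1) in PPC_DIRS:
            hits[match.group(1)] += 1
    return hits


# Invented sample lines, only to show the expected input shape.
SAMPLE_LOG = [
    '10.0.0.1 - - [19/Oct/2004:17:25:00 +0000] "GET /google/ HTTP/1.1" 200 512',
    '10.0.0.2 - - [19/Oct/2004:17:26:10 +0000] "GET /google/index.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [19/Oct/2004:17:27:45 +0000] "GET /yell/ HTTP/1.1" 302 0',
    '10.0.0.4 - - [19/Oct/2004:17:28:02 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.5 - - [19/Oct/2004:17:29:30 +0000] "GET / HTTP/1.1" 200 1024',
]

if __name__ == "__main__":
    for directory, count in count_ppc_hits(SAMPLE_LOG).most_common():
        print(directory, count)
```

Comparing these counts with the click totals each PPC service bills you for is the validation step described above.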

If you run such a system to monitor what arrives via PPC, the last thing you want is for a search bot to crawl those pages too. Hence the robots.txt entries for those directories.
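
To see what those entries actually block, the rules quoted in the first post can be run through Python's urllib.robotparser. Again, this is just a sketch: the helper blocked_paths and the test paths such as /products.html are hypothetical, and only the robots.txt text comes from the thread.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the first post.
ROBOTS_TXT = """\
User-agent:*
Disallow: /google/
Disallow: /mirago/
Disallow: /overture/
Disallow: /looksmart/
Disallow: /looksmart2/
Disallow: /yell/
Disallow: /thomson/
Disallow: /discounts/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the subset of paths that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Hypothetical paths: the home page, two landing pages, one normal page.
    candidates = ["/", "/google/index.html", "/products.html", "/yell/"]
    print(blocked_paths(ROBOTS_TXT, candidates))
```

Only the landing directories are kept out of the crawl; the home page and the rest of the site stay open to spiders.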

#5
October 20th, 2004, 03:23 AM
timhagger
Interesting, but I can think of much better ways of tracking PPC without having to replicate content. Thanks for the reply, though.
