Search Engine Spiders
 
#1
January 28th, 2004, 03:14 PM
pk_synths
Contributing User
Join Date: May 2003
Location: Chicago
Posts: 4,109
Robots.txt question

Okay, here's a brain teaser (or not). I have added some affiliates to my site and use subdomains to separate each affiliate's layout and for tracking purposes. The problem is that I noticed some of them are getting indexed, and that's not good. What would you recommend as the best way to get every page in a subdomain disallowed from spiders? These are virtual subdomains, BTW.


Thanks for all the help, people.

<- scary a$$ icon
-PK

#2
January 28th, 2004, 05:31 PM
relaxzoolander
web designer
Join Date: Aug 2003
Location: designing a web site in columbus ohio
Posts: 2,996

don't use that eggs-and-hot-dog smilie... you know how much I hate it.
[I feel the rage building up inside of me...]

#3
January 29th, 2004, 01:25 PM
PaulS
Contributing User
Join Date: Jan 2004
Location: Brighton, UK
Posts: 115
Quote:
Originally Posted by pk_synths
Okay, here's a brain teaser (or not). I have added some affiliates to my site and use subdomains to separate each affiliate's layout and for tracking purposes. The problem is that I noticed some of them are getting indexed, and that's not good. What would you recommend as the best way to get every page in a subdomain disallowed from spiders? These are virtual subdomains, BTW.


Yup, robots.txt is going to be the best way of stopping the spiders on a large number of pages.

If they are subdomains then I'm guessing they're something like...
banana.mysite.com
mango.mysite.com
kiwi.mysite.com

In which case they will each have their own root directory, so pop a robots.txt file in each one with...

User-agent: *
Disallow: /

in it, and that'll stop dem pesky spiders in their tracks (at least the well-behaved ones that obey robots.txt, which includes all the major search engines).
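
To sanity-check that two-line file before uploading it, Python's standard `urllib.robotparser` module can parse the rules and report whether a crawler may fetch a given URL. This is a minimal sketch; the hostnames are the hypothetical fruit subdomains from this post, `is_blocked` is a made-up helper name, and `Googlebot` is just an example user-agent string.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents suggested above: block every path for every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-obeying crawler named `agent` must skip `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical affiliate URLs; only the path part is matched against the rules.
urls = [
    "http://banana.mysite.com/",
    "http://mango.mysite.com/products/page.html",
    "http://kiwi.mysite.com/index.jhtml?id=42",
]
for url in urls:
    print(url, "->", "blocked" if is_blocked(BLOCK_ALL, url) else "allowed")
```

Every URL prints `blocked`, because `Disallow: /` matches any path. Keep in mind that robots.txt is advisory: a spider reads `/robots.txt` on each host before crawling it, but nothing forces it to obey.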

#4
January 29th, 2004, 01:36 PM
pk_synths
Thanks, but these are dynamic sites, so there are no root directories: they are virtual subdomains.

The subdomains are what you described, but the index page checks which subdomain the user is on, loads the appropriate graphics and content, and compiles the page. The same index page is used for all sites and domains; it just loads the right droplets.

Kinda like:

<droplet>
  if <param name="bananas.mysite.com">
    load header="/bananas/header.jhtml"
    load body="/bananas/body.jhtml"
    load footer="/bananas/footer.jhtml"

  if <param name="apples.mysite.com">
    load header="/apples/header.jhtml"
    load body="/apples/body.jhtml"
    load footer="/apples/footer.jhtml"
</droplet>

See? Tough one. The subdomains are virtual, so I can point them anywhere and don't need separate sites or directories for each. If I disallow the root, I'll wipe out ALL subdomains and sites, because they all share the same files and just load different content depending on the subdomain.
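
The droplet logic amounts to choosing template files from the request's `Host` header. A rough Python equivalent, purely for illustration (the mapping, the fallback `main` folder, and the function names are assumptions, not the actual ATG setup), shows why a single static robots.txt cannot tell the subdomains apart: every host is served by the same code and the same files.

```python
from typing import Dict

# Hypothetical mapping from virtual subdomain to template folder,
# mirroring the bananas/apples branches of the droplet pseudocode.
AFFILIATE_FOLDERS: Dict[str, str] = {
    "bananas.mysite.com": "bananas",
    "apples.mysite.com": "apples",
}
DEFAULT_FOLDER = "main"  # hypothetical folder used for the main site


def normalize_host(host_header: str) -> str:
    """Lower-case the Host header and strip any :port suffix."""
    return host_header.split(":", 1)[0].strip().lower()


def templates_for_host(host_header: str) -> Dict[str, str]:
    """Pick header/body/footer templates for the requested subdomain."""
    folder = AFFILIATE_FOLDERS.get(normalize_host(host_header), DEFAULT_FOLDER)
    return {part: f"/{folder}/{part}.jhtml" for part in ("header", "body", "footer")}


print(templates_for_host("Bananas.mysite.com:80"))
```

Since one code path serves every host, any per-subdomain rule has to be decided at request time in the same way.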

Thanks

#5
January 29th, 2004, 02:14 PM
PaulS
Can the server be set up to serve something that says it's a .txt file (your robots.txt) but is actually a dynamic page? Then you could have an if/else that outputs different rules for the different subdomains.
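
That idea can be sketched as a small WSGI application that answers `/robots.txt` itself and varies the rules by `Host` header. This assumes the main site lives at `mysite.com`/`www.mysite.com` and treats every other hostname as an affiliate subdomain; the host list and function names are hypothetical. In a real deployment the web server would route `/robots.txt` to this handler instead of to a static file.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical hosts whose pages should stay crawlable.
MAIN_HOSTS = {"mysite.com", "www.mysite.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow = nothing blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" = every path blocked


def robots_txt_for_host(host_header: str) -> str:
    """Return robots.txt rules for the virtual host named in the Host header."""
    host = host_header.split(":", 1)[0].strip().lower()
    # Unknown or missing hosts fall through to BLOCK_ALL (the safe default).
    return ALLOW_ALL if host in MAIN_HOSTS else BLOCK_ALL


StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Minimal WSGI app: dynamic /robots.txt, 404 for everything else."""
    if environ.get("PATH_INFO") == "/robots.txt":
        body = robots_txt_for_host(environ.get("HTTP_HOST", "")).encode("ascii")
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

For local testing, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` can serve it; a request for `/robots.txt` with `Host: kiwi.mysite.com` should then get the block-all rules, while `www.mysite.com` gets an empty `Disallow:`.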

Otherwise, I suppose you could stick a
<meta name="robots" content="noindex,nofollow" />
in the <head> of each page, which fits in with how you're building the pages, but I'm not dead sure which spiders support it. Google seems to, so that'll cover most of the people who'd find them.
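
Because the pages are assembled per request anyway, the meta tag can be emitted only when the request arrives on an affiliate subdomain, so the main site's pages stay indexable. A minimal sketch, assuming a hypothetical `MAIN_HOSTS` set and a made-up `render_head` helper; the `content` value is the one from the post, although `noindex,follow` would keep the page out of the index while still letting spiders follow its links back to the main site.

```python
import html

# Hypothetical hosts that should remain indexable.
MAIN_HOSTS = {"mysite.com", "www.mysite.com"}
NOINDEX_TAG = '<meta name="robots" content="noindex,nofollow" />'


def robots_meta_for_host(host_header: str) -> str:
    """Return the robots meta tag for affiliate hosts, or '' for the main site."""
    host = host_header.split(":", 1)[0].strip().lower()
    return "" if host in MAIN_HOSTS else NOINDEX_TAG


def render_head(host_header: str, title: str) -> str:
    """Build a <head> element, adding the noindex tag only where needed."""
    lines = ["<head>", f"<title>{html.escape(title)}</title>"]
    meta = robots_meta_for_host(host_header)
    if meta:
        lines.append(meta)
    lines.append("</head>")
    return "\n".join(lines)


print(render_head("mango.mysite.com", "Mango deals"))
```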

#6
January 29th, 2004, 02:20 PM
pk_synths
Quote:
Can the server be set up to serve up something that says it's a .txt file but is actually a dynamic file? Then you could have an if/else for the different info you need for the different sub-domains.


Hadn't thought of that. Thanks a lot. I'll see if it works and post my findings (if I remember, lol).

#7
January 30th, 2004, 10:18 AM
PaulS
I've just remembered: if you use the meta robots method, your pages will still get visited by the spider (at least the ones it finds links to), but they just won't get indexed. As you can guess, it has to visit a page in order to be told to ignore it. This doesn't happen if they're blocked by a robots.txt file, because the spider checks robots.txt before requesting any pages on the host.
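
The difference in ordering can be modelled with a toy crawler: the robots.txt check happens before any request is made, while the meta tag can only be read after the page has been downloaded. This is a simplified sketch, not how any particular search engine is implemented; `crawl`, `RobotsMetaFinder` and the `fetch` callback are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Callable, Set, Tuple
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from any <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawl(url: str, robots_txt: str, fetch: Callable[[str], str],
          agent: str = "Googlebot") -> Tuple[bool, bool]:
    """Return (fetched, indexed) for one URL under this simplified model."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return False, False  # robots.txt: skipped before any request is made
    page = fetch(url)  # meta robots: the page must be downloaded first...
    finder = RobotsMetaFinder()
    finder.feed(page)
    return True, "noindex" not in finder.directives  # ...then kept out of the index


PAGE = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
requests_made = []


def fake_fetch(url: str) -> str:
    requests_made.append(url)
    return PAGE


print(crawl("http://kiwi.mysite.com/", "User-agent: *\nDisallow: /\n", fake_fetch))  # (False, False)
print(crawl("http://kiwi.mysite.com/", "User-agent: *\nDisallow:\n", fake_fetch))    # (True, False)
print(len(requests_made))  # 1
```

The first call never touches `fake_fetch`; the second downloads the page and only then learns it should not be indexed, which is the bandwidth trade-off described above.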

#8
January 30th, 2004, 11:15 AM
pk_synths
Quote:
I've just remembered - if you use the meta robots method your pages will still get visited by the spider (at least the ones it finds links to) but they just won't get indexed - as you can guess, it has to visit the pages in order to be told to ignore them. This doesn't happen if they're blocked by a robots.txt file.


I don't mind if they get visited; I have enough bandwidth to cover that. I just don't want them indexed. Some content is very similar, and it seems that Google indexes the first copy it gets to and boots out the content I want indexed because it considers the other copy newer. I have pages getting kicked out and then reindexed every other day. I want my main site's content to be indexed, not an affiliate's. Thanks a lot for all the help.

© 2003-2010 by Developer Shed. All rights reserved.