#1
Robots.txt question
Okay, here's a brain teaser (or not). I have added some affiliates to my site and use subdomains to separate each affiliate's layout and for tracking purposes. The problem is that I noticed some of them are getting indexed, and that's not good. What would you recommend as the best way to get every page in a subdomain disallowed from spiders? These are virtual subdomains, BTW.
Thanks for all the help, people <- scary a$$ icon
#2
|
||||
|
||||
|
Don't use that eggs and hot dog smilie... you know how much I hate it.
[I feel the rage building up inside of me.....]
#3
Yup, robots.txt is going to be the best way of stopping the spiders on a large number of pages. If they are subdomains then I'm guessing they're something like...
banana.mysite.com
mango.mysite.com
kiwi.mysite.com
In which case they will all have their own root directories, so pop a robots.txt file in each with...
User-agent: *
Disallow: /
in it, and that'll stop dem pesky spiders in their tracks (at least all the ones you have to worry about).
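If you want to sanity-check what those two lines do before relying on them, here is a minimal Python sketch using the standard library's urllib.robotparser. The hostnames are the guessed ones from this post; the page path and the "Googlebot" user-agent string are just assumptions for the example. Keep in mind that a crawler asks each hostname for its own /robots.txt, which is why the file has to be answered per subdomain.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above, as each subdomain would serve it.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a rule-abiding crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page path; hostnames are the illustrative ones from the post.
for host in ("banana.mysite.com", "mango.mysite.com", "kiwi.mysite.com"):
    url = f"http://{host}/index.jhtml"
    print(host, is_crawlable(BLOCK_ALL, url, "Googlebot"))
```

This only models crawlers that honour robots.txt; anything that ignores the file is unaffected.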
__________________
Work: Web Positioning Centre ---- Spiderability test & keyword report: Spider Test
#4
|
||||
|
||||
|
Thanks, but these are dynamic sites, so there are no root directories because they are virtual subs.
The subs are what you stated, but the index page looks at which sub a user is in, loads the appropriate graphics and content, and compiles it. The same index page is used for all sites and domains. It loads the right droplets. Kinda like:
<droplet>
  if <param name="bananas.mysite.com">
    load header="/bananas/header.jhtml"
    load body="/bananas/body.jhtml"
    load footer="/bananas/footer.jhtml"
  if <param name="apples.mysite.com">
    load header="/apples/header.jhtml"
    load body="/apples/body.jhtml"
    load footer="/apples/footer.jhtml"
</droplet>
See?? Tough one. Thanks
#5
|
|||
|
|||
|
Can the server be set up to serve up something that says it's a .txt file but is actually a dynamic file? Then you could have an if/else that returns different rules for the different sub-domains.
Otherwise, I suppose you could stick a <meta name="robots" content="noindex,nofollow" /> in the page headers, which fits in with how you're building the pages, but I'm not dead sure which spiders support it. Google seems to, so that should cover most of the people finding them.
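To make the first idea a bit more concrete, here is a rough sketch in Python (purely for illustration; your .jhtml setup will look different). It assumes the server can route /robots.txt to a dynamic handler and that the requested hostname is available from the Host header. MAIN_HOSTS and the function names are made up for this example. Anything that is not the main site gets the block-everything rules.

```python
# Hypothetical: hostnames that SHOULD stay indexable (an assumption for this sketch).
MAIN_HOSTS = {"mysite.com", "www.mysite.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow = nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # block the whole host


def robots_txt_for_host(host_header: str) -> str:
    """Pick the robots.txt body from the Host header (port and case ignored)."""
    host = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    return ALLOW_ALL if host in MAIN_HOSTS else BLOCK_ALL


def app(environ, start_response):
    """Minimal WSGI app: answers /robots.txt dynamically, 404 for anything else."""
    if environ.get("PATH_INFO") == "/robots.txt":
        body = robots_txt_for_host(environ.get("HTTP_HOST", "")).encode("ascii")
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

Defaulting unknown hosts to "block" means a newly added affiliate subdomain is kept out without anyone having to remember to update a list. If you would rather fail open, swap the two return values.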
#6
|
||||
|
||||
|
Quote:
Hadn't thought of that. Thanks a lot. I'll see if it works. I'll post my findings (if I remember, lol)
#7
|
|||
|
|||
|
I've just remembered - if you use the meta robots method your pages will still get visited by the spider (at least the ones it finds links to) but they just won't get indexed - as you can guess, it has to visit the pages in order to be told to ignore them. This doesn't happen if they're blocked by a robots.txt file.
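To illustrate why the page has to be fetched first, here is a small Python sketch of roughly what a spider has to do to notice the tag: download the HTML, parse it, and only then find the directive. The class and function names are made up for the example, and real crawlers will certainly handle more cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an already-fetched page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical affiliate page that has already been downloaded by the crawler.
page = '<html><head><meta name="robots" content="noindex,nofollow" /></head><body>hi</body></html>'
print(robots_directives(page))
```

The input to robots_directives is the page body itself, so the visit (and the bandwidth) has already happened by the time the spider learns it should not index the page.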
#8
I don't mind if they get visited; I have enough bandwidth to cover that. I just don't want them indexed. Some content is very similar, and it seems that Google indexes the first copy it gets to and boots out the content I want indexed because it considers the other copy to be newer. I have pages getting kicked out and then reindexed every other day. I want my main site's content to be indexed, not an affiliate's. Thanks a lot for all the help.
Viewing: SEO Chat Forums > Search Engine Strategies > Search Engine Spiders > Robots.txt question