SEO Test and Experimentation
 
#1
August 17th, 2007, 07:20 AM
pro_seo
Blocking Dynamic URLs with Robots.txt

Ok friends,

Which directive in robots.txt actually works for blocking dynamic URLs?

We have gone back and forth on this many times in the forums, and there seem to be varied answers to this question.

Some say that:

user-agent: *
disallow: /filename.php

will block not only the filename.php file but also any query strings/parameters attached to it.

While some others (like me) say that

user-agent: *
disallow: /filename.php*

is the one which works.
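
One way to see why the two camps disagree is to apply plain prefix matching, as described in the original 1994 robots.txt convention, to both patterns. This is only a sketch under that assumption: blocked_by_prefix is a hypothetical helper, not any crawler's real code, and a crawler that supports wildcards would read the trailing * differently.

```python
# Hypothetical helper: plain prefix matching, assuming a crawler that
# follows the original 1994 convention and has no wildcard support.
def blocked_by_prefix(disallow_value: str, url_path: str) -> bool:
    """Return True if the URL path starts with the Disallow value."""
    return url_path.startswith(disallow_value)


urls = ["/filename.php", "/filename.php?id=5"]

for rule in ["/filename.php", "/filename.php*"]:
    for url in urls:
        # Print each rule/URL pair and whether the rule would block it.
        print(rule, url, blocked_by_prefix(rule, url))
```

Under this literal reading the first rule already covers the query-string URLs, while the trailing * is treated as an ordinary character and matches nothing here. Which reading a given bot uses is exactly what the proposed test would have to establish.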

So I thought of starting a small test to see which command is really effective in blocking Dynamic URLs through the robots.txt file.

The test will go like this:

We'll take a domain that has dynamic pages in it.

Then we'll try to block those dynamic pages with both of the above-mentioned commands, plus any others that any of you can suggest.

We'll apply those commands one at a time and see which one really blocks those dynamic pages.

I believe that this test will be an eye-opener for all those like me who are still in this dilemma.

Thoughts ??

Thanks!

#2
August 17th, 2007, 07:28 AM
lovekills_s
Sure, that sounds nice. We need a volunteer, though.

#3
August 17th, 2007, 08:05 AM
dessign
You should also try

user-agent: *
disallow: /filename.php?



Cheers

#4
August 17th, 2007, 08:10 AM
pro_seo
This thread is already #1 in Google

Boy...the Gbot is devouring threads of SEOchat like crazy...

I made this thread about 30 mins ago..and it's already ranking at #1 for the keyphrase

"Blocking Dynamic URLs Through Robots.txt" ...I feel that I chose the right Title for the thread



This also shows that Google indeed meant it when they said that they are now following "Minty Fresh Indexing".

#5
August 17th, 2007, 08:25 AM
pro_seo
Quote:
Originally Posted by dessign
You should also try

user-agent: *
disallow: /filename.php?



Cheers


Thanks !

We'll surely try that out as well.

Any volunteers care to spare a domain for the test ??

#6
August 17th, 2007, 08:54 AM
JagNet
I've just run:
Code:
user-agent: *
disallow: /filename.php?

through the robots.txt analysis tool in Google's webmaster tools and it shows that:
/filename.php with no query string is allowed, whilst
/filename.php?id=5 is blocked.

From personal experience I've done something similar on an existing site:
Code:
user-agent: *
disallow: /contact.php?id=

Having only one variable in use for that file meant I could include that in the robots.txt file. Since using this the existing indexed pages using the variable have been dropped from Google's index, whilst the main page without a query string has remained.
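
The result above is consistent with a simple prefix match against the path plus query string. The sketch below assumes that model; example.com, path_with_query and is_blocked are made up for illustration, and real crawlers may normalise URLs differently.

```python
from urllib.parse import urlsplit


def path_with_query(url: str) -> str:
    """Return the part of the URL a robots.txt rule is compared against."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def is_blocked(disallow_value: str, url: str) -> bool:
    """Assumed model: the rule blocks a URL when it is a prefix of path+query."""
    return path_with_query(url).startswith(disallow_value)


checks = [
    ("/filename.php?", "http://example.com/filename.php"),
    ("/filename.php?", "http://example.com/filename.php?id=5"),
    ("/contact.php?id=", "http://example.com/contact.php?id=7"),
    ("/contact.php?id=", "http://example.com/contact.php?page=2&id=7"),
]

for rule, url in checks:
    print(rule, url, is_blocked(rule, url))
```

The last check is the reason the single-variable condition matters: once another parameter can come first, a rule ending in ?id= no longer matches as a prefix.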

#7
August 17th, 2007, 09:02 AM
pro_seo
Quote:
Originally Posted by JagNet
I've just run:
Code:
user-agent: *
disallow: /filename.php?

through the robots.txt analysis tool in Google's webmaster tools and it shows that:
/filename.php with no query string is allowed, whilst
/filename.php?id=5 is blocked.

From personal experience I've done something similar on an existing site:
Code:
user-agent: *
disallow: /contact.php?id=

Having only one variable in use for that file meant I could include that in the robots.txt file. Since using this the existing indexed pages using the variable have been dropped from Google's index, whilst the main page without a query string has remained.


Thanks for your input

More observations...anybody ?

#8
August 24th, 2007, 06:12 AM
netSEO
Nice Info...

Thanks

#9
August 24th, 2007, 10:22 AM
Galen
Use this

Code:
user-agent: *
disallow: /*filename.php


Code:
http://www.skatevideosonline.net/filename.php?id=23   Blocked by line 4: Disallow: /*filename.php
http://www.skatevideosonline.net/filename.php         Blocked by line 4: Disallow: /*filename.php
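
For bots that support the wildcard extension, the matching can be approximated by turning the rule into a regular expression. This is a rough sketch of that idea, not Google's actual implementation; is_blocked and wildcard_rule_to_regex are hypothetical names, and the assumed semantics are that * matches any run of characters and a trailing $ anchors the end of the URL.

```python
import re


def wildcard_rule_to_regex(disallow_value: str):
    """Translate a rule with * and an optional trailing $ into a regex."""
    anchored = disallow_value.endswith("$")
    body = disallow_value[:-1] if anchored else disallow_value
    # Escape the literal pieces and join them with ".*" for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_blocked(disallow_value: str, url_path: str) -> bool:
    """True if the rule matches from the start of the path (assumed semantics)."""
    return wildcard_rule_to_regex(disallow_value).match(url_path) is not None


for path in ["/filename.php", "/filename.php?id=23", "/dir/myfilename.php"]:
    print(path, is_blocked("/*filename.php", path))
```

Note that under these assumptions the leading wildcard is broad: it also catches paths such as /dir/myfilename.php, which may be more than intended.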

#10
August 24th, 2007, 02:21 PM
pro_seo
Thanks Galen

#11
August 24th, 2007, 05:04 PM
Jean-Luc
Quote:
Originally Posted by pro_seo
What command in Robots.txt exactly works for blocking Dynamic URLs??

Since 1994, there has been a universally accepted standard for robots.txt, and it is defined here: A Standard for Robot Exclusion. Yes, it is old. Yes, it is just one page. Yes, it looks like a personal web site, but it completely defines the standard and every serious robot designer refers to it.

Carefully read this page and you will know what command in robots.txt should work for blocking dynamic URLs. On top of that, Google, Yahoo, Microsoft and others have all defined their own private extensions to this standard, but they did not mutually agree on these extensions.

If you use these private extensions in the part of your robots.txt that follows "User-agent: *", expect that a few robots will understand them and many will not. By the way, if you want to check this, you have to look at what several bots do, not only Googlebot.

I would recommend using these private extensions only after a User-agent line naming a robot that supports them.

Private extensions to the standard include:
- * used as a wildcard
- $ used as a mark for the end of the URL
- the Allow: directive
- the Crawl-delay: directive
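
The recommendation above can be checked mechanically. The following is a minimal sketch, assuming the four extensions listed are the only ones of interest; the function name and the sample file are invented for illustration. It reports every line in a "User-agent: *" group that relies on one of them.

```python
def find_extensions_in_wildcard_group(robots_txt: str) -> list:
    """Return (line_number, line) pairs that use extensions under User-agent: *."""
    findings = []
    agents = []             # user agents of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are read
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                agents = []  # a User-agent line after rules starts a new group
            agents.append(value)
            reading_agents = True
            continue
        reading_agents = False
        if "*" not in agents:
            continue  # extensions are acceptable in bot-specific groups
        if field in ("allow", "crawl-delay"):
            findings.append((number, line))
        elif field == "disallow" and ("*" in value or value.endswith("$")):
            findings.append((number, line))
    return findings


# Invented sample: extensions are fine for Googlebot, flagged for "*".
SAMPLE = """\
User-agent: Googlebot
Disallow: /*filename.php
Allow: /public/

User-agent: *
Disallow: /filename.php
Disallow: /*.pdf$
Crawl-delay: 10
"""

for number, line in find_extensions_in_wildcard_group(SAMPLE):
    print(number, line)
```

Here only the wildcard rule and the Crawl-delay line in the generic group are reported; the same directives under the Googlebot group are left alone.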

Jean-Luc

#12
November 13th, 2007, 06:18 AM
Emerson
Quote:
Originally Posted by JagNet
I've just run:
Code:
user-agent: *
disallow: /filename.php?

through the robots.txt analysis tool in Google's webmaster tools and it shows that:
/filename.php with no query string is allowed, whilst
/filename.php?id=5 is blocked.

From personal experience I've done something similar on an existing site:
Code:
user-agent: *
disallow: /contact.php?id=

Having only one variable in use for that file meant I could include that in the robots.txt file. Since using this the existing indexed pages using the variable have been dropped from Google's index, whilst the main page without a query string has remained.


So you mean that, for example:

/shopping_cart.php?

would block all shopping cart pages, so that none of them are crawled, including dynamically generated pages?

Does the above command prevent all shopping cart pages from being indexed?

Thanks.

#13
November 13th, 2007, 06:22 AM
JagNet