Google Optimization
 
  #1  
August 2nd, 2008, 04:16 PM
DanielR
Problems with Robots.txt - Please Help!

Hi, we recently installed some filters on our website that allow customers to select groups of products based on type, price, etc. These filters add query strings to the end of the URL when you click on them. Our problem now is that the same content is getting indexed over and over, and it's affecting our rankings.

I've tried to block these filter URLs through robots.txt, but it's affecting our sitemap in Webmaster Tools. Can someone with more experience please explain the proper way to modify our robots.txt so that it blocks URLs with these filter parameters but still allows the sitemap URLs to be crawled?

The problem is that both our filter URLs and our sitemap URLs use parameters in a similar fashion. I can't figure out how to block one without blocking the other.

An example of a Filter URL I would like to block:
http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4

An example of the URL for our sitemap that we want to allow:
http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1

Here's what I've tried:
1) Disallow: /*? (This was my first try. It blocked the duplicate pages, but it ALSO blocked my sitemap from Google, since both URLs use the ?.)

2) Disallow: /*?*.aspx (I thought this would only block the .aspx pages and leave the sitemap crawlable, but it does not. Sitemap and filter URLs were both blocked.)

3) Disallow: /*?pagesize (This did not work either. Webmaster Tools still tells me that it could not crawl the sitemap because it is blocked by robots.txt.)

4)
Disallow: /*? (To block everything with a ?)
Allow: /*googleentity (To allow the path to our sitemap)
Method 4 did not work either. After waiting an hour or so, I checked Webmaster Tools again, and it's still blocking our sitemap URLs.

Can someone please help me with this? What can I put in robots.txt that will block Google from crawling the duplicate pages but still allow the sitemap to be crawled as normal?

Thanks so much for your attention to my problem. It's much appreciated.
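
For reference, the four attempts above can be checked offline with a small matcher. This is only a sketch, and it assumes the matching rules Google documents for its own crawler: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The function names `pattern_to_regex` and `is_allowed` are made up for this illustration, and other crawlers may behave differently.

```python
import re
from urllib.parse import urlsplit

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex (assumed Google-style rules)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each '*' becomes '.*'; everything else is matched literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(url, rules):
    """Return True if the URL may be crawled under the given (directive, pattern) rules."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a robots.txt prefix match.
        if not pattern_to_regex(pattern).match(target):
            continue
        is_allow = directive.lower() == "allow"
        # Longest pattern wins; on equal length, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed

FILTER_URL = "http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4"
SITEMAP_URL = "http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1"

attempts = {
    1: [("Disallow", "/*?")],
    2: [("Disallow", "/*?*.aspx")],
    3: [("Disallow", "/*?pagesize")],
    4: [("Disallow", "/*?"), ("Allow", "/*googleentity")],
}

for number, rules in attempts.items():
    print(number, "filter allowed:", is_allowed(FILTER_URL, rules),
          "| sitemap allowed:", is_allowed(SITEMAP_URL, rules))
```

Under these assumed rules, attempts 3 and 4 block the filter URL and leave the sitemap URL crawlable, while attempt 2 matches neither URL because ".aspx" never appears after the "?". If Webmaster Tools still reports the sitemap as blocked, one possible explanation is that it is working from a cached copy of an earlier robots.txt, so an hour may not be long enough to see the change.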
  #2  
August 2nd, 2008, 08:43 PM
Powerspirit
You need to use a different approach to this problem. Keep the sitemap URLs exactly as you have them now, and clear out any robots.txt lines you added as part of this solution. Then, modify the actual code of your site to output a
Code:
<META NAME="ROBOTS" CONTENT="NOINDEX">
tag in the header if one of the modifier parameters is present in the URL. Here is some PHP code to get you started on having myfile.php?catid=3 indexed, but not myfile.php?catid=3&page=6:

Code:

<html>
<head>
...
<?php
// Emit the noindex tag only when the "page" modifier parameter is present.
if (isset($_GET['page'])) {
    echo '<META NAME="ROBOTS" CONTENT="NOINDEX">';
}
?>
...
other metatags etc.
....
</head>
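
The same decision can be written independently of any framework. The Python sketch below is only an illustration of the idea, not the site's actual code: `FILTER_PARAMS`, `NOINDEX_TAG`, and `robots_meta_for` are hypothetical names, and the parameter list is an assumption taken from the example filter URL in the first post.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed filter parameters, taken from the example filter URL in this thread.
FILTER_PARAMS = {"pagesize", "genreids"}

NOINDEX_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX">'

def robots_meta_for(url):
    """Return the noindex tag if the URL carries any filter parameter, else ''."""
    # keep_blank_values=True so that "?pagesize=" still counts as a filter URL.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if FILTER_PARAMS & {name.lower() for name in query}:
        return NOINDEX_TAG
    return ""

print(robots_meta_for(
    "http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4"))
print(repr(robots_meta_for(
    "http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1")))
```

Note that the tag only has an effect if the crawler can fetch the page, which is why the robots.txt rules for these URLs have to be removed first.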
  #3  
August 2nd, 2008, 09:04 PM
Emerson
You can try this one:
1. Disallow: /*pagesize

Do not use a bare ?, as it will block all dynamically created pages. In my opinion, this one will block any URL containing "pagesize".