#1
Problems with Robots.txt - Please Help!
Hi, we recently installed some filters on our website that allow customers to select groups of products based on type, price, etc. These filters add query strings to the end of the URL when you click on them. Our problem now is that the same content is getting indexed over and over, and it's affecting our rankings.

I've tried to block these filter URLs through robots.txt, but it's affecting our sitemap in Webmaster Tools. Can someone with more experience please explain the proper way to modify our robots.txt so that it blocks URLs with these filter parameters but still allows the sitemap URLs to be crawled? The problem is that both our filter URLs and our sitemap URLs use parameters in a similar fashion, and I can't figure out how to block one without blocking the other.

An example of a filter URL I would like to block:
http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4

An example of the URL for our sitemap that we want to allow:
http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1

Here's what I've tried:

1) Disallow: /*?
(This was my first try. It worked to block the duplicate pages, but it ALSO blocked my sitemap from Google, since both URLs use the ?.)

2) Disallow: /*?*.aspx
(I thought this would only block the .aspx pages and leave the sitemap crawlable, but it does not. Sitemap and filter URLs were both blocked.)

3) Disallow: /*?pagesize
(This did not work either. Webmaster Tools still tells me that it could not crawl the sitemap because it is blocked by robots.txt.)

4) Disallow: /*? (to block everything with a ?)
Allow: /*googleentity (to allow the path to our sitemap)
(Method 4 did not work either. After waiting an hour or so I checked Webmaster Tools again, and it's still blocking our sitemap URLs.)

Can someone please help me with this? What can I put in robots.txt that will block Google from crawling the duplicate pages but still allow the sitemap to be crawled like normal?

Thanks so much for your attention to my problem. It's much appreciated.
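One way to sanity-check candidate rules without waiting on Webmaster Tools is to model the matching locally. The sketch below is a simplified approximation of the matching behaviour Google documents for robots.txt (`*` as a wildcard, an optional trailing `$`, the longest matching pattern wins, and Allow wins a tie). It is not Google's parser, and the names `rule_to_regex`, `is_allowed`, and `ATTEMPTS` are made up for illustration.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url, rules):
    """Return True if `url` may be crawled under `rules`.

    `rules` is a list of ("allow" | "disallow", pattern) pairs.
    """
    parts = urlsplit(url)
    # Rules are matched against the path plus the query string.
    target = parts.path + ("?" + parts.query if parts.query else "")
    best = None  # (pattern length, is_allow) of the winning rule so far
    for kind, pattern in rules:
        if pattern and rule_to_regex(pattern).match(target):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; on equal length, allow (True) beats disallow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


FILTER_URL = "http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4"
SITEMAP_URL = "http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1"

# The four attempts from the post, keyed by their number.
ATTEMPTS = {
    1: [("disallow", "/*?")],
    2: [("disallow", "/*?*.aspx")],
    3: [("disallow", "/*?pagesize")],
    4: [("disallow", "/*?"), ("allow", "/*googleentity")],
}

for number, rules in ATTEMPTS.items():
    print(number,
          "filter crawlable:", is_allowed(FILTER_URL, rules),
          "| sitemap crawlable:", is_allowed(SITEMAP_URL, rules))
```

Under this model, attempts 3 and 4 block the filter URL while leaving the sitemap URL crawlable, attempt 1 blocks both, and attempt 2 blocks neither. If a live tool reports something different, one possible cause is that the crawler is still working from a cached copy of robots.txt, which can take considerably longer than an hour to refresh.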
#2
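You need to use a different approach to this problem. Keep the sitemap URLs exactly as you have them now, and clear out any robots.txt lines you added as part of this solution. The robots.txt lines have to go because a crawler can only see a NOINDEX tag on pages it is allowed to fetch. Then modify the actual code of your site to output a
Code:
<META NAME="ROBOTS" CONTENT="NOINDEX">
tag in the head of every page that is requested with a filter parameter. For example, in PHP:
Code:
<html>
<head>
...
<?php
// 'pagesize' is the filter parameter from the example URL; adjust for your own filters.
if (isset($_GET['pagesize']))
{
    echo '<META NAME="ROBOTS" CONTENT="NOINDEX">';
}
?>
...
other metatags etc.
....
</head>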
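The idea in this reply (emit a NOINDEX tag only when a filter parameter is present in the query string) does not depend on PHP. As a minimal sketch in Python, assuming the filter parameters are the `pagesize` and `genreids` seen in the example URL, the decision could look like this. `FILTER_PARAMS` and `robots_meta_for` are hypothetical names, not part of any framework.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed filter parameter names, taken from the example filter URL in the question.
FILTER_PARAMS = {"pagesize", "genreids"}


def robots_meta_for(url):
    """Return a noindex meta tag for filtered URLs, or an empty string otherwise."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if any(name in query for name in FILTER_PARAMS):
        return '<meta name="robots" content="noindex">'
    return ""


print(robots_meta_for(
    "http://www.mydomain.com/c-1-my-categorie.aspx?pagesize=100&genreids=4"))
print(repr(robots_meta_for(
    "http://www.mydomain.com/googleentity.aspx?entityname=Category&entityid=1")))
```

The sitemap URL carries `entityname` and `entityid` only, so it gets no tag, while any category page requested with a filter parameter does.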
#3
You can try this one:
Disallow: /*pagesize
Do not use a bare ?, as it will block every dynamically generated page. In my opinion, this rule will block any URL containing "pagesize".