#1
  1. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date: Dec 2005 | Posts: 204 | Rep Power: 12

    Duplicate content issues PLEASE HELP


    I was using a tool to generate an XML sitemap yesterday, and I got a warning that there was a duplicate content issue between my www.mysite.(com) and http://www.mysite.(com), as well as a few other pages.
    Huh?
    I have set the preferred domain in the Google webmaster section, and I have this in my .htaccess:
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
    RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

    So I forgot about it for the time being, but today I ran another tool to check the site, and here are the errors I am getting.
    1.)
    WWW/NonWWW Header Check: FAILED
    Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing sitewide duplicate content penalties.

    2.)
    Similarity Check: FAILED
    Google indicates that it has "omitted some entries very similar" to the top 1000 pages on your site. This similarity is a duplicate content penalty preventing these pages from being considered uniquely valuable in Google's index.

    3.)
    Default Page Check: FAILED
    You have not standardized your default pages meaning the following versions of your url return a 200/OK Header, which may cause duplicate content issues. The following extensions work:
    http://www.mysite.(com)/index.html
    http://www.mysite.(com)/

    My question is: how can I verify these results, and should I worry about them?
    I am aware that pages should be as unique as they can be.
    Please point me in the right direction.
  2. #2
  3. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date: Jul 2005 | Location: Canada | Posts: 762 | Rep Power: 22

    I believe there may be an error in your RewriteCond.

    Anchors:
    ^ Start-of-line anchor
    $ End-of-line anchor
    So in your case, ^mysite\.com$ only matches when the server string is exactly mysite.com, with nothing after it (no trailing slash, file name, or anything else).

    Get rid of the trailing dollar sign.
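
    For reference, here is a minimal sketch of the rule with that change applied. It assumes the goal is still to send the bare domain to the www host, as in the first post, and changes nothing else. Keep in mind that %{HTTP_HOST} holds only the Host request header (the host name, sometimes followed by a port), never the path.

    ```apache
    RewriteEngine On
    # Condition on the Host header: starts with "mysite.com", case-insensitive.
    # The trailing $ is dropped, so anything may follow the host name.
    RewriteCond %{HTTP_HOST} ^mysite\.com [NC]
    # Permanently redirect the requested path to the same path on the www host.
    RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
    ```

    Without the $, the condition also matches a Host value such as mysite.com:80.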

    Comments on this post

    • tommr1 agrees
  4. #3
  5. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date: Dec 2005 | Posts: 204 | Rep Power: 12

    OK Great.
    That solved the first problem.
    Well Done!!!

    Now I need to figure out where the similar content is and the last issue too.

    One flaming hoop at a time!
  6. #4
  7. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date: Jul 2005 | Location: Canada | Posts: 762 | Rep Power: 22

    Duplicate content may be related to #1 if both versions were indexed. The next crawl by Google should sort that out.

    As for #3 .. who cares? I have never done anything special with index.* (with the exception of NEVER linking to the default file by name).

    Where it has happened ... Google has been smart enough to figure it out and drop the duplicate reference to index.*
  8. #5
  9. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date: Dec 2005 | Posts: 204 | Rep Power: 12

    Thank you.
  10. #6
  11. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date: Dec 2005 | Posts: 204 | Rep Power: 12

    Originally Posted by rtchar:
      Duplicate content may be related to #1 if both versions were indexed. The next crawl by Google should sort that out.

      As for #3 .. who cares? I have never done anything special with index.* (with the exception of NEVER linking to the default file by name).

      Where it has happened ... Google has been smart enough to figure it out and drop the duplicate reference to index.*

    I do not link back to my site's index.html, but there are links to subdirectories and to the index.html files there, for example /links/index.html.
    Should I change this to /links/?
  12. #7
  13. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date: Jul 2005 | Location: Canada | Posts: 762 | Rep Power: 22

    As long as you are consistent, it does not matter ...

    Usually this is only a problem at the root because other webmasters are creating links into your site. Some will use www or filenames and some will not ...

    Use one form or the other, just do it the same every time.
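
    If you would rather enforce the trailing-slash form on the server than rely on consistent linking alone, a common approach is a 301 from any explicitly requested index.html to its directory. A minimal sketch, assuming Apache mod_rewrite in the root .htaccess, index.html as the DirectoryIndex, and the www.mysite.com host from the first post:

    ```apache
    RewriteEngine On
    # Act only when the client's original request line names index.html,
    # e.g. "GET /links/index.html HTTP/1.1".
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
    # $1 captures the directory part ("links/", or empty at the root).
    RewriteRule ^(([^/]+/)*)index\.html$ http://www.mysite.com/$1 [R=301,L]
    ```

    The %{THE_REQUEST} condition looks at what the browser actually asked for, so the server's own internal lookup of index.html for a directory request does not trigger the redirect again. Try it on a test copy of the site before relying on it.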
  14. #8
  15. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date: Dec 2005 | Posts: 204 | Rep Power: 12

    Thanks,
    I found some links in my .htaccess that went to mysite/index.html, so I changed them to mysite/.

    I also found other problems with some of the redirects: there were links on my site going to old files which I had removed and redirected with a 301. I cleaned it up.
    What a mess.

    I only wonder if this could be part of my problems.

    We have clawed our way back to page 4 or high page 5 from page 17 a week ago.

    Thanks for sending me in the right direction.
    Who would have thought that the thing I need the most right now, "$", could cause such a mess?

    Now I had better look around and see if I can help out in some way.
