#1
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Posts
    33
    Rep Power
    15

Duplicate content and PR? Any way out?


Another post mentioned duplicate content and PR, and I'm curious how duplicate content might affect PR. We have old pages from when we used a certain script and database to create static pages. We have since added so many part numbers that we don't dare write over that old directory, because it is cached in Google and we get lots of business from that sub-directory of pages. So we had to create a whole new directory for the new pages, but along with the new pages comes a lot of near-duplicate data. We are working on eventually migrating everything Google has cached over to the new data, but we still have about 90,000 old static pages out there. The last thing I'd ever want to do is spam Google with duplicate data, but the changeover is taking a long time; I've been at it for 6 months now.
So how close does content have to be to count as duplicate? What might I add to the body or the head to keep the pages from being flagged as duplicates? Would an automatic referral from the old page to the near-duplicate new page be a no-no? And would Google drop the old page from its cache and then not index the page it redirects to?

    I know questions... questions...

Thanks for your time, and any responses are most appreciated!

    Scott
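On the "auto referral" question: the standard mechanism for pointing an old URL at its replacement is a server-side 301 (permanent) redirect, which tells visitors and crawlers alike that the move is permanent. A minimal sketch for an Apache .htaccess file, assuming hypothetical /oldpages/ and /newpages/ directory names:

    # Redirect a single old static page to its replacement (mod_alias).
    Redirect 301 /oldpages/12345.html http://www.example.com/newpages/PART-12345.html

    # Or redirect the whole directory in one rule (mod_rewrite).
    RewriteEngine On
    RewriteRule ^oldpages/(.*)$ /newpages/$1 [R=301,L]

Whether Google then drops the old page from its cache and picks up the new one is on Google's side; the redirect only states the relationship.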
#2
    Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Jul 2003
    Posts
    834
    Rep Power
    17
Change whatever you can. The longer it's been there, the more likely it is that nothing is going to happen.
SootleDir - When only the finest link will do | Webmaster Forum
#3
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Posts
    11
    Rep Power
    0
Try a Unix command-line tool like sed to change your files. You may need 10 minutes to install Cygwin on Windows and an hour to learn sed.
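To make that concrete, here is a minimal sed sketch for batch-editing static pages; the file names and the phrase being replaced are made-up examples:

    # Replace one phrase in every .html file in the current directory
    # (GNU sed; -i edits each file in place, so back up first).
    sed -i 's/old product blurb/new product blurb/g' *.html

    # Portable variant for a sed without -i:
    for f in *.html; do
        sed 's/old product blurb/new product blurb/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done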
    Last edited by fathom; Jul 1st, 2004 at 05:03 AM. Reason: Intentional link spam
#4
    EGOL
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Jun 2003
    Posts
    9,689
    Rep Power
    2482
Can you rewrite right over the old file names? Then you would have updated content and not duplicates?
    * "It's not the size of the dog in the fight that matters, it's the size of the fight in the dog." Mark Twain
    * "Free advice isn't worth much. Cheap advice is worth even less." EGOL
#5
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Posts
    41
    Rep Power
    15

The same problem


    Hello everyone,

I have the same problem. I manage a content management system that creates static .html pages and at the same time updates backup sites with the same structure. What I have done is put a robots meta tag ( index, nofollow ) on the backup sites' pages to keep them from being indexed and avoid the risk of being considered duplicate content, but I do not know if this is enough.

First question: would the dynamic version of my sites be considered duplicate content?
Second question: am I at risk with my backup sites even though I put the robots " nofollow " on them?

    Regards to all,
    Mariano.
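On the tag itself: as post #8 further down also points out, the value that keeps a page out of the index is noindex, not nofollow. A minimal sketch of the head of one backup page (the title is a made-up example):

    <head>
        <title>Backup copy of an example page</title>
        <!-- noindex: do not list this page in search results;
             follow: still follow the links on the page. -->
        <meta name="robots" content="noindex, follow">
    </head>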
#6
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Posts
    33
    Rep Power
    15
    Originally Posted by EGOL
Can you rewrite right over the old file names? Then you would have updated content and not duplicates?
I wish I could. In the newer directory we can, since we use a modified item part number as the actual .html file name, so when we run the updates the script simply writes right over the last set of files. The older database and scripts used the MS Access index number, and since we have added so many new items, those numbers would never relate to the proper items again. I have not been able to figure out any find-and-replace routine for Access that would maintain the same index numbers. As soon as enough of the newer catalog pages get indexed (I posted them at the end of January), I will put up a robots.txt to exclude the old directories.
I think the new ones are slow to get indexed because Googlebot may see them as duplicates and disregard them... what a dilemma. So I'm going to write over all the new files, keeping the same part numbers of course, but adding more content before and after, and in the title and keywords. Perhaps this will help?

    Scott.g
#7
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Posts
    33
    Rep Power
    15
    Originally Posted by marianosoler
    Hello everyone,

I have the same problem. I manage a content management system that creates static .html pages and at the same time updates backup sites with the same structure. What I have done is put a robots meta tag ( index, nofollow ) on the backup sites' pages to keep them from being indexed and avoid the risk of being considered duplicate content, but I do not know if this is enough.

First question: would the dynamic version of my sites be considered duplicate content?
Second question: am I at risk with my backup sites even though I put the robots " nofollow " on them?

    Regards to all,
    Mariano.
I doubt the dynamic portion of your content is being indexed; you would have to do a site-restricted search on Google (e.g. site:yourdomain.com) and see whether the dynamic pages are stored in Google's cache.

Second, I don't believe Googlebot pays any attention to the ROBOTS meta tag. You need to put a robots.txt file in the root directory of your web server; the bots look at this file to see which files and directories to exclude. Here is a link with info and examples on the robots.txt file. Be careful!

    http://www.robotstxt.org/wc/norobots.html

    Good luck... Scott.g
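For reference, a minimal robots.txt sketch along the lines Scott describes; the directory name /oldpages/ is a hypothetical example. The file must sit at the web root, and note that disallowing a path stops compliant bots from crawling it at all:

    # http://www.example.com/robots.txt
    # Keep all compliant crawlers out of the old static directory.
    User-agent: *
    Disallow: /oldpages/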
#8
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jun 2004
    Location
    Vienna, Austria
    Posts
    8
    Rep Power
    0
Wouldn't FOLLOW, NOINDEX be more correct to avoid duplicate content?

Similar Threads

  1. duplicate content
    By SEOWiz in forum Search Engine Optimization
    Replies: 3
    Last Post: Oct 27th, 2005, 06:11 PM
  2. Duplicate content.
    By sorvoja in forum Google Optimization
    Replies: 15
    Last Post: Jul 8th, 2004, 10:40 AM
  3. Duplicate Content?
    By MGuru2004 in forum Google Optimization
    Replies: 4
    Last Post: Apr 13th, 2004, 09:03 PM
  4. Semi duplicate content and google?
    By Davey Boy in forum Google Optimization
    Replies: 19
    Last Post: Oct 18th, 2003, 01:31 AM
