#1
  1. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2016
    Posts
    4
    Rep Power
    0

    How to stop duplicate article URLs from being indexed using Disallow and RewriteCond


    Hello everyone.

    We recently found many duplicate URLs on our site (reported in Google Search Console under HTML Improvements). After analyzing them, we found
    that most of the duplicates are created by adding index.php or an article ID to the correct URLs. For example:

    Correct URL:
    /matematika-4-klass/olimpiady-diktanty-kartochki/kartochki-3-4-chetverty
    Duplicate:
    /matematika-4-klass/olimpiady-diktanty-kartochki/kartochki-3-4-chetverty/18-matematika-4-klacc-kontrolnye-raboty-testy-zadaniya-zadachi

    Correct URL:
    /matematika-5-klass-uroki-temy-ploshad-formulf-ploshadi
    Duplicates:
    /index.php/matematika-5-klass-uroki-temy-ploshad-formulf-ploshadi
    /matematika-5-klass-uroki-temy-ploshad-formulf-ploshadi/22-matematika-5-klass-teoriya

    So, our questions:
    1. Which Disallow rules should we write in robots.txt to keep URLs that contain index.php or an article ID from being indexed, so that
    the duplicates stay out of the index?
    2. How can we use RewriteCond rules to strip "index.php" from the URL and to remove the trailing part of URLs that contains the article ID?

    Thanks for any ideas.
  2. #2
  3. Dinosaur
    SEO Chat High Scholar (3500 - 3999 posts)

    Join Date
    Jun 2011
    Location
    UK
    Posts
    3,882
    Rep Power
    6496
    You are looking at this the wrong way; you should consolidate the duplicate pages using the canonical tag.

    Basically, you add the following to the <head> section of each page:

    <link rel="canonical" href="http://www.example.com/yourpage" />

    The URL you give should be absolute rather than relative, and it must point to the version of the page you want indexed.
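
    With more than a few pages, the canonical URL would normally be computed rather than typed by hand. The following is only a minimal sketch of that idea, not how any particular CMS does it. It assumes that duplicates differ from the correct URL only by a leading /index.php and by a trailing path segment that starts with a numeric article ID, as in the examples in post #1. The names SITE_ORIGIN, canonical_path and canonical_tag are hypothetical, and the host is the placeholder used above.

```python
import re
from html import escape

# Assumed site origin (placeholder from the post); replace with the real one.
SITE_ORIGIN = "http://www.example.com"

# Assumption: a trailing path segment that starts with digits and a hyphen
# (e.g. "22-matematika-5-klass-teoriya") is an appended article ID.
ARTICLE_ID_SEGMENT = re.compile(r"^\d+-")


def canonical_path(path: str) -> str:
    """Strip a leading /index.php and a trailing article-ID segment."""
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] == "index.php":
        segments = segments[1:]
    # Only drop the last segment if something remains in front of it.
    if len(segments) > 1 and ARTICLE_ID_SEGMENT.match(segments[-1]):
        segments = segments[:-1]
    return "/" + "/".join(segments)


def canonical_tag(path: str) -> str:
    """Build the canonical link element with an absolute URL."""
    href = SITE_ORIGIN + canonical_path(path)
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_tag(
        "/index.php/matematika-5-klass-uroki-temy-ploshad-formulf-ploshadi"
    ))
```

    If real URLs on the site do not follow that pattern (for example, a correct URL whose last segment begins with a number), the rule would need to be adjusted before use.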

    Comments on this post

    • Pierre Benneton agrees
    Owner of Page Explorer the page onsite SEO checker
    Useful Tools: Site Statistics: SEM Rush | Site Crawler: Screaming Frog
  4. #3
  5. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Feb 2016
    Posts
    4
    Rep Power
    0
    Originally Posted by Chedders
    You are looking at this the wrong way; you should consolidate the duplicate pages using the canonical tag.

    Basically, you add the following to the <head> section of each page:

    <link rel="canonical" href="http://www.example.com/yourpage" />

    The URL you give should be absolute rather than relative, and it must point to the version of the page you want indexed.

    I'm sorry, but there are too many affected pages, more than 2,000.

    Of course it is possible to add a canonical tag, but maybe there is an easier way?
  6. #4
  7. Dinosaur
    SEO Chat High Scholar (3500 - 3999 posts)

    Join Date
    Jun 2011
    Location
    UK
    Posts
    3,882
    Rep Power
    6496
    Easier than doing the job correctly?
    No.
    Having two versions of each page won't harm you as such, but they may compete with each other.
