#1
  1. Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2004
    Location
    Stockholm, Sweden
    Posts
    86
    Rep Power
    15

    Number of indexed pages in decline?


    Is anyone else seeing the number of indexed pages for their website falling in Google?

    Is another serious update on the way?

    Any ideas?

    You can check with this tool -> http://www.uncoverthenet.com/google-dance/dancing.php

    Thanks
  2. #2
  3. SEO Chat Skiller (1500 - 1999 posts)

    Join Date
    Jul 2004
    Location
    St. James Gate
    Posts
    1,988
    Rep Power
    18
    Google has been dropping large numbers of duplicate (or near-duplicate) pages since mid-December. A lot of DMOZ clones are taking a big hit, as are other directories without unique content on each page.
  4. #3
  5. Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2004
    Location
    Stockholm, Sweden
    Posts
    86
    Rep Power
    15
    Originally Posted by Jasontnyc
    duplicate (or near-duplicate) pages
    Duplicate or near-duplicate?

    Case:

    http://www.example.com/script.php?a=1
    http://www.example.com/script.php?a=2

    Is the above an example of duplicated content? The presentation can be identical, but the content is not.

    What do you think?
  6. #4
  7. SEO Chat Skiller (1500 - 1999 posts)

    Join Date
    Jul 2004
    Location
    St. James Gate
    Posts
    1,988
    Rep Power
    18
    Those are fake URLs, so what am I supposed to compare for duplicate content? Just having similar URLs won't cause pages to be dropped; it's what is on the page that counts.
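    A rough way to check a concrete case yourself (just a sketch, assuming the two URLs are reachable; the helper names are made up, not from any tool mentioned here) is to fetch both pages and compare the visible text rather than the URLs:

```python
# Hypothetical sketch: compare what is actually on two pages instead of
# their URLs. Uses only the Python standard library; assumes both URLs
# are reachable.
import re
import urllib.request

def visible_text(url):
    """Fetch a page and return a rough list of its visible words."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)  # drop scripts/styles
    text = re.sub(r"(?s)<[^>]+>", " ", html)                   # drop remaining tags
    return re.findall(r"\w+", text.lower())

def word_overlap(url_a, url_b):
    """Fraction of shared distinct words between the two pages (0.0 - 1.0)."""
    a, b = set(visible_text(url_a)), set(visible_text(url_b))
    return len(a & b) / max(len(a | b), 1)

# Same template with different article text scores well below 1.0;
# true duplicates score at or near 1.0.
print(word_overlap("http://www.example.com/script.php?a=1",
                   "http://www.example.com/script.php?a=2"))
```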
  8. #5
  9. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Jan 2005
    Posts
    193
    Rep Power
    14
    I find this very interesting. What, exactly, is considered duplicate content? I've read lots of opinions, but I've never really seen it nailed down by anyone. If you build a section of a site with a template page, where everything is identical except for the title tags, images, and a paragraph or two of unique text in each page, is that unique enough to not be considered duplicate?

    The basic structure of the pages would all be the same, same colors, etc., but a large percentage of the html on each page is identical. Does anyone have a percentage or a range? Is 10% unique content on each page enough, or does it need to be 50%? Or more?

    For instance, I see a lot of pages with eBay auctions on them. It looks like the auctions are pulled in by a JavaScript in the coding of the page, and that script is all the SEs would see, right? So the page as displayed in a browser is very different, but the HTML would show only minor differences in the JavaScript. Is the opinion here that these pages would be considered duplicate content, and could they be responsible for a decline in the number of pages indexed? How much unique text would need to be added to pages of this type to avoid tripping the duplicate filter?

    Adrew
  10. #6
  11. A trained killer.. in SEO
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Jun 2004
    Posts
    289
    Rep Power
    15
    Originally Posted by jorje29
    Duplicate or near-duplicate?

    Case:

    http://www.example.com/script.php?a=1
    http://www.example.com/script.php?a=2

    Is the above an example of duplicated content? The presentation can be identical, but the content is not.

    What do you think?
    The duplication Google is talking about is not in the URL, jorje, it is in the contents of the pages themselves.
  12. #7
  13. SEO Chat Skiller (1500 - 1999 posts)

    Join Date
    Jul 2004
    Location
    St. James Gate
    Posts
    1,988
    Rep Power
    18
    Originally Posted by Adrew
    I find this very interesting. What, exactly, is considered duplicate content? I've read lots of opinions, but I've never really seen it nailed down by anyone.
    Adrew,

    That's a fair question, as you won't see an actual percentage nailed down by anyone. I was running an experiment with a biz directory script (a very popular one, with a lot of users reporting dropped pages) to test whether Google would still index the whole thing. I was over 90% duplicate because it was a new directory, meaning there were very few submissions. I started getting pages dropped in early January, so I added a rotating RSS feed to bring the duplicate percentage below 85%.

    Then within two weeks Google re-indexed about 200 previously dropped pages. I am still waiting to see whether this trend continues, but I found it very interesting nonetheless. It is hard to isolate, but I was satisfied that it wasn't due merely to "fresh content". You can never be 100% sure, of course.

    I've always tried to have unique pages (under 80% duplicate at the very least), so I have never had pages dropped, but I started the experiment in case at some point in the future I decide to add dynamic pages with similar content.

    Hopefully at some point I can be more sure of the percentages myself.
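    For anyone trying to put a number on "duplicate percentage" like the 80-90% figures above, here is a minimal sketch of one common way to estimate it (word shingles and overlap); I am not claiming this is how Google measures it:

```python
# Rough duplicate-percentage estimate between two pages' text, using word
# shingles and Jaccard overlap. One common technique, not a claim about
# how Google computes the figures discussed in this thread.
def shingles(text, size=5):
    """Set of overlapping word n-grams ('shingles') from a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def duplicate_percentage(text_a, text_b):
    """Approximate percentage of shared content between two texts (0-100)."""
    a, b = shingles(text_a), shingles(text_b)
    return 100.0 * len(a & b) / max(len(a | b), 1)

# Two directory pages sharing the same boilerplate but carrying different
# listings land somewhere in the middle; adding unique text (for example a
# rotating feed) pushes the percentage down.
page_a = "Business Directory Home About Submit a site Widgets Inc sells widgets"
page_b = "Business Directory Home About Submit a site Gadget Co repairs gadgets"
print(round(duplicate_percentage(page_a, page_b), 1))
```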
  14. #8
  15. Extremely Googled
    SEO Chat Good Citizen (1000 - 1499 posts)

    Join Date
    Dec 2004
    Posts
    1,035
    Rep Power
    501
    I don't think the bots see JavaScript at all. Remember, the robots are essentially text-based browsers like Lynx. When they load a page they generally don't process the things a browser interprets, such as images, Flash and JavaScript, mainly because they can't interpret images and they don't want the hassle of processing JavaScript (which is written for client-side processing). The bot has enough to look at without processing JavaScript.
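    If you want to see what that leaves the bot with, here is a hypothetical sketch that strips scripts and tags the way a text-only crawler effectively does; a page whose content is written in by JavaScript comes out nearly empty:

```python
# Hypothetical sketch of what a text-only, non-JS bot is left with:
# scripts are stripped, never executed, so JS-injected content (like an
# embedded auction listing) simply is not there for the bot.
import re

def crawler_view(html):
    """Return roughly the text a Lynx-style, non-JS crawler would index."""
    html = re.sub(r"(?is)<script.*?</script>", " ", html)  # scripts ignored, not run
    html = re.sub(r"(?is)<style.*?</style>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)                # strip remaining tags
    return " ".join(text.split())

page = ('<html><body><h1>Auctions</h1>'
        '<script>document.write("hundreds of unique listings")</script>'
        '</body></html>')
print(crawler_view(page))  # -> "Auctions"  (the listings never show up)
```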
  16. #9
  17. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Sep 2004
    Posts
    758
    Rep Power
    15
    Is anyone else seeing the number of indexed pages for their website falling in Google?
    I have observed some indexing problems; some pages have disappeared from the index, and not because of a penalty.

    Probably Google has rebuilt the index from scratch. In theory that should be a rare event, but the last time it happened was in January, and I would call this high frequency an "anomaly".

    What, exactly, is considered duplicate content?
    Well, there are different kinds of "duplication". Detecting duplicated HTML code is useful for some purposes, and detecting duplicated text is useful for others.

    For example, locating HTML duplicates can be one way to start detecting a domain-farm network. The HTML code of each web page is analyzed and its features (tags, tag placement, etc.) are recorded to create a "digimark" of the HTML code. This digimark is an identifier that summarizes all the main HTML features in a single numeric value. The value is not unique, that is, different pages can produce the same value, but HTML code with similar (or identical) values can be considered "closer".

    Exact HTML detection can also be used to detect doorway pages created by automated software. A search engine engineer runs several doorway-creation programs and calculates the digimark values of the doorway pages they produce.

    Then, if the digimark value of a group of pages under the same website exactly matches the digimark value of a doorway created by one of those programs, that group of pages can be considered "suspicious", and the search engine can decide to analyze those pages more closely or even mark them outright as "software created", if the digimark calculation algorithm is very good.
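    To make the digimark idea concrete, here is a toy sketch of my own (not the actual algorithm of any search engine): hash the sequence of tags and ignore the text, so pages built from the same template produce the same value:

```python
# Toy illustration of a structural fingerprint: hash the tag sequence and
# ignore the text, so pages built from the same template get the same value.
# A real scheme would presumably use a locality-sensitive value so that
# *similar* structures land near each other; a plain hash like this only
# catches exact structural matches.
import hashlib
import re

def html_fingerprint(html):
    """One numeric value summarizing the page's tag structure."""
    tags = re.findall(r"</?([a-zA-Z][a-zA-Z0-9]*)", html)  # tag names, in order
    structure = ">".join(t.lower() for t in tags)
    return int(hashlib.md5(structure.encode()).hexdigest(), 16) % 10**12

a = "<html><body><div><h1>Page one</h1><p>text A</p></div></body></html>"
b = "<html><body><div><h1>Page two</h1><p>text B</p></div></body></html>"
print(html_fingerprint(a) == html_fingerprint(b))  # True: same template, different text
```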

    As for duplicate text, well, it is a very interesting and complex topic. In the past it was just a matter of word percentages or of calculating the statistical dispersion of the words (variance or standard deviation, for example), but today the difference between two texts can be calculated by taking into account their positions within a "map of concepts". That is, an algorithm calculates the "overall meaning" or "overall sense" of each text and finds its coordinates within a multidimensional ontology. Of course, closer coordinates mean more similar "meanings".

    This "duplicate meaning" concept can be used to reinforce simple statistical duplicate-text detection algorithms.

    I could write a lot about this topic; unfortunately, I don't have much time just now.
    Please, please, help tsunami survivors by making a donation: UNICEF - Red Cross
  18. #10
  19. Contributing User
    SEO Chat Discoverer (100 - 499 posts)

    Join Date
    Jan 2005
    Posts
    193
    Rep Power
    14
    Hmmm...digimark...are we talking a huge number of pages for this consideration to occur? When you say automated software, I think of many hundreds or thousands of pages being generated. But what if we're only talking about a small number of pages being very similar? Let's say 250, just for kicks.

    Would that small number of pages be enough to trip a duplicate filter? And if so, it sounds like adding more unique text to them might improve their rankings. I see so many pages that rank well that are obviously software generated. A logo change here or there, and some color changes are about all that differ on them. I always wondered how they got away with it.

    I wish I could remember the search I did a few weeks back. The first two pages of results in Google were all the same type of site: you click on one link and it takes you to another page with a bunch of links on it, and so on and so on. It was almost like a maze, and I was surprised that so many of them existed. After a few clicks I recognized the pages and could tell that, at the very least, the same type of software must have generated them, they were so similar. This was before the more recent changes at Google, so they may have been dropped by now.

    Drew
  20. #11
  21. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Sep 2004
    Posts
    758
    Rep Power
    15
    Hmmm...digimark...are we talking a huge number of pages for this consideration to occur?
    No. The ID value is useful for comparing many pages, but any decent duplicate detection algorithm can calculate a similarity percentage by comparing just the code (i.e. the digimark value) and the text of two different pages.

    And if so, it sounds like adding more unique text to them might improve their rankings.
    Any modification in the text or in the HTML code can change the rankings.

    A logo change here or there, and some color changes are about all that differ on them.
    Most duplicate detection algorithms do not rely on graphics or colors. Most of the calculation is done on the text, while HTML code analysis is useful for detecting pages coming from the same network/site/webmaster.

    If a search engine shows a result page where all the listed pages differ only by a logo, then its duplicate detection algorithm is not very good or (more usually) the query submitted by the user simply does not produce a good number of different pages.
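    For what it's worth, a self-contained toy version of that pairwise check (code structure plus text; the method and any threshold you would apply to the result are my own assumptions) could look like this:

```python
# Toy check for just two pages: compare the tag structure (the "code") and
# the visible words (the "text") and report both signals. Purely
# illustrative; not how any particular search engine does it.
import re

def tag_sequence(html):
    """Tag names in document order (a crude stand-in for the digimark)."""
    return [t.lower() for t in re.findall(r"</?([a-zA-Z][a-zA-Z0-9]*)", html)]

def visible_words(html):
    """Distinct words left after stripping the markup."""
    return set(re.findall(r"\w+", re.sub(r"(?s)<[^>]+>", " ", html).lower()))

def compare_pair(html_a, html_b):
    """Return (same structure?, text similarity as a percentage)."""
    same_structure = tag_sequence(html_a) == tag_sequence(html_b)
    wa, wb = visible_words(html_a), visible_words(html_b)
    text_similarity = 100.0 * len(wa & wb) / max(len(wa | wb), 1)
    return same_structure, text_similarity
```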

