Search Technologies
#1
May 19th, 2005, 02:16 PM
randfish
The Search Engines Lie About Index Sizes?

Fantomaster pointed me to a fantastic blog belonging to a professor of IR research in France - Jean Véronis. Sadly I cannot read and comprehend French, but he has several English entries on the subject of search engine index sizes:
  • Google - aixtal.blogspot.com/2005/02/web-googles-missing-pages-mystery.html
  • Yahoo! - aixtal.blogspot.com/2005/03/web-yahoo-indexes-more-pages-than.html
  • MSN - aixtal.blogspot.com/2005/02/web-msn-cheating-too.html
The findings are very impressive and his research appears solid to me.

Basically, his findings are that MSN and Google are both inflating their index size, that you can find a "truer" number of results at Google by typing the term you're seeking into the engine twice (i.e. search for "string string" instead of "string"), and that Yahoo! probably has the largest index size of major search engines, along with the highest level of honesty.

If any SEOChatters are versed in Francais, please bring back any other tidbits of information you find at his site - these posts alone are enough to make me sign up for a French class.

#2
May 19th, 2005, 02:38 PM
jrothra

Interesting information. Makes one wonder if Google and Microsoft (and even NBC) will try to keep this guy's information hush-hush fearing backlash in their stocks.

#3
May 19th, 2005, 03:05 PM
Bernard
I can see it now:

"Billions and billions indexed."

Thanks McDonalds.
Comments on this post
jrothra agrees: ROFLOL!!

#4
May 19th, 2005, 03:15 PM
jrothra

Quote:
Originally Posted by Bernard
I can see it now:

"Billions and billions indexed."

Thanks McDonalds.


ROFLOL!!! More like, McGoogle's?

#5
May 19th, 2005, 04:15 PM
Wit
It's funny, but I just read another report that SEs UNDERestimate their index sizes.

Not that I believe that (because what would be the point?), but still...

#6
May 19th, 2005, 04:23 PM
jrothra
Quote:
Originally Posted by Wit
It's funny, but I just read another report that SEs UNDERestimate their index sizes.

Not that I believe that (because what would be the point?), but still...


LOL. Well, I guess we can therefore follow the President's theory (which he used for his tax plan)...

Some say the index is overestimated, some say it's underestimated... therefore it's probably about right.

All kidding aside, to my knowledge, the only real benefit to knowing the exact size of the index is for that SE's advertising/marketing campaign and stock value. For website SEO, the focus is on those sites which rank well on SERPs for keywords and keyword phrases. But the actual index size is interesting to know if you own that SE's stock.

#7
May 19th, 2005, 05:20 PM
randfish
Actually index size is very useful for a lot of the calculations done with my tools and for purposes of identifying keywords, calculating term weight on a page, etc.

Perhaps this is why Google & MSN mis-represent...
Comments on this post
Wit agrees: Size matters - I know - but in some cases "size is in the eye of the beholder"
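
randfish doesn't say which formula his tools use, so what follows is only a sketch under an assumption: that "term weight" means something like standard inverse document frequency, idf = ln(N / df), where N is the engine's total index size and df is the number of pages containing the term. The function name `idf` and the document frequency of 1,000,000 are made up for illustration. The 4 billion and 8 billion figures are the Google index sizes mentioned elsewhere in this thread. It shows why a wrong N matters: doubling the claimed index shifts every term's weight by the same constant, ln 2.

```python
import math


def idf(index_size: float, doc_freq: float) -> float:
    """Inverse document frequency, ln(N / df). An assumed formula, not randfish's."""
    if index_size <= 0 or doc_freq <= 0:
        raise ValueError("index_size and doc_freq must be positive")
    return math.log(index_size / doc_freq)


# Hypothetical term that appears on 1,000,000 pages.
doc_freq = 1_000_000

idf_4b = idf(4e9, doc_freq)  # weight if the index holds 4 billion pages
idf_8b = idf(8e9, doc_freq)  # weight if 8 billion is used instead

# The gap is ln(2) whatever doc_freq is, because ln(2N/df) - ln(N/df) = ln 2.
shift = idf_8b - idf_4b

print(f"idf at 4 billion: {idf_4b:.3f}")
print(f"idf at 8 billion: {idf_8b:.3f}")
print(f"shift from doubling N: {shift:.3f}")
```

A constant shift leaves the ordering of terms by idf alone unchanged, but it does change the absolute weights, and so anything that multiplies them by a term count or compares them to a fixed threshold.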

#8
May 20th, 2005, 05:13 AM
mick.sawyer
If you just check some of your bigger sites you will notice patterns.

I have a 10k-page site that is showing 23k pages.
It first happened just before Google doubled their index from 4 billion to 8 billion about 6 months ago.
It started indexing all different file formats, PDF etc., and also just showing silly numbers of untrue pages.
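
The mismatch described above can be put as a simple ratio. A minimal sketch, using only the two numbers from this post (about 10,000 real pages, about 23,000 reported). The function name `overcount_ratio` is made up for illustration, and this says nothing about how Google arrives at its figure.

```python
def overcount_ratio(actual_pages: int, reported_pages: int) -> float:
    """How many times larger the reported page count is than the real one."""
    if actual_pages <= 0:
        raise ValueError("actual_pages must be positive")
    return reported_pages / actual_pages


# Figures from the post: a site with ~10k pages showing ~23k in the index.
ratio = overcount_ratio(10_000, 23_000)
excess_pct = (ratio - 1) * 100

print(f"reported / actual = {ratio:.1f}x")
print(f"overcount = {excess_pct:.0f}%")
```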

#9
May 20th, 2005, 12:15 PM
xan
Nice find Rand,

Really interesting blog, and yes, his research does look good. He documents all his experiments really well, so you can see exactly what's going on there.

I particularly like the "TrustRank: lots of noise for nothing" post: aixtal.blogspot.com/2005/05/google-trustrank-beaucoup-de-bruit.html

He says the technical report from Stanford is also co-authored by Jan Pedersen of Yahoo!. Obviously Yahoo! wouldn't want Google to have a part of this for a start. He says that it's possible that Google tried to get in there quickly by applying for a patent.

He also points out that it was submitted on 16 September 2003.

He also talks about the search engine meeting in Boston. Most of his blog, though, is about the translation of the constitution. However, his work written in English is really relevant.

It's interesting to read those articles. I don't have time to pick at it (you know me), but I'll have a look later. Anyway, looks nice!

© 2003-2010 by Developer Shed. All rights reserved. DS Cluster 3 Hosted by Hostway