#31
February 27th, 2005, 10:35 AM
xan
Contributing User
Join Date: Oct 2004
Location: UK
Posts: 108
The way that semantics are used in IR is really not as simple as that, and they only form one part of the method. I think that trying to optimize your keywords for better ranking is not a bad idea at all, but it's going to be pretty hard to get it right. Simple solutions appear to me to be as efficient as more involved ones, because it's difficult to get it right without knowing the exact method. Semantics is a really vast and extremely difficult area of research. It's a bit like a very complex code: different combinations of methods are used all the time, and sometimes we get them to change periodically all on their own as they adapt to new environments and situations. It would actually be quite a student approach to use semantics in a simple way. Words on the page are not always trustworthy, but they are considered a variable, and an error rate will be incorporated as well. Meta-data gets ignored unless it's Dublin Core in publications, which it never is on websites. We soon learnt that meta-data was not often correct.

Do include words that describe your area well, and pay special attention to the quality of the prose. Spelling mistakes are also lethal. It's important to remember that IR systems are not interested in which site ranks first. Sometimes it's beneficial to offer a whole range of different types of sites to do with the same subject area; it's been done before. After looking at content and links, a lot more goes on. The basic information has been collected, and the different types of sites and subjects identified. Now let's sort 'em all.

Last edited by xan : February 27th, 2005 at 10:43 AM.

#32
February 27th, 2005, 01:56 PM
raz
Contributing User
Join Date: Mar 2004
Location: Los Angeles, CA
Posts: 1,847
xan,
I have been using a very simplified version of the analysis for a while now. You are right that the whole thing can get very complex, but the key is that engineers are trying to hone their algorithms to recognize quality content, and if we simply build web sites with quality content then we are halfway there. In my opinion we must analyze the pages after building the quality site, in an effort to fine-tune it, rather than do the analysis and then try to build pages for it.

I do believe that engines (at least Google) have been introducing LSA in small steps for a while now. I am basing my argument on the fact that since the December update, and even more so since the Super Bowl update, I have noticed an increase in my traffic that is a result of visitors using really off-the-wall key phrases, the kind that you will never optimize for. In fact that increase is very substantial: some 40% of total searchers are coming in using those weird (but on-topic) search terms.
In other words, my site is doing better than before simply because it is a quality, information-packed site and the engines are beginning to recognize that.

My competitors who have disappeared from the top results were the ones that were there simply because of a huge number of links and a huge number of pages, with only one or two pages optimized for my main search term.

raz

#33
February 28th, 2005, 07:46 AM
2K
Professional SEO
Join Date: May 2003
Location: Finland
Posts: 711
Quote:
Originally Posted by xan
Spelling mistakes are also lethal.


Hi xan, could you elaborate on this? I just finished reading Maarten de Rijke's excellent series about IR technology, and he also mentions spelling mistakes as an issue. I understand that spelling mistakes are lethal for quality-driven results in a controlled-vocabulary environment, but why would the same hold for an uncontrolled, very large vocabulary like the WWW?

#34
February 28th, 2005, 08:29 AM
internex
search engine voyeur
Join Date: Jul 2004
Location: Lancashire, UK
Posts: 282
I must say this is one of the best posts I have seen on this forum for a while. Whilst I don't claim to know a lot about LSI, it is something that I shall be spending more time on; even if it's not directly responsible for the result set returned by the engines, keeping this knowledge in mind may well help develop further strategies.

#35
March 1st, 2005, 10:29 AM
xan
The thesaurus is a book of synonyms, including related and contrasting
words and antonyms. It is used for broadening the search term and thus
increasing recall.
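
As a rough illustration of how a thesaurus broadens a search term and so increases recall, here is a minimal Python sketch. The tiny `THESAURUS` table, the sample documents, and the `expand_query` and `search` helpers are all made up for this example; they are assumptions, not the method of any real engine.

```python
# Toy thesaurus: every entry here is invented for illustration.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query):
    """Return the query terms plus any synonyms found in THESAURUS."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + THESAURUS.get(term, []):
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    return expanded


def search(query, documents):
    """Return documents that contain at least one expanded query term."""
    terms = set(expand_query(query))
    return [doc for doc in documents if terms & set(doc.lower().split())]


docs = ["budget automobile rental", "cheap flights", "garden tools"]
print(search("cheap car", docs))
```

Without the expansion step only "cheap flights" would match the literal query words; with it, "budget automobile rental" is recalled as well, at the cost of some precision.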


The metathesaurus is used predominantly in medical data sets, but is
of great interest. It “preserves the names, meanings, hierarchical contexts,
attributes, and inter-term relationships present in its source vocabularies;
adds certain basic information to each concept; and establishes new
relationships between terms from different source vocabularies.” (UMLS)


The SPECIALIST lexicon is used for determining spelling variations,
abbreviations, acronyms, and inflectional variations:

http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/dlib/january98/01kirriemuir.html



Spelling deviations between different dialects, like UK and US spelling, are dealt with
quite efficiently for the most part.


Different methods used include string-to-string edit distance, a word language model,
and POS-tagging for selecting candidates.
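
To make the string-to-string edit distance idea concrete, here is a minimal Python sketch of the classic Levenshtein distance. It is only one common way of measuring how far apart two spellings are; nothing here says which variant any particular engine uses.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(edit_distance("pettern", "pattern"))
print(edit_distance("colour", "color"))
```

Both pairs are a single edit apart, which is why a one-letter typo and a UK/US spelling difference can be handled by the same kind of candidate matching.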


Why it's lethal:

Spelling mistakes mean that if stemming is used, non-words are formed and then
subsequently dropped as non-comprehensible.
Another term is suggested, and other sites are served up in the results. If people
have made the same spelling mistake as you, though, an alternative will be provided:


If you type “pettern” into Google.com you get:

Pettern

stumbleupon toolbar, 127,059 members. Pettern, Reviews, Friends, I am a 29 year-old guy,
from Oslo, Norway. Pettern. ... Stumbler #141947. Pettern, Reviews, Friends,
pettern.stumbleupon.com/about/ - 26k - Cached - Similar pages


And for “pattern”:



Pattern Blocks: Exploring Fractions with Shapes

Java applet that implements pattern blocks typically used to visualize and learn
fractions. Your browser does not seem to be able ...
www.arcytech.org/java/patterns/patterns_j.shtml

As you can see there is quite a drastic effect as far as the results are concerned.


Apparently users misspell in about 10% of all search queries. That would imply
that with a spelling mistake in a key term, you have a 10% chance of being returned
in the results.


Fuzzy search technology can be used to give an approximation of the term,
but that is exactly what the suggestion for an alternative spelling is.
As you know from using various common applications, spell checkers are hardly
accurate.
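
A small sketch of what such an alternative-spelling suggestion might look like, using `difflib.get_close_matches` from the Python standard library. The `VOCABULARY` list and the `suggest` helper are hypothetical; a real engine would presumably draw candidates from its index or query logs, and its matching would be far more involved.

```python
import difflib

# Hypothetical vocabulary of known terms, invented for this example.
VOCABULARY = ["pattern", "patter", "lantern", "pasture", "semantic"]


def suggest(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest known word, or None if nothing is similar enough."""
    if term in vocabulary:
        return term
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("pettern"))
print(suggest("xyzzy"))
```

The `cutoff` value is the weak point: set it too low and unrelated words get suggested, too high and genuine typos are missed, which is one reason such checkers are hardly accurate.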

As I have already said a few times, IR systems have been using WordNet for a
significant amount of time and most systems still use it.


“pettern – sorry, no matches found”



Pattern - The noun "pattern" has 8 senses in WordNet.



Other than spelling mistakes, there is the problem of the “unintentional misuse
of a word by confusion with one that sounds similar” (WordNet), which is malapropism.


A way of dealing with natural language or groups of words is to use statistical methods
to decide what the probability of these words in sequence is, but this obviously has drawbacks.
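
One simple statistical method of this kind is a bigram language model, sketched below in Python. The three-sentence corpus, the `<s>` start marker, and the add-one smoothing are all assumptions chosen to keep the example small; they are not taken from any specific system.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count contexts and bigrams over whitespace-tokenised sentences."""
    contexts, bigrams, vocabulary = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        vocabulary.update(tokens)
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocabulary)


def sequence_probability(sentence, contexts, bigrams, vocab_size):
    """Multiply add-one smoothed P(word | previous word) over the sentence."""
    tokens = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return probability


corpus = [
    "search engines rank pages",
    "search engines index pages",
    "users search pages",
]
contexts, bigrams, vocab_size = train_bigrams(corpus)
likely = sequence_probability("search engines rank pages", contexts, bigrams, vocab_size)
unlikely = sequence_probability("pages rank engines search", contexts, bigrams, vocab_size)
print(likely > unlikely)
```

The drawback shows up immediately: any word order the model has not seen is scored as improbable, even when it is perfectly good language.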

I blogged this and expanded on the techniques as well, if you're interested.

Comments on this post
randfish agrees: Very thorough explanation; thanks xan - Rand
Chatmaster agrees: Very good explained
2K agrees: good reply... thank you.

Last edited by xan : March 1st, 2005 at 02:49 PM.

#36
March 1st, 2005, 11:30 PM
randfish
Join Date: Jul 2004
Location: Seattle, WA
Posts: 1,874
Xan,
Sorry not to see you here at SES.
I think you've touched on a great point that many SEOs and webmasters are totally unaware of. The spelling issue touches on the deeper issue of the 'quality' measurement that search engines use. On-page factors like spelling, grammar, even readability indices (like Flesch-Kincaid, used in MS Word) are all parts of this. A search engine's time is well spent on on-page analysis because it is computationally inexpensive. This means that quality writing and quality articles can get a boost simply from their writing style, which is very good to know.
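
To show how cheap a readability index is to compute, here is a minimal Python sketch of the Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The `count_syllables` helper is a crude vowel-group heuristic assumed for this example, so the scores are approximate, and nothing here implies that any engine scores pages this way.

```python
import re


def count_syllables(word):
    """Very rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using the heuristic syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = "The cat sat on the mat. It was warm."
dense = "Comprehensive semantic analysis necessitates sophisticated computational methodologies."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))
```

A single pass over the text is all it takes, which fits the point about on-page analysis being computationally inexpensive.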

Xan - Just so you're aware for the future, unlike SEW, SEOChat doesn't allow for live links generally - so the mods may disable them, sorry! I know you have no personal interest in promoting yourself and are just seeking to educate all of us and I appreciate it quite a bit.

More coming soon on advanced material when I return from this conference (which has me sleeping 5.5 hours and racing around like crazy)!

#37
March 2nd, 2005, 05:52 AM
xan
Thanks Randfish,

I understand about the links, it's normal. I hear that you guys are having tons of fun at the SEW conference. I'm reading bits and pieces, and there seems to be word of blogging, RSS, marketing issues and things like that, with Google, Yahoo and MSN for company too. SEO heaven, right?

I will be going to SEM in Boston next month, and this is a pure search conference for researchers and people involved in the build. I'm looking forward to that, always great to meet new people.

Maybe we can compare notes!

Last edited by xan : March 3rd, 2005 at 05:47 PM.

#38
March 2nd, 2005, 08:04 AM
Chatmaster
The BIG Lion of SEO
Join Date: Jan 2004
Location: Gangsters Paradise, South Africa
Posts: 669
I guess Bush is a malapropism specialist, lol.

But seriously now, I can see the benefits that semantics holds for Google, especially when you think about the amount of research they've been doing on globalising. Google is certainly aiming for global non-English markets, therefore semantics is the way to go.

#39
March 8th, 2005, 04:42 PM
raz
Any News?

Hi Randfish,
Any news of the tool you were working on?
raz

#40
March 8th, 2005, 05:35 PM
randfish
Sadly, my developer has been ill and busy. I'm hoping for a release within the next 2 weeks, however. Thanks for enquiring.

#41
March 8th, 2005, 10:37 PM
raz
Quote:
Originally Posted by randfish
Sadly, my developer has been ill and busy. I'm hoping for a release within the next 2 weeks, however. Thanks for enquiring.

Can't wait Randfish, I'll be praying hard for his quick recovery...
raz

#42
October 17th, 2007, 09:24 AM
Ondrej
Registered User
Join Date: Oct 2007
Posts: 1
Quote:
Originally Posted by randfish
Sadly, my developer has been ill and busy. I'm hoping for a release within the next 2 weeks, however. Thanks for enquiring.


Hi, I know this is a pretty old discussion, but since I am fairly new to SEO, could you please tell me where the research stands nowadays? I suppose your tool hasn't been developed after all, but what about the whole maths and mainly LSA (or the LSA principle) usage? Can you please tell?

#43
October 19th, 2007, 03:10 PM
Stimmed
Registered User
Join Date: Oct 2007
Posts: 2
Quote:
Originally Posted by Ondrej
Hi, I know this is a pretty old discussion, but since I am fairly new to SEO, could you please tell me where the research stands nowadays? I suppose your tool hasn't been developed after all, but what about the whole maths and mainly LSA (or the LSA principle) usage? Can you please tell?


Yes, I would be interested in any updates that could be made to this discussion after a year of SEO concept changes.

Thread: Advanced On/Off-Page Optimization for Engines using Semantic Analysis (SEO Chat Forums > Search Engine Strategies > Search Technologies)