I am trying to understand how the Google search engine indexes page-specific content. A number of documents I have read imply that Google creates a list of word occurrences, called a "hit list", for each word in each document, and that it records the type of each hit (title, header, anchor text, etc.) and the properties of the hit (position, font size, capitalization, etc.). I also understand that during a search Google looks at each document's hit list for each search word, weights the hits according to their type and properties, and produces an IR score.

What I am confused about is how Google handles multiple instances of similar words. Is there a separate hit list for each of the similar words, or are the similar words combined into a single hit list?
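
To make the question concrete, here is a rough Python sketch of how I currently picture hit lists and scoring. Everything in it is my own assumption for illustration: the names (`Hit`, `TYPE_WEIGHTS`, `build_hit_lists`, `ir_score`), the weight values, and the choice to score by hit type only are all made up, and none of it comes from Google. In particular, the sketch keys each hit list on the exact lowercased word, so "runs" and "running" end up in separate lists. That is exactly the part I am unsure about.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    kind: str          # "title", "header", "anchor", or "plain"
    position: int      # word offset within its field
    capitalized: bool  # whether the occurrence started with a capital


# Made-up weights, purely for illustration.
TYPE_WEIGHTS = {"title": 5.0, "header": 3.0, "anchor": 4.0, "plain": 1.0}


def build_hit_lists(fields):
    """Build one hit list per distinct word for a single document.

    fields is a list of (kind, text) pairs, e.g. ("title", "Running Shoes").
    Assumption: words are matched exactly after lowercasing and stripping
    punctuation, so similar words are NOT merged.
    """
    hit_lists = defaultdict(list)
    for kind, text in fields:
        for position, word in enumerate(text.split()):
            key = word.lower().strip(".,;:!?")
            if key:
                hit_lists[key].append(Hit(kind, position, word[0].isupper()))
    return hit_lists


def ir_score(hit_lists, query_words):
    """Sum the type weight of every hit for every query word."""
    return sum(
        TYPE_WEIGHTS[hit.kind]
        for word in query_words
        for hit in hit_lists.get(word.lower(), [])
    )
```

With a document whose title is "Running Shoes" and whose body is "He runs and she runs. Running is fun.", this sketch gives "running" and "runs" their own hit lists, and a search for "run" finds nothing at all. If similar words were instead combined, I assume the key would be some normalized form of the word rather than the word itself.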

I would appreciate any clarification on this question.