-
Oct 15th, 2003, 04:46 PM
#1
Thread Starter
Addicted Member
Anyone got some good search rank algorithms?
I'm developing a search engine for my company. We're using SQL Server 2000. Granted, using FreeText is an option, and one we plan to use.
We have some more specific text searches where we are not going to use FreeText. I am going to have to build the engine myself.
Given a field and a searched value, what are some good algorithms to rank the results I've returned?
I've been told many people rank results by the number of times a word occurs. I've also been told to use the position of the word, and that exact word matches should come first.
All of these sound good. Does anyone have routines that combine several of these?
Any help is greatly appreciated.
Thanks in advance.
-
Oct 15th, 2003, 05:34 PM
#2
I don't have a specific algorithm here, but I did something like this looking for similarities in strings. That would be like a spell checker suggesting options, except that my strings weren't words.
Coming up with complex ranking schemes can be loads of fun, but simpler may be better.
I would always rank an exact match of both words and sequence as the best. For the next best, I would look at how the user will be entering words. If they will type sentences, such as "Find me all records with the word BLUE", then every word except "Blue" has nothing to do with the search. If they will just enter nouns, then sequence is probably unimportant. If they will be entering adjective-noun combinations, then matching the individual words is probably less important than matching the sequence. For example, if the query is for Red Bull, then records that merely contain both words would not be as valuable as records that have them in that sequence.
Therefore, look for a pattern in how the user is expected to query, and build your ranking based on that.
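To make that concrete, here is a rough sketch in Python (not T-SQL, and not something I've run against SQL Server) of how the ideas in this thread could be combined into one score: exact match first, then words in sequence, then occurrence count, then position. The stop-word list, the weights, and the names `score` and `rank` are all made up for illustration, so tune or replace them for your own data.

```python
import re

# Hypothetical stop-word list: words that carry no search meaning in a
# sentence-style query. A real list depends on how your users type.
STOP_WORDS = {"find", "me", "all", "records", "with", "the", "word", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, field):
    """Return a relevance score for one field value (higher is better).

    The weights below are arbitrary assumptions, not tuned values.
    """
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    words = tokenize(field)
    if not terms or not words:
        return 0.0

    # Exact match of words and sequence always ranks first.
    if words == terms:
        return 1000.0

    total = 0.0
    n = len(terms)

    # Bonus when the query terms appear together, in the same order.
    if any(words[i:i + n] == terms for i in range(len(words) - n + 1)):
        total += 100.0

    for term in terms:
        positions = [i for i, w in enumerate(words) if w == term]
        if positions:
            total += 10.0 * len(positions)     # number of occurrences
            total += 5.0 / (1 + positions[0])  # earlier first hit scores higher
    return total


def rank(query, fields):
    """Sort field values from most to least relevant."""
    return sorted(fields, key=lambda f: score(query, f), reverse=True)
```

With these weights, `rank("Red Bull", [...])` puts a field that is exactly "Red Bull" ahead of one that contains the phrase, which in turn comes ahead of one that only contains both words somewhere. If your users mostly type bare nouns, you could drop the sequence bonus entirely.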