|
-
Sep 19th, 2002, 11:52 PM
#1
Thread Starter
Addicted Member
keyword search
I need to do a keyword search on various fields in a database. However, I'm not getting my keywords from a textbox but from a string. A user enters a question or sentence, and I need to find other records that may be related.
Problem 1) I figure to get the keywords, I would split the string on spaces, but I don't want to search for all the words, especially not the more common ones (like the, a, an, but, etc.) Could I do this with a text file of common words, or maybe a table of common words in the database?
Problem 2) Assuming I have the keywords, how would I do the search? I can't assume that the keywords will always be in the same order...
Any help would be appreciated. Code would be ok, but I'm looking more just for the general logic that I would need for such a project. Thanks.
-
Sep 20th, 2002, 12:35 AM
#2
-= B u g S l a y e r =-
Hi there Feetstink, sounds to me that you already have thought this through. 
1. As you suggested yourself, split the sting into an array.
Load the nono words from a file. Loop through the array and build up your SQL expr. While looping check that the current word is not a nono word by using InStr.
2. The strings you want must contain all the valid words from the array in 1. You can achieve this by using AND in your sql
WHERE Field1 Like '%asdfasd%' And Field1 Like '%axcfrewr%' ......
-
Sep 20th, 2002, 12:45 AM
#3
Thread Starter
Addicted Member
Thanks for the reply 
The first part sounds good, and I think I can code it. 
The second part is good too, but I need something a little different. I need it to bring back matches that are like >75% right too... I dont' want to eliminate a match because someone used the word Compaq instead of computer. You know what I mean?
thanks..
-
Sep 20th, 2002, 02:13 AM
#4
-= B u g S l a y e r =-
Oh.. I see. Think that will be a tough one. I have never done such coding, so I really can't give you any usefull tip on that subject ... sorry
-
Sep 20th, 2002, 04:54 AM
#5
Thread Starter
Addicted Member
-
Sep 20th, 2002, 08:18 AM
#6
I looked at this and thought "I should know how to do this!".. so i've worked it out!
you need to do this in a few stages, including temporary tables (temp table syntax may be different depending on you database system). You'll need to change field/table names to what you have in your database
Step 1: put all matching rows into temporary table
Code:
select *, 0 as tmp_count from department
where dept_name like '%Di%'
or dept_name like '%ec%'
or ....
into my_temp
Step 2: for each row n the temp table, calculate how many of the words match (repeat for each word)
Code:
update my_temp
set tmp_count = tmp_count + 1
where dept_name like '%Di%'
;
update my_temp
set tmp_count = tmp_count + 1
where dept_name like '%ec%'
;
...
Step 3: get the results! (in descending order or relevance)
Code:
select * from my_temp
order by tmp_count desc
Step 4: remove the temporary table
Code:
drop temp table my_temp
-
Sep 20th, 2002, 08:33 AM
#7
update: to get the percentage relevance too, change the SQL in step 3 to this:
Code:
select *, ( tmp_count / <number of words to search for> ) * 100
from my_temp
order by tmp_count desc
-
Sep 20th, 2002, 08:36 AM
#8
Frenzied Member
Which database your checking against ??
If it is SQL Server there is something like English Query that is available for this kind of stuff.
Code:
If Question = Incomplete Then
AnswerNextOne
Else
ReplyIfKnown
End If
cu Swatty
-
Sep 25th, 2002, 04:42 AM
#9
Thread Starter
Addicted Member
Temporary Tables
I'm sorry i took such a long time to reply to my own question. Thanks for your help. I'm using Microsoft Access 2000 for my database. I can't find anything about temporary tables in the help for Access. is it possible to do?
In the first line of Step 1: it says
select *, 0 as tmp_count from department
what does the 0 do?
Also does anyone know where I can find a text file of common words so I don't have to start one from scratch? preferably comma delmited...
thanks again..
-
Sep 25th, 2002, 05:46 AM
#10
the "0 as tmp_count" means that when the temporary table is filled with the matching rows, there is an extra column added (called tmp_count), which is later used to count the number of times that the word(s) appear (in step 2).
This way you can get them ordered by relevance, rather than just in a random order.
the alternative would be to just do this instead of all the steps I listed before:
select * from department
where dept_name like '%Di%'
or dept_name like '%ec%'
or ....
of course this doesn't put the most relevant ones first.
I've just checked Access, the INTO TEMP bit moves up to before the FROM, eg:
select *, 0 as tmp_count into my_temp
from department
where dept_name like '%Di%'
or dept_name like '%ec%'
or ....
FROM Countries;
-
Sep 25th, 2002, 05:48 AM
#11
Addicted Member
Maybe this will be some help,
1000 most common English words
-
Sep 25th, 2002, 06:06 AM
#12
Thread Starter
Addicted Member
thank you both for your replies, si especially...
i'll probably be back sometime, but it may take a long time for me to get enough of a working version to have more questions.
-
Sep 25th, 2002, 06:10 AM
#13
no problem, it was useful for me too... not that I'm gonna use it at the moment!
have fun with the rest of it
-
Sep 25th, 2002, 06:28 AM
#14
Thread Starter
Addicted Member
where did you find the information for the temporary tables in Access? I can't find anything about dropping them... i just need a site where i can find more of that kind of information, if you know of one. thanx.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|