-
Jan 5th, 2007, 05:57 PM
#1
Thread Starter
Fanatic Member
[2005] Compare Strings
I am trying to find a way to compare two strings to see if they are similar to or equal to one another.
I know that I can use String.Compare to determine whether two strings are equal, but I want to see whether they match to within some threshold, say at least 80% alike.
For example,
VB Code:
Dim strA As String, strB As String
strA = "The dog ran around the house."
strB = "The dog ran around a house."
' Now I want to compare strA and strB to see whether they are at least 80% alike.
' If they are almost the same (within 80%), do something.
Any tips on where to start?
Thanks.
-
Jan 5th, 2007, 06:54 PM
#2
Re: [2005] Compare Strings
Well, you could compare the characters for equality to get a percentage of what matches. However, that would basically break as soon as it hit a difference (an inserted or deleted character), because it would not know where to pick up the matching again, unless you have a pretty spiffy algorithm in place to do so.
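A minimal VB.NET sketch of the naive position-by-position comparison described above, to show where it breaks. The module and function names (NaiveMatch, PositionalMatchPercent) are hypothetical, and scoring against the length of the longer string is an assumption.

```vb
Imports System

Public Module NaiveMatch
    ' Percentage of positions i where a(i) = b(i), relative to the
    ' length of the longer string. Characters are compared only at the
    ' same index, so one insertion or deletion misaligns everything after it.
    Public Function PositionalMatchPercent(a As String, b As String) As Double
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 100.0 ' two empty strings count as identical

        Dim matches As Integer = 0
        For i As Integer = 0 To Math.Min(a.Length, b.Length) - 1
            If a(i) = b(i) Then matches += 1
        Next
        Return 100.0 * matches / longest
    End Function
End Module
```

With the two sentences from the first post, the first 19 characters line up and none of the remaining positions do, so the score is 19/29, about 65.5%. "abcdef" against "xabcdef" scores 0% even though only one character was inserted.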
-
Jan 5th, 2007, 07:08 PM
#3
Thread Starter
Fanatic Member
Re: [2005] Compare Strings
Looks like I need to use the Levenshtein distance algorithm; that will get me pretty close.
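For reference, a sketch of the standard dynamic-programming Levenshtein distance in VB.NET. Turning the distance into a 0 to 1 similarity by dividing by the length of the longer string is one common choice, not part of the algorithm itself, and the names EditDistance, LevenshteinDistance and Similarity are illustrative.

```vb
Imports System

Public Module EditDistance
    ' Minimum number of single-character insertions, deletions and
    ' substitutions needed to turn a into b. Uses two rows of the
    ' DP table instead of the full matrix.
    Public Function LevenshteinDistance(a As String, b As String) As Integer
        Dim prev(b.Length) As Integer ' row i - 1
        Dim curr(b.Length) As Integer ' row i
        For j As Integer = 0 To b.Length
            prev(j) = j ' turning "" into b(0..j-1) takes j insertions
        Next

        For i As Integer = 1 To a.Length
            curr(0) = i ' turning a(0..i-1) into "" takes i deletions
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                curr(j) = Math.Min(Math.Min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
            Next
            Dim tmp As Integer() = prev ' swap rows for the next pass
            prev = curr
            curr = tmp
        Next
        Return prev(b.Length)
    End Function

    ' Similarity in [0, 1]: 1 - distance / length of the longer string.
    Public Function Similarity(a As String, b As String) As Double
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 1.0
        Return 1.0 - LevenshteinDistance(a, b) / longest
    End Function
End Module
```

For the two sentences in the first post the distance is 3 ("the" becomes "a": one substitution and two deletions), so the similarity is 1 - 3/29, roughly 0.897, which would clear an 80% threshold.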
-
Jan 5th, 2007, 07:09 PM
#4
Frenzied Member
Re: [2005] Compare Strings
An algorithm/idea you could use would look at:
The number of words stringA and stringB each have. (Do they have the same number, or are they within the same range?)
Whether each word from stringA has a matching word in stringB.
The location of the matching words: are they in the same place, or within a certain range of it?
For instance:
stringA = "Dog chewed on bone"
stringB = "Dog chewed bone"
stringA has 4 words and stringB has 3 words, so they have "about" the same number of words, and you can take that into account. Three of stringA's words have matching words in stringB: "Dog", "chewed", and "bone". So 3 out of 4 words match, and you take that into account too. Then there is location: "bone" is the fourth word in stringA but the third word in stringB, and depending on how wide you want the range to be, you can use that to help determine how closely the strings match.
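A rough VB.NET sketch of this word-matching idea, under these assumptions: words are split on spaces, comparison ignores case, each word in stringB can be matched only once (greedily), and a match only counts if its position differs by at most maxShift. The names WordMatch, WordCountRatio and MatchedWordFraction are hypothetical, and how you weight the two scores against each other is left to you.

```vb
Imports System

Public Module WordMatch
    ' Splits on spaces, dropping empty entries from repeated spaces.
    Private Function Words(s As String) As String()
        Return s.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' How close the word counts are: smaller count / larger count.
    Public Function WordCountRatio(a As String, b As String) As Double
        Dim na As Integer = Words(a).Length
        Dim nb As Integer = Words(b).Length
        If Math.Max(na, nb) = 0 Then Return 1.0
        Return Math.Min(na, nb) / Math.Max(na, nb)
    End Function

    ' Fraction of a's words that have a not-yet-used, case-insensitive
    ' match in b whose position is within maxShift of the word's own.
    Public Function MatchedWordFraction(a As String, b As String, maxShift As Integer) As Double
        Dim wa As String() = Words(a)
        Dim wb As String() = Words(b)
        If wa.Length = 0 Then Return If(wb.Length = 0, 1.0, 0.0)

        Dim used(wb.Length - 1) As Boolean
        Dim matched As Integer = 0
        For i As Integer = 0 To wa.Length - 1
            For j As Integer = Math.Max(0, i - maxShift) To Math.Min(wb.Length - 1, i + maxShift)
                If Not used(j) AndAlso String.Equals(wa(i), wb(j), StringComparison.OrdinalIgnoreCase) Then
                    used(j) = True
                    matched += 1
                    Exit For
                End If
            Next
        Next
        Return matched / wa.Length
    End Function
End Module
```

With "Dog chewed on bone" and "Dog chewed bone", WordCountRatio gives 3/4 = 0.75 and MatchedWordFraction with maxShift = 1 also gives 0.75, since "bone" moved by one position; with maxShift = 0, "bone" no longer counts and the fraction drops to 0.5.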
-
Jan 5th, 2007, 07:16 PM
#5
Re: [2005] Compare Strings
This is a complicated subject. The first place to start is to define what you mean by "similarity". Similarity between two text strings is usually described in terms of one or more of the following (I got this list from a book I have):
Shared letters
Shared sequences of letters
Shared words
Shared sequences of words
Shared sequences of word segments
Shared syntax
Shared vocabulary
Shared equivalence of word meanings
Writing an algorithm for the first four or five is probably not that tough, but the last few would be really nasty.
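As an example of how approachable the "shared sequences of letters" item is, here is a sketch using a specific measure chosen for illustration: the Sørensen-Dice coefficient over character bigrams (pairs of adjacent letters). The names BigramSimilarity and BigramDice are hypothetical, repeated bigrams are counted as a multiset, and the comparison is case-sensitive.

```vb
Imports System
Imports System.Collections.Generic

Public Module BigramSimilarity
    ' Dice coefficient on character bigrams:
    ' 2 * (bigrams in common) / (bigrams in a + bigrams in b), in [0, 1].
    Public Function BigramDice(a As String, b As String) As Double
        ' Strings shorter than 2 characters have no bigrams.
        If a.Length < 2 OrElse b.Length < 2 Then Return If(a = b, 1.0, 0.0)

        ' Count each bigram of a (a multiset, so repeats are kept).
        Dim counts As New Dictionary(Of String, Integer)()
        For i As Integer = 0 To a.Length - 2
            Dim bg As String = a.Substring(i, 2)
            Dim c As Integer = 0
            counts.TryGetValue(bg, c)
            counts(bg) = c + 1
        Next

        ' Consume matching bigrams of b so each one is counted at most once.
        Dim common As Integer = 0
        For i As Integer = 0 To b.Length - 2
            Dim bg As String = b.Substring(i, 2)
            Dim c As Integer = 0
            If counts.TryGetValue(bg, c) AndAlso c > 0 Then
                counts(bg) = c - 1
                common += 1
            End If
        Next

        Return 2.0 * common / ((a.Length - 1) + (b.Length - 1))
    End Function
End Module
```

"night" and "nacht" share only the bigram "ht" out of four bigrams each, giving 2 x 1 / (4 + 4) = 0.25.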