Hi i am kinda new for VB6, i am looking for some code that can compare the similarity of two words in VB, like "weather" and "wther"?
anyone got an idea?
Printable View
Hi i am kinda new for VB6, i am looking for some code that can compare the similarity of two words in VB, like "weather" and "wther"?
anyone got an idea?
Do you mean u want it to check if the words are the same if so try this code
If Word1.text = Word2.text then
Msgbox " The words are the same"
Else if
Msgbox "The Words are not the same"
End if
If this isnt what you mean can you explain a little more
Dom
no i mean how to check the similarity? like 2 words or sentences have 90% are the same, can vb doing that?
You would have to give us several examples and the "percent the same" that you'd expect for each example.
For example "weather" and "wther" are 70% similar
"Unable to select an address from the list" and "Unable to select an address " has 23 characters out of 34 characters are the same so 67.6% are similar
I imagine it's quite hard to write something like that... i'm interested if somebody has allready did it or is about to... :)
Luckily I remembered that I wrote this years ago, and somehow managed to find the code.. it's certainly not pretty, but it does the job:
VB Code:
Public Function Similarity(ByVal String1 As String, _ ByVal String2 As String, _ Optional ByRef RetMatch As String, _ Optional min_match = 1) As Single 'Returns percentile of similarity between 2 strings (ignores case) '"RetMatch" returns the characters that match(in order) '"min_match" specifies minimum number af char's in a row to match Dim b1() As Byte, b2() As Byte Dim lngLen1 As Long, lngLen2 As Long Dim lngResult As Long If UCase(String1) = UCase(String2) Then '..Exactly the same Similarity = 1 Else '..one string is empty lngLen1 = Len(String1) lngLen2 = Len(String2) If (lngLen1 = 0) Or (lngLen2 = 0) Then Similarity = 0 Else '..otherwise find similarity b1() = StrConv(UCase(String1), vbFromUnicode) b2() = StrConv(UCase(String2), vbFromUnicode) lngResult = Similarity_sub(0, lngLen1 - 1, _ 0, lngLen2 - 1, _ b1, b2, _ String1, _ RetMatch, _ min_match) Erase b1 Erase b2 If lngLen1 >= lngLen2 Then Similarity = lngResult / lngLen1 Else Similarity = lngResult / lngLen2 End If End If End If End Function Private Function Similarity_sub(ByVal start1 As Long, ByVal end1 As Long, _ ByVal start2 As Long, ByVal end2 As Long, _ ByRef b1() As Byte, ByRef b2() As Byte, _ ByVal FirstString As String, _ ByRef RetMatch As String, _ ByVal min_match As Long, _ Optional recur_level As Integer = 0) As Long '* CALLED BY: Similarity * (RECURSIVE) Dim lngCurr1 As Long, lngCurr2 As Long Dim lngMatchAt1 As Long, lngMatchAt2 As Long Dim I As Long Dim lngLongestMatch As Long, lngLocalLongestMatch As Long Dim strRetMatch1 As String, strRetMatch2 As String If (start1 > end1) Or (start1 < 0) Or (end1 - start1 + 1 < min_match) _ Or (start2 > end2) Or (start2 < 0) Or (end2 - start2 + 1 < min_match) Then Exit Function '(exit if start/end is out of string, or length is too short) End If For lngCurr1 = start1 To end1 '(for each char of first string) For lngCurr2 = start2 To end2 '(for each char of second string) I = 0 Do Until b1(lngCurr1 + I) <> b2(lngCurr2 + I) 'as long as chars DO match.. I = I + 1 If I > lngLongestMatch Then '..if longer than previous best, store starts & length lngMatchAt1 = lngCurr1 lngMatchAt2 = lngCurr2 lngLongestMatch = I End If If (lngCurr1 + I) > end1 Or (lngCurr2 + I) > end2 Then Exit Do Loop Next lngCurr2 Next lngCurr1 If lngLongestMatch < min_match Then Exit Function 'no matches at all, so no point checking for sub-matches! lngLocalLongestMatch = lngLongestMatch 'call again for BEFORE + AFTER RetMatch = "" 'Find longest match BEFORE the current position lngLongestMatch = lngLongestMatch _ + Similarity_sub(start1, lngMatchAt1 - 1, _ start2, lngMatchAt2 - 1, _ b1, b2, _ FirstString, _ strRetMatch1, _ min_match, _ recur_level + 1) If strRetMatch1 <> "" Then RetMatch = RetMatch & strRetMatch1 & "*" Else RetMatch = RetMatch & IIf(recur_level = 0 _ And lngLocalLongestMatch > 0 _ And (lngMatchAt1 > 1 Or lngMatchAt2 > 1) _ , "*", "") End If 'add local longest RetMatch = RetMatch & Mid$(FirstString, lngMatchAt1 + 1, lngLocalLongestMatch) 'Find longest match AFTER the current position lngLongestMatch = lngLongestMatch _ + Similarity_sub(lngMatchAt1 + lngLocalLongestMatch, end1, _ lngMatchAt2 + lngLocalLongestMatch, end2, _ b1, b2, _ FirstString, _ strRetMatch2, _ min_match, _ recur_level + 1) If strRetMatch2 <> "" Then RetMatch = RetMatch & "*" & strRetMatch2 Else RetMatch = RetMatch & IIf(recur_level = 0 _ And lngLocalLongestMatch > 0 _ And ((lngMatchAt1 + lngLocalLongestMatch < end1) _ Or (lngMatchAt2 + lngLocalLongestMatch < end2)) _ , "*", "") End If 'Return result Similarity_sub = lngLongestMatch End Function
Just call Similarity, which you can do like this:
I don't know if this is what you want, but it finds multiple similarities if possible. For example, if the second string was changed to "Unable select an address list", it would match all of the characters to the first string, in 3 parts ("Unable" + " select an address " + "list").VB Code:
Dim sngPercentage as Single Dim strMatch as String sngPercentage = Similarity("Unable to select an address from the list", "Unable to select an address ", strMatch) * 100 '(need to multiply by 100, as the code returns a percentile rather than percentage) MsgBox "Match of " & sngPercentage & "% " & vbNewLine & "Matching text: " & strMatch
..if you don't want that, just something like the For loops in _sub should be enough!
I've never tried anything like this before, so maybe I'm missing something (someone tell me), but it seems pretty easy. This is what I came up with.
VB Code:
Option Explicit 'Actual function that compares two strings. Private Function PercentTheSame(ByVal Text As String, ByVal CompareWith As String, Optional ByVal CaseSensitive As Boolean = False) As Single Dim lonLenText As Long, lonLenCompare As Long Dim lonLoop As Long, lonDiff As Long Dim strCur As String, strC As String lonLenText = Len(Text) lonLenCompare = Len(CompareWith) For lonLoop = 1 To lonLenText If lonLoop > lonLenCompare Then lonDiff = lonDiff + 1 Else If CaseSensitive = False Then strCur = LCase$(Mid$(Text, lonLoop, 1)) strC = LCase$(Mid$(CompareWith, lonLoop, 1)) Else strCur = Mid$(Text, lonLoop, 1) strC = Mid$(CompareWith, lonLoop, 1) End If If Not strCur = strC Then lonDiff = lonDiff + 1 End If End If Next lonLoop PercentTheSame = CSng(((lonLenText - lonDiff) / lonLenText) * 100) End Function 'Finds which string is longer to use in the comparison function. Private Function StartCompare(ByVal Text As String, ByVal CompareWith As String, Optional ByVal CaseSensitive As Boolean = False) As Single 'Quick test so we don't waste time. If Text = CompareWith Then StartCompare = 100 Exit Function End If If Len(Text) > Len(CompareWith) Then StartCompare = PercentTheSame(Text, CompareWith, CaseSensitive) Else StartCompare = PercentTheSame(CompareWith, Text, CaseSensitive) End If End Function Private Sub Form_Load() Dim str1 As String, str2 As String str1 = "this is a test string" str2 = "this is a sentence not too much like the first" MsgBox StartCompare(str1, str2) 'MsgBox StartCompare(str1, str2, True) 'TRUE for case-sensitive compare. End Sub
Hi
Try the following & let me know if it helped...
VB Code:
Public Function SoundEx(cString As String) As String Dim iLen As Integer Dim cChar As String * 1 Dim jTemp As String * 1 Dim jTemp1 As String iLen = Len(Trim(cString)) For i = 2 To iLen cChar = LCase(Mid(cString, i, 1)) Select Case cChar Case "a", "e", "h", "i", "o", "u", "w", "y" jTemp = 0 Case "b", "f", "p", "v" jTemp = "1" Case "c", "g", "j", "k", "q", "s", "x", "z" jTemp = "2" Case "d", "t" jTemp = "3" Case "l" jTemp = "4" Case "m", "n" jTemp = "5" Case "r" jTemp = "6" End Select If jTemp <> 0 Then If Right(jTemp1, 1) <> jTemp Then jTemp1 = jTemp1 & jTemp End If End If jTemp = 0 Next i SoundEx = UCase(Mid(cString, 1, 1)) & jTemp1 SoundEx = SoundEx & String(4 - Len(SoundEx), "0") End Function
Now for comparison, try
VB Code:
If SoundEx(Text1.Text) = SoundEx(text2.Text) Then MsgBox "Yeah!!! They are similar" Else MsgBox "Oops!!! Doesn't match" End If
Hi there, long shot replying 12 years later but is this code based on a mathematical/statistical method or just your own intuition? Loving it regardless!
Can someone please explain how to utilize this vba? Everytime I put it in module I get sub error.
Thank you!
Please see this explanation.
https://www.source-code.biz/snippets/vbasic/4.htm
Looks like it’s written for SQL not to be used in EXCEL is that right?
It's VBA code with two versions. Put either one in a code module and see what happens. You can also search VBForums for 'soundx' and probably find an answer. BTW, what exactly happened when you tried the solutions from Digirev and twisted thoughts.
This got me interested in the algorithmic point of view. But also makes me wonder, what use there is to know the similarity percentage between two strings?Quote:
How to compare the similarity of two words in VB?
For example "weather" and "wther" are 70% similar
"Unable to select an address from the list" and "Unable to select an address " has 23 characters out of 34 characters are the same so 67.6% are similar
I don't get it. (And Regex makes me blackout :-))Quote:
Think google-search and/or Regex
But I thought it could be useful to find a correction suggestions to a misspelled word.
https://github.com/StefH/SimMetrics.Net -- some algorithms besides basic Levenstein.
I've used this (similar library) to implement auto-correct on free-text search on a products catalog site.
When the user is searching for a term which is not found in terms table (i.e. unique list of stemmed roots of all words in product descriptions) then a suggestion was made and the query was executed for the most similar search term instead (with an option to rollback to original).
cheers,
</wqw>
Our task is not to ask why, unless asking gives an idea for implementation (which it often does, and the question should be worded that way). If someone asks for something and we have an idea of a way to do it, we should make suggestions.
The original request also is very vague...phonetic similarity, characters, similes between words? We can assume character-based, in which case a custom algorithm which takes into account the length of the word and the number of characters that appear in both in the same place or nearby should do the job...it's something even the amateur coders can do, and is often something given as a programming task in classes, though I highly doubt people are still being taught VB6 these days :-)
Another one which plays in the same ballpark is Ratcliff/Obershelp:
https://en.wikipedia.org/wiki/Gestalt_pattern_matching
These algos do not work like "Google" (which is a FullText-search, based on AND/OR combined whole "stemmed words").
It also has nothing to do with "RegExp"...
Ratcliff is built into the RC5/6 SQLite-wrapper (as a UserDefined-Function):
OlafCode:Option Explicit
Private Sub Form_Load()
Dim MemDB As cMemDB, Rs As cRecordset
Set MemDB = New_c.MemDB
MemDB.Exec "Create Table T(ID Integer Primary Key, Name Text)"
'Demo-inserts
Const insName = "Insert Into T(Name) Values(?)"
MemDB.ExecCmd insName, "Beth Mercks"
MemDB.ExecCmd insName, "Brett Perks"
MemDB.ExecCmd insName, "Fred Merx"
MemDB.ExecCmd insName, "Mad Max"
MemDB.ExecCmd insName, "Matt Murks"
MemDB.ExecCmd insName, "Mett Wurst"
'Ratcliff-query (unsharp search)
Const qryUnsharp = "Select Top 5 Ratcliff(Name, ?) RcPerc, * From T " & _
" Order By RcPerc Desc"
Set Rs = MemDB.GetRs(qryUnsharp, "Brad Merks") '<- SearchParameter-Passing
Do Until Rs.EOF
If Rs.AbsolutePosition = 1 Then Debug.Print "RcPerc Name"
Debug.Print Rs!RcPerc, Rs!Name
Rs.MoveNext
Loop
End Sub
Wow! So much to learn!
I didn't know so many algorithms exist for this.
I only found about Myers algorithm, and, kind of, implemented some variant of it, since I didn't understand the original one.
But I think mine is doing fine though :-)
Maybe I will take the time to know more algorithms...
Would you share your code? I am also trying to compare texts, and Myers-Diff was the next thing I wanted to try.
After adding this module, use TextDiff(s1, s2) which returns a value between 0-1. (1 means identitcal)
I think the Levenshtein Distance method is the "Industry Standard"
Code:Public Function LevenDis(sTxt1 As String, sTxt2 As String, Optional blnAsPercent = True) As Double
Dim i As Long, j As Long
Dim Len1, Len2 As Long
Dim min1 As Long, min2 As Long
Dim lDis() As Long
Len1 = Len(sTxt1)
Len2 = Len(sTxt2)
ReDim lDis(Len1, Len2)
For i = 0 To Len1: lDis(i, 0) = i: Next i
For j = 0 To Len2: lDis(0, j) = j: Next j
For i = 1 To Len1
For j = 1 To Len2
If Mid(sTxt1, i, 1) = Mid(sTxt2, j, 1) Then
lDis(i, j) = lDis(i - 1, j - 1)
Else
min1 = lDis(i - 1, j) + 1
min2 = lDis(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = lDis(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
lDis(i, j) = min1
End If
Next j
Next i
LevenDis = lDis(Len1, Len2)
If blnAsPercent Then LevenDis = Abs(1 - (LevenDis / IIf(Len(sTxt1) > Len(sTxt2), Len(sTxt1), Len(sTxt2)))) * 100
End Function