Thread: SoundX and Levenshtein Distance Algorithms

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Mar 2007
    Location
    India
    Posts
    227

    SoundX and Levenshtein Distance Algorithms

    Soundex
    From Wikipedia, the free encyclopedia
    Soundex is a phonetic algorithm for indexing names by their sound when pronounced in English. The basic aim is for names with the same pronunciation to be encoded to the same string so that matching can occur despite minor differences in spelling. Soundex is the most widely known of all phonetic algorithms and is often used (incorrectly) as a synonym for "phonetic algorithm".
    The Soundex code for a name consists of a letter followed by three digits: the letter is the first letter of the name, and the digits encode the remaining consonants. Similar-sounding consonants share the same digit, so, for example, the labial consonants B, F, P, and V are all encoded as 1.
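
    The letter-plus-three-digits scheme described above can be sketched in VB6 roughly as follows. This is only a minimal sketch that assumes the usual American Soundex rules (adjacent letters with the same digit are coded once, H and W do not separate them, and vowels do). The names ClassicSoundex and SoundexDigit are hypothetical and are not taken from the attached class.

```vb
Option Explicit

' Hypothetical helper: maps an uppercase letter to its Soundex digit.
' Returns "" for letters that carry no code (vowels, H, W, Y, non-letters).
Private Function SoundexDigit(ByVal ch As String) As String
    Select Case ch
        Case "B", "F", "P", "V": SoundexDigit = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z": SoundexDigit = "2"
        Case "D", "T": SoundexDigit = "3"
        Case "L": SoundexDigit = "4"
        Case "M", "N": SoundexDigit = "5"
        Case "R": SoundexDigit = "6"
        Case Else: SoundexDigit = ""
    End Select
End Function

' Sketch of classic Soundex: first letter plus three digits, zero-padded.
Public Function ClassicSoundex(ByVal argWord As String) As String
    Dim i As Long
    Dim ch As String, digit As String, lastDigit As String
    Dim code As String

    argWord = UCase$(Trim$(argWord))
    If Len(argWord) = 0 Then Exit Function   ' empty input gives ""

    code = Left$(argWord, 1)
    lastDigit = SoundexDigit(code)           ' the first letter's digit can suppress a repeat

    For i = 2 To Len(argWord)
        ch = Mid$(argWord, i, 1)
        digit = SoundexDigit(ch)
        If Len(digit) > 0 Then
            If digit <> lastDigit Then code = code & digit
            lastDigit = digit
        ElseIf ch <> "H" And ch <> "W" Then
            lastDigit = ""                   ' a vowel separates repeated digits
        End If
        If Len(code) = 4 Then Exit For
    Next i

    ClassicSoundex = Left$(code & "000", 4)  ' pad short codes with zeros
End Function
```

    With these rules, ClassicSoundex("Robert") and ClassicSoundex("Rupert") both give R163. Because only four characters are kept, long words that share their first few consonants collapse to the same code: "electromagnet" and "electromagnetic" both come out as E423.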

    Levenshtein distance
    From Wikipedia, the free encyclopedia
    In information theory and computer science, the Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965. It is useful in applications that need to determine how similar two strings are, such as spell checkers.
    For example, the Levenshtein distance between "kitten" and "sitting" is 3, since these three edits change one into the other, and there is no way to do it with fewer than three edits:
    kitten → sitten (substitution of 's' for 'k')
    sitten → sittin (substitution of 'i' for 'e')
    sittin → sitting (insertion of 'g' at the end)
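
    The edit count in the example above can be computed with the standard dynamic-programming table (often called the Wagner-Fischer method). The following is a minimal VB6 sketch under that assumption. The names LevenshteinDistance and Min3 are hypothetical, and the GetLevenshteinDistance method in the attached class may be implemented differently.

```vb
Option Explicit

' Hypothetical helper: smallest of three Long values.
Private Function Min3(ByVal a As Long, ByVal b As Long, ByVal c As Long) As Long
    Dim smallest As Long
    smallest = a
    If b < smallest Then smallest = b
    If c < smallest Then smallest = c
    Min3 = smallest
End Function

' Sketch of the edit distance: d(i, j) holds the distance between
' the first i characters of s and the first j characters of t.
Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Long
    Dim d() As Long
    Dim i As Long, j As Long, cost As Long
    Dim m As Long, n As Long

    m = Len(s)
    n = Len(t)
    ReDim d(0 To m, 0 To n)

    ' Reaching the empty string takes one deletion per character.
    For i = 0 To m
        d(i, 0) = i
    Next i
    ' Building from the empty string takes one insertion per character.
    For j = 0 To n
        d(0, j) = j
    Next j

    For i = 1 To m
        For j = 1 To n
            If Mid$(s, i, 1) = Mid$(t, j, 1) Then cost = 0 Else cost = 1
            d(i, j) = Min3(d(i - 1, j) + 1, _
                           d(i, j - 1) + 1, _
                           d(i - 1, j - 1) + cost)
        Next j
    Next i

    LevenshteinDistance = d(m, n)
End Function
```

    Calling LevenshteinDistance("kitten", "sitting") returns 3, matching the three edits listed above. The comparison is case-sensitive under the default Option Compare Binary, so it may be worth upper-casing both strings first.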

    The code is packaged as a class, so using it should be very easy.

    Where do we use these algorithms? In spell checkers and similar software, where words have to be reduced to some kind of value for proximity calculations, etc.

    I will also upload a working sample soon.

    Hope you enjoy this release.
    Attached Files

  2. #2
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: SoundX and Levenshtein Distance Algorithms

    I was having issues with large words. So I modified this function in the class:

    vb Code:
    Private Function Soundex(ByVal argWord As String) As String
        Dim workStr As String, i As Long

        '// Capitalize it to remove ambiguity
        '// (argWord is passed ByVal so the caller's string is not modified)
        argWord = UCase$(argWord)

        '// 1. Retain the first letter of the string
        workStr = Left$(argWord, 1)

        '// 2. Replacement
        '   [a, e, h, i, o, u, w, y] = 0 (skipped below)
        '   [b, f, p, v] = 1
        '   [c, g, j, k, q, s, x, z] = 2
        '   [d, t] = 3
        '   [l] = 4
        '   [m, n] = 5
        '   [r] = 6

        For i = 2 To Len(argWord)
            Select Case Mid$(argWord, i, 1)
                Case "B", "F", "P", "V"
                    workStr = workStr & Chr$(49) '// 1
                Case "C", "G", "J", "K", "Q", "S", "X", "Z"
                    workStr = workStr & Chr$(50) '// 2
                Case "D", "T"
                    workStr = workStr & Chr$(51) '// 3
                Case "L"
                    workStr = workStr & Chr$(52) '// 4
                Case "M", "N"
                    workStr = workStr & Chr$(53) '// 5
                Case "R"
                    workStr = workStr & Chr$(54) '// 6 (was Chr$(56), which is "8")
                '// A, E, H, I, O, U, W, Y do nothing
            End Select
        Next i

        '// 5. Return the whole code (originally: the first four bytes padded with 0)
        'fix: for long-string compatibility, do not return only the first four bytes, but all of them
        'fix2: removed padding; it did not seem to make any difference to the GetLevenshteinDistance function
        Soundex = workStr
    End Function

    It seems to work much better for long words, but it may have unintended side effects. Additionally, the padding of zeros seems unnecessary, so that was removed, too.

    Here's an example of how to use this (afaik!):
    vb Code:
    Dim cP As New clsPhoneme
    Dim subStr(1) As String

    subStr(0) = cP.GetSoundexWord("electromagnet")
    subStr(1) = cP.GetSoundexWord("electromagnetic")

    Debug.Print subStr(0), subStr(1)

    Debug.Print cP.GetLevenshteinDistance(subStr(0), subStr(1)) 'should return 1 if you used my modified Soundex function, otherwise it'll be zero
    Set cP = Nothing
    Last edited by FireXtol; May 12th, 2010 at 09:33 AM.

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Mar 2007
    Location
    India
    Posts
    227

    Re: SoundX and Levenshtein Distance Algorithms

    @FireXtol,

    Thanks for suggesting modifications for making the code better.

    I am currently busy with a project, but as soon as I have some time on my hands I will look into it and test it extensively.

    Once again, thanks for your suggestions and code modifications.
