Re: Employing fuzzy logic
Thanks wb ... not sure how I would use your class in my case though. Anyone else have any ideas?
Re: Employing fuzzy logic
You would have to get a list of items from your system, then compare the input to each one, then choose the one with the highest score.
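A rough sketch of that loop, in case it helps. FindBestMatch and the scorer parameter are made-up names for illustration; scorer stands in for whatever comparison function you end up using, and I'm assuming it returns a higher number for a closer match.

```vbnet
Imports System
Imports System.Collections.Generic

Module BestMatchSketch
    ' Returns the candidate with the highest score, or Nothing if the list is empty.
    ' "scorer" is a placeholder for your comparison function (target, input) -> score;
    ' it is assumed to return higher values for closer matches.
    Function FindBestMatch(ByVal input As String, _
                           ByVal candidates As IEnumerable(Of String), _
                           ByVal scorer As Func(Of String, String, Double)) As String
        Dim best As String = Nothing
        Dim bestScore As Double = Double.NegativeInfinity

        For Each candidate As String In candidates
            Dim score As Double = scorer(candidate, input)
            ' Keep the first candidate that beats everything seen so far.
            If score > bestScore Then
                bestScore = score
                best = candidate
            End If
        Next

        Return best
    End Function
End Module
```

The list of candidates would come from your system (a query against the parts table, for example), and the result is only the best guess, not a guaranteed match.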
Re: Employing fuzzy logic
Sounds like a nightmare. All systems I've ever seen have some type of unique identifier: Part#, UPC. Who is inputting and creating these XML files? Maybe you could establish some type of control at that point.
Re: Employing fuzzy logic
nb
I gotta go with Wes on this.
Eliminate the fuzzy choices and force the users to select pre-defined choices.
Spoo
Re: Employing fuzzy logic
That would be ideal, but in some cases is not applicable. In my case I was cleaning duplicate entries out of a database. Because they were names, it still required human intervention to verify that the entries were in fact duplicates, but the fuzzy logic search aided the users in finding possible duplicates.
Re: Employing fuzzy logic
It's not going to be easy. Your best bet is to spend more than 10 seconds thinking about the code wild_bill linked. When it comes to tedious, hard problems you'll find few people willing to write the whole application for you so you can copy/paste your way to a paycheck/grade.
Fuzzy string matching always works the same way at the highest level. You have a target string and an input string, and you want to know if the input string is at all close to the target string. The most common way to do this is to write an algorithm that compares the two strings and assigns a "score" to the pair. Then you decide on some threshold for the score and assume any string with a score greater than the threshold is probably a match.
For example, suppose we have a function that compares two strings and returns a value between 0 and 1.0 based on how close the strings are:
Code:
Function FuzzyCompare(ByVal target As String, ByVal input As String) As Double
- The strings "cat" and "Thomas Jefferson" will probably result in 0; they have different numbers of letters and completely different letters in different positions.
- "cat" and "dog" might return 0.1, since they have the same number of letters but none in common.
- "cat" and "tab" might be between 0.25 and 0.5 since they share two letters and one of them is in the same position.
- "cat" and "cta" might return a high value like 0.8 since there's just a transposed letter.
- "CAT" and "cat" should be very high, maybe 0.9 or even 1.0 since the only difference is capitalization.
Do you see the pattern? A good fuzzy string matching algorithm defines a score range and assigns a higher score to strings that more closely match. You could certainly implement one yourself, but it's quite complicated. Since wild_bill's already written one, why not ask him to document how it's used rather than imply "I looked but I can't copy and paste it so it's not what I want"?
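To give a feel for what's involved, here's a minimal sketch of one possible FuzzyCompare. This is not wild_bill's class; I'm assuming plain Levenshtein edit distance (the number of single-character inserts, deletes and replacements needed to turn one string into the other), ignoring case, and scaling by the length of the longer string. IsProbableMatch and its threshold parameter are names I made up for the example.

```vbnet
Imports System

Module FuzzySketch
    ' Score from 0.0 (nothing alike) to 1.0 (identical, ignoring case).
    ' Assumption: Levenshtein edit distance scaled by the longer string's length.
    Function FuzzyCompare(ByVal target As String, ByVal input As String) As Double
        Dim a As String = If(target, "").Trim().ToUpperInvariant()
        Dim b As String = If(input, "").Trim().ToUpperInvariant()

        ' Two empty strings are treated as identical.
        If a.Length = 0 AndAlso b.Length = 0 Then Return 1.0

        ' previous(j) = distance between the characters of a handled so far
        ' and the first j characters of b.
        Dim previous(b.Length) As Integer
        Dim current(b.Length) As Integer

        For j As Integer = 0 To b.Length
            previous(j) = j
        Next

        For i As Integer = 1 To a.Length
            current(0) = i
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a.Chars(i - 1) = b.Chars(j - 1), 0, 1)
                Dim insertOrDelete As Integer = Math.Min(current(j - 1) + 1, previous(j) + 1)
                current(j) = Math.Min(insertOrDelete, previous(j - 1) + cost)
            Next
            ' Swap the rows so "previous" is ready for the next character of a.
            Dim temp() As Integer = previous
            previous = current
            current = temp
        Next

        Dim distance As Integer = previous(b.Length)
        Return 1.0 - distance / Math.Max(a.Length, b.Length)
    End Function

    ' Hypothetical helper: applies the "score greater than the threshold" rule.
    Function IsProbableMatch(ByVal target As String, ByVal input As String, _
                             ByVal threshold As Double) As Boolean
        Return FuzzyCompare(target, input) > threshold
    End Function
End Module
```

The scores won't line up with the made-up numbers in the list above. "CAT" and "cat" come out as 1.0 and "cat" and "dog" as 0, but plain edit distance counts the swap in "cta" as two edits, so it scores well under 0.8. An algorithm that treats a transposition as a single edit would rate it higher, and that kind of choice is exactly what makes writing a good one complicated.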
Note that no fuzzy matching algorithm is perfect. You're going to have to embrace some uncertainty. Strings with a high score probably match, but depending on how the algorithm is implemented they might still differ, for example by case. Also keep in mind that strings like '6" PVC THREADED' and '4" PVC THREADED' are already very close to each other, so a fuzzy algorithm doesn't help much with deciding which one "4 PCV THRD" matches: you have to do some extra work. I wouldn't automatically select strings based on fuzzy matching in an important application unless there was a chance for the user to double-check the value and correct errors.
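One way to do that "extra work" is to refuse to pick a winner when the top scores are too close together, and hand the short list to the user instead. A sketch under the same assumptions as before: CandidatesToConfirm, scorer, threshold and margin are all hypothetical names, and scorer is whatever comparison function you settle on.

```vbnet
Imports System
Imports System.Collections.Generic

Module AmbiguitySketch
    ' Returns the candidates the user should review, in their original order.
    ' One entry means a clear winner; several entries mean the scores were too
    ' close to call; an empty list means nothing scored above the threshold.
    Function CandidatesToConfirm(ByVal input As String, _
                                 ByVal candidates As IEnumerable(Of String), _
                                 ByVal scorer As Func(Of String, String, Double), _
                                 ByVal threshold As Double, _
                                 ByVal margin As Double) As List(Of String)
        Dim scores As New List(Of KeyValuePair(Of String, Double))
        Dim topScore As Double = Double.NegativeInfinity

        ' First pass: keep everything above the threshold and remember the best score.
        For Each candidate As String In candidates
            Dim score As Double = scorer(candidate, input)
            If score > threshold Then
                scores.Add(New KeyValuePair(Of String, Double)(candidate, score))
                If score > topScore Then topScore = score
            End If
        Next

        ' Second pass: keep only the candidates within "margin" of the best score.
        Dim result As New List(Of String)
        For Each pair As KeyValuePair(Of String, Double) In scores
            If topScore - pair.Value <= margin Then result.Add(pair.Key)
        Next

        Return result
    End Function
End Module
```

If the list comes back with more than one entry, show it to the user and let them choose. Good values for threshold and margin depend entirely on the scoring algorithm and your data, so expect to tune them.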