-
May 13th, 2011, 01:31 PM
#1
Thread Starter
Frenzied Member
Employing fuzzy logic
I need to take a bill of material file from a customer in XML format and import it into our system. The problem I'm running into is that the terms they use to describe the types of material may vary. For example, if they are specifying a piece of threaded rod, they may call it "THREADED ROD", "THREADEDROD", "THRD ROD", "THRD'D ROD" or some other variation. Another example is rectangular tubing, which they might call "RECTANGULAR TUBING", "RECT TUBING", "RECT TUBE", etc. There will be many types of material besides these two that I need to catch. Once the program determines what the material is, it branches to the appropriate subroutine to do further parsing. I was trying to think of a way to use regex, the "Like" operator or the "Contains" method to do this. Here is one way I thought of, using "Select ... Case", but it isn't very flexible and will only catch hard-coded terms:
Code:
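Select Case material
    Case "THREADED ROD", "THRD ROD" 'etc...
        ' branch to the threaded rod parsing routine
    Case "RECTANGULAR TUBE", "RECT TUBE", "RECT TUBING" 'etc...
        ' branch to the rectangular tubing parsing routine
    Case Else
        ' term not recognized; nothing hard-coded matched
End Select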
I am looking for something a little more flexible. Anyone have any ideas?
-
May 13th, 2011, 02:27 PM
#2
Re: Employing fuzzy logic
That is the very essence of human beings and our very unique capability to perform complex reasoning and actually use our perception to further our understanding of things. We like to solve problems. -Kleinma
Does your code in post #46 look like my code in #45? No, it doesn't. Therefore, wrong is how it looks. - jmcilhinney
-
May 16th, 2011, 12:01 PM
#3
Thread Starter
Frenzied Member
Re: Employing fuzzy logic
Thanks wb ... not sure how I would use your class in my case though. Anyone else have any ideas?
-
May 16th, 2011, 02:36 PM
#4
Re: Employing fuzzy logic
You would have to get a list of items from your system, then compare the input to each one, then choose the one with the highest score.
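A minimal sketch of that loop, assuming the scoring function is passed in by the caller. `FindBestMatch` and the crude `SharedPrefixScore` scorer are hypothetical names made up for illustration; they are not part of the class linked earlier in the thread, and a real fuzzy scorer would replace the stand-in.

```vbnet
Imports System
Imports System.Collections.Generic

Module BestMatchDemo
    ' Hypothetical stand-in scorer: the fraction of leading characters the
    ' two strings have in common, ignoring case.
    Function SharedPrefixScore(ByVal target As String, ByVal input As String) As Double
        Dim a As String = target.ToUpperInvariant()
        Dim b As String = input.ToUpperInvariant()
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 1.0
        Dim common As Integer = 0
        While common < Math.Min(a.Length, b.Length) AndAlso a(common) = b(common)
            common += 1
        End While
        Return common / longest
    End Function

    ' Compare the input to every known item and keep the highest-scoring one.
    Function FindBestMatch(ByVal knownMaterials As IEnumerable(Of String),
                           ByVal input As String,
                           ByVal scorer As Func(Of String, String, Double)) As KeyValuePair(Of String, Double)
        Dim bestName As String = Nothing
        Dim bestScore As Double = -1.0
        For Each candidate As String In knownMaterials
            Dim score As Double = scorer(candidate, input)
            If score > bestScore Then
                bestName = candidate
                bestScore = score
            End If
        Next
        Return New KeyValuePair(Of String, Double)(bestName, bestScore)
    End Function

    Sub Main()
        ' The list of known names would come from your own system.
        Dim known As String() = {"THREADED ROD", "RECTANGULAR TUBING"}
        Dim best As KeyValuePair(Of String, Double) =
            FindBestMatch(known, "RECT TUBE", AddressOf SharedPrefixScore)
        Console.WriteLine("{0} ({1:F2})", best.Key, best.Value)
    End Sub
End Module
```

Returning the score along with the name lets the caller reject a "best" match that is still too weak to trust.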
-
May 16th, 2011, 08:29 PM
#5
Re: Employing fuzzy logic
Sounds like a nightmare. All systems I've ever seen have some type of unique identifier, such as a part number or UPC. Who is entering the data and creating these XML files? Maybe you could establish some type of control at that point.
-
May 16th, 2011, 09:43 PM
#6
Re: Employing fuzzy logic
nb
I gotta go with Wes on this.
Eliminate the fuzzy choices and force the users to select pre-defined choices.
Spoo
-
May 16th, 2011, 11:17 PM
#7
Re: Employing fuzzy logic
That would be ideal, but in some cases is not applicable. In my case I was cleaning duplicate entries out of a database. Because they were names, it still required human intervention to verify that the entries were in fact duplicates, but the fuzzy logic search aided the users in finding possible duplicates.
-
May 17th, 2011, 09:42 AM
#8
Re: Employing fuzzy logic
It's not going to be easy. Your best bet is to spend more than 10 seconds thinking about the code wild_bill linked. When it comes to tedious, hard problems, you'll find few people willing to write the whole application for you so you can copy/paste your way to a paycheck/grade.
Fuzzy string matching always works the same way at the highest level. You have a target string and an input string, and you want to know if the input string is close at all to the target string. The most common way to do this is to write an algorithm that compares the two strings and assigns a "score" to the pair. Then, you decide on some threshold for the score and assume any string with a score greater than the threshold is probably a match.
For example, suppose we have a function that compares two strings and returns a value between 0 and 1.0 based on how close the strings are:
Code:
Function FuzzyCompare(ByVal target As String, ByVal input As String) As Double
- The strings "cat" and "Thomas Jefferson" will probably result in 0; they have different lengths and no letters in matching positions.
- "cat" and "dog" might return 0.1, since they have the same number of letters but none in common.
- "cat" and "tab" might be between 0.25 and 0.5 since they have similar letters and one match.
- "cat" and "cta" might return a high value like 0.8 since the only difference is two adjacent letters being transposed.
- "CAT" and "cat" should be very high, maybe 0.9 or even 1.0 since the only difference is capitalization.
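To make the idea concrete, here is one possible way to fill in `FuzzyCompare`, assuming the score is derived from Levenshtein edit distance and that case and surrounding whitespace are ignored. This is only a sketch, not wild_bill's implementation; the `EditDistance` helper is a name introduced here, and the numbers it produces will not line up exactly with the illustrative values in the list above.

```vbnet
Imports System

Module FuzzyCompareDemo
    ' Levenshtein edit distance: the number of single-character insertions,
    ' deletions and substitutions needed to turn a into b.
    Function EditDistance(ByVal a As String, ByVal b As String) As Integer
        Dim previous(b.Length) As Integer
        Dim current(b.Length) As Integer
        For j As Integer = 0 To b.Length
            previous(j) = j
        Next
        For i As Integer = 1 To a.Length
            current(0) = i
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                current(j) = Math.Min(Math.Min(current(j - 1) + 1, previous(j) + 1),
                                      previous(j - 1) + cost)
            Next
            ' The row just computed becomes the previous row for the next pass.
            Dim swap As Integer() = previous
            previous = current
            current = swap
        Next
        Return previous(b.Length)
    End Function

    ' Returns 1.0 for strings that are identical after case folding, down to
    ' 0.0 when every character would have to change.
    Function FuzzyCompare(ByVal target As String, ByVal input As String) As Double
        Dim a As String = target.Trim().ToUpperInvariant()
        Dim b As String = input.Trim().ToUpperInvariant()
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 1.0
        Return 1.0 - EditDistance(a, b) / longest
    End Function

    Sub Main()
        Console.WriteLine(FuzzyCompare("CAT", "cat"))
        Console.WriteLine(FuzzyCompare("THREADED ROD", "THRD ROD"))
        Console.WriteLine(FuzzyCompare("cat", "Thomas Jefferson"))
    End Sub
End Module
```

One design consequence worth knowing: plain Levenshtein distance counts a transposition as two edits, so "cat" and "cta" score only about 0.33 here. An algorithm that treats a swap of adjacent letters as a single edit (the Damerau-Levenshtein variant) would score that pair higher.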
Do you see the pattern? A good fuzzy string matching algorithm defines a score range and assigns a higher score to strings that more closely match. You could certainly implement one yourself, but it's quite complicated. Since wild_bill's already written one, why not ask him to document how it's used rather than imply, "I looked but I can't copy and paste it so it's not what I want."?
Note that no fuzzy matching algorithm is perfect. You're going to have to embrace some uncertainty. Strings with a high score probably match, but depending on algorithm implementation they might vary by case. Also keep in mind strings like '6" PVC THREADED' and '4" PVC THREADED' are already very close, so a fuzzy algorithm doesn't help much with deciding which one "4 PCV THRD" matches: you have to do some extra work. I wouldn't automatically select strings based on fuzzy matching in an important application unless there was a chance for the user to double-check the value and correct errors.
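One way to build in that double-check is to accept a match automatically only when it is both strong and clearly ahead of the runner-up, and to hand everything else to a person. The sketch below assumes the scores have already been computed; `PickOrDefer`, the threshold, the margin and the sample scores are all made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module AmbiguityDemo
    ' Returns the winning candidate, or Nothing when a person should review:
    ' either no score reaches the threshold or the runner-up is too close.
    Function PickOrDefer(ByVal scores As IDictionary(Of String, Double),
                         ByVal threshold As Double,
                         ByVal margin As Double) As String
        Dim bestName As String = Nothing
        Dim bestScore As Double = Double.NegativeInfinity
        Dim secondScore As Double = Double.NegativeInfinity
        For Each entry As KeyValuePair(Of String, Double) In scores
            If entry.Value > bestScore Then
                secondScore = bestScore
                bestScore = entry.Value
                bestName = entry.Key
            ElseIf entry.Value > secondScore Then
                secondScore = entry.Value
            End If
        Next
        If bestScore < threshold Then Return Nothing
        If bestScore - secondScore < margin Then Return Nothing
        Return bestName
    End Function

    Sub Main()
        ' Made-up scores for the input "4 PCV THRD"; not produced by any real algorithm.
        Dim scores As New Dictionary(Of String, Double)
        scores.Add("6"" PVC THREADED", 0.62)
        scores.Add("4"" PVC THREADED", 0.68)
        Dim choice As String = PickOrDefer(scores, 0.6, 0.15)
        If choice Is Nothing Then
            Console.WriteLine("Ambiguous: ask the user to confirm")
        Else
            Console.WriteLine("Matched: " & choice)
        End If
    End Sub
End Module
```

With these sample values the two candidates are only 0.06 apart, so the sketch defers to the user instead of guessing.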