Thread: Checking for similar duplicates

  1. #1

    Thread Starter
    Member
    Join Date
    Jun 2007
    Posts
    43

    Checking for similar duplicates

    Hi,

    I have a web site where users enter company names that are used throughout the rest of the app. Recently I've noticed near-duplicates appearing: one user enters "EastTec Solicitors", another enters "EastTec Solicitors Ltd", and someone else enters "EastTec Solictors" (missing the second "i" in Solicitors), when there should be only one entry, EastTec Solicitors. What is the best way of checking the database for entries similar to what the user has entered? And how would you go about catching spelling mistakes like the Solicitors one?

    Cheers.

  2. #2
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Checking for similar duplicates

    Don't use a textbox.

    Seriously, this is an issue that comes up all the time in DB programming. If you have a text field where somebody CAN add abbreviations, typos, alternate spellings, etc., then they will. You might be able to identify the bulk of the errant strings using RegEx or Contains (Contains EastTec might work in your example), but you won't get them all. Therefore, the only certain way to handle this is to design the program in such a way that the user has to select from a list. If there is no such list, then your problem becomes vastly more difficult, as you pretty much have to accept the fact that there will be duplicates.
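
    To illustrate the Contains idea and its limits, here is a minimal Python sketch (the thread itself has no code, so the language choice is arbitrary). The names `LEGAL_SUFFIXES`, `normalize` and `contains_match` are made up for this example, and the suffix list is an assumption you would need to extend for your own data.

```python
import re

# Hypothetical list of legal suffixes to ignore; extend it for your own data.
LEGAL_SUFFIXES = {"ltd", "limited", "llp", "plc", "inc", "co"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def contains_match(candidate, existing):
    """Return existing names whose normalized form contains the candidate's, or vice versa."""
    key = normalize(candidate)
    if not key:
        return []
    matches = []
    for name in existing:
        other = normalize(name)
        if other and (key in other or other in key):
            matches.append(name)
    return matches


companies = ["EastTec Solicitors", "Northgate Law"]
print(contains_match("EastTec Solicitors Ltd", companies))  # the "Ltd" variant is caught
print(contains_match("EastTec Solictors", companies))       # the typo is not
```

    The "Ltd" variant is caught because the suffix is stripped before comparing, but the misspelling is missed, which is the "you won't get them all" problem.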
    My usual boring signature: Nothing

  3. #3

    Thread Starter
    Member
    Join Date
    Jun 2007
    Posts
    43

    Re: Checking for similar duplicates

    I've got to use a textbox, as certain users can add to the list... They should be checking for existing entries themselves anyway, but you know what users are like...

  4. #4
    Code Monkey wild_bill's Avatar
    Join Date
    Mar 2005
    Location
    Montana
    Posts
    2,993

    Re: Checking for similar duplicates

    You can try my fuzzy logic class. This is what it was designed for.
    http://www.vbforums.com/showthread.php?t=540094
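
    For readers who can't follow the link: the sketch below is NOT the class from that thread, just one common way of scoring how similar two strings are, the Levenshtein edit distance, written in Python. The function names `levenshtein` and `similarity` are made up for this example.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Case-insensitive score from 0.0 (nothing in common) to 1.0 (identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Solicitors", "Solictors"))  # one missing letter, so distance 1
print(similarity("EastTec Solicitors", "EastTec Solictors"))
```

    A single missing letter gives a distance of 1, so a threshold on the score can flag the typo as a likely duplicate. Where to set that threshold is something you would have to tune on real data.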
    That is the very essence of human beings and our very unique capability to perform complex reasoning and actually use our perception to further our understanding of things. We like to solve problems. -Kleinma

    Does your code in post #46 look like my code in #45? No, it doesn't. Therefore, wrong is how it looks. - jmcilhinney

  5. #5
    VB Addict Pradeep1210's Avatar
    Join Date
    Apr 2004
    Location
    Inside the CPU...
    Posts
    6,614

    Re: Checking for similar duplicates

    I'd advise you to put in a listbox/combobox filled with all the company names and let the user select from that list. Add an extra item, "Others...", as the last entry in the list, and if the user selects it, show a textbox where they can enter whatever they want. You may need a bit of JavaScript for this, but I assure you it's worth the effort.
    Pradeep, Microsoft MVP (Visual Basic)
    Please appreciate posts that have helped you by clicking icon on the left of the post.
    "A problem well stated is a problem half solved." — Charles F. Kettering


    (2010-2013)
    NB: I do not answer coding questions via PM. If you want my help, then make a post and PM me its link. If I can help, trust me I will...

  6. #6
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Checking for similar duplicates

    I would like to emphasize what Pradeep suggested.

    I've had a similar situation to what you are describing, where people have to pick locations. Some locations may have been used before, while others would be totally new. As you might imagine, no situation is worse for spelling than locations, as people abbreviate directions (S, S., South, etc.), names, and suffixes (rd, Rd, Road, etc.). Since I was working with streams, I could have S. Fork, South Fork, S Fk, SFk, S. Fk, and MANY MANY others.

    Validating this would be a total and utter nightmare. Therefore, I gave the user a list of all the known locations, but gave them an option to add a new location if it wasn't already on the list. In my case, if they gave me a new location, I was alerted to the fact, and could check those. Since this only happened a couple of times a year, and since I had to add a bunch of other information for any new location, notification made perfect sense. In your case, having you approve/alter/reject every typed-in item might be unreasonable, but the key point is still the same: to the greatest extent possible, don't let users type in text for fields that might be searchable!!! The only fields I let people type into without some kind of oversight mechanism are comment fields. Other than that, I go out of my way to guide their entry.

  7. #7

    Thread Starter
    Member
    Join Date
    Jun 2007
    Posts
    43

    Re: Checking for similar duplicates

    Cheers for your help. I already do what Pradeep says, i.e. there is a combobox with all the company names listed and an option to add another if the one they want is not there. The users still add duplicates, though; I think they just can't be bothered to look through the list... I might give that fuzzy logic class from wild_bill a go. I just want to check whether there are similar entries already there and show them to the user, so they'll see that the company is already on the list...
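
    A rough sketch of that "show the user similar entries before adding" check, using Python's standard `difflib` module rather than the fuzzy logic class mentioned above. The function name `suggest_existing` and the 0.8 cutoff are assumptions for illustration; the cutoff would need tuning against real company names.

```python
import difflib


def suggest_existing(entered, existing, limit=5, cutoff=0.8):
    """Return existing names that look similar to what the user typed, best match first."""
    # Compare case-insensitively, but hand back the names as they are stored.
    by_key = {}
    for name in existing:
        by_key.setdefault(name.strip().lower(), name)
    keys = difflib.get_close_matches(
        entered.strip().lower(), list(by_key), n=limit, cutoff=cutoff
    )
    return [by_key[key] for key in keys]


companies = ["EastTec Solicitors", "Northgate Law", "Eastside Plumbing"]
for typed in ("EastTec Solictors", "EastTec Solicitors Ltd", "Brand New Co"):
    similar = suggest_existing(typed, companies)
    if similar:
        print(f"{typed!r}: did you mean {similar}?")
    else:
        print(f"{typed!r}: no similar entries, OK to add")
```

    If the list comes back non-empty, the page can show those names and ask the user to confirm that theirs really is a new company before it is saved.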

  8. #8
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Checking for similar duplicates

    There is an old saying: "Against stupidity, the gods themselves contend in vain."

    Laziness could be substituted for stupidity.
