Results 1 to 20 of 20

Thread: What is the most fastest way to see if a string already exists in an array

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2003
    Posts
    1,269

    What is the most fastest way to see if a string already exists in an array

    Anyone know?
    Currently I have 25,000 records in a mysql database.

    What I am doing right now to see if something exists is the statement..
    rs.Open "INSERT INTO yIDs (yID) VALUES ('" & yID & "')", Mysql_Connection

    if the string doesnt exist, it is added to the database. if it does, i can tell as an error is raised (since yID is a primary key)

    but i need something faster. I need to do about 50 queries a second with different strings, and this method takes a lot of CPU.

    Is there a faster way anyone can think of? The ids are ~60 characters long. I need to be able to check if it exists in an array, if not add it.

    Would building a string of all the ids on form_load, then using instr be faster?
    What about arrays? Dictionary/collection objects? Anything else?

  2. #2
    Fanatic Member
    Join Date
    Jul 2006
    Location
    nasik,india
    Posts
    909

    Re: What is the most fastest way to see if a string already exists in an array

    why don't you use select query for searching record in table ?
    i think it will be faster that what you are doing that by insrt query.

  3. #3
    PowerPoster
    Join Date
    May 2006
    Location
    Location, location!
    Posts
    2,673

    Re: What is the most fastest way to see if a string already exists in an array

    There was a discussion a while back about fast searching methods using strings...I wrote this bit of code back then, which stores every string in one big string with a specific code dividing them (a unique code not in any of the strings)...the code it uses is "¬" so if you wanted to search for the word dog, you would search the string for "¬dog¬" and if it returns a result it's in there...if you *just* searched for "dog" you might get "doghouse", "doggone", etc and this method ensures exact match :-)

    You might find this fast enough for your needs, but be aware it uses up quite a bit of memory if you're gonna have 25000 entries...if each word is 8 or less characters, the total memory taken up would be anything up to 200k with 25k for overheads (all the ¬)...at 60 chars each that'd be 1.5MB
    Attached Files Attached Files
    Well, everyone else has been doing it :-)
    Loading a file into memory QUICKLY - Using SendKeys - HyperLabel - A highly customisable label replacement - Using resource files/DLLs with VB - Adding GZip to your projects
    Expect more to come in future
    If I have helped you, RATE ME! :-)

    I love helping noobs with their VB problems (probably because, as an amateur programmer, I am only slightly better at VB than them :-)) but if you SERIOUSLY want to get help for free from a community such as VBForums, you have to first have a grounding (basic knowledge) in VB6, otherwise you're way too much work to help...You've got to give a little if you want to get help from us, in other words!

    And we DON'T do your homework. If your tutor doesn't teach you enough to help you make the project without his or her help, FIND A BETTER TUTOR or try reading books on programming! We are happy to help with minor things regarding the project, but you have to understand the rest of it if you want our help to be useful.

  4. #4
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: What is the most fastest way to see if a string already exists in an array

    The fastest way to find a string in an array is to store the strings in the array in sorted order.

    Then use a simple binary search algorithm - I've written then dozens of times - I'm sure you can search the forum for one here.

    Basically it's a high/low search - so if you have 32768 items in an array you can find if one exists in less then 16 iterations...

    Nothing gets faster then that.

    You certainly do not want to use a string with INSTR - imagine what that does in memory to find your string!

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  5. #5
    Hyperactive Member
    Join Date
    Apr 2004
    Posts
    342

    Re: What is the most fastest way to see if a string already exists in an array

    Another way that I have done this is placing the strings in a Hidden Listbox, you can use the SendMessageAPI to determine if the string is in the list... Very quick....

    VB Code:
    1. Public Const LB_FINDSTRING = &H18F
    2. Public Const LB_FINDSTRINGEXACT = &H1A2
    3.  
    4. Public Declare Function SendMessageString Lib "user32" Alias "SendMessageA" (ByVal hwnd as long, ByVal wMsg as Long, ByVal wParam As Long, Byref lParam as String) as Long
    5.  
    6. Dim ListIndex as long
    7.  
    8. ListIndex = SendMessageString(Me.List1.hWnd, LB_FINDSTRINGEXACT, -1, ByVal (Cstr(MyString))
    9.  
    10. If ListIndex = -1 Then
    11.  
    12.   ' String NOT in the List '
    13.  
    14. End If

  6. #6
    PowerPoster
    Join Date
    May 2006
    Location
    Location, location!
    Posts
    2,673

    Re: What is the most fastest way to see if a string already exists in an array

    Yeah, a sorted array being used to search probably would be the fastest method available, however it requires that you keep the array sorted which in itself would probably take lots of processor power to do...a simple bubble sort would sort the array though. If your list of strings is going to be dynamic (always changing) my code is probably faster (if you need to remove a word, eg "dog", you would just do replace(string,"¬dog¬","¬")) but if your strings are mostly static (never changing) then using szlamany's suggestion would probably be better
    Well, everyone else has been doing it :-)
    Loading a file into memory QUICKLY - Using SendKeys - HyperLabel - A highly customisable label replacement - Using resource files/DLLs with VB - Adding GZip to your projects
    Expect more to come in future
    If I have helped you, RATE ME! :-)

    I love helping noobs with their VB problems (probably because, as an amateur programmer, I am only slightly better at VB than them :-)) but if you SERIOUSLY want to get help for free from a community such as VBForums, you have to first have a grounding (basic knowledge) in VB6, otherwise you're way too much work to help...You've got to give a little if you want to get help from us, in other words!

    And we DON'T do your homework. If your tutor doesn't teach you enough to help you make the project without his or her help, FIND A BETTER TUTOR or try reading books on programming! We are happy to help with minor things regarding the project, but you have to understand the rest of it if you want our help to be useful.

  7. #7
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: What is the most fastest way to see if a string already exists in an array

    You don't need to sort the array of strings - sort a pointer array of integers instead.

    Use a QUICK SORT or SHELL SORT - they are fastest (which one can be debated).

    If you need to INSERT new entries then it gets much more complex - you can try to amend the pointer-array of sort entries - but that's potentially buggy...

    Coming at this from a DB approach you really should load all new entries into a staging table and then INSERT into the production table from that - with something like:

    Code:
    INSERT INTO yIds
        Select yId From TempTable TT
        Left Join yIds YI on YI.yId=TT.yId
        Where YI.yId is null
    or...

    Code:
    Delete From TempTable Where yId in (Select yId From yIds)
    Insert into yIds Select yId From TempTable
    and probably several other ways to go about it.

    Give us more background - where are these new Ids coming from? Why 50 a second? How quickly do the new ids need to get into the DB?

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  8. #8
    PowerPoster
    Join Date
    May 2006
    Location
    Location, location!
    Posts
    2,673

    Re: What is the most fastest way to see if a string already exists in an array

    I was considering adding that you could use an array of numbers (rather than moving the text around, the numbers would be less processing) but felt that it was more memory wasted...I don't think that speed is an issue when sorting the data, although it is an issue when searching for an entry. The issue here for me is that having to sort 25,000 entries is going to take a while whatever way you do it...with my instr code you don't need to sort anything, which is why it's better suited to this if the array will be dynamic.

    You know, I really should stress-test my prog...give it a large text document and have it split the file into words and add them all into the word list or something...see how long it takes to do them all :-)

    Edit: Did a test...130965 words (700k file, "Isaac Asimov - Prelude to Foundation.txt") went through the program in around 140 minutes...that's about 15 words a second, so it's a bit slow...but I did run it in IDE without any speedup optimization or anything so it could be improved on, I am sure :-)
    Last edited by smUX; Aug 18th, 2006 at 09:54 AM.
    Well, everyone else has been doing it :-)
    Loading a file into memory QUICKLY - Using SendKeys - HyperLabel - A highly customisable label replacement - Using resource files/DLLs with VB - Adding GZip to your projects
    Expect more to come in future
    If I have helped you, RATE ME! :-)

    I love helping noobs with their VB problems (probably because, as an amateur programmer, I am only slightly better at VB than them :-)) but if you SERIOUSLY want to get help for free from a community such as VBForums, you have to first have a grounding (basic knowledge) in VB6, otherwise you're way too much work to help...You've got to give a little if you want to get help from us, in other words!

    And we DON'T do your homework. If your tutor doesn't teach you enough to help you make the project without his or her help, FIND A BETTER TUTOR or try reading books on programming! We are happy to help with minor things regarding the project, but you have to understand the rest of it if you want our help to be useful.

  9. #9

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2003
    Posts
    1,269

    Re: What is the most fastest way to see if a string already exists in an array

    Heres a little more info
    The app is somewhat of a webpage crawler, grabbing information from websites (IDs) and storing it in a database.

    It works using about 200 winsock threads working at once. It retrieves on average, 50 IDs a second. These IDs need to be unique, so i insert the value into a database (and if it fails, i know it exists).

    I cannot just save all the ids to a file at once and then add them later, as work needs to be done on the ids as soon as its determined it is unique.

    Right now the database is 25k IDs big. It will only get bigger

    I was thinking of a sort but the problem is I need to add the IDs too.. (not just retrieve). Perhaps I could have a sort on the 25k IDs already in the database, check for duplicate on those first, and then use a dictionary object or something to see if there is a duplicate present on new IDs obtained on the fly?

  10. #10
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: What is the most fastest way to see if a string already exists in an array

    webpage crawler? what kind of id's are u collecting?
    JPnyc rocks!! (Just ask him!)
    If u have your answer please go to the thread tools and click "Mark Thread Resolved"

  11. #11
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: What is the most fastest way to see if a string already exists in an array

    So starting with a sorted array of 25K id's is no problem - right?

    Doing a binary search on those 25K items is quick and most efficient - right?

    Then you need to handle the "new ids" - determining if they have come in recently.

    You can use the string/INSTR method - see if that's sufficient. It won't have 25K entries - only the number just processed as new.

    Or if this proves to be a problem then you are loading them into a link-list array in sorted position - this is standard practice, just needs lots of testing.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  12. #12

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2003
    Posts
    1,269

    Re: What is the most fastest way to see if a string already exists in an array

    how can i do a binary search?

  13. #13
    Hyperactive Member
    Join Date
    Aug 2006
    Posts
    367

    Re: What is the most fastest way to see if a string already exists in an array

    A binary search is basically what the non-idiotic Price is Right contestants do on that beat the clock game.. Say you have 1024 possibilities:
    You guess 512 and Bob says "HIGHER!"
    You guess 768 (512+256) and Bob says "HIGHER"
    You guess 896 (512+256+128) and Bob says "LOWER"
    You guess 832 (512+256+128-64) and Bob says "HIGHER"
    You guess 864 (512+256+128-64+32) and Bob says "HIGHER"
    You guess 880 (512+256+128-64+32+16) and Bob says "LOWER"
    You guess 872 (512+256+128-64+32+16-8) and Bob says "LOWER"
    You guess 868 (512+256+128-64+32+16-8-4) and Bob says "HIGHER"
    You guess 870 (512+256+128-64+32+16-8-4+2) and Bob says "LOWER!!"
    You guess 869 (512+256+128-64+32+16-8-4+2-1) Then the music starts playing and you get to pay taxes on something you didn't really want/need..
    BTW, 1024=2^10 obviously and I picked a worst case scenario where 10 iterations were needed since we weren't lucky enough to find it outright part way through

  14. #14
    Addicted Member
    Join Date
    Dec 2005
    Posts
    139

    Re: What is the most fastest way to see if a string already exists in an array

    Quote Originally Posted by VaxoP
    how can i do a binary search?
    Google is your friend. Here's the #1 result from a search of "VB6 binary search". You can also search for "VB6 binary sort" if you need more info.

    http://www.devx.com/vb2themax/Tip/18913

  15. #15

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2003
    Posts
    1,269

    Re: What is the most fastest way to see if a string already exists in an array

    thanks guys
    binary search was 156x faster lol

  16. #16
    PowerPoster jcis's Avatar
    Join Date
    Jan 2003
    Location
    Argentina
    Posts
    4,430

    Re: What is the most fastest way to see if a string already exists in an array

    Quote Originally Posted by Cavar
    Strange, testing that, I did this..
    VB Code:
    1. Private arr() As String
    2.  
    3. Private Sub Form_Load()
    4. Dim i As Long
    5.     ReDim arr(100000)
    6.     For i = 0 To UBound(arr)
    7.         arr(i) = i
    8.     Next
    9.     MsgBox BinarySearch(arr, "1000", 100000)  ' = -1
    10. End Sub
    Result = -1 (not found)

  17. #17

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2003
    Posts
    1,269

    Re: What is the most fastest way to see if a string already exists in an array

    this works

    Dim arr() As String
    Dim i As Long
    ReDim arr(100000)
    For i = 0 To UBound(arr)
    arr(i) = Format$(i, "000000")
    Next
    MsgBox BinarySearch(arr, "001000") ' = -1

  18. #18
    PowerPoster
    Join Date
    May 2006
    Location
    Location, location!
    Posts
    2,673

    Exclamation Re: What is the most fastest way to see if a string already exists in an array

    Quote Originally Posted by VaxoP
    thanks guys
    binary search was 156x faster lol
    Now I feel i need to reiterate the facts...Binary search *IS* a lot faster than any other method but it REQUIRES that the data be sorted...if you have dynamic data (always changing) which I think you do considering you are looking for duplicates and adding the data if it isn't a duplicate, then this option WILL NOT WORK as you need to sort the list each time. Now, you COULD use a static list that's pre-sorted and have a dynamic list that's sorted each time you add a word (I think this was mentioned before...not sure)...this would result in a very flexible go-between that would allow you to use this method with any size database but you may want to merge the dynamic and static list. If you plan to merge the two lists however, seeing as you have the lists split, you could probably add each word one at a time rather than just chucking them in and sorting the list after...you just find the position where the word should be then move everything after it to the next position and add the word (or work out what position all the words in dynamic list should go into the static list and do a clever merge after making space)

    The bottom line here is a basic binary sort is not going to work with a list of 25,000 words if you are adding words to it...but it can be made to work :-)
    Well, everyone else has been doing it :-)
    Loading a file into memory QUICKLY - Using SendKeys - HyperLabel - A highly customisable label replacement - Using resource files/DLLs with VB - Adding GZip to your projects
    Expect more to come in future
    If I have helped you, RATE ME! :-)

    I love helping noobs with their VB problems (probably because, as an amateur programmer, I am only slightly better at VB than them :-)) but if you SERIOUSLY want to get help for free from a community such as VBForums, you have to first have a grounding (basic knowledge) in VB6, otherwise you're way too much work to help...You've got to give a little if you want to get help from us, in other words!

    And we DON'T do your homework. If your tutor doesn't teach you enough to help you make the project without his or her help, FIND A BETTER TUTOR or try reading books on programming! We are happy to help with minor things regarding the project, but you have to understand the rest of it if you want our help to be useful.

  19. #19
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: What is the most fastest way to see if a string already exists in an array

    Ok - here's a little reality on how SQL and other ISAM-based index systems work.

    First they all use BINARY searches to look through the INDEX for an entry. There is basically nothing faster.

    And new entries are added to an INDEX all the time...

    But the INDEX is not re-sorted after each INSERT.

    Most of the "buckets" of data in an index have free space - you can select the percentage of free space. So if 50% of a bucket is empty, new entries are put into that free space - after finding the proper spot in the sorted order.

    When a bucket fills a split occurs - pointers are added to the spot where the split occured indicating where the two split buckets of index leaves are now residing.

    At some point in time the engine determines that an index would be better served by rebuilding it - and the sort occurs...

    The reason I mention all this is just to point out how far you can truly go with this type of concept.

    Re-sorting the list is not required - simply getting around the sort issue is.

    If you maintain a pointer array - all you have to do is open a spot in the pointer array to fit the new entry in. To clarify, the pointer array simply is an array of INTEGER values - matching the size of the "string array" of actual entries. It starts off it's life with Pointer(1)=1, Pointer(2)=2, Pointer(3)=3, Pointer(x)=x - with x being the size of the array.

    When you have to insert a new entry into the array, it goes into the "last" spot of the "string array" - let's say StringValue(25001)="some new value".

    If that new value belongs in the 5000th spot in regard to it's sorted position, then you need to move each Pointer(x) up one spot - basically Pointer(25001)=Pointer(25000), Pointer(25000)=Pointer(24999) - done with a simple loop.

    You do this all the way to Pointer(5001)=Pointer(5000).

    This frees up Pointer(5000) - which you now load with the number 25001 (since that is where you put the new value).

    Back in our mainframe days we would have used a "memory move" operation to actually shift this data around, so looping was not required.

    Although I still feel, as stated in a prior post, that putting the new entries into the main 25000 entry array is simply not needed. You can maintain a separate array of new entries - or a string of new entries and do an INSTR on that string. You could consider re-loading the main array after so main strings get put into the "second-tier-search-array".

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  20. #20
    PowerPoster jcis's Avatar
    Join Date
    Jan 2003
    Location
    Argentina
    Posts
    4,430

    Re: What is the most fastest way to see if a string already exists in an array

    Quote Originally Posted by VaxoP
    this works
    VB Code:
    1. Dim arr() As String
    2. Dim i As Long
    3.     ReDim arr(100000)
    4.     For i = 0 To UBound(arr)
    5.         arr(i) = Format$(i, "000000")
    6.     Next
    7.     MsgBox BinarySearch(arr, "001000")  ' = -1
    Ok. Now if youll use fixed length strings like that, maybe you could consider using VB native filter. This works even if the array is not sorted. Its slower than BinarySearch, but not so much..
    VB Code:
    1. If UBound(Filter(arr, "095550", True)) = -1 Then
    2.         MsgBox "Item Not found"
    3.     Else
    4.         MsgBox "Item found"
    5.     End If
    Filter has even more functionality than this, it returns..
    - an array of matches (using True as 3rd parameter)
    - an array of no matches (using False as 3rd parameter)
    Last edited by jcis; Aug 19th, 2006 at 04:42 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width