Page 1 of 2 12 LastLast
Results 1 to 40 of 74

Thread: [RESOLVED] Fast count of words in a list

  1. #1

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Resolved [RESOLVED] Fast count of words in a list

    I am using the following code snippet to find counts of each words in one list box, and putting the results in a second listbox.
    (For example: list 1 may have (as a LIST) "Now, Is, The, Time, For, Now, Time, Good, Now"....list 2 would end up with "Now-3, Is-1, The-1, For-1, Time-2, Good-1) -Where the numbers are the number of occurrences of each particular word.
    Working with 'small' word counts (500 or so), it goes pretty rapidly (< a second).
    But when I increase the word count, say to over 10,000, it slows down to a CRAWL (minutes).


    Any suggestions/better&faster way to do this?

    Code:
    Private Sub poplist2()
        List2.Clear
        Dim MyArray() As String, CountOccurance As Long
        Dim i As Long, j As Long, k As Long
        k = 0
        Do While k < (List1.ListCount - 1)
            For i = 0 To List1.ListCount - 1
                If LCase(List1.List(k)) = LCase(List1.List(i)) Then _
                CountOccurance = CountOccurance + 1
            Next i
        
            ReDim Preserve MyArray(k + 1) As String
            MyArray(k) = LCase(List1.List(k)) & "-" & CountOccurance
            CountOccurance = 0
            k = k + 1
        Loop
        For i = LBound(MyArray) To UBound(MyArray)
            List2.AddItem MyArray(i)
        Next i
        Dim totListItms As Double
        totListItms = List2.ListCount - 1
        Do While totListItms >= 0
            For i = totListItms - 1 To 0 Step -1
                If LCase(List2.List(i)) = LCase(List2.List(totListItms)) Then
                    List2.RemoveItem i
                    totListItms = totListItms - 1
                End If
            Next i
            totListItms = totListItms - 1
        Loop
    End Sub

  2. #2
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Fast count of words in a list

    ListBox is a user interface control, not a data structure. It isn't optimized for use as such.

  3. #3
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Fast count of words in a list

    I think that entire routine can be sped up immensely by using API: SendMessage and LB_FINDSTRINGEXACT
    That API would remove your inner loops that iterates thru the listbox. If you search the forum for LB_FINDSTRINGEXACT, you should find good examples.

    Out of curiosity, why is totListItms declared a double? It has to be converted by VB to a Long or Integer at least once for each iteration in the last loop
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  4. #4

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    @dile....roger....gonna change that to an array.
    @LaV...I'll give the API a shot.
    ...TotListItms probably should be an Int. (will try this part first).

    Thanks

  5. #5
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Fast count of words in a list

    You could also speed it up by not using the redim preserve in the loop.
    Would be faster to have a larger array than needed to start with and then if needed reduce the size after the loop has finished so you have only one call to redim preserve rather than several.

    Another thing is that if the same word occurs more than once your code is going to process that word more than once which would both give undesired results and slow it down a good bit, especially so with that redim preserve in there

    One more thing after looking it over again, your loop is going to run until k = listcount so there is yet another reason to use the redim above the loop and set the boundry to listcount or listcount-1 as needed.

  6. #6
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Fast count of words in a list

    I think if I were going to try this using a listbox the first thing I would do is sort the listbox.
    I would create a variable that holds the last value read from the list which would of course start out as an empty string.
    I would then loop through the list comparing the value just read to the lastvalue var.
    If they are the same then increment the counter var by 1
    If they are different write the lastvalue and the counter to the other list, or the counter to an array if that is what you need, set the lastvalue to the current value, set counter back to 1 and continue the loop. If using an array to store the count of each word I would redim that array before the loop to the size of the list I am reading
    After the loop resize the array to the size of the second list, or not bother to resize it after the loop depending on the needs.

    This method should give better results and should be a LOT faster than what is in the OP

  7. #7
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Fast count of words in a list

    For example you might try something like this
    Code:
        ReDim MyArray(list1.ListCount) As String
        For k = 0 To list1.ListCount - 1
            If list1.List(k) = LastValue Then
                CountOccurance = CountOccurance + 1
            ElseIf LastValue = "" Then
                CountOccurance = 1
                LastValue = list1.List(k)
            Else
                list2.AddItem LCase(LastValue) & "-" & CountOccurance
                MyArray(list2.ListCount - 1) = LCase(LastValue) & "-" & CountOccurance
                CountOccurance = 1
                LastValue = list1.List(k)
                
            End If
        Next
        ReDim Preserve MyArray(list2.ListCount - 1) As String
    Assumes the list1 is sorted.

    Not tested so ...

    Add of course the array does not need to be there at all since it will hold the same thing as list 2
    Last edited by DataMiser; Dec 19th, 2014 at 02:30 PM.

  8. #8

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    thx, DM....probably won't get back to this little project until Monday, but yes, I currently have both lists 1 and 2 sorted.
    Will be trying all three major suggestions above, but the weekend is nearing!!!! So, will revisit on Monday.

  9. #9
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    I give my thought here, not a single line of code.
    Every word in a dictionary can be simplify like this "like" to be "li" and "ke". So We can use an array of 1000 strings in a shorted array (using an index array for swapping indexes) for two letters first part of a word and a second array for three letter "the", "thi". So we give a second row for that arrays where we put actual tokenized words but from second part of the word except for some numbers that means ...no other part, only a letter, and two more guards the &hffff and &hfffe. So in any string we put a guard, tokens, the second guard and an integer number as char (2 bytes).
    So for finding "probaly" we have to break this in three parts, "pro","ba","ly", found the tokens, and went to "pro" second row, make the search string with the one guard plus two tokens (so 3 chars in unicode) plus second guard and find the position, or not. If not then we add the search string plus a charW(1). If we found it in position X then we add the len of the search string and we have the char where we increase the number.
    We have a fault if we have a word more than 65534 times...
    Last edited by georgekar; Dec 19th, 2014 at 03:16 PM.

  10. #10

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    Quote Originally Posted by georgekar View Post
    I give my thought here, not a single line of code.
    Every word in a dictionary can be simplify like this "like" to be "li" and "ke". So We can use an array of 1000 strings in a shorted array (using an index array for swapping indexes) for two letters first part of a word and a second array for three letter "the", "thi". So we give a second row for that arrays where we put actual tokenized words but from second part of the word except for some numbers that means ...no other part, only a letter, and two more guards the &hffff and &hfffe. So in any string we put a guard, tokens, the second guard and an integer number as char (2 bytes).
    So for finding "probaly" we have to break this in three parts, "pro","ba","ly", found the tokens, and went to "pro" second row, make the search string with the one guard plus two tokens (so 3 chars in unicode) plus second guard and find the position, or not. If not then we add the search string plus a charW(1). If we found it in position X then we add the len of the search string and we have the char where we increase the number.
    We have a fault if we have a word more than 65534 times...
    ????????
    I understand LaVolpe, DataMiser and dilettante, but your posts always confuse me. Guess they think at a higher level in Greece (been there, LOVE the country, tho).

  11. #11
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Fast count of words in a list

    Quote Originally Posted by SamOscarBrown View Post
    thx, DM.... but yes, I currently have both lists 1 and 2 sorted.
    ....
    I assume List 1 must not have been sorted at the time of your original question (two hours before your response in post #8), otherwise you wouldn't have needed that nested loop you had comparing each word to every other work in the list. All the matching words would have been adjacent to each other and you would just make one pass through the list and count the adjacent words, as DM showed.
    For the example of 10,000 words, doing 10,000 (or so) compares is much faster than nested looping (10,000*10,000), so doing 100 million compares.
    Last edited by passel; Dec 19th, 2014 at 05:49 PM.

  12. #12
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    If I found time I will make the code...But my problem is that I wrote my thoughts with the greek way where the object is present once and then I can write some sentences without reference to that by name, but there is a reference by the meaning of the words (For that reason Greek language is difficult to others). In Greek language you say one sentence "put the other there", and you know what is that, what is the other and where is there, because you have read the sentences before. This happen too in a conversation. Ancient greeks before Alexander the Great have no spaces in the text so to read a document you have to read from start. To separate words are simple rules, and for that rules there are multiple typed forms of phonemes. An example is the "AND" in greek writing as ΚΑΙ and is listening as two phonemes.."k" and "eh" but also there is E that have the same sound as ΑΙ.
    If you like continue the reading...
    So how this is helping in a text without spaces? All greek words have a theme and an ending. So when you read you read the theme and then the ending. So for each theme you now what ending can fit, so you now where the word ends. There are words with two themes and an ending, but there is no theme that is same as any ending. So Greek language is like a programming language. Prόgramma is a greek noun, that has Pro - Gram -ma, the "PRO" meaning "the thing or meaning that comes from", /"Gram" is the them of LINE so any letter is a matter of line, so this word is a noun because has an ending of a noun..."ma" (like Prάgma, the thing in english, is an noun, is ending to "ma", Ι use ά because that letter has more volume from other a and is not A as English but Ah, or the u in butter).. So programma is that thing or meaning that extract from letters. In greek language all themes have a verb also. So the verb for them "gram" is Grafo which means "to type". So you say type letters but we say Γράφω γράμματα...and the two words have the same theme...So Greek words carrying the meaning together...So writing English as thinking in Greek is wrong... (γράμμα-τα here we have two endings the last one means many, is the plural of γράμμα).
    Last edited by georgekar; Dec 19th, 2014 at 06:06 PM.

  13. #13
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Fast count of words in a list

    Hence the phrase in English, "it's Greek to me",
    when we can't understand something.

  14. #14
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    This function counts words (using VbTextCompare) that are different.
    I leave as exercise to alter this to get the times each word appear in the text.
    Also notice that I use pointers (as offsets in arrays). Also the defined size of arrays is the maximum needed.
    Also the adding of two pointers, position and size, make a list of words in an order of appearance.

    Code:
    Function CountWords(a$) As Long
    Dim l As Long, wordbreak$
    wordbreak$ = " !@#$%^&*()[]{}\/?><.,_-+=;:'" + Chr$(34) + vbCrLf
    l = Len(a$)
    If l = 1 Then
    CountWords = -(Trim(a$) <> "")
    Else
    Dim pos() As Long, slen() As Long
    ReDim pos(1 To l \ 2) As Long, slen(1 To l \ 2) As Long
    Dim top As Long, i As Long, p As Long, k As Long
    l = Len(a$)
    p = 1
    Do While p <= l
            k = 0
            Do While p + k <= l
                If InStr(wordbreak$, Mid$(a$, p + k, 1)) = 0 Then Exit Do
                k = k + 1
            Loop
            p = p + k
            k = 0
            Do While p + k <= l
                 If InStr(wordbreak$, Mid$(a$, p + k, 1)) <> 0 Then Exit Do
                k = k + 1
            Loop
              ' Debug.Print "[" & Mid$(a$, p, k) & "]"
            If k = 0 Then Exit Do
            For i = 1 To top
               If k = slen(i) Then If StrComp(Mid$(a$, p, k), Mid$(a$, pos(i), k), vbTextCompare) = 0 Then Exit For
            Next i
            If i = top + 1 Then
                top = i
                pos(top) = p
                slen(top) = k
            End If
            p = p + k
      Loop
    CountWords = top
    End If
    End Function
    Last edited by georgekar; Dec 19th, 2014 at 08:13 PM. Reason: drop some repeated lines...

  15. #15
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    @SamOscarBrown
    Never add to a listbox in a for loop without disable it before.
    When you forget that, then for each addition the listbox perform a refresh. So 10000 refreshes are a big time. A refresh wait vertical sync to send to screen. We have 50 or 60 or 70 of them...(those are the Hertz of screen refresh).
    So 10000 time to loop /70 circles of refresh per second 1000/7=142.8 seconds or 2.38 minutes..
    So you are right, you need minutes...

  16. #16
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Fast count of words in a list

    Well while it is true that the performance will be much better if you hide the list box first I doubt that it would take 2.38 minutes to add 10,000 items to the list. It may try to refresh but it does not actually do it. If you watch when adding items in a tight loop the list box will appear blank until the loop is done then they all appear at once. Now if you had a do events in there that is a whole nother story

    There was a bunch of things there that was making it slow. The biggest culprit was searching through the entire list multiple times and calling that redim preserve in that loop. Combine that with the list being visible and that just adds a little more to that large amount of processing time.

    Consider the list has 10,000 items, the 1st 3 match the original code would scan all 10,000 items to find a match and then it would scan them again and then again so it would go through 30,000 items 2 find the first three matches where it only needed to look at 3

    Edit: Out of curiousity I tried adding 10000 items to a listbox while visible and it takes less than 1 second to complete, even with doevents in the loop it only took about 1 second so that 142.8 seconds is off by quite a lot.

  17. #17
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Fast count of words in a list

    Without modifying too much OP's code I got the function to be faster.

    Code:
    Private Sub PopList2()
    Dim MyArray() As String
    Dim Added     As Boolean
    Dim iCount    As Integer
    Dim i         As Integer
    Dim j         As Integer
    
      ReDim MyArray(List1.ListCount - 1)
      
      For i = 0 To List1.ListCount - 1
        MyArray(i) = List1.List(i)
      Next
          
      List2.Clear
      List1.Visible = False
      
      Total = List1.ListCount - 1
      
      For i = 0 To UBound(MyArray)
        
        For j = List1.ListCount - 1 To 0 Step -1
          If LCase(List1.List(j)) = LCase(MyArray(i)) Then
            If Not Added Then List2.AddItem MyArray(i)
            List1.RemoveItem j
            iCount = iCount + 1
            If Added = False Then Added = Not Added
            End If
        Next
        
        If iCount > 0 Then List2.List(List2.ListCount - 1) = List2.List(List2.ListCount - 1) & " - " & iCount
        Added = False
        iCount = 0
        k = 0
      Next
      
      For i = 0 To UBound(MyArray)
        List1.AddItem MyArray(i)
      Next
      List1.Visible = True
    End Sub

  18. #18
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    To throw my 2cents in (along the lines of what dilettante already said) ...

    GUI-elements should remain in the realm of "relative passive Visualizing"
    (responsible aside from the Visualizing, only for "throwing Events back to you").

    As for the task at hand - optimized "pure DataContainers" which are suitable for that, are e.g.
    the VB-Collection, an ADO-Recordset, or any other Dictionary-like Object-containers which
    support either a Sorting or a fast "Keybased lookup".

    Below I explain this "Data-first, passive GUI-updates second" approach, using the
    SortedDictionary of the vbRichClient (which is one of the fastest performing helpers
    out there for stuff like this).

    The following example needs (aside from a vbRichClient5-reference) 3 ListBoxes in a VB-Form
    (standard-naming List1, List2, List3) ... the Sum of all Code-Snippets below will need to be pasted into it.

    We start with preparation of Test-Data:

    Code:
    Option Explicit
    
    Private Words() As String, D As cSortedDictionary
    
    Private Sub Form_Load()
      ScanForWords Words, 10000 'could be a FileScan or something, here we use a Random generator
      Set D = New_c.SortedDictionary(TextCompare)
    End Sub
    
    Private Sub ScanForWords(Words() As String, Optional MaxWords As Long = 10000)
    Dim i As Long
      ReDim Words(0 To MaxWords - 1)
      Rnd -1
      For i = 0 To MaxWords - 1
        Words(i) = "Word" & Format$(Int(Rnd * 1000) + 1, "0000")
      Next i
    End Sub
    The above ensures the filling of the "raw Word-Data" (which could be done by a file-scan or whatever)
    resulting in a "raw Words container" (the Words()-StringArray) - and we define a Helper-DataContainer,
    a Sorting-Dictionary for the "counted results".

    Since the GUI shall remain relative passive, we can define "non-specialized, relative dumb fill-routines" for our
    3 VB-ListBoxes (which don't need to be sorting ones, since that happens on the non-GUI Data-Containers).

    Code:
    'just two (more or less) generically usable "dumb ListBox-Filler-Routines"
    Private Sub FillListFromArray(Lst As ListBox, Arr)
    Dim i As Long
      Lst.Clear
      For i = LBound(Arr) To UBound(Arr)
        Lst.AddItem Arr(i)
      Next i
    End Sub
    
    Private Sub FillListFromDictionary(Lst As ListBox, D As cSortedDictionary)
    Dim i As Long
      Lst.Clear
      For i = 0 To D.Count - 1
        Lst.AddItem D.KeyByIndex(i) & " (" & D.ItemByIndex(i) & ")"
      Next i
    End Sub
    What we did above so far was a relative trivial task, not related to the Word-Counting at hand,
    only ensuring a "structured approach" to the problem, separating GUI and Data-Containers.

    So, here comes the remaining "DoAction-Code" (triggered by a Form_Click):
    Code:
    Private Sub Form_Click()
      'let's do the real actions on the plain data-containers, ...
      DoWordCountingOn Words, D
      Caption = "We found " & D.Count & " uniquely differing words"
     
      '... only later resulting in "passive" GUI-updates
      FillListFromArray List1, Words
      FillListFromDictionary List2, D
      
      'and the same scheme again (data-actions first, followed by passive GUI-updates)
      Set D = ReSortDictionaryByItems(D) 'data-action, resulting in a new, reordered 'D'
      FillListFromDictionary List3, D 'passive GUI-update, reusing an existing generic fill-routine
    End Sub
    
    Private Sub DoWordCountingOn(Words() As String, D As cSortedDictionary)
    Dim i As Long, K As String '<- we treat each incoming Word as a Key
      D.RemoveAll
      For i = LBound(Words) To UBound(Words)
        K = Words(i): If D.Exists(K) Then D(K) = D(K) + 1 Else D.Add K, 1
      Next i
    End Sub
    
    Private Function ReSortDictionaryByItems(D As cSortedDictionary) As cSortedDictionary
    Dim i As Long
      Set ReSortDictionaryByItems = New_c.SortedDictionary(BinaryCompare, False)
      For i = 0 To D.Count - 1
        ReSortDictionaryByItems.Add D.ItemByIndex(i), D.KeyByIndex(i)
      Next i
    End Function
    As one can see, the DoWordCounting-routine is relative simple, not incorporating any
    GUI-Objects - using a suitable DataContainer-Object instead, to do its specific job
    (and it does that really fast).

    I've introduced another routine, ReSortDictionaryByItems which follows the same
    principle (doing a certain task on a non-GUI-object) - and reusing a generic and
    dumb GUI-fill-routine, to update our 3rd ListBox with the resorted results (by WordCount).

    Here's a ScreenShot, what the example will generate in your form:



    Olaf

  19. #19
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    @Datamiser
    You have not to hide list, just set enable to false, do the process and change to true. About the calculation, is not true that every demand for refresh is executed in time. It depends from what os we have. Because there is a stack for gdi32 commands, and a flushing needed to perform. So from what condition the os make the final refresh? If os wait a refresh command from user, so that will be ok. But as you see from code, we don 't have a refresh command. You said that in a loop without a doevents we see a blank container. This is because the redraw routine quit before finish. This is a strategy from os programer, to gain speed? This maybe true. Because there is time to finish the redraw and while waiting, a new message reset the refresh timer, and wait again for the right time to redraw. Why didn't redraw partial? So in entire loop we would see three or four refreshes that finish. To do that, the os has to know how time have until next vertical blank. But this is not work. Os may decide to perform refresh if a mouse is moving over the listbox. So demand for redraw may occur from os too, and that, demand, has priority. So a refresh is a matter of os priorities. So user get a time penalty if controls need to redraw and that is leaving to os to decide.So if you disable the listbox by using the enable property with false value then you have no demands for redraw, no time penalty, you finish the job and then you enable again. So then because some dirty flag exist in listbox a refresh command automatic executed, so we see the refreshing list. When we use doevents we demand from os to flush any graphic job, so we change the priorities.
    You can understand the os if you think what you can do if you write the os... My environment M2000 is an os in an os, with a specific language.So I make a special doevents that decide when call a doevents or a refresh, or do nothing. So I can leave it automatic or I can set the refresh counter or reset it or perform an immediate refresh. To display 10000 numbers in a scrolling window we can display with a small refresh counter value, so we see all numbers, a slow run, or with a big value we see one or two changes and the last numbers, very fast. So there is a way to setup our refresh system, as I do in M2000.

  20. #20

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    Herr Schmidt said "
    As for the task at hand - optimized "pure DataContainers" which are suitable for that, are e.g.
    the VB-Collection, an ADO-Recordset, or any other Dictionary-like Object-containers which
    support either a Sorting or a fast "Keybased lookup".
    So, maybe, instead of placing all the words in a listbox (which I do through a load routine reading an excel spreadsheet), maybe I should simply place that word list into a DB, and do an SQL query. Do you believe that would be a much better route to take? (I am not at the computer where I have this program, so can't test 'til Monday.)

    Am glad to see the many alternatives...I will try them all (well most of them that is) and see the speed comparisons. Even if I end up the DB route, I have already learned some valuable lessons about using listboxes. (And yes, both of my list boxes are sorted, but I obviously took the LONG way around in my comparison routine.) Thanks for all the advice...will post back next week.

  21. #21
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Fast count of words in a list

    Well using a query would be very quick and simple.

    Have you considered opening a recordset directly from the Excel file?

  22. #22
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    And a small idea ..
    He can take the code in #14, and there is an easy way to make it to read words from array, or excel cells. My code find words but here the words are in separate cell...so he need only the for n>1 two arrays (1 to n\2), and can place in the first array not position in string but cell position. Because he want to find how many have from each word...another array can hold..."hit" number.(hit when a word founded in list). The optimization about word length is still valuable. When equal length strings are compared we can compare partial two strings, first character....last character...from first+1 to last-1....

  23. #23

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    Quote Originally Posted by DataMiser View Post
    Well using a query would be very quick and simple.

    Have you considered opening a recordset directly from the Excel file?
    Could, I guess. (Done it before). I DO have to do some parsing (removing punctuation splitting by spaces, etc) (I'm pulling from different versions of the Bible from different spreadsheets, counting words in each chapter, and book). So, I guess I can do that.....but currently I read either an entire BOOK or just a CHAPTER into a textbox, parse it to a sorted listbox, and then count and place results in a second sorted listbox. CHAPTERS aren't the issue with speed, but BOOKS are. So, maybe I'll 'cheat' and load the TEXTBOX text into the DB, then do my queries from there to do the counting. (Instead of rewriting the load from Excel routine). We'll see....Monday looms. f
    My words come from one column in the Spreadsheet(s), so is pretty easy to import, or even to use as a RS.
    EX:
    Code:
    Version     BookNum   Book           Chapter         Verse      Scripture
       NIV           1           GENESIS          1                 1          In the beginning, ......
       NIV           1           GENESIS          1                 2          Now earth was formless....ETC
    All words come the "Scripture" column.

  24. #24
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    So words are in a string in a cell so you need my word count routine.

  25. #25
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    Quote Originally Posted by SamOscarBrown View Post
    ... maybe I should simply place that word list into a DB, and do an SQL query.
    Do you believe that would be a much better route to take?
    That depends on, if you need the Raw-Word-Data "persisted on Disk" (in your own *.mdb copy) - or not.

    If the read-out from Excel is only a "temporary thing" (when the raw Word-Data shall remain in the XL-File),
    then transferring these Raw-Words into an Array first, would still be the fastest way to approach this.

    In my example I filled a String-Array with these "raw words" in Form_Load (using a random generator).

    But there's no problem to transfer the Raw-Words from an Excel-Sheet into an Array in your VB6-App.
    Just take care, that you avoid "looping over single XL-Cells" to read out and fill such an Array.

    When remoting is involved (when you open the XL-Sheet from outside XL-VBA - directly in your VB-App),
    then doing the Array-Filling + Transferring in a single Remote-COM-Call is recommended.

    XL supports that quite nicely - e.g. when you have your WordList in a Sheet like the following
    (e.g the raw words starting in the second Column "B" downwards, and you know or calculate
    the "Count of Words" in that B-Column-Range beforehand), you can do a single call like below:

    At the VB-Remote-Side, WS being a Variable which holds the WorkSheet-instance...
    Code:
    Dim TopLeftCellAddr As String, EndOfRangeAddr As String, Words() As Variant
     
      TopLeftCellAddr = "$B$2" '<- this is assuming, you have a HeaderCell in $B$1, Data starting from $B$2 downwards
    
      EndOfRangeAddr = WS.Range("$B$65536").End(xlUp).Address
    
      Words = WS.Range(TopLeftCellAddr & ":" & EndOfRangeAddr)
    That's the fastest possible way, to transfer the contents of an XL-Range into an Array at the VB-Remote-Side.

    The Words-Array will (despite being derived from a single Column) still be a two-dimensional one,
    with LBounds at 1, so to loop over all the Words it contains, you would need to change the small
    DoCounting-Routine in my example to:

    Code:
    Private Sub DoWordCountingOn(Words() As Variant, D As cSortedDictionary)
    Dim i As Long, K As String '<- we treat each incoming Word as a Key
      D.RemoveAll
      For i = LBound(Words) To UBound(Words)
        K = Words(i, 1): If D.Exists(K) Then D(K) = D(K) + 1 Else D.Add K, 1
      Next i
    End Sub
    To measure the speed properly, you would have to spare out the List-Filling (or measure them
    in a separate timing).

    Filling VB-ListBoxes will take considerably more time than the WordCounting -
    so visualizing the results in a virtual ListBox or ListView is recommended in your final solution
    (in case you can't live with the ListBox-fill-times of course - depends a bit on the unique wordcounts).

    Olaf
    Last edited by Schmidt; Dec 20th, 2014 at 07:54 PM.

  26. #26
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    Quote Originally Posted by SamOscarBrown View Post
    ...I'm pulling from different versions of the Bible ...
    ...CHAPTERS aren't the issue with speed, but BOOKS are...
    Ah - now there's a challenge...

    So, what about scanning the whole Text-File of the King James Bible directly (bible13.txt ~5MB of data)?

    This file is available for free from different sources - the version 13 textfile I've used for the
    small Demo below, came from Poject Gutenberg, downloadable as a zip from this directory:
    http://www.gutenberg.org/files/30/old/

    The parsing-routine is using SafeArrays to come up with the split-up words, working directly on
    the raw FileData (leaving out all the leading comments and other stuff in the text-file automatically).

    What the algorithm accomplishes performancewise (when native compiled) is the following...

    Doing a scan for all 66 books in the bible13.txt-file gives these results:


    One can also specify an optional param for the scan-process, to filter for words of a given BookNr only -
    below is the results for the book with the highest "unique wordcount" (book 23):


    Here's the source for the Demo: FastWordScan.zip

    Maybe that gives a few ideas with regards to potentially avoiding all Excel-stored Data (aside from
    giving an impression about the achievable timings, such things should take on raw-data of a certain size).

    @George
    Your current Wordsplitter is Ok for smaller sized input-data - but has still quite some room for
    improvement, in case you want to do your own version against the same test-data file.

    Olaf

  27. #27

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    WOW! Kool, Olaf....I'll take a look. I have the ASV, NIV84, KJV and BEB all in Spreadsheet format....am importing the NIV 2011 now from a PDF. Easy to xfer all 'back to' text format, but don't wanna go through all that. I'll experiment on Monday with suggestions.

  28. #28
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    @Olaf,
    I run your code. But your code is not visible, is not open source. So If I want to do it, then I have to make as closed code..but free to use, like yours.

  29. #29
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    Quote Originally Posted by georgekar View Post
    @Olaf,
    I run your code. But your code is not visible, is not open source. So If I want to do it, then I have to make as closed code..but free to use, like yours.
    Well, I'd say that the VB6-code of the WordScan-routine in the example is quite visible - and in case you
    don't want to use the vbRichClien5-Dictionary, then you will have to change to something else - e.g. the
    MS-Scripting-Dictionary also supports StringKey-Lookups in a decent speed (but its sources are also not
    available, so I don't really understand what you mean).

    Olaf

  30. #30
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    Well, to not leave those "out of the loop", who don't want to use the RichClients sorting dictionary...

    Here's a version of the ParseWords-routine that uses the MS-Scripting.Dictionary instead:

    Code:
    Private Sub ParseWords(FileName$, WD As Dictionary, Optional WordCountTotal&, Optional ByVal RestrictToBookNr&)
    Dim i&, j&, BookNr&, WordLen&, Idx, W$, saW As SAFEARRAY1D, FNr As Long
    Dim B() As Byte, WCounts() As Long, WB() As Integer
      
      FNr = FreeFile
      Open FileName For Binary As FNr
        ReDim B(LOF(FNr) - 1)
        Get FNr, , B
      Close FNr
     
      W = Space$(512) 'now span a safearray over W ... for fast Wchar-access per WB()
          saW.cDims = 1
          saW.cbElements = 2
          saW.lLbound1D = -2
          saW.cElements1D = Len(W) + 2
          saW.pvData = StrPtr(W) - 4
      BindArray WB, VarPtr(saW) 'now we bind WB() -> W
        
      WordCountTotal = 0 'reset the WordTotal-Counter
      WB(-2) = 0 'set the initial StrLenB to zero
      Set WD = New Scripting.Dictionary
          WD.CompareMode = TextCompare
      ReDim WCounts(8192)
    
      For i = 1 To UBound(B)
        If B(i) = 66 And B(i - 1) = 10 Then 'check potential new "Book-Headers"
          j = i: Do Until B(i) = 10: i = i + 1: Loop '...and skip to the next linefeed
          If i - j <= 31 Then BookNr = BookNr + 1
        End If
        Select Case B(i)
          Case 39, 65 To 90, 97 To 122 'add the char into our current Single-Word Array
            WB(WordLen) = B(i): WordLen = WordLen + 1
          Case Else 'it's whitespace
            If WordLen > 0 Then 'WordLen is only > 0 on the first (new) WhiteSpace-Char, follwing a word,
              If BookNr > 0 And (RestrictToBookNr = 0 Or RestrictToBookNr = BookNr) Then
                WB(-2) = WordLen + WordLen 'so now we adjust the LenB-Field of the "array-mapped BString"
                WB(WordLen) = 0  'and set a terminating zero-WChar
    
                Idx = WD.Item(W) 'Ok, now we can use W, to perform an exists-check
                If IsEmpty(Idx) Then 'Word doesn't exist yet in dictionary
                  Idx = WD.Count 'so we assign the Idx-Variable to the next free index-slot
                  WD(W) = Idx  'and add the new Word into our Dictionary
                  If Idx > UBound(WCounts) Then ReDim Preserve WCounts(2 * Idx)
                End If
    
                WCounts(Idx) = WCounts(Idx) + 1
                WordCountTotal = WordCountTotal + 1
                If i + 15 < UBound(B) Then If B(i + 15) = 42 Then Exit For 'we reached the end of Book 66 and exit here
              End If
              WordLen = 0
            End If
        End Select
      Next i
      
      saW.pvData = 0: saW.cElements1D = 0
      ReleaseArray WB 'release the "WB()-> W" safearray-binding
    
      Dim Keys():  Keys = WD.Keys
      Dim Items(): Items = WD.Items
      For i = 0 To WD.Count - 1: WD(Keys(i)) = WCounts(Items(i)): Next
    End Sub
    The Fill-Routine for the ListBox could be made Scripting.Dictionary compatible this way:
    Code:
    Private Sub FillListFromDictionary(Lst As ListBox, WD As Dictionary)
    Dim i As Long, Keys(), Items()
      Lst.Visible = False: Lst.Clear
        Keys = WD.Keys
        Items = WD.Items
        
        For i = 0 To WD.Count - 1
          Lst.AddItem Keys(i) & " (" & Items(i) & ")"
        Next i
      Lst.Visible = True
    End Sub
    Well, all that brings a performance which is about factor 4 less than before -
    and is also not yet accomplishing a sorted delivery of the unique words -
    but this might still be "Ok" for many purposes.

    Here's what the version with the Scripting-Dictionary will produce (with the small changes as shown above):



    Olaf

  31. #31
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    In a form we place a rtb. No distinguish books...nor headers.
    Countwords make no sort. I leave the order of appearance. We can feed a listbox with sort capability. It isn't the best one, but uses no external class, such as MS-Scripting.Dictionary.
    I also include the reading routine for text files.
    GetWord routine optimizes the expansion of string.
    Getcount uses no string expression. Only pointers. This is fast. But because we can use binary search isn't as fast as I wish. Next time I put a sort routine..and I move only pointers..

    Code:
    dim a$, jj as long, kk as long
    a$ = ReadUnicodeOrANSI(App.Path & "\Bible13.txt")
    jj = CountWords(a$)
    Caption = jj
    Text1 = ""
    c$ = Space$(200)
    kk = 1
    For i = 1 To jj
    GetWord i, a$, c$, kk
    Next i
    Text1 = Trim(c$)


    This is for a module
    Code:
    Declare Function GetShortPathName Lib "kernel32" Alias _
    "GetShortPathNameW" (ByVal lpszLongPath As Long, _
    ByVal lpszShortPath As Long, ByVal cchBuffer As Long) As Long
    Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codepage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
    Declare Function MultiByteToWideChar& Lib "kernel32" (ByVal codepage&, ByVal dwFlags&, MultiBytes As Any, ByVal cBytes&, ByVal pWideChars&, ByVal cWideChars&)
    Declare Function CompareString Lib "kernel32" Alias "CompareStringW" (ByVal Locale As Long, ByVal dwCmpFlags As Long, ByVal lpString1 As Long, ByVal cchCount1 As Long, ByVal lpString2 As Long, ByVal cchCount2 As Long) As Long
    Public Declare Function GetLocaleInfoW Lib "kernel32" (ByVal Locale As Long, ByVal LCType As Long, ByVal lpLCData As Long, ByVal cchData As Long) As Long
    Private Declare Function GetKeyboardLayout& Lib "user32" (ByVal dwLayout&) ' not NT?
    Private Const DWL_ANYTHREAD& = 0
    Const LOCALE_ILANGUAGE = 1
    Public curLCID As Long
    Public p3 As Long
    Public pos() As Long, slen() As Long, hit() As Long
    Public Sub DropLeft(ByVal uStr As String, fromStr As String)
    Dim i As Long
    i = InStr(fromStr, uStr)
    If i > 0 Then
    fromStr = Mid$(fromStr, i + Len(uStr))
    Else
    fromStr = ""
    End If
    End Sub
    Sub GetWord(i As Long, a$, b$, cur)
    Dim k$
    k$ = Mid$(a$, (pos(i) - p3) / 2 + 1, slen(i)) & " (" & CStr(hit(i) + 1) & ")" + vbCrLf
    If Len(b$) < cur + Len(k$) Then b$ = b$ + Space$(Len(b$))
    Mid$(b$, cur, Len(k$)) = k$
    cur = cur + Len(k$)
    End Sub
    Function CountWords(a$) As Long
    Dim l As Long, wordbreak$, ml As Long
    Const NORM_IGNORECASE = &H1
    wordbreak$ = " !@#$%^&*()[]{}\/?><.,_-+=;:'1234567890~`" + Chr$(34) + vbCrLf
    ml = (Len(wordbreak$)) * 2
    l = Len(a$)
    If l = 1 Then
    CountWords = -(Trim(a$) <> "")
    Else
    ReDim pos(1 To l \ 2 + 1) As Long, slen(1 To l \ 2 + 1) As Long, hit(1 To l \ 2 + 1) As Long
    Dim top As Long, i As Long, p As Long, k As Long, pa As Long, m As Long, pm As Long
    p = 1
    Dim P2 As Long
    P2 = StrPtr(a$)
    p3 = P2
    p3check = P2
    pm = StrPtr(wordbreak)
    Dim ok As Boolean
    l = P2 + 2 * l
    Do While P2 < l
            k = 0
           
            Do While P2 + k < l
                 ok = False
                For m = pm To pm + ml Step 2
                    If CompareString(0, 0, m, 1, P2 + k, 1) = 2 Then ok = True: Exit For
                Next m
               If Not ok Then Exit Do
                k = k + 2
            Loop
    
            P2 = P2 + k
            k = 2
            Do While P2 + k < l
                             ok = False
                For m = pm To pm + ml Step 2
                    If CompareString(0, 0, m, 1, P2 + k, 1) = 2 Then ok = True: Exit For
                Next m
               If ok Then Exit Do
                k = k + 2
            Loop
            If k = 0 Or P2 >= l Then Exit Do
            k = k \ 2
            For i = 1 To top
               If k = slen(i) Then
               If CompareString(0, NORM_IGNORECASE, P2, k, pos(i), k) = 2 Then
               hit(i) = hit(i) + 1
               Exit For
               End If
               End If
            Next i
            If i = top + 1 Then
                top = i
                pos(top) = P2
                slen(top) = k
            End If
            P2 = P2 + 2 * k
    
      Loop
     
    CountWords = top
    End If
    End Function
    Public Function GetDosPath(LongPath As String) As String
    
    Dim s As String
    Dim i As Long
    Dim PathLength As Long
    
            i = Len(LongPath) * 2 + 2
    
            s = String(1024, 0)
    
            PathLength = GetShortPathName(StrPtr(LongPath), StrPtr(s), i)
    
            GetDosPath = Left$(s, PathLength)
    
    End Function
    Function ReadUnicodeOrANSI(Filename As String, Optional ByVal EnsureWinLFs As Boolean, Optional feedback As Long) As String
    Dim i&, FNr&, BLen&, WChars&, BOM As Integer, BTmp As Byte, b() As Byte
    ' code from Schmidt, member of vbforums
    If Filename = "" Then Exit Function
    On Error GoTo ErrHandler
      BLen = FileLen(GetDosPath(Filename))
      If BLen = 0 Then Exit Function
      
      FNr = FreeFile
      Open GetDosPath(Filename) For Binary Access Read As FNr
      
        Get FNr, , BOM
        Select Case BOM
          Case &HFEFF, &HFFFE 'one of the two possible 16 Bit BOMs
            If BLen >= 3 Then
              ReDim b(0 To BLen - 3): Get FNr, 3, b 'read the Bytes
              feedback = 0
              If BOM = &HFFFE Then 'big endian, so lets swap the byte-pairs
              feedback = 1
                For i = 0 To UBound(b) Step 2
                  BTmp = b(i): b(i) = b(i + 1): b(i + 1) = BTmp
                Next
              End If
              ReadUnicodeOrANSI = b
            End If
          Case &HBBEF 'the start of a potential UTF8-BOM
            Get FNr, , BTmp
            If BTmp = &HBF Then 'it's indeed the UTF8-BOM
            feedback = 2
              If BLen >= 4 Then
                ReDim b(0 To BLen - 4): Get FNr, 4, b 'read the Bytes
                
                WChars = MultiByteToWideChar(65001, 0, b(0), BLen - 3, 0, 0)
                ReadUnicodeOrANSI = Space$(WChars)
                MultiByteToWideChar 65001, 0, b(0), BLen - 3, StrPtr(ReadUnicodeOrANSI), WChars
              End If
            Else 'not an UTF8-BOM, so read the whole Text as ANSI
            feedback = 3
              ReadUnicodeOrANSI = Space$(BLen)
              Get FNr, 1, ReadUnicodeOrANSI
            End If
            
          Case Else 'no BOM was detected, so read the whole Text as ANSI
            feedback = 3
            ReadUnicodeOrANSI = Space$(BLen)
            Get FNr, 1, ReadUnicodeOrANSI
        End Select
        
        If InStr(ReadUnicodeOrANSI, vbCrLf) = 0 Then
          If InStr(ReadUnicodeOrANSI, vbLf) Then
          feedback = feedback + 10
       If EnsureWinLFs Then ReadUnicodeOrANSI = Replace(ReadUnicodeOrANSI, vbLf, vbCrLf)
          ElseIf InStr(ReadUnicodeOrANSI, vbCr) Then
          feedback = feedback + 20
          
        If EnsureWinLFs Then ReadUnicodeOrANSI = Replace(ReadUnicodeOrANSI, vbCr, vbCrLf)
          End If
        End If
        
    ErrHandler:
    If FNr Then Close FNr
    If Err Then Err.Raise Err.Number, Err.Source & ".ReadUnicodeOrANSI", Err.Description
    End Function
    Public Function GetLCIDFromKeyboard() As Long
        Dim Buffer As String, Ret&, r&
        Buffer = String$(514, 0)
          r = GetKeyboardLayout(DWL_ANYTHREAD) And &HFFFF
          r = Val("&H" & Right(Hex(r), 4))
            Ret = GetLocaleInfoW(r, LOCALE_ILANGUAGE, StrPtr(Buffer), Len(Buffer))
        GetLCIDFromKeyboard = CLng(Val("&h" + Left$(Buffer, Ret - 1)))
    End Function
    Last edited by georgekar; Dec 21st, 2014 at 02:31 PM. Reason: Update GETWORD and WordBreak$ const

  32. #32
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,210

    Re: Fast count of words in a list

    Sam,
    Re. ListBoxes
    1. We touched upon related stuff back in http://www.vbforums.com/showthread.p...83#post4654083
    2. If you are in a real hurry and you have a longish list it can be faster to sort the list in code and then add it to an unsorted ListBox.

  33. #33
    Frenzied Member
    Join Date
    Mar 2008
    Posts
    1,210

    Re: Fast count of words in a list

    Olaf,
    What is the Definition of SAFEARRAY1D in your last post.

  34. #34

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    Quote Originally Posted by Magic Ink View Post
    Sam,
    Re. ListBoxes
    1. We touched upon related stuff back in http://www.vbforums.com/showthread.p...83#post4654083
    2. If you are in a real hurry and you have a longish list it can be faster to sort the list in code and then add it to an unsorted ListBox.
    That we did, 8 months ago! (unfortunately for old men, memories ain't like they used to be). I THINK what I'll end up doing is not using ANY list boxes, but end up loading the excel files into a db (I am so much more used to queries from that than using Excel data in a recordset.---and recall my comment about old men. And, instead of doing all this on the fly, I can keep it resident (without referring back to Excel) and only return my already parsed, counted, sourced data. (IOW, will have a table that is derived from others and contains the counts already. So, no matter how long it takes the FIRST time, when I run the program later on, it will be instantaneous. (Of course, at that point, I may use a list box to display it). Lots of good communication on this subject....hope to learn a whole lot more next week as I experiment with
    these great suggestions. Thanks, all....

  35. #35
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    Quote Originally Posted by Magic Ink View Post
    What is the Definition of SAFEARRAY1D in your last post.
    It's in the small 2.9KB-Zip (at the bottom of my post #26).

    What I posted in #30 was thought as replacements for existing routines in the Basecode which
    came in that Zip. (Its content is just a *.vbp and a *.frm, with already placed ListBoxes, already
    set ProjectReferences and Compiler-Settings - just add the additional reference to the MS-
    Scripting-Runtime - and comment out the 2 or 3 other, still cSortedDictionary related Code-lines.

    Olaf

  36. #36
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    This is the sort routine...is very fast
    QuickSortExtended 1, top


    Code:
    Public Sub QuickSortExtended(ByVal LB As Long, ByVal UB As Long)
    Dim M1 As Long, M2 As Long, pivlen As Long
    Dim Piv As Long, Tmp As Long '<- adjust types here, when switching to something different than Long
      If UB - LB = 1 Then
         M1 = LB
         If CompareString(0, 0, pos(M1), slen(M1), pos(UB), slen(UB)) = 3 Then
          Tmp = pos(M1): pos(M1) = pos(UB): pos(UB) = Tmp
          Tmp = slen(M1): slen(M1) = slen(UB): slen(UB) = Tmp
          Tmp = hit(M1): hit(M1) = hit(UB): hit(UB) = Tmp
         End If
         
         Exit Sub
      Else
        M1 = (LB + UB) \ 2
        If CompareString(0, 0, pos(M1), slen(M1), pos(LB), slen(LB)) = 2 Then
          M2 = UB - 1
          M1 = LB
          Do
            M1 = M1 + 1
            If M1 > M2 Then
              If pos(UB) < pos(LB) Then
                Tmp = pos(LB): pos(LB) = pos(UB): pos(UB) = Tmp
                Tmp = slen(LB): slen(LB) = slen(UB): slen(UB) = Tmp
                Tmp = hit(LB): hit(LB) = hit(UB): hit(UB) = Tmp
              End If
              Exit Sub
            End If
          Loop Until CompareString(0, 0, pos(M1), slen(M1), pos(LB), slen(LB)) <> 2
          Piv = pos(M1)
            pivlen = slen(M1)
          If M1 > LB Then
            If CompareString(0, 0, pos(LB), slen(LB), Piv, pivlen) = 3 Then
                Tmp = hit(LB): hit(LB) = hit(M1): hit(M1) = Tmp
                Tmp = pos(LB): pos(LB) = pos(M1): pos(M1) = Tmp
                Tmp = slen(LB): slen(LB) = slen(M1): slen(M1) = Tmp
                Piv = pos(M1)
                pivlen = slen(M1)
            End If
          End If
        Else
          Piv = pos(M1)
          pivlen = slen(M1)
          M1 = LB
          Do While CompareString(0, 0, pos(M1), slen(M1), Piv, pivlen) = 1: M1 = M1 + 1: Loop
        End If
      End If
      
      M2 = UB
      Do
      Do While CompareString(0, 0, pos(M2), slen(M2), Piv, pivlen) = 3: M2 = M2 - 1: Loop
        
        If M1 <= M2 Then
        If M1 <> M2 Then
                 Tmp = pos(M2): pos(M2) = pos(M1): pos(M1) = Tmp
                Tmp = slen(M2): slen(M2) = slen(M1): slen(M1) = Tmp
                Tmp = hit(M2): hit(M2) = hit(M1): hit(M1) = Tmp
        End If
          
          M1 = M1 + 1
          M2 = M2 - 1
        End If
        If M1 > M2 Then Exit Do
        Do While CompareString(0, 0, pos(M1), slen(M1), Piv, pivlen) = 1: M1 = M1 + 1: Loop
      Loop
      If LB < M2 Then QuickSortExtended LB, M2
      If M1 < UB Then QuickSortExtended M1, UB
    End Sub
    This is for example in #34
    Last edited by georgekar; Dec 21st, 2014 at 06:10 PM.

  37. #37

    Thread Starter
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,177

    Re: Fast count of words in a list

    GK..I have NO idea what you just posted. Are you insinuating that I can simply copy your code (which I don't generally do) and I will be able to import an excel spreadsheet, find all the words in the sixth column, and I will magically find the number of times that each word is found,, rapidly? I didn't think so....PLEASE keep you your posts directly related to the problem. ευχαριστίες.

  38. #38
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    @SamOscarBrown,
    Its time to read code...
    So if you read that is a quicksort for my example
    But I found a better way to sort words. Without Quick sort. I found a way to sort the list of word while I do a binary search. If I didn't find the word then the binary search gives me the position to put the pointers..so i copy the memory to leave a a place to put my pointes. So this routine is 2 times faster from my previous one, and do the sorting also. I try the binary search with quick sort and it is 9 times more slow...
    I put it on the code of Olaf, but works only for all bibles.
    I didn't alter the time measure system, so we see all time to display also some 13000 words

    So No I haven't make the word counter for excel cells. My routine finds words in a string. So can be used everywhere that you can put a document in a string.

    ? strcomp("Aaronites","Aaron's",vbtextCompare)
    Aaronites is before Aaron's
    but in Olaf sort is after???
    but he use this " Set WD = New_c.SortedDictionary(TextCompare)"
    so what is the right??
    Attached Files Attached Files
    Last edited by georgekar; Dec 21st, 2014 at 06:35 PM.

  39. #39
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: Fast count of words in a list

    I found another part that is not sorted by vbtextcompare...in olaf routine
    This is from my routine (i can cut because is written in an RTB. Olaf's is in a Listbox...)
    Code:
    widow (50)
    widowhood (4)
    widows (23)
    widow's (5)
    widows' (3)

  40. #40
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,259

    Re: Fast count of words in a list

    George, your current solution is (native compiled, 24sec for the whole bible) still about factor 72! slower than
    the one with the RC5.SortedDictionary... and about Factor 18 slower than the one with the MS-Scripting.Dictionary.

    So, my suggestion would be, to adopt the fast Word-Splitting-routine I wrote - try to understand what
    the SafeArray-Type is doing in that context - perhaps followed by writing your own, fast implementation
    of a List-class in either Collection- or Dictionary-Style (in case you find the usability of those already
    tested and precompiled Binaries insufficient).

    Olaf

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width