@Olaf
On my machine it takes 12 sec for binary search with auto sorting, 20 sec if I use the old one that compares only equal-length words, and 90 sec if I use binary search plus a quicksort for every entry.
As I said before, there are routines that do much better, and yours is very good, but my task here is not to settle for those solutions, which are a black box, but to explore approaches whose inner workings I can understand and share with others. You share ways to do things, which is good, but you didn't convey the deep knowledge.
I think I have to use a hash table. I have never made one, but I will try.
Merry Christmas
George
Last edited by georgekar; Dec 21st, 2014 at 08:09 PM.
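For anyone who wants to attempt the same exercise, a hand-rolled hash table for word counting can be sketched roughly like this: a minimal separate-chaining table where the sizes, names, and hash function are all my own illustrative choices, not George's eventual code.

```vb
' Sketch of a tiny hand-rolled hash table for word counting.
' Separate chaining via parallel arrays; no growth/overflow handling.
Private Const BUCKETS As Long = 16384

Private hashKey(BUCKETS * 4) As String   ' stored words (slot 0 reserved)
Private hashCnt(BUCKETS * 4) As Long     ' occurrence counts
Private hashNext(BUCKETS * 4) As Long    ' chain links (0 = end of chain)
Private bucketHead(BUCKETS - 1) As Long  ' 0 = empty bucket
Private slotsUsed As Long

Private Function HashOf(s As String) As Long
    Dim i As Long, h As Long
    For i = 1 To Len(s)
        h = ((h * 31) And &HFFFFF) + Asc(Mid$(s, i, 1))
    Next
    HashOf = h Mod BUCKETS
End Function

Private Sub AddWord(ByVal word As String)
    Dim h As Long, slot As Long
    word = LCase$(word)
    h = HashOf(word)
    slot = bucketHead(h)
    Do While slot <> 0                   ' walk the chain for this bucket
        If hashKey(slot) = word Then
            hashCnt(slot) = hashCnt(slot) + 1
            Exit Sub
        End If
        slot = hashNext(slot)
    Loop
    slotsUsed = slotsUsed + 1            ' new word: claim the next slot
    hashKey(slotsUsed) = word
    hashCnt(slotsUsed) = 1
    hashNext(slotsUsed) = bucketHead(h)  ' prepend to the bucket's chain
    bucketHead(h) = slotsUsed
End Sub
```

Note that, unlike the sorted approaches discussed in this thread, a table like this gives O(1) average lookups but no sorted output.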
@DataMiser,
Perhaps you will find that this:
? StrComp("Aaronites","Aaron's")
and this:
? StrComp("Aaronites","Aaron's", vbTextCompare)
give totally different output...
George, maybe you noticed that the "make it case-insensitive" flag of the cSortedDictionary is a different Enum type, named TextCompare instead of vbTextCompare.
So it is not surprising that you get results which are sorted a bit differently.
If you want to play around with all kinds of different sort and compare schemes, the cSortedDictionary supports that as well, via its SetStringCompareFlags method, which is quite versatile.
E.g. if you add a new small function to the CodeBase as:
Code:
Private Function ReSortDictionaryKeys(WD As cSortedDictionary, Flags As CmpFlags, Optional ByVal LCID As LangIDs) As cSortedDictionary
    Dim i As Long
    Set ReSortDictionaryKeys = New_c.SortedDictionary(, False)
    ReSortDictionaryKeys.SetStringCompareFlags Flags, LCID
    For i = 0 To WD.Count - 1
        ReSortDictionaryKeys.Add WD.KeyByIndex(i), WD.ItemByIndex(i)
    Next i
End Function
You could then apply it after the counting has gone through, by adding an additional line to the List-Click event:
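That additional line could look something like this (my own illustrative call site, using the ReSortDictionaryKeys signature above):

```vb
' Hypothetical extra line in the List-Click event, after counting:
Set WD = ReSortDictionaryKeys(WD, cmpIgnoreCase, English_United_States)
```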
That additional line will cause the performance to drop by ~10% (now about 0.37-0.38 seconds instead of the ~0.34 sec without it).
But those who want to sort the results by whatever LCID (or sub-sorting scheme) of their specific countries can now do so.
Intellisense for playing around with that is supported, and I can easily choose e.g.:
cSortedDictionary.SetStringCompareFlags cmpIgnoreCase, German_Germany, SORT_GERMAN_PHONE_BOOK
(in case I want the special treatment for German umlauts, according to German phone-book rules).
Olaf
Last edited by Schmidt; Dec 21st, 2014 at 09:55 PM.
Originally Posted by georgekar
As I said before there are routines to do much better,
There are some approaches which *could* do better, but not by a very high margin - and neither your word-splitter routine nor your current binary-sorting-whilst-adding approach (which is, BTW, also used within the cSortedDictionary) is currently up to the task.
Originally Posted by georgekar
...the task here for me is not to stay to that solutions, which are a black box, ...
That's kind of hypocritical, because if you apply that principle consistently, then you should also refrain from using the CompareStringW call in your code (which sits in the "black box" kernel32.dll). Please re-implement all of its functionality in your own procedures first, to understand it better.
Originally Posted by georgekar
You share the ways to do something, that is good, but you didn't give the deep knowledge.
That's something one could really take offense at - George, the amount of code I have shared with the community over the last two decades is immense. And if you have specific questions about something, you only need to ask (though perhaps better in another, dedicated thread).
And it's again not really decent behaviour from you that you didn't even *consider* looking at (or working with) the word-splitting routine I posted; instead you stubbornly insist on using your own, although it is slower by an order of magnitude.
For fast string routines in VB, you have to "avoid using Strings" - and as "oxymoronic" as that sounds, it is nevertheless true. Please look at the fastest-performing string routines at: http://www.xbeat.net/vbspeed/
See, I could even point you to some older postings of mine where I explained how to implement something like a sorted dictionary (with code) - but given your disregard of the code I *did* post here already, I'm not sure whether that would make any sense.
@Schmidt
Your example is very fast, but not much of the speed gain comes from using the SafeArray.
To prove that, please insert this line:
WD.SetStringCompareFlags cmpIgnoreCase, English_United_States, SORT_DEFAULT
After these lines
WordCountTotal = 0 'reset the WordTotal-Counter
WB(-2) = 0 'set the initial StrLenB to zero
Set WD = New_c.SortedDictionary(TextCompare)
..............
WD.SetStringCompareFlags cmpIgnoreCase, English_United_States, SORT_DEFAULT
in your original example. As you will see, it now needs 3 times longer to finish.
Can you describe what exactly the TextCompare option does in a SortedDictionary?
Last edited by georgekar; Dec 22nd, 2014 at 08:30 PM.
Trying to wade thru all these posts. Let me see if I got this right?
1) Both lists are sorted
2) You want to count how many times a word is repeated in List1
3) Then you want to add that word, with number of repeats, to List2, deleting the previous List2 entry (if one exists?)
If I got the above correct, I see one flaw in your original routine & correct me if I'm wrong:
Code:
If LCase(List2.List(i)) = LCase(List2.List(totListItms)) Then...
Since each item in List2 has an occurrence count appended to the word, are you only removing the entry if it has the same occurrence count as before? In other words, let's say List2 already has an entry: "The - 5 Occurrences", and after processing List1 you have "The - 3 Occurrences". What is supposed to happen? Is the previous entry to be removed, or removed only if the previous and new entries match exactly? The equal sign in your code is a problem if you are trying to match just on the word in List2.
Either way, I see this being done by looping thru List1 just once, and, using APIs, not looping thru List2 at all.
Edited: Here's an example using the API to locate a match just on the word, not the entire entry in List2. It could be what you are looking for, or close enough to modify. The API usage prevents us from having to loop thru List2 on every iteration until we find a 'match' or exhaust the list.
Code:
Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByRef lParam As Any) As Long
Private Const LB_FINDSTRING As Long = &H18F

Private Sub Command1_Click()
    Dim l1 As Long, l2 As Long, nrRepeat As Long
    Dim lLast As Long, sEntry As String
    nrRepeat = 1
    For l1 = 1 To List1.ListCount - 1
        If StrComp(List1.List(l1), List1.List(lLast), vbTextCompare) = 0 Then ' match
            nrRepeat = nrRepeat + 1
        Else
            sEntry = List1.List(lLast) & " - "
            l2 = SendMessage(List2.hwnd, LB_FINDSTRING, -1, ByVal sEntry)
            sEntry = sEntry & nrRepeat & " Occurrences"
            If l2 = -1 Then ' no match found on the word, add it
                List2.AddItem sEntry
            Else ' match found, replace it
                List2.List(l2) = sEntry
            End If
            nrRepeat = 1
            lLast = l1
        End If
    Next
    ' flush the final run of repeated words
    sEntry = List1.List(lLast) & " - "
    l2 = SendMessage(List2.hwnd, LB_FINDSTRING, -1, ByVal sEntry)
    sEntry = sEntry & nrRepeat & " Occurrences"
    If l2 = -1 Then ' no match found on the word, add it
        List2.AddItem sEntry
    Else ' match found, replace it
        List2.List(l2) = sEntry
    End If
End Sub
I didn't add a check to see if List1 is empty. You should probably do that.
If an exact match is what you need, then replace that API constant with this one:
Code:
Private Const LB_FINDSTRINGEXACT As Long = &H1A2
Last edited by LaVolpe; Dec 22nd, 2014 at 08:56 PM.
Insomnia is just a byproduct of, "It can't be done"
I have 6 examples. We have to supply the Bible13.txt file.
Because some examples cannot work with books, I made all examples count all the books. I include an example with a HashTable class (from Francesco Balena, changed slightly by me), my binary-search routine, and my first slow routine with a sort routine. So I have three more: one using the MS Dictionary without Olaf's SafeArray, and one using Olaf's RC5 cSortedDictionary but with proper sorting by using
WD.SetStringCompareFlags cmpIgnoreCase, English_United_States, SORT_DEFAULT
(It seems that TextCompare ignores case but does not use the right sort order unless we set these flags.)
And finally we can click on the list of books (any item gives all the books and the introduction text).
I did all of this to find out which factors add speed. On my computer, the first version of the binary-search way of counting took 12000 ms; the last version took 1400 ms (my slow routine started at 20000 ms and ended at 7500 ms). These two routines, and Olaf's with RC5, give sorted results. Olaf's without the CompareFlags settings took 550 ms; with these settings, 1550 ms. He proposes doing a sort after the routine has run, and with that optimization it took 760 ms (200 ms more, because the sort routine runs before we check the timer). But this is not the same as for the other routines that do the proper comparison.
The HashTable has no sorted output. The Dictionary output, started with a click in the listbox, is 200 ms quicker than my implementation started from a button labeled Dictionary (it is like Olaf's without the SafeArray). So the SafeArray does add speed.
From all these experiments I have to note that Olaf has done very good work with cSortedDictionary, but some things about its options must be clarified. I looked at the help, and only the syntax is described - not what we can do with it, no specifications. (And this is the same for all the classes in RC5, so not only is it a black box, it also has no embedded info or examples for each class, method, and property. If I need a lifetime to understand it, how can this RC5 be useful?)
Originally Posted by LaVolpe
Trying to wade thru all these posts. Let me see if I got this right?
1) Both lists are sorted
2) You want to count how many times a word is repeated in List1
3) Then you want to add that word, with number of repeats, to List2, deleting the previous List2 entry (if one exists?)
1 - It is Sam! (Actually, Samuel Oscar Brown (shortened to Sam Oscar Brown) is simply a moniker I created a LONG, LONG time ago. "Sam Brown" has a unique/funny history behind it; I used to have it posted on my website (MacIsIrish.com) but have since discontinued the site. I added Oscar, yes, to make it SOB. I hope I don't appear that way on this site; my life has changed considerably since the event occurred wherein I started using this alias.)
2 - Yes: 1, 2 and 3 are precisely how I had approached the problem, and yesterday (very busy day) I used an array instead of List1 - speed was about the same.
I have not yet had time to experiment any further...hopefully today....with the many suggestions above.
Here is what it looks like (and it works fine, it seems) except for the speed, when I was using two lists...
I'll respond to more of your post in a few, but the 'answers' appear correct with my original code. (Again, I have not had the opportunity to play around with others' suggestions yet.) From the image, you can see I used chapter 1 of Genesis in the NIV version - I clicked on the right-hand GO button. (The left GO button would do the entire book of, in this case, Genesis, and takes around 5 minutes or so.) (Just an image to show how I originally approached this.)
LV - using YOUR code, it only took 0.593 seconds to separate and count the words in Genesis (13082 words in the NIV).
NOW, to experiment with others' suggestions, and THEN the hard part: try to understand the logic and code y'all presented.
Don't think you'll get much faster using a listbox - a tad, if uber-optimizing. Since the items are already sorted, looping thru once is as fast as you can hope for. Probably more time is taken adding the words to the listbox.
Any massive improvement will be thru the use of methods other than the sorted listbox.
It takes 2 seconds (LB_FINDSTRING) or 3 seconds (LB_FINDSTRINGEXACT) to complete (I modified LaVolpe's code a bit).
I know this can probably look better, but I am in a hurry this morning.
Code:
Private Sub PopList4()
    Dim i As Long
    Dim l1 As Long
    Dim l2 As Long
    Dim sTime As Single     ' Timer returns a Single
    Dim nrRepeat As Long
    Dim sEntry As String
    Dim S() As String
    sTime = Timer
    ReDim S(List1.ListCount - 1)
    For i = 0 To List1.ListCount - 1
        S(i) = List1.List(i)
    Next
    List1.Visible = False
    List2.Visible = False
    nrRepeat = 0
    Do While List1.ListCount > 0
        DoEvents
        For l1 = List1.ListCount - 1 To 0 Step -1
            If List1.List(l1) = " " Or List1.List(l1) = "" Then
                If l1 < List1.ListCount Then List1.RemoveItem l1
                Exit For
            End If
            sEntry = LCase(List1.List(l1))
            Do Until l2 = -1
                DoEvents
                l2 = SendMessage(List1.hwnd, LB_FINDSTRINGEXACT, -1, ByVal sEntry)
                If l2 = -1 Then ' no match left, write out the count
                    If Not sEntry = "" Then List2.AddItem sEntry & " - " & nrRepeat & " Occurrences"
                    nrRepeat = 0
                    l2 = 0
                    Exit Do
                Else ' match found, remove it and count it
                    List1.RemoveItem l2
                    nrRepeat = nrRepeat + 1
                End If
            Loop
        Next
    Loop
    List1.Visible = True
    List2.Visible = True
    For i = 0 To UBound(S)
        List1.AddItem S(i)
    Next
    Debug.Print Timer - sTime
End Sub
List1 contains bible13.txt, split by space (Split(Text1.Text, " ")).
List2 has Sorted = True (not necessary).
1) When Sam said 1/2 second, I'm sure he wasn't counting the time taken to get the words into the list.
2) DoEvents is probably slowing your routine down.
3) Do While is a tad slower; it has to re-evaluate List1.ListCount on each iteration.
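On point 3, a small sketch of the difference (variable names are my own). Hoisting the count is only valid when the loop body doesn't add or remove items, which is why a routine that removes entries must keep re-reading it:

```vb
Dim i As Long, n As Long

' Re-evaluates the ListCount property on every pass:
Do While i < List1.ListCount
    i = i + 1
Loop

' Evaluates the bound once, up front (only safe if the list
' is not modified inside the loop):
n = List1.ListCount
For i = 0 To n - 1
    ' ... work with List1.List(i) ...
Next
```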
LaVolpe, I added DoEvents (I always add it for testing) because if the do-loop crashes I lose all the code... it can be removed; there were no errors or infinite loops in my example.
I did this on the fly, but yes, it can be done a lot better - both the look and the speed.
I just wanted to add the looping to find/add every word in one go.
Edit:
You do save 0.1 to 0.2 seconds (100 ms to 200 ms) by removing the DoEvents.
Correct, I was not counting the populating of List1; however, that was very rapid as well (2.63 secs), for a total of 3.223 seconds for both lists to be populated.
Now, when I don't use List1 at all, but populate an array instead (which was 'instantaneous', versus 2.63 seconds to populate List1),
and use the array in lieu of List1 to populate List2, the total time to populate the array (~0) and List2 was 5.787 seconds. Very interesting... I thought that using an array in lieu of List1 would be faster at populating List2, but no - a total of ~2.5 seconds MORE.
So, back to using two lists; I am MORE THAN PLEASED with the total of 3.222 seconds, when MY times originally were in the MINUTES.
Sam, the 5.787 seconds might have been an anomaly? Did you try it more than once without unloading the form? Maybe some initialization was going on between VB and List2, something that happens just once?
LV, DM: Here's the code (LV's - the only thing changed is that I substituted wordArray() for List1). (wordArray, for Genesis, has 13081 'words'.)
No redimming, simply looping through the array once.
Times are pretty consistently between 5.7 and 6.6 seconds (first run, second click, etc.), as compared to ~3.2 seconds using List1.
Code:
StartTiming
List2.Clear
Dim l1 As Long, l2 As Long, nrRepeat As Long
Dim lLast As Long, sEntry As String
nrRepeat = 1
For l1 = 1 To UBound(wordArray) ' UBound is the last index
    If StrComp(wordArray(l1), wordArray(lLast), vbTextCompare) = 0 Then ' match
        nrRepeat = nrRepeat + 1
    Else
        sEntry = wordArray(lLast) & " - "
        l2 = SendMessage(List2.hwnd, LB_FINDSTRING, -1, ByVal sEntry)
        sEntry = sEntry & nrRepeat & " Occurrences"
        If l2 = -1 Then ' no match found on the word, add it
            List2.AddItem sEntry
        Else ' match found, replace it
            List2.List(l2) = sEntry
        End If
        nrRepeat = 1
        lLast = l1
    End If
Next
' flush the final run
sEntry = wordArray(lLast) & " - "
l2 = SendMessage(List2.hwnd, LB_FINDSTRING, -1, ByVal sEntry)
sEntry = sEntry & nrRepeat & " Occurrences"
If l2 = -1 Then ' no match found on the word, add it
    List2.AddItem sEntry
Else ' match found, replace it
    List2.List(l2) = sEntry
End If
MsgBox GetTime
That's a head-scratcher. You'd figure that comparing List1.List(l1) to List1.List(lLast) would be slower, since calls to the listbox are needed to extract the strings for comparison - unless VB is optimizing something under the covers. Another possible reason: by using an array, I'd imagine twice the bytes are being compared - wordArray() holds Unicode strings, where List1.List(x) is an ANSI string? A head-scratcher.
To describe it better: the first item, at position 0, is tracked by a variable used as the starting offset, named lLast. So the next will be 1, the next 2; there is a line to advance the offset to the next one: lLast = l1.
What is in the StartTiming and GetTime functions? I would use:
Code:
Dim sTime As Single ' Timer returns a Single
sTime = Timer
'execute code here
Debug.Print Timer - sTime
These are those functions.... I was going to use Timer but have done it this way before....
Again, tho, the results are about the same, ~6 seconds (WITH THE ARRAY VICE THE LISTBOX).
Code:
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private m_StartTime As Long

Private Sub StartTiming()
    m_StartTime = GetTickCount()
End Sub

Private Function GetTime() As Single
    Dim StopTime As Long
    StopTime = GetTickCount()
    GetTime = (StopTime - m_StartTime) / 1000
End Function
I'm going to go ahead and mark this as Resolved (as it works fast now), but will keep experimenting with the other options provided. Yes, I'm curious how the same code using a listbox was faster than the array. Oh well.
THANKS TO ALL for educating this old man....still learnin'
Actually, PROBABLY, most likely, it is that I did not have the array sorted first, like I did List1. THAT is probably the reason for the extra time. I may go in and sort the array first, and then test the speed. OPINIONS?
I think a sorted List1 gives a sorted List2. If you have List2 auto-sorted, then unsorted input may cost some speed, but not much. So use an unsorted List2 and a sorted List1.
Yup, as suspected... If I sort my array first (as I had List1 sorted first), the time to count the words and populate List2 is reduced from ~5 seconds to various instances between 0.007 and 0.99 secs - big difference.
So, I suppose, if I eliminate BOTH listboxes and use two arrays, it will be much faster to populate that second array than to populate List2.
If I try that, I still need to show the results on screen, but what I may do is take that second array and populate an MSFlexGrid with two columns, the first being the count, the second being the word, and sort it by count (highest first). That would make the most sense to me if I wanted to see the most common words. I will then probably add a search routine to find the grid's row for any given word. Sounds like a fun little challenge.
Last edited by SamOscarBrown; Dec 23rd, 2014 at 01:57 PM.
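The speedup Sam observed above is what the single-pass approach relies on: in a sorted array, equal words sit in one contiguous run, so counting takes one comparison per element. A minimal sketch of that idea (the names are my own; it assumes words() is a 0-based, sorted String array):

```vb
' Count word runs in a pre-sorted, 0-based String array.
Private Sub CountSortedRuns(words() As String)
    Dim i As Long, runStart As Long
    runStart = 0
    For i = 1 To UBound(words)
        If StrComp(words(i), words(runStart), vbTextCompare) <> 0 Then
            Debug.Print words(runStart) & " - " & (i - runStart) & " Occurrences"
            runStart = i       ' a new run begins here
        End If
    Next
    ' flush the final run
    Debug.Print words(runStart) & " - " & (UBound(words) - runStart + 1) & " Occurrences"
End Sub
```

On unsorted input the same loop produces false counts, which matches the slowdown and wrong results discussed in this thread.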
No, if you use an array as the second list you slow the process down, because you then have to implement the search routine yourself... So use the MS Dictionary to get a fast search...
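A minimal sketch of that suggestion, using the MS Scripting.Dictionary via late binding (the names are my own; with a reference to Microsoft Scripting Runtime you could declare it As Scripting.Dictionary instead):

```vb
' Word counting with the MS Scripting.Dictionary: the hash lookup
' replaces any hand-written search over a second array or list.
Private Function CountWords(words() As String) As Object
    Dim dict As Object, i As Long, key As String
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare      ' case-insensitive keys
    For i = 0 To UBound(words)
        key = words(i)
        If dict.Exists(key) Then
            dict(key) = dict(key) + 1
        Else
            dict.Add key, 1
        End If
    Next
    Set CountWords = dict
End Function
```

Like the HashTable class mentioned earlier, the Dictionary's keys come back unsorted, so a separate sort stage is still needed for ordered output.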
Well, you may be right... What I did was keep the second listbox, split the text values of that listbox into word and number of occurrences, then populated a table in my database, then retrieved the information back into a grid with an SQL query which brought back the most occurrences first, etc. The WHOLE process (including reading the data in the first time, until the grid was fully populated) took only 1.15 seconds.
Of course, now my code is commented out in many, many places, so I have to go back and clean it up... and, once again, fully understand why the populating of the second listbox actually occurred so much more rapidly than my first attempt.
Yup, as suspected....If I sort my array first (as I had list1 sorted first), the time to count the words and populate list2 was reduced from ~ 5 seconds to various instances between 0.007 and 0.99 secs...big difference.
That could explain the speed hit. Not only that, the routines depend on a sorted source list. Without it being sorted, you are not only getting false occurrence counts, but are also updating List2 way too often.
I did some practice with the examples, to optimize them. I see that making big arrays from the start is a worse scenario than growing them in steps. So here is the 7-example program. I conform to the proposal for later sorting with Olaf's super-fast vbRichClient, so it now has the full speed with appropriate sorting afterwards.
The 7th example is the myDoc class (for documents), which has the binary-search routine "on board". So if we have a document (myDoc is also combined with the text viewer, an editor I have in M2000 - look at the code - with a setting to reference the text viewer, which just uses the data from myDoc), I made some functions, such as counting, in ms, the number of words (very fast); we can get all the words in an array or a listbox, or we can count the unique words and, if we want, later use the saved words to fill a listbox or an array.
The binary-search routine counts the words at the same time that it sorts them. It now has comparable speed: the stand-alone routine needs 1.3 seconds to find, among 792367 words, the 13231 unique words and their number of appearances. MS Dictionary with a sorting stage needs 1.118 s; Olaf's routine 0.574 s (very fast, including the second sort), and 1.156 s if we apply the sort and include the time to the end (this is for the MS Dictionary, clicking in the left listbox with the books). (We must hide all other windows to get a good measurement.) Binary search: 1.330 s; HashTable: 1.515 s including the sort routine; the slow routine: 10.500 s (including the sort routine). Finally, the 7th routine (it has the same binary routine but gets the data from the Document class and runs in the space of that class): 1.480 s. So, except for the slow routine, all routines fall between 0.574 s and 1.515 s. The binary-search routine in the Document class is more versatile, because it works for Unicode (and sorts for it). We can pass the LCID, so we can sort for any country (although I have not managed to fit the sub-regions; that is too much for now).
That's all...
(Now I am placing the new Document class myDoc.class into M2000, so I am making new commands to count words and read the number of appearances for any word.)