-
May 29th, 2018, 06:02 PM
#1
Thread Starter
Hyperactive Member
[RESOLVED] Best approach: Sort array after all items added or sort as items are added?
I have an algorithm question.
Currently I'm adding strings (file names) to an array of strings and then sorting the array (using a Quicksort algorithm and StrCompLogical to sort the names like Win Explorer) after all file names have been added to the array. But something I read recently in another post leads me to believe that it might be more performant to actually sort as I go (add each new string into the array at the correct sorted position) so that at any point in time the array of strings is always sorted.
Question 1: Which approach is "better" when dealing with hundreds of thousands of file names? Better can mean faster, less memory intensive, etc...
Question 2: If it's agreed that keeping the array sorted as strings are added to it is "best", does anyone have code for such a routine? I started writing this routine myself but then realized that this must be a common enough procedure that many of you already have classes or modules with this code which you've used and tested already. Why reinvent the wheel? Any of you have some code for this that you're willing to share?
The fastest and least memory consumption solution will be the one I choose assuming that the code isn't so clever that I can't easily understand it and maintain it. Not interested in adding third-party DLLs/TLBs to my project to solve this, if I don't need to. For example, I know that vbRichClient5 has a SortedDictionary class that can do this for me. I'd prefer a VB6 and Win32 API code solution to this.
Thanks in advance!
Last edited by AAraya; May 29th, 2018 at 06:15 PM.
-
May 29th, 2018, 06:50 PM
#2
Re: Best approach: Sort array after all items added or sort as items are added?
Here is a codebank link with various sorting algorithms, including their pros and cons.
In short, you seem to be asking whether and "insertion sort" (sort as you go) is better/worse than a "quick sort" (sort afterwards). Though insertion sort is rated poorly compared to quick sort, in most articles you'll read, each can have their own advantages in specific scenarios. If arrays are used, part of the reason for insertion sort's lower rating is mostly negated via CopyMemory to move entire blocks of items vs moving one item at a time -- main reason for its lower rating. Typically, insertion sort will be rated poorly for large lists.
Anyway, you can try many of those sort routines yourself. Definitely read the pros/cons for each one offered.
Here's another link (at CodeProject site) discussion other sorting algos.
Not mentioned in your scenarios is the need to identify duplicate items. Is that still a requirement as you mentioned in your previous thread (similar topic)?
-
May 29th, 2018, 07:07 PM
#3
Thread Starter
Hyperactive Member
Re: Best approach: Sort array after all items added or sort as items are added?
Hi LaVolpe
Thanks for the response. I came across that sorting algorithms thread earlier. Lots of information in there!
1) Yes, I am asking about the performance of an Insertion sort (sort-as-you-go) vs a "quicksort" (sort-at-the-end)
2) Yes, avoiding duplicate filenames in the list is still a requirement. (Thanks for remembering.)
3) You mentioned using CopyMemory to negate the negative performance hit of moving blocks of items in the array. That is exactly where I saw the bottle neck as well. Can you provide me with sample code to shift a block of string array elements down one position in the array? Right now in my implementation I'm doing this one at a time which is clearly not performant.
Last edited by AAraya; May 29th, 2018 at 07:11 PM.
-
May 29th, 2018, 07:31 PM
#4
Re: Best approach: Sort array after all items added or sort as items are added?
Sure, I can whip something up and not sure how it will compare to other sort algos even enhancing it with CopyMemory and pre-sizing the initial array very large to help prevent 1000s of ReDim Preserve calls for appending new items to the "collection"
-
May 29th, 2018, 07:41 PM
#5
Thread Starter
Hyperactive Member
Re: Best approach: Sort array after all items added or sort as items are added?
Originally Posted by LaVolpe
Sure, I can whip something up and not sure how it will compare to other sort algos even enhancing it with CopyMemory and pre-sizing the initial array very large to help prevent 1000s of ReDim Preserve calls for appending new items to the "collection"
LaVolpe - just to be clear... I'm not asking you to write the insertion sort and binary search algos for me. I'm specifically interested in the portion of your reply where you mentioned using CopyMemory to move blocks of string array elements. Just a few lines to demonstrate that will be great!
-
May 29th, 2018, 08:00 PM
#6
Re: Best approach: Sort array after all items added or sort as items are added?
Easier to copy & paste existing code than to try and explain it in detail. Moving StrPtrs around w/CopyMemory can cause crashes if not done properly
Here's the API declaration I'm using for CopyMemory & form/module level variables
Code:
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByRef Destination As Any, ByRef Source As Any, ByVal length As Long)
Private m_Strings() As String
Private m_StrCount As Long
Here's the InsertionSort routine
Code:
Private Function InsertSort(ByRef NewString As String, ByRef isNew As Boolean) As Long
' MODIFIED BINARY SEARCH ALGORITHM -- Divide and conquer. Used by cFunctionsBMP & cFunctionsTGA
' Binary search algorithms are about the fastest on the planet, but
' its biggest disadvantage is that the array must already be sorted.
' Ex: binary search can find a value among 1 million values between 1 and 20 iterations
' [in] NewString. A value to search for. Order is always ascending
' [out] isNew. If NewString not found, isNew is True else False
' [out] Return value: The Index where NewString was found or where NewString was inserted
Dim UB As Long, LB As Long
Dim newIndex As Long, lResult As Long
If m_StrCount = 0& Then
newIndex = 1&
isNew = True
Else
UB = m_StrCount
LB = 1&
Do Until LB > UB
newIndex = LB + ((UB - LB) \ 2&)
lResult = StrComp(m_Strings(newIndex), NewString)
Select Case lResult
Case 0 ' duplicate
Exit Do
Case Is > 0 ' NewString is lower in sort order
UB = newIndex - 1&
Case Else ' NewString is higher in sort order
LB = newIndex + 1&
End Select
Loop
If LB > UB Then ' NewString was not found
If lResult < 0 Then newIndex = newIndex + 1&
isNew = True
Else
isNew = False
End If
End If
If isNew Then
If m_StrCount = UBound(m_Strings) Then
ReDim Preserve m_Strings(1 To m_StrCount + 101) ' buffer 100 items, make even larger?
End If
m_StrCount = m_StrCount + 1&
If newIndex < m_StrCount Then ' keep our array in ascending order
CopyMemory ByVal VarPtr(m_Strings(newIndex + 1&)), ByVal VarPtr(m_Strings(newIndex)), (m_StrCount - newIndex) * 4&
CopyMemory ByVal VarPtr(m_Strings(newIndex)), 0&, 4& ' clear StrPtr
End If
m_Strings(newIndex) = NewString ' add NewString now
End If
InsertSort = newIndex
End Function
Note. This routine expects a form/module level m_Strings() String array (LBound of 1) and m_StrCount Long variable w/initial value of zero.
Suggestion: If you know these arrays will be large, start with a large array, i.e., ReDim m_Strings(1 to 400000)
Last edited by LaVolpe; May 29th, 2018 at 08:18 PM.
-
May 29th, 2018, 08:13 PM
#7
Thread Starter
Hyperactive Member
Re: Best approach: Sort array after all items added or sort as items are added?
EXACTLY what I was looking for LaVolpe. You've been very helpful to me in many of my questions on this forum. I just want you to know that I really appreciate you.
Art
-
May 29th, 2018, 08:16 PM
#8
Re: Best approach: Sort array after all items added or sort as items are added?
Note: I had a typo in the sample code and edited it
Line was: ReDim Preserve m_Strings(1 To newIndex + 100)
Changed to: ReDim Preserve m_Strings(1 To m_StrCount + 101)
Kinda curious how it pans out in your tests. With such large arrays, I'm not sure buffering 100 new items per array resizing will be efficient enough -- may need to play with that, say 200, 500, etc, but wouldn't get overly greedy on that number.
Edited: If you need to reset in your code, ensure you re-prime the string array with a new ReDim statement, not ReDim Preserve. Also reset m_StrCount to zero.
And just FYI. The function I provided has IsNew parameter. You can make that Optional (keyword) if you'd like. In that sample code, I added the array resizing and purposely ignored duplicates from being added to the array. However, if interested, the adding of items to the array and resizing can be done outside the function for more control/flexibility. The function then becomes just a binary search. And in that case, you'd remove that part of the code for adding/sizing to a separate routine...
Code:
Dim Index As Long, bNew As Boolean
Index = InsertSort(sSomeString, bNew)
' if you just want to know if sSomeString exists in array
If bNew = False Then ' exists
' if you want to add sSomeString to the array (whether bNew is important or not)
' recommend creating a separate routine to append strings to the array, passing it the
' Index value & new string
Private Sub AppendString(Index As Long, NewString As String)
If m_StrCount = UBound(m_Strings) Then
ReDim Preserve m_Strings(1 To m_StrCount + 101) ' buffer 100 items
End If
m_StrCount = m_StrCount + 1&
If Index < m_StrCount Then ' keep our array in ascending order
CopyMemory ByVal VarPtr(m_Strings(Index + 1&)), ByVal VarPtr(m_Strings(Index)), (m_StrCount - Index) * 4&
CopyMemory ByVal VarPtr(m_Strings(Index)), 0&, 4& ' clear
End If
m_Strings(Index) = NewString
End Sub
Last edited by LaVolpe; May 29th, 2018 at 10:47 PM.
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|