Results 1 to 8 of 8

Thread: [RESOLVED] Best approach: Sort array after all items added or sort as items are added?

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Aug 2011
    Location
    Palm Coast, FL
    Posts
    436

    Resolved [RESOLVED] Best approach: Sort array after all items added or sort as items are added?

    I have an algorithm question.

    Currently I'm adding strings (file names) to an array of strings and then sorting the array (using a Quicksort algorithm and StrCompLogical to sort the names like Win Explorer) after all file names have been added to the array. But something I read recently in another post leads me to believe that it might be more performant to actually sort as I go (add each new string into the array at the correct sorted position) so that at any point in time the array of strings is always sorted.

    Question 1: Which approach is "better" when dealing with hundreds of thousands of file names? Better can mean faster, less memory intensive, etc...

    Question 2: If it's agreed that keeping the array sorted as strings are added to it is "best", does anyone have code for such a routine? I started writing this routine myself but then realized that this must be a common enough procedure that many of you already have classes or modules with this code which you've used and tested already. Why reinvent the wheel? Any of you have some code for this that you're willing to share?

    The fastest and least memory consumption solution will be the one I choose assuming that the code isn't so clever that I can't easily understand it and maintain it. Not interested in adding third-party DLLs/TLBs to my project to solve this, if I don't need to. For example, I know that vbRichClient5 has a SortedDictionary class that can do this for me. I'd prefer a VB6 and Win32 API code solution to this.


    Thanks in advance!
    Last edited by AAraya; May 29th, 2018 at 06:15 PM.

  2. #2
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Best approach: Sort array after all items added or sort as items are added?

    Here is a codebank link with various sorting algorithms, including their pros and cons.

    In short, you seem to be asking whether and "insertion sort" (sort as you go) is better/worse than a "quick sort" (sort afterwards). Though insertion sort is rated poorly compared to quick sort, in most articles you'll read, each can have their own advantages in specific scenarios. If arrays are used, part of the reason for insertion sort's lower rating is mostly negated via CopyMemory to move entire blocks of items vs moving one item at a time -- main reason for its lower rating. Typically, insertion sort will be rated poorly for large lists.

    Anyway, you can try many of those sort routines yourself. Definitely read the pros/cons for each one offered.

    Here's another link (at CodeProject site) discussion other sorting algos.

    Not mentioned in your scenarios is the need to identify duplicate items. Is that still a requirement as you mentioned in your previous thread (similar topic)?
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  3. #3

    Thread Starter
    Hyperactive Member
    Join Date
    Aug 2011
    Location
    Palm Coast, FL
    Posts
    436

    Re: Best approach: Sort array after all items added or sort as items are added?

    Hi LaVolpe

    Thanks for the response. I came across that sorting algorithms thread earlier. Lots of information in there!

    1) Yes, I am asking about the performance of an Insertion sort (sort-as-you-go) vs a "quicksort" (sort-at-the-end)

    2) Yes, avoiding duplicate filenames in the list is still a requirement. (Thanks for remembering.)

    3) You mentioned using CopyMemory to negate the negative performance hit of moving blocks of items in the array. That is exactly where I saw the bottle neck as well. Can you provide me with sample code to shift a block of string array elements down one position in the array? Right now in my implementation I'm doing this one at a time which is clearly not performant.
    Last edited by AAraya; May 29th, 2018 at 07:11 PM.

  4. #4
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Best approach: Sort array after all items added or sort as items are added?

    Sure, I can whip something up and not sure how it will compare to other sort algos even enhancing it with CopyMemory and pre-sizing the initial array very large to help prevent 1000s of ReDim Preserve calls for appending new items to the "collection"
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  5. #5

    Thread Starter
    Hyperactive Member
    Join Date
    Aug 2011
    Location
    Palm Coast, FL
    Posts
    436

    Re: Best approach: Sort array after all items added or sort as items are added?

    Quote Originally Posted by LaVolpe View Post
    Sure, I can whip something up and not sure how it will compare to other sort algos even enhancing it with CopyMemory and pre-sizing the initial array very large to help prevent 1000s of ReDim Preserve calls for appending new items to the "collection"
    LaVolpe - just to be clear... I'm not asking you to write the insertion sort and binary search algos for me. I'm specifically interested in the portion of your reply where you mentioned using CopyMemory to move blocks of string array elements. Just a few lines to demonstrate that will be great!

  6. #6
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Best approach: Sort array after all items added or sort as items are added?

    Easier to copy & paste existing code than to try and explain it in detail. Moving StrPtrs around w/CopyMemory can cause crashes if not done properly

    Here's the API declaration I'm using for CopyMemory & form/module level variables
    Code:
    Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByRef Destination As Any, ByRef Source As Any, ByVal length As Long)
    
    Private m_Strings() As String
    Private m_StrCount As Long
    Here's the InsertionSort routine
    Code:
    Private Function InsertSort(ByRef NewString As String, ByRef isNew As Boolean) As Long
    
        ' MODIFIED BINARY SEARCH ALGORITHM -- Divide and conquer. Used by cFunctionsBMP & cFunctionsTGA
        ' Binary search algorithms are about the fastest on the planet, but
        ' its biggest disadvantage is that the array must already be sorted.
        ' Ex: binary search can find a value among 1 million values between 1 and 20 iterations
        
        ' [in] NewString. A value to search for. Order is always ascending
        ' [out] isNew. If NewString not found, isNew is True else False
        ' [out] Return value: The Index where NewString was found or where NewString was inserted
    
        Dim UB As Long, LB As Long
        Dim newIndex As Long, lResult As Long
        
        If m_StrCount = 0& Then
            newIndex = 1&
            isNew = True
        
        Else
            UB = m_StrCount
            LB = 1&
            
            Do Until LB > UB
                newIndex = LB + ((UB - LB) \ 2&)
                lResult = StrComp(m_Strings(newIndex), NewString)
                Select Case lResult
                Case 0          ' duplicate
                    Exit Do
                Case Is > 0     ' NewString is lower in sort order
                    UB = newIndex - 1&
                Case Else       ' NewString is higher in sort order
                    LB = newIndex + 1&
                End Select
            Loop
        
            If LB > UB Then  ' NewString was not found
                If lResult < 0 Then newIndex = newIndex + 1&
                isNew = True
            Else
                isNew = False
            End If
        End If
        
        If isNew Then
            If m_StrCount = UBound(m_Strings) Then
                ReDim Preserve m_Strings(1 To m_StrCount + 101) ' buffer 100 items, make even larger?
            End If
            m_StrCount = m_StrCount + 1&
            If newIndex < m_StrCount Then  ' keep our array in ascending order
                CopyMemory ByVal VarPtr(m_Strings(newIndex + 1&)), ByVal VarPtr(m_Strings(newIndex)), (m_StrCount - newIndex) * 4&
                CopyMemory ByVal VarPtr(m_Strings(newIndex)), 0&, 4& ' clear StrPtr
            End If
            m_Strings(newIndex) = NewString ' add NewString now
            
        End If
        InsertSort = newIndex
    End Function
    Note. This routine expects a form/module level m_Strings() String array (LBound of 1) and m_StrCount Long variable w/initial value of zero.

    Suggestion: If you know these arrays will be large, start with a large array, i.e., ReDim m_Strings(1 to 400000)
    Last edited by LaVolpe; May 29th, 2018 at 08:18 PM.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  7. #7

    Thread Starter
    Hyperactive Member
    Join Date
    Aug 2011
    Location
    Palm Coast, FL
    Posts
    436

    Re: Best approach: Sort array after all items added or sort as items are added?

    EXACTLY what I was looking for LaVolpe. You've been very helpful to me in many of my questions on this forum. I just want you to know that I really appreciate you.


    Art

  8. #8
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Best approach: Sort array after all items added or sort as items are added?

    Note: I had a typo in the sample code and edited it
    Line was: ReDim Preserve m_Strings(1 To newIndex + 100)
    Changed to: ReDim Preserve m_Strings(1 To m_StrCount + 101)

    Kinda curious how it pans out in your tests. With such large arrays, I'm not sure buffering 100 new items per array resizing will be efficient enough -- may need to play with that, say 200, 500, etc, but wouldn't get overly greedy on that number.

    Edited: If you need to reset in your code, ensure you re-prime the string array with a new ReDim statement, not ReDim Preserve. Also reset m_StrCount to zero.

    And just FYI. The function I provided has IsNew parameter. You can make that Optional (keyword) if you'd like. In that sample code, I added the array resizing and purposely ignored duplicates from being added to the array. However, if interested, the adding of items to the array and resizing can be done outside the function for more control/flexibility. The function then becomes just a binary search. And in that case, you'd remove that part of the code for adding/sizing to a separate routine...
    Code:
        Dim Index As Long, bNew As Boolean
    
        Index = InsertSort(sSomeString, bNew)
        ' if you just want to know if sSomeString exists in array
        If bNew = False Then ' exists
    
        ' if you want to add sSomeString to the array (whether bNew is important or not)
        ' recommend creating a separate routine to append strings to the array, passing it the 
        ' Index value & new string
    
    Private Sub AppendString(Index As Long, NewString As String)
        If m_StrCount = UBound(m_Strings) Then
            ReDim Preserve m_Strings(1 To m_StrCount + 101) ' buffer 100 items
        End If
        m_StrCount = m_StrCount + 1&
        If Index < m_StrCount Then  ' keep our array in ascending order
            CopyMemory ByVal VarPtr(m_Strings(Index + 1&)), ByVal VarPtr(m_Strings(Index)), (m_StrCount - Index) * 4&
            CopyMemory ByVal VarPtr(m_Strings(Index)), 0&, 4& ' clear 
        End If
        m_Strings(Index) = NewString
    End Sub
    Last edited by LaVolpe; May 29th, 2018 at 10:47 PM.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width