Results 1 to 18 of 18

Thread: [RESOLVED] Quicksort issue

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Resolved [RESOLVED] Quicksort issue

    http://www.vbforums.com/showthread.php?t=231925 is the quicksort implementation I am using, and some might say I should be posting this in there...I've modified it slightly so that it can work with numerical data and can sort multiple arrays based on the primary array (hard coded into it, as I only use it for this one thing). I also plan to incorporate indexed sorting so that it actually moves a pointer to the data rather than having to move all the data every time...this will speed things up, as it will only move 5-7 array entries when it is finished sorting...no need to mention it, I've done it before and will do it again in this app eventually :-)

    However, here is my problem...I have a function before it that checks to see if an array is perfectly sorted, and if it isn't it will call the quicksort function. That quicksort function then calls itself time after time. It's always worked fine for me, despite nesting within itself deep like this. Those who know what I am working on will know I am working with millions of lines of data (the files can be up to 6.5GB, but the data I am actually sorting is a smaller subset of that (it takes ~1 second to run a sort on 1.6m lines of data, though most of that data IS in order...essentially, the more out of order the data is the longer it takes). One of my blocks of data is far (or so it wants to pretend) more out of order than the rest. Most of the nesting (where the program calls itself) goes as deep as around 20 layers, but this specific file has gone up to and past 3500 layers at which point it says "out of stack space" and bugs out.

    To combat this, I put in a bit of code that counts the number of nests, and if it goes above a certain amount (I have found the golden number to be ~500-1000) it is to un-nest all the way back and complete the sorting failed. I then have it set to loop if failed, in which case it of course checks to see if the array is sorted (that bit of code takes ms to run, it isn't an issue) and repeats the process. The code is set up to output to me how the data scores (how out of order it is) and it looks like the file becomes more out of order with each iteration up to a point then starts to recover and bring the number back down again...it'll bounce up and down like this for a little while, eventually taking 7+ minutes to fully sort the array.

    So, does anyone have alternative suggestions for sorting algorithms (preferably with a link to code) I could use to perhaps "pre-sort" the data loosely before sending it to QuickSort? QuickSort is fast when it wants to work, but not when it doesn't. I'm not sure exactly what it is that is causing it to bug out like this, I don't understand the QS sort properly myself, perhaps someone else has an idea of what is causing QuickSort to become SlowSort (apart from the fact that I am sending ~4m badly sorted lines of data to it)?

  2. #2
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,941

    Re: Quicksort issue

    One of the terms you're looking for is "recursion" (a procedure calling itself).

    Here's a quicksort I've used for decades. Its origins go back to some PDS Basic software I developed back in the 1980s. And it actually doesn't use recursion.

    Code:
    
    Public Sub QuickSortStrings(ToBeSorted() As String)
        ' Lbound and Ubound can be anything.
        ' No check for whether ToBeSorted() is dimensioned.
        '
        ' To change the type being sorted, change the incoming type and also the two following.
        ' When this is done, the procedure name should also be appropriately renamed.
        ' UDTs or anything could be used.  However, if UDTs are used, the "Do While" comparison will need to be tweaked.
        Dim TmpCompare As String
        Dim TmpSwap As String
        '
        Dim iJumps() As Long
        Dim iMkr As Long
        Dim iLp1 As Long
        Dim iLp2 As Long
        Dim iTmp As Long
        Dim iJmp As Long
        Dim iCnt As Long
        Dim iLbd As Long
        '
        iCnt = UBound(ToBeSorted) - LBound(ToBeSorted) + 1&
        iLbd = LBound(ToBeSorted)
        If iCnt < 2& Then Exit Sub
        ReDim iJumps(0& To 99&)
        '
        iJumps(0&) = 1&
        iJumps(1&) = 4&
        iJumps(2&) = 13&
        iMkr = 0&
        Do While iJumps(iMkr + 2&) < iCnt
            iMkr = iMkr + 1&
            If iMkr + 2& > UBound(iJumps) Then ReDim Preserve iJumps(LBound(iJumps) To UBound(iJumps) + 100&)
            iJumps(iMkr + 2&) = 3& * iJumps(iMkr + 1&) + 1&
        Loop
        '
        For iLp1 = iMkr To 0& Step -1&
            iJmp = iJumps(iLp1)
            For iLp2 = iJmp To iCnt - 1&
                iTmp = iLp2 - iJmp
                TmpCompare = ToBeSorted(iLp2 + iLbd)
                '
                ' Tweak the following if an element in a UDT is to be sorted.
                Do While TmpCompare < ToBeSorted(iTmp + iLbd)
                    TmpSwap = ToBeSorted(iTmp + iJmp + iLbd)
                    ToBeSorted(iTmp + iJmp + iLbd) = ToBeSorted(iTmp + iLbd)
                    ToBeSorted(iTmp + iLbd) = TmpSwap
                    iTmp = iTmp - iJmp
                    If iTmp < 0& Then Exit Do
                Loop
                ToBeSorted(iTmp + iJmp + iLbd) = TmpCompare
                Next
          Next
    End Sub
    
    
    Truth be told though, I don't use it much these days. If I need to sort something, I tend to do one of the following: 1) cram them all into a hidden ListBox, 2) cram them all into a Collection, just using the key to sort. That second method has the problem of retrieving the keys while iterating, but maybe the Dictionary object can solve that problem. There are ways to get the keys of a Collection though.

    -------------

    You may want to run some timings between that QuickSortStrings and CVMichael's QuickSort, which I haven't done, and have no idea how they'd work out.
    Last edited by Elroy; Feb 29th, 2024 at 12:42 PM.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  3. #3

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    Quote Originally Posted by Elroy View Post
    One of the terms you're looking for is "recursion" (a procedure calling itself).

    Here's a quicksort I've used for decades. Its origins go back to some PDS Basic software I developed back in the 1980s. And it actually doesn't use recursion.

    (code snipped)

    Truth be told though, I don't use it much these days. If I need to sort something, I tend to do one of the following: 1) cram them all into a hidden ListBox, 2) cram them all into a Collection, just using the key to sort. That second method has the problem of retrieving the keys while iterating, but maybe the Dictionary object can solve that problem. There are ways to get the keys of a Collection though.

    -------------

    You may want to run some timings between that QuickSortStrings and CVMichael's QuickSort, which I haven't done, and have no idea how they'd work out.
    Thanks for that, I'll test it out (making the few little changes I need to use it with Longs and also using an already created (for QuickSort) function which does the swap of all the different arrays for me...as I mentioned above, I plan to actually do an indexed sort in future (so the index changes but the values in all the arrays I am working with don't...and when I'm done I run through those arrays performing those changes), and working with an external function to handle the swapping would make that easier to implement later. An indexed sort will be far quicker again, but I am just trying to sanitise the incoming data and that starts by making sure it's ordered correctly by time (weirdly, 19 out of the ~20 files I am working with are in need of sorting).

    I'll let you know how it performs later...if it's better than the quicksort I am using, I'll use it instead...if it isn't, I'll put it in as a backup for if the current quicksort errors out (hits 500 recursions). QuickSort has always been my go-to as it's a simple and USUALLY fast method for sorting data that doesn't use extra memory...I'd use Radix if it wasn't such a memory hog...always wanted to use Radix in a project :-)

    And I don't think a listbox would like me if I were to cram 4m values into it...especially wouldn't work as I am actually sorting multiple variables from a target variable's value, not just that first one :-)

    Edit: It seems comparable in speed for the previous files...slightly faster by milliseconds...but SOMETIMES it performs drastically worse, so it may be a decent backup contender for if the other QuickSort bugs out...it manages to sort THAT bad one in under 7 seconds!

    Edit 2: If I use the previous QS and fallback to ElroyQS, it manages all the sorting in 56s (far better than without ElroyQS, where it was hitting 7 minutes)...but if I ignore QS and use ElroyQS as the main, it manages 30-40s...I really need to properly benchmark them side by side sometime. ElroyQS is definitely marginally better and doesn't need to be a backup contender.
    Last edited by SmUX2k; Feb 29th, 2024 at 02:09 PM.

  4. #4
    Addicted Member
    Join Date
    Jul 2021
    Posts
    200

    Re: Quicksort issue

    I recommend using MergeSort.
    It is as fast as QuickSort, and you can use is with NO RECURSION (Bottom-Up) or with recursion (Top-Down) that GUARANTEED to be very low in depth.
    It will always work the same speed, no matter what "level of disorder" the array is.
    The only downside of MergeSort - it requires an auxiliary array to "merge into".
    Here is an example of MergeSort of Long array:

    Code:
    Sub MergeSort(arr() As Long, aux() As Long, ByVal low As Long, ByVal high As Long)
    Dim m As Long, i As Long, a As Long, b As Long
    If high <= low Then Exit Sub
    m = (low + high) \ 2
    MergeSort arr, aux, low, m
    MergeSort arr, aux, m + 1, high
    i = low - 1
    a = low
    b = m + 1
    Do
        i = i + 1
        If arr(a) <= arr(b) Then
            aux(i) = arr(a): a = a + 1: If a > m Then Exit Do
        Else
            aux(i) = arr(b): b = b + 1: If b > high Then Exit Do
        End If
    Loop
    For a = a To m: i = i + 1: aux(i) = arr(a): Next
    For b = b To high: i = i + 1: aux(i) = arr(b): Next
    For i = low To high: arr(i) = aux(i): Next
    End Sub
    
    Sub test()
    Dim i As Long
    Const SZ As Long = 1000000
    Randomize Timer
    Dim a(1 To SZ) As Long, b(1 To SZ) As Long
    For i = 1 To SZ
        a(i) = Int(Rnd * SZ)
    Next
    MergeSort a, b, 1, SZ
    End Sub

  5. #5

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    I've built a sorting benchmarker into the app now...DryBone, I'll add yours at a later date (to test as well) as it is designed for a single array swap (I'm working with multiple arrays that need swapping)...I'll NEED to write the indexed sort to the app to be able to properly use your sorting method properly (or have 5 extra arrays rather than 1 for the MergeSort, which I could probably do...perhaps tomorrow). Also, you mention with or without recursion but don't say how to set it to be bottom-up or top-down (I assume it's to do with low and high, but no explanation there :-) )

    The benchmarker loads perfectly sorted data from the DB and forces in some errors (swaps data points randomly). It is loading 20 files to process (a month's worth) and the timings are JUST for the sorting, nothing else.

    While working on this benchmarker, I came up with the idea that there might be a bug in the QS implementation I was using where it might request a swap between the SAME data point (where the low and high value are exactly the same), and it turns out that I was right. If you go to the link for the QS code you will find line 53 is "If Low <= High Then" and directly after that (bear in mind that Low can equal High in this example) it performs the swap. I disabled swapping if Low = High and it sped things up a little. You would be surprised at how many times this would occur where Low=High when calling for a swap, and moving data around from variable to variable adds up if there's enough of them.

    Elroy...this benchmarker stress-tests your algorithm, and it doesn't fare well.

    In my first test (without the Low/High fix on the QS I was using before) with a large amount of out of order data (about 50% of them aren't in order) these were the results:
    107.27s CVMichael Quicksort (adapted)
    171.16s CVMichael Quicksort (adapted) with ElroyQS fallback on first failure
    731.39s ElroyQS

    When I used less out of order data your algorithm still performed worse but only marginally (double the time).

    I have other things running on the PC, which might be messing with results a little, and they are generally quieter in the daytime. I suspect it kicked in around the middle of this test, as the fallback result is usually about the same as the normal QS with this fake data that doesn't seem to trigger the fallback.

    I'll run this again tomorrow when I wake up, and will try to create some real-world data (taken from my process of creating the data when processing it into the database, I'll output it to a file using binary like suggested elsewhere, so it can be loaded into an array the same way :-) ) rather than using this fake unsorted data, as perhaps this has something to do with it. It should still perform better with my fake unsorted if it performs better with real data with issues...I'll set up the benchmarker so it can toggle the QS fix on (and perhaps add a counter to see how many times it performs the null swap) and off AND can choose fake or real data to test with and try all the different variations.

    Edit: I put in a check to see if the fallback was called for any of the results, and it doesn't perform the test with fallback if it doesn't (speeds the process up). I also set it to only have 20 values in the wrong order.

    56.01s CVMichael Quicksort (adapted)
    47.64s ElroyQS

    That's me done for the night
    Last edited by SmUX2k; Feb 29th, 2024 at 07:50 PM.

  6. #6
    Addicted Member
    Join Date
    Jul 2021
    Posts
    200

    Re: Quicksort issue

    Pardon me for the confusion...
    The above implementation is the recursive (top-down) version.
    The non-recursive version is slightly more involved and less elegant, and I don't see a reason to use it. It has no significant advantage over the recursive one. But I might as well post it if you find it useful (and I have the peace of mind...)
    Sorting by indexes will make MergeSort even more efficient...

  7. #7
    Fanatic Member 2kaud's Avatar
    Join Date
    May 2014
    Location
    England
    Posts
    1,000

    Re: Quicksort issue

    With Quick Sort you need to choose a 'viable' pivot. In the worst-case scenario, which occurs when the smallest or largest element is chosen as the pivot, Quick Sort shows a time complexity of O(n^2) which is undesirable for larger arrays. If a Quick Sort is sometimes performing very poorly, it could be that the chosen pivot value is not right.
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  8. #8
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,444

    Re: Quicksort issue

    Just looking at the requirement gives me a headache.
    I'd probably use a SQLite InMemory-Database (or even filebased, if it's a reocurring theme), throw everything in there, and do a
    SELECT * FROM MyTable ORDER BY MyCriteria
    and be done with it

    No need for Quicksort, Mergesort, Right sort, wrong sort, Recursion, no recursion, juggling pointers around, out-of-stack-space or whatever
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  9. #9
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: Quicksort issue

    Quote Originally Posted by Zvoni View Post
    Just looking at the requirement gives me a headache.
    I'd probably use a SQLite InMemory-Database (or even filebased, if it's a reocurring theme), throw everything in there, and do a
    SELECT * FROM MyTable ORDER BY MyCriteria
    and be done with it

    No need for Quicksort, Mergesort, Right sort, wrong sort, Recursion, no recursion, juggling pointers around, out-of-stack-space or whatever
    +1

    And I would "import" (from the text-parsing) directly into "normal, typed fields" in such an SQLite-DB-File.
    There'd be no need for "Arrays" or "Array-loading" anymore, once that is done.
    And even a large SQLite-DB (>2GB) will open in under 30msec, being ready for Rs-returning queries from then on...

    Olaf

  10. #10

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    Quote Originally Posted by 2kaud View Post
    With Quick Sort you need to choose a 'viable' pivot. In the worst-case scenario, which occurs when the smallest or largest element is chosen as the pivot, Quick Sort shows a time complexity of O(n^2) which is undesirable for larger arrays. If a Quick Sort is sometimes performing very poorly, it could be that the chosen pivot value is not right.
    I think the pivot point is automatically mid-range in the file. You could be right there though, I should do a pre-check to see what the pivot is in relation to the average of the data and change pivot point if it is wildly out...I'm already doing a pre-check to see if the data is in order (or how out of order it is) so I should be able to calculate an average.

    Quote Originally Posted by Zvoni View Post
    Just looking at the requirement gives me a headache.
    I'd probably use a SQLite InMemory-Database (or even filebased, if it's a reocurring theme), throw everything in there, and do a
    SELECT * FROM MyTable ORDER BY MyCriteria
    and be done with it
    It's a possibility, and I already have a ram drive (installed it yesterday for free...was almost no hassle at all) giving me 4GB of memory-based swap space (I have 32GB so can probably afford more) that I can use within my app for temporary files. Rather than an in-memory database, I could have a normal database that just happens to be in-memory because it's stored on a ram drive, and I don't have to change anything because what I have set up will let me point to a specific database before performing work on it...I should have thought of this myself, to be honest...I'll code it for my specific needs and call it "SQLSort", and include its results in benchmarking :-)

    Quote Originally Posted by Schmidt View Post
    +1

    And I would "import" (from the text-parsing) directly into "normal, typed fields" in such an SQLite-DB-File.
    There'd be no need for "Arrays" or "Array-loading" anymore, once that is done.
    And even a large SQLite-DB (>2GB) will open in under 30msec, being ready for Rs-returning queries from then on...
    Store the data from the text parsing directly in the SQL database as text? Did you miss the "I've got hundreds of gigabytes of compressed stock tick data" post? Just the tick data alone will fill up a 4TB drive easily.

    Yes to storing it in SQL directly as I go (to a temporary file on ram disk, as I mentioned to Zvoni), but when I'm done processing a file I pull out the data into arrays to be compressed and re-stored permanently. The bonus here is that I can select by specific columns.

    You might be glad to know I was planning on redoing the ENTIRE text parsing section of code...it's not that it doesn't work, it's that I wanted to create something that was able to work with different kinds of incoming data (raw tick data which has 7 columns, or OHLC data which has 8...and they have different column headings)...because of that, I can probably try these different suggestions and see what is best.

    My problem is that I know almost nothing about SQL (both within VB and generally), though I am slowly getting there with how to do things. I've worked out .ExecCmd, how to use rs.recordset, things like that...I'm probably doing things the entirely wrong way, but as long as it works that's all that matters.

  11. #11
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,444

    Re: Quicksort issue

    Quote Originally Posted by SmUX2k View Post
    My problem is that I know almost nothing about SQL (both within VB and generally), though I am slowly getting there with how to do things. I've worked out .ExecCmd, how to use rs.recordset, things like that...I'm probably doing things the entirely wrong way, but as long as it works that's all that matters.
    When (not if *gg*) you come to that point, give a shout.

    SQL is daily business for me.
    Just looking at the last "monster" i had to write for the company i work for:

    Source: several million rows across some 20 Base-tables
    Output: some 40 columns

    Query:
    270 lines of SQL-Code (Widescreen, FontSize 12)
    27 CTE's
    Some 50 JOIN's (too lazy to count them)
    Some 100 CASE WHEN THEN ELSE END calculations
    Some 50 COALESCE and IS NULL-Checks
    A smattering of ROW_NUMBER and LAG-Function-Calls

    Execution-Time= some 30 seconds for a result of some 5K result-rows
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  12. #12
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,941

    Re: Quicksort issue

    Now the idea of memory-based recordsets to accomplish this is interesting. You probably still wouldn't be able to match the speeds of some of the QuickSort approaches, but you'd certainly have a great deal more flexibility (additional sorts, pulling out subsets of records, etc).

    -----------

    Ohhh, and if we're just sorting a string array, "smartening up" some of the QuickSort approaches to just swap the string pointers, rather than the whole strings, could probably substantially increase the speed of some of those approaches. If I find some time, I might smarten up mine to do that.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  13. #13

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    Quote Originally Posted by Zvoni View Post
    When (not if *gg*) you come to that point, give a shout.

    SQL is daily business for me.
    You really don't want to offer that service to me :-P

    I get by with google, I know the basics of what I want to do and I ask google how to do it with SQLite...that helps with the SQL statement side of things...it's more RC6 and how it handles SQL. To give you an example of how I do things:
    Code:
    Public Function GetRecords(re() As String, Query As String)
    Dim dt As String
    Set rs = Cnn.OpenRecordset(Query)
    ReDim re(0)
    If rs.RecordCount > 0 Then
        ReDim re(rs.RecordCount)
        f = 0
        Do
        dt = rs.Fields(0)
        f = f + 1
        re(f) = dt
        rs.MoveNext
        Loop While Not rs.EOF
        End If
    End Function
    I understand the very VERY basics of recordsets, and this bit of code will take in a query (usually a single column query) and store the results in a string array...whenever I need a specific column of data I will use getrecords(). Obviously it will have opened the database previously.

    My code for getting a BLOB is:
    Code:
    Public Function GetRecord(re() As Byte, Query As String) ', rec As Double)
    Dim dt As String
    
    Set rs = Cnn.OpenRecordset(Query)
    ReDim re(0)
    If rs.RecordCount = 1 Then re() = rs.Fields("File").Value
    If rs.RecordCount > 1 Then MsgBox "Duplicate file in database!"
    If rs.RecordCount < 1 Then MsgBox "File not found, when getting blob"
    addlog "Query complete"
    End Function
    Query includes the "FileName" (there's two columns, FileName which is essentially the data and type of file, and File which is the BLOB) and details of which table it is from (each stock has its own table...one database, but many tables) so it essentially pulls one file (there is potential for there to be multiples of the same file, and I know I can set the filename column as unique so no two exact filenames exist but it would probably cause me more issues...better to delete if the file exists, it is supposed to but I didn't finish working on it).

    As I mentioned the query, this is the actual "PullFile" command:
    Code:
    Public Function PullFile(db As String, table As String, fna As String, FN() As Byte)
    Set Cnn = New_c.Connection
    
    Cnn.OpenDB db
    GetRecord FN(), "SELECT File FROM " & table & " WHERE Filename = '" & fna & "'"
    
    End Function
    As you can tell, THIS is where the DB is set. All this code works, but I'm still not fully sure why with some of it. As it works, doesn't matter if there are other ways to do it, but there might be more efficient ways. There's plenty of times that I have thought to myself that if I had just worked on my own storage system I might have less hassle to work through, but I DO prefer the SQL DB for storing the BLOBs. I don't go too complicated that I need things like JOINs or logical statements like CASE WHEN THEN ELSE, so can handle the SQL query logic fairly easily myself (though had a few rabbitholes where things weren't working like expected...got through those). I just wish (and we've heard it 1000s of times) there was a tutorial for using SQLite in RC6...something that explained exactly what a command does, and how it is used :-)

  14. #14

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    Quote Originally Posted by Elroy View Post
    Now the idea of memory-based recordsets to accomplish this is interesting. You probably still wouldn't be able to match the speeds of some of the QuickSort approaches, but you'd certainly have a great deal more flexibility (additional sorts, pulling out subsets of records, etc).
    Worth trying either way, as I already have SQL built in as part of RC6 and other code I am using...I just hope it isn't a rabbithole that keeps me busy for hours or days trying to get it working right :-)

    Quote Originally Posted by Elroy View Post
    Ohhh, and if we're just sorting a string array, "smartening up" some of the QuickSort approaches to just swap the string pointers, rather than the whole strings, could probably substantially increase the speed of some of those approaches. If I find some time, I might smarten up mine to do that.
    Indexed sorting, as I mentioned previously...you store an index to which array element is actually being sorted, and when it does the sort it actually swaps the index rather than the source data. At the end, you parse through the index array and do the swaps. Let's say you start at arr(1) and it says 154 is its true value. Swap 1 and 154 and then you know that the data in 1 is correct (be sure to also swap the index array)...154 might not be, but doesn't matter. Iterate through the numbers, passing any whose value is lower than its index (you've already swapped them).

    Useful either for long strings (as you mention) or multidimensional arrays where you have a column you want to sort but also have associated entries that need to move with them.

    I do plan to try it within the app.

  15. #15
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,941

    Re: Quicksort issue

    Here's my String QuickSort modified to swap string pointers, rather than the actual strings. This should substantially speed up the algorithm. Note that now, it'll only work with strings. Use the above version if you wish to modify it for Longs or some other array type.

    I suspect you could do something similar to CVMichael's algorithm, but I'll leave that to others.

    Code:
    
    Option Explicit
    '
    Private Declare Function GetMem4 Lib "msvbvm60.dll" (ByRef Source As Any, ByRef Dest As Any) As Long
    '
    
    Public Sub QuickSortStrings(ToBeSorted() As String)
        ' Lbound and Ubound can be anything.
        ' No check for whether ToBeSorted() is dimensioned.
        '
        ' To use anything other than strings, this one wouldn't be easy to change!
        Dim TmpCompare As String
        Dim TmpSwap As Long
        '
        Dim iJumps() As Long
        Dim iMkr As Long
        Dim iLp1 As Long
        Dim iLp2 As Long
        Dim iTmp As Long
        Dim iJmp As Long
        Dim iCnt As Long
        Dim iLbd As Long
        '
        iCnt = UBound(ToBeSorted) - LBound(ToBeSorted) + 1&
        iLbd = LBound(ToBeSorted)
        If iCnt < 2& Then Exit Sub
        ReDim iJumps(0& To 99&)
        '
        iJumps(0&) = 1&
        iJumps(1&) = 4&
        iJumps(2&) = 13&
        iMkr = 0&
        Do While iJumps(iMkr + 2&) < iCnt
            iMkr = iMkr + 1&
            If iMkr + 2& > UBound(iJumps) Then ReDim Preserve iJumps(LBound(iJumps) To UBound(iJumps) + 100&)
            iJumps(iMkr + 2&) = 3& * iJumps(iMkr + 1&) + 1&
        Loop
        '
        For iLp1 = iMkr To 0& Step -1&
            iJmp = iJumps(iLp1)
            For iLp2 = iJmp To iCnt - 1&
                iTmp = iLp2 - iJmp
                GetMem4 ByVal VarPtr(ToBeSorted(iLp2 + iLbd)), ByVal VarPtr(TmpCompare) ' Temporary aliasing.
                Do While TmpCompare < ToBeSorted(iTmp + iLbd)
                    GetMem4 ByVal VarPtr(ToBeSorted(iTmp + iJmp + iLbd)), TmpSwap
                    GetMem4 ByVal VarPtr(ToBeSorted(iTmp + iLbd)), ByVal VarPtr(ToBeSorted(iTmp + iJmp + iLbd))
                    GetMem4 TmpSwap, ByVal VarPtr(ToBeSorted(iTmp + iLbd))
                    iTmp = iTmp - iJmp
                    If iTmp < 0& Then Exit Do
                Loop
                GetMem4 ByVal VarPtr(TmpCompare), ByVal VarPtr(ToBeSorted(iTmp + iJmp + iLbd))
            Next
        Next
        TmpSwap = 0&
        GetMem4 TmpSwap, ByVal VarPtr(TmpCompare) ' Clean-up aliasing.
    End Sub
    
    Private Sub Form_Load()
        Dim i As Long
    
        Dim sa() As String
        ReDim sa(99)
        Randomize
        For i = LBound(sa) To UBound(sa)
            sa(i) = Right$("000" & Hex$(CLng(Rnd * &HFFFF&)), 4&)
        Next
    
        QuickSortStrings sa
    
        For i = LBound(sa) To UBound(sa)
            Debug.Print sa(i)
        Next
    
    End Sub
    
    
    
    
    There's now no "string shuffling" at all, which should help things considerably.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  16. #16

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    Quote Originally Posted by Elroy View Post
    Here's my String QuickSort modified to swap string pointers, rather than the actual strings. This should substantially speed up the algorithm. Note that now, it'll only work with strings. Use the above version if you wish to modify it for Longs or some other array type.
    Ah, I see what you did there...rather than switching the contents of the boxes, you're switching the labels...A pity I still can't rate your posts (I need to share it around, apparently) as there's been a few helpful ones I would have rated :-)

    Still not more efficient for multidimensional arrays (where you have to swap each array element) and the index route will be far quicker in that case if you indexed the single dimension index array and used your pointer swap to perform the actual pointer swapping based on the index array values...an indexed array would probably be no better than your code, and would STILL have to perform swaps at the end which will add time.

  17. #17
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,941

    Re: Quicksort issue

    Quote Originally Posted by SmUX2k View Post
    Ah, I see what you did there...rather than switching the contents of the boxes, you're switching the labels...
    I'm swapping the "pointers to the strings" rather than the "strings themselves". This keeps string deallocation/reallocation completely out of the process, and BSTR allocation/deallocation isn't terribly swift, but moving some Long pointers around is lightening fast.

    But yeah, if you're swapping an entire UDT, it's not so straightforward to swap the pointers, although not impossible. One way would be to create a TypeLib for your UDT, then make an array of objects, with the objects being your UDTs. Then, you could swap your object pointers (almost exactly like I swapped the string pointers).

    ---------------

    Also, just FYI, it doesn't matter to me whether you "rate my post" or not. My rating level has been maxed out for years.
    Last edited by Elroy; Mar 1st, 2024 at 02:03 PM.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  18. #18

    Thread Starter
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    521

    Re: Quicksort issue

    To update on this a little, I have decided to put the sorting quandary on the backburner as it (and other rabbitholes) is holding me back from persevering through the project...for now I will use ElroyQS as it seems more stable, but I will keep an eye on how long the sorting takes to complete AND I will implement the indexed sorting rather than directly sorting the 5 elements (meaning 10 movements of 4 bytes each time...as theres so many swaps done, it adds up). If I feel like a break from the main progress path, I'll maybe venture back into it and see what I can do...and haven't forgotten about the SQLSort, it'll be the first thing I work on if I want a distraction!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width