Results 1 to 10 of 10

Thread: Remove Dups from array

  1. #1

    Thread Starter
    Hyperactive Member Maven's Avatar
    Join Date
    Feb 2003
    Location
    Greeneville, TN
    Posts
    322

    Remove Dups from array

    It's another late night and I'm board so I've played with routines to get rid of duplicates in an array. Of course I did this in ASM =P

    I compared it to a visual basic algorithm done by plenderj. You can view it here: http://vbforums.com/showthread.php?t=157498

    My version is 162.5 times faster. (I predict a post from Merri telling me how unfair this is lol)

    I've included a source and the dll.

    Code:
    Example Functions:
    ~~~~~~~~~~~~
    
    'Any 1 Byte Variable:
    Private Declare Sub RemoveDuplicatesB Lib "arrfunctions.dll" (ByRef arr() As Byte)
    
    'Any 2 byte variable
    Private Declare Sub RemoveDuplicatesWD Lib "arrfunctions.dll" (ByRef arr() As Integer)
    
    'Any 4 byte variable
    Private Declare Sub RemoveDuplicatesDD Lib "arrfunctions.dll" (ByRef arr() As Long)
    
    'Any 8 byte variable
    Private Declare Sub RemoveDuplicatesQW Lib "arrfunctions.dll" (ByRef arr() As Double)
    Byte 1 byte 1 byte
    Integer 2 bytes 2 bytes
    Long 4 bytes 4 bytes
    Single 4 bytes 4 bytes
    Double 4 bytes 8 bytes
    Date 4 bytes 8 bytes
    Attached Files Attached Files
    Education is an admirable thing, but it is well to remember from time to time that nothing that is worth knowing can be taught. - Oscar Wilde

  2. #2
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Remove Dups from array

    Your code is broken, it gives one item too few. It might have even other bugs but I didn't dare to check them out: I don't know if you did a check for an empty array.

    Anyways... here we are again: first of all, ASM code in this case isn't as fast as you're saying. plenderj's code is only ten times slower than yours when working with 1 000 000 items and six times slower with 10 000 000 items). Next, the optimized VB6 code I made for comparison isn't even two times slower than yours: only about 1.6 times. I bet some guru could make it even faster.

    As a note, I worked with byte arrays here. It isn't hard to modify the stuff to work with Longs.


    OFFTOPIC
    Apparently you didn't get the thing when I used word "fair": what was behind it was I originally had a Function, and so it copied memory a few extra times, which hitted performance in such a fast thing to benchmark. And as you can see, even calling a Sub a lot of times is slow in VB (it made about 20% performance hit), so positioning the code in the benchmark loop gives more accurate results on the performance of the actual code. Just to make it clear since you seemed to understand it wrong, based on your joke about it

    Btw: if you were a board, how could you code? Boards have no hands (yeah, it should be "bored")
    /OFFTOPIC


    Go ahead and test
    Attached Files Attached Files

  3. #3

    Thread Starter
    Hyperactive Member Maven's Avatar
    Join Date
    Feb 2003
    Location
    Greeneville, TN
    Posts
    322

    Re: Remove Dups from array

    Quote Originally Posted by Merri
    Your code is broken, it gives one item too few. It might have even other bugs but I didn't dare to check them out: I don't know if you did a check for an empty array.

    Anyways... here we are again: first of all, ASM code in this case isn't as fast as you're saying. plenderj's code is only ten times slower than yours when working with 1 000 000 items and six times slower with 10 000 000 items). Next, the optimized VB6 code I made for comparison isn't even two times slower than yours: only about 1.6 times. I bet some guru could make it even faster.

    As a note, I worked with byte arrays here. It isn't hard to modify the stuff to work with Longs.


    OFFTOPIC
    Apparently you didn't get the thing when I used word "fair": what was behind it was I originally had a Function, and so it copied memory a few extra times, which hitted performance in such a fast thing to benchmark. And as you can see, even calling a Sub a lot of times is slow in VB (it made about 20% performance hit), so positioning the code in the benchmark loop gives more accurate results on the performance of the actual code. Just to make it clear since you seemed to understand it wrong, based on your joke about it

    Btw: if you were a board, how could you code? Boards have no hands (yeah, it should be "bored")
    /OFFTOPIC


    Go ahead and test
    No your coding as if my function returns one too many.

    Change this:
    For I = 0 To UBound(X) - 1

    to this:

    For I = 0 To UBound(X)

    Using your timing method:
    18 on my pc
    38 Yours
    290 his

    at 10,000,000
    181
    302
    3021
    Education is an admirable thing, but it is well to remember from time to time that nothing that is worth knowing can be taught. - Oscar Wilde

  4. #4
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Remove Dups from array

    Heh, should check other people's code and not trust it: I copy-pasted plenderj's code without too much checking on it I did wonder at one point why I had to have + 1 in my code. I guess I also miscounted how many items plenderj's code returned - or then plenderj's code has that weird flaw. Haven't checked. Luckily an easy fix. Edit plenderj, be your name cursed

    Here are some results on my machine.

    1 000 000 items:
    - 25 ms yours (1.00)
    - 34 ms mine (1.36)
    - 165 ms plenderj (6.60)

    10 000 000 items:
    - 266 ms (1.00)
    - 347 ms (1.30)
    - 1688 ms (6.34)

    100 000 000 items:
    - 2656 ms (1.00)
    - 3413 ms (1.28)
    - 16906 ms (6.36)


    In case interested: AMD Sempron 2800+, 1024 MB @ 333MHz


    Edit Updated results to more optimized code.
    Last edited by Merri; Dec 30th, 2004 at 11:36 PM.

  5. #5

    Thread Starter
    Hyperactive Member Maven's Avatar
    Join Date
    Feb 2003
    Location
    Greeneville, TN
    Posts
    322

    Re: Remove Dups from array

    Quote Originally Posted by Merri
    Heh, should check other people's code and not trust it: I copy-pasted plenderj's code without too much checking on it I did wonder at one point why I had to have + 1 in my code. I guess I also miscounted how many items plenderj's code returned - or then plenderj's code has that weird flaw. Haven't checked. Luckily an easy fix. Edit plenderj, be your name cursed

    Here are some results on my machine.

    1 000 000 items:
    - 25 ms yours (1.0)
    - 45 ms mine (1.8)
    - 165 ms plenderj (6.6)

    10 000 000 items:
    - 266 ms (1.00)
    - 422 ms (1.58)
    - 1688 ms (6.34)

    100 000 000 items:
    - 2656 ms (1.00)
    - 4468 ms (1.68)
    - 16906 ms (6.36)


    In case interested: AMD Sempron 2800+, 1024 MB @ 333MHz
    hrm... try looping it

    Could be just a dif between intel an amd though
    Education is an admirable thing, but it is well to remember from time to time that nothing that is worth knowing can be taught. - Oscar Wilde

  6. #6
    Banned dglienna's Avatar
    Join Date
    Jun 2004
    Location
    Center of it all
    Posts
    17,901

    Re: Remove Dups from array

    btw, it's HIT, not HITTED. There is no such word.

  7. #7
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Remove Dups from array

    Quote Originally Posted by Maven
    hrm... try looping it
    Sorry, looping has no point in this case because there are so many items there, it is easy to benchmark just like it is. This command is seldomly used many many times in a row, so it only matters to benchmark one run. Also, benchmarking it would also mean copying the array must be included in the loop, which in part gives false information on the speed of the array...

    I think I might make a faster string duplicate removal array as well, since I know how to make it work faster than what plenderj's code is... also, there is a super fast sorting routine at PSC for VB6, which returns sorted array one item at a time: you could call it and check for similarities within the incoming string, as the duplicates are easy to catch (they all come in a row).

    Quote Originally Posted by Maven
    Could be just a dif between intel an amd though
    Which means you optimized your ASM code for Intel only - if you optimize for AMD then your code probably runs slower on Intel


    dglienna: nimet tulisi kirjoittaa isolla alkukirjaimella ja kuten todettua, enkku on tosi outo kieli

  8. #8
    Banned dglienna's Avatar
    Join Date
    Jun 2004
    Location
    Center of it all
    Posts
    17,901

    Re: Remove Dups from array

    dglienna: nimet tulisi kirjoittaa isolla alkukirjaimella ja kuten todettua, enkku on tosi outo kieli
    :: ??? ::

  9. #9
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Remove Dups from array

    Updated code.

    VB Code:
    1. Public Sub bArrRemoveDuplicate(ByRef ByteArray() As Byte)
    2.     Dim LowBound As Long, UpBound As Long
    3.     Dim TempArray() As Byte, TempByte As Byte, Cur As Long
    4.     Dim A As Long, B As Long
    5.    
    6.     'check for empty array
    7.     If (Not ByteArray) = True Then Exit Sub
    8.    
    9.     'we need these often
    10.     LowBound = LBound(ByteArray)
    11.     UpBound = UBound(ByteArray)
    12.    
    13.     'reserve check buffer
    14.     ReDim TempArray(LowBound To UpBound)
    15.    
    16.     'set first item
    17.     Cur = LowBound
    18.     TempArray(Cur) = ByteArray(LowBound)
    19.    
    20.     'loop through all items
    21.     For A = LowBound + 1 To UpBound
    22.         TempByte = ByteArray(A)
    23.         'make a comparison against all items
    24.         For B = LowBound To Cur
    25.             'if is a duplicate, exit array
    26.             If (TempArray(B) Xor TempByte) = 0 Then Exit For
    27.         Next B
    28.         'check if the loop was exited: add new item to check buffer if not
    29.         If B > Cur Then Cur = B: TempArray(Cur) = ByteArray(A)
    30.     Next A
    31.    
    32.     'fix size
    33.     ReDim Preserve TempArray(LowBound To Cur)
    34.     'copy
    35.     ByteArray = TempArray
    36. End Sub

    Noticed a silly thing I forgot to optimize. Now it is only 1.3 times the speed of the ASM code.

  10. #10

    Thread Starter
    Hyperactive Member Maven's Avatar
    Join Date
    Feb 2003
    Location
    Greeneville, TN
    Posts
    322

    Re: Remove Dups from array

    Quote Originally Posted by Merri
    Sorry, looping has no point in this case because there are so many items there, it is easy to benchmark just like it is. This command is seldomly used many many times in a row, so it only matters to benchmark one run. Also, benchmarking it would also mean copying the array must be included in the loop, which in part gives false information on the speed of the array...

    I think I might make a faster string duplicate removal array as well, since I know how to make it work faster than what plenderj's code is... also, there is a super fast sorting routine at PSC for VB6, which returns sorted array one item at a time: you could call it and check for similarities within the incoming string, as the duplicates are easy to catch (they all come in a row).



    Which means you optimized your ASM code for Intel only - if you optimize for AMD then your code probably runs slower on Intel


    dglienna: nimet tulisi kirjoittaa isolla alkukirjaimella ja kuten todettua, enkku on tosi outo kieli
    Well the whole purpose would be to avg out operating sytem interrupts. I usually loop my code and run it in real time.

    Well you're code isn't always faster, it really depends on the status of the array. If the array has more duplicates then your code does better, if it has less then you're code ends up worse. Plenderj's code is different, it gets better the less duplicates that are found.

    I'm looking into the idea of working with these numbers at 4 bytes per go. The speed gain would probably be 100-1000% but would have to figure out how to deal with odd bytes.
    Education is an admirable thing, but it is well to remember from time to time that nothing that is worth knowing can be taught. - Oscar Wilde

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width