Thread: VB Dataset Sorting Problem

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Oct 2001
    Posts
    22

    VB Dataset Sorting Problem

    I have a set of data that needs to be sorted.

    Each line consists of six numbers (a pair of triplets), and there may be more than 500 lines.

    The data arrives with the lines jumbled up like this:

    A(8),B(8)C(8),A(9),B(9),C(9)
    A(5),B(5)C(5),A(6),B(6),C(6)
    A(1),B(1)C(1),A(2),B(2),C(2)
    A(2),B(2)C(2),A(3),B(3),C(3)
    A(4),B(4)C(4),A(5),B(5),C(5)
    A(3),B(3)C(3),A(4),B(4),C(4)
    A(6),B(6)C(6),A(7),B(7),C(7)
    A(7),B(7)C(7),A(8),B(8),C(8)
    A(0),B(0)C(0),A(1),B(1),C(1)
    A(9),B(9)C(9),A(10),B(10),C(10)
    A(16),B(16)C(16),A(17),B(17),C(17)
    A(12),B(12)C(12),A(13),B(13),C(13)
    A(n),B(n)C(n),A(n),B(n),C(n)
    A(14),B(14)C(14),A(15),B(15),C(15)
    A(15),B(15)C(15),A(16),B(16),C(16)
    A(11),B(11)C(11),A(12),B(12),C(12)
    A(n-1),B(n-16)C(n-1),A(n-1),B(n-1),C(n-1)
    .......
    A(13),B(13)C(13),A(14),B(14),C(14)


    The sequence within each line is unchanged, but the position of each line is not. The jumbling is random and follows no pattern.


    When sorted correctly the data should take the form:

    A(0),B(0)C(0),A(1),B(1),C(1)
    A(1),B(1)C(1),A(2),B(2),C(2)
    A(2),B(2)C(2),A(3),B(3),C(3)
    A(3),B(3)C(3),A(4),B(4),C(4)
    A(4),B(4)C(4),A(5),B(5),C(5)
    A(5),B(5)C(5),A(6),B(6),C(6)
    A(6),B(6)C(6),A(7),B(7),C(7)
    A(7),B(7)C(7),A(8),B(8),C(8)
    A(8),B(8)C(8),A(9),B(9),C(9)
    A(9),B(9)C(9),A(10),B(10),C(10)
    A(11),B(11)C(11),A(12),B(12),C(12)
    A(12),B(12)C(12),A(13),B(13),C(13)
    A(13),B(13)C(13),A(14),B(14),C(14)
    A(14),B(14)C(14),A(15),B(15),C(15)
    A(15),B(15)C(15),A(16),B(16),C(16)
    A(16),B(16)C(16),A(17),B(17),C(17)
    ......
    A(n-1),B(n-16)C(n-1),A(n-1),B(n-1),C(n-1)
    A(n),B(n)C(n),A(n),B(n),C(n)

    Notice that on each line the last triplet matches the first triplet of the next line. However, there are sometimes skips in the data, as these two lines show:

    A(9),B(9)C(9),A(10),B(10),C(10)
    A(11),B(11)C(11),A(12),B(12),C(12)



    Does anyone have an idea for a sort algorithm that will put the data file back together?

    Any help would be appreciated.

    Thank You,
    CWCookman

    Edit,

    It looks like my example may have been too abstract. I am dealing with actual numbers, not variables with indices.

    Here is a small piece of an actual data file.


    2251098,11322792,3000,2251349,11322625,3000
    2250848,11321349,3000,2251349,11322625,3000
    2251098,11322792,3000,2249880,11323773,3000
    2249880,11323773,3000,2250587,11325115,3000
    2250848,11321349,3000,2252137,11322137,3000
    2250587,11325115,3000,2252035,11324479,3000
    2252973,11325105,3000,2253656,11324885,3000
    2252035,11324479,3000,2252973,11325105,3000
    2253656,11324885,3000,2254407,11325342,3000
    2252137,11322137,3000,2253694,11321591,3000
    2254191,11321166,3000,2253694,11321591,3000
    2254407,11325342,3000,2255890,11325936,3000
    2256065,11325874,3000,2255890,11325936,3000
    2256065,11325874,3000,2256260,11325946,3000
    2254191,11321166,3000,2255715,11322187,3000
    2255715,11322187,3000,2256107,11321912,3000
    2256107,11321912,3000,2256963,11320361,3000
    2256963,11320361,3000,2257081,11320632,3000
    2256260,11325946,3000,2256827,11326189,3000
    2256827,11326189,3000,2258439,11326557,3000
    2257081,11320632,3000,2258654,11319446,3000
    2258654,11319446,3000,2259433,11317569,3000


    I hope this clarifies things.

    The missing comma only means my typing is poor sometimes.
    CWC
    Last edited by cwcookman; Jan 23rd, 2003 at 05:46 PM.

  2. #2
    PowerPoster
    Join Date
    Feb 2002
    Location
    Canada, Toronto
    Posts
    5,803
    First of all, there is a comma missing between B and C on the left side:
    A(0),B(0)C(0),A(1),B(1),C(1)

    Is that intentional?


    But it seems to me that you need to break the lines into triplets and sort them like that.
    When writing back, each line will have 2 triplets.

    For the sorting part, I think you can use a ListBox: set its Sorted property to True (at design time, since it is read-only at run time), insert all the triplets, then read the sorted triplets back by index in a For loop (I think).

  3. #3
    PowerPoster
    Join Date
    Feb 2002
    Location
    Canada, Toronto
    Posts
    5,803
    Something like this:
    VB Code:
    Private Sub Form_Load()
        Dim K As Integer

        ' List1.Sorted must be set to True at design time so that
        ' the ListBox keeps its items in (alphabetical) order.
        List1.Visible = False
        List1.AddItem "A(3),B(3)C(3)"
        List1.AddItem "A(7),B(7)C(7)"
        List1.AddItem "A(1),B(1)C(1)"
        List1.AddItem "A(9),B(9)C(9)"
        List1.AddItem "A(3),B(3)C(3)"
        List1.AddItem "A(8),B(8)C(8)"

        ' Read the sorted triplets back two at a time, one pair per output line.
        For K = 0 To List1.ListCount - 1 Step 2
            Debug.Print List1.List(K) & "," & List1.List(K + 1)
        Next K
    End Sub

  4. #4

    Thread Starter
    Junior Member
    Join Date
    Oct 2001
    Posts
    22

    Edit

    I edited my first post. I realized maybe I was not being very clear.
    Last edited by cwcookman; Jan 23rd, 2003 at 06:18 PM.

  5. #5
    PowerPoster
    Join Date
    Feb 2002
    Location
    Canada, Toronto
    Posts
    5,803
    VB Code:
    Option Explicit

    ' One triplet of numbers (half of a data line).
    Private Type myData
        Num1 As Long
        Num2 As Long
        Num3 As Long
    End Type

    ' Sorts C(First..Last) in place, in ascending order of Num1 only.
    Private Sub QuickSort(C() As myData, ByVal First As Long, ByVal Last As Long)
        Dim Low As Long, High As Long
        Dim MidValue As myData

        Low = First
        High = Last
        MidValue = C((First + Last) \ 2)   ' pivot: the middle element

        Do
            While C(Low).Num1 < MidValue.Num1
                Low = Low + 1
            Wend

            While C(High).Num1 > MidValue.Num1
                High = High - 1
            Wend

            If Low <= High Then
                Swap C(Low), C(High)
                Low = Low + 1
                High = High - 1
            End If
        Loop While Low <= High

        If First < High Then QuickSort C, First, High
        If Low < Last Then QuickSort C, Low, Last
    End Sub

    Private Sub Swap(ByRef A As myData, ByRef B As myData)
        Dim T As myData

        T = A
        A = B
        B = T
    End Sub

    Private Sub Form_Load()
        Dim LN As String, myArray() As myData, LNArray As Variant
        Dim K As Long, Count As Long   ' Count = number of triplets stored so far

        Open "C:\temp\myDataFile.txt" For Input As #1
            Do While Not EOF(1)
                Line Input #1, LN
                LNArray = Split(LN, ",")

                ' Skip blank or short lines; a valid line holds two triplets.
                If UBound(LNArray) >= 5 Then
                    ReDim Preserve myArray(Count + 1)

                    myArray(Count).Num1 = LNArray(0)
                    myArray(Count).Num2 = LNArray(1)
                    myArray(Count).Num3 = LNArray(2)

                    myArray(Count + 1).Num1 = LNArray(3)
                    myArray(Count + 1).Num2 = LNArray(4)
                    myArray(Count + 1).Num3 = LNArray(5)

                    Count = Count + 2
                End If
            Loop
        Close #1

        If Count = 0 Then Exit Sub   ' nothing was read, so nothing to sort

        QuickSort myArray, 0, Count - 1

        ' Write the sorted triplets back, two per line.
        Open "C:\Temp\myDataFile_Sorted.txt" For Output As #1
            For K = 0 To Count - 2 Step 2
                LN = myArray(K).Num1 & "," & myArray(K).Num2 & "," & myArray(K).Num3 & "," & _
                    myArray(K + 1).Num1 & "," & myArray(K + 1).Num2 & "," & myArray(K + 1).Num3

                Print #1, LN
            Next K
        Close #1
    End Sub

  6. #6

    Thread Starter
    Junior Member
    Join Date
    Oct 2001
    Posts
    22

    Looks Interesting

    After further study of the dataset I have realized that the first and second triplets on a line sometimes appear in reverse order. This is the case with the second triplet in lines one and two of the numerical example. The first triplet in lines one and three shows the same reversal.

    I will study your code to understand it completely.

    I think you have shared several ideas that will allow me to crack the problem.

    After a brief review of your code it appears that the QuickSort Sub sorts only Num1 in increasing value. The data actually needs to be sorted so that on any line the values of the second triplet equal the values of the first triplet on the next line.

    That is, if a, b, c, d, e, f, g are triplets and the starting data file looks like
    c,d
    e,d
    c,b
    f,g
    a,b
    e,f



    the correct sort could either be

    g,f
    f,e
    e,d
    d,c
    c,b
    b,a

    or

    a,b
    b,c
    c,d
    d,e
    e,f
    f,g


    A way of viewing this spatially would be to see the whole data set as a line made of segments. The endpoints of the segments are represented by the triplet pairs. In the example above the line starts at a point "a" and ends at a point "g".

    The first segment goes from a to b, the second from b to c, the third from c to d, and so on until the last segment runs from f to g.

    What is important in the data order is that the endpoint of a segment on one data line generally equals the beginning point of a segment on the next data line. If the values are not equal, then we can view the dataset as representing more than one multisegment line. In our example above, reading more data might show the resolved dataset to look like

    x,a
    a,b
    b,c
    c,d
    d,e
    e,f
    f,g
    j,k
    k,l
    l,m


    with the two triplet pairs

    f,g
    j,k

    representing the end of one multisegment line and the beginning of another.
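
    A minimal sketch of one way this could be done in VB6. It assumes each triplet has already been turned into a string key such as "2251098,11322792,3000", and that no point is shared by more than two data lines. The names ChainLines, Degree and Demo are hypothetical. The idea is to start from a free end (a point that appears on only one unused data line), then keep taking the unused data line that touches the current point, flipping it when the match is on its second triplet. When nothing connects any more, a new multisegment line begins.

    ```vb
    Option Explicit

    ' Counts how many unused data lines touch the point Key.
    Private Function Degree(P() As String, Q() As String, _
                            Used() As Boolean, ByVal Key As String) As Long
        Dim I As Long
        For I = LBound(P) To UBound(P)
            If Not Used(I) Then
                If P(I) = Key Then Degree = Degree + 1
                If Q(I) = Key Then Degree = Degree + 1
            End If
        Next I
    End Function

    ' P(I) and Q(I) are the two triplet keys on data line I (an assumption).
    ' Returns the lines in connected order; an empty string in the result
    ' marks the break between two multisegment lines.
    Public Function ChainLines(P() As String, Q() As String) As Collection
        Dim Result As New Collection
        Dim Used() As Boolean
        Dim I As Long, Remaining As Long
        Dim Cur As String, HaveStart As Boolean, Found As Boolean

        ReDim Used(LBound(P) To UBound(P))
        Remaining = UBound(P) - LBound(P) + 1

        Do While Remaining > 0
            ' Look for a free end: a point on exactly one unused line.
            HaveStart = False
            For I = LBound(P) To UBound(P)
                If Not Used(I) Then
                    If Degree(P, Q, Used, P(I)) = 1 Then
                        Cur = P(I)
                        HaveStart = True
                    ElseIf Degree(P, Q, Used, Q(I)) = 1 Then
                        Cur = Q(I)
                        HaveStart = True
                    End If
                    If HaveStart Then Exit For
                End If
            Next I

            ' No free end means a closed loop: start at the first unused line.
            If Not HaveStart Then
                For I = LBound(P) To UBound(P)
                    If Not Used(I) Then
                        Cur = P(I)
                        Exit For
                    End If
                Next I
            End If

            If Result.Count > 0 Then Result.Add ""   ' break marker

            ' Walk the chain, flipping a line when it matches on Q.
            Do
                Found = False
                For I = LBound(P) To UBound(P)
                    If Not Used(I) Then
                        If P(I) = Cur Then
                            Result.Add P(I) & "," & Q(I)
                            Cur = Q(I)
                            Found = True
                        ElseIf Q(I) = Cur Then
                            Result.Add Q(I) & "," & P(I)
                            Cur = P(I)
                            Found = True
                        End If
                        If Found Then
                            Used(I) = True
                            Remaining = Remaining - 1
                            Exit For
                        End If
                    End If
                Next I
            Loop While Found
        Loop

        Set ChainLines = Result
    End Function

    ' Runs the six jumbled lines c,d / e,d / c,b / f,g / a,b / e,f.
    Public Sub Demo()
        Dim P() As String, Q() As String, Item As Variant
        P = Split("c e c f a e", " ")
        Q = Split("d d b g b f", " ")
        For Each Item In ChainLines(P, Q)
            Debug.Print Item
        Next Item
    End Sub
    ```

    Traced by hand, Demo should start at the free end g and print g,f / f,e / e,d / d,c / c,b / b,a, which is the first of the two acceptable orders listed earlier. Every step rescans all the lines, so the work grows with the square of the line count; that seems acceptable for a file of about 500 lines.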

    In our example the first value of b may be equal to, greater than, or less than the first value of a.

    It seems the first, middle, and last values of each endpoint triplet must be compared to the first, middle, and last values of the prospective next beginning triplet; furthermore, the prospective next beginning triplet may be either the first or the second triplet on the data line being evaluated.
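
    A minimal sketch of that comparison in VB6, meant for a standard module. The type and function names (Triplet, DataLine, SameTriplet, ConnectsTo) are hypothetical. It compares all three values of an endpoint against both triplets of a candidate data line, and swaps the candidate's triplets when the match is on the second one.

    ```vb
    Option Explicit

    ' Hypothetical type: one point, made of three numbers.
    Public Type Triplet
        V1 As Long
        V2 As Long
        V3 As Long
    End Type

    ' Hypothetical type: the two triplets found on one data line.
    Public Type DataLine
        P1 As Triplet
        P2 As Triplet
    End Type

    ' True only when the first, middle and last values all match.
    Public Function SameTriplet(A As Triplet, B As Triplet) As Boolean
        SameTriplet = (A.V1 = B.V1) And (A.V2 = B.V2) And (A.V3 = B.V3)
    End Function

    ' Checks whether Candidate can follow a line that ends at EndPoint.
    ' If the match is on Candidate.P2, the two triplets are swapped so that
    ' Candidate starts at EndPoint whenever the function returns True.
    Public Function ConnectsTo(EndPoint As Triplet, Candidate As DataLine) As Boolean
        Dim T As Triplet

        If SameTriplet(EndPoint, Candidate.P1) Then
            ConnectsTo = True
        ElseIf SameTriplet(EndPoint, Candidate.P2) Then
            T = Candidate.P1
            Candidate.P1 = Candidate.P2
            Candidate.P2 = T
            ConnectsTo = True
        End If
    End Function
    ```

    With the first two lines of the numerical sample, EndPoint would be 2251349,11322625,3000 (the second triplet of line one). That equals the second triplet of line two, so ConnectsTo would flip line two to start at that point.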

    I have tried to give a better explanation of the dataset problem, though I wonder if I haven't made things worse.

    I hope to work on the problem more tomorrow, your code will be of great help.


    Thank You for your ideas and your effort CVMicheal!!
    Last edited by cwcookman; Jan 23rd, 2003 at 09:53 PM.
