-
Jan 22nd, 2003, 09:43 PM
#1
Thread Starter
Junior Member
VB Dataset Sorting Problem
I have a set of data that needs to be sorted.
It consists of six numbers per line (a pair of triplets), and there may be more than 500 lines.
The data arrives with the lines all jumbled up, like this:
A(8),B(8)C(8),A(9),B(9),C(9)
A(5),B(5)C(5),A(6),B(6),C(6)
A(1),B(1)C(1),A(2),B(2),C(2)
A(2),B(2)C(2),A(3),B(3),C(3)
A(4),B(4)C(4),A(5),B(5),C(5)
A(3),B(3)C(3),A(4),B(4),C(4)
A(6),B(6)C(6),A(7),B(7),C(7)
A(7),B(7)C(7),A(8),B(8),C(8)
A(0),B(0)C(0),A(1),B(1),C(1)
A(9),B(9)C(9),A(10),B(10),C(10)
A(16),B(16)C(16),A(17),B(17),C(17)
A(12),B(12)C(12),A(13),B(13),C(13)
A(n),B(n)C(n),A(n),B(n),C(n)
A(14),B(14)C(14),A(15),B(15),C(15)
A(15),B(15)C(15),A(16),B(16),C(16)
A(11),B(11)C(11),A(12),B(12),C(12)
A(n-1),B(n-16)C(n-1),A(n-1),B(n-1),C(n-1)
.......
A(13),B(13)C(13),A(14),B(14),C(14)
The order of the values within each line is unchanged, but the position of each line is not. The jumbling follows no pattern; it is random.
When sorted correctly the data should take the form:
A(0),B(0)C(0),A(1),B(1),C(1)
A(1),B(1)C(1),A(2),B(2),C(2)
A(2),B(2)C(2),A(3),B(3),C(3)
A(3),B(3)C(3),A(4),B(4),C(4)
A(4),B(4)C(4),A(5),B(5),C(5)
A(5),B(5)C(5),A(6),B(6),C(6)
A(6),B(6)C(6),A(7),B(7),C(7)
A(7),B(7)C(7),A(8),B(8),C(8)
A(8),B(8)C(8),A(9),B(9),C(9)
A(9),B(9)C(9),A(10),B(10),C(10)
A(11),B(11)C(11),A(12),B(12),C(12)
A(12),B(12)C(12),A(13),B(13),C(13)
A(13),B(13)C(13),A(14),B(14),C(14)
A(14),B(14)C(14),A(15),B(15),C(15)
A(15),B(15)C(15),A(16),B(16),C(16)
A(16),B(16)C(16),A(17),B(17),C(17)
......
A(n-1),B(n-16)C(n-1),A(n-1),B(n-1),C(n-1)
A(n),B(n)C(n),A(n),B(n),C(n)
Notice that on each line the last triplet matches the first triplet of the next line. However, as the lines
A(9),B(9)C(9),A(10),B(10),C(10)
A(11),B(11)C(11),A(12),B(12),C(12)
show, there are sometimes skips in the data.
Does anyone have an idea for a sort algorithm that will put the data file back together?
Any help would be appreciated.
Thank You,
CWCookman
Edit:
It looks like my example may have been too abstract. I am dealing with actual numbers, not variables with indices.
Here is a small piece of an actual data file.
2251098,11322792,3000,2251349,11322625,3000
2250848,11321349,3000,2251349,11322625,3000
2251098,11322792,3000,2249880,11323773,3000
2249880,11323773,3000,2250587,11325115,3000
2250848,11321349,3000,2252137,11322137,3000
2250587,11325115,3000,2252035,11324479,3000
2252973,11325105,3000,2253656,11324885,3000
2252035,11324479,3000,2252973,11325105,3000
2253656,11324885,3000,2254407,11325342,3000
2252137,11322137,3000,2253694,11321591,3000
2254191,11321166,3000,2253694,11321591,3000
2254407,11325342,3000,2255890,11325936,3000
2256065,11325874,3000,2255890,11325936,3000
2256065,11325874,3000,2256260,11325946,3000
2254191,11321166,3000,2255715,11322187,3000
2255715,11322187,3000,2256107,11321912,3000
2256107,11321912,3000,2256963,11320361,3000
2256963,11320361,3000,2257081,11320632,3000
2256260,11325946,3000,2256827,11326189,3000
2256827,11326189,3000,2258439,11326557,3000
2257081,11320632,3000,2258654,11319446,3000
2258654,11319446,3000,2259433,11317569,3000
I hope this clarifies things.
The missing comma only means my typing is poor sometimes.
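For reference, a minimal sketch of reading one line of the file above into its two triplets, assuming the six values are whole numbers that fit in a Long. The names Triplet, DataLine and ParseLine are illustrative only; they do not come from the data source or from any library. The code is meant for a standard module.
VB Code:
Option Explicit

' Hypothetical types: one triplet (A, B, C) and one data line (two triplets).
Public Type Triplet
    A As Long
    B As Long
    C As Long
End Type

Public Type DataLine
    T1 As Triplet   ' first triplet on the line
    T2 As Triplet   ' second triplet on the line
End Type

' Splits "a1,b1,c1,a2,b2,c2" into its two triplets.
Public Function ParseLine(ByVal LN As String) As DataLine
    Dim F() As String
    Dim D As DataLine
    F = Split(LN, ",")
    ' Error 5 = "Invalid procedure call or argument"
    If UBound(F) <> 5 Then Err.Raise 5, , "Expected six values: " & LN
    D.T1.A = CLng(F(0))
    D.T1.B = CLng(F(1))
    D.T1.C = CLng(F(2))
    D.T2.A = CLng(F(3))
    D.T2.B = CLng(F(4))
    D.T2.C = CLng(F(5))
    ParseLine = D
End Function

Public Sub Main()
    Dim D As DataLine
    ' First row of the sample data above.
    D = ParseLine("2251098,11322792,3000,2251349,11322625,3000")
    Debug.Print D.T1.A, D.T2.B
End Sub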
CWC
Last edited by cwcookman; Jan 23rd, 2003 at 05:46 PM.
-
Jan 22nd, 2003, 11:29 PM
#2
First of all... there is a comma missing between B and C on the left side:
A(0),B(0)C(0),A(1),B(1),C(1)
Is that intentional?
But it seems to me that you need to break the lines into triplets and sort those...
Then, when writing back, each line will have two triplets.
For the sorting part, I think you can use a ListBox: set its Sorted property to True (in VB6 this has to be done at design time), insert all the triplets, and then read them back in sorted order by index in a For loop (I think).
-
Jan 22nd, 2003, 11:34 PM
#3
Something like this:
VB Code:
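Private Sub Form_Load()
    Dim K As Integer
    ' List1.Sorted must be True; it can only be set at design time.
    ' Items are sorted as text, so "A(10)..." would come before "A(2)...".
    List1.Visible = False
    List1.AddItem "A(3),B(3)C(3)"
    List1.AddItem "A(7),B(7)C(7)"
    List1.AddItem "A(1),B(1)C(1)"
    List1.AddItem "A(9),B(9)C(9)"
    List1.AddItem "A(3),B(3)C(3)"
    List1.AddItem "A(8),B(8)C(8)"
    ' Read the sorted triplets back two at a time, one pair per output line.
    For K = 0 To List1.ListCount - 1 Step 2
        Debug.Print List1.List(K) & "," & List1.List(K + 1)
    Next K
End Sub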
-
Jan 23rd, 2003, 06:15 PM
#4
Thread Starter
Junior Member
Edit
I edited my first post. I realized that I may not have been very clear.
Last edited by cwcookman; Jan 23rd, 2003 at 06:18 PM.
-
Jan 23rd, 2003, 06:47 PM
#5
VB Code:
Option Explicit

Private Type myData
    Num1 As Long
    Num2 As Long
    Num3 As Long
End Type

' Sorts C(First..Last) in ascending order of Num1 only.
Private Sub QuickSort(C() As myData, ByVal First As Long, ByVal Last As Long)
    Dim Low As Long, High As Long
    Dim MidValue As myData
    Low = First
    High = Last
    MidValue = C((First + Last) \ 2)
    Do
        While C(Low).Num1 < MidValue.Num1
            Low = Low + 1
        Wend
        While C(High).Num1 > MidValue.Num1
            High = High - 1
        Wend
        If Low <= High Then
            Swap C(Low), C(High)
            Low = Low + 1
            High = High - 1
        End If
    Loop While Low <= High
    If First < High Then QuickSort C, First, High
    If Low < Last Then QuickSort C, Low, Last
End Sub

Private Sub Swap(ByRef A As myData, ByRef B As myData)
    Dim T As myData
    T = A
    A = B
    B = T
End Sub

Private Sub Form_Load()
    Dim LN As String, myArray() As myData, LNArray As Variant
    Dim K As Long, Count As Long, FileNum As Integer

    ' Read every line and store its two triplets as consecutive array elements.
    FileNum = FreeFile
    Open "C:\temp\myDataFile.txt" For Input Access Read Lock Write As #FileNum
    Do While Not EOF(FileNum)
        Line Input #FileNum, LN
        LNArray = Split(LN, ",")
        If UBound(LNArray) >= 5 Then    ' skip blank or incomplete lines
            ' Grow the array by two elements without an error-handler trick.
            ReDim Preserve myArray(Count + 1)
            myArray(Count).Num1 = CLng(LNArray(0))
            myArray(Count).Num2 = CLng(LNArray(1))
            myArray(Count).Num3 = CLng(LNArray(2))
            myArray(Count + 1).Num1 = CLng(LNArray(3))
            myArray(Count + 1).Num2 = CLng(LNArray(4))
            myArray(Count + 1).Num3 = CLng(LNArray(5))
            Count = Count + 2
        End If
    Loop
    Close #FileNum
    If Count = 0 Then Exit Sub          ' nothing to sort

    QuickSort myArray, 0, Count - 1

    ' Write the sorted triplets back out, two per line.
    FileNum = FreeFile
    Open "C:\Temp\myDataFile_Sorted.txt" For Output Access Write Lock Write As #FileNum
    For K = 0 To Count - 2 Step 2
        LN = myArray(K).Num1 & "," & myArray(K).Num2 & "," & myArray(K).Num3 & "," & _
             myArray(K + 1).Num1 & "," & myArray(K + 1).Num2 & "," & myArray(K + 1).Num3
        Print #FileNum, LN
    Next K
    Close #FileNum
End Sub
-
Jan 23rd, 2003, 09:44 PM
#6
Thread Starter
Junior Member
Looks Interesting
After further study of the dataset, I have realized that the first and second triplets of a line sometimes appear in reverse order. This is the case with the second triplet in lines one and two of the numerical example; the first triplet in lines one and three shows the same reversal.
I will study your code to understand it completely.
I think you have shared several ideas that will allow me to crack the problem.
After a brief review of your code, it appears that the QuickSort Sub sorts only by Num1, in increasing order. The data actually needs to be ordered so that on any line the values of the second triplet equal the values of the first triplet on the next line.
That is, if a, b, c, d, e, f, g are triplets and the starting data file looks like
c,d
e,d
c,b
f,g
a,b
e,f
the correct sort could either be
g,f
f,e
e,d
d,c
c,b
b,a
or
a,b
b,c
c,d
d,e
e,f
f,g
A way of viewing this spatially would be to see the whole data set as a line made up of segments. The endpoints of the segments are represented by the triplet pairs. In the example above, the line starts at a point "a" and ends at a point "g".
The first segment goes from a to b, the second from b to c, the third from c to d, and so on until the last segment runs from f to g.
What matters in the data order is that the endpoint of a segment on one data line generally equals the beginning point of the segment on the next data line. If the values are not equal, we can view the dataset as representing more than one multisegment line. In our example above, reading more data might show the resolved dataset to look like
x,a
a,b
b,c
c,d
d,e
e,f
f,g
j,k
k,l
l,m
with the two triplet pairs
f,g
j,k
representing the end of one multisegment line and the beginning of another.
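Once the lines are in order, such a boundary can be found by checking where a line's second triplet fails to match the next line's first triplet. A small sketch for a standard module, with Joins as a made-up helper name and three rows taken from the sample data in the first post:
VB Code:
Option Explicit

' True when the second triplet of LineA equals the first triplet of LineB.
' Both lines are assumed to hold six comma-separated whole numbers.
Public Function Joins(ByVal LineA As String, ByVal LineB As String) As Boolean
    Dim A() As String, B() As String
    A = Split(LineA, ",")
    B = Split(LineB, ",")
    Joins = (CLng(A(3)) = CLng(B(0))) And _
            (CLng(A(4)) = CLng(B(1))) And _
            (CLng(A(5)) = CLng(B(2)))
End Function

Public Sub Main()
    Dim Rows As Variant, I As Long
    ' Rows 0 and 1 connect; row 2 belongs to a different run.
    Rows = Array("2252035,11324479,3000,2252973,11325105,3000", _
                 "2252973,11325105,3000,2253656,11324885,3000", _
                 "2254191,11321166,3000,2255715,11322187,3000")
    For I = 0 To UBound(Rows) - 1
        If Not Joins(Rows(I), Rows(I + 1)) Then
            ' Row numbers here are zero-based.
            Debug.Print "A new multisegment line starts at row " & (I + 1)
        End If
    Next I
End Sub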
In our example, the first value of b may be equal to, greater than, or less than the first value of a.
It seems the first, middle, and last values of each endpoint triplet must be compared to the first, middle, and last values of the prospective next beginning triplet; furthermore, that prospective triplet may be either the first or the second triplet on the data line being evaluated.
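The comparison described above (match an endpoint triplet against either triplet of every remaining line, and flip the line when the match is on its second triplet) can be sketched as a simple chain walk. This is only one possible approach and does not come from the replies in this thread. ChainLines, Touching and FindNext are made-up names, and each triplet is assumed to have been reduced to a text key such as "2251098,11322792,3000", so that comparing all three values is a single string comparison.
VB Code:
Option Explicit

' Number of unused lines that have triplet Key at either end.
Private Function Touching(ByVal Key As String, P1() As String, P2() As String, _
                          Used() As Boolean) As Long
    Dim I As Long
    For I = LBound(P1) To UBound(P1)
        If Not Used(I) Then
            If P1(I) = Key Or P2(I) = Key Then Touching = Touching + 1
        End If
    Next I
End Function

' Index of the first unused line that has triplet Key at either end, or -1.
Private Function FindNext(ByVal Key As String, P1() As String, P2() As String, _
                          Used() As Boolean) As Long
    Dim I As Long
    FindNext = -1
    For I = LBound(P1) To UBound(P1)
        If Not Used(I) Then
            If P1(I) = Key Or P2(I) = Key Then
                FindNext = I
                Exit Function
            End If
        End If
    Next I
End Function

' P1(i) and P2(i) are the two triplet keys of input line i.
' Returns the lines reordered (and flipped where needed) so that each
' line starts with the triplet the previous line ended with.
Public Function ChainLines(P1() As String, P2() As String) As Collection
    Dim Used() As Boolean, Result As New Collection
    Dim I As Long, J As Long, Cur As String, Remaining As Long
    ReDim Used(LBound(P1) To UBound(P1))
    Remaining = UBound(P1) - LBound(P1) + 1
    Do While Remaining > 0
        ' Start at a free end: a triplet that only one unused line touches.
        Cur = ""
        For I = LBound(P1) To UBound(P1)
            If Not Used(I) Then
                If Len(Cur) = 0 Then Cur = P1(I)    ' fallback for a closed loop
                If Touching(P1(I), P1, P2, Used) = 1 Then
                    Cur = P1(I)
                    Exit For
                ElseIf Touching(P2(I), P1, P2, Used) = 1 Then
                    Cur = P2(I)
                    Exit For
                End If
            End If
        Next I
        ' Follow the chain until no unused line touches the current end.
        J = FindNext(Cur, P1, P2, Used)
        Do While J >= 0
            Used(J) = True
            Remaining = Remaining - 1
            If P1(J) = Cur Then
                Result.Add P1(J) & "," & P2(J)
                Cur = P2(J)
            Else
                Result.Add P2(J) & "," & P1(J)      ' line was stored reversed
                Cur = P1(J)
            End If
            J = FindNext(Cur, P1, P2, Used)
        Loop
    Loop
    Set ChainLines = Result
End Function

Public Sub Main()
    Dim Rows As Variant, P1() As String, P2() As String, F() As String
    Dim I As Long, V As Variant
    ' The lettered example from this post; each letter stands for one triplet key.
    Rows = Array("c,d", "e,d", "c,b", "f,g", "a,b", "e,f")
    ReDim P1(UBound(Rows))
    ReDim P2(UBound(Rows))
    For I = 0 To UBound(Rows)
        F = Split(Rows(I), ",")
        P1(I) = F(0)
        P2(I) = F(1)
    Next I
    For Each V In ChainLines(P1, P2)
        Debug.Print V
    Next V
End Sub
With the lettered example, the walk starts at the free end g and should print g,f / f,e / e,d / d,c / c,b / b,a, which is the first of the two acceptable orders listed earlier. Unconnected runs simply follow one another in the output. Every search is a plain linear scan, which should be fine for a few hundred lines. Where three or more lines share a triplet, the walk follows whichever it finds first and leaves the rest for a later run.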
I have tried to give a better explanation of the dataset problem; I wonder if I haven't made things worse.
I hope to work on the problem more tomorrow, your code will be of great help.
Thank you for your ideas and your effort, CVMicheal!
Last edited by cwcookman; Jan 23rd, 2003 at 09:53 PM.