I am trying to query text files with 280,000 records. Each record has 4 tabbed fields, and I need to query on field 1. I don't know how to do this in VB. Help Appreciated.
Thanks in advance.
Bob
Printable View
I am trying to query text files with 280,000 records. Each record has 4 tabbed fields, and I need to query on field 1. I don't know how to do this in VB. Help Appreciated.
Thanks in advance.
Bob
Hello Bob,
I can tell you what I would try first, and see how it profomed.
1) Create a UDT with the four fields
2) Use a Get Stament to read the file into an array of the UDT
3) Search the array with a for.. next.. loop
Hope this helps,
Thanks RvA,
...but how do I use a GET ststement to read the file into the array of User Defined Type.
I have created a simple example to try and get it working. It reads ...
Type record
number as string * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
...and...
Dim myrec As record
Open "c:\bob.txt" For Random As #1 Len = 90
position = 1
Get #1, position, myrec
x = myrec.desc
Debug.Print x
Close #1
....This works, but how do I expand to account for ARRAY.
Hmmmmmmmm,
I forgot you using a tab delimeted file, let me look into this and I will get back to you.
Best
Hello Bob,
Quick question. Would it be possible to turn this list into a comma delimited file instead of a tab delimited file? There are functions in VB to read a comma delimited file, but I haven’t been able to find one for a tab delimited file. If there it can't be done then you can either parse the file your self, or look for something else I guess.
Best
You can open it in Excel, then save it as .csv. Then use INPUT to read it into an array in VB.
That's what I'm hoping he can do, the code to read it into an array then is really easy.
Thanks,
What about reading the entire file into a string (or strings if larger than a single string)? You could then use the Split function based on vbTab to break up the string into an array, where every 4 elements would be one record. Just an idea (works fast too).
[Edited by SonGouki on 05-09-2000 at 10:20 PM]
Excel will only accept 65536 records, and I have 280,000 records. I don't understand how to read the file into a string/strings, and to Split based on VBTab. Can you give me some more info so that I can try that.
Thanks to all.
Bob
This might work for you. It will input 1 line at a time from "bob.txt" and place it into the "myrec" array.
Obviously this is going to take a bit of time. Espically as you are going to be loading 280,000 records. Best i can do in a hurry though.
Make sure the file does not have an empty line at the bottom. it seems to crash and i havn't got time to fix it. So make sure there is no carriage return after the last record.
[Edited by Iain17 on 05-11-2000 at 06:24 PM]Code:Option Explicit
Private Type record
number As String * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
Dim myrec() As record
Private Sub Command1_Click()
Dim strInput As String
Dim lCount As Long
'start with 280,000 records
ReDim myrec(280000) As record
'open the file
Open "bob.txt" For Binary As #1
'the counter
lCount = 0
Do While Not EOF(1)
'input a line
Line Input #1, strInput
getRecords strInput, lCount
lCount = lCount + 1
Loop
'resize the array to the exact number of elements.
ReDim Preserve myrec(lCount - 1) As record
Close #1
End Sub
Private Sub getRecords(strData As String, lIndex As Long)
Dim strArray() As String
'if the index is greater than than the Ubound of the
'array we need to redimension the array
If lIndex > UBound(myrec) Then
ReDim Preserve myrec(UBound(myrec) + 100000) As record
End If
'split the input string
strArray = Split(strData, vbTab)
'assuming number is the first field, desc the second etc.
myrec(lIndex).number = strArray(0)
myrec(lIndex).desc = strArray(1)
myrec(lIndex).init = strArray(2)
myrec(lIndex).price = strArray(3)
End Sub
I also tried what SonGouki suggested. Reading the whole file into a string and then spliting the string based on vbTab. If this worked it would indeed be quicker than my suggested way.
The problem with that is that you have to have a tab after the last field in each record. Otherwise you get a split that is incorrect, as usually there will be no tab after the last field, and just a carriage return. Though i am sure you could work around this problem
Just realised that you can use the replace function to replace all the carriage returns with tab characters. Easy.:D
So this is how you read the file in one fell swoop.
Hope this helps.;)Code:Private Type record
number As String * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
Dim myrec() As record
Private Sub Command1_Click()
Dim strInput As String
Dim i As Long
Dim myStr() As String
Dim iNumRecs As Long
'open the file
Open "bob.txt" For Binary As #1
'read the file
strInput = Input(LOF(1), #1)
'replace all of the cariage returns with tabs
strInput = Replace(strInput, vbCrLf, vbTab)
'split the string
myStr = Split(strInput, vbTab)
'find the number of records
iNumRecs = (UBound(myStr) + 1) / 4
'redim the array to the number of records
ReDim myrec(iNumRecs) As record
'loop to fill the array
For i = 0 To UBound(myStr) Step 4
myrec(i \ 4).number = myStr(i + 0)
myrec(i \ 4).desc = myStr(i + 1)
myrec(i \ 4).price = myStr(i + 2)
myrec(i \ 4).init = myStr(i + 3)
Next i
Close #1
End Sub
[Edited by Iain17 on 05-10-2000 at 06:07 PM]
Just tested the function to read the whole file on 100,000 records, and it takes a bloody long time, and while it is splitting you can't let the user know what is going on.
So i have updated the function. This time you read the whole file, but instead of splitting on tab cahracters, you split on carriage returns. This gives you the number of records.
Then you go through each of these records to fill in the array.
Sorry to keep posting, but once i get involved with a problem i can't help myself.;)Code:Option Explicit
Private Type record
number As String * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
Dim myrec() As record
Private Sub Command1_Click()
Dim strInput As String
Dim i As Long
Dim myStr() As String, myStr2() As String
Dim iNumRecs As Long
'open the file
Open "bob.txt" For Binary As #1
'read the file
strInput = Input(LOF(1), #1)
'split the string into records
myStr = Split(strInput, vbCrLf)
'clear that string
strInput = ""
'find the number of records
iNumRecs = (UBound(myStr))
'redim the array to the number of records
ReDim myrec(iNumRecs) As record
'loop to fill the array
For i = 0 To iNumRecs
'split the record into fields
myStr2 = Split(myStr(i), vbTab)
myrec(i).number = myStr2(0)
myrec(i).desc = myStr2(1)
myrec(i).price = myStr2(2)
myrec(i).init = myStr2(3)
Next i
Close #1
End Sub
Thanks ALL,
Iain, That last piece of code works - thank you very much for the time and effort - much appreciated - I would have been a long time creating same.
It works, but is extremely slow because of the amount of records. I am beginning to think that I should schedule a job to run overnight which would populate an SQL Database with these records, and my program could then query the database.
Your effort has not been in vain as you have thought me some very useful techniques. Thanks.
Bob
Though i agree with you that SQL is the way to go, this problem has intrigued me. I just can't stop myself from trying to make it run faster. And i have finally succedded, with quite a margin.
Just in case any one else is interested.
Forget the posts that read the file in one go. They are far to slow.
It is back to my first solution, where you read one line at a time. With only a very small modification it runs twice as quick as the other two solutions.
I really don't know what i was thinking. The guy told me there was 280,000 records. Why did i dimension the array to only 100, and increase it 100 at a time. God knows. It is the ReDim'ing of the array that kills the function, Because we have to preserve it as well.
So the very simple solution is to start with a very large array, if you know you will have 280,00 records then start with an array that size. Then if you do have to make it larger, add a lot more than 100 elements at a time.
The post i am talking about has been modified, to how it should coded to make it run faster.
Regards.
is this just for searching or are you going to write out to this file as well? There's a much less memory intensive way to pull this information if it's not going to be modifiable here.
If you just want to compare one of the Fields in each record to a value or expression, consider this approach:
Add a Picturebox, (as a progressbar) and a Command Button to a Form:
In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).Code:Private Sub Command1_Click()
Dim iFile As Integer
Dim iRec As Long
Dim sFile As String
Dim sChunk As String
Dim lChunk As Long
Dim lLOF As Long
Dim iPos As Long
Dim ilPos As Long
Dim vFields As Variant
Dim tTimer As Single
'Time the whole process
tTimer = Timer
'Read the Large File into a String as Fast as possible as this will
'Allow us to manipulate it the fastest.
Caption = "Reading File..."
iFile = FreeFile
Open "D:\LargeFile.txt" For Binary Access Read As iFile
lChunk = 64000 'Read the File in 64K Chunks
lLOF = LOF(iFile) 'Store the Length of the File
'Create a Buffer to store the File, this method used with the Mid()
'function is much faster than doing sFile = sFile & sChunk
sFile = Space(lLOF)
While Loc(iFile) < lLOF
If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
sChunk = Space(lChunk)
Get #iFile, , sChunk
Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
Wend
Close iFile
'Now Interogate the Records, comparing the 1st Field of each record
'Use a picturebox as the progress bar for the smallest overhead
Caption = "Processing Records..."
Picture1.ScaleWidth = lLOF
Do
ilPos = iPos + 1
'Each Record is seperated by a CFLF
iPos = InStr(iPos + 1, sFile, vbCrLf)
If iPos Then
iRec = iRec + 1
Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
'Get the Fields from the Record, (Tab delimited = Chr(9))
vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
'Check the Field
If vFields(0) = "Field1" Then
'If it matches do whatever with the record here.
End If
End If
Loop While iPos
'Report the time taken
MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
End Sub
Regards,
- Aaron.
If you are intending to use this a lot (and don't want to take the SQL path) then you could read it once into memory, sort it by the field you want to search it by and dump it back to file.
In future you can use a binary-search technique to track the file by reading about ... (hang on while I wotk this out) 18 reads of the file using binary access.
The algorithm for binary-search is below, use what ever sorting technique you like as you only have to do it once.
(Binary searching is fine for strings and integers)
But I admit, my first thought also was to use a binary read and the split function. If the records are already sorted however, then max 18 single record reads to find the record would be quicker.Code:
Dim min As Long
Dim max As Long
Dim middle As Long
Dim Found As Boolean
Dim Rec_Position As Long
' During the search the target's index will be
' between Min and Max: Min <= target index <= Max
min = 1
max = NumItems
Found = False
Do While min <= max
middle = (max + min) \ 2
If target = List(middle) Then ' We have found it!
Rec_Position = middle
Found = True
Exit Do
ElseIf target < List(middle) Then ' Search the left half.
max = middle - 1
Else ' Search the right half.
min = middle + 1
End If
Loop
If Found = True then
Debug.Print "Found at Position " & str(Rec_Position)
Else
Debug.Print "Not Found"
End if
[Edited by Paul282 on 05-12-2000 at 01:00 AM]
I am sorry but i am going to have to disagree with Aaron Young on some points.
While i freely admit that his approach is much quicker, that picture box of his really kills things.
Well on my Pentium 233 with 64MB i can average 19,754 records per second, giving a total of 15 seconds to process 296,320 records.Quote:
In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).
Where i was going wrong earlier, and thank you Aaron for making me realise this, is that the VB Split function is very slow. It is better to write your own string parsing functions.
To make Aarons code run at the speed i achived on my PC, all you need is two extra lines of code.
What is slowing Aaron's code down is updating the picture box everytime you loop through a record. To fix this all you need to do is make sure you do not run that line of code every time you go around the loop.
e.g. Where Aaron does this
place this code around it.Code:Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
This will cause the update of the picture box to happen once for every 1000 times the loop goes around. Much quicker.:DCode:If iRec Mod 1000 = 0 Then
Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
End If
I failed to note what exactly you disagreed with, it seems you agree my approach is much faster, altering the update period of the Picturebox progressbar is hardly a major change. If we were to "knit pick", then I could say making the update interval larger would increase the speed gain even further, plus, if you were to add an interval to the update process you should really first check to make sure that interval doesn't fall outside the complete file length, giving an incorrect reading, if this were the case then the code would look more like this:You could go to the extreme and forget the Picturebox all together, giving a process time of around 6 seconds for all 280,000 record on a machine of my spec, but the idea here was to give a faster solution to those suggested, and trimming downCode:Private Sub Command1_Click()
Dim iFile As Integer
Dim iRec As Long
Dim sFile As String
Dim sChunk As String
Dim lChunk As Long
Dim lLOF As Long
Dim iPos As Long
Dim ilPos As Long
Dim vFields As Variant
Dim tTimer As Single
Dim iInterval As Long
'Time the whole process
tTimer = Timer
'Read the Large File into a String as Fast as possible as this will
'Allow us to manipulate it the fastest.
Caption = "Reading File..."
iFile = FreeFile
Open "D:\LargeFile.txt" For Binary Access Read As iFile
lChunk = 64000 'Read the File in 64K Chunks
lLOF = LOF(iFile) 'Store the Length of the File
'Create a Buffer to store the File, this method used with the Mid()
'function is much faster than doing sFile = sFile & sChunk
sFile = Space(lLOF)
While Loc(iFile) < lLOF
If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
sChunk = Space(lChunk)
Get #iFile, , sChunk
Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
Wend
Close iFile
'Now Interogate the Records, comparing the 1st Field of each record
'Use a picturebox as the progress bar for the smallest overhead
Caption = "Processing Records..."
Picture1.ScaleWidth = lLOF
iInterval = lLOF / 1000
Do
ilPos = iPos + 1
'Each Record is seperated by a CFLF
iPos = InStr(iPos + 1, sFile, vbCrLf)
If iPos Then
iRec = iRec + 1
If iRec Mod iInterval = 0 Then
Picture1.Line (0, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
End If
'Get the Fields from the Record, (Tab delimited = Chr(9))
vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
'Check the Field
If vFields(0) = "Field1" Then
'If it matches do whatever with the record here.
End If
End If
Loop While iPos
Picture1.Line (0, 0)-(Picture1.ScaleWidth, Picture1.ScaleHeight), vbBlue, BF
'Report the time taken
MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
End Sub
to 28 seconds seemed, to me, more than sufficient.Quote:
.. a bloody long time ..
But that's if I'm picky.
:)