|
-
May 8th, 2000, 07:35 PM
#1
Thread Starter
Member
I am trying to query text files with 280,000 records. Each record has 4 tabbed fields, and I need to query on field 1. I don't know how to do this in VB. Help Appreciated.
Thanks in advance.
Bob
-
May 8th, 2000, 08:08 PM
#2
Hello Bob,
I can tell you what I would try first, and see how it profomed.
1) Create a UDT with the four fields
2) Use a Get Stament to read the file into an array of the UDT
3) Search the array with a for.. next.. loop
Hope this helps,
-
May 8th, 2000, 10:15 PM
#3
Thread Starter
Member
Thanks RvA,
...but how do I use a GET ststement to read the file into the array of User Defined Type.
I have created a simple example to try and get it working. It reads ...
Type record
number as string * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
...and...
Dim myrec As record
Open "c:\bob.txt" For Random As #1 Len = 90
position = 1
Get #1, position, myrec
x = myrec.desc
Debug.Print x
Close #1
....This works, but how do I expand to account for ARRAY.
-
May 9th, 2000, 12:05 AM
#4
Hmmmmmmmm,
I forgot you using a tab delimeted file, let me look into this and I will get back to you.
Best
-
May 9th, 2000, 03:07 AM
#5
Hello Bob,
Quick question. Would it be possible to turn this list into a comma delimited file instead of a tab delimited file? There are functions in VB to read a comma delimited file, but I haven’t been able to find one for a tab delimited file. If there it can't be done then you can either parse the file your self, or look for something else I guess.
Best
-
May 9th, 2000, 04:35 AM
#6
Hyperactive Member
You can open it in Excel, then save it as .csv. Then use INPUT to read it into an array in VB.
-
May 9th, 2000, 05:20 AM
#7
That's what I'm hoping he can do, the code to read it into an array then is really easy.
Thanks,
-
May 9th, 2000, 09:19 AM
#8
Addicted Member
What about reading the entire file into a string (or strings if larger than a single string)? You could then use the Split function based on vbTab to break up the string into an array, where every 4 elements would be one record. Just an idea (works fast too).
[Edited by SonGouki on 05-09-2000 at 10:20 PM]
Dan PM
Analyst Programmer
VB6 SP3 (also VB4 16-bit sometimes  )
-
May 9th, 2000, 08:20 PM
#9
Thread Starter
Member
Excel will only accept 65536 records, and I have 280,000 records. I don't understand how to read the file into a string/strings, and to Split based on VBTab. Can you give me some more info so that I can try that.
Thanks to all.
Bob
-
May 9th, 2000, 09:16 PM
#10
Fanatic Member
This might work for you. It will input 1 line at a time from "bob.txt" and place it into the "myrec" array.
Obviously this is going to take a bit of time. Espically as you are going to be loading 280,000 records. Best i can do in a hurry though.
Make sure the file does not have an empty line at the bottom. it seems to crash and i havn't got time to fix it. So make sure there is no carriage return after the last record.
Code:
Option Explicit
Private Type record
number As String * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
Dim myrec() As record
Private Sub Command1_Click()
Dim strInput As String
Dim lCount As Long
'start with 280,000 records
ReDim myrec(280000) As record
'open the file
Open "bob.txt" For Binary As #1
'the counter
lCount = 0
Do While Not EOF(1)
'input a line
Line Input #1, strInput
getRecords strInput, lCount
lCount = lCount + 1
Loop
'resize the array to the exact number of elements.
ReDim Preserve myrec(lCount - 1) As record
Close #1
End Sub
Private Sub getRecords(strData As String, lIndex As Long)
Dim strArray() As String
'if the index is greater than than the Ubound of the
'array we need to redimension the array
If lIndex > UBound(myrec) Then
ReDim Preserve myrec(UBound(myrec) + 100000) As record
End If
'split the input string
strArray = Split(strData, vbTab)
'assuming number is the first field, desc the second etc.
myrec(lIndex).number = strArray(0)
myrec(lIndex).desc = strArray(1)
myrec(lIndex).init = strArray(2)
myrec(lIndex).price = strArray(3)
End Sub
[Edited by Iain17 on 05-11-2000 at 06:24 PM]
Iain, thats with an i by the way!
-
May 9th, 2000, 10:12 PM
#11
Fanatic Member
I also tried what SonGouki suggested. Reading the whole file into a string and then spliting the string based on vbTab. If this worked it would indeed be quicker than my suggested way.
The problem with that is that you have to have a tab after the last field in each record. Otherwise you get a split that is incorrect, as usually there will be no tab after the last field, and just a carriage return. Though i am sure you could work around this problem
Iain, thats with an i by the way!
-
May 9th, 2000, 10:33 PM
#12
Fanatic Member
Iain, thats with an i by the way!
-
May 10th, 2000, 12:40 AM
#13
Fanatic Member
Opps!
Just tested the function to read the whole file on 100,000 records, and it takes a bloody long time, and while it is splitting you can't let the user know what is going on.
So i have updated the function. This time you read the whole file, but instead of splitting on tab cahracters, you split on carriage returns. This gives you the number of records.
Then you go through each of these records to fill in the array.
Code:
Option Explicit
Private Type record
number As String * 8
desc As String * 12
init As String * 4
price As String * 6
End Type
Dim myrec() As record
Private Sub Command1_Click()
Dim strInput As String
Dim i As Long
Dim myStr() As String, myStr2() As String
Dim iNumRecs As Long
'open the file
Open "bob.txt" For Binary As #1
'read the file
strInput = Input(LOF(1), #1)
'split the string into records
myStr = Split(strInput, vbCrLf)
'clear that string
strInput = ""
'find the number of records
iNumRecs = (UBound(myStr))
'redim the array to the number of records
ReDim myrec(iNumRecs) As record
'loop to fill the array
For i = 0 To iNumRecs
'split the record into fields
myStr2 = Split(myStr(i), vbTab)
myrec(i).number = myStr2(0)
myrec(i).desc = myStr2(1)
myrec(i).price = myStr2(2)
myrec(i).init = myStr2(3)
Next i
Close #1
End Sub
Sorry to keep posting, but once i get involved with a problem i can't help myself.
Iain, thats with an i by the way!
-
May 10th, 2000, 05:55 PM
#14
Thread Starter
Member
Thanks ALL,
Iain, That last piece of code works - thank you very much for the time and effort - much appreciated - I would have been a long time creating same.
It works, but is extremely slow because of the amount of records. I am beginning to think that I should schedule a job to run overnight which would populate an SQL Database with these records, and my program could then query the database.
Your effort has not been in vain as you have thought me some very useful techniques. Thanks.
Bob
-
May 11th, 2000, 12:21 AM
#15
Fanatic Member
Again
Though i agree with you that SQL is the way to go, this problem has intrigued me. I just can't stop myself from trying to make it run faster. And i have finally succedded, with quite a margin.
Just in case any one else is interested.
Forget the posts that read the file in one go. They are far to slow.
It is back to my first solution, where you read one line at a time. With only a very small modification it runs twice as quick as the other two solutions.
I really don't know what i was thinking. The guy told me there was 280,000 records. Why did i dimension the array to only 100, and increase it 100 at a time. God knows. It is the ReDim'ing of the array that kills the function, Because we have to preserve it as well.
So the very simple solution is to start with a very large array, if you know you will have 280,00 records then start with an array that size. Then if you do have to make it larger, add a lot more than 100 elements at a time.
The post i am talking about has been modified, to how it should coded to make it run faster.
Regards.
Iain, thats with an i by the way!
-
May 11th, 2000, 01:48 AM
#16
Lively Member
is this just for searching or are you going to write out to this file as well? There's a much less memory intensive way to pull this information if it's not going to be modifiable here.
-
May 11th, 2000, 09:11 AM
#17
If you just want to compare one of the Fields in each record to a value or expression, consider this approach:
Add a Picturebox, (as a progressbar) and a Command Button to a Form:
Code:
Private Sub Command1_Click()
Dim iFile As Integer
Dim iRec As Long
Dim sFile As String
Dim sChunk As String
Dim lChunk As Long
Dim lLOF As Long
Dim iPos As Long
Dim ilPos As Long
Dim vFields As Variant
Dim tTimer As Single
'Time the whole process
tTimer = Timer
'Read the Large File into a String as Fast as possible as this will
'Allow us to manipulate it the fastest.
Caption = "Reading File..."
iFile = FreeFile
Open "D:\LargeFile.txt" For Binary Access Read As iFile
lChunk = 64000 'Read the File in 64K Chunks
lLOF = LOF(iFile) 'Store the Length of the File
'Create a Buffer to store the File, this method used with the Mid()
'function is much faster than doing sFile = sFile & sChunk
sFile = Space(lLOF)
While Loc(iFile) < lLOF
If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
sChunk = Space(lChunk)
Get #iFile, , sChunk
Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
Wend
Close iFile
'Now Interogate the Records, comparing the 1st Field of each record
'Use a picturebox as the progress bar for the smallest overhead
Caption = "Processing Records..."
Picture1.ScaleWidth = lLOF
Do
ilPos = iPos + 1
'Each Record is seperated by a CFLF
iPos = InStr(iPos + 1, sFile, vbCrLf)
If iPos Then
iRec = iRec + 1
Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
'Get the Fields from the Record, (Tab delimited = Chr(9))
vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
'Check the Field
If vFields(0) = "Field1" Then
'If it matches do whatever with the record here.
End If
End If
Loop While iPos
'Report the time taken
MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
End Sub
In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).
Regards,
- Aaron.
-
May 11th, 2000, 10:44 AM
#18
Fanatic Member
If you are intending to use this a lot (and don't want to take the SQL path) then you could read it once into memory, sort it by the field you want to search it by and dump it back to file.
In future you can use a binary-search technique to track the file by reading about ... (hang on while I wotk this out) 18 reads of the file using binary access.
The algorithm for binary-search is below, use what ever sorting technique you like as you only have to do it once.
(Binary searching is fine for strings and integers)
Code:
Dim min As Long
Dim max As Long
Dim middle As Long
Dim Found As Boolean
Dim Rec_Position As Long
' During the search the target's index will be
' between Min and Max: Min <= target index <= Max
min = 1
max = NumItems
Found = False
Do While min <= max
middle = (max + min) \ 2
If target = List(middle) Then ' We have found it!
Rec_Position = middle
Found = True
Exit Do
ElseIf target < List(middle) Then ' Search the left half.
max = middle - 1
Else ' Search the right half.
min = middle + 1
End If
Loop
If Found = True then
Debug.Print "Found at Position " & str(Rec_Position)
Else
Debug.Print "Not Found"
End if
But I admit, my first thought also was to use a binary read and the split function. If the records are already sorted however, then max 18 single record reads to find the record would be quicker.
[Edited by Paul282 on 05-12-2000 at 01:00 AM]
Paul Dwyer 
Network Engineer
Aussie In Tokyo
Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)
-
May 15th, 2000, 09:55 PM
#19
Fanatic Member
Going up against a Guru!
I am sorry but i am going to have to disagree with Aaron Young on some points.
While i freely admit that his approach is much quicker, that picture box of his really kills things.
In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).
Well on my Pentium 233 with 64MB i can average 19,754 records per second, giving a total of 15 seconds to process 296,320 records.
Where i was going wrong earlier, and thank you Aaron for making me realise this, is that the VB Split function is very slow. It is better to write your own string parsing functions.
To make Aarons code run at the speed i achived on my PC, all you need is two extra lines of code.
What is slowing Aaron's code down is updating the picture box everytime you loop through a record. To fix this all you need to do is make sure you do not run that line of code every time you go around the loop.
e.g. Where Aaron does this
Code:
Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
place this code around it.
Code:
If iRec Mod 1000 = 0 Then
Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
End If
This will cause the update of the picture box to happen once for every 1000 times the loop goes around. Much quicker.
Iain, thats with an i by the way!
-
May 16th, 2000, 05:45 AM
#20
Picky, picky..
I failed to note what exactly you disagreed with, it seems you agree my approach is much faster, altering the update period of the Picturebox progressbar is hardly a major change. If we were to "knit pick", then I could say making the update interval larger would increase the speed gain even further, plus, if you were to add an interval to the update process you should really first check to make sure that interval doesn't fall outside the complete file length, giving an incorrect reading, if this were the case then the code would look more like this:
Code:
Private Sub Command1_Click()
Dim iFile As Integer
Dim iRec As Long
Dim sFile As String
Dim sChunk As String
Dim lChunk As Long
Dim lLOF As Long
Dim iPos As Long
Dim ilPos As Long
Dim vFields As Variant
Dim tTimer As Single
Dim iInterval As Long
'Time the whole process
tTimer = Timer
'Read the Large File into a String as Fast as possible as this will
'Allow us to manipulate it the fastest.
Caption = "Reading File..."
iFile = FreeFile
Open "D:\LargeFile.txt" For Binary Access Read As iFile
lChunk = 64000 'Read the File in 64K Chunks
lLOF = LOF(iFile) 'Store the Length of the File
'Create a Buffer to store the File, this method used with the Mid()
'function is much faster than doing sFile = sFile & sChunk
sFile = Space(lLOF)
While Loc(iFile) < lLOF
If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
sChunk = Space(lChunk)
Get #iFile, , sChunk
Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
Wend
Close iFile
'Now Interogate the Records, comparing the 1st Field of each record
'Use a picturebox as the progress bar for the smallest overhead
Caption = "Processing Records..."
Picture1.ScaleWidth = lLOF
iInterval = lLOF / 1000
Do
ilPos = iPos + 1
'Each Record is seperated by a CFLF
iPos = InStr(iPos + 1, sFile, vbCrLf)
If iPos Then
iRec = iRec + 1
If iRec Mod iInterval = 0 Then
Picture1.Line (0, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
End If
'Get the Fields from the Record, (Tab delimited = Chr(9))
vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
'Check the Field
If vFields(0) = "Field1" Then
'If it matches do whatever with the record here.
End If
End If
Loop While iPos
Picture1.Line (0, 0)-(Picture1.ScaleWidth, Picture1.ScaleHeight), vbBlue, BF
'Report the time taken
MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
End Sub
You could go to the extreme and forget the Picturebox all together, giving a process time of around 6 seconds for all 280,000 record on a machine of my spec, but the idea here was to give a faster solution to those suggested, and trimming down
to 28 seconds seemed, to me, more than sufficient.
But that's if I'm picky.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|