Results 1 to 20 of 20

Thread: URGENT***Reading V.Large Text Files****

  1. #1

    Thread Starter
    Member
    Join Date
    Mar 2000
    Posts
    57

    Cool

    I am trying to query text files with 280,000 records. Each record has 4 tabbed fields, and I need to query on field 1. I don't know how to do this in VB. Help Appreciated.
    Thanks in advance.
    Bob

  2. #2
    Guest

    Exclamation

    Hello Bob,

    I can tell you what I would try first, and see how it profomed.

    1) Create a UDT with the four fields
    2) Use a Get Stament to read the file into an array of the UDT
    3) Search the array with a for.. next.. loop

    Hope this helps,

  3. #3

    Thread Starter
    Member
    Join Date
    Mar 2000
    Posts
    57
    Thanks RvA,
    ...but how do I use a GET ststement to read the file into the array of User Defined Type.
    I have created a simple example to try and get it working. It reads ...

    Type record
    number as string * 8
    desc As String * 12
    init As String * 4
    price As String * 6
    End Type

    ...and...

    Dim myrec As record

    Open "c:\bob.txt" For Random As #1 Len = 90
    position = 1

    Get #1, position, myrec
    x = myrec.desc
    Debug.Print x

    Close #1

    ....This works, but how do I expand to account for ARRAY.

  4. #4
    Guest
    Hmmmmmmmm,

    I forgot you using a tab delimeted file, let me look into this and I will get back to you.

    Best

  5. #5
    Guest
    Hello Bob,

    Quick question. Would it be possible to turn this list into a comma delimited file instead of a tab delimited file? There are functions in VB to read a comma delimited file, but I haven’t been able to find one for a tab delimited file. If there it can't be done then you can either parse the file your self, or look for something else I guess.

    Best

  6. #6
    Hyperactive Member billwagnon's Avatar
    Join Date
    Jul 1999
    Location
    St. Louis, Missouri, Mississippi Valley
    Posts
    290
    You can open it in Excel, then save it as .csv. Then use INPUT to read it into an array in VB.

  7. #7
    Guest
    That's what I'm hoping he can do, the code to read it into an array then is really easy.

    Thanks,

  8. #8
    Addicted Member
    Join Date
    Aug 1999
    Location
    Ottawa,ON,Canada
    Posts
    217

    Lightbulb

    What about reading the entire file into a string (or strings if larger than a single string)? You could then use the Split function based on vbTab to break up the string into an array, where every 4 elements would be one record. Just an idea (works fast too).

    [Edited by SonGouki on 05-09-2000 at 10:20 PM]
    Dan PM
    Analyst Programmer

    VB6 SP3 (also VB4 16-bit sometimes )

  9. #9

    Thread Starter
    Member
    Join Date
    Mar 2000
    Posts
    57

    Angry

    Excel will only accept 65536 records, and I have 280,000 records. I don't understand how to read the file into a string/strings, and to Split based on VBTab. Can you give me some more info so that I can try that.
    Thanks to all.
    Bob

  10. #10
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    This might work for you. It will input 1 line at a time from "bob.txt" and place it into the "myrec" array.

    Obviously this is going to take a bit of time. Espically as you are going to be loading 280,000 records. Best i can do in a hurry though.

    Make sure the file does not have an empty line at the bottom. it seems to crash and i havn't got time to fix it. So make sure there is no carriage return after the last record.

    Code:
    Option Explicit
    
    Private Type record
      number As String * 8
      desc As String * 12
      init As String * 4
      price As String * 6
    End Type
    
    Dim myrec() As record
    
    Private Sub Command1_Click()
        Dim strInput As String
        Dim lCount As Long
        
        'start with 280,000 records
        ReDim myrec(280000) As record
        
        'open the file
        Open "bob.txt" For Binary As #1
        
        'the counter
        lCount = 0
        Do While Not EOF(1)
          'input a line
          Line Input #1, strInput
          getRecords strInput, lCount
          lCount = lCount + 1
        Loop
        
        'resize the array to the exact number of elements.
        ReDim Preserve myrec(lCount - 1) As record
        
        Close #1
    
    End Sub
    
    Private Sub getRecords(strData As String, lIndex As Long)
        Dim strArray() As String
        
        'if the index is greater than than the Ubound of the
        'array we need to redimension the array
        If lIndex > UBound(myrec) Then
          ReDim Preserve myrec(UBound(myrec) + 100000) As record
        End If
        
        'split the input string
        strArray = Split(strData, vbTab)
        
        'assuming number is the first field, desc the second etc.
        myrec(lIndex).number = strArray(0)
        myrec(lIndex).desc = strArray(1)
        myrec(lIndex).init = strArray(2)
        myrec(lIndex).price = strArray(3)
        
    End Sub
    [Edited by Iain17 on 05-11-2000 at 06:24 PM]
    Iain, thats with an i by the way!

  11. #11
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    I also tried what SonGouki suggested. Reading the whole file into a string and then spliting the string based on vbTab. If this worked it would indeed be quicker than my suggested way.

    The problem with that is that you have to have a tab after the last field in each record. Otherwise you get a split that is incorrect, as usually there will be no tab after the last field, and just a carriage return. Though i am sure you could work around this problem
    Iain, thats with an i by the way!

  12. #12
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658

    Unhappy I'm an idiot

    Just realised that you can use the replace function to replace all the carriage returns with tab characters. Easy.

    So this is how you read the file in one fell swoop.

    Code:
    Private Type record
      number As String * 8
      desc As String * 12
      init As String * 4
      price As String * 6
    End Type
    
    Dim myrec() As record
    
    Private Sub Command1_Click()
        Dim strInput As String
        Dim i As Long
        Dim myStr() As String
        Dim iNumRecs As Long
        
        'open the file
        Open "bob.txt" For Binary As #1
        
        'read the file
        strInput = Input(LOF(1), #1)
        
        'replace all of the cariage returns with tabs
        strInput = Replace(strInput, vbCrLf, vbTab)
        
        'split the string
        myStr = Split(strInput, vbTab)
        
        'find the number of records
        iNumRecs = (UBound(myStr) + 1) / 4
        
        'redim the array to the number of records
        ReDim myrec(iNumRecs) As record
        
        'loop to fill the array
        For i = 0 To UBound(myStr) Step 4
          myrec(i \ 4).number = myStr(i + 0)
          myrec(i \ 4).desc = myStr(i + 1)
          myrec(i \ 4).price = myStr(i + 2)
          myrec(i \ 4).init = myStr(i + 3)
        Next i
            
        Close #1
    End Sub
    Hope this helps.

    [Edited by Iain17 on 05-10-2000 at 06:07 PM]
    Iain, thats with an i by the way!

  13. #13
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658

    Opps!

    Just tested the function to read the whole file on 100,000 records, and it takes a bloody long time, and while it is splitting you can't let the user know what is going on.

    So i have updated the function. This time you read the whole file, but instead of splitting on tab cahracters, you split on carriage returns. This gives you the number of records.

    Then you go through each of these records to fill in the array.

    Code:
    Option Explicit
    
    Private Type record
      number As String * 8
      desc As String * 12
      init As String * 4
      price As String * 6
    End Type
    
    Dim myrec() As record
    
    Private Sub Command1_Click()
        Dim strInput As String
        Dim i As Long
        Dim myStr() As String, myStr2() As String
        Dim iNumRecs As Long
        
        'open the file
        Open "bob.txt" For Binary As #1
        
        'read the file
        strInput = Input(LOF(1), #1)
        
        'split the string into records
        myStr = Split(strInput, vbCrLf)
        
        'clear that string
        strInput = ""
        
        'find the number of records
        iNumRecs = (UBound(myStr))
        
        'redim the array to the number of records
        ReDim myrec(iNumRecs) As record
        
        'loop to fill the array
        For i = 0 To iNumRecs
          'split the record into fields
          myStr2 = Split(myStr(i), vbTab)
          myrec(i).number = myStr2(0)
          myrec(i).desc = myStr2(1)
          myrec(i).price = myStr2(2)
          myrec(i).init = myStr2(3)
        Next i
            
        Close #1
    End Sub
    Sorry to keep posting, but once i get involved with a problem i can't help myself.
    Iain, thats with an i by the way!

  14. #14

    Thread Starter
    Member
    Join Date
    Mar 2000
    Posts
    57

    Cool

    Thanks ALL,

    Iain, That last piece of code works - thank you very much for the time and effort - much appreciated - I would have been a long time creating same.
    It works, but is extremely slow because of the amount of records. I am beginning to think that I should schedule a job to run overnight which would populate an SQL Database with these records, and my program could then query the database.

    Your effort has not been in vain as you have thought me some very useful techniques. Thanks.

    Bob

  15. #15
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658

    Wink Again

    Though i agree with you that SQL is the way to go, this problem has intrigued me. I just can't stop myself from trying to make it run faster. And i have finally succedded, with quite a margin.

    Just in case any one else is interested.

    Forget the posts that read the file in one go. They are far to slow.

    It is back to my first solution, where you read one line at a time. With only a very small modification it runs twice as quick as the other two solutions.

    I really don't know what i was thinking. The guy told me there was 280,000 records. Why did i dimension the array to only 100, and increase it 100 at a time. God knows. It is the ReDim'ing of the array that kills the function, Because we have to preserve it as well.

    So the very simple solution is to start with a very large array, if you know you will have 280,00 records then start with an array that size. Then if you do have to make it larger, add a lot more than 100 elements at a time.

    The post i am talking about has been modified, to how it should coded to make it run faster.

    Regards.
    Iain, thats with an i by the way!

  16. #16
    Lively Member
    Join Date
    Jan 1999
    Location
    Rochester NY, USA
    Posts
    93
    is this just for searching or are you going to write out to this file as well? There's a much less memory intensive way to pull this information if it's not going to be modifiable here.

  17. #17
    Guru Aaron Young's Avatar
    Join Date
    Jun 1999
    Location
    Red Wing, MN, USA
    Posts
    2,177
    If you just want to compare one of the Fields in each record to a value or expression, consider this approach:

    Add a Picturebox, (as a progressbar) and a Command Button to a Form:
    Code:
    Private Sub Command1_Click()
        Dim iFile As Integer
        Dim iRec As Long
        Dim sFile As String
        Dim sChunk As String
        Dim lChunk As Long
        Dim lLOF As Long
        Dim iPos As Long
        Dim ilPos As Long
        Dim vFields As Variant
        Dim tTimer As Single
        
        'Time the whole process
        tTimer = Timer
        'Read the Large File into a String as Fast as possible as this will
        'Allow us to manipulate it the fastest.
        Caption = "Reading File..."
        iFile = FreeFile
        Open "D:\LargeFile.txt" For Binary Access Read As iFile
        lChunk = 64000      'Read the File in 64K Chunks
        lLOF = LOF(iFile)   'Store the Length of the File
        
        'Create a Buffer to store the File, this method used with the Mid()
        'function is much faster than doing sFile = sFile & sChunk
        sFile = Space(lLOF)
        While Loc(iFile) < lLOF
            If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
            sChunk = Space(lChunk)
            Get #iFile, , sChunk
            Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
        Wend
        Close iFile
        
        'Now Interogate the Records, comparing the 1st Field of each record
        'Use a picturebox as the progress bar for the smallest overhead
        Caption = "Processing Records..."
        Picture1.ScaleWidth = lLOF
        Do
            ilPos = iPos + 1
            'Each Record is seperated by a CFLF
            iPos = InStr(iPos + 1, sFile, vbCrLf)
            If iPos Then
                iRec = iRec + 1
                Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
                'Get the Fields from the Record, (Tab delimited = Chr(9))
                vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
                'Check the Field
                If vFields(0) = "Field1" Then
                    'If it matches do whatever with the record here.
                End If
            End If
        Loop While iPos
        'Report the time taken
        MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
    End Sub
    In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).

    Regards,

    - Aaron.

  18. #18
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    If you are intending to use this a lot (and don't want to take the SQL path) then you could read it once into memory, sort it by the field you want to search it by and dump it back to file.

    In future you can use a binary-search technique to track the file by reading about ... (hang on while I wotk this out) 18 reads of the file using binary access.

    The algorithm for binary-search is below, use what ever sorting technique you like as you only have to do it once.

    (Binary searching is fine for strings and integers)

    Code:
    Dim min As Long
    Dim max As Long
    Dim middle As Long
    Dim Found As Boolean 
    Dim Rec_Position As Long
    
        ' During the search the target's index will be
        ' between Min and Max: Min <= target index <= Max
        min = 1
        max = NumItems
        Found = False
    
      
        Do While min <= max
            
            middle = (max + min) \ 2
            If target = List(middle) Then     ' We have found it!
                Rec_Position = middle
                 Found = True         
                 Exit Do
            ElseIf target < List(middle) Then ' Search the left half.
                max = middle - 1
            Else                              ' Search the right half.
                min = middle + 1
            End If
        Loop
        
    If Found = True then
        Debug.Print "Found at Position " & str(Rec_Position)
    Else
        Debug.Print "Not Found"
    End if
    But I admit, my first thought also was to use a binary read and the split function. If the records are already sorted however, then max 18 single record reads to find the record would be quicker.

    [Edited by Paul282 on 05-12-2000 at 01:00 AM]
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  19. #19
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658

    Wink Going up against a Guru!

    I am sorry but i am going to have to disagree with Aaron Young on some points.

    While i freely admit that his approach is much quicker, that picture box of his really kills things.

    In tests on my PIII 600 with 128MB I averaged between 9,000 and 10,000 records per second, giving a total of 28 - 32 seconds to process ALL 280,000 records. (Each record was 29 Chars in length).
    Well on my Pentium 233 with 64MB i can average 19,754 records per second, giving a total of 15 seconds to process 296,320 records.


    Where i was going wrong earlier, and thank you Aaron for making me realise this, is that the VB Split function is very slow. It is better to write your own string parsing functions.

    To make Aarons code run at the speed i achived on my PC, all you need is two extra lines of code.

    What is slowing Aaron's code down is updating the picture box everytime you loop through a record. To fix this all you need to do is make sure you do not run that line of code every time you go around the loop.

    e.g. Where Aaron does this
    Code:
    Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
    place this code around it.
    Code:
    If iRec Mod 1000 = 0 Then
      Picture1.Line (ilPos, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
    End If
    This will cause the update of the picture box to happen once for every 1000 times the loop goes around. Much quicker.
    Iain, thats with an i by the way!

  20. #20
    Guru Aaron Young's Avatar
    Join Date
    Jun 1999
    Location
    Red Wing, MN, USA
    Posts
    2,177

    Picky, picky..

    I failed to note what exactly you disagreed with, it seems you agree my approach is much faster, altering the update period of the Picturebox progressbar is hardly a major change. If we were to "knit pick", then I could say making the update interval larger would increase the speed gain even further, plus, if you were to add an interval to the update process you should really first check to make sure that interval doesn't fall outside the complete file length, giving an incorrect reading, if this were the case then the code would look more like this:
    Code:
    Private Sub Command1_Click()
        Dim iFile As Integer
        Dim iRec As Long
        Dim sFile As String
        Dim sChunk As String
        Dim lChunk As Long
        Dim lLOF As Long
        Dim iPos As Long
        Dim ilPos As Long
        Dim vFields As Variant
        Dim tTimer As Single
        Dim iInterval As Long
        
        'Time the whole process
        tTimer = Timer
        'Read the Large File into a String as Fast as possible as this will
        'Allow us to manipulate it the fastest.
        Caption = "Reading File..."
        iFile = FreeFile
        Open "D:\LargeFile.txt" For Binary Access Read As iFile
        lChunk = 64000      'Read the File in 64K Chunks
        lLOF = LOF(iFile)   'Store the Length of the File
        
        'Create a Buffer to store the File, this method used with the Mid()
        'function is much faster than doing sFile = sFile & sChunk
        sFile = Space(lLOF)
        While Loc(iFile) < lLOF
            If lChunk + Loc(iFile) > lLOF Then lChunk = lLOF - Loc(iFile)
            sChunk = Space(lChunk)
            Get #iFile, , sChunk
            Mid(sFile, (Loc(iFile) - lChunk) + 1, lChunk) = sChunk
        Wend
        Close iFile
        
        'Now Interogate the Records, comparing the 1st Field of each record
        'Use a picturebox as the progress bar for the smallest overhead
        Caption = "Processing Records..."
        Picture1.ScaleWidth = lLOF
        iInterval = lLOF / 1000
        Do
            ilPos = iPos + 1
            'Each Record is seperated by a CFLF
            iPos = InStr(iPos + 1, sFile, vbCrLf)
            If iPos Then
                iRec = iRec + 1
                If iRec Mod iInterval = 0 Then
                    Picture1.Line (0, 0)-(iPos, Picture1.ScaleHeight), vbBlue, BF
                End If
                'Get the Fields from the Record, (Tab delimited = Chr(9))
                vFields = Split(Mid(sFile, ilPos, iPos - ilPos), Chr(9))
                'Check the Field
                If vFields(0) = "Field1" Then
                    'If it matches do whatever with the record here.
                End If
            End If
        Loop While iPos
        Picture1.Line (0, 0)-(Picture1.ScaleWidth, Picture1.ScaleHeight), vbBlue, BF
        'Report the time taken
        MsgBox iRec & " records processed in " & (Timer - tTimer) & " seconds."
    End Sub
    You could go to the extreme and forget the Picturebox all together, giving a process time of around 6 seconds for all 280,000 record on a machine of my spec, but the idea here was to give a faster solution to those suggested, and trimming down
    .. a bloody long time ..
    to 28 seconds seemed, to me, more than sufficient.

    But that's if I'm picky.


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width