Results 1 to 25 of 25

Thread: Counting lines in a text file

  1. #1

    Thread Starter
    Member
    Join Date
    Dec 1999
    Location
    Cleveland,OH
    Posts
    42

    Question

    I am loading data from a CSV into a dynamic array that I size before reading the data into it. First thing I do is open the file and count the lines.

    Open FileName For Input As #1
    Do Until EOF(1)
    Line Input #1, SomeVariable
    Linecount = linecount + 1
    Loop
    Close #1

    Then I ReDIM my array to the line count

    Redim MyArray(linecount,5)

    After that I reopen the file and load the array from the file. Although this works for now for a couple of hundred records I have some concerns when the data grows. This is for data import and is only temporary for loading into DB. So is there a way to count the lines without reading all of them? Then I can change the code to process 100 lines at a time until done. That way avoiding some huge array that chokes out of memory or something.

    Thanks

  2. #2
    Fanatic Member
    Join Date
    Jan 1999
    Location
    UK
    Posts
    554
    Your worrying over nothing as:

    You dim the variable as local in the procedure so every time a new line text is read in it overwrites the previous and all your keeping hold of the total count value which, if its an integer its only 2 bytes and if its a long then its four bytes

    Your Code:- documented
    Code:
    'open file
    Open FileName For Input As #1 
    'loop until end of file
    Do Until EOF(1) 
    'read first/next textline into Somevariable (the buffer)
    Line Input #1, SomeVariable 
    'increment counter
    Linecount = linecount + 1 
    Loop 
    'close file
    Close #1
    So everytime a new line of text is read in, it will overwrite the existing value (as long as your not storing it in an array)

    DocZaf


  3. #3

    Thread Starter
    Member
    Join Date
    Dec 1999
    Location
    Cleveland,OH
    Posts
    42

    Unhappy REDIM The Array

    I know about that. It’s just with test files of 10,000 + lines, it draws a little power and time to read the file. I just wanted to know if that was the most efficient method. It seams a little excessive and wasteful.

    Thanks for the reply

  4. #4
    Hyperactive Member
    Join Date
    May 2000
    Location
    Or
    Posts
    316
    I'm not sure if it would be faster, but you could alway avoid reading the lines, and simply use the split function. Here is what I mean:

    Code:

    Private Sub Command1_Click()
    Dim fso, sTemp(1)
    Dim newVar() As String
    Dim myVar As String

    Set fso = CreateObject("scripting.FilesystemObject")
    Set sTemp(0) = fso.GetFile("c:\windows\desktop\testfile.txt")
    Set sTemp(1) = sTemp(0).OpenAsTextStream(1, -2)
    myVar = sTemp(1).ReadAll
    newVar = Split(myVar, Chr(13) & Chr(10))
    For x = 0 To UBound(newVar)
    MsgBox newVar(x)
    Next
    End Sub

    However, with 10,000 records, I'm not sure how much faster this would be, but at least you could avoid all the file accesses.

    (Using VB6 SP3)

  5. #5
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    True, The filesystem object can be slow. The quickest way I know is this.

    Pseudo code---

    Open the file in binary mode
    GET the file to a byte array (can open several MB per second
    MYString = StrConv(ByteArray, VBUnicode) (pretty much instant)

    The Split() function will redim the array for you. then just loop from zero to the Ubound()
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  6. #6
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    I have done som eresearch into this a while ago, and avoid using the split command at all costs. It is extremley slow. It is better to write your own funcion.

    The way to go is to read the whole file in one go, and count the number of vbCrLf in the string that is returned.

    So do as paul says, or use the inputB function. Then use the instr function in a loop to count the number of new line characters.
    Iain, thats with an i by the way!

  7. #7
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    I was going to write that too, but, from what I could gather the line count was just for splitting the function anyway so it would be better just to split than to count and then split.

    The Split function makes a poor line counter but it's a good splitter.

    If your intention is just to count lines then loop the instr but if that's just to loop your own split function then go straight for the split()

    I'm a bit of a fan of just "going straight for the Split"

    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  8. #8
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    No spliter is a poor spliter. As i said earlier it is very slow. I think, this is just a guess that it uses ReDim preserve evertime it finds a new character to splt on. That is why it is very slow.

    Just like VB's InStrRev function. Why reverse the string then use InStr? Why just not start from the end of the string and loop backwards? Just a couple of examples of why you should write your own functions. Microsoft are crap at writing them.

    It is fairly easy to write your own spliting function, and if you get stuck we are more than willing ot help out. Basically though, you use the InStr() and Mid$() functions with a couple of pointers.
    Iain, thats with an i by the way!

  9. #9
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    Really?

    I found the splitter to be a poor line counter like I said but I'd like to see some code to beat the split function. The MS knowedge base has some code (the code?) for split and join as they weren't in VB5 and they don't seem to run any better, you got a good function I'd like to do some tests...

    Have you got anything you could post?

    Paul
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  10. #10
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    Paul,

    Here ya go. It is only really noticable on very large strings.

    Code:
    Function mySplit(strToSplt As String, strSplitOn As String) As String()
        Dim ip1 As Long, ip2 As Long
        Dim strArray() As String
        Dim iCount As Long
        
        
        ip1 = 1: ip2 = 1
        Do
          ip2 = InStr(ip1, strToSplt, strSplitOn)
          If ip2 = 0 Then
            ip2 = Len(strToSplt) + 1
          End If
          
          If iCount Mod 100 = 0 Then
            ReDim Preserve strArray(iCount + 100) As String
          End If
          strArray(iCount) = Mid$(strToSplt, ip1, ip2 - ip1)
          
          ip1 = ip2 + Len(strSplitOn)
          iCount = iCount + 1
          
        Loop Until ip2 >= Len(strToSplt)
        
        ReDim Preserve strArray(iCount - 1)
        mySplit = strArray
        
    End Function
    Iain, thats with an i by the way!

  11. #11
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    I have done some tests on the VB Split() command against the above code, and the results are quite impresive.

    Also when i upped the charcter even more, the VB Split() function ran out of memory, and mine did not.


    Code:
    String length = 4,669,056   'a bit excesive i know
    
    
    Number     |  VB Split()   |  My Split()
     Of        |    Time       |    Time
    Splits     |               |
    ---------------------------------------------
               |               |
    712080     |  52.89 s      |  2.44 s
               |               |
    ---------------------------------------------
               |               |
    256032     |  11.64 s      |  2.3 s
               |               |                 
    ---------------------------------------------
               |               |
    50400      |  3.04 s       |  1.22 s
               |               |                 
    ---------------------------------------------
    
    
    Results from a 700mhz with 128mb ram


    I have also writen a function that will match the InStrRev function in vb6.

    The way Microsoft suggest you write it is a complete load of crap. There technique ran out of memory on the string i used above. This way of doing it found a string that i had put at the beginning of the string used above in 1.3 seconds.

    Code:
    Function myInStrRev(strStringToSearch As String, strFind As String, _
                        Optional iStart As Long) As Long
    
        Dim ip1 As Long, ip2 As Long
    
        Dim iLenStringToSearch As Long
        
        'get the length of the string
        iLenStringToSearch = Len(strStringToSearch)
        
        'if the start is 0 then set the start to the length
        'og the string
        If iStart = 0 Then
          iStart = iLenStringToSearch
        End If
        
        ip1 = 1
        Do
          ip2 = InStr(ip1, strStringToSearch, strFind)
          
          If (ip2 > 0) And (ip2 < iStart) Then
            'if ip2 is not zero and it is less than the
            'place to start searching then set the function
            'to return that position
            myInStrRev = ip2
          ElseIf ip2 = 0 Then
            ip2 = iLenStringToSearch
          End If
          
          ip1 = ip2 + 1
        
        Loop Until ip1 >= iStart
        
    End Function

    [Edited by Iain17 on 06-23-2000 at 06:41 AM]
    Iain, thats with an i by the way!

  12. #12
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    Impressive

    Is the "Number of splits" the number of time the split function was called or the number of pieces the text was split into?
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  13. #13
    Fanatic Member
    Join Date
    Mar 2000
    Location
    That posh bit of England known as Buckinghamshire
    Posts
    658
    The number of pieces the text was split into.
    Iain, thats with an i by the way!

  14. #14
    PowerPoster Chris's Avatar
    Join Date
    Jan 1999
    Location
    K-PAX
    Posts
    3,238

    Lightbulb

    May be you need to download this sample project ro understant the API function that use to extend the TextBox control.

    Sample Project

  15. #15
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    Originally posted by Iain17
    The number of pieces the text was split into.
    hmmmm...

    I've grabbed a copy of this code, I want to do some tests too. I always found split to be fast but you seem to have an order of magnitude on it.

    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  16. #16
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840

    Thumbs up

    I like it!

    I think I'll add an enum though to hint at the array size.

    ie
    Code:
    Public Enum ArraySize
        vbSmallArray = 50
        vbMedArray = 100
        vbLargeArray = 500
    End Enum
    
    Function myInStrRev(strStringToSearch As String, strFind As String, Size as ArraySize, Optional iStart As Long) As Long
    Then use the enum in the Mod code:
    "If iCount Mod Size = 0 Then"

    Haven't really tested it to get the sizes and the name is not intuitive enough to be able to pick without knowing the function code, but you see where I'm going...

    Nice piece of code
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  17. #17
    Frenzied Member
    Join Date
    Jan 2000
    Location
    Bellevue, WA, USA
    Posts
    1,357

    I must be missing something, but...

    Why can't you just write to the array as you read the file instead of making 2 passes through it?

    Use Redim Preserve inside the loop:
    Code:
    Open FileName For Input As #1 
    Do Until EOF(1) 
        Linecount = Linecount + 1
        Redim Preserve MyArray(linecount,5)
        Line Input #1, MyArray(linecount,5)
    Loop 
    Close #1
    Isn't this a good, easy way to do it? What am I missing?

    ~seaweed

  18. #18

    Thread Starter
    Member
    Join Date
    Dec 1999
    Location
    Cleveland,OH
    Posts
    42

    Talking

    Why not Redim in the loop? Or just drop it in the array during the loop. Well that would open another whole topic. I really appreciate all the info and I have a few things to work on now. The over all picture is I am reworking output data from another program that no one knows anything about. Some guy wrote and no one has the source and he is gone. Anyway the data gets dropped over to a DB after the data is sorted and merged with another file. I had to make it viewable in a grid if someone wanted to verify the data before accepting the records. As for the Redim Preserve I believe that only works for the last dimension. Redim Preserve MyArray(5,linecount).

    Thanks Again Everyone and Iain17

  19. #19
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840

    Re: I must be missing something, but...

    Seaweed,

    The problem with that is that:

    that you're rediming every line every line, which can be very slow if you have to do it a lot.

    get binary is very quick to load files too
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  20. #20
    Lively Member Nigorr's Avatar
    Join Date
    Apr 2002
    Location
    Brisbane, Australia
    Posts
    106

    OK... So atm I'm using VBA

    I am trying to adapt this code to work in VBA the problem is VBA does not like the following:
    Function mySplit(strToSplt As String, strSplitOn As String) As String().

    So how do I get a split function to work in VBA??? Office 97

  21. #21
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    I'm not that familiar with Office VBA but in ASP you'd have to drop all the types as everything is a varient.

    So try getting rid of all the "as string" and "as long" stuff
    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  22. #22
    Retired VBF Adm1nistrator plenderj's Avatar
    Join Date
    Jan 2001
    Location
    Dublin, Ireland
    Posts
    10,359
    I've not bothered reading the discussion, so forgive me if this has already been suggested, but why not use this approach :

    VB Code:
    1. Private Function numLinesInFile(ByVal strFile As String) As Long
    2.     Open strFile For Binary As #1
    3.         Dim strBuff As String: strBuff = Space(LOF(1))
    4.         Get #1, , strBuff
    5.         numLinesInFile = UBound(Split(strBuff, vbCrLf)) + 1
    6.     Close #1
    7. End Function

    Available from my "Various Tutorials" > "File I/O Functions" page
    Microsoft MVP : Visual Developer - Visual Basic [2004-2005]

  23. #23
    Addicted Member Guru's Avatar
    Join Date
    May 2000
    Location
    sulking in the cupboard under the stairs
    Posts
    237

    Angry

    Originally posted by plenderj
    I've not bothered reading the discussion,
    Perhaps you ought to read it all!
    ****!
    The thread is nearly 2 years old.
    Another light-hearted post from Guru

  24. #24
    Retired VBF Adm1nistrator plenderj's Avatar
    Join Date
    Jan 2001
    Location
    Dublin, Ireland
    Posts
    10,359
    ah.
    well Nigorr and paul282 bumped it by posting here so thats why I replied
    Microsoft MVP : Visual Developer - Visual Basic [2004-2005]

  25. #25
    Frenzied Member macai's Avatar
    Join Date
    Jul 2001
    Location
    Napanoch NY
    Posts
    1,228
    VB Code:
    1. Public Function txtLines(TextFile As String) As Long
    2.  Dim SplitLines() As String
    3.  Open TextFile For Input As #1
    4.  SplitLines = Split(Input(LOF(1), 1), vbNewLine)
    5.  txtLines = UBound(SplitLines) + 1
    6. End Function
    This help?
    Luke

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width