Does anybody know how to count the number of lines in a text file?
Does anybody know how to count the number of lines in a text file?
You can read all the lines and count them.
Code:
Dim iFile As Integer
Dim OneLine As String
Dim NbrLine As Long
NbrLine = 0
'get a free file number before opening
iFile = FreeFile
Open "MyFileNameAndPath" For Input As #iFile
While Not EOF(iFile)
Line Input #iFile, OneLine
NbrLine = NbrLine + 1
Wend
Close #iFile
MsgBox "The file contains " & NbrLine & " line(s)"
This one should be faster, and it also works with non-text files.
Code:
Dim ff As Integer, Buffer As String, X As Long, Lines As Long
ff = FreeFile
Open Yourfile For Binary As #ff
'size the buffer to the file first, otherwise Get reads nothing
Buffer = Space$(LOF(ff))
Get #ff, , Buffer
Close #ff
X = InStr(Buffer, vbCrLf)
Do While X
X = InStr(X + 1, Buffer, vbCrLf)
Lines = Lines + 1
Loop
Ok, you asked for it, here is the quickest way to count the number of lines in a text file.
Code:
Option Explicit
Function lineCount(myInFile As String) As Long
Dim lFileSize As Long, lChunk As Long
Dim bFile() As Byte
Dim lSize As Long
Dim strText As String
'the size of the chunk to read in. You can experiment
'with this to see what works fastest.
lSize = CLng(1024) * 10
'size the array to the chunk size
ReDim bFile(lSize - 1) As Byte
Open myInFile For Binary As #1
'get the file size
lFileSize = LOF(1)
'set the chunk number to 1
lChunk = 1
Do While (lSize * lChunk) < lFileSize
'get the data from the in file
Get #1, , bFile
strText = StrConv(bFile, vbUnicode)
'get the line count for this chunk
lineCount = lineCount + searchText(strText)
'increment the chunk count
lChunk = lChunk + 1
Loop
'redim the array to the remaining size
ReDim bFile((lFileSize - (lSize * (lChunk - 1))) - 1) As Byte
'get the remaining data
Get #1, , bFile
strText = StrConv(bFile, vbUnicode)
'get line count for this chunk
lineCount = lineCount + searchText(strText)
'close the file
Close #1
lineCount = lineCount + 1
End Function
Private Function searchText(strText As String) As Long
Static blPossible As Boolean
Dim lp1 As Long
'if we have a possible line count
If blPossible = True Then
'if the first character is Chr(10) then we have a new line
If Left$(strText, 1) = Chr(10) Then
searchText = searchText + 1
End If
End If
blPossible = False
'loop through counting vbCrLf's
lp1 = 1
Do
lp1 = InStr(lp1, strText, vbCrLf)
If lp1 <> 0 Then
searchText = searchText + 1
lp1 = lp1 + 2
End If
Loop Until lp1 = 0
'if the last character is a chr(13) then we may have a
'new line, so we mark it as possible
If Right$(strText, 1) = Chr(13) Then
blPossible = True
End If
End Function
Or, (don't you just love all the alternatives), you can use the ever-popular API.
Code:
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Private Const EM_GETLINECOUNT = &HBA
Private Sub Command1_Click()
Dim iLines As Integer
iLines = SendMessage(Text1.hwnd, EM_GETLINECOUNT, 0, 0)
MsgBox ("There are " & iLines & " lines in Text1")
End Sub
Or don't you just love people who post an alternative that doesn't do what was asked? That counts the lines in a text box, not a file.
Sorry. That is what I get for trying to help.
Actually, if you have a really big file, then loading it into a hidden text box and counting the lines with the API will be much faster than counting them in a loop.
Just my $0.02
Again, I will have to be the one to beg to differ. Until
someone shows me a better way to count the lines in a text
file, I still say that my code is the fastest. Apart from
reading the file a line at a time, it is also the easiest
on memory. You try loading a 10 MB file into a text box,
and see how long it takes you. Especially on my very slow
computer at work. (233 MHz.)
Just to make sure, I ran some tests. I used a 10 MB file
that contained 100,000 lines of text.
- My code took 1.6 seconds to count all of the lines.
- The Line Input method took 5.5 seconds.
- Loading the file into a text box then using the API took 24
seconds, and it got the count wrong. (returned 1)
- The RichTextBox's LoadFile method then using the API took
5 minutes, and got the count wrong. (returned 5 times as
many lines as there were)
Iain, what makes you think your algorithm is the fastest?
I haven't tested mine or anyone else's, but looking at your code I find:
1. You load the file in 10240-byte chunks
2. You convert a byte array into a string instead of getting a string directly
3. You call searchText (OK, that doesn't make much difference, except the argument is ByRef)
4. You check the first and last byte to handle the join between chunks (OK, that's not much)
5. You use InStr to search (actually I'm using the same, but I think that's the real bad guy slowing our code)
But I think 1 and 2 make your code slower than mine. You should check it out.
Kedaman,
The point of programming is that your algorithm will work on any machine,
not just one with 1 GB of memory. I would not have posted a reply if I was
not reasonably sure that the code was as quick as I could get it. I have
experimented with the chunk size, and that is the size that I have found works
the fastest.
I take your point about reading the file into a byte array, and then using
StrConv. I will test reading it straight into a string later on today.
My major complaint with your code is that it will not work on very large files.
Even if it would, it would be slower than reading the file in chunks. The
trick you see is to balance the processing power against the reading
of the data from the file.
To extend this idea. If you do manage to read in a 50mb file into memory.
It will not all be in memory. Some of it will be paged out into virtual memory
(the hard drive again) so you will just have to read it from the harddrive
twice in the end.
But thanks for the suggestion about using a string to start with. I will modify
my code to try that.
Quote:
My major complaint with your code is that it will not work on very large files.
Even if it would, it would be slower than reading the file in chunks. The
trick you see is to balance the processing power against the reading
of the data from the file.
You're right: IF the file size exceeded free physical memory and enough of it were swapped to the hard disk, it would be slower. But I think a practical text file rarely exceeds, for instance, 5 MB, which most computers can handle. This argument could get very complex, so let's just say the situation chooses the method. (Nobody wins the game :()
Personally, I would blame InStr more for the slowness.
I'm not sure, but if you compare directly against the byte array, you might get pretty fast code. :D
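For anyone who wants to try the byte-array idea, here is a rough, untimed sketch of what it might look like. The names CountLineFeeds and CHUNK_SIZE are made up for illustration and do not come from anyone's code above. It counts Chr(10) bytes rather than vbCrLf pairs, which is an assumption on my part: it gives the same answer for normal Windows text files and also works when lines end in a bare linefeed.

```vb
Option Explicit

' Hypothetical helper (not from the thread): counts line-feed bytes
' (value 10) by scanning the byte array directly, with no StrConv or InStr.
Public Function CountLineFeeds(ByVal sPath As String) As Long
    Const CHUNK_SIZE As Long = 10240   ' assumed 10 KB chunk; tune to taste
    Dim ff As Integer
    Dim bBuf() As Byte
    Dim lRemaining As Long
    Dim lThis As Long
    Dim i As Long

    ff = FreeFile
    Open sPath For Binary Access Read As #ff
    lRemaining = LOF(ff)

    Do While lRemaining > 0
        ' the last chunk may be shorter than CHUNK_SIZE
        If lRemaining < CHUNK_SIZE Then lThis = lRemaining Else lThis = CHUNK_SIZE
        ReDim bBuf(0 To lThis - 1)
        Get #ff, , bBuf
        For i = 0 To lThis - 1
            If bBuf(i) = 10 Then CountLineFeeds = CountLineFeeds + 1
        Next i
        lRemaining = lRemaining - lThis
    Loop

    Close #ff
End Function
```

Because the loop only runs while bytes remain, an empty file returns 0 and never reaches a ReDim with a negative bound. Whether this beats InStr would have to be measured; it is likely to depend on compiling to native code.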
Kedaman,
You are missing the point my friend. Let's go back to the way that processors
and hard-drives interact.
When you ask for some information from a file, or the whole file itself, the
processor tells the hard-drive what file, or which bit of a file it wants. The
hard-drive then goes off and gets the data, returning any data that it finds as
quickly as it can.
See that, as quickly as it can, which is nowhere near as quick as the processor
can go. So when you ask the hard-drive for a 5mb file, the processor is basically
sat there twiddling its thumbs waiting for information from the hard-drive.
This is where balancing the reading of data from the hard-drive, and searching
through the data returned becomes important. So we ask for a smaller chunk.
Obviously the processor has some work to do, by asking for the chunk, and dealing
with where to put that data, so if we ask for the right sized chunk, the processor is
not idling as much comparatively. Now we have a chunk back, we go into a tight
InStr loop, that pretty much uses the processor to 100% of its capability.
As you can see, it is just resource management. By reading the file in chunks, we
are trying to use as much of the processor's power as possible, so we get little waste
in idle time.
I am also going to have to beg to differ about the InStr command. It is in fact an
extremely fast function.
Also, going back to your post before the last one. Passing an argument by reference
is quicker than passing it by value. I would have thought you would have known that.
I could of course place the looping code directly in the function itself, but for ease
of reading, and code re-use, I created a separate function.
Anyway, back to the Get command and byte arrays. So far it seems that you are right.
Creating a string the size of the chunk we want, then reading directly into a string is
slightly faster. So far it seems to be faster by 1/10th of a second. I will try it out on
my computer at home which seems more reliable for speed tests.
By the way, I tried reading the entire file in on text files of varying sizes between
1 and 5mb. And as fits the explanation above, chunk reading is still quicker.
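A minimal sketch of the "read straight into a string" variant described above, for anyone following along. This is not Iain's actual modified code; CountLinesByChunk, sBuf and CHUNK_SIZE are assumed names, and it assumes a single-byte (ANSI) text file. It returns the number of vbCrLf breaks found, so add 1 if you want the count of lines in a file whose last line has no terminator.

```vb
Option Explicit

' Sketch only: counts vbCrLf pairs, reading the file in string chunks.
Public Function CountLinesByChunk(ByVal sPath As String) As Long
    Const CHUNK_SIZE As Long = 10240   ' assumed chunk size
    Dim ff As Integer
    Dim sBuf As String
    Dim lRemaining As Long
    Dim lPos As Long
    Dim bPendingCr As Boolean          ' previous chunk ended in Chr(13)

    ff = FreeFile
    Open sPath For Binary Access Read As #ff
    lRemaining = LOF(ff)

    Do While lRemaining > 0
        ' size the string first; Get fills exactly Len(sBuf) bytes
        If lRemaining < CHUNK_SIZE Then
            sBuf = Space$(lRemaining)
        Else
            sBuf = Space$(CHUNK_SIZE)
        End If
        Get #ff, , sBuf
        lRemaining = lRemaining - Len(sBuf)

        ' a vbCrLf that was split across two chunks
        If bPendingCr And Left$(sBuf, 1) = vbLf Then
            CountLinesByChunk = CountLinesByChunk + 1
        End If

        lPos = InStr(1, sBuf, vbCrLf, vbBinaryCompare)
        Do While lPos > 0
            CountLinesByChunk = CountLinesByChunk + 1
            lPos = InStr(lPos + 2, sBuf, vbCrLf, vbBinaryCompare)
        Loop

        bPendingCr = (Right$(sBuf, 1) = vbCr)
    Loop

    Close #ff
End Function
```

The bPendingCr flag does the same job as the Static blPossible in searchText, but as a local variable, so a second call to the function starts clean.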
How do you (simple, without API and stuff like that),
count the lines in a (Rich) Text Box?
Pentax
Iain, OK, you seem to be one step ahead of me. The processor, which usually has nothing to do, could do something while waiting for information, but I'm almost sure this won't work in VB since it doesn't support multithreading. It will actually do everything step by step: read the chunk, then search it while the hard disk does nothing, then read again. So I'm suggesting it would actually be slower to read in chunks. Of course this is just a guess.
InStr is slower than another method whose name I can't remember. I'll have to get back to my own computer and check, but there is a much faster method.
I'm not sure, you might be right about the ByVal thing, but from what I've heard ByVal is faster than ByRef.
Now simple line counting, this is for Pentax, using the slow InStr: for strings or string properties like RichText.Text, just replace Buffer with that.
Code:
X=instr(Buffer,vbcrlf)
Do While X
X=instr(X+1,Buffer,vbcrlf)
Lines=Lines+1
Loop
It never ceases to amaze me Iain. You're very proud of that code and your split func. Looks like you try to give them out as much as you can ;)
Ok then, the lesson for today is on ByRef vs. ByVal ;)
They both have their advantages and disadvantages. I would encourage people to pass
variables by value whenever possible, but in some cases, you can get away with a ByRef
if you know what you are doing.
ByRef
Passes a memory pointer to the sub / function. This is an address that points to where the
actual variable is stored in memory, so any modifications to the variable are mirrored in the
calling sub.
ByVal
Passes a copy of the variable to the sub / function. Thus the actual value of the variable is
placed on the stack. Any modifications to the variable are local to this sub and are not
mirrored in the calling sub.
Above you can see the differences between ByRef and ByVal, though I am sure you knew
that anyway Kedaman. So why is it quicker to pass by reference? Well, let us take an
example from the above function. Let's say we want to pass a string variable that consists of
10240 characters. This means the string takes up 10240 bytes, each character being one byte.
If we pass this variable ByVal, we need to place 10240 bytes on the stack. Alternatively, we
could pass it ByRef and place a 32 bit memory pointer on the stack (well I think it’s 32 bit).
Now it is a simple case of maths to work out which will be quicker.
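To make the difference concrete, here is a small sketch with made-up routine names (AppendByRef, AppendByVal and DemoByRefByVal are not from the thread). One hedge on the numbers above: VB6 stores strings internally at two bytes per character, so copying a 10240-character string means copying roughly 20 KB rather than 10 KB, which only strengthens the point. As far as I know, the cost of ByVal for strings is in making that copy rather than in stack space.

```vb
Option Explicit

' Hypothetical demo routines; the names are not from the thread.
Private Sub AppendByRef(ByRef s As String)
    s = s & "!"          ' changes the caller's variable
End Sub

Private Sub AppendByVal(ByVal s As String)
    s = s & "!"          ' changes only the local copy
End Sub

Public Sub DemoByRefByVal()
    Dim sText As String
    sText = "abc"

    AppendByVal sText
    Debug.Print sText    ' prints abc

    AppendByRef sText
    Debug.Print sText    ' prints abc!
End Sub
```

Run DemoByRefByVal from the Immediate window to see both lines printed.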
You're correct about ByVal, it's slower than ByRef. Thanks for enlightening me.
Now, testing out your code, I got "subscript out of range" here:
ReDim bFile((lFileSize - (lSize * (lChunk - 1))) - 1) As Byte
This was for autoexec.bat, a small file.
For explorer.exe, a 230 KB file, my algo was faster: 3 ms, while yours took 20 ms.
For a smaller 32 KB file, a .dat file containing Excel stuff, I got 0 ms for yours and 130 ms for mine, but I got 0 lines with yours and 10 with mine??!?!?
Ahhh, point taken. The only time I have been able to reproduce that error is if the file is actually empty.
Anyway, ever since you suggested the string chunks, I have been using these, and I don't get an error on a 0-byte file with that method.
I am not sure if the strings are much quicker. It seems to vary which one is quicker, but it certainly makes sense that strings would be.
I haven't been able to reproduce a wrong count yet, but thanks for letting me know that there may be a problem.
Code:
Function Linecount(text as string) as long
Dim x as long
If len(text) then Linecount=1
X=instr(text,vbcrlf)
Do While X
X=instr(X+1,text,vbcrlf)
Linecount=Linecount+1
Loop
End Function
Code:
'USAGE
Msgbox Linecount(text1)
[Edited by kedaman on 10-02-2000 at 04:51 AM]
How about this nifty little Function I wrote?
(ATTENTION: ONLY VB 6 (and higher ;) )
Code:
Private Function LineCount(ByRef FileName As String) As Long
Dim FF As Integer
Dim TStr As String * 10240
Dim lCnt As Long
' Get a free File-Handle
FF = FreeFile
' Open the file
Open FileName For Binary As FF Len = Len(TStr)
' Read whole file
While Not EOF(FF)
Get FF, , TStr ' Read a 10k Chunk
lCnt = lCnt + UBound(Split(TStr, vbCrLf)) ' Split in an array of strings,
' where delimiter = vbCrLf,
' then get the UBound of it...
Wend
' Return the value
LineCount = lCnt
End Function
To test, try this in the Form_Load or so:
Code:
Private Sub Form_Load()
Dim T As Long
Dim NuLines As Long
Dim StrtTime As Single
Const MAXLINES = 1000000
' Make sure we get a RANDOM value every time we run it..
Randomize Timer
' Open the file and write "NuLines" number of lines in the file...
Open "C:\Test.txt" For Output As #1
NuLines = CLng(Rnd * MAXLINES)
For T = 1 To NuLines
Print #1, "This is a line"
Next T
' Return how many lines actually are in the file
Debug.Print "Lines written : " & NuLines
Close #1
' Start the "StopWatch"
StrtTime = Timer
' Print the result
Debug.Print "Counted lines : " & LineCount("C:\Test.txt")
' And the time
Debug.Print "Time to count : " & Timer - StrtTime
End Sub
Let me know?
At least it's original (I think...)
[Edited by RobIII on 10-02-2000 at 07:45 AM]
Kedaman:
Will it count not only where the user puts a newline, but also where the textbox breaks the lines?
'Cause that's what I need, some function that checks how many lines there are in a textbox, which has MultiLine set to true.
Pentax
Here is the quickest way I could find that counts the number of lines (separated by vbCrLf, vbCr, or vbLf).
Code:
'Code tested and verified using VB6
Dim InPath As String
Dim tmpLoad As String
Dim FF As Integer
Dim Lines() As String
Private Sub Command1_Click()
InPath = "C:\webpage\zonewriter\amber.zone"
FF = FreeFile
Open InPath For Input As #FF
tmpLoad = Input$(LOF(FF), FF)
Close #FF
If InStr(1, tmpLoad, vbLf) < 1 Then
Lines = Split(tmpLoad, vbCr)
Else
Lines = Split(tmpLoad, vbLf)
End If
MsgBox UBound(Lines) + 1
End Sub
The file I have there is 178 KB with quite a few new lines in it. When I run this code, it produces the number 6698, which is correct (at least by my text editor's reckoning). It doesn't account for text boxes or any such fun though. I have to add one because the array is a 0-based array.
This takes ~2 seconds to give you the number of lines in a text file (not a box) and it doesn't require the API ;)
I may be late in posting this, but I hope it helps you out.
I did a test with my code ... I loaded in a file that had 100000 lines in it to see how fast it would go and it took about 1.3 seconds to perform.
Whichever way you prefer. When you get rid of the text file, you're working with memory structures and not kludgy controls, so it's going to be much faster.
Pentax:
That would require a lot more brainwork, since the multiline break points aren't stored anywhere but in a private variable inside the textbox control. Maybe there's an API for it, maybe not.
There is a way, but it's damn slow: you use the TextWidth method of a picture box or form set to the same font as the textbox, test each line's width, and split the lines if they exceed the textbox width. I'm not sure, but it could be fast too, as long as you don't have all the text on one line in a 10-pixel textbox. Worth a try?
Hmmm.
It seems worth a try.
Else I thought of saving the text of a textbox into a temporary file, count the lines with one of the methods shown here (There are a few I can choose from...) and then delete the temporary file when the program ends, but it's a h*ll of a way to run a railroad.
Could you post a code-example of how you would do it?
The reason I need it is for my text editor. I want to have a statusbar at the bottom, telling the user how many signs, lines etc.
Thanks,
Pentax
Quote:
I thought of saving the text of a textbox into a temporary file, count the lines with one of the methods shown here (There are a few I can choose from...) and then delete the temporary file when the program ends, but it's a h*ll of a way to run a railroad.
Hey hey hey, you don't need all that just to count the lines. InStr is good for counting lines, but I thought you wanted to count the wrapped ones too?
I'm not sure, but do you really need the status bar to show the number of wrapped lines plus linefeeds in the text, instead of just linefeeds?
Well, if it's not too much of a trouble, that's what I want.
There's a program called EditPad (http://www.editpadpro.net/editpadclassic.html), and in the statusbar it shows all lines, not only linefeeds but also wrapped.
And yes, I want the wrapped ones too.
I thought that the methods shown here before would count them, but maybe not.
Is there anyone who has a good idea, please help me!
Pentax
When I was doing my test, I noticed that the API to count
the number of lines in a text box often returned the wrong
results. This is because it counts the number of lines in
the text box (including wrapped ones), and not the number
of carriage return / line feed pairs.
Problem? No Problem ;)
Nice discovery Iain :) Pentax, you got the answer, api!
Is it this one you mean?
Quote:
Originally posted by jbart
Or, (don't you just love all the alternatives), you can use the ever-popular API.
Code:
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Private Const EM_GETLINECOUNT = &HBA
Private Sub Command1_Click()
Dim iLines As Integer
iLines = SendMessage(Text1.hwnd, EM_GETLINECOUNT, 0, 0)
MsgBox ("There are " & iLines & " lines in Text1")
End Sub
I must confess that I have never used the API, so if anyone could give me a brief lesson in using it, I'd be most grateful.
It seems very confusing, with a lot of different variables and so on. Are all those really necessary?
Thanks in advance,
Pentax
Code:
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Private Const EM_GETLINECOUNT = &HBA
This part is called the declarations. You put it where you declare all your module-scope variables, at the top of each file.
Code:
SendMessage(Text1.hwnd, EM_GETLINECOUNT, 0, 0)
This is the API call itself. It returns the value produced by the function in user32.dll, so you only need to pass the correct parameters. You use API calls like normal functions and subs :)
Place a command button on the form and paste the code; it should work with just that.
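For the status bar idea, something along these lines might do. It is only a sketch: it assumes a form with a multi-line TextBox named Text1 and a Label named lblStatus standing in for the status bar (both names are assumptions), and it refreshes the count whenever the text changes instead of on a button click.

```vb
Option Explicit

' Goes in the form's code module. Text1 (MultiLine = True) and lblStatus
' are assumed control names, not something from the posts above.
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Private Const EM_GETLINECOUNT = &HBA

Private Sub Text1_Change()
    Dim lShown As Long

    ' number of lines as displayed, wrapped lines included
    lShown = SendMessage(Text1.hwnd, EM_GETLINECOUNT, 0, ByVal 0&)

    lblStatus.Caption = Len(Text1.Text) & " characters, " & lShown & " lines"
End Sub
```

Passing ByVal 0& makes sure a plain zero is sent for the unused last parameter. Note that the text box reports 1 line even when it is empty.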
Iain17/kedaman,
Not sure if you are still around after so long, but if you are, I have a question about your functions. They work great for files that have vbCrLf-type line endings, but what about files from Unix boxes, where VB doesn't seem to recognize the line endings as carriage returns? I am basically running through Apache logs, and in Iain's case I get a 1 (like it's all one line) and in kedaman's case I get a 0 because there are no vbCrLfs.
Thanks in advance and nice job on these!
You should have started a new topic instead of bringing up old threads.
Try using vbCr or vbLf in place of vbCrLf. One of them should work.
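If you don't know in advance which ending a file uses, a sketch like the following picks one at run time. CountLinesAnyEol is a made-up name, and the rule it uses (prefer vbLf, fall back to vbCr only when the text contains no vbLf at all) is my assumption about what is wanted; it takes the text already loaded into a string.

```vb
Option Explicit

' Hypothetical helper: counts lines in text that is already in a string,
' whether lines end in vbCrLf (Windows), vbLf (Unix) or vbCr alone.
Public Function CountLinesAnyEol(ByRef sText As String) As Long
    Dim sEol As String
    Dim lPos As Long

    If Len(sText) = 0 Then Exit Function   ' empty text: 0 lines

    ' vbLf ends every line in both Windows and Unix files;
    ' use vbCr only when no vbLf is present
    If InStr(1, sText, vbLf, vbBinaryCompare) > 0 Then sEol = vbLf Else sEol = vbCr

    CountLinesAnyEol = 1
    lPos = InStr(1, sText, sEol, vbBinaryCompare)
    Do While lPos > 0
        CountLinesAnyEol = CountLinesAnyEol + 1
        lPos = InStr(lPos + 1, sText, sEol, vbBinaryCompare)
    Loop

    ' a terminator at the very end does not start another line
    If Right$(sText, 1) = sEol Then CountLinesAnyEol = CountLinesAnyEol - 1
End Function
```

For example, CountLinesAnyEol("a" & vbLf & "b" & vbLf) and CountLinesAnyEol("a" & vbCrLf & "b") both return 2.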