-
Mar 5th, 2008, 08:56 AM
#1
Thread Starter
Fanatic Member
[RESOLVED] [2005] Word Counter - Any improvments?
Hey guys and gals, I have spent 2 days (with the help of a friend) working on this piece of code for a word counter.
We have tested all sorts of combinations, and so far I have not come across a single input that does not work.
But if you have any suggestions on how to improve it, please let me know!
Module Module1 Code:
Module Module1
    Public Function MyWordCount(ByVal TextToBeCounted As String) As Integer
        Dim SpacePos As Integer ' Stores the value returned from Instring where a space char is found.
        Dim X As Integer ' X tells InString from which char position to start from.
        Dim WordCount As Integer ' How many words there are.
        Dim NoMore As Boolean ' Yes or No.
        WordCount = 0
        X = 1
        NoMore = False
        If Len(Trim(TextToBeCounted)) > 0 Then
            Do While NoMore = False
                SpacePos = InStr(X, Trim(TextToBeCounted), " ")
                If SpacePos > 0 Then
                    If Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then
                        WordCount += 1
                    End If
                    X = SpacePos + 1
                    Do While InStr(X, Mid(TextToBeCounted, X, 1), " ") > 0
                        X += 1
                    Loop
                Else
                    If Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then
                        WordCount += 1
                    End If
                    NoMore = True
                End If
            Loop
        End If
        MyWordCount = WordCount
    End Function
End Module
-
Mar 5th, 2008, 09:04 AM
#2
Re: [2005] Word Counter - Any improvments?
Good work. A couple of things, though.
You're using nothing but VB6 legacy functions; if this wasn't the .NET forum, I would've been certain that this was VB6 code.
And a lot of the time you're calling the same method twice (or more) when you could've called it once and reused the result. Take this for example:
vb Code:
If Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then
    WordCount += 1
End If
In this case, you're calling Asc(Mid(TextToBeCounted, X, 1)) six times; it could be optimized like this:
vb Code:
Dim charValue As Integer = Asc(Mid(TextToBeCounted, X, 1))
If charValue > 64 And charValue < 91 Or charValue > 96 And charValue < 123 Or charValue > 47 And charValue < 58 Then
    WordCount += 1
End If
And to optimize it even further, make use of the short-circuiting AndAlso/OrElse operators, which skip the right-hand operand once the result is already decided:
vb Code:
Dim charValue As Integer = Asc(Mid(TextToBeCounted, X, 1))
If charValue > 64 AndAlso charValue < 91 OrElse charValue > 96 AndAlso charValue < 123 OrElse charValue > 47 AndAlso charValue < 58 Then
    WordCount += 1
End If
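To see what short-circuiting actually buys you, here is a minimal sketch. The module and the Check helper are made up for illustration; they only count how many operands get evaluated. Note that AndAlso still binds tighter than OrElse, just as And binds tighter than Or, so the grouping of the condition above does not change.

```vb
Module ShortCircuitDemo
    Private evaluations As Integer

    ' Hypothetical helper: records that it ran, then returns its argument.
    Private Function Check(ByVal value As Boolean) As Boolean
        evaluations += 1
        Return value
    End Function

    ' And always evaluates both operands, even when the left one is False.
    Public Function OperandsEvaluatedWithAnd() As Integer
        evaluations = 0
        Dim result As Boolean = Check(False) And Check(True)
        Return evaluations
    End Function

    ' AndAlso stops after the left operand when it is False.
    Public Function OperandsEvaluatedWithAndAlso() As Integer
        evaluations = 0
        Dim result As Boolean = Check(False) AndAlso Check(True)
        Return evaluations
    End Function
End Module
```

For plain integer comparisons like the ones in this thread the saving is tiny; it matters more when the right-hand side is an expensive call or would throw.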
-
Mar 5th, 2008, 09:56 AM
#3
Thread Starter
Fanatic Member
Re: [2005] Word Counter - Any improvments?
You are quite right, Asc(Mid(TextToBeCounted, X, 1)) is repeated a lot.
Thank you for your very quick and helpful post; I have taken your ideas on board.
What would you suggest instead of using Asc and Mid, then?
Thank you
-
Mar 5th, 2008, 10:21 AM
#4
Re: [2005] Word Counter - Any improvments?
I think .Substring is the .NET version of Mid, if I remember correctly.
vb Code:
Dim myInteger As Integer = Asc(TextToBeCounted.Substring(X - 1, 1)) ' Substring is 0-based, Mid is 1-based, hence X - 1
Also, you can use this which is more conventional for .NET:
vb Code:
If texttobeCounted.Trim.Length > 0 Then
I think InStr is from VB6 and usually you would use IndexOf, but as you are looping through each char it probably wouldn't be viable, so it's probably more trouble than it's worth to change it.
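For anyone wondering how IndexOf differs from InStr, here is a rough sketch. SpacePositions is a made-up helper, not something from the original code. The two things to remember are that IndexOf is 0-based and returns -1 when nothing is found, whereas InStr is 1-based and returns 0.

```vb
Imports System
Imports System.Collections.Generic

Module IndexOfDemo
    ' Hypothetical helper: returns the 0-based position of every space in text.
    Public Function SpacePositions(ByVal text As String) As List(Of Integer)
        Dim positions As New List(Of Integer)
        Dim start As Integer = 0
        Do While start < text.Length
            ' Search for the next space at or after "start".
            Dim pos As Integer = text.IndexOf(" "c, start)
            If pos = -1 Then Exit Do   ' -1 means "not found" (InStr would give 0)
            positions.Add(pos)
            start = pos + 1
        Loop
        Return positions
    End Function
End Module
```

With "one two three" this gives positions 3 and 7, where InStr would report 4 and 8 for the same spaces.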
-
Mar 5th, 2008, 10:25 AM
#5
Re: [2005] Word Counter - Any improvments?
Here's what I would change: (Note that I don't know if this will improve the performance or not, I just think it's a good idea to use these in a .NET language)
Instead of using the Trim() function, use the String's Trim method.
vb Code:
Trim(MyString)
'becomes:
MyString.Trim()
Instead of using the Len() function, use the String's Length property.
vb Code:
Len(MyString)
'becomes:
MyString.Length
Instead of using the Mid() function, use the String's Substring method:
vb Code:
Mid(MyString, X, 1)
'becomes:
MyString.Substring(X-1,1) 'Using substring, the first character in a string is at index 0, whereas Mid() sees 1 as the index of the first char.
And to be honest, I think Asc() is okay to use, as there is no really good .NET equivalent.
Edit: Ah yes I forgot InStr, good thing Stimbo brought that up
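If you would rather avoid Asc altogether, one option is to compare the Char directly. This is only a sketch: IsAsciiLetterOrDigit is a made-up name, and it mirrors the A-Z, a-z, 0-9 ranges used in this thread. The framework's Char.IsLetterOrDigit is a close relative, but it also accepts non-ASCII letters and digits, so the two are not interchangeable.

```vb
Imports System

Module CharCheckDemo
    ' Hypothetical helper: True only for A-Z, a-z and 0-9,
    ' the same ranges as the charValue test in this thread.
    Public Function IsAsciiLetterOrDigit(ByVal c As Char) As Boolean
        Return (c >= "A"c AndAlso c <= "Z"c) OrElse _
               (c >= "a"c AndAlso c <= "z"c) OrElse _
               (c >= "0"c AndAlso c <= "9"c)
    End Function
End Module
```

You would call it with something like IsAsciiLetterOrDigit(MyString.Chars(X - 1)), again remembering that Chars is 0-based.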
-
Mar 5th, 2008, 10:51 AM
#6
Thread Starter
Fanatic Member
Re: [2005] Word Counter - Any improvments?
Hey guys, I have gone through and changed all of the parts you have suggested.
I have run through the prog and it seems to be working fine. Here is the end piece if you are interested.
vb Code:
Module Module1
    Public Function MyWordCount(ByVal TextToBeCounted As String) As Integer
        Dim SpacePos As Integer ' Stores the value returned from InStr where a space char is found.
        Dim X As Integer ' X tells InStr from which char position to start from.
        Dim WordCount As Integer ' How many words there are.
        Dim NoMore As Boolean ' Yes or No.
        Dim CharValue As Integer
        WordCount = 0
        X = 1
        NoMore = False
        ' Trim once, so that InStr and Substring index the same string
        ' (otherwise leading spaces shift the positions and words get missed).
        TextToBeCounted = TextToBeCounted.Trim()
        If TextToBeCounted.Length > 0 Then
            Do While NoMore = False
                SpacePos = InStr(X, TextToBeCounted, " ")
                If SpacePos > 0 Then
                    CharValue = Asc(TextToBeCounted.Substring(X - 1, 1))
                    If CharValue > 64 AndAlso CharValue < 91 OrElse CharValue > 96 AndAlso CharValue < 123 OrElse CharValue > 47 AndAlso CharValue < 58 Then
                        WordCount += 1
                    End If
                    X = SpacePos + 1
                    ' Skip any run of consecutive spaces.
                    Do While X <= TextToBeCounted.Length AndAlso TextToBeCounted.Substring(X - 1, 1) = " "
                        X += 1
                    Loop
                Else
                    If X <= TextToBeCounted.Length Then
                        CharValue = Asc(TextToBeCounted.Substring(X - 1, 1))
                        If CharValue > 64 AndAlso CharValue < 91 OrElse CharValue > 96 AndAlso CharValue < 123 OrElse CharValue > 47 AndAlso CharValue < 58 Then
                            WordCount += 1
                        End If
                    End If
                    NoMore = True
                End If
            Loop
        End If
        MyWordCount = WordCount
    End Function
End Module
Edit: Sorry, forgot to mention, I did not understand IndexOf very well, so that is why I have left InStr in (for the time being; I am still reading up on it).
-
Mar 5th, 2008, 11:03 AM
#7
Re: [2005] Word Counter - Any improvments?
I think this solution is overly complicated. To count words, why not use a regular expression? They're very very fast and efficient.
This example is in VB6 but you can still use the same regular expression and the new .Net regex classes to simply count all of the matches. This way you can get your 20-30 lines of code down to about 2-3.
-
Mar 5th, 2008, 11:25 AM
#8
Thread Starter
Fanatic Member
Re: [2005] Word Counter - Any improvments?
Hey.... well that burst my bubble!
I had a look, though, and RegExp is not supported in VB.Net.
-
Mar 5th, 2008, 11:29 AM
#9
Fanatic Member
Re: [2005] Word Counter - Any improvments?
check this out.
How to use regular expressions in vb.net
-
Mar 5th, 2008, 11:37 AM
#10
Thread Starter
Fanatic Member
Re: [2005] Word Counter - Any improvments?
Hey, thank you all so much for all of your help!
Kasracer and talkro, I am sorry, though: I have read through both of these and I do not understand them well enough to take that path.
Thank you both though.
-
Mar 5th, 2008, 11:45 AM
#11
Re: [2005] Word Counter - Any improvments?
Code:
Function CountWords(ByVal Text As String) As Long
    ' The pattern looks for a word character (\w) repeated one or more times
    ' (the + suffix) that occurs on a word boundary (leading and trailing
    ' \b sequences).
    Dim re As New System.Text.RegularExpressions.Regex("\b\w+\b")
    ' The Matches method does the search and returns a MatchCollection object,
    ' which in turn exposes the Count property,
    ' i.e. the result we're interested in.
    CountWords = re.Matches(Text).Count
End Function
I copied, pasted, and changed the code from the link kas provided.
-
Mar 5th, 2008, 12:13 PM
#12
Re: [RESOLVED] [2005] Word Counter - Any improvments?
How 'bout this
Code:
Function CountWords(ByVal theText As String) As Int32
    Dim loopCTR, wordCTR As Int32
    Dim words() As String
    Dim valid As String = "abcdefghijklmnopqrstuvwxyz" 'characters that define word beginning
    Dim strip As String = ":;?/.>,<`~!@#$%^&*()-_=+[{}]|\'0123456789" & ControlChars.Quote 'strip this out
    valid &= valid.ToUpper 'upper and lower
    theText = theText.Trim 'get rid of lead/trail spaces
    For loopCTR = 0 To strip.Length - 1 'strip characters out (is strip defined correctly?)
        theText = theText.Replace(strip.Substring(loopCTR, 1), " ")
    Next
    words = theText.Trim.Split(" "c) 'split into an array
    For loopCTR = 0 To words.Length - 1
        If words(loopCTR) <> "" Then
            If valid.IndexOf(words(loopCTR).Substring(0, 1)) <> -1 Then
                wordCTR += 1
            End If
        End If
    Next
    Return wordCTR 'return the count (the function previously returned nothing)
End Function
I just threw this together, but my gut says strip may need attention.
Last edited by dbasnett; Mar 5th, 2008 at 01:52 PM.
-
Mar 5th, 2008, 01:51 PM
#13
Re: [RESOLVED] [2005] Word Counter - Any improvments?
I used the following as a test
Do While NoMore = False SpacePos = InStr(X, Trim(TextToBeCounted), ) If SpacePos > 0 Then If Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then WordCount += 1 End If X = SpacePos + 1 Do While InStr(X, Mid(TextToBeCounted, X, 1), ) > 0 X += 1 Loop Else If Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then WordCount += 1 End If NoMore = True End If Loop
My word count was 97. Microsoft Word thinks it is 124 words.
It looks like this after parsing
Do While NoMore False SpacePos InStr X Trim TextToBeCounted If SpacePos Then If Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Or Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Or Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Then WordCount End If X SpacePos Do While InStr X Mid TextToBeCounted X X Loop Else If Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Or Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Or Asc Mid TextToBeCounted X And Asc Mid TextToBeCounted X Then WordCount End If NoMore True End If Loop
Last edited by dbasnett; Mar 5th, 2008 at 02:53 PM.
-
Mar 5th, 2008, 05:54 PM
#14
Re: [RESOLVED] [2005] Word Counter - Any improvments?
This also returns 124 words, and looks to be the simplest solution
vb Code:
Dim s As String = "Do While NoMore = False SpacePos = InStr(X, Trim(TextToBeCounted), ) If SpacePos > 0 Then If " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then WordCount += 1 " & _
"End If X = SpacePos + 1 Do While InStr(X, Mid(TextToBeCounted, X, 1), ) > 0 X += 1 Loop Else If " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 64 And Asc(Mid(TextToBeCounted, X, 1)) < 91 Or " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 96 And Asc(Mid(TextToBeCounted, X, 1)) < 123 Or " & _
"Asc(Mid(TextToBeCounted, X, 1)) > 47 And Asc(Mid(TextToBeCounted, X, 1)) < 58 Then WordCount += 1 " & _
"End If NoMore = True End If Loop"
Console.WriteLine(s.Split(" "c).Length)
-
Mar 5th, 2008, 08:12 PM
#15
Re: [RESOLVED] [2005] Word Counter - Any improvments?
Unless you're writing this function for learning purposes, String.Split with option RemoveEmptyEntries will do the job in a single line of code.
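A minimal sketch of that one-liner, wrapped in a function. CountBySplit is a made-up name, and treating tabs and line breaks as separators is my own assumption rather than part of the suggestion above. Split accepts a Char array together with StringSplitOptions.RemoveEmptyEntries, which drops the empty strings that runs of spaces would otherwise produce.

```vb
Imports System
Imports Microsoft.VisualBasic

Module SplitDemo
    ' Hypothetical helper: counts whitespace-separated tokens.
    ' Assumes text is not Nothing.
    Public Function CountBySplit(ByVal text As String) As Integer
        Dim separators() As Char = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length
    End Function
End Module
```

Unlike s.Split(" "c).Length, this does not over-count when the text has leading, trailing or doubled spaces. It still counts symbols such as = and > as words, though.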
-
Mar 5th, 2008, 09:39 PM
#16
Re: [RESOLVED] [2005] Word Counter - Any improvments?
 Originally Posted by wild_bill
This also returns 124 words, and looks to be the simplest solution
Simplest? Maybe (though the Regex is very simple and probably faster). Accurate? Probably not 100% but it's a very good idea.
-
Mar 6th, 2008, 04:36 AM
#17
Thread Starter
Fanatic Member
Re: [RESOLVED] [2005] Word Counter - Any improvments?
I had also used Word to test that, alongside my program and code.
The reason behind the difference is that my program counts ONLY words beginning with A-Z, a-z or 0-9.
Word counts = and > as well, by the looks of it.
Once you remove all of the miscellaneous symbols from the text (which I do not want counted as words in my program), the result is the same.
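To make the difference concrete, here is a rough sketch that applies the same rule on top of Split. CountAlphanumericWords is a made-up name and this is not the code from earlier in the thread; it simply ignores any token whose first character is not A-Z, a-z or 0-9.

```vb
Imports System

Module FilteredCountDemo
    ' Hypothetical helper: counts space-separated tokens that start with
    ' an ASCII letter or digit; tokens such as "=" or ">" are ignored.
    Public Function CountAlphanumericWords(ByVal text As String) As Integer
        Dim count As Integer = 0
        Dim tokens() As String = text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        For Each token As String In tokens
            Dim first As Char = token.Chars(0)
            If (first >= "A"c AndAlso first <= "Z"c) OrElse _
               (first >= "a"c AndAlso first <= "z"c) OrElse _
               (first >= "0"c AndAlso first <= "9"c) Then
                count += 1
            End If
        Next
        Return count
    End Function
End Module
```

For "X = SpacePos + 1" this counts 3 words (X, SpacePos and 1), while a plain split on spaces sees 5 tokens.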