-
Mar 5th, 2008, 01:42 AM
#1
Thread Starter
Addicted Member
[RESOLVED] [VB 2005] Frequency Dictionary
Hello,
I would like to build a 'frequency dictionary'.
I have been wondering what the best way/algorithm to do so would be.
I know how to read each cell of an Excel file. First I want to separate out each 'word' (surrounded by spaces).
Once every cell has been "scanned", I want to count the occurrences of each word.
I'm not sure what the fastest, most logical way would be.
Thanks for any input.
-
Mar 5th, 2008, 11:59 AM
#2
Re: [VB 2005] Frequency Dictionary
Probably the easiest way would be to pull it out into a DataTable and then loop through each row to see what's in each column.
Create a class with two properties, Word and Count. Create a list of these, and as you go through each value, use the List.Find method to see whether the word is already there, then either add a new item to the list or increment the count of the existing one.
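A minimal sketch of that List-based approach, assuming VB 2005 (no lambdas, so the Predicate that List.Find needs comes from a small helper object passed with AddressOf). The names WordCount, WordMatcher, ListCounter and CountWithList are hypothetical, chosen only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical class with the two properties described above: Word and Count.
Public Class WordCount
    Private _word As String
    Private _count As Integer

    Public Sub New(ByVal word As String)
        _word = word
        _count = 1 ' a new entry means the word has been seen once
    End Sub

    Public ReadOnly Property Word() As String
        Get
            Return _word
        End Get
    End Property

    Public Property Count() As Integer
        Get
            Return _count
        End Get
        Set(ByVal value As Integer)
            _count = value
        End Set
    End Property
End Class

' VB 2005 has no lambdas, so this helper supplies the Predicate(Of WordCount)
' that List.Find needs, remembering which word is being searched for.
Public Class WordMatcher
    Private ReadOnly _target As String

    Public Sub New(ByVal target As String)
        _target = target
    End Sub

    Public Function IsMatch(ByVal item As WordCount) As Boolean
        Return item.Word = _target
    End Function
End Class

Public Module ListCounter
    ' Counts each word by searching the list; entries stay in first-seen order.
    Public Function CountWithList(ByVal words As String()) As List(Of WordCount)
        Dim counts As New List(Of WordCount)()
        For Each w As String In words
            Dim matcher As New WordMatcher(w)
            Dim existing As WordCount = counts.Find(AddressOf matcher.IsMatch)
            If existing Is Nothing Then
                counts.Add(New WordCount(w))
            Else
                existing.Count += 1
            End If
        Next
        Return counts
    End Function
End Module
```

List.Find scans the list from the start on every lookup, so each word costs a linear search. That is the main reason a Dictionary, as suggested later in the thread, tends to be faster on large inputs.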
-
Mar 7th, 2008, 10:45 AM
#3
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hi, thanks.
I read about Dictionaries;
wouldn't they be better?
-
Mar 7th, 2008, 10:54 AM
#4
Re: [VB 2005] Frequency Dictionary
You haven't really described your data format adequately, so it's anyone's guess how to break up the data in the first place. Once you have it broken up into words, a Dictionary would be the way to go. Assuming you've broken your data up into a String array, you would do this:
vb.net Code:
Dim wordFrequencies As New Dictionary(Of String, Integer)

For Each word As String In myStringArray
    If wordFrequencies.ContainsKey(word) Then
        wordFrequencies(word) += 1
    Else
        'First occurrence of this word.
        wordFrequencies.Add(word, 1)
    End If
Next word
If you wanted the words sorted alphabetically you could use a SortedDictionary or SortedList instead.
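As a sketch of the SortedDictionary variant, assuming the words are already in a String array: the counting loop is unchanged and only the collection type differs. Enumeration then comes out ordered by key, using the default string comparer for the current culture. SortedCounter, CountSorted and Describe are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SortedCounter
    ' Same counting loop as above, but the keys (words) are kept in sorted order.
    Public Function CountSorted(ByVal words As String()) As SortedDictionary(Of String, Integer)
        Dim wordFrequencies As New SortedDictionary(Of String, Integer)()
        For Each word As String In words
            If wordFrequencies.ContainsKey(word) Then
                wordFrequencies(word) += 1
            Else
                wordFrequencies.Add(word, 1)
            End If
        Next
        Return wordFrequencies
    End Function

    ' Builds "word=count" text in the order the SortedDictionary enumerates its entries.
    Public Function Describe(ByVal counts As SortedDictionary(Of String, Integer)) As String
        Dim parts As New List(Of String)()
        For Each pair As KeyValuePair(Of String, Integer) In counts
            parts.Add(pair.Key & "=" & pair.Value.ToString())
        Next
        Return String.Join(", ", parts.ToArray())
    End Function
End Module
```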
-
Mar 7th, 2008, 11:00 AM
#5
Re: [VB 2005] Frequency Dictionary
 Originally Posted by Zoroxeus
Hi, thanks.
I read about Dictionaries;
wouldn't they be better?
Actually yes, that would probably be much faster. If you're using 2005 or 2008, you can make it a dictionary with a string key and an integer value, which will eliminate the need for any kind of casting while you're going through it:
vb Code:
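Dim sd As New Dictionary(Of String, Integer)

'Replace these two lines with code that reads the text from the Excel file
Dim str As String = "This is this line of this line"
'RemoveEmptyEntries stops consecutive spaces from producing empty "words"
Dim arrStr As String() = str.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

For Each s As String In arrStr
    'Lower-case each word so "This" and "this" are counted together
    Dim key As String = s.ToLower()
    If sd.ContainsKey(key) Then
        sd(key) = sd(key) + 1
    Else
        sd.Add(key, 1)
    End If
Next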
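A variation on the code above, offered as a sketch rather than the poster's method: instead of calling ToLower on every word, the Dictionary can be constructed with StringComparer.OrdinalIgnoreCase so that "This" and "this" land on the same key. TryGetValue saves one of the lookups compared with ContainsKey followed by reading the indexer. CaseInsensitiveCounter and CountIgnoringCase are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module CaseInsensitiveCounter
    ' Counts words ignoring case by giving the Dictionary a case-insensitive comparer.
    ' The first spelling seen (e.g. "This") is the one stored as the key.
    Public Function CountIgnoringCase(ByVal text As String) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        ' RemoveEmptyEntries skips the empty strings produced by consecutive spaces
        Dim words As String() = text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        For Each word As String In words
            Dim current As Integer
            If counts.TryGetValue(word, current) Then
                counts(word) = current + 1
            Else
                counts.Add(word, 1)
            End If
        Next
        Return counts
    End Function
End Module
```

Because the stored key keeps the first spelling encountered, you may still want to lower-case keys when displaying them.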
-
Mar 7th, 2008, 04:31 PM
#6
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hi
thanks
Two things I don't understand.
If myStringArray is a String, doing "For Each word" splits it into characters, not into words.
Tom Sawyer's method works, although I would prefer to use the "For Each" technique.
Is there a way to go from the first entry to the last and display each key instead of the integer value?
Right now I know how to display the values, but not the keys.
thanks
-
Mar 8th, 2008, 01:53 AM
#7
Re: [VB 2005] Frequency Dictionary
First up, something named "myStringArray" is hardly going to be a String. It will be a String array.
As for looping through a Dictionary, it could look like this:
vb.net Code:
For Each frequency As KeyValuePair(Of String, Integer) In wordFrequencies
    MessageBox.Show(frequency.Value.ToString(), frequency.Key)
Next frequency
where the generic type arguments of the KeyValuePair must match those of the Dictionary. It could also look like this:
vb.net Code:
For Each word As String In wordFrequencies.Keys
    MessageBox.Show(wordFrequencies(word).ToString(), word)
Next word
-
Mar 10th, 2008, 02:50 PM
#8
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hi jmcilhinney
You are right, I found my mistake.
I was trying to do it with the "For Each" method, but was not sure what types I could use.
When displaying the contents of the dictionary, is there a way to show the list sorted? Or do I have to use a SortedDictionary?
Thanks
-
Mar 10th, 2008, 03:11 PM
#9
Re: [VB 2005] Frequency Dictionary
If you just change it from a Dictionary to a SortedDictionary, it'll sort it by key for you.
-
Mar 10th, 2008, 03:26 PM
#10
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Thanks,
Yes, I had tried, but it does not show things the way I want.
First, it sorts by key, and I would like it sorted by value.
Also, is there a method for getting rid of punctuation?
When I split using " ", I get a lot of words such as (hello or .bye or ~approx.
I am quite unfamiliar with dictionaries, and really wonder why it's doing this and how it could be fixed.
Last edited by Zoroxeus; Mar 10th, 2008 at 03:52 PM.
-
Mar 10th, 2008, 05:19 PM
#11
Re: [VB 2005] Frequency Dictionary
There's nothing to "fix" with the Dictionary. It's doing exactly what it's supposed to do. There is no collection that will sort by value. If you want that then you're going to have to provide the sorting logic yourself.
As for punctuation included in words, of course if you split on space characters then the punctuation will be left behind. Why would splitting arbitrarily remove characters? Again, if you want to trim certain characters off the words you end up with after splitting, then you need to provide the logic. The String class has a Trim method that you can call on each word, passing it the characters you want removed from either end.
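A minimal sketch of that Trim call, using the examples from the previous post. The punctuation list is an assumption to adjust for your data, and WordCleaner and CleanWord are hypothetical names. Note that Trim only strips characters from the start and end of the word, never from the middle.

```vbnet
Imports System

Public Module WordCleaner
    ' Hypothetical set of punctuation to strip; extend it to suit your data.
    ' """"c is the VB literal for a single double-quote character.
    Private ReadOnly PunctuationChars As Char() = New Char() {"("c, ")"c, "."c, ","c, "~"c, "!"c, "?"c, ";"c, ":"c, """"c}

    ' Removes the listed characters from both ends of the word only;
    ' characters in the middle (e.g. the apostrophe in "don't") are left alone.
    Public Function CleanWord(ByVal word As String) As String
        Return word.Trim(PunctuationChars)
    End Function
End Module
```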
-
Mar 10th, 2008, 11:04 PM
#12
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Thanks,
Regarding the punctuation, I thought there would be some kind of built-in function that already did this.
Regarding the sorting, wouldn't a SortedList be better then?
-
Mar 10th, 2008, 11:30 PM
#13
Re: [VB 2005] Frequency Dictionary
 Originally Posted by Zoroxeus
Thanks,
Regarding the punctuation, I thought there would be some kind of built-in function that already did this.
Splitting on spaces is not splitting into words. It's just splitting on spaces. The system doesn't differentiate any of the other characters. It simply finds all the spaces, removes them and makes an array from the substrings in between. If you want those substrings to contain only specific characters then it's up to you to specify which characters to omit or which to include.
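One way to specify those characters yourself, sketched here as an illustration: String.Split has an overload taking a Char array plus StringSplitOptions, so spaces and punctuation can all be treated as separators in one call, with the empty pieces dropped. The separator list is an assumption, and MultiSplitter and SplitIntoWords are hypothetical names.

```vbnet
Imports System

Public Module MultiSplitter
    ' Splits on spaces AND some common punctuation in one call, dropping empty pieces.
    ' The separator list is an assumption; add or remove characters as needed.
    Public Function SplitIntoWords(ByVal text As String) As String()
        Dim separators As Char() = New Char() {" "c, ","c, "."c, "("c, ")"c, "~"c, "!"c, "?"c}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function
End Module
```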
 Originally Posted by Zoroxeus
Regarding the sorting, wouldn't a SortedList be better then?
There are no collections that automatically sort on value. The SortedList sorts on key. If you have frequency values by word then the SortedList will sort in alphabetical order by word, not numerical order by frequency. If you want the data sorted by frequency then you have to do it yourself.
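A sketch of doing that sorting yourself in VB 2005: copy the Dictionary's entries into a List(Of KeyValuePair(Of String, Integer)) and call Sort with a Comparison delegate. FrequencySorter, SortByFrequency and CompareByCountDescending are hypothetical names, and breaking ties alphabetically is an added assumption to make the order predictable.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module FrequencySorter
    ' Comparison used by List.Sort: highest count first, ties broken by
    ' ordinal comparison of the words so the order is deterministic.
    Private Function CompareByCountDescending(ByVal a As KeyValuePair(Of String, Integer), _
                                              ByVal b As KeyValuePair(Of String, Integer)) As Integer
        Dim result As Integer = b.Value.CompareTo(a.Value)
        If result = 0 Then
            result = String.CompareOrdinal(a.Key, b.Key)
        End If
        Return result
    End Function

    ' Copies the dictionary's entries into a list and sorts that list by value.
    ' The dictionary itself is left untouched.
    Public Function SortByFrequency(ByVal counts As Dictionary(Of String, Integer)) _
            As List(Of KeyValuePair(Of String, Integer))
        Dim entries As New List(Of KeyValuePair(Of String, Integer))(counts)
        entries.Sort(AddressOf CompareByCountDescending)
        Return entries
    End Function
End Module
```

The sorted list can then be displayed with the same For Each over KeyValuePair shown earlier in the thread.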
-
Mar 11th, 2008, 02:12 PM
#14
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hey,
first, thanks.
I managed to play around with the strings and the New Char() syntax, and removed a lot of stuff.
I also use the Replace method beforehand to get rid of paragraph marks and double quotes, as I am not sure how to write them.
For example, for a period I write "."c and for commas ","c,
but do you know how to write double quotes and paragraph marks?
Also, when I am splitting the array, is there a way to get rid of numbers, for example 2007, 2008...?
And if I want to test whether a string is equal to one of a list of strings, how come
if myString = {"string1", "string2", ...}
does not work?
Thanks
Last edited by Zoroxeus; Mar 11th, 2008 at 02:15 PM.
-
Mar 11th, 2008, 02:48 PM
#15
Re: [VB 2005] Frequency Dictionary
For double-quotes, use chr(34) and for line breaks, use chr(13).
-
Mar 11th, 2008, 05:26 PM
#16
Re: [VB 2005] Frequency Dictionary
 Originally Posted by Tom Sawyer
For double-quotes, use chr(34) and for line breaks, use chr(13).
For double quotes just escape with another double quote, i.e. """"c, or you can use ControlChars.Quote.
13 is the ASCII code for a carriage return. That's not a line break. In Windows a line break is officially a carriage return followed by a line feed, which has an ASCII code of 10. In Unix-based systems it's just a line feed. Nowhere is it just a carriage return. If you split a Windows-based text file on just carriage returns or just line feeds then you will leave the other behind, which will give you incorrect results. If you split a Unix-based text file on just carriage returns it won't split at all.
If you want to split on just line feeds, then use ControlChars.Lf. If you want to split on carriage return/line feed pairs, then use ControlChars.CrLf or ControlChars.NewLine. If you want to split on whatever the standard line break is for the current system, then use Environment.NewLine.
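A sketch of splitting on those line breaks, assuming the text may contain a mix of endings. The overload of String.Split that takes a String array treats each entry as a whole separator, so a CrLf pair is matched as one unit; because it is listed first, it is tried before the single-character entries at each position. LineSplitter and SplitLines are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Public Module LineSplitter
    ' Each String in the array is a complete separator. CrLf comes first so a
    ' Windows line break is consumed as one separator rather than Cr then Lf.
    ' RemoveEmptyEntries also drops blank lines; remove it if blank lines matter.
    Public Function SplitLines(ByVal text As String) As String()
        Dim separators As String() = New String() {ControlChars.CrLf, ControlChars.Lf, ControlChars.Cr}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function
End Module
```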
-
Mar 12th, 2008, 11:14 AM
#17
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hey,
none of the methods you explained worked with Split...
Instead I use Replace(Chr(x), " ") when I extract the string (x being the ASCII value of the character I want to replace).
I ran into this issue when extracting strings from Excel cells.
Is there a way to compare a string to a string array?
When I do:
Code:
For each Word as string in ArrayofString
    if CompareArray.indexof(word) = -1 then
    end if
Next
I get an Overload error...
PS: To look for a number I use IsNumeric.
I thought it was a method of String, so I could not find it at first.
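For the numbers question, a minimal sketch of filtering tokens with IsNumeric. IsNumeric is a Visual Basic runtime function in the Microsoft.VisualBasic namespace rather than a String member, which is why it does not appear after typing a String variable name and a dot. NumberFilter and RemoveNumbers are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Public Module NumberFilter
    ' Returns only the tokens that are not numbers, so "2007" and "2008" are dropped.
    ' IsNumeric comes from the VB runtime, not from the String class.
    Public Function RemoveNumbers(ByVal tokens As String()) As String()
        Dim kept As New List(Of String)()
        For Each token As String In tokens
            If Not IsNumeric(token) Then
                kept.Add(token)
            End If
        Next
        Return kept.ToArray()
    End Function
End Module
```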
-
Mar 12th, 2008, 11:40 AM
#18
Addicted Member
Re: [VB 2005] Frequency Dictionary
To get rid of non-alphabetic characters, try using regular expressions rather than Split and Replace. It will require less code and run faster.
-
Mar 12th, 2008, 05:24 PM
#19
Re: [VB 2005] Frequency Dictionary
 Originally Posted by ThatSamiam
To get rid of non-alphabetic characters, try using regular expressions rather than Split and Replace. It will require less code and run faster.
That's a fine suggestion. The Regex class actually has a Split method itself, so you can split on a pattern instead of a character or substring.
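A sketch of counting words with Regex.Split, under the assumption that a "word" is a run of the letters A to Z. With the pattern below, spaces, punctuation and digits all act as separators, which also removes numbers such as 2008; apostrophes and accented letters are treated as separators too, so adjust the pattern if that matters. RegexWordCounter and CountLetterWords are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module RegexWordCounter
    ' Splits on any run of characters that are not A-Z/a-z and counts the pieces,
    ' ignoring case via the Dictionary's comparer.
    Public Function CountLetterWords(ByVal text As String) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        For Each piece As String In Regex.Split(text, "[^A-Za-z]+")
            ' Regex.Split yields an empty string when the text starts or ends with a separator
            If piece.Length > 0 Then
                If counts.ContainsKey(piece) Then
                    counts(piece) += 1
                Else
                    counts.Add(piece, 1)
                End If
            End If
        Next
        Return counts
    End Function
End Module
```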
-
Mar 12th, 2008, 05:42 PM
#20
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hi, thanks.
I am actually reading about regular expressions, and they look very promising.
The website at www.regular-expressions.info is well done and clear.
However, do you know why I get an error when doing:
For each Word as string in ArrayofString
    if CompareArray.indexof(word) = -1 then
    end if
Next
thanks
-
Mar 12th, 2008, 05:43 PM
#21
Re: [VB 2005] Frequency Dictionary
Please do not ever ask about an error without specifying what the error is. We shouldn't have to guess.
-
Mar 12th, 2008, 07:17 PM
#22
Thread Starter
Addicted Member
Re: [VB 2005] Frequency Dictionary
Hi,
sorry for forgetting to mention the error.
I get "Overload resolution failed because no accessible 'IndexOf' accepts this number of arguments."
-
Mar 12th, 2008, 09:30 PM
#23
Re: [VB 2005] Frequency Dictionary
So now that we know what we're looking for, the solution is easy to see. You need to take notice of what Intellisense tells you, for a start. While you were typing it would have shown you what parameters were needed, and you would have seen that IndexOf needed more than one parameter. You'd have also seen that Array.IndexOf is Shared, so you can't call it on an instance. The MSDN documentation would then have shown you the correct way to call it:
vb.net Code:
If Array.IndexOf(CompareArray, word) = -1 Then
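Put into context, that call can filter a list of words to skip (which also covers the earlier question about testing a string against a list of strings). A minimal sketch; the stop-word list and the names StopWordFilter and RemoveStopWords are hypothetical, and the comparison is case-sensitive because Array.IndexOf uses the default equality for strings.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module StopWordFilter
    ' Hypothetical list of words to leave out of the frequency count.
    Private ReadOnly CompareArray As String() = New String() {"the", "a", "of", "and"}

    ' Array.IndexOf is Shared: it is called on the Array class and given the
    ' array to search. It returns -1 when the value is not found.
    Public Function RemoveStopWords(ByVal words As String()) As String()
        Dim kept As New List(Of String)()
        For Each word As String In words
            If Array.IndexOf(CompareArray, word) = -1 Then
                kept.Add(word)
            End If
        Next
        Return kept.ToArray()
    End Function
End Module
```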
-
Mar 13th, 2008, 02:05 PM
#24
Thread Starter
Addicted Member
Re: [RESOLVED] [VB 2005] Frequency Dictionary
Hi,
thanks.
I'm still not sure how we can write "Array.IndexOf", since Array has never been defined anywhere.
I did not even know we could do that.
It's like saying Integer.IndexOf; it's a bit strange to me.
I always thought we needed to have a variable, and then through that variable we could access its various methods.
-
Mar 13th, 2008, 02:08 PM
#25
Re: [RESOLVED] [VB 2005] Frequency Dictionary
It's a Shared method of the Array class, so you can use it without creating an instance of it.
-
Mar 13th, 2008, 05:41 PM
#26
Re: [RESOLVED] [VB 2005] Frequency Dictionary
If you want to call instance methods then yes, you do need an instance. Shared methods are members of the class though, not of any particular instance, so they are called on the class.
Have you ever called MessageBox.Show? Did you declare MessageBox beforehand? No, you didn't. Show is a Shared member of the MessageBox class. Likewise, IndexOf is a Shared member of the Array class, so you call it on the Array class itself and then pass the array instance as a parameter.
You mention Integer.IndexOf. There is no IndexOf for the Integer type but it does have other Shared members. For instance, if you want to parse a String and return an Integer value you could call Integer.Parse:
vb.net Code:
Dim int As Integer = Integer.Parse(myString)
You don't have an Integer instance to begin with so how could you call Parse on it? You call the Shared Parse method and it returns an Integer instance. There are plenty of other examples throughout the Framework.
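To make the difference concrete, here is a small hypothetical class (Tally is an illustrative name, not a Framework type) with one instance member and one Shared member. The Shared Create method is called on the class and hands back a new instance, much as Integer.Parse returns a new Integer; Increment needs an existing object because each Tally holds its own count.

```vbnet
Imports System

' Hypothetical class showing the difference between Shared and instance members.
Public Class Tally
    Private _count As Integer

    ' Instance method: needs an object, because each Tally has its own _count.
    Public Sub Increment()
        _count += 1
    End Sub

    Public ReadOnly Property Count() As Integer
        Get
            Return _count
        End Get
    End Property

    ' Shared method: belongs to the class itself, so it is called as Tally.Create(...)
    ' without an existing instance, and it returns a new instance.
    Public Shared Function Create(ByVal startAt As Integer) As Tally
        Dim t As New Tally()
        t._count = startAt
        Return t
    End Function
End Class
```

Usage would be `Dim t As Tally = Tally.Create(5)` followed by `t.Increment()`; calling `Tally.Increment()` on the class would not compile, because Increment is an instance member.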
-
Mar 14th, 2008, 06:00 PM
#27
Thread Starter
Addicted Member
Re: [RESOLVED] [VB 2005] Frequency Dictionary
Thanks,
I just said Integer.IndexOf because nothing else came to mind.
It makes more sense now with your explanation.
Is there somewhere I can get a high-level view of the classes? I want to start from "point zero" and see all the ramifications. Is that possible?
-
Mar 14th, 2008, 08:28 PM
#28
Re: [RESOLVED] [VB 2005] Frequency Dictionary
The Help menu of Visual Studio of course.
-
Mar 14th, 2008, 09:48 PM
#29
Thread Starter
Addicted Member
Re: [RESOLVED] [VB 2005] Frequency Dictionary
Thanks, but the Help menu of VS 2005 just takes me to the famous MSDN online library, which I don't find very clear or complete. Sometimes by searching online I find much more about a function than on MSDN.
-
Mar 15th, 2008, 01:45 AM
#30
Re: [RESOLVED] [VB 2005] Frequency Dictionary
Then search online for specific topics when you need to, but there's nowhere else you're going to find a topic dedicated to every single type and member in the .NET Framework class library like you can in the MSDN Library. That's what you asked for and that's what it provides.