
Thread: [RESOLVED] [VB 2005] Frequency Dictionary

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Resolved [RESOLVED] [VB 2005] Frequency Dictionary

    Hello,
    I would like to build a 'frequency dictionary'.
    I have been wondering what the best way/algorithm to do so would be.
    I know how to read each cell of an Excel file. First I want to separate each 'word' (text surrounded by spaces).
    Once every cell has been "scanned", I want to count the occurrences of each word.

    I'm not sure what would be the fastest, most logical way.

    Thanks for any input.

  2. #2
    Frenzied Member
    Join Date
    May 2006
    Location
    Toronto, ON
    Posts
    1,093

    Re: [VB 2005] Frequency Dictionary

    Probably the easiest way would be to pull it out into a DataTable and then loop through each row to see what's in each column.

    Create a class with two properties, Word and Count. Create a list of these, and as you go through each value, use the List.Find method to look it up, then either add a new item to the list or increment the count of the one that's already there.
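    A minimal sketch of the List-plus-Find approach described above, written so it also compiles in VB 2005 (no lambdas). The WordCount and WordMatcher classes and the ListCounter module are hypothetical names chosen for this example, and public fields stand in for the Word and Count properties to keep it short.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical class for one word and how many times it has been seen.
' Public fields stand in for the Word and Count properties to keep it short.
Public Class WordCount
    Public Word As String
    Public Count As Integer

    Public Sub New(ByVal firstWord As String)
        Word = firstWord
        Count = 1
    End Sub
End Class

' Hypothetical helper: VB 2005 has no lambdas, so the word being searched
' for is stored in a field and Matches is handed to List.Find via AddressOf.
Public Class WordMatcher
    Private ReadOnly _target As String

    Public Sub New(ByVal target As String)
        _target = target
    End Sub

    Public Function Matches(ByVal item As WordCount) As Boolean
        Return item.Word = _target
    End Function
End Class

Public Module ListCounter
    Public Function CountWords(ByVal words As String()) As List(Of WordCount)
        Dim counts As New List(Of WordCount)()
        For Each w As String In words
            Dim matcher As New WordMatcher(w)
            ' Find scans the list from the start and returns Nothing if no item matches.
            Dim existing As WordCount = counts.Find(AddressOf matcher.Matches)
            If existing Is Nothing Then
                counts.Add(New WordCount(w))
            Else
                existing.Count += 1
            End If
        Next
        Return counts
    End Function

    Sub Main()
        For Each wc As WordCount In CountWords("the cat saw the dog".Split(" "c))
            Console.WriteLine("{0}: {1}", wc.Word, wc.Count)
        Next
    End Sub
End Module
```

    Each call to Find walks the list from the start, so counting n words against m distinct words costs roughly n times m comparisons. That is the main reason a Dictionary, which looks a key up without scanning, tends to be faster for this job.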

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hi, thanks.
    I read about dictionaries.
    Wouldn't they be better?

  4. #4
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    You haven't really described your data format adequately so it's anyone's guess how to break up the data in the first place. Once you have it broken up into words a Dictionary would be the way to go. Assuming you've broken your data up into a String array you would do this:
    vb.net Code:
    Dim wordFrequencies As New Dictionary(Of String, Integer)

    For Each word As String In myStringArray
        If wordFrequencies.ContainsKey(word) Then
            wordFrequencies(word) += 1
        Else
            'First occurrence of this word.
            wordFrequencies.Add(word, 1)
        End If
    Next word
    If you wanted the words sorted alphabetically you could use a SortedDictionary or SortedList instead.
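    As a sketch of the SortedDictionary option (the SortedCounter module name is hypothetical): passing StringComparer.OrdinalIgnoreCase to the constructor makes "The" and "the" count as the same key, an alternative to lower-casing each word first. TryGetValue plus the indexer setter also avoids the separate ContainsKey check.

```vb
Imports System
Imports System.Collections.Generic

Public Module SortedCounter
    ' Counts words while ignoring case. The SortedDictionary keeps its keys in
    ' ordinal, case-insensitive order (alphabetical for plain A-Z words),
    ' so enumerating it lists the words by key, not by count.
    Public Function CountWords(ByVal words As String()) As SortedDictionary(Of String, Integer)
        Dim counts As New SortedDictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        For Each w As String In words
            Dim current As Integer = 0
            ' TryGetValue sets current to 0 when the word is not there yet.
            counts.TryGetValue(w, current)
            ' Setting through the indexer adds the key if it is missing.
            counts(w) = current + 1
        Next
        Return counts
    End Function

    Sub Main()
        Dim counts As SortedDictionary(Of String, Integer) = CountWords("The cat saw the dog".Split(" "c))
        For Each entry As KeyValuePair(Of String, Integer) In counts
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value)
        Next
    End Sub
End Module
```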
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  5. #5
    Frenzied Member
    Join Date
    May 2006
    Location
    Toronto, ON
    Posts
    1,093

    Re: [VB 2005] Frequency Dictionary

    Quote Originally Posted by Zoroxeus
    Hi, thanks.
    I read about dictionaries.
    Wouldn't they be better?
    Actually yes, that would probably be much faster. If you're using 2005 or 2008, you can make it a dictionary with a string key and an integer value, which will eliminate the need for any kind of casting while you're going through it:

    vb Code:
    Dim sd As New Dictionary(Of String, Integer)

    'Change these two lines to pull the text from the Excel file
    Dim str As String = "This is this line of this line"
    'RemoveEmptyEntries stops consecutive spaces from producing empty "words"
    Dim arrStr As String() = str.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

    For Each s As String In arrStr
        'Lower-case the word so "This" and "this" are counted together
        Dim key As String = s.ToLower()

        If sd.ContainsKey(key) Then
            sd(key) = sd(key) + 1
        Else
            sd.Add(key, 1)
        End If
    Next

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hi,
    thanks.
    There are two things I don't understand.
    If myStringArray is a String, "For Each word" splits it into characters, not words.
    Tom Sawyer's method works, although I would prefer to use the "For Each" technique.

    Is there a way to go from the first entry to the last and display each key instead of the integer?

    Right now I know how to display the values, but not the keys.

    Thanks

  7. #7
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    First up, something named "myStringArray" is hardly going to be a String. It will be a String array.

    As for looping through a Dictionary, it could look like this:
    vb.net Code:
    For Each frequency As KeyValuePair(Of String, Integer) In wordFrequencies
        MessageBox.Show(frequency.Value.ToString(), frequency.Key)
    Next frequency
    where the generic type arguments of the KeyValuePair must match those of the Dictionary. It could also look like this:
    vb.net Code:
    For Each word As String In wordFrequencies.Keys
        MessageBox.Show(wordFrequencies(word).ToString(), word)
    Next word

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hi jmcilhinney,
    You are right, I found my mistake.
    I was trying to do it with For Each, but was not sure what parameters I could use.

    When displaying the contents of the dictionary, is there a way to show the list sorted? Or do I have to use a SortedDictionary?
    Thanks

  9. #9
    Frenzied Member
    Join Date
    May 2006
    Location
    Toronto, ON
    Posts
    1,093

    Re: [VB 2005] Frequency Dictionary

    If you just change it from a Dictionary to a SortedDictionary, it'll keep the entries sorted by key for you.

  10. #10

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Thanks,
    Yes, I had tried, but it doesn't show things the way I want.
    It sorts by Key, and I would like it sorted by Value.

    Also, is there a method for getting rid of punctuation?
    When I split on " " I get a lot of words such as (hello or .bye or ~approx...

    I am quite unfamiliar with dictionaries, and really wonder why it's doing this and how it could be fixed.
    Last edited by Zoroxeus; Mar 10th, 2008 at 03:52 PM.

  11. #11
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    There's nothing to "fix" with the Dictionary. It's doing exactly what it's supposed to do. There is no collection that will sort by value. If you want that then you're going to have to provide the sorting logic yourself.
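    One way to provide that sorting logic yourself, sketched under the assumption that the counts live in a Dictionary(Of String, Integer): copy the entries into a List(Of KeyValuePair(Of String, Integer)) and sort the list with a Comparison. The FrequencySort module and the alphabetical tie-break are illustrative choices, and AddressOf is used because VB 2005 has no lambdas.

```vb
Imports System
Imports System.Collections.Generic

Public Module FrequencySort
    ' Comparison for List.Sort: higher counts first; equal counts are ordered
    ' by word so the result is the same on every run (a choice, not a rule).
    Private Function ByCountDescending(ByVal x As KeyValuePair(Of String, Integer), _
                                       ByVal y As KeyValuePair(Of String, Integer)) As Integer
        Dim result As Integer = y.Value.CompareTo(x.Value)
        If result = 0 Then
            result = String.CompareOrdinal(x.Key, y.Key)
        End If
        Return result
    End Function

    Public Function SortByFrequency(ByVal counts As Dictionary(Of String, Integer)) _
            As List(Of KeyValuePair(Of String, Integer))
        ' The Dictionary itself stays unordered; copy its entries into a List,
        ' which can be sorted with any Comparison you supply.
        Dim entries As New List(Of KeyValuePair(Of String, Integer))(counts)
        entries.Sort(AddressOf ByCountDescending)
        Return entries
    End Function

    Sub Main()
        Dim counts As New Dictionary(Of String, Integer)()
        For Each w As String In "this is this line of this line".Split(" "c)
            If counts.ContainsKey(w) Then
                counts(w) += 1
            Else
                counts.Add(w, 1)
            End If
        Next
        For Each entry As KeyValuePair(Of String, Integer) In SortByFrequency(counts)
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value)
        Next
    End Sub
End Module
```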

    As for punctuation included in words, of course if you split on space characters then the punctuation will be left behind. Why would splitting arbitrarily remove characters? Again, if you want to trim certain characters off the words you end up with after splitting, then you need to provide the logic. The String class has a Trim method that you can call on each word, passing the characters you don't want, to remove them from both ends.
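    A minimal sketch of trimming with String.Trim, which accepts the characters to remove and strips them only from the start and end of each word. The WordCleaner module and the exact punctuation list are assumptions; adjust the list to your data.

```vb
Imports System
Imports System.Collections.Generic

Public Module WordCleaner
    ' Hypothetical set of characters to strip from the ends of each word.
    Private ReadOnly Punctuation As Char() = New Char() {"."c, ","c, ";"c, ":"c, "!"c, "?"c, "("c, ")"c, "~"c, """"c, "'"c}

    Public Function CleanWords(ByVal text As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each raw As String In text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            ' Trim only touches the ends, so an inner apostrophe ("don't") survives.
            Dim word As String = raw.Trim(Punctuation).ToLower()
            ' A token made only of punctuation trims down to an empty string; skip it.
            If word.Length > 0 Then
                result.Add(word)
            End If
        Next
        Return result
    End Function

    Sub Main()
        For Each w As String In CleanWords("(hello .bye ~approx.")
            Console.WriteLine(w)
        Next
    End Sub
End Module
```

    With the examples from this thread, "(hello", ".bye" and "~approx" come out as "hello", "bye" and "approx".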

  12. #12

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Thanks,
    regarding the punctuation, I thought there would be some kind of built-in function already doing this.

    Regarding the sorting, wouldn't a SortedList be better then?

  13. #13
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    Quote Originally Posted by Zoroxeus
    Thanks,
    regarding the punctuation, I thought there would be some kind of built-in function already doing this.
    Splitting on spaces is not splitting into words. It's just splitting on spaces. The system doesn't differentiate any of the other characters. It simply finds all the spaces, removes them and makes an array from the substrings in between. If you want those substrings to contain only specific characters then it's up to you to specify which characters to omit or which to include.
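    To illustrate the "specify which characters to include" option, here is a sketch (the LetterRuns module is hypothetical) that uses Char.IsLetter to treat every run of letters as a word and everything else, including spaces, punctuation and digits, as a separator.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Public Module LetterRuns
    ' Every run of letters becomes a lower-cased word; any other character ends the run.
    Public Function ExtractWords(ByVal text As String) As List(Of String)
        Dim words As New List(Of String)()
        Dim current As New StringBuilder()
        For Each ch As Char In text
            If Char.IsLetter(ch) Then
                current.Append(Char.ToLower(ch))
            ElseIf current.Length > 0 Then
                words.Add(current.ToString())
                current.Length = 0   ' reuse the builder for the next word
            End If
        Next
        ' The text may end in the middle of a word.
        If current.Length > 0 Then
            words.Add(current.ToString())
        End If
        Return words
    End Function

    Sub Main()
        For Each w As String In ExtractWords("Hello, world! 2008 (hi)")
            Console.WriteLine(w)
        Next
    End Sub
End Module
```

    Because apostrophes are not letters, "don't" comes out as "don" and "t"; add a check for "'"c if contractions matter for your counts.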
    Quote Originally Posted by Zoroxeus
    Regarding the sorting, wouldn't a SortedList be better then?
    There are no collections that automatically sort on value. The SortedList sorts on key. If you have frequency values by word then the SortedList will sort in alphabetical order by word, not numerical order by frequency. If you want the data sorted by frequency then you have to do it yourself.

  14. #14

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hey,
    first, thanks.

    I managed to play around with the strings and the New Char() {...} thing, and removed a lot of stuff.
    I also use the Replace method beforehand to get rid of paragraph marks and double quotes, as I am not sure how to write them.
    For example, for . I write "."c, and for commas ","c,
    but do you know how to do it for double quotes and paragraph marks?

    Also, when I split the text into an array, is there a way to get rid of numbers, for example 2007, 2008...?

    Also, if I want to test whether a string is equal to any string in a list, how come
    if myString = {"string1", "string2", ...}

    does not work?

    Thanks
    Last edited by Zoroxeus; Mar 11th, 2008 at 02:15 PM.

  15. #15
    Frenzied Member
    Join Date
    May 2006
    Location
    Toronto, ON
    Posts
    1,093

    Re: [VB 2005] Frequency Dictionary

    For double-quotes, use chr(34) and for line breaks, use chr(13).

  16. #16
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    Quote Originally Posted by Tom Sawyer
    For double-quotes, use chr(34) and for line breaks, use chr(13).
    For double quotes just escape with another double quote, i.e. """"c, or you can use ControlChars.Quote.

    13 is the ASCII code for a carriage return. That's not a line break. In Windows a line break is officially a carriage return followed by a line feed, which has an ASCII code of 10. In Unix-based systems it's just a line feed. Only older systems such as classic Mac OS used a carriage return on its own. If you split a Windows-based text file on just carriage returns or just line feeds then you will leave the other character behind, which will give you incorrect results. If you split a Unix-based text file on just carriage returns it won't split at all.

    If you want to split on just line feeds then use ControlChars.Lf. If you want to split on carriage return/line feed pairs then use ControlChars.CrLf or ControlChars.NewLine. If you want to split on whatever the standard line break is for the current system then use Environment.NewLine.
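    A sketch of splitting on line breaks and removing quotes. In .NET 2.0, the String.Split method has no overload that takes a single String, so splitting on the two-character ControlChars.CrLf sequence needs the overload that takes a String array plus a StringSplitOptions value. The LineSplitter module is hypothetical, and listing CrLf, Lf and Cr together assumes the input may mix line-break styles. RemoveQuotes also shows the doubled-quote escape inside a String literal.

```vb
Imports System
Imports Microsoft.VisualBasic

Public Module LineSplitter
    ' Splits text into lines whether it uses CR/LF (Windows), LF (Unix) or a bare CR.
    ' CrLf is listed first so a Windows line break counts as one separator, not two.
    Public Function SplitLines(ByVal text As String) As String()
        Dim separators As String() = New String() {ControlChars.CrLf, ControlChars.Lf, ControlChars.Cr}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' """" is a one-character String holding a double quote; """"c is the Char form.
    Public Function RemoveQuotes(ByVal text As String) As String
        Return text.Replace("""", String.Empty)
    End Function

    Sub Main()
        Dim text As String = "first ""line""" & ControlChars.CrLf & "second" & ControlChars.Lf & "third"
        For Each line As String In SplitLines(RemoveQuotes(text))
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

    RemoveEmptyEntries also drops blank lines; pass StringSplitOptions.None instead if blank lines matter.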

  17. #17

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hey,
    none of the methods you explained worked with Split...

    Instead I use Replace(Chr(x), " ") when I extract the string (x being the ASCII value of the character I want to replace).
    I ran into this issue when extracting strings from Excel cells.

    Is there a way to compare a string to a string array?

    When I do:
    Code:
    For each Word as string in ArrayofString
     if CompareArray.indexof(word) = -1 then
     end if
    Next
    I get an Overload error...




    PS: To check for a number I use IsNumeric.
    I thought it was a method of String, so I could not find it at first.
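    IsNumeric is a Microsoft.VisualBasic function rather than a String method, which is why it doesn't show up in the member list of a String variable. A sketch of using it to drop numeric tokens such as 2007 before counting (the NumberFilter module is hypothetical):

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Public Module NumberFilter
    ' Keeps every token that IsNumeric rejects, so "2007" and "2008" are dropped.
    Public Function RemoveNumbers(ByVal tokens As String()) As List(Of String)
        Dim kept As New List(Of String)()
        For Each token As String In tokens
            If Not IsNumeric(token) Then
                kept.Add(token)
            End If
        Next
        Return kept
    End Function

    Sub Main()
        For Each token As String In RemoveNumbers("sales in 2007 and 2008".Split(" "c))
            Console.WriteLine(token)
        Next
    End Sub
End Module
```

    IsNumeric accepts any string it can convert to a number, so decimals are dropped as well as whole years.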

  18. #18
    Addicted Member ThatSamiam
    Join Date
    Apr 2007
    Posts
    128

    Re: [VB 2005] Frequency Dictionary

    To get rid of non-alphabetic characters, try using regular expressions rather than Split and Replace. It will require less code and run faster.

  19. #19
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    Quote Originally Posted by ThatSamiam
    To get rid of non-alphabetic characters, try using regular expressions rather than Split and Replace. It will require less code and run faster.
    That's a fine suggestion. The Regex class actually has a Split method itself, so you can split on a pattern instead of a character or substring.
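    A sketch of Regex.Split for this thread's data, assuming a "word" is a run of English letters and apostrophes; the pattern and the RegexWords module name are illustrative, not a definitive definition of a word.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module RegexWords
    ' Splits on any run of characters that is not a letter or apostrophe, so
    ' punctuation, digits and whitespace all act as separators. Widen the
    ' pattern (e.g. \p{L} for non-English letters) if your data needs it.
    Public Function SplitWords(ByVal text As String) As List(Of String)
        Dim words As New List(Of String)()
        For Each piece As String In Regex.Split(text, "[^A-Za-z']+")
            ' Regex.Split yields an empty string when the text starts or ends with a separator.
            If piece.Length > 0 Then
                words.Add(piece.ToLower())
            End If
        Next
        Return words
    End Function

    Sub Main()
        For Each w As String In SplitWords("(hello .bye ~approx 2008")
            Console.WriteLine(w)
        Next
    End Sub
End Module
```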

  20. #20

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hi, thanks.
    I am actually reading about regex, and it looks very promising.
    The website www.regular-expressions.info is well done and clear.

    However, do you know why I get an error when doing:

    For each Word as string in ArrayofString
    if CompareArray.indexof(word) = -1 then
    end if
    Next

    thanks

  21. #21
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    Please do not ever ask about an error without specifying what the error is. We shouldn't have to guess.

  22. #22

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [VB 2005] Frequency Dictionary

    Hi,
    sorry for forgetting to mention the error.
    I get "Overload resolution failed because no accessible 'IndexOf' accepts this number of arguments."

  23. #23
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [VB 2005] Frequency Dictionary

    Now that we know what we're looking for, the solution is easy to see. For a start, you need to take notice of what IntelliSense tells you. While you were typing it would have shown you what parameters were needed, and you would have seen that IndexOf needs more than one parameter. You'd also have seen that Array.IndexOf is Shared, so it isn't called on an array instance: you call it on the Array class and pass the array to search as the first argument. The MSDN documentation would then have shown you the correct way to call it:
    vb.net Code:
    If Array.IndexOf(CompareArray, word) = -1 Then
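    Putting that call to use, here is a sketch that skips words found in a stop-word array while counting. The StopWordFilter module and the stop-word list itself are hypothetical.

```vb
Imports System
Imports System.Collections.Generic

Public Module StopWordFilter
    ' Hypothetical list of words to leave out of the frequency count.
    Private ReadOnly StopWords As String() = New String() {"a", "an", "and", "of", "the"}

    Public Function CountIgnoringStopWords(ByVal words As String()) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)()
        For Each word As String In words
            ' Array.IndexOf is Shared: the array to search is the first argument,
            ' and -1 means the value was not found.
            If Array.IndexOf(StopWords, word) = -1 Then
                If counts.ContainsKey(word) Then
                    counts(word) += 1
                Else
                    counts.Add(word, 1)
                End If
            End If
        Next
        Return counts
    End Function

    Sub Main()
        Dim counts As Dictionary(Of String, Integer) = CountIgnoringStopWords("the cat and the hat".Split(" "c))
        For Each entry As KeyValuePair(Of String, Integer) In counts
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value)
        Next
    End Sub
End Module
```

    Array.IndexOf uses the default String equality, so the check is case-sensitive and scans the whole array each time; for a long stop-word list, storing the stop words as Dictionary keys would avoid the scan.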

  24. #24

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    Hi,
    thanks.
    I'm still not sure how we can write "Array.IndexOf" when Array has never been declared anywhere.
    I didn't even know we could do that.
    It's like saying Integer.IndexOf... it's a bit strange to me.
    I always thought we needed a variable first, and then we could access its methods through that variable.

  25. #25
    Frenzied Member
    Join Date
    May 2006
    Location
    Toronto, ON
    Posts
    1,093

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    It's a Shared method of the Array class, so you can use it without creating an instance of it.
    (VB/C#) is clearly superior to (C#/VB) because it (has/doesn't have) <insert trivial difference here>.

  26. #26
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    If you want to call instance methods then yes, you do need an instance. Shared methods are members of the class though, not of any particular instance, so they are called on the class.

    Have you ever called MessageBox.Show? Did you declare MessageBox beforehand? No, you didn't. Show is a Shared member of the MessageBox class. Likewise, IndexOf is a Shared member of the Array class, so you call it on the Array class itself and then pass the array instance as a parameter.

    You mention Integer.IndexOf. There is no IndexOf for the Integer type but it does have other Shared members. For instance, if you want to parse a String and return an Integer value you could call Integer.Parse:
    vb.net Code:
    Dim int As Integer = Integer.Parse(myString)
    You don't have an Integer instance to begin with so how could you call Parse on it? You call the Shared Parse method and it returns an Integer instance. There are plenty of other examples throughout the Framework.
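    A small illustration of the same idea with a hypothetical Counter class: Increment is an instance method, so it needs a Counter object, while the Shared FromString is called on the class itself, the same way Integer.Parse is.

```vb
Imports System

' Hypothetical class showing the difference between Shared and instance members.
Public Class Counter
    Private _count As Integer

    ' Instance method: needs a Counter object, because each object has its own _count.
    Public Sub Increment()
        _count += 1
    End Sub

    Public ReadOnly Property Count() As Integer
        Get
            Return _count
        End Get
    End Property

    ' Shared method: belongs to the class, so it is called as Counter.FromString(...)
    ' without an existing instance, and it returns a new instance.
    Public Shared Function FromString(ByVal text As String) As Counter
        Dim result As New Counter()
        result._count = Integer.Parse(text)
        Return result
    End Function
End Class

Module SharedDemo
    Sub Main()
        Dim c As Counter = Counter.FromString("41")   ' called on the class
        c.Increment()                                 ' called on the instance
        Console.WriteLine(c.Count)
    End Sub
End Module
```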

  27. #27

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    Thanks,
    I just said Integer.IndexOf because nothing else came to mind.
    It makes more sense now with your explanation.
    Is there a high-level view of the classes somewhere? I'd like to start from "point zero" and see all the branches. Is that possible?

  28. #28
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    The Help menu of Visual Studio of course.

  29. #29

    Thread Starter
    Addicted Member
    Join Date
    Apr 2007
    Posts
    174

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    Thanks, but the Help menu of VS 2005 just takes me to the famous MSDN online documentation, which I don't find very clear or complete. Sometimes by searching online I find much more about a function than on MSDN.

  30. #30
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: [RESOLVED] [VB 2005] Frequency Dictionary

    Then search online for specific topics when you need to, but there's nowhere else that you're going to find a topic dedicated to every single type and member in the .NET Framework class library like you can in the MSDN Library. That's what you asked for and that's what it provides.
