|
-
May 8th, 2010, 05:15 PM
#1
Thread Starter
Hyperactive Member
[RESOLVED] Loop is slowing down greatly.
I have a loop in my program that iterates at the start about 20,000 - 25,000 times per second. By the end of the loop, the iteration is down to about 6000 per second.
Here is my code.
Code:
Sub FindAnagrams(ByVal s As String)
' Remove the string we are currently using from the queue
If Not AnagramQueue.Count = 0 Then AnagramQueue.RemoveAt(0)
' Loop through each character in the string
For n As Integer = 0 To s.Length - 1
' Since all Anagrams of a particular word have the same exact letters,
' alphabetizing them will help us weed out the duplicates.
AlphaString = Alphabetize(s)
' This loop lops of the nth letter and adds the rest of the string to our lists.
While AlphaString.Length > 0
NumPathsSearched += 1
' If the anagram hasn't been found yet, add it.
If Not FoundAnagrams.Contains(AlphaString) Then
AnagramQueue.Add(AlphaString)
FoundAnagrams.Add(AlphaString)
UpdateAnagramUI()
End If
' Prevent an OutOfRange exception
If n >= AlphaString.Length Then
Exit While
Else
AlphaString = AlphaString.Remove(n, 1)
End If
End While
' Recurse
While AnagramQueue.Count > 0
FindAnagrams(AnagramQueue(0))
End While
Next
End Sub
AnagramQueue and FoundAnagrams are List(Of String).
The Alphabetize function merely returns the string with it's Chars in abc order.
Can you see any bottlenecks or any reasons for the slowdown?
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 05:20 PM
#2
Re: Loop is slowing down greatly.
As you are adding items to your collections throughout the loop, and checking for items that are already in the collection, then your code is bound to slow down.
Searching through a collection of 1 item is bound to be much faster than searching through a collection of 1000 items isn't it?
-
May 8th, 2010, 05:37 PM
#3
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Are you saying that FoundAnagrams.Contains is what is slowing it down?
I have a piece of code after that looks like so
Code:
For Each item In FoundAnagrams
If AnagramDict.ContainsKey(item) Then
AddItem(lbWords, AnagramDict(item))
End If
Next
AnagramDict has over 160,000 items in it.
Found Anagrams rarely has over 20,000 items.
The code above runs instantly, meaning for all 20,000-ish items in FoundAnagrams it searches the AnagramDict.
Though, AnagramDict is a Dictionary(Of String, List(Of String))
Where exactly do you see the bottleneck?
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 06:15 PM
#4
Re: Loop is slowing down greatly.
Do you mind if I suggest a different approach...?
I take it you have a big list/array/dictionary/collection of words somewhere. Each of these words could be assigned a number (a erm Word/UInt). This number could be generated by assigning a bit to each letter (a=1, b=2, c=4 ... y=16777216, z=33554432) with the remaining 6 high bits representing the words length (1-63<<26). The Word number pairs could then be sorted first by the number, then alphabetically. This only needs to be done once.
It's quick to generate a number for any combination of letters. Its equally quick to find all the word number pairs that share the particular number (binary search). All that is required then is to make sure that the found words have the same number of letter duplicates as the search word.
The system lends itself to all sorts of word games, not just anagrams, and is blindingly fast.
-
May 8th, 2010, 06:29 PM
#5
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Are you talking about my dictionary of unique words, AnagramDict?
The key is the alphabetized string of the word. The value is a List(Of String) containing all the words that match.
Creating the dictionary isn't the issue. It's super fast. Also, searching the dictionary is super fast.
The slowdown is inside the method I posted in the original post.
Should I changed FoundWords to a different collection type?
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 06:43 PM
#6
Re: Loop is slowing down greatly.
Okay, using the anagram searching system I'm talking about (assigning each word a number based on it's length and letters), locating all the possible anagrams takes next to no time. The slow bit is the time it takes to display them on the screen.
-
May 8th, 2010, 06:52 PM
#7
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Are you finding all anagrams that are the same length of the input word?
My program find all anagrams from the input word of any length up to the length of the word.
If the input word in 6 letters long it will find all the 6-, 5-, 4-, 3- and 2- letter words.
Which one are you doing? Anagrams of the exact length of the input or all possible anagrams?
Also, I should add that the program is only slow on words over 15 characters long. It also depends on the diversity of the word.
Words with more of the same letters take considerably less time.
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 07:04 PM
#8
Re: Loop is slowing down greatly.
Finding anagrams of the same length is quickest but the approach lends itself to both.
The program I wrote did both, it was based on query strings with wildcards (scabble and crosswords)
Rather than one binary search it becomes a sequential bitwise search from 0 to the 'search number'.
[pseudo]If (SearchNumber And WordNumber) = Wordnumber Then we have a possible anagram, check for duplicates.[/pseudo]
The key is that working with numbers is much faster than working with strings.
Last edited by Milk; May 8th, 2010 at 07:08 PM.
Reason: more bumf
W o t . S i g
-
May 8th, 2010, 07:11 PM
#9
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Ah, I see. Let me tinker with this for a bit. Thanks.
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 07:49 PM
#10
Re: Loop is slowing down greatly.
oopsy in my Pseudo...
Code:
If (SearchNumber Or (WordNumber And LetterMask)) = SearchNumber Then...
I hope you get what I mean :/
Last edited by Milk; May 8th, 2010 at 07:57 PM.
Reason: double oopsy, doh!
W o t . S i g
-
May 8th, 2010, 10:14 PM
#11
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Ok, from what you said in post #4, I would assign a bit to each letter.
So, the word cab would be 7 (111 in binary). Now the length is three. 3 << 26 is 201326592 for a total of 201326599. That's fine and dandy.
Let's say we have a word, cabb. What would I do with the extra b? It's bit is already used.
Of course, I may have misunderstood what you were trying to say.
Prefix has no suffix, but suffix has a prefix.
-
May 8th, 2010, 11:59 PM
#12
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Ok, I got it figured out but it is still probably pretty inefficient.
Here is my binarysearch function
Code:
Private Function BinarySearch(ByRef List As List(Of Long), ByVal value As Long) As Boolean
Dim low As Long = 0
Dim high As Long = List.Count - 1
Dim midv As Long = 0
While (low <= high)
midv = (low + high) / 2
If (List(midv) > value) Then
high = midv - 1.0
ElseIf (List(midv) < value) Then
low = midv + 1
Else
Return True
End If
End While
Return False
End Function
I even rewrote my dictionary function to store words as numbers.
Here is the function I use to get the binary value of a word.
Code:
Function GetBinary(ByVal s As String) As Long
Dim alpha As String = "abcdefghijklmnopqrstuvwxyz"
Dim l As Long
For Each c In s
l += 1 << alpha.IndexOf(c)
Next
l += s.Length << 26
Return l
End Function
I can tell my algorithm for making anagrams is still super inefficient. In a benchmark test word I found 2.98 million possible anagrams. Only 40,213 of them were unique. That means I did 2.94 million unnecessary searches.
Every piece of code on anagram algorithms I find on the net are in c++ or some older language. I can't even begin to convert them to vb.
Btw, the benchmark word was bigbiggerbiggestsmall. The entire test took 63 seconds. Average of 47k+ searches per second.
Let me know if you see any spots I can make my code more efficient.
Thanks.
Prefix has no suffix, but suffix has a prefix.
-
May 9th, 2010, 02:30 AM
#13
Thread Starter
Hyperactive Member
Re: Loop is slowing down greatly.
Well, I found out the Binary thing isn't going to work. Some words have the same value even though they have different letters.
Using my GetBinary function aaaad and bacca have the same value. I guess I must look for another method.
Prefix has no suffix, but suffix has a prefix.
-
May 9th, 2010, 07:01 AM
#14
Re: Loop is slowing down greatly.
aaaad and bacca should not generate the same number, caaad and bacca should though. It looks like you are adding, not setting bits. l = l Or (1 << alpha.IndexOf(c))
caaad is obviously not an anagram of bacca so a further test is needed to determine if they both share the same number of letter duplicates. The point being this final test, although slow, only needs to be performed on a very small proportion of the words. The numbers let you weed out most of the words very quickly.
As you are interested in all the possible anagrams of all lengths then the Binary search is perhaps not so useful. A sequential search through the ordered list/array would be better.
Here's a small sample of the SOWPODS list, scored and ordered.
Code:
Index Code Word
133876 537845776 STERNPORT
133877 537845776 STERNPOST
133878 537845777 PATENTORS
133879 537845777 PATRONESS
133880 537845777 PATRONNES
133881 537845777 PERSONATE
133882 537845777 PROSTERNA
133883 537845777 TRANSPOSE
133884 537845780 COPRESENT
133885 537845780 STONECROP
133886 537845781 COPARENTS
133887 537845781 PORTANCES
133888 537845781 SPORTANCE
133889 537845784 DROPSTONE
133890 537845808 FORESPENT
133891 537845808 PENTROOFS
133892 537845825 TRAGOPANS
133893 537845905 HAPTERONS
133894 537845920 SHOPFRONT
133895 537846016 POSITRONS
133896 537846016 SORPTIONS
133897 537846016 TROPONINS
133898 537846017 PARPOINTS
133899 537846017 PROTISTAN
133900 537846017 SOPRANIST
133901 537846017 SPIRATION
133902 537846020 CONSCRIPT
133903 537846020 OPTRONICS
133904 537846020 PROCINCTS
133905 537846032 ENTROPIES
133906 537846032 INTERPOSE
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|