DanielLH
Jun 16th, 2008, 11:28 AM
Does anyone have any suggestions of how I could implement the following efficiently? I had a go, but started using too many loops and things got messy.
I have an input string that is a group of words without spaces, e.g. "onetwothree". The number of words can vary.
I have a sentence such as "apple banana one two three pear strawberry"
I need a function that would identify and return "one two three" as it appears in the sentence. If the sentence contained "one twothree" or "onetwo three" or even "on e tw o thr ee" instead, then that variant, spaces included, should be returned.
Also, what if there are multiple instances of the words in the sentence, e.g. "apple onetwo three banana onetwo th r e e pear"?
Any suggestions? :confused:
NickThissen
Jun 16th, 2008, 11:38 AM
This is not math?
Oh well... Perhaps remove all the spaces first (from both the word you are looking for and the sentence) and then check for the existence of the word.
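A minimal sketch of that idea in VB.NET, using the example strings from the first post. The module and variable names are just placeholders. Note that this only tells you whether the word is present; it does not recover the spaced variant from the original sentence.

```vbnet
Imports System

Module StripAndCheck
    Sub Main()
        Dim sentence As String = "apple banana one two three pear strawberry"
        Dim searchText As String = "onetwothree"

        ' Remove every space from both strings, then do a plain substring test.
        Dim strippedSentence As String = sentence.Replace(" ", "")
        Dim strippedSearch As String = searchText.Replace(" ", "")

        Console.WriteLine(strippedSentence.Contains(strippedSearch)) ' prints True
    End Sub
End Module
```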
riteshjain1982
Jun 16th, 2008, 12:00 PM
I'm a bit confused too, but what about this?
Dim strSentence As String = "apple banana one two three pear strawberry"
Dim strInput As String = "onetwothree"
' Split on spaces and drop empty entries, so double spaces need no special handling
Dim arrWord() As String = strSentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
For Each strWord As String In arrWord
    ' Report every word of the sentence that occurs somewhere inside the input string
    If strInput.IndexOf(strWord) > -1 Then
        MessageBox.Show("Found: " & strWord)
    End If
Next
:D
DanielLH
Jun 16th, 2008, 02:42 PM
You're right. Not math. I'll move the thread and try explaining better.
Thanks.
jemidiah
Jun 17th, 2008, 05:23 AM
My algorithm:
1. Remove all spaces from your test sentence. Do so in the following way: Iterate through the test sentence. Whenever a non-space character is encountered, append it to the new sentence and store the number of spaces that have been encountered so far in a separate array, using incrementing indices.
2. Search for an exact match of your input string in the stripped-down sentence.
3. When a match is found, remember the position of the first character of the match. Now go back to the array you got from step 1, and look at the entry at that position: it is the number of spaces that preceded the match, so adding it to the position tells you where the match started in the original sentence (the one with spaces). To find out how many characters long the match is, also look at the entry in the array that corresponds to the last character of the match; the difference between the two entries is the number of spaces inside the match.
Ex:
Input String = "onetwothree"
Test Sentence = "apple onetwo three banana onetwo th r e e pear"
After applying step one we have these two:
New Test Sentence = "appleonetwothreebananaonetwothreepear"
Array = (a->0, p->0, p->0, l->0, e->0, o->1, n->1, e->1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3...)
After applying step two we find the first match, "onetwothree", inside "appleonetwothreebananaonetwothreepear", starting at character position 6.
Now apply step three to get the return value:
At index 6 of the array (I'm assuming the first index is a 1), we have a 1. At index 6 + len(Input String) - 1 = 6 + 11 - 1 = 16, corresponding to the last "e" in "three", we have a 2. We then know there must have been 2-1 = 1 space inside the "onetwothree" variant in the original test sentence. Since we also know that the "onetwothree" variant starts at index 6+1=7 [only one space preceded it], we can get the "onetwothree" variant from the original test sentence by starting at character position 7 and proceeding for len(Input String) + (2-1) = 11+1 = 12 characters.
Looking back at our test sentence, "apple onetwo three banana onetwo th r e e pear", sure enough we'll get "onetwo three" if we start at character position 7 and proceed for 12 characters.
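For what it's worth, here is a rough VB.NET sketch of the three steps. The function name FindSpacedMatch and the variable names are made up for illustration. It assumes the input string itself contains no spaces, and it uses 0-based indices (as .NET strings do) rather than the 1-based positions in the walkthrough, so the match starts at stripped index 5 and original index 6.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module SpacedMatch
    ' Returns the first variant of searchText found in sentence, keeping the
    ' sentence's own spaces, or Nothing when there is no match.
    Function FindSpacedMatch(ByVal sentence As String, ByVal searchText As String) As String
        If String.IsNullOrEmpty(searchText) Then Return Nothing

        Dim stripped As New StringBuilder()
        Dim spacesBefore As New List(Of Integer)()
        Dim spaceCount As Integer = 0

        ' Step 1: strip spaces, recording how many preceded each kept character.
        For Each ch As Char In sentence
            If ch = " "c Then
                spaceCount += 1
            Else
                stripped.Append(ch)
                spacesBefore.Add(spaceCount)
            End If
        Next

        ' Step 2: exact (ordinal) search in the stripped sentence.
        Dim hit As Integer = stripped.ToString().IndexOf(searchText, StringComparison.Ordinal)
        If hit < 0 Then Return Nothing

        ' Step 3: map both ends of the match back to the original sentence.
        Dim lastHit As Integer = hit + searchText.Length - 1
        Dim startInOriginal As Integer = hit + spacesBefore(hit)
        Dim endInOriginal As Integer = lastHit + spacesBefore(lastHit)
        Return sentence.Substring(startInOriginal, endInOriginal - startInOriginal + 1)
    End Function

    Sub Main()
        ' Prints: onetwo three
        Console.WriteLine(FindSpacedMatch("apple onetwo three banana onetwo th r e e pear", "onetwothree"))
    End Sub
End Module
```

The length passed to Substring works out to len(Input String) plus the difference between the two array entries, which is the same 11 + (2-1) = 12 as in the example.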
If you wish to find multiple instances of the same words (rather than simply ignoring any instances of your search string past the first one) you could delete the part of the sentence up to and including the first instance you find and pass the newly culled sentence to the same routine. This would be recursively dealing with the problem of multiple instances.
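A possible sketch of the multiple-instance case in VB.NET. Instead of literally deleting the front of the sentence and recursing, this version uses a loop that resumes the search just past each match in the stripped sentence, which should have the same effect while building the array only once. FindAllSpacedMatches is a hypothetical name, and matches are assumed not to overlap.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module AllSpacedMatches
    ' Returns every non-overlapping variant of searchText found in sentence,
    ' each with the sentence's own spaces preserved.
    Function FindAllSpacedMatches(ByVal sentence As String, ByVal searchText As String) As List(Of String)
        Dim results As New List(Of String)()
        If String.IsNullOrEmpty(searchText) Then Return results

        ' Strip spaces, recording how many preceded each kept character.
        Dim stripped As New StringBuilder()
        Dim spacesBefore As New List(Of Integer)()
        Dim spaceCount As Integer = 0
        For Each ch As Char In sentence
            If ch = " "c Then
                spaceCount += 1
            Else
                stripped.Append(ch)
                spacesBefore.Add(spaceCount)
            End If
        Next

        Dim haystack As String = stripped.ToString()
        Dim hit As Integer = haystack.IndexOf(searchText, StringComparison.Ordinal)
        While hit >= 0
            ' Map both ends of this match back to the original sentence.
            Dim lastHit As Integer = hit + searchText.Length - 1
            Dim startInOriginal As Integer = hit + spacesBefore(hit)
            Dim endInOriginal As Integer = lastHit + spacesBefore(lastHit)
            results.Add(sentence.Substring(startInOriginal, endInOriginal - startInOriginal + 1))

            ' Resume the search just past the match that was found.
            hit = haystack.IndexOf(searchText, lastHit + 1, StringComparison.Ordinal)
        End While
        Return results
    End Function

    Sub Main()
        ' Prints "onetwo three" and then "onetwo th r e e", one per line.
        For Each found As String In FindAllSpacedMatches("apple onetwo three banana onetwo th r e e pear", "onetwothree")
            Console.WriteLine(found)
        Next
    End Sub
End Module
```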
To my knowledge everything I've mentioned here would be easy to code, though to do so efficiently for exceedingly large strings (megabytes or gigabytes long) would be much more of a hassle.