-
Jun 11th, 2008, 07:35 AM
#1
Thread Starter
PowerPoster
[RESOLVED] Need help with replace issue
Hi guys,
say I have the following data in 4 array elements:
I loop through each item in the array like this:
Code:
strEmailIn = Split(stremail, ",")
Call BubbleSort1(strEmailIn)
For C = 0 To UBound(strEmailIn())
If C = 0 Then
If Not InStr(strEmailIn(C), "@") = 0 Then
strHTML = Replace(strHTML, strEmailIn(C), "<a href=""mailto:" & strEmailIn(C) & """> " & strEmailIn(C) & "</a>")
Else
strHTML = Replace(strHTML, strEmailIn(C), "<a href=""http://" & Replace(strEmailIn(C) & """ target=""_blank""", "http://", "") & """> " & Replace(strEmailIn(C), "http://", "") & "</a> ")
End If
Else
If (strEmailIn(C) <> strEmailIn(C - 1)) Then
If Not InStr(strEmailIn(C), "@") = 0 Then
strHTML = Replace(strHTML, strEmailIn(C), "<a href=""mailto:" & strEmailIn(C) & """> " & strEmailIn(C) & "</a>")
Else
strHTML = Replace(strHTML, strEmailIn(C), "<a href=""http://" & Replace(strEmailIn(C) & """ target=""_blank""", "http://", "") & """> " & Replace(strEmailIn(C), "http://", "") & "</a> ")
End If
End If
End If
Next C
I'm doing an RTB to HTML conversion. The problem is, for example:
www.vbforums.com has been replaced with:
Code:
<a href="http://www.vbforums.com" target="_blank""> www.vbforums.com</a>
but when the loop reaches:
http://www.vbforums.com it replaces the previous www.vbforums.com with another:
Code:
<a href="http://www.vbforums.com" target="_blank""> www.vbforums.com</a>
and thus my html goes off.
How can I work around this
-
Jun 11th, 2008, 12:20 PM
#2
Fanatic Member
Re: Need help with replace issue
I'm not sure, but I think it is because when you replace a string with a larger string and continue searching the string from the last character place, you're actually searching the replacement.
For example, let's say I want to replace the "test.com" with "www.test.com" in the string "http://test.com".
Using your method, I would search through the string one character at a time and arrive at "test.com" at position 8. Replacing "test.com" with "www.test.com" I would get this string: "http://www.test.com". The loop would increment by one, so we'd be looking for "test.com" starting at position 9. Position 9, however, would be the second "w" of the "www" portion. Therefore, when we get to position 12, we're at the start of the next "test.com," which is actually the middle of the replacement.
To fix this, there are a few options. You can set up a do loop block outside the for next loop and loop until a flag gets set to true. When you find the string and replace it, you can exit the for loop. You wouldn't set the flag until you found no further replacements. The beginning of the for next loop would have to be a variable that would be changed when you replace text to be the position after the text. By doing this, you'd be exiting the for next loop then reinitializing it with the new parameters.
I'm not sure of any other way to do that, as modifying the array in your code would change the UBound, and thus it would change the parameters of the For...Next loop, which would have to be reinitialized.
-
Jun 11th, 2008, 09:52 PM
#3
Re: Need help with replace issue
Before doing the replace, tag first the text you're gonna process later with a loop, e.g. nest as a comment with array index as tag value or some other unique representation or key that would not be affected by the data in the array.
Replace at loop would then become:
strHTML = Replace(strHTML, "<!--" & C & "-->", "<a href=""mailto:" & strEmailIn(C) & """> " & strEmailIn(C) & "</a>")
And if you end up with "<!--" & C & "-->" tags after processing then there's something wrong with the logic as it missed replacing these items.
Last edited by leinad31; Jun 11th, 2008 at 09:56 PM.
-
Jun 12th, 2008, 04:27 AM
#4
Thread Starter
PowerPoster
Re: Need help with replace issue
thanks for the suggestions guys. Leinad31, please explain your post to me. It seems like a great idea but i'm confused
-
Jun 12th, 2008, 05:07 AM
#5
Re: Need help with replace issue
First pass through array you convert instances of text to <!--1-->, <!--2-->, etc. Second pass through array converts these comments to links, since they are in comment format you won't accidentally replace other text (except if <!--1--> and similar already exists in strHTML before you began any processing).
-
Jun 12th, 2008, 05:14 AM
#6
Re: Need help with replace issue
leinad, you always give great advice, but sometimes an example is worth a thousand words.
-
Jun 12th, 2008, 06:15 AM
#7
Thread Starter
PowerPoster
Re: Need help with replace issue
correct me if i'm wrong. But I replaced all instances of "http://" with "" before running my loops and it works great
-
Jun 12th, 2008, 06:15 AM
#8
Re: Need help with replace issue
strHTML = "www.forum.co.za___www.vbforums.com___http://[email protected]"
Note that strHTML contains strings in your array. You iterate through your array and replace with comment format. You end up with
strHTML = "<!--0-->___<!--1-->___<!--2-->___<!--3-->"
You iterate again through the array to replace comment format with link format. Such as
strHTML = Replace(strHTML, "<!--" & C & "-->", "<a href=""mailto:" & strEmailIn(C) & """> " & strEmailIn(C) & "</a>")
If C was zero then it would replace <!--0--> and you'll get
strHTML = "<a href=""mailto:www.forum.co.za"">www.forum.co.za</a>___<!--1-->___<!--2-->___<!--3-->"
Just continue the process.
You can make other variations (or use other string tokens instead of a comment)... Just bear in mind the central idea, which is the use of tokens. I used the comment form of token to keep the sample simple.
The downside of the comment form is that if a token is not converted to a link successfully, the text is no longer visible on the page. An alternative would be to use the token "<a>" & strEmailIn(C) & "</a>" and search for that when replacing.
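To make the two passes concrete, here is a minimal sketch, not code from the thread. LinkifyWithTokens and MakeLink are hypothetical names. It assumes items() holds the strings extracted from strHTML and that no item is a substring of another item; that case needs extra care.
Code:
```vb
' Sketch only. LinkifyWithTokens and MakeLink are hypothetical names.
' Assumes no element of items() is a substring of another element.
Public Function LinkifyWithTokens(ByVal strHTML As String, items() As String) As String
    Dim C As Long

    ' Pass 1: swap every item for a token that cannot occur in the data.
    For C = LBound(items) To UBound(items)
        strHTML = Replace(strHTML, items(C), "<!--" & C & "-->")
    Next C

    ' Pass 2: swap every token for its finished link. Only tokens are
    ' searched for here, so link text inserted earlier is never touched.
    For C = LBound(items) To UBound(items)
        strHTML = Replace(strHTML, "<!--" & C & "-->", MakeLink(items(C)))
    Next C

    LinkifyWithTokens = strHTML
End Function

' Builds a mailto: link for anything containing "@", otherwise an http link.
Private Function MakeLink(ByVal s As String) As String
    Dim bare As String

    If InStr(s, "@") > 0 Then
        MakeLink = "<a href=""mailto:" & s & """>" & s & "</a>"
    Else
        bare = Replace(s, "http://", "")
        MakeLink = "<a href=""http://" & bare & """ target=""_blank"">" & bare & "</a>"
    End If
End Function
```
If any "<!--n-->" token is still present in the result, that item was never converted in pass 2, which makes logic errors easy to spot.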
-
Jun 12th, 2008, 06:16 AM
#9
Re: Need help with replace issue
 Originally Posted by Nitesh
correct me if i'm wrong. But I replaced all instances of "http://" with "" before running my loops and it works great 
It will work until a set of data you didn't anticipate comes along. Consider www.google.com.ph followed later by www.google.com.uk, finally followed by www.google.com, which affects the links of the previous two.
Use tokens.
Last edited by leinad31; Jun 12th, 2008 at 08:27 PM.
-
Jun 12th, 2008, 07:58 AM
#10
Fanatic Member
Re: Need help with replace issue
-
Jun 12th, 2008, 07:30 PM
#11
Re: Need help with replace issue
-
Jun 12th, 2008, 09:10 PM
#12
Re: Need help with replace issue
The method suggested by leinad is the best way for your case. With this method you don't need to use sorting.
You may clearly see the problem when one item is a substring of other one or more items.
For example, with
str = "ADBCDEF"
now you want to replace "CD" with "xCDy" and replace "D" with "Dx" and want to have final as: str = "ADxBxCDyEF"
Method 1 (2 steps):
str = Replace(str, "CD", "xCDy") '-- "ADBxCDyEF"
str = Replace(str, "D", "Dx") '-- "ADxBxCDxyEF" : wrong!
Method 2 (2 steps):
str = Replace(str, "D", "Dx") '-- "ADxBCDxEF"
str = Replace(str, "CD", "xCDy") '-- "ADxBxCDyxEF" : wrong!
Method 3 (4 steps): with #1 and #2 as temp tokens
str = Replace(str, "CD", "#1") '-- "ADB#1EF"
str = Replace(str, "D", "#2") '-- "A#2B#1EF"
str = Replace(str, "#1", "xCDy") '-- "A#2BxCDyEF"
str = Replace(str, "#2", "Dx") '-- "ADxBxCDyEF" : correct!
-
Jun 13th, 2008, 01:11 AM
#13
Thread Starter
PowerPoster
Re: Need help with replace issue
this is getting hectic. Say I have these within a string:
Code:
this is a test
www.vbforums.com
http://www.vbforums.com
Using my regexp function I extract www.vbforums.com and http://www.vbforums.com
I now add these two strings to an array separated by commas.
Then I loop through the array and replace the first item with its array index so:
www.vbforums.com becomes <--0--> and
http://www.vbforums.com becomes <--1-->
Now how will I go about replacing that.
Please help
-
Jun 13th, 2008, 01:15 AM
#14
Re: Need help with replace issue
 Originally Posted by Nitesh
this is getting hectic. Say I have these within a string:
Code:
this is a test
www.vbforums.com
http://www.vbforums.com
Using my regexp function I extract www.vbforums.com and http://www.vbforums.com
I now add these two strings to an array separated by commas.
Then I loop through the array and replace the first item with its array index so:
www.vbforums.com becomes <--0--> and
http://www.vbforums.com becomes <--1-->
Now how will I go about replacing that.
Please help
If you can Replace() A's with B's then you can Replace() B's with A's. Convert instances of domain to token:
strHTML = Replace(strHTML, strEmailIn(C), "<!--" & C & "-->")
Then as already explained convert tokens to links. Don't convert directly from domains into links.
-
Jun 13th, 2008, 02:18 AM
#15
Thread Starter
PowerPoster
Re: Need help with replace issue
sorry for my ignorance,
but using this text
www.google.com
www.google.com.au
and www.google.com.ph
and this code:
Code:
For B = 0 To UBound(strEmailIn())
tempstr = stremailin(b)
strHTML = Replace(strHTML, tempStr, "<-- " & B & " -->")
Next B
www.google.com becomes <--0-->
www.google.com.au becomes <--0-->.au
etc. is this right
-
Jun 13th, 2008, 02:35 AM
#16
Re: Need help with replace issue
Yup that would happen, really depends on the data. Please post your regex code. I think it would be best to implement introduction of token there.
-
Jun 13th, 2008, 02:45 AM
#17
Thread Starter
PowerPoster
Re: Need help with replace issue
but then that takes me back to square 1 .
this is my regex code: please advise
Code:
Public Function rgxExtract(Optional ByVal Target As Variant, Optional Pattern As String = "", Optional ByVal Item As Long = 0, Optional CaseSensitive As Boolean = False, Optional FailOnError As Boolean = True, Optional Persist As Boolean = False) As Variant
Dim arrEmails() As String
Const rgxPROC_NAME = "rgxExtract"
Static oRE As Object 'VBScript_RegExp_55.RegExp
'Static declaration means we don't have to create
'and compile the RegExp object every single time
'the function is called.
Dim oMatches As Object 'VBScript_RegExp_55.MatchCollection
On Error GoTo ErrHandler
rgxExtract = Null 'Default return value
'NB: if FailOnError is false, returns Null on error
If IsMissing(Target) Then
'This is the signal to dispose of oRE
Set oRE = Nothing
Exit Function 'with default value
End If
'Create the RegExp object if necessary
If oRE Is Nothing Then
Set oRE = CreateObject("VBScript.Regexp")
End If
With oRE
'Check whether the current arguments (other than Target)
'are different from those stored in oRE, and update them
'(thereby recompiling the regex) only if necessary.
If CaseSensitive = .IgnoreCase Then
.IgnoreCase = Not .IgnoreCase
End If
.Global = True
.MultiLine = True
If Pattern <> .Pattern Then
.Pattern = Pattern
End If
'Finally, execute the match
If IsNull(Target) Then
rgxExtract = Null
Else
Set oMatches = oRE.Execute(Target)
If oMatches.count > 0 Then
retstring = ""
For j = 0 To oMatches.count - 1
retstring = retstring & oMatches(j) & ","
Next j
If retstring <> "" Then
retstring = Left(retstring, Len(retstring) - 1)
rgxExtract = retstring
Exit Function
End If
Else
rgxExtract = Null
End If
End If
End With
'Tidy up and normal exit
If Not Persist Then Set oRE = Nothing
Exit Function
ErrHandler:
Set oRE = Nothing
' End If
End Function
this is how I call it:
Code:
stremail = rgxExtract(strHTML, "(https?://)?(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)*", , True, False, False)
-
Jun 13th, 2008, 02:59 AM
#18
Thread Starter
PowerPoster
Re: Need help with replace issue
ok im finally understanding.
but now this is my output
www.google.com
www.google.com and then .au separately.
How can I fix that
-
Jun 13th, 2008, 03:18 AM
#19
Re: Need help with replace issue
My VB regex is kinda fuzzy and I won't have time to research... but doesn't it have a property that returns the start position and length of the match? You can then get the text before, after and between the matches (or before, after and between the domains) and rebuild strHTML with the tokens inserted using string concatenation (or a better alternative to concat) rather than using Replace.
Are there performance issues such as processing lots of HTML text?
-
Jun 13th, 2008, 03:29 AM
#20
Thread Starter
PowerPoster
Re: Need help with replace issue
Hi again,
Thank you so much for helping me . Im almost there. I haven't had any performance issues so far. I usually send fairly small bits of text. I will try the regex functions u spoke about and let you know what happens
-
Jun 13th, 2008, 06:58 AM
#21
Thread Starter
PowerPoster
Re: Need help with replace issue
Hi Leinad31,
Regex doesn't have those properties
-
Jun 13th, 2008, 01:14 PM
#22
Re: Need help with replace issue
You might be referring to the match collection instead of the match object. Did some research: http://www.regular-expressions.info/vbscript.html
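As a quick way to see what each match object exposes, here is a small sketch. DumpMatches is a hypothetical helper name; FirstIndex, Length and Value are the Match properties of the VBScript RegExp library.
Code:
```vb
' Sketch only. DumpMatches is a hypothetical helper for inspecting matches.
Public Sub DumpMatches(ByVal strInput As String, ByVal strPattern As String)
    Dim oRE As Object      ' VBScript_RegExp_55.RegExp
    Dim oMatch As Object   ' VBScript_RegExp_55.Match

    Set oRE = CreateObject("VBScript.RegExp")
    oRE.Global = True      ' return every match, not just the first
    oRE.Pattern = strPattern

    For Each oMatch In oRE.Execute(strInput)
        ' FirstIndex is zero-based; Length is the match length in characters.
        Debug.Print oMatch.FirstIndex, oMatch.Length, oMatch.Value
    Next oMatch
End Sub
```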
For this I would use a specialized procedure rather than a generic regex wrapper... yes, you'll always pass a pattern, execute, and get a match collection, but what you do next depends on what you're trying to accomplish... a CSV return will not always be applicable. You shouldn't be only returning matches... you also need to return the string with the matches replaced by tokens.
Logic would have been something like this.
- Get the match collection and size a string array, say arrTmp, from 0 to matchColl.Count
- Check whether there are matches via the matchColl.Count property. If so, the following steps go under the IF
- Initialize a variable, say lastPos, to 1 for tracking the end of the last match, then iterate through the match collection: For arrIdx = 0 To matchColl.Count - 1
- In loop - based on lastPos and the matchColl(arrIdx).FirstIndex property, Mid(strInput, lastPos, matchColl(arrIdx).FirstIndex - lastPos + 1) gives the part of the string that didn't match. Store this in arrTmp(arrIdx) with the token "<!--" & arrIdx & "-->" concatenated. EDIT: actually you can already concat the link instead
- In loop - after storing the non-matched string with its trailing token in the array, also update your CSV if you want to retain that method: retstring = retstring & oMatches(j) & ","
- In loop - don't forget to update lastPos to the character position after the match, matchColl(arrIdx).FirstIndex + 1 + matchColl(arrIdx).Length, so the next iteration extracts the next non-matched text into arrTmp.
- After loop, with matches - concatenate the trailing non-matched text to the last array element in arrTmp. Check first that lastPos <= Len(strInput) before trying a Mid() or Right().
- ELSE part, when no matches were found for the pattern - simply assign strInput to arrTmp(0)
- END IF
- strOutput, the string with tokens in place, would be Join(arrTmp, ""); also return your CSV list of matches.
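The steps above can be sketched roughly as follows. This is an illustration under assumptions, not tested against the original project: LinkifyMatches is a hypothetical name, the link is concatenated directly instead of a token (as the EDIT suggests), the CSV list is left out, and only a leading "http://" is stripped from the link text.
Code:
```vb
' Sketch only. LinkifyMatches is a hypothetical name.
Public Function LinkifyMatches(ByVal strInput As String, ByVal strPattern As String) As String
    Dim oRE As Object          ' VBScript_RegExp_55.RegExp
    Dim matchColl As Object    ' VBScript_RegExp_55.MatchCollection
    Dim arrTmp() As String
    Dim arrIdx As Long
    Dim lastPos As Long        ' 1-based position just after the previous match
    Dim strMatch As String
    Dim strBare As String

    Set oRE = CreateObject("VBScript.RegExp")
    oRE.Global = True
    oRE.IgnoreCase = True
    oRE.Pattern = strPattern
    Set matchColl = oRE.Execute(strInput)

    ' No matches: return the input unchanged.
    If matchColl.Count = 0 Then
        LinkifyMatches = strInput
        Exit Function
    End If

    ' One slot per match plus one for the trailing text.
    ReDim arrTmp(0 To matchColl.Count)
    lastPos = 1

    For arrIdx = 0 To matchColl.Count - 1
        strMatch = matchColl(arrIdx).Value

        ' Text between the previous match and this one.
        ' FirstIndex is zero-based, Mid$ is one-based.
        arrTmp(arrIdx) = Mid$(strInput, lastPos, matchColl(arrIdx).FirstIndex - lastPos + 1)

        ' Append the link for this match.
        If InStr(strMatch, "@") > 0 Then
            arrTmp(arrIdx) = arrTmp(arrIdx) & "<a href=""mailto:" & strMatch & """>" & strMatch & "</a>"
        Else
            strBare = Replace(strMatch, "http://", "")
            arrTmp(arrIdx) = arrTmp(arrIdx) & "<a href=""http://" & strBare & """ target=""_blank"">" & strBare & "</a>"
        End If

        ' Move past this match.
        lastPos = matchColl(arrIdx).FirstIndex + 1 + matchColl(arrIdx).Length
    Next arrIdx

    ' Trailing text after the last match (Mid$ returns "" when lastPos is past the end).
    arrTmp(matchColl.Count) = Mid$(strInput, lastPos)

    LinkifyMatches = Join(arrTmp, "")
End Function
```
Because each match is handled at its own position, www.google.com and www.google.com.au never interfere with each other, and no Replace() call is run over text that already contains links.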
Last edited by leinad31; Jun 13th, 2008 at 01:43 PM.
-
Jun 13th, 2008, 01:17 PM
#23
Re: Need help with replace issue
On second thought, don't use rgxExtract()... from what I described above you can already place the links rather than doing a Replace() later. The one-size-fits-all approach you're trying with rgxExtract() isn't applicable... or it would have been better off as a wrapper that returns the match collection object, since further processing of the match collection is case by case depending on what you're trying to accomplish. Also, using rgxExtract() shifted the focus back to basic string manipulation functions on its return value; the match collection and match object were forgotten, and the information their properties provide was not taken advantage of.
Last edited by leinad31; Jun 13th, 2008 at 01:46 PM.
-
Jun 13th, 2008, 07:13 PM
#24
Re: Need help with replace issue
We have gone a bit too far. There is an easy way that is also safe from these errors:
* Sort domain names by their length, the longest first. This will make sure if domain name A is a substring of domain B, B will be processed first.
* Replace each domain name in sorted order with a "token" as mentioned such as "[!@#$%--" & c & "--!@#$%]" where c is the index.
* After finishing all domain names, replace all "tokens" with the corresponding link.
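A minimal sketch of these three steps, under assumptions: LinkifyLongestFirst is a hypothetical name, items() is sorted in place, and every item is treated as a web address (the mailto: case is left out to keep it short).
Code:
```vb
' Sketch only. LinkifyLongestFirst is a hypothetical name.
' Note: items() is passed ByRef and is reordered by this function.
Public Function LinkifyLongestFirst(ByVal strHTML As String, items() As String) As String
    Dim i As Long
    Dim j As Long
    Dim temp As String
    Dim bare As String

    ' Step 1: insertion sort, longest item first.
    For i = LBound(items) + 1 To UBound(items)
        temp = items(i)
        j = i - 1
        Do While j >= LBound(items)
            ' Tested separately because VB6 "And" does not short-circuit.
            If Len(items(j)) >= Len(temp) Then Exit Do
            items(j + 1) = items(j)
            j = j - 1
        Loop
        items(j + 1) = temp
    Next i

    ' Step 2: replace each item with its token, longest first, so a short
    ' item can no longer match inside a longer one.
    For i = LBound(items) To UBound(items)
        strHTML = Replace(strHTML, items(i), "[!@#$%--" & i & "--!@#$%]")
    Next i

    ' Step 3: replace each token with the corresponding link.
    For i = LBound(items) To UBound(items)
        bare = Replace(items(i), "http://", "")
        strHTML = Replace(strHTML, "[!@#$%--" & i & "--!@#$%]", _
            "<a href=""http://" & bare & """ target=""_blank"">" & bare & "</a>")
    Next i

    LinkifyLongestFirst = strHTML
End Function
```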
-
Jun 15th, 2008, 02:49 AM
#25
Re: Need help with replace issue
 Originally Posted by anhn
We have gone a bit too far. There is an easy way that is also safe from these errors:
* Sort domain names by their length, the longest first. This will make sure if domain name A is a substring of domain B, B will be processed first.
* Replace each domain name in sorted order with a "token" as mentioned such as "[!@#$%--" & c & "--!@#$%]" where c is the index.
* After finishing all domain names, replace all "tokens" with the corresponding link.
He's using regex after all, so might as well take advantage of that fact. We can consider the results of regex as tokens themselves.
-
Jun 17th, 2008, 01:27 AM
#26
Thread Starter
PowerPoster
Re: Need help with replace issue
Thanks guys,
I've added this code to my module. Please check it for me. I'm sorting by length. Please point out any possible issues for me.
Code:
Public Sub SortByLen(DomArray As Variant)
Dim j As Long
Dim jMin As Long
Dim jMax As Long
Dim temp As Variant
Dim blnSwap As Boolean
jMin = LBound(DomArray)
jMax = UBound(DomArray) - 1
Do
blnSwap = False
For j = jMin To jMax
If Len(DomArray(j)) < Len(DomArray(j + 1)) Then
temp = DomArray(j)
DomArray(j) = DomArray(j + 1)
DomArray(j + 1) = temp
blnSwap = True
End If
Next j
jMax = jMax - 1 'Shrink the range once per pass, not once per comparison
Loop Until Not blnSwap
End Sub
It worked well with the sample in my previous post.
-
Jun 18th, 2008, 01:26 AM
#27
Thread Starter
PowerPoster
Re: Need help with replace issue
Thanks everyone who helped me. I really appreciate it.
-
Jun 18th, 2008, 01:48 AM
#28
Re: [RESOLVED] Need help with replace issue
As long as there are no cases of domains with and without a leading www (if http:// was not included), such as www.xyz.com and xyz.com.
-
Jun 18th, 2008, 01:53 AM
#29
Re: [RESOLVED] Need help with replace issue
Glad to see it works for you now.