[RESOLVED] ASP VBScript: Regular Expression - HTML
Hi,
I've been trying to come up with a regular expression to find HTML tags in strings.
This is what I have so far:
vb Code:
Dim sTags : sTags="address|applet|area|a|base|basefont|big|blockquote|body|br|b|caption|center|cite|code|dd|dfn|dir|div|dl|dt|em|embed|fieldset|font|form|frameset|frame|h1|h2|h3|h4|h5|h6|head|hr|html|iframe|img|input|isindex|i|kbd|legend|link|li|map|menu|meta|object|ol|option|param|pre|p|samp|script|select|small|span|strike|strong|style|sub|sup|table|td|textarea|th|title|tr|tt|ul|u|var"
Dim sPat : sPat = "</?(" & sTags & ")(.|\n)*?>"
The problem is that it treats <email> as a valid tag because it begins with "em"
or <blah> - thinks it's bold
or <header> - thinks it's a <head> tag
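The false positives are easy to reproduce. Here is a minimal sketch, assuming it is run under Windows Script Host (cscript) rather than inside an ASP page, where Response.Write would replace WScript.Echo. The shortened tag list and the sample strings are made up for illustration; the pattern is built the same way as above.

```vbscript
Option Explicit

' Shortened tag list for illustration; the full list behaves the same way.
Dim sTags : sTags = "a|b|em|i|p"
Dim sPat : sPat = "</?(" & sTags & ")(.|\n)*?>"

Dim re : Set re = New RegExp
re.Pattern = sPat
re.IgnoreCase = True

Dim sample
For Each sample In Array("<b>", "</p>", "<email>", "<blah>")
    ' Test returns True when the pattern matches anywhere in the string.
    WScript.Echo sample & " -> " & re.Test(sample)
Next
```

All four samples match, because nothing in the pattern requires the tag name to end where the alternation ends: after "em" or "b", the lazy (.|\n)*? simply consumes the rest of the name.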
I want to use this RE for two purposes -
- Parse HTML so that, when writing text to a web page, valid HTML is rendered as such and any other < and > characters are HTML-encoded.
- Strip HTML out when sending plain-text emails
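For the first purpose, one possible approach is to keep whatever the tag pattern matches and encode everything in between. The sketch below is an assumption about how that could look, not code from this thread: EncodeText and EncodeOutsideTags are hypothetical helper names, EncodeText stands in for Server.HTMLEncode so the sketch also runs under cscript, and the short pattern in the last line is only a placeholder for a working tag pattern.

```vbscript
Option Explicit

' Hypothetical stand-in for Server.HTMLEncode (which only exists inside ASP).
Function EncodeText(ByVal s)
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    EncodeText = s
End Function

' Keeps every substring matched by sPat as-is and encodes the text in between.
Function EncodeOutsideTags(sText, sPat)
    Dim re, m, sOut, iPos
    Set re = New RegExp
    re.Pattern = sPat
    re.IgnoreCase = True
    re.Global = True
    iPos = 1
    sOut = ""
    For Each m In re.Execute(sText)
        ' FirstIndex is zero-based; Mid is one-based.
        sOut = sOut & EncodeText(Mid(sText, iPos, m.FirstIndex + 1 - iPos)) & m.Value
        iPos = m.FirstIndex + m.Length + 1
    Next
    ' Encode whatever follows the last match.
    EncodeOutsideTags = sOut & EncodeText(Mid(sText, iPos))
End Function

' Placeholder pattern for illustration only.
WScript.Echo EncodeOutsideTags("1 < 2 and <b>bold</b>", "</?(b|i|p)>")
```

With the placeholder pattern this prints 1 &lt; 2 and <b>bold</b>. The result is only as good as the pattern passed in, so any tag the pattern wrongly accepts is written out unencoded.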
Any help would be greatly appreciated.
Al
Re: ASP VBScript: Regular Expression - HTML
Introduce a mandatory space after the tag and before the (.|\n). You could use
\s+
Re: ASP VBScript: Regular Expression - HTML
Mendhak,
Thanks for your reply. Wouldn't that exclude tags like <b>, and all closing tags?
Is this what you meant?
Code:
Dim sPat : sPat = "</?(" & sTags & ")\s+(.|\n)*?>"
Al
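That concern can be checked quickly. Here is a minimal sketch, again assuming cscript and a shortened, made-up tag list, that runs the \s+ version against a few sample strings:

```vbscript
Option Explicit

' Shortened tag list for illustration only.
Dim sTags : sTags = "a|b|em|i|p"
Dim sPat : sPat = "</?(" & sTags & ")\s+(.|\n)*?>"

Dim re : Set re = New RegExp
re.Pattern = sPat
re.IgnoreCase = True

Dim sample
For Each sample In Array("<a href=""x"">", "<b>", "</p>", "<email>")
    ' Only tags with whitespace right after the tag name can match.
    WScript.Echo sample & " -> " & re.Test(sample)
Next
```

Only <a href="x"> matches. <email> is now rejected, but so are <b> and </p>, since neither has whitespace after the tag name.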
Re: ASP VBScript: Regular Expression - HTML
I think I've cracked it, or let's say that it's performing okay against the variations I've tried it with.
I've added a word boundary, (\b), right after the tag list -
Code:
Dim sPat : sPat = "</?(" & sTags & ")(\b)(.|\n)*?>"
Al
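For the plain-text email case, the word-boundary pattern can be dropped straight into Replace. Here is a minimal sketch, assuming cscript, a shortened tag list, and a made-up sample string:

```vbscript
Option Explicit

' Shortened tag list for illustration only.
Dim sTags : sTags = "a|b|em|i|p"
Dim sPat : sPat = "</?(" & sTags & ")(\b)(.|\n)*?>"

Dim re : Set re = New RegExp
re.Pattern = sPat
re.IgnoreCase = True
re.Global = True   ' replace every match, not just the first

' <b>, </b>, <a href=...> and </a> are removed; <email> is left alone because
' there is no word boundary between "em" and "ail".
WScript.Echo re.Replace("Hi <b>Al</b>, write to <email> or <a href=""x"">here</a>", "")
```

This prints Hi Al, write to <email> or here. One caveat: \b also matches before characters such as - or :, so a string like <b-1> would still be treated as a tag.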