
Thread: [RESOLVED] ASP VBScript: Regular Expression - HTML

  1. #1

    Thread Starter
    Fanatic Member aconybeare
    Join Date
    Oct 2001
    Location
    UK
    Posts
    772

    [RESOLVED] ASP VBScript: Regular Expression - HTML

    Hi,

    I'm trying to come up with a regular expression to find HTML tags in strings.

    This is what I have so far:

    vb Code:
    Dim sTags   : sTags = "address|applet|area|a|base|basefont|big|blockquote|body|br|b|caption|center|cite|code|dd|dfn|dir|div|dl|dt|em|embed|fieldset|font|form|frameset|frame|h1|h2|h3|h4|h5|h6|head|hr|html|iframe|img|input|isindex|i|kbd|legend|link|li|map|menu|meta|object|ol|option|param|pre|p|samp|script|select|small|span|strike|strong|style|sub|sup|table|td|textarea|th|title|tr|tt|ul|u|var"
    Dim sPat    : sPat = "</?(" & sTags & ")(.|\n)*?>"

    The problem is that it treats <email> as a valid tag because it begins with "em".
    Likewise, <blah> is treated as bold,
    and <hello> as a heading.
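
    To make the false positives concrete, here is a minimal sketch. The shortened tag list and the IsTagMatch helper are assumptions made for illustration, not part of the original code. It suggests why <email> and <blah> slip through: the alternation matches the prefix "em" or "b", and the lazy (.|\n)*? then consumes the rest of the name up to the >.

    ```vbscript
    ' Hypothetical helper: True if sText contains a match for sPat.
    Function IsTagMatch(sPat, sText)
        Dim re : Set re = New RegExp
        re.Pattern = sPat
        re.IgnoreCase = True
        IsTagMatch = re.Test(sText)
    End Function

    ' Shortened tag list, for illustration only.
    Dim sShortTags : sShortTags = "b|em"
    Dim sLoosePat  : sLoosePat = "</?(" & sShortTags & ")(.|\n)*?>"

    Dim bEmail : bEmail = IsTagMatch(sLoosePat, "<email>")  ' True: "em" + "ail" + ">"
    Dim bBlah  : bBlah = IsTagMatch(sLoosePat, "<blah>")    ' True: "b" + "lah" + ">"
    Dim bBold  : bBold = IsTagMatch(sLoosePat, "<b>")       ' True: the intended match
    ```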


    I want to use this RE for two purposes:
    1. Parse HTML so that, when writing text to a web page, valid HTML tags are rendered as such and any other < and > characters are HTML encoded.
    2. Strip HTML out when sending plain-text emails.
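
    The two uses above could be sketched roughly as follows. The function names (StripTags, EncodeNonTags, EncodeText) are hypothetical, and both take whatever pattern ends up in sPat as a parameter. EncodeText is a hand-rolled stand-in so the sketch runs outside ASP; inside an ASP page, Server.HTMLEncode would probably be the more natural choice.

    ```vbscript
    ' Hypothetical helper: encode the characters that matter in HTML text.
    Function EncodeText(s)
        EncodeText = Replace(Replace(Replace(s, "&", "&amp;"), "<", "&lt;"), ">", "&gt;")
    End Function

    ' Use 2: remove everything the pattern recognises as a tag.
    Function StripTags(sText, sPat)
        Dim re : Set re = New RegExp
        re.Pattern = sPat
        re.IgnoreCase = True
        re.Global = True
        StripTags = re.Replace(sText, "")
    End Function

    ' Use 1: keep recognised tags as-is, encode all text between them.
    Function EncodeNonTags(sText, sPat)
        Dim re, m, nPos, sOut
        Set re = New RegExp
        re.Pattern = sPat
        re.IgnoreCase = True
        re.Global = True
        nPos = 0                       ' zero-based, like Match.FirstIndex
        sOut = ""
        For Each m In re.Execute(sText)
            ' Encode the text before this tag, then copy the tag unchanged.
            sOut = sOut & EncodeText(Mid(sText, nPos + 1, m.FirstIndex - nPos)) & m.Value
            nPos = m.FirstIndex + m.Length
        Next
        ' Encode whatever follows the last recognised tag.
        EncodeNonTags = sOut & EncodeText(Mid(sText, nPos + 1))
    End Function
    ```

    Both functions are only as accurate as the pattern passed in: anything the pattern wrongly accepts as a tag is stripped, or left unencoded, along with the real tags.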


    Any help will be greatly appreciated.

    Al
    Last edited by aconybeare; Jan 13th, 2009 at 09:03 AM.

  2. #2
    I'm about to be a PowerPoster! mendhak
    Join Date
    Feb 2002
    Location
    Ulaan Baator GooGoo: Frog
    Posts
    38,170

    Re: ASP VBScript: Regular Expression - HTML

    Introduce a mandatory space after the tag and before the (.|\n). You could use

    \s+

  3. #3

    Thread Starter
    Fanatic Member aconybeare

    Re: ASP VBScript: Regular Expression - HTML

    Mendhak,

    Thanks for your reply. Wouldn't that exclude all tags like <b>, as well as all closing tags?

    Is this what you meant?
    Code:
    Dim sPat    : sPat = "</?(" & sTags & ")\s+(.|\n)*?>"
    Al
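
    A quick check suggests the concern is justified. This is a sketch under assumptions: the tag list is shortened for brevity and IsTagMatch is a hypothetical helper. With a mandatory \s+ after the tag name, only tags that carry attributes (or at least a space) still match.

    ```vbscript
    ' Hypothetical helper: True if sText contains a match for sPat.
    Function IsTagMatch(sPat, sText)
        Dim re : Set re = New RegExp
        re.Pattern = sPat
        re.IgnoreCase = True
        IsTagMatch = re.Test(sText)
    End Function

    ' Shortened tag list, for illustration only.
    Dim sSpacePat : sSpacePat = "</?(a|b|em)\s+(.|\n)*?>"

    Dim bLink  : bLink = IsTagMatch(sSpacePat, "<a href=""x"">")  ' True: space after "a"
    Dim bEmail : bEmail = IsTagMatch(sSpacePat, "<email>")        ' False: no space after "em"
    Dim bOpen  : bOpen = IsTagMatch(sSpacePat, "<b>")             ' False: no space after "b"
    Dim bClose : bClose = IsTagMatch(sSpacePat, "</b>")           ' False: no space after "b"
    ```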

  4. #4

    Thread Starter
    Fanatic Member aconybeare

    Re: ASP VBScript: Regular Expression - HTML

    I think I've cracked it, or let's say it's performing okay against the variations I've tried it with.

    I've added a word boundary (\b) after the tag list:

    Code:
    Dim sPat    : sPat = "</?(" & sTags & ")(\b)(.|\n)*?>"
    Al
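
    For anyone wanting to try the word-boundary version, here is a minimal sketch. The shortened tag list and the IsTagMatch helper are assumptions for illustration. The \b only succeeds where the tag name ends, so "em" can no longer match inside "email", while bare tags and closing tags still match. One caveat: \b also sits between a letter and a hyphen, so a name like <b-foo> would still be accepted.

    ```vbscript
    ' Hypothetical helper: True if sText contains a match for sPat.
    Function IsTagMatch(sPat, sText)
        Dim re : Set re = New RegExp
        re.Pattern = sPat
        re.IgnoreCase = True
        IsTagMatch = re.Test(sText)
    End Function

    ' Shortened tag list, for illustration only.
    Dim sBoundPat : sBoundPat = "</?(a|b|em|embed)(\b)(.|\n)*?>"

    Dim bEmail : bEmail = IsTagMatch(sBoundPat, "<email>")            ' False: no boundary after "em"
    Dim bOpen  : bOpen = IsTagMatch(sBoundPat, "<b>")                 ' True: boundary before ">"
    Dim bClose : bClose = IsTagMatch(sBoundPat, "</em>")              ' True: closing tags still match
    Dim bEmbed : bEmbed = IsTagMatch(sBoundPat, "<embed src=""x"">")  ' True: falls through to "embed"
    Dim bHyph  : bHyph = IsTagMatch(sBoundPat, "<b-foo>")             ' True: boundary between "b" and "-"
    ```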
