
Thread: [RESOLVED] Parse HTML with Regex

  1. #1

    Thread Starter
    New Member
    Join Date
    Feb 2009
    Posts
    5

    [RESOLVED] Parse HTML with Regex

    Thanks to threads found on this forum I've been able to grab a web page and place it into a document. I now have certain text that I need to extract from the page. This text is always surrounded by the same HTML tags.

    <h3><a id=link1 href="http://www.site.com"><b>Listing Title</b></a></h3><cite>MyWebsite.com</cite>&nbsp; &nbsp; &nbsp; Come visit my awesome website!<li class

    I cut off at "<li class" because the actual class is not the same from one to the next.

    The things I need to grab from this block are, in order:

    link1
    Listing Title
    Come visit my awesome website!
    MyWebsite.com

    There are 10-15 of these blocks on a page, so I'd like to loop this and store each to its own set of text boxes...

    text1.text = link1
    text2.text = Listing Title

    etc etc..

    I'm slightly familiar with PHP so I know I have to do this with regex, but I'd really appreciate some help figuring out how to go about actually putting this together.

  2. #2
    Frenzied Member
    Join Date
    Dec 2007
    Posts
    1,072

    Re: Parse HTML with Regex

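    You could just use InStr() and Mid$() to parse through those, no need for Regex :P

    Code:
    Public Function GB(rC As String, rS As String, rF As String, Optional lgB As Long = 1) As String
        Dim lgS As Long, lgE As Long
        lgS = InStr(lgB, rC, rS)                 ' find the start delimiter
        If lgS = 0 Then Exit Function            ' not found: return ""
        lgS = lgS + Len(rS)                      ' first character after the start delimiter
        lgE = InStr(lgS, rC, rF)                 ' find the end delimiter
        If lgE = 0 Then Exit Function            ' not found: return ""
        lgB = lgS: GB = Mid$(rC, lgS, lgE - lgS) ' lgB is ByRef, so the caller can continue from here
    End Function
    ' GB("abcdef", "ab", "ef") returns "cd"

    If either rS or rF is not found in rC, it returns an empty string.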
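
    As a rough sketch of how this InStr()/Mid$() approach could be looped over the 10-15 blocks described in post #1: the names TextBetween and ListListings are made up for this example, and the delimiters assume every block looks exactly like the sample (same tag order, "<li class" right after the description).

    ```vb
    Option Explicit

    ' Hypothetical helper: returns the text between sStart and sEnd, searching
    ' from lPos. On success lPos is left at the start of sEnd, so the next search
    ' can begin there; on failure lPos is set to 0 and "" is returned.
    Private Function TextBetween(ByVal sSource As String, ByVal sStart As String, _
                                 ByVal sEnd As String, ByRef lPos As Long) As String
        Dim lFrom As Long, lTo As Long

        If lPos < 1 Then Exit Function           ' an earlier search already failed

        lFrom = InStr(lPos, sSource, sStart)
        If lFrom = 0 Then
            lPos = 0
            Exit Function
        End If
        lFrom = lFrom + Len(sStart)

        lTo = InStr(lFrom, sSource, sEnd)
        If lTo = 0 Then
            lPos = 0
            Exit Function
        End If

        TextBetween = Mid$(sSource, lFrom, lTo - lFrom)
        lPos = lTo
    End Function

    ' Walks through every listing block in sHtml and prints its four fields.
    Public Sub ListListings(ByVal sHtml As String)
        Dim lPos As Long
        Dim sId As String, sTitle As String, sSite As String, sDesc As String

        lPos = 1
        Do
            sId = TextBetween(sHtml, "<h3><a id=", " href=", lPos)
            sTitle = TextBetween(sHtml, "<b>", "</b>", lPos)
            sSite = TextBetween(sHtml, "<cite>", "</cite>", lPos)
            sDesc = TextBetween(sHtml, "</cite>", "<li class", lPos)
            If lPos = 0 Then Exit Do             ' no further complete block

            ' Drop the &nbsp; padding in front of the description.
            sDesc = Trim$(Replace(sDesc, "&nbsp;", ""))

            Debug.Print sId & " | " & sTitle & " | " & sDesc & " | " & sSite
        Loop
    End Sub
    ```

    Called with the sample block from post #1, this should print "link1 | Listing Title | Come visit my awesome website! | MyWebsite.com" in the Immediate window. On a form, you would assign the four values to text boxes (for instance, control arrays indexed by block number) instead of using Debug.Print.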

  3. #3

    Thread Starter
    New Member
    Join Date
    Feb 2009
    Posts
    5

    Re: Parse HTML with Regex

    The simple thought of not having to use regex makes me tingle. Thanks for the tip, I'll try it and report back with the results!

  4. #4

    Thread Starter
    New Member
    Join Date
    Feb 2009
    Posts
    5

    Re: Parse HTML with Regex

    Ok it works, but there are pound signs and forward slashes in the areas of code I'm trying to match between that cause it to break...

    I tried putting the two delimiters I'm matching between into strings, but as soon as I add the pound sign, it starts returning the wrong match.

    For example:

    Starting the match with "<h3><a id=link1 href=" works fine

    Starting the match with "<h3><a id=link1 href=#" causes it to return the wrong match... WAY wrong, like not even in the neighborhood of the string I'm looking for
    Last edited by dogfighter; Feb 18th, 2009 at 05:15 PM.
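
    For what it's worth, InStr() does a plain substring search, so "#" and "/" carry no special meaning in the search string. A small check, assuming the page text matches the sample block from post #1 (PoundSignCheck is just a name made up for this example):

    ```vb
    Option Explicit

    Public Sub PoundSignCheck()
        Const HTML As String = "<h3><a id=link1 href=""http://www.site.com""><b>Listing Title</b>"

        ' The start string from the working case is found at position 1.
        Debug.Print InStr(1, HTML, "<h3><a id=link1 href=")     ' 1

        ' With "#" appended it is not found: the sample has a quote after href=, not "#".
        Debug.Print InStr(1, HTML, "<h3><a id=link1 href=#")    ' 0

        ' When the text really does contain "#", InStr matches it literally.
        Debug.Print InStr(1, "a href=#top", "href=#")           ' 3
    End Sub
    ```

    So if a start string with "#" gives odd results, it is worth checking that it matches the page source character for character.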

  5. #5
    Hyperactive Member
    Join Date
    Oct 2007
    Posts
    354

    Re: Parse HTML with Regex

    Hey dogfighter,
    Since you said you're familiar with regular expressions, here's a sample for using them in VB. Use the following function and pass it the pattern and the text to be parsed.

    vb Code:
    ' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
    Function TestRegExp(sPattern As String, sText As String) As String
        Dim oRegExp As RegExp
        Dim oMatch As Match
        Dim oMatches As MatchCollection
        Dim sOutput As String

        Set oRegExp = New RegExp

        oRegExp.Pattern = sPattern
        oRegExp.IgnoreCase = True
        oRegExp.Global = True   ' return every match, not just the first

        If oRegExp.Test(sText) Then
            Set oMatches = oRegExp.Execute(sText)
            For Each oMatch In oMatches
                ' FirstIndex is zero-based
                sOutput = sOutput & "Match found at position "
                sOutput = sOutput & oMatch.FirstIndex & ". Match Value is '"
                sOutput = sOutput & oMatch.Value & "'." & vbCrLf
            Next
        Else
            sOutput = "String Matching Failed"
        End If
        TestRegExp = sOutput
    End Function
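
    To pull out the four fields from post #1 rather than whole matches, capture groups and SubMatches could be used. The following is only a sketch: the pattern assumes every block has exactly the same layout as the sample, and PrintListings is a name made up for this example. It uses late binding via CreateObject, so it runs without adding the project reference.

    ```vb
    Option Explicit

    Public Sub PrintListings(ByVal sHtml As String)
        Dim oRegExp As Object   ' late-bound VBScript.RegExp
        Dim oMatch As Object

        Set oRegExp = CreateObject("VBScript.RegExp")
        oRegExp.Global = True
        oRegExp.IgnoreCase = True

        ' Groups: (1) link id, (2) listing title, (3) site name, (4) description.
        ' (?:&nbsp;|\s)* skips the padding between </cite> and the description.
        oRegExp.Pattern = "<h3><a id=(\S+) href=""[^""]*""><b>(.*?)</b></a></h3>" & _
                          "<cite>(.*?)</cite>(?:&nbsp;|\s)*(.*?)<li class"

        For Each oMatch In oRegExp.Execute(sHtml)
            ' SubMatches is zero-based: index 0 is the first capture group.
            Debug.Print "id:          " & oMatch.SubMatches(0)
            Debug.Print "title:       " & oMatch.SubMatches(1)
            Debug.Print "description: " & oMatch.SubMatches(3)
            Debug.Print "site:        " & oMatch.SubMatches(2)
        Next
    End Sub
    ```

    Note that "." does not match line breaks in this engine, so a block that spans several lines in the page source would need a pattern such as [\s\S]*? in place of .*?.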

  6. #6

    Thread Starter
    New Member
    Join Date
    Feb 2009
    Posts
    5

    Re: Parse HTML with Regex

    Appreciate it suki, but I'd like to avoid regex if I can. Zach's method was working just fine until I included that pound sign.

    Can anyone shed some light on a way around this?

  7. #7

    Thread Starter
    New Member
    Join Date
    Feb 2009
    Posts
    5

    Re: Parse HTML with Regex

    Nvm, I was missing something in my match string. My error. Thanks to Zach and suki for your help.
