[RESOLVED] Parse HTML with Regex
Thanks to threads found on this forum I've been able to grab a web page and place it into a document. I now have certain text that I need to extract from the page. This text is always surrounded by the same HTML tags.
<h3><a id=link1 href="http://www.site.com"><b>Listing Title</b></a></h3><cite>MyWebsite.com</cite> Come visit my awesome website!<li class
I cut off at "<li class" because the actual class is not the same from one to the next.
The things I need to grab from this block are, in order:
link1
Listing Title
Come visit my awesome website!
MyWebsite.com
There are 10-15 of these blocks on a page, so I'd like to loop this and store each to its own set of text boxes...
text1.text = link1
text2.text = Listing Title
etc etc..
I'm slightly familiar with PHP so I know I have to do this with regex, but I'd really appreciate some help figuring out how to go about actually putting this together.
Re: Parse HTML with Regex
You could just use a Mid$() function to parse through those, no need for Regex :P
Code:
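Public Function GB(rC As String, rS As String, rF As String, Optional lgB As Long = 1) As String
    ' Returns the text in rC between rS and rF, searching from position lgB.
    ' lgB is ByRef, so a variable passed for it is updated by the call.
    On Error Resume Next
    lgB = InStr(lgB, rC, rS) + Len(rS)             ' position just after rS
    GB = Mid$(rC, lgB, InStr(lgB, rC, rF) - lgB)   ' text up to the next rF
End Function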
GB("abcdef", "ab", "ef") returns "cd"
And if the end string rF is not found in rC, it returns an empty string.
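To cover the 10-15 blocks per page from the original question, the same function can be called in a loop. This is only a sketch: it assumes GB() as posted above is in the same module, that every block uses the same tags as the sample, and that the page source has already been loaded into a string. ListBlocks and sPage are made-up names. The position argument is wrapped in parentheses so that it is passed by value; otherwise GB would overwrite lPos, because its lgB parameter is ByRef.

```vb
' Sketch only: relies on GB() from the post above.
' sPage (hypothetical name) holds the downloaded page source.
Public Sub ListBlocks(ByVal sPage As String)
    Const BLOCK_START As String = "<h3><a id="
    Dim lPos As Long
    Dim sID As String, sTitle As String, sSite As String, sDesc As String

    lPos = InStr(1, sPage, BLOCK_START)
    Do While lPos > 0
        ' (lPos) in parentheses forces ByVal, so GB cannot change lPos.
        sID = GB(sPage, BLOCK_START, " href=", (lPos))
        sTitle = GB(sPage, "<b>", "</b>", (lPos))
        sSite = GB(sPage, "<cite>", "</cite>", (lPos))
        sDesc = Trim$(GB(sPage, "</cite>", "<li class", (lPos)))

        Debug.Print sID & " | " & sTitle & " | " & sDesc & " | " & sSite

        ' Move on to the next block, if there is one.
        lPos = InStr(lPos + 1, sPage, BLOCK_START)
    Loop
End Sub
```

Debug.Print stands in for the text boxes here. With one set of boxes per block, a control array indexed by a block counter would be one way to store the values. If a block is missing one of the inner tags, GB will pick up the next occurrence further down the page, so the results are only as regular as the HTML is.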
Re: Parse HTML with Regex
The simple thought of not having to use regex makes me tingle. Thanks for the tip, I'll try it and report back with the results!
Re: Parse HTML with Regex
Ok it works, but there are pound signs and forward slashes in the areas of code I'm trying to match between that cause it to break...
I tried putting the two delimiters I'm matching between into string variables, but as soon as I add the pound sign, it starts returning the wrong match.
For example:
Starting the match with "<h3><a id=link1 href=" works fine
Starting the match with "<h3><a id=link1 href=#" causes it to return the wrong match... WAY wrong, like not even in the neighborhood of the string I'm looking for
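A possible cause, assuming GB() is used exactly as posted above: when the start string rS does not occur in rC at all, InStr returns 0, so lgB ends up as Len(rS) and Mid$ starts reading near the top of the page instead of failing. In the sample block, href= is followed by a double quote rather than a pound sign, so if the real page looks the same, a start string ending in href=# is never found. A literal double quote inside a VB string is written as two quotes, e.g. "<h3><a id=link1 href=""". The variant below is a sketch under a hypothetical name, GBStrict, that returns an empty string when either delimiter is missing, which makes this kind of mistake easier to spot.

```vb
' Hypothetical stricter variant of GB(): returns "" when either delimiter is missing.
' Arguments are ByVal, so the caller's variables are never modified.
Public Function GBStrict(ByVal rC As String, ByVal rS As String, ByVal rF As String, _
                         Optional ByVal lgB As Long = 1) As String
    Dim lgS As Long, lgE As Long

    lgS = InStr(lgB, rC, rS)
    If lgS = 0 Then Exit Function      ' start delimiter not found
    lgS = lgS + Len(rS)                ' first character after the start delimiter

    lgE = InStr(lgS, rC, rF)
    If lgE = 0 Then Exit Function      ' end delimiter not found

    GBStrict = Mid$(rC, lgS, lgE - lgS)
End Function
```

For comparison, GB("abcdef", "xy", "ef") returns "bcd" even though "xy" is not in the string, while GBStrict("abcdef", "xy", "ef") returns "".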
Re: Parse HTML with Regex
Hey dogfighter,
Since you said you're familiar with regular expressions, here is a sample for using them in VB.
Use the following function and pass it the pattern and the text to be parsed.
vb Code:
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Function TestRegExp(sPattern As String, sText As String) As String
    Dim oRegExp As RegExp
    Dim oMatch As Match
    Dim oMatches As MatchCollection
    Dim sOutput As String

    Set oRegExp = New RegExp
    oRegExp.Pattern = sPattern
    oRegExp.IgnoreCase = True
    oRegExp.Global = True       ' find every match, not just the first

    If (oRegExp.Test(sText) = True) Then
        Set oMatches = oRegExp.Execute(sText)
        For Each oMatch In oMatches
            sOutput = sOutput & "Match found at position "
            sOutput = sOutput & oMatch.FirstIndex & ". Match Value is '"
            sOutput = sOutput & oMatch.Value & "'." & vbCrLf
        Next
    Else
        sOutput = "String Matching Failed"
    End If

    TestRegExp = sOutput
End Function
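For the block in the first post, the same RegExp object can also return the four pieces separately through capture groups. The pattern below is only a guess built from the one sample block shown in this thread; real pages may need a looser pattern. PrintListings and sPage are made-up names, and the project reference noted above is assumed.

```vb
' Sketch: pulls id, title, site and description out of each listing block.
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Public Sub PrintListings(ByVal sPage As String)
    Dim oRegExp As RegExp
    Dim oMatch As Match

    Set oRegExp = New RegExp
    oRegExp.Global = True
    oRegExp.IgnoreCase = True
    ' Doubled quotes ("") are literal double quotes inside a VB string.
    oRegExp.Pattern = "<h3><a id=(\S+) href=""[^""]*""><b>(.*?)</b></a></h3>" & _
                      "<cite>(.*?)</cite>\s*(.*?)<li class"

    For Each oMatch In oRegExp.Execute(sPage)
        Debug.Print "ID:    " & oMatch.SubMatches(0)
        Debug.Print "Title: " & oMatch.SubMatches(1)
        Debug.Print "Site:  " & oMatch.SubMatches(2)
        Debug.Print "Text:  " & oMatch.SubMatches(3)
    Next
End Sub
```

Each pair of parentheses in the pattern becomes one entry in SubMatches, numbered from 0 in the order the groups appear.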
Re: Parse HTML with Regex
Appreciate it suki, but I'd like to avoid regex if I can. Zach's method was working just fine until I included that pound sign.
Can anyone shed some light on a way around this?
Re: Parse HTML with Regex
Nvm, I was missing something in my match string; my error. Thanks to Zach and suki for your help.