-
Feb 18th, 2009, 11:44 AM
#1
Thread Starter
New Member
[RESOLVED] Parse HTML with Regex
Thanks to threads found on this forum I've been able to grab a web page and place it into a document. I now have certain text that I need to extract from the page. This text is always surrounded by the same HTML tags.
<h3><a id=link1 href="http://www.site.com"><b>Listing Title</b></a></h3><cite>MyWebsite.com</cite> Come visit my awesome website!<li class
I cut off at "<li class" because the actual class is not the same from one to the next.
The things I need to grab from this block are, in order:
link1
Listing Title
Come visit my awesome website!
MyWebsite.com
There are 10-15 of these blocks on a page, so I'd like to loop this and store each to its own set of text boxes...
text1.text = link1
text2.text = Listing Title
etc etc..
I'm slightly familiar with PHP so I know I have to do this with regex, but I'd really appreciate some help figuring out how to go about actually putting this together.
-
Feb 18th, 2009, 12:14 PM
#2
Frenzied Member
Re: Parse HTML with Regex
You could just use a Mid$() function to parse through those, no need for Regex :P
Code:
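Public Function GB(rC As String, rS As String, rF As String, Optional lgB As Long = 1) As String
    ' Returns the text in rC between rS and rF, searching from position lgB.
    On Error Resume Next   ' an invalid Mid$ length (rF not found) leaves GB = ""
    lgB = InStr(lgB, rC, rS) + Len(rS)              ' first character after rS
    GB = Mid$(rC, lgB, InStr(lgB, rC, rF) - lgB)    ' up to, but not including, rF
End Function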
GB("abcdef", "ab", "ef") returns "cd"
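And if rF is not found in rC, then it returns nothing. If rS is not found, though, InStr returns 0 and the search carries on from position Len(rS), so the result can be unrelated text.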
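To loop over the 10-15 blocks mentioned in the first post, the same InStr/Mid$ idea can be wrapped in a helper that reports a missing delimiter instead of guessing. This is only a sketch: TextBetween, ListBlocks and DemoListBlocks are made-up names, the delimiters are taken from the sample block in post #1, and the "<li class=x>" at the end of the test string is invented for the demo.

```vb
Option Explicit

' Hypothetical helper: returns the text between sStart and sEnd, searching
' from lPos. On success lPos is moved to the position of sEnd; if either
' delimiter is missing, lPos is set to 0 and "" is returned.
Private Function TextBetween(ByVal sPage As String, ByVal sStart As String, _
                             ByVal sEnd As String, ByRef lPos As Long) As String
    Dim lFrom As Long, lTo As Long

    If lPos < 1 Then Exit Function
    lFrom = InStr(lPos, sPage, sStart)
    If lFrom = 0 Then lPos = 0: Exit Function      ' start delimiter missing
    lFrom = lFrom + Len(sStart)                    ' first character after sStart

    lTo = InStr(lFrom, sPage, sEnd)
    If lTo = 0 Then lPos = 0: Exit Function        ' end delimiter missing

    TextBetween = Mid$(sPage, lFrom, lTo - lFrom)
    lPos = lTo                                     ' next search starts at sEnd
End Function

' Walks every listing block in sPage and prints its four fields.
Public Sub ListBlocks(ByVal sPage As String)
    Dim lPos As Long
    Dim sId As String, sTitle As String, sSite As String, sDesc As String

    lPos = 1
    Do
        sId = TextBetween(sPage, "<h3><a id=", " href=", lPos)
        If lPos = 0 Then Exit Do                   ' no more blocks
        sTitle = TextBetween(sPage, "<b>", "</b>", lPos)
        If lPos = 0 Then Exit Do
        sSite = TextBetween(sPage, "<cite>", "</cite>", lPos)
        If lPos = 0 Then Exit Do
        sDesc = Trim$(TextBetween(sPage, "</cite>", "<li class", lPos))
        If lPos = 0 Then Exit Do

        Debug.Print "id:    " & sId
        Debug.Print "title: " & sTitle
        Debug.Print "text:  " & sDesc
        Debug.Print "site:  " & sSite
    Loop
End Sub

Public Sub DemoListBlocks()
    Dim sPage As String
    ' One block copied from post #1; the trailing class value is invented.
    sPage = "<h3><a id=link1 href=""http://www.site.com""><b>Listing Title</b></a></h3>" & _
            "<cite>MyWebsite.com</cite> Come visit my awesome website!<li class=x>"
    ListBlocks sPage
End Sub
```

Run DemoListBlocks from the Immediate window to try it. To fill text boxes instead, replace the Debug.Print lines with assignments to your own controls.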
-
Feb 18th, 2009, 04:32 PM
#3
Thread Starter
New Member
Re: Parse HTML with Regex
The simple thought of not having to use regex makes me tingle. Thanks for the tip, I'll try it and report back with the results!
-
Feb 18th, 2009, 04:59 PM
#4
Thread Starter
New Member
Re: Parse HTML with Regex
OK, it works, but the areas of code I'm trying to match between contain pound signs and forward slashes, and those cause it to break...
I tried putting the two delimiters I'm matching between into strings, but as soon as I add the pound sign, it starts returning the wrong match.
For example:
Starting the match with "<h3><a id=link1 href=" works fine
Starting the match with "<h3><a id=link1 href=#" causes it to return the wrong match... WAY wrong, like not even in the neighborhood of the string I'm looking for
Last edited by dogfighter; Feb 18th, 2009 at 05:15 PM.
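A note on the symptom above: InStr compares plain text, so "#" and "/" have no special meaning to it (it is the Like operator that treats # as a digit wildcard). With GB as posted, a start string that does not occur in the page makes InStr return 0, and the extraction then starts near the top of the page instead of failing, which would look like a wildly wrong match. A quick way to test for that is to check the start string on its own. The sketch below uses made-up names (CheckDelimiter, DemoCheckDelimiter) and a made-up page fragment, and the quoted href value in it is an assumption, not something taken from the real page.

```vb
Option Explicit

' Reports whether sNeedle occurs in sPage and, if so, shows the text around it.
Public Sub CheckDelimiter(ByVal sPage As String, ByVal sNeedle As String)
    Dim lAt As Long

    lAt = InStr(1, sPage, sNeedle, vbBinaryCompare)
    If lAt = 0 Then
        Debug.Print "Not found: " & sNeedle
    Else
        Debug.Print "Found at " & lAt & ": " & Mid$(sPage, lAt, Len(sNeedle) + 20)
    End If
End Sub

Public Sub DemoCheckDelimiter()
    Dim sPage As String
    ' Invented sample in which the href value is wrapped in quotes.
    sPage = "<h3><a id=link1 href=""#top""><b>Listing Title</b></a></h3>"

    CheckDelimiter sPage, "<h3><a id=link1 href="       ' present in the sample
    CheckDelimiter sPage, "<h3><a id=link1 href=#"      ' absent: a quote precedes the #
    CheckDelimiter sPage, "<h3><a id=link1 href=""#"    ' present once the quote is included
End Sub
```

If the start string comes back as not found, compare it character by character with the page source; quotes and spaces are easy to miss.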
-
Feb 19th, 2009, 02:45 AM
#5
Hyperactive Member
Re: Parse HTML with Regex
Hey dogfighter,
Since you said you're familiar with regular expressions, here is a sample showing how to use them in VB.
Use the following function, passing it the pattern and the text to be parsed.
vb Code:
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Function TestRegExp(sPattern As String, sText As String) As String
    Dim oRegExp As RegExp
    Dim oMatch As Match
    Dim oMatches As MatchCollection
    Dim sOutput As String

    Set oRegExp = New RegExp
    oRegExp.Pattern = sPattern
    oRegExp.IgnoreCase = True
    oRegExp.Global = True

    If (oRegExp.Test(sText) = True) Then
        Set oMatches = oRegExp.Execute(sText)
        For Each oMatch In oMatches
            sOutput = sOutput & "Match found at position "
            sOutput = sOutput & oMatch.FirstIndex & ". Match Value is '"
            sOutput = sOutput & oMatch.Value & "'." & vbCrLf
        Next
    Else
        sOutput = "String Matching Failed"
    End If

    TestRegExp = sOutput
End Function
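The function above reports whole matches only. For the block in post #1, the four fields can be pulled out with capture groups and read through each match's SubMatches collection. The following is a sketch under assumptions: the pattern is written against the single sample block from post #1 and may need adjusting for the real page, DemoSubMatches is a made-up name, and the "<li class=x>" ending of the test string is invented.

```vb
Option Explicit

Public Sub DemoSubMatches()
    Dim oRegExp As Object, oMatches As Object, oMatch As Object
    Dim sPage As String

    ' One block copied from post #1; the trailing class value is invented.
    sPage = "<h3><a id=link1 href=""http://www.site.com""><b>Listing Title</b></a></h3>" & _
            "<cite>MyWebsite.com</cite> Come visit my awesome website!<li class=x>"

    ' Late binding, so no project reference is needed for this sketch.
    Set oRegExp = CreateObject("VBScript.RegExp")
    oRegExp.Global = True
    oRegExp.IgnoreCase = True
    ' Groups: 1 = id, 2 = title, 3 = site, 4 = description.
    ' [\s\S]*? is a non-greedy "anything, including line breaks".
    oRegExp.Pattern = "<h3><a id=([^ >]+)[^>]*><b>([\s\S]*?)</b></a></h3>" & _
                      "<cite>([\s\S]*?)</cite>([\s\S]*?)<li class"

    Set oMatches = oRegExp.Execute(sPage)
    For Each oMatch In oMatches
        Debug.Print "id:    " & oMatch.SubMatches(0)
        Debug.Print "title: " & oMatch.SubMatches(1)
        Debug.Print "text:  " & Trim$(oMatch.SubMatches(3))
        Debug.Print "site:  " & oMatch.SubMatches(2)
    Next
End Sub
```

With Global set to True, Execute returns one match per block, so the same loop handles a page with many listings.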
-
Feb 19th, 2009, 10:23 AM
#6
Thread Starter
New Member
Re: Parse HTML with Regex
Appreciate it suki, but I'd like to avoid regex if I can. Zach's method was working just fine until I included that pound sign.
Can anyone shed some light on a way around this?
-
Feb 20th, 2009, 01:21 PM
#7
Thread Starter
New Member
Re: Parse HTML with Regex
Nvm, I was missing something in my match string. My error. Thanks to Zach and suki for your help.