I am parsing an HTML file using regex, and I am trying to get the text that sits between the two parts of my pattern. If I tell it to match the position preceding the opening expression (a lookahead, using ?=), it works. If I tell it to match the position following the expression (a lookbehind, using ?<=), it doesn't work.
Sample HTML Doc
Code:
<table cellpadding="0" cellspacing="0" style="width:100%;" class="f-bold">
    <tr>
        <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
        </td>
        <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
        </td>
    </tr>
    <tr>
        <td colspan="2">
            <div class="divider"></div>
        </td>
    </tr>
    <tr>
        <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
        </td>
        <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
        </td>
    </tr>
.....
Using this code:
VB Code:
Dim pattern As String = "(?=<td valign.*>).*?(?=</td>)"
Dim reg As New Regex(pattern, RegexOptions.Singleline)
Dim mc As MatchCollection = reg.Matches(pg)
mc has 4 values:
Code:
<td valign="top" class="pr-10" style="width:50%;">
    <a href="http://www.aUrl">First Value Here</a><br />
    Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
    <a href="http://www.aUrl">First Value Here</a><br />
    Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
    <a href="http://www.aUrl">First Value Here</a><br />
    Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
    <a href="http://www.aUrl">First Value Here</a><br />
    Second Value Here
Next, I tried to change the pattern so that it doesn't return the <td.... > in the value:
VB Code:
Dim pattern As String = "(?<=<td valign.*>).*?(?=</td>)"
Dim reg As New Regex(pattern, RegexOptions.Singleline)
Dim mc As MatchCollection = reg.Matches(pg)
The first entry is correct, then...
Code:
<a href="http://www.aUrl.com">First Value</a><br />
Second Value
'then
<td valign="top" class="pl-10" style="width:50%;">
    <a href="http://www.aUrl.com">First Value</a><br />
    Second Value
Notice all the white space, and that the second entry still contains the <td....> tag. What should I do to get just what is inside the <td....> </td>?
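One possible explanation, offered as a sketch rather than a confirmed diagnosis: with RegexOptions.Singleline, the .* inside the lookbehind can run across any number of tags. Once one <td valign has been seen, the lookbehind then appears to succeed right after every later >, including the > of a closing </td>. That would account for the second entry starting with white space and a full <td ...> tag. Restricting the tag part to [^>]* and pulling the contents out with a capture group avoids the lookbehind entirely. The module name and the shortened pg string below are made up for illustration; only the Regex calls come from the code above.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TdContentSketch
    Sub Main()
        ' Hypothetical stand-in for pg: two cells cut down from the sample HTML.
        Dim pg As String = _
            "<tr> <td valign=""top"" class=""pr-10""> " & _
            "<a href=""http://www.aUrl"">First Value Here</a><br /> Second Value Here </td> " & _
            "<td valign=""top"" class=""pr-10""> " & _
            "<a href=""http://www.aUrl"">First Value Here</a><br /> Second Value Here </td> </tr>"

        ' [^>]* stops at the end of the opening tag, so it cannot run on into later tags.
        ' (.*?) lazily captures everything up to the nearest </td>.
        Dim pattern As String = "<td valign[^>]*>(.*?)</td>"
        Dim reg As New Regex(pattern, RegexOptions.Singleline)

        For Each m As Match In reg.Matches(pg)
            ' Groups(1) holds only the cell contents; Trim() drops the surrounding white space.
            Console.WriteLine(m.Groups(1).Value.Trim())
        Next
    End Sub
End Module
```

If the lookbehind form is preferred, (?<=<td valign[^>]*>).*?(?=</td>) should behave similarly, although the leading and trailing white space would still need a Trim(). Either way, this assumes the cells are not nested and that every wanted cell starts with <td valign.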



