I am parsing an HTML file using regex, trying to get the information that sits between two patterns. If I anchor the start of the match with a lookahead (using ?=), so the match begins at the opening expression, it works. If I switch to a lookbehind (using ?<=), so the match begins right after the opening expression, it doesn't work.
Sample HTML Doc:
Code:
<table cellpadding="0" cellspacing="0" style="width:100%;" class="f-bold">
<tr>
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
</td>
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
</td>
</tr>
<tr>
<td colspan="2">
<div class="divider"></div>
</td>
</tr>
<tr>
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
</td>
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
</td>
</tr>
.....
Using this code:
VB Code:
Dim pattern As String = "(?=<td valign.*>).*?(?=</td>)"
Dim reg As New Regex(pattern, RegexOptions.Singleline)
Dim mc As MatchCollection = reg.Matches(pg)
mc has 4 values:
Code:
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
<td valign="top" class="pr-10" style="width:50%;">
<a href="http://www.aUrl">First Value Here</a><br />
Second Value Here
With the next code I try to make it so it doesn't return the <td.... > in the value:
VB Code:
Dim pattern As String = "(?<=<td valign.*>).*?(?=</td>)"
Dim reg As New Regex(pattern, RegexOptions.Singleline)
Dim mc As MatchCollection = reg.Matches(pg)
The first entry is correct, but then the later entries come back with the <td ....> tag still in them (notice all the white space). What should I do to just get what is inside the <td....> </td>?
Code:
<a href="http://www.aUrl.com">First Value</a><br />
Second Value
'then
<td valign="top" class="pl-10" style="width:50%;">
<a href="http://www.aUrl.com">First Value</a><br />
Second Value
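One possible way around this, as a rough sketch rather than a definitive fix: with RegexOptions.Singleline the .* inside the lookbehind can reach back across the whole document to any earlier <td valign, so every position that follows a > satisfies it, which would explain why the later entries start right after the previous </td> and drag the next <td ....> tag along. The sketch below drops the lookbehind, uses [^>]* so the opening tag cannot run past its own >, and reads the content from a capture group. The pg string here is a hypothetical stand-in trimmed from the sample above, and the module name is made up for illustration.

VB Code:
```vb
Imports System
Imports System.Text.RegularExpressions

Module TdContentSketch
    Sub Main()
        ' Hypothetical stand-in for pg, trimmed from the sample HTML in the post.
        Dim pg As String =
            "<tr>" & vbLf &
            "  <td valign=""top"" class=""pr-10"" style=""width:50%;"">" & vbLf &
            "    <a href=""http://www.aUrl"">First Value Here</a><br />" & vbLf &
            "    Second Value Here" & vbLf &
            "  </td>" & vbLf &
            "  <td valign=""top"" class=""pr-10"" style=""width:50%;"">" & vbLf &
            "    <a href=""http://www.aUrl"">First Value Here</a><br />" & vbLf &
            "    Second Value Here" & vbLf &
            "  </td>" & vbLf &
            "</tr>"

        ' [^>]* stops at the first >, so it stays inside the opening tag
        ' (unlike .* combined with Singleline).
        ' The parentheses capture only the text between <td ...> and </td>.
        Dim pattern As String = "<td valign[^>]*>(.*?)</td>"
        Dim reg As New Regex(pattern, RegexOptions.Singleline)

        For Each m As Match In reg.Matches(pg)
            ' Groups(1) is the captured cell content;
            ' Trim() removes leading and trailing whitespace only.
            Console.WriteLine(m.Groups(1).Value.Trim())
            Console.WriteLine("----")
        Next
    End Sub
End Module
```

Each match should then carry just the cell content in Groups(1), without the <td ....> tag. Trim() only strips the whitespace at the two ends, so the line break and indentation between the <br /> and the second value stay in place. If the real page has nested tables or unusual attributes, an HTML parser is likely a safer choice than regex.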
