
Thread: [RESOLVED] Regex Problem

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,521

    [RESOLVED] Regex Problem

    I am parsing an HTML file with regex and trying to extract the text between two patterns. If I open the pattern with a lookahead (?=...), so the match starts just before the opening tag, it works. If I switch to a lookbehind (?<=...), so the match starts just after the opening tag, it doesn't work.

    Sample HTML Doc
    Code:
        <table cellpadding="0" cellspacing="0" style="width:100%;" class="f-bold">
        
        
                  <tr>
          
          <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
          </td>
          
                
                
        
          
          <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
          </td>
          
                  </tr>
                
                  <tr>
              <td colspan="2">
                <div class="divider"></div>
              </td>
            </tr>
                
        
                                <tr>
          
          <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
          </td>
          
                
                
        
          
          <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
          </td>
          
                  </tr>
    
                    .....
    Using this code:
    VB Code:
    Dim pattern As String = "(?=<td valign.*>).*?(?=</td>)"
    Dim reg As New Regex(pattern, RegexOptions.Singleline)
    Dim mc As MatchCollection = reg.Matches(pg)

    mc has 4 values
    Code:
    <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
    
    <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
    
    <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
    
    <td valign="top" class="pr-10" style="width:50%;">
            <a href="http://www.aUrl">First Value Here</a><br />
            Second Value Here
    Next, I changed the pattern so that the match doesn't include the opening <td ...> tag:
    VB Code:
    Dim pattern As String = "(?<=<td valign.*>).*?(?=</td>)"
    Dim reg As New Regex(pattern, RegexOptions.Singleline)
    Dim mc As MatchCollection = reg.Matches(pg)

    The first match is correct, but the second one is not:
    Code:
            <a href="http://www.aUrl.com">First Value</a><br />
            Second Value
    
    'then
    
          
                
                
        
          
          <td valign="top" class="pl-10" style="width:50%;">
           <a href="http://www.aUrl.com">First Value</a><br />
            Second Value
    Notice all the whitespace and the opening <td ...> tag in the second match. What should I do to get just what is inside <td ...> ... </td>?
    Visual Studio Team Edition 2005
    GDI+ Links: Bob Powell VB.Net Heaven
    API Links: All API Pinvoke.Net
    VB6 to VB.Net: Visual Basic 6 to .NET Function Equivalents (Thread)

  2. #2
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Regex Problem

    See if this does what you want. I tuned it to your sample, so it will probably need changes if the HTML is coded a little differently on other pages you try it on. It looks for a "<td" tag followed by an "<a href" tag, and doesn't return the opening and closing <td> tags.
    VB Code:
    ' c:\regex.txt is just a text file with the sample HTML from your first post.
    ' File.ReadAllText reads the whole file and closes it again.
    Dim MyString As String = System.IO.File.ReadAllText("c:\regex.txt")
    ' The lookbehind skips past the opening <td ...> tag; the lookahead stops before </td>.
    Dim pattern As String = "(?<=<td.*?>.*?)<a href.*?>.*?</a>.*?(?=</td>)"
    Dim reg As New System.Text.RegularExpressions.Regex(pattern, System.Text.RegularExpressions.RegexOptions.Singleline)
    Dim mc As System.Text.RegularExpressions.MatchCollection = reg.Matches(MyString)
    For Each match As System.Text.RegularExpressions.Match In mc
        MessageBox.Show(match.Value)
    Next
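
    As for why the lookbehind version in post #1 misbehaved: with RegexOptions.Singleline, the ".*" inside "(?<=<td valign.*>)" can run across tags, so any position that directly follows a ">" and has a "<td valign" somewhere before it satisfies the lookbehind. That includes the position right after a "</td>", which is where the second match started. Below is a minimal sketch of two ways around that, assuming no ">" appears inside the tag's attributes. The module name and the two-cell test string are made up for illustration; they are not from the original page.

    ```vb
    Imports System
    Imports System.Text.RegularExpressions

    Module TdContentDemo
        Sub Main()
            ' Hypothetical two-cell stand-in for the sample HTML in post #1.
            Dim nl As String = Environment.NewLine
            Dim cell As String = _
                "<td valign=""top"" class=""pr-10"" style=""width:50%;"">" & nl & _
                "  <a href=""http://www.aUrl"">First Value Here</a><br />" & nl & _
                "  Second Value Here" & nl & _
                "</td>"
            Dim html As String = "<tr>" & nl & cell & nl & nl & cell & nl & "</tr>"

            ' [^>]* cannot cross a ">", so the lookbehind only accepts the
            ' position directly after an opening <td valign ...> tag.
            Dim lookPattern As String = "(?<=<td valign[^>]*>).*?(?=</td>)"
            For Each m As Match In Regex.Matches(html, lookPattern, RegexOptions.Singleline)
                ' Trim() drops the line breaks and indentation around the content.
                Console.WriteLine("[" & m.Value.Trim() & "]")
            Next

            ' Alternative without lookarounds: capture the cell content in group 1.
            Dim groupPattern As String = "<td valign[^>]*>(.*?)</td>"
            For Each m As Match In Regex.Matches(html, groupPattern, RegexOptions.Singleline)
                Console.WriteLine("[" & m.Groups(1).Value.Trim() & "]")
            Next
        End Sub
    End Module
    ```

    Both loops should print only the content of each cell, without the <td ...> tag or the whitespace between cells. The capture-group form is usually the easier one to read. Either way, regex on HTML tends to break as soon as the markup changes, so treat this as something that fits this sample rather than a general parser.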

  3. #3

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,521

    Re: Regex Problem

    Worked perfectly. Thanks.
