Thread: Regex problems, need help please

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Regex problems, need help please

    Hello, I am stuck on the hard part of regex.
    What I am trying to do is extract the text between two tags in the HTML source of a website.
    The text between the two tags will always be different.

    The code I currently have is:
    Code:
    Private Sub Button1_Click_1(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim html As String = WebBrowser1.DocumentText
        Dim regxheader As String
        regxheader = HeaderFronttxt.Text & "*\" & HeaderEndtxt.Text

        Dim r = New Regex((regxheader), _
            RegexOptions.IgnoreCase)

        Dim m As Match = r.Match(html)
        While m.Success
            headerDisplaytxt.Text = m.Value

        End While
    End Sub
    This is code I have adapted, so I'm not sure if it's correct. The problem is that when execution reaches While m.Success, it skips the section, meaning the match didn't work.

    The text I am trying to extract is
    Code:
    <h2>
    				Sony 2.1 35W PC Speakers      		</h2>
    The two text box entries used by the VB code at the top are <h2> and </h2>.

  2. #2
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Regex problems, need help please

    try

    vb.net Code:
    regxheader = HeaderFronttxt.Text & ".+" & HeaderEndtxt.Text

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    That worked great, thanks. Could you please expand on what the . and + mean, and any other symbols? It would benefit me and others reading this forum. It might seem stupid, but I can't find any decent sites that explain what these are.

  4. #4
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Regex problems, need help please

    Sure. A '.' (dot) in a regex pattern matches any single character except a newline character (\n). The '+' matches the preceding element one or more times. It is what is known as a greedy quantifier, meaning it will try to match as much of the string as it can.
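
    To make the description above concrete, here is a minimal sketch, assuming a console project. The module name DotPlusDemo, the helper FirstMatch and the sample strings are made up for illustration and do not come from the original code. It compares the greedy .+ with its lazy counterpart .+? and shows that the dot stops at a line break.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module DotPlusDemo
        ' Hypothetical helper: returns the first match of pattern in source, or Nothing.
        Function FirstMatch(ByVal source As String, ByVal pattern As String) As String
            Dim m As Match = Regex.Match(source, pattern)
            If m.Success Then Return m.Value
            Return Nothing
        End Function

        Sub Main()
            Dim html As String = "<h2>Sony</h2><h2>Speakers</h2>"

            ' Greedy: .+ takes as much as it can, so the match runs to the last </h2>.
            Console.WriteLine(FirstMatch(html, "<h2>.+</h2>"))    ' <h2>Sony</h2><h2>Speakers</h2>

            ' Lazy: .+? stops at the first </h2> it can reach.
            Console.WriteLine(FirstMatch(html, "<h2>.+?</h2>"))   ' <h2>Sony</h2>

            ' The dot does not match vbLf, so a line break between the tags defeats .+
            Console.WriteLine(FirstMatch("<h2>" & vbLf & "Sony</h2>", "<h2>.+</h2>") Is Nothing)   ' True
        End Sub
    End Module
    ```

    The lazy form is usually the safer choice when a page contains the same closing tag more than once.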

    This is a good article on Regex quantifiers (things like + ? *):

    http://msdn.microsoft.com/en-us/library/3206d374.aspx

    This is a good article on regex character classes (things like [] \w \d):

    http://msdn.microsoft.com/en-us/library/20bw873z.aspx

    This is a good article on a wide range of regex resources:

    http://social.msdn.microsoft.com/For...3-d93c9797fdce

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    thanks

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    Hello, I am trying to add a regex character that allows the match to span multiple lines. For some reason .+ isn't working. You said that \n should work, but I've tried that with no success. I checked out the sites you posted but couldn't find anything directly related to multiple lines.

    Here is the text, if it helps:

    <div class="productdetailright">
    <p>Sony Multi-channel speaker, 2.1 PC Speaker, 2 Satellite, 1 sub-woofer.</p>
    Immerse yourself in the latest video game, DVD or share your favourite music with your friends. Offering high frequency response, the Sony range of PC speakers deliver high quality music playback, so you can be sure of an unforgettable experience. With such quality sound, your PC will soon become your favourite multimedia device!<ul><li> 2.1ch Speaker System</li><li>Space Saving 2.1ch speaker for younger user</li><li>Two Inputs (MIX) to connect to a PC and another device (WM)</li></ul><p><strong>Wired Controller</strong><br />- Power, Volume, Bass, HP jack</p><p><strong>Total Output: 35W</strong><br />- Satellite: 5W x 2<br />- STL. Driver: 57 mm<br />- Subwoofer: 25W<br />- SW. Driver: 120 mm</p><strong><br /></strong><p>&nbsp;</p> </div>

  7. #7
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Regex problems, need help please

    No, I said .+ won't match a newline character. To read multi-line data, take a look at the RegexOptions values that can be passed into the Regex constructor. They control how the pattern treats line breaks in your data.

    http://msdn.microsoft.com/en-us/libr...exoptions.aspx
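
    As a rough illustration of those options: in .NET, RegexOptions.Multiline only changes where ^ and $ can match, while RegexOptions.Singleline is the option that lets the dot match a line feed. The sketch below assumes a console project; the module name, the helper Matches and the sample string are invented for the example.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module RegexOptionsDemo
        ' Hypothetical helper: True when pattern matches source under the given options.
        Function Matches(ByVal source As String, ByVal pattern As String, ByVal opts As RegexOptions) As Boolean
            Return Regex.IsMatch(source, pattern, opts)
        End Function

        Sub Main()
            Dim sample As String = "<p>line one" & vbLf & "line two</p>"

            ' Default behaviour: the dot stops at vbLf, so the tags are never joined.
            Console.WriteLine(Matches(sample, "<p>.+</p>", RegexOptions.None))         ' False
            ' Multiline does not change the dot, so this still fails.
            Console.WriteLine(Matches(sample, "<p>.+</p>", RegexOptions.Multiline))    ' False
            ' Singleline lets the dot match vbLf as well.
            Console.WriteLine(Matches(sample, "<p>.+</p>", RegexOptions.Singleline))   ' True

            ' Multiline lets ^ match at the start of the second line.
            Console.WriteLine(Matches(sample, "^line two", RegexOptions.None))         ' False
            Console.WriteLine(Matches(sample, "^line two", RegexOptions.Multiline))    ' True
        End Sub
    End Module
    ```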

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    OK, please bear with me. I have added the Multiline option. The documentation says I need to use the ^ and $ characters, which I have implemented as follows:

    vb Code:
    Private Sub Product_description()
        Dim html As String = WebBrowser1.DocumentText
        Dim regxheader As String
        regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
        'MessageBox.Show(regxheader)
        Dim r = New Regex((regxheader), _
            RegexOptions.IgnoreCase And RegexOptions.Multiline)

        Dim m As Match = r.Match(html)
        If m.Success Then
            DescriptionTxtBx.Text = m.Value
            MessageBox.Show(regxheader)
        End If
        r = Nothing
        m = Nothing
    End Sub
    This has not worked. I am a newbie at regex, as you can probably tell.

  9. #9
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Regex problems, need help please

    What are you trying to extract from that? Just the bold?

    This will match that text above if that's what you're checking:

    vb.net Code:
    "^<div class=[\w\W\s\S]+<\/div>$"
    Last edited by ForumAccount; May 27th, 2009 at 02:59 PM.
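
    A note on the pattern above: [\w\W\s\S] behaves the same as [\s\S], that is, any character including line breaks. The anchors are the fragile part. Without RegexOptions.Multiline, ^ only matches at the very start of the input, and even with it, $ matches just before a line feed, so a carriage return left over from a Windows line ending can block the match. The sketch below assumes a console project and an invented page fragment with vbCrLf line endings; it is not the actual page from this thread.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module AnchorDemo
        ' Hypothetical page fragment; Windows line endings are an assumption.
        Public ReadOnly Page As String = "<html>" & vbCrLf & _
            "<div class=""a"">" & vbCrLf & _
            "text" & vbCrLf & _
            "</div>" & vbCrLf & _
            "</html>"

        Public Const Pattern As String = "^<div class=[\s\S]+</div>"

        Sub Main()
            ' Without Multiline, ^ only matches at the start of the whole string.
            Console.WriteLine(Regex.IsMatch(Page, Pattern, RegexOptions.None))                ' False
            ' With Multiline, ^ also matches at the start of each line.
            Console.WriteLine(Regex.IsMatch(Page, Pattern, RegexOptions.Multiline))           ' True
            ' $ matches just before vbLf, so the vbCr after </div> blocks it.
            Console.WriteLine(Regex.IsMatch(Page, Pattern & "$", RegexOptions.Multiline))     ' False
            ' Allowing an optional carriage return before $ works around that.
            Console.WriteLine(Regex.IsMatch(Page, Pattern & "\r?$", RegexOptions.Multiline))  ' True
        End Sub
    End Module
    ```

    If the anchors are not actually needed, dropping ^ and $ altogether avoids both problems.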

  10. #10

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    I am trying to get everything between those two points. I don't care what it is; I want everything between <div class="productdetailright"> and </div>.
    I will try your example now, thanks.
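
    For "everything between two markers", one common approach is a capture group with a lazy quantifier. The following is only a sketch under assumptions: ExtractBetween is a hypothetical helper, the sample HTML is shortened from the text posted earlier, and Regex.Escape is used so that the marker text is treated literally rather than as pattern syntax.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module BetweenDemo
        ' Hypothetical helper: returns the trimmed text between startTag and endTag,
        ' or Nothing when the pair is not found.
        Function ExtractBetween(ByVal html As String, ByVal startTag As String, ByVal endTag As String) As String
            ' ([\s\S]+?) captures any characters, including line breaks,
            ' and stops at the first occurrence of endTag.
            Dim pattern As String = Regex.Escape(startTag) & "([\s\S]+?)" & Regex.Escape(endTag)
            Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
            If m.Success Then Return m.Groups(1).Value.Trim()
            Return Nothing
        End Function

        Sub Main()
            Dim html As String = "<div class=""productdetailright"">" & vbCrLf & _
                "<p>Sony speaker</p>" & vbCrLf & _
                "</div><div>other</div>"

            Console.WriteLine(ExtractBetween(html, "<div class=""productdetailright"">", "</div>"))
            ' <p>Sony speaker</p>
        End Sub
    End Module
    ```

    Because the capture stops at the first closing marker, a div nested inside the target div would cut the result short; an HTML parser is more robust for that case.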

  11. #11

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    I tried it in place of mine and got nothing, just a blank output.

  12. #12
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Regex problems, need help please

    Can I see how you used it?

  13. #13

    Thread Starter
    Addicted Member
    Join Date
    Dec 2007
    Posts
    190

    Re: Regex problems, need help please

    vb Code:
    Private Sub Product_description()
        Dim html As String = WebBrowser1.DocumentText
        Dim regxheader As String
        'regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
        regxheader = "^<div class=[\w\W\s\S]+<\/div>$"
        'MessageBox.Show(regxheader)
        Dim r = New Regex((regxheader), _
            RegexOptions.IgnoreCase And RegexOptions.Multiline)

        Dim m As Match = r.Match(html)
        If m.Success Then
            DescriptionTxtBx.Text = m.Value
            'MessageBox.Show(regxheader)
        End If
        r = Nothing
        m = Nothing
    End Sub

    Neither does this:
    vb Code:
    Private Sub Product_description()
        Dim html As String = WebBrowser1.DocumentText
        Dim regxheader As String
        'regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
        regxheader = "^<div class=[\w\W\s\S]+<\/div>$"
        'MessageBox.Show(regxheader)
        Dim r = New Regex((regxheader), _
            RegexOptions.IgnoreCase And RegexOptions.Multiline)

        Dim m As Match = r.Match(html)
        'If m.Success Then
        DescriptionTxtBx.Text = m.Value
        'MessageBox.Show(regxheader)
        'End If
        r = Nothing
        m = Nothing
    End Sub
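
    One detail in the code above may be worth checking: RegexOptions is a flags enumeration, so values are combined with Or. Using And keeps only the bits the two values share, and IgnoreCase and Multiline share none, so the result is RegexOptions.None. The sketch below shows the difference; it is not a confirmed diagnosis of the blank output, and the module and function names are made up for the example.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module FlagsDemo
        ' Bitwise And of two distinct flags: no bit survives.
        Function MaskedOptions() As RegexOptions
            Return RegexOptions.IgnoreCase And RegexOptions.Multiline
        End Function

        ' Bitwise Or of two flags: both bits are set.
        Function CombinedOptions() As RegexOptions
            Return RegexOptions.IgnoreCase Or RegexOptions.Multiline
        End Function

        Sub Main()
            ' ToString() is called explicitly so the flag names are printed.
            Console.WriteLine(MaskedOptions().ToString())     ' None
            Console.WriteLine(CombinedOptions().ToString())   ' IgnoreCase, Multiline
        End Sub
    End Module
    ```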
