-
May 24th, 2009, 03:13 PM
#1
Thread Starter
Addicted Member
Regex problems, need help please
Hello, I am on the hard part of regex.
What I am trying to do is extract the information between two tags in some HTML from the source of a website.
The text between the two tags will always be different.
The code I currently have is:
Code:
Private Sub Button1_Click_1(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim html As String = WebBrowser1.DocumentText
    Dim regxheader As String
    regxheader = HeaderFronttxt.Text & "*\" & HeaderEndtxt.Text
    Dim r = New Regex((regxheader), _
        RegexOptions.IgnoreCase)
    Dim m As Match = r.Match(html)
    While m.Success
        headerDisplaytxt.Text = m.Value
    End While
End Sub
This is code I have adapted, so I'm not sure if it's correct. The problem is that when it gets to While m.Success, it skips the section, meaning the match didn't work.
The text I am trying to extract is:
Code:
<h2>
Sony 2.1 35W PC Speakers </h2>
The two text box entries for the VB code at the top are <h2> and </h2>.
-
May 24th, 2009, 03:36 PM
#2
Re: Regex problems, need help please
try
vb.net Code:
regxheader = HeaderFronttxt.Text & ".+" & HeaderEndtxt.Text
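A minimal console sketch of the same idea pulled out of the button handler, assuming the two delimiters come from text boxes as in the first post. `ExtractBetween` is a hypothetical helper name. Two small changes from the original: `Regex.Escape` is applied to the delimiters so characters like `.` or `*` in them are matched literally, and the loop calls `m.NextMatch()`. The `While m.Success` loop in the first post never advances `m`, so once the pattern does match, that loop would keep repeating the same iteration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractBetweenDemo

    ' Hypothetical helper: returns the full text of every match of
    ' startTag & ".+" & endTag (the pattern suggested above).
    Function ExtractBetween(html As String, startTag As String, endTag As String) As List(Of String)
        Dim results As New List(Of String)
        ' Escape the delimiters so any regex metacharacters in them are taken literally.
        Dim pattern As String = Regex.Escape(startTag) & ".+" & Regex.Escape(endTag)
        Dim r As New Regex(pattern, RegexOptions.IgnoreCase)
        Dim m As Match = r.Match(html)
        While m.Success
            results.Add(m.Value)
            ' Move on to the next match; without this the loop never ends.
            m = m.NextMatch()
        End While
        Return results
    End Function

    Sub Main()
        Dim html As String = "<h2>Sony 2.1 35W PC Speakers </h2>"
        For Each value As String In ExtractBetween(html, "<h2>", "</h2>")
            Console.WriteLine(value)
        Next
    End Sub

End Module
```

Like the original, `m.Value` includes the tags themselves.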
-
May 24th, 2009, 04:16 PM
#3
Thread Starter
Addicted Member
Re: Regex problems, need help please
That worked great, thanks. Could you please expand on what the . and + mean, and any other symbols? It's for the benefit of me and others looking at this forum. It might seem stupid, but I can't find any decent sites that explain what these are.
-
May 24th, 2009, 04:31 PM
#4
Re: Regex problems, need help please
Sure. A '.' (dot) in a regex pattern matches any single character except a newline character (\n). The '+' means it will match the preceding element one or more times. It is what is known as a greedy quantifier: greedy means it will try to match as much of the string as it can.
This is a good article on regex quantifiers (things like +, * and ?):
http://msdn.microsoft.com/en-us/library/3206d374.aspx
This is a good article on regex character classes (things like [], \w and \d):
http://msdn.microsoft.com/en-us/library/20bw873z.aspx
This is a good article with a wide range of regex resources:
http://social.msdn.microsoft.com/For...3-d93c9797fdce
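A small console sketch of the two points above (module name and sample strings are made up for illustration): `.` refusing to cross a newline, and `+` being greedy. The last line adds `?` to make the quantifier lazy, a form the quantifiers article above also covers, purely for contrast.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DotPlusDemo

    Sub Main()
        ' "." matches any character except a newline, so this finds nothing:
        ' the text between the tags starts with a newline (vbLf).
        Dim twoLines As String = "<h2>" & vbLf & "Speakers</h2>"
        Console.WriteLine(Regex.IsMatch(twoLines, "<h2>.+</h2>"))      ' False

        ' "+" is greedy: with two headers on one line, ".+" runs from the
        ' first <h2> all the way to the LAST </h2>.
        Dim oneLine As String = "<h2>A</h2><h2>B</h2>"
        Console.WriteLine(Regex.Match(oneLine, "<h2>.+</h2>").Value)   ' <h2>A</h2><h2>B</h2>

        ' Adding "?" makes the quantifier lazy: it stops at the FIRST </h2>.
        Console.WriteLine(Regex.Match(oneLine, "<h2>.+?</h2>").Value)  ' <h2>A</h2>
    End Sub

End Module
```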
-
May 24th, 2009, 04:51 PM
#5
Thread Starter
Addicted Member
Re: Regex problems, need help please
-
May 27th, 2009, 02:07 PM
#6
Thread Starter
Addicted Member
Re: Regex problems, need help please
Hello, I am trying to add a regex character that allows the match to span multiple lines. For some reason .+ isn't working. You said that \n should work, but I've tried that with no success. I checked out the sites you posted but couldn't find anything directly related to multiple lines.
Here is the text, if it helps:
<div class="productdetailright">
<p>Sony Multi-channel speaker, 2.1 PC Speaker, 2 Satellite, 1 sub-woofer.</p>
Immerse yourself in the latest video game, DVD or share your favourite music with your friends. Offering high frequency response, the Sony range of PC speakers deliver high quality music playback, so you can be sure of an unforgettable experience. With such quality sound, your PC will soon become your favourite multimedia device!<ul><li> 2.1ch Speaker System</li><li>Space Saving 2.1ch speaker for younger user</li><li>Two Inputs (MIX) to connect to a PC and another device (WM)</li></ul><p><strong>Wired Controller</strong><br />- Power, Volume, Bass, HP jack</p><p><strong>Total Output: 35W</strong><br />- Satellite: 5W x 2<br />- STL. Driver: 57 mm<br />- Subwoofer: 25W<br />- SW. Driver: 120 mm</p><strong><br /></strong><p> </p> </div>
-
May 27th, 2009, 02:12 PM
#7
Re: Regex problems, need help please
No, I said .+ won't match a newline character. To match across lines, take a look at the RegexOptions values that can be passed into the Regex constructor; one of them changes . so that it matches newlines as well.
http://msdn.microsoft.com/en-us/libr...exoptions.aspx
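For reference, the option in that enumeration that changes `.` is `RegexOptions.Singleline`; `RegexOptions.Multiline`, despite its name, only changes how `^` and `$` behave. A minimal console sketch comparing the two (the sample string is invented for illustration):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OptionsDemo

    Sub Main()
        Dim html As String = "<p>line one" & vbLf & "line two</p>"
        Dim pattern As String = "<p>.+</p>"

        ' Multiline only changes what ^ and $ mean (start/end of each line);
        ' "." still stops at the newline, so there is no match.
        Console.WriteLine(Regex.IsMatch(html, pattern, RegexOptions.Multiline))   ' False

        ' Singleline lets "." match "\n" as well, so the match spans both lines.
        Console.WriteLine(Regex.IsMatch(html, pattern, RegexOptions.Singleline))  ' True
    End Sub

End Module
```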
-
May 27th, 2009, 02:24 PM
#8
Thread Starter
Addicted Member
Re: Regex problems, need help please
OK, please bear with me. I have added the Multiline option. The documentation says I need to use the ^ and $ characters, which I have implemented as follows:
vb Code:
Private Sub Product_description()
    Dim html As String = WebBrowser1.DocumentText
    Dim regxheader As String
    regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
    'MessageBox.Show(regxheader)
    Dim r = New Regex((regxheader), _
        RegexOptions.IgnoreCase And RegexOptions.Multiline)
    Dim m As Match = r.Match(html)
    If m.Success Then
        DescriptionTxtBx.Text = m.Value
        MessageBox.Show(regxheader)
    End If
    r = Nothing
    m = Nothing
End Sub
This has not worked. I am a newbie at regex, as you can probably tell.
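One thing worth checking in the code above: `RegexOptions.IgnoreCase And RegexOptions.Multiline` is a bitwise And. The two flags are separate bits (1 and 2), so the result is 0, which is `RegexOptions.None`, and neither option is actually applied. Flags are combined with `Or`. A minimal check (module name is arbitrary):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FlagsDemo

    Sub Main()
        ' IgnoreCase (1) and Multiline (2) share no bits, so a bitwise
        ' And leaves nothing set: the Regex gets no options at all.
        Dim withAnd As RegexOptions = RegexOptions.IgnoreCase And RegexOptions.Multiline
        Console.WriteLine(withAnd.ToString())   ' None

        ' Or keeps both bits, which is how flag values are combined.
        Dim withOr As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
        Console.WriteLine(withOr.ToString())    ' IgnoreCase, Multiline
    End Sub

End Module
```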
-
May 27th, 2009, 02:40 PM
#9
Re: Regex problems, need help please
What are you trying to extract from that? Just the bold?
This will match that text above if that's what you're checking:
vb.net Code:
"^<div class=[\w\W\s\S]+<\/div>$"
Last edited by ForumAccount; May 27th, 2009 at 02:59 PM.
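Why that character set works across lines without any extra option: `\s` and `\S` between them cover every character, newlines included, so a class containing both matches anything. `[\s\S]` alone is enough; `[\w\W\s\S]` is redundant but equivalent. A small sketch with an invented one-div string:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnyCharClassDemo

    Sub Main()
        Dim html As String = "<div class=""x"">" & vbLf & "text</div>"

        ' "." cannot cross the newline without RegexOptions.Singleline...
        Console.WriteLine(Regex.IsMatch(html, "<div class=.+</div>"))            ' False

        ' ...but a class of "everything" can. [\s\S] is the usual short form;
        ' [\w\W\s\S] matches exactly the same characters.
        Console.WriteLine(Regex.IsMatch(html, "<div class=[\s\S]+</div>"))       ' True
        Console.WriteLine(Regex.IsMatch(html, "<div class=[\w\W\s\S]+</div>"))   ' True
    End Sub

End Module
```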
-
May 28th, 2009, 03:49 PM
#10
Thread Starter
Addicted Member
Re: Regex problems, need help please
I am trying to get everything between those two points. I don't care what it is; I want everything between <div class="productdetailright"> and </div>.
I will try your example now, thanks.
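Since the goal is only the content between the opening `<div ...>` and `</div>`, a capture group can return just that part. A minimal sketch, assuming the exact opening tag text is known; `TextBetween` is a hypothetical helper name. It uses a lazy `(.*?)` so the match stops at the first `</div>` after the opening tag, and `RegexOptions.Singleline` so `.` can span lines:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BetweenTagsDemo

    ' Hypothetical helper: returns the text between the first occurrence of
    ' startTag and the next occurrence of endTag, or Nothing if not found.
    Function TextBetween(html As String, startTag As String, endTag As String) As String
        ' (.*?) is a lazy capture group: it stops at the FIRST endTag.
        ' Singleline lets "." match newlines, so the content may span lines.
        Dim pattern As String = Regex.Escape(startTag) & "(.*?)" & Regex.Escape(endTag)
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<div class=""productdetailright"">" & vbLf &
                             "<p>Sony Multi-channel speaker</p> </div>"
        Console.WriteLine(TextBetween(html, "<div class=""productdetailright"">", "</div>"))
    End Sub

End Module
```

Because the group stops at the first `</div>`, this assumes the target div contains no nested `<div>` elements, which holds for the sample text posted above.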
-
May 28th, 2009, 05:16 PM
#11
Thread Starter
Addicted Member
Re: Regex problems, need help please
I tried it in place of mine, and I got nothing: a blank output.
-
May 28th, 2009, 05:50 PM
#12
Re: Regex problems, need help please
Can I see how you used it?
-
May 29th, 2009, 02:10 PM
#13
Thread Starter
Addicted Member
Re: Regex problems, need help please
vb Code:
Private Sub Product_description()
    Dim html As String = WebBrowser1.DocumentText
    Dim regxheader As String
    'regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
    regxheader = "^<div class=[\w\W\s\S]+<\/div>$"
    'MessageBox.Show(regxheader)
    Dim r = New Regex((regxheader), _
        RegexOptions.IgnoreCase And RegexOptions.Multiline)
    Dim m As Match = r.Match(html)
    If m.Success Then
        DescriptionTxtBx.Text = m.Value
        'MessageBox.Show(regxheader)
    End If
    r = Nothing
    m = Nothing
End Sub
Neither does this:
vb Code:
Private Sub Product_description()
    Dim html As String = WebBrowser1.DocumentText
    Dim regxheader As String
    'regxheader = DescriptionFrontTxtBox.Text & "^+$" & DescriptionBacktxtbx.Text
    regxheader = "^<div class=[\w\W\s\S]+<\/div>$"
    'MessageBox.Show(regxheader)
    Dim r = New Regex((regxheader), _
        RegexOptions.IgnoreCase And RegexOptions.Multiline)
    Dim m As Match = r.Match(html)
    'If m.Success Then
    DescriptionTxtBx.Text = m.Value
    'MessageBox.Show(regxheader)
    'End If
    r = Nothing
    m = Nothing
End Sub
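A likely reason for the blank output, offered as a guess from the code above rather than a confirmed diagnosis: the options are combined with `And`, which evaluates to `RegexOptions.None`, so `^` and `$` only match at the very start and end of the whole `DocumentText`. Unless the page source consists of nothing but that one `<div>`, the pattern cannot match, and a failed `Match` has an empty `Value`, which is why the second version shows a blank box rather than an error. The sketch below uses an invented page string to show the difference. Even with `Or`, `^` still requires `<div` to be the first thing on its line, so leading whitespace in the real page source would also prevent a match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo

    Sub Main()
        ' Invented page source: the div is not the first thing in it.
        Dim html As String = "<html>" & vbLf &
                             "<div class=""productdetailright"">" & vbLf &
                             "<p>Sony speakers</p> </div>" & vbLf &
                             "</html>"
        Dim pattern As String = "^<div class=[\w\W\s\S]+<\/div>$"

        ' IgnoreCase And Multiline evaluates to RegexOptions.None, so ^ and $
        ' only match at the start and end of the whole string: no match.
        Dim broken As RegexOptions = RegexOptions.IgnoreCase And RegexOptions.Multiline
        Console.WriteLine(Regex.IsMatch(html, pattern, broken))         ' False

        ' With Or, Multiline is really on and ^/$ match at line boundaries.
        Dim fixedOptions As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
        Console.WriteLine(Regex.IsMatch(html, pattern, fixedOptions))   ' True
    End Sub

End Module
```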