Hi,
I'm gonna give it another go.
First of all, I noticed that the page http://battle.co.il/news/ is encoded as windows-1255.
Most probably this is what causes the issues reported in the previous posts.
If the page is in UTF-8, though, my code will work.
It should work with windows-1255 encoding as well, but then searching for the appropriate HTML strings will become a nightmare.
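If you do need to read the windows-1255 page directly, one possible approach is to download the raw bytes and decode them yourself, so the Hebrew text ends up as a normal .NET string and the same Regex pattern can be used on it. This is only a rough sketch and I have not tested it against battle.co.il. The module and function names are made up for illustration, and it assumes Encoding.GetEncoding("windows-1255") is available, which is the case on the .NET Framework (newer .NET versions need the code-page encodings registered first).

```vbnet
Imports System.Net
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: downloads a page and decodes its bytes as windows-1255.
    ' The returned String is ordinary Unicode text, so Hebrew literals in a
    ' Regex pattern can be matched against it directly.
    Function DownloadHebrewPage(ByVal url As String) As String
        Using client As New WebClient()
            ' Fetch the raw bytes so no default encoding is applied to them
            Dim raw As Byte() = client.DownloadData(url)
            ' Assumption: the page really is windows-1255 (code page 1255)
            Return Encoding.GetEncoding("windows-1255").GetString(raw)
        End Using
    End Function
End Module
```

Calling DownloadHebrewPage("http://battle.co.il/news/") would then give you a string you can pass to Regex.Matches instead of TextBox1.Text.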
VB Code:
Imports System
Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Load the local test page into the WebBrowser control
        WebBrowser1.Navigate("file:///C:\mamrom_test_page.html")
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Copy the body's HTML, as the WebBrowser sees it, into the textbox
        Dim theElementCollection As HtmlElementCollection
        theElementCollection = WebBrowser1.Document.GetElementsByTagName("body")
        For Each curElement As HtmlElement In theElementCollection
            TextBox1.Text = (curElement.GetAttribute("OuterHtml").ToString)
        Next
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Try
            ' Match the text between "שם: <B>" and "</B><BR>מין:"
            Dim mCollect As MatchCollection = Regex.Matches(TextBox1.Text.ToString, "(?<=שם: <B>).*?(?=</B><BR>מין:)", RegexOptions.IgnoreCase)
            For Each m As Match In mCollect
                MsgBox(m.Value)
            Next
        Catch ex As Exception
            ' Any error is silently ignored here
        End Try
    End Sub

End Class
I also recorded a video demonstrating how this code works. (Sorry, the video is no longer available.)
HTML Code:
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!--<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1255">-->
</HEAD>
<BODY>
שם: <b>דניאל</b><BR>
מין: <b>זכר</b><BR>
</BODY>
</HTML>
Just one important thing to note: I've got every language pack you can think of installed on my computer, so Hebrew displays fine on my machine, but I have no idea whether it will display correctly on other computers.
EDITED:
What is good about my code: when you display the HTML in a textbox, you see it the same way the WebBrowser sees it, so you know exactly what you are supposed to look for to get your variable. Once you know how it reads the HTML, you can simply skip the textbox step. I hope this makes sense.
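Skipping the textbox could look roughly like this. It is just a sketch: the module and function names are made up, it reuses the same pattern as Button2_Click, and it assumes the document has finished loading before you call it.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Module ExtractSketch
    ' Hypothetical helper: reads the body HTML straight from the WebBrowser
    ' and returns every value found by the pattern from the post.
    Function ExtractNames(ByVal browser As WebBrowser) As List(Of String)
        Dim names As New List(Of String)

        ' Nothing to search if the page has not been loaded yet
        If browser.Document Is Nothing OrElse browser.Document.Body Is Nothing Then
            Return names
        End If

        ' Same HTML that Button1_Click puts into the textbox
        Dim html As String = browser.Document.Body.OuterHtml

        For Each m As Match In Regex.Matches(html, "(?<=שם: <B>).*?(?=</B><BR>מין:)", RegexOptions.IgnoreCase)
            names.Add(m.Value)
        Next
        Return names
    End Function
End Module
```

You would call ExtractNames(WebBrowser1) from a button click or from the DocumentCompleted event, once you have checked in the textbox that the pattern matches what the browser actually produces.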
regards




