I also get gibberish in the source code with the Maxthon browser. Maybe with Google Chrome in Hebrew it could work.
The site supports only Internet Explorer (the WebBrowser control is based on IE).
Hi,
I'm gonna give it another go.
First of all, I noticed that the page http://battle.co.il/news/ is encoded in WINDOWS-1255.
Most probably this causes the issues reported in previous posts.
When you have the page in UTF-8 encoding, my code will work.
It should work with windows-1255 encoding as well, but then searching for the appropriate HTML strings will become a nightmare.
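For the windows-1255 case, here is a rough sketch of one way to get the page text into a normal .NET string before searching it. This is not part of my form code; the module and function names (EncodingSketch, DecodeWindows1255) are made up for illustration, and it assumes the raw page bytes really are windows-1255. On the .NET Framework this code page is built in; on .NET Core / .NET 5+ it is assumed that Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called first.

```vbnet
Imports System
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: decodes raw bytes that are assumed to be
    ' windows-1255 (Hebrew) into an ordinary Unicode .NET string.
    Function DecodeWindows1255(ByVal rawBytes As Byte()) As String
        ' Assumption: the "windows-1255" code page is available
        ' (built in on .NET Framework; needs the code pages provider
        ' registered on .NET Core / .NET 5+).
        Dim hebrew As Encoding = Encoding.GetEncoding("windows-1255")
        Return hebrew.GetString(rawBytes)
    End Function

    Sub Main()
        ' The two bytes below are shin (&HF9) and final mem (&HED) in windows-1255.
        Dim sample As Byte() = {&HF9, &HED}
        Dim decoded As String = DecodeWindows1255(sample)
        ' After decoding, the string compares equal to the same word typed as Unicode.
        Console.WriteLine(decoded = "שם")
    End Sub
End Module
```

Once the text is a proper Unicode string, the Hebrew search strings can be typed into the regex the same way as for the UTF-8 page.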
VB Code:
Imports System
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Load the local test page into the WebBrowser control.
        WebBrowser1.Navigate("file:///C:\mamrom_test_page.html")
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Copy the <body> markup, as the WebBrowser control sees it, into the textbox.
        Dim theElementCollection As HtmlElementCollection
        theElementCollection = WebBrowser1.Document.GetElementsByTagName("body")
        For Each curElement As HtmlElement In theElementCollection
            TextBox1.Text = curElement.OuterHtml
        Next
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Try
            ' Capture the text between "שם: <B>" and "</B><BR>מין:" using lookbehind and lookahead.
            Dim mCollect As MatchCollection = Regex.Matches(TextBox1.Text, "(?<=שם: <B>).*?(?=</B><BR>מין:)", RegexOptions.IgnoreCase)
            For Each m As Match In mCollect
                MsgBox(m.Value)
            Next
        Catch ex As Exception
            ' Show the error instead of silently swallowing it.
            MsgBox(ex.Message)
        End Try
    End Sub
End Class
I also recorded a video demonstrating how this code works (sorry, the video is no longer available).
HTML Code:
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!--<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1255">-->
</HEAD>
<BODY>
שם: <b>דניאל</b><BR>
מין: <b>זכר</b><BR>
</BODY>
</HTML>
Just one important thing to notice: I've got all possible language packs installed on my computer (honestly, anything you can think of), so Hebrew displays fine on my machine, but I have no idea whether it will display fine on other computers.
EDITED:
What is good about my code: when you display your HTML code in a textbox, you see it the same way the WebBrowser sees it, so you know what you are supposed to look for to get your variable. Once you know how it reads the HTML code, you can simply skip the textbox step. I hope this makes sense ;)
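To show what skipping the textbox could look like, here is a minimal sketch that runs the same kind of regex directly on an HTML string. The names RegexSketch and ExtractNames are made up for this example, and the sample string is only my assumption of how the browser might serialize the test page. The \s* parts are an addition to the pattern, so that it tolerates a space or line break between the tags.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: returns every value found between "שם: <B>" and
    ' "</B><BR>מין:" in the given HTML, without going through a textbox.
    Function ExtractNames(ByVal html As String) As List(Of String)
        Dim names As New List(Of String)
        ' \s* allows optional whitespace or a line break between the tags
        ' (an assumption about how the markup is serialized).
        Dim pattern As String = "(?<=שם:\s*<B>).*?(?=</B>\s*<BR>\s*מין:)"
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            names.Add(m.Value)
        Next
        Return names
    End Function

    Sub Main()
        ' Assumed sample of the body markup; in the form this would come
        ' from the WebBrowser document instead.
        Dim html As String = "<BODY>שם: <B>דניאל</B><BR>" & Environment.NewLine & "מין: <B>זכר</B><BR></BODY>"
        For Each found As String In ExtractNames(html)
            Console.WriteLine(found)
        Next
    End Sub
End Module
```

In the form itself, the html argument could be the OuterHtml of the body element instead of the hard-coded sample.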
regards
-----------------------
Marl, very good video, thx :)
Marl, Moti and stateofidleness, thx a lot.