Hi, after my browser navigates to the URL that I want, I want my program to save that page in a file called result.htm. Here is how it gets to the URL: wbWeb.Navigate(url). Can someone help?
Upon further inspection it seems you can get it via the document object. Make a reference to the COM component 'Microsoft HTML Object Library' then add a NavigateComplete2 event:
VB Code:
Private Sub wbWeb_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles wbWeb.NavigateComplete2
    ' Document is typed as Object, so cast it to the MSHTML document type
    Dim doc As mshtml.HTMLDocument = DirectCast(wbWeb.Document, mshtml.HTMLDocument)
    Dim sData As String = doc.documentElement.innerHTML 'this is the html of the page
End Sub
NOTE: If you are navigating to other pages and not just this one then you probably want to set up a flag of some sort so you only get the html on this page.
How do I initiate that event? And how do I parse it?
You don't initiate the event; it gets called automatically once the browser has navigated to the page. You should already have the parsing code from the previous topics you've posted. It's the RegularExpressions stuff.
I tried parsing it but no luck. It doesn't get all the code. But I thought of a different way. First it navigates to the web page that I need the HTML from. I save that file as result.htm. Then I open it and save it as a text file. Then I parse it with your code. This works because I have tried it, but I don't know how to download that web page. Do you?
That's what this does. There is no need to save it as an HTML page and then as a text file; that code puts the whole page in sData.
Quote:
Originally posted by Edneeis
Upon further inspection it seems you can get it via the document object. Make a reference to the COM component 'Microsoft HTML Object Library' then add a NavigateComplete2 event:
VB Code:
Private Sub wbWeb_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles wbWeb.NavigateComplete2
    ' Document is typed as Object, so cast it to the MSHTML document type
    Dim doc As mshtml.HTMLDocument = DirectCast(wbWeb.Document, mshtml.HTMLDocument)
    Dim sData As String = doc.documentElement.innerHTML 'this is the html of the page
End Sub
NOTE: If you are navigating to other pages and not just this one then you probably want to set up a flag of some sort so you only get the html on this page.
I am trying that, but it keeps saying that sData can't be referred to before it is declared. But it is declared before it. I added the reference. Here's the code.
VB Code:
Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Private Sub wbWeb_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles AxWebBrowser1.NavigateComplete2
        Dim doc As mshtml.HTMLDocument = AxWebBrowser1.Document
        Dim sData As String = doc.documentElement.innerHTML() 'this is the html of the page
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        AxWebBrowser1.Navigate("http://www.outwar.com/rankings.php?type=2&find=120&submit=go")
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim sr As New IO.StreamReader(sData)
        Dim sData As String = sr.ReadToEnd
        sr.Close()
        Dim pattern As String = "(?<=\>)\w+(?=\<\/a\>)"
        Dim reg As New System.Text.RegularExpressions.Regex(pattern)
        Dim mcol As System.Text.regularexpressions.MatchCollection = reg.Matches(sData)
        For Each m As System.Text.RegularExpressions.Match In mcol
            ListBox1.Items.Add(m.Value)
        Next
    End Sub
End Class
You should read up on scope. If you declare a variable in one sub you can't access it in another. It actually doesn't exist outside of the sub it was declared in. If you need something to be reached from different subs/functions then declare it in the form itself.
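A minimal sketch of the scoping rule described above. The class ScopeDemo and its members LoadPage and CountChars are hypothetical stand-ins for the form and its two event handlers; none of these names come from the thread. The field declared at class level is visible to both methods, while a variable declared with Dim inside a Sub disappears when that Sub returns.

```vb
Imports System

Public Class ScopeDemo
    ' Declared at class (form) level: visible to every Sub and Function in the class.
    Private sData As String = ""

    ' Hypothetical stand-in for the NavigateComplete2 handler.
    Public Sub LoadPage(ByVal html As String)
        Dim localCopy As String = html ' local: exists only inside LoadPage
        sData = localCopy              ' stored in the field so other subs can read it
    End Sub

    ' Hypothetical stand-in for the Button2_Click handler.
    Public Function CountChars() As Integer
        ' localCopy is not accessible here, but the field sData is.
        Return sData.Length
    End Function
End Class

Module ScopeDemoProgram
    Sub Main()
        Dim demo As New ScopeDemo()
        demo.LoadPage("<a href=""x"">abc</a>")
        Console.WriteLine(demo.CountChars())
    End Sub
End Module
```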
I assume the button fills a list of some sort from the data on the web. So really you'll need to navigate there with every button click, right?
Try this:
VB Code:
'all in the form
Private CatchData As Boolean = False

Private Sub wbWeb_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles wbWeb.NavigateComplete2
    If CatchData Then
        'get html
        Dim doc As mshtml.HTMLDocument = DirectCast(wbWeb.Document, mshtml.HTMLDocument)
        Dim sData As String = doc.documentElement.innerHTML
        'convert html to list
        Dim pattern As String = "(?<=\>)\w+(?=\<\/a\>)"
        Dim reg As New System.Text.RegularExpressions.Regex(pattern)
        Dim mcol As System.Text.RegularExpressions.MatchCollection = reg.Matches(sData)
        For Each m As System.Text.RegularExpressions.Match In mcol
            ListBox1.Items.Add(m.Value)
        Next
        'reset flag
        CatchData = False
    End If
End Sub

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    'set flag
    CatchData = True
    wbWeb.Navigate("http://www.outwar.com/rankings.php?type=2&find=120&submit=go")
End Sub
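To check the regular expression step on its own, without the browser control, something like the following sketch may help. ExtractLinkTexts and the sample HTML are made up for illustration; only the pattern is taken from the thread (written without the stray spaces, which would otherwise have to appear literally in the page).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkTextDemo
    ' Returns the text of every link whose content is one unbroken run of word characters.
    Function ExtractLinkTexts(ByVal html As String) As List(Of String)
        Dim reg As New Regex("(?<=\>)\w+(?=\<\/a\>)")
        Dim names As New List(Of String)()
        For Each m As Match In reg.Matches(html)
            names.Add(m.Value)
        Next
        Return names
    End Function

    Sub Main()
        ' Hypothetical sample markup, not the real rankings page.
        Dim sample As String = "<td><a href=""profile.php?id=1"">PlayerOne</a></td>" & _
                               "<td><a href=""profile.php?id=2"">Player_2</a></td>"
        For Each playerName As String In ExtractLinkTexts(sample)
            Console.WriteLine(playerName)
        Next
    End Sub
End Module
```

Note that \w+ has to cover the whole link text, so a link whose text contains a space or punctuation is skipped by this pattern.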
As Edneeis said, the string that holds the HTML must be available to all subs, not declared inside a sub. Here's a quick example using a RichTextBox to receive the string that holds the HTML and save the HTML file to the hard drive:
VB Code:
Dim htmlDoc As mshtml.HTMLDocument '/// reference to Microsoft.mshtml.
Dim source As String '/// this must not be inside a sub, but available to all subs.
'/// below the windows generated code area.

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    AxWebBrowser1.Navigate("http://vbforums.com")
End Sub

Private Sub AxWebBrowser1_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles AxWebBrowser1.NavigateComplete2
    htmlDoc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
    source = htmlDoc.documentElement.innerHTML
    RichTextBox1.Text = source '/// test to see if source holds the html from the webpage.
    RichTextBox1.SaveFile("C:\someHtml.htm", RichTextBoxStreamType.PlainText) '/// save the htm file to a location on the harddrive.
End Sub
By the way, if you want the text but not the HTML tags, you can use the innerText property rather than innerHTML, e.g.:
VB Code:
source = htmlDoc.documentElement.innerText '/// gets the text of the website without the html tags.
Here is the code I now have. I don't know what's going on: after I press the button to navigate to the page (which it does), nothing happens, and nothing gets added to the list box.
VB Code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    wbweb.Navigate("http://www.outwar.com/rankings.php?type=2&find=120&submit=go")
End Sub

Private Sub wbweb_NavigateComplete2(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) Handles wbweb.NavigateComplete2
    If CatchData Then
        'get html
        Dim sdata2 As String
        Dim doc As mshtml.HTMLDocument = wbweb.Document
        Dim sData As String = doc.documentElement.innerHTML
        sData = sData2
        'convert html to list
        Dim pattern As String = "(?<=\> )\w+(?=\<\/a\> )"
        Dim reg As New System.Text.RegularExpressions.Regex(pattern)
        Dim mcol As System.Text.regularexpressions.MatchCollection = reg.Matches(sData)
        For Each m As System.Text.RegularExpressions.Match In mcol
            ListBox1.Items.Add(m.Value)
        Next
        'Reset(flag)
        CatchData = False
    End If
End Sub
You didn't add all the code I posted. You forgot to declare the CatchData variable in the form and to set it to true before the navigate call.
I did; it's above the Windows-generated code. Still not working. Does it work for you?
Getting the webpage worked but I can't login to get the correct html on the page that you are looking for. You still don't have CatchData=True right above the Navigate call.
VB Code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    CatchData = True 'set the flag right before navigating
    wbweb.Navigate("http://www.outwar.com/rankings.php?type=2&find=120&submit=go")
End Sub
OK, I did that, but when I set a breakpoint and looked at the document, it is broken down further into sub-folders. Are you sure this will work?
What doesn't work about it? Are you not getting the data into the string (sData)? Is it not finding the html stuff you are looking for? Are you getting an error?
Nothing is being added to the list, so I am guessing that it is not getting the HTML properly. Why is it saving it to a .doc? Can't you just put it in a variable?
It doesn't save it as a .doc. doc is the name of a variable of the HTMLDocument type. You have to cast to this type so you can get the innerHTML of the document. But the page never gets saved anywhere. All its text gets put into the variable sData.
You need to debug the app. Set a breakpoint or show the sData variable in a msgbox so you can see if you are getting the right html. Make sure it contains html like what you posted before.
This is what I get as HTML. Why is this happening?
doc.documentElement.innerHTML "<HEAD><TITLE>Outwar.com Round 12 - The land of Monsters, Gangsters, and Pop Stars!</TITLE>
<META http-equiv=Content-Language content=en-us>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<STYLE type=text/css>
<!--
#dek {POSITION:absolute;VISIBILITY:hidden;Z-INDEX:200;}
//-->
</STYLE>
<LINK href="style.css" type=text/css rel=STYLESHEET>
<SCRIPT language=JavaScript>
<!--
function SymError()
{
return true;
}
window.onerror = SymError;
var SymRealWinOpen = window.open;
function SymWinOpen(url, name, attributes)
{
return (new Object());
}
window.open = SymWinOpen;
//-->
</SCRIPT>
</HEAD>" String
I don't know; you'll have to try to find another way, I guess. Or check other parts of the document object. I tried.
I found another way, but I need to know how to save the web page I navigate to as a .htm file.
Hey Edneeis, I talked to other programmers who do similar things to what I am doing, and they said that the way you described is the right (and only) way to do it. That code is very close, except it's not getting all the information I need. Do you know a way of getting the whole HTML from the page, not just the inner text? Thanks.
No, innerText should be the inner text of the whole document. I was reading up and it seems that the problem is that you are getting the DOM after it is executed, but all the other methods bypass the login mechanism for the site and just get the page (so it won't have the data you want). Sorry, you'll have to research the Document object and see if you can find something. I don't know that much about it.
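For the saving part on its own, here is a minimal sketch that writes an HTML string to a file with a StreamWriter. SaveHtml, the sample string and the temp-folder location are assumptions for illustration; in the form you would pass the string read from the document (sData in the code earlier in the thread) and whatever path you want for result.htm.

```vb
Imports System
Imports System.IO

Module SavePageDemo
    ' Writes the HTML string to filePath, overwriting any existing file.
    Sub SaveHtml(ByVal html As String, ByVal filePath As String)
        Dim writer As New StreamWriter(filePath, False)
        Try
            writer.Write(html)
        Finally
            writer.Close() ' always release the file handle, even if Write fails
        End Try
    End Sub

    Sub Main()
        ' Hypothetical target: result.htm in the current user's temp folder.
        Dim target As String = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "result.htm")
        SaveHtml("<html><body>test page</body></html>", target)
        Console.WriteLine("Saved to " & target)
    End Sub
End Module
```

This only saves whatever string it is given, so if the string holds just the HEAD section, the file will too.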
What is this topic called so I know what to research?
I don't know; that'll be part of the research. Anything on the WebBrowser control's Document property.
I found out that every other web page on the web works correctly with that code except the web page I want. This is so because there is something unique about the website. I have yet to figure it out, but I will.
I think it may possibly be getting the frame of the page, but I am not sure yet.
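One thing that might be worth trying, offered as an untested sketch rather than a confirmed fix: NavigateComplete2 can fire while the document is still downloading, which would be consistent with innerHTML containing only the HEAD section, and pages with frames raise completion events per frame. The fragment below handles DocumentComplete instead and reacts only to the top-level document. It is not a standalone program; it assumes the form from the thread (an AxWebBrowser named wbWeb and the form-level CatchData flag), and the comparison against GetOcx() is a commonly used idiom whose behavior on this particular site is an assumption.

```vb
' Fragment for the existing form; assumes wbWeb (AxWebBrowser) and the
' form-level CatchData flag declared as in the earlier posts.
Private Sub wbWeb_DocumentComplete(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) Handles wbWeb.DocumentComplete
    ' Each frame raises its own DocumentComplete; skip everything but the top-level page.
    If Not CatchData Then Return
    If Not (e.pDisp Is wbWeb.GetOcx()) Then Return

    Dim doc As mshtml.HTMLDocument = DirectCast(wbWeb.Document, mshtml.HTMLDocument)
    Dim sData As String = doc.documentElement.innerHTML

    ' Inspect sData here (breakpoint or MessageBox) before running the regex on it.
    CatchData = False
End Sub
```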