Parsing information from a webpage
Hello:
I want to know how to parse (read) the information contained in a web page, using a VB program that refreshes this information every 2 or 3 seconds. The web page is very simple: it has no images, just a marquee (which can be removed) and text that displays temperature readings and other sensor values.
I know I have to use the WebBrowser control and parse the document after it has loaded; what I don't know is how to parse it.
Thanks
Re: Parsing information from a webpage
What's the URL? (Or is it a local page?)
We'll need to see the source to be able to give you some code for this.
If the page is online, refreshing every 2 to 3 seconds may not be possible with the WebBrowser control. What can be done instead is to reload as soon as parsing is finished, which should still be very fast.
Re: Parsing information from a webpage
The web page is local (it's generated by a microcontroller board). I don't have much source code yet, but I have read some. I also want to know whether it is possible to extract just the information and ignore the marquee on the page.
Thanks
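On the marquee question: if the program reads the raw HTML (for example through the Inet control) rather than the WebBrowser's inner text, one option is to cut the whole MARQUEE element out before looking for the readings. The sketch below is in C, the language of the board's firmware, and is only an illustration under stated assumptions: `strip_marquee` and `find_ci` are hypothetical names, tag matching is a plain case-insensitive substring search (not a real HTML parser), and every `<MARQUEE` is assumed to have a matching `</MARQUEE>`. The same loop can be written in VB with InStr (using vbTextCompare) and Mid$.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle in haystack (ASCII only).
   Returns a pointer into haystack, or NULL if not found. */
static const char *find_ci(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return haystack;
    }
    return NULL;
}

/* Hypothetical helper: copy html into out (capacity out_size), dropping
   every <MARQUEE ...> ... </MARQUEE> span, tags included. The result is
   always NUL-terminated and is truncated if out is too small. */
void strip_marquee(const char *html, char *out, size_t out_size)
{
    size_t used = 0;
    assert(out_size > 0);
    while (*html != '\0') {
        const char *start = find_ci(html, "<marquee");
        const char *stop = start ? find_ci(start, "</marquee>") : NULL;
        const char *copy_end = stop ? start : html + strlen(html);
        /* copy the text before the marquee (or the rest of the input) */
        while (html < copy_end && used + 1 < out_size)
            out[used++] = *html++;
        if (stop == NULL)
            break;
        html = stop + strlen("</marquee>");  /* skip the whole element */
    }
    out[used] = '\0';
}
```

If an opening tag has no closing tag, the rest of the input is copied unchanged.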
Re: Parsing information from a webpage
Yes, but I need to see the full source and what you want to get from it.
Using the WebBrowser control together with a reference to the Microsoft HTML Object Library, it should be a snap.
Can you post the full source (or attach the page)?
Thanks!
Re: Parsing information from a webpage
OK, I will post the full source.
Re: Parsing information from a webpage
It would be easiest to use the WebBrowser control, but I suggest the Inet control if download speed is a priority for you.
Post the source code of the page as Static said, and we will be able to help you write code to parse it.
Re: Parsing information from a webpage
Web Server for Embedded Applications\
</I>\r\n\
<BR>\
<A HREF=http://www.violasystems.com>\
www.violasystems.com - Embedding The Internet</A>\
</BODY>";
char Https_TestIndexPage[] = "HTTP/1.0 200 OK\r\n\
Last-modified: Fri, 18 Oct 2002 12:04:32 GMT\r\n\
Server: ESERV-10/1.0\r\nContent-type: text/html\r\n\
Content-length: 400\r\n\
\r\n\
<HEAD>\
<TITLE>UPRB Sistema de monitoreo de Sensores</TITLE></HEAD>\
<BODY>\
<DIV align=center>\
<H2><MARQUEE behavior=scroll direction=right width=500> MCF5282 Microcontroller</MARQUEE></H2></DIV>\
<BR>\
<DIV align=center>\
UPRB Salón 121 \
<HR>\
<BR> Temperatura Fahrenh = \
<I>\r\n\
<BR> Humedad Relativa = \
<BR> Intensidad de luz = \
<BR> Sensor de Humo = \
<BR> Sensor de Movimiento = \
<BR> Sensor de Puerta = \
<BR> Servidor WEB <BR></I>\r\n\
<BR>\
<A HREF=http://www.uprb.edu>\
www.uprb.edu</A>\
<BR><BR>\
<HR>\
</DIV>\
</BODY>";
Re: Parsing information from a webpage
This is the part of the C code that generates the web page.
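The string the board sends holds the HTTP status line and headers, then a blank line, then the HTML. A client that receives that raw response (over a plain socket, say) has to skip everything up to the blank line before parsing. A minimal C sketch, assuming the whole response is already in one NUL-terminated buffer; `http_body` is a hypothetical name. It accepts CRLF CRLF, which HTTP requires, and also a bare LF LF for servers that do not send CR, taking whichever comes first. In VB the same step is an InStr for the terminator followed by Mid$.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the body of a raw HTTP
   response, i.e. the first byte after the blank line that ends the
   headers. Returns NULL if no header terminator is found. */
const char *http_body(const char *response)
{
    const char *crlf, *lf;

    assert(response != NULL);
    crlf = strstr(response, "\r\n\r\n");   /* standard terminator */
    lf = strstr(response, "\n\n");         /* lenient fallback */
    if (crlf != NULL && (lf == NULL || crlf < lf))
        return crlf + 4;
    if (lf != NULL)
        return lf + 2;
    return NULL;
}
```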
Re: Parsing information from a webpage
;)
We just need the final result (the HTML) with the data filled in, for testing.
Re: Parsing information from a webpage
The thing is, the displayed values are generated in another part of the microcontroller firmware. Can you suggest some code based on what you have so far? I will fill in the gaps.
Where it says:
<BR> Temperatura Fahrenh = \
<I>\r\n\
<BR> Humedad Relativa = \
<BR> Intensidad de luz = \
<BR> Sensor de Humo = \
<BR> Sensor de Movimiento = \
<BR> Sensor de Puerta = \
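This is the part of the HTML that I want to read. It is in Spanish: the labels are temperature (Fahrenheit), relative humidity, light intensity, smoke sensor, motion sensor and door sensor.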
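Once the page text is available (for example from innerText), each reading can be found by searching for its label, skipping the `=` and converting what follows. A C sketch of that idea, with a hypothetical `read_reading` function; it takes the first occurrence of the label and assumes the board prints a plain number after each `=` (sensors that print words, such as an open/closed door, would need a string comparison instead). In VB the same steps map to InStr, Mid$ and Val.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find `label` in the page text, skip the '=' that
   follows it, and parse the number after it. Returns 1 and stores the
   value in *value on success, 0 if the label is missing, no '=' follows
   it, or no number follows the '='. */
int read_reading(const char *text, const char *label, double *value)
{
    const char *p;
    char *end;

    assert(text != NULL && label != NULL && value != NULL);
    p = strstr(text, label);
    if (p == NULL)
        return 0;
    p += strlen(label);
    while (*p == ' ')           /* spaces between the label and '=' */
        p++;
    if (*p != '=')
        return 0;
    p++;                        /* strtod skips leading whitespace itself */
    *value = strtod(p, &end);
    return end != p;            /* end == p means nothing was converted */
}
```

For example, calling it with "Temperatura Fahrenh" on text containing `Temperatura Fahrenh = 72.5` stores 72.5 and returns 1.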
Re: Parsing information from a webpage
OK, I have been able to parse the inner text out of the web page, but now I have another problem:
when I reload the page after a variable I'm controlling changes, the information read by the program is not updated, and I know it's a caching problem.
Any info on this would be appreciated; I will post the source code next.
Re: Parsing information from a webpage
Option Explicit
Private Sub cmdExit_Click()
If MsgBox("Are you sure?", vbYesNo, "Exiting the application") = vbYes Then
Unload Me
End If
End Sub
Private Sub cmdGo_Click()
Dim objLink As Object ' the links collection holds anchor (and area) elements, not HTMLLinkElement
Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument
lblStatus.Caption = "Getting document via HTTP"
' This function requires Internet Explorer 5 or later
Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, vbNullString)
lblStatus.Caption = "Getting and parsing HTML document"
' Tricky: make the function wait until the document is complete, because
' the transfer is usually asynchronous. Note that this string might be different
' if the machine running the code has a non-English version of Internet Explorer.
While objDocument.readyState <> "complete"
DoEvents
Wend
lblStatus.Caption = "Document completed"
' Copying the page text (innerText, not the HTML source) to the text box
txtSource.Text = objDocument.documentElement.innerText
DoEvents
' Copying the title of the page to the label
lblTitle.Caption = "Title : " & objDocument.Title
DoEvents
lblStatus.Caption = "Extracting links"
' Processing the link collection of the HTMLDocument object
For Each objLink In objDocument.links
lstLinks.AddItem objLink.href
lblStatus.Caption = "Extracted " & objLink.href
DoEvents
Next
lblStatus.Caption = "Done"
Beep
End Sub
Re: Parsing information from a webpage
Is there a way to make the program always fetch the document from the web server instead of from the cache?
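A common workaround is to make every request URL unique by appending a query parameter that changes each time, so a cache keyed on the URL never holds a stored copy. A minimal C sketch; the parameter name `nocache`, the function name `make_fresh_url` and the address in the example are hypothetical, and it assumes the board's web server ignores query strings it does not recognize. If it does not, the other route is on the board side: add a `Cache-Control: no-cache` header (and, for HTTP/1.0 clients, the commonly used `Pragma: no-cache`) to the header string it sends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a cache-busting URL by appending a query
   parameter whose value (a caller-supplied counter) changes on every
   request. Uses '&' if the URL already has a query string, else '?'.
   Returns the snprintf result: the length needed, or negative on error. */
int make_fresh_url(char *out, size_t out_size, const char *url,
                   unsigned long counter)
{
    const char *sep;

    assert(url != NULL);
    sep = strchr(url, '?') != NULL ? "&" : "?";
    return snprintf(out, out_size, "%s%snocache=%lu", url, sep, counter);
}
```

In the VB program the equivalent is to append "?nocache=" (or "&nocache=" if the URL already has a query string) and a counter to txtURL.Text, incrementing the counter before each createDocumentFromUrl call.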