I need to parse an HTML file and get only the text from the webpage, not the HTML code itself. How do I do this?
Or how would I display a webpage's source in a textbox?
Set the TextBox's MultiLine property to True:
VB Code:
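Option Explicit

Private Sub Form_Load()
    ' Start loading the page; DocumentComplete fires when it has finished.
    WebBrowser1.Navigate2 "www.vbforums.com"
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Ignore frame events; act only once the top-level document is complete.
    If (pDisp Is WebBrowser1.Application) Then
        ' InnerText returns the page's visible text with the HTML tags removed.
        Text1.Text = WebBrowser1.Document.Body.InnerText
    End If
End Sub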
This can be quite involved and fairly painful. It is easier if the text you want on the website is surrounded by specific tags; still not easy, but certainly easier. If that is not the case, then you need to get the source and use a replace function to remove all the HTML tags, which is tedious and not foolproof unless you handle every tag and combination possible.
I personally wouldn't use the WebBrowser control for this; it is far too memory intensive, usually around 30 MB. I would use the Inet control instead. It is quick and easy, and on a fast internet connection pretty foolproof.
VB Code:
Text1.Text = Inet1.OpenURL("http://www.yahoo.com")
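For the tag-removal approach mentioned above, here is a minimal sketch. StripTags is a hypothetical helper name, not something built into VB. Instead of a Replace call for every tag, it walks the source one character at a time and drops everything between a "<" and the next ">". It assumes reasonably simple markup: it does not decode entities such as &amp;, and it will leave the contents of script and style blocks in the output, so treat it as a starting point rather than a complete parser.

```vb
Option Explicit

' Hypothetical helper (not a built-in): returns sHtml with everything
' between "<" and ">" removed. Assumes simple, well-formed tags.
Public Function StripTags(ByVal sHtml As String) As String
    Dim i As Long
    Dim ch As String
    Dim bInTag As Boolean
    Dim sOut As String

    For i = 1 To Len(sHtml)
        ch = Mid$(sHtml, i, 1)
        If ch = "<" Then
            ' Start of a tag: stop copying characters.
            bInTag = True
        ElseIf ch = ">" Then
            If bInTag Then
                ' End of the tag: resume copying after it.
                bInTag = False
            Else
                ' A stray ">" outside any tag is kept as text.
                sOut = sOut & ch
            End If
        ElseIf Not bInTag Then
            sOut = sOut & ch
        End If
    Next i

    StripTags = sOut
End Function
```

With the Inet control from the post above, it could be used as Text1.Text = StripTags(Inet1.OpenURL("http://www.yahoo.com")).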