vb.net - Strip HTML from website
This is a small function that retrieves the text of a web page with the HTML tags stripped out.
VB Code:
Public Function RemoveHtml(ByVal sURL As String) As String
    ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest explicitly.
    Dim oHttpWebRequest As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(sURL), System.Net.HttpWebRequest)
    ' Using blocks close the response, stream and reader even if an exception is thrown.
    Using oHttpWebResponse As System.Net.WebResponse = oHttpWebRequest.GetResponse()
        Using oStream As System.IO.Stream = oHttpWebResponse.GetResponseStream()
            Using oReader As New System.IO.StreamReader(oStream)
                ' Remove anything that looks like a tag: "<", any run of characters other than ">", then ">".
                Return System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    End Using
End Function
To use it, simply pass in the URL of the page you want to retrieve the text from:
VB Code:
textbox1.text = RemoveHtml("http://www.vbforums.com")
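It may help to see what the "<[^>]*>" pattern does and does not remove. The snippet below is only a sketch: StripTags is a hypothetical helper (not part of the function above) that applies the same pattern to a string already in memory, and the sample HTML is made up. The pattern removes the tags themselves, but it leaves the contents of script blocks and HTML entities such as &amp; in place. System.Net.WebUtility.HtmlDecode (.NET Framework 4 and later) can decode the entities afterwards.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo
    ' Hypothetical helper: the same pattern as RemoveHtml, applied to an in-memory string.
    Function StripTags(ByVal sHtml As String) As String
        Return Regex.Replace(sHtml, "<[^>]*>", "")
    End Function

    Sub Main()
        ' Made-up sample input for illustration.
        Dim sHtml As String = "<p>Hello <b>world</b> &amp; friends</p><script>var x = 1;</script>"

        ' The tags are gone, but the script body and the &amp; entity remain.
        Console.WriteLine(StripTags(sHtml))
        ' prints: Hello world &amp; friendsvar x = 1;

        ' Decoding entities is a separate step.
        Console.WriteLine(System.Net.WebUtility.HtmlDecode(StripTags(sHtml)))
        ' prints: Hello world & friendsvar x = 1;
    End Sub
End Module
```

If script or style content matters for your pages, an HTML parser is likely a better fit than a single regular expression.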
Cheers
MarkusJ
Re: vb.net - Strip HTML from website
I added a little to your code: error handling, and the http:// part of the URL is now optional.
Code:
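Public Function RemoveHtml(ByVal sURL As String) As String
    Dim sTemp As String = ""
    Try
        ' Add the scheme only when it is missing. The check ignores case, so the rest
        ' of the URL keeps its original casing (paths and queries can be case-sensitive).
        If Not sURL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) AndAlso _
           Not sURL.StartsWith("https://", StringComparison.OrdinalIgnoreCase) Then
            sURL = "http://" & sURL
        End If
        Dim oHttpWebRequest As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(sURL), System.Net.HttpWebRequest)
        ' Using blocks close the response and reader even when an exception is thrown.
        Using oHttpWebResponse As System.Net.WebResponse = oHttpWebRequest.GetResponse()
            Using oReader As New System.IO.StreamReader(oHttpWebResponse.GetResponseStream())
                sTemp = System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    Catch ex As Exception
        ' On any failure, report the error and fall through to return an empty string.
        Console.WriteLine("Error: " & ex.Message)
    End Try
    Return sTemp
End Function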
Re: vb.net - Strip HTML from website
Why not:
Code:
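Public Function RemoveHtml(ByVal sURL As String) As String
    Using wc As New System.Net.WebClient()
        ' Dispose the reader (and the response stream it wraps) as well as the client.
        Using oReader As New System.IO.StreamReader(wc.OpenRead(sURL))
            Return System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
        End Using
    End Using
End Function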
Re: vb.net - Strip HTML from website
Hi, how can I add a proxy to this strip-HTML code? I mean, how do I use your code through a proxy?
Thanks
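One possible way to do this, as a rough sketch rather than a tested answer: WebClient has a Proxy property that accepts a System.Net.WebProxy. The sProxyHost and iProxyPort parameters below are hypothetical additions, and "proxy.example.com" and 8080 are placeholder values to replace with your own proxy details.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module ProxyDemo
    ' sProxyHost and iProxyPort are hypothetical parameters, not part of the original function.
    Public Function RemoveHtml(ByVal sURL As String, ByVal sProxyHost As String, ByVal iProxyPort As Integer) As String
        Using wc As New WebClient()
            ' Route the request through the given proxy.
            wc.Proxy = New WebProxy(sProxyHost, iProxyPort)
            Using oReader As New StreamReader(wc.OpenRead(sURL))
                Return Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    End Function

    Sub Main()
        ' Placeholder proxy host and port; substitute real values.
        Console.WriteLine(RemoveHtml("http://www.vbforums.com", "proxy.example.com", 8080))
    End Sub
End Module
```

If the proxy needs a login, WebProxy also has a Credentials property that takes a System.Net.NetworkCredential. The HttpWebRequest versions earlier in the thread should work the same way, since HttpWebRequest exposes a Proxy property too.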