-
Sep 10th, 2003, 10:15 PM
#1
Thread Starter
Hyperactive Member
vb.net - Strip HTML from website
This is a small function that allows a user to retrieve the text from a website minus the HTML.
VB Code:
Public Function RemoveHtml(ByVal sURL As String) As String
    ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest explicitly.
    Dim oHttpWebRequest As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(sURL), System.Net.HttpWebRequest)
    ' The Using blocks close the response, stream and reader even if an exception is thrown.
    Using oHttpWebResponse As System.Net.WebResponse = oHttpWebRequest.GetResponse()
        Using oStream As System.IO.Stream = oHttpWebResponse.GetResponseStream()
            Using oReader As New System.IO.StreamReader(oStream)
                ' Replace every tag (anything between < and >) with an empty string.
                Return System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    End Using
End Function
To use it, simply pass in the URL of the website whose text you want to retrieve:
VB Code:
textbox1.text = RemoveHtml("http://www.vbforums.com")
Cheers
MarkusJ
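The pattern "<[^>]*>" used above removes the tags themselves, but it leaves the contents of script and style blocks in place and does not decode entities such as &amp;. A minimal sketch of one way to handle both on an already downloaded string follows. StripTags and the HtmlText module are hypothetical names, not part of the original post, and the sketch assumes .NET Framework 4.0 or later, where System.Net.WebUtility.HtmlDecode is available. It is still a regex approach and will not cope with every kind of malformed HTML.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlText
    ' Hypothetical helper (not from the thread): strips markup from an HTML string.
    Public Function StripTags(ByVal html As String) As String
        ' Drop <script> and <style> elements together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Remove the remaining tags with the same pattern RemoveHtml uses.
        text = Regex.Replace(text, "<[^>]*>", "")
        ' Turn entities such as &amp; back into plain characters.
        Return WebUtility.HtmlDecode(text)
    End Function
End Module
```

RemoveHtml could call StripTags on the downloaded string in place of the single Regex.Replace call.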
-
Jun 4th, 2010, 12:52 AM
#2
New Member
Re: vb.net - Strip HTML from website
I added a little to your code: error handling, and the http:// prefix of the URL is now optional.
Code:
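Public Function RemoveHtml(ByVal sURL As String) As String
    Dim sTemp As String = ""
    Try
        ' Prepend http:// only when the URL has no http:// or https:// prefix.
        ' The URL itself is not lower-cased, because paths can be case-sensitive.
        If Not sURL.StartsWith("http://", System.StringComparison.OrdinalIgnoreCase) AndAlso Not sURL.StartsWith("https://", System.StringComparison.OrdinalIgnoreCase) Then
            sURL = "http://" & sURL
        End If
        Dim oHttpWebRequest As System.Net.HttpWebRequest
        Dim oStream As System.IO.Stream
        oHttpWebRequest = DirectCast(System.Net.WebRequest.Create(sURL), System.Net.HttpWebRequest)
        Dim oHttpWebResponse As System.Net.WebResponse = oHttpWebRequest.GetResponse()
        oStream = oHttpWebResponse.GetResponseStream()
        ' Replace every tag (anything between < and >) with an empty string.
        sTemp = System.Text.RegularExpressions.Regex.Replace(New System.IO.StreamReader(oStream).ReadToEnd(), "<[^>]*>", "")
        oStream.Close()
        oHttpWebResponse.Close()
    Catch ex As Exception
        ' On any failure, report the error and fall through to return an empty string.
        Console.WriteLine("Error: " & ex.Message)
    End Try
    Return sTemp
End Function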
-
Jun 4th, 2010, 05:30 PM
#3
Re: vb.net - Strip HTML from website
Why not:
Code:
Public Function RemoveHtml(ByVal sURL As String) As String
    Using wc As New Net.WebClient()
        ' Dispose the reader (and the stream it wraps) as well as the client.
        Using reader As New System.IO.StreamReader(wc.OpenRead(sURL))
            Return System.Text.RegularExpressions.Regex.Replace(reader.ReadToEnd(), "<[^>]*>", "")
        End Using
    End Using
End Function
Last edited by minitech; Aug 26th, 2010 at 03:06 PM.
-
Aug 7th, 2010, 02:55 AM
#4
New Member
Re: vb.net - Strip HTML from website
Hi, how can I add a proxy to the strip-HTML code? I mean, how do I use your code with a proxy?
Thanks
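One possible way to do this, sketched on top of the WebClient version from post #3, is to assign a System.Net.WebProxy to the client's Proxy property before downloading. CreateProxyClient, RemoveHtmlViaProxy and the proxyAddress parameter are hypothetical names introduced for this example, and the address in the usage line is only a placeholder.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Module ProxyExample
    ' Hypothetical helper: returns a WebClient that sends its requests through
    ' the proxy at proxyAddress (for example "http://proxy.example.com:8080").
    Public Function CreateProxyClient(ByVal proxyAddress As String) As WebClient
        Dim wc As New WebClient()
        wc.Proxy = New WebProxy(proxyAddress)
        ' If the proxy needs the current Windows credentials, this may help:
        ' wc.Proxy.Credentials = CredentialCache.DefaultCredentials
        Return wc
    End Function

    ' Same tag stripping as RemoveHtml, but the page is fetched via the proxy.
    Public Function RemoveHtmlViaProxy(ByVal sURL As String, ByVal proxyAddress As String) As String
        Using wc As WebClient = CreateProxyClient(proxyAddress)
            Return Regex.Replace(wc.DownloadString(sURL), "<[^>]*>", "")
        End Using
    End Function
End Module
```

It would be called like this: TextBox1.Text = RemoveHtmlViaProxy("http://www.vbforums.com", "http://proxy.example.com:8080"). The HttpWebRequest versions earlier in the thread have a Proxy property as well, so the same assignment should work there.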