Thread: vb.net - Strip HTML from website

  1. #1

    Thread Starter
    Hyperactive Member MarkusJ_NZ
    Join Date
    Jun 2001
    Posts
    375

    vb.net - Strip HTML from website

    This small function downloads a web page and returns its text with the HTML tags stripped out.

    VB Code:
    Public Function RemoveHtml(ByVal sURL As String) As String
        ' Create the request for the page at sURL.
        Dim oWebRequest As System.Net.WebRequest = System.Net.WebRequest.Create(sURL)
        ' The Using blocks close the response and the reader even if an exception is thrown.
        Using oWebResponse As System.Net.WebResponse = oWebRequest.GetResponse()
            Using oReader As New System.IO.StreamReader(oWebResponse.GetResponseStream())
                ' Remove every tag: "<", any run of characters other than ">", then ">".
                Return System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    End Function

    To use it, pass in the URL of the page whose text you want to retrieve:

    VB Code:
    TextBox1.Text = RemoveHtml("http://www.vbforums.com")
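
    To see what the pattern does without hitting the network, here is a minimal sketch that applies the same regular expression to a few literal strings. StripTags and the module name are made up for this illustration; only the pattern comes from the function above. It also shows two limits of the approach: text inside script tags is kept, and entities such as &amp; are not decoded.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module StripTagsDemo
        ' Hypothetical helper: the same pattern as RemoveHtml, applied to a string.
        Function StripTags(ByVal sHtml As String) As String
            Return Regex.Replace(sHtml, "<[^>]*>", "")
        End Function

        Sub Main()
            ' Ordinary markup: the tags go, the text stays.
            Console.WriteLine(StripTags("<p>Hello <b>world</b>!</p>"))           ' Hello world!
            ' Script content sits between tags, so it is left behind.
            Console.WriteLine(StripTags("<script>var x = 1;</script>Done"))      ' var x = 1;Done
            ' Entities are not decoded.
            Console.WriteLine(StripTags("<p>Fish &amp; chips</p>"))              ' Fish &amp; chips
        End Sub
    End Module
    ```

    If the leftover entities matter, System.Net.WebUtility.HtmlDecode (available from .NET Framework 4) could be applied to the result.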

    Cheers
    MarkusJ

  2. #2
    New Member
    Join Date
    Aug 2009
    Posts
    9

    Re: vb.net - Strip HTML from website

    I added a little to your code: error handling, and the "http://" prefix on the URL is now optional.

    Code:
    Public Function RemoveHtml(ByVal sURL As String) As String
        Dim sTemp As String = ""
        Try
            ' Add the scheme only if it is missing. Compare without regard to case
            ' rather than lower-casing the whole URL, because paths can be case-sensitive.
            If Not sURL.StartsWith("http://", System.StringComparison.OrdinalIgnoreCase) AndAlso _
               Not sURL.StartsWith("https://", System.StringComparison.OrdinalIgnoreCase) Then
                sURL = "http://" & sURL
            End If

            Dim oWebRequest As System.Net.WebRequest = System.Net.WebRequest.Create(sURL)
            ' The Using blocks close the response and the reader even on failure.
            Using oWebResponse As System.Net.WebResponse = oWebRequest.GetResponse()
                Using oReader As New System.IO.StreamReader(oWebResponse.GetResponseStream())
                    sTemp = System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
                End Using
            End Using
        Catch ex As Exception
            ' Report the failure and fall through to return an empty string.
            Console.WriteLine("Error: " & ex.Message)
        End Try
        Return sTemp
    End Function

  3. #3
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    Re: vb.net - Strip HTML from website

    Why not:
    Code:
    Public Function RemoveHtml(ByVal sURL As String) As String
        Using wc As New System.Net.WebClient()
            ' Dispose the reader (and the stream it wraps) along with the client.
            Using oReader As New System.IO.StreamReader(wc.OpenRead(sURL))
                Return System.Text.RegularExpressions.Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
            End Using
        End Using
    End Function
    Last edited by minitech; Aug 26th, 2010 at 03:06 PM.

  4. #4
    New Member
    Join Date
    Apr 2009
    Posts
    5

    Re: vb.net - Strip HTML from website

    Hi, how can I add a proxy to the HTML-stripping code? I mean, how do I use your code through a proxy?

    Thanks
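
    One possible approach, offered as a sketch and not as the original authors' answer: set the Proxy property on the request before calling GetResponse. The sProxyHost and iProxyPort parameters and the module name are hypothetical additions for this illustration; WebRequest.Proxy and the WebProxy(host, port) constructor are part of System.Net.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Net
    Imports System.Text.RegularExpressions

    Module ProxyDemo
        ' sProxyHost and iProxyPort are hypothetical parameters added for this sketch.
        Public Function RemoveHtml(ByVal sURL As String, ByVal sProxyHost As String, ByVal iProxyPort As Integer) As String
            Dim oWebRequest As WebRequest = WebRequest.Create(sURL)
            ' Route this request through the given proxy instead of the system default.
            oWebRequest.Proxy = New WebProxy(sProxyHost, iProxyPort)
            Using oWebResponse As WebResponse = oWebRequest.GetResponse()
                Using oReader As New StreamReader(oWebResponse.GetResponseStream())
                    ' Same tag-stripping pattern as the functions above.
                    Return Regex.Replace(oReader.ReadToEnd(), "<[^>]*>", "")
                End Using
            End Using
        End Function
    End Module
    ```

    A call would look like RemoveHtml("http://www.vbforums.com", "proxy.example.com", 8080), where the host and port are placeholders for your own proxy. The WebClient version from post #3 should work the same way by setting wc.Proxy before OpenRead.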
