Thread: Stripping...

  1. #1
    ricmitch_uk
    Guest

    Stripping...

    Thought that title would get some people's attention...
    I need some help with this HTML stripping code. I'm having trouble removing style sheets, applets, etc.
    Can someone tell me where I'm going wrong with this code:
    Code:
    Private Sub StripIt()
    On Error Resume Next
    Dim strHTML As String
    Dim intPos As Integer
    Dim intPos2 As Integer
    Dim intPos3 As Integer
    Dim strTempText As String
    Dim strTempText2 As String
    Dim Temp As Integer
        strHTML = WebBrowser1.Document.documentelement.InnerHTML
        Do
            If InStr(1, strHTML, "<") And InStr(1, strHTML, ">") <> 0 Then
                intPos = InStr(1, strHTML, "<")
                strTempText = Left$(strHTML, intPos - 1)
                intPos2 = InStr(intPos, strHTML, ">")
                Select Case LCase$(Mid$(strHTML, intPos, 7))
                Case "<script", "<style>", "<applet"
                    intPos3 = InStr((intPos2 + 1), strHTML, ">")
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos3))
                    GoTo EndSelect
                Case Else
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos2))
                End Select
    EndSelect:
                strHTML = strTempText + strTempText2
            Else
                Exit Do
            End If
        Loop
        txtPage.Text = strHTML
    End Sub
    Thanks for that.

  2. #2
    Retired VBF Adm1nistrator plenderj
    Join Date
    Jan 2001
    Location
    Dublin, Ireland
    Posts
    10,359
    Well, what errors are you getting?

    - jamie
    Microsoft MVP : Visual Developer - Visual Basic [2004-2005]

  3. #3
    Guest
    I don't get any errors. It just doesn't remove the stuff between the style tags.
    For example, with:
    Code:
    <STYLE TYPE = "TEXT/CSS">
    A:hover { color : #ff0000;
                  text-decoration : none }
    </STYLE>
    I get left with:
    A:hover { color : #ff0000;
    text-decoration : none }

  4. #4
    Retired VBF Adm1nistrator plenderj
    Well:

    Code:
                Case "<script", "<style>", "<applet"
    Should probably be:

    Code:
                Case "<script", "<style", "<applet"
    That work?

    - jamie

  5. #5
    Guest
    Nope. It removes script tags fine. It's only the style tags it chokes on for some reason. Even when there are no comments round the style rules (which I haven't accounted for yet), it still doesn't work.
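    A likely reason the change makes no difference: Mid$(strHTML, intPos, 7) always returns seven characters, so for a style tag it yields "<style " or "<style>" and can never equal the six-character "<style". A minimal sketch of one way around that, comparing only the tag name. GetTagName is a hypothetical helper, not part of the original code, and it uses Long rather than Integer on the assumption that a page may be longer than 32,767 characters:

```vb
' Hypothetical helper: returns the lower-case tag name that starts at
' lngPos (the position of "<"), e.g. "style" for <STYLE TYPE = "TEXT/CSS">.
Private Function GetTagName(ByVal strHTML As String, ByVal lngPos As Long) As String
    Dim lngEnd As Long
    Dim strChar As String

    lngEnd = lngPos + 1
    Do While lngEnd <= Len(strHTML)
        strChar = Mid$(strHTML, lngEnd, 1)
        ' The name ends at whitespace, ">" or "/" (as in <br/>)
        If strChar = " " Or strChar = ">" Or strChar = "/" _
           Or strChar = vbTab Or strChar = vbCr Or strChar = vbLf Then
            ' A "/" directly after "<" (closing tag) is kept as part of the name
            If Not (strChar = "/" And lngEnd = lngPos + 1) Then Exit Do
        End If
        lngEnd = lngEnd + 1
    Loop
    GetTagName = LCase$(Mid$(strHTML, lngPos + 1, lngEnd - lngPos - 1))
End Function
```

    With that in place, the Select Case could read Select Case GetTagName(strHTML, intPos) with Case "script", "style", "applet", which matches the tag whether or not it carries attributes.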

  6. #6
    Hyperactive Member Kagey
    Join Date
    Sep 2000
    Location
    The Wilderness of New Brunswick
    Posts
    294

    Try this

    I found this around:
    Code:
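    Function StripHTMLTag(ByVal sText As String) As String
       Dim lngOpen As Long, lngClose As Long
       lngOpen = InStr(sText, "<")
       Do While lngOpen > 0
          ' Look for the closing ">" after the "<", not from the start of the string
          lngClose = InStr(lngOpen, sText, ">")
          If lngClose = 0 Then Exit Do   ' unterminated tag: keep the rest as text
          StripHTMLTag = StripHTMLTag & " " & Left$(sText, lngOpen - 1)
          sText = Mid$(sText, lngClose + 1)
          lngOpen = InStr(sText, "<")
       Loop
       StripHTMLTag = StripHTMLTag & sText
    End Function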
    Let me know how it goes.

  7. #7
    Guest
    All that does is remove the individual tags. If there is content between an opening tag and its closing tag, it doesn't remove that.
    E.g.:
    Script tags
    Code:
    <script>
    function goback()
    {
    window.location = history.go(-1);
    }
    </script>
    and Style tags
    Code:
    <style type = "text/css">
    A:hover { color : #ff0000;
                    text-decoration : none }
    </style>
    and Applet tags
    Code:
    <applet codebase = "nav.class">
    <param name="backcolor" value="red">
    </applet>
    There are lots of other tags that use this type of format. Many aren't used very often, but they do pop up, and more people are using style sheets and JavaScript to get the effects they want on their web pages.
    If you want to view a page as text only, you don't want to see all of its source code. Most inexperienced users would think the computer isn't working and would never use the program again.

  8. #8
    Retired VBF Adm1nistrator plenderj
    Ah, for Christ's sake.
    Just put everything into a string,
    then do something like:

    Code:
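    ' The Mid statement overwrites no more characters than the right-hand string supplies, so pad it with spaces
    Mid$(var_string, InStr(1, var_string, "<SCRIPT>", vbTextCompare), InStr(1, var_string, "</SCRIPT>", vbTextCompare) - InStr(1, var_string, "<SCRIPT>", vbTextCompare)) = Space$(Len(var_string))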
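    The one-liner above only handles the first <SCRIPT> tag, and only when that tag has no attributes. A fuller sketch of the same idea, cutting out each whole block before the ordinary tags are stripped, might look like the following. RemoveBlocks is a hypothetical name, and dropping the rest of the page when a closing tag is missing is an assumption about the desired behaviour rather than anything stated in the thread:

```vb
' Removes every <strTag ...> ... </strTag> block, including its contents.
' strTag is the bare tag name, e.g. "script", "style" or "applet".
Private Function RemoveBlocks(ByVal strHTML As String, ByVal strTag As String) As String
    Dim lngStart As Long
    Dim lngEnd As Long
    Dim strNext As String

    lngStart = InStr(1, strHTML, "<" & strTag, vbTextCompare)
    Do While lngStart > 0
        ' Make sure the whole name matched (<style, not <styles)
        strNext = Mid$(strHTML, lngStart + Len(strTag) + 1, 1)
        If strNext = ">" Or strNext = " " Or strNext = vbTab _
           Or strNext = vbCr Or strNext = vbLf Then
            lngEnd = InStr(lngStart, strHTML, "</" & strTag, vbTextCompare)
            If lngEnd = 0 Then
                ' No closing tag: drop everything from the opening tag on
                strHTML = Left$(strHTML, lngStart - 1)
                Exit Do
            End If
            ' Skip past the ">" that ends the closing tag
            lngEnd = InStr(lngEnd, strHTML, ">")
            If lngEnd = 0 Then lngEnd = Len(strHTML)
            strHTML = Left$(strHTML, lngStart - 1) & Mid$(strHTML, lngEnd + 1)
        Else
            ' Different tag that merely starts with the same letters: move on
            lngStart = lngStart + 1
        End If
        lngStart = InStr(lngStart, strHTML, "<" & strTag, vbTextCompare)
    Loop
    RemoveBlocks = strHTML
End Function
```

    Calling it once each for "script", "style" and "applet" on the page string, and only then running the generic tag-stripping loop, should leave just the visible text. Searching for "</" & strTag rather than the next ">" also copes with blocks that contain tags of their own, such as the <param> inside an <applet>.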
