Stripping...


ricmitch_uk
Feb 10th, 2001, 04:40 AM
Thought that title would get some people's attention...
I need some help with this HTML-stripping code. I'm having trouble removing style sheets, applets, etc.
Can someone tell me where I'm going wrong with this code?

Private Sub StripIt()
    On Error Resume Next
    Dim strHTML As String
    Dim intPos As Integer
    Dim intPos2 As Integer
    Dim intPos3 As Integer
    Dim strTempText As String
    Dim strTempText2 As String
    Dim Temp As Integer
    strHTML = WebBrowser1.Document.documentelement.InnerHTML
    Do
        If InStr(1, strHTML, "<") And InStr(1, strHTML, ">") <> 0 Then
            intPos = InStr(1, strHTML, "<")
            strTempText = Left$(strHTML, intPos - 1)
            intPos2 = InStr(intPos, strHTML, ">")
            Select Case LCase$(Mid$(strHTML, intPos, 7))
                Case "<script", "<style>", "<applet"
                    intPos3 = InStr((intPos2 + 1), strHTML, ">")
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos3))
                    GoTo EndSelect
                Case Else
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos2))
            End Select
EndSelect:
            strHTML = strTempText + strTempText2
        Else
            Exit Do
        End If
    Loop
    txtPage.Text = strHTML
End Sub

Thanks for that.

plenderj
Feb 12th, 2001, 05:06 AM
Well, what errors are you getting?

- jamie

Feb 13th, 2001, 12:08 PM
I don't get any errors. It just doesn't remove the stuff between the style tags.
E.g., with:

<STYLE TYPE = "TEXT/CSS">
A:hover { color : #ff0000;
text-decoration : none }
</STYLE>

I get left with:
A:hover { color : #ff0000;
text-decoration : none }

plenderj
Feb 14th, 2001, 02:03 AM
Well:

Case "<script", "<style>", "<applet"

should probably be:

Case "<script", "<style", "<applet"

Does that work?

- jamie

Feb 14th, 2001, 01:09 PM
Nope. It removes script tags fine. It's just bitching about the style tags for some reason. Even if they don't have comments round the style (which I haven't accounted for yet), it still doesn't work.
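
One likely reason the change makes no difference: Mid$(strHTML, intPos, 7) always returns seven characters, so it can equal "<script" or "<applet" but never the six-character "<style". For <STYLE TYPE = "TEXT/CSS"> it returns "<style " (with a trailing space), which matches neither "<style>" nor "<style", so the Case Else branch runs and only the tag itself is removed. A minimal sketch of a length-aware check follows; BlockTagAt is a hypothetical helper name, not something from the posted code.

```vb
' Hypothetical helper (not from the thread): returns "script", "style" or
' "applet" if the tag starting at lngPos opens one of those elements,
' otherwise an empty string.
Private Function BlockTagAt(ByVal strHTML As String, ByVal lngPos As Long) As String
    Dim varName As Variant
    Dim strProbe As String
    Dim strNext As String

    For Each varName In Array("script", "style", "applet")
        strProbe = "<" & varName
        ' Compare only as many characters as this tag name needs
        If StrComp(Mid$(strHTML, lngPos, Len(strProbe)), strProbe, vbTextCompare) = 0 Then
            ' The next character must end the name, so "<styles" is not a match
            strNext = Mid$(strHTML, lngPos + Len(strProbe), 1)
            If strNext = ">" Or strNext = " " Or strNext = vbTab _
               Or strNext = vbCr Or strNext = vbLf Or strNext = "/" Then
                BlockTagAt = varName
                Exit Function
            End If
        End If
    Next varName
End Function
```

Inside the loop, the Select Case could then become something like If BlockTagAt(strHTML, intPos) <> "" Then ... Else ... End If. The positions are declared As Long here because a page longer than 32,767 characters would overflow an Integer.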

Kagey
Feb 14th, 2001, 05:16 PM
I found this around:

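Function StripHTMLTag(ByVal sText)
    ' Removes each <...> tag; any text between paired tags is kept
    Dim nOpen, nClose
    StripHTMLTag = ""
    nOpen = InStr(sText, "<")
    Do While nOpen > 0
        nClose = InStr(nOpen, sText, ">")
        If nClose = 0 Then Exit Do   ' unterminated tag: stop rather than loop forever
        ' Keep the text in front of the tag, then skip past the tag itself
        StripHTMLTag = StripHTMLTag & " " & Left(sText, nOpen - 1)
        sText = Mid(sText, nClose + 1)
        nOpen = InStr(sText, "<")
    Loop
    StripHTMLTag = StripHTMLTag & sText
End Function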


let me know how it goes.

Feb 15th, 2001, 11:21 AM
All that does is remove the single tags. If you have tags which have stuff between them, it doesn't remove that.
E.g.:
Script tags

<script>
function goback()
{
window.location = history.go(-1);
}
</script>

and Style tags

<style type = "text/css">
A:hover { color : #ff0000;
text-decoration : none }
</style>

and Applet tags

<applet codebase = "nav.class">
<param name="backcolor" value="red">
</applet>

There are lots of other tags that use this type of format. Many aren't used very often, but they do pop up, and more people are using style sheets and JavaScript to get the effects they want on their web pages.
If you want to view a page as text only, you don't want to see all the source code. Most inexperienced users would think the computer isn't working and would never use the program again.

plenderj
Feb 15th, 2001, 11:35 AM
Ah, for Christ's sake.
Just put everything into a string and do something like:


var_string = Left$(var_string, InStr(1, var_string, "<SCRIPT>", vbTextCompare) - 1) & " " & Mid$(var_string, InStr(1, var_string, "</SCRIPT>", vbTextCompare) + Len("</SCRIPT>"))
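
The same idea can be wrapped in a loop so that it copes with attributes on the opening tag, more than one block per page, and a missing closing tag. This is only a sketch under those assumptions; RemoveElement is a hypothetical name, and it does not handle a "</script>" that appears inside a script string or a tag name that merely starts with the given one (for example "<styles").

```vb
' Hypothetical helper (not from the thread): removes every <tag ...>...</tag>
' element, including its contents, from strHTML. Matching ignores case.
Private Function RemoveElement(ByVal strHTML As String, ByVal strTag As String) As String
    Dim lngOpen As Long
    Dim lngClose As Long
    Dim lngEnd As Long

    lngOpen = InStr(1, strHTML, "<" & strTag, vbTextCompare)
    Do While lngOpen > 0
        ' Look for the matching closing tag, e.g. "</script"
        lngClose = InStr(lngOpen, strHTML, "</" & strTag, vbTextCompare)
        If lngClose = 0 Then Exit Do        ' no closing tag: leave the rest alone
        lngEnd = InStr(lngClose, strHTML, ">")
        If lngEnd = 0 Then Exit Do          ' closing tag is never terminated
        ' Keep what comes before the element and what follows its closing ">"
        strHTML = Left$(strHTML, lngOpen - 1) & " " & Mid$(strHTML, lngEnd + 1)
        ' Continue searching from where the element used to start
        lngOpen = InStr(lngOpen, strHTML, "<" & strTag, vbTextCompare)
    Loop
    RemoveElement = strHTML
End Function
```

It would be called once per element type before the remaining tags are stripped, for example strHTML = RemoveElement(strHTML, "script"), then the same for "style" and "applet".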