-
Feb 10th, 2001, 05:40 AM
#1
Stripping...
Thought that title'd get some people's attention...
I need some help with this HTML-stripping code. I'm having trouble removing style sheets, applets, etc.
Can someone tell me where I'm going wrong with this code:
Code:
Private Sub StripIt()
    On Error Resume Next
    Dim strHTML As String
    Dim intPos As Integer
    Dim intPos2 As Integer
    Dim intPos3 As Integer
    Dim strTempText As String
    Dim strTempText2 As String
    Dim Temp As Integer
    strHTML = WebBrowser1.Document.documentelement.InnerHTML
    Do
        If InStr(1, strHTML, "<") And InStr(1, strHTML, ">") <> 0 Then
            intPos = InStr(1, strHTML, "<")
            strTempText = Left$(strHTML, intPos - 1)
            intPos2 = InStr(intPos, strHTML, ">")
            Select Case LCase$(Mid$(strHTML, intPos, 7))
                Case "<script", "<style>", "<applet"
                    intPos3 = InStr((intPos2 + 1), strHTML, ">")
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos3))
                    GoTo EndSelect
                Case Else
                    strTempText2 = Right$(strHTML, (Len(strHTML) - intPos2))
            End Select
EndSelect:
            strHTML = strTempText + strTempText2
        Else
            Exit Do
        End If
    Loop
    txtPage.Text = strHTML
End Sub
Thanks for that.
-
Feb 12th, 2001, 06:06 AM
#2
Retired VBF Adm1nistrator
Well, what errors are you getting?
- jamie
Microsoft MVP : Visual Developer - Visual Basic [2004-2005]
-
Feb 13th, 2001, 01:08 PM
#3
I don't get any errors. It just doesn't remove the stuff between the style tags.
For example, with:
Code:
<STYLE TYPE = "TEXT/CSS">
A:hover { color : #ff0000;
text-decoration : none }
</STYLE>
I get left with:
A:hover { color : #ff0000;
text-decoration : none }
-
Feb 14th, 2001, 03:03 AM
#4
Retired VBF Adm1nistrator
Well,
Code:
Case "<script", "<style>", "<applet"
should probably be:
Code:
Case "<script", "<style", "<applet"
Does that work?
- jamie
-
Feb 14th, 2001, 02:09 PM
#5
Nope. It removes script tags fine; it's just the style tags it chokes on for some reason. Even when there are no comments around the style rules (which I haven't accounted for yet), it still doesn't work.
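Why `"<style"` doesn't match either: `Mid$(strHTML, intPos, 7)` always returns seven characters, so `<STYLE TYPE = "TEXT/CSS">` produces `"<style "` (with a trailing space). That equals neither `"<style>"` nor the six-character `"<style"`, whereas `<script` and `<applet` are exactly seven characters long, which is why scripts were being removed. A minimal sketch of one fix, using a hypothetical helper `IsBlockTag` (not part of the original code), compares each tag name against a slice of its own length:

```vb
' Hypothetical helper: returns True when the tag starting at lngPos opens
' a <script>, <style> or <applet> block (case-insensitive).
Private Function IsBlockTag(ByVal strHTML As String, ByVal lngPos As Long) As Boolean
    Dim varName As Variant
    For Each varName In Array("<script", "<style", "<applet")
        ' Compare exactly as many characters as this tag name has,
        ' instead of a fixed 7 that only fits "<script" and "<applet"
        If LCase$(Mid$(strHTML, lngPos, Len(varName))) = varName Then
            IsBlockTag = True
            Exit Function
        End If
    Next varName
End Function
```

In `StripIt`, the `Select Case` could then become `If IsBlockTag(strHTML, intPos) Then ... Else ... End If`. Note that the original still ends the block at the next `>` after the opening tag, so a `-->` comment ending or a `>` inside a script will cut it short; searching for the matching closing tag is more robust.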
-
Feb 14th, 2001, 06:16 PM
#6
Hyperactive Member
Try this.
I found this somewhere:
Code:
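Function StripHTMLTag(ByVal sText As String) As String
    ' Removes every <...> tag but keeps the text between tags.
    ' Note: the contents of <script>, <style> and <applet> blocks are kept as text.
    Dim lngOpen As Long
    Dim lngClose As Long
    Dim strResult As String
    Do
        lngOpen = InStr(sText, "<")
        If lngOpen = 0 Then Exit Do
        ' Look for the ">" that closes this tag, not an earlier stray ">"
        lngClose = InStr(lngOpen, sText, ">")
        ' Unterminated tag: keep the rest as-is (avoids an endless loop)
        If lngClose = 0 Then Exit Do
        strResult = strResult & " " & Left$(sText, lngOpen - 1)
        sText = Mid$(sText, lngClose + 1)
    Loop
    StripHTMLTag = strResult & sText
End Function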
Let me know how it goes.
-
Feb 15th, 2001, 12:21 PM
#7
All that does is remove the individual tags. If a pair of tags has content between them, that content isn't removed.
For example, script tags:
Code:
<script>
function goback()
{
history.go(-1);
}
</script>
and style tags:
Code:
<style type = "text/css">
A:hover { color : #ff0000;
text-decoration : none }
</style>
and applet tags:
Code:
<applet code = "nav.class">
<param name="backcolor" value="red">
</applet>
Lots of other tags use this paired format. Many aren't used very often, but they do pop up, and more and more people are using style sheets and JavaScript to get the effects they want on their web pages.
If you want to view a page as text only, you don't want to see all of its source code; most inexperienced users would think the computer isn't working and would never use the program again.
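Handling these paired tags means deleting everything from the opening tag through the matching closing tag, not just the tags themselves. A minimal sketch, assuming a hypothetical function name `RemoveBlocks` and that blocks of the same tag are not nested, searches case-insensitively with `InStr(..., vbTextCompare)`:

```vb
' Hypothetical function: removes every <strTag ...> ... </strTag> block
' (case-insensitive). Assumes blocks of the same tag are not nested.
Public Function RemoveBlocks(ByVal strHTML As String, ByVal strTag As String) As String
    Dim lngStart As Long
    Dim lngEnd As Long
    Dim strClose As String
    strClose = "</" & strTag
    lngStart = InStr(1, strHTML, "<" & strTag, vbTextCompare)
    Do While lngStart > 0
        ' Find the closing tag, then the ">" that ends it
        lngEnd = InStr(lngStart, strHTML, strClose, vbTextCompare)
        If lngEnd > 0 Then lngEnd = InStr(lngEnd, strHTML, ">")
        ' No complete closing tag: leave the rest untouched
        If lngEnd = 0 Then Exit Do
        ' Replace the whole block with a space so surrounding words stay apart
        strHTML = Left$(strHTML, lngStart - 1) & " " & Mid$(strHTML, lngEnd + 1)
        lngStart = InStr(lngStart, strHTML, "<" & strTag, vbTextCompare)
    Loop
    RemoveBlocks = strHTML
End Function
```

Usage: call `strHTML = RemoveBlocks(strHTML, "script")`, then the same with `"style"` and `"applet"`, before stripping the remaining single tags (for example with `StripHTMLTag`). Because it looks for the closing tag rather than the next `>`, a `>` inside a script or a `-->` comment ending doesn't end the block early.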
-
Feb 15th, 2001, 12:35 PM
#8
Retired VBF Adm1nistrator
Ah, for Christ's sake.
Just put everything into a string and do something like:
Code:
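' Mid() = " " overwrites just one character; rebuild the string instead (assumes both tags exist)
var_string = Left$(var_string, InStr(1, var_string, "<SCRIPT", vbTextCompare) - 1) & " " & Mid$(var_string, InStr(1, var_string, "</SCRIPT>", vbTextCompare) + Len("</SCRIPT>"))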