
Thread: [RESOLVED] How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186


    Well, almost solved, though I can't get rid of the darn JavaScript. Here's what I've got so far:

    Add a reference to "Microsoft VBScript Regular Expressions 5.5".

    On the form:
    Place 2 text boxes, 1 Inet control, and 2 buttons.
    VB Code:
    Private Sub Command1_Click()
        Text1 = Inet1.OpenURL("http://www.yujunet.com/")
    End Sub

    Private Sub Command2_Click()
        Dim temp1 As String
        Dim temp2 As String
        Dim temp3 As String
        Dim newstring
        temp1 = RemoveLines(Text1)
        temp2 = RegExFind(temp1, "<script[^>]*>(.*)</script>")
        temp3 = RegExReplace(temp1, temp2, "")
        temp3 = RemoveHTML(temp3)
        Text2 = temp3
    End Sub

    In Module:
    VB Code:
    Function RemoveLines(myString As String)
        'convert multiline to single line string:
        myString = Replace(myString, vbTab, " ")   'removes Tabs
        myString = Replace(myString, Chr(13), " ")
        myString = Replace(myString, Chr(10), " ")
        myString = Replace(myString, vbCrLf, " ")
        myString = Replace(myString, vbNewLine, " ")
        RemoveLines = myString
    End Function

    Function RegExFind(myString As String, FindWhat As String)
        On Error Resume Next

        'Create objects.
        Dim objRegExp As RegExp
        Dim objMatch As Match
        Dim colMatches As MatchCollection
        Dim RetStr As String

        Set objRegExp = New RegExp
        objRegExp.Pattern = FindWhat
        objRegExp.IgnoreCase = True
        objRegExp.Global = True
        objRegExp.MultiLine = True
        If (objRegExp.Test(myString) = True) Then
            Set colMatches = objRegExp.Execute(myString)
            For Each objMatch In colMatches
                RetStr = objMatch.Value
            Next
        Else
            RetStr = "" 'No matches
        End If
        RegExFind = RetStr
    End Function

    Function RegExReplace(myString As String, FindThis As String, ReplaceWithThis As String)
        On Error Resume Next
        'search string for item and then replace with new item:
        Dim sourse1 As String, resourse As Object
        sourse1 = myString
        Set resourse = New RegExp
        resourse.Pattern = FindThis
        resourse.Global = True
        resourse.IgnoreCase = True
        If resourse.Test(sourse1) = True Then
            myString = resourse.Replace(sourse1, ReplaceWithThis)
        End If
        RegExReplace = myString
    End Function

    Function RemoveHTML(strText As String)
        Dim RegEx
        Set RegEx = New RegExp
        RegEx.Pattern = "<[^>]*>"
        RegEx.Global = True
        RegEx.IgnoreCase = True
        strText = Replace(strText, "&nbsp;", "")
        RemoveHTML = RegEx.Replace(strText, "")
    End Function

    Any suggestions would really help

  2. #2
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    What do you want as a result: all the HTML, or just the body?
    JPnyc rocks!! (Just ask him!)
    If u have your answer please go to the thread tools and click "Mark Thread Resolved"

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    I need to get rid of HTML and JavaScript.

  4. #4
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    So what do you want? Just the text of the site?

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Yup.

  6. #6
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Add the WebBrowser control to your project:

    VB Code:
    Private Sub Form_Load()
        WebBrowser1.Navigate "http://www.yujunet.com/"
    End Sub

    Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        If (pDisp Is WebBrowser1.Application) Then
            Debug.Print WebBrowser1.Document.documentElement.innerText
        End If
    End Sub
    That's it.

  7. #7

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Thanks Static, but I don't want to use WebBrowser, as it loads too many items that are useless to me. That's why I wanted to use Inet.

  8. #8
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Method 2:

    Add a reference to the Microsoft HTML Object Library.
    Remove the WebBrowser control.

    VB Code:
    Dim HTML As New HTMLDocument
    Dim DOC As HTMLDocument
    Set DOC = HTML.createDocumentFromUrl("http://www.yujunet.com/", vbNullString)
    Do While DOC.ReadyState <> "complete"
        DoEvents
    Loop
    Debug.Print DOC.documentElement.innerText

  9. #9

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Here's the alternative that I'm working on, though it's buggy too:
    http://www.vbforums.com/showthread.php?t=412337

  10. #10

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Trying something else ... though still unsuccessful:

    VB Code:
    'Use the same module functions as in the first post.

    Private Sub Command1_Click()
        Dim i As String
        i = Inet1.OpenURL("http://www.yujunet.com/")
        Text1 = RemoveLines(i)
    End Sub

    Private Sub Command2_Click()
        Dim i As String
        i = RemoveSpaces(Text1)
        Text2 = Trim$(i)
    End Sub

    Private Sub Command3_Click()
        'this one finds the tags, but it finds the first <script[^>]*> and the last </script>, while I need to find EVERY match.
        Text3 = RegExFind(Text2, "<script[^>]*>(.*)</script>")
    End Sub

    Private Sub Command4_Click()
        sArray = Split(sText, "<script[^>]*>")
        For i = 0 To Len(Text2)
            iPoe = InStr(sArray(i), "</script>")
            If iPoe Then
                iPor = "<script[^>]*>" & Mid$(sArray(i), 1, (iPoe - 1)) & "</script>"
                Text4 = Trim$(Replace(sText, iPor, " "))
            End If
        Next i
    End Sub

    Any help?
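
    The comment in Command3_Click points at two separate things: (.*) is greedy, so on a single-line string it runs from the first <script> to the last </script>, and RegExFind overwrites RetStr on every pass of its loop, so only the last match is returned. A minimal sketch of one way around both, assuming the same "Microsoft VBScript Regular Expressions 5.5" reference. RegExFindAll is a hypothetical name, not one of the module functions above:

    ```vb
    ' Hypothetical helper: returns every match, one per line.
    Function RegExFindAll(ByVal myString As String, ByVal FindWhat As String) As String
        Dim re As RegExp
        Dim m As Match
        Dim result As String

        Set re = New RegExp
        re.Pattern = FindWhat
        re.IgnoreCase = True
        re.Global = True            ' without this, Execute stops after the first match

        ' Append each match instead of overwriting the previous one
        For Each m In re.Execute(myString)
            result = result & m.Value & vbCrLf
        Next
        RegExFindAll = result
    End Function
    ```

    It would be called with a lazy pattern, for example Text3 = RegExFindAll(Text2, "<script[^>]*>[\s\S]*?</script>"). The *? stops at the nearest </script> rather than the last one, and [\s\S] also matches across line breaks.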

  11. #11

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Well, looks like I've solved the puzzle. It strips HTML, JavaScript, CSS, and comment tags from an HTML file and leaves just the text. It's something similar to WebBrowser1.Document.documentElement.innerText, but using RegEx:

    Form:
    VB Code:
    Private Sub Command1_Click()
        Dim i As String
        i = Inet1.OpenURL("http://www.yujunet.com/")
        i = RemoveLines(i)
        i = RegExReplace(i, "<style[^>]*>[\s\S]*?</style>", " ")
        i = RegExReplace(i, "<script[^>]*>[\s\S]*?</script>", " ")
        i = RegExReplace(i, "<!--[\s\S]*?-->", " ")
        i = RegExReplace(i, "<[^>]*>", " ")
        i = RegExReplace(i, "&nbsp;", " ")
        i = RegExReplace(i, "&amp;", " ")
        i = RemoveSpaces(i)
        Text1 = Trim$(i)
    End Sub

    Module:
    VB Code:
    Function RegExReplace(myString As String, FindThis As String, ReplaceWithThis As String)
        On Error Resume Next
        'search string for item and then replace with new item:
        Dim sourse1 As String, resourse As Object
        sourse1 = myString
        Set resourse = New RegExp
        resourse.Pattern = FindThis
        resourse.Global = True
        resourse.IgnoreCase = True
        If resourse.Test(sourse1) = True Then
            myString = resourse.Replace(sourse1, ReplaceWithThis)
        End If
        RegExReplace = myString
    End Function

    Function RemoveSpaces(myString As String)
        Do Until InStr(1, myString, "  ") = 0
            myString = Replace(Replace(myString, "  ", " "), "  ", " ")
        Loop
        RemoveSpaces = myString
    End Function

    Function RemoveLines(myString As String)
        'convert multiline to single line string:
        myString = Replace(myString, vbTab, " ")   'removes Tabs
        myString = Replace(myString, Chr(13), " ") '   vbNullString
        myString = Replace(myString, Chr(10), " ")
        myString = Replace(myString, vbCrLf, " ")
        myString = Replace(myString, vbNewLine, " ")
        RemoveLines = myString
    End Function

    It works fine, though any improvement suggestions are really appreciated
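
    One possible tidy-up, offered as a sketch rather than a tested replacement: the same patterns can be run through a single RegExp object in a loop. StripHtml is a hypothetical name. It assumes the "Microsoft VBScript Regular Expressions 5.5" reference, folds the whitespace clean-up into a final \s+ pattern, and decodes &amp; to & instead of replacing it with a space, which is a deliberate difference from the code above:

    ```vb
    ' Hypothetical helper: strips style, script, comment and tag markup from html.
    Function StripHtml(ByVal html As String) As String
        Dim re As RegExp
        Dim patterns As Variant
        Dim i As Long

        ' Order matters: whole style/script/comment blocks go before bare tags,
        ' and whitespace is collapsed last.
        patterns = Array("<style[^>]*>[\s\S]*?</style>", _
                         "<script[^>]*>[\s\S]*?</script>", _
                         "<!--[\s\S]*?-->", _
                         "<[^>]*>", _
                         "&nbsp;", _
                         "\s+")

        Set re = New RegExp
        re.Global = True
        re.IgnoreCase = True

        For i = LBound(patterns) To UBound(patterns)
            re.Pattern = patterns(i)
            html = re.Replace(html, " ")
        Next i

        ' Assumption: a literal ampersand is wanted in the output text
        StripHtml = Trim$(Replace(html, "&amp;", "&"))
    End Function
    ```

    Usage would be Text1 = StripHtml(Inet1.OpenURL("http://www.yujunet.com/")). Because [\s\S] and \s match line breaks and tabs, no separate RemoveLines or RemoveSpaces pass is needed in this version.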

  12. #12
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Looks good to me. Nice work!

  13. #13
    PowerPoster
    Join Date
    May 2006
    Posts
    2,988

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost sol

    And without regular expressions ..

    Added removal of extra characters, special symbols, single letters, digits, and common words. It also upper-cases the first letter of each word.

    VB Code:
    Option Explicit

    Private Sub Command1_Click()
        Dim i As String
        i = Inet1.OpenURL("http://www.yujunet.com/")
        If Len(i) Then
            i = RemoveLines(i)
            i = RemoveTags(i, "<style", "</style>")
            i = RemoveTags(i, "<script", "</script>")
            i = RemoveTags(i, "<!--", "-->")
            i = RemoveTags(i, "<", ">")
            i = RemoveTags(i, "&#", ";")  ' SPECIAL SYMBOLS
            i = RemoveChars(i, "&nbsp#&amp;#&quot#&gt;#&lt;#[#]#""#;#:#.#,#'#/#$#%#?#!#|#(#)#=#-#+#&#*#©#®")
            i = RemoveDigits(i, "0 1 2 3 4 5 6 7 8 9")
            i = RemoveCommon(i, "a b c d e f g h i j k l m n o p q r s t u v w x y z")
            i = RemoveCommon(i, "at and com is or of to that this then the was what with where who when")
            i = RemoveMultiple(i, "  ")   ' GET RID OF MULTIPLE SPACES
            i = StrConv(i, vbProperCase)  ' UPPER CASE FIRST LETTER
            Text1 = Trim$(i)
        End If
    End Sub

    Private Function RemoveTags(ByVal myString As String, _
        start As String, finish As String) As String
        ' Long rather than Integer: pages longer than 32,767 characters
        ' would otherwise overflow the position and index variables
        Dim sArray() As String, i As Long
        Dim iPor As String, iPoe As Long
        sArray = Split(myString, start, , 3)                            ' SPLIT BY TAG START
        For i = 0 To UBound(sArray)                                     ' LOOP THROUGH
            iPoe = InStr(1, sArray(i), finish, 3)                       ' GET REPLACE LENGTH
            If iPoe Then                                                ' IF EXISTS IN TEXT
                iPor = start & Mid$(sArray(i), 1, (iPoe - 1)) & finish  ' OUR REPLACE STRING
                myString = Trim$(Replace(myString, iPor, " ", , , 3))   ' REPLACE IN TEXT
            End If
        Next i                                                          ' NEXT TAG START
        RemoveTags = myString
    End Function

    Private Function RemoveCommon(ByVal myString As String, _
        myVal As String) As String
        Dim sArray() As String, i As Integer
        sArray = Split(myVal)
        For i = 0 To UBound(sArray)
            Do While (InStr(1, " " & myString & " ", " " & sArray(i) & " ", 3))
                myString = Replace(" " & myString & " ", " " & sArray(i) & " ", " ", , , 3)
            Loop
        Next
        RemoveCommon = myString
    End Function

    Private Function RemoveDigits(ByVal myString As String, _
        myVal As String) As String
        Dim sArray() As String, i As Integer
        sArray = Split(myVal)
        For i = 0 To UBound(sArray)
            Do While (InStr(myString, sArray(i)))
                myString = Replace(myString, sArray(i), " ")
            Loop
        Next
        RemoveDigits = myString
    End Function

    Private Function RemoveChars(ByVal myString As String, _
        myVal As String) As String
        Dim sArray() As String, i As Integer
        sArray = Split(myVal, "#")
        For i = 0 To UBound(sArray)
            myString = Replace(myString, sArray(i), " ", , , 3)
        Next i
        myString = Replace(myString, "#", " ")
        RemoveChars = myString
    End Function

    Private Function RemoveMultiple(ByVal myString As String, _
        myVal As String) As String
        Do While (InStr(myString, myVal))
            myString = Replace(myString, myVal, " ", , , 3)
        Loop
        RemoveMultiple = myString
    End Function

    Private Function RemoveLines(ByVal myString As String) As String
        myString = Replace(myString, vbTab, " ")
        myString = Replace(myString, Chr(13), " ")
        myString = Replace(myString, Chr(10), " ")
        RemoveLines = myString
    End Function
    Last edited by rory; Jun 20th, 2006 at 06:17 PM.

  14. #14

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Wow! Thanks guys. Now we're getting somewhere.

  15. #15
    PowerPoster
    Join Date
    May 2006
    Posts
    2,988

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost sol

    Updated .. try it now, e.g. on Yahoo or some other page with a ton of text.

    Basically I turned RemoveSpaces into a general RemoveMultiple function, so you can remove "..." as well as extra spaces, or any other run of repeated characters.

    In the case of the period, you want to keep it in something like "$200.00",
    but not in "End of Sentence."

    You probably also want to replace the commas, but not in "$200,000.00".

    Hope it helps ..

    If you want to get rid of numbers etc., you'll need to add a function for that, or let us know.
    Last edited by rory; Jun 20th, 2006 at 04:57 PM.

  16. #16
    PowerPoster
    Join Date
    Feb 2006
    Location
    East of NYC, USA
    Posts
    5,691

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost sol

    Quote Originally Posted by foxter
    module:
    VB Code:
    Function RemoveLines(myString As String)
        'convert multiline to single line string:
        myString = Replace(myString, vbTab, " ")   'removes Tabs
        myString = Replace(myString, Chr(13), " ") '   vbNullString
        myString = Replace(myString, Chr(10), " ")
        myString = Replace(myString, vbCrLf, " ")
        myString = Replace(myString, vbNewLine, " ")
        RemoveLines = myString
    End Function
    Once you've removed all occurrences of Chr(13) and Chr(10), there are no occurrences of vbCrLf or vbNewLine - you've removed them. vbCr is Chr(13), vbLf is Chr(10) and vbNewLine is vbCr & vbLf.
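
    A small sketch that checks this in the Immediate window, using only built-in constants. LineBreakConstantsDemo and the sample string are made up for illustration:

    ```vb
    Sub LineBreakConstantsDemo()
        ' vbCrLf is just the CR and LF characters side by side
        Debug.Assert vbCr = Chr(13)
        Debug.Assert vbLf = Chr(10)
        Debug.Assert vbCrLf = Chr(13) & Chr(10)

        ' After both single characters are replaced, no CR+LF pair can remain,
        ' so a later Replace on vbCrLf or vbNewLine has nothing left to find
        Dim s As String
        s = "line one" & vbCrLf & "line two"
        s = Replace(s, Chr(13), " ")
        s = Replace(s, Chr(10), " ")
        Debug.Assert InStr(s, vbCrLf) = 0
        Debug.Print s
    End Sub
    ```

    Each CR+LF pair becomes two spaces this way, which is why a RemoveSpaces pass afterwards is still useful.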
    The most difficult part of developing a program is understanding the problem.
    The second most difficult part is deciding how you're going to solve the problem.
    Actually writing the program (translating your solution into some computer language) is the easiest part.

    Please indent your code and use [HIGHLIGHT="VB"] [/HIGHLIGHT] tags around it to make it easier to read.

    Please Help Us To Save Ana

  17. #17
    I'm about to be a PowerPoster!
    Join Date
    Jan 2005
    Location
    Everywhere
    Posts
    13,647

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    I think it would be quicker to use the HTML library and the innerText property like Static originally suggested. Reference against the Microsoft HTML Object Library (or whatever it's called) rather than the Webbrowser control. I agree that using a control for this is not really appropriate but it doesn't mean you should shut yourself out from taking advantage of an already present routine which is likely to be more efficient and powerful.

  18. #18
    PowerPoster
    Join Date
    May 2006
    Posts
    2,988

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost sol

    Other methods to get the text: API, MSXML, HTML Object Library (already suggested), Winsock (won't show that here).


    API (no controls needed):

    sText = SendAPIRequest("http://www.mywebsitelink.com")

    VB Code:
    Option Explicit

    Private Const STRING_SIZE = 128
    Private Const INTERNET_OPEN_TYPE_DIRECT = 1
    Private Const INTERNET_FLAG_NO_CACHE_WRITE = &H4000000

    Private Declare Function InternetOpen Lib "wininet" Alias "InternetOpenA" _
    (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, _
    ByVal sProxyBypass As String, ByVal lFlags As Long) As Long

    ' The handle itself is passed, so the parameter is ByVal
    Private Declare Function InternetCloseHandle Lib "wininet" (ByVal hInet As Long) As Long

    Private Declare Function InternetReadFile Lib "wininet" _
    (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Long

    Private Declare Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" _
    (ByVal hInternetSession As Long, ByVal lpszUrl As String, ByVal lpszHeaders As String, _
    ByVal dwHeadersLength As Long, ByVal dwFlags As Long, ByVal dwContext As Long) As Long

    '// GET TEXT FROM WEB PAGE **** USING API
    Private Function SendAPIRequest(ByVal strUrl As String) As String
        Dim hOpen As Long, hFile As Long
        Dim Ret As Long, sBuffer As String * 128
        Dim sData As String
        hOpen = InternetOpen("VB Program", INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)
        If hOpen = 0 Then
            MsgBox "Error opening Internet connection"
            Exit Function
        End If
        hFile = InternetOpenUrl(hOpen, strUrl, vbNullString, 0, INTERNET_FLAG_NO_CACHE_WRITE, 0)
        If hFile = 0 Then
            MsgBox "Error opening Web page"
        Else
            ' Read until the call fails or reports zero bytes (end of data)
            Do
                If InternetReadFile(hFile, sBuffer, STRING_SIZE, Ret) = 0 Then Exit Do
                If Ret = 0 Then Exit Do
                sData = sData & Left$(sBuffer, Ret)   ' keep only the bytes actually read
            Loop
            InternetCloseHandle hFile
        End If
        InternetCloseHandle hOpen
        SendAPIRequest = sData
    End Function


    MSXML: Reference Microsoft XML, version 2.0 (or above if your server supports it - 4.0 suggested)

    sText = SendRequest("http://www.mywebsitelink.com")

    VB Code:
    Option Explicit

    Private Function SendRequest(ByVal strUrl As String) _
        As String
        On Error Resume Next
        Dim objHTTP As New MSXML.XMLHTTPRequest                         ' CREATE OBJECT
        objHTTP.Open "GET", strUrl, False                               ' START REQUEST
        objHTTP.setRequestHeader "Content-Type", "text/html"
        If Err = 0 Then                                                 ' NO ERRORS
            objHTTP.send                                                ' SEND REQUEST
            SendRequest = objHTTP.responseText                          ' GET TEXT
        Else
            MsgBox "Error " & Err.Number & _
            vbNewLine & Err.Description
        End If
    End Function

    And the one that was posted above: the HTML Object Library.
    Reference the Microsoft HTML Object Library.

    In this case, as shown by Static, it already strips all the tags,
    though you would still need to clean up the text.

    sText = getHTMLDocument("http://www.mywebsitelink.com")

    VB Code:
    Option Explicit

    Private Function getHTMLDocument(ByVal strUrl As String) As String
        Dim HTML As New HTMLDocument
        Dim DOC As HTMLDocument
        Set DOC = HTML.createDocumentFromUrl(strUrl, vbNullString)
        Do While DOC.ReadyState <> "complete"
            DoEvents
        Loop
        getHTMLDocument = DOC.documentElement.innerText
    End Function
    Last edited by rory; Jun 21st, 2006 at 01:24 AM.

  19. #19
    PowerPoster
    Join Date
    May 2006
    Posts
    2,988

    Re: How do I remove JavaScript from HTML source using Regular Expressions. Almost sol

    Quote Originally Posted by penagate
    I think it would be quicker to use the HTML library and the innerText property like Static originally suggested. Reference against the Microsoft HTML Object Library (or whatever it's called) rather than the Webbrowser control. I agree that using a control for this is not really appropriate but it doesn't mean you should shut yourself out from taking advantage of an already present routine which is likely to be more efficient and powerful.
    Agreed, didn't even know that control existed .. :-)

  20. #20
    Member
    Join Date
    Jun 2006
    Posts
    41

    Re: [RESOLVED] How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    I have a similar task to read text from a FAQ webpage and store the Q/A pairs somehow. They will then all be written to an aiml output file in the format below

    <aiml>
    <category>
    <pattern>WHAT ARE YOU</pattern>
    <template>
    I am the latest result in artificial intelligence,
    which can reproduce the capabilities of the human brain
    with greater speed and accuracy.
    </template>
    </category>
    .
    .
    .
    </aiml>
    see -- http://www.alicebot.org/aiml.html

    The use of the WebBrowser control is pretty cool. Any comments?

  21. #21

    Thread Starter
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: [RESOLVED] How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    In the code from my first post:
    VB Code:
    temp2 = RegExFind(temp1, "<script[^>]*>(.*)</script>")
    change to:
    VB Code:
    temp2 = RegExFind(temp1, "<aiml>(.*)</aiml>")
    and then search for the text within the <template> tags in that string:
    VB Code:
    myNewString = RegExFind(temp2, "<template>(.*)</template>")

  22. #22
    Member
    Join Date
    Jun 2006
    Posts
    41

    Re: [RESOLVED] How do I remove JavaScript from HTML source using Regular Expressions. Almost solved.

    Actually, after reading the text from a FAQ webpage, the program will then output an AIML file complete with the <aiml>...</aiml> tags. I'm not trying to strip text from AIML pages at all. Is this your understanding, foxter? It seems you're recommending how to strip text from an AIML file.
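
    For the output side described here, a minimal sketch of writing question/answer pairs in the format from post #20, using plain VB6 file I/O. WriteAiml and its parameters are hypothetical names, the two arrays are assumed to be the same size, and no XML escaping of &, < or > is done:

    ```vb
    ' Hypothetical helper: writes one <category> per question/answer pair.
    Sub WriteAiml(ByVal fileName As String, questions() As String, answers() As String)
        Dim f As Integer
        Dim i As Long

        f = FreeFile
        Open fileName For Output As #f
        Print #f, "<aiml>"
        For i = LBound(questions) To UBound(questions)
            Print #f, "<category>"
            ' The sample in post #20 shows the pattern in upper case
            Print #f, "<pattern>" & UCase$(questions(i)) & "</pattern>"
            Print #f, "<template>"
            Print #f, answers(i)
            Print #f, "</template>"
            Print #f, "</category>"
        Next i
        Print #f, "</aiml>"
        Close #f
    End Sub
    ```

    How the Q/A pairs get into the two arrays depends on the layout of the FAQ page, which this sketch does not address.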
