I am having lots of trouble trying to figure out how to make it work. I was wondering if you know of a good tutorial or can make this work somehow? I attached my work so far. Thanks.
I use VB .NET 2022. Currently developing StudyX educational software, PlazSales POS system and Yargis a space ship shooter game.
Very few people would download a zip file just to have a look at your code... You'll probably get more response if you just post your code directly on the thread instead of having it as a download.
Sure, here is my code:
Code:
Imports System.IO
Imports System.Net
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Public Class Form1
Inherits System.Windows.Forms.Form
#Region " Windows Form Designer generated code "
Public Sub New()
MyBase.New()
'This call is required by the Windows Form Designer.
InitializeComponent()
'Add any initialization after the InitializeComponent() call
End Sub
'Form overrides dispose to clean up the component list.
Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
If disposing Then
If Not (components Is Nothing) Then
components.Dispose()
End If
End If
MyBase.Dispose(disposing)
End Sub
'Required by the Windows Form Designer
Private components As System.ComponentModel.IContainer
'NOTE: The following procedure is required by the Windows Form Designer
'It can be modified using the Windows Form Designer.
'Do not modify it using the code editor.
Friend WithEvents txtHTMLContent As System.Windows.Forms.TextBox
<System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
Me.txtHTMLContent = New System.Windows.Forms.TextBox
Me.SuspendLayout()
'
'txtHTMLContent
'
Me.txtHTMLContent.Location = New System.Drawing.Point(112, 64)
Me.txtHTMLContent.Name = "txtHTMLContent"
Me.txtHTMLContent.TabIndex = 0
Me.txtHTMLContent.Text = "TextBox1"
'
'Form1
'
Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
Me.ClientSize = New System.Drawing.Size(292, 266)
Me.Controls.Add(Me.txtHTMLContent)
Me.Name = "Form1"
Me.Text = "Form1"
Me.ResumeLayout(False)
End Sub
#End Region
'//////////////////////////////////////////////////////////////////////////////
Private objParser As HTMLContentParser
Private Sub cmdGetHTML_ServerClick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdGetHTML.ServerClick
Dim sURL As String = "http://" & txtURL.Value
txtHTMLContent.EnableViewState = False
txtHTMLContent.Value = objParser.Return_HTMLContent(sURL)
End Sub
Private Sub cmdParse_ServerClick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdParse.ServerClick
Call PopulatetblParsedContent()
End Sub
Private Sub PopulatetblParsedContent() 'Populate Links Table
Dim sURL As String = "http://" & txtURL.Value
Dim myAnchor As HtmlAnchor
Dim intRows As Integer
Dim intRowCount As Integer
Dim objRow As HtmlTableRow
Dim objCell As HtmlTableCell
Dim sLinks As String
Dim sImage As String
Dim lstLinks As ArrayList = objParser.ParseHTMLLinks(txtHTMLContent.Value, sURL)
Dim lstImages As ArrayList = objParser.ParseHTMLImages(txtHTMLContent.Value, sURL)
tblParsedContent = Me.tblParsedContent
tblParsedContent.EnableViewState = False
For Each sLinks In lstLinks
objRow = New HtmlTableRow
objCell = New HtmlTableCell
myAnchor = New HtmlAnchor
myAnchor.Target = "_blank"
myAnchor.InnerText = "Link: " & sLinks.ToString
myAnchor.HRef = sLinks.ToString
objCell.NoWrap = False
objCell.Controls.Add(myAnchor)
objRow.Cells.Add(objCell)
tblParsedContent.Rows.Add(objRow)
Next
For Each sImage In lstImages
objRow = New HtmlTableRow
objCell = New HtmlTableCell
myAnchor = New HtmlAnchor
myAnchor.Target = "_blank"
myAnchor.InnerText = "Img: " & sImage.ToString
myAnchor.HRef = sImage.ToString
objCell.NoWrap = False
objCell.Controls.Add(myAnchor)
objRow.Cells.Add(objCell)
tblParsedContent.Rows.Add(objRow)
Next
End Sub
'/////////////////////////////////////////////////////////////////////////////////
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
End Sub
End Class
'''''''''''''''''''''''''''''''''''''''''''''''''''''
Imports System.IO
Imports System.Net
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Public Class HTMLContentParser
Function Return_HTMLContent(ByVal sURL As String)
Dim sStream As Stream
Dim URLReq As HttpWebRequest
Dim URLRes As HttpWebResponse
Try
URLReq = WebRequest.Create(sURL)
URLRes = URLReq.GetResponse()
sStream = URLRes.GetResponseStream()
Return New StreamReader(sStream).ReadToEnd()
Catch ex As Exception
Return ex.Message
End Try
End Function
Function ParseHTMLLinks(ByVal sHTMLContent As String, ByVal sURL As String) As ArrayList
Dim rRegEx As Regex
Dim mMatch As Match
Dim aMatch As New ArrayList()
rRegEx = New Regex("a.*href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
RegexOptions.IgnoreCase Or RegexOptions.Compiled)
mMatch = rRegEx.Match(sHTMLContent)
While mMatch.Success
Dim sMatch As String
sMatch = ProcessURL(mMatch.Groups(1).ToString, sURL)
aMatch.Add(sMatch)
mMatch = mMatch.NextMatch()
End While
Return aMatch
End Function
Function ParseHTMLImages(ByVal sHTMLContent As String, ByVal sURL As String) As ArrayList
Dim rRegEx As Regex
Dim mMatch As Match
Dim aMatch As New ArrayList()
rRegEx = New Regex("img.*src\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
RegexOptions.IgnoreCase Or RegexOptions.Compiled)
mMatch = rRegEx.Match(sHTMLContent)
While mMatch.Success
Dim sMatch As String
sMatch = ProcessURL(mMatch.Groups(1).ToString, sURL)
aMatch.Add(sMatch)
mMatch = mMatch.NextMatch()
End While
Return aMatch
End Function
Private Function ProcessURL(ByVal sInput As String, ByVal sURL As String)
'Find out if the sURL has a "/" after the Domain Name
'If not, give a "/" at the end
'First, check out for any slash after the
'Double Dashes of the http://
'If there is NO slash, then end the sURL string with a SLASH
If InStr(8, sURL, "/") = 0 Then
sURL += "/"
End If
'FILTERING
'Filter down to the Domain Name Directory from the Right
Dim iCount As Integer
For iCount = sURL.Length To 1 Step -1
If Mid(sURL, iCount, 1) = "/" Then
sURL = Left(sURL, iCount)
Exit For
End If
Next
'Filter out the "&gt;" from the Left
For iCount = 1 To sInput.Length
If Mid(sInput, iCount, 4) = "&gt;" Then
sInput = Left(sInput, iCount - 1) 'Stop and Take the Char before
Exit For
End If
Next
'Filter out unnecessary Characters
sInput = sInput.Replace("&lt;", Chr(39))
sInput = sInput.Replace("&gt;", Chr(39))
sInput = sInput.Replace("&quot;", "")
sInput = sInput.Replace("'", "")
If (sInput.IndexOf("http://") < 0) Then
If (Not (sInput.StartsWith("/")) And Not (sURL.EndsWith("/"))) Then
Return sURL & "/" & sInput
Else
If (sInput.StartsWith("/")) And (sURL.EndsWith("/")) Then
Return sURL.Substring(0, sURL.Length - 1) + sInput
Else
Return sURL + sInput
End If
End If
Else
Return sInput
End If
End Function
End Class
Originally Posted by stanav
What exactly are you trying to parse out from a web page's source code?
Well, lots of things. I would like to start with links and work my way up to images and portions of the text. I have many uses for this, and am very excited to make it work, but I am pretty new at VB .NET and having lots of trouble finding help compared to regular VB.
If so, then you should try using a WebBrowser object to navigate to your desired URL and handle the WebBrowser.DocumentCompleted event. In this event handler, you read WB.Document and work with it to get what you need. My suggestion is to go to the MSDN library and read up on the documentation for the WebBrowser class, especially its members.
This is an overly simplified example of using a WebBrowser to parse all the links and append them to a RichTextBox. Add a textbox, a button and a richtextbox to your form, then paste the code below into your code page.
vb Code:
Private WithEvents WB As New WebBrowser
Private Sub WB_DocumentCompleted(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WB.DocumentCompleted
Dim lnkCollection As HtmlElementCollection = WB.Document.Links
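The snippet above is cut off after the first line of the handler. A complete version might look like the sketch below. It assumes .NET 2.0 or later, since the WebBrowser control does not exist in earlier versions. The class name LinkListForm and the control names txtUrl, btnGo and rtbLinks are made up for illustration, and the form builds its controls in code so nothing has to be added in the designer.

```vb
Imports System
Imports System.Windows.Forms

Public Class LinkListForm
    Inherits Form

    ' Illustrative control names; any TextBox, Button and RichTextBox will do.
    Private txtUrl As New TextBox()
    Private WithEvents btnGo As New Button()
    Private rtbLinks As New RichTextBox()
    ' The browser is never added to the form; it is only used to load the page.
    Private WithEvents WB As New WebBrowser()

    Public Sub New()
        txtUrl.Dock = DockStyle.Top
        btnGo.Dock = DockStyle.Top
        btnGo.Text = "Get links"
        rtbLinks.Dock = DockStyle.Fill
        ' The fill-docked control is added first so it takes whatever
        ' space the top-docked controls leave.
        Controls.Add(rtbLinks)
        Controls.Add(btnGo)
        Controls.Add(txtUrl)
    End Sub

    Private Sub btnGo_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnGo.Click
        ' Start loading the page; parsing happens in DocumentCompleted.
        WB.Navigate(txtUrl.Text)
    End Sub

    Private Sub WB_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WB.DocumentCompleted
        ' List the href of every link in the loaded document.
        rtbLinks.Clear()
        For Each lnk As HtmlElement In WB.Document.Links
            rtbLinks.AppendText(lnk.GetAttribute("href") & Environment.NewLine)
        Next
    End Sub
End Class
```

Set the form as the startup form, type a full URL (including http://) into the textbox and click the button. DocumentCompleted can fire more than once on pages with frames, so the list shows the links of whichever document finished last.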
That looks very cool. I did a bunch of research and found out that I need to add the "Microsoft Web Browser" component from the COM objects. I did this, but I am still getting this error:
C:\Reports\Jeff\VB\HTMLreasearcher\Form1.vb(88): Type 'WebBrowser' is not defined.
Here is my code:
Code:
Private WithEvents WB As New WebBrowser
(please see last post, I still need this answered)
I found this code (http://msdn2.microsoft.com/en-us/lib...er(VS.80).aspx), but I am not sure how to use it. I tried putting it under the imports and I tried putting it inside the class. Also, it keeps telling me I need another '>'.
Also, should I just upgrade to 2005, or whatever the newest is? It seems like it supports more, and more people use it.
Code:
<ComVisibleAttribute(True)> _
<ClassInterfaceAttribute(ClassInterfaceType.AutoDispatch)> _
Public Class WebBrowser
Inherits WebBrowserBase
Last edited by rex64; Sep 12th, 2007 at 09:50 PM.
If you can upgrade to 2005, I highly recommend it. In the meantime, you can download VB.Net 2005 Express for free and try it out.
That aside, for .NET 1.1, you can use a WebClient to download the web page, then load it into an IHTMLDocument2 object. Once this is done, the rest is the same.
Here's what to do:
1. Add a reference to "Microsoft HTML Object Library" (it's in the COM tab)
2. Add a textbox, a richtextbox and a button to your form
3. Paste the following code
Code:
Imports mshtml
Imports System.Net
Private WithEvents WC As New WebClient
Private htmlDoc As IHTMLDocument2 = New mshtml.HTMLDocument
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
'First use the web client to download data and convert it to html string
Dim url As String = Me.TextBox1.Text
Dim B() As Byte = WC.DownloadData(url)
Dim html As String = System.Text.Encoding.Default.GetString(B)
'Next we use the COM IHTMLDocument2 interface to load the html string
htmlDoc.clear()
htmlDoc.write(html)
htmlDoc.close()
'Make sure that the document is fully loaded
While (htmlDoc.readyState <> "complete")
System.Threading.Thread.Sleep(1000)
Application.DoEvents()
End While
'Now we're ready to parse the document to get the links
Me.RichTextBox1.Clear()
Dim lnkCollection As IHTMLElementCollection = htmlDoc.links
For Each lnk As IHTMLElement In lnkCollection
Me.RichTextBox1.AppendText(CStr(lnk.getAttribute("href")) & ControlChars.NewLine)
Next
End Sub
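The same document object can list images too, which was the next thing on the wish list. This is only a rough sketch, assuming the document has already been loaded as in Button1_Click above. ImageLister and GetImageSources are made-up names, not part of mshtml.

```vb
Imports System.Collections
Imports mshtml

Public Module ImageLister
    ' Hypothetical helper: returns the src attribute of every <img> in the document.
    Public Function GetImageSources(ByVal doc As IHTMLDocument2) As ArrayList
        Dim sources As New ArrayList()
        ' doc.images is the COM collection of all IMG elements in the loaded page.
        For Each img As IHTMLElement In doc.images
            Dim src As Object = img.getAttribute("src")
            ' Skip anything that did not come back as a string.
            If TypeOf src Is String Then
                sources.Add(CStr(src))
            End If
        Next
        Return sources
    End Function
End Module
```

To use it from the click handler, loop over ImageLister.GetImageSources(htmlDoc) and append each string to RichTextBox1 the same way the links are appended.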
Ok, I got that working. I am using Visual Studio 2005 now. My problem now is that I would also like to be able to post forms and get data back. For example, I would like to see the results from this form: http://www.studyx.com/contactform.html