Thread: [RESOLVED] Using AxWebBrowser and DOM to extract data

  1. #1

    Thread Starter
    New Member
    Join Date
    Sep 2005
    Posts
    2

    [RESOLVED] Using AxWebBrowser and DOM to extract data

    I am attempting to extract some data from various tables in a web page. I've decided to use the WebBrowser control and the DOM to get to the data. I'm struggling to find the correct data types for accessing the results of the DOM methods.

    The following code snippet is intended to simply print the number of tables on a web page, and then the size of each table. The first print works, but none of the following ones do (the exception gets thrown).

    I believe the problem is that objTable is simply an object, not a collection of objects.

    VB Code:
    Private Sub DumpTables()
       Dim x As Integer
       Dim i, j, k As Integer
       Dim s As String
       Dim objTable As Object
       Dim ObjDoc As Object

       ' wbrBrowser is the WebBrowser Control on the main form.
       ObjDoc = wbrBrowser.Document

       Try
          objTable = wbrBrowser.Document.getElementsByTagName("TABLE")
          x = objTable.length
          Debug.WriteLine("Number of Tables " & x)


          For i = 0 To x - 1
             j = objTable(0).length
             Debug.WriteLine("Table: all: " & j _
                & " Rows: " & objTable(i).rows.length _
                & " Cols: " & objTable(i).cols)
          Next

       Catch ex As Exception
          MessageBox.Show("It is likely that your submit does not exist or has no name attribute. Check the HTML source.", "No name att. or no submit available", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
       End Try

    End Sub

    Here is a link to the MSDN document that describes the DOM.

    It seems like I'm very close to understanding this, but can't figure out what types to use so VB can interpret the results from the DOM calls.

    Any pointers or ideas would be appreciated.

  2. #2

    Thread Starter
    New Member
    Join Date
    Sep 2005
    Posts
    2

    Re: Using AxWebBrowser and DOM to extract data

    OK, so I wasn't even close with the attempt above, but based on kleinma's excellent post here, I was able to make some progress.

    The following code iterates over every cell in every table on a web page. More importantly (for me, anyway), it shows how to access any of those cells directly with a single VB.NET statement. Note that the VB source code depends on the contents of the web page, but that is OK for the application I'm working on.

    I'm posting this here as it might help others, and, more importantly, others may have suggestions for better ways to do this.

    Here is the code:
    VB Code:
    ' Requires a project reference to Microsoft.mshtml and "Imports mshtml" at the top of the file.
    ' wb is the AxWebBrowser control on the form.
    Private Sub DumpTables()

       Dim t, c As Integer  ' Used to count tables and cells.

       Dim IWebDocument As HTMLDocument
       Dim IWebElements As IHTMLElementCollection
       Dim ITableElement As HTMLTable
       Dim ICellElement As HTMLTableCell

       ListBox1.Items.Clear()

       'GET DOCUMENT
       IWebDocument = CType(wb.Document, HTMLDocument)

       'GET TABLES
       IWebElements = IWebDocument.getElementsByTagName("TABLE")
       ListBox1.Items.Add("Length = " & IWebElements.length)

       ' Iterate through all the Tables on the web page.
       t = 0
       For Each ITableElement In IWebElements
          ListBox1.Items.Add("Table = " & t & " Length - " & ITableElement.rows.length & " Cells - " & ITableElement.cells.length)

          ' Iterate through all the cells within a table.
          c = 0
          For Each ICellElement In ITableElement.cells
             ListBox1.Items.Add("Cell (" & t & "," & c & ") -->" & ICellElement.innerText & "<--")
             c = c + 1
          Next
          t = t + 1
       Next

       ' Test directly accessing a few of the table elements.  These are hard coded and depend on the page you
       ' are parsing.  The chained .cells.item(...) calls are late bound, so they need Option Strict Off.

       Try
          ' Extract the innertext from the second cell of the first table.
          ListBox1.Items.Add("IWebElements(0,1): " & IWebElements.item(0).cells.item(1).innerText)
          ' Extract the innertext from the seventh cell of the fourth table.
          ListBox1.Items.Add("IWebElements(3,6): " & IWebElements.item(3).cells.item(6).innerText)
       Catch ex As Exception
          MessageBox.Show("One of the items you tried to reference is invalid.  Check the indexes vs. the HTML source.", "Parsing Error", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
       End Try

    End Sub
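
    One possible variation on the direct-access lines above, as a rough sketch only: the lookup is wrapped in a function that checks both indexes against the collection lengths before using them, so an out-of-range index returns Nothing and the Try/Catch is not needed for that case. GetCellText and the TableCellLookup module are hypothetical names made up for this example. It assumes the same Microsoft.mshtml reference as the code above, and that item() can be called with a single index argument, as it is in the code above. I have not tested it against a live page.

    ```vb
    Imports mshtml

    Module TableCellLookup

       ' Hypothetical helper (illustration only, not part of the code above).
       ' Returns the inner text of cell number cellIndex in table number tableIndex,
       ' or Nothing when either index is out of range for the loaded page.
       Public Function GetCellText(ByVal doc As HTMLDocument, _
                                   ByVal tableIndex As Integer, _
                                   ByVal cellIndex As Integer) As String

          Dim tables As IHTMLElementCollection = doc.getElementsByTagName("TABLE")
          If tableIndex < 0 OrElse tableIndex >= tables.length Then
             Return Nothing
          End If

          ' item() returns Object, so cast to the table type explicitly.
          Dim table As HTMLTable = CType(tables.item(tableIndex), HTMLTable)

          Dim cells As IHTMLElementCollection = table.cells
          If cellIndex < 0 OrElse cellIndex >= cells.length Then
             Return Nothing
          End If

          ' IHTMLElement exposes innerText for any element, including table cells.
          Dim cell As IHTMLElement = CType(cells.item(cellIndex), IHTMLElement)
          Return cell.innerText
       End Function

    End Module
    ```

    With this in place, the second cell of the first table would be read as GetCellText(CType(wb.Document, HTMLDocument), 0, 1). The result can be Nothing, so check it before passing it to ListBox1.Items.Add.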
