
Thread: [RESOLVED] Parse HTML Table

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    [RESOLVED] Parse HTML Table

    Hi all, again. After a week of trying to figure out how to parse an HTML table, I still haven't figured it out.

    Below is the table I am trying to get the information out of.

    Code:
    <table class="resourceInfoTable">
    	<tbody>
    		<tr>
    			<th>Production</th>
    		</tr>
    		<tr>
    			<td class="fontSize11 fontBold colorWhite">28,900</td>
    		</tr>
    		<tr>
    			<th>Consumption</th>
    		</tr>
    		<tr>
    			<td class="fontSize11 fontBold colorWhite">23,132</td>
    		</tr>
    		<tr>
    			<td class="borderBottom">&nbsp;</td>
    		</tr>
    		<tr>
    			<th>Yield</th>
    		</tr>
    		<tr>
    			<td class="fontSize16 fontBold fontColorRace">5,768</td>
    		</tr>
    		<tr>
    			<th>Stored</th>
    		</tr>
    		<tr>
    			<td class="fontSize11 fontBold colorWhite">170,000</td>
    		</tr>
    	</tbody>
    </table>
    Below is the code I am using to pull that information out.
    VB Code:
    Public Sub Energy()

        Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
        Dim tds As HtmlElementCollection = doc.GetElementsByTagName("div")

        For Each td As HtmlElement In tds
            If td.GetAttribute("id") = "planetEnergyOverview" Then

                Dim linkText As String = String.Empty
                Dim PlanetClassName As String = String.Empty

                For Each div As HtmlElement In td.GetElementsByTagName("table")
                    If div.GetAttribute("className") = "resourceInfoTable" Then
                        For Each div2 As HtmlElement In div.GetElementsByTagName("td")
                            If div2.GetAttribute("className") = "fontSize11 fontBold colorWhite" Then
                                linkText = div2.InnerText.Trim()
                                If Not String.IsNullOrEmpty(linkText) Then
                                    Exit For
                                End If
                            End If
                        Next
                    End If
                Next
                frmMain.lblEnergyPro.Text = "Production: " & String.Format("{0}", linkText)
            End If
        Next

    End Sub

    The problem I am having is that it pulls out the 28,900 fine, but I need to pull the rest of the information too, i.e. the 23,132 and the 170,000, which will get placed into other labels. They are not static numbers; they change all the time to higher or lower numbers.

    Thank you for the help in advance. This site has helped me get from VB6 to VB.NET and learn new things I didn't know I was able to do in VB.

  2. #2

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    Anyone able to help me out with this?

  3. #3
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: Parse HTML Table

    Is that the exact format of your html table (this is very important since if the format is off, any code we give you will not work)? And you want to pull this information from it?
    Code:
    Production
    28,900
    Consumption
    23,132
    
    Yield
    5,768
    Stored
    170,000
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  4. #4
    Fanatic Member Megalith's Avatar
    Join Date
    Oct 2006
    Location
    Secret location in the UK
    Posts
    879

    Re: Parse HTML Table

    You could treat the table as an XElement and parse it using LINQ.
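
    A minimal sketch of the XElement idea, offered as an illustration rather than a tested solution. It assumes the table markup is available as a well-formed XML string exactly as posted in #1; HTML taken from a live browser document is often not well-formed, in which case XElement.Parse will throw. The names TableParseSketch and ReadValues are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module TableParseSketch
    ' Returns the trimmed text of every <td> whose class attribute is exactly
    ' "fontSize11 fontBold colorWhite" (the same test the code in #1 uses).
    Function ReadValues(ByVal tableXml As String) As String()
        ' &nbsp; is not a predefined XML entity, so swap it for a numeric one first.
        Dim xml As String = tableXml.Replace("&nbsp;", "&#160;")
        Dim table As XElement = XElement.Parse(xml)

        Dim values = (From td In table.Descendants("td") _
                      Let attr = td.Attribute("class") _
                      Where attr IsNot Nothing AndAlso attr.Value = "fontSize11 fontBold colorWhite" _
                      Select td.Value.Trim()).ToArray()
        Return values
    End Function
End Module
```

    For the table in #1 this should yield the Production, Consumption and Stored values in document order, skipping the Yield cell and the spacer cell because their class attributes differ.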
    If debugging is the process of removing bugs, then programming must be the process of putting them in.

  5. #5

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    Stan, yes, that is the correct format that it gets pulled out as. I don't need the text, just the numbers.

  6. #6
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: Parse HTML Table

    The problem is with the HtmlElement.GetAttribute() method. While GetAttribute("className") works on older IE, it doesn't work on IE8; it returns an empty string there, so this condition is never true.
    Code:
    If div.GetAttribute("className") = "resourceInfoTable" Then
    You can completely remove that test, and I don't think it will have any effect on the end result at all. Something like this:
    Code:
    Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
    Dim divs As HtmlElementCollection = doc.GetElementsByTagName("div")

    For Each div As HtmlElement In divs
        If div.GetAttribute("id") = "planetEnergyOverview" Then
            'Search inside the matching div only, not the whole document
            Dim tables As HtmlElementCollection = div.GetElementsByTagName("table")
            For Each table As HtmlElement In tables
                Dim tds As HtmlElementCollection = table.GetElementsByTagName("td")
                Dim tdText As String = String.Empty
                For Each td As HtmlElement In tds
                    tdText = td.InnerText
                    Debug.WriteLine(tdText)
                Next
            Next
        End If
    Next

  7. #7

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    It doesn't return anything at all in the debug window.

    I am using IE8 and className is working fine. Is it something that happened recently, that it's no longer being used?

    It would only return the first number and not the rest of them for the table.

  8. #8
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: Parse HTML Table

    Where did you run that code? I hope that you ran it in the WB.DocumentCompleted event.

  9. #9

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    Yes, the code was run in the DocumentCompleted event.

    I do thank you for all the help, Stanav.

  10. #10

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    any updates by chance?

  11. #11

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    Any ideas on this? I have still been trying to figure out how to get this and I am unable to get it.

  12. #12
    Addicted Member
    Join Date
    Mar 2008
    Posts
    129

    Re: Parse HTML Table

    here try this
    Code:
    Imports System.Text.RegularExpressions 'in order to use regular expressions you must import this
    Public Class Form1
    
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            sourcetext.LoadFile("c:\test.txt", RichTextBoxStreamType.PlainText) 'i used a richtextbox to load the file to test it
        End Sub
    
        Sub getinfo()
            Dim rx As New System.Text.RegularExpressions.Regex("<td class=.*</td>") 'sets a new regular expression with the filter in quotes
    
            Dim m As Match 'm is the match of the filter
    
            For Each m In rx.Matches(sourcetext.Text) 'goes through each matching format in sourcetext aka the richtextbox
                If m.ToString.Contains("fontSize11") Then 'since there is the border that uses the same basic format, check to see if it contains fontSize11
    
                    TextBox1.Text = TextBox1.Text & strbetween(m.ToString, "fontSize11", ">", "<") & vbCrLf 'sets textbox1 as the text between the brackets
                    'strbetween finds strings between characters in a string in this case m.tostring is the <td class=...etc 
                End If
            Next
    
    
        End Sub
    
        Public Function strbetween(ByVal SearchText As String, ByVal findfirst As String, ByVal StartText As String, ByVal EndText As String) As String
            On Error GoTo 3
            Dim s
            Dim a
            Dim b
            Dim c
            Dim d
            s = InStr(1, SearchText, findfirst)
            If s = 0 Then
                strbetween = findfirst & " Not Found"
                GoTo 4 'bail out here; InStr with a start position of 0 would raise an error
            End If
    
            a = InStr(s, SearchText, StartText)
            If a = 0 Then
                strbetween = StartText & " Not Found"
                GoTo 4
            Else
                a = a + Len(StartText)
            End If
            b = InStr(a, SearchText, EndText)
            If b = 0 Then
                strbetween = EndText & " Not Found"
                GoTo 4
            End If
            c = b - a
            d = Mid(SearchText, a, c)
            strbetween = d
            GoTo 4
    3:      strbetween = "An error occurred"
    4:
        End Function
        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            Call getinfo() 'calls get info to run the routine
        End Sub
    End Class
    see if that works...

  13. #13

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    I will try it. I am going to have to modify it to work with the way my program works: I don't load a text file, and I will have to change the names. I will let you know what happens.

  14. #14

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    That code would require me to modify a lot of what I have done already (not saying I am not up for it). I tried to put it into my program, but it doesn't seem to work correctly for me. Maybe I am doing it wrong.

    I don't place the information into a TextBox. I have it placed into Labels, and I have it saved into parts of an XML file that the program uses as a settings file.

  15. #15
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    Re: Parse HTML Table

    vb.net Code:
    'Singleline lets . match the line breaks inside the table
    Dim re1 As New System.Text.RegularExpressions.Regex("(?<=\<table\s+class=""resourceInfoTable""\>).+?(?=\</table\>)", System.Text.RegularExpressions.RegexOptions.Singleline)
    Dim re2 As New System.Text.RegularExpressions.Regex("(?<=\<td.*?\>)[0-9\,]+(?=\</td\>)")
    Dim html As String = frmWebBrowser.WebBrowser1.DocumentText '<-- I don't use the WebBrowser, so this is an assumption; DocumentText should give you the HTML of the loaded document.
    html = re1.Match(html).Value
    Dim matches As System.Text.RegularExpressions.MatchCollection = re2.Matches(html)
    'Extract values
    Dim production As String = matches(0).Value
    Dim consumption As String = matches(1).Value
    Dim yield As String = matches(2).Value
    Dim stored As String = matches(3).Value
    'Do whatever you want with values

  16. #16

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: Parse HTML Table

    I was able to figure this out using my original code. This is how I did it, below. Thank you everyone for helping me with this!

    My main issue was that the Exit For dumped out of the loop as soon as it found the first value in the TDs. When I commented it out, it pulled all the values out, and that's when I added the If tsCount statement to put each one in its correct place.

    VB Code:
    Public Sub EnergyProduction()

        Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
        Dim tds As HtmlElementCollection = doc.GetElementsByTagName("div")
        Dim tsCount As Integer

        tsCount = 0

        For Each td As HtmlElement In tds
            If td.GetAttribute("id") = "planetEnergyOverview" Then

                Dim linkText As String = String.Empty
                Dim PlanetClassName As String = String.Empty

                For Each div As HtmlElement In td.GetElementsByTagName("table")
                    If div.GetAttribute("className") = "resourceInfoTable" Then
                        For Each div2 As HtmlElement In div.GetElementsByTagName("td")
                            If div2.GetAttribute("className") = "fontSize11 fontBold colorWhite" Then
                                linkText = div2.InnerText.Trim()

                                If tsCount = 0 Then
                                    frmMain.lblEnergyPro.Text = "Production: " & String.Format("{0}", linkText)
                                    tsCount = tsCount + 1
                                ElseIf tsCount = 1 Then
                                    frmMain.lblEnergyCon.Text = "Consumption: " & String.Format("{0}", linkText)
                                    tsCount = tsCount + 1
                                ElseIf tsCount = 2 Then
                                    frmMain.lblEnergyStored.Text = "Stored: " & String.Format("{0}", linkText)
                                End If

                                'frmDebug.rtbDebug.Text = frmDebug.rtbDebug.Text & "Production: " & String.Format("{0}", linkText) & vbCrLf

                                If Not String.IsNullOrEmpty(linkText) Then
                                    'Exit For
                                End If
                            End If
                        Next
                    End If
                Next
            End If
        Next

    End Sub

  17. #17
    Fanatic Member Megalith's Avatar
    Join Date
    Oct 2006
    Location
    Secret location in the UK
    Posts
    879

    Re: [RESOLVED] Parse HTML Table

    Just a little point: you should (ideally) finish the If...ElseIf...ElseIf...End If block with an Else at the end, i.e. If...ElseIf...ElseIf...Else (something is wrong)...End If. It's not a biggie, as the code will exit gracefully if for some reason tsCount got to 3. However, if that happened you may want to know so you can adapt the code. This would let you know when the page's data had changed and act accordingly.
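
    One way the Else suggestion could look, as a rough sketch only. It uses Select Case in place of the If...ElseIf chain and returns the caption text instead of touching the labels directly, so it can be tried on its own. LabelAssignSketch and CaptionFor are hypothetical names, and throwing an exception is just one possible reaction; writing to a debug log would work as well.

```vbnet
Imports System

Module LabelAssignSketch
    ' Maps the position of a value in the table (0, 1, 2) to its caption text.
    Function CaptionFor(ByVal index As Integer, ByVal value As String) As String
        Select Case index
            Case 0
                Return "Production: " & value
            Case 1
                Return "Consumption: " & value
            Case 2
                Return "Stored: " & value
            Case Else
                ' An unexpected extra cell: report it instead of silently ignoring it.
                Throw New InvalidOperationException("Unexpected value #" & index & ": " & value)
        End Select
    End Function
End Module
```

    For this to catch a fourth matching cell, the counter has to be incremented after every match, including the third one.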

  18. #18

    Thread Starter
    Junior Member
    Join Date
    Jan 2010
    Posts
    20

    Re: [RESOLVED] Parse HTML Table

    Thanks for the idea, Megalith. I think I will do that. Yeah, getting back into coding has a somewhat steep learning curve.
