-
Feb 9th, 2010, 01:49 PM
#1
Thread Starter
Junior Member
[RESOLVED] Parse HTML Table
Hi all again. After a week of trying to figure out how to parse an HTML table, I still haven't managed it.
Below is the table I am trying to get the information out of.
Code:
<table class="resourceInfoTable">
<tbody>
<tr>
<th>Production</th>
</tr>
<tr>
<td class="fontSize11 fontBold colorWhite">28,900</td>
</tr>
<tr>
<th>Consumption</th>
</tr>
<tr>
<td class="fontSize11 fontBold colorWhite">23,132</td>
</tr>
<tr>
<td class="borderBottom"> </td>
</tr>
<tr>
<th>Yield</th>
</tr>
<tr>
<td class="fontSize16 fontBold fontColorRace">5,768</td>
</tr>
<tr>
<th>Stored</th>
</tr>
<tr>
<td class="fontSize11 fontBold colorWhite">170,000</td>
</tr>
</tbody>
</table>
Below is the code I am using to pull that information out.
VB Code:
Public Sub Energy()
    Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
    Dim tds As HtmlElementCollection = doc.GetElementsByTagName("div")
    For Each td As HtmlElement In tds
        If td.GetAttribute("id") = "planetEnergyOverview" Then
            Dim linkText As String = String.Empty
            Dim PlanetClassName As String = String.Empty
            For Each div As HtmlElement In td.GetElementsByTagName("table")
                If div.GetAttribute("className") = "resourceInfoTable" Then
                    For Each div2 As HtmlElement In div.GetElementsByTagName("td")
                        If div2.GetAttribute("className") = "fontSize11 fontBold colorWhite" Then
                            linkText = div2.InnerText.Trim()
                            If Not String.IsNullOrEmpty(linkText) Then
                                Exit For ' leaves the loop at the first non-empty cell (28,900)
                            End If
                        End If
                    Next
                End If
            Next
            frmMain.lblEnergyPro.Text = "Production: " & String.Format("{0}", linkText)
        End If
    Next
End Sub
The problem I am having is that it pulls out the 28,900 fine, but I also need the rest of the information, i.e. the 23,132 and the 170,000, which will be placed into other labels. These are not static numbers; they change all the time, going higher or lower.
Thank you in advance for the help. This site has helped me get from VB6 to VB.NET and learn things I didn't know I could do in VB.
-
Feb 11th, 2010, 11:39 AM
#2
Thread Starter
Junior Member
Re: Parse HTML Table
Anyone able to help me out with this?
-
Feb 11th, 2010, 11:55 AM
#3
Re: Parse HTML Table
Is that the exact format of your HTML table? (This is very important: if the format is off, any code we give you will not work.) And you want to pull this information from it?
Code:
Production
28,900
Consumption
23,132
Yield
5,768
Stored
170,000
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
- Abraham Lincoln -
-
Feb 11th, 2010, 01:04 PM
#4
Fanatic Member
Re: Parse HTML Table
You could treat the table as an XElement and parse it using LINQ.
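A minimal sketch of that LINQ to XML idea, assuming the table markup is well-formed XML exactly as posted (an HTML-only entity such as `&nbsp;` or an unclosed tag would make `XElement.Parse` throw). The names `ResourceTableParser` and `ParseResourceTable` are made up for illustration; the query pairs each `<th>` row with the `<td>` row directly after it, which skips the spacer row.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Public Module ResourceTableParser
    ' Hypothetical helper: returns label -> value, e.g. "Production" -> "28,900".
    ' Assumes the table string is well-formed XML with lower-case tag names.
    Public Function ParseResourceTable(tableHtml As String) As Dictionary(Of String, String)
        ' Descendants("tr") finds the rows whether or not a <tbody> wrapper is present.
        Dim rows As List(Of XElement) = XElement.Parse(tableHtml).Descendants("tr").ToList()

        ' A label row (<th>) is followed by its value row (<td>).
        ' The borderBottom spacer row has no <th> right before it, so it is skipped.
        Dim pairs = From i In Enumerable.Range(0, Math.Max(rows.Count - 1, 0))
                    Let th = rows(i).Element("th")
                    Let td = rows(i + 1).Element("td")
                    Where th IsNot Nothing AndAlso td IsNot Nothing
                    Select Label = th.Value.Trim(), Amount = td.Value.Trim()

        Return pairs.ToDictionary(Function(p) p.Label, Function(p) p.Amount)
    End Function
End Module
```

With the table from the first post this should give four pairs: Production, Consumption, Yield and Stored. In the WebBrowser case the string could come from the table element's `OuterHtml`, but IE may serialize that markup differently (for example with upper-case tag names), so check the string before relying on this.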
If debugging is the process of removing bugs, then programming must be the process of putting them in.
-
Feb 11th, 2010, 02:58 PM
#5
Thread Starter
Junior Member
Re: Parse HTML Table
Stan, yes, that is the correct format it gets pulled out as. I don't need the text, just the numbers.
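Since only the numbers are needed, here is a sketch of turning a cell's text such as "28,900" into an `Integer`. It assumes the site always uses a comma as the thousands separator, hence `CultureInfo.InvariantCulture` rather than the PC's regional settings; `NumberParsing` and its functions are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Globalization

Public Module NumberParsing
    ' NumberStyles.Integer allows surrounding whitespace and a sign;
    ' AllowThousands additionally accepts the "," group separator.
    Private Const Styles As NumberStyles = NumberStyles.Integer Or NumberStyles.AllowThousands

    ' Converts "28,900" to 28900. Throws FormatException on non-numeric text.
    Public Function ParseThousands(cellText As String) As Integer
        Return Integer.Parse(cellText, Styles, CultureInfo.InvariantCulture)
    End Function

    ' Non-throwing variant for cells that may be blank, such as the spacer row.
    Public Function TryParseThousands(cellText As String, ByRef value As Integer) As Boolean
        Return Integer.TryParse(cellText, Styles, CultureInfo.InvariantCulture, value)
    End Function
End Module
```

Keeping the values as integers also makes it easy to compare or store them, while the labels can still format them back with `value.ToString("N0")`.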
-
Feb 11th, 2010, 03:50 PM
#6
Re: Parse HTML Table
The problem is with the HtmlElement.GetAttribute() method. While GetAttribute("className") works on older versions of IE, it doesn't work on IE8: it returns an empty string, so this condition is never true:
Code:
If div.GetAttribute("className") = "resourceInfoTable" Then
You can remove that test completely; I don't think it will have any effect on the end result. Something like this:
Code:
Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
Dim divs As HtmlElementCollection = doc.GetElementsByTagName("div")
For Each div As HtmlElement In divs
    If div.GetAttribute("id") = "planetEnergyOverview" Then
        ' Search for tables inside this div only, not the whole document
        Dim tables As HtmlElementCollection = div.GetElementsByTagName("table")
        For Each table As HtmlElement In tables
            Dim tds As HtmlElementCollection = table.GetElementsByTagName("td")
            Dim tdText As String = String.Empty
            For Each td As HtmlElement In tds
                tdText = td.InnerText
                Debug.WriteLine(tdText)
            Next
        Next
    End If
Next
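Whichever way the class attribute ends up being read, comparing it with the full string "fontSize11 fontBold colorWhite" breaks as soon as the site adds, removes or reorders a class. A sketch of a helper that checks for a single class token instead; `CssClassHelper.HasClass` is an illustrative name, not part of the HtmlElement API.

```vbnet
Option Strict On

Imports System

Public Module CssClassHelper
    ' Returns True when the space-separated class list contains className,
    ' so "fontSize11 fontBold colorWhite" matches "fontSize11".
    Public Function HasClass(classAttribute As String, className As String) As Boolean
        If String.IsNullOrEmpty(classAttribute) Then Return False
        ' Passing Nothing as the separator splits on any whitespace.
        Dim tokens As String() = classAttribute.Split(CType(Nothing, Char()), StringSplitOptions.RemoveEmptyEntries)
        ' Whole-token comparison: "border" does not match "borderBottom".
        Return Array.IndexOf(tokens, className) >= 0
    End Function
End Module
```

In the loops above it could be used as `HasClass(div2.GetAttribute("className"), "fontSize11")`, assuming GetAttribute returns the class list in the IE mode your WebBrowser control runs in.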
-
Feb 11th, 2010, 04:48 PM
#7
Thread Starter
Junior Member
Re: Parse HTML Table
It doesn't return anything at all in the Debug window.
I am using IE8 and className is working fine for me; is this something that changed recently?
My own code only returns the first number, not the rest of the table.
-
Feb 11th, 2010, 05:20 PM
#8
Re: Parse HTML Table
Where did you run that code? I hope you ran it in the WB.DocumentCompleted event.
-
Feb 11th, 2010, 07:58 PM
#9
Thread Starter
Junior Member
Re: Parse HTML Table
Yes, the code was run in the DocumentCompleted event.
I do thank you for all the help, Stanav.
-
Feb 15th, 2010, 11:17 AM
#10
Thread Starter
Junior Member
-
Feb 19th, 2010, 02:38 PM
#11
Thread Starter
Junior Member
Re: Parse HTML Table
Any ideas on this? I have still been trying to figure it out and haven't been able to.
-
Feb 19th, 2010, 03:39 PM
#12
Addicted Member
Re: Parse HTML Table
Here, try this:
Code:
Imports System.Text.RegularExpressions 'in order to use regular expressions you must import this
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        sourcetext.LoadFile("c:\test.txt", RichTextBoxStreamType.PlainText) 'I used a RichTextBox to load the file to test it
    End Sub
    Sub getinfo()
        'lazy .*? so two cells on the same line are not swallowed by one match
        Dim rx As New Regex("<td class=.*?</td>")
        For Each m As Match In rx.Matches(sourcetext.Text) 'goes through each match in sourcetext (the RichTextBox)
            'the border cell uses the same basic format, so only keep the ones containing fontSize11
            If m.Value.Contains("fontSize11") Then
                'strbetween returns the text between the > and < that follow "fontSize11"
                TextBox1.Text = TextBox1.Text & strbetween(m.Value, "fontSize11", ">", "<") & vbCrLf
            End If
        Next
    End Sub
    'Finds findfirst in SearchText, then returns the text between the next StartText and the following EndText
    Public Function strbetween(ByVal SearchText As String, ByVal findfirst As String, ByVal StartText As String, ByVal EndText As String) As String
        Dim s As Integer = SearchText.IndexOf(findfirst, StringComparison.Ordinal)
        If s < 0 Then Return findfirst & " Not Found"
        Dim a As Integer = SearchText.IndexOf(StartText, s, StringComparison.Ordinal)
        If a < 0 Then Return StartText & " Not Found"
        a += StartText.Length
        Dim b As Integer = SearchText.IndexOf(EndText, a, StringComparison.Ordinal)
        If b < 0 Then Return EndText & " Not Found"
        Return SearchText.Substring(a, b - a)
    End Function
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        getinfo() 'runs the routine
    End Sub
End Class
See if that works.
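The strbetween step can also be folded into the pattern itself with a capture group. A minimal sketch, assuming the class attribute is double-quoted as in the posted table; `TdRegex` and `GetFontSize11Values` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module TdRegex
    ' Matches a <td> whose class list contains the whole word fontSize11 and
    ' captures its text. [^"]* keeps the class test inside the quotes, and
    ' [^<]* stops at the closing tag, so neighbouring cells are never merged.
    Private ReadOnly CellPattern As New Regex(
        "<td\s+class=""[^""]*\bfontSize11\b[^""]*""\s*>([^<]*)</td>",
        RegexOptions.IgnoreCase)

    Public Function GetFontSize11Values(html As String) As List(Of String)
        Dim values As New List(Of String)()
        For Each m As Match In CellPattern.Matches(html)
            values.Add(m.Groups(1).Value.Trim()) ' group 1 = text between > and <
        Next
        Return values
    End Function
End Module
```

On the posted table this returns 28,900, 23,132 and 170,000 in document order (the fontSize16 Yield cell and the borderBottom spacer are skipped), and the list items can be assigned to labels instead of appended to a TextBox.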
-
Feb 19th, 2010, 05:50 PM
#13
Thread Starter
Junior Member
Re: Parse HTML Table
I will try it. I'm going to have to modify it to fit the way my program works: I don't load a text file, and I will have to change the names. I will let you know what happens.
-
Feb 23rd, 2010, 02:09 PM
#14
Thread Starter
Junior Member
Re: Parse HTML Table
That code would require me to modify a lot of what I have already done (not saying I'm not up for it). I tried to put it into my program, but it doesn't seem to work correctly for me; maybe I am doing it wrong.
I don't place the information into a TextBox. I put it into labels, and I save it into parts of an XML file that the program uses as a settings file.
-
Feb 23rd, 2010, 07:14 PM
#15
Re: Parse HTML Table
vb.net Code:
'Singleline lets .+? run across the line breaks inside the table
Dim re1 As New System.Text.RegularExpressions.Regex("(?<=\<table\s+class=""resourceInfoTable""\>).+?(?=\</table\>)", System.Text.RegularExpressions.RegexOptions.Singleline)
Dim re2 As New System.Text.RegularExpressions.Regex("(?<=\<td.*?\>)[0-9\,]+(?=\</td\>)")
Dim html As String = (put html here) '<-- I don't use the WebBrowser, but I'm sure there's a way to get the HTML of a document.
html = re1.Match(html).Value
Dim matches As System.Text.RegularExpressions.MatchCollection = re2.Matches(html)
'Extract values
Dim production As String = matches(0).Value
Dim consumption As String = matches(1).Value
Dim yield As String = matches(2).Value
Dim stored As String = matches(3).Value
'Do whatever you want with values
-
Mar 17th, 2010, 06:33 PM
#16
Thread Starter
Junior Member
Re: Parse HTML Table
I was able to figure this out using my original code. This is how I did it. Thank you everyone for helping me with this!
My main issue was the Exit For: as soon as a TD gave a non-empty value, it jumped out of the loop. When I commented it out, it pulled all the values, and that's when I added the If tsCount statement to put each one in its correct place.
VB Code:
Public Sub EnergyProduction()
    Dim doc As HtmlDocument = frmWebBrowser.WebBrowser1.Document
    Dim tds As HtmlElementCollection = doc.GetElementsByTagName("div")
    Dim tsCount As Integer
    tsCount = 0
    For Each td As HtmlElement In tds
        If td.GetAttribute("id") = "planetEnergyOverview" Then
            Dim linkText As String = String.Empty
            Dim PlanetClassName As String = String.Empty
            For Each div As HtmlElement In td.GetElementsByTagName("table")
                If div.GetAttribute("className") = "resourceInfoTable" Then
                    For Each div2 As HtmlElement In div.GetElementsByTagName("td")
                        If div2.GetAttribute("className") = "fontSize11 fontBold colorWhite" Then
                            linkText = div2.InnerText.Trim()
                            If tsCount = 0 Then
                                frmMain.lblEnergyPro.Text = "Production: " & String.Format("{0}", linkText)
                                tsCount = tsCount + 1
                            ElseIf tsCount = 1 Then
                                frmMain.lblEnergyCon.Text = "Consumption: " & String.Format("{0}", linkText)
                                tsCount = tsCount + 1
                            ElseIf tsCount = 2 Then
                                frmMain.lblEnergyStored.Text = "Stored: " & String.Format("{0}", linkText)
                            End If
                            'frmDebug.rtbDebug.Text = frmDebug.rtbDebug.Text & "Production: " & String.Format("{0}", linkText) & vbCrLf
                            If Not String.IsNullOrEmpty(linkText) Then
                                'Exit For
                            End If
                        End If
                    Next
                End If
            Next
        End If
    Next
End Sub
-
Mar 18th, 2010, 06:25 AM
#17
Fanatic Member
Re: [RESOLVED] Parse HTML Table
Just a small point: ideally you should end the If...ElseIf...ElseIf block with an Else, i.e. If...ElseIf...ElseIf...Else (something is wrong)...End If. It's not a big deal, as the code will carry on gracefully if for some reason tsCount got to 3, but if that happened you may want to know so you can adapt the code. This would tell you when the page's data layout changed so you can act accordingly.
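That suggestion, sketched with Select Case rather than an If/ElseIf chain and assuming the same three fontSize11 cells in the same order. `ResourceLabels.FormatResourceValue` is a hypothetical name; here the Case Else throws, though writing to a debug window would serve the same purpose.

```vbnet
Option Strict On

Imports System

Public Module ResourceLabels
    ' Maps the position of a fontSize11 cell to its caption. Case Else reports
    ' an unexpected extra cell instead of silently ignoring it.
    Public Function FormatResourceValue(index As Integer, cellText As String) As String
        Select Case index
            Case 0
                Return "Production: " & cellText
            Case 1
                Return "Consumption: " & cellText
            Case 2
                Return "Stored: " & cellText
            Case Else
                ' Something is wrong: the page probably has more cells than expected.
                Throw New InvalidOperationException(
                    String.Format("Unexpected cell #{0} with value '{1}'; the page layout may have changed.", index, cellText))
        End Select
    End Function
End Module
```

Inside the loop it would be called with tsCount and linkText, incrementing tsCount on every cell, so a fourth cell is reported instead of being silently ignored.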
-
Mar 18th, 2010, 10:43 AM
#18
Thread Starter
Junior Member
Re: [RESOLVED] Parse HTML Table
Thanks for the idea, Megalith, I think I will do that. Yeah, getting back into coding has a somewhat steep learning curve.