[RESOLVED] Getting information from HTML
Hi!
I need to retrieve information from a table in an HTML page.
The page I need to retrieve the info from is not mine, so I can't change its HTML to make this easier.
Here's an example of the HTML:
Code:
<html>
<head>
<title>Nana</title>
</head>
<body>
<TABLE BORDER=0 CELLSPACING=1 CELLPADDING=4 WIDTH=100%>
<TR BGCOLOR=#505050><TD COLSPAN=2 CLASS=white>
<B>Character Information</B></TD></TR>
<TR BGCOLOR=#F1E0C6><TD WIDTH=20%>Name:</TD><TD>Name Value</TD></TR>
<TR BGCOLOR=#D4C0A1><TD>Sex:</TD><TD>Value of Sex</TD></TR>
</TABLE>
</body>
</html>
So what I need to do is get the value of the 'Name' row and the value of the 'Sex' row. How can I do this?
Thanks!
//Zeelia
Re: Getting information from HTML
You should also be able to use the WebBrowser control and the document's GetElementById method.
Re: Getting information from HTML
If the elements had IDs that might work, but seeing as they don't, regex would work and would probably be the most efficient. Another possibility would be to load the page into an XmlDocument, BUT the HTML has to be fully formed, XML-style (XHTML). If there's no guarantee that it will be, I'd go with regex.
-tg
Re: Getting information from HTML
Hi!
Techgnome, I like the XmlDocument idea, but to make an XML document I need to get the values first of all. Can you explain more about XmlDocument, please?
And if anyone could help me a bit with regex: I can only think of a way to find the cell that says 'Name:', since that's static text, but I don't know how to get the next cell (the one with the value).
Thanks.
//Zeelia
Re: Getting information from HTML
"...but to make an XML document I need to get the values first of all..." Not true. IF and ONLY IF the HTML is well formed, you can actually treat it like XML and load it as-is.
Here's a link to the XmlDocument overview: http://msdn.microsoft.com/en-us/libr...ldocument.aspx
You'll need either the Load method: http://msdn.microsoft.com/en-us/libr...ment.load.aspx
Or the LoadXml method: http://msdn.microsoft.com/en-us/libr...t.loadxml.aspx
From there, you can use SelectNodes ( http://msdn.microsoft.com/en-us/libr...lectnodes.aspx ) to get to the nodes you need.
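As a rough sketch of that approach, assuming the markup has already been cleaned up into well-formed, lowercase XHTML (which the sample page above is not): the module name, the GetField helper, and the XPath expression are illustrative assumptions, not part of the page. It uses SelectSingleNode rather than SelectNodes because only one cell is wanted, and it assumes the label contains no apostrophe.

```vbnet
Imports System
Imports System.Xml

Module XmlTableDemo
    ' Hypothetical helper: returns the text of the cell that follows
    ' the cell whose text equals label, or "" if no such row exists.
    Function GetField(ByVal xhtml As String, ByVal label As String) As String
        Dim doc As New XmlDocument()
        ' LoadXml throws an XmlException if the markup is not well formed.
        doc.LoadXml(xhtml)
        ' Pick the row whose first cell equals the label, then take its second cell.
        Dim node As XmlNode = doc.SelectSingleNode("//tr[td[1]='" & label & "']/td[2]")
        If node Is Nothing Then Return ""
        Return node.InnerText.Trim()
    End Function

    Sub Main()
        Dim xhtml As String = "<table>" & _
            "<tr><td>Name:</td><td>Name Value</td></tr>" & _
            "<tr><td>Sex:</td><td>Value of Sex</td></tr>" & _
            "</table>"
        Console.WriteLine(GetField(xhtml, "Name:"))
        Console.WriteLine(GetField(xhtml, "Sex:"))
    End Sub
End Module
```

Unquoted attributes such as BORDER=0 or any unclosed tag will make LoadXml throw, so this only helps when the source really is XHTML.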
-tg
Re: Getting information from HTML
Hi!
XmlDocument doesn't work, since the page has errors such as unclosed tags.
So if anyone could point me in the right direction on how to proceed with regex, I'd really appreciate it.
Thanks!
*EDIT*
Okay, I solved the problem. I'm not very good at regex, so if you have a better solution, please share.
Here's my solution:
vb.net Code:
Imports System.Text.RegularExpressions

Public Function RunRegEx(ByVal inputhtml As String, ByVal fieldname As String) As String
    ' Match the label cell (e.g. <TD>Name:</TD>) followed directly by the value cell.
    ' Group 1 = label tag name, group 2 = value tag name, group 3 = value text.
    Dim pattern As String = "<([A-Z][A-Z0-9]*)\b[^>]*>" & Regex.Escape(fieldname) & _
                            "</\1>\s*<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\2>"
    Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    ' Only the first match is needed.
    Dim m As Match = rx.Match(inputhtml)
    If m.Success Then
        ' Read the captured group rather than trimming the match by fixed offsets,
        ' which broke when the cell attributes (e.g. WIDTH=20%) changed.
        Return m.Groups(3).Value.Trim()
    End If
    ' Field not found.
    Return ""
End Function
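
A possible variation on the same idea, as a minimal sketch: instead of searching for one field at a time, collect every label/value pair in a single pass. It assumes each data row holds exactly two plain-text cells, as in the sample HTML; the TableFields module and ReadFields helper are made-up names for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TableFields
    ' Hypothetical helper: maps each label cell (e.g. "Name:") to the text of the cell after it.
    Function ReadFields(ByVal html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        ' Two consecutive <td> cells with no nested tags; group 1 = label, group 2 = value.
        Dim rx As New Regex("<td\b[^>]*>([^<]*)</td>\s*<td\b[^>]*>([^<]*)</td>", RegexOptions.IgnoreCase)
        For Each m As Match In rx.Matches(html)
            fields(m.Groups(1).Value.Trim()) = m.Groups(2).Value.Trim()
        Next
        Return fields
    End Function

    Sub Main()
        Dim html As String = "<TR BGCOLOR=#F1E0C6><TD WIDTH=20%>Name:</TD><TD>Name Value</TD></TR>" & _
                             "<TR BGCOLOR=#D4C0A1><TD>Sex:</TD><TD>Value of Sex</TD></TR>"
        Dim fields As Dictionary(Of String, String) = ReadFields(html)
        Console.WriteLine(fields("Name:"))
        Console.WriteLine(fields("Sex:"))
    End Sub
End Module
```

Cells that contain nested tags, such as the <B>Character Information</B> header, are skipped because the pattern only accepts plain text between the cell tags.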
//Zeelia