-
Sep 29th, 2015, 01:15 PM
#1
Thread Starter
Addicted Member
[RESOLVED] Need help with parsing html
Hello, I'm trying to parse an HTML table from a WebBrowser control. Its hierarchy looks like this:
HTML Code:
<tr class="evenRow" id="trItem74709" spry:select="selectedRow" spry:hover="hoverRow" ondblclick="addToCart();return false;">
<td onclick="return openStokDetay('CK100FLX05')" style="cursor:pointer">
<span style="margin-right:4px">
<i class="fa fa-picture-o fa-2x"></i>
</span>
<span>
FLAXES FLX-343W WIRELESS Q TR USB KLAVYE/MOUSE SET
</span>
<span>
MAVİ TUŞ
</span>
</td>
<td>
...
</td>
...
Here's the problem. I want to eliminate
HTML Code:
<i class="fa fa-picture-o fa-2x"></i>
I just need the text from the span tags that DON'T contain an <i> tag, because some rows of the table contain one and some don't.
The code I have so far:
VBnet Code:
For Each parentElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("id").StartsWith("trItem"))
    For Each spanElement As HtmlElement In parentElement.GetElementsByTagName("span")
        str = str & spanElement.InnerText & delimiter
    Next
    dict.Add(num, str)
    num = num + 1
Next
I tried this:
VBnet Code:
If spanElement.FirstChild IsNot "i" Then
    str = str & spanElement.InnerText & delimiter
End If
However, it didn't seem to work.
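For what it's worth, IsNot compares object references rather than tag names, so a condition like the one above is True for every span, with or without an <i> inside. A minimal console sketch of that behaviour, using a plain Object as a stand-in for the HtmlElement (the stand-in and the module name are assumptions for illustration, not part of the original code):

```vbnet
Imports System

Module IsNotDemo
    Sub Main()
        ' Stand-in for a span whose FirstChild is missing.
        Dim child As Object = Nothing
        ' Nothing is never the same reference as the string "i".
        Console.WriteLine(child IsNot "i")   ' True

        ' Stand-in for a span whose FirstChild exists.
        child = New Object()
        ' A different object is never the same reference as "i" either.
        Console.WriteLine(child IsNot "i")   ' True
    End Sub
End Module
```

Since both cases print True, the If never filters anything out.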
-
Sep 29th, 2015, 01:56 PM
#2
Re: Need help with parsing html
You need to turn Option Strict on. If you look up the documentation for HtmlElement.FirstChild, you will see that it returns an HtmlElement and not a String, and with Option Strict on that comparison would have thrown an error. So try changing the conditional statement to:
Code:
If spanElement.FirstChild.TagName <> "i" Then
-
Sep 29th, 2015, 02:24 PM
#3
Thread Starter
Addicted Member
Re: Need help with parsing html
First, thank you for your reply. I tried the line you suggested with Option Strict both on and off, but I got an error:
Object reference not set to an instance of an object
-
Sep 29th, 2015, 02:58 PM
#4
Re: Need help with parsing html
Then that means FirstChild is returning Nothing. You need to add an additional condition to the statement:
Code:
If spanElement.FirstChild IsNot Nothing AndAlso spanElement.FirstChild.TagName <> "i" Then
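A quick way to see what the extra condition does, sketched as a console program. FakeElement is a hypothetical stand-in for HtmlElement, and the assumption here is that FirstChild comes back as Nothing for spans that hold only text:

```vbnet
Imports System

Module ShortCircuitDemo
    ' Hypothetical stand-in for HtmlElement; only TagName is modelled.
    Class FakeElement
        Public Property TagName As String
    End Class

    Sub Main()
        ' Simulates a span with no child elements.
        Dim child As FakeElement = Nothing

        ' AndAlso stops as soon as the left side is False,
        ' so child.TagName is never read when child is Nothing.
        If child IsNot Nothing AndAlso child.TagName <> "i" Then
            Console.WriteLine("keep")
        Else
            Console.WriteLine("skip")
        End If
    End Sub
End Module
```

This avoids the exception, because the right-hand side is not evaluated when the left-hand side is False. Note that under the assumption above a text-only span also fails the test, so it would be skipped rather than kept.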
-
Sep 29th, 2015, 03:51 PM
#6
Thread Starter
Addicted Member
Re: [RESOLVED] Need help with parsing html
I had to change it to
VBnet Code:
For Each parentElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("id").StartsWith("trItem"))
    Dim str As String = ""
    For Each spanElement As HtmlElement In parentElement.GetElementsByTagName("span")
        ' Compare the tag name case-insensitively, since it may be reported as "I".
        If spanElement.FirstChild IsNot Nothing AndAlso String.Equals(spanElement.FirstChild.TagName, "i", StringComparison.OrdinalIgnoreCase) Then
            ' Invalid tag
        Else
            str = str & spanElement.InnerText & delimiter
        End If
    Next
    dict.Add(num, str)
    num = num + 1
Next
I guess this will work fine.
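A note on the tag comparison: as far as I know, the WebBrowser control reports tag names in upper case ("I", "SPAN"), and = on strings is case-sensitive under the default Option Compare Binary, so a test against lowercase "i" may never match. A minimal sketch of a case-insensitive check follows; IsIconTag is a hypothetical helper name, not something from the code above:

```vbnet
Imports System

Module TagNameCheck
    ' Hypothetical helper: True when tagName names an <i> element, ignoring case.
    Function IsIconTag(tagName As String) As Boolean
        Return String.Equals(tagName, "i", StringComparison.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(IsIconTag("I"))      ' True
        Console.WriteLine(IsIconTag("i"))      ' True
        Console.WriteLine(IsIconTag("SPAN"))   ' False
        Console.WriteLine(IsIconTag(Nothing))  ' False
    End Sub
End Module
```

In the loop, the condition would then read spanElement.FirstChild IsNot Nothing AndAlso IsIconTag(spanElement.FirstChild.TagName).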
-
Sep 29th, 2015, 03:57 PM
#7
Re: [RESOLVED] Need help with parsing html
You're already using LINQ, so I would just expand on that. I'm free-typing this and can't test it, but try this out:
Code:
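' Flatten each matching row into its spans, then drop spans whose first child element is <i>.
Dim innerText() As String = WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement) _
    .Where(Function(x) x.GetAttribute("id").StartsWith("trItem")) _
    .SelectMany(Function(r) r.GetElementsByTagName("span").Cast(Of HtmlElement)) _
    .Where(Function(h) h.FirstChild Is Nothing OrElse Not String.Equals(h.FirstChild.TagName, "i", StringComparison.OrdinalIgnoreCase)) _
    .Select(Function(h) h.InnerText).ToArray()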
Dim str As String = String.Join(delimiter, innerText)