
Thread: [RESOLVED] Need help with parsing html

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Oct 2013
    Posts
    200

    [RESOLVED] Need help with parsing html

    Hello, I'm trying to parse an HTML table from a WebBrowser control. Its rows have a hierarchy like this:

    HTML Code:
    <tr class="evenRow" id="trItem74709" spry:select="selectedRow" spry:hover="hoverRow" ondblclick="addToCart();return false;">
    		<td onclick="return openStokDetay('CK100FLX05')" style="cursor:pointer">
    				<span style="margin-right:4px">
    						<i class="fa fa-picture-o fa-2x"></i>
    				</span>
    				<span>
    						FLAXES FLX-343W WIRELESS Q TR USB KLAVYE/MOUSE SET
    				</span>
    				<span>
    						MAVİ TUŞ
    				</span>
    		</td>
    		<td>
    		        ...
    		</td>
    
    		...
    Here's the problem. I want to eliminate

    HTML Code:
    <i class="fa fa-picture-o fa-2x"></i>
    I just need the text of the span tags that DON'T contain an <i> tag. Some rows of the table contain it and some don't.

    The code I have so far:

    VBnet Code:
    For Each parentElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("id").StartsWith("trItem"))
        For Each spanElement As HtmlElement In parentElement.GetElementsByTagName("span")
            str = str & spanElement.InnerText & delimiter
        Next

        dict.Add(num, str)
        num = num + 1
    Next

    I tried this:
    VBnet Code:
    If spanElement.FirstChild IsNot "i" Then
        str = str & spanElement.InnerText & delimiter
    End If

    However, it didn't seem to work.

  2. #2
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,715

    Re: Need help with parsing html

    You need to turn Option Strict on. If you look up the documentation for HtmlElement.FirstChild, you'll see that it returns an HtmlElement and not a String, and with Option Strict on that comparison would have raised an error. So try changing the conditional statement to:
    Code:
    If spanElement.FirstChild.TagName <> "i" Then
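    To show the difference, here is a minimal sketch rather than a drop-in fix. FakeElement is a hypothetical stand-in for HtmlElement so it runs without a WebBrowser, and the case-insensitive comparison is an assumption on my part, since the DOM may report tag names in uppercase:

    ```vbnet
    Option Strict On

    Imports System

    ' Hypothetical stand-in for HtmlElement (not the real WinForms class).
    Public Class FakeElement
        Public Property TagName As String
        Public Property FirstChild As FakeElement
    End Class

    Module TagCompareSketch
        ' Compares the child's tag name (a String) with "i".
        ' Assumes span.FirstChild is not Nothing.
        Function StartsWithIcon(span As FakeElement) As Boolean
            Return String.Equals(span.FirstChild.TagName, "i",
                                 StringComparison.OrdinalIgnoreCase)
        End Function

        Sub Main()
            Dim icon As New FakeElement With {.TagName = "I"}
            Dim span As New FakeElement With {.TagName = "SPAN", .FirstChild = icon}

            ' IsNot compares object references, so testing an element against
            ' the literal "i" says nothing about which tag the element is.
            Console.WriteLine(StartsWithIcon(span))
        End Sub
    End Module
    ```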
    "Code is like humor. When you have to explain it, it is bad." - Cory House
    VbLessons | Code Tags | Sword of Fury - Jameram

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Oct 2013
    Posts
    200

    Re: Need help with parsing html

    First, thank you for your reply. I tried the line you posted with Option Strict both on and off, but I got an error:

    Object reference not set to an instance of an object

  4. #4
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,715

    Re: Need help with parsing html

    Then that means FirstChild is returning Nothing. You need to add an additional condition to the statement:
    Code:
    If spanElement.FirstChild IsNot Nothing AndAlso spanElement.FirstChild.TagName <> "i" Then
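    The reason this avoids the exception is that AndAlso short-circuits. A minimal sketch of that behaviour, using a hypothetical Node class as a stand-in for HtmlElement so it can run on its own:

    ```vbnet
    Option Strict On

    Imports System

    ' Hypothetical stand-in for HtmlElement (not the real WinForms class).
    Public Class Node
        Public Property TagName As String
        Public Property FirstChild As Node
    End Class

    Module ShortCircuitSketch
        ' True when the node has a first child element whose tag is not "i".
        Function HasNonIconChild(n As Node) As Boolean
            ' AndAlso skips the right-hand side when FirstChild is Nothing,
            ' so .TagName is never read on a missing child.
            Return n.FirstChild IsNot Nothing AndAlso n.FirstChild.TagName <> "i"
        End Function

        Sub Main()
            ' A span that holds only text has no child elements.
            Dim textOnly As New Node With {.TagName = "span"}

            ' No exception is raised here, because the second test is skipped.
            Console.WriteLine(HasNonIconChild(textOnly))
        End Sub
    End Module
    ```

    With plain And, both sides are always evaluated, so the same check would still fail on a span without child elements.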

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Oct 2013
    Posts
    200

    Re: Need help with parsing html

    Wow! Thank you, sir.

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Oct 2013
    Posts
    200

    Re: [RESOLVED] Need help with parsing html

    I had to change it to:

    VBnet Code:
    For Each parentElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("id").StartsWith("trItem"))
        ' Start a fresh string for each row
        Dim str As String = ""
        For Each spanElement As HtmlElement In parentElement.GetElementsByTagName("span")
            ' Compare case-insensitively: the DOM may report the tag name as "I"
            If spanElement.FirstChild IsNot Nothing AndAlso
               String.Equals(spanElement.FirstChild.TagName, "i", StringComparison.OrdinalIgnoreCase) Then
                ' Icon-only span: skip it
            Else
                str = str & spanElement.InnerText & delimiter
            End If
        Next

        dict.Add(num, str)
        num = num + 1
    Next

    I guess this will work fine.

  7. #7
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,715

    Re: [RESOLVED] Need help with parsing html

    You're already using LINQ, so I would just expand on that. I'm free-typing this and I can't test it, but try this out:
    Code:
    Dim innerText() As String = WebBrowser1.Document.GetElementsByTagName("tr").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("id").StartsWith("trItem")).
        SelectMany(Function(r) r.GetElementsByTagName("span").Cast(Of HtmlElement)).
        Where(Function(h) h.FirstChild Is Nothing OrElse Not String.Equals(h.FirstChild.TagName, "i", StringComparison.OrdinalIgnoreCase)).Select(Function(h) h.InnerText).ToArray()
    Dim str As String = String.Join(delimiter, innerText)
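    If you would rather keep one string per row, as with your dict, the same filter can be applied to each row's spans separately. This is only a sketch of the idea: SpanInfo and JoinRowText are hypothetical names, and SpanInfo is a stand-in holding a span's text plus the tag of its first child element, so the logic can run without a WebBrowser:

    ```vbnet
    Option Strict On

    Imports System
    Imports System.Collections.Generic
    Imports System.Linq

    ' Hypothetical stand-in for one <span>: its text and the tag name of its
    ' first child element (Nothing when the span holds only text).
    Public Class SpanInfo
        Public Property Text As String
        Public Property FirstChildTag As String
    End Class

    Module PerRowSketch
        ' Joins the text of every span in a row whose first child is not <i>.
        Function JoinRowText(spans As IEnumerable(Of SpanInfo), delimiter As String) As String
            Dim kept As IEnumerable(Of String) = spans.
                Where(Function(s) Not String.Equals(s.FirstChildTag, "i", StringComparison.OrdinalIgnoreCase)).
                Select(Function(s) s.Text)
            Return String.Join(delimiter, kept)
        End Function

        Sub Main()
            ' One row shaped like the HTML in the first post.
            Dim row As New List(Of SpanInfo) From {
                New SpanInfo With {.Text = "", .FirstChildTag = "I"},
                New SpanInfo With {.Text = "FLAXES FLX-343W WIRELESS Q TR USB KLAVYE/MOUSE SET"},
                New SpanInfo With {.Text = "MAVİ TUŞ"}
            }
            Console.WriteLine(JoinRowText(row, "|"))
        End Sub
    End Module
    ```

    Using String.Join also avoids the trailing delimiter that the concatenation loop leaves at the end of each string.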
