Thread: HtmlElement In a_tags is not staying within if condition

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    HtmlElement In a_tags is not staying within if condition

    I'm trying to scrape a page with the code below, but it isn't doing what I intend. I want the inner text of each "a" tag whose itemprop attribute is "name". The text I get back is correct, but every entry is scraped twice.

    The inner loop (For Each a_tag As HtmlElement In a_tags) is not staying within the <tr> that matched the itemprop="offers" condition.

    ******************
    <tr itemtype="http://schema.org/Offer" itemscope="" itemprop="offers">
    <a itemprop="name" title="2 Sets of Cross Country Asics Spikes with Handles!" class="vip" href="http://www.ebay.com/itm/2-Sets-of-Cross-Country-Asics-Spikes-with-Handles-/140838036468?pt=LH_DefaultDomain_0&amp;hash=item20ca99e3f4">2 Sets of Cross Country Asics Spikes with Handles!</a>
    </tr>
    <tr itemtype="http://schema.org/Offer" itemscope="" itemprop="offers">
    <a itemprop="name" title="2 Sets of Cross Country Asics Spikes with Handles!" class="vip" href="http://www.ebay.com/itm/2-Sets-of-Cross-Country-Asics-Spikes-with-Handles-/140838036468?pt=LH_DefaultDomain_0&amp;hash=item20ca99e3f4">#2 - 2 Sets of Cross Country Asics Spikes with Handles!</a>
    </tr>

    *****************************
    Dim trs As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
    For Each tr As HtmlElement In trs
        If tr.GetAttribute("itemprop") = "offers" Then
            Dim a_tags As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
            For Each a_tag As HtmlElement In a_tags
                If a_tag.GetAttribute("itemprop") = "name" Then
                    textbox1.text = textbox1.text & vbcrlf & a_tag.InnerText
                End If
            Next
        End If
    Next

  2. #2
    PowerPoster dunfiddlin's Avatar
    Join Date
    Jun 2012
    Posts
    8,245

    Re: HtmlElement In a_tags is not staying within if condition

    Well, yeah. It would. At every <tr> you load every <a> tag on the page into a second collection and loop over the whole lot, not just the ones inside that row. So if you've got 2 <tr> tags, you get 2 copies of every matching <a> tag's text. Why are you tracking <tr> at all if it's the <a> tags you actually want?

    Dim a_tags As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each a_tag As HtmlElement In a_tags
        If a_tag.GetAttribute("itemprop") = "name" Then
            textbox1.AppendText(a_tag.InnerText & vbCrLf)
        End If
    Next

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: HtmlElement In a_tags is not staying within if condition

    The data I'm trying to scrape is in a table with multiple columns, and each row is a "tr" tag. If I scrape only the "a" tags, I can't keep the data from each row together.

  4. #4
    PowerPoster dunfiddlin's Avatar
    Join Date
    Jun 2012
    Posts
    8,245

    Re: HtmlElement In a_tags is not staying within if condition

    Well that's not quite true. GetElementsByTagName returns elements in document order, so the <a> tags always come back in the same sequence and the items from each row are always adjacent in the list. If you want to separate them into sections, you can divide a_tags.Count by trs.Count to get the number of items per row (as long as every row has the same number of <a> tags and none sit outside the table).
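
    As a rough sketch of that division (not tested against the page in this thread): it assumes you've already collected the <a> texts you care about into a flat list in document order, and that every row contributes the same number of them. ChunkByRow and the module name are hypothetical, made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module RowChunkSketch
    ' Splits a flat, document-ordered list into rowCount consecutive groups.
    ' Throws if the list does not divide evenly, because a miscount would
    ' shift every later row.
    Public Function ChunkByRow(items As IList(Of String), rowCount As Integer) As List(Of List(Of String))
        If rowCount <= 0 OrElse items.Count Mod rowCount <> 0 Then
            Throw New ArgumentException("Item count is not a whole multiple of the row count.")
        End If

        Dim perRow As Integer = items.Count \ rowCount   ' items per row
        Dim rows As New List(Of List(Of String))()

        For r As Integer = 0 To rowCount - 1
            Dim row As New List(Of String)()
            For c As Integer = 0 To perRow - 1
                row.Add(items(r * perRow + c))
            Next
            rows.Add(row)
        Next

        Return rows
    End Function
End Module
```

    With four texts and two rows, ChunkByRow hands back two lists of two. The check at the top is there because an uneven count means the grouping can't be trusted.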

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: HtmlElement In a_tags is not staying within if condition

    I understand what you're suggesting. Is that how most people scrape data from a table? It seems like if you're off by one anywhere, all the data after that point is wrong.

    I was trying to grab the first "tr" tag that met my condition, then search all "a" tags within that tag that met my other conditions. I was hoping to scrape multiple columns of data on that row and then move to the next "tr" tag and continue in that fashion.

  6. #6
    PowerPoster dunfiddlin's Avatar
    Join Date
    Jun 2012
    Posts
    8,245

    Re: HtmlElement In a_tags is not staying within if condition

    Well you might be able to do it that way if, say, you created a new HTML document from the inner html of each table row and then scanned that for the <a> tags. So you'd have something like (structure not code)

    For Each <tr> in Original HTML
        New HTML = inner <tr>
        For each <a> in New HTML
            Append Text inner <a>
        Next
        Append Text separator
    Next

  7. #7

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: HtmlElement In a_tags is not staying within if condition

    That's a great solution. I'm having trouble implementing the code. I'm not sure how to temporarily store the HtmlElement and set a new HtmlElementCollection.

    ****************
    Dim current_tag_tr As String

    Dim trs As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
    For Each tr As HtmlElement In trs
        If tr.GetAttribute("itemprop") = "offers" Then

            'syntax not correct
            current_tag_tr = tr.OuterHtml
            Dim a_tags As HtmlElementCollection = current_tag_tr.GetElementsByTagName("a")

            For Each a_tag As HtmlElement In a_tags
                If a_tag.GetAttribute("itemprop") = "name" Then
                    TextBox11.Text = TextBox11.Text & vbCrLf & a_tag.InnerText
                End If
            Next
        End If
    Next

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: HtmlElement In a_tags is not staying within if condition

    I think the syntax is correct in this code, but I'm still not getting the right result. I have two test points in the code. The output of the first test point is shown below; the second is blank.

    #1
    *******
    <TR itemprop="offers" itemtype="http://schema.org/Offer" itemscope><TD class="pic lt"><!-- Moved to ResultSet.tag --><!-- Moved to ResultSet.tag --><A class=img href="http://www.ebay.com/itm/Mens-Asics-Gel-Noosa-TRI-6-Racings-Shoes-Neon-yellow-White-Turquoise-/280950896347?pt=US_Men_s_Shoes&amp;hash=item4169fa76db" itemprop="url"><IMG class=img alt="Men's Asics-Gel Noosa TRI 6 Racings Shoes Neon yellow/White/Turquoise" src="http://thumbs4.ebaystatic.com/d/l225/m/miz8OhHhVLOkFCVWXGaDycg.jpg" itemprop="image"> </A></TD>
    <TD class=dtl>
    <DIV class=ittl><A class=vip title="Men's Asics-Gel Noosa TRI 6 Racings Shoes Neon yellow/White/Turquoise" href="http://www.ebay.com/itm/Mens-Asics-Gel-Noosa-TRI-6-Racings-Shoes-Neon-yellow-White-Turquoise-/280950896347?pt=US_Men_s_Shoes&amp;hash=item4169fa76db" itemprop="name">Men's Asics-Gel Noosa TRI 6 Racings Shoes Neon yellow/White/Turquoise</A> </DIV><!-- Moved to ResultSet.tag -->
    <DIV class="dyn dynS">
    <DIV class="s2 distLoc"></DIV>
    <DIV class=s2>Returns: Not accepted</DIV>
    <DIV style="CLEAR: left"></DIV></DIV>
    <DIV></DIV>
    <DIV class=anchors>
    <DIV class=group>
    <DIV class=mi-l><!-- Moved to ResultSet.tag -->
    <DIV class=mi><A class="lnk iconQuickLook_14x14 mi-a" url="http://www.ebay.com/sch/moreinfo/?_id=280950896347&amp;_ptns=US_Men_s_Shoes&amp;_pppn=r1" t="QL">Quick Look</A> </DIV></DIV></DIV></DIV></TD>
    <TD class=trs></TD>
    <TD class="bids bin1"><!-- Moved to ResultSet.tag -->
    <DIV>9 bids</DIV><!-- Moved to ResultSet.tag --></TD>
    <TD class=prc><!-- Moved to ResultSet.tag -->
    <DIV class=g-b itemprop="price">$27.00</DIV><!-- Moved to ResultSet.tag --></TD>
    <TD class="tme "><B class=hidlb>Time left:</B> <SPAN class=tme><B class=hidlb>Time left:</B> <SPAN>3d&nbsp;6h&nbsp;55m</SPAN> </SPAN></TD></TR>

    *************************
    code
    -----
    Dim trs As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
    For Each tr As HtmlElement In trs
        If tr.GetAttribute("itemprop") = "offers" Then

            Dim wb As New WebBrowser
            'wb.DocumentText = "your html string"
            wb.DocumentText = tr.OuterHtml

            TextBox3.Text = TextBox3.Text & vbCrLf & "ZZ" & tr.OuterHtml

            Dim doc As HtmlDocument = wb.Document
            Dim a_tags As HtmlElementCollection = doc.GetElementsByTagName("a")

            For Each a_tag As HtmlElement In a_tags
                TextBox3.Text = TextBox3.Text & vbCrLf & "XX" & a_tag.OuterHtml

                If a_tag.GetAttribute("itemprop") = "name" Then
                    TextBox4.Text = TextBox4.Text & vbCrLf & a_tag.GetAttribute("href").ToString
                    TextBox11.Text = TextBox11.Text & vbCrLf & a_tag.InnerText
                End If
            Next
        End If
    Next

  10. #10
    PowerPoster dunfiddlin's Avatar
    Join Date
    Jun 2012
    Posts
    8,245

    Re: HtmlElement In a_tags is not staying within if condition

    vb.net Code:
    Dim trs As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
    For Each tr As HtmlElement In trs
        If tr.GetAttribute("itemprop") = "offers" Then

            Dim wb As WebBrowser = New WebBrowser    'allows the use of HTMLDocument
            wb.DocumentText = ""                     'initialises document
            wb.Document.Write(tr.OuterHtml)          'creates HTML document from the <tr>

            Dim a_tags As HtmlElementCollection = wb.Document.GetElementsByTagName("a")

            For Each a_tag As HtmlElement In a_tags
                If a_tag.GetAttribute("itemprop") = "name" Then
                    TextBox11.AppendText(a_tag.InnerText & vbCrLf) 'that's the way to do it!
                End If
            Next
        End If
    Next
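
    For comparison, here's a rough sketch of the same row-by-row scan without the second WebBrowser. It leans on HtmlElement.GetElementsByTagName, which searches only inside the element it's called on. The module name, the AppendOfferNames helper and its parameters are made up for illustration, and it hasn't been run against the eBay page in this thread.

```vbnet
Imports Microsoft.VisualBasic
Imports System.Windows.Forms

Module ScopedScanSketch
    ' Appends the text of each <a itemprop="name"> found inside each
    ' <tr itemprop="offers"> row of the given document.
    Public Sub AppendOfferNames(doc As HtmlDocument, output As TextBox)
        For Each tr As HtmlElement In doc.GetElementsByTagName("tr")
            If tr.GetAttribute("itemprop") = "offers" Then
                ' Called on the row element, so only that row's descendants are searched.
                For Each a_tag As HtmlElement In tr.GetElementsByTagName("a")
                    If a_tag.GetAttribute("itemprop") = "name" Then
                        output.AppendText(a_tag.InnerText & vbCrLf)
                    End If
                Next
            End If
        Next
    End Sub
End Module
```

    You'd call it once the page has finished loading, e.g. AppendOfferNames(WebBrowser1.Document, TextBox11). Other columns from the same row could be picked up inside the same If block, which keeps each row's data together.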

  11. #11

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: HtmlElement In a_tags is not staying within if condition

    That is it. Thanks. Appreciate all your help.
