
Thread: VB.NET scrape HTML table question

  1. #1

    Thread Starter | New Member | Join Date: Dec 2016 | Posts: 5

    VB.NET scrape HTML table question

    Say I have a table that always contains random data (various product titles, prices, and ratings in no particular order). I noticed that sometimes the "Price:" or "Rating:" column is empty for a row. So when I scrape each column into its own array and send the arrays into a ListView, the rows no longer line up if a value is missing in, say, the "Price:" column.
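    To make the mechanism above concrete, here is a minimal VB.NET sketch. The names `ColumnMisalignmentDemo` and `PairByIndex` are hypothetical, and "Row 4" to "Row 6" stand in for the titles of rows 4 to 6 of the table. When titles and prices are collected as two independent lists and paired by index, one blank price shifts every later price up by one row; where the shorter price list runs out, this sketch prints "(none)".

```vbnet
Imports System
Imports System.Collections.Generic

Module ColumnMisalignmentDemo
    ' Pairs two independently scraped column lists by index, which is what
    ' the question's split1(i) / split2(i) loop effectively does.
    Function PairByIndex(titles As List(Of String), prices As List(Of String)) As List(Of String)
        Dim result As New List(Of String)
        For i As Integer = 0 To titles.Count - 1
            ' Once one price is missing, every later price lands on the wrong title;
            ' "(none)" marks where the shorter price list runs out.
            Dim price As String = If(i < prices.Count, prices(i), "(none)")
            result.Add(titles(i) & " -> " & price)
        Next
        Return result
    End Function

    Sub Main()
        ' Rows 4-6 of the table: row 5 has a blank price, so a regex that only
        ' matches non-empty price cells finds two prices for three titles.
        Dim titles As New List(Of String) From {"Row 4", "Row 5", "Row 6"}
        Dim prices As New List(Of String) From {"$16.00", "$13.95"}
        For Each line As String In PairByIndex(titles, prices)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

    Running it pairs "Row 5" with $13.95 even though that price belongs to row 6, which is the same out-of-sync effect seen in the ListView.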

    Here is an example of an HTML table that I'm trying to scrape data from. Notice that row #5 is missing its price; this is what throws the data out of sync while it's being added to the ListView in VB.NET:


    HTML Code:
    <html>
    <head>
    <style>
    
    table {
    margin:auto;
    margin-top:50px;
    font-family: arial, sans-serif;
    border-collapse: collapse;
    width: 40%;
    }
    
    td{
    border: 3px solid #000;
    text-align: left;
    padding: 3px;
    }
    
    th {
    border: 3px solid #000;
    background-color:gold;
    text-align: left;
    padding: 3px;
    }
    
    tr:nth-child(even) {
    background-color: #dddddd;
    }
    </style>
    </head>
    <body>
        <table>
            <tr><th>#</th><th>Product Title:</th><th width="60">Price:</th><th width="60">Rating:</th></tr>
            <tr><td width="20">1</td><td>Minera Natural Dead Sea Salt, 5lbs Bulk Bag - Fine Grain</td><td>$20.00</td><td>9/10</td></tr>
            <tr><td width="20">2</td><td>Minera Dead Sea Salt 2lb Bag Fine Grain, 100% Pure Mineral Salt Treatment</td><td>$9.99</td><td>6/10</td></tr>
            <tr><td width="20">3</td><td>Minera Pure Dead Sea Salt 10lbs Fine Grain</td><td>$15.95</td><td>8/10</td></tr>
            <tr><td width="20">4</td><td>Dead Sea Warehouse - Amazing Minerals Dead Sea Bath Salts, Temporary Relief from...</td><td>$16.00</td><td>5/10</td></tr>
            <tr><td width="20">5</td><td>Natural Planet Dead Sea Salt, 5lbs Fine Grain - 100% Pure Bath Salt - For Psoriasis...</td><td></td><td>5/10</td></tr>
            <tr><td width="20">6</td><td>Art Naturals Himalayan Salt Body Scrub 20oz -Deep Cleansing Exfoliator With Shea...</td><td>$13.95</td><td>7/10</td></tr>
            <tr><td width="20">7</td><td>Dead Sea Salt 2.2lb try for Psoriasis, Eczema, and Dermatitis (1 x Resealable...</td><td>$9.99</td><td>4/10</td></tr>
            <tr><td width="20">8</td><td>Premier Dead Sea Aromatherapy Mineral Body Treatment, Silver, Salt Scrub, 425...</td><td>$15.95</td><td>8/10</td></tr>
            <tr><td width="20">9</td><td>Dead Sea Warehouse - Amazing Minerals Dead Sea Bath Salts, Temporary Relief from...</td><td>$16.00</td><td>6/10</td></tr>
            <tr><td width="20">10</td><td>Natural Planet Dead Sea Salt, 50lbs Fine Grain - 100% Pure Bath Salt - For Psoriasis...</td><td>$90.25</td><td>10/10</td></tr>
    
        </table>
    </body>
    </html>
    Now here is an example of what I'm using in VB.NET to collect data from this table:

    Code:
    Imports System.Text.RegularExpressions
    Public Class Form1
        Dim ITEM As New ListViewItem
        Dim ProductTitle As String
        Dim ProductPrice As String
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            ListView1.Items.Clear()
            ProductTitle = ""
            ProductPrice = ""
    
            Dim keyword As String = TextBox1.Text
            keyword = keyword.Replace(" ", "+")
            Try
                'This is the HTML table I'm talking about:
                Dim html As String = "THE HTML TABLE SPECIFIED"
      
                'Product Title:
                '<h5 data-attribute="".+?""
                Dim regx1 As New Regex("<h5 data-attribute="".+?""", RegexOptions.IgnoreCase)
                Dim matches1 As MatchCollection = regx1.Matches(html)
                For Each match1 As Match In matches1
                    ProductTitle += match1.Value & "^"
                    ProductTitle = ProductTitle.Replace("<h5 data-attribute=""", "").Replace("""", "")
                Next
    
                'Price:
                Dim regx As New Regex("<span class=""a-size-small a-color-price a-text-bold"">.+?</span>", RegexOptions.IgnoreCase)
                Dim matches As MatchCollection = regx.Matches(html)
                For Each match As Match In matches
                    ProductPrice += match.Value & "^"
                    ProductPrice = ProductPrice.Replace("<span class=""a-size-small a-color-price a-text-bold"">", "").Replace("</span>", "")
                Next
    
                'Create the split & add all items to listview:
                Dim split1() As String = ProductTitle.Split("^"c)
                Dim split2() As String = ProductPrice.Split("^"c)
                For i = 0 To split1.Length - 2
                    ITEM = ListView1.Items.Add(split1(i))
                    ITEM.SubItems.Add(split2(i))
                Next
    
    
                Label1.Text = "Product Title: " & ListView1.Items.Count
                Label2.Text = "Price: " & ListView1.Items.Count
    
            Catch ex As Exception
                ' Surface errors instead of silently swallowing them.
                MessageBox.Show(ex.Message)
            End Try
        End Sub
    End Class
    Again, the problem is that I can't know in advance which table will have missing values (for example in the "Price:" column), and that causes the rows in the ListView to fall out of sync. How could I fix this with the code I've written above? Thanks.

  2. #2

    .paul. | eXtreme Programmer | Join Date: May 2007 | Location: Chelmsford UK | Posts: 26,413

    Re: VB.NET scrape HTML table question

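    You can grab each <td>something</td> cell, including the empty ones:

    New Regex("<td[^>]*>(.*?)</td>", RegexOptions.IgnoreCase)

    Match.Groups(1).Value is the 'something' (an empty string for a blank cell, such as the missing price in row 5).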
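    Building on that, here is a minimal sketch that parses the table row by row instead of column by column, so a blank cell keeps its position within its own row and cannot shift values into the next one. Assumptions, none of which come from the thread: the `<tr[^>]*>(.*?)</tr>` row pattern, the `<td[^>]*>(.*?)</td>` cell pattern (`.*?` rather than `.+` so that an empty `<td></td>` still matches), the names `RowByRowScraper` and `ParseRows`, and showing "N/A" for a missing price. Regex is workable for a small, well-formed table like this one; it is not a general HTML parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module RowByRowScraper
    ' One <tr>...</tr> row; Singleline lets "." match line breaks inside a row.
    Private ReadOnly RowRegex As New Regex("<tr[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    ' One <td>...</td> cell; ".*?" (not ".+") so empty cells such as <td></td> still match.
    Private ReadOnly CellRegex As New Regex("<td[^>]*>(.*?)</td>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Returns one String() per data row. Header rows contain only <th> cells, so they
    ' produce no <td> matches and are skipped.
    Function ParseRows(html As String) As List(Of String())
        Dim rows As New List(Of String())
        For Each rowMatch As Match In RowRegex.Matches(html)
            Dim cells As New List(Of String)
            For Each cellMatch As Match In CellRegex.Matches(rowMatch.Groups(1).Value)
                ' Decode entities such as &amp; and trim surrounding whitespace.
                cells.Add(WebUtility.HtmlDecode(cellMatch.Groups(1).Value).Trim())
            Next
            If cells.Count > 0 Then rows.Add(cells.ToArray())
        Next
        Return rows
    End Function

    Sub Main()
        ' Rows 4 and 5 of the question's table, with the titles shortened.
        Dim html As String =
            "<table><tr><th>#</th><th>Product Title:</th><th>Price:</th><th>Rating:</th></tr>" &
            "<tr><td width=""20"">4</td><td>Dead Sea Warehouse</td><td>$16.00</td><td>5/10</td></tr>" &
            "<tr><td width=""20"">5</td><td>Natural Planet Dead Sea Salt</td><td></td><td>5/10</td></tr>" &
            "</table>"

        For Each cells As String() In ParseRows(html)
            ' The blank price is still cells(2), so the rating stays in cells(3).
            Dim price As String = If(String.IsNullOrEmpty(cells(2)), "N/A", cells(2))
            Console.WriteLine(cells(0) & " | " & cells(1) & " | " & price & " | " & cells(3))
        Next
    End Sub
End Module
```

    In the form, the `Console.WriteLine` line would become something like `Dim item = ListView1.Items.Add(cells(1))` followed by `item.SubItems.Add(price)` and `item.SubItems.Add(cells(3))`, so every ListView row is built from a single table row.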
