Thread: help scraping data from website

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Jan 2005
    Posts
    20

    help scraping data from website

    Hi,

    I need help scraping certain data from websites. The code I currently have works fine for some sites:
    Code:
    Dim bookmarkNodes As IEnumerable(Of HtmlNode) = htmlDocument.DocumentNode.SelectNodes("//a[@rel='bookmark' and @title and not(.//time[@class='entry-date' and not(@datetime='')])]")
    but some sites use this markup instead:
    Code:
    <h3 class="entry-title"><a href="https://somewebisteref" rel="bookmark">data that i want to scrap</a></h3> <div class="entry-meta">
    or
    Code:
    <a href="https://somewebsite" title="data i need">
    The title part of my code, again, works for some sites but not all.

    I would like to keep using the existing code that works and integrate what I need into it.

    The scrape goes into a text box that formats the data to what i need.

    Many thanks
    Last edited by vampiro; Apr 15th, 2024 at 04:14 PM.

  2. #2
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,809

    Re: help scraping data from website

    You need to look at the overall DOM and determine which pattern is specific enough to match only the items you want, but not so specific that it excludes some desired items.

    Right now, your query is too specific.

    Is it safe to say you want all anchor elements (<a />) that are direct children of a DOM element with the class "entry-title"? Could you be more specific and say that you want all anchor elements that are direct children of heading 3 (<h3 />) elements with the class "entry-title"?

    Once you get that down, it's just a matter of building the query selector. Unfortunately the business logic is not something we decide for you. Once you know what you want to grab, in plain English, give that to us and we can help you on the code side.
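
    As a rough illustration of what such a query could look like once the rule is pinned down, here is a minimal sketch. It assumes HtmlAgilityPack (which the HtmlNode/SelectNodes code in post #1 suggests) and assumes the rule is "the existing bookmark query, plus any anchor that is a direct child of an <h3> with the class entry-title". The module name, the ExtractTitles function, and the sample HTML are made up for the example; only the XPath in the first branch comes from post #1.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports HtmlAgilityPack

    Module ScrapeSketch
        ' Hypothetical helper: returns one string per matching anchor, using the
        ' title attribute when it is present and non-empty, else the anchor text.
        Function ExtractTitles(html As String) As List(Of String)
            Dim doc As New HtmlDocument()
            doc.LoadHtml(html)

            ' Branch 1: the query from post #1, unchanged.
            ' Branch 2 (assumed rule): anchors directly under <h3 class="entry-title">.
            ' The concat/normalize-space test matches the class even when the
            ' element carries several classes.
            Dim xpath As String =
                "//a[@rel='bookmark' and @title and not(.//time[@class='entry-date' and not(@datetime='')])]" &
                " | //h3[contains(concat(' ', normalize-space(@class), ' '), ' entry-title ')]/a"

            Dim results As New List(Of String)()
            Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)

            ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
            If nodes Is Nothing Then Return results

            For Each node As HtmlNode In nodes
                Dim text As String = node.GetAttributeValue("title", "")
                If String.IsNullOrWhiteSpace(text) Then text = node.InnerText
                text = HtmlEntity.DeEntitize(text).Trim()
                ' Skip empty strings and exact duplicates.
                If text.Length > 0 AndAlso Not results.Contains(text) Then results.Add(text)
            Next

            Return results
        End Function

        Sub Main()
            ' Made-up sample covering both branches of the query.
            Dim sample As String =
                "<h3 class=""entry-title""><a href=""https://example.com/a"" rel=""bookmark"">First post</a></h3>" &
                "<a href=""https://example.com/b"" rel=""bookmark"" title=""Second post"">link</a>"

            For Each title As String In ExtractTitles(sample)
                Console.WriteLine(title)
            Next
        End Sub
    End Module
    ```

    The union operator (|) is what lets the existing query stay as it is while a second pattern is added next to it. A bare anchor with only a title attribute, like the last snippet in post #1, would need its own branch such as //a[@title], but that would probably match far more links than intended, so it would have to be narrowed with whatever parent element surrounds it on the real page.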
    "Code is like humor. When you have to explain it, it is bad." - Cory House

  3. #3
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    39,163

    Re: help scraping data from website

    What I always say is that if you can avoid scraping a website, then do so. If the site owners didn't want you to have the data, they wouldn't put it on a website. If they do want you to have the data, then they might already have an API, or might be willing to write one, for the site. An API is going to be VASTLY easier to work with than scraping a website.

    The problem is that there is something in the nature of web developers that they have to change the HTML every few months. First it's a <p>, then it's a button, then it's some goofy <div> with CSS to make it look like a button. Active websites change and change often. APIs are a way to get the data out such that computers can talk between themselves to transfer the raw data. Those tend not to change, whereas the website is just a visual representation that can change on a whim.
