
Thread: Reading text from a website

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    159

    Reading text from a website

    Hi

    I need to read the text of a web page from vb.net

    In the past I've done this by navigating to the url via a webbrowser control, waiting for the DocumentCompleted event, and then looking at WebBrowser1.Document.Body.InnerText

    The problem I have is that I've been asked to look at a page which uses JavaScript to create some variable text. I don't know JavaScript at all, but these seem to be the relevant lines:

    Code:
    Hello! You recently spoke to <b>*insert name of Canvasser*</b> about
    ....
    var node = document.getElementById('dvwith');                 
    var a= node.innerHTML.replace("*insert name of Canvasser*", _canvass.display_text);
    document.getElementById('dvwith').innerHTML =a;
    jQuery("#dvwith").css("display","block");
    jQuery("#dvwithout").css("display","none");

    I need to read the text which replaces '*insert name of Canvasser*'. It's to test the page has rendered properly, so I need to capture the final text, after the replacement has been done.

    Unfortunately I can't get the page to show properly in the webbrowser control - I can see the rest of the page, but the variable text doesn't display at all, so I assume the script is not running properly (I get errors when I set ScriptErrorsSuppressed = False on the browser control).

    Can anyone suggest a way forward with this? I just need a way for the browser to show the page in its final state so I can then grab the text and search it for what I want.

    Thanks

  2. #2
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,712

    Re: Reading text from a website

    First, let me break down the JavaScript for you:

    Line 1:
    Code:
    var node = document.getElementById('dvwith');
    This code declares a variable named node and assigns it the Element object in the document whose id is dvwith. You can find the documentation on the getElementById method here: https://developer.mozilla.org/en-US/...getElementById

    Line 2:
    Code:
    var a= node.innerHTML.replace("*insert name of Canvasser*", _canvass.display_text);
    This code does several things in one line:
    1. Declares a variable named a
    2. Assigns it the result of:
      1. Reading the innerHTML of the variable node. innerHTML gets the HTML markup contained within the element. You can find the documentation on the innerHTML property here: https://developer.mozilla.org/en-US/...ment/innerHTML
      2. Replacing the first occurrence of "*insert name of Canvasser*" in that markup with the value of _canvass.display_text. We will need to see how _canvass is defined and how its display_text is assigned. You can find the documentation on the replace method here: https://developer.mozilla.org/en-US/...String/replace
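To see what line 2 does without a browser, here is a minimal sketch that runs in plain Node. The markup string is copied from post #1 and stands in for node.innerHTML; the displayText value is only an assumption standing in for _canvass.display_text, since we have not yet seen how the page sets it.

```javascript
// Stand-in for node.innerHTML (markup copied from post #1).
const markup = "Hello! You recently spoke to <b>*insert name of Canvasser*</b> about";

// Assumed value standing in for _canvass.display_text.
const displayText = "our Birmingham branch";

// Same call as line 2 of the page script.
const a = markup.replace("*insert name of Canvasser*", displayText);
console.log(a);

// With a string pattern, replace() only changes the first match.
const twice = "*x* and *x*".replace("*x*", "y");
console.log(twice);
```

The second call shows why the placeholder should appear only once in the markup: a second copy would be left untouched.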


    Line 3:
    Code:
    document.getElementById('dvwith').innerHTML =a;
    This line sets dvwith's innerHTML equal to the result of line 2.

    Line 4:
    Code:
    jQuery("#dvwith").css("display","block");
    This is a bit odd because they're mixing vanilla JavaScript with jQuery for no real good reason, but the first part:
    Code:
    jQuery("#dvwith")
    Selects the same element as line 1, although it returns a jQuery object wrapping the element rather than the element itself. But the second part:
    Code:
    .css("display", "block");
    Sets the element's CSS display property to block, which makes the element visible.

    Line 5:
    Code:
    jQuery("#dvwithout").css("display","none");
    This is hiding the element with the id dvwithout.

    If I were optimizing this code, I'd do it a bit differently:
    Code:
    var dvwith = document.getElementById('dvwith');
    var a = dvwith.innerHTML.replace("*insert name of Canvasser*", _canvass.display_text);
    dvwith.innerHTML = a;
    dvwith.style.display = 'block';
    
    var dvwithout = document.getElementById('dvwithout');
    dvwithout.style.display = 'none';
    Now that you understand the code a little bit more, could you show where/how _canvass.display_text is being set? This will be the core point of how you solve your problem.
    "Code is like humor. When you have to explain it, it is bad." - Cory House
    VbLessons | Code Tags | Sword of Fury - Jameram

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    159

    Re: Reading text from a website

    Hi

    This is a very useful explanation - it's great to be able to learn a bit more as I go along and thanks for laying it out so clearly.

    As for _canvass.display_text, I pass some text which becomes this in the url (this is from a QR code):

    Code:
    https://www.safestyle-windows.co.uk/get-a-quote/canvass/?utm_source=canvass&utm_medium=precision&utm_campaign=birminghamedg
    (sorry to display as code but the link was abbreviated when I posted as a url)


    The web page must look up the display text from the utm_campaign=birminghamedg part of the url, so, as you've shown, this:

    Code:
    Hello! You recently spoke to <b>*insert name of Canvasser*</b> about.....
    becomes:

    Code:
    Hello! You recently spoke to our Birmingham branch about....
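Once the final page text has been captured, the text that replaced the placeholder can be pulled out by searching between the words on either side of it. This is only a rough sketch: extractReplacement is a hypothetical helper, and the prefix and suffix strings are assumptions based on the template shown above.

```javascript
// Hypothetical helper: return the text found between prefix and suffix,
// or null if either marker is missing from the rendered text.
function extractReplacement(renderedText, prefix, suffix) {
  const start = renderedText.indexOf(prefix);
  if (start === -1) return null;
  const from = start + prefix.length;
  const end = renderedText.indexOf(suffix, from);
  if (end === -1) return null;
  return renderedText.slice(from, end).trim();
}

// Rendered text as quoted in this post.
const rendered = "Hello! You recently spoke to our Birmingham branch about....";
const replaced = extractReplacement(rendered, "You recently spoke to", " about");
console.log(replaced);

// If the placeholder is still present, the page script did not run.
const scriptRan = !rendered.includes("*insert name of Canvasser*");
console.log(scriptRan);
```

The same two indexOf searches translate directly to String.IndexOf and Substring in VB.NET.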

  4. #4
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,712

    Re: Reading text from a website

    In the URL you're passing a query string, more information here: https://en.wikipedia.org/wiki/Query_string

    In JavaScript, you can get the query string parameters using URLSearchParams. What you should look for in the page's script is something along these lines:
    Code:
    var urlParams = new URLSearchParams(window.location.search);
    var canvasser = urlParams.get('utm_campaign');
    However, if you have the URL to begin with, then you could get the query string parameter yourself using VB.NET:
    Code:
    Dim urlParams = Web.HttpUtility.ParseQueryString(url.Substring(url.IndexOf("?") + 1))
    Dim canvasser = urlParams("utm_campaign")
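If you want to check the parsing outside a browser, the following sketch runs in Node, where window.location is not available, so it builds the parameters from the full URL instead. The URL is the one from post #3; the variable names are just for illustration.

```javascript
// URL from post #3.
const pageUrl = "https://www.safestyle-windows.co.uk/get-a-quote/canvass/?utm_source=canvass&utm_medium=precision&utm_campaign=birminghamedg";

// new URL(...).searchParams gives the same URLSearchParams interface
// that the browser code gets from window.location.search.
const query = new URL(pageUrl).searchParams;

const campaign = query.get("utm_campaign");
console.log(campaign);

// get() returns null for a parameter that is not in the query string.
const missing = query.get("not_present");
console.log(missing);
```

Note that this only gives you the raw campaign code, not the display text the page derives from it.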

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    159

    Re: Reading text from a website

    Thanks - I'll give that a try.
