Thread: Webpage Text Into Array

  1. #1

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Webpage Text Into Array

    There has to be a better way of doing this...

    I am attempting to get the text that we see when we visit a website and dump it into an array.

    Right now the code is using a webBrowser control, a text box and a string array ar1()

    Here is what I have working:
    Code:
        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
            TextBox1.Text = WebBrowser1.Document.Body.InnerText
            ar1 = TextBox1.Lines
        End Sub
    When the program opens it loads a website into the webBrowser1.
    When the button is clicked the webpage text populates the text box.
    The array ar1() is populated with the webpage text.

    I hope to have the code do all this in the background. Can I get a webpage text into an array without using the textbox and webBrowser controls?

    Thank you in advance

  2. #2
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,390

    Re: Webpage Text Into Array

    InnerText is a string, TextBox1.Lines is a string array. Use TextBox1.Text

  3. #3

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    Quote Originally Posted by ident View Post
    InnerText is a string, TextBox1.Lines is a string array. Use TextBox1.Text
    I am hoping to avoid both the textBox1 and webBrowser1 controls.
    For how slick VB.Net is, there has to be a way to store the webpage text we see on a website in an array without the use of controls.

    I found by setting both controls to Visible = False it runs much faster as I do not need to see the text. The code will manipulate it and output to a file what I am looking for.

    Hope this explains it better than my first attempt :-)

  4. #4
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,390

    Re: Webpage Text Into Array

    Why do you need to store it in an array? Use the WebClient class to download the page's source and assign it to a variable.

  5. #5
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    9,677

    Re: Webpage Text Into Array

    You will need to use the WebClient class. Here is one such example:
    Code:
    Dim sourceString As String = New System.Net.WebClient().DownloadString("http://www.some-web-page.com")

  6. #6

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    Quote Originally Posted by dday9 View Post
    You will need to use the WebClient class. Here is one such example:
    Code:
    Dim sourceString As String = New System.Net.WebClient().DownloadString("http://www.some-web-page.com")
    dDay, Your example pulls down the text and HTML.

    I tried messing around with the WebClient() class and came up empty.

    The example I posted at the top of this thread works as the text window breaks the text into individual lines that can be put in an array.

    I need the text in an array because, once the array holds the individual lines from the webpage, the code makes changes to it, such as removing comments that are remmed out.

    For example, if the website displays this:

    Bob Jr. and Mary
    Sam and Sue
    Joe-Jan

    I need to store just the text and no HTML, JavaScript, CSS, etc. Just the text we view and read on websites.

    It needs to go into an array like this:

    Bob Jr. and Mary
    Sam and Sue
    Joe-Jan

    and not this:

    Bob Jr. and Mary Sam and Sue Joe-Jan

    There has to be a way in VB.Net to do this without using the webBrowser control and textbox control.

  7. #7
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,390

    Re: Webpage Text Into Array

    Why don't you actually post the URL? Text can be split using String.Split.
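
    A minimal sketch of the String.Split suggestion, assuming the page text is already in a single string. The module name, the `ToLines` helper and the sample text are made up for illustration; only `String.Split` and `StringSplitOptions` come from the framework.

    ```vbnet
    Imports System
    Imports Microsoft.VisualBasic

    Module SplitLinesSketch
        ' Hypothetical helper: returns one array element per non-empty line.
        Function ToLines(text As String) As String()
            ' Cover Windows (CrLf), Unix (Lf) and old Mac (Cr) line endings.
            Dim separators As String() = {vbCrLf, vbLf, vbCr}
            Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        End Function

        Sub Main()
            ' Sample text standing in for the text of a real page.
            Dim pageText As String = "Bob Jr. and Mary" & vbCrLf & "Sam and Sue" & vbCrLf & "Joe-Jan"
            Dim ar1 As String() = ToLines(pageText)

            For Each entry As String In ar1
                Console.WriteLine(entry)
            Next
        End Sub
    End Module
    ```

    This removes the need for TextBox1.Lines, but it only helps once the string contains visible text rather than raw HTML.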

  8. #8

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    There is not a specific website I am making this for. This is why I need to dump it into an array and have the code get what I want. I will give you an example using a website and a piece of what I am trying to get:

    Website:
    http://www.behindthename.com/top/lis...tates/2013/100

    Using the example I have working at the top of the thread, my TextBox1.Text gets this:

    RankName%
    1. Noah 0.904 +3
    2. Liam 0.900 +4
    3. Jacob 0.899 -2
    4. Mason 0.879 -2
    5. William 0.825 0
    6. Ethan 0.806 -3
    7. Michael 0.768 0
    8. Alexander 0.738 +1
    9. Jayden 0.733 -1
    10. Daniel 0.707 +1
    11. Elijah 0.681 +2
    12. Aiden 0.676 -2
    13. James 0.671 +1
    14. Benjamin 0.668 +2
    15. Matthew 0.661 -3

    There are two columns of names on this site. One is boys and the other is girls. The list is pretty long, so this is just a sample of what is in the TextBox control. Note there is only text. No code. This is what people see when they visit the website. And it is what I want to put into an array. The array that would hold the above example would have 16 elements.

    This is a simple example that demonstrates what I am trying to get this code to do.

  9. #9
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,390

    Re: Webpage Text Into Array

    Then you need to show us how you have tried to split tables. What have you tried so far?

  10. #10
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    9,677

    Re: Webpage Text Into Array

    One option that you have is to use an XmlDocument and populate the document using the LoadXml method with the contents of the WebClient's DownloadString method.
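
    A rough sketch of that XmlDocument idea, under the assumption that the page is well-formed XML (for example XHTML). `LoadXml` throws an `XmlException` on markup that is not well formed, which is common for ordinary HTML, so the sketch catches it. The URL is a placeholder and the module name is made up.

    ```vbnet
    Imports System
    Imports System.Net
    Imports System.Xml

    Module XmlDocumentSketch
        Sub Main()
            ' Placeholder URL (an assumption); substitute the page you want to read.
            Dim url As String = "http://www.some-web-page.com"

            Try
                Dim markup As String
                Using client As New WebClient()
                    markup = client.DownloadString(url)
                End Using

                Dim doc As New XmlDocument()
                ' Do not fetch external DTDs referenced by a DOCTYPE.
                doc.XmlResolver = Nothing
                ' Throws XmlException if the markup is not well-formed XML.
                doc.LoadXml(markup)

                ' InnerText concatenates the text of all descendant nodes.
                Console.WriteLine(doc.DocumentElement.InnerText)
            Catch ex As XmlException
                Console.WriteLine("Page is not well-formed XML: " & ex.Message)
            Catch ex As WebException
                Console.WriteLine("Download failed: " & ex.Message)
            End Try
        End Sub
    End Module
    ```

    Note that `InnerText` on an XML node simply joins the text nodes together. It does not add the line breaks a browser would render, so the result may still need splitting by element rather than by line.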

  11. #11

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    So far I have posted what I have tried. The way the tables or <div> are in the webpage is not an issue.

    VB6 has a way of pulling just the text from a website that is easily put into an array, which is very fast and easy to make fault-tolerant. VB.Net has to have something similar where all that is needed is an array.

    Spent hours today looking at countless examples and none accomplish this as fast or as well as the example I posted in the first message. What I am trying to do is not specific to one or two websites. Once the text is in an array I can use code to give me the desired result. Right now I have a program that does it but I have to manually copy/paste the text from the website while visiting it. My mouse will copy the text into Notepad. There has to be a fast way to get VB.Net to copy the text to an array without the assistance of VB.Net controls.

    Had no idea this was going to be such a challenge lol

  12. #12
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    9,677

    Re: Webpage Text Into Array

    Did you attempt to do what I suggested with the XmlDocument?

  13. #13

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    dDay9, That is a task for Thursday. I will report back on how it is going.

    Long day. Moving for the 2nd time in as many months.

    BTW...the previous place we lived had farm animals. That little bug of yours crawling all over my screen is the same kind of little bug that was crawling on my screen for real while living by the farm. There really is no escape from these darn bugs! lol

  14. #14
    Frenzied Member dynamic_sysop's Avatar
    Join Date
    Jun 2003
    Location
    Ashby, Leicestershire.
    Posts
    1,142

    Re: Webpage Text Into Array

    There's quite a good example / tutorial on downloading the web content to XML here ---- > https://developer.yahoo.com/dotnet/howto-xml_vb.html
    ~
    if a post is resolved, please mark it as [Resolved]
    protected string get_Signature(){return Censored;}
    [vbcode][php] please use code tags when posting any code [/php][/vbcode]

  15. #15

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    Been playing around with the XML thing and, if it will do what I am searching for, am unable to find it. This is the closest I can come to getting the webpage text into an array from the example site dynamic_sysop offered:

    Code:
        ' Requires Imports System.IO and Imports System.Net at the top of the file.
        Dim ar(0) As String
    
        Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
            Dim response As HttpWebResponse = Nothing
            Try
                ' Create the web request
                Dim request As HttpWebRequest = DirectCast(WebRequest.Create("http://www.behindthename.com/names/usage/english"), HttpWebRequest)
    
                ' Get response
                response = DirectCast(request.GetResponse(), HttpWebResponse)
    
                ' Read the whole response body into a string
                Dim result As String
                Using reader As New StreamReader(response.GetResponseStream())
                    result = reader.ReadToEnd()
                End Using
    
                ' Round-trip through a file to get one array element per line
                File.WriteAllText("c:\webToString.txt", result)
                ar = System.IO.File.ReadAllLines("c:\webToString.txt")
    
            Finally
                If Not response Is Nothing Then response.Close()
            End Try
        End Sub
    This streams the HTML code along with the text. If it would just pull down the text for the array to load...
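
    One crude way to drop the markup from a downloaded string, sketched with regular expressions. This is an approximation and not a real HTML parser: it will not match what a browser renders on every page. The `HtmlToLines` helper, the module name and the sample markup are all made up for illustration.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Net
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module StripTagsSketch
        ' Hypothetical helper: rough HTML-to-text conversion, one element per line.
        Function HtmlToLines(html As String) As String()
            Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

            ' Drop script and style blocks together with their contents.
            Dim text As String = Regex.Replace(html, "<(script|style)\b.*?</\1\s*>", "", opts)
            ' Treat <br> and the ends of common block elements as line breaks.
            text = Regex.Replace(text, "<br\s*/?>|</(p|div|tr|li|h[1-6])\s*>", vbLf, opts)
            ' Replace every remaining tag with a space so adjacent cells do not merge.
            text = Regex.Replace(text, "<[^>]+>", " ")
            ' Turn entities such as &amp; back into characters.
            text = WebUtility.HtmlDecode(text)

            Dim separators As String() = {vbCr, vbLf}
            Dim lines As New List(Of String)
            For Each raw As String In text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                ' Collapse runs of spaces and tabs, then skip lines that end up empty.
                Dim cleaned As String = Regex.Replace(raw, "[ \t]+", " ").Trim()
                If cleaned.Length > 0 Then lines.Add(cleaned)
            Next
            Return lines.ToArray()
        End Function

        Sub Main()
            ' Made-up sample markup standing in for a downloaded page.
            Dim html As String = "<html><head><style>p{color:red}</style></head><body>" &
                "<p>Bob Jr. &amp; Mary</p><p>Sam and Sue</p>Joe-Jan" &
                "<script>var x = 1;</script></body></html>"

            For Each entry As String In HtmlToLines(html)
                Console.WriteLine(entry)
            Next
        End Sub
    End Module
    ```

    Because the string is split in memory, the round trip through c:\webToString.txt is not needed with this approach.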

  16. #16
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    9,677

    Re: Webpage Text Into Array

    I don't know why you're so hung up on trying to use an array for this; perhaps it is because of your VB6 background. It is inefficient and there is a better way. That way is to load the HTML into an XmlDocument, and then you're able to search through the various nodes. I can only suggest this so many times. I wish you good luck, but I'm unsubscribing from the thread.

  17. #17

    Thread Starter
    Lively Member StevenM's Avatar
    Join Date
    Mar 2015
    Posts
    73

    Re: Webpage Text Into Array

    I agree that my VB6 background gets in the way with learning VB.Net. That is why I am here!

    I agree there is a better way and keep asking for help. After spending hours again today researching and testing I was able to get the webpage text into an array without the use of a TextBox and WebBrowser control. The string came with XML tags and by dumping the string into a file then using System.IO.FileInfo it went into an array. From there code cleaned it and gave me what I want. However, it is not nearly as fast as the example in the original message of this thread.

    The bottleneck is coming from pulling the webpage data into the XML string format. I tested the amount of time it took to load the textbox in the first example against the amount of time it took to get the webpage text into an XML string. I tested it before any code of mine ran.
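
    For anyone repeating this kind of comparison, a minimal timing sketch using `System.Diagnostics.Stopwatch`. It is not the code used for the measurements described above; the URL is a placeholder and the module name is made up.

    ```vbnet
    Imports System
    Imports System.Diagnostics
    Imports System.Net

    Module TimingSketch
        Sub Main()
            ' Placeholder URL (an assumption); substitute the page being tested.
            Dim url As String = "http://www.some-web-page.com"

            Dim watch As Stopwatch = Stopwatch.StartNew()
            Try
                Using client As New WebClient()
                    Dim markup As String = client.DownloadString(url)
                    watch.Stop()
                    Console.WriteLine("Downloaded {0} characters in {1} ms",
                                      markup.Length, watch.ElapsedMilliseconds)
                End Using
            Catch ex As WebException
                watch.Stop()
                Console.WriteLine("Download failed after {0} ms: {1}",
                                  watch.ElapsedMilliseconds, ex.Message)
            End Try
        End Sub
    End Module
    ```

    Timing the download separately from the parsing step helps show whether the network request or the text extraction is the slow part.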

    Am not crazy about loading a textbox with the WebBrowser control. However, it does work and is the fastest way I have tried to satisfy this.

    I appreciate all who offered suggestions to this challenge. Thank you! If I only knew what some of you know about VB.net...

  18. #18
    Frenzied Member dynamic_sysop's Avatar
    Join Date
    Jun 2003
    Location
    Ashby, Leicestershire.
    Posts
    1,142

    Re: Webpage Text Into Array

    Would referencing mshtml be an issue for you? If not, you could use IHTMLDocument2 along with HttpWebRequest / HttpWebResponse / StreamReader, like this:
    Code:
    Imports mshtml
    Imports System.Net
    Imports System.IO
    
    Public Class Form1
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            ' Download the raw HTML of the page.
            Dim httpReq As HttpWebRequest = DirectCast(WebRequest.Create("http://www.behindthename.com/top/lists/united-states/2013/100"), HttpWebRequest)
            Dim httpResp As HttpWebResponse = DirectCast(httpReq.GetResponse(), HttpWebResponse)
            Dim strReader As StreamReader = New StreamReader(httpResp.GetResponseStream())
            Dim htmlDoc As IHTMLDocument2 = DirectCast(New HTMLDocument, IHTMLDocument2)
    
            ' Let mshtml parse the markup so that body.innerText holds only the visible text.
            htmlDoc.write(strReader.ReadToEnd())
    
            strReader.Close()
            httpResp.Close()
    
            ' Wrap the visible text in a reader so it can be consumed line by line.
            Dim sr As New StringReader(htmlDoc.body.innerText)
    
            htmlDoc.close()
        End Sub
    End Class
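
    The code above stops at the StringReader. A minimal sketch of draining such a reader into a string array follows. The `ReadAllLines` helper and the module name are made up, and the sample text stands in for `htmlDoc.body.innerText` so the sketch runs without mshtml.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports Microsoft.VisualBasic

    Module ReaderToArraySketch
        ' Hypothetical helper: reads a TextReader to the end, one element per line.
        Function ReadAllLines(reader As TextReader) As String()
            Dim lines As New List(Of String)
            Dim current As String = reader.ReadLine()
            ' ReadLine returns Nothing once the end of the text is reached.
            While current IsNot Nothing
                lines.Add(current)
                current = reader.ReadLine()
            End While
            Return lines.ToArray()
        End Function

        Sub Main()
            ' Stand-in for htmlDoc.body.innerText (an assumption for this sketch).
            Dim innerText As String = "RankName%" & vbCrLf & "1. Noah 0.904 +3" & vbCrLf & "2. Liam 0.900 +4"

            Using sr As New StringReader(innerText)
                Dim ar1 As String() = ReadAllLines(sr)
                Console.WriteLine("Array holds {0} lines", ar1.Length)
            End Using
        End Sub
    End Module
    ```

    Unlike String.Split with RemoveEmptyEntries, this keeps blank lines as empty elements, which may or may not be what the later clean-up code expects.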
