Thread: How to extract a link and some text from webpage with Webclient

  1. #1

    Thread Starter | Lively Member | Join Date: Jul 2013 | Posts: 108

    Hi, I would like to extract a link and some text from a web page using WebClient.
    The link to extract would be:
    Code:
    <a class="btn btn-success btn-lg" href="https://arnold.ytapivmp3.com/download/bTxfcINRwXU/mp3/320/1602976370/4c0175cff8ea7e09d5a47ecef7154cd993703d56aefcf9f1ece7b99219db52bd/1" rel="nofollow noopener">Download MP3</a>
    and the text in:
    Code:
     <title>This is the title</title>
    How can I extract the link and text? The link is always the only link with more than 50 characters on the page.
    I'm getting the HTML with:
    Code:
    ' Requires Imports System.IO and Imports System.Net at the top of the file.
    Dim request As WebRequest = WebRequest.Create("https://www.320youtube.com/v10/watch?v=i_wnFX5WPv4")
    Dim html As String = String.Empty

    ' Using blocks close the response and its stream once the markup has been read.
    Using response As WebResponse = request.GetResponse()
        Using sr As New StreamReader(response.GetResponseStream())
            html = sr.ReadToEnd()
        End Using
    End Using

    RichTextBox1.Text = html
    Thank you

  2. #2
    jmcilhinney | Super Moderator | Join Date: May 2005 | Location: Sydney, Australia | Posts: 110,296

    How you download the data has nothing really to do with how you process the data. You currently have a String containing the HTML markup. If you created a WebClient and called DownloadString, you'd have that same String. If you called File.ReadAllText to read an HTML file, you'd have the same String too. How you get the String is irrelevant to how you process it.
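    To make the point above concrete, a minimal sketch (the module and function names are hypothetical): both WebClient.DownloadString and File.ReadAllText hand back an ordinary String, and the processing step only ever sees that String.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Net

    Public Module HtmlSources

        ' Downloads the page markup as a String.
        Public Function FromUrl(url As String) As String
            Using client As New WebClient()
                Return client.DownloadString(url)
            End Using
        End Function

        ' Reads markup saved in an .html file as a String.
        Public Function FromFile(filePath As String) As String
            Return File.ReadAllText(filePath)
        End Function

        ' Hypothetical processing step: it receives a String and neither knows
        ' nor cares whether it came from the web or from disk.
        Public Function ContainsTitle(html As String) As Boolean
            Return html.IndexOf("<title>", StringComparison.OrdinalIgnoreCase) >= 0
        End Function

    End Module
    ```

    Whichever source you use, ContainsTitle (or any real parsing code) is called the same way.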

    As for that processing, you could just use straight string manipulation, but I would suggest that you actually treat it as HTML and use the HTML Agility Pack. You can then traverse the DOM, find the element you want and read the desired data. That would require doing some research on the HTML Agility Pack and how to use it.

  3. #3

    Thread Starter | Lively Member | Join Date: Jul 2013 | Posts: 108

    I wanted to avoid a third-party solution. I was thinking of using a regular expression to extract all links and then keep the longest one. What do you think? Also, would you advise me to use an XML parser instead? Thanks
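    A minimal sketch of the regex idea, assuming href values are double-quoted with no escaped quotes inside and the page has a single <title> element; RegexExtract, LongestLink and Title are hypothetical names. It collects every href with System.Text.RegularExpressions.Regex and keeps the longest one.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Public Module RegexExtract

        ' Captures the double-quoted href value of each <a ...> tag.
        Private ReadOnly HrefPattern As New Regex("<a\s[^>]*href=""([^""]*)""", RegexOptions.IgnoreCase)

        ' Captures the text of the <title> element (Singleline lets . match newlines).
        Private ReadOnly TitlePattern As New Regex("<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        ' Returns the longest href on the page, or Nothing if there are no links.
        Public Function LongestLink(html As String) As String
            Dim longest As String = Nothing
            For Each m As Match In HrefPattern.Matches(html)
                Dim href As String = m.Groups(1).Value
                If longest Is Nothing OrElse href.Length > longest.Length Then longest = href
            Next
            Return longest
        End Function

        ' Returns the trimmed title text, or Nothing if there is no <title>.
        Public Function Title(html As String) As String
            Dim m As Match = TitlePattern.Match(html)
            Return If(m.Success, m.Groups(1).Value.Trim(), Nothing)
        End Function

    End Module
    ```

    The pattern only works while the markup keeps the shape it expects: a single-quoted or unquoted href is simply skipped, and a longer link elsewhere on the page would be picked instead.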

  4. #4
    jmcilhinney | Super Moderator | Join Date: May 2005 | Location: Sydney, Australia | Posts: 110,296

    Quote (originally posted by matty95srk):
    I wanted to avoid a third-party solution.
    That's your prerogative, but I would suggest that it's misguided.

  5. #5

    Thread Starter | Lively Member | Join Date: Jul 2013 | Posts: 108

    I would rather learn a pure .NET solution. I've known about the HTML Agility Pack for a long time and have tried it, but I'm not that crazy about it.
    I will get the text with an XML-based solution.
    Thanks

  6. #6
    PlausiblyDamp | PowerPoster | Join Date: Dec 2016 | Location: Pontypool, Wales | Posts: 2,458

    Quote (originally posted by matty95srk):
    I would rather learn a pure .NET solution. I've known about the HTML Agility Pack for a long time and have tried it, but I'm not that crazy about it.
    I will get the text with an XML-based solution.
    Thanks
    The problem with either Regex or XML as a solution is that they both require the HTML to be exactly as expected (well-formed markup for an XML parser, a fixed layout for a Regex pattern), and that isn't always the case with web pages.
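    To see the XML half of this concretely, a minimal sketch (the module name is hypothetical): XDocument.Parse accepts only well-formed XML, so perfectly ordinary HTML such as a <br> without a closing tag makes it throw an XmlException.

    ```vbnet
    Imports System
    Imports System.Xml
    Imports System.Xml.Linq

    Public Module XmlCheck

        ' Returns True if the markup parses as XML, False if the XML parser rejects it.
        Public Function IsWellFormedXml(markup As String) As Boolean
            Try
                XDocument.Parse(markup)
                Return True
            Catch ex As XmlException
                ' Unclosed void elements (<br>, <meta>), unquoted attributes and
                ' HTML-only entities such as &nbsp; all end up here.
                Return False
            End Try
        End Function

    End Module
    ```

    Real pages routinely contain such markup, which is why a parser built to tolerate malformed HTML, like the HTML Agility Pack, was suggested earlier in the thread.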

  7. #7

    Thread Starter | Lively Member | Join Date: Jul 2013 | Posts: 108

    I will try to "play" a bit with the Substring method.
    Edit:
    I wanted to share the solution I found for my case.
    I included a bit more of the HTML before the link, so now I'm 100% sure I'm getting just that link. Obviously my code will fail if the HTML changes; hopefully it won't.
    Code:
    ' Returns the text between textBefore and textAfter, or Nothing if either marker is missing.
    Private Function ExtractBetween(allInputText As String, textBefore As String, textAfter As String) As String
        Dim startPosition As Integer = allInputText.IndexOf(textBefore, StringComparison.Ordinal)

        'If text before was not found, return Nothing
        If startPosition < 0 Then Return Nothing

        'Move the start position to the end of the text before, rather than the beginning.
        startPosition += textBefore.Length

        'Find the first occurrence of text after, searching from the end of text before
        Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition, StringComparison.Ordinal)

        'If text after was not found, return Nothing
        If endPosition < 0 Then Return Nothing

        'Get the string found between the start and end positions
        Return allInputText.Substring(startPosition, endPosition - startPosition)
    End Function

    ' Usage, e.g. in a button's Click handler:
    Dim textBefore As String = "class=""btn btn-success btn-lg"" href="""
    Dim textAfter As String = """ rel=""nofollow noopener"
    TextBox4.Text = ExtractBetween(RichTextBox1.Text, textBefore, textAfter)
    I knew there was something easier in the .NET classes. Hope this helps somebody.
    Last edited by matty95srk; Oct 18th, 2020 at 07:27 PM.
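    The first post also asked for the <title> text. A minimal sketch applying the same IndexOf/Substring technique to it; TitleExtract, TextBetween and PageTitle are hypothetical names, and it assumes the tag is written as a plain <title> without attributes (matched case-insensitively).

    ```vbnet
    Imports System

    Public Module TitleExtract

        ' Returns the text between the first startMarker and the next endMarker,
        ' or Nothing if either marker is missing.
        Public Function TextBetween(source As String, startMarker As String, endMarker As String) As String
            Dim startPosition As Integer = source.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase)
            If startPosition < 0 Then Return Nothing
            startPosition += startMarker.Length

            Dim endPosition As Integer = source.IndexOf(endMarker, startPosition, StringComparison.OrdinalIgnoreCase)
            If endPosition < 0 Then Return Nothing

            Return source.Substring(startPosition, endPosition - startPosition)
        End Function

        ' Reads the page title with surrounding whitespace trimmed, or Nothing.
        Public Function PageTitle(html As String) As String
            Dim raw As String = TextBetween(html, "<title>", "</title>")
            Return If(raw Is Nothing, Nothing, raw.Trim())
        End Function

    End Module
    ```

    For example, PageTitle(RichTextBox1.Text) would return "This is the title" for the markup in the first post.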
