-
Oct 17th, 2020, 10:24 PM
#1
Thread Starter
Lively Member
How to extract a link and some text from webpage with Webclient
Hi, I would like to extract a link and some text from a web page using WebClient.
The link to extract would be:
Code:
<a class="btn btn-success btn-lg" href="https://arnold.ytapivmp3.com/download/bTxfcINRwXU/mp3/320/1602976370/4c0175cff8ea7e09d5a47ecef7154cd993703d56aefcf9f1ece7b99219db52bd/1" rel="nofollow noopener">Download MP3</a>
and the text in:
Code:
<title>This is the title</title>
How can I extract the link and text? The link is always the only link with more than 50 characters on the page.
I'm getting the HTML with:
Code:
' Needs "Imports System.IO" and "Imports System.Net" at the top of the file.
Dim request As WebRequest = WebRequest.Create("https://www.320youtube.com/v10/watch?v=i_wnFX5WPv4")
Dim html As String = String.Empty
' The Using blocks close the response and its stream once the markup has been read.
Using response As WebResponse = request.GetResponse()
    Using sr As New StreamReader(response.GetResponseStream())
        html = sr.ReadToEnd()
    End Using
End Using
RichTextBox1.Text = html
Thank you
-
Oct 17th, 2020, 11:43 PM
#2
How you download the data really has nothing to do with how you process the data. You currently have a String containing the HTML markup. If you created a WebClient and called DownloadString, you'd have that same String. If you called File.ReadAllText to read an HTML file, you'd have the same String too. How you get the String is irrelevant to how you process it.
As for that processing, you could just use straight string manipulation but I would suggest that you actually treat it as HTML and use the HTML Agility Pack. You can then traverse the DOM, find the element you want and then read the desired data. That would require doing some research on the HTML Agility Pack and how to use it.
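A minimal VB.NET sketch of that point: the markup can come from `WebClient.DownloadString` or from `File.ReadAllText`, and the processing function only ever receives a `String`. The module and function names (`HtmlSources`, `FromWeb`, `FromFile`, `CountAnchorTags`) are hypothetical, and `CountAnchorTags` is only a placeholder for whatever processing you end up doing. Note that `WebClient` is marked obsolete in .NET 6 and later, where it still compiles but produces a warning.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module HtmlSources
    ' Hypothetical helper: download the markup as a String with WebClient.
    Function FromWeb(url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Hypothetical helper: read the same markup from a saved .html file.
    Function FromFile(path As String) As String
        Return File.ReadAllText(path)
    End Function

    ' Placeholder processing step: it only sees a String, wherever it came from.
    ' Counts case-insensitive occurrences of "<a " as a trivial example.
    Function CountAnchorTags(html As String) As Integer
        Dim count As Integer = 0
        Dim index As Integer = html.IndexOf("<a ", StringComparison.OrdinalIgnoreCase)
        While index >= 0
            count += 1
            index = html.IndexOf("<a ", index + 1, StringComparison.OrdinalIgnoreCase)
        End While
        Return count
    End Function
End Module
```

Whether you call `CountAnchorTags(FromWeb(url))` or `CountAnchorTags(FromFile(path))`, the processing code is identical.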
-
Oct 18th, 2020, 04:57 AM
#3
Thread Starter
I wanted to avoid a third-party solution. I was thinking of using a regular expression to extract all links and then keep the longest one. What do you think? Also, would you advise me to use an XML parser instead? Thanks
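A sketch of the regex idea from this post (extract every link, keep the longest), plus the title, using only `System.Text.RegularExpressions`. It assumes attribute values are wrapped in double quotes, as in the sample markup in the first post; single-quoted or unquoted `href` values would be missed, which is one reason regex-based HTML scraping is fragile. `RegexExtraction`, `LongestHref` and `PageTitle` are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexExtraction
    ' Captures the value of every double-quoted href attribute.
    ' Assumption: attributes use double quotes, as in the sample markup.
    Private ReadOnly HrefPattern As New Regex("href\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)

    ' Captures the text between <title> and </title>; Singleline lets "." span line breaks.
    Private ReadOnly TitlePattern As New Regex("<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Returns the longest href value on the page, or Nothing if there are none.
    Function LongestHref(html As String) As String
        Dim longest As String = Nothing
        For Each m As Match In HrefPattern.Matches(html)
            Dim href As String = m.Groups(1).Value
            If longest Is Nothing OrElse href.Length > longest.Length Then longest = href
        Next
        Return longest
    End Function

    ' Returns the trimmed title text, or Nothing if no <title> element is found.
    Function PageTitle(html As String) As String
        Dim m As Match = TitlePattern.Match(html)
        If Not m.Success Then Return Nothing
        Return m.Groups(1).Value.Trim()
    End Function
End Module
```

Usage would be along the lines of `TextBox4.Text = LongestHref(html)`. To apply the rule from the first post instead (the only link longer than 50 characters), you could check `href.Length > 50` inside the loop rather than comparing lengths.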
-
Oct 18th, 2020, 06:52 AM
#4
Originally Posted by matty95srk
I wanted to avoid a third party solution.
That's your prerogative but I would suggest that it's misguided.
-
Oct 18th, 2020, 07:42 AM
#5
Thread Starter
I would rather learn a built-in .NET solution. I've known about the HTML Agility Pack for a long time and I've tried it, but I'm not that crazy about it.
I'll go with an XML-based solution to get the text.
Thanks
-
Oct 18th, 2020, 07:53 AM
#6
Originally Posted by matty95srk
I would rather learn a built-in .NET solution. I've known about the HTML Agility Pack for a long time and I've tried it, but I'm not that crazy about it.
I'll go with an XML-based solution to get the text.
Thanks
The problem with either Regex or XML as a solution is that they both require the HTML to conform to the standards exactly, and that isn't always the case with web pages.
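To make the XML concern concrete, here is a hedged sketch that reads the `<title>` with `XDocument.Parse` and returns `Nothing` when the page is not well-formed XML. `XmlExtraction` and `TitleViaXml` are hypothetical names, and it assumes the page declares no default XML namespace (an XHTML `xmlns` would require querying with an `XNamespace`).

```vbnet
Imports System
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq

Module XmlExtraction
    ' Tries to read the <title> text by parsing the page as XML.
    ' Returns Nothing when the markup is not well-formed XML, e.g. an unclosed
    ' <meta> or <br> tag, or an HTML-only entity such as &nbsp;.
    Function TitleViaXml(html As String) As String
        Try
            Dim doc As XDocument = XDocument.Parse(html)
            ' Assumption: no default namespace, so the plain name "title" matches.
            Dim title As XElement = doc.Descendants("title").FirstOrDefault()
            Return If(title Is Nothing, Nothing, title.Value.Trim())
        Catch ex As XmlException
            Return Nothing
        End Try
    End Function
End Module
```

Void elements such as `<meta>` and `<br>` written without a closing slash, or entities like `&nbsp;`, are enough to make `XDocument.Parse` throw, so on many real pages this will return `Nothing`.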
-
Oct 18th, 2020, 07:56 AM
#7
Thread Starter
I will try to "play" a bit with substring methods.
Edit:
I wanted to share the solution I found for my case.
I included a bit more of the HTML before the link, so now I'm 100% sure I'm getting just that link. Obviously my code will fail if the HTML changes... hopefully it won't.
Code:
' Assumes this runs inside a Sub (e.g. a button Click handler), so Return exits early.
Dim allInputText As String = RichTextBox1.Text
' Markup that appears immediately before and after the href value
Dim textBefore As String = "class=""btn btn-success btn-lg"" href="""
Dim textAfter As String = """ rel=""nofollow noopener"
Dim startPosition As Integer = allInputText.IndexOf(textBefore, StringComparison.Ordinal)
'If the text before was not found, clear the output and stop
If startPosition < 0 Then
    TextBox4.Text = String.Empty
    Return
End If
'Move the start position to the end of the text before, rather than the beginning
startPosition += textBefore.Length
'Find the first occurrence of the text after the link
Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition, StringComparison.Ordinal)
'If the text after was not found, clear the output and stop
If endPosition < 0 Then
    TextBox4.Text = String.Empty
    Return
End If
'Get the string between the start and end positions
Dim textFound As String = allInputText.Substring(startPosition, endPosition - startPosition)
TextBox4.Text = textFound
I knew there was something easier in the .NET classes. Hope this helps somebody.
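The same `IndexOf`/`Substring` approach can be wrapped in a reusable function that returns `Nothing` when either marker is missing, so one routine covers both the link and the title. `SubstringExtraction` and `ExtractBetween` are hypothetical names.

```vbnet
Imports System

Module SubstringExtraction
    ' Returns the text between the first occurrence of textBefore and the next
    ' occurrence of textAfter, or Nothing if either marker is missing.
    Function ExtractBetween(source As String, textBefore As String, textAfter As String) As String
        Dim startPosition As Integer = source.IndexOf(textBefore, StringComparison.Ordinal)
        If startPosition < 0 Then Return Nothing
        ' Skip past the marker so the result starts right after it.
        startPosition += textBefore.Length
        Dim endPosition As Integer = source.IndexOf(textAfter, startPosition, StringComparison.Ordinal)
        If endPosition < 0 Then Return Nothing
        Return source.Substring(startPosition, endPosition - startPosition)
    End Function
End Module
```

For example, `ExtractBetween(RichTextBox1.Text, "<title>", "</title>")` would return the page title, and passing the `textBefore`/`textAfter` strings from the code above returns the link. It stays as brittle as the original: a change to the class list or attribute order on the page breaks it.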
Last edited by matty95srk; Oct 18th, 2020 at 07:27 PM.