Thread: Grab text from the web [how to?]

  1. #1

    Thread Starter
    New Member
    Join Date
    Jun 2011
    Posts
    4

    Grab text from the web [how to?]

    Hi, I'm not a rookie, but I'm not an expert in VB either. A few months ago I created a simple program to auto-renew my books from my school's library. The code works well: I use a WebBrowser control to navigate to the site and then click the buttons on a timer (the delay is adjusted to the internet speed). For example: WebBrowser1.Document.GetElementById("login").SetAttribute("value", TextBox1.Text) or WebBrowser1.Document.GetElementById("btn_gravar").InvokeMember("click"). Simple as that.

    But now, after the renewal, I want to extract some information from the page. I've gone through a lot of tutorials (some covering almost the same problem), but none of them worked. It looks so simple, yet I can't find a way that works. Can you help me?

    Here is the page's HTML (since you can't log in to see it).

    What I want is:
    (1) Return until: 14/09/2017
    (2) Total of renewals performed: 0
    (3) Reference: MONTEIRO, Washington de Barros.


    Code:
    <div id="1b" style="">
            <table width="100%" border="0" cellspacing="0" cellpadding="0">
    			<tbody><tr>
    				<td class="box_do_detalhes">
    					<table width="100%" border="0" cellpadding="0" cellspacing="0">
    						<tbody><tr>
    							<td colspan="2" class="box_f7f7f7_c">Reference: MONTEIRO, Washington de Barros. <b> Curso de direito civil. </b> <b></b> 36. ed. São Paulo: Saraiva, 2001. 350 p.  ISBN 8502020439 </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c"><em><strong>Call number: 342.1 M775c 2001 (BU-JC)</strong></em></td>
    							<td class="box_fffff_c">Unidade de Informação source: Biblioteca </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Type of loan: Estágio             </td>
    							<td class="box_fffff_c">Description: v.2                                                 , nº 4 </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Date of loan: 31/08/2017 07:52:55</td>
    							<td class="box_fffff_c">Return until: <strong>14/09/2017</strong></td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Fine partial amount: $ 0</td>
    							<td class="box_fffff_c">Total of renewals performed: 0</td>
    						</tr>
    					</tbody></table>
    				</td>
    			</tr>
            </tbody></table>
    	</div>
    Here's a second example (a second book). The pattern continues with the ids 1b, 2b, 3b, 4b, 5b, 6b and 7b, since 7 books is the maximum you can have on loan.
    Code:
    <div id="2b" style="">
            <table width="100%" border="0" cellspacing="0" cellpadding="0">
    			<tbody><tr>
    				<td class="box_do_detalhes">
    					<table width="100%" border="0" cellpadding="0" cellspacing="0">
    						<tbody><tr>
    							<td colspan="2" class="box_f7f7f7_c">Reference: FIUZA, César. <b> Direito civil:  </b> curso completo. <b></b> 12. ed., rev., atual. e ampl. Belo Horizonte: Del Rey, 2008. xxiv, 1084 p.  ISBN 9788573089868. </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c"><em><strong>Call number: 342.1 F565d 2008 (BU-JC)</strong></em></td>
    							<td class="box_fffff_c">Unidade de Informação source: Biblioteca </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Type of loan: Estágio             </td>
    							<td class="box_fffff_c">Description: nº 6 </td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Date of loan: 31/08/2017 07:53:07</td>
    							<td class="box_fffff_c">Return until: <strong>14/09/2017</strong></td>
    						</tr>
    						<tr>
    							<td class="box_f7f7f7_c">Fine partial amount: $ 0</td>
    							<td class="box_fffff_c">Total of renewals performed: 0</td>
    						</tr>
    					</tbody></table>
    				</td>
    			</tr>
            </tbody></table>
    	</div>

  2. #2
    VB For Fun Edgemeal's Avatar
    Join Date
    Sep 2006
    Location
    WindowFromPoint
    Posts
    4,255

    Re: Grab text from the web [how to?]

    You can try something like this and do any additional parsing with basic string methods.

    Code:
    Dim elems As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
    For Each tr As HtmlElement In elems
        Dim colTD As HtmlElementCollection = tr.GetElementsByTagName("td")
        For Each td As HtmlElement In colTD
            Debug.WriteLine(td.InnerText) ' add this text to a list or something.
        Next td
    Next tr
    And here is one way to parse the HTML text and pull the data out:
    Code:
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim tmp = TextBetween(html_source_text, ">Return until:", "</td>")
        Dim returnUntil = TextBetween(tmp, "<strong>", "</strong>").Trim
        MsgBox("Return until: " & returnUntil)
    
        Dim totalRenews = TextBetween(html_source_text, ">Total of renewals performed:", "</td>").Trim
        MsgBox("Total of renewals performed: " & totalRenews)
    
        Dim reference = TextBetween(html_source_text, ">Reference:", "<b>").Trim
        MsgBox("Reference: " & reference)
    End Sub
    
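    ' Returns the text between the first findFirst and the next findSecond, or "" if either is missing.
    Private Function TextBetween(mainText As String, findFirst As String, findSecond As String) As String
        Dim start = mainText.IndexOf(findFirst)
        If start < 0 Then Return "" ' first marker not found
        Dim l = start + findFirst.Length
        Dim r = mainText.IndexOf(findSecond, l)
        Return If(r > l, mainText.Substring(l, r - l), "")
    End Function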
    Last edited by Edgemeal; Sep 5th, 2017 at 10:52 AM.
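
Building on the first loop above, here is a rough sketch of how the three values could be read per book through the DOM instead of the raw source. It assumes the page really uses the ids "1b" to "7b" and the labels shown in post #1. LoanScraper, CellValue and ReadLoans are made-up names for illustration, not part of any library.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Module LoanScraper
    ' Hypothetical helper: finds the innermost <td> whose text starts with
    ' "label" and returns what follows the label, or "" if no cell matches.
    Private Function CellValue(container As HtmlElement, label As String) As String
        For Each td As HtmlElement In container.GetElementsByTagName("td")
            ' Skip wrapper cells (e.g. box_do_detalhes) that hold a nested table.
            If td.GetElementsByTagName("td").Count > 0 Then Continue For
            Dim text = If(td.InnerText, "").Trim()
            If text.StartsWith(label, StringComparison.OrdinalIgnoreCase) Then
                Return text.Substring(label.Length).Trim()
            End If
        Next
        Return ""
    End Function

    ' Walks the book blocks "1b".."7b" (ids assumed from post #1) and builds
    ' one summary line for each block that exists on the page.
    Public Function ReadLoans(doc As HtmlDocument) As List(Of String)
        Dim lines As New List(Of String)
        For i As Integer = 1 To 7
            Dim book As HtmlElement = doc.GetElementById(i.ToString() & "b")
            If book Is Nothing Then Continue For ' fewer than 7 books on loan
            lines.Add(String.Format("Return until: {0} | Renewals: {1} | Reference: {2}",
                                    CellValue(book, "Return until:"),
                                    CellValue(book, "Total of renewals performed:"),
                                    CellValue(book, "Reference:")))
        Next
        Return lines
    End Function
End Module
```

Once the page has finished loading, it could be called as `For Each line In LoanScraper.ReadLoans(WebBrowser1.Document)` and each line written to a ListBox or to Debug.WriteLine. Note that "Reference:" returns the whole citation here, since InnerText drops the bold tags; cutting it down to the author would still need a bit of string work.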

  3. #3

    Thread Starter
    New Member
    Join Date
    Jun 2011
    Posts
    4

    Re: Grab text from the web [how to?]

    Thank you, sir.

    But there's one problem: it says that html_source_text is not declared. How should I proceed?

  4. #4
    Frenzied Member KGComputers's Avatar
    Join Date
    Dec 2005
    Location
    Cebu, PH
    Posts
    2,020

    Re: Grab text from the web [how to?]

    html_source_text it's not declared
    Replace that variable with the actual page source, for example the DocumentText property of the WebBrowser control (WebBrowser1.DocumentText).

    - kgc
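
A minimal sketch of that suggestion, assuming the form already has the WebBrowser1 control from post #1: declare html_source_text as a form-level field and fill it once the page has loaded. The handler name and the ReadyState check are choices made for this example, not something the posts above prescribe.

```vbnet
Imports System.Windows.Forms

Public Class Form1
    ' Holds the page source once the browser has finished loading.
    Private html_source_text As String = ""

    ' Assumes WebBrowser1 already exists on the form (added in the designer).
    Private Sub WebBrowser1_DocumentCompleted(sender As Object,
            e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' DocumentCompleted can fire once per frame; wait for the whole page.
        If WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return
        html_source_text = WebBrowser1.DocumentText
    End Sub
End Class
```

With the field in place, the Button1_Click code from post #2 should compile as written. If a marker string is not found, looking at html_source_text in the debugger shows what the control actually returned.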