
Thread: retrieve text from a webpage

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Aug 2003
    Location
    Edinburgh, UK
    Posts
    2,773

    retrieve text from a webpage

    Is it possible to retrieve the text from a guestbook on a website?

    If so, how? I am trying to figure out how to read a website's guestbook in ASP.NET so I can extract the information I need from it.
    Last edited by Techno; Jun 22nd, 2005 at 11:30 AM.

  2. #2
    Magiaus
    Frenzied Member
    Join Date
    Mar 2002
    Location
    swamp land
    Posts
    1,267

    Re: retrieve text from a webpage

    You could loop through it, getting each control/element on either the client side or the server side, and read each control's/element's .innerText.
    Magiaus

    If I helped give me some points.

  3. #3
    Magiaus
    Frenzied Member
    Join Date
    Mar 2002
    Location
    swamp land
    Posts
    1,267

    Re: retrieve text from a webpage

    P.S. If you do it server side, the page has a form control, and that form control contains all the .NET controls.
    Magiaus

  4. #4

    Thread Starter
    PowerPoster
    Join Date
    Aug 2003
    Location
    Edinburgh, UK
    Posts
    2,773

    Re: retrieve text from a webpage

    Thanks.

    Actually, it's probably my fault; this should have been in the C# forum. But still, how would you tell the page/app to go to a website, read the entire site (its text) into some sort of stream, and then let me do whatever I want with it?

  5. #5
    Magiaus
    Frenzied Member
    Join Date
    Mar 2002
    Location
    swamp land
    Posts
    1,267

    Re: retrieve text from a webpage

    I would set up a web app with a frameset page containing three frames: the top frame holds an address bar for going to the site, the middle frame shows the site to pull text from, and the bottom frame contains JavaScript that reads what is in the middle frame and displays it in a textbox.

    document.frames["middle_frame"].document.body.innerText gives you the middle frame's text. You can also loop through the element collections by name, i.e. document.anchors, document.images, or whatever you need.

    In frames[name].document....., the .document part may or may not be needed; I don't remember for certain right now.

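    BTW, yes, frames are evil, and if a site knows how to break out of frames (msdn.microsoft.com, for instance), this won't work... Also keep in mind that the browser's same-origin policy stops script in one frame from reading a page loaded from a different domain, so the bottom frame can only read the middle frame when both come from the same site.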

    The other option would be to use something like the code below and then parse the text out of the HTML:

    C# Code:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class ClientGET {
        // true = echo the downloaded HTML to the console
        private static bool bShow;

        public static void Main(string[] args) {
            if (args.Length < 1) {
                showusage();
            } else {
                // Passing a second argument (of any value) suppresses the HTML dump.
                if (args.Length == 2)
                    bShow = false;
                else
                    bShow = true;

                getPage(args[0]);
            }

            Console.WriteLine();
            Console.WriteLine("Press Enter to continue...");
            Console.ReadLine();
        }

        public static void showusage() {
            Console.WriteLine("Attempts to GET a URL");
            Console.WriteLine("\r\nUsage:");
            Console.WriteLine("ClientGET URL");
            Console.WriteLine("Examples:");
            Console.WriteLine("ClientGET http://www.microsoft.com/net/");
        }

        public static void getPage(String url) {
            WebResponse result = null;

            try {
                // Send an HTTP GET and wait for the server's response.
                WebRequest req = WebRequest.Create(url);
                result = req.GetResponse();

                // Wrap the raw response stream in a reader that decodes UTF-8 text.
                Stream ReceiveStream = result.GetResponseStream();
                Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
                StreamReader sr = new StreamReader(ReceiveStream, encode);
                Console.WriteLine("\r\nResponse stream received");

                if (bShow) {
                    // Read the body 256 characters at a time until the stream is exhausted.
                    Char[] read = new Char[256];
                    int count = sr.Read(read, 0, 256);

                    Console.WriteLine("HTML...\r\n");
                    while (count > 0) {
                        String str = new String(read, 0, count);
                        Console.Write(str);
                        count = sr.Read(read, 0, 256);
                    }
                    Console.WriteLine("");
                }
            } catch (Exception) {
                Console.WriteLine("\r\nThe request URI could not be found or was malformed");
            } finally {
                // Always release the connection, even if reading failed.
                if (result != null) {
                    result.Close();
                }
            }
        }
    }

    That is from the QuickStarts and will need to be tweaked before it will even think about working.
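    Once the page's HTML has been read, the text still has to be parsed out of it. A minimal sketch of that step, assuming that regex-based tag stripping is good enough for simple guestbook markup (it is not a real HTML parser and can mishandle malformed or unusual markup). The HtmlText class and ToPlainText method are hypothetical names; System.Net.WebUtility.HtmlDecode is the standard .NET call used to turn entities such as &amp; back into characters.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Text.RegularExpressions;

    // Hypothetical helper: a naive, regex-based HTML-to-text converter (not a full parser).
    public static class HtmlText {
        public static string ToPlainText(string html) {
            if (html == null) throw new ArgumentNullException(nameof(html));

            // Remove <script> and <style> blocks together with their contents.
            string text = Regex.Replace(html,
                @"<(?:script|style)\b[^>]*>.*?</(?:script|style)\s*>", " ",
                RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // Remove HTML comments.
            text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

            // Turn <br> and the end of common block elements into line breaks,
            // so separate guestbook entries stay on separate lines.
            text = Regex.Replace(text, @"<br\s*/?>|</(?:p|div|tr|li|h[1-6])\s*>", "\n",
                RegexOptions.IgnoreCase);

            // Strip every remaining tag.
            text = Regex.Replace(text, @"<[^>]+>", " ");

            // Decode entities such as &amp; or &nbsp;. This runs after tag stripping,
            // so an encoded &lt;b&gt; stays visible text instead of being removed.
            text = WebUtility.HtmlDecode(text);

            // Collapse runs of whitespace (including non-breaking spaces) on each line
            // and drop lines that end up empty.
            var lines = new List<string>();
            foreach (string line in text.Split('\n')) {
                string collapsed = Regex.Replace(line, @"[ \t\r\f\u00A0]+", " ").Trim();
                if (collapsed.Length > 0)
                    lines.Add(collapsed);
            }
            return string.Join("\n", lines);
        }
    }
    ```

    You could feed it the whole response from the StreamReader in the code above (for example via sr.ReadToEnd()) and then search the resulting lines for the guestbook entries you need. For anything beyond simple pages, a proper HTML parser is the safer choice.
    
    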
    Magiaus

  6. #6
    Magiaus
    Frenzied Member
    Join Date
    Mar 2002
    Location
    swamp land
    Posts
    1,267

    Re: retrieve text from a webpage

    You may want to try searching planetsourcecode.com; I saw some things on there that may do what you're wanting or at least be a good start. Just remember that if it is a server page, it has to run before you try to read it, or it won't look as it should. That's why I would go with the web app...

    In case you don't know, you point a frame at a new URL with document.frames[name].document.location.href = <new url>.
    Magiaus
