
Thread: How to use javascript to extract text from an external webpage?

  1. #1

    Thread Starter
    New Member
    Join Date
    Feb 2011
    Posts
    2

    How to use javascript to extract text from an external webpage?

    I want to get all of the text on the page without any of the HTML elements, then count the words in that text.

    The web page should provide the user with an input box where they can type a URL, and then show both the text and the number of words in it.

  2. #2
    Fanatic Member
    Join Date
    Jun 2008
    Posts
    1,023

    Re: How to use javascript to extract text from an external webpage?

    Welcome to the forums.

    You could try some JavaScript with an iframe, and strip out all the HTML tags by replacing them with nothing.


    Edit: sorry, I realized I gave you wrong information, but you could try something like this.
    You will still have to do some work on it; I'm not going to do it all for you.
    This does not seem to allow an external website's content to be read, but locally it works...
    HTML Code:
    <html>
    <head>
    <script>
    // Copy the hidden iframe's markup, strip the tags, and show what is left.
    // This only works when the iframe's page is on the same origin as this page.
    function getContents() {
    var htstring = document.getElementById('testIFrame').contentWindow.document.body.innerHTML;
    // Remove anything that looks like a tag: "<", one or more non-">" characters, ">"
    var stripped = htstring.replace(/(<([^>]+)>)/ig,"");

    document.getElementById('contents').innerHTML = stripped;
    }
    </script>
    </head>
    <body>
    <div id="contents"></div>
    <iframe src="/test.html" id="testIFrame" style="display:none;"></iframe>
    <input type="button" value="Get Content" onclick="getContents();" />
    </body>
    </html>
    and test.html

    HTML Code:
    <p align="Left"><b>Hello</b> <I>World</I></p>
    Last edited by Justa Lol; Feb 10th, 2011 at 06:15 PM.
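
    The original question also asked for a word count, which the example above leaves out. A minimal sketch of that step, reusing the same regex: the helper names stripTags and countWords are made up for illustration, and "a word" is assumed to mean any run of non-whitespace characters. The regex does not skip the contents of script or style blocks, so treat this as a starting point only.

```javascript
// Strip tags with the same regex used in getContents() above.
function stripTags(html) {
  return html.replace(/(<([^>]+)>)/ig, "");
}

// Count whitespace-separated words; an empty or all-whitespace string counts as 0.
function countWords(text) {
  const words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

// The contents of test.html from the post above.
const sample = '<p align="Left"><b>Hello</b> <I>World</I></p>';
const text = stripTags(sample);
console.log(text);             // Hello World
console.log(countWords(text)); // 2
```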

  3. #3
    PowerPoster
    Join Date
    Sep 2003
    Location
    Edmonton, AB, Canada
    Posts
    2,629

    Re: How to use javascript to extract text from an external webpage?

    You'd have to use AJAX to request the page contents, then strip the markup away.
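
    A rough sketch of that idea, using the newer fetch API in place of the XMLHttpRequest object that "AJAX" usually meant at the time. The function name fetchPageText and its fetchImpl parameter are hypothetical; fetchImpl is only there so a stand-in can be passed for testing. In a browser this is still limited to same-origin URLs unless the other site explicitly allows cross-origin requests.

```javascript
// Request a page and return its text with the tags stripped out.
// fetchImpl is a hypothetical parameter; it defaults to the global fetch.
async function fetchPageText(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const html = await response.text();
  // Same tag-stripping regex as in the iframe example earlier in the thread.
  return html.replace(/(<([^>]+)>)/ig, "");
}

// Usage with a same-origin URL such as the /test.html page from post #2:
// fetchPageText("/test.html").then(function (text) { console.log(text); });
```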

  4. #4
    Frenzied Member
    Join Date
    Apr 2009
    Location
    CA, USA
    Posts
    1,516

    Re: How to use javascript to extract text from an external webpage?

    And don't forget that AJAX doesn't "naturally" work for cross-domain requests, because the browser's same-origin policy blocks them. You'll have to look up workarounds to handle that.

    You'd be better off doing this with PHP (or another server-side scripting language).
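
    To keep the thread in one language, here is a server-side sketch in JavaScript (Node.js 18 or later, which has a global fetch) rather than PHP. Server code is not subject to the browser's same-origin policy, so it can request whatever URL the user typed. The name extractTextAndCount and the fetchImpl parameter are invented for this example, and the regexes are a rough approximation of real HTML parsing.

```javascript
// Fetch a page on the server, reduce it to plain text, and count the words.
// fetchImpl is a hypothetical parameter; it defaults to Node's global fetch.
async function extractTextAndCount(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  const html = await response.text();
  const text = html
    // Drop script and style blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Replace every remaining tag with a space so adjacent words stay apart.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
  const wordCount = text === "" ? 0 : text.split(" ").length;
  return { text: text, wordCount: wordCount };
}
```

    The page with the input box would send the URL to a small endpoint that calls this function and returns the result. A real endpoint should also validate the URL before fetching it.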

  5. #5
    New Member
    Join Date
    Feb 2014
    Posts
    1

    Re: How to use javascript to extract text from an external webpage?

    This was the life saver, "var stripped = htstring.replace(/(<([^>]+)>)/ig,""); "

    Thank you.....

