-
Feb 10th, 2011, 03:36 PM
#1
Thread Starter
New Member
How to use javascript to extract text from an external webpage?
I want to get all of the text on a page without any of the HTML elements, then count the words in that text.
The page should provide the user with an input box where they can type a URL, and then show both the extracted text and the number of words in it.
-
Feb 10th, 2011, 05:25 PM
#2
Fanatic Member
Re: How to use javascript to extract text from an external webpage?
Welcome to the forums.
You could try some JavaScript with an iframe and strip out all the HTML tags by replacing them with nothing.
Edit: sorry, I realized I gave you wrong information, but you could try something like this.
You will still have to do some work on it; I'm not going to do it all for you.
This does not seem to allow content from external websites to be read (the browser's same-origin policy blocks scripts from reading a cross-origin iframe's document), but locally it works...
HTML Code:
<html>
<head>
<script type="text/javascript">
// Read the iframe's markup, strip the tags, and show the remaining text.
// Only works when the iframe page is on the same origin as this page.
function getContents() {
  var htstring = document.getElementById('testIFrame').contentWindow.document.body.innerHTML;
  var stripped = htstring.replace(/(<([^>]+)>)/ig,"");
  document.getElementById('contents').innerHTML = stripped;
}
</script>
</head>
<body>
<div id="contents"></div>
<iframe src="/test.html" id="testIFrame" style="display:none;"></iframe>
<input type="button" value="Get Content" onclick="getContents();" />
</body>
</html>
and test.html
HTML Code:
<p align="Left"><b>Hello</b> <I>World</I></p>
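A note on the regex approach above: stripping tags alone leaves the contents of script and style elements and any HTML entities in the result, which can throw off the word count. Here is a minimal sketch in plain JavaScript (no DOM needed) that handles those cases and counts the words. The names `htmlToText` and `countWords` are made up for illustration, and the entity table only covers a handful of common entities, so treat it as a rough starting point rather than a full HTML parser.

```javascript
// Hypothetical helpers; a rough sketch, not a complete HTML parser.

// A few common named entities; anything else is left untouched.
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: " " };

function htmlToText(html) {
  return html
    // Drop comments, then <script> and <style> blocks including their contents.
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Replace every remaining tag with a space so adjacent words do not merge.
    .replace(/<[^>]+>/g, " ")
    // Decode the entities listed above (after tag removal, so a decoded
    // "<" is never mistaken for the start of a tag).
    .replace(/&(amp|lt|gt|quot|apos|nbsp);/g, (match, name) => ENTITIES[name])
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Try it on the test.html content from this post.
const sample = '<p align="Left"><b>Hello</b> <I>World</I></p>';
const text = htmlToText(sample);
console.log(text, countWords(text)); // prints "Hello World 2"
```

Replacing tags with a space instead of nothing is a deliberate choice here: with an empty replacement, markup like `<p>one</p><p>two</p>` would collapse into the single word "onetwo".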
Last edited by Justa Lol; Feb 10th, 2011 at 06:15 PM.
-
Feb 11th, 2011, 11:26 AM
#3
Re: How to use javascript to extract text from an external webpage?
You'd have to use AJAX to request the page contents, then strip the markup away.
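The request-then-strip idea could look roughly like this using the `fetch` API (available in current browsers and in Node.js 18+). This is a sketch under assumptions: `fetchPageText` and `stripMarkup` are names invented here, the `fetchImpl` parameter exists only so the function can be tried without a network, and in a browser the request only succeeds for same-origin pages or for servers that send CORS headers.

```javascript
// Hypothetical names; a sketch of "request the page, then strip the markup".

// Very rough tag stripper: removes script/style blocks, then all other tags.
function stripMarkup(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// fetchImpl defaults to the global fetch; pass a stand-in to run offline.
async function fetchPageText(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const text = stripMarkup(await response.text());
  // stripMarkup leaves single spaces between words, so split on one space.
  const wordCount = text === "" ? 0 : text.split(" ").length;
  return { text, wordCount };
}

// Offline demo with a fake fetch that always returns the same page.
const fakeFetch = async () => ({
  ok: true,
  status: 200,
  text: async () => "<p><b>Hello</b> <i>World</i></p>",
});

fetchPageText("/test.html", fakeFetch).then((result) => {
  console.log(result.text, result.wordCount); // prints "Hello World 2"
});
```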
-
Feb 11th, 2011, 12:22 PM
#4
Re: How to use javascript to extract text from an external webpage?
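And don't forget that AJAX doesn't "naturally" work for cross-domain requests: browsers block them under the same-origin policy unless the target server opts in (for example with CORS headers). You'll have to look up workarounds to handle that.
You'd be better off doing this with PHP (or another server-side scripting language).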
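To keep the examples in this thread in one language, here is the same server-side idea sketched in Node.js instead of PHP. The server fetches the page itself, so the browser's cross-domain restriction never comes into play. Everything named here (`extractFromUrl`, the `/extract?url=` route, port 3000) is an assumption for illustration, and it relies on the global `fetch` in Node.js 18+. A real deployment would also want to limit which URLs the server is willing to fetch.

```javascript
const http = require("http");

// Very rough tag stripper: removes script/style blocks, then all other tags.
function stripMarkup(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Fetch a page server-side and return a status code plus a JSON-ready body.
// fetchImpl is a hypothetical hook so the function can be tried offline.
async function extractFromUrl(target, fetchImpl = fetch) {
  let parsed;
  try {
    parsed = new URL(target);
  } catch (err) {
    return { status: 400, body: { error: "Invalid URL" } };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, body: { error: "Only http and https are supported" } };
  }
  try {
    const response = await fetchImpl(parsed.href);
    if (!response.ok) {
      return { status: 502, body: { error: "Upstream returned " + response.status } };
    }
    const text = stripMarkup(await response.text());
    const wordCount = text === "" ? 0 : text.split(" ").length;
    return { status: 200, body: { text, wordCount } };
  } catch (err) {
    return { status: 502, body: { error: "Could not fetch the page" } };
  }
}

// GET /extract?url=... responds with {"text": ..., "wordCount": ...}.
const server = http.createServer(async (req, res) => {
  const requestUrl = new URL(req.url, "http://localhost");
  const result = requestUrl.pathname === "/extract"
    ? await extractFromUrl(requestUrl.searchParams.get("url") || "")
    : { status: 404, body: { error: "Not found" } };
  res.writeHead(result.status, { "Content-Type": "application/json" });
  res.end(JSON.stringify(result.body));
});

// Uncomment to run locally: server.listen(3000);
```

The page with the input box would then request `/extract?url=` followed by the URL-encoded address from its own server, which is a same-origin request and so is not blocked.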
-
Feb 3rd, 2014, 12:49 AM
#5
New Member
Re: How to use javascript to extract text from an external webpage?
This was the lifesaver: "var stripped = htstring.replace(/(<([^>]+)>)/ig,"");"
Thank you!