Does anybody here have an algorithm to extract the plain text from an HTML page, i.e. strip out all the tags and keep only the readable text?

For instance, I have an HTML page with lots of data on it, but when I open it with a text stream reader, I get all the markup junk ("color=", "border=", etc.) as well as the text that I want.

If anybody has a function lying around that does this, it would be greatly appreciated.
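The post doesn't say which language is in use, so here is a minimal sketch in Python, assuming Python 3 and only the standard library's html.parser module. Rather than hand-rolling a regex, it runs the page through a real HTML tokenizer and keeps only the text nodes. That way attributes like color= and border= never show up, and entities such as &amp; are decoded. Contents of script and style elements are skipped because they are code, not visible text. The names TextExtractor and strip_html are hypothetical and were chosen for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; tags and their attributes are never recorded."""

    # Elements whose contents are code rather than visible page text.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # attrs (color=, border=, ...) are simply ignored; tag names arrive lowercased.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script> or <style>.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_html(html: str) -> str:
    """Return the visible text of an HTML string, with pieces joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.parts)


if __name__ == "__main__":
    page = '<p><font color="red">Hello</font> <b border="1">world</b></p>'
    print(strip_html(page))  # prints: Hello world
```

Joining the pieces with a space is a simple choice that keeps words from running together across block tags. The trade-off is that a word split by an inline tag (e.g. Hel<b>lo</b>) comes out as "Hel lo". If you need to preserve line breaks or layout, you would have to track block-level tags such as p and br yourself.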