Using XmlDocument.Load to read in web pages
I want to read a web page into an XmlDocument so I can use all those cool XML tools to examine it :D
But HTML doesn't quite match the XML spec. <HTML>, <IMG> and <BR> tags don't require ending tags, for example. And attribute values don't have to be quoted:
<Table Width=130>
vs
<Table Width='130'>
Is there a way to get HTML pages into an XmlDocument without "correcting" all the anomalies?
Thanks!
Re: Using XmlDocument.Load to read in web pages
No, but try searching for the HTMLDocument object. There is a Document Object Model for HTML documents as well, which can make life easier depending on what you are doing.
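To illustrate the idea of a parser built for HTML rather than XML, here is a minimal sketch. It uses Python's standard html.parser module purely as a stand-in; it is not the HTMLDocument object, and the class name TagCollector, the helper collect_tags and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        self.tags.append((tag, dict(attrs)))


def collect_tags(html_text):
    """Parse tolerant HTML and return the start tags in document order."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


# Hypothetical sample: unquoted attribute values, no end tags for BR and IMG.
sample = "<Table Width=130><tr><td>Line one<BR>Line two<IMG src=logo.gif></table>"
tags = collect_tags(sample)
```

An HTML-aware parser accepts the unquoted Width=130 and the unclosed <BR> and <IMG> tags without complaint, which is the behaviour the original question was asking for.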
Re: Using XmlDocument.Load to read in web pages
XML and HTML are two different languages. Just because they have a similar structure doesn't mean that something written to parse XML should be able to read HTML. HTML that complies with the rules of XML is called XHTML, and only if your Web pages conform to that spec will an XML parser be able to understand them. The chances of that are pretty minimal. Use the right tool for the job, as Edneeis suggests.
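The difference is easy to see with any strict XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in for XmlDocument (an assumption made for illustration, since the two are different libraries); the helper is_well_formed_xml and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two forms from the original question, plus an unclosed <BR>.
unquoted = "<Table Width=130></Table>"
quoted = "<Table Width='130'></Table>"
unclosed = "<p>Line one<BR>Line two</p>"
xhtml_style = "<p>Line one<br />Line two</p>"

results = {
    "unquoted": is_well_formed_xml(unquoted),
    "quoted": is_well_formed_xml(quoted),
    "unclosed": is_well_formed_xml(unclosed),
    "xhtml_style": is_well_formed_xml(xhtml_style),
}
```

Only the quoted and XHTML-style strings are accepted. The unquoted attribute and the unclosed <BR> are both rejected as not well-formed, which is why a typical web page fails to load in an XML parser.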
Re: Using XmlDocument.Load to read in web pages
Terrific. Perfect solution.
One small problem... I'm using .NET 1.1 and the HTMLDocument object was introduced in 2.0.
OK, so I need to upgrade. Is the upgrade I want Visual Studio 2005? I'm finding the Microsoft update info unhelpful since it talks about Visual Studio, not ".NET".
Re: Using XmlDocument.Load to read in web pages
They dropped the ".NET" from the Visual Studio name with 2005, so yes, VS 2005 is the next version (specifically, it comes with .NET Framework 2.0).
-tg
Re: Using XmlDocument.Load to read in web pages
You can use HTMLDocument in any version via COM (the MSHTML library). The .NET version is based on the COM one anyway.