string/html parsing in c#
Hello!
I'm building a string parser for a friend and I need some help.
The parser receives a string that holds the contents of an HTML file.
From that HTML, the parser needs to extract the attribute called 'content'
from the element 'meta name="keywords"'.
This is for a search function.
I thought of converting the string to an XML document and then extracting the attribute,
but I don't know how.
Can anyone help?
Thanks!
Re: string/html parsing in c#
Are you using .NET 2.0? Please use the radio buttons provided to specify your IDE/Framework version when creating a thread.
Re: string/html parsing in c#
I don't think converting HTML to XML is a good idea, or that it will even work, since real-world HTML is often not well-formed XML. Why not use the normal string functions like IndexOf and Substring?
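A minimal sketch of the IndexOf/Substring idea, in case it helps. The class and method names (`MetaKeywords`, `Extract`) are made up for illustration, and the code assumes the attributes are written with double quotes and that the tag contains no '>' before its real end. It is not a general HTML parser.

```csharp
using System;

public static class MetaKeywords
{
    // Hypothetical helper: returns the value of the content attribute of
    // <meta name="keywords" ...>, or null if it cannot be found.
    // Assumes double-quoted attributes and no '>' inside the tag itself.
    public static string Extract(string html)
    {
        if (html == null) return null;

        // Locate the name="keywords" marker, ignoring case.
        int nameIdx = html.IndexOf("name=\"keywords\"", StringComparison.OrdinalIgnoreCase);
        if (nameIdx < 0) return null;

        // Find the boundaries of the tag that contains the marker.
        int tagStart = html.LastIndexOf('<', nameIdx);
        int tagEnd = html.IndexOf('>', nameIdx);
        if (tagStart < 0 || tagEnd < 0) return null;

        string tag = html.Substring(tagStart, tagEnd - tagStart + 1);

        // Look for content="..." inside that tag only, so the attribute
        // order (name first or content first) does not matter.
        const string marker = "content=\"";
        int contentIdx = tag.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (contentIdx < 0) return null;

        int valueStart = contentIdx + marker.Length;
        int valueEnd = tag.IndexOf('"', valueStart);
        if (valueEnd < 0) return null;

        return tag.Substring(valueStart, valueEnd - valueStart);
    }
}
```

Calling `MetaKeywords.Extract(html)` on a page whose head contains `<meta name="keywords" content="c#, parsing">` should give back `c#, parsing`. If the tag is missing, the method returns null, so the caller can fall back to something else.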
Re: string/html parsing in c#
When you say the HTML file is passed as a string, do you mean that the contents of the HTML file (the actual HTML code) are passed as a string, or that the file name of the HTML file is passed as a string?
You should look at the MSHTML library. It lets you automate an Internet Explorer window.
So you can navigate to a URL or file on disk, and iterate through all the elements. The most important part is that you can get back a collection of any type of element you want.
Check out this site.
http://www.csharphelp.com/archives/archive146.html
I can also post some of my own code if you want additional reference material.
Re: string/html parsing in c#
I meant that the HTML code itself is passed as a string,
not the URL.