[RESOLVED] Iterate through html elements using htmlagilitypack

**nikel** · Aug 15th, 2017, 12:05 PM

Hello, I have an HTML file which contains:

Code:

<div class="bookmarks">
	<p>
		<b>Kategoriler<br></b><a href="#anchor1">Yüklü Yazılımlar</a><br>
		<a href="#anchor2">Active Setup</a><br>
		<a href="#anchor3">Installed Programs</a><br>
		<a href="#anchor4"> Tools for .Net 3.5</a><br>
		<a href="#anchor5">7-Zip 16.04 (x64 edition)</a><br>
		<a href="#anchor6">Active Directory Authentication Library for SQL Server</a><br>
		<a href="#anchor7">Active Directory Authentication Library for SQL Server (x86)</a><br>
		<a href="#anchor8">Adobe Acrobat 7.0 Professional</a><br>
		<a href="#anchor9">Adobe Acrobat Reader DC - Turkish</a><br>
		...
	</p>
</div>

I want to get the inner text of every link from #anchor4 onward (inclusive). I know this is a general question, but I couldn't work out the logic of this kind of iteration. For example, I tried this:

C# Code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("installed_apps.html");
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='bookmarks']"))
{
    string str = "";
    foreach (HtmlNode node2 in node.SelectNodes("//p"))
    {
        foreach (HtmlNode node3 in node2.SelectNodes("//a[@href='anchor5']"))
        {
            str = node3.InnerText;
        }
    }
    installedItems.Add(str);
}

However, I got a System.NullReferenceException.

**PlausiblyDamp** · Aug 15th, 2017, 12:23 PM

The line

Code:

    foreach (HtmlNode node3 in node2.SelectNodes("//a[@href='anchor5']"))

is looking for an element with an href attribute of 'anchor5', but none of the elements have that as their href. Try '#anchor5' and see if that helps.
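
For readers wondering where the exception itself comes from: HtmlAgilityPack's `SelectNodes` returns `null`, not an empty collection, when the XPath matches nothing, so the `foreach` has nothing to enumerate and throws. Below is a minimal sketch of guarding against that. The `FindLinkText` helper and the inline HTML string are made up for illustration and are not part of the original code.

```csharp
using System;
using HtmlAgilityPack;

static class NullCheckSketch
{
    // Hypothetical helper: returns the text of the first <a> with the given href,
    // or null when no such link exists.
    public static string FindLinkText(HtmlDocument doc, string href)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href='" + href + "']");
        if (links == null)
        {
            return null;
        }
        return links[0].InnerText;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Assumed sample markup, cut down from the first post.
        doc.LoadHtml("<div class=\"bookmarks\"><p><a href=\"#anchor5\">7-Zip 16.04 (x64 edition)</a><br></p></div>");

        // Without the '#', nothing matches and the helper falls back to the placeholder.
        Console.WriteLine(FindLinkText(doc, "anchor5") ?? "(no match)");
        // With the '#', the link is found.
        Console.WriteLine(FindLinkText(doc, "#anchor5") ?? "(no match)");
    }
}
```

With the null check in place, a wrong href should produce a placeholder instead of an exception, which makes this kind of typo much easier to spot.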

**nikel** · Aug 15th, 2017, 12:31 PM

Wow, that works fine. Here's my final code:

C# Code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("installed_apps.html");
num = 1;
string str = "";
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='bookmarks']"))
{
    foreach (HtmlNode node2 in node.SelectNodes("//p"))
    {
        foreach (HtmlNode node3 in node2.SelectNodes("//a[@href='#anchor" + num + "']"))
        {
            str = node3.InnerText;
            Console.WriteLine(str);
            num++;
        }
    }
    installedItems.Add(str);
}
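
For anyone landing here with the original goal (all link text from #anchor4 onward), here is a sketch of an alternative that avoids building one XPath per anchor number. Two points it relies on: an XPath that starts with `//` is evaluated from the document root even when called on a child node (`.//` would make it relative), and `SelectNodes` returns `null` when nothing matches. The `LinkTextsFrom` helper and the inline sample markup are assumptions for illustration. In real use you would call `doc.Load("installed_apps.html")` as in the posts above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class BookmarkLinksSketch
{
    // Hypothetical helper: inner text of every bookmark link, starting at
    // the link whose href equals startHref (inclusive).
    public static List<string> LinkTextsFrom(HtmlDocument doc, string startHref)
    {
        // One query for all links inside the bookmarks div, in document order.
        HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//div[@class='bookmarks']//a[@href]");
        if (links == null)
        {
            return new List<string>(); // no bookmarks div or no links in it
        }

        return links
            .SkipWhile(a => a.GetAttributeValue("href", "") != startHref)
            .Select(a => HtmlEntity.DeEntitize(a.InnerText).Trim())
            .ToList();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Assumed sample markup, shortened from the first post.
        doc.LoadHtml(
            "<div class=\"bookmarks\"><p>" +
            "<a href=\"#anchor3\">Installed Programs</a><br>" +
            "<a href=\"#anchor4\"> Tools for .Net 3.5</a><br>" +
            "<a href=\"#anchor5\">7-Zip 16.04 (x64 edition)</a><br>" +
            "</p></div>");

        foreach (string text in LinkTextsFrom(doc, "#anchor4"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because the links are filtered by position rather than by a running counter, this should keep working if the anchor numbers have gaps, and it returns an empty list instead of throwing when the start anchor is missing.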
