Thread: [RESOLVED] Iterate through html elements using htmlagilitypack

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Oct 2013
    Posts
    200

    [RESOLVED] Iterate through html elements using htmlagilitypack

    Hello, I have an html file which contains:

    Code:
    <div class="bookmarks">
    	<p>
    		<b>Kategoriler<br></b><a href="#anchor1">Yüklü Yazılımlar</a><br>
    		<a href="#anchor2">Active Setup</a><br>
    		<a href="#anchor3">Installed Programs</a><br>
    		<a href="#anchor4"> Tools for .Net 3.5</a><br>
    		<a href="#anchor5">7-Zip 16.04 (x64 edition)</a><br>
    		<a href="#anchor6">Active Directory Authentication Library for SQL Server</a><br>
    		<a href="#anchor7">Active Directory Authentication Library for SQL Server (x86)</a><br>
    		<a href="#anchor8">Adobe Acrobat 7.0 Professional</a><br>
    		<a href="#anchor9">Adobe Acrobat Reader DC - Turkish</a><br>
    		...
    	</p>
    </div>
    and I want to get the inner text of every link from #anchor4 onward (inclusive). I know this is a general question, but I couldn't work out the logic of this kind of iteration. For example, I tried this:

    C# Code:
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load("installed_apps.html");
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='bookmarks']"))
    {
        string str = "";
        foreach (HtmlNode node2 in node.SelectNodes("//p"))
        {
            foreach (HtmlNode node3 in node2.SelectNodes("//a[@href='anchor5']"))
            {
                str = node3.InnerText;
            }
        }
        installedItems.Add(str);
    }

    However, I got a System.NullReferenceException.
    Last edited by nikel; Aug 15th, 2017 at 12:19 PM. Reason: I changed code

  2. #2
    PowerPoster PlausiblyDamp
    Join Date
    Dec 2016
    Location
    Pontypool, Wales
    Posts
    2,474

    Re: Iterate through html elements using htmlagilitypack

    The line
    Code:
        foreach (HtmlNode node3 in node2.SelectNodes("//a[@href='anchor5']"))
    is looking for an element with an href attribute of 'anchor5', but none of the elements have that as their href. Try '#anchor5' and see if that helps.
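
    For background, the NullReferenceException itself comes from SelectNodes returning null (rather than an empty collection) when nothing matches, so the foreach has nothing to enumerate. Here is a minimal sketch of a null-safe lookup. The class name, the SelectTexts helper and the inline HTML are made up for illustration; the HTML stands in for installed_apps.html and assumes the same structure as the snippet in the first post.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class NullSafeSelect
{
    // Hypothetical helper: returns the inner text of every node matching xpath,
    // or an empty list when SelectNodes finds nothing and returns null.
    public static List<string> SelectTexts(HtmlDocument doc, string xpath)
    {
        var texts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return texts; // no match: avoid enumerating a null collection
        }
        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText);
        }
        return texts;
    }

    public static void Main()
    {
        // Inline stand-in for installed_apps.html (assumed structure).
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"bookmarks\">" +
            "<a href=\"#anchor4\"> Tools for .Net 3.5</a><br>" +
            "<a href=\"#anchor5\">7-Zip 16.04 (x64 edition)</a><br>" +
            "</div>");

        // Without the '#', no href matches, so the list is empty.
        Console.WriteLine(SelectTexts(doc, "//a[@href='anchor5']").Count);

        // With the '#', the link is found.
        foreach (string text in SelectTexts(doc, "//a[@href='#anchor5']"))
        {
            Console.WriteLine(text);
        }
    }
}
```

    With the sample markup above, the first query yields an empty list and the second yields the 7-Zip entry. Also worth knowing: an XPath that starts with "//" searches the whole document even when it is called on a child node, whereas ".//" restricts the search to that node's descendants.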

  3. #3

    Thread Starter

    Re: Iterate through html elements using htmlagilitypack

    Wow, that works fine. Here's my final code:

    C# Code:
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load("installed_apps.html");
    int num = 1; // number of the next anchor to look up
    string str = "";
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='bookmarks']"))
    {
        foreach (HtmlNode node2 in node.SelectNodes("//p"))
        {
            // Re-run the query for each anchor number; SelectSingleNode returns
            // null once "#anchor" + num no longer exists, which ends the loop.
            HtmlNode node3;
            while ((node3 = node2.SelectSingleNode("//a[@href='#anchor" + num + "']")) != null)
            {
                str = node3.InnerText;
                Console.WriteLine(str);
                installedItems.Add(str); // installedItems is declared elsewhere
                num++;
            }
        }
    }
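
    For the original goal (every link from #anchor4 onward), one possible alternative is to select all links inside the bookmarks div in a single query and skip the ones before the starting href. This is only a sketch under assumptions: the class name, the TextsFrom helper and the inline HTML are hypothetical, and it relies on SelectNodes returning the links in document order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class AnchorsFromStart
{
    // Hypothetical helper: inner text of every link in div.bookmarks,
    // beginning with the link whose href equals startHref.
    public static List<string> TextsFrom(HtmlDocument doc, string startHref)
    {
        HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//div[@class='bookmarks']//a[@href]");
        if (links == null)
        {
            return new List<string>(); // SelectNodes returns null when nothing matches
        }

        return links
            .SkipWhile(a => a.GetAttributeValue("href", "") != startHref) // drop links before the start
            .Select(a => a.InnerText.Trim())                              // trim stray whitespace
            .ToList();
    }

    public static void Main()
    {
        // Inline stand-in for installed_apps.html (assumed structure).
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"bookmarks\">" +
            "<a href=\"#anchor3\">Installed Programs</a><br>" +
            "<a href=\"#anchor4\"> Tools for .Net 3.5</a><br>" +
            "<a href=\"#anchor5\">7-Zip 16.04 (x64 edition)</a><br>" +
            "</div>");

        foreach (string text in TextsFrom(doc, "#anchor4"))
        {
            Console.WriteLine(text);
        }
    }
}
```

    With the sample markup this prints the "Tools for .Net 3.5" and "7-Zip 16.04 (x64 edition)" entries and skips "Installed Programs". Unlike counting anchor numbers, it does not depend on the hrefs being numbered consecutively.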
