<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Where did the time go? - Html Agility Pack</title>
    <link>http://blog.j-maxx.net/</link>
    <description>Brain Powered</description>
    <language>en-us</language>
    <copyright>Jeff Klawiter</copyright>
    <lastBuildDate>Sun, 06 Jun 2010 00:17:49 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 2.1.8102.813</generator>
    <managingEditor>Jeff.Klawiter@sierra-bravo.com</managingEditor>
    <webMaster>Jeff.Klawiter@sierra-bravo.com</webMaster>
    <item>
      <trackback:ping>http://blog.j-maxx.net/Trackback.aspx?guid=ddec503d-058c-4c14-8d31-140047924fe1</trackback:ping>
      <pingback:server>http://blog.j-maxx.net/pingback.aspx</pingback:server>
      <pingback:target>http://blog.j-maxx.net/PermaLink,guid,ddec503d-058c-4c14-8d31-140047924fe1.aspx</pingback:target>
      <dc:creator>Jeff Klawiter</dc:creator>
      <wfw:comment>http://blog.j-maxx.net/CommentView,guid,ddec503d-058c-4c14-8d31-140047924fe1.aspx</wfw:comment>
      <wfw:commentRss>http://blog.j-maxx.net/SyndicationService.asmx/GetEntryCommentsRss?guid=ddec503d-058c-4c14-8d31-140047924fe1</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Recently I added four new projects to the Html Agility Pack SVN repository: 
</p>
        <ol>
          <li>
            <a href="#HAPLight">HAPLight</a>: a Silverlight implementation 
</li>
          <li>
            <a href="#HAPCompact">HAPCompact</a>: a .NET CF 3.5 version 
</li>
          <li>
            <a href="#DotNet4">HAP for .NET 4.0</a>: taking advantage of DynamicObject. 
</li>
          <li>
            <a href="#UnitTests">Unit Tests</a>
          </li>
        </ol>
        <p>
All of these are works in progress and should be considered alpha quality, so there are no
binary releases for them yet. To use them, you’ll need to download the source from SVN: <a title="http://htmlagilitypack.codeplex.com/SourceControl/list/changesets" href="http://htmlagilitypack.codeplex.com/SourceControl/list/changesets">http://htmlagilitypack.codeplex.com/SourceControl/list/changesets</a></p>
        <a name="HAPLight">
        </a>
        <h3>HAPLight
</h3>
        <p>
Bringing Html Agility Pack to Silverlight was relatively simple, thanks to Silverlight’s
support for XPath and XPathNavigator. There have been two losses so far: HtmlCmdLine
and HtmlWeb. HtmlWeb is a big loss, and I don't plan on leaving it that way. Silverlight
requires all web requests to be asynchronous, and HtmlWeb is entirely synchronous. At some
point I will write a version of HtmlWeb that exposes asynchronous methods for
downloading pages and returning them as HtmlDocuments. For now, you can do this yourself
with very little code using WebClient.DownloadStringAsync().
</p>
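        <p>
Until an asynchronous HtmlWeb exists, a minimal sketch of the WebClient approach might look like
the following. AsyncHtmlLoader and LoadAsync are hypothetical names, and the sketch assumes a Silverlight
project that references the HAPLight build. WebClient.DownloadStringAsync and the DownloadStringCompleted
event are the standard Silverlight networking APIs.
</p>
        <pre name="code" class="c#">using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAPLight: downloads a page asynchronously
// and hands the parsed HtmlDocument (or the error) to a callback.
public static class AsyncHtmlLoader
{
    public static void LoadAsync(Uri address, Action&lt;HtmlDocument, Exception&gt; callback)
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (sender, e) =&gt;
        {
            // e.Result throws if the request failed or was cancelled, so check those first.
            if (e.Error != null)
            {
                callback(null, e.Error);
                return;
            }
            if (e.Cancelled)
            {
                callback(null, new OperationCanceledException());
                return;
            }

            var doc = new HtmlDocument();
            doc.LoadHtml(e.Result);
            callback(doc, null);
        };
        client.DownloadStringAsync(address);
    }
}</pre>
        <p>
A caller passes a lambda that receives either the parsed document or the exception. Silverlight only allows
requests to another domain when that server publishes a clientaccesspolicy.xml or crossdomain.xml file.
</p>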
        <a name="HAPCompact">
        </a>
        <h3>HAPCompact
</h3>
        <p>
Porting Html Agility Pack to .NET CF was, again, not too difficult. One of the biggest
issues is that .NET CF has no XPathNavigator support. There are no good free implementations,
and I don't expect there ever will be, so HAPCompact will need to rely on LINQ
to Objects instead. This project must be built with Visual Studio 2008, because
VS2010 dropped support for the .NET Compact Framework. I've been looking
into ways of using VS2010's multi-targeting to restore compilation
support, since I have many projects at work that target .NET CF 2.0 and 3.5.
</p>
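        <p>
Without XPathNavigator, a query such as SelectNodes("//a[@href]") is not available, so the same search has
to be written against the node tree directly. The following is one possible LINQ to Objects version.
LinkFinder and Walk are hypothetical names, and the sketch uses only HtmlNode.ChildNodes, Name, and
GetAttributeValue instead of assuming which LINQ helpers HAPCompact will eventually expose.
</p>
        <pre name="code" class="c#">using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: a LINQ to Objects equivalent of SelectNodes("//a[@href]").
public static class LinkFinder
{
    // Depth-first, document-order walk over every node below 'node'.
    public static IEnumerable&lt;HtmlNode&gt; Walk(HtmlNode node)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            yield return child;
            foreach (HtmlNode descendant in Walk(child))
                yield return descendant;
        }
    }

    public static List&lt;string&gt; GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return Walk(doc.DocumentNode)
            .Where(n =&gt; string.Equals(n.Name, "a", StringComparison.OrdinalIgnoreCase))
            .Select(a =&gt; a.GetAttributeValue("href", null)) // null when href is missing
            .Where(href =&gt; href != null)
            .ToList();
    }
}</pre>
        <p>
Nested iterators add a small cost for each level of depth. That cost is negligible for typical pages, and an explicit stack would avoid it if it ever mattered.
</p>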
        <a name="DotNet4">
        </a>
        <h3>Html Agility Pack for .NET 4.0
</h3>
        <p>
.NET 4.0 shipped with the Dynamic Language Runtime included, and C# 4.0 added the dynamic
type in turn. I thought it would be interesting to see if HAP could take advantage of these
features to access HtmlNodes and HtmlAttributes dynamically. 
So far this project is a partial class that makes HtmlNode inherit from DynamicObject.
This may change later so that HtmlNode simply implements an interface instead. The advantage
is that first-level child nodes and attributes can be accessed directly as properties,
for example documentElement.Html.Body.Div to get the first &lt;div&gt; directly inside &lt;body&gt;.
</p>
        <p>
To use these features in C#, you need to tell the compiler that the object is dynamic. Simply assigning
the node to a variable typed as dynamic will suffice. I had hoped to use @ as the prefix for attributes,
but the C# compiler treats @ as a verbatim-identifier marker and strips it, so it never reaches the
dynamic binder. Attributes therefore need a prefix of _ instead. Here are some examples taken from the unit tests:
</p>
        <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:812469c5-0cb0-4c63-8c15-c81123a09de7:f1fe2ef8-b4a8-49b3-b582-ac5596512217" class="wlWriterEditableSmartContent">
          <pre name="code" class="c#">[Test]
public void TestGetAttribute()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("&lt;html&gt;&lt;body class=\"asdfasd\"&gt;&lt;p&gt;asdf asdf sdf&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body._Class;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf&lt;HtmlAttribute&gt;(item);
}

[Test]
public void TestGetMember()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("&lt;html&gt;&lt;body&gt;&lt;p&gt;asdf asdf sdf&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf&lt;HtmlNode&gt;(item);
}</pre>
        </div>
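        <p>
The tests above depend on member lookups being routed through DynamicObject.TryGetMember. Here is a rough
sketch of how that routing could work, written as a standalone wrapper. DynamicHtmlNode is a hypothetical
class, not the partial-class code in SVN. Because it wraps an HtmlNode instead of changing its base class,
child lookups return DynamicHtmlNode rather than HtmlNode.
</p>
        <pre name="code" class="c#">using System;
using System.Dynamic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical wrapper: resolves dynamic member names against an HtmlNode.
//   node.Body   -&gt; first child element named "body" (wrapped again)
//   node._Class -&gt; the "class" HtmlAttribute (leading underscore = attribute)
public class DynamicHtmlNode : DynamicObject
{
    private readonly HtmlNode _node;

    public DynamicHtmlNode(HtmlNode node)
    {
        if (node == null) throw new ArgumentNullException("node");
        _node = node;
    }

    public HtmlNode Node { get { return _node; } }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        string name = binder.Name;

        if (name.Length &gt; 1 &amp;&amp; name[0] == '_')
        {
            // Attribute names are looked up in lowercase, so _Class finds class="...".
            result = _node.Attributes[name.Substring(1).ToLowerInvariant()];
            return true;
        }

        // First-level children only, compared case-insensitively (Body matches &lt;body&gt;).
        HtmlNode child = _node.ChildNodes.Cast&lt;HtmlNode&gt;().FirstOrDefault(n =&gt;
            n.NodeType == HtmlNodeType.Element &amp;&amp;
            string.Equals(n.Name, name, StringComparison.OrdinalIgnoreCase));

        // Returning true with a null result makes a missing member evaluate to null
        // instead of throwing RuntimeBinderException.
        result = child == null ? null : new DynamicHtmlNode(child);
        return true;
    }
}</pre>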
        <p>
Another idea I’m exploring is a small domain-specific language for more targeted
access, such as documentElement.Html.Body.First_Div or documentElement.Html.Body.ById_Header.
This will of course be limited, because only a few characters are legal in C# identifiers.
</p>
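        <p>
One possible shape for that DSL is to split a member name on its first underscore and treat the prefix as
an operator. The parser below is purely a hypothetical sketch, and MemberKind, MemberRequest, and
MemberNameParser are invented names. It only classifies the name and leaves the actual node lookup to
something like a TryGetMember override.
</p>
        <pre name="code" class="c#">using System;

// Hypothetical sketch: classifies a dynamic member name for a small access DSL.
public enum MemberKind { Child, Attribute, First, ById }

public struct MemberRequest
{
    public readonly MemberKind Kind;
    public readonly string Argument;

    public MemberRequest(MemberKind kind, string argument)
    {
        Kind = kind;
        Argument = argument;
    }
}

public static class MemberNameParser
{
    public static MemberRequest Parse(string memberName)
    {
        if (string.IsNullOrEmpty(memberName))
            throw new ArgumentException("A member name is required.", "memberName");

        // "_Class" -&gt; attribute access, matching the existing convention.
        if (memberName[0] == '_')
            return new MemberRequest(MemberKind.Attribute, memberName.Substring(1));

        // "First_Div" / "ById_Header" -&gt; operator prefix plus argument.
        int split = memberName.IndexOf('_');
        if (split &gt; 0 &amp;&amp; split &lt; memberName.Length - 1)
        {
            string prefix = memberName.Substring(0, split);
            string argument = memberName.Substring(split + 1);
            if (prefix == "First") return new MemberRequest(MemberKind.First, argument);
            if (prefix == "ById") return new MemberRequest(MemberKind.ById, argument);
        }

        // Anything else ("Div", "Some_Name") stays a plain first-level child lookup.
        // Case handling (e.g. whether "Header" matches id="header") is left to the lookup.
        return new MemberRequest(MemberKind.Child, memberName);
    }
}</pre>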
        <a name="UnitTests">
        </a>
        <h3>Unit Tests
</h3>
        <p>
I’ve begun adding unit tests to Html Agility Pack. Getting anywhere near good code
coverage will be a long process: there is quite a bit of code in the
library, and some of it could use a good refactoring. So as I write unit tests
I may do some refactoring as well, which may introduce breaking changes to some
methods or properties in the API. For that reason the next version may be 2.0.
</p>
      </body>
      <title>New Html Agility Pack Versions and Features</title>
      <guid isPermaLink="false">http://blog.j-maxx.net/PermaLink,guid,ddec503d-058c-4c14-8d31-140047924fe1.aspx</guid>
      <link>http://blog.j-maxx.net/2010/06/06/NewHtmlAgilityPackVersionsAndFeatures.aspx</link>
      <pubDate>Sun, 06 Jun 2010 00:17:49 GMT</pubDate>
      <description>&lt;p&gt;
Recently I added four new projects to the Html Agility Pack SVN repository: 
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href="#HAPLight"&gt;HAPLight&lt;/a&gt;: a Silverlight implementation 
&lt;/li&gt;
&lt;li&gt;
&lt;a href="#HAPCompact"&gt;HAPCompact&lt;/a&gt;: a .NET CF 3.5 version 
&lt;/li&gt;
&lt;li&gt;
&lt;a href="#DotNet4"&gt;HAP for .NET 4.0&lt;/a&gt;: taking advantage of DynamicObject. 
&lt;/li&gt;
&lt;li&gt;
&lt;a href="#UnitTests"&gt;Unit Tests&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
All of these are works in progress and should be considered alpha quality, so there are no
binary releases for them yet. To use them, you’ll need to download the source from SVN: &lt;a title="http://htmlagilitypack.codeplex.com/SourceControl/list/changesets" href="http://htmlagilitypack.codeplex.com/SourceControl/list/changesets"&gt;http://htmlagilitypack.codeplex.com/SourceControl/list/changesets&lt;/a&gt;
&lt;/p&gt;
&lt;a name="HAPLight"&gt;&lt;/a&gt; 
&lt;h3&gt;HAPLight
&lt;/h3&gt;
&lt;p&gt;
Bringing Html Agility Pack to Silverlight was relatively simple, thanks to Silverlight’s
support for XPath and XPathNavigator. There have been two losses so far: HtmlCmdLine
and HtmlWeb. HtmlWeb is a big loss, and I don't plan on leaving it that way. Silverlight
requires all web requests to be asynchronous, and HtmlWeb is entirely synchronous. At some
point I will write a version of HtmlWeb that exposes asynchronous methods for
downloading pages and returning them as HtmlDocuments. For now, you can do this yourself
with very little code using WebClient.DownloadStringAsync().
&lt;/p&gt;
&lt;a name="HAPCompact"&gt;&lt;/a&gt; 
&lt;h3&gt;HAPCompact
&lt;/h3&gt;
&lt;p&gt;
Porting Html Agility Pack to .NET CF was, again, not too difficult. One of the biggest
issues is that .NET CF has no XPathNavigator support. There are no good free implementations,
and I don't expect there ever will be, so HAPCompact will need to rely on LINQ
to Objects instead. This project must be built with Visual Studio 2008, because
VS2010 dropped support for the .NET Compact Framework. I've been looking
into ways of using VS2010's multi-targeting to restore compilation
support, since I have many projects at work that target .NET CF 2.0 and 3.5.
&lt;/p&gt;
&lt;a name="DotNet4"&gt;&lt;/a&gt; 
&lt;h3&gt;Html Agility Pack for .NET 4.0
&lt;/h3&gt;
&lt;p&gt;
.NET 4.0 shipped with the Dynamic Language Runtime included, and C# 4.0 added the dynamic
type in turn. I thought it would be interesting to see if HAP could take advantage of these
features to access HtmlNodes and HtmlAttributes dynamically.
So far this project is a partial class that makes HtmlNode inherit from DynamicObject.
This may change later so that HtmlNode simply implements an interface instead. The advantage
is that first-level child nodes and attributes can be accessed directly as properties,
for example documentElement.Html.Body.Div to get the first &amp;lt;div&amp;gt; directly inside &amp;lt;body&amp;gt;.
&lt;/p&gt;
&lt;p&gt;
To use these features in C#, you need to tell the compiler that the object is dynamic. Simply assigning
the node to a variable typed as dynamic will suffice. I had hoped to use @ as the prefix for attributes,
but the C# compiler treats @ as a verbatim-identifier marker and strips it, so it never reaches the
dynamic binder. Attributes therefore need a prefix of _ instead. Here are some examples taken from the unit tests:
&lt;/p&gt;
&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:812469c5-0cb0-4c63-8c15-c81123a09de7:f1fe2ef8-b4a8-49b3-b582-ac5596512217" class="wlWriterEditableSmartContent"&gt;&lt;pre name="code" class="c#"&gt;[Test]
public void TestGetAttribute()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("&amp;lt;html&amp;gt;&amp;lt;body class=\"asdfasd\"&amp;gt;&amp;lt;p&amp;gt;asdf asdf sdf&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body._Class;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf&amp;lt;HtmlAttribute&amp;gt;(item);
}

[Test]
public void TestGetMember()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;asdf asdf sdf&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf&amp;lt;HtmlNode&amp;gt;(item);
}&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;
Another idea I’m exploring is a small domain-specific language for more targeted
access, such as documentElement.Html.Body.First_Div or documentElement.Html.Body.ById_Header.
This will of course be limited, because only a few characters are legal in C# identifiers.
&lt;/p&gt;
&lt;a name="UnitTests"&gt;&lt;/a&gt; 
&lt;h3&gt;Unit Tests
&lt;/h3&gt;
&lt;p&gt;
I’ve begun adding unit tests to Html Agility Pack. Getting anywhere near good code
coverage will be a long process: there is quite a bit of code in the
library, and some of it could use a good refactoring. So as I write unit tests
I may do some refactoring as well, which may introduce breaking changes to some
methods or properties in the API. For that reason the next version may be 2.0.
&lt;/p&gt;
</description>
      <comments>http://blog.j-maxx.net/CommentView,guid,ddec503d-058c-4c14-8d31-140047924fe1.aspx</comments>
      <category>C# 4.0</category>
      <category>Html Agility Pack</category>
      <category>Silverlight</category>
    </item>
    <item>
      <trackback:ping>http://blog.j-maxx.net/Trackback.aspx?guid=8338593d-c072-440d-b928-314e7de18cc4</trackback:ping>
      <pingback:server>http://blog.j-maxx.net/pingback.aspx</pingback:server>
      <pingback:target>http://blog.j-maxx.net/PermaLink,guid,8338593d-c072-440d-b928-314e7de18cc4.aspx</pingback:target>
      <dc:creator>Jeff Klawiter</dc:creator>
      <wfw:comment>http://blog.j-maxx.net/CommentView,guid,8338593d-c072-440d-b928-314e7de18cc4.aspx</wfw:comment>
      <wfw:commentRss>http://blog.j-maxx.net/SyndicationService.asmx/GetEntryCommentsRss?guid=8338593d-c072-440d-b928-314e7de18cc4</wfw:commentRss>
      <slash:comments>1</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
For a few months now I’ve been working on a VS2010 extension I’m calling Funky Search.
Its basic intent is to bring tag-based search and replace to Visual
Studio. My first order of business when creating this extension was finding an
HTML parsing engine. I had used <a title="HTML Agility Pack Codeplex" href="http://htmlagilitypack.codeplex.com" target="_blank">HTML
Agility Pack</a> (HAP from now on) in the past. One downside of it is that it uses
XPath to query the HTML. While XPath was a decent solution for searching
XML structures in its day, there are better options available today, namely LINQ. 
</p>
        <p>
I set out to update HAP so that all of its node and attribute collections implement
IList&lt;T&gt; instead of providing their own enumerators. I then added many
helper methods that mimic LINQ to XML. With these in place I could start building dynamic
LINQ queries to power my extension. 
</p>
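        <p>
The practical payoff of implementing IList&lt;T&gt; is that the standard LINQ operators, which are defined
on IEnumerable&lt;T&gt;, apply directly. A collection that only exposes a non-generic enumerator needs a
Cast&lt;T&gt;() call first. The two collection types below are simplified, hypothetical stand-ins that hold
tag names rather than HAP nodes, and they are meant only to show that difference.
</p>
        <pre name="code" class="c#">using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical "before": only a non-generic enumerator, so the element type is unknown to LINQ.
public class LegacyTagList : IEnumerable
{
    private readonly ArrayList _items = new ArrayList();
    public void Add(string tag) { _items.Add(tag); }
    public IEnumerator GetEnumerator() { return _items.GetEnumerator(); }
}

// Hypothetical "after": Collection&lt;T&gt; implements IList&lt;T&gt;, and therefore IEnumerable&lt;T&gt;.
public class TagList : Collection&lt;string&gt; { }

public static class TagQueries
{
    public static List&lt;string&gt; DivsLegacy(LegacyTagList tags)
    {
        // Cast&lt;string&gt;() is required before any other LINQ operator can be used.
        return tags.Cast&lt;string&gt;().Where(t =&gt; t == "div").ToList();
    }

    public static List&lt;string&gt; Divs(TagList tags)
    {
        // The generic interface lets Where, Select, First, etc. work directly.
        return tags.Where(t =&gt; t == "div").ToList();
    }
}</pre>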
        <p>
While working on this I got involved in the community of people using HAP and came across
a larger issue: it had not been updated in years, and the creator and the other developer
on the project seemed to have abandoned it. Over the summer I sent many emails to the creator, Simon Mourier
(a former MS employee and the current CTO of SoftFluent), with no reply.
I eventually found his work email and discovered he was on vacation until early September.
Today I was finally able to get in touch with him, and he added me as a developer
on the project. 
</p>
        <p>
This marks the first time in about five years that I’ve been a developer on an open source project.
Before coming to Sierra Bravo I was heavily into open source; at that time Microsoft also had
no free versions of Visual Studio. I was working as a PHP developer, had contributed
to some small projects, and even worked on part of the Mozilla project, adding an
easier way to code-sign Mozilla/Firefox extensions.
</p>
        <p>
I’m looking forward to advancing HAP, fixing bugs, and making it easier to use. It
is in the unique position of being the only freely available .NET HTML parser that really works.
While it can be used for dubious purposes as a page scraper, it can also be used for
good. In the past we had a client whose hosting provider went out of business:
their site was going to be up for only one more day, and we had
no direct access to their database server. We had FTP access to the site's code
and access to a read-only front end that displayed the contents of the tables
in HTML with no export functionality. I wrote a scraper with HAP to extract those tables
and put them into an importable format. With it I was able to download and import
their database and save their site. 
</p>
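        <p>
A scraper like that can be quite small. The following sketch is not the code from that project.
TableExporter is a hypothetical name, and the sketch assumes the data sits in the first &lt;table&gt; of the
page, with each &lt;tr&gt; becoming one CSV line. It relies on HAP's SelectSingleNode, SelectNodes,
InnerText, and HtmlEntity.DeEntitize.
</p>
        <pre name="code" class="c#">using System.Linq;
using System.Text;
using HtmlAgilityPack;

// Hypothetical exporter, not the original project code: turns each &lt;tr&gt;
// of the first &lt;table&gt; on a page into one CSV line.
public static class TableExporter
{
    public static string TableToCsv(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return string.Empty;

        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        // Note that ".//tr" also matches rows of nested tables.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null) return string.Empty;

        var csv = new StringBuilder();
        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null) continue;

            // InnerText may still contain entities such as &amp;amp;, so decode them first.
            var fields = cells.Cast&lt;HtmlNode&gt;()
                .Select(c =&gt; Quote(HtmlEntity.DeEntitize(c.InnerText.Trim())));
            csv.AppendLine(string.Join(",", fields.ToArray()));
        }
        return csv.ToString();
    }

    // CSV quoting: wrap every field in quotes and double any embedded quotes.
    private static string Quote(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}</pre>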
      </body>
      <title>HTML Agility Pack - Contributor</title>
      <guid isPermaLink="false">http://blog.j-maxx.net/PermaLink,guid,8338593d-c072-440d-b928-314e7de18cc4.aspx</guid>
      <link>http://blog.j-maxx.net/2009/09/15/HTMLAgilityPackContributor.aspx</link>
      <pubDate>Tue, 15 Sep 2009 16:02:42 GMT</pubDate>
      <description>&lt;p&gt;
For a few months now I’ve been working on a VS2010 extension I’m calling Funky Search.
Its basic intent is to bring tag-based search and replace to Visual
Studio. My first order of business when creating this extension was finding an
HTML parsing engine. I had used &lt;a title="HTML Agility Pack Codeplex" href="http://htmlagilitypack.codeplex.com" target="_blank"&gt;HTML
Agility Pack&lt;/a&gt; (HAP from now on) in the past. One downside of it is that it uses
XPath to query the HTML. While XPath was a decent solution for searching
XML structures in its day, there are better options available today, namely LINQ. 
&lt;/p&gt;
&lt;p&gt;
I set out to update HAP so that all of its node and attribute collections implement
IList&amp;lt;T&amp;gt; instead of providing their own enumerators. I then added many
helper methods that mimic LINQ to XML. With these in place I could start building dynamic
LINQ queries to power my extension. 
&lt;/p&gt;
&lt;p&gt;
While working on this I got involved in the community of people using HAP and came across
a larger issue: it had not been updated in years, and the creator and the other developer
on the project seemed to have abandoned it. Over the summer I sent many emails to the creator, Simon Mourier
(a former MS employee and the current CTO of SoftFluent), with no reply.
I eventually found his work email and discovered he was on vacation until early September.
Today I was finally able to get in touch with him, and he added me as a developer
on the project. 
&lt;/p&gt;
&lt;p&gt;
This marks the first time in about five years that I’ve been a developer on an open source project.
Before coming to Sierra Bravo I was heavily into open source; at that time Microsoft also had
no free versions of Visual Studio. I was working as a PHP developer, had contributed
to some small projects, and even worked on part of the Mozilla project, adding an
easier way to code-sign Mozilla/Firefox extensions.
&lt;/p&gt;
&lt;p&gt;
I’m looking forward to advancing HAP, fixing bugs, and making it easier to use. It
is in the unique position of being the only freely available .NET HTML parser that really works.
While it can be used for dubious purposes as a page scraper, it can also be used for
good. In the past we had a client whose hosting provider went out of business:
their site was going to be up for only one more day, and we had
no direct access to their database server. We had FTP access to the site's code
and access to a read-only front end that displayed the contents of the tables
in HTML with no export functionality. I wrote a scraper with HAP to extract those tables
and put them into an importable format. With it I was able to download and import
their database and save their site. 
&lt;/p&gt;
</description>
      <comments>http://blog.j-maxx.net/CommentView,guid,8338593d-c072-440d-b928-314e7de18cc4.aspx</comments>
      <category>CodePlex</category>
      <category>Html Agility Pack</category>
      <category>LINQ</category>
      <category>Visual Studio 2010</category>
    </item>
  </channel>
</rss>