On This Page

New Html Agility Pack Versions and Features
CLR 4.0 to include the DLR - With Limitations
Update on new C# 4 features

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.


Copyright © 2010
This work by Jeff Klawiter is, unless explicitly stated in the article, available under the Creative Commons Attribution 3.0 United States License.

# Saturday, June 05, 2010
by Jeff Klawiter - Saturday, June 05, 2010 6:17:49 PM (Central Standard Time, UTC-06:00)

I have recently added four new projects to the Html Agility Pack SVN repository:

  1. HAPLight: a Silverlight implementation
  2. HAPCompact: a .NET CF 3.5 version
  3. HAP for .NET 4.0: taking advantage of DynamicObject.
  4. Unit Tests

All of these are works in progress and should be considered alpha quality, so there are no binary releases for them yet. To use them, you'll need to get the source from SVN: http://htmlagilitypack.codeplex.com/SourceControl/list/changesets

HAPLight

Bringing Html Agility Pack to Silverlight was relatively simple, thanks to Silverlight supporting XPath and XPathNavigator. There have been two losses so far: HtmlCmdLine and HtmlWeb. HtmlWeb is a big loss, and I don't plan on leaving it that way. Silverlight requires all web requests to be asynchronous, which HtmlWeb is not. So at some point I will write a version of HtmlWeb that exposes asynchronous methods for downloading pages and returning them as HtmlDocuments. For now you can do this yourself without much code using WebClient.DownloadStringAsync().
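Here's a minimal sketch of that do-it-yourself approach, assuming the WebClient event-based API (DownloadStringAsync plus the DownloadStringCompleted event) and HAP's HtmlDocument.LoadHtml. The HtmlWebAsync class and its LoadAsync method are hypothetical names for illustration; they are not part of HAP.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper (not part of HAP): an asynchronous stand-in for HtmlWeb.Load
public static class HtmlWebAsync
{
    // Starts the download and returns immediately. The callback later receives
    // either the parsed document or the error that occurred.
    public static void LoadAsync(Uri url, Action<HtmlDocument, Exception> callback)
    {
        var client = new WebClient(); // disposal omitted for brevity
        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error != null)
            {
                callback(null, e.Error);
                return;
            }
            // Parse the downloaded markup with the usual LoadHtml call
            var doc = new HtmlDocument();
            doc.LoadHtml(e.Result);
            callback(doc, null);
        };
        client.DownloadStringAsync(url);
    }
}
```

Because the download finishes on a callback, the HtmlDocument is only available inside that callback, not as a return value of LoadAsync.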

HAPCompact

Again, porting Html Agility Pack to .NET CF wasn't too difficult. One of the biggest issues is that .NET CF has no XPathNavigator support. There are no good free implementations, and I don't expect there ever will be, so HAPCompact will need to rely on LINQ to Objects instead. This project needs to be built with Visual Studio 2008; unfortunately, VS2010 did not include any .NET Compact Framework support. I've been looking into a way of taking advantage of VS2010's multi-targeting to add compilation support back in, since I have many projects at work that target .NET CF 2.0 and 3.5.
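A sketch of what LINQ to Objects looks like in place of an XPath query. It is written against HtmlNode.Descendants(string) and GetAttributeValue as found in desktop builds of HAP; whether HAPCompact exposes the same LINQ helpers is an assumption. LinqInsteadOfXPath.GetLinks is a hypothetical helper, and the code sticks to C# 3 syntax so it would also compile for .NET CF 3.5.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinqInsteadOfXPath
{
    // LINQ equivalent of the XPath query //a[@href]: the href of every <a>
    // element in the document that actually carries an href attribute.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
                  .Descendants("a")                         // all <a> nodes, document order
                  .Where(a => a.Attributes["href"] != null) // the [@href] predicate
                  .Select(a => a.GetAttributeValue("href", ""))
                  .ToList();
    }
}
```

Each XPath step becomes a query operator: the // axis maps to Descendants, and the predicate maps to Where.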

Html Agility Pack for .NET 4.0

.NET 4.0 shipped with the Dynamic Language Runtime included, and C# was updated in turn to include dynamic typing. I thought it would be interesting to see whether HAP could take advantage of these features to access HtmlNodes and HtmlAttributes dynamically. So far this project is a partial class that makes HtmlNode inherit from DynamicObject. This may change later so that it just implements an interface instead. The advantage is that you can access first-level child nodes and attributes directly by name: for example, documentElement.Html.Body.Div gets the first <div> directly under <body>.

In C#, to use these features you need to indicate that the object is dynamic; simply assigning the node to a variable typed as dynamic will suffice. I had hoped to use @ as the prefix for attributes, but C# treats @ as the verbatim-identifier prefix and strips it before the member lookup ever sees the name (docElement.@class arrives as plain class). So to access attributes, a prefix of _ is needed instead. Here are some examples taken from the unit tests:

[Test]
public void TestGetAttribute()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("<html><body class=\"asdfasd\"><p>asdf asdf sdf</p></body></html>");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body._Class;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf<HtmlAttribute>(item);
}

[Test]
public void TestGetMember()
{
    var doc = new HtmlDocument();
    doc.LoadHtml("<html><body><p>asdf asdf sdf</p></body></html>");
    dynamic docElement = doc.DocumentNode;
    var item = docElement.Html.Body;
    Assert.IsNotNull(item);
    Assert.IsInstanceOf<HtmlNode>(item);
}
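To show the lookup rule those tests exercise without needing the SVN build, here's a self-contained sketch. SimpleNode is a hypothetical stand-in, not HAP's HtmlNode, and it returns attribute values as strings rather than HtmlAttribute objects. A member name starting with _ is treated as an attribute; anything else is the first direct child element with that name, both compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

// Hypothetical stand-in for HtmlNode; only the dynamic lookup rule is modelled.
public class SimpleNode : DynamicObject
{
    public string Name { get; }
    public Dictionary<string, string> Attributes { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public List<SimpleNode> Children { get; } = new List<SimpleNode>();

    public SimpleNode(string name) { Name = name; }

    // Called by the DLR for any member the class does not define itself
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        string member = binder.Name;
        if (member.StartsWith("_", StringComparison.Ordinal))
        {
            // "_Class" -> value of the "class" attribute
            bool found = Attributes.TryGetValue(member.Substring(1), out string value);
            result = value;
            return found;
        }
        // "Body" -> first direct child whose tag name matches, ignoring case
        result = Children.FirstOrDefault(
            c => string.Equals(c.Name, member, StringComparison.OrdinalIgnoreCase));
        // Returning false makes the C# binder throw a RuntimeBinderException
        return result != null;
    }
}
```

One DLR detail to keep in mind: members the class really defines (Name, Attributes and Children here) are bound statically first, so TryGetMember is only consulted for names that don't match a real property. A child reached as .Name would be shadowed by the Name property.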

Another idea I'm considering is a small domain-specific language for more specific access, such as documentElement.Html.Body.First_Div or documentElement.Html.Body.ById_Header. This will of course be limited by the few symbols that are legal in C# identifiers.
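One way such names could be decoded is to split them on the first underscore inside TryGetMember. The parser below is a hypothetical sketch of that idea only; First and ById are the two prefixes from the examples above, and nothing like it exists in HAP. A leading underscore keeps its current meaning of attribute access.

```csharp
using System;

// Hypothetical parser for the member-name DSL idea; not part of HAP.
public static class MemberNameDsl
{
    // Splits a member name into an operation and its argument:
    //   "_Class"      -> ("Attribute", "Class")
    //   "First_Div"   -> ("First", "Div")
    //   "ById_Header" -> ("ById", "Header")
    //   "Body"        -> ("Child", "Body")
    public static Tuple<string, string> Parse(string member)
    {
        if (member.StartsWith("_", StringComparison.Ordinal))
            return Tuple.Create("Attribute", member.Substring(1));

        int underscore = member.IndexOf('_');
        if (underscore > 0)
        {
            string op = member.Substring(0, underscore);
            if (op == "First" || op == "ById")
                return Tuple.Create(op, member.Substring(underscore + 1));
        }
        // No recognised prefix: treat it as a plain first-child lookup
        return Tuple.Create("Child", member);
    }
}
```

Unrecognised prefixes fall through to an ordinary child lookup, so an element name that happens to contain an underscore still resolves the old way.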

Unit Tests

I've begun adding unit tests to Html Agility Pack. It will be a long process just to approach a good code coverage percentage. There is quite a bit of code in the library, and some of it could use a good refactoring, so I may be doing some refactoring as I write the tests. That may introduce breaking changes to some of the methods or properties in the API, so this next version may be 2.0.

# Tuesday, October 28, 2008
by Jeff Klawiter - Tuesday, October 28, 2008 10:23:43 AM (Central Standard Time, UTC-06:00)
All of these new C# 4.0 dynamic features require parts of the DLR, so it looks like Microsoft is taking the DLR and making it a first-class citizen in the CLR. I'm guessing this will also make IronPython and IronRuby first-class citizens, which is a huge win for the dynamic languages community. For C# 4.0 it is bittersweet: it means better interoperability when calling code written in IronRuby or IronPython, but there are limitations. Below is an excerpt from the C# 4.0 white paper (available at http://code.msdn.microsoft.com/csharpfuture):

Open issues

There are a few limitations and things that might work differently than you would expect.

· The DLR allows objects to be created from objects that represent classes. However, the current implementation of C# doesn’t have syntax to support this.

· Dynamic lookup will not be able to find extension methods. Whether extension methods apply or not depends on the static context of the call (i.e. which using clauses occur), and this context information is not currently kept as part of the payload.

· Anonymous functions (i.e. lambda expressions) cannot appear as arguments to a dynamic method call. The compiler cannot bind (i.e. “understand”) an anonymous function without knowing what type it is converted to.

One consequence of these limitations is that you cannot easily use LINQ queries over dynamic objects:

dynamic collection = …;

var result = collection.Select(e => e + 5);

If the Select method is an extension method, dynamic lookup will not find it. Even if it is an instance method, the above does not compile, because a lambda expression cannot be passed as an argument to a dynamic operation.

There are no plans to address these limitations in C# 4.0.


To me this is a huge limitation. I can already see that most of my interop with dynamic languages will involve collections of some sort, and the same applies to collections returned from dynamic COM objects. LINQ is so powerful and easy to use that having to move away from it for dynamically typed values may end up being a major annoyance. I hope they work on this for C# 4.5.
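One workaround, sketched here under the assumption that you know the element type ahead of time, is to convert the dynamic value to a static collection type before querying it. Once the conversion is done, Select is an ordinary extension-method call and the lambda binds normally. DynamicLinqWorkaround.AddFive is a hypothetical helper mirroring the white paper's collection.Select(e => e + 5).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DynamicLinqWorkaround
{
    // Hypothetical helper mirroring collection.Select(e => e + 5) from the excerpt
    public static List<int> AddFive(dynamic collection)
    {
        // The conversion is checked at run time; from here on the compiler
        // can bind the Select extension method and the lambda statically.
        IEnumerable<int> typed = collection;
        return typed.Select(e => e + 5).ToList();
    }
}
```

The trade-off is that the conversion itself is dynamic: if the value is not an IEnumerable<int>, the assignment fails with a RuntimeBinderException at run time rather than a compile error.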


Categories: C# 4.0
by Jeff Klawiter - Tuesday, October 28, 2008 8:13:13 AM (Central Standard Time, UTC-06:00)
While browsing MSDN blogs I came across this nice little post: http://blogs.msdn.com/dparys/archive/2008/10/28/neue-m-glichkeiten-in-c-4-0.aspx. After translating the page from German, I found that he linked to the new C# 4.0 page: http://code.msdn.microsoft.com/csharpfuture

I played around with VS 2010 last night and was able to test the dynamic keyword. It works as advertised, but the biggest thing to realize is that using it removes IntelliSense for that variable, along with compile-time type safety. I hope they'll be able to add some sort of limited IntelliSense by looking at the last assigned type.
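A small illustration of the lost compile-time checking: the misspelled member below compiles because the variable is dynamic, and the mistake only surfaces when the line runs, as a RuntimeBinderException. The class and method names are made up for the example.

```csharp
using Microsoft.CSharp.RuntimeBinder;

public static class NoCompileTimeChecks
{
    // Returns true when the typo is caught at run time instead of compile time
    public static bool TypoFailsAtRuntime()
    {
        dynamic text = "hello";
        try
        {
            // "Lenght" instead of "Length": accepted by the compiler
            int length = text.Lenght;
            return false;
        }
        catch (RuntimeBinderException)
        {
            // The runtime binder finds no member "Lenght" on System.String
            return true;
        }
    }
}
```

With text declared as string instead of dynamic, the same typo would be a compile error and IntelliSense would have offered Length in the first place.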

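Also on the dynamic front is DynamicObject, a new base class that lets you declare properties on the fly. The derived class overrides the getter and setter and keeps the values in a property bag (essentially a Dictionary<string, object>). For example:
using System.Collections.Generic;
using System.Dynamic;

dynamic b = new MyBag();
b.Id = 124;
b.Name = "Windows 7";
b.Price = 499.99m;
b.IsAvailable = false;

public class MyBag : DynamicObject
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();
    // Overrides getter / setter: reads and writes such as b.Name go through _values
    public override bool TryGetMember(GetMemberBinder binder, out object result)
        => _values.TryGetValue(binder.Name, out result);
    public override bool TrySetMember(SetMemberBinder binder, object value)
    { _values[binder.Name] = value; return true; }
}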

One thing I was unable to figure out was how optional (default) and named parameters work. Again, the blog post provided some answers:

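public void InsertCustomer( int customerId,
                          string companyName = "New Company",
                          decimal creditLimit = 2000m )
{
}

// companyName is omitted, so it takes its default value
InsertCustomer( 1, creditLimit: 2000m );

// Named arguments can be given in any order
InsertCustomer( creditLimit: 2000m, customerId: 1 );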
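To make the effect of the defaults visible, here is a hypothetical variant of InsertCustomer that returns the values it received instead of inserting anything. Omitted optional parameters take their declared defaults, and named arguments can be given in any order (after any positional ones) as long as every required parameter is supplied.

```csharp
using System.Globalization;

public static class CustomerDemo
{
    // Hypothetical variant of InsertCustomer that reports the values it received
    public static string InsertCustomer(int customerId,
                                        string companyName = "New Company",
                                        decimal creditLimit = 2000m)
    {
        return customerId.ToString(CultureInfo.InvariantCulture) + "|" + companyName + "|"
             + creditLimit.ToString(CultureInfo.InvariantCulture);
    }
}
```

For instance, CustomerDemo.InsertCustomer(2, "Contoso") returns "2|Contoso|2000", because only creditLimit falls back to its default.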





Categories: C# | C# 4.0