On This Page

HTML Agility Pack - Contributor
LINQ: Refactoring Inline Instantiation
The blog hath arrived

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way


Copyright ©  2010
 Creative Commons License
This work by Jeff Klawiter is, unless explicitly stated in the article,  available under the Creative Commons Attribution 3.0 United States License.

# Tuesday, September 15, 2009
by Jeff Klawiter - Tuesday, September 15, 2009 10:02:42 AM (Central Standard Time, UTC-06:00)

For a few months now I’ve been working on a VS2010 extension I’m calling Funky Search. Its basic intent is to bring tag-based search and replace functionality to Visual Studio. My first order of business when creating this extension was finding an HTML parsing engine. I had used HTML Agility Pack (HAP from now on) in the past. One downside is that it uses XPath for querying the HTML. While in its day XPath was a decent solution for searching XML structures, there are better solutions available today, namely LINQ.

I set out and updated HAP so that all of its Node and Attribute collections implement IList<T> instead of supplying their own enumerators. I then added many helper methods to mimic LINQ to XML. With this I could now work on creating dynamic LINQ statements to power my extension.
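
Since the new helpers are modeled on LINQ to XML, here is a rough sketch of the difference in query style using plain System.Xml.Linq rather than HAP itself. This is not HAP's API; the class name, the markup and the "ext" class are made up for illustration, and the input has to be well-formed XML for this to work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LinkQueries
{
    // Well-formed markup invented for this example.
    public const string Markup =
        "<div>" +
        "<a href='http://example.com/a' class='ext'>A</a>" +
        "<a href='/local'>B</a>" +
        "<a href='http://example.com/c' class='ext'>C</a>" +
        "</div>";

    // XPath version: the filter lives in a string the compiler cannot check.
    public static List<string> WithXPath(XDocument doc)
    {
        return doc.XPathSelectElements("//a[@class='ext']")
                  .Select(a => (string)a.Attribute("href"))
                  .ToList();
    }

    // LINQ version: the same filter written as ordinary C# expressions.
    public static List<string> WithLinq(XDocument doc)
    {
        return (from a in doc.Descendants("a")
                where (string)a.Attribute("class") == "ext"
                select (string)a.Attribute("href")).ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(Markup);
        Console.WriteLine(string.Join(", ", WithXPath(doc).ToArray()));
        Console.WriteLine(string.Join(", ", WithLinq(doc).ToArray()));
    }
}
```

Both methods return the same two links. The LINQ form gets compile-time checking and composes with the rest of a query, which is the main thing I wanted for building search statements dynamically.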

While working on this I got into the community of people using HAP and came across a larger issue: it had not been updated in years, and the creator and the other developer on the project seemed to have abandoned it. I sent many emails to the creator, Simon Mourier (former MS employee and current CTO of SoftFluent), over the summer with no reply. I finally found his work email and discovered he was on vacation until early September. I was able to get in contact with him today and he added me as a developer on the project.

This marks the first time in about 5 years that I’m a developer on an open source project. Before coming to Sierra Bravo I was hugely into open source; at that time MS also had no free versions of Visual Studio. I was working as a PHP developer and had contributed to some small projects, and even worked on part of the Mozilla project, adding an easier way to code-sign your Mozilla/Firefox extensions.

I’m looking forward to advancing HAP, fixing bugs and making it easier to use. It sits in a unique position as the only freely available HTML parser that works. While it can be used for dubious purposes as a page scraper, it can also be used for good. I’ve used it in the past for a client whose hosting provider went out of business: their site was only going to be up for another day and we had no direct access to their database server. We had FTP access to get the code of the site, and access to a read-only front end that displayed the contents of the tables in HTML with no export functionality. I wrote a scraper with HAP to get those tables and put them into an importable format. With it I was able to download and import their database and save their site.

# Saturday, August 23, 2008
by Jeff Klawiter - Saturday, August 23, 2008 2:39:02 PM (Central Standard Time, UTC-06:00)

Over the summer I was able to run my first large project in .NET 3.5. I had a chance to put all the new features to use and learned quite a bit on the way. I've blogged a bit about this project before. The project had two data backends: a local SQL database and a 3rd-party ASMX service. I took the approach of having a business object library that contained only class definitions I had full control over. I split the backends out into their own libraries, with a main Datalayer library that handled the communication with the two underneath it.

Initially I started coding the two bottom layers to do the object instantiation inline in the LINQ queries. As the layers grew I began to refactor much of the instantiation to methods that mapped the layer objects to the business objects.

Converting an inline expression like this

var result = from id in dc.OrderDetails
             where id.OrderID == OrderId
             select new DataObjects.OrderItem()
             {
                 PartID = id.PartID,
                 Price = id.Price,
                 Quantity = id.Quantity
             };

To this

var result = from id in dc.OrderDetails
             where id.OrderID == OrderId
             select MapOrderDetailToOrderItem(id);

Using the LINQ to SQL Classes was a perfect fit. They are far easier to use as an ORM than SQL DataSets and DataAdapters. When retrieving lists such as the items on an order, the relationship properties on the SQL objects made it extremely easy to have clean data access.

After completing the project I started wondering what impact the refactoring had on data access performance, so I decided to run some tests. Initially I thought that the inline instantiation would probably be faster, since it constructs the object in the expression instead of getting the SQL object and then passing it to a function.

I wrote a quick program to do some performance testing on both implementations. I set up examples using a common real-world call: retrieving an order from a database along with the line items on the order. Below you can see the inline call and the refactored call.

[Tests.cs]
    static class Tests
    {
        public static void RunInlineInitialization()
        {
            using (SqlOrdersDataContext dc = new SqlOrdersDataContext())
            {
                var result = from o in dc.OrderHeaders
                             select new DataObjects.Order()
                             {
                                 CustomerID = o.CustomerID,
                                 OrderDate = o.OrderDate,
                                 OrderID = o.OrderID,
                                 OrderTotal = o.Total,
                                 ShippingAddress1 = o.ShippingAddress1,
                                 ShippingAddress2 = o.ShippingAddress2,
                                 ShippingCity = o.ShippingCity,
                                 ShippingDate = o.ShippingDate,
                                 ShippingMethod = o.ShippingMethod,
                                 ShippingState = o.ShippingState,
                                 ShippingTotal = o.ShippingTotal,
                                 ShippingZip = o.ShippingZip,
                                 SubTotal = o.SubTotal,
                                 TaxTotal = o.TaxTotal,
                                 TrackingNumber = o.TrackingNumber,
                                 Details = o.OrderDetails.Select(od => new DataObjects.OrderItem()
                                 {
                                     PartID = od.PartID,
                                     Price = od.Price,
                                     Quantity = od.Quantity
                                 }).ToList()
                             };
                DataObjects.Order order = result.FirstOrDefault();
            }
        }
        public static void RunRefactoredInitialization()
        {
            using (SqlOrdersDataContext dc = new SqlOrdersDataContext())
            {
                var result = from o in dc.OrderHeaders
                             select MapOrderHeaderToDataObjectOrder(o);
                DataObjects.Order order = result.FirstOrDefault();
            }
        }

        private static LinqTest.DataObjects.Order MapOrderHeaderToDataObjectOrder(OrderHeader o)
        {
            return new DataObjects.Order()
            {
                CustomerID = o.CustomerID,
                OrderDate = o.OrderDate,
                OrderID = o.OrderID,
                OrderTotal = o.Total,
                ShippingAddress1 = o.ShippingAddress1,
                ShippingAddress2 = o.ShippingAddress2,
                ShippingCity = o.ShippingCity,
                ShippingDate = o.ShippingDate,
                ShippingMethod = o.ShippingMethod,
                ShippingState = o.ShippingState,
                ShippingTotal = o.ShippingTotal,
                ShippingZip = o.ShippingZip,
                SubTotal = o.SubTotal,
                TaxTotal = o.TaxTotal,
                TrackingNumber = o.TrackingNumber,
                Details = MapOrderDetailToDataObjectOrderItem(o)
            };
        }

        private static List<LinqTest.DataObjects.OrderItem> MapOrderDetailToDataObjectOrderItem(OrderHeader o)
        {
            return o.OrderDetails.Select(od => new DataObjects.OrderItem()
            {
                PartID = od.PartID,
                Price = od.Price,
                Quantity = od.Quantity
            }).ToList();
        }
    }

As you can see both public methods do the same thing. The second test was refactored easily using the Refactor-Extract Method menu item in Visual Studio. I load the orders from the database, take the SQL OrderHeader object and map it to the business object. The Details property on the Order object is simply a generic list of OrderItems. To retrieve them I do a quick lambda expression to query the OrderDetails relationship property. 

For some, this kind of refactoring goes without saying. Modularizing code like this makes it more maintainable and reusable. The concept can be foreign to some procedural programmers. With Visual Studio and add-ins like ReSharper, refactoring becomes so easy it's almost an afterthought to do it. For anyone who still doesn't see the benefit of refactoring, I hope this article will help you.

The testing program is pretty simple: pass in the number of iterations and a boolean to turn pre-JITing on or off.

[Program.cs - (some code removed for brevity)]
        static System.Diagnostics.Stopwatch stp = new System.Diagnostics.Stopwatch();
        static int Runs = 10;
        static bool PreJitRoutines = false;
        
        static void Main(string[] args)
        {
            ProcessCommandLineArguments(args);

            //Lets get JIT over all the methods in question
            if (PreJitRoutines)
            {
                PreJitTestRoutines();
            }
            //Display Current Selected Options
            Console.WriteLine("Number of Runs: {0}", Runs);
            Console.WriteLine("Pre JIT Enabled: {0}", PreJitRoutines);
            
            //Run and Measure Inline Test
            stp.Start();
            //i < Runs so the loop executes exactly Runs times
            for (int i = 0; i < Runs; i++)
            {
                Tests.RunInlineInitialization();
            }
            stp.Stop();
            //Display Test Results
            //Average uses Elapsed.Ticks (100 ns units); ElapsedTicks counts in Stopwatch.Frequency units
            Console.WriteLine("Inline Initialization Test: {0} , Average: {1}", stp.Elapsed, new TimeSpan(stp.Elapsed.Ticks / Runs));
           
            //Save Result for later calculations
            TimeSpan FirstRun = stp.Elapsed;
            
            //Reset StopWatch
            stp.Reset();

            //Run and Measure Refactored Test
            stp.Start();
            for (int i = 0; i < Runs; i++)
            {
                Tests.RunRefactoredInitialization();
            }
            stp.Stop();
            //Display Refactored test results
            Console.WriteLine("Refactored Initialization Test: {0} , Average: {1}", stp.Elapsed, new TimeSpan(stp.Elapsed.Ticks / Runs));
            
            //Perform and report comparisons between tests
            if (FirstRun.CompareTo(stp.Elapsed)<0)
                Console.WriteLine("Inline Construction Faster: {0:f}", stp.Elapsed.TotalMilliseconds / FirstRun.TotalMilliseconds);
            else
                Console.WriteLine("Refactored Construction Faster: {0:f}", FirstRun.TotalMilliseconds / stp.Elapsed.TotalMilliseconds);

        }
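
A note on the timing arithmetic: Stopwatch.ElapsedTicks counts in units of Stopwatch.Frequency, while TimeSpan ticks are always 100 ns, so the two are only interchangeable when the frequency happens to be 10 MHz. Below is a minimal sketch of a reusable timing helper that keeps the units straight. The names Timing, Measure, Average and FromStopwatchTicks are hypothetical and not part of the test program above.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Times `runs` calls of `action` and returns the total elapsed time.
    // One untimed warm-up call keeps first-call JIT cost out of the result.
    public static TimeSpan Measure(Action action, int runs)
    {
        action(); // warm-up, not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Average per call, computed on TimeSpan ticks (100 ns units).
    public static TimeSpan Average(TimeSpan total, int runs)
    {
        return TimeSpan.FromTicks(total.Ticks / runs);
    }

    // Converts raw Stopwatch ticks to a TimeSpan using Stopwatch.Frequency.
    public static TimeSpan FromStopwatchTicks(long stopwatchTicks)
    {
        double seconds = (double)stopwatchTicks / Stopwatch.Frequency;
        return TimeSpan.FromSeconds(seconds);
    }

    public static void Main()
    {
        int runs = 100;
        TimeSpan total = Measure(() => Math.Sqrt(12345.678), runs);
        Console.WriteLine("Total: {0} , Average: {1}", total, Average(total, runs));
    }
}
```

With something like this, each test becomes a single call such as Timing.Measure(Tests.RunInlineInitialization, Runs), and the warm-up call stands in for the pre-JIT switch.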

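I ran the tests in release mode with iterations of 1, 10, 100 and 1000. Two caveats when reading the output: in these captured runs each timed loop executed Runs + 1 times, and the Average column was derived from Stopwatch.ElapsedTicks, which counts in Stopwatch.Frequency units rather than TimeSpan's 100 ns ticks, so it does not line up with the totals. Both tests were measured the same way, so the totals and the final ratio are still comparable.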

>LinqTest.exe 1  true
Number of Runs: 1
Pre JIT Enabled: True
Inline Initialization Test: 00:00:00.0124051 , Average: 00:00:00.0177619
Refactored Initialization Test: 00:00:00.0121951 , Average: 00:00:00.0174613
Refactored Construction Faster: 1.02

>LinqTest.exe 10 true
Number of Runs: 10
Pre JIT Enabled: True
Inline Initialization Test: 00:00:00.0701888 , Average: 00:00:00.0100497
Refactored Initialization Test: 00:00:00.0650985 , Average: 00:00:00.0093209
Refactored Construction Faster: 1.08

>LinqTest.exe 100 true
Number of Runs: 100
Pre JIT Enabled: True
Inline Initialization Test: 00:00:00.6291376 , Average: 00:00:00.0090081
Refactored Initialization Test: 00:00:00.5354964 , Average: 00:00:00.0076673
Refactored Construction Faster: 1.17

>LinqTest.exe 1000 true
Number of Runs: 1000
Pre JIT Enabled: True
Inline Initialization Test: 00:00:06.2699034 , Average: 00:00:00.0089773
Refactored Initialization Test: 00:00:05.3725538 , Average: 00:00:00.0076925
Refactored Construction Faster: 1.17

As you can see, at 1 iteration there's barely a difference. As we move up the scale the refactored code consistently outperforms the inline expression. This outcome was different from my initial hypothesis, so I decided to dig a bit deeper and find out why. I pulled out ILDasm to see what was going on and was surprised to see that the IL generated for the inline test was twice as long as for the refactored test. Looking at the code, it became clear what was going on.

Inline IL
 IL_0042:   stloc.3
 IL_0043:   ldloc.3
 IL_0044:   ldc.i4.0
 IL_0045:   ldtoken    method instance void LinqTest.DataObjects.Order::set_CustomerID(int32)
 IL_004a:   call       class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle(valuetype [mscorlib]System.RuntimeMethodHandle)
 IL_004f:   castclass  [mscorlib]System.Reflection.MethodInfo
 IL_0054:   ldloc.2
 IL_0055:   ldtoken    method instance int32 LinqTest.OrderHeader::get_CustomerID()
 IL_005a:   call       class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle(valuetype [mscorlib]System.RuntimeMethodHandle)
 IL_005f:   castclass  [mscorlib]System.Reflection.MethodInfo
 IL_0064:   call       class [System.Core]System.Linq.Expressions.MemberExpression [System.Core]System.Linq.Expressions.Expression::Property(class [System.Core]System.Linq.Expressions.Expression, class [mscorlib]System.Reflection.MethodInfo)
 IL_0069:   call       class [System.Core]System.Linq.Expressions.MemberAssignment [System.Core]System.Linq.Expressions.Expression::Bind(class [mscorlib]System.Reflection.MethodInfo, class [System.Core]System.Linq.Expressions.Expression)

Refactored IL
 IL_0005:   stloc.0
 IL_0006:   ldloc.0
 IL_0007:   ldarg.0
 IL_0008:   callvirt   instance int32 LinqTest.OrderHeader::get_CustomerID()
 IL_000d:   callvirt   instance void LinqTest.DataObjects.Order::set_CustomerID(int32)

The inline call uses reflection to build the instantiation into the expression tree. It has to look up the SQL object's OrderHeader.CustomerID getter and the business object's Order.CustomerID setter via reflection (the GetMethodFromHandle calls), wrap the getter in a property expression, and then bind that expression to the business object's property (Expression.Property and Expression.Bind).

The refactored code skips the reflection entirely. Since the mapping method expects the ORM's OrderHeader object, LINQ to SQL just needs to do what it does best: load data from the database and map it to the ORM object. The refactored methods then only need to do straight property assignment.
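
To see the two shapes side by side without a database, here is a small sketch. Source and Target are hypothetical stand-ins for OrderHeader and DataObjects.Order, and MappingForms is a made-up name; the point is only what the compiler produces when a lambda is assigned to Expression<Func<...>>, which is the form IQueryable operators like Select take.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-ins for the ORM object and the business object.
public class Source { public int CustomerID { get; set; } }
public class Target { public int CustomerID { get; set; } }

public static class MappingForms
{
    // Plain method: compiles to direct get/set calls, like the refactored IL.
    public static Target Map(Source s)
    {
        return new Target { CustomerID = s.CustomerID };
    }

    // Inline form: the compiler emits code that builds a tree describing
    // the object initializer, one binding per assigned property.
    public static Expression<Func<Source, Target>> InlineTree()
    {
        return s => new Target { CustomerID = s.CustomerID };
    }

    // Refactored form: the tree is a single call to Map.
    public static Expression<Func<Source, Target>> RefactoredTree()
    {
        return s => Map(s);
    }

    public static void Main()
    {
        Expression<Func<Source, Target>> inline = InlineTree();
        Expression<Func<Source, Target>> refactored = RefactoredTree();

        Console.WriteLine(inline.Body.NodeType);     // MemberInit
        Console.WriteLine(refactored.Body.NodeType); // Call

        MemberInitExpression init = (MemberInitExpression)inline.Body;
        Console.WriteLine(init.Bindings.Count);      // 1

        // Both trees produce the same object once compiled and invoked.
        Target t = inline.Compile()(new Source { CustomerID = 7 });
        Console.WriteLine(t.CustomerID);             // 7
    }
}
```

With fifteen properties plus a nested collection, the inline tree needs a binding per property, which lines up with the IL being so much longer.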

Now, most of this testing was done with the C# LINQ query syntax and not the chained function calls. I'm going to dig a bit deeper, recreate this with pure lambda expressions, and see how that stacks up. I have a feeling they will probably perform close to the refactored examples.
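
Before running that experiment, it may be worth checking how different the two forms actually are. The C# compiler translates query syntax into the same chained method calls, so a quick sketch like the one below (using a made-up SyntaxForms class over an in-memory IQueryable<int>, not the LINQ to SQL context) can compare the expression trees each form produces.

```csharp
using System;
using System.Linq;

public static class SyntaxForms
{
    // Query (comprehension) syntax.
    public static IQueryable<int> QuerySyntax(IQueryable<int> source)
    {
        return from n in source
               where n > 1
               select n * 2;
    }

    // The same query written as chained calls with lambdas.
    public static IQueryable<int> MethodSyntax(IQueryable<int> source)
    {
        return source.Where(n => n > 1).Select(n => n * 2);
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();

        // Print the expression tree each form hands to the query provider.
        Console.WriteLine(QuerySyntax(source).Expression);
        Console.WriteLine(MethodSyntax(source).Expression);
    }
}
```

If the two printed trees match, any difference from the refactored version comes from moving the mapping into a method rather than from the syntax used to write the query.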

Another thing that makes me curious is the plateau reached and the differences between 1 and 100 iterations. I have a sneaking suspicion that some of this may be related to the JIT compiler and the garbage collector optimizing for the pattern of execution. I'll probably throw in some garbage collection counters to see how different they are.
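
A minimal sketch of what those counters could look like, using GC.CollectionCount before and after a block of work. GcProbe and CountCollections are hypothetical names, and the allocation loop in Main is only a placeholder for the real test calls.

```csharp
using System;

public static class GcProbe
{
    // Returns, per generation, how many collections ran while `action` executed.
    public static int[] CountCollections(Action action)
    {
        int generations = GC.MaxGeneration + 1;
        int[] before = new int[generations];
        for (int g = 0; g < generations; g++)
        {
            before[g] = GC.CollectionCount(g);
        }

        action();

        int[] delta = new int[generations];
        for (int g = 0; g < generations; g++)
        {
            delta[g] = GC.CollectionCount(g) - before[g];
        }
        return delta;
    }

    public static void Main()
    {
        // Placeholder workload: allocate many short-lived arrays.
        int[] counts = CountCollections(() =>
        {
            for (int i = 0; i < 100000; i++)
            {
                byte[] scratch = new byte[1024];
                scratch[0] = 1;
            }
        });

        for (int g = 0; g < counts.Length; g++)
        {
            Console.WriteLine("Gen {0} collections: {1}", g, counts[g]);
        }
    }
}
```

Wrapping each test loop in CountCollections would show whether the inline and refactored versions trigger a different number of collections as the iteration count grows.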

So for now the moral of the story is that refactoring is not only good for code reuse and simplicity, it can also help performance. Without moving the mapping into its own method, every call that repeats it would have increased JIT time due to carrying more code than needed. The mapping method only needs to be JITed once.

# Tuesday, June 24, 2008
by Jeff Klawiter - Tuesday, June 24, 2008 6:25:49 PM (Central Standard Time, UTC-06:00)
Well I'm finally opening a real blog. Not sure how much I will be updating this but hey I have a real online presence again. The old J-Maxx Net will stay as it is. Still get a decent amount of traffic on there (kicking myself for not installing adsense years ago).

To start off with: I love LINQ. I've been looking into it for over a year now but finally have a project where I get to use it fully. C# 3.0 has added so many features that I no longer pine for PHP as I once did. Here's a sample of something I wrote the other day.

            var result = from c in CurrentDataContext.Categories
                         join localinfo in CurrentDataContext.CategoryLocalizationInfos
                            on c.CategoryID equals localinfo.CategoryID
                         where c.CategoryID == CategoryID
                         && localinfo.Language == Language
                         select new Business.Data.Category()
                         {
                             ID = c.CategoryID,
                             Description = localinfo.Description,
                             Languages = GetLanguagesForCategory(c.CategoryID),
                             Name = c.Name,
                             SortOrder = c.SortOrder,
                             Title = localinfo.Title,
                             NavImage = c.NavImage,
                             Language = localinfo.Language,
                             Links = GetLinksForCategory(c.CategoryID, Language),
                             IsDataLoadedFromSql = true
                         };
