.NN#11: Extending XSLT Stylesheets with .Net Code

Wow, it’s been over a month since I last posted a .Net Nugget.  Sorry about that.  I started this one about two weeks ago and have been slowly updating the draft as I found time.  It might have come out sooner if I hadn’t gotten Mass Effect.

For a developer, XML and its related technologies are excellent (almost mandatory) tools to have in your toolbox.  That means not just knowing what XML is and being able to throw some angle brackets together to produce a simple XML document, but also being familiar with the power of related technologies like XSLT.  With XSLT we can take a source XML document and transform it into a variety of output formats.  There are countless articles and tutorials on the web to help you get your head around XSLT if you are new to it; this .Net Nugget focuses on a more advanced capability that is available to us when using the .Net XML parser for transforms.

If you’ve ever written XSLT stylesheets you’ll remember that XSLT (mostly via XPath) does have some limited built-in functions.  In XSLT 1.0 these cover string manipulation, numeric and boolean helpers, and node-set operations. (Note that the baked-in XML bits in .Net handle XSLT 1.0, and not even .Net 3.5 can deal with XSLT 2.0 functions yet.  There are rumors of this being included in a future .Net framework release or a stand-alone web release.)  If these aren’t enough, and you are working with the Microsoft parser, you can also embed some JScript, C# or VB.Net to contain your own functions by using the <msxsl:script> tag.  This is nice because you can get at the Framework if you need to; however, as with any solution, there are some gotchas.  Using the <msxsl:script> tag is known to have memory repercussions in some situations: the script is compiled into an in-memory assembly each time the stylesheet is loaded, and those assemblies stay loaded until the AppDomain unloads.  Also, these scripts are embedded directly in the stylesheet, so the only reuse options you have are cut & paste or includes/imports.

While the msxsl:script tag can get you some additional power in your transforms, XSLT Extension Objects are even more powerful.  With extension objects you can create a .Net assembly that contains the functions you want to use within your XSLT and then reference them from within your stylesheets!  This promotes reuse of your functions and, in my opinion, allows for better unit testing of them.

For my example I’m going to use a completely arbitrary scenario.  In the built in functions there is no simple way to format a date.  You can create a date format function by piecing together the string manipulation functions, but it’s pretty nasty looking and you are still limited as you’ll see below:

Here is the XML I’m using:

<?xml version="1.0" encoding="utf-8"?>
<example>
    <myDate>2007-11-23T00:00:00.0000000-05:00</myDate>
    <myDate>2007-11-25T11:05:00.0000000-05:00</myDate>
    <myDate>2007-11-21T13:01:00.0000000-05:00</myDate>
    <myDate>2007-11-19T15:45:23.0000000-05:00</myDate>
</example>

Here is one way using string manipulation to format our date (I’m only showing a small part of the XSLT code here, just the template around displaying the date):

<xsl:template match="//myDate/text()">
    <xsl:element name="formattedDate">
        <xsl:value-of select="concat(substring-before(substring-after(., '-'), '-'), '/' ,substring-before(substring-after(substring-after(., '-'), '-'), 'T'), '/', substring-before(., '-'))" />
    </xsl:element>
</xsl:template>

The code above produces dates in month/day/year format.  The output looks like this:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <formattedDate>11/23/2007</formattedDate>
  <formattedDate>11/25/2007</formattedDate>
  <formattedDate>11/21/2007</formattedDate>
  <formattedDate>11/19/2007</formattedDate>
</root>

This is really nasty looking code.  Can you imagine if you used this repeatedly over several locations in multiple stylesheets and then had to change your format for some reason?  Ouch.  Also, you’re limited to just pulling out the numbers when using substring.  What if you want “Dec” for December instead of 12?  Well, that’s just more code you have to nest together.

Let’s try to extract this function using the msxsl:script tag.

First we have to inform the parser that we will be using script within the stylesheet.  This is done by ensuring the msxsl namespace is declared on your root stylesheet tag.  It looks like the XSLT template in VS 2008 adds that in automatically.  We also add an additional namespace for our custom code; in this case I’m giving it the scr prefix.  Our stylesheet element now looks like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:scr="http://www.myNamespace">

Next we code our custom script function, then pass the value we want formatted to that function.  Below is a copy of the full stylesheet using the msxsl:script tag.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:scr="http://www.myNamespace">
    <xsl:output method="xml" indent="yes"/>
 
    <xsl:template match="example">
        <xsl:element name="root">
            <xsl:apply-templates select="//myDate"></xsl:apply-templates>
        </xsl:element>
    </xsl:template>
 
    <xsl:template match="myDate">
        <xsl:element name="formattedDate">
            <xsl:value-of select="scr:formatDateString(./text())" />
        </xsl:element>
    </xsl:template>
 
    <msxsl:script language="VB" implements-prefix="scr">
        <![CDATA[
        Function formatDateString(inputDate as string) as String
            Return FormatDateTime(inputDate, DateFormat.ShortDate)
        End Function
        ]]>
    </msxsl:script>
</xsl:stylesheet>

Note that we are passing the text value of the current node from the template into the formatDateString function.  The function then returns the formatted date using the intrinsic FormatDateTime VB.Net function.  I could have used JScript or C# as well.  The function looks a little cleaner than the previous code and we can reuse the script in more than one location within this stylesheet.  Note that the output of this stylesheet is exactly the same as before.  Also note that the function has a CDATA section around it.  This is important because there are many characters in code (such as < and &) that are not valid inside XML without being escaped.
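One thing the stylesheet doesn’t show is how it gets loaded.  As far as I know, XslCompiledTransform leaves embedded script disabled by default, so the stylesheet has to be loaded with an XsltSettings instance that turns script on.  Here is a minimal sketch of that, assuming the script version of the stylesheet is saved as ourStylesheet.xslt and the input as inputData.xml (both file names are just placeholders for this illustration), and assuming the .Net Framework rather than a newer runtime:

```vbnet
Imports System.Xml
Imports System.Xml.XPath
Imports System.Xml.Xsl

Module ScriptTransformSketch

    Sub Main()
        ' First argument: enableDocumentFunction, second argument: enableScript.
        Dim settings As New XsltSettings(False, True)

        ' Placeholder file name; point this at the stylesheet that contains msxsl:script.
        Dim styleSheet As New XslCompiledTransform()
        styleSheet.Load("ourStylesheet.xslt", settings, New XmlUrlResolver())

        ' Placeholder file name for the input document.
        Dim inputData As New XPathDocument("inputData.xml")

        ' Passing OutputSettings lets the writer honor the xsl:output element.
        Using writer As XmlWriter = XmlWriter.Create("output.xml", styleSheet.OutputSettings)
            styleSheet.Transform(inputData, writer)
        End Using
    End Sub

End Module
```

Only enable script for stylesheets you trust, since the embedded code runs with the permissions of your application.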

What if we wanted to share this function across stylesheets?  Well, you can use xsl:import or xsl:include to do this.  Each of your stylesheets could just include a common XSLT file that contains all your functions.  This works so long as you can pull off what you’re wanting to do within the confines of the stylesheet (and it doesn’t cause the memory issue I mentioned earlier).  But what if you wanted to use a function you already had in a compiled assembly?  That’s where extension objects come into play.

To keep with our previous example I’ve created a class in VB.Net that will be our extension:

Public Class FormatMethods
 
    Public Function FormatDateString(inputDate As string) As String
        Return FormatDateTime(inputDate, DateFormat.ShortDate)
    End Function
 
End Class

This is pretty much the exact code shown before in the msxsl:script block.  Notice there is nothing in this code that indicates it is used for XSLT transforms?  As long as what you want to use is a public method and returns something you can use (a string in most cases) you should be good.  Do note that the documentation indicates that you can’t use functions that use the params keyword (ParamArray in VB.Net), which allows an indeterminate number of parameters to be passed to the function.

Now, what follows is the code used to apply the transform:

Imports System.Xml.Xsl
Imports System.Xml.XPath
Imports System.Xml
 
Module Module1
 
    Sub Main()
 
        Dim styleSheet As New XslCompiledTransform
        styleSheet.Load("ourStylesheet.xslt")
 
        Dim inputData As New XPathDocument("inputData.xml")
 
        Dim argList As New XsltArgumentList
 
        argList.AddExtensionObject("http://www.myNamespace", New FormatMethods())
 
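        ' Dispose the writer so the output is flushed and the file is closed.
        Using writer As XmlWriter = XmlWriter.Create("output.xml", styleSheet.OutputSettings)
            styleSheet.Transform(inputData, argList, writer)
        End Using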
    End Sub
 
End Module

First we create and load up our stylesheet via the XslCompiledTransform class.  We also load up our input data into an XPathDocument object.  The key here is that we have to provide a map for the methods in our Extension Object, which we do using the System.Xml.Xsl.XsltArgumentList object.  Register the Extension Object by calling AddExtensionObject, providing the namespace you want to use for mapping and an instance of the extension object.  Then call the Transform method on the XslCompiledTransform instance and pass along the XsltArgumentList instance.

Below is the full stylesheet used when I added the extension object:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:scr="http://www.myNamespace">
    <xsl:output method="xml" indent="yes"/>
 
    <xsl:template match="example">
        <xsl:element name="root">
            <xsl:apply-templates select="//myDate"></xsl:apply-templates>
        </xsl:element>
    </xsl:template>
 
    <xsl:template match="myDate">
        <xsl:element name="formattedDate">
            <xsl:value-of select="scr:FormatDateString(./text())" />
        </xsl:element>
    </xsl:template>
 
 
</xsl:stylesheet>

Note that the msxsl namespace reference is not included, but that our custom namespace is.  The “http://www.myNamespace” URI is our mapping key.  Back in the module code above you’ll see that when we call AddExtensionObject we include this namespace.  The parser will then map this namespace to the methods within the extension object.  So when the parser sees scr:FormatDateString in the stylesheet it will call the FormatDateString method in our FormatMethods class.
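To make the namespace mapping a bit more concrete, here is a small self-contained sketch that also covers the “Dec instead of 12” case from earlier.  The MonthNameMethods class, its FormatWithMonthName method and the inline stylesheet are all made up for this illustration; they are not part of the example project above.  The stylesheet and input are held in strings so nothing has to exist on disk, and the input date has no time zone offset so the result doesn’t depend on the machine’s local zone.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath
Imports System.Xml.Xsl

' Hypothetical extension class for illustration; not from the original example.
Public Class MonthNameMethods
    ' Formats a date string as "dd MMM yyyy" (month shown as a short name).
    Public Function FormatWithMonthName(ByVal inputDate As String) As String
        Dim parsed As DateTime = DateTime.Parse(inputDate, CultureInfo.InvariantCulture)
        Return parsed.ToString("dd MMM yyyy", CultureInfo.InvariantCulture)
    End Function
End Class

Module NamespaceMappingSketch

    ' The same URI must appear in the stylesheet and in AddExtensionObject.
    Private Const ExtensionNamespace As String = "http://www.myNamespace"

    Sub Main()
        ' Inline stylesheet: the scr prefix is bound to the mapping URI.
        Dim styleSheetText As String = _
            "<xsl:stylesheet version='1.0'" & _
            " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'" & _
            " xmlns:scr='" & ExtensionNamespace & "'>" & _
            "<xsl:output method='text'/>" & _
            "<xsl:template match='myDate'>" & _
            "<xsl:value-of select='scr:FormatWithMonthName(string(.))'/>" & _
            "</xsl:template>" & _
            "</xsl:stylesheet>"

        Dim inputXml As String = "<example><myDate>2007-11-23T00:00:00</myDate></example>"

        Dim styleSheet As New XslCompiledTransform()
        Using reader As XmlReader = XmlReader.Create(New StringReader(styleSheetText))
            styleSheet.Load(reader)
        End Using

        ' Register the object under the URI the stylesheet uses for the scr prefix.
        Dim argList As New XsltArgumentList()
        argList.AddExtensionObject(ExtensionNamespace, New MonthNameMethods())

        Dim inputData As New XPathDocument(New StringReader(inputXml))

        ' Should write "23 Nov 2007" to the console.
        styleSheet.Transform(inputData, argList, Console.Out)
    End Sub

End Module
```

The prefix itself (scr here) is only local to the stylesheet; it is the URI that ties the function call to the registered object.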

My example is very weak since I’m just dealing with formatting a date, but the technique is there for whatever you need to do.  A few benefits I see with using extension objects:

  • More concise code in both your helper functions within the Extension Object and in your stylesheets.
  • Share commonly used functions across stylesheets.
  • Have the full functionality of the .Net Framework inside your transforms; just be aware of what you’re doing.
  • And the #1 benefit (IMO): Better unit testing of your functions.  I can easily create a unit test around the FormatDateString function and ensure it is doing what I want it to.  This is much easier than trying to test code that is trapped inside a script tag within a stylesheet.
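As a rough idea of what such a test could look like without pulling in a test framework, here is a sketch that exercises a copy of the FormatMethods class from a console module.  A few things here are my own assumptions for the sake of a repeatable check: the culture is pinned to en-US (ignoring user overrides) so the short date format is predictable, the input has no time zone offset so the local zone doesn’t shift the date, and I’ve added an explicit CDate so the class also compiles with Option Strict On.

```vbnet
Imports System
Imports System.Globalization
Imports System.Threading

' Copy of the extension class, with an explicit CDate conversion added.
Public Class FormatMethods
    Public Function FormatDateString(ByVal inputDate As String) As String
        Return FormatDateTime(CDate(inputDate), DateFormat.ShortDate)
    End Function
End Class

Module FormatMethodsCheck

    Sub Main()
        ' Pin the culture (without user overrides) so ShortDate is M/d/yyyy.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US", False)

        Dim target As New FormatMethods()
        Dim actual As String = target.FormatDateString("2007-11-23T12:00:00")

        ' Fail loudly if the formatting ever changes.
        If actual <> "11/23/2007" Then
            Throw New Exception("Expected 11/23/2007 but got " & actual)
        End If

        Console.WriteLine("FormatDateString check passed")
    End Sub

End Module
```

In a real project the same check would sit in whatever unit test framework you already use; the point is that no stylesheet or transform is needed to test the function.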

This is a cool tool to put in your toolbox for when you need to share some code between stylesheets.  Be aware of what you are doing with it and, as always, check the performance of your project.  Watch whether what you are doing within the extension objects is causing memory bloat or other side effects.

Security Note: The use of Extension objects requires Full Trust.  Bummer. Something to keep in mind.

Reader’s Note: Do not confuse Extension Objects for XSLT with the new Extension Methods feature in .Net.  They are completely different.

All code compiled and tested using VS.Net 2008 targeting the .Net 2.0 Framework.