Sunday, September 27, 2009

Thinking Functionally in XSLT

My current (almost done) project at work is a single-source publication pipeline that converts XML to XSL-FO with XSLT. I'm working with four other developers: one who sits next to me, two at a sister company down in Pittsburgh, whom I've met with a couple of times in the last year, and one paid contractor at another remote site. We write stylesheets in Altova XMLSpy, run the transforms in Saxon (good support for XSLT 2.0 features like xsl:for-each-group and XPath 2.0 expressions like select="* except title"), and feed the XSL-FO output into Antenna House for its XSL 1.1 support (standard change bars and retrieve-table-marker). Our distributed coding process demands a Software Configuration Management tool, and unfortunately corporate availability dictates that we use SourceSafe rather than something a little more useful like git or Mercurial.

One of the things I'm seeing the programmers on this project do (and I have to admit, I'm guilty of setting up an early template like this myself) is write XSLT, which should be functional, the way one would write imperative procedural or object-oriented code: a template that reacts to matching an element and then dives into a series of xsl:choose cases, or a named template that accepts a bunch of parameters. The problem is not readily apparent when writing one's own templates, but it becomes obvious when others need to reuse or extend the functionality.
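
To make the contrast concrete, here is a minimal sketch of that imperative shape. The element names (para, note, entry) and the formatting are made up for illustration and are not from our DTD:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Imperative shape: one rule matches the element, then branches on
       context. para, note and entry are hypothetical element names. -->
  <xsl:template match="para">
    <xsl:choose>
      <xsl:when test="parent::note">
        <fo:block font-style="italic"><xsl:apply-templates/></fo:block>
      </xsl:when>
      <xsl:when test="parent::entry">
        <fo:block font-size="9pt"><xsl:apply-templates/></fo:block>
      </xsl:when>
      <xsl:otherwise>
        <fo:block><xsl:apply-templates/></fo:block>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Anyone who needs a new case has to edit this one template. Written as separate rules instead (match="note/para", match="entry/para", and a plain match="para" fallback), each case could live in its owner's module, and the processor's pattern matching would do the branching.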

For example, in our case we divided up all of the (180-200) elements of our DTD by category. One developer handles the bulk of the block-level and inline content, another handles the auto-generated and/or tabular-formatted front matter, I handle the inline and numbered tables, my buddy on-site does sequential lists, and the contractor does all of the graphics and extractable supplements.

One of the first things that I added to the repository was a named template that produced the static headers and footers: classification markings; volume number and title; chapter/section type, number, and title; revision number and date; and a page number format to be followed by an fo:page-number (we didn't know about the XSL-FO 1.1 folio prefix/suffix features). All of these values were passed to the template as parameters. We all called this template repeatedly throughout our code, any time we had a high-level element that needed to produce a page-sequence.

Now I'm kicking myself for writing it that way, because I have a big match template in my tables module that matches several types of tables and calls the static content template just once in response to any of them. Some of the parameters to the called template are optional, but XSLT doesn't let me wrap an xsl:with-param inside an xsl:if or xsl:choose. You can do it the other way around, placing the conditional inside the xsl:with-param, but then you're always passing something in (possibly an empty value) rather than letting the param take its default. If I had instead written the template to apply moded match templates to the context node in the places where it now refers to the params, we could each write the appropriate moded match templates in our own modules. I could match my table elements with mode="revdate", for instance, and have that template return the info appropriate to the particular element.
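
A rough sketch of what that moded version could look like. This is not our actual template: the mode name static-content, the footer flow name, and the revdate attribute are assumptions made up for the example; only mode="revdate" comes from the idea above:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Shared module: builds the footer for whatever element is the
       context node. No parameters; each value comes from a moded hook.
       The mode name and flow-name are hypothetical. -->
  <xsl:template match="*" mode="static-content">
    <fo:static-content flow-name="footer">
      <fo:block text-align="center">
        <xsl:apply-templates select="." mode="revdate"/>
        <xsl:text> </xsl:text>
        <fo:page-number/>
      </fo:block>
    </fo:static-content>
  </xsl:template>

  <!-- Default hook: take the revdate attribute (hypothetical) from the
       element itself or from its nearest ancestor that has one. -->
  <xsl:template match="*" mode="revdate">
    <xsl:value-of select="ancestor-or-self::*[@revdate][1]/@revdate"/>
  </xsl:template>

  <!-- Tables module: override only the hook that differs for tables. -->
  <xsl:template match="table" mode="revdate">
    <xsl:text>Rev. </xsl:text>
    <xsl:value-of select="@revdate"/>
  </xsl:template>

</xsl:stylesheet>

Any template that opens a page-sequence would then do an apply-templates with select="." and mode="static-content" instead of calling a named template with a list of xsl:with-param children. An optional value needs no conditional at the call site; a module either overrides the hook or it doesn't.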

A similar thing happened when the programmer doing the main content templates wrote one for the "title" element that produced a block in certain contexts and did nothing in the default case. The rest of us, whose elements also contain a title, could not just apply-templates to title and get a block, because the do-nothing default would take over. We ended up having to output the block ourselves and then possibly apply moded templates inside it. In my case, tables can have multiple sheets: a table's title outputs as "Table X. Title", but a sheet's title looks like "(Sheet X) Title" under the table title, so I make separate match templates for table/title and sheet/title.

Doing it functionally like that means writing more templates, but the templates are shorter, more shallowly nested, and more focused. That makes them easier to maintain: you can tell the purpose of a short template at a glance, and you don't get lost in a larger multi-screen one. I personally hate debugging by jumping to a particular line or searching for text, winding up in the middle of a multi-screen-long template, and not knowing the name or match rule of the template I landed in. It also makes the templates more extensible - for example, consider this as a template for titles:
<xsl:template match="title">
  <fo:block font-weight="bold">
    <xsl:apply-templates select="." mode="title"/>
  </fo:block>
</xsl:template>

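Pretty simple, right? Applying with a mode gives every title a usable default: if nobody has written a moded template for it, the built-in rules still output its text. (Strictly speaking, the built-in rules keep processing child elements in the same mode, so a one-line match="title" mode="title" template that does a plain apply-templates is what sends the content back to the default mode.) And if we override that by providing a more specific moded match template, we can define specific behavior; the mode gives us a nice hook to accomplish this. For instance, a table's title gets the table number prefix, and it should be centered instead of left-aligned, so: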
<xsl:template match="table/title" mode="title">
  <xsl:attribute name="text-align">center</xsl:attribute>
  Table <xsl:value-of select="@label" />. <xsl:apply-templates />
</xsl:template>
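
For titles that have no specific rule, a catch-all in the same mode can hand the content back to the default mode, since the built-in rules would otherwise keep processing child elements in mode="title". A minimal sketch, which also shows the sheet case mentioned earlier; the label attribute on the sheet element is an assumption:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Catch-all for mode="title": no prefix, no extra attributes;
       children go back to the default (unnamed) mode. -->
  <xsl:template match="title" mode="title">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- A more specific pattern has a higher default priority, so it wins
       over the catch-all for sheet titles. The label attribute on the
       parent sheet is hypothetical. -->
  <xsl:template match="sheet/title" mode="title">
    <xsl:text>(Sheet </xsl:text>
    <xsl:value-of select="../@label"/>
    <xsl:text>) </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>

Because sheet/title is a more specific pattern than title, no explicit priority attribute is needed; the same goes for the table/title rule.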

The more hooks we give to other programmers, the more cohesive the entire project becomes. So although concepts like pattern matching and recursion can make a functional language like XSLT harder for a programmer versed in object-oriented or procedural scripting languages to wrap their head around, the end result is much easier to read and understand, and therefore to maintain.