Wednesday, October 31, 2012

Using keys with XSLT 2.0

This article shows how to use keys to speed up XSLT processing when you are dealing with large input files (hundreds of megabytes). Consider the following example, where we have two input files (stock.xml and orderlines.xml). The idea is to update the stock with new quantities by processing the orderlines.

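The challenge here is how to use a key (built from matching orderlines) while processing the stock. It might sound trivial, but it is not: the two-argument form of key() only searches the document that contains the context node, so a lookup made while processing stock.xml never sees the orderline elements in orderlines.xml.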

stock.xml
<?xml version="1.0" encoding="UTF-8" ?>
<stock>
  <item id="PH3330L">
    <quantity>10</quantity>
  </item>
  <item id="BAS16">
    <quantity>7</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>

orderlines.xml
<?xml version="1.0" encoding="UTF-8" ?>
<orderlines>
  <orderline itemId="PH3330L">
    <quantity>4</quantity>
  </orderline>
  <orderline itemId="BAS16">
    <quantity>2</quantity>
  </orderline> 
</orderlines>

newstock.xml (expected output)
<?xml version="1.0" encoding="UTF-8"?>
<stock>
  <item id="PH3330L">
    <quantity>6</quantity>
  </item>
  <item id="BAS16">
    <quantity>5</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>

processOrderlines.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
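  <!-- URI of the orderlines document, passed in on the command line -->
  <xsl:param name="orderlinesURI" />
  <!-- Root element of the second input document -->
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <!-- Index orderline elements by the id of the item they refer to -->
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>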

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

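  <!-- Applying templates to the orderlines element makes it the context node,
       so key() in the template below searches orderlines.xml, not stock.xml -->
  <xsl:function name="pelssers:newQuantity" as="xs:double">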
    <xsl:param name="element" as="element(orderlines)"/>
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>
    <xsl:apply-templates select="$element">
      <xsl:with-param name="itemId" select="$itemId"/>
      <xsl:with-param name="stockQuantity" select="$stockQuantity"/>
    </xsl:apply-templates>
  </xsl:function>

  <xsl:template match="orderlines" as="xs:double">
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>    
    <xsl:sequence select="if (exists(key('orderline-lookup', $itemId))) 
                  then $stockQuantity - key('orderline-lookup', $itemId)/quantity else $stockQuantity"/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <quantity><xsl:sequence select="pelssers:newQuantity($orderlines, parent::item/@id, .)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
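
To see why the detour through a function and a template on orderlines is needed, here is a minimal sketch of the naive version. It is not part of the original demo; it assumes the same stock.xml and orderlines.xml as above and simply calls the two-argument key() directly from the quantity template. Because the context node belongs to stock.xml at that point, the lookup should find nothing and every quantity should come out unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative counter-example, not the author's stylesheet -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <xsl:param name="orderlinesURI"/>
  <!-- Declared as in the real stylesheet, but never consulted by the lookup below -->
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="stock/item/quantity">
    <!-- The context node is a quantity element in stock.xml, so the
         two-argument key() searches stock.xml, which has no orderline elements -->
    <xsl:variable name="orderline" select="key('orderline-lookup', parent::item/@id)"/>
    <quantity>
      <xsl:sequence select="if (exists($orderline))
                            then . - $orderline/quantity
                            else xs:double(.)"/>
    </quantity>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The only difference from a working version is which document is in context when key() is evaluated, which is exactly what processOrderlines.xslt changes by applying templates to $orderlines.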

For this demo I only used the Saxon jar from the command line:
java -Xmx1024m -jar Saxon-HE-9.4.jar 
  -s:C:/tmp/keydemo/input/stock.xml 
  -o:C:/tmp/keydemo/output/newstock.xml 
  -xsl:C:/tmp/keydemo/xslt/processOrderlines.xslt orderlinesURI=file:/C:/tmp/keydemo/input/orderlines.xml

Below is a simplified stylesheet that uses the third argument of key() to name the subtree to search (here the orderlines element), so no context switch is needed. It's based on a tip from @grtjn.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
  <xsl:param name="orderlinesURI" />
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <!-- The third argument restricts the key lookup to the subtree rooted at $orderlines -->
    <xsl:variable name="orderline" select="key('orderline-lookup', parent::item/@id, $orderlines)"/>
    <quantity><xsl:sequence select="if (exists($orderline)) then . - $orderline/quantity else xs:double(.)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
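
One thing both stylesheets assume is that each item appears in at most one orderline; with two orderlines for the same item, the subtraction would receive a sequence of two quantities and fail. The following is a hedged sketch of how the simplified stylesheet could be adapted for that case by summing the matching quantities first. It is my own variation under that assumption, not something covered by the demo files above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative variation: tolerates several orderlines per item -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <xsl:param name="orderlinesURI"/>
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="stock/item/quantity">
    <!-- sum() over an empty sequence yields 0, so items without
         orderlines keep their current quantity -->
    <xsl:variable name="ordered"
      select="sum(key('orderline-lookup', parent::item/@id, $orderlines)/quantity)"/>
    <quantity>
      <xsl:sequence select="xs:double(.) - $ordered"/>
    </quantity>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because sum() already handles the "no match" case, the if/else from the original template is no longer needed.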

3 comments:

  1. Nice feature, I didn't know about keys in XSLT.

    Off topic: I'm wondering why line numbers are incorrect in your code snippets. Why don't you fix it? :)

  2. haha.. Unfortunately I have other bugs to fix mate ;-)

  3. I will have to try applying some of this to those OntoML transforms
