I really like working with XML and XSLT transformations. I have used them to create subsets of data in code for clients. There is something simple and beautiful about transforming a base document into another document for another purpose. Then I saw a job on Guru that required just that, and I thought I ought to dust off the books and put together a simple command line program that performs a transformation in a generic way. So I did, and wrote this program:
# Name: xml xslt processor
# Purpose: Skill demonstration using lxml technologies
#
# Author: Demolishun
#
# Created: 14/03/2017
# Copyright: (c) Demolishun 2017
# Licence: all rights reserved
#-------------------------------------------------------------------------------
import os
import sys
import argparse
import lxml.etree as etree
# check input files
def checkinputfiles(args):
    # check files exist
    if not os.path.isfile(args.xml_in) or not os.path.isfile(args.xslt_in):
        return None

    # check if files are xml (parsing fails on malformed input)
    xml_in = None
    try:
        xml_in = etree.parse(args.xml_in)
    except etree.XMLSyntaxError:
        print("Error processing input xml file: {}".format(args.xml_in))
        return None

    xslt_in = None
    try:
        xslt_in = etree.parse(args.xslt_in)
    except etree.XMLSyntaxError:
        print("Error processing input xslt file: {}".format(args.xslt_in))
        return None

    return {"xml_in": xml_in, "xslt_in": xslt_in}

# process xml using xslt
def processxml(trees):
    # create transform from xslt
    transform = etree.XSLT(trees["xslt_in"])

    # process xml using transform
    outdoc = transform(trees["xml_in"])

    return outdoc

def main():
    # create argument parser
    parser = argparse.ArgumentParser(description='Convert XML document using XSLT document.')

    # add inputs for arguments
    parser.add_argument('xml_in')
    parser.add_argument('xslt_in')
    parser.add_argument('doc_out')

    # parse arguments, close app if correct args are not present
    args = parser.parse_args()
    #print args

    # determine if files are present and are indeed xml files
    trees = checkinputfiles(args)
    if trees is None:
        # reached when a file is missing or when parsing failed
        print("One or more input files missing or not valid XML: {}, {}".format(args.xml_in, args.xslt_in))
        return 1

    # process input xml against input xslt
    outdoc = processxml(trees)
    #print outdoc

    # write the transform result; the file is closed even if the write fails
    try:
        with open(args.doc_out, "w") as text_file:
            text_file.write(str(outdoc))
    except IOError:
        print("Error writing output file: {}".format(args.doc_out))
        return 1

    return 0

if __name__ == '__main__':
    # pass main()'s return value to the shell as the exit status
    sys.exit(main())
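The argument handling can be tried on its own, without any files on disk, by handing `parse_args` a list instead of letting it read `sys.argv`. The following is only a minimal sketch: the `build_parser` helper and the three file names are made up for illustration and are not part of the program above.

```python
import argparse

def build_parser():
    # Same three positional arguments as the program above.
    parser = argparse.ArgumentParser(description='Convert XML document using XSLT document.')
    parser.add_argument('xml_in')
    parser.add_argument('xslt_in')
    parser.add_argument('doc_out')
    return parser

# Hypothetical file names, passed as a list rather than taken from sys.argv.
args = build_parser().parse_args(['data.xml', 'transform.xslt', 'out.txt'])
print(args.xml_in, args.xslt_in, args.doc_out)
```

Positional arguments are filled in order, so the first name lands in `args.xml_in`, the second in `args.xslt_in` and the third in `args.doc_out`. If one is missing, argparse prints a usage message and exits before any of the file checks run.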
I then used this source XML document:
<dataroot>
  <data>
    <name>Data1</name>
    <trait>red</trait>
  </data>
  <data>
    <name>Data2</name>
    <trait>blue</trait>
  </data>
  <data>
    <name>Data3</name>
    <trait>yellow</trait>
  </data>
  <data>
    <name>Data4</name>
    <trait>blue</trait>
  </data>
</dataroot>
and this XSLT document:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:for-each select="dataroot/data">
      <xsl:value-of select="name"/> = <xsl:value-of select="trait"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
which transformed into this:
Data1 = red
Data2 = blue
Data3 = yellow
Data4 = blue
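To make it clearer what the stylesheet is doing, here is a rough equivalent of its `for-each` loop written with only the Python standard library. This is not how the program above works (it hands the job to lxml's XSLT engine); it only mirrors the stylesheet's logic of visiting each `data` element under `dataroot` and emitting one `name = trait` line. The function name `name_trait_lines` is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The source document from above, inlined so the sketch is self-contained.
SOURCE = """<dataroot>
  <data><name>Data1</name><trait>red</trait></data>
  <data><name>Data2</name><trait>blue</trait></data>
  <data><name>Data3</name><trait>yellow</trait></data>
  <data><name>Data4</name><trait>blue</trait></data>
</dataroot>"""

def name_trait_lines(xml_text):
    # fromstring returns the root element, here <dataroot>.
    root = ET.fromstring(xml_text)
    lines = []
    # findall('data') plays the role of select="dataroot/data".
    for data in root.findall('data'):
        # findtext plays the role of xsl:value-of on a child element.
        lines.append("{} = {}".format(data.findtext('name'), data.findtext('trait')))
    return lines

print("\n".join(name_trait_lines(SOURCE)))
```

The appeal of the XSLT version is that this logic lives in a separate document, so the same command line program can produce a different output format just by swapping the stylesheet.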
It was fun to write, and I got some practice doing command line argument processing with a library I had not used before. It was a good way to exercise some of my skills, and I now have a generic command line program for converting XML documents.