
Introducing Schematron

Schematron is a rule-based validation language for making assertions about patterns found in XML documents. It is a simple language that is itself expressed in XML and uses standard XPath to specify the assertions. The Schematron definitions (a.k.a. schemas) can be processed with standard XSL templates, which makes Schematron applicable in a variety of scenarios.

Although a Schematron definition is referred to as a schema, Schematron differs in its basic concept from other schema languages: it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented that would be difficult to express with grammar-based schema languages. For instance, imagine what a typical schema would look like for the following XML document -

<?xml version="1.0" encoding="UTF-8"?>
<instance>
####<person>
########<fname/>
########<lname/>
####</person>
</instance>

I guess it's a no-brainer! It would be something like this -

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
####<xs:element name="instance">
########<xs:complexType>
############<xs:sequence>
################<xs:element name="person">
####################<xs:complexType>
########################<xs:sequence>
############################<xs:element name="fname" type="xs:string"/>
############################<xs:element name="lname" type="xs:string"/>
########################</xs:sequence>
####################</xs:complexType>
################</xs:element>
############</xs:sequence>
########</xs:complexType>
####</xs:element>
</xs:schema>

You must already be wondering how the same schema would look in the "Schematron" world, right? Well, here is the answer -

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron" >
####<pattern name="assert validity">
########<rule context="instance">
############<assert test="person">person element is missing.</assert>
########</rule>
########<rule context="person">
############<assert test="fname">fname element is missing.</assert>
############<assert test="lname">lname element is missing.</assert>
########</rule>
####</pattern>
</schema>

Now, isn't that a much more readable version of a schema? Do note that the two are not strictly equivalent: the XML Schema also fixes the order of fname and lname and rejects unexpected elements, while these assertions only check that the named children exist.

A closer look at the above document reveals how Schematron differs in its fundamental approach to validation: the crux is not to define the structure of the document (that is what the traditional schema languages do) but to assert it. Think of it as something like JUnit or NUnit for the XML world, where one writes assert statements to check the validity of an object's state. And just as in JUnit or NUnit, Schematron lets one attach custom messages to assert conditions.

Schematron can render custom messages in two cases, viz. -

1. "report": the message is rendered when the tested pattern is present
2. "assert": the message is rendered when the tested pattern is absent

The following Schematron document illustrates the usage of the features discussed above -


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron" >

####<!-- Render messages in case the elements are found. -->
####<pattern name="report validity">
########<rule context="instance">
############<report test="person">person element is present.</report>
########</rule>
########<rule context="person">
############<report test="fname">fname element is present.</report>
############<report test="lname">lname element is present.</report>
########</rule>
####</pattern>

####<!-- Render messages in case the elements are not found. -->
####<pattern name="assert validity">
########<rule context="instance">
############<assert test="person">person element is missing.</assert>
########</rule>
########<rule context="person">
############<assert test="fname">fname element is missing.</assert>
############<assert test="lname">lname element is missing.</assert>
########</rule>
####</pattern>

</schema>
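
To make the report/assert polarity concrete, here is a minimal Java sketch that mimics what the two patterns above do, using only the JDK's XPath API. It is not a Schematron processor: the class name AssertReportSketch, the messages() method, and the table of checks are hypothetical names made up for illustration, and each rule context is approximated by an XPath such as //person.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any Schematron toolkit.
final class AssertReportSketch {

    // Each row: kind, rule context (approximated as an XPath), test, message.
    private static final String[][] CHECKS = {
        {"report", "//instance", "person", "person element is present."},
        {"report", "//person", "fname", "fname element is present."},
        {"report", "//person", "lname", "lname element is present."},
        {"assert", "//instance", "person", "person element is missing."},
        {"assert", "//person", "fname", "fname element is missing."},
        {"assert", "//person", "lname", "lname element is missing."},
    };

    /** Returns the messages the checks above would emit for the given XML text. */
    static List<String> messages(String xml) {
        List<String> emitted = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String[] check : CHECKS) {
                // Find every node the rule applies to.
                NodeList contexts = (NodeList) xpath.evaluate(
                        check[1], doc, XPathConstants.NODESET);
                for (int i = 0; i < contexts.getLength(); i++) {
                    // Evaluate the test relative to the context node.
                    boolean testHolds = (Boolean) xpath.evaluate(
                            check[2], contexts.item(i), XPathConstants.BOOLEAN);
                    // A report fires when its test is true; an assert when it is false.
                    boolean fires = check[0].equals("report") ? testHolds : !testHolds;
                    if (fires) {
                        emitted.add(check[3]);
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the checks", e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // A person without an lname element.
        String missingLname = "<instance><person><fname/></person></instance>";
        messages(missingLname).forEach(System.out::println);
    }
}
```

For the document in main(), the two "present" messages for person and fname are printed, followed by "lname element is missing.": the same test (lname) stays silent as a report and speaks up as an assert.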


Now that we know what Schematron-based schemas look like, the next obvious question is - "How does one validate a document against a Schematron schema?" Well, I already addressed that question in the first paragraph, with the following statement -

"The Schematron definitions (a.k.a. schemas) can be processed with standard XSL templates."

The following commands illustrate the basic processing involved in validating a document with Schematron: the first compiles the rules into a validating stylesheet, and the second applies that stylesheet to the document. Here xslt stands in for your XSLT processor's command line -

xslt -stylesheet schematron-message.xsl SchematronRules.xml > compiled-SchematronRules.xsl
xslt -stylesheet compiled-SchematronRules.xsl TestData.xml
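
The same two steps can be sketched in Java with the JDK's standard javax.xml.transform (JAXP) API. This is only an illustration under stated assumptions, not the Schematronize.java validator mentioned below: the class name TwoStepValidation and its methods are hypothetical, and the three file paths are expected as command-line arguments.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration; names are not from any Schematron toolkit.
final class TwoStepValidation {

    /** Applies the stylesheet to the input and returns the output as text. */
    static String transform(Source stylesheet, Source input) {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(input, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Step 1 compiles the rules into a stylesheet; step 2 runs it over the data. */
    static String validate(Source metaStylesheet, Source rules, Source data) {
        String compiled = transform(metaStylesheet, rules);
        return transform(new StreamSource(new StringReader(compiled)), data);
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println(
                    "usage: TwoStepValidation <meta-stylesheet.xsl> <rules.xml> <data.xml>");
            return;
        }
        // Building each StreamSource from a File keeps its location, so relative
        // imports inside the meta-stylesheet can be resolved next to it.
        System.out.println(validate(
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])),
                new StreamSource(new File(args[2]))));
    }
}
```

Run with schematron-message.xsl, SchematronRules.xml and TestData.xml as the three arguments, this would mirror the two commands above, with the compiled stylesheet kept in memory instead of being written to compiled-SchematronRules.xsl.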

Now if that looks a little complex, here is a simple Schematron document validator (implemented in Java) that I developed to ease this complexity -

- Schematronize.java

The validator listed above requires the following XSL templates to be available locally -

- skeleton1-5.xsl
- schematron-message.xsl

That's all for now... I'll leave it here for you to play with. I hope you enjoyed reading this article. I look forward to your feedback and implementation details around Schematron, so please do take a moment and drop in a note.

Adieu.

Some References -

- The Schematron Website
- The Academia Sinica Computing Centre's Schematron Home Page
