<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" version="2.0">
  <channel>
    <title>Griffin Brown Weblog</title>
    <link>http://www.griffinbrown.co.uk/blog/</link>
    <description>Publishing Technology News &amp;amp; Views</description>
    <language>en-gb</language>
    <copyright>Griffin Brown Digital Publishing Ltd. All rights reserved.</copyright>
    <lastBuildDate>Sun, 04 May 2008 13:40:14 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 1.9.7174.0</generator>
    <managingEditor>info@griffinbrown.co.uk</managingEditor>
    <webMaster>info@griffinbrown.co.uk</webMaster>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</wfw:commentRss>
      <slash:comments>23</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Just when it seemed like nobody was interested in the <a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace">ODF
conformance smoke test</a> posted a few days ago, <a href="http://www.robweir.com/blog/2008/05/odf-validation-for-dummies.html">IBM's
Rob Weir weighs in with a lengthy piece in response.</a></p>
        <p>
Rob replicates the test I ran and runs a few of his own, finding ODF validation problems
along the way and ending with an eyebrow-raising take on this which, I think, sells
ODF seriously short.
</p>
        <p>
But before getting to that, a few technical things need to be put straight.
</p>
        <h4>Is the ODF schema broken?
</h4>
        <p>
One of the unexpected things I found in my test was that the ODF schema itself was
broken, leading me to conclude that there could be no valid ODF 1.0 documents in existence
as the schema simply could not be validated against.
</p>
        <p>
Rob doesn't believe there's a problem here (though he allows "Alex's proposed changes
to the schema are reasonable and should be considered" – too right!), and when he
finds a validator reporting the error I mention, he blithely disables the reporting
of that error so he can continue on to get a bunch of "error free" validation reports
when validating the ODF 1.0 spec.
</p>
        <p>
Why did Rob disable this error reporting? Well, he claims the standard allows him
to – he writes that "there is no claim whatsoever [in the ODF spec] that a conformant
ODF 1.0 document will conform to the ID/IDREF constraints defined in Relax NG DTD
Compatibility". Crucially, this claim is misguided.
</p>
        <p>
The ODF 1.0 spec makes explicit use of datatypes it names "ID" and "IDREF" – it states
that these are the W3C types as defined in <a href="http://www.w3.org/TR/2001/REC-xmlschema-2-20010502">XML
Schema Part 2</a>. If we look in turn at this document, it defines both of these types,
and states that they represent the same types from <a href="http://www.w3.org/TR/2000/REC-xml-20001006">XML
1.0 (Second Edition)</a>. And if we look back to that document we see that both these
types have a <a href="http://www.w3.org/TR/2000/REC-xml-20001006#id">bunch of validity
constraints</a> which need to be tested, such as the need for every IDREF to correspond
to some matching ID, or that ID values must be unique per document. For a document to be
valid according to these definitions, a validator must enforce the semantic constraints
associated with these datatypes. (To return to the "dummies" level, we might read
the helpful description from the <a href="http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/">XML
Schema Primer</a> which states: "XML 1.0 provides a mechanism for ensuring uniqueness
using the ID attribute and its associated attributes IDREF and IDREFS. This mechanism
is also provided in XML Schema through the ID, IDREF, and IDREFS simple types which
can be used for declaring XML 1.0-style attributes"). By switching this functionality
OFF Rob may be generating good spin for his blog, but he is not validating ODF correctly,
as he is ignoring the very type correctness checking that the ODF spec mandates through
its datatyping! (And worryingly, this gaffe has now been perpetuated in <a href="http://wiki.oasis-open.org/office/How_to_Validate_an_ODF_document">an
(official?) OASIS TC Wiki, on an immutable page!</a>.)
</p>
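        <p>
For readers who want to see what those two constraints amount to in practice, here is a minimal
Python sketch of an ID/IDREF check. It is only an illustration, not how jing or any other
validator implements it. The attribute names <tt>id</tt> and <tt>idref</tt> are hypothetical
stand-ins for whatever attributes a schema types as ID and IDREF. IDREFS (space-separated
lists of references) is left out for brevity.
</p>
        <pre>
```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical attribute names: a real check would use whichever
# attributes the schema types as xsd:ID and xsd:IDREF.
ID_ATTR = "id"
IDREF_ATTR = "idref"


def check_id_idref(root: ET.Element) -> list:
    """Apply the XML 1.0 'ID' and 'IDREF' validity constraints to a tree."""
    ids = [el.get(ID_ATTR) for el in root.iter() if el.get(ID_ATTR) is not None]
    errors = []
    # VC: ID -- each ID value must be unique within the document.
    for value, count in Counter(ids).items():
        if count > 1:
            errors.append("duplicate ID: " + value)
    # VC: IDREF -- each IDREF value must match some ID in the document.
    declared = set(ids)
    for el in root.iter():
        ref = el.get(IDREF_ATTR)
        if ref is not None and ref not in declared:
            errors.append("dangling IDREF: " + ref)
    return errors


# A small test tree, built programmatically.
doc = ET.Element("doc")
ET.SubElement(doc, "note", {ID_ATTR: "n1"})
ET.SubElement(doc, "note", {ID_ATTR: "n1"})      # duplicate ID
ET.SubElement(doc, "link", {IDREF_ATTR: "n1"})   # resolves
ET.SubElement(doc, "link", {IDREF_ATTR: "n9"})   # dangling

print(check_id_idref(doc))  # ['duplicate ID: n1', 'dangling IDREF: n9']
```
</pre>
        <p>
On this small tree the check reports both the duplicate <tt>n1</tt> and the dangling <tt>n9</tt>.
With ID/IDREF checking switched off, neither would be noticed.
</p>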
        <p>
Coming at this from another direction, we could also take into account the fact that
the RELAX NG used by ODF is not "pure" ISO/IEC 19757-2, but uses mechanisms from the
OASIS past of RELAX NG. In particular, it declares:
</p>
        <pre>datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"</pre>
        <p>
and in so doing brings into play RELAX NG's support for W3C XML Schema datatypes. The OASIS
spec describing this feature is <a href="http://relaxng.org/xsd-20010907.html">Guidelines
for using W3C XML Schema Datatypes with RELAX NG</a> and this refers to the very RELAX
NG compatibility features Rob claims we can safely ignore:
</p>
        <blockquote>
          <p>
[DTD Compatibility] defines the concept of an ID-type, which is an additional semantic
for datatypes that allows datatypes to have [XML 1.0] cross-reference semantics. An
implementation of [DTD Compatibility] that supports these guidelines should associate
the ID, IDREF and IDREFS datatypes of [W3C XML Schema Datatypes] with the ID-types
ID, IDREF, and IDREFS respectively. 
</p>
        </blockquote>
        <p>
The jing validator <b>does</b> support these guidelines, and accordingly performs
just such an association. As the co-author of the spec, James Clark (the author of
jing) can be relied on - rather more than Rob - to know what functionality applies
for a particular validation scenario.
</p>
        <p>
So, both formally and informally we should <b>not</b> be disabling ID/IDREF awareness
– and there is also a third, less dry technical reason why we should not: common sense.
ID/IDREF testing provides a useful first line of defence for our documents,
and catches such nonsense as duplicate IDs or broken links. Without it, we could
take the ODF spec as XML, make all the IDs in it identical, and then watch as Rob's
validation method passed the resulting rubbish as "a-okay". So I'm sorry, Rob,
but on all three counts the "it's error free if we disable error testing" approach
does not cut the mustard, and is simply not something the ODF spec entitles you to
do.
</p>
        <p>
Where I do agree is that we need to put this in perspective. Although these findings
are interesting in the context of the OOXML furore, they do not signal anything particularly
momentous about ODF. Defects get found; defects get fixed – the standard improves
and everybody is happy. Right?
</p>
        <h4>Negativity
</h4>
        <p>
Amid the general downer that is Rob's blog entry is an assumption that I share such
negative thoughts. I find myself described as "someone who would be well served if
he could show that all consortia standards are junk, and that only SC34 (and he himself)
could make them good". Hmmmmm - where did that come from?
</p>
        <p>
For the record, I am an enthusiastic supporter of consortia and consortium standards
and know from experience that consortia contain great people who are producing some
of the best standards work on the planet: XML 1.0, ODF, XSLT, UBL, OOXML (ha!) – the
list goes on. Most recently I was very pleased to see a new working draft of the important
new <a href="http://www.w3.org/XML/XProc/docs/langspec.html">W3C XProc specification</a> –
something that SC 34 is specifically deferring to rather than attempting something similar
itself. I thoroughly disapprove of the kind of oppositional mindset that sees things
in a polarised "ISO vs OASIS" or "ISO vs W3C" way. In my view that mode of thinking
already did enough damage during the DIS 29500 project.
</p>
        <h4>Tools that produce valid ODF?
</h4>
        <p>
Rob continues, re-running the tests I performed and finding the same result. Rob quibbles
with many aspects of the test (which is fine, this was just a "smoke test") but, after
all the huffing and puffing is done, we are left with the cold, hard fact that OpenOffice.org
2.4 (and, as Rob demonstrates, the <a href="http://odf-converter.sourceforge.net/">CleverAge
converter</a>) are not emitting valid ODF documents.
</p>
        <p>
It's at this point that things get a bit odd. Faced with the invalid documents before
him Rob writes:
</p>
        <blockquote>
          <p>
Conformance requires that [an application] is capable of writing out a valid document.
And of course, success for an ODF implementation requires that its conformance to
the standard is sufficient to deliver on the promises of the standard, for interoperability. 
</p>
        </blockquote>
        <p>
No. A conformant application needs to be more than "capable of" writing valid documents.
If it claims to be emitting ODF 1.0 then valid ODF 1.0 is what it <b>has to</b> emit
– the ODF schema is normative, not an optional extra. If the application fails to
do this, it is non-conformant and consequently has a bug which needs fixing. This is
what I would expect to be the message to OpenOffice: it has some (mild-looking) ODF
conformance bugs which need fixing. Let's fix the application, not try and re-define
what conformance means and pretend all is well!
</p>
        <p>
Rob then moves on to compare the corpus of ODF documents to HTML on the Web:
</p>
        <blockquote>
          <p>
So I suggest that ODF has a far better validation record than HTML and the web have,
and that is an encouraging statement. 
</p>
        </blockquote>
        <p>
"encouraging"!? err, sorry but again: no. To compare any document type collection
to the validity rubbish-heap that is the Web's corpus of HTML is saying practically
nothing and, I think, sells ODF seriously short of where it's at. What is "encouraging"
to me is that the problems in the ODF schema, and the validity errors we find
in ODF emitted by a major application (OpenOffice), are so comparatively minor. The
prize is in sight - with some schema fixing and bug fixing we (the users) could be
using an office application which worked reliably with a truly international standard
(ODF 1.0 in this case). That is surely what we should all be aiming for. Inevitably,
progress in this will be slower if defects, when found, meet with denial and obfuscation
rather than a willingness to move forwards.
</p>
        <h4>Homework
</h4>
        <p>
Now that interest seems to have been awakened in performing ODF (and OOXML) validation,
perhaps it is worth investigating the <a href="http://www.griffinbrown.co.uk/blog/images/odf10-msv-warnings.txt">25
warning messages</a> that <a href="https://msv.dev.java.net/">msv</a> emits when parsing
the ODF 1.0 schema with warnings enabled? The last two are related to the ID/IDREF
problem mentioned above and are fixed by applying my proposed resolution. But are
the remaining 23 all spurious? – nothing seems wrong with the schema from a quick
look (this is a genuine, not a rhetorical, question BTW).
</p>
        <p>
And I renew my call: I am very interested in hearing about any application that
consistently emits valid ODF (or valid OOXML for that matter). Are there really <i>none</i>?
</p>
        <h4>Moving forward
</h4>
        <p>
As I wrote many times (and as was repeatedly ignored), the smoke tests for OOXML and
ODF validation were, by design, crude – they give only a rough idea of whether all is
well. Based on the results, it is apparent that a more thorough investigation of both
formats (and their applications) would be of interest. Accordingly the next step is
to start constructing a validation testing framework that:
</p>
        <ul>
          <li>
Uses a varied suite of documents originated natively using office applications (MS
Office, OpenOffice.org and others)</li>
          <li>
Goes beyond schema validation to apply semantic constraints described by the standards'
text (using e.g. Schematron)</li>
          <li>
Correlates and presents the results in full</li>
        </ul>
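        <p>
The following is a very rough Python sketch of how the second and third points might fit
together. It applies Schematron-style assertions (a context path plus a test) to several
documents and tabulates the failures per document. It is not Schematron, which is normally
run via XSLT with full XPath. ElementTree's limited path syntax stands in here, and the rule,
the element names and the document names are all hypothetical.
</p>
        <pre>
```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    """A Schematron-style assertion: for each context node, test must hold."""
    context: str                         # ElementTree path, e.g. ".//note"
    test: Callable[[ET.Element], bool]
    message: str


def apply_rules(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Return the message of every assertion that fails on this document."""
    failures = []
    for rule in rules:
        for node in root.findall(rule.context):
            if not rule.test(node):
                failures.append(rule.message)
    return failures


def run_suite(docs: Dict[str, ET.Element], rules: List[Rule]) -> Dict[str, int]:
    """Correlate results: number of failed assertions per document."""
    return {name: len(apply_rules(root, rules)) for name, root in docs.items()}


# Hypothetical rule and documents, built programmatically for illustration.
rules = [Rule(".//note", lambda n: n.find("body") is not None, "note without body")]

good = ET.Element("doc")
ET.SubElement(ET.SubElement(good, "note"), "body")
bad = ET.Element("doc")
ET.SubElement(bad, "note")

print(run_suite({"app-a.xml": good, "app-b.xml": bad}, rules))
# {'app-a.xml': 0, 'app-b.xml': 1}
```
</pre>
        <p>
A real framework would load documents produced by each office application and report the
schema validation results alongside the semantic ones. The table above is only the
correlation step.
</p>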
        <p>
Watch this space ...
</p>
        <p>
- Alex.
</p>
        <img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038" />
      </body>
      <title>ODF validation for the cognoscenti</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</link>
      <pubDate>Sun, 04 May 2008 13:40:14 GMT</pubDate>
      <description>&lt;p&gt;
Just when it seemed like nobody was interested in the &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace"&gt;ODF
conformance smoke test&lt;/a&gt; posted a few days ago, &lt;a href="http://www.robweir.com/blog/2008/05/odf-validation-for-dummies.html"&gt;IBM's
Rob Weir weighs in with a lengthy piece in response.&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
Rob replicates the test I ran and runs a few of his own, finding ODF validation problems
along the way and ending with an eyebrow-raising take on this which, I think, sells
ODF seriously short.
&lt;/p&gt;
&lt;p&gt;
But before getting to that, a few technical things need to be put straight.
&lt;/p&gt;
&lt;h4&gt;Is the ODF schema broken?
&lt;/h4&gt;
&lt;p&gt;
One of the unexpected things I found in my test was that the ODF schema itself was
broken, leading me to conclude that there could be no valid ODF 1.0 documents in existence
as the schema simply could not be validated against.
&lt;/p&gt;
&lt;p&gt;
Rob doesn't believe there's a problem here (though he allows "Alex's proposed changes
to the schema are reasonable and should be considered" – too right!), and when he
finds a validator reporting the error I mention, he blithely disables the reporting
of that error so he can continue on to get a bunch of "error free" validation reports
when validating the ODF 1.0 spec.
&lt;/p&gt;
&lt;p&gt;
Why did Rob disable this error reporting? Well, he claims the standard allows him
to – he writes that "there is no claim whatsoever [in the ODF spec] that a conformant
ODF 1.0 document will conform to the ID/IDREF constraints defined in Relax NG DTD
Compatibility". Crucially, this claim is misguided.
&lt;/p&gt;
&lt;p&gt;
The ODF 1.0 spec makes explicit use of datatypes it names "ID" and "IDREF" – it states
that these are the W3C types as defined in &lt;a href="http://www.w3.org/TR/2001/REC-xmlschema-2-20010502"&gt;XML
Schema Part 2&lt;/a&gt;. If we look in turn at this document, it defines both of these types,
and states that they represent the same types from &lt;a href="http://www.w3.org/TR/2000/REC-xml-20001006"&gt;XML
1.0 (Second Edition)&lt;/a&gt;. And if we look back to that document we see that both these
types have a &lt;a href="http://www.w3.org/TR/2000/REC-xml-20001006#id"&gt;bunch of validity
constraints&lt;/a&gt; which need to be tested, such as the need for every IDREF to correspond
to some matching ID, or that ID values must be unique per document. For a document to be
valid according to these definitions, a validator must enforce the semantic constraints
associated with these datatypes. (To return to the "dummies" level, we might read
the helpful description from the &lt;a href="http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/"&gt;XML
Schema Primer&lt;/a&gt; which states: "XML 1.0 provides a mechanism for ensuring uniqueness
using the ID attribute and its associated attributes IDREF and IDREFS. This mechanism
is also provided in XML Schema through the ID, IDREF, and IDREFS simple types which
can be used for declaring XML 1.0-style attributes"). By switching this functionality
OFF Rob may be generating good spin for his blog, but he is not validating ODF correctly,
as he is ignoring the very type correctness checking that the ODF spec mandates through
its datatyping! (And worryingly, this gaffe has now been perpetuated in &lt;a href="http://wiki.oasis-open.org/office/How_to_Validate_an_ODF_document"&gt;an
(official?) OASIS TC Wiki, on an immutable page!&lt;/a&gt;.)
&lt;/p&gt;
&lt;p&gt;
Coming at this from another direction, we could also take into account the fact that
the RELAX NG used by ODF is not "pure" ISO/IEC 19757-2, but uses mechanisms from the
OASIS past of RELAX NG. In particular, it declares:
&lt;/p&gt;
&lt;pre&gt;datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"&lt;/pre&gt;
&lt;p&gt;
and in so doing brings into play RELAX NG's support for W3C XML Schema datatypes. The OASIS
spec describing this feature is &lt;a href="http://relaxng.org/xsd-20010907.html"&gt;Guidelines
for using W3C XML Schema Datatypes with RELAX NG&lt;/a&gt; and this refers to the very RELAX
NG compatibility features Rob claims we can safely ignore:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
[DTD Compatibility] defines the concept of an ID-type, which is an additional semantic
for datatypes that allows datatypes to have [XML 1.0] cross-reference semantics. An
implementation of [DTD Compatibility] that supports these guidelines should associate
the ID, IDREF and IDREFS datatypes of [W3C XML Schema Datatypes] with the ID-types
ID, IDREF, and IDREFS respectively. 
&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;
The jing validator &lt;b&gt;does&lt;/b&gt; support these guidelines, and accordingly performs
just such an association. As the co-author of the spec, James Clark (the author of
jing) can be relied on - rather more than Rob - to know what functionality applies
for a particular validation scenario.
&lt;/p&gt;
&lt;p&gt;
So, both formally and informally we should &lt;b&gt;not&lt;/b&gt; be disabling ID/IDREF awareness
– and there is also a third, less dry technical reason why we should not: common sense.
ID/IDREF testing provides a useful first line of defence for our documents,
and catches such nonsense as duplicate IDs or broken links. Without it, we could
take the ODF spec as XML, make all the IDs in it identical, and then watch as Rob's
validation method passed the resulting rubbish as "a-okay". So I'm sorry, Rob,
but on all three counts the "it's error free if we disable error testing" approach
does not cut the mustard, and is simply not something the ODF spec entitles you to
do.
&lt;/p&gt;
&lt;p&gt;
Where I do agree is that we need to put this in perspective. Although these findings
are interesting in the context of the OOXML furore, they do not signal anything particularly
momentous about ODF. Defects get found; defects get fixed – the standard improves
and everybody is happy. Right?
&lt;/p&gt;
&lt;h4&gt;Negativity
&lt;/h4&gt;
&lt;p&gt;
Amid the general downer that is Rob's blog entry is an assumption that I share such
negative thoughts. I find myself described as "someone who would be well served if
he could show that all consortia standards are junk, and that only SC34 (and he himself)
could make them good". Hmmmmm - where did that come from?
&lt;/p&gt;
&lt;p&gt;
For the record, I am an enthusiastic supporter of consortia and consortium standards
and know from experience that consortia contain great people who are producing some
of the best standards work on the planet: XML 1.0, ODF, XSLT, UBL, OOXML (ha!) – the
list goes on. Most recently I was very pleased to see a new working draft of the important
new &lt;a href="http://www.w3.org/XML/XProc/docs/langspec.html"&gt;W3C XProc specification&lt;/a&gt; –
something that SC 34 is specifically deferring to rather than attempting something similar
itself. I thoroughly disapprove of the kind of oppositional mindset that sees things
in a polarised "ISO vs OASIS" or "ISO vs W3C" way. In my view that mode of thinking
already did enough damage during the DIS 29500 project.
&lt;/p&gt;
&lt;h4&gt;Tools that produce valid ODF?
&lt;/h4&gt;
&lt;p&gt;
Rob continues, re-running the tests I performed and finding the same result. Rob quibbles
with many aspects of the test (which is fine, this was just a "smoke test") but, after
all the huffing and puffing is done, we are left with the cold, hard fact that OpenOffice.org
2.4 (and, as Rob demonstrates, the &lt;a href="http://odf-converter.sourceforge.net/"&gt;CleverAge
converter&lt;/a&gt;) are not emitting valid ODF documents.
&lt;/p&gt;
&lt;p&gt;
It's at this point that things get a bit odd. Faced with the invalid documents before
him Rob writes:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Conformance requires that [an application] is capable of writing out a valid document.
And of course, success for an ODF implementation requires that its conformance to
the standard is sufficient to deliver on the promises of the standard, for interoperability. 
&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;
No. A conformant application needs to be more than "capable of" writing valid documents.
If it claims to be emitting ODF 1.0 then valid ODF 1.0 is what it &lt;b&gt;has to&lt;/b&gt; emit
– the ODF schema is normative, not an optional extra. If the application fails to
do this, it is non-conformant and consequently has a bug which needs fixing. This is
what I would expect to be the message to OpenOffice: it has some (mild-looking) ODF
conformance bugs which need fixing. Let's fix the application, not try and re-define
what conformance means and pretend all is well!
&lt;/p&gt;
&lt;p&gt;
Rob then moves on to compare the corpus of ODF documents to HTML on the Web:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
So I suggest that ODF has a far better validation record than HTML and the web have,
and that is an encouraging statement. 
&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;
"encouraging"!? err, sorry but again: no. To compare any document type collection
to the validity rubbish-heap that is the Web's corpus of HTML is saying practically
nothing and, I think, sells ODF seriously short of where it's at. What is "encouraging"
to me is that the problems in the ODF schema, and the validity errors we find
in ODF emitted by a major application (OpenOffice), are so comparatively minor. The
prize is in sight - with some schema fixing and bug fixing we (the users) could be
using an office application which worked reliably with a truly international standard
(ODF 1.0 in this case). That is surely what we should all be aiming for. Inevitably,
progress in this will be slower if defects, when found, meet with denial and obfuscation
rather than a willingness to move forwards.
&lt;/p&gt;
&lt;h4&gt;Homework
&lt;/h4&gt;
&lt;p&gt;
Now that interest seems to have been awakened in performing ODF (and OOXML) validation,
perhaps it is worth investigating the &lt;a href="http://www.griffinbrown.co.uk/blog/images/odf10-msv-warnings.txt"&gt;25
warning messages&lt;/a&gt; that &lt;a href="https://msv.dev.java.net/"&gt;msv&lt;/a&gt; emits when parsing
the ODF 1.0 schema with warnings enabled? The last two are related to the ID/IDREF
problem mentioned above and are fixed by applying my proposed resolution. But are
the remaining 23 all spurious? – nothing seems wrong with the schema from a quick
look (this is a genuine, not a rhetorical, question BTW).
&lt;/p&gt;
&lt;p&gt;
And I renew my call: I am very interested in hearing about any application that
consistently emits valid ODF (or valid OOXML for that matter). Are there really &lt;i&gt;none&lt;/i&gt;?
&lt;/p&gt;
&lt;h4&gt;Moving forward
&lt;/h4&gt;
&lt;p&gt;
As I wrote many times (and as was repeatedly ignored), the smoke tests for OOXML and
ODF validation were, by design, crude – they give only a rough idea of whether all is
well. Based on the results, it is apparent that a more thorough investigation of both
formats (and their applications) would be of interest. Accordingly the next step is
to start constructing a validation testing framework that:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Uses a varied suite of documents originated natively using office applications (MS
Office, OpenOffice.org and others)&lt;/li&gt;
&lt;li&gt;
Goes beyond schema validation to apply semantic constraints described by the standards'
text (using e.g. Schematron)&lt;/li&gt;
&lt;li&gt;
Correlates and presents the results in full&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Watch this space ...
&lt;/p&gt;
&lt;p&gt;
- Alex.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=f0384bed-808b-49a8-8887-ea7cde5caace</wfw:commentRss>
      <slash:comments>21</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Following on from the <a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a">recent
smoke test of Office 2007 conformance to ISO/IEC 29500</a> here, as promised, is a
repeat of the exercise using ISO/IEC 26300 (ODF 1.0).
</p>
        <p>
Like OOXML, ODF has (sensibly) a schema defined using RELAX NG (ISO/IEC 19757-2).
This schema is published in the standard itself and is <a href="http://www.oasis-open.org/committees/download.php/12571/OpenDocument-schema-v1.0-os.rng">available
for download from OASIS</a>.
</p>
        <h4>ODF Schema Woes
</h4>
        <p>
The first problem encountered was in trying to use this schema. Both <a href="http://www.thaiopensource.com/relaxng/jing.html">James
Clark’s jing</a> and <a href="https://msv.dev.java.net/">Sun’s Multi-schema validator</a> emitted
error messages when processing it. Further investigation reveals that the schema has
a critical flaw in the way its open models conflict with its typed attribute values.
At the end of this blog entry is a detailed defect report with a proposal for how to fix
the schema. By filing this I nail my colours to the mast as a staunch <a href="http://www.durusau.net/publications/promotion.pdf">ODF
supporter</a>!
</p>
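        <p>
The nature of the conflict can be illustrated with a deliberately simplified Python model of
the ID-type consistency idea from RELAX NG DTD Compatibility. Each attribute declaration is
reduced to an element name, an attribute name and a datatype, and a wildcard stands in for
<tt>anyName</tt>. This is not the exact algorithm in that specification, nor how jing or msv
implement it. The names <tt>ex:frame</tt> and <tt>ex:id</tt> are hypothetical.
</p>
        <pre>
```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple

ANY = "*"  # stands in for RELAX NG's anyName name class

# XSD datatypes that carry a DTD Compatibility ID-type; anything else has none.
ID_TYPES = {"ID": "ID", "IDREF": "IDREF", "IDREFS": "IDREFS"}


class AttrDecl(NamedTuple):
    element: str    # element name, or ANY
    attribute: str  # attribute name, or ANY
    datatype: str   # e.g. "ID", "IDREF", "string", "text"


def id_type(decl: AttrDecl) -> Optional[str]:
    return ID_TYPES.get(decl.datatype)


def overlaps(a: str, b: str) -> bool:
    # Two name classes can match the same name if either is a wildcard
    # or both are the same concrete name.
    return a == ANY or b == ANY or a == b


def id_type_conflicts(decls: List[AttrDecl]) -> List[Tuple[AttrDecl, AttrDecl]]:
    """Pairs of declarations that could match the same attribute on the same
    element but disagree about its ID-type."""
    conflicts = []
    for a, b in combinations(decls, 2):
        same_target = overlaps(a.element, b.element) and overlaps(a.attribute, b.attribute)
        if same_target and id_type(a) != id_type(b):
            conflicts.append((a, b))
    return conflicts


# Hypothetical declarations, for illustration only.
typed = AttrDecl("ex:frame", "ex:id", "ID")   # an ID-typed attribute
open_model = AttrDecl(ANY, ANY, "text")       # any attribute, any text value

print(len(id_type_conflicts([typed, open_model])))  # 1
print(len(id_type_conflicts([typed])))              # 0
```
</pre>
        <p>
The wildcard declaration is what makes the clash unavoidable. It can match any attribute on
any element, including the one that is declared elsewhere with type ID.
</p>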
        <p>
The consequence of this schema flaw is that the formal definition of document validity
in ODF 1.0 is broken. I suspect tools which claim to use the schema with success are
based on <a href="http://xmlsoft.org/">Libxml</a>, whose RELAX NG validator is incomplete.
Don’t trust them.
</p>
        <p>
Imagine the outrage there'd have been if OOXML had passed with this kind of defect!
</p>
        <h4>Getting an ODF Document
</h4>
        <p>
For parity with the OOXML test, I used the same document (Ecma 376 Part 4) for testing.
This requires several steps of conversion, from Ecma 376 format to Word binary, and
then (using <a href="http://www.openoffice.org/">OpenOffice.org</a> 2.4.0) from Word
binary to ODF. The process took several hours, but in the end it resulted in an .odt
file of approximately 59MB.
</p>
        <h4>Validation Result
</h4>
        <p>
Validating the ODF document against the (patched) schema yielded 7,525 validation
errors – mostly of the same type (use of an undeclared <tt>soft-page-break</tt> element).
</p>
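        <p>
When thousands of errors are mostly of one kind, a sensible first step is to group them. The
small Python sketch below does this. It assumes that the validator writes one error per line
in a <tt>file:line:column: error: message</tt> layout, with element or attribute names
quoted in the message. The sample lines are made up for illustration and are not actual
output from this test.
</p>
        <pre>
```python
from collections import Counter


def tally_errors(lines):
    """Group validator error lines by the first quoted name in the message."""
    counts = Counter()
    for line in lines:
        if ": error: " not in line:
            continue
        message = line.split(": error: ", 1)[1]
        parts = message.split('"')
        # parts[1] is the text between the first pair of double quotes, if any
        key = parts[1] if len(parts) > 2 else message
        counts[key] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    'content.xml:10:5: error: element "text:soft-page-break" not allowed here',
    'content.xml:42:7: error: element "text:soft-page-break" not allowed here',
    'content.xml:97:3: error: element "text:soft-page-break" not allowed here',
    'content.xml:120:9: error: bad value for attribute "style:name"',
]
for name, n in tally_errors(sample).most_common():
    print(n, name)
```
</pre>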
        <h4>Conclusion
</h4>
        <p>
Again, only tentative conclusions can be drawn from a smoke test (readers unfamiliar
with this term as applied to software testing are recommended to read <a href="http://en.wikipedia.org/wiki/Smoke_testing#Smoke_testing_in_software_development">the
Wikipedia article on it</a> <b>before</b> grumbling about the depth of the test, please).
</p>
        <ul>
          <li>
For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a
defect which prevents any document claiming validity from being actually valid. Consequently,
there are no XML documents in existence which are valid to ISO ODF.</li>
          <li>
Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce
valid XML documents. This is to be expected and is a mirror-case of what was found
for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice
has rather bypassed it (it aims at its consortium standard, just as MS Office does).</li>
        </ul>
        <p>
I’d be very interested to find an office application that <b>does</b> work with valid
ISO/IEC 26300 content. Do any readers know of one?
</p>
        <h4>Looking Forward
</h4>
        <p>
A smoke test only scratches the surface – a fuller document conformance test suite
would give a much better idea of the semantic (as well as the syntactic) validity
of documents that claim conformance to either 29500 or 26300. 
</p>
        <p>
Fortunately <a href="http://www.jtc1sc34.org/">SC 34</a> has spent the past years
working on exactly the kinds of technologies (<a href="http://www.dsdl.org">ISO/IEC
19757, DSDL</a>) that will allow a more complete validation of XML documents. I am
hopeful that we will see some more meaningful testing in time, and note with interest
that the Italian National Standards Body have invited participation in such activities.
</p>
        <p>
The unfortunate reality for concerned users is that there are no office application
suites on the planet that create XML valid to International Standards, although both
MS Office and OpenOffice.org get you within sniffing distance. The remedies for this
shortfall are for Microsoft (on the one hand) to update its Office product, and for
ODF developers (on the other hand) to pay more attention to XML validity – especially
when targeting the upcoming ISO standard version of ODF 1.2. The world is moving on,
and users do not want to spend time battling with incorrect outputs of their office
applications: they want a reliable format they can use to build further applications
on. Let us hope the coming months and years will see marked improvements in document
conformance levels!
</p>
        <p>
N.B. As this blog entry “goes to press”, <a href="http://idippedut.dk/post/2008/04/Conformance-of-ODF-documents.aspx">Jesper
Lund Stocholm has posted a blog entry on ODF conformance</a> which is also well worth
reading.
</p>
        <p>
I suspect neither his blog entry, nor this one, will receive as much attention as
the one reporting findings on MS Office's XML! Let's see.
</p>
        <br />
        <br />
        <hr height="1px" width="100" />
        <br />
        <div>
          <h4>Defect Report ISO/IEC 26300:2006
</h4>
          <p>
Clause 16.2 defines an “open model” for custom content using two patterns, as follows:
</p>
          <pre style="background-color: rgb(238, 238, 238);">&lt;define name="anyAttListOrElements"&gt;<br />
&lt;zeroOrMore&gt;<br />
&lt;attribute&gt;<br />
&lt;anyName/&gt;<br />
&lt;text/&gt;<br />
&lt;/attribute&gt;<br />
&lt;/zeroOrMore&gt;<br />
&lt;ref name="anyElements"/&gt;<br />
&lt;/define&gt;<br />
&lt;define name="anyElements"&gt;<br />
&lt;zeroOrMore&gt;<br />
&lt;element&gt;<br />
&lt;anyName/&gt;<br />
&lt;mixed&gt;<br />
&lt;ref name="anyAttListOrElements"/&gt;<br />
&lt;/mixed&gt;<br />
&lt;/element&gt;<br />
&lt;/zeroOrMore&gt;<br />
&lt;/define&gt;</pre>
          <p>
Similar definitions are also used (clause 15.2) for the modelling of mathematical
markup:
</p>
          <pre style="background-color: rgb(238, 238, 238);"> &lt;!-- To avoid inclusion of the complete MathML schema, anything --&gt;<br />
&lt;!-- is allowed within a math:math top-level element --&gt;<br /><br />
&lt;define name="mathMarkup"&gt;<br />
&lt;zeroOrMore&gt;<br />
&lt;choice&gt;<br />
&lt;attribute&gt;<br />
&lt;anyName/&gt;<br />
&lt;/attribute&gt;<br />
&lt;text/&gt;<br />
&lt;element&gt;<br />
&lt;anyName/&gt;<br />
&lt;ref name="mathMarkup"/&gt;<br />
&lt;/element&gt;<br />
&lt;/choice&gt;<br />
&lt;/zeroOrMore&gt;<br />
&lt;/define&gt;</pre>
          <p>
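However, declaring attributes here with any name and a plain text value conflicts with
the declarations elsewhere in the schema of attributes that have an ID or IDREF type.
RELAX NG DTD Compatibility requires all attribute declarations that could match the
same attribute on the same element to agree on their ID-type, and a wildcard attribute
on a wildcard element overlaps every one of those typed declarations. Consequently the
schema cannot be processed by validating processors which respect type consistency
(e.g. jing [1], or msv [2] used with warnings enabled).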
</p>
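          <p>
To see exactly which attributes collide with the open model, one can list every attribute
whose value is declared with the ID, IDREF or IDREFS datatype. What follows is a minimal
Python sketch rather than the procedure used for this report: it assumes the schema has
been saved locally (the file name comes from the OASIS download link), and that a typed
value appears either as a <tt>data</tt> pattern inside the <tt>attribute</tt> or via a
<tt>ref</tt> to a named pattern whose only content is such a <tt>data</tt> pattern.
Attributes named by a name class (such as <tt>anyName</tt>) are skipped, and the function
names are hypothetical.
</p>
          <pre style="background-color: rgb(238, 238, 238);">
```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
ID_TYPES = {"ID", "IDREF", "IDREFS"}


def id_typed_defines(grammar):
    """Names of define patterns whose only content is an ID-family data pattern."""
    names = set()
    for define in grammar.iter(RNG + "define"):
        children = list(define)
        if (len(children) == 1
                and children[0].tag == RNG + "data"
                and children[0].get("type") in ID_TYPES):
            names.add(define.get("name"))
    return names


def id_typed_attributes(grammar):
    """Sorted names of attributes whose value is ID, IDREF or IDREFS typed."""
    typed_refs = id_typed_defines(grammar)
    found = set()
    for attr in grammar.iter(RNG + "attribute"):
        name = attr.get("name")
        if name is None:
            continue  # named by a name class such as anyName; not handled here
        for node in attr.iter():
            if node.tag == RNG + "data" and node.get("type") in ID_TYPES:
                found.add(name)
            elif node.tag == RNG + "ref" and node.get("name") in typed_refs:
                found.add(name)
    return sorted(found)


# Usage (file name taken from the OASIS download link):
#   grammar = ET.parse("OpenDocument-schema-v1.0-os.rng").getroot()
#   print("\n".join(id_typed_attributes(grammar)))
```
</pre>
          <p>
Names are reported as the QNames written in the schema (for example <tt>text:id</tt>),
which is the form an <tt>except</tt> name class expects.
</p>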
          <p>
            <b>Proposed Solution</b>
          </p>
          <p>
The schema must be corrected. This can be done by excluding the typed attributes from
the custom model as follows:
</p>
          <pre style="background-color: rgb(238, 238, 238);">  &lt;define name="anyAttListOrElements"&gt;<br />
&lt;zeroOrMore&gt;<br />
&lt;attribute&gt;<br />
&lt;anyName&gt;<br />
&lt;except&gt;<br />
&lt;name&gt;smil:targetElement&lt;/name&gt;<br />
&lt;name&gt;text:id&lt;/name&gt;<br />
&lt;name&gt;text:change-id&lt;/name&gt;<br />
&lt;name&gt;form:id&lt;/name&gt;<br />
&lt;name&gt;presentation:master-element&lt;/name&gt;<br />
&lt;name&gt;draw:id&lt;/name&gt;<br />
&lt;name&gt;anim:id&lt;/name&gt;<br />
&lt;name&gt;draw:shape-id&lt;/name&gt;<br />
&lt;name&gt;draw:end-shape&lt;/name&gt;<br />
&lt;name&gt;draw:start-shape&lt;/name&gt;<br />
&lt;name&gt;draw:control&lt;/name&gt;<br />
&lt;/except&gt;<br />
&lt;/anyName&gt;<br />
&lt;text/&gt;<br />
&lt;/attribute&gt;<br />
&lt;/zeroOrMore&gt;<br />
&lt;ref name="anyElements"/&gt;<br />
&lt;/define&gt;<br />
&lt;define name="anyElements"&gt;<br />
&lt;zeroOrMore&gt;<br />
&lt;element&gt;<br />
&lt;anyName/&gt;<br />
&lt;mixed&gt;<br />
&lt;ref name="anyAttListOrElements"/&gt;<br />
&lt;/mixed&gt;<br />
&lt;/element&gt;<br />
&lt;/zeroOrMore&gt;<br />
&lt;/define&gt;</pre>
          <p>
If these attributes are intended to be allowed in custom data, they should be
re-included (correctly typed) as necessary.
</p>
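          <p>
Because the <tt>except</tt> list has to track every ID- or IDREF-typed attribute in the
schema, it may be easier to generate it than to maintain it by hand. A minimal Python
sketch (the function name is hypothetical) that builds the corrected name class from
the attribute names given in the proposed solution; the prefixes are written as-is and
must be declared on the enclosing grammar:
</p>
          <pre style="background-color: rgb(238, 238, 238);">
```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
RNG = "{" + RNG_NS + "}"

# Attribute names taken from the proposed solution in this report.
TYPED_ATTRIBUTES = [
    "smil:targetElement", "text:id", "text:change-id", "form:id",
    "presentation:master-element", "draw:id", "anim:id", "draw:shape-id",
    "draw:end-shape", "draw:start-shape", "draw:control",
]


def any_name_except(names):
    """Build an anyName name class that excludes each of the given QNames."""
    any_name = ET.Element(RNG + "anyName")
    exclusions = ET.SubElement(any_name, RNG + "except")
    for qname in names:
        ET.SubElement(exclusions, RNG + "name").text = qname
    return any_name


# Serialise with RELAX NG as the default namespace (Python 3.8+).
print(ET.tostring(any_name_except(TYPED_ATTRIBUTES), encoding="unicode",
                  default_namespace=RNG_NS))
```
</pre>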
          <p>
In general, the custom data model should be revisited – is it really the intention
that it should be so open?
</p>
          <p>
Similarly, the math markup model would be better made more restrictive, either by incorporating
a MathML schema or at least by restricting the allowed elements to certain namespaces.
For the time being, it should at least re-use the custom model to avoid unnecessary
replication of patterns.
</p>
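          <p>
Until the schema itself is tightened, the namespace restriction suggested here can be
approximated as a separate check after validation. A minimal Python sketch, assuming
the <tt>math</tt> prefix is bound to the MathML namespace
<tt>http://www.w3.org/1998/Math/MathML</tt>, and with a hypothetical helper name: it
reports every element inside a <tt>math:math</tt> element that belongs to some other
namespace. This is a stop-gap, not a substitute for a proper MathML schema.
</p>
          <pre style="background-color: rgb(238, 238, 238);">
```python
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"


def foreign_math_elements(root, allowed=(MATHML,)):
    """List tags of elements inside math:math that lie outside the allowed namespaces.

    `allowed` holds namespace URIs in ElementTree's "{uri}" prefix form.
    """
    prefixes = tuple(allowed)  # str.startswith accepts a tuple of prefixes
    offenders = []
    for formula in root.iter(MATHML + "math"):
        for node in formula.iter():
            if not node.tag.startswith(prefixes):
                offenders.append(node.tag)
    return offenders


# Usage (hypothetical file name for a parsed ODF XML stream):
#   root = ET.parse("content.xml").getroot()
#   print(foreign_math_elements(root))
```
</pre>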
          <h4>References
</h4>
          <p>
[1] Jing - A RELAX NG validator in Java <a href="http://www.thaiopensource.com/relaxng/jing.html">http://www.thaiopensource.com/relaxng/jing.html</a></p>
          <p>
[2] Sun Multi-Schema Validator <a href="https://msv.dev.java.net/">https://msv.dev.java.net/</a></p>
        </div>
        <img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=f0384bed-808b-49a8-8887-ea7cde5caace" />
      </body>
      <title>ODF 1.0 and OpenOffice.org: a conformance smoke test</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</link>
      <pubDate>Wed, 30 Apr 2008 11:50:15 GMT</pubDate>
      <description>&lt;p&gt;
Following on from the &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a"&gt;recent
smoke test of Office 2007 conformance to ISO/IEC 29500&lt;/a&gt; here, as promised, is a
repeat of the exercise using ISO/IEC 26300 (ODF 1.0).
&lt;/p&gt;
&lt;p&gt;
Like OOXML, ODF has (sensibly) a schema defined using RELAX NG (ISO/IEC 19757-2).
This schema is published in the standard itself and is &lt;a href="http://www.oasis-open.org/committees/download.php/12571/OpenDocument-schema-v1.0-os.rng"&gt;available
for download from OASIS&lt;/a&gt;.
&lt;/p&gt;
&lt;h4&gt;ODF Schema Woes
&lt;/h4&gt;
&lt;p&gt;
The first problem encountered was in trying to use this schema. Both &lt;a href="http://www.thaiopensource.com/relaxng/jing.html"&gt;James
Clark’s jing&lt;/a&gt; and &lt;a href="https://msv.dev.java.net/"&gt;Sun’s Multi-schema validator&lt;/a&gt; emitted
error messages when processing it. Further investigation reveals that the schema has
a critical flaw in the way its open models conflict with its typed attribute values.
At the end of this blog entry is a detailed defect report with a proposal for how to fix
the schema. By filing this I nail my colours to the mast as a staunch &lt;a href="http://www.durusau.net/publications/promotion.pdf"&gt;ODF
supporter&lt;/a&gt;!
&lt;/p&gt;
&lt;p&gt;
The consequence of this schema flaw is that the formal definition of document validity
in ODF 1.0 is broken. I suspect tools which claim to use the schema with success are
based on &lt;a href="http://xmlsoft.org/"&gt;Libxml&lt;/a&gt;, whose RELAX NG validator is incomplete.
Don’t trust them.
&lt;/p&gt;
&lt;p&gt;
Imagine the outrage there'd have been if OOXML had passed with this kind of defect!
&lt;/p&gt;
&lt;h4&gt;Getting an ODF Document
&lt;/h4&gt;
&lt;p&gt;
For parity with the OOXML test, I used the same document (Ecma 376 Part 4) for testing.
This requires several steps of conversion, from Ecma 376 format to Word binary, and
then (using &lt;a href="http://www.openoffice.org/"&gt;OpenOffice.org&lt;/a&gt; 2.4.0) from Word
binary to ODF. The process took several hours, but in the end it resulted in an .odt
file of approximately 59MB.
&lt;/p&gt;
&lt;h4&gt;Validation Result
&lt;/h4&gt;
&lt;p&gt;
Validating the ODF document against the (patched) schema yielded 7,525 validation
errors – mostly of the same type (use of an undeclared &lt;tt&gt;soft-page-break&lt;/tt&gt; element).
&lt;/p&gt;
&lt;h4&gt;Conclusion
&lt;/h4&gt;
&lt;p&gt;
Again, only tentative conclusions can be drawn from a smoke test (readers unfamiliar
with this term as applied to software testing are recommended to read &lt;a href="http://en.wikipedia.org/wiki/Smoke_testing#Smoke_testing_in_software_development"&gt;the
Wikipedia article on it&lt;/a&gt; &lt;b&gt;before&lt;/b&gt; grumbling about the depth of the test, please).
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a
defect which prevents any document claiming validity from being actually valid. Consequently,
there are no XML documents in existence which are valid to ISO ODF.&lt;/li&gt;
&lt;li&gt;
Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce
valid XML documents. This is to be expected and is a mirror-case of what was found
for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice
has rather bypassed it (it aims at its consortium standard, just as MS Office does).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
I’d be very interested to find an office application that &lt;b&gt;does&lt;/b&gt; work with valid
ISO/IEC 26300 content. Do any readers know of one?
&lt;/p&gt;
&lt;h4&gt;Looking Forward
&lt;/h4&gt;
&lt;p&gt;
A smoke test only scratches the surface – a fuller document conformance test suite
would give a much better idea of the semantic (as well as the syntactic) validity
of documents that claim conformance to either 29500 or 26300. 
&lt;/p&gt;
&lt;p&gt;
Fortunately &lt;a href="http://www.jtc1sc34.org/"&gt;SC 34&lt;/a&gt; has spent the past years
working on exactly the kinds of technologies (&lt;a href="http://www.dsdl.org"&gt;ISO/IEC
19757, DSDL&lt;/a&gt;) that will allow a more complete validation of XML documents. I am
hopeful that we will see some more meaningful testing in time, and note with interest
that the Italian National Standards Body have invited participation in such activities.
&lt;/p&gt;
&lt;p&gt;
The unfortunate reality for concerned users is that there are no office application
suites on the planet that create XML valid to International Standards, although both
MS Office and OpenOffice.org get you within sniffing distance. The remedies for this
shortfall are for Microsoft (on the one hand) to update its Office product, and for
ODF developers (on the other hand) to pay more attention to XML validity – especially
when targeting the upcoming ISO standard version of ODF 1.2. The world is moving on,
and users do not want to spend time battling with incorrect outputs of their office
applications: they want a reliable format they can use to build further applications
on. Let us hope the coming months and years will see marked improvements in document
conformance levels!
&lt;/p&gt;
&lt;p&gt;
N.B. As this blog entry “goes to press”, &lt;a href="http://idippedut.dk/post/2008/04/Conformance-of-ODF-documents.aspx"&gt;Jesper
Lund Stocholm has posted a blog entry on ODF conformance&lt;/a&gt; which is also well worth
reading.
&lt;/p&gt;
&lt;p&gt;
I suspect neither his blog entry, nor this one, will receive as much attention as
the one reporting findings on MS Office's XML! Let's see.
&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;hr height="1px" width="100"&gt;
&lt;br&gt;
&lt;div&gt;
&lt;h4&gt;Defect Report ISO/IEC 26300:2006
&lt;/h4&gt;
&lt;p&gt;
Clause 16.2 defines an “open model” for custom content using two patterns, as follows:
&lt;/p&gt;
&lt;pre style="background-color: rgb(238, 238, 238);"&gt;&amp;lt;define name="anyAttListOrElements"&amp;gt;&lt;br&gt;
&amp;lt;zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;attribute&amp;gt;&lt;br&gt;
&amp;lt;anyName/&amp;gt;&lt;br&gt;
&amp;lt;text/&amp;gt;&lt;br&gt;
&amp;lt;/attribute&amp;gt;&lt;br&gt;
&amp;lt;/zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;ref name="anyElements"/&amp;gt;&lt;br&gt;
&amp;lt;/define&amp;gt;&lt;br&gt;
&amp;lt;define name="anyElements"&amp;gt;&lt;br&gt;
&amp;lt;zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;element&amp;gt;&lt;br&gt;
&amp;lt;anyName/&amp;gt;&lt;br&gt;
&amp;lt;mixed&amp;gt;&lt;br&gt;
&amp;lt;ref name="anyAttListOrElements"/&amp;gt;&lt;br&gt;
&amp;lt;/mixed&amp;gt;&lt;br&gt;
&amp;lt;/element&amp;gt;&lt;br&gt;
&amp;lt;/zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;/define&amp;gt;&lt;/pre&gt;
&lt;p&gt;
Similar definitions are also used (clause 15.2) for the modelling of mathematical
markup:
&lt;/p&gt;
&lt;pre style="background-color: rgb(238, 238, 238);"&gt; &amp;lt;!-- To avoid inclusion of the complete MathML schema, anything --&amp;gt;&lt;br&gt;
&amp;lt;!-- is allowed within a math:math top-level element --&amp;gt;&lt;br&gt;
&lt;br&gt;
&amp;lt;define name="mathMarkup"&amp;gt;&lt;br&gt;
&amp;lt;zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;choice&amp;gt;&lt;br&gt;
&amp;lt;attribute&amp;gt;&lt;br&gt;
&amp;lt;anyName/&amp;gt;&lt;br&gt;
&amp;lt;/attribute&amp;gt;&lt;br&gt;
&amp;lt;text/&amp;gt;&lt;br&gt;
&amp;lt;element&amp;gt;&lt;br&gt;
&amp;lt;anyName/&amp;gt;&lt;br&gt;
&amp;lt;ref name="mathMarkup"/&amp;gt;&lt;br&gt;
&amp;lt;/element&amp;gt;&lt;br&gt;
&amp;lt;/choice&amp;gt;&lt;br&gt;
&amp;lt;/zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;/define&amp;gt;&lt;/pre&gt;
&lt;p&gt;
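However, declaring attributes here with any name and a plain text value conflicts with
the declarations elsewhere in the schema of attributes that have an ID or IDREF type.
RELAX NG DTD Compatibility requires all attribute declarations that could match the
same attribute on the same element to agree on their ID-type, and a wildcard attribute
on a wildcard element overlaps every one of those typed declarations. Consequently the
schema cannot be processed by validating processors which respect type consistency
(e.g. jing [1], or msv [2] used with warnings enabled).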
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;Proposed Solution&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
The schema must be corrected. This can be done by excluding the typed attributes from
the custom model as follows:
&lt;/p&gt;
&lt;pre style="background-color: rgb(238, 238, 238);"&gt;  &amp;lt;define name="anyAttListOrElements"&amp;gt;&lt;br&gt;
&amp;lt;zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;attribute&amp;gt;&lt;br&gt;
&amp;lt;anyName&amp;gt;&lt;br&gt;
&amp;lt;except&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;smil:targetElement&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;text:id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;text:change-id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;form:id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;presentation:master-element&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;draw:id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;anim:id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;draw:shape-id&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;draw:end-shape&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;draw:start-shape&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;name&amp;gt;draw:control&amp;lt;/name&amp;gt;&lt;br&gt;
&amp;lt;/except&amp;gt;&lt;br&gt;
&amp;lt;/anyName&amp;gt;&lt;br&gt;
&amp;lt;text/&amp;gt;&lt;br&gt;
&amp;lt;/attribute&amp;gt;&lt;br&gt;
&amp;lt;/zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;ref name="anyElements"/&amp;gt;&lt;br&gt;
&amp;lt;/define&amp;gt;&lt;br&gt;
&amp;lt;define name="anyElements"&amp;gt;&lt;br&gt;
&amp;lt;zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;element&amp;gt;&lt;br&gt;
&amp;lt;anyName/&amp;gt;&lt;br&gt;
&amp;lt;mixed&amp;gt;&lt;br&gt;
&amp;lt;ref name="anyAttListOrElements"/&amp;gt;&lt;br&gt;
&amp;lt;/mixed&amp;gt;&lt;br&gt;
&amp;lt;/element&amp;gt;&lt;br&gt;
&amp;lt;/zeroOrMore&amp;gt;&lt;br&gt;
&amp;lt;/define&amp;gt;&lt;/pre&gt;
&lt;p&gt;
If these attributes are intended to be allowed in custom data, they should be
re-included (correctly typed) as necessary.
&lt;/p&gt;
&lt;p&gt;
In general, the custom data model should be revisited – is it really the intention
that it should be so open?
&lt;/p&gt;
&lt;p&gt;
Similarly, the math markup model would be better made more restrictive, either by incorporating
a MathML schema or at least by restricting the allowed elements to certain namespaces.
For the time being, it should at least re-use the custom model to avoid unnecessary
replication of patterns.
&lt;/p&gt;
&lt;h4&gt;References
&lt;/h4&gt;
&lt;p&gt;
[1] Jing - A RELAX NG validator in Java &lt;a href="http://www.thaiopensource.com/relaxng/jing.html"&gt;http://www.thaiopensource.com/relaxng/jing.html&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
[2] Sun Multi-Schema Validator &lt;a href="https://msv.dev.java.net/"&gt;https://msv.dev.java.net/&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=f0384bed-808b-49a8-8887-ea7cde5caace" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=203990bb-aba4-4575-9097-19c4084455dc</wfw:commentRss>
      <slash:comments>8</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <a href="http://picasaweb.google.co.uk/alexander.david.john.brown/Nyc/photo#5192176895708696514">
            <img src="http://lh4.ggpht.com/alexander.david.john.brown/SA5RYJe1D8I/AAAAAAAAAfU/wX_oXrKKPQA/s288/IMG_0008.JPG" />
          </a>
          <br />
          <i>Empire State Building</i>
        </p>
        <p>
To New York for three days of client meetings. With an afternoon free and very pleasant
weather, what better way to spend time than taking a trip up the <a href="http://en.wikipedia.org/wiki/Empire_State_Building">Empire
State Building</a> (the sign in the lobby said "visibility: 10 miles").
</p>
        <p>
          <a href="http://picasaweb.google.co.uk/alexander.david.john.brown/Nyc/photo#5192177286550720466">
            <img src="http://lh3.ggpht.com/alexander.david.john.brown/SA5Ru5e1D9I/AAAAAAAAAfc/ld20QWlRvWs/s400/IMG_0056.JPG" />
          </a>
          <br />
          <i>Pigeons on the 86th floor</i>
        </p>
        <p>
How nice to have three days of purely commercial work stretching ahead, with no OOXML
or standards politics in sight. There is a certain clarity to doing technical work
in an environment where the requirements are clearly on the table; and technically
and conceptually the schema I'm working on here is miles ahead of OOXML/ODF — but
maybe in saying that I'm influenced by the fact that I am the chief designer ;-)
</p>
        <h4>ODF Conformance catch-up
</h4>
        <p>
When I get back to the UK I hope to post a blog entry on ODF conformance. I'm surprised
nobody has risen to the challenge I issued in my last blog entry to predict the result.
So, I renew the call! I'd be particularly interested in hearing about any ODF implementations
that people think <i>should</i> be conformant …
</p>
        <p>
One problem came up immediately: the published RELAX NG schemas in the ISO standard
(ISO/IEC 26300) appear to have a technical fault which makes them unusable. I wonder,
am I the <i>first person ever</i> to make a serious attempt to validate an ODF document
against its International Standard specification?
</p>
- Alex. <img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=203990bb-aba4-4575-9097-19c4084455dc" /></body>
      <title>Up here where the air is clear</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</link>
      <pubDate>Wed, 23 Apr 2008 08:26:54 GMT</pubDate>
      <description>&lt;p&gt;
&lt;a href="http://picasaweb.google.co.uk/alexander.david.john.brown/Nyc/photo#5192176895708696514"&gt;&lt;img src="http://lh4.ggpht.com/alexander.david.john.brown/SA5RYJe1D8I/AAAAAAAAAfU/wX_oXrKKPQA/s288/IMG_0008.JPG"&gt;&lt;/a&gt; 
&lt;br&gt;
&lt;i&gt;Empire State Building&lt;/i&gt;
&lt;/p&gt;
&lt;p&gt;
To New York for three days of client meetings. With an afternoon free and very pleasant
weather, what better way to spend time than taking a trip up the &lt;a href="http://en.wikipedia.org/wiki/Empire_State_Building"&gt;Empire
State Building&lt;/a&gt; (the sign in the lobby said "visibility: 10 miles").
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://picasaweb.google.co.uk/alexander.david.john.brown/Nyc/photo#5192177286550720466"&gt;&lt;img src="http://lh3.ggpht.com/alexander.david.john.brown/SA5Ru5e1D9I/AAAAAAAAAfc/ld20QWlRvWs/s400/IMG_0056.JPG"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;i&gt;Pigeons on the 86th floor&lt;/i&gt;
&lt;/p&gt;
&lt;p&gt;
How nice to have three days of purely commercial work stretching ahead, with no OOXML
or standards politics in sight. There is a certain clarity to doing technical work
in an environment where the requirements are clearly on the table; and technically
and conceptually the schema I'm working on here is miles ahead of OOXML/ODF — but
maybe in saying that I'm influenced by the fact that I am the chief designer ;-)
&lt;/p&gt;
&lt;h4&gt;ODF Conformance catch-up
&lt;/h4&gt;
&lt;p&gt;
When I get back to the UK I hope to post a blog entry on ODF conformance. I'm surprised
nobody has risen to the challenge I issued in my last blog entry to predict the result.
So, I renew the call! I'd be particularly interested in hearing about any ODF implementations
that people think &lt;i&gt;should&lt;/i&gt; be conformant …
&lt;/p&gt;
&lt;p&gt;
One problem came up immediately: the published RELAX NG schemas in the ISO standard
(ISO/IEC 26300) appear to have a technical fault which makes them unusable. I wonder,
am I the &lt;i&gt;first person ever&lt;/i&gt; to make a serious attempt to validate an ODF document
against its International Standard specification?
&lt;/p&gt;
- Alex. &lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=203990bb-aba4-4575-9097-19c4084455dc" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=203990bb-aba4-4575-9097-19c4084455dc</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</wfw:commentRss>
      <slash:comments>9</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
I was excited to receive from <a href="http://en.wikipedia.org/wiki/Makoto_Murata">Murata
Makoto</a> a set of the RELAX NG schemas for the (post-BRM) revision of OOXML, and
thought it would be interesting to validate some real-world content against them,
to get a rough idea of how non-conformant the standardisation of 29500 had made MS
Office 2007.
</p>
        <p>
Not having Office 2007 installed at work (our clients aren't using it – yet), the
first problem is actually getting a reasonable sample for testing. Fortunately, the
Ecma 376 specification itself is <a href="http://www.ecma-international.org/publications/standards/Ecma-376.htm">available
for download from Ecma</a> as a .docx file, and this hefty document is a reasonable
basis for a smoke test ...
</p>
        <p>
The main document ("document.xml") content for Part 4 of Ecma 376 weighs in at approx.
60MB of XML. Looking at it ... I'm sorry, but I'm not working on that size of document
when it's spread across only two lines. Pretty-printing the thing makes it rather
more usable, but pushes the file size up to around 100MB.
</p>
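        <p>
For anyone wanting to repeat the exercise, the main part can be pulled out of the .docx
package (a ZIP archive) and pretty-printed with the Python standard library. A minimal
sketch, assuming the main part sits at the conventional path <tt>word/document.xml</tt>
(strictly it should be located through the package relationships); the function name
and the file names in the usage comment are placeholders. <tt>minidom</tt> keeps the
original namespace prefixes but builds the whole document tree in memory, which is
costly for a part this size, and re-indenting adds whitespace-only text between
elements, so the output is for inspection and validation rather than a faithful round trip.
</p>
        <pre style="background-color: rgb(238, 238, 238);">
```python
import zipfile
from xml.dom import minidom


def pretty_main_part(docx_path, part="word/document.xml", indent="  "):
    """Return the main document part of a .docx package, pretty-printed."""
    with zipfile.ZipFile(docx_path) as package:
        data = package.read(part)
    # minidom preserves the original prefixes (w:, m:, ...) on output.
    return minidom.parseString(data).toprettyxml(indent=indent)


# Usage (placeholder file names):
#   with open("document-pretty.xml", "w", encoding="utf-8") as out:
#       out.write(pretty_main_part("Ecma-376-Part4.docx"))
```
</pre>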
        <p>
So we have a document and a RELAX NG schema. All that's necessary now is to use <a href="http://www.thaiopensource.com/relaxng/jing.html">jing</a> (or
similar) and we can validate ...
</p>
        <h4>Validating against the STRICT model
</h4>
        <p>
The STRICT conformance model is quite a bit different from Ecma 376, essentially because
most of that format's most notorious features (non-ISO dates, compatibility settings
like <i>autospacewotnot</i>, VML, etc.) have been removed. Thus the expectation is
that existing Office 2007 documents might be some distance away from being valid according
to the strict schemas.
</p>
        <p>
Sure enough, jing emitted 17MB of invalidity messages (around 122,000 of them) when
validating in this scenario. Most of them seem to involve unrecognised attributes or
attribute values: I would expect a document which exercised a wider range of features
to generate a more diverse set of error messages.
</p>
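        <p>
With this many messages, a tally by kind shows more quickly what dominates. A minimal
Python sketch, assuming the jing output was saved to a text file and follows jing's
usual one-message-per-line layout of <tt>file:line:column: error: message</tt> (worth
checking against the jing version in use); quoted names are replaced with a placeholder
so that messages of the same kind are counted together. The function and file names are
hypothetical.
</p>
        <pre style="background-color: rgb(238, 238, 238);">
```python
import re
from collections import Counter

# Assumed jing message layout: path:line:column: severity: text
MESSAGE = re.compile(r"^.+?:\d+:\d+: (?:error|warning|fatal): (.*)$")
QUOTED = re.compile(r'"[^"]*"')


def tally_messages(lines, top=10):
    """Count jing messages by kind, most frequent first.

    Quoted names are replaced with "..." so that, for example, every
    'attribute "..." not allowed here' message is counted as one kind.
    Lines that do not match the assumed layout are ignored.
    """
    kinds = Counter()
    for line in lines:
        match = MESSAGE.match(line.rstrip("\n"))
        if match:
            kinds[QUOTED.sub('"..."', match.group(1))] += 1
    return kinds.most_common(top)


# Usage (hypothetical file holding the saved jing output):
#   with open("jing-strict.txt", encoding="utf-8") as report:
#       for kind, count in tally_messages(report):
#           print(count, kind)
```
</pre>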
        <h4>Validating against the TRANSITIONAL model
</h4>
        <p>
The TRANSITIONAL conformance model is quite a bit closer to the original Ecma 376.
Countries at the BRM (rather more than Ecma, as it happened) were very keen to keep
compatibility with Ecma 376 and to preserve XML structures at which legacy Office features
could be targeted. The expectation is therefore that an MS Office 2007 document should
be pretty close to valid according to the TRANSITIONAL schema.
</p>
        <p>
Sure enough (again) the result is as expected: relatively few messages (84) are emitted,
and they are all of the same type, complaining of elements such as:
</p>
        <pre>&lt;m:degHide m:val="on"/&gt;<br /></pre>
since the allowed attribute values for <tt>val</tt> are now "true", "false", etc.
This was one of the many tidying-up exercises performed at the BRM. 
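<p>
Fixing values like this is mechanical. A minimal Python sketch, with a hypothetical
helper name, assuming the OMML namespace URI shown and that the legacy values "on" and
"off" correspond to "true" and "false"; it only touches the element types it is given.
A real fix belongs in the application's own serialisation code, and ElementTree
rewrites namespace prefixes on output unless they are registered.
</p>
<pre style="background-color: rgb(238, 238, 238);">
```python
import xml.etree.ElementTree as ET

OMML = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"
ON_OFF = {"on": "true", "off": "false"}  # assumed legacy-to-boolean mapping


def normalise_on_off(root, tags=(OMML + "degHide",), attribute=OMML + "val"):
    """Rewrite legacy on/off values of `attribute` on the given element types.

    Returns the number of attributes changed.
    """
    changed = 0
    for tag in tags:
        for node in root.iter(tag):
            value = node.get(attribute)
            if value in ON_OFF:
                node.set(attribute, ON_OFF[value])
                changed += 1
    return changed
```
</pre>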
<h4>Conclusions?
</h4><p>
Such a test is only indicative, of course, but a few tentative conclusions can be
drawn:
</p><ul><li>
Word documents generated by today's version of MS Office 2007 do not conform to ISO/IEC
29500</li><li>
Making them conform to the STRICT schema is going to require some surgery to the (de)serialisation
code of the application</li><li>
Making them conform to the TRANSITIONAL schema will require less of the same sort of surgery
(since they're quite close to conformant as-is)<br /></li></ul><p>
Given Microsoft's proven ability to <a href="http://www.griffinbrown.co.uk/blog/PermaLink,guid,f19a3daa-6cbe-4621-8add-b64f532c6743.aspx">tinker
with the Office XML file format</a> between service packs, I am hoping that MS Office
will shortly be brought into line with the 29500 specification, <b>and will stay that
way</b>. Indeed, a strong motivation for approving 29500 as an ISO/IEC standard was
to discourage Microsoft from this kind of file format rug-pulling stunt in future.
</p><h4>What's next?
</h4><p>
To repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation
of OpenDocument. Will anybody be brave enough to predict what kind of result that
exercise will have?
</p><br />
- Alex.<img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=3e2202cd-59a3-4356-8f30-b8eb79735e1a" /></body>
      <title>OOXML and Office 2007 Conformance: a Smoke Test</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</link>
      <pubDate>Thu, 17 Apr 2008 13:20:22 GMT</pubDate>
      <description>&lt;p&gt;
I was excited to receive from &lt;a href="http://en.wikipedia.org/wiki/Makoto_Murata"&gt;Murata
Makoto&lt;/a&gt; a set of the RELAX NG schemas for the (post-BRM) revision of OOXML, and
thought it would be interesting to validate some real-world content against them,
to get a rough idea of how non-conformant the standardisation of 29500 had made MS
Office 2007.
&lt;/p&gt;
&lt;p&gt;
Not having Office 2007 installed at work (our clients aren't using it – yet), the
first problem is actually getting a reasonable sample for testing. Fortunately, the
Ecma 376 specification itself is &lt;a href="http://www.ecma-international.org/publications/standards/Ecma-376.htm"&gt;available
for download from Ecma&lt;/a&gt; as a .docx file, and this hefty document is a reasonable
basis for a smoke test ...
&lt;/p&gt;
&lt;p&gt;
The main document ("document.xml") content for Part 4 of Ecma 376 weighs in at approx.
60MB of XML. Looking at it ... I'm sorry, but I'm not working on that size of document
when it's spread across only two lines. Pretty-printing the thing makes it rather
more usable, but pushes the file size up to around 100MB.
&lt;/p&gt;
&lt;p&gt;
So we have a document and a RELAX NG schema. All that's necessary now is to use &lt;a href="http://www.thaiopensource.com/relaxng/jing.html"&gt;jing&lt;/a&gt; (or
similar) and we can validate ...
&lt;/p&gt;
&lt;h4&gt;Validating against the STRICT model
&lt;/h4&gt;
&lt;p&gt;
The STRICT conformance model is quite a bit different from Ecma 376, essentially because
most of that format's most notorious features (non-ISO dates, compatibility settings
like &lt;i&gt;autospacewotnot&lt;/i&gt;, VML, etc.) have been removed. Thus the expectation is
that existing Office 2007 documents might be some distance away from being valid according
to the strict schemas.
&lt;/p&gt;
&lt;p&gt;
Sure enough, jing emitted 17MB of invalidity messages (around 122,000 of them) when
validating in this scenario. Most of them seem to involve unrecognised attributes or
attribute values: I would expect a document which exercised a wider range of features
to generate a more diverse set of error messages.
&lt;/p&gt;
&lt;h4&gt;Validating against the TRANSITIONAL model
&lt;/h4&gt;
&lt;p&gt;
The TRANSITIONAL conformance model is quite a bit closer to the original Ecma 376.
Countries at the BRM (rather more than Ecma, as it happened) were very keen to keep
compatibility with Ecma 376 and to preserve XML structures at which legacy Office features
could be targeted. The expectation is therefore that an MS Office 2007 document should
be pretty close to valid according to the TRANSITIONAL schema.
&lt;/p&gt;
&lt;p&gt;
Sure enough (again) the result is as expected: relatively few messages (84) are emitted,
and they are all of the same type, complaining of elements such as:
&lt;/p&gt;
&lt;pre&gt;&amp;lt;m:degHide m:val="on"/&amp;gt;&lt;br&gt;
&lt;/pre&gt;
since the allowed attribute values for &lt;tt&gt;val&lt;/tt&gt; are now "true", "false", etc.
This was one of the many tidying-up exercises performed at the BRM. 
&lt;h4&gt;Conclusions?
&lt;/h4&gt;
&lt;p&gt;
Such a test is only indicative, of course, but a few tentative conclusions can be
drawn:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Word documents generated by today's version of MS Office 2007 do not conform to ISO/IEC
29500&lt;/li&gt;
&lt;li&gt;
Making them conform to the STRICT schema is going to require some surgery to the (de)serialisation
code of the application&lt;/li&gt;
&lt;li&gt;
Making them conform to the TRANSITIONAL schema will require less of the same sort of surgery
(since they're quite close to conformant as-is)&lt;br&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Given Microsoft's proven ability to &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink,guid,f19a3daa-6cbe-4621-8add-b64f532c6743.aspx"&gt;tinker
with the Office XML file format&lt;/a&gt; between service packs, I am hoping that MS Office
will shortly be brought into line with the 29500 specification, &lt;b&gt;and will stay that
way&lt;/b&gt;. Indeed, a strong motivation for approving 29500 as an ISO/IEC standard was
to discourage Microsoft from this kind of file format rug-pulling stunt in future.
&lt;/p&gt;
&lt;h4&gt;What's next?
&lt;/h4&gt;
&lt;p&gt;
To repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation
of OpenDocument. Will anybody be brave enough to predict what kind of result that
exercise will have?
&lt;/p&gt;
&lt;br&gt;
- Alex.&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=3e2202cd-59a3-4356-8f30-b8eb79735e1a" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=3ceba09a-8350-4332-a494-90c4c8618f88</wfw:commentRss>
      <slash:comments>2</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
There has been some interesting <a href="http://lists.xml.org/archives/xml-dev/200802/msg00131.html">discussion
on xml-dev recently</a> about the future of XML, and in particular whether the XML
specification itself needs to be fundamentally revisited. One idea that particularly
interested me was that DTDs could/should be removed from the XML specification, as they
place a heavy burden on implementors and implementations in what is the Age of the
Schema (apparently).
</p>
        <p>
I think we can go a lot further than that: there is a general need to be able to
tell a processor which features of the XML family a document uses. A good way to
do this would be a processing instruction (PI) following the XML declaration, like so:
</p>
        <pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">&lt;?xml version="1.0"?&gt;<br />
&lt;?profile dtd="no"?&gt;<br /></pre>
        <p>
would do the trick, conveying that a document makes no use of DTD constructs.
</p>
        <p>
We could go further:
</p>
        <pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">&lt;?xml version="1.0"?&gt;<br />
&lt;?profile dtd="no" namespaces="no"?&gt;<br /></pre>
        <p>
          <i>et voila</i> we convey to our processor that there will be no use of <a href="http://www.w3.org/TR/REC-xml-names/">XML
Namespaces</a> in a document. Conversely, specifying <tt>namespaces="yes"</tt> would
tell a processor that support for that spec is required. Currently this sort of thing
has to be done using ad hoc processor-specific features.
</p>
        <p>
We could use this kind of mechanism to tell a processor whether it should recognize <a href="http://www.w3.org/TR/xml-id/">xml:id</a>, <a href="http://www.w3.org/TR/xinclude/">XML
Inclusions</a>, etc. etc.
</p>
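        <p>
As a rough illustration, here is a minimal Python sketch of how a SAX-based processor
might pick up such a profile. The profile PI is only a proposal, so the pseudo-attribute
names and the helpers <tt>parse_profile</tt>, <tt>ProfileHandler</tt>, <tt>missing_features</tt>
and <tt>read_profile</tt> are all hypothetical. Only the standard <tt>xml.sax</tt> machinery,
including its <tt>processingInstruction</tt> callback, is real.
</p>
        <pre>
```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical: the profile PI is a proposal, not a standard, so the
# pseudo-attribute names used here (dtd, namespaces, ...) are illustrative.
PSEUDO_ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')


def parse_profile(data):
    """Split PI data such as 'dtd="no" namespaces="yes"' into a dict."""
    return dict(PSEUDO_ATTR.findall(data))


class ProfileHandler(ContentHandler):
    """Records the first profile PI that appears before the root element."""

    def __init__(self):
        super().__init__()
        self.profile = None
        self.in_prolog = True

    def processingInstruction(self, target, data):
        if target == "profile" and self.in_prolog and self.profile is None:
            self.profile = parse_profile(data)

    def startElement(self, name, attrs):
        # From the root start tag onwards, profile PIs are ignored.
        self.in_prolog = False


def read_profile(xml_bytes):
    """Parse a document and return its profile dict, or None if it has none."""
    handler = ProfileHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.profile


def missing_features(profile, supported):
    """Features the document marks as required ("yes") that the processor lacks."""
    return sorted(name for name, value in (profile or {}).items()
                  if value == "yes" and name not in supported)
```
</pre>
        <p>
Ignoring profile PIs once the root element has started is a design choice. It keeps the
profile in the prolog, next to the XML declaration, which is where the proposal above puts it.
</p>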
        <h2>Getting more controversial
</h2>
        <p>
We can go further still. What about this? 
</p>
        <pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">&lt;?xml version="1.0"?&gt;<br />
&lt;?profile edition="4"?&gt;<br /></pre>
        <p>
This would be an attempt to stop the slippety-slide of XML 1.0 Fifth Edition into our
document space.
</p>
        <p>
And what about this?
</p>
        <pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">&lt;?xml version="1.0"?&gt;<br />
&lt;?profile attributes="no"?&gt;<br /></pre>
        <p>
i.e., turning off a "core" feature of XML – the use of attributes. SML by the back
door? Hmmmmmmm, I like.
</p>
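        <p>
Enforcing such a profile could be quite simple. Here is a minimal sketch, assuming a
Python processor built on the standard library's ElementTree; the function names are
hypothetical. It just reports every element that carries attributes. ElementTree does
not expose namespace declarations as attributes, so <tt>xmlns</tt> declarations would
slip through this check.
</p>
        <pre>
```python
import xml.etree.ElementTree as ET


def attribute_violations(root):
    """Return (tag, attribute names) for every element carrying attributes.

    Intended for a hypothetical attributes="no" profile. ElementTree does not
    expose namespace declarations (xmlns) as attributes, so they are not reported.
    """
    return [(el.tag, sorted(el.attrib)) for el in root.iter() if el.attrib]


def check_no_attributes(xml_text):
    """Parse a document and raise ValueError if it uses any attributes."""
    problems = attribute_violations(ET.fromstring(xml_text))
    if problems:
        raise ValueError(f"attributes not allowed by profile: {problems}")
```
</pre>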
        <p>
And of course, such profiled XML documents would always be 100% conformant XML too
...
</p>
        <p>
What's not to like? If I can just type it up we can have it fast-tracked through ISO
in a jiffy ;-)
</p>
- Alex.<img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=3ceba09a-8350-4332-a494-90c4c8618f88" /></body>
      <title>XML Profile: A Rough Proposal for a New Standard</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</link>
      <pubDate>Thu, 13 Mar 2008 12:42:30 GMT</pubDate>
      <description>&lt;p&gt;
There has been some interesting &lt;a href="http://lists.xml.org/archives/xml-dev/200802/msg00131.html"&gt;discussion
on xml-dev recently&lt;/a&gt; about the future of XML, and in particular whether the XML
specification itself needs to be fundamentally revisited. One idea that particularly
interested me was that DTDs could/should be removed from the XML specification, as they
place a heavy burden on implementors and implementations in what is the Age of the
Schema (apparently).
&lt;/p&gt;
&lt;p&gt;
I think we can go a lot further than that: there is a general need to be able to
tell a processor which features of the XML family a document uses. A good way to
do this would be a processing instruction (PI) following the XML declaration, like so:
&lt;/p&gt;
&lt;pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br&gt;
&amp;lt;?profile dtd="no"?&amp;gt;&lt;br&gt;
&lt;/pre&gt;
&lt;p&gt;
would do the trick, conveying that a document makes no use of DTD constructs.
&lt;/p&gt;
&lt;p&gt;
We could go further:
&lt;/p&gt;
&lt;pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br&gt;
&amp;lt;?profile dtd="no" namespaces="no"?&amp;gt;&lt;br&gt;
&lt;/pre&gt;
&lt;p&gt;
&lt;i&gt;et voila&lt;/i&gt; we convey to our processor that there will be no use of &lt;a href="http://www.w3.org/TR/REC-xml-names/"&gt;XML
Namespaces&lt;/a&gt; in a document. Conversely, specifying &lt;tt&gt;namespaces="yes"&lt;/tt&gt; would
tell a processor that support for that spec is required. Currently this sort of thing
has to be done using ad hoc processor-specific features.
&lt;/p&gt;
&lt;p&gt;
We could use this kind of mechanism to tell a processor whether it should recognize &lt;a href="http://www.w3.org/TR/xml-id/"&gt;xml:id&lt;/a&gt;, &lt;a href="http://www.w3.org/TR/xinclude/"&gt;XML
Inclusions&lt;/a&gt;, etc. etc.
&lt;/p&gt;
&lt;h2&gt;Getting more controversial
&lt;/h2&gt;
&lt;p&gt;
We can go further still. What about this? 
&lt;/p&gt;
&lt;pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br&gt;
&amp;lt;?profile edition="4"?&amp;gt;&lt;br&gt;
&lt;/pre&gt;
&lt;p&gt;
This would be an attempt to stop the slippety-slide of XML 1.0 Fifth Edition into our
document space.
&lt;/p&gt;
&lt;p&gt;
And what about this?
&lt;/p&gt;
&lt;pre style="background: rgb(221, 221, 221) none repeat scroll 0% 50%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br&gt;
&amp;lt;?profile attributes="no"?&amp;gt;&lt;br&gt;
&lt;/pre&gt;
&lt;p&gt;
i.e., turning off a "core" feature of XML – the use of attributes. SML by the back
door? Hmmmmmmm, I like.
&lt;/p&gt;
&lt;p&gt;
And of course, such profiled XML documents would always be 100% conformant XML too
...
&lt;/p&gt;
&lt;p&gt;
What's not to like? If I can just type it up we can have it fast-tracked through ISO
in a jiffy ;-)
&lt;/p&gt;
- Alex.&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=3ceba09a-8350-4332-a494-90c4c8618f88" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=3ceba09a-8350-4332-a494-90c4c8618f88</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</wfw:commentRss>
      <slash:comments>1</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
I have recently recommended to a large publishing client that they adopt <a href="http://en.wikipedia.org/wiki/RELAX_NG">RELAX
NG</a> as the basis of the formal definitions of their content, in preference to <a href="http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29">W3C
XML Schema Definition Language (WXS)</a>.
</p>
        <p>
Arguments for preferring RELAX NG are scattered in bits and pieces all over the web.
Here is an attempt to condense some of the key information into ten points …
</p>
        <p>
          <b>1. A better spec means better interoperability</b>
        </p>
        <p>
We, in common with many people working with WXS schemas, have been tripped up by interoperability
problems caused by different tools having a different take on how WXS should be implemented.
Even Microsoft, a developer that is generally sympathetic to WXS, has reported a number
of interoperability problems, and that for its customers WXS had “stuffed up the ready
interoperability they thought they were buying into with XML”. [<a href="#msreport">1</a>]
</p>
        <p>
The root of such interoperability problems is that the WXS specification is notoriously
hard to interpret. James Clark has called it “without doubt the hardest to understand
specification that I have ever read”. [<a href="#jcietf">2</a>] Little wonder then
that mere mortal developers have difficulty interpreting it!
</p>
        <p>
RELAX NG has, by contrast, a clear formal description of the semantics of a RELAX
NG schema – and for those who want to skip the formal text of the standard, the technology <a href="http://www.oasis-open.org/committees/relax-ng/tutorial.html">can
be clearly explained even in a short tutorial</a>. 
</p>
        <p>
          <b>2. Availability of a compact syntax</b>
        </p>
        <p>
Unlike WXS, RELAX NG has a compact syntax (as explained in <a href="http://relaxng.org/compact-tutorial-20030326.html">this
tutorial</a>). Using it, a DTD like:
</p>
        <pre>&lt;!DOCTYPE addressBook [<br />
&lt;!ELEMENT addressBook (card*)&gt;<br />
&lt;!ELEMENT card (name, email)&gt;<br />
&lt;!ELEMENT name (#PCDATA)&gt;<br />
&lt;!ELEMENT email (#PCDATA)&gt;<br />
]&gt;</pre>
        <p>
can be expressed with this syntax:
</p>
        <pre>element addressBook {<br />
  element card {<br />
    element name { text },<br />
    element email { text }<br />
  }*<br />
}</pre>
        <p>
Much nicer!
</p>
        <p>
          <b>3. The specification is a stable ISO standard</b>
        </p>
        <p>
RELAX NG first became an <a href="http://www.oasis-open.org/committees/relax-ng/spec-20011203.html">OASIS
standard</a> in 2001 and then went through a full ISO standardisation process to become
an ISO Standard (<a href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003%28E%29.zip">ISO
19757-2:2003</a> [free ZIP download]) in 2003. It has proved stable and complete from
the start and no revisions to it are planned.
</p>
        <p>
WXS emerged from a vendor-dominated consortium (the W3C), and is currently expected
to be revised as a 'mostly compatible' version <a href="http://www.w3.org/TR/xmlschema-11-req/">1.1</a> and,
later, as a <a href="http://www.w3.org/2003/09/xmlap/xml-schema-wg-charter.html">2.0</a> release.
It is unclear what level of vendor support these new releases will enjoy.
</p>
        <p>
          <b>4. No PSVI</b>
        </p>
        <p id="psvi">
The PSVI, or Post-Schema-Validation Infoset, is the result of validating a document
against a WXS schema. It consists of the normal <a href="http://www.w3.org/TR/xml-infoset/">XML
infoset</a>, plus extra information that may have been gleaned from the schema, such
as type information about content.
</p>
        <p>
This is a bad thing.
</p>
        <p>
The main reason <i>why</i> it's a bad thing is that it introduces into the processing
model information that cannot be expressed as XML. If a processing pipeline needs
to make use of the kind of information embodied by the PSVI, then every step in that
pipeline has to become PSVI-aware and the result is a tightly-coupled system that
is no longer XML-based, but based on something <i>other</i> than the XML Infoset,
the PSVI.
</p>
        <p>
Both James Clark [<a href="#jcpsvi">3</a>] and Elliotte Rusty Harold [<a href="#erh">4</a>]
say all that needs to be said about the perils of the PSVI.
</p>
        <p>
          <b>5. No content defaulting</b>
        </p>
        <p>
RELAX NG, at least in its ISO form, provides no mechanism for content defaulting. For
reasons why this is good, see <a href="http://www.griffinbrown.co.uk/blog/default.aspx#a81d0be91-b563-49b2-a1e2-067717b86bc6">this
other blog entry</a>.
</p>
        <p>
          <b>6. Better datatyping support</b>
        </p>
        <p>
WXS provides a set of datatypes that may be used to constrain and bind values in content.
This is a good idea.
</p>
        <p>
Unfortunately, there are a number of serious problems [<a href="#martin">5</a>] with
the way this has been done (and the fact that type information is communicated using
the <a href="#psvi">PSVI</a>). 
</p>
        <p>
RELAX NG, in contrast, supports pluggable datatype libraries, which may be implemented
through <a href="http://www.thaiopensource.com/relaxng/api/datatype/overview-summary.html">an
API</a>. Most validators also ship with a library mirroring the WXS datatypes (if you must).
</p>
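        <p>
The API linked above is Java. To illustrate the pluggable idea, here is a minimal Python
sketch in which validators are registered under a datatype library URI. This mirrors the
way a RELAX NG schema names a library with its <tt>datatypeLibrary</tt> attribute. The
registry, the function names and the <tt>urn:example:publishing-types</tt> library are
hypothetical, and the <tt>integer</tt> check simplifies the WXS lexical rules.
</p>
        <pre>
```python
import re

# Hypothetical registry: datatype library URI -> {type name: validator function}.
LIBRARIES = {}

XSD = "http://www.w3.org/2001/XMLSchema-datatypes"
PUBLISHING = "urn:example:publishing-types"  # hypothetical custom library


def register(library_uri, type_name):
    """Decorator that plugs a validator function into a datatype library."""
    def wrap(fn):
        LIBRARIES.setdefault(library_uri, {})[type_name] = fn
        return fn
    return wrap


@register(XSD, "integer")
def xsd_integer(value):
    # Simplified: optional sign plus digits, surrounding whitespace allowed.
    return re.fullmatch(r"\s*[+-]?[0-9]+\s*", value) is not None


@register(PUBLISHING, "isbn10")
def isbn10(value):
    digits = value.replace("-", "")
    if not re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        return False
    # Weights run 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
    total = sum((10 - i) * (10 if c == "X" else int(c))
                for i, c in enumerate(digits))
    return total % 11 == 0


def allows(library_uri, type_name, value):
    """Check a value against a named datatype from a registered library."""
    try:
        check = LIBRARIES[library_uri][type_name]
    except KeyError:
        raise LookupError(f"unknown datatype {type_name!r} in {library_uri!r}") from None
    return check(value)
```
</pre>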
        <p>
(In future, when we're all using pipeline processing for validation, a nice datatype
language like <a href="http://dsdl.org/dsdl-5.pdf">DTLL</a> could more properly perform
the task of datatype validation.)
</p>
        <p>
          <b>7. More sophisticated modelling</b>
        </p>
        <p>
WXS gives us barely more sophistication in grammar modelling than DTDs did. RELAX
NG introduces useful new features for modelling interdependent attribute and element
content.
</p>
        <p>
          <b>8. More sophisticated grammatical validation</b>
        </p>
        <p>
WXS grammars have to be deterministic. RELAX NG grammars can be ambiguous.
</p>
        <p>
Score one for WXS, you might think. But wait: WXS prevents ambiguity by means of a
constraint called Unique Particle Attribution (UPA). The problem with
this, as the Microsoft report notes, is that “it breaks idiomatic uses of XML”. So
if you want to express a grammar like <tt>(title?,para+)|(title,subtitle?,para+)</tt> (i.e.
subtitle is only permitted when there is a title), the UPA rule will prevent you: on
meeting a <tt>title</tt>, a validator cannot know which 'branch' of the model it is following.
The problem becomes more acute if one starts adopting some of the wildcarding features
permitted in WXS.
</p>
        <p>
RELAX NG, on the other hand, will happily accommodate non-deterministic content models.
</p>
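        <p>
Why is non-determinism no obstacle? One way to see it is a minimal Python sketch that
matches child-element sequences using Brzozowski-style derivatives, the approach James
Clark described for RELAX NG validation. This toy handles only element names, sequence,
choice and repetition (no attributes, text or interleave), and all the names in it are
hypothetical. The derivative of a choice keeps both branches alive until the input rules
one of them out, so the UPA-violating model above needs no special treatment.
</p>
        <pre>
```python
from dataclasses import dataclass


# A toy subset of RELAX NG patterns over child element names only.
@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class NotAllowed:
    pass


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class Choice:
    left: object
    right: object


@dataclass(frozen=True)
class OneOrMore:
    pattern: object


def optional(p):
    return Choice(p, Empty())


def nullable(p):
    """True if the pattern accepts an empty sequence of elements."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, (NotAllowed, Elem)):
        return False
    if isinstance(p, Seq):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, OneOrMore):
        return nullable(p.pattern)
    raise TypeError(p)


def deriv(p, name):
    """The pattern that must match what follows after consuming element `name`."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Elem):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        # Both branches stay alive: no need to decide which one we are in.
        return Choice(deriv(p.left, name), deriv(p.right, name))
    if isinstance(p, Seq):
        d = Seq(deriv(p.first, name), p.second)
        return Choice(d, deriv(p.second, name)) if nullable(p.first) else d
    if isinstance(p, OneOrMore):
        return Seq(deriv(p.pattern, name), optional(p))
    raise TypeError(p)


def matches(p, names):
    for name in names:
        p = deriv(p, name)
    return nullable(p)


title, subtitle, para = Elem("title"), Elem("subtitle"), Elem("para")
# (title?,para+)|(title,subtitle?,para+): the UPA rule forbids this model in WXS.
model = Choice(Seq(optional(title), OneOrMore(para)),
               Seq(title, Seq(optional(subtitle), OneOrMore(para))))
```
</pre>
        <p>
In this toy, feeding each element's child names through <tt>matches</tt> is the whole
validation step. <tt>matches(model, ["title", "para"])</tt> and <tt>matches(model, ["para"])</tt>
both succeed, while <tt>["subtitle", "para"]</tt> is rejected.
</p>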
        <p>
In most applications (and probably all publishing applications) the question of whether
a governing schema's content model is deterministic or not is a dry technicality,
of absolutely no consequence to the work in hand.
</p>
        <p>
          <b>9. Instances have no dependency</b>
        </p>
        <p>
WXS schemas (like DTDs) provide a mechanism for associating an instance with a schema:
the xsi:schemaLocation attribute. This is problematic in two ways: first, the W3C
recommendation makes it optional for processors to use this mechanism, and so behaviour
is unpredictable; secondly, it is a potential security problem: it is possible to
specify an unwanted schema here, knowing that an application may not be free to ignore
it. 
</p>
        <p>
RELAX NG schemas, on the other hand, have no formal association with instances. The
validation model is one in which the validation process has separate inputs for the
data being tested and for the tests themselves; users do not have to validate a document
each and every time it is processed.
</p>
        <p>
          <b>10. Growing consensus</b>
        </p>
        <p>
A growing number of key XML languages are being normatively defined using RELAX NG,
such as <a href="http://www.w3.org/TR/xhtml2/">XHTML 2.0</a>, <a href="http://www.ietf.org/rfc/rfc4287">the
Atom Syndication Format</a>, <a href="http://en.wikipedia.org/wiki/OpenDocument">OpenDocument
Format</a> and <a href="http://www.docbook.org/schemas/5x">DocBook 5</a>. If there is
a shift, it is clear which direction it is in, particularly for document-like
modelling. And when <a href="http://www.tbray.org/ongoing/">Tim Bray</a>, one of the
original editors of <a href="http://www.w3.org/TR/REC-xml/">XML 1.0</a>, comes out
against WXS, it really is time to listen:
</p>
        <p style="margin-left: 30px; margin-right: 30px; background-color: rgb(238, 238, 255);">
Everybody who actually touches the technology has known the truth for years, and it’s
time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard
to read, hard to write, hard to understand, have interoperability problems, and are
unable to describe lots of things you want to do all the time in XML. Schemas based
on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are
backed by a rigorous formalism for interoperability, and can describe immensely more
different XML constructs. [<a href="#tim">6</a>] 
</p>
- Alex. 
<h3>References
</h3><a name="msreport">[1]</a> Microsoft Corp., XML Schema Language Experience Report, <a href="http://www.w3.org/2005/05/25-schema/microsoft.html">http://www.w3.org/2005/05/25-schema/microsoft.html</a><br /><br /><a name="jcietf">[2]</a> James Clark, RELAX NG and W3C XML Schema, <a href="http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html">http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html</a><br /><br /><a name="jcpsvi">[3]</a> James Clark, PSVI considered harmful, <a href="http://osdir.com/ml/org.w3c.tag/2002-06/msg00118.html">http://osdir.com/ml/org.w3c.tag/2002-06/msg00118.html</a><br /><br /><a name="erh">[4]</a> Elliotte Rusty Harold, Pretend There's No Such Thing as the PSVI, <a href="http://safari.awprofessional.com/0321150406/ch25">http://safari.awprofessional.com/0321150406/ch25</a> [pay-for
content] 
<br /><br /><a name="martin">[5]</a> Comments on XML Schema Datatype made by ISO/IEC JTC 1/SC
34/WG1, <a href="http://www.jtc1sc34.org/repository/0392.htm">http://www.jtc1sc34.org/repository/0392.htm</a><br /><br /><a name="tim">[6]</a> Tim Bray, Choose RELAX Now, <a href="http://www.tbray.org/ongoing/When/200x/2006/11/27/Choose-Relax">http://www.tbray.org/ongoing/When/200x/2006/11/27/Choose-Relax</a><img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=9aebb083-a961-42b1-9748-a57e06a0f19a" /></body>
      <title>10 reasons to model XML with RELAX NG, not W3C XML Schema</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</link>
      <pubDate>Thu, 26 Jul 2007 09:05:16 GMT</pubDate>
      <description>&lt;p&gt;
I have recently recommended to a large publishing client that they adopt &lt;a href="http://en.wikipedia.org/wiki/RELAX_NG"&gt;RELAX
NG&lt;/a&gt; as the basis of the formal definitions of their content, in preference to &lt;a href="http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29"&gt;W3C
XML Schema Definition Language (WXS)&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
Arguments for preferring RELAX NG are scattered in bits and pieces all over the web.
Here is an attempt to condense some of the key information into ten points …
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;1. A better spec means better interoperability&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
We, in common with many people working with WXS schemas, have been tripped up by interoperability
problems caused by different tools having a different take on how WXS should be implemented.
Even Microsoft, a developer that is generally sympathetic to WXS, has reported a number
of interoperability problems, and that for its customers WXS had “stuffed up the ready
interoperability they thought they were buying into with XML”. [&lt;a href="#msreport"&gt;1&lt;/a&gt;]
&lt;/p&gt;
&lt;p&gt;
The root of such interoperability problems is that the WXS specification is notoriously
hard to interpret. James Clark has called it “without doubt the hardest to understand
specification that I have ever read”. [&lt;a href="#jcietf"&gt;2&lt;/a&gt;] Little wonder then
that mere mortal developers have difficulty interpreting it!
&lt;/p&gt;
&lt;p&gt;
RELAX NG has, by contrast, a clear formal description of the semantics of a RELAX
NG schema – and for those who want to skip the formal text of the standard, the technology &lt;a href="http://www.oasis-open.org/committees/relax-ng/tutorial.html"&gt;can
be clearly explained even in a short tutorial&lt;/a&gt;. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;2. Availability of a compact syntax&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
Unlike WXS, RELAX NG has a compact syntax (as explained in &lt;a href="http://relaxng.org/compact-tutorial-20030326.html"&gt;this
tutorial&lt;/a&gt;). Using it, a DTD like:
&lt;/p&gt;
&lt;pre&gt;&amp;lt;!DOCTYPE addressBook [&lt;br&gt;
&amp;lt;!ELEMENT addressBook (card*)&amp;gt;&lt;br&gt;
&amp;lt;!ELEMENT card (name, email)&amp;gt;&lt;br&gt;
&amp;lt;!ELEMENT name (#PCDATA)&amp;gt;&lt;br&gt;
&amp;lt;!ELEMENT email (#PCDATA)&amp;gt;&lt;br&gt;
]&amp;gt;
&lt;/pre&gt;
&lt;p&gt;
can be expressed with this syntax:
&lt;/p&gt;
&lt;pre&gt;element addressBook {&lt;br&gt;
  element card {&lt;br&gt;
    element name { text },&lt;br&gt;
    element email { text }&lt;br&gt;
  }*&lt;br&gt;
}&lt;/pre&gt;
&lt;p&gt;
Much nicer!
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;3. The specification is a stable ISO standard&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
RELAX NG first became an &lt;a href="http://www.oasis-open.org/committees/relax-ng/spec-20011203.html"&gt;OASIS
standard&lt;/a&gt; in 2001 and then went through a full ISO standardisation process to become
an ISO Standard (&lt;a href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003%28E%29.zip"&gt;ISO
19757-2:2003&lt;/a&gt; [free ZIP download]) in 2003. It has proved stable and complete from
the start and no revisions to it are planned.
&lt;/p&gt;
&lt;p&gt;
WXS emerged from a vendor-dominated consortium (the W3C), and is currently expected
to be revised as a 'mostly compatible' version &lt;a href="http://www.w3.org/TR/xmlschema-11-req/"&gt;1.1&lt;/a&gt; and,
later, as a &lt;a href="http://www.w3.org/2003/09/xmlap/xml-schema-wg-charter.html"&gt;2.0&lt;/a&gt; release.
It is unclear what level of vendor support these new releases will enjoy.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;4. No PSVI&lt;/b&gt;
&lt;/p&gt;
&lt;p id="psvi"&gt;
The PSVI, or Post-Schema-Validation Infoset, is the result of validating a document
against a WXS schema. It consists of the normal &lt;a href="http://www.w3.org/TR/xml-infoset/"&gt;XML
infoset&lt;/a&gt;, plus extra information that may have been gleaned from the schema, such
as type information about content.
&lt;/p&gt;
&lt;p&gt;
This is a bad thing.
&lt;/p&gt;
&lt;p&gt;
The main reason &lt;i&gt;why&lt;/i&gt; it's a bad thing is that it introduces into the processing
model information that cannot be expressed as XML. If a processing pipeline needs
to make use of the kind of information embodied by the PSVI, then every step in that
pipeline has to become PSVI-aware and the result is a tightly-coupled system that
is no longer XML-based, but based on something &lt;i&gt;other&lt;/i&gt; than the XML Infoset,
the PSVI.
&lt;/p&gt;
&lt;p&gt;
Both James Clark [&lt;a href="#jcpsvi"&gt;3&lt;/a&gt;] and Elliotte Rusty Harold [&lt;a href="#erh"&gt;4&lt;/a&gt;]
say all that needs to be said about the perils of the PSVI.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;5. No content defaulting&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
RELAX NG, at least in its ISO form, provides no mechanism for content defaulting. For
reasons why this is good, see &lt;a href="http://www.griffinbrown.co.uk/blog/default.aspx#a81d0be91-b563-49b2-a1e2-067717b86bc6"&gt;this
other blog entry&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;6. Better datatyping support&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
WXS provides a set of datatypes that may be used to constrain and bind values in content.
This is a good idea.
&lt;/p&gt;
&lt;p&gt;
Unfortunately, there are a number of serious problems [&lt;a href="#martin"&gt;5&lt;/a&gt;] with
the way this has been done (and the fact that type information is communicated using
the &lt;a href="#psvi"&gt;PSVI&lt;/a&gt;). 
&lt;/p&gt;
&lt;p&gt;
RELAX NG, in contrast, supports pluggable datatype libraries, which may be implemented
through &lt;a href="http://www.thaiopensource.com/relaxng/api/datatype/overview-summary.html"&gt;an
API&lt;/a&gt;. Most validators also ship with a library mirroring the WXS datatypes (if you must).
&lt;/p&gt;
&lt;p&gt;
(In future, when we're all using pipeline processing for validation, a nice datatype
language like &lt;a href="http://dsdl.org/dsdl-5.pdf"&gt;DTLL&lt;/a&gt; could more properly perform
the task of datatype validation.)
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;7. More sophisticated modelling&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
WXS gives us barely more sophistication in grammar modelling than DTDs did. RELAX
NG introduces useful new features for modelling interdependent attribute and element
content.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;8. More sophisticated grammatical validation&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
WXS grammars have to be deterministic. RELAX NG grammars can be ambiguous.
&lt;/p&gt;
&lt;p&gt;
Score one for WXS, you might think. But wait: WXS prevents ambiguity by means of a
constraint called Unique Particle Attribution (UPA). The problem with
this, as the Microsoft report notes, is that “it breaks idiomatic uses of XML”. So
if you want to express a grammar like &lt;tt&gt;(title?,para+)|(title,subtitle?,para+)&lt;/tt&gt; (i.e.
subtitle is only permitted when there is a title), the UPA rule will prevent you: on
meeting a &lt;tt&gt;title&lt;/tt&gt;, a validator cannot know which 'branch' of the model it is following.
The problem becomes more acute if one starts adopting some of the wildcarding features
permitted in WXS.
&lt;/p&gt;
&lt;p&gt;
RELAX NG, on the other hand, will happily accommodate non-deterministic content models.
&lt;/p&gt;
&lt;p&gt;
In most applications (and probably all publishing applications) the question of whether
a governing schema's content model is deterministic or not is a dry technicality,
of absolutely no consequence to the work in hand.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;9. Instances have no dependency&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
WXS schemas (like DTDs) provide a mechanism for associating an instance with a schema:
the xsi:schemaLocation attribute. This is problematic in two ways: first, the W3C
recommendation makes it optional for processors to use this mechanism, and so behaviour
is unpredictable; secondly, it is a potential security problem: it is possible to
specify an unwanted schema here, knowing that an application may not be free to ignore
it. 
&lt;/p&gt;
&lt;p&gt;
RELAX NG schemas, on the other hand, have no formal association with instances. The
validation model is one in which the validation process has separate inputs for the
data being tested and for the tests themselves; users do not have to validate a document
each and every time it is processed.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;10. Growing consensus&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
A growing number of key XML languages are being normatively defined using RELAX NG,
such as &lt;a href="http://www.w3.org/TR/xhtml2/"&gt;XHTML 2.0&lt;/a&gt;, &lt;a href="http://www.ietf.org/rfc/rfc4287"&gt;the
Atom Syndication Format&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/OpenDocument"&gt;OpenDocument
Format&lt;/a&gt; and &lt;a href="http://www.docbook.org/schemas/5x"&gt;DocBook 5&lt;/a&gt;. If there is
a shift, it is clear which direction it is in, particularly for document-like
modelling. And when &lt;a href="http://www.tbray.org/ongoing/"&gt;Tim Bray&lt;/a&gt;, one of the
original editors of &lt;a href="http://www.w3.org/TR/REC-xml/"&gt;XML 1.0&lt;/a&gt;, comes out
against WXS, it really is time to listen:
&lt;/p&gt;
&lt;p style="margin-left: 30px; margin-right: 30px; background-color: rgb(238, 238, 255);"&gt;
Everybody who actually touches the technology has known the truth for years, and it’s
time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard
to read, hard to write, hard to understand, have interoperability problems, and are
unable to describe lots of things you want to do all the time in XML. Schemas based
on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are
backed by a rigorous formalism for interoperability, and can describe immensely more
different XML constructs. [&lt;a href="#tim"&gt;6&lt;/a&gt;] 
&lt;/p&gt;
- Alex. 
&lt;h3&gt;References
&lt;/h3&gt;
&lt;a name="msreport"&gt;[1]&lt;/a&gt; Microsoft Corp., XML Schema Language Experience Report, &lt;a href="http://www.w3.org/2005/05/25-schema/microsoft.html"&gt;http://www.w3.org/2005/05/25-schema/microsoft.html&lt;/a&gt; 
&lt;br&gt;
&lt;br&gt;
&lt;a name="jcietf="&gt;[2]&lt;/a&gt; James Clark, RELAX NG and W3C XML Schema, &lt;a href="http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html"&gt;http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html&lt;/a&gt; 
&lt;br&gt;
&lt;br&gt;
&lt;a name="jcpsvi"&gt;[3]&lt;/a&gt; James Clark, PSVI considered harmful, &lt;a href="http://osdir.com/ml/org.w3c.tag/2002-06/msg00118.html"&gt;href='http://osdir.com/ml/org.w3c.tag/2002-06/msg00118.html&lt;/a&gt; 
&lt;br&gt;
&lt;br&gt;
&lt;a name="erh"&gt;[4] Elliotte Rusty Harold, Pretend There's No Such Thing as the PSVI, &lt;/a&gt;&lt;a href="http://safari.awprofessional.com/0321150406/ch25"&gt;http://safari.awprofessional.com/0321150406/ch25&lt;/a&gt; [pay-for
content] 
&lt;br&gt;
&lt;br&gt;
&lt;a name="martin"&gt;[5]&lt;/a&gt; Comments on XML Schema Datatype made by ISO/IEC JTC 1/SC
34/WG1, &lt;a href="http://www.jtc1sc34.org/repository/0392.htm"&gt;http://www.jtc1sc34.org/repository/0392.htm&lt;/a&gt; 
&lt;br&gt;
&lt;br&gt;
&lt;a name="tim"&gt;[6]&lt;/a&gt; Tim Bray, Choose RELAX Now, &lt;a href="http://www.tbray.org/ongoing/When/200x/2006/11/27/Choose-Relax"&gt;http://www.tbray.org/ongoing/When/200x/2006/11/27/Choose-Relax&lt;/a&gt; 
&lt;br&gt;
&lt;br&gt;
&lt;a href="http://www.digg.com"&gt; &lt;img src="http://digg.com/img/badges/80x15-digg-badge.gif" alt="Digg!" height="15" width="80"&gt; &lt;/a&gt; &lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=9aebb083-a961-42b1-9748-a57e06a0f19a" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=9aebb083-a961-42b1-9748-a57e06a0f19a</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=076f175d-241b-473b-8498-1eb21a76b0aa</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <a href="http://www.franciscave.com">Francis Cave</a> and I are running training days
on <a href="http://www.train4publishing.co.uk/guideto/electronic/xml">XML in Publishing</a> this
summer at <a href="http://www.train4publishing.co.uk/">The Publishing Training Centre</a> at
Book House.
</p>
        <p>
This course is for those who have to manage the production of electronic content for
a range of applications. It requires no prior knowledge. During it, participants will
find out:
</p>
        <ul>
          <li>
the basic principles of mark-up languages 
</li>
          <li>
the roles XML can play in publishing 
</li>
          <li>
what it is like to work with XML data. 
</li>
        </ul>
        <p>
The next session is scheduled for 25th September, and is already filling up. To book
a place, please contact The Publishing Training Centre directly ...
</p>
- Alex.<img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=076f175d-241b-473b-8498-1eb21a76b0aa" /></body>
      <title>XML Training</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</link>
      <pubDate>Mon, 23 Jul 2007 15:22:37 GMT</pubDate>
      <description>&lt;p&gt;
&lt;a href="http://www.franciscave.com"&gt;Francis Cave&lt;/a&gt; and I are running training days
on &lt;a href="http://www.train4publishing.co.uk/guideto/electronic/xml"&gt;XML in Publishing&lt;/a&gt; this
summer at &lt;a href="http://www.train4publishing.co.uk/"&gt;The Publishing Training Centre&lt;/a&gt; at
Book House.
&lt;/p&gt;
&lt;p&gt;
This course is for those who have to manage the production of electronic content for
a range of applications. It requires no prior knowledge. During it, participants will
find out:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
the basic principles of mark-up languages 
&lt;/li&gt;
&lt;li&gt;
the roles XML can play in publishing 
&lt;/li&gt;
&lt;li&gt;
what it is like to work with XML data. 
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
The next session is scheduled for 25th September, and is already filling up. To book
a place, please contact The Publishing Training Centre directly ...
&lt;/p&gt;
- Alex.&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=076f175d-241b-473b-8498-1eb21a76b0aa" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=076f175d-241b-473b-8498-1eb21a76b0aa</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</pingback:target>
      <dc:creator>Andrew Sales</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">XSLT transformations are the stock-in-trade
of XML developers. In general, you shouldn't have to worry too much about how different
engines work, but in some rare edge cases, it's a consideration.<br /><br />
Now, it would be foolish to rely on the order of attributes in your physical instance,
since none is imposed on them by an XML processor. (OK, you <i>could</i> in theory
rely on the ordering if you canonicalize the document first: "An element's attribute
nodes are sorted lexicographically with namespace URI as the primary key and local
name as the secondary key" -- W3C Canonical XML Version 1.0).<br /><br />
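As a minimal sketch of that canonicalization step, assuming Python 3.8 or later and its standard-library xml.etree.ElementTree module: its canonicalize() function implements Canonical XML 2.0 rather than the 1.0 version quoted above, but applies the same rule of sorting attributes by namespace URI and then local name. The snippet builds the foo element discussed in this post and serializes it both as written and in canonical form; the variable names are illustrative only.<br /><br />
```python
import xml.etree.ElementTree as ET

# The element from this post, with its attributes written in the order c, b, a.
# (ElementTree keeps attributes in insertion order from Python 3.8 onwards.)
foo = ET.Element("foo", {"c": "1", "b": "2", "a": "3"})
serialized = ET.tostring(foo, encoding="unicode")

# canonicalize() implements Canonical XML 2.0, which sorts attributes by
# namespace URI and then local name, as Canonical XML 1.0 does.
canonical = ET.canonicalize(xml_data=serialized)

# The physical order differs, but the canonical form lists a, b, c.
print(serialized)
print(canonical)
```
Note that canonicalization fixes only the order of attributes in the serialized text; it does not constrain the order in which an XPath engine returns an element's attribute nodes.<br /><br />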
The relative order of attribute nodes in XPath is likewise implementation-dependent, so even
trusty XPath engines need not produce the same results when an element's attributes are processed
together. Given the XML instance:<br /><br />
    <font face="Courier New">&lt;foo c='1' b='2' a='3'/&gt;</font><br /><br />
and the XPath expression:<br /><br />
    <font face="Courier New">name(//@*)</font><br /><br />
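the result might be 'c', 'b' or 'a': name() returns the name of the node that comes first in
document order, and which attribute that is depends on the engine. It could conceivably even
differ between runs of the same engine.<br /><br />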
That XPath expression is artificial and unlikely to be used seriously, but it serves
to make the point: different XPath engines producing different results can all still
be considered to have produced "correct" output.<br /><br />
- Andrew<br /><img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=ba269ca0-65ca-4d6c-9deb-603f4c12955f" /></body>
      <title>Not all XPath engines are the same...</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</link>
      <pubDate>Thu, 26 Apr 2007 10:50:36 GMT</pubDate>
      <description>XSLT transformations are the stock-in-trade of XML developers. In general, you shouldn't have to worry too much about how different engines work, but in some rare edge cases, it's a consideration.&lt;br&gt;
&lt;br&gt;
Now, it would be foolish to rely on the order of attributes in your physical instance,
since none is imposed on them by an XML processor. (OK, you &lt;i&gt;could&lt;/i&gt; in theory
rely on the ordering if you canonicalize the document first: "An element's attribute
nodes are sorted lexicographically with namespace URI as the primary key and local
name as the secondary key" -- W3C Canonical XML Version 1.0).&lt;br&gt;
&lt;br&gt;
The relative order of attribute nodes in XPath is likewise implementation-dependent, so even
trusty XPath engines need not produce the same results when an element's attributes are processed
together. Given the XML instance:&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;font face="Courier New"&gt;&amp;lt;foo c='1' b='2' a='3'/&amp;gt;&lt;/font&gt;
&lt;br&gt;
&lt;br&gt;
and the XPath expression:&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;font face="Courier New"&gt;name(//@*)&lt;/font&gt;
&lt;br&gt;
&lt;br&gt;
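the result might be 'c', 'b' or 'a': name() returns the name of the node that comes first in
document order, and which attribute that is depends on the engine. It could conceivably even
differ between runs of the same engine.&lt;br&gt;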
&lt;br&gt;
That XPath expression is artificial and unlikely to be used seriously, but it serves
to make the point: different XPath engines producing different results can all still
be considered to have produced "correct" output.&lt;br&gt;
&lt;br&gt;
- Andrew&lt;br&gt;
&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=ba269ca0-65ca-4d6c-9deb-603f4c12955f" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=ba269ca0-65ca-4d6c-9deb-603f4c12955f</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <img src="http://www.griffinbrown.co.uk/images/xml-uk-logo.gif" />
        <br />
        <p>
          <a href="http://www.xmluk.org/">XML UK</a> are holding a day conference entitled
“Publishing 2.0” at <a href="http://www.bletchleypark.org.uk/">Bletchley Park</a> on
Wednesday 25 April 2007.
</p>
        <p>
Beyond being an eye-catching title, what we (as organisers) intend “Publishing 2.0”
to mean is that the conference will examine some of the more cutting-edge applications
of XML(ish) technology to publishing. We're putting together a cracking programme which
already includes: 
</p>
        <ul>
          <li>
Leigh Dodds (<a href="http://www.publishingtechnology.com/">Publishing Technology
plc</a>) speaking on Content Management with RDF. 
</li>
          <li>
Richard Kidd (<a href="http://www.rsc.org/Publishing/Journals/">RSC Publishing</a>) speaking
on <i>The future of the journal: introducing semantics into chemical science publishing
with the RSC's <a href="http://www.rsc.org/Publishing/Journals/ProjectProspect/">Project
Prospect</a></i>. 
</li>
        </ul>
<p>
And how cool a venue is <a href="http://en.wikipedia.org/wiki/Bletchley_Park">Bletchley
Park</a>? Being in the presence of the ghost of <a href="http://en.wikipedia.org/wiki/Alan_Turing">Alan
Turing</a> adds an extra geeky frisson to the occasion. 
</p><p>
A full programme will be announced shortly, but I confidently predict this event <b>will</b> sell
out (the venue is limited to 100 people), so to reserve an early space <a href="http://www.xmluk.org/contact-form.php">contact
XML UK</a> with your credit card in hand.
</p>
- Alex.<img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=97577d97-a97b-4768-ba27-d16d8fd2c942" /></body>
      <title>Publishing 2.0 Conference (Bletchley Park, 25 April 2007)</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</link>
      <pubDate>Wed, 14 Feb 2007 09:50:35 GMT</pubDate>
      <description>&lt;img src="http://www.griffinbrown.co.uk/images/xml-uk-logo.gif"&gt; 
&lt;br&gt;
&lt;p&gt;
&lt;a href="http://www.xmluk.org/"&gt;XML UK&lt;/a&gt; are holding a day conference entitled entitled
“Publishing 2.0” at &lt;a href="http://www.bletchleypark.org.uk/"&gt;Bletchley Park&lt;/a&gt; on
Wednesday 25 April 2007.
&lt;/p&gt;
&lt;p&gt;
Beyond being an eye-catching title, what we (as organisers) intend “Publishing 2.0”
to mean is that the conference will examine some of the more cutting-edge applications
of XML(ish) technology to publishing. We're putting together a cracking programme which
already includes: 
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Leigh Dodds (&lt;a href="http://www.publishingtechnology.com/"&gt;Publishing Technology
plc&lt;/a&gt;) speaking on Content Management with RDF. 
&lt;/li&gt;
&lt;li&gt;
Richard Kidd (&lt;a href="http://www.rsc.org/Publishing/Journals/"&gt;RSC Publishing&lt;/a&gt;) speaking
on &lt;i&gt;The future of the journal: introducing semantics into chemical science publishing
with the RSC's &lt;a href="http://www.rsc.org/Publishing/Journals/ProjectProspect/"&gt;Project
Prospect&lt;/a&gt;&lt;/i&gt;. 
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
And how cool a venue is &lt;a href="http://en.wikipedia.org/wiki/Bletchley_Park"&gt;Bletchley
Park&lt;/a&gt;? Being in the presence of the ghost of &lt;a href="http://en.wikipedia.org/wiki/Alan_Turing"&gt;Alan
Turing&lt;/a&gt; adds an extra geeky frisson to the occasion. 
&lt;/p&gt;
&lt;p&gt;
A full programme will be announced shortly, but I confidently predict this event &lt;b&gt;will&lt;/b&gt; sell
out (the venue is limited to 100 people), so to reserve an early space &lt;a href="http://www.xmluk.org/contact-form.php"&gt;contact
XML UK&lt;/a&gt; with your credit card in hand.
&lt;/p&gt;
- Alex.&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=97577d97-a97b-4768-ba27-d16d8fd2c942" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=97577d97-a97b-4768-ba27-d16d8fd2c942</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
… is the title of a paper I <b>will</b> be giving at <a href="http://xtech.expectnation.com/">The
XTech 2007 Conference</a>, which is to be held in Paris from 15 - 18 May.
</p>
        <p>
The focus of the presentation is the new <a href="http://www.editeur.org/onix_licensing.html">ONIX-PL</a> XML
language for expressing licenses electronically, so that they are machine-processable.
In the first instance the licenses being modelled are between publishers and libraries.
</p>
        <p>
This will be a two-hander, with <a href="http://www.franciscave.com/">Francis Cave</a> handling
an explanation of the wider business issues, and me concentrating on how we've used <a href="http://www.orbeon.com">Orbeon
Forms</a> as the basis of a web application for authoring and managing these complex
electronic documents (see <a href="http://www.editeur.org/licensing/Wiley_License_draft_17.xml">a
sample license</a> for an example). Francis has been hard at work on an innovative
way of annotating XML schemas to affect how instances governed by them are rendered
in XForms engines.
</p>
        <p>
Now to write that paper ... Here's the abstract ... 
</p>
        <hr size="1" width="140" />
        <h4>Abstract
</h4>
        <p>
As more and more content is published electronically, so the need to control access
to it has risen. Early efforts in this field focused on copy-protection technologies
(DRM), but a more enlightened approach emerges if instead content licenses can be <em>agreed</em> between
parties and content then used according to that agreement.
</p>
        <p>
This presentation focuses on the design and system implementations around the new <a href="http://www.jisc.org.uk/uploaded_documents/ONIX_Publisher_License_format.pdf"><span class="caps">ONIX</span>-PL</a> industry
standard (developed by <a href="http://www.editeur.org/">EDItEUR</a>) for representing
license agreements between content producers and content recipients. Early adopters
of the standard are publishers, libraries and academic institutions wishing to agree
licensing terms for the use of high-value scholarly content. <span class="caps">ONIX</span>-PL
is a high-profile initiative enjoying support from <a href="http://www.jisc.ac.uk/"><span class="caps">JISC</span></a>, <a href="http://www.diglib.org/standards/dlf-erm02.htm"><span class="caps">DLF</span>/ERMI</a> and
a number of US and European universities, commercial publishers and library systems
vendors.
</p>
        <p>
          <span class="caps">ONIX</span>-PL license expressions are <span class="caps">XML</span> documents
which are machine-actionable. For that reason they need to capture with precise semantics
the implications of the legal clauses they embody.
</p>
        <p>
This presentation will examine the challenges of representing machine-actionable legal
agreements using <span class="caps">XML</span>, and in particular look at the semantic
web technologies considered and used (or rejected) in the <span class="caps">XML</span> model
designs.
</p>
        <p>
Standards and models are of no use if they have no implementation or uptake. The
presentation will therefore consider how EDItEUR chose to develop a free Open Source
software application for authoring and managing these complex <span class="caps">XML</span> documents,
and how ultimately a full range of Web 2.0 technologies including XForms, pipelining,
and <span class="caps">AJAX</span> were necessary in concert with more established
technologies such as <span class="caps">XSLT</span>, XHTML and <span class="caps">J2EE</span>,
in order to have a web application that dealt properly with the problem space while
meeting tight development deadlines.
</p>
        <p>
The presentation will thus conclude with some real-world tales of software development
and deployment (together with a demonstration) of licenses being created and used
with EDItEUR’s chosen infrastructure technology, <a href="http://www.orbeon.com/">Orbeon
Forms</a> (the presenters have no affiliation with its developers).
</p>
        <p>
In summary, attendees can expect to learn:
</p>
        <ul>
          <li>
why there is a need for electronic expressions of licenses</li>
          <li>
how <span class="caps">XML</span> and semantic technologies can be used for this purpose</li>
          <li>
what an <span class="caps">XML</span> electronic license expression looks like ‘for
real’</li>
          <li>
why <span class="caps">XML</span> licenses need to be created by non-technical users</li>
          <li>
how to rapidly develop a web application for them, and the ‘real world’ software development
challenges faced in doing so.</li>
        </ul>
- Alex.<img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=c9ddcd7c-1208-49cb-a3da-6d12597c78b7" /></body>
      <title>Electronic Licensing with XML and Web 2.0 Technology</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</link>
      <pubDate>Tue, 13 Feb 2007 13:55:48 GMT</pubDate>
      <description>&lt;p&gt;
… is the title of a paper I &lt;b&gt;will&lt;/b&gt; be giving at &lt;a href="http://xtech.expectnation.com/"&gt;The
XTech 2007 Conference&lt;/a&gt;, which is to be held in Paris from 15 - 18 May.
&lt;/p&gt;
&lt;p&gt;
The focus of the presentation is the new &lt;a href="http://www.editeur.org/onix_licensing.html"&gt;ONIX-PL&lt;/a&gt; XML
language for expressing licenses electronically, so that they are machine-processable.
In the first instance the licenses being modelled are between publishers and libraries.
&lt;/p&gt;
&lt;p&gt;
This will be a two-hander, with &lt;a href="http://www.franciscave.com/"&gt;Francis Cave&lt;/a&gt; handling
an explanation of the wider business issues, and me concentrating on how we've used &lt;a href="http://www.orbeon.com"&gt;Orbeon
Forms&lt;/a&gt; as the basis of a web application for authoring and managing these complex
electronic documents (see &lt;a href="http://www.editeur.org/licensing/Wiley_License_draft_17.xml"&gt;a
sample license&lt;/a&gt; for an example). Francis has been hard at work on an innovative
way of annotating XML schemas to affect how instances governed by them are rendered
in XForms engines.
&lt;/p&gt;
&lt;p&gt;
Now to write that paper ... Here's the abstract ... 
&lt;/p&gt;
&lt;hr size="1" width="140"&gt;
&lt;h4&gt;Abstract
&lt;/h4&gt;
&lt;p&gt;
As more and more content is published electronically, so the need to control access
to it has risen. Early efforts in this field focused on copy-protection technologies
(DRM), but a more enlightened approach emerges if instead content licenses can be &lt;em&gt;agreed&lt;/em&gt; between
parties and content then used according to that agreement.
&lt;/p&gt;
&lt;p&gt;
This presentation focuses on the design and system implementations around the new &lt;a href="http://www.jisc.org.uk/uploaded_documents/ONIX_Publisher_License_format.pdf"&gt;&lt;span class="caps"&gt;ONIX&lt;/span&gt;-PL&lt;/a&gt; industry
standard (developed by &lt;a href="http://www.editeur.org/"&gt;EDItEUR&lt;/a&gt;) for representing
license agreements between content producers and content recipients. Early adopters
of the standard are publishers, libraries and academic institutions wishing to agree
licensing terms for the use of high-value scholarly content. &lt;span class="caps"&gt;ONIX&lt;/span&gt;-PL
is a high-profile initiative enjoying support from &lt;a href="http://www.jisc.ac.uk/"&gt;&lt;span class="caps"&gt;JISC&lt;/span&gt;&lt;/a&gt;, &lt;a href="http://www.diglib.org/standards/dlf-erm02.htm"&gt;&lt;span class="caps"&gt;DLF&lt;/span&gt;/ERMI&lt;/a&gt; and
a number of US and European universities, commercial publishers and library systems
vendors.
&lt;/p&gt;
&lt;p&gt;
&lt;span class="caps"&gt;ONIX&lt;/span&gt;-PL license expressions are &lt;span class="caps"&gt;XML&lt;/span&gt; document
which are machine actionable. For that reason they need to capture with precise semantics
the implications of the legal clauses they embody.
&lt;/p&gt;
&lt;p&gt;
This presentation will examine the challenges of representing machine-actionable legal
agreements using &lt;span class="caps"&gt;XML&lt;/span&gt;, and in particular look at the semantic
web technologies considered and used (or rejected) in the &lt;span class="caps"&gt;XML&lt;/span&gt; model
designs.
&lt;/p&gt;
&lt;p&gt;
Standards and models are of no use if they have no implementation or uptake. The
presentation will therefore consider how EDItEUR chose to develop a free Open Source
software application for authoring and managing these complex &lt;span class="caps"&gt;XML&lt;/span&gt; documents,
and how ultimately a full range of Web 2.0 technologies including XForms, pipelining,
and &lt;span class="caps"&gt;AJAX&lt;/span&gt; were necessary in concert with more established
technologies such as &lt;span class="caps"&gt;XSLT&lt;/span&gt;, XHTML and &lt;span class="caps"&gt;J2EE&lt;/span&gt;,
in order to have a web application that dealt properly with the problem space while
meeting tight development deadlines.
&lt;/p&gt;
&lt;p&gt;
The presentation will thus conclude with some real-world tales of software development
and deployment (together with a demonstration) of licenses being created and used
with EDItEUR’s chosen infrastructure technology, &lt;a href="http://www.orbeon.com/"&gt;Orbeon
Forms&lt;/a&gt; (the presenters have no affiliation with its developers).
&lt;/p&gt;
&lt;p&gt;
In summary, attendees can expect to learn:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
why there is a need for electronic expressions of licenses&lt;/li&gt;
&lt;li&gt;
how &lt;span class="caps"&gt;XML&lt;/span&gt; and semantic technologies can be used for this purpose&lt;/li&gt;
&lt;li&gt;
what an &lt;span class="caps"&gt;XML&lt;/span&gt; electronic license expression looks like ‘for
real’&lt;/li&gt;
&lt;li&gt;
why &lt;span class="caps"&gt;XML&lt;/span&gt; licenses need to be created by non-technical users&lt;/li&gt;
&lt;li&gt;
how to rapidly develop a web application for them, and the ‘real world’ software development
challenges faced in doing so.&lt;/li&gt;
&lt;/ul&gt;
- Alex.&lt;img width="0" height="0" src="http://www.griffinbrown.co.uk/blog/aggbug.ashx?id=c9ddcd7c-1208-49cb-a3da-6d12597c78b7" /&gt;</description>
      <comments>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7</comments>
    </item>
    <item>
      <trackback:ping>http://www.griffinbrown.co.uk/blog/Trackback.aspx?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</trackback:ping>
      <pingback:server>http://www.griffinbrown.co.uk/blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</pingback:target>
      <dc:creator>Alex Brown</dc:creator>
      <wfw:comment>http://www.griffinbrown.co.uk/blog/CommentView.aspx?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</wfw:comment>
      <wfw:commentRss>http://www.griffinbrown.co.uk/blog/SyndicationService.asmx/GetEntryCommentsRss?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
… is a presentation that I <b>won't</b> be giving at <a href="http://xtech.expectnation.com/">The
XTech 2007 Conference</a>, as the proposal was not accepted (I will however be speaking
on <a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7">another
topic</a>). Based on my experience of speaking at, and reviewing for, XML conferences
over several years, the rejection of this paper surprises me. Maybe XTech really is
losing the XML focus of its XML Europe past.
</p>
        <p>
I do hope somebody is covering <a href="http://dsdl.org/">DSDL</a>, as the technologies
it contains are important ones that deserve public airing.
</p>
        <p>
Anyway, here's the abstract of the paper that didn't make it:
</p>
        <hr size="1" width="140" />
        <h4>Description
</h4>
        <p>
ISO is expected shortly to standardise three new schema languages as part of DSDL.
Learn about them, and the DSDL project as a whole, in this update.
</p>
        <h4>Abstract
</h4>
        <div>
          <p>
It has recently <a href="http://cafe.elharo.com/xml/relax-wins/">been proclaimed</a> that
“among the <span class="caps">XML</span> cognoscenti, the debate is effectively over.
Everyone is choosing <span class="caps">RELAX NG</span>”. And indeed the early indicators
are that <span class="caps">RELAX NG</span> is getting increasing traction (even if it is still
only the grammar modelling language of the “cognoscenti”). So for example:
</p>
          <ul>
            <li>
The <span class="caps">W3C</span> have defined <span class="caps">XHTML 2.0</span>
normatively using <span class="caps">RELAX NG</span></li>
            <li>
Microsoft have agreed to have the schemas for Office re-expressed in <span class="caps">RELAX
NG</span> as part of their standardisation effort</li>
            <li>
DocBook 5 is being primarily developed using <span class="caps">RELAX NG</span>.</li>
          </ul>
          <p>
But <span class="caps">RELAX NG</span> is only one part of a 10-part <span class="caps">ISO</span> standard: <span class="caps">DSDL</span> (or
Document Schema Definition Languages, <span class="caps">ISO 19757</span>) aims to
offer a complete family of <span class="caps">XML</span> validation languages, in
which <span class="caps">RELAX NG</span> covers just the specialised area of regular-grammar-based
validation.
</p>
          <p>
The other two fully-standardised parts of <span class="caps">DSDL</span> (Schematron
and <span class="caps">NVDL</span>) are also gaining wider adoption in public <span class="caps">XML</span> models
and in implementations.
</p>
          <p>
But <span class="caps">DSDL</span> is about to include, in their final forms, three
new standards which are currently less well known, even among “the cognoscenti”: <span class="caps">DTLL</span> (Datatype
Library Language), <span class="caps">DSRL</span> (Document Schema Renaming Language)
and Datatype- and Namespace-aware DTDs.
</p>
          <p>
Drawn from real-world experience in the <span class="caps">ISO</span> working group,
and in editing and implementing part of <span class="caps">DSDL</span>, this presentation
will include a description of <span class="caps">DSDL</span>, and in particular will
set out the function of the three lesser-known parts which are soon to be standardised.
It will explain why <span class="caps">DSDL</span> as a whole offers an elegant and
complete solution to the problems of <span class="caps">XML</span> validation, and
why users should care.
</p>
          <ul>
            <li>
              <span class="caps">DTLL</span> will introduce data-typing into the validation mix
in a way which overcomes the limitations of <span class="caps">W3C</span> Schema’s
fixed typing scheme. It will allow users to define their own type libraries in elegant
declarative <span class="caps">XML</span>.</li>
            <li>
Influenced by architectural forms, <span class="caps">DSRL</span> acts as a schema
adapter, allowing users to validate <span class="caps">XML</span> against a schema it
does not directly conform to, by modifying it ‘on the fly’. As such it powerfully supports internationalisation
and content defaulting.</li>
            <li>
Part 9 of <span class="caps">DSDL</span> will retro-fit some of its major features
into DTDs, allowing users with heavy investment in <span class="caps">DTD</span> technology
to get more life out of them.</li>
          </ul>
          <p>
In summary, attendees will hear:
</p>
          <ul>
            <li>
A conceptual overview of the need for <span class="caps">DSDL</span> and an appreciation
of the problem space it addresses</li>
            <li>
What the 10 parts of <span class="caps">DSDL</span> are</li>
            <li>
A more detailed description of the three upcoming parts of <span class="caps">DSDL</span></li>
            <li>
Examples and/or demonstrations of these in action</li>
            <li>
A report of progress made in working-group meetings running alongside XTech 2007</li>
            <li>
A roadmap for the completion of the project and details on how to get involved.</li>
          </ul>
        </div>
      </body>
      <title>New XML Schema Languages in DSDL</title>
      <guid isPermaLink="false">http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</guid>
      <link>http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ed0f612c-4155-4321-a64f-a9950c33e1ab</link>
      <pubDate>Tue, 13 Feb 2007 13:36:27 GMT</pubDate>
      <description>&lt;p&gt;
… is a presentation that I &lt;b&gt;won't&lt;/b&gt; be giving at &lt;a href="http://xtech.expectnation.com/"&gt;The
XTech 2007 Conference&lt;/a&gt;, as the proposal was not accepted (I will, however, be speaking
on &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=c9ddcd7c-1208-49cb-a3da-6d12597c78b7"&gt;another
topic&lt;/a&gt;). Based on my experience of speaking at, and reviewing for, XML conferences
over several years, the rejection of this paper surprises me. Maybe XTech really is
losing the XML focus of its XML Europe past.
&lt;/p&gt;
&lt;p&gt;
I do hope somebody is covering &lt;a href="http://dsdl.org/"&gt;DSDL&lt;/a&gt;, as the technologies
it contains are important ones that deserve public airing.
&lt;/p&gt;
&lt;p&gt;
Anyway, here's the abstract of the paper that didn't make it:
&lt;/p&gt;
&lt;hr size="1" width="140"&gt;
&lt;h4&gt;Description
&lt;/h4&gt;
&lt;p&gt;
ISO is expected shortly to standardise three new schema languages as part of DSDL.
Learn about them, and the DSDL project as a whole, in this update.
&lt;/p&gt;
&lt;h4&gt;Abstract
&lt;/h4&gt;
&lt;div&gt;
&lt;p&gt;
It has recently &lt;a href="http://cafe.elharo.com/xml/relax-wins/"&gt;been proclaimed&lt;/a&gt; that
“among the &lt;span class="caps"&gt;XML&lt;/span&gt; cognoscenti, the debate is effectively over.
Everyone is choosing &lt;span class="caps"&gt;RELAX NG&lt;/span&gt;”. And indeed the early indicators
are that &lt;span class="caps"&gt;RELAX NG&lt;/span&gt; is getting increasing traction (if still
only being the grammar modelling language of the “cog