Standards work of Steve DeRose

This page was written by Steven J. DeRose a long time ago, and was last updated on 2003-04-14.

This page lists some of the standards I have worked on in various ways, with links to online information about those standards. On some of these I have been very deeply involved; on others, only peripherally, as kibitzer or reviewer (formal or otherwise).



There are many forums in which standards are developed (see here for a partial list); many are industry-specific; several are broader, such as ISO, IEEE, and the Web Consortium.

Open Scripture Information Standard

I currently chair the Bible Technology Group, a joint effort of the Society of Biblical Literature and the American Bible Society. BTG develops standards, tools, and texts to support online use of the Bible for scholarly, devotional, and many other purposes.

Discipline and industry-specific DTDs

Text Encoding Initiative

The TEI is the primary tag set used in humanities and social science document encoding. It was developed by a large international team of scholars, sponsored by the Association for Computational Linguistics, the Association for Computers and the Humanities, and the Association for Literary and Linguistic Computing, and supported by a variety of grants from NEH, the Mellon Foundation, and others. I was a member of the Text Representation Committee, and chaired the working group responsible for hypermedia. I also served as liaison to the ISO SGML and HyTime committees, and worked on the initial revision of the Writing System Declaration.

TEI Extended Pointers

Among the many contributions of the TEI to other standards efforts, TEI extended pointers provide a very compact, readable, yet powerful descriptive mechanism for addressing ID-less nodes in documents. A tutorial by Lou Burnard, co-editor of the TEI, is available. This syntax has been adopted, with minor modifications, as the basis for XPointer, a W3C project to produce a vastly enhanced linking capability for the Web.

Among the advantages over present HTML linking are: multi-ended links; links that point to specific locations within documents (HTML links can do this only if the destination happens to have an explicit ID attribute); links that you can add which originate in documents you cannot modify; the ability to treat collections of links as databases in their own right; added link semantics such as expand-in-place, transclusion, and window management; and much more.

Encoded Archival Description

The EAD DTD was developed in the archival community as a means to encode records cataloging archives, personal papers, manuscripts, and similar data. I had the great pleasure of spending a week working with an extraordinarily friendly and brilliant group of archivists under a grant from the Bentley Historical Library in Michigan, to develop this DTD. They have carried on much farther since, and you can find many archives appearing online using this DTD. The Society of American Archivists recently awarded the C.F.W. Coker prize to the EAD team for this work (illustration).

I have an article in American Archivist that was originally a talk given at a meeting at Berkeley very early on in the EAD process. Berkeley was also one of the first recipients of the Inso (née EBT) University Grant Program, and you can see some of the Berkeley Library's work online.

ISO 12083

ISO 12083 is a much enhanced derivative of the famous AAP DTD for books, articles, and similar publication genres, one of the earliest SGML DTDs. As part of my TEI work I wrote a lengthy evaluation of the AAP DTD and its conventions, and this somehow found its way into the ISO 12083 process, with the result that I was asked to do a relatively formal review of the 12083 draft.

Pinnacles

This DTD was a grand triumph of cooperation in the semiconductor industry for representing interoperable spec sheets, though it is dauntingly complex for us non-EE types.

Metafile for Interactive Documents

MID provides some mechanisms for building highly scripted documents, almost like a small programming language bound into SGML syntax.


Web Consortium recommendations

XML

XML provides a very simple, trivial-to-implement subset of SGML that nevertheless retains what many consider its most important characteristic: the ability to define new tag sets as needed. This makes XML (like SGML) quite radically different from specific tag sets such as HTML, DocBook, etc. I am a charter member of the XML Working Group. XML received formal approval as a W3C Recommendation in February of 1998. In August, the WG was re-formed into a W3C Coordination Group, working with multiple WGs that now handle XML-related standards.

XML Linking

I have been co-editor or editor of most of the XML-related hypertext linking standards at one time or another. The XML Linking Working Group was originally part of the XML WG, but was formally made a separate Working Group in August of 1998, chartered to develop more advanced addressing and linking capabilities for enhancing the hypertext functionality of XML documents on the Web. This effort had an increasing number of parts:

XPath

XPath provides simple mechanisms for pointing to particular locations in XML documents, either by counting tree nodes, or by characterizing nodes by properties such as their types, attributes, and neighbors. This allows linking to anyplace in an XML document that you want to, rather than being able to link only to the whole document or to a place the author specifically provided, as in HTML. XPath was developed by unifying much of the functionality that had been independently developed in XSL and XPointer; James Clark and I did the bulk of this unification work, and the XPath spec was published as a joint effort of the two WGs.
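As a rough illustration of the two addressing styles just described (counting tree nodes, and characterizing nodes by their properties), here is a minimal Python sketch. It uses the limited XPath subset supported by the standard library's xml.etree.ElementTree, not a full XPath implementation, and the sample document with its element and attribute names is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
DOC = """
<book>
  <chapter id="chap1"><title>Intro</title><para>First.</para></chapter>
  <chapter id="chap2"><title>Links</title><para>Second.</para><para>Third.</para></chapter>
</book>
"""

root = ET.fromstring(DOC)

# Addressing by counting: the second <para> child of the second <chapter>.
by_position = root.find("./chapter[2]/para[2]")

# Addressing by properties: the <title> of the <chapter> whose id is "chap2".
by_attribute = root.find("./chapter[@id='chap2']/title")

print(by_position.text)   # Third.
print(by_attribute.text)  # Links
```

Neither target needed an anchor of its own in the document, which is the point of the comparison with HTML above.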

XPointer Framework

This spec provides internal structure for the "fragment identifier" portion of URIs (technically, a URI with a fragment identifier is a "URI reference"). The values that can go there (after the "#") vary by the MIME media type of the file returned -- and few types other than HTML and SVG even define a meaning. The XPointer Framework gives a way to create new named schemes for fragment identifiers, so browsers can pick one appropriate to the type of information returned. Because of "content negotiation", asking for an HTML file doesn't guarantee you'll get one. The syntax allows for a bare name as in HTML, or for a sequence of function-like scheme invocations:

   http://example.com/docs/foo.html#element(3/4/2)xpointer(id('chap2')) hotspot(12,20,40,52)
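To make the shape of that syntax concrete, here is a minimal Python sketch that splits a fragment identifier into either a bare name or a sequence of scheme invocations. It is an illustration under simplifying assumptions, not a conforming processor: the function name parse_fragment is made up for this example, scheme names are not validated, and the only details handled are balanced parentheses and the circumflex escapes (^(, ^), ^^) inside scheme data.

```python
def parse_fragment(fragment):
    """Split a fragment identifier into a bare name or a list of (scheme, data) parts.

    Hypothetical helper for illustration; it does not validate scheme names.
    """
    fragment = fragment.strip()
    if "(" not in fragment:
        # A bare name, as in HTML.
        return {"shorthand": fragment, "parts": []}

    parts = []
    i, n = 0, len(fragment)
    while i < n:
        # Skip optional whitespace between pointer parts.
        while i < n and fragment[i].isspace():
            i += 1
        if i >= n:
            break
        open_paren = fragment.find("(", i)
        if open_paren == -1:
            raise ValueError(f"expected scheme(...) at offset {i}")
        scheme = fragment[i:open_paren]

        # Collect scheme data up to the matching close parenthesis.
        depth, j, data = 1, open_paren + 1, []
        while j < n:
            ch = fragment[j]
            if ch == "^" and j + 1 < n:
                # Circumflex escapes the next character.
                data.append(fragment[j + 1])
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break
            data.append(ch)
            j += 1
        if depth != 0:
            raise ValueError(f"unbalanced parentheses in part starting at offset {i}")
        parts.append((scheme, "".join(data)))
        i = j + 1
    return {"shorthand": None, "parts": parts}


result = parse_fragment("element(3/4/2)xpointer(id('chap2')) hotspot(12,20,40,52)")
for scheme, data in result["parts"]:
    print(scheme, "->", data)
```

A processor would then walk the parts from left to right and use the first one whose scheme it understands and which actually locates something in the resource that came back.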

XPointer element() scheme

This enables pointing to any whole element in any XML document, even when it doesn't have an ID. For example, you can append this to a URL (after the usual "#") in order to link to the second child of the fourth child of the element with ID 'foo':

   element(foo/4/2)

This has the enormous advantage of letting you link to any element you want, without needing to modify the target document to add an anchor point. It is also utterly trivial to implement. On the other hand, it has limitations. A small one is that it cannot point to anything other than elements: not attributes, comments, processing instructions, etc. A bigger one is that pointers can end up pointing to the wrong place if the target document is edited (this is built into the architecture of the Web, and there is no way to avoid it in general, though some pointing methods are better than others: for example, IDs are far more stable than byte offsets, with the element() scheme's node numbers somewhere in between). Perhaps the biggest limitation of the element() scheme is that it cannot point to the usual things users link to in documents: selections. Most selections are not whole nodes; linking to them requires the xpointer() scheme instead.
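To give a sense of how little code the element() scheme needs, here is a minimal Python sketch of a resolver for scheme data such as foo/4/2 (start at an ID, then step through child numbers) or /1/4/2 (start at the document element). It is a sketch under assumptions, not a conforming implementation: the function name is made up, and because xml.etree.ElementTree does not read DTDs, an attribute literally named "id" is treated as the ID.

```python
import xml.etree.ElementTree as ET


def resolve_element_pointer(root, pointer):
    """Resolve element() scheme data against a parsed tree; return None if nothing matches.

    Hypothetical helper for illustration only.
    """
    name, _, rest = pointer.partition("/")
    steps = [int(s) for s in rest.split("/")] if rest else []

    if name:
        # Assumption: an attribute named "id" stands in for a DTD-declared ID.
        current = next((e for e in root.iter() if e.get("id") == name), None)
    else:
        # With no name, the first step must be 1 and selects the document element.
        if not steps or steps[0] != 1:
            return None
        current = root
        steps = steps[1:]

    for step in steps:
        if current is None:
            return None
        children = list(current)  # child elements only, in document order
        if not 1 <= step <= len(children):
            return None
        current = children[step - 1]
    return current


# Hypothetical sample document.
DOC = """
<doc>
  <sec id="foo">
    <p>a</p><p>b</p><p>c</p>
    <list><item>one</item><item>two</item></list>
  </sec>
</doc>
"""
root = ET.fromstring(DOC)

print(resolve_element_pointer(root, "foo/4/2").text)  # two
print(resolve_element_pointer(root, "/1/1/2").text)   # b
```

The fragility discussed above is easy to see here: inserting one more paragraph at the start of the section would silently shift what foo/4/2 selects.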

XPointer xmlns() scheme

XPointer xpointer() scheme

This enables pointing to any selection in any XML document. Thus, it is the level of functionality required for true hypertext applications, as well as for specific applications such as collaborative authoring and editing, annotations, and so on. A couple of companies have suggested that it is unimplementable -- this is quite simply wrong. There are several implementations, and every browser already implements the notion those companies found troublesome: selections. Every time you click, or drag-select some text in a browser, the browser proves it supports selections, not just nodes. Prof. Fabio Vitali at the University of Bologna has a team that has implemented it twice, and made the code public.

For example:

 id(chap5).child(1,SEC).child(2,LIST) 

XPointer builds directly on an already-standard syntax: "Extended Pointers," developed by the Text Encoding Initiative (TEI). TEI extended pointers have been around for a while, and are widely used and implemented. XPointer has also now been implemented, and feedback from developers indicates it is working well even in its current Working Draft form. The expected changes before Proposed Recommendation are small: cleaning up and strengthening the constraint syntax by using standard boolean expression syntax, and allowing "child" to be omitted as the default, since (after ID) it will likely be the most commonly used addressing form. There is also work under consideration to allow appending a checksum to the XPointer, for verification that the target found actually matches what it was when the link was made.
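For readers who want to see how the dotted example above reads operationally, here is a minimal Python sketch that walks a pointer such as id(chap5).child(1,SEC).child(2,LIST) one step at a time. It rests on assumptions: only the id and child keywords are handled, child(n,TYPE) is taken to mean the nth child element of that type, an attribute named "id" stands in for a declared ID, and the function name and sample document are invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# One step looks like keyword(arg, arg, ...); nested parentheses are not handled.
STEP = re.compile(r"(\w+)\(([^()]*)\)")


def resolve_dotted_pointer(root, pointer):
    """Resolve a pointer made of id(...) and child(n,TYPE) steps; None if nothing matches.

    Hypothetical helper for illustration only.
    """
    current = None
    for keyword, raw_args in STEP.findall(pointer):
        args = [a.strip() for a in raw_args.split(",")]
        if keyword == "id":
            # Assumption: an attribute named "id" plays the role of an ID.
            current = next((e for e in root.iter() if e.get("id") == args[0]), None)
        elif keyword == "child":
            if current is None:
                return None
            n, element_type = int(args[0]), args[1]
            matches = [c for c in current if c.tag == element_type]
            current = matches[n - 1] if 1 <= n <= len(matches) else None
        else:
            raise ValueError(f"unsupported keyword in this sketch: {keyword}")
        if current is None:
            return None
    return current


# Hypothetical sample document.
DOC = """
<BOOK>
  <CHAP id="chap5">
    <TITLE>Linking</TITLE>
    <SEC><P>text</P><LIST n="a"/><LIST n="b"/></SEC>
    <SEC/>
  </CHAP>
</BOOK>
"""
root = ET.fromstring(DOC)

target = resolve_dotted_pointer(root, "id(chap5).child(1,SEC).child(2,LIST)")
print(target.tag, target.get("n"))  # LIST b
```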

XLink

XLink builds on HyTime and other hypermedia models to provide a wide variety of hypermedia capabilities not otherwise available on the Web, such as multi-directional and external linking. I am co-editor of the XLink specification.


ISO standards

SGML Open Specifications


