TEI Meeting: Day 1 Recap
This week I’m attending the Text Encoding Initiative’s annual meeting (TEI@20). The TEI is, at heart, a scholarly effort to develop a tag set for encoding, or marking up, documents in the humanities. Documents in this case are defined quite broadly to include books, manuscripts, music, even physical objects like gravestones. Once encoded, these digital documents can be displayed on the web or analyzed and reused in a variety of ways. The tag set, described in “The Guidelines,” is extensible: it can be extended to describe specific types of documents, for example, letters. It has influenced other standardized tag sets and even the creation and development of XML itself. In fact, some of the TEI creators were heavily involved in the creation of XML.
The TEI is 20 years old this year, which means it predates XML, predates the web, and even predates UVM’s connection to the Internet. Despite its age, the TEI still manages to be on the cutting edge of digitization efforts. This year the TEI debuts “P5,” a reconceptualization of the tag set and guidelines that takes advantage of recent developments in XML, especially schemas. P5 is more modular, more open, and more flexible than the previous version. It also incorporates more tags, and adjusts some tags to align more closely with developing ISO and W3C standards.
The first day was devoted to a workshop introducing P5. While the overview of new tags was welcome, the most interesting part of the day for me was the all too brief section on the new modular structure of P5 schemas. This structure has been a source of confusion. A key principle of the TEI is that you should use only those tags that you need. Some tags are “core,” i.e., needed by all documents, while others are optional. The TEI called this the “Pizza” method (all pizzas need a crust but not all pizzas need pepperoni) and provided tools for choosing which “toppings” should be built into a DTD. This DTD would then control which tags could be used to mark up the document. Unfortunately, the DTD itself is not written in XML. Schemas, on the other hand, serve the same function but are written in XML, so they can be modularized and processed like any other XML file, a fact which makes for some interesting possibilities.
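To make the pizza idea a little more concrete, here is a minimal Python sketch of what "crust plus toppings" amounts to: a set of always-included modules, a set of optional ones, and a check of which tags in a document fall outside the combination. It does not read a real DTD or schema. The module contents are heavily abbreviated and only illustrative, and the names `MODULES`, `CRUST`, `allowed_tags`, and `disallowed` are my own inventions for this example, not anything the TEI provides.

```python
import xml.etree.ElementTree as ET

# Illustrative, heavily abbreviated module contents (an assumption for this
# sketch: the real TEI modules define many more elements than listed here).
MODULES = {
    "core": {"p", "hi", "title", "note", "head", "name", "date"},
    "header": {"teiHeader", "fileDesc", "titleStmt",
               "publicationStmt", "sourceDesc"},
    "textstructure": {"TEI", "text", "body", "div",
                      "opener", "closer", "salute", "signed"},
    "namesdates": {"persName", "placeName", "orgName"},
}

# The "crust": modules every document in this sketch gets.
CRUST = ("core", "header", "textstructure")


def allowed_tags(toppings=()):
    """Union of the element names from the crust plus the chosen toppings."""
    names = set()
    for module in (*CRUST, *toppings):
        names |= MODULES[module]
    return names


def disallowed(xml_text, toppings=()):
    """Return the sorted element names used in xml_text but not allowed."""
    allowed = allowed_tags(toppings)
    root = ET.fromstring(xml_text)
    found = set()
    for el in root.iter():
        # ElementTree reports namespaced tags as "{namespace}local".
        local = el.tag.rsplit("}", 1)[-1]
        if local not in allowed:
            found.add(local)
    return sorted(found)


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>A letter</title></titleStmt>
    <publicationStmt><p>Unpublished</p></publicationStmt>
    <sourceDesc><p>Manuscript</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><div>
    <opener><salute>Dear <persName>Mary</persName>,</salute></opener>
    <p>I arrived in <placeName>Burlington</placeName> today.</p>
  </div></body></text>
</TEI>"""

print(disallowed(SAMPLE))                           # ['persName', 'placeName']
print(disallowed(SAMPLE, toppings=["namesdates"]))  # []
```

With only the crust, the name tags are rejected; add the `namesdates` topping and the same letter passes. A real DTD or schema also constrains where each tag may appear and which attributes it takes, which this sketch ignores entirely.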
With the move to schemas, however, comes a challenge. For example, the XML editor OxygenXML comes with a library of “frameworks” that include both TEI P4 and P5 DTDs and schemas. These schemas are expressed as a series of modules. Choosing which modules to use, how to combine them, and especially how to arrange them so that you can use them both locally and from a server is not trivial. Fortunately, the TEI has also created ROMA, a tool that lets you choose which modules you need, then combines them into a single RELAX NG file. Easy. The rest is just getting familiar with the tag sets. Of course, the current set numbers over 400 tags, so there’s plenty to learn.