ISO 25964 “Thesauri and interoperability with other vocabularies” is the first thesaurus standard to explicitly feature a data model for monolingual and multilingual thesauri, recommendations for exchange formats and protocols, and an XML schema. Part 1, “Thesauri for information retrieval”, officially numbered ISO/DIS 25964-1, has been released as a draft and is available for public comment until the end of February 2010 (the ballot will end on March 26).
Clause 15 describes the data model underpinning an XML schema, presented both in UML (Fig. 15) and in tabular format. The model accommodates the full range of options described in the standard. It is a logical model of the underlying structure of thesaurus data and does not necessarily represent the way data is held within a given computer system. The five main classes are Thesaurus, ThesaurusConcept, ThesaurusTerm, ThesaurusArray and Note. How to view the text and comment on it has been described in an earlier blog message .
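To make the shape of such a model more concrete, here is a minimal Python sketch of the five main classes named above. Only the class names come from the draft as described here. Every attribute, method and relationship name (for example `lexical_value`, `broader_ids` and `member_ids`), as well as the `build_example` helper, is an illustrative assumption and is not taken from the tables in Clause 15. Like the standard's model, it is a logical view only, not a storage design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Note:
    # Free-text note attached to a concept (attribute names are assumptions).
    text: str
    lang: str = "en"


@dataclass
class ThesaurusTerm:
    # One label for a concept in one language (attribute names are assumptions).
    lexical_value: str
    lang: str = "en"
    preferred: bool = False


@dataclass
class ThesaurusConcept:
    # A concept groups its terms and notes; links use identifiers (assumption).
    identifier: str
    terms: List[ThesaurusTerm] = field(default_factory=list)
    notes: List[Note] = field(default_factory=list)
    broader_ids: List[str] = field(default_factory=list)

    def preferred_term(self, lang: str) -> Optional[ThesaurusTerm]:
        """Return the first preferred term in the given language, if any."""
        for term in self.terms:
            if term.preferred and term.lang == lang:
                return term
        return None


@dataclass
class ThesaurusArray:
    # An ordered group of sibling concepts (attribute names are assumptions).
    identifier: str
    member_ids: List[str] = field(default_factory=list)


@dataclass
class Thesaurus:
    # Top-level container for concepts and arrays.
    identifier: str
    concepts: Dict[str, ThesaurusConcept] = field(default_factory=dict)
    arrays: List[ThesaurusArray] = field(default_factory=list)

    def add_concept(self, concept: ThesaurusConcept) -> None:
        self.concepts[concept.identifier] = concept


def build_example() -> Thesaurus:
    """Build a tiny bilingual thesaurus with two concepts and one array."""
    thesaurus = Thesaurus(identifier="demo")
    animals = ThesaurusConcept(
        identifier="c1",
        terms=[
            ThesaurusTerm("animals", "en", preferred=True),
            ThesaurusTerm("Tiere", "de", preferred=True),
        ],
    )
    cats = ThesaurusConcept(
        identifier="c2",
        terms=[
            ThesaurusTerm("cats", "en", preferred=True),
            ThesaurusTerm("felines", "en"),
            ThesaurusTerm("Katzen", "de", preferred=True),
        ],
        notes=[Note("Domestic cats only.")],
        broader_ids=["c1"],
    )
    thesaurus.add_concept(animals)
    thesaurus.add_concept(cats)
    thesaurus.arrays.append(ThesaurusArray("a1", member_ids=["c2"]))
    return thesaurus
```

Keeping terms separate from concepts is what lets one concept carry labels in several languages, which is the point of modelling multilingual thesauri in a single structure.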
The XML schema for data exchange, derived from the data model, is included in the standard as an informative appendix (Annex B), and is available free of charge at . The schema may be used for electronically transmitting a whole thesaurus or portions of a thesaurus. Everybody is invited to give feedback on the draft schema by advancing from that page to the comments page for the schema and clicking on the “Add a Comment” link there. A test XML document using the schema is also available. All comments will be open for public viewing. Links to the schema are also provided at the ISO 25964 public project page .
One of the predecessors to the new thesaurus standard is BS 8723. A data model and XML schema were developed in the context of that standard and documented in Part 5 (BS 8723-5), which deals with exchange formats and protocols for interoperability. They are freely available for comparison at .