Modelling Policy and Pattern: Internationalized Resource Identifier (IRI) Structure, Format, and Ontology Naming Conventions for IDMP

Overview

Namespaces provide the mechanism for grouping objects in a domain. Examples include

  • file systems, which organize files and provide a means of assigning names to those files
  • networks and data distribution schemes, which organize resources and allow naming of those resources according to some protocol

The Semantic Web leverages the web architecture to provide conventions for namespaces and naming via internationalized resource identifiers (IRIs).  An ontology IRI represents the base namespace (identifier) shared by all resources in a vocabulary or ontology.

The IRI structure for ontologies developed for the IDMP project follows recommendations provided in a number of W3C and IETF documents, including but not limited to:

The approach described below recommends the use of both a versioned and a non-versioned form of the IRI, each of which is also a Uniform Resource Locator (URL) for the ontology. The non-versioned form of the ontology IRI always resolves to the latest released version of an ontology, vocabulary, or other resource once that resource has been published. The versioned form of the IRI always resolves to the specific version of the ontology published at the URL corresponding to the version IRI. For the IRI syntax, please refer to RFC 3987 from the Internet Engineering Task Force (IETF).

The EDM Council infrastructure publishes the ontologies at both the versioned and non-versioned URLs: content negotiation resolves the non-versioned URL to the latest version of each ontology, while every released version remains available at its versioned URL. This approach allows our user community to migrate to a later version when it makes sense for them, minimizes confusion, and provides a clear version history. It also ensures that applications don't break when we publish a new version.

This document provides the normative requirements for all IRIs minted for the IDMP project, to support consistency, reusability, and maintenance over time. An English-language class and property naming scheme was selected, after considering a non-human-readable numerical identifier scheme (such as the one used in the OBO Foundry), to facilitate use of the ontologies by tools such as UML tools that support diagramming, documentation, and development of related artifacts and, more importantly, to aid understanding by non-ontologist users. The following sections define the rules for constructing the ontology IRI and version IRI per the OWL 2 specifications (reference below).

The following rules, taken from IETF RFC 2119 (simplified), MUST be followed when reviewing this document:

  1. MUST: This word means that the definition is an absolute requirement of the specification.

  2. MUST NOT: This phrase means that the definition is an absolute prohibition of the specification.

  3. SHOULD: This word means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

  4. SHOULD NOT: This phrase means that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

  5. MAY: This word means that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.

Protocol and Authority

All IDMP IRIs MUST be resolvable and refer to a resource that can be retrieved from the Internet. For the GitHub work-in-progress and internal infrastructure processes to work properly, the IRI MUST specify HTTPS as the protocol, and the authority MUST be a domain administered and owned by the EDM Council, for conformance with the governance infrastructure for the project. The authoritative IRI for the ontologies developed for the IDMP project MUST use a single, normative authority, namely spec.pistoiaalliance.org.
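The protocol and authority rules above lend themselves to a simple automated check. The following is a minimal sketch, not part of the IDMP tooling: the function name is hypothetical, and the check covers only the scheme and host, not whether the IRI actually resolves.

```python
from urllib.parse import urlsplit

# Normative authority named in this section.
IDMP_AUTHORITY = "spec.pistoiaalliance.org"


def has_idmp_protocol_and_authority(iri: str) -> bool:
    """Return True if the IRI uses HTTPS and the normative IDMP authority.

    Hypothetical helper for illustration; resolvability is not tested here.
    """
    parts = urlsplit(iri)
    # urlsplit lower-cases the scheme; .hostname is lower-cased and excludes any port.
    return parts.scheme == "https" and parts.hostname == IDMP_AUTHORITY
```

For instance, `has_idmp_protocol_and_authority("http://spec.pistoiaalliance.org/idmp/")` returns `False` because the protocol is not HTTPS.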

IRI Path

In accordance with IETF RFC 3987, the Path component of the IRI MUST immediately follow the authority, starting with a forward slash '/', and the Path parts MUST be separated by forward slashes '/'. The first part of the Path is referred to as the Path Root.

IRI Path Root

The IRI Path Root MUST be /idmp. The Root provides the ability to have documentation and other supporting resources referenced in alternate Root resources, such as /idmp/documentation or /idmp/src. The IDMP governance team MUST designate each root resource for a specified use. This includes the path root for other published versions of any ontology, such as a SKOS vocabulary, which might be published at /idmp/vocabulary rather than /idmp/ontology, for example, to avoid punning.

Subject Area

It is likely that a number of focus areas will emerge from the body of work produced by the IDMP project. Such topics may range from standards-specific, to use-case-specific, to other domain areas that form natural divisions for relevant subject matter. A subject area represents a domain or area of interest addressed by one or more working groups, and enables further subject-specific organization of sub-topics and modules. Guidelines for establishing high-level subject areas and for organizing sub-topics and ontologies may emerge over time. Currently, the governance infrastructure assumes that the names of subject areas are upper-case alphabetic strings of no more than 6 characters. The goals for phases I and II of the project included publishing ontologies that can be used well beyond the end of the project; accordingly, the following subject areas are currently in use:

CMNS - The ontologies managed under this subject area include patterns that either have been standardized by, or are in draft form and under consideration for standardization by, the Object Management Group (OMG), together with a specification for Multiple Vocabulary Facility (MVF) support.

EXT - The ontologies managed under this subject area are IDMP project ontologies that reuse the primary ISO ontologies to support use cases developed for the IDMP project. They include extensions to the core ontologies as well as examples, some of which are used for regression testing. These ontologies are the least stable and will continue to evolve over the course of the project.

ISO - The ontologies managed under this subject area represent content from either (1) exactly one of the relevant ISO standards required for, or used in conjunction with, the project, or (2) a subset of a given ISO standard, such that coverage of that standard will ultimately include multiple ontologies, each of which will be managed under this topic. An example that includes multiple ontologies is the ISO 11238 Substances ontology, which has an extension that includes the relevant registration authorities for details related to substances, such as the FDA, EMA, WHO, and so forth.

LCC - The ontologies managed under this subject area represent a version of the OMG's Languages, Countries, and Codes (LCC) 1.2 specification that has been modified to reuse the CMNS ontologies. This version of the LCC ontologies will be published in a new LCC 2.0 version of the LCC standard, currently planned for the June 2024 OMG Technical Meeting. Through OMG's liaison relationship with ISO, they cover the basic ISO 639 language codes and ISO 3166 country codes, which ISO has published but not as ontologies.

META - The ontologies managed under the META subject area support the inclusion of metadata that is IDMP-O specific.

MVF - The ontologies managed under the MVF subject area are part of the formal OMG Multiple Vocabulary Facility standard, which is used in IDMP to represent controlled vocabularies.

SPAR - The ontologies managed under the SPAR subject area are a subset of a library of ontologies developed by the library science community to represent publications of various kinds. The best known of these ontologies, which we are not currently using, is for the FRBR standard. Those in use by IDMP-O enable representation of parts of documents, including tables and the cells contained in them, for use in representing components of the labels on packaging, for example.

For example:

Since that time, the MVP area has been renamed to 'EXT', or 'extensions'. This partition includes some extensions to the units of measure to cover UCUM, as well as our examples.  We anticipate that other subject areas will be identified and added over the course of the work.
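The naming constraint that the governance infrastructure currently assumes for subject areas (upper-case alphabetic, at most 6 characters) can be expressed as a pattern check. This is a minimal sketch under that stated assumption; the constant and function names are hypothetical, and the set of known areas is only a snapshot of the list above, since more areas are expected.

```python
import re

# Snapshot of the subject areas listed in this section; others may be added later.
KNOWN_SUBJECT_AREAS = {"CMNS", "EXT", "ISO", "LCC", "META", "MVF", "SPAR"}

# Upper-case alphabetic string, 1 to 6 characters long.
SUBJECT_AREA_PATTERN = re.compile(r"[A-Z]{1,6}")


def is_valid_subject_area(name: str) -> bool:
    """Return True if the name satisfies the current subject-area naming assumption."""
    return SUBJECT_AREA_PATTERN.fullmatch(name) is not None
```

A new area name can be checked against the pattern before it is proposed to the governance team, while membership in `KNOWN_SUBJECT_AREAS` distinguishes areas already in use.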

Ontology IRI and Version IRI

IDMP ontologies MUST include a versioned and non-versioned IRI. The version IRI MUST use the release date of the version in YYYYMMDD form, such as 20220601 for June 1, 2022. When a versioned IRI is formed, the version (date) MUST appear following the subject area. The non-versioned IRI represents the latest version of the ontology and will dereference to the most current version when published by the EDM Council.
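The YYYYMMDD version segment can be produced and checked with the standard library. The following is a minimal sketch; both function names are hypothetical and not part of any IDMP tooling.

```python
from datetime import date, datetime


def version_segment(release_date: date) -> str:
    """Format a release date as the YYYYMMDD segment used in a version IRI."""
    return release_date.strftime("%Y%m%d")


def is_valid_version_segment(segment: str) -> bool:
    """Return True if the segment is eight ASCII digits forming a real calendar date."""
    if len(segment) != 8 or not (segment.isascii() and segment.isdigit()):
        return False
    try:
        datetime.strptime(segment, "%Y%m%d")
    except ValueError:
        # Digits are present but do not form a date, e.g. month 13.
        return False
    return True
```

With these helpers, `version_segment(date(2022, 6, 1))` yields `"20220601"`, matching the example above, and `is_valid_version_segment("20221301")` is `False`.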

IDMP ontology IRIs MUST follow a slash-style '/' rather than hash-style '#' structure. This approach facilitates server-side processing of content published to the URL corresponding to the ontology IRI as described in https://www.w3.org/TR/2008/NOTE-swbp-vocab-pub-20080828/, which we believe will be essential to a number of the use cases identified for the IDMP project.

Sub-Topic (Optional)

When specified, a sub-topic MUST appear after the version (date) in a versioned IRI or after the subject area in a non-versioned IRI. For example, in LCC, there are two sub-topics, Languages and Countries. There MAY be multiple sub-topics for any subject area. Sub-topics MAY include multiple ontologies. For example:
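A minimal sketch of how the ordering rules for subject area, version, and sub-topic compose into the leading part of an IRI. Several details are assumptions made for illustration only: the `ontology` segment after the `/idmp` root (taken from the /idmp/ontology path mentioned under IRI Path Root), the function and parameter names, and the trailing slash on the returned prefix.

```python
from typing import Optional

# Authority and path root come from this document; the "ontology" segment is an assumption.
BASE = "https://spec.pistoiaalliance.org/idmp/ontology"


def location_prefix(
    subject_area: str,
    sub_topic: Optional[str] = None,
    version: Optional[str] = None,
) -> str:
    """Build the IRI prefix that precedes the ontology name.

    The version (YYYYMMDD) follows the subject area; the optional sub-topic
    follows the version, or the subject area when no version is given.
    """
    segments = [BASE, subject_area]
    if version is not None:
        segments.append(version)
    if sub_topic is not None:
        segments.append(sub_topic)
    return "/".join(segments) + "/"
```

Under these assumptions, `location_prefix("LCC", "Countries")` gives `https://spec.pistoiaalliance.org/idmp/ontology/LCC/Countries/`, and adding `version="20220601"` gives `https://spec.pistoiaalliance.org/idmp/ontology/LCC/20220601/Countries/`.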

Ontology Naming and Ontology IRIs

An ontology is a set of related ontological classes, properties, and axioms encoded using a specific serialization format, such as RDF/XML, Turtle, or JSON-LD. An HTTP server selects the serialization in which it delivers an ontology based on the Accept header of the HTTP/1.1 request; see Section 14 of IETF RFC 2616. The serialized representation is referred to as an ontology file. Following the subject and sub-topic resource locations, in a non-versioned IRI, the ontology name MUST be given without extension as follows:
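A minimal sketch of why the name carries no extension: the IRI stays the same for every serialization, and the client states the format it wants in the Accept header. The helper names, the `ontology` path segment in the usage example, and the trailing slash on the result are assumptions for illustration; the media types are the registered ones for each format, but whether a given server honors them is not confirmed here. The request object is only constructed, not sent.

```python
from urllib.request import Request

# Registered media types for the serializations named in this section.
MEDIA_TYPES = {
    "RDF/XML": "application/rdf+xml",
    "Turtle": "text/turtle",
    "JSON-LD": "application/ld+json",
}


def ontology_iri(location_prefix: str, ontology_name: str) -> str:
    """Append an extension-less ontology name to a location prefix."""
    if "." in ontology_name:
        raise ValueError("ontology name must not carry a file extension")
    return location_prefix.rstrip("/") + "/" + ontology_name + "/"


def negotiated_request(iri: str, serialization: str) -> Request:
    """Build (without sending) a request asking for one serialization of the ontology."""
    return Request(iri, headers={"Accept": MEDIA_TYPES[serialization]})
```

For example, `ontology_iri("https://spec.pistoiaalliance.org/idmp/ontology/ISO/", "ISO11240-UnitsOfMeasurement")` produces a single IRI that `negotiated_request(..., "Turtle")` and `negotiated_request(..., "RDF/XML")` both target, differing only in the Accept header.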

The ontology name MUST be in Upper Camel Case, with each word capitalized and no separation between words, with the exception of ISO standard names as specified below. All acronyms MUST be spelled out except when they appear in the dictionary, like RADAR, or are approved by the governance team, such as ISO. The ontology name MUST NOT have any extension in the IRI (e.g., <OntologyName>.owl). Subsequent parts of the IRI, such as class and property names, MUST be separated from the ontology name by a forward slash '/', and the ontology IRI MUST end with a forward slash '/'. owl:imports and rdf:resource references MUST use the IRI with a trailing '/'.
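The casing and slash rules above can be approximated in code. This is a heuristic sketch only: the pattern treats a word as one upper-case letter followed by lower-case letters or digits, so it rejects every all-capitals acronym, including dictionary words and governance-approved ones, which would need a separate allowlist. The function names and the `ExampleOntology` and `ExampleClass` names in the usage note are hypothetical.

```python
import re

# One or more words, each an upper-case letter followed by lower-case letters or digits.
UPPER_CAMEL_CASE = re.compile(r"(?:[A-Z][a-z0-9]+)+")


def is_upper_camel_case(name: str) -> bool:
    """Heuristic check for Upper Camel Case with no separators or extension."""
    return UPPER_CAMEL_CASE.fullmatch(name) is not None


def term_iri(ontology_iri: str, local_name: str) -> str:
    """Form a slash-style IRI for a class or property within an ontology."""
    if not ontology_iri.endswith("/"):
        raise ValueError("ontology IRI must end with a forward slash")
    return ontology_iri + local_name
```

Under these assumptions, `is_upper_camel_case("UnitsOfMeasurement")` is `True` while `is_upper_camel_case("UnitsOfMeasurement.owl")` is `False`, and `term_iri("https://spec.pistoiaalliance.org/idmp/ontology/EXT/ExampleOntology/", "ExampleClass")` appends the local name directly after the trailing slash.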

Ontology names for the relevant ISO standards for the IDMP project are structured as follows: ISO<ISO standard number>-<UpperCamelCase name excerpted from the name of the standard>.  For example:

  • ISO21090-HarmonizedDatatypes, corresponding to ISO 21090 Health Informatics – Harmonized datatypes for information interchange
  • ISO11240-UnitsOfMeasurement, corresponding to ISO 11240, Health informatics - Identification of medicinal products - Data elements and structures for the unique identification and exchange of units of measurement

In cases where the standards are quite large, and there is a need to subdivide the content into multiple ontologies, the ontology name MUST be specified as: ISO<ISO standard number>-<UpperCamelCase name excerpted from the name of the standard>-<UpperCamelCase sub-topic name>.
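The ISO naming structure, with and without the optional sub-topic part, can be sketched as a builder and a matching parser. The function names are hypothetical, the pattern reuses a simple Upper Camel Case heuristic, and the `RegistrationAuthorities` sub-topic in the usage note is an invented illustration rather than a published ontology name.

```python
import re
from typing import Optional

_WORDS = r"(?:[A-Z][a-z0-9]+)+"  # heuristic Upper Camel Case
ISO_NAME = re.compile(
    rf"ISO(?P<number>\d+)-(?P<name>{_WORDS})(?:-(?P<sub_topic>{_WORDS}))?"
)


def iso_ontology_name(number: int, name: str, sub_topic: Optional[str] = None) -> str:
    """Build ISO<number>-<Name> or ISO<number>-<Name>-<SubTopic>."""
    parts = [f"ISO{number}", name]
    if sub_topic is not None:
        parts.append(sub_topic)
    return "-".join(parts)


def parse_iso_ontology_name(ontology_name: str) -> Optional[dict]:
    """Split an ISO ontology name into its parts, or return None if it does not match."""
    match = ISO_NAME.fullmatch(ontology_name)
    return match.groupdict() if match else None
```

For example, `iso_ontology_name(21090, "HarmonizedDatatypes")` gives `"ISO21090-HarmonizedDatatypes"`, and a subdivided name such as `iso_ontology_name(11238, "Substances", "RegistrationAuthorities")` parses back into its number, name, and sub-topic.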

Care should be taken to limit the ontology name to fewer than 255 characters (including the names of the classes, properties, and nominals contained in those ontologies) to allow for implementation across the broadest number of technologies and approaches.