Pattern: Controlled Vocabularies and Reference Code Lists - REVIEW
Introduction
The IDMP Ontology provides the semantic concepts and model structures that enable capturing product information in a consistent and harmonized manner. In the data harmonization process, a recurring task is to integrate and map Reference Code Lists from different authorities, e.g., the SPOR RMS code lists, EDQM lists, and internal lists.
The pattern described below should ensure that we have a common approach for integrating code lists so that IDMP-O implementers can more easily exchange and map them.
Related Competency Questions
- What are the release characteristics of <pharmaceutical product>?
- Where is <pharmaceutical product> administered?
Relation to ISO-IDMP Standards
The ISO IDMP Standards mostly describe the classes (e.g., "release characteristic", "intended site", or "route of administration"). Only in a few cases do the ISO IDMP standards also provide reference entities (e.g., for the ingredient roles). The individual codes defined in the ISO standards are, however, considered "nominals" with respect to the ontology, and not externally managed controlled vocabularies / code lists. A nominal is an instance that is fundamental to the ontology itself, rather than reference information. For example, the IDMP standards themselves list a number of codes for ingredient roles, as described below. The instances for these role codes are included directly in the ontology, and are considered nominals. Thus there are two patterns provided herein: (1) the pattern for nominals that are defined in the ISO standards and are not jurisdiction-specific, and (2) the pattern for controlled vocabularies that may or may not be jurisdiction-specific. SPOR controlled vocabularies / code lists, for example, are specific to EMA's jurisdiction, although they may also be used by others. Vocabularies from the FDA and the US National Institutes of Health (National Cancer Institute, National Library of Medicine, etc.) are specific to the US jurisdiction but may be used elsewhere.
Modeling Patterns
Representation of Nominals (Code Elements / Classifiers) Defined in an ISO IDMP Standard or Implementation Guide
There are a number of examples of controlled vocabularies and related codes in the ISO IDMP standards. These vocabularies typically include some number of values specified in the body of a given standard, representing a value domain from the perspective of ISO 21090, and a controlled vocabulary as defined in ISO 11238. Required elements for any such controlled vocabulary include: (1) a language - all controlled vocabularies that are included in the ISO standards are specified in English, (2) a description in the form of a text value, (3) a URI identifying the controlled vocabulary or code set (in this case the URI of the controlled vocabulary defined within the ISO standard(s)), and (4) a textual name for the vocabulary. Optional elements include the most recent publication date and version of the vocabulary, though for those that are defined in the body of the standard, the date and version correspond to those of the standard.
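The required and optional elements listed above can be sketched as a small record type. This is an illustration only, not the ontology's definition: the field names, the `missing_required` helper, and the `example.org` URI are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ControlledVocabulary:
    """Metadata for a controlled vocabulary (illustrative field names)."""

    textual_name: str                       # required: textual name
    description: str                        # required: text description
    uri: str                                # required: URI identifying the vocabulary
    language: str = "en"                    # required: ISO-defined vocabularies are in English
    version: Optional[str] = None           # optional
    publication_date: Optional[str] = None  # optional

    def missing_required(self) -> List[str]:
        """Return the names of required elements that are empty."""
        required = {
            "language": self.language,
            "description": self.description,
            "uri": self.uri,
            "textual_name": self.textual_name,
        }
        return [name for name, value in required.items() if not value.strip()]


# Hypothetical vocabulary; the URI is a placeholder, not a real IDMP IRI.
vocab = ControlledVocabulary(
    textual_name="Ingredient Role",
    description="Roles that an ingredient can play in a product.",
    uri="https://example.org/vocabulary/ingredient-role",
)
print(vocab.missing_required())
```

For vocabularies defined in the body of a standard, `version` and `publication_date` would simply repeat those of the standard itself.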
For example, clause 5.9 in ISO 11238 specifies a set of conformance annotations that apply to various elements in ISO 11238 and ISO/TS 19844. These include:
— Mandatory: Defining elements necessary for the unique identification of Substances and Specified Substances per the ISO IDMP standards/technical specifications.
— Conditional: Conditional applies to the 'within category' data elements, as applicable, when there are alternative data sources for a given data element(s) to identify a Substance/Specified Substance. Regional implementation of the ISO 11238 and ISO/TS 19844 may elevate the conditional conformance categories to 'mandatory' per regional requirements.
— Optional: When listed at the category level (e.g. Specified Substance), optional corresponds to ISO categories or data elements that are not absolutely necessary for the unique identification of Substances/Specified Substances as per ISO 11238. Regional implementation of ISO 11238 and ISO/TS 19844 may elevate the optional conformance categories to 'mandatory' or 'conditional' per regional requirements.
Because this property applies so broadly, we have defined a specific annotation property in the ISO 11238 substances ontology for it, together with a corresponding class and set of nominal values corresponding to the above definitions. The pattern used to represent these is given in Figure 1, below.
Figure 1. Representing ISO IDMP Internal Controlled Vocabularies, Code Sets and Elements
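As a rough sketch of the three conformance values and the regional elevations described in the definitions above, the following enumeration could be used. The names `Conformance`, `ALLOWED_ELEVATIONS`, and `regional_level` are hypothetical, and the sketch assumes that a region may keep or elevate a level but not lower it, which the definitions imply but do not state outright.

```python
from enum import Enum


class Conformance(Enum):
    MANDATORY = "mandatory"
    CONDITIONAL = "conditional"
    OPTIONAL = "optional"


# Elevations that a regional implementation may apply, per the definitions above.
ALLOWED_ELEVATIONS = {
    Conformance.MANDATORY: set(),
    Conformance.CONDITIONAL: {Conformance.MANDATORY},
    Conformance.OPTIONAL: {Conformance.MANDATORY, Conformance.CONDITIONAL},
}


def regional_level(base: Conformance, requested: Conformance) -> Conformance:
    """Return the regional conformance level, rejecting anything but keep-or-elevate."""
    if requested is base or requested in ALLOWED_ELEVATIONS[base]:
        return requested
    raise ValueError(f"cannot change {base.value} to {requested.value}")


print(regional_level(Conformance.OPTIONAL, Conformance.CONDITIONAL).value)
```

In the ontology itself these values are nominals attached through an annotation property; the enumeration here only mirrors the value set.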
Note that while this particular case is more of a "meta controlled vocabulary", the pattern holds true for other true controlled vocabularies called out in the ISO IDMP standards. One good example is the ingredient role pattern covered in Pattern: Representing Ingredients - APPROVED. The extended pattern, including the details related to expressing the controlled vocabulary, is shown in Figure 2, below.
Figure 2. Representing an ISO IDMP Internal Controlled Vocabulary for Ingredient Roles
Note that while the relationships between a code element, its type, and the controlled vocabulary are shown only for ACTIM, the same relationships are required for all of the code elements that are included in the controlled vocabulary. These relationships were elided for the sake of readability only.
Representing an ISO IDMP Internal Controlled Vocabulary
Every individual of type controlled vocabulary, code set, and classification scheme has the relationships and attributes specified in Table 1, below.
Table 1. Controlled Vocabulary Detail
| Class | Relationship / Attribute | Value | URI | Required/Optional |
|---|---|---|---|---|
| Classification Scheme | | | cmns-cls;ClassificationScheme | required |
| | has method | literal | | optional |
| Code Set | | | cmns-cds;CodeSet | required |
| | has member | Code Element | cmns-col;hasMember, cmns-cds:CodeElement | optional |
| | has name | Code System Name | cmns-dsg:hasName, idmp-dtp;CodeSystemName | optional |
| | has version | string value | idmp-dtp;hasVersion | optional |
| Controlled Vocabulary | | | idmp-sub;ControlledVocabulary | required |
| | has textual name | string value | mvf:hasTextualName | required |
| | has description | string value | cmns-dsg:hasDescription | required |
| | has URI | anyURI value | mvf:hasURI | required |
| | has arrangement | Arrangement | cmns-col;hasArrangement, cmns-col;Arrangement | optional |
| | has language | Language | mvf:hasLanguage, lcc-lr;Language | required |
| | has vocabulary entry | Vocabulary Entry | mvf:hasVocabularyEntry, mvf:VocabularyEntry | optional |
| | is used by | Community | cmns-cxtdsg;isUsedBy, mvf;Community | optional |
| | imports | Vocabulary | mvf;imports, mvf;Vocabulary | optional |
Controlled vocabularies in IDMP may have all of the relationships and attributes listed in the table.
Representing Internal Controlled Vocabulary Elements
Every individual that is a member of an IDMP controlled vocabulary has the relationships and attributes specified in Table 2, as a minimum, and may have others that are vocabulary specific.
Table 2. Controlled Vocabulary Element Detail
| Class | Relationship / Attribute | Value | URI | Required/Optional |
|---|---|---|---|---|
| Classifier | | | cmns-cls;Classifier | required |
| | classifies | something | cmns-cls;classifies | optional |
| | is defined in | Classification Scheme | cmns-cls;ClassificationScheme | optional |
| Code Element | | | cmns-cds;CodeElement | required |
| | denotes | something | cmns-dsg;denotes | required* |
| | is member of | Code Set | cmns-col;isMemberOf, cmns-cds;CodeSet | required |
| | has text value | string | cmns-txt;hasTextValue, xsd;string | optional |
| Concept Descriptor | | | idmp-dtp;ISO21090-ConceptDescriptor | required |
| | is characterized by | Coding Rationale | cmns-cls;isCharacterizedBy, idmp-dtp;CodingRationale | optional |
| | has source descriptor | Concept Descriptor | idmp-dtp;hasSourceDescriptor, idmp-dtp;ISO21090-ConceptDescriptor | optional |
| | has translation | Concept Descriptor | idmp-dtp;hasTranslation, idmp-dtp;ISO21090-ConceptDescriptor | optional |
| | comprises | Value Domain | cmns-col;comprises, idmp-dtp;ValueDomain | optional |
| | has display name | string | idmp-dtp;hasDisplayName, xsd;string | optional |
| | has original text | string | idmp-dtp;hasOriginalText, string | optional |
*While denotes is required, typically this element is implemented by using its inverse, stating that something "is signified by" some code element.
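The two required relationships for a code element in Table 2, including the option of satisfying "denotes" through its inverse, can be checked with a few lines of code. This is a minimal sketch over plain `(subject, predicate, object)` tuples rather than a real RDF store; the `ex:` names and the helper function are hypothetical, and the property names are copied from the table as written.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

IS_MEMBER_OF = "cmns-col;isMemberOf"
DENOTES = "cmns-dsg;denotes"
IS_SIGNIFIED_BY = "cmns-dsg;isSignifiedBy"


def missing_required_relationships(element: str, triples: Iterable[Triple]) -> List[str]:
    """List the required Table 2 relationships that are absent for a code element."""
    triples = list(triples)
    missing = []
    if not any(s == element and p == IS_MEMBER_OF for s, p, _ in triples):
        missing.append(IS_MEMBER_OF)
    # "denotes" may be stated directly or through its inverse on the denoted thing.
    direct = any(s == element and p == DENOTES for s, p, _ in triples)
    inverse = any(o == element and p == IS_SIGNIFIED_BY for _, p, o in triples)
    if not (direct or inverse):
        missing.append(DENOTES)
    return missing


# Hypothetical individuals; only the code ACTIM is taken from the text above.
graph = [
    ("ex:ACTIM", IS_MEMBER_OF, "ex:IngredientRoleVocabulary"),
    ("ex:SomeIngredientRole", IS_SIGNIFIED_BY, "ex:ACTIM"),
    ("ex:OtherCode", IS_MEMBER_OF, "ex:IngredientRoleVocabulary"),
]
print(missing_required_relationships("ex:ACTIM", graph))
print(missing_required_relationships("ex:OtherCode", graph))
```

A production check would more likely be written as a SHACL shape or SPARQL query against the ontology; the tuple form is used here only to keep the example self-contained.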
These elements are in addition to the required metadata for any content as specified in Pattern: Metadata and Annotations - APPROVED. Note that in cases where the set of code elements is complete, we also add an equivalent class enumerating the individuals that have been defined for the classifier. For ingredient role, above, for example, we have a complete list. This is not the case for all controlled vocabularies included in the IDMP specifications, particularly for those that have examples but not a specified list of valid values. In those cases, because we do not know whether other values might be relevant, it would not be appropriate to add an equivalence relationship.
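The difference between a complete list (where an equivalence to the enumerated individuals is added) and an open list (where it is not) can be illustrated with a three-valued membership test. This loosely mirrors the modeling decision rather than OWL semantics in full; the `Classifier` type, the `complete` flag, and all member names other than ACTIM are placeholders invented for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Classifier:
    name: str
    known_members: FrozenSet[str]
    complete: bool  # True only when the standard specifies the full list of values

    def admits(self, individual: str) -> Optional[bool]:
        """True if listed; False if unlisted and the list is complete; None if unknown."""
        if individual in self.known_members:
            return True
        return False if self.complete else None


# Complete list: behaves like a class equivalent to its enumerated individuals.
ingredient_role = Classifier("IngredientRole", frozenset({"ACTIM", "PLACEHOLDER-ROLE"}), True)

# Open list: only examples are known, so no equivalence is asserted.
example_only = Classifier("ExampleOnlyVocabulary", frozenset({"PLACEHOLDER-A"}), False)

print(ingredient_role.admits("NOT-LISTED"))  # ruled out by the complete list
print(example_only.admits("NOT-LISTED"))     # cannot be decided
```

Returning `None` for the open case reflects the point made above: other values might be relevant, so absence from the list proves nothing.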
For the example given with respect to ingredient roles, there are three kinds of things needed in the ontology: (1) an individual for the controlled vocabulary, (2) the classifier, which is used to type all individuals, and (3) the individuals themselves. We may also need a property linking something to the code, and in some cases, a separate concept for the element that has a code from the concept descriptor for the code element. Typically, to relate a code to the thing that it denotes, we use the Commons cmns-dsg;isSignifiedBy property, although in a few cases where the element is used throughout the ontology, such as in this conformance case, it may be useful to have a custom property.
The definition for the vocabulary is included below.
The definition of the classifier, in this case for ingredient role codes, is as follows. Note that in this case we have included an equivalence to the individuals that are members of the class, as the set is complete per the IDMP standard.
The definition of each individual that is a member of the vocabulary follows the same form, shown below for one of the members.
In some cases we may also define a separate property that can be used to reference these individuals. In the case of the controlled vocabulary for conformance level, we have created an annotation property (i.e., metadata about something) rather than an object or data property. The typical usage would be an object property, however, using cmns-dsg;denotes to point from a code to the concept it refers to, rather than the other way around, and using SPARQL inverse queries or reasoning to get all of the codes for a given concept.
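The idea of storing "denotes" in one direction and recovering all codes for a concept by inversion can be sketched as a simple index. The function name and the `ex:` identifiers are hypothetical placeholders; in a triple store the same effect would come from an inverse property path in SPARQL or from a reasoner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def codes_by_concept(denotes_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (code, concept) 'denotes' statements into concept -> set of codes."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for code, concept in denotes_pairs:
        index[concept].add(code)
    return dict(index)


# Each pair reads: <code> cmns-dsg;denotes <concept>. All names are placeholders.
statements = [
    ("ex:CodeListA-001", "ex:ConceptX"),
    ("ex:CodeListB-17", "ex:ConceptX"),
    ("ex:CodeListA-002", "ex:ConceptY"),
]
index = codes_by_concept(statements)
print(sorted(index["ex:ConceptX"]))
```

Keeping only the code-to-concept direction means a new code list can be added without touching the concept's own definition.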
Representation of External Controlled Vocabularies and Code Lists
In cases where a given controlled vocabulary and/or code list is external to the ISO IDMP standards, we reuse and extend the pattern described above. The primary extensions are related to the individual code elements, and other extensions may be required depending on the source.
Representing an External Controlled Vocabulary
The pattern for representing the external vocabulary is identical to representing an internal vocabulary, although certain elements, such as the code system name, are required per the ISO IDMP standards. The extended representation of a "code system" as defined in ISO 11240, clause 4.6.2.3 and Table 3, including its mapping to the IDMP ontology, is given in Table 3, below. This definition of code system is a subclass of classification scheme, code set, and controlled vocabulary, and inherits all of the properties and their values defined in Table 1, above. In other words, the properties given below are in addition to those given in Table 1, and include a refined required restriction with respect to the version.
Table 3. External Controlled Vocabulary / Code System Detail
| Class | Relationship / Attribute | Value | URI | Required/Optional |
|---|---|---|---|---|
| Code System | | | idmp-dtp;CodeSystem | required |
| | is identified by | Identifier | cmns-id;isIdentifiedBy, cmns-id;Identifier | required |
| | has name | Code System Name | cmns-dsg;hasName, idmp-dtp;CodeSystemName | optional |
| | has name | Code System Full Name | cmns-dsg;hasName, idmp-dtp;CodeSystemFullName | optional |
| | is described by | something | cmns-dsg;isDescribedBy | optional (individual containing a formal description with metadata as an individual) |
| | has description | Text | cmns-dsg;hasDescription (xsd;string or rdf;langString) | optional (used if only a string value or language-tagged string value is needed) |
| | has version | string | idmp-dtp;hasVersion | optional |
| | copyright | string | cmns-av;copyright | optional but recommended for external vocabularies |
| | issued | date time | dct;issued | optional, indicating the date, in ISO 8601 date time format, that the resource was originally issued |
| | license | anyURI | dct;license | optional |
| | modified | date time | dct;modified | optional, indicating the date, in ISO 8601 date time format, that the resource was most recently revised |
| | rights | string | dct;rights | optional (used to provide additional access rights information) |
Note that not all controlled vocabularies have such an identifier. Some examples that do include the datatype definitions given in ISO 21090 for things like "null flavor", which has an official OID. Some vocabularies, such as those in EMA SPOR, have URIs that identify them. In each case, a named individual may be needed to represent the identifier for the vocabulary in order to properly implement the pattern. If that identifier is an IRI, then the actual IRI for that vocabulary should be used to name the individual. There may be some challenges in doing this for EMA SPOR data because their URIs are malformed, which we hope will be addressed in the near future.
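Before an identifier is used to name an individual, it can be useful to check what kind of identifier it is. The following is a generic sketch only: the text above does not say in what way the SPOR URIs are malformed, so the whitespace and scheme/host checks are assumptions, as are the function name, the simplified OID pattern, and every sample value.

```python
import re
from urllib.parse import urlsplit

# Simplified dotted-decimal OID pattern: first arc 0-2, then one or more numeric arcs.
OID_PATTERN = re.compile(r"^[0-2](\.(0|[1-9]\d*))+$")


def identifier_kind(value: str) -> str:
    """Classify an identifier string as 'oid', 'iri', or 'malformed' (heuristic)."""
    if OID_PATTERN.match(value):
        return "oid"
    if not value or any(ch.isspace() for ch in value):
        return "malformed"
    try:
        parts = urlsplit(value)
    except ValueError:
        return "malformed"
    if parts.scheme in ("http", "https") and parts.netloc:
        return "iri"
    return "malformed"


# Placeholder values, not real vocabulary identifiers.
for sample in ["1.2.3.4", "https://example.org/lists/123", "https://example.org/lists/ 123"]:
    print(sample, "->", identifier_kind(sample))
```

An identifier classified as "iri" could be used directly as the individual's name; anything else would need a minted name following the local naming policy.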
Also, note that ISO 11240 requires a Code System Name, Code System Full Name, and a description in addition to an identifier for code systems for the representation of units of measure. Other details are optional.
For EMA SPOR repositories, for example, there are multiple layers: at the top level, the EMA SPOR (substance, product, organisational and referential) data services repository, and within SPOR:
- Substance Management Services (SMS)
- Product Management Services (PMS)
- Organisation Management Services (OMS)
- Referentials Management Services (RMS)
Each of these individual repositories includes numerous individual controlled vocabularies. For example, the SPOR Referentials registry is defined as follows.
The named individual for the corresponding services is defined as given below.
The overall classification scheme for the set of controlled vocabularies included in the SPOR Referentials registry is:
Further, and as suggested by the generic classification scheme for SPOR RMS given above, there are many controlled vocabularies, or term lists. Each of these lists must be represented as both a controlled vocabulary that is part of SPOR and as a vocabulary term mapped to the concept in the core IDMP-O ontology for which it provides valid values, including an identifier for the controlled vocabulary/term list and possibly other details such as a name. This is similar to, but not exactly the same as, the notion of a coded concept in ISO 11239.
These individual controlled vocabularies, or lists as they are called in SPOR RMS, are named individuals of the following class:
One example is that of data classification. Data Classification is a SPOR concept that is used as a "meta concept" and set of terms to describe accessibility levels for other vocabulary elements in SPOR. We have explicitly encoded this and two other concepts, Domain, and Record Status, in order to facilitate automatic encoding of other SPOR vocabularies. The individual representing the data classification controlled vocabulary is as follows:
Representing External Controlled Vocabulary Terms and Codes
Representation of the terms in a controlled vocabulary may be informal or require increasing levels of detail, depending on the vocabulary or system and how it is used. At a minimum, the elements defined as required or optional in Table 2, above, apply to external vocabulary or code elements.
For example, terms in the SPOR RMS individual controlled vocabularies have additional attributes that are in some cases required and in others optional, as shown below.
One such term, corresponding to the controlled vocabulary for data classification, is given below.
For more details with respect to how to represent the content of EMA SPOR controlled vocabularies, see Pattern: Representing Controlled Vocabularies for EMA SPOR Referentials - REVIEW.
Coded Concepts and Code Term Pairs
In addition to the details shown above, some systems refer to the concept to which a vocabulary element or code applies as a "coded concept". The term "coded concept" appears in ISO 11240 with respect to units of measure. The implementation guide for ISO 11615 (ISO/TS 20443) uses the word "coded" in reference to the code rather than the concept it reflects (as a CV datatype, which does not exist in ISO 21090; it should be CD, which corresponds to a concept descriptor per the above and appears to refer to the code, not the concept). An additional datatype that extends the notion of a code in ISO 21090 is CD.CV, which refers specifically to a date / time element, not to a concept descriptor (code or controlled vocabulary element) in general, and is not used in the ISO IDMP standards.
Another use of the term "coded concept" appears in ISO 11239, and is used in cases where there may be pairs of codes and terms, typically used for mapping and alignment purposes. This is a special case, though it occurs regularly in EDQM controlled vocabularies.
A high-level example of a concept that may have multiple terms / codes related to it from various systems is illustrated in Figure 3, below. "basic dose form" is a class in IDMP-O and the elements from the 3 different code lists instantiate the class.
Figure 3. Example Coded Concept for Basic Dose Forms
In the diagram, the notion of a basic dose form is associated with several controlled vocabularies: from EDQM, from EMA SPOR, and possibly others. Each of the lists shown in the diagram represents a controlled vocabulary / code system, and each of the elements in a given list represents a code that is used to signify the value of the 'basic dose form' in some context.
Individual classes, such as basic dose form, may be extended with classes that include something from the controlled vocabulary, such as 'SPOR RMS basic dose form' in a SPOR-specific / jurisdiction-specific ontology, if additional features are needed to support that concept's definition. Each individual representing a term corresponding to the class in the IDMP ontology should be named <concept>-<individual name>, or <context>-<concept>-<individual name>, including the hyphen separator, unless, as in the case of the SPOR individuals, there is a unique IRI for that individual. If there is not, the naming pattern <context>-<concept>-<individual name> uses <context> to provide the name of the vocabulary (e.g., SPOR-RMS), <concept> to provide the class name (e.g., BasicDoseForm), and <individual name> to provide the term (e.g., Tablet), resulting in an individual named 'SPOR-RMS-BasicDoseForm-Tablet'. Qualifying the name to include the source vocabulary ensures uniqueness. Additional policies for naming and labeling are provided in Pattern: Metadata and Annotations - APPROVED. Note that an rdfs:label, which is human-readable, is required for every term. Labels in SPOR are language tagged, and may be associated with additional status and translation information, and thus are represented as individuals rather than as string literals, but otherwise must be unique. Additional labels may be used as appropriate, such as skos:prefLabel and skos:altLabel, but there must be at least one rdfs:label and only one per language.
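The naming pattern and the label rule described above can be expressed as two small helpers. This is a sketch under assumptions: the function names and signatures are hypothetical, labels are modeled as plain `(text, language tag)` pairs rather than as individuals, and the German label is an illustrative translation only.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def individual_name(concept: str, individual: str,
                    context: Optional[str] = None,
                    iri: Optional[str] = None) -> str:
    """Build <context>-<concept>-<individual name>, unless a unique IRI already exists."""
    if iri:
        return iri
    parts = [context, concept, individual] if context else [concept, individual]
    return "-".join(parts)


def label_problems(labels: Iterable[Tuple[str, str]]) -> List[str]:
    """Check rdfs:label pairs: at least one label, and only one per language."""
    labels = list(labels)
    if not labels:
        return ["at least one rdfs:label is required"]
    per_language = Counter(lang for _, lang in labels)
    return [
        f"more than one rdfs:label for language '{lang}'"
        for lang, count in sorted(per_language.items())
        if count > 1
    ]


print(individual_name("BasicDoseForm", "Tablet", context="SPOR-RMS"))
print(label_problems([("Tablet", "en"), ("Tablette", "de")]))
```

When the source vocabulary already provides a unique IRI, passing it as `iri` bypasses the constructed name, matching the exception noted for SPOR individuals.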
Further extensions may be required for vocabularies such as EDQM. This particular case is described in more detail at Pattern: Representing Controlled Vocabularies Specified by the European Directorate for the Quality of Medicines & Healthcare (EDQM) - DRAFT.