Candidate Hygiene Policies
This page starts from a list that Mike Uschold provided. The idea is to discuss these items and move the ones we like into an accompanying page of Confirmed Hygiene Policies.
The following is a list of things that are, or are likely to be, NOT OK.
P1. Creating polysemous elements
An ontology element whose name has different meanings is included in the ontology to represent more than one conceptual idea. For example, the class “Theatre” is used to represent both the artistic discipline and the place in which a play is performed.
The FIBO policy for treating polysemes is to create separate concepts for each polyseme and differentiate them via (1) different names (avoiding numeric extensions on the same name in favor of longer or more accurate names), and (2) different definitions, including distinguishing restrictions, to the degree possible.
The only case where names may be duplicated (i.e., where the name used for a concept in one namespace is used for a different concept in another namespace) is for extension purposes: a concept in a primary FIBO ontology may be extended in a namespace that imports it, either in the original namespace or in a subordinate namespace as appropriate. Otherwise, FIBO users should assume that even though OWL does not impose a unique names assumption, FIBO policy does.
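A duplicate-name report could start from something like the minimal sketch below. It assumes entity IRIs are available as prefixed names of the form `prefix:LocalName` (the prefixes used here are hypothetical placeholders, not real FIBO namespaces) and lists every local name that occurs under more than one prefix. Deciding whether a hit is a legitimate extension is left to reviewers.

```python
from collections import defaultdict


def duplicate_local_names(entities):
    """Map each local name used under more than one prefix to the sorted list of those prefixes.

    Assumption: every entity is a prefixed name such as "onto-a:Party".
    """
    prefixes_by_name = defaultdict(set)
    for curie in entities:
        prefix, _, local = curie.partition(":")
        prefixes_by_name[local].add(prefix)
    return {
        name: sorted(prefixes)
        for name, prefixes in prefixes_by_name.items()
        if len(prefixes) > 1
    }


# Hypothetical entity list: "Party" is defined in two namespaces.
print(duplicate_local_names(["onto-a:Party", "onto-b:Party", "onto-a:Contract"]))
# {'Party': ['onto-a', 'onto-b']}
```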
P2. Creating synonyms as classes
The FIBO policy for treating synonyms is to create a single concept (class) in the relevant FIBO ontology for that concept, and when required, add an annotation (fibo-fnd-utl-av:synonym) to augment the definition for that concept with an additional synonymous term.
Often, modelers who are not familiar with the FIBO policy will create several classes whose identifiers are synonyms and define them as equivalent. For example, we could define “Car”, “Motorcar” and “Automobile” as equivalent classes, or define the classes “Waterfall” and “Cascade” as equivalent. This is considered bad practice from a FIBO perspective, given the goal of developing a definitive ontology that can be used for RDARR compliance, among other use cases.
The serializer should flag every equivalence between two named classes for review by the leadership team; only those that the review finds do not violate this policy should be allowed.
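The review step could be fed by a filter along these lines. This is a sketch only: it assumes the ontology is available as `(subject, predicate, object)` tuples of prefixed names, with blank nodes (anonymous class expressions such as restrictions) written as `_:id`. It is not the serializer's actual implementation.

```python
def named_class_equivalences(triples):
    """Return (subject, object) pairs of owl:equivalentClass axioms between two named classes.

    Assumption: blank nodes are written "_:id", so equivalences to anonymous
    class expressions (e.g. defined classes) are not reported.
    """
    return [
        (s, o)
        for s, p, o in triples
        if p == "owl:equivalentClass" and not s.startswith("_:") and not o.startswith("_:")
    ]


triples = [
    ("ex:Car", "owl:equivalentClass", "ex:Automobile"),  # synonym classes: flag for review
    ("ex:Actor", "owl:equivalentClass", "_:b0"),         # class defined by an expression: not flagged
]
print(named_class_equivalences(triples))  # [('ex:Car', 'ex:Automobile')]
```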
P3. Creating the relationship “is” instead of using ''rdfs:subClassOf'', ''rdf:type'' or ''owl:sameAs''
The “is” relationship is created in the ontology instead of using the OWL primitives for representing the subclass relationship (“subclassOf”), membership in a class (“instanceOf”), or equality between instances (“sameAs”). An example of this type of pitfall is to define the class “Actor” as ‘Actor ≡ Person ∩ ∃interprets.Actuation ∩ ∃is.Man’.
P4. Creating unconnected ontology elements
Ontology elements (classes, relationships or attributes) are created with no relation to the rest of the ontology. An example of this type of pitfall is to create the relationship “memberOfTeam” and to miss the class representing teams; thus, the relationship created is isolated in the ontology.
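One rough way to surface isolated elements is sketched below. It assumes a hypothetical in-memory representation of the ontology as `(subject, predicate, object)` tuples of prefixed names. The check reports declared classes and properties that appear in no triple other than their own declaration and their annotations. It is a heuristic: an element that is connected only to the wrong things will not be caught.

```python
# Assumption: triples are (subject, predicate, object) tuples of prefixed names;
# the declaration and annotation sets below are illustrative, not exhaustive.
DECLARATION_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}
ANNOTATION_PREDICATES = {"rdfs:label", "rdfs:comment", "skos:definition"}


def unconnected_elements(triples):
    """Return declared classes/properties that appear in no logical (non-annotation) triple."""
    declared = {s for s, p, o in triples if p == "rdf:type" and o in DECLARATION_TYPES}
    connected = set()
    for s, p, o in triples:
        if p in ANNOTATION_PREDICATES:
            continue  # annotations alone do not connect an element to the model
        if p == "rdf:type" and o in DECLARATION_TYPES:
            continue  # the declaration itself does not count as a connection
        connected.update((s, p, o))
    return sorted(declared - connected)


triples = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:Student", "rdf:type", "owl:Class"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:memberOfTeam", "rdf:type", "owl:ObjectProperty"),
    ("ex:memberOfTeam", "rdfs:label", "member of team"),  # annotated, but otherwise isolated
]
print(unconnected_elements(triples))  # ['ex:memberOfTeam']
```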
P5. Defining wrong inverse relationships
Two relationships are defined as inverses of each other when they are not necessarily inverses. For example, something can be sold or bought somewhere, but the relationships “isSoldIn” and “isBoughtIn” are not inverses of each other.
P6. Including cycles in the hierarchy *
A cycle between two classes in the hierarchy is included in the ontology, although it is not intended to have such classes as equivalent. That is, some class A has a subclass B and at the same time B is a superclass of A. An example of this type of pitfall is represented by the class “Professor” as subclass of “Person”, and the class “Person” as subclass of “Professor”.
FIBO policy strictly forbids cycles among ontologies. Cycles are not only considered poor modeling practice, but can make mapping an ontology to a data model or other artifact impossible. The FIBO ontologies are designed at a conceptual level, but are intended to support meeting Basel III RDARR and other data governance regulations, and therefore must be designed to support mapping to various application and repository standards. Cycles would prohibit such mappings.
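Because cycles are strictly forbidden, this check is a good candidate for automation. The sketch below runs a depth-first search over `rdfs:subClassOf` edges. It assumes the ontology is held as `(subject, predicate, object)` tuples of prefixed names, a simplified stand-in rather than any particular RDF library. Passing a different predicate, for example `predicate="owl:imports"`, applies the same search to cycles among ontologies.

```python
from collections import defaultdict


def find_cycle(triples, predicate="rdfs:subClassOf"):
    """Return one cycle along `predicate` as a list of nodes (first node repeated at the end), or None.

    Assumption: triples are (subject, predicate, object) tuples of prefixed names.
    """
    edges = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            edges[s].append(o)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = defaultdict(int)      # every node starts WHITE

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for target in edges[node]:
            if color[target] == GREY:  # back edge: target is already on the current path
                return path[path.index(target):] + [target]
            if color[target] == WHITE:
                cycle = visit(target, path)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(edges):
        if color[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Professor"),
]
print(find_cycle(triples))  # ['ex:Professor', 'ex:Person', 'ex:Professor']
```

The search is recursive, so very deep hierarchies may need an iterative variant to stay within Python's recursion limit.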
P7. Merging different concepts in the same class
A class is created whose identifier is referring to two or more different concepts. An example of this type of pitfall is to create the class “StyleAndPeriod”, or “ProductOrService”.
P8. Missing annotations *
Ontology terms lack annotation properties. Such properties improve the understandability and usability of the ontology from a user's point of view.
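A minimal annotation report might look like the sketch below. It assumes `(subject, predicate, object)` tuples of prefixed names. It also assumes, as a placeholder for whatever the confirmed policy ends up requiring, that every declared entity needs at least an `rdfs:label` and a `skos:definition`.

```python
from collections import defaultdict

# Assumptions: triples are (subject, predicate, object) tuples of prefixed names,
# and the required annotations below are a placeholder minimum, not confirmed policy.
DECLARATION_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty", "owl:NamedIndividual"}
REQUIRED_ANNOTATIONS = ("rdfs:label", "skos:definition")


def missing_annotations(triples):
    """Map each declared entity to the required annotation properties it lacks."""
    entities = {s for s, p, o in triples if p == "rdf:type" and o in DECLARATION_TYPES}
    present = defaultdict(set)
    for s, p, o in triples:
        if p in REQUIRED_ANNOTATIONS:
            present[s].add(p)
    report = {}
    for entity in sorted(entities):
        missing = [a for a in REQUIRED_ANNOTATIONS if a not in present[entity]]
        if missing:
            report[entity] = missing
    return report


triples = [
    ("ex:Party", "rdf:type", "owl:Class"),
    ("ex:Party", "rdfs:label", "party"),
    ("ex:Party", "skos:definition", "a person or organization that ..."),
    ("ex:Contract", "rdf:type", "owl:Class"),
    ("ex:Contract", "rdfs:label", "contract"),
]
print(missing_annotations(triples))  # {'ex:Contract': ['skos:definition']}
```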
P9. Missing basic information: needed information is not included in the ontology
Sometimes this pitfall is related to requirements in the ontology requirements specification document (ORSD) that are not covered by the ontology. Other times it is related to knowledge that could be added to the ontology to make it more complete. An example of this type of pitfall is to create the relationship “startsIn” to represent that routes have a starting point in a particular location, and to miss the relationship “endsIn” to show that a route has an end point. Another example is to create the relationship “follows” when modelling order relations and not to create its inverse relationship “precedes”.
P10. Missing disjointness
The ontology lacks disjointness axioms between classes or between properties that should be defined as disjoint. For example, we can create the classes “Odd” and “Even” (or the classes “Prime” and “Composite”) without declaring them disjoint; such a representation is not correct, given the definitions of these types of numbers.
P11. Missing domain or range in properties
Relationships and/or attributes that lack a domain, a range, or both are included in the ontology. There are situations in which the relation is very general and the range should be the most general concept, “Thing”. In other cases, however, the relations are more specific and it is good practice to specify their domain and/or range. An example of this type of pitfall is to create the relationship “hasWritten” in an ontology about art without a domain or range, when its domain should be “Writer” and its range “LiteraryWork”.
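Since an unbounded domain or range is sometimes intentional, a report like the sketch below is best treated as informative. It assumes the ontology is held as `(subject, predicate, object)` tuples of prefixed names and lists, for each declared property, which of `rdfs:domain` and `rdfs:range` is absent.

```python
from collections import defaultdict

# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
PROPERTY_TYPES = {"owl:ObjectProperty", "owl:DatatypeProperty"}


def missing_domain_or_range(triples):
    """Map each declared property to whichever of rdfs:domain / rdfs:range it lacks."""
    properties = {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    declared_axes = defaultdict(set)
    for s, p, o in triples:
        if p in ("rdfs:domain", "rdfs:range"):
            declared_axes[s].add(p)
    report = {}
    for prop in sorted(properties):
        missing = [axis for axis in ("rdfs:domain", "rdfs:range") if axis not in declared_axes[prop]]
        if missing:
            report[prop] = missing
    return report


triples = [
    ("ex:hasWritten", "rdf:type", "owl:ObjectProperty"),
    ("ex:hasWritten", "rdfs:range", "ex:LiteraryWork"),  # domain "Writer" was forgotten
]
print(missing_domain_or_range(triples))  # {'ex:hasWritten': ['rdfs:domain']}
```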
P12. Missing equivalent properties
When an ontology is imported into another, classes that are duplicated in both ontologies are normally defined as equivalent classes. However, the ontology developer misses the definition of equivalent properties in those cases of duplicated relationships and attributes. For example, the classes “CITY” and “City” in two different ontologies are defined as equivalent classes; however, relationships “hasMember” and “has-Member” in two different ontologies are not defined as equivalent relations.
P13. Missing inverse relationships
This pitfall appears when a relationship (other than a symmetric one) does not have an inverse relationship defined within the ontology. For example, the ontology developer omits the inverse definition between the relations “hasLanguageCode” and “isCodeOf”, or between “hasReferee” and “isRefereeOf”.
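A first-cut detector could look like the sketch below. It assumes `(subject, predicate, object)` tuples of prefixed names and reports object properties that are not declared symmetric and do not appear on either side of any `owl:inverseOf` axiom.

```python
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
def missing_inverses(triples):
    """Return non-symmetric object properties that take part in no owl:inverseOf axiom."""
    object_props = {s for s, p, o in triples if p == "rdf:type" and o == "owl:ObjectProperty"}
    symmetric = {s for s, p, o in triples if p == "rdf:type" and o == "owl:SymmetricProperty"}
    paired = set()
    for s, p, o in triples:
        if p == "owl:inverseOf":
            paired.update((s, o))  # both ends of the axiom now have an inverse
    return sorted(object_props - symmetric - paired)


triples = [
    ("ex:hasReferee", "rdf:type", "owl:ObjectProperty"),
    ("ex:isRefereeOf", "rdf:type", "owl:ObjectProperty"),
    ("ex:isRefereeOf", "owl:inverseOf", "ex:hasReferee"),
    ("ex:hasLanguageCode", "rdf:type", "owl:ObjectProperty"),
    ("ex:isCodeOf", "rdf:type", "owl:ObjectProperty"),  # inverse axiom omitted
]
print(missing_inverses(triples))  # ['ex:hasLanguageCode', 'ex:isCodeOf']
```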
P14. Misusing ''owl:allValuesFrom''
This pitfall can appear in two different ways. In the first, the anomaly is to use the universal restriction (“allValuesFrom”) as the default qualifier instead of using the existential restriction (“someValuesFrom”). This means that the developer thinks that “allValuesFrom” implies “someValuesFrom”. In the second, the mistake is to include “allValuesFrom” to close off the possibility of further additions for a given property. An example of this type of pitfall is to define the class “Book” in the following way ‘Book ≡ ∃producedBy.Writer ∩ ∀uses.Paper’ and closing the possibility of adding “Ink” as an element used in the writing.
P15. Misusing “not some” and “some not”
To mistake the representation of “some not” for “not some”, or the other way round. An example of this type of pitfall is to define a vegetarian pizza as any pizza which both has some topping which is not meat and also has some topping which is not fish.
P16. Misusing primitive and defined classes
To fail to make the definition ‘complete’ rather than ‘partial’ (or ‘necessary and sufficient’ rather than just ‘necessary’). It is critical to understand that, in general, the classifier will infer nothing to be subsumed under a primitive class. This pitfall implies that the developer does not understand the open world assumption.
P17. Specializing a hierarchy too much
The hierarchy in the ontology is specialized in such a way that the final leaves cannot have instances, because they are actually instances and should have been created in this way instead of being created as classes. An example of this type of pitfall is to create the class “RatingOfRestaurants” and the classes “1fork”, “2forks”, and so on, as subclasses instead of as instances. Another example is to create the classes “Madrid”, “Barcelona”, “Sevilla”, and so on as subclasses of “Place”. This pitfall could be also named “Individuals are not Classes”.
P18. Specifying the domain or the range too narrowly
Failing to find a domain or a range that is general enough. An example of this type of pitfall is to restrict the domain of the relationship “isOfficialLanguage” to the class “City”, instead of also allowing the class “Country” to have an official language, or using a more general concept such as “GeopoliticalObject”.
P19. Swapping intersection and union
The ranges and/or domains of the properties (relationships and attributes) are defined by intersecting several classes in cases in which the ranges and/or domains should be the union of such classes. An example of this type of pitfall is to create the relationship “takesPlaceIn” with domain “OlympicGames” and with range the intersection of the classes “City” and “Nation”. Another example is to create the attribute “Name” for the classes “City” and “Drink” and to define its domain as the intersection of both classes.
P20. Misusing ontology annotations
The contents of some annotation properties are swapped or misused. An example of this type of pitfall is to include in the Label annotation of the class “Crossroads” the sentence ‘the place of intersection of two or more roads’, and to include in the Comment annotation the word ‘Crossroads’.
P21. Using a miscellaneous class
To create in a hierarchy a class that contains the instances that do not belong to the sibling classes instead of classifying such instances as instances of the class in the upper level of the hierarchy. This class is normally named “Other” or “Miscellaneous”. An example of this type of pitfall is to create the class “HydrographicalResource”, and the subclasses “Stream”, “Waterfall”, etc., and also the subclass “OtherRiverElement”.
P22. Using different naming criteria in the ontology
Ontology elements are not named using the same convention throughout the ontology. It is considered good practice for the rules and style of lexical encoding used to name the different ontology elements to be homogeneous within the ontology. One possible rule is that concept names start with capital letters and property names start with lowercase letters. For style, there are different options such as camel case, hyphen style, underscore style, and combinations of these. For example, this pitfall appears when a class is named starting with an upper case letter, e.g. “Ingredient”, and its subclasses start with a lower case letter, e.g. “flour”, “milk”, etc.
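Assuming the UpperCamelCase-for-classes, lowerCamelCase-for-properties convention, a regular-expression check such as the sketch below catches the obvious violations. It also assumes `(subject, predicate, object)` tuples of prefixed names. The patterns only test the first letter and the absence of separators, so they are an approximation: they cannot tell whether word boundaries inside a name are capitalized correctly.

```python
import re

# Assumptions: triples are (subject, predicate, object) tuples of prefixed names,
# and the convention is UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")
PROPERTY_TYPES = {"owl:ObjectProperty", "owl:DatatypeProperty"}


def naming_violations(triples):
    """Return (entity, message) pairs for declarations that break the naming convention."""
    violations = []
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        local = s.partition(":")[2]
        if o == "owl:Class" and not UPPER_CAMEL.fullmatch(local):
            violations.append((s, "class names should be UpperCamelCase"))
        elif o in PROPERTY_TYPES and not LOWER_CAMEL.fullmatch(local):
            violations.append((s, "property names should be lowerCamelCase"))
    return violations


triples = [
    ("ex:Ingredient", "rdf:type", "owl:Class"),
    ("ex:flour", "rdf:type", "owl:Class"),               # lower-case class name
    ("ex:hasMember", "rdf:type", "owl:ObjectProperty"),
    ("ex:has-Member", "rdf:type", "owl:ObjectProperty"),  # hyphen style mixed in
]
for entity, message in naming_violations(triples):
    print(entity, "->", message)
```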
P23. Using ontology elements incorrectly
An ontology element (class, relationship or attribute) is used to model a part of the ontology that should be modelled with a different element. An example of this type of pitfall is to create the relationship “isEcological” between an instance of “Car” and the instance “Yes” or “No”, instead of creating the attribute “isEcological” whose range is Boolean.
P24. Using recursive definition
An ontology element is used in its own definition. For example, the relationship “hasFork” is created with its range defined as ‘the set of restaurants that have at least one value for the relationship “hasFork”’.
P25. Defining a relationship inverse to itself *
A relationship is defined as inverse of itself. In this case, this property could have been defined as “owl:SymmetricProperty” instead. An example of this type of pitfall is to create the relationship “hasBorderWith” and to state that “hasBorderWith” is its inverse relationship.
P26. Defining inverse relationships for a symmetric one *
A relationship is defined as “owl:SymmetricProperty” and there is also a relationship (either itself or another relationship) defined as its inverse. For example, the symmetric relationship “farFrom” has an inverse relationship defined, e.g. itself, “farFrom”.
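P25 and P26 can both be detected from `owl:inverseOf` axioms alone. The sketch below assumes `(subject, predicate, object)` tuples of prefixed names and tags each finding with the pitfall it matches. A self-inverse property that is also declared symmetric, like “farFrom”, is reported under both.

```python
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
def inverse_and_symmetry_issues(triples):
    """Return (property, pitfall) pairs for P25 (inverse of itself) and P26 (symmetric with an inverse)."""
    symmetric = {s for s, p, o in triples if p == "rdf:type" and o == "owl:SymmetricProperty"}
    issues = []
    for s, p, o in triples:
        if p != "owl:inverseOf":
            continue
        if s == o:
            issues.append((s, "P25"))  # could have been declared owl:SymmetricProperty instead
        if s in symmetric or o in symmetric:
            issues.append((s, "P26"))  # a symmetric property needs no separate inverse
    return issues


triples = [
    ("ex:hasBorderWith", "owl:inverseOf", "ex:hasBorderWith"),
    ("ex:farFrom", "rdf:type", "owl:SymmetricProperty"),
    ("ex:farFrom", "owl:inverseOf", "ex:farFrom"),
]
print(inverse_and_symmetry_issues(triples))
# [('ex:hasBorderWith', 'P25'), ('ex:farFrom', 'P25'), ('ex:farFrom', 'P26')]
```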
P27. Defining wrong equivalent relationships
Two relationships are defined as equivalent when they are not necessarily equivalent. For example, we can mix up common relationships that could hold between several types of entities, such as "hasPart" defined between human body parts and the same relationship relating research plans as parts of research projects.
P28. Defining wrong symmetric relationships
The domain defined for a symmetric relationship is different from its range. This could happen because the relationship is not actually symmetric, for example the relation "pastProject" between the concepts "Agent" and "Project". It can also happen because the domain and range are too specific, for example defining the symmetric relationship "hasSpouse" between the concepts "Man" and "Woman" instead of using the concept "Person" as both the domain and the range.
P29. Defining wrong transitive relationships
The domain defined for a transitive relationship is different from its range. An example of this type of error is to create the relationship "participatesIn", whose domain is the union of the concepts "Team" and "Individual" and whose range is the concept "Event", and to define the relationship as transitive.
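P28 and P29 share one test: a symmetric or transitive property whose declared domain differs from its declared range. The sketch below assumes `(subject, predicate, object)` tuples of prefixed names, with the union class expression represented by a hypothetical blank-node placeholder. It compares domain and range syntactically, so two differently named classes that happen to be equivalent would still be flagged.

```python
from collections import defaultdict

# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
PITFALL_BY_TYPE = {"owl:SymmetricProperty": "P28", "owl:TransitiveProperty": "P29"}


def domain_range_mismatches(triples):
    """Flag symmetric/transitive properties whose declared domain and range differ."""
    domains, ranges = defaultdict(set), defaultdict(set)
    pitfalls = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:domain":
            domains[s].add(o)
        elif p == "rdfs:range":
            ranges[s].add(o)
        elif p == "rdf:type" and o in PITFALL_BY_TYPE:
            pitfalls[s].add(PITFALL_BY_TYPE[o])
    issues = []
    for prop in sorted(pitfalls):
        if domains[prop] != ranges[prop]:  # purely syntactic comparison
            issues.extend((prop, code) for code in sorted(pitfalls[prop]))
    return issues


triples = [
    ("ex:hasSpouse", "rdf:type", "owl:SymmetricProperty"),
    ("ex:hasSpouse", "rdfs:domain", "ex:Man"),
    ("ex:hasSpouse", "rdfs:range", "ex:Woman"),
    ("ex:participatesIn", "rdf:type", "owl:TransitiveProperty"),
    ("ex:participatesIn", "rdfs:domain", "_:TeamOrIndividual"),  # placeholder for the union
    ("ex:participatesIn", "rdfs:range", "ex:Event"),
]
print(domain_range_mismatches(triples))
# [('ex:hasSpouse', 'P28'), ('ex:participatesIn', 'P29')]
```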
P30. Missing equivalent classes
When an ontology is imported into another, classes with the same conceptual meaning that are duplicated in both ontologies should be defined as equivalent classes, to improve interoperability between the two ontologies. An example of this pitfall is not stating explicitly that 'Trainer' (a class in the imported ontology) and 'Coach' (a class in the sports ontology being developed) are equivalent.
P31. Defining wrong equivalent classes
Two classes are defined as equivalent when they are not necessarily equivalent. For example, defining “Car” as equivalent to “Vehicle”, even though not every vehicle is a car.
P32. Several classes with the same label *
Two or more classes have the same content in their rdfs:label annotation. In some cases they could be defined as equivalent classes (e.g., if they are defined in different namespaces), or they could be replaced by a single class with one or more labels (e.g., if they are defined in the same namespace).
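A shared-label report needs little more than grouping by label text. The sketch assumes `(subject, predicate, object)` tuples of prefixed names with labels as plain strings. Language-tagged labels would have to be compared as (text, language) pairs, which this sketch does not do.

```python
from collections import defaultdict


# Assumption: triples are (subject, predicate, object) tuples; labels are plain strings.
def duplicate_class_labels(triples):
    """Map each rdfs:label string shared by two or more classes to the sorted list of those classes."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    classes_by_label = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:label" and s in classes:
            classes_by_label[o].add(s)
    return {label: sorted(found) for label, found in classes_by_label.items() if len(found) > 1}


triples = [
    ("onto-a:City", "rdf:type", "owl:Class"),
    ("onto-a:City", "rdfs:label", "city"),
    ("onto-b:City", "rdf:type", "owl:Class"),
    ("onto-b:City", "rdfs:label", "city"),
    ("onto-b:Nation", "rdf:type", "owl:Class"),
    ("onto-b:Nation", "rdfs:label", "nation"),
]
print(duplicate_class_labels(triples))  # {'city': ['onto-a:City', 'onto-b:City']}
```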
P33. Creating a property chain with just one property
A property chain with only one property in the antecedent is created. In this case it could be more appropriate to relate the two properties directly, since a one-property chain says no more than that the antecedent property is a subproperty of the consequent. An example is the property chain isInChargeOf -> supervises.
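In RDF, the antecedent of `owl:propertyChainAxiom` is an `rdf:first`/`rdf:rest` list, so the check amounts to measuring list length. The sketch below assumes `(subject, predicate, object)` tuples of prefixed names, with list nodes as blank nodes. It guards against malformed lists by stopping at a missing `rdf:rest` or at a repeated node.

```python
# Assumption: triples are (subject, predicate, object) tuples; RDF list nodes are blank nodes "_:id".
def single_property_chains(triples):
    """Return properties whose owl:propertyChainAxiom list contains exactly one property."""
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    flagged = []
    for s, p, o in triples:
        if p != "owl:propertyChainAxiom":
            continue
        length, node, seen = 0, o, set()
        while node != "rdf:nil" and node not in seen:  # walk the list, stopping on malformed cycles
            seen.add(node)
            length += 1
            node = rest.get(node, "rdf:nil")
        if length == 1:
            flagged.append(s)
    return flagged


triples = [
    # supervises <- (isInChargeOf): a one-property chain
    ("ex:supervises", "owl:propertyChainAxiom", "_:l1"),
    ("_:l1", "rdf:first", "ex:isInChargeOf"),
    ("_:l1", "rdf:rest", "rdf:nil"),
    # hasUncle <- (hasParent, hasBrother): a proper two-property chain
    ("ex:hasUncle", "owl:propertyChainAxiom", "_:c1"),
    ("_:c1", "rdf:first", "ex:hasParent"),
    ("_:c1", "rdf:rest", "_:c2"),
    ("_:c2", "rdf:first", "ex:hasBrother"),
    ("_:c2", "rdf:rest", "rdf:nil"),
]
print(single_property_chains(triples))  # ['ex:supervises']
```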
P34. Untyped class
A resource is used as a class, e.g. appearing as the object of an rdf:type, rdfs:domain, or rdfs:range statement, or as the subject or object of an rdfs:subClassOf statement, without having been declared as a Class.
P35. Untyped property
A resource is used as a property, e.g. appearing as the subject or object of an rdfs:subPropertyOf statement, without having been declared as an rdf:Property or some subclass of it.
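P34 and P35 can be checked together by comparing where a resource is used with how it is declared. This sketch assumes `(subject, predicate, object)` tuples of prefixed names. It treats the `rdf`, `rdfs`, `owl` and `xsd` vocabularies as built in and ignores blank nodes. Its lists of class and property types are illustrative rather than exhaustive.

```python
# Assumptions: triples are (subject, predicate, object) tuples of prefixed names;
# the built-in prefixes and declaration types below are illustrative, not exhaustive.
BUILTIN_PREFIXES = {"rdf", "rdfs", "owl", "xsd"}
CLASS_TYPES = {"owl:Class", "rdfs:Class"}
PROPERTY_TYPES = {
    "rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty",
    "owl:AnnotationProperty", "owl:SymmetricProperty", "owl:TransitiveProperty",
}


def is_builtin(curie):
    return curie.partition(":")[0] in BUILTIN_PREFIXES


def untyped_resources(triples):
    """Return (classes used but never declared, properties used but never declared)."""
    declared_classes = {s for s, p, o in triples if p == "rdf:type" and o in CLASS_TYPES}
    declared_props = {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    used_as_class, used_as_prop = set(), set()
    for s, p, o in triples:
        if p in ("rdf:type", "rdfs:domain", "rdfs:range"):
            used_as_class.add(o)          # P34: these object positions imply a class
        elif p == "rdfs:subClassOf":
            used_as_class.update((s, o))  # P34: both ends of subClassOf are classes
        elif p == "rdfs:subPropertyOf":
            used_as_prop.update((s, o))   # P35: both ends of subPropertyOf are properties
    untyped_classes = sorted(
        c for c in used_as_class - declared_classes
        if not is_builtin(c) and not c.startswith("_:")  # skip vocabulary terms and class expressions
    )
    untyped_props = sorted(q for q in used_as_prop - declared_props if not is_builtin(q))
    return untyped_classes, untyped_props


triples = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),  # ex:Student is never declared
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:hasName", "rdf:type", "owl:DatatypeProperty"),
    ("ex:hasName", "rdfs:range", "xsd:string"),
    ("ex:hasLegalName", "rdfs:subPropertyOf", "ex:hasName"),  # ex:hasLegalName is never declared
]
print(untyped_resources(triples))  # (['ex:Student'], ['ex:hasLegalName'])
```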
And a few that have occurred to me from situations I've seen along the way:
R1. Untyped references (this might cover several of the P's above)
A reference to a URI in any context other than as the object of an annotation property, without a type triple for that URI.
R2. Crossing domains/ranges
p subPropertyOf q .
p domain A .
q domain B .
B subClassOf A .
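Similarly for range. Because p is a subproperty of q, anything in the domain of p is inferred to be a B, so declaring the broader class A as the domain of p is misleading.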
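The R2 pattern can be matched mechanically. The sketch below assumes `(subject, predicate, object)` tuples of prefixed names. It flags a subproperty whose declared domain (or range, via the `axis` parameter) is a superclass, direct or indirect, of the corresponding declaration on its superproperty. `crossing_declarations` is a hypothetical helper name.

```python
from collections import defaultdict


# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
def crossing_declarations(triples, axis="rdfs:domain"):
    """Return (p, A, q, B) where p subPropertyOf q, p axis A, q axis B, and B is a subclass of A."""
    declared = defaultdict(set)
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == axis:
            declared[s].add(o)
        elif p == "rdfs:subClassOf":
            parents[s].add(o)

    def ancestors(cls):
        """All direct and indirect superclasses of cls."""
        seen, stack = set(), [cls]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    flagged = []
    for sub, pred, sup in triples:
        if pred != "rdfs:subPropertyOf":
            continue
        for a in declared[sub]:
            for b in declared[sup]:
                if a in ancestors(b):  # the subproperty declares the broader class
                    flagged.append((sub, a, sup, b))
    return flagged


triples = [
    ("ex:p", "rdfs:subPropertyOf", "ex:q"),
    ("ex:p", "rdfs:domain", "ex:A"),
    ("ex:q", "rdfs:domain", "ex:B"),
    ("ex:B", "rdfs:subClassOf", "ex:A"),
]
print(crossing_declarations(triples))                      # [('ex:p', 'ex:A', 'ex:q', 'ex:B')]
print(crossing_declarations(triples, axis="rdfs:range"))  # []
```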
R3. Annotating Individuals
For Individuals, should we use skos:definition, dc:description, rdfs:comment, or something else?
Controlled lists e.g., Jurisdictions, currency lists, languages
Individuals representing a regulator, institutions, etc. They are members of classes like Financial Institution
- Jurisdiction-specific, like the Federal Reserve
- Sample data, e.g., Citigroup (non-normative)
Have a look at current FBC
A set of hygiene suggestions from Elisa (added on 11/17/16):
Hi Dean,
These fall into several categories:
- Annotations -- every primary entity (class, property and individual) must have at a minimum a label and definition, and for classes in particular, a source for the definition (adaptedFrom or definitionOrigin or dct:source worst case); there should be only one rdfs:label unless there are language tags indicating that they are labels for the same concept in different natural languages ... lack of source information should not prohibit a commit, but not having a source should be the exception not the norm
- Duplication -- there should be only one concept (primary entity) with a given name -- duplicates in different namespaces should be reported, and likely the one in the more abstract ontology should win, but identification of these in a report is a great start (I know of a couple of cases with respect to properties that we need to fix, for example)
- Naming conventions -- ontology names and class names should be upper camel case (MU does not adhere to this consistently), property names should be lower camel case, individuals should be lower camel case unless proper names (not sure you can automate that), and class names should not include acronyms or abbreviations (there are some in the IRSwaps ontology that need to be fixed)
- Entities with "no home" -- i.e., that are not defined somewhere, and there are a few of these with respect to individuals in FBC as an example if you need a test case let me know ... these are things we've referenced but haven't yet got individuals for - missed in a rush to publish I think
- non-ASCII characters, such as certain line feeds, or funky representations of Unicode character encoding (MagicDraw does this to ontologies I load in VOM, and actually changes the model to use its internal UTF-8 encoding instead of the ASCII representation of the non-English or math characters, which I'm guessing MB doesn't realize yet -- caused nightmares for me with LCC) -- the serializer should not change the representation either, not even to an escaped or common HTML encoding in place of what was in the original
- min 1 cardinality used instead of someValuesFrom
- classes with only one child -- this is more of an "informative" thing, as in many cases we add more children in subordinate ontologies, but it would be good to point them out, often a sign of incompleteness
- failure with respect to either pellet or hermit or trowl to complete within "a reasonable amount of time on a reasonable processor", say 10 minutes if there are individuals and something like 2 minutes if not from a logical consistency perspective -- MU's loans stuff causes pellet to die, for example, although trowl is ok with it ... the only potential exception to this rule would be for string values that are defined via regular expressions, which pellet doesn't currently support but hermit does ... reasoners dying should be enough of a reason to disallow the commit unless you can identify a regular expression; slow reasoning is likely due to something else, possibly something in 1-7 above, or due to too many exact cardinalities greater than 1 or something like that
I'll think of more and send them on when I do, but this is a good start. I'd like to be able to run the report in a standalone mode on something in a local branch, as a test of what work remains, even before I attempt to commit it, too.
Thanks!!
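Elisa's informative check for classes with only one child is simple to prototype. The sketch below assumes `(subject, predicate, object)` tuples of prefixed names and ignores `rdfs:subClassOf` axioms whose object is a blank node (restrictions and other class expressions). As she notes, hits are a hint of possible incompleteness, not an error.

```python
from collections import defaultdict


# Assumption: triples are (subject, predicate, object) tuples; blank nodes are written "_:id".
def single_child_classes(triples):
    """Return named classes that have exactly one direct named subclass."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and not o.startswith("_:"):
            children[o].add(s)
    return sorted(parent for parent, kids in children.items() if len(kids) == 1)


triples = [
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Puppy", "rdfs:subClassOf", "ex:Dog"),      # ex:Dog has a single child
    ("ex:Actor", "rdfs:subClassOf", "_:r1"),        # restriction: ignored
]
print(single_child_classes(triples))  # ['ex:Dog']
```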