ISWC 2024 Tutorial - Ontology Engineering for Industry Adoption (OEIA)

Elisa Kendall¹,² and Pawel Garbacz²,³

1 Thematix Partners LLC, New York NY 10021, USA

2 EDM Council, Monmouth Junction, NJ, 08852, USA

3 The John Paul II Catholic University of Lublin, Lublin, Lubelskie 20-950, PL
ekendall@thematix.com, pgarbacz@edmcouncil.org

Abstract

Industry-wide collaborative ontology development efforts can distribute development costs over organizations, address a wider range of use cases, and have the potential to be of higher quality than many project or application-specific ontologies. Based on our experience with several industry ontologies, we will present several of the most important lessons learned in developing ontologies for industry applications, ranging from establishing critical policies from the outset, to reusing standards-based patterns, to leveraging collaborative tools for integration and test. Participants will select example use cases as the basis for an in-class ontology, reuse example patterns, and test their work using open-source tools for serializing ontologies as well as tools that check for syntactic and semantic issues that well-known tools such as Protégé miss, providing direct experience with capabilities that we have found essential for industry-standard ontology development.

Detailed Description

Tutorial Overview

Standards that bridge the divide between labeled property graphs and knowledge graphs will make it easier to use ontologies as the basis for machine learning, natural language processing (NLP) and other graph-based analyses to address real-world business challenges. Several well-known graph database vendors have implemented (or are implementing) the emerging W3C specifications RDF (Resource Description Framework) 1.2 [1] and SPARQL (SPARQL Protocol and RDF Query Language) 1.2 [2] to that end. Richer vocabularies improve the results that large language models (LLMs) obtain when querying diverse data sources, particularly structured data, in ways that property graphs alone cannot [3].

Ontologies can be costly to develop, however, and many ontology projects fail to produce desired results. Industry-wide collaborative ontology development efforts can distribute development costs over organizations, address a wider range of use cases, and have the potential to be of higher quality than many project or application-specific ontologies. Based on our experience with

·       the Financial Industry Business Ontology (FIBO), which is now in use by many financial institutions, central banks, and vendors [4],

·       the Identification of Medicinal Products Ontology (IDMP-O), which is being applied by large pharmaceutical companies, regulators and vendors [5], and

·       the Industrial Ontologies Foundry (IOF) [6], which is still under development but gaining traction for manufacturing in general,

we believe that several critical factors can contribute to successful and long-lived ontologies.

In this tutorial we will present several of the most important lessons learned in developing ontologies for industry applications, ranging from establishing critical policies from the outset, to reusing standards-based patterns, to leveraging collaborative tools for integration and test. Participants will select from several example use cases as the basis for an in-class ontology, and reuse at least three of the example patterns in order to complete their projects. They will also test their work using open-source tools that serialize ontologies consistently, which enables line-by-line comparison of changes in environments such as GitHub. Projects will also be evaluated for quality using open-source tools that check for syntactic and semantic issues that well-known tools such as Protégé miss, providing participants with direct experience with the capabilities we have found essential for industry-standard ontology development.
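To illustrate why consistent serialization matters for line-by-line comparison, here is a minimal sketch in plain Python. It is not the tooling used in the tutorial: the triples are hypothetical string tuples with made-up `ex:` names, and `serialize_sorted` is an illustrative helper that only shows the idea of making output independent of save order.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) string
# tuples with made-up names, not the output of any real ontology serializer.

def serialize_sorted(triples):
    """Return one line per triple, deduplicated and sorted, so the output
    does not depend on the order in which an editor saved the statements."""
    lines = {f"{s} {p} {o} ." for s, p, o in triples}
    return "\n".join(sorted(lines)) + "\n"


# The same three statements, saved in two different orders.
saved_by_editor_a = [
    ("ex:Account", "rdf:type", "owl:Class"),
    ("ex:Account", "rdfs:label", '"account"'),
    ("ex:hasHolder", "rdf:type", "owl:ObjectProperty"),
]
saved_by_editor_b = list(reversed(saved_by_editor_a))

text_a = serialize_sorted(saved_by_editor_a)
text_b = serialize_sorted(saved_by_editor_b)
```

Because both orderings produce identical text, a diff between two commits would show only statements that were actually added or removed, rather than noise from reordering.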

Learning Outcomes

Participants can expect to learn

·       Project structure and requirements for setting up a successful project

·       Guidelines and key policies that need to be established up front for any ontology project

·       Reusable modeling patterns from open-source libraries for metadata, controlled vocabularies and other common modeling structures with hands-on exercises

·       How to use and customize the EDM Council’s open-source platform tests and tools for collaboration to improve quality and increase the likelihood of reusability and success 

Presentation / Interaction Style

This tutorial will involve a combination of lecture plus exercises using open-source standards and technology, all freely available, for use in collaborative ontology development projects.

Motivation

Advances over the last several years in large language models, natural language processing, and in the use of machine learning for automating knowledge acquisition, have highlighted the need for consistency and higher quality in model development. Accelerated artificial intelligence technology development has raised serious questions and requirements for improving quality, limiting hallucinations, limiting error more generally, and gaining trust among users. Recent research has highlighted the need for ontologies including metadata, provenance, and formal semantics to address some of these gaps, especially with respect to interoperability across and integration of structured data. Development methods that incorporate consistent, recognizable patterns for certain concepts, that systematically test for minimal metadata, that support regression testing with examples, such as those established for projects supported by the EDM Council, are essential to ensuring a high level of quality, and to reassuring users that their projects will continue to work in light of changes made to the ontologies over time. Because these tools are freely available and open source, we believe that sharing them with the broader semantic web community is timely.
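A test for minimal metadata can be sketched in a few lines of plain Python. The policy below (every declared class needs a label and a definition) is an assumption made for illustration, as are the `ex:` names and the `missing_metadata` helper; the actual checks applied in EDM Council projects may differ.

```python
# Assumed policy for illustration: every declared class needs these annotations.
REQUIRED_ANNOTATIONS = ("rdfs:label", "skos:definition")


def missing_metadata(triples, required=REQUIRED_ANNOTATIONS):
    """Map each declared class to the required annotations it lacks."""
    classes = sorted({s for s, p, o in triples
                      if p == "rdf:type" and o == "owl:Class"})
    used = {}
    for s, p, _ in triples:
        used.setdefault(s, set()).add(p)
    report = {}
    for cls in classes:
        gaps = [a for a in required if a not in used[cls]]
        if gaps:
            report[cls] = gaps
    return report


# Hypothetical ontology fragment: ex:Holder has a label but no definition.
ontology = [
    ("ex:Account", "rdf:type", "owl:Class"),
    ("ex:Account", "rdfs:label", '"account"'),
    ("ex:Account", "skos:definition", '"record of transactions for a holder"'),
    ("ex:Holder", "rdf:type", "owl:Class"),
    ("ex:Holder", "rdfs:label", '"holder"'),
]

report = missing_metadata(ontology)
```

Run on every proposed change, a check of this kind would flag `ex:Holder` before the gap reaches a release.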

While a number of tutorials given at ISWC in recent years have covered aspects of current research, few, if any, have focused on collaboration or on repeatable approaches for ontology and knowledge graph development emphasizing quality and requirements critical for industry adoption. Industry ontologies can be quite large: FIBO consists of over 200 ontologies, which makes it difficult for users to understand. Use of consistent patterns has changed the game in the last two years, however, making FIBO much easier to extend and more amenable to use for machine learning. Some of the insights leading to formalization of these patterns at the Object Management Group (OMG) will be shared during this tutorial, including the fact that the IDMP-O development team saved over a year by using them. The same patterns are being applied in retail and manufacturing.

Communities such as the OBO Foundry provide some tools for ontology checking and comparison but do not support regression testing, testing with key examples, or CI/CD (continuous integration / continuous deployment) capabilities. Yet these kinds of capabilities have proven essential for industry ontology development and collaboration with commercial partners.
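Regression testing with key examples can be pictured as follows. This is a sketch under stated assumptions, not the EDM Council test suite: `match` and `failing_cases` are hypothetical helpers, the `ex:` names are invented, and a real project would more likely run recorded queries against each build in a CI pipeline.

```python
def match(triples, pattern):
    """Return the sorted triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


def failing_cases(triples, cases):
    """Return the names of cases whose answers no longer match the record."""
    return [name for name, pattern, expected in cases
            if match(triples, pattern) != sorted(expected)]


# Example data paired with the answers recorded when the example was approved.
cases = [
    ("holder of account 1",
     ("ex:acct1", "ex:hasHolder", None),
     [("ex:acct1", "ex:hasHolder", "ex:alice")]),
]

release_1 = [("ex:acct1", "ex:hasHolder", "ex:alice")]
# A later release renames the property without migrating the example.
release_2 = [("ex:acct1", "ex:isHeldBy", "ex:alice")]

failures_1 = failing_cases(release_1, cases)
failures_2 = failing_cases(release_2, cases)
```

The first release passes, while the rename in the second release is reported by name, which is the kind of signal that lets users trust that their own data will keep working as the ontology changes.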

Format

Half-day tutorial, including lecture and hands-on experimentation.

Tutorial type and audience

Type: introductory tutorial

Level: beginner/intermediate

Target Audience: practitioners interested in enterprise and cross-organizational ontology efforts and/or promoting ontologies for broader use, particularly for standardization; ideally 20-40 people, small enough that individual challenges during hands-on sessions can be addressed

Prerequisites: familiarity with the Web Ontology Language (OWL), ontology development using tools such as Protégé or similar editors, and tools such as GitHub is preferred

Presenters

Elisa F. Kendall – Partner, Thematix Partners LLC and Lead Ontologist, EDM Council

email: ekendall@thematix.com

homepage: Management Team - Thematix , https://www.linkedin.com/in/elisakendall/

ORCiD: https://orcid.org/0009-0009-1864-9506

Bio: Ms. Kendall is a consultant with Thematix Partners LLC and graduate-level lecturer in computer science, focused on data management, data governance, knowledge representation, and decisioning systems. Her consulting practice includes business and information architecture, knowledge representation strategies, and ontology design, development, and training for clients in financial services, government, manufacturing, media, pharmaceutical, and retail domains. Recent projects include the use of ontologies to drive natural language processing, machine learning, interoperability, and other knowledge graph-based applications. At the EDM Council she is lead ontologist for the Financial Industry Business Ontology (FIBO), the Pistoia Alliance Identification of Medicinal Products Ontology (IDMP-O), and other ontology and knowledge graph initiatives. Elisa represents knowledge representation, ontology, information architecture, and data management concerns on the Object Management Group (OMG)’s Architecture Board, is co-editor of a number of OMG ontology-related standards, and contributes to other ISO, W3C, and OMG standards. She is also a member of the Technical Oversight Board and participates in the OAGi Industrial Ontologies Foundry (IOF) activity developing standardized ontologies for manufacturing. She holds a B.S. in Mathematics and Computer Science from UCLA, and an A.M. in Linguistics from Stanford University.

Pawel Garbacz – Technical Director, EDM Council

email: pgarbacz@edmcouncil.org

homepage: https://www.kul.pl/pawel-garbacz,art_20842.html , https://www.linkedin.com/in/pawel-garbacz-3b041318/

Bio: Pawel Garbacz has 20 years of professional experience in ontology development and, more recently, in project management as a technical lead. In his IT career, Dr Garbacz has worked for small and medium-sized IT companies, where he promoted and applied Semantic Web technologies in enterprise-level computer systems. Pawel’s mission for the EDM Council is to oversee the development of its flagship infrastructure for collaborative ontology development. Dr Garbacz earned a PhD in philosophy from the John Paul II Catholic University of Lublin, where he is currently a full professor and the chair of the Department of Foundations of Computer Science.

References

[1] RDF 1.2 Concepts and Abstract Syntax, Olaf Hartig, Pierre-Antoine Champin, Gregg Kellogg, and Andy Seaborne, eds. W3C Working Draft, 02 May 2024. Available at https://www.w3.org/TR/rdf12-concepts/ .

[2] SPARQL 1.2 Overview, Andy Seaborne, ed. W3C Group Draft Note, 20 Jul 2023. Available at SPARQL 1.2 Overview .

[3] A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases, Juan F. Sequeda, Dean Allemang, and Bryon Jacob. Available at https://arxiv.org/pdf/2311.07509.

[4] Financial Industry Business Ontology (FIBO) – Available at https://github.com/edmcouncil/fibo and FIBO .

[5] Identification of Medicinal Products Ontology (IDMP-O) – Available at https://github.com/edmcouncil/idmp and IDMP (see also IDMP - O - Pistoia Alliance ).

[6] Industrial Ontologies Foundry (IOF) – available at Industrial Ontologies , and GitHub - iofoundry/ontology .

 
