SOFA Design Whitepaper
by Alex Alishevskikh
<alexeya (at) gmail (dot) com>
Abstract
SOFA (Simple Ontology Framework API) is a Java API that provides an object model for an abstract, language-independent specification of knowledge, known as an Ontology. It is intended for developers of Semantic Web, Information Retrieval and Knowledge Base applications and other ontology-driven software. SOFA provides a simplified and highly abstract model of an ontology that is independent of any specific ontology representation language and operates on ontologies at a conceptual rather than syntactic level. This allows SOFA-based applications to work with ontologies described in diverse languages and considerably simplifies software development.
The purpose of this document is to present the basic SOFA design principles to developers of SOFA-based ontology software. It may also be helpful for those who want to examine SOFA in order to join its development.
1. Introduction
The suggested software is a set of reusable Java APIs that aims to provide developers of ontology applications with the following tools:
- An object model of an abstract, language-independent Ontology, so that ontologies can be processed in a uniform programmatic way.
- A common ontology inferencing mechanism.
- A storage utility for ontology models with multiple implementations for various physical storages.
- A mechanism for interoperating between distinct ontologies.
- Tools for external representation of the ontology model in common ontology languages.
- A mechanism for checking ontology constraints (ontology validation).
- An ontology searching tool.
Key goals
- Reducing the time and cost of development by delivering a reusable implementation of common ontology processing tasks.
- Increasing software flexibility, universality and development simplicity through a highly abstract object model and a flexible implementation framework.
- Simplifying team development by providing a common, well-defined and documented ontology API.
And, after all,
- Making ontology software development a much easier and more enjoyable job.
Key principles
- Open source. The software is open and free and should not rely on any proprietary solutions.
- Platform independence. The software is implemented entirely with cross-platform development tools (Java 2) and should not contain platform-dependent code or use any platform-specific features.
- Alignment with industry standards. The software should be compatible with leading industry standards, principles and practices.
Design principles
- Abstraction of the model. The ontology object model is a high-level representation of ontological concepts, independent of any specific representation language, storage implementation, transfer protocol, etc.
- Flexibility. The software API is separated into interface and implementation layers. The implementation layer is hidden from client applications and can be changed independently of the interfaces.
Implementation principles
- Test-Driven Development. Implementations are provided together with test cases for an automated testing framework (JUnit).
- Reliability and robustness.
- Scalability.
Terminology
- Ontology: A specification of conceptualized knowledge about a specific area of interest.
- Ontology Object Model: A representation of an Ontology as a structure of Java interfaces and implementation classes.
- Thing: Any Ontology member. A subject of an Ontology that encapsulates knowledge about a specific item within an area of interest.
- Concept (equivalent to Class in OWL terminology): A Thing that provides a hierarchical classification of other Things and represents a node of that classification.
- Subconcept (of the Concept): A descendant of the given Concept in the classification hierarchy (a more specialized Concept).
- Superconcept (of the Concept): An ancestor of the given Concept in the classification hierarchy (a more generalized Concept).
- Relation (equivalent to Property in OWL terminology): A Thing that specifies a relationship between distinct Things or between a Thing and actual datatype values.
- Subrelation (of the Relation): A more specialized Relation than the given one.
- Superrelation (of the Relation): A more generalized Relation than the given one.
- Instance (of the Concept): A Thing classified under the given Concept.
- Domain Concept (of the Relation): A Concept whose instances are allowed to have the given Relation.
- Defined Relation (of the Concept): A Relation having the given Concept as a Domain Concept.
- Range Concept (of the Relation): A Concept whose instances are allowed to be the targets of the given Relation.
- Range Datatype (of the Relation): A Java class whose instances are allowed to be the targets of the given Relation.
- Datatype value: Any Java object.
- Restriction: A constraint set by a Concept on its Defined Relation.
- Statement: An assertion claiming a single elementary fact about a given Thing. It can be represented as a triple {SUBJECT PREDICATE OBJECT}, where SUBJECT is a Thing, PREDICATE is a Relation and OBJECT is another Thing or a datatype value.
- Client application: Any software referring to the SOFA API.
- Ontology Storage Model: An object model of an abstract storage utility for storing the data of the Ontology Object Model.
- Serialization: The process of representing the data of the Ontology Object Model in an external textual form (in a specific ontology language, e.g. OWL).
- Deserialization: The process of restoring the data of the Ontology Object Model from a serialized form.
- Inferencing: The process of deriving new (implicit) statements from the basic statements set explicitly on a given Thing, in accordance with certain inference rules.
- Ontology Integrity: The state of an Ontology in which all constraints set on its Things are maintained.
2. Structure of API
The SOFA API is divided into the following conceptual modules:
- The Ontology Object Model interfaces: The set of Java interfaces representing the notions of the Ontology Object Model. Client applications refer to these interfaces only, not to their implementations.
- The Ontology Object Model Reference Implementation: The set of Java classes implementing the Ontology Object Model interfaces. This part can be changed independently of the interfaces.
- The Ontology Storage Model: It contains a uniform interface of an abstract storage utility and may include a number of implementations for specific physical storage back-ends. The Ontology Model implementation refers to that interface for saving and retrieving the Ontology internals.
- Serialization modules: The sets of classes that provide an external representation of the Ontology Model in various ontology languages and parse such representations back into the Model.
[Figure: The SOFA API design scheme]
3. Ontology Object Model
The Ontology model API is the heart of the entire platform. It is a set of interfaces and their implementations representing an Ontology Object Model for manipulating ontologies in a programmatic, object-oriented way. At the conceptual level, the model is consistent with W3C OWL (Web Ontology Language).
Conceptual Model of an Abstract Ontology
Ontology
An ontology is a formal representation of knowledge about an area of interest. In SOFA, an ontology is considered a set of individuals (Things) which encapsulate sequences of axioms and facts. An Ontology is intended to be a knowledge repository that is responsible for creating, storing, retrieving and removing Things. It also provides a uniform namespace for all Things that belong to the ontology.
Ontologies can also have non-logical annotations that can be used to record human-readable labels, comments, versioning info and other non-logical information associated with an ontology.
Things
A Thing is a logically meaningful ontology member which encapsulates knowledge about a specific item within an area of interest.
A Thing model can be considered a set of statements declaring facts about a given item. The nature of these statements is specified by Relations (predicates). The set of Relations that a specific Thing may use is the union of the sets of Relations that have in their domains any of the ontology Concepts of which this Thing is an instance.
Each Thing has the following built-in properties:
- Unique identifier. Together with an ontology namespace identifier it can form a qualified URI (Uniform Resource Identifier) name to address this Thing in distributed environments.
- Non-logical annotations for human-readable labels, comments, versioning info and other information associated with a Thing.
- Axioms relating the Thing to specific ontology Concepts (the Concepts of which it is an instance).
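The rule that a Thing may only use Relations declared in the domains of its Concepts can be sketched as follows. This is a minimal illustration, not the SOFA API: the class and method names are hypothetical, and Concepts and Relations are reduced to plain strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the SOFA API: Concepts and Relations are plain names.
class AllowedRelationsSketch {
    // Concept name -> Relations that declare this Concept in their domain.
    private final Map<String, Set<String>> definedRelations = new HashMap<>();

    void defineRelation(String concept, String relation) {
        definedRelations.computeIfAbsent(concept, k -> new HashSet<>()).add(relation);
    }

    // Union of the Relations defined by every Concept the Thing is an instance of.
    Set<String> allowedRelations(List<String> conceptsOfThing) {
        Set<String> allowed = new HashSet<>();
        for (String concept : conceptsOfThing) {
            allowed.addAll(definedRelations.getOrDefault(concept, Collections.emptySet()));
        }
        return allowed;
    }

    public static void main(String[] args) {
        AllowedRelationsSketch model = new AllowedRelationsSketch();
        model.defineRelation("Person", "hasName");
        model.defineRelation("Employee", "worksFor");
        // A Thing that is both a Person and an Employee may use both Relations.
        System.out.println(model.allowedRelations(Arrays.asList("Person", "Employee")));
    }
}
```

The sketch ignores the concept hierarchy; a fuller version would first expand the Thing's Concepts with their superconcepts.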
Concepts
Concepts are special Things that provide a hierarchical classification of other Things. A specific Concept defines a group of Things that belong together because they share some relation types. A Thing belonging to a specific Concept is called an instance of this Concept.
Concepts can be organized in a specialization hierarchy using subconcept axioms. More general Concepts are extended by their subconcepts, which represent more specialized notions. Multiple inheritance is allowed, i.e. a Concept can be a direct subconcept of more than one superconcept.
Relations
Relations are Things that specify relationships between Things or from Things to actual data values. A Relation specification includes the domain of the Relation (the set of Concepts to whose instances the Relation can be applied) and its range, which specifies the Concepts or data types whose instances are allowed to be targets of the Relation. Like Concepts, Relations can be organized in a specialization hierarchy using subrelation axioms.
Relation attributes
Relations may have the following attributes, which play a role in ontology inferencing:
- Transitivity. If the relation R is transitive, then the statements {x R y} and {y R z} imply the statement {x R z}.
- Symmetry. If the relation R is symmetric, then the statement {x R y} implies the statement {y R x} and vice versa.
- Inversion. If the relation R is the inverse of the relation R', then the statement {x R y} implies the statement {y R' x} and vice versa.
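The three attributes above can be illustrated with a small sketch that derives implicit statements from explicit ones. It is not the SOFA inference engine: all names are hypothetical, and a statement {x R y} for one fixed relation R is stored simply as the two-element list [x, y].

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the SOFA API. A statement {x R y} for one fixed
// relation R is stored as the two-element list [x, y].
class RelationAttributesSketch {

    static List<String> pair(String x, String y) {
        return Arrays.asList(x, y);
    }

    // Transitivity: apply {x R y}, {y R z} => {x R z} until nothing new appears.
    static Set<List<String>> transitiveClosure(Set<List<String>> explicit) {
        Set<List<String>> result = new HashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over snapshots so that 'result' can grow inside the loops.
            for (List<String> a : new HashSet<>(result)) {
                for (List<String> b : new HashSet<>(result)) {
                    if (a.get(1).equals(b.get(0)) && result.add(pair(a.get(0), b.get(1)))) {
                        changed = true;
                    }
                }
            }
        }
        return result;
    }

    // Symmetry: {x R y} => {y R x}.
    static Set<List<String>> symmetricClosure(Set<List<String>> explicit) {
        Set<List<String>> result = new HashSet<>(explicit);
        for (List<String> s : explicit) {
            result.add(pair(s.get(1), s.get(0)));
        }
        return result;
    }

    // Inversion: the statements of R yield the statements of its inverse R'.
    static Set<List<String>> inverse(Set<List<String>> statementsOfR) {
        Set<List<String>> result = new HashSet<>();
        for (List<String> s : statementsOfR) {
            result.add(pair(s.get(1), s.get(0)));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> partOf = new HashSet<>();
        partOf.add(pair("wheel", "car"));
        partOf.add(pair("spoke", "wheel"));
        // The closure additionally contains the implicit statement [spoke, car].
        System.out.println(transitiveClosure(partOf).contains(pair("spoke", "car"))); // true
    }
}
```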
Restrictions
A Concept can state restrictions on how specific Relations may be applied to its instances. There are the following types of restrictions:
- A Cardinality Restriction limits the minimum and maximum number of values acceptable for the same Relation.
- A Value Restriction limits the values acceptable for the Relation to a specific set of predefined values.
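Both restriction types reduce to simple checks over the values a Thing holds for one Relation. The following sketch shows one way to express them; the class and method names are hypothetical and do not come from the SOFA API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the SOFA API.
class RestrictionSketch {
    // Cardinality restriction: the number of values must lie in [min, max].
    static boolean satisfiesCardinality(List<?> values, int min, int max) {
        return values.size() >= min && values.size() <= max;
    }

    // Value restriction: every value must belong to the predefined set.
    static boolean satisfiesValueRestriction(List<?> values, Set<?> allowed) {
        return allowed.containsAll(values);
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "green");
        Set<String> palette = new HashSet<>(Arrays.asList("red", "green", "blue"));
        System.out.println(satisfiesCardinality(colors, 1, 3));         // true
        System.out.println(satisfiesValueRestriction(colors, palette)); // true
    }
}
```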
Inference rules
An implementation provides an inference mechanism for deriving implications from the axioms that are explicitly stated in the model. The basic inference rules are:
- Instances inheritance: If the Thing T is an instance of the Concept C, it is also an instance of all ancestor Concepts of C.
- Subconcepts transitivity: If the Concept C is a subconcept of the Concept C', it is also a subconcept of all ancestor Concepts of C'.
- Subrelations transitivity: If the Relation R is a subrelation of the Relation R', it is also a subrelation of all ancestor Relations of R'.
- Domain concepts inheritance: If the Concept C is a domain concept of the relation R, then all subconcepts of C are also domain concepts of R.
- Range concepts inheritance: If the Concept C is a range concept of the relation R, then all subconcepts of C are also range concepts of R.
- Relations generalization: If the Thing T has a statement with the relation R and some value, it also has statements with all ancestor relations of R and the same value.
- Rules of transitivity, symmetry and inversion of relations (see the "Relation attributes" section).
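Several of the rules above (instances inheritance, subconcepts and subrelations transitivity, relations generalization) come down to collecting all ancestors of a node in a hierarchy that permits multiple inheritance. A minimal sketch under that reading follows; the names are hypothetical, and Concepts and Relations are again plain strings.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the SOFA API. The map records the direct parents
// (superconcepts or superrelations) of each name.
class HierarchyInferenceSketch {
    private final Map<String, List<String>> directParents = new HashMap<>();

    void addParents(String child, String... parents) {
        directParents.put(child, Arrays.asList(parents));
    }

    // All ancestors of a node. Every direct parent is followed, which covers
    // multiple inheritance; the 'seen' set prevents visiting a node twice.
    Set<String> ancestors(String node) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo =
            new ArrayDeque<>(directParents.getOrDefault(node, Collections.emptyList()));
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                todo.addAll(directParents.getOrDefault(current, Collections.emptyList()));
            }
        }
        return seen;
    }

    // Instances inheritance / relations generalization:
    // the directly stated names plus all of their ancestors.
    Set<String> withAncestors(List<String> direct) {
        Set<String> all = new LinkedHashSet<>(direct);
        for (String d : direct) {
            all.addAll(ancestors(d));
        }
        return all;
    }

    public static void main(String[] args) {
        HierarchyInferenceSketch concepts = new HierarchyInferenceSketch();
        concepts.addParents("Manager", "Employee");
        concepts.addParents("Employee", "Person", "Taxpayer");
        // A Thing declared as a Manager is also an Employee, a Person and a Taxpayer.
        System.out.println(concepts.withAncestors(Arrays.asList("Manager")));
    }
}
```

The same structure serves the relation hierarchy: a statement made with a relation R implicitly holds for every name returned by `ancestors("R")`.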
Integrity conditions
An implementation provides checking of ontology integrity, which means the integrity of all Things belonging to the ontology. Integrity holds when the following conditions are true:
- A Thing has statements only with Relations that have the Concepts of this Thing as their domain concepts (taking the "Domain concepts inheritance" rule into account).
- A Thing has statements only with objects allowed by the ranges of the corresponding Relations (taking the "Range concepts inheritance" rule into account).
- The number of statements with a specific Relation satisfies the cardinality restriction for that Relation stated on the nearest Concept of this Thing (taking the "Instances inheritance" rule into account).
- The values of statements with a specific Relation satisfy the value restriction for that Relation stated on the nearest Concept of this Thing (taking the "Instances inheritance" rule into account).
Some inconsistencies are beyond the scope of integrity and must be treated as exceptional situations (errors):
- Coexistence, in the same model, of two or more Things with the same identifier.
- Circular references in subconcept, subrelation and instance-of definitions.
By convention, the model must not allow these situations.
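One way a model could refuse circular references is to check, before recording a new subconcept or subrelation link, whether the proposed parent already descends from the child. The sketch below illustrates that idea only; the names are hypothetical and the actual SOFA implementation may work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the SOFA API.
class CycleCheckSketch {
    private final Map<String, List<String>> directParents = new HashMap<>();

    // Records "child is a sub-item of parent"; rejects a link that would close a cycle.
    void addParent(String child, String parent) {
        if (child.equals(parent) || reaches(parent, child, new HashSet<>())) {
            throw new IllegalArgumentException(
                "Circular reference: " + child + " -> " + parent);
        }
        directParents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    // True if 'target' can be reached from 'from' by following parent links.
    private boolean reaches(String from, String target, Set<String> visited) {
        if (!visited.add(from)) {
            return false; // already explored
        }
        for (String p : directParents.getOrDefault(from, Collections.emptyList())) {
            if (p.equals(target) || reaches(p, target, visited)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CycleCheckSketch concepts = new CycleCheckSketch();
        concepts.addParent("Manager", "Employee");
        concepts.addParent("Employee", "Person");
        try {
            concepts.addParent("Person", "Manager"); // would close a cycle
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```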
Interoperability of the Ontology instances
Client applications must be able to work with a number of distinct ontology instances at once. It should be possible for ontology members to have relations with members of other ontology instances. The ontology model implementation provides a transparent mechanism for such interaction between distinct ontologies.
Ontology Storage Model
The Ontology Storage Model API provides an abstract model of a storage utility for the Ontology Model implementation. The SOFA model implementation refers to the storage model interface to store and retrieve the data (sets of statements) of the ontology internals. Client applications normally should not call the storage API directly, bypassing the ontology model.
The interface part of the storage model represents an abstract ontology storage utility. It is independent of how the data is stored in a particular physical storage back-end; that is the responsibility of a storage model implementation, which knows how to interact with a specific storage mechanism.
Storage model implementations
In-memory storage
The default storage model implementation is a simple in-memory storage. This is a minimalistic storage utility based on the default implementations of the java.util.Collection and java.util.Map interface families. This implementation is intended mainly for testing and experimental purposes, and it can use the ontology serialization mechanism for long-term storage.
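The separation between an abstract storage interface and an in-memory implementation built on java.util collections might look roughly like this. The interface, its method names and the string-based statement model are assumptions made for illustration; they are not the SOFA storage API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical storage interface; the real SOFA storage API may look different.
interface StatementStorageSketch {
    void add(String subject, String relation, Object value);
    Set<Object> values(String subject, String relation);
    boolean remove(String subject, String relation, Object value);
}

// In-memory implementation built on java.util.Map and java.util.Set.
class InMemoryStorageSketch implements StatementStorageSketch {
    // subject -> relation -> values
    private final Map<String, Map<String, Set<Object>>> data = new HashMap<>();

    @Override
    public void add(String subject, String relation, Object value) {
        data.computeIfAbsent(subject, k -> new HashMap<>())
            .computeIfAbsent(relation, k -> new LinkedHashSet<>())
            .add(value);
    }

    @Override
    public Set<Object> values(String subject, String relation) {
        Map<String, Set<Object>> bySubject = data.get(subject);
        Set<Object> found = (bySubject == null) ? null : bySubject.get(relation);
        if (found == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(found);
    }

    @Override
    public boolean remove(String subject, String relation, Object value) {
        Map<String, Set<Object>> bySubject = data.get(subject);
        if (bySubject == null) {
            return false;
        }
        Set<Object> found = bySubject.get(relation);
        return found != null && found.remove(value);
    }

    public static void main(String[] args) {
        // The caller depends only on the interface type.
        StatementStorageSketch storage = new InMemoryStorageSketch();
        storage.add("alice", "age", 30);
        storage.add("alice", "knows", "bob");
        System.out.println(storage.values("alice", "knows")); // prints [bob]
    }
}
```

Because the model layer would hold only the interface type, a JDBC-backed or adapter-based implementation could be substituted without touching client code.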
Persistent storage
The main production storage model implementation is a persistent storage. It is built on JDBC (Java Database Connectivity) to store and retrieve the ontology axioms in relational database management systems (RDBMS). This approach makes a wide range of proven database software available as persistent storage for ontology models. It also brings the characteristics required of enterprise-quality storage, such as reliability, scalability, transaction support and security.
In addition, a persistent storage model can be implemented on other suitable data storage back-ends, e.g. object-oriented DBMSs, native XML databases, BerkeleyDB-like databases, etc.
Interoperability storage
A special class of storage model implementations are adapters to existing applications and information systems. The adapters translate ontology axioms into the structures of the external data model and vice versa. This gives ontology applications a transparent way to interact with these systems and provides an ontological representation of their data and a way to manipulate it.
Ontology Serialization
Reversible representation means that the model can be serialized to an external format and then restored (deserialized) from that format. This mechanism also makes it possible to use existing ontologies and provides interoperability with external agents.
The SOFA ontology model is independent of specific languages, but it can be expressed in any language that has sufficient expressive power to describe the model. Because the SOFA model is conceptually consistent with the semantics of the W3C Web Ontology Language (positioned as an industry standard for ontology representation), the model can be represented entirely in this language's syntax. This is largely true for DAML+OIL as well (the predecessor of OWL and still the most popular ontology definition language). Other languages may lose some details of the SOFA ontology model.
The Ontology Serialization package includes modules that serialize the ontology model in specific languages and restore it from a serialized form. The primary modules are:
- OWL (Web Ontology Language) serialization module: provides guaranteed reversible serialization of the SOFA ontology model without any losses.
- DAML+OIL serialization module: provides reversible serialization with high reliability.
- RDF + RDF-Schema serialization module: provides reversible serialization of the main aspects of the SOFA ontology model.
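The notion of reversible serialization (serialize, then deserialize, and obtain the same statements) can be shown with a deliberately simple sketch. It uses a toy tab-separated text format invented for this example, not OWL, DAML+OIL or RDF, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a toy tab-separated text format, NOT OWL, DAML+OIL or RDF.
class RoundTripSketch {
    // Each statement is a [subject, relation, value] list of strings
    // that contain no tab or newline characters.
    static String serialize(List<List<String>> statements) {
        StringBuilder out = new StringBuilder();
        for (List<String> s : statements) {
            out.append(String.join("\t", s)).append('\n');
        }
        return out.toString();
    }

    static List<List<String>> deserialize(String text) {
        List<List<String>> statements = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                // Limit -1 keeps trailing empty fields, so empty values survive.
                statements.add(Arrays.asList(line.split("\t", -1)));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<List<String>> model = new ArrayList<>();
        model.add(Arrays.asList("alice", "knows", "bob"));
        model.add(Arrays.asList("alice", "age", "30"));
        // Reversibility: restoring the serialized form yields an equal model.
        System.out.println(deserialize(serialize(model)).equals(model)); // true
    }
}
```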
Programming aspects
The Ontology model API is a hierarchy of Java objects. The root notion (Thing) is represented by objects with the root interface (Thing), which provides the basic getter and setter methods for statements and built-in relations. The sub-notions (Concept and Relation) are represented by specialized subinterfaces of Thing, extended with methods for their specific needs.
Instantiation of ontology objects is provided by the Ontology interface.
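The shape of such a hierarchy can be sketched as below. The interface names Thing, Concept, Relation and Ontology follow the text, but every method signature and every implementation class here is an assumption made for illustration and should not be read as the actual SOFA API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Interface layer. Names follow the text; all method signatures are assumptions.
interface Thing {
    String getId();
    Concept getConcept();
    void add(Relation relation, Object value);
    List<Object> list(Relation relation);
}

interface Concept extends Thing { }

interface Relation extends Thing { }

interface Ontology {
    Concept createConcept(String id);
    Relation createRelation(String id);
    Thing createThing(String id, Concept concept);
}

// Implementation layer (hypothetical), hidden behind the interfaces.
class ThingImpl implements Thing {
    private final String id;
    private final Concept concept;
    private final Map<Relation, List<Object>> statements = new HashMap<>();

    ThingImpl(String id, Concept concept) {
        this.id = id;
        this.concept = concept;
    }

    public String getId() { return id; }
    public Concept getConcept() { return concept; }

    public void add(Relation relation, Object value) {
        statements.computeIfAbsent(relation, k -> new ArrayList<>()).add(value);
    }

    public List<Object> list(Relation relation) {
        return statements.getOrDefault(relation, new ArrayList<>());
    }
}

class ConceptImpl extends ThingImpl implements Concept {
    ConceptImpl(String id) { super(id, null); }
}

class RelationImpl extends ThingImpl implements Relation {
    RelationImpl(String id) { super(id, null); }
}

class OntologyImpl implements Ontology {
    public Concept createConcept(String id) { return new ConceptImpl(id); }
    public Relation createRelation(String id) { return new RelationImpl(id); }
    public Thing createThing(String id, Concept concept) { return new ThingImpl(id, concept); }
}

class ProgrammingSketch {
    public static void main(String[] args) {
        Ontology onto = new OntologyImpl(); // the only reference to the implementation layer
        Concept person = onto.createConcept("Person");
        Relation name = onto.createRelation("name");
        Thing alice = onto.createThing("alice", person);
        alice.add(name, "Alice");
        System.out.println(alice.getId() + " " + alice.list(name)); // prints "alice [Alice]"
    }
}
```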
Java data types
The model provides a transparent way to get and set arbitrary Java objects as the datatype values of a Thing's statements. It also provides an automatic mapping from Java classes to the datatype ranges of Relations.
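Since a range datatype is a Java class, checking whether a value is an acceptable target can be reduced to an instance test. The sketch below assumes exactly that; the class name and method are hypothetical and not part of the SOFA API.

```java
// Hypothetical sketch, not the SOFA API.
class DatatypeRangeSketch {
    private final Class<?> rangeDatatype;

    DatatypeRangeSketch(Class<?> rangeDatatype) {
        this.rangeDatatype = rangeDatatype;
    }

    // A value is an acceptable target if it is an instance of the range class.
    boolean accepts(Object value) {
        return rangeDatatype.isInstance(value);
    }

    public static void main(String[] args) {
        DatatypeRangeSketch age = new DatatypeRangeSketch(Integer.class);
        DatatypeRangeSketch size = new DatatypeRangeSketch(Number.class);
        System.out.println(age.accepts(42));   // true: autoboxed to Integer
        System.out.println(age.accepts("42")); // false: a String
        System.out.println(size.accepts(3.5)); // true: Double is a Number
    }
}
```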
Events and event listeners
Changes to the model raise events. A client application can track certain events by registering event listeners, which are notified when events of a specified class occur and execute tasks to handle them.
The events mechanism is based on the Java events framework (the java.util.EventObject class and the java.util.EventListener interface).
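A listener mechanism built on java.util.EventObject and java.util.EventListener typically follows the pattern below. The event class, listener interface and model class are hypothetical names chosen for this sketch; only the two java.util base types are taken from the text.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

// Hypothetical event type built on the standard java.util.EventObject.
class StatementAddedEvent extends EventObject {
    private final String statement;

    StatementAddedEvent(Object source, String statement) {
        super(source);
        this.statement = statement;
    }

    String getStatement() { return statement; }
}

// Hypothetical listener type built on the java.util.EventListener marker interface.
interface StatementListener extends EventListener {
    void statementAdded(StatementAddedEvent event);
}

class ObservableModelSketch {
    private final List<StatementListener> listeners = new ArrayList<>();

    void addListener(StatementListener listener) {
        listeners.add(listener);
    }

    void addStatement(String statement) {
        // ... the model itself would be changed here ...
        StatementAddedEvent event = new StatementAddedEvent(this, statement);
        for (StatementListener listener : listeners) {
            listener.statementAdded(event);
        }
    }

    public static void main(String[] args) {
        ObservableModelSketch model = new ObservableModelSketch();
        model.addListener(e -> System.out.println("added: " + e.getStatement()));
        model.addStatement("alice knows bob"); // prints "added: alice knows bob"
    }
}
```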
Exceptions
When the model encounters an illegal action, a failure or another situation that may be considered abnormal, it throws an exception. A client application can catch the exception and handle it in an appropriate way. The exceptions mechanism is based on the Java exceptions framework (the java.lang.Throwable class hierarchy).
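As an illustration, a duplicate identifier (one of the error situations named earlier) could be reported through a dedicated exception type. The exception class and the model class below are hypothetical; the actual SOFA exception classes may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical exception type; the actual SOFA exception classes may differ.
class DuplicateIdentifierException extends Exception {
    DuplicateIdentifierException(String id) {
        super("A Thing with identifier '" + id + "' already exists");
    }
}

class ExceptionSketch {
    private final Set<String> ids = new HashSet<>();

    // Registers a new identifier; a repeated identifier is an error.
    void createThing(String id) throws DuplicateIdentifierException {
        if (!ids.add(id)) {
            throw new DuplicateIdentifierException(id);
        }
    }

    public static void main(String[] args) {
        ExceptionSketch model = new ExceptionSketch();
        try {
            model.createThing("alice");
            model.createThing("alice"); // second call violates identifier uniqueness
        } catch (DuplicateIdentifierException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```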
$Id: index.html,v 1.1.1.1 2005/02/14 07:58:51 alexeya Exp $
