Consistent Annotation of WordNet using the Top Ontology

The Top Ontology (TO) (Alonge et al., 1998) is an independent hierarchy of features designed for clustering, comparing and exchanging concepts across languages in the EuroWordNet Project (Vossen, 1998). It has also commonly been used as a repository of lexical semantic information. Each WordNet synset has been annotated with one or more TO features.

The annotation of WordNet 1.6 with TO features (version 2.3) is available at the following link:

The TO is also integrated into the Multilingual Central Repository (MCR).

The TO consists of 63 features organized in three disjoint types of entities:

 - 1stOrderEntity: physical things

 - 2ndOrderEntity: events, states and properties

 - 3rdOrderEntity: unobservable entities

Most of the subdivisions of the TO are disjoint categories: for example, a concept cannot be both Natural and Artifact. Nevertheless, such incompatible combinations can arise when the TO features are inherited through the hyponymy hierarchy.

We can avoid the inheritance of disjoint categories by including blockage points in the hyponymy hierarchy paths. In this way, a consistent annotation of the nominal part of WordNet is obtained.
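
As an illustration of how blockage points interact with inheritance, the following Python sketch propagates features down a toy hyponymy hierarchy. It is not the Prolog implementation distributed with this package: the synset names, the data layout, the function name `expand` and the modelling of a blockage point as a (synset, feature) pair are all assumptions made for the example.

```python
# Toy hyponymy hierarchy (synset -> direct hyponyms). Hypothetical data,
# not taken from WordNet 1.6.
HYPONYMS = {
    "substance": ["water", "plastic"],
    "plastic": ["pvc"],
}

# Direct (manually assigned) TO features per synset. Hypothetical data.
DIRECT = {
    "substance": {"Natural"},
    "plastic": {"Artifact"},
}


def expand(hyponyms, direct, roots, blocked=frozenset()):
    """Propagate features top-down through the hyponymy hierarchy.

    `blocked` is a set of (synset, feature) pairs: the feature is dropped
    at that synset and is therefore not passed on to its hyponyms either.
    This representation is an assumption of the sketch. The hierarchy is
    assumed to be acyclic.
    """
    expanded = {}
    stack = [(root, frozenset()) for root in roots]
    while stack:
        synset, inherited = stack.pop()
        # Drop inherited features that are blocked at this synset.
        kept = {f for f in inherited if (synset, f) not in blocked}
        features = kept | direct.get(synset, set())
        # A synset reached through several hypernyms collects the union.
        expanded.setdefault(synset, set()).update(features)
        for child in hyponyms.get(synset, []):
            stack.append((child, frozenset(features)))
    return expanded


without_blocking = expand(HYPONYMS, DIRECT, roots=["substance"])
with_blocking = expand(
    HYPONYMS, DIRECT, roots=["substance"],
    blocked={("plastic", "Natural")},
)

print(sorted(without_blocking["plastic"]))  # ['Artifact', 'Natural']
print(sorted(with_blocking["plastic"]))     # ['Artifact']
print(sorted(with_blocking["pvc"]))         # ['Artifact']
```

Without the blockage point, the toy synset "plastic" ends up with both Natural and Artifact. With it, Natural stops at "plastic" and its hyponym "pvc" inherits only Artifact.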

WordNet to TO Annotation Tools

We have developed a set of tools for checking the consistency of the annotation and for obtaining its expansion. To prove consistency, we check that there is no incompatibility in the annotation of the nominal part of WordNet 1.6 with TO features when the blockage points are used. The expansion of the annotation can be obtained once the annotation is consistent. These tools are implemented in Prolog and are available at the following links: [tar.gz] [zip]
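
The kind of check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Prolog tools themselves: the function name `find_conflicts`, the dictionary layout and the sample synsets are hypothetical, and the list of disjoint pairs contains only the pairs mentioned on this page, not the full set of TO incompatibilities.

```python
# Disjoint feature pairs mentioned on this page. The real TO defines more;
# this short list is an assumption of the sketch.
DISJOINT_PAIRS = [
    ("Natural", "Artifact"),
    ("1stOrderEntity", "2ndOrderEntity"),
    ("1stOrderEntity", "3rdOrderEntity"),
    ("2ndOrderEntity", "3rdOrderEntity"),
]


def find_conflicts(expanded, disjoint_pairs):
    """Return (synset, feature_a, feature_b) for every disjoint pair
    that co-occurs in the expanded feature set of a synset."""
    conflicts = []
    for synset, features in sorted(expanded.items()):
        for a, b in disjoint_pairs:
            if a in features and b in features:
                conflicts.append((synset, a, b))
    return conflicts


# Hypothetical expanded annotation (synset -> inherited + direct features).
expanded = {
    "water": {"1stOrderEntity", "Natural"},
    "plastic": {"1stOrderEntity", "Natural", "Artifact"},
}

conflicts = find_conflicts(expanded, DISJOINT_PAIRS)
print(conflicts)  # [('plastic', 'Natural', 'Artifact')]
```

In this sketch, an annotation counts as consistent when `find_conflicts` returns an empty list.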

Quantitative analysis

The TO annotation and the addition of the blockage points yield some interesting figures. For instance, every blockage point subsumes an average of 120.16 synsets, and 28,123 synsets have at least one blockage point in their hypernymy line.
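
To make the two measures concrete, the Python sketch below computes them on a toy hierarchy. The data and the names `descendants` and `blockage_stats` are hypothetical, and the sketch assumes that a blockage point "subsumes" its proper descendants only (the blockage synset itself is not counted); the figures reported above may follow a different convention.

```python
def descendants(hyponyms, synset):
    """All synsets strictly below `synset` in the hyponymy hierarchy."""
    seen = set()
    stack = list(hyponyms.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(hyponyms.get(node, []))
    return seen


def blockage_stats(hyponyms, blockage_synsets):
    """Return (average synsets subsumed per blockage point,
    number of synsets with at least one blockage point above them)."""
    below = {b: descendants(hyponyms, b) for b in blockage_synsets}
    if not below:
        return 0.0, 0
    average = sum(len(d) for d in below.values()) / len(below)
    # A synset under several blockage points is counted only once here.
    covered = set().union(*below.values())
    return average, len(covered)


# Toy hierarchy: "d" sits below both "b" and "c". Hypothetical data.
hyponyms = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}

stats = blockage_stats(hyponyms, ["b", "c"])
print(stats)  # (1.5, 2)
```

Here "b" subsumes one synset and "c" subsumes two, giving an average of 1.5, while only two distinct synsets ("d" and "e") lie below a blockage point.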

All this information is downloadable from here: [tar.gz] [zip]

License

This package is distributed under the Creative Commons Attribution 3.0 Unported (CC BY 3.0) license. You can find it at http://creativecommons.org/licenses/by/3.0

Publications

References

  • Alonge A., Bertagna F., Bloksma L., Climent S., Peters W., Rodríguez H., Roventini A. and Vossen P. (1998) The Top-Down Strategy for Building EuroWordNet: Vocabulary Coverage, Base Concepts and Top Ontology. In Piek Vossen (ed.) EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht.
  • Vossen P. (ed.) (1998) EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht.