ONTOpedia (beta version) is a tool developed within the OntoPedia project, funded by the Ministerio de Ciencia e Innovación.

ONTOpedia searches a large collection of triplets (47GB), indexed with Solr. A triplet holds a piece of information about either a named entity or a term, and consists of three elements: Object 1, Relation, and Object 2. You can query the triplets by specifying any of these elements. For instance, if you write Rajoy in the Object 1 box and age in the Relation box, the search engine returns the set of triplets containing specific information about the age of Rajoy.
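To make the triplet structure concrete, here is a minimal Python sketch of a triplet and of a query that constrains only the elements you specify. The `Triplet` class, the `search` function, the case-insensitive substring matching, and the sample data are all assumptions made for illustration; they are not ONTOpedia's implementation, which relies on the Solr index.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Triplet:
    """Hypothetical in-memory form of a triplet: Object 1, Relation, Object 2."""
    object1: str
    relation: str
    object2: str


def matches(field_value: str, wanted: Optional[str]) -> bool:
    # An element left unspecified (None) places no constraint on the triplet.
    # Case-insensitive substring matching is an assumption of this sketch.
    return wanted is None or wanted.lower() in field_value.lower()


def search(
    triplets: Iterable[Triplet],
    object1: Optional[str] = None,
    relation: Optional[str] = None,
    object2: Optional[str] = None,
) -> List[Triplet]:
    """Return the triplets that satisfy every element that was specified."""
    return [
        t
        for t in triplets
        if matches(t.object1, object1)
        and matches(t.relation, relation)
        and matches(t.object2, object2)
    ]


# Illustrative data only; these rows are not taken from the ONTOpedia index.
SAMPLE = [
    Triplet("Nepal", "capital", "Kathmandu"),
    Triplet("Nepal", "continent", "Asia"),
    Triplet("Portugal", "capital", "Lisbon"),
]

results = search(SAMPLE, object1="Nepal", relation="capital")
print(results)
```

Leaving an element out simply widens the result set: `search(SAMPLE, relation="capital")` would return both capital triplets in the sample.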

The Resource

So far, we have extracted the triplets from only one resource, namely Wikipedia (October 2010 version), in four languages: English, Portuguese, Spanish, and Galician. The triplets were extracted from seven different sources:

  • Infoboxes (inf)
  • Wikitables (tab)
  • List of categories (cat)
  • Disambiguation pages (hom)
  • Redirect pages (syn)
  • Titles and bullets (tit)
  • Unrestricted text (depOE), using a technique based on Open Information Extraction and a tool called DepOE (see our EACL-2012 paper)

Current Work

Many improvements remain to be made, namely:

  • To perform extraction from more resources and corpora
  • To correct some bugs (e.g., extraction from wikitables)
  • To normalize the triplets using lemmatization and synonyms
  • To add language identification
  • To make use of Freebase and DBpedia
  • To integrate our extracted triplets in Linked Data

Examples of queries:

Bertrand Russell (as Object 1) and death (as Relation)

Rajoy (as Object 1) and age (as Relation)

Nepal (as Object 1) and capital (as Relation)

Manuel Rivas (as Object 1) and awards (as Relation)

Javier Corcobado (as Object 1) and discografía (as Relation)
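
Since the triplets are indexed with Solr, the example queries above could be expressed as Solr query parameters along the following lines. This is only a sketch under stated assumptions: the field names `object1`, `relation`, and `object2`, the core name `triplets`, and the local URL are hypothetical, because the actual ONTOpedia schema and endpoint are not described here. The code only builds the URLs; it does not contact any server.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical endpoint: a local Solr instance with a core named "triplets".
BASE_URL = "http://localhost:8983/solr/triplets/select"


def _phrase(value: str) -> str:
    """Wrap a value in double quotes so Solr treats it as a phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_params(
    object1: Optional[str] = None,
    relation: Optional[str] = None,
    object2: Optional[str] = None,
) -> Dict[str, str]:
    """Build Solr parameters from whichever triplet elements were given."""
    # Field names are assumptions of this sketch, not the real schema.
    given = {"object1": object1, "relation": relation, "object2": object2}
    clauses = [f"{field}:{_phrase(value)}" for field, value in given.items() if value]
    # "*:*" is Solr's match-all query, used when no element is specified.
    return {"q": " AND ".join(clauses) if clauses else "*:*", "wt": "json"}


def build_url(
    object1: Optional[str] = None,
    relation: Optional[str] = None,
    object2: Optional[str] = None,
) -> str:
    """Return the full, URL-encoded query address."""
    return f"{BASE_URL}?{urlencode(build_params(object1, relation, object2))}"


# The example queries listed above, as (Object 1, Relation) pairs.
EXAMPLES = [
    ("Bertrand Russell", "death"),
    ("Rajoy", "age"),
    ("Nepal", "capital"),
    ("Manuel Rivas", "awards"),
    ("Javier Corcobado", "discografía"),
]

for obj, rel in EXAMPLES:
    print(build_url(object1=obj, relation=rel))
```

Quoting each value as a phrase keeps multi-word entities such as Bertrand Russell together instead of letting the two words match independently.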