
Semantic Web 8 (2017) 437–452

DOI 10.3233/SW-160213
IOS Press

Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web
Editor(s): Michel Dumontier, Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
Solicited review(s): Wei Hu, Nanjing University, China; Robert Meusel, University of Mannheim, Germany; Johann Schaible, GESIS,
Germany; Robert Danitz, Fraunhofer FOKUS, Berlin, Germany; Christian Bizer, University of Mannheim, Germany; Irene Celino, CEFRIEL –
Politecnico di Milano, Italy; one anonymous reviewer

Pierre-Yves Vandenbussche a,*,**, Ghislain A. Atemezing b, María Poveda-Villalón c and Bernard Vatant d

a Fujitsu (Ireland) Limited, Swords, Co. Dublin, Ireland
b Mondeca, 35 boulevard de Strasbourg, 75010 Paris, France
E-mail: ghislain.atemezing@mondeca.com
c Ontology Engineering Group (OEG), Universidad Politécnica de Madrid, Madrid, Spain
E-mail: mpoveda@fi.upm.es
d Mondeca, 35 boulevard de Strasbourg, 75010 Paris, France
E-mail: bernard.vatant@mondeca.com

Abstract. One of the major barriers to the deployment of Linked Data is the difficulty that data publishers have in determining
which vocabularies to use to describe the semantics of data. This systematic report describes Linked Open Vocabularies (LOV),
a high-quality catalogue of reusable vocabularies for the description of data on the Web. The LOV initiative gathers and makes
visible indicators such as the interconnections between vocabularies and each vocabulary’s version history, along with past and
current editor (individual or organization). The report details the various components of the system along with some innovations,
such as the introduction of a property-level boost in the vocabulary search scoring that takes into account the property’s type
(e.g., dc:comment) associated with a matching literal value. By providing an extensive range of data access methods (full-text
search, SPARQL endpoint, API, data dump or UI), the project aims at facilitating the reuse of well-documented vocabularies
in the Linked Data ecosystem. The adoption of LOV by many applications and methods shows the importance of such a set of
vocabularies and related features for ontology design and the publication of data on the Web.

Keywords: LOV, Linked Open Vocabularies, ontology search, Linked Data, vocabulary catalogue

1. Introduction

The last two decades have seen the emergence of a “Semantic Web” enabling humans and computer systems to exchange data with unambiguous, shared meaning. This vision has been supported by World Wide Web Consortium (W3C) Recommendations such as the Resource Description Framework (RDF), RDF-Schema and the Web Ontology Language (OWL). Thanks to a major effort in publishing data following Semantic Web and Linked Data principles [6], there are now tens of billions of facts spanning hundreds of linked datasets on the Web covering a wide range of topics. Access to the data is facilitated by portals (such as Datahub1 or UK Government Data2) or direct publication by organisations (e.g. New York Times3).

* Corresponding author. E-mail: pierre-yves.vandenbussche@ie.fujitsu.com.
** Thanks to Amélie Gyrard, Thomas Francart, Thérèze Rogez, Laurent Irles and Anthony McCauley for their help on the project.
1 http://datahub.io/.
2 http://data.gov.uk/.
3 http://data.nytimes.com/.

1570-0844/17/$35.00 © 2017 – IOS Press and the authors. All rights reserved
438 P.-Y. Vandenbussche et al. / LOV: A gateway to reusable semantic vocabularies on the Web

Despite the enormous volume of data now available on the Web, the Linked Data community has shown relatively little interest in vocabulary4 management, focusing rather on the data itself. A vocabulary consists of classes, properties and datatypes that define the meaning of data. RDF vocabularies are themselves expressed and published following the Linked Data principles; this gives humans and machines access to the definitions of the terms used to qualify the data. Unfortunately some vocabularies are not published or are no longer available; this breaks the semantic interoperability of the data, one of the fundamental principles of the Semantic Web [16].
The Linked Open Vocabularies (LOV) initiative5 is an innovative observatory of the semantic vocabularies ecosystem. Started in March 2011, as part of the DataLift research project [27] and hosted by the Open Knowledge Foundation, LOV gathers and makes visible indicators not previously harvested, such as the interconnections between vocabularies and the versioning history, along with past and current editors (individual or organization). The number of vocabularies indexed by LOV is constantly growing (527 as of October 2015) thanks to a community effort. It is the only catalogue, to the best of our knowledge, that supports all of the following search and access methods: metadata search, ontology search, APIs, a comprehensive dump file and SPARQL endpoint access.

The purpose of LOV is to promote and facilitate the reuse of well documented vocabularies in the Linked Data ecosystem. In D’Aquin and Noy’s [12] categorisation of ontology libraries, LOV falls into the categories “curated ontology directory” and “application platform”. Specifically, LOV supports the following main activities for the design of ontologies and the publication of data on the Web [19,20,30,32]:

Ontology Search. LOV enables searching for vocabulary terms (class, property, datatype) based on domain: vocabularies (and therefore vocabulary terms) are categorised according to the domain they address.

Ontology Assessment. LOV provides a ranking (cf. Section 3.3.1) for each term retrieved by a keyword search to assist in ontology assessment.

Ontology Mapping. LOV categorizes seven different types of relationships between ontologies: metadata, import, specialization, generalization, extension, equivalence and disjunction (cf. Section 3.1.1). These relationships can be useful for finding alignments between ontologies.

This report is structured as follows: in the next section, we provide statistics on the usage of LOV. In Section 3, we describe the components and features of the system. Thereafter, in Section 4, we provide an overview of some applications and research projects based on and motivated by the LOV system. In Section 5, we report on related work. The limitations and the further development of LOV are discussed in Section 6. We conclude in Section 7.

2. LOV state

The LOV dataset consists of 527 vocabularies as of October 2015.6 Figure 1 shows the evolution of the number of vocabularies inserted in the LOV dataset since March 2011. The addition of new vocabularies to LOV has been fairly constant with two exceptions: 1) the deployment of LOV version 2 [early 2012] automated most of the vocabulary analyses, resulting in an increase in the number of vocabularies; and 2) the deployment of LOV version 3 [early 2015] resulted in a small decrease and plateau in the number of vocabularies. At that time we were considering removing offline vocabularies but finally decided to keep them with a special flag, making LOV the only source of continuity for datasets referencing unreachable vocabularies.

Fig. 1. Evolution of the number of vocabularies in LOV from March 2011 to June 2015.

4 We use the terms “semantic vocabulary”, “vocabulary” and “ontology” interchangeably.
5 http://lov.okfn.org/dataset/lov/.
6 However, the figures and evaluation used in this report are based on LOV catalogue with 511 vocabularies as of June 2015.

Table 1
LOV vocabulary element types statistics
Type          Count     Median per vocab
Properties    29,925    17
Classes       20,034    9
Instances     5,232     0
Datatypes     101       0

Table 2
Top five languages detected in the LOV catalogue, showing numbers and percentages of vocabularies using them. A vocabulary can make use of multiple languages
Language    # vocabs    % vocabs (out of 511)
English     338         66.14%
French      37          7.24%
Spanish     25          4.89%
German      19          3.72%
Italian     18          3.52%

Fig. 2. Distribution of LOV vocabularies by creation date. For indication, we use vertical red lines to mark the official release dates of the main Semantic Web languages (RDF, RDFS and OWL).

Fig. 3. Distribution of LOV vocabularies by last modified date.

Fig. 4. Distribution of LOV vocabularies by number of languages explicitly mentioned using language tag. “Zero” means that there is no explicit language tag declared (i.e. no literal value of the vocabulary has a language tag).

By observing the vocabularies contained in LOV as a whole, we can extract some information about Semantic Web adoption and dynamics. Figure 2 shows the distribution of LOV vocabularies by creation date. The distribution follows a bell curve with its peak in 2011. It is worth noting that a decrease in the number of vocabularies created does not necessarily mean a decrease in use of the technology but rather that the existing vocabularies now cover a large part of the semantic description needed. When looking at the last modified date of the same vocabularies (as illustrated in Fig. 3), we see that LOV vocabularies are part of a living ecosystem in constant evolution.

Overall, the LOV dataset contains about 20,000 classes and almost 30,000 properties. The median is 9 classes and 17 properties per vocabulary. Table 1 presents a breakdown of LOV content by vocabulary element type. In this table, the Classes type refers to any instance of rdfs:Class or owl:Class; the Properties type refers to any instance of rdf:Property or, by inference, any instance of subclasses of rdf:Property defined in the OWL language; the Datatypes type refers to any instance of rdfs:Datatype; and finally, the members of a vocabulary class are known as instances of the class.

Out of 511 vocabularies, 66.14% explicitly use the English language for labels/comments, i.e. they contain the @en tag. Table 2 presents the number and percentage of vocabularies using the top five languages detected in LOV. Figure 4 shows the distribution of vocabularies per number of languages explicitly used: 27.98% of the vocabularies still do not provide any language information, and only 14.68% of the vocabularies are multilingual. In total, 45 languages are used by vocabularies in LOV. We will discuss the importance of providing multilingual vocabularies in Section 7.

Table 3
Type of elements searched from January to June 2015 by users in LOV, for all searches and for searches with keyword
Element type    # searches    % searches    # searches with keyword    % searches with keyword
Term            205,682       14.19%        80,728                     92.84%
Vocabulary      178,837       12.34%        5,918                      6.81%
Agent           1,064,597     73.47%        306                        0.35%

Table 4
Top 10 terms searched from January to June 2015 by users in LOV
Vocabulary term    # searches    % searches
set                7,092         8.79%
domain             2,518         3.12%
some               2,473         3.06%
status             1,486         1.84%
iso 639            1,389         1.72%
same               1,285         1.59%
state              1,235         1.53%
supports           1,145         1.42%
start              887           1.1%
space              864           1.07%

Fig. 5. Evolution of the number of searches through UI and API methods from January to June 2015. Note that the y axis has a logarithmic scale.

From January to June 2015, more than 1.4 million searches were conducted on LOV.7 A breakdown of searches per element type is provided in Table 3. We can see that agent search (for person or organisation) is the most prevalent; this is a new feature in LOV version 3. This might be explained by the uniqueness (when compared to other ontology catalogues) and the recent development of this feature in LOV, which now allows a user to visualise who defined or published vocabularies. Searches that include keywords (and not only filters) mainly seek vocabulary terms. Table 4 presents the top 10 searched terms between January and June 2015. Although most of the searches are performed through the user interface, an application ecosystem using LOV APIs has surfaced, as shown in Fig. 5.

Since 2011, the Linked Open Vocabularies initiative has gathered a community of about 480 people interested in various domains, including ontology engineering and data publication. The LOV Google+ community8 is now an important place to discuss, report and announce general facts related to vocabularies on the Web. The LOV dataset itself references 389 resources of type Person and 111 of type Organization participating in vocabulary design and/or publication.

3. System components and features

The LOV architecture is composed of four main components (cf. Fig. 6): 1) Tracking and Analysis. Checks for any vocabulary version update and analyses vocabularies’ specific features. 2) Curation. Ensures the high quality of the LOV dataset by enabling the community to suggest vocabularies or edit the catalogue. 3) Data Access. Provides access to the data through a large range of methods and protocols to facilitate the use of the LOV dataset. 4) Data Storage. Offers a reliable and efficient method for storing and querying the data. Each component provides a set of features detailed in the following subsections.

7 This figure includes searches from the API and UI as well as searches with and without keywords such as “all agents that participated in vocabulary design and publication in the geo-location domain”.
8 https://plus.google.com/communities/108509791366293651606.

Fig. 6. Overview of the Linked Open Vocabularies Architecture.

3.1. Tracking and analysis

The Tracking and Analysis component dereferences9 LOV vocabularies, stores a version locally (in Notation 3 format) and extracts relevant metadata.

3.1.1. Vocabulary level analysis

At the vocabulary level, the system extracts three types of information for each vocabulary version (Fig. 7):

– The metadata associated to the vocabulary. This information is explicitly defined within the vocabulary to provide context and useful data about the vocabulary. Some high level vocabularies can be reused for that purpose, such as Dublin Core10 to describe authors, contributors, publishers or Creative Commons11 for the description of a license.
– Inlinks/incoming vocabularies, making explicit the links from another vocabulary based on the semantic relation of their terms.
– Outlinks/outgoing vocabularies, making explicit the links to another vocabulary based on the semantic relation of their terms.

Fig. 7. Metadata type, vocabulary inlinks and outlinks of DCAT vocabulary.

Two vocabularies can be interlinked in many different ways. Consider two vocabularies V1 and V2 such that V1 contains a class c1 and a property p1 and V2 contains a class c2 and a property p2. Relationships between these two vocabularies can be of the following types (the line numbers in brackets correspond to real examples presented in Listing 1):

Metadata. Some terms from V2 are reused to provide metadata about V1 (Listing 1, lines 1–2).
Import. Some terms from V2 are reused with V1 to capture the semantics of the data (Listing 1, lines 3–4).
Specialization. V1 defines some subclasses or subproperties (or local restrictions) of V2 (Listing 1, lines 5–8).
Generalization. V1 defines some superclasses or superproperties of V2 (Listing 1, lines 9–11).
Extension. V1 extends the expressivity of V2 (Listing 1, lines 12–15).
Equivalence. V1 declares some equivalent classes or properties with V2 (Listing 1, lines 16–20).
Disjunction. V1 declares some classes disjoint with classes of V2 (Listing 1, lines 21–23).

These relationships, with the exception of Import which is represented by owl:imports, are captured by the Vocabulary of a Friend12 (VOAF). Whenever a new vocabulary or vocabulary version is added to LOV, the system automatically detects the inter-vocabulary relationships and adds them to the LOV catalogue using specific SPARQL CONSTRUCT queries.13 Table 5 presents a breakdown of the occurrences of each relation in LOV.

9 A URI is looked up over HTTP to return content in a processable format such as RDF/XML, Notation 3 or Turtle.
10 http://purl.org/dc/terms/.
11 http://creativecommons.org/ns#.
12 http://lov.okfn.org/vocommons/voaf/.
13 The SPARQL Queries are described in the VOAF vocabulary.
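The automatic detection of inter-vocabulary relationships described in Section 3.1.1 can be approximated with a small sketch. It is not the set of CONSTRUCT queries LOV runs (those are described in the VOAF vocabulary); it is a plain-Python approximation that assumes the triples of a vocabulary file are available as (subject, predicate, object) strings and that a term belongs to the vocabulary whose namespace prefixes its URI. The function names, the example namespaces and the subset of relationship types covered are assumptions made for illustration.

```python
# Illustrative sketch only: LOV itself uses SPARQL CONSTRUCT queries for this step.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

SUB_PREDICATES = {RDFS + "subClassOf", RDFS + "subPropertyOf"}
EQUIV_PREDICATES = {OWL + "equivalentClass", OWL + "equivalentProperty"}
DISJOINT_PREDICATES = {OWL + "disjointWith"}


def vocabulary_of(uri, namespaces):
    """Return the longest namespace that prefixes `uri`, or None."""
    matches = [ns for ns in namespaces if uri.startswith(ns)]
    return max(matches, key=len) if matches else None


def detect_relations(triples, source_ns, namespaces):
    """Relations asserted by the vocabulary `source_ns` in its own file."""
    relations = set()
    for s, p, o in triples:
        v_s = vocabulary_of(s, namespaces)
        v_o = vocabulary_of(o, namespaces)
        # Only links between two known, distinct vocabularies are of interest.
        if v_s is None or v_o is None or v_s == v_o:
            continue
        if source_ns not in (v_s, v_o):
            continue
        other = v_o if v_s == source_ns else v_s
        if p in SUB_PREDICATES:
            # Subject in V1: V1 specializes V2; object in V1: V1 generalizes V2.
            name = "voaf:specializes" if v_s == source_ns else "voaf:generalizes"
            relations.add((source_ns, name, other))
        elif p in EQUIV_PREDICATES:
            relations.add((source_ns, "voaf:hasEquivalencesWith", other))
        elif p in DISJOINT_PREDICATES:
            relations.add((source_ns, "voaf:hasDisjunctionsWith", other))
    return relations


# Hypothetical namespaces and terms, for illustration only.
V1 = "http://example.org/v1#"
V2 = "http://example.org/v2#"
triples = [
    (V1 + "Student", RDFS + "subClassOf", V2 + "Person"),
    (V2 + "Dog", RDFS + "subClassOf", V1 + "Animal"),
    (V1 + "Agent", OWL + "equivalentClass", V2 + "Actor"),
    (V1 + "Student", RDFS + "subClassOf", V1 + "Agent"),  # internal link, ignored
]
for relation in sorted(detect_relations(triples, V1, [V1, V2])):
    print(relation)
```

The metadata, import and extension relationships are left out because they depend on which predicates a vocabulary uses to describe itself, which this sketch does not model.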

Listing 1. Examples of Inter-vocabulary relationships.

Table 5
Inter-vocabulary relationship types and their number of occurrences in LOV
Inter-vocabulary relationship    # relations
voaf:metadataVoc                 2,637
voaf:specializes                 1,269
voaf:extends                     1,031
owl:imports                      373
voaf:hasEquivalencesWith         201
voaf:generalizes                 57
voaf:hasDisjunctionsWith         16

3.1.2. Vocabulary term level analysis

At the vocabulary term level, the system extracts two types of information:

– term type (class, property, datatype or instance defined in the namespace of the vocabulary), indexed by the system’s search engine so it can be used to filter a search.
– term natural language annotations (RDF literals) with their predicate and language (e.g. rdfs:label "Temperature"@en). This information is provided as is for indexing by the search engine and will later be used (cf. Section 3.3.1) in the scoring algorithm.

The information concerning the usage of a vocabulary term in Linked Open Data, also called “popularity”, is used in LOV search results scoring as explained in Section 3.3.1. This information is not natively present in the vocabularies and cannot be inferred from the LOV dataset. We make use of the LODStats project which gathers comprehensive statistics about RDF datasets [3]. LOV regularly fetches LODStats raw data14 described using the Vocabulary of Interlinked Datasets (VoID) [1] and the Data Cube vocabulary. We pre-process LODStats data before inserting it into LOV. Indeed, there are many duplicates in LODStats that in fact represent the same vocabulary URI (e.g., foaf has three different records,15 and has to be mapped to a single entry in LOV).

3.2. Curation

The vocabulary collection is maintained by curators who are responsible for validating metadata information, inserting a vocabulary in the LOV ecosystem, and assigning a review to the suggested vocabulary.

3.2.1. Vocabulary insertion

Compared to other vocabulary catalogues (cf. Section 5), LOV relies on a semi-automated process for vocabulary insertion. Whereas an automated process focuses only on volume, in our process, we focus on the quality of each vocabulary and therefore the quality of the overall LOV ecosystem. Suggestions come from the community and from inter-vocabulary reference links. Our system provides a feature to suggest16 the insertion of a new vocabulary. This feature allows a user to check what information the LOV system can automatically detect and extract. LOV curators then check whether the vocabulary meets the following LOV quality requirements:

1. a vocabulary should be written in RDF and be dereferenceable;
2. a vocabulary should be parsable without error (warnings are tolerated);
3. all vocabulary terms (classes, properties and datatypes) in a vocabulary should have an rdfs:label;
4. a vocabulary should refer to and reuse relevant existing ones; and
5. a vocabulary should provide some metadata about the vocabulary itself (at least a title).

If a suggested vocabulary meets these criteria it is then inserted in the LOV catalogue. During this process, LOV curators keep the authors informed and help them to improve their vocabulary quality. As a result of our experience in vocabulary publication, we developed a handbook of metadata recommendations for Linked Open Data vocabularies to help in publishing well documented vocabularies [31].

3.2.2. Vocabulary review

When automatic extraction of metadata fails, LOV curators enhance the description available in the system and send the vocabulary authors a report of the pitfalls found. This manual task usually consists in checking for any additional information present in the HTML documentation (targeted at humans) and not reflected in the RDF description. The documentation provided by the LOV system assists users in understanding the semantics of each vocabulary term and therefore of any data using the term. For instance, information about the creator and publisher is a key indication for a vocabulary user in case help or clarification is required from the author, or to assess the stability of that artifact. About 55% of the vocabularies specify at least one creator, contributor or editor. We augment this information using manually gathered information, leading to the inclusion of data about the creator in over 85% of the vocabularies in LOV. The database stores every version of a vocabulary since its first issue. For each version, a user can access the file (particularly useful when the original online file is no longer available). A script automatically checks for vocabulary updates on a daily basis. When a new version is detected, it is stored locally, and the statistics about that vocabulary are recomputed. Similarly, we ensure that the curated review for each vocabulary is less than one year old by sending curators a notification when a vocabulary review is older than eleven months. In both cases, curators update the vocabulary review accordingly.

3.3. Data access

The LOV system (code and data) is published under a Creative Commons 4.0 license17 (CC BY 4.0). Users and applications can access the LOV data in four ways:

1. Query the LOV search engine to find the most relevant vocabulary terms, vocabularies or agents matching keywords and/or filters;
2. Download data dumps of the LOV catalogue in RDF Notation 3 format or the LOV catalogue and the latest version of each vocabulary in RDF N-quads format;
3. Run SPARQL queries on the LOV SPARQL Endpoint; and
4. Use the LOV API which provides full access to LOV data for software applications.

3.3.1. Search engine

In [9], Butt et al. compare eight different ranking methods grouped in two categories for querying vocabulary terms:

1. Content-based Ranking Models: tf-idf, BM25, Vector Space Model and Class Match Measure.
2. Graph-based Ranking Models: PageRank, Density Measure, Semantic Similarity Measure and Betweenness Measure.

Based on their findings, we defined a new ranking method adapting term frequency inverse document frequency (tf-idf) to the graph-structure of vocabularies. Compared to the other methods, tf-idf takes into account the relevance and importance of a resource to the query when assigning a weight to a particular vocabulary for a given query term. We reuse the augmented frequency variation of the term frequency formula to prevent a bias towards longer vocabularies. Because of the inherent graph structure of vocabularies, tf-idf needs to be tailored so that the basic unit is not a word, but rather a vocabulary term t in a vocabulary V. Equation (1) presents the adaptation of tf-idf to vocabularies (a definition of the variables used in this paper’s equations is provided in Table 6).

14 We retrieve the statistics available at: http://stats.lod2.eu/. Unfortunately this file has been unavailable since June 2014 which explains some differences between the statistics we use and LODStats.
15 http://stats.lod2.eu/vocabularies?search=foaf.
16 http://lov.okfn.org/dataset/lov/suggest/.
17 https://creativecommons.org/licenses/by/4.0/.
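A minimal sketch of the adapted tf-idf of Eq. (1) follows. It assumes each vocabulary is represented as a mapping from term URI to its raw frequency f(t, V), however that frequency is counted, and it uses the natural logarithm. The data structure, the function names and the toy term names are assumptions for illustration, not part of the LOV code base.

```python
import math


def tf(term, vocab):
    """Augmented term frequency: 0.5 + 0.5 * f(t, V) / max f(t_i, V)."""
    max_freq = max(vocab.values())
    return 0.5 + 0.5 * vocab.get(term, 0) / max_freq


def idf(term, vocabularies):
    """Inverse document frequency: log(|V| / number of vocabularies containing t)."""
    containing = sum(1 for vocab in vocabularies if term in vocab)
    if containing == 0:
        raise ValueError("term does not occur in any vocabulary")
    return math.log(len(vocabularies) / containing)


# Toy catalogue of three vocabularies (hypothetical terms and counts).
vocabularies = [
    {"ex:Person": 4, "ex:name": 2},
    {"ex:Person": 1, "ex:Place": 1},
    {"ex:Event": 3},
]

print(tf("ex:name", vocabularies[0]))   # 0.5 + 0.5 * 2 / 4
print(idf("ex:Person", vocabularies))   # log(3 / 2)
print(idf("ex:Event", vocabularies))    # log(3 / 1)
```

Because the augmented form divides by the most frequent term of the same vocabulary, tf stays within [0.5, 1] whatever the size of the vocabulary, which is the bias correction the text refers to.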

Table 6
higher weight to shorter fields, by combining it with a
Definition of the variables used in the equations
property-level boost boost(p(t)). Using this property-
Variable Description level boost we can assign a different score depending
V Set of Vocabularies on which label property a query term matches. We dis-
V A vocabulary: V ∈ V tinguish four categories of matches:
|V | Number of vocabularies in V
t A vocabulary term URI (class, property, instance or
– Local name (URI without the namespace). While
datatype): t ∈ V , t ∈ URI a URI is not supposed to carry any meaning, it is
Q Query string a convention to use a compressed form of a term
qi Query term i of Q label to construct the local name. The local name
σV Set of matched URIs for Q in V therefore becomes an important artifact for term
σV (qi ) Set of matched URIs for qi in V : ∀ti ∈ σV , ti ∈ V , ti matching for which the highest score will be as-
matches qi signed. An example of local name matching the
p A term predicate: p ∈ URI term “person” is http://schema.org/Person.
D Set of Datasets – Primary labels. The highest score will also be
D A Dataset: D ∈ D assigned for matches on the rdfs:label,
M(ti ) Number of Datasets: D in D, ti ∈ D dce:title, dcterms:title, skos:
prefLabel properties. An example of pri-
mary label matching the term “person” is rdfs:
label "Person"@en.
0.5 ∗ f (t, V ) – Secondary labels. We define as secondary la-
tf (t, V ) = 0.5 +
max{f (ti , V ) : ti ∈ V } bel the following properties: rdfs:comment,
(1) dce:description, dcterms:descrip-
|V |
idf (t, V) = log tion, skos:altLabel. A medium score is
|{V ∈ V : t ∈ V }|
assigned for matches on these properties. An ex-
As highlighted in [9] and [26], the notion of the vo- ample of secondary label matching the term “per-
cabulary term’s popularity across the LOD datasets set son” is dcterms:description "Exam-
D is quite important. In Eq. (2) we introduce a new ples of a Creator include a per-
popularity measure, which is a function of the normal- son, an organization, or a ser-
isation of the frequency f (t, D) of a term URI t in the vice."@en.
set of datasets D and the normalisation of the number – Tertiary labels. Finally all properties not falling
of datasets in which a term URI appears M(t) : t ∈ D. in the previous categories are considered as ter-
By using the maximum in this normalisation we em- tiary labels for which a low score is assigned. An
phasise the most used terms, result of a consensus example of tertiary label matching the term “per-
within the community. This measure will give a higher son” is rdarel2:name "Person"@en.
score to terms that are often used in datasets and across
a large number of datasets. norm(t, V ) = lengthNorm(field)
  
f (t, D) ∗ boost p(t) (3)
pop(t, D) =
max{f (ti , D) : ti ∈ D} p∈V
M(t)
∗ (2) For every vocabulary in LOV, terms (classes, prop-
max{M(ti ) : ti ∈ D}
erties, datatypes, instances) are indexed and a full text
RDF datasets have a consensual and stable struc- search feature is offered.18 Human users or agents can
ture, which arises from the best practices of vocab- further narrow a search by filtering on term type (class,
ulary publication. It then becomes intuitive to assign property, datatype, instance), language, vocabulary do-
more importance to a vocabulary term matching a main and vocabulary.
query on the value of the property rdfs:label than The final score of t for a query Q (Eq. (4)) is a com-
dcterms:comment. Equation (3) extends the inner bination of the tf-idf, the importance of label proper-
field-length norm lengthNorm(field) from the Lucene-
based search engine Elasticsearch, which attaches a 18 http://lov.okfn.org/dataset/lov/terms.
P.-Y. Vandenbussche et al. / LOV: A gateway to reusable semantic vocabularies on the Web 445

Fig. 8. The LOV catalogue RDF schema model, in a UML class diagram representation.

ties of t on which query terms matched, and the pop- quads format21 (keeping each vocabulary in a sepa-
ularity of that term in the LOD dataset. While the fac- rate named graph). As illustrated in Fig. 8, the RDF
torisation of the tf-idf and field normalisation factor is model mainly reuses the Data CATalogue Vocabulary
common for search engine ranking,19 we add a fourth (DCAT) which allows the representation of the LOV
parameter – the popularity – as it is fundamental in the catalogue as a dcat:Catalog composed of vocabu-
Semantic Web. Indeed, the intention of LOV is to fos- lary entries (dcat:CatalogRecord) capturing in-
ter the reuse of consensual vocabularies that become formation like the insertion date in LOV. Each entry
de facto standards. The popularity metric provides an point to the vocabulary itself is represented by a sub
indication on how widely a term is already used (in class of dcat:Dataset defined in the Vocabulary
frequency and in the number of datasets using it). We Of A Friend (VOAF). This artifact contains metadata
therefore add this new factor specific to the Semantic extracted by the LOV application such as creators, first
Web to the scoring equation: issued date, number of occurrences of the vocabulary
in Linked Open Data. Each vocabulary is then linked
score(t, Q) = tf (t, V ) ∗ idf (t, V) to its various published versions represented by the
dcat:Distribution entity on which information
∗ norm(t, V ) ∗ pop(t, D) such as inter-vocabulary relations or languages can be
 
: ∀t ∃qi ∈ Q : t ∈ σV (qi ) (4) found.
3.3.3. SPARQL endpoint
3.3.2. Data dumps The LOV SPARQL endpoint22 offers a complemen-
The system provides two data dumps, one contain- tary data access method and allows clients to pose
ing the LOV vocabulary catalogue only in RDF Nota- complex queries to the server and retrieve direct an-
tion 3 format20 and another containing the LOV cat- swers computed over the LOV dataset [8]. We use the
alogue along with the latest version of each vocab- Jena Fuseki triple store to store the N-quads file con-
ulary and the statistics of use in LOD in RDF N- taining the LOV catalogue and the latest version of

19 See elasticsearch documentation: http://bit.ly/1e37sFL. 21 http://lov.okfn.org/lov.nq.gz.


20 http://lov.okfn.org/lov.n3.gz. 22 http://lov.okfn.org/dataset/lov/sparql.
446 P.-Y. Vandenbussche et al. / LOV: A gateway to reusable semantic vocabularies on the Web

each vocabulary. We believe that this is the first service that allows users to query multiple vocabularies at the same time and to detect inter-vocabulary dependencies.

3.3.4. LOV application program interfaces and user interfaces

LOV APIs give remote access to the many functions of LOV through a set of RESTful services.23 The basic design requirement for these APIs is that they should give applications access to the very same information that humans get via the User Interfaces. More precisely, the APIs give access, through three different services (cf. Fig. 9), to functions related to:

– Vocabulary terms (classes, properties, datatypes and instances). With these functions, a software application can query the LOV search engine, ask for auto-completion or a suggestion for misspelled terms.
– Vocabularies. A client can get access to the current list of vocabularies contained in the LOV catalogue; search for vocabularies, get auto-completion or obtain all details about a vocabulary.
– Agents. This provides a software agent with a list of all agent references in the LOV catalogue, a means to search for an agent, get auto-completion and details about an agent.

Fig. 9. List of APIs to access LOV data.

LOV APIs are a convenient means to access the full functionality and data of LOV. They are particularly appropriate for dynamic Web applications using scripting languages such as JavaScript. The APIs described above have been developed for, and follow the requirements of, Ontology Design and Data Publication tools.

The LOV Website offers intuitive navigation within the vocabularies catalogue. It allows users to explore vocabularies, vocabulary terms, agents and languages, and to see the connections between these entities. For instance, a user can use the agent search to look for experts in geography and geometry domains.24 We use the d3 JavaScript library25 [7] to display charts and make the navigation more interactive; for example, we use the star graph representation to display incoming and outgoing links between vocabularies (cf. Fig. 10).

Fig. 10. A graphical representation of the incoming and outgoing links for the Schema.org vocabulary as displayed in the UI.

3.4. Data storage

To support the features presented above, we make use of specific storage technologies. The LOV catalogue is stored in MongoDB®, a document-based schema-less data store that scales and allows for dynamic changes in the data schema.26 We use Jena Fuseki27 to serve the data exported in RDF through the SPARQL protocol. The search feature is supported by Elasticsearch®, a full text index based on Lucene technology.28 This storage solution is particularly well adapted to our User Interface technology (Node.js) as it offers RESTful APIs with output in JSON format. Finally, we store each vocabulary version file and the RDF dumps of the LOV catalogue in the environment file system.

23 http://lov.okfn.org/dataset/lov/apidoc/.
24 http://lov.okfn.org/dataset/lov/agents?&tag=Geography,Geometry.
25 http://d3js.org/.
26 https://www.mongodb.org/.
27 https://jena.apache.org/documentation/serving_data/.
28 https://lucene.apache.org/.
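As an illustration of the three services described in Section 3.3.4 (terms, vocabularies, agents), the following Python sketch builds request URLs for a term search, a term auto-completion and a vocabulary listing. The base URL, the function names (search, autocomplete, list) and the parameter names (q, type) are assumptions that are not stated in this article; they should be checked against the API documentation (footnote 23) before use. The sketch only builds URLs and sends no request.

```python
from urllib.parse import urlencode

# Assumed base URL of the LOV REST API; it is not given in the article.
LOV_API_BASE = "http://lov.okfn.org/dataset/lov/api/v2"

# The three services named in Section 3.3.4.
SERVICES = {"term", "vocabulary", "agent"}


def build_lov_url(service: str, function: str, **params: str) -> str:
    """Build a GET URL for one of the three LOV services.

    `function` and the query parameters are hypothetical names used
    for illustration only.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown LOV service: {service!r}")
    url = f"{LOV_API_BASE}/{service}/{function}"
    # Sort parameters so that the generated URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_lov_url("term", "search", q="Person", type="class"))
    print(build_lov_url("term", "autocomplete", q="pers"))
    print(build_lov_url("vocabulary", "list"))
```

A client would then fetch the resulting URL with any HTTP library and decode the JSON response.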
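Because the RDF export is served through the SPARQL protocol (Section 3.4), any HTTP client can query it. The sketch below prepares a SPARQL protocol GET request that lists ten vocabularies with their preferred prefix. The endpoint URL and the modelling of the LOV dataset with voaf:Vocabulary and vann:preferredNamespacePrefix are assumptions made for illustration; the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL; the article does not state it.
LOV_SPARQL_ENDPOINT = "http://lov.okfn.org/dataset/lov/sparql"

# Assumed modelling: vocabularies typed voaf:Vocabulary with a vann prefix.
QUERY = """
PREFIX voaf: <http://purl.org/vocommons/voaf#>
PREFIX vann: <http://purl.org/vocab/vann/>
SELECT ?vocab ?prefix WHERE {
  ?vocab a voaf:Vocabulary ;
         vann:preferredNamespacePrefix ?prefix .
} ORDER BY ?prefix LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL protocol query via GET, asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query.strip()})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(LOV_SPARQL_ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
    # To execute it, pass `request` to urllib.request.urlopen and
    # decode the JSON body of the response.
```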

4. LOV adoption

LOV, with its various data access methods, supports the emergence of a rich application ecosystem. Below we list some tools that use our system as part of their service or project.

4.1. Derived tools and applications

In [18], Maguire et al. use the LOV search API to implement OntoMaton,29 a widget for ontology lookup and tagging within the Google spreadsheets collaborative environment.

YASGUI (Yet Another SPARQL Query GUI)30 is a client-side JavaScript SPARQL query editor that uses the LOV API for property and class auto-completion together with prefix.cc31 for namespace prefix auto-completion [25]. YASGUI is itself reused by LOV for its SPARQL Endpoint User Interface.

Data2Ontology maps data objects and properties to ontology classes and predicates available in the LOV catalogue. Data2Ontology is part of the Datalift32 platform [27], a framework for “lifting” raw data into RDF. The Data2Ontology module takes as input “raw RDF”, a straightforward conversion of a legacy format to RDF, with the goal of helping data publishers select vocabulary terms that could better represent their data.

OntoWiki33 facilitates the visual presentation of a knowledge base as an information map, with different views on instance data [4]. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. OntoWiki offers a vocabulary selection feature based on LOV.

Furthermore, we can mention ProtégéLOV,34 a plug-in for the Protégé editor tool [14] that aims at improving the development of lightweight ontologies by reusing existing vocabularies at a fine-grained level. The tool searches for a term in LOV via the APIs and provides three actions if the term exists: 1) to replace the selected term in the current ontology, 2) to add an rdfs:subClassOf axiom and 3) to add an owl:equivalentClass axiom.

29 https://github.com/ISA-tools/OntoMaton.
30 http://legacy.yasgui.org/.
31 http://prefix.cc.
32 http://datalift.org/.
33 http://ontowiki.net/.
34 http://labs.mondeca.com/protolov/.

4.2. Using LOV as a research platform

LOV has served as the object of study in [21], where Poveda-Villalón et al. analysed trends in ontology reuse methods. In addition, the LOV dataset has been used to analyse the occurrence of good and bad practices in vocabularies [22].

Prefixes in the LOV dataset are regularly mapped with namespaces in the prefix.cc service. In [2], the authors perform alignments of Qnames of vocabularies in both services and provide different solutions to handle clashes and disagreements between preferred namespaces. Both LOV and prefix.cc provide associations between prefixes and namespaces but follow a different logic. The prefix.cc service supports polysemy and synonymy, and has very loose control over its crowdsourced information. In contrast, LOV has a much stricter policy that forbids polysemy and synonymy. This ensures that each vocabulary in the LOV database is identified by a unique prefix, which allows prefixes to be used in various LOV publication URIs.

The LOV query log covering the period between 2012-01-06 and 2014-04-16 has been used in [9] to build a benchmark suite for ontology search and ranking. The CBRBench35 benchmark uses eight ranking models of resources in ontologies and compares the results with ontology engineers’ results. Our vocabulary term ranking method relies on and extends the outcome of this work.

In [16], the authors provide a 5 star rating for RDF vocabulary publication to boost interoperability, query federation and better interpretation of data on the Web, similar to the 5 star rating for Linked Open Data. Based on LOV’s best practices criteria, all vocabularies must be 5 stars using this ranking and must provide further quality attributes imposed by LOV to facilitate vocabulary reuse.

RDFUnit36 is a test-driven data debugging framework for the Web of Data. In [17], the authors provide automatic test cases for all available schemas registered with LOV. Vocabularies are used to encode the semantics of domain-specific knowledge in order to check the quality of data.

35 https://zenodo.org/record/11121.
36 https://github.com/AKSW/RDFUnit.

Finally, Governatori et al. [15] analyse the current use of licenses in vocabularies on the Web based on the LOV catalogue in order to propose a framework to detect incompatibilities between datasets and vocabularies.

5. Related work

Reusing vocabularies requires searching for terms in existing specialised vocabulary catalogues or search engines on the Web. While we refer the reader to [12] for a systematic survey of ontology repositories, below we list some existing catalogues relevant for finding vocabularies:

– Catalogues of generic vocabularies/schemas similar to the LOV catalogue. Examples of catalogues falling in this category are vocab.org,37 ontologi.es,38 JoinUp Semantic Assets or the Open Metadata Registry. Most of those repositories are not regularly updated and are created/owned by the institutions using the service.
– Catalogues of ontologies for a specific domain such as biomedicine with the BioPortal [33], geospatial ontologies with SOCoP+OOR,39 Marine Metadata Interoperability and the SWEET [24] ontologies.40 The SWEET ontologies include several thousand terms, spanning a broad extent of Earth system science and related concepts (such as data characteristics), with a search tool to aid finding science data resources.
– Catalogues of ontology Design Patterns (ODP) focus on reusable patterns in ontology engineering [23]. The submitted patterns are small pieces of vocabularies that can further be integrated or linked with other vocabularies. ODP does not provide a search function for specific terms as is the case with some of these other catalogues.
– Search Engines of ontology terms. Among ontology search engines, we can cite: Swoogle [13], Watson [11], Falcons [10] and Vocab.cc [29]. These search engines crawl for data schema from RDF documents on the Web. They offer filtering based on ontology type (Class, Property) and a ranking based on popularity. They neither look for ontology relations nor check whether the definition of the ontology is available (usually known as dereferencing). While in Swoogle the ranking score is displayed, Watson shows the language of the resource and the size. However, none of these services provide any relationship between the related ontologies, or any domain classification of the vocabularies. Table 7 presents a summary of key features of LOV with respect to Swoogle, Watson, Falcons and Vocab.cc.
– Datasets and Vocabularies statistics. In this category we can mention LODStats [3] and the vocabularies derived from the LOD Cloud. LODStats makes a bridge between datasets and vocabularies, gathering up to 32 different statistical criteria based on a statement-stream-based approach for RDF datasets in Datahub.41 LODStats maintains comprehensive statistics on vocabulary terms (i.e. classes, properties) defined and used in a dataset. Schmachtenberg et al. [28] present a survey based on a large-scale Linked Data crawl from March 2014 to analyse the differences in best practices adoption across different application domains. Their results concerning the most used vocabularies (e.g., foaf, dcterms, skos, etc.) and the adoption of well-known vocabularies are in line with the findings of this paper.

37 http://vocab.org/.
38 http://ontologi.es/.
39 https://ontohub.org/socop.
40 http://sweet.jpl.nasa.gov//.
41 http://datahub.io/.

Table 7
Comparison of LOV with respect to Swoogle, Watson, Falcons and Vocab.cc; adapted from the framework presented by d’Aquin and Noy [12]. SWD stands for Semantic Web Document

Feature                                 | Swoogle    | Watson     | Falcons    | Vocab.cc    | LOV
Listing ontologies                      | Yes        | Yes        | Yes        | Yes         | Yes
Ontology discovery method               | Automatic  | Automatic  | Automatic  | Automatic   | Automatic/Manual
Scope                                   | SWDs       | SWDs       | Concepts   | vocab terms | Vocabularies
Ranking                                 | LOD metric | LOD metric | LOD metric | BTC corpus  | LOD/LOV metric + label’s property type
Domain filtering                        | No         | No         | No         | No          | Yes
Comments and review                     | No         | Yes        | No         | No          | Curators
Web service access                      | Yes        | Yes        | Yes        | Yes         | Yes
SPARQL endpoint                         | No         | No         | No         | No          | Yes
Read/Write                              | Read       | Read/Write | Read       | Read        | Read
Ontology directory                      | No         | No         | No         | Yes         | Yes
Application platform                    | No         | No         | No         | N/A         | Yes
Storage                                 | Cache      | N/A        | N/A        | API         | Dump/endpoint
Interaction with contributors           | No         | N/A        | No         | No          | Yes
Version tracking                        | No         | No         | No         | No          | Yes
Inter-vocab. relationship visualization | No         | No         | No         | No          | Yes

While most of the related work focuses on automatic techniques to gather as many ontologies as possible, LOV focuses on maintaining a high quality collection of vocabularies that data publishers can reuse to describe their own data. To ensure the high quality of LOV data, we set up some stringent requirements for vocabularies to be inserted (cf. Section 3.2.1), such as the fact that a vocabulary URI must be dereferenceable. These kinds of requirements are not always taken into account in the aforementioned work: for instance, the authors in [28] define the notion of partly dereferenceable for vocabularies. As a consequence, anyone using a vocabulary referenced in LOV is assured of access to the vocabulary metadata and, most importantly, to its formal definition and to its preservation through access to its various versions.

As part of our system evaluation we have compared the list of vocabularies in LOV with the ones in external services (LODStats and the empirical survey of Schmachtenberg et al. [28]) so as to understand the discrepancy.

LODStats contains 2,940 vocabularies extracted from datasets listed in Datahub.io. This list in fact contains a large number (2,596) of invalid vocabulary URIs and resource URIs that do not refer to a vocabulary (e.g. http://data.kingcounty.gov/resource/d665-vvmd/ or http://lod2.eu/view). The domain “http://dati.opendataground.it” contains 962 Resource URIs which are instances and not vocabularies. As a result, only 344 candidate URIs in LODStats are comparable with LOV vocabularies. Out of those 344 URIs, 73 (21.22%) are covered by LOV. We randomly chose 20 vocabularies not already present in LOV for assessment. None of the randomly chosen vocabularies met LOV requirements and 8 different categories of errors were detected: 1) Failed to determine the triples content type, 2) Not found exception, 3) 403 forbidden, 4) Unknown host exception, 5) Peer not authenticated, 6) 504 gateway, 7) Bad URI and 8) Unqualified typed nodes are not allowed.

Recently, an updated comprehensive empirical survey of Linked Data conformance has been presented by Schmachtenberg et al. [28]. Their survey is based on a large-scale Linked Data crawl from March 2014 to analyse the differences of best practices adoption in different domains. Their results concerning the most used accessible vocabularies and the adoption of well-known vocabularies are in line with the findings of this paper. However, comparing the vocabularies in the LOD cloud with the LOV catalogue needs some alignments. From the 638 mentioned by Schmachtenberg et al., we removed invalid URIs, for example bare domain names such as “umbel.org”. Additionally we removed misspelled URIs and incomplete URIs. As a result, 270 candidate URIs (42.31%) can be compared with LOV vocabularies. Based on this analysis, we found that 102 vocabularies in the LOD cloud are already in the LOV catalogue, representing 38% of the 270 candidates. The general difference between our work and the one presented by Schmachtenberg et al. is that our approach applies strict criteria to include a vocabulary while their approach is dataset driven.

6. Discussion

Whilst providing access to high quality vocabularies, the LOV system presents several limitations. As described in the last section, the LOV system could benefit from an automatic discovery process to suggest vocabulary candidates. We could for instance extract vocabularies from the latest version of the Billion Triple Challenge or the Web Data Commons42 dataset. Manual curation is a critical activity to ensure the high quality of the LOV catalogue but also represents a limitation. So far we have been able to recruit new curators as the catalogue has grown. Version 3 of the LOV system automates most of the processes and analyses but there are still some assessment and support activities that only a human can perform.

42 http://webdatacommons.org/.

Currently, LOV’s scope focuses on vocabularies for the description of RDF data and does not include any

Value Vocabularies such as SKOS thesauri. By making the code of the LOV system open source, we encourage anyone to set up an instance of the system to target such artifacts.

LOV relies on external projects such as LODStats to get valuable information about vocabulary usage in published datasets. At the moment, the popularity information coming from LODStats does not take into account the most recent interest in publishing RDF data using markup languages (e.g. schema.org). As a consequence, the popularity measure is incomplete and does not represent all possible uses of a vocabulary. In future work we intend to extract this information from the latest dataset versions of the Billion Triple Challenge and the Web Data Commons.

From the study of LOV as a dynamic ecosystem we can draw two main lessons: the need for more multilingual vocabularies on the Web and the importance of long term preservation of vocabularies.

Labels are the main entry point to a vocabulary and their associated language is the key. Only 15% of LOV vocabularies make use of more than one language. Multilingualism is important for at least two reasons: 1) the most obvious one is allowing users to search, query and navigate vocabularies in their native language; and 2) translation is a process through which the quality of a vocabulary can only improve. Looking at a vocabulary through the eyes of other languages and identifying the difficulties of translation helps to better outline the initial concepts and if necessary refine or revise them. Hence multilingualism and translation should be native, built-in features of any vocabulary construction, not a marginal task.

Currently there is no solution for long-term vocabulary preservation on the Web [5]. This is a particularly important problem in a distributed and uncontrolled environment where any individual can create and publish a vocabulary. Third parties can reuse such vocabularies and therefore create a dependency on the availability of the original vocabulary, as it retains the semantics of the data. This issue weakens the Semantic Web foundations.

7. Conclusions and future work

In this system report we presented an overview of the Linked Open Vocabularies initiative, a high quality catalogue of reusable vocabularies for the description of data on the Web. The importance of this work is motivated by the difficulty that data publishers have in determining which vocabularies to use to describe their data. The key innovations described in this article include: 1) the availability of a high quality dataset of vocabularies through multiple access methods; 2) the curation by experts, making explicit for the first time the relationships between vocabularies and their version history; and 3) the consideration of property semantics in term search relevance scoring.

In the future, the LOV initiative could evolve in several ways. First, an area that is still largely unexplored is multi-term vocabulary search. During the ontology design process, it is common to have more than 20 concepts represented using existing vocabularies or a new one in case there is no corresponding artifact. While we are able to search for relevant terms in LOV, it is still the responsibility of the ontology designer to understand the complex relationships between all these terms and come up with a coherent ontology. We could use the network of vocabularies defined in LOV to suggest not only a list of terms but graphs to represent several concepts together.

Second, we would like to provide more vocabulary-based services such as vocabulary matching to help authors add more relationships to other vocabularies. Vocabulary checking is another service the community is asking for. We could integrate useful applications directly into LOV, such as Vapour,43 RDF Triple-Checker44 and OOPS!.45

Another research direction is SPARQL query extension and rewriting based on Linked Vocabularies. Using the inter-vocabulary relationships we could transform a query to use the same semantics (same vocabulary terms) as the data source(s) being queried.

Finally, we plan to conduct a user study and publish the results on the different uses of LOV by end users. In addition, we plan to include the vocabularies from LODStats and LOD Cloud that are suitable for inclusion in the LOV catalogue.

The adoption and integration of the LOV catalogue in applications for vocabulary engineering, reuse and data quality are significant. LOV has a central role in the vocabulary life-cycle on the Web of data as highlighted by the W3C:46 “The success of LOV as a central information point about vocabularies is symptomatic of a need, for an authoritative reference point to aid the encoding and publication of data”.

43 https://bitbucket.org/fundacionctic/vapour/wiki/Home.
44 http://graphite.ecs.soton.ac.uk/checker/.
45 http://oops.linkeddata.es/.
46 http://www.w3.org/2013/data/.
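The coverage figures reported in Section 5 can be re-derived from the counts given there. The Python sketch below recomputes them; the helper name share is an assumption, and percentages are rounded to two decimals. With rounding, 270 out of 638 gives 42.32%, whereas the text reports 42.31%, which matches truncation rather than rounding.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# LODStats: 2,940 listed URIs, of which 2,596 are invalid or not vocabularies.
lodstats_candidates = 2940 - 2596

if __name__ == "__main__":
    print("LODStats candidates:", lodstats_candidates)
    print("LODStats candidates covered by LOV (%):", share(73, lodstats_candidates))
    print("LOD cloud candidate URIs (%):", share(270, 638))
    print("LOD cloud candidates already in LOV (%):", share(102, 270))
```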

Acknowledgements

This work has been partially supported by the French National Research Agency (ANR) within the Datalift Project, under grant number ANR-10-CORD-009; the Spanish project BabelData (TIN2010-17550) and Fujitsu Laboratories Limited. The Linked Open Vocabularies initiative is graciously hosted by the Open Knowledge Foundation. We would like to thank all the members of the LOV community and all the editors and publishers of vocabularies who trust in the LOV catalogue. Special thanks to Phil Archer, Julia Bosque Gil and Jodi Schneider for their valuable feedback and comments on this paper.

References

[1] K. Alexander and M. Hausenblas, Describing linked datasets-on the design and usage of void, the vocabulary of interlinked datasets, in: Linked Data on the Web Workshop (LDOW 09), in Conjunction with 18th International World Wide Web Conference (WWW 09), C. Bizer, T. Heath, T. Berners-Lee and K. Idehen, eds, Citeseer, 2009.
[2] G.A. Atemezing, B. Vatant, R. Troncy and P.-Y. Vandenbussche, Harmonizing services for lod vocabularies: A case study, in: WaSABi@ISWC, S. Coppens, K. Hammar, M. Knuth, M. Neumann, D. Ritze, H. Sack and M.V. Sande, eds, CEUR Workshop Proceedings, Vol. 1106, CEUR-WS.org, 2013.
[3] S. Auer, J. Demter, M. Martin and J. Lehmann, Lodstats – an extensible framework for high-performance dataset analytics, in: Knowledge Engineering and Knowledge Management, A. ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, M. d’Acquin, A. Nikolov, N. Aussenac-Gilles and N. Hernandez, eds, Lecture Notes in Computer Science, Vol. 7603, Springer, Berlin, Heidelberg, 2012, pp. 353–362. doi:10.1007/978-3-642-33876-2_31.
[4] S. Auer, S. Dietzold and T. Riechert, OntoWiki – a tool for social, semantic collaboration, in: Lecture Notes in Computer Science, Springer Science and Business Media, 2006, pp. 736–749. doi:10.1007/11926078_53.
[5] T. Baker, P.-Y. Vandenbussche and B. Vatant, Requirements for vocabulary preservation and governance, Library Hi Tech 31(4) (2013), 657–668. doi:10.1108/LHT-03-2013-0027.
[6] T. Berners-Lee, Linked data – design issues, W3C, 2006, (09/20).
[7] M. Bostock, V. Ogievetsky and J. Heer, Data-driven documents, IEEE Trans. Visual. Comput. Graphics 17(12) (Dec. 2011), 2301–2309. doi:10.1109/tvcg.2011.185.
[8] C. Buil-Aranda, A. Hogan, J. Umbrich and P.-Y. Vandenbussche, Sparql web-querying infrastructure: Ready for action? in: The Semantic Web – ISWC 2013, H. Alani, L. Kagal, A. Fokoue, P. Groth, C. Biemann, J. Parreira, L. Aroyo, N. Noy, C. Welty and K. Janowicz, eds, Lecture Notes in Computer Science, Vol. 8219, Springer, Berlin, Heidelberg, 2013, pp. 277–293. doi:10.1007/978-3-642-41338-4_18.
[9] A.S. Butt, A. Haller and L. Xie, Ontology search: An empirical evaluation, in: The Semantic Web – ISWC 2014, P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. Knoblock, D. Vrandečić, P. Groth, N. Noy, K. Janowicz and C. Goble, eds, Lecture Notes in Computer Science, Vol. 8797, Springer International Publishing, 2014, pp. 130–147. doi:10.1007/978-3-319-11915-1_9.
[10] G. Cheng, W. Ge and Y. Qu, Falcons: Searching and browsing entities on the semantic web, in: WWW, J. Huai, R. Chen, H.-W. Hon, Y. Liu, W.-Y. Ma, A. Tomkins and X. Zhang, eds, ACM, 2008, pp. 1101–1102. doi:10.1145/1367497.1367676.
[11] M. d’Aquin, C. Baldassare, L. Gridinoc, M. Sabou, S. Angeletou and E. Motta, Watson: Supporting next generation semantic web applications, in: WWW/Internet Conference 2007, 2007.
[12] M. d’Aquin and N.F. Noy, Where to publish and find ontologies? A survey of ontology libraries, Web Semantics: Science, Services and Agents on the World Wide Web 11 (2012), 96–111. doi:10.1016/j.websem.2011.08.005.
[13] T. Finin, L. Ding, R. Pan, A. Joshi, P. Kolari, A. Java and Y. Peng, Swoogle: Searching for knowledge on the semantic web, in: Proc. of the 20th National Conference on Artificial Intelligence – Volume 4, AAAI’05, A. Cohn, ed., AAAI Press, 2005, pp. 1682–1683.
[14] N. García-Santa, G.A. Atemezing and B. Villazón-Terrazas, The protégélov plugin: Ontology access and reuse for everyone, in: The Semantic Web: ESWC 2015 Satellite Events, F. Gandon, C. Guéret, S. Villata, J. Breslin, C. Faron-Zucker and A. Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9341, Springer International Publishing, 2015, pp. 41–45. doi:10.1007/978-3-319-25639-9_8.
[15] G. Governatori, H.-P. Lam, A. Rotolo, S. Villata, G.A. Atemezing and F.L. Gandon, Checking licenses compatibility between vocabularies and data, in: COLD, O. Hartig, A. Hogan and J. Sequeda, eds, CEUR Workshop Proceedings, Vol. 1264, CEUR-WS.org, 2014.
[16] K. Janowicz, P. Hitzler, B. Adams, D. Kolas and C. Vardeman, Five stars of linked data vocabulary use, Semantic Web 5(3) (2014), 173–176. doi:10.3233/SW-140135.
[17] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen and A. Zaveri, Test-driven evaluation of linked data quality, in: Proc. of the 23rd International Conference on World Wide Web, WWW ’14, Republic and Canton of Geneva, Switzerland, 2014, pp. 747–758, International World Wide Web Conferences Steering Committee. doi:10.1145/2566486.2568002.
[18] E. Maguire, A. Gonzalez-Beltran, P.L. Whetzel, S.-A. Sansone and P. Rocca-Serra, OntoMaton: A bioportal powered ontology widget for Google spreadsheets, Bioinformatics 29(4) (Dec. 2012), 525–527. doi:10.1093/bioinformatics/bts718.
[19] S.G. Oh, M. Yi and W. Jang, Deploying linked open vocabulary (lov) to enhance library linked data, Journal of Information Science Theory and Practice 2(2) (Jun. 2015). doi:10.1633/JISTaP.2015.3.2.1.
[20] C. Pedrinaci, J. Cardoso and T. Leidig, Linked USDL: A vocabulary for web-scale service trading, in: Proc. of the Semantic Web: Trends and Challenges – 11th International Conference, ESWC 2014, Anissaras, Crete, Greece, May 25–29, 2014, Lecture Notes in Computer Science, Springer International Publishing, 2014, pp. 68–82. doi:10.1007/978-3-319-07443-6_6.
[21] M. Poveda-Villalón, M.C. Suárez-Figueroa and A. Gómez-Pérez, The landscape of ontology reuse in linked data, in: 1st Ontology Engineering in a Data-Driven World (OEDW 2012) Workshop at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW2012), Informatica, 2012.
[22] M. Poveda-Villalón, B. Vatant, M.C. Suárez-Figueroa and A. Gómez-Pérez, Detecting good practices and pitfalls when publishing vocabularies on the Web, in: Proc. of the 4th Workshop on Ontology and Semantic Web Patterns Co-Located with 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, October 21, 2013, A. Gangemi, M. Gruninger, K. Hammar, L. Lefort, V. Presutti and A. Scherp, eds, CEUR Workshop Proceedings, Vol. 1188, CEUR-WS.org, 2013.
[23] V. Presutti and A. Gangemi, Content ontology design patterns as practical building blocks for web ontologies, in: Lecture Notes in Computer Science, Springer Science and Business Media, 2008, pp. 128–141. doi:10.1007/978-3-540-87877-3_11.
[24] R.G. Raskin and M.J. Pan, Knowledge representation in the semantic web for Earth and environmental terminology (SWEET), Computers & Geosciences 31(9) (Nov. 2005), 1119–1125. doi:10.1016/j.cageo.2004.12.004.
[25] L. Rietveld and R. Hoekstra, Yasgui: Not just another sparql client, in: The Semantic Web: ESWC 2013 Satellite Events, P. Cimiano, M. Fernández, V. Lopez, S. Schlobach and J. Völker, eds, Lecture Notes in Computer Science, Vol. 7955, Springer, Berlin, Heidelberg, 2013, pp. 78–86. doi:10.1007/978-3-642-41242-4_7.
[26] J. Schaible, T. Gottron, S. Scheglmann and A. Scherp, LOVER in: Proc. of the Joint EDBT/ICDT 2013 Workshops on – EDBT ’13, Association for Computing Machinery (ACM), 2013. doi:10.1145/2457317.2457332.
[27] F. Scharffe, G. Atemezing, R. Troncy, F. Gandon, S. Villata, B. Bucher, F. Hamdi, L. Bihanic, G. Képéklian, F. Cotton, J. Euzenat, Z. Fan, P.-Y. Vandenbussche and B. Vatant, Enabling linked-data publication with the datalift platform, in: 26th Conference on Artificial Intelligence (AAAI-12), 2012.
[28] M. Schmachtenberg, C. Bizer and H. Paulheim, Adoption of the linked data best practices in different topical domains, in: The Semantic Web – ISWC 2014, P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. Knoblock, D. Vrandečić, P. Groth, N. Noy, K. Janowicz and C. Goble, eds, Lecture Notes in Computer Science, Vol. 8796, Springer International Publishing, 2014, pp. 245–260. doi:10.1007/978-3-319-11964-9_16.
[29] S. Stadtmüller, A. Harth and M. Grobelnik, Accessing information about linked data vocabularies with vocab.cc, in: Semantic Web and Web Science, J. Li, G. Qi, D. Zhao, W. Nejdl and H.-T. Zheng, eds, Springer Proceedings in Complexity, Springer, New York, 2013, pp. 391–396. doi:10.1007/978-1-4614-6880-6_34.
[30] M.d.C. Suárez-Figueroa, NeOn methodology for building ontology networks: Specification, scheduling and reuse, PhD thesis, Universidad Politecnica de Madrid, Spain, June 2010. http://oa.upm.es/3879/.
[31] P.-Y. Vandenbussche and B. Vatant, Metadata recommendations for linked open data vocabularies, Technical report, 2012.
[32] S. Villata and F. Gandon, Licenses compatibility and composition in the web of data, in: Proc. of the Third International Workshop on Consuming Linked Data, COLD 2012, Boston, MA, USA, November 12, 2012, J.F. Sequeda, A. Harth and O. Hartig, eds, CEUR Workshop Proceedings, Vol. 905, Aachen, 2012.
[33] P.L. Whetzel, N.F. Noy, N.H. Shah, P.R. Alexander, C. Nyulas, T. Tudorache and M.A. Musen, BioPortal: Enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications, Nucleic Acids Research 39(Suppl) (Jun. 2011), W541–W545. doi:10.1093/nar/gkr469.
