different URIs to identify the same entity. For instance, DBpedia uses the URI
http://dbpedia.org/resource/Berlin to identify Berlin, while Geonames uses the URI
http://sws.geonames.org/2950159/ to identify Berlin. As both URIs refer to the same
real-world entity, they are called URI aliases. URI aliases are common on the Web of Data,
as it cannot realistically be expected that all information providers agree on the same URIs to
identify an entity. URI aliases also provide an important social function to the Web of Data
as they are dereferenced to different descriptions of the same real-world entity and thus
allow different views and opinions to be expressed on the Web. In order to still be able to
track that different information providers speak about the same entity, it is common
practice that information providers set owl:sameAs links to URI aliases they know about.
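As a minimal sketch of how a client might exploit such links, the following Python snippet groups URIs into sets of aliases by treating owl:sameAs links as undirected edges and merging them with a union-find structure. The function name alias_clusters and the example.org URI are assumptions made for illustration; only the DBpedia and Geonames URIs come from the text above.

```python
from collections import defaultdict


def alias_clusters(same_as_links):
    """Group URIs into sets of aliases implied by owl:sameAs links.

    same_as_links is an iterable of (uri_a, uri_b) pairs.
    """
    parent = {}

    def find(uri):
        # Register unseen URIs as their own representative.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_links:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(uri)].add(uri)
    return list(clusters.values())


links = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
    # Hypothetical third alias, not taken from any real data set.
    ("http://sws.geonames.org/2950159/", "http://example.org/id/berlin"),
]

for cluster in alias_clusters(links):
    print(sorted(cluster))
```

Because owl:sameAs is symmetric and transitive, the two links above place all three URIs into a single cluster.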
Different communities have specific preferences on the vocabularies they prefer to use for
publishing data on the Web. The Web of Data is therefore open to arbitrary vocabularies
being used in parallel. Despite this general openness, it is considered good practice to reuse
terms from well-known RDF vocabularies such as FOAF, SIOC, SKOS, DOAP, vCard, Dublin
Core, OAI-ORE or GoodRelations wherever possible in order to make it easier for client
applications to process Linked Data. Only if these vocabularies do not provide the required
terms should data publishers define new, data source-specific terminology (Bizer,
Cyganiak & Heath, 2007). If new terminology is defined, it should be made self-describing
by making the URIs that identify terms Web dereferenceable (Berrueta & Phipps, 2008). This
allows clients to retrieve RDF Schema or OWL definitions of the terms as well as term
mappings to other vocabularies. The Web of Data thus relies on a pay-as-you-go data
integration approach (Das Sarma, Dong & Halevy, 2008) based on a mixture of using
common vocabularies together with data source-specific terms that are connected by
mappings as deemed necessary.
A common serialization format for Linked Data is RDF/XML (Beckett, 2004). In situations
where human inspection of RDF data is required, Notation3 (Berners-Lee, 1998), and its
subset Turtle (Beckett and Berners-Lee, 2008), are often provided as alternative, inter-
convertible serializations, due to the greater perceived readability of these formats.
Alternatively, Linked Data can also be serialized as RDFa (Adida et al., 2008) which provides
for embedding RDF triples into HTML. In this case, data publishers should use the
RDFa about attribute to assign URIs to entities in order to allow other data providers to set
RDF links to them.
RDF links allow client applications to navigate between data sources and to discover
additional data. In order to be part of the Web of Data, data sources should set RDF links to
related entities in other data sources. As data sources often provide information about large
numbers of entities, it is common practice to use automated or semi-automated approaches
to generate RDF links.
In various domains, there are generally accepted naming schemata. For instance, in the
publication domain there are ISBN and ISSN numbers; in the financial domain there are
ISIN identifiers; EAN and EPC codes are widely used to identify products; and in the life
sciences various accepted identification schemata exist for genes, molecules, and chemical
substances. If the link source and the link target data sets both already support one of
these identification schemata, the implicit relationship between entities in both data sets can
easily be made explicit as RDF links. This approach has been used to generate links between
various data sources in the LOD cloud.
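The key-based approach can be sketched in a few lines of Python. The sketch assumes two data sets that each map entity URIs to ISBNs. All URIs and ISBN values below are made up for illustration, and the choice of owl:sameAs as the link predicate is an assumption, since the appropriate link type depends on the data sets involved.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize_isbn(isbn):
    """Strip separators so that differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def links_from_shared_ids(source, target):
    """Return (source_uri, predicate, target_uri) triples for shared identifiers.

    source and target map entity URIs to identifier strings.
    """
    target_by_id = {normalize_isbn(i): uri for uri, i in target.items()}
    links = []
    for uri, identifier in source.items():
        match = target_by_id.get(normalize_isbn(identifier))
        if match is not None:
            links.append((uri, OWL_SAME_AS, match))
    return links


# Hypothetical data sets; the ISBN values are placeholders, not real books.
source = {
    "http://example.org/a/book/1": "978-0-00-000000-2",
    "http://example.org/a/book/2": "978-0-00-000001-9",
}
target = {
    "http://example.net/b/item/x": "9780000000002",
    "http://example.net/b/item/y": "9780000000999",
}

for s, p, o in links_from_shared_ids(source, target):
    print(f"<{s}> <{p}> <{o}> .")
```

Only the first source entity is linked, because it is the only one whose normalized identifier also occurs in the target data set.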
If no shared naming schema exists, RDF links are often generated based on the similarity of
entities within both data sets. Such similarity computations can build on a large body of
related work on record linkage (Winkler, 2006) and duplicate detection (Elmagarmid et al.,
2007) within the database community as well as on ontology matching (Euzenat & Shvaiko,
2007) in the knowledge representation community. An example of a similarity-based
interlinking algorithm is presented in (Raimond et al., 2008). In order to set RDF links
between artists in the Jamendo and Musicbrainz data sets, the authors use a similarity
metric that compares the names of artists as well as the titles of their albums and songs.
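The following Python sketch illustrates the general idea of combining name similarity with title similarity into one score. It is not the algorithm of Raimond et al.: the use of difflib string matching, the equal weighting, the threshold of 0.8, and the artist records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def title_overlap(titles_a, titles_b):
    """Average, over titles_a, of the best-matching title in titles_b."""
    if not titles_a or not titles_b:
        return 0.0
    best = [max(string_similarity(a, b) for b in titles_b) for a in titles_a]
    return sum(best) / len(best)


def artist_similarity(a, b, name_weight=0.5):
    """Weighted combination of name similarity and album title overlap."""
    name_score = string_similarity(a["name"], b["name"])
    title_score = title_overlap(a["albums"], b["albums"])
    return name_weight * name_score + (1 - name_weight) * title_score


# Hypothetical artist records from two data sets.
artist_a = {"name": "The Example Band", "albums": ["First Light", "Second Wind"]}
artist_b = {"name": "the example band", "albums": ["First Light", "Second Wind"]}
artist_c = {"name": "Unrelated Quartet", "albums": ["Zzz"]}

THRESHOLD = 0.8  # assumed cut-off above which a link would be proposed
for candidate in (artist_b, artist_c):
    score = artist_similarity(artist_a, candidate)
    print(candidate["name"], round(score, 2), score >= THRESHOLD)
```

Comparing album titles in addition to names helps to separate distinct artists that happen to share a name, which a name-only comparison would conflate.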
Various RDF link generation frameworks are available that provide declarative languages
for specifying which types of RDF links should be created, which combination of similarity
metrics should be used to compare entities and how similarity scores for specific properties
are aggregated into an overall score. The Silk framework (Volz et al., 2009) works against
local and remote SPARQL [Endnote: http://www.w3.org/TR/rdf-sparql-query/] endpoints
and is designed to be employed in distributed environments without having to replicate data
sets locally. The LinQL framework (Hassanzadeh et al., 2009) works over relational
databases and is designed to be used together with database-to-RDF mapping tools such as
D2R Server or Virtuoso.
Linked Data should be published alongside several types of metadata, in order to increase
its utility for data consumers. In order to enable clients to assess the quality of published
data and to determine whether they want to trust data, data should be accompanied by
meta-information about its creator, its creation date as well as the creation method (Hartig,
2009). Basic provenance meta-information can be provided using Dublin Core terms or the
Semantic Web Publishing vocabulary (Carroll et al., 2005). The Open Provenance Model
(Moreau et al., 2008) provides terms for describing data transformation workflows. In (Zhao
et al., 2008), the authors propose a method for providing evidence for RDF links and for
tracing how the RDF links change over time.
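As a rough illustration of basic provenance meta-information expressed with Dublin Core terms, the Python sketch below emits two N-Triples statements describing the creator and creation date of a published RDF document. The document URI, creator URI and date are hypothetical, and the choice of dcterms:creator and dcterms:created is one possible modelling, not a prescribed one.

```python
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def provenance_triples(document_uri, creator_uri, created):
    """Return N-Triples lines stating who created a document and when."""
    return [
        f"<{document_uri}> <{DCTERMS}creator> <{creator_uri}> .",
        f'<{document_uri}> <{DCTERMS}created> "{created}"^^<{XSD_DATE}> .',
    ]


# Hypothetical URIs and date, for illustration only.
lines = provenance_triples(
    "http://example.org/data/Berlin",
    "http://example.org/people/alice",
    "2009-01-15",
)
print("\n".join(lines))
```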
In order to support clients in choosing the most efficient way to access Web data for the
specific task they have to perform, data publishers can provide additional technical
metadata about their data set and its interlinkage relationships with other data sets: The
Semantic Web Crawling sitemap extension (Cyganiak et al., 2008) allows data publishers to
state which alternative means of access (SPARQL endpoint, RDF dumps) are provided
besides dereferenceable URIs. The Vocabulary Of Interlinked Datasets (Alexander et al.,
2009) defines terms and best practices to categorize and provide statistical meta-
information about data sets as well as the linksets connecting them.
A variety of Linked Data publishing tools has been developed. The tools either serve the
content of RDF stores as Linked Data on the Web or provide Linked Data views over non-
RDF legacy data sources. The tools shield publishers from dealing with technical details such
as content negotiation and ensure that data is published according to the Linked Data
community best practices (Sauermann & Cyganiak, 2008; Berrueta & Phipps, 2008; Bizer,
Cyganiak & Heath, 2007). All tools support dereferencing URIs into RDF descriptions. In
addition, some of the tools also provide SPARQL query access to the served data sets and
support the publication of RDF dumps.
• D2R Server. D2R Server (Bizer & Cyganiak, 2006) is a tool for publishing non-RDF
relational databases as Linked Data on the Web. Using a declarative mapping
language, the data publisher defines a mapping between the relational schema of
the database and the target RDF vocabulary. Based on the mapping, D2R Server
publishes a Linked Data view over the database and allows clients to query the
database via the SPARQL protocol.
• Virtuoso Universal Server. The OpenLink Virtuoso server [Endnote:
http://www.openlinksw.com/dataspace/dav/wiki/Main/VOSRDF] provides for serving
RDF data via a Linked Data interface and a SPARQL endpoint. RDF data can either
be stored directly in Virtuoso or can be created on the fly from non-RDF relational
databases based on a mapping.
• Talis Platform. The Talis Platform [Endnote: http://www.talis.com/platform/] is
delivered as Software as a Service accessed over HTTP, and provides native storage
for RDF/Linked Data. Access rights permitting, the contents of each Talis Platform
store are accessible via a SPARQL endpoint and a series of REST APIs that adhere to
the Linked Data principles.
• Pubby. The Pubby server (Cyganiak & Bizer, 2008) can be used as an extension to
any RDF store that supports SPARQL. Pubby rewrites URI requests into SPARQL
DESCRIBE queries against the underlying RDF store. Besides RDF, Pubby also
provides a simple HTML view over the data store and takes care of handling 303
redirects and content negotiation between the two representations.
• Triplify. The Triplify toolkit (Auer et al., 2009) supports developers in extending
existing Web applications with Linked Data front-ends. Based on SQL query
templates, Triplify serves a Linked Data and a JSON view over the application's database.
• SparqPlug. SparqPlug (Coetzee, Heath and Motta, 2008) is a service that enables
the extraction of Linked Data from legacy HTML documents on the Web that do not
contain RDF data. The service operates by serialising the HTML DOM as RDF and
allowing users to define SPARQL queries that transform elements of this into an RDF
graph of their choice.
• OAI2LOD Server. The OAI2LOD Server (Haslhofer & Schandl, 2008) is a Linked Data
wrapper for document servers that support the Open Archives OAI-PMH protocol.
• SIOC Exporters. The SIOC project has developed Linked Data wrappers for several
popular blogging engines, content management systems and discussion forums such
as WordPress, Drupal, and phpBB [Endnote: http://sioc-project.org/exporters].
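The request handling described for Pubby in the list above (rewriting a URI request into a SPARQL DESCRIBE query, and answering with a 303 redirect chosen by content negotiation) can be sketched as follows. This is an illustrative Python sketch, not Pubby's implementation: the /resource/, /data/ and /page/ path layout, the function names, and the simplified Accept header handling (no quality values) are all assumptions.

```python
RDF_MEDIA_TYPES = ("application/rdf+xml", "text/turtle")


def describe_query(resource_uri):
    """Build the SPARQL DESCRIBE query for a requested resource URI."""
    return f"DESCRIBE <{resource_uri}>"


def negotiate(resource_path, accept_header):
    """Return (status, location) for a request to /resource/<name>.

    RDF-aware clients are redirected to the data document,
    all other clients to the HTML page.
    """
    name = resource_path.rsplit("/", 1)[-1]
    wants_rdf = any(media_type in accept_header for media_type in RDF_MEDIA_TYPES)
    location = f"/data/{name}" if wants_rdf else f"/page/{name}"
    return 303, location


print(negotiate("/resource/Berlin", "application/rdf+xml"))
print(negotiate("/resource/Berlin", "text/html"))
print(describe_query("http://dbpedia.org/resource/Berlin"))
```

A production server would also need to honour quality values in the Accept header and forward the DESCRIBE query to the underlying store; both are omitted here for brevity.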
A service that helps publishers to debug their Linked Data site is the Vapour validation
service [Endnote: http://vapour.sourceforge.net/]. Vapour verifies that published data
complies with the Linked Data principles and community best practices.
5. Linked Data Applications
With significant volumes of Linked Data being published on the Web, numerous efforts are
underway to research and build applications that exploit this Web of Data. At present these
efforts can be broadly classified into three categories: Linked Data browsers, Linked Data
search engines, and domain-specific Linked Data applications. In the following section we
will examine each of these categories.
Linked Data Browsers
Just as traditional Web browsers allow users to navigate between HTML pages by following
hypertext links, Linked Data browsers allow users to navigate between data sources by
following links expressed as RDF triples. For example, a user may view DBpedia's RDF
description of the city of Birmingham (UK), follow a 'birthplace' link to the description of the
comedian Tony Hancock (who was born in the city), and from there onward into RDF data
from the BBC describing broadcasts in which Hancock starred. The result is that a user may
begin navigation in one data source and progressively traverse the Web by following RDF
rather than HTML links. The Disco hyperdata browser
[Endnote: http://www4.wiwiss.fu-berlin.de/bizer/ng4j/disco/] follows this approach and can be seen as a direct application of
the hypertext navigation paradigm to the Web of Data.
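The navigation pattern described above can be sketched as a breadth-first traversal in Python. To keep the example self-contained, dereferencing is simulated with an in-memory dictionary instead of HTTP requests. All URIs, property names and triples are hypothetical stand-ins for the DBpedia and BBC descriptions mentioned in the text.

```python
from collections import deque

EX = "http://example.org/"
BIRMINGHAM = EX + "place/Birmingham"
HANCOCK = EX + "person/Tony_Hancock"
BROADCAST = EX + "broadcast/1"

# Simulated Web of Data: each URI dereferences to a list of triples.
WEB = {
    BIRMINGHAM: [
        (BIRMINGHAM, EX + "name", "Birmingham"),
        (BIRMINGHAM, EX + "birthplaceOf", HANCOCK),
    ],
    HANCOCK: [(HANCOCK, EX + "starredIn", BROADCAST)],
    BROADCAST: [(BROADCAST, EX + "title", "A hypothetical broadcast")],
}


def dereference(uri):
    """Stand-in for an HTTP lookup that returns the RDF description of uri."""
    return WEB.get(uri, [])


def browse(start_uri, max_hops):
    """Follow RDF links breadth-first, returning URIs in visiting order."""
    seen = {start_uri}
    queue = deque([(start_uri, 0)])
    visited = []
    while queue:
        uri, hops = queue.popleft()
        visited.append(uri)
        if hops == max_hops:
            continue
        for _subject, _predicate, obj in dereference(uri):
            # Crude literal check: only objects that look like URIs are followed.
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return visited


print(browse(BIRMINGHAM, max_hops=2))
```

The hop limit reflects that a browser follows links on demand instead of crawling the whole Web; a user-driven browser would follow one chosen link at a time.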
Data, however, provides human interface opportunities and challenges beyond those of the
hypertext Web. People need to be able to explore the Web of links between items, but also
to powerfully analyze data in bulk. The Tabulator (Berners-Lee et al., 2006; Berners-Lee et
al., 2008), for example, allows the user to traverse the Web of Data, and expose pieces of it in
a controlled way, in "outline mode"; to discover and highlight a pattern of interest; and then
query for any other similar patterns in the data Web. The results of the query form a table
that can then be analyzed with various conventional data presentation methods, such as
faceted browsers, maps, timelines, and so on.
Tabulator and Marbles (Becker & Bizer, 2008) (see Figure 3) are among the data browsers
which track the provenance of data, while merging data about the same thing from different
sources. While authors such as (Karger & schraefel, 2006) have questioned the use of
graph-oriented views over RDF data, as seen in browsers such as FOAFNaut [Endnote:
http://www.jibbering.com/foaf/], (Hastrup, Cyganiak & Bojars, 2008) argue that such
interfaces fill an important niche, and describe their Fenfire browser that follows this display