FIWARE.ArchitectureDescription.Data.SemanticAnnotation R3 - FIWARE Forge Wiki


From FIWARE Forge Wiki

FIWARE WIKI editorial remark:
This page corresponds to Release 3 of FIWARE. The latest version, associated with the latest Release, is linked from FIWARE Architecture.


Copyright

Legal Notice

Please check the following FI-WARE Open Specification Legal Notice (essential patents license) to understand the rights to use these specifications.

Overview

The principle behind the Semantic Web is to evolve the concept of a "link" from an unspecified element connecting two resources into a "named relationship". Naming the link makes explicit which relationship(s) hold between those resources.

That is the main reason why RDF (Resource Description Framework), the language of Linked Open Data, was invented. RDF is based on triples of the form <SUBJECT> <PREDICATE> <OBJECT>.


The subject is a URI that uniquely identifies the resource being described, while the predicate (and sometimes the object) describes that resource's properties and its relationships to other resources. The Semantic Annotator is essentially a tool that identifies important entities (places, persons, organizations) in a text and describes them with Linked Open Data.
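To make the idea of a "named relationship" concrete, the following minimal Python sketch stores triples as plain tuples and asks which named predicates link two resources. The `example.org` URIs and the predicates `worksFor`, `livesIn` and `locatedIn` are hypothetical illustrations, not part of any vocabulary used by the GE.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI that names the relationship
    obj: str        # URI of another resource, or a literal value


# Hypothetical resources and predicates, for illustration only.
EX = "http://example.org/"
graph = [
    Triple(EX + "alice", EX + "worksFor", EX + "acme"),
    Triple(EX + "alice", EX + "livesIn", EX + "rome"),
    Triple(EX + "acme", EX + "locatedIn", EX + "rome"),
]


def relations_between(triples, subject, obj):
    """Return the named predicates that link `subject` to `obj`."""
    return [t.predicate for t in triples if t.subject == subject and t.obj == obj]


print(relations_between(graph, EX + "alice", EX + "acme"))
# ['http://example.org/worksFor']
```

Because the link carries a name, a query can tell "works for" apart from "lives in", which an unnamed hyperlink cannot.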


This GE provides a general-purpose text analyzer that identifies and disambiguates LOD (Linked Open Data) resources related to the entities in a text. It is built following a modular approach, so that text processing and LOD sources can be optimized and distributed as plug-ins. It also supports the generation of RDF triples that link to LOD resources.
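As a sketch of what RDF triple generation linking to LOD resources can look like, the function below emits one N-Triples statement per distinct LOD resource found in a document. The document URI is hypothetical, and using Dublin Core's `dcterms:subject` as the predicate is an assumption made for illustration; the GE may use a different vocabulary.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"  # predicate chosen for illustration


def annotation_triples(doc_uri, resource_uris):
    """Emit one N-Triples line linking the document to each distinct LOD resource."""
    unique = dict.fromkeys(resource_uris)  # drop duplicates, keep first-seen order
    return [f"<{doc_uri}> <{DCTERMS_SUBJECT}> <{uri}> ." for uri in unique]


lines = annotation_triples(
    "http://example.org/articles/42",  # hypothetical document URI
    ["http://dbpedia.org/resource/Mario_Monti",
     "http://dbpedia.org/resource/Mario_Monti"],
)
print("\n".join(lines))
```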

The main conceptual idea of the Semantic Annotation GE is shown in the Figure below.

Conceptual Model of Semantic Annotation GE


Target usage

This GE may be used to augment content (news, books, etc.) with additional information and links to LOD. It also supports filtering and search based on LOD resources used as categories or tags.


Target users are all stakeholders who want to enrich textual data (tags or text) with meaningful external content.

In the media era of the web, much content is text-based or partially contains text, either as the media itself or as metadata (e.g. title, description, tags, etc.). Such text is typically used for searching and classifying content, either through folksonomies (tag-based search), predefined categories, or full-text queries. To limit information overload from meaningless results, there is a clear need to assist this search process with semantic knowledge, which helps clarify the user's intention. This knowledge can be further exploited not only to provide the requested content, but also to enrich the results with additional, yet meaningful, content that can further satisfy the user's needs.

Semantics, and in particular Linked Open Data (LOD), is helpful both in annotating and categorizing content and in providing additional rich information that can improve the user experience.

As end-user content can be of any type and in any language, such an enabler requires a general-purpose and multilingual approach to the annotation task.

Typical users or applications can thus be found in areas such as eTourism or eReading, where content can benefit from such functionality when visiting a place or reading a book, for example by providing additional information about the location or the characters cited.

The pure semantic annotation capabilities can help editors categorize content in a meaningful way, thus limiting ambiguous search results (e.g. an article would not simply be tagged with "apple", but with its exact concept, i.e. the fruit, New York City or the brand).
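A minimal Python sketch of why URI-based tags limit ambiguity: with plain string tags, a search for "apple" returns every document, while DBpedia resource URIs keep the fruit, the city and the brand apart. The document IDs and tag assignments are illustrative data, not output of the GE.

```python
DBR = "http://dbpedia.org/resource/"

# Illustrative index: three documents that all mention "apple".
string_tags = {"doc1": {"apple"}, "doc2": {"apple"}, "doc3": {"apple"}}
uri_tags = {
    "doc1": {DBR + "Apple"},          # the fruit
    "doc2": {DBR + "New_York_City"},  # "the Big Apple"
    "doc3": {DBR + "Apple_Inc."},     # the brand
}


def search(index, tag):
    """Return the IDs of documents carrying `tag`, sorted."""
    return sorted(doc for doc, tags in index.items() if tag in tags)


print(search(string_tags, "apple"))          # ['doc1', 'doc2', 'doc3']
print(search(uri_tags, DBR + "Apple_Inc."))  # ['doc3']
```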


Basic Design Principles

The Enabler has been designed following a modular approach, as shown in the figure above. This way, each component of the enabler can be developed or replaced independently, provided that it keeps the same input/output format.

The Semantic Annotation reasoner (SANr) communicates with a full-text-based resolver to identify entities in the text, and with Semantic Data Storages to link these entities with candidates.
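The modular design can be sketched with Python protocols: the reasoner depends only on the input/output contract of an entity resolver and of each LOD source, so either can be swapped for another implementation. All names here (`EntityResolver`, `LodSource`, `find_entities`, `candidates`, `Reasoner`) are hypothetical and do not reflect the GE's actual code; the two toy implementations exist only to exercise the contract.

```python
from typing import Protocol


class EntityResolver(Protocol):
    def find_entities(self, text: str) -> list[str]: ...


class LodSource(Protocol):
    name: str

    def candidates(self, entity: str) -> list[str]: ...


class Reasoner:
    """Combines one resolver with any number of LOD sources (plug-ins)."""

    def __init__(self, resolver: EntityResolver, sources: list[LodSource]):
        self.resolver = resolver
        self.sources = sources

    def annotate(self, text: str) -> dict[str, list[str]]:
        return {
            entity: [uri for src in self.sources for uri in src.candidates(entity)]
            for entity in self.resolver.find_entities(text)
        }


# Toy plug-ins: any object with the same methods can replace them.
class FixedResolver:
    def __init__(self, known: list[str]):
        self.known = known

    def find_entities(self, text: str) -> list[str]:
        return [e for e in self.known if e in text]


class DictSource:
    def __init__(self, name: str, table: dict[str, list[str]]):
        self.name = name
        self.table = table

    def candidates(self, entity: str) -> list[str]:
        return self.table.get(entity, [])


reasoner = Reasoner(
    FixedResolver(["Mario Monti"]),
    [DictSource("dbpedia", {"Mario Monti": ["http://dbpedia.org/resource/Mario_Monti"]})],
)
print(reasoner.annotate("Mario Monti visited Milan"))
# {'Mario Monti': ['http://dbpedia.org/resource/Mario_Monti']}
```

Adding a second data source is then a matter of appending another object with a `candidates` method to the list, without touching `Reasoner`.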


This leaves open the possibility of using data sources other than DBpedia [1] or GeoNames [2], or of changing the process behind the choice of candidates for each entity.

Basic Concepts

The GE has a web API, supports multilingual texts (Italian, English, Spanish, Portuguese), retrieves "candidate" LOD resources and performs disambiguation. As a result, the GE creates external links and HTML snippets that present LOD information in a user-friendly way.
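The document does not describe how the input language is identified. One common lightweight heuristic is to count stopwords from each supported language; the toy sketch below does this for the four languages listed. The stopword lists are small hand-picked samples and `guess_language` is a hypothetical name, so this is a stand-in, not the GE's detector.

```python
import re

# Tiny hand-picked stopword samples; a real detector would use far richer models.
STOPWORDS = {
    "it": {"il", "lo", "la", "di", "dei", "del", "della", "che", "e", "è", "per", "non"},
    "en": {"the", "and", "of", "to", "is", "in", "that", "was", "for", "with"},
    "es": {"el", "la", "de", "los", "las", "del", "que", "y", "en", "es", "por"},
    "pt": {"o", "a", "os", "as", "de", "do", "da", "que", "e", "em", "não", "é"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("Il presidente del consiglio dei ministri"))  # it
```

Short inputs such as a single name contain no stopwords, in which case the sketch returns `None`; a production system would need a fallback.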

The API processes the input text with a language processor to identify the entities in it, which are basically persons, places and organizations. This is done by combining grammatical and syntactic information.
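The GE relies on grammatical and syntactic analysis for entity recognition. As a deliberately crude stand-in (a different and much weaker technique), the sketch below treats runs of capitalized words as candidate entities; it only illustrates the shape of this step's output, a list of surface strings. `capitalized_spans` is a hypothetical name.

```python
import re


def capitalized_spans(text):
    """Group consecutive capitalized words; a crude proxy, not real NER."""
    spans, current = [], []
    for token in re.findall(r"[\w']+", text):
        if token[0].isupper():
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


print(capitalized_spans("The senator Mario Monti met officials in Rome"))
# ['The', 'Mario Monti', 'Rome']
```

The sentence-initial "The" is a false positive, which is exactly the kind of error that proper grammatical and syntactic analysis avoids.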

Once the entities are identified, the system tries to associate a list of candidates with each entity. Candidates are entries from DBpedia and GeoNames, which are the most widely used general-purpose semantic databases. Candidate association is performed by comparing each entity with the DBpedia labels; the most similar ones are chosen as candidates.
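Candidate association by label similarity can be sketched with Python's standard `difflib.get_close_matches`, which ranks strings by `SequenceMatcher` ratio. The in-memory label-to-URI table is a hypothetical stand-in for the DBpedia labels the GE actually queries, and the `n` and `cutoff` defaults are arbitrary choices for the example.

```python
import difflib

# Stand-in for the DBpedia label index queried by the GE.
LABELS = {
    "Mario Monti": "http://dbpedia.org/resource/Mario_Monti",
    "Mario Draghi": "http://dbpedia.org/resource/Mario_Draghi",
    "Rome": "http://dbpedia.org/resource/Rome",
    "Milan": "http://dbpedia.org/resource/Milan",
}


def find_candidates(entity, labels, n=3, cutoff=0.6):
    """Return up to n (label, uri) pairs whose labels are most similar to the entity."""
    matches = difflib.get_close_matches(entity, list(labels), n=n, cutoff=cutoff)
    return [(label, labels[label]) for label in matches]


print(find_candidates("Mario Monti", LABELS)[0])
# ('Mario Monti', 'http://dbpedia.org/resource/Mario_Monti')
```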

For each candidate, the system computes a score based on a syntactic similarity metric (e.g. if the entity is "foo", a candidate labeled "foo" will score higher than one labeled "foo bar"). This score is then combined with a second score, produced by an algorithm that evaluates how well each candidate fits the context semantically. For an example of the candidate structure, see the "Main Interactions" section.
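The exact metrics are not specified here; the `lev`, `sim` and `jw` fields in the JSON example below suggest Levenshtein and Jaro-Winkler style measures, but that is an inference. The sketch uses a normalized Levenshtein similarity as the syntactic score and mixes it linearly with a context score supplied by the caller. The linear mix, the 0.5 weight and all function names are assumptions.

```python
def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def syntactic_score(entity, label):
    """1.0 for identical strings, decreasing with normalized edit distance."""
    longest = max(len(entity), len(label)) or 1
    return 1 - levenshtein(entity.lower(), label.lower()) / longest


def combined_score(entity, label, context_score, weight=0.5):
    """Linear mix of syntactic and context scores; the weight is an assumption."""
    return weight * syntactic_score(entity, label) + (1 - weight) * context_score


print(syntactic_score("foo", "foo"))      # 1.0
print(syntactic_score("foo", "foo bar"))  # about 0.43 (4 edits over 7 characters)
```

A strong context score can still lift a syntactically weaker candidate above an exact label match, which is the point of mixing the two signals.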

External modules (such as semantic data repositories) are parametric, so one can decide to replicate semantic datasets (such as DBpedia) locally in order to improve performance. A typical usage, with Semantic Annotation used jointly with a local semantic data storage and a Relational-to-Semantic Converter, is shown in the figure below.
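A minimal sketch of such a parametric data-source setting: the DBpedia source resolves to a locally replicated endpoint when one is configured, and to the public DBpedia SPARQL endpoint otherwise. The `SourceConfig` type and the `DBPEDIA_SPARQL_ENDPOINT` environment variable are hypothetical names, and whether the GE talks to its sources via SPARQL is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    name: str
    sparql_endpoint: str


PUBLIC_DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def dbpedia_source(env=None):
    """Prefer a local replica (hypothetical env variable) over the public endpoint."""
    env = os.environ if env is None else env
    endpoint = env.get("DBPEDIA_SPARQL_ENDPOINT") or PUBLIC_DBPEDIA_ENDPOINT
    return SourceConfig("dbpedia", endpoint)
```

For example, `dbpedia_source({"DBPEDIA_SPARQL_ENDPOINT": "http://localhost:8890/sparql"})` would point the source at a (hypothetical) local replica, while `dbpedia_source({})` falls back to the public endpoint.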


Semantic Annotation Typical Usage

Main Interactions

The enabler basically consists of an API that is called with a simple HTTP GET request to the following URL, so the interaction is a simple CALL->RESPONSE:

http://semantican.lab.fi-ware.eu/ajax/extract_words.php?text=

The text to analyze is passed as the "text" parameter, as shown in the link above.
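A minimal Python client for this call, using only the standard library. `build_request_url` URL-encodes the text into the `text` parameter (and optionally adds `html_snippet=on`, described at the end of this section); `annotate` performs the GET and decodes the JSON. The function names, the timeout and the UTF-8 assumption are ours, and the FIWARE Lab host may no longer be reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://semantican.lab.fi-ware.eu/ajax/extract_words.php"


def build_request_url(text, html_snippet=False):
    """URL-encode the text (spaces become '+') and optionally request the HTML snippet."""
    params = {"text": text}
    if html_snippet:
        params["html_snippet"] = "on"
    return f"{BASE_URL}?{urlencode(params)}"


def annotate(text, html_snippet=False, timeout=30):
    """Perform the GET request and decode the JSON body (UTF-8 assumed)."""
    with urlopen(build_request_url(text, html_snippet), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_request_url("Mario Monti"))
# http://semantican.lab.fi-ware.eu/ajax/extract_words.php?text=Mario+Monti
```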

This system will:

1. Identify the language of the text.
2. Identify entities (people, places, organizations) in the text.
3. For each entity found, search the semantic data sources (DBpedia and GeoNames) for related Linked Open Data objects.
4. Return the LOD objects found for each entity as "candidates" in JSON format (chosen because it is more versatile than XML). Each candidate has a score, and the candidate with the highest score is flagged as "preferred".
5. Log the query in a database with an ID.
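Steps 4 and 5 can be sketched as follows: each query is logged in a database that assigns it an ID, and the highest-scoring candidate of each term is flagged as preferred. The in-memory SQLite table, its schema and the function names are hypothetical. Unlike the real response, which encodes values as strings such as `"true"`, the sketch uses native booleans and numbers.

```python
import sqlite3


def open_query_log():
    """In-memory stand-in for the GE's query database (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queries (id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, lang TEXT)"
    )
    return conn


def build_response(conn, text, lang, scored_terms):
    """scored_terms maps each entity to a list of (uri, score) pairs."""
    cursor = conn.execute("INSERT INTO queries (text, lang) VALUES (?, ?)", (text, lang))
    conn.commit()
    terms = []
    for term, candidates in scored_terms.items():
        # Index of the single highest-scoring candidate (None if there are none).
        best = max(range(len(candidates)), key=lambda i: candidates[i][1], default=None)
        terms.append({
            "term": term,
            "candidates": [
                {"uri": uri, "sc": score, "preferred": i == best}
                for i, (uri, score) in enumerate(candidates)
            ],
        })
    return {"queryId": str(cursor.lastrowid), "lang": lang, "terms": terms}
```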


Here is an example of the JSON result returned for the input text "Mario Monti":

{
    "queryId": "12143",
    "lang": "it",
    "keywords": "Mario+Monti",
    "extags": "Mario Monti",
    "freeling": "Mario_Monti",
    "proc_time": "13",
    "terms": [
        {
            "id": "tc-Mario+Monti",
            "term": "Mario Monti",
            "candidates": [
                {
                    "id": "tag--Mario_Monti--http://dbpedia.org/resource/Mario_Monti",
                    "label": "Mario Monti",
                    "uri": "http://dbpedia.org/resource/Mario_Monti",
                    "type": "user",
                    "ext": "Mario Monti",
                    "extra": [],
                    "wrapper": "dbpedia",
                    "lev": "2",
                    "sim": "0.909090909091",
                    "sis": "1",
                    "jw": "0.963636363636",
                    "sc": "1",
                    "class": "empty",
                    "preferred": "true"
                }
            ],
            "html": "<fieldset><div class=panel><div class=header>A proposito di <b>Mario Monti</b></div><div class=panel_body></div></div><div class=panel><div class=panel_body><img src='http://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Il_Presidente_del_Consiglio_incaricato_Mario_Monti_(cropped).jpg/200px-Il_Presidente_del_Consiglio_incaricato_Mario_Monti_(cropped).jpg' height=160 /><br><div class=info>È senatore a vita dal 9 novembre 2011 e dal successivo 16 novembre assume, per la prima volta, l'incarico di Presidente del Consiglio dei Ministri della Repubblica Italiana e allo stesso tempo di Ministro dell'Economia e delle Finanze dello stesso governo. Presidente dell'Università  Bocconi dal 1994, Monti è stato c...<ul><li><a href='http://www.guardian.co.uk/world/mario-monti' target='_blank'>Link utile</a></li></ul></div></div></div></fieldset><fieldset><legend>Concetti associati a <strong>Mario Monti</strong></legend><ul><li><img src='img/user.png' alt='user' title='user'> <a href='http://dbpedia.org/resource/Mario_Monti' target='_blank' title='[2-0.909090909091-0.963636363636/1]' >Mario Monti</a> (dbpedia)</li></ul></fieldset>",
            "class": "empty"
        }
    ]
}
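Note that the example encodes every value as a string, including numbers and the `"preferred": "true"` flag. The minimal sketch below extracts the preferred candidate for each term and converts its `sim` value to a number. It assumes that non-preferred candidates carry a value other than `"true"` (the example shows only one candidate, so this is an assumption); `preferred_candidates` is a hypothetical name.

```python
import json


def preferred_candidates(response_text):
    """Map each term to its preferred candidate's (URI, similarity), or None."""
    data = json.loads(response_text)
    result = {}
    for term in data.get("terms", []):
        chosen = next(
            (c for c in term.get("candidates", []) if c.get("preferred") == "true"), None
        )
        result[term["term"]] = (chosen["uri"], float(chosen["sim"])) if chosen else None
    return result


# Trimmed copy of the example response above.
sample = """{"queryId": "12143", "lang": "it", "terms": [{"term": "Mario Monti",
  "candidates": [{"uri": "http://dbpedia.org/resource/Mario_Monti",
  "sim": "0.909090909091", "preferred": "true"}]}]}"""
print(preferred_candidates(sample))
```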



Moreover, by setting the 'html_snippet=on' parameter in the request URL, an HTML snippet for the preferred DBpedia entry is returned when available. The snippet contains a picture and a short abstract of the resource.
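A minimal sketch of extracting the picture and links from the returned `html` field with Python's standard `html.parser` module. The class and function names are hypothetical; the sample markup is a fragment copied from the example response above.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collect image sources and link targets from an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.images, self.links = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def parse_snippet(html):
    parser = SnippetParser()
    parser.feed(html)
    parser.close()
    return parser.images, parser.links


fragment = ("<ul><li><img src='img/user.png' alt='user' title='user'> "
            "<a href='http://dbpedia.org/resource/Mario_Monti' target='_blank'>Mario Monti</a>"
            " (dbpedia)</li></ul>")
print(parse_snippet(fragment))
# (['img/user.png'], ['http://dbpedia.org/resource/Mario_Monti'])
```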
