
FI-WARE Data/Context Management


IMPORTANT NOTE: This page is deprecated. Please refer to the most up-to-date description of the FI-WARE Data/Context Management Architecture

Overview


The availability of advanced platform functionalities dealing with the gathering, processing, interchange and exploitation of data at large scale is going to be a cornerstone in the development of intelligent, customized, personalized, context-aware and enriched applications and services beyond those available on the current Internet. These functionalities will foster the creation of new business models and opportunities which FI-WARE should be able to capture.

Data in FI-WARE refers to information that is produced, generated, collected or observed and that may be relevant for processing, carrying out further analysis and knowledge extraction. It has an associated data type and value. FI-WARE will support a set of built-in basic data types similar to those existing in most programming languages. Values linked to basic data types supported in FI-WARE are referred to as basic data values. As an example, basic data values like ‘2’, ‘7’ or ‘365’ belong to the integer basic data type.

A data element refers to data whose value is defined as a sequence of one or more <name, type, value> triplets referred to as data element attributes, where the type and value of each attribute are either mapped to a basic data type and a basic data value or mapped to the data type and value of another data element. Note that each data element has an associated data type in this formalism. This data type determines what concrete sequence of attributes characterizes the data element.

There may be meta-data (also referred to as semantic data) linked to attributes in a data element. However, the existence of meta-data linked to a data element attribute is optional.
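As an illustration only, the following minimal Python sketch models this formalism; the names Attribute and DataElement and the example values are assumptions for illustration, not part of any FI-WARE specification.

  from dataclasses import dataclass, field
  from typing import Any, Dict, List, Optional

  @dataclass
  class Attribute:
      # one <name, type, value> triplet; meta-data is optional
      name: str
      type: str        # a basic data type or the type of another data element
      value: Any       # a basic data value or another DataElement
      metadata: Optional[Dict[str, Any]] = None

  @dataclass
  class DataElement:
      # the data type determines which sequence of attributes characterizes it
      type: str
      attributes: List[Attribute] = field(default_factory=list)

  reading = DataElement(
      type="BoilerReading",
      attributes=[
          Attribute("temperature", "float", 78.0, metadata={"unit": "celsius"}),
          Attribute("pressure", "float", 1.2),
      ],
  )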

Applications may assign an identifier to data elements in order to store them in a given Data Storage, e.g. a Data Base. Such an identifier will not be considered part of the structure of the data element, and the way it can be generated is out of the scope of this specification. Note that a given application may decide to use the value of some attribute linked to a data element as its identifier in a given Data Storage but, again, there is no identifier associated to the representation of a data element.

The structure associated to a data element is represented in Figure 1.


Image:Data_Element_Structure.jpg
Figure 1: Data Element Structure


A cornerstone concept in FI-WARE is that data elements are not bound to a specific format representation. They can be represented as an XML document at some point and then translated into another XML document representation later on, or marshaled as part of a binary message being transferred. Data elements can be stored e.g. in a Relational Database, in an RDF Repository or as entries in a noSQL data base like MongoDB, adopting a particular storage format that may or may not be the same as the format used for their transfer. It should be possible to infer the data type of a given data element from the XML document or the transferred message format (e.g., from a specific element of the XML document if the same XML style and encoding is used to represent data elements of different types, or from the specific XML style used), or from the specific storage structure used to store it (e.g., it may be inferred from the name of the table in which the data element is stored).
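Purely as an illustration of this format independence, the sketch below serializes the same hypothetical data element both as JSON and as XML; in each representation the data type can be inferred from the serialized form itself (an explicit field in one case, the root element name in the other).

  import json
  import xml.etree.ElementTree as ET

  attributes = {"temperature": 78.0, "pressure": 1.2}

  # JSON representation: the data type is carried as an explicit field
  as_json = json.dumps({"type": "BoilerReading", "attributes": attributes})

  # XML representation: the data type is carried by the root element name
  root = ET.Element("BoilerReading")
  for name, value in attributes.items():
      ET.SubElement(root, "attribute", name=name).text = str(value)
  as_xml = ET.tostring(root, encoding="unicode")

  print(as_json)
  print(as_xml)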

The way data elements are represented in memory by FI-WARE Data/Context Generic Enablers is not specified. Therefore, the implementer of a FI-WARE Data/Context Generic Enabler may decide the way data elements are represented in memory.

Context in FI-WARE is represented through context elements. A context element extends the concept of data element by associating an EntityId and EntityType to it, uniquely identifying the entity (which in turn may map to a group of entities) in the FI-WARE system to which the context element information refers. In addition, there may be some attributes as well as meta-data associated to attributes that we may define as mandatory for context elements as compared to data elements.

Context elements are typically created containing the value of attributes characterizing a given entity at a given moment. As an example, a context element may contain values of the “last measured temperature”, “square meters” and “wall color” attributes associated to a room in a building.
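A minimal sketch of such a context element for the room example, using an illustrative dict-based shape (the field names and identifier scheme are assumptions, not the normative structure):

  room_context_element = {
      "entityId": "Room:building7:room23",   # illustrative identifier scheme
      "entityType": "Room",
      "attributes": [
          {"name": "last_measured_temperature", "type": "float", "value": 21.5,
           "metadata": {"unit": "celsius", "timestamp": "2012-05-30T10:00:00Z"}},
          {"name": "square_meters", "type": "float", "value": 18.0},
          {"name": "wall_color", "type": "string", "value": "white"},
      ],
  }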

Note that there might be many different context elements referring to the same entity in a system, each containing the values of a different set of attributes. This allows different applications to handle different context elements for the same entity, each containing only those attributes of that entity relevant to the corresponding application. It also allows representing updates on the set of attributes linked to a given entity: each of these updates can take the form of a context element containing only the values of those attributes that have changed.

The structure of a context element is represented in Figure 2.


Image:Context_Element_Structure_Model.jpg
Figure 2: Context Element Structure Model


Note that all the statements made with respect to data elements in the previous section would also apply to context elements.

An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened, in that domain. Events typically lead to the creation of some data or context element, thus enabling information describing or related to events to be handled by applications or event-aware FI-WARE GEs (e.g., the Publish/Subscribe Broker GE, when handling updates/notifications, or the CEP GE). As an example, a sensor device may be measuring the temperature and pressure of a given boiler, sending every five minutes a context element associated to that entity (the boiler) that includes the values of these two attributes (temperature and pressure), or just the one that has changed. The creation and sending of the context element is an event, i.e., something that has occurred (the sensor device has sent new measures). As another example, a mobile handset may export attributes like “Operating System” or “Screen size”. A given application may query for the values of these two attributes in order to adapt the content to be delivered to the device. As a result, the mobile handset creates a context element and sends it back to the application. This response may also be considered an event, i.e., something that has occurred (the mobile handset has replied to a request issued by an application).

Since the data/context elements generated in connection with an event are the way events become visible in a computing system, it is common to refer to data/context elements related to events simply as "events" when describing the features of, or the interaction with, event-aware FI-WARE GEs. For convenience, we may also use the terms “data event” and “context event”. A “data event” refers to an event leading to the creation of a data element, while a “context event” refers to an event leading to the creation of a context element.

The term event object is used to mean a programming entity that represents such an occurrence (event) in a computing system [EPIA]. Events are represented as event objects within computing systems in order to distinguish them from other types of objects and to perform operations on them, also known as event processing.

In FI-WARE, event objects are created internally to some GEs like the Complex Event Processing GE or the Publish/Subscribe Broker GE. These event objects are defined as a data element (or a context element) representing an event, to which a number of standard event object properties (similar to a header) are associated internally. These standard event object properties support certain event processing functions. The concrete set of standard event object properties in FI-WARE is still to be defined, but we may anticipate that one of these properties would be the time at which the event object is detected by the GE (arrives at the GE). This will, for example, make it possible to support functions that operate on events exceeding a certain age in the system. Tools will be provided enabling applications or admin users to assign values to those event object properties based on values of data element attributes (e.g., source of the event or actual capture time). An event object may wrap different characteristics of the data element (i.e., DataType) or the context element (i.e., EntityId and EntityType).
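As a rough illustration (the property names below are assumptions, since the standard set of event object properties is still to be defined), an event object might look like this:

  from dataclasses import dataclass, field
  from datetime import datetime, timezone
  from typing import Any, Dict

  @dataclass
  class EventObject:
      # the wrapped data/context element representing the event
      element: Dict[str, Any]
      # illustrative standard properties (similar to a header)
      detection_time: datetime = field(
          default_factory=lambda: datetime.now(timezone.utc))
      source: str = "unknown"

      def age_seconds(self) -> float:
          # e.g., used by functions operating on events exceeding a certain age
          return (datetime.now(timezone.utc) - self.detection_time).total_seconds()

  evt = EventObject(
      element={"entityId": "Boiler:42", "entityType": "Boiler",
               "attributes": [{"name": "temperature", "value": 78.0}]},
      source="sensor-gateway-1",
  )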

Unless otherwise specified, all statements in the rest of the chapter that make use of the term “data (element)” also apply to context (element) as well as to events, whether or not they are related to data/context elements.

Figure 3 below presents a high-level view of the Reference Architecture of the Data/Context Management chapter in FI-WARE. The defined GEs (marked in light blue in the figure) can be instantiated in a flexible and coherent way, enabling different FI-WARE Instances to pick and configure different GEs according to the demands and requirements of the applications that will run on top. Following is a brief description of them:

  • Publish/Subscribe Broker GE, which allows applications to interchange heterogeneous events following a standard publish/subscribe paradigm.
  • Complex Event Processing GE, which deals with the processing of event streams in real-time to generate immediate insight, enabling applications to instantly respond to changing conditions regarding certain customers or objects (entities such as devices, applications, systems, etc.).
  • BigData Analysis GE, which enables map-reduce analysis of large amounts of data, either on the go or previously stored.
  • Multimedia Analysis Generation GE, which performs the automatic or semi-automatic extraction of meta-information (knowledge) based on the analysis of multimedia content.
  • Unstructured data analysis GE, which enables the extraction of meta-data based on the analysis of unstructured information obtained from web resources.
  • Meta-data pre-processing GE, which eases the generation of programming objects from several metadata formats.
  • Location GE, which provides geo-location information as context information obtained from devices.
  • Query Broker GE, which deals with the problem of providing a uniform query access mechanism for the retrieval of data stored in heterogeneous formats.
  • Semantic Annotation GE, which allows enriching multimedia information or other data with semantic meta-data tags to be exploited by semantic web applications.
  • Semantic Application Support GE, which provides support for the core set of semantic web functionalities that ease the programming of semantic web applications.


Image:High-Level_view_of_GEs_in_the_Data-Context_Management_chapter.jpg
Figure 3: High-Level view of GEs in the Data/Context Management chapter


In addition to these GEs, the opportunity to include added-value Generic Enablers implementing Intelligent Services in FI-WARE has been identified. However, their inclusion will depend very much on the identification of a demand from first users of FI-WARE and on their potential to be common to a large number of applications in different domains. These enablers would be programmed based on the off-line and real-time processing Generic Enablers, as well as on data held in memory or persisted, to provide analytical and algorithmic capabilities in the following areas:

  • Social Network Analysis
  • Mobility and Behaviour Analysis
  • Real-time recommendations
  • Behavioural and Web profiling
  • Opinion mining

The following sections describe these Generic Enablers (GEs) in more detail. Figure 3 displays some question marks that are reviewed at the end of this chapter.

Generic Enablers

Publish/Subscribe Broker

Target usage

The Publish/Subscribe Broker is a GE of the FI-WARE platform that enables the publication of events by entities, referred to as Event Producers, so that published events become available to other entities, referred to as Event Consumers, which are interested in processing the published events. Applications or even other GEs in the FI-WARE platform may play the role of Event Producers, Event Consumers or both.

A fundamental principle supported by this GE is that of achieving total decoupling between Event Producers and Event Consumers. On one hand, this means that Event Producers publish data without knowing which Event Consumers will consume the published data; therefore they do not need to be connected to them. On the other hand, Event Consumers consume data of their interest without knowing which Event Producer published a particular event: they are interested in the event itself but not in who generated it.

GE description

The conceptual model and set of interfaces defined for the Publish/Subscribe Broker GE are aligned with the technical specifications of the NGSI-10 interface, which is one of the interfaces associated to Context Management Functions in the Next Generation Service Interfaces (NGSI) defined by the Open Mobile Alliance [OMA-TS-NGSI-Context]. Specifications of the Publish/Subscribe Broker GE in FI-WARE will not support all the interfaces and operations defined for Context Management Functions in NGSI: they will focus only on the NGSI-10 interface, and even then only on those parts of the specifications associated to this interface that are considered most useful to support the development of applications in the Future Internet. On the other hand, they will extend the scope of the NGSI-10 specifications so as to be able to deal with data elements, not just context elements.

Figure 4 illustrates the basic operations that Event Producers and Consumers can invoke on Publish/Subscribe Broker GEs in order to interact with them. Event Producers publish events by invoking the update operation on a Publish/Subscribe Broker GE. Note that when this operation is used to publish context elements representing updates on the values of attributes linked to existing entities, only the values of the attributes that have changed may be passed. Although not currently included in the NGSI-10 specifications, Event Producers in FI-WARE also export the query operation, enabling a Publish/Subscribe Broker GE to pull events, as illustrated in Figure 4.

The Publish/Subscribe Broker GE exports interfaces enabling Event Consumers to consume events in two basic modes:

  • Request/response mode, enabling the retrieval of events as responses to query requests issued by Event Consumers on the Publish/Subscribe Broker GE;
  • Subscription mode, enabling the setup of the conditions under which events will be pushed to Event Consumers. The Publish/Subscribe Broker GE will invoke the notify operation exported by the Event Consumer every time an event fulfilling the conditions established in its subscription arrives. Subscriptions may be set up by third-party applications and not necessarily by the Event Consumer itself, as illustrated in the figure. An expiration time can be defined for each subscription by means of invoking operations exported by the Publish/Subscribe Broker GE. If the expiration time is missing, default values may be used.

When an Event Consumer or a Publish/Subscribe Broker formulates a query, it may formulate it relative to attributes of a given entity (or group of entities). The context elements being sent as response are considered as events.
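To make these interactions concrete, the following sketch shows an Event Producer publishing a context element and an Event Consumer querying and subscribing, using illustrative HTTP/JSON requests; the endpoint paths and payload shapes below are assumptions for illustration, not the normative NGSI-10 binding.

  import requests  # assumes the third-party 'requests' package is available

  BROKER = "http://broker.example.org/ngsi10"      # hypothetical broker URL

  # Event Producer: publish (update) a context element
  update_payload = {
      "contextElements": [{
          "entityId": "Room:building7:room23", "entityType": "Room",
          "attributes": [{"name": "last_measured_temperature",
                          "type": "float", "value": 21.5}],
      }],
      "updateAction": "UPDATE",
  }
  requests.post(f"{BROKER}/updateContext", json=update_payload, timeout=5)

  # Event Consumer, request/response mode: query attributes of an entity
  query_payload = {"entities": [{"id": "Room:building7:room23", "type": "Room"}],
                   "attributes": ["last_measured_temperature"]}
  events = requests.post(f"{BROKER}/queryContext", json=query_payload, timeout=5).json()

  # Event Consumer, subscription mode: ask the broker to notify a callback URL
  subscribe_payload = {
      "entities": [{"id": "Room:building7:room23", "type": "Room"}],
      "attributes": ["last_measured_temperature"],
      "reference": "http://consumer.example.org/notify",  # notify operation endpoint
      "duration": "P1M",                                   # expiration time
  }
  requests.post(f"{BROKER}/subscribeContext", json=subscribe_payload, timeout=5)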

Image:Basic_interaction_with_the_Publish-Subscribe_Broker_GE.jpg
Figure 4: Basic interaction with the Publish/Subscribe Broker GE


Note that events are kept in memory by the Publish/Subscribe Broker GE as long as they do not exceed a given expiration time. Therefore, they are not dropped from the Publish/Subscribe Broker GE persistent memory/storage just because the result of a query (request/response mode) returns a copy of them. That is, the same query on a Publish/Subscribe Broker GE will return the same events (provided they have not expired). However, in the subscription mode events are notified just once to subscribed Event Consumers.

Data/context elements associated to events may be of any type. Besides, Event Producers may either be components integrated and embedded into a FI-WARE GE or be part of an application, as long as they respect the described interfaces. Therefore, data/context elements in events may be of any provenance and of any granularity, ranging from raw data (not yet processed through any FI-WARE GE) to higher levels of data abstraction (e.g., insights extracted from the raw data by means of some of the GEs described within this document and shown in Figure 3).

An entity playing the role of Event Producer is an entity that publishes events in this model, but it could also play the role of an Event Consumer, e.g. it may consume events and publish other events based on the consumed ones. Besides, the Publish/Subscribe Broker GE may be connected to many Event Producers, so that the number of Event Producers as well as their availability is hidden from the Event Consumer. The Publish/Subscribe Broker GE acts as an intermediate single access point through which multiple Event Consumers access events published by multiple Event Producers. An example of an architecture involving multiple Event Producers and Consumers connected to a Publish/Subscribe Broker GE is shown in Figure 5.

Image:Multiple_Event_Producers_and_Consumers_connected_to_a_Publish-Subscribe_Broker_GE.jpg
Figure 5: Multiple Event Producers and Consumers connected to a Publish/Subscribe Broker GE


A Publish/Subscribe Broker GE may play the role of an Event Consumer, therefore being able to consume events published by another Publish/Subscribe Broker GE (in either of the two existing modes). This will support the ability to federate Publish/Subscribe Broker GEs, even in the case that their implementations are different. This will be useful in scenarios like the management of the Internet of Things, where there might be Publish/Subscribe Broker GEs running on devices, on IoT gateways or centrally. A service for discovering Publish/Subscribe Broker GEs based on criteria will be defined in FI-WARE, making it easier to support these highly distributed scenarios involving multiple Publish/Subscribe Broker GEs.

The preference given to OMA's framework, compared to other frameworks/technologies, is based on the following technical points:

  • Ability to keep memory of events while the conditions on the duration of these events hold, independently of who connects as Event Consumer. This will ease the programming of applications that may shut down or are not initially ready but need to (re)start or synchronize the processing of events that have been generated since a given point in time. Other frameworks, such as WS-Notification, typically imply that the Broker receives events and passes them to the Event Consumers connected at that time, without keeping events in memory (i.e., events received at the Broker are "consumed" by the Event Consumers connected at that moment).
  • Suitability for a wide range of potential implementations of the Publish/Subscribe Broker GE, adaptable to run not only on traditional servers but also on small devices. This will ease the development of distributed networks of connected Publish/Subscribe Brokers running both on small devices and on servers. FI-WARE has to support working in a federated model with different types of federations: hierarchical (local, global, etc.), functional (social, physical, etc.) and/or mesh (peer-to-peer information exchange). The OMA NGSI model allows this by design.
  • Ability to define several alternative bindings, being particularly suitable for a REST binding, which seems more appropriate in FI-WARE compared to other frameworks where only one binding, not necessarily REST-based (e.g., SOAP), is mandatory. REST communication is much more suitable for heterogeneous communications of resource- (and energy-) constrained devices such as IoT nodes and mobile phones, since REST "per se" is very simple, carries little overhead, and the payload can be anything (text SMS SMPP, XMPP, S/MIME/SIP, XML, an ASCII structure or any proprietary binary-encoded payload, depending on the application).
  • Adaptability to handle data with no predefined structure (i.e., no mandatory fields)
  • Flexible subscription query language (able to be adapted to support multiple query languages), compared to other frameworks such as WS-Notification which require support for the notion of "topic".

On the other hand, the ability to influence OMA direction towards changes in the specifications, when needed, is highly desirable. Other specifications/frameworks are considered more "stable" and, due to legacy/backward-compatibility issues, less open to changes. FI-WARE partners have already contributed to the OMA NGSI specs in the past and will be able to contribute in the future.

Critical product attributes

  • All data types (generic data or context elements) are available for consumption through the same Publish/Subscribe GE interface
  • Comprehensive publish/subscribe interfaces enabling the formulation of powerful subscription conditions
  • Simple publish/subscribe interface enabling easy and fast integration with consumer applications following a pull or push style of communication
  • OMA-based standard interfaces, easing the interworking with many devices while still being useful for event publication/subscription at backend systems
  • Ability to develop light and efficient implementations (capable of addressing real- or near real-time delivery of events and of running on small devices)
  • Extensible (ability to extend interfaces or add interfaces in order to introduce new or domain-specific features)
  • Scalable (enabled through Publish/Subscribe Broker GE federation)
  • Built-in facility for auto-cleaning and self-control (to purge unused or forgotten subscriptions)

Complex Event Processing

Target usage

Complex Event Processing (CEP) is the analysis of event data in real-time to generate immediate insight and enable instant response to changing conditions. Some functional requirements this technology addresses include event-based routing, observation, monitoring and event correlation. The technology and implementations of CEP provide means to expressively and flexibly define and maintain the event processing logic of the application, and at runtime they are designed to meet all the functional and non-functional requirements without taking a toll on application performance, removing one issue from the concerns of application developers and system managers.

For the primary user of the real-time processing generic enabler, namely the consumer of the information generated, the Complex Event Processing GE (CEP GE) addresses the user’s concerns of receiving the relevant events at the relevant time with the relevant data in a consumable format. Relevant here means of relevance to the consumer/subscriber, so that it can react to or make use of the event appropriately. Figure 6 depicts this role through a pseudo API derivedEvent(type, payload) by which, at the very least, an event object is received with the name of the event, derived out of the processing of other events, and its payload.

The designer of the event processing logic is responsible for creating event specifications and definitions (including where to receive them) from the data gathered by the Massive Data Gathering Generic Enabler. The designer should also be able to discover and understand existing event definitions. Therefore FI-WARE, in providing an implementation of a real-time CEP GE, will also provide the tools for the designer. In addition, APIs will be provided to allow the generation of event definitions and of instructions for operations on these events programmatically, for instance by an application or by other tools for other programming models that require Complex Event Processing, such as the orchestration of several applications into a composed application using some event processing. In Figure 6 these roles are described as Designer and Programs, making use of the pseudo API deploy definitions/instructions.

Finally, the CEP GE addresses the needs of an event system manager and operator, which could be either real people or management components, by allowing for configurations (such as security adjustments), exposing processing performance, handling problems, and monitoring the system’s health; this is represented in Figure 6 as the Management role making use of the pseudo API configuration/tuning/monitoring.

Image:Interactions_with_and_APIs_of_the_Real-time_CEP_Generic_Enabler.jpg
Figure 6: Interactions with and APIs of the Real-time CEP Generic Enabler


GE description

The Complex Event Processing Generic Enabler (CEP GE) provides:

  • tools to define event processing applications on data interpreted as events, either manually or programmatically
  • execution of the event processing application on events as they occur and generation of derived events accordingly
  • management of the runtime

The functions supported by such a GE align with the functional architecture view produced by the Reference Architecture Work Group of the Event Processing Technical Society (EPTS) [EPTS][EPTS-RA 10], depicted in Figure 7 and further elaborated below.


Image:Functional_View_of_Event_Processing.jpg
Figure 7: Functional View of Event Processing


Entities connected to the CEP GE (application entities or some other GEs like the Publish/Subscribe Broker GE) can play two different roles: the role of Event Producer or the role of Event Consumer. Note that nothing precludes a given entity from playing both roles.

Event Producers are the source of events for event processing. They can provide events in two modes:

  • "Push" mode: The Event Producers push events into the Complex Event Processing GE by means of invoking a standard operation the GE exports.
  • "Pull" mode: The Event Producer exports a standard operation that the Complex Event Processing GE can invoke to retrieve events.

Event Consumers are the sink point of events. Following are some examples of event consumers:

  • Dashboard: a type of event consumer that displays alarms defined when certain conditions hold on events related to some user community or produced by a number of devices.
  • Handling process: a type of event consumer that consumes processed events and performs a concrete action.
  • The Publish/Subscribe Broker GE (see section 4.2.1): a type of event consumer that forwards the events it consumes to all interested applications based on a subscription model.

Although not yet decided, it is most likely that the Event Producers and Event Consumers that can be connected to the CEP GE in FI-WARE will export the interfaces of Event Producers and Consumers as specified in section 4.2.1 (description of the Publish/Subscribe Broker GE). This will allow a Publish/Subscribe Broker GE to be connected to forward events to a CEP GE or, vice versa, a CEP GE to forward events resulting from its processing to a Publish/Subscribe Broker GE.

The CEP GE in FI-WARE implements event processing functions based on the design and execution of Event Processing Networks (EPN). The processing nodes that make up this network are called Event Processing Agents (EPAs), as described in the book “Event Processing in Action” [EPIA]. The network describes the flow of events originating at event producers and flowing through various event processing agents to eventually reach event consumers; see Figure 8 for an illustration. Here we see that events from Producer 1 are processed by Agent 1. Events derived by Agent 1 are of interest to Consumer 1 but are also processed by Agent 3 together with events derived by Agent 2. Note that the intermediary processing between producers and consumers in every installation is made up of several functions, and often the same function is applied to different events for different purposes at different stages of the processing. The EPN approach allows dealing with this in an efficient manner, because a given agent may receive events from different sources. At runtime, this approach also allows for a flexible allocation of agents to physical computing nodes: the entire event processing application can be executed as a single runtime artifact, such as Agent 1 and Agent 2 in Node 1 in Figure 8, or as multiple runtime artifacts according to the individual agents that make up the network, such as Agent 1 and Agent 3 running within different nodes. Thus scale, performance and optimization requirements may be addressed by design. The reasons for running pieces of the network in different nodes or environments vary, for example:

  • Distributing the processing power
  • Distributing for geographical reasons – processing as close to the source as possible to lower networking overhead
  • Optimized and specialized processors that deal with specific event processing logic

Another benefit in representing event processing applications as networks is that entire networks can be nested as agents in other networks allowing for reuse and composition of existing event processing applications.


Image:Illustration_of_an_Event_Processing_Network_made_of_producers_agents_and_consumers.jpg
Figure 8: Illustration of an Event Processing Network made of producers, agents and consumers


The event processing agents and their assembly into a network are where most of the functions of this GE are implemented. In FI-WARE, the behavior of an event processing agent is specified using a rule-oriented language that is inspired by the ECA (Event-Condition-Action) concept and may better be described as Pattern-Condition-Action. Rules in this language consist of three parts:

  • A pattern whose detection makes the rule relevant
  • A set of conditions (logical tests) formulated on events as well as external data
  • A set of actions to be carried out when all the established conditions are satisfied

Following is an indication of the capabilities to be supported in each part of the rule language. A description of how such rules are assembled, with simplifications, is given in Event Processing Agent in Detail.

Pattern Detection

In the pattern detection part the user may program patterns over selected events within an event processing context (such as a time window or a segmentation); only if the pattern is matched is the rule of relevance, and the action part is then executed subject to the conditions. Examples of such patterns are:

  • Sequence, meaning events need to occur in a specified order for the pattern to be matched
  • Count, a number of events need to occur for the pattern to be matched

Event Processing Context [EPIA] is defined as a named specification of conditions that groups event instances so that they can be processed in a related way. It assigns each event instance to one or more context partitions. A context may have one or more context dimensions and can give rise to one or more context partitions. The context dimension tells us whether the context is temporal, spatial, state-oriented or segmentation-oriented, or whether it is a composite context, that is to say, one made up of other context specifications. A context partition is a set into which event instances have been classified.

Conditions

The user of the CEP GE may program the following kind of conditions in a given rule:

  • Simple conditions, which are established as predicates defined over single events of a certain type
  • Complex conditions, which are established as logical operations on predicates defined over a set of events of a certain type:
    • All (conjunction), meaning that all defined predicates must be true
    • Any (disjunction), meaning that at least one of the defined predicates must be true
    • Absence (negation), meaning that none of the defined predicates can be true

Predicates defined on events can be expressed based on a number of predefined operators applicable over:

  • values of event data fields
  • values of other properties inherent to an event (e.g., lifetime of the event)
  • external functions the GE can invoke and to which event data field values or event property values can be passed

Note that the conditions selected for a given rule establish whether the processing of that rule is stateless or stateful. Stateless processing requires that the conditions apply to a single event and are formulated only over properties of that event, without relying on external variables. Stateful processing applies when multiple events are involved, or even when a single event is processed but some of the conditions rely on external variables. A processing agent is stateless when the processing of all rules governing its behavior is stateless. Otherwise, the processing agent is stateful.

Actions

The user of the GE may program the following kind of actions in a given rule:

  • Transformations, defined over events satisfying the rule, which may consist of generating a new event whose data is the result of:
    • Projecting a subset of the data fields from one or several of the events satisfying the rule
    • Translating values of projected data fields into new values as a result of applying some programmed function
    • Enriching the data of the new event with data not present originally
  • Forwarding actions, which consist of forwarding one or several of the events satisfying the rule
  • Invocation of external services that allow achieving some desired effect in the overall system.

Note that several transformations can be programmed in the same rule. This allows an event or set of events to be split into multiple derived events. In addition, external processes being executed as the result of an action may lead to updates of variables based on which certain functions used in the predicates of rule conditions are formulated.
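Since the concrete rule language is not specified here, the following sketch only illustrates the Pattern-Condition-Action structure described above, using an assumed dict-based rule representation and a toy evaluator:

  from typing import Any, Dict, List

  # Illustrative rule: within a window of boiler readings, if at least three
  # readings arrive (Count pattern) and all exceed a threshold (All condition),
  # derive an "OverheatAlert" event (Transformation by projection).
  rule = {
      "pattern": {"type": "count", "event_type": "BoilerReading", "at_least": 3},
      "conditions": [lambda e: e["temperature"] > 90.0],      # simple predicates
      "actions": [{"derive": "OverheatAlert",
                   "project": ["boiler_id", "temperature"]}], # transformation
  }

  def evaluate(rule: Dict[str, Any], window: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
      # Pattern: select events of the required type and check the count
      selected = [e for e in window if e["type"] == rule["pattern"]["event_type"]]
      if len(selected) < rule["pattern"]["at_least"]:
          return []
      # Conditions: all predicates must hold on every selected event
      if not all(p(e) for e in selected for p in rule["conditions"]):
          return []
      # Actions: derive new events by projecting a subset of the data fields
      derived = []
      for action in rule["actions"]:
          for e in selected:
              derived.append({"type": action["derive"],
                              **{f: e[f] for f in action["project"]}})
      return derived

  window = [{"type": "BoilerReading", "boiler_id": "B42", "temperature": t}
            for t in (93.0, 95.5, 97.1)]
  print(evaluate(rule, window))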

Designing EPNs

At design time the functional aspects of this Generic Enabler include the definition, modelling, improvement, and maintenance of the artefacts used in event processing. This is an integration point with FI-WARE Tools GEs through the management tools. These artefacts are:

  • event definitions, which include at the very least a type name and in most cases also a definition of the payload
  • the event processing network assembled from event processing agents

These artefacts can be developed programmatically and deployed on the fly to the execution environment, or they can be developed by users with form-based tools (Eclipse- or web-based) and manually deployed to the execution environment – also on the fly.

Event Processing Agent in Detail

To simplify the specification of an event processing agent, rather than requiring the logic associated to the event processing to be modelled using a rule-oriented language, a framework based on a number of building blocks is provided to assist in specifying the logic. This allows users to express their intent without being familiar with the logical and temporal operators of the rule-oriented language and without the need to write logical statements that are long and sometimes difficult to understand. This is done by specifying the three building blocks that make up an agent, as well as input and output terminals that specify the type of events to be considered in a rule and the type of events that may be derived by the rule, respectively, as depicted in Figure 9.


Image:The_building_block_of_Event_Processing_Agent.jpg
Figure 9: The building block of Event Processing Agent – simpler specification of the logic


Only events of the type that have been specified in the input terminals of the agent will be considered for the rule. Then, the following building blocks are defined:

  • The first building block is the filter block, an optional block, where filters can be applied on individual input events to decide whether or not to consider them in the further execution of the rule.
  • The second building block is the matching block, an optional block, where the events are matched against a specified pattern and where matching sets are created.
  • The final building block is the derivation block, the only mandatory block, where matching sets, filtered events or direct input events are used to compose one or more derived events according to conditions and expressions.

The derived events are specified in the output terminals to be selected as inputs by other agents.

This entire rule may be executed in a processing context (temporal, segmentation (group-by), state, space and spatiotemporal) which is also specified for an agent when needed. This is equivalent to the processing context of the pattern part of the rule.

For example, if only the derivation block is specified, then the agent provides transformation operations (the type of transformation depends on the derivation – if the same event type is derived with only a subset of its attributes, then it is a projection).

The two most common contexts of a rule execution or evaluation are interval (window) and segmentation (group-by) of events:

  • The interval context may be a fixed interval where the user specifies the exact times of start and end, a sliding interval where the user specifies how the interval slides either by time or number of events, and an event-based interval where the user specifies the event that will initiate the interval and the event that will terminate it. Only events that occur within the same interval are considered for processing by the agent for that interval, i.e. will satisfy the pattern part of the rule.
  • The segmentation context is specified by the user through a data element by which all events are grouped-by if their values match. Only events with the same specified data value will be considered for processing by the agent for a particular segment, i.e. will satisfy the pattern part of the rule.

The syntax of the building blocks is also available for specifying event processing agents programmatically, and the artefacts are deployable through APIs to the processing engine (whether single, clustered or distributed). This is done for the same reason: to avoid having to program sound logical-temporal statements by hand.
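As a purely illustrative example of such a programmatic specification (the names and structure below are assumptions, not an actual FI-WARE API), an agent could be declared with its terminals, building blocks and context like this:

  # Hypothetical declarative specification of an Event Processing Agent
  overheat_agent = {
      "name": "OverheatDetector",
      "input_terminals": ["BoilerReading"],             # event types considered
      "context": {"type": "interval", "sliding": "10m",
                  "segmentation": "boiler_id"},         # group-by per boiler
      "filter": "temperature > 90",                     # optional filter block
      "matching": {"pattern": "count", "at_least": 3},  # optional matching block
      "derivation": {                                   # mandatory derivation block
          "event_type": "OverheatAlert",
          "attributes": {"boiler_id": "boiler_id",
                         "max_temperature": "max(temperature)"},
      },
      "output_terminals": ["OverheatAlert"],
  }

  # deploy_agent(engine_url, overheat_agent)   # hypothetical deployment API call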

Critical product attributes

  • Business/Application agility – change patterns rapidly at the end-user level, ability to implement and change more rapidly, react and adapt to events, real-time monitoring, continuous intelligence
  • Business/Application optimization – early detection, ability to dynamically assemble needed process components at runtime, dynamic service composition and orchestration
  • Business/Application efficiency – support for decision making, sophisticated action initiation
  • Event Processing Network as a CEP logic abstraction towards standardization
  • Lower cost of operations – lower maintenance of patterns and rules
  • Lower cost of implementation – reduction in design, build and test costs and time
  • Scale – increased scale of response and volume, multiple channels of events, widely distributed event sources



Big Data Analysis

Target usage

Big Data Batch Processing (also known as Big Data Crunching) is the technology used to process huge amounts of previously stored data in order to get relevant insights in scenarios where latency is not a highly relevant parameter. These insights take the form of newly generated data, which will be at the disposal of applications through the same mechanisms through which the initially stored data is available.

On the other hand, Big Data Stream Processing can be defined as the technology to process continuous, unbounded and large streams of data, extracting relevant insights on the go. This technology can be applied in scenarios where it is not necessary to store all incoming data, or where data has to be processed “on the go”, immediately after it becomes available. Additionally, this technology is more suitable for big-data problems where low latency in the generation of insights is expected. In this particular case, insights are continuously generated, in parallel with the incoming data, allowing continuous estimations and predictions.

The Big Data Analysis Support GE offers a common solution for both Big Data Crunching and Big Data Streaming. A key characteristic of this GE is that it presents a unified set of tools and APIs allowing developers to program the analysis of large amounts of data and extract relevant insights in both scenarios. Using this API, developers will be able to program Intelligent Services like the ones described in the Intelligent Services section of the High Level Vision. These Intelligent Services will be plugged into the Big Data Analysis GE using a number of tools and APIs that this GE will support.

Input to the Big Data Analysis GE will be provided in two forms: as stored data so that analysis is carried out in batch mode or as a continuous stream of data so that analysis is carried out on-the-fly.

The first is adequate when latency is not a relevant parameter or when additional data (not previously collected) is required for the process (e.g. access to auxiliary data in external databases, crawling of external sites, etc.). The second is better suited to applications where lower latency is expected.

Algorithms developed using the API provided by the Big Data Analysis GE in order to process data will be interchangeable between the batch and stream modes of operation. In other words, the API available for programming Intelligent Services will be the same in both modes.

In both cases, the focus of this enabler is on the "big data" consideration; that is, developers will be able to plug "intelligence" into the data processing (batch or stream) without worrying about the parallelization/distribution or size/scalability of the problem. In the batch processing case, this means that the enabler should be able to scale with the size of the data set and the complexity of the applied algorithms. In the stream mode, the enabler has to scale with both the input rate and the size of the continuously updated analytics (usually called the "state"). Note that other GEs in FI-WARE are more focused on the real-time processing of a continuous stream of events, without emphasis on the big-data consideration (see the Complex Event Processing section of the High Level Vision).

GE description

Technologically speaking, big data crunching was revolutionized by Google with the introduction of a flexible and simple framework called MapReduce. This paradigm allows developers to process big data sets using a really simple API without having to worry about parallelization or distribution. It is well suited for batch processing of highly distributed data sets, but it is not focused on high performance and events, so it is less suited to stream-like operations.
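For readers unfamiliar with the paradigm, the following minimal, single-process sketch mimics the MapReduce programming model with the classic word-count example (a real deployment would, of course, run the map and reduce phases distributed over a cluster):

  from collections import defaultdict
  from typing import Dict, Iterable, List, Tuple

  def map_phase(document: str) -> Iterable[Tuple[str, int]]:
      # emit an intermediate <key, value> pair per word
      for word in document.split():
          yield word.lower(), 1

  def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
      # aggregate all values emitted for the same key
      return key, sum(values)

  def run_mapreduce(documents: Iterable[str]) -> Dict[str, int]:
      # shuffle: group intermediate pairs by key before reducing
      groups: Dict[str, List[int]] = defaultdict(list)
      for doc in documents:
          for key, value in map_phase(doc):
              groups[key].append(value)
      return dict(reduce_phase(k, v) for k, v in groups.items())

  print(run_mapreduce(["big data crunching", "big data streaming"]))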

The GE we present here is originally based on the MapReduce paradigm but extends it in order to offer high performance in batch mode while still making it suitable for stream processing. The following diagram offers a general overview of the Big Data Analysis GE, showing main blocks and concepts.

Figure 10: Big Data Analysis GE

As can be seen, a set of entry points or data injectors precedes all the other blocks. These injectors provide data to the processing core of the GE, either in streaming mode or by receiving input data that is directly copied into the distributed storage in order to perform batch processing. Once the results are ready, they are placed in a NoSQL database for future consumption in a high-throughput fashion. Finally, a frontend for administrative and operational purposes is added on top of all the other components.

What differentiates this GE from conventional Big Data platforms is primarily:

  • The streaming and batch processing functionalities, both in one single platform. Because batch and stream processing are managed using totally different approaches, today's Big Data platforms are oriented to a single type of data: large log files or continuous streams of data. The envisioned GE will be able to deal with both, firstly by allowing injectors for streams that will internally be turned into batches in order to apply MapReduce techniques (first releases), and then by performing truly differentiated batch and streaming processing (final releases).
  • The automatic deployment capabilities in a cloud-based cluster of nodes. Big Data platforms are designed to be deployed on a cluster of commodity hardware. This GE goes far beyond that and proposes replacing the physical machines with virtual nodes, providing means to automatically deploy on such a cloud-based cluster.
  • The wide range of available data injectors. The GE will expose a set of interfaces ready to accept data in several formats and ways, e.g. the above-mentioned stream injectors, but also agent-based gatherers of data and conventional file transfer systems.
  • The high-speed access to the resulting insights via a NoSQL database. Today's Big Data platforms rely on distributed file systems to store the input data and all its intermediate transformations, since that is the only way to manage the manipulation of large files. Nevertheless, the throughput of these distributed file systems is not high, which becomes especially critical when the same piece of data is accessed several times: the results. Thus, the BigData GE foresees using a NoSQL database into which the resulting insights are copied and from which they can be accessed with high throughput rates.

Critical product attributes

  • Streaming and batch processing functionalities both in one single platform.
  • Automatic deployment capabilities in a cloud-based cluster of nodes.
  • Wide range of available data injectors.
  • High speed access to the resulting insights via a NoSQL database.

Compressed Domain Video Analysis

Target usage

The target users of the Compressed Domain Video Analysis GE are all applications that want to extract meaningful information from video content and that need to automatically find characteristics in video data based on given tasks. The GE can work on previously stored video data as well as on video data streams (e.g., received from a camera in real time).

In the media era of the web, much content is user-generated (UGC) and spans every possible kind, from amateur to professional, nature, parties, etc. In such a context, video content analysis can provide several advantages for classifying content for later search, or for providing additional information about the content itself.

Example applications in different industries addressed by this Generic Enabler are:

  • Telecom industry: Identify characteristics in video content recorded by single mobile users; identify commonalities in the recordings across several mobile users (e.g., within the same cell).
  • Mobile users: (Semi-)automated annotation of recorded video content, point of interest recognition and tourist information in augmented reality scenarios, social services (e.g., facial recognition).
  • IT companies: Automated processing of video content in databases.
  • Surveillance industry: Automated detection of relevant events (e.g., alarms, etc.).
  • Marketing industry: Object/brand recognition and sales information offered (shops near user, similar products, etc.).

GE description

The Compressed Domain Video Analysis GE consists of a set of tools for analyzing video streams in the compressed domain. Its purpose is to avoid costly video content decoding prior to the actual analysis. Thereby, the tool set processes video streams by analyzing compressed or just partially decoded syntax elements. The main benefit is its very fast analysis due to a hierarchical architecture.

Its main characteristics can be summarized as follows:

  • It is a single central point of integration of several content detection/recognition technologies, also encompassing real-time processing.
  • It is capable of selecting the appropriate processing technologies based on the type of content submitted.
  • It provides a central feature for giving feedback on the results of the detection/recognition process.
  • It offers a set of APIs to allow processing of file and streaming content, both in real-time.

A generic description of the Compressed Domain Video Analysis GE is shown in Figure 13. It depicts the generic functional blocks of this GE.


Figure 13: Compressed Domain Video Analysis GE – Generic description


The four components of the Compressed Domain Video Analysis GE are Media Interface, Media (Stream) Analysis, Metadata Interface, and the API:

  • The Media Interface receives the media data in different formats. Several streams/files can be accessed in parallel (e.g., different RTP sessions can be handled). Different interchange formats, e.g. for streaming and for file access, can be realized. Example formats are:
    • Real-time Transport Protocol (RTP) as standardized in RFC 3550 [RFC3550]. Payload formats to describe the contained compression format can be further specified (e.g., RFC 3984 [RFC3984] for the H.264/AVC payload).
    • ISO Base Media File Format as standardized in ISO/IEC 14496-12 [ISO 08].
    • HTTP-based interfaces (e.g., REST-like APIs). URLs/URIs could be used to identify the relevant media resources.

    Two different usage scenarios are regarded:

    • File Access: A multimedia file has already been generated and is stored on a server in a file system or in a database. For analysis, the media file can be accessed independently of the original timing. This means that analysis can happen slower or faster than real-time and random access on the timed media data can be performed. The analysis is performed in the Media Analysis operation mode. Note that in some cases also a streaming interface might be used to realize file access, e.g., in case media playout is realized in addition to (automated) multimedia analysis. In this case, the Media Stream Analysis operation mode might be used.
    • Streaming: A multimedia stream is generated by a device (e.g., a video camera) and streamed over a network using dedicated transport protocols (e.g., RTP, DASH). For analysis, the media stream can be accessed only in its original timing, since the stream is generated in real time. The analysis is performed in the Media Stream Analysis operation mode.
  • The Media (Stream) Analysis component: The GE operates in the compressed domain, i.e., the media data is analyzed without prior decoding. This allows for low-complexity and therefore resource-efficient processing and analysis of the media stream. The analytics can happen on different semantic layers of the compressed media (e.g., packet layer, symbol layer, etc.). The higher (i.e., more abstract) the layer, the lower the necessary computing power. Some schemes work codec-agnostic (i.e., across a variety of compression/media formats) while other schemes require a specific compression format. In principle, the analytics operations can be done in real-time. In practical implementations, this depends on computational resources, the complexity of the algorithm, and the quality of the implementation. In general, low complexity implementations are targeted for the realization of this GE. In some more sophisticated realizations of this GE (e.g., crawling through a multimedia database), a larger time span of the stream is needed for analytics. In this case, real-time processing is in principle not possible and also not intended.
  • The Metadata Interface: A metadata format suited for subsequent processing should be used here. The format could, for instance, be HTTP-based (e.g., REST-like APIs) or XML-based. Binary encoding, e.g., using EXI [W3C 11] or BiM [ISO 06], is usually advantageous to avoid repeated parsing operations.
  • The API component is used to access and configure the Compressed Domain Video Analysis GE from outside.

A realization of the Compressed Domain Video Analysis GE consists of a composition of different types of realizations for the four building blocks (i.e., components). The core functionality of the realization is determined by the selection of the Media (Stream) Analysis component (and the related sub-components). Input and output format are determined by the selection of the inbound and outbound interface component, i.e., Media Interface and Metadata Interface components. The interfaces are stream-oriented.
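The following sketch illustrates, from a client's point of view, how the API and Metadata Interface of such a realization might be used; every endpoint, field and value below is a hypothetical assumption for illustration only, not a defined FI-WARE interface.

  import requests  # assumes the third-party 'requests' package is available

  ANALYSIS_GE = "http://video-analysis.example.org/api"   # hypothetical endpoint

  # Register a media stream for compressed-domain analysis (Media Interface side)
  job = requests.post(f"{ANALYSIS_GE}/analysis-jobs", json={
      "media": "rtp://camera7.example.org:5004",   # e.g., an RTP session
      "mode": "media-stream-analysis",             # streaming (real-time) mode
      "criteria": ["motion-events", "object-recognition"],
  }, timeout=5).json()

  # Retrieve extracted metadata for the job (Metadata Interface side)
  metadata = requests.get(f"{ANALYSIS_GE}/analysis-jobs/{job['id']}/metadata",
                          timeout=5).json()
  for item in metadata.get("events", []):
      print(item)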

Critical product attributes for the Compressed Domain Video Analysis GE are especially high detection/recognition ratios with only few false positives, and low-complexity operation. Furthermore, its partitioning into independent functional blocks enables the GE to support a variety of analysis methods on several media types and to be easily extended with new features. Several operations can even be combined. The mentioned attributes are also reflected in the Critical product attributes listed below.

Critical product attributes

  • General purpose enabler for compressed domain video analysis and information extraction
  • Automated detection of relevant/critical events in video streams
  • (Semi-)automated annotation & enrichment of video content
  • Low complexity algorithms for processing of video data in massively parallel streams and in huge databases
  • Parallel processing of a single video stream regarding different criteria

Unstructured data analysis

Target usage

In some domains there is a clear need to use high volumes of unstructured data coming from the Internet (blog posts, RSS feeds, news, etc.) in almost real time for later processing and analysis. Target users are any stakeholders that need to first transform unstructured data from the Internet into machine-readable data streams for further almost-real-time analysis, decision support systems, etc.

GE description

Information is amongst the most valuable assets in the future Internet. Most of the information existing in the current Web is of an unstructured nature (blog posts, HTML pages, news, feeds, etc.). The majority of existing applications today use only structured data, and therefore overlook all the potential hidden knowledge that resides in those unstructured resources. There is a clear need to provide a large-scale, near real-time, automatic data acquisition infrastructure that allows the processing of unstructured data from across the Web.

This data acquisition should transform this vast amount of data into processable streams in order to apply the information extraction algorithms needed to transform the raw data into machine-readable information. Algorithms for the extraction of high-level features (sentiments, opinions, etc.) tailored to the specific needs of different domains are also needed in this context.

Therefore the system should be able to generate and process massive non-structured and semi-structured data streams in a uniform manner. Once acquired, the data passes through multiple stages where it is processed, resulting in relevant knowledge being extracted. Each stage analyses and processes the received data, enriches it with annotations, and passes it to the next stage. In the final stage, the outcome is presented to the end user. The whole process can be divided into 4 main stages, as illustrated in Figure 17 below.


Figure 17: High Level Unstructured data processing enabler


The depicted process covers all functional parts with their proper order and direction of data processing. Pipelining is the fundamental idea of near real-time massive stream processing in the Unstructured Data Processing Enabler. Every stage of the pipeline is able to process data at the same time, ensuring the high throughput that is required for handling massive amounts of data.

The Unstructured Data Processing Enabler pipeline is shown in the figure below.


Figure 18: Architecture of the Unstructured data processing enabler pipeline


From the technical point of view, the pipeline consists of multiple stages (components, depicted as square blocks) that continuously process data as it is gathered in the form of document streams. The first part of the pipeline, called “Data Cleaning and Ontology Evolution”, performs the necessary preparatory and filtering steps in order to extract only the relevant raw text from the document streams and make it suitable for later processing. It consists of the following core components:

  • Language detection – a preliminary filter for documents that cannot be processed because of language limitations. Word-based algorithms or n-gram algorithms play a significant role at this step.
  • Boilerplate removal – filtering the content of each document by removing unnecessary and boilerplate information from web documents, such as advertisements, navigation elements, copyright notices, etc. Numerous techniques might be applied here, such as probabilistic or statistical methods or shallow text features.
  • Duplicate removal – ensuring that the same or similar documents are not processed twice. To achieve this, near-duplicates might be discarded by analysing, for instance, each document’s simhash (see the sketch after this list).
  • Off-topic removal, opinion spam removal – ensuring that only quality data can pass through, thus preventing spam from biasing the computation of high-level features.
  • Ontology evolution – semi-automatic topic ontology evolution from a massive textual stream of documents.

The second part of the processing pipeline, “Information Extraction and Sentiment Analysis”, applies NLP techniques in order to extract higher-level features, such as sentiments, from the document stream. This second part can be regarded as an intelligent-service generic enabler that extracts sentiments for a particular domain. Other relevant semantic features besides sentiments could potentially be extracted and analysed. In the case of sentiments, this enabler offers an ontology-supported process, making use of the knowledge base from the ontology evolution component to ensure that information acquisition improves over time. Extracted sentiments are stored and aggregated in such a way that it is possible to drill down from an aggregated sentiment result to the concrete point in the source document.
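
The following minimal sketch (Python) illustrates the pipelined structure described above. It is purely illustrative: the stage implementations (language filter, boilerplate stripping, simhash-based duplicate removal, lexicon-based sentiment) are toy placeholders, not the actual enabler.

  import hashlib, re

  def simhash(text, bits=64):
      # Very small simhash variant: hash each token and accumulate bit votes.
      v = [0] * bits
      for token in re.findall(r"\w+", text.lower()):
          h = int(hashlib.md5(token.encode()).hexdigest(), 16)
          for i in range(bits):
              v[i] += 1 if (h >> i) & 1 else -1
      return sum(1 << i for i in range(bits) if v[i] > 0)

  def language_ok(doc):
      # Placeholder language filter; a real system would use n-gram models.
      return doc.get("lang", "en") == "en"

  def remove_boilerplate(doc):
      # Placeholder: strip markup; real systems use shallow text features.
      doc["text"] = re.sub(r"<[^>]+>", " ", doc["text"])
      return doc

  def sentiment(doc):
      # Toy lexicon-based sentiment; the real enabler is ontology-supported.
      pos = sum(w in doc["text"].lower() for w in ("good", "great", "love"))
      neg = sum(w in doc["text"].lower() for w in ("bad", "poor", "hate"))
      doc["sentiment"] = "positive" if pos > neg else "negative" if neg > pos else "neutral"
      return doc

  def pipeline(stream):
      seen = set()
      for doc in stream:
          if not language_ok(doc):
              continue
          doc = remove_boilerplate(doc)
          h = simhash(doc["text"])
          if h in seen:            # duplicate removal (exact simhash match only here)
              continue
          seen.add(h)
          yield sentiment(doc)

  docs = [{"lang": "en", "text": "<p>I love this phone, great battery</p>"}]
  for annotated in pipeline(docs):
      print(annotated["sentiment"])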

Critical product attributes

  • Massive web-based information from different sources is extracted, cleaned and transformed to streams.
  • The resulting streams are ready to be analysed using different data analysis intelligent services.
  • The system provides an ontology evolution module that is capable of evolving an existing ontology for sentiment classification automatically.


Metadata Preprocessing

Target usage

Target users are all stakeholders that need to convert metadata formats or need to generate objects (as instantiations of classes) that carry metadata information. The requirement to transform metadata typically stems from the fact that, in real life, various components implementing different metadata formats need to inter-work, since products from different vendors are typically plugged together. In this case, the “Metadata Preprocessing” GE acts as a mediator between the various products.

GE description

Figure 19 depicts the components of the “Metadata Preprocessing” Generic Enabler. These functional blocks are the Metadata Interface for inbound streams, Metadata Transformation, Metadata Filtering, and Metadata/Class Interface for outbound (processed) streams.


Figure 19: GE “Metadata Preprocessing”


The functionality of the components is described in the following.

  • Metadata Interface: This is the interface for inbound streams. Different interchange formats, e.g. for streaming and file access, can be realized. An example format is the Real-time Transport Protocol (RTP) as standardized in RFC 3550 [RFC3550]. Depending on the application, different packetization formats for the contained payload data (i.e., the metadata) might be used.
  • Metadata Transformation: The Metadata Transformation component is the core component of this Generic Enabler. The processing of the metadata is performed based on the XML Stylesheet Language for Transformations (XSLT) and a related stylesheet. In principle, other kinds of transforms (other than XSLT) can also be applied. The output of this step is a new encapsulation of the received metadata. This could also be an instantiation of a class (e.g., Java, C++, C#).
  • Metadata Filtering: Metadata Filtering is an optional step in the processing chain. The filtering can be used, e.g., for thinning and aggregation of the metadata, or simple fact generation (i.e., simple reasoning on the transformed metadata).
  • Metadata/Class Interface: Through this interface, the transformed (and possibly filtered) metadata or metadata stream is accessed. Alternatively, instantiated classes containing the metadata can be received.

Figure 20 shows an example realization of the “Metadata Preprocessing” Generic Enabler.


Image:Example_metadata_pre-processing_chain.jpg
Figure 20: Example metadata preprocessing chain


The internal structure is implemented using a plug-in approach in this example, but this need not necessarily be the case. In the example, timed metadata is received over an RTSP/RTP interface, which implements the metadata interface for inbound data/streams. Different RTP sessions can be handled; therefore metadata streams can be received from several devices (e.g., cameras or other types of sensors). The target in such a realization could be the provision of metadata as facts to the metadata broker, which would be the receiver of the outbound stream. Internally, the metadata items (‘facts’) are represented by one class per task, which leads to Java classes with a flat hierarchy. (In principle, derivation of classes for the representation of metadata could also be used if there are advantages for the application.) As mentioned above, the Metadata Transformation itself is performed by an XSLT stylesheet. The schema for this transform defines an XML-to-‘XML serialized JavaBeans’ transformation, which produces the Java classes that incorporate the metadata. Individual stylesheets can be used in order to customize the different metadata schemas. Further metadata filtering can be plugged in, which, however, was not necessary in the described XML-to-‘XML serialized JavaBeans’ application.
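
As an illustration of the stylesheet-driven transformation step (not the actual FI-WARE implementation), the following sketch uses Python and lxml to apply an XSLT stylesheet to an inbound metadata item; the element names and stylesheet are purely hypothetical.

  from lxml import etree

  # Hypothetical inbound metadata item (e.g., carried in an RTP payload).
  inbound = etree.fromstring(
      "<observation><sensor>cam-01</sensor><value>42</value></observation>")

  # Hypothetical stylesheet: re-encapsulates the metadata in a new format.
  stylesheet = etree.fromstring("""
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/observation">
      <fact source="{sensor}">
        <xsl:value-of select="value"/>
      </fact>
    </xsl:template>
  </xsl:stylesheet>""")

  transform = etree.XSLT(stylesheet)
  outbound = transform(inbound)        # new encapsulation of the metadata
  print(str(outbound))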

The external API is yet to be defined, but we envision a RESTful API that permits easy integration into web services and other components requiring metadata access and transformation services.

Critical product attributes

  • Encapsulation of transport and metadata transformation as-a-service, usable from other web applications or components
  • Generic metadata transformation approach
  • Transformation based on standardized and commonly used XML Stylesheet Language for Transformations (XSLT).
  • In addition to encapsulation in (XML- or JSON-based) metadata formats, incorporation of the metadata into objects (e.g., serialized Java/C++/C# classes) can also be realized (by simply exchanging the XSLT stylesheet).

Location Platform

Target usage

The Location GE in FI-WARE targets any application, GE in FI-WARE, or complementary platform enabler that aims to retrieve mobile device positions and location-area events. The Location GE is based on various positioning techniques such as A-GPS, WiFi and Cell-Id, whilst taking into account end-user privacy.

This GE addresses issues related to Location of mobile devices in difficult environments such as urban canyons and light indoor environments where the GPS receiver in the mobile device is not able to acquire GPS signals. It improves GPS coverage whilst waiting for a GPS position, which helps to enhance the user experience of end-users using location-aware applications through their mobile handsets, and the performance of applications requesting the position of mobile devices.

GE description

The following figure describes the main modules of the Location GE, also called SUPL Location Platform (SLP) as detailed in the Open Mobile Alliance (OMA) standard. This platform relies on two fully standardised protocols:

  • SUPL: The “Secure User Plane Location” protocol facilitates the communication between the SUPL Location Platform (SLP) and any SUPL Enabled Terminal (SET) over TCP/IP. Two main scenarios are standardised:
  • Set-initiated: the SET requests GPS assistance data from the SLP to compute a GPS fix or requests the computation of a WiFi position from measurements,
  • Net-initiated: a third party application requests the position of a SET via MLP (see below) which triggers the sending of a binary SMS to the SET by the SLP. The mobile can then communicate with the SLP over TCP/IP to exchange assistance data, WiFi measurements or send event reports.
  • MLP: The “Mobile Location Protocol” facilitates the communication to and from an application such as Yellow Pages over HTTP/XML. The request contains various parameters that are used by the SLP to determine the best or preferred location method to use and return mobile positions or event reports. Other alternatives for the interface between the SLP and the applications, wrapping the usage of MLP, are being considered but are still under discussion (see section 4.4.2).


Image:SUPL_Location_Platform.jpg
Figure 21: SUPL Location Platform


The following describes the main modules of the SLP:

  • Access Control and Privacy Management: A third party requesting the location of an end-user using a SET is first authorised based on login/password, the application invoked, and the number of location requests per month or per second. If the request is accepted, the privacy of the end-user is then verified based on a global profile for all applications or based on a specific application profile. End-users configure their profiles via SMS or web pages (self-care); a few examples are listed below:
  • Localisation allowed permanently, once, on specific time windows of the day, or refused,
  • Localisation allowed for a specific level of accuracy: low (CID), medium (WiFi), high (GPS),
  • Localisation allowed for a list of friends at the origin of the location request (contained in the MLP request), with the list of friends managed by the appropriate Security, Trust and Privacy GEs,
  • Option to receive a notification SMS each time the end-user is being localised,
  • Option to activate or deactivate the caching of their location.
  • Quality of Positioning (QoP) Manager: Based on parameters contained in MLP request or set-init request and the rough location of the SET (cell-id), the SLP is able to select the best positioning method to use. This selection mechanism is dynamically configurable in the internal cell database and multiple location technologies can be triggered.

The criteria for the location technology supported in the terminal can also be fully configured inside the cell database. For example, a yellow pages application may want a very quick and rough location, whereas a “find a friend” application may be more permissive regarding the latency in getting a friend's location.

When multiple locations are returned by the terminal, the QoP manager is in charge of selecting the best location to use or even perform hybridisation of those locations to generate the final position fix.

A coherence check of the two locations is also performed to ensure the integrity of the end-user location; this feature is also called location authentication.

  • SPC: The SUPL Positioning Centre is in charge of the position calculation of the SET, based on the following positioning techniques:
  • A-GPS: Based on GPS & SBAS receivers, the SPC computes GPS assistance data that is used by the SET to enhance its time to first fix and receiver sensitivity, and to offer worldwide coverage of the service. The platform offers additional enhancements related to GPS integrity (allowing the detection of a faulty GPS satellite) and GPS differential corrections used to smooth pseudo-distance measurement degradations. This technique requires the SPC to know the rough location of the terminal with an uncertainty of hundreds of kilometres. The Cell-Id database is either provided by a third party or can be automatically provisioned.
  • WiFi: Based on WiFi hotspot signal strength measurements sent by the SET to the SLP, the SPC is able to compute a position by a standard triangulation technique. This technique requires the SPC to have a direct mapping between a hotspot's Medium Access Control (MAC) address and its position.
  • Cell-Id: Cell identifiers sent by the terminal to the SPC are converted to a position thanks to the internal cell database, which can be provisioned dynamically.

The SUPL Positioning Centre is also in charge of event triggering and the processing of event reports from the terminal, as requested by a third party via the MLP interface. The event triggering facilitates the following scenarios:

  • Inside: Each time the terminal is within a specific area, it will send a report back to the SPC which is transferred back to the original third-party application as an event-driven response (Trigger Location Report). Note that it is possible to command the terminal to send periodic reports at configurable intervals as long as it is within the area.
  • Outside: Each time the terminal is outside a specific area, a report will be sent back to the third-party application,
  • Entering: Each time the terminal enters a specific area, a report will be sent back to the third-party application,
  • Leaving: Each time the terminal leaves a specific area, a report will be sent back to the third-party application.

All those reports are based on the area requested by the third-party application in the MLP request, which can be a polygon, a list of GSM cells or even one single cell. In case a polygon is requested, the SPC computes all GSM cells that are within this polygon or on its border, to make the terminal-side computation easier.
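
A minimal sketch of the kind of area-event logic described above (illustrative only, not the SLP implementation): the check reduces to a point-in-polygon test plus a comparison with the previous state to derive inside/outside/entering/leaving reports.

  def point_in_polygon(lon, lat, polygon):
      # Ray-casting test; polygon is a list of (lon, lat) vertices.
      inside = False
      j = len(polygon) - 1
      for i in range(len(polygon)):
          xi, yi = polygon[i]
          xj, yj = polygon[j]
          if ((yi > lat) != (yj > lat)) and \
             (lon < (xj - xi) * (lat - yi) / (yj - yi) + xi):
              inside = not inside
          j = i
      return inside

  def area_event(prev_inside, lon, lat, polygon):
      now_inside = point_in_polygon(lon, lat, polygon)
      if now_inside and not prev_inside:
          return "entering", now_inside
      if prev_inside and not now_inside:
          return "leaving", now_inside
      return ("inside" if now_inside else "outside"), now_inside

  # Hypothetical geo-fence and two consecutive position fixes.
  fence = [(2.0, 48.0), (2.1, 48.0), (2.1, 48.1), (2.0, 48.1)]
  state = False
  for lon, lat in [(2.05, 48.05), (2.5, 48.05)]:
      event, state = area_event(state, lon, lat, fence)
      print(event)   # -> entering, then leaving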

Event triggering is illustrated in the following figure, taken from the OMA SUPL standard, with a third-party application requesting periodic reporting of a mobile inside a specific cell:


Image:Event_Triggers.jpg
Figure 22: Event Triggers

Critical product attributes

  • Provides mobile location and geo-fencing events based on standard lightweight protocols.
  • Fully configurable end-user privacy management per third-party application.
  • Best in class GPS assistance data allowing a high service availability, worldwide.
  • Dynamic location technology selection based on end-user environment.


Media-enhanced Query Broker

Target usage

The Media-enhanced Query Broker GE provides an intelligent, abstracting interface for retrieval of data from the FI-WARE data management layer. This is provided in addition to the publish/subscribe interface as another modality for accessing data.

Principal users of the Media-enhanced Query Broker GE include applications that require a selective, on-demand view of the content/context data in the FI-WARE data management platform via a single, unified API, without having to take care of the specifics of the internal data storage and DB implementations and interfaces.

Therefore, this GE provides support for integration of query-functions into the users’ applications by abstracting the access to databases and search engines available in the FI-WARE data management platform while also offering the option to simultaneously access outside data sources. At the same time its API offers an abstraction from the distributed and heterogeneous nature of the underlying storage, retrieval and DB / metadata schema implementations.

The Media-enhanced Query Broker GE provides support for highly regular (“structured”) data such as that used in relational databases and queried by SQL-like languages. On the other hand, it also supports less regular “semi-structured” data, which is quite common in the XML tree-structured world and can be accessed by the XQuery language. Another data structure supported by the Media-enhanced Query Broker is RDF, a well-structured graph-based data model that is queried using the SPARQL language. In addition, the Media-enhanced Query Broker GE provides support for specific search and query functions required in (metadata-based) multimedia content search (e.g., image similarity search using feature descriptors).

The question about how non-relational or “NoSQL” databases, which are becoming an increasingly important part of the database landscape, can be integrated, is one of the open points to be addressed during the FI-WARE project.

The underlying approach for the extension of the Media-enhanced Query Broker GE is to identify families of (abstract) query languages (based on minimum common denominators of existing query languages), together with preferred representatives, allowing the capabilities of the data resources to be categorized with respect to what can be queried and how.

GE description

Main Functionality

The Media-enhanced Query Broker GE is implemented as a middleware to establish unified retrieval in distributed and heterogeneous environments with extensions for integrating (meta-)data in the query and retrieval processes. To ensure interoperability between the query applications and the registered database services, the Media-enhanced Query Broker is based on the following internal design principles:

  • Query language abstraction:
The Media-enhanced Query Broker GE will be capable of handling queries formulated in any of a defined set of query languages/APIs (e.g., XQuery, SQL or SPARQL) used by the services for retrieval. Following this concept, all incoming queries will be converted into an internal abstract format that will then be translated into the respective specific query languages/APIs when accessing the actual data repositories (see the sketch after this list). Addressing this requirement, this abstract query format may be based on and extend XQuery functionalities. As an example, this format may be based on the MPEG Query Format (MPQF) [Smith 08], which supports most of the functions in traditional query languages and also incorporates several types of multimedia-specific queries (e.g., temporal, spatial, or query-by-example). By this, requests focusing on data-centric evaluation (e.g., exact matches by comparison operators) are inherently supported.
  • Multiple retrieval paradigms
Retrieval systems do not always follow the same data retrieval paradigms. Here, a broad variety exists, e.g. relational, NoSQL or XML-based storage, or triple stores. The Media-enhanced Query Broker GE attempts to shield the applications from this variety. Further, it is likely in such systems that more than one database has to be accessed for query evaluation. In this case, the query has to be segmented and distributed to the applicable retrieval services. This way, the Media-enhanced Query Broker GE acts as a federated database management system.
  • Metadata format interoperability:
For an efficient retrieval process, metadata formats are used to describe syntactic or semantic attributes of resources. Currently there exists a huge number of standardized and proprietary metadata formats for nearly any use case or domain. Therefore it can be expected that more than one metadata format is in use in a heterogeneous retrieval scenario. The Media-enhanced Query Broker GE therefore provides functionalities to perform the transformation between diverse metadata formats where a defined mapping exists and is made available.
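
The following sketch (Python, purely illustrative and not based on the actual MPQF schema) shows the idea behind the query-language abstraction: a small internal condition tree is translated into either a SQL or a SPARQL fragment, depending on the backend being addressed.

  # Hypothetical internal condition tree: (field, operator, value) leaves
  # combined with boolean operators, loosely in the spirit of MPQF.
  query = ("AND",
           ("title", "contains", "sunset"),
           ("year", ">", 2010))

  def to_sql(node):
      if node[0] == "AND":
          return " AND ".join(to_sql(child) for child in node[1:])
      field, cmp, value = node
      if cmp == "contains":
          return f"{field} LIKE '%{value}%'"
      return f"{field} {cmp} {value!r}"

  def to_sparql(node):
      # Produces only the FILTER expression of a SPARQL query.
      if node[0] == "AND":
          return " && ".join(to_sparql(child) for child in node[1:])
      field, cmp, value = node
      if cmp == "contains":
          return f'CONTAINS(STR(?{field}), "{value}")'
      return f"?{field} {cmp} {value}"

  print("WHERE " + to_sql(query))
  print("FILTER(" + to_sparql(query) + ")")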

Query Processing Strategies

The Media-enhanced Query Broker GE is a middleware that can be operated in different facets within a distributed and heterogeneous search and retrieval framework including multimedia retrieval systems. In general, the tasks of each internal component of the Media-enhanced Query Broker (see Figure 24) depend on the registered databases and on the use cases.

In this context, two main query processing strategies are supported, as illustrated in Figure 23.


(a) Local processing (b) Distributed processing
Figure 23: Query processing strategies


The first paradigm deals with registered and participating retrieval systems that are able to process the whole query locally, see Figure 23(a). In this sense, those heterogeneous systems may provide their own local metadata format and a local/autonomous data set. A query transmitted to such systems is understood as a whole, and the items of the result set are the outcome of an execution of the query. In case of differing metadata formats in the back ends, a transformation of the metadata format may be needed before the (sub)query is transmitted. In addition, depending on the degree of overlap among the data sets, the individual result sets may contain duplicates. However, the result aggregation process only needs to perform an overall ranking of the result items of the involved retrieval systems. Here, duplicate elimination algorithms may be applied as well.

The second paradigm deals with registered and participating retrieval systems that allow distributed processing on the basis of a global data set, see Figure 23(b). The involved heterogeneous systems may rely on different data representations (e.g., ontology-based semantic annotations and XML-based feature values) and query interfaces (e.g., SPARQL and XQuery), but describe a common (linked) global data set. In this context, a query transmitted to the Media-enhanced Query Broker needs to be evaluated and optimized, which results in a specific query execution plan. Subsequently, segments of the query are forwarded to the respective engines and executed. Now, the result aggregation has to deal with a correct consolidation and (if required) format conversion of the partial result sets. In this context, the Media-enhanced Query Broker GE behaves like a federated database management system.

Media-enhanced Query Broker Architecture

Figure 24 illustrates an end-to-end workflow in a distributed retrieval scenario. At its core, the Media-enhanced Query Broker GE transforms incoming user queries (of different formats) into a common internal representation for further processing and distribution to the registered data resources, and aggregates the returned results before delivering them back to the client. In the following, the subcomponents of a potential reference implementation of the Media-enhanced Query Broker GE, based on internal usage of the MPEG Query Format (MPQF), are briefly described.


Figure 24: Architecture of the Media-enhanced Query Broker


Backend Management Layer

The main functionalities of the Backend Management Layer are the (de-)registration of backends with their capability descriptions and the service discovery for the distribution of queries. These capability descriptions are standardized in ISO 15938-12, allowing the specification of the retrieval characteristics of registered backends. Such characteristics consider, for instance, the supported query types or metadata formats. Subsequently, depending on those capabilities, this component is able to filter registered backends during the search process (service discovery). For a registered retrieval system, it is very likely that not all functions specified in the incoming queries are supported. In such an environment, one of the important tasks for a client is to use the service discovery to identify the backends which provide the desired query functions or support the desired result representation formats, identified by e.g. a MIME type.

MPQF Factory Layer

The main purpose of the MPQF Factory Layer is the generation and validation of (internal) MPQF queries. The transformation of incoming user queries is handled through an API. In general, the internal MPQF query representation consists of two main parts. First, the QueryCondition element holds the filter criteria in an arbitrarily complex condition tree. Second, the OutputDescription element defines the structure of the result set; in this object, the required information about result items, grouping or sorting is stored. After finalizing the query creation step, the generated MPQF query is registered with the Media-enhanced Query Broker. A set of query templates on the client side can be established to simplify the query creation process using the API approach. If an instance of a query is already created on the client side in MPQF format, this query is directly registered with the Media-enhanced Query Broker.

This layer optionally also encapsulates interfaces for inserting preprocessing plug-ins. These could, for example, perform file conversions.

Query Management Layer

The Query Management Layer organizes the registration of queries and their distribution to the applicable retrieval services. After the registration of the entire query under a unique identifier, the distribution of the query depends on the underlying search concept. For the local processing scenario, the whole query is transmitted to the backends in parallel. In contrast, in a distributed processing scenario, the query is automatically divided into segments by analyzing the query types used in the condition tree. Here, well-known tree algorithms like depth-first search can be used. The key intention of this segmentation is that every backend only gets a query segment which it can process as a whole. In addition, the transformation between metadata formats is another task of this management layer. In order to monitor and manage the progress of received queries, the Media-enhanced Query Broker implements the following query lifecycle: pending (query registered, process not started), retrieval (search started, some results missing), processing (all results available, aggregation in progress), finished (result can be fetched) and closed (result fetched or query lifetime expired). These states are also valid for the individual query segments, since they are themselves valid MPQF queries.
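
A minimal sketch of the query lifecycle described above (illustrative only; the state names follow the text, the allowed transitions are assumptions):

  from enum import Enum

  class QueryState(Enum):
      PENDING = "pending"        # query registered, process not started
      RETRIEVAL = "retrieval"    # search started, some results missing
      PROCESSING = "processing"  # all results available, aggregation in progress
      FINISHED = "finished"      # result can be fetched
      CLOSED = "closed"          # result fetched or query lifetime expired

  # Assumed transitions; CLOSED is reachable from any state via lifetime expiry.
  TRANSITIONS = {
      QueryState.PENDING: {QueryState.RETRIEVAL, QueryState.CLOSED},
      QueryState.RETRIEVAL: {QueryState.PROCESSING, QueryState.CLOSED},
      QueryState.PROCESSING: {QueryState.FINISHED, QueryState.CLOSED},
      QueryState.FINISHED: {QueryState.CLOSED},
      QueryState.CLOSED: set(),
  }

  def advance(current, target):
      if target not in TRANSITIONS[current]:
          raise ValueError(f"illegal transition {current} -> {target}")
      return target

  state = QueryState.PENDING
  for nxt in (QueryState.RETRIEVAL, QueryState.PROCESSING,
              QueryState.FINISHED, QueryState.CLOSED):
      state = advance(state, nxt)
  print(state.value)   # closed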

MPQF Interpreter

MPQF interpreters act as mediators between the Media-enhanced Query Broker and a particular retrieval service. An interpreter receives an MPQF-formatted query and transforms it into native calls of the underlying query language of the backend database or search engine system. In this context, several interpreters (mappers) for heterogeneous data stores have been implemented (e.g., Flickr, XQuery, etc.). Furthermore, an interpreter for object or relational data stores is envisaged. After a successful retrieval, the interpreter converts the result set into a valid MPQF-formatted response and forwards it to the Media-enhanced Query Broker.


Response Layer and (planned) Backend Benchmarking Layer

The Response Layer performs the result aggregation and returns the aggregated result set. The current implementation provides a Round Robin aggregation mechanism. Additional result aggregation algorithms are under consideration [Döller 08b], which also could take advantage of the Backend Benchmarking Layer.
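
The round-robin aggregation mentioned above can be sketched as follows (illustrative only): ranked result lists from several backends are interleaved one item at a time, with duplicates dropped.

  from itertools import zip_longest

  def round_robin_merge(result_lists):
      """Interleave ranked result lists and drop duplicate item identifiers."""
      merged, seen = [], set()
      for rank_slice in zip_longest(*result_lists):
          for item in rank_slice:
              if item is not None and item not in seen:
                  seen.add(item)
                  merged.append(item)
      return merged

  backend_a = ["img-17", "img-03", "img-99"]
  backend_b = ["img-03", "img-42"]
  print(round_robin_merge([backend_a, backend_b]))
  # ['img-17', 'img-03', 'img-42', 'img-99']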

In order to describe the main advantage of the (planned) Backend Benchmarking Layer (BBL), let us assume a scenario as shown in Figure 23(a). There, for instance, image retrieval may be realized by a query-by-example search. A query may be sent directly to the Media-enhanced Query Broker GE and the whole query is distributed to the applicable backends. The major task in this case is not the distribution, but the aggregation of the different result sets. The Media-enhanced Query Broker GE has to aggregate the results on the one hand by eliminating all duplicates and on the other hand by performing a ranking of the individual result items. The initial implementation uses the round-robin approach, which provides efficient processing of result sets of autonomous retrieval systems. However, it is conceivable that different backends use different implementations and quality measures for processing the fuzzy retrieval, leading to quality discrepancies between the result sets. Therefore, similar to approaches such as [Crashwell 99], where statistics about sources are collected, the BBL will provide information about the search quality of the backends, supporting a more intelligent re-ranking and aggregation of the result set. This information and a respective query classification model may be realized by a new benchmarking environment that allows the search quality of registered backends to be rated. This subcomponent is currently under investigation.

Critical product attributes

  • Middleware component for unified access to distributed and heterogeneous repositories (with extensions supporting multimedia repositories)
  • Provisioning of metadata format interoperability via schema transformation
  • Abstraction from heterogeneous retrieval paradigms in the underlying data bases and search engines

Semantic Annotation

Target usage

Target users are all stakeholders that want to enrich textual data (tags or text) with meaningful and external content.

In the media era of the web, much content is text-based or partially contains text, either as the medium itself or as metadata (e.g. title, description, tags, etc.). Such text is typically used for searching and classifying content, either through folksonomies (tag-based search), predefined categories, or full-text queries. To limit information overload with meaningless results, there is a clear need to assist this searching process with semantic knowledge, thus helping to clarify the intention of the user. This knowledge can be further exploited not only to provide the requested content, but also to enrich results with additional, yet meaningful, content, which can further satisfy the user's needs.

Semantics, and in particular Linked Open Data (LOD), is helpful not only in annotating and categorizing content, but also in providing additional rich information that can improve the user experience.

As end-user content can be of any type, and in any language, such an enabler requires a general-purpose and multilingual approach to the annotation task.

Typical users or applications can thus be found in the areas of eTourism or eReading, where content can benefit from such functionality when visiting a place or reading a book, for example by being provided with additional information regarding the location or cited characters.

The pure semantic annotation capabilities can be regarded as helpful for editors to categorize content in a meaningful manner, thus limiting ambiguous search results (e.g. an article would not simply be tagged with “apple”, but with the exact concept intended, i.e. the fruit, New York City or the brand).

GE Description

Figure 25 depicts the components of the “Semantic Annotator” Generic Enabler. The internal functional blocks are the Text Processor, the Semantic Broker and related resolvers, the Semantic Filter, the Aggregator, and the API.


Figure 25: Semantic Annotation GE


The functionalities of the components are described in the following.

  • Text Processor: this component is in charge of performing a first analysis of the text to be annotated, namely language detection, text cleaning, and natural-language processing. The goal of this component is to pre-process the original text to extract useful information for the next step, for example by identifying specific terms that may be of higher interest for the user. Depending on the tools used in this component, a rich multiword Named Entity Recognition can be performed, as well as some generic text classification to assist the following process.
  • Semantic Broker: this component is composed of a broker itself, assisted by a set of resolvers that can perform full-text or term-based analysis based on the previous output. Such resolvers are aimed at providing candidate semantic concepts referring to Linked Open Data as well as additional related information if available. The exact set of resolvers, and how they are invoked is an implementation issue. Resolvers may be domain- or language-specific, or general purpose.
  • Semantic Filter: This component is essential for filtering out candidate LOD concepts coming from the broker, and can include algorithms for scoring results (potentially provided by the individual resolvers), and for ranking and validating candidates. This component is thus responsible for resolving disambiguation by comparing results with each other, and with the original context, to obtain the best-fitting concept. This filter can further cross-check the consistency and relations between candidate concepts to improve disambiguation.
  • Aggregator: This component is in charge of further expanding information related to the concepts identified after the filtering process, thus possibly aggregating information from related concepts, and/or from several resolvers to provide a unique composite view of the concept related to a textual term.
  • API: Through this interface, the semantic annotation process is triggered and provides the candidate LOD concepts and related links.

Figure 26 shows an example realization of the “Semantic Annotator” Generic Enabler.


Image:Example_Semantic_Annotator.jpg
Figure 26: Example Semantic Annotator


The internal structure of this example implementation uses a plug-in approach for invoking resolvers, but this need not necessarily be the case. In the example, the Text Processor is implemented using the FreeLing language analyzer, coupled with a language analyzer to identify the text language first. This feature distinguishes the enabler from most current annotators (e.g. Zemanta, Evri), which are mostly focused on English-based concepts and content.

The resolvers used in this implementation are mainly DBPedia and Geonames (for location concepts only), as they prove to be generic enough to annotate any type of text.

The currently implemented filter applies an algorithm based on syntactic and semantic matching of the concepts against the original terms.

The API module provides a JSON-based interface for remote interaction with the enabler. This API provides a list of candidate concepts and related information for each term considered ‘relevant’ within the original text.
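
As a purely hypothetical illustration of how such a JSON-based annotation API might be consumed (the endpoint, parameters and response fields below are assumptions, not the actual interface):

  import json
  import urllib.request

  def annotate(text, endpoint="http://example.org/semantic-annotator/annotate"):
      """Send text to a hypothetical annotation endpoint and return candidates."""
      payload = json.dumps({"text": text}).encode("utf-8")
      request = urllib.request.Request(
          endpoint, data=payload, headers={"Content-Type": "application/json"})
      with urllib.request.urlopen(request) as response:
          return json.loads(response.read().decode("utf-8"))

  # Assumed response shape: one entry per 'relevant' term, each with a list of
  # candidate LOD concepts and related links, e.g.
  # {"annotations": [{"term": "apple",
  #                   "candidates": [{"uri": "http://dbpedia.org/resource/Apple",
  #                                   "score": 0.92}]}]}
  result = annotate("I ate an apple in New York")
  for annotation in result.get("annotations", []):
      print(annotation["term"], "->", [c["uri"] for c in annotation["candidates"]])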

Critical product attributes

  • Semantic annotation as a service: general purpose enabler to provide related LOD concepts and information to any text
  • Dual usage: editorial (semantic classification) and end-user (content augmentation)
  • Multilingual support
  • Pluggable approach for adding resolvers


Semantic Application Support

Target usage

Target users are mainly ontology engineers and developers of semantically-enabled applications that need RDF storage and retrieval capabilities. Other GEs from FI-WARE, such as the GE for semantic service composition or the Query Broker, as well as the usage areas of the PPP that need a semantic infrastructure for storage and querying, are also target users of this GE.

GE Description

Ten years have passed since Tim Berners-Lee envisioned a new future for the Web, the Semantic Web. In this future, the Web, which had been mostly understandable only by humans, would evolve into a machine-understandable Web, increasing its exploitation capabilities. During these years, the Semantic Web has focused the efforts of many researchers, institutions and IT practitioners. As a result of these efforts, a large number of mark-up languages, techniques and applications, ranging from semantic search engines to query answering systems, have been developed. Nevertheless, the adoption of the Semantic Web by the IT industry has been a hard and slow journey.

In the past few years, several authors have theorized about the reasons preventing the adoption of the Semantic Web paradigm. These reasons can be categorized into two main groups: technical reasons and engineering reasons. Technical reasons focus on the lack of infrastructure to meet industry requirements in terms of scalability, distribution, security, etc. Engineering reasons stress that methodologies, best practices and supporting tools are needed to allow enterprises to develop Semantic Web applications in an efficient way.

The Semantic Application Support enabler will address both engineering and technical aspects, from a data management point of view, by providing:

  • An infrastructure for metadata publishing, retrieving and subscription that meets industry requirements like scalability, distribution and security.
  • A set of tools for infrastructure and data management, supporting most adopted methodologies and best practices.

Therefore, the Semantic Web Application Support enabler will allow users of the GE to develop high-quality Semantic Web based applications efficiently and effectively.

Figure 27 illustrates the architecture of the Semantic Web Application Support Enabler.

Three main areas can be identified: Semantic Infrastructure, Semantic Engineering and External Components.

  • Semantic Infrastructure contains services and APIs providing core functionalities for both Semantic Engineering and External components.
  • Semantic Engineering contains tools and services built on top of Semantic Infrastructure functionality that provides ontology management and engineering features to human users.
  • External Components are the clients of the functionality provided by this GE. They comprise software agents that take advantage of the functionality provided by the Semantic Infrastructure. In the scope of FI-WARE, two main kinds of External Components can be foreseen: GEs from other FI-WARE areas and applications developed by the FI-WARE Usage Areas. As Semantic Engineering and External Components share the same Semantic Infrastructure, humans can use the engineering functionality to easily modify and manage the behaviour of External Components by modifying the stored information.


Figure 27: Semantic Web Application Support Enabler architecture


The Semantic Infrastructure is composed of two layers: the storage layer and the utility layer.

The storage layer contains components providing storage for semantics-based metadata. The repository stores RDF triples, while the registry stores ontologies, making them available through the HTTP protocol. Both the repository and the registry should meet strong security, scalability and performance requirements in order to support large-scale applications.

The utility layer contains components providing the business logic that allows applications to exploit the semantic capabilities of RDF plus ontologies. These components include:

  • Querying component, which allows the execution of SPARQL queries against an RDF repository (see the sketch after this list).
  • Publishing component, which allows the publication of RDF data to a repository.
  • Subscribing component, which allows subscription to specific data.
  • Reasoning component, which allows the generation of new knowledge from current knowledge and ontologies.
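
A minimal sketch of the querying and publishing functionality, under the assumption of an RDF store accessible through a library such as rdflib (the vocabulary and data are made up for illustration):

  from rdflib import Graph

  g = Graph()

  # "Publishing": add RDF triples to the repository (here, an in-memory graph).
  g.parse(data="""
  @prefix ex: <http://example.org/> .
  ex:sensor1 ex:locatedIn ex:Madrid ;
             ex:measures  ex:Temperature .
  """, format="turtle")

  # "Querying": run a SPARQL query against the repository.
  results = g.query("""
  PREFIX ex: <http://example.org/>
  SELECT ?sensor WHERE { ?sensor ex:locatedIn ex:Madrid . }
  """)

  for row in results:
      print(row.sensor)   # -> http://example.org/sensor1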

For performance reasons, reasoning functionality has historically been provided by repositories rather than by a dedicated component. In the scope of FI-WARE, the separation between the reasoning and storage components will need further investigation.

The Semantic Engineering is also composed of two layers: the services layer and the tools layer.

The services layer contains services that support processes related to ontology and data engineering, such as ontology modularization, ontology translation, ontology search, inconsistency resolution, etc. As engineering practices and methodologies are continually evolving, the services layer will allow the deployment of new services. The interfaces of these services still need to be identified.

The tools layer presents a set of tools supporting ontology engineering processes and ontology development methodologies in an integrated way. As methodologies and engineering processes are continually evolving, the tools layer should provide mechanisms to integrate new tools into the current framework. An initial set of tools has been identified:

  • The ontology browser, which allows users to navigate and visualize ontology networks.
  • The ontology editor, which allows users to edit ontologies stored in the system.
  • The repository management tool, which allows users to interact with the repository in the Semantic Infrastructure area.

Critical product attributes

  • Provide an infrastructure for Semantic Web applications that supports large-scale applications, including: metadata storage in RDF, publication of RDF triples, SPARQL querying and inference.
  • Provide a framework for supporting methodologies and engineering processes related to metadata management and ontology development.


Generic Enablers Implementing Intelligent Services

The Intelligent Services plug-ins interact with the off-line and real-time stream processing enablers, as well as with data that resides in memory and the persistence layer, to provide analytical and algorithmic capabilities in the following main areas: a) Social Network Analysis, b) Mobility Analysis, c) Real-Time Recommendations, d) Behavioural and Web Profiling, and e) Opinion Mining.

These services will be consumed by other FI-WARE components to provide a personalised user interaction, either by adapting their functionality and behaviour based on the user profiles and aggregated knowledge generated (for example the social communities or common itineraries of a user), or by embedding these functionalities into those components (for example displaying recommendations provided by the real-time recommendations service).

Social Network Analysis

GE Description

The Social Network Analysis (SNA) plug-ins analyse the social interactions of users to unveil their social relationships and social communities, and also build a social profile of both individuals and communities. Social Network intelligent services will add the social dimension to the platform, necessary for providing a truly personalised interaction to users.

Data capturing social interactions between users (Facebook posts or comments, re-tweets, SMSs sent and received…) will be processed and analysed to build a network of social connections, including the strength and nature of such connections. The social network derived from interactions will be further analysed to:

  • Detect the social communities that appear on the network.
  • Build a social profile for the individual user: social connectivity, potential influence on peers, number of communities the user belongs to, etc.
  • Build a social profile for communities: number of members, group cohesion, etc.
  • Profile the communities across a number of attributes (interests, sociodemographics…) based on the homophily principles that appear in social groups.

Other plug-ins can be provided in this group too, among others to model and predict social diffusion processes, or to provide identity information through the construction of a social fingerprint that can be used for detecting user identity theft and other security threats.
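
A minimal sketch of the kind of analysis described above, assuming the social interactions have already been gathered as weighted user-to-user edges (illustrative only; it uses the networkx library, which is not mandated by the GE):

  import networkx as nx
  from networkx.algorithms.community import greedy_modularity_communities

  # Hypothetical interactions: (user_a, user_b, number_of_interactions).
  interactions = [("ana", "bob", 12), ("bob", "carol", 3),
                  ("ana", "carol", 7), ("dave", "eve", 9), ("eve", "frank", 4)]

  g = nx.Graph()
  for a, b, count in interactions:
      g.add_edge(a, b, weight=count)

  # Social communities detected on the interaction network.
  communities = greedy_modularity_communities(g, weight="weight")

  # A very simple individual social profile: connectivity and community count.
  for user in g.nodes:
      profile = {
          "degree": g.degree(user),
          "strength": g.degree(user, weight="weight"),
          "communities": sum(user in c for c in communities),
      }
      print(user, profile)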

The SNA plug-ins rely on the following enablers:

  • The off-line processing enablers to execute, in a periodic manner, the social network algorithms over the data of social interactions periodically gathered from the various sources available.
  • The persistency layer and in-memory storage to store intermediate and final results that will be consumed by other components or plug-ins to provide for example social recommendations.
  • The stream processing enablers to make incremental updates to the social network knowledge with low latency.

Critical product attributes

  • The Social Network Analytical capabilities provided by the service integrate social communications of different nature (on-line social networks and mobile voice communications) to unveil the true social network of users.
  • The nature of the communications and their temporal pattern is analyzed to identify truly strong social connections, eliminating noise from the data and finding the social relationships that are really important for users. These relationships are the ones that are relevant for social influence propagation or group behaviours.
  • The social communities found on the social network can overlap, i.e., a user can belong to more than one social community e.g. family, friends, colleagues, and they are detected following a scalable and explainable algorithm.
  • Community profiles are created across a number of individual attributes like interests or socio-demographic characteristics.
  • The influence power calculated for users, which captures the potential influence a user can exert in his social circles, has been proven in a number of business applications.

Mobility Analysis

GE Description

The mobility analysis plug-ins transform geo-located user activity information into a mobility profile of the user. Specifically, geo-located events that contain information about the user generating the event and a timestamp are analysed to extract meaningful patterns of the user’s behaviour.

The events processed by the plug-ins must contain longitude and latitude coordinates, or have IDs (e.g. mobile cell IDs) that can be converted into longitude and latitude information through a mobile cell catalogue.

The following information is derived for the user based on the events gathered by the data management platform:

  • Points of Interest of the user (home, workplace, usual leisure areas, etc.) and frequent activity there.
  • Usual area of activity of the user.
  • Frequent itineraries between points of interest.

Another plug-in will receive real-time updates of the user's current location (longitude-latitude) and will add meaning to this location. The mobility user profile that is built will be used to give meaning to the current location of the user, e.g. at home or commuting. This information will be consumed by other services to provide a personalised user interaction based on the current context of the user.


Image:The_mobility_analysis_intelligent_services_transform_geo-located_user_events.jpg
Figure 28: The mobility analysis intelligent services transform geo-located user events into a mobility profile
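
A minimal sketch of how points of interest might be derived from geo-located events, under the simplifying assumption that events are already expressed as longitude/latitude pairs with timestamps (grid-based clustering; the thresholds and labels are arbitrary, not the actual algorithms):

  from collections import Counter
  from datetime import datetime

  def grid_cell(lon, lat, cell_size=0.01):
      """Snap a coordinate to a coarse grid cell (~1 km at mid latitudes)."""
      return (round(lon / cell_size), round(lat / cell_size))

  def points_of_interest(events, min_visits=3):
      """events: list of (lon, lat, iso_timestamp). Returns labelled POIs."""
      visits = Counter(grid_cell(lon, lat) for lon, lat, _ in events)
      night = Counter(grid_cell(lon, lat) for lon, lat, ts in events
                      if datetime.fromisoformat(ts).hour >= 22
                      or datetime.fromisoformat(ts).hour < 7)
      pois = []
      for cell, count in visits.items():
          if count < min_visits:
              continue
          # Naive labelling: mostly night-time activity -> "home".
          label = "home" if night[cell] > count / 2 else "frequent place"
          pois.append({"cell": cell, "visits": count, "label": label})
      return pois

  events = [(-3.70, 40.42, "2012-05-01T23:10:00"),
            (-3.70, 40.42, "2012-05-02T00:05:00"),
            (-3.70, 40.42, "2012-05-02T23:45:00"),
            (-3.69, 40.45, "2012-05-02T10:00:00")]
  print(points_of_interest(events))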


The mobility analysis plug-ins rely on the following enablers:

  • The off-line processing enablers to create the mobility user profiles, as these profiles refer to usual mobility patterns that do not change frequently. Therefore, batch updates will be run, as the expected latency demands do not require stream processing.
  • The persistency layer and in-memory storage to store intermediate and final results that will be consumed by other components or plug-ins to, for example, provide communications or a personalised interaction based on the estimated current location of the customer.
  • The stream processing enablers to process current user location and provide its meaning to other components.

Critical product attributes

  • Mobility analysis intelligent services transform geo-located user events into a mobility profile that provides a complete view on the usual mobility patterns of the user. This profile is built at a user level and in an automatic fashion, including the detection and labeling of the user points of interest.
  • The services go beyond current location data, attaching meaning to real-time locations and transforming it into more actionable and valuable context information.
  • The algorithms and models have been used over large volumes of data, and they have been proven to be both efficient and scalable.


Real-time recommendations

GE Description

The real-time recommendation module analyses the behaviour of a user of a service in order to recommend an item that the user will most likely be interested in. This can be used to increase sales or downloads, or simply to improve the user experience and navigation by showing content more related to the user's interests.


Image:Recommendation_system.jpg
Figure 29: Recommendation system


This module receives two different types of information as input:

  • Information from the catalogue of items that will be recommended, such as applications in an application store, ring-back tones, music songs or bands, concerts, movies and TV series, etc. Each of the items in the catalogue needs to have some information describing it, such as a title, a unique identifier, possibly an associated category, price and currency if it is to be sold, a textual description, as well as a set of metadata associated with the domain the item belongs to (e.g., the device it runs on in the case of applications, actors and director for movies, band and genre for concerts, etc.).
  • Activity events generated by users while accessing and interacting with the service and the different items in the catalogue. This information is used to compute a profile of the customer as well as a likelihood model from which to build the recommendations. Examples of types of events include the visualization of an item (showing interest of the user), a rating (explicit feedback), purchases and downloads, etc.

Using this information, this module is able to generate generic (top purchased, top downloaded, most popular, etc.) as well as personalised recommendations for a given user.
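
A minimal sketch of how such activity events could feed both generic and personalised recommendations, using simple popularity and item co-occurrence counts (illustrative only; real deployments use far richer models):

  from collections import Counter, defaultdict
  from itertools import combinations

  # Hypothetical activity events: (user, item) pairs for downloads/purchases.
  events = [("u1", "appA"), ("u1", "appB"), ("u2", "appA"),
            ("u2", "appC"), ("u3", "appA"), ("u3", "appB")]

  popularity = Counter(item for _, item in events)

  # Item co-occurrence: items consumed together by the same user.
  items_per_user = defaultdict(set)
  for user, item in events:
      items_per_user[user].add(item)

  co_occurrence = Counter()
  for items in items_per_user.values():
      for a, b in combinations(sorted(items), 2):
          co_occurrence[(a, b)] += 1
          co_occurrence[(b, a)] += 1

  def recommend(user, top_n=2):
      """Personalised: items co-occurring with the user's items; fallback: popular."""
      owned = items_per_user.get(user, set())
      scores = Counter()
      for item in owned:
          for (a, b), count in co_occurrence.items():
              if a == item and b not in owned:
                  scores[b] += count
      if not scores:
          scores = Counter({i: c for i, c in popularity.items() if i not in owned})
      return [item for item, _ in scores.most_common(top_n)]

  print(popularity.most_common(1))   # generic "most popular" recommendation
  print(recommend("u2"))             # personalised recommendation for u2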

Critical product attributes

  • The Recommendations Module provided by the service exploits user activity information in order to create an individual profile of the customer as well as a global profile of the usage of the service.
  • Such profiles are used to generate generic recommendations as well as personalised recommendations of the items most relevant to each individual user.


Web behaviour analysis for profiling

GE Description

The behavioural and web profiling intelligent service is responsible for the derivation of profiles extracted from the activity of each individual user. Generally speaking, in order to profile a user, one or more activity feeds are needed, where different types of feeds might be used, such as:

  • Transactions of a user within a service (e.g., when the user downloads an element, purchases it, etc.)
  • Click-stream within a service (e.g., when a user visualises an item or a classified page of the application)
  • Web navigation log (e.g., which web pages the customer is visiting)

Each of these activity feeds might allow for the inference of different information (see the sketch after the lists below), mostly including:

  • Categories of interest of a user in a service
  • Domain-specific information about the activity of a user in a service (e.g., whether it is a heavy user, when the service is used – day or night – the devices/channels used to access the service, etc.)
  • Main categories visited on the Web, or even main keywords of the web pages visited
  • Etc.
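
A minimal sketch of such profile derivation from a click-stream feed, with made-up event fields and thresholds (illustrative only):

  from collections import Counter
  from datetime import datetime

  # Hypothetical click-stream events: (iso_timestamp, category, device).
  clicks = [("2012-05-01T09:12:00", "sports", "mobile"),
            ("2012-05-01T22:40:00", "sports", "tablet"),
            ("2012-05-02T23:05:00", "movies", "tablet"),
            ("2012-05-03T08:30:00", "sports", "mobile")]

  def behaviour_profile(events, heavy_threshold=10):
      hours = [datetime.fromisoformat(ts).hour for ts, _, _ in events]
      night = sum(1 for h in hours if h >= 21 or h < 7)
      return {
          "top_categories": [c for c, _ in
                             Counter(c for _, c, _ in events).most_common(3)],
          "devices": sorted({d for _, _, d in events}),
          "heavy_user": len(events) >= heavy_threshold,
          "day_night": "night" if night > len(events) / 2 else "day",
      }

  print(behaviour_profile(clicks))
  # {'top_categories': ['sports', 'movies'], 'devices': ['mobile', 'tablet'],
  #  'heavy_user': False, 'day_night': 'day'}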

Critical product attributes

  • The Behavioural and Web Profiling capabilities provided by the service allow the data generated as part of the interactions of a user with a service to be fully exploited.
  • These interactions can be analysed and, with the right configuration, information is extracted from them for each individual user, therefore allowing the inference of an individual user profile derived purely from the user's behaviour.


Opinion mining

GE Description

Users frequently express their opinions about diverse topics across different channels (blog posts, tweets, Facebook comments, ratings and reviews, calls to customer care…), and this information provides useful insights on both the user and the object of the opinions.

The opinion mining plug-ins provide the analysis of textual sources to derive information about users' opinions, performing the following tasks:

  • Language detection.
  • Topic detection and classification.
  • Sentiment analysis (positive, neutral or negative).

Opinions are provided at three different levels of granularity:

  • Opinions of an individual user.
  • Aggregated, general opinions.
  • Structure of the opinions, providing an aggregated view in which different opinion groups (e.g. tech-savvy users or elderly people) are identified based on the source where the opinion was found and the profile of the users expressing the opinion.

To provide an accurate view, the bias inherent to the source is taken into account. For example, messages sent to customer care have a negative bias and express opinions of users of the service or product, while public blog posts can come from non-users. Voice analysis of voice calls – such as rate of words per second, pitch, or whether certain words are used – can also be an input source.
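
A minimal sketch of bias-aware opinion aggregation (the sources, bias corrections and weights below are made up for illustration, not the actual model):

  # Hypothetical per-source bias correction (added to the raw sentiment score,
  # which ranges from -1 = negative to +1 = positive) and trust weight.
  SOURCE_MODEL = {
      "customer_care": {"bias": +0.3, "weight": 1.0},  # negatively biased channel
      "blog":          {"bias":  0.0, "weight": 0.6},  # may include non-users
      "review":        {"bias":  0.0, "weight": 1.0},
  }

  opinions = [("customer_care", -0.8), ("blog", +0.4), ("review", +0.2)]

  def aggregate(opinions):
      total, weights = 0.0, 0.0
      for source, score in opinions:
          model = SOURCE_MODEL[source]
          total += (score + model["bias"]) * model["weight"]
          weights += model["weight"]
      return total / weights if weights else 0.0

  print(round(aggregate(opinions), 3))   # bias-corrected aggregate sentiment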

The opinion mining plug-ins rely on the following enablers:

  • The off-line processing enablers to process the textual sources and perform the required analysis to extract opinions from them.
  • The persistency layer and in-memory storage to store intermediate and final results that will be consumed by other components or plug-ins.


Image:Functionalities_of_the_opinion_mining_intelligent_services.jpg
Figure 30: Functionalities of the opinion mining intelligent services


Critical product attributes

  • Textual information from different sources is analysed to extract topics and classify sentiment.
  • Both an individual and structured view of opinions from different sources is created, as compared to market solutions that only address the extraction of a single aggregated view of opinions.
  • Not all opinion sources and users are treated equally. The services handle the bias inherent to the sources and benefit from the user profile knowledge.


Question Marks

We list hereafter a number of questions that remain open. Further discussion on these questions will take place in the coming months.

Security aspects

Privacy Management

End users are concerned about the privacy of their data. Mechanisms ensuring that applications can neither process data nor subscribe to data without the consent of its owner should be put in place. In addition, end users should be able to revoke access to data they own or establish time limitations with respect to that access. They should also be able to mask, partially mask or delete data they own; this is a particular function of an obfuscation policy management system. Supporting this will require carefully defining the concept of “data ownership” (not only for data provided by end users but also for data generated as a result of processing it), and creating mechanisms for establishing and managing different rights based on data ownership.

It should be possible to establish privacy control mechanisms enabling the enforcement of laws established by regulatory entities, such as government laws for the management of citizens' private data, or laws establishing rules for the management of data privacy within enterprises (e.g., data collected within companies about their employees). This, for example, will warrant that data access rights be set by default at the highest privacy enforcement level, only enabling the customer to reduce control within certain limits (these limits shall in any case be governed for some types of sensitive data, e.g. enforced control over minors' data).

A mechanism shall be put in place for automatic control of privacy issues, e.g. through formal rules (policies) accessed and asserted by data owners and government authorities (regulators) and processed by some form of policy enforcement. It should be possible to configure the data (context, events and others, including meta-data) that is going to be governed under privacy policies. This mechanism should be able to provide proofs of data privacy policy enforcement.

The platform should favour a "privacy by design" approach which in particular makes it possible to deliver/retrieve/store only the necessary data.

Last but not least, provision will be made to manage the re-configurability of security policies in the context of highly dynamic service composition environments. However, the user should be given a consistent view of his attached policies.

Auditing

Access of applications to data should be audited in a secure manner. This may imply that some operations be logged and the generated logs be protected. However, auditing should take place in a way that does not penalize the overall performance.

Transparency of data usage should be ensured: what data has been collected, by whom and when, and how it has been exploited.

Trustworthy data

Data may have different levels of trustworthiness. Algorithms dealing with the analysis of data, and applications in general, should be able to categorize the handled data based on its trustworthiness level.

A candidate GE is currently under study, which would deal with assessing the quality of information, and in particular the confidence that can be placed in it. Such GE may constitute an essential tool for informational watch, especially in the context of the development of so-called open sources: the increased use of the Web makes it possible for everyone to participate in the information spread and to be a source of information, and thus the quality of information collected on the internet must be assessed. Metadata – typically source reliability but also source certainty – may be processed and analysed by this GE to determine information value. Relationships between information sources, such as affinity and hostility relations, would also be taken into account in order to, e.g., reduce the confirmation effect when sources are known to be friends.

In addition, several aspects should be accountable such as the data storage entity, the data host/provider and the component which exposes some given data.

Data Quality and Control

The concept of data quality shall be introduced in FI-WARE in order to allow the services and applications running on top of FI-WARE capabilities to be selected and treated according to SLAs (Service Level Agreements) and QoS (Quality of Service). However, while providing data quality information is optional, its handling within the FI-WARE GEs is mandatory: when the QoS/SLA requires a certain quality of data and the data are tagged with meta-data about their quality, FI-WARE shall enable their control and processing accordingly.

Identification of Entities and Data

Once data are available within the FI-WARE platform they should be uniquely identified in order to avoid ambiguity and uncertainty during their handling.

All entities within FI-WARE shall be unequivocally identified, at least within the same application domain, in order to process the business logic. Where many service domains coexist within the same application domain, e.g. many different Social Network providers within the same SN application domain, an intermediary mechanism, such as a broker, could exist in order to create a strong link between the same entities even when they are identified differently, as an alternative to a single identification shared by all Social Networks. This identification mechanism shall then be transparent to the applications and services built on top of that domain, e.g. the many applications using customers' data from different Social Networks.
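A minimal sketch of such a broker is given below: it maps the identifiers used for the same entity by different service domains onto one platform-wide key. The IdentityBrokerSketch class and the example "urn:fiware:entity:..." identifier are illustrative assumptions; FI-WARE does not prescribe this API.

  // Sketch of a broker linking the identifiers used for one entity by different service domains.
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Optional;
  
  public class IdentityBrokerSketch {
      // (domain, localId) -> platform-wide identifier
      private final Map<String, String> links = new HashMap<>();
  
      public void link(String domain, String localId, String globalId) {
          links.put(domain + "#" + localId, globalId);
      }
  
      public Optional<String> resolve(String domain, String localId) {
          return Optional.ofNullable(links.get(domain + "#" + localId));
      }
  
      public static void main(String[] args) {
          IdentityBrokerSketch broker = new IdentityBrokerSketch();
          // The same customer is known under different identifiers in two Social Networks.
          broker.link("SN-A", "alice_84", "urn:fiware:entity:customer:0001");
          broker.link("SN-B", "a.smith", "urn:fiware:entity:customer:0001");
          // Applications built on top of the domain only ever see the resolved identifier.
          System.out.println(broker.resolve("SN-B", "a.smith").orElse("unknown"));
      }
  }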

Entity authentication and data usage authorization can also be built and used upon the aforementioned identification mechanism.

Data Usage Monitoring and Data Access Control

A dedicated mechanism or framework shall be built into, or enabled by, the FI-WARE platform that allows real-time control of data access by entities and management of data usage, in order to set up the rules for data management and, respectively, to prevent incorrect data access or to raise alarms on data misuse. The framework should support the authentication of the data/context consumer.
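The sketch below illustrates the shape of such a check: the consumer is authenticated, a usage rule is evaluated, and an alarm is raised on suspected misuse. All names (tokens, rule table, categories) are assumptions for this example and do not describe an actual FI-WARE interface.

  // Sketch: real-time data access check with consumer authentication and a misuse alarm.
  import java.util.Map;
  import java.util.Set;
  
  public class DataAccessControlSketch {
      // Hypothetical registry: authentication token -> consumer identity.
      private final Map<String, String> tokens = Map.of("tok-123", "analytics-app");
      // Hypothetical usage rules: consumer -> data categories it may read.
      private final Map<String, Set<String>> rules = Map.of("analytics-app", Set.of("traffic", "weather"));
  
      public boolean requestRead(String token, String dataCategory) {
          String consumer = tokens.get(token);
          if (consumer == null) {                       // authentication failed
              System.out.println("ALARM: unauthenticated access attempt on " + dataCategory);
              return false;
          }
          boolean allowed = rules.getOrDefault(consumer, Set.of()).contains(dataCategory);
          if (!allowed) {                               // authenticated but not authorized: possible misuse
              System.out.println("ALARM: " + consumer + " attempted to read " + dataCategory);
          }
          return allowed;
      }
  
      public static void main(String[] args) {
          DataAccessControlSketch control = new DataAccessControlSketch();
          System.out.println(control.requestRead("tok-123", "traffic"));   // true
          System.out.println(control.requestRead("tok-123", "payroll"));   // false, with alarm
      }
  }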

Data and Connectivity Availability and Reliability

The FI-WARE platform shall integrate or employ mechanisms ensuring data and connectivity availability and reliability at a certain level (e.g. as required by an SLA or QoS), implemented through “standard” techniques such as redundancy and high availability. Moreover, if required by the service or application, or by the nature of the data, the overall FI-WARE system shall support resilient operation and fault tolerance.

The platform should secure flows during data retrieval, in order to preserve data confidentiality and to ensure the authenticity and integrity of the data. The reliability of data storage should also be ensured.

Non-repudiation of Data Handling and Communications

The non-repudiation property, if required by the application, the service or the nature of the data, shall be supported and provided accordingly by the FI-WARE platform through any available technological means, such as digital certificates and digital signatures.


Other topics still under discussion

Internal GE communication

The method of communication between GEs is under discussion: whether it is proprietary and therefore optimized, or standard (perhaps using the same interfaces through which we expect data to be accessible to the Context and Data Management, and vice versa). For example, what is the method of communication between the Massive Data Gathering GEs and the Processing GEs? What is the method of communication between GEs and the storage for reading or writing? And how do the processing GEs communicate with the Pub/Sub GE?

When to use Big Data GE and when to use Complex Event Processing GE

Looking closely at the descriptions of the Complex Event Processing GE and the stream processing portion of the Big Data GE, they may appear to address the same requirements. Indeed, both address the need to continuously process data on the move, data that flows or streams continuously, so that sense can be made out of this data continuously. However, they address these requirements differently and for different purposes; thus it is important to describe the commonalities and differences and, probably more importantly, to give insight into which GE to use, when and for what purpose.

The most important distinction between the two approaches is the programming paradigm. CEP takes a more declarative (rule-based) approach, whereby developers of event processing applications have all the necessary constructs to declare the required processing without programming; it can therefore be targeted at more business-oriented users and is suited to higher rates of change of the processing logic. Stream processing is more algorithmic, requiring programming skill, and is therefore targeted at IT programmers, where the processing logic is programmed, assembled, compiled and then deployed.
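The following Java sketch contrasts the two paradigms in a very compressed form; it does not use the API of either GE. The rule text, event types and boiler identifiers are invented for illustration: the CEP flavour declares the logic as data that an engine would interpret, while the stream flavour expresses the logic as compiled code composed of database-style operators.

  // Illustrative contrast between declarative (CEP-style) and programmed (stream-style) processing.
  import java.util.List;
  import java.util.Map;
  import java.util.stream.Collectors;
  
  public class CepVsStreamSketch {
      record Event(String type, String boilerId, double temperature) {}
  
      public static void main(String[] args) {
          // CEP flavour: the processing logic is a declarative rule that a business user
          // could author and change at runtime; an engine (not shown) would interpret it.
          Map<String, String> cepRule = Map.of(
                  "pattern", "TemperatureReading[temperature > 90] followed-by PressureDrop within 10 min",
                  "action", "emit OverheatingAlert");
          System.out.println("declared rule: " + cepRule);
  
          // Stream processing flavour: the logic is code written by a programmer and
          // composed of filter/group/aggregate operators, then compiled and deployed.
          List<Event> stream = List.of(
                  new Event("TemperatureReading", "boiler-1", 95.0),
                  new Event("TemperatureReading", "boiler-1", 97.5),
                  new Event("TemperatureReading", "boiler-2", 60.0));
          Map<String, Double> averageTempPerBoiler = stream.stream()
                  .filter(e -> e.type().equals("TemperatureReading"))
                  .collect(Collectors.groupingBy(Event::boilerId,
                          Collectors.averagingDouble(Event::temperature)));
          System.out.println("aggregated stream result: " + averageTempPerBoiler);
      }
  }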

Another distinction lies in performance aspects and in how the two approaches are designed to address them. Generally, the stream processing approach is designed to cope with very high rates of incoming data, usually higher than the CEP approach. The latency in generating results is usually lower in stream processing than in CEP. This comparison, however, is not based on executing the same processing logic in the two approaches; rather, it is based on the nature of the scenarios the approaches are applied to.

Additional insights [Chandy 11] to be aware of:

  • Complex pattern matching is common in CEP but it is generally not central to stream processing.
  • Stream processing queries tend to be compositions of database-style operators (e.g., joins) and user-defined operators. These queries mostly represent functionality of aggregation and transformation.
  • Stream processing tends to place a higher emphasis on high data volumes with relatively fewer queries.
  • CEP tends to consider the effect of sharing events across many queries or many patterns as a central problem.
  • Processing unstructured data is typically not considered a scenario for CEP.
  • Streaming systems, having originated mainly from the database community, tend to have a schema that is compatible with relational database schemata, whereas CEP systems support a larger variety of schema types.

Data gathering

In Figure 13, a Massive Data Gathering Enabler is depicted that collects data from different sources. At the moment, this Generic Enabler is not covered (fully) by the GEs listed under section 4.1. However, work to fill this gap is taking place. There are mainly two GEs that already provide part of the functionality. These GEs can be seen as Transformation Enablers (e.g., realised as plug-ins) to the Massive Data Gathering Enabler.

The GE on pre-processing of meta-data (section 4.2.6) transforms and filters incoming meta-data in order to prepare the data for subsequent processing. E.g., a transformation of inbound meta-data to (Java) classes can be performed to supply the correct input format for a stream processing engine requiring a dedicated meta-data format.
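To make that transformation step concrete, the sketch below turns an inbound XML meta-data fragment into a plain Java object that a stream processing engine could consume. The XML element names and the SensorMetadata class are invented for illustration and are not part of the GE specification.

  // Sketch: transforming inbound XML meta-data into a Java class for a stream processing engine.
  import java.io.StringReader;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.w3c.dom.Document;
  import org.xml.sax.InputSource;
  
  public class MetadataPreprocessingSketch {
      record SensorMetadata(String sensorId, String unit, double samplingRateHz) {}
  
      static SensorMetadata transform(String xml) throws Exception {
          Document doc = DocumentBuilderFactory.newInstance()
                  .newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)));
          String id = doc.getElementsByTagName("sensorId").item(0).getTextContent();
          String unit = doc.getElementsByTagName("unit").item(0).getTextContent();
          double rate = Double.parseDouble(doc.getElementsByTagName("samplingRateHz").item(0).getTextContent());
          return new SensorMetadata(id, unit, rate);
      }
  
      public static void main(String[] args) throws Exception {
          String inbound = "<metadata><sensorId>temp-sensor-17</sensorId>"
                         + "<unit>Celsius</unit><samplingRateHz>0.2</samplingRateHz></metadata>";
          // The resulting object is now in the dedicated format the downstream engine expects.
          System.out.println(transform(inbound));
      }
  }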

The GE on pre-processing of unstructured data (section 4.2.5) works, in contrast to the above enabler, on data without explicit semantics or formatting. The semantics and ontology of these kinds of input data are derived during the operations inside the GE, e.g., data cleaning, ontology evolution, information extraction, and sentiment analysis.

The two Generic Enablers could even interwork inside a Massive Data Gathering Enabler, i.e., the output of the GE on pre-processing of unstructured data serving as input to the GE on pre-processing of meta-data. A final version of the Massive Data Gathering Enabler is envisioned as a generic sink/collector for various kinds of data sources in order to provide input for subsequent processing like Complex Event Processing (section 4.2.2) and Big Data Analysis (section 4.2.3).

Query Broker

The Query Broker GE targets the ambitious goal of being capable of handling queries formulated in any of a defined set of query languages/APIs (e.g., XQuery, SQL or SPARQL). This requires all incoming queries to be converted into an internal abstract format that can in turn be translated into the respective specific query languages/APIs supported by the actual data repositories. It also requires dealing with different retrieval systems that do not always follow the same data retrieval paradigms. A careful analysis should be made on whether this GE can be recommended for general usage or just as a mechanism enabling federation of multiple and heterogeneous data sources, involving multiple data storage formats and retrieval paradigms, when usage scenarios require such federation. The former case would require that execution of queries expressed in a given language does not suffer any performance penalty when issued to a data repository natively supporting that query language.
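The sketch below suggests, purely for illustration, what such an internal abstract format and its translation to two target languages might look like. The AbstractQuery shape and the naive string-based translators are assumptions for this example, not the GE's actual design.

  // Sketch of an abstract query and its translation to SQL and SPARQL (naive string building, illustration only).
  public class QueryBrokerSketch {
      // A deliberately tiny abstract query: "select everything of a given type where attribute = value".
      record AbstractQuery(String entityType, String attribute, String value) {}
  
      static String toSql(AbstractQuery q) {
          return "SELECT * FROM " + q.entityType() + " WHERE " + q.attribute() + " = '" + q.value() + "'";
      }
  
      static String toSparql(AbstractQuery q) {
          return "SELECT ?s WHERE { ?s a <" + q.entityType() + "> ; <" + q.attribute() + "> \"" + q.value() + "\" }";
      }
  
      public static void main(String[] args) {
          AbstractQuery q = new AbstractQuery("Room", "wallColor", "white");
          System.out.println(toSql(q));     // issued to a relational repository
          System.out.println(toSparql(q));  // issued to an RDF repository
      }
  }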

The Publish/Subscribe Broker GE exports operations for querying data/context elements being generated by Event Producers. It would be highly desirable that the specification of these operations be aligned with those exported by the Query Broker GE. This would allow, among other things, treating Publish/Subscribe GEs as data sources which may be federated with other data sources. However, this requires further analysis.

Last but not least, the question of how non-relational or “NoSQL” databases, which are becoming an increasingly important part of the database landscape, can be integrated is also one of the open points to be addressed during the FI-WARE project.

Location services

Some topics have been identified for further discussion regarding capabilities supported by the Location GE beyond those related to the support of SUPL and MLP: the RRLP protocol, timing advance information from base stations, base station visibility, the RX level obtained over the SIM/ME interface, etc. It has also been pointed out that this GE should not be a centralized GE but a distributed one, with components running inside smart/feature phones and/or SIM cards. Additionally, it could optionally offer a “proactive” feature, instructing devices to “make noise” if an application wants more accurate location information about them.

The current Location GE establishes MLP as the basic interface that applications may use to retrieve the current position of a mobile device or to get event reports linked to variations of these positions (when the device enters, leaves, remains inside, or remains outside a given area). However, discussion is taking place on how an instance of the Publish/Subscribe Broker GE may be connected to the Location GE so that this information can be delivered in the form of context events. This has the advantage of being able to merge the handling of location-related events with the handling of any other kind of context and even data events relevant to a given application. In addition, it would allow setting up parameters for the reporting of location events linked to a given set of mobile devices that would be common to several applications, while still allowing each application to handle its own subscription to location events, changing it dynamically without affecting others. Last but not least, it may also make it easier to forward location events to a Complex Event Processing GE or BigData Analysis GE.
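For illustration, the sketch below represents a location reading as a context event and evaluates the kind of area condition ("inside"/"outside") that such event reports are based on. The entity and attribute names and the crude circular-area check are assumptions for this example; they are not taken from the MLP or NGSI specifications.

  // Sketch: a location reading as a context event, plus a simple area ("geofence") check.
  import java.util.Map;
  
  public class LocationContextEventSketch {
      record ContextEvent(String entityId, String entityType, Map<String, Double> attributes) {}
      record CircularArea(double centerLat, double centerLon, double radiusDegrees) {
          boolean contains(double lat, double lon) {
              double dLat = lat - centerLat, dLon = lon - centerLon;
              // Crude flat-earth distance check, sufficient for a sketch.
              return Math.sqrt(dLat * dLat + dLon * dLon) <= radiusDegrees;
          }
      }
  
      public static void main(String[] args) {
          ContextEvent event = new ContextEvent(
                  "device-4711", "MobileDevice",
                  Map.of("latitude", 48.3705, "longitude", 10.8978));
          CircularArea campus = new CircularArea(48.3700, 10.8980, 0.01);
  
          boolean inside = campus.contains(event.attributes().get("latitude"),
                                           event.attributes().get("longitude"));
          // A Publish/Subscribe Broker instance could forward this as an "enters area" or
          // "remains inside" notification to every subscribed application.
          System.out.println(inside ? "device is inside the area" : "device is outside the area");
      }
  }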

Other topics still under discussion regarding the Location GE have to do with enriching its functions to include location propagation and learning features. Location propagation features are used when the location of a desired mobile device is unknown and cannot be retrieved, while that device is in detectable proximity to another entity (e.g., the mobile device of another user, or a thing that can be detected as near) whose location is known and can be retrieved. In this case, the location of the mobile device is obtained through propagation of the location of that entity (or some location calculated as an adjustment of the entity's location). Learning features require registering the locations of mobile devices as they are resolved, and the locations of entities based on some application interactions (e.g., check-in of users in restaurants). Data/context that is learned can then be further exploited by the SLP to enhance the location of mobile devices and entities, including the enhancement of propagation techniques.
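Since this feature is still under discussion, the sketch below only illustrates the propagation idea itself: when no direct fix is available, the nearby entity's position is reused with a widened uncertainty. The Position type and the way uncertainty is adjusted are assumptions made for this example.

  // Sketch of location propagation: reuse a nearby entity's known position with degraded accuracy.
  import java.util.Optional;
  
  public class LocationPropagationSketch {
      record Position(double lat, double lon, double uncertaintyMeters) {}
  
      static Optional<Position> locate(Optional<Position> ownFix,
                                       Optional<Position> nearbyEntityFix,
                                       double proximityMeters) {
          if (ownFix.isPresent()) {
              return ownFix;                       // normal case: the device could be positioned directly
          }
          // Propagation: adopt the neighbour's position, widening the uncertainty by the
          // detected proximity so the result reflects how it was obtained.
          return nearbyEntityFix.map(p ->
                  new Position(p.lat(), p.lon(), p.uncertaintyMeters() + proximityMeters));
      }
  
      public static void main(String[] args) {
          Optional<Position> unknown = Optional.empty();
          Optional<Position> friendPhone = Optional.of(new Position(48.858, 2.294, 30.0));
          System.out.println(locate(unknown, friendPhone, 10.0).orElseThrow());
      }
  }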


Terms and definitions

This section comprises a summary of terms and definitions introduced in the previous sections. It intends to establish a vocabulary that will help to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP).

  • Data refers to information that is produced, generated, collected or observed that may be relevant for processing, carrying out further analysis and knowledge extraction. Data in FIWARE has associated a data type and a value. FIWARE will support a set of built-in basic data types similar to those existing in most programming languages. Values linked to basic data types supported in FIWARE are referred as basic data values. As an example, basic data values like ‘2’, ‘7’ or ‘365’ belong to the integer basic data type.
  • A data element refers to data whose value is defined as consisting of a sequence of one or more <name, type, value> triplets referred as data element attributes, where the type and value of each attribute is either mapped to a basic data type and a basic data value or mapped to the data type and value of another data element.
  • Context in FIWARE is represented through context elements. A context element extends the concept of data element by associating an EntityId and EntityType to it, uniquely identifying the entity (which in turn may map to a group of entities) in the FIWARE system to which the context element information refers. In addition, there may be some attributes as well as meta-data associated to attributes that we may define as mandatory for context elements as compared to data elements. Context elements are typically created containing the value of attributes characterizing a given entity at a given moment. As an example, a context element may contain values of some of the attributes “last measured temperature”, “square meters” and “wall color” associated to a room in a building. Note that there might be many different context elements referring to the same entity in a system, each containing the value of a different set of attributes. This allows different applications to handle different context elements for the same entity, each containing only those attributes of that entity relevant to the corresponding application. It also allows representing updates on a set of attributes linked to a given entity: each of these updates can actually take the form of a context element and contain only the value of those attributes that have changed.
  • An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened, in that domain. Events typically lead to the creation of some data or context element describing or representing the event, thus allowing it to be processed. As an example, a sensor device may be measuring the temperature and pressure of a given boiler, sending a context element every five minutes associated to that entity (the boiler) that includes the value of these two attributes (temperature and pressure); a small sketch of this example is given after this list. The creation and sending of the context element is an event, i.e., what has occurred. Since the data/context elements that are generated linked to an event are the way events become visible in a computing system, it is common to refer to these data/context elements simply as "events".
  • A data event refers to an event leading to creation of a data element.
  • A context event refers to an event leading to creation of a context element.
  • An event object is used to mean a programming entity that represents an event in a computing system [EPIA], such as event-aware GEs. Event objects allow operations to be performed on events, also known as event processing. Event objects are defined as a data element (or a context element) representing an event, to which a number of standard event object properties (similar to a header) are internally associated. These standard event object properties support certain event processing functions.
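The boiler example referred to above can be sketched in code as follows. The ContextElement and Attribute classes and the concrete values are illustrative assumptions; FIWARE does not mandate this representation.

  // Sketch of the boiler example: a context element carrying two attribute values for an identified entity.
  import java.util.List;
  import java.util.Map;
  
  public class ContextElementSketch {
      record Attribute(String name, String type, Object value, Map<String, String> metadata) {}
      record ContextElement(String entityId, String entityType, List<Attribute> attributes) {}
  
      public static void main(String[] args) {
          // Sent every five minutes by the sensor attached to the boiler; each creation-and-sending
          // is the event, and this object is the context element describing it.
          ContextElement reading = new ContextElement(
                  "boiler-26", "Boiler",
                  List.of(new Attribute("temperature", "float", 81.4, Map.of("unit", "Celsius")),
                          new Attribute("pressure", "float", 2.3, Map.of("unit", "bar"))));
          System.out.println(reading);
      }
  }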


References

[Craswell 99]

Nick Craswell, David Hawking, and Paul B. Thistlewaite, “Merging results from isolated search engines,” in Proceedings of the Australasian Database Conference, 1999, pp. 189–200.

[Chandy 11]

K. Mani Chandy, Opher Etzion, and Rainer von Ammon, “10201 Executive Summary and Manifesto – Event Processing”, Schloss Dagstuhl, 2011.

[Döller 08a]

Mario Döller, Ruben Tous, Matthias Gruhne, Kyoungro Yoon, Masanori Sano, and Ian S Burnett, “The MPEG Query Format: On the way to unify the access to Multimedia Retrieval Systems,” IEEE Multimedia, vol. 15, no. 4, pp. 82–95, 2008.

[Döller 08b]

Mario Döller, Kerstin Bauer, Harald Kosch, and Matthias Gruhne, “Standardized Multimedia Retrieval based on Web Service technologies and the MPEG Query Format,” Journal of Digital Information, vol. 6, no. 4, pp. 315–331, 2008.

[Döller 10]

Mario Döller, Florian Stegmaier, Harald Kosch, Ruben Tous, and Jaime Delgado, “Standardized Interoperable Image Retrieval,” in ACM Symposium on Applied Computing (SAC), Track on Advances in Spatial and Image-based Information Systems (ASIIS), Sierre, Switzerland, 2010, pp. 881–887.

[EPIA]

Opher Etzion and Peter Niblett, “Event Processing in Action”, Manning Publications, August, 2010, ISBN: 9781935182214

[EPTS]

Event Processing Technical Society: http://www.ep-ts.com/

[EPTS-RA 10]

Adrian Paschke and Paul Vincent. “Event Processing Architectures”, Tutorial, DEBS 2010.

http://www.slideshare.net/isvana/debs2010-tutorial-on-epts-reference-architecture-v11c

[ISO 06]

ISO/IEC 23001-1:2006, Information technology – MPEG system technologies – Part 1: Binary MPEG Format for XML, Apr. 2006.

[ISO 08]

ISO/IEC 14496-12:2008, Information technology – Coding of audio-visual objects – Part 12: ISO base media file format, third edition, Oct. 2008.

[OMA-TS-NGSI-Context]

OMA-TS-NGSI_Context_Management-V1_0-20100803-C, Candidate Version 1.0, 03 August 2010, Open Mobile Alliance.

[RFC3550]

H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” RFC 3550, Jul. 2003.

[RFC3984]

S. Wenger, M. M. Hannuksela, M. Westerlund, and D. Singer, “RTP payload format for H.264 video,” RFC 3984, Feb. 2005.

[Smith 08]

John R. Smith, “The Search for Interoperability,” IEEE Multimedia, vol.15, no. 3, pp. 84–87, 2008.

[W3C 11]

W3C, “Efficient XML Interchange (EXI) Format 1.0”, Mar. 2011
