FIWARE.OpenSpecification.Data.QueryBroker - FIWARE Forge Wiki

FIWARE.OpenSpecification.Data.QueryBroker

From FIWARE Forge Wiki

Name: FIWARE.OpenSpecification.Data.QueryBroker
Chapter: Data/Context Management
Catalogue-Link to Implementation: N/A
Owner: Siemens AG, Thomas Riegel



Preface

This document provides a self-contained open specification of a FIWARE generic enabler. Please also consult the FIWARE Product Vision, the website at http://www.fiware.org and similar pages in order to understand the complete context of the FIWARE platform.


FIWARE WIKI editorial remark:
This page corresponds to Release 3 of FIWARE. The latest version associated to the latest Release is linked from FIWARE Architecture

Copyright

Legal Notice

Please check the following Legal Notice to understand the rights to use these specifications.

Overview

Introduction to the Media-enhanced Query Broker GE

Today, data, especially in the media domain, is produced at an immense rate. Anyone investigating solutions and approaches for storing and archiving this data rapidly ends up in a highly heterogeneous environment of data stores. Usually, the involved domains feature individual sets of metadata formats for describing content, technical or structural information of multimedia data [Stegmaier 09a]. Furthermore, depending on the management and retrieval requirements, these data sets are accessible in different systems supporting a variety of retrieval models and query languages. Taken together, these obstacles make easy and efficient access and retrieval across system borders a very cumbersome task [Smith 08].

Standards are one way to introduce interoperability among different peers. Recent developments and achievements in the domain of multimedia retrieval concentrated on the establishment of a multimedia query language (MPEG Query Format (MPQF)) [Döller 08a], standardized image retrieval (JPEG) and the heterogeneity problem between metadata formats (JPEG) [Döller 10]. Another approach for interoperable media retrieval is the introduction of a mediator or middleware system abstracting the communication: a Media-enhanced Query Broker. By acting as middleware and mediator between multimedia clients and retrieval systems, such a broker can remarkably improve collaboration. A Media-enhanced Query Broker accepts complex multi-part and multimodal queries from one or more clients and maps/distributes them to multiple connected Multimedia Retrieval Systems (MMRS). Consequently, implementation complexity is reduced on the client side, as only one communication partner needs to be addressed. The broker also handles query distribution and result aggregation, further easing client development. However, the actual retrieval of the data is performed inside the connected data stores.

Target usage

The Media-enhanced Query Broker GE provides a smart, abstracting interface for retrieval of data from the FI-WARE data management layer. This is provided in addition to the publish/subscribe interface (e.g. Context Broker (Publish/Subscribe Broker) GE) as another modality for accessing data.

Principal users of the Media-enhanced Query Broker GE include applications that require a selective, on-demand view of the content/context data in the FI-WARE data management platform via a single, unified API, without having to deal with the characteristics of the internal data storage, DB implementations and interfaces.

Therefore, this GE supports the integration of query functions into users’ applications by abstracting the access to databases and search engines available in the FI-WARE data management platform, while also offering the option to access outside data sources simultaneously. At the same time, its API abstracts from the distributed and heterogeneous nature of the underlying storage, retrieval and DB / metadata schema implementations.

The Media-enhanced Query Broker GE supports highly regular (“structured”) data, such as that used in relational databases and queried by SQL-like languages. It also supports less regular, “semi-structured” data, which is common in the tree-structured XML world and can be accessed with the XQuery language. Another data model supported by the Media-enhanced Query Broker is RDF, a well-structured graph-based data model that is queried using the SPARQL language. In addition, the Media-enhanced Query Broker GE supports specific search and query functions required in (metadata-based) multimedia content search (e.g., image similarity search using feature descriptors).

Example Scenario

To illustrate that the Media-enhanced Query Broker GE is not limited to the media domain but can also contribute to other application fields, an example from the medical domain is given:

The heterogeneity issues identified above can typically also be found in the current diagnostic process at hospitals. The workflow of a medical diagnosis is mainly based on reviewing and comparing images from multiple time points and modalities in order to monitor disease progression over a certain period of time. For ambiguous cases, the radiologist relies heavily on reference literature or a second opinion. Besides textual data stored in appraisals, a vast amount of images (e.g., CT scans) is stored in Picture Archiving and Communication Systems (PACS), which could be reused for decision support. Unfortunately, efficient access to this information is not available due to weak search capabilities.
The mission of the MEDICO application scenario is to establish an intelligent and scalable search engine for the medical domain by combining medical image processing and semantically rich image annotation vocabularies.

Search infrastructure: end-to-end workflow in MEDICO

The figure above sketches an end-to-end workflow inside the MEDICO system. It provides the user with an easy-to-use web-based form to describe the desired search query. Currently, this user interface utilizes a semantically rich data set composed of DICOM tags, image annotations, text annotations and gray-value based (3D) CT images. This leads to a heterogeneous multimedia retrieval environment with multiple query languages: DICOM tags as well as the raw image data are stored in a PACS, while annotations describing images, doctors' letters and laboratory examinations are saved in a triple store. Finally, a similarity search can be conducted with an image search engine that operates on top of extracted image features. Each of these retrieval services uses its own query language (e.g., SPARQL) and its own data representation (e.g., RDF/OWL for annotation storage). To enable a sophisticated semantic search, these interoperability issues have to be solved. Furthermore, it is essential to enable federated search functionality in this environment. These requirements have been taken into account in the design and implementation of the QueryBroker, following the design principles described below. An overview of the architecture can be found in [Stegmaier 10] and [Stegmaier 09b].


Basic Concepts

The QueryBroker is implemented as a middleware to establish unified retrieval in distributed and heterogeneous environments with extended functionality to integrate multimedia specific retrieval paradigms in the overall query execution plan, e.g., multimedia fusion techniques.

Query Processing Strategies

The Media-enhanced Query Broker is a middleware component that can take on different roles within a distributed and heterogeneous search and retrieval framework that includes multimedia retrieval systems. In general, the tasks of each internal component of the Media-enhanced Query Broker depend on the registered databases and on the use cases. In this context, two main query-processing strategies are supported, as illustrated in the following figure.

Query processing strategies ((a) local/autonomous processing, (b) distributed processing)


The first paradigm deals with registered and participating retrieval systems that are able to process the whole query locally (see the left side (a) of the figure above). Such heterogeneous systems may provide their own local metadata format and a local/autonomous data set. A query transmitted to such a system can be completely evaluated by the data store, and the items of the result set are the outcome of executing the query. If the metadata formats of the data stores differ, the metadata format has to be transformed before the (sub-)query is transmitted. In addition, depending on the degree of overlap among the data sets, the individual result sets may contain duplicates. The most central task for the Media-enhanced Query Broker, however, is the result aggregation process, which performs an overall ranking of the partial results. Here, duplicate elimination algorithms may be applied as well.
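The aggregation step for autonomously processed queries can be sketched in Java as follows. This is a minimal illustration, not the QueryBroker's aggregator: it assumes every result item carries an identifier shared across data stores and a relevance score normalized to [0, 1], and the names ResultItem and ResultAggregator are hypothetical. Keeping the best score per identifier (max-score fusion) is only one of several possible rank-fusion strategies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result item: an identifier shared by all data stores
// plus a relevance score assumed to be normalized to [0, 1].
record ResultItem(String id, double score, String source) {}

final class ResultAggregator {

    // Merges the partial result sets of autonomous data stores into one ranked list.
    static List<ResultItem> aggregate(List<List<ResultItem>> partialResults) {
        Map<String, ResultItem> bestById = new HashMap<>();
        for (List<ResultItem> partial : partialResults) {
            for (ResultItem item : partial) {
                // Duplicate elimination: if two stores return the same id,
                // keep the occurrence with the higher score (max-score fusion).
                bestById.merge(item.id(), item,
                        (kept, incoming) -> incoming.score() > kept.score() ? incoming : kept);
            }
        }
        // Overall ranking: descending score, ties broken by id for a stable order.
        List<ResultItem> ranked = new ArrayList<>(bestById.values());
        ranked.sort(Comparator.comparingDouble(ResultItem::score).reversed()
                .thenComparing(ResultItem::id));
        return ranked;
    }
}
```

With the partial results [img1 0.9, img2 0.4] and [img2 0.7, img3 0.5], the merged ranking is img1, img2 (taken from the second store), img3.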

The second paradigm deals with registered and participating retrieval systems that allow distributed processing on the basis of a global data set, as illustrated on the right side (b) of the figure above. The involved heterogeneous systems may rely on different data representations (e.g., ontology-based semantic annotations and XML-based feature values) and query interfaces (e.g., SPARQL and XQuery) but describe a common (linked) global data set. In this case, a query transmitted to the Media-enhanced Query Broker needs to be evaluated and optimized, resulting in a specific query execution plan. Segments of the query are forwarded to the respective engines and executed in parallel. Subsequently, the result aggregation has to deal with the correct consolidation and (if required) format conversion of the partial result sets. In this respect, the Media-enhanced Query Broker behaves like a federated Database Management System.

MPEG Query Format (MPQF)

Before discussing the design and implementation of the Media-enhanced Query Broker in more detail, the main features of MPQF are introduced, since MPQF is used to represent the queries. MPQF became an international standard in early 2009 as part 12 of the MPEG-7 standard [MPEG-7]. The main intention of MPQF is to formulate queries that address and retrieve multimedia data, such as audio, images, video, text or a combination of these. At its core, MPQF is an XML-based query language intended to be used in distributed multimedia retrieval systems (MMRS). Besides standardizing the query language, MPQF specifies service discovery and service capability description. Here, a service is a particular system offering search and retrieval abilities (e.g., image retrieval).


Possible scenario for the use of MPQF


The figure above shows a possible retrieval scenario in an MMRS. The Input Query Format (IQF) provides means for describing query requests from a client to an MMRS. The Output Query Format (OQF) specifies a message container for MMRS responses, and finally the Query Management Tools (QMT) offer functionalities such as service discovery, service aggregation and service capability description.


Structure of the Input Query Format


In detail, the IQF (see the figure above) can be composed of three different parts. The first is a declaration part pointing to resources (e.g., an image file or its metadata description) that are used within the query condition or output description part. The output description part allows, by using the respective MMRS metadata description, the definition of both the structure and the content of the expected result set. Finally, the query condition part denotes the search criteria by providing a set of different query types (see the table below) and expressions (e.g., GreaterThan), which can be combined by Boolean operators (e.g., AND). To respond to MPQF query requests, the OQF provides the ResultItem element and attributes signaling paging and expiration dates.
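The three IQF parts and the Boolean combination of query types can be pictured as a small object model. This Java sketch is hypothetical: the record names only mirror the prose above and are not the JAXB classes generated from the MPQF schema. Rendering the condition tree as text is just a way to make its structure visible.

```java
import java.util.List;
import java.util.Map;

// Hypothetical object model of an IQF request; the names only mirror the
// prose above and are not the JAXB classes generated from the MPQF schema.
sealed interface Condition permits And, Or, QueryType {}

// Boolean operators combining sub-conditions.
record And(List<Condition> operands) implements Condition {}
record Or(List<Condition> operands) implements Condition {}

// Leaf: one MPQF query type (e.g. "QueryByMedia") with its argument,
// which may refer to a resource from the declaration part.
record QueryType(String type, String argument) implements Condition {}

// Output description: requested fields and an upper bound on result items.
record OutputDescription(List<String> reqFields, int maxItemCount) {}

// The three IQF parts: declarations (resourceID -> URI), output description, condition.
record InputQuery(Map<String, String> declarations,
                  OutputDescription output,
                  Condition condition) {

    // Renders a condition tree in prefix notation to make its structure visible.
    static String render(Condition c) {
        if (c instanceof And a) {
            return "AND(" + renderAll(a.operands()) + ")";
        }
        if (c instanceof Or o) {
            return "OR(" + renderAll(o.operands()) + ")";
        }
        QueryType leaf = (QueryType) c;
        return leaf.type() + "(" + leaf.argument() + ")";
    }

    private static String renderAll(List<Condition> operands) {
        return String.join(", ", operands.stream().map(InputQuery::render).toList());
    }
}
```

A query combining a Query-By-Media on the declared resource res01 with a free-text condition would render as AND(QueryByMedia(res01), QueryByFreeText(castle)).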


Query type                Description/Functionality
QueryByMedia              Similarity or exact search using query by example (using multimedia data)
QueryByDescription        Similarity or exact search using XML-based metadata (like MPEG-7)
QueryByFeatureRange       Range retrieval, e.g., for low-level features like color
QueryByFreeText           Free text retrieval
SpatialQuery              Retrieval of spatial elements within media objects
TemporalQuery             Retrieval of temporal elements within media objects (e.g., a scene in a video)
QueryByXQuery             Container for limited XQuery expressions
QueryByRelevanceFeedback  Retrieval that takes result items of a previous search into account
QueryByROI                Retrieval based on a certain region of interest
QueryBySPARQL             Container for limited SPARQL expressions (a SPARQL expression that operates on a single triple is used to filter information)
Available MPQF query types

Semantic expressions and the QueryBySPARQL query type enable retrieval on semantic annotations stored in ontologies, possibly defined in RDF/OWL.
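Restricting QueryBySPARQL to expressions over a single triple amounts to matching one triple pattern against the stored annotations. The following Java sketch assumes triples are plain strings and uses null as a wildcard in place of a SPARQL variable such as ?x; it only illustrates the filtering idea and is not how the QueryBroker or a SPARQL engine evaluates queries. The predicate names used below are hypothetical.

```java
import java.util.List;
import java.util.Objects;

// A stored RDF-like statement, simplified to three strings.
record Triple(String subject, String predicate, String object) {}

// A single triple pattern; a null position acts like a SPARQL variable (?x).
record TriplePattern(String subject, String predicate, String object) {

    // True if every fixed position of the pattern equals the triple's value.
    boolean matches(Triple t) {
        return positionMatches(subject, t.subject())
                && positionMatches(predicate, t.predicate())
                && positionMatches(object, t.object());
    }

    private static boolean positionMatches(String pattern, String value) {
        return pattern == null || Objects.equals(pattern, value);
    }

    // Filters a collection of triples with this pattern, preserving order.
    List<Triple> filter(List<Triple> triples) {
        return triples.stream().filter(this::matches).toList();
    }
}
```

For instance, the pattern (?image, annotatedWith, Lymphoma) corresponds to new TriplePattern(null, "annotatedWith", "Lymphoma").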


Structure of the Query Management Tools


The QMT of MPQF copes with the task of searching for and choosing the desired multimedia services for retrieval. This includes service discovery, querying for service capabilities and service capability descriptions. The figure above depicts the element hierarchy of the management tools in MPQF. The management part of the query format consists of either the Input or the Output element, depending on the direction of the communication (request or response). The MPEG Query Format has been explicitly designed for use in a distributed heterogeneous retrieval scenario. Therefore, the standard is open to any XML-based metadata description format (e.g., MPEG-7 [Matinez 02] or Dublin Core [DublinCore]) and supports, as already mentioned, service discovery functionalities. First approaches in this direction have been realized by [Gruhne 08] and [Döller 08b], which address retrieval in a multimodal scenario and introduce an MPQF-aware, Web-Service-based middleware.

MPQF also supports asynchronous search requests. In contrast to a synchronous request, where the result is delivered as fast as possible, an asynchronous request lets the user define a time period after which the result will be collected. Such a retrieval paradigm might be of interest, e.g., for users of mobile devices with limited hardware/software capabilities. The results of requests triggered by the mobile device, like “Show me some videos containing information about the castle visible on the example image that has been taken with the digital camera”, can then be gathered and viewed at a later point in time from a different location (e.g., the home office) and on a different device (e.g., a PC).

Federated Query Evaluation Workflow

As already mentioned, the Media-enhanced Query Broker is not only a routing service for queries to specific data stores; it is also capable of managing federated query execution. To this end, the Media-enhanced Query Broker transforms incoming user queries (of different formats) into a common internal representation for further processing and distribution to the registered data resources, and it aggregates the returned results before delivering them to the client. In particular, it runs through the following central phases:

  • Query analysis
The first step after receiving a query is to register it in the Media-enhanced Query Broker. During registration, the query is analysed and a corresponding query tree is generated. Each sub-query comprising a single query type becomes a leaf node. Using the information from the data store registration (cf. "KnowledgeManager" in chapter QueryBroker Architecture), the set of data stores that are able to evaluate certain parts of the incoming query is identified.
  • Query segmentation
The next step is the actual segmentation of the query based on the previously created query tree. Here, the query is divided into semantically correct sub-queries, which are again valid MPQF queries but with different semantics. The segmentation corresponds directly to the set of identified data stores.
  • Generation of a query execution plan
To ensure efficient retrieval, the incoming query (or the generated segments) is transformed into a graph structure (a directed acyclic graph). After this initial transformation, various optimization techniques are applied. The current implementation is able to perform the following optimizations: early selection push-down, moving/combining and decamping of selections as well as projections, insertion of projections into the query execution, join ordering on the basis of selectivity and, finally, pipelining. Furthermore, statistics from the query cache component are used to create an efficient query execution plan on the basis of physical information. The query cache also enables the injection of equal (or similar) partial results directly into the query execution planning process.
  • Distribution of query
The query or its segments will be distributed in parallel to the appropriate data stores. After retrieval, the partial result sets will be collected.
  • Consolidation of partial results
The partial result sets are aggregated with respect to the overall query semantics. For this purpose, the query tree is processed backwards from the leaves to the root in a "breadth-first" manner. If the corresponding parent node defines an AND, the partial results are joined with the help of the corresponding established semantic link ("join attribute", see also Creating a Semantic Link), whereas a union operation is carried out if the parent node represents an OR. Unary operators (cf. Querying) are processed directly on the intermediate result.
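The consolidation phase can be sketched as a recursive evaluation of the query tree in Java. The sketch is hypothetical and simplified: each leaf already holds the partial result set of its data store as field/value maps, every AND node names a single join field given by the semantic link between its operands, and every OR node names the key used for a duplicate-free union. Instead of an explicit breadth-first traversal it uses a post-order walk, which likewise consolidates every child before its parent. The AND case is a plain hash join (build a table on one input, probe it with the other).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-tree nodes. A Leaf holds the partial result set returned
// by the data store that evaluated this sub-query; rows are field -> value maps.
sealed interface Node permits Leaf, AndNode, OrNode {}
record Leaf(String store, List<Map<String, String>> results) implements Node {}
// joinField: field named by the semantic link between the operands (assumed
// to have the same name on both sides to keep the sketch short).
record AndNode(String joinField, Node left, Node right) implements Node {}
// keyField: field identifying equal results when forming the union.
record OrNode(String keyField, Node left, Node right) implements Node {}

final class Consolidator {

    // Post-order walk: both children are consolidated before their parent.
    static List<Map<String, String>> consolidate(Node node) {
        if (node instanceof Leaf leaf) {
            return leaf.results();
        }
        if (node instanceof AndNode andNode) {
            return hashJoin(consolidate(andNode.left()), consolidate(andNode.right()),
                    andNode.joinField());
        }
        OrNode orNode = (OrNode) node;
        return union(consolidate(orNode.left()), consolidate(orNode.right()), orNode.keyField());
    }

    // AND: build a hash table on the left input, then probe it with the right input.
    static List<Map<String, String>> hashJoin(List<Map<String, String>> left,
                                              List<Map<String, String>> right,
                                              String field) {
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> row : left) {
            table.computeIfAbsent(row.get(field), k -> new ArrayList<>()).add(row);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> probe : right) {
            for (Map<String, String> match : table.getOrDefault(probe.get(field), List.of())) {
                Map<String, String> merged = new LinkedHashMap<>(match);
                merged.putAll(probe); // on field-name clashes the right input wins
                joined.add(merged);
            }
        }
        return joined;
    }

    // OR: union of both inputs, keeping the first row seen for each key.
    static List<Map<String, String>> union(List<Map<String, String>> left,
                                           List<Map<String, String>> right,
                                           String field) {
        Map<String, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> row : left) byKey.putIfAbsent(row.get(field), row);
        for (Map<String, String> row : right) byKey.putIfAbsent(row.get(field), row);
        return new ArrayList<>(byKey.values());
    }
}
```

In the MEDICO setting, an AND of a PACS leaf with an OR over the triple store and the image search engine would first unite the two annotation results and then join them with the PACS records on the linked field.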


The described workflow of the federated query processing can best be illustrated using the example scenario, as depicted in the figure below.


Central steps of the query execution plan

The federation process always needs a global data set or at least knowledge about the interlinking of the data stores in order to perform an aggregation of the partial results. This interlinking is a way to enable a non-invasive integration of the data stores at the mediator.

This principle is called semantic links; for a definition and examples, see Creating a Semantic Link. The following figure depicts, for the example scenario, the diverse data sources forming a common (semantically linked) global data set.


Semantic Link between knowledge bases


QueryBroker Architecture

With these principal processing steps in mind, an end-to-end workflow in a distributed retrieval scenario can be sketched, which also reveals the architecture. The following figure illustrates the global workflow, from incoming user queries to returning the aggregated results to the client. Both synchronous and asynchronous queries can be handled. In the following, the subcomponents of a reference implementation of the QueryBroker, based on internal usage of the MPEG Query Format (MPQF), are briefly described. This discussion is continued thereafter with a focus on the actual implementation.



Architecture of the QueryBroker

  • QueryManager:
The QueryManager is the entry point of every user request. Its main purposes are receiving incoming queries, API-assisted generation of MPQF queries and validation of MPQF queries. If an application is not able to formulate MPQF queries itself, these can be built by consecutive API calls. In this process, two main parts of the MPQF structure are created: First, the QueryCondition element holds the filter criteria in an arbitrarily complex condition tree. Second, the OutputDescription element defines the structure of the result set; it stores the information about the required result items, grouping or sorting. After the query creation step is finalized, the generated MPQF query is registered at the QueryBroker using the query cache & statistics component. If a query instance is created on the client side in MPQF format, this query is registered directly at the QueryBroker. After a query has been validated, the QueryManager acts as a routing service: it forwards the query to its destination, namely the KnowledgeManager or the RequestProcessing component.
  • KnowledgeManager:
The main functionalities of the KnowledgeManager are the (de-)registration of data stores with their capability descriptions and service discovery as an input for the distribution of (sub-)queries. These capability descriptions are standardized in MPQF, allowing the specification of the retrieval characteristics of registered data stores. These characteristics include, for instance, the supported query types or metadata formats. Depending on those capabilities, this component is able to filter registered data stores during the search process (service discovery). For a registered retrieval system, it is very likely that not all functions specified in the incoming queries are supported. In such an environment, one of the important tasks for a client is to use service discovery to identify the data stores that provide the desired query functions or support the desired result representation formats, identified, e.g., by a MIME type.
  • RequestProcessing:
For each query, a single RequestProcessing component is initialized. This ensures parallelism and guarantees that a single object manages the complete life cycle of a query. The main tasks of this component are query execution planning, optimization of the chosen query execution plan, distribution of the query and result aggregation, as already discussed above. Besides managing the different states of a query, this component sends a copy of the optimized query to the query cache and statistics component, which collects information in order to improve optimization. Regarding the lifetime of a query, the following states have been defined to ease concurrent query processing: pending (query registered, process not started), retrieval (search started, some results missing), processing (all results available, aggregation in progress), finished (result can be fetched) and closed (result fetched or query lifetime expired). These states also apply to the individual query segments, since they are valid MPQF queries as well.
  • Query cache and statistics:
The query cache and statistics component organizes the registration of queries in the query cache. It collects information about data stores, such as execution times, network statistics, etc. Besides the data store statistics, the complete query is stored as well as the partial result sets. The information provided by this component is used for two different optimization tasks, namely internal query optimization and query stream optimization. Internal query optimization is a technique following well-known optimization rules of relational algebra (e.g., operator reordering on the basis of heuristics / statistics). In contrast, query stream optimization is intended to detect similar / equal query segments that have already been evaluated. If such a segment is detected, its results can be injected directly into the query execution plan. The query cache also implements the paging functionality.
  • MPQF interpreter:
MPQF interpreters act as mediators between the QueryBroker and a particular retrieval service. An interpreter receives an MPQF-formatted query and transforms it into native calls of the underlying query language of the backend database or search engine system. In this context, several interpreters (mappers) for heterogeneous data stores have been implemented (e.g., Flickr, XQuery, etc.). Furthermore, an interpreter for object or relational data stores is envisaged. After a successful retrieval, the interpreter converts the result set into a valid MPQF-formatted response and forwards it to the QueryBroker.
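The query states listed for RequestProcessing form a small state machine. The Java sketch below encodes them as an enum; the transition table (forward progress only, plus closing from any open state when the query lifetime expires) is an assumption derived from the state descriptions, not taken from the QueryBroker source, and the names QueryState and QueryLifecycle are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Lifecycle states of a query (and of each query segment) as described above.
enum QueryState {
    PENDING,     // query registered, process not started
    RETRIEVAL,   // search started, some results missing
    PROCESSING,  // all results available, aggregation in progress
    FINISHED,    // result can be fetched
    CLOSED;      // result fetched or query lifetime expired

    // Assumed transition table: forward progress, or expiry to CLOSED from any open state.
    private static final Map<QueryState, Set<QueryState>> NEXT = Map.of(
            PENDING, EnumSet.of(RETRIEVAL, CLOSED),
            RETRIEVAL, EnumSet.of(PROCESSING, CLOSED),
            PROCESSING, EnumSet.of(FINISHED, CLOSED),
            FINISHED, EnumSet.of(CLOSED),
            CLOSED, EnumSet.noneOf(QueryState.class));

    boolean canMoveTo(QueryState target) {
        return NEXT.get(this).contains(target);
    }
}

// Tracks the state of one query, as a single RequestProcessing object would.
final class QueryLifecycle {
    private QueryState state = QueryState.PENDING;

    QueryState state() {
        return state;
    }

    // Rejects transitions that skip or reverse a phase.
    void moveTo(QueryState target) {
        if (!state.canMoveTo(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

Guarding transitions centrally in this way keeps concurrent components from, for example, fetching a result while aggregation is still in progress.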


Main Interactions

Modules and Interfaces

This section describes the software modules and interfaces of the QueryBroker. First, the overall architecture is highlighted, followed by the actual backend and frontend functionalities. At its core, the implementation is based on the Spring Framework (e.g., enabling extensibility and inversion of control) and Maven (e.g., quality assurance and build management).

Architecture

The following figure shows an overview of the QueryBroker software architecture. Only the key elements are listed below, to give a quick impression of how the elements are related.


QueryBroker software architecture

  • Broker represents the central access point to the federated query framework. It provides the functionality to query distributed and heterogeneous multimedia databases using MPQF as the query format. Its main task is to receive MPQF queries and control the subsequent request processing (synchronous / asynchronous mode of operation or result fetching). See the section on Frontend Functionalities for more information.
  • QueryManager handles all received and active queries. New queries can be checked-in and corresponding result sets can be checked-out by the Broker.
  • RequestProcessing controls the processing of a single query in a parallelized way. First, an execution plan for the received query is created, followed by an optimization of the plan. Afterwards, the query is distributed and the results of the resulting sub-queries are aggregated. The implementations of these four parts are injected via the Spring framework and can easily be modified by XML configuration.
  • ExecutionPlanCreator transforms the received MPQF query tree into an internal execution plan tree structure.
  • ExecutionPlanOptimizer optimizes the default execution plan by replacing or switching the original tree nodes. The tree can be also transformed into a directed acyclic graph (DAG), to avoid isomorphic sub-trees in the execution plan.
  • QueryDistributor analyses which sub-trees of the execution plan have to be distributed. A sub-query can consist of one or many distributed queries to service endpoints. Each distributed query is encapsulated in a ServiceExecution.
  • ServiceExecution is a wrapper for the parallel execution of a service endpoint call, to utilize multicore processors.
  • QueryAggregator receives the sub-queries, including the results from the service endpoints, and the query execution plan. The aggregator combines these two elements and processes the queried results.
  • BackendManagement provides the functionality to register and remove service endpoints. (See Chapter Backend Functionality for more information)
  • Service interface has to be implemented by any service endpoint. A service endpoint connects a database or another dataset to the multimedia query framework.
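The interplay of RequestProcessing, its four injectable parts and ServiceExecution can be sketched with plain Java interfaces and an ExecutorService. All interface and method signatures here are hypothetical simplifications (plans and results are strings); in the QueryBroker the parts operate on MPQF objects and execution-plan trees and are wired by Spring XML configuration, which this sketch replaces with constructor injection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical, simplified contracts for the four exchangeable parts of
// RequestProcessing. Plans and results are plain strings in this sketch.
interface ExecutionPlanCreator { String createPlan(String mpqfQuery); }
interface ExecutionPlanOptimizer { String optimize(String plan); }
interface QueryDistributor { List<Callable<List<String>>> distribute(String plan); }
interface QueryAggregator { List<String> aggregate(String plan, List<List<String>> partials); }

final class RequestProcessing {
    private final ExecutionPlanCreator creator;
    private final ExecutionPlanOptimizer optimizer;
    private final QueryDistributor distributor;
    private final QueryAggregator aggregator;
    private final ExecutorService pool;

    // Constructor injection stands in for the Spring XML wiring.
    RequestProcessing(ExecutionPlanCreator creator, ExecutionPlanOptimizer optimizer,
                      QueryDistributor distributor, QueryAggregator aggregator,
                      ExecutorService pool) {
        this.creator = creator;
        this.optimizer = optimizer;
        this.distributor = distributor;
        this.aggregator = aggregator;
        this.pool = pool;
    }

    List<String> process(String mpqfQuery) {
        String plan = optimizer.optimize(creator.createPlan(mpqfQuery));
        try {
            // Each Callable plays the role of one ServiceExecution; the pool runs
            // them in parallel and invokeAll waits until all of them have completed.
            List<Future<List<String>>> futures = pool.invokeAll(distributor.distribute(plan));
            List<List<String>> partials = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                partials.add(f.get());
            }
            return aggregator.aggregate(plan, partials);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("query processing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a service endpoint failed", e.getCause());
        }
    }
}
```

Because the parts are interfaces, exchanging the optimizer, for instance, only requires passing a different implementation, which is the kind of flexibility the Spring configuration provides.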

Backend Functionality

Before queries can be sent to the QueryBroker, the backend management has to be set up. All backend functionalities are reachable through the BackendManagement singleton (de.uop.dimis.air.backend.BackendManagement). Here, service endpoints can be (de-)registered and semantic links between them can be created. A service endpoint provides the functionality to connect a database or dataset to the multimedia query framework. A semantic link defines the atomic connection between two heterogeneous and distributed knowledge bases on the basis of semantically equal properties. Semantic links can be set by XPath expressions.

(De-)Register a Service

Service endpoints are able to execute subtrees of the query execution plan. At the moment, only single leaves are supported as subtrees; these can be Query-By-Media or Query-By-Description. To register a service endpoint, which has to implement the Service interface (de.uop.dimis.air.backend.Service), a valid MPQF message like the following needs to be formulated:

<?xml version="1.0" encoding="UTF-8"?>
<mpqf:MpegQuery mpqfID="" xmlns:mpqf="urn:mpeg:mpqf:schema:2008" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpqf:schema:2008 mpqf_semantic_enhancement.xsd">
    <mpqf:Management>
        <mpqf:Input>
            <mpqf:DesiredCapability>
                <!-- Query By Media: 100.3.6.1 (Standard Annex B.2) -->
                <mpqf:SupportedQueryTypes href="urn:mpeg:mpqf:2008:CS:full:100.3.6.1" />
            </mpqf:DesiredCapability>
            <mpqf:ServiceID>
                de.uop.dimis.air.ExampleService
            </mpqf:ServiceID>
        </mpqf:Input>
    </mpqf:Management>
</mpqf:MpegQuery>

The message contains the ServiceID, which is equal to the fully qualified name of the implementation class. The DesiredCapability element declares which query types the service can handle; in this example, the ExampleService can handle Query-By-Media. See Annex B.2 of the MPQF standard for more query type URNs. To deregister a service endpoint, an MPQF register message must be sent with an empty list of desired capabilities.
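The registration semantics described above (capabilities given as query type URNs, an empty capability list meaning deregistration, service discovery by supported query type) can be sketched as an in-memory registry in Java. This is not the BackendManagement API; the class and method names are hypothetical, and only the Query-By-Media URN is taken from the example message.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory registry mirroring the register/deregister semantics above.
final class ServiceRegistry {
    // ServiceID (fully qualified class name) -> supported query type URNs.
    private final Map<String, Set<String>> capabilities = new TreeMap<>();

    // An empty capability set deregisters the service, like an MPQF register
    // message with an empty list of desired capabilities.
    void register(String serviceId, Set<String> queryTypeUrns) {
        if (queryTypeUrns.isEmpty()) {
            capabilities.remove(serviceId);
        } else {
            capabilities.put(serviceId, Set.copyOf(queryTypeUrns));
        }
    }

    // Service discovery: all registered services that support the given query type.
    List<String> discover(String queryTypeUrn) {
        return capabilities.entrySet().stream()
                .filter(e -> e.getValue().contains(queryTypeUrn))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Registering de.uop.dimis.air.ExampleService with urn:mpeg:mpqf:2008:CS:full:100.3.6.1 makes it discoverable for Query-By-Media; registering it again with an empty set removes it.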

Creating a Semantic Link

To be able to merge results from different services, it is necessary to know which fields can be used for identification (cf. primary keys in relational database systems). A semantic link can be defined for every pair of services. If such a link is undefined, a default semantic link is created at runtime; it uses the identifier field of the JPSearch Core Meta Schema for every service. KeyMatchesType messages are used to register a semantic link:

<?xml version="1.0" encoding="UTF-8"?>
<key:KeyMatches xmlns:key="urn:keyMatches:schema:2011" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:keyMatches:schema:2011 keys.xsd">
    <key:DB id="de.uop.dimis.air.mpqfManagement.interpreter.DummyInterpreterQbM">
        <key:Key>
            <key:Field>identifier</key:Field>
            <key:ReferencedDB>
                de.uop.dimis.air.mpqfManagement.interpreter.DummyInterpreterQbD
            </key:ReferencedDB>
            <key:ReferencedDBField>identifier</key:ReferencedDBField>
        </key:Key>
    </key:DB>
</key:KeyMatches>

A KeyMatchesType message contains the IDs of the source and the target/referenced database (service endpoint) and the fields that should be used to identify results from both services as equal. A KeyMatchesType can contain multiple referenced databases. When a new semantic link between two services is registered, three semantic links are generated: in addition to the registered link, a reflexive link is created for each of the two databases, using the field specified for that database. If such a reflexive semantic link already exists, it is updated with the current field. Note that semantic links are symmetric (undirected edges between services) but not transitive.
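The rules for semantic links (symmetric, reflexive links created or updated on registration, no transitivity, default identifier field) can be sketched as a small registry in Java. The class and method names are hypothetical, and for brevity the default link is returned on lookup rather than stored when first needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a semantic link: for a pair of services, the field of
// each service that identifies equal results.
record Link(String fieldOfFirst, String fieldOfSecond) {}

final class SemanticLinkRegistry {
    // Default link: the JPSearch core "identifier" field on both sides.
    static final Link DEFAULT = new Link("identifier", "identifier");

    private record Pair(String first, String second) {}

    private final Map<Pair, Link> links = new HashMap<>();

    // Registers a <-> b and creates or updates the reflexive links a <-> a and b <-> b.
    void register(String a, String fieldA, String b, String fieldB) {
        links.put(new Pair(a, b), new Link(fieldA, fieldB));
        links.put(new Pair(b, a), new Link(fieldB, fieldA)); // symmetric: stored both ways
        links.put(new Pair(a, a), new Link(fieldA, fieldA));
        links.put(new Pair(b, b), new Link(fieldB, fieldB));
    }

    // Only direct links are returned: links are not transitive, so a <-> b and
    // b <-> c do not yield a <-> c. Undefined pairs get the default link.
    Link lookup(String a, String b) {
        return links.getOrDefault(new Pair(a, b), DEFAULT);
    }
}
```

After registering QbM.imageId with QbD.identifier and QbD.identifier with Tags.docRef, a lookup for QbM and Tags still falls back to the default link, because no link is derived through QbD.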

Frontend Functionalities

After at least one service endpoint is registered and the backend configuration is done, the QueryBroker is available for multimedia queries. The frontend functionalities are reachable through the Broker singleton (de.uop.dimis.air.Broker). Here you can start synchronous/asynchronous queries or fetch the query results for a specified asynchronous query.

Querying

The QueryBroker uses the MPEG Query Format (MPQF) to describe queries. The XML-based query format is implemented using the Java Architecture for XML Binding (JAXB); the generated Java binding code is located in the package de.uop.dimis.air.internalObjects.mpqf. A query can be described in an XML file, or its conditions can be specified directly in Java. Since the MPQF standard covers a great deal of complex functionality, not all query operators are currently implemented in the QueryBroker.

Implemented operators:

  • Projection
  • Limit
  • Distinct
  • GroupBy (with aggregation) over multiple attributes
  • Or (half blocking, merging, using hashmaps for improved runtime)
  • And (half blocking, merging, using hashmaps for improved runtime)
  • SortBy over a single attribute
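As an illustration of the GroupBy operator with aggregation over multiple attributes, the following Java sketch groups results by a composite key built from several attribute values and sums one numeric attribute per group. Modelling results as `Map<String, Object>` and choosing summation as the aggregate are assumptions made for this example; `GroupByExample` is a hypothetical name, not a QueryBroker class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of GroupBy with aggregation; results are modelled as attribute maps.
class GroupByExample {
    /**
     * Groups results by the values of several attributes (a composite key) and
     * aggregates one numeric attribute per group by summing it.
     */
    static Map<List<Object>, Double> groupBySum(List<Map<String, Object>> results,
                                                List<String> groupFields,
                                                String sumField) {
        return results.stream().collect(Collectors.groupingBy(
                // composite key: the list of grouping-attribute values, in field order
                (Map<String, Object> r) -> groupFields.stream().map(r::get).collect(Collectors.toList()),
                LinkedHashMap::new,   // keep groups in order of first appearance
                Collectors.summingDouble(r -> ((Number) r.get(sumField)).doubleValue())));
    }
}
```

Using a list of values as the key keeps the grouping independent of how many attributes are involved; other aggregates (count, average) would only change the downstream collector.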


Synchronous Query

A synchronous query is sent by setting the isImmediateResponse field of the MPQF query to true. The QueryBroker then blocks until query processing has finished, and the client receives the results directly. A minimal synchronous query can look like the following XML file, in which a single QueryByMedia condition (a similarity search for an image with the URL 'http://any.uri.com') is sent to the QueryBroker:

<?xml version="1.0" encoding="UTF-8"?>
<mpqf:MpegQuery mpqfID="" xmlns:mpqf="urn:mpeg:mpqf:schema:2008" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <mpqf:Query>
          <mpqf:Input immediateResponse="true">
               <mpqf:QueryCondition>
                    <mpqf:Condition xsi:type="mpqf:QueryByMedia" matchType="similar">
                         <mpqf:MediaResource resourceID="res01">
                              <mpqf:MediaResource>
                                   <mpqf:MediaUri>http://any.uri.com</mpqf:MediaUri>
                              </mpqf:MediaResource>
                         </mpqf:MediaResource>
                    </mpqf:Condition>
               </mpqf:QueryCondition>
          </mpqf:Input>
     </mpqf:Query>
</mpqf:MpegQuery>
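For each incoming query, a broker has to decide whether to answer synchronously. The following sketch reads the `immediateResponse` attribute of the first `mpqf:Input` element with the JDK's namespace-aware DOM parser. The class name is hypothetical, and treating a missing attribute as asynchronous is an assumption, not behaviour taken from the QueryBroker or the MPQF standard. The QueryBroker itself works on JAXB-generated objects rather than on raw DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: decides between synchronous and asynchronous handling.
class ImmediateResponseCheck {
    static final String MPQF_NS = "urn:mpeg:mpqf:schema:2008";

    /** True if the first mpqf:Input element carries immediateResponse="true". */
    static boolean isImmediateResponse(String mpqfXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // resolve the mpqf: prefix to its namespace URI
            Document doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(mpqfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList inputs = doc.getElementsByTagNameNS(MPQF_NS, "Input");
            if (inputs.getLength() == 0) {
                return false;
            }
            Element input = (Element) inputs.item(0);
            // getAttribute returns "" when absent, which parses as false (assumed default)
            return Boolean.parseBoolean(input.getAttribute("immediateResponse"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed MPQF document", e);
        }
    }
}
```

Namespace awareness matters here: without it, the declared `mpqf` prefix would not be resolved and a lookup by namespace URI would find nothing.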


Asynchronous Query

To start an asynchronous query, the isImmediateResponse field of the MPQF query has to be set to false. The QueryBroker then responds with a unique MPQF query ID, which the client can later use to fetch the results of the query.
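The asynchronous pattern (return an ID at once, fetch the results later) can be sketched in Java with a map from query IDs to `CompletableFuture`s. `AsyncQueryStore`, `submit`, `poll` and `await` are hypothetical names; the sketch uses random UUIDs as the unique query IDs and a caller-supplied evaluation function in place of the real query processing.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of asynchronous query handling keyed by a unique query ID.
class AsyncQueryStore {
    private final Map<String, CompletableFuture<List<String>>> pending = new ConcurrentHashMap<>();

    /** Starts evaluation in the background and immediately returns a unique query ID. */
    String submit(Supplier<List<String>> evaluation) {
        String id = UUID.randomUUID().toString();
        pending.put(id, CompletableFuture.supplyAsync(evaluation));
        return id;
    }

    /** Results if evaluation has finished; empty if still running or the ID is unknown. */
    Optional<List<String>> poll(String id) {
        CompletableFuture<List<String>> f = pending.get(id);
        if (f == null || !f.isDone()) {
            return Optional.empty();
        }
        return Optional.of(f.join()); // rethrows (wrapped) if evaluation failed
    }

    /** Blocks until the results for the given ID are available. */
    List<String> await(String id) {
        CompletableFuture<List<String>> f = pending.get(id);
        if (f == null) {
            throw new IllegalArgumentException("Unknown query ID: " + id);
        }
        return f.join();
    }
}
```

A client would typically call `submit`, keep the returned ID, and later either `poll` without blocking or `await` the results.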


Complex Query Example

The following XML code shows a more complex query. The number of results is limited to 10 items (maxItemCount), duplicate results are removed (distinct), the results are sorted in ascending order by the “identifier” field, and a projection on the field “description” (ReqField) is applied. The query condition is an AND combination of a QueryByMedia and a QueryByDescription condition; the latter contains metadata constraints described using the MPEG-7 metadata schema.

<?xml version="1.0" encoding="UTF-8"?>
<mpqf:MpegQuery mpqfID="101" xmlns:mpqf="urn:mpeg:mpqf:schema:2008" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="urn:mpeg:mpqf:schema:2008 mpqf_semantic_enhancement.xsd">
    <mpqf:Query>
        <mpqf:Input>
            <mpqf:OutputDescription maxItemCount="10" distinct="true">
                <mpqf:ReqField typeName="description"></mpqf:ReqField>
                <mpqf:SortBy xsi:type="mpqf:SortByFieldType" order="ascending">
                    <mpqf:Field>identifier</mpqf:Field>
                </mpqf:SortBy>
            </mpqf:OutputDescription>
            <mpqf:QueryCondition>
                <mpqf:Condition xsi:type="mpqf:AND">
                    <mpqf:Condition xsi:type="mpqf:QueryByMedia">
                        <mpqf:MediaResource resourceID="ID_5001">
                            <mpqf:MediaUri>http://tolle.uri/1</mpqf:MediaUri>
                        </mpqf:MediaResource>
                    </mpqf:Condition>
                    <mpqf:Condition xsi:type="mpqf:QueryByDescription">
                        <mpqf:DescriptionResource resourceID="desc001">
                            <mpqf:AnyDescription 
                                  xmlns:mpeg7="urn:mpeg:mpeg7:schema:2004" 
                                  xsi:schemaLocation="urn:mpeg:mpeg7:schema:2004 M7v2schema.xsd">
                                <mpeg7:Mpeg7> 
                                    <mpeg7:DescriptionUnit 
                                             xsi:type="mpeg7:CreationInformationType">
                                        <mpeg7:Creation>
                                            <mpeg7:Title>Example Title</mpeg7:Title>
                                        </mpeg7:Creation>
                                    </mpeg7:DescriptionUnit>
                                </mpeg7:Mpeg7>
                            </mpqf:AnyDescription>
                        </mpqf:DescriptionResource>
                    </mpqf:Condition>
                </mpqf:Condition>
            </mpqf:QueryCondition>
        </mpqf:Input>
    </mpqf:Query>
</mpqf:MpegQuery>
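The effect of the OutputDescription in this query can be approximated on in-memory results as follows. The sketch assumes results are attribute maps, compares sort values as strings, and applies the steps in the order sort, project, remove duplicates, limit; that order is an assumption for illustration, as is the hypothetical class name `OutputDescriptionSketch`.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; the order sort -> project -> distinct -> limit is an assumption.
class OutputDescriptionSketch {
    static List<Map<String, Object>> apply(List<Map<String, Object>> results, String sortField,
                                           List<String> reqFields, int maxItemCount) {
        return results.stream()
                // SortBy ascending; values are compared as strings here (assumption)
                .sorted(Comparator.comparing((Map<String, Object> r) -> String.valueOf(r.get(sortField))))
                .map(r -> project(r, reqFields))   // ReqField projection
                .distinct()                        // distinct="true", applied to projected maps
                .limit(maxItemCount)               // maxItemCount
                .collect(Collectors.toList());
    }

    /** Keeps only the requested fields of a result. */
    static Map<String, Object> project(Map<String, Object> result, List<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String f : fields) {
            if (result.containsKey(f)) {
                out.put(f, result.get(f));
            }
        }
        return out;
    }
}
```

Sorting before projecting matters here, because the sort field (“identifier”) is not among the projected fields.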

Query Execution Tree Evaluation

The query aggregator evaluates the query execution plan (QEP). This evaluation yields the set of results that is later returned to the querying client.

There are blocking, half blocking and non-blocking operators. A blocking operator needs all results from its children to decide which result to return next; the SortBy operator is a blocking operator. An operator is half blocking if it does not need all results from every child; the AND operator is implemented in this way. Non-blocking operators such as LIMIT can forward results without knowing the remaining results.
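The difference between blocking and non-blocking operators shows up clearly in a pull-based (iterator) model. In this hypothetical sketch, `limit` forwards each child result as soon as it is requested and never reads more than it needs, whereas `sortBy` must drain its child completely before it can return the first result. The iterator model itself is an assumption; the text does not state how the QueryBroker's operators are wired together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical pull-based operators illustrating non-blocking vs. blocking evaluation.
class OperatorSketch {
    /** Non-blocking: each child result is forwarded as soon as it is pulled. */
    static <T> Iterator<T> limit(Iterator<T> child, int max) {
        return new Iterator<T>() {
            private int emitted = 0;

            @Override public boolean hasNext() {
                return emitted < max && child.hasNext();
            }

            @Override public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                emitted++;
                return child.next();
            }
        };
    }

    /** Blocking: the whole child is consumed before the first result is emitted. */
    static <T> Iterator<T> sortBy(Iterator<T> child, Comparator<? super T> order) {
        return new Iterator<T>() {
            private Iterator<T> buffer; // filled lazily on first access

            private Iterator<T> ensureSorted() {
                if (buffer == null) {
                    List<T> all = new ArrayList<>();
                    child.forEachRemaining(all::add); // reads every child result
                    all.sort(order);
                    buffer = all.iterator();
                }
                return buffer;
            }

            @Override public boolean hasNext() { return ensureSorted().hasNext(); }

            @Override public T next() { return ensureSorted().next(); }
        };
    }
}
```

Placing `limit` above `sortBy` still forces the sort to read every input, which is why blocking operators dominate the latency of a plan.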

Some operators have to merge results: if two results are equal (according to the applicable semantic link), they must be merged. AND and OR are examples of merging operators. Merging two results means that one result is augmented with additional information from the other; no information is overwritten.
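Merging and a half blocking, hash-based AND can be sketched together. Here `merge` copies one result and adds only the fields it does not yet have, so nothing is overwritten; `and` reads the left child completely into a hash map keyed by its link field (blocking on that child only) and then matches right-hand results one at a time. The names, the map-based result model and the assumption that link-field values are unique within the left child are all illustrative, not taken from the QueryBroker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of result merging and a half blocking hash-based AND.
class MergingAndSketch {
    /** Augments a copy of 'first' with fields from 'second' without overwriting existing values. */
    static Map<String, Object> merge(Map<String, Object> first, Map<String, Object> second) {
        Map<String, Object> merged = new LinkedHashMap<>(first);
        second.forEach(merged::putIfAbsent); // fields already in 'first' are kept
        return merged;
    }

    /**
     * Half blocking AND: only the left child is read completely (into a hash map
     * keyed by its link field); right results are then matched one at a time.
     */
    static List<Map<String, Object>> and(Iterator<Map<String, Object>> left, String leftKey,
                                         Iterator<Map<String, Object>> right, String rightKey) {
        Map<Object, Map<String, Object>> index = new HashMap<>();
        left.forEachRemaining(r -> index.put(r.get(leftKey), r)); // assumes unique link values
        List<Map<String, Object>> out = new ArrayList<>();
        while (right.hasNext()) {
            Map<String, Object> r = right.next();
            Map<String, Object> match = index.get(r.get(rightKey));
            if (match != null) {
                out.add(merge(match, r)); // equal under the semantic link, so merge
            }
        }
        return out;
    }
}
```

In a pipelined implementation, each match could be forwarded as soon as it is found instead of being collected into a list.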


A detailed description of how to access the software modules and interfaces of the QueryBroker is provided in the User and Programmer Guide of the Query Broker. It explains the steps necessary to integrate the QueryBroker into another application and to access its backend and frontend functionalities. Additionally, it provides a code example that implements all steps required to initialize and run the QueryBroker.


Design Principles

To ensure interoperability between the query applications and the registered database services, the Media-enhanced Query Broker is based on the following internal design principles:

  • Query language abstraction:
The Media-enhanced Query Broker is capable of federating an arbitrary number of retrieval services that use various query languages/APIs (e.g., XQuery, SQL or SPARQL). This is achieved by converting all incoming queries into an internal abstract format that is finally translated into the specific query language/API of each data store. As its internal abstraction layer, the Media-enhanced Query Broker uses the MPEG Query Format (MPQF) [Smith 08], which supports most of the functions of traditional query languages as well as several types of multimedia-specific queries (e.g., temporal, spatial, or query-by-example).
  • Multiple retrieval paradigms:
Retrieval systems do not always follow the same data retrieval paradigm; a broad variety exists, e.g., relational, NoSQL or XML-based storage, or triple stores. The Media-enhanced Query Broker attempts to shield applications and users from this variety. Furthermore, in such systems it is likely that more than one data store has to be accessed to evaluate a query. In this case, the query has to be segmented and distributed to the applicable retrieval services. In this respect, the Media-enhanced Query Broker acts as a federated database management system.
  • Metadata format interoperability:
For an efficient retrieval process, metadata formats are applied to describe syntactic or semantic attributes of (media) resources. A huge number of standardized or proprietary metadata formats currently exist, covering nearly every use case and domain, so more than one metadata format is typically in use in a heterogeneous retrieval scenario. The Media-enhanced Query Broker therefore provides functionality to transform between diverse metadata formats wherever a defined mapping exists and is available.
  • Modular architectural design:
A modular architectural design should always be a goal in software development. Its central aspects are convertibility, extensibility and reusability. These ensure loosely coupled components in the overall system, making it easy to extend the functionality of components or even to replace them with new implementations.
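The query language abstraction and modular design principles can be illustrated with a small translator interface: one abstract condition, several pluggable backend translators. `EqualsCondition` is a deliberately tiny, hypothetical stand-in for the internal MPQF representation, and the SQL and SPARQL strings are only indicative of what a real translator might generate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one abstract condition, pluggable per-backend translators.
class TranslatorSketch {
    /** Minimal stand-in for the internal abstract query: a single equality condition. */
    static final class EqualsCondition {
        final String field;
        final String value;

        EqualsCondition(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    /** One implementation per backend query language; new backends plug in here. */
    interface QueryTranslator {
        String translate(EqualsCondition c);
    }

    static final class SqlTranslator implements QueryTranslator {
        private final String table;

        SqlTranslator(String table) { this.table = table; }

        @Override public String translate(EqualsCondition c) {
            // Illustration only: a real translator should use bound parameters.
            return "SELECT * FROM " + table + " WHERE " + c.field
                    + " = '" + c.value.replace("'", "''") + "'";
        }
    }

    static final class SparqlTranslator implements QueryTranslator {
        private final String predicateNs;

        SparqlTranslator(String predicateNs) { this.predicateNs = predicateNs; }

        @Override public String translate(EqualsCondition c) {
            // Minimal literal escaping: backslashes first, then double quotes.
            String literal = c.value.replace("\\", "\\\\").replace("\"", "\\\"");
            return "SELECT ?s WHERE { ?s <" + predicateNs + c.field + "> \"" + literal + "\" }";
        }
    }

    /** Translates the same abstract condition for every registered backend. */
    static Map<String, String> distribute(EqualsCondition c, Map<String, QueryTranslator> backends) {
        Map<String, String> out = new LinkedHashMap<>();
        backends.forEach((id, t) -> out.put(id, t.translate(c)));
        return out;
    }
}
```

Because clients only ever produce the abstract form, adding a new store type means adding one translator rather than changing any client.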


References

[DICOM]

Digital Imaging and Communications in Medicine; The DICOM Standard

[Döller 08a]

M. Döller, R. Tous, M. Gruhne, K. Yoon, M. Sano, and I. S. Burnett, “The MPEG Query Format: On the Way to Unify the Access to Multimedia Retrieval Systems,” IEEE Multimedia, vol. 15, no. 4, pp. 82–95, 2008.

[Döller 08b]

Mario Döller, Kerstin Bauer, Harald Kosch and Matthias Gruhne. Standardized Multimedia Retrieval based on Web Service technologies and the MPEG Query Format. Journal of Digital Information, 6(4):315-331, 2008.

[Döller 10]

M. Döller, F. Stegmaier, H. Kosch, R. Tous, and J. Delgado, “Standardized Interoperable Image Retrieval,” in Proceedings of the ACM Symposium on Applied Computing, Track on Advances in Spatial and Image-based Information Systems, (Sierre, Switzerland), pp. 881–887, 2010.

[DublinCore]

Dublin Core Metadata Initiative. Dublin Core metadata element set – version 1.1: Reference description. http://dublincore.org/documents/dces/, 2008.

[Gruhne 08]

Matthias Gruhne, Peter Dunker, Ruben Tous and Mario Döller. Distributed Cross-Modal Search with the MPEG Query Format. In 9th International Workshop on Image Analysis for Multimedia Interactive Services, pages 211-224, Klagenfurt, Austria, May 2008. IEEE Computer Society.

[JAXB]

Java Architecture for XML Binding (JAXB), Metro Project, http://jaxb.dev.java.net/

[Matinez 02]

J. M. Martínez, R. Koenen and F. Pereira. MPEG-7. IEEE Multimedia, 9(2):78-87, April-June 2002.

[MAVEN]

Apache Maven, Apache Software Foundation, http://maven.apache.org/

[MPEG-7]

ISO/IEC 15938-1:2002 - Information technology -- Multimedia content description interface -- http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=34228

[Smith 08]

J. R. Smith, “The Search for Interoperability,” IEEE Multimedia, vol. 15, no. 3, pp. 84–87, 2008.

[Spring]

The Spring Framework, SpringSource, 2012, http://www.springsource.org/

[Stegmaier 09a] 

F. Stegmaier, W. Bailer, T. Bürger, M. Döller, M. Höffernig, W. Lee, V. Malaisé, C. Poppe, R. Troncy, H. Kosch, and R. Van de Walle, “How to Align Media Metadata Schemas? Design and Implementation of the Media Ontology,” in Proceedings of the 10th International Workshop of the Multimedia Metadata Community on Semantic Multimedia Database Technologies in conjunction with SAMT, vol. 539, (Graz, Austria), pp. 56–69, December 2009.

[Stegmaier 09b]

Florian Stegmaier, Udo Gröbner, Mario Döller. “Specification of the Query Format for medium complexity problems (V1.1)”, Deliverable CTC 2.5.15 of Work-Package 2 (“Video, Audio, Metadata, Platforms”) of THESEUS Basic Technologies, 2009.

[Stegmaier 10]

Florian Stegmaier, Mario Döller, Harald Kosch, Andreas Hutter and Thomas Riegel. “AIR: Architecture for Interoperable Retrieval on distributed and heterogeneous Multimedia Repositories”. 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2010), Desenzano del Garda, Italy, pp. 1-4.

[XPath]

XML Path Language (XPath) 2.0 (Second Edition), W3C Recommendation, 14 December 2010, http://www.w3.org/TR/xpath20/


Detail Specifications

The following is a list of Open Specifications linked to this Generic Enabler. Specifications labeled as "PRELIMINARY" are considered stable but subject to minor changes derived from lessons learned during the development of a first reference implementation planned for the current Major Release of FI-WARE. Specifications labeled as "DRAFT" are planned for future Major Releases of FI-WARE but are provided for the sake of future users.

Open API Specifications


Re-utilised Technologies/Specifications

  • At its core, the QueryBroker uses the MPEG Query Format (MPQF) [ISO/IEC 15938-12:2008] as the common internal representation for input and output query descriptions and for managing the backend search services. A comprehensive overview can be found in the papers [1,2].
  • The GE itself is implemented in Java.
Standards
[ISO/IEC 15938-12:2008] "Information Technology - Multimedia Content Description Interface - Part 12: Query Format". Editors: Kyoungro Yoon, Mario Doeller, Matthias Gruhne, Ruben Tous, Masanori Sano, Miran Choi, Tae-Beom Lim, Jongseol James Lee, Hee-Cheol Seo.
[ISO/IEC 15938-12:2008/Cor.1:2009] "Information Technology - Multimedia Content Description Interface - Part 12: Query Format, TECHNICAL CORRIGENDUM 1". Editors: Kyoungro Yoon, Mario Doeller.
Take a look at the latest version of the MPEG Query Format XML Schema at https://svn-itec.uni-klu.ac.at/repos/MPEGSchema/trunk/mpeg7/mpqf.xsd
References
[1] Mario Döller, Ruben Tous, Matthias Gruhne, Kyoungro Yoon, Masanori Sano, and Ian S Burnett, “The MPEG Query Format: On the way to unify the access to Multimedia Retrieval Systems”, IEEE Multimedia, vol. 15, no. 4, pp. 82–95, 2008.
[2] Ruben Tous and Jaime Delgado (2008). Semantic-driven multimedia retrieval with the MPEG Query Format. 3rd International Conference on Semantic and Digital Media Technologies SAMT 3 - 5 Dec 2008, Koblenz, Germany. Lecture Notes in Computer Science. ISSN 0302-9743. Volume 5392/2008. ISBN 978-3-540-92234-6. Pag. 149-163.


Terms and definitions

This section comprises a summary of terms and definitions introduced during the previous sections. It intends to establish a vocabulary that will help to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at overall FI-WARE level, please refer to FIWARE Global Terms and Definitions

  • Data refers to information that is produced, generated, collected or observed that may be relevant for processing, carrying out further analysis and knowledge extraction. In FIWARE, data has an associated data type and value. FIWARE will support a set of built-in basic data types similar to those existing in most programming languages. Values linked to basic data types supported in FIWARE are referred to as basic data values. As an example, basic data values like ‘2’, ‘7’ or ‘365’ belong to the integer basic data type.
  • A data element refers to data whose value is defined as consisting of a sequence of one or more <name, type, value> triplets, referred to as data element attributes, where the type and value of each attribute is either mapped to a basic data type and a basic data value or mapped to the data type and value of another data element.
  • Context in FIWARE is represented through context elements. A context element extends the concept of data element by associating an EntityId and EntityType with it, uniquely identifying the entity (which in turn may map to a group of entities) in the FIWARE system to which the context element information refers. In addition, there may be some attributes as well as metadata associated with attributes that we may define as mandatory for context elements as compared to data elements. Context elements are typically created containing the value of attributes characterizing a given entity at a given moment. As an example, a context element may contain values of some of the attributes “last measured temperature”, “square meters” and “wall color” associated with a room in a building. Note that there might be many different context elements referring to the same entity in a system, each containing the value of a different set of attributes. This allows different applications to handle different context elements for the same entity, each containing only those attributes of that entity relevant to the corresponding application. It also allows updates on sets of attributes linked to a given entity to be represented: each of these updates can take the form of a context element containing only the values of those attributes that have changed.
  • An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened, in that domain. Events typically lead to the creation of some data or context element describing or representing the events, thus allowing them to be processed. As an example, a sensor device may measure the temperature and pressure of a given boiler and send a context element every five minutes associated with that entity (the boiler) that includes the values of these two attributes (temperature and pressure). The creation and sending of the context element is an event, i.e., what has occurred. Since the data/context elements generated in connection with an event are the way events become visible in a computing system, it is common to refer to these data/context elements simply as "events".
  • A data event refers to an event leading to creation of a data element.
  • A context event refers to an event leading to creation of a context element.
  • An event object is a programming entity that represents an event in a computing system [EPIA], such as event-aware GEs. Event objects allow operations to be performed on events, also known as event processing. Event objects are defined as a data element (or a context element) representing an event, to which a number of standard event object properties (similar to a header) are associated internally. These standard event object properties support certain event processing functions.
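The data element and context element definitions above map naturally onto a few Java classes. This is a hypothetical sketch of the vocabulary only, not a FIWARE API: an attribute is a <name, type, value> triplet whose value may be a basic value or another data element, a data element holds one or more attributes, and a context element adds an EntityId and EntityType.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the glossary terms; not an actual FIWARE interface.
class ContextModelSketch {
    /** One <name, type, value> triplet; value is a basic value or a nested DataElement. */
    static final class Attribute {
        final String name;
        final String type;
        final Object value;

        Attribute(String name, String type, Object value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    /** A data element: a sequence of one or more attributes. */
    static class DataElement {
        final List<Attribute> attributes;

        DataElement(List<Attribute> attributes) {
            if (attributes.isEmpty()) {
                throw new IllegalArgumentException("A data element needs at least one attribute");
            }
            this.attributes = Collections.unmodifiableList(new ArrayList<>(attributes));
        }
    }

    /** A context element: a data element bound to an entity via EntityId and EntityType. */
    static final class ContextElement extends DataElement {
        final String entityId;
        final String entityType;

        ContextElement(String entityId, String entityType, List<Attribute> attributes) {
            super(attributes);
            this.entityId = entityId;
            this.entityType = entityType;
        }
    }
}
```

An update that changes only one attribute of a room can then be represented as a new `ContextElement` for the same entity containing just that attribute.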