FIWARE.OpenSpecification.MiWi.AugmentedReality R3.3 - FIWARE Forge Wiki

From FIWARE Forge Wiki

Name: FIWARE.OpenSpecification.MiWi.AugmentedReality
Chapter: Advanced Middleware and Web UI
Catalogue-Link to Implementation: AugmentedReality
Owner: University of Oulu, Antti Karhu



This document is a self-contained open specification of a FIWARE generic enabler. Please also consult the FIWARE Product Vision, the website at http://www.fiware.org and similar pages in order to understand the complete context of the FIWARE platform.

Release Information

This document corresponds to release R3.3.


Legal Notice

Please check the following Legal Notice to understand the rights to use these specifications.


The goal of the Augmented Reality Generic Enabler is to provide a high-level application programming interface, which can be used in developing various kinds of HTML5 Augmented Reality applications, which run on web browsers without any specific plug-ins. These applications rely on the functionality of the other GEs, like XML3D Technology, POI Data Provider, etc. Such AR applications will provide additional virtual content on top of the real world surroundings, and enhance the user’s view of the physical world with information that is not directly visible.

Target Usage

The anticipated use-cases are diverse, but the basic functionality in typical use-cases will be similar. Using the camera and sensors of smartphones and tablet devices, future applications would use the Augmented Reality GE to capture 2D/3D video streams of the environment, determine the location and pose of the user in this environment, and embed virtual content on top of it. In more basic use-cases, augmented reality applications will overlay simple head-up displays, images or text onto the user’s field of view. In more complex use-cases, augmented reality applications will display sophisticated 3D models rendered in such a way that they blend into the surrounding natural scene and appear indistinguishable from it. The AR interface will focus on the less complex use-cases, and the virtual content will most likely consist of overlay objects rendered over the video feed, because real-time registration and tracking of the real world is computationally expensive, especially with the more complex vision-based techniques. Although the computational power of mobile devices has greatly increased, mobile web browsers still struggle with computationally intensive tasks. Furthermore, the sensor data needed in location-based techniques must be highly accurate.

Example Scenario

One use-case could be an AR application that searches for nearby restaurants from a remote service (POI Data Provider), fetches the locations of the restaurants and displays information such as the restaurant menu, opening hours and user reviews.

Basic Concepts

3rd Party Web Service

3rd Party Web Services provide the actual content (text, images, 3D models, etc.) displayed by the augmented reality application. The content resides on servers and can be queried using their specific APIs.

Related FIWARE generic enablers:

Registration and Tracking

Registration and tracking is the process of aligning a virtual object with a real scene in the three-dimensional space. For smartphone and tablet device applications, object tracking involves either location sensors or an image recognition system or a combination of the two. Mobile augmented reality techniques can be roughly classified into two categories based on the type of registration and tracking technology used: location-based and vision-based.

Location Based

Location based techniques determine the location and orientation of a mobile device using sensors such as GPS, compass, gyroscope and accelerometer, and then overlay the camera display with information relevant to the user’s location or direction. This information is usually obtained from remote services, which provide location based information.
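As a minimal sketch of the location-based idea, the snippet below computes the compass bearing from the device to a point of interest and checks whether that bearing falls inside the camera's horizontal field of view. The function names, the spherical great-circle formula and the field-of-view check are illustrative assumptions and are not part of the Augmented Reality GE API.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are latitude/longitude in degrees; the result is in degrees,
    measured clockwise from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_angle_deg(heading_deg, bearing_deg):
    """Signed angle from the device heading to the bearing, in [-180, 180)."""
    return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0


def is_in_view(heading_deg, bearing_deg, horizontal_fov_deg):
    """True if the bearing lies within the (assumed) horizontal field of view."""
    return abs(relative_angle_deg(heading_deg, bearing_deg)) <= horizontal_fov_deg / 2.0
```

For example, with a compass heading of 350 degrees and a POI at a bearing of 10 degrees, `relative_angle_deg(350, 10)` gives 20 degrees, so the POI would be drawn to the right of the screen centre if the field of view is wide enough.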

Vision Based

Vision based techniques try to obtain information about the shape and location of real world objects in the environment, using image processing techniques or predefined markers. The information is then used to align virtual content with the real world surroundings. These techniques may be subdivided into two main categories: marker-based and markerless tracking.

Marker Based Tracking

Markers create a link between the real physical world and the virtual world. Once a marker is found in a video frame, it is possible to extract its position, pose and other properties. After that the marker properties can be used to render some virtual object upon it. These markers are usually simple monochrome markers, which can be detected easily using less complex image processing algorithms.

1. Marker: A fiducial marker is defined as a square box that is divided into a grid. The outside cells of the grid are always black and the inside cells can be either white or black. The inside cells are used to encode data in binary form.
2. Image Marker: An image marker is very similar to the above defined marker with the difference being that the resolution of the grid is a bit larger and the inside cells are selected to form a shape or logo.
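The marker layout described above can be illustrated with a small decoding sketch. It assumes the marker has already been located and sampled into a square grid of cells, with 1 for black and 0 for white, and it reads the inside cells row by row with the most significant bit first. The bit order and the function name are assumptions made for illustration; this is not the encoding used by any particular tracking library.

```python
def decode_marker(grid):
    """Decode a sampled fiducial marker grid into an integer.

    grid is a square list of rows containing 0 (white) or 1 (black).
    Returns None when the outside cells are not all black, i.e. when
    the grid does not look like a marker.
    """
    n = len(grid)
    if n < 3 or any(len(row) != n for row in grid):
        raise ValueError("marker grid must be square and at least 3x3")

    # The outside cells of the grid are always black.
    border = (list(grid[0]) + list(grid[-1])
              + [row[0] for row in grid] + [row[-1] for row in grid])
    if not all(cell == 1 for cell in border):
        return None

    # The inside cells encode data in binary form
    # (assumed order: row-major, most significant bit first).
    value = 0
    for row in grid[1:-1]:
        for cell in row[1:-1]:
            value = (value << 1) | cell
    return value
```

A 4x4 grid whose inside cells read 1, 0, 0, 1 decodes to 9 under these assumptions. A real detector would also have to handle the four possible rotations of the marker and add error checking.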

Markerless Tracking

Markerless tracking tracks features and/or predefined images or objects from the environment instead of fiducial markers. The specific features are recognized by image feature analysis. Markerless tracking is more flexible, because any part of the real environment may be used as a target that can be tracked in order to place virtual objects.


ALVAR is an augmented reality software library developed by VTT Technical Research Centre of Finland. The main functionality of ALVAR is to detect predefined markers from input video data and to return information (transformation and visibility) about the found markers as a response.
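To make the "transformation and visibility" response more concrete, the sketch below applies a marker transform to a point and collects the scene positions of the visible markers. The detection record layout (`id`, `visible`, `transform`) and the flat, column-major 4x4 matrix layout are hypothetical and chosen only for illustration; they do not describe the actual output format of the library.

```python
def transform_point(matrix, point):
    """Apply a 4x4 transform to a 3D point.

    matrix is a flat sequence of 16 numbers in column-major order
    (assumed layout). The transform is assumed to be affine, so the
    bottom row is (0, 0, 0, 1) and no perspective divide is needed.
    """
    if len(matrix) != 16:
        raise ValueError("expected 16 matrix elements")
    x, y, z = point
    return tuple(
        matrix[i] * x + matrix[4 + i] * y + matrix[8 + i] * z + matrix[12 + i]
        for i in range(3)
    )


def visible_marker_origins(detections):
    """Map marker id to the transformed marker origin, visible markers only.

    Each detection is a dict with the hypothetical keys
    'id', 'visible' and 'transform'.
    """
    return {
        d["id"]: transform_point(d["transform"], (0.0, 0.0, 0.0))
        for d in detections
        if d["visible"]
    }
```

A virtual object attached to a marker would then be placed at the returned position, and hidden whenever the marker is reported as not visible.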


In basic computer graphics, the virtual scene is projected on an image plane using a virtual camera and this projection is then rendered on screen. In augmented reality the virtual content is rendered on top of the camera image. Therefore, the virtual camera has to match the device’s real camera, that is, the optical characteristics of the virtual and real cameras must be similar. This guarantees that the virtual objects in the scene are projected in the same way as real world objects.
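One way to read this requirement is sketched below under a simple pinhole camera model without lens distortion. The vertical field of view of the virtual camera is derived from the real camera's focal length, and a point in camera space is projected to pixel coordinates. The parameter names (`fx`, `fy`, `cx`, `cy`), the use of focal lengths expressed in pixels and the convention that the camera looks along the positive z axis are assumptions of this sketch, not something defined by the specification.

```python
import math


def vertical_fov_deg(image_height_px, focal_length_px):
    """Vertical field of view of a pinhole camera, in degrees."""
    if image_height_px <= 0 or focal_length_px <= 0:
        raise ValueError("image height and focal length must be positive")
    return math.degrees(2.0 * math.atan(image_height_px / (2.0 * focal_length_px)))


def project_to_pixel(point, fx, fy, cx, cy):
    """Project a camera-space point (x, y, z) to pixel coordinates (u, v).

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
    Returns None for points at or behind the camera plane (z <= 0).
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)
```

If the virtual camera is given the same field of view as the real one, a virtual object and a real object at the same position should land on the same pixels. For a 480 pixel high image and a focal length of 240 pixels, the vertical field of view is 90 degrees.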

Related FIWARE generic enablers:

Generic Architecture

The Augmented Reality GE architecture consists of the following components:

  • Location-based Tracking and Registration: The Location-based Tracking and Registration component is used to sense relevant information from the real-world surroundings. This is realized by combining the following W3C specifications: Geolocation, DeviceOrientation, DeviceLight, DeviceProximity into the HTML Sensor Events component. The Geolocation API is used to determine the mobile device's location. The DeviceOrientation API is used to sense how much the device is leaning side-to-side and front-to-back, and the direction the device is facing. The DeviceProximity API is used to sense the distance between the mobile device and a nearby physical object. The DeviceLight API is used to sense the light intensity level in the surrounding environment.

  • Vision-based Registration and Tracking: The Vision-based Registration and Tracking component is used to detect markers from a video stream obtained from a mobile device’s camera. This is realized by combining the HTML MediaStream interface, ALVAR and Xflow. The HTML MediaStream interface provides access to a video stream from a local camera. The video stream is used as an input to the JavaScript version of ALVAR, which detects predefined markers and returns information (transformation and visibility) about the discovered markers. At the moment the functionality of the JavaScript version is limited to marker-based tracking of (1) Markers and (2) Image Markers. The ALVAR functionality is encapsulated into an Xflow dataflow node. Hence ALVAR can be easily placed into the DOM structure.

  • 3D Scene Management: The 3D Scene Management component is an interface between the data obtained from the previous two components and XML3D. For example, a mobile web AR application can use the orientation of a mobile device to manipulate the virtual camera, or use the detected markers and GPS coordinates for placing 3D virtual objects in the 3D scene. However, the data obtained from the Location- and Vision-based Registration and Tracking components is not in a form that XML3D can use directly. The GPS location is provided as a latitude, longitude pair in the WGS84 coordinate system. The device orientation is provided as a set of intrinsic Tait-Bryan angles of type Z-X'-Y'. The marker transform is provided as a 4x4 transform matrix. In order to use this information, it must be converted into the right-handed, three-dimensional Cartesian coordinate system used by XML3D. The 3D Scene Management component provides exactly this functionality.

  • XML3D: The XML3D component contains the representation of the virtual 3D scene and is responsible for rendering it. A detailed description of XML3D can be found here: FIWARE.OpenSpecification.MiWi.3D-UI. However, XML3D is not natively supported in browsers, hence it is emulated by using xml3d.js, which is a polyfill implementation of XML3D.

  • Web Service Interface: The Web Service Interface component manages the communication between a mobile web AR application and the 3rd Party Web Services using standard web technologies. The communication typically takes place either using a RESTful API over HTTP, or through a WebSocket, for instance when the requested data has to be streamed at high frequency. The Web Service Interface component requires that the 3rd Party Web Services use the JSON format to represent the provided data.

  • 3rd Party Web Services: The 3rd Party Web Services provide the content, such as text, photos, videos, audio and 3D objects, displayed in a mobile web AR application. The content usually comprises spatial information like GPS coordinates, which enables the content to be retrieved and visualized correctly in a mobile web AR application. For example, the 3rd Party Web Service might be a POI or a 3D GIS Data Service, which provides functionality for searching content from a specific location based on GPS coordinates.
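The conversions performed by the 3D Scene Management component can be sketched roughly as follows. The first function maps a WGS84 latitude, longitude pair to metres in a local right-handed frame around a chosen origin, using a spherical, small-distance approximation. The second builds a rotation matrix from Z-X'-Y' Tait-Bryan angles by composing rotations about the z, x and y axes, as in the W3C DeviceOrientation convention. The function names, the axis assignment (x east, y up, z south) and the Earth radius constant are assumptions of this sketch and are not taken from the GE implementation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def wgs84_to_local_xyz(lat, lon, origin_lat, origin_lon):
    """Convert WGS84 degrees to local metres relative to an origin.

    Assumed right-handed axes: x points east, y points up, z points south.
    Only reasonable for short distances (equirectangular approximation).
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat))
    north = EARTH_RADIUS_M * d_lat
    return (east, 0.0, -north)


def _matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orientation_to_matrix(alpha_deg, beta_deg, gamma_deg):
    """Rotation matrix for intrinsic Z-X'-Y' Tait-Bryan angles.

    alpha rotates about z, beta about x', gamma about y''. The result
    is expressed in the sensor's Earth frame (x east, y north, z up),
    not yet in the scene frame used above.
    """
    a, b, g = (math.radians(v) for v in (alpha_deg, beta_deg, gamma_deg))
    rz = [[math.cos(a), -math.sin(a), 0.0],
          [math.sin(a), math.cos(a), 0.0],
          [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(b), -math.sin(b)],
          [0.0, math.sin(b), math.cos(b)]]
    ry = [[math.cos(g), 0.0, math.sin(g)],
          [0.0, 1.0, 0.0],
          [-math.sin(g), 0.0, math.cos(g)]]
    return _matmul3(_matmul3(rz, rx), ry)
```

A point 0.001 degrees north of the origin ends up roughly 111 metres along the negative z axis in this frame. A further change of basis from the sensor's Earth frame to the scene frame would still be needed before the rotation can drive the virtual camera; that step is left out here.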

The API architecture is modular and each of its components is independent. Therefore, one can use only the components needed in a specific augmented reality application.

Main Interactions

The figure describes the flow of data, both between the AR application components, and between the components and some 3rd party web services. In a basic example case, the GPS coordinates are used to query nearby POIs from the POI Data Provider via the Web Service Interface. Only one POI is returned; it includes the POI's physical location in WGS84 format and a 3D icon model. First the icon's location in the virtual scene is calculated, and then the icon is added to the scene. Both steps are done using the 3D Scene Management component. Next, the user rotates the device and the Location-based Registration and Tracking component uses the 3D Scene Management component to pan the virtual camera according to the device orientation. Finally, when the device is facing the physical location of the POI, the user sees the icon on the device screen.
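The first step of this flow, reading the POI returned by the web service, might look like the sketch below. The JSON layout shown (a `pois` object keyed by identifier, each entry holding a `location.wgs84` pair and an `icon_url`) is hypothetical and only stands in for whatever format the queried service actually returns.

```python
import json


def extract_pois(response_text):
    """Parse a JSON response and return a list of simplified POI records.

    The expected layout is a hypothetical example:
    {"pois": {"<id>": {"location": {"wgs84": {"latitude": ..,
                                              "longitude": ..}},
                       "icon_url": ".."}}}
    """
    data = json.loads(response_text)
    pois = []
    for poi_id, poi in data.get("pois", {}).items():
        wgs84 = poi["location"]["wgs84"]
        pois.append({
            "id": poi_id,
            "latitude": float(wgs84["latitude"]),
            "longitude": float(wgs84["longitude"]),
            "icon_url": poi.get("icon_url"),  # may be absent
        })
    return pois
```

Each returned record carries the WGS84 location that would then be converted into scene coordinates before the icon model is added to the scene.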

Basic Design Principles

  • AR applications should run directly on modern web browsers, with no need for plug-ins.
  • Modular open source architecture.

Detailed Specifications

Following is a list of Open Specifications linked to this Generic Enabler. Specifications labeled as "PRELIMINARY" are considered stable but subject to minor changes derived from lessons learned during last interactions of the development of a first reference implementation planned for the current Major Release of FI-WARE. Specifications labeled as "DRAFT" are planned for future Major Releases of FI-WARE but they are provided for the sake of future users.

Open API Specifications

Re-utilised Technologies/Specifications

Related GEs:

Terms and definitions

This section comprises a summary of terms and definitions introduced during the previous sections. It intends to establish a vocabulary that will help to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at overall FI-WARE level, please refer to FIWARE Global Terms and Definitions.

Annotations refer to non-functional descriptions that are added to declarations of native types, to IDL interface definitions, or through global annotations at deployment time. They can be used to express security requirements (e.g. "this string is a password and should be handled according to the security policy defined for passwords"), QoS parameters (e.g. max. latency), or others.
AR → Augmented Reality
Augmented Reality (AR)
Augmented Reality (AR) refers to the real-time enhancement of images of the real world with additional information. This can range from the rough placement of 2D labels in the image to the perfectly registered display of virtual objects in a scene that are photo-realistically rendered in the context of the real scene (e.g. with respect to lighting and camera noise).
IDL → Interface Definition Language
Interface Definition Language
Interface Definition Language refers to the specification of interfaces or services. They contain the description of types and function interfaces that use these types for input and output parameters as well as return types. Different types of IDL are being used including CORBA IDL, Thrift IDL, Web Service Description Language (WSDL, for Web Services using SOAP), Web Application Description Language (WADL, for RESTful services), and others.
Middleware is a software library that (ideally) handles all network related functionality for an application. This includes the setup of connections between peers, transformation of user data into a common network format and back, and handling of security and QoS requirements.
PoI → Point of Interest
Point of Interest (PoI)
Point of Interest refers to the description of a certain point or 2D/3D region in space. It defines its location, attaches meta data to it, and defines a coordinate system relative to which additional coordinate systems, AR markers, or 3D objects can be placed.
Quality of Service (QoS)
Quality of Service refers to properties of a communication channel that are non-functional, such as robustness, guaranteed bandwidth, maximum latency, jitter, and many more.
Real-Virtual Interaction
Real-Virtual Interaction refers to Augmented Reality setups that additionally allow users to interact with real-world objects through virtual proxies in the scene that monitor and visualize the state in the real world and that can use services to change the state of the real world (e.g. switch lights on and off via a virtual button in the 3D scene).
A Scene refers to a collection of objects, which are identified by type (e.g. a 3D mesh object, a physics simulation rigid body, or a script object). These objects contain typed and named data values (composed of basic types such as integers, floating point numbers and strings) which are referred to as attributes. Scene objects can form a hierarchic (parent-child) structure. An HTML DOM document is one way to represent and store a scene.
Security is a property of an IT system that ensures confidentiality, integrity, and availability of data within the system or during communication over networks. In the context of middleware, it refers to the ability of the middleware to guarantee such properties for the communication channel according to suitably expressed requirements needed and guarantees offered by an application.
Security Policy
Security Policy refers to rules that need to be fulfilled before a network connection is established or for data to be transferred. It can for example express statements about the identity of communication partners, properties assigned to them, the confidentiality measures to be applied to data elements of a communication channel, and others.
Synchronization is the act of transmitting over a network protocol the changes in a scene to participants so that they share a common, real-time perception of the scene. This is crucial to implementing multi-user virtual worlds.
Type Description
Type Description in the context of the AMi middleware refers to the internal description of native data types or the interfaces described by an IDL. It contains data such as the name of a variable, its data type, the hierarchical relations between types (e.g. structs and arrays), its memory offset and alignment within another data type, and others. Type Descriptions are used to generate the mapping of native data types to the data that needs to be transmitted by the middleware.
Virtual Character
Virtual Character is a 3D object, typically composed of triangle mesh geometry, that can be moved and animated and can represent a user's presence (avatar) in a virtual world. Typically supported forms of animation include skeletal animation (where a hierarchy of "bones" or "joints" controls the deformation of the triangle mesh object) and vertex morph animation, where the vertices of the triangle mesh are directly manipulated. Virtual character systems may support composing the character from several mesh parts, for example separate upper body, lower body and head parts, to allow better customization possibilities.
WebGL → (Web Graphics Library) is a JavaScript API for rendering 3D and 2D computer graphics in a web browser.