FIWARE.OpenSpecification.WebUI.2D-3D-Capture - FIWARE Forge Wiki


From FIWARE Forge Wiki

Name FIWARE.OpenSpecification.WebUI.2D-3D-Capture
Chapter Advanced Web-based User Interfaces,
Catalogue-Link to Implementation 2D-3D-Capture



Within this document you find a self-contained open specification of a FIWARE generic enabler. Please consult the FIWARE Product Vision, the website at http://www.fiware.org and similar pages in order to understand the complete context of the FIWARE platform.


Legal Notice

Please check the following Legal Notice to understand the rights to use these specifications.


Ideally, this service consists of the following set of capabilities.

  • Capturing 2D and 3D visual information using standard browser APIs.
  • Identifying and capturing relevant meta-information that improves user experience, accuracy and performance.
  • Using "tagging" techniques to encapsulate information with the image.
  • Publishing/storing/submitting tagged multimedia to known/requested public services.
  • Providing a RESTful API to access multimedia for use by other GEs, such as the Cloud Rendering, Augmented Reality and Virtual Characters GEs.

2D/3D capturing is the capture of contextual information related to a 2D/3D scene of the surroundings, so that the data can be provided to or used as services. Location, lighting, device orientation and heading direction are the necessary contextual information; depending on the service, other information available to the browser can also be used. Air pressure and the time the image was taken are further important metadata that can be used as tagging content. Tagging can be similar to contemporary geo-tagging but is highly dependent on the image format (JPEG, PNG, Ogg, WMV, etc.). The Exif extension method can be used if the format is JPEG; for image formats such as PNG another method must be used. The image format chosen for capture depends on the usage scenario: for example, in a situation where image processing is minimal, JPEG compression can be preferable to lossless PNG images.

Simple Use Case of Advanced data capture

3D model builder using 2D-3D capture server to filter most suitable images

As depicted in the image above, photos taken from devices, combined with the metadata shown as a tag, are transferred to the 2D3D capture server. The 2D3D capture server API can then be used to look up images based on time and location for other applications where 2D or 3D visual information is required. In the use case described here, the 3D scene builder can be part of other GEs such as POI. Another use case is using tagged images published by other services or devices, such as Flickr, to build a 3D model of the current location so that, for example, a lost tourist could find their way. Elaborating on the "tag" and its content: images are geo-tagged in order to improve the user experience in social networks. 3D information can be used similarly to improve the quality of the user experience, and the metadata associated with a model increases the number of use cases to which 3D modelling can be applied. Most importantly, information such as the orientation of the device at the moment the photo is taken can be used to improve accuracy, and it also helps to filter out less useful images from the useful ones. In a hypothetical security situation, two images taken at the same time from known locations by two security cameras could be useful in identifying a criminal who happened to be at that location. By using metadata such as lighting conditions, the accuracy of the 3D model of the criminal can be improved, helping to avoid innocent people being targeted.

Basic Concepts

Capturing of 2D/3D data consists of three main aspects:

  • 2D/3D Data Capture
  • Metadata capturing and Tagging/associating
  • Data Streaming/Saving

Data Capture

Cameras of mobile devices provide data in the form of still images and 2D streams. There are devices equipped with stereoscopic cameras, but they too rely on software to create 3D images from the stereoscopic data the setup provides. HTML5 and W3C standards provide functions to obtain access to device cameras and sensors. Although W3C specifications exist, browsers do not support a universal API, so an abstract API is required to accommodate the functional differences between browsers. With respect to real-time and accurate representation of 3D data, the most popular approach today is geo-tagging photos. In modern mobile devices more information is available to optimize the process of placing photos in their actual locations so that they provide accurate information to 3D model creators. Hence capturing sensor information and encapsulating it with the image is the primary target of this architecture document.
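As a minimal sketch of the abstract API idea, the following feature detection hides browser differences behind one function. The property names are illustrative of how browsers have shipped capture APIs (standard and vendor-prefixed); the `nav` parameter stands in for `navigator` so the sketch can be exercised outside a browser.

```javascript
// Sketch of an abstraction layer over differing browser capture APIs.
// nav: the browser navigator object (injected for testability).
function detectCaptureSupport(nav) {
  // camera access: modern mediaDevices API or one of the legacy prefixed forms
  var media = !!(nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
              !!(nav.getUserMedia || nav.webkitGetUserMedia ||
                 nav.mozGetUserMedia || nav.msGetUserMedia);
  return {
    media: media,                      // camera feed available
    geolocation: !!nav.geolocation     // GPS readings available
  };
}
```

An application would call `detectCaptureSupport(navigator)` once at startup and fall back gracefully for each missing capability.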

There are other means that anticipate future development of 3D scene rendering using actual real-time data, but they have been ruled out of scope for this document. They are nevertheless considered important, and the APIs are designed so that the functionality can be extended with such means. Capturing audio as 3D data for such purposes is, at the moment, highly theoretical. One important aspect of audio is echo, which is useful in detecting the proximity of adjacent obstacles.

Capturing the depth element of 2D content directly from the device is highly desired. The current implementation attempts to provide means to achieve this goal by identifying the information necessary to extract depth from current images. A workaround at the server end is in place for this, as the processing power of current handheld devices is insufficient. Data capturing components should make the data available to the back-end 2D-3D capture server over an HTTP connection, and the transport mechanisms support the lossless and lossy compression and encoding mechanisms that contemporary browsers provide through PNG and GIF as well as JPEG images. By combining with services such as adaptive streaming (e.g. [https://developer.mozilla.org/en-US/docs/DASH_Adaptive_Streaming_for_HTML_5_Video DASH]), similar results can be obtained for videos as well. Video streams are out of the scope of this document, but the algorithms developed to manipulate sensor data remain valid for video streams obtained from mobile devices.

This aspect of 2D3D capturing is dependent on the device APIs provided by the browsers.


The metadata necessary to respond to a service request is identified as follows.

Geo-location (longitude, latitude)
Placing an image on a map is achieved with this information.
Altitude
Altitude gives the third dimension of a map and also helps to identify images taken from the air or from multi-storey buildings.
Air pressure
In case altitude is not available, a proper altitude approximation can be made for considerable differences.
Device orientation
This is the most important aspect, which helps to place the photo in a real 3D map. The device may be angled when the image is taken, so it is important to understand the portion of the scene covered by the particular image. It also helps to filter out the most appropriate images to use as stereoscopic images for model building.
Heading direction
At a single position (longitude, latitude), the direction a user may be facing within the full circle provides another filtering parameter.
Ambient light
In most mobile devices the main camera is located opposite the light sensor, but the secondary camera is aligned with the sensor. This parameter is one of the filtering parameters used to identify proper images that accurately depict the scene.
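The altitude approximation from air pressure mentioned above can be sketched with the international barometric formula. This is an assumption for illustration; the document does not name a specific method.

```javascript
// Approximate altitude (metres) from air pressure using the
// international barometric formula (illustrative choice of method).
// pressureHpa: measured pressure in hPa
// seaLevelHpa: reference sea-level pressure, default 1013.25 hPa
function altitudeFromPressure(pressureHpa, seaLevelHpa) {
  seaLevelHpa = seaLevelHpa || 1013.25;
  return 44330 * (1 - Math.pow(pressureHpa / seaLevelHpa, 1 / 5.255));
}
```

Such an estimate can stand in for the "alt" field of the geo-tag when the GPS altitude is missing or unreliable.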

The metadata mentioned above is part of the GPS standard specification[1] for W3C and, most importantly, part of the Mozilla API[2][3][4]. An abstract API needs to be defined to harmonize data extraction across a range of browsers.

The advanced geo-tag that needs to be embedded can be of the following form in JSON.

    1. The first three digits give the length of the metadata, e.g. 345.
    2. The metadata is of the following format:
    message {
         "type":"image",              // Image/Video
         "time":"2014.3.4_14.40.30",  // year_month_day_hours.minutes.seconds; used as the image name
         "ext":"png",                 // Image compression type; JPG and PNG are supported
         "DeviceType":"Mobile",       // Mobile/Desktop
         "DeviceOS":"Bada",           // Operating system
         "Browsertype":"Firefox",     // Browser application
         "position": {
               "lon":25.4583105,       // Longitude
               "lat":65.0600797,       // Latitude
               "alt":-1000,            // Altitude
               "acc":48.38800048828125 // Accuracy
         },
         "device": {                   // Sensor information
               "ax":0, "ay":0, "az":0, "gx":0, "gy":0, "gz":0, "ra":210.5637, "rb":47.5657, "rg":6.9698  // Accelerometer readings
         }
    }

Currently, audio is considered metadata because it does not provide specific information in a 3D context with contemporary technologies, but it remains important data with useful use cases, as mentioned in the data capture section. Apart from those use cases, it could be used in value-added services; so audio remains metadata as well.

Data Streaming/Saving

This is out of the scope of the API but is included for completeness. The basic requirements are similar to those for images, but the intermediate components that need streaming have to be identified in order to define the architecture component.

Conceptually, capturing should support saving media locally with tagged information as well as publishing to the required services after tagging. Tagging streams remains a major challenge, and the means for such implementations are yet to be realized. For streaming from portable devices, the data capture server can use DASH-like services that assist adaptive streaming; YouTube-like services should be able to adapt and tag content from the capture server. For still images, services such as Flickr already exist, but the capture service should support publishing and extracting tagged image data from public services for publish-subscribe or on-demand services.

Generic Architecture

Architecture Diagram

The diagram above is a very high-level diagram and can only be considered a conceptual architecture of a prototype. The generic architecture of 2D/3D capture is a multi-tier client-server architecture, with a client residing on the capturing device, a content capture server in the back end, and main storage.

Architecture consists of three main components.

  • Mobile Clients
  • Public Server Application
  • REST Clients

Mobile Clients

Mobile clients are the main information capturing devices in 2D3D capture applications. Contemporary browsers provide APIs, but standard APIs are far from being implemented by all of the available browsers. Contemporary mobile applications need a unifying API to address this problem, and this API is one such attempt. It is a very limited API, and parameters such as orientation information, which are expected to be obtained from the device compass, are adjusted so that they can be fed as data to 3D contexts such as WebGL. The accuracy of the information depends on manual calibration of the device.

Contemporary web technologies such as AJAX and HTML5 provide the means for this type of service.

Public Server Application - 2D3D Capture server

The main feature of the service is providing a repository for images and their meta-information. The server stores metadata in the data repository, and the metadata is also embedded in the images when they are inserted into the repository. The server consists of three interfaces and one internal module.

  • EventManager
  • REST server
  • WebSocket server

These are the three interfaces; the internal module consists of the database interface functions required to store and retrieve data to and from the MySQL database.

The REST server provides the means to:

  • Carry out basic server functions (Start/Stop)
  • Upload Images
  • Query Images and relevant information
  • Subscribe For Event from Event manager

The Event Manager provides the means to update subscribers about the latest updates to the server. Its purpose is to allow image processing to be carried out in real time, "as it happens".
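The Event Manager's publish/subscribe behaviour can be sketched minimally as follows. The names are illustrative, not the server's actual implementation: subscribers register a callback and are notified when a new image record reaches the server.

```javascript
// Minimal publish/subscribe sketch of the EventManager's behaviour.
function EventManager() {
  this.subscribers = [];
}
// Register a callback to be invoked on every published update.
EventManager.prototype.subscribe = function (callback) {
  this.subscribers.push(callback);
};
// Notify all subscribers of a new image record "as it happens".
EventManager.prototype.publish = function (imageRecord) {
  this.subscribers.forEach(function (cb) { cb(imageRecord); });
};
```

A real-time image processing client would subscribe once and start work on each record as it arrives, instead of polling the repository.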

The WebSocket server is the backbone of the Event Manager. It also provides the means to connect and save images using the Mobile API.

A combination of a database layer and web server technologies can be used to realize the implementation goals of this component.

Main Interactions

The following diagram depicts the very basic interactions of 2D3D capture:

  • a user decides to upload content to an information capturing service, and
  • a single service requests services from a single available device.

The sequence diagram of the real-time publisher-subscriber scenario is as follows.

These two very preliminary interactions require the following interfaces from the 2D3D capture server; purely from an information capture point of view, these are the necessary transactions related to a server.

Interactions concerning 2D-3D capture can be directly related to GE requests. The following hypothetical situation comprises some interactions that concern this specification. A service based on POI can demand data capture, and POI IDs can be used as metadata. 2D-3D-Capture could use contemporary 3D services such as YouTube 3D and the Flickr 3D community, hence the requirement to capture and publish data in public services can be supported. CloudRendering, RealVirtualInteraction and VirtualCharacters require much finer details, such as extracting texture material in real time and offline. Further, the AugmentedReality GE can request a live feed or an image annotated based on the facing direction and the location. The SceneAPI and DataflowProcessing GEs also indirectly affect the 2D3D capture API, as they depend on updates to the data repositories and streams. These requirements can be generalized for generic service enablers to define advanced interactions.

Basic Design Principles

  • Development of this GE focused on a few scenarios.
  • The API should define the necessary functions while leaving room for customization.
  • The most important design principle is to leave room for future developments. This includes catering for the increasing number of sensors that portable devices are equipped with and supporting advanced data capturing mechanisms.
  • Hide the heterogeneity of browser ecosystems from the application developer.

Detailed Specifications

RESTful and JavaScript API

The programming API is split in two at this point of development. The split is not final in any form and primarily serves as a testing ground that allows gathering material for further processing. The first part of the API is built for the browser as a JavaScript library. The library allows web developers to query capture interfaces, as well as capture data from them, in a way that is harmonized across different browsers. The API functionality has been tested on both mobile and desktop browsers. The second part of the API is a RESTful interface for saving and collecting sensor data. At the moment the API allows saving of captured image files with advanced geo-tag information. Its aim is twofold: first, it serves as a network service to which the device can push data; second, it acts as data storage for more advanced processing back ends, such as 3D construction (not implemented yet).

JavaScript API

This API is based on the following W3C specifications (Geolocation API, Orientation API), and the current implementation does not assume harmonization of these APIs throughout the browser spectrum; hence the isXXXXSupported set of functions. These functions avoid run-time exceptions due to unsupported W3C functions.

 isDeviceOrientationSupported: Returns true if the browser is able to obtain calculated orientation readings from the accelerometer.
 isDeviceMotionSupported: Returns true if the browser is able to obtain readings from the accelerometer.
 isGeolocationSupported: Returns true if the browser is able to obtain readings from the GPS sensors.
 hasMediaSupport: Returns true if the browser is able to obtain a camera feed from the local camera to the video element.

The following set of functions accesses the interfaces.

  • showVideo()

Starts streaming video from the device camera to an available video element in the DOM.

  • registerForDeviceMovements()

   function registerForDeviceMovements(onLocationSuccess, onLocationError, onMotion, options)

If the location is found, onLocationSuccess is triggered; in case of a failure, the onLocationError function is triggered. Speed is calculated using GPS, and the onMotion function updates the speed for every GPS update received.

  • getCurrentLocation()

Returns the GPS coordinates of the current location.

   function getCurrentLocation (callback,options) 

On successful retrieval of the current location, the callback function is triggered.
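A minimal sketch of such a wrapper over the W3C Geolocation API follows. The geolocation object is injected as a parameter (in a browser it would be `navigator.geolocation`) so the sketch can be exercised outside a browser, and the result field names follow the metadata example used elsewhere in this document.

```javascript
// Wrap geolocation.getCurrentPosition in the callback style described above.
// geo: a navigator.geolocation-like object (injected for testability)
function getCurrentLocation(geo, callback, options) {
  geo.getCurrentPosition(function (position) {
    // success: deliver coordinates in the document's geo-tag field names
    callback({
      lon: position.coords.longitude,
      lat: position.coords.latitude,
      alt: position.coords.altitude,
      acc: position.coords.accuracy
    });
  }, function (error) {
    // failure: signal with a null result and the error object
    callback(null, error);
  }, options);
}
```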

  • registerDeviceMotionEvents()

Registers for device motion data from the accelerometer. On successful event retrieval, the handlacceleration, handleAccelerationWithGravity and handleRotation callback functions are triggered respectively. Values are averaged over the 2 past values to avoid errors, and simple error correction methods are used to avoid sudden value changes.

   function registerDeviceMotionEvents(handlacceleration, handleAccelerationWithGravity, handleRotation)
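The smoothing described above (averaging over the two most recent values, with a simple guard against sudden jumps) can be sketched as follows. The exact scheme is not specified in the document; this is one plausible interpretation, with `maxJump` as an assumed tuning parameter.

```javascript
// Create a smoothing function for a single sensor axis:
// clamps implausible jumps, then averages over the 2 most recent values.
function makeSmoother(maxJump) {
  var history = [];
  return function (value) {
    var last = history[history.length - 1];
    // simple error correction: limit sudden value changes to maxJump
    if (last !== undefined && Math.abs(value - last) > maxJump) {
      value = last + Math.sign(value - last) * maxJump;
    }
    history.push(value);
    if (history.length > 2) history.shift();  // keep only the 2 most recent values
    // running average over the retained values
    return history.reduce(function (a, b) { return a + b; }, 0) / history.length;
  };
}
```

One smoother instance would be kept per sensor axis (ax, ay, az, etc.) and fed each raw reading before it reaches the callbacks.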

  • registerDeviceOrentationEvent()

Registers for device orientation data and calls eventHandlingFunction when the device orientation changes along the x, y and z axes.

   function registerDeviceOrentationEvent(eventHandlingFunction)

Contemporary major browsers do not provide access to the compass or the gyroscope; these values are calculated from the accelerometer. A device-dependent orientation API is implemented to adjust these sensor values for 3D contexts.
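Deriving orientation from the accelerometer can be sketched as below: with the device at rest, the gravity vector read by the accelerometer determines pitch and roll. The axis conventions are illustrative and device dependent, which is why the document notes that a device-dependent adjustment layer is needed.

```javascript
// Derive pitch and roll (degrees) from accelerometer-with-gravity readings.
// ax, ay, az: acceleration including gravity, in m/s^2 (axis conventions assumed).
function orientationFromAccelerometer(ax, ay, az) {
  var pitch = Math.atan2(-ax, Math.sqrt(ay * ay + az * az)) * 180 / Math.PI;
  var roll = Math.atan2(ay, az) * 180 / Math.PI;
  return { pitch: pitch, roll: roll };
}
```

Note that heading (rotation about the gravity axis) cannot be recovered this way; that is the gap manual calibration and the compass, where available, must fill.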

  • registerAmbientLightChanges()
   function registerAmbientLightChanges(handleLightValues)
  • subscribe()

This function subscribes to receive data from all of the above functions; callback functions are provided to obtain the data.

   function subscribe(onLocationSearchSuccess,onLocationServiceSearchError,onMotion, handlacceleration,handleAccelerationWithGravityEvent,handleRotation, handleOrientationChanges)

The existing implementation depends on the WebSocket API for media transfer and on JSON as a standard for messaging and streaming.

  • snapshot()

This function takes a snapshot of the current video feed and appends it to the web page.

  • sendImage()

Uploads an image to a designated server using WebSockets. This depends on the server being set up and running.

  • postImage()

Uploads an Image to a designated server using REST POST.

The following code snippet can be used to take an image and then upload it with the necessary data.

   dAPI = new FIware_wp13.Device("localhost", "remote 2d3dCapture server", "local server port", "WebSocketPORT", "LG_stereoscopic", "REST server Port");
   dAPI.subscribe(onLocationSearchSuccess, onLocationServiceSearchError, onMotion, handlacceleration, handleAccelerationWithGravityEvent, handleRotation, handleOrientationChanges);
   function uploadImageWithPost() {
       dAPI.postImage();  // illustrative body: the original snippet is truncated here
   }

The User and Programmer Guide provides detailed instructions on how to update the APIs and how to use them.


The RESTful image storage server API is not included because it is in a highly experimental state, and publishing it at this point would only cause confusion. At the same time, the image server as such will not be a final deliverable anyway, but part of a larger system; hence the API is not documented here yet.

  • Start the WebSocket server
  • Close the WebSocket server
  • Post an image to be saved in the repository
  • Retrieve all the data in the database
  • Get the images that are closest to a given GPS location
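The "closest images to a given GPS location" operation can be sketched with the haversine great-circle distance. The server-side implementation is not specified in this document; this only illustrates the idea, with the record field names (`lat`, `lon`) taken from the geo-tag example.

```javascript
// Great-circle distance in kilometres between two (lat, lon) points.
function haversineKm(lat1, lon1, lat2, lon2) {
  var toRad = function (d) { return d * Math.PI / 180; };
  var dLat = toRad(lat2 - lat1), dLon = toRad(lon2 - lon1);
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 6371 * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Return the n image records nearest to (lat, lon), nearest first.
function closestImages(images, lat, lon, n) {
  return images
    .map(function (img) {
      return { img: img, dist: haversineKm(lat, lon, img.lat, img.lon) };
    })
    .sort(function (a, b) { return a.dist - b.dist; })
    .slice(0, n)
    .map(function (e) { return e.img; });
}
```

For large repositories a spatial index (e.g. an R-tree or geohash bucketing in MySQL) would replace the linear scan, but the distance metric stays the same.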

Re-utilised Technologies/Specifications

  • FIWARE.OpenSpecification.WebUI.3D-UI

Terms and definitions

This section comprises a summary of terms and definitions introduced in the previous sections. It intends to establish a vocabulary that will help to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at the overall FI-WARE level, please refer to FIWARE Global Terms and Definitions.

Annotations refer to non-functional descriptions that are added to declarations of native types, to IDL interface definitions, or through global annotations at deployment time. They can be used to express security requirements (e.g. "this string is a password and should be handled according to the security policy defined for passwords"), QoS parameters (e.g. maximum latency), or others.
AR → Augmented Reality
Augmented Reality (AR)
Augmented Reality (AR) refers to the real-time enhancement of images of the real world with additional information. This can reach from the rough placement of 2D labels in the image to the perfectly registered display of virtual objects in a scene that are photo-realistically rendered in the context of the real scene (e.g. with respect to lighting and camera noise).
IDL → Interface Definition Language
Interface Definition Language
Interface Definition Language refers to the specification of interfaces or services. They contain the description of types and function interfaces that use these types for input and output parameters as well as return types. Different types of IDL are being used including CORBA IDL, Thrift IDL, Web Service Description Language (WSDL, for Web Services using SOAP), Web Application Description Language (WADL, for RESTful services), and others.
Middleware is a software library that (ideally) handles all network related functionality for an application. This includes the setup of connection between peers, transformation of user data into a common network format and back, handling of security and QoS requirements.
PoI → Point of Interest
Point of Interest (PoI)
Point of Interest refers to the description of a certain point or 2D/3D region in space. It defines its location, attaches meta data to it, and defines a coordinate system relative to which additional coordinate systems, AR marker, or 3D objects can be placed.
Quality of Service (QoS)
Quality of Service refers to properties of a communication channel that are non-functional, such as robustness, guaranteed bandwidth, maximum latency, jitter, and many more.
Real-Virtual Interaction
Real-Virtual Interaction refers to Augmented Reality setups that additionally allow users to interact with real-world objects through virtual proxies in the scene that monitor and visualize the state of the real world and that can use services to change the state of the real world (e.g. switching lights on and off via a virtual button in the 3D scene).
A Scene refers to a collection of objects, which are identified by type (e.g. a 3D mesh object, a physics simulation rigid body, or a script object). These objects contain typed and named data values (composed of basic types such as integers, floating point numbers and strings) which are referred to as attributes. Scene objects can form a hierarchic (parent-child) structure. An HTML DOM document is one way to represent and store a scene.
Security is a property of an IT system that ensures confidentiality, integrity, and availability of data within the system or during communication over networks. In the context of middleware, it refers to the ability of the middleware to guarantee such properties for the communication channel according to suitably expressed requirements needed and guarantees offered by an application.
Security Policy
Security Policy refers to rules that need to be fulfilled before a network connection is established or for data to be transferred. It can for example express statements about the identity of communication partners, properties assigned to them, the confidentiality measures to be applied to data elements of a communication channel, and others.
Synchronization is the act of transmitting over a network protocol the changes in a scene to participants so that they share a common, real-time perception of the scene. This is crucial to implementing multi-user virtual worlds.
Type Description
Type Description in the context of the AMi middleware refers to the internal description of native data types or the interfaces described by an IDL. It contains data such as the name of a variable, its data type, the hierarchical relations between types (e.g. structs and arrays), its memory offset and alignment within another data type, and others. Type Descriptions are used to generate the mapping of native data types to the data that needs to be transmitted by the middleware.
Virtual Character
Virtual Character is a 3D object, typically composed of triangle mesh geometry, that can be moved and animated and can represent a user's presence (avatar) in a virtual world. Typically supported forms of animation include skeletal animation (where a hierarchy of "bones" or "joints" controls the deformation of the triangle mesh object) and vertex morph animation, where the vertices of the triangle mesh are directly manipulated. Virtual character systems may support composing the character from several mesh parts, for example separate upper body, lower body and head parts, to allow better customization possibilities.
WebGL → Web Graphics Library
WebGL is a JavaScript API for rendering 3D and 2D computer graphics in web browsers.