
FIWARE.OpenSpecification.Data.StreamOriented R3

Name: FIWARE.OpenSpecification.Data.StreamOriented
Chapter: Data/Context Management
Catalogue-Link to Implementation: Kurento
Owner: URJC NAEVATEC, Luis López Fernández



Preface

Within this document you will find a self-contained open specification of a FIWARE Generic Enabler. Please consult the FIWARE Product Vision, the website at http://www.fiware.org and similar pages in order to understand the complete context of the FIWARE platform.


FIWARE WIKI editorial remark:
This page corresponds to Release 3 of FIWARE. The latest version, associated with the latest Release, is linked from FIWARE Architecture.

Copyright

Copyright © 2010-2014 by URJC. All Rights Reserved.

Legal Notice

Please check the following Legal Notice to understand the rights to use these specifications.

Overview

The Stream Oriented GE provides a framework devoted to simplifying the development of complex interactive multimedia applications through a rich family of APIs and toolboxes. It provides a media server and a set of client APIs that make it simple to develop advanced video applications for WWW and smartphone platforms. The Stream Oriented GE features include group communications, transcoding, recording, mixing, broadcasting and routing of audiovisual flows. It also provides advanced media processing capabilities involving computer vision, video indexing, augmented reality and speech analysis.

The modular architecture of the Stream Oriented GE makes it simple to integrate third-party media processing algorithms (e.g. speech recognition, sentiment analysis, face recognition, etc.), which application developers can use as transparently as the built-in features.

The Stream Oriented GE’s core element is a Media Server, responsible for media transmission, processing, loading and recording. It is implemented using low-level technologies based on GStreamer in order to optimize resource consumption. It provides the following features:

  • Networked streaming protocols, including HTTP (working as client and server), RTP and WebRTC.
  • Group communications (MCUs and SFUs functionality) supporting both media mixing and media routing/dispatching.
  • Generic support for computational vision and augmented reality filters.
  • Media storage supporting writing operations for WebM and MP4 and playing in all formats supported by GStreamer.
  • Automatic media transcoding between any of the codecs supported by GStreamer, including VP8, H.264, H.263, AMR, Opus, Speex, G.711, etc.

Basic Concepts

Signaling and media planes

The Stream Oriented GE, like most multimedia communication technologies, is built upon two concepts that are key to all interactive communication systems:

  • Signaling Plane. The module in charge of managing communications; that is, it provides functions for media negotiation, QoS parametrization, call establishment, user registration, user presence, etc.
  • Media Plane. The module in charge of the media itself, so functionalities such as media transport, media encoding/decoding and media processing are part of it.

Media elements and media pipelines

The Stream Oriented GE is based on two concepts that act as building blocks for application developers:

  • Media Elements. A Media Element is a functional unit performing a specific action on a media stream. Every capability is represented to the application developer as a self-contained “black box” (the media element), so that the developer does not need to understand its low-level details in order to use it. Media elements are capable of receiving media from other elements (through media sources) and of sending media to other elements (through media sinks). Depending on their function, media elements can be split into different groups:
    • Input Endpoints: Media elements capable of receiving media and injecting it into a pipeline. There are several types of input endpoints. File input endpoints take the media from a file, Network input endpoints take the media from the network, and Capture input endpoints are capable of capturing the media stream directly from a camera or other kind of hardware resource.
    • Filters: Media elements in charge of transforming or analyzing media. Hence there are filters for performing operations such as mixing, muxing, analyzing, augmenting, etc.
    • Hubs: Media objects in charge of managing multiple media flows in a pipeline. A Hub has several hub ports to which other media elements are connected. Depending on the Hub type, there are different ways to control the media. For example, there is a Hub called Composite that merges all input video streams into a single output video stream laying out all inputs in a grid.
    • Output Endpoints: Media elements capable of taking a media stream out of the pipeline. Again, there are several types of output endpoints specialized in files, network, screen, etc.
Figure 1. A media element is a functional unit providing a specific media capability, which is exposed to application developers as a "black box".
  • Media Pipeline: A Media Pipeline is a chain of media elements, where the output stream generated by one element (source) is fed into the input streams of one or more other elements (sinks). Hence, the pipeline represents a “machine” capable of performing a sequence of operations over a stream (see the sketch after Figure 2).
Figure 2. Example of a Media Pipeline implementing an interactive multimedia application receiving media from a WebRtcEndpoint, overlaying an image on the detected faces and sending back the resulting stream.
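To make the pipeline concept concrete, the following minimal Java sketch chains an input endpoint, a filter and an output endpoint. The class names are those of the Kurento Java client (the reference implementation of this GE); package names, URLs and file paths are illustrative assumptions and may differ between releases.

```java
import org.kurento.client.FaceOverlayFilter;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.PlayerEndpoint;
import org.kurento.client.RecorderEndpoint;

public class PipelineSketch {
    public static void main(String[] args) {
        // Connect to the Media Server (illustrative URL).
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");

        // The pipeline is the container where media elements are created and chained.
        MediaPipeline pipeline = kurento.createMediaPipeline();

        // Input endpoint: reads a clip from a media repository.
        PlayerEndpoint player =
                new PlayerEndpoint.Builder(pipeline, "file:///tmp/clip.webm").build();

        // Filter: processes the stream (here, overlays an image on detected faces).
        FaceOverlayFilter filter = new FaceOverlayFilter.Builder(pipeline).build();

        // Output endpoint: stores the processed stream.
        RecorderEndpoint recorder =
                new RecorderEndpoint.Builder(pipeline, "file:///tmp/out.webm").build();

        // Chaining: each element's source feeds the next element's sink.
        player.connect(filter);
        filter.connect(recorder);

        recorder.record();
        player.play();
    }
}
```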

Agnostic media adapter

Using the Stream Oriented GE APIs, developers are able to compose the available media elements into the desired pipeline. There is a challenge in this scheme, as different media elements might require input media formats different from the output produced by the preceding element in the chain. For example, if we want to connect a WebRTC (VP8-encoded) or an RTP (H.264/H.263-encoded) video stream to a face recognition media element implemented to read raw RGB format, transcoding is necessary.

Developers, especially during the initial phases of application development, might want to simplify development and abstract away this heterogeneity. For this, the Stream Oriented GE provides an automatic converter of media formats called the agnostic media adapter. Whenever a media element’s source is connected to another media element’s sink, the framework verifies whether media adaptation and transcoding are necessary and, if so, transparently incorporates the appropriate transformations, making possible the chaining of the two elements into the resulting pipeline.

Hence, this agnostic media adapter capability fully abstracts all the complexities of media codecs and formats. This may significantly accelerate the development process, especially when developers are not multimedia technology experts. However, there is a price to pay: transcoding may be a very CPU-expensive operation. An inappropriate design of pipelines that chain media elements in a way that unnecessarily alternates codecs (e.g. going from H.264, to raw, to H.264, to raw again) will lead to very poor application performance.

Figure 3. The agnostic media capability adapts formats between heterogeneous media elements, making all complexities of media representation and encoding transparent for application developers.
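As a short illustration of this behavior (same hedged assumptions as in the previous sketch regarding the Kurento Java client class names), the code below connects a network input endpoint that typically carries H.264/H.263 to a computer-vision filter that works on raw frames; no explicit transcoding step appears, because the agnostic media adapter inserts any required conversions behind connect().

```java
import org.kurento.client.FaceOverlayFilter;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.RtpEndpoint;

public class AgnosticAdapterSketch {
    public static void main(String[] args) {
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
        MediaPipeline pipeline = kurento.createMediaPipeline();

        // Network input endpoint receiving an RTP stream (e.g. H.264 encoded).
        RtpEndpoint rtp = new RtpEndpoint.Builder(pipeline).build();

        // Computer-vision filter that internally operates on raw video frames.
        FaceOverlayFilter faceFilter = new FaceOverlayFilter.Builder(pipeline).build();

        // No decode/encode calls: if formats do not match, the agnostic media
        // adapter transparently transcodes between the two elements.
        rtp.connect(faceFilter);
        faceFilter.connect(rtp);
    }
}
```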

Stream Oriented GE Architecture

High level architecture

The conceptual representation of the GE architecture is shown in the following figure.

Figure 4. The Stream Oriented GE architecture follows the traditional separation between signaling and media planes.

The right side of the picture shows the application, which is in charge of the signaling plane and contains the business logic and connectors of the particular multimedia application being deployed. It can be built with any programming technology such as Java, Node.js, PHP, Ruby, .NET, etc. The application can use mature technologies such as HTTP and SIP Servlets, Web Services, database connectors, messaging services, etc. Thanks to this, this plane provides access to the multimedia signaling protocols commonly used by end-clients, such as SIP, RESTful and raw HTTP-based formats, SOAP, RMI, CORBA or JMS. These signaling protocols are used by the client side of applications to command the creation of media sessions and to negotiate their desired characteristics on their behalf. Hence, this is the part of the architecture which is in contact with application developers and, for this reason, it needs to be designed pursuing simplicity and flexibility.

On the left side, we have the Media Server, which implements the media plane capabilities providing access to the low-level media features: media transport, media encoding/decoding, media transcoding, media mixing, media processing, etc. The Media Server must be capable of managing the multimedia streams with minimal latency and maximum throughput. Hence the Media Server must be optimized for efficiency.

APIs and interfaces exposed by the architecture

The capabilities of the media plane (Media Server) and signaling plane (Application) are exposed through a number of APIs, which provide increasing abstraction levels. Following this, the role of the different APIs can be summarized in the following way:

  • Stream Oriented GE Open API: A network protocol exposing the Media Server capabilities through WebSocket (read more in the Stream Oriented Open API page).
  • Java Client: A Java SE layer which consumes the Stream Oriented GE Open API and exposes its capabilities through a simple-to-use object model based on Java POJOs representing media elements and media pipelines (see the sketch after this list). This API is abstract in the sense that all the inherent complexities of the internal Open API workings are hidden, so developers do not need to deal with them when creating applications. Using the Java Client only requires adding the appropriate dependency to a Maven project or downloading the corresponding jar into the application developer's CLASSPATH. It is important to remark that the Java Client is a media-plane control API. In other words, its objective is to expose the capability of managing media objects, but it does not provide any signaling plane capabilities.
  • JavaScript Client: A JavaScript layer which consumes the Stream Oriented GE Open API and exposes its capabilities to JavaScript developers. It allows building Node.js and browser-based applications.
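A minimal sketch of the Java Client style follows. The class names (KurentoClient, MediaPipeline, WebRtcEndpoint) are those of the Kurento Java client, the reference implementation of this GE; exact package names, Maven coordinates and the Media Server URL are assumptions that may differ between releases. Note that the code only controls the media plane: signaling with end users must be implemented separately by the application.

```java
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.WebRtcEndpoint;

public class JavaClientSketch {
    public static void main(String[] args) {
        // WebSocket URL of the Media Server Open API (illustrative value).
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");

        // Media pipelines and media elements are manipulated as plain Java objects.
        MediaPipeline pipeline = kurento.createMediaPipeline();
        WebRtcEndpoint webRtc = new WebRtcEndpoint.Builder(pipeline).build();

        // Loopback: whatever the client sends is returned back to it.
        webRtc.connect(webRtc);

        // Releasing the pipeline releases all the media elements it contains.
        pipeline.release();
    }
}
```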

From an architectural perspective, application developers can use the clients or the Open API directly for creating their multimedia-enabled applications. This opens a wide spectrum of potential usage scenarios: web applications (written using the JavaScript Client), desktop applications (written using the Java Client), distributed applications (written using the Open API), etc.
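For developers using the Open API directly, the exchange is plain JSON-RPC 2.0 over a WebSocket. The sketch below only illustrates the style of such an interaction: the endpoint URL is an assumption, and the method and parameter names ("create", "MediaPipeline") mirror the Kurento protocol but should be checked against the Stream Oriented Open API specification.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class OpenApiSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative WebSocket URL of the Media Server Open API.
        URI uri = URI.create("ws://localhost:8888/kurento");

        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                System.out.println("Response: " + data);
                return WebSocket.Listener.super.onText(ws, data, last);
            }
        };

        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(uri, listener)
                .join();

        // Illustrative JSON-RPC 2.0 request asking the Media Server to create a pipeline.
        // Method and parameter names are assumptions; see the Open API specification.
        String request = "{\"jsonrpc\":\"2.0\",\"id\":1,"
                + "\"method\":\"create\",\"params\":{\"type\":\"MediaPipeline\"}}";
        ws.sendText(request, true).join();

        Thread.sleep(2000); // wait briefly for the asynchronous response
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}
```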

Creating applications on top of the Stream Oriented GE Architecture

The Stream Oriented GE Architecture has been specifically designed following the architectural principles of the WWW. For this reason, creating a multimedia application based on it is a similar experience to creating a web application using any of the popular web development frameworks.

At the highest abstraction level, web applications have an architecture comprised of three different layers:

  • Presentation layer: Here we can find all the application code which is in charge of interacting with end users, so that information is presented in a comprehensible way and user input is captured. This usually consists of HTML pages.
  • Application logic: This layer is in charge of implementing the specific functions executed by the application.
  • Service layer: This layer provides capabilities used by the application logic such as databases, communications, security, etc.

Following this parallelism, multimedia applications created using the Stream Oriented GE also respond to the same architecture:

  • Presentation layer: Is in charge of multimedia representation and multimedia capture. It is usually based on specific built-in capabilities of the client. For example, when creating a browser-based application, the presentation layer will use capabilities such as the <video> tag or the WebRTC PeerConnection and MediaStream APIs.
  • Application logic: This layer provides the specific multimedia logic. In other words, this layer is in charge of building the appropriate pipeline (by chaining the desired media elements) that the multimedia flows involved in the application will need to traverse.
  • Service layer: This layer provides the multimedia services that support the application logic such as media recording, media ciphering, etc. The Media Server (i.e. the specific media elements) is the part of the Stream Oriented GE architecture in charge of this layer.
Figure 5. Applications created using the Stream Oriented GE (right) have an architecture equivalent to that of standard WWW applications (left). Both types of applications may choose to place the application logic in the client or in the server code.

This means that developers can choose to include the code creating the specific media pipeline required by their applications at the client side (using a suitable client or directly with the Open API) or can place it at the server side.

Both options are valid, but each of them leads to a different development style. Having said this, it is important to note that in the WWW developers usually tend to keep client-side code as simple as possible, bringing most of their application logic to the server. Reproducing this kind of development experience is the most common way of using this GE: that is, locating the multimedia application logic at the server side, so that the specific media pipelines are created using the client for your favorite language.

Main Interactions

Interactions from a generic perspective

A typical Stream Oriented GE application involves interactions among three main modules:

  • Client Application: involves the native multimedia capabilities of the client platform plus the specific client-side application logic. It can use Clients designed for client platforms (for example, the JavaScript Client).
  • Application Server: involves an application server and the server-side application logic. It can use Clients designed for server platforms (for example, the Java Client for Java EE and the JavaScript Client for Node.js).
  • Media Server: receives commands for creating specific multimedia capabilities (i.e. specific pipelines adapted to the needs of specific applications).

The interactions maintained among these modules depend on the specificities of each application. However, in general, for most applications they can be reduced to the following conceptual scheme:

Figure 6. Main interactions occur in two phases: negotiation and media exchange. Note that the colors of the arrows and boxes are aligned with the architectural figures presented above: orange arrows show exchanges belonging to the Open API, blue arrows show exchanges belonging to the Thrift API, red boxes are associated with the Media Server and green boxes with the Application Server.

Media negotiation phase

As can be observed, at a first stage a client (a browser in a computer, a mobile application, etc.) issues a message requesting some kind of capability from the Stream Oriented GE. This message is based on a JSON-RPC v2.0 representation and fulfills the Open API specification. It can be generated directly by the client application or, in the case of web applications, indirectly by consuming the abstract HTML5 SDK. For instance, that request could ask for the visualization of a given video clip.

When the Application Server receives the request, if appropriate, it will carry out the specific server side application logic, which can include Authentication, Authorization and Accounting (AAA), CDR generation, consuming some type of web service, etc.

After that, the Application Server processes the request and, according to the specific instructions programmed by the developer, commands the Media Server to instantiate the suitable media elements and to chain them in an appropriate media pipeline. Once the pipeline has been created successfully, the Media Server responds accordingly and the Application Server forwards the successful response to the client, indicating how and where the media service can be reached.

During the above mentioned steps no media data is really exchanged. All the interactions have the objective of negotiating the whats, hows, wheres and whens of the media exchange. For this reason, we call it the negotiation phase. Clearly, during this phase only signaling protocols are involved.
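The steps of the negotiation phase can be summarized, from the Application Server side, in a handler along the following lines. This is a schematic sketch only: handleRequest is a hypothetical method of the application's own signaling layer, and the media-object class names (including HttpGetEndpoint and its getUrl() method) follow the GE's documented element names through the Kurento Java client and may differ between releases.

```java
import org.kurento.client.HttpGetEndpoint;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.PlayerEndpoint;

public class NegotiationSketch {

    private final KurentoClient kurento =
            KurentoClient.create("ws://localhost:8888/kurento");

    // Hypothetical handler invoked when the client's JSON-RPC request arrives.
    public String handleRequest(String clipUri) {
        // 1. Server-side application logic runs here (AAA, CDR generation, etc.).

        // 2. Command the Media Server to build the pipeline for this session.
        MediaPipeline pipeline = kurento.createMediaPipeline();
        PlayerEndpoint player = new PlayerEndpoint.Builder(pipeline, clipUri).build();
        HttpGetEndpoint http = new HttpGetEndpoint.Builder(pipeline).build();
        player.connect(http);
        player.play();

        // 3. Answer the client with the place where the media can be reached;
        //    the media exchange phase then happens directly with the Media Server.
        return http.getUrl();
    }
}
```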

Media exchange phase

After that, a new phase starts devoted to producing the actual media exchange. The client addresses a request for the media to the Media Server using the information gathered during the negotiation phase. Following the video-clip visualization example mentioned above, the browser will send a GET request to the IP address and port of the Media Server where the clip can be obtained and, as a result, an HTTP response containing the media will be received.

Following the discussion with that simple example, one may wonder why such a complex scheme is needed just for playing a video, when in the most usual scenarios clients simply send a request to the appropriate URL of the video without requiring any negotiation. The answer is straightforward: the Stream Oriented GE is designed for media applications involving complex media processing. For this reason, we need to establish a two-phase mechanism enabling a negotiation before the media exchange. The price to pay is that simple applications, such as one just downloading a video, also need to go through these phases. However, the advantage is that when creating more advanced services the same simple philosophy will hold. For example, if we want to add augmented reality or computer vision features to that video clip, we just need to create the appropriate pipeline holding the desired media element during the negotiation phase. After that, from the client perspective, the processed clip will be received as any other video.

Specific interactions for commonly used services

Regardless of the actual type of session, all interactions follow the pattern described in the section above. However, most common services fall into one of the following three main categories:

HTTP player

This type of session emerges when clients use the Stream Oriented GE to receive media through an HTTP response. The client sends a JSON request identifying the desired content and, as a result, it receives a URL where the content can be found. This URL is associated with a pipeline where the media processing logic is executed. The Application Server is in charge of commanding the creation of that media pipeline following the instructions provided by the application developer. The Application Server can interrogate that pipeline to obtain the URL it is exposing to the world. This URL travels at the end of the negotiation to the client, which can recover the stream by connecting to it. The following image shows the interactions taking place in this kind of session.

Figure 7. Main interactions in a Stream Oriented GE session devoted to playing an HTTP media stream.

Clearly, the specific media stream that the client receives depends on the pipeline serving it. For HTTP playing, the usual pipeline may follow the scheme depicted in the figure below, where a video clip is recovered from a media repository (e.g. the file system) and fed into a filter performing specific processing on it (e.g. augmenting the media, recognizing objects or faces through computer vision, adding subtitles, modifying the color palette, etc.). At the end of the pipeline, an element called HttpGetEndpoint adapts the media and sends it as an HTTP response upon client requests. This basic pipeline can be modified by the developer by adding additional elements at will, which is done when creating the server-side application logic.

Figure 8. Example of a pipeline for an HTTP player session.
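Under the same hedged assumptions as in the previous sketches (Kurento Java client class names, illustrative URIs), the pipeline of Figure 8 could be built as follows; further elements can be chained in the same way by the server-side application logic.

```java
import org.kurento.client.FaceOverlayFilter;
import org.kurento.client.HttpGetEndpoint;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.PlayerEndpoint;

public class HttpPlayerSketch {
    public static void main(String[] args) {
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
        MediaPipeline pipeline = kurento.createMediaPipeline();

        // Media repository -> filter -> HttpGetEndpoint, as depicted in Figure 8.
        PlayerEndpoint player =
                new PlayerEndpoint.Builder(pipeline, "file:///videos/clip.webm").build();
        FaceOverlayFilter filter = new FaceOverlayFilter.Builder(pipeline).build();
        HttpGetEndpoint http = new HttpGetEndpoint.Builder(pipeline).build();

        player.connect(filter);
        filter.connect(http);
        player.play();

        // URL handed back to the client at the end of the negotiation phase.
        System.out.println("GET the processed stream at: " + http.getUrl());
    }
}
```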

HTTP recorder

HTTP recording sessions are equivalent to playing sessions although, in this case, the media goes from the client to the server using the HTTP POST method. The negotiation phase hence starts with the client requesting to upload the content and the Application Server creating the appropriate pipeline for doing it. This pipeline will always start with an HttpPostEndpoint element. Further elements can be connected to that endpoint for filtering the media, processing it or storing it into a media repository. The specific interactions taking place in this type of session are shown in the figure below.

Figure 9. Example of a pipeline for an HTTP recorder.
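A sketch of the corresponding recorder pipeline is shown below (again, class names such as HttpPostEndpoint and its getUrl() method follow the element names mentioned above via the Kurento Java client, and the storage URI is illustrative).

```java
import org.kurento.client.HttpPostEndpoint;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.RecorderEndpoint;

public class HttpRecorderSketch {
    public static void main(String[] args) {
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
        MediaPipeline pipeline = kurento.createMediaPipeline();

        // The client uploads media with an HTTP POST to the URL of this endpoint.
        HttpPostEndpoint post = new HttpPostEndpoint.Builder(pipeline).build();

        // The uploaded stream is stored in a media repository (here, a file).
        RecorderEndpoint recorder =
                new RecorderEndpoint.Builder(pipeline, "file:///recordings/upload.webm").build();

        post.connect(recorder);
        recorder.record();

        // URL handed back to the client during the negotiation phase.
        System.out.println("POST the media to: " + post.getUrl());
    }
}
```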

RTP/WebRTC

The Stream Oriented GE allows the establishment of real-time multimedia sessions between a peer client and the Media Server, either directly through the use of RTP/RTCP or through WebRTC. In addition, the Media Server can act as a media proxy, making possible the communication among different peer clients mediated by the Stream Oriented GE infrastructure. Hence, the GE can act as a conference bridge (Multipoint Control Unit), as a machine-to-machine communication system, as a video call recording system, etc. As shown in the picture, the client exposes its media capabilities through an SDP (Session Description Protocol) payload encapsulated in a JSON object request. Hence, the Application Server is able to instantiate the appropriate media element (either an RTP or a WebRTC endpoint), and to require it to negotiate and offer a response SDP based on its own capabilities and on the offered SDP. When the answer SDP is obtained, it is given back to the client and the media exchange can be started. The interactions among the different modules are summarized in the following picture.

Figure 10. Interactions taking place in a Real Time Communications (RTC) session. During the negotiation phase, an SDP message is exchanged offering the capabilities of the client. As a result, the Media Server generates an SDP answer that can be used by the client for establishing the media exchange.

As with the rest of the examples shown above, the application developer is able to create the desired pipeline during the negotiation phase, so that the real-time multimedia stream is processed according to the application needs. Just as an example, imagine that we want to create a WebRTC application recording the media received from the client and augmenting it so that, if a human face is found, a hat will be rendered on top of it. This pipeline is schematically shown in the figure below, where we assume that the Filter element is capable of detecting the face and adding the hat to it.

Figure 11. During the negotiation phase, the application developer can create a pipeline providing the desired specific functionality. For example, this pipeline uses a WebRtcEndpoint for communicating with the client, which is connected to a RecorderEndpoint storing the received media stream and to an augmented reality filter, which feeds its output media stream back to the client. As a result, the end user will receive her own image filtered (e.g. with a hat added onto her head) and the stream will be recorded and made available for later retrieval from a repository (e.g. a file).
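A sketch of the server-side logic building the pipeline of Figure 11 follows. As before, the class names are those of the Kurento Java client, startSession is a hypothetical handler of the application's signaling layer, and the overlay image URL, offsets and recording URI are illustrative values.

```java
import org.kurento.client.FaceOverlayFilter;
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.RecorderEndpoint;
import org.kurento.client.WebRtcEndpoint;

public class WebRtcSessionSketch {

    private final KurentoClient kurento =
            KurentoClient.create("ws://localhost:8888/kurento");

    // Hypothetical handler receiving the SDP offer extracted from the client's request.
    public String startSession(String sdpOffer) {
        MediaPipeline pipeline = kurento.createMediaPipeline();

        // Endpoint communicating with the client over WebRTC.
        WebRtcEndpoint webRtc = new WebRtcEndpoint.Builder(pipeline).build();

        // Augmented reality filter rendering a hat on detected faces.
        FaceOverlayFilter hat = new FaceOverlayFilter.Builder(pipeline).build();
        hat.setOverlayedImage("http://example.com/hat.png", -0.35F, -1.2F, 1.6F, 1.6F);

        // Recorder storing the received stream for later retrieval.
        RecorderEndpoint recorder =
                new RecorderEndpoint.Builder(pipeline, "file:///recordings/session.webm").build();

        // Topology of Figure 11: client -> filter -> back to client, plus recording.
        webRtc.connect(hat);
        hat.connect(webRtc);
        webRtc.connect(recorder);
        recorder.record();

        // SDP answer returned to the client through the signaling plane.
        return webRtc.processOffer(sdpOffer);
    }
}
```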

Basic Design Principles

The Stream Oriented GE is designed based on the following main principles:

  • Separate Media and Signaling Planes. Signaling and media are two separate planes and therefore the Stream Oriented GE is designed in a way that applications can handle separately those facets of multimedia processing.
  • Distribution of Media and Application Services. The Media Server and applications can be collocated, scaled or distributed among different machines. A single application can invoke the services of more than one Media Server. The opposite also applies; that is, a Media Server can attend the requests of more than one application.
  • Suitable for the Cloud. The Stream Oriented GE is suitable to be integrated into cloud environments to act as a PaaS (Platform as a Service) component.
  • Media Pipelines. Chaining Media Elements via Media Pipelines is an intuitive approach to challenge the complexity of multimedia processing.
  • Application development. Developers do not need to be aware of internal Media Server complexities; applications can be deployed using any technology or framework the developer likes, from client to server, from browsers to cloud services.
  • End-to-end Communication Capability. The Stream Oriented GE provides end-to-end communication capabilities so developers do not need to deal with the complexity of transporting, encoding/decoding and rendering media on client devices.
  • Fully Processable Media Streams. The Stream Oriented GE enables not only interactive interpersonal communications (e.g. Skype-like with conversational call push/reception capabilities), but also human-to-machine (e.g. Video on Demand through real-time streaming) and machine-to-machine (e.g. remote video recording, multisensory data exchange) communications.
  • Modular Processing of Media. Modularization achieved through media elements and pipelines allows defining the media processing functionality of an application through a “graph-oriented” language, where the application developer is able to create the desired logic by chaining the appropriate functionalities.
  • Auditable Processing. The Stream Oriented GE is able to generate rich and detailed information for QoS monitoring, billing and auditing.
  • Seamless IMS integration. The Stream Oriented GE is designed to support seamless integration into the IMS infrastructure of Telephony Carriers.
  • Transparent Media Adaptation Layer. The Stream Oriented GE provides a transparent media adaptation layer to make possible the convergence among different devices having different requirements in terms of screen size, power consumption, transmission rate, etc.


Detailed Specifications

Following is a list of Open Specifications linked to this Generic Enabler. Specifications labeled as "PRELIMINARY" are considered stable but subject to minor changes derived from lessons learned during the last iterations of the development of a first reference implementation planned for the current Major Release of FI-WARE. Specifications labeled as "DRAFT" are planned for future Major Releases of FI-WARE but are provided for the sake of future users.

Open API Specifications


Re-utilised Technologies/Specifications

The Stream Oriented GE relies on the following specifications:

  • HTTP
  • TLS
  • JSON-RPC V2.0
  • WebSockets
  • RTP
  • SRTP
  • DTLS
  • H.264
  • MPEG4
  • VP8
  • Opus
  • AMR
  • WebRTC
  • HTML5

On the other hand, Kurento GEi reuses the following technologies:

  • JBoss
  • Spring
  • GStreamer
  • OpenCV
  • Thrift
  • maven
  • Selenium

Terms and definitions

This section comprises a summary of terms and definitions introduced during the previous sections. It intends to establish a vocabulary that will help to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at the overall FI-WARE level, please refer to FIWARE Global Terms and Definitions.

  • Data refers to information that is produced, generated, collected or observed and that may be relevant for processing, carrying out further analysis and knowledge extraction. Data in FIWARE has an associated data type and a value. FIWARE will support a set of built-in basic data types similar to those existing in most programming languages. Values linked to basic data types supported in FIWARE are referred to as basic data values. As an example, basic data values like ‘2’, ‘7’ or ‘365’ belong to the integer basic data type.
  • A data element refers to data whose value is defined as consisting of a sequence of one or more <name, type, value> triplets, referred to as data element attributes, where the type and value of each attribute are either mapped to a basic data type and a basic data value or mapped to the data type and value of another data element.
  • Context in FIWARE is represented through context elements. A context element extends the concept of data element by associating an EntityId and EntityType to it, uniquely identifying the entity (which in turn may map to a group of entities) in the FIWARE system to which the context element information refers. In addition, there may be some attributes, as well as meta-data associated with attributes, that we may define as mandatory for context elements as compared to data elements. Context elements are typically created containing the value of attributes characterizing a given entity at a given moment. As an example, a context element may contain values of some of the attributes “last measured temperature”, “square meters” and “wall color” associated with a room in a building. Note that there might be many different context elements referring to the same entity in a system, each containing the value of a different set of attributes. This allows different applications to handle different context elements for the same entity, each containing only those attributes of that entity relevant to the corresponding application. It also allows representing updates on sets of attributes linked to a given entity: each of these updates can actually take the form of a context element and contain only the value of those attributes that have changed (see the sketch after this list).
  • An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened, in that domain. Events typically lead to the creation of some data or context element describing or representing the event, thus allowing it to be processed. As an example, a sensor device may be measuring the temperature and pressure of a given boiler, sending every five minutes a context element associated with that entity (the boiler) that includes the value of these two attributes (temperature and pressure). The creation and sending of the context element is an event, i.e., what has occurred. Since the data/context elements that are generated linked to an event are the way events become visible in a computing system, it is common to refer to these data/context elements simply as "events".
  • A data event refers to an event leading to creation of a data element.
  • A context event refers to an event leading to creation of a context element.
  • An event object is used to mean a programming entity that represents an event in a computing system [EPIA], as used by event-aware GEs. Event objects allow performing operations on events, also known as event processing. Event objects are defined as a data element (or a context element) representing an event, to which a number of standard event object properties (similar to a header) are associated internally. These standard event object properties support certain event processing functions.
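The relationship between attributes, data elements and context elements described above can be sketched with the plain Java types below; these structures are purely illustrative and are not part of any FIWARE API.

```java
import java.util.List;

public class ContextModelSketch {

    // A data element attribute is a <name, type, value> triplet.
    record Attribute(String name, String type, Object value) {}

    // A data element is a sequence of one or more attributes.
    record DataElement(List<Attribute> attributes) {}

    // A context element extends a data element with an EntityId and an EntityType
    // identifying the entity that its attributes refer to.
    record ContextElement(String entityId, String entityType, List<Attribute> attributes) {}

    public static void main(String[] args) {
        // Example from the text: a room characterized at a given moment.
        ContextElement room = new ContextElement(
                "Room1", "Room",
                List.of(
                        new Attribute("last measured temperature", "float", 21.5f),
                        new Attribute("square meters", "integer", 30),
                        new Attribute("wall color", "string", "white")));
        System.out.println(room);
    }
}
```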