
FIWARE.OpenSpecification.Cloud.Monitoring

Name: FIWARE.OpenSpecification.Cloud.Monitoring
Chapter: Cloud
Catalogue-Link to Implementation: [Monitoring GE - TID Implementation TBD]
Owner: Telefónica I+D, Fernando López Aguilar, Pablo Rodríguez Archilla


Preface

Within this document you find a self-contained open specification of a FIWARE generic enabler. Please consult as well the FIWARE Product Vision, the website at http://www.fiware.org and similar pages in order to understand the complete context of the FIWARE platform.

Copyright

Copyright © 2012-2015 by Telefónica I+D. All Rights Reserved.

Legal Notice

Please check the following FIWARE Open Specification Legal Notice (implicit patents license) to understand the rights to use these specifications.

Overview

Every distributed system needs to incorporate monitoring mechanisms in order to be able to constantly check its performance. Monitoring involves gathering operational data in a running system. There may be many consumers of this information, each using it for different purposes. One of them is SLA management, where the system needs to constantly check that performance adheres to the terms signed. Monitoring data can also be used in a variety of other ways, for example for optimization of virtual machines, products and applications, alarm detection, recommendations, etc.

This specification describes the Monitoring GE, which is the key enabler to provide monitoring information to the rest of GEs. Its architecture can be seen in the following figure:

Figure: Monitoring GE architecture overview


The Monitoring GE works once the resource has been deployed. The architecture of the Monitoring GE requires having monitoring probes distributed in the VMs and hosts; this is valid for both IaaS and PaaS. This information is pushed to an adaptation layer either directly by the probes or through a custom monitoring collector responsible for doing so on behalf of the probes.

The adaptation layer (NGSI Adapter) expresses raw monitoring data as updates to entities' context, where the monitored resources are such entities, and sends the corresponding update requests to the Context Broker GE. This GE holds the latest available context of the monitored entities, but lacks historical information. Through a connector component subscribed to Context Broker, every context update is written to storage (Hadoop distributed filesystem), thus building that historical record.
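As an illustration of this translation, the adapter could issue an NGSIv1-style updateContext request such as the following minimal Python sketch; the Context Broker URL, entity id and attribute names are illustrative and not fixed by this specification:

  # Minimal sketch: push one set of measures for a VM entity to Context Broker
  # (NGSIv1 updateContext). URL, entity id and attribute names are illustrative.
  import requests

  payload = {
      "contextElements": [{
          "type": "vm",
          "isPattern": "false",
          "id": "qa:WindowsServer2012",
          "attributes": [
              {"name": "cpuLoadPct", "type": "float", "value": "0.13"},
              {"name": "usedMemPct", "type": "float", "value": "21.5"}
          ]
      }],
      "updateAction": "APPEND"
  }
  resp = requests.post("http://contextbroker:1026/v1/updateContext",
                       json=payload, headers={"Accept": "application/json"})
  resp.raise_for_status()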

This information is offered to the cloud management enablers following both pull and push models. On the one hand, a query manager server implements a query API that allows retrieving the last records of a measurement at different levels (vApp, VM, and even deployed software); besides, by using map-reduce mechanisms, certain measurements can be aggregated into higher-level data that enable more precise management of the resources (for instance, aggregating all the measurements from all the VMs hosting an application/service into a more meaningful KPI at the service level). On the other hand, it is still possible to subscribe to Context Broker notifications about context updates, thus receiving new monitoring data as it is published.
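For instance, the service-level aggregation mentioned above can be expressed as a simple map-reduce job over per-VM samples; the record layout (service, vm, cpu load) and the averaging KPI below are assumptions made only for illustration:

  # Map-reduce style aggregation sketch: per-VM CPU samples -> per-service average.
  from collections import defaultdict

  samples = [                        # in a real job these would be read from HDFS
      ("webshop", "vm-01", 12.0),
      ("webshop", "vm-02", 48.0),
      ("billing", "vm-07", 7.5),
  ]

  def mapper(record):
      service, _vm, cpu = record
      yield service, cpu             # key each sample by the service it belongs to

  def reducer(service, values):
      return service, sum(values) / len(values)

  grouped = defaultdict(list)
  for record in samples:
      for key, value in mapper(record):
          grouped[key].append(value)

  for service, values in grouped.items():
      print(reducer(service, values))    # ('webshop', 30.0), ('billing', 7.5)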

Target Usage

The monitoring system is used by different Cloud GEs in order to track the status of the resources. They use the gathered data to make decisions about elasticity or for SLA management. Whenever a new resource is deployed in the cloud, the proper monitoring probe is set up and configured to start providing monitoring data. The GE offers a query interface that allows other GEs to poll for information relevant to any KPI associated with resources.

Basic Concepts

Following the FMC diagram of the Monitoring GE shown above, this section introduces the main concepts related to this GE through the definition of its interfaces and components, and finally an example of their use.

The key concepts visible to the cloud user can be divided into interfaces and components, each of which is described below.

Entities

The following entities are considered (a minimal data-model sketch follows the list):

  • Measure. The value of a metric for a resource at a given moment. Metrics are collected by monitoring probes, i.e., software installed inside the VMs.
  • Source. A probe that generates data for different metrics of a given resource.
  • Resource. A cloud resource for which a cloud metric can be generated. It can apply to IaaS level resources and their aggregations (Organization, Project, vApp, VMs) and PaaS level resources (PIs and ACs).
  • Context. In the NGSI vocabulary from Context Broker, a set of attributes (measures) from an entity (resource) in a given moment.
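A minimal data-model sketch for these entities, with illustrative names and fields only, could look as follows:

  # Illustrative data model for the entities defined above.
  from dataclasses import dataclass
  from datetime import datetime
  from typing import List

  @dataclass
  class Resource:
      resource_id: str        # NGSI entityId
      resource_type: str      # NGSI entityType, e.g. "vm" or "host"

  @dataclass
  class Measure:
      metric: str             # e.g. "cpuLoadPct"
      value: float
      timestamp: datetime

  @dataclass
  class Source:
      probe: str              # e.g. "nagios"
      resource: Resource      # the resource the probe reports on

  @dataclass
  class Context:
      resource: Resource      # the NGSI entity
      measures: List[Measure] # its attributes at a given moment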

Interfaces

Two different models are supported for accessing monitoring data:

  • Pull model. The client fetches monitoring data about resources using the Monitoring API offered by the Query Manager.
  • Push model. The client subscribes to Context Broker GE (registering a callback URL) and receives notifications whenever new monitoring data is available.

Components

The Monitoring GE comprises a set of distributed components:

  • Monitoring Probes, part of the software tool used to gather metrics. They are installed in the resource to be measured (virtual machine, physical node, etc.) and configured to provide the monitoring information. Such information has to be pushed to the Adapter. Monitoring GE should be agnostic to the concrete monitoring software chosen. Alternatives are Nagios, Zabbix, openNMS, perfSONAR, collectd, mBeanCmd, Ganglia, etc.
  • Monitoring Collector: in case probes are not able to push data directly to the Adapter, a custom component has to be deployed as part of the monitoring tool to gather probe data and issue HTTP requests to the Adapter on their behalf (a sketch of such a collector follows this list).
  • NGSI Adapter: responsible for translating probe raw data into a common format (NGSI) and issuing update requests to Context Broker.
  • Context Broker GE: publish/subscribe broker managing context updates.
  • Hadoop: the storage system keeping the history of metrics for several years. Metrics are organized according to the data model, so the storage also includes aggregated information.
  • BigData Connector: subscriber of Context Broker responsible for writing context updates into storage.
  • Query Manager: implementation of the Monitoring API (pull model).
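Where a custom collector is needed, it could be as simple as the following sketch, which forwards raw probe output to the NGSI Adapter; the adapter URL layout (probe name in the path, id and type as query parameters) and the sample Nagios-style output are assumptions for illustration:

  # Sketch of a custom monitoring collector pushing raw probe output to the adapter.
  import requests

  ADAPTER = "http://adapter:1337"    # illustrative NGSI Adapter endpoint

  def forward(probe_name, entity_id, entity_type, raw_output):
      resp = requests.post(f"{ADAPTER}/{probe_name}",
                           params={"id": entity_id, "type": entity_type},
                           data=raw_output,
                           headers={"Content-Type": "text/plain"})
      resp.raise_for_status()

  # Example: raw output of a Nagios-style check_load probe
  raw = "OK - load average: 0.36, 0.25, 0.24|load1=0.360;1;1;0 load5=0.250;1;1;0"
  forward("check_load", "qa:WindowsServer2012", "vm", raw)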

Example Scenario

When a VM or server has been deployed, the probes installed on it start sending monitoring data according to a defined schedule. Either directly or through a monitoring collector, the data reach the adapter, which in turn publishes them to Context Broker. Hadoop stores such data by means of a connector already subscribed to Context Broker for updates.

Figure: Monitoring GE example scenario


At this point, clients may use both querying modes:

  1. The pull mode, by means of using the API implemented by Query Manager.
  2. The push mode, by means of subscribing a client to Context Broker for updates.

Main Interactions

This section describes the main interactions in both modes.


Pull model

This set of operations involves querying information about a measurable resource once the resource has been deployed. A measurable resource can be anything being measured: a virtual machine, a physical node, or even software installed inside the virtual machine. Monitoring probes have to be installed at the monitored resource in order to generate a set of metrics. Using the operations of the query API, it is possible to obtain this information.

Figure: Monitoring GE pull model


Query metrics for all resources of a given type

Client obtains the list of metrics for all available monitored resources of a given type

  • INPUT: resource type (corresponding to NGSI entityType)
  • OUTPUT: list of resources and their metrics
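Assuming a hypothetical REST mapping of this operation onto the Query Manager (the base URL and path below are not fixed by this specification), the call could look as follows:

  # Hypothetical Query Manager call: metrics of all monitored resources of a type.
  import requests

  QUERY_MANAGER = "http://querymanager:8080"   # illustrative base URL

  def query_resources(resource_type, **params):
      resp = requests.get(f"{QUERY_MANAGER}/monitoring/{resource_type}", params=params)
      resp.raise_for_status()
      return resp.json()        # expected: list of resources and their metrics

  vms = query_resources("vm")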

Query metrics for at most n resources of a given type

Client obtains the list of metrics for at most n monitored resources of a given type

  • INPUT: resource type (corresponding to NGSI entityType); the maximum number n of results
  • OUTPUT: list of resources and their metrics
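Using the same hypothetical helper as above, the result set could be bounded with a limit parameter (again an assumption, not part of this specification):

  # At most 10 monitored VMs, via a hypothetical 'limit' query parameter
  top_vms = query_resources("vm", limit=10)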

Query metrics for a specific resource

Client obtains the list of metrics for a specific resource

  • INPUT: resource type (corresponding to NGSI entityType) and resource identifier (NGSI entityId)
  • OUTPUT: resource and its metrics
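Building on the same hypothetical base URL, a single resource could be addressed by its NGSI entityId:

  # Hypothetical call for one resource, identified by its NGSI entityId
  import requests

  resp = requests.get(f"{QUERY_MANAGER}/monitoring/vm/qa:WindowsServer2012")
  resp.raise_for_status()
  vm_metrics = resp.json()      # the resource and its metrics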

Push model


Figure: Monitoring GE push model


Subscribe a client

Subscribes a client to Context Broker for updates, providing a callback URL where notifications will be received (see the Context Broker User and Programmers Guide for more details)

  • INPUT: subscribeContext request specifying: entities to subscribe to, attributes, notify conditions and callback URL
  • OUTPUT: subscribeContext response including subscriptionId
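A minimal sketch of such a subscription, assuming an NGSIv1-style subscribeContext operation on Context Broker (the entity pattern, attribute names and callback URL are illustrative):

  # Subscribe to CPU updates of every VM entity; notifications go to the callback URL.
  import requests

  subscription = {
      "entities": [{"type": "vm", "isPattern": "true", "id": ".*"}],
      "attributes": ["cpuLoadPct"],
      "reference": "http://client.example.com:1028/notify",   # callback URL
      "duration": "P1M",                                       # one month
      "notifyConditions": [{"type": "ONCHANGE", "condValues": ["cpuLoadPct"]}]
  }
  resp = requests.post("http://contextbroker:1026/v1/subscribeContext",
                       json=subscription, headers={"Accept": "application/json"})
  subscription_id = resp.json()["subscribeResponse"]["subscriptionId"]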

Process notification

Client processes input data on Context Broker notification

  • INPUT: notifyContext request including subscriptionId and context elements
  • OUTPUT: none
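For illustration, the callback endpoint could be sketched as follows, assuming NGSIv1-style notifyContext payloads (Flask is just one possible way to expose the callback URL):

  # Minimal callback processing Context Broker notifications (NGSIv1 notifyContext).
  from flask import Flask, request

  app = Flask(__name__)

  @app.route("/notify", methods=["POST"])
  def notify():
      body = request.get_json()
      print("subscription:", body["subscriptionId"])
      for response in body.get("contextResponses", []):
          element = response["contextElement"]
          for attribute in element.get("attributes", []):
              print(element["id"], attribute["name"], attribute["value"])
      return "", 200

  if __name__ == "__main__":
      app.run(port=1028)        # must match the callback URL used when subscribing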

Basic Design Principles

Design Principles

This section specifies a set of requirements for the Cloud monitoring framework:

  • Non-intrusiveness on resource functionality and performance: monitoring needs to be as non-intrusive as possible. This means that, although monitoring probes are installed within the monitored resources, they should affect neither the rest of the resource's functionality nor its performance.
  • Deal with metric heterogeneity: the monitoring system has to deal with different kinds of metrics (infrastructure, KPI, application and product metrics), different virtualization technologies, different products, applications, etc.
  • Scalability in monitored resources: the system needs to be able to scale to large numbers of monitored nodes and resources.
  • Scalability in number of measures: as the number of monitored resources increases, the amount of information to be stored also increases. Taking into account that information is collected frequently (say at a 5-second rate, which already means more than 17,000 samples per day for each metric of each resource) and kept as historical data, a conventional relational database may fail to cope.
  • Data aggregation: the monitoring system should be able to aggregate information at the application/service level, i.e., to aggregate metrics coming from VMs or hardware resources into service-level figures.
  • Federation capabilities: monitoring data from different regions (federated infrastructures) should be published and made available at a Federation Layer.

Resolution of Technical Issues

Several approaches can be adopted to address the design principles described in the previous section. Some of them are suggested here:

  • Probes for a non-intrusive solution: the approach consists of introducing probes installed in the same virtual machine or host where the resource to be measured is placed, without being part of the resource software itself.
  • Collector for solving heterogeneity in metrics: the approach would consist of including a monitoring collector component as part of the architecture, which is in charge of collecting the different metrics from the different probes.
  • Distributed storage for metrics scalability: a promising approach would consist of gaining scalability in the number of metrics handled by means of using Hadoop distributed filesystem as storage.
  • BigData GE for data analysis: the approach would consist of using map-reduce techniques to perform data aggregation.
  • Context Broker GE for data federation: the approach would consist of using the publish/subscribe capabilities of Context Broker GE.

Detailed Specifications

Following is a list of Open Specifications linked to this Generic Enabler. Specifications labeled as "PRELIMINARY" are considered stable but subject to minor changes derived from lessons learned during the last iterations of the development of a first reference implementation planned for the current Major Release of FIWARE. Specifications labeled as "DRAFT" are planned for future Major Releases of FIWARE but are provided for the sake of future users.


Open API Specifications

Neither in the push model nor in the pull model does the Monitoring GE have an API of its own; it relies on other components' APIs, such as those of the Context Broker or the so-called Query Manager. Integration with OpenStack Ceilometer is being considered: in that case, the API for the pull model would be that of Ceilometer (which would thus act as the Query Manager).


Re-utilised Technologies/Specifications

The Monitoring GE is based on RESTful design principles. The technologies and specifications used in this GE are:

  • RESTful web services
  • HTTP/1.1 (RFC2616)
  • JSON data serialization format.


Terms and definitions

This section comprises a summary of terms and definitions introduced during the previous sections. It intends to establish a vocabulary that will be helpful to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at overall FIWARE level, please refer to FIWARE Global Terms and Definitions

  • Infrastructure as a Service (IaaS) -- a model of delivering general-purpose virtual machines (VMs) and associated resources (CPU, memory, disk space, network connectivity) on-demand, typically via a self-service interface and following a pay-per-use pricing model. The virtual machines can be directly accessed and used by the IaaS consumer (e.g., an application developer, an IT provider or a service provider), to easily deploy and manage arbitrary software stacks.
  • Platform as a Service (PaaS) -- an application delivery model in which the clients, typically application developers, follow a specific programming model to develop their applications and/or application components and then deploy them in hosted runtime environments. This model enables fast development and deployment of new applications and components.
  • Project is a container of virtual infrastructure that holds a set of virtual resources (e.g., computing capacity, storage capacity) to support that infrastructure. In other words, a Project is a pool of virtual resources that supports the virtual infrastructure it contains.
  • Service Elasticity is the capability of the hosting infrastructure to scale a service up and down on demand. There are two types of elasticity -- vertical (typically of a single VM), implying the ability to add or remove resources to a running VM instance, and horizontal (typically of a clustered multi-VM service), implying the ability to add or remove instances to/from an application cluster, on-demand. Elasticity can be triggered manually by the user, or via an Auto-Scaling framework, providing the capability to define and enforce automated elasticity policies based on application-specific KPIs.
  • Service Level Agreement (SLA) is a legally binding contract between a service provider and a service consumer specifying the terms and conditions of service provisioning and consumption. Specific SLA clauses, called Service Level Objectives (SLOs), define non-functional aspects of service provisioning such as performance, resiliency, high availability, security, maintenance, etc. The SLA also specifies the agreed-upon means for verifying SLA compliance, the customer compensation plan that should be put into effect in case of SLA violation, and the temporal framework that defines the validity of the contract.