FIWARE.ArchitectureDescription.Cloud.Monitoring - FIWARE Forge Wiki

FIWARE.ArchitectureDescription.Cloud.Monitoring

From FIWARE Forge Wiki

Copyright

Copyright © 2012-2015 by Telefónica I+D. All Rights Reserved.

Legal Notice

Please check the following FIWARE Open Specification Legal Notice (implicit patents license) to understand the rights to use these specifications.

Overview

Every distributed system needs monitoring mechanisms in order to check its performance constantly. Monitoring involves gathering operational data from a running system. This information may have many consumers, which use it for various purposes. One of them is SLA management, where the system must constantly check that performance adheres to the agreed terms. Monitoring data can also be used in other ways, for example for optimization of virtual machines, products and applications, alarm detection, recommendations, etc.

This specification describes the Monitoring GE, which is the key enabler to provide monitoring information to the rest of GEs. Its architecture can be seen in the following figure:

Monitoring GE architecture overview


The Monitoring GE works once the resource has been deployed. Its architecture requires monitoring probes distributed across the VMs and hosts, which is valid for both IaaS and PaaS. The monitoring data is pushed to an adaptation layer, either directly by the probes or through a custom monitoring collector that does so on behalf of the probes.

The adaptation layer (NGSI Adapter) expresses raw monitoring data as updates of entities' context, where the monitored resources are the entities, and sends the corresponding requests to the Context Broker GE. This GE holds the last available context of the monitored entities but lacks historical information. A connector component subscribed to Context Broker writes every context update to storage (Hadoop distributed filesystem), thus building the historical information.
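
As an illustration of the translation step described above, the following minimal sketch turns a set of raw probe measures into the body of a context update. The function name, entity identifiers and metric names are hypothetical, and the payload layout is an assumption based on the JSON form of an NGSI updateContext request; the Context Broker documentation remains the authoritative reference.

```python
import json


def to_update_context(entity_type, entity_id, measures):
    """Build an updateContext-style body from raw probe measures.

    `measures` maps a metric name to its value. All names used here are
    illustrative assumptions, not part of the specification.
    """
    attributes = [
        {"name": name, "type": "string", "value": str(value)}
        for name, value in sorted(measures.items())
    ]
    return {
        "contextElements": [
            {
                "type": entity_type,
                "isPattern": "false",
                "id": entity_id,
                "attributes": attributes,
            }
        ],
        # APPEND creates the entity or attribute if it does not exist yet
        "updateAction": "APPEND",
    }


raw = {"cpuLoadPct": 12.5, "usedMemPct": 48.0}  # hypothetical probe output
payload = to_update_context("vm", "vm-001", raw)
print(json.dumps(payload, indent=2))
```

In a real deployment the adapter would send this body to Context Broker over HTTP; the sketch stops at building the payload so that it runs without a broker.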

This information is offered to the cloud management enablers following both pull and push models. On the one hand, a query manager server implements a query API that returns the last records of a measurement at different levels (vApps, VMs, and even deployed software). In addition, map-reduce mechanisms make it possible to aggregate certain measurements into higher-level data that can be used for more precise management of the resources (for instance, aggregating the measurements from all the VMs hosting an application/service into a more meaningful KPI at the service level). On the other hand, clients can subscribe to Context Broker notifications about context updates and thus have new monitoring data pushed to them.
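
The service-level aggregation mentioned above can be sketched with a map step and a reduce step in plain Python. This is only an in-memory illustration of the idea, not the Hadoop job used by the GE; the record layout, the service names and the choice of the mean as aggregate are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (service, vm, metric, value)
records = [
    ("shop", "vm-1", "cpuLoadPct", 20.0),
    ("shop", "vm-2", "cpuLoadPct", 40.0),
    ("blog", "vm-3", "cpuLoadPct", 10.0),
]


def map_phase(record):
    """Emit a (key, value) pair keyed by service and metric."""
    service, _vm, metric, value = record
    return (service, metric), value


def reduce_phase(pairs):
    """Group values by key and reduce each group to its mean."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: mean(values) for key, values in grouped.items()}


service_kpis = reduce_phase(map_phase(r) for r in records)
print(service_kpis[("shop", "cpuLoadPct")])  # 30.0
```

Keying by service rather than by VM is what turns per-VM measures into a service-level KPI; any other associative aggregate (sum, max) would fit the same structure.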

Target Usage

The monitoring system is used by different Cloud GEs to track the status of the resources. They use the gathered data to make decisions about elasticity or for SLA management. Whenever a new resource is deployed in the cloud, the proper monitoring probe is set up and configured to start providing monitoring data. The GE offers a query interface that allows other GEs to poll for information relevant to any KPI associated with the resources.

Basic Concepts

Following the above FMC diagram of the Monitoring GE, this section introduces the main concepts related to this GE through the definition of its entities, interfaces and components, and finally an example of their use.

The key concepts visible to the cloud user fall into entities, interfaces and components, each of which is described below.

Entities

The following entities are considered:

  • Measure. The value of a metric for a resource at a given moment. Metrics are collected by monitoring probes, i.e., software installed inside the VMs.
  • Source. A probe that generates data for different metrics of a given resource.
  • Resource. A cloud resource for which a cloud metric can be generated. It can apply to IaaS level resources and their aggregations (Organization, Project, vApp, VMs) and PaaS level resources (PIs and ACs).
  • Context. In the NGSI vocabulary from Context Broker, a set of attributes (measures) from an entity (resource) in a given moment.
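
One possible way to model the entities above is shown in the following sketch. The class and field names are hypothetical and chosen only to mirror the definitions in the list; the specification does not prescribe any particular data structures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Resource:
    resource_type: str  # corresponds to NGSI entityType
    resource_id: str    # corresponds to NGSI entityId


@dataclass(frozen=True)
class Measure:
    metric: str
    value: float
    timestamp: datetime


@dataclass
class Source:
    """A probe that generates measures for several metrics of one resource."""
    resource: Resource
    metrics: tuple

    def sample(self, readings, timestamp):
        # One Measure per configured metric, all taken at the same moment
        return [Measure(m, readings[m], timestamp) for m in self.metrics]


@dataclass
class Context:
    """The set of measures of a resource at a given moment."""
    resource: Resource
    measures: list = field(default_factory=list)


vm = Resource("vm", "vm-001")
probe = Source(vm, ("cpuLoadPct", "usedMemPct"))
now = datetime(2015, 1, 1, tzinfo=timezone.utc)
ctx = Context(vm, probe.sample({"cpuLoadPct": 12.5, "usedMemPct": 48.0}, now))
print([m.metric for m in ctx.measures])
```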

Interfaces

Two different models are supported for accessing monitoring data:

  • Pull model. The client fetches monitoring data from resources using the Monitoring API offered by its query manager.
  • Push model. The client subscribes to Context Broker GE (registering a callback URL) and receives notifications whenever new monitoring data is available.

Components

The Monitoring GE comprises a set of distributed components:

  • Monitoring Probes: part of the software tool used to gather metrics. They are installed in the resource to be measured (virtual machine, physical node, etc.) and configured to provide the monitoring information, which has to be pushed to the Adapter. The Monitoring GE should be agnostic to the concrete monitoring software chosen. Alternatives are Nagios, Zabbix, openNMS, perfSONAR, collectd, mBeanCmd, Ganglia, etc.
  • Monitoring Collector: if probes are not able to push data directly to the Adapter, a custom component has to be deployed as part of the monitoring tool to gather probe data and issue HTTP requests to the Adapter.
  • NGSI Adapter: responsible for translating raw probe data into a common format (NGSI) and issuing update requests to Context Broker.
  • Context Broker GE: publish/subscribe broker managing context updates.
  • Hadoop: the storage system that keeps the history of metrics over several years. Metrics are organized according to the data model, so the stored data also includes aggregated information.
  • BigData Connector: subscriber of Context Broker responsible for writing context updates into storage.
  • Query Manager: implementation of the Monitoring API (pull model).

Example Scenario

When a VM or server has been deployed, the probes installed on it start sending monitoring data according to a defined schedule. Either directly or through a monitoring collector, the data reaches the adapter, which in turn publishes it to Context Broker. Hadoop stores the data by means of a connector already subscribed to Context Broker for updates.

Monitoring GE example scenario


At this point, clients may use both querying modes:

  1. The pull mode, by means of using the API implemented by Query Manager.
  2. The push mode, by means of subscribing a client to Context Broker for updates.

Main Interactions

This section describes the main interactions in both modes.


Pull model

This set of operations queries information about a measurable resource once the resource has been deployed. A measurable resource can be anything being measured: a virtual machine, a physical node, or software installed inside a virtual machine. Monitoring probes have to be installed at the monitored resource in order to generate a set of metrics. The operations of the query API give access to this information.

Monitoring GE pull model


Query metrics for all resources of a given type

The client obtains the list of metrics for all available monitored resources of a given type.

  • INPUT: resource type (corresponding to NGSI entityType)
  • OUTPUT: list of resources and their metrics

Query metrics for at most n resources of a given type

The client obtains the list of metrics for at most n monitored resources of a given type.

  • INPUT: resource type (corresponding to NGSI entityType); the maximum number n of results
  • OUTPUT: list of resources and their metrics

Query metrics for a specific resource

The client obtains the list of metrics for a specific resource.

  • INPUT: resource type (corresponding to NGSI entityType) and resource identifier (NGSI entityId)
  • OUTPUT: resource and its metrics
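
The three pull operations above can be sketched against an in-memory store as follows. The store contents, function names and result layout are hypothetical; the sketch illustrates the inputs and outputs of each operation and is not the Monitoring API itself.

```python
# Hypothetical store: (entityType, entityId) -> latest metrics
STORE = {
    ("vm", "vm-001"): {"cpuLoadPct": 12.5},
    ("vm", "vm-002"): {"cpuLoadPct": 73.0},
    ("host", "host-01"): {"cpuLoadPct": 35.0},
}


def query_by_type(resource_type, limit=None):
    """Metrics for all resources of a type, or for at most `limit` of them."""
    matches = [
        {"id": rid, "type": rtype, "metrics": metrics}
        for (rtype, rid), metrics in sorted(STORE.items())
        if rtype == resource_type
    ]
    return matches if limit is None else matches[:limit]


def query_resource(resource_type, resource_id):
    """Metrics for one specific resource, or None if it is not monitored."""
    metrics = STORE.get((resource_type, resource_id))
    if metrics is None:
        return None
    return {"id": resource_id, "type": resource_type, "metrics": metrics}


print(query_by_type("vm"))           # all VMs
print(query_by_type("vm", limit=1))  # at most one VM
print(query_resource("host", "host-01"))
```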

Push model


Monitoring GE push model


Subscribe a client

This operation subscribes a client to Context Broker for updates, providing a callback URL at which to receive notifications (see the Context Broker User and Programmers Guide for more details).

  • INPUT: subscribeContext request specifying: entities to subscribe to, attributes, notify conditions and callback URL
  • OUTPUT: subscribeContext response including subscriptionId
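
A subscription request carrying the inputs listed above might be built as in the following sketch. The helper name, callback URL and attribute names are hypothetical, and the field names are an assumption based on the JSON form of an NGSI subscribeContext request; check them against the Context Broker User and Programmers Guide before relying on them.

```python
def build_subscription(entity_type, attributes, callback_url, duration="P1M"):
    """Build a subscribeContext-style body for all entities of one type."""
    return {
        # Pattern ".*" selects every entity of the given type
        "entities": [{"type": entity_type, "isPattern": "true", "id": ".*"}],
        "attributes": list(attributes),
        "reference": callback_url,  # where notifications are delivered
        "duration": duration,       # ISO 8601 duration, here one month
        "notifyConditions": [
            {"type": "ONCHANGE", "condValues": list(attributes)}
        ],
    }


subscription = build_subscription(
    "vm", ["cpuLoadPct"], "http://client.example.org/notify"
)
print(subscription["reference"])
```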

Process notification

The client processes the data received in a Context Broker notification.

  • INPUT: notifyContext request including subscriptionId and context elements
  • OUTPUT: none
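
On the client side, handling a notification could look like the sketch below. The sample notification, including its subscriptionId, is made up for illustration, and the layout (contextResponses wrapping contextElement items) is an assumption based on the JSON form of an NGSI notifyContext request.

```python
def process_notification(notification):
    """Extract the subscription id and the updated metrics per entity."""
    subscription_id = notification["subscriptionId"]
    updates = {}
    for response in notification.get("contextResponses", []):
        element = response["contextElement"]
        key = (element["type"], element["id"])
        updates[key] = {
            attr["name"]: attr["value"]
            for attr in element.get("attributes", [])
        }
    return subscription_id, updates


# Hypothetical notification body as it might arrive at the callback URL
sample = {
    "subscriptionId": "sub-123",
    "contextResponses": [
        {
            "contextElement": {
                "type": "vm",
                "isPattern": "false",
                "id": "vm-001",
                "attributes": [
                    {"name": "cpuLoadPct", "type": "string", "value": "12.5"}
                ],
            },
            "statusCode": {"code": "200", "reasonPhrase": "OK"},
        }
    ],
}

sub_id, updates = process_notification(sample)
print(sub_id, updates)
```

The operation has no output towards Context Broker; what the client does with the extracted values (alarms, elasticity decisions) is up to the client.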

Basic Design Principles

Design Principles

This section specifies a set of requirements for the Cloud monitoring framework:

  • Non-intrusiveness on resource functionality and performance: monitoring needs to be as non-intrusive as possible. Although monitoring probes are installed within the monitored resources, they should affect neither the functionality nor the performance of the resource.
  • Deal with metric heterogeneity: the monitoring system has to deal with different kinds of metrics (infrastructure, KPI, application and product metrics), different virtualization technologies, different products, applications, etc.
  • Scalability in monitored resources: the system needs to be able to scale to large numbers of monitored nodes and resources.
  • Scalability in number of measures: as the number of monitored resources increases, the amount of information to be stored also increases. Since information is collected frequently (say, every 5 seconds) and stored to keep historical data, a conventional relational database may fail to cope with the volume.
  • Data aggregation: monitoring system should be able to aggregate the information at application/service level, which means that it has to be able to aggregate metrics from VMs or hardware resources at service level.
  • Federation capabilities: monitoring data from different regions (federated infrastructures) should be published and made available at a Federation Layer.
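
To give a feel for the scalability requirement on the number of measures, the following back-of-the-envelope calculation uses the 5-second collection rate mentioned above. The number of resources and the number of metrics per resource are made-up figures chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def measures_per_day(resources, metrics_per_resource, interval_seconds):
    """Number of measures stored per day at a fixed collection interval."""
    samples_per_metric = SECONDS_PER_DAY // interval_seconds
    return resources * metrics_per_resource * samples_per_metric


# Hypothetical deployment: 1000 resources with 10 metrics each
daily = measures_per_day(resources=1000, metrics_per_resource=10,
                         interval_seconds=5)
print(daily)  # 172800000
```

Each metric alone yields 17,280 samples per day at this rate, so even this modest hypothetical deployment accumulates well over a hundred million measures daily.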

Resolution of Technical Issues

Several approaches can be adopted to address the design principles described in the previous section. Some of them are suggested here:

  • Probes for non-intrusive solution: the approach would consist of installing probes in the same virtual machine or host where the resource to be measured is placed, without making them part of the resource software itself.
  • Collector for solving heterogeneity in metrics: the approach would consist of including a monitoring collector component as part of the architecture, which is in charge of collecting the different metrics from the different probes.
  • Distributed storage for metrics scalability: a promising approach would consist of gaining scalability in the number of metrics handled by means of using Hadoop distributed filesystem as storage.
  • BigData GE for data analysis: the approach would consist of using map-reduce techniques to perform data aggregation.
  • Context Broker GE for data federation: the approach would consist of using the publish/subscribe capabilities of Context Broker GE.