FIWARE.OpenSpecification.Cloud.PolicyManager - FIWARE Forge Wiki

FIWARE.OpenSpecification.Cloud.PolicyManager

From FIWARE Forge Wiki

Name: FIWARE.OpenSpecification.Cloud.PolicyManager
Chapter: Cloud
Catalogue-Link to Implementation: Bosun
Owner: Telefónica I+D, Fernando López

If you need to check out the Open Specs of the previous FIWARE Release (Release 4) you can go to FIWARE.OpenSpecification.Cloud.PolicyManager R4

Preface

This document is a self-contained open specification of a FIWARE generic enabler. To understand the complete context of the FIWARE platform, please also consult the FIWARE Product Vision, the website at http://www.fiware.org and similar pages.


Copyright

Copyright © 2012-2016 by Telefónica I+D. All Rights Reserved.


Legal Notice

Please check the following FIWARE Open Specification Legal Notice (implicit patents license) to understand the rights to use these specifications.


Overview

Policy Manager GE is a FIWARE Cloud component that provides Policy as a Service over any collection of monitoring data received from cloud services, in order to supply governance and compliance for the management of dynamic cloud services.

Policy Manager GE aims to supply an extensible open-source expert system framework for automatic governance of cloud services (e.g. applications, networks, computation and storage) within a dynamic infrastructure. It is a FIWARE Cloud service whose role is the automatic resolution of policy enforcement. Input data to Policy Manager GE (“facts”) comes from the monitoring of cloud services through FIWARE Monitoring GE, which relies on Ceilometer, Monasca or any other monitoring framework. Facts are received via a publish-subscribe pattern (for instance, using Publish/Subscribe Context Broker GE).

The baseline for the Policy Manager GE is PyCLIPS, a Python module that embeds full CLIPS functionality into Python applications. This means that the OpenStack and FIWARE communities can benefit from a reliable, widely used and strong Python-based inference engine. Hence, Policy Manager offers decision-making ability, independently of the type of resource (physical/virtual resources, network or applications), and can solve complex problems within the cloud field by reasoning about the knowledge base.

CLIPS is an environment for developing knowledge-based systems, defined using a specific programming language that allows us to represent procedural and declarative knowledge. This knowledge represents any problem or situation in the real world that we want to automate. Examples of problems (and actions to be taken) could be a service failure requiring a restart, or physical hosts with high CPU utilization requiring resources and/or services to be moved to other hosts in order to balance the overall utilization of the infrastructure. All these problems and actions are represented in CLIPS by means of facts and rules.

The facts are received from entities or context producers (i.e. cloud services) and become available to the knowledge system or context consumer defined in the corresponding rule. When the inference or rule engine resolves these rules, it can in turn produce new facts or raise actions to be taken by third parties. Actions are defined by those third parties and represent notifications inferred by the rule engine. In essence, each rule is expressed as a combination of conditions that trigger certain actions.
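The relation between facts, rules and actions described above can be sketched in a few lines of plain Python. This is only an illustration of the idea and not the PyCLIPS-based engine itself: the `Rule` class, the `infer` function and the `cpu` and `server` fact fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A fact is modelled here as a plain dict, e.g. {"server": "host-1", "cpu": 0.97}.
Fact = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: a condition over a fact plus the action it triggers."""
    name: str
    condition: Callable[[Fact], bool]  # the "if" part
    action: Callable[[Fact], str]      # the "then" part, builds a notification


def infer(rules: List[Rule], facts: List[Fact]) -> List[str]:
    """Return the notification of every rule whose condition holds for a fact."""
    notifications = []
    for fact in facts:
        for rule in rules:
            if rule.condition(fact):
                notifications.append(rule.action(fact))
    return notifications


rules = [
    Rule(
        name="balance-cpu",
        condition=lambda fact: fact["cpu"] > 0.9,
        action=lambda fact: f"move resources away from {fact['server']}",
    )
]
facts = [{"server": "host-1", "cpu": 0.97}, {"server": "host-2", "cpu": 0.35}]
notifications = infer(rules, facts)
```

The engine only produces the notification text; as in the Policy Manager GE, executing the action is left to a third party.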

The main functionality that the Policy Manager GE provides is:

  • Rules management. Registers, classifies and manages all the rules, including checking their definitions for consistency. It also manages the relationships between rules and decides how to resolve conflicts when several rules are activated.
  • Facts management. Registers and manages all facts received from any cloud service (physical or virtual) through a publish/subscribe broker. It is responsible for implementing small-signal control and dynamic-performance stability over the received facts, in order to provide signal stability over time before any rule is inferred.
  • Actions management. It is responsible for publishing the actions inferred by the rule engine. Policy Manager GE only notifies about the actions that have to be taken; it neither executes nor resolves them.

The following architecture overview gives a detailed description of the behavior of this component. The Policy Manager needs to interact with the user, who provides the specification of the rules and actions that comprise the knowledge system, following the CLIPS language format. The facts are received from the context producer that monitors the different resources of the cloud system. Policy Manager GE is subscribed to the Context Broker GE to receive notifications (facts) about cloud services. The inference engine uses these facts to produce new facts based on the rules, or to infer new actions to be taken by third parties.

Policy Manager Enabler Architecture Overview


Policy Manager architecture specification


Target Usage

The Policy Manager GE is an expert system that provides an independent service in the OpenStack ecosystem. It evaluates the current state of the knowledge base, applies the pertinent rules and infers new knowledge. The goal is to provide a framework in which a FIWARE Lab infrastructure owner can enforce such rules by defining them in a simple manner. The rules are defined by users, who define the conditions and the actions to be taken. The rules can be of different types, as shown in the following examples:

  • Availability rules
    • An example of availability rule can be to migrate services off a physical host when we can infer that this host might be going down.
    • Another example is to simply restart all the services on another physical host if it crashes or we need to carry out some scheduled intervention (e.g. upgrading the operating system).
  • Performance rules
    • When we detect that a physical host is close to 100% of its utilization, we can decide to move some resources to another host.
    • When we detect that the resources associated to a cloud service are close to being exhausted, we can decide to increase the flavour associated to this cloud service (vertical scaling) or to increase the number of its instances behind a load balancing service (horizontal scaling).
  • Load balancing rules
    • If we detect that some physical host is overutilized while others are underutilized we might decide to move some resources across hosts to balance the overall CPU utilization in the system.
  • User Defined rules
    • FIWARE Lab infrastructure owners can collect some customized metrics and create rules based on those metrics.


Basic concepts

Following the FMC diagram of the Policy Manager GE above, this section introduces the main concepts related to this component through the definition of its interfaces and components, and finally an example of their use. The Policy Manager GE manages a set of rules which trigger actions when certain conditions are met by the facts received. These rules can be associated with a specific cloud service or be general rules that affect a physical host. The key concepts, components and interfaces associated to the Policy Manager GE and visible to the FIWARE Lab infrastructure owners are described in the following subsections.

Entities

The main entities managed by the Policy Manager GE are the following:

  • Rules. They represent the policy or rule that will be used to infer new facts or actions based on the facts notified by the Publish/Subscribe Context Broker GE. Usually, rules are statements of the form: if <x> then <y>. The "if" part is the rule premise or antecedent and consists of a condition, and the "then" part is the consequent. The rule fires when the "if" part is determined to be true. There are two rule types:
    • General Rules. They represent a global policy to be considered regardless of the specific cloud service. Each rule comprises the name that identifies it, the condition and the action that is triggered. General Rules entities are represented as RuleModel.
    • Specific Rules. They represent a policy associated to a specific cloud resource. Specific Rules entities are represented as SpecificRuleModel.
  • Information. It represents the information about the Policy Manager API and the project. This includes the window size, a modifiable value that sets the minimum number of measurements needed to evaluate the stability of the small signals or facts. The window size is therefore really a time window, used to provide system stability in a cloud system monitored in real time.
  • Facts. They represent the measurement of the cloud resources and will be used to infer new facts or actions. They are the base of the reasoning process and are notified by the Publish/Subscribe Context Broker.
  • Actions. They represent the triggered message inferred by the rule engine based on the execution of certain conditions defined in a rule or rules set. This message is notified to the third parties in order to raise the proper action.
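As a rough illustration of how these entities relate, the sketch below models them as Python dataclasses. Only the names RuleModel and SpecificRuleModel come from this specification; every field name, the `Fact` class and the `applicable_rules` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleModel:
    """General rule: considered regardless of the specific cloud service."""
    rule_id: str
    name: str
    condition: str  # rule premise, kept as opaque text in this sketch
    action: str     # consequent, kept as opaque text in this sketch


@dataclass(frozen=True)
class SpecificRuleModel(RuleModel):
    """Specific rule: tied to one cloud resource (hypothetical server_id field)."""
    server_id: str = ""


@dataclass(frozen=True)
class Fact:
    """Measurement of a cloud resource (field names are illustrative)."""
    server_id: str
    cpu: float
    mem: float


def applicable_rules(rules: List[RuleModel], fact: Fact) -> List[RuleModel]:
    """General rules always apply; specific rules only to their own server."""
    return [
        rule
        for rule in rules
        if not isinstance(rule, SpecificRuleModel) or rule.server_id == fact.server_id
    ]


general = RuleModel("r1", "high-cpu", "cpu > 0.9", "notify-scale-up")
specific = SpecificRuleModel("r2", "low-mem", "mem > 0.8", "notify-owner", server_id="vm-1")
selected = applicable_rules([general, specific], Fact("vm-2", cpu=0.5, mem=0.9))
```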

Interfaces

The Policy Manager GE exposes two main interfaces:

  • The Policy Manager interface (PMI) is the REST API that implements all features of the Policy Manager GE exposed to the users. The PMI allows users to define new rules, with their actions, associated to a specific cloud service. Additionally, this interface provides information about this GE (URL, documentation, window size, owner and time of the last server start). The PMI also implements the NGSI-10 interface in order to receive facts (notifications of context data) related to a cloud service from the Publish/Subscribe Context Broker GE.
  • NGSI Interface which implements the NGSI-10 “notifyContext” operation in order to receive facts provided by Context Broker GE related to a cloud service. See Publish/Subscribe Context Broker GE for more details.
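To make the NGSI interface more concrete, the following sketch turns the body of a “notifyContext” request into a list of facts. The payload layout follows the NGSI-10 JSON binding as used by the Orion Context Broker and should be checked against the broker version in use. The function name, the entity type `server` and the attribute names `cpu` and `mem` are assumptions for illustration only.

```python
import json
from typing import Any, Dict, List


def facts_from_notification(body: str) -> List[Dict[str, Any]]:
    """Extract one fact per context element from a notifyContext JSON body."""
    payload = json.loads(body)
    facts = []
    for response in payload.get("contextResponses", []):
        element = response["contextElement"]
        fact: Dict[str, Any] = {"id": element["id"], "type": element["type"]}
        for attribute in element.get("attributes", []):
            # Attribute values arrive as strings; this sketch assumes numeric metrics.
            fact[attribute["name"]] = float(attribute["value"])
        facts.append(fact)
    return facts


# Hypothetical notification for a single monitored server.
SAMPLE_NOTIFICATION = json.dumps({
    "subscriptionId": "51c0ac9ed714fb3b37d7d5a8",
    "originator": "localhost",
    "contextResponses": [
        {
            "contextElement": {
                "type": "server",
                "isPattern": "false",
                "id": "vm-1",
                "attributes": [
                    {"name": "cpu", "type": "float", "value": "0.92"},
                    {"name": "mem", "type": "float", "value": "0.40"},
                ],
            },
            "statusCode": {"code": "200", "reasonPhrase": "OK"},
        }
    ],
})
facts = facts_from_notification(SAMPLE_NOTIFICATION)
```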

Components

The Policy Manager GE comprises the following components:

  • API-Agent (PMI) is responsible for offering a REST API to the Policy Manager GE users (by default FIWARE Lab infrastructure owners). It triggers the appropriate manager to handle any request received from users. We can find the following subcomponents:
    • InfoManager, responsible for the management of general information about the running Policy Manager GE instance besides the specific project information like the window size.
    • RuleManager, responsible for the management of general and specific rules associated to cloud services.
  • Rules Engine, responsible for activating a rule when a condition is satisfied based on the received facts, and for triggering the associated actions.
    • RuleEngineManager, middleware that interacts with CLIPS, adding the new facts to the Rule Engine and checking rule conditions.
    • DBManager, provides connection to the Database where the different information (i.e. rules, window size, Policy Manager GE instance information) is stored.
  • Fact-Gen, provides the mechanisms to insert facts into the Rules Engine from context data received.
    • FactGenManager, provides the small-signal control and stability of the context data (facts) received from the Publish/Subscribe Context Broker and inserts the stabilized signal (fact) into the Rule Engine. The stabilization of the signal depends on the window size defined in the InfoManager.
  • Condition & Actions Queue, which contains all the rules and actions that can be managed by Policy Manager, including the window size for each project. This is a database component in which Policy Manager GE stores the information received in the API Agent component to be used afterwards by the Rule Engine.
  • Facts Queue, a queue used to store the context data (facts) received from the Publish/Subscribe Context Broker GE. The queue is divided into channels, one per cloud service, and the size of each channel corresponds to the window size defined for the project of that cloud service. This means that all the cloud services of the same project share the same window size.

Example Scenario

This section presents the behavior of the Policy Manager GE in the simplest way, taking into account all the entities, components and interfaces mentioned above. All scenarios require the following steps:

  • Rules creation provided by users and subscription to the context broker.
  • Populate rule engine with facts collected from the context broker.
  • Rules management (status) at runtime.

Rules Management

Rules Management involves several operations to prepare the knowledge-based system. First of all, the rules have to be defined. A rule definition includes the conditions that must be checked, the actions to be triggered, and a descriptive name so that the user can easily recognize the rule. A rule can also be defined for a single cloud service.

Second, and no less important, we need to receive facts (context data) from the cloud services. To get these facts, Policy Manager GE must subscribe to the Publish/Subscribe Context Broker GE in order to receive notifications about those cloud services, each identified by an <entityId>:<entityType> pair as specified by the NGSI standard. The Publish/Subscribe Context Broker GE gets updates of the context (fact) of each cloud service from the information that it receives from the OpenStack Monitoring system. It then notifies the fact to the Policy Manager GE through Fact-Gen, which stores this information in the Facts Queue in order to obtain a stable monitored value without temporal oscillation of the signal.

Finally, the rules can be deleted or redefined. When a rule is deleted, Policy Manager GE automatically unsubscribes the cloud service from the Publish/Subscribe Context Broker GE if the rule is a Specific Rule.
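The subscription step described above could look roughly like the following sketch, which only builds the request body for an NGSI-10 “subscribeContext” operation. The field names follow the NGSI-10 JSON binding as used by the Orion Context Broker and should be verified against the broker in use. The function name, the callback URL, the `P1M` duration and the monitored attributes are assumptions.

```python
from typing import Any, Dict, List


def build_subscription(
    entity_id: str,
    entity_type: str,
    attributes: List[str],
    callback_url: str,
) -> Dict[str, Any]:
    """Build a subscribeContext body for one <entityId>:<entityType> pair."""
    return {
        "entities": [
            {"type": entity_type, "isPattern": "false", "id": entity_id}
        ],
        "attributes": list(attributes),
        # URL where the broker sends notifyContext requests (assumed endpoint).
        "reference": callback_url,
        "duration": "P1M",
        # Notify whenever one of the monitored attributes changes.
        "notifyConditions": [{"type": "ONCHANGE", "condVals": list(attributes)}],
    }


subscription = build_subscription(
    entity_id="vm-1",
    entity_type="server",
    attributes=["cpu", "mem"],
    callback_url="http://policymanager.example/notify",
)
```

Keeping the subscription identifier returned by the broker next to the specific rule would make it possible to cancel the subscription when that rule is deleted.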

Collecting data

Each cloud service publishes the status of its resources (gathered by the OpenStack Monitoring components) on the Publish/Subscribe Context Broker GE, and Policy Manager GE receives notifications of status changes. Policy Manager GE is then in charge of stabilizing the facts and inserting them into the Rule Engine. When it has received a number of facts equal to the window size, it calculates the arithmetic mean of the data and inserts that value into the Rule Engine. This is done if and only if the time between the first and last values in the queue is less than or equal to a limited acceptance time. As stated previously, the window size is really a time window, so a limited acceptance time of 10 seconds is defined. This parameter should be configurable by the user. Finally, Policy Manager discards the oldest value in the queue, together with any values whose age exceeds the limited acceptance time.
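One possible reading of this stabilization procedure is sketched below: a queue per cloud service, a mean computed once the window is full, and samples discarded when they fall outside the acceptance time. The class name, method name and the exact eviction order are assumptions; the actual FactGenManager may behave differently.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, Optional, Tuple


class FactStabilizer:
    """Hypothetical sliding-window stabilizer with one queue per cloud service."""

    def __init__(self, window_size: int, acceptance_time: float = 10.0) -> None:
        self.window_size = window_size
        self.acceptance_time = acceptance_time  # seconds between oldest and newest
        self._queues: Dict[str, Deque[Tuple[float, float]]] = {}

    def push(self, service_id: str, timestamp: float, value: float) -> Optional[float]:
        """Store a sample; return the stabilized value once the window is full."""
        queue = self._queues.setdefault(service_id, deque())
        queue.append((timestamp, value))
        # Discard samples that are too old with respect to the newest one.
        while timestamp - queue[0][0] > self.acceptance_time:
            queue.popleft()
        if len(queue) < self.window_size:
            return None
        stabilized = mean(sample for _, sample in queue)
        queue.popleft()  # drop the oldest value so that the window slides
        return stabilized


stabilizer = FactStabilizer(window_size=3, acceptance_time=10.0)
first = stabilizer.push("vm-1", 0.0, 10.0)   # window not full yet
second = stabilizer.push("vm-1", 2.0, 20.0)  # window not full yet
third = stabilizer.push("vm-1", 4.0, 30.0)   # mean of 10, 20 and 30
late = stabilizer.push("vm-1", 30.0, 50.0)   # older samples expired
```

Only the value returned by `push` would be inserted into the Rule Engine, so a single spike inside the window cannot fire a rule on its own.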

Runtime Management

During the runtime of an application, the Policy Manager GE can detect whether a rule condition is satisfied and, in that case, trigger the associated actions. Each action is communicated (notified) to the user or users subscribed to that specific rule. If more than one rule can be activated, the system calculates the preference among them.


Main Interactions

The following pictures show some interactions between the Policy Manager GE, the Cloud Portal as main user in a typical scenario and the Identity Management GE (from now on IdM GE) to provide authorization/authentication control to the component. For more details about the Open REST API of this GE, please refer to the Open Spec API specification.

First of all, every interaction requires an authentication sequence before it starts. The authentication sequence is as follows:

Authentication sequence
  1. Through the generate_adminToken() message, the Policy Manager requests a new Administration Token (Admin Token) from the IdM GE, which it needs in order to validate the tokens it will receive from the Cloud Portal. This is the normal behavior of every OpenStack service.
  2. The IdM GE returns a valid Admin Token, which is then used to check the Token received in the Cloud Portal request through the checkToken(Token) message.
  3. There are two options at this point:
    1. If the Token is valid, the IdM GE returns the information related to this token.
    2. If the Token is invalid, the IdM GE returns the message of unauthorized token.
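The authentication sequence above can be sketched as a small function that receives the two IdM operations as callables. The names `authenticate`, `generate_admin_token` and `check_token` are hypothetical stand-ins for the generate_adminToken() and checkToken(Token) messages, and the fake IdM functions exist only to make the example runnable.

```python
from typing import Callable, Dict, Optional

TokenInfo = Dict[str, str]


class Unauthorized(Exception):
    """Raised when the IdM reports that the user token is not valid."""


def authenticate(
    user_token: str,
    generate_admin_token: Callable[[], str],
    check_token: Callable[[str, str], Optional[TokenInfo]],
) -> TokenInfo:
    """Validate a user token with the IdM, following the sequence above."""
    admin_token = generate_admin_token()        # step 1: obtain the Admin Token
    info = check_token(admin_token, user_token)  # step 2: check the user token
    if info is None:                            # step 3.2: unauthorized token
        raise Unauthorized("unauthorized token")
    return info                                 # step 3.1: token information


# Fake IdM used only for this example.
def fake_generate_admin_token() -> str:
    return "admin-token"


def fake_check_token(admin_token: str, user_token: str) -> Optional[TokenInfo]:
    known = {"good-token": {"projectId": "project-1", "user": "alice"}}
    return known.get(user_token) if admin_token == "admin-token" else None


info = authenticate("good-token", fake_generate_admin_token, fake_check_token)
```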



The next interactions get information about the Policy Manager GE server instance:

Get Information sequence
  1. The User, through the Cloud Portal or CLI, sends a GET request for information about the Policy Manager GE instance that is currently up and running, using the getInformation() message.
  2. The InfoManager returns the information related to the Policy Manager GE associated to this project.
    1. Owner of the GEi.
    2. Window size of the facts stabilization queue.
    3. Current version under execution.
    4. Time and date of the last wake up of the Policy Manager GE instance.
    5. URL of the open specification.



Consider a scenario in which the user wants to update the window size associated to the project of the cloud service. The following sequence shows the request to update the window size.

Update Window Size sequence
  1. The User, through the Cloud Portal or CLI, sends a PUT request to the Policy Manager GE to update the window size of the projectId, using the updateWindowSize() message.
  2. The Policy Manager returns the information associated to this projectId in order to confirm that the change was made. This information includes the projectId in which the update was made, together with the updated window size value.



The next interactions show how to create a general or specific rule.

Create general or specific rule sequence
  1. The User, through the Cloud Portal or CLI, sends a POST request to the Policy Manager GE to create a new general/specific rule.
    1. In case of general one, the create_general_rule() message is used, with the following parameters:
      1. projectId, the OpenStack identification of the project,
      2. rule, the rule description, which includes the name that we want to give to the rule, the condition that activates it and the action that we want to trigger when the condition is true.
    2. In case of specific one, the create_specific_rule() message is used, with the following parameters:
      1. projectId, the OpenStack identification of the project,
      2. serverId, the OpenStack identification of the server,
      3. rule, the rule description in the same way that it was defined previously.
  2. The Rule Manager returns the new ruleModel associated to the new requested rule and the Policy Manager returns the response to the user.
    1. If the operation produces an error due to an incorrect representation of the rule, a HttpResponseServerError is returned in order to inform the user that something went wrong.
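The rule description sent in the creation request contains a name, a condition and an action. A minimal sketch of how such a body might be checked before it reaches the Rule Manager is shown below. The JSON field names, the `InvalidRule` exception and the sample condition and action strings are assumptions, not the format defined by the Open API specification.

```python
import json
from typing import Dict

REQUIRED_FIELDS = ("name", "condition", "action")


class InvalidRule(ValueError):
    """Raised when the rule description is not correctly represented."""


def parse_rule(body: str) -> Dict[str, str]:
    """Parse a rule description and check that all required parts are present."""
    try:
        rule = json.loads(body)
    except json.JSONDecodeError as exc:
        raise InvalidRule("rule is not valid JSON") from exc
    if not isinstance(rule, dict):
        raise InvalidRule("rule must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not rule.get(field)]
    if missing:
        raise InvalidRule("missing fields: " + ", ".join(missing))
    return {field: rule[field] for field in REQUIRED_FIELDS}


# Hypothetical request body; condition and action are treated as opaque text.
body = json.dumps({
    "name": "scale-up-on-high-cpu",
    "condition": "cpu usage above 90 percent",
    "action": "notify scale up",
})
rule = parse_rule(body)
```

In the sequence above, an `InvalidRule` error would correspond to the case in which the API answers with a HttpResponseServerError.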



The next interactions retrieve information about the general rules that have already been created:

Get all general rules sequence
  1. The User, through the Cloud Portal or CLI, sends a GET request to the Policy Manager GE in order to receive all the general rules associated to a project, using the get_all_rules() message with the parameter projectId, the identifier of the project.
  2. The Rule Manager component of the Policy Manager GE responds with the list of general rules (RuleModel) stored in the database.
  3. If the project identifier is wrong, or any other error occurs in the operation, the Rule Manager responds with a HttpResponseServerError.



The following interactions retrieve detailed information about a general or specific rule.

Get general or specific rule sequence
  1. The User, through the Cloud Portal or CLI, sends a GET request to retrieve a rule.
    1. To retrieve a general rule, the get_rule() message should be used with the ruleId parameter, the identifier of the previously created rule.
    2. Otherwise, to retrieve a specific rule, the get_specific_rule() message should be used with the ruleId parameter.
  2. The Rule Manager of the Policy Manager GE returns the ruleModel stored in the database. If something goes wrong in the operation, Policy Manager GE returns a HttpResponseServerError to the user.



Next, the interactions to delete a general or specific rule.

Delete a general or specific rule sequence
  1. The User, through the Cloud Portal or CLI, requests the deletion of a general or specific rule from the Policy Manager, providing the identifiers of the project and the rule.
    1. If the rule is general, the view sends the request to the RuleManager by calling the delete_rule() message with the identifier of the rule as parameter.
    2. Otherwise, if the rule is specific to a server, the view sends the request to the RuleManager by calling the delete_specific_rule() message with the identifier of the rule as parameter.
  2. If the operation succeeds, the RuleManager responds with a HttpResponse containing the ok message. If something goes wrong, it returns a HttpResponseServerError with the details of the problem.


Finally, the interactions to update a general or specific rule.

Update a general or specific rule sequence
  1. The User, through the Cloud Portal or CLI, requests the update of a general or specific rule from the Policy Manager GE, providing the identifiers of the project and the rule.
    1. If the rule is general, the view sends the request to the RuleManager by calling the update_general_rule() message with the identifiers of the project and the rule as parameters, in order to update the rule.
    2. Otherwise, if the rule is specific to a server, the view sends the request to the RuleManager by calling the update_specific_rule() message with the identifiers of the project, server and rule as parameters, in order to update it.
  2. If the operation succeeds, the RuleManager responds with the newly created ruleModel and the API returns a HttpResponse with the ok message. If something goes wrong, it returns a HttpResponseServerError with the details of the problem.
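The create, get, update and delete sequences for general rules can be summarized with an in-memory stand-in for the RuleManager. The method names mirror the messages used in the sequences above, but the class, its storage, the `ruleId` and `projectId` keys and the `RuleNotFound` error are assumptions; the real component stores rules in a database through the DBManager.

```python
import uuid
from typing import Dict, List


class RuleNotFound(KeyError):
    """Raised when no rule is stored under the given identifier."""


class InMemoryRuleManager:
    """Hypothetical, dictionary-backed sketch of the general rule operations."""

    def __init__(self) -> None:
        self._rules: Dict[str, dict] = {}

    def create_general_rule(self, project_id: str, rule: dict) -> dict:
        rule_id = str(uuid.uuid4())
        self._rules[rule_id] = {"ruleId": rule_id, "projectId": project_id, **rule}
        return self._rules[rule_id]

    def get_all_rules(self, project_id: str) -> List[dict]:
        return [r for r in self._rules.values() if r["projectId"] == project_id]

    def get_rule(self, rule_id: str) -> dict:
        try:
            return self._rules[rule_id]
        except KeyError:
            raise RuleNotFound(rule_id) from None

    def update_general_rule(self, project_id: str, rule_id: str, rule: dict) -> dict:
        self.get_rule(rule_id)  # fail early if the rule does not exist
        self._rules[rule_id] = {"ruleId": rule_id, "projectId": project_id, **rule}
        return self._rules[rule_id]

    def delete_rule(self, rule_id: str) -> None:
        self.get_rule(rule_id)  # fail early if the rule does not exist
        del self._rules[rule_id]


manager = InMemoryRuleManager()
created = manager.create_general_rule(
    "project-1", {"name": "high-cpu", "condition": "cpu > 0.9", "action": "notify"}
)
updated = manager.update_general_rule(
    "project-1", created["ruleId"],
    {"name": "high-cpu", "condition": "cpu > 0.8", "action": "notify"},
)
```

A REST layer built on top of such a manager would translate `RuleNotFound` into the HttpResponseServerError described in the sequences.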


Basic Design Principles

Design Principles

The Policy Manager GE has to support the following technical requirements:

  • The condition to fire the rule could be formulated on several facts.
  • The condition to fire the rule could be formulated on several interrelated facts (the values of certain variables in those facts match).
  • User could add facts "in runtime" via API (without stopping the server).
  • User could add rules "in runtime" via API (without stopping the server).
  • The management of rules has to include:
    • Update facts.
    • Delete facts.
    • Create new facts.
  • Actions can use variables used in the condition.
  • Actions implementation can invoke REST APIs.
  • Actions can send an email.
  • The Policy Manager GE should be integrated into the OpenStack without any problem.
  • The Policy Manager GE should interact with the IdM GE in order to offer authentication and authorization functionality.
  • The Policy Manager GE should interact with the Publish/Subscribe Context Broker GE in order to receive monitoring information from resources.
  • The Policy Manager GE should be able to handle a very high incoming throughput, due to the high number of cloud resources and the short time interval between facts (at least one every 5 seconds).

Resolution of Technical Issues

When applied to Policy Manager GE, the general design principles outlined at Cloud Hosting Architecture can be translated into the following key design goals:

  • Rapid Elasticity, capabilities can be quickly elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
  • Availability, Policy Manager GE should be running all the time, without interruption of the service, given its nature.
  • Reliability, Policy Manager GE should ensure that rule activations are produced by correct inference based on the facts received from a Publish/Subscribe Context Broker GE.
  • Safety, if the Policy Manager has any problem, it should continue working without any catastrophic consequences for the user(s) and the environment.
  • Integrity, Policy Manager GE does not allow the alteration of the facts queued and/or rules and actions queued.
  • Confidentiality, Policy Manager GE does not allow unauthorized access to the facts, rules and actions associated to a specific project.

Regarding the general design principles not covered at Cloud Hosting Architecture, they can be translated into the following key design goals:

  • RESTful interfaces, for rules and facts.
  • The Policy Manager GE keeps stored all rules provisioned for each user.
  • The Policy Manager GE manages all facts and checks when actions should be fired.

Detailed Specifications

Following is a list of Open Specifications linked to this Generic Enabler. Specifications labeled as "PRELIMINARY" are considered stable but subject to minor changes derived from lessons learned during last interactions of the development of a first reference implementation planned for the current Major Release of FIWARE. Specifications labeled as "DRAFT" are planned for future Major Releases of FIWARE but they are provided for the sake of future users.


Open API Specifications


Re-utilised Technologies/Specifications

The Policy Manager GE is based on RESTful Design Principles. The technologies and specifications used in this GE are:

  • RESTful web services
  • HTTP/1.1 (RFC2616)
  • JSON data serialization format.

Terms and definitions

This section comprises a summary of terms and definitions introduced during the previous sections. It intends to establish a vocabulary that will be helpful to carry out discussions internally and with third parties (e.g., Use Case projects in the EU FP7 Future Internet PPP). For a summary of terms and definitions managed at overall FIWARE level, please refer to FIWARE Global Terms and Definitions

  • Infrastructure as a Service (IaaS) -- a model of delivering general-purpose virtual machines (VMs) and associated resources (CPU, memory, disk space, network connectivity) on-demand, typically via a self-service interface and following a pay-per-use pricing model. The virtual machines can be directly accessed and used by the IaaS consumer (e.g., an application developer, an IT provider or a service provider), to easily deploy and manage arbitrary software stacks.
  • Platform as a Service (PaaS) -- an application delivery model in which the clients, typically application developers, follow a specific programming model to develop their applications and or application components and then deploy them in hosted runtime environments. This model enables fast development and deployment of new applications and components.
  • Project is a container of virtual infrastructure that has a set of virtual resources (e.g., computing capacities, storage capacities) to support the former. In other words, a VDC is a pool of virtual resources that supports the virtual infrastructure it contains.
  • Service Elasticity is the capability of the hosting infrastructure to scale a service up and down on demand. There are two types of elasticity -- vertical (typically of a single VM), implying the ability to add or remove resources to a running VM instance, and horizontal (typically of a clustered multi-VM service), implying the ability to add or remove instances to/from an application cluster, on-demand. Elasticity can be triggered manually by the user, or via an Auto-Scaling framework, providing the capability to define and enforce automated elasticity policies based on application-specific KPIs.
  • Service Level Agreement (SLA) is a legally binding contract between a service provider and a service consumer specifying terms and conditions of service provisioning and consumption. Specific SLA clauses, called Service Level Objectives (SLOs), define non-functional aspects of service provisioning such as performance, resiliency, high availability, security, maintenance, etc. The SLA also specifies the agreed-upon means for verifying SLA compliance, the customer compensation plan that should be put into effect in case of SLA non-compliance, and the temporal framework that defines the validity of the contract.