FIWARE.ArchitectureDescription.Cloud.JobScheduler R3

From FIWARE Forge Wiki


FIWARE WIKI editorial remark:
This page corresponds to Release 3 of FIWARE. The version associated with the latest Release is linked from the FIWARE Architecture page.

Copyright

Copyright © 2013 by INRIA. All Rights Reserved.

Legal Notice

Please check the FI-WARE Open Specifications Legal Notice to understand the rights to use these specifications.

Overview

This specification describes the Job Scheduler GE, which is the key enabler for executing generic jobs over multiple distributed, heterogeneous computer systems, both physical and virtual.

The Job Scheduler GE integrates two major internal services, namely the Resource Manager (RM) Service and the Scheduler Service. Through the internal RM Service, the Job Scheduler GE abstracts computer systems as computing resources, called nodes, on which jobs can be executed, offering the following main features:

  • infrastructure management
  • node provisioning based on user criteria
  • node life-cycle management
  • monitoring

Through the internal Scheduler Service, the Job Scheduler GE then puts those resources at the disposal of applications, users and other FI-WARE GEs, making it possible to submit jobs for execution and to handle their life-cycle.

The Job Scheduler GE can therefore act as a general purpose GE that helps save valuable time when a large amount of computation is required for data processing. It also raises the average utilization of computing resources by allowing underutilized resources to be added dynamically through a registration mechanism.

The following diagram shows the main components of the Job Scheduler Generic Enabler.

Job Scheduler Architecture Overview

Job Scheduler GE architecture specification

In the above diagram, the REST Server implements the API front-end of the Job Scheduler GE and exposes a representation of both the internal Scheduler Service and the Resource Manager Service to the Cloud User. These two services form the backbone of the Job Scheduler GE back-end.

Because the computing resources being handled are distributed, an Asynchronous Message Queuing System is needed so that all components can communicate in a non-blocking way. Such a system relies on a middleware layer, whose role is to bridge the heterogeneous computing systems on which the nodes handled by the Job Scheduler GE may be available.
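The non-blocking style of communication described above can be illustrated with a minimal sketch, assuming Python's standard `asyncio.Queue` as a stand-in for the middleware's message queue. The `node` coroutine, node names and payloads are hypothetical; the point is only that the sender enqueues messages and moves on without waiting for any node to process them.

```python
import asyncio
from typing import List, Optional, Tuple

Result = Tuple[str, str, str]


async def node(name: str, inbox: "asyncio.Queue[Optional[str]]", outbox: "asyncio.Queue[Result]") -> None:
    """Hypothetical node agent: consumes messages until it receives a None sentinel."""
    while True:
        msg = await inbox.get()
        if msg is None:  # sentinel: no more work for this node
            break
        await asyncio.sleep(0)  # stand-in for real work; yields control to the event loop
        await outbox.put((name, msg, msg.upper()))


async def main() -> List[Result]:
    inbox: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    outbox: "asyncio.Queue[Result]" = asyncio.Queue()
    workers = [asyncio.create_task(node(f"node-{i}", inbox, outbox)) for i in range(2)]

    # put_nowait never blocks on an unbounded queue, so the sender
    # does not wait for any node to pick up its messages.
    for payload in ["render", "encode", "transcode"]:
        inbox.put_nowait(payload)
    for _ in workers:
        inbox.put_nowait(None)  # one sentinel per node

    await asyncio.gather(*workers)
    results = []
    while not outbox.empty():
        results.append(outbox.get_nowait())
    return results


if __name__ == "__main__":
    for node_name, message, output in asyncio.run(main()):
        print(node_name, message, output)
```

Which node handles which message depends on scheduling by the event loop; a real middleware would add delivery across machines, which this single-process sketch does not attempt.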

Target Usage

The Job Scheduler GE enriches the FI-WARE Cloud Architecture with an intuitive and powerful enabler, based on simple abstractions, by introducing the concepts of job and computing node.

Thanks to the middleware layer, it naturally fits in the Cloud Hosting Chapter, since cloud computing is just one of the computing systems it can span.

The Job Scheduler GE provides a way of gathering, aggregating and monitoring resources for computing purposes, which makes it attractive to users and enterprises that already have physical resources, besides virtual ones, at their disposal. It lets them take a hybrid (physical and virtual) approach to computing resources and raise their average utilization. For example, a workstation assigned to a single developer can, at the same time, be configured to host task computations.

Finally, the Job Scheduler GE is a general purpose GE for data processing, available to users, enterprises and all the other FI-WARE GEs, especially those with limited computing power at their disposal, as is the case for the Cloud Edge GE.

Main concepts

Following the FMC diagram of the Job Scheduler above, this section introduces the main concepts related to this GE by defining its entities, interfaces and components, and finally gives an example of their use.

The Job Scheduler GE allows jobs to be submitted and their life-cycle to be controlled, taking into account the resources that are free, that is, available from the Resource Manager's perspective. By abstracting computing resources as entities called nodes, it handles the dynamic addition and removal of resources, which may be desktop computers, clusters or clouds.

The elements visible to the cloud user fall into entities, interfaces and components; each is described below, together with the concepts it relies on.

Entities

In order to use the Job Scheduler GE, users should be familiar with the following key concepts:

  • Job. A Job is the entity submitted to the Scheduler with a given priority. It is composed of one or more tasks. Concretely, a job can be described by an XML document conforming to the Job XSD schema.
  • Task. A Task is the smallest schedulable entity. It is included in a Job and is executed on the available resources in accordance with the scheduling policy. It can be a Java task, which implements a specific interface, or a native task, that is, any user program such as a compiled C/C++ application or a shell or batch script. Moreover, since a task may require several Computing Nodes at the same time, a task can also be defined as a multi-node task.
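The Job and Task entities can be pictured with a small data model. This is a sketch only: the actual Job XSD schema is not reproduced here, so the field names (`command`, `nodes_required`, the integer `priority`) are assumptions chosen for illustration, not the schema's element names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskKind(Enum):
    JAVA = "java"      # a Java class implementing a specific interface
    NATIVE = "native"  # any user program: compiled C/C++ binary, shell or batch script


@dataclass
class Task:
    name: str
    kind: TaskKind
    command: str             # hypothetical: class name (JAVA) or command line (NATIVE)
    nodes_required: int = 1  # more than 1 makes this a multi-node task

    def __post_init__(self) -> None:
        if self.nodes_required < 1:
            raise ValueError("a task needs at least one Computing Node")

    @property
    def multi_node(self) -> bool:
        return self.nodes_required > 1


@dataclass
class Job:
    name: str
    tasks: List[Task] = field(default_factory=list)
    priority: int = 0  # hypothetical numeric priority; higher means more urgent

    def __post_init__(self) -> None:
        if not self.tasks:
            raise ValueError("a Job is composed of one or more tasks")

    def peak_nodes(self) -> int:
        """Largest number of nodes any single task needs at the same time."""
        return max(t.nodes_required for t in self.tasks)


if __name__ == "__main__":
    job = Job(
        name="render-scene",
        tasks=[
            Task("prepare", TaskKind.NATIVE, "./prepare.sh"),
            Task("render", TaskKind.JAVA, "org.example.RenderTask", nodes_required=4),
        ],
    )
    print(job.peak_nodes(), [t.multi_node for t in job.tasks])
```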


Scheduler concepts class diagram


  • Computing Node. A Computing Node (Node for short) is a logical container for computing tasks. Concretely, it can be thought of as a software agent (such as a JVM) running on the computing resource, able to leverage the resource's operating system to compute tasks and to extend it with any custom library, in order to be part of the middleware.
  • Nodes Deployment. Nodes Deployment (Deployment for short) is the process by which one or more computing resources are enabled to host one or more Nodes. It depends heavily on the nature of the middleware; a simple example is deployment through ssh commands.
  • Infrastructure. An Infrastructure is an aggregation of Nodes. An example is the Local Infrastructure in the architecture picture, which represents the aggregation of some nodes deployed locally together with the Job Scheduler GE implementation. Since the nodes are local, their deployment does not require ssh commands.
  • Infrastructure Manager. It is in charge of the main Infrastructure management tasks. Its behaviour can be extended by each Infrastructure.
  • Policy. A Policy defines the Deployment strategy. Examples of Policies are "deploy the Nodes when the Infrastructure is created and never remove them" (static deployment policy) or "deploy the Nodes at a particular time" (time slot policy).
  • Node Source. A Node Source is composed of an Infrastructure Manager and a Policy. All Nodes under the same Node Source are launched on the same Infrastructure, according to the given Policy.
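The relationship between a Node Source, its Infrastructure Manager and its Policy can be sketched as follows, assuming a simple reconcile loop in which the Policy decides how many nodes should exist and the Infrastructure Manager deploys or releases the difference. `LocalInfrastructure`, `reconcile` and the node naming scheme are hypothetical; the two policies mirror the static and time slot examples above.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List, Protocol


class InfrastructureManager(Protocol):
    def deploy(self, count: int) -> List[str]: ...
    def release(self, node_ids: List[str]) -> None: ...


class DeploymentPolicy(Protocol):
    def wanted(self, now: datetime, size: int) -> int: ...


class LocalInfrastructure:
    """Hypothetical local infrastructure: nodes start next to the scheduler, no ssh needed."""

    def __init__(self) -> None:
        self._counter = 0

    def deploy(self, count: int) -> List[str]:
        ids = [f"local-{self._counter + i}" for i in range(count)]
        self._counter += count
        return ids

    def release(self, node_ids: List[str]) -> None:
        pass  # a real manager would stop the node processes here


class StaticPolicy:
    """Deploy all nodes when the Node Source is created and never remove them."""

    def wanted(self, now: datetime, size: int) -> int:
        return size


@dataclass
class TimeSlotPolicy:
    """Keep nodes deployed only during a daily time slot [start, end)."""
    start: time
    end: time

    def wanted(self, now: datetime, size: int) -> int:
        return size if self.start <= now.time() < self.end else 0


@dataclass
class NodeSource:
    infrastructure: InfrastructureManager
    policy: DeploymentPolicy
    size: int
    nodes: List[str] = field(default_factory=list)

    def reconcile(self, now: datetime) -> None:
        """Deploy or release nodes so that their number matches what the policy wants."""
        target = self.policy.wanted(now, self.size)
        if len(self.nodes) < target:
            self.nodes.extend(self.infrastructure.deploy(target - len(self.nodes)))
        elif len(self.nodes) > target:
            self.infrastructure.release(self.nodes[target:])
            del self.nodes[target:]
```

Calling `reconcile` periodically is one way a time slot policy could take effect; how the real Resource Manager triggers its policies is not described in this specification.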


Resource Manager concepts class diagram


  • Nodes Selection Script. A Nodes Selection Script (Selection Script for short) is the tool through which the user selects Nodes according to criteria of interest. The most common scripting languages (JavaScript, Python, Ruby and so on) can be used. A concrete JavaScript example is available here.
  • Node Registration. Node Registration (Registration for short) is the process by which a Node is registered to a Node Source. From then on, the Node officially belongs to the pool of computing resources visible at the Resource Manager level.
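Node selection and registration can be sketched in the same spirit. Real Selection Scripts are written in a scripting language and executed on the candidate node; here, purely as an assumption for illustration, each script is modeled as a Python predicate over a dictionary of node properties (the `os` and `memory_gb` keys, `ResourcePool` and the helper predicates are hypothetical).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A selection script is modeled as a predicate over the node's properties.
SelectionScript = Callable[[Dict[str, object]], bool]


@dataclass
class Node:
    url: str
    properties: Dict[str, object]


@dataclass
class ResourcePool:
    node_sources: Dict[str, List[Node]] = field(default_factory=dict)

    def register(self, node: Node, node_source: str) -> None:
        """Node Registration: from now on the node is visible in the pool."""
        self.node_sources.setdefault(node_source, []).append(node)

    def select(self, scripts: List[SelectionScript], count: int) -> List[Node]:
        """Return up to `count` registered nodes that satisfy every script."""
        candidates = [n for nodes in self.node_sources.values() for n in nodes]
        return [n for n in candidates if all(s(n.properties) for s in scripts)][:count]


def linux_only(props: Dict[str, object]) -> bool:
    return props.get("os") == "linux"


def at_least_8_gb(props: Dict[str, object]) -> bool:
    memory = props.get("memory_gb", 0)
    return isinstance(memory, (int, float)) and memory >= 8
```

Combining several scripts with `all` means a node must satisfy every criterion, which matches the idea of selecting nodes by user-defined requirements such as operating system or available memory.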

Interfaces

The Job Scheduler GE is currently composed of one main interface:

  • The Job Scheduler Interface (JSI) is the REST API that provides a RESTful representation of both the internal Scheduler and Resource Manager Services, which together constitute its backbone. It uses JavaScript Object Notation (JSON) to serialize state objects and transports them over HTTP. Using HTTP as the transport protocol eliminates most of the restrictions imposed by corporate firewalls, while JSON provides a lightweight, widely used mechanism to serialize and deserialize state objects.
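To illustrate how state objects travel over the JSI, the sketch below builds a JSON-over-HTTP request and parses a JSON response using only Python's standard library. The base URL, the `/scheduler/jobs` path and the payload fields are hypothetical placeholders, not the documented JSI resources; the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://scheduler.example.org:8080/rest"  # hypothetical JSI endpoint


def build_request(path: str, payload: Optional[Any] = None, method: str = "GET") -> urllib.request.Request:
    """Serialize `payload` (if any) to JSON and wrap it in an HTTP request."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Accept", "application/json")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def parse_state(body: bytes) -> Any:
    """Deserialize a JSON state object received from the server."""
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    req = build_request("/scheduler/jobs", {"name": "render-scene", "priority": "normal"}, "POST")
    print(req.get_method(), req.full_url, req.data)
    print(parse_state(b'{"jobId": "42", "status": "PENDING"}'))
```

Passing such a request to `urllib.request.urlopen` would actually send it; the real resource paths and any authentication headers should be taken from the JSI specification.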

Components

With reference to the Job Scheduler GE architecture, the following describes the five main modules of the Job Scheduler GE, most of which are subject to administrator configuration:

  • REST Server, which relieves the Scheduler and RM Services of the potential overload caused by large numbers of simultaneous clients. Thanks to a built-in caching mechanism, client operations involving state (of a task, a job, the Scheduler and so on) are served from local, periodically updated copies of the state objects. Hence, the REST Server effectively reduces the communication load caused by multiple concurrent requests. For back-end integration, this component uses the native middleware to communicate with both the Scheduler Service and the Resource Manager Service.
  • Asynchronous Message Queuing System, which ensures non-blocking message delivery across heterogeneous computing systems. How this is accomplished depends on the architecture of the embedded middleware, which may rely on either a distributed or a centralized communication system between its components. In the latter case, it basically acts as a software router.
  • Authentication & Authorization, which is the first component contacted when the user logs in. It is in charge of authenticating the user and granting or denying access to the Scheduler Service or Resource Manager Service features.
  • Resource Manager Service, which couples distributed resources in order to solve large-scale problems. It provides a single point of access to all resources, offering an effective way of selecting and aggregating them for computations with specific requirements. To accomplish this, the Resource Manager relies on the following embedded sub-components:
    • RM Front-end, which offers the interface for accessing all the other sub-components.
    • RM Core, which keeps an up-to-date list of nodes able to perform Scheduler tasks; provides the Scheduler with the nodes requested by its users; communicates with Node Sources to add and remove nodes; creates and removes Node Sources; handles node addition and removal requests; and generates events about nodes and node sources, sending them to Monitoring.
    • Monitoring, which lets a monitor ask the Resource Manager to emit the events generated by the management of nodes and node sources.
    • Selection Manager, which is responsible for selecting nodes from the pool of free nodes for subsequent script execution. User requests for nodes are processed by the Selection Manager, which may contact nodes at request time and execute code on them to determine whether each node is suitable. Once users have obtained nodes, they contact them directly, without involving the RM.
    • Node Source, which manages the acquisition, monitoring, creation and removal of a set of nodes in the Resource Manager.
    • Infrastructure Manager, which is the part of a Node Source responsible for deploying nodes to, and releasing them from, the actual underlying infrastructure. For instance, it may launch a node over ssh or by submitting a specific job to the system's native scheduler.
    • Node Source Policy, which is the part of a node source defining the rules and limitations of its utilization. Every policy requires an administrator of the node source and a set of its users to be defined, so that node utilization can be restricted. Moreover, the policy defines node deployment rules, such as static deployment (all nodes are launched when the node source is created and never removed), time slot deployment (nodes are deployed for a particular time period) or others.
  • Scheduler Service, which is the main entity: a non-GUI daemon that, acting as a client, is connected to the Resource Manager Service. It is composed of the following sub-components:
    • Scheduler Front-end, which is responsible for managing the Scheduler. All requests from authenticated users are handled by this front-end. Before forwarding requests to the core, the front-end checks that users have the required authorization. This interface allows users to submit jobs, get the scheduling state and retrieve job results.
    • Scheduler Core, which is the main entity of the Scheduler Service. The Core implements the Scheduler and communicates with the RM Service to acquire nodes. It is in charge of scheduling Jobs according to the scheduling policy (FIFO by default), delivering scheduling events to users and handling storage. Users cannot interact directly with the Scheduler Core; they must go through the Scheduler Front-end.
    • Global Dataspace, which allows users to handle files during the scheduling process. As part of the Scheduler Service infrastructure, it defines the default locations from which nodes can fetch the files needed to accomplish a given task and to which they can write any files produced. The former is the global input dataspace, where nodes may have read access; the latter is the global output dataspace, where access may be read-write. Finally, once the needed files are fetched from the input dataspace, a task can read from and write to its own local dataspace.
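The REST Server's caching idea, serving many reads from a local copy and refreshing it only periodically, can be sketched as a time-to-live cache. This is a simplification under stated assumptions: it refreshes lazily on the first read after the TTL expires, whereas the actual REST Server may refresh in the background; `CachedState`, `ttl` and `backend_calls` are hypothetical names.

```python
import threading
import time
from typing import Any, Callable, Optional


class CachedState:
    """Serve reads from a local copy that is re-fetched at most once every `ttl` seconds."""

    def __init__(self, fetch: Callable[[], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # call to the back-end service (Scheduler or RM)
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._state: Optional[Any] = None
        self._fetched_at = float("-inf")
        self.backend_calls = 0

    def get(self) -> Any:
        with self._lock:  # concurrent clients share a single refresh
            now = self._clock()
            if now - self._fetched_at >= self._ttl:
                self._state = self._fetch()
                self._fetched_at = now
                self.backend_calls += 1
            return self._state


if __name__ == "__main__":
    cache = CachedState(lambda: {"pendingJobs": 3}, ttl=5.0)
    for _ in range(1000):  # many client reads...
        cache.get()
    print(cache.backend_calls)  # ...but a single call to the back-end
```

The trade-off is staleness: clients may see state up to `ttl` seconds old, in exchange for a back-end load that no longer grows with the number of concurrent clients.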

Example Scenario

This section presents three main use cases, from the perspective of users, enterprises and FI-WARE GEs respectively, in order to show:

  • how the Job Scheduler GE could cover, within FI-WARE, the needs of a short-term or temporary business model, whereas cloud computing tends to steer users towards a long-term one;
  • how the Job Scheduler GE could encourage enterprises to join the FI-WARE Cloud ecosystem by allowing hybrid resource aggregation for job scheduling;
  • how the Job Scheduler GE could play the role of a general purpose GE within FI-WARE.

User Use Case

  1. A worker needs to run heavy computations, and many attempts are required to get the best result, which must be achieved as soon as possible. Using just his own laptop or his current VM in the Cloud, each attempt would require hours of processing, wasting valuable time.
  2. A colleague who had similar constraints points him to the solution: designing an ad-hoc job to submit to the Job Scheduler GE available in the cloud.
  3. After the job design phase, he enters his credentials and waits for validation by the Identity Management GE.
  4. Once approved, he submits the job and fetches the first results within a few minutes. If not satisfied, he tries again, tuning the parameters that define the job.
  5. Finally, the worker is so happy that he shares his experience on his blog.

Enterprise Use Case

  1. An enterprise does business in the field of compute-intensive data rendering and is considering moving to the cloud. The only drawback of such an approach is that all its physical machines might go to waste, since nothing seems to be available to act as "glue" between distributed resources.
  2. It then learns about the Job Scheduler GE, which enables distributed heterogeneous computing resources to be used for data processing.
  3. Once the Job Scheduler GE is deployed in the cloud, the enterprise administrator proceeds as follows:
    1. first, he discovers which types of infrastructures and policies are available;
    2. then, he creates new node sources, according to his infrastructure and policy requirements;
    3. next, he runs nodes on each available physical machine and registers them with the same RM Service, running at a given endpoint in the cloud;
    4. finally, he checks whether the nodes are actually available from the RM Service perspective and proceeds to assign them the rendering tasks.

FI-WARE GEs Use Case

Both previous scenarios can easily be recast from this perspective. In particular, the Cloud Edge GE may face computing limitations because it is not a high-performance device. Likewise, any Edglet could leverage the Job Scheduler GE computing resources, for example to manipulate audio and video files.

Main Interactions

The Job Scheduler GE provides intuitive operations to manage the Resource Manager Service, the Scheduler Service and the other main entities described previously.

Resource Manager Service

  • login - enables the user to access the RM with his credentials
  • disconnect - disconnects the user from the RM and releases all the nodes taken by the user for computations
  • isActive - tests if the RM is operational
  • shutdownRM - kills the RM
  • getRMInfo - retrieves specific information related to the RM
  • getMonitoring - gets the initial state of the RM
  • getRMStatHistory - returns the RM statistic history
  • getRMState - returns an overview of the current number of free, alive and total nodes in the RM
  • getRMVersion - returns the current RM version

Node Source

  • createNodeSource - creates a new node source in the RM, specifying infrastructure and policy, with related parameters
  • removeNodeSource - removes a node source from the RM
  • getSupportedInfrastructures - returns the list of supported node source infrastructure descriptors
  • getSupportedPolicies - returns the list of supported node source policy descriptors

Nodes

  • isNodeAvailable - tests if a node is registered to the RM
  • lockNode - prevents other users from using a set of locked nodes
  • unlockNode - allows other users to use a set of nodes previously locked
  • releaseNode - releases a node, previously reserved for computation
  • addNode - adds a node to a particular node source; if no node source is specified, the node is added to the default node source of the RM
  • removeNode - removes a node from the RM
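The node operations above can be illustrated with a toy, in-memory Resource Manager. Method names follow the operation list in snake_case, but everything else is an assumption: `get_nodes` stands in for how the Scheduler obtains nodes, `mark_down` stands in for failure detection, the default node source is called `Default`, and the keys returned by `get_rm_state` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set

DEFAULT_SOURCE = "Default"  # hypothetical name of the RM's default node source


@dataclass
class ResourceManager:
    sources: Dict[str, Set[str]] = field(default_factory=lambda: {DEFAULT_SOURCE: set()})
    busy: Set[str] = field(default_factory=set)    # reserved for computations
    locked: Set[str] = field(default_factory=set)  # withheld from other users
    down: Set[str] = field(default_factory=set)    # registered but not alive

    def _all(self) -> Set[str]:
        return set().union(*self.sources.values())

    def add_node(self, url: str, source: Optional[str] = None) -> None:
        name = source or DEFAULT_SOURCE  # no source given: use the default one
        if name not in self.sources:
            raise KeyError(f"unknown node source: {name}")
        self.sources[name].add(url)

    def remove_node(self, url: str) -> None:
        for nodes in self.sources.values():
            nodes.discard(url)
        for state in (self.busy, self.locked, self.down):
            state.discard(url)

    def is_node_available(self, url: str) -> bool:
        return url in self._all()

    def lock_nodes(self, urls: Iterable[str]) -> None:
        self.locked |= set(urls) & self._all()

    def unlock_nodes(self, urls: Iterable[str]) -> None:
        self.locked -= set(urls)

    def get_nodes(self, count: int) -> List[str]:
        free = sorted(self._all() - self.busy - self.locked - self.down)
        taken = free[:count]
        self.busy.update(taken)
        return taken

    def release_node(self, url: str) -> None:
        self.busy.discard(url)

    def mark_down(self, url: str) -> None:
        if url in self._all():
            self.down.add(url)

    def get_rm_state(self) -> Dict[str, int]:
        total = self._all()
        free = total - self.busy - self.locked - self.down
        return {"free": len(free), "alive": len(total - self.down), "total": len(total)}
```

In this sketch a locked node is neither handed out nor counted as free, which is one plausible reading of "prevents other users from using a set of locked nodes".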

Scheduler Service

  • login - enables the user to access the Scheduler with his credentials
  • disconnect - disconnects the user from the Scheduler
  • isConnected - tests whether or not the user is connected to the Scheduler
  • startScheduler - starts the Scheduler
  • pauseScheduler - pauses the Scheduler
  • freezeScheduler - freezes the Scheduler
  • resumeScheduler - resumes the Scheduler
  • stopScheduler - stops the Scheduler
  • killScheduler - kills the Scheduler
  • getSchedulerStatus - returns the current Scheduler status.
  • getSchedulerStats - returns statistics about the Scheduler
  • getMySchedulerStats - returns statistics about the Scheduler usage of the current user
  • getConnectedUsers - returns users currently connected to the Scheduler
  • getJobs - returns jobs list
  • linkRM - connects the Scheduler to a given RM endpoint
  • getSchedulerVersion - returns the current versions of the REST Server API and of the Scheduler

Jobs

  • submitJob - submits a job to the Scheduler
  • killJob - kills the job execution
  • getLiveLogs - returns only the currently available logs of a job
  • getServerLogs - returns job server logs
  • pauseJob - pauses the job execution
  • resumeJob - resumes the job execution
  • getJobState - returns the job state
  • getJobsInfo - returns a subset of the Scheduler state
  • changeJobPriority - changes the priority of a job under execution
  • getJobResult - returns the job result and related logs
  • getTasks - returns the list of all the tasks belonging to the job
  • getTasksState - returns the list of the state of all tasks related to the job
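The default FIFO scheduling policy combined with job priorities and `changeJobPriority` can be sketched as a pending-job queue. The ordering rule used here (higher priority first, then submission order) is an assumption about how FIFO interacts with priorities, and the sketch only reorders pending jobs; it does not model jobs that are already running. `PriorityFifoQueue` and its methods are hypothetical names.

```python
import itertools
from typing import Dict, Optional, Tuple


class PriorityFifoQueue:
    """Pending jobs ordered by priority (highest first), then by arrival (FIFO)."""

    def __init__(self) -> None:
        self._arrivals = itertools.count()
        self._pending: Dict[str, Tuple[int, int]] = {}  # job id -> (priority, arrival)

    def submit(self, job_id: str, priority: int = 0) -> None:
        self._pending[job_id] = (priority, next(self._arrivals))

    def change_priority(self, job_id: str, priority: int) -> None:
        _, arrival = self._pending[job_id]
        self._pending[job_id] = (priority, arrival)  # keeps its original place among equals

    def next_job(self) -> Optional[str]:
        if not self._pending:
            return None
        job_id = min(self._pending, key=lambda j: (-self._pending[j][0], self._pending[j][1]))
        del self._pending[job_id]
        return job_id


if __name__ == "__main__":
    queue = PriorityFifoQueue()
    for job in ("a", "b", "c"):
        queue.submit(job)
    queue.change_priority("c", 5)
    print([queue.next_job() for _ in range(4)])  # ['c', 'a', 'b', None]
```

The linear scan in `next_job` keeps priority changes trivial; a heap would be faster for large queues but would need lazy invalidation to support `change_priority`.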

Tasks

These operations are used to manage tasks within a job:

  • killTask - kills a task within a job
  • restartTask - restarts the task
  • preemptTask - preempts a task within a job
  • getTaskResult - returns the task result
  • getTaskState - gets task state


Basic Design Principles

For a technical audience interested in the design of the Job Scheduler GE, the following basic principles should be taken into account:

  • Job Scheduler supports a RESTful interface;
  • Job Scheduler should aim to gather as heterogeneous a set of computing resources as possible;
  • Job Scheduler should be deployed on a virtual machine or physical host with a public IP address, in order to gather resources across the Internet;
  • Job Scheduler components communication must be asynchronous;
  • Job Scheduler relies on a middleware layer capable of adapting to the network layer and devices (especially firewalls);
  • Job Scheduler's RM Service, being a single point of access to resources, should be highly stable;
  • Job Scheduler needs nodes to be available in order to launch jobs, so the RM Service should be launched before the Scheduler Service;
  • Job Scheduler enables several users to share the same pool of resources and also to manage issues related to distributed environments, such as failing resources;
  • Job Scheduler should offer the possibility to aggregate nodes in different types of infrastructures, each one having a policy, and abstract each infrastructure-policy pair as a node source;
  • Job Scheduler must support the dynamic addition and removal of nodes to and from a node source;
  • Job Scheduler should maintain and monitor the list of resources;
  • Job Scheduler must supply computing nodes to users based on user criteria (e.g. a specific operating system, available resources or licenses);
  • Job Scheduler is in charge of scheduling submitted jobs in accordance with the scheduling policy;
  • Job Scheduler should have a default global dataspace for input and output data, defined by the administrator;
  • Job Scheduler should allow the user to override the default dataspace when a job is defined;
  • Cloud Users can either deploy it or register nodes with an already existing Job Scheduler;
  • Cloud Users could deploy nodes on any physical host, laptop, workstation or virtual machine.
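The dataspace principles above (an administrator-defined default, a per-job override, a read-only input space and a read-write output space) can be condensed into a small resolution function. This is a hypothetical sketch: the `file://` URIs, `AdminDefaults`, `Dataspace.check_write` and `resolve_dataspaces` are illustrative names, and actual access control is enforced by the Scheduler Service infrastructure, not by this code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataspace:
    uri: str
    writable: bool

    def check_write(self) -> None:
        if not self.writable:
            raise PermissionError(f"dataspace {self.uri} is read-only")


@dataclass(frozen=True)
class AdminDefaults:
    input_uri: str   # default global input dataspace, set by the administrator
    output_uri: str  # default global output dataspace, set by the administrator


def resolve_dataspaces(defaults: AdminDefaults,
                       job_input: Optional[str] = None,
                       job_output: Optional[str] = None) -> Tuple[Dataspace, Dataspace]:
    """A job definition may override either default; input stays read-only for nodes."""
    input_space = Dataspace(job_input or defaults.input_uri, writable=False)
    output_space = Dataspace(job_output or defaults.output_uri, writable=True)
    return input_space, output_space
```

Overriding only one of the two locations leaves the other at the administrator's default, so a user can, for instance, keep shared inputs while redirecting results to a personal location.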