FIWARE.ArchitectureDescription.Cloud.ObjectStorage - FIWARE Forge Wiki

FIWARE.ArchitectureDescription.Cloud.ObjectStorage

From FIWARE Forge Wiki

Copyright

The FIWARE Cloud Object Storage Service Generic Enabler Specification is Copyright © 2015-2016 IBM and INTEL. Please note that this specification adopts the OpenStack Object Storage specification, which is published by and copyright OpenStack Foundation.

Legal Notice

Please check the following Legal Notice to understand the rights to use these specifications.

Overview

Object Storage is one of the Generic Enablers within FIWARE. It offers persistent storage for digital objects, an important cloud-based capability that Use Cases have specifically requested. Objects can be files, databases or other datasets which need to be archived. Objects are stored in named locations known as containers.

Containers and objects can have metadata associated with them, providing details of what the data represents. As with files in a traditional filesystem, objects in an object store belong to a particular user (account).

Rather than develop an entirely new interface, this Generic Enabler is based on OpenStack Object Storage (Swift).

The users of the Object Storage Generic Enabler include both FIWARE Cloud Instance Providers and FIWARE Cloud Instance Users.

  • Provider usage: Cloud Instance Providers can both provide Object Storage as a service to Cloud Instance Users and consume the Object Storage service themselves. To provide the service, Cloud Instance Providers require a system that demands as little maintenance as possible. This entails that:
    • stale data is purged,
    • deactivated accounts are removed,
    • corrupt data is replaced with a valid replica,
    • issues are escalated to an automated service that attempts to resolve them (if they cannot be resolved, the Provider is notified),
    • relevant statistics are available to support inspection of the system and of each User's utilization of it,
    • additional hardware (storage capacity) can be added to the system without any drop in service, allowing the storage capacity to grow over time.

When consuming the service themselves, Cloud Instance Providers will want to store data such as monitoring, reporting and auditing data to support their offering. This data can be made available to Cloud Instance Users depending on requirements. The Object Storage service can also be used as a virtual machine staging area: a Cloud Instance User may upload a custom virtual machine to the Object Store, from where the Provider makes it available.

  • User usage: The User will use the object storage service to distribute static content rather than incur the additional load of serving static content from an application. Taking this approach allows the Provider to optimize the distribution of those files. The Provider can also use this as a building block to offer further content distribution network capabilities. The User could also use the object storage service to supply a customized virtual machine that only they have access to (the storage is isolated by user). This would operate in much the same way that customized virtual machine images are supplied on services like Amazon EC2.

Basic Concepts

Basic Object Storage

Implementations of the Object Storage Generic Enabler (GE) should provide a highly available, distributed and eventually consistent object store. The object store is a collection of objects that are structured in a simple hierarchy. The object store presents itself as a service with multi-tenant capabilities, so that it can be offered to many users and organizations while keeping their data safely partitioned. The object storage service does not have a traditional POSIX-type file system, and because its hierarchy is so simple it has little notion of true directory semantics. Notably, the Object Storage GE adopts the OpenStack Object Storage (Swift) specification.

The key abstract entities identified that need to be considered by this GE are:

  • Object: opaque piece of data with associated meta-data.
  • Container: collection of objects with associated meta-data.
  • Policy: meta-data associated with an object or container that dictates the use of the data by the object storage service provider. Policies will be expressed through the meta-data facilities.
  • Account: a collection of containers assigned to a user.
  • User: the actor accessing and managing the above entities through the GE’s API. At a minimum the actors of end-user (human or external service) and administrator are considered. The security information related to User is managed by the Identity Management GE.

Access to and management of Object Storage entities is performed through the defined API. Because standardized and open interfaces are an important aspect of the specification of all GEs, the OpenStack Object Storage (Swift) API has been adopted as the API specification for the Object Storage GE.
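
The relationship between the entities above can be sketched as a small in-memory model. This is only an illustration of the account/container/object hierarchy, not the Swift implementation; the class names, the `put_object` and `get_object` helpers, and the `retention` policy key are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredObject:
    """Opaque piece of data with associated meta-data."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Container:
    """Collection of objects with associated meta-data."""
    metadata: Dict[str, str] = field(default_factory=dict)
    objects: Dict[str, StoredObject] = field(default_factory=dict)


@dataclass
class Account:
    """Collection of containers assigned to a user."""
    containers: Dict[str, Container] = field(default_factory=dict)

    def put_object(self, container: str, name: str, data: bytes,
                   **metadata: str) -> None:
        # Containers are created on demand here for brevity only.
        target = self.containers.setdefault(container, Container())
        target.objects[name] = StoredObject(data, dict(metadata))

    def get_object(self, container: str, name: str) -> StoredObject:
        return self.containers[container].objects[name]


account = Account()
# "retention" is a hypothetical policy expressed through meta-data.
account.put_object("backups", "2016/db.dump", b"\x00\x01", retention="30d")
obj = account.get_object("backups", "2016/db.dump")
print(obj.metadata)
```

Note that the object name `2016/db.dump` is a single flat key inside the container: the slash is part of the name, which reflects the absence of true directory semantics.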

Storlets Overview

In this section we describe an extension to Swift called "storlets", which allows a user to run code snippets in the object store.

Storlets are now an OpenStack project. For a more complete description of this technology, see the Storlet documentation.


The Storlet Engine

The Storlet Engine is an advanced Swift capability that co-locates user-defined computation with the storage. The engine allows running computations inside Swift's proxy and storage nodes. The data processing can take place as data flows into or out of Swift, or on data that already resides in Swift's storage nodes.

The executed code, called a storlet, is user-defined and is isolated from the Swift system using Docker Linux containers. In addition, the user controls the Docker image in which the storlet is executed.

The Storlet Engine is multi-tenant in the sense that each account has its own set of Storlets and Docker images.

Storlets and Swift Multi-tenancy

Swift multi-tenancy means that different tenants get different Swift accounts. From the point of view of Swift's authentication and authorization, each account has a different set of users, whose access can be controlled at the account level and at the container level. This behavior depends on the authentication middleware in use, but most middleware can be configured to behave this way.

Storlets are aligned with this multi-tenancy in the sense that an account is the scope in which storlets work:

  • Each account has a different set of storlets that can be executed on data in that account.
  • Each account has a Docker image, where the account's storlets are executed.
  • Each account can be enabled for storlets independently from other accounts.
    • An attempt to send a storlet-related request to an account that is not enabled for storlets will result in an error (400 Bad Request).
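
The account scoping described above can be illustrated with a small registry. This is a sketch under stated assumptions, not the Storlet Engine's code: `StorletScope`, its methods and the account names are hypothetical, and the 404 returned for a storlet that is not deployed in an enabled account is an assumption of the sketch (the text above only specifies 400 for accounts that are not enabled).

```python
from typing import Callable, Dict, Tuple

Storlet = Callable[[bytes], bytes]


class StorletScope:
    """Per-account storlet sets; all names here are illustrative."""

    def __init__(self) -> None:
        self._by_account: Dict[str, Dict[str, Storlet]] = {}

    def enable(self, account: str) -> None:
        # Each account is enabled independently of the others.
        self._by_account.setdefault(account, {})

    def deploy(self, account: str, name: str, storlet: Storlet) -> None:
        if account not in self._by_account:
            raise PermissionError(f"{account} is not enabled for storlets")
        self._by_account[account][name] = storlet

    def run(self, account: str, name: str, data: bytes) -> Tuple[int, bytes]:
        storlets = self._by_account.get(account)
        if storlets is None:
            # Account not enabled for storlets: 400 Bad Request.
            return 400, b"Bad Request"
        storlet = storlets.get(name)
        if storlet is None:
            # Assumption of this sketch, not taken from the specification.
            return 404, b"Not Found"
        return 200, storlet(data)


scope = StorletScope()
scope.enable("AUTH_alice")
scope.deploy("AUTH_alice", "reverse", lambda data: data[::-1])

ok = scope.run("AUTH_alice", "reverse", b"abc")
disabled = scope.run("AUTH_bob", "reverse", b"abc")
print(ok, disabled)
```

A storlet deployed for one account is not visible to another, even after the second account is enabled, because each account keeps its own set.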

Storlets Basics

Invoking Storlets

Storlets can be executed in the following forms:

  • Execution upon GET. In this case the user gets a transformation of the object residing in the store (as opposed to the actual object). A typical use case for GET is anonymization, where the user might not have access to certain data unless it is anonymized by a storlet. Bandwidth reduction is another important use case. More specifically, we demonstrated how Apache Spark could benefit from storlets by pushing basic SQL filtering from Spark down to Swift. In the Swift cluster, a dedicated storlet performed the SQL filter function near the data, drastically reducing the network bandwidth used between the Spark cluster and Swift. Note also that media transformation was the original use case of storlets.
  • Execution upon PUT. In this case the data kept in the store is a transformation of the object uploaded by the user (as opposed to the actual uploaded data or metadata). A typical use case is metadata enrichment, where a storlet extracts format specific metadata from the uploaded data and adds it as Swift metadata.
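
The two invocation forms can be contrasted with a toy in-memory store. This is a conceptual sketch only: the `put` and `get` helpers, the storlet signature, and the two example storlets (`anonymize` for GET, `enrich_csv` for PUT) are assumptions made for illustration and do not reflect the Storlet Engine's actual interfaces.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Metadata = Dict[str, str]
Storlet = Callable[[bytes, Metadata], Tuple[bytes, Metadata]]

store: Dict[str, Tuple[bytes, Metadata]] = {}


def anonymize(data: bytes, metadata: Metadata) -> Tuple[bytes, Metadata]:
    """GET-style storlet: mask anything that looks like an e-mail address."""
    masked = re.sub(rb"[\w.+-]+@[\w.-]+", b"<redacted>", data)
    return masked, metadata


def enrich_csv(data: bytes, metadata: Metadata) -> Tuple[bytes, Metadata]:
    """PUT-style storlet: record the CSV column count as meta-data."""
    header = data.split(b"\n", 1)[0]
    columns = header.count(b",") + 1
    return data, {**metadata, "columns": str(columns)}


def put(name: str, data: bytes, storlet: Optional[Storlet] = None) -> None:
    metadata: Metadata = {}
    if storlet is not None:
        # What is stored is the storlet's output, not the raw upload.
        data, metadata = storlet(data, metadata)
    store[name] = (data, metadata)


def get(name: str, storlet: Optional[Storlet] = None) -> Tuple[bytes, Metadata]:
    data, metadata = store[name]
    if storlet is not None:
        # The stored object is untouched; only the response is transformed.
        data, metadata = storlet(data, metadata)
    return data, metadata


put("users.csv", b"name,email\nann,ann@example.org\n", storlet=enrich_csv)
raw, meta = get("users.csv")
safe, _ = get("users.csv", storlet=anonymize)
print(meta, safe)
```

In the PUT case the enrichment is persisted with the object, while in the GET case the anonymized view exists only in the response and the stored bytes are unchanged.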

The Storlets 'store'

When a Swift account is enabled for storlets, several storlet-related containers are created, accessible exclusively by the account manager. These containers store the account's Docker images, the storlets themselves, and the logs emitted when the storlets are executed.

Storlets Dependencies

While we think of storlets as small pieces of code, they might be dependent on some 'heavier' software stack. We provide mechanisms to make this dependency available for the storlet.

Generic Architecture

The overall internal architecture of the Object Storage GE is taken from OpenStack Swift and is reproduced below:

The main elements in the functional block diagram are as follows:

  • Auth Node (Identity Management GE): this entity handles the privileges of the users. In some setups, this entity is the Keystone component of OpenStack.
  • Proxy Node: this entity exposes the interface of the Object Storage GE and allows users to manage their service instances. In OpenStack Swift, this entity is called the Swift proxy service.
  • Storage Nodes: these entities manage the resources associated with a user’s service instance. Here entities such as containers, meta-data, objects and policies are managed. In OpenStack Swift, these entities are implemented via the Swift object service and Swift container service.

Main Interactions

The Object Store has interactions with users, Identity Management, and between its internal components. Users must use the Swift API to manipulate containers, objects, and meta-data. The Swift Proxy uses the interfaces of the Identity Management (Keystone) to confirm authorization of user operations.

Swift API

The OpenStack Swift API is a RESTful interface, and all interactions are via well-known HTTP methods such as GET, PUT and DELETE. For complete details on how a client interacts with the OpenStack Swift interface, please refer to the OpenStack Swift API documentation.
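
As a rough illustration of the REST style, the sketch below builds (but does not send) requests for uploading, downloading and deleting an object. It assumes the usual Swift conventions of an `/v1/{account}/{container}/{object}` path, an `X-Auth-Token` header and `X-Object-Meta-*` headers for object meta-data; the endpoint, token, account and helper names are hypothetical placeholders, so check the OpenStack Swift API documentation before relying on any detail.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint and token; a real token is issued by the
# Identity Management GE (for example Keystone).
ENDPOINT = "https://swift.example.org/v1/AUTH_demo"
TOKEN = "example-token"


def object_url(container: str, name: str) -> str:
    # Slashes in the object name are kept; they are part of the name.
    return f"{ENDPOINT}/{quote(container, safe='')}/{quote(name)}"


def upload_request(container: str, name: str, data: bytes,
                   **metadata: str) -> Request:
    headers = {"X-Auth-Token": TOKEN}
    # Object meta-data travels as X-Object-Meta-<key> headers.
    headers.update({f"X-Object-Meta-{k}": v for k, v in metadata.items()})
    return Request(object_url(container, name), data=data,
                   headers=headers, method="PUT")


def download_request(container: str, name: str) -> Request:
    return Request(object_url(container, name),
                   headers={"X-Auth-Token": TOKEN}, method="GET")


def delete_request(container: str, name: str) -> Request:
    return Request(object_url(container, name),
                   headers={"X-Auth-Token": TOKEN}, method="DELETE")


put_req = upload_request("backups", "2016/db.dump", b"\x00\x01", Owner="ann")
get_req = download_request("backups", "2016/db.dump")
del_req = delete_request("backups", "2016/db.dump")
print(put_req.get_method(), put_req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; that step is left out because it needs a reachable Swift endpoint and a valid token.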

Storlet API

The Storlet API extends the usual HTTP methods GET and PUT. The usage of a storlet on an object is indicated by a special message header that indicates that the operation is a storlet operation. Details of the Storlet API and usage examples are available with a reference implementation at Getting started and Documentation for Storlets.
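
A storlet invocation can then be pictured as an ordinary GET or PUT with one extra header. The sketch below uses `X-Run-Storlet` as the header name, which is our understanding of the OpenStack Storlets convention and should be verified against the reference implementation; the URL, token, storlet name and the `with_storlet` helper are hypothetical. The request is only constructed, not sent.

```python
from typing import Optional
from urllib.request import Request

# Assumed header name; confirm it in the Storlets documentation.
STORLET_HEADER = "X-Run-Storlet"


def with_storlet(url: str, token: str, storlet: str,
                 method: str = "GET", data: Optional[bytes] = None) -> Request:
    """Build a GET or PUT request that asks for a storlet to be run."""
    if method not in ("GET", "PUT"):
        # The Storlet API extends only GET and PUT.
        raise ValueError(f"storlets are not invoked on {method}")
    headers = {"X-Auth-Token": token, STORLET_HEADER: storlet}
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical object URL and storlet name.
url = "https://swift.example.org/v1/AUTH_demo/photos/cat.jpg"
req = with_storlet(url, "example-token", "thumbnail-1.0.jar")
print(req.get_method(), req.header_items())
```

Without the extra header the same request is a plain Swift GET, which is what lets storlets be layered on top of the existing API.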

Basic Design Principles

OpenStack Swift is designed to be an object store that is:

  • highly available,
  • distributed,
  • eventually consistent,
  • accessible with a RESTful interface via well-known HTTP methods (GET, PUT, POST, DELETE).

For details, please see the OpenStack Swift documentation.


OpenStack Swift allows pluggable extensions to be added in a pipeline via middleware. The storlet engine is designed as Swift middleware, so that no changes are needed to the base Swift implementation in order to support storlets. In order to achieve tenant isolation, storlets from different tenants are run in different Docker containers.
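
The middleware approach can be sketched with plain WSGI, the interface Swift's pipeline is built on. This is a minimal illustration, not the storlet engine: `swift_like_app` stands in for the unmodified base store, the `X-Run-Storlet` header name is an assumption, the 400 returned for an unknown storlet is this sketch's own choice, and the storlet runs in-process here rather than in a per-tenant Docker container.

```python
from typing import Callable, Dict, Tuple

Storlet = Callable[[bytes], bytes]


def swift_like_app(environ, start_response):
    """Stand-in for the base object store; it knows nothing of storlets."""
    body = b"hello object"
    start_response("200 OK", [("Content-Length", str(len(body)))])
    return [body]


class StorletMiddleware:
    """Wraps an application and transforms responses when asked to."""

    def __init__(self, app, storlets: Dict[str, Storlet]) -> None:
        self.app = app
        self.storlets = storlets

    def __call__(self, environ, start_response):
        name = environ.get("HTTP_X_RUN_STORLET")  # assumed header name
        if name is None:
            # Not a storlet request: pass through untouched.
            return self.app(environ, start_response)
        if name not in self.storlets:
            # Error status chosen for this sketch only.
            start_response("400 Bad Request", [("Content-Length", "0")])
            return [b""]
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status

        body = b"".join(self.app(environ, capture))
        result = self.storlets[name](body)
        start_response(captured["status"],
                       [("Content-Length", str(len(result)))])
        return [result]


def call(app, headers: Dict[str, str]) -> Tuple[str, bytes]:
    """Tiny test harness that invokes a WSGI app once."""
    out = {}

    def start_response(status, response_headers, exc_info=None):
        out["status"] = status

    environ = {"REQUEST_METHOD": "GET", **headers}
    return out.setdefault("status", ""), b"".join(app(environ, start_response))


def run(app, headers: Dict[str, str]) -> Tuple[str, bytes]:
    out = {}

    def start_response(status, response_headers, exc_info=None):
        out["status"] = status

    environ = {"REQUEST_METHOD": "GET", **headers}
    body = b"".join(app(environ, start_response))
    return out["status"], body


pipeline = StorletMiddleware(swift_like_app, {"upper": bytes.upper})
plain = run(pipeline, {})
transformed = run(pipeline, {"HTTP_X_RUN_STORLET": "upper"})
print(plain, transformed)
```

The wrapped application is never modified, which mirrors the design goal stated above: storlet support is added by placing a component in the pipeline rather than by changing the base Swift implementation.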
