Architecture Overview

The simpleblock.io system consists of the following components:

  • a stack of user-space drivers (bypassing the operating system kernel) on each initiator host (the initiator host is the storage consumer; in a virtualization environment it is also called a compute node or hypervisor)

  • a stack of user-space drivers on each storage node

  • additional background services running on each storage node

  • a regular InfiniBand Storage Area Network

  • a web services API and command-line interface, which can run remotely on any host, and

  • a set of drivers for OpenStack, some of which can run remotely while others must be located on the compute nodes (initiator hosts)

Distributed Block Storage Controller

At its core, our product is a distributed storage controller that turns a set of servers with NVMe disks into a very fast, highly scalable and reliable shared block storage system.

For each IO request, the placement algorithm will determine the correct location of a data page in the cluster by selecting a storage node, a device and a device partition. These steps are performed by the placement driver on the initiator host.

Efficient space allocation on disks

On the storage node, the requested data then has to be read from or written to blocks of the selected partition.

It is the responsibility of the allocation driver to manage the space on a disk partition and to perform lookups and mappings.

We do this in a manner that minimizes write amplification by reducing the frequency and amount of required meta-data updates. We also ensure that read operations, and most write operations, can be performed with almost zero overhead.
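As a rough illustration of the lookup and mapping role of the allocation driver, the following minimal Python sketch maps logical pages to physical blocks of one partition and allocates a block only on the first write to a page. The class `PartitionAllocator` and its methods are hypothetical names chosen for this example; this is not the simpleblock.io driver, and it keeps the map in memory only, leaving out the persistence of meta-data that a real driver has to handle.

```python
from typing import Optional


class PartitionAllocator:
    """Toy allocate-on-first-write page map for one partition (hypothetical)."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks - 1, -1, -1))  # stack of free physical blocks
        self.page_to_block = {}                          # logical page -> physical block
        self.blocks = {}                                 # stands in for the device

    def write(self, page: int, data: bytes) -> int:
        """Write a page, allocating a block only if the page is not mapped yet."""
        block = self.page_to_block.get(page)
        if block is None:
            if not self.free:
                raise RuntimeError("partition full")
            block = self.free.pop()
            self.page_to_block[page] = block  # the only mapping (meta-data) change
        self.blocks[block] = data             # overwrite in place: map is untouched
        return block

    def read(self, page: int) -> Optional[bytes]:
        """Look up a page; pages that were never written read as None."""
        block = self.page_to_block.get(page)
        return None if block is None else self.blocks[block]
```

In this sketch a read costs one map lookup, and an overwrite of an already mapped page changes no meta-data at all, which is the kind of behaviour the text describes for reads and for most writes.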

Distributed Data Placement, Replication, Erasure Coding and Localization

We develop a highly scalable, distributed algorithm to stripe, erasure code and decode, place and localize data in accordance with reliability, storage efficiency and performance policies.

The algorithm uses a cluster topology map and does not require a directory or any other persistent data structure.

It is designed to work at nearly constant speed, independent of the size of the cluster, and to minimize data migration when the cluster is scaled or re-configured.

The algorithm also protects against data loss, ensures data integrity and supports high availability.

It keeps storage allocation evenly balanced across the cluster and deals efficiently with temporary and permanent device failures and with overloaded devices.
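To make the idea of directory-free placement concrete, here is a minimal sketch using rendezvous (highest-random-weight) hashing, a well-known technique that is swapped in purely for illustration. The source does not say which algorithm simpleblock.io uses, and the function names, the node names and the `copies` parameter are assumptions of this example.

```python
import hashlib
from typing import List


def _score(node: str, page_id: int) -> int:
    """Deterministic pseudo-random score for a (node, page) pair."""
    digest = hashlib.sha256(f"{node}:{page_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(page_id: int, nodes: List[str], copies: int = 2) -> List[str]:
    """Return the `copies` nodes with the highest score for this page.

    Only the list of nodes (a stand-in for the topology map) is needed;
    no per-page directory is consulted or stored.
    """
    ranked = sorted(nodes, key=lambda n: _score(n, page_id), reverse=True)
    return ranked[:copies]
```

For example, `place(42, ["node-a", "node-b", "node-c"])` returns the same two nodes for page 42 on every initiator host without any lookup table. Adding a fourth node relocates a page only if the new node ranks among that page's top choices, so migration stays limited. Note that this sketch scores every node on each request, so its cost grows linearly with cluster size; it illustrates the directory-free property, not the nearly constant speed described above.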

Logical Volume Manager

All IO is performed through logical volumes, which are also the main unit of administration.

We support both thinly and thickly provisioned logical volumes. Volumes may be resized after creation and can be erased. Snapshots of a volume can be taken at any time, and these snapshots can be read or restored. Writable clones may be created from snapshots; they are implemented using copy-on-write.
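The relationship between thin volumes, snapshots and writable clones can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the product's implementation: the `Volume` class and its methods are hypothetical, pages live in a dictionary, and sharing is modelled by letting a volume fall back to a read-only parent for pages it has not written itself.

```python
class Volume:
    """Toy thin volume: own page map plus an optional read-only parent (hypothetical)."""

    def __init__(self, parent=None):
        self.pages = {}       # page number -> data written to this volume
        self.parent = parent  # snapshot whose pages are shared, if any

    def write(self, page, data):
        self.pages[page] = data  # writes never modify the parent

    def read(self, page):
        vol = self
        while vol is not None:   # walk up the chain until the page is found
            if page in vol.pages:
                return vol.pages[page]
            vol = vol.parent
        return None              # never written: a thin volume stores nothing here

    def snapshot(self):
        """Freeze the current pages; later writes go to a fresh, empty map."""
        snap = Volume(self.parent)
        snap.pages = self.pages
        self.pages = {}
        self.parent = snap
        return snap

    def clone(self):
        """Create a writable clone that shares all pages with this snapshot."""
        return Volume(parent=self)
```

Taking a snapshot or creating a clone copies no data in this model; a page diverges only when one side writes to it, which is why both operations can be cheap regardless of volume size.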

Volumes and clones are tied to a particular initiator host, but can be migrated between hosts in milliseconds. All logical volume meta-data is stored in the cluster and can be accessed from all hosts.

Additional features supported by logical volumes are encryption and compression.

In addition, logical volumes may be configured with a replication or erasure coding policy.
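To show what an erasure coding policy provides compared with plain replication, the sketch below uses single-parity XOR, the simplest possible erasure code, as a stand-in. The source does not state which codes simpleblock.io supports, and the function names are assumptions of this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: all data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at `lost_index` from the surviving blocks of the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

With three data blocks and one parity block, any single lost block can be rebuilt from the other three, at a storage overhead of 4/3 instead of the factor of 2 that two-way replication needs for the same single-failure tolerance.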

Self-Healing and Automated Maintenance

Most administrative procedures are fully automated and do not require any administrator interaction. The system:

  • continues to operate without interruption when devices or storage nodes fail or are removed, retrieving requested data from alternate locations in real time

  • automatically rebuilds data from failed devices onto spares or replacements and re-integrates devices from failed storage nodes

  • performs automated data integrity checks and repairs integrity errors

  • automatically re-balances the cluster after scaling or re-configuration and when devices are overloaded

  • performs regular SMART checks of devices to verify that they function properly and raises a warning when a device approaches its maximum write endurance

Command Line and Web Interface

Our system provides a RESTful web services API and a command-line interface.

All administrative operations are performed through this interface: adding initiator hosts and storage nodes to the cluster, re-configuring storage nodes, managing logical volumes, snapshots and clones, performing diagnostic activities and reading statistics.

The interface can be used both locally and remotely and requires credentials to connect to the simpleblock.io configuration service on a particular initiator host or storage node.

Cluster-wide operations may be performed via any storage node, because our cluster-wide key-value store is replicated and resides on every storage node.

The same interface is used by the OpenStack drivers to create and configure logical volumes and to manage access to them.
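As an illustration of how a remote client might talk to such a RESTful interface with credentials, the sketch below builds (but does not send) an HTTP request using only the Python standard library. Everything specific in it is an assumption made for this example: the `/api/volumes` path, the JSON field names, the use of HTTP Basic authentication and the host name are hypothetical and are not taken from the simpleblock.io API.

```python
import base64
import json
import urllib.request


def build_create_volume_request(host, user, password, name, size_gib):
    """Build a POST request for a hypothetical volume-creation endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"name": name, "size_gib": size_gib}).encode()
    return urllib.request.Request(
        url=f"https://{host}/api/volumes",      # hypothetical path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",  # assumed authentication scheme
            "Content-Type": "application/json",
        },
    )


# Any storage node could be addressed, since cluster-wide state is available on each.
request = build_create_volume_request("node1.example", "admin", "secret", "vol1", 100)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that because the endpoint is invented.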