A short introduction to the world of Shared Block Storage Systems
Block storage is a technology used to store data externally - as opposed to internal or directly-attached server storage - on Storage Area Networks (SANs) or in cloud-based storage environments, in the form of fixed-size blocks. This form of external storage is used when speed and efficiency matter (in contrast to file storage and object storage). Unlike file storage, it is also independent of the operating system and file system used, so it is easy to replace one block storage system with another. See also https://www.ibm.com/cloud/learn/block-storage
Software Defined Storage
Software-defined storage (SDS) is a storage architecture that separates the storage software from the underlying hardware. The main advantages of this approach are the ability to use standard (commodity) server hardware for storage rather than specialized storage controllers, and to create virtual, distributed storage systems at large scale.
A SAN (Storage Area Network) system provides shared block storage to hosts (servers). It consists of:
External enclosures housing mass storage media (SSD, HDD) connected to the SAN Controllers
SAN Controllers, which may be housed in separate enclosures or together with storage media
A storage network, which connects the storage arrays to the hosts - today, three types of protocols are used over three types of transport layers: iSCSI (over TCP/IP), SCSI (over Fibre Channel and InfiniBand) and NVMe-oF (over all three)
Host Drivers and Host Bus Adapters (network interfaces to connect to the storage network)
Why is block storage in wide-spread use?
Shared block storage scales significantly better than local storage as it is not limited to the available space and connectors in a single server enclosure
Storage capacity can be pooled across the hosts that use it, which provides better utilization
Storage can be centrally managed
SAN storage is an important instrument in building application/database high-availability and scalability
Many SAN systems also implement a wide range of additional features such as storage compression, encryption and remote storage replication
Compared to shared file storage, SAN storage has a much lower access latency and higher IOPS (IO operations per second)
NVMe (now at version 2.0) is an industry standard for interfacing with SSDs: While the older SATA and SAS standards were originally developed for HDDs, NVMe was created specifically for SSDs to keep up with the enormous performance gains of NAND technology.
It uses the fast PCIe bus (version 3.0 or 4.0) that is part of most commercial server mainboards and connects peripherals directly to the CPUs and the RAM of a server.
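The bandwidth headroom that PCIe gives an NVMe drive can be made concrete with the standard per-lane rates. A minimal sketch - the per-lane figures are the approximate usable rates after 128b/130b encoding, and the x4 link width is the typical (assumed) configuration for an NVMe SSD:

```python
# Approximate usable PCIe bandwidth per lane (after 128b/130b encoding).
PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969}  # GB/s, approximate

def link_bandwidth(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

# An NVMe SSD typically attaches over an x4 link:
print(f"PCIe 3.0 x4: ~{link_bandwidth('3.0', 4):.1f} GB/s")
print(f"PCIe 4.0 x4: ~{link_bandwidth('4.0', 4):.1f} GB/s")
# For comparison, SATA 3 tops out at ~0.6 GB/s - an order of magnitude less.
```

This is one reason the older SATA/SAS interfaces became the bottleneck for fast NAND media.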
NVMe is the future storage technology: The market is growing rapidly at a CAGR of 28% and is expected to reach USD 168 bln. by 2025.
Storage performance indicators include IOPS (IO requests processed per second), access latency (the round-trip processing time of a single IO request) and sequential throughput (how many bytes per second can be read or written in a continuous stream).
These indicators are interconnected; for example, too many IO requests at a time will ultimately increase access latency, because queues build up.
Sequential throughput is important for certain workloads such as media-streaming and backups, while IOPS are more important for mixed workloads such as virtualization, application servers and databases.
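The interconnection between IOPS, latency and concurrency can be sketched with Little's Law (outstanding IOs = IOPS x latency). All numbers below are illustrative assumptions, not figures from this document:

```python
# Little's Law for storage queues: outstanding IOs = IOPS * latency.
# All numbers are illustrative assumptions, not measurements.

def iops(queue_depth: int, latency_s: float) -> float:
    """IOPS a device sustains at a given queue depth and per-IO latency."""
    return queue_depth / latency_s

# A device with 100 microseconds of per-IO latency:
latency = 100e-6

print(iops(1, latency))    # one outstanding IO at a time
print(iops(32, latency))   # 32 outstanding IOs scale IOPS linearly

# Conversely, once a device saturates (here assumed at 500,000 IOPS),
# pushing a deeper queue no longer raises IOPS - it only inflates latency:
saturated_iops = 500_000
for qd in (32, 64, 128):
    print(f"QD={qd}: latency = {qd / saturated_iops * 1e6:.0f} us")
```

This is why the queue build-up described above shows directly as rising access latency.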
Intel's SPDK (Storage Performance Development Kit) is a software development kit for NVMe and NVMe-oF (NVMe over Fabrics) storage applications.
Through its NVMe focus and its implementation in user space (bypassing the operating system's kernel IO services), SPDK provides maximum performance.
Regular SPDK performance tests indicate that it outperforms existing SAN systems in terms of IOPS per GB, without exhausting the CPU or RAM resources of the servers.
Why Data Center Storage Systems cannot catch up with dramatic performance gains and price drops of new generations of SSDs
Due to the increased adoption of SSDs, continuous NAND technology R&D and the development of the NVMe standard, the cost per performance unit (e.g. measured in K-IOPS) of the non-volatile mass storage built into notebooks, client computers and servers has been dropping dramatically over the last years.
We do not see the same trend, however, with the SAN systems used in data centers: One K-IOPS of NVMe costs as little as 1 USD if the drive is built locally into a server, but 300 USD and more if it is part of an external SAN system.
Moreover, the scalability of SAN systems has not caught up with the massive increase in performance per capacity unit (GB) of NVMe SSDs: The fastest commercially available SAN systems cost millions of dollars and reach a total performance of ca. 25,000 K-IOPS (see https://spcresults.org/), which is only about 40 times what a single NVMe drive can process.
For comparison: A few years back, when HDDs (magnetic hard disk drives) were still used as the main mass storage media, SAN systems could provide 500 times the K-IOPS of a single HDD.
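The cost and scalability gaps above can be made concrete with a back-of-the-envelope calculation using the figures quoted in the text (1 USD vs. 300 USD per K-IOPS; a top-end SAN at ca. 25,000 K-IOPS; the ~625 K-IOPS single-drive figure is implied by the "about 40 times" ratio):

```python
# Back-of-the-envelope comparison using the figures quoted in the text.

cost_local_per_kiops = 1    # USD per K-IOPS, NVMe built into a server
cost_san_per_kiops = 300    # USD per K-IOPS, same performance in a SAN

print(cost_san_per_kiops / cost_local_per_kiops)  # 300x cost premium

san_kiops = 25_000          # fastest commercial SANs (spcresults.org)
drive_kiops = 625           # implied single-drive figure (25,000 / 40)

print(san_kiops / drive_kiops)  # a SAN is only ~40x a single drive

# HDD-era SANs scaled to ~500x a single HDD, so relative scalability
# has shrunk by more than an order of magnitude:
print(500 / (san_kiops / drive_kiops))
```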
Only a few years ago, the HDD was still the dominant mass storage media in the data center. Today, there are several generations of SAN system in use:
Many SANs in the field are still based on HDD-only arrays
Hybrid Systems combining SSDs and HDDs are most common now
All-flash systems based on the older SATA and SAS standards have become more popular over the last 2-3 years
New NVMe-based all-flash systems are rare and very expensive, but on the rise
Each of these generations requires different features and a new hardware architecture, design and production technology. NVMe in particular requires a complete from-scratch redesign of the hardware, including circuit boards and microchips, followed by a full replacement of the production technology.
At the same time, most previous investments become obsolete: SAN systems based on the latest generation of disks do not benefit from most of the many complex features built over many years for previous generations of SANs.
Full system (hardware and software) R&D cycles are very expensive and take time, not least because production technologies need to follow each adoption step. Frequent adoption cycles lead to high vendor costs, which are then reflected in high prices for customers.
It is also hard to keep up with the enormous performance gains that NVMe SSDs make with each new generation of disks every year: All internal SAN components must be designed and continuously enhanced so that they do not become the new bottleneck (instead of the disks, which were the bottleneck for the last 30 years). In this regard, SAN vendors also depend on expensive third-party products such as PCIe switch microchips.
The problem is further aggravated because SAN systems are usually based on a centralized controller architecture, which means that all IO passes through a set of controllers and SAN interfaces. As more and more storage is added to the same system, the interface capacity ultimately reaches saturation and performance starts to degrade. While it took hundreds of disks to reach that limit with HDDs, today a few super-fast NVMe drives may be sufficient to saturate the controllers.
The high costs and long duration of technology cycles for hardware-based SAN systems can be overcome by Software-Defined Storage:
Frequent release cycles make it possible to always adapt to the latest generation of NVMe disks at comparably low cost.
However, performance has been the limiting factor of software-defined shared storage: IO requests have to pass through the operating system's kernel stack, requiring multiple in-memory data copies. While these data copies did not noticeably increase access latency or decrease IOPS for HDDs - simply because computer memory is orders of magnitude faster than HDDs - this is no longer true with NVMe SSDs: A single full data copy may already significantly increase access latency and decrease IOPS.
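The impact of a single extra copy can be estimated with rough numbers. Both figures below (an NVMe per-IO latency of ~20 us and a single-core copy bandwidth of ~10 GB/s) are illustrative assumptions, not measurements from this document:

```python
# Rough estimate of how one extra in-memory data copy affects IO latency.
# All numbers are illustrative assumptions, not measurements.

nvme_latency_us = 20.0   # assumed per-IO latency of an NVMe SSD, in us
memcpy_bw = 10e9         # assumed single-core copy bandwidth, bytes/s

def copy_overhead_us(io_size_bytes: int) -> float:
    """Time to copy one IO buffer once, in microseconds."""
    return io_size_bytes / memcpy_bw * 1e6

for size in (4 * 1024, 128 * 1024):
    extra = copy_overhead_us(size)
    pct = extra / nvme_latency_us * 100
    print(f"{size // 1024} KiB copy: +{extra:.1f} us (+{pct:.0f}% latency)")

# For an HDD at ~8,000 us per IO, the same 128 KiB copy is negligible:
hdd_latency_us = 8000.0
print(copy_overhead_us(128 * 1024) / hdd_latency_us * 100)
```

Under these assumptions, a 128 KiB copy adds well over half of the drive's native latency on NVMe, while it is a rounding error on an HDD - which is the asymmetry described above.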
Moreover, so-called locking mechanisms - synchronization primitives that operating system kernels require to coordinate concurrent IO processing - slow down the highly parallelized, asynchronous NVMe IO protocol, particularly as the number of parallel requests increases.
Another issue with SDS is write amplification: Due to the architecture of these systems, each IO write request may result in multiple write requests at the storage device level (sometimes up to 20 writes for a single request), massively degrading the critical write IOPS indicator.
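The effect on the application-visible write rate is simple division. A minimal sketch - the raw device figure is an assumption, while the 20x amplification factor is the worst case quoted above:

```python
# Effective write IOPS under write amplification.
# The raw device figure is an assumption; the 20x factor is quoted above.

def effective_write_iops(device_iops: float, amplification: float) -> float:
    """Application-visible write IOPS when each logical write causes
    `amplification` physical writes on the storage devices."""
    return device_iops / amplification

device_iops = 400_000  # assumed raw write IOPS of the backing devices

for wa in (1, 3, 20):
    print(f"WA={wa}: {effective_write_iops(device_iops, wa):,.0f} write IOPS")
```

At an amplification factor of 20, only a twentieth of the devices' raw write performance reaches the application.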
For software-defined storage solutions in a hyper-converged environment, another issue becomes apparent: High storage performance comes at the cost of CPU and RAM resources that would otherwise be available for other hypervisor services and virtual machines. This should not be an either-or: High storage performance must not slow down the other elements of a virtualization or cloud system - resources need to be balanced, which effectively limits the available storage performance.
With our design, we address these issues to create software-defined storage with the highest possible performance and scalability - faster than most hardware, but not at the cost of resources required by other services. This is achieved by offloading all heavy processing to separate storage nodes and by replacing kernel-space code with highly efficient user-space applications that avoid data copies and work fully lockless, based on asynchronous inter-thread communication.