
Container Attached Storage and Container Storage Interface: The Building Blocks of Kubernetes Storage

“Containerized services must be stateless” was a common doctrine in the early days of containerization, which went hand-in-hand with microservices. Statelessness makes elasticity easy, but these days we containerize many kinds of services, such as databases, that cannot be stateless without losing their purpose.

Docker, initially released in 2013, brought containerized applications to the vast majority of users (outside of the Solaris and BSD world) and turned containers into a commodity. Kubernetes then eased the orchestration of complex container-based systems. Both offer data storage options, either ephemeral (temporary) or persistent.

What is Container Attached Storage (CAS)?

When containerized services need disk storage, whether ephemeral or persistent, container attached storage (CAS) provides the requested “virtual disk” to the container.

A diagram of the container attached storage concept

CAS resources are managed alongside other container resources and are directly attached to the container’s own lifecycle. That means storage resources are automatically provisioned and, potentially, deprovisioned. To achieve this, the management of container attached storage isn’t provided by the host operating system but integrated directly into the container runtime environment, i.e., systems such as Kubernetes, Docker, and others.

Since a storage resource is attached to its container, it is used neither by the host operating system nor by other containers. Decoupling storage and compute resources provides one of the building blocks of loosely coupled services, which can easily be managed by small, independent development teams.

From my perspective, there are five main principles that are important to CAS:

  1. Native: Storage resources are first-class citizens of containerized environments. They are seamlessly integrated with and fully managed by the container runtime environment.

  2. Dynamic: Storage resources are (normally) coupled to their container’s lifecycle. This enables on-demand provisioning of storage volumes whose size and performance profile are tailored to the application’s needs. The dynamic nature and automatic resource management eliminate manual handling of volumes and devices.

  3. Decoupled: Storage resources are decoupled from the underlying infrastructure, meaning that the container doesn’t know (or care) where the provided storage comes from. That makes it easy to offer different storage options, such as high-performance or highly resilient storage, to different containers. For extremely fast but ephemeral storage, even RAM disks are an option.

  4. Efficient: By eliminating the need for traditional, host-local storage, it becomes easy to optimize resource utilization using dedicated storage clusters, thin provisioning, and overcommitment. It also simplifies multi-regional backups and enables immediate re-attachment if a container needs to be rescheduled on another cluster node.

  5. Agnostic: Because storage resources and the container runtime are decoupled, the storage provider can easily be exchanged. That prevents vendor lock-in and allows multiple storage options to be used side by side, depending on the needs of specific applications. A database running in a container, for example, has very different storage requirements from a typical REST API service.

Given the five principles above, we can provide each and every container with exactly the storage it needs. Some need only ephemeral storage, i.e., temporary storage that is discarded when the container stops, while others need persistent storage that lives until the container is deleted or, in specific cases, even survives deletion to be reattached to a new container (for example, during container migration).
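To make the distinction concrete, here is a minimal sketch of a Pod that uses both kinds of storage: an emptyDir volume that lives and dies with the Pod, and a volume backed by a PersistentVolumeClaim that survives Pod restarts and rescheduling. The names (`storage-demo`, `cache`, `data`, `data-claim`) are placeholders for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: cache            # ephemeral: discarded when the Pod stops
          mountPath: /tmp/cache
        - name: data             # persistent: outlives the Pod
          mountPath: /var/lib/data
  volumes:
    - name: cache
      emptyDir: {}               # node-local scratch space
    - name: data
      persistentVolumeClaim:
        claimName: data-claim    # binds to a volume provisioned elsewhere
```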

What is Container Storage Interface (CSI)?

Like everything in Kubernetes, container attached storage functionality is provided by a set of microservices, orchestrated by Kubernetes itself, making it modular by design. These services, some provided internally and some contributed by vendors, implement the container storage interface (CSI): a well-defined interface through which any type of storage can be plugged into Kubernetes.

The container storage interface defines a standard set of functionality, some mandatory, some optional, to be implemented by the CSI drivers. Those drivers are commonly provided by the different vendors of storage systems.

A diagram of the Container Storage Interface usage

Hence, CSI drivers form the bridge between Kubernetes and the actual storage implementation, which can be physical, software-defined, or fully virtual (such as an implementation sending all data being “stored” to /dev/null).
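On the Kubernetes side, a driver announces itself and its capabilities through a `CSIDriver` object. The sketch below shows what such a registration can look like; the driver name `example.csi.vendor.com` is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com   # hypothetical driver name
spec:
  attachRequired: true           # Kubernetes performs an attach step before mount
  podInfoOnMount: false
  volumeLifecycleModes:
    - Persistent
```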

On the other hand, CSI gives vendors the option to implement their storage solution as efficiently as possible, requiring only a minimal set of provisioning and management operations. Vendors can choose how to implement storage, with the two main categories being hyperconverged (compute and storage sharing the same cluster nodes) and disaggregated (the actual storage environment fully separated from the Kubernetes workloads using it), the latter bringing a clear separation of storage and compute resources.

Just like Kubernetes, the container storage interface is developed as a collaborative effort inside the Cloud Native Computing Foundation (better known as CNCF), by a group of members from all sides of the industry, vendors and users.

The main goal of CSI is to be fully vendor neutral. In addition, it enables parallel deployment of multiple different drivers, offering a so-called storage class for each of them. This gives us, as users, the ability to choose the best storage technology for each and every container, even within the same Kubernetes cluster.
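Running multiple drivers side by side simply means defining one StorageClass per driver. A sketch with two hypothetical drivers, one tuned for performance and one for resilience (class names, provisioner names, and parameters are illustrative, as parameters are driver-specific):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-nvme
provisioner: fast.csi.example.com       # hypothetical high-performance driver
parameters:
  tier: performance                     # driver-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resilient-replicated
provisioner: resilient.csi.example.com  # hypothetical replicated driver
parameters:
  replicas: "3"                         # driver-specific parameter
reclaimPolicy: Retain
```

A workload then simply references the class it needs via `storageClassName` in its PersistentVolumeClaim.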

As mentioned before, the CSI driver interface provides a standard set of storage (or volume) operations. These include creation (provisioning), resizing, snapshotting, cloning, and deletion of volumes. The operations can be performed either directly or through Kubernetes resources such as custom resource definitions (CRDs), integrating into the consistent approach of managing container resources.
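Snapshotting is a good example of the CRD-based approach: instead of calling the driver directly, you create a `VolumeSnapshot` resource and Kubernetes drives the CSI operation for you. A minimal sketch, assuming a PVC named `data-claim` and a snapshot class `csi-snapclass` exist (both names are hypothetical):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snap
spec:
  volumeSnapshotClassName: csi-snapclass     # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: data-claim    # PVC to snapshot
```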

Kubernetes and Stateful Workloads

For many people, containerized workloads should be fully stateless, and in the past that was the common mantra. With the rise of orchestration platforms such as Kubernetes, however, it became increasingly common to deploy stateful workloads, often because deploying them got so much simpler. Orchestrators offer features like automatic elasticity, restarting containers after crashes, automatic migration of containers for rolling upgrades, and many more typical operational procedures. Having these built into an orchestration platform removes much of the operational burden, so people started to deploy more and more databases.

Databases aren’t the only stateful workloads, though; other applications and services may also need to store some kind of state. Sometimes this is a local cache, making use of ephemeral storage; sometimes it is more persistent, as with databases.

Benjamin Wootton (at the time working for Contino, now at Ensemble) wrote a great blog post about the difference between stateless and stateful containers, and why the latter is needed. You should go and read it, but only after this one.

Your Kubernetes Storage with Simplyblock

The container storage interface in Kubernetes serves as the bridge between Kubernetes and external storage systems. It provides a standardized and modular approach to provision and manage container attached storage resources.

By decoupling storage functionality from the Kubernetes core, CSI promotes interoperability, flexibility, and extensibility. This enables organizations to seamlessly leverage a wide range of storage solutions in their Kubernetes environments, tailoring the storage to the needs of each container individually.

With the evolving ecosystem and the shift of Kubernetes workloads toward databases and other I/O-intensive or low-latency applications, storage becomes increasingly important. Simplyblock is a distributed, disaggregated, high-performance, predictably low-latency, and resilient storage solution. It is tightly integrated with Kubernetes through a CSI driver, is available as a StorageClass, and enables storage virtualization with overcommitment, thin provisioning, NVMe over TCP access, copy-on-write snapshots, and many more features.
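Consuming such a StorageClass looks the same as with any other CSI driver. A sketch of a claim for a database volume; the class name `simplyblock` and the size are assumptions for illustration, so check the driver’s documentation for the actual class name and supported parameters.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: simplyblock   # assumed name of the installed StorageClass
  resources:
    requests:
      storage: 100Gi              # thin-provisioned; actual usage grows on demand
```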

If you want to learn more about simplyblock, read “Why Simplyblock?”

If you’re ready to buy, we believe in simple pricing!

