The Evolution of Persistent Storage with Containers – Container Storage Interface (CSI)

By Abhijith Shenoy | Containers

The container ecosystem has evolved immensely over the last few years to support production-ready workloads, and we at Hedvig believe that the evolution is not over yet. It has given rise to several container platforms for implementing modern applications and services, with a focus on solving developer problems. While each platform is built on a self-service, API-driven, programmable infrastructure, Kubernetes has emerged as the de facto standard for container orchestration, based purely on its technical pedigree.

As organizations embark on the journey to containerization, it is necessary to recognize the importance of data persistence for enterprise applications. Kubernetes excels in this aspect: it can seamlessly provide persistent storage to workloads, irrespective of whether they are scheduled on-prem or in the cloud. This ability comes from the Kubernetes Persistent Volume framework, which standardizes how persistent storage is dynamically provisioned and consumed by application pods.

Hedvig has complete integration with the Kubernetes Persistent Volume framework. The Hedvig Dynamic Provisioner, which is an out-of-tree provisioner, allows Kubernetes users to dynamically provision Hedvig virtual disks and consume them as persistent volumes using native Kubernetes constructs. A detailed walkthrough of this technical architecture can be found here and you can also watch the Tech Field Day presentation and demo of Hedvig’s Kubernetes integration here.

Although the Persistent Volume framework had a positive impact on application development (it provides a programmable interface for developing stateful applications and shortens the path from test/dev to production), it had a negative impact on persistent storage providers. Volume plugins in Kubernetes are in-tree plugins, which means that their code is packaged and shipped with the Kubernetes binaries. Any change to an in-tree plugin by a storage provider, whether a bug fix or a new feature, therefore had to align with Kubernetes release timelines. In addition, it was nearly impractical for the Kubernetes developer community to test and certify third-party plugins. Enter CSI!

Container Storage Interface (CSI)

CSI is a community-driven project with the main goal of standardizing persistent volume workflows across different container orchestrators (CO) such as Kubernetes and Mesos. With CSI, storage providers (SP) can develop, maintain and deploy plugins across different container orchestrators with no dependency on the orchestrator core code. This leads to better turnaround times for bug fixes and new features.

The CSI specification that explains the interactions between container orchestrators and storage providers can be found here.

In a nutshell, a CSI driver consists of the following components:

Node Server – This is a gRPC server that enables access to persistent volumes on a node. In a Kubernetes cluster with three worker nodes, for example, the node server should run on each of the three, since stateful applications can be scheduled on any of them.

Controller Server – This is a gRPC server that manages the lifecycle of persistent volumes (creation and deletion, among other operations). Because these operations are not tied to the node where a volume is consumed, the controller server does not need to run on every node.
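To make the split concrete, the following sketch groups the RPC names defined by the CSI v1.0.0 specification under the service that implements them. The RPC names come from the specification; the helper functions `service_for` and `replicas_needed` are hypothetical and exist only for illustration. The Identity service (GetPluginInfo, GetPluginCapabilities, Probe), which both servers expose, is left out for brevity.

```python
# Sketch: which CSI v1.0.0 RPCs belong to the controller and node services.
# The RPC names follow the CSI specification; the helpers are illustrative only.

CONTROLLER_RPCS = {
    "CreateVolume", "DeleteVolume",
    "ControllerPublishVolume", "ControllerUnpublishVolume",
    "ValidateVolumeCapabilities", "ListVolumes", "GetCapacity",
    "CreateSnapshot", "DeleteSnapshot", "ListSnapshots",
    "ControllerGetCapabilities",
}

NODE_RPCS = {
    "NodeStageVolume", "NodeUnstageVolume",
    "NodePublishVolume", "NodeUnpublishVolume",
    "NodeGetVolumeStats", "NodeGetCapabilities", "NodeGetInfo",
}


def service_for(rpc: str) -> str:
    """Return 'controller' or 'node' for a known CSI RPC name."""
    if rpc in CONTROLLER_RPCS:
        return "controller"
    if rpc in NODE_RPCS:
        return "node"
    raise ValueError(f"unknown CSI RPC: {rpc}")


def replicas_needed(rpc: str, worker_nodes: int) -> int:
    """Node RPCs need a server on every worker; controller RPCs need one."""
    return worker_nodes if service_for(rpc) == "node" else 1


if __name__ == "__main__":
    for rpc in ("CreateVolume", "NodePublishVolume"):
        print(rpc, service_for(rpc), replicas_needed(rpc, worker_nodes=3))
```

In a three-node cluster, the sketch reports that the server handling `NodePublishVolume` must run three times, while a single instance is enough for `CreateVolume`.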

In the following section, we will describe how these components are deployed for Kubernetes and how they interact with each other to seamlessly create stateful applications.

Hedvig-CSI Driver

The Hedvig-CSI Driver supports v1.0.0 of the CSI specification.

The following figure provides an overview of how Hedvig integrates with any Kubernetes cluster through the CSI driver.

  • The Hedvig-CSI Controller Server is installed as a Deployment and is responsible for dynamically provisioning CSI volumes. It is also responsible for other operations, such as attaching and snapshotting volumes, which need not be executed on the node where the volume is consumed.
  • The Hedvig-CSI Node Server is installed as a DaemonSet and is responsible for mounting and unmounting CSI volumes on the Kubernetes nodes where the volumes will be consumed by applications.
  • The Hedvig Storage Proxy is deployed as a DaemonSet and is responsible for handling I/O requests for all CSI volumes attached locally.

The following sequence of events occurs when a Kubernetes user issues a request to provision storage using the Hedvig-CSI driver. These events explain how Hedvig components interact with Kubernetes and utilize the Kubernetes constructs to let end users seamlessly manage Hedvig storage within a Kubernetes cluster.

  1. The administrator creates one or more storage classes (StorageClass) for Hedvig.
  2. The user creates a PersistentVolumeClaim by specifying the StorageClass to use and the size of the PersistentVolume requested.
  3. The Hedvig-CSI Controller Server provisions a Hedvig virtual disk on the underlying Hedvig Storage Cluster with the size requested and the attributes specified in the StorageClass.
  4. The Hedvig-CSI Controller Server then creates a PersistentVolume in Kubernetes corresponding to the newly provisioned Hedvig virtual disk. Kubernetes binds the PersistentVolumeClaim to the PersistentVolume created.
  5. The Hedvig-CSI Controller Server presents the Hedvig virtual disk as a LUN to the Hedvig Storage Proxy on the Kubernetes node where the application is scheduled.
  6. The Hedvig-CSI Node Server (running on the node where the application is scheduled) mounts the persistent volume, which is then consumed by the application.
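The ordering of these six steps can be sketched as a small in-memory simulation. This is not Hedvig or Kubernetes code: the `Cluster` class, the function names, the object names, and the parameter keys are all hypothetical. In CSI terms, steps 3, 5, and 6 roughly correspond to the CreateVolume, ControllerPublishVolume, and NodePublishVolume RPCs.

```python
# Sketch: an in-memory walk through the six provisioning steps.
# All class, function, and parameter names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Cluster:
    storage_classes: dict = field(default_factory=dict)
    claims: dict = field(default_factory=dict)
    volumes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def create_storage_class(c, name, params):
    c.storage_classes[name] = dict(params)
    c.events.append(f"1: StorageClass {name} created")


def create_claim(c, name, storage_class, size_gib):
    c.claims[name] = {"storage_class": storage_class,
                      "size_gib": size_gib, "bound_to": None}
    c.events.append(f"2: PersistentVolumeClaim {name} created")


def provision_and_bind(c, claim_name):
    claim = c.claims[claim_name]
    params = c.storage_classes[claim["storage_class"]]
    pv_name = "pv-" + claim_name
    # Step 3: the controller server provisions a virtual disk.
    c.volumes[pv_name] = {"size_gib": claim["size_gib"],
                          "attributes": dict(params),
                          "attached_to": None, "mounted_at": None}
    c.events.append(f"3: virtual disk for {pv_name} provisioned")
    # Step 4: a PersistentVolume is created and bound to the claim.
    claim["bound_to"] = pv_name
    c.events.append(f"4: {claim_name} bound to {pv_name}")
    return pv_name


def attach(c, pv_name, node):
    # Step 5: the controller server presents the disk to the node's proxy.
    c.volumes[pv_name]["attached_to"] = node
    c.events.append(f"5: {pv_name} attached to {node}")


def mount(c, pv_name, node, path):
    # Step 6: the node server mounts the volume, but only where it is attached.
    vol = c.volumes[pv_name]
    if vol["attached_to"] != node:
        raise RuntimeError("volume must be attached to the node before mounting")
    vol["mounted_at"] = (node, path)
    c.events.append(f"6: {pv_name} mounted at {path} on {node}")


cluster = Cluster()
create_storage_class(cluster, "sc-hedvig-csi",
                     {"compressed": "true", "dedupEnable": "true"})
create_claim(cluster, "pvc-demo", "sc-hedvig-csi", size_gib=10)
pv = provision_and_bind(cluster, "pvc-demo")
attach(cluster, pv, "worker-2")
mount(cluster, pv, "worker-2", "/data")
print("\n".join(cluster.events))
```

The check in `mount` reflects the division of labor described above: the node server can mount a volume only after the controller server has attached it to that same node.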

A default StorageClass for Hedvig-CSI can be created using the following specification.
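As an illustrative sketch rather than the exact Hedvig specification, the snippet below builds a StorageClass manifest in Python and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The class name `sc-hedvig-csi`, the provisioner name `io.hedvig.csi`, and the parameter keys `compressed` and `dedupEnable` are assumptions; consult the driver documentation for the real values. Only the `storage.k8s.io/v1` StorageClass structure itself is standard Kubernetes.

```python
# Sketch: a StorageClass manifest for a CSI driver, emitted as JSON.
# The provisioner name and parameter keys are assumed, not confirmed.
import json


def hedvig_storage_class(name="sc-hedvig-csi"):
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "io.hedvig.csi",  # assumed CSI driver name
        # StorageClass parameters are always strings in Kubernetes.
        "parameters": {
            "compressed": "true",   # assumed key: enable compression
            "dedupEnable": "true",  # assumed key: enable deduplication
        },
    }


if __name__ == "__main__":
    print(json.dumps(hedvig_storage_class(), indent=2))
```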

Any persistent volume created using this storage class will result in the creation of a Hedvig virtual disk with compression and deduplication enabled. In order to provision a persistent volume using the aforementioned storage class, create a persistent volume claim using the following specification.
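A claim of this kind might look like the following sketch, again built in Python and printed as JSON for `kubectl apply -f`. The claim name `pvc-hedvig-csi`, the StorageClass name `sc-hedvig-csi`, the 10Gi size, and the ReadWriteOnce access mode are assumptions chosen for illustration; the field layout is the standard Kubernetes `v1` PersistentVolumeClaim.

```python
# Sketch: a PersistentVolumeClaim that requests storage from a StorageClass.
# Names, size, and access mode are illustrative assumptions.
import json


def persistent_volume_claim(name="pvc-hedvig-csi",
                            storage_class="sc-hedvig-csi",
                            size="10Gi"):
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,  # which class provisions the volume
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(persistent_volume_claim(), indent=2))
```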

In order to consume the persistent volume, create an application pod using the aforementioned persistent volume claim. The following specification creates an Nginx application pod and mounts the persistent volume claim under “/data” within the application container.
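A pod of this shape can be sketched as follows. The pod name `nginx-hedvig-csi`, the claim name `pvc-hedvig-csi`, and the volume name `data` are assumptions; the `/data` mount path and the Nginx image follow the description above, and the structure is the standard Kubernetes `v1` Pod.

```python
# Sketch: an Nginx pod that mounts a PersistentVolumeClaim under /data.
# Pod, claim, and volume names are illustrative assumptions.
import json


def nginx_pod(name="nginx-hedvig-csi", claim_name="pvc-hedvig-csi",
              mount_path="/data"):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "nginx",
                "image": "nginx",
                # Mount the volume declared below inside the container.
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(nginx_pod(), indent=2))
```

The volume name ties the two halves together: the entry under `volumes` points at the claim, and the entry under `volumeMounts` places that volume at the chosen path in the container.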

The Hedvig-CSI driver can be found on Docker Hub.