Battle of Bytes: Comparing Kubernetes storage solutions

Vaibhav Rajput
Sep 22, 2020

In this article, I compare some of the better-known Kubernetes storage solutions: how to deploy and use them, a bit about their architecture, and how usable they are. I round it off with my own experience and what I’ve heard in the community about these offerings.

To test these solutions, I created a 3-node cluster of EC2 instances on AWS. Instead of many small nodes, I went with fewer, larger instances. Once the cluster was up, I deployed the solutions one by one and tested each with a Job running a dbench container. Below is the configuration I used:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench-pv-claim
spec:
  storageClassName: ______  # <--- tool-specific StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
        - name: dbench
          image: vaibhavrajput/dbench:latest
          imagePullPolicy: Always
          env:
            - name: DBENCH_MOUNTPOINT
              value: /data
            - name: FIO_SIZE
              value: 1G
          volumeMounts:
            - name: dbench-pv
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: dbench-pv
          persistentVolumeClaim:
            claimName: dbench-pv-claim
  backoffLimit: 4
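
The only line that changes from one tool to the next is storageClassName. One way to avoid editing the file by hand for every run is to stamp the class name in from the shell. This is only a sketch, not necessarily how the tests here were run: the function name render_pvc is hypothetical, and the class name you pass must match whatever kubectl get storageclass reports on your cluster.

```shell
# Hypothetical helper: print the dbench PVC manifest for a given StorageClass.
render_pvc() {
  # $1 is the StorageClass name; abort with a usage message if it is missing
  storage_class="${1:?usage: render_pvc <storage-class>}"
  cat <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench-pv-claim
spec:
  storageClassName: ${storage_class}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
EOF
}
```

The output can be piped straight into the cluster, for example render_pvc longhorn | kubectl apply -f - (here longhorn is an assumed class name, so check yours first).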

So let’s get started.

OpenEBS

First on our list is OpenEBS. It’s really easy to install and configure, so you can set it up and put it to use quickly. All it takes to install is:

kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml

Alternatively, you can install it with the Helm chart. Once the command has run successfully, confirm the installation by checking the pods with kubectl get pods -n openebs and the block devices with kubectl get blockdevices -n openebs.
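
If you would rather not poll kubectl get pods by hand, kubectl wait can block until the pods are ready. A minimal sketch, assuming a hypothetical helper name (wait_for_pods) and an arbitrary default timeout of 300s:

```shell
# Hypothetical helper: block until every pod in a namespace reports Ready.
wait_for_pods() {
  namespace="${1:?usage: wait_for_pods <namespace> [timeout]}"
  timeout="${2:-300s}"   # assumed default; tune it for your cluster
  kubectl wait --for=condition=Ready pods --all -n "$namespace" --timeout="$timeout"
}
```

Calling wait_for_pods openebs returns once all pods in the openebs namespace are Ready, or fails when the timeout expires.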

Coming to the architecture, the following diagram is taken from the docs

Let’s look at the three main cluster components.

Maya-ApiServer, or m-apiserver, exposes the OpenEBS REST APIs. It is also responsible for creating the deployment specification files required for creating the volume pods.

OpenEBS provides a dynamic provisioner, which is the standard Kubernetes external storage plugin.

NDM, or Node Device Manager, treats block devices as resources that need to be monitored and managed just like other resources such as CPU, memory and network. It is a DaemonSet that runs on each node, detects attached block devices based on the configured filters, and loads them into Kubernetes as block device custom resources.

The developers on the Slack channel are super helpful in getting you through any issue you face, so you should not have many difficulties using it.

Now coming to performance: I saw great results in the benchmarking tests, which threw me off a bit given the general word in the community. It turns out that once you start working with high load, it falls short on performance. I used the Jiva setup for this, but you also get two other options, cStor and LocalPV. Long story short, I loved the solution, but unfortunately the performance was a turn-off under heavy load.

Rancher Longhorn

Since it comes from Rancher, I was really interested in this option. Its installation is pretty straightforward:

git clone https://github.com/longhorn/longhorn
kubectl create namespace longhorn-system
helm install longhorn ./longhorn/chart/ --namespace longhorn-system

Coming to the architecture,

Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. The storage controller and replicas are themselves orchestrated using Kubernetes.

Longhorn also provides a nice dashboard, which was pretty handy and made things smoother. The performance was fairly good as well, and overall I would call it a good solution that needs a bit of refinement.

StorageOS

StorageOS is a paid tool, so I was not able to test it as extensively as the others. It also comes with a dashboard and adds a CLI on top, which was handy. I really liked how they made a Katacoda-like playground for people to get a feel for the tool.

That is the most detailed “architecture diagram” I could find in the docs.

What I can tell you is that these are the basic containers this deployment runs: an operator for creating and maintaining the StorageOS cluster, a CSI helper for registering the tool with the k8s cluster as a CSI driver, a control plane for monitoring and maintaining the state of volumes, a data plane for I/O tasks, and finally a scheduler that makes sure apps and volumes stay on the same node.

StorageOS initially performed well, but as I ramped up the load it got super slow, which was really disappointing for a paid tool.

MooseFS

MooseFS is a young project that provides a general-purpose network distributed file system. It can scale well and integrates pretty well with cloud storage. To install it for AWS EBS, just do:

git clone https://github.com/moosefs/moosefs-csi.git
cd moosefs-csi
kubectl apply -f deploy/kubernetes/moosefs-csi.yaml

Its architecture is pretty simple and has a limited number of components.

MooseFS can act as a layer on top of hybrid storage. The storage can be distributed across multiple private/public clouds. MooseFS abstracts heterogeneous storage providers and acts as a single interface.

Yet another simple tool in this list. It was easy to install and use the AWS setup with it. Overall I would say that it just gets the job done. It performed well in the benchmarking and would probably land somewhere in the middle of the list.

Portworx

The best-performing tool on this list was Portworx. It came out on top in every category of the benchmarking tool. I don’t know about other people, but I found the installation a bit tricky, so I would recommend going through the docs for that. However, if you do want to try it out for fun, you can just play here.

Portworx is full of opportunities, so instead of discussing one architecture, I’d rather refer you to some pages like the deployment architecture, use cases of stateful applications, and the well-instructed videos.

As mentioned earlier, Portworx outperformed almost all the others in this benchmark, but the catch is that it comes at a cost. Quite literally: it’s a paid tool.

Rook (Ceph)

Rook is a pretty well-known tool under CNCF incubation, but I still decided to keep it late in the list. The reason is that it is more than just a distributed storage system: it orchestrates storage systems into self-managing, self-scaling, self-healing storage services. You can use it with a variety of backends; I chose Ceph, a very versatile system offering file, block and object storage.

Installation for Rook was again pretty straightforward:

git clone --single-branch --branch v1.4.4 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
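
The commands above bring up the operator and the Ceph cluster, but the benchmark PVC also needs a StorageClass to point at. This article does not show which one was used with Rook. One likely option, assuming you are still in the examples directory of the v1.4.4 checkout, is the example block-storage manifest that ships alongside those files. The helper name below is hypothetical.

```shell
# Hypothetical helper; assumes the current directory is
# rook/cluster/examples/kubernetes/ceph from the v1.4.4 checkout.
create_rook_block_storageclass() {
  # Create the example CephBlockPool and the StorageClass that uses it
  kubectl create -f csi/rbd/storageclass.yaml || return 1
  # List StorageClasses so the new name can be copied into the PVC
  kubectl get storageclass
}
```

After running create_rook_block_storageclass, the class name printed by kubectl get storageclass is what goes into storageClassName in the dbench claim.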

If you look at the architecture,

The Rook operator automates configuration of storage components and monitors the cluster to ensure the storage remains available and healthy.

The Rook operator is a simple container that has all that is needed to bootstrap and monitor the storage cluster. The operator will start and monitor the Ceph monitor pods and the Ceph OSD daemons that provide RADOS storage, as well as start and manage other Ceph daemons. The operator manages CRDs for pools, object stores (S3/Swift), and filesystems by initializing the pods and other artefacts necessary to run the services.

Except for a few slow write operations, Rook performed pretty well, but the main reason I would recommend it is the simplicity it brings by abstracting the storage system.

Comparison

After running the job on each of the setups, I pulled the logs using kubectl logs job/dbench -f and gathered the stats. Below is the comparison on different parameters, plotted using amCharts. Lines with bold vertices represent reads, and lines without vertices represent writes.
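
If you want to repeat the comparison, it helps to keep one log file per tool. The following is a rough sketch of that step rather than the exact procedure used here: the helper name, the results directory and the 30m timeout are all assumptions.

```shell
# Hypothetical helper: wait for the dbench Job to finish, then save its logs per tool.
collect_dbench_logs() {
  tool="${1:?usage: collect_dbench_logs <tool-name>}"
  mkdir -p results
  # Block until the Job reports the Complete condition (30m is an assumed timeout)
  kubectl wait --for=condition=complete job/dbench --timeout=30m || return 1
  # Keep one log file per storage solution for later comparison
  kubectl logs job/dbench > "results/${tool}.log"
}
```

For example, collect_dbench_logs openebs leaves the benchmark output in results/openebs.log once the Job has completed.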

Random Read/Write IOPS

Read/Write Bandwidth (MiB/s)

Latency (usec)

Parting Note

I would not pick one over the others. I’d rather say that:
If you want simplicity, go for OpenEBS
If you want performance, go for Portworx (paid)
If you want to take the load off your hands, go for Rook

