One of my jobs as a Senior Principal Software Engineer is to answer questions from customers at events. At KubeCon, I know a lot of those questions are going to be about one thing: Ceph in OpenShift. Usually, those questions boil down to one word: “Why?”

Here’s what I tell them.

What makes Ceph unique — the main differentiator between it and everything else you see in the cloud-native landscape — is that it's safe. Ceph has been around a while now. After more than 15 years of active development in the community, it’s proven its reliability. Traditionally, you had hardware-based storage technologies, I tell them. Ceph abstracted all of that and made it software. Ceph is the most popular software-defined storage (SDS) backend for OpenStack, for example.

Next, I carry that history forward a bit further. Ceph, one of the first of its kind, already had a long history of adoption: from virtualization with Proxmox, to cloud with OpenStack, and today to cloud-native with Kubernetes. That made people comfortable, because more than anything, what they needed was something robust.

That’s still true. It hardly needs to be said that when you talk about storage, about your data, you’re talking about the most critical piece of your business. You don't really want to place your data in something that is just two years old, that hasn't been tested and doesn't have the track record we do. Because if you lose it, you lose the business.

Additionally, Ceph is open source and has a vibrant community, which often reassures people in their decision making.

Ceph’s reputation for stability only slightly preceded the awareness of how powerful it is. And now, it’s more and more understood that a lot of that power comes from its flexibility. Crucially, Ceph allows you to consume storage through different interfaces: object, block or file. Ceph does all three, something you rarely see in storage products.
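To make that concrete, here is a minimal sketch using Ceph's Python bindings (python3-rados, python3-rbd and python3-cephfs). It assumes a reachable cluster with a standard /etc/ceph/ceph.conf and credentials; the pool name "demo-pool", the image name "disk0" and the directory are hypothetical.

```python
# A minimal sketch of Ceph's three interfaces, assuming the
# python3-rados, python3-rbd and python3-cephfs bindings, a reachable
# cluster, and a pre-created pool named "demo-pool" (hypothetical).
import rados
import rbd
import cephfs

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("demo-pool")
try:
    # Object interface: write and read a named object directly.
    ioctx.write_full("greeting", b"hello from RADOS")
    print(ioctx.read("greeting"))

    # Block interface: carve a 1 GiB RBD image out of the same pool.
    rbd.RBD().create(ioctx, "disk0", 1024 ** 3)
finally:
    ioctx.close()

# File interface: mount CephFS (assumes a running MDS) and add a directory.
fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")
fs.mount()
fs.mkdir(b"/demo", 0o755)
fs.shutdown()

cluster.shutdown()
```

One cluster handle, one configuration file, three very different consumption models.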

When I'm talking to users, I'm often talking to people responsible for ten, 20, 30 or more petabytes of storage. They describe sometimes wildly different use cases: some missions will only use object storage, others use block storage (for virtual machines in OpenStack, for example), some need NAS-like file storage, and others use all three.

So they don’t ask me why they should be flexible. They tell me they have to be. Maybe they have a portion of their infrastructure optimized for block storage, yet also have a section where capacity matters more than performance, and want to allocate that portion of infrastructure to do only object. Ceph makes that possible, I tell them.
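As a sketch of how that split can look in practice, assuming the python3-rados bindings and admin access: a replicated pool for latency-sensitive block workloads next to an erasure-coded pool for capacity-oriented object storage. The pool names and placement-group counts are illustrative.

```python
# A sketch, assuming python3-rados and admin credentials: one cluster
# hosting a replicated pool (performance, for block) and an
# erasure-coded pool (capacity, for object). Names are illustrative.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

def mon(cmd):
    # mon_command takes a JSON-encoded monitor command.
    ret, out, err = cluster.mon_command(json.dumps(cmd), b"")
    assert ret == 0, err
    return out

# Replication: more raw space consumed, lowest latency.
mon({"prefix": "osd pool create", "pool": "block-fast", "pg_num": 64})

# Erasure coding: a better usable-to-raw ratio for capacity workloads.
mon({"prefix": "osd pool create", "pool": "object-capacity",
     "pg_num": 64, "pool_type": "erasure"})

cluster.shutdown()
```

Replication keeps latency low at the cost of raw capacity; erasure coding trades some write overhead for much better usable-to-raw efficiency.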

Next, they ask about scale. That’s easy to explain, fortunately. Ceph's ability to scale is all about the core design. The architecture is really microservices-oriented: it's not like you have a monolithic block that you just place onto all your machines. Rather, scaling out is part of Ceph's nature, because it was designed to be divided from the very beginning.

When Ceph was first designed, I explain, the big problem was that your storage might contain many things, and you actually didn't know all of what was in there. Decoupling the system into components that each do only a specific task (monitors, managers, OSDs, metadata servers, gateways) addresses that, and it's also one of the ways Ceph allows you to scale. It sometimes surprises them how small those units can be, and how many of them you can have. You don't have one component that manages everything.

And that is where it sometimes really clicks for people. They get what I mean: you don't have to pre-allocate a minimum set of resources. There's a minimum viable footprint you need, sure. But it is typically small compared to the rest of your storage requirements. Then you can add more pieces to it: the more storage the user wants, the more you add. So I tell them about the technical capabilities, and it starts to make sense to them.
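One way to see those independent pieces, again sketched with python3-rados (the exact JSON layout of the status report varies a little between Ceph releases, so treat the field names as illustrative):

```python
# A sketch using python3-rados: ask the monitors for cluster status
# and count the independently scaled services. The JSON layout varies
# slightly between Ceph releases, hence the defensive lookup below.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ret, out, err = cluster.mon_command(
    json.dumps({"prefix": "status", "format": "json"}), b"")
assert ret == 0, err
status = json.loads(out)

# Each service type scales on its own: add OSDs for capacity,
# monitors for quorum resilience, and so on.
print("monitors:", len(status["monmap"]["mons"]))
osdmap = status["osdmap"].get("osdmap", status["osdmap"])
print("osds:", osdmap["num_osds"])

cluster.shutdown()
```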

But I also have to talk about operations. There, too, flexibility is a big part of the reason for betting on Ceph.

With Ceph, you only have one storage entity to maintain, and teams only have one storage product to learn. That's cost-effective: you don't need two different products allocated to different types of drives and different types of machines to do different types of things, or two support centers to call. You can have a set of machines dedicated to a certain task, like serving objects, and another set of machines doing only block — all as part of the same cluster.
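Steering workloads to hardware like that is done with CRUSH rules. As a sketch with python3-rados: create a rule restricted to one device class, then point a pool at it. The rule name and the pool "cold-objects" (a replicated pool here; erasure-coded pools use erasure rules instead) are hypothetical.

```python
# A sketch, assuming python3-rados and admin access: steer a pool onto
# a specific device class with a CRUSH rule, so one set of machines
# (or drives) serves one workload. Names are illustrative.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

def mon(cmd):
    ret, out, err = cluster.mon_command(json.dumps(cmd), b"")
    assert ret == 0, err

# A replicated rule that only places data on HDD-class OSDs.
mon({"prefix": "osd crush rule create-replicated",
     "name": "hdd-only", "root": "default", "type": "host",
     "class": "hdd"})

# Point a hypothetical replicated pool at that rule.
mon({"prefix": "osd pool set", "pool": "cold-objects",
     "var": "crush_rule", "val": "hdd-only"})

cluster.shutdown()
```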

On top of all that goodness sits our OpenShift Container Storage operator - based on Rook, Ceph and NooBaa - which contains all the operational knowledge that administrators have accumulated for more than a decade now. All that logic lives in the operator and makes day-2 management tasks such as scaling storage, maintenance and upgrades smoother, with no manual intervention.

That, I conclude, is “why Ceph.” Developed to prioritize stability and agility, it evolved into the perfect solution for today's cloud and multi-cloud environments. Their data will be safe, and they will be able to deploy storage in the manner that fits their workloads, at the scale they need, when they need it. 

They ask me about Ceph because what they hear sounds new. And it is. But, I tell them, it’s also been the case for a very long time.