Chris Engelbert: Hello, everyone. Welcome back to this week's episode of Simplyblock's Cloud Commute Podcast. I know I say that every time, but this week it is true. I have a very special guest. Actually it's one of the two people that made this happen here and made me join the company. So Michael, it's a great pleasure having you.
Michael Schmidt: Thank you very much.
Chris Engelbert: And maybe just go ahead and give us a quick introduction of who you are and where you're from and what you did in the past.
Michael Schmidt: Yeah, sure, of course. So, I am, let's say, a very old guy in terms of IT, and in general not the youngest for a startup. I've been in this business for about 25 years now.
And I have done a lot of different things. I'm an Austrian citizen, so I live in Austria. I've been on the consulting side: I spent quite some time with Accenture on big projects in financial services, data migration, system migration, and transition to production.
Then I moved into my own business. I had a company in the early days of DevOps where we built a product that helped large organizations automate releases. That product was later sold and became part of Computer Associates, which itself doesn't exist anymore; Computer Associates is Broadcom now.
After that, I became a product director in the space of infrastructure software, particularly IT automation software. We had quite a large pool of customers there, ranging from Netflix to Walmart, so quite big customers in that space.
Then I decided to move back to the customer side. I had the opportunity to become CIO in a more or less large retail business, which I did, and then again in another one. But finally I decided I wanted to come back to startup life. And that's basically how I ended up founding and building Simplyblock together with Rob.
Chris Engelbert: All right, cool. Yeah, it's either IBM or Oracle or Broadcom these days, I guess. Whenever you get acquired by a big corporation, it's one of those three.
Michael Schmidt: Yeah, I guess.
Chris Engelbert: Yeah. Cool. So yeah, Simplyblock. Just tell us a little bit about it. I mean, how did you come up with the idea and what it actually is? There's probably still a lot of people that don't know.
Michael Schmidt: Yeah, I mean, we went through a thought process over the last several years, and quite a few changes came out of it. Initially, I started off thinking that the times of these monolithic, hardware-defined storage systems are, let's say, a little bit outdated. It's a very mature, very slow-moving market. And while on the one hand most enterprises are still deploying this type of system, on the other hand the technology to build the next generation of storage systems, software-defined by nature, is definitely there. All the foundational technology exists, and there are also some products out already. And this is something that will change, not only with the move into the cloud, but in general as part of a transformational process toward the software-defined data center. This is clearly a thing that will happen.
And we saw, or I saw, the opportunity to build something that is not at the very beginning of that process, but at a stage where we are still far from the mainstream. Software-defined is still not mainstream for storage; it's still the hardware-defined systems that are out there. But at the same time, we are far enough down the road to have all the foundational technology ready to get started. So that's what we saw, or what I saw, at the time.
And more specifically, both as a CIO and later when I built a small cloud provider myself, I was confronted with the question of how to get reasonably reliable storage at good performance for a good cost point. Because at the end of the day, yes, storage must be very reliable; there's no question that's the number one criterion for any type of data storage system.
But on the other side, if the costs are high, that cuts directly into any business's bottom line, into net profit. All of this foundational IT infrastructure technology, as we know, goes straight into CAPEX or OPEX, and any dollar you can save there is a dollar more of profit for your business. So that's why it's important to look into that aspect of it as well.
Chris Engelbert: Right. I think everyone loves the word increasing profit. That's always good. You mentioned software defined storage. And I guess a lot of people might be familiar these days with software defined networking, software defined whatever.
So software defined storage is basically rethinking of how storage systems like SAN, like the storage area networks worked in the past, right? Is that a correct, like, idea of how to think about that?
Michael Schmidt: On the one side it's that, but on the other side, the power of software is so much bigger; you can do so many more things. I would say there are two main aspects to it.
One aspect is really lifecycle management. You become independent of hardware, which means you can keep the same system, the same interfaces, and the same deployed storage while changing the hardware under the hood from generation to generation. That's one of the big benefits, and not only from generation to generation, but also from vendor to vendor. We all know there can be shortages of particular hardware components from particular vendors, longer delivery times, and maybe even end-of-life that comes too early for an organization because they just don't have the resources to migrate right now.
So all of that goes away, in fact, with software defined storage. And that's what I see as the one really main benefit of it. Clearly, this independence of the hardware, this independence of the lifecycle management of the hardware.
The other big advantage is clearly scalability. A software-defined system can be built in a tremendously scalable manner, which is hard in hardware, because in hardware you always have one or several components, typically the controllers, that limit your total bandwidth, your total IOPS, your total performance output of the system.
Yes, you can add more of those, but then the question is how they cooperate. So there is a natural limit to all of those configurations, even if they are clustered on the hardware side, and only software can overcome that limit.
So I think those are the two main things. Other things include more flexibility when it comes to cybersecurity protection, and faster upgrade cycles; you can obviously upgrade software much faster than hardware or firmware. All of that is also a big advantage functionality-wise.
Chris Engelbert: All right. That makes a lot of sense. So when you think back, like, why did you come up with the idea of Simplyblock? Like saying, okay, we need to rebuild or we need to build something in the sense of storage, that is different. And what was like the main point where you said, okay, that needs to be solved in a different way. It doesn't make sense to do this in hardware.
Michael Schmidt: Yeah. So, for the reasons I've mentioned, I was looking for a software-defined alternative to a SAN system, or to several SAN systems that I had to purchase, but it was hard to find one.
There are software-defined alternatives out there, and they are quite scalable; Ceph is probably the most prominent one. But they come with two big downsides. One downside is performance, and by that I mean baseline performance: performance per terabyte, mainly IOPS per terabyte, and also access latency.
SAN systems have very low access latency nowadays, definitely below one millisecond, and that's just not something you'll get with a Ceph system. You need a lot of general-purpose hardware, x86 or ARM, to build a Ceph system that is reasonably performant, but you will never reach the performance density of a SAN system. So that's one thing.
The other thing is, it's not about total performance only, it's about SLAs. With Ceph, and generally with the other software-defined systems, it's very hard to get an SLA. What do I mean by this? An SLA states, for example, how many IOPS per terabyte I can really get from the system.
What's my maximum access latency that I can expect from my system? What's my throughput? These are guarantees that are maybe even more important than the total output: as a customer, I need to rely on the system giving me at least this amount of performance for my workloads, for my use cases.
And that's what the hardware-defined systems do very well, but that's where the software-defined systems are still quite weak as of today, because you need to define a very clear setup and specify your hardware parameters very precisely. What are your CPU, network, and PCIe bus resources? All of that needs to be defined very well so that you can measure and expose an SLA from a system perspective.
And that's where I thought: look, Ceph has great foundational technology. I really like their distributed placement algorithm; it's very resilient, reliable, and really scalable. But let's think of something that has a much higher performance density on the one side and, on the other side, gives us the opportunity to put these SLAs in place. That was my thinking at the time. But then later on, this evolved, because clearly, as we were told in the early stages, the elephant in the room is the cloud. What's going to happen in IT in general? We still see low cloud adoption in some areas, but at the same time the growth rates are clear and not disputed, and the question is how much will still be on-prem in five or ten years.
There can be different arguments about that, but in any case, having something that works for the cloud and in the cloud is a strategy that gives you more potential to be successful. And that's where we, let's say, not really pivoted: the base technology is still the same, the thinking is the same, the algorithms are the same. But how to deploy it, how to make it native to a cloud environment, and how to think cloud-first, that's where our journey shifted a little bit over the last two years, I would say.
Chris Engelbert: Yeah, I think that is a perfect bridge, because the next question would be: with Simplyblock, as you said, we want to make it a first-class citizen of the cloud. So we looked at the storage services that are already available there. There are multiple: object storage, block storage, and local storage, which is some kind of PCI-Express-attached NVMe, however they implement that.
But we said none of those services in itself is a good solution. So we did something special. Maybe you want to say a few words about that.
Michael Schmidt: Sure, absolutely. I would say what we learned, and we learned it early on, which is the good news because we didn't have to redo our architecture, is that the cloud is different. We were thinking that from more or less the beginning.
You can't just build a software-defined SAN system and deploy it into the cloud; that's not how it works. On the one side, the cloud has some deficiencies in terms of storage that are already solved in on-premises high-end SAN systems, or storage appliances in general. On the other side, the thinking is entirely different, because users are used to consuming storage as a service in the cloud.
Obviously, that's why they are in the cloud: they want to consume it as a service. They don't want to deal with a million parameters and tuning options, and they don't require most of them anyway. So that's really different. On the other side, you have quite good services there already, and the question is not so much how to rebuild those services from scratch, but how to orchestrate them in a way that overcomes the deficiencies we currently see in the cloud.
There are a couple of them that really struck me when you think about it in a bit more detail. Take thin provisioning, for example. Thin provisioning is bread and butter for any user of an enterprise storage system; it's what they have been using for the last 20 years, and you can't think of storage without it.
It's obvious: everything is thin provisioned. You have a pool of storage, and then you have your storage consumers, your hosts, your containers, whatever. They can ask for, or reserve, a certain amount of storage, but the storage is only really taken from the pool as it is actually utilized. And this gives you a good opportunity to overprovision, because many of those consumers will in the end not use what they asked for in the beginning, but only a portion of it, while some will use more. That's really the huge benefit.
And that goes away in the cloud, at least to a certain extent, particularly when we talk about block storage. With block storage in the cloud, you don't have this pool that you manage; you have to provision each volume individually, and you have to think in advance about how big that volume should be, because you're paying for it.
You're paying not for the actual utilization, but for the provisioned amount. And that can really become a problem over time, because this kind of planning is very hard to do, and very often impossible; it's just not predictable at the volume level how much storage you will need.
So that's really one of the things that struck us: how big the inefficiencies are around this missing option of thin provisioning. That's one thing.
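The difference between pay-for-provisioned and thin provisioning can be sketched in a few lines. This is an illustrative model, not Simplyblock's implementation, and all the sizes and the price are hypothetical numbers:

```python
# Illustrative sketch of thin provisioning: provisioning a volume reserves
# nothing physically; capacity is taken from the pool only as data is written.
# Cloud block storage, by contrast, bills the full provisioned size.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = {}            # name -> [provisioned_gb, used_gb]

    def create_volume(self, name, provisioned_gb):
        self.volumes[name] = [provisioned_gb, 0]   # promise only, no physical use

    def write(self, name, gb):
        prov, used = self.volumes[name]
        if used + gb > prov:
            raise ValueError("write exceeds provisioned size")
        if self.used_total() + gb > self.physical_gb:
            raise RuntimeError("pool out of physical capacity")
        self.volumes[name][1] = used + gb

    def used_total(self):
        return sum(used for _, used in self.volumes.values())

    def provisioned_total(self):
        return sum(prov for prov, _ in self.volumes.values())


pool = ThinPool(physical_gb=1000)
for i in range(10):
    pool.create_volume(f"vol{i}", provisioned_gb=500)   # 5000 GB promised in total
    pool.write(f"vol{i}", 60)                           # only 600 GB actually used

price_per_gb_month = 0.08   # hypothetical $/GB-month
print("pay-for-provisioned:", pool.provisioned_total() * price_per_gb_month)
print("pay-for-used:       ", pool.used_total() * price_per_gb_month)
```

With these made-up numbers, billing the provisioned 5000 GB costs roughly eight times more than billing the 600 GB actually in use, which is the overprovisioning gap Michael describes.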
Another thing is clearly storage tiering. Storage tiering exists to a certain extent in the cloud offerings. For example, Amazon S3 has intelligent tiering between different storage classes, but only within S3, within object storage. When we think about high-performance block storage in particular, there is no tiering option at all. If you take a volume, everything on that volume comes with the performance you asked for, and with the costs that relate to that performance. There is no option to tier parts of the storage dynamically or smartly into lower-performance, much cheaper pools. And that's another thing we clearly found missing.
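The tiering gap described here can be sketched with a toy recency-based policy. This is a minimal illustration, not Simplyblock's algorithm; real tiering also weighs access frequency, I/O size, and migration cost, and the 20-minute threshold is an arbitrary example:

```python
import time

COLD_AFTER_S = 20 * 60          # demote blocks untouched for 20 minutes (arbitrary)

class TieredStore:
    """Toy model: a fast (NVMe-priced) tier and a cheap (object-storage-priced) tier."""

    def __init__(self):
        self.hot = {}            # block_id -> (data, last_access_timestamp)
        self.cold = {}           # block_id -> data

    def write(self, block_id, data, now=None):
        now = time.time() if now is None else now
        self.cold.pop(block_id, None)            # a write always lands in the fast tier
        self.hot[block_id] = (data, now)

    def read(self, block_id, now=None):
        now = time.time() if now is None else now
        if block_id in self.cold:
            # Access pattern changed: promote the block back to the fast tier.
            self.hot[block_id] = (self.cold.pop(block_id), now)
        data, _ = self.hot[block_id]
        self.hot[block_id] = (data, now)         # refresh recency on every read
        return data

    def demote_cold_blocks(self, now=None):
        now = time.time() if now is None else now
        stale = [b for b, (_, t) in self.hot.items() if now - t > COLD_AFTER_S]
        for block_id in stale:
            data, _ = self.hot.pop(block_id)
            self.cold[block_id] = data           # move to the cheap tier
```

Even this toy version shows the two hard parts Chris raises later: deciding when to demote, and promoting data back when it becomes hot again.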
So that's something we incorporated very much into our philosophy. Other things are more about, for example, disaster recovery. When we look at fast block storage in the cloud, we have to differentiate a little between the providers, because they are not all the same.
GCP has a bit more; AWS has a bit less. AWS, by default, doesn't have asynchronous replication between zones. That means if you lose your block storage, your EBS GP3 volume, in one zone, your data is just gone. In the worst case, a zone-level disaster, you lose your data. And that's not acceptable for a lot of data that has to be stored persistently and reliably.
So then you need workarounds, and it becomes much more complicated to resolve that issue. That's another deficiency. Classically, outside of the cloud, you had two SAN systems in two different data centers or availability zones, with replication over dark fiber or whatever, and everything worked perfectly. This is something that somehow got lost in the cloud, or at least in the default setup of the cloud.
Then there are cybersecurity aspects. I don't want to go too deep here, but these are also things that storage vendors have spent a lot of time building into their hardware-defined systems, and that functionality is not available to the same extent in the cloud. Parts and pieces are there, but quite a few things are still missing.
And last but not least, performance. That's a problem that exists outside of the cloud as well, but not to the same extent. With these fast NVMe drives, which are in some cases hundreds or even thousands of times faster than hard drives, the performance difference is just huge: the local disk in the local server is so much faster than anything else, whether that's a SAN system outside of the cloud or cloud block storage. And the cost difference per IOPS, per performance unit, is tremendous.
So the question is: how can customers leverage that benefit without having to consume an expensive SAN system or expensive cloud block storage? That's another major part of our value proposition: by utilizing these local disks and making them reliable in a scale-out storage cluster, we can give the customer significantly more IOPS and significantly better access latencies than is possible with existing block storage solutions.
Chris Engelbert: Yeah, I think you brought up at least four very important things: tiering, asynchronous replication... I forget what the third one was that I was thinking about, but all of those are things that people tend to,
Michael Schmidt: I would say if you-
Chris Engelbert: thin provisioning, right.
Michael Schmidt: Thin provisioning, tiering, and then yeah.
Chris Engelbert: But thin provisioning is slightly different from-
So from my perspective, especially the first two are things that people typically try to solve by implementing them in their own solution, right? Everyone's building the same thing over and over again.
And in the past, when I was working for Hazelcast, one of the big caching vendors, it was the same thing. Everyone just built their own cache, and, well, don't do that. A cache isn't as simple as you might think. And it's the same here: a good tiering solution isn't just "oh, that one hasn't been used for 20 minutes, send it somewhere else."
You have to think about so many things. If the access pattern changes and data suddenly becomes hot again, you want to get it back into the fast storage. It's super complicated stuff that people, I feel, often fail to really acknowledge.
Michael Schmidt: Yeah. And then the question is, what's the benefit? Build versus buy. There are two aspects to that question every CTO or CIO has to ask themselves. Number one, what's the integration effort? The integration effort of a storage solution is close to zero, because we run on a standard NVMe interface, and that's available on every host and in every container.
So moving from one storage solution to another is, I wouldn't say zero effort, but compared to a database migration it's nothing. You have very little integration effort or integration risk in choosing a storage solution, and there is literally no lock-in. That problem really goes away, which is a clear plus for buy versus build.
And the other thing is: how complex is what you're building there? Aren't you maybe underestimating the complexity? Storage technology on its own is quite complicated, and it takes several years, at least several years, to get it right.
And I don't know anyone who has found a shortcut to do it faster. All of these systems take three to five years until they're production-ready. That's one thing to think about. You build for one feature, but then you realize you need another one, and a third one, and that's not working. In the end you have a roadmap of five to ten years to get all of that into your product, for something you could just buy from the market. And then there's the question: can you do something that is more specific to your particular technology stack?
I doubt it, because everything we do is exposed through this standard NVMe interface, and in effect it will be very hard to build a better-featured or more specific product for your own technology stack that can do more for you.
Chris Engelbert: Yeah, just for the people who don't really get the idea of an NVMe device: imagine it as a regular hard disk. It looks like any hard disk to whatever consumes it, just fast, like the drives you put in your computer at home these days. We already crossed the 20-minute mark, like, three minutes ago. God. Two more questions. If I want to get started with Simplyblock today, how would I best do that? Because I feel like I really want it now.
Michael Schmidt: Yeah, at the moment it's getting easier and easier to deploy us. Basically, all you need is a Kubernetes cluster with some workloads, currently served, if you are in AWS, from GP3 volumes or maybe local instance storage. You would install our CSI driver, which comes as a Helm chart. Then you have two options: either you provision a few instances with local disks attached, and AWS has a lot of those options out there,
and we can advise you which to use best in your setup; or, if you think that's not what you need, you can stay with GP3 and with existing instances that don't have local instance storage. After the installation of the CSI driver, you would also install a few storage nodes; that's also part of the Helm chart. The storage nodes are essentially containers running on your compute worker nodes.
And before that, actually, you install the control plane, which is two or three separate nodes at the moment. You would have to do that yourself; in the near future, you will be able to connect to our control plane, which makes the path to a pilot even faster. So basically, that's all you need. Overall, a few hours, I would say two to three hours, and you're ready to go.
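Once the CSI driver is in place, consuming the storage from Kubernetes looks like any other CSI-backed storage: a StorageClass pointing at the driver, and PersistentVolumeClaims against it. The sketch below uses placeholder names; the actual provisioner string, chart values, and parameters come from Simplyblock's install documentation:

```yaml
# Hypothetical manifests; provisioner ID and parameters are placeholders,
# not the documented Simplyblock values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: simplyblock
provisioner: csi.simplyblock.example    # placeholder provisioner ID
parameters:
  pool: pool-1                          # placeholder storage pool name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: simplyblock
  resources:
    requests:
      storage: 100Gi                    # thin-provisioned: the request is a limit,
                                        # not upfront physical consumption
```

Workloads then mount `db-data` like any other PVC; the thin provisioning and tiering discussed earlier happen below this interface.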
Chris Engelbert: All right. So you heard it here first, folks. Call us. Ask us to get started. Last question, then. What do you think-
It's basically always the last question, like what do you think is like the next big thing in storage, in database, in Kubernetes, whatever you can think of?
Michael Schmidt: Yeah, I think still when we talk about deployment models in the cloud, it's still, I think, the serverless model that will be gaining adoption.
That's what we hear a lot from customers, that they are looking in that direction. And it means entirely decoupling compute and storage, as is partly done today. I think that trend will continue with serverless, so that's certainly a very important aspect.
I also believe that more and more stateful workloads will move away from proprietary services. Databases, for example, currently running in AWS RDS, will move into Kubernetes. The Kubernetes ecosystem is really ready for that; you have a huge amount of support in the environment to automate your database operations
and make them safe, scalable, and reliable. And storage comes with that; storage is a natural extension point and a very important question in that ecosystem. So I believe that's really what we can expect to see in the next one to three years.
Chris Engelbert: All right, that makes sense. As long as you don't say storage will be solved by AI, I think we're all good. All right. Yeah cool. 26, 27 minutes. Thank you for being here. It was a pleasure. Is there anything else you want to shout out to the world and let the people know?
Michael Schmidt: Yeah, I would say we're very happy if you just approach us and let's talk about what we can do for you. I mean, that's really where we are right now. We're looking for partners. We're looking for early adopters. We're looking for people who really think next generation in storage and understand that there is a gap that can be solved. So, yeah.
Chris Engelbert: Awesome. Thank you very much. Yeah. It was a pleasure having you and doing this show slightly different than we normally do. Normally it's all about, not about us. This time it was very different. So thank you for being here and you're gonna be in San Francisco for the Storage something conference. Can you-
Michael Schmidt: SNIA, yeah, exactly. So I'll be there on-
Chris Engelbert: So in case you're in San Francisco and you go to this storage conference you should certainly join his talk. I've seen it. It's really good. All right, folks, thank you very much. You know the rule next time, next week, same place. I hope you listen in again. Thank you very much.