Cloudian CEO On The Importance Of S3 Data Lake For AI, ML

‘What we have seen in the past year or so is that AI is making everybody question their data strategy and their cloud versus on-prem footprint. People are much more sensitive in terms of where they store their data. So essentially, what we're seeing is that enterprises are realizing that data is very fundamental to their AI journey. And at Cloudian, the data lake always sits at the foundation of everything else,’ says Cloudian CEO and Co-founder Michael Tso.

S3-compatible Storage Growth To Meet AI/ML Requirements

Cloudian Tuesday closed a $23 million round of growth financing from Morgan Stanley Expansion Capital, bringing total funding in the San Mateo, Calif.-based developer of S3-compatible object storage technology to $256 million.

For Cloudian, the new funding is not critical as the company has already passed the break-even point and indeed saw its annual recurring revenue grow by 30 percent year-over-year, said CEO and Co-founder Michael Tso.

Instead, the new infusion of funds will target growth for the company’s storage technology, which is seeing strong pull as customers look for ways to store and manage data for artificial intelligence and machine learning uses, Tso told CRN.

“Cloudian started as and still is the most [AWS] S3-compatible object storage that's out there,” he said. “We are committed to being completely compatible with Amazon and guaranteeing that anything that runs in the cloud with Amazon is going to run for you on-prem with Cloudian. That guarantee is very, very important now, because that's basically allowing you to future-proof your AI and ML journey. New tools are coming up all the time that are cloud-native. They're all born in the cloud. So they all speak S3 as their native language.”

Object storage and S3-compatible object storage have transformed from a slow backup and archive technology to a key foundation for building high-performance artificial intelligence infrastructures, Tso said.

“People think about object storage as slow, as secondary storage,” he said. “I can tell you that the fastest growing workloads on our systems are actually modern applications including AI and ML workloads. And for those workloads, people typically deploy Cloudian on all-flash gear. That is growing very strongly for us. We have customers with close to 100 petabytes each on Cloudian and running on all-flash across multiple data centers. And that is their primary data store. We provide tier-zero storage for these guys.”

There’s a lot going on at Cloudian and in the S3-compatible object storage market. Here is more from CRN’s interview with Tso.

How do you define Cloudian?

Cloudian is the leading provider of on-prem data lake solutions for AI and ML (machine learning). We provide software that runs on standard hardware that basically creates exabyte-scale data lakes on-prem for people who have latency and data sovereignty needs.

What are customers using those data lakes for?

There’s a huge number of different use cases. One of our clients has 3,000 internal applications running on top of data that they have with Cloudian. It runs the gamut from traditional back-end backup and archive and ransomware protection to things like file sharing and research data, and all the way to AI and ML. What we have seen in the past year or so is that AI is making everybody question their data strategy and their cloud versus on-prem footprint. People are much more sensitive in terms of where they store their data. So essentially, what we're seeing is that enterprises are realizing that data is very fundamental to their AI journey. And at Cloudian, the data lake always sits at the foundation of everything else.

You seem to say Cloudian already uses AI and machine learning as part of its technology. How does the company use AI?

We have been very active. We of course use AI for our internal operations and for our products. But more importantly, we announced connectivity to [Meta’s] PyTorch earlier this year. We already worked with TensorFlow. The piece which is really important is that Cloudian started as and still is the most [AWS] S3-compatible object storage that's out there. We are committed to being completely compatible with Amazon and guaranteeing that anything that runs in the cloud with Amazon is going to run for you on-prem with Cloudian. That guarantee is very, very important now, because that's basically allowing you to future-proof your AI and ML journey. New tools are coming up all the time that are cloud-native. They're all born in the cloud. So they all speak S3 as their native language. By being the most compatible cloud out there and having a full commitment to stay alongside Amazon (we actually meet with them every quarter, and we align our roadmap), we are by far the most compatible out there.

We are compatible with everything that's out there, the machine learning and all the AI pipeline tools including [Apache] Kafka, Spark, and Neutrino. All those different tool sets and data analytics tools that prepare data for deep learning support S3. So if your data is in a Cloudian S3 data lake, you never have to move that data. And that is really important because we all know that moving large amounts of data is very costly. And also for AI and ML, data doesn't cache well because you're scanning over the database only once so hierarchical storage doesn't really work that well. So you want to have a data lake where you have everything that integrates well with your entire pipeline and all the workflow you have going on.

Given that S3 is a standard that a lot of storage vendors say they're compatible with, and it is a standard published by Amazon, how can one company be more compatible than the others?

A couple points. One is that S3 is not like NFS (Network File System) where there’s a fairly small setup. NFS itself is not simple. But compared to S3, NFS is actually very simple. S3 has a very large set of verbs and error codes for API behavior. Number two, S3 is an evolving standard, with new functions coming up basically every quarter, and new API calls. And it has needed that evolution because essentially it was created to handle very large volumes of data in the modern era.

When we started 12 years ago, we were the first company, the only company, to build what we call S3-native, which means that we implemented S3 as our only protocol. And there are no other APIs. That means we can clone it down to the minute details, every error code, every behavior. And that's why we're able to offer that guarantee. When we started, there were not a lot of other protocols. So to kind of keep their bets safe, a lot of players built some underlying storage layer with different protocol engines, different gateways, on top that translated whatever protocol into their underlying system. Now that kind of approach, by definition, is never going to be 100-percent compatible, because if you're translating from one thing to another, you're not able to translate everything if your underlying thing does not support that error code. So for a long time, and even today, most vendors that claim they're compatible are not doing it the way that we're doing. So when you take any random workload that you have created that runs in the cloud and try to run it on prem on their system, it won’t work because all the details really matter. We're the only ones that started that as our main company mission, and we have continued with that.

It's great that everybody says they're compatible. But when we go head-to-head against anybody, and the customers run a POC (proof of concept), we will beat 100 percent of those because we are by far the most compatible. Not only do we support the greatest number of S3 functions, we also stay up to date. Any error code, any behavior that Amazon has we have, too. So we are actually the only one that right now stands alongside Amazon's hybrid edge solution. We are the only one in the marketplace qualified to sell with Amazon Outposts and also Amazon Local Zones, which is their main hybrid edge space.
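Tso's point about translation layers can be made concrete with a small Python sketch. It is a toy illustration, not a description of how Cloudian or any other vendor implements S3: the stores, the backend status name `NOT_FOUND`, and the functions `native_get`, `backend_read`, and `gateway_get` are all hypothetical. Only `NoSuchBucket` and `NoSuchKey` are real S3 error codes. The assumption is a gateway sitting on top of a path-based backend that reports one generic "not found" status.

```python
# Toy illustration only. Everything here is hypothetical except the two
# S3 error code names, NoSuchBucket and NoSuchKey.

# A hypothetical S3-native store: it tracks buckets and keys separately,
# so it can tell a missing bucket apart from a missing key.
NATIVE_STORE = {"photos": {"cat.jpg"}}


def native_get(bucket: str, key: str) -> str:
    """Return the status an S3-native store could report for a GET."""
    if bucket not in NATIVE_STORE:
        return "NoSuchBucket"
    if key not in NATIVE_STORE[bucket]:
        return "NoSuchKey"
    return "OK"


# A hypothetical path-based backend: it only knows flat paths and has a
# single generic failure status.
BACKEND_PATHS = {"/photos/cat.jpg"}


def backend_read(path: str) -> str:
    """Return the backend's own (coarser) status for a read."""
    return "OK" if path in BACKEND_PATHS else "NOT_FOUND"


# The gateway has to map each backend status to exactly one S3 code.
# NOT_FOUND carries no bucket-versus-key information, so one guess is made.
GATEWAY_ERROR_MAP = {"OK": "OK", "NOT_FOUND": "NoSuchKey"}


def gateway_get(bucket: str, key: str) -> str:
    """Translate an S3-style GET into a backend read and map the status."""
    return GATEWAY_ERROR_MAP[backend_read(f"/{bucket}/{key}")]


if __name__ == "__main__":
    requests = [("photos", "cat.jpg"), ("photos", "dog.jpg"), ("videos", "a.mp4")]
    for bucket, key in requests:
        print(bucket, key, native_get(bucket, key), gateway_get(bucket, key))
```

In this sketch the two implementations agree on an existing object and on a missing key, but diverge when the bucket itself is missing. A client written to create the bucket when it sees `NoSuchBucket` would behave differently against the gateway, which is the kind of detail-level mismatch the answer above describes.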

Cloudian this week unveiled a new $23 million round of financing from Morgan Stanley. What does that bring total Cloudian funding to?

It’s $256 million, including this new round.

Was Morgan Stanley the only participant in this funding?

We've reached a point in life where we have achieved breakeven. As always, when you don't need the money, that’s when the money starts to follow you. It's growth capital, money that we can put towards growth when we see the right opportunities. So we don't need a massive amount of investment. The funding environment is still not easy out there. But Morgan Stanley is a wonderful name, and it's a great long-term partner. They've actually been talking with us for about three years now as a potential financial partner.

With this new funding, what is Cloudian’s total valuation?

We are private. We have not disclosed valuation, and will continue not to. There are a lot of companies out there that are kind of screaming their valuation. And we're generally against that because one reason you're private is because you don't have the same kind of rigor in the way that you do your reporting. And some of these companies have to restate things when they file for their IPO. I just don't think there's really any value in screaming a big number because that means new employees at that point don't have enough upside. So we have not traditionally shared the valuation numbers.

You said Cloudian has already broken even financially, right? Any plans for an IPO?

Yes. I can't give you any specific plans right now, but it's encouraging from our standpoint to see other tech companies going public and the market being extremely receptive. So we don't have any specific dates right now.

Does Cloudian sell through indirect channels, direct channels, or both?

We are 100 percent channel focused. We do 100 percent of our deals through channels. There are some exceptions, but those are generally based on customer policies. If there's any way possible, even if it’s a deal we sourced and worked all the way, we generally try to find a partner to actually run it through. The way that we built the company is, we really want to be a very good technology provider, and we want to leverage the channel as much as we can. So it’s a pure channel play from our sales strategies.

What kinds of services can channel partners take advantage of?

It really varies, of course. It's a whole spectrum. There are partners who do nothing other than transact on their paper, and there are those that go all the way to providing the first line of support. A lot of our partners provide professional services or anything their customers need. I would like to build a product that is fairly straightforward, so people don't need a lot of professional services. In general, when a customer sets up a system, initially they usually only need maybe a half-day, a few hours, of remote help just to watch over the install. It’s pretty straightforward. Everybody can actually do it on their own. But we also enable our partners to do that because our philosophy is, the more money our partners can make, and the more sort of interactions they can have with a user, the more it actually benefits them, and then the more it is going to benefit us. We have about 1,000 customers now. And we're not going to go out there and talk to every single customer every day, right? But I have hundreds of partners out there and all the customers work through a partner, so the partner can go and talk to them and see what they need. And if the customer needs some help with something, the partner should be able to do that professional services work for them, so they don't have to wait for us.

Has Cloudian made any acquisitions?

We have. A few years ago, we acquired an Italian file company whose technology we launched as HyperFile, and which is now called HyperStore File Services. And we continuously look for technologies that are complementary to us, with cultures that are similar to ours, and where we would be able to grow faster together.

What are some of your strategic focuses for the rest of 2024?

There are things I could talk about, and other things I cannot talk about. The overall focus of the company is really on profitable growth. Our ARR (annual recurring revenue) is growing at 30 percent a year. And we would like to continue focusing on our strategic partnerships with Amazon, Lenovo, and HPE. They all resell our products. That is, I guess, a channel-related focus because a lot of our transactions end up going through an HPE, Amazon, or Lenovo partner as well. It's a very scalable model.

And then in terms of just product, we are announcing more and more integrations with different AI tool sets. I can't talk about everything else that's going on, but we have some exciting news that will come out a little later this year. So we are continuing on our journey of not only being the most compatible, but also offering the highest performance, which is kind of unique. People think about object storage as slow, as secondary storage. I can tell you that the fastest growing workloads on our systems are actually modern applications including AI and ML workloads. And for those workloads, people typically deploy Cloudian on all-flash gear. That is growing very strongly for us. We have customers with close to 100 petabytes each on Cloudian and running on all-flash across multiple data centers. And that is their primary data store. We provide tier-zero storage for these guys. And typically, it's in the banking sector, transaction processing, fraud detection, that kind of stuff.

So we're tuning our product to be able to perform better in those kinds of environments. And according to our customers, we are equal to or better than a lot of the guys out there selling all-flash-based hardware solutions. They do a lot of tuning for their hardware. We are software-based, so we don't try to tune for a particular platform, but we tune for media. We could be running on an Intel box, a Supermicro box, an HPE box, a Lenovo box, and if the media is faster, our software runs faster. That's really key for us. That allows us to work right away as new flash comes out. And as new servers come out, things are always faster, always cheaper, and we're able to immediately take advantage of that as opposed to a more traditional approach where you have some hardware and you choose your software for the hardware, but then you're kind of stuck on that hardware for whatever time.

You mentioned Amazon, HPE, and Lenovo as strategic partners. Do any of those three companies have equity stakes in Cloudian?

One of them does. I'm not sure if we've announced anything.