Cohesity CEO On AI, Data Protection, Data Insights And Impacts On The Channel
Sanjay Poonen, who joined Cohesity two years ago as CEO, shed light on his company, the value of data and more at this week’s 2024 XChange Best of Breed Conference in Atlanta.
Cohesity was founded to develop technology for protecting and managing data, but more recently has gone beyond that by helping customers find value in their ever-growing stores of data.
Sanjay Poonen, who joined Cohesity two years ago, told a group of solution providers at this week’s 2024 XChange Best of Breed Conference in Atlanta, hosted by CRN parent The Channel Company, that the initial release of the ChatGPT AI platform showed the potential for AI to search all of a company’s data, including hard-to-use backup data, for insights and to turn it into data that could be used for artificial intelligence applications.
“So I went to Microsoft and I asked [CEO] Satya Nadella, a friend of mine for many, many years, if he thought we could apply OpenAI in ChatGPT to backup data,” Poonen said. “And he says, ‘Yeah, there's this technology called RAG.’ I had no idea. I went back like a kid in a toy shop and I said let's download every computer science paper on retrieval augmented generation and study it. That's kind of how startups had to work.”
That resulted in Cohesity’s Gaia technology for doing AI-powered searches, which Poonen said also led to a relationship with Nvidia, which became an investor in his company.
Poonen also used his conversation with CRN’s Jennifer Follett and Steven Burke to discuss the status of Cohesity’s planned acquisition of Veritas, a much larger data protection and management technology developer. When the deal closes, it will combine the fast growth and innovation of Cohesity with the scale of Veritas.
“The growth rate comes down a bit, but profitability is significantly better,” he said. “At Cohesity, we felt rather than go public, which we could have done on our own, let's bulk up and go from number eight to number one. Number eight combined with number three makes us number one. It puts us in a much more profitable position. We are majority U.S.-driven, but 45 percent-plus of Veritas’ business is international, and they’ve got an incredible operation in countries that would take us years to get to.”
Poonen also answered questions about the return on investment of AI and the status of AI when it comes to channel partners. His remarks offer a wide-ranging look at Cohesity, AI, the value of data and where the IT industry is going.
What is the secret sauce for Cohesity and what is the real opportunity for partners?
The partner community means a lot to me. I've built our businesses both at SAP and VMware with your help, and I hope we've reciprocated. I've always felt like the [channel] ecosystem is a force multiplier. So I want to say up front for many of you who've invested in me and our companies that we are deeply grateful, and we hope we can continue to do that.
As it relates to Cohesity, the founders of the company built a revolutionary technology about 10 years ago, in the cloud era. Mohit [Aron, Nutanix co-founder and Cohesity founder] built the first software-defined file system at Google. He took that tech to Nutanix, and the company became the first hyperconverged data protection [developer]. Then that became popular, because a year after that, Rubrik was formed with a [similar] architecture. You couldn't patent hyperconverged. And later, there was a company called Hedvig that Commvault bought. And now it has become normal. If you are building a data protection product today, you’re following Mohit Aron.
I watched his company with a tremendous amount of admiration. At VMware, we weren't in the data protection space, but we were helping Veeam, Commvault, Rubrik, Cohesity, and many of these companies. And I always thought Cohesity had the best tech. I watched Mohit. A lot has been done since then, but the basic principles of zero trust and security were built 10 years ago.
But this entire aspect of how we use AI to get value from that data, I did not see it. And for me the most remarkable aspect of the last 12 months is not just “AI washing” this capability, but finding real use cases. Think of data like an iceberg. All your primary data for 40 or 50 years, structured data, may have been in an Oracle database. Now it may be in Snowflake. Your unstructured data may have been in filers, whether it was NetApp or Isilon, or in tape. And over time, the index data starts to pile up and separate at the bottom of the iceberg.
And two things happen. First, the bottom of that iceberg starts to get unmanageable, so you needed modern architectures to approach it, hence this entire move to a hyperconverged stack, which Mohit invented. But the second thing that happened was, a bad guy decided to go after your secondary data because if all your index data and all your time-series-based data and metadata is there, I'm likely going to be able to exfiltrate it, extract it, and you will pay the ransom. So it became the most popular attack vector for security.
The part we did not all see was that this data, once it was safe, could be an incredible [source of business insights]. Most often, people buried this data in a vault or on tape. AI techniques can get insights out of this large amount of data when we complete the transaction. Veritas sits on hundreds of exabytes of protected data. We can use our platform to get the value from that data. So we're very excited about the future.
Gaia is your generative AI platform, and you're utilizing retrieval augmented generation (RAG) to bring this value that you're talking about. Unpack this technology and talk about the opportunity.
Well, go back to that picture of that iceberg, and think about all the data, hundreds of exabytes of data, that we protect. I remember playing around with ChatGPT in the early days, about January or February of last year. ... It became very clear that with this summarization capability, the company with the most amount of data would win.
Who are those companies? Office 365 has lots of emails. Databases? Oracle, Snowflake, Databricks, because they have a lot of data. We set Cohesity’s mission to protect, secure, and provide insights into the world's data. The world’s data could be our platform. Well, could we use ChatGPT to summarize all the data in backups?
Think about the following use case. You have a million PDFs in a big bank or a hedge fund or wherever. Ninety to 95 percent of them are in backups, and 10 to 15 percent are in your primary storage. What if you could write a query that says, ‘summarize all my contract terms that I've had over all my PDFs and give me a summary of all the contract terms’ or ‘summarize legal discovery.’ …
What if you were to summarize backup with ChatGPT? No one had done that before. No one. Typically, to do this, you had to rehydrate the data, take it out of the backup where it's highly compressed, put it into another filer or something like that, and then run your algorithm on that. We discovered the first way to go directly to backup data, because a lot of that data was built as an index file system years ago, but it didn't have generative AI and RAG technology at the time that Mohit started the company.
I went to Microsoft and I asked [CEO] Satya Nadella, a friend of mine for many years, if he thought we could apply OpenAI in ChatGPT to backup data. And he says, ‘Yeah, there's this technology called RAG.’ I had no idea. I went back like a kid in a toy shop and I said let's download every computer science paper on retrieval augmented generation and study it. That's kind of how startups had to work. In the early days when I was at VMware building our technology, you just studied the problem and then coded.
And that's what we did for the next six to nine months. This was last year. And then Nvidia came along. I've known Jensen [Huang, Nvidia CEO] also for many years. He was very helpful to [VMware] in the early years when they were only a $20-billion market cap company. … So Jensen was a good friend. I was talking with him, saying, ‘Hey, I want to show you what we've done with RAG technology. It's going to drive your GPUs.’ He spent 45 minutes and said he needed an investment in our company. So they put money in Cohesity.
And those two moments, my conversations with Satya and Jensen, showed me we're onto something here. So we furiously got to productizing what is now Gaia, and shipped it in March. Jensen had already put the money into our company. His team came and said, ‘We would like to feature you in our GTC [2024] conference.’ This is a big, mega conference, the mecca of all AI. We said we'd be tremendously honored. He featured three companies in his keynote: Snowflake, ServiceNow, and Cohesity, the only data protection company. And he said Cohesity backs up the world's data, and that Nvidia is excited to be working with them to build Gaia to allow you to get insights into your data. Like wow, he got it down in 30 seconds.
What are the ROI implications of being able to get to this data and make it useful?
Here’s a quick example, legal discovery, where you have a lot of unstructured data. I use PDFs as an example, being able to search and summarize for something that you want to know. A bank was telling us they have a lot of loan documents, and they want to understand the terms they gave over the entire history of all loans. ChatGPT and generative AI is the best summarization technology for your first draft of a summary. That's the way to think of it. I want to summarize the full data, and then I want to improve on it. That's what ChatGPT does. It shouldn't be your final answer, but it's your first draft of a summary. So you want to think every time you're using that technology it may not be the final answer, but it's much better than hiring interns to go and read a million documents.
And that's what this technology does. One thing you've got to watch with these technologies is the permissions. I should not be allowed to recover or search a set of documents that I don't have permission to access. For example, I'm the CEO of a company. I'm not allowed to search another person's inbox in the company that's private to them. If there's a legal investigation, our legal team can search anybody's inbox. That's allowed because it's company property. But I should be allowed to search all my emails, even on the backup. … The permissioning scheme across search is very important, because you don't want the wrong people searching for a summarization of documents they're not supposed to access.
Secondly, when the result set comes back, the summary has to be believable. A classic example is to summarize Leo Tolstoy, but the characters in the summary don’t match the people in the book. That's not believable. And that's called hallucination. The world is getting better at not hallucinating.
The good news in retrieval augmented generation, which is summarizing the documents that a company has, is it's their own data, so the chance of pollution is low. And with retrieval augmented generation, RAG, you build a better interface, you summarize [the data], and then you send the result set to an in-house LLM for the final summarization stage. … And if you don't believe the summary, it will actually send you a link to the source document. So then you go to your whole document and decide if you believe the summary. …
The industry is working on this. This is not just an ‘us problem’ or Nvidia or Microsoft or Amazon or Google or Anthropic or whoever. They're all working on both of these, and we feel, as we mix with those folks with the AI work we're doing, excited about the future.
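The flow Poonen describes (check permissions, retrieve matching documents, pass them to an LLM for a first-draft summary, and return links to the sources) can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not Cohesity's Gaia implementation: the `Doc` class, the `backup://` links, the word-overlap ranking and the `stub_llm` function are all hypothetical stand-ins, and a real system would likely use an index or embeddings and an actual model call.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    link: str                                    # hypothetical location of the source document
    allowed: set = field(default_factory=set)    # users permitted to read this document


def tokenize(text):
    return set(text.lower().split())


def retrieve(query, docs, user, k=2):
    """Return up to k documents the user may read, ranked by word overlap with the query."""
    q = tokenize(query)
    visible = [d for d in docs if user in d.allowed]          # permission check comes first
    scored = [(len(q & tokenize(d.text)), d) for d in visible]
    scored = [(s, d) for s, d in scored if s > 0]             # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, hits):
    """Assemble the retrieved text into a prompt for whatever LLM is in use."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return f"Summarize the context to answer: {query}\n{context}"


def answer(query, docs, user, llm):
    """Retrieve, summarize, and attach source links so the reader can verify the draft."""
    hits = retrieve(query, docs, user)
    summary = llm(build_prompt(query, hits)) if hits else "No accessible documents matched."
    return {"summary": summary, "sources": [d.link for d in hits]}


def stub_llm(prompt):
    # Placeholder for a real model call (assumption); it only reports how many documents it saw.
    return f"(draft summary of {len(prompt.splitlines()) - 1} document(s))"


docs = [
    Doc("loan-1", "loan contract terms fixed rate five years", "backup://loans/1.pdf", {"alice", "legal"}),
    Doc("loan-2", "loan contract terms variable rate", "backup://loans/2.pdf", {"legal"}),
    Doc("memo-1", "cafeteria menu for friday", "backup://memos/1.txt", {"alice", "legal"}),
]

result = answer("summarize loan contract terms", docs, "alice", stub_llm)
print(result["summary"])
print(result["sources"])
```

In this toy data set the user `alice` sees only the one loan document she has access to, while the `legal` user would get both. The returned `sources` list plays the role of the link back to the source document that lets a reader decide whether to believe the summary.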
You mentioned Satya Nadella. What’s he like? What do you come away with from those conversations with him? What are the lessons you learned?
He's a remarkable man. I've known him since he came to the U.S. a few years after I came to the U.S. in 1987 as an immigrant to go to college at Dartmouth. He came a few years later. I've known him since he was the executive sponsor for SAP at Microsoft and I was the executive sponsor at SAP for Microsoft. We got to know each other, and he's just an incredibly humble, approachable person. At VMware, we did a lot of things with Microsoft and Azure. In the early years of VMware, Microsoft was an arch enemy. I think in the early years they were trying to squash VMware because of Hyper-V, and then at some point they gave up. But I found him to be very approachable about the joint things we could do together as opposed to the small piece of overlap [between the companies]. He was very good at thinking about, ‘Okay, let me do a small piece of overlap, but one plus one equals three. Let's look at the broader aspect of it.’
His first public event, when he became CEO, was going to Apple to talk about Microsoft Office products on iPad and iOS. I mean, could you imagine [former Microsoft CEO] Bill Gates going to Apple? That would never have happened in the past. But he was like, ‘Listen, the broader good for society is that iOS has Office on it.’ And now, Office is one of the more popular tools on iOS. So these ways of thinking about having as few competitors as possible, that's how I've sought to operate at VMware, and certainly now at Cohesity.
You got people’s attention with Cohesity’s [planned] Veritas acquisition. Veritas is an old-school data protection software company. Cohesity is cutting edge GenAI and security. What does that bring to you? What's the opportunity for partners as you bring these two companies together?
We announced it February 8 this year and expect it to close by the end of this calendar year. When I joined Cohesity two years ago, again, tremendous tech, the best tech, we talked about the five S’s: speed, scale, security, simplicity and smarts. The tech is very differentiated. But data protection is a crowded market. I call it sort of a New York marathon with 50 players. I don't know why venture capitalists put so much money into this space. This was all before the ransomware and security and AI things we're talking about. There are 50 players in the space. The number one player I think has 11 or 12 percent of the market, and we're number eight. We're going the fastest. We're 5 percent of the market. I felt if we were to change the market, I want to play at number one in this space. …
Veritas does very well in marketing. Extremely sticky. It's hard to replace them. We were doing well with the modern tech, but they were really strong internationally. Many of the patents and IP are there. And we were able to come up with a deal that was very attractive where we're going to acquire Veritas’ data protection business, the product called NetBackup, and we have a very aggressive roadmap. Think of them like a BMW and us like a Tesla. We're going to have a console that's common for both, and we have a very rich roadmap for both products so customers on both sides will get innovation. But powerhouse innovation is what Cohesity brings to the picture. You could think of the new company, which is going to be called Cohesity, as having the speed and innovation of Cohesity combined with the scale of Veritas.
We're about a $549 million [revenue] company. We announced that number at the end of our last fiscal year, growing nearly 30 percent, 10 percent free cash flow. That’s our independent Cohesity status. And then the Veritas piece gets us to about a $2-billion [revenue] business. The growth rate comes down a bit, but profitability is significantly better. At Cohesity, we felt rather than go public, which we could have done on our own, let's bulk up and go from number eight to number one. Number eight combined with number three makes us number one. It puts us in a much more profitable position. We are majority U.S.-driven, but 45 percent-plus of Veritas’ business is international, and they’ve got an incredible operation in countries that would take us years to get to. The new [Cohesity] will be a much bigger, more profitable, public company when we integrate it. And that's the plan. …
The good news is, I've had conversations with many of you. Partners have been asking about the roadmap. We've talked about that privately. I've talked to customers that want to know where we're going. They're very excited. The Veritas customer base is incredibly excited about the innovation we have planned. Everything we're doing in security and AI is additive. The only overlap is a little bit of NetBackup versus Cohesity DataProtect. But everything we’re doing in security and AI, the Veritas customers get to benefit in a big way. For example, Gaia should work for every NetBackup customer. A lot of the things that we're doing in security—DataHawk [ransomware protection], FortKnox [cyber vault], and our cyber recovery orchestration—will work with them. When we put this together we’ll have twice the R&D team of our modern competitors. So this is all about innovation.
You mentioned global scale. What do you see in terms of channel maturity to take your AI story to market, and how important is that global scale?
We built a lot of our AI strategy on the back of four companies (Nvidia, Microsoft, Amazon, Google), and they're all global companies. This is not a U.S.-only play. But a lot of use cases start off in U.S. companies with us and our closest partners in the United States, and then can play out globally with what I call NATO-friendly G10 countries: the U.K., Germany, France, Japan, Australia, Canada, India, Netherlands, and over time other places. We think there's a tremendous opportunity to take the same playbook to many of those G10 friendly countries. …
We are channel-first. Today, we’re 100 percent through the channel, and we’ll continue that. But the way we think about our ecosystem is the following. There’s a set of tech partners, hardware partners, companies like Dell, HPE, Cisco, IBM, and others. They’re all important. Many of them have invested in us. And then there are tech and service partners in the cloud: Amazon, Microsoft Azure, Google, Oracle. Maybe someday we’ll get to the China companies, but those are the top four. We’ve got the private cloud that I think will increasingly be VMware, Nutanix, [Red Hat] OpenShift Virtualization. And then there’s a set of security players [including] Microsoft, Palo Alto, CrowdStrike, Zscaler, Okta, Tenable. These are all very strong partners of ours. We've created the Data Security Alliance. Gartner recognized us for doing that the best of anybody. Now many of our competitors are copying that playbook, but we were the first to do this.
And then you get to the VARs, the systems integrators, the distributors and so on. That entire ecosystem is very much replicable [elsewhere]. It's not very different. It might be a different set of players, but Palo Alto plays there, Amazon plays there, Azure plays there. So we tend to bake the playbook here and then take it on the road to the G10 friendly countries. And if you are playing in those countries, expect us to partner with you there. … Many of the global players are also very strong players there. Every one of my teams in the regions has to have a partner-friendly strategy for how they operate in that country for them to be an effective leader. By the way, it's not different from how I ran things at VMware. It’s the same principle. We were successful at VMware because we built our business on the backs of many of you. And that's what we're going to do again with Cohesity.
As you start talking with partners about Gaia, what is your assessment of the level of channel readiness for selling AI?
As I said earlier, beware of AI washing. I think there are very tangible use cases for AI [and] there is a need for many of these AI models in certain vertical industries. I find Nvidia’s approach is very refreshing because they're taking a vertical industry approach to AI. And as a result of that, the infrastructure the channel sells, whether it's an HPE server or in the cloud, all works very well. I would encourage partners to really think through these use cases, and don't ‘AI wash’ your capabilities. Go look at a use case that is tangible. Test it with customers. It took us a long time to perfect the first use case for Gaia, and now we're starting to see some resonance. That is probably the most important advice I'd give to all of you.