Why Google Cloud’s Data Analytics Tops The Competition: Debanjan Saha
‘We have been in the business of organizing the world’s information for a very long time,’ said Debanjan Saha, vice president and general manager of data analytics at Google Cloud. ‘Data is in our DNA, and that’s what drives Google.’
The technology industry is in generation 3 of the cloud computing journey – the data cloud era – according to Debanjan Saha, and he sees Google Cloud, with its data analytics prowess, as having the best chance to lead it.
Gen 1 of the cloud journey was about bespoke SaaS applications for business uses such as CRM and ERP, said Saha, vice president and general manager of data analytics at Google Cloud, while gen 2 centered on cloud infrastructure, where rival Amazon Web Services (AWS) had a head start.
“Gen 3 is going to be around data cloud,” Saha said. “It is about using data to transform business. And this is going to be bigger than anything else, because this is not just reducing your costs by moving from on-prem to cloud. This will be about creating additional value by transforming business. Because of Google’s investment in data -- and experience in data and analytics over a very long period -- we really have an opportunity to lead this and help our customers really take advantage of cloud in a very different way than what they have been able to do in gen 1 and gen 2 of the cloud journey. And that’s what I’m really excited about.”
Saha, who previously served as a vice president and general manager running managed database services and cloud-native databases at AWS, outlined the top reasons why he believes Google Cloud’s data analytics capabilities top the competition.
Google has been in the data business for over 20 years – “long before cloud became a thing,” Saha said – and many of the assets that it’s built for its internal products and applications are available to its enterprise customers.
“We have been in the business of organizing the world’s information for a very long time,” Saha said. “Data is in our DNA, and that’s what drives Google. There are, I believe, 10 Google applications which have more than a billion users -- and this includes Google Search, YouTube, Gmail -- and all of these applications generate a lot of data. And we have built our data and analytics and AI/ML platform to use that data to better-serve our customers.”
The Data Cloud
The No. 1 differentiator for Google Cloud, Saha said, is that it has the “most complete portfolio of data analytics products in the market.”
That’s a combination of its cloud-native products that have been built over time -- initially for Google’s internal use -- its managed open-source products such as Dataproc and Cloud Data Fusion, and partner products including Databricks on Google Cloud, Apache Kafka as a service with Confluent Cloud on Google Cloud Platform (GCP) and Elastic on Google Cloud.
Dataproc is a managed Apache Spark and Apache Hadoop service that allows users to take advantage of open-source data tools for batch processing, querying, streaming and machine learning, while Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines.
“The reason for having all of these services is to give optionality and choice to our customers, but what makes it special is that we make it a platform,” Saha said. “Think of it like a data cloud -- that these are not individual products, but these are all tied together, so that people can do everything they want: collecting data, preparing them for analysis, doing the analytics on them, presenting them, presenting the insight in dashboards and many other formats, doing predictions with AI/ML.”
Standout Products
Saha called out several Google Cloud products as “best-in-breed” differentiators for its portfolio, including BigQuery, Dataflow, Looker and Cloud Spanner.
BigQuery is Google Cloud’s fully managed, petabyte-scale, multi-cloud analytics data warehouse. It was named a leader in The Forrester Wave for cloud data warehouses in March.
“There is no other cloud data warehouse like that,” Saha said.
BigQuery offers a 99.99 percent service-level agreement, which Saha said is 10 times better than that of any other cloud data warehouse.
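To put the "10 times better" claim in concrete terms, the arithmetic can be sketched as below. This is an illustration only: the article does not say which figure Saha is comparing against, so the 99.9 percent comparison point is an assumption, as are the 30-day month and the helper name `allowed_downtime_minutes`.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, that an availability SLA permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# 99.99 is the figure quoted in the article; 99.9 is an ASSUMED comparison point.
four_nines = allowed_downtime_minutes(99.99)
three_nines = allowed_downtime_minutes(99.9)

print(f"99.99%: {four_nines:.2f} minutes per 30 days")
print(f"99.9%:  {three_nines:.2f} minutes per 30 days")
print(f"ratio:  {three_nines / four_nines:.1f}x")
```

Under those assumptions, the budget works out to roughly 4.3 minutes of downtime per 30 days at 99.99 percent versus roughly 43 minutes at 99.9 percent, which is where a tenfold difference would come from.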
Google Cloud is working to expand the footprint of BigQuery with BigQuery Omni, which is powered by Anthos, its hybrid and multi-cloud platform. Introduced last July, BigQuery Omni is a multi-cloud analytics solution that allows users to access and securely analyze data across Google Cloud, AWS and Microsoft Azure (in the near future) using standard SQL and without leaving the BigQuery user interface. Now in private alpha, it’s expected to be generally available later this year.
“There are customers…who have data in multiple clouds, and they want to manage it from a single pane of glass and run queries, which crosses cloud boundaries,” Saha said. “For example, we have customers who have data related to their marketing campaigns and ads in GCP and BigQuery, and they’re running some of the applications in AWS. They want to run cross-cloud analytics between data stored in Google Cloud and data in AWS. BigQuery Omni, which is this multi-cloud BigQuery…is going to do that. This is creating essentially a data lake which transcends cloud boundaries.”
Dataflow is Google Cloud’s streaming analytics product designed to minimize latency, processing time and cost through autoscaling and batch processing.
“There is no other product like that,” Saha said. “This is something that we developed to build our ads, stream processing, all the event processing that we do for YouTube and Nest.”
Google Cloud closed its $2.6 billion acquisition of Looker Data Sciences in February 2020, adding its unified enterprise platform for business intelligence, data applications and embedded analytics to its portfolio.
“This is a very modern business intelligence platform, which is very, very different from other options available in the market,” Saha said. “We are very well-integrated with Looker in our portfolio. At the same time, Looker is also a multi-cloud, multi-product platform. BigQuery can be used with Looker, but we also have open interfaces to (Microsoft) Power BI and Tableau, etc. Looker also works with other data warehouses, like (Amazon) Redshift, Snowflake. We believe in an open and multi-cloud environment, and Looker is a perfect example of that.”
Analytics also depends on the underlying databases and storage, and Google Cloud has invested heavily in this area, according to Saha, who pointed to Cloud Spanner, its fully managed relational database service.
“Those are all very, very special products, and…they offer very different foundational characteristics like security and privacy, scale and performance, availability, etc., which is unmatched in the industry,” Saha said. “We deal with a lot of regulated industries, a lot of compliance regimes, and that is all embedded in our processes and in our products. Those are the foundational aspects where we are much, much better in my view.”
Feature/Function Differentiation
For a long time, analytics was about batch processing, meaning that data used to come in every six hours or every day, for example, and it would be loaded in the data warehouse for time-driven, daily or end-of-quarter analyses.
“What has changed recently is that people want to have real-time streaming analytics,” Saha said. “The data is coming in all the time. It is always fresh, and you want results and analysis very, very quickly and fast.”
General Motors-backed Cruise is a Google Cloud customer that has data constantly coming in from its self-driving vehicles that must be analyzed in real time. Similarly, Google Cloud customer Palo Alto Networks, a network security company, has various network security events coming from devices all the time.
“They have to get them in real time to BigQuery over Dataflow, and they have to analyze to figure out if there is a new threat that they need to address,” Saha said. “And if you think about Twitter -- the tweet stream is coming all the time, and they want to figure out the analytics of those Tweets, which will help them…with placing the right advertisements to the right people. Those are all examples where you have to ingest and analyze data and provide insights in real time, which we do really, really well because of a long history in doing this in the context of Ads, in the context of event processing for YouTube, Google Maps, as well as things like Nest.”
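The difference between loading data every few hours and analyzing it as it arrives can be illustrated with a small sketch. The code below is plain Python, not Dataflow or any Google Cloud API. It groups a stream of timestamped events into fixed-length ("tumbling") windows and emits each count as soon as its window closes. The function name, the event format and the assumption that events arrive in timestamp order are all made up for the example.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """Yield (window_start, event_count) each time a window closes.

    ASSUMPTION: events are (timestamp_seconds, payload) pairs that arrive
    ordered by timestamp. Windows with no events are skipped.
    """
    current_start = None
    count = 0
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete, so its result can be emitted now.
            yield current_start, count
            current_start, count = window_start, 0
        count += 1
    if current_start is not None:
        # Flush the final, still-open window when the stream ends.
        yield current_start, count


# Hypothetical security events: (seconds since start, event name).
sample_events = [(0.5, "login"), (1.2, "scan"), (1.9, "scan"), (3.1, "alert")]
for start, n in tumbling_window_counts(sample_events, window_seconds=1.0):
    print(f"window starting at {start:.0f}s: {n} event(s)")
```

Because the function is a generator, a consumer sees each window's result while later events are still arriving, rather than waiting for a scheduled batch load to finish.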
But while much of analytics is about asking questions about what has happened in the past, organizations also want to predict the future.
“You want to make sure what is going to happen tomorrow, so that you can plan your logistics,” Saha said. “We work with companies like JB Hunt and UPS, and they have to predict how many packages are going to arrive next day or next week, so that they can plan their fleet and their routes and their packaging. That’s where the boundary between analytics and AI/ML kind of blur.”
Google Cloud has integrated BigQuery with AI/ML through BigQuery ML, which became generally available in May 2019.
“We can do predictive analysis using the data which is already there in BigQuery, and that is really, really popular with a lot of our customers,” Saha said.
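As a rough illustration of what predictive analysis on data already in BigQuery can look like, the sketch below builds two BigQuery ML SQL statements, one to train a model and one to score new rows. It only assembles the SQL text and does not connect to BigQuery. The dataset, table and column names (`logistics.daily_packages`, `day_of_week`, `is_holiday`, `package_count`) are hypothetical, and the choice of a linear regression model is an assumption. The article does not say which model types customers use.

```python
def training_sql(model: str, source_table: str, label: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement (linear regression is an assumed choice)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT day_of_week, is_holiday, {label}\n"
        f"FROM `{source_table}`"
    )


def prediction_sql(model: str, input_table: str) -> str:
    """Build an ML.PREDICT query that scores rows from `input_table` with the trained model."""
    return (
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT day_of_week, is_holiday FROM `{input_table}`))"
    )


# All names below are hypothetical placeholders, not real datasets.
print(training_sql("logistics.package_forecast", "logistics.daily_packages", "package_count"))
print()
print(prediction_sql("logistics.package_forecast", "logistics.next_week"))
```

In practice, statements like these would be run in the BigQuery console or submitted through a client library. The point of the sketch is that both training and prediction are expressed as SQL against tables that already sit in the warehouse.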
Democratization Of Data Insights
Democratizing data insights is another area of differentiation for Google Cloud, according to Saha.
“It is good that we have these powerful tools, but if it is in the hands of a few people -- let’s say the data scientist or the data analysts in the company -- we are not empowering everybody in an organization to take data-driven decisions and transform their businesses,” Saha said. “In order to do that, you cannot really teach everybody complex technologies. You have to bring technology to them, so that they can use whatever experience or education or skills that they have, but still get their data and insight and ask questions in whatever language they’re familiar with.”
That’s where Looker comes in, according to Saha.
“Looker provides a very intuitive dashboard and workflows, which make it embedded in the application that people use, so they don’t have to learn anything new,” he said.
Google Cloud also has integrated its analytics platform with spreadsheet interfaces, so users don’t have to learn any new language. If they know how to use spreadsheets, they can work in Google Sheets on top of BigQuery, using BigQuery to run analytics and derive insight from the data stored in it.
Google Cloud also added a natural language interface for analytics on BigQuery data with Data QnA. Announced last July, it is meant to make it easier for non-technical users to access data insights by asking questions just as they would in a Google search bar. Now in private alpha, Data QnA is expected to be generally available later this year.
“That opens up the aperture in terms of not just data scientists and data analysts, but everybody, and that is empowering the whole organization to use the data for the business,” Saha said. “At the same time, it’s really good for business also. Imagine if your customer base now expands from 10 to hundreds to millions of users.”
A Google Cloud Partner’s Take
SADA Systems sees Google Cloud’s data analytics standing out among cloud providers every day, according to Miles Ward, chief technology officer for the Los Angeles-based Google Cloud Premier Partner that’s earned a data analytics specialization.
“The majority of my business in data analytics is re-platforming companies that have done some amount of data and analytics deployment on another public cloud and gotten frustrated and end up migrating to GCP’s more integrated, more direct, lower-cost, higher-performance stack,” Ward said.
That work accounts for a third of SADA’s professional services revenue, he said.
“Google’s been working on exabytes for a really long time, because that’s how big any of the analytics are about any of its products that serve billions of users,” Ward said. “No other environment has had to do the data work that Google does internally at anywhere near that size. And if you’re going to serve most of the products to customers for free, you have to be able to do that analytics not only quickly and huge, but cheap. By turning all of those tools loose with outside customers, you end up in this scenario where they earn access to a sitting set of capabilities that were designed for where they’re going to be in a couple of years, not where they already are. And the result…is BigQuery costs a lot less, and the queries get done in seconds, not in days, and there’s no operational work, because it’s completely managed on the Google side.”
And making its services available for multi-cloud work is an important part of Google Cloud’s strategy that differentiates it from other cloud providers, according to Ward, who cited BigQuery Omni.
“There’s a whole ecosystem of products being built under this Omni brand,” Ward said. “You can run BigQuery in Amazon (Web Services). You can federate queries from data that’s in Amazon and Google, and you just pay Google the same way you would pay as if it was normal BigQuery only running on the Google side. That ability to meet more customers where they are -- and make it so that they can do this stuff without having to ship their data around and create these big logistical nightmares -- is another big differentiator.”