The 10 Coolest Big Data Products Of 2014 (So Far)
Hot Market, Cool Technology
Market researcher Wikibon calculates that the worldwide big data market hit $18.6 billion in 2013, of which $4.1 billion, or 22 percent, was software. (Services accounted for 40 percent and hardware 38 percent.) Wikibon projects the total market to reach $28.5 billion this year and $50.1 billion in 2017.
No wonder new big data products seem to be hitting the market every day. And many of them are very cool. Here are 10 that particularly caught our attention in the first six months of this year.
Cloudera Enterprise 5
CEO: Tom Reilly
In April Cloudera launched the latest release of Cloudera Enterprise, the company's enterprise data management platform with its CDH distribution of Apache Hadoop at its core.
Cloudera Enterprise 5 includes YARN (Yet Another Resource Negotiator), the advanced resource management technology built into Hadoop 2.2, for managing multiple resources. New data controls and reporting and auditing features improve the software's governance and compliance capabilities. And the product's security and data protection are enhanced by new centrally deployed security functionality in Cloudera Manager and Cloudera Navigator.
Databricks Cloud
CEO: Ion Stoica
One of the most significant recent developments in the big data arena has been the May release of Apache Spark 1.0, an open-source, in-memory processing engine that supercharges the data analytics performance of the Hadoop big data platform.
Databricks was founded by some of Spark's developers, and last month the company launched Databricks Cloud, which is built around the Spark technology. The hosted platform, now in beta testing, simplifies Spark implementation and provisioning, and comes with a set of built-in applications for accessing and analyzing data. A business, for example, can use Databricks Cloud to speedily process and analyze data stored in Amazon S3.
DataStax Enterprise 4.5
CEO: Billy Bosworth
DataStax is among the many startups challenging established relational databases such as the Oracle Database and Microsoft SQL Server with a next-generation database architecture. DataStax builds its DataStax Enterprise (DSE) distributed NoSQL database management system on the open-source Apache Cassandra, which can manage huge volumes of data across a large number of commodity servers.
DSE 4.5, released in late June, incorporates the Apache Spark in-memory processing technology (through a technology partnership with Databricks) to boost DSE's real-time data analysis capabilities. New automated diagnostics and performance-tuning tools boost the software's management services. And for the first time the product is fully integrated with Hadoop (through partnerships with Cloudera and Hortonworks), providing the ability to merge Cassandra data with data from Hadoop and other sources -- making it possible to combine operational and historical data for analysis.
Guavus Reflex 2.0
CEO: Anukool Lakhina
Analyzing data in real time can be orders of magnitude more complex than analyzing static data in a data warehouse. Guavus has been pushing the boundary of what's possible in real-time business analytics with its Guavus Reflex Operational Intelligence Platform.
Last month the company debuted Reflex 2.0 with support for Apache Spark, the in-memory processing engine that supercharges Hadoop's data analytics performance, and YARN, the advanced resource management technology built into Hadoop 2.2. That boosts the software's ability to analyze data as soon as it arrives or is generated. The product is particularly suited for service providers and large data center operators that need to analyze network data to detect system anomalies, identify and prevent fraudulent activity, and respond to customer online activities.
Hortonworks Data Platform 2.1
CEO: Rob Bearden
Hortonworks debuted the new release of its Hadoop distribution in April, adding new SQL query technology to boost the speed and scale of Hadoop application queries. Hortonworks Data Platform 2.1 includes Apache Hive 0.13, the fruit of the Hortonworks-led "Stinger" initiative within the Apache Hive community to boost SQL query performance and provide interactive query capabilities at petabyte scale.
HDP 2.1 also adds the Apache Falcon technology for improved data governance around Hadoop, Apache Knox for perimeter security, the Apache Storm processing engine for improved real-time stream processing, and the Apache Solr search technology.
MongoDB 2.6
CEO: Max Schireson
MongoDB is one of a number of next-generation "NoSQL" databases challenging the dominance of relational database products from Oracle, Microsoft and others found in most corporate data centers today.
MongoDB, a cross-platform, document-oriented database, is designed to help organizations manage their exploding volumes of unstructured data. MongoDB 2.6, released in April, offers new text search features and tools for running ad-hoc analyses, expanded security functionality, new tools for manipulating large data volumes and summarizing/aggregating data, enhancements to MongoDB Management Service for simplified management, and improved scalability and performance.
Numerify 360 for IT
CEO: Gaurav Rewari
IT managers need business analytics too. With that in mind, Numerify came out of stealth mode in April and launched its cloud-based Numerify 360 for IT, a turnkey application that uses analytics to provide managers with a 360-degree view of IT service operations.
Tapping into data generated by ServiceNow's Platform-as-a-Service IT service management applications, Numerify 360 for IT integrates information from operational and financial systems into a cloud-based data warehouse that IT managers use to examine such questions as service level agreement (SLA) adherence, root cause analysis, workload and backlog management, cost of service reduction, and IT asset utilization.
SAS In-Memory Statistics for Hadoop
CEO: Jim Goodnight
SAS developed this interactive analytics programming environment for the Hadoop framework based on the vendor's in-memory technology that powers other SAS products such as SAS Visual Analytics. The in-memory capability provides a major performance boost for users who are trying to manage, explore, score and analyze massive volumes of data in Hadoop.
SAS In-Memory Statistics for Hadoop supports numerous statistical and machine learning modeling techniques including clustering, regression, decision trees, text analytics, recommendation systems and generalized linear models.
SiSense 5
CEO: Amit Bendov
SiSense develops analysis, reporting, visualization and dashboard software that helps everyday users make sense of huge volumes of data. One of the product's key capabilities is its ability to join huge data sets from multiple sources and combine them into one database for analysis.
In February the company launched SiSense 5, a release that brings all those capabilities to tablet computers, smartphones and other mobile devices, in addition to desktop computers. The software includes new push notification and drill-down capabilities that the company said are designed to encourage wider adoption of the application. The release relies on the "In-Chip" analytics technology SiSense unveiled last year.
Splice Machine Hadoop RDBMS
CEO: Monte Zweben
In May Splice Machine launched its long-awaited Hadoop real-time relational database, which is designed to help businesses get around Hadoop's batch-analytics limitations by providing a full-featured, transactional SQL database on Hadoop that can run operational applications and real-time analytics.
Splice Machine is pitching itself as a best-of-both-worlds alternative to traditional relational databases such as the Oracle Database and Microsoft's SQL Server. Database architects and application developers can build real-time applications that work with huge volumes of data without giving up their SQL technology and expertise.