The 10 Coolest Big Data Products Of 2013

Coolest Of The Cool

Big data was one of the hottest areas in the IT industry in 2013. The volume, variety and velocity of data flowing through businesses' IT systems are growing exponentially -- and so is the demand for leading-edge technologies for capturing, managing and analyzing all that data.

So it's no surprise that this year has seen an explosion of innovative big data products from both young startups and more established vendors. Here are the ones that particularly caught our attention in 2013.

0xdata H2O 2.0

0xdata develops software for conducting advanced statistical analysis against data stored in the Hadoop Distributed File System. In October, the startup revealed the second generation of its H2O machine learning and predictive analytics engine.

The company's goal is to bring big data statistical analysis capabilities to a broad audience of users who don't have degrees in statistical analysis. 0xdata is the brainchild of SriSatish Ambati, a former engineer at DataStax and a co-founder of Platfora, both of which are leading big data software companies.

ClearStory Data Intelligence

Startup ClearStory Data is developing platform and application software that helps everyday business users access, explore and analyze big data collected from internal and external sources, including corporate databases, Hadoop and the Internet. The company's goal is to bring big data analysis to a broad range of information workers.

The software, with its collaboration and data visualization capabilities, debuted in October. It's currently available through the company's early access program.

Cloudera Enterprise 5

In October, Cloudera began offering a public beta release of Cloudera Enterprise 5, the fifth generation of the company's big data platform, and CDH 5, a new release of the company's Hadoop distribution. Both incorporate Apache Hadoop 2, the latest release of the open-source Hadoop software.

Key enhancements in Cloudera Enterprise 5 include unified management of third-party applications, the ability to cache data sets from the Hadoop Distributed File System (HDFS) in memory, and the incorporation of YARN (Yet Another Resource Negotiator), which improves resource management when multiple data processing and analysis frameworks run on a single cluster.

The release also offers several new capabilities for managing and exploring big data, and it improves data protection through snapshot support in HDFS and HBase to prevent data loss.

DataStax Enterprise 3.2

A growing number of vendors are challenging traditional database vendors like Oracle and Microsoft with next-generation "NoSQL" database products. DataStax has become one of the more visible players with its Apache Cassandra-based DataStax Enterprise (DSE) database.

In November, DataStax launched DSE 3.2, providing what the company said is the first NoSQL database with built-in automatic management services that allow IT administrators to more effectively manage database clusters and optimize them to handle user demand.

The vendor has claimed other firsts in the NoSQL database realm, including the first comprehensive security features for NoSQL and the first visual developer tool.

Hortonworks 2.0

Hortonworks, Cloudera's chief rival in the Hadoop world, announced the general availability of its Hortonworks Data Platform (HDP) 2.0 in October. The new release of the company's commercial distribution of the Hadoop big data platform is built on the recent Hadoop 2 release from the Apache Software Foundation (ASF).

A key enhancement in the new Hortonworks release is the inclusion of YARN (Yet Another Resource Negotiator), a new Hadoop technology that allows developers to use programming frameworks other than MapReduce. Also in the Hortonworks 2.0 release is technology from the company's Stinger initiative that improves the speed, scale and SQL semantics support of Apache Hive.

JethroData SQL On Hadoop

After two years of development, JethroData in October took the wraps off its SQL on Hadoop technology, which the company describes as the first analytic database that runs natively on Hadoop.

While Hadoop, with its Hadoop Distributed File System, is good for storing large volumes of data, it wasn't designed for data analysis tasks. JethroData's database software provides the indexing and columnar structure needed to run analytical queries against that data.

The software is currently in an early-release mode.

Platfora 3.0

After more than two years of development and beta testing, startup Platfora launched its native in-memory business intelligence platform for Hadoop in March. The company followed that up with a major new release, Platfora Big Data Analytics 3.0, in October. The technology's value proposition is that it allows analysts to work directly with data in Hadoop, eliminating the need for complex data warehouse systems and extract, transform and load (ETL) tools.

The 3.0 release added event series analytics for analyzing Web logs, application logs, call center records and other event-related data. It also provides an entity-centric data catalog for organizing data around an entity such as a customer, business or product.

Splice Machine

Splice Machine, led by founder and CEO Monte Zweben, has been developing what it calls the industry's only real-time, SQL-on-Hadoop database. In October, the company launched a limited release program for the new software.

Splice Machine is developing the database as an alternative to traditional relational databases, such as Oracle and IBM DB2, for real-time, transactional big data applications. The company is relying on a limited number of customer evaluators to try out the technology, including validating specific use cases and testing SQL coverage and benchmark performance, before releasing the product for general availability.

Splunk Enterprise 6 And Hunk

Ten-year-old Splunk is one of the more established vendors in the big data arena. In October, the company launched Splunk Enterprise 6, a major release of its real-time operational intelligence platform for machine data.

The product's new pivot technology and drag-and-drop interface bring data analysis and visualization capabilities to non-technical business users and analysts. The release also sports enhancements that speed up the software's analytics, new data models to represent underlying machine data and the relationships within it, and a new high-performance analytics store that the company said delivers analytics performance up to 1,000 times faster than earlier releases.

Also in October, the company debuted Hunk: Splunk Analytics for Hadoop, new software for exploring and analyzing data stored in Hadoop.

Sqrrl Enterprise 1.1

Sqrrl has been getting a lot of attention this year, due, in part, to its founders, who came from the super-secret National Security Agency and helped develop that organization's massive database.

The company has been developing database software that's both scalable and secure -- the technology provides data security at the cell level -- to power big data applications. The 1.1 release in June moved the product from limited release to general availability and added advanced security tools and enhanced analytic capabilities.