How IoT’s ‘Massive Data Sprawl’ Opens New Opportunities
The growth of the Internet of Things market means an incoming, perpetually expanding flood of data, which creates all sorts of big new problems that can translate into big new opportunities for solution providers. CRN talks to executives at three vendors working in IoT, edge computing and AI about the different kinds of software and systems that can help organizations make sense out of all the IoT data they’re collecting.
A data tsunami is coming, and it may already be here, thanks largely to the rapidly growing number of IoT sensors and endpoints sprouting at the edge. Consider Gartner’s prediction from a few years ago that the share of enterprise data created outside traditional data centers will reach 75 percent, up sharply from the 10 percent the firm estimated for 2018.
The big problem is there’s not nearly enough infrastructure in place to process and analyze all this data, according to John Cowan and James Thomason, co-founders of EDJX, a Raleigh, N.C.-based edge computing startup. Thomason said that, based on some “back-of-the-napkin” math, he determined it would take tens if not hundreds of trillions of dollars in additional IT infrastructure spending over the next 10 years to keep up with the overflowing data processing needs at the edge.
[Related: 5 Big IoT Security Challenges (And How To Overcome Them)]
“No one has that amount of capital,” said Thomason, who is CTO at EDJX.
But that doesn’t mean there isn’t a solution, which is why Cowan and Thomason believe there’s a big opportunity for businesses that work in the edge computing space, including solution providers, to help organizations of all kinds harness the large amounts of IoT data they’re creating.
“It’s a hard thing for, I think, most of the industry to grasp,” said Cowan, who is EDJX’s CEO. “It’s a very abstract concept to say, ‘There’s not enough infrastructure to actually process the volume of data that’s coming off the sensors.’ Most of the sensor data gets stranded because there’s just no way to apply any kind of algorithmic processing to it, so when you start to think about it in that context, it becomes an enormous problem and an enormous opportunity to try and solve for that.”
Thomason said the world’s largest cloud service providers cannot solve the problem alone because the amount of edge data being created overshadows their collective footprint, which is why he believes the most economical approach to tackling IoT’s “massive data sprawl” problem is federated infrastructure.
To Cowan and Thomason, a federated infrastructure means tapping into a network of underutilized data centers and systems across the world, plus new edge compute nodes as needed, and using a serverless computing platform that allows applications to run on the network, whenever and wherever data needs to be processed. In this case, they’re talking about EDJX’s EdjOS operating system.
“The key architectural difference is that this is all event-driven, and event-driven is what liberates developers from the burden of understanding what is happening in the infrastructure,” Thomason said.
“You want the infrastructure to be smart enough to look at an event and then provision what you need provisioned at literally the software level.”
This means that an edge server, for instance, will automatically know what to do with a stream of video data coming from a camera, which Thomason said is different from how things have been traditionally architected in data center infrastructure, where connections have to be established beforehand.
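As a rough illustration of what that event-driven model implies, the sketch below registers a handler for a type of sensor event and lets a dispatcher decide what to run when data arrives. The event types, handler names and dispatch function are hypothetical stand-ins for the idea Thomason describes, not EDJX’s actual EdjOS API.

```python
# Hypothetical illustration of event-driven processing at the edge.
# None of these names come from EDJX's EdjOS; they only sketch the idea that
# a handler is bound to an event type and invoked when matching data arrives.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EdgeEvent:
    """An event emitted by a sensor, e.g. a frame from a camera stream."""
    event_type: str      # e.g. "camera/frame"
    source_id: str       # which sensor produced it
    payload: bytes       # raw sensor data


# Registry mapping event types to the functions that should process them.
_handlers: Dict[str, Callable[[EdgeEvent], None]] = {}


def on_event(event_type: str):
    """Decorator that registers a handler for an event type."""
    def register(func: Callable[[EdgeEvent], None]):
        _handlers[event_type] = func
        return func
    return register


@on_event("camera/frame")
def analyze_frame(event: EdgeEvent) -> None:
    # Placeholder for video analytics running on the nearest edge node.
    print(f"processing {len(event.payload)} bytes from {event.source_id}")


def dispatch(event: EdgeEvent) -> None:
    """The infrastructure side: look at the event, run whatever it needs."""
    handler = _handlers.get(event.event_type)
    if handler is None:
        print(f"no handler for {event.event_type}; data is stranded")
        return
    handler(event)


if __name__ == "__main__":
    dispatch(EdgeEvent("camera/frame", "cam-42", b"\x00" * 1024))
```

In this model, the developer only declares what should happen for a given kind of event; deciding where and when that code runs is left to the platform.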
EDJX is putting these ideas into practice with an ambitious new project called the Public Infrastructure Network Node (PINN) pilot, in Austin, Texas. Led by a cooperative research consortium called the Autonomy Institute, a PINN is being promoted as the “first unified open standard” for bringing together 5G connectivity, edge computing, radar, LIDAR, enhanced GPS and intelligent transportation systems into what essentially looks like a futuristic utility pole.
The vision is that cities will be decked out with PINNs so that IoT sensors and edge computing will be nearly ubiquitous, allowing data to be processed as close to the source as possible for a variety of new applications ranging from smart buildings to connected vehicles.
“What kind of application can you build when you can be sure that your user is standing 1,000 feet away from the computer it’s going to talk to on the server side? And so we think this is going to genesis an entirely new era of infrastructure,” Thomason said.
But EDJX’s platform isn’t just about running applications on shiny futuristic utility poles. The company is mainly building out its edge network by selling recycled servers from hyperscalers that are pre-integrated with EDJX’s software and then hosted at data centers, whose operators earn money based on consumption by EDJX’s users. This is made possible through partnerships with ITRenew, which provides the servers, and Virtual Power Systems, which provides power management software.
“We’re technically hyperconverged for the edge,” Thomason said.
EDJX is making a big play with the channel too, giving value-added resellers and system integrators the opportunity to sell and integrate servers and systems with EDJX software, own and host the infrastructure and provide managed services.
“There’s a whole new fertile ground that’s emerging in IoT and edge, and it’ll be highly rewarding and opportunistic for the channel,” Cowan said.
In-Memory Computing Can Help Process Edge Data Fast, Reduce TCO
Building out an economically feasible edge network is one important part of the equation when it comes to dealing with the deluge of edge data. Another challenge posed by that towering tidal wave is processing the data as fast as possible, and as economically as possible, and one vendor believes in-memory computing is the best solution in many cases.
John DesJardins said his company, an in-memory computing provider called Hazelcast, has been making inroads with IoT applications because the company’s platform is able to process data in real time within systems that have constrained form factors, making it a good fit for edge environments.
“If you have a sensor that’s attached to some kind of piece of machinery, there’s data constantly being born, and that real time data generally needs a lot of context and needs historical data for enrichment, so the fact that we can work with both that historical data and the continuously changing event stream in real time is really what differentiates us,” said DesJardins, who is CTO of the San Mateo, Calif.-based company. “We’re able to unify these together so that you can take a look at what’s going on in real time and unify it with additional contexts.”
The benefit of an in-memory computing platform like Hazelcast’s, according to DesJardins, is that the software not only keeps data in memory, but it also creates a distributed memory layer for applications across multiple systems. This gives applications the ability to quickly make calculations on whichever node is receiving new data.
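To give a sense of what that distributed memory layer looks like from an application’s point of view, here is a minimal sketch using the open-source hazelcast-python-client package. It assumes a Hazelcast cluster member is reachable on its default local address; the map and key names are made up for illustration.

```python
# Minimal sketch using the open-source hazelcast-python-client package
# (pip install hazelcast-python-client). Assumes a Hazelcast member is
# running locally; the map and key names here are illustrative only.
import hazelcast

client = hazelcast.HazelcastClient()  # connects to a local cluster by default

# A distributed map: entries are partitioned across the cluster's nodes,
# so reads and writes happen in memory on whichever node owns the key.
readings = client.get_map("sensor-readings").blocking()

readings.put("machine-7/temperature", 71.3)
print(readings.get("machine-7/temperature"))

client.shutdown()
```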
“We’ve just recently done a benchmark, where we had a 40-node cluster, and it processed a billion events per second, and the average latency was 30 milliseconds,” DesJardins said. “When you blink your eyes, that’s 300 milliseconds. So the average latency we’re able to achieve even with 1 billion events coming in per second was still under 30 milliseconds, or 10 times faster than blinking your eyes. That low latency also means we can just process more data and respond to it in more real time.”
That computational power can be used to host machine learning algorithms on Hazelcast’s platform and run inference applications on edge devices, according to DesJardins. For analyses that don’t require quick turnaround times, Hazelcast can be set up to send data to the cloud.
“If you can inject intelligence in real time, you’re able to create a whole different paradigm of an application as opposed to just applying simple business rules,” he said.
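A loose sketch of that split between the real-time path and the cloud path might look like the following. The scoring function, alert threshold and upload queue are hypothetical placeholders rather than anything from Hazelcast’s product; they only show the routing decision DesJardins describes.

```python
# Hypothetical routing sketch: score events at the edge and defer
# non-urgent work to the cloud. The model stand-in, threshold, and upload
# queue are illustrative, not part of Hazelcast's platform.
from collections import deque
from typing import Any, Deque, Dict

cloud_batch: Deque[Dict[str, Any]] = deque()  # shipped to the cloud later


def score(event: Dict[str, Any]) -> float:
    """Stand-in for a pre-trained model running in memory at the edge."""
    return abs(event["vibration"] - 0.5)


def handle(event: Dict[str, Any], alert_threshold: float = 0.4) -> None:
    if score(event) > alert_threshold:
        # Real-time path: act immediately, close to the sensor.
        print(f"alert for {event['sensor_id']}: anomalous vibration")
    else:
        # Batch path: accumulate for slower, cheaper analysis in the cloud.
        cloud_batch.append(event)


handle({"sensor_id": "pump-3", "vibration": 0.97})
handle({"sensor_id": "pump-3", "vibration": 0.52})
print(f"{len(cloud_batch)} event(s) queued for cloud analysis")
```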
But Hazelcast’s in-memory computing platform isn’t just about being able to process data fast. It can also make edge computing more cost effective and reduce total cost of ownership because of the lower costs associated with SSDs and Intel’s Optane memory technology, the latter of which can be combined with DRAM in servers, DesJardins said.
“An in-memory platform can still take advantage of and provide a very efficient processing of real time data, but it can also handle more data and store more data because of the decrease in cost of SSDs and Optane, enabling you to get near-memory speeds with larger data sets,” he said.
Factories: A Major Source Of Untapped Data Potential
One major source of edge data resides within factories, and Monty Zweben of AI startup Splice Machine believes much of it is going to waste. Or, at the very least, the data is not being used effectively.
“I see a huge opportunity to take advantage of data that, heretofore, I think people are storing and not even using,” said Zweben, who is CEO and co-founder of San Francisco, Calif.-based Splice Machine.
One wellspring of data within factories, according to Zweben, is historian software, which is used by plant operators in control rooms to capture and analyze production data from various systems. The problem, Zweben said, is that most operators aren’t fully realizing the value of such data.
“What I’ve noticed in many of the accounts that I’ve been in is that the data kind of sits there,” he said.
The opportunity for solution providers, according to Zweben, is to take the historical data collected from various systems and machines, clean it and then run machine learning models on it to enable new applications like predictive maintenance, which give operators early warning of a system failure so they can make fixes before the system breaks down.
Zweben said Splice Machine’s software is built to handle such capabilities, from the cleaning of the data to building machine learning models.
“The opportunity is to take real-time machine learning platforms like ours and put them into the plants, connect them to both the historian for historical data to train as well as into the real-time feed of data to actually predict on-the-fly in real time. That’s the picture of the future,” he said.
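A loose sketch of that workflow, using pandas and scikit-learn as generic stand-ins for Splice Machine’s own platform, is shown below. The export file, column names and failure label are hypothetical; the point is the shape of the pipeline: clean historian data, train a model on it, then score live readings.

```python
# Loose sketch of the historian-to-prediction workflow Zweben describes,
# using pandas and scikit-learn as stand-ins for Splice Machine's platform.
# The CSV path, column names, and failure label are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 1. Historical data exported from the plant historian.
history = pd.read_csv("historian_export.csv")

# 2. Basic cleaning: drop rows with missing sensor values or labels.
features = ["temperature", "vibration", "pressure"]
history = history.dropna(subset=features + ["failed_within_24h"])

# 3. Train a model that flags likely failures ahead of time.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history[features], history["failed_within_24h"])

# 4. Score a live reading from the real-time feed.
live_reading = pd.DataFrame(
    [{"temperature": 88.2, "vibration": 0.91, "pressure": 30.5}]
)
failure_risk = model.predict_proba(live_reading[features])[0][1]
print(f"estimated failure risk: {failure_risk:.0%}")
```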
Attached to the real-time machine learning opportunity is the hardware that’s needed to run such software. Zweben said in the case of Splice Machine’s software, it typically takes low-cost commodity Linux servers that meet a certain specification. These Linux servers are set up to run Kubernetes containers so that resources can be spun up and wound down in real time when compute needs ebb and flow between training machine learning models and deploying them to run inference.
“The cool thing about using Kubernetes is that you can reallocate the resources on that Kubernetes cluster on the fly so that when you’re training, you take as much of the resources as possible to do the crunching and the math, and then when you’re not training, you let those resources go be used for other purposes,” Zweben said.
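One way that reallocation can be driven programmatically is sketched below with the official `kubernetes` Python client, scaling a training deployment down and an inference deployment up once a training run finishes. The deployment names, namespace and replica counts are hypothetical, and this is only one possible pattern, not Splice Machine’s own tooling.

```python
# Sketch using the official `kubernetes` Python client: shrink the training
# deployment and grow the inference deployment when a training run finishes.
# Deployment names, namespace, and replica counts are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
apps = client.AppsV1Api()


def scale(deployment: str, replicas: int, namespace: str = "ml") -> None:
    """Patch a deployment's replica count so the scheduler reallocates nodes."""
    apps.patch_namespaced_deployment_scale(
        name=deployment,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


# Training is done: hand the cluster's capacity back to inference workloads.
scale("model-training", replicas=0)
scale("inference-serving", replicas=4)
```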
Zweben said introducing machine learning applications into industrial environments presents solution providers with “significant” services opportunities, but it will require them to build expertise in data engineering, data science and process engineering. The latter area is just as important as the first two because experts in the field can “talk to the plant people about their existing processes” and serve as a liaison to data engineers and data scientists, Zweben said.
“They have more subject matter expertise and can help the data scientists understand, ‘This data belongs in a compression area,’ and, ‘This data belongs in the refrigeration area,’ and ‘They should be treated differently,’ and so forth,” he said.