7 Things To Know About Spectre And Meltdown Patch-Related Performance Hits
Sifting Through The Speculation
As concerns over the Spectre and Meltdown processor vulnerabilities stretch on into another week, technology vendors and solution providers seem to be facing more questions than answers when it comes to security patches and their possible performance effects. New information regarding PC and server impact is emerging daily as vendors and integrators conduct further testing and research.
While some solution providers have reported minimal or no performance effects related to recently deployed software and OS patches, some customers and technology professionals have experienced problems ranging from noticeable increases in CPU usage to blue-screening.
The Spectre and Meltdown threats are examples of side-channel attacks, and they come in three variants. Two of those variants are known as Spectre, including one that could be used to leak Linux kernel memory and another that could change how an application works based on the contents of memory. The third, known as Meltdown, could let an application read kernel memory without misdirecting the control flow of kernel code.
To date, there have been no known exploits of the security issue.
In the following slides, CRN has highlighted seven key points worth considering as businesses continue to size up the potential consequences of deploying Spectre and Meltdown mitigation patches.
The Hypervisor And The Guest OS
Since patching their own cloud platforms, the major public providers have asked users to patch their operating systems. Amazon Web Services' official advisory notes that customers are protected against Spectre and Meltdown vulnerabilities posed by "other instances." According to an AWS employee, that means the operating systems running inside virtual machines do need to be patched as well.
The root cause of these OS patch performance effects is applications making system calls to the OS kernel. Thus, workloads that frequently jump between the OS and the application – regardless of whether they exist in the cloud or in a data center – are most at risk of suffering the potential 30 percent hit that Intel confirmed earlier this week. Note that 30 percent is the high end of the range.
Ronak Singhal, director of CPU compute architecture at Intel, confirmed as much, saying that the specific workload in use matters most. Intel added that the average performance hit caused by Spectre and Meltdown threat mitigations is between zero and 2 percent, with most user workloads experiencing few or no effects.
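To make the system-call cost concrete, here is a minimal C sketch, assuming a Linux system, that times a loop of real kernel entries against a loop that stays in user space. The function names (`now_ns`, `avg_syscall_ns`, `avg_userspace_ns`) are illustrative and do not come from Intel or any vendor. Running it before and after patching the same machine would give only a rough indication of how syscall-heavy work is affected.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average nanoseconds per kernel entry. getpid is invoked through
   syscall(2) so that every iteration really crosses into the kernel. */
static double avg_syscall_ns(long iterations) {
    assert(iterations > 0);
    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++) {
        (void)syscall(SYS_getpid);
    }
    return (double)(now_ns() - start) / (double)iterations;
}

/* Baseline: the same loop shape, but the work never leaves user space. */
static double avg_userspace_ns(long iterations) {
    assert(iterations > 0);
    volatile uint64_t sink = 0; /* volatile keeps the loop from being optimized away */
    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++) {
        sink += (uint64_t)i;
    }
    return (double)(now_ns() - start) / (double)iterations;
}
```

Calling both functions with the same iteration count (for example, one million) and comparing the two averages shows how much of a tight loop's time goes to the kernel transition itself, which is the part the OS patches make more expensive.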
The Windows OS Update Has Caused Problems
Terry Myerson, executive vice president of Microsoft's Windows and Devices Group, wrote in a blog post Tuesday that "older silicon" machines running Windows 10, Windows 8 and Windows 7 will notice "a decrease in system performance," according to industry benchmarks. This includes 2015-era PCs with Haswell or older CPUs.
Myerson also notes that any Windows Server – particularly when running IO-heavy applications – suffers more serious performance hits "when you enable the mitigations to isolate untrusted code within a Windows Server instance." Microsoft says Windows 10 devices running on newer silicon and newer CPUs, including Skylake, should experience less of a performance hit.
Since this weeks update: 1: Bricked SurfaceBook Pro / 1: Bricked Intel i7/NVidia rig / 1: Bricked Lowend HP Elitebook. Pretty sure it isn’t my hardware. This is; without a doubt;
Also this week, some Microsoft customers reported "unbootable" AMD devices after installing the Windows update on their PCs. An AMD spokesperson told CRN that this bug affects a smaller subset of "older" processors. Microsoft has temporarily paused patch deployment to affected devices. AMD has previously said that "negligible performance impact" is expected related to software and OS patches.
Other Users Are Having Problems, Too, Including One Cloud-Reliant Company
Ian Chan, director of engineering at Branch, tweeted that a Spectre patch applied to the AWS EC2 hypervisor underlying a high input/output workload caused CPU usage to increase between 5 percent and 20 percent. Peter Czanik of syslog-ng also tweeted that compile times on Fedora had significantly increased, particularly when it came to Java, on an Intel i5 processor. He added that CentOS is "badly affected," while openSUSE Linux and Gentoo Linux are experiencing minimal effects.
Late last week, the Fortnite team at Epic Games published a blog post on the company website that blamed Meltdown-related security updates for "login issues and service instability" affecting its back end. The Cary, N.C.-based video game developer's infrastructure is built around cloud services, and "all" of those services had been affected, according to the company. Included in the post was a graphic detailing its CPU utilization, which more than doubled on the night of Jan. 3.
"Unexpected issues may occur with our services over the next week as the cloud services we use are updated," Epic wrote. "We are working with our cloud service providers to prevent further issues and will do everything we can to mitigate and resolve any issues that arise as quickly as possible."
Cloud solution providers who spoke with CRN have reported negligible performance impacts in most cases. Allen Falcon, CEO of Westborough, Mass.-based Cumulus Global, said his customers' workloads aren't showing signs of patch-related performance issues.
Effects On NetApp
John Woodall, vice president of engineering at Integrated Archive Systems, a Palo Alto, Calif.-based solution provider, told CRN that NetApp's OnTap-based systems, including its cloud-based and virtual storage appliances, are not environments where anyone can run other applications.
"They may be built on server hardware with code or processors that can be attacked, but because OnTap controls access, unauthorized applications cannot access the data," Woodall said.
However, developing long-term fixes to Spectre and Meltdown could be problematic given the broad range of data that is accessible by at-risk servers, Woodall added. He cited a potential 30 percent drop in performance caused by server patches that would create the need for more storage to compensate.
Storage Application Performance
Tom's Hardware writes that storage apps running "enterprise-class workloads" are most at risk of suffering performance loss, with some early testing pinpointing degradation between 20 percent and 30 percent. However, application benchmarking tests conducted by the site showed minimal, if any, difference in performance between a patched and an unpatched Intel Optane 900P (480 GB), implying that synthetic testing results may be exaggerated.
The tests pitted both solid-state drives against each other under a variety of application workload scenarios, including video games such as "World of Warcraft" and "Battlefield 3," a range of Adobe software products and multiple Microsoft Office tools. The results saw both SSDs perform either exactly the same or nearly the same in every scenario.
In its post, Tom's Hardware warned about relying on synthetic tests to size up potential performance hits because they tend to isolate components – components that effectively work together in practical settings.
LFENCE And Bounds Check Bypass Mitigation
Intel recommends inserting a barrier to stop the process of speculation. Speculation, at the heart of the Spectre and Meltdown exploits, allows processors to skip ahead in their execution of code to save time on computing processes, but it can also potentially enable malicious code to access a portion of the memory on the chip.
The chip company recommends the LFENCE instruction as this barrier, saying it can prevent newer operations from executing before they are supposed to. It is also possible to develop static analysis rules to find places in the software where a speculation barrier like LFENCE might be needed.
However, Intel also notes that LFENCE insertion could "significantly" compromise performance "if it is used too liberally."
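The idea of a barrier after a bounds check can be sketched in C as follows. This is an illustration only, not Intel's reference code: the table, its length and the function name `read_table_checked` are hypothetical, and the sketch assumes an x86 compiler that exposes the `_mm_lfence()` intrinsic through `<emmintrin.h>`. On other targets the macro falls back to a no-op so the example still compiles, which provides no protection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* declares _mm_lfence() */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Fallback so the sketch compiles on non-x86 targets; NOT a mitigation. */
#define SPECULATION_BARRIER() ((void)0)
#endif

/* Hypothetical lookup table; the name and contents are illustrative. */
#define TABLE_LEN 16
static const uint8_t table[TABLE_LEN] = {
    0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
};

/* Returns table[index], or -1 when index is out of range. */
static int read_table_checked(size_t index) {
    if (index >= TABLE_LEN) {
        return -1;
    }
    /* The barrier sits between the bounds check and the dependent load,
       so the load does not begin executing until the comparison above
       has completed. */
    SPECULATION_BARRIER();
    return table[index];
}
```

Placing the barrier only on paths where an untrusted index reaches a memory access, rather than after every branch, reflects Intel's warning about the cost of using LFENCE too liberally.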
Security Vendors And Maintaining CPU Overhead
CompassMSP CTO Paul Breitenbach said that MSPs will want to ensure clients are using certified anti-virus vendors that meet OS patch requirements in order to prevent blue-screening on servers and PCs.
He added that maintaining CPU overhead, meaning spare processing capacity, on the systems customers rely on at peak usage times helps mitigate whatever performance effects are felt.
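The headroom reasoning can be sketched as simple arithmetic. The C functions below are a hypothetical illustration, not a tool used by CompassMSP: they assume that a patch-related throughput loss of a given fraction means the same work needs 1 / (1 - loss) times as much CPU, and they check the projected peak against a ceiling the operator chooses.

```c
#include <assert.h>
#include <stdbool.h>

/* Projected peak CPU utilization (percent) after a throughput loss.
   throughput_loss is a fraction in [0, 1): 0.30 means 30 percent.
   Assumption: the same work then needs 1 / (1 - loss) times the CPU. */
static double projected_utilization(double peak_pct, double throughput_loss) {
    assert(peak_pct >= 0.0);
    assert(throughput_loss >= 0.0 && throughput_loss < 1.0);
    return peak_pct / (1.0 - throughput_loss);
}

/* True when the projected peak stays at or below the chosen ceiling. */
static bool has_headroom(double peak_pct, double throughput_loss,
                         double ceiling_pct) {
    return projected_utilization(peak_pct, throughput_loss) <= ceiling_pct;
}
```

Under this assumption, a server that peaks at 60 percent CPU would project to roughly 85.7 percent after a worst-case 30 percent hit, while one that already peaks at 80 percent would exceed 100 percent and have no headroom left.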
"None of our clients we're particularly worried about," Breitenbach said. "We monitor the metrics of their servers to make sure they're not on the edge of any kind of performance [loss] on an ongoing basis. … For MSPs and shops that don't make recommendations with regard to systems overhead, it could be something of concern for them."