Happy Leap Smear: Partners Focused On A Second In The Cloud
A leap second will be added to the last minute of 2016 to keep our clocks in sync with the sun, and partners of cloud and networking providers are keeping an eye out for technical problems that extra second can induce.
Beyond extending a year many would like to see come to a close, the periodic adjustment, which compensates for small, unpredictable changes in the Earth's rotation, challenges the global timekeeping technologies that operators of precision information systems rely on.
Cloud providers are meeting the 61st second of the last minute of the year with carefully planned remediation efforts that highlight the intricate challenges they face in operating global infrastructure.
"A question as fundamental as the time becomes really challenging when dealing with computers all over the world," said Alex Lovell-Troy, director of solutions engineering at Pythian, an AWS and Google partner based in Ottawa.
For starters, there's no central authority on timekeeping, and different approaches to setting clocks suit different use cases and technologies.
Cloud providers rely heavily on Network Time Protocol for clock synchronization. NTP uses a sophisticated algorithm to select time servers that counters the desynchronizing effects of network latency.
NTP gets systems across the global Internet within tens of milliseconds of UTC, or Coordinated Universal Time, which in 1972 superseded Greenwich Mean Time as the universal standard.
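To make the latency point concrete, here is a minimal sketch of the basic offset and round-trip calculation an NTP client performs from the four timestamps of a single request and reply. It does not cover the server selection and filtering the protocol also does, and the class name, function names and sample values are illustrative assumptions rather than anything taken from a provider's implementation.

```python
from dataclasses import dataclass


@dataclass
class NtpExchange:
    """Four timestamps from one request/reply, in seconds (illustrative type)."""
    t0: float  # client sends the request (client clock)
    t1: float  # server receives the request (server clock)
    t2: float  # server sends the reply (server clock)
    t3: float  # client receives the reply (client clock)


def clock_offset(x: NtpExchange) -> float:
    """Estimated amount the client clock must be advanced to match the server."""
    return ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2.0


def round_trip_delay(x: NtpExchange) -> float:
    """Time spent on the network, excluding the server's processing time."""
    return (x.t3 - x.t0) - (x.t2 - x.t1)


# Made-up sample: client clock is 0.5 s behind the server, each network
# leg takes 0.04 s, and the server spends 0.01 s handling the request.
sample = NtpExchange(t0=100.000, t1=100.540, t2=100.550, t3=100.090)
print(clock_offset(sample), round_trip_delay(sample))
```

The offset estimate is exact only when the outbound and return legs take the same time, which is one reason real clients compare many samples from several servers.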
Leap seconds were introduced when the change to UTC was made almost 45 years ago – the one inserted on December 31st will be the 27th since then. The last leap second was added at the end of June 2015.
These minor adjustments can have major impacts on advanced systems like cloud-connected servers, websites, and mobile apps that sometimes need to agree on time to the thousandths of a second – about how long it takes light to travel from New York to Boston.
For banking applications, or Internet security, that level of precision is particularly vital.
For better control, Amazon Web Services maintains its own master clock, called AWS Adjusted Time, which at times subtly diverges from UTC, the globally agreed upon standard.
Instead of rolling 23:59:59 to 23:59:60 on December 31 in accordance with UTC, AWS Adjusted Time will spread the leap second out throughout the hours before and after midnight.
"They have broken out a group of NTP servers and are using them to add little bits of a second throughout the 24 hours surrounding the official [National Institute of Standards and Technology] addition," Lovell-Troy told CRN.
"Once they're done, they'll sync with the rest of the world again," he said.
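The approach Lovell-Troy describes can be sketched in a few lines. The example below assumes the extra second is spread evenly (linearly) over a 24-hour window centered on the leap second; both the linear shape and the centering are assumptions made for illustration, and the function name is hypothetical rather than part of any AWS interface.

```python
WINDOW_SECONDS = 24 * 60 * 60  # assumed: 24-hour window centered on the leap second


def smeared_fraction(seconds_from_leap: float,
                     window: float = WINDOW_SECONDS) -> float:
    """Fraction of the extra second already absorbed, from 0.0 to 1.0.

    seconds_from_leap is negative before the leap second, positive after it.
    """
    half = window / 2
    if seconds_from_leap <= -half:
        return 0.0  # smear has not started; clock still matches UTC
    if seconds_from_leap >= half:
        return 1.0  # smear is complete; clock matches UTC again
    return (seconds_from_leap + half) / window


# Six hours before the leap second, a quarter of it has been absorbed.
print(smeared_fraction(-6 * 60 * 60))
```

Under these assumptions the smeared clock differs most from UTC right at midnight, when half of the extra second has been absorbed.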
Windows instances on AWS will obey AWS Adjusted Time, Jeff Barr, AWS' chief evangelist, explained in a blog post.
But most relational databases hosted by the provider won't, Barr said. Instead, those instances will mark 23:59:59 twice (aside from some Oracle versions that will stay true to AWS Adjusted Time).
The picture is that complicated because applications such as databases that receive timestamps cannot avoid registering the NTP shift, at least indirectly, when it happens at midnight. Further complicating matters, some databases internally record the date as the total number of seconds that have passed since the start of 1970 – a system called Epoch time or Unix time.
"Databases need to calculate how many seconds between two events. If those two events happen to contain an additional leap second, that's where life gets interesting," Lovell-Troy said.
Vendors of database engines have to make a choice about what happens during a leap second, and they're not always consistent, he said.
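A short sketch shows why the interval calculation gets interesting. Unix time counts every day as exactly 86,400 seconds, so subtracting two timestamps that straddle the leap second undercounts the time that actually elapsed. The helper name and the hard-coded leap-second count below are illustrative assumptions, not how any particular database engine handles it.

```python
from datetime import datetime, timezone


def unix_seconds_between(start: datetime, end: datetime) -> float:
    """Difference between two instants as plain Unix-time subtraction."""
    return end.timestamp() - start.timestamp()


before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# Unix time has no slot for 23:59:60, so it reports a one-second gap.
unix_delta = unix_seconds_between(before, after)

# Assumption for this example: exactly one leap second (23:59:60 on
# Dec. 31, 2016) falls between the two instants.
LEAP_SECONDS_IN_INTERVAL = 1
elapsed = unix_delta + LEAP_SECONDS_IN_INTERVAL

print(unix_delta, elapsed)
```

An engine that needs true elapsed time has to apply a correction like this from a leap-second table, while one that only needs ordering can ignore it.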
"I can say that leap seconds routinely cause issues," Paul Vallee, Pythian's CEO, told CRN.
Five of roughly 200 Pythian managed services customers saw their services affected during "the great leap second of June 2015," Vallee said.
Flux7, a systems integrator based in Austin, Tex., also learned the hard way about some of the nuances of cloud timekeeping in an engagement with LegalZoom, said CEO Aater Suleman.
For the online legal technology company, Flux7 developed an API for secure file sharing between lawyers and clients. But the system periodically failed because it mismatched time between a Flux7 server and AWS S3 storage. The fix involved setting Flux7 virtual machines to AWS Adjusted Time.
AWS S3 receives a time stamp before creating a private, temporary URL for an uploaded object. For a transfer to work, the origin must be synchronized with the AWS clock.
"Information about the leap second and how it will impact AWS Adjusted Time becomes critical," Suleman said. "Not having the understanding of how AWS time is managed around the anomaly of a leap second could make it difficult to debug and resolve a similar issue."
The issue is particularly acute because modern apps are being built with distributed architectures that typically assume time is constant across all systems to simplify their designs, Suleman said.
It's critical for cloud services partners, especially those not using the provider's time server, to familiarize themselves with the procedures implemented for managing the leap second adjustment, he said.
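The dependence on synchronized clocks that Suleman describes can be illustrated with a hypothetical validity check for a time-limited URL. This is a simplified sketch, not S3's actual validation logic: the function name, the lifetime and the tolerance value are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def url_accepted(signed_at: datetime, lifetime: timedelta,
                 server_now: datetime, tolerance: timedelta) -> bool:
    """Hypothetical check: accept only inside the URL's validity window.

    signed_at comes from the client's clock; server_now from the provider's.
    """
    not_before = signed_at - tolerance  # allow a slightly fast client clock
    not_after = signed_at + lifetime    # URL expires after its lifetime
    return not_before <= server_now <= not_after


server_now = datetime(2016, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
lifetime = timedelta(minutes=5)   # assumed URL lifetime
tolerance = timedelta(minutes=5)  # assumed allowance for clock skew

in_sync = url_accepted(server_now, lifetime, server_now, tolerance)
fast_client = url_accepted(server_now + timedelta(minutes=10),
                           lifetime, server_now, tolerance)
slow_client = url_accepted(server_now - timedelta(minutes=10),
                           lifetime, server_now, tolerance)
print(in_sync, fast_client, slow_client)
```

In this sketch a client clock that drifts ten minutes in either direction is rejected, the kind of intermittent failure that is hard to trace without knowing how each side keeps time.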
Google has shared plans to handle the leap second and keep operations running smoothly based on what the Internet giant learned from previous leap seconds in 2008, 2012, and 2015.
"No commonly used operating system is able to handle a minute with 61 seconds, and trying to special-case the leap second has caused many problems in the past," blogged Michael Shields, technical lead for Google's Time Team.
For that reason, Google will run its clocks 0.0014 percent slower for the 10 hours before and after the leap second.
All Google services and APIs will be synchronized on that "smeared time," Shields said. Customers who want to opt out of the "leap smear" can use non-Google NTP servers.
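The 0.0014 percent figure follows from simple arithmetic: one extra second spread over a 20-hour window. The sketch below reproduces it, assuming the slowdown is applied evenly across the window; the variable names are illustrative.

```python
SMEAR_WINDOW_SECONDS = 20 * 60 * 60  # 10 hours before plus 10 hours after

# One second absorbed across the whole window, assuming an even slowdown.
slowdown = 1 / SMEAR_WINDOW_SECONDS
slowdown_percent = slowdown * 100

# Each smeared second is stretched by this many microseconds.
extra_microseconds_per_second = slowdown * 1_000_000

print(round(slowdown_percent, 4), round(extra_microseconds_per_second, 1))
```

At roughly 14 microseconds per second, the stretch is well inside the tens of milliseconds of accuracy that NTP typically delivers over the Internet.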
Other cloud and networking providers are issuing similar warnings to their partners to forestall any problems.
Pythian shared with CRN some observations from leap seconds of the past.
As for operating systems, Windows and AIX servers seem not to be affected by the issue. But Linux servers using NTP can see error messages, server hangs or maxed-out CPU utilization, and those issues can affect databases that get timestamps from the operating system.
Java programs also carry a risk of generating endless error loops or spiking CPU demand. Those problems can be passed to some open-source databases.
Patches and workarounds are available for most of the problems Pythian has encountered in the past.