5 years ago
 

Amadeus admits ALTEA crash was a result of Leap Second bug

A software failure at Qantas last week was not the result of slippery fingers by Amadeus engineers but was in fact due to the dreaded Leap Second bug.

The crash of the Amadeus ALTEA airline hosting system, the large platform used by hundreds of airlines for passenger check-in and ground control, hit Qantas and Virgin Australia on Sunday 1 July, just at the moment when many feared the Leap Second bug might make itself felt across the digital world.

It wasn’t a coincidence. And now Amadeus has admitted it got caught out by the bug.

But what is the so-called Leap Second, and why is it buggy?

The rather grandly titled International Earth Rotation and Reference Systems Service has the monumental responsibility of deciding when to insert a single second into the daily clock to ensure clock time keeps in line with solar time, which follows the rotation of the earth on its axis.

Might sound a bit crazy to anyone outside physics and astronomy, but it’s a vitally important process which happens every few years – the last occasion was at the end of 2008 (generally the additions take place at the stroke of midnight on 1 January or 1 July).

But given that so many of the world’s systems and processes are controlled electronically, including airline hosting, adding a second into the so-called digital clock is perhaps not as easy as it sounds.
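To make the difficulty concrete, here is a minimal Python sketch (an illustration only, not anything Amadeus is known to run). It assumes the leap second in question carried the UTC label 23:59:60 on 30 June 2012, and the helper name `can_represent` is made up for this example. It shows that Python's `datetime` type has no slot for a 60th second, and that Unix-style timestamps map the leap second and the following midnight onto the same number.

```python
import calendar
from datetime import datetime


def can_represent(year, month, day, hour, minute, second):
    """Return True if Python's datetime accepts this wall-clock reading."""
    try:
        datetime(year, month, day, hour, minute, second)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False


# Assumption: the inserted second was labelled 23:59:60 UTC on 30 June 2012.
leap_label_ok = can_represent(2012, 6, 30, 23, 59, 60)

# calendar.timegm does plain arithmetic on the fields, so the leap second
# and the midnight that follows it collapse onto one Unix timestamp.
before_ts = calendar.timegm((2012, 6, 30, 23, 59, 59))
leap_ts = calendar.timegm((2012, 6, 30, 23, 59, 60))
midnight_ts = calendar.timegm((2012, 7, 1, 0, 0, 0))

print(leap_label_ok)           # whether datetime accepted second=60
print(leap_ts == midnight_ts)  # whether the two instants share a timestamp
print(midnight_ts - before_ts) # timestamp difference across the leap second
```

Two real seconds pass between 23:59:59 and 00:00:00 that night, yet the timestamps differ by only one, so software has to decide what to do with the extra second.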

This is what happened to Amadeus last weekend.

It turns out that some systems running on Linux were hit by a bug triggered by the insertion of the additional second on the clock at the stroke of midnight GMT on Sunday.
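The article does not say which part of Linux failed, so the following is not a reproduction of that bug. It is only a toy Python model, with the made-up class `SimulatedClocks`, of one general hazard when a second is inserted: a wall clock that is held back for the extra second under-reports elapsed time, while a clock that never steps does not.

```python
class SimulatedClocks:
    """Toy model of two clocks; it mimics the arithmetic, not a real kernel."""

    def __init__(self, wall_start):
        self.wall = wall_start   # wall-clock seconds (held back at the leap)
        self.monotonic = 0.0     # monotonic seconds (never steps)

    def advance(self, seconds):
        # Ordinary passage of time moves both clocks together.
        self.wall += seconds
        self.monotonic += seconds

    def insert_leap_second(self):
        # One real second passes while the wall clock is held back,
        # so the wall clock shows the same second twice.
        self.monotonic += 1.0


# Assumption: start at the Unix timestamp for 2012-06-30 23:59:59 UTC.
clocks = SimulatedClocks(wall_start=1341100799.0)
wall_before, mono_before = clocks.wall, clocks.monotonic

clocks.insert_leap_second()  # the extra second
clocks.advance(1.0)          # reach 00:00:00 on 1 July

wall_elapsed = clocks.wall - wall_before
mono_elapsed = clocks.monotonic - mono_before
print(wall_elapsed, mono_elapsed)
```

In real Python code the same distinction is why timeouts and intervals are usually measured with `time.monotonic()` rather than `time.time()`.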

An Amadeus official confirms the incident, which took out ALTEA for less than an hour, was the result of a “Linux bug”.

“We are now investigating how we can enhance our ability to detect and address such bugs in advance.

“We take any systems disruption very seriously, we have always valued our reputation for reliability and we are determined to do everything that is appropriate to provide a reliable service in future.”

NB: Clock and aircraft images via Shutterstock.

Kevin May

About the Writer :: Kevin May

Kevin May was a co-founder and member of the editorial team from September 2009 to June 2017.

 

Comments


  1. IT monoculture cripples flights worldwide - SWAT

    […] airlines Qantas and South West suffered an outage when Amadeus went down, in 2012 a leap-second bug caused a number of airports served by Amadeus to go down, and a crash in 2011 […]

     
  2. thomas mc

    Don’t blame Linux. My Linux system handled it just fine. If you don’t update your system for years, you have nobody to blame but yourself.

     
  3. Mark Lenahan

    Also… wouldn’t disabling NTP on your firewalls from 5 seconds before midnight to 5 seconds after midnight completely inoculate all your systems against this issue? (That might be wisdom in hindsight!)

     
  4. Mark Lenahan

    I don’t understand the Leap Second reasoning at all.

    Once every few years a truly accurate atomic clock ticks over from 23:59:59 to 23:59:60 and stays there for one second before going to 00:00:00. A 61 second minute. The difference between atomic and astronomic clocks is pretty academic for most purposes: the former is based on atomic reactions and the latter on a second defined as 1/86,400 of a mean solar day. Of course they don’t keep in sync.

    I can understand software crashing if a system time function were to return a value of 60 for the second, in fact a lot of binary representations (unix time for example) just don’t accommodate that value.

    What I don’t understand is why you would synchronize a business environment directly with an atomic clock designed to handle leap seconds in the first place.

    Why would any travel or internet or commercial company need their system clocks to be that accurate? If they never did that sync (in other words just use astronomic time rather than atomic time) after a while they might notice the clocks on the cluster are a second faster than some trading partners – something which will happen anyway – so sync them with an atomic clock then, some point after the 61 second minute has already happened.

    I suspect I may be missing something. I know some other companies said it was Cassandra and/or Hadoop issues. Has anyone got a good link on why this matters? And why it matters now and not any of the other dozens of times we have had a leap second in the last 40 years?

    In my imagination someone (probably wearing a snazzy suit) decided that super-accurate clocks in data centres were a great idea and figured out a way to hook up NASA or US Navy or Greenwich and other atomic time sources to NTP or some such protocol. What I really don’t understand is when this happened (since the last leap second?) and why anyone felt it was necessary and did it without understanding the difference between UT and UTC.

     
  5. Charles

    Hmm SABRE, Travelport, SITA and everyone else were fine, what is wrong with Amadeus?

     
    • Jo

      Hmmm, I personally see one possible answer: maybe (this is my hypothesis only) these competitors are simply not yet running under Linux, but still on a legacy system, not impacted by this issue?

       
      • Kevin May

        @jo – so, is it better for a company to use a modern system that crashes twice in six months or what some perceive as creaking legacy systems that apparently do not?

        Quite a dilemma!

         
  6. JT

    Isn’t this the second time in less than 12 months Amadeus systems have gone down?

     
 
 
