Amadeus admits ALTEA crash was a result of Leap Second bug

A software failure at Qantas last week was not the result of slippery fingers by Amadeus engineers but was in fact due to the dreaded Leap Second bug.

The crash of the Amadeus ALTEA airline hosting system, the large platform used by hundreds of airlines for passenger check-in and ground control, hit Qantas and Virgin Australia on Sunday 1 July, just at the moment when many feared the Leap Second bug might make itself felt across the digital world.

It wasn’t a coincidence. And now Amadeus has admitted it got caught out by the bug.

But what is the so-called Leap Second, and why is it buggy?

The rather grandly titled International Earth Rotation and Reference Systems Service has the monumental responsibility of deciding when to insert a single second into Coordinated Universal Time (UTC) so that clock time keeps in line with the rotation of the earth on its axis.

It might sound a bit crazy to anyone outside physics and astronomy, but it’s a vitally important process which happens every few years – the last occasion was at the end of 2008 (generally the additions take place at the stroke of midnight on 1 January or 1 July).

But given that so many of the world’s systems and processes are controlled electronically, including airline hosting, adding a second into the so-called digital clock is perhaps not as easy as it sounds.

This is what happened to Amadeus last weekend.

It turns out that some systems running on Linux were hit by a bug triggered by the insertion of the additional second on the clock at the stroke of midnight GMT on Sunday.
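To illustrate why an extra second is awkward for software in general, here is a minimal Python sketch. It does not reproduce the Linux bug itself, whose details are not given here. It assumes the leap second in question was 23:59:60 UTC on 30 June 2012, and the helper names (`to_unix`, `datetime_accepts_leap_second`) are hypothetical, chosen only for this example. It shows that Unix time, which counts exactly 86,400 seconds per day, has no distinct value for the 61st second of the minute.

```python
import calendar
from datetime import datetime

# Assumption: the leap second discussed here is 23:59:60 UTC on 30 June 2012.
LEAP_SECOND = (2012, 6, 30, 23, 59, 60)
NEXT_MIDNIGHT = (2012, 7, 1, 0, 0, 0)


def to_unix(fields):
    """Convert a (year, month, day, hour, minute, second) UTC tuple to Unix time."""
    # calendar.timegm does plain arithmetic on the fields and does not
    # range-check the seconds value, so 60 is accepted here.
    return calendar.timegm(fields + (0, 0, 0))


def datetime_accepts_leap_second():
    """Return True if datetime can represent second 60, False otherwise."""
    try:
        datetime(*LEAP_SECOND)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False


leap_ts = to_unix(LEAP_SECOND)
midnight_ts = to_unix(NEXT_MIDNIGHT)

# Two different wall-clock instants collapse onto one Unix timestamp.
collides = leap_ts == midnight_ts

print("leap second ->", leap_ts)
print("next midnight ->", midnight_ts)
print("same Unix timestamp:", collides)
print("datetime accepts 23:59:60:", datetime_accepts_leap_second())
```

Because both instants map to the same number, a system has to repeat, skip or stretch a second around the insertion, and any code that assumes time always moves forward at a steady rate can be caught out.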

An Amadeus official confirms the incident, which took out ALTEA for less than an hour, was the result of a “Linux bug”.

“We are now investigating how we can enhance our ability to detect and address such bugs in advance.

“We take any systems disruption very seriously, we have always valued our reputation for reliability and we are determined to do everything that is appropriate to provide a reliable service in future.”


About Kevin May

Kevin May is editor of Tnooz. He joined as a co-founder in August 2009 after spending nearly four years as editor of UK-based business publication Travolution.

Passionate about the business of travel and the internet, Kevin played a major role in establishing Travolution in print, online, events and with an annual awards programme, as well as becoming a regular speaker and moderator at industry events.

Prior to Travolution, Kevin was web editor at Media Week (UK) and also worked in regional newspapers for two years at the Essex Enquirer. He started his career in journalism at the Police Gazette at New Scotland Yard in London.

Comments

  1. JT says:

    Isn’t this the second time in less than 12 months Amadeus systems have gone down?

  2. Charles says:

    Hmm SABRE, Travelport, SITA and everyone else were fine, what is wrong with Amadeus?

    • Jo says:

      Hmmm, I personally see one possible answer: maybe (this is my hypothesis only) these competitors are simply not yet running on Linux but are still on legacy systems, which are not affected by this issue?

      • Kevin May says:

        @jo – so, is it better for a company to use a modern system that crashes twice in six months or what some perceive as creaking legacy systems that apparently do not?

        Quite a dilemma!

  3. Mark Lenahan says:

    I don’t understand the Leap Second reasoning at all.

    Once every few years a truly accurate atomic clock ticks over from 23:59:59 to 23:59:60 and stays there for one second before going to 00:00:00. A 61 second minute. The difference between atomic and astronomic clocks is pretty academic for most purposes: the former is based on atomic reactions and the latter on a second defined as 1/86,400 of a mean solar day. Of course they don’t keep in sync.

    I can understand software crashing if a system time function were to return a value of 60 for the second, in fact a lot of binary representations (unix time for example) just don’t accommodate that value.

    What I don’t understand is why you would synchronize a business environment directly with an atomic clock designed to handle leap seconds in the first place.

    Why would any travel or internet or commercial company need their system clocks to be that accurate? If they never did that sync (in other words just use astronomic time rather than atomic time) after a while they might notice the clocks on the cluster are a second faster than some trading partners – something which will happen anyway – so sync them with an atomic clock then, some point after the 61 second minute has already happened.

    I suspect I may be missing something. I know some other companies said it was Cassandra and/or Hadoop issues. Has anyone got a good link on why this matters? And why it matters now and not any of the other dozens of times we have had a leap second in the last 40 years?

    In my imagination someone (probably wearing a snazzy suit) decided that super-accurate clocks in data centres were a great idea and figured out a way to hook up NASA or US Navy or Greenwich and other atomic time sources to NTP or some such protocol. What I really don’t understand is when this happened (since the last leap second?) and why anyone felt it was necessary and did it without understanding the difference between UT and UTC.

  4. Mark Lenahan says:

    Also… wouldn’t disabling NTP on your firewalls from 5 seconds before midnight to 5 seconds after midnight completely inoculate all your systems against this issue? (That might be wisdom in hindsight!)

  5. thomas mc says:

    Don’t blame Linux. My Linux system handled it just fine. If you don’t update your system for years, you have nobody to blame but yourself.
