Data crunching to find the cheapest airline in the world

NB: This is a guest article by Michael Cameron, co-founder of Rome2rio.

The growth of big data and the availability of APIs are providing exciting new opportunities for making sense of travel data, even for a fledgling start-up like Rome2rio.

Airfares fluctuate wildly, but they do follow certain obvious trends: longer flights cost more, and some airlines are more expensive per mile flown than others.

We recently started an internal project to model typical airfares for the flight itineraries assembled by our system. Our aim was to use this model to improve the accuracy of our multi-modal routing engine. In the process, however, we generated some interesting data worth sharing with the industry.

We modeled airfares using some simple parameters. To do this, we examined the economy class airfares displayed by Rome2rio to users over the past 4 months, totalling some 1,780,832 price points. We grouped the airfares by distance and selected the 20th percentile fare for each distance (where 20% of fares are less, and 80% are more), to produce the following graph:

The graph shows a pretty clear linear relationship between distance traveled and airfares. Based on this data, we can create a simple equation to model this relationship:

Fare = $50 + (Distance * $0.11)

Here, Fare is the cost in US$ of flying Distance miles. In other words, a typical low fare costs about $50 before any flight distance is taken into account, plus roughly 11 cents per mile travelled.
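For readers who want to try this on their own data, here is a minimal Python sketch of the steps described above: group price points by distance, take the 20th percentile fare in each group, fit a straight line, and evaluate the resulting equation. It is not our production code. The 250-mile bin width, the least-squares fit and every function name are assumptions made for illustration; only the $50 and $0.11 defaults come from the equation above.

```python
from collections import defaultdict
from statistics import quantiles


def percentile_20(values):
    """Return the 20th percentile: about 20% of values are below it."""
    if len(values) < 2:
        return values[0]
    # n=5 splits the data into fifths; the first cut point is the 20th percentile.
    return quantiles(values, n=5, method="inclusive")[0]


def low_fare_by_distance(price_points, bin_miles=250):
    """Group (distance_miles, fare_usd) pairs into distance bins.

    Returns (bin_midpoint_miles, 20th_percentile_fare) pairs sorted by distance.
    The bin width is an assumed value, not one stated in the article.
    """
    bins = defaultdict(list)
    for distance, fare in price_points:
        bins[int(distance // bin_miles)].append(fare)
    return [((i + 0.5) * bin_miles, percentile_20(fares))
            for i, fares in sorted(bins.items())]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x (assumed method)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct distances to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_fare(distance_miles, base_usd=50.0, per_mile_usd=0.11):
    """Evaluate Fare = $50 + (Distance * $0.11) with the article's constants."""
    return base_usd + distance_miles * per_mile_usd
```

With the default constants, a 1,000-mile flight comes out at $50 + $110 = $160. Feeding the output of low_fare_by_distance into fit_line gives an intercept and slope that can be passed to estimate_fare in place of the defaults.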

So what happens if we divide our data by airline? How does the 11 cents per mile flown vary per carrier?

We analyzed the average cost per mile for fares grouped by airline, using the same methodology. We only considered competitive fares – those within two times the cheapest fare for that price search – to remove outlier price points. We also excluded airlines where we had insufficient data.
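The filtering and grouping just described could be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual pipeline: the record fields (airline, distance_miles, fare_usd), the minimum sample count of 30 and the use of a plain mean of fare divided by distance are all hypothetical choices. Only the "within two times the cheapest fare" rule comes from the text.

```python
from collections import defaultdict
from statistics import mean


def competitive_fares(search_results, max_ratio=2.0):
    """Keep only fares within max_ratio times the cheapest fare of one search."""
    if not search_results:
        return []
    cheapest = min(r["fare_usd"] for r in search_results)
    return [r for r in search_results if r["fare_usd"] <= max_ratio * cheapest]


def cost_per_mile_by_airline(searches, min_samples=30):
    """Average fare per mile for each airline across many price searches.

    `searches` is a list of searches; each search is a list of dicts with the
    hypothetical keys "airline", "distance_miles" and "fare_usd".
    Airlines with fewer than min_samples competitive fares are excluded
    (the threshold is an assumed value).
    """
    per_mile = defaultdict(list)
    for results in searches:
        for r in competitive_fares(results):
            per_mile[r["airline"]].append(r["fare_usd"] / r["distance_miles"])
    return {airline: mean(values)
            for airline, values in per_mile.items()
            if len(values) >= min_samples}
```

Sorting the returned dictionary by value would give a cheapest-to-dearest ranking of carriers. Note that this simple ratio folds the fixed $50 component into the per-mile figure, which makes carriers that fly mostly short routes look more expensive.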

The results are summarized below:

The results are fascinating, and there are some clear trends. Budget carriers such as Ryanair and AirAsia are at the low end of the scale; short-haul carriers operating turboprops, such as Regional Express and Darwin Airlines, are at the high end.

There are, however, many factors that can influence per-mile costs, including the type of aircraft flown, routes flown, local salary and fuel costs, ancillary revenue, and airport landing fees.

The results should also be taken with a grain of salt, since our sampling set is small, no statistical analysis has been performed, and the results may be biased depending upon the types of searches performed on Rome2rio. Also, Rome2rio may not always have access to the cheapest fares. A major, comprehensive meta-search player such as Kayak or Skyscanner could perform a more thorough analysis based on a far greater sample of search logs or their airfare caches. Nonetheless we wanted to share this data since we thought the results would be of interest to the travel industry, travel buffs, or anyone excited about big data.

NB2: Globe image via Shutterstock

NB3: Special thanks to Fenn Bailey from Adioso and Timothy O’Neil-Dunne for providing valuable feedback on the analysis.


Comments

  1. Martino @ WhichBudget

    @Kevin – Unjustifiably snarky, yes. But more of a cynic really. I’ll explain why…

    As I expected, the follow up comments did, indeed, clarify the potential use of such analysis. However, I have to agree with Timothy on this one: “If you are a shopper – caveat emptor. If you are an intermediary this is something you can use.” The nature of airfare search – with bundling/unbundling, yield management, loyalty programs, different pricing depending on location, etc. – is extremely complex and the airlines are deliberately keeping it that way. I have not yet seen a piece of software which is capable of interpreting this huge amount of data in the context of a complex search. Of course, it is possible and someone will crack it one day (thumbs up to Michael for taking the leap towards it), but I think at the moment it takes someone like Stuart to use the figures provided by big data crunching and use sense and experience to interpret it.

    To answer your question, at WhichBudget we actually did a similar analysis back in 2009 but felt that the overall picture (especially when you throw LCCs into the mix) was so complex that the data we came up with was just ninjas showing off. I do regret not turning it into an article though, it would have got us heaps of PR from Tnooz!

     
  2. Ereid @ Drungli.com

    It’s a pretty fascinating theory, but I have a question for Michael. I’d love to know Rome2rio’s data source for the low-cost airlines, since they don’t even show these airlines on their website.

     
  3. Stuart

    Enjoyed this article and comments.
    So if an average RTW is around 29k miles…
    29000 miles x .11 +50 = 3240 in funny money
    via XE.com today
    3,240.00 USD = 1,999.86 GBP
    So it should be around the £2k mark
    Averages shoulder season (and what’s an average) to about £1700 in the UK.
    Not a bad formula though.

    @Cheers Michael, food for thought.

     
  4. aib

    What is this “mile” you keep talking about?

     
  5. Timothy O'Neil-Dunne

    There are many applications of this data. It’s not that hard to figure out how it can be used. Obviously the disclaimer is “past history is no indicator of future behaviour” (joke). There are a couple of real truths that exist in travel. One of the basic ones is that the airlines (bless ’em) don’t want to have data in the wild that can remove the opacity of the prices. It’s a natural and normal requirement.

    The web is about transparency. We are still so early in the cycle that the ability to deliver information like this, and the far greater granularity of which Michael wrote, is a natural progression.

    Airlines need to make money. Delta’s yields are running at about 11%. Industry wide yields over the past year are running at about 3%. So clearly it is possible to make money. The absolute truth is that the market for airline seats is no longer demand side controlled in many markets. Particularly in the USA.

    I did an analysis (manually) on the top 20 markets for several years by channel by city pair by airline. The results were dramatically different then both on aggregate and obviously by component. But that was 1999-2002. Those days of fare wars and silly fares are long gone.

    Why is that? In free markets real price fluctuates. In oligopoly markets prices appear to be volatile to the consumer but not over time with hindsight.

    If you are a shopper – caveat emptor. If you are an intermediary this is something you can use. If you are a supplier – I would say this information is not pleasant to be exposed.

    Pick your poison 😉

    Cheers

     
  6. Kevin May

    @martino – I think you’re being a bit snarky, no? Michael and his team have just played around with some data and come up with some interesting results.

    The methodology is also interesting and will no doubt be applied to products or services in the future.

    So, let me turn the question on you: if you had the ability to play around with this type of data and establish such results, what would you do with it, especially with respect to WhichBudget?

    Or do you think there’s no point at all, it’s just data ninjas showing off? 😉

     
  7. John Pyle

    Having travelled on a number of these airlines I’d say your results are pretty accurate.

     
  8. Hillrider

    Michael,

    Pretty cool. I presume you looked at economy fares only and tossed business class/first class queries (you don’t state).

    Also very interesting that you found a linear fit; I was under the impression that the best fit was exponential, with the shorter flights being more expensive on a per-mile basis (even taking into account the intercept at $50) than longhauls due to the utilization of less-efficient (smaller) aircraft. However, I wonder if this effect is negated by the low-fare airlines, whose pricing structure is such that your analysis misses lots of the actual TRUE costs of flying (e.g. the extra bag fee, the ticketing fee, the drink fee, the snack fee, the fee to pay the fee etc.)

     
    • Michael Cameron

      Hillrider,

      Good question, looks like I left this detail out – we did indeed exclude non-Economy class fares.

      I think you’re right – the shorter flights are a mixture of less-efficient, turboprop services and budget carriers. Shorter flights are also typically domestic with lower taxes / fees, plus the additional ticket fees like you described coming into play. All this seems to lead to a more linear fit.

       
  9. Larry Smith

    I love the simple equation: Fare = $50 + (Distance * $0.11)
    It would make for a simple, fun mash-up app to budget for a trip, giving one an alternative view of what the fare should approximate.

    If you all have some more cycles, it would be very interesting to analyze by day of week and trip length.

     
    • Geert-Jan Brits

      Would make for a nice mashup indeed. You could easily arrive at statements like ‘20% less than expected’ and enable users to sort and/or filter on that.

       
      • Michael Cameron

        This is indeed what we’re planning. By using a more complex version of the model I describe here, we can layer indicative prices on all flight legs on Rome2rio.

        We’re also working on adding pricing information to all train, bus, ferry and taxi legs displayed by Rome2rio using a similar approach. The end result will be a total estimated door-to-door cost.

        Geert-Jan – you’re right; once the end consumer selects a date, there’s an opportunity there to say “this is a great/expensive fare, it’s X% less/more than expected” as you suggested.

        Larry – some more analysis of day of week or day of year sounds like a great idea.

         
        • Brandon

          So how can you go about getting more data? Can you poll it (APIs, screen scraping, etc)? Or does a third party have to request the fare, and you’re then “caching” (or recording the data) for later analysis?

           
  10. Martino @ WhichBudget

    Nice graph (airline logos always have a warming effect on me), but other than admiring the effort put into doing this research, I am not at all clear as to how this can help either with finding cheap flights or with business decision making. Am looking forward to follow-up comments to enlighten me.

     