Big Data and the infinite possibilities for the travel industry

If you’re a regular Tnooz reader, you will undoubtedly have read about the potential of big data in travel.

But as with most up-and-coming technologies, there is much confusion about what it all really means and how it can improve the customer experience and lead to more sales.

This article is meant to serve as a technical primer and will show some of the potential future applications across the travel ecosphere.

Big Data 101

The term “Big Data” is applied in several ways, but it is largely self-explanatory. Big data just means “a lot of data” (that is, datasets beyond the capabilities of a typical database in terms of size, workload and overall cost), together with the technologies that enable the extraction of meaning from it.

So where might we encounter a lot of data in travel?

A prime example would be the analytics logs of an online travel agency. For years, analytics tools have enabled companies to keep track of conversion funnels, detailed demographic statistics, and other pertinent information such as which pages convert the best, have the highest bounce rates, etc.

These reports are then used to optimize sites to ensure the best possible conversions. But in the era of big data, companies are gathering and paying attention to much more data than seemed possible just a few years ago.

For example, some sites are now gathering the detailed mouse movements of customers in real time as they move around pages.

This generates literally millions of coordinates and data points for every user, allowing companies unparalleled insights into what users are doing when they’re on a page.

Just a few years ago, storing the amount of data required for projects like this would’ve been prohibitively expensive.

These days, cheap storage and distributed file systems that spread data across dozens (or hundreds) of commodity computers make it possible to store petabytes of data efficiently and without massive cost.

As storage technology improves, the costs of keeping every last byte of data for later analysis will keep going down.

Big data analysis, MapReduce and the extraction of meaning

Having all this data is nice, but the real value lies in the extraction of meaning from it. Big data tools such as MapReduce, a programming model originally developed at Google, make it easier to discover common trends in data and to act on those trends.

This is more easily demonstrated by an example:

Imagine you had an Excel spreadsheet of every hotel in the world and wanted to find the ones most commonly described as “awesome”. [NB: Ed – Are you sure? 🙂 ]

The raw data you have might look something like the following:

Hotel Name, Review
“Hotel A”, “Miserable experience”
“Hotel B”, “Awesome pool!”
“Hotel C”, “Liked it”
“Hotel B”, “Awesome restaurants”
“Hotel A”, “Loved it”
“Hotel C”, “Awesome experience”
“Hotel B”, “Boring”

Even though we’re only showing a few records here for simplicity, this spreadsheet would have hundreds of thousands of rows.

With MapReduce, you could write a function that maps over each hotel name and groups together all the reviews found for that name.

In the above example, “Hotel B” occurs 3 times, so the Map function would create a collection resembling something like this:

“Hotel B”: “Awesome pool!”, “Awesome restaurants”, “Boring”

Already, the map function has helped us find every review for “Hotel B”. But we’re not done yet — we’ll let the Reduce function do its job.

This function lets us perform any type of analysis we want on the collection that Map created for us. In our case, we wanted to find only reviews containing the word “awesome.” Reduce would contain computer code that did this:

“If the review contains the word awesome, increment an internal counter by 1”

This internal counter would essentially act as a score, or number of times the word “awesome” appeared for a given collection. In our example, “Hotel B” would come out on top with 2 “awesome” reviews, “Hotel C” would come out with 1, and “Hotel A” with 0.
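For readers who want to see the steps in code, here is a minimal Python sketch of the example above. It runs in a single process rather than on a cluster, and the function names (map_review, shuffle, reduce_awesome, awesome_scores) are illustrative assumptions, not part of any real MapReduce library. One detail worth noting: in a real framework the grouping by hotel name is done by the framework itself, between the Map and Reduce steps, which is what the shuffle function stands in for here.

```python
from collections import defaultdict

# Sample rows from the article: (hotel name, review text).
REVIEWS = [
    ("Hotel A", "Miserable experience"),
    ("Hotel B", "Awesome pool!"),
    ("Hotel C", "Liked it"),
    ("Hotel B", "Awesome restaurants"),
    ("Hotel A", "Loved it"),
    ("Hotel C", "Awesome experience"),
    ("Hotel B", "Boring"),
]


def map_review(hotel, review):
    """Map step: emit one (key, value) pair per input row."""
    yield hotel, review


def shuffle(pairs):
    """Group values by key (a real framework does this automatically)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_awesome(hotel, reviews):
    """Reduce step: count the reviews that contain the word 'awesome'."""
    return hotel, sum(1 for review in reviews if "awesome" in review.lower())


def awesome_scores(rows):
    """Run map, shuffle and reduce in sequence over the input rows."""
    pairs = (pair for hotel, review in rows for pair in map_review(hotel, review))
    grouped = shuffle(pairs)
    return dict(reduce_awesome(hotel, reviews) for hotel, reviews in grouped.items())


scores = awesome_scores(REVIEWS)
print(scores)  # {'Hotel A': 0, 'Hotel B': 2, 'Hotel C': 1}
```

The same three functions would work unchanged on hundreds of thousands of rows; what a cluster adds is the ability to run many copies of the Map and Reduce steps in parallel on different machines.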

So what did we accomplish in the above example? We found commonality in raw, unstructured data and then analyzed it for a business purpose.

While simple, this is the raw power of MapReduce and similar technologies: Extracting meaning when there previously wasn’t any.

The amazing part of this is that MapReduce can analyze billions – or trillions – of data points and find patterns in them. All on clusters of commodity hardware, and at a cost that even a startup can afford.

Can you see the potential yet?

State of Big Data in travel

The ability of big data technology to find intelligence in vast amounts of data presents a clear, massive opportunity to reshape the way travel is marketed and sold to consumers.

There is no longer any question that companies with ongoing big data projects are at the forefront of an entirely new way to sell travel to customers.

Emmanuel Marchal, currently at big data storage company Acunu and a former director at LikeCube (a company that leveraged big data for consumer travel sites and was later acquired by TimeOut), spoke with me briefly about the current state of big data in the travel industry:

  • Big data applications are moving from profiling to true personalization. For example, true personalization would enable a site to recommend a specific hotel to a specific traveler based on their specific wants, needs, and previous purchase patterns, rather than a generic set of recommendations based on the type of traveler.
  • True personalization is the main driver and “holy grail” of all big data efforts in travel.
  • OTAs and other sellers of travel now see their big data efforts as “must haves” rather than “nice to haves”.
  • 2009 was the year of talking about it. 2010 was the year of starting big data projects. 2011 was the year of the prototypes. Will 2012 be the year these efforts finally see broad implementation?

In addition to personalization, efforts such as Hopper (NB: Disclosure – Hopper CEO Fred Lalonde is chairman of Tnooz) are demonstrating another prime example of how big data can empower consumers.

Large-scale analysis of data related to places, combined with natural language processing (NLP), enables search queries such as “nearby beach vacations under $500”. While this represents a combination of technologies, all of them are enabled by the application of big data processing.

Into the future

While personalization is surely a holy grail, there are literally thousands of potential uses of big data in travel.

Geo-fencing, the process of knowing when a traveler is near a certain attraction or vendor, is starting to emerge.

An example of this is the recently launched Foursquare Radar feature, which alerts you when you are near a place you previously asked to be reminded of.

This technology is pure big data: gathering your coordinates in real time via your mobile phone’s GPS and detecting when you are inside a certain boundary.

Enabled across millions of consumers, the amount of data gathering and processing required for efforts like this was unthinkable just a few years ago.

Think about it: how many GPS coordinates do you generate in a given day? The potential of geo-fencing to marketers is nothing short of amazing.
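As a rough illustration of the boundary check at the heart of geo-fencing, here is a minimal Python sketch. It assumes a circular fence defined by a centre point and a radius in metres, and measures distance with the haversine great-circle formula. The function names and coordinates are hypothetical, and nothing here is meant to describe how Foursquare Radar is actually built.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(position, center, radius_m):
    """Return True if position lies within radius_m metres of center."""
    return haversine_m(*position, *center) <= radius_m


# Hypothetical attraction with a 100-metre fence around it.
attraction = (40.0, -105.0)
nearby_traveler = (40.0005, -105.0)   # roughly 56 m north of the attraction
distant_traveler = (40.01, -105.0)    # roughly 1.1 km north of the attraction

print(inside_geofence(nearby_traveler, attraction, 100))   # True
print(inside_geofence(distant_traveler, attraction, 100))  # False
```

The check itself is cheap. The big data part is running it continuously for millions of travelers against millions of places, which is where the storage and processing techniques described earlier come in.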

Image data processing: Billions of photographs representing petabytes of storage are uploaded to the internet every day.

Photo sharing apps such as Instagram might seem a great way to share moments with friends, but the true value behind these startups lies in the data they’re collecting on the backend. Each photo generates a mountain of data that, with further analysis, reveals a host of information on the user uploading it.

Color, the much-maligned startup that debuted with a stratospheric valuation before even launching its product, surely wowed its investors with the potential of all the data it was hoping to collect.

In travel, the startup Jetpac is already using these image processing methods to put a different spin on the social travel concept.

Recently acquired by eBay, the startup Hunch represented one of the finest examples of finding commonality in previously disparate forms of data.

Their API and publicly available test tool allow anyone to find out things like “People who prefer Subaru cars also prefer to stay at 5 star resorts 40% of the time” (this is not a real example!).

While these types of correlations might seem to be of questionable use, imagine the power of putting these concepts to work when you’re trying to market and cross-sell to a traveler.

Taste Graphs built from correlations like these turn knowing “something” about your customer into knowing a lot of other things about that customer. The use and application of these Taste Graphs is sure to become much wider in the coming year in everything from online retailers to, you guessed it, travel.

To sum it all up, big data isn’t an amazing new magic box you’ll be able to buy from a vendor.

Big data merely sums up the concept that we each generate billions of data points every day and that with the economically viable application of technology, we can be sold to better than ever before.

Are you ready?

NB: Image via Shutterstock.


About the Writer :: Alex Kremer

Alex Kremer is co-founder and head of product at Redeam, an electronic ticketing platform serving the tours & activities industry. He was previously Senior Vice President of Partnerships at Nor1, a leading hospitality merchandising provider. He joined Nor1 after it acquired Flextrip, a B2B tours & activities distribution network he co-founded. Alex is a 15-year veteran of technology startup companies, previously co-founding Cruvee, a business intelligence company for the wine industry where he led Business Development. Prior to that, he co-founded FanAxis, one of the world's first fan club management and merchandising firms in the music and entertainment industries. Alex is based in Boulder, Colorado. Follow him on Twitter at axk.




  1. Big Data In Travel | The Data Exchange

    […] Read the full piece. […]

  2. Andrew Clegg

    The map-reduce example is a bit back-to-front. Mappers don’t do grouping themselves, and reducers are much more rarified than mappers, so you want to make the workload on each reducer as light as possible. This ensures that your algorithm remains scalable.

    The map function should grep the input string for the string “awesome” and output a 1 if it’s there, and a 0 if it’s not, resulting in mapper output like this:

    “Hotel A”,0
    “Hotel B”,1
    “Hotel C”,0
    “Hotel B”,1
    “Hotel A”,0
    “Hotel C”,1
    “Hotel B”,0

    Grouping by key happens on the way to the reducers automagically, so the reduce function receives the following:

    “Hotel A”,[0,0]
    “Hotel B”,[1,1,0]
    “Hotel C”,[0,1]

    Finally the reduce function just adds the list of integers for each group.

    Happy to help 🙂

    • Alex Kremer

      Thanks for the followup Andrew. The example was admittedly a bit over-simplified and more pseudo code than anything, but meant to demonstrate how the technology works on a process level. I’ll be the first to defer to the real geeks lurking here on Tnooz 🙂

  3. William El kaim

    Most of the time people tend to associate Big Data with divide-and-conquer algorithms (MapReduce, Hadoop). But parallel computing is only one way of doing Big Data. Another is to use Big Data for data mining and forecasting, trying to discover new facts, new segments, etc.

    For me the best examples of Big Data in travel are FlightCaster, Fare Prediction (fareCompare now Bing Travel, Sabre) or Fare Tracking (Yapta). They all share common features: lots of data, parallel computing for running complex algorithms and advanced visualization tools to crunch the results.

    I’m convinced that companies will more and more leverage and value data offered for free (open data movement) like RITA BTS and their internal data, in order to use them in their applications. Trip planning tools should evolve quickly for example …

    The value is no more in the “raw data” but in the way data are aggregated, segmented and enriched in context. Building your own technical platform and looking for the right quality data is key. You will also need new (rare) skills internally…

  4. Jonathan Meiri

    Great post!

    There is a bunch of information already out there, and the challenge is to distill it to actionable insights.

    From our experience at Superfly (a personalized flight search service) this is what we learned:

    – Personalization must be implicit – users will not fill any information
    – Small set of super relevant and bookable options that take into account different modes of travel (business vs pleasure)
    – Delivered at the right point in the user experience

    Finally, and this may come as a surprise, we are seeing the social graph taking a back seat in this process. With a very high average number of friends per user, and the increasing range of travel preferences among friends, we found that elite status is a much better indicator for preferences than your facebook graph.

  5. Kuan Sng

    With such horsepower and the payoff promise from leveraging hitherto unknown insight comes the spectre of “Big Disinformation” and system gaming. As many potential paths exist to “Eureka!”, there lie landmines, IEDs and hooded ne’er-do-wells waiting to siphon your investment dollars (at best) or derail your business (at worst).

  6. susan hopley

    This is a notion finally taking hold. The issue is monetizing data, something we directly address at The Data Exchange, where data, particularly travel data, can be gathered and traded at the field level. There is no question but that producers of data need to make a return on all they expend in generating data in the first place. Nowadays, it is the data that is more valuable than the transaction of removing a booking from the “shelf,” i.e. booking source. What is far more valuable, as the retail industry well knows, is what to put on the shelf in the first place and where to put it. The Data Exchange is the enabler.

  7. Dennis Schaal

    Sean O'

    Smart article!

  8. Jim Kovarik

    This is actually something we’re very focused on at C2G, and as we’ve just surpassed over 1MM vehicles that users have entered on our site we’re able to pull together some very interesting data similar to the examples above.

    For example, we recently examined 10,000 car trips to the Disney parks to see the top vehicles people are driving to Disney World and Disneyland. Turns out it’s the Dodge Caravan (you can see the full results here

    This is only scratching the surface, and completely agree that personalization will become increasingly sophisticated (we think that personalizing car trips to the car you are driving is interesting)

  9. Personalizarea informatiei in turism | Blog Turism

    […] refrain from recommending to you (once again) a very good article on Tnooz about “Big Data”. It is not about the big brother of Data (it might be related to Big Deal […]

