Screen scraping bots banished by latest startup targeting travel vertical

Distil Inc., an inaugural member of TechStars Cloud in early 2012, has built a sophisticated system for companies looking to protect their content from automated scraping.

Screen scraping is the process of deploying automated bots to gather website content used to seed other services, fuel competitive intelligence, and aggregate product details like pricing, features and inventory.
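To make the mechanics concrete, a bare-bones scraper needs little more than an HTTP client and an HTML parser. The sketch below is a minimal, hypothetical illustration (the URL and the ".fare-price" selector are invented placeholders), not any particular company's bot:

    # Minimal illustration of screen scraping: fetch a page and pull out fares.
    # The URL and the ".fare-price" CSS selector are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    def scrape_fares(url):
        # Many scrapers masquerade as ordinary browsers via the User-Agent header.
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Collect the text of every element the site uses to display a fare.
        return [node.get_text(strip=True) for node in soup.select(".fare-price")]

    if __name__ == "__main__":
        for fare in scrape_fares("https://example.com/flights?from=JFK&to=LHR"):
            print(fare)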

Screen scraping is ubiquitous, and ubiquitously contentious. It has been the subject of high-profile lawsuits across industries (see Craigslist v. PadMapper), and, in the travel space, has spawned epic feuds (see Ryanair v. Everyone).

Companies hate having their data and content scraped by bots, claiming copyright infringement and unfair business practices. Businesses reliant on scraping, meanwhile, claim freedom of information and fair use.

Regulators have weighed in on some cases, such as Google’s recent concession to stop scraping review sites and placing that content in its search results. Courts have been involved in others, most recently in the aforementioned Ryanair case, where a court refused to grant an injunction against Budget Travel’s use of screen scrapers on Ryanair’s sites.

Regardless, screen scraping continues to be popular, a boon for some and an immense frustration for others. Distil has built a far more formidable wall between these two camps by providing screen scraper blocking via a SaaS model.

The company claims benefits not only in protecting proprietary content and inventory, but also in saving server resources by blocking resource-intensive bad bots. By keeping bots off the site, the owner also preserves the integrity of pricing, inventory and other competitive advantages.

Distil was founded two years ago by veterans of a cloud security company. When CEO and co-founder Rami Essaid kept fielding requests for a bot blocker, he researched the options and quickly realized there was an unfulfilled need in the market.

The company is venture-backed, having recently closed a $1.8 million financing round.

Distil sees great potential in the travel sector, which it estimates internally to be a $400 million market opportunity. According to the company, the scraping problems plaguing the travel industry include:

  • Loss of Sales and Ancillary Revenue
  • Decreased Traffic and Visitor Engagement
  • Deflated Brand Appreciation
  • Increased Network and Bandwidth Costs
  • Increased Latency and Diminished User Experience

Distil is also targeting the Digital Publishing, Directory/Classified, eCommerce, and Social Media/Forums verticals. In aggregate, Distil estimates the market for bot blocking at over $2 billion – especially when considering the enterprise side of the market.

Read on for the Tnooz Q&A with CEO and co-founder Rami Essaid.

What does the competitive landscape for your product look like?

There are currently two companies that claim to address content scraping: Siteblackbox and Sentor. Both utilize a hardware-based approach to tackle the problem. They are very difficult to integrate, do not block bots in real time, slow down a company’s website, and cost 3-10x what Distil does.

What is your revenue model and strategy for profitability?

Distil uses a SaaS model with a recurring monthly fee. We have two types of customers. The first is the small to mid-sized inbound customer. This customer type comes through marketing campaigns including email outreach, PPC, webinars, social media marketing, etc. They are very low-touch customers that go through automated provisioning and quickly convert after the trial.

The second customer type is the larger enterprise customer that requires an outbound sales effort. Enterprise customers need more hand holding and a longer sales cycle, but they account for much higher revenues.

Distil also has channel partnerships established with Rackspace, Softlayer, and several other small hosting providers that provide inbound leads.

Describe what your start-up does, what problem it solves (differently to what is already out there) and for whom?

Distil helps companies identify and block malicious bots and programs trying to access their web pages and databases. Distil makes real-time decisions and seamlessly distinguishes human visitors from web scrapers, with no false positives. Distil also accelerates content through 15 global nodes, improving page load times and reducing server load.
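Distil does not disclose the internals of its detection engine, so the following is only a naive sketch of the general category of technique, telling clients apart by request rate; the thresholds and names are illustrative assumptions, and a production system would combine many more signals (headers, JavaScript challenges, behavioural analysis):

    # Naive rate-based heuristic for flagging likely bots (illustrative only).
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 10   # assumed sliding window
    MAX_REQUESTS = 30     # assumed per-client request budget within the window

    _request_log = defaultdict(deque)  # client IP -> recent request timestamps

    def looks_like_bot(client_ip):
        """Return True if this client exceeded the request budget for the window."""
        now = time.time()
        history = _request_log[client_ip]
        history.append(now)
        # Drop timestamps that have fallen outside the sliding window.
        while history and now - history[0] > WINDOW_SECONDS:
            history.popleft()
        return len(history) > MAX_REQUESTS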

Why should people or companies use your startup?

With the advent of aggregated travel search sites like Kayak, Hipmunk and Expedia, the travel industry has exposed large amounts of pricing data.

Gigabytes of information are available and presented to increasingly price-sensitive consumers at the click of a button. In an attempt to win this intense pricing war, some travel sites scrape data from both competitors’ and contributors’ sites. This automated scraping leads to competitors duplicating content, confusing customers, and pricing each other out of the market.

Distil helps companies stop this by detecting and preventing malicious web scraping and content theft in real-time.

Other than going viral and receiving mountains of positive PR, what is the strategy for raising awareness and getting customers/users?

We want to bring awareness to the problem, not to our company. Many websites that are suffering from competitive price scraping or data theft are not even aware that it is happening to them.

We continue to publish white papers and case studies, conduct webinars, and speak at conferences to evangelize the problem. Once companies recognize they have an issue, we believe they will choose Distil on their own as the best solution on the market.

How did your initial idea evolve? Were there changes/any pivots along the way? What other options have you considered for the business if the original vision fails?

Originally, Distil was founded to help digital publishers stop web scrapers from stealing their content. We soon realized that scripts and bots were harming more than just publishers, so we started targeting other industries as well to help them stop web scraping.

Distil is now in beta with a new product that leverages our anti-bot platform to help companies prevent fraud caused by bots and programs.

Where do you see yourselves in three years’ time, and what specific challenges do you hope to have overcome?

Distil’s mission is to make the web more secure. In three years’ time, we anticipate offering a suite of solutions that help web companies combat web scraping, fraud, and other security-related issues.

What is wrong with the travel, tourism and hospitality industry that requires another startup to help it out?

The travel industry is at risk of self-cannibalization. Too many websites simply aggregate pricing and content without providing any new features that help drive the industry forward. By protecting pricing and content, we want to help the industry concentrate on building new functionality that improves the user experience, rather than just fighting to be the cheapest.

Tnooz view:

As digital publishers ourselves, we understand the value of protecting intellectual property and proprietary content. Screen scrapers enable not only shady businesses, but a sort of innovation stagnation that doesn’t allow for much industry evolution.

Protecting content at such an affordable rate has the potential to truly shake things up. If every mid-sized publisher begins to protect its site from scraping, there would be a seismic shift away from aggregators and towards the actual content creators.

It will also force those focused on curating content from original publishers to either start contributing original content, develop direct partnerships with content owners, or close down shop.

The business is also scalable beyond the travel vertical, and has the potential to disrupt scraping-as-business-model across verticals. If a business can no longer gain inventory by scraping, it will be forced, just like content curators, to either develop direct relationships with inventory owners (i.e. pay them for inventory), create its own inventory feeds, change its model, or close up shop completely.

A bad-bot banisher is a valuable tool for maintaining a defensible competitive advantage, and we’re eager to see just how an affordable bot blocker changes the landscape Web-wide.

 

 
 

About the Writer :: Nick Vivion

Nick Vivion is a reporter for Tnooz, based in New Orleans, USA.

His passion for travel technology led him to travel around the world shooting travel videos for Current TV and Lonely Planet TV in 2006 and 2007.

He shot on Mini-DV, edited on a white MacBook, uploaded and shared online as he traveled. His moxie for travel video has resulted in over two million views on his YouTube partner channel.

In addition to travel, Nick co-founded one of the web’s most talked-about LGBT media sites, Unicorn Booty, and has gone "blog-to-brick" with a bricks-and-mortar restaurant called Booty's Street Food in New Orleans – serving street food from around the world.

 

Comments


  1. Miramon

    I doubt that resource consumption is a serious motivation for this firm or their software. When Google indexes a typical site — downloading every linked page that is allowed by robots.txt — there is no perceivable impact, after all. You could have a dozen well-behaved bots running on your site at the same time with no noticeable performance hit.

    Of course, as you mention, this is really to stop webscrapers from taking key content and data off the website. But apart from radical and hopelessly unusable approaches like replacing all text with images, most any means that is employed by the firm attempting to protect the content can be defeated by a clever enough bot, because after all human end-users have to be able to use the website themselves, and have to be able to navigate from page to page, fill out the search forms, etc. etc. etc.

    Frankly, I think a better approach is to post “No webscraping” very plainly in your terms of service, set up a robots.txt file to exclude law-abiding well-behaved bots, and send your lawyers out after violators when you find your content has been stolen and replicated.
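    For the robots.txt side of that approach, Python's standard library already ships a parser; the sketch below (with a placeholder domain and bot name) shows how a well-behaved crawler is expected to consult the file before fetching anything:

        # How a law-abiding bot honours robots.txt; domain and user agent are placeholders.
        from urllib.robotparser import RobotFileParser

        robots = RobotFileParser()
        robots.set_url("https://example.com/robots.txt")
        robots.read()  # download and parse the file

        if robots.can_fetch("ExampleBot", "https://example.com/fares/"):
            print("Allowed to crawl this path")
        else:
            print("Disallowed by robots.txt; a well-behaved bot stops here")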

     
    • Brant

      Miramon, you clearly don’t understand the dynamics. They are scraping factual data and publishing it. Just as I am able to communicate verbally a discounted price to you for a given hotel, I am also able (and legally should be) to communicate that price by other means of communication. I think it’s very normal, automated or otherwise. Heaven forbid that I tell my friends of the good price on potatoes at the local market.

       
      • Miramon

        What don’t I understand? I clearly stated that webscrapers are collecting factual data and publishing it, just as you said. This is not a crime or even a tort in itself because data is not subject to copyright.

        However, deliberately accessing a particular website has been found to be subject to website terms of service — I personally disagree with this notion of an implied contract you haven’t even read or signed, but the courts seem to think it’s OK. And so if some real company with real assets bases their business on violating those terms, they may have a bad day in court some time down the line — pace Ryanair, who seems to have been rebuked in their attempts to suppress certain scrapers, possibly for contingent reasons having to do with their particular situation.

         
  2. Brant

    As sites find new ways to protect their content from scrapers, the importance of having direct relationships with providers in order to develop and maintain value becomes clearer and clearer. Even a small amount of effectiveness in disrupting scrapers is effectively 100% effective, since the scraped data has to be relevant, accurate, complete, and up to date to provide anything of value.

     
 
 
