
Semantic web in travel February 26, 2008

Posted by jeremyliew in data, meaning, semantic web, structure, travel.

I saw today that Radar raised a Series B for its semantic web application. As I’ve noted in the past, I am a believer in approaching the semantic web top down rather than bottom up, i.e. by inferring structure from domain knowledge rather than requiring all websites to mark up their content in RDF. The user doesn’t care about the semantic web (just as they don’t care about wikis or web 2.0 or tagging); all they care about is getting to the things they want more quickly. The mechanisms that we use to create this better experience should be invisible to the user.

Two companies that are taking this approach are doing it in travel. Travel is a good vertical to start in for three reasons: (i) lots of users, (ii) a well-defined universe of data, and (iii) easy monetization.

The first of these is Tripit. Tripit takes travel confirmation emails from multiple sources and creates a master itinerary. As Mike Arrington noted in Techcrunch:

It’s dead simple to use and it keeps you organized – all you have to do is forward confirmation emails to them when you purchase airline tickets, hotel reservations, car rentals, etc. Tripit pulls the relevant information out of the emails and builds an organized itinerary for you. You can send emails in any order, for multiple trips, whatever. It just figures everything out and organizes it.

This is a great example of the semantic web being used to improve a user’s experience, invisibly. The user neither knows nor cares that Tripit is inferring structure from the emails (e.g. SFO is an airport in San Francisco, the Clift is a hotel in San Francisco, and since your reservation at the Clift starts on the same day as you arrive at SFO, Tripit will automatically offer driving directions from SFO to the Clift). All the user knows is that they automagically have a single itinerary, compiled and supplemented with other relevant information (e.g. maps, weather).
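To make the idea concrete, here is a minimal sketch of this kind of inference. It is not Tripit’s actual implementation: the `PLACES` lookup table, the booking format, the dates and the `build_itinerary` function are all assumptions made up for illustration. It only shows how a little domain knowledge (which city a place is in, and whether it is an airport or a hotel) lets you turn unordered bookings into an itinerary with a suggested next step.

```python
from datetime import date

# Hypothetical domain knowledge: what kind of place each name is, and its city.
PLACES = {
    "SFO": {"kind": "airport", "city": "San Francisco"},
    "Clift": {"kind": "hotel", "city": "San Francisco"},
}


def build_itinerary(bookings):
    """Sort bookings by date and infer helpful suggestions from them."""
    items = sorted(bookings, key=lambda b: b["date"])
    arrivals = [b for b in items if PLACES[b["place"]]["kind"] == "airport"]
    stays = [b for b in items if PLACES[b["place"]]["kind"] == "hotel"]

    suggestions = []
    for flight in arrivals:
        for stay in stays:
            same_day = flight["date"] == stay["date"]
            same_city = (
                PLACES[flight["place"]]["city"] == PLACES[stay["place"]]["city"]
            )
            # Arriving and checking in on the same day in the same city:
            # the traveler probably needs to get from one to the other.
            if same_day and same_city:
                suggestions.append(
                    f"Directions from {flight['place']} to {stay['place']}"
                )
    return items, suggestions


# Bookings can arrive in any order; the dates here are invented.
bookings = [
    {"place": "Clift", "date": date(2008, 3, 1)},
    {"place": "SFO", "date": date(2008, 3, 1)},
]
items, suggestions = build_itinerary(bookings)
print(suggestions)
```

The user never sees the lookup table or the rule; they only see the suggestion.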

The second is Kango. Kango helps travelers decide where they want to go by crawling 10,000 sites and 18,000,000 reviews and organizing that content semantically. As Erick Schonfeld of Techcrunch notes:

But what’s promising about Kango is the way it slices up search subjectively. Kango is building a semantic search engine focussed narrowly on travel. It parses the language in all of those reviews and guides, and categorizes them by generating tags for them. “You cannot wait for users to add tags, you have to derive them,” says CEO Yen Lee. So hotels that have been reviewed across the Web (on sites like Yahoo Travel, TripAdvisor, or Yelp) with words such as “perfect,” “relaxing,” “couples,” “honeymoon,” or “spa” would rank higher in a search for romantic travel. Hotels associated with the words “kitchen,” “pool,” and “kids,” would rank higher in a search for family trips.

Again, the semantics are being applied in a way that is invisible to users. Users don’t need to know how keywords in reviews are mapped to characteristics like “family” or “romantic”. The company uses its domain knowledge to make this transparent to the user.
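A rough sketch of deriving tags from review text might look like the following. This is not Kango’s actual method, which is surely more sophisticated: the keyword lists are taken from the quote above, while the scoring (a simple keyword count), the function names and the sample reviews are assumptions for illustration.

```python
import re
from collections import Counter

# Keyword lists taken from the quote above; a real system would have many more.
TAG_KEYWORDS = {
    "romantic": {"perfect", "relaxing", "couples", "honeymoon", "spa"},
    "family": {"kitchen", "pool", "kids"},
}


def derive_tag_scores(reviews):
    """Count how often each tag's keywords appear across a hotel's reviews."""
    words = Counter()
    for review in reviews:
        words.update(re.findall(r"[a-z]+", review.lower()))
    return {
        tag: sum(words[keyword] for keyword in keywords)
        for tag, keywords in TAG_KEYWORDS.items()
    }


def rank_hotels(hotel_reviews, tag):
    """Return hotel names ordered by their derived score for the given tag."""
    scores = {
        name: derive_tag_scores(reviews)[tag]
        for name, reviews in hotel_reviews.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Invented sample data.
hotel_reviews = {
    "Hotel A": ["Relaxing spa, perfect for couples on a honeymoon."],
    "Hotel B": ["Great pool for the kids, and our room had a kitchen."],
}
print(rank_hotels(hotel_reviews, "romantic"))
print(rank_hotels(hotel_reviews, "family"))
```

Nobody had to tag either hotel by hand; the tags fall out of the reviews plus a bit of domain knowledge about which words signal which kind of trip.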

Expect to see more such semantic web approaches to specific verticals.

Meaning = Data + Structure October 22, 2007

Posted by jeremyliew in data, meaning, semantic web, structure, user generated content.

Through Techcrunch, I saw the video “Information R/evolution” embedded below (5 minutes, worth watching):

The video’s key message is that when information is stored digitally rather than in the material world, our assumptions about how we get to information, and how information gets to us, are substantially disrupted. This allows for high quality (and high quantity) content that is generated, organized, curated and disseminated by users.

It’s an entertaining video and spot on. However, I think it glosses over one key point about making information truly useful. User generated content, often unstructured, can be very hard to navigate and search through. Adding structure makes the data vastly more meaningful.

Search engines are the best example of how adding structure (a search index) to an unstructured data set (the list of all websites) makes the dataset more useful. Whether that structure is established by link popularity (as Google and all modern search engines do) or by human editors (as Yahoo started out) affects the size and quality of the structure, but even a rudimentary structure built by humans is better than no structure at all.
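As a toy illustration of what “adding structure” means here, the sketch below builds an inverted index, a map from each word to the pages that contain it. The page names and text are invented, and a real search engine layers ranking (such as link popularity) on top of something like this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of pages whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


# Invented pages standing in for "the list of all websites".
pages = {
    "a.example": "cheap flights to paris",
    "b.example": "paris hotel reviews",
}
index = build_index(pages)
print(sorted(index["paris"]))
```

Without the index, answering a query means reading every page; with it, the answer is a single lookup.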

Social networks are another great example of how adding structure (a social graph) to an unstructured data set (personal home pages) improves the data’s usefulness. There were plenty of successful examples of personal home pages and people directories in the late 90s, including Tripod and AOL’s People Connect, but none of them had the high levels of user engagement that MySpace, Facebook, Bebo and the current generation of social networks have.

One of the key themes of Web 2.0 has been the rise of user generated content. Often this content has been largely unstructured. Unstructured data is hard to navigate by search – you need to rely on the text, and that can be misleading.

Take one of my favorite websites, Yelp, as an example. If I do a search for diabetes near 94111, I get one relevant result (i.e. a doctor) in the top 10 – the rest of the results range from tattoo parlors to ice cream parlors, auto repair to sake stores. All contain the word “diabetes” in a review, some humorously, others incidentally.

This isn’t a one-off either; try baseball mitt, TV repair or shotgun. In every case, the search terms show up in the text of the review, which is the best that you can hope for with unstructured data.
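The difference is easy to see in a small sketch. The listings, review text and the `TERM_CATEGORIES` mapping below are all invented for illustration (this is not how Yelp’s search works); the point is only to contrast matching on raw text with matching on a structured field.

```python
# Invented listings: each has a structured category plus free-text reviews.
listings = [
    {
        "name": "Example Clinic",
        "category": "doctor",
        "reviews": ["Helped me manage my diabetes."],
    },
    {
        "name": "Example Ice Cream",
        "category": "ice cream",
        "reviews": ["So sweet it could give you diabetes!"],
    },
    {
        "name": "Example Tattoo",
        "category": "tattoo parlor",
        "reviews": ["The artist asked about diabetes before starting."],
    },
]

# Hypothetical domain knowledge: which categories a search term is really about.
TERM_CATEGORIES = {"diabetes": {"doctor"}}


def text_search(term):
    """Unstructured search: match any listing whose reviews mention the term."""
    return [
        listing["name"]
        for listing in listings
        if any(term in review.lower() for review in listing["reviews"])
    ]


def structured_search(term):
    """Structured search: match listings in the categories the term maps to."""
    categories = TERM_CATEGORIES.get(term, set())
    return [
        listing["name"] for listing in listings if listing["category"] in categories
    ]


print(text_search("diabetes"))
print(structured_search("diabetes"))
```

The text search returns all three businesses; the structured search returns only the clinic.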

Recently I’ve become intrigued by companies that are adding structure to unstructured data. There seem to be at least three broad approaches to this problem:

1. User generated structure
2. Inferring structure from knowledge of the domain
3. Inferring structure from user behavior.

I’m not smart enough to know if this is the semantic web or web 3.0, or even if the labels are meaningful. But I do know that finding ways to add structure to data, or to infer structure from it, is going to improve the user experience, and that is always something worth watching for.

I’m going to explore the three broad approaches that I’ve seen in subsequent posts, but would love to hear readers’ thoughts on this topic.

I’ve found this post on the structured web by Alex Iskold to be very helpful in thinking about this topic.