
Semantic web in travel February 26, 2008

Posted by jeremyliew in data, meaning, semantic web, structure, travel.
5 comments

I saw today that Radar raised a Series B for its semantic web application. As I’ve noted in the past, I am a believer in approaching the semantic web top down rather than bottom up, i.e. by inferring structure from domain knowledge rather than requiring all websites to mark up their content in RDF. The user doesn’t care about the semantic web (just as they don’t care about wikis or web 2.0 or tagging); all they care about is getting to the things they want more quickly. The mechanisms that we use to create this better experience should be invisible to the user.

Two companies that are taking this approach are doing it in travel. Travel is a good vertical to start in for three reasons: (i) lots of users, (ii) a well defined universe of data, and (iii) ease of monetization.

The first of these is Tripit. Tripit takes travel confirmation emails from multiple sources and creates a master itinerary. As Mike Arrington noted in Techcrunch:

It’s dead simple to use and it keeps you organized – all you have to do is forward confirmation emails to them when you purchase airline tickets, hotel reservations, car rentals, etc. Tripit pulls the relevant information out of the emails and builds an organized itinerary for you. You can send emails in any order, for multiple trips, whatever. It just figures everything out and organizes it.

This is a great example of the semantic web being used to improve a user’s experience, invisibly. The user neither knows nor cares that Tripit is inferring structure from the emails (e.g. SFO is an airport in San Francisco, the Clift is a hotel in San Francisco, and since your reservation at the Clift starts on the same day as you arrive at SFO, Tripit will offer driving directions from SFO to the Clift automatically, etc.). All the user knows is that they automagically have a single itinerary compiled and supplemented with other relevant information (e.g. maps, weather, etc.).
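
To make the idea concrete, here is a toy sketch (not Tripit’s actual implementation) of that kind of inference: pull an arrival airport and a hotel check-in out of two confirmation emails and, if they land in the same city on the same date, link them on one itinerary. The email formats and the lookup tables are invented for illustration.

    import re
    from datetime import date

    AIRPORTS = {"SFO": "San Francisco"}   # assumed airport-to-city table
    HOTELS = {"Clift": "San Francisco"}   # assumed hotel-to-city table

    flight_email = "Flight UA 123 arriving SFO on 2008-02-26"
    hotel_email = "Reservation at the Clift, check-in 2008-02-26"

    def parse_flight(text):
        m = re.search(r"arriving (\w{3}) on (\d{4}-\d{2}-\d{2})", text)
        return {"airport": m.group(1),
                "city": AIRPORTS[m.group(1)],
                "date": date.fromisoformat(m.group(2))}

    def parse_hotel(text):
        m = re.search(r"at the (\w+), check-in (\d{4}-\d{2}-\d{2})", text)
        return {"hotel": m.group(1),
                "city": HOTELS[m.group(1)],
                "date": date.fromisoformat(m.group(2))}

    flight, hotel = parse_flight(flight_email), parse_hotel(hotel_email)
    # same city, same day: the itinerary can offer directions automatically
    if flight["city"] == hotel["city"] and flight["date"] == hotel["date"]:
        print(f"Offer driving directions: {flight['airport']} -> {hotel['hotel']}")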

The second is Kango. Kango helps travelers decide where they want to go by crawling 10,000 sites and 18,000,000 reviews and organizing that content semantically. As Erik Schonfeld of Techcrunch notes:

But what’s promising about Kango is the way it slices up search subjectively. Kango is building a semantic search engine focussed narrowly on travel. It parses the language in all of those reviews and guides, and categorizes them by generating tags for them. “You cannot wait for users to add tags, you have to derive them,” says CEO Yen Lee. So hotels that have been reviewed across the Web (on sites like Yahoo Travel, TripAdvisor, or Yelp) with words such as “perfect,” “relaxing,” “couples,” “honeymoon,” or “spa” would rank higher in a search for romantic travel. Hotels associated with the words “kitchen,” “pool,” and “kids,” would rank higher in a search for family trips.

Again, the semantics are being applied in a way that is invisible to users. Users don’t need to know how keywords in reviews are mapped to characteristics like “family” or “romantic”. The company uses its domain knowledge to make this transparent to the user.
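
The quoted passage suggests a simple mechanism: score each listing by how many of a category’s keywords show up in its reviews. Here is a toy sketch of that idea, with made-up keyword lists and reviews (Kango’s real pipeline is certainly more involved):

    CATEGORY_KEYWORDS = {
        "romantic": {"perfect", "relaxing", "couples", "honeymoon", "spa"},
        "family":   {"kitchen", "pool", "kids"},
    }

    reviews = {
        "Hotel A": "A relaxing spa, perfect for couples on a honeymoon.",
        "Hotel B": "Great pool for the kids and a kitchen in every room.",
    }

    def derive_tags(text):
        words = set(text.lower().replace(",", " ").replace(".", " ").split())
        # score each category by how many of its keywords appear in the text
        return {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}

    for hotel, text in reviews.items():
        print(hotel, derive_tags(text))
    # Hotel A {'romantic': 5, 'family': 0}
    # Hotel B {'romantic': 0, 'family': 3}

A search for romantic travel would then rank Hotel A above Hotel B, without either hotel ever having been tagged by a user.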

Expect to see more such semantic web approaches to specific verticals.

2008 Consumer Internet Predictions December 3, 2007

Posted by jeremyliew in 2008, ad networks, advertising, casual games, Consumer internet, games, gaming, mmorpg, predictions, semantic web, social media, social networks, structure, user generated content, video.
16 comments

Last year I made some predictions about the consumer internet in 2007 and they were at least directionally correct. So let me take a crack at 2008. Regular readers will not be surprised at some of my predictions as they are themes that I’ve been talking about for some time. Later in the week my colleagues will take a crack at predictions for Mobile, Infrastructure and Cleantech.

1. Social Media advertising, Online Video advertising and In-Game advertising start to become scalable.

Social media, online video and games are at early stages of development as advertising vehicles. Even more than the internet at large, a disproportionately small percentage of advertising dollars is being spent on these three media relative to time spent. Some people have even questioned whether social media will be a media business at all, or whether online video is a good way to monetize.

The slow start is because there are no standards yet in any of these media. If an advertiser wants to buy TV advertising across NBC, CBS, ABC and FOX, she can buy a common unit, the 30-second spot. If she wants to buy print advertising across Time, Fortune, Forbes, Newsweek and Businessweek, she could similarly buy a common unit (e.g. a full page ad). But to buy across YouTube, Metacafe and Break, or across Myspace, Facebook and Bebo, or across GTA, Wild Tangent games and Pogo.com games, she needs to buy custom ad units in each property. This makes ad sales look more like business development – she is negotiating not just price, demographics and reach, but also what the actual units are. This is what makes new forms of advertising so hard. All three industries need ad unit standards to be able to scale. Otherwise they will be trapped by demands for customization.

This year, standards will start to emerge in each medium. Some candidates for standards include (i) for social media: behavioral targeting, content targeting, demographic targeting or social ads; (ii) for online video: contextual targeting, overlays or pre-roll; and (iii) for in-game advertising: rich media or product placements. I don’t know which of these candidates will become standards, but I am confident that we will start to see growing support from both advertisers and publishers for the more successful units.

Ad networks will also gain share in each medium, helping make the process of both buying and selling advertising easier.

Viewed through this lens, Facebook’s recent Beacon launch and subsequent adjustments are simply early moves towards figuring out what will be the native social media standard.

2. Structured web emerges.

The last couple of years have seen an explosion of user-generated content, across blogs, social networks, social media sites and user reviews. Previously, when most web content was created by editors, there was good structure and metadata around it. As most of the user-generated content has been unstructured, there has been an overall decrease in the level of structure, and hence a decrease in the ease with which people and computers can access and use this data.

But Meaning = Data + Structure. Search on user-generated sites has not been a great experience so far. This year we should start to see some point solutions emerge to help add structure to unstructured data, substantially improving the user experience. This will include both explicit (user-generated structure) and implicit (inferring structure from domain knowledge or user behavior) methods.

3. Games 2.0

Tens of millions of users are now using casual immersive worlds and playing MMOGs. These sites are some of the stickiest on the web, resulting in some of the highest levels of time spent per month online, and indicating that this is becoming a primary form of online communication for some users. Many of these users skew young, and if you believe that demographics is destiny, then you will expect this behavior to spread. The social aspects of these games are key to their popularity.

Even more people are playing casual games online. These people often don’t have the ability to commit the time that MMOGs demand. They want to play with their friends, but instead of spending hours online together, they want to do it on their own schedule and in bite sized chunks.

These trends are likely to come together in asynchronous multiplayer games.

Other key drivers of growth for these products will include innovation in business models (free-to-play, ad-based and digital goods-based models) and channels (in-browser gaming, mobile, widgets).

Note – this post is cross-posted to Venturebeat.

Meaning = Data + Structure: Inferring structure from user behavior November 19, 2007

Posted by jeremyliew in attention, data, semantic web, structure, user generated content.
11 comments

A little while ago I started a series about the structured web where I claimed that Meaning = Data + Structure. I followed up with a couple of posts on ways that structure can be added to user-generated content, through user-generated structure, and through inferring structure from domain knowledge. The third way that structure can be inferred is from user behavior, otherwise known as attention. As Wikipedia notes:

Attention economics is an approach to the management of information that treats human attention as a scarce commodity, and applies economic theory to solve various information management problems.

Alex Iskold has a good overview of the attention economy elsewhere at ReadWriteWeb.

By watching user behavior, by inferring intent and importance from the gestures and detritus of actions taken for other purposes, you can sometimes also infer structure in unstructured data. Google does this with its PageRank algorithm, Del.icio.us uses individual bookmarking to build a structured directory of the web, and Xobni maps social networks through analysis of your emailing patterns. Behaviorally targeted advertising is based on the assumption that users display their interests through the websites they visit.
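
As a toy illustration (not how any of those products actually work), here is a sketch that aggregates individual users’ bookmark tags into a rough directory: each save is a small vote, made for the user’s own purposes, that ends up structuring the data for everyone else. The bookmarks are invented.

    from collections import Counter, defaultdict

    # (user, url, tags the user chose for their own later retrieval)
    bookmarks = [
        ("alice", "example.com/recipes", ["cooking", "dinner"]),
        ("bob",   "example.com/recipes", ["cooking"]),
        ("carol", "example.com/gpus",    ["hardware", "graphics"]),
    ]

    directory = defaultdict(Counter)
    for user, url, tags in bookmarks:
        for tag in tags:
            directory[tag][url] += 1   # each bookmark is an implicit vote

    for tag, urls in sorted(directory.items()):
        print(tag, urls.most_common(1))
    # cooking [('example.com/recipes', 2)]
    # dinner [('example.com/recipes', 1)]
    # graphics [('example.com/gpus', 1)]
    # hardware [('example.com/gpus', 1)]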

Using implicit data to infer structure requires making some assumptions about what each behavior means, but it can be a useful supplement to the other two methods of adding structure. As with inferring structure from domain knowledge, it requires a well-defined ontology so that people and things can be mapped against it.

Would love to hear more examples of using attention data to infer structure.

Meaning = Data + Structure October 22, 2007

Posted by jeremyliew in data, meaning, semantic web, structure, user generated content.
20 comments

Through Techcrunch, I saw the video “Information R/evolution” embedded below (5 minutes, worth watching):

The video’s key message is that when information is stored digitally instead of in the material world, our assumptions about how we get to information, and how information gets to us, are substantially disrupted, allowing users to generate, organize, curate and disseminate content in high quality and quantity.

It’s an entertaining video and spot on. However, I think it glosses over one key point about what makes information truly useful. User-generated content, often unstructured, can be very hard to navigate and search through. Adding structure makes the data vastly more meaningful.

Search engines are the best example of how adding structure (a search index) to an unstructured data set (the list of all websites) makes the data set more useful. Whether that structure is established by link popularity (as Google and all modern search engines do) or by human editors (as Yahoo did when it started out) affects the size and quality of the structure, but even a rudimentary structure built by humans is better than no structure at all.
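
A minimal sketch of the structure a search engine adds is an inverted index mapping each term to the pages that contain it; real engines then layer ranking signals like link popularity on top. The pages below are invented.

    pages = {
        "site-a.com": "cheap flights to san francisco",
        "site-b.com": "san francisco hotel reviews",
    }

    index = {}
    for url, text in pages.items():
        for term in text.split():
            index.setdefault(term, set()).add(url)   # term -> pages containing it

    print(sorted(index["francisco"]))   # ['site-a.com', 'site-b.com']
    print(sorted(index["hotel"]))       # ['site-b.com']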

Social networks are another great example of how adding structure (a social graph) to an unstructured data set (personal home pages) improves the data’s usefulness. There were plenty of successful examples of personal home pages and people directories in the late 90s, including Tripod and AOL’s People Connect, but none of them had the high levels of user engagement that MySpace, Facebook, Bebo and the current generation of social networks have.

One of the key themes of Web 2.0 has been the rise of user generated content. Often this content has been largely unstructured. Unstructured data is hard to navigate by search – you need to rely on the text, and that can be misleading.

Take one of my favorite websites, Yelp, as an example. If I do a search for diabetes near 94111, I get one relevant result (i.e. a doctor) in the top 10 – the rest of the results range from tattoo parlors to ice cream parlors, auto repair to sake stores. All contain the word “diabetes” in a review, some humorously, others incidentally.

This isn’t a one-off either; try baseball mitt, TV repair or shotgun. In every case, the search terms show up in the text of the review, which is the best that you can hope for with unstructured data.
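
The contrast is easy to see in a contrived example: a plain text match finds “diabetes” wherever it appears, while a structured category field filters correctly. The listings below are invented, not Yelp data.

    listings = [
        {"name": "Dr. Smith", "category": "doctor",
         "review": "Helped me manage my diabetes for years."},
        {"name": "Sweet Scoops", "category": "ice cream parlor",
         "review": "So good I joked it would give me diabetes."},
    ]

    query = "diabetes"
    text_hits = [l["name"] for l in listings if query in l["review"].lower()]
    structured_hits = [l["name"] for l in listings if l["category"] == "doctor"]

    print(text_hits)        # ['Dr. Smith', 'Sweet Scoops']  <- text match alone misleads
    print(structured_hits)  # ['Dr. Smith']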

Recently I’ve started to become intrigued by companies that are adding structure to unstructured data. There seem to be at least three broad approaches to this problem:

1. User-generated structure
2. Inferring structure from knowledge of the domain
3. Inferring structure from user behavior

I’m not smart enough to know if this is the semantic web or web 3.0, or even if the labels are meaningful. But I do know that finding ways to add structure to, or infer structure from, data is going to improve the user experience, and that is always something worth watching for.

I’m going to explore the three broad approaches that I’ve seen in subsequent posts, but would love to hear readers’ thoughts on this topic.

I’ve found this post on the structured web by Alex Iskold to be very helpful in thinking about this topic.