
Nutanix launches and a new era for data center computing is born — No SAN or NAS required! August 16, 2011

Posted by ravimhatre in 2011, Cloud Computing, data, database, datacenter, enterprise infrastructure, Infrastructure, platforms, Portfolio Company blogs, startup, startups, Storage, Uncategorized.
5 comments

The Nutanix team (ex-Google, VMware, and Aster Data alums) has been quietly working to create the world’s first high-performance appliance that enables IT to deploy a complete data center environment (compute, storage, network) from a single 2U appliance.

The platform also scales to much larger configurations with zero downtime or admin changes, and users can run a broad array of mixed workloads, from mail, print, and file servers to databases and back-office applications, without having to make upfront decisions about where or how to allocate their scarce hardware resources.

For the first time, an IT administrator in a small or mid-sized company or a branch office can plug in his or her virtual data center and be up and running in a matter of minutes.

Some of the most disruptive elements of Nutanix’s technology, which enable customers to avoid the expensive SAN and NAS investments typically required for true data center computing, are aptly described on the company’s blog: http://www.nutanix.com/blog/.

Take a look. We believe this represents the beginning of the next generation in data center computing.

Data exhaust moves beyond targeted marketing and into financial services decision making November 27, 2010

Posted by jeremyliew in data, financial services, marketing, targeting.
2 comments

Fascinating article in the WSJ a couple of weeks ago about how the big insurance companies are testing using data profiles to identify risky clients. Using the data is potentially an alternative to the costly physical exams currently used to underwrite health insurance policies:

 

In one of the biggest tests, the U.S. arm of British insurer Aviva PLC looked at 60,000 recent insurance applicants. It found that a new, “predictive modeling” system, based partly on consumer-marketing data, was “persuasive” in its ability to mimic traditional techniques…

Making the approach feasible is a trove of new information being assembled by giant data-collection firms. These companies sort details of online and offline purchases to help categorize people as runners or hikers, dieters or couch potatoes.

They scoop up public records such as hunting permits, boat registrations and property transfers. They run surveys designed to coax people to describe their lifestyles and health conditions.

Increasingly, some gather online information, including from social-networking sites. Acxiom Corp., one of the biggest data firms, says it acquires a limited amount of “public” information from social-networking sites, helping “our clients to identify active social-media users, their favorite networks, how socially active they are versus the norm, and on what kind of fan pages they participate.”…

Acxiom says it buys data from online publishers about what kinds of articles a subscriber reads—financial or sports, for example—and can find out if somebody’s a gourmet-food lover from their online purchases. Online marketers often tap data sources like these to target ads at Web users.

Not everyone is comfortable with this approach. Some regulators have raised potential concerns:

“An insurer could contend that a subscription to ‘Hang Gliding Monthly’ is predictive of highly dangerous behavior, but I’m not buying that theory: The consumer may be getting the magazine for the pictures,” says Thomas Considine, New Jersey’s commissioner of banking and insurance.

I think I’d bet against Mr. Considine on this one.

I’m fascinated by the idea of using publicly available data to make better underwriting decisions, whether for insurance or for lending. This isn’t a new idea. Student loans first became a growth industry when someone decided that a student’s major (pre-med vs liberal arts) or GPA could help decide who to lend to and how much. But as the amount of data available has exploded, whether directly reported (e.g. on social networks), inferred from behavior (e.g. web surfing and ecommerce habits) or volunteered as part of an application (e.g. bank account log-in info, as supplied to Mint, that can show regularity of income and cash payments), a financial institution’s ability to underwrite more individually goes well beyond FICO scores.
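To make the idea concrete, here is a toy sketch of what underwriting on data exhaust might look like. Every feature name and weight below is invented for illustration; no real insurer’s or lender’s model is implied.

```python
# Toy illustration only: a hypothetical underwriting score that blends a
# traditional input (GPA) with "data exhaust" signals inferred from
# marketing data. All feature names and weights are invented.

RISK_SIGNALS = {
    "hang_gliding_subscription": 0.30,   # assumed proxy for dangerous hobbies
    "gym_membership": -0.10,             # assumed proxy for a healthier lifestyle
    "irregular_income": 0.20,            # e.g. inferred from bank account data
}

def underwriting_score(applicant: dict) -> float:
    """Lower is better. Start from a traditional baseline, then adjust
    using behavioral signals harvested from data exhaust."""
    score = 1.0 - applicant.get("gpa", 3.0) / 4.0   # traditional input
    for signal, weight in RISK_SIGNALS.items():
        if applicant.get(signal):
            score += weight                          # behavioral adjustment
    return score

print(underwriting_score({"gpa": 3.6, "gym_membership": True}))             # ≈ 0.0
print(underwriting_score({"gpa": 3.6, "hang_gliding_subscription": True}))  # ≈ 0.4
```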

I’m interested in any companies looking at doing something like this. Email me.

 

Semantic web in travel February 26, 2008

Posted by jeremyliew in data, meaning, semantic web, structure, travel.
5 comments

I saw today that Radar raised a Series B for its semantic web application. As I’ve noted in the past, I am a believer in approaching the semantic web top down rather than bottom up, i.e. by inferring structure from domain knowledge rather than requiring all websites to mark up their content in RDF. The user doesn’t care about the semantic web (just as they don’t care about wikis or web 2.0 or tagging), all they care about is that they can more quickly get to the things that they want. The mechanisms that we use to create this better experience should be invisible to the user.

Two companies that are taking this approach are doing it in travel. Travel is a good vertical to start in for three reasons: (i) lots of users, (ii) a well-defined universe of data, and (iii) it is easy to monetize.

The first of these is Tripit. Tripit takes travel confirmation emails from multiple sources and creates a master itinerary. As Mike Arrington noted in Techcrunch:

It’s dead simple to use and it keeps you organized – all you have to do is forward confirmation emails to them when you purchase airline tickets, hotel reservations, car rentals, etc. Tripit pulls the relevant information out of the emails and builds an organized itinerary for you. You can send emails in any order, for multiple trips, whatever. It just figures everything out and organizes it.

This is a great example of the semantic web being used to improve a user’s experience, invisibly. The user neither knows nor cares that Tripit is inferring structure from the emails (e.g. SFO is an airport in San Francisco, the Clift is a hotel in San Francisco, and since your reservation at the Clift starts on the same day as your arrival into SFO, Tripit will automatically offer driving directions from SFO to the Clift, etc.). All the user knows is that they automagically have a single itinerary compiled and supplemented with other relevant information (e.g. maps, weather, etc.).
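To illustrate the kind of inference at work, here is a rough sketch of pulling structure out of confirmation emails and then linking records that share a city and date. Tripit’s actual extraction pipeline isn’t public; the regex patterns, the airport lookup table, and the email text below are all invented for illustration.

```python
import re

# Sketch of the general idea: extract structured fields from free-text
# confirmation emails, then link records that share a city and date.
AIRPORT_CITIES = {"SFO": "San Francisco", "JFK": "New York"}  # assumed lookup table

FLIGHT_RE = re.compile(r"Arrive (?P<airport>[A-Z]{3}) on (?P<date>\d{4}-\d{2}-\d{2})")
HOTEL_RE = re.compile(r"Check-in at (?P<hotel>[\w' ]+), (?P<city>[\w ]+) on (?P<date>\d{4}-\d{2}-\d{2})")

def parse_confirmation(email_body: str) -> list[dict]:
    records = []
    for m in FLIGHT_RE.finditer(email_body):
        records.append({"type": "flight", "city": AIRPORT_CITIES.get(m["airport"]),
                        "date": m["date"]})
    for m in HOTEL_RE.finditer(email_body):
        records.append({"type": "hotel", "name": m["hotel"].strip(),
                        "city": m["city"].strip(), "date": m["date"]})
    return records

def link_itinerary(records: list[dict]) -> list[str]:
    """If a flight and a hotel share a city and date, suggest directions."""
    suggestions = []
    flights = [r for r in records if r["type"] == "flight"]
    hotels = [r for r in records if r["type"] == "hotel"]
    for f in flights:
        for h in hotels:
            if f["city"] == h["city"] and f["date"] == h["date"]:
                suggestions.append(f"Driving directions: airport -> {h['name']} ({h['date']})")
    return suggestions

emails = "Arrive SFO on 2008-03-01\nCheck-in at The Clift, San Francisco on 2008-03-01"
print(link_itinerary(parse_confirmation(emails)))
```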

The second is Kango. Kango helps travelers decide where they want to go by crawling 10,000 sites and 18,000,000 reviews and organizing that content semantically. As Erik Schonfeld of Techcrunch notes:

But what’s promising about Kango is the way it slices up search subjectively. Kango is building a semantic search engine focussed narrowly on travel. It parses the language in all of those reviews and guides, and categorizes them by generating tags for them. “You cannot wait for users to add tags, you have to derive them,” says CEO Yen Lee. So hotels that have been reviewed across the Web (on sites like Yahoo Travel, TripAdvisor, or Yelp) with words such as “perfect,” “relaxing,” “couples,” “honeymoon,” or “spa” would rank higher in a search for romantic travel. Hotels associated with the words “kitchen,” “pool,” and “kids,” would rank higher in a search for family trips.

Again, the semantics are being applied in a way that is invisible to users. Users don’t need to know how key words in reviews are mapped to characteristics like “family” or “romantic”. The company uses its domain knowledge to make this transparent to the user.
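As a concrete illustration of deriving tags rather than waiting for users to add them, here is a minimal sketch that maps review keywords to themes like “romantic” or “family” and ranks hotels accordingly. The keyword lists, hotel names, and reviews are made up; Kango’s real system is obviously far richer.

```python
import re

# Derive theme tags from review text, then rank hotels for a subjective query.
THEME_KEYWORDS = {
    "romantic": {"perfect", "relaxing", "couples", "honeymoon", "spa"},
    "family":   {"kitchen", "pool", "kids"},
}

def derive_tags(review: str) -> dict[str, int]:
    words = set(re.findall(r"[a-z]+", review.lower()))
    return {theme: len(words & kws) for theme, kws in THEME_KEYWORDS.items()}

def rank_hotels(reviews_by_hotel: dict[str, list[str]], theme: str) -> list[str]:
    scores = {
        hotel: sum(derive_tags(r)[theme] for r in reviews)
        for hotel, reviews in reviews_by_hotel.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

reviews = {
    "Post Ranch Inn": ["perfect honeymoon spot, relaxing spa"],
    "Embassy Suites": ["great pool for the kids, full kitchen"],
}
print(rank_hotels(reviews, "romantic"))  # ['Post Ranch Inn', 'Embassy Suites']
```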

Expect to see more such semantic web approaches to specific verticals.

Meaning = Data + Structure: Inferring structure from user behavior November 19, 2007

Posted by jeremyliew in attention, data, semantic web, structure, user generated content.
11 comments

A little while ago I started a series about the structured web where I claimed that Meaning = Data + Structure. I followed up with a couple of posts on ways that structure can be added to user generated content, through user generated structure, and through inferring structure from domain knowledge. The third way that structure can be inferred is from user behavior, otherwise known as attention. As Wikipedia notes:

Attention economics is an approach to the management of information that treats human attention as a scarce commodity, and applies economic theory to solve various information management problems.

Alex Iskold has a good overview of the attention economy elsewhere at ReadWriteWeb.

By watching user behavior, by inferring intent and importance from the gestures and detritus of actions taken for other purposes, you can sometimes also infer structure about unstructured data. Google does this with its PageRank algorithm, Del.icio.us uses individual bookmarking to build a structured directory to the web, and Xobni maps social networks through analysis of your emailing patterns. Behavioral targeted advertising is based on the assumption that users display their interests through the websites they visit.
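A minimal sketch of this kind of inference, in the del.icio.us style: treat each bookmark as an implicit vote, and promote a tag to a category only when enough independent users agree. The bookmark data and threshold below are invented for illustration.

```python
from collections import Counter, defaultdict

# No one labels pages for the directory's sake, but when many users
# independently bookmark the same URL under the same tag, that attention
# can be read as a category assignment.
bookmarks = [  # (user, url, tag)
    ("alice", "nytimes.com/recipe-123", "cooking"),
    ("bob",   "nytimes.com/recipe-123", "cooking"),
    ("carol", "nytimes.com/recipe-123", "dinner"),
    ("alice", "espn.com/game-456",      "sports"),
    ("bob",   "espn.com/game-456",      "sports"),
]

def build_directory(bookmarks, min_votes=2):
    votes = defaultdict(Counter)
    for _user, url, tag in bookmarks:
        votes[url][tag] += 1                  # each bookmark is an implicit vote
    directory = defaultdict(list)
    for url, tag_counts in votes.items():
        for tag, count in tag_counts.items():
            if count >= min_votes:            # only keep tags with enough attention
                directory[tag].append(url)
    return dict(directory)

print(build_directory(bookmarks))
# {'cooking': ['nytimes.com/recipe-123'], 'sports': ['espn.com/game-456']}
```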

Using implicit data to infer structure requires making some assumptions about what each behavior means, but it can be a useful supplement to the other two methods of inferring structure. As with inferring structure from domain knowledge, it requires a well-defined ontology so that people and things can be mapped against it.

Would love to hear more examples of using attention data to infer structure.

Meaning = Data + Structure October 22, 2007

Posted by jeremyliew in data, meaning, semantic web, structure, user generated content.
20 comments

Through Techcrunch, I saw the video “Information R/evolution” embedded below (5 minutes, worth watching):

The video’s key message is that when information is stored digitally instead of in a material world, then our assumptions about how to get to information, and how information gets to us, are substantially disrupted, allowing for high quality (and quantity) user generated, organized, curated and disseminated content.

It’s an entertaining video and spot on. However, I think it glosses over one key point about making information truly useful. User generated content, often unstructured, can be very hard to navigate and search through. Adding structure makes the data vastly more meaningful.

Search engines are the best example of how adding structure (a search index) to an unstructured data set (the list of all websites) makes the dataset more useful. Whether that structure is established by link popularity (as Google and all modern search engines do) or by human editors (as Yahoo started out) affects the size and quality of the structure, but even a rudimentary structure built by humans is better than no structure at all.
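As a concrete (and deliberately toy) illustration of the structure a search index adds, here is a minimal inverted index: the pages themselves are unstructured text, but the index maps each term to the documents containing it. Ranking by link popularity or human editors is a second layer of structure not shown here.

```python
from collections import defaultdict

# Unstructured input: raw page text keyed by URL (invented examples).
pages = {
    "a.com": "best sourdough bread recipe",
    "b.com": "bread machine reviews and ratings",
    "c.com": "city marathon training plan",
}

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Structure added: a map from each term to the documents containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

index = build_index(pages)
print(sorted(index["bread"]))  # ['a.com', 'b.com']
```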

Social networks are another great example of how adding structure (a social graph) to an unstructured data set (personal home pages) improves the data’s usefulness. There were plenty of successful examples of personal home pages and people directories in the late 90s, including Tripod and AOL’s People Connect, but none of them had the high levels of user engagement that MySpace, Facebook, Bebo and the current generation of social networks have.

One of the key themes of Web 2.0 has been the rise of user generated content. Often this content has been largely unstructured. Unstructured data is hard to navigate by search – you need to rely on the text, and that can be misleading.

Take one of my favorite websites, Yelp, as an example. If I do a search for diabetes near 94111, I get one relevant result (i.e. a doctor) in the top 10 – the rest of the results range from tattoo parlors to ice cream parlors, auto repair to sake stores. All contain the word “diabetes” in a review, some humorously, others incidentally.

This isn’t a one-off either; try baseball mitt, TV repair or shotgun. In every case, the search terms show up in the text of the review, which is the best that you can hope for with unstructured data.
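To sharpen the contrast, here is a toy sketch of the same query run over unstructured review text versus data with a category field attached. The businesses and reviews are invented; the point is only that a structured category tells you far more than the incidental appearance of a word in a review.

```python
# Invented example data: the same word appears in both reviews, but only the
# category field distinguishes a relevant result from an incidental mention.
businesses = [
    {"name": "Dr. Lee, Endocrinology", "category": "doctor",
     "reviews": ["helped me manage my diabetes"]},
    {"name": "Scoops Ice Cream", "category": "ice cream parlor",
     "reviews": ["so good it will give you diabetes"]},
]

def text_search(query, businesses):
    """Unstructured search: match the query anywhere in the review text."""
    return [b["name"] for b in businesses
            if any(query in r for r in b["reviews"])]

def structured_search(query, category, businesses):
    """Structured search: restrict by category first, then match the text."""
    return [b["name"] for b in businesses
            if b["category"] == category
            and any(query in r for r in b["reviews"])]

print(text_search("diabetes", businesses))                  # both businesses match
print(structured_search("diabetes", "doctor", businesses))  # only the doctor
```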

Recently I’ve started to become intrigued by companies that are adding structure to unstructured data. There seem to be at least three broad approaches to this problem:

1. User generated structure
2. Inferring structure from knowledge of the domain
3. Inferring structure from user behavior.

I’m not smart enough to know if this is the semantic web or web 3.0, or even if the labels are meaningful. But I do know that finding ways to add or infer structure from data is going to improve the user experience, and that is always something worth watching for.

I’m going to explore the three broad approaches that I’ve seen in subsequent posts, but would love to hear readers’ thoughts on this topic.

I’ve found this post on the structured web by Alex Iskold to be very helpful in thinking about this topic.