

1. Mike Bergman - October 24, 2007

I share your enthusiasm for the structured Web (http://www.mkbergman.com/?cat=23) and applaud your recent posts to bring it greater attention. I also think it is helpful to describe the four approaches to ‘user generated structure.’

But I question whether UGS is really in the same mold as user-generated content or even user-generated data. Besides making for an ugly acronym, UGS (“ughs”) will never be of interest or of value to users in the same way that content is. Face it, alone, structure is simply not sexy nor identifiable to a given user.

The real approach — the fifth approach — is to mine the value of the massive amounts of structured data (granted, mostly semi-structured data) that already exists. And much of that structure is from Web 1.0 sites, as well.

Structure is the root crossing the trail that we stumble over at night while gazing up at the stars. It is here, right now, and is in our metadata, our tables, our HTML, our tags, our content, our DOMs and our links. While user-generated structure is welcomed, and we will see rapid growth in linked structured data from publishers as has occurred from the Linked Open Data movement in the past few months, the trick is to figure out how to gain value from the structure already at hand.

The fifth way, then, will not be users generating anything special for structure at all. It will be background tools and extractors that grab the valuable structure already locked in our documents to connect with the linked data value growing daily across the Web.

Methinks the issue is not generating structure, but freeing and linking what already exists.

2. Nik - October 24, 2007


Great job trying to figure this out.

I think anything “user generated” necessarily has to mean “non-coordinated” as well. The whole confusion arises because a bunch of folks acting independently moves things away from structure (i.e. the old taxonomy or library index approaches) and towards confusion (or, conversely, better outcomes, i.e. wisdom of the crowds). A Godfather movie could imply Marlon Brando or Oscar movie or both. Every user acting independently leads to confusion about labelling, and the rest of the community COULD be worse off.

My thoughts on the approach:
1) Tagging is useful but not really a great value addition. Tagging is great for data that does not have a whole lot of text for “auto generating” tags, e.g. it works great for links (Del.icio.us), images (Flickr) and videos (YouTube). But does a tag really add that much value when the content is all text (e.g. a blog post)? Shouldn’t we be able to figure out the tag based on the content (i.e. similar to jiglu)?

2) Structured Data - This I think has by far the greatest promise from a user generated perspective, since the system is shepherding users to act in a coordinated way, i.e. thou shalt give your comments/responses in only A.B.C format.

3) Bottom Up- No need to say anything more on this. Alex has covered the issue and related issues of why this does the work, spam etc and why the “semantic web” promise seems to be always 2 years away.

4) A central authority of meaning to me implies more of a “domain knowledge” approach highlighted in your previous post. You are trusting the central authority (based on a trusted community) to give meaning. You are essentially depending on the community to show the difference between Manhattan, NY and Manhattan, Kansas.
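
A rough sketch of what “figuring out the tag from the content” in point 1 might look like, assuming nothing smarter than word frequency. This is not how jiglu or any other real service works; the stopword list, the sample text and the `suggest_tags` name are made up for illustration.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; a real system would need a much larger one.
STOPWORDS = frozenset(
    "a an and are as at be but by for from in is it of on or that the this to was with".split()
)

def suggest_tags(text, limit=3):
    """Return the `limit` most frequent non-stopword words in `text`."""
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]

post = ("Tagging works for photos. Photos without text need tagging; "
        "photos rarely describe themselves.")
print(suggest_tags(post, limit=2))
```

This only works because a blog post carries its own text. A photo or a video gives the function nothing to count, which is where human tags still earn their keep.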

3. vruz - October 24, 2007

more like…. User-Generated Semantically Structured Content…

which is the same as saying:

Meaningful User-generated Content. “MUG”

sounds a lot more… meaningful to me. 🙂

4. Mark - October 24, 2007

interesting post

5. Danny - October 24, 2007
6. Mike Veytsel - October 24, 2007

I think that the best approach will be one that employs the best facets of the other approaches. Here is my recipe:

1) There are already many major hubs of fairly reliable structured data, openly available through APIs. Create a user tagging system that bypasses the ambiguity and redundancy of folksonomies by using only existing, structured data objects as endpoints for tag associations.

2) Use a centralized system to crossmap tags between structured objects on various data hubs. The issues of systemic bias, false information, and vandalism can be resolved much more easily than in a system like Wikipedia, since one of the major benefits of a tagging system is that, unlike entire articles, tags are atomic in nature, so their accuracy can be measured by weighting. By weighting I mean that, just like in del.icio.us, the more users who’ve tagged a keyword to a site, the more visible and reliable it is.

3) User inertia is by far the hardest problem to overcome, because a worthwhile system will need user scale to be useful. A good system would attract users and scale by starting with tag associations which have innate incentives for the end user. Facebook already does this, quite successfully, with photo tagging of friends. Users don’t consider this work or altruism, but a natural extension of how they use the app. Likewise, delicious users tag sites as a means of organizing their online life. These types of actions are what Jakob Nielsen calls ‘participation as a side effect’ in his interesting article about user participation: http://www.useit.com/alertbox/participation_inequality.html

The same guiding principles could be applied to a more universal system.

Another huge advantage of a tagging system is that a) users are largely already familiar with tagging (according to a recent Pew Internet study), and b) tagging is a microtransaction that requires little effort on the user’s part (as opposed to, say, adding to a Wikipedia article, or even filling in the structured data for an object).

The bottom-up, high level approach is only as complex for users as the system makes it. I think that at this stage, collective intelligence is still smarter (even if slower) than artificial intelligence. We just need a way to harness it usefully and effectively.
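
The weighting idea in point 2 can be sketched in a few lines, assuming each tagging is recorded as a (user, object, tag) triple and that a tag's weight is simply the number of distinct users who applied it. The object IDs, user names and function names below are hypothetical, not taken from any real data hub or API.

```python
from collections import defaultdict

def tag_weights(taggings):
    """Count distinct users per (object_id, tag) pair.

    `taggings` is an iterable of (user, object_id, tag) tuples.
    Tags are lowercased so that "Crime" and "crime" pool their votes.
    """
    users = defaultdict(set)
    for user, object_id, tag in taggings:
        users[(object_id, tag.strip().lower())].add(user)
    return {key: len(who) for key, who in users.items()}

def ranked_tags(weights, object_id, min_weight=1):
    """Tags for one object, heaviest first; tags below `min_weight` are hidden."""
    tags = [(tag, w) for (oid, tag), w in weights.items()
            if oid == object_id and w >= min_weight]
    return sorted(tags, key=lambda pair: (-pair[1], pair[0]))

taggings = [
    ("ann", "movie:godfather", "crime"),
    ("bob", "movie:godfather", "Crime"),
    ("ann", "movie:godfather", "crime"),     # repeat vote, counted once
    ("cat", "movie:godfather", "oscar"),
    ("dan", "movie:godfather", "cookbook"),  # likely noise or vandalism
]
weights = tag_weights(taggings)
print(ranked_tags(weights, "movie:godfather", min_weight=2))
```

Because each tag is atomic, a single stray vote never outranks a tag that several users agree on, and a threshold like `min_weight` hides it entirely.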

7. Between the Lines mobile edition - October 24, 2007

[…] Liew is writing about user generated structure. There are various ways this happens: tagging, soliciting data in structured ways, and explicit […]

8. David Goodman - October 24, 2007

We’ve come at the structured data problem by offering users value in return for their effort. We’ve targeted the recipe as our social object (don’t leave me now, recipes aren’t just for mom!).

The idea is structuring the recipe and offering a suite of value-added services in return. Example: once the recipe is entered into Key Ingredient, you can blog it with a widget. The widget allows readers to save the recipe to their own collection, email it, print it, etc. But they can also assemble a collection of recipes and buy a print-on-demand cookbook.

The nice part of the structure is that we can parse out metadata (like ingredients, servings) to offer more services like nutritional estimates and recipe scaling on-the-fly. So I agree with Mike that further meaning can be implied from the basic structure of data in “unstructured” form.

Will users be bothered to structure their data? They will structure if the quid pro quo is there. Look at the success of photo scanning services. That is a huge investment, but literally millions of images have crossed that barrier into digital form. People have paid for it, and that says a lot today. Once digital, Shutterfly and the like offer the services that fulfill the promises that made the transition worthwhile.

Our challenge is to add value beyond the obvious, low-hanging fruit and make the transition into structured data worth the hassle.
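
To make the recipe-scaling point concrete, here is a minimal sketch of what structured ingredients buy you, assuming a recipe is stored as typed fields rather than free text. The `Ingredient` and `Recipe` classes are illustrative only and are not Key Ingredient's actual data model.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Ingredient:
    name: str
    quantity: Fraction  # exact amounts avoid float noise such as 2.2499999
    unit: str

@dataclass(frozen=True)
class Recipe:
    title: str
    servings: int
    ingredients: tuple

    def scaled(self, servings):
        """Return a copy of the recipe adjusted to a new number of servings."""
        factor = Fraction(servings, self.servings)
        return Recipe(
            self.title,
            servings,
            tuple(Ingredient(i.name, i.quantity * factor, i.unit)
                  for i in self.ingredients),
        )

pancakes = Recipe("Pancakes", 4, (
    Ingredient("flour", Fraction(2), "cup"),
    Ingredient("milk", Fraction(3, 2), "cup"),
))
for item in pancakes.scaled(6).ingredients:
    print(item.quantity, item.unit, item.name)
```

None of this is possible when the recipe is a single blob of text, which is the quid pro quo: the user types into a few extra fields and gets scaling for free.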

9. Dave McClure - October 24, 2007

really excellent post jeremy.

i wrote about some related ideas a few months ago for the O’Reilly Release 2.0 report on prediction markets, re: mining / structuring user-generated content for creating prediction markets in vertical communities.

lots of interesting potential around creating structure from user data, particularly where the domain of conversation can be used to filter / define specific datatypes & ratings / tags / reviews.

– dave mc

10. lawrence - October 24, 2007

I think I’d break out the structuring of UGC as follows:

The Users apply structure – add tags, annotate, drop content into the appropriate buckets, etc. This involves a heck of a lot of education (anybody else ever try to teach their users to SEO their content?), tapping into selfish interest (Delicious, Squidoo), and balancing the need for structure with the need to keep obstacles for participation low. Scales well, but tough to maintain consistency / quality.

The Site Administrators provide structure – aka, brute force, put those interns to work, aka the Mahalo, Yahoo! Directory, model. Preferably managed by complementing what’s done by the users. Doesn’t scale, but quality / consistency control is more enforceable.

The “System” provides the structure automatically based on implicit clues in the content, where the content is coming from, who’s linking to the content, etc. (what Mike says, and what Google is already doing in terms of structuring web site content into a search index).

Of these, to me, only the third method is next generation stuff.

11. Jan Horna - October 25, 2007

How about microformats (http://microformats.org/wiki/hreview)? Doesn’t this solve the problem of metadata structure?

12. Jordan Mitchell - October 25, 2007

Yesterday you spoke of three sources for structure: UGC, inferred from knowledge of domain, and inferred from user behavior. I actually kinda think of it more simply: explicit and implicit.

So much of the structure today is explicit, whereas I think the bigger opportunities lie in implicit — that way, no one has to do all kinds of “work” and we don’t suffer the effects of participation inequality (where <1% of the population is driving). Maybe we’re already “voting” with our attention and “structuring” with our personal attributes (location, interests, behaviors, etc.) — layered on top of existing content attributes, of course.

13. fewquid - October 25, 2007

A few thoughts…

a) IDC reckon that data is growing at a compound annual rate of 56%. Most of that growth is in unstructured data.

b) In a corporate environment 85% or more of the data is unstructured. I don’t have numbers for the consumer world, but I’d guess it is even more skewed to unstructured data.

So my first point: In general, isn’t it clearly a losing battle to make the fastest growing majority of data try and behave like a shrinking minority??

Second point: the idea of a central repository of “meaning” is absurd, even at its most general. “Meaning” is highly personal. On almost any given topic, what has “meaning” for me will have no meaning for you (and vice versa). Meaning is fundamentally about personal relevance. The reason today’s search engines can be frustrating is because they have no concept of personal relevance (and because they mostly don’t consider meta-data).

Third point: the traditional concepts of structure really only work when data is relatively sparse. When it becomes superabundant, rigid “classical” ideas of structure break down. The concept of structure needs to evolve into something new, user driven and fundamentally transient.

Last point: There’s a ton of evolutionary potential in tagging that folks are barely scratching the surface of…

As you can hopefully tell, I think about this stuff a great deal (it’s my day job)…

14. Meaning = Data + Structure: User Generated Structure « digital asset management weblog - October 25, 2007

[…] Meaning = Data + Structure: User Generated Structure October 24, 2007 […]

15. alexiskold - October 25, 2007

Hi Jeremy,

This is a well-researched and well-thought-through post! Here is my take on each of these:

1) Tagging is an awesome way of adding light structure/semantics on top of the content. The mere fact that it is engaging, widespread and that people actually do it is a huge plus for it.

The problem with tagging is that it is not precise. The tag is in the eye of the beholder and a collective set of tags may not amount to a consensus. Another problem is better seen through an example. If you tag the book “The Road” like this: [book], [cormac mccarthy], [science fiction], the tags do not reflect the structure of the object. The thing is that book defines the object type, cormac mccarthy is the author and science fiction is the genre. For an algorithm this is hugely important information that gets lost.

2) Soliciting structure from the users is really about the art of the interface. People hate forms. People hate long forms even more. An interface which engages users and drives them to reveal the structure over time is the one that is likely to succeed.

3) I said a ton already on the bottom up annotation approach. It could work in theory but is hard in practice on a web-wide scale. If anything, I would bet on microformats as a lighter and simpler approach.

4) The silos are not the answer, because squeezing the web into one site is not possible. It’s too rich. We are looking for a web-wide solution; that’s what we really need.

Will see how things unfold. I look forward to your next posts.
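
The information loss described in point 1 is easy to show with a small sketch. The second catalog entry, the field names and the helper functions are invented for illustration; the point is only that a bag of tags cannot tell “written by” from “written about”.

```python
catalog = [
    {"title": "The Road", "type": "book",
     "author": "cormac mccarthy", "genre": "science fiction"},
    # Hypothetical second record: a book *about* the author, not *by* him.
    {"title": "A Reader's Guide", "type": "book",
     "author": "jane doe", "subject": "cormac mccarthy"},
]

def flatten(record):
    """Collapse a typed record into a bag of tags, dropping the field names."""
    return {value for field, value in record.items() if field != "title"}

def by_tag(tag):
    """Tag search: matches the value wherever it appeared."""
    return [r["title"] for r in catalog if tag in flatten(r)]

def by_field(field, value):
    """Structured search: matches only when the value plays the given role."""
    return [r["title"] for r in catalog if r.get(field) == value]

print(by_tag("cormac mccarthy"))              # both books match
print(by_field("author", "cormac mccarthy"))  # only the book he wrote
```

Once `flatten` has thrown the field names away there is no way to get them back, which is why an algorithm working from tags alone has to guess.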


16. SezWho Blog » Blog Archive » Meaning = Data + Structure: User Generated Structure - October 25, 2007

[…] Great piece by Jeremy Liew about how to make sense of all the user generated content. He explores 4 approaches prevalent right now: Tagging is the first approach, and its use has been endemic to web 2.0. Sometimes the tagging is limited to the author of the content, and other times any user can add tags to create a folksonomy. […]

17. Yen - October 26, 2007

Jeremy, excellent job summarizing and explaining the different approaches. As you and a number of your readers point out, creating a critical mass of usable, structured data through organic approaches is challenging.

At Kango, we believe it has to be approach four, and have invested in creating an ontology to focus our efforts on analyzing unstructured content (e.g. reviews and articles) and data (e.g. product attributes and location) to derive weighted tags (e.g. 80th percentile for kid-friendly) for travel products. The explosion of traveler-generated opinions (blogs, ratings, reviews, journals, articles, trip plans…) has been a gold mine for deriving those weighted tags, and enables consumers to search based on both subjective and objective criteria. For example, you can search for romantic hotels and activities in Monterey and get a different set of results than if you search for kid-friendly hotels and activities.

Look forward to hearing about how other ventures are generating structured data.

18. gerel - October 27, 2007

Well, I think you already gave the problem and the solution in this quote:

From a UGC perspective, site administrators can force structure by requiring every site contribution to have a parent category, or descriptive tags. The problem is that the more obstacles you put in place before content can be submitted, the less participation you are going to get.

You see? If people don’t like/want to give more structure to what they say, it’s likely they’re not saying anything worth hearing.
They don’t really have an argument then. And if they do have an argument, please think again before writing it!!!

The reality is that we have millions of trolls hanging around.


19. Meaning = Data + Structure « Lightspeed Venture Partners Blog - October 27, 2007

[…] User generated structure 2. Inferring structure from knowledge of the domain 3. Inferring structure from user […]

20. Meaning = Data + Structure: Inferring Structure from domain knowledge « Lightspeed Venture Partners Blog - October 29, 2007

[…] the idea that Meaning = Data + Structure. A number of readers commented on my previous post, about user generated structure. They point out that one of the challenges of relying on this approach is finding the right […]

21. Meaning = data + Structure: More thoughts on user generated structure « Lightspeed Venture Partners Blog - October 29, 2007

[…] trackback My post claiming that Meaning = Data + Structure and follow up post exploring how User Generated Structure is one way that structure can be added to data have generated some great comments from readers. […]

22. Nate Westheimer - October 29, 2007

There are some excellent comments here!

The only thing I could add, from my self-interested side of the world, is that the concept of UGC being separated from UG-meta-data comes from the fact that blogs and wikis are the only mainstream publishing tools.

BricaBox will be one way folks can start to use the “right tool for the job.” When content is better published with structure (not always the case!) we hope they use our platform.

But advertisement aside, the key is using the right publishing tool for the job. It’s a lot simpler than using the wrong tool and then trying to use underdeveloped semantic technologies to clean up the mess.

23. User-Contributed Data Auditing? « SmoothSpan Blog - October 29, 2007

[…] so it matters.  There’s another rash of articles dealing with user-generated metadata or user-generated structure.  These are all worthwhile concepts that show how people using the web can add back tremendous […]

24. smoothspan - October 29, 2007

I love the idea of users contributing structure. There are so many ways users can help. A completely different example is having users contribute accuracy for business data:


25. Eghosa - October 30, 2007

Jeremy, you probably should come attend my VC Taskforce panel on Semantic Search & Discovery coming up next week (Nov 6).
We will touch upon some of the issues you raised.

Cheers, Eghosa

26. Vario Creative Blog - Marketing, design, web tech and small farm animals - October 31, 2007

[…] doesn’t mean that we can’t provide a means to allow users to enrich and enlighten.  As Jeremy Liew suggests in his recent post, there are a couple of ways to expand upon the taxonomy, by relying on […]

27. amisare - November 4, 2007

There are two main considerations when setting up User Generated Structures for UGC:

1. How to group key data, i.e. Categorisation Rules
2. Who sets up the Rules/Compliance, i.e. Categorisation Control

Rules can be:

• Discrete: logical; either/or; black/white; pros/cons
• Non-discrete or Analogue: fuzzy; gaps & overlap; shades of gray; relationships

Control can be private/local or communal/central.

Thus Jeremy’s Four Ways (Approaches) may be approximately fitted into the following matrix which is formed by combining the above two main considerations:

                         | Non-discrete Categorisation | Discrete Categorisation
Private/Local Control    | 1st Approach: Tagging       | 2nd Approach: Wikihow; Powereview
Communal/Central Control | 3rd Approach: Semantic web  | 4th Approach: Metaweb; Freebase

The 4th approach requires hard work but may provide good search results.

28. Meaning = Data + Structure: Inferring structure from user behavior « Lightspeed Venture Partners Blog - November 19, 2007

[…] up with a couple of posts on ways that structure can be added to user generated content, through user generated structure, and through inferring structure from domain knowledge. The third way that structure can be […]

29. 2008 Consumer Internet Predictions « Lightspeed Venture Partners Blog - December 3, 2007

[…] to unstructured data, substantially improving the user experience. This will include both explicit (user-generated structure) and implicit (inferring structure from domain knowledge or user behavior) […]

30. 網絡集錦 « Alan Poon’s Blog - February 22, 2008

[…] 網絡集錦 Friday, February 22, 2008 — Alan Lightspeed Venture Partners Blog – Meaning = Data + Structure: User Generated Structure https://lsvp.wordpress.com/2007/10/24/meaning-data-structure-user-generated-structure/ […]

31. bentrem - February 25, 2008

Something I’m waiting for is to find “structuration” in the context of an article like this. (I can’t expect “tensegrity” … something just too too precious about that one.)

What I’ve been trying to say about conventional forum flow (hard to be critical without being read as cynical or put-downie or sour-grapes) is that it imposes a very primitive structure. Typically, a far too sweeping subject followed by responses of all sorts arrayed by nothing more than chronology. (Threaded systems like LiveJournal introduce a sophistication that can be very beneficial. “LiquidThreads” for MediaWiki goes some distance in improving that platform.)

My own project is, well, I’m tempted to say “orthogonal” to these, and to those you’ve mentioned … something like comparing ToC and FootNotes.


32. The Software Abstractions Blog - February 25, 2008

Semantic Web: Where are the Meaning-Enabled Authoring Tools?

Jason Kolb sees it as a way to identify data objects using URIs. John Markoff, of the New York Times, calls it Web 3.0 . And Nova Spivack has a long post clarifying what it is Not. What are all

33. Innovablog > Le Web Sémantique : Où sont les outils de création de contenu riche ? - February 25, 2008

[…] Lightspeed Venture Partners recently published a series of articles on the Semantic Web: Meaning = Data + Structure, based on user-generated structure, domain knowledge, and user behavior […]

34. Meaning = Data + Structure: User Generated Structure — Biography. writers and their biography - April 1, 2008

[…] Ive been thinking about how the explosion of user generated content that has characterized web 2.0 can be made more useful by the addition of structure, ie meaning = data + structure. The obvious way that structure can be added to user generated content is by asking users to do it – user generated structure. There are at least four ways that I can think of to get at user generated structure. Tagging is the first approach, and its use has been endemic to web 2.0. Sometimes the tagging is li source: Meaning = Data + Structure: User Generated Structure […]

35. Nodalities » Blog Archive » This Week’s Semantic Web - April 17, 2008

[…] Meaning = Data + Structure: User Generated Structure […]
