Wednesday, February 14, 2007

Collaborative Semantic Bookmarking

When it comes to the semantic web, the general concern amongst researchers is feasibility. If implemented, it would afford countless opportunities, but the path is the problem. How would we go about developing ontologies for the semantic web? And who would spend the time writing semantically accurate markup? Search doesn't seem so terrible right now, so there's no incentive.

What if, instead of waiting for an incentive on the producer's end, the users had an incentive to implement the semantic web? I propose this idea as an initial step: a collaborative bookmarking system, a la del.icio.us, allowing for semantic tagging rather than keyword tagging.

A first incarnation of semantic tagging might simply allow you to assign binary relationships in the form subject-verb-object, where the subject is always the page being bookmarked. As a naive example, consider "Brain Diseases I Wish I Had". del.icio.us users have used the tags "article", "video", "science" and "psychology" (among others). Semantic tags would say that the page is in the form of an article, addresses science and psychology, and contains video.
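As a rough illustration of the subject-verb-object form, here is a minimal Python sketch. The names SemanticTag, tag_page and objects_for, and the verb spellings "has-form", "addresses" and "contains", are my own assumptions for the example, not part of any existing bookmarking service.

```python
from collections import namedtuple

# One semantic tag is a (subject, verb, object) triple.
# The subject is always the bookmarked page.
SemanticTag = namedtuple("SemanticTag", ["subject", "verb", "object"])


def tag_page(page, pairs):
    """Build triples for one page from (verb, object) pairs."""
    return [SemanticTag(page, verb, obj) for verb, obj in pairs]


def objects_for(tags, verb):
    """Return the objects attached to a page through a given verb."""
    return [t.object for t in tags if t.verb == verb]


# Hypothetical verbs; a real system would let users pick their own.
tags = tag_page(
    "Brain Diseases I Wish I Had",
    [
        ("has-form", "article"),
        ("addresses", "science"),
        ("addresses", "psychology"),
        ("contains", "video"),
    ],
)

print(objects_for(tags, "addresses"))
```

The only extra effort over keyword tagging is choosing a verb for each tag, which is the trade-off the proposal depends on.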

Users would contribute this information because it would let them search their own bookmarks easily and find new links contributed by other users more efficiently. Using current web technologies like Ajax to recommend words for relational tags (like "contains") would help the shared vocabulary converge. Clustering algorithms already implemented on sites like Flickr could help answer questions about the architecture of the web and increase the accuracy of search results despite multiple naming conventions. Simple analysis would allow automated summaries of a site's contents.
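The verb recommendation could be as simple as ranking verbs other users have already typed. The sketch below assumes the server keeps a count of how often each verb has been used and returns the most popular ones matching what the user has typed so far; suggest_verbs and the sample counts are hypothetical.

```python
from collections import Counter


def suggest_verbs(prefix, verb_counts, limit=3):
    """Suggest the most-used verbs that start with the typed prefix."""
    prefix = prefix.lower()
    matches = [(v, n) for v, n in verb_counts.items() if v.startswith(prefix)]
    # Most popular first; ties broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [verb for verb, _ in matches[:limit]]


# Made-up usage data standing in for verbs collected from all users.
verb_counts = Counter(
    ["contains", "contains", "addresses", "contains", "cites", "addresses"]
)

print(suggest_verbs("c", verb_counts))
```

Showing popular verbs first nudges users toward a shared set of relations without forbidding new ones.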

Over time, more detailed semantic information could be added (like recognizing psychology as a type of science), or even imported from Wikipedia or other open categorization systems and expanded upon.
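To make the "psychology is a type of science" idea concrete, here is a small sketch of how a type-of table could widen a search. The is_a mapping and the function names are assumptions for illustration; real data might come from an open categorization system as suggested above.

```python
def ancestors(term, is_a):
    """Follow type-of links upward from term, guarding against cycles."""
    seen = []
    current = term
    while current in is_a and is_a[current] not in seen:
        current = is_a[current]
        seen.append(current)
    return seen


def matches(tag_object, query, is_a):
    """A tag matches a query if it is the query or a subtype of it."""
    return tag_object == query or query in ancestors(tag_object, is_a)


# Hypothetical hierarchy: each term maps to its broader category.
is_a = {
    "psychology": "science",
    "neuroscience": "science",
    "science": "knowledge",
}

print(matches("psychology", "science", is_a))
print(matches("science", "psychology", is_a))
```

With this in place, a search for pages that address science would also return pages tagged only as addressing psychology, while the reverse search would not.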

6 comments:

Jason said...

I don't think it'll take off, for the simple reason that simple tags are so damned convenient, and still enable me to search for exactly what I want easily.

Granted, other people might use different tags than I do ("tv" vs. "television"), but a non-stupid search can group tags that seem to be related (i.e., objects are often tagged with both), and you'd end up with similar problems with a more semantic markup, too... different people generate different ontologies.

Which is the fundamental problem: the world defies categorization, yet we as humans love to categorize.

I like the concept of tagging, since it keeps the ontologies loose and organic, so that you don't have to worry about a lot of things. What you do have to deal with is that it won't be "perfect."

Kyle said...

Tags are certainly convenient; which is why I'm recommending, essentially, two-word tags (verb & object). I think there will be enough of a usability increase with such a small change in effort (i.e., lookup becomes radically less "complex" when insertion is made slightly more so) that it would be used. That's just extrapolating from what I think I'd use, and what it seems like people have been using recently (especially tools that allow new ways of relating to more people on higher levels).

Great visualizations of personal bookmarks and new browsing interfaces would be a bit of an incentive as well.

"...different people generate different ontologies." I've seen a few papers with people making progress on this. I'm pretty sure when you have more information it's, on a macroscopic scale, easier to resolve similarities.

"...the world defies categorization, yet we as humans love to categorize." Western philosophers have had a debate going for a while about whether categories are inherent to things or not. I'm on your side, I think categories are things we construct — we're analogy machines running an OS called "language". But these constructs are useful and, to some extent, communicable. So semantic search and categorization may be difficult but not practically insurmountable.

I agree that tagging is "loose" and "organic". I'd add "pure" or "elegant" even. And I like that about it. What I have in mind is something that combines the best features of tagging with those of traditional categorical/ontological systems. I think this simple idea would afford enough information to really change how we view, search, and browse new and old bookmarks/pages. You can only derive so much from tags alone.

Jason said...

Oh, I definitely agree that there is an incentive, and it's definitely worth the extra effort.

But even though it's worth it, I'm still lazy, and don't know if I could motivate myself thus. Perhaps, though. I'd have to try it out.

In any case, practicality aside, I still think XML sucks as a representation scheme; but that aside, there's got to be a more elegant way to build the Semantic Web than the crazy amount of stuff the W3C has been throwing in... but maybe I just haven't looked at it closely enough. But it just screams ugly and complex to me, the same way that HTML is ugly.

Kyle said...

Why is XML a terrible representation scheme? Can you recommend something better (real or imagined)? Would this allow for all the complexities of knowledge representation?

Jason said...

It's big, it's hard for a human to read, and it requires big parsing libraries.

Of course, since it's a dominant standard, it's probably better than lots of sucky competing standards.

In any case, at the risk of sounding like a LISP freak, if I were given a choice, I'd argue for S-Expressions... they're concise, easy to parse, and do everything XML does.

http://theory.lcs.mit.edu/~rivest/sexp.txt

Kyle said...

I love S-expressions just as much, but I don't see anything inherent to them that would improve readability or conciseness of ontological or semantic information. You still have to deal with namespaces somehow, attribute/value pairs, etc. I think there's a more fundamental problem than XML, and that's what you're seeing.