Reimplementing the Web


It's the master's second throw

I'm a bit dazed. He is doing it again - he is reinventing the Web. For years he's been pushing it. Using RDF as a blueprint for language itself, he opens up the Web to everybody to do it their way. And achieves what was meant to be there in the beginning: the emergence of meaning from data.

The self-effacing, unobtrusive guy with the lilting voice. Who has the spleen to think that 'W' is green. When the Web reached me in 1993, my opinion was that it was 'just about ok', that LaTeX was superior to HTML, and that the one cool thing Tim Berners-Lee invented for us all was the notion of the URL.
So I thought, all in all: "That Tim Berners-Lee guy, he was a bit lucky. He took many things that were already there, mixed them into a new cup and contributed this one genuine ingredient. But kudos to him for succeeding."

Today I reevaluate my position. Despite all the crap in the Web's evolution that I could not make head or tail of (proprietary browser extensions, frames, animated GIFs and plugin plagues), the drive towards the meaning of content is coming back. And it is not XML but RDF (Resource Description Framework). The ideas behind it are not new. But RDF has to be perceived in its full strength (beyond site syndication with RSS).

For one thing, it will allow the Web to become a place where information matters again. In a second step, it paves the road for the future: towards more intelligent agents for sure, and, as more experience is expressed in RDF, perhaps towards an ontology of the Web that could be used for inference by AI agents or proof machines. And it ain't sci-fi. It's here already; what you can't see is the lower seven eighths of the iceberg.

What is it?
  1. RDF allows the expression of statements in the form Subject - Predicate - Value, and even statements about statements (since a statement can itself be a subject), written in XML notation.
  2. Using XML namespaces, it leaves the definition of meaning to the individual user, organisation etc., by letting them define their own predicates, or pick up existing ones, and describe their purpose.
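To make the two points above a bit more tangible, here is a minimal sketch of statements as plain data, including a statement that has another statement as its subject. It is not RDF's actual syntax, just the shape of the model; the `Statement` type, the `ex:` predicate names and the example.org URIs are all made up for illustration.

```python
from typing import NamedTuple, Union


class Statement(NamedTuple):
    """One RDF-style statement: subject, predicate, value."""
    subject: Union[str, "Statement"]  # a URI, or another statement
    predicate: str                    # a name taken from some vocabulary
    value: str                        # a literal or a URI


# Hypothetical URIs and predicate names, for illustration only.
authorship = Statement("http://example.org/essay", "ex:author", "John Doe")

# A statement about a statement: the first one is the subject here.
claim = Statement(authorship, "ex:assertedBy", "http://example.org/people/jane")


def describe(s: Statement) -> str:
    """Render a statement, nesting any statement used as a subject."""
    if isinstance(s.subject, Statement):
        subject = f"[{describe(s.subject)}]"
    else:
        subject = s.subject
    return f"{subject} {s.predicate} {s.value}"


print(describe(authorship))
print(describe(claim))
```

The second statement says who asserted the first, which is the kind of thing a plain tag soup cannot express.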
The 'statement' thing is quite cool. Imagine a web page that can tell you things about itself. But this is only the tip of the iceberg. Imagine a search application where you can ask for arbitrary things. Navigation won't be the same any more.

The misery of the commons, or How could we NOT do this?

Of course, for such a thing to be possible, it is necessary that you and whichever agent you or your software interface with 'speak the same language'. The syntax is of course XML, and subjects will usually be URIs, but what about predicates and values?
The dilemma of the XML approach was that it had no 'common vocabulary' to offer:
I can say "<temperature>16</temperature>" very well, but it doesn't mean anything. What's the value range? Is it 16 degrees Fahrenheit or Celsius? How does it relate to concepts like 'cold', 'hot' or 'lukewarm'? And if I laboriously told my browser all those things by hand, so it could go fetch chilled drinks for me, how could I hold everybody else to using the same tags instead of <Temperature>, <temp> or <weather>?
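The complaint above is easy to reproduce. The following sketch, using Python's standard `xml.etree.ElementTree` parser, feeds in three spellings of the temperature tag mentioned in the text and shows what software actually gets out of them. The three documents are my own test input, not anything from a real site.

```python
import xml.etree.ElementTree as ET

# Three documents that a human reads as "the same fact", using the
# differing tag names mentioned in the text.
documents = [
    "<temperature>16</temperature>",
    "<Temperature>16</Temperature>",
    "<temp>16</temp>",
]

for doc in documents:
    element = ET.fromstring(doc)
    # The parser yields only a tag name and a string: no unit, no range,
    # and no hint that the three names refer to the same concept.
    print(repr(element.tag), repr(element.text))

# As far as software is concerned, these are three unrelated names.
tags = {ET.fromstring(doc).tag for doc in documents}
print(len(tags))
```

Well-formed XML, all of it, and still nothing a program could act on without being told the meaning separately.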

But how do we do it?

By building vocabularies, and by passing on our trust when we link them into our own descriptions. The problem, to state it again, is to build common vocabularies of predicates and values. Predicates state how a subject and a value (or object) relate to one another. Subjects are URLs and well defined insofar as you care. The basic mechanism behind RDF is expressing facts like 'John Doe is author of' as triples: (subject) has Author (predicate or property type) John Doe (object or value).

Now I can define a vocabulary for my predicates, because I can say which XML namespaces I use.

The above statement in RDF would be:

<?xml:namespace ns=""
<?xml:namespace ns=""

    <DC:Author>John Doe</DC:Author>

<DC:Author> here carries a unique meaning: it refers to 'the element Author from the Dublin Core Metadata Initiative Element Set'. It is an atom of the RDF 'language'. The Dublin Core declares one of the most widely used element sets, containing the usual stuff: author, date, copyright... Of course you can specify your very own vocabulary, which is the whole point of RDF. Create a good vocabulary, and many people will use it.
Note the difference between the above <DC:Author> and a simple <author> tag. The former is a semantic element, defined within an element set (read: vocabulary); the latter is a syntactic element, defined within a DTD (read: grammar). The other difference is that I get to choose which vocabularies I understand or want to share, simply by linking semantic elements.
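Here is a hedged sketch of how that difference looks to a program. It writes the John Doe statement in the RDF/XML syntax as later standardized, which differs from the draft notation shown above, and parses it with Python's standard `xml.etree.ElementTree`. The assumptions are mine: the current Dublin Core element set calls the element `creator` rather than `Author`, the two namespace URIs are the published RDF and Dublin Core 1.1 ones, and http://example.org/essay is only a placeholder subject.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# http://example.org/essay is a placeholder subject, not from the text.
rdf_xml = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/essay">
    <dc:creator>John Doe</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

# The same words in a bare tag that belongs to no vocabulary.
plain_xml = "<author>John Doe</author>"

root = ET.fromstring(rdf_xml)
triples = []
for description in root.findall(f"{{{RDF_NS}}}Description"):
    subject = description.get(f"{{{RDF_NS}}}about")
    for prop in description:
        # ElementTree expands the prefix, so prop.tag is "{namespace}local".
        triples.append((subject, prop.tag, prop.text))

print(triples)
print(ET.fromstring(plain_xml).tag)
```

The predicate comes out as a name that includes its vocabulary's namespace, while the bare tag is just the string `author`, which could mean anything to anyone.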

Be aware that almost anything can be expressed this way, and a software agent will be able to handle it, as long as you use its vocabulary set. For example, URIs can be objects in RDF, and if you give an RDF statement a separate URI, you can say something about that statement. More than that, from other people's use of an unknown resource (URI, unknown predicate, value) and the vocabulary it started with, an intelligent agent could infer an internal representation (or 'understanding', in anthropomorphic terms), allowing it to cope with the new term.
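The first half of that claim, that an agent copes with whatever is phrased in its own vocabulary set, can be sketched in a few lines. This is only a toy under my own assumptions: the agent knows exactly one vocabulary (identified by the Dublin Core 1.1 namespace), the weather predicate is a made-up example.org name, and the inference step for unfamiliar terms is left out entirely.

```python
# Namespaces of the vocabularies this hypothetical agent understands.
KNOWN_VOCABULARIES = ("http://purl.org/dc/elements/1.1/",)


def split_by_vocabulary(triples):
    """Separate triples the agent can interpret from those it cannot."""
    understood, unfamiliar = [], []
    for triple in triples:
        predicate = triple[1]
        # str.startswith accepts a tuple of candidate prefixes.
        if predicate.startswith(KNOWN_VOCABULARIES):
            understood.append(triple)
        else:
            unfamiliar.append(triple)
    return understood, unfamiliar


# Example data; the example.org names are placeholders.
triples = [
    ("http://example.org/essay", "http://purl.org/dc/elements/1.1/creator", "John Doe"),
    ("http://example.org/essay", "http://example.org/weather#temperature", "16"),
]

understood, unfamiliar = split_by_vocabulary(triples)
print(len(understood), len(unfamiliar))
```

The unfamiliar pile is where the interesting part would start: those are the terms an intelligent agent would try to make sense of from how others use them.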

Finally, once XML Schema is standardized, even the format of values can be formalized.


More to be found:
Lassila & Swick on RDF Syntax Spec
DC Metadata Initiative on Dublin Core Element Set
A. Powell on RDF and the Dublin Core
We have Chosen Shame and Will get War
Philip Greenspun on XML
Tim BL on Evolvability
Tim BL on Web Architecture from 50,000 feet, 21.03.01