Intro to microformats

Nick Nettleton | 08 July 2006 | 7 comments |

Microformats are an important - no, very important - new idea on the web. In fact, I think they are so important that they could precipitate a leap of evolution more important than AJAX and as important as XML web services. But first, an introduction.

The focal site for microformats, microformats.org, is not clear at all on what microformats are, but here is my understanding:

Microformats build on the semantic capabilities of the web, using existing standards.

Unless you're fairly technical, that's probably meaningless. So, to explain. <h1>, <h2>, <p>, <ul> - all of these and other HTML tags are designed to tell human readers, web browsers and other HTML readers what sort of information they contain. Not what it looks like - that's what CSS is for - but how that bit of information relates to other bits on the page. Is it a heading, a paragraph or a list of things?
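To make that concrete, here is a minimal Python sketch of an HTML reader that relies on nothing but heading tags to pull an outline out of a page. The class name `OutlineBuilder` and the sample markup are my own, made up for illustration; it only assumes the standard library's `html.parser`.

```python
from html.parser import HTMLParser

class OutlineBuilder(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.level = None      # heading level we are currently inside, if any
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.level = int(tag[1])
            self.outline.append((self.level, ""))

    def handle_endtag(self, tag):
        if self.level is not None and tag == f"h{self.level}":
            self.level = None

    def handle_data(self, data):
        # Text only counts while a heading is open
        if self.level is not None:
            level, text = self.outline[-1]
            self.outline[-1] = (level, text + data)

# Illustrative page, not taken from a real site
page = (
    "<h1>Intro to microformats</h1><p>Some text.</p>"
    "<h2>Where HTML is not enough</h2><p>More text.</p>"
    "<h2>Header-style tagging</h2>"
)

builder = OutlineBuilder()
builder.feed(page)
builder.close()

for level, title in builder.outline:
    print("  " * (level - 1) + title)
```

The reader never looks at fonts or sizes. It only needs to know that an h2 is a second-level heading, which is exactly the information the tag carries.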

Where HTML is not enough...

This is quite useful, for example, for automatically generating outlines of documents. But it's not 1% of the distance we could go with the idea. For example, a lot of websites have information about points of contact - employees, business associates, personal contact info and so on. Just like this:

Nick Nettleton
20 Crescent Lane, Bath BA1 2PX, UK
+44 (0)1225 358 346
nospam@example.com

Here's some typical HTML for this:

<p><strong>Nick Nettleton</strong><br />
20 Crescent Lane, Bath BA1 2PX, UK<br />
+44 (0)1225 358 346<br />
<a href="mailto:nospam@example.com">nospam@example.com</a></p>

This is great for human readers, because we know what a phone number and address look like. But to a computer, it's just a paragraph of text, with a more important bit in bold at the top and an email address at the bottom. That is the total semantic capability of HTML for this bit of text.

A clever web browser, plugin, page scraper or other HTML reader could try to recognise what looks like a phone number, and put a Skype button next to it. It could even try to recognise my address and separate out its components - street, town, postcode, country. But it would be hard pressed to get this right on a regular basis, and would be at the mercy of what information I choose to include, as well as my particular way of representing it.
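To show how fragile this sort of guessing is, here is a minimal Python sketch of the kind of heuristic such a reader might use. The pattern and the function name `guess_phone_numbers` are my own assumptions for illustration, not taken from any real browser or plugin.

```python
import re

# Hypothetical heuristic: an optional "+", then at least nine characters
# that are digits, whitespace or brackets.
PHONE_GUESS = re.compile(r"\+?[\d\s()]{9,}")

def guess_phone_numbers(text):
    """Return substrings that merely look like phone numbers."""
    return [match.group().strip() for match in PHONE_GUESS.finditer(text)]

contact = "Nick Nettleton, 20 Crescent Lane, Bath BA1 2PX, UK, +44 (0)1225 358 346"
print(guess_phone_numbers(contact))   # ['+44 (0)1225 358 346']

# The same rule is fooled by any other run of digits
order = "Order 2006 07 08 1234"
print(guess_phone_numbers(order))     # ['2006 07 08 1234']
```

The first result is right, but only by luck: the second string contains no phone number at all, and the heuristic has no way of knowing that.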

For example, there is a small village near where I live called Petit France. If my address was "20 High Street, Petit France", even a highly intelligent address parser, such as a human reader, with no further information to go on, cannot be sure whether I live in the village of Petit France, or in a town called Petit in France.

As anyone who has received a letter starting with 'Dear Mr Nick' or somesuch will know - and especially anyone who has worked with databases of people - this sort of confusion is a constant and major issue.

So how can we give readers more information to go on, more semantics or meta information to understand whether the text really is contact information, and which bits of it are what?

Header-style tagging

One solution would be to come up with stronger rules on how people should write their names and addresses. It's not a bad idea. We have fairly good rules in place already, they are quite consistent from country to country, and a solution like this would apply not just to the web, but also to the whole electronic and even paper world. For example:

Given name: Nick
Family name: Nettleton
Address: 20 Crescent Lane
Town: Bath
Postcode: BA1 2PX
Country: UK
Telephone: +44 (0)1225 358 346
Email: nospam@example.com

That's pretty darn clear for a human reader, and we know this format can work well in the electronic world because we use this for HTTP and email headers. It's a good idea.
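As a rough illustration of why this format is so easy for software, here is a minimal Python sketch that reads the labelled lines above into a dictionary. The function name `parse_contact_headers` is hypothetical, and the sketch assumes one "Label: value" pair per line with no continuation lines.

```python
def parse_contact_headers(text):
    """Split 'Label: value' lines into a dict, one entry per line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields

record = """\
Given name: Nick
Family name: Nettleton
Address: 20 Crescent Lane
Town: Bath
Postcode: BA1 2PX
Country: UK
Telephone: +44 (0)1225 358 346
Email: nospam@example.com
"""

contact = parse_contact_headers(record)
print(contact["Town"])       # Bath
print(contact["Postcode"])   # BA1 2PX
```

No guessing is needed: every value arrives with its label attached.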

But there's one downside. We've traded one sort of clarity (knowing exactly what is what) for another: readability. If I already know Nick lives in the UK, and how UK people tend to write their names and addresses, the much simpler format up top is way clearer and easier to read. A directory full of contact details presented in the format above will be extremely hard to read, and probably not very popular.

So what other technique can we use to give readers a better chance of understanding contact information, but without compromising on visual clarity?

Creating new HTML tags

HTML was originally designed with human readability and semantics in mind. As far as it goes, it's pretty successful in this. Since our return from the bad days of table-based layouts, good, simple HTML is quick and easy to read, and describes the information it tags accurately - as headings, paragraphs, lists and so on. Better still, it is designed so that the semantic information, the HTML tags, can be hidden from viewers and replaced with more natural visual cues to semantics, such as spacing, font size and character weight.

This has been extremely effective, and applied to contact information would solve all our problems at a single stroke:

<contact>
<givenname>Nick</givenname>
<familyname>Nettleton</familyname>
<!-- etc -->
</contact>

In a web browser, only the text is displayed to the viewer, while style information can be used to add visual cues based on the semantics. HTML readers, and users that need a little more than visual cues on the semantics, can inspect the source code itself.
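If we treat the hypothetical contact markup above as XML, a reader can get at each piece by name. This is only a sketch using Python's standard `xml.etree.ElementTree`; the tag names are the made-up ones from the example, not part of any standard.

```python
import xml.etree.ElementTree as ET

# The made-up <contact> markup, parsed as XML rather than HTML
markup = """
<contact>
<givenname>Nick</givenname>
<familyname>Nettleton</familyname>
</contact>
""".strip()

contact = ET.fromstring(markup)
given = contact.findtext("givenname")
family = contact.findtext("familyname")
print(given, family)   # Nick Nettleton
```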

The trouble here is that, unlike XML, HTML doesn't allow us to add new tags like this on an as-required basis, and for very good reason: HTML's huge value is by virtue of it being a shared standard. History has shown us that some companies will use flexibility in this to their own advantage and the significant detriment of the overall community - and use their market position to seal their alternative approach.

hCard: giving meaning to HTML classes

Nevertheless, the game is not over. There is a microformat called hCard that uses HTML classes instead of new tags to add new semantics to HTML. Since these class names have no semantic meaning under HTML, conform to no standards (beyond the characters that can be used), and are intended for referencing only visual information, the W3C has little to say on what class names you can or can't use.

HTML designers tend to use similar class names for many things across projects and between each other, simply because life is easier that way, and after all, there are only a few different ways of writing the words 'box', 'header' or 'footer', for example.

So the hCard format rather cleverly leverages this flexibility to use the HTML class attribute for something it wasn't intended for: semantics. Giving an element the class name 'postal-code' - in the right context - indicates to readers that understand this format that the contained text is a post code.

Here are my contact details, as above, expressed using this format:

<div class="vcard">

<div class="fn n">
<span class="given-name">Nick</span>
<span class="family-name">Nettleton</span>
</div>

<div class="adr">
<span class="type">Work</span>
<span class="street-address">20 Crescent Lane</span>,
<span class="locality">BATH</span>
<span class="postal-code">BA1 2PX</span>,
<span class="country-name">UK</span>
</div>

<div class="tel">
<span class="type">Work</span>
<span class="value">+44 (0)1225 358 346</span>
</div>

<a class="email" href="mailto:nospam@example.com">nospam@example.com</a>
</div>

This is an extremely appealing idea. If the web designer sets up the appropriate stylesheet rules for the classes, the information will appear in browsers exactly as the example at the beginning of this article. Meanwhile, readers that need to know more can inspect the code. Web browsers understanding this information can, for example, automatically insert an 'Add to address book' or 'Call with Skype' button beside the address, and be certain of using the right details in the right way.
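To give a feel for what "understanding this information" might involve, here is a minimal Python sketch of a reader that picks contact details out by class name. It is not a conforming hCard parser: the class `HCardSketch`, the subset of property names it recognises and the trimmed-down sample markup are all my own assumptions, and it ignores details such as the 'type' and 'value' sub-properties.

```python
from html.parser import HTMLParser

# Class names this sketch treats as properties (a subset chosen for
# the example; the real format defines more)
PROPERTIES = {"given-name", "family-name", "street-address", "locality",
              "postal-code", "country-name", "tel", "email"}

class HCardSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_props = []   # one entry per open element: property name or None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in PROPERTIES), None)
        self.open_props.append(prop)

    def handle_endtag(self, tag):
        # Assumes well-nested markup with no void elements such as <br>
        if self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        # Credit the text to the innermost open element that is a property
        for prop in reversed(self.open_props):
            if prop:
                self.fields[prop] = self.fields.get(prop, "") + data.strip()
                break

# Simplified sample card for the sketch
card = """
<div class="vcard">
  <div class="fn n">
    <span class="given-name">Nick</span>
    <span class="family-name">Nettleton</span>
  </div>
  <div class="adr">
    <span class="street-address">20 Crescent Lane</span>,
    <span class="locality">Bath</span>
    <span class="postal-code">BA1 2PX</span>,
    <span class="country-name">UK</span>
  </div>
  <div class="tel">+44 (0)1225 358 346</div>
  <a class="email" href="mailto:nospam@example.com">nospam@example.com</a>
</div>
"""

reader = HCardSketch()
reader.feed(card)
reader.close()
print(reader.fields["locality"])      # Bath
print(reader.fields["postal-code"])   # BA1 2PX
```

Unlike the guessing approach, nothing here depends on what the text looks like. An address in 'Petit France' would come out just as cleanly, because the page author has already said which part is the locality.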

It's not a completely new idea: web designers have been giving things semantic - or at least meaningful - class names for years. JavaScript developers have been using class names not to define style, but to identify groups of elements that they want to apply dynamic behaviours to. Which helps to make the microformat even more appealing - it's something we're already familiar with.

What is new is to promote it as a standard, which makes it far more valuable than an ad-hoc habit or convention, subject to local variations.

hCard is just one of a number of such microformats, some of them using classes in this way. Others, such as XFN for attaching personal relationship information to hyperlinks, use the HTML rel attribute; and others use plain HTML.
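The rel approach can be sketched in much the same way. The snippet below is a minimal Python example that collects the rel values from links; the class name `RelCollector`, the link and the URL are placeholders of my own, while 'friend', 'met' and 'colleague' are relationship values that XFN defines.

```python
from html.parser import HTMLParser

class RelCollector(HTMLParser):
    """Collect (href, [rel values]) for every link that carries a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").split()
        if rel:
            self.links.append((attributes.get("href"), rel))

# Hypothetical blogroll entry; the URL is a placeholder
link = '<a href="http://example.com/jane" rel="friend met colleague">Jane</a>'

collector = RelCollector()
collector.feed(link)
collector.close()
print(collector.links)
# [('http://example.com/jane', ['friend', 'met', 'colleague'])]
```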

There are at least two underlying drawbacks with this approach, which I will look at in further posts, since I have written enough for now. For more, and to follow up, see www.microformats.org.

Comments

Scott Reynen 17/07/2006 21:24

"web designers have been given things semantic - or at least meaningful - class names for years. JavaScript developers have been using class names not to define style, but to identify groups of elements that they want to apply dynamic behaviours to."

Knowing all of this, why do you continue to refer to "CSS class names"? I see Ryan King already pointed out in the comments to your previous microformats article that "they're really HTML class names, not CSS classnames." It's troubling to see an otherwise informative article perpetuate this common misunderstanding.

Eivind Lie Nitter 17/07/2006 22:20

Thanks for a good, simple and informative article! I've been wanting to understand microformats for a while, but haven't bothered to dig too much through microformats.org. But it seems like real useful stuff!

Nick Nettleton 18/07/2006 10:19

Thanks Eivind.

Scott, this article was written before the other. It is a fair if minor point, and I have updated the article accordingly.

Gayle 18/07/2006 18:12

Holy cow, THANK YOU. It makes SENSE now.

Johan Coppieters 19/07/2006 10:23

I agree it would be nice to have custom tags like in HTML, so the enclosed text would have a meaning and we could give it a layout with stylesheets. But if you think about it: XML is all you need.

Data for webpages is quite often stored in SQL databases. If web programmers would generate XML pages from these databases and add XSLT translators to transform these into the HTML layout created by the web designers, we wouldn't need this clever trick.

For those sites that don't have dynamic content, you could keep the webpages in XML format.

Once this is accomplished, the only thing left to do is to configure your webserver/application engine to deliver the XML pages when the client switches from the html, asp or php extension to xml in the URL.

Sammy Dellicour 23/07/2006 19:45

Thank you for your article, a very good introduction.

I have one worry about microformats: spammers. Once microformats are widespread, aren't we also making it too easy for spammers to get personal information? Sorry if this might be off topic to your introduction.

Temp 26/07/2006 06:13

Very good job! Thanks! I read a million definitions and still didn't know wtf a microformat was, until I just read your explanation.