User agents defining class names — How The Web Went Wrong

A new trend is disturbing me, and I'd like to nip it in the bud. There are various projects and activities which advocate giving meaning to specific class names (as they appear in the class attribute) so that clever user agents can extract extra meaning from the marked-up text, beyond any meaning that HTML can express. In other words, several pre-defined class names are proposed.

Microformats attempt to express small chunks of information (like details of persons or events) in HTML in a machine-readable way. I think this is a wonderful idea!

However, they do it by defining class names like vevent and vcard, and expect these not to clash with author-defined classes. They even say that authors should move their own classes out of the way of microformats to avoid clashes!

Google now allow authors to control which parts of their pages Google should not translate. Another great idea!

But they've also arbitrarily picked a class name, which authors now have to avoid if they don't mean it.

The Link Widgets extension for Firefox does a great job of interpreting rel types on links, but it also appears to define the meaning of HTML classes such as next and prev. You can see an example at this photo gallery:
- Woodbine & Gerrard — Northbound on Woodbine at rush hour

Note that, in all cases, the author has control of when to use these special class names, but he no longer defines their meaning! That is, he cannot use them to mean what he wants. User agents that process his documents now define the meaning, and he has no control over them once he places them on the Web. Until these ideas came along, the author chose to link in stylesheets, which gave meaning to classes by styling up the associated content. Now he has to watch his step, and I see this as an intrusion on his namespace.

This is an impractical situation because the author has to keep track of an ever-growing set of definitions made by others; some authority has to keep track of them so they don't clash with each other (e.g., see this list); and authors may have to make changes to their sites retrospectively to accommodate new definitions. I say that this is an absurd and unnecessary way to implement those good ideas above.

Microformats Would Benefit from a Pseudo-Namespace – Jens Meiert

A discussion of problems caused by microformats' class namespace intrusion

First, here are some mitigations of the approach of user agents defining class names:

Names won't clash very often — This isn't good enough for me. Name clashes simply weren't a problem before, because the meaning was expressed in distinct stylesheets, totally under control of the author. And when a clash happens now, the author has to move out of the way, which he never had to do before.

Furthermore, ad hoc global definition of classes sets a poor trend to follow. Neither Google nor the Microformats group have any special authority to define class names for everyone else, but if they do so nonetheless, others will feel that they have a right to define them how they like, resulting in more clashes.
§7.5.2 of the HTML4.01 specification says that stylesheets are not the only way to interpret class names — Indeed, it implies that user agents can infer meaning any way they choose, but I don't think the intention was to do this without agreement of the author. He has no control over the user agents that process his documents on the Web, so he cannot choose their interpretations. He relies on user-agent writers following the Recommendations of the W3C to ensure that both he and they will make the same interpretations.
A Microformat class should only be interpreted when the document has included a profile for it — This is better, as the author now has some control. The inclusion of a URI at a certain place in the document now indicates that he is using a certain kind of microformat, but it still falls short in that he cannot choose the class names and map them to the meanings defined by the microformat. If he wants to start using a new format, he has to go back and check that his current classes don't clash with those of the new format.

Furthermore, the Tails Export extension for Firefox recognises class names regardless of the presence of an appropriate profile URI.

I assert that all of these problems can be avoided, simply by adapting some existing mechanisms.

What we are really trying to do in these cases is to add properties to certain sections of content so that they will be processed differently or specially in certain contexts or applications. What options do we have to do this?

Add custom attributes to HTML — This could end up filling a page with material useless for most user agents, and makes validation messy. Though both of these are relatively minor problems, can we find a better way? After all, we still have to avoid clashes between lots of arbitrary attribute names.
Add custom attributes to XHTML — We can now use XML namespaces to avoid the clashes, but some say that the Web is not yet ready for XHTML.
Add namespace prefixes to class names — If the author can define, for a given format, a prefix to be used locally within the page, avoiding name clashes would be completely under his control. How would such a prefix be defined?
Associate external attributes to elements (like CSS does) — But what language could we use to express these attributes, how would we express the associations to HTML elements, and how would we link a set of associations to a page?

I've already suggested why I'd rule out the first two options, so I'll describe and evaluate the last two options in detail below.

Adding namespace prefixes to class names should give just enough flexibility for the author to avoid name clashes. The author would introduce his own prefix (e.g. google) for a microformat's namespace (identified by URI) like this:

```
<link rel="schema.google" href="http://google.com/ns/robots">
```

A user agent that understands this microformat will adjust itself to look only for class names beginning with google, for example:

```
<span class="google.notranslate">some text</span>
<span class="google.noindex">some text</span>
```

This should be very simple to implement, and requires no central authority beyond DNS. User agents could behave as follows to allow old pages to continue to use profiles while new ones can use prefixes:

Look for a prefix being defined for the namespace of your microformat, i.e. in the <head> element, find <link rel="schema.pref" href="your_namespace_URI">, and extract pref. Your prefix is then pref followed by a dot (pref.).
Otherwise, search for the profile URI of your microformat, i.e. <head profile="... your_profile_URI ...">. If you find that, use an empty string, or some other documented default, as your prefix.
Otherwise, do not recognise any special class names for your microformat.
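
The three steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the names `PrefixResolver`, `class_prefix` and `local_names` are invented here, the empty string is assumed as the documented default prefix for profile-only pages, and the profile URI in the usage lines is a placeholder.

```python
from html.parser import HTMLParser


class PrefixResolver(HTMLParser):
    """Collects <link rel="schema.*"> declarations and <head profile> URIs."""

    def __init__(self):
        super().__init__()
        self.schemas = {}      # namespace URI -> prefix chosen by the author
        self.profiles = set()  # URIs listed in <head profile="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.profiles.update((attrs.get("profile") or "").split())
        elif tag == "link":
            rel = attrs.get("rel") or ""
            href = attrs.get("href")
            if rel.startswith("schema.") and href:
                self.schemas[href] = rel[len("schema."):]


def class_prefix(html, namespace_uri, profile_uri):
    """Return the class-name prefix to look for, or None to ignore the page."""
    resolver = PrefixResolver()
    resolver.feed(html)
    if namespace_uri in resolver.schemas:   # step 1: explicit prefix
        return resolver.schemas[namespace_uri] + "."
    if profile_uri in resolver.profiles:    # step 2: legacy profile, assumed default ""
        return ""
    return None                             # step 3: recognise nothing


def local_names(class_attr, prefix):
    """Strip the prefix from the class names that belong to the format."""
    return [c[len(prefix):] for c in class_attr.split() if c.startswith(prefix)]


# Usage with the example markup from the text (the profile URI is hypothetical).
page = '<head><link rel="schema.google" href="http://google.com/ns/robots"></head>'
prefix = class_prefix(page, "http://google.com/ns/robots", "http://example.org/profile")
print(prefix, local_names("google.notranslate warning", prefix))
```

With an empty prefix every class name is treated as belonging to the format, which is exactly the behaviour of today's profile-based pages.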

I think that the last option, associating external attributes with elements as CSS does, has the greatest flexibility because the choice of class names remains totally in the hands of the author (as it was before). It even allows authors to select affected content using expressions other than class names. However, it's much more complex to achieve, with significant implications.

To implement this approach, we could simply re-use the syntax and the linking methods of CSS. All we then need to do is define a set of CSS properties for each application (translation, and each microformat), and allow the specialised user agents to detect them.

This raises more issues, but I think they are easily solved:

Who will define all these new properties and prevent clashes? — The application developers will. Each property will belong to a namespace owned by its definer, and authors wishing to use them will define a prefix for each namespace. There will be no need for a central authority beyond the one for DNS.

Namespace prefixes will be defined using the existing CSS syntax, and referenced as if they were vendor prefixes:
```
@namespace google "http://google.com/css";

.notranslate {
  -google-translate: disabled;
}
```
…or preferably they would use CSS3 namespace syntax for type selectors:
```
@namespace google "http://google.com/css";

.notranslate {
  google|translate: disabled;
}
```
…assuming that's allowed by the lexical grammar of CSS.
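
A specialised user agent would have to turn either spelling into a (namespace URI, local name) pair before acting on it. The sketch below shows one way that resolution might look. It is a deliberately tiny hand-rolled parser, partly because a standard CSS parser may discard a declaration whose property is written `google|translate`. The function names `parse_sheet` and `resolve` are invented for illustration, and the regular expressions cope only with simple sheets like the two above (no comments, no nested blocks).

```python
import re

NAMESPACE_RE = re.compile(r'@namespace\s+([\w-]+)\s+"([^"]*)"\s*;')
RULE_RE = re.compile(r'([^{}@;]+)\{([^{}]*)\}')


def resolve(name, namespaces):
    """Map 'google|translate' or '-google-translate' to (URI, 'translate')."""
    if "|" in name:
        prefix, local = name.split("|", 1)
    elif name.startswith("-") and "-" in name[1:]:
        prefix, local = name[1:].split("-", 1)
    else:
        return None                      # an ordinary, un-namespaced property
    uri = namespaces.get(prefix)
    return (uri, local) if uri else None


def parse_sheet(css):
    """Return {selector: {(namespace_uri, local_name): value}}."""
    namespaces = dict(NAMESPACE_RE.findall(css))   # prefix -> namespace URI
    body = NAMESPACE_RE.sub("", css)
    rules = {}
    for selector, block in RULE_RE.findall(body):
        props = rules.setdefault(selector.strip(), {})
        for declaration in block.split(";"):
            name, colon, value = declaration.partition(":")
            if not colon:
                continue                 # empty fragment after the last ';'
            resolved = resolve(name.strip(), namespaces)
            if resolved:
                props[resolved] = value.strip()
    return rules


sheet = '@namespace google "http://google.com/css";\n.notranslate { google|translate: disabled; }'
print(parse_sheet(sheet))
```

Because the lookup goes through the namespace URI, the author is free to pick any prefix, and the agent never needs to know which one was chosen.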
Won't regular browsers be downloading lots of useless CSS rules? — CSS rules for different applications could be stored in separate files, and loaded using media queries that only specialised user agents will recognise. These queries will be expressed using media types and features with prefixes chosen by the author to refer to application-specific namespaces:
```
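<!-- bind the author's prefix "google" to the namespace -->
<link rel   = "schema.google"
      href  = "http://google.com/css">

<!-- vendor-prefix form of the media type -->
<link rel   = "stylesheet"
      media = "-google-translator"
      href  = "translation-styles.css">

<!-- or the namespace form -->
<link rel   = "stylesheet"
      media = "google|translator"
      href  = "translation-styles.css">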
```
Again, there's no need for a central authority.
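
How a specialised agent might pick out its own sheets can be sketched as follows. This is only an illustration under stated assumptions: `LinkCollector` and `sheets_for` are invented names, the media attribute is assumed to hold a single media type rather than a full comma-separated query list, and both spellings of the prefixed media type are accepted.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records the attributes of every <link> element in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append({name: (value or "") for name, value in attrs})


def sheets_for(html, namespace_uri, media_name):
    """Return hrefs of the sheets addressed to the agent owning namespace_uri."""
    collector = LinkCollector()
    collector.feed(html)

    # Prefixes the author has bound to our namespace via rel="schema.<prefix>".
    prefixes = {
        link["rel"][len("schema."):]
        for link in collector.links
        if link.get("rel", "").startswith("schema.")
        and link.get("href") == namespace_uri
    }

    # Accept both the vendor-prefix and the namespace spelling.
    wanted = set()
    for prefix in prefixes:
        wanted.add(f"-{prefix}-{media_name}")
        wanted.add(f"{prefix}|{media_name}")

    return [
        link["href"]
        for link in collector.links
        if link.get("rel", "").lower() == "stylesheet"
        and link.get("media", "").strip() in wanted
        and "href" in link
    ]
```

A regular browser sees an unknown media type and can skip the sheet, while the translator resolves the author's prefix through its own namespace URI and loads only what is addressed to it.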
CSS is only for style; won't this break the separation between content and style that is considered so valuable? — CSS was designed for style, but is it really specific to it? The syntax for selectors is largely independent of its application, and by exploiting namespace or vendor prefixes, the set of media types and features you might use to link in a stylesheet belong to an extensible set, so you can still keep style separated from other aspects. And the fact that one writes <link rel="STYLESHEET"> is irrelevant — STYLESHEET is just a mnemonic.

Really, style is just one way of interpreting content, and in general, you're linking in an ‘interpretation sheet’, not a stylesheet. In other words, the content-style separation isn't broken by this proposal; it is merely partitioned further.

The result is a method of designing completely new ways of interpreting content, not merely rendering it, without having to wait for an authority to standardize it, and while leaving choice of class name entirely to the author, where it should be.

As an example, another mode of interpretation could be the application of an indexer for a search engine. A property could be defined to tell search engines not to index or not to follow certain content within a page:

```
@namespace search "http://www.w3c.org/2009/search-engines";

.noindex {
  search|index: disabled;
}

.nofollow {
  search|follow: disabled;
}
```

In this case, a consortium of search engines could agree on these properties, without having to affect other assignments within W3C.

These styles are only relevant to search engines, so we should inform regular browsers using an appropriate media query that they don't need to load them in:

```
<link rel   = "schema.search"
      href  = "http://www.w3c.org/2009/search-engines">
<link rel   = "stylesheet"
      href  = "search-styles.css"
      media = "search|robot">
```
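
On the indexer's side, honouring such a sheet might look like the sketch below. It assumes the sheet has already been read and reduced to the set of class names whose rules carry `search|index: disabled`; it handles only simple class selectors, and it expects well-formed markup in which every non-void element is explicitly closed. The names `IndexableText` and `indexable_text` are invented for illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collects text, skipping elements whose class disables indexing."""

    def __init__(self, noindex_classes):
        super().__init__()
        self.noindex_classes = set(noindex_classes)
        self.stack = []    # one flag per open element: is it excluded?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        inherited = bool(self.stack) and self.stack[-1]
        self.stack.append(inherited or bool(classes & self.noindex_classes))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        excluded = bool(self.stack) and self.stack[-1]
        if not excluded and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html, noindex_classes):
    """Return the text a search engine would be allowed to index."""
    parser = IndexableText(noindex_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = '<p>Keep this <span class="noindex">drop <b>this</b></span> and this.</p>'
print(indexable_text(html, {"noindex"}))   # prints: Keep this and this.
```

The class name passed in is whatever the author's sheet selects; nothing in the indexer hard-codes noindex.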

As well as being more expressive than namespaced class names (the previous proposal), this approach pushes almost all the namespacing into stylesheets. These are most effectively used when linked from and shared by many pages with a common style, so maintenance should be simpler.

This technique also permits an easy way to migrate a website using microformats without profiles. The specifying party of the format writes and publishes an interpretation sheet for their default class names, and authors happy to use those default names simply link the sheet in.

namespaces considered harmful

From the Microformats Wiki

Namespaced content on the Web has failed.

Namespacing isn't a goal that can fail or succeed according to popularity or how much it is exploited. It is a technical means to solve a problem, and so would only fail if it technically could not solve that problem. Does it solve it? Yes! Therefore, has it failed? No!

[…]in practice people write scrapers that look for namespace prefixes as if they are part of the element name, or perform literal string matches on common namespace prefix uses […], not as mere shorthands for namespace URIs.

People use perfectly good tools the wrong way all the time. There's nothing wrong with the tool, so the solution isn't to throw it away!

Namespaces are actually *not* well supported in sufficient modern browsers[…]

Only the plugins and extensions need to handle namespaces for class names (as described above). Look through the page's <link> elements for your URI and determine the local prefix.

Namespaces encourage people to seclude themselves in their own namespace and invent their own schema rather than reusing existing elements in existing formats. This hurts interoperability because a dozen different namespaces can all have their own slightly different semantics for the same element.

You can prevent that by having a community that looks at how microformats overlap and share structure. Oh, you seem to have one already, which you need anyway to stop name clashes between formats. Format inventors will want to be part of that community in order to get widespread support.

If you want to carry on a theoretical discussion of namespaces, please do so elsewhere, for in practice, discussing them is a waste of time, and off-topic for microformats lists.

Oh, how very open-minded!

microformats principles — Lowering barriers for publishers

From the Microformats Wiki

[…]but it does mean that we ask less of [publishers] than most other standards efforts, which ask publishers to learn new languages, create new files, namespaces etc.

Publishers won't be creating new namespaces, just using them.

[…]humans first, machines second. One aspect of being more human-centric in design is about making it easier for humans in general to publish information in microformats, rather than just making it easier for machines (programs) to parse microformats. This seems like an obvious trade-off in that many fewer humans develop/write parsers than publish content, and thus making publishing easier benefits more people.

Instances of microformats are going to be read (by machines) thousands of times more often than humans (and often machines) will write them. And the definition of microformat classes without namespace prefixes (or at least, some sort of author-controlled switch) makes it more difficult for an author to be sure he doesn't stumble on one.
