A new trend is disturbing me, and I'd like to nip it in the bud. There are various projects and activities which advocate giving meaning to specific class names (as they appear in the class attribute) so that clever user agents can extract extra meaning from the marked-up text, beyond any meaning that HTML can express. In other words, several pre-defined class names are proposed.

Note that, in all cases, the author has control of when to use these special class names, but he no longer defines their meaning! That is, he cannot use them to mean what he wants. The user agents that process his documents now define the meaning, and he has no control over those agents once his documents are on the Web. Until these ideas came along, the author chose to link in stylesheets, which gave meaning to classes by styling up the associated content. Now he has to watch his step, and I see this as an intrusion on his namespace.

This is an impractical situation because the author has to keep track of an ever-growing set of definitions made by others; some authority has to keep track of them so they don't clash with each other (e.g., see this list); and authors may have to change their sites retrospectively to accommodate new definitions. I say that this is an absurd and unnecessary way to implement those good ideas above.

First, here are some mitigations of the approach of user agents defining class names:

I assert that all of these problems can be avoided, simply by adapting some existing mechanisms.

What we are really trying to do in these cases is to add properties to certain sections of content so that they will be processed differently or specially in certain contexts or applications. What options do we have to do this?

I've already suggested why I'd rule out the first two options, so I'll describe and evaluate the last two options in detail below.

Namespace prefixes for class names

This technique should give just enough flexibility for the author to avoid name clashes. The author would introduce his own prefix (e.g. google) for a microformat's namespace (identified by URI) like this:

<link rel="schema.google" href="http://google.com/ns/robots">

A user agent that understands this microformat will adjust itself to look only for class names beginning with google, for example:

<span class="google.notranslate">some text</span>
<span class="google.noindex">some text</span>

This should be very simple to implement, and requires no central authority beyond DNS. User agents could behave as follows to allow old pages to continue to use profiles while new ones can use prefixes:

  • Look for a prefix being defined for the namespace of your microformat, i.e. in the <head> element, find <link rel="schema.pref" href="your_namespace_URI">, and extract pref. Your prefix is then pref followed by a dot, i.e. "pref.".

  • Otherwise, search for the profile URI of your microformat, i.e. <head profile="... your_profile_URI ...">. If you find that, use an empty string, or some other documented default, as your prefix.

  • Otherwise, do not recognise any special class names for your microformat.
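The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name resolve_prefix, the helper class, and the example profile URI are hypothetical, and a real user agent would work on its own DOM rather than re-parsing the page.

```python
from html.parser import HTMLParser


class _HeadScanner(HTMLParser):
    """Collects schema.* links and the profile URIs of <head> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.schema_links = []  # (prefix, namespace URI) pairs
        self.profiles = []      # profile URIs listed on <head>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            # The profile attribute is a space-separated list of URIs.
            self.profiles = (a.get("profile") or "").split()
        elif tag == "link":
            href = a.get("href")
            for token in (a.get("rel") or "").split():
                if token.lower().startswith("schema.") and href:
                    self.schema_links.append((token[len("schema."):], href))


def resolve_prefix(html, namespace_uri, profile_uri, default_prefix=""):
    """Return the class-name prefix to look for, or None to ignore the page."""
    scanner = _HeadScanner()
    scanner.feed(html)
    # Step 1: an author-chosen prefix bound to our namespace URI.
    for pref, uri in scanner.schema_links:
        if uri == namespace_uri:
            return pref + "."
    # Step 2: an old-style page that declares our profile URI.
    if profile_uri in scanner.profiles:
        return default_prefix
    # Step 3: the page has not opted in at all.
    return None
```

With the prefix in hand, the agent would match class names by concatenation, e.g. prefix + "notranslate", so a page that binds a different prefix to the same URI is handled without any change to the agent.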

Using CSS for non-style properties

I think that the last option has the greatest flexibility because the choice of class names remains totally in the hands of the author (as it was before). It even allows authors to select affected content using expressions other than class names. However, it's much more complex to achieve, with significant implications.

To implement this approach, we could simply re-use the syntax and the linking methods of CSS. All we then need to do is define a set of CSS properties for each application (translation, and each microformat), and allow the specialised user agents to detect them.

This raises more issues, but I think they are easily solved:

  • Who will define all these new properties and prevent clashes? — The application developers will. Each property will belong to a namespace owned by its definer, and authors wishing to use them will define a prefix for each namespace. There will be no need for a central authority beyond the one for DNS.

    Namespace prefixes will be defined using the existing CSS syntax, and referenced as if vendor prefixes:

    @namespace google "http://google.com/css";
    
    .notranslate {
      -google-translate: disabled;
    }
    

    …or preferably they would use CSS3 namespace syntax for type selectors:

    @namespace google "http://google.com/css";
    
    .notranslate {
      google|translate: disabled;
    }
    

    …assuming that's allowed by the lexical grammar of CSS.

  • Won't regular browsers be downloading lots of useless CSS rules? — CSS rules for different applications could be stored in separate files, and loaded using media queries that only specialised user agents will recognise. These queries will be expressed using media types and features with prefixes chosen by the author to refer to application-specific namespaces:

    <link rel   = "schema.google"
          href  = "http://google.com/css">
    <link rel   = "stylesheet"
          media = "-google-translator"
          href  = "translation-styles.css">
    <!-- or -->
    <link rel   = "stylesheet"
          media = "google|translator"
          href  = "translation-styles.css">
    

    Again, there's no need for a central authority.

  • CSS is only for style; won't this break the separation between content and style that is considered so valuable? — CSS was designed for style, but is it really specific to it? The syntax for selectors is largely independent of its application, and by exploiting namespace or vendor prefixes, the set of media types and features you might use to link in a stylesheet belong to an extensible set, so you can still keep style separated from other aspects. And the fact that one writes <link rel="STYLESHEET"> is irrelevant — STYLESHEET is just a mnemonic.

    Really, style is just one way of interpreting content, and in general, you're linking in an ‘interpretation sheet’, not a stylesheet. In other words, the content-style separation isn't broken by this proposal; it is merely partitioned further.
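To illustrate how a specialised user agent might read the namespaced properties proposed above, here is a rough Python sketch. It assumes the prefix|property form is lexically acceptable (which the text leaves open), uses deliberately crude regular expressions instead of a real CSS parser, and the name expand_properties is hypothetical.

```python
import re

# @namespace prefix "uri";  (only the double-quoted string form is handled)
NAMESPACE_RE = re.compile(r'@namespace\s+([A-Za-z_][\w-]*)\s+"([^"]*)"\s*;')
# prefix|property: value   (a crude match, not a full CSS tokenizer)
DECLARATION_RE = re.compile(r'([A-Za-z_][\w-]*)\|([A-Za-z_][\w-]*)\s*:\s*([^;}]+)')


def expand_properties(css):
    """Return (namespace URI, property, value) triples found in a sheet."""
    prefixes = dict(NAMESPACE_RE.findall(css))
    expanded = []
    for prefix, name, value in DECLARATION_RE.findall(css):
        # Declarations with an unbound prefix are skipped in this sketch.
        if prefix in prefixes:
            expanded.append((prefixes[prefix], name, value.strip()))
    return expanded
```

The agent compares the namespace URI, never the prefix, so two authors who pick different prefixes for "http://google.com/css" produce identical triples.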

The result is a method of designing completely new ways of interpreting content, not merely rendering it, without having to wait for an authority to standardize it, and while leaving choice of class name entirely to the author, where it should be.

As an example, another mode of interpretation could be the application of an indexer for a search engine. A property could be defined to tell search engines not to index or not to follow certain content within a page:

@namespace search "http://www.w3c.org/2009/search-engines";

.noindex {
  search|index: disabled;
}

.nofollow {
  search|follow: disabled;
}

In this case, a consortium of search engines could agree on these properties, without having to affect other assignments within W3C.

These styles are only relevant to search engines, so we should inform regular browsers using an appropriate media query that they don't need to load them in:

<link rel="schema.search"
     href="http://www.w3c.org/2009/search-engines">
<link rel="stylesheet"
     href="search-styles.css"
    media="search|robot">
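A search robot could pick out the sheets addressed to it along these lines. This is a sketch only: the function name sheets_for and the helper class are hypothetical, and it assumes the media attribute holds a single prefix|name token as in the example above, not a full media query list.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Gathers the attributes of every <link> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append({k: (v or "") for k, v in attrs})


def sheets_for(html, namespace_uri, media_name):
    """Return hrefs of stylesheets whose media is <prefix>|<media_name>."""
    collector = _LinkCollector()
    collector.feed(html)
    # Every prefix the author has bound to our namespace URI.
    prefixes = {
        token[len("schema."):]
        for link in collector.links
        for token in link.get("rel", "").split()
        if token.lower().startswith("schema.")
        and link.get("href") == namespace_uri
    }
    wanted = {f"{p}|{media_name}" for p in prefixes}
    return [
        link["href"]
        for link in collector.links
        if "href" in link
        and "stylesheet" in link.get("rel", "").lower().split()
        and link.get("media", "").strip() in wanted
    ]
```

A regular browser, which does not recognise the search|robot medium, would simply never fetch the file.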

As well as being more expressive than namespaced class names (the previous proposal), this approach pushes almost all the namespacing into stylesheets. These are most effectively used when linked from and shared by many pages with a common style, so maintenance should be simpler.

This technique also permits an easy way to migrate a website using microformats without profiles. The specifying party of the format writes and publishes an interpretation sheet for their default class names, and authors happy to use those default names simply link the sheet in.

What do the Microformats people think about namespacing?

Namespaced content on the Web has failed.

Namespacing isn't a goal that can fail or succeed according to popularity or how much it is exploited. It is a technical means to solve a problem, and so would only fail if it technically could not solve that problem. Does it solve it? Yes! Therefore, has it failed? No!

[…]in practice people write scrapers that look for namespace prefixes as if they are part of the element name, or perform literal string matches on common namespace prefix uses […], not as mere shorthands for namespace URIs.

People use perfectly good tools the wrong way all the time. There's nothing wrong with the tool, so the solution isn't to throw it away!

Namespaces are actually *not* well supported in sufficient modern browsers[…]

Only the plugins and extensions need to handle namespaces for class names (as described above). Look through the page's <link> elements for your URI and determine the local prefix.

Namespaces encourage people to seclude themselves in their own namespace and invent their own schema rather than reusing existing elements in existing formats. This hurts interoperability because a dozen different namespaces can all have their own slightly different semantics for the same element.

You can prevent that by having a community that looks at how microformats overlap and share structure. Oh, you seem to have one already, which you need anyway to stop name clashes between formats. Format inventors will want to be part of that community in order to get widespread support.

If you want to carry on a theoretical discussion of namespaces, please do so elsewhere, for in practice, discussing them is a waste of time, and off-topic for microformats lists.

Oh, how very open-minded!

[…]but it does mean that we ask less of [publishers] than most other standards efforts, which ask publishers to learn new languages, create new files, namespaces etc.

Publishers won't be creating new namespaces, just using them.

[…]humans first, machines second. One aspect of being more human-centric in design is about making it easier for humans in general to publish information in microformats, rather than just making it easier for machines (programs) to parse microformats. This seems like an obvious trade-off in that many fewer humans develop/write parsers than publish content, and thus making publishing easier benefits more people.

Instances of microformats are going to be read (by machines) thousands of times more often than humans (and often machines) will write them. And defining microformat classes without namespace prefixes (or at least some sort of author-controlled switch) makes it harder for an author to be sure he doesn't stumble on one by accident.