Metadata vs. Metatags
There appears to be a certain amount of confusion about the terms “metadata” and “meta tags”. It has certainly confused me in the past, so I hope that this article makes things a little clearer for those who are struggling with these meta-things.
Metadata
As this article is being written for a newsletter focusing on accessibility, let’s start by looking at meta-things in this context. Checkpoint 13.2 of WCAG 1.0 tells us to:
Provide metadata to add semantic information to pages and sites.
What does this mean? Let’s start by defining what we mean by metadata: metadata is data about data. So, providing metadata for our page (which is data) means that we need, by some means, to describe that data.
How much metadata do we need to provide about our page to satisfy Checkpoint 13.2? There is probably no one correct answer to this other than to provide as much useful metadata as possible. The minimum is to provide a value for (X)HTML’s only real metadata element - the <title></title>.
Yes, the <title></title> is metadata - it is data that describes the data that is our page or document. Please note that “Untitled” is not a good value for our <title></title>, nor is it cool or clever to use the title of the site for the <title></title> of every page in the site. (Including the site title, however, is good as it allows the page to make more sense when viewed on its own.)
Meta Tags or Meta Elements
What about “meta tags” or “meta elements” then? Are these not metadata? No; meta elements (or tags if you will) are not metadata in themselves, they are (X)HTML elements that allow us to embed metadata in a page.
Let’s look at how we can use meta elements to put metadata into a document, using a couple of well-known examples:
<html>
<head>
<title>My page</title>
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />
</head>
In the example above, we are presenting three pieces of metadata - the page title, some information called “description” and some information called “keywords”. The description gives a brief summary of what the page is about and the keywords a comma-separated list of terms relevant to the page. Whilst description and keywords are often published, how much they actually get used is debatable. There was a time when search engines might have taken note of these but from what I have been reading recently, they are largely ignored for the simple reason that much metadata of this type cannot be trusted. The description and keywords metadata values may be of use if your organisation has an in-house search engine that can make use of them - but there are better things available as we will see shortly.
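To make “for machines to read” a little more concrete, here is a minimal sketch of how a program might pull those three pieces of metadata out of a page. It uses Python's standard html.parser module; the names MetadataCollector, extract_metadata and PAGE are my own inventions for illustration, and a real search engine would certainly do a great deal more than this.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Only meta elements carrying both name and content are of interest.
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in more than one chunk, so append to it.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict of the metadata found in an (X)HTML string."""
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return collector.metadata


# Hypothetical sample page, modelled on the example in this article.
PAGE = """<html>
<head>
<title>My page</title>
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />
</head>
</html>"""

print(extract_metadata(PAGE))
```

Note that the program has no idea what “description” or “keywords” actually mean; it simply hands back whatever names the author chose to use.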
I have seen many other terms added as metadata, for instance “author”. Great - it’s good to identify the author of a document, but information embedded in meta elements is really for machines to read, not humans. Besides, can we agree that the creator or writer of a document is called an author? Probably not.
The problem that we have here is that we are not working to a formal metadata scheme. If we make up our own terms, they will probably only be of use to us, and only then if we have our in-house search engine as mentioned earlier.
Formal Metadata Schemes
If we really want to make our metadata useful, we need to agree on what the terms are (description and keywords may be common, but they are informal). If we all say that the person who creates a document is a “creator”, then my in-house search engine can look at your documents and know who wrote them because we have both used the same term - and your in-house search engine can look at my documents and make sense of them in the same way.
Where we can get really clever is when I use a set of terms that you may not be familiar with, but I also provide you with a link to something called a schema that defines those terms. Your software can then run off and look at the schema and come back and tell you what my metadata means. This is where we start to touch on what is known as the Semantic Web.
Dublin Core: A Formal Metadata Scheme
Let’s look now at a metadata scheme called Dublin Core[2]. Some people might even call Dublin Core the formal metadata scheme, as it is registered with the International Organization for Standardization as ISO 15836. There are two parts to Dublin Core: the 15 Elements, which constitute ISO 15836, and the Terms, which give us much more scope for what we can describe. It should be noted that Dublin Core metadata isn’t just about describing Web content - it can describe physical objects, events, services and more.
Rather than go into the boring theory of Dublin Core metadata, let’s get our hands dirty and look at a practical example; this is a selection of the Dublin Core metadata used to describe the document you are currently reading:
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.language" scheme="DCTERMS.RFC1766" content="en" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" scheme="DCTERMS.IMT" content="text/html; charset=UTF-8" />
<meta name="DC.title" lang="en" content="Smiffy’s Place: Metadata, Meta Tags, Meta What?" />
<meta name="DC.creator" content="Matthew Smith (Smiffy)" />
<meta name="DC.identifier" content="http://www.smiffysplace.com/metadata-meta-tags-meta-what" />
<meta name="DCTERMS.license" content="http://creativecommons.org/licenses/by-nc-sa/3.0/" />
<meta name="DC.rights" content="(C) Copyright 2001-2007 Matthew Steven Smith" />
<meta name="DC.description" content="Article for the GAWDS newsletter clarifying the differences between metadata and meta tags and what the two are actually for." />
<meta name="DC.subject" content="Accessibility;Dublin Core;HTML;Technical;XHTML;adaptability;metadata;namespace;scheme" />
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2007-08-14" />
Rather than describing every element, I will point out a few items of interest - you can read up on the elements and terms at the Dublin Core site.
Firstly, those first two links - what are they there for? These links point to the schemas for the Dublin Core Elements (DC) and Terms (DCTERMS). If you don’t understand my metadata, you can follow these links to the schemas so that you can find out how they work - or at least your software can.
The first meta element, DC.language, has more than the usual name and content attributes - it also has an attribute called “scheme”. This tells us that the value of content comes from a formal, controlled vocabulary: the only values that may appear there are ones picked from that vocabulary, so we are all speaking the same language - no making up your own.
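As a rough illustration of what a controlled vocabulary buys us, the sketch below checks a DC.type value against the twelve terms of the DCMI Type Vocabulary. The function name is_valid_dc_type is hypothetical, and I am assuming a simple case-sensitive match; treat it as a sketch rather than how any particular tool does the job.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = frozenset({
    "Collection",
    "Dataset",
    "Event",
    "Image",
    "InteractiveResource",
    "MovingImage",
    "PhysicalObject",
    "Service",
    "Software",
    "Sound",
    "StillImage",
    "Text",
})


def is_valid_dc_type(content):
    """Return True if content is a term from the DCMI Type Vocabulary."""
    return content in DCMI_TYPES


print(is_valid_dc_type("Text"))     # a vocabulary term
print(is_valid_dc_type("Article"))  # a made-up term, not in the vocabulary
```

Because the list of permitted values is fixed and published, anyone's software can make the same check and reach the same answer.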
The element DC.title has yet another new attribute: lang="en". This means that the title that I am presenting is in English. I could have several different values of DC.title, each with a different language attribute, allowing me to present a multi-lingual version of the metadata.
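Here is a small sketch of how software might choose between several DC.title values according to a reader's language. The function pick_title, its fallback behaviour and the French title are all assumptions of mine for the sake of illustration; the document itself only carries the English title.

```python
def pick_title(titles, preferred_lang, fallback_lang="en"):
    """Choose a title from (lang, content) pairs.

    Returns the title in preferred_lang if there is one, otherwise the
    title in fallback_lang, otherwise None.
    """
    by_lang = {lang: content for lang, content in titles}
    if preferred_lang in by_lang:
        return by_lang[preferred_lang]
    return by_lang.get(fallback_lang)


# The French entry is an invented translation, used only as sample data.
TITLES = [
    ("en", "Metadata, Meta Tags, Meta What?"),
    ("fr", "Métadonnées, balises meta, meta quoi ?"),
]

print(pick_title(TITLES, "fr"))
print(pick_title(TITLES, "de"))  # no German title, so the English one is used
```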
Not forgetting that we started off talking about description and keywords, Dublin Core covers these. Description is DC.description and keywords becomes DC.subject. If you already have description and keywords meta elements in your documents, you can easily convert them to the equivalent Dublin Core terms by a) renaming them and b) changing the commas in keywords to semicolons in DC.subject. Easy!
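The two-step conversion can be sketched in a few lines. The function to_dublin_core is a hypothetical helper of my own that works on a plain dictionary of name/content pairs; I have also assumed that stray spaces and empty entries in the keywords list should be dropped along the way.

```python
def to_dublin_core(meta):
    """Rename description/keywords to their Dublin Core equivalents.

    meta is a dict mapping meta element names to their content values.
    Any other names are passed through unchanged.
    """
    converted = {}
    for name, content in meta.items():
        if name == "description":
            # a) rename
            converted["DC.description"] = content
        elif name == "keywords":
            # a) rename and b) swap the commas for semicolons
            terms = [term.strip() for term in content.split(",") if term.strip()]
            converted["DC.subject"] = ";".join(terms)
        else:
            converted[name] = content
    return converted


old_style = {
    "description": "A page I wrote about some stuff.",
    "keywords": "Smiffy, stuff, things",
}
print(to_dublin_core(old_style))
```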
I would encourage readers to go through the code example above in conjunction with the documentation on the Dublin Core site.
Continue to read post by Matthew Smith for the GAWDS (Guild of Accessible Web Designers)