How to use title, subject and keywords

Many document formats allow you to add metadata such as title, subject and keywords to a document. Widespread formats that do so include HTML, MS Word and JPEG. Although this is an effective way to organize the content on your computer, it’s rarely used.
One reason is that even these three attributes have different meanings in HTML, Word and JPEG. Try to find examples on Google of how to fill them in. Let’s try to work out one approach.

The well-known Dublin Core (DC) standard is a good starting point because many metadata schemas build on it. Among others, it defines these attributes:

  • title: The name given to the resource. Typically, a Title will be a name by which the resource is formally known, e.g. “My dog has fun”
  • subject: The topic of the content of the resource. Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. Separate the keywords with semicolons, e.g. “animal; dog; water”
  • description: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content
  • source: A Reference to a resource from which the present resource is derived, e.g. “”
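As a minimal sketch, the attributes above can be held in a small record, with the subject split on semicolons as suggested. The names DublinCoreRecord, parse_subject and format_subject are hypothetical and only serve this illustration; they are not part of Dublin Core or of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DublinCoreRecord:
    """Hypothetical container for the four attributes listed above."""
    title: str = ""
    subject: List[str] = field(default_factory=list)
    description: str = ""
    source: str = ""


def parse_subject(raw: str) -> List[str]:
    """Split a semicolon-separated subject string into clean keywords."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def format_subject(keywords: List[str]) -> str:
    """Join keywords back into the "a; b; c" form used in this post."""
    return "; ".join(keywords)


record = DublinCoreRecord(
    title="My dog has fun",
    subject=parse_subject("animal; dog; water"),
)
```

Keeping the keywords as a list internally and joining them only when writing makes it easy to target formats that store subject as a single string as well as formats that store a list.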

How to use it in

  • HTML stores metadata in the header section. Initially it supported only title, keywords and description, but it also allows you to embed DC attributes, e.g. <meta name="DC.title" content="SELFHTML: Meta-Angaben">. Always use the DC attributes.
  • JPEG allows you to store metadata in an EXIF or XMP container inside the JPEG file. XMP is the future-proof standard and the default for most professional applications (e.g. Adobe Photoshop, Apple Aperture). XMP natively supports the Dublin Core attributes.
  • MS Word is, as always, different. By default it provides the metadata fields title, subject, keywords and description, and its old formats don’t support DC. The new document format OpenXML stores DC metadata but also keeps the keywords attribute. Best practice is to ignore the keywords attribute and use subject for keywords instead.
  • OpenOffice also mismatches subject and keywords, like its precursor, even in the new OpenDocument Format.
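For the HTML case, a small sketch of how the DC meta tags could be generated. The helper name dc_meta_tags is an assumption made for this example; only html.escape comes from the Python standard library.

```python
from html import escape


def dc_meta_tags(metadata: dict) -> str:
    """Render one <meta name="DC.xxx"> tag per entry (hypothetical helper)."""
    lines = []
    for name, value in metadata.items():
        # Escape quotes and angle brackets so the attribute values stay valid HTML.
        lines.append('<meta name="DC.{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)))
    return "\n".join(lines)


tags = dc_meta_tags({
    "title": "My dog has fun",
    "subject": "animal; dog; water",
})
print(tags)
```

The resulting lines belong in the head section of the page, next to the regular title element.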

History of documents

Following up on my recent post, I think future software applications should embed the context from which they received a document before saving it to the local hard disk.

For example, when downloading images you should always keep track of where you got them, in case you need more images from the same source or for legal reasons. I’m a big fan of icons, for instance, and collect them for private use when they are something special. The simplest way is to collect them in one folder per website. It would be better to store the information in the metadata of the file itself.

When saving attachments from an email to the local hard disk, the email client should also embed the sender’s email address in the local file. A lot of document types allow storing this kind of information. I think relying on an application to keep track of which document originates from which email is the wrong approach.

The reasons why:

  • images downloaded from e.g. stock photo sites often don’t contain the original website. If you want to find more photos of the same kind later, it’s easier with embedded data
  • when you send the document to a friend, the document from your hard disk would still contain whom you got the file from (email address) and where it was downloaded
  • it integrates with modern concepts of organizing the content on your hard disk with desktop search engines like Google Desktop, MS Desktop Search, Spotlight or Beagle. All of them extract metadata from files to put them in some context so you can find them easily.
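To make the idea concrete, here is a rough sketch of an XMP packet that records where a file came from in the Dublin Core source attribute, together with a few keywords. The function name build_xmp and the example URL are assumptions for illustration. The sketch only builds the XML; writing the packet into a JPEG would need an XMP-aware tool or library, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def build_xmp(source, keywords):
    """Return an XMP packet (as a string) holding dc:source and dc:subject."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    # Where the file came from: a website URL or a sender address.
    ET.SubElement(desc, q("dc", "source")).text = source
    # XMP stores dc:subject as an unordered bag of keywords.
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for keyword in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = keyword
    return ET.tostring(xmpmeta, encoding="unicode")


# Hypothetical source URL, used only for this example.
packet = build_xmp("https://example.com/icons", ["icon", "dog"])
print(packet)
```

A download manager or email client could fill the source value automatically at save time, so the information travels with the file instead of living in a separate application database.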