My digital archive

Recently I realized that the oldest documents still in my archive date from 1994. Since then they have survived several migrations of computer hardware, operating systems and applications. After such a long time, that is a good basis for reviewing which archiving practices worked out and which didn't.

First, my archive has always been stored on my computer's hard disk and has followed every migration step over the years. I refuse to use backup tools that move the files to external media or compress them into zip or proprietary file formats. The archive is organized by year, which means the root of the archive contains only one folder for every year. This makes it easy to navigate through history, and I don't need a special application to manage the content. The backup of my archive is just a mirror on an external hard drive or, more recently, a NAS. In the past 14 years I have needed that backup only 3 times, to recover damaged or deleted files.
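The year-folder layout and the plain mirror described above can be sketched in a few lines of Python. This is a hypothetical sketch, not a replacement for a real mirroring tool: the function name `mirror_archive` is made up, "changed" is assumed to mean a different size or a newer modification time, and files deleted from the source are deliberately kept in the mirror so they can still be recovered from the backup.

```python
import shutil
from pathlib import Path

# Assumption: some file systems (e.g. FAT on external drives) store
# modification times with only 2-second resolution, so allow that much slack.
MTIME_TOLERANCE = 2.0


def mirror_archive(source: Path, target: Path) -> list[Path]:
    """Copy new or changed files from source to target, keeping the year folders.

    Files that exist only in target are never deleted, so a file removed
    by accident in source can still be recovered from the mirror.
    Returns the copied paths relative to the archive root.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # folders are recreated on demand below
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if dst_file.exists():
            s, d = src_file.stat(), dst_file.stat()
            if s.st_size == d.st_size and s.st_mtime - d.st_mtime <= MTIME_TOLERANCE:
                continue  # unchanged since the last run
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

Because the result is just ordinary files in ordinary year folders, the mirror can be browsed with any file manager on any operating system, with no restore step.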

The main lesson I learned is to save content in files with mainstream document formats such as MS Word and Excel, HTML, PDF, AVI, MP3, MPEG, JPEG, TIFF, SVG and plain ASCII. Today I would recommend open document standards, but they were not available 10 years ago. I know there were standards like LaTeX, but they were mostly for techies and are still not in widespread use today.

Furthermore, I didn't lose files because of hard disk problems, where files die a slow death (just one old AVI video). I suffered most from using applications that store their data in proprietary document formats.

One negative example is CorelDraw, once my favorite vector graphics program. The version I had licensed could not run on Windows XP, which turned my Corel files into meaningless binary files. I spent some time converting the most important ones to mainstream formats, but in the end I lost a lot of my drawings. You can extend this hit list with everyday applications for processing and managing images, word-processing documents, emails, notes, calendars, passwords and contact lists, all of which can give you a headache with the next migration or patch of your operating system.

In the end it was always the same story: I lost data only because the viewer was no longer available on the operating system I was actually using. I have used many operating systems over time (MS-DOS 6.x, Windows 3.11, Windows 2000, OS/2, Linux, Windows XP and Mac OS X) and many applications, such as MS Office going back to MS Word 2.0 for MS-DOS 🙂, WordPerfect and OpenOffice.

What are the consequences for me now?

  1. Use open standards to save your files, or at least mainstream formats that will still be readable in the next decade
  2. The application-centric approach of the last 20 years is a dead end for long-term archiving and an impressive confirmation of the old Unix rule that everything is a file
  3. Embed all the information needed for organizing inside the file itself (e.g. keywords in a JPEG file)
  4. Be careful which applications you use and check how they store their information
  5. Don't rely on the features of your operating system and its applications. I promise that in 5 years you will be using something different 🙂
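Points 1 and 4 can be checked roughly with a small script that walks the year folders and lists every file whose extension is not on an allowlist. This is a minimal sketch under two assumptions: the allowlist `KNOWN_FORMATS` is hypothetical (built from the formats named in this post plus the OpenDocument extensions), and the file extension is only a proxy for the real format. A CorelDraw `.cdr` file, for example, would show up in the result.

```python
from pathlib import Path

# Hypothetical allowlist based on the formats named in this post;
# adjust it to your own judgement of what will still open in ten years.
KNOWN_FORMATS = {
    ".doc", ".xls", ".odt", ".ods", ".html", ".htm", ".pdf", ".txt",
    ".avi", ".mp3", ".mpg", ".mpeg", ".jpg", ".jpeg", ".tif", ".tiff", ".svg",
}


def find_risky_files(archive: Path) -> list[Path]:
    """Return files whose extension is not in KNOWN_FORMATS.

    Paths are relative to the archive root, so the year folder is visible.
    Files without any extension are reported as well.
    """
    return sorted(
        p.relative_to(archive)
        for p in archive.rglob("*")
        if p.is_file() and p.suffix.lower() not in KNOWN_FORMATS
    )
```

Running it before each operating system migration gives a short list of files to convert while the old application still works.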