Keeping your (digital) archive

Published: 2009-04-14
Last Updated: 2009-04-16 13:05:01 UTC
by Swa Frantzen (Version: 2)

Steve posted a link to a story about NASA's effort to read a bunch of tapes from the 60s:

http://www.latimes.com/news/nationworld/nation/la-na-lunar22-2009mar22,0,1783495,full.story

The story is about the effort and cost needed to read tapes from unmanned space missions that, among other things, mapped the surface of the moon. The big issue is the difficulty of finding, or rather restoring, a tape drive for the tapes they used back then.

Are we today doing better with our (digital) archive than NASA did 40 years ago?

  • CDs/DVDs: sure, the drives are common, but will they remain so? How about CD rot? How long will the disks actually remain readable, especially the writable ones?
  • Blu-ray: will it become a big success or not, and will it still be around in X years?
  • Tapes: a breed already on a path to extinction!? How about backward compatibility? How long do tapes actually remain readable?
  • Hard disks are cheap! Same questions here: how long can they be kept in working order? What if the interface changes? Will a drive from a decade-old PC actually work in a modern PC? How about one that is 20 years old?
  • File formats: EBCDIC and ASCII will last a long while, but if you need to express more than English they become rather limited. What about "office" documents: what if Microsoft no longer makes Office, or no longer exists, in X years? What if PDF isn't the preferred interchange format anymore at some point in time?
    After all, who can edit a 20-year-old WordPerfect document today?
  • Online services (e.g. storing it in a Gmail mailbox) are possibly even more tricky (what if Google ...)
  • Niel pointed out some more problems:
    • (proprietary) floppies and floppy drives, computers using cassettes to store data, Zip drives.
    • Archive software sometimes isn't all that common (he pointed to old Apple archives)
  • ...

If you take small steps, it can all grow with you: you buy a new technology to store the archive while the old media or format is still readable.

But big steps are far more trouble than small ones.

So how can you make it work for your archives:

  • At the computer science department where I worked many years ago, every year we read 1/3 of the archive (it was on tape) and converted it to new media. Being a Unix shop, there was little need to convert formats, but it was taken into account for e.g. FrameMaker documents. The media in the archive were kept in two copies, both at most 3 years old.
    I saw stacks of 1/2 inch tape reduced to a few Exabyte tapes. And on the next run the Exabyte tapes were of a higher density ...
    3 years might seem short: media do last longer, but there is a safety factor to consider. 3 years was also short enough that the hardware used to create the media still had an active support contract by the last time we needed to read them.
  • Use multiple formats for every item: don't just bet on .doc (or .docx), OpenOffice, .pdf or even a collection of TIFF files, bet on all of them at the same time. Keep them all. Even that image could, worst case, be fed into OCR to get to a new format if it really has to be done.
    Make sure to also archive a computer system that can edit the data. This is sometimes critical, especially with proprietary data formats.
    Make sure to update the formats to the latest versions in addition to keeping the original, so that you have a better chance of backward compatibility with future products.
  • Archive complete machines when needed. Now archiving hardware is tricky, it's the part that breaks with old age ... But you can virtualize the machine. E.g. keep an old accounting system, for which you long since lost vendor support, in a VMware image that you never run directly: only clone it, let the clone be consulted as and when needed, and wipe the clone after every use. That way you can continue to run it for a long while, even if the hardware is long obsolete and the OS is long since unsupported and not safe to run on your normal network anymore, while still keeping access to the proprietary data.
  • Have a policy in place to not let hardware support expire before the whole archive has been migrated away from that technology.
  • Roseman wrote in to remind us to keep the archive in multiple locations, not just one.
  • Niel wrote in to share what has worked when doing conversions for his friends:
    • Keep old networks around (10base2, old hubs, and their cabling).
    • Keep old computers around, like a Mac with System 8 on it.
    • Keep serial ports (while the technology still exists) so you can transfer data from one machine to the next.
    If you have a friend like Niel and you failed to update your archive before you lost the ability to read it, (s)he will become your best friend, for a while at least. But arranging things so you never need this service is much safer because, as Niel points out, how long can he keep a floppy drive working? They do corrode eventually.
  • ...
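
The "read 1/3 of the archive every year" routine from the first item can be sketched in a few lines of Python. This is only an illustration of the idea, not what we actually ran back then: the function names, the tape labels and the choice to assign batches by list position are all assumptions made for the example.

```python
from typing import Dict, List


def refresh_plan(items: List[str], year: int, cycle_years: int = 3) -> List[str]:
    """Items to copy to fresh media in `year`: one batch out of `cycle_years`.

    Batches are assigned by list position (an assumption for this sketch),
    so every item comes up exactly once per cycle.
    """
    batch = year % cycle_years
    return [item for i, item in enumerate(items) if i % cycle_years == batch]


def overdue(last_written: Dict[str, int], year: int, max_age_years: int = 3) -> List[str]:
    """Items whose newest copy is older than the policy allows (a missed refresh)."""
    return sorted(item for item, written in last_written.items()
                  if year - written > max_age_years)


if __name__ == "__main__":
    archive = [f"tape-{n:02d}" for n in range(7)]  # made-up labels
    for year in (2009, 2010, 2011):
        print(year, refresh_plan(archive, year))
```

Assigning batches by position is the simplest thing that works for a fixed list. If items get added in the middle of the list the batches shift, so a real inventory would more likely record the year each copy was written and rely on a check like `overdue` instead.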

Other success stories? Send them in via the contact page!

Before you ask what this has to do with security: It's all about availability!

--
Swa Frantzen -- Section 66


Comments

These comments are not for this story. Just wondering if anyone is seeing the following:

1. There is a steady rise in IRC.Bot and/or Backdoor.irc infections (possibly through game sites, USB and media sites)
2. Strange emails (with a link to a shopping site and/or an auto vacation reply) being sent from Hotmail and Yahoo user accounts to all the users in their address book without the knowledge of the account owners. Possibly related to the cross-site scripting mentioned in a previous diary entry.

Any new insight is helpful.

Regards
I just recently received an email from someone I know over Yahoo with a link to some gadget shopping site. I haven't clicked the link and, given your comment, I probably won't. I haven't asked whether the sender meant to send the email.
