A definition of the word 'archive' does not include the element of time. Yet time is critical to human communication and the digital environment.
I submit that as communication becomes more digital, the data will become more volatile, more fragile, and inherently harder to retrieve in the future.
While continuous backup provides a temporary solution, a digital archive solution that compares to the longevity of the cave paintings in France has yet to materialize.
I would have thought this would give pause to those digital knowledge workers developing in closed and encrypted digital environments. However, the digital flow-through seems to follow a more consumer-oriented model of 'use and throw away'. Hence DVD region codes, software keys and activations reflect a very short-term commercial mindset, and make for a horribly unfriendly archive solution.
Therefore, open-source and copyright-free digital environments will have the greatest advantage in future retrieval and repurposing.
Think cave paintings - if they had been encrypted or copyrighted for destruction - would there be much of human history left?
Does anyone expect to pass on an office file 100 years from now?
The Landing is a social site for Athabasca University staff, students and invited guests. It is a space where they can share, communicate and connect with anyone or everyone.
Posts made here are the responsibility of their owners and may not reflect the views of Athabasca University.
Comments
I recently blogged about the prospect of a looming digital dark age. The embedded video addresses your concerns about non-futureproof media, not to mention the regimes of production that promote obsolescence. Region codes -- even discs themselves -- like so many other media represent little more than windows of short-term exploitation for copyright holders. (Lawrence Lessig's Free Culture points out that neglected depository requirements have left much of 20th-century history -- that of broadcast media -- either locked up or lost.)
Unfortunately, the fact that the short-term priorities of media conglomerates trump the public interest of the public archive can make history itself profoundly malleable. We have always been at war with Oceania.
My first thesis was produced on an A B Dick machine with a 5.25" floppy disk back in 1985. By 1990 I was fully entrenched in PC DOS systems but had no way of reading the A B Dick format (even though I had 5.25" floppy drives). I sent my precious disks to a company in Toronto that guaranteed conversion or no charge. I got my disks back at no charge with no conversion.
The lesson I learned early on is to keep your precious data in formats that will be easily retrievable in the future (i.e., formats that are cross-platform or platform-independent). Open-source applications are a step towards format transparency. I would hazard a guess that so long as binary computers are used, TIFF, WAV, AIFF, FLAC, ODT, etc. will still be readable by future systems.
Hence, "future-proofing" digital tech is a challenge even within one lifetime.
Reference:
http://en.wikipedia.org/wiki/A.B._Dick_Company
What formats would you identify as robust (if not futureproof) for text documents, and for moving-image/film/video documents? (Also, what is ODT?)
Good question.
Here are some links that attempt to solve that:
http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml
http://www.jiscdigitalmedia.ac.uk/
http://futureproof.records.nsw.gov.au/digital-archives/
http://www.nla.gov.au/padi/
http://agogified.com/tools-and-services
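ODT = OpenDocument Text format (the native text format of LibreOffice and OpenOffice).
FODT = Flat XML OpenDocument Text format (the same document stored as a single uncompressed XML file).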
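As a rough illustration of why ODT counts as a transparent format: an .odt file is an ordinary ZIP container, and the body text sits in an XML member named content.xml. The Python sketch below uses only the standard library. It builds a tiny stand-in container in memory (not a complete, valid ODT, which would also carry a manifest and styles) and pulls the paragraph text back out. The function names and the sample text are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the OpenDocument standard
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def extract_paragraphs(odt_bytes):
    """Return the text of every <text:p> element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_sample_container():
    """Build a minimal stand-in for an .odt file (illustration only)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><office:body><office:text>'
        "<text:p>My first thesis</text:p>"
        "<text:p>Chapter one</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", content)
    return buffer.getvalue()


sample = make_sample_container()
paragraphs = extract_paragraphs(sample)
print(paragraphs)
```

The point is that nothing beyond a ZIP reader and an XML parser is needed to get at the text, which is about as good as it gets for a word processor format.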
Back in 2004, Katie Hafner's article "Decay and obsolescence could make many files unretrievable in future" discussed the Web as one format having uniformity. However, I disagree. As the business of the web took over, HTML became less pure. Now you have to be running at least four different web browsers to check website output.
The formats that I respect for long term preservation are:
Audio -
WAV, AIFF, FLAC
MP3 (ONLY in cases where I am not concerned with quality)
Office -
ASCII TEXT, RTF
Open Office formats which are XML
PDF (primarily for web distribution)
EMAIL -
EML
COMPRESSION
ZIP
7ZIP (is good but not widely used)
Graphic -
SVG (Vector)
TIFF (Raster)
RAW (Only when bundled with RAW decoder for all platforms)
DJVU (for web distribution)
GIS/GPS ?
Video -
FFV1 (lossless codec, encoded with FFMPEG)
MKV (H.264 video encoded with x264)
WWW ?
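One reason several of the formats listed above are good archive candidates is that they announce themselves in their first few bytes, so a future reader can identify a file even if the extension is lost. A minimal Python sketch, assuming only the well-known signatures for these formats (the function names are hypothetical, and the check is a guess from the header, not a validation of the whole file):

```python
def identify_format(header):
    """Guess a file format from its first 12 bytes; return None if unknown."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"FORM" and header[8:12] == b"AIFF":
        return "AIFF"
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:5] == b"%PDF-":
        return "PDF"
    if header[:6] == b"7z\xbc\xaf\x27\x1c":
        return "7z"
    if header[:4] == b"PK\x03\x04":
        # ODT and other ZIP-based formats also start this way
        return "ZIP"
    return None


def identify_file(path):
    """Read the first 12 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(12))


# Made-up headers for demonstration
print(identify_format(b"fLaC\x00\x00\x00\x22" + b"\x00" * 4))
print(identify_format(b"RIFF\x24\x08\x00\x00WAVE"))
print(identify_format(b"hello, world"))
```

Plain text is the one format in the list with no signature at all, which is part of why character encoding becomes a guessing game.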
Thanks for this comprehensive answer! Looks like I've got a lot of .doc-to-.rtf conversion work ahead...
In my experience RTF is my last choice, used only in rare cases where it is the output I am given, such as email archives. Doc is easier to convert to an Open Office format, which retains graphics in XML. For PPT I use "Presentation" in LibreOffice. The conversions between Open Office and MS Office with multimedia, tables, etc. are close but not perfect, as I'm sure you have already experienced.
Reference:
http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument
More links on Office conversions:
Not as up to date as I would like, but they do contain calibration test files useful for comparing conversion compatibility between office software.
Okay so you'd recommend .txt or .xml over .rtf -- good to know.
You weren't kidding about 7zip. That format has amazing compression capacity, which I discovered while researching the Wikileaks cable archives. (300 MB 7zip = 16 GB!)
TXT is as low as you can go in hopes of retrieval of text (albeit complicated by character encoding). XML is human/machine readable with the chance to be transformed to other formats (assuming a known schema). RTF is fine for legacy retrieval but does not do much toward "futureproof", since RTF is a proprietary format (as far as I know).
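To make the character-encoding complication concrete, here is a small Python sketch that tries a list of candidate encodings in order. The function name and the candidate list are my own assumptions, and a decode that succeeds is not proof the guess was right: latin-1 in particular accepts any byte sequence, which is why it goes last.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# The same kind of text saved by two different systems
utf8_bytes = "C'est la vie, café".encode("utf-8")
legacy_bytes = "café".encode("cp1252")

print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(legacy_bytes))
```

Recording the encoding alongside the file (even in the filename or a README) saves a future reader from this guessing.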
7zip (http://en.wikipedia.org/wiki/7-Zip) is an odd project that has a group following to counter other archivers.
I remember the PKWARE company well and was surprised when they open sourced ZIP (happily).
For PC users there is also http://portableapps.com/apps/utilities/7-zip_portable
7-Zip was not included with Windows or Mac OS. Therefore ZIP got the traction, while 7-Zip became a geek app.
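For a rough feel of the difference between the two, the Python sketch below compares DEFLATE (the method ZIP files normally use) with LZMA (the method 7z uses by default) on a made-up, highly repetitive sample. It is only an approximation: Python's zlib and lzma modules write their own wrappers rather than real .zip or .7z containers, and the sample data is invented, so the numbers say nothing about any particular archive.

```python
import lzma
import zlib


def compressed_sizes(data):
    """Return sizes in bytes for the raw, DEFLATE (zlib) and LZMA (xz) forms."""
    return {
        "raw": len(data),
        "deflate": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


# Invented, highly repetitive sample text
sample = b"UNCLASSIFIED CABLE 0001 FROM EMBASSY TO STATE DEPARTMENT\n" * 20000
sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:8s} {size:>10d} bytes")

# Both methods are lossless: decompressing restores the original bytes exactly
assert zlib.decompress(zlib.compress(sample, 9)) == sample
assert lzma.decompress(lzma.compress(sample)) == sample
```

Real-world ratios depend heavily on how repetitive the input is, so text-heavy archives tend to shrink far more than already-compressed media.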
I lost the link on a technical discussion of digital leakage from compression/decompression that may become relevant to archiving.
C'est la vie!