Landing : Athabasca University

Digital Archive or Self-Delete

A definition of the word 'archive' does not include the element of time.  Yet time is critical to human communication and the digital environment.

I submit that as communication becomes more digital, the digital data will become more volatile, more fragile, and inherently harder to retrieve in the future.


While continuous backup provides a temporary solution, a digital archive solution that compares to the longevity of cave paintings in France has yet to materialize.

I would have thought this would give pause to those digital knowledge workers developing in closed and encrypted digital environments.  However, the digital flow-through seems to follow a more consumer-oriented model of 'use and throw away'.  Hence DVD region codes, software keys, and activations reflect a very short-term commercial mindset and make for a horribly unfriendly archive solution.

Therefore, open-source and copyright-free digital environments will have the greatest advantage in future retrieval and repurposing.

Think cave paintings - if they had been encrypted or copyrighted for destruction - would there be much of human history left?

Does anyone expect to pass on an office file 100 years from now?

Comments

  • Mark A. McCutcheon August 29, 2011 - 8:36pm

    I recently blogged about the prospect of a looming digital dark age. The embedded video addresses your concerns about non-futureproof media, not to mention the regimes of production that promote obsolescence. Region codes -- even discs themselves -- like so many other media represent little more than windows of short-term exploitation for copyright holders. (Lawrence Lessig's Free Culture points out that neglected depository requirements have left much of 20th-century history -- that of broadcast media -- either locked up or lost.)

    Unfortunately, the fact that the short-term priorities of media conglomerates trump the public interest of the public archive can make history itself profoundly malleable. We have always been at war with Oceania.

  • Steve Swettenham August 30, 2011 - 12:15pm

    My first thesis was produced on an A B Dick machine with a 5.25" floppy disk back in 1985.  By 1990 I was fully entrenched in PC DOS systems but had no way of reading the A B Dick format (even though I had 5.25" floppy drives).  I sent my precious disks to a company in Toronto that guaranteed conversion or no charge.  I got my disks back at no charge with no conversion.

    The lesson I learned early on is to keep your precious data in formats that will be easily retrievable in the future (i.e., formats that are cross-platform / platform-independent).  Open source applications are a step towards format transparency.  I would hazard to speculate that so long as binary computers are used, TIFF, WAV, AIFF, FLAC, ODT, etc. will still be readable by future systems.

    Hence, "future-proofing" digital tech is a challenge even within one lifetime.

    Reference:
    http://en.wikipedia.org/wiki/A.B._Dick_Company

  • Mark A. McCutcheon August 30, 2011 - 12:42pm

    What formats would you identify as robust (if not futureproof) for text documents, and for moving-image/film/video documents? (Also, what is ODT?)

  • Steve Swettenham August 31, 2011 - 11:09am

    Good question.

    Here are some links that attempt to solve that:

    ODT = OpenDocument Text format (the text document format used by LibreOffice).

    FODT = Flat XML OpenDocument Text format (the same document stored as a single XML file).

    Back in 2004, "Decay and obsolescence could make many files unretrievable in future" by Katie Hafner discussed the Web as one format having uniformity. However, I disagree. As the business of the web took over, HTML became less pure. Now you have to be running at least four different web browsers to check website output.

    The formats that I respect for long term preservation are:

    Audio -

    • WAV, AIFF, FLAC

    • MP3 (ONLY in cases where I am not concerned with quality)

    Office -

    • ASCII TEXT, RTF

    • Open Office formats which are XML

    • PDF (primarily for web distribution)

    Email -

    • EML

    Compression -

    • ZIP

    • 7ZIP (is good but not widely used)

    Graphic -

    • SVG (Vector)

    • TIFF (Raster)

    • RAW (Only when bundled with RAW decoder for all platforms)

    • DJVU (for web distribution)

    GIS/GPS ?

    Video -

    • FFMPEG (FFV1)

    • MKV (x264)

    WWW ?
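
As a rough illustration of how the list above could be put to work, here is a minimal Python sketch that sorts file names into the categories named in the list and flags everything else for review. The extension sets and the names `PREFERRED` and `classify` are my own assumptions for illustration, not part of the original list; RAW is left out because it has no single file extension, and FFV1 is a codec rather than an extension, so only the MKV container is matched.

```python
from pathlib import PurePath

# Hypothetical mapping of the formats listed above to file extensions.
# The extensions are assumptions chosen for illustration.
PREFERRED = {
    "audio": {".wav", ".aiff", ".aif", ".flac", ".mp3"},  # MP3 only if quality is no concern
    "office": {".txt", ".rtf", ".odt", ".ods", ".odp", ".fodt", ".pdf"},
    "email": {".eml"},
    "compression": {".zip", ".7z"},
    "graphic": {".svg", ".tif", ".tiff", ".djvu"},
    "video": {".mkv"},
}


def classify(filename: str) -> str:
    """Return the list category for a file name, or 'review' if it is not on the list."""
    ext = PurePath(filename).suffix.lower()
    for category, extensions in PREFERRED.items():
        if ext in extensions:
            return category
    return "review"


if __name__ == "__main__":
    for name in ["thesis.doc", "interview.flac", "scan.TIFF", "inbox.eml"]:
        print(f"{name:16s} -> {classify(name)}")
```

Anything that comes back as "review" is simply not on the list; it is a prompt to look at the file, not a verdict on the format.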



  • Mark A. McCutcheon August 31, 2011 - 3:29pm

    Thanks for this comprehensive answer! Looks like I've got a lot of .doc-to-.rtf conversion work ahead...

  • Steve Swettenham September 2, 2011 - 1:36pm

    In my experience, RTF is my last choice, used only in rare cases where it is derived from output such as email archives.  Doc is easier to convert to an Open Office format that retains graphics in XML.  For PPT I use "Presentation" in LibreOffice.  The conversions between Open Office and MS Office with multimedia, tables, etc. are close but not perfect, as I'm sure you have already experienced.

    Reference:

    http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument

  • Steve Swettenham September 2, 2011 - 1:46pm

    More links on Office conversions:

    Not as up to date as I would like, but they do contain calibration test files useful for comparing conversion compatibility between office software.

     

  • Mark A. McCutcheon September 2, 2011 - 2:26pm

    Okay so you'd recommend .txt or .xml over .rtf -- good to know.

    You weren't kidding about 7zip. That format has amazing compression capacity, which I discovered while researching the Wikileaks cable archives. (300 MB 7zip = 16 GB!)
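
For anyone curious how such a ratio can come about, the following Python sketch compares DEFLATE (the method most ZIP files use) with LZMA (the default method in 7-Zip) on highly repetitive text. It is only an approximation under stated assumptions: Python's standard `lzma` module writes the .xz container rather than .7z, the sample data is made up, and `compare` is a hypothetical helper name. Real archives will give different numbers.

```python
import lzma
import zlib


def compare(data: bytes) -> dict:
    """Compress data with DEFLATE and LZMA and return the sizes in bytes."""
    deflated = zlib.compress(data, 9)      # DEFLATE, as used by most ZIP files
    xz = lzma.compress(data, preset=9)     # LZMA2 in an .xz container (not .7z)
    # Both methods are lossless: the original bytes come back unchanged.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data
    return {"raw": len(data), "deflate": len(deflated), "lzma": len(xz)}


# Made-up, highly repetitive sample text standing in for a large text archive.
sample = b"".join(
    b"cable %d: routine status report, nothing to add\n" % i for i in range(20000)
)

sizes = compare(sample)
for name, size in sizes.items():
    print(f"{name:8s} {size:10d} bytes  ratio {sizes['raw'] / size:.1f}x")
```

Repetitive plain text is close to a best case for this kind of compression; already-compressed media such as FLAC or x264 video will barely shrink at all.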

  • Steve Swettenham September 2, 2011 - 3:30pm

    TXT is as low as you can go in hopes of retrieval of text (albeit complicated by character encoding).  XML is human/machine readable with the chance to be transformed to other formats (assuming known schema).  RTF is just legacy retrieval but not much toward "futureproof" since RTF is a proprietary format (as far as I know).
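
Both points in the paragraph above can be shown with a small Python sketch. The first part decodes the same bytes under two encodings to show how plain text can be misread when the encoding is not recorded. The second part reads a tiny XML document, which declares its own encoding, and flattens it to plain text. The `note`, `title`, and `body` element names are a made-up schema for illustration only.

```python
import xml.etree.ElementTree as ET

# 1. Plain text: the bytes alone do not say which encoding was used.
raw = "café".encode("utf-8")
as_utf8 = raw.decode("utf-8")      # the intended text
as_latin1 = raw.decode("latin-1")  # decodes without error, but the text is wrong
print(as_utf8, "vs", as_latin1)

# 2. XML: the declaration records the encoding, and a known schema
#    lets the content be transformed into another format.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<note><title>Thesis</title><body>café</body></note>"
).encode("utf-8")

root = ET.fromstring(doc)
plain = "\n".join(child.text for child in root)  # flatten to plain text
print(plain)
```

The wrong decoding raises no error, which is why a text file with an unrecorded encoding can be quietly damaged during a migration.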

    7zip (http://en.wikipedia.org/wiki/7-Zip) is an odd project that has a group following to counter other archivers.

    I remember the PKWARE company well and was surprised when they open sourced ZIP (happily).

    For PC users there is also http://portableapps.com/apps/utilities/7-zip_portable

    7zip was not included in the Windows and Mac operating systems.  Therefore ZIP got the traction, while 7-zip became a geek app.

    I lost the link on a technical discussion of digital leakage from compression/decompression that may become relevant to archiving.

    C'est la vie!