Landing : Athabasca University

Digital Archive or Self-Delete

A definition of the word 'archive' does not include the element of time.  Yet time is critical to human communication and the digital environment.

I submit that as communication becomes more digital, the digital data will become more volatile, more fragile, and inherently harder to retrieve in the future.


While continuous backup provides a temporary solution, a digital archive solution that compares to the longevity of cave paintings in France has yet to materialize.

I would have thought this would give pause to those digital knowledge workers developing in closed and encrypted digital environments.  However, the digital flow-through seems to follow a more consumer-oriented model of 'use and throw away'.  Hence DVD region codes, software keys and activations reflect a very short-term commercial mindset and make for a horribly unfriendly archive solution.

Therefore, open-source and copyright-free digital environments will have the greatest advantage in future retrieval and repurposing.

Think cave paintings - if they had been encrypted or copyrighted for destruction - would there be much of human history left?

Does anyone expect to pass on an office file 100 years from now?

Comments

  • I recently blogged about the prospect of a looming digital dark age. The embedded video addresses your concerns about non-futureproof media, not to mention the regimes of production that promote obsolescence. Region codes -- even discs themselves -- like so many other media represent little more than windows of short-term exploitation for copyright holders. (Lawrence Lessig's Free Culture points out that neglected depository requirements have left much of 20th-century history -- that of broadcast media -- either locked up or lost.)

    Unfortunately, the fact that the short-term priorities of media conglomerates trump the public interest of the public archive can make history itself profoundly malleable. We have always been at war with Oceania.

    Mark A. McCutcheon August 29, 2011 - 8:36pm

  • My first thesis was produced on an A B Dick machine with a 5.25" floppy disk back in 1985.  By 1990 I was fully entrenched in PC DOS systems but had no way of reading the A B Dick format (even though I had 5.25" floppy drives).  I sent my precious disks to a company in Toronto that guaranteed conversion or no charge.  I got my disks back at no charge with no conversion.

    The lesson I learned early on is to keep your precious data in formats that will be easily retrievable in the future (i.e., formats that are cross-platform or platform-independent).  Open source applications are a step towards format transparency.  I would hazard a guess that so long as binary computers are used, TIFF, WAV, AIFF, FLAC, ODT, etc. will still be readable by future systems.

    Hence, "future-proofing" digital tech is a challenge even within one lifetime.

    Reference:
    http://en.wikipedia.org/wiki/A.B._Dick_Company

    Steve Swettenham August 30, 2011 - 12:15pm

  • What formats would you identify as robust (if not futureproof) for text documents, and for moving-image/film/video documents? (Also, what is ODT?)

    Mark A. McCutcheon August 30, 2011 - 12:42pm

  • Good question.

    Here are some links that attempt to solve that:

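    ODT = OpenDocument Text, the word-processing format of the OpenDocument (ODF) standard, used by LibreOffice and OpenOffice.

    FODT = Flat XML ODF Text Document, the single-file, uncompressed XML variant of ODT.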

    Back in 2004, Katie Hafner's article "Decay and obsolescence could make many files unretrievable in future" discussed the Web as one format having uniformity. However, I disagree. As the business of the web took over, HTML became less pure. Now you have to run at least four different web browsers to check a website's output.

    The formats that I respect for long term preservation are:

    Audio -

    • WAV, AIFF, FLAC

    • MP3 (ONLY in cases where I am not concerned with quality)

    Office -

    • ASCII TEXT, RTF

    • Open Office formats which are XML

    • PDF (primarily for web distribution)

    Email -

    • EML

    Compression -

    • ZIP

    • 7ZIP (is good but not widely used)

    Graphic -

    • SVG (Vector)

    • TIFF (Raster)

    • RAW (Only when bundled with RAW decoder for all platforms)

    • DJVU (for web distribution)

    GIS/GPS ?

    Video -

    • FFMPEG (FFV1)

    • MKV (x264)

    WWW ?

    Steve Swettenham August 31, 2011 - 11:09am
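
One reason the "Open Office formats which are XML" entry in the list above is attractive for archiving: an ODT file is an ordinary ZIP package whose text lives in a file named content.xml, so it can be read with generic tools even if no office suite survives. The following is a minimal Python sketch of that idea, not a full ODF reader. The helper names (extract_paragraphs, make_sample_package) are hypothetical, and the sample package is a stripped-down stand-in that omits the styles, metadata, and manifest a real ODT carries.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# XML namespaces used by OpenDocument content.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(odt_bytes):
    """Return the text of every <text:p> paragraph found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def make_sample_package(paragraphs):
    """Build a tiny ODT-like ZIP in memory (illustration only, hypothetical helper)."""
    body = "".join("<text:p>%s</text:p>" % escape(p) for p in paragraphs)
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>%s</office:text></office:body>"
        "</office:document-content>" % (OFFICE_NS, TEXT_NS, body)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Real ODT files store the MIME type as the first entry.
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_package(["Think cave paintings.", "Office files & such."])
    for line in extract_paragraphs(sample):
        print(line)
```

To try it on a real document, read the file's bytes and pass them to extract_paragraphs; only the standard library is needed, which is the point of a transparent format.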

  • Thanks for this comprehensive answer! Looks like I've got a lot of .doc-to-.rtf conversion work ahead...

    Mark A. McCutcheon August 31, 2011 - 3:29pm

    In my experience, RTF is my last choice, arising only in rare cases from output such as email archives.  DOC is easier to convert to an Open Office format, which retains graphics in XML.  For PPT I use "Presentation" in LibreOffice.  The conversions between Open Office and MS Office with multimedia, tables, etc. are close but not perfect, as I'm sure you have already experienced.

    Reference:

    http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument

    Steve Swettenham September 2, 2011 - 1:36pm

  • More links on Office conversions:

    They are not as up to date as I would like, but they do contain calibration test files useful for comparing conversion compatibility between office software.

    Steve Swettenham September 2, 2011 - 1:46pm

  • Okay so you'd recommend .txt or .xml over .rtf -- good to know.

    You weren't kidding about 7zip. That format has amazing compression capacity, which I discovered while researching the Wikileaks cable archives. (300 MB 7zip = 16 GB!)

    Mark A. McCutcheon September 2, 2011 - 2:26pm
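
The figures quoted above work out to a ratio of roughly 50:1, which is plausible for highly repetitive text. The Python sketch below illustrates why, under stated assumptions: it uses the standard library's lzma module, which implements the LZMA algorithm that 7-Zip's 7z format uses by default but writes an .xz container rather than a .7z file, and the sample line is made up for the demonstration rather than taken from any real archive.

```python
import lzma
import os


def compression_ratio(data):
    """Compress with LZMA, verify the round trip, and return original/compressed size."""
    compressed = lzma.compress(data)
    # An archive is only useful if decompression restores every byte.
    if lzma.decompress(compressed) != data:
        raise ValueError("round trip failed")
    return len(data) / len(compressed)


# Made-up boilerplate line repeated many times: an extreme case of redundancy.
repetitive_text = b"SECTION 01 OF 02: ROUTINE MESSAGE TEXT FOLLOWS\n" * 50000

# Random bytes have no redundancy, so no general-purpose compressor can shrink them.
random_bytes = os.urandom(100000)

if __name__ == "__main__":
    print("repetitive text ratio: %.0f:1" % compression_ratio(repetitive_text))
    print("random bytes ratio:    %.2f:1" % compression_ratio(random_bytes))
```

Real document collections sit between these two extremes, so the ratio achieved depends far more on how repetitive the content is than on the archiver alone.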

    TXT is as low as you can go in hopes of retrieval of text (albeit complicated by character encoding).  XML is human/machine readable with the chance to be transformed to other formats (assuming a known schema).  RTF is fine for legacy retrieval but does not do much toward "futureproofing", since RTF is a proprietary format (as far as I know).

    7zip (http://en.wikipedia.org/wiki/7-Zip) is an odd project that has a group following to counter other archivers.

    I remember the PKWARE company well and was surprised when they opened up the ZIP format specification (happily).

    For PC users there is also http://portableapps.com/apps/utilities/7-zip_portable

    7zip was not included in the Windows and Mac operating systems.  Therefore ZIP got the traction, while 7-zip became a geek app.

    I lost the link on a technical discussion of digital leakage from compression/decompression that may become relevant to archiving.

    C'est la vie!

    Steve Swettenham September 2, 2011 - 3:30pm
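
The character-encoding complication mentioned above for TXT can be shown in a few lines. A plain text file does not record which encoding was used to write it, so a future reader has to guess, and a wrong guess either fails or silently produces garbled text. This is a minimal Python sketch; the function name decode_candidates and the choice of candidate encodings are illustrative assumptions, not a recommended detection method.

```python
def decode_candidates(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The same word ("caf" followed by e-acute) saved two different ways.
word = "caf\u00e9"
legacy_bytes = word.encode("cp1252")  # one byte for the accented letter
modern_bytes = word.encode("utf-8")   # two bytes for the accented letter

if __name__ == "__main__":
    for label, raw in (("legacy", legacy_bytes), ("modern", modern_bytes)):
        print(label, raw, decode_candidates(raw))
```

The legacy bytes cannot be decoded as UTF-8 at all, while the modern bytes decode without error under cp1252 or latin-1 but yield the wrong characters. Recording the encoding alongside an archived text file removes the guesswork.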
