Standardisation and Persistent Archives
I’ve been attending the Australian Academy of the Humanities Annual Symposium on “Humanities Futures” and the place of eResearch in the Humanities. It’s really interesting stuff, and many of the papers move beyond the standard call to digitise the archive, and on to considerations of archival formats, meta-tagging, and standardisation. It is a timely call, as the digitisation project is gathering pace, and more and more public and private archives are being made available online.
What worries me, however, is the lack of understanding about the importance of standardisation and format control. Many of the file formats that currently exist for sound, image, and even text seem stable enough, but if formats have been changing ever since the personal computer was invented, why should we assume they are going to stop now? MP3 is currently dominant, but if Microsoft gets its way the WMA format will take over, and I don’t need to comment on Apple’s DRM formats. Of course, as a speaker said yesterday, MP3 is not an archival format (presumably WAV is), but what guarantee is there that even that standard will continue to be supported?
Because ultimately, with the possible exception of XML, all these formats are proprietary: we can assume they will continue at least until they cease to be profitable, but then what?
It’s hard to find the data, but at the end of 2003 it was estimated that there were 50,000 online journals, most of them available in PDF. Imagine what would happen if Adobe ceased to support the format, or changed it sufficiently that older versions became unreadable. Already, apparently, PDFs created with versions one or two of Acrobat are no longer readable. There is precedent in Microsoft refusing to follow the W3C standards for HTML, obviously for marketing reasons. Adobe could effectively hold the academy to ransom, now that so much is dependent on that format.
Online research and archiving can only be effective if the data is robust and persistent, and in a standard, approved format. There was a plaintive cry at the end of one paper yesterday for an agreed set of standards for Humanities eResearch, covering file formats, archiving conventions, and (especially) an agreement on such things as XML definitions. This is an enormous task, and neither the government nor the current opposition looks willing to fund such an enterprise; the humanities, after all, only need a few books and a dusty garret to work in.
There are many new and exciting tools available for researchers in the Humanities (GIS, interactive databases, multimedia archives, etc.). In the enthusiasm for these new toys, let’s not forget to take a measured approach, and think about what we are doing, and how it can be made portable and enduring. There are way too many HyperCard databases out there already.