Format Longevity
Carefully choose your storage format. Remember that everyone was surprised when WordStar, the dominant word-processing tool of the early 1980s, was replaced by WordPerfect. Then WordPerfect was largely replaced by Microsoft Word and OpenOffice, and OpenOffice was in turn largely replaced by LibreOffice.
format is intended to solve intercompatibility problems.
Make sure to save files in that format,
filename extensions for text documents,
for spreadsheets, and
Plain text content is really the best electronic format for long-term storage and wide use. This originally meant just ASCII, but now it also means Unicode text, typically encoded as UTF-8. These files can be viewed and edited with general purpose tools.
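If you want to check whether an archived file really is plain text in one of those encodings, a few lines of Python will do it. This is only a minimal sketch: the function name `classify_text` and the sample data are my own illustrations, and a file that fails both decodes might still be text in some older encoding.

```python
def classify_text(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'binary/other' for a byte string."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: either binary data or a legacy text encoding.
        return "binary/other"


# Hypothetical sample data, for illustration only.
samples = {
    "plain": b"Format longevity matters.",
    "accented": "Christian B\u00f6k".encode("utf-8"),
    "not text": b"\xff\xd8\xff\xe0",  # first bytes of a typical JPEG file
}
for label, raw in samples.items():
    print(label, "->", classify_text(raw))
```

For a real file, read it in binary mode with `open(path, "rb")` and pass the bytes to the function.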
An HTML file (containing ASCII, UTF-8, or Unicode) can preserve formatting — proportional fonts, varying typefaces, colors, etc. It is also easy to edit in the future. PDF could preserve pagination and printed appearance, but with the loss of editability without specialized tools.
For image files, JPEG should be quite long-lived. Any eventual replacement of JPEG will have to be a gradual thing, with the enormous base of existing JPEG data, and conversion tools should be widely available when or if needed.
To take this to an extreme, researchers at the Pacific Northwest National Laboratory (or PNNL) investigated encoding information into DNA sequences inserted into the genome of extremophile bacteria. See the 2003 paper in Communications of the ACM, "Organic Data Memory Using the DNA Approach", by Pak Chung Wong, Kwong-Kwok Wong, and Harlan Foote, also available here.
Deinococcus radiodurans can survive highly acidic environments, vacuum, desiccation, and exposure to radiation flux about 1,000 times what is fatal to humans. Strains have been developed that can also consume and detoxify both toluene and ionic mercury residue found in the radioactive waste generated by nuclear weapons manufacturing processes. PNNL is part of the U.S. Department of Energy, and those scientists were looking for a recording format that would survive intense nuclear war or natural disasters including large asteroid impacts, and be retrieved in the distant future.
The archived information should also include messages like "Don't militarize intelligent robots" and "Don't antagonize extraterrestrial civilizations capable of interstellar flight."
Canadian poet Christian Bök, a visiting artist at MIT's Center for Art, Science, and Technology, created The Xenotext, a pair of poems to be recorded in DNA and protein within an extremophile. The DNA would encode a 14-line poem "Orpheus", which would replicate an amino acid sequence encoding another 14-line poem "Eurydice". The output protein would also fluoresce red to draw attention to the encoded information.
Of course, some events kill even the extremophiles.
Media Longevity and Failure Rates
Have you considered the longevity of your storage media? The article "Ensuring the Longevity of Digital Documents" [Scientific American, January 1995, pg 42] discussed this. An updated revision is available for download. More recently, "Avoiding a Digital Dark Age" [American Scientist, v98, n2 (Mar-Apr 2010), pg 106] and "Now we know it..." [New Scientist, 30 Jan 2010, pp 37-39] also discussed it. Scientific American had another short feature on this topic in April 2011, "Seeing Forever: Storing Bits Isn't the Same as Preserving Them", which pointed out that no electronic data format had been around for even fifty years (both ASCII and EBCDIC being first standardized in 1963). Nature had an article in 2017, "Disks back from the dead" [v545, p117, 4 May 2017]. It discusses the importance of both physical and logical longevity. It mentions some surprisingly cheap services ("a few dollars per disk", far less than what I would expect), including FloppyDisk in Lake Forest, California, and RetroFloppy in Cary, North Carolina. Then David Pogue at Scientific American returned to the theme in his column "Fighting Format Rot" in November 2017 [v317, no 5, pg 26] (he also wrote the April 2011 column).
Also see Vivek Navale's paper "Predicting the Life Expectancy of Modern Tape and Optical Media" in RLG DigiNews, Aug 15, 2005, 9:4.
The U.S. Library of Congress studies these problems at their Center for the Library's Analytical Science Samples. Their work was described in an Atlantic article.
Summary: All media erodes, ink on paper is far better than any magnetic media, and we just don't have enough information to really say how long optical media is likely to last. The longevity of the logical format may be more important. Consider that the last hieroglyphs were carved in 394 AD, but soon after that we lost the ability to read the still very sharp and distinct writing. Similarly, Sumerian was used as a sacred, literary and scientific language in Mesopotamia until the 1st century AD, when it was quickly forgotten. Both Egyptian and Sumerian were deciphered in the 1800s through the use of trilingual inscriptions, although Egyptian is far better understood today. Also consider that the Dead Sea Scrolls are ink-on-parchment and ink-on-papyrus media about 1900 years old but still readable. Egyptian papyrus is up to twice as old and is also readable. The oldest true paper we have is from 868 AD. But computer media from a decade ago is often useless.
Ink jet output becomes fuzzy and dim after a few years. Moisture in the air makes the ink spread out within the paper, making it fuzzy. And, the ink itself dims. Different colors of ink fade at different rates, so colors will shift or things that used to be black are now some color.
Laser jet output will last longer, as the toner is fused onto the paper and the black toner gets its color from carbon. But toner also contains polymers, which will break down over time.
Of course, the paper itself will start to break down due to acid content in paper used in a typical office, so I would expect the laser jet printing to last about as long as the paper it's printed on, which might be a few decades.
|Estimated longevity of electronic storage media, in years|
|CD-R (cyanine & azo dyes, used by Taiyo Yuden and Verbatim)||7|
|CD-RW, DVD-RW, DVD+RW||7|
|Audio CD, DVD movie, CD-R (phthalocyanine dye and silver metal layer), DVD-R, DVD+R|
|CD-R (phthalocyanine dye and gold metal layer)||100|
Note: Most CD-R media uses phthalocyanine, although Taiyo Yuden uses cyanine and Verbatim uses azo compound dyes.
Source: "Now we know it...", New Scientist, 30 Jan 2010, pp 37-39.
Our storage media longevity gets worse over time.
Paleolithic art, including the Venus figurines and especially the Venus of Hohle Fels, dates from up to 40,000 years ago.
Clay tablets were developed about 8,000 BC and have expected lifetimes of 4,000 years and more.
Pigment on paper or papyrus came along about 3,500 BC and lasts at least 2,000 years.
Oil-based paintings were developed about 600 AD and can be expected to last for centuries.
Silver halide monochrome photographic film was developed around 1820 and lasts over 100 years, but modern color photo films (early examples of which were developed around 1860) only last for decades.
The article "Are We Losing Our Memory? Or: The Museum of Obsolete Technology", from Lost magazine, discussed this problem as experienced by the U.S. National Archives.
I have a personal story about attempts to recover data from old media in which both the logical format and the physical media had problems.
Available Digital Media Types
The three major categories are magnetic, flash, and optical.
Magnetic media comes in the form of tape and disk. Disks can be installed inside a computer system case, or they can be placed in small self-contained cases and used as portable external devices. Some require their own power supply, while others can be powered over the same USB cable carrying the data connection.
Flash memory is electronic. It can be in the form of a small "chip" or "card" that slides into a slot in a camera, smart phone, or other device. Or, it can be in the form of a "USB stick" or "USB thumb drive". Flash devices are increasingly being used to replace or supplement magnetic disk storage inside computers. Internally, the memory cells are floating-gate MOSFETs, each with a control gate and a floating gate. The data is stored as charge trapped on the floating gate, which behaves like a low-leakage capacitor.
Optical media takes the form of optical discs, usually spelled that way and not "disk". CD or Compact Disc, DVD or Digital Versatile Disc (originally Digital Video Disc), and BD or Blu-ray Disc media have identical physical dimensions, but very different optical and data storage characteristics. A CD holds just 700 MB, a DVD holds 4.7 GB per layer (6.7 times one CD), and a BD holds 25 GB per layer (35.7 times one CD), with two-layer discs the current industry standard.
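The "times one CD" ratios are simple division of the nominal per-layer capacities. Here is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and using names of my own choosing:

```python
# Nominal single-layer capacities in MB, as quoted in the text.
# Assumption: decimal units, so 4.7 GB = 4700 MB and 25 GB = 25000 MB.
CAPACITY_MB = {"CD": 700, "DVD": 4700, "BD": 25000}


def cd_equivalents(medium: str) -> float:
    """How many 700 MB CDs fit in one layer of the given medium."""
    return CAPACITY_MB[medium] / CAPACITY_MB["CD"]


for name in CAPACITY_MB:
    print(f"{name}: {cd_equivalents(name):.1f} CDs per layer")
```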
Lifespan of Flash Memory
This is the media with the least public information on storage lifetime. Usage lifetime is the more useful measurement for most applications.
Flash memory has a finite number of program-erase cycles, which you will see described as P/E cycles. Most of the flash products you can buy are guaranteed for around 100,000 P/E cycles before memory wear begins to degrade data integrity. Some chip firmware or operating system drivers can count the writes and remap write operations across sectors; this is called wear leveling. Another technique verifies write operations and remaps I/O to spare sectors; this is called bad block management.
Either way, you will start to lose data after about 100,000 program-erase cycles.
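To make the wear leveling idea concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how any real controller works: the class name `WearLeveledFlash`, the whole-block granularity, and the "pick the least-worn free block" policy are all illustrative.

```python
class WearLeveledFlash:
    """Toy model: each logical write is remapped to the least-worn free block."""

    def __init__(self, physical_blocks: int, max_pe_cycles: int = 100_000):
        self.max_pe_cycles = max_pe_cycles
        self.erase_counts = [0] * physical_blocks  # P/E cycles used per block
        self.data = [None] * physical_blocks
        self.mapping = {}  # logical block number -> physical block number

    def write(self, logical: int, payload: bytes) -> int:
        in_use = set(self.mapping.values())
        old = self.mapping.get(logical)
        # Candidates are free blocks that still have P/E cycles left.
        candidates = [
            p for p in range(len(self.erase_counts))
            if p not in in_use and self.erase_counts[p] < self.max_pe_cycles
        ]
        if not candidates:
            raise RuntimeError("no free block with remaining P/E cycles")
        target = min(candidates, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1  # one program-erase cycle consumed
        self.data[target] = payload
        if old is not None:
            self.data[old] = None  # the stale copy becomes a free block
        self.mapping[logical] = target
        return target

    def read(self, logical: int) -> bytes:
        return self.data[self.mapping[logical]]


# Rewrite the same logical block many times and see where the wear lands.
flash = WearLeveledFlash(physical_blocks=4)
for i in range(1000):
    flash.write(0, str(i).encode())
print(flash.erase_counts)
```

Without the remapping, all 1000 cycles would land on one physical block. With it, the wear is spread across every block, which delays the point where any single block reaches its limit but does not remove the limit.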
There is also a problem called read disturb, in which a large number of read operations on some data blocks can cause changes to nearby cells if those nearby cells are not re-written. These errors also begin to appear after hundreds of thousands of operations.
Now, what if you store data in flash memory and set it aside? For how long will you be able to read that data back out? The articles cited above, used to build the table of estimated longevity shown above, said that about 10 years is what you could expect.
With the rapidly dropping cost of flash memory, and the corresponding rapid growth of storage capacity you can get for a fixed price, I think that the industry sees this as a somewhat silly question. They would ask: "Instead of worrying about how long your data will safely reside in that old 128 MB thumb drive, why haven't you copied it into your brand-new, much cheaper device with over 100 times the storage capacity?"
Lifespan of Optical Memory
First, realize that factory produced optical media is entirely different from what you record at home.
Factory produced optical media have names like CD-ROM and DVD-ROM to indicate that they are read-only memory storage. A glass master is used to mold a polycarbonate disc with the required pattern of pits. That surface is then metallized, with a thin layer of mostly aluminum plus traces of other metals sputtered onto it in a vacuum chamber. UV-curable lacquer is then applied to the metallized surface and cured under high intensity UV illumination. The result should be useful for 20 years or more.
Hold a factory produced CD or DVD up to a bright light. You will not see any light coming through the disc as it contains a thin but solid metal layer.
Compare this to a piece of writable or re-writable media, as in the picture at left. There is a thin reflective metal layer within the disc, but it is usually so thin that it is somewhat translucent. There are many specific forms: CD-R, CD-RW, DVD-R, DVD+R, DVD-RW, DVD+RW, BD-R, BD-RE. Most of these rely on optically sensitive dyes to allow recording data. The chemistry of these dyes gives them widely varying stability, all of it significantly worse than the metal layer of a factory version. Exposure to direct sunlight greatly accelerates data loss, as does high or varying temperature and humidity.
CD-R discs rely on photosensitive dyes. Initially, cyanine dyes and hybrid dyes based on cyanine were used. They would fade and become unreadable in a few years even if carefully stored. They would become unreadable in just a few days if exposed to direct sunlight, with the "stabilized" ones lasting a week in the sun before losing data.
Azo and phthalocyanine dyes are more stable, with azo CD-Rs typically rated for decades and phthalocyanine CD-Rs rated for a hundred years or more (although recent studies seriously question these claims). Both are sensitive to UV radiation and therefore quickly degrade when exposed to sunlight. Phthalocyanine CD-Rs begin to degrade after two weeks of direct sunlight exposure, and azo CD-Rs after three to four weeks. Other factors leading to early degradation include the quality of the polycarbonate forming the disc and the metallic reflective layer behind the dye. Writer calibration and quality also affect the longevity of the recorded disc. A more marginal disc with recoverable errors will more quickly degrade to the point where its errors are no longer recoverable.
DVD-R and DVD+R are similar to CD-R in their reliance on chemical dyes that can fade or otherwise degrade over time. The laser wavelength is shorter, in order to read and write smaller pits on narrower tracks and therefore pack more data onto the disc. CD-R drives use a near infrared laser at 780 nm, while DVD drives use a red laser at 650 nm. So, the DVD dyes are different from those used on CD-R.
DVD-R and DVD+R differ in some non-chemical details not affecting their lifetime. DVD+R may be a little more reliable when burning or recording, but DVD-R is a little more portable as some drives can read DVD-R but not DVD+R.
DVD-R DL and DVD+R DL are dual-layer versions, storing twice as much data as single-layer DVDs.
CD-RW uses an AgInSbTe alloy as its reflective layer. Its original state is polycrystalline and reflective, and would be read as a "1". To write a "0", the laser uses its maximum power of 8-14 mW to heat the material to 500-700 °C, liquefying the alloy and making it amorphous and non-reflective. To later change that bit back to a "1", the laser heats the bit with low power to about 200 °C, at which the alloy returns to its polycrystalline and reflective state. This can only be done a limited number of times, long-term data retention is quite poor, and the resulting media cannot be read in many drives. DVD-RW and DVD+RW are very similar to CD-RW in technical details and poor lifetime and portability, usually using a different alloy, GeSbTe. DVD-RW and DVD+RW differ in some non-chemical details not affecting their lifetime.
BD-R seems to be similar to DVD-R, again with changes in dye chemistry as now the laser illumination is blue, at 405 nm.
BD-RE seems to be similar to DVD-RW, possibly with changes in alloy chemistry.
Use the smartmontools package to automatically test your storage devices. See the excellent Linux Journal article for details. A very short reminder is:
||Display information on disk|
||Health status results|
||Attributes as of the last measurement|
||Log of errors|
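The command column of that reminder is blank here, so the sketch below is my own mapping of each description to a standard `smartctl` option (`-i`, `-H`, `-A`, `-l error`). It may not match what the table originally listed. The device name `/dev/sda` and the function names are assumptions, and `smartctl` normally needs root privileges.

```python
import subprocess


def smartctl_commands(device: str = "/dev/sda") -> dict:
    """Map each description to a smartctl invocation (assumed mapping)."""
    return {
        "Display information on disk": ["smartctl", "-i", device],
        "Health status results": ["smartctl", "-H", device],
        "Attributes as of the last measurement": ["smartctl", "-A", device],
        "Log of errors": ["smartctl", "-l", "error", device],
    }


def run_report(device: str = "/dev/sda") -> None:
    """Run each command in turn and print its output."""
    for label, argv in smartctl_commands(device).items():
        print(f"== {label}: {' '.join(argv)}")
        # smartctl reports status as a bit mask in its exit code, so a
        # nonzero code does not always mean the command failed to run.
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout)
```

Call `run_report("/dev/sdb")` or similar as root on a machine with smartmontools installed. The package also ships the `smartd` daemon for scheduled, automatic testing.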
Lifespan of Magnetic Memory
How long do you expect a magnetic disk drive to last before it fails? Is one brand better than another?
Who knows, and not especially....
Disk manufacturers do studies, but they are accelerated failure tests on their own systems only under very specific conditions. Any manufacturer can have a short run of worse or better devices, and comparisons between various manufacturers' products haven't been very meaningful.
Two papers presented at the 5th USENIX Conference on File and Storage Technologies (FAST '07) have gotten quite a bit of attention.
The first is "Failure Trends in a Large Disk Drive Population", by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso, of Google.
The second is "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean To You?" by Bianca Schroeder and Garth A Gibson, of Carnegie Mellon University. You can read their CMU Technical Report or the FAST '07 paper.
Here is my summary of the Google paper:
Their study was based on over 100,000 disk drives, a variety of PATA and SATA from a variety of manufacturers, 80-400 GB and 5400-7200 RPM. They do not provide information about the specific manufacturers, but that really isn't all that important. All manufacturers have short runs of worse and better quality, and an attempt to measure who was better would probably be overwhelmed by measurement noise.
Some SMART parameters are highly correlated with disk failures. However, SMART parameters alone are not all that useful for predicting individual drive failures.
Contrary to common assumptions, temperature and activity are not highly correlated to drive failure.
Drive manufacturers quote yearly failure rates below 2%, but user studies report up to 6%. Many apparent failures in the field don't seem to be failures in the lab — maybe the problem was with a specific controller or data cable. They cite other studies of failure rates:
- Study of 368 SCSI disks over 18 months, 1.9% failure rate.
- Study of 2489 disks at archive.org over 12 months, 2% failure rate (although up to 6% per year in the past).
- Study of 15,805 and 22,400 disks at each of two large web hosting companies, 3.3-6% failure rates.
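To compare a datasheet MTTF, like the 1,000,000 hours in the CMU paper's title, with these field failure rates, you can convert it to an annualized failure rate. The sketch below assumes a constant failure rate (exponentially distributed lifetimes). That is a textbook simplification of my own choosing, and the age-dependent rates in the Google data suggest it does not hold in practice. The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate implied by an MTTF, assuming a constant
    failure rate (exponential lifetimes)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def expected_failures(drives: int, afr: float) -> float:
    """Expected number of drive failures per year in a population."""
    return drives * afr


datasheet_afr = afr_from_mttf(1_000_000)
print(f"Datasheet AFR: {datasheet_afr:.2%}")
for field_afr in (0.02, 0.06):
    print(f"1000 drives at {field_afr:.0%} per year: "
          f"{expected_failures(1000, field_afr):.0f} failures expected")
```

An MTTF of 1,000,000 hours works out to an annual rate a little under 1%, well below the 2% to 6% seen in the field studies listed above.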
Some SMART data is clearly bogus. I agree: one of my disks seems to consistently report its temperature in degrees Fahrenheit instead of the expected Celsius, and so it appears to always be somewhere above the boiling temperature of water.
A significant number of drives fail within the first 3 months. The weak ones die quickly.... Then the failure rate climbs after two years. Annualized failure rates, approximated from their Figure 2:
|Drive age||3 months||6 months||1 year||2 years||3 years||4 years||5 years|
|Annualized failure rate||2.8%||1.7%||1.7%||8.1%||8.6%||6.0%||7.8%|
Four SMART parameters were significantly correlated with increased failure rates.
|Error type||Meaning||After the first such occurrence of this error, this many times more likely to fail within 60 days than a drive without this error|
|Scan error||Drives typically scan the disk surface in the background and report errors as they are found. Large scan error counts may indicate surface defects.||39 times more likely to fail|
|Reallocation counts||Drive's logic has remapped a faulty sector number to a new physical sector drawn from its pool of spares, because of recurring soft errors or a hard error. May indicate drive surface wear.||14 times more likely to fail|
|Offline reallocation||Subset of the reallocation counts, counting only reallocated sectors found during background analysis. Should exclude sectors reallocated due to errors during actual I/O.||21 times more likely to fail|
|Probational counts||Suspect bad sectors put "on probation". Weaker indication of possible problems.||16 times more likely to fail|
But while that looks impressive, over 56% of the failed drives had zero counts in all four of those SMART parameters! So, models based only on those four signals will predict less than half the failed drives.
The Google report said that there was a strong correlation with manufacturer but they did not report it. That's fair enough, because the clusters of good and bad disks seem to be with manufacturing batches and not with manufacturers. That is, any manufacturer has both good and bad runs of disks.
If you want to see names, a Russian study included them. It was on the net at pro.sunrise.ru but the article is no longer there. You can, however, find it through the archive.org Wayback Machine.
Counter-Availability and Destroying Media
If you want to quickly and easily destroy a CD or DVD, place it in a microwave for just a second or so.
Below you see the result of putting a commercial CD into a microwave oven for just one second. The oven was a General Electric E640J 002 nearly twenty years old, and it probably doesn't generate its original 970 watts of power at 2.45 GHz. However, just one second rendered this disc unreadable by most if not all adversaries.
Yes, some heavy-duty office shredders can also eat CDs and DVDs, but they can make a huge mess of metal foil slivers and plastic chips, and the resulting mix of paper, plastic and metal is not recyclable.