Hex dump of Gibe-F worm.

Availability Tools

Availability is Different

Of the CIA triad of information security — Confidentiality, Integrity, and Availability — this one is different.

We have encryption for confidentiality. This is defensive cryptographic technology, attempting to prevent an adversary from reading our information. It cannot guarantee that an adversary could not discover the decryption key or otherwise obtain our information, but it makes that attack very difficult.

We have cryptographic hash functions for integrity. This is detective cryptographic technology, attempting to tell us if an adversary has modified our information. It cannot guarantee that an adversary could not somehow modify our information in a way that changes its meaning without our noticing, but it makes that attack very difficult.

We have specific numbers in both cases for how much work it would take to successfully attack us. Attacks will always be possible in theory, but we can make them hard enough in practice that we do not need to worry.

Unfortunately, we have no cryptographic tools for availability. This means that we have no math, and so we have no numbers. We cannot rigorously prove the likelihood of data or any other resource remaining available. We cannot even say that any one data set is more likely to remain available than another.

The best thing we have is statistics on what has happened so far in a similar setting. If someone reports that a specific type of storage media has "a lifetime of 2 to 5 years", what they are really saying is that in some percentage of similar cases, maybe 95% of them, maybe 99% of them, the data was available for between two and five years. In a few cases, it did not last even two years, and in a few more it may have lasted for more than five. All you really know is that if you use a large number of these storage devices, most of your data will probably still be around two years later.
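A back-of-the-envelope sketch of what such a statistic implies, assuming a constant annualized failure rate (a simplification; as discussed further down this page, real failure rates vary with device age):

```python
# If a device fails in any given year with probability 0.05 (a
# hypothetical 5% annualized failure rate), the chance that one
# particular device survives five years is (1 - 0.05) ** 5.
p_fail = 0.05
survive_5y = (1 - p_fail) ** 5
print(round(survive_5y, 3))   # prints 0.774
```

So even a seemingly modest failure rate means that roughly a quarter of such devices would not make it to the five-year mark.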

Availability simply cannot be guaranteed. Any unprivileged user on a Unix-family operating system can type the following:

$ a() { a|a & } ; a 

That defines a shell function a() that calls itself and pipes the output into a second, backgrounded copy of itself, recursing down an endless hole and doubling the number of processes at each level. The final a on that line then calls the disastrous function.

On Solaris 9 that immediately freezes the system.

On Linux with a 3.* kernel and a typical amount of RAM, you would have about one second before the system freezes.

On OpenBSD the system freezes for a few seconds before the kernel steps in and kills the out-of-control set of processes.

On Linux with a 4.* kernel, the system freezes for a few seconds, then is very sluggish for several seconds while a blizzard of error messages fly up the screen where you did this, a mix of:
"-bash: fork: retry: No child processes"
"-bash: fork: retry: Resource temporarily unavailable"
The load average can climb over 100 within a few seconds. I tested this on a Raspberry Pi with only 512 MB of RAM and a single-core CPU. In another terminal window where I was connected in over SSH, I ran "top -d 0.2" to observe the freeze, the sluggishness, and the load average spike.
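There is no complete defense, but per-user process limits can blunt a fork bomb. A minimal sketch using Python's resource module on Linux/BSD (the same knob the shell exposes as "ulimit -u"; the value 4096 here is an arbitrary example, not a recommendation):

```python
import resource

# Read the current process-count limit for this user's processes.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("process limit (soft, hard):", soft, hard)

# Lower the soft limit so a runaway fork loop hits a quota instead of
# taking down the whole system. This affects only this process and its
# children; a real deployment would set it in limits.conf or similar.
if soft == resource.RLIM_INFINITY or soft > 4096:
    resource.setrlimit(resource.RLIMIT_NPROC, (4096, hard))
```

With such a limit in place the fork bomb still runs away, but it exhausts its quota rather than the machine, and the flood of "Resource temporarily unavailable" errors arrives much sooner.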

The systemd project is taking over more and more of the Linux operating system environment. Thanks to its reckless design, you may be able to freeze a system with this single line:

$ NOTIFY_SOCKET=/run/systemd/notify systemd-notify "" 

See that attack's explanation here, along with discussion of how systemd's creeping takeover of the Linux operating system may be a very bad idea.

Also see this "Compiler Bomb", a 29-byte C program that compiles to a 17,179,875,837-byte (16 GB) executable. We must pass the -mcmodel=medium option because the array is larger than 2 GB, and possibly the -save-temps option to keep temporary files in the local directory if there isn't enough space in the /tmp file system. During attempted compilation by any unprivileged user, the system becomes sluggish from time to time as memory is exhausted. See the Compiler Bomb page and the original discussion for more details.

$ cat cbomb.c
$ gcc -mcmodel=medium cbomb.c -o cbomb
cbomb.c:1:1: warning: data definition has no type or storage class
/tmp/ccZbsIhp.s: Assembler messages:
/tmp/ccZbsIhp.s: Fatal error: cannot write to output file '/tmp/cc5mxHSz.o': No space left on device
$ time gcc -mcmodel=medium -save-temps cbomb.c -o cbomb
cbomb.c:1:1: warning: data definition has no type or storage class
/usr/bin/ld: final link failed: Memory exhausted
collect2: error: ld returned 1 exit status

real    2m16.169s
user    0m4.512s
sys     0m12.573s
$ ls -l
total 16777232
-rw-rw-r-- 1 cromwell cromwell          15 Oct 19 10:18 cbomb.c
-rw-rw-r-- 1 cromwell cromwell         143 Oct 19 10:23 cbomb.i
-rw-rw-r-- 1 cromwell cromwell 17179870214 Oct 19 10:26 cbomb.o
-rw-rw-r-- 1 cromwell cromwell         219 Oct 19 10:23 cbomb.s

Finally, we can't defeat nature, especially when we're overly reliant on limited facilities.
2018: A 30-minute power outage at Samsung's factory near Pyeongtaek destroyed 3.5% of global V-NAND flash memory output for March.
2011: Floods in Thailand led to hard drive shortages for months.


Netflix As An (Extreme) Example

Netflix has created the Chaos Monkey and other elements of its Simian Army to stress its system to test resiliency. It's very surprising that they unleash these tools on their production systems, tell people about this, and even give away the tools. See the Netflix technical blog for details.

Netflix is largely built on the Amazon Web Services public cloud. The Chaos Monkey disables selected production systems, while the Chaos Gorilla takes out an entire AWS availability zone. The Doctor Monkey performs automated health checks, searching Netflix's resources for any degradations in performance.

Format Longevity

Carefully choose your storage format. Remember that everyone was surprised when WordStar, the dominant word-processing tool of the early 1980s, was replaced by WordPerfect. And then WordPerfect was largely replaced by Microsoft Word and LibreOffice.

The OpenDocument format is intended to solve interoperability problems. Make sure to save files in that format, with .odt or .fodt filename extensions for text documents, .odp or .fodp for presentations, .ods or .fods for spreadsheets, and .odg or .fodg for graphics.

Plain text content is really the best electronic format for long-term storage and wide use. This originally meant just ASCII; now it means Unicode, usually encoded as UTF-8. These files can be viewed and edited with general-purpose tools.

An HTML file (containing ASCII or UTF-8-encoded Unicode) can preserve formatting: proportional fonts, varying typefaces, colors, and so on. It is also easy to edit in the future. PDF can preserve pagination and printed appearance, but at the cost of editability without specialized tools.

For image files, JPEG should be quite long-lived. Any eventual replacement of JPEG will have to be a gradual thing, with the enormous base of existing JPEG data, and conversion tools should be widely available when or if needed.

Related reading:
  • "Organic data memory using the DNA approach"
  • PNAS on Christian Bök and The Xenotext
  • Deinococcus radiodurans
  • How to Destroy the Earth

To take this to an extreme, researchers at the Pacific Northwest National Laboratory (or PNNL) investigated encoding information into DNA sequences inserted into the genome of extremophile bacteria. Deinococcus radiodurans can survive highly acidic environments, vacuum, desiccation, and exposure to a radiation flux about 1,000 times what is fatal to humans. Strains have been developed that can also consume and detoxify both toluene and the ionic mercury residue found in the radioactive waste generated by nuclear weapons manufacturing processes. PNNL is part of the U.S. Department of Energy; those scientists were looking for a recording format that would survive intense nuclear war or natural disasters including large asteroid impacts.

Canadian poet Christian Bök, a visiting artist at MIT's Center for Art, Science, and Technology, created The Xenotext, a pair of poems to be recorded in DNA and protein within an extremophile. The DNA would encode a 14-line poem, "Orpheus", which would be expressed as an amino acid sequence encoding another 14-line poem, "Eurydice". The output protein would also fluoresce red to draw attention to the encoded information.

Of course, some events kill even the extremophiles.

Media Longevity and Failure Rates

Have you considered the longevity of your storage media? The article "Ensuring the Longevity of Digital Documents" [Scientific American, January 1995, pg 42] discussed this. An updated revision is available for download. More recently, "Avoiding a Digital Dark Age" [American Scientist, v98, n2 (Mar-Apr 2010), pg 106] and "Now we know it..." [New Scientist, 30 Jan 2010, pp 37-39] also discussed it. Scientific American had another short feature on this topic in April 2011, "Seeing Forever: Storing Bits Isn't the Same as Preserving Them", which pointed out that no electronic data format had been around for even fifty years (both ASCII and EBCDIC being first standardized in 1963). Nature had an article in 2017, "Disks back from the dead" [v545, p117, 4 May 2017]. It discusses the importance of both physical and logical longevity. It mentions some surprisingly cheap services — "a few dollars per disk", far less than what I would expect, including FloppyDisk in Lake Forest, California, and RetroFloppy in Cary, North Carolina. Then David Pogue at Scientific American returned to the theme in his column "Fighting Format Rot" in November 2017 [v317, no 5, pg 26] (he also wrote the April 2011 column).

Also see Vivek Navale's paper "Predicting the Life Expectancy of Modern Tape and Optical Media" in RLG DigiNews, Aug 15, 2005, 9:4.

The U.S. Library of Congress studies these problems at their Center for the Library's Analytical Science Samples; their work was described in an Atlantic article.

See also "The Machine Stops", E. M. Forster, 1909.
Summary: All media erodes, ink on paper is far better than any magnetic media, and we just don't have enough information to really say how long optical media is likely to last. The longevity of the logical format may be more important. Consider that the last hieroglyphs were carved in 396 AD, but soon after that we lost the ability to read the still very sharp and distinct writing. Similarly, Sumerian was used as a sacred, literary, and scientific language in Mesopotamia until the 1st century AD, when it was quickly forgotten. Both Egyptian and Sumerian were deciphered in the 1800s through the use of trilingual inscriptions, although Egyptian is far better understood today. Also consider that the Dead Sea Scrolls are ink-on-parchment and ink-on-papyrus media about 1900 years old but still readable. Egyptian papyrus is up to twice as old and is also readable. The oldest true paper we have is from 868 AD. But computer media from a decade ago is often useless.

Ink jet output becomes fuzzy and dim after a few years. Moisture in the air makes the ink spread out within the paper, making it fuzzy, and the ink itself dims. Different colors of ink fade at different rates, so colors will shift, and things that used to be black take on some other color.

Laser printer output will last longer, as the toner is fused onto the paper and black toner gets its color from carbon. But toner also contains polymers, which will break down over time.

Of course, the paper itself will start to break down due to the acid content of typical office paper, so I would expect laser printing to last about as long as the paper it's printed on, which might be a few decades.

Estimated longevity of electronic storage media, in years:
  • CD-R (cyanine & azo dyes, used by Taiyo Yuden and Verbatim): 7
  • Flash RAM: 10
  • Digital tape: 13
  • Analogue tape: 20
  • Audio CD, DVD movie, CD-R (phthalocyanine dye and silver metal layer), DVD-R, DVD+R
  • CD-R (phthalocyanine dye and gold metal layer): 100

Most CD-R media uses phthalocyanine, although Taiyo Yuden uses cyanine and Verbatim uses azo compound dyes.

From "Now we know it...", New Scientist, 30 Jan 2010, pp 37-39. Our storage media longevity gets worse over time.

Paleolithic art, including the Venus figurines and especially the Venus of Hohle Fels, dates from up to 40,000 years ago.
Clay tablets were developed about 8,000 BC and have expected lifetimes of 4,000 years and more.
Pigment on paper or papyrus came along about 3,500 BC and lasts at least 2,000 years.
Oil-based paintings were developed about 600 AD and can be expected to last for centuries.
Silver halide monochrome photographic film was developed around 1820 and lasts over 100 years, but modern color photo films (early examples of which were developed around 1860) only last for decades.

The article "Are We Losing Our Memory? Or: The Museum of Obsolete Technology", from Lost magazine, discussed this problem as experienced by the U.S. National Archives.

I have a personal story about attempts to recover data from old media in which both the logical format and the physical media had problems.

Available Digital Media Types


Another relic from the collection of obsolete storage media: 120 MB DC2120 QIC 80 magnetic tape.

The three major categories are magnetic, flash, and optical.

Magnetic media comes in the form of tape and disk. Disks can be installed inside a computer system case, or they can be placed in small self-contained cases and used as portable external devices. Some require their own power supply; others can be powered over the same USB cable carrying the data connection.

Flash memory is electronic. It can be in the form of a small "chip" or "card" that slides into a slot in a camera, smart phone, or other device. Or, it can be in the form of a "USB stick" or "USB thumb drive". Flash is increasingly used to replace or supplement magnetic disk storage inside computers. Internally, the memory cells are floating-gate MOSFET transistors, with the data stored as electric charge trapped on the well-insulated floating gate.

Optical media takes the form of optical discs, usually spelled that way and not "disk". CD or Compact Disc, DVD or Digital Video Disc, and BD or Blu-ray Disc media have identical physical dimensions, but very different optical and data storage characteristics. CD holds just 700 MB, DVD holds 4.7 GB per layer (6.7 times one CD), and BD holds 25 GB per layer (35.7 times one CD), with two-layer discs the current industry standard.
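The capacity ratios quoted above are easy to check, using the decimal units the industry quotes (a quick sketch; per-layer figures):

```python
cd_mb = 700            # CD capacity in MB
dvd_mb = 4.7 * 1000    # DVD, 4.7 GB per layer
bd_mb = 25 * 1000      # Blu-ray, 25 GB per layer

print(round(dvd_mb / cd_mb, 1))   # prints 6.7
print(round(bd_mb / cd_mb, 1))    # prints 35.7
```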

Lifespan of Flash Memory

This is the media with the least public information on storage lifetime. Usage lifetime is the more useful measurement for most applications.

Flash memory has a finite number of program-erase cycles, which you will see described as P/E cycles. Most of the flash products you can buy are guaranteed for around 100,000 P/E cycles before memory wear begins to degrade data integrity. Some chip firmware or operating system drivers can count the writes and remap write operations across sectors; this is called wear leveling. Another technique verifies write operations and remaps I/O to spare sectors; this is called bad block management.

Either way, you will start to lose data after about 100,000 program-erase cycles.
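With ideal wear leveling, those 100,000 cycles go a long way for typical use. A rough sketch, with a hypothetical device size and workload:

```python
capacity_gb = 16            # hypothetical flash device
pe_cycles = 100_000         # endurance figure from the text
daily_writes_gb = 10        # hypothetical workload

# Ideal wear leveling spreads writes evenly across all cells, so total
# lifetime writes are roughly capacity times the per-cell endurance.
lifetime_writes_gb = capacity_gb * pe_cycles
years = lifetime_writes_gb / daily_writes_gb / 365
print(round(years))         # prints 438
```

The point of the sketch is only that for most applications, write endurance is not what kills the device or the data.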

1 GB xD memory card, 16 GB MicroSD memory card, USB flash drive.


There is also a problem called read disturb, in which a large number of read operations on some data blocks can cause changes to nearby cells if those nearby cells are not re-written. These errors also begin to appear after hundreds of thousands of operations.

Now, what if you store data in flash memory and set it aside? For how long will you be able to read that data back out? The articles cited above, used to build the table of estimated longevity shown above, said that about 10 years is what you could expect.

With the rapidly dropping cost of flash memory, and the corresponding rapid growth of storage capacity you can get for a fixed price, I think that the industry sees this as a somewhat silly question. They would ask: "Instead of worrying about how long your data will safely reside in that old 128 MB thumb drive, why haven't you copied it into your brand-new, much cheaper device with over 100 times the storage capacity?"

Lifespan of Optical Memory

Blank DVD+R media shows its translucency.

Much recordable optical media is translucent.

First, realize that factory produced optical media is entirely different from what you record at home.

Factory produced optical media have names like CD-ROM and DVD-ROM to indicate that they are read-only memory storage. A glass master is used to mold a polycarbonate disc with the required pattern of pits. That surface is then metallized, with a thin layer of mostly aluminum plus traces of other metals sputtered onto it in a vacuum chamber. UV-curable lacquer is then applied to the metallized surface and cured under high intensity UV illumination. The result should be useful for 20 years or more.

Hold a factory produced CD or DVD up to a bright light. You will not see any light coming through the disc as it contains a thin but solid metal layer.

Compare this to a piece of writable or re-writable media, as in the picture at left. There is a thin reflective metal layer within the disc, but it is usually so thin that it is somewhat translucent. There are many specific forms: CD-R, CD-RW, DVD-R, DVD+R, DVD-RW, DVD+RW, BD-R, BD-RE. Most of these rely on optically sensitive dyes to allow recording data. The chemistry of these dyes gives them widely varying stability, all of it significantly worse than the metal layer of a factory version. Exposure to direct sunlight greatly accelerates data loss, as does high or varying temperature and humidity.

Executive Summary

CD-R:
  • Small capacity and degrades quickly, especially if exposed to sunlight.

DVD-R:
  • Should be good for a decade or so if you start with good media (stable dye, good metal reflective layer) and keep them away from sunlight.
  • A little more portable than DVD+R.
  • A better choice if you're sharing.
  • A better choice if you're copying a DVD to play in a DVD player.

DVD+R:
  • Should be good for a decade or so if you start with good media (stable dye, good metal reflective layer) and keep them away from sunlight.
  • A little less portable than DVD-R, but you will waste fewer blanks due to burning errors.
  • A better choice if you keep them and use them only in the system where you burn them.

BD-R:
  • Should be about like DVD-R with a little over five times the storage capacity.

Re-writable media (CD-RW, DVD-RW, DVD+RW, BD-RE):
  • Useful for short-term storage, as long as you don't try to re-use the media too many times.

CD-R discs rely on photosensitive dyes. Initially, cyanine dyes and hybrid dyes based on cyanine were used. They would fade and become unreadable in a few years even if carefully stored. They would become unreadable in just a few days if exposed to direct sunlight, with the "stabilized" ones lasting a week in the sun before losing data.

Azo and phthalocyanine dyes are more stable, with azo CD-Rs typically rated for decades and phthalocyanine CD-Rs rated for a hundred years or more (although recent studies seriously question these claims). Both are sensitive to UV radiation and therefore quickly degrade when exposed to sunlight. Phthalocyanine CD-Rs begin to degrade after two weeks of direct sunlight exposure, azo CD-Rs after three to four weeks. Other factors leading to early degradation include the quality of the polycarbonate forming the disc and of the metallic reflective layer behind the dye. Writer calibration and quality also affect the longevity of the recorded disc. A more marginal disc with recoverable errors will more quickly degrade to the point where its errors are no longer recoverable.

DVD-R and DVD+R are similar to CD-R in their reliance on chemical dyes that can fade or otherwise degrade over time. The laser wavelength is shorter, in order to read and write smaller pits on narrower tracks and therefore pack more data onto the disc. CD-R uses a near-infrared 780 nm laser, while DVD uses a red 650 nm laser. So, the DVD dyes are different from those used on CD-R.

DVD-R and DVD+R differ in some non-chemical details not affecting their lifetime. DVD+R may be a little more reliable when burning or recording, but DVD-R is a little more portable, as some drives can read DVD-R but not DVD+R.

DVD-R DL and DVD+R DL are dual-layer versions, storing twice as much data as single-layer DVDs.

CD-RW uses an AgInSbTe alloy as its reflective layer. Its original state is polycrystalline and reflective, read as a "1". To write a "0", the laser uses its maximum power of 8-14 mW to heat the material to 500-700 °C, liquefying the alloy and leaving it amorphous and non-reflective. To later change that bit back to a "1", the laser heats the bit at low power to about 200 °C, at which point the alloy returns to its polycrystalline and reflective state. This can only be done a limited number of times, long-term data retention is quite poor, and the resulting media cannot be read in many drives. DVD-RW and DVD+RW are very similar to CD-RW in technical details and in poor lifetime and portability, usually using a different alloy, GeSbTe. DVD-RW and DVD+RW differ in some non-chemical details not affecting their lifetime.

BD-R seems to be similar to DVD-R, again with changes in dye chemistry as now the laser illumination is blue, at 405 nm.

BD-RE seems to be similar to DVD-RW, possibly with changes in alloy chemistry.

500 GB PATA 3.5 inch disk drive and 1 TB SATA 2.5 inch disk drive.

Left: 500 GB PATA 3.5 inch internal-mount disk drive
Right: 1 TB SATA 2.5 inch external disk drive

Lifespan of Magnetic Memory

How long do you expect a magnetic disk drive to last before it fails? Is one brand better than another?

Who knows, and not especially....

Disk manufacturers do studies, but they are accelerated failure tests on their own systems only under very specific conditions. Any manufacturer can have a short run of worse or better devices, and comparisons between various manufacturers' products haven't been very meaningful.

Two papers presented at the 5th USENIX Conference on File And Storage Technology (FAST '07) have gotten quite a bit of attention.

The first is "Failure Trends in a Large Disk Drive Population", by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso, of Google.

The second is "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean To You?" by Bianca Schroeder and Garth A Gibson, of Carnegie Mellon University. You can read their CMU Technical Report or the FAST '07 paper.

Here is my summary of the Google paper:

Their study was based on over 100,000 disk drives, a variety of PATA and SATA from a variety of manufacturers, 80-400 GB and 5400-7200 RPM. They do not provide information about the specific manufacturers, but that really isn't all that important. All manufacturers have short runs of worse and better quality, and an attempt to measure who was better would probably be overwhelmed by measurement noise.

Some SMART parameters are highly correlated with disk failures. However, SMART parameters alone are not all that useful for predicting individual drive failures.

Contrary to common assumptions, temperature and activity are not highly correlated to drive failure.

Drive manufacturers quote yearly failure rates below 2%, but user studies report up to 6%. Many apparent failures in the field don't seem to be failures in the lab — maybe the problem was with a specific controller or data cable. They cite other studies of failure rates:

  • Study of 368 SCSI disks over 18 months, 1.9% failure rate.
  • Study of 2489 disks at archive.org over 12 months, 2% failure rate (although up to 6% per year in the past).
  • Study of 15,805 and 22,400 disks at each of two large web hosting companies, 3.3-6% failure rates.

Some SMART data is clearly bogus. I agree — one of my disks seems to consistently report its temperature in degrees Fahrenheit instead of the expected Celsius, and so it appears to always be somewhere above the boiling temperature of water.

A significant number of drives fail within the first 3 months. The weak ones die quickly.... Then the failure rate climbs after two years. Annualized failure rates, approximated from their Figure 2:

  • Months 0-3: 2.8%
  • Months 4-6: 1.7%
  • Months 7-12: 1.7%
  • Months 13-24: 8.1%
  • Months 25-35: 8.6%
  • Months 36-48: 6.0%
  • Months 49-60: 7.8%

Four SMART parameters were significantly correlated with increased failure rates.

After the first occurrence of each of these error types, a drive is many times more likely to fail within 60 days than a drive without that error:

  • Scan error: drives typically scan the disk surface in the background and report errors as they are found. Large scan error counts may indicate surface defects. 39 times more likely to fail.
  • Reallocation counts: the drive's logic has remapped a faulty sector number to a new physical sector drawn from its pool of spares, because of recurring soft errors or a hard error. May indicate drive surface wear. 14 times more likely to fail.
  • Offline reallocation: a subset of the reallocation counts, counting only reallocated sectors found during background analysis. Should exclude sectors reallocated due to errors during actual I/O. 21 times more likely to fail.
  • Probational counts: suspect bad sectors put "on probation". A weaker indication of possible problems. 16 times more likely to fail.

But while that looks impressive, over 56% of the failed drives had zero counts in all four of those SMART parameters! So, models based only on those four signals will predict less than half the failed drives.
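A toy sketch of the kind of predictor the paper implies; the field names here are illustrative, not actual SMART attribute names, and as just noted, a model like this misses more than half of the drives that actually fail:

```python
# Flag a drive as at-risk if any of the four highlighted SMART counters
# is nonzero. The dictionary keys are hypothetical names for this sketch.
def at_risk(drive_counters):
    signals = ("scan_errors", "reallocations",
               "offline_reallocations", "probational")
    return any(drive_counters.get(s, 0) > 0 for s in signals)

print(at_risk({"scan_errors": 3}))    # prints True
print(at_risk({}))                    # prints False
```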

The Google report said that there was a strong correlation with manufacturer, but they did not report it. That's fair enough, because the clusters of good and bad disks seem to follow manufacturing batches and not manufacturers. That is, any manufacturer has both good and bad runs of disks.

If you want to see names, a Russian study included it. It was on the net at pro.sunrise.ru but the article is no longer there. You can, however, find it through the archive.org Wayback Machine.

Cloud Storage Availability

People use the word "cloud" to mean anything, so now it means nothing. First, large portable external disk drives were called "Your own personal cloud", and then USB thumb drives got that label. Some Microsoft ads seem to mean "software" when they use the term. Let's stick to the original meaning of remote large data centers where customers store and process their data.

IaaS or Infrastructure as a Service is where you're renting virtualized servers in a remote data center. They provide the infrastructure, you have to take responsibility for system administration.

IaaS can be very rugged, given the huge investments in physical facilities, redundant hardware, on-site power generation capacity, and redundant network connectivity. So far, major providers including Amazon, Google, and Microsoft look like they are in it for the indefinitely long run. See my cloud security page for details.

What we might call "Storage as a Service" can be similarly rugged. Some providers are simply reselling Amazon Web Services' S3, Glacier, and other storage services wrapped in their more convenient interfaces and packaging. Why not? AWS is the biggest provider in terms both of what they offer and their investment in resilient infrastructure.

Remember two useful pieces of folk wisdom:

You get what you pay for.

If it looks too good to be true, it probably is.

We can't trust free cloud storage.

For one thing, if it's free they are probably using it at least for market research, and possibly to train their A.I. systems to build a profile on you to sell to advertisers.

Free storage services may be terminated with very little notice. Users must extract their data and find somewhere else to store, share, and process it. For some examples, 2013 was a bad year for short notices:

T-Mobile dropped their free MobileLife Album picture-storing service with just 26 days notice in June 2013. There was industry news about this, but I'm a T-Mobile customer and I only received notice from the company 26 days before the data was deleted and the service shut down. I'm glad that I wasn't using it.

Don't count on free storage offered by your ISP. The service will become popular, thus expensive, and they will shut it down.

Nirvanix simply shut down all operations with only two weeks warning in September 2013. See stories in Wired and Computer Weekly.

SugarSync dropped a free storage service with two months warning in December 2013. See stories in TechCrunch and Time.

Bitcasa announced an end to unlimited low-cost storage in October 2014, giving customers just about 3 weeks to pull out their data before the company deletes it.

Sometimes it's a little better. Microsoft provided 6 months warning in 2017 that they were shutting down the Docs.com file-sharing site.

Amazon announced the end of its unlimited cloud storage plan in 2017.

Companies like Wiredrive and Dropbox have easy-to-use storage services that you pay for. The prices may go up, but once you're an existing customer it seems that you're safe.

That leads to another question: What keeps a provider from doubling their price, or maybe multiplying it by 10 or more, with no warning? Nothing!

Archiving data in AWS Glacier

Amazon's Glacier storage service is a great value. Long-term storage in Glacier costs just US$ 0.004 per gigabyte per month. It's meant for backups and archives retrieved infrequently or never; you pay a penalty for frequent access and downloads.
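At that rate (assuming the quoted price still applies; check current AWS pricing), the arithmetic is pleasant:

```python
rate_per_gb_month = 0.004     # US$, the rate quoted above

gb_stored = 1000              # one terabyte, decimal units
monthly_cost = gb_stored * rate_per_gb_month
print(monthly_cost)           # prints 4.0  (US$ per month for 1 TB)
print(monthly_cost * 12)      # prints 48.0 (US$ per year)
```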

How do you upload data to Glacier? That's the non-obvious part. It's not point-and-click as with S3. Solutions include the following:

Boto is the AWS SDK for Python.

HashBackup is a command-line tool for Linux, Mac OS X, and BSD.

CrossFTP is a graphical client for Linux, Mac OS X, and Windows.

SAGU or Simple Amazon Glacier Uploader is a graphical client for Linux, Mac OS X, BSD, and Windows.

mt-aws-glacier is a Perl tool for multi-threaded, multi-part sync to Amazon Glacier.

My experience with these tools is that you must start by creating vaults with meaningful names. Describe the content and the archive date in the vault so you can immediately tell what you have without waiting until tomorrow to get a Glacier inventory.

Boto is useful for an initial test. Upload a small (1-3 GB) archive to make sure it works and to get an idea for the amount of time required per upload.

Do your uploading with SAGU. It opens a window that it describes as a "progress bar" but it gives you no idea of speed or amount of progress, just that it's trying to do something. Instead, watch your outbound network utilization.

I saved the SAGU Java archive file and then created this shell script in ~/bin/glacier-sagu. I put the actual Access Key ID and Secret Access Key in the file, so ownership and permissions are crucial!


java -jar ~/bin/SimpleGlacierUploaderV0746.jar &
Access Key ID: A⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀
Secret Access Key: B⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀⌀ 

Expect the occasional error message like the following. It should be OK; the announced retry should work:

Dec 14, 2018 2:38:13 AM org.apache.http.impl.client.DefaultHttpClient tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: Broken pipe
Dec 14, 2018 2:38:13 AM org.apache.http.impl.client.DefaultHttpClient tryExecute
INFO: Retrying request 
SAGU or Simple Amazon Glacier Uploader main window.

The first image is the main SAGU window. The second is the "progress bar" window, which is neither exciting nor informative.

SAGU or Simple Amazon Glacier Uploader 'progress bar' window.

Here is the result of signing into AWS, selecting one of my vaults, and viewing the details.

To delete archives and vaults, you must first request an inventory with SAGU or a similar tool and wait about four hours. You will get back a file listing a 138-character ArchiveID for each archive. Select the vault in SAGU, then Delete in the top menu, then paste the ArchiveID into the new window that appears.
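The inventory you eventually receive is a JSON document. A short sketch that pulls out the ArchiveIDs needed for deletion (field names follow the documented Glacier inventory format; the IDs and dates here are shortened placeholders, not real values):

```python
import json

# Structure follows the documented Glacier inventory format;
# the IDs below are shortened placeholders, not real 138-character IDs.
inventory_json = """
{
  "VaultARN": "arn:aws:glacier:us-east-1:123456789012:vaults/photos-2018-12",
  "InventoryDate": "2018-12-15T00:00:00Z",
  "ArchiveList": [
    {"ArchiveId": "EXAMPLE-ID-1", "ArchiveDescription": "photos part 1",
     "CreationDate": "2018-12-14T02:00:00Z", "Size": 1073741824},
    {"ArchiveId": "EXAMPLE-ID-2", "ArchiveDescription": "photos part 2",
     "CreationDate": "2018-12-14T03:00:00Z", "Size": 1073741824}
  ]
}
"""

inventory = json.loads(inventory_json)
archive_ids = [a["ArchiveId"] for a in inventory["ArchiveList"]]
for aid in archive_ids:
    print(aid)  # paste each of these into SAGU's Delete window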

Once the vault is empty, wait for a few hours. Then you will be able to use the AWS dashboard to delete the vault.

This page has details on how to use Boto by writing your own Python code to get more than the rather limited command-line options.

Amazon Glacier vault description.

DDoS (Distributed Denial of Service)

Google has built a live all-Internet visualization of DDoS attacks. Also see Gizmodo's description of the tool. It's an interesting page, although it's very resource hungry.

DDoS is awfully hard to fight because you can't tell where it's really coming from. The very short description of the amplification type of attack is:

  1. The attacker is at home on their control system.
  2. The attacker gains access to a number of trigger systems, each on a network that allows source IP address spoofing. That is, the trigger system's ISP does not sanity-check outbound traffic, and in particular does not do egress filtering.
  3. A program running on each trigger system sends forged packets to a number of amplifier systems. The forged source IP address of these packets is that of the target. For a system to be an amplifier there must be a UDP service running with some combination of outdated software, misconfigured software, and/or missing or misconfigured packet filtering between the server and the Internet.
  4. Each packet requests information to be sent from the amplifier to the apparent sender, which is the target of the DDoS attack. The amplification effect comes from the logic of the abused protocol making the response much larger than the request: up to 8× in DNS, 206× in NTP, and 650× or more in SNMP.
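The arithmetic of step 4 is easy to sketch. All the numbers below are illustrative (the 234-byte monlist request size is approximate, and the rates and amplifier count are hypothetical), but they show how a modest outbound request stream becomes a crushing flood:

```python
# Back-of-the-envelope amplification arithmetic; all numbers illustrative.
request_bytes = 234        # one NTP "monlist" request, approximate size
amplification = 206        # NTP amplification factor cited above
requests_per_sec = 3000    # per amplifier, hypothetical
amplifiers = 100           # hypothetical

response_bytes = request_bytes * amplification
request_bps = request_bytes * 8 * requests_per_sec * amplifiers
flood_bps = response_bytes * 8 * requests_per_sec * amplifiers

print(f"Attacker sends {request_bps / 1e6:.0f} Mbps of spoofed requests")
print(f"Target receives {flood_bps / 1e9:.1f} Gbps of responses")
```

Under these assumed numbers, roughly half a gigabit of spoofed requests turns into a flood of well over 100 Gbps at the target, which is why a small botnet of triggers suffices.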

These traffic graphs are from a victim organization that had all three of their GigE ISP links completely saturated with an NTP amplification attack.

Traffic graphs for NTP amplification DDoS attack on three 1Gbps ISP links.

The forged packets were from UDP ports 80 and 443, so the amplified flood was directed at those ports. Only one of the three ISPs would implement an ACL blocking UDP/80 and UDP/443 toward the victim; the other two would only blackhole the six IP addresses under attack. Since two of those blackholed addresses were DNS servers, they could no longer reach the root servers or any other DNS servers, so external name resolution was completely broken.

Another organization I talked to was cut off when 3 Gbps of NTP traffic was directed at their 1 Gbps ISP link.

For good explanations of DDoS attacks in more detail see Cloudflare's introductory Understanding and mitigating NTP-based DDoS attacks, and the more detailed and specific Technical Details Behind a 400Gbps NTP Amplification DDoS Attack. Earlier, they wrote Deep Inside a DNS Amplification DDoS Attack and The DDoS That Almost Broke the Internet, and before that and at a more basic level, How to Launch a 65Gbps DDoS, and How to Stop One.

Also see The New Normal: 200–400 Gbps DDoS Attacks at KrebsOnSecurity.

More recently, Arbor Networks' 10th Annual Worldwide Infrastructure Security Report reported a 50× increase in DDoS attack size over the past decade, with a 400 Gbps attack in December 2014.

SSDP, the Simple Service Discovery Protocol, was the top mechanism for DDoS attacks in early 2015.

Akamai reported on RIPv1 reflection attacks in mid 2015.

NTP amplification was behind late 2015 DDoS attacks.

The Register described a November 2015 attack on the DNS root servers, many of which were hit with 5 million queries per second.

The Krebs on Security site was knocked off the Internet by the Mirai botnet for almost four days in September 2016. See Brian Krebs' great investigation of who was behind the attack.

In July 2016 Arbor announced that a study of the first half of 2016 found a peak attack size of 579 Gbps, and 274 attacks over 100 Gbps. That's about three every two days. The average attack size in the first half of 2016 was 986 Mbps, projected to grow to 1.15 Gbps by the end of the year. This means that the average DDoS attack can knock most organizations off-line.

October 2016 DDoS Attack

A DDoS attack on Dyn, a DNS provider, started at 0710 EDT on 21 October 2016. The attack used a botnet of "Internet of Things" devices: surveillance cameras, baby monitors, home routers, and similar gadgets. The result cut off access to many popular web sites, especially from the eastern U.S.

The botnet was made up of devices based on components from Hangzhou Xiongmai Technology. The devices use well-known Telnet passwords as listed here. They are "white label" goods, produced by one company and then rebranded and sold by third parties. The original manufacturer has no way of knowing which companies rebranded and sold the insecure devices, preventing a recall.

Wikipedia on the attack
Dyn's analysis of the attack
Flashpoint on the attack
The Daily Dot on the attack
Brian Krebs on the attack
Brian Krebs on the manufacturer
The Atlantic, "When the Entire Internet ..."
The Atlantic, "How a Bunch of Hacked DVR ..."
"What We Know About ..."
The New York Times, "Hackers Used New Weapons ..."
"Blame the Internet of Things ..."

2018 — Attacks Get Worse

Attackers started abusing exposed memcached database caching daemons for an amplification factor of up to 51,000×. This delivers DDoS volumes above 500 Gbps.

Arbor Networks report
Cloudflare report
Ars Technica story

Where not to place telco pedestals

Do not place them where this one was in Herndon, Virginia — right along a road winding through office parks, where the anxious commuters hit speeds around 50 m.p.h. despite that being almost twice the posted limit.

And especially not where a sidewalk ramp makes it so easy to drift off the road while texting and smash into the poor pedestal.

Telco pedestal smashed open by a car.
Telco pedestal smashed open by a car.
Telco pedestal smashed open by a car.
Telco pedestal smashed open by a car.


How can Amazon claim such high availability for their cloud storage?

Their S3 and Glacier storage services store multiple copies of your data, in multiple physical locations. Three copies are stored in at least two physical locations. The hash value of each is periodically calculated. If any one is ever found to differ from the other two, it is recreated from the presumed good pair.
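A toy sketch of that majority-vote scrubbing logic (my own illustration of the idea, not Amazon's implementation):

```python
import hashlib

def audit_and_repair(replicas):
    """Majority-vote integrity scrub across three replicas: if one
    copy's SHA-256 differs from the other two, rewrite it from a
    member of the agreeing pair. Returns the repaired index, or
    None if all copies agree (or no two-of-three majority exists)."""
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    for i in range(3):
        others = [d for j, d in enumerate(digests) if j != i]
        if digests[i] != others[0] and others[0] == others[1]:
            replicas[i] = replicas[(i + 1) % 3]  # recreate from the good pair
            return i
    return None

copies = [b"archive data", b"archive data", b"archive dXta"]  # copy 2 is corrupt
repaired = audit_and_repair(copies)
print(repaired, copies[2])
```

Run periodically, this detects silent corruption and restores the bad copy while two good copies still exist; that is why the scrub interval matters as much as the copy count.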

Meanwhile, those stored data objects are periodically re-written onto different physical storage media. And the underlying storage devices are rotated out of service after a specified period of time.

This process of periodic rewriting onto reasonably fresh hardware and comparing cryptographic hash values for the three current copies is designed to provide an average annual durability of 99.999999999% for an archive of data.

This estimate is based on the details of their design (frequency of disk replacement, frequency of re-writing the archive copies) and the probabilities of the scenario and environment (probability of RAID array failure leading to data loss, likelihood of cryptographic hash collisions).
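A back-of-the-envelope illustration of why independent copies multiply reliability. This is a toy model with a hypothetical per-copy survival probability, not Amazon's actual analysis, which also accounts for the periodic scrubbing and hardware rotation described above:

```python
# Toy model: if each copy independently survives a year with
# probability p, losing all three copies at once has probability
# (1 - p) ** 3. Periodic scrubbing does even better than this,
# because a repaired copy resets its clock.
p = 0.999                      # hypothetical per-copy annual survival
loss_one_copy = 1 - p          # 0.001
loss_all_three = loss_one_copy ** 3
durability = 1 - loss_all_three
print(durability)
```

Even a mediocre 99.9%-reliable medium yields roughly nine nines when three independent copies are kept, which is the intuition behind the "eleven nines" durability claim.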

Data Risk Management has a very interesting model for data archiving: www.datariskmgmt.com.

A Rather Extreme Take on Availability

See the 2003 paper in Communications of the ACM, "Organic Data Memory Using the DNA Approach", by Pak Chung Wong, Kwong-Kwok Wong, and Harlan Foote of the Pacific Northwest National Laboratory. The idea is to encode information as artificial DNA sequences and insert them into the genomes of living hosts such as bacteria or possibly plants. The authors hope that the organisms, and thus the information, could survive a nuclear catastrophe or similar disaster and be retrieved in the distant future.

I suppose the information would include messages like "Don't militarize intelligent robots" or "Don't antagonize extraterrestrial civilizations capable of interstellar flight."

Data Loss Costs

National Archives and Records Administration (Washington DC, USA)

93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster.

50% of businesses without data management for this same period filed for bankruptcy immediately.

Symantec and Ponemon

Symantec conducts a periodic study of disaster recovery plans and estimated costs. Using their 2009 report as an example, they surveyed disaster recovery management at 1,650 companies worldwide, each with at least 5,000 employees and a current disaster recovery plan. They collect a lot of data, but they summarize and present it differently from year to year so you can't necessarily track a given statistic through the years.

According to the 2001 Cost of Downtime Survey Results, companies said the cost of downtime is:
46% said up to US$ 50,000 per hour.
28% said US$ 51,000 - 250,000 per hour.
18% said US$ 250,001 - 1,000,000 per hour.
8% said over US$ 1,000,000 per hour.

According to the 2001 Cost of Downtime Survey Results, companies said that loss of data threatens the survival of a business within:
40% said 72 hours.
21% said 48 hours.
15% said 24 hours.
8% said 8 hours.
9% said 4 hours.
3% said 1 hour.
4% said less than 1 hour.

According to Symantec's 2009 Cost of Downtime Survey Results:

93% of organizations reported that they have had to implement their disaster recovery plans, either in full or partially.

They could achieve skeleton operations after a site-wide outage in a median of three hours, and get mostly back up and running in about four hours.

Based on the reported recovery time and the cost per hour of downtime (the latter not listed in the 2009 report), the cost per incident averages approximately US$ 287,000 globally, and the median cost per incident can run as high as US$ 500,000.

IT is becoming more critical over time, with 56% of applications deemed mission critical in 2008 and 60% in 2009.

Database servers are the most likely technologies covered by disaster recovery plans, at 62%, closely followed by applications and web servers, at 61% each.

As for the cause of needing to implement those disaster recovery plans:
59%    Computer system failure
54%    External threats (malware, hackers)
53%    Natural disasters (fire, flood)
45%    Power outage / issues
43%    User/operator error
39%    IT problem management
37%    Data leakage or loss
36%    Malicious employee behavior
34%    Configuration change management issues
33%    Man made disasters (e.g., war, terrorism)
26%    Configuration drift issues
 7%    Never

I am skeptical of this data. Seriously, 33% of these companies had their IT operation taken down by war or terrorism? Those DR managers were interpreting "man-made disaster" much too broadly! That category steals significant credit from user/operator error, and some from IT problem management, configuration change mismanagement, and configuration drift. Also, "data leakage or loss" seems to me to be a result, not a cause.

Interestingly, and alarmingly, companies reported backing up only 37% of their data in virtual environments. Slightly over 25% reported that they do not test their virtual servers.

Symantec and the Ponemon Institute's 2013 Cost of Data Breach Study: United States reported that, counter to assumptions, the cost of a data breach continues to decline. I don't know if that should be attributed to people being tired of hype, or getting a little better at analyzing breaches, or what.

Malicious or criminal attacks cause more breaches than negligence or "system glitches", whatever those are.

They say that having a formal incident response plan in place before the incident lowers the overall cost. Also listed as reducing breach cost: "having a strong security posture", appointing a CISO or Chief Information Security Officer, and hiring outside consultants to assist with the response. And guess what the authors of that report can help you with!

CA Technologies

CA Technologies issued a 2010 report, "The Avoidable Cost of Downtime", reporting that European organizations with more than 50 employees collectively lose more than €17 billion in revenue each year to IT downtime and the time taken to recover from it: a total of almost 1 million hours, or 14 hours per company per year. On average, each company loses €263,347 per year. The average loss per organization varied widely: €500,000 in France, just under €400,000 in Germany, and just over €300,000 in Spain and Norway, down to about €90,000 in Belgium and just under €34,000 in Italy.

This table shows the causes of data loss according to Ontrack engineers (who seem to have lost no data to malicious intruders):

Hardware or Systems Malfunction 59%
Human Error 28%
Software Program Malfunction 9%
Viruses 4%
Natural Disaster 2%

According to a Gallup poll, most businesses value 100 megabytes of data at US$ 1,000,000.

Counter-Availability and Destroying Media

If you want to quickly and easily destroy a CD or DVD, place it in a microwave for just a second or so.

Below you see the result of putting a commercial CD into a microwave oven for just one second. The oven was a General Electric E640J 002, nearly twenty years old, and it probably no longer generates its original 970 watts at 2.45 GHz. Even so, just one second rendered this disc unreadable by most if not all adversaries.

Yes, some heavy-duty office shredders can also eat CDs and DVDs, but they make a huge mess of metal foil slivers and plastic chips, and the resulting mix of paper, plastic, and metal is not recyclable.

Original CD/DVD from a retailer.
CD/DVD destroyed in a microwave, lying on a paper towel.
CD/DVD destroyed in a microwave, silhouetted against a light.

Laptop Theft Prevention

Security cables:
Kensington, Philadelphia Security Products, American Power Conversion, PC Guardian, Secure-It Inc.

"Phone home" style laptop tracking, Windows only as far as I know:
PC PhoneHome, zTrace, ComputracePlus

Spam, or Unwanted Junk E-Mail

It's a denial-of-service attack.

The best user-centric anti-spam tool I have used is SpamAssassin. However, you really want to fight spam on the mail gateways, not the endpoints.

IronPort seems to be a very good anti-spam system, based on my observations as a user of e-mail at some ISPs and Purdue University.

Postini and Proofpoint are cloud-based mail providers that offer spam and malware filtering. My experience with Postini is that a huge amount of spam still comes through. Malware, which I would find much more interesting, is almost entirely filtered out.

Several free spam filters are listed at: paulgraham.com.

MIMESweeper blocks junk mail and filters content for viruses and malicious applets.

SpamCan is a Sendmail patch to detect spam by regular expressions.

How can you tell where spam was injected? Read the "Received:" fields in reverse, looking for inconsistency where the promiscuous relayer accepted the spam from the source. Using a real example I received, my comments inserted below the relevant lines in red:

	From Bio-Med5241_a@linux.com.pk Thu Oct 26 15:38 EST
No, the message did not come from Pakistan (.pk), see below
	Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu [])
		by rvl3.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA16066
		for <cromwell@rvl3.ecn.purdue.edu> Thu, 26 Oct 15:38:34 -0500 (EST)
Hop #3 — sclera forwarded my mail to rvl3.ecn.purdue.edu
	From: Bio-Med5241_a@linux.com.pk
	Received: from glasgow3.blackid.com ([])
		by sclera.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA13819
		for <cromwell@sclera.ecn.purdue.edu>; Thu, 26 Oct 15:38:24 -0500 (EST)
 Hop #2 — glasgow3.blackid.com, the spam relayer, hands the spam to sclera.ecn.purdue.edu
	Date: Thu, 26 Oct 15:38:24 -0500 (EST)
	Message-Id: <XXXX10262038.PAA13819@sclera.ecn.purdue.edu>
	Received: from geo5 (host-216-77-220-220.fll.bellsouth.net [])
		by glasgow3.blackid.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
		id 449GZRTV; Thu, 26 Oct 21:33:01 +0100
Hop #1 — glasgow3.blackid.com, the spam relayer, accepts mail from the source,
a dial-in client of bellsouth.net using the IP address  The
dial-in client undoubtedly got its IP address via DHCP, and so any system using
that IP address right now is not necessarily the original spam source.  However,
bellsouth.net should be able to figure out which of their clients used this
IP address at this particular time.
	To: customer@aol.com
That's odd — I'm not sure how they're getting SMTP to send it to me but with this
bogus address in the "To:" field — maybe I was a blind carbon-copy recipient...
	Subject: A New Dietary Supplement That Can Change Your Life....
	MIME-Version: 1.0
	Content-Type: text/plain; charset=unknown-8bit
	Content-Length: 5463
	Status: R

	[ long pseudo-medical nonsense deleted.... ]

Further investigation could use traceroute or whois to figure out where the source really is, in case the reverse resolution above either failed or was faked. As per the GNU version of whois:

% whois
NetRange: -
NetHandle:  NET-216-76-0-0-1
Parent:     NET-216-0-0-0-0
NetType:    Direct Allocation
Comment:    For Abuse Issues, email abuse@bellsouth.net. NO ATTACHMENTS. Include IP
Comment:    address, time/date, message header, and attack logs.
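Python's standard email module can automate the read-the-Received-headers-in-reverse technique. Here is a sketch using a condensed version of the message above (hostnames kept, other details trimmed):

```python
from email import message_from_string

# Condensed version of the spam message dissected above.
raw = """\
Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu) by rvl3.ecn.purdue.edu with ESMTP
Received: from glasgow3.blackid.com by sclera.ecn.purdue.edu with ESMTP
Received: from geo5 (host-216-77-220-220.fll.bellsouth.net) by glasgow3.blackid.com with SMTP
From: Bio-Med5241_a@linux.com.pk
Subject: A New Dietary Supplement That Can Change Your Life....

body omitted
"""

msg = message_from_string(raw)
# Each relay prepends its own Received header, so reversing the list
# walks the path from the injection point to the final delivery.
for hop, line in enumerate(reversed(msg.get_all("Received")), 1):
    print(f"Hop {hop}: {line.strip()}")
```

Hop 1 in the output is the injection point: the bellsouth.net dial-in client handing the spam to the promiscuous relay. Remember that headers below the injection point can be forged by the spammer, so only the hops added by relays you trust are reliable.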

Back to the Security Page