Availability Tools

Availability is Different

Of the CIA triad of information security — Confidentiality, Integrity, and Availability — this one is different.

We have encryption for confidentiality. This is defensive cryptographic technology, attempting to prevent an adversary from reading our information. It cannot guarantee that an adversary could not discover the decryption key or otherwise obtain our information, but it makes that attack very difficult.

We have cryptographic hash functions for integrity. This is detective cryptographic technology, attempting to tell us if an adversary has modified our information. It cannot guarantee that an adversary could not somehow modify our information in a way that its content has different meaning but we do not notice this, but it makes that attack very difficult.

We have specific numbers in both cases for how much work it would take to successfully attack us. Attacks will always be possible in theory, but we can make them hard enough in practice that we do not need to worry.

Unfortunately, we have no cryptographic tools for availability. This means that we have no math, and so we have no numbers. We cannot rigorously prove the likelihood of data or any other resource remaining available. We cannot even say that any one data set is more likely to remain available than another.

The best thing we have is statistics on what has happened so far in a similar setting. If someone reports that a specific type of storage media has "a lifetime of 2 to 5 years", what they are really saying is that in some percentage of similar cases, maybe 95% of them, maybe 99% of them, the data was available for between two and five years. In a few cases, it did not last even two years, and in a few more it may have lasted for more than five. All you really know is that if you use a large number of these storage devices, most of your data will probably still be around two years later.
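To make that reading of the statistics concrete, here is a minimal sketch. The 97.5% two-year survival rate is an assumed number chosen only for illustration. The calculation also assumes that devices fail independently, which is rarely true of real hardware sharing a batch, a rack, or a power feed. It is bookkeeping on past statistics, not a guarantee.

```python
# Hypothetical numbers for illustration only; not from any vendor or study.
P_SURVIVE_2Y = 0.975   # assumed chance one device still holds its data after 2 years


def expected_survivors(devices: int, p_survive: float = P_SURVIVE_2Y) -> float:
    """Expected number of devices still readable after the period."""
    return devices * p_survive


def p_all_copies_lost(copies: int, p_survive: float = P_SURVIVE_2Y) -> float:
    """Chance that every copy is gone, ASSUMING independent failures."""
    return (1.0 - p_survive) ** copies


if __name__ == "__main__":
    print(expected_survivors(1000))   # about 975 of 1000 devices
    print(p_all_copies_lost(3))       # roughly 1.6e-05 for three copies
```

Even under these generous assumptions the answer is a probability, never a promise, and the independence assumption is exactly what a fire, flood, or bad firmware update breaks.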

Availability simply cannot be guaranteed. Any unprivileged user on a Unix-family operating system can type the following:

$ a() { a|a & } ; a 

That defines a shell function a() whose body starts two more copies of itself, one piped into the other, and puts them in the background. The rest of that short line then calls that disastrous function. It recurses down an endless hole, doubling the number of processes at each level.
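The doubling is what makes this so fast. The following sketch only counts processes on paper and does not fork anything. The limit of 32,768 is an assumed example value, and your system's process limit may be very different.

```python
def levels_until_limit(limit: int) -> int:
    """Count recursion levels until 2**level exceeds a process limit.

    Level 0 is the single initial call; every call starts two more.
    """
    level = 0
    procs_at_level = 1
    while procs_at_level <= limit:
        procs_at_level *= 2   # each process at this level spawns two children
        level += 1
    return level


if __name__ == "__main__":
    # 32768 is a hypothetical limit chosen for illustration.
    print(levels_until_limit(32768))   # 16
```

Sixteen generations of a function that does nothing but fork take almost no time, which is why there is so little warning before the system bogs down.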

On Solaris 9 that immediately freezes the system.

On Linux with a 3.* kernel and a typical amount of RAM, you would have about one second before the system freezes.

On OpenBSD the system freezes for a few seconds before the kernel steps in and kills the out-of-control set of processes.

On Linux with a 4.* kernel, the system freezes for a few seconds, then is very sluggish for several seconds while a blizzard of error messages flies up the screen where you did this, a mix of:

-bash: fork: retry: No child processes 

and:

-bash: fork: retry: Resource temporarily unavailable 

The load average can climb over 100 within a few seconds. I tested this on a Raspberry Pi with only 512 MB of RAM and a single-core CPU. In another terminal window where I was connected in over SSH, I ran "top -d 0.2" to observe the freeze, the sluggishness, and the load average spike.

The systemd project is taking over more and more of the Linux operating system environment. Thanks to its reckless design, you may be able to freeze a system with this single line:

$ NOTIFY_SOCKET=/run/systemd/notify systemd-notify ""

See that attack's explanation here, along with a discussion of how systemd's creeping takeover of the Linux operating system may be a very bad idea.

Also see this "Compiler Bomb", a 29-byte C program that compiles to a 17,179,875,837 byte (or 16 GB) executable. We must pass the -mcmodel=medium option because the array is larger than 2 GB, and possibly the -save-temps option to keep temporary files in the local directory if there isn't enough space in the /tmp file system. During attempted compilation by any unprivileged user, the system becomes sluggish from time to time as memory is exhausted. See the Compiler Bomb page and the original discussion for more details on this.

$ cat cbomb.c
main[-1u]={1};
$ gcc -mcmodel=medium cbomb.c -o cbomb
cbomb.c:1:1: warning: data definition has no type or storage class
 main[-1u]={1};
  ^
/tmp/ccZbsIhp.s: Assembler messages:
/tmp/ccZbsIhp.s: Fatal error: cannot write to output file '/tmp/cc5mxHSz.o': No space left on device
$ time gcc -mcmodel=medium -save-temps cbomb.c -o cbomb
cbomb.c:1:1: warning: data definition has no type or storage class
 main[-1u]={1};
  ^
/usr/bin/ld: final link failed: Memory exhausted
collect2: error: ld returned 1 exit status

real    2m16.169s
user    0m4.512s
sys     0m12.573s
$ ls -l
total 16777232
-rw-rw-r-- 1 cromwell cromwell          15 Oct 19 10:18 cbomb.c
-rw-rw-r-- 1 cromwell cromwell         143 Oct 19 10:23 cbomb.i
-rw-rw-r-- 1 cromwell cromwell 17179870214 Oct 19 10:26 cbomb.o
-rw-rw-r-- 1 cromwell cromwell         219 Oct 19 10:23 cbomb.s
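The sizes in that listing can be checked with simple arithmetic. This sketch assumes a 32-bit unsigned int and a 4-byte int, which is typical for gcc on x86-64 but is my assumption and not something the listing states. With those sizes, -1u is 4,294,967,295, and main becomes an array of that many ints.

```python
UINT_MAX = 2**32 - 1          # value of -1u, assuming a 32-bit unsigned int
SIZEOF_INT = 4                # assumed size in bytes of the implicit int type

array_bytes = UINT_MAX * SIZEOF_INT

OBJECT_FILE_BYTES = 17_179_870_214   # size of cbomb.o in the listing above
EXECUTABLE_BYTES = 17_179_875_837    # executable size quoted in the text

if __name__ == "__main__":
    print(array_bytes)                        # 17179869180
    print(array_bytes / 2**30)                # just under 16 GiB
    print(OBJECT_FILE_BYTES - array_bytes)    # 1034 bytes of object file overhead
    print(EXECUTABLE_BYTES - array_bytes)     # 6657 bytes of executable overhead
```

So nearly all of the 16 GB is the initialized array itself, and the rest is a few kilobytes of ordinary file overhead.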

Internet traffic travels over submarine cables, most with multiple very high bandwidth fibre optic lines. Fishing trawlers can cut these. Vietnam, where over 50% of the people are on the Internet, seems to have more problems than most countries. The cables joining Vietnam to Hong Kong, other Chinese landing points, and the Philippines are frequently cut, often near the far end.
Submarine cable and satellite outages

Errors or intentional attacks on IP routing can misdirect traffic. This can be a denial of service. It is frequently used within a country for political reasons in parts of Asia and Africa. Or, it could be for espionage or other traffic collection.
BGP hijacking and accidental routing blunders

Finally, we can't defeat nature, especially when we're overly reliant on limited facilities. Storms can disrupt supply chains.
2018: 30-minute power outage at Samsung factory near Pyeongtaek destroyed 3.5% of global v-NAND flash memory output for March
2011: Floods in Thailand led to hard drive shortages for months

Netflix As An (Extreme) Example

Netflix has created the Chaos Monkey and other elements of its Simian Army to stress its system to test resiliency. It's very surprising that they unleash these tools on their production systems, tell people about this, and even give away the tools. See the Netflix technical blog for details.
Netflix technical blog

Netflix is largely built on the Amazon Web Services public cloud. The Chaos Monkey disables selected production systems, while the Chaos Gorilla takes out an entire AWS availability zone. The Doctor Monkey does automated alert and response, searching Netflix's resources for any degradations in performance.

Where not to place telco pedestals

Do not place them where this one was in Herndon, Virginia — right along a road winding through office parks, where the anxious commuters hit speeds around 50 m.p.h. despite that being almost twice the posted limit.

And especially not where a sidewalk ramp makes it so easy to drift off the road while texting and smash into the poor pedestal.

Telco pedestal smashed open by a car.

Data Loss Costs

National Archives and Records Administration (Washington DC, USA)

93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster.

50% of businesses without data management for this same period filed for bankruptcy immediately.

Symantec and Ponemon

Symantec conducts a periodic study of disaster recovery plans and estimated costs. Using their 2009 report as an example, they surveyed disaster recovery management at 1,650 companies worldwide, each with at least 5,000 employees and a current disaster recovery plan. They collect a lot of data, but they summarize and present it differently from year to year so you can't necessarily track a given statistic through the years.

According to the 2001 Cost of Downtime Survey Results, companies said the cost of downtime is:
46% said up to US$ 50,000 per hour.
28% said US$ 51,000 - 250,000 per hour.
18% said US$ 250,001 - 1,000,000 per hour.
8% said over US$ 1,000,000 per hour.

According to the 2001 Cost of Downtime Survey Results, companies said that loss of data threatens the survival of a business within:
40% said 72 hours.
21% said 48 hours.
15% said 24 hours.
8% said 8 hours.
9% said 4 hours.
3% said 1 hour.
4% said less than 1 hour.

According to Symantec's 2009 Cost of Downtime Survey Results:

93% of organizations reported that they have had to implement their disaster recovery plans, either in full or partially.

They could achieve skeleton operations after a site-wide outage in a median of three hours, and get mostly back up and running in about four hours.

Based on the reported recovery time and the cost per hour of downtime (not listed in the 2009 report), the cost per incident globally averages approximately US$ 287,000 and the median cost per incident can be as high as US$ 500,000.

IT is becoming more critical over time, with 56% of applications deemed mission critical in 2008 and 60% in 2009.

Database servers are the most likely technologies covered by disaster recovery plans, at 62%, closely followed by applications and web servers, at 61% each.
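The cost-per-incident figure above is described as recovery time multiplied by cost per hour, and the report does not list the hourly cost. As a rough sketch, we can turn that around. Pairing the US$ 287,000 average with the three and four hour median recovery times is my own assumption, and mixing an average with a median is statistically loose, so treat the result as an order-of-magnitude estimate only.

```python
def implied_cost_per_hour(cost_per_incident: float, recovery_hours: float) -> float:
    """Hourly downtime cost implied by: incident cost = hours * cost per hour."""
    if recovery_hours <= 0:
        raise ValueError("recovery_hours must be positive")
    return cost_per_incident / recovery_hours


if __name__ == "__main__":
    AVERAGE_COST_PER_INCIDENT = 287_000   # US$, from the 2009 figures above
    for hours in (3, 4):                  # skeleton operations, mostly recovered
        print(hours, round(implied_cost_per_hour(AVERAGE_COST_PER_INCIDENT, hours)))
```

Under those assumptions the implied hourly cost lands in the US$ 51,000 - 250,000 bracket of the earlier survey.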

As for the cause of needing to implement those disaster recovery plans:
59%    Computer system failure
54%    External threats (malware, hackers)
53%    Natural disasters (fire, flood)
45%    Power outage / issues
43%    User/operator error
39%    IT problem management
37%    Data leakage or loss
36%    Malicious employee behavior
34%    Configuration change management issues
33%    Man made disasters (e.g., war, terrorism)
26%    Configuration drift issues
 7%    Never
I am skeptical of this data. Seriously, 33% of these companies had their IT operation taken down by war and terrorism? Those DR managers were being much too broad in their interpretation of "man made disaster"! That category is stealing significant credit away from user / operator error, and some from IT problem management, configuration change mismanagement, and configuration drift. Also, "data leakage or loss" seems to me to be a result, not a cause.

Interestingly, and alarmingly, companies reported backing up only 37% of their data in virtual environments. Slightly over 25% reported that they do not test their virtual servers.

Symantec and the Ponemon Institute's 2013 Cost of Data Breach Study: United States reported that, counter to assumptions, the cost of a data breach continues to decline. I don't know if that should be attributed to people being tired of hype, or getting a little better at analyzing breaches, or what.

Malicious or criminal attacks cause more breaches than negligence or "system glitches", whatever those are.

They say that having a formal incident response plan in place before the incident lowers the overall cost. Also listed as reducing breach cost are: "having a strong security posture", appointing a CISO or Chief Information Security Officer, and hiring outside consultants to assist with the response. And guess what the authors of that report can help you with!

CA Technologies

CA Technologies issued a 2010 report "The Avoidable Cost of Downtime" reporting that European organizations with more than 50 employees collectively lose more than €17 billion in revenue each year due to the time taken to recover from IT downtime, a total of almost 1 million hours or 14 hours per company per year. On average, each company loses €263,347 per year. The average loss per organization varied all over the place, from €500,000 in France, just under €400,000 in Germany, and just over €300,000 in Spain and Norway, to about €90,000 in Belgium and just under €34,000 in Italy.
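Those totals can be cross-checked against each other. This is only a consistency sketch using the rounded figures quoted above. The report says "more than" €17 billion and "almost" 1 million hours, so the implied company counts are approximate and need not agree exactly.

```python
TOTAL_LOSS_EUR = 17_000_000_000     # "more than" 17 billion euros per year
TOTAL_HOURS = 1_000_000             # "almost" 1 million hours per year
HOURS_PER_COMPANY = 14              # hours of recovery per company per year
LOSS_PER_COMPANY_EUR = 263_347      # average loss per company per year

# Two independent ways to estimate how many companies the totals cover.
companies_from_hours = TOTAL_HOURS / HOURS_PER_COMPANY
companies_from_loss = TOTAL_LOSS_EUR / LOSS_PER_COMPANY_EUR

# Revenue lost per hour of recovery time, across all companies.
loss_per_hour = TOTAL_LOSS_EUR / TOTAL_HOURS

if __name__ == "__main__":
    print(round(companies_from_hours))   # about 71,400 companies
    print(round(companies_from_loss))    # about 64,600 companies
    print(loss_per_hour)                 # 17000.0 euros per hour
```

The two estimates of the number of companies differ by roughly 10%, which may simply reflect the rounding in the quoted totals.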

This table shows the causes of data loss according to Ontrack engineers (who seem to have lost no data to malicious intruders):

Hardware or Systems Malfunction 59%
Human Error 28%
Software Program Malfunction 9%
Viruses 4%
Natural Disaster 2%

According to a Gallup poll, most businesses value 100 megabytes of data at US$ 1,000,000.

Laptop Theft Prevention

Security cables:
Kensington
Philadelphia Security Products
American Power Conversion
PC Guardian
Secure-It Inc

"Phone home" style laptop tracking, Windows only as far as I know:
PC PhoneHome
zTrace
ComputracePlus

Spam, or Unwanted Junk E-Mail

It's a denial-of-service attack. It wastes your network bandwidth, your mail server processing time, your storage, and your employees' time.

The best user-centric anti-spam tool I have used is SpamAssassin. However, you really want to fight spam on the mail gateways, not the endpoints.

IronPort seems to be a very good anti-spam system, based on my observations as a user of e-mail at some ISPs and Purdue University.

Postini and Proofpoint are cloud-based mail providers that offer spam and malware filtering. My experience with Postini is that a huge amount of spam comes through. Malware, which I would find much more interesting, is almost entirely filtered out.

Several free spam filters are listed at: paulgraham.com.

MIMESweeper blocks junk mail and filters content for viruses and malicious applets.

SpamCan is a Sendmail patch to detect spam by regular expressions.

How can you tell where spam was injected? Read the "Received:" fields in reverse, looking for inconsistency where the promiscuous relayer accepted the spam from the source. Here is a real example I received, with my comments inserted as unindented lines below the relevant header lines:

	From Bio-Med5241_a@linux.com.pk Thu Oct 26 15:38 EST
No, the message did not come from Pakistan (.pk), see below
	Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu [128.46.144.159])
		by rvl3.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA16066
		for <cromwell@rvl3.ecn.purdue.edu> Thu, 26 Oct 15:38:34 -0500 (EST)
Hop #3 — sclera forwarded my mail to rvl3.ecn.purdue.edu
	From: Bio-Med5241_a@linux.com.pk
	Received: from glasgow3.blackid.com ([212.250.136.251])
		by sclera.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA13819
		for <cromwell@sclera.ecn.purdue.edu>; Thu, 26 Oct 15:38:24 -0500 (EST)
 Hop #2 — glasgow3.blackid.com, the spam relayer, hands the spam to sclera.ecn.purdue.edu
	Date: Thu, 26 Oct 15:38:24 -0500 (EST)
	Message-Id: <XXXX10262038.PAA13819@sclera.ecn.purdue.edu>
	Received: from geo5 (host-216-77-220-220.fll.bellsouth.net [216.77.220.220])
		by glasgow3.blackid.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
		id 449GZRTV; Thu, 26 Oct 21:33:01 +0100
Hop #1 — glasgow3.blackid.com, the spam relayer, accepts mail from the source,
a dial-in client of bellsouth.net using the IP address 216.77.220.220.  The
dial-in client undoubtedly got its IP address via DHCP, and so any system using
that IP address right now is not necessarily the original spam source.  However,
bellsouth.net should be able to figure out which of their clients used this
IP address at this particular time.
	To: customer@aol.com
That's odd — I'm not sure how they're getting SMTP to send it to me but with this
bogus address in the "To:" field — maybe I was a blind carbon-copy recipient...
	Subject: A New Dietary Supplement That Can Change Your Life....
	MIME-Version: 1.0
	Content-Type: text/plain; charset=unknown-8bit
	Content-Length: 5463
	Status: R

	[ long pseudo-medical nonsense deleted.... ]
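That reverse reading can be automated. Here is a minimal sketch using Python's standard email module on the three "Received:" fields from the message above. The regular expressions are my own simplification and will not cope with every mail server's format, and any "Received:" field added before the first server you trust can be forged by the sender.

```python
import email
import re

# The three "Received:" fields from the example, newest first as delivered.
RAW = """\
Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu [128.46.144.159])
\tby rvl3.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA16066
\tfor <cromwell@rvl3.ecn.purdue.edu> Thu, 26 Oct 15:38:34 -0500 (EST)
Received: from glasgow3.blackid.com ([212.250.136.251])
\tby sclera.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA13819
\tfor <cromwell@sclera.ecn.purdue.edu>; Thu, 26 Oct 15:38:24 -0500 (EST)
Received: from geo5 (host-216-77-220-220.fll.bellsouth.net [216.77.220.220])
\tby glasgow3.blackid.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
\tid 449GZRTV; Thu, 26 Oct 21:33:01 +0100
Subject: A New Dietary Supplement That Can Change Your Life....

body
"""

# Simplified patterns: "from NAME ... by NAME" and a bracketed IPv4 address.
HOP = re.compile(r"\bfrom\s+(\S+).*?\bby\s+(\S+)", re.DOTALL)
IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def hops(raw: str) -> list:
    """Return (sender, receiver) pairs in chronological order, oldest first."""
    msg = email.message_from_string(raw)
    result = []
    # Each server prepends its field, so the last one listed is the earliest.
    for value in reversed(msg.get_all("Received", [])):
        match = HOP.search(value)
        if match:
            result.append((match.group(1), match.group(2)))
    return result


def origin_ip(raw: str):
    """Return the bracketed IP address recorded in the earliest hop, or None."""
    received = email.message_from_string(raw).get_all("Received", [])
    if not received:
        return None
    match = IP.search(received[-1])
    return match.group(1) if match else None


if __name__ == "__main__":
    for number, (sender, receiver) in enumerate(hops(RAW), start=1):
        print(f"Hop #{number}: {sender} -> {receiver}")
    print("Earliest recorded source:", origin_ip(RAW))
```

The bracketed address is the one the receiving server actually saw on the connection, so it is more trustworthy than the name the client claimed for itself.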

Further investigation could use traceroute or whois to figure out where 216.77.220.220 really is in case the reverse resolution above either failed or was faked. The GNU version of whois reports:

% whois 216.77.220.220
NetRange:   216.76.0.0 - 216.79.255.255
CIDR:       216.76.0.0/14
NetName:    BELLSNET-BLK5
NetHandle:  NET-216-76-0-0-1
Parent:     NET-216-0-0-0-0
NetType:    Direct Allocation
NameServer: NS.BELLSOUTH.NET
NameServer: NS.ATL.BELLSOUTH.NET
Comment:
Comment:    For Abuse Issues, email abuse@bellsouth.net. NO ATTACHMENTS. Include IP
Comment:    address, time/date, message header, and attack logs.
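The NetRange and CIDR lines say the same thing in two notations, and a quick check with Python's standard ipaddress module confirms that the spam source falls inside that block. This is just a sanity-check sketch on the values shown above.

```python
import ipaddress

block = ipaddress.ip_network("216.76.0.0/14")      # CIDR line from whois
source = ipaddress.ip_address("216.77.220.220")    # address from the first hop

if __name__ == "__main__":
    print(source in block)          # True
    print(block[0], block[-1])      # 216.76.0.0 216.79.255.255
```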
