Textual Analysis for Network Attack Recognition
Designing the Attack

Let's limit this analysis to guessing SSH passwords. SSH service is available for routers, and it can be added to Windows systems. But the majority of hosts running SSH service are some variety of Unix. So the hacker should conclude that the system is probably Unix and therefore:

There will be an account named root. This account is the most valuable target because it provides unlimited access. Effort here might be greatly rewarded. However, even the least qualified of Unix administrators should realize this and the root password should be hard to guess. Furthermore, remote access as root should be disabled. However, there are enough sloppily configured systems out there to make root password guesses pay off at times.
There may be some accounts associated with commonly installed services and subsystems: named, apache, httpd, sys, adm, backup, operator, and so on. You don't really know if they will be in use, and if so, what they will be named (see both apache and httpd in that list), but there are many reasonable guesses for Unix-like operating systems. Some of these may have privileges higher than those of ordinary users.
There may be user accounts, possibly a very large number, but there is no way to predict what form they may take. They probably won't have especially interesting data and they probably won't have high privileges, but they probably will have weaker passwords, will not be carefully monitored, and can still be quite useful for storing data, launching further attacks, and running services like IRC on unprivileged TCP ports.

The attack will probably use two components — a program probably written in C to automate the connection and interaction with targeted SSH servers, and a data file listing the login/password pairs to use. The program also needs to be told which targets to attack, in terms of a list of hostnames and/or IP addresses or as a range or CIDR block of IP addresses. That target set could be specified in the same data file, in a separate data file, or on the command line, depending on the attack code design. See my detailed analysis of an intrusion for an example of actual attack code. Let's say that the following is the plan for a simple attack:

| Login/password guess | Target host `target1` | Target host `target2` | Target host `target3` |
|---|---|---|---|
| 1 | `root/password` | `root/password` | `root/password` |
| 2 | `root/admin` | `root/admin` | `root/admin` |
| 3 | `root/letmein` | `root/letmein` | `root/letmein` |
| 4 | `operator/password` | `operator/password` | `operator/password` |
| 5 | `apache/password` | `apache/password` | `apache/password` |

Let's also assume that the three target hosts are within the same organization, and they are collecting their syslog messages in one place for analysis. Remember that the syslog message from the SSH daemon records success or failure for the event and the login used, but it does not (nor should it) record the password guess. The above would record that there were three guesses for the root password on each target host, but not what those guesses were.
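
To make the point about what the log does and does not contain concrete, here is a minimal sketch that pulls the recorded fields out of failed-login messages. The message layout is an assumption based on common OpenSSH output relayed through syslog; the exact wording varies with sshd version and syslog configuration, and the sample lines, addresses, and the name `parse_failure` are invented for illustration.

```python
import re
from collections import Counter

# Assumed OpenSSH-style failure message as relayed by syslog.
# Note there is no password field to capture: only login, source, and port.
FAILED = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<target>\S+)\ssshd\[\d+\]:\s"
    r"Failed password for (?:invalid user )?(?P<login>\S+)\s"
    r"from (?P<attacker>\S+) port (?P<port>\d+)"
)

def parse_failure(line):
    """Return the fields of a failed-password line as a dict, else None."""
    m = FAILED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["port"] = int(fields["port"])
    return fields

# Hypothetical sample lines, not real log data.
sample_log = [
    "Jan  5 03:12:01 target1 sshd[2211]: Failed password for root from 203.0.113.5 port 40112 ssh2",
    "Jan  5 03:12:04 target1 sshd[2213]: Failed password for root from 203.0.113.5 port 40118 ssh2",
    "Jan  5 03:12:07 target1 sshd[2215]: Failed password for root from 203.0.113.5 port 40125 ssh2",
    "Jan  5 03:12:10 target1 sshd[2217]: Failed password for invalid user operator from 203.0.113.5 port 40131 ssh2",
    "Jan  5 03:12:13 target1 sshd[2219]: Connection closed by 203.0.113.5 port 40131 [preauth]",
]

failures = [f for f in map(parse_failure, sample_log) if f]

# Count guesses per (target, login): all the defender can tally.
guesses_per_login = Counter((f["target"], f["login"]) for f in failures)
```

Here `guesses_per_login` shows three guesses for root on `target1`, which is exactly as much as the log can tell you about them.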

The attack will leave a trail in the logs in one of four forms, depending on the attack code design. This is the first way we can start to distinguish between and recognize attacks. I can name these patterns based on how the attack progresses through the target table:

Single-threaded vertical — One process goes through the list of login/password guesses on the first target, then it starts down the list on the second target, and so on. The result is a scan column by column through the table:
root/password on target1
root/admin on target1
root/letmein on target1
operator/password on target1
apache/password on target1
root/password on target2 (back to the start of the login list...)
root/admin on target2
root/letmein on target2
...
Single-threaded horizontal — One process goes through the list of login/password guesses, trying the first guess on each host in turn, then the second guess, and so on. The result is a scan row by row through the table:
root/password on target1
root/password on target2
root/password on target3
root/admin on target1 (back to the start of the target list...)
root/admin on target2
root/admin on target3
...
The larger the target set, the less aggressive this looks in an individual system's log because of the delay after each guess while that guess is made against all other targets in the set. Of course, larger target sets appear in the collected log and show just how ambitious the overall attack was.
Multi-threaded vertical — Separate threads, or separate processes, possibly running on multiple machines, each attack one target by going down the login/password guess list. On any one host this looks like the result of the single-threaded vertical scan. But it looks very aggressive where the logs are collected because the scans are happening simultaneously.
Multi-threaded horizontal — Separate threads, or separate processes, possibly running on multiple machines, each take one login/password pair and try it on each host in turn. You don't see this happen, at least not very often, because the login/password list is usually much larger than the target list. Such an attack would require an awful lot of resources by the attacker and would likely overwhelm the poor target. This would be a rather poor design for an attack, as the attacker would likely miss the opportunity for exploit because their overly aggressive probes turned into an accidental denial of service attack....
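
The two single-threaded orderings are simply the two ways of walking the target table, and a small simulation shows what each looks like in the collected log. This is a sketch of event ordering only; the function names are made up for this illustration.

```python
from itertools import product

targets = ["target1", "target2", "target3"]
guesses = ["root/password", "root/admin", "root/letmein",
           "operator/password", "apache/password"]

def vertical_order(targets, guesses):
    """Column by column: exhaust the guess list on one target, then the next."""
    return [(g, t) for t, g in product(targets, guesses)]

def horizontal_order(targets, guesses):
    """Row by row: try one guess on every target, then the next guess."""
    return [(g, t) for g, t in product(guesses, targets)]

def gaps_between_visits(order, target):
    """Count the events logged elsewhere between consecutive guesses on one target."""
    positions = [i for i, (_, t) in enumerate(order) if t == target]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

vertical = vertical_order(targets, guesses)
horizontal = horizontal_order(targets, guesses)

# On target1, a vertical scan leaves no gaps between guesses, while a
# horizontal scan leaves (number of targets - 1) events between them.
vertical_gaps = gaps_between_visits(vertical, "target1")
horizontal_gaps = gaps_between_visits(horizontal, "target1")
```

The gap in the horizontal case grows with the size of the target set, which is why that design looks less aggressive in any one system's log.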

The attacker could throttle the rate of the attack in an attempt to avoid attention. You see this once in a while (see the later examples), but usually the attack runs as quickly as possible. See the above observation that the attack is probably launched from compromised systems, so detection is unfortunate but not critical. The SSH daemon's intentional rate throttling and interaction with security mechanisms like the PAM library will be the limiting speed factor, so expect to see no more than one guess per 1 to 4 seconds per thread of attack execution.

If we extract attack sequences from our aggregated logs, one sequence per pair of attacker host and target host, we can categorize the attack design:

Single-threaded vertical attacks will produce sequences that complete on one target and immediately begin on a new target. Long gaps between sequences from a single attacker within one aggregated log set indicate that there were probably other attacks you didn't see. These would include attacks against your own systems that left no trace in your logs (the block of IP addresses includes Windows desktops or printers running neither SSH nor syslog, or SSH servers whose syslog messages you ignore), or attacks against systems at other organizations.
Single-threaded horizontal attacks will produce individual attacker-victim sequences that are spread over quite a bit of time — more and more time as the list of targets grows. Those many sequences will be almost completely overlapping, as the attack doesn't move on to the second login in the list until the overall attack has started a sequence on each target host. Each sequence may be many hours long, but all the sequences probably start within minutes or even seconds of each other.
Multi-threaded vertical attacks will produce sequences that overlap like the single-threaded horizontal attacks, but which all happen much faster. Expect each of these to take just 3 to 4 seconds per login/password guess.
Multi-threaded horizontal attacks, if you were to see such a thing, would similarly produce sequences almost completely overlapping in time. They would probably cover a time period of just 3 to 4 seconds per host in the list.

Further information captured in the syslog data includes the client TCP port number. There might be some information contained in that sequence, some way of categorizing the attacking host operating system and its patch level (for example, are the client port numbers randomized or sequential?), but the meaning would be largely obscured by other simultaneous client activity occurring on the attacking host but not observable.
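
One crude way to ask the "randomized or sequential?" question is to check how often the client port rises from one connection to the next. This is only a sketch: the function name and the 0.9 threshold are assumptions, port wraparound is ignored, and as noted above, unrelated activity on the attacking host will blur the result.

```python
def port_pattern(ports, min_fraction=0.9):
    """Label a sequence of client ports as mostly increasing or scattered."""
    steps = [b - a for a, b in zip(ports, ports[1:])]
    if not steps:
        return "too short"
    rising = sum(1 for s in steps if s > 0)
    if rising / len(steps) >= min_fraction:
        return "mostly increasing"
    return "scattered"

# Hypothetical port sequences, in the order the connections were logged.
sequential_like = [40112, 40113, 40115, 40120]
randomized_like = [51234, 33001, 60412, 41877, 38990]
```

Steps larger than one in an otherwise rising sequence would be consistent with other connections being opened from the same host in between.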

Finally, once in a while you will see a very simple attack. One attacking host makes just one guess for the root account on each target. What would be the point? There are two possible explanations:

Believe it or not, there are Unix systems out there where the administrator is so clueless that they set the root password to password, or root, or admin. The attacker is looking for those extremely soft targets as quickly as possible.
The attacker is really looking for SSH servers running old and vulnerable versions of the SSH service. The scan looks for the SSH version information in the initial handshaking. While they're making the connection, they might as well make a simple guess for the root password — because of the above reason they might get lucky.

One thing that does not matter is the order of the guesses. The automated attack will go through its entire list of target hosts and login/password guesses, saving any information about successful guesses for later use. The precise sequence does not matter to the designer or the user of the attack code.

As for the set of target hosts and their ordering, you do not really know how many hosts are in the set because it may contain many others not at your organization. What's more, what you can observe is often very difficult to explain. The set of targets may be in what appears to be random order, neither numerical by IP address nor alphabetical by host name. If you have several hosts configured identically except for strictly sequential IP addresses and host names differing by only a single character (e.g., research1, research2, research3, etc.), you will find to your surprise that a given attack hits only a randomly selected subset, in random order. Even if you could somehow see the complete list of targets, it seems unlikely that it would be of any analytical help.

The list of accounts, however, can be observed from a single target and provides a way to recognize similar attacks. The person using the attack code could simply use a list included with the program itself, they could create their own list, or combine lists from several sources. Distinctive patterns observed include these:

User names drawn from one nationality: Spanish, Portuguese, German, Indonesian or Malay, among others seen.

Inclusion of RACF-like alphanumeric logins.

Logins unlikely to be found on a Unix system, such as mixed-case user names and many guesses for administrator — these would seem to be attacks by a naively Windows-centric hacker.

Long lists of what seem to be likely password guesses rather than logins: password, letmein, keyboard patterns like 12345 and qwerty, and words with digits and punctuation marks appended. These must be the result of mistakes in coding or format of the data file defining the planned attack!
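
Some of these patterns can be flagged mechanically. The sketch below scores a list of attempted logins for two of them, Windows-style names and password-like strings. The regular expression and the tiny word list are assumptions made for illustration, and the password-like test will also match legitimate logins such as user1, so treat the fractions as hints rather than verdicts.

```python
import re

# Examples taken from the text above; a real list would be much longer.
COMMON_PASSWORDS = {"password", "letmein", "12345", "qwerty"}

# Letters followed by trailing digits or punctuation, e.g. "summer2009!".
TRAILING_DECORATION = re.compile(r"^[A-Za-z]+[\d!@#$%^&*.?_-]+$")

def login_flags(logins):
    """Return the fraction of attempted logins matching each crude pattern."""
    total = len(logins)
    if total == 0:
        return {}
    mixed = sum(1 for n in logins if n != n.lower() and n != n.upper())
    admin = sum(1 for n in logins if n.lower() == "administrator")
    password_like = sum(
        1 for n in logins
        if n.lower() in COMMON_PASSWORDS or TRAILING_DECORATION.match(n)
    )
    return {
        "mixed_case": mixed / total,
        "administrator": admin / total,
        "password_like": password_like / total,
    }

# Hypothetical login lists as they might be extracted from the logs.
windows_style = ["Administrator", "administrator", "JSmith", "root"]
swapped_columns = ["password", "letmein", "qwerty", "summer2009!"]
```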

On the next page we see that it may be easy for a human to generalize a few similar attacks as "Single-threaded horizontal scan with about a hundred guesses for root, then five or six guesses each for common Unix system accounts, then two guesses each for an alphabetical list of common Spanish names starting with alberto."

However, we will also see that the attacks will probably be similar, not identical, and this makes the automatic clustering and classification more difficult.
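
As a simple illustration of "similar, not identical", two login sequences can be compared with generic similarity measures. This sketch uses Python's standard `difflib.SequenceMatcher` for order-sensitive similarity and a Jaccard index for order-insensitive overlap. These are stand-in measures chosen for the example, not necessarily the method used in the rest of this analysis, and the login lists are invented.

```python
from difflib import SequenceMatcher

def order_similarity(a, b):
    """Order-sensitive similarity of two login sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def jaccard(a, b):
    """Order-insensitive overlap of the sets of logins tried."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)

# Hypothetical login sequences from three observed attacks.
attack_1 = ["root", "root", "operator", "apache", "alberto", "alejandro"]
attack_2 = ["root", "root", "root", "operator", "alberto", "alejandro", "ana"]
attack_3 = ["hans", "klaus", "dieter"]

near_match = order_similarity(attack_1, attack_2)
unrelated = order_similarity(attack_1, attack_3)
```

Two attacks built from the same list, with a few entries added or dropped, score high but below 1.0, so any automatic grouping needs a threshold rather than an exact match.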

Smart hackers attacking high-value targets will do some research and thinking when they design their attack. Usually, however, the attack sequences do not make much sense. A site in the United States is attacked by a host in Brazil attempting to guess passwords for thousands of logins based on German names, none of which exist on the target host. The attacking host is just a handy tool for a hacker who is randomly or erratically selecting targets. Once in a while they may happen to select a target for which their attack is relevant, and in a few of those cases, they may be successful.
