This set of pages describes an attempt to characterize patterns of network attack. The goal is to group attacks into similar patterns, and ideally to discover clusters of similar patterns automatically. Similar attack patterns could suggest a similar origin, or at least some relation between attacks widely separated in time and source.
Some tools used to estimate textual similarity can be applied to the patterns, both to group attacks by a similarity measure and to classify a future attack as a member of a previously seen category.
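As a rough illustration of the idea, here is a minimal sketch in Python. It is not the method developed on the later pages, just one simple textual similarity measure (Jaccard similarity over short runs of consecutive events) chosen for brevity. It assumes each attack has already been reduced to a sequence of tokens; the usernames, the labels, the function names, and the 0.5 threshold are all made up for the example.

```python
def shingles(events, k=2):
    """Return the set of k-long runs of consecutive events in a sequence."""
    return {tuple(events[i:i + k]) for i in range(len(events) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x, y, k=2):
    """Similarity of two event sequences, from 0.0 (disjoint) to 1.0 (identical sets)."""
    return jaccard(shingles(x, k), shingles(y, k))


def classify(attack, known, threshold=0.5):
    """Return the label of the most similar known pattern, or None if nothing
    reaches the (arbitrary, illustrative) threshold."""
    best_label, best_score = None, 0.0
    for label, pattern in known.items():
        score = similarity(attack, pattern)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Hypothetical data: the order of login names tried in two earlier attacks.
known_patterns = {
    "pattern-A": ["root", "admin", "test", "guest", "oracle"],
    "pattern-B": ["pi", "ubnt", "support", "user"],
}

# A later attack that mostly follows the first sequence.
new_attack = ["root", "admin", "test", "guest", "mysql"]

if __name__ == "__main__":
    for label, pattern in known_patterns.items():
        print(label, similarity(new_attack, pattern))
    print("classified as:", classify(new_attack, known_patterns))
```

In this made-up case the new attack shares three of the five distinct pairs with pattern-A and none with pattern-B, so it scores 0.6 against the first and 0.0 against the second, and is classified as pattern-A. Real logs are far noisier than this, which is why the choice of tokens, measure, and threshold needs the closer look the following pages give it.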
To get there, we need to look at a number of topics. Each of these has its own page:
The background of the threat
What part of the threat environment is really of interest?
The attacker's perspective
What are they trying to accomplish and how will they organize their attack?
Designing the attack
How will the attacker approach the problem, what designs are possible for large-scale attacks, and what will be the general patterns of symptoms that you may notice?
Real data and common patterns
The attack tools are imperfect, and the vagaries of the Internet mean that patterns aren't as clear as you might wish, so what do you really see?
Textual analysis tools
Some tools for analyzing similarity of written text can help, but what are these tools and how do they work?
Applying textual analysis to detect patterns in logs
So, how well does this work on real data?