Textual Analysis for Network Attack Recognition
Real Attack Data Patterns

Real Data and Common Patterns

The following data come from syslog records collected from some of the Linux hosts on one /24 subnet at a major U.S. university over 12 consecutive months. None of the hosts was intended as a "honeypot" system; all were in laboratory use for a mixture of compute service and desktop use.

Not all systems were contributing syslog data throughout the period. Some joined the project later; others had their operating systems reinstalled by graduate students who did not always reconfigure the local syslog service.

The table below includes links to the reports. Click on a month name if you want to see a rather large detailed report.

Month	Contributing Target Hosts	Attacking Hosts	Attack Sequences Captured	Report Size
October	3	100	229	285 kbytes
November	3	347	741	1.5 Mbytes
December	3	113	294	364 kbytes
January	4	120	289	404 kbytes
February	10	105	750	440 kbytes
March	10	129	844	588 kbytes
April	9	124	824	540 kbytes
May	9	759	2126	4.0 Mbytes
June	9	183	778	624 kbytes
July	12	171	799	616 kbytes
August	12	192	838	736 kbytes
September	10	955	1965	6.9 Mbytes
Summary	average = 7.83	total = 3298	total = 10477	17.4 Mbytes

Each monthly report starts with a table ranking the attacking hosts in decreasing order of number of attacks detected. Then a large table shows a summary of the set of attacks from one host. The attack descriptions are in order of their start time. Here is one early example from October:

Attacker	Target	Start	End (duration)	Guesses for root	Guesses / accounts, non-root	Guesses / accounts, invalid users	Guesses / accounts, all users (rate)
71.170.120.217 pool-71-170-120-217.dllstx.fios.verizon.net. OrgName: Verizon Internet Services Inc. Address: 1880 Campus Commons Dr City: Reston StateProv: VA Country: US NetName: VIS-BLOCK NetHandle: NET-71-169-192-0-1 Parent: NET-71-0-0-0-0	ipanema	Oct 1 08:09:05	Oct 1 08:10:22 77 seconds	15	13 / 13	140 / 120	168 / 134 0.46 sec/guess
	copacabana	Oct 1 08:09:06	Oct 1 08:10:23 77 seconds	15	13 / 13	140 / 120	168 / 134 0.46 sec/guess
	total: 2 targets 336 probes	Oct 1 08:09:05	Oct 1 08:10:23 78 seconds	30	26 / 13	280 / 120	336 / 134 0.23 sec/guess

The attacking host was at IP address 71.170.120.217. A DNS PTR lookup successfully resolved that address to the fully-qualified domain name pool-71-170-120-217.dllstx.fios.verizon.net. A whois lookup was also performed on the IP address in case the PTR lookup failed, as it frequently did.

That attack hit two target hosts, ipanema and copacabana, listed in order of when each part of the attack started.

This was a multi-threaded vertical attack, judging by the almost completely overlapping periods of activity on the two targets. Against each target, the attack made 15 guesses for the root password, 1 guess each for the passwords of 13 accounts that happened to exist on the system, and a total of 140 guesses for the passwords of 120 accounts that did not exist. The precise sequence of guesses is as follows, with root marked in red, existing accounts marked in yellow, and the remaining invalid accounts marked in green.

staff sales recruit alias office samba tomcat webadmin spam virus cyrus oracle michael ftp test webmaster postmaster postfix postgres paul root guest admin linux user david web apache pgsql mysql info tony core newsletter named visitor ftpuser username administrator library test root root admin guest master root root root root root admin admin admin admin root root test test webmaster username user root admin test root root root danny alex brett mike alan data www-data http httpd pop nobody root backup info shop sales web www wwwrun adam stephen richard george john news angel games pgsql mail adm ident webpop susan sunny steven ssh search sara robert richard party amanda rpm operator sgi sshd users admins admins bin daemon lp sync shutdown halt uucp smmsp dean unknown securityagent tokend windowserver appowner xgridagent agent xgridcontroller jabber amavisd clamav appserver mailman cyrusimap qtss eppc telnetd identd gnats jeff irc list eleve proxy sys zzz frank dan james snort radiomail harrypotter divine popa3d aptproxy desktop workshop mailnull nfsnobody rpcuser rpc gopher

As these types of attack go, this one was rather aggressive, with about two guesses per second on each target host.
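The "almost completely overlapping periods" test used above to call this attack multi-threaded can be put into numbers. The following is only a sketch, not the analysis software behind these reports: the function names and the `ASSUMED_YEAR` constant are my own, and the year is needed only because syslog timestamps do not carry one. It measures how much of the combined time span the two per-target attacks shared, using the start and end times from the table above.

```python
from datetime import datetime

ASSUMED_YEAR = 2000  # assumption: syslog timestamps carry no year, so one is supplied


def parse_syslog_time(stamp):
    """Parse a syslog-style timestamp such as 'Oct 1 08:09:05'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def overlap_fraction(a, b):
    """Shared time divided by the total span of two (start, end) intervals."""
    shared = (min(a[1], b[1]) - max(a[0], b[0])).total_seconds()
    span = (max(a[1], b[1]) - min(a[0], b[0])).total_seconds()
    if span == 0:
        return 1.0  # two identical instants overlap completely
    return max(shared, 0.0) / span  # disjoint intervals give 0.0


# Start and end times taken from the October 1 attack table.
ipanema = (parse_syslog_time("Oct 1 08:09:05"), parse_syslog_time("Oct 1 08:10:22"))
copacabana = (parse_syslog_time("Oct 1 08:09:06"), parse_syslog_time("Oct 1 08:10:23"))

fraction = overlap_fraction(ipanema, copacabana)
print(f"{fraction:.2f}")  # 76 shared seconds out of a 78 second span
```

A fraction close to 1 suggests parallel threads; a fraction near 0 would instead suggest the targets were attacked one after another.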

The opposite extreme of timing appears to occur in the very next attack sequence captured, coming from Jiangsu Province in the People's Republic of China.

This appears to be an unusually subtle attack, spread out over about six days and making just 24 guesses on one target and 14 on another. That means about 6.2 hours between guesses on target ipanema and about 11 hours on target copacabana.
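Those hours-between-guesses figures can be reproduced from the durations and guess counts reported for this attack (514542 and 513962 seconds, 24 and 14 guesses). The sketch below assumes the rate is the duration divided by the number of intervals between guesses, that is, guesses minus one. That reading is inferred from the published numbers rather than taken from the analysis software, and the function name is my own.

```python
def seconds_per_guess(duration_s, guesses):
    """Mean interval between guesses: duration over (guesses - 1) intervals."""
    if guesses < 2:
        raise ValueError("need at least two guesses to measure an interval")
    return duration_s / (guesses - 1)


# Durations and guess counts as reported for 218.3.120.196.
ipanema = seconds_per_guess(514542, 24)
copacabana = seconds_per_guess(513962, 14)

for name, rate in (("ipanema", ipanema), ("copacabana", copacabana)):
    print(f"{name}: {rate:.2f} sec/guess, {rate / 3600:.1f} hours")
```

The same division also matches the aggressive October 1 attack: 77 seconds over 168 guesses comes to about 0.46 seconds per guess.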

The sequences appeared to start almost ten minutes apart, continue for about six days, and then stop within one second of each other, indicating another multi-threaded vertical attack. I guessed that the attacking host was shut down at that point or the attack process was otherwise killed off.

However, further investigation of the captured log data revealed a complication for this study, one that required additional analytic software development! This one compromised host was used as an attack platform for two separate, similar but non-identical attacks: one lasting about 35 minutes on the morning of October 1st, and the second lasting about 45 minutes on the morning of October 7th.

Attacker	Target	Start	End (duration)	Guesses for root	Guesses / accounts, non-root	Guesses / accounts, invalid users	Guesses / accounts, all users (rate)
218.3.120.196 inetnum: 218.3.120.192 - 218.3.120.223 netname: ZHENJIANG-DY-E_EDUCATION-CENTER descr: Danyang E_Education Center descr: Zhenjiang City descr: Jiangsu Province country: CN address: No.18,Dianli Road,Zhenjiang 212007 address: XINMINDONG ROAD,DANYANG	ipanema	Oct 1 09:19:11	Oct 7 08:14:53 514542 seconds			24 / 5	24 / 5 22371.39 sec/guess
	copacabana	Oct 1 09:28:50	Oct 7 08:14:52 513962 seconds			14 / 3	14 / 3 39535.54 sec/guess
	total: 2 targets 38 probes	Oct 1 09:19:11	Oct 7 08:14:53 514542 seconds			38 / 5	38 / 5 13906.54 sec/guess

A cormorant fisherman on the Li River at Yangshuo, in south-eastern China, the source of so many of these attacks.

One of the early stages of log analysis was redesigned to detect and separate different attacks from the same attacking host against the same target within one general period of time. Investigation indicated that guesses within one attack sequence occur within 10 seconds of each other, while the distinct attack sequences noticed so far from one attacking host have been separated by a few days. An arbitrary threshold of 300 seconds (5 minutes) was found useful to detect the start of a new sequence.
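The splitting rule just described can be sketched in a few lines. This is an illustration of the idea only, not the actual analysis software: the function name, the `(time, login)` tuple layout, and the sample timestamps are all hypothetical. Only the 300 second threshold comes from the text above.

```python
GAP_THRESHOLD = 300.0  # seconds; the arbitrary threshold described above


def split_sequences(guesses, gap_threshold=GAP_THRESHOLD):
    """Split (time_in_seconds, login) guesses into separate attack sequences.

    A new sequence starts whenever the gap since the previous guess
    exceeds gap_threshold.
    """
    sequences = []
    for guess in sorted(guesses, key=lambda g: g[0]):
        if sequences and guess[0] - sequences[-1][-1][0] <= gap_threshold:
            sequences[-1].append(guess)   # close enough: same sequence
        else:
            sequences.append([guess])     # first guess, or a long gap
    return sequences


# Hypothetical timestamps: two bursts from one attacker, days apart.
guesses = [
    (0, "test"), (4, "admin"), (9, "guest"),
    (500000, "test"), (500005, "admin"),
]
sequences = split_sequences(guesses)
for seq in sequences:
    print(" ".join(login for _, login in seq))
```

The input would be all guesses from one attacking host against one target, so each returned list is one attack sequence for that pair.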

Once the sequences were separated, their significant differences were obvious. They were all based on the small set {admin, guest, mysql, test, webmaster}, but the sequences varied in length, order, and members:

Date  Target  Sequence
1 Oct  ipanema   test admin test admin test admin guest mysql webmaster test admin guest
1 Oct  copacabana   test admin guest test admin
7 Oct  ipanema   test test admin guest test admin guest test test admin test admin
7 Oct  copacabana   test test test admin test admin test admin guest

Moving later into October, here is the first instance in these logs of what will become a familiar pattern:

Attacker	Target	Start	End (duration)	Guesses for root	Guesses / accounts, non-root	Guesses / accounts, invalid users	Guesses / accounts, all users (rate)
140.124.181.244 inetnum: 140.117.0.0 - 140.138.255.255 netname: TANET-BNETA descr: imported inetnum object for MOEC country: TW address: Ministry of Education computer Center address: 12F, No 106, Sec. 2, Heping E. Rd., Taipei address: Taipei Taiwan inetnum: 140.124.0.0 - 140.124.255.255 netname: T-NTUT.EDU.TW-NET descr: National Taipei University of Technology descr: Taipei Taiwan address: National Taipei University of Technology	copacabana	Oct 24 15:54:44	Oct 24 15:55:16 32 seconds	3		6 / 4	9 / 5 4.00 sec/guess
	ipanema	Oct 24 15:54:44	Oct 24 15:55:14 30 seconds	3		6 / 4	9 / 5 3.75 sec/guess
	xoanon	Oct 24 15:54:45	Oct 24 15:55:15 30 seconds	3		6 / 4	9 / 5 3.75 sec/guess
	total: 3 targets 27 probes	Oct 24 15:54:44	Oct 24 15:55:16 32 seconds	9		18 / 4	27 / 5 1.23 sec/guess

Another multi-threaded vertical attack, not overly aggressive on any one host. A period of 3.5 to 4 seconds per guess seems to be pretty typical across all the attacks logged on this set of targets.

The interesting feature of this attack is the sequence of accounts guessed:
test guest admin admin user root root root test
That sequence showed up again and again, and was the original motivation for the investigation resulting in this collection of web pages! Here are the instances of the "9/5" attack (9 guesses spread over 5 distinct logins) seen in just the first four months of data collection. With the exception of the attack on November 5, these attacks were identical: multi-threaded vertical attacks with one guess per target every 3 to 4 seconds, guessing passwords for identical sequences of logins.

Instances of the "9/5" or `"test guest admin admin user root root root test"` attack
Attacker	Start time	Targets	Notes
140.124.181.244 Ministry of Education Computer Center, Taipei, Taiwan	Oct 24 15:54:44	xoanon, ipanema, copacabana
200.226.124.15 cheeseegg.ig.com.br Internet Group do Brasil Ltda	Oct 30 22:08:33	xoanon, ipanema, copacabana	This host returns for another attack on November 23!
218.1.65.233 China Telecom, Room 805, 61 North Si Chuan Road, Shanghai, PRC	Nov 5 15:45:24	xoanon, ipanema, copacabana	This is a variation: it spread the guesses over 18 days on xoanon and copacabana. It attempted only `test test test guest` on copacabana and only `test guest admin` on ipanema, and it attacked ipanema only during the final 19 seconds of the overall attack sequence.
60.248.162.135 60-248-162-135.HINET-IP.hinet.net Chunghwa Telecom Co., Ltd. Data-Bldg 6F, No.21, Sec.21, Hsin-Yi Rd., Taipei Taiwan	Nov 10 12:58:19	xoanon, ipanema, copacabana
221.4.182.146 CNC Group Guangdong province network, PRC	Nov 20 19:22:34	xoanon, ipanema, copacabana
200.226.124.15 cheeseegg.ig.com.br Internet Group do Brasil Ltda	Nov 23 21:08:32	xoanon, ipanema, copacabana	This host had already done this attack against these same targets on October 30!
212.87.231.34 kpts.pcz.czest.pl Institute of Computer and Information Science , Technical University of Czestochowa, Poland	Nov 27 14:38:23	xoanon, ipanema, copacabana
80.55.184.58 wc58.internetdsl.tpnet.pl IDSL customer, Tychy company, Warszawa, Poland	Dec 2 06:38:21	ipanema, copacabana
59.120.195.104 59-120-195-104.HINET-IP.hinet.net CHTD, Chunghwa Telecom Co., Ltd. Data-Bldg 6F, No.21, Sec.21, Hsin-Yi Rd., Taipei, Taiwan	Dec 13 08:18:51	ipanema, copacabana
62.115.65.34 62-115-65-34.customer.teliacarrier.com Ariave Satcom LTD, CallSat Telecom, 122 Athalassas Ave, Nicosia, Cyprus	Jan 4 17:37:16	ipanema
66.0.90.52 Piedmont Municipal, apparently near Atlanta GA, USA	Jan 18 19:21:08	xoanon, ipanema, copacabana
140.116.214.87 Ministry of Education computer Center, Taipei, Taiwan	Jan 19 12:00:53	xoanon, ipanema, copacabana
222.124.169.163 Pt. Telekomunikasi Indonesia Jakarta, Indonesia	Jan 21 07:55:48	xoanon, ipanema, copacabana
211.203.181.6 Hanaro Telecom Co, Seoul, Korea	Jan 28 22:14:42	xoanon, ipanema, copacabana
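Finding exact repeats like the ones tabulated above needs nothing more than using the login sequence itself as a lookup key. The sketch below is one way that might look; the function name and record layout are hypothetical, and the three sample records are taken from sequences shown earlier on this page.

```python
from collections import defaultdict

NINE_FIVE = tuple("test guest admin admin user root root root test".split())


def group_by_signature(records):
    """Map each exact login sequence to the set of attackers that used it."""
    groups = defaultdict(set)
    for attacker, logins in records:
        groups[tuple(logins)].add(attacker)
    return groups


records = [
    ("140.124.181.244", NINE_FIVE),
    ("200.226.124.15", NINE_FIVE),
    ("218.3.120.196", ("test", "admin", "guest", "test", "admin")),
]

groups = group_by_signature(records)
# Keep only the sequences shared by more than one attacking host.
repeated = {sig: hosts for sig, hosts in groups.items() if len(hosts) > 1}

for sig, hosts in repeated.items():
    print(f"{len(sig)}/{len(set(sig))}", " ".join(sig), sorted(hosts))
```

This only works for identical sequences. A single skipped or extra guess produces a different key.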

A relatively simple attack like this "9/5" sequence can be noticed by simply looking at a table summarizing attacks. The problem is that attacks are often very similar without being identical. Here is an example of that phenomenon in an attack from Korea:

Attacker	Target	Start	End (duration)	Guesses for root	Guesses / accounts, non-root	Guesses / accounts, invalid users	Guesses / accounts, all users (rate)
125.245.59.159 inetnum: 125.240.0.0 - 125.247.255.255 netname: PUBNETPLUS descr: DACOM-PUBNETPLUS descr: DACOM Bldg, 65-228. Hangangro3ga. Yongsan-gu, SEOUL, 140-716 descr: Allocated to KRNIC Member. descr: If you would like to find assignment descr: information in detail please refer to descr: the KRNIC Whois Database at: descr: "http://whois.nic.or.kr/english/index.html" country: KR address: 65-228, 3Ga, Hangang-ro, Yongsan-gu, Seoul inetnum: 125.240.0.0 - 125.251.255.255 netname: PUBNETPLUS-KR	ipanema	Oct 16 01:36:43	Oct 16 01:43:02 379 seconds	26	4 / 4	45 / 30	75 / 35 5.12 sec/guess
	copacabana	Oct 16 01:36:43	Oct 16 01:43:32 409 seconds	26	4 / 4	49 / 34	79 / 39 5.24 sec/guess
	xoanon	Oct 16 01:36:57	Oct 16 01:48:19 682 seconds	27	10 / 10	89 / 69	126 / 80 5.46 sec/guess
	total: 3 targets 280 probes	Oct 16 01:36:43	Oct 16 01:48:19 696 seconds	79	18 / 11	183 / 69	280 / 80 2.49 sec/guess

As usual, a multi-threaded vertical attack. It's interesting that two of these three started within one second while the third started 14 seconds later. I would guess that many other threads were started during that time, attacking other targets not observed in this syslog collection.

Notice that the first two threads terminated early, but not simultaneously. Simultaneous termination would have suggested that the attacking host had been shut down or the attack processes all killed. The longer attack against target xoanon continued for about five minutes more.

Let's consider the attack against target xoanon as the intended pattern. Red background indicates logins attacked on all three targets, yellow indicates logins attacked on xoanon and copacabana only, and blue indicates logins attacked on xoanon only:
root fluffy admin test guest webmaster mysql oracle library info shell linux unix webadmin ftp test root admin guest master apache root root network word root root root root root root root root root root root root root root admin admin admin admin root root test test webmaster user username username user root admin test root root root root danny sharon aron alex brett mike alan data www-data http httpd nobody root backup info shop sales web www wwwrun adam stephen richard george michael john david paul news angel games pgsql pgsql mail adm ident resin mikael mike suva webpop technicom susan sunsun root sunny steven ssh search sara robert richard postmaster party michael amanda mysql rpm operator sgi Aaliyah Aaron Aba Abel Jewel sshd users

Based on Observations So Far, What We Should Expect:

Many (but not all) attacks are multi-threaded vertical scans.

Sequences against targets may be identical, but frequently one target list terminates early.

Less frequently, we may see guesses skipped within the list.

So, we are looking to discover clusters of similar lists. They will be highly similar, possibly identical, at least to some point. Members of a cluster likely terminate or skip entries at different positions in the sequence.
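As a rough illustration of what comparing two such lists might involve, here is a sketch that labels a pair as identical, terminated early (one list is a prefix of the other), or skipped entries (one list is a subsequence of the other), and also gives a general similarity score. The labels and function names are my own. The score uses `difflib.SequenceMatcher` from the Python standard library, which is just one convenient choice and not necessarily the measure a real clustering step would use. The `skipped` list is a made-up example; the other two lists appear earlier on this page.

```python
from difflib import SequenceMatcher


def is_subsequence(short, long):
    """True if every item of short appears in long, in the same order."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def relate(a, b):
    """Label how two login sequences relate to each other."""
    a, b = list(a), list(b)
    short, long = (a, b) if len(a) <= len(b) else (b, a)
    if short == long:
        return "identical"
    if long[:len(short)] == short:
        return "terminated early"
    if is_subsequence(short, long):
        return "skipped entries"
    return "different"


def similarity(a, b):
    """Score between 0 and 1; 1.0 means the sequences are identical."""
    return SequenceMatcher(None, list(a), list(b), autojunk=False).ratio()


full = "test guest admin admin user root root root test".split()
early = "test guest admin".split()  # the Nov 5 attempt on ipanema
skipped = "test guest admin user root root root test".split()  # hypothetical

for other in (full, early, skipped):
    print(relate(full, other), round(similarity(full, other), 2))
```

Sequences labeled anything other than "different", or scoring above some chosen cutoff, would be candidates for the same cluster.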
