“all the chinese people with shit computers have got home and turned their virus ridden machines on”
Three days ago one of my regular clients called me saying load was high on his webserver, could I take a look. The Apache logs showed that the server was being hammered with HTTP POST requests.
It was easy to spot them as being non-legitimate. All were targeted at the default vhost (just the server IP address), and all had the same UserAgent. I grepped through that day’s Apache log, and counted over 180,000 unique IPs issuing these requests. Surely that couldn’t be right? Huge botnets like this certainly exist, but I thought they were mainly used for spam or DoSing high profile sites. This server hosts the website for a pretty harmless small company (they manufacture stationery), and even if one of their customers did have a gripe with them, it’s unlikely that the customer would have access to such a huge botnet.
A recent worm targeting Plesk servers via an HTTP POST exploit was still fresh in my mind (I had a few new clients as a result of that), and I wondered if perhaps this wasn’t an intentional DoS, but a worm that had gone wrong resulting in all the infected hosts probing this particular server. I fired up tcpdump to inspect the HTTP POST data, but it was blank.
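For illustration, a count like that can be produced with something along these lines. This is only a sketch: it assumes the log format shown later in this post (vhost first, client IP second, then the usual combined-log fields), and the function name and log path are made up for the example.

```shell
#!/bin/sh
# Sketch: count unique client IPs sending POST requests with a given UserAgent.
# Assumed log layout: field 1 = vhost:port, field 2 = client IP,
# field 7 = opening quote plus HTTP method (e.g. "POST).
# count_post_ips is a hypothetical helper name.
count_post_ips() {
    # $1 = UserAgent substring to match; log lines are read from stdin
    awk -v ua="$1" '$7 == "\"POST" && index($0, ua) { print $2 }' |
        sort -u | wc -l | tr -d ' '
}

# Example (hypothetical log path):
# count_post_ips "MSIE 6.0; Windows NT 5.1; SV1" < /var/log/apache2/access.log
```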
So it seemed like someone was simply trying to tie up Apache slots by bombarding it with empty POST requests. The question was how to stop it. I started with a simple shell script to feed the list of IPs that I’d extracted into iptables, but as the number of entries in the firewall (ok, packet filter) grew, the rate at which iptables could add them started to drop off quite seriously, to the point where it was only adding 3 or 4 per second. With 180,000 addresses, this was going to take a long time.
I decided to change tactics. Scanning through the list, the IPs appeared to be clustered slightly around netblocks. I took the first two octets of each address and generated a list of netblocks to filter in iptables, eg:
iptables -A INPUT -s %IP%.0.0/16 -j DROP
The result was 9,500 netblocks, a much more manageable number. It was still a bit of a gamble: we didn’t have time to look up the start and end addresses for the netblocks to which each IP belonged, so firewalling off x.x.0.0/16 was just a guess. With this being a UK company, we didn’t care too much about blocking off overseas traffic (not ideal, but an acceptable solution), but there was still a danger that, while 18.104.22.168 might belong to a two-bit Korean ISP, 22.214.171.124 might be a UK netblock. We took the risk.
An hour later these rules had been processed, but Apache was still being hit moderately heavily by new IP addresses. I wrote a perl script to tail the Apache log and add them to the firewall, but at a rate of around 5 per second, iptables was still struggling to keep up. I gave it a couple of hours, in the hope that it would pick up the majority of the addresses, but later that day Apache was still being hammered. In addition, my client had discovered that several legitimate people (including the owner of the site) had been firewalled too – clearly filtering off whole B-classes was too aggressive. So again, the hunt was on for a more effective solution.
I started looking into iptables’ ability to match strings inside the TCP packet payload. One option was to block all HTTP POST requests (which would interfere with the operation of the site to some extent, but was better than no site at all), but blocking the UserAgent seemed a better bet. The POST requests all looked like this:
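The collapsing step can be sketched roughly as below. It is not the exact script used, just one way to do it: the function names are invented for the example, and the rules are printed rather than executed so they can be checked before being fed to a root shell.

```shell
#!/bin/sh
# Sketch: collapse IPv4 addresses (one per line on stdin) into unique /16s.
# to_slash16 and slash16_rules are hypothetical helper names.
to_slash16() {
    # Keep the first two octets, de-duplicate, then append .0.0/16
    cut -d. -f1,2 | sort -u | sed 's|$|.0.0/16|'
}

# Print one iptables command per netblock instead of running it.
slash16_rules() {
    to_slash16 | while read -r net; do
        printf 'iptables -A INPUT -s %s -j DROP\n' "$net"
    done
}

# Example (hypothetical file name):
# slash16_rules < bad_ips.txt > block.sh
```

The obvious cost is precision: every /16 stands in for 65,536 addresses, whether or not more than one of them ever sent a request.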
defaulthost:80 126.96.36.199 – – [28/Jul/2013:16:57:06 +0100] “POST / HTTP/1.1” 301 360 “-” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)”
My client agreed that blocking anyone using IE 6 on Windows XP wasn’t the end of the world, and should have a negligible impact on legitimate users.
So we tried this:
iptables -I INPUT -m string --algo bm --string "MSIE 6.0; Windows NT 5.1; SV1" -j REJECT
The trouble was, by the time the UserAgent had been sent inside the HTTP request, the TCP connection had already been opened. Apache was still clogged up handling those connections.
I started looking at the tcpdump output again, wondering if there was anything in the TCP headers which would uniquely identify these DoS agents. One thing that struck me was the TTL: an HTTP request from Thailand would almost certainly have gone through more hops than a request from a client in the UK (the server is hosted in Germany). I considered blocking any traffic with a TTL lower than a certain value, but decided against it: over the years, different operating systems have used different default TTLs, and while we can usually guess at what the original TTL was (if a packet arrives with a TTL of 20 it’s much more likely that the client is 12 hops away and it started at 32, rather than it having started at 64 with the client 44 hops away), it didn’t seem reliable enough.
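The guessing game described above can be written down as a small sketch. It assumes the sender started from one of the commonly seen initial TTLs (32, 64, 128 or 255) and picks the smallest one that is not below the observed value; the function names are invented for the example.

```shell
#!/bin/sh
# Sketch: guess the initial TTL and hop count from an observed TTL.
# Assumption: the sender used one of the common defaults 32, 64, 128 or 255.
# guess_initial_ttl and guess_hops are hypothetical helper names.
guess_initial_ttl() {
    ttl=$1
    for start in 32 64 128 255; do
        if [ "$ttl" -le "$start" ]; then
            echo "$start"
            return 0
        fi
    done
    return 1    # observed TTL above 255 is not valid
}

guess_hops() {
    start=$(guess_initial_ttl "$1") || return 1
    echo $(( start - $1 ))
}

guess_hops 20     # prints 12 (assumed initial TTL of 32)
guess_hops 116    # prints 12 (assumed initial TTL of 128)
```

The two example calls show the weakness: very different observed TTLs can map to the same hop count, and a distant host with a high initial TTL can look closer than a nearby one with a low initial TTL, so a simple "TTL below N" rule cuts across operating systems rather than across geography.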
In the end we went back to my original list of subnets, and compared this against a freely available list of IP blocks by country at https://www.countryipblocks.net/country_selection.php. From that we were able to determine which of my list of B-class networks definitely didn’t contain any UK addresses, and firewall them off.
While that was running, I thought it would be interesting to see what else I could find out about these bots. I picked half a dozen at random and nmap’d them. Nothing interesting, and no common pattern. Perhaps the DDoS agent runs over UDP, or uses port knocking or a reverse tunnel. I also used p0f to do some fingerprinting; all the hosts were identified as ‘Windows XP/2000’ (although p0f doesn’t always get it right).
Even with a reliable set of B-class networks blocked, the DoS continued, if anything worse than it had been before. As my client elegantly put it:
all the chinese people with shit computers have got home and turned their virus ridden machines on
The server is hosted by 1and1. Based on previous experiences, asking them for help would probably be a head-banging experience, but out of desperation we tried. They pointed out that we had access to a Cisco firewall via the 1and1 control panel, to which we could add a whopping 25 rules. I calculated some netmasks which would allow us to drop as much unwanted traffic as possible (there’s a nice calculator at https://www.dan.me.uk/ipsubnets) in the limited number of rules we had, and ended up blocking off 100 million hosts or so, all in Asia/Africa/South America. I then removed these rules from iptables as they were unnecessary and no doubt slowing it down.
Although this had certainly helped a little, we were still being bombed with traffic from thousands of B-class networks, and iptables wasn’t coping well. We were seeing high CPU usage in kernel processes, and network traffic was moving at a snail’s pace.
I started looking into the iptables scalability issue, and discovered that when rules are added the whole ruleset is copied into userspace, modified, then copied back to kernelspace. No wonder it can be quick to restore an existing ruleset, but slow to add new rules one by one. Along the way I also discovered ipset, which promised to be a more scalable solution for blocking large numbers of hosts. When I get the chance I’d like to do some more research on ipset: how it stores data, how it improves over iptables etc. Perhaps I’ll even write it up in a blog post, but for the moment if you want to learn more take a look here: http://daemonkeeper.net/781/mass-blocking-ip-addresses-with-ipset/.
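To get a feel for how far 25 rules can stretch, the arithmetic is simple enough to sketch: a /N rule covers 2^(32-N) addresses. The function names below are invented for the example, and the prefixes in the usage lines are purely illustrative, not the rules that were actually entered.

```shell
#!/bin/sh
# Sketch: how many IPv4 addresses a set of CIDR prefix lengths covers.
# hosts_in_prefix and total_hosts are hypothetical helper names.
hosts_in_prefix() {
    # $1 = prefix length, e.g. 8 for a /8
    echo $(( 1 << (32 - $1) ))
}

total_hosts() {
    # Arguments: one prefix length per rule
    total=0
    for p in "$@"; do
        total=$(( total + (1 << (32 - p)) ))
    done
    echo "$total"
}

hosts_in_prefix 16        # prints 65536
total_hosts 8 8 8 8 8 8   # prints 100663296
```

So with short prefixes a handful of rules is enough to reach the 100 million mark, which is why picking a few large, safely foreign blocks matters more than using all 25 slots.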
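As a rough sketch of what the ipset approach looks like: the whole list of networks goes into one set, loaded in a single `ipset restore` call, and iptables needs just one rule that matches against the set. The set name `blocklist`, the helper name and the input file are arbitrary choices for this example, not necessarily what ended up on the server.

```shell
#!/bin/sh
# Sketch: turn a list of networks (one CIDR per line on stdin) into input
# for `ipset restore`. ipset_restore_input is a hypothetical helper name;
# "blocklist" is an arbitrary set name.
ipset_restore_input() {
    set_name=${1:-blocklist}
    printf 'create %s hash:net\n' "$set_name"
    while read -r net; do
        # Skip blank lines in the input
        if [ -n "$net" ]; then
            printf 'add %s %s\n' "$set_name" "$net"
        fi
    done
}

# Example, to be run as root (hypothetical file name):
# ipset_restore_input blocklist < netblocks.txt | ipset restore
# iptables -I INPUT -m set --match-set blocklist src -j DROP
```

New offenders can then be added with `ipset add blocklist <network>` without touching the iptables ruleset at all.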
Ipset performed really well, but only solved half the problem. Yes, we no longer had high server load and sluggish networking because of iptables, but we were still being overwhelmed with traffic clogging up Apache connection slots. Even though we were now automatically firewalling off any request with the MSIE 6.0 / Windows NT 5.1 UserAgent, the number of unique hosts was staggering. By this point we had counted over 250,000.
In the end we moved the site to CloudFlare, who are now bearing the brunt of the attack. Sorry to disappoint; you were probably hoping that this post would conclude with a brilliantly cunning technique that we’d come up with to fend off the attack, and everyone lived happily ever after. On this occasion, they didn’t.