IP Blocklists
Blocklists are populated in a number of different ways. Some use spam traps to capture mail sent to addresses that have never been used publicly; others use statistical algorithms to judge that a sender is malicious or compromised. Once the data is acquired, blocklist operators populate their lists in two ways:
- They list individual IP addresses, one by one, of all the servers that are sending mail.
- They make use of CIDR (rhymes with spider) notation. CIDR, or Classless Inter-Domain Routing, notation is a way to group large blocks of IP addresses. A provider lists a larger group of IP addresses in CIDR notation to save space in the file or database, rather than listing the addresses one by one.
For example, the IP addresses in the range 127.0.0.0 – 127.0.0.255 can be listed as 127.0.0.0/24. Rather than using 256 lines in a file, only 1 line need be used.
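As a minimal sketch of that example, Python's standard `ipaddress` module can expand a CIDR block and test whether an address falls inside it. The variable names here are illustrative only and are not part of any blocklist tooling.

```python
import ipaddress

# The /24 block from the example above: one line instead of 256.
block = ipaddress.ip_network("127.0.0.0/24")

first_address = block[0]              # lowest address in the block
last_address = block[-1]              # highest address in the block
address_count = block.num_addresses   # how many single-IP lines this replaces

# Membership test: does a given sender fall inside the listed range?
sender = ipaddress.ip_address("127.0.0.42")
is_listed = sender in block

print(f"{block} covers {address_count} addresses "
      f"({first_address} to {last_address}); {sender} listed: {is_listed}")
```

A lookup against a CIDR-based list is therefore a range check per entry rather than an exact match against one line per address.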
The XBL from Spamhaus is about 7 million entries (lines of text) and around 100 megs in size. By contrast, the PBL contains 200,000 lines of text (without exceptions in ! notation) and is 6 megs. However, the PBL is represented mostly in CIDR notation. If all of these ranges were expanded, it would be over 650 million individual IP addresses. That’s a whole heck of a lot more IPs in the PBL for a whole lot less file size!
When implementing the IP blocklists from Spamhaus in a real organization, running the XBL in front of the PBL blocks about 4 times as much mail as the PBL [1]. The XBL is better at catching individual bots that are sending out spam but are not listed anywhere else (they are new IPs), whereas the PBL is better at pre-emptively catching machines that should never send out mail at all (they are probably bots, but it doesn’t matter because they shouldn’t be sending mail anyhow).
However, if every single PBL IP had to be listed singly instead of being compressed into CIDR ranges, then the PBL would be 9.4 gigs in total size. 9.4 gigs is a large file. It isn’t completely unmanageable, but it goes from being a minor inconvenience to a major one. It takes a long time to download, upload, and process a 9.4 gig file. It’s also easier to store the entries in a database if there are only 500,000 of them (or even 7 million) vs 650 million. Databases that large run into problems of scale.
The PBL and XBL are examples of why different styles of IP blocklists are required. The PBL lists 650 million IPs and we still have over 7 million IPs on the XBL that aren’t on the PBL. Clearly, spamming bots can move around such that they are not published on the lists that have large address spaces listed. Bots are very good at hiding in places that are not blocked yet. Given enough space to hide, spammers will hide in that space because if they didn’t they would not be able to stay in business. The problem that the industry faces is that as soon as we find a spammer’s hiding space, we can block it for a while but the spammer will vacate it, relocate elsewhere and continue to spam [2].
And therein lies the problem of IPv6. An IPv4 address consists of 4 octets, and each octet is a number running from 0 to 255. This means that there are 256 x 256 x 256 x 256 possible IP addresses, which is 4.2 billion possible IP addresses (in reality, there are fewer than this because many ranges of IPs are reserved and not for public consumption). If you had to list every single IP address singly in a file, the file would be 61 gigs. That is a very large file, and very few pieces of hardware can hold a file of that size in memory (whether you are doing IP blocklist lookups in rbldnsd or some other in-memory solution on the mail server). Processing the file and cleaning it up would take a very long time; you simply couldn’t do it in real time, and IP blocklists need to be updated frequently (once per hour at a bare minimum).
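The 61 gig figure can be checked with a little arithmetic. The sketch below assumes a plain text file with one dotted-decimal address per line, ASCII encoding and a one-byte newline; the real on-disk format of any particular blocklist may differ, so treat the result as an estimate.

```python
# Bytes needed to write every IPv4 address as dotted-decimal text,
# one address per line (assumption: ASCII, one-byte newline).

# Digits used when writing each octet value 0..255 exactly once:
# 10 one-digit, 90 two-digit and 156 three-digit values.
DIGITS_PER_OCTET_CYCLE = sum(len(str(value)) for value in range(256))

TOTAL_ADDRESSES = 256 ** 4

# Each of the four octet positions cycles through 0..255 exactly 256**3 times.
digit_bytes = 4 * DIGITS_PER_OCTET_CYCLE * 256 ** 3
dot_bytes = 3 * TOTAL_ADDRESSES    # three dots per address
newline_bytes = TOTAL_ADDRESSES    # one newline per address

total_bytes = digit_bytes + dot_bytes + newline_bytes
average_line_bytes = total_bytes / TOTAL_ADDRESSES

# The same per-line average applied to 650 million individually listed IPs.
expanded_650m_bytes = 650_000_000 * average_line_bytes

print(f"{TOTAL_ADDRESSES:,} addresses, {total_bytes / 1e9:.1f} GB in total")
print(f"{average_line_bytes:.2f} bytes per line on average")
print(f"650 million lines: about {expanded_650m_bytes / 1e9:.1f} GB")
```

Under these assumptions the full IPv4 space comes to roughly 61 GB at a little over 14 bytes per line. Applying the same average to 650 million addresses lands in the same neighborhood as the 9.4 gigs quoted for an expanded PBL; the exact number depends on which addresses are listed and how each line is formatted.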
IPv6 multiplies this problem. We have seen that spammers already possess the ability to hop around IP addresses quickly. They do this because once an IP gets blocked, it is no longer useful to them. However, there are only so many places they can hide – 4.2 billion places. In IPv6, if they copy the same pattern of sending out spam and hopping around IP addresses the same way they do in IPv4, then there is virtually unlimited space they can hide in. To put it one way, there are 250 billion spam messages sent per day. Under IPv6, spammers could send out 1 piece of spam per IPv6 address, discard it and then move on to the next IPv6 address for the next 10,000 years [3] and never need to re-use a previous IPv6 address. A mail server could never load a file big enough even for one day’s IPv6 blocklist if spammers sent every single spam from a unique IPv6 address. Because spammers could hop around so much, IP blocklists would encounter the following problems:
- They would get to be too large for anyone to download, process and upload.
- Even if blocklist maintainers listed only the IP addresses that were spamming, a spammer could send spam from an IP address, let that address get listed on a blocklist, then discard it and move on to the next one. By rotating through IP addresses quickly, a spammer would always be one step ahead of the blocklists, and the lists would lose their effectiveness.
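To get a feel for the scale, here is a rough sketch of both problems. It takes the 250 billion messages per day quoted above at face value and assumes the worst case of one fresh address per message. The function names and the `2001:db8::/64` documentation prefix are illustrative choices, not anything a real spammer or blocklist uses.

```python
import ipaddress

SPAM_PER_DAY = 250_000_000_000  # daily volume quoted in the text


def years_to_exhaust(prefix_length):
    """Whole years before one-address-per-message sending uses up an IPv6 block."""
    addresses = 2 ** (128 - prefix_length)
    days = addresses // SPAM_PER_DAY
    return days // 365


def simulate_rotation(prefix, messages):
    """Send each message from a fresh address and list that address afterwards.

    Returns (messages blocked, blocklist size).
    """
    network = ipaddress.ip_network(prefix)
    blocklist = set()
    blocked = 0
    for index in range(messages):
        sender = network[index]    # a new address every time, never reused
        if sender in blocklist:
            blocked += 1
        blocklist.add(sender)      # listed only after the spam has gone out
    return blocked, len(blocklist)


years_single_64 = years_to_exhaust(64)    # one /64, a common size for a single subnet
years_whole_space = years_to_exhaust(0)   # the entire IPv6 address space
blocked, listed = simulate_rotation("2001:db8::/64", 10_000)

print(f"One /64 lasts about {years_single_64:,} years")
print(f"The whole IPv6 space lasts about {years_whole_space:.2e} years")
print(f"Blocked {blocked} of 10,000 messages; blocklist grew to {listed:,} entries")
```

Under these assumptions a single /64 alone would last roughly 200,000 years at that volume, so the 10,000-year figure above is, if anything, conservative. The simulation shows the second problem: a list of individual addresses keeps growing while blocking nothing, because each address is already abandoned by the time it is listed.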
How do we know spammers will do this?
Because they are already doing it! The biggest shift in spammer behavior over the past year and a half is not the move to infected bots but the move to compromised accounts. By compromising accounts, spammers have virtually unlimited resources from which to spam. Spam filters cannot block the IP addresses those accounts send from without creating many false positives. Thus, from the spammer’s perspective, they have defeated IP reputation filtering and can send from randomized email accounts. It is difficult for spam filters to create proactive rule sets when the population of potential email addresses is nearly unlimited, since an email address can be almost any combination of letters.
Similarly, the population of potential IPv6 addresses is nearly unlimited. Spammers have already generated the capacity to shift their tactics to mechanisms that evade spam filters. Once they learn that IPv6 gives them another way around the filter, they will start using this technique en masse.
It’s probably true that for the first little while, until IPv6 becomes more common, spammers will not use it. However, it is only a matter of time before the cost/benefit ratio shifts in their favor, and when it does, they will do it. Time and experience have shown that the better spammers always evolve. There truly is a storm coming.
This is why no email receivers are eager to send and receive email over IPv6 [4]. Performing spam filtering in IPv6 the same way as in IPv4 will not work. We have to allow for the worst-case scenario, which is that spammers will overwhelm mail servers and drain their processing power by forcing them to deal with a 10x increase in traffic.
Posts in this series:
- A Plan for Email over IPv6, part 1 – Introduction, and How Filters Work in IPv6
- A Plan for Email over IPv6, part 2 - Why we use IP blocklists in IPv4 and why we can't in IPv6
- A Plan for Email over IPv6, part 3 - A solution
- A Plan for Email over IPv6, part 4 - Population of the whitelists
- A Plan for Email over IPv6, part 5 – Removals, key differences and standards
[1] Confirmed by independent research by antispam companies.
[2] This is the origin of the term “whack-a-mole”, a term the antispam industry borrowed from the carnival game. As soon as you whack one mole, it hides and another pops up.
[3] Coincidentally, this is the same amount of time it will take before the Toronto Maple Leafs win another NHL Stanley Cup.
[4] Other readers will point out that the major reason it won’t work is because a server could never cache that many IP addresses. While true, not every mail server looks up IPs on a blocklist via a DNS query.