What history of spam would be complete without a look at the technologies we use every day to keep spam out of our mailboxes? In today’s post, we’re going to look at how the technologies that anti-spam solutions rely on to keep most of the junk from ever hitting our inboxes have evolved over time. From the earliest responses to spam to technologies that are still in their infancy, these are the tools of the trade, and even the oldest are still in use today. We won’t look at every single technology out there, but we will cover the main ones. Let’s start with the original anti-spam tool…
The Delete Key!
For many years, pressing the delete key was the only way users had to deal with spam. Even today, the comparatively little spam that does make it all the way through to your inbox is best handled by tossing it in the bit bucket, and the delete key is the way to do that with a satisfying tactile response.
Of course, no programmer is going to be satisfied with a manual process, so early recipients of spam started to code up automated reactions based on keywords. When a message arrived, its content was analyzed, and if it contained any of the words on the “naughty” list, the message was considered spam and either quarantined or deleted. Even today, admins still use keyword lists to block specific phrases, which has given rise to so many creative ways to spell those spammy words.
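To make the keyword approach concrete, here is a minimal sketch in Python. The word list and the function name are made up for illustration; a real list would be much longer and tuned to the site.

```python
import re

# Hypothetical "naughty" list, for illustration only.
BLOCKED_KEYWORDS = {"viagra", "lottery", "free money"}

def contains_blocked_keyword(message, keywords=BLOCKED_KEYWORDS):
    """Return True if any blocked keyword appears as a whole word or phrase."""
    text = message.lower()
    for keyword in keywords:
        # \b anchors keep a keyword from matching inside a longer word.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            return True
    return False
```

A message containing "LOTTERY" is caught, but one that spells it "l0ttery" sails straight through, which is exactly why the creative spellings appeared.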
The earliest blacklists were maintained individually. If a system sent enough spam, an admin could simply add it to a blacklist, and his or her system would no longer accept messages from the offending host. The Internet is great at sharing, and individual blacklists gave rise to collaborative ones, sometimes called RBLs, for Realtime Blackhole Lists, or DNSBLs, for DNS-based Blackhole Lists. ORBS, the Open Relay Behavior-modification System, was one well-known early example. Email admins can subscribe to these blacklists, or consult them through lookups in real time, to decide whether a sending system is a legitimate messaging system or a spammer.
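A real-time DNSBL lookup is just a DNS query: the sender's IPv4 octets are reversed, the list's zone is appended, and if the resulting name resolves, the address is listed. The sketch below assumes a made-up zone, `dnsbl.example.org`, and hypothetical function names. The resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True   # any answer (conventionally in 127.0.0.0/8) means "listed"
    except socket.gaierror:
        return False  # no such name: the address is not on the list
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` gives `99.2.0.192.dnsbl.example.org`. Each list publishes its own zone name and usage policy, so check those before pointing a production server at one.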
Filtering systems analyze sending systems, SMTP headers, subject lines, and message content to decide whether a message is legitimate or spam. Heuristics are used to look for indicators of legitimacy or spam. Bayesian filters can dig deep into a message and adapt over time to changes. You’ve probably seen spam that contains random strings of words or seemingly unrelated paragraphs of text after the spam content. These are the spammers’ attempts to fool the filters.
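Here is a toy illustration of the Bayesian idea: count how often each word appears in known spam and known legitimate mail ("ham"), then score a new message by which set of counts explains it better. This is a bare-bones naive Bayes sketch with add-one smoothing, not how any particular product does it, and the class and method names are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from a message already labelled 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam (0.0 to 1.0)."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        words = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class never yields log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing, with one extra slot reserved for unseen words.
            denominator = total_words + len(vocabulary) + 1
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            log_scores[label] = score
        # Turn the two log scores into a probability without underflow.
        highest = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - highest)
        ham = math.exp(log_scores["ham"] - highest)
        return spam / (spam + ham)
```

Train it on a handful of messages and you can watch the padding trick work: adding innocent-looking words to a spammy message pulls its score down, which is why filters need to keep learning.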
The RFCs define how sending and receiving email systems should work. How one system connects to another, the process of establishing an SMTP session, and the way systems identify one another, exchange data, and then close the connection are all spelled out. Most legitimate systems adhere strictly to the RFCs in how they behave, but also allow some latitude when other systems do not. Many spammers use programs or scripts that play fast and loose with the RFCs. Protocol analysis can look at how a sending system behaves and identify differences between “real” email systems and spammers, such as whether the sender closes the session with QUIT or simply drops the connection. If a message looks to have been sent by a spammer, the receiving system can classify it as spam rather than delivering it to the end user.
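As a rough sketch of what protocol analysis might check, the function below takes the list of commands a client sent during one SMTP session and reports anything that looks sloppy. The function name and the specific checks (a proper HELO/EHLO greeting, a closing QUIT) are illustrative assumptions; real products look at far more than this.

```python
def protocol_anomalies(client_commands):
    """Return a list of RFC deviations seen in one SMTP session's client commands."""
    lines = [line.strip() for line in client_commands if line.strip()]
    verbs = [line.split()[0].upper() for line in lines]
    anomalies = []

    # A well-behaved client introduces itself before doing anything else.
    if not verbs or verbs[0] not in ("HELO", "EHLO"):
        anomalies.append("no HELO/EHLO greeting before other commands")
    elif len(lines[0].split()) < 2:
        anomalies.append("greeting without a host name")

    # A well-behaved client ends the session with QUIT instead of hanging up.
    if not verbs or verbs[-1] != "QUIT":
        anomalies.append("connection dropped without QUIT")

    return anomalies
```

A session that runs EHLO, MAIL FROM, RCPT TO, DATA, QUIT comes back clean, while a script that jumps straight to MAIL FROM and then hangs up is flagged twice. In practice findings like these would feed a score rather than trigger an outright rejection, since legitimate but quirky senders exist too.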
There are two main ways that receiving systems can authenticate sending systems to verify whether they are legitimate messaging systems or not. Sender Policy Framework (SPF) uses text records in DNS to identify all systems that should legitimately be sending mail for a domain. DomainKeys Identified Mail (DKIM) uses digital signatures to verify the authenticity of a message, again relying upon text records in DNS to publish the public key of the sending system. Both have their advocates and opponents, and neither is as widely used today as I’d like to see. Because they are not yet widely adopted, both have to be part of a series of tests rather than a pass/fail check on their own, or you run the risk of rejecting significant amounts of legitimate mail.
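To show what an SPF check boils down to, here is a deliberately partial sketch. It takes the text of an SPF record and a sending IP address, and evaluates only the `ip4`, `ip6`, and `all` mechanisms. Mechanisms that need further DNS lookups (`include`, `a`, `mx`, and so on) are not handled and raise an error, and fetching the TXT record itself is left out. The function name is made up for the example.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def check_spf(record, sender_ip):
    """Evaluate a simple SPF record (ip4, ip6, all only) against a sender IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    ip = ipaddress.ip_address(sender_ip)

    for term in terms[1:]:
        qualifier = "+"  # a mechanism with no qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()

        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
            continue
        # Assumption of this sketch: anything needing DNS is out of scope.
        raise NotImplementedError(f"mechanism not handled in this sketch: {term}")

    return "neutral"  # nothing matched and there was no "all" term
```

With the record `v=spf1 ip4:192.0.2.0/24 -all`, mail from 192.0.2.10 passes and mail from anywhere else fails. A result like `softfail` is a good example of something to feed into a wider scoring system rather than act on alone.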
Of course, today’s anti-spam systems are a multi-layered combination of many different technologies, each playing its own important part in the overall effort to block spam. In our next post, we’ll talk about those combination approaches and how they are the most effective solution we have to block spam.