Attachment spam and its evolution

Spam has become a regular feature of today’s inboxes, especially now that it arrives as text-based spam, image spam and attachment spam. Unfortunately, it is the downside of having an email address that is in regular use, and most people have come to accept it. Many users can tell which mail is spam and which is legitimate; however, spammers are con artists who continuously try to camouflage spam so that it appears to be valid email coming from credible sources and containing valid content, fooling both users and anti-spam filtering software. The latest trend in spam creativity is attachment spam.

Research shows that between 65% and 90% of all email received is considered spam.

On an individual user basis, spam is annoying; it is a waste of time and often contains spyware, malware and even pornography, which can harm a computer and offend the recipient’s morals. On a company-wide basis the same threats apply; however, the financial cost of managing spam must also be taken into account.

Text-based spam and image spam

Most spam tends to be text-based, a format that most anti-spam software can deal with and filter from legitimate mail. To block spam, email service providers and companies often relied on keyword detection, drawing up a list of keywords, such as ‘viagra’ or ‘bank’, that commonly appeared in spam email. However, this method often blocked genuine email, and adding more keywords simply resulted in more false positives. Spammers, meanwhile, found a way around keyword blocking by replacing keywords such as ‘viagra’ with variants such as ‘v1agra’.
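The weakness of plain keyword detection can be illustrated with a short Python sketch. The blocklist, the substitution table and the function names below are assumptions made for illustration only; they are not taken from any particular anti-spam product.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_KEYWORDS = {"viagra"}

# Assumed table of common character substitutions mapped back to letters.
SUBSTITUTIONS = str.maketrans(
    {"1": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s"}
)


def tokens(text):
    """Lowercase the text and split it into simple tokens."""
    return re.findall(r"[a-z0-9@$]+", text.lower())


def naive_match(text):
    """Plain keyword detection: exact token match against the blocklist."""
    return any(tok in BLOCKED_KEYWORDS for tok in tokens(text))


def normalized_match(text):
    """Undo simple character substitutions before matching."""
    return any(
        tok.translate(SUBSTITUTIONS) in BLOCKED_KEYWORDS for tok in tokens(text)
    )
```

With these definitions, `naive_match("Buy v1agra now")` returns `False` while `normalized_match("Buy v1agra now")` returns `True`. Normalization of this kind catches only the substitutions it has been told about, and a longer blocklist still carries the false-positive risk described above.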

They also sought to bypass spam filters by adopting a different format, and therefore began using image spam. Image spam is created by storing the text of a message as a GIF or JPEG image, which is then displayed in the email. This prevents text-based spam filters from detecting and blocking the message. By making use of image spam, spammers were attacking the defenses of most anti-spam solutions; whilst the images displayed text messages to end-users, the anti-spam software was only able to see pixels. Often, image spam also contains nonsensical, computer-generated text which simply annoys the reader.

Some email anti-spam solutions turned to OCR (Optical Character Recognition) to convert the images into text that the software could then analyze. However, spammers took their images to the next level. Borrowing an approach usually applied to CAPTCHA (an anti-spam measure used on web forums), they started fuzzing images, that is, adding noise and distortions, to make it even harder for a machine to recognize the text. Although it is possible for a machine to read this text, the process is very CPU-intensive, especially when it has to handle large numbers of images every few seconds.

The rise of attachment spam, PDF spam, Excel spam and ZIP spam

To take spam to an even higher level, a technique that proved very popular emerged in June 2007: attachment spam, in which spam content is sent as a PDF, Excel or ZIP file attachment rather than embedded in the body of the email message.

This move was clever for a number of reasons:

  • Email users ‘expect’ spam to be an image or text within the body of the email and not an attachment.
  • Since most businesses today transfer documents using the PDF format, email users have to check each PDF document; otherwise they risk losing important documentation.
  • Because most anti-spam software products on the market are geared towards filtering the email itself and not its attachments, spam has a longer shelf-life within a network.
  • An attachment that is a PDF file has greater credibility in an email, thus making social engineering attacks much easier.
  • The ability to send large PDF files could result in a single spam attack causing huge bottlenecks on a company’s email server, reducing the quality and amount of bandwidth available.
  • By sending PDF attachments, spammers can also resort to phishing by attaching supposedly authentic documents from places such as a bank or service provider.

Seeing this new threat, anti-spam software vendors quickly came out with updates and filters that analyzed the body of every PDF file, thus contributing to a significant decrease in PDF attachment spam. Not to be defeated, spammers took less than a month to come out with a new option: Microsoft Excel files for pump-and-dump scams.

Once again, this targeted users for the same reasons as those mentioned above. Attachments are not ‘normal’ spam messages; Excel sheets are commonly used in companies, which gives them more credibility and plays on users’ uncertainty as to whether the attachment is legitimate, as well as their reluctance to miss an important company document. And because most anti-spam detection does not cover attachments, these messages are likely to end up in a user’s inbox. Once again, anti-spam vendors increased the level of detection in their software to include Excel files.
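Before a filter can analyze PDF or Excel attachments at all, it first has to find them inside the message. The following Python sketch shows one way to do that with the standard library `email` package. The function names and the sample message are hypothetical, and the sketch only lists attachments; it does not attempt to analyze their contents.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def list_attachments(raw_bytes):
    """Return (filename, content type, size in bytes) for each attachment."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        found.append((part.get_filename(), part.get_content_type(), len(payload)))
    return found


def build_sample():
    """Build a small message with a fake PDF attachment (illustration only)."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"%PDF-1.4 fake",
        maintype="application",
        subtype="pdf",
        filename="invoice.pdf",
    )
    return msg.as_bytes()
```

Calling `list_attachments(build_sample())` yields one entry for `invoice.pdf` with content type `application/pdf`. A real filter would hand each attachment to a format-specific analyzer; that step is outside the scope of this sketch.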

Not to be outdone, spammers concocted a new formula in August 2007 that involved compressing their text-based and Excel-based spam documents using the ZIP file format. This is effective for two main reasons:

  • Companies that do not use anti-virus software on their network could be easy targets for this type of spam.
  • Users who may not be aware of security issues surrounding attachments are prone to opening these ZIP files. With spammers and hackers thriving in their unholy alliance, the risk of malicious files being packaged with pump-and-dump spam is all too real.

The use of combinations of file formats that email users commonly rely on appears to be the spammers’ way forward.
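To see what a ZIP attachment contains without extracting it to disk, a filter can read the archive directory in memory. The sketch below uses Python's standard `zipfile` module; the extension list and function names are assumptions chosen for illustration, not a recommended policy.

```python
import io
import zipfile

# Assumed list of extensions worth a closer look, for illustration only.
SUSPICIOUS_EXTENSIONS = (".exe", ".scr", ".xls", ".pdf")


def inspect_zip(data):
    """Return (all member names, members flagged for deeper scanning)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    flagged = [n for n in names if n.lower().endswith(SUSPICIOUS_EXTENSIONS)]
    return names, flagged


def build_sample_zip():
    """Create a small in-memory ZIP archive with two placeholder files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("stock_tip.xls", b"placeholder spreadsheet bytes")
        archive.writestr("readme.txt", b"placeholder text")
    return buffer.getvalue()
```

Here `inspect_zip(build_sample_zip())` lists both members and flags only `stock_tip.xls`. Checking file extensions is a weak signal on its own, since names are trivial to change, so in practice the flagged members would still need to be scanned by content.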

Anti-spam filtering software for administrators

Spam continues to be a headache for administrators and end-users because spammers are constantly trying to stay one step ahead of anti-spam software vendors. Keyword detection alone will not solve the problem, because new spamming techniques such as attachment spam have overcome that hurdle. The solution lies in a product that deploys as many anti-spam techniques as possible, including Bayesian filtering and filtering for images and text embedded in different file-type attachments, while keeping false positives to a minimum. In this way, text-based, image-based and attachment spam will all be intercepted. Moreover, the package should be easy to install and manage without adding unnecessary administrative burdens, and it should handle spam efficiently with minimal end-user intervention.
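For readers unfamiliar with Bayesian filtering, the idea can be sketched in a few lines of Python. This is a minimal naive Bayes classifier with add-one smoothing, written for illustration under simplifying assumptions; the class name, tokenizer and training messages are hypothetical, and commercial products use considerably more refined models.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy word-based naive Bayes spam scorer (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record the words of one message labeled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text); needs at least one message per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.message_counts[label] / total_messages)
            for tok in self.tokenize(text):
                # Add-one smoothing so unseen words do not zero the score.
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert log scores to a normalized probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesFilter()
spam_filter.train("buy cheap stock now", "spam")
spam_filter.train("hot stock tip buy now", "spam")
spam_filter.train("meeting agenda attached", "ham")
spam_filter.train("quarterly report for the meeting", "ham")
```

After this training, `spam_filter.spam_probability("buy stock now")` is above 0.5 and `spam_filter.spam_probability("meeting report")` is below it. Because the score is learned from the words of real mail rather than from a fixed list, the same approach can be applied to text recovered from attachments once that text has been extracted.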