Ten years ago we measured mailbox sizes in megabytes. A 20 MB mailbox was adequate. A 100 MB mailbox was a luxury.
Today we measure mailbox sizes in gigabytes. A single message in today’s email communications could easily consume the entire mailbox quota of a decade ago. We’re sending more email, bigger email, and keeping it longer.
Email server products such as Microsoft Exchange Server have responded to this growth in storage needs with support for more processing power, more efficient database schemas, and improved performance on storage hardware.
In fact, most of the storage performance gains of the last 4 years have been in the efficiency of the Exchange Server product itself, not in the performance capabilities of storage hardware. Hard disks are getting bigger, but they aren’t getting faster.
As we become more reliant on the ability to retain and access email data quickly, it is no surprise that we are storing more and more of it in our mailboxes. This increase in email storage exposes a new bottleneck in IT systems: the ability to back the data up adequately.
Backups are experiencing similar growing pains to disk storage. Tape speeds and capacities increase through new generations of the technologies, but when disk speeds and network speeds don’t increase with them there is only so much throughput that you can achieve. Eventually many larger enterprises reach a stage in which a nightly, full backup of the Exchange system is not possible within the backup window.
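The backup window problem comes down to simple arithmetic. The following is a minimal sketch of that calculation; the function names, data sizes, and throughput figures are illustrative assumptions, not measurements from any real Exchange environment.

```python
def backup_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy data_gb gigabytes at a sustained throughput.

    Assumes 1 GB = 1024 MB and that throughput is limited by the slowest
    link in the chain (disk, network, or tape).
    """
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_gb: float, throughput_mb_per_s: float,
                window_hours: float) -> bool:
    """True if a full backup of data_gb completes within the window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures: 2 TB of mailbox data, 50 MB/s sustained, 8-hour window.
    print(round(backup_hours(2048, 50), 2), fits_window(2048, 50, 8))
```

With these assumed figures, 2 TB at a sustained 50 MB/s takes roughly 11.7 hours, so a nightly full backup would overrun an 8-hour window no matter how fast the tape drive is.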
Three key technologies have surfaced to help enterprises manage these growth issues with email storage:
- Email Archiving
- Synthetic Backups
- Data De-duplication
Email archiving usually involves moving older, less frequently accessed data from the primary storage to a secondary storage system. The secondary storage system may be built in to the email server, such as Exchange Server 2010’s archiving feature, or it might come in the form of a third party product that integrates with Exchange.
The benefit of archiving in reducing backup load is that once the data is stored in the archive it can be subject to different backup schedules than primary email storage. While daily full backups of the primary storage might be a requirement, the archive stores may only require weekly or even monthly backups depending on the archive policies in place.
Meanwhile, good email archive systems still provide fast access to archived email items when required by end users.
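As a rough illustration of an age-based archive policy, the sketch below splits messages into a primary set and an archive set. The `Message` type, the `split_by_age` function, and the one-year cutoff are hypothetical; this is not how Exchange Server 2010 or any particular third-party product implements archiving.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Message:
    subject: str
    received: date
    size_mb: float


def split_by_age(messages, today, archive_after_days=365):
    """Return (primary, archive) lists based on message age.

    Messages received before the cutoff date move to the archive set,
    which can then follow a less frequent backup schedule.
    """
    cutoff = today - timedelta(days=archive_after_days)
    primary = [m for m in messages if m.received >= cutoff]
    archive = [m for m in messages if m.received < cutoff]
    return primary, archive


if __name__ == "__main__":
    mailbox = [
        Message("Old project report", date(2008, 3, 14), 12.0),
        Message("Current budget", date(2010, 5, 1), 3.5),
    ]
    primary, archive = split_by_age(mailbox, today=date(2010, 6, 1))
    print(sum(m.size_mb for m in primary), "MB stays in the daily backup")
```

Only the primary set has to fit inside the nightly backup window; the archive set could be backed up weekly or monthly, depending on policy.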
A synthetic backup combines the efficiency of an incremental backup (in which only data that has changed since the last backup is backed up) with the restorability of a full backup, by combining data from the incremental backup with existing data in the backup system from earlier backups to form a new, full backup.
In other words, if a file is already stored in the backup system and hasn’t changed, the backup system doesn’t need to copy it from the server again; it simply uses its existing local copy to “stitch together” a complete backup of the server. Because not all data on a server is likely to change every day, the backup takes far less time than a full backup would, but achieves the same end result.
These synthetic backups can then be duplicated on to removable storage such as tape media to send offsite for longer term storage.
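The stitching step of a synthetic backup can be sketched as a merge of the previous full backup with the latest incremental. This is a simplified model under stated assumptions: backups are represented as dictionaries mapping a file path to its contents, and `synthesize_full` is a hypothetical name, not a function from any backup product.

```python
def synthesize_full(previous_full, incremental, deleted=()):
    """Build a new full backup without re-reading unchanged files.

    previous_full: path -> bytes, already held by the backup system
    incremental:   path -> bytes, only files changed or added since then
    deleted:       paths removed from the server since the last backup
    """
    new_full = dict(previous_full)   # reuse the existing local copies
    new_full.update(incremental)     # overlay changed and new files
    for path in deleted:
        new_full.pop(path, None)     # drop files that no longer exist
    return new_full


if __name__ == "__main__":
    last_full = {"db1.edb": b"v1", "e00.log": b"aaa", "e01.log": b"bbb"}
    tonight = {"db1.edb": b"v2", "e02.log": b"ccc"}
    full = synthesize_full(last_full, tonight, deleted=["e01.log"])
    print(sorted(full))
```

Only the files in the incremental cross the network; everything else in the new full backup comes from data the backup system already holds.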
Data de-duplication for backups means that multiple copies of the same data do not need to be backed up individually. This is particularly effective in email systems: when 100 people all receive the same email attachment, for example, only one copy of that attachment needs to be backed up.
This reduces not only the amount of backup storage needed but also the amount of backup traffic generated. When de-duplication occurs at the backup client itself, less data has to be transmitted to the backup server, which reduces overall backup times while still achieving a full backup.
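One common way to implement de-duplication is to identify data by a hash of its content. The sketch below assumes whole-attachment SHA-256 hashing and uses a hypothetical `DedupStore` class; real products typically work on smaller blocks or chunks and differ in many details.

```python
import hashlib


class DedupStore:
    """Toy backup store that keeps one copy of each unique piece of data."""

    def __init__(self):
        self.blobs = {}       # content digest -> data, stored once
        self.catalog = {}     # (mailbox, item name) -> content digest
        self.bytes_sent = 0   # data actually transmitted to the store

    def backup(self, mailbox, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # First time this content is seen: transmit and store it.
            self.blobs[digest] = data
            self.bytes_sent += len(data)
        # Every mailbox still gets its own catalog entry for restores.
        self.catalog[(mailbox, name)] = digest

    def restore(self, mailbox, name):
        return self.blobs[self.catalog[(mailbox, name)]]


if __name__ == "__main__":
    store = DedupStore()
    attachment = b"x" * 1000
    for i in range(100):
        store.backup(f"user{i}", "report.pdf", attachment)
    print(len(store.blobs), store.bytes_sent)
```

In this model, 100 mailboxes holding the same 1,000-byte attachment result in one stored copy and 1,000 bytes transmitted, yet each mailbox can still restore its own item.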
Thanks to these three technologies (archiving, synthetic backups, and de-duplication), enterprises can meet their growing email storage needs while still maintaining a reliable and effective backup regime.