Does the Removal of Single Instance Storage Mean Less Efficient Exchange Servers?
Written by Paul Cunningham on December 17, 2009
The Register published an article describing the removal of Single Instance Storage (SIS) from the Exchange Server 2010 database engine. The comments on the article were largely critical of the change, declaring it a backward step in Exchange storage efficiency.
Before I discuss this further it helps to go back in time a little and understand what SIS is and how it came to be a part of Exchange Server.
Exchange Server Single Instance Storage
The essence of SIS is the storage of a single copy of data that is shared among different users or computers. In the case of Exchange Server this meant that a single copy of an email or file attachment was kept in the mailbox database, and any user who received that item accessed it via a pointer to that single instance.
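To make the pointer idea concrete, here is a minimal sketch in Python. It is a toy model only, not how the Exchange database engine is implemented; the class name, the integer item IDs, and the mailbox names are all assumptions made for illustration.

```python
class SingleInstanceStore:
    """Toy mailbox database: each delivered item is stored once,
    and every recipient's mailbox holds only a pointer (item ID)."""

    def __init__(self):
        self.items = {}      # item_id -> body bytes (the single instance)
        self.mailboxes = {}  # user -> list of item_ids (the pointers)
        self._next_id = 1    # hypothetical ID scheme for this sketch

    def deliver(self, recipients, body):
        """Store the body once and hand each recipient a pointer to it."""
        item_id = self._next_id
        self._next_id += 1
        self.items[item_id] = body
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(item_id)
        return item_id

    def read(self, user):
        """Resolve a user's pointers back to the stored bodies."""
        return [self.items[i] for i in self.mailboxes.get(user, [])]

    def stored_bytes(self):
        """Bytes actually held by the store (one copy per item)."""
        return sum(len(body) for body in self.items.values())


store = SingleInstanceStore()
attachment = b"x" * 1_000_000  # a 1 MB attachment
store.deliver(["alice", "bob", "carol"], attachment)

sis_bytes = store.stored_bytes()     # one shared copy
naive_bytes = len(attachment) * 3    # one copy per recipient, without SIS
```

With three recipients, the store holds one copy of the attachment rather than three, and each mailbox still reads back the full item.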
SIS was introduced in the Exchange Server product line in version 4.0 back in 1996. Back then disk storage was very expensive compared to today’s prices. Disks were much larger in physical size, smaller in capacity, and slower in performance. Under those conditions SIS made a lot of sense to reduce the overall size of mailbox databases.
SIS remained a part of the Exchange database engine right up to Exchange Server 2007. However, in that time the price of disk storage fell greatly. The database engine itself was also improved, and between Exchange Server 2003 and 2007 it saw as much as a 70% decrease in IOPS (input/output operations per second) requirements, meaning more users and larger databases could be stored on fewer disks. Some constraints still remained, though, such as a recommended maximum database size of 100 GB even with Exchange Server 2007. This meant that many organizations deployed multiple databases within their environment, and of course a lot of organizations have multiple servers as well.
Because of this distribution of mailboxes across multiple databases and servers the benefits of SIS were beginning to decrease. Remember that SIS is only effective within a single database. If an email is sent to recipients on multiple databases each database stores its own instance of the data.
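A rough way to see this effect is to count how many instances of one message get stored when its recipients are spread across databases. The sketch below assumes one stored instance per database that hosts at least one recipient; the function names, mailbox names, and database names are hypothetical, and the numbers are illustrative rather than measurements.

```python
def copies_stored(recipient_db):
    """Instances stored for one message when SIS works per database:
    one copy for each distinct database hosting a recipient."""
    return len(set(recipient_db.values()))


def sis_savings(recipient_db):
    """Fraction of copies avoided compared with one copy per recipient."""
    return 1 - copies_stored(recipient_db) / len(recipient_db)


# Twelve recipients all on one database: a single stored instance.
one_db = {f"user{i}": "DB1" for i in range(12)}

# The same twelve recipients spread evenly over four databases.
four_dbs = {f"user{i}": f"DB{i % 4 + 1}" for i in range(12)}

copies_one = copies_stored(one_db)      # 1
copies_four = copies_stored(four_dbs)   # 4
```

The more databases the recipients are spread over, the closer the stored copy count gets to the recipient count, and the less SIS saves.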
Instead of reducing database size by around 20%, SIS was delivering reductions of less than 10% in a lot of cases. At the same time the Exchange Server product team was looking for ways to further improve the database engine.
Exchange Server 2010 Database Engine Improvements
These improvements were delivered in Exchange Server 2010, with further IOPS reductions of up to 50% and an increase in maximum database size to 2 TB. They came thanks to a restructuring of the database schema to reduce the number of tables that need to be accessed by the average user session. A side effect of this change is that SIS is no longer possible under the new table structure.
You might be thinking that the loss of SIS will cause Exchange databases to balloon in size. Fortunately Microsoft has already addressed this by adding compression to attachments and header information stored in the database. Microsoft says that this compression has been shown to be as effective as SIS, and in some cases more effective, at reducing overall database size. Furthermore, the processing overhead required to compress and decompress this data has been more than offset by the overall performance gains from the database schema optimizations.
More Efficient Server, Less Efficient Storage?
So what about data duplication then? Many critics say that introducing data duplication into the Exchange database engine is a serious mistake and a step backward. The trend in data architecture is towards data de-duplication.
These concerns are somewhat unwarranted though, for a few reasons:
- Data duplication is a problem that extends far beyond Exchange Servers. Consider the average network drive full of multiple draft versions of documents. Once the compression gains are taken into account, the contribution of Exchange to this problem is negligible, if there is any at all.
- The use of email as a file sharing medium is slowly reducing and this trend will continue. Products such as SharePoint allow shared files to be referenced in email as a link instead of as a full attachment.
- Disk storage is very cheap, especially the SATA/SAS drives that are now completely capable of hosting the highly efficient Exchange Server 2010 databases.
- Where organizations choose instead to use more expensive SAN storage for Exchange Server 2010, de-duplication can occur at the block level as many SANs include this capability themselves.
- In a highly available Exchange Server 2010 environment data is going to be duplicated several times anyway thanks to the Database Availability Group feature.
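The block-level de-duplication mentioned in the list above can be sketched as follows. This is a simplified illustration, not a description of any particular SAN product: the fixed 4096-byte block size, the use of SHA-256 digests, and the assumption that duplicate data lands on aligned block boundaries are all choices made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def dedup_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each
    distinct block, plus the ordered list of digests needed to rebuild."""
    unique = {}   # digest -> block bytes (stored once)
    layout = []   # digests in original order (the block pointers)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        unique.setdefault(digest, block)
        layout.append(digest)
    return unique, layout


def rebuild(unique, layout):
    """Reassemble the original data from the stored blocks."""
    return b"".join(unique[d] for d in layout)


# An attachment made of four distinct blocks, written to disk three times
# (for example, once per recipient mailbox without SIS).
attachment = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
volume = attachment * 3

unique, layout = dedup_blocks(volume)
logical_blocks = len(layout)    # blocks the volume appears to contain
physical_blocks = len(unique)   # blocks actually kept after de-duplication
```

Here twelve logical blocks reduce to four physical ones. In practice the saving depends on how the database lays data out on disk and on whether the stored data is compressed, so duplicate messages will not always produce identical blocks.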


