Optimized Data Services Part II
Friday, September 19th, 2008

Last time I suggested a new model for data management, the Optimized Data Services (ODS) model: an integrated suite of capabilities that would operate in a virtual environment and intelligently manage data use, protection, and archiving, in essence creating a service-oriented utility model for critical data services. Today I’d like to continue on this topic, covering the need for integrated risk management, deduplication, and general interoperability.
In my vision, an ODS utility expands beyond simple storage virtualization and pooling. It must also include data protection, thin provisioning and capacity on demand, storage resource management, intelligent integration with applications, and risk mitigation. Risk mitigation means the solution would need to provide built-in encryption and inherent off-site replication of all data sets. To reduce WAN costs, data optimization over WAN links would also need to be included.
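Thin provisioning and capacity on demand are easiest to picture in code. The following is a minimal Python sketch, assuming fixed-size blocks and a single shared physical pool. `BlockPool`, `ThinVolume`, and `PoolExhausted` are hypothetical names for illustration, not any vendor's API. Each volume advertises a large logical size but draws a physical block from the shared pool only the first time that block is written. The pool can then be grown without touching the volumes.

```python
from dataclasses import dataclass, field


class PoolExhausted(Exception):
    """Raised when the shared physical pool has no free blocks left."""


@dataclass
class BlockPool:
    """Hypothetical shared pool of physical capacity, counted in blocks."""
    capacity: int
    used: int = 0

    def allocate(self) -> None:
        if self.used >= self.capacity:
            raise PoolExhausted("physical pool is full; add capacity")
        self.used += 1

    def grow(self, extra_blocks: int) -> None:
        # Capacity on demand: add physical blocks without resizing any volume.
        self.capacity += extra_blocks


@dataclass
class ThinVolume:
    """Hypothetical volume whose logical size may exceed its physical backing."""
    pool: BlockPool
    logical_blocks: int
    block_size: int = 4096
    # Logical block number -> contents; only blocks written so far are present.
    blocks: dict = field(default_factory=dict)

    def _check(self, lbn: int) -> None:
        if not 0 <= lbn < self.logical_blocks:
            raise IndexError(f"logical block {lbn} out of range")

    def write(self, lbn: int, data: bytes) -> None:
        self._check(lbn)
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        if lbn not in self.blocks:
            # First write to this block: only now consume physical space.
            self.pool.allocate()
        self.blocks[lbn] = data

    def read(self, lbn: int) -> bytes:
        self._check(lbn)
        # Unwritten blocks read back as zeros and occupy no physical space.
        return self.blocks.get(lbn, bytes(self.block_size))


if __name__ == "__main__":
    pool = BlockPool(capacity=2)
    # Two volumes each advertise 1,000 blocks, backed by a 2-block pool.
    a = ThinVolume(pool, logical_blocks=1000)
    b = ThinVolume(pool, logical_blocks=1000)
    a.write(0, b"a" * 4096)
    b.write(7, b"b" * 4096)
    a.write(0, b"c" * 4096)  # overwrite: no new allocation
    print(pool.used)  # 2
    pool.grow(10)     # capacity on demand
    a.write(1, b"d" * 4096)
    print(pool.used)  # 3
```

Over-commitment like this stays safe only while the pool is monitored and grown ahead of demand. In the sketch, `PoolExhausted` marks the point where a real system would alert or expand capacity.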
Since many organizations are obligated by law or regulation to provide copies on removable media, the ability to integrate transparently with tape formats and tape-based archiving would also be beneficial. Because tape is also inexpensive and removable, the solution should use it for long-term archives, moving data transparently to tape-based media based on policy. The solution should encrypt that media automatically, without requiring expensive encryption-capable tape drives or libraries. Furthermore, archived data must be stored immutably for compliance and must be searchable for audit purposes. All data sets should also be deduplicated so that only a single instance of every data object is stored in the archive.
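Single-instance storage, immutability, and policy-driven movement to tape can be shown together in a minimal Python sketch. It assumes content is identified by its SHA-256 digest and that a simple age threshold drives migration to tape. `SingleInstanceArchive`, `target_tier`, and the 90-day default are hypothetical choices for illustration. Encryption is left out because it would require a third-party cryptography library.

```python
import hashlib
from dataclasses import dataclass, field


def target_tier(age_days: int, tape_after_days: int = 90) -> str:
    """Hypothetical policy: keep recent data on disk, move older data to tape."""
    return "tape" if age_days >= tape_after_days else "disk"


@dataclass
class SingleInstanceArchive:
    """Stores each unique object once, keyed by SHA-256, and never overwrites."""
    objects: dict = field(default_factory=dict)   # digest -> content
    catalog: dict = field(default_factory=dict)   # archived name -> digest

    def archive(self, name: str, data: bytes) -> str:
        if name in self.catalog:
            # Immutability: an archived name cannot be rebound to new content.
            raise PermissionError(f"{name!r} is already archived")
        digest = hashlib.sha256(data).hexdigest()
        # Keep the payload only the first time this exact content is seen.
        self.objects.setdefault(digest, data)
        self.catalog[name] = digest
        return digest

    def retrieve(self, name: str) -> bytes:
        return self.objects[self.catalog[name]]

    def search(self, term: str) -> list:
        # Minimal audit lookup: archived names containing the search term.
        return sorted(n for n in self.catalog if term in n)

    def stored_bytes(self) -> int:
        return sum(len(d) for d in self.objects.values())


if __name__ == "__main__":
    arc = SingleInstanceArchive()
    report = b"quarterly figures" * 100
    arc.archive("finance/q3-report.pdf", report)
    arc.archive("backup/q3-report.pdf", report)  # same bytes, second name
    print(len(arc.catalog), arc.stored_bytes())  # 2 1700
    print(arc.search("q3"))
    print(target_tier(30), target_tier(365))     # disk tape
```

Two catalog entries point at one stored payload, which is the "single instance" property. The `PermissionError` on re-archiving a name stands in for write-once (WORM) enforcement.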
Hash-based data deduplication should NOT, however, be used for structured data that applications need to access immediately. Because data deduplication usually implies hashing data into unique objects, a recovery process would have to “re-constitute” the data before an application could use it. Instead, the ODS utility should store structured data in native format, ready for application use and rapid recovery (within a couple of minutes). Rather than hashing, data should simply be stored more efficiently by monitoring the data stream and eliminating any “white space” within the file system or data blocks written by the application. During data replication, only these unique “sectors” of disk would need to be replicated and stored for recovery at the DR site. By simply storing data more efficiently, companies gain the benefits of data deduplication without the associated overhead or risk, and the data sets themselves are always instantly available for mounting to the same or a different application for recovery, testing, or DR. In fact, if data can be stored very efficiently, these space-efficient “images” can be used to retain multiple points in time over many days, making it possible to recover applications very rapidly to any point in time while saving costs.
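The “white space” elimination and sector-level replication described above can be sketched in Python. This is a minimal illustration, not the author's implementation. It assumes fixed 4 KiB blocks, treats any all-zero block as white space, and scans whole images rather than monitoring a live data stream. The function names are hypothetical. The stored image is still made of plain native-format blocks, so restoring it needs no hash lookups: the kept blocks are simply written back at their offsets.

```python
BLOCK_SIZE = 4096  # assumed block size


def capture_image(volume: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Keep only blocks containing at least one non-zero byte."""
    image = {}
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        if any(block):  # an all-zero block is "white space" and is skipped
            image[offset // block_size] = block
    return image


def replication_delta(previous: dict, current: dict):
    """Return (blocks to send, block numbers the DR copy should drop)."""
    changed = {i: b for i, b in current.items() if previous.get(i) != b}
    freed = sorted(set(previous) - set(current))
    return changed, freed


def apply_delta(replica: dict, changed: dict, freed: list) -> None:
    """Update the DR site's copy in place."""
    replica.update(changed)
    for i in freed:
        replica.pop(i, None)


def restore(image: dict, length: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the full volume in native format; missing blocks become zeros."""
    out = bytearray(length)
    for i, block in image.items():
        start = i * block_size
        out[start:start + len(block)] = block
    return bytes(out)


if __name__ == "__main__":
    vol = bytearray(BLOCK_SIZE * 8)          # 8-block volume, all white space
    vol[BLOCK_SIZE:BLOCK_SIZE + 5] = b"hello"
    snap1 = capture_image(bytes(vol))
    print(len(snap1))                        # 1

    vol[5 * BLOCK_SIZE] = 0x2A               # application writes to block 5
    snap2 = capture_image(bytes(vol))
    changed, freed = replication_delta(snap1, snap2)
    print(sorted(changed), freed)            # [5] []

    replica = dict(snap1)                    # DR site already holds snap1
    apply_delta(replica, changed, freed)
    assert restore(replica, len(vol)) == bytes(vol)
```

Keeping several such images (here `snap1` and `snap2`) is what allows recovery to earlier points in time. A real system would more likely share unchanged blocks between images than copy them, as this sketch does.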
The ODS utility should be flexible enough to accommodate not only existing protocols such as Fibre Channel and iSCSI, but also newer protocols such as FCoE and InfiniBand, so that rapid obsolescence can be avoided. Technology refresh of any component should be transparent to running applications, and it MUST be possible to perform maintenance with minimal or no downtime. Scalability should be a simple matter of adding more computing power, connections, or ports in a modular fashion. It should not be hampered by technical issues or by artificial limits on file systems, capacity, connectivity, or availability.