Storage Virtualization: An Overview
The concept of virtualization has been around for some time. Virtualization is simply the abstraction of a physical entity or construct into a logical representation of it. Most of the time, the term “virtualization” is tied to server virtualization, a technology made popular by VMware, Microsoft, Xen, and others.
However, while server virtualization is the hot trend in enterprise IT, storage virtualization is making significant strides in functionality; people just do not realize it yet.
Storage virtualization abstracts the physical storage of data into logical constructs inside the storage device.
Within the context of a storage system, there are two primary types of virtualization that can occur:
- Block virtualization, in this context, refers to the abstraction (separation) of logical storage (a partition) from physical storage so that it can be accessed without regard to the physical layout or the heterogeneous structure of the underlying devices. This separation gives the administrators of the storage system greater flexibility in how they manage storage for end users.
- File virtualization addresses the NAS challenges by eliminating the dependencies between the data accessed at the file level and the location where the files are physically stored. This provides opportunities to optimize storage use and server consolidation and to perform non-disruptive file migrations.
Let’s take a quick look at how storage virtualization is taking shape:
Traditional storage: Single disk
- A data consumer issues read/write requests. The disk controller either reads or writes to specific locations on disk.
RAID: Multiple disks
- This is one of the most widely used implementations of storage virtualization. While it may not seem like it, the data storage environment is indeed virtualized.
- Multiple disks are aggregated into a storage structure to increase capacity, increase resiliency, or both.
- A data consumer issues read/write requests. The storage controller determines which storage devices contain the data, assembles the response from (potentially) multiple devices, and returns it to the consumer. The data is no longer on a single device.
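To make the controller's role concrete, here is a minimal sketch of how a logical block number could be mapped to a physical location. It assumes plain RAID 0 striping with a stripe unit of one block and no parity; the function names `locate_block` and `read_range` are illustrative and not taken from any particular product.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes RAID 0 striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive logical blocks rotate across the disks.
    return logical_block % num_disks, logical_block // num_disks


def read_range(start: int, count: int, num_disks: int) -> list[tuple[int, int]]:
    """List the physical locations visited to satisfy one read request."""
    return [locate_block(block, num_disks) for block in range(start, start + count)]
```

With four disks, a request for logical blocks 2 through 5 touches all four devices, yet the consumer only ever sees one contiguous address range.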
LUN: Multiple logical storage devices
- This takes RAID to the next level.
- A group of disks is placed into an array structure. The disks are aggregated in some fashion (typically in RAID levels). A subset of the aggregated capacity is then carved out and presented to a data consumer as a LUN. The LUN is a logical storage device for a consumer.
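The carving step can be sketched as a simple range allocation. This is an illustration under stated assumptions: the `RaidGroup` and `Lun` classes are hypothetical, each LUN is a contiguous block range, and the RAID layer underneath is treated as one flat address space.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    lun_id: int
    start_block: int   # first block of this LUN inside the RAID group
    size_blocks: int

    def to_array_block(self, lun_block: int) -> int:
        """Translate a LUN-relative block number to an array block number."""
        if not 0 <= lun_block < self.size_blocks:
            raise IndexError("block is outside the LUN")
        return self.start_block + lun_block


@dataclass
class RaidGroup:
    capacity_blocks: int
    luns: list[Lun] = field(default_factory=list)
    next_free: int = 0  # next unallocated block in the group

    def create_lun(self, size_blocks: int) -> Lun:
        """Carve a contiguous subset of the group's capacity into a LUN."""
        if size_blocks <= 0:
            raise ValueError("size_blocks must be positive")
        if self.next_free + size_blocks > self.capacity_blocks:
            raise ValueError("not enough free capacity in the RAID group")
        lun = Lun(len(self.luns), self.next_free, size_blocks)
        self.luns.append(lun)
        self.next_free += size_blocks
        return lun
```

Each consumer addresses its LUN starting from block 0, and the translation to the shared array stays hidden inside the storage device.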
Storage pooling: Spanning multiple drive array types
- Multiple tiers of storage are created based on storage device profile (capacity and performance). Each tier is typically a RAID group or another physical storage enclosure.
- The storage device creates a higher-level structure, called a pool, of which the various performance tiers are members. The pool structure is presented to the data consumer at the LUN level.
- The storage controller stores metadata about which data blocks reside in which tier, and their location inside the tier.
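The metadata described above might look something like the following sketch. The `Tier` and `StoragePool` classes, and the policy of placing new blocks in the fastest tier that still has room, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    capacity_blocks: int
    used_blocks: int = 0

    def allocate(self) -> int:
        """Reserve the next free block in this tier and return its offset."""
        if self.used_blocks >= self.capacity_blocks:
            raise RuntimeError(f"tier {self.name} is full")
        offset = self.used_blocks
        self.used_blocks += 1
        return offset


@dataclass
class StoragePool:
    tiers: list[Tier]  # ordered fastest first (an assumed convention)
    # Metadata: LUN block number -> (tier name, offset inside that tier)
    block_map: dict[int, tuple[str, int]] = field(default_factory=dict)

    def write(self, lun_block: int) -> tuple[str, int]:
        """Return where a LUN block lives, placing it on first write."""
        if lun_block in self.block_map:
            return self.block_map[lun_block]
        for tier in self.tiers:
            if tier.used_blocks < tier.capacity_blocks:
                location = (tier.name, tier.allocate())
                self.block_map[lun_block] = location
                return location
        raise RuntimeError("pool is full")
```

The consumer only supplies the LUN block number; the `block_map` lookup is what lets the controller find the tier and the position inside it.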
Data migration: Moving data around
- Building on top of storage pools, storage controllers (via metadata) are able to determine the data access patterns for individual blocks of data.
- Frequently accessed data is moved to the highest-performing tier of disk, while less frequently accessed data is moved to a lower-performing tier.
- This migration occurs without the knowledge of the data consumer. The consumer sees the storage as a LUN and does not know (or care) about what happens as long as the data is available.
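A rough sketch of the migration decision follows. It assumes only two tiers, labeled "fast" and "slow", and a simple policy that keeps the most frequently accessed blocks in the fast tier; real controllers use more elaborate heuristics, and `plan_moves` is a hypothetical name.

```python
def plan_moves(
    placement: dict[int, str],
    access_counts: dict[int, int],
    fast_slots: int,
) -> list[tuple[int, str, str]]:
    """Return (block, from_tier, to_tier) for every block that should move.

    placement maps each block to "fast" or "slow"; access_counts holds the
    number of recent accesses per block (missing blocks count as zero).
    """
    # Rank blocks by access count, highest first; ties break on block number.
    ranked = sorted(placement, key=lambda b: (-access_counts.get(b, 0), b))
    hot = set(ranked[:fast_slots])

    moves = []
    for block in sorted(placement):
        target = "fast" if block in hot else "slow"
        if placement[block] != target:
            moves.append((block, placement[block], target))
    return moves
```

Only the tier assignment changes. The block numbers the consumer uses stay the same, which is why the migration is invisible from the consumer's side.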
Deduplication: Sharing common data
- Many data structures share the same data patterns. Microsoft Word files share the same framework across all files, regardless of content. Microsoft Windows servers all have common files. Conceptually, deduplication addresses the idea of “Why store multiple copies of the same data over and over again?”
- Depending on the deduplication algorithm in use, the storage device processes existing data to determine whether any duplicate data exists.
- In the event of duplicate data, the storage controller creates pointers to the common data. Common blocks are replaced by a pointer, and the overall storage footprint is reduced.
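The pointer idea can be illustrated with a small content-addressed store. This sketch makes several assumptions: it fingerprints blocks with SHA-256, it deduplicates at write time for brevity (a device that processes existing data would apply the same comparison after the fact), and the `DedupStore` class is hypothetical.

```python
import hashlib


class DedupStore:
    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> single stored copy
        self.pointers: list[str] = []       # logical block index -> fingerprint

    def write(self, data: bytes) -> int:
        """Store a block and return its logical index."""
        fingerprint = hashlib.sha256(data).hexdigest()
        # Keep the data only if this content has not been seen before.
        self.blocks.setdefault(fingerprint, data)
        self.pointers.append(fingerprint)
        return len(self.pointers) - 1

    def read(self, index: int) -> bytes:
        """Follow the pointer for a logical block back to the stored copy."""
        return self.blocks[self.pointers[index]]
```

Writing the same content twice adds a second pointer but no second copy, so the physical footprint grows only with unique data.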
Thin provisioning: Not allocating storage at creation time
- This functionality operates under the theory that space may be allocated but never fully used, resulting in unused space that cannot be used by anyone else.
- The storage controller receives a request to allocate space for a data consumer. The controller creates the basic framework that represents a LUN. However, internal to the storage device, the space is not allocated. Rather, the LUN is basically authorized to consume a specific amount of disk space.
- As the data consumer writes more data, the space backing the LUN grows on the storage controller until the LUN’s full size is allocated. Until then, the unused space remains available for other purposes.
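- Because the combined size of all thin LUNs can exceed the physical capacity actually installed (over-allocation), consumption needs to be monitored so the storage device does not run out of space.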
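The allocate-on-write behavior and the over-allocation risk can be sketched together. The `ThinPool` class below is a hypothetical model: it tracks allocation per block and treats the ratio of promised capacity to physical capacity as the number worth monitoring.

```python
class ThinPool:
    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.lun_sizes: dict[str, int] = {}       # LUN name -> authorized size
        self.allocated: dict[str, set[int]] = {}  # LUN name -> blocks backed by disk

    def create_lun(self, name: str, size_blocks: int) -> None:
        """Authorize a LUN size without reserving any physical space."""
        if name in self.lun_sizes:
            raise ValueError("LUN already exists")
        self.lun_sizes[name] = size_blocks
        self.allocated[name] = set()

    def used_blocks(self) -> int:
        return sum(len(blocks) for blocks in self.allocated.values())

    def write(self, name: str, block: int) -> None:
        """Back a block with physical space the first time it is written."""
        if not 0 <= block < self.lun_sizes[name]:
            raise IndexError("block is outside the LUN")
        if block in self.allocated[name]:
            return  # already backed by physical space
        if self.used_blocks() >= self.physical_blocks:
            raise RuntimeError("pool is out of physical space")
        self.allocated[name].add(block)

    def oversubscription_ratio(self) -> float:
        """Promised capacity divided by physical capacity."""
        return sum(self.lun_sizes.values()) / self.physical_blocks
```

Two 80-block LUNs on a 100-block pool are created without error, but writes begin to fail once the consumers together touch more than 100 distinct blocks, which is the situation monitoring is meant to catch early.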
Storage Virtualization Continues to Advance
As you can see, from traditional single-disk storage to thin provisioning, storage virtualization has played a major role in advancing how we use our storage infrastructure and reap the benefits of our investments.
Storage virtualization techniques and technology continue to advance. Object storage, pNFS, and server virtualization functional offload will become more commonplace as new storage device models and feature sets are developed and introduced.
Storage virtualization is not the process of storing virtual machine disks. Rather, it is a beast of its own, and continues to provide valuable benefits.