Storage Resiliency
Storage resiliency refers to the ability of a storage system to withstand and recover from various types of failures, such as hardware failures, power outages, cyber attacks, and network disruptions, without losing data or causing significant downtime. StorONE uses erasure coding to protect each virtual storage container (VSC, also referred to as a volume in the user interfaces and documentation), providing a customizable and resilient storage system.
Storage pools
The physical media (hard disk, SSD, and NVMe drives) used by the S1 storage engine are grouped into pools. A drive pool may contain drives of the same media type with different capacities and performance characteristics, but StorONE strongly recommends grouping drives with similar performance characteristics and capacities into the same pool. Generally, the drive technology (NVMe, SSD, or HDD) is the most important factor in determining a drive's performance behavior.
The S1 storage engine provides storage through logical volumes called Virtual Storage Containers (VSCs). Each VSC is defined with its own features, such as redundancy level, performance optimization, replication scheme, and snapshot schedule.
A single pool may provide storage for multiple volumes. The following are some of the characteristics of pools:
- Unlimited number of VSCs per storage pool. To optimize performance, however, StorONE recommends using no more than 3 or 4 volumes per pool.
- The pool is dynamic; physical drives may be added or removed from the pool as needed. When removing drives, however, you must wait for the rebuild of each drive to complete before removing another to avoid compromising the integrity of the volumes.
- No limit on the number or type of physical drives that can be made available for the pool.
- There is no need to format physical drives. The pool can use space from physical drives according to your predefined settings. Simply assign the drive to an existing pool to increase the capacity of the pool.
When you assign a physical drive to a pool, all of its free space becomes available to that pool. If the pool loses one of its physical drives, the affected data is recovered and redistributed across the remaining physical drives. However, if the pool runs out of space, VSCs may lose their redundancy level and become degraded. If a drive fails in a pool used by a VSC that has no redundancy, data loss can occur.
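The capacity condition described above can be sketched as a simple check: after a drive fails, the free space on the surviving drives must be large enough to hold the data that lived on the failed drive. This is an illustrative model only, not StorONE's placement logic. The `Drive` type, the `can_absorb_failure` helper, and the sample capacities are hypothetical, and the check ignores any erasure-coding layout constraints.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical drive record; sizes are in terabytes."""
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def free_tb(self) -> float:
        return self.capacity_tb - self.used_tb


def can_absorb_failure(pool, failed_name):
    """Return True if the surviving drives have enough free space
    to hold the data that was stored on the failed drive."""
    failed = next(d for d in pool if d.name == failed_name)
    free_elsewhere = sum(d.free_tb for d in pool if d.name != failed_name)
    return free_elsewhere >= failed.used_tb


# A nearly full pool: losing d1 leaves too little free space (assumed numbers).
tight_pool = [Drive("d1", 10, 6), Drive("d2", 10, 7), Drive("d3", 10, 8)]

# A pool with headroom: the same failure can be absorbed.
roomy_pool = [Drive("d1", 10, 4), Drive("d2", 10, 4), Drive("d3", 10, 4)]

tight_ok = can_absorb_failure(tight_pool, "d1")
roomy_ok = can_absorb_failure(roomy_pool, "d1")
```

In this model, the tight pool corresponds to the case where VSCs would become degraded, because the rebuilt data has nowhere to go.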
Erasure coding
Erasure coding (EC) is a method of encoding data with redundancy. It uses mathematical formulas to compute additional pieces, called parity blocks, so that missing data can be regenerated from the pieces that remain. vRAID is a StorONE-patented technology that implements erasure coding in a unique way. The StorONE storage engine uses vRAID technology to provide:
- Better availability
- Efficient storage
- Efficient disk recovery
The StorONE platform implements erasure coding to provide data resiliency and fast drive rebuild times. Compared with traditional erasure coding implementations, S1 technology is more efficient: it minimizes CPU utilization overhead and storage stack complexity, delivering high performance and low latency.
The following table illustrates some differences between traditional RAID and StorONE's patented vRAID:
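To make the idea of regenerating missing data from parity concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity block that can recover any one lost data block. This is a generic textbook technique chosen for illustration, not the vRAID algorithm, which is proprietary. The function names and sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Compute one parity block from the data blocks."""
    return xor_blocks(data_blocks)


def recover(surviving_blocks, parity):
    """Regenerate the single missing data block from the survivors and parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three data blocks, each imagined to live on a different drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)

# Simulate losing the drive that held the second block.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index]
rebuilt = recover(survivors, parity)
```

A single XOR parity block tolerates only one failure. Codes that tolerate several simultaneous failures use more parity blocks and more involved arithmetic, but the principle of recomputing lost pieces from surviving ones is the same.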
Traditional RAID | vRAID |
---|---|
Slow rebuild times (can take days to rebuild a failed drive) | Rapid rebuild time (typically within hours) |
Requires idle hot spares for reliability | Rebuilds across existing drives in the pool |
Uses parity drives for storing redundant information | Uses optimized erasure coding algorithm for storing redundant information |
Reliability
StorONE allows you to create a resilient virtual storage container (VSC) using less storage space for redundant information than a traditional parity RAID. StorONE uses erasure coding to save recovery data across all approved drives in the pool. As a result:
- Reliability is not dependent on a fixed number of physical drives.
- Admins can define the level of resiliency required on a per-volume basis.
- Physical drives may be added or removed from the pool without compromising the availability of data in the VSCs.
- Depending on your settings, the system can tolerate multiple drive failures simultaneously without losing data.
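The trade-off between resiliency level and space used for redundant information can be illustrated with a generic erasure-coding model in which every `k` data blocks are protected by `m` parity blocks. This is an assumption made for illustration: the source does not state how vRAID parameterizes redundancy, and the function names and example values below are hypothetical.

```python
def usable_fraction(k, m):
    """Fraction of raw capacity that holds user data in a k+m scheme."""
    return k / (k + m)


def tolerated_failures(m):
    """A k+m erasure code can lose up to m blocks per stripe."""
    return m


# Example settings (assumed): mirroring modeled as 1+1, and two k+m layouts.
schemes = {"mirror 1+1": (1, 1), "8+1": (8, 1), "8+2": (8, 2)}

summary = {
    name: (usable_fraction(k, m), tolerated_failures(m))
    for name, (k, m) in schemes.items()
}
```

Under this model, raising `m` increases the number of simultaneous drive failures a volume can survive, while a larger `k` keeps the share of capacity spent on redundancy low.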
Data recovery
With erasure coding, data is fragmented, encoded, and stored across a specified number of drives. Erasure coding allows much faster data recovery than traditional RAID storage: data can typically be rebuilt within hours instead of days.
Storage efficiency
Traditional RAID mandates allocating space in advance and applies the same redundancy level to all of its volumes. These requirements can have a profound impact on rebuild time: in the event of a disk failure, the disk recovery process can be very slow. To increase reliability, traditional RAID uses hot spares (preallocated disks), forcing the user to dedicate space that sits idle until a drive failure occurs. As a result, a large amount of storage space remains unused and unavailable.
Performance
With traditional RAID, when a drive failure occurs, the entire drive rebuild occurs on a single hot spare or replacement drive. In many (if not most) cases, this is a very slow operation that can take days.
In contrast, vRAID technology implements a proprietary optimized erasure coding algorithm. vRAID technology allows each VSC to have its own level of redundancy. Any drive in the pool used by the logical volume can store redundant data. When a disk failure occurs, the logical volume rebuilds the data across the remaining storage drives in the pool. With vRAID there is no need to rebuild the entire disk; only the lost information needs to be rewritten. Because there is no excessive writing and no single disk is overloaded, recovery is much faster with StorONE.
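A rough back-of-the-envelope model shows why the two approaches differ so much. It assumes that rebuild time is simply the amount of data written divided by the available write throughput. All numbers and function names are hypothetical inputs chosen for illustration; they are not StorONE measurements, and real rebuilds are also limited by read bandwidth, CPU, and foreground I/O.

```python
def hot_spare_rebuild_hours(drive_capacity_tb, spare_write_mb_s):
    """Whole drive is rewritten onto one spare: a single-drive bottleneck."""
    megabytes = drive_capacity_tb * 1_000_000
    return megabytes / spare_write_mb_s / 3600


def distributed_rebuild_hours(used_tb, surviving_drives, per_drive_write_mb_s):
    """Only the lost data is rewritten, spread across all surviving drives."""
    megabytes = used_tb * 1_000_000
    return megabytes / (surviving_drives * per_drive_write_mb_s) / 3600


# Assumed scenario: an 18 TB drive that was half full, in a 24-drive pool.
# The distributed case budgets less write bandwidth per drive, since the
# surviving drives are also serving normal I/O.
single_spare = hot_spare_rebuild_hours(drive_capacity_tb=18, spare_write_mb_s=100)
distributed = distributed_rebuild_hours(
    used_tb=9, surviving_drives=23, per_drive_write_mb_s=50
)
```

The two factors in the model match the two points in the text: less data is written (only the used portion), and the writes are parallelized across many drives instead of funneled into one.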