Welcome to Part 2 of our software-defined storage terms glossary. In Part 1, we covered Storage Types and Form Factors, and Architectures in the software-defined storage space. No worries if you missed the first part – below you can download a PDF containing all 37 terms.
Software-Defined Storage Data Services, Policies and Features
The enterprise storage stack typically comprises a multitude of features. In a software-defined storage (SDS) environment, these features are implemented and managed in software, independent of the underlying hardware.
Auto-balancing – The process of load balancing a storage system by intelligently distributing workloads and data across cluster nodes. Auto-balancing enables the storage system to take advantage of available CPU, memory, and disk resources in the cluster to deliver optimal performance, efficiency, and resiliency.
Auto-tiering – Also known as automated storage tiering, this process dynamically matches data to storage nodes based on performance needs and access patterns. Auto-tiering employs a combination of flash (SSD) and spinning disk media, intelligently placing data onto appropriate media and nodes in real time. For example, frequently accessed data might be positioned on the fastest nodes with the highest-performing media, such as flash, while infrequently accessed data might be positioned on slower nodes that host lower-cost, higher-capacity media like SATA (Serial Advanced Technology Attachment) drives, or even in the cloud.
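As a rough illustration of the placement decision described above, the sketch below maps a block's access rate to a tier. The threshold values and the tier names ("flash", "sata", "cloud") are assumptions made for this example, not settings from any particular product.

```python
# Hypothetical thresholds, in accesses per day, chosen only for illustration.
HOT_THRESHOLD = 100
WARM_THRESHOLD = 1


def choose_tier(accesses_per_day):
    """Return the name of the tier that suits the given access rate."""
    if accesses_per_day >= HOT_THRESHOLD:
        return "flash"   # frequently accessed data goes to the fastest media
    if accesses_per_day >= WARM_THRESHOLD:
        return "sata"    # occasionally accessed data goes to cheaper disks
    return "cloud"       # rarely accessed data can move off the cluster
```

A real tiering engine would re-evaluate placement continuously as access patterns change; this sketch only shows a single decision.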
Cloning – Creating a copy of a data set, typically to be used for an alternate purpose such as testing and development, backup and recovery, or quality assurance (QA) checks.
Compression – The process of reducing the number of bits needed to represent data to speed transmission over networks and save storage space.
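To make the idea concrete, here is a minimal sketch using the zlib module from Python's standard library. The sample data is made up and deliberately repetitive, so it compresses well; real-world ratios depend heavily on the data.

```python
import zlib

# Made-up, highly repetitive sample data.
data = b"software-defined storage " * 200

compressed = zlib.compress(data)       # fewer bytes to store or transmit
restored = zlib.decompress(compressed)  # lossless: the original comes back

ratio = len(data) / len(compressed)
print(f"{len(data)} bytes -> {len(compressed)} bytes (ratio {ratio:.1f}:1)")
```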
Deduplication – The process of eliminating repeating data to reduce storage needs and streamline data transmission. Solutions that offer global deduplication typically analyze data system-wide to maximize reduction by looking for patterns across a large amount of data.
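One common way to find repeating data is to split it into chunks and fingerprint each chunk with a hash. The sketch below assumes fixed-size chunks and SHA-256 fingerprints; the function names and chunk size are illustrative, and production systems often use variable-size chunking instead.

```python
import hashlib


def dedup_store(data, chunk_size=4096):
    """Split data into chunks and keep only one copy of each unique chunk."""
    store = {}    # fingerprint -> chunk bytes (each unique chunk stored once)
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

Data made of three identical chunks followed by one different chunk would be stored as two unique chunks plus a four-entry recipe.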
Disaster recovery (DR) policy – Refers to a definition, plan, or design of how an organization will ensure continuation or recovery of services in the event of a disaster such as equipment failure, security breach, fire, or natural disaster. In the technology arena, a DR policy dictates how data should be protected and recovered. This can include details on where data copies and backup systems should be located. It also typically defines recovery time objectives (RTOs) – the maximum time allowed between an outage and the resumption of operations, and recovery point objectives (RPOs) – the maximum amount of acceptable data loss measured in time.
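The two objectives can be checked with simple date arithmetic. The sketch below is illustrative only: the function name and the returned keys are assumptions, and the timestamps in the usage note are invented.

```python
from datetime import datetime, timedelta


def check_objectives(last_copy, outage, resumed, rpo, rto):
    """Compare one incident against recovery point and recovery time objectives."""
    data_loss_window = outage - last_copy   # changes made since the last protected copy
    downtime = resumed - outage             # time from outage to resumed operations
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }
```

For example, if the last copy was taken at 11:45, the outage hit at 12:00, and service resumed at 14:30, an RPO of 30 minutes is met (15 minutes of data at risk) while an RTO of 2 hours is missed (2.5 hours of downtime).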
Erasure coding – The process of breaking data into fragments, encoding them with additional redundant fragments, and storing the full set across different locations such as disks or nodes. If some fragments are lost or corrupted, the original data can be rebuilt from the fragments that remain.
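The simplest possible form of this idea is a single XOR parity fragment, which can rebuild exactly one lost fragment. The sketch below uses that scheme purely for illustration; production systems typically use codes such as Reed-Solomon that tolerate multiple losses. All function names here are assumptions.

```python
def xor_fragments(fragments):
    """XOR equal-length byte strings together, byte by byte."""
    result = bytearray(len(fragments[0]))
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            result[i] ^= byte
    return bytes(result)


def encode(data, k):
    """Split data into k equal fragments and append one XOR parity fragment."""
    fragment_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(fragment_len * k, b"\0")      # pad so the split is even
    fragments = [padded[i * fragment_len:(i + 1) * fragment_len] for i in range(k)]
    return fragments + [xor_fragments(fragments)]


def rebuild(fragments):
    """Recompute the single fragment marked None (only one loss is tolerated)."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = xor_fragments(survivors)      # XOR of the rest restores it
    return repaired


def decode(fragments, original_len):
    """Join the data fragments (all but the parity) and strip the padding."""
    return b"".join(fragments[:-1])[:original_len]
```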
Replication – The process of creating copies of data. SDS solutions that use a distributed systems approach replicate data across multiple nodes in a cluster. This provides an alternative to RAID (redundant array of independent disks), delivers protection against local failures, and can also ensure continuous operations in the case of a site-wide disaster. Replication can occur in a synchronous or asynchronous fashion. Both methods write data to primary and secondary locations.
- With synchronous replication, a write is acknowledged to a host only after primary and secondary locations have successfully stored the data.
- With asynchronous replication, a write is acknowledged when the primary storage location has successfully stored the data – the secondary copy proceeds independently.
SDS solutions can employ a mix of synchronous and asynchronous replication to deliver data consistency without introducing unacceptable latency or delay to participating hosts. For instance, a three-way replication might perform a synchronous operation between two nodes but allow the third copy to be written asynchronously. This technique is sometimes referred to as semi-synchronous or semi-sync.
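The difference between the modes comes down to how many copies must be stored before the write is acknowledged. The in-memory sketch below is a simplification under stated assumptions: the class names and the sync_count parameter are invented for this example, and a queue stands in for background replication.

```python
from collections import deque


class Replica:
    """Stands in for a storage node; holds blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, key, value):
        self.blocks[key] = value


class ReplicatedVolume:
    """Acknowledges a write after sync_count replicas have stored it."""

    def __init__(self, replicas, sync_count):
        self.replicas = replicas
        self.sync_count = sync_count   # all replicas = synchronous, 1 = asynchronous
        self.pending = deque()         # copies still to be made in the background

    def write(self, key, value):
        for replica in self.replicas[:self.sync_count]:
            replica.store(key, value)                   # completed before the ack
        for replica in self.replicas[self.sync_count:]:
            self.pending.append((replica, key, value))  # completed after the ack
        return "ack"

    def drain(self):
        """Simulate the background process finishing the deferred copies."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.store(key, value)
```

With three replicas and sync_count set to 2, the write is acknowledged once two nodes hold the data, and the third copy lands when drain() runs, which mirrors the semi-sync case.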
SDS solutions also provide controls to direct replication to physically distinct racks in a single data center (rack aware), and to physically distinct sites including private data centers and public clouds (data center aware).
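A rack-aware placement rule can be sketched as "never put two copies in the same rack." The function below is a minimal illustration with invented node and rack names; the same logic would apply to sites for data center awareness.

```python
def place_replicas(nodes, copies):
    """Pick `copies` nodes so that no two chosen nodes share a rack.

    `nodes` maps a node name to the name of the rack that houses it.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    chosen = []
    used_racks = set()
    for node, rack in sorted(nodes.items()):   # sorted for a repeatable result
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct racks for the requested copies")
```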
Self-healing – The ability to recreate, rebuild, or repair data from damaged nodes onto other nodes in a storage cluster. As an SDS system grows, the self-healing process takes advantage of the additional aggregate power and resources, accelerating the process.
Self-provisioning – The ability for users to configure and utilize applications and services without support from an IT administrator. One of the goals of SDS is to provide a feature-set that enables self-provisioning of storage resources to speed deployment of capacity to support users, services and applications.
Sequential writes – The process of writing large blocks of data contiguously to a storage device. Sequential writes contrast with random writes, in which small blocks of data are written individually, as they arrive, to locations scattered across a storage device. Sequential writes are typically faster than random writes and have a lower impact on the lifespan of SSD/flash drives. Some SDS solutions aggregate small, random I/O (input/output) into large sequential writes prior to committing to disk to streamline performance and ensure SSD-friendly operations.
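The aggregation technique mentioned above can be sketched as a buffer that collects small writes and emits them as fixed-size segments. This is a simplified illustration: the class name and segment size are assumptions, and a Python list stands in for the storage device.

```python
class WriteAggregator:
    """Collects small writes and flushes them as large, contiguous segments."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.buffer = bytearray()
        self.log = []   # stands in for the device: each entry is one sequential write

    def write(self, data):
        self.buffer += data
        # Emit a full segment whenever enough small writes have accumulated.
        while len(self.buffer) >= self.segment_size:
            self.log.append(bytes(self.buffer[:self.segment_size]))
            del self.buffer[:self.segment_size]

    def flush(self):
        """Write out whatever is left, even if it is smaller than a segment."""
        if self.buffer:
            self.log.append(bytes(self.buffer))
            self.buffer.clear()
```

Five 3-byte writes with an 8-byte segment size reach the device as two writes (8 bytes, then 7 bytes after a flush) instead of five.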
Snapshot – Traditionally, a read-only copy of a data set at a particular point in time. Snapshots provide a way to restore a data set to a prior state to recover from issues like data corruption. SDS solutions often implement highly efficient snapshot capabilities, capturing point-in-time states of data simply with a change to metadata.
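A metadata-only snapshot can be pictured as copying the map of block locations rather than the blocks themselves. The sketch below is one assumed way to model that in memory; the class and method names are invented for illustration and do not describe any specific product.

```python
class Volume:
    """A volume whose snapshots copy the block map, not the block data."""

    def __init__(self):
        self.block_map = {}   # metadata: block number -> block contents
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def write(self, block_no, data):
        # New data is bound to the block number; data referenced by earlier
        # snapshots is left untouched.
        self.block_map[block_no] = data

    def snapshot(self, name):
        # Copies references only, so the cost scales with metadata, not data.
        self.snapshots[name] = dict(self.block_map)

    def restore(self, name):
        """Return the volume to the state captured by the named snapshot."""
        self.block_map = dict(self.snapshots[name])
```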
Thin provisioning – The ability to create a storage volume without requiring pre-allocation and reservation of actual physical disk capacity. Thin provisioning enables more efficient use of available disk space by consuming physical storage only when actual data is stored. In contrast, traditional storage provisioning requires capacity to be dedicated and reserved at the time of provisioning.
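The contrast between advertised and consumed capacity can be shown with a sparse, in-memory volume. This is a minimal sketch: the class name, block size, and method names are assumptions for the example.

```python
class ThinVolume:
    """Advertises a large virtual size but allocates blocks only when written."""

    def __init__(self, virtual_blocks, block_size=4096):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self.allocated = {}   # block number -> data, only for blocks in use

    def _check(self, block_no):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block is outside the volume")

    def write(self, block_no, data):
        self._check(block_no)
        if len(data) > self.block_size:
            raise ValueError("data does not fit in one block")
        self.allocated[block_no] = data.ljust(self.block_size, b"\0")

    def read(self, block_no):
        self._check(block_no)
        # Blocks that were never written read back as zeros.
        return self.allocated.get(block_no, bytes(self.block_size))

    def virtual_bytes(self):
        return self.virtual_blocks * self.block_size

    def physical_bytes(self):
        return len(self.allocated) * self.block_size
```

A volume created with one million 4 KB blocks reports roughly 4 GB of virtual capacity while consuming nothing until the first write, after which it consumes a single block.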
Software-defined storage touches a number of different technology components. Familiarity with different types of components is integral to understanding how all the pieces fit together to orchestrate a modern, high-performance storage solution.
Bare metal – A computer with no operating system. Software installed onto a bare metal system typically forms the base operating environment for a given machine.
Container – Software container technology provides a lightweight and portable method for packaging an application, isolating it from the underlying operating system (OS) and physical infrastructure. Unlike virtual machines, containers do not include a full OS but instead share the OS kernel of the host. Software containers allow an application to be contained and abstracted to simplify deployment between different platforms. Examples include Docker and Linux Containers (LXC).
Container may also refer to a granular unit of data storage. For instance, Amazon S3 (Simple Storage Service) uses the term ‘bucket’ to describe a data container. In certain SDS solutions the data that makes up virtual disks is stored in logical containers housed on various nodes in a cluster.
Clustered file system – A file system where data is shared simultaneously by multiple hosts. Clustered file systems such as VMware’s VMFS (virtual machine file system) enable functions like seamless VM (virtual machine) failover and migration with vMotion to take place.
Metadata – Metadata is data that describes other data. Examples of metadata could include information like the date created, date modified, and file size of a certain data set. Metadata in some cases is stored with the described data itself, and in other cases is detached and stored separately from the described data.
Storage proxy – A program or process that runs on a host to provide access to storage provisioned on a storage cluster. A storage proxy typically presents provisioned volumes via one or more industry-standard protocols (e.g., iSCSI, NFS). The storage proxy intercepts application read and write requests, transmitting data to the underlying cluster while also tracking which nodes hold copies of the data to facilitate read requests as needed. Advanced storage proxies also provide services such as local caching and deduplication to facilitate efficient, high-speed reads and writes.
Quorum – A scenario in which a function or activity requires a response from a majority of participants to be considered successful. In SDS, as data is replicated to multiple nodes (see replication above), for a write to be considered successful, a majority of nodes must acknowledge receipt of data. For instance, with a three-way replication, two nodes must acknowledge successfully receiving the data.
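The majority rule reduces to a small calculation. The helper names below are invented for this sketch.

```python
def quorum_size(replicas):
    """Smallest number of acknowledgements that forms a majority."""
    return replicas // 2 + 1


def write_succeeded(acks, replicas):
    """A write counts as successful once a majority of replicas acknowledge it."""
    return acks >= quorum_size(replicas)
```

With three-way replication the quorum is two, so a write with two acknowledgements succeeds and a write with one does not. With five replicas the quorum is three.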
REST API – An application programming interface (API) that follows the representational state transfer (REST) architectural style to support interaction between clients and services. REST APIs typically run over Hypertext Transfer Protocol (HTTP) and use the same methods, such as GET, POST, PUT, and DELETE, that web browsers and other HTTP clients use when interacting with remote servers.
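To show how the HTTP methods map to storage operations, the sketch below builds (but does not send) requests with Python's standard urllib. The base URL, the /volumes path, and the JSON fields are entirely hypothetical and do not describe any real product's API.

```python
import json
from urllib.request import Request

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://storage.example.com/api/v1"


def build_request(method, path, body=None):
    """Build an HTTP request object; nothing is sent over the network here."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


list_volumes = build_request("GET", "/volumes")                 # read resources
create_volume = build_request("POST", "/volumes",
                              {"name": "vol1", "size_gb": 100})  # create a resource
resize_volume = build_request("PUT", "/volumes/vol1",
                              {"size_gb": 200})                  # update a resource
delete_volume = build_request("DELETE", "/volumes/vol1")        # remove a resource
```

Passing any of these objects to urllib.request.urlopen would send it, which only makes sense against a real endpoint.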
Storage pool – A logical grouping of multiple physical disks that are presented as a single entity.
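One simple way to present several disks as a single entity is to concatenate their address spaces. The sketch below assumes that layout purely for illustration (real pools often stripe or replicate instead), and the function name is invented.

```python
def locate(disk_sizes, logical_offset):
    """Map an offset in the pool to (disk index, offset on that disk).

    Assumes the pool simply concatenates the disks in order.
    """
    if logical_offset < 0:
        raise IndexError("offset must not be negative")
    for index, size in enumerate(disk_sizes):
        if logical_offset < size:
            return index, logical_offset
        logical_offset -= size   # skip past this disk and keep looking
    raise IndexError("offset is beyond the end of the pool")
```

For a pool of three disks sized 100, 200, and 50 units, offset 150 falls on the second disk at position 50.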