Hedvig Architecture Overview
Version 4.1
April 2017
Technical and Architectural Overview
Table of contents
Executive overview
Introduction
    Hedvig system and component overview
    Hedvig deployment options
    Hedvig technical differentiations
Core architecture and I/O flow
    Architecture basics
        Storage Pools and Containers
        Metadata, data, and cluster coordination processes
        Block, file, and object storage support
    Hedvig Storage Proxy Caches
        Metacache
        Block cache
        Dedupe cache
    I/O basics
        Summary of write operations
        Summary of the read operation
Enterprise storage capabilities
    Scalability and performance
    Storage efficiency
        Thin provisioning
        Deduplication
        Compression
        Compaction
        Auto-tiering
    Enterprise Resiliency
        High availability
        Non-disruptive Upgrades (NDU)
        Disk failures
        Replication
        Snapshots and Clones
    Manageability
        Hedvig GUI
        Hedvig CLI
        Hedvig RESTful APIs
    Ecosystem integration
        VMware
        Docker
        OpenStack
    Multitenancy
Summary and conclusion
Terms and definitions
Executive overview
Today’s enterprises are global, information-led businesses, regardless of what they
produce or the services they deliver. The agility and precision of IT systems mean the
difference between winning or losing customers, and translating new concepts into
revenue-producing products and services. The world’s most admired and best-run
businesses use IT for their competitive advantage.
Virtualization, automation, and self-service are now the cornerstones of modern enterprise
data centers. Traditional approaches to storage do not fit in this new paradigm. A new,
software-defined storage (SDS) approach is required to keep pace with the exponential
growth of data while still achieving automated, agile, and cost-effective infrastructure. This
approach is based on the hyperscale (sometimes referred to as web-scale) approaches
pioneered by Amazon, Facebook, and Google. Gartner Research predicts that by 2017,
web-scale IT will be an architectural approach found operating in 50% of global enterprises,
up from less than 10% in 2013. And when applied to storage, Gartner predicts enterprises
will experience a 50% reduction in storage total cost of ownership (TCO), in addition to
experiencing the benefits of scalability and flexibility from SDS offerings. Key components
of an effective hyperscale IT architecture include a commodity approach to hardware,
software-defined infrastructure, agile processes, and a collaboratively aligned organization
leveraging DevOps. Done right, this approach cost-effectively addresses issues like
exponential growth of data, shadow IT, and “siloed” infrastructure.
Hedvig provides software-defined storage built on a true, hyperscale architecture that uses
modern, distributed systems techniques to meet all of your primary, secondary, and cloud
data needs. The Hedvig Distributed Storage Platform transforms industry standard x86 or
ARM servers into a storage cluster that scales from a few nodes to thousands of nodes. Its
patented Universal Data Plane™ architecture stores, protects, and replicates data across
any number of private and public cloud data centers. The advanced software stack of the
Hedvig Distributed Storage Platform simplifies all aspects of storage with a full set of
enterprise data capabilities, which can be granularly provisioned at the application level
and automated via a complete set of APIs.
This whitepaper describes the Hedvig Distributed Storage Platform architecture, its
enterprise storage capabilities, and how Hedvig delivers business agility, IT flexibility,
competitive advantage, enhanced scalability, and significant cost reductions.
Introduction
Hedvig system and component overview
The Hedvig Distributed Storage Platform transforms how you deliver and manage
enterprise storage. It is fully programmable with a complete set of RESTful APIs that
simplify and automate management and plug into any orchestration framework. It consists
of three primary components (see Figure 1):
1. Hedvig Storage Service — The primary component in the architecture. The Hedvig
Storage Service is Hedvig's patented distributed systems engine. It installs on
commodity x86 or ARM servers and transforms existing server and storage assets —
including SSD/flash media and hard disk — into a full-featured elastic storage
cluster. The software deploys to on-premises infrastructure or to hosted or public
clouds to create a single storage cluster that is implicitly hybrid.
2. Hedvig Virtual Disk (vDisk) — The Virtual Disk is the fundamental abstraction unit
of the Hedvig Distributed Storage Platform. Organizations can spin up any number
of concurrent vDisks — each thinly provisioned and instantly available. A series of user-configurable attributes can be set when provisioning a vDisk (see Figure 2); a provisioning sketch follows this list. These attributes include:
Disk Type. Designate the type of storage protocol to use for the Virtual Disk — block or file. Note: Object containers/buckets are provisioned directly from OpenStack via Swift, or via the Amazon S3 API.
Block Size. Set the Virtual Disk block size to either 512 bytes or 4K. Object-based
Virtual Disks have a standard 64K block size.
Residence. Select the type of media on which the data resides. Hedvig provides
two options: hard drive (HDD) and Flash (including support for NVMe and 3D
NAND SSDs).
RDM. Select if raw device mapping (RDM) will be utilized with the Virtual Disk.
RDM provides a virtual machine with direct access to a LUN.
Clustered File System. Indicate if the Virtual Disk will be used with a clustered
file system such as VMFS. When selected, the Hedvig platform enables
concurrent read/write operations from multiple VMs or hosts.
Client-Side Caching. Select to cache data to local SSD or PCIe devices at the
application compute tier to accelerate performance.
Replication Policy. Set the policy for how data replicates across the cluster. The
policies are Agnostic, Rack Aware, or Data Center Aware. Refer to the Replication
section for further details.
Replication Factor. Designate the number of replicas for each Virtual Disk.
Replication factor is tunable, ranging from one to six.
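To make these attributes concrete, the following is a minimal provisioning sketch in Python using the requests library. The endpoint path, field names, and values are illustrative assumptions, not the documented Hedvig REST schema.

import requests

# Illustrative only: endpoint path and field names are hypothetical.
vdisk_spec = {
    "name": "sql-data-01",
    "diskType": "BLOCK",                     # block or file (objects go via S3/Swift)
    "blockSize": 4096,                       # 512 bytes or 4K for block vDisks
    "sizeGB": 2048,                          # thinly provisioned; no capacity used up front
    "residence": "Flash",                    # Flash (NVMe/3D NAND SSD) or HDD
    "clusteredFileSystem": True,             # e.g. when the vDisk backs a VMFS datastore
    "clientSideCaching": True,               # cache reads on local SSD/PCIe at the proxy
    "replicationPolicy": "DataCenterAware",  # Agnostic | RackAware | DataCenterAware
    "replicationFactor": 3,                  # tunable from 1 to 6
}

resp = requests.post(
    "https://hedvig-cluster.example.com/rest/v1/vdisks",  # placeholder URL
    json=vdisk_spec,
    auth=("admin", "secret"),
    timeout=30,
)
resp.raise_for_status()
print("Provisioned vDisk:", resp.json())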
Business benefit: A consolidated platform cuts down the cost of learning and
managing disparate storage solutions. By eliminating “siloed” infrastructure, Hedvig
simplifies and improves overall cost efficiencies.
Business benefit: Hedvig eliminates the need for enterprises to deploy bolted-on
or disparate solutions to deliver a complete set of data services. This simplifies
infrastructure and further reduces overall IT capex and opex.
mobile devices via a native HTML5 interface that doesn’t require Flash or Java. This
brings the provisioning simplicity of public clouds like AWS to any data center.
Business benefit: Hedvig reduces the overhead of storage operations and enables
tasks that would normally take days, weeks, or even months to be completed in
minutes. This improves business responsiveness, eliminates downtime due to
human error, and significantly reduces opex costs.
Core architecture and I/O flow
Architecture basics
Storage Pools and Containers
Storage Pools — logical groupings of disks/drives in the storage nodes, which are
configured as the protection group for disk/drive failures and rebuilds. A typical
storage node will host 2-4 storage pools. Refer to the section on disk failures for
further details.
Figure 4 — Hedvig Distributed Storage Platform abstractions: Virtual Disks, Storage Pools and
Containers
Metadata, data, and cluster coordination processes
Pages, a metadata process — Pages are responsible for how and where data is
written, tracking all reads and guaranteeing all writes in the system. Pages tracks the
storage node, Storage Pool, and container locations of all data replicas. Metadata is
a key component of the underlying storage cluster, and is also cached by the Hedvig
Storage Proxy to enable metadata queries from the application tier.
HBlock, a data process — HBlocks are responsible for the layout of data on raw
disks.
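The following minimal Python sketch illustrates the kind of replica-location metadata that Pages maintains and that HBlock consumes when laying data out on disk. The class and field names are assumptions for illustration, not the actual Hedvig schema.

from dataclasses import dataclass
from typing import List

@dataclass
class ReplicaLocation:
    storage_node: str      # e.g. "node3.rack2.dc-a"
    storage_pool: str      # protection group of disks within that node
    container_id: str      # 16 GB logical chunk that the block belongs to

@dataclass
class BlockMetadata:
    vdisk: str
    block_offset: int
    version: int           # consulted by the Storage Proxy metacache on reads
    timestamp: float
    replicas: List[ReplicaLocation]   # one entry per Replication Factor copy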
Block, file, and object storage support
File storage — Hedvig presents a file-based Virtual Disk to one or more Storage
Proxies as an NFS export. The administrator can then provide access to specific
hosts from the Hedvig software. The Storage Proxy acts as an NFS server that guest
VMs, containers, and operating systems can utilize and browse. In this way, the
Hedvig platform supports multi-writer file storage environments.
Object storage — Buckets created via Amazon S3 APIs or containers via OpenStack
Swift APIs are translated via the Hedvig Storage Proxy and internally mapped to
Hedvig Virtual Disks. The Hedvig cluster acts as the object (S3/Swift) target which
clients can utilize to store and access objects.
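Because the cluster presents a standard S3-compatible target, any S3 client can be pointed at it. The following sketch uses the boto3 library; the endpoint URL and credentials are placeholders.

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.hedvig-cluster.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="backups")                      # mapped internally to a vDisk
s3.put_object(Bucket="backups", Key="db/dump.gz", Body=b"example payload")
obj = s3.get_object(Bucket="backups", Key="db/dump.gz")
print(obj["ContentLength"], "bytes stored as objects on the Hedvig cluster")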
Hedvig Storage Proxy Caches
Metacache
Metacache stores a copy of the storage system metadata at the application host to enable
local lookups of data replica information. This function is always enabled at the Storage
Proxy. It is recommended that the metacache be stored on local SSDs.
Block cache
For Virtual Disks that have the Client-Side Caching option enabled during provisioning, the
block cache stores a working set of disk blocks to local SSD/PCIe drives to accelerate reads.
By returning blocks directly from local flash media, read operations avoid network hops
when accessing recently used data.
Dedupe cache
When enabled, the dedupe cache resides on local SSD media and stores fingerprint
information for Virtual Disks that use the deduplication policy. It allows the Storage Proxy
to determine whether blocks have been previously written, and if so, will bypass the need
to write the data over the network to the storage cluster. The Storage Proxy first queries
the cache to assess if the data is duplicate or not. If duplicate, the Storage Proxy simply
updates Pages to map the new block(s) and sends a write acknowledgement immediately
back to the application. If the data is unique, the Storage Proxy will next query the backend
storage cluster to see if the data has been previously written. If so, the dedupe cache and
Pages are updated and the ack goes back to the client. If not, the write proceeds as a
normal, new data write.
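The decision flow above can be summarized in a short Python sketch. The helper objects and method names are illustrative stand-ins for Storage Proxy internals, not actual Hedvig code.

import hashlib

def write_block(block, dedupe_cache, pages, cluster):
    fingerprint = hashlib.sha256(block).hexdigest()

    if fingerprint in dedupe_cache:
        # Duplicate found locally: metadata-only update, no data crosses the network.
        pages.map_block(fingerprint)
        return "ack"

    if cluster.has_fingerprint(fingerprint):
        # Written before elsewhere: update cache and metadata, then ack.
        dedupe_cache[fingerprint] = True
        pages.map_block(fingerprint)
        return "ack"

    # Unique data: proceed as a normal, new write to the replica nodes.
    cluster.write(fingerprint, block)
    dedupe_cache[fingerprint] = True
    return "ack"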
I/O basics
The Hedvig Distributed Storage Platform delivers a unique distributed systems
architecture, but overall storage operations are familiar to server, storage, or virtualization
administrators:
1. An admin provisions a Hedvig Virtual Disk with the associated storage policies via
GUI, CLI, or APIs.
2. Block and file Virtual Disks are attached to the Hedvig Storage Proxy, which presents
the storage to application hosts. In the case of object storage, applications directly
interact with the Hedvig vDisk via Amazon S3 or OpenStack Swift protocols.
3. The Hedvig Storage Proxy captures guest I/O through the native storage protocol
and communicates it to the underlying Hedvig storage cluster via remote procedure
calls (RPC).
4. The Hedvig Storage Service distributes and replicates data throughout the cluster
based on individual Virtual Disk policies.
5. The Hedvig Storage Service conducts background processes to auto-tier and balance
across racks, datacenters, and even public clouds based on individual Virtual Disk
policies.
More details can be found in the following read and write operations sections.
Summary of write operations
As the application writes to the Hedvig cluster, the high-level process is as follows (see Figure 7; a brief quorum sketch follows these steps):
1. The Hedvig Storage Proxy determines the replica nodes for the blocks to be written
and sends the blocks to one of the replica nodes in a load-balanced manner.
As described above, if the Virtual Disk has deduplication enabled, the Storage
Proxy calculates a fingerprint, queries the dedupe cache, and if necessary, Pages,
and either makes a metadata update or proceeds with a new write.
2. HBlock on the replica node receives and writes the blocks locally and forwards them
to the other replica nodes.
HBlock writes incoming blocks to local memory and the Hedvig Commit Log on
SSD. Data is later flushed sequentially to local storage (see step 4).
For Replication Factor = 3, two acks (RF/2 + 1) are needed for the quorum. Two of
the three replicas are written synchronously and one asynchronously.
4. The HBlock flushes larger sequentialized data chunks to underlying storage media
per the Virtual Disk policies to optimize random writes for efficiency.
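The quorum arithmetic referenced above can be sketched as follows. The replica objects and commit-log calls are illustrative placeholders; the point is that for RF = 3 the client is acknowledged after two synchronous commit-log writes, with the third replica written asynchronously.

def quorum(replication_factor):
    return replication_factor // 2 + 1      # RF = 3 -> 2 acks

def write_with_quorum(block, replicas):
    needed = quorum(len(replicas))
    acks = 0
    for replica in replicas[:needed]:
        replica.append_commit_log(block)        # memory + SSD commit log, synchronous
        acks += 1
    for replica in replicas[needed:]:
        replica.append_commit_log_async(block)  # remaining copy, asynchronous
    return acks >= needed                       # quorum reached -> ack the client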
Summary of the read operation
As the application reads from the Hedvig cluster, the high-level process is as follows (a read-path sketch follows these steps):
1. The Hedvig Storage Proxy queries the local metacache for the version, timestamp, and the replica storage nodes for a particular block to be read, and consults the Pages process if the info is not found.
2. The Hedvig Storage Proxy sends the block details to one of the closest HBlocks,
based on observed latency.
3. HBlock reads the data and sends the block(s) back if found. If the read operation
fails due to any error, the read is attempted from another replica.
If Client-Side Caching is enabled, the Hedvig Storage Proxy queries the local
cache to fetch the data instead, bypassing the remote HBlock and eliminating
the need to traverse the network.
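The read path above can be summarized in a short Python sketch. All helper objects are illustrative stand-ins; the real logic lives in the Storage Proxy and HBlock processes.

def read_block(vdisk, offset, metacache, pages, block_cache=None):
    # 1. Local metadata lookup, falling back to the Pages process.
    meta = metacache.get((vdisk, offset)) or pages.lookup(vdisk, offset)

    # Client-Side Caching: serve recently used blocks from local flash.
    if block_cache is not None and (vdisk, offset) in block_cache:
        return block_cache[(vdisk, offset)]

    # 2./3. Read from the closest replica by observed latency; retry on error.
    for replica in sorted(meta.replicas, key=lambda r: r.observed_latency):
        try:
            return replica.read(meta.container_id, offset)
        except IOError:
            continue
    raise IOError("all replicas failed for this block")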
Enterprise storage capabilities
Scalability and performance
The platform can scale performance and capacity linearly and independently. For the
backend cluster, you add more storage nodes to scale capacity and more powerful nodes to
scale performance.
Note that while the performance of the system is subject to the workload and the actual
system configuration, the Hedvig Distributed Storage Platform can support sub-millisecond
latency, and scale to tens of thousands of IOPS and hundreds of MB/sec per Virtual Disk.
Reference the Hedvig hardware guide for specific recommendations on capacity and
performance optimized configurations and expected performance.
Storage efficiency
The Hedvig storage platform contains a rich set of advanced storage efficiency capabilities,
grouped in five major categories.
Thin provisioning
Each Hedvig Virtual Disk is thinly provisioned by default and doesn’t consume capacity until
data is written. This space-efficient dynamic storage allocation capability is especially
significant in DevOps environments that use Docker, OpenStack, and other cloud platforms
where volumes do not inherently support thin provisioning but gain it when backed by Hedvig.
Deduplication
The Hedvig Distributed Storage Platform supports inline global deduplication that delivers
space savings across the entire storage cluster. Deduplication is not one-size-fits-all. It can
be toggled at the Virtual Disk level to optimize I/O and lower the cost of storing data for the
data and apps that are suited to data reduction. As writes occur, the storage system
calculates the unique fingerprint of data blocks and replaces redundant data with a small
pointer. The deduplication process can be configured to begin at the Storage Proxy as
highlighted previously, improving write performance and eliminating redundant data
transfers over the network. Data reduction rates vary based on data type, with most
clusters seeing an average 75% reduction.
Compression
The Hedvig Distributed Storage Platform supports compression at the storage node level
that can be toggled at the Virtual Disk (application) level to optimize capacity usage.
While the actual compression ratio and speed depend on the data type and the system
configuration, a typical compression ratio is 40%, with compress and decompress speeds
between 250 MB/sec and 500 MB/sec.
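As a quick illustration of how such a ratio can be measured (this is not a Hedvig benchmark, and results vary heavily with data type), consider the following Python snippet using zlib.

import zlib

data = open("sample.bin", "rb").read()       # placeholder input file
compressed = zlib.compress(data, 6)
saved = 1 - len(compressed) / len(data)
print(f"space saved: {saved:.0%}")           # roughly 40% is a typical figure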
Compaction
To improve read performance as well as to optimize disk space, Hedvig periodically does
garbage collection to compact redundant blocks and generate large sequential chunks of
data.
Auto-tiering
The Hedvig Distributed Storage Platform balances performance and cost by supporting
tiering of data. To accelerate read operations, the platform supports Client-Side Caching of
data on SSDs accessible by the Storage Proxy. Data is also cached on storage node SSDs.
For all caching activities, Hedvig supports use of PCIe and NVMe SSDs. All writes are
executed in memory and flash (SSD/NVMe) and flushed sequentially to disk when the
appropriate thresholds are met. For persistent storage of data, Hedvig supports Flash
(MLC or 3D NAND SSD) or HDD (spinning disk) residence options at the Virtual Disk level.
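A minimal sketch of this write-tiering behaviour follows. The threshold value, method names, and helper objects are assumptions for illustration only.

class WriteTier:
    def __init__(self, commit_log, persistent_store, flush_threshold=64 * 1024 * 1024):
        self.commit_log = commit_log              # SSD/NVMe-backed log for durability
        self.persistent_store = persistent_store  # Flash or HDD residence tier
        self.flush_threshold = flush_threshold    # assumed value for illustration
        self.buffer = []
        self.buffered_bytes = 0

    def write(self, block):
        self.commit_log.append(block)             # durable immediately on SSD
        self.buffer.append(block)
        self.buffered_bytes += len(block)
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Random writes become one large sequential write to the residence tier.
        self.persistent_store.write(b"".join(self.buffer))
        self.buffer.clear()
        self.buffered_bytes = 0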
Enterprise Resiliency
The Hedvig storage platform is designed to survive disk, node, rack, and datacenter
outages without causing any application downtime and with minimal performance impact.
These resiliency features are grouped in four categories.
High availability
Storage nodes running the Hedvig Storage Service support a distributed redundancy model
with a recommended minimum of three nodes. The redundancy can be set as agnostic, at
the rack level, or at a data center level. The system initiates transparent failover in case of
failure. During node, rack, or site failures, reads and writes continue as usual from
remaining replicas.
To protect against a single point of failure, the Hedvig Storage Proxy installs as a high
availability (HA) active/passive pair. A virtual IP (VIP) assigned to the HA pair simply
redirects network traffic automatically to the active Storage Proxy at any given time. If one
Storage Proxy instance is lost or interrupted, operations seamlessly fail over to the passive
instance to maintain availability. This happens without requiring any intervention by
applications, administrators, or users.
During provisioning the administrator can indicate that a host will use a clustered file
system such as VMware VMFS. This automatically sets internal configuration parameters to
ensure seamless failover when using VM migration to a secondary physical host running its
own Hedvig Storage Proxy. During live VM migration (such as VMware vMotion or Hyper-V
Live Migration) any necessary block and file storage “follows” guest VMs to another host.
Non-disruptive Upgrades (NDU)
Storage nodes running the Hedvig Storage Service are upgraded first, one node at a time.
Any I/O continues to be serviced from alternate, available nodes during the process.
Storage Proxies are upgraded next, starting with the passive Storage Proxy. Once the
passive Storage Proxy upgrade is complete, it is then made active, and the formerly active
Storage Proxy is upgraded and resumes as the passive of the pair. This process eliminates
any interruption to reads or writes during the upgrade procedure.
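The upgrade ordering can be summarized in the following sketch; the upgrade and failover calls are illustrative placeholders rather than actual Hedvig operations.

def non_disruptive_upgrade(storage_nodes, passive_proxy, active_proxy, vip):
    # Storage nodes first, one at a time; I/O is served by the other replicas.
    for node in storage_nodes:
        node.upgrade()

    # Then the proxy pair: upgrade the passive proxy, make it active, then
    # upgrade the formerly active proxy, which rejoins as the passive peer.
    passive_proxy.upgrade()
    vip.point_to(passive_proxy)
    active_proxy.upgrade()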
Disk failures
The Hedvig Distributed Storage Platform supports efficient data rebuilds that are initiated
automatically upon disk failure. The Hedvig Storage Service recreates data from other
replicas across the cluster, performing a wide-stripe rebuild.
The rebuild is an efficient background process that happens without impact to primary I/O.
The average rebuild time for a 4 TB disk is under 20 minutes. As such, the platform easily
supports the latest 8 TB and higher-capacity drives, something traditional RAID-protected
systems cannot feasibly support.
Replication
The Hedvig Storage Service uses a combination of synchronous and asynchronous
replication processes to distribute and protect data across the cluster and provide near-zero
recovery point objectives (RPO) and recovery time objectives (RTO). It supports an
unlimited number of active data centers in a single cluster with up to six copies of data per
Virtual Disk. The platform supports this through the tunable Replication Factor and
Replication Policy parameters. Replication Factor (RF) designates the number of replicas to
create for each Virtual Disk and Replication Policy defines the destination for the replicas
across the cluster.
It is important to note that replication occurs at the Container level of abstraction. If a 100 GB
Virtual Disk with RF = 3 is created, the entire 100 GB is not stored as contiguous chunks on
three nodes. Instead, the 100 GB is broken up into several Containers, and replicas are
spread across different Storage Pools within the cluster.
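A simplified placement sketch follows: a 100 GB vDisk is split into 16 GB Containers and each Container's replicas are assigned to distinct nodes round-robin. Real placement additionally honors the Replication Policy (Rack Aware or Data Center Aware) and rebalancing, so this is illustrative only.

import math

CONTAINER_SIZE_GB = 16

def place_containers(vdisk_size_gb, replication_factor, nodes):
    num_containers = math.ceil(vdisk_size_gb / CONTAINER_SIZE_GB)
    placement = {}
    for c in range(num_containers):
        # Round-robin: each Container's replicas land on distinct nodes.
        placement[f"container-{c}"] = [
            nodes[(c + r) % len(nodes)] for r in range(replication_factor)
        ]
    return placement

nodes = [f"node{n}" for n in range(1, 7)]
print(place_containers(100, 3, nodes)["container-0"])   # e.g. ['node1', 'node2', 'node3']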
The table below summarizes the Replication Factor recommendations for different
workloads and the associated system impact.
For additional disaster recovery (DR) protection against rack and data center failures, the
Hedvig Distributed Storage Platform supports replication policies that can span multiple
racks or data centers using structured IP addressing, DNS naming/suffix, or customer-
defined snitch endpoints. Hedvig recommends using the fully qualified domain name to
identify the rack and datacenter location. An example name would be:
node.rack.datacenter.localhost.
In a disaster recovery setup where the Replication Policy = Data Center Aware and the
Replication Factor = 3, the Hedvig Distributed Storage Platform breaks up the data into
Containers and ensures three copies of each container are spread to geographically
dispersed physical sites — Data Centers A, B, and C. Two copies of the data are written
synchronously and the third is written asynchronously. At any time if a data copy fails, re-
replication is automatically initiated from replicas across the data centers.
Manageability
The Hedvig Storage Platform is designed to be simple and intuitive for storage, server,
virtualization, and DevOps professionals. It enables complex storage operations to be
completed in seconds. Hedvig supports three interfaces for deployment and ongoing
management.
Hedvig GUI
The Hedvig Distributed Storage Platform provides a simple, intuitive graphical user
interface (GUI) that is customizable and skinnable (new themes can be applied). It supports
a rich set of metrics per Virtual Disk or per Storage Proxy. The administrator can get real-time
insights into performance, including IOPS, throughput, and latency statistics. It is
delivered with HTML5 support and works responsively across all modern devices, including
locked-down servers and mobile phones.
Hedvig CLI
The Hedvig Distributed Storage Platform provides a comprehensive Command Line
Interface (CLI) which is fully scriptable and gives users full control of the features and
functionality. The CLI is accessible via an SSH connection using a Linux shell or a PuTTY-like
utility.
Ecosystem integration
VMware
The Hedvig Distributed Storage Platform features a vCenter plug-in that enables
provisioning, management, snapshotting and cloning of Hedvig Virtual Disks directly from
the vSphere Web Client (see Figure 16). Additionally, Hedvig incorporates support for
VMware’s vStorage APIs for Array Integration (VAAI), allowing the offloading of host
operations to the platform.
Docker
The Hedvig Distributed Storage platform provides persistent, portable storage for
containers through the Docker Volume API for NFS and via the open-source Flocker
solution for iSCSI (see Figure 17). With these APIs, users can create a Docker volume, which
in turn creates a new Hedvig Virtual Disk. Different parameters like deduplication,
compression, replication factor and block size can be set for each Docker volume, using the
“Volume options” in the Docker Universal Control Plane (UCP) or via the Flocker
configuration file. The disk can then be attached to any host. Use of these APIs creates a file
system on this Virtual Disk and mounts it using the path provided by the user. The file
system type can also be configured by the user. All I/O to the Docker volume goes to the
Hedvig Virtual Disk. As the container moves in the environment, the Virtual Disk will be
automatically made available to any host and data will be persisted using the policies
chosen during volume creation.
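As an illustration, a Docker volume backed by Hedvig could be created with the Docker SDK for Python as sketched below. The driver name and option keys are assumptions for illustration; the actual names are defined by the Hedvig Docker/Flocker integration.

import docker

client = docker.from_env()

volume = client.volumes.create(
    name="pgdata",
    driver="hedvig",                 # hypothetical driver name
    driver_opts={                    # hypothetical option keys
        "size": "100GB",
        "deduplication": "true",
        "compression": "true",
        "replicationFactor": "3",
        "blockSize": "4096",
    },
)
print("created volume:", volume.name)
# The volume maps to a new Hedvig Virtual Disk and follows the container to
# whichever host it is scheduled on.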
The Hedvig-Flocker driver makes API calls (internally mapped to Hedvig CLI commands) to
support the rich set of Docker volume commands. Hedvig also supports more advanced
Flocker capabilities, such as Storage Profiles. Storage Profiles create pre-configured classes
of service (CoS) that developers can use to simplify storage provisioning. By default, the
Hedvig-Flocker solution supports Gold, Silver, and Bronze Storage Profiles that vary the
Client-Side Caching, Deduplication, Compression, Residence, Replication Factor, and
Replication Policy attributes.
OpenStack
The Hedvig Distributed Storage Platform delivers block, file, and object storage for
OpenStack all from a single platform via native Cinder and Swift integration (see Figure 18).
With Hedvig you can set granular, per-volume (Cinder) or per-container (Swift) policies for
capabilities like compression, deduplication, snapshots, and clones. OpenStack
administrators can provision the full set of Hedvig storage capabilities in OpenStack
Horizon via OpenStack’s QoS functionality. As with VMware, administrators need not use
any Hedvig UI or APIs. Storage can be managed from within the OpenStack interface.
Hedvig also provides a validated Mirantis Fuel plugin that automates installation of Hedvig
storage drivers with Mirantis OpenStack.
Multitenancy
Hedvig supports the use of the Rack Aware and Data Center Aware replication policies for
customers that need to satisfy regulatory compliance and restrict certain data by region or
site. However, these
capabilities also provide the backbone of a multitenant architecture, which Hedvig supports
with three forms of architectural isolation:
vDisk masking — In this option, different tenants are hosted on a shared (virtual)
infrastructure (see Figure 19). Logical separation is achieved by only presenting
Virtual Disks to a certain VM and/or physical hosts (IP range). Quality of Service
(QoS) is delivered at the VM-level.
Additionally, because the platform supports policies at the vDisk level, each tenant can
have unique Virtual Disks with tenant-specific storage policies. Policies can be grouped to
create classes of service (CoS) as per examples below:
Summary and conclusion
The elastic, simple, and flexible Hedvig Distributed Storage Platform is the solution of
choice for enterprises as they look to modernize datacenters and build private and hybrid
clouds. It is a great fit for environments where explosive growth in data is affecting a
company’s bottom line — in terms of the cost-per-terabyte to store the data as well as the
operational overhead of managing silos of disparate storage infrastructure. Hedvig
empowers IT to reduce costs and improve business responsiveness.
Terms and definitions
Hedvig Storage Proxy: A lightweight storage access layer that runs as a guest VM, Docker container, or on a dedicated bare metal server to provide storage resource access to compute environments. The Hedvig Storage Proxy is also referred to as a Controller VM (CVM).
Hedvig Virtual Disk (vDisk): The storage disk volume abstraction presented by the Hedvig Distributed Storage Platform.
Hedvig Container: A 16 GB logical chunk of data.
Hedvig Storage Pool: A group of three disks in a Hedvig Storage node used as part of the distributed data protection mechanisms.
Hedvig Commit Log: A file where data is written for durability. Stored on SSDs and purged when memory buffers are flushed to disk.
Hyperscale: A software-defined storage deployment that scales storage nodes independently from application hosts.
Hyperconverged: A software-defined storage deployment that scales storage nodes in lockstep with application hosts.
Immutable file: An unchangeable, authoritative data source to which the Hedvig storage platform writes memory buffers.
Memory buffers: A structure in memory where incoming data is sequentialized.
Node / storage node: A commodity x86 or ARM server with disk drives, running the Hedvig Storage Service.
Replicas: Copies of Hedvig Containers. The number of replicas is defined by the Replication Factor, and replicas are distributed across the cluster per the Replication Policy.