ECS Best Practices Guide
April 2024
H16016.13
White Paper
Abstract
This document provides best practices for the Dell ECS software-defined cloud-scale object-storage platform.
Copyright
The information in this publication is provided as is. Dell Inc. makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © 2017-2024 Dell Inc. or its subsidiaries. Published in the USA April 2024 H16016.13.
Dell Inc. believes the information in this document is accurate as of its publication date. The information is subject to
change without notice.
Contents
Executive summary
Architecture overview
Physical deployment
Customer-provided infrastructure
Provisioning
Security
Application development
Operations
Conclusion
Executive summary
Overview
Dell ECS is a software-defined, cloud-scale storage platform for traditional, archival, and next-generation workloads. It provides geo-distributed and multiprotocol (Object, CAS, Atmos, and NFS) access to data. With ECS, any organization can deliver scalable and simple public cloud services with the reliability and control of a private-cloud infrastructure.
The goal of this white paper is to highlight general ECS best practices relating to physical deployment, required external infrastructure services, networking, provisioning, and application development using ECS APIs. It describes some of the common pitfalls associated with deployment and provisioning, and lists practices to mitigate them.
Revisions
March 2023 (H16016.10): Updated for data retention with multiple replication groups.
November 2023 (H16016.11): Removed intermixing of HDD and AFA clusters.
We value your feedback
Dell Technologies and the authors of this document welcome your feedback on this document. Contact the Dell Technologies team by email.
Note: For links to other documentation for this topic, see the ECS Info Hub.
Architecture overview
ECS is a strongly consistent, indexed, object storage platform. It is a scalable solution
providing secure multitenancy and superior performance for both small and large objects.
ECS was built as a completely distributed system following cloud principles. The ECS
software running on commodity nodes forms the underlying cloud storage, providing
protection, geo replication, and data access. The software was built with the following
design principles in mind:
• Layered services for horizontal scalability
• Both the index and data use the same underlying storage mechanism
• Good small and large object performance
• Multiple protocol access—Object and File
• Geo replication with lower storage and Wide Area Network (WAN) overhead
• Global access—Read and write access from any site within a replication group
The following figure illustrates the different layers of ECS. For additional information, see
the ECS Overview and Architecture white paper.
Physical deployment
Overview of considerations
Strategic planning is critical to the success of any ECS deployment. Some of the factors to consider during physical deployment relate to:
• Space and power
• Networking
• Single-site and multisite considerations
Working closely with Dell personnel, reading through the documentation, and using tools
that are available for planning are important in designing ECS.
Planning documentation and tools
Making assumptions relating to power, space, and infrastructure services, such as firewall, network, ACL, DNS, NTP, and so on, is a common pitfall and poses challenges for ECS installation. Thus, knowledge of requirements and existing infrastructure at the customer site is important to mitigate this issue. Documentation and tools are available to help plan, prepare, and design ECS to fit your requirements and eliminate the guesswork.
To review, the following components illustrated in Figure 2 form the basis of an ECS
deployment:
• Site: A site is a unique physical location, for example, a data center in Arizona,
USA. An ECS deployment consists of one or more sites.
• Site ID: Dell assigns a unique identifier to each site. All hardware, software, and
services are tied to individual site IDs.
• Rack: A rack consists of hardware that is physically in a single data center floor tile
space.
• Node: A node is a server in a rack. Racks generally consist of five or more nodes.
• Cluster: One or more racks of hardware physically connected at a single site
constitutes a cluster. In general, each site has one cluster that is made up of one or
more racks of hardware, and federation is done between at most one cluster at
each site. That is, it is possible to have two clusters at a single site, but ECS is
designed to federate geographically, not locally. A cluster is also referred to as a
Virtual Data Center (VDC).
A VDC is built up of one or more racks, where each rack requires a tile space on the data
center floor. Racks communicate across the site's local area network (LAN) by means of
uplink network connections through a pair of 25 GbE switches for EX500, EX5000, and
EXF900. These switches may be purchased with the ECS as part of the solution or may
also be customer-provided switches. In addition, ECS communicates privately over a
closed backend for administrative tasks; no data travels over the backend network except
EXF900, which uses a dedicated RDMA backend network. Storage and performance
requirements primarily determine the quantity of racks deployed at each site. Floor space
and plans for future growth are also considerations.
A multisite deployment is built by federating two or more sites. ECS enables you to
configure replication either within a single site or across multiple sites. It provides flexibility
in solution design, allowing for data separation, protection against many types of failures,
and global access.
After understanding the terminology and components, review the documentation and tools
that can help in the planning and deployment. They include:
• ECS Hardware Guide—A guide that contains the hardware information (whether an
ECS appliance or customer rack is used)
• Security Configuration Guide—A guide that provides an overview of settings and
configurations for secure operation
• ECS Designer—An Excel spreadsheet for use in recording and centralizing
required information
All hardware, software, and licensing are associated with a specific and unique
site ID. Use the ECS Designer to perform the critical work of keeping site
information up to date. Verify the site information for accuracy from the earliest
planning stages, through the ordering process, and all the way through
provisioning, alerting, and remote access. Support issues are tied to site IDs as
well. Contact your account team if you do not receive the installation scripts from
the ECS Designer team.
Power and space
Power and space are important considerations when planning an installation.
Underspecifying the power requirements can cause overload and overheating issues.
Consider the total weight of the rack. A fully loaded EX5000 ECS appliance weighs more
than a ton. Due to the density of ECS hardware, ECS might have unique requirements
such as custom rack size, depth, cable management, and brackets, which some locations
might not be typically equipped with. Knowledge of the power and space requirements
helps in alleviating issues and planning for future growth.
You must use the documentation and tools that are referenced in Planning documentation
and tools to ensure that installation locations within the data center are compatible with
requirements. Adhering to the requirements outlined in the documentation assists facilities
in supporting ECS.
• Customers who purchase ECS appliances but move the hardware to their own rack should plan
for the disposal of the cabinet purchased with the appliance.
• When expanding ECS clusters, purchase nodes for existing racks to consolidate space, and
purchase racks to allow for future consolidation.
• Consider reserving additional tiles for cluster growth.
• Allow extra time when purchasing hardware outside of a rack as the switches and nodes do not
come preinstalled with operating systems and require additional inspection.
• Consult the most recent hardware specifications guide for power requirements, dimensions, and
weight when ordering hardware.
Note: For ECS D- and U-series hardware models, the private switch is referred to as the turtle.
The following figure shows how ports in the front-end network switches are used for ECS node traffic and customer uplink ports. This allocation is standard across all implementations.
The following figure shows how ports are intended to be used to enable ECS
management traffic and diagnostic ports.
The EXF900 appliance uses Dell S5248F switches for the front-end pair and the back-end pair, and the Dell S5232F for the back-end aggregation switch. Customers
have the option of using their own front-end switches instead of the Dell switches.
The following figure shows how ports in a front-end network switch are used to enable ECS node traffic and customer uplink ports.
The next figure shows how ports are intended to be used in a back-end network switch.
These port allocations are standard across all implementations.
Dell provides two 100 GbE S5232F back-end aggregation switches (AGG1 and AGG2)
with four 100 GbE VLT cables. These switches are referred to as the falcon and eagle
switches, as shown in Figure 7. The number of uplinks between each rack and the
aggregation switches ensures all the EXF900 nodes have line rate performance to any
node in any rack. This setup allows for low latency and high throughput across the entire
cluster.
Consult the following information for switch, switch port, and overall network planning
guidance:
• ECS Designer Spreadsheet Overview—A critical and informative video about the
design and provisioning process, especially about switches and their related
configuration, which guides users through important questions. Contact your
account team if you do not receive the installation scripts.
• ECS EX Series Hardware Guide—A guide that provides information about
supported hardware configurations, upgrade paths, and rack cabling requirements.
• ECS Networking and Best Practices—A white paper that describes details about
ECS networking, network hardware, network configurations, and network
separation.
• Use the ECS Designer throughout the design and deployment process. Record customer-
provided switch manufacturers, models, and firmware versions. Record ECS rack uplink
information along with switch and port identifiers and cabling descriptions.
• Reserve the necessary number of ports on the customer’s switch infrastructure.
• Understand the options for port channel configuration.
• See the ECS Networking and Best Practices white paper.
Customer-provided infrastructure
The following sections describe best practices associated with these external services.
Domain Name System (DNS)
Each node in an ECS cluster requires both forward and reverse DNS entries and access to one or more domain name servers. Each workflow might require unique DNS entries (and IP and load-balancer configuration). DNS administrators should be given ample time to meet with all necessary application and workflow engineers so that the naming requirements can be fully understood and met.
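As a quick planning-time sanity check, the forward and reverse entries for every node can be verified programmatically. The following is a minimal sketch using only the Python standard library; the node names are hypothetical placeholders for your planned FQDNs.

```python
import socket

# Hypothetical ECS node FQDNs; replace with the names planned for your site.
nodes = ["ecs-node1.example.com", "ecs-node2.example.com"]

for name in nodes:
    try:
        ip = socket.gethostbyname(name)          # forward (A record) lookup
        rname, _, _ = socket.gethostbyaddr(ip)   # reverse (PTR record) lookup
        ok = rname.rstrip(".").lower() == name.lower()
        print(f"{name} -> {ip} -> {rname}: {'OK' if ok else 'MISMATCH'}")
    except OSError as err:
        print(f"{name}: lookup failed ({err})")
```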
Network Time Protocol
Network Time Protocol (NTP) accessibility is essential for ECS to operate correctly. Precise time is necessary for consistent clock synchronization between nodes in ECS, ensuring clean log and journal entries for chunk timestamp values. Multisite ECS deployments should use common NTP sources. Include NTP server IP addresses and names for each site in planning documents. For more information, see industry NTP best practices.
IP addressing and Dynamic Host Configuration Protocol
A minimum of one customer-provided IP address is required for each node. Use of local or global load balancers requires additional IP addresses. If hosts will retrieve IP addresses from a Dynamic Host Configuration Protocol (DHCP) server, record DHCP server IP addresses and names. In addition, if traffic separation is used, additional IP addresses might have to be reserved. A sufficient number of IP addresses or subnets must be identified and reserved for deployment. If a separate network team exists, planners should consult that team early during the planning and design phase. Network teams are instrumental in deciding which deployment model works best for each site and allocating and reserving IP addresses or subnets.
IP addresses can be assigned through DHCP. Many customers choose static IP addresses, often with reservations in DHCP. For large scale-out environments, however, DHCP can be leveraged to avoid hard-coding many addresses.
Putting DHCP in a DMZ, which might not be part of the traditional model, is a common
requirement for cloud-based storage. Begin this conversation early to give ample time for
all involved to plan accordingly.
Note:
• If you are using DHCP, MAC addresses should be persistent so that nodes get the same IP addresses during reboot.
• When two or more VDCs are federated, Network Address Translation (NAT) cannot be used within unnamed public, named replication, and named management networks.
Load balancing
Load balancers are highly recommended in ECS deployments to evenly distribute data
loads across all service nodes. Although customers are responsible for implementing and
configuring their deployed load balancers, Dell does provide recommendations and
suggestions about how to configure some of them with ECS workflows. Load-balancing
needs should be examined at the workflow level. Each workflow might justify or rule out
the use of load balancers. Use of load balancing is important for Atmos traffic, generally
recommended for S3, and can be used with NFS. Load balancers are not required for
CAS because CAS workflows have load balancing built into the client applications.
Both local and global load balancers are recommended where workflows justify their
need. In addition to distributing the load across ECS nodes, a load balancer provides high
availability (HA) for the ECS cluster by routing traffic to healthy nodes. For each workflow
that uses a load balancer, record each load balancer's IP address and Fully Qualified
Domain Name (FQDN) in planning documents.
Several white papers are available that provide information about how to implement a
load balancer with ECS:
• ECS with HAProxy
• ECS with F5
• ECS with KEMP
• Carefully configure and size load balancers so that they do not reduce performance or become a bottleneck. Verify that the maximum transaction rate and bandwidth (based on PUT/GET) can pass through the load balancer so that it does not hinder peak throughput.
• Deploy redundant load balancers (according to manufacturer’s instructions) to eliminate single
points of failure.
• Only use DNS round robin if you cannot implement global DNS/load balancing, which is a
better approach.
• For best performance, terminate SSL connections at the load balancer, passing traffic unencrypted to the ECS nodes. This offloads the encryption from ECS to the load balancer. For workflows carrying Personally Identifiable Information (PII), do not terminate SSL at the load balancer. This is important to prevent clear-text transmission of PII between routers and ECS nodes.
• If SSL termination is required on ECS nodes themselves, use Layer 4 (TCP) to pass through
the SSL traffic to ECS nodes for handling. The certificates must be installed on the ECS nodes
and not on the load balancer.
• For NFS traffic, use only the high availability functionality of the load balancer.
• When federating three or more ECS sites, employ a global load balancing mechanism to
distribute load across sites to take advantage of ECS XOR storage efficiency. This is also
important to optimize the local object read hit rate in a global deployment.
• Enable web monitoring of traffic.
Authentication providers
Many customers use local ECS authentication for management users. The management users then define all object users, generally one per application. For customers that
leverage Active Directory or LDAP, groups or users are assigned to management roles,
as opposed to local user accounts. Consider the following information when using
authentication providers:
• Active Directory—An Active Directory domain group can be the namespace administrator for only one namespace. Generally, storage administrators create an
Active Directory group for each namespace and assign Active Directory users to
that group. Namespace users can use the Web UI and only see things pertaining to
their namespace.
• Lightweight Directory Access Protocol (LDAP)—LDAP users can be administrative
users in ECS. LDAP groups are not used in ECS.
• Local—Local management users are not replicated between sites.
Simple Network Management Protocol
Simple Network Management Protocol (SNMP) servers, also known as SNMP agents, are optional. SNMP servers provide data about network-managed device status and statistics to SNMP Network Management Station clients. ECS supports SNMP basic queries and SNMP traps.
• For each SNMP server that will be used with an ECS deployment, plan for its IP
addresses, names, ports, version and type of SNMP service used, and community
name.
Firewalls
Certain ports must be open for ECS traffic. Firewall rules must be modified to open the
ports that are required for ECS traffic.
• When firewalls are in use, refer to the latest version of the ECS Security Configuration
Guide for a complete list of ports to open; define rules in your firewall accordingly.
Provisioning
Provisioning overview
Once the physical hardware is installed and deployed and the external services are configured and available, you can provision VDCs, namespaces, replication groups, users, buckets, and so on, to provide data access to the ECS storage platform.
Components that can be provisioned, as shown in Figure 9, include:
• Virtual Data Center (VDC)—A VDC is a geographical location defined as a single
ECS deployment within a site. Multiple VDCs can be federated and managed as a
unit.
• Storage pool—A storage pool can be thought of as a subset of nodes and its
associated storage belonging to a VDC. A storage pool can have any number of
nodes; the minimum recommended is five. A storage pool can be used as a tool for
physically separating data belonging to different applications.
• Replication group—Replication groups define where storage pool content is
protected and locations from which data can be read or written. Local replication
groups protect objects within the same VDC against disk, node, and rack failures.
Global replication groups span multiple VDCs and protect objects against disk,
node, rack, and site failures.
• Namespace—A namespace is a logical construct and is conceptually the same as a
“tenant.” The key characteristic of a namespace is that users from one namespace
generally cannot access objects belonging to another namespace. Namespaces
can represent a department within an organization or a group within a department.
• Buckets—Buckets are containers for object data. They are created in a namespace
to give applications access to data stored within ECS. In S3, these containers are
called “buckets,” and ECS has adopted this term. In Atmos, the equivalent of a
bucket is a “subtenant,” in Swift, the equivalent of a bucket is a “container,” and for
CAS, a bucket is a “CAS pool.” Buckets are global resources in ECS. Where the
replication group spans multiple sites, a bucket is similarly replicated across sites.
This section highlights several best practices and considerations for provisioning ECS.
When provisioning ECS, certain settings can only be configured at creation time; once set, they cannot be modified. These include the erasure-coding scheme of a storage pool, bucket-level encryption, and the user scope setting, as described in the sections that follow.
Naming conventions
Defining proper names for components is sometimes overlooked when provisioning, which might be problematic in some cases and, in other cases, inconvenient to change once set. Use DNS-appropriate naming conventions for all ECS constructs—hosts, clusters, VDCs, storage pools, replication groups, namespaces, and buckets. While some constructs might allow additional characters, such as underscores, limiting characters to those that are acceptable to DNS eliminates potential application-related conflicts that might arise when valid namespace or bucket names do not translate to DNS.
Use only the following characters:
• Lower case letters (a-z). Do not use upper case letters.
• Numbers (0-9).
• Hyphens. Avoid the use of underscores.
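To enforce these conventions consistently, provisioning scripts can validate candidate names before creating any construct. The following is a minimal sketch; the 3-to-63 character bound is an assumption based on common DNS label limits, not an ECS-documented value.

```python
import re

# Lowercase letters, digits, and hyphens only; must start and end with an
# alphanumeric character. Length bounds are an assumed DNS-style limit.
DNS_SAFE = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")

def is_dns_safe(name: str) -> bool:
    return 3 <= len(name) <= 63 and bool(DNS_SAFE.fullmatch(name))

for candidate in ["finance-bucket", "Finance_Bucket", "dev-ns-01"]:
    print(candidate, "->", "valid" if is_dns_safe(candidate) else "invalid")
```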
Storage pool
The first step in provisioning a site is creating a storage pool and assigning nodes to the
pool. Storage pools are logical constructs that contain physical nodes. They provide a
means to physically separate data on a cluster, if required. Erasure coding (EC) is
configured at the storage pool level during pool creation. The two EC options on ECS are
12+4 or 10+2 (cold storage). EC cannot be changed once created.
All cluster nodes can belong to a single storage pool. Implement the minimum number of
storage pools required at each VDC. Storage pools along with their associated replication
groups are integral in ECS indexing, so keeping them to the required minimum number
minimizes unnecessary overhead.
There are only two reasons to create additional storage pools within a VDC:
• EC is done at the storage pool level. Generally, only a maximum of two pools is
required when both 12+4 and 10+2 EC is used.
• If data must be physically separated between nodes, additional storage pools are
required. Again, keep the number of storage pools to a minimum.
A storage pool must have a minimum of five nodes and must have three or more nodes
with more than 10 percent free space for data or object writes to be successful. System
metadata, user data, and user metadata all co-exist on the same disk infrastructure.
Space is reserved so that ECS does not run out of space while persisting system
metadata. Storage pool space considerations are also important when sites are
replicated. Multisite environments require sufficient available space to handle temporary
and permanent site failures. When adding additional storage capacity to a site, expand
other sites as needed to accommodate space requirements.
• Size storage pools to account for the minimum free space that is needed to allow for
writes.
• For multisites, account for the space needed in case of temporary site outage and
permanent site removal.
• Keep the total capacity lower than 80 percent. When the storage pool reaches 90
percent of its capacity, it does not accept write requests, and it becomes a read-only
system.
Note:
• The minimum number of nodes for 12+4 EC is five. The minimum number of nodes for
10+2 EC is six.
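To put the two schemes in perspective, the protection overhead of each can be estimated directly from its fragment counts. A back-of-the-envelope sketch, assuming data is written in full chunks:

```python
# Raw-to-usable overhead for the two ECS erasure-coding schemes,
# estimated as (data fragments + coding fragments) / data fragments.
schemes = {"12+4 (default)": (12, 4), "10+2 (cold storage)": (10, 2)}

for name, (data, coding) in schemes.items():
    overhead = (data + coding) / data
    print(f"{name}: {overhead:.2f}x raw storage per byte stored "
          f"({(overhead - 1) * 100:.0f}% protection overhead)")
# 12+4 -> 1.33x (33%), 10+2 -> 1.20x (20%)
```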
Virtual Data Center
A VDC identifies the nodes that are participating in an ECS instance. The first VDC must contain the nodes from the local ECS instance. Additional VDCs can then be configured, identifying all the nodes in each remote ECS instance. Adding remote VDCs to a local ECS instance creates the federation of ECS instances. To create a replication group that includes storage pools from a remote VDC, that remote VDC must be federated with the local VDC.
Generally, a physical site has one VDC. Some organizations have multiple VDCs per
site—for example, one for engineering and one for operations—and can be federated
together for ease of management. However, we do not recommend creating replication groups consisting of VDCs that are all in one local site in order to use the ECS XOR feature for storage efficiency. In such a scenario, when the site is down, more than one VDC becomes unavailable.
Replication groups
Replication groups allow grouping of storage pools from VDCs in different geographic locations for replication of data across sites. Replication of data across sites has the following advantages:
• In case of site failure, data is accessible from the surviving site or sites within the
replication group.
• For three or more sites, the ECS XOR feature provides better storage efficiency.
Create the minimum number of replication groups, due to the indexing overhead associated with storage pool and replication group pairs. Do not create multiple replication groups that serve the same function. For example, two replication groups containing the same set of VDC storage pools add unnecessary overhead.
In general, create one replication group for local data (nonreplicated) and one for
replicated data that spans all VDCs. Organizations with more than two sites might
consider additional replication groups for times when data should only be replicated to a
subset of all sites. Generally, one replication group that spans all sites is sufficient. Compliance
requirements might dictate additional replication groups be created, for example, where
data privacy or sovereignty laws prohibit shared data across specific borders.
For scenarios in which there is massive data with short-term and long-term retention
times, especially for the backup and archive use case, it is better to create two separate
replication groups for them. Having short-term and long-term data that are mixed and
stored in a chunk within one replication group will impact the efficiency of the garbage
collection (space reclamation) in ECS. The separate replication groups can avoid this
problem because the garbage collection mechanism is based on each replication group.
When three or more sites are in a replication group, efficiencies in storage overhead can
be gained. ECS can XOR chunks written at two sites at a third site. To gain these
efficiencies, new writes must occur at two or more sites. To balance the efficiency across
all sites in a replication group, all sites must have relatively similar write workloads. This
benefit might not be appropriate for all workloads, especially in scenarios where WAN
latency creates unacceptable bottlenecks. However, there are tradeoffs when data is
spread across sites. For instance, there is an additional latency for WAN lookups of
objects not local to the VDC. Geo-caching does alleviate some of this; however, this
latency can pose some issues for applications if data is not in cache.
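As a rough illustration of the XOR benefit, a commonly cited approximation multiplies the local erasure-coding overhead by (1 + 1/(N-1)) for N sites with balanced writes. The sketch below uses that approximation; treat the results as estimates and consult the ECS Overview and Architecture white paper for authoritative figures.

```python
# Approximate total storage overhead for a replication group spanning
# n sites with 12+4 local erasure coding and XOR of replicated chunks.
EC_OVERHEAD = 16 / 12  # 12+4 scheme: 1.33x locally

for n_sites in (2, 3, 4):
    total = EC_OVERHEAD * (1 + 1 / (n_sites - 1))
    print(f"{n_sites} sites: ~{total:.2f}x overhead")
# 2 sites: ~2.67x, 3 sites: ~2.00x, 4 sites: ~1.78x
```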
• When determining the number of replication groups, first consult Dell Professional
Services to discuss unique, complicated workload requirements and patterns.
• Carefully plan for replication groups. Careful planning is critical because replication
groups cannot be deleted.
• For three or more sites, distribute write requests across sites to take advantage of XOR
feature benefits. However, be aware of the latency tradeoffs for WAN lookups of objects
not in local cache.
• Federate all VDCs before attempting to create a replication group.
• Ensure that replication network bandwidth is equal to or greater than the rate of data ingest through the front-end switch. Otherwise, use QoS to throttle the network, or contact the Dell Services team to tune based on the network situation.
• When replicating two EXF900s across sites, consider the potential performance impacts
over the WAN. Large ingest might put high load on the link, causing saturation or
delayed RPO. Also, a user or application might experience higher latency times on
remote reads and writes as compared to local requests.
Namespaces
A namespace provides a way to organize or group items to separate the space for different uses or purposes. It enables the multitenancy feature in ECS. Unlike
storage pools and replication groups, many namespaces can be created. Some
environments might do well with a single namespace; in other cases, you might consider
creating namespaces as follows:
• One per business unit.
• One per application.
• One per reporting boundary.
• One per subscriber—for Internet Service Providers, for example. Dell ECS Test
Drive is configured in this manner, with a unique namespace created for each user.
As a workaround, namespaces can be used to allow targeting buckets in specific
replication groups for legacy applications. Some legacy applications cannot access a
specific storage pool so those applications might have to use buckets that access storage
pools through specific replication groups.
Buckets
Buckets are containers for object data. Buckets are created in a namespace to give
applications access to data stored within ECS. Buckets are global resources in ECS.
Where the replication group spans multiple sites, a bucket is similarly replicated across
sites.
• Use buckets for specific environments, workflows, or uses. For instance, “dev,” “test,”
“finance,” “operations,” and so on.
• In multisite deployments, create buckets at the VDC site closest to the application
accessing and updating the objects. There is overhead involved with checking the latest
copy if the ownership of object is at a remote site.
• Follow DNS naming convention best practices when naming buckets. Bucket names
must be unique within a namespace.
Users and roles
Users of ECS can be management users, object users, or both, as shown in Figure 10.
Management users have access to the ECS through its web portal and through the
management API. Object users have access to the ECS object interfaces for S3,
OpenStack Swift, Atmos, Centera CAS, and NFS. An object user uses a native object
interface (for example, the standard S3 API) to perform data access operations such as
reading, writing, or listing objects in buckets. They can also create or delete buckets. If a
user requires access to both the ECS portal and the ECS object interfaces, that user must
be created as both a management user and an object user. ECS does not know, for
example, that a single individual named Bob is both a management user and also an
object user. To ECS, management and object users are not correlated.
• When there is a large group of users to be given access to the object store, leverage
existing Active Directory or LDAP infrastructure.
• A common pitfall is creating local accounts with domain-style names to make them unique and consistent with Active Directory names. A domain-style name implies that Active Directory or LDAP performs authentication. However, in ECS, local accounts authenticate using secret keys; thus, to avoid confusion, do not use domain-style names for local accounts that are not part of any domain.
• The user scope setting must be made before the first object user is created. That is,
once the first object user is created in a VDC, the user scope setting cannot be
changed. The default user scope setting is GLOBAL. If you intend to use ECS in a
multitenant configuration and want to ensure that tenants are not prevented from using
names that are in use in another namespace, change this default configuration to
NAMESPACE.
Note:
• Management users, whether local or domain based, are not replicated across geo-federated VDCs. This means that all administrators except namespace administrators must be created at each VDC that requires the account or role. Domain-based namespace administrator accounts are excluded from this caveat because namespaces are global constructs and, as such, their associated administrators are also global.
• Local management accounts are not replicated across sites, so a local user who is a
namespace administrator can only log in at the VDC at which the management user
account was created. If you want the same username to exist at another VDC, the user
must be created at the other VDC. Because the accounts are different, changes to a
same-named account at one VDC, such as a password change, are not propagated to
the account with the same name at the other VDC.
• A namespace administrator can administer only a single namespace.
Identity and Access Management
ECS Identity and Access Management (IAM) enables users to have fine-grained access to ECS S3 resources securely. This functionality ensures that each access request to an ECS resource is identified, authenticated, and authorized. ECS IAM allows users to add users, roles, and groups. Users can also grant and restrict access by adding policies to the ECS IAM entities.
• Create an IAM user and give the user administrative permissions. Create individual
users for others who must access the ECS account. Provide each IAM user a separate
set of credentials and grant different permissions. For IAM users, the administrator can
change or revoke permissions anytime.
• Access keys provide systematic access to ECS. Do not share the credentials between
users. Applications should preferably use temporary credentials using an IAM role for
access to ECS.
• Change access keys regularly to avoid misuse of credentials when credentials have
been compromised. Delete IAM user credentials that are no longer required.
• When creating IAM policies, follow the standard security advice of granting least
privilege or granting only the permissions that are required to perform a task.
• Do not define permissions for individual IAM users who perform similar job functions.
Create groups, define the permissions for each group, and assign IAM users to groups.
• Use IAM roles to permit users to access resources.
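Because ECS IAM follows the AWS IAM model, these practices can be scripted with an AWS SDK pointed at an ECS endpoint. The following is a minimal sketch using boto3; the endpoint, credentials, group, user, and policy shown are hypothetical illustrations of group-based, least-privilege access, not a production recommendation.

```python
import json
import boto3

# Placeholder endpoint and administrator credentials.
iam = boto3.client(
    "iam",
    endpoint_url="https://ecs.example.com:4443",  # assumed ECS IAM endpoint
    region_name="us-east-1",
    aws_access_key_id="ADMIN_ACCESS_KEY",
    aws_secret_access_key="ADMIN_SECRET_KEY",
)

# Least-privilege policy: read-only access to one bucket (illustrative).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::finance-bucket",
                     "arn:aws:s3:::finance-bucket/*"],
    }],
}

# Define permissions once on a group, then assign users to the group.
iam.create_group(GroupName="finance-readers")
iam.put_group_policy(GroupName="finance-readers",
                     PolicyName="finance-read-only",
                     PolicyDocument=json.dumps(policy))
iam.create_user(UserName="report-app")
iam.add_user_to_group(GroupName="finance-readers", UserName="report-app")

# Each user gets its own keys; rotate regularly and delete unused ones.
keys = iam.create_access_key(UserName="report-app")["AccessKey"]
print(keys["AccessKeyId"])
```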
Temporary site outage
In a multisite ECS deployment, ECS offers an access during outage (ADO) feature. This feature allows access to data when there is a temporary disconnect or site outage between two sites or a failure of one site due to a power failure or natural disaster. If an application requires access during a temporary site outage (TSO), it is best to enable ADO when you create the bucket.
• If possible, use a Global Load Balancer to handle failover so that requests are
automatically directed to an available site in case of a failure.
Note:
• FS buckets (NFS) are read-only during TSO.
• During rejoin, conflict resolution favors the secondary site, though it is nondeterministic.
• Listing of some buckets might fail during a TSO.
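Because ADO is a bucket-creation-time setting, S3 clients that create buckets can request it through the ECS extension header for stale reads. A minimal sketch with boto3, assuming the x-emc-is-stale-allowed extension header described in the ECS Data Access Guide; the endpoint and credentials are placeholders.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
)

# Inject the ECS extension header that enables access during outage (ADO)
# on the bucket at creation time.
def enable_ado(params, **kwargs):
    params["headers"]["x-emc-is-stale-allowed"] = "true"

s3.meta.events.register("before-call.s3.CreateBucket", enable_ado)
s3.create_bucket(Bucket="geo-app-bucket")
```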
Security
Introduction to ECS security
Besides assigning specific roles for certain access and control for users, additional measures must be taken to make ECS less vulnerable to unwarranted access, common user mistakes, or data breaches. ECS provides several features to enable security of customers' data, such as encryption, platform lockdown, retention, and so on. The next sections describe available features and best practices for protecting ECS.
Protection from unwarranted access
ECS features that protect against unwarranted access include:
• Platform lockdown—Disables SSH access to nodes
• Retention policies—Limit the ability to change records or data under retention
• Audit events—Record changes in the system configuration; track logins and sudo commands run on nodes, bucket operations such as setting bucket permissions, and user operations such as setting or deleting passwords
• Immediately change the ECS default account passwords for the administrator account on nodes and for root in the ECS portal.
• Use individual user accounts for day-to-day administration as opposed to the integrated
ECS account.
• Use the “Platform Lockdown” feature if ECS nodes must not be accessible by SSH.
• Set appropriate retention for objects to protect from accidental deletions.
• Use SSL for additional security.
• Monitor “unauthorized” access and modifications through audit events.
Data at Rest Encryption
ECS provides server-side encryption to protect data on disk. Key management is either done automatically or specified by the user. Enabling the ECS Data at Rest Encryption (D@RE) option is done at the namespace level or bucket level, allowing customers to control what level handles encryption. If encryption is enabled at the namespace level, all buckets within the namespace are encrypted unless encryption is disabled at bucket creation time. If encryption is not enabled at the namespace level, buckets can enable encryption individually at create time.
Object Lock
Dell ECS Object Lock protects object versions from accidental or malicious deletion, such as a ransomware attack. It provides this protection by allowing object versions to enter a write-once, read-many (WORM) state where access is restricted based on attributes set on the object version.
When an object version is locked in compliance mode, its retention mode cannot be changed, and its retention period cannot be shortened. Compliance mode helps ensure that an object version cannot be overwritten or deleted during the retention period.
Object Lock requires the use of versioned buckets. Enabling Object Lock on a bucket
automatically enables versioning. Once Object Lock is enabled, it is not possible to
disable it or suspend versioning for the bucket. Object locks apply to individual object
versions only; different versions of a single object can have different retention modes and
periods.
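Because ECS Object Lock follows the S3 Object Lock API, enabling it at creation and writing a compliance-mode version can be sketched with boto3 (the endpoint, bucket, and key are placeholders):

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
)

# Object Lock must be enabled at creation; this also enables versioning.
s3.create_bucket(Bucket="worm-bucket", ObjectLockEnabledForBucket=True)

# Write an object version under COMPLIANCE mode: it cannot be overwritten
# or deleted, and its retention cannot be shortened, until the date passes.
s3.put_object(
    Bucket="worm-bucket",
    Key="audit/2024-04.log",
    Body=b"append-only audit data",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
)
```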
Application development
Application development overview
ECS was designed predominantly for archival, content repository, Internet of things, video surveillance, and modern applications. Consider the following information when designing an application for ECS:
• ECS was designed mainly for applications or use cases that do not require high
IOPS.
• ECS has a 99.999 percent success rate for transactions. Handle failures accordingly by either using the integrated retry mechanism in most software development kits (SDKs) or creating appropriate error handlers (a retry configuration sketch follows this list).
• Use an SDK for your programming language to facilitate your efforts.
• Use ECS S3 if you want to take advantage of ECS features.
• Use AWS SDK if you want to maintain compatibility with AWS.
• Use the protocol that best fits your needs and skills—S3, Swift, or Atmos.
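For the retry guidance above, most SDKs expose retry behavior directly. A minimal boto3 sketch, with a placeholder ECS endpoint and credentials:

```python
import boto3
from botocore.config import Config

# Let the SDK retry transient failures (connection errors, 5xx responses)
# instead of surfacing every one of them to the application.
retry_config = Config(retries={"max_attempts": 5, "mode": "standard"})

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
    config=retry_config,
)
```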
ECS provides a set of REST APIs for customers to use for data access and management
of ECS through their applications. The next sections highlight some best practices and
considerations for developing or customizing an application for ECS, as related to
namespaces and buckets, objects, retention, extensions, security, and data management.
• Use pre-signed URLs. ECS supports pre-signed URLs to enable users to access objects without needing credentials (see the sketch after this list).
• Object update frequency should be low because object storage platforms are not designed for transactional workloads but ideally for static content such as sensor data, images, and videos.
• Only one application should write to each bucket. Other applications may read from
them, but not write.
• Use the object copy operation instead of downloading and uploading the object again.
• Beware of the concurrent requests for the same object.
• If order needs to be guaranteed, use Conditional PUTs (ECS extensions).
• If there is no external load balancing in your ECS deployment, implement client-side load
balancing to distribute load across ECS nodes for increased performance.
• Use “Range Reads” for listing objects. Align the range to your application and request only what is needed. The “Marker,” “NextMarker,” and “MaxKeys” parameters are available to paginate listings.
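As referenced in the pre-signed URL item above, generating a time-limited URL is a one-call operation in most SDKs. A minimal boto3 sketch, with placeholder endpoint, bucket, and key:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
)

# Time-limited GET URL: the holder can download this object for one hour
# without needing ECS credentials of their own.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "media", "Key": "videos/demo.mp4"},
    ExpiresIn=3600,
)
print(url)
```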
ECS extensions
ECS APIs support additional extensions that are not available in the standard S3 APIs. These features extend ECS capabilities and provide an advantage over other solutions.
Metadata search
ECS provides a facility for metadata search of objects to improve performance of queries.
ECS maintains an index of the objects in a bucket, based on their associated metadata.
This index allows S3 object clients to search for objects within buckets based on the
indexed metadata using a rich query language. A search index can include up to 30 system and user metadata fields per bucket. The indexes are configured at the time of bucket creation through the ECS portal, ECS Management REST API, or S3 REST API.
Considerations for developing applications using the metadata search capability include:
• Supported operations include “<, >, <=, >=, =, !=, AND/OR.”
• Metadata search must be enabled during bucket creation. Also, fields and values
must be specified at bucket creation time.
• Performance is lower for accessing objects on buckets configured for metadata search, so use the feature wisely and after careful consideration. The more indexes created, the larger the performance impact.
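Metadata search is exposed as an S3 REST extension: a query expression is sent as a query parameter on the bucket. Standard SDKs do not wrap it, so the sketch below issues a signed raw request using the third-party requests and requests-aws4auth libraries. The endpoint, credentials, bucket, indexed fields, and query syntax shown are assumptions for illustration; see the ECS Data Access Guide for the exact syntax.

```python
import requests
from requests_aws4auth import AWS4Auth

endpoint = "https://ecs.example.com:9021"  # placeholder ECS S3 endpoint
auth = AWS4Auth("OBJECT_USER_KEY", "OBJECT_USER_SECRET", "us-east-1", "s3")

# Query the bucket's metadata index for large image objects. Assumes the
# bucket was created with Size and x-amz-meta-type as indexed keys.
query = "(Size > 1048576) and (x-amz-meta-type == 'image')"
resp = requests.get(f"{endpoint}/media", params={"query": query}, auth=auth)
print(resp.status_code)
print(resp.text)  # XML listing of matching objects
```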
Byte range extensions
Unlike AWS S3 in which objects are immutable, ECS REST APIs provide byte range
extensions to update and read parts of an object. Features that ECS provides as part of
this extension include:
• Partial reads and updates within an object (which still maintains append-only
behavior).
• Overwrite of part of an object by providing only the starting offset in the data
request.
• Ability to atomically append data to an object without specifying an offset, and the
offset is returned in the response. This is useful for multiclient streams, for example,
syslog, sensor data.
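These operations ride on PUT requests that carry a Range header, which standard S3 SDKs do not expose, so a raw HTTP sketch is shown below. It assumes the requests and requests-aws4auth libraries and the Range conventions described in the ECS Data Access Guide; the endpoint, credentials, object key, and the name of the response offset header are placeholders or assumptions.

```python
import requests
from requests_aws4auth import AWS4Auth

endpoint = "https://ecs.example.com:9021"  # placeholder ECS S3 endpoint
auth = AWS4Auth("OBJECT_USER_KEY", "OBJECT_USER_SECRET", "us-east-1", "s3")
url = f"{endpoint}/logs/syslog-stream"

# Overwrite bytes 100-199 of an existing object in place.
requests.put(url, data=b"x" * 100,
             headers={"Range": "bytes=100-199"}, auth=auth)

# Atomic append: bytes=-1- asks ECS to append at the current end of the
# object, and the response reports the offset where the data landed.
resp = requests.put(url, data=b"new log line\n",
                    headers={"Range": "bytes=-1-"}, auth=auth)
print(resp.headers.get("x-emc-append-offset"))  # assumed offset header name
```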
Retention
For data with short-term and long-term retention times, it is better to put the two classes in separate replication groups. The efficiency of garbage collection (space reclamation) is affected if short-term and long-term data are stored in a chunk within one replication group.
• Consider creating two separate replication groups for short-term and long-term data, if
needed.
• The auto-commit period should be as short as possible to minimize the chance that a file is modified while waiting for conversion to the write-once (WORM) state. Observe a 24-hour maximum auto-commit period, and use a shorter period if possible.
Security
Security is important to safeguard your credentials and data being transmitted over the Internet.
ECS supports the use of external key servers to store top-level key encrypting keys
(KEKs). Customers can take advantage of the additional layer of security provided by
HSM-based key protection and the latest encryption technology provided by specialized
key management servers. In addition, storing top-level key information outside the
appliance protects data stored on ECS against loss of the entire appliance.
Object version
To avoid memory issues and 500 errors due to large numbers of object versions, we suggest enforcing a limit and rejecting requests to create versions above the limit.
• For ECS versions earlier than 3.6, customers should keep the number of versions per object below 50,000.
Note:
• Beginning with ECS 3.6, the limitation is set as 50k and is enforced on new installs (not
upgrades).
• When object version limitation settings are enabled, there will be two version thread
alerts (50 percent and 80 percent) in the system.
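On ECS versions without the built-in limit, applications can police version counts themselves. A minimal boto3 sketch that trims the oldest versions of one key beyond a chosen cap; the endpoint, bucket, key, and cap are placeholders.

```python
import boto3

MAX_VERSIONS = 1000  # choose a cap well below the 50k guidance

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
)

# Collect every version of the key, newest first, then delete the excess.
key = "videos/demo.mp4"
versions = []
for page in s3.get_paginator("list_object_versions").paginate(
        Bucket="media", Prefix=key):
    versions += [v for v in page.get("Versions", []) if v["Key"] == key]

versions.sort(key=lambda v: v["LastModified"], reverse=True)
for v in versions[MAX_VERSIONS:]:
    s3.delete_object(Bucket="media", Key=key, VersionId=v["VersionId"])
```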
S3 Select
Beginning with version 3.7, ECS supports S3 Select, which enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases.
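A minimal boto3 sketch of an S3 Select query against a CSV object; the endpoint, bucket, key, and column names are placeholders.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # placeholder S3 endpoint
    region_name="us-east-1",
    aws_access_key_id="OBJECT_USER_KEY",
    aws_secret_access_key="OBJECT_USER_SECRET",
)

# Retrieve only the matching rows of a CSV object, not the whole file.
resp = s3.select_object_content(
    Bucket="telemetry",
    Key="sensors/2024-04.csv",
    Expression="SELECT s.sensor_id, s.reading FROM S3Object s "
               "WHERE CAST(s.reading AS FLOAT) > 100",
    ExpressionType="SQL",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```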
Operations
Introduction to ECS operations best practices
Maintaining the health of ECS requires the use of tools such as the ECS portal to monitor overall system-level health and performance information, syslog, and SNMP. The following sections provide information about best practices for ECS administrators' day-to-day operations such as monitoring, Dell Secure Connect Gateway (formerly ESRS), and product alerts and updates, as well as best practices for hardware capacity expansion.
• Look for unevenness of CPU, memory, and network bandwidth between nodes.
• Become familiar with the performance of the system and the metrics that are expected
over time so that if rates are out of the normal range investigation can be initiated.
• Do not let ECS get too full. Account for rebalancing time when expanding.
• Look for a higher than normal number of failed requests and determine root cause.
• Regularly check events and audit logs.
Dell Secure Connect Gateway
Dell Secure Connect Gateway provides a secure two-way connection between customer-owned Dell equipment and Dell customer service. It provides faster problem resolution with proactive remote monitoring and repair. Although use of Dell Secure Connect Gateway is optional, it is highly recommended and should be included during deployment planning.
Product alerts and updates
We recommend that ECS administrators sign up to receive product updates and alerts as follows:
1. At www.dell.com/support, click Preferences.
2. On the Account Settings and Preferences page, select the Subscription and
Alerts tab, and subscribe to product updates.
3. In the Alert section on the same page, subscribe to product advisories.
The minimum recommended subscriptions are for ECS Software or ECS
Appliance and ECS Software with Encryption. A search for ECS reveals all
available subscription options.
Hardware capacity expansion
The best practice is to add capacity to a VDC before capacity utilization reaches 85 percent. If your rate of utilization growth is high, start adding capacity at 80 percent utilization. The goal is to have additional capacity available for the system when utilization reaches 85 percent.
When adding an additional rack to an existing configuration, the following are the best
practices.
• Size/capacity of nodes added must be equal to or greater than the size/capacity of the
existing nodes.
• If capacity utilization of the storage pool is below 80 percent, the minimum number of
nodes to be added in the new rack is X:
▪ X = 3, for storage pools using 12+4
▪ X = 4, for storage pools using 10+2
• If capacity utilization of the storage pool is at or above 80 percent, the minimum number
of nodes to be added must be the same as the number of existing nodes (in a rack).
When adding capacity in the same rack, the following are the best practices.
• Size/capacity of nodes added must be equal to or greater than the size/capacity of the
existing nodes.
• If capacity utilization of the storage pool is below 80 percent, the minimum number of nodes to be added in the same rack is 1.
• If capacity utilization of the storage pool is at or above 80 percent, the minimum number
of nodes to be added in the same rack is X:
▪ X = 3, for storage pools using 12+4
▪ X = 4, for storage pools using 10+2
When new installs require multiple racks, divide the nodes between the racks as equally as possible, keeping in mind the following best practices.
Use case 1: A customer wants to deploy eight EX5000 single nodes.
• This configuration requires two racks because one rack can only fit seven EX5000 nodes.
• Rack 1 – 4 EX5000 nodes
• Rack 2 – 4 EX5000 nodes
Use case 2: A customer wants to deploy 17 EX500 nodes.
• This configuration requires two racks because one rack can only fit 16 EX500 nodes.
• Rack 1 – 9 EX500 nodes
• Rack 2 – 8 EX500 nodes
Conclusion
Most of the best practices outlined in this document are also described in Dell product
documentation. We recommend that you read and become familiar with all existing ECS
documentation. Work closely with Dell personnel during the planning phase and refer to
the appropriate hardware specifications.
Storage technical documents and videos provide expertise that helps to ensure customer
success with Dell storage platforms.