Isilon Network Design Considerations
ABSTRACT
This white paper explores design considerations of Isilon’s external network to ensure
maximum performance and an optimal user experience.
Version 4.3
November 2018
NOTICE TO READERS
It is important to understand that the network design considerations stated in this document are based on general network design and
are provided as guidance to Isilon administrators. Because these are considerations, not all of them will apply to every workload. It is
important to understand each consideration and confirm if it pertains to a specific environment.
Each network is unique, not only from a design perspective but also from a requirements and workloads perspective. Before making
any changes based on the guidance in this document, it is important to discuss modifications with the Network Engineering team.
Additionally, as a customary requirement for any major IT implementation, changes should first be tested in a lab environment that
closely mimics the workloads of the live network.
For the following sections, it is important to understand the differences between Distribution and Access switches. Typically, distribution
switches perform L2/L3 connectivity while Access switches are strictly L2. The Figure below provides the representation for each.
Single Points of Failure: Ensure the network design has layers of redundancy. Dependence on a single device or link can result in a
loss of resources or outages. The enterprise requirements take risk and budget into account, guiding the level of redundancy.
Redundancy should be implemented through backup paths and load sharing. If a primary link fails, traffic uses a backup path. Load
sharing creates two or more paths to the same endpoint and shares the network load. When designing access to Isilon nodes, it is
important to assume links and hardware will fail, ensuring access to the nodes survives those failures.
Application and Protocol Traffic: Understanding the application data flow from clients to the Isilon cluster across the network allows
for resources to be allocated accordingly while minimizing latency and hops along this flow.
Available Bandwidth: As traffic traverses the different layers of the network, the available bandwidth should not be significantly
different. Compare this available bandwidth with the workflow requirements.
Minimizing Latency: Ensuring latency is minimal from the client endpoints to the Isilon nodes maximizes performance and
efficiency. Several steps can be taken to minimize latency, but latency should be considered throughout network design.
Prune VLANs: It is important to limit VLANs to areas where they are applicable. Pruning unneeded VLANs is also good practice. If
unneeded VLANs are trunked further down the network, this imposes additional strain on endpoints and switches. Broadcasts are
propagated across the VLAN and impact clients.
VLAN Hopping: VLAN hopping has two methods, switch spoofing and double tagging. Switch spoofing is when a host imitates the
behavior of a trunking switch, allowing access to other VLANs. Double tagging is a method where each packet contains two VLAN
tags: the first tag matches the native VLAN and is stripped by the first switch, exposing the second tag, which specifies the VLAN where
access is not permitted. It is recommended to assign the native VLAN to an ID that is not in use, or otherwise to tag the native VLAN, to
avoid VLAN hopping, which would allow a device to access a VLAN it normally would not have access to. Additionally, only allow trunk
ports between trusted devices and assign access VLANs on ports that are different from the default VLAN.
The Looped Design Model extends VLANs between the aggregation switches, thus creating the looped topology. To prevent actual
loops, Spanning Tree is implemented using Rapid PVST+ or MST. For each path, a redundant path also exists, which remains in a
blocking state until the primary path becomes unavailable. Access layer uplinks may be used to load balance VLANs. A key point to consider with the Looped
Access Topology is the utilization of the inter-switch link between the Distribution switches. The utilization must be monitored closely as
this is used to reach active services.
The Looped Triangle Access Topology supports VLAN extension and L2 adjacency across the Access layer. Through the use of STP
and dual homing, the Looped Triangle is extremely resilient. Stateful services are supported at the aggregation layer, and 802.1w/802.1s provide quick convergence.
Utilizing the Triangle Looped Topology allows for multiple Access Switches to interface with the external network of the Isilon Scale-Out
NAS environment. Each Isilon node within a cluster is part of a distributed architecture which allows each node to have similar
properties regarding data availability and management.
Implementing link aggregation is not mandatory; rather, it is based on workload requirements and is recommended if transparent
failover or switch port redundancy is required.
Link aggregation assumes all links are full duplex, point to point, and at the same data rate, providing graceful recovery from link
failures. If a link fails, traffic is automatically sent to the next available link without disruption.
It is imperative to understand that link aggregation is not a substitute for a higher bandwidth link. Although link aggregation combines
multiple interfaces, applying it to multiply bandwidth by the number of interfaces for a single session is incorrect. Link aggregation
distributes traffic across links. However, a single session only utilizes a single physical link to ensure packets are delivered in order
without duplication of frames.
The IEEE 802.1AX standard does not specify a distribution algorithm for the Frame Distributor across aggregated links, but it
enforces that frames must be sent in order without duplication. Frame order is maintained by ensuring that all frames of a given session
are transmitted on a single link in the order that they are generated by the client. The mandate does not allow for additions or
modifications to the MAC frame, buffering, or processing to re-order frames by the Frame Distributor or Collector.
Thus, the bandwidth for a single client is not increased, but the aggregate bandwidth of all clients increases in an active/active
configuration. The aggregate bandwidth is realized when carrying multiple simultaneous sessions and may not provide a linear multiple
of each link’s data rate, as each individual session utilizes a single link.
Another factor to consider is that, depending on the workload, certain protocols may or may not benefit from link aggregation. Stateful
protocols, such as NFSv4 and SMBv2, benefit from link aggregation as a failover mechanism. In contrast, SMBv3 Multichannel
automatically detects multiple links, utilizing each for maximum throughput and link resilience.
Each vendor has a proprietary implementation of Multi-Chassis Link Aggregation, but externally the virtual switch created is compliant
with the IEEE 802.1AX standard.
It is important to recognize that regarding bandwidth, the concepts discussed for single switch Link Aggregation still apply to Multi-
Chassis Link Aggregation. Additionally, as the multiple switches form a single virtual switch, it is important to understand what happens
if the switch hosting the control plane fails. Those effects vary by the vendor’s implementation but will impact the network redundancy
gained through Multi-Chassis Link Aggregation.
LATENCY
Latency in a packet-switched network is defined as the time from when a source endpoint sends a packet to when it is received by the
destination endpoint. Round trip latency, sometimes referred to as round-trip delay, is the amount of time for a packet to be sent from
the source endpoint to the destination endpoint, and returned from the destination to the source endpoint.
Minimal latency in any transaction is imperative for several reasons. IP endpoints, switches, and routers operate optimally without
network delays. Minimal latency between clients and an Isilon node ensures performance is not impacted. As latency increases
between two endpoints, this may lead to several issues that degrade performance heavily, depending on the application.
In order to minimize latency, it is important to measure it accurately between the endpoints. For assessing Isilon nodes, this is
measured from the clients to a specified node. The measurement could use the IP of a specific node or the SmartConnect hostname.
After configuration changes are applied that impact latency, it is important to confirm the latency has indeed decreased. When
attempting to minimize latency, consider the following points:
Hops: Minimizing the hops required between endpoints decreases latency. The implication is not to physically re-route cabling across a
campus; rather, the goal is to confirm whether any unnecessary hops can be avoided. Minimizing hops applies at the physical level with the number of
switches between the endpoints but also applies logically to network protocols and algorithms.
ASICs: When thinking about network hops, it is also important to consider the ASICs within a switch. If a packet enters through one
ASIC and exits through the other, latency could increase. If at all possible, it is recommended to keep traffic as part of the same
ASIC to minimize latency.
Network Congestion: NFS v3, NFSv4 and SMB employ the TCP protocol. For reliability and throughput, TCP uses windowing to
adapt to varying network congestion. At peak traffic, congestion control is triggered, dropping packets, and leading TCP to utilize
smaller windows. In turn, throughput could decrease, and overall latency may increase. Minimizing network congestion ensures it
does not impact latency. It is important to architect networks that are resilient to congestion.
Routing: Packets that pass through a router may induce additional latency. Depending on the router configuration, packets are
checked for a match against defined rules, in some cases requiring packet header modification.
MTU Mismatch: Depending on the MTU size configuration of each hop between two endpoints, an MTU mismatch may exist.
Therefore, packets must be split to conform to upstream links, creating additional CPU overhead on routers and NICs, creating
higher processing times, and leading to additional latency.
Firewalls: Firewalls provide protection by filtering packets against a defined set of rules. The filtering process consumes time and could
create further latency. Processing times are heavily dependent upon the number of rules in place. It is good practice to ensure outdated
rules are removed to minimize processing times.
The difference between bandwidth and throughput is important when troubleshooting. If an Isilon node supports 40 GbE, it does not necessarily mean
the throughput is 40 Gb/s. The actual throughput between a client and an Isilon node is dependent on all of the factors between the two
endpoints and may be measured with a variety of tools.
During the design phase of a data center network, it is important to ensure bandwidth is available throughout the hierarchy, eliminating
bottlenecks and ensuring consistent bandwidth. The bandwidth from the Access Switches to the Isilon nodes should be a ratio of what
is available back to the distribution and core switches. For example, if an Isilon cluster of 12 nodes has all 40 GbE connectivity to
access switches, the link from the core to distribution to access should be able to handle the throughput from the access switches.
Ideally, the link from the core to distribution to access should support roughly a bandwidth of 480 Gb (12 nodes * 40 GbE).
The amount of data that can be transmitted across a link is vital to understanding Transmission Control Protocol (TCP) performance.
Achieving maximum TCP throughput requires that data be sent in large enough quantities before waiting for a confirmation
message from the receiver, which acknowledges the successful receipt of data. The successful receipt of the data is part of the TCP
connection flow. The diagram below explains the steps of a TCP connection and where BDP is applicable:
In the diagram above, four states are highlighted during a TCP connection. The following summarizes each state:
1. TCP Handshake – Establishes the TCP connection through an SYN, SYN/ACK, ACK
2. Data transmitted to the server. BDP is the maximum amount of data that can be sent at this
step.
Once the BDP rate is calculated, the TCP stack is tuned for the maximum throughput, which is discussed in the next section. The BDP
is calculated by multiplying the bandwidth of the network link (bits/second) by the round-trip time (seconds).
For example, a link with a bandwidth of 1 Gigabit per second and a 1 millisecond round trip time, would be calculated as:
Bandwidth * RTT = 1 Gigabit per second * 1 millisecond =
1,000,000,000 bits per second * 0.001 seconds = 1,000,000 bits = 125,000 bytes = 0.125 MB
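To make the arithmetic repeatable for other link speeds and latencies, the following minimal Python sketch reproduces the calculation above; the function name and units are illustrative only.

def bandwidth_delay_product(bandwidth_bps, rtt_seconds):
    """Return the BDP in bits, bytes, and megabytes for a given link."""
    bdp_bits = bandwidth_bps * rtt_seconds
    bdp_bytes = bdp_bits / 8
    return bdp_bits, bdp_bytes, bdp_bytes / 1_000_000

# The example above: a 1 Gb/s link with a 1 ms round-trip time
bits, byte_count, megabytes = bandwidth_delay_product(1_000_000_000, 0.001)
print(bits, byte_count, megabytes)  # 1000000.0 125000.0 0.125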
Isilon OneFS is built on FreeBSD. An Isilon cluster is composed of nodes with a distributed architecture, and each node provides
external network connectivity. Adapting the TCP stack to bandwidth, latency, and MTU requires tuning to ensure the cluster provides
optimal throughput.
In the previous section, BDP was explained in depth and how it is the amount of data that can be sent across a single TCP message
flow. Although the link supports the BDP that is calculated, the OneFS system buffer must be able to hold the full BDP. Otherwise, TCP
transmission failures may occur. If the buffer does not accept all of the data of a single BDP, the acknowledgment is not sent, creating a
delay, and the workload performance is degraded.
The OneFS network stack must be tuned to ensure on inbound, the full BDP is accepted, and on outbound, it must be retained for a
possible retransmission. Prior to modifying the TCP stack, it is important to measure the current I/O performance and then again after
implementing changes. As discussed earlier in this document, the tuning below is only guidance and should be tested in a lab
environment before modifying a production network.
The spreadsheet below provides the necessary TCP stack changes based on the bandwidth, latency, and MTU. The changes below
must be implemented in the order below and all together on all nodes. Modifying only some variables could lead to unknown results.
After making changes, it is important to measure performance again.
Note: The snippet below is only for representation. It is imperative to input the calculated bandwidth, latency, and MTU specific to each
environment.
Download the Isilon Network Stack Tuning spreadsheet at the following link:
http://www.emc.com/collateral/tool/h164888-isilon-onefs-network-stack-tuning.xlsm
The IEEE 802.3x standard defines an Ethernet Flow Control mechanism at the data link layer. It specifies a pause flow control
mechanism through MAC Control frames in full duplex link segments. For Flow Control to be successfully implemented, it must be
configured throughout the network hops that the source and destination endpoints communicate through. Otherwise, the pause flow
control frames will not be recognized and will be dropped.
Isilon OneFS listens for pause frames but does not send them, meaning it is only applicable when an Isilon node is the source. OneFS
recognizes pause frames from the destination.
Most networks today do not send pause frames, but certain devices still send them. In particular, Cisco Nexus Switches with the Fabric
Extender Modules have been known to send pause frames.
If the network or cluster performance does not seem optimal, it is easy to check for pause frames on an Isilon cluster. To check for
pause frames received by an Isilon cluster, execute the following command from the shell:
isi_for_array -a <cluster name> sysctl dev | grep pause
Check for any values greater than zero. In the example below, the cluster has not received any pause frames. If values greater than
zero are printed consistently, Flow Control should be considered.
If pause frames are reported, it is important to discuss these findings with the Network Engineering team before making any changes.
As mentioned above, changes must be implemented across the network, ensuring all devices recognize a pause frame. Please contact
the switch manufacturer’s support teams or account representative for specific steps and caveats for implementing flow control before
proceeding.
This section provides considerations for SyncIQ pertaining to external network connectivity. For more information on SyncIQ, see the
Best Practices for Data Replication with EMC Isilon SyncIQ white paper.
Dedicated static SmartConnect zones are required for SyncIQ replication traffic. As with any static SmartConnect zone, the dedicated
replication zone requires one IP address for each active logical interface. For example, two active physical interfaces,
10gige-1 and 10gige-2, require two IP addresses. However, if these are combined with link aggregation, interface 10gige-agg-1 only
requires one IP address. Source-restrict all SyncIQ jobs to use the dedicated static SmartConnect zone on the source cluster, and
repeat the same on the target cluster.
By restricting SyncIQ replication jobs to a dedicated static SmartConnect Zone, replication traffic may be assigned to specific nodes,
reducing the impact of SyncIQ jobs on user or client I/O. The replication traffic is directed without reconfiguring or modifying the
interfaces participating in the SmartConnect zone.
For example, consider a data ingest cluster for a sports television network. The cluster must ingest large amounts of data recorded in
4K video format. The data must be active immediately, and the cluster must store the data for extended periods of time. The sports
television network administrators want to keep data ingestion and data archiving separate, to maximize performance. The sports
television network purchased two types of nodes: H500s for ingesting data, and A200s for the long-term archive. Due to the extensive
size of the data set, SyncIQ jobs replicating the data to the disaster recovery site have a significant amount of work to do on each
pass. The front-end interfaces are saturated on the H500 nodes, either ingesting data or performing immediate data retrieval. The
CPUs of those nodes must not be affected by the SyncIQ jobs. By using a separate static SmartConnect pool, the network
administrators can force all SyncIQ traffic to leave only the A200 nodes and provide maximum throughput on the H500 nodes.
SmartConnect acts as a DNS delegation server to return IP addresses for SmartConnect zones, generally for load-balancing
connections to the cluster. The IP traffic involved is a four-way transaction, shown in the figure below and exercised directly in the sketch that follows the numbered steps.
1. [Blue Arrow – Step 1] The client makes a DNS request for sc-zone.domain.com by sending a DNS request packet to the site
DNS server.
2. [Green Arrow – Step 2] The site DNS server has a delegation record for sc-zone.domain.com and sends a DNS request to
the defined nameserver address in the delegation record, the SmartConnect service (SmartConnect Service IP Address).
3. [Orange Arrow – Step 3] The cluster node hosting the SmartConnect Service IP (SSIP) for this zone receives the request,
calculates the IP address to assign based on the configured connection policy for the pool in question (such as round robin),
and sends a DNS response packet to the site DNS server.
4. [Red Arrow – Step 4] The site DNS server sends the response back to the client.
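This delegation can also be exercised directly from a client, which is useful when verifying that the site DNS server and the SmartConnect service are responding as expected. The following minimal sketch assumes the third-party dnspython (2.x) package and reuses the illustrative SSIP and zone name that appear in the nslookup example later in this document; repeated queries show the addresses handed out by the configured connection policy.

import dns.resolver  # third-party package: dnspython 2.x

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["10.123.17.60"]  # SmartConnect Service IP (illustrative)

for _ in range(4):
    # Query the SmartConnect zone name directly against the SSIP
    answer = resolver.resolve("isi01-s0.domain.com", "A")
    print([record.address for record in answer])

With a round robin connection policy, successive queries should return the pool's IP addresses in rotation.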
Throughout the network design phase, consider that a single SSIP is defined per subnet. However, under each subnet, pools are
defined, and each pool will have a unique SmartConnect Zone Name. It is important to recognize that multiple pools lead to multiple
SmartConnect Zones utilizing a single SSIP. As shown in the diagram above, a DNS provider is defined per Groupnet, which is a
feature in OneFS 8.0 and newer releases. In releases before 8.0, a DNS per Groupnet was not supported.
Generally speaking, starting with Round Robin is recommended for a new implementation or if the workload is not clearly defined. As
the workload is further defined and based on the Round Robin experience, another policy can be tested in a lab environment.
*Metrics are gathered every 5 seconds for CPU Utilization and every 10 seconds for Connection Count and Network Throughput. In
cases where many connections are created at the same time, these metrics may not be accurate, creating an imbalance across nodes.
As discussed previously, the above policies mapping to workloads are general guidelines. Each environment is unique with distinct
requirements. It is recommended to confirm the best load balancing policy in a lab environment which closely mimics the production
environment. For additional details on the description of each SmartConnect load balancing policy, refer to the Isilon SmartConnect
White Paper.
Once the IP address pool is defined, under the ‘SmartConnect Advanced’ Section, an ‘Allocation Method’ may be selected. By default,
this option is grayed out as ‘Static’ if a SmartConnect Advanced license is not installed. If a SmartConnect Advanced license is
installed, the default ‘Allocation Method’ is still ‘Static’, but ‘Dynamic’ may also be selected.
The Static Allocation Method assigns a single persistent IP address to each interface selected in the pool, leaving additional IP
addresses in the pool unassigned if the number of IP addresses is greater than interfaces. In the event a node or interface becomes
unavailable, this IP address does not move to another node or interface. Additionally, when the node or interface becomes unavailable,
it is removed from the SmartConnect Zone and new connections will not be assigned to the node. Once the node is available again,
SmartConnect adds it back into the zone and assigns new connections.
On the contrary, the Dynamic Allocation Method splits all available IP addresses in the pool across all selected interfaces. OneFS
attempts to assign the IP addresses evenly if at all possible, but if the interface to IP address ratio is not an integer value, a single
interface may have more IP addresses than another.
DYNAMIC FAILOVER
Combined with the Dynamic Allocation Method, Dynamic Failover provides high-availability by transparently migrating IP addresses to
another node when an interface is not available. If a node becomes unavailable, all of the IP addresses it was hosting are re-allocated
across the new set of available nodes in accordance with the configured failover load balancing policy. The default IP address failover
policy is round-robin, which evenly distributes IP addresses from the unavailable node across available nodes. As the IP address
remains consistent, irrespective of which node it resides on, this results in a transparent failover to the client, providing seamless high
availability.
The other available IP address failover policies are the same as the initial client connection balancing policies, i.e., connection count,
throughput, or CPU usage. In most scenarios, round-robin is not only the best option, but also the most common. However, the other
policies are available if the workload requires them.
The examples below illustrate how the IP address quantity impacts user experience during a failover, and these are the
guidelines to use when determining IP address quantity.
In this scenario, 150 clients are actively connected to each node over NFS using a round-robin connection policy. Most NFSv3 mounted
clients perform a DNS lookup only the first time that they mount, never performing another lookup to check for an updated IP address. If
the IP address changes, the NFSv3 clients have a stale mount and retain that IP address.
Suppose that one of the nodes fails, as shown in the following figure:
Figure 9. Dynamic Zone – 4 Node Cluster with 1 IP Address Per Node – 1 Node Offline
A SmartConnect Zone with a dynamic allocation strategy immediately hot-moves the one IP address on the failed node to one of the
other three nodes in the cluster. It sends out a number of gratuitous address resolution protocol (ARP) requests to the connected
switch, so that client I/O continues uninterrupted.
Although all four IP addresses are still online, two of them—and 300 clients—are now connected to one node. In practice,
SmartConnect can fail only one IP to one other place, and one IP address and 150 clients are already connected to each of the other
nodes. The failover process means that a failed node has just doubled the load on one of the three remaining nodes while not making use of the other two.
This example considers the same four-node cluster as the previous example, but now following the rule of N*(N-1). In this case, 4*(4-1)
= 12, equaling three IP addresses per node, as shown in the figure below.
Figure 10. Dynamic Zone – 4 Node Cluster with 3 IP Addresses Per Node
When the same failure event as the previous example occurs, the three IP addresses are spread over all the other nodes in that
SmartConnect zone. This failover results in each remaining node having 200 clients and four IP addresses. Although performance may
degrade to a certain degree, it may not be as drastic as the failure in the first scenario, and the experience is consistent for all users, as
shown in the following figure.
Figure 11. Dynamic Zone – 4 Node Cluster with 3 IP Addresses Per Node – 1 Node Offline
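The client counts in these two scenarios can be approximated with a short Python sketch. It is a simplified model, assuming clients are spread evenly per IP address and that a failed node's IP addresses are redistributed round-robin across the surviving nodes; the function name is illustrative.

def clients_per_node_after_failover(nodes, ips_per_node, clients_per_node):
    """Approximate per-node client counts after a single node failure."""
    clients_per_ip = (nodes * clients_per_node) / (nodes * ips_per_node)
    surviving = nodes - 1
    failed_ips = ips_per_node
    # Spread the failed node's IP addresses round-robin across the survivors
    extra = [failed_ips // surviving + (1 if i < failed_ips % surviving else 0)
             for i in range(surviving)]
    return [round((ips_per_node + e) * clients_per_ip) for e in extra]

print(clients_per_node_after_failover(4, 1, 150))  # [300, 150, 150]
print(clients_per_node_after_failover(4, 3, 150))  # [200, 200, 200]

Increasing the number of IP addresses per node is what smooths out the post-failover imbalance between the two scenarios.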
Client access protocols are either stateful or stateless. Stateful protocols are defined by the client/server relationship having a session
state for each open file. Failing over IP addresses to other nodes for these types of workflows means that the client assumes that the
session state information was carried over. Session state information for each file is not shared among Isilon nodes. On the contrary,
stateless protocols generally tolerate failover without session state information being maintained, except for locks.
SMB
Typically, SMB performs best in static zones. In certain workflows, SMB is preferred in a dynamic zone, because IP address
consistency is required. It may not only be a workflow requirement, but could also be an IT administrative dependency. SMB actually
works well with dynamic zones, but it is essential to understand the protocol limitations. SMB preserves complex state information per
session on the server side. If a connection is lost and a new connection is established with dynamic failover to another node, the new
node may not be able to continue the session where the previous one had left off. If the SMB workflow is primarily reads, or heavier on
the read side, the impact of a dynamic failover will not be as drastic, as the client can re-open the file and continue reading. Conversely,
if an SMB workflow is primarily writes, the state information is lost, and the writes could be lost, possibly leading to file corruption.
Hence, in most cases, static zones are suggested for SMB, but again it is workflow dependent. Prior to a major implementation, it is
recommended to test the workflow in a lab environment, understanding limitations and the best option for a specific workflow.
NFS
The NFSv2 and NFSv3 protocols are stateless, and in almost all cases perform best in a dynamic zone. The client does not rely on
writes unless commits have been acknowledged by the server, enabling NFS to failover dynamically from one node to another.
The NFSv4 protocol introduced state, making it a better fit for static zones in most cases, as it expects the server to maintain session
state information. However, OneFS 8.0 introduced session-state information across multiple nodes for NFSv4, making dynamic pools
the better option. Additionally, most mountd daemons currently still behave in a v3 manner, where if the IP address it’s connected to
becomes unavailable, this results in a stale mount. In this case, the client does not attempt a new nslookup and connect to a different
node.
Again, as mentioned above, test the workflow in a lab environment to understand limitations and the best option for a specific workflow.
HDFS
The requirements for HDFS pools have been updated with the introduction of new OneFS features and as HDFS environments have
evolved. During the design phases of HDFS pools, several factors must be considered. The use of static versus dynamic pools is
impacted by the following:
Node Pools – is the cluster a single homogeneous node type, or do different Node Pools exist
Availability of IP addresses
The factors above coupled with the workflow requirements determine the pool implementation. Please reference the HDFS Pool Usage
and Assignments section in the EMC Isilon Best Practices Guide for Hadoop Data Storage for additional details and considerations with
HDFS pool implementations.
IP ADDRESS QUANTIFICATION
This section provides guidance for determining the number of IP addresses required for a new cluster implementation. The guidance
provided below does not apply to all clusters, and is provided as a reference for the process and considerations during a new cluster
implementation.
During the process of implementing a new cluster and building the network topology, consider the following:
Calculate the number of IP addresses that are needed based on future cluster size, not the initial cluster size.
Do not share a subnet with other application servers. If more IP addresses are required, and the range is full, re-addressing an
entire cluster and then moving it into a new VLAN is disruptive. These complications are prevented with proper planning.
Static IP pools require one IP address for each logical interface that will be in the pool. Each node provides 2 interfaces for
external networking. If Link Aggregation is not configured, this would require 2*N IP addresses for a static pool.
For optimal load-balancing during a node failure, IP pools with the Dynamic Allocation Method require a number of IP
addresses between the node count (minimum) and the client count (maximum). For example, a 12-node SmartConnect zone
and 50 clients would have a minimum of 12 and a maximum of 50 IP addresses. In many larger configurations, defining an IP
address per client is not feasible, and in those cases the optimal number of IP addresses is workflow dependent and based on
lab testing. In the previous examples, N*(N-1) is used to calculate the number of IP addresses, where N is the number of
nodes that will participate in the pool, as illustrated in the sketch following this list. For larger clusters, this formula may not be
feasible due to the sheer number of IP addresses. Determining the number of IP addresses within a Dynamic Allocation pool
varies depending on the workflow, node count, and the estimated number of clients that would be in a failover event.
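The following minimal sketch applies the N*(N-1) guideline referenced above; the sample cluster sizes illustrate why the formula becomes impractical as clusters grow.

def dynamic_pool_ip_estimate(node_count):
    """N*(N-1) guideline: total IP addresses for the pool and IPs per node."""
    total = node_count * (node_count - 1)
    return total, total // node_count

for n in (4, 12, 40):
    print(n, dynamic_pool_ip_estimate(n))
# 4  -> (12, 3)    matches the four-node example: three IP addresses per node
# 12 -> (132, 11)
# 40 -> (1560, 39) the sheer IP address count makes the formula difficult to follow at scale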
In previous OneFS releases, a greater IP address quantity was recommended considering the typical cluster size and the workload a
single node could handle during a failover. As nodes become unavailable, all the traffic hosted on that node is moved to another node
with typically the same resources, which could lead to a degraded end-user experience. Isilon nodes are now in the 6th generation, and
this is no longer a concern. Each node does have limitations, and those must be considered when determining the number of IP
addresses, as failover events create additional overhead. Additionally, as OneFS releases have progressed, so has the typical
cluster size, making it difficult to maintain the N*(N-1) formula with larger clusters.
From a load-balancing perspective, for dynamic pools, it is ideal, although optional, that all the interfaces have the same number of IP
addresses, whenever possible. It is important to note that in addition to the points above, consider the workflow and failover
requirements set by IT administrators.
Once a node is suspended, SmartConnect prevents new client connections to the node. If the node is part of a dynamic zone, IP
addresses are not assigned to this node in a suspended state. Suspending a node ensures that client access remains consistent. After
the node is suspended, client connections can be monitored and allowed to gradually drop-off before a reboot or power down.
A node is suspended from the OneFS CLI or web interface. From the Isilon CLI, the command is:
isi network pools --sc-suspend-node <groupnet.subnet.pool> <node ID>
Alternatively, from the web interface, click “Suspend Nodes” under the ‘Pool,’ as displayed in the following figure:
After a node is suspended, new connections are not created. Prior to rebooting or shutting the node down, confirm all client connections
have dropped by monitoring the web interface under the “Client Connections” tab from the “Cluster Overview” page. Also, clients may
have to be manually disconnected from the node if they have static SMB connections with applications that maintain connections.
In certain environments where PTR records may be required, this results in the creation of many PTR entries, as Isilon SmartConnect
pools could have hundreds of IP addresses. In scenarios where PTR records are required, each time an additional IP address is added
to a SmartConnect pool, DNS changes are necessary to keep the environment consistent.
Creating reverse DNS entries for the SmartConnect Service IP’s Host [address, or A] record is acceptable if the SmartConnect Service
IP is referenced only with an A record in one DNS domain.
The SmartConnect service IP address on an Isilon cluster, in most cases, should be created in DNS as an address (A) record, also
called a host entry. An A record maps a hostname such as www.dell.com to its corresponding IP address. Delegating to an A record
provides simplicity during a failover. Only a single DNS A record must be updated. All other name server delegations can be left alone.
In many enterprises, it is easier to have an A record updated than to update a name server record, because of the perceived complexity
of the process.
One delegation for each SmartConnect zone name or each SmartConnect zone alias on a cluster is recommended. This method
permits failover of only a portion of the cluster's workflow—one SmartConnect zone—without affecting any other zones. This method is
useful for scenarios such as testing disaster recovery failover and moving workflows between data centers.
It is not recommended to create a single delegation for each cluster and then create the SmartConnect zones as sub-records of that
delegation. Using this method would enable Isilon administrators to change, create, or modify their SmartConnect zones and zone
names as needed without involving a DNS team. The concern with this method is it causes failover operations that involve the entire
cluster and affects the entire workflow, not just the impacted SmartConnect zone.
Requests to connect to Isilon clusters with SmartConnect zone names will succeed
The isolated network benefits from SmartConnect features, such as load-balancing and rerouting traffic away from unavailable
nodes, which work as expected, just as in a typical, non-isolated deployment.
It is important to recognize that Isilon OneFS is not a full DNS server, hence it will only answer for SmartConnect Zones.
The following commands show how to simulate and test a configuration that uses the SmartConnect service IP address as the primary
DNS server.
C:\>nslookup
Address: 10.123.17.60
> isi01-s0.domain.com
Server: [10.123.17.60]
Address: 10.123.17.60
Name: isi01-s0.domain.com
Address: 10.123.17.64
> isi01-s0.domain.com
Server: [10.123.17.60]
Address: 10.123.17.60
Name: isi01-s0.domain.com
Address: 10.123.17.63
Cluster Name: Active Directory (AD) uses the cluster name as the AD machine account name. For example, when a cluster
named isi01 joins Active Directory, isi01 is the cluster’s machine account name. Using the cluster/machine account name in all
DNS entries simplifies cluster administration and troubleshooting.
IP Allocation Strategy: Each SmartConnect zone has an IP allocation strategy set to static or dynamic. The allocation strategy
is indicated in the zone name, for example, by using "d" for dynamic or "s" for static.
SmartConnect Pool ID: Each SmartConnect pool has a unique name or number that identifies it. By default, the first pool
created on a cluster is pool0, the second is pool1, and so on. These identifiers are recommended to be part of the zone name.
SSIP: Use the SSIP in the zone name to indicate a SmartConnect Service IP zone.
The variables above together form a SmartConnect zone name. For example: isi01-s0.domain.com
The name includes the cluster name (isi01), the allocation strategy of the zone (“s” for static), and the number of the pool (pool0).
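As a simple illustration of this convention, the short sketch below assembles a zone name from the cluster name, allocation strategy, and pool number; the helper name is illustrative.

def smartconnect_zone_name(cluster, allocation, pool_number, domain):
    """Build <cluster>-<s|d><pool>.<domain> following the convention above."""
    letter = "d" if allocation == "dynamic" else "s"
    return "{0}-{1}{2}.{3}".format(cluster, letter, pool_number, domain)

print(smartconnect_zone_name("isi01", "static", 0, "domain.com"))   # isi01-s0.domain.com
print(smartconnect_zone_name("isi01", "dynamic", 1, "domain.com"))  # isi01-d1.domain.com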
Based on the SmartConnect zone, pool, and the cluster information, the following is a sample DNS layout for the cluster named 'isi01':
Many factors determine performance in network-attached storage. In an Isilon cluster, key components are the front-end performance,
which consists of the network card, CPU, and memory in the node that is serving the relevant data protocol, and the back-end
performance, which, in this case, is the disk tier or pool where the data resides. In the context of SmartConnect configuration, creating a
connection pool that spans across different node performance levels is not recommended. For example, a pool with Isilon F800 nodes
and A200 nodes would provide significantly varying protocol performance. It is imperative to understand how the nodes within a
connection pool impact client experience.
To illustrate how this works, suppose that an existing four-node cluster is refreshed with four new nodes. Assume that the cluster has
only one configured subnet, all the nodes are on the network, and that there are sufficient IP addresses to handle the refresh. The first
step in the cluster refresh is to add the new nodes to the existing nodes, temporarily creating an eight-node cluster. Next, the original
four nodes are SmartFailed. The cluster is then composed of the four new nodes with the original data set.
As the administrators perform the refresh, they check the current configuration using the isi config command, with the status
advanced command, as shown in the following example:
isi config
>status advanced
The SmartConnect service continues to run throughout the process as the existing nodes are refreshed. The following example
illustrates where the SmartConnect service runs at each step in the refresh process.
Once the four new nodes are added to the cluster, based on the existing naming convention, they are automatically named
clustername-5, clustername-6, clustername-7, and clustername-8. At this point, the Node IDs and LNNs are displayed in the following
table:
Next, the original nodes are removed using SmartFail. The updated Node IDs and LNNs are displayed in the following table:
Keeping the naming convention consistent, the administrators re-name the new nodes, formerly clustername-5, clustername-6,
clustername-7, and clustername-8, to clustername-1, clustername-2, clustername-3, and clustername-4, respectively. The updated
Node IDs and LNNs remain the same, but map to a different Node Name, as displayed in the following table:
If LNN 1 is offline for maintenance, the SmartConnect service migrates to LNN 2, because LNN 2 has the next lowest NodeID number,
6.
It is recommended to disable client DNS caching, when possible. To handle client requests properly, SmartConnect requires
that clients use the latest DNS entries. If clients cache SmartConnect DNS information, they could connect to incorrect
SmartConnect zone names. In this event, SmartConnect might not appear to be functioning correctly.
If traffic is traversing firewalls, ensure that the appropriate ports are open. For example, if UDP port 53 is opened, also ensure
TCP port 53 is opened.
Certain clients perform DNS caching and might not connect to the node with the lowest load if they make multiple connections
within the lifetime of the cached address. For example, this issue occurs on Mac OS X for certain client configurations.
In order to successfully distribute IP addresses, the OneFS SmartConnect DNS delegation server answers DNS queries with a
time-to-live (TTL) of 0 so that the answer is not cached. Certain DNS servers, most particularly Windows Server 2003, 2008,
2012, and 2016, will update the value to one second. If many clients are requesting an address within the same second, this
will cause all of them to receive the same address. If this occurs frequently, consider a different DNS server, such as BIND.
The site DNS servers must be able to communicate with the node that is currently hosting the SmartConnect service.
Site DNS servers might not exist in the regular local subnets, or in any of the subnets that clients occupy. To enable the
SmartConnect lookup process, ensure that the DNS servers use a consistent route to the cluster and back. If the site DNS
server sends a lookup request that arrives through one local subnet on the cluster, but the configured cluster routing causes
the response to be sent through a different subnet, it’s likely that the packet will be dropped and the lookup will fail. The
solutions and considerations for SmartConnect are similar to the client scenarios. Additionally, the DNS server might benefit
from a static route to the subnet that contains the SSIP address or addresses.
SmartConnect makes it possible for different nodes to have different default routes, but this is fundamentally determined by
connectivity. SmartConnect enables administrators to define multiple gateways, with 1 gateway per subnet. Each gateway is
assigned a priority when it is defined. On any node, SmartConnect attempts to use the highest priority gateway—the gateway
that has the lowest number—that has an available functioning interface in a subnet that contains the gateway address.
Generally speaking, the MTU across the internet is 1500 bytes. As such, most devices limit packet size to roughly 1472 bytes, allowing
for additional overhead and remaining under the 1500 byte limit. Additional overhead may be added as the packet goes through
different hops. The IEEE 802.3 standard also specifies 1500 bytes as the standard payload.
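One common source of the 1472-byte figure is the largest ICMP echo (ping) payload that fits within a 1500-byte MTU once the 20-byte IPv4 header and 8-byte ICMP header are subtracted. A quick sketch of that arithmetic, assuming IPv4 without options:

def max_icmp_payload(mtu, ip_header=20, icmp_header=8):
    """Largest ICMP echo payload that fits in a single unfragmented packet."""
    return mtu - ip_header - icmp_header

print(max_icmp_payload(1500))  # 1472
print(max_icmp_payload(9000))  # 8972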
ETHERNET PACKET
An Ethernet frame carries a payload of data and is carried by an Ethernet packet. The payload could be IPv4 or IPv6 and TCP or UDP.
The IEEE 802.3 standard defines the structure of each packet. As a packet traverses different layers, the structure is modified
accordingly. In the diagram below, the structure is displayed as it would traverse the wire, or Layer 1. Dissecting how a packet is
structured on the wire lends to an understanding of how the packet overhead is impacted and all of the other components required to
send a payload.
Interpacket Gap: Serves as a gap between each frame, similar to a spacer. The Interpacket gap is only part of Layer 1. The
field originates from a time when hubs were common, and collisions were more commonplace.
Preamble: Composed of alternating 1 and 0 bits for receiver clock synchronization. The Preamble is only part of Layer 1.
Destination MAC: Contains the MAC address of the destination station for which the data is intended.
Type: Also known as the EtherType field, this defines the type of protocol that is encapsulated in the payload. In the example
above, it is an Ethernet II Frame, the most widely accepted type.
Payload: Spans from 46 to 1500 bytes and contains user data. If it is smaller than 46 bytes, it is padded to the 46-byte
minimum. The Payload consists of protocol data for TCP, UDP, or RTP and IPv4 or IPv6.
The next section explains the Payload field in greater depth.
ETHERNET PAYLOAD
The Ethernet payload varies based on the type of data it is carrying. It is a combination of either TCP, UDP, or RTP header combined
with an IPv4 or IPv6 header, and most importantly the actual payload which contains the data that is being sent. The fields within the
payload are displayed in the figure below.
As shown in the figure above, the amount of actual data sent within an Ethernet Frame is dependent upon the number of bytes
consumed by the other fields. Other options are available which are not listed here. For example, Linux hosts automatically
add a timestamp to the TCP stack, adding 12 bytes.
JUMBO FRAMES
Jumbo frames are Ethernet frames with an MTU greater than the standard 1500 bytes, up to a maximum of 9000 bytes. The larger
MTU size provides greater efficiency as less overhead and fewer acknowledgments are sent across devices, drastically reducing
interrupt load on endpoints. Jumbo frames are recommended for most workloads as the amount of data sent per message is far
greater, reducing processing times and maximizing efficiency. While the general assumption is that Jumbo frames provide performance
advantages for all workloads, it is important to measure results in a lab environment simulating a specific workload to ensure
performance enhancements.
For Jumbo frames to take advantage of the greater efficiencies, they must be enabled end-to-end on all hops between endpoints.
Otherwise, the MTU could be lowered through PMTUD or packets could be fragmented. The fragmentation and reassembly impact the
CPU performance of each hop, which impacts the overall latency.
For example, if a client is set to an MTU of 1500 bytes while other hops are set to 9000 bytes, transmission along the path will most
likely be set to 1500 bytes using PMTUD, unless other options are configured.
Jumbo frames utilize the same Ethernet packet structure described in the previous section. However, the difference is the size of the
data within the payload. As the byte consumption of the other components within the frame remain the same, each packet contains
more data with the same overhead. A Jumbo frame Ethernet payload is displayed in the following figure:
IP PACKET OVERHEAD
Isilon nodes utilize 10 and 40 GbE NICs for front-end networking. In order to maximize throughput on these high bandwidth links,
Jumbo frames are recommended. Standard 1500 byte and Jumbo 9000 byte packets are formed with the same
packet structure at Layer 1 with the only difference pertaining to the actual data payload size. Although the overhead is identical for
standard and Jumbo packets, the ratio of the data to the overhead varies significantly.
For every payload sent to Layer 1 on the wire, the following fields are required:
Interpacket Gap / Preamble / Start Frame Delimiter / Destination MAC / Source MAC / Type / CRC
Hence, regardless of the payload fields or size, every payload requires an additional 38 bytes to be sent. It is important to note that this
does not take into account the optional VLAN tag which requires an additional 4 bytes. The following sections provide examples of
packet overhead based on the payload fields.
If the payload headers consume 40 bytes, the data field for a standard 1500 byte payload consumes:
1500 – 40 = 1460 bytes
Therefore, a standard 1500 byte packet with IPv4 and TCP headers results in a data to Ethernet frame percentage as follows:
(Data Bytes) / (Total Ethernet Frame Bytes) = (1500 – 40) / (1500 + 38) = 1460/1538 = .949 => 94.9%
For a Jumbo 9000 byte payload, if the payload headers consume 40 bytes, the data field consumes:
9000 – 40 = 8960 bytes
Therefore, a Jumbo 9000 byte packet with IPv4 and TCP headers results in a data to Ethernet frame percentage as follows:
(Data Bytes) / (Total Ethernet Frame Bytes) = (9000 – 40) / (9000 + 38) = 8960/9038 = .991 => 99.1%
If the payload headers consume 52 bytes, for example IPv4 and TCP headers plus the 12-byte TCP timestamp option, a standard 1500 byte packet results in a data to Ethernet frame percentage as follows:
(Data Bytes) / (Total Ethernet Frame Bytes) = (1500 – 52) / (1500 + 38) = 1448/1538 = .941 => 94.1%
Likewise, a Jumbo 9000 byte packet with 52 bytes of payload headers results in a data to Ethernet frame percentage as follows:
(Data Bytes) / (Total Ethernet Frame Bytes) = (9000 – 52) / (9000 + 38) = 8948/9038 = .990 => 99.0%
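The efficiency percentages above can be generalized with a short Python sketch; the 38-byte constant is the per-frame Layer 1 overhead listed earlier, and the header sizes follow the examples above.

ETHERNET_OVERHEAD = 38  # interpacket gap, preamble, SFD, MACs, EtherType, CRC (bytes)

def frame_efficiency(mtu, header_bytes):
    """Ratio of payload data to everything transmitted on the wire for one frame."""
    return (mtu - header_bytes) / (mtu + ETHERNET_OVERHEAD)

for mtu in (1500, 9000):
    for headers in (40, 52):  # IPv4 + TCP, and IPv4 + TCP with the 12-byte timestamp option
        print(mtu, headers, round(frame_efficiency(mtu, headers) * 100, 1))
# 1500 40 94.9
# 1500 52 94.1
# 9000 40 99.1
# 9000 52 99.0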
As shown in the table above, Jumbo frames deliver between 98%-99% efficiency depending on the packet type. The efficiencies are
only maximized when all hops from the client endpoint to an Isilon node support Jumbo frames. Otherwise, packets may be fragmented
leading to additional processing overhead on devices or PMTUD finding the lowest MTU along the path. Therefore, Jumbo frames are
recommended for optimal performance. However, it is important to recognize that each workload environment is unique and measuring
performance enhancements in a lab are recommended prior to a production network update.
Most devices have a default MTU that is configurable and remains at the defined value. Isilon OneFS determines MTU size specific to
each transaction. After the initial TCP handshake, the Isilon node sends an ICMP message for Path MTU Discovery (PMTUD), RFC
1191, gathering the maximum supported MTU size. If for any reason ICMP is disabled, or PMTUD is not supported, this causes OneFS
to default the MTU size to 536 bytes, which typically leads to performance degradation.
To modify the MTU, use the isi command with the following context:
isi network subnets modify groupnet0.subnet1 --mtu=1500 --gateway=198.162.100.10 --gateway-priority=1
For example, to check if an MTU size of 8900 bytes is transmitted to an endpoint, from the OneFS CLI, use the following command:
ping -s 8900 -D <IP Address>. The '-s' option specifies the packet size, and the '-D' option sets the 'do not fragment' flag.
If the ping is successful, the MTU size is transmitted across. If the ping is unsuccessful, gradually lower the MTU size until it is
successfully transmitted. Confirm the MTU size can be transmitted from both endpoints.
OneFS is based on FreeBSD. FreeBSD also has options for gradually increasing the MTU size by performing a ‘sweeping ping’ using
the -g option. For more information on ping options in FreeBSD, access the FreeBSD manual at the following link:
https://www.freebsd.org/cgi/man.cgi?ping(8)
SYSTEM ZONE
When an Isilon cluster is first configured, the System Zone is created by default. The System Zone should only be used for
management as a best practice. In certain special cases, some protocols require the system zone, but generally speaking, all protocol
traffic should be moved to an Access Zone. If nothing else, NFS and SMB should have protocol specific Access Zones.
Moving client traffic to Access Zones ensures the System Zone is only used for management and accessed by administrators. Access
Zones provide greater security as administration, and file access is limited to a subset of the cluster, rather than the entire cluster.
Generally speaking, the best practice is to remove all data access from the default System Zone. Otherwise, this leads to complications in
the future as the cluster grows and additional teams or workflows are added. Further, as mentioned above, create a subdirectory under
the Access Zone, rather than using the root of the zone, as this makes migration and disaster recovery simpler. It is preferred not to
have an overlap of Root Based Paths unless it is required for a specific workflow. Overlap is supported in 8.0 and newer releases
through the CLI.
In the figure below, as Cluster 1 fails over to Cluster 2, the directory structure remains consistent, easily identifying where the files
originated from. This delineation also ensures clients have the same directory structure after a failover. Once the IP address is updated
in DNS, the failover is transparent to clients. As more clusters are brought together with SyncIQ, this makes it easier to manage data,
understanding where it originated from and provides seamless disaster recovery.
Root Based Paths may also be based on protocol. As an example, protocols are matched with a Root Based Path in the following table:
In the example above, each gateway has a defined priority. If SBR is not configured, the highest priority gateway that is reachable, i.e.,
the gateway with the lowest value, is used as the default route. Once SBR is enabled, when traffic arrives from a subnet that is not
reachable via the default gateway, firewall rules are added. As OneFS is FreeBSD based, these are added through ipfw. In the example
above, the following ipfw rules are provisioned:
The process of adding ipfw rules is stateless and essentially translates to per-subnet default routes. SBR is entirely dependent on the
source IP address that is sending traffic to the cluster. If a session is initiated from the source subnet, the ipfw rule is created. The
session must be initiated from the source subnet, otherwise the ipfw rule is not created. If the cluster has not received traffic that
originated from a subnet that is not reachable via the default gateway, OneFS will transmit traffic it originates through the default
gateway. Given how SBR creates per-subnet default routes, consider the following:
A subnet setting of 0.0.0.0 is not supported and is severely problematic, as OneFS does not support RIP, RARP, or CDP.
The default gateway is the path for all traffic intended for clients that are not on the local subnet and not covered by a routing
table entry. Utilizing SBR does not negate the requirement for a default gateway, as SBR in effect overrides the default
gateway, but not static routes.
Static routes are an option when the cluster originates the traffic and the route is not accessible via the default gateway. As
mentioned above, static routes are prioritized over source-based routing rules.
In certain environments, Isilon clusters with SBR enabled and multiple SmartConnect Service IP (SSIP) addresses have experienced
excessive latency with DNS responses. As mentioned previously in this paper, keeping latency minimal is imperative through any
transaction and the delayed DNS responses could impact DNS dependent workflows. The prior section explained how SBR
dynamically assigns a gateway. In this instance, the route to the DNS server is changed as the session originated on a different
interface based on the SSIP being addressed.
In order to prevent the additional latency with DNS responses, when SBR is enabled with multiple SSIP addresses, consider the
following:
If a single Access Zone is configured to have multiple SmartConnect zones and multiple subnets with SBR enabled, it is
recommended to have a single SSIP.
If multiple Access Zones are required within a single groupnet, then a single SSIP is recommended.
WHY IPV6?
IPv6 brings innovation and takes connectivity to a new level with enhanced user experiences.
SECURITY
IPv6 supports IPSEC inherently with encryption and integrity checks. Additionally, the Secure Neighbor Discovery (SEND) protocol
provides cryptographic confirmation of host identity, minimizing hostname based attacks like Address Resolution Protocol (ARP)
poisoning, leading to devices placing more trust in connections.
EFFICIENCY
IPv6’s large address space means many devices no longer require NAT translation as previously with IPv4, making routers far more
efficient. Overall data transmission is faster and simplified, as the IPv6 header no longer includes a checksum that routers must recompute at every hop.
MULTICAST
IPv6 supports multicast rather than broadcast, allowing media streams to be sent to multiple destinations simultaneously leading to
bandwidth savings.
QUALITY OF SERVICE
QoS implementation is far simplified in IPv6 with a new packet header. The IPv6 header contains a new field, Flow Label, which
identifies packets belonging to the same flow. The Flow Label associates packets sent from a specific host and headed to a particular
destination.
For display purposes, an IPv6 address may be presented without leading zeros. For example, the IPv6 address
2001:0DC8:E004:0001:0000:0000:0000:F00A could be displayed as 2001:DC8:E004:1:0:0:0:F00A.
The address may be further reduced by removing consecutive fields of zeros and replacing them with a double colon. The double colon
can only be used once in an address. The address above becomes 2001:DC8:E004:1::F00A.
Anycast: one-to-nearest – Assigned to a group of interfaces, with packets being delivered only to a single (nearest) interface
Multicast: one-to-many – Assigned to a group of interfaces, and is typically delivered across multiple hosts.
An IPv6 unicast address is composed of the Global Routing Prefix, the Subnet ID, and the Interface Identifier. The Global Routing Prefix is the network ID or prefix of the address used for routing. The Subnet ID is similar in function to the IPv4 netmask, but unlike in IPv4, it is carried within the address itself. Finally, the Interface Identifier is a unique identifier for a particular interface. For Ethernet networks, the 48-bit Ethernet MAC address may be used to form the Interface Identifier by inserting 16 additional bits, creating what is referred to as an EUI-64 address.
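As a brief worked example using a hypothetical MAC address, the modified EUI-64 Interface Identifier is built by splitting the MAC address in half, inserting the 16-bit value FFFE, and inverting the universal/local bit (the seventh bit of the first byte):

    MAC address:                  00:25:90:AB:CD:EF
    Insert FFFE in the middle:    00:25:90:FF:FE:AB:CD:EF
    Invert the U/L bit (00 -> 02): 02:25:90:FF:FE:AB:CD:EF
    Interface Identifier:         0225:90FF:FEAB:CDEF

Combined with the earlier example prefix, the resulting address would be 2001:DC8:E004:1:225:90FF:FEAB:CDEF.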
For Service Providers to deliver IPv6, they utilize translation technologies. The two major translation technologies are Network Address Translation IPv6 to IPv4 (NAT64) and Stateless IP/ICMP Translation (SIIT). NAT64 is similar to IPv4 Network Address Translation but is specific to translating between IPv6 and IPv4. SIIT translates packet headers between IPv4 and IPv6 without maintaining per-connection state.
NETSTAT
Netstat, short for network statistics, is a utility built into most Windows and Linux clients. It provides an array of statistics on current
ports, routing, IP stats for transport layer protocols, and serves as a forensic tool to link processes with network performance while
digging deeper into the current network status. Netstat bundles several actions into a single command with different options available.
As Netstat is multi-platform, the syntax is similar across platforms with slight variations.
NETSTAT
In its standard form without any arguments, netstat provides an overview of the current network status, broken out by connection or socket; an illustrative example follows the column descriptions below. Each column displays the following:
Proto: Protocol of the active connection. The protocol could be TCP or UDP and has a ‘4’ or ‘6’ associated specifying if it is
IPv4 or IPv6, respectively.
Recv-Q and Send-Q: Value of the receiving and sending queue in bytes. Non-zero values specify the number of bytes in the queue awaiting processing. The preferred value is zero. If several connections have non-zero values, this implies that something is delaying processing.
Local Address and Foreign Address: Lists the hosts and ports the sockets are connected with. Some of these are local
connections to the host itself.
State: Displays the current state of the TCP connection, based on the TCP protocol. As UDP is a stateless protocol, the ‘State’
column will be blank for UDP connections. The most common TCP states include:
o Close Wait: The remote machine has closed the connection, but the local device has not closed the connection yet.
o Time Wait: The local machine is waiting for a period of time after sending an ACK to close a connection.
For more information about the states of a TCP connection, see RFC 793.
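For example, running netstat with no arguments produces output similar to the following; the addresses, ports, and queue values shown here are purely illustrative:

    Proto  Recv-Q  Send-Q  Local Address        Foreign Address      (state)
    tcp4        0       0  10.1.1.21.2049       10.1.2.15.873        ESTABLISHED
    tcp4       38       0  10.1.1.21.445        10.1.3.44.51234      CLOSE_WAIT
    udp4        0       0  *.111                *.*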
Netstat reveals a lot of information about the current status of network connections, and it also provides a basis for a more thorough forensic analysis. While reviewing the output from netstat, some common scenarios can be generalized, such as the following:
Recv-Q has a value greater than zero but is in a ‘Close Wait’ state. This indicates that these sockets should be torn down but
are hanging. If several sockets are in this state, it could imply the application is having difficulty tearing down the connection
and may warrant additional investigation.
Connections that have localhost as the ‘Local’ and ‘Foreign’ address denote an internal process using the TCP stack to
communicate. These connections are not concerning and are standard practice.
The fields highlighted in red above must be reviewed as a percentage of the total packets transmitted and received. Additionally, these statistics should be monitored for sudden increases. As a rule of thumb, under 1% is not concerning, but this also depends on the workload. The fields highlighted above provide the following; a sketch of extracting these counters follows this list:
Retransmitted Packets: Packets that are retransmitted consume network bandwidth and could be the reason for further
investigation. However, examining the percentage is critical. In this case, 249379 out of 235829612 were retransmitted, which
is 0.105%.
Duplicate Acknowledgements: High latency between endpoints may lead to duplicate acknowledgments, but the ratio must be
examined. In this case, it is 0.419%. This number varies depending on the workload.
Out of Order Packets: Out of order packets are placed in order by TCP before presenting to the application layer, which
impacts the CPU and overall stack performance as additional effort is involved in analyzing the packets. Performance is
impacted the most when packets arrive out of order with a significant time gap, or a number of packets are out of order. The
ratio, in this case, is 0.197%, which is negligible.
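The counters discussed above come from the TCP section of ‘netstat –s’. As a hedged sketch (the exact counter wording varies between operating system releases), the relevant lines can be isolated as follows:

    # Display protocol statistics for TCP only
    netstat -s -p tcp

    # Filter for the counters discussed above
    netstat -s -p tcp | grep -E 'retransmitted|duplicate acks|out-of-order'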
The output of ‘netstat –i’ above lists the following columns:
Name: Network Interface Card (NIC) name. Loopback interfaces are listed as ‘lo0,’ and ‘ib’ specifies InfiniBand.
Ipkts: Input packets are the total number of packets received by this interface.
Ierrs: Input errors are the number of errors reported by the interface when processing the ‘Ipkts.' These errors include
malformed packets, buffer space limitation, checksum errors, errors generated by media, and resource limitation errors. Media
errors are errors specific to the physical layer, such as the NIC, connection, cabling, or switch port. Resource limitation errors
are generated at peak traffic when interface resources are exceeded by usage.
Idrop: Input drops are the number of packets that were received but not processed and consequently dropped. Dropped packets typically occur during heavy load.
Opkts: Output packets are the total number of packets transmitted by this interface.
Oerrs: Output errors are the number of errors reported by the interface when processing the ‘Opkts.' Examples of errors
include the output queue reaching limits or an issue with the host.
Coll: Collisions are the number of packet collisions that are reported. Collisions typically occur during a duplex mismatch or
during high network utilization.
In general, errors and dropped packets require closer examination. However, as noted in the previous netstat section, the percentage of errors and dropped packets is the main factor. The following are some points to consider for further analysis, with a sketch of gathering the relevant counters after this list:
‘Ierrs’ should typically be less than 1% of the total ‘Ipkts.' If greater than 1%, check ‘netstat –m’ for buffer issues and consider
increasing the receive buffers. Prior to implementing changes on a production system, buffer changes should be tested in a
lab environment. Refer to the Isilon Network Stack Tuning Section for additional details.
‘Oerrs’ should typically be less than 1% of the total ‘Opkts.’ If greater than 1%, it could be a result of network saturation,
otherwise consider increasing the send queue size.
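A minimal sketch of gathering these counters and applying the 1% rule of thumb follows; the packet and error counts used in the calculation are illustrative only:

    # Display per-interface packet and error counters
    netstat -i

    # Input error rate = Ierrs / Ipkts x 100
    # For example, 1,200 input errors against 4,500,000 input packets:
    #   1,200 / 4,500,000 x 100 = 0.027%  (well under the 1% rule of thumb)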
NETSTAT –M
The netstat –m option displays the current status of network memory requests as mbuf clusters. Netstat –m is a powerful option for a
complete forensic analysis when one of the other netstat commands mentioned above raises concern. If mbufs are exhausted, the
node cannot accept any additional network traffic.
The netstat –m output provides information regarding available and used mbufs. The area highlighted in red confirms whether any memory requests have been denied. In the example above, a quick glance at this area reveals that no requests have been denied.
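As a quick check, the denied-request counters can be isolated directly; the exact output wording varies by FreeBSD and OneFS release:

    # Display mbuf and cluster usage statistics
    netstat -m

    # Show only the lines reporting denied memory requests
    netstat -m | grep denied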
For more information on netstat options, visit the FreeBSD manual netstat page at
https://www.freebsd.org/cgi/man.cgi?query=netstat&manpath=SuSE+Linux/i386+11.3
InsightIQ gathers network errors using the output from ‘netstat –i’ on external interfaces only. The total of the ‘Ierrs’ and ‘Oerrs’ is
combined and displayed in the graph. Refer to the previous section for interpreting the output from ‘netstat –i.’
To find the exact interfaces reporting the errors, sort the graph by ‘Node,’ ‘Direction,’ and ‘Interface,’ as shown in the following figures:
From the figures above, it is concluded that the external network errors reside on the interface ‘7/10gige-1’ of Node 7, on the input or
receive side. Further analysis must be performed on this interface to conclude the root cause. Refer to the ‘netstat –i’ section in this
paper for the next troubleshooting steps.
Header: The Header provides the version of ‘dig’, the options the ‘dig’ command used, and the flags that are displayed.
Question Section: The Question Section displays the original input provided to the command. In the case above, dell.com was queried. The default is to query the DNS A record. Other options are available for querying other record types, such as MX and NS records.
Answer Section: The Answer Section is the output received by dig from the DNS server queried.
Authority Section: The Authority Section lists the available Name Servers of dell.com. They have the authority to respond to
this query.
Additional Section: The Additional Section resolves the hostnames from the Authority Section to IP addresses.
Stats Section: The footer at the end of the query is referred to as the Stats Section. It displays when the query was run, which server answered, and how long the query took.
Dig supports an array of options. The most common options include a reverse lookup using ‘dig –x [IP address]’ to find a hostname, and specifying a DNS server to query using ‘dig @[dns server] [hostname].’
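The following examples illustrate both options; the IP addresses and hostname shown are placeholders only:

    # Reverse lookup: find the hostname associated with an IP address
    dig -x 10.1.1.5

    # Query a specific DNS server (for example, a SmartConnect service IP)
    # for a SmartConnect zone name
    dig @10.1.1.10 cluster.example.com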
For the complete list of dig options, please refer to the FreeBSD manual page:
https://www.freebsd.org/cgi/man.cgi?query=dig&sektion=1&manpath=FreeBSD%209.1-RELEASE
REFERENCES
Cisco Nexus 5000 Series Configuration Guide:
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli/CLIConfigurationGuide/QoS.html#pgfId
-1138522
IEEE Standards:
https://standards.ieee.org/findstds/standard/802.1AX-2008.html
SMBv3 Multi-Channel:
https://blogs.technet.microsoft.com/josebda/2012/06/28/the-basics-of-smb-multichannel-a-feature-of-windows-server-2012-and-smb-3-
0/
RFCs:
http://www.faqs.org/rfcs/rfc1812.html
http://www.faqs.org/rfcs/rfc1122.html
http://www.faqs.org/rfcs/rfc1123.html
https://tools.ietf.org/html/rfc3971
https://tools.ietf.org/html/rfc792
https://tools.ietf.org/html/rfc3513
© 2018 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. Reference Number: H16463