
Conference Paper · January 2003
DOI: 10.1109/ICPADS.2002.1183417 · Source: IEEE Xplore



Performance Evaluation of Software RAID vs. Hardware RAID for
Parallel Virtual File System

Jenwei Hsieh, Christopher Stanton and Rizwan Ali


Dell Computer Corporation
Round Rock, TX 78682
{jenwei_hsieh, christopher_stanton, rizwan_ali}@dell.com

Abstract

Linux clusters of commodity computer systems and interconnects have become the fastest growing choice for building cost-effective high-performance parallel computing systems. The Parallel Virtual File System (PVFS) could potentially fulfill the requirements of large I/O-intensive parallel applications. It provides a high-performance parallel file system by striping file data across multiple cluster nodes, called I/O nodes. Therefore, the choice of storage devices on I/O nodes is crucial to PVFS.

In this paper, we study the impact of software RAIDs and hardware RAIDs on the performance of PVFS when they are used on I/O nodes. We first establish a baseline performance of both RAIDs in a stand-alone configuration. We then present the performance of PVFS for a workload comprising concurrent reads and writes using ROMIO MPI-IO, and for the BTIO benchmark with a noncontiguous access pattern. We found that software RAIDs have comparable performance to hardware RAIDs, except for write operations that require file synchronization.

Keywords: performance evaluation, software RAID, hardware RAID, cluster computing, parallel I/O, Parallel Virtual File System, benchmarking

I. Introduction

The Parallel Virtual File System (PVFS) [3] from Clemson University is the most popular open-source parallel file system for Linux clusters. It is a user-level, client-server implementation that utilizes TCP/IP-based socket communications and existing local file systems on cluster nodes. In order to provide high-performance access to data stored on the file system by many compute nodes, PVFS stripes file data across multiple cluster nodes, designated as I/O nodes. PVFS has been widely used as a high-performance, large parallel file system for temporary storage and as an infrastructure for parallel I/O research. In October 2000, PVFS demonstrated an aggregate I/O throughput of 1.05 GBytes/sec with 48 I/O nodes and 112 compute nodes [11]. More impressively, this throughput was achieved by aggregating a single SCSI disk drive from each I/O node.

Two design considerations need to be addressed when deploying PVFS in a Linux cluster with a commodity interconnect (Fast Ethernet or Gigabit Ethernet). First, messages between PVFS clients and I/O nodes are passed over TCP/IP, and all PVFS file system data is stored on file systems local to I/O nodes. The effective throughput that can be contributed by each I/O node is minimum(Bnetwork, Bstorage), where Bnetwork is the sustained TCP/IP bandwidth of an I/O node and Bstorage is the sustained storage bandwidth. Second, although PVFS is a stable parallel file system, it is not fault-tolerant. Failure of a disk drive on an I/O node, or failure of an I/O node itself, will cause PVFS accesses that need that I/O node to fail as well.

Until PVFS adopts more efficient communication protocols, such as GM [9] or VIA [4], a RAID of a small number of disk drives is a reasonable storage configuration for an I/O node. A small number of disk drives will not oversaturate the network interface, and the RAID will provide disk-level fault tolerance. With a small number of drives, the RAID can be implemented either using a dedicated RAID controller, called hardware RAID, or using host processors on the I/O node to perform parity calculation and data striping, called software RAID. Before deciding to use either hardware or software RAIDs for the I/O node, we need to study the performance trend of both types of RAIDs. Tables 1 and 2 show the performance of software RAID 5 and hardware RAID 5 on three generations of Dell PowerEdge servers using the Bonnie benchmark1.

Table 1: Performance of software RAID 5.

  Platform   CPU Speed   Write (MB/s)   CPU Load   Read (MB/s)   CPU Load
  PE2450     866MHz      45.1           43.7%      101.5         60.2%
  PE2550     1.4GHz      45.3           41.7%      103.9         39.4%
  PE2650     2.0GHz      57.7           32.0%      104.0         28.0%

Table 2: Performance of hardware RAID 5.

  Platform   CPU Speed   Write (MB/s)   CPU Load   Read (MB/s)   CPU Load
  PE2450     866MHz      23.5           12.5%      46.4          19.9%
  PE2550     1.4GHz      34.0           15.3%      54.9          14.1%
  PE2650     2.0GHz      42.7           12.5%      60.7          12.2%

Both tables show performance improvement from the 4th-generation server (PowerEdge 2450) to the 6th-generation server (PowerEdge 2650), with the set of drives remaining fixed.

1. Bonnie benchmark [2] with three Seagate ST318406LC drives. Performance numbers are sequential accesses on a 2GB ext2 file partition. To limit the impact from the file system cache, each server is equipped with only 512 MB of memory.

Proceedings of the Ninth International Conference on Parallel and Distributed Systems (ICPADS'02)
1521-9097/02 $17.00 © 2002 IEEE
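The parity and striping work that software RAID places on the host CPU (visible in the higher CPU-load columns of Table 1 compared with Table 2) is plain XOR arithmetic. A minimal sketch of RAID 5 striping with rotating parity — a simplified illustration, not the Linux md implementation:

```python
# Simplified RAID 5: stripe data blocks across n disks with one rotating
# parity block per stripe. Illustrative only -- real implementations
# (e.g. Linux md) add chunking, caching, and recovery logic.

def xor_blocks(blocks):
    """Parity block = byte-wise XOR of the data blocks in a stripe."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def make_stripes(data_blocks, n_disks):
    """Lay out data blocks across n_disks, one parity block per stripe.
    The parity block's position rotates from stripe to stripe so that
    no single disk holds all the parity (the RAID 5 property)."""
    k = n_disks - 1  # data blocks per stripe
    stripes = []
    for s in range(0, len(data_blocks), k):
        chunk = data_blocks[s:s + k]
        parity = xor_blocks(chunk)
        pos = (s // k) % n_disks  # rotate parity across disks
        stripe = chunk[:]
        stripe.insert(pos, parity)
        stripes.append(stripe)
    return stripes

def recover(stripe, lost_disk):
    """Any single lost block is the XOR of the surviving blocks."""
    return xor_blocks([b for i, b in enumerate(stripe) if i != lost_disk])

blocks = [bytes([v] * 4) for v in (1, 2, 3, 4, 5, 6)]
stripes = make_stripes(blocks, n_disks=3)
# Losing any one disk in a stripe is recoverable from the other two.
assert recover(stripes[0], 0) == stripes[0][0]
```

In software RAID this XOR runs on the host CPU for every write, which is why its CPU-load figures exceed those of the hardware RAID, whose controller performs the same computation on a dedicated processor.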
Overall, software RAIDs have better sequential throughput than hardware RAIDs. The sequential read throughput of software RAIDs is reaching the plateau level of three drives in a RAID 5 configuration. However, considering CPU loads, hardware RAIDs still enjoy a better ratio of throughput to CPU load. This indicates that hardware RAIDs off-load host CPUs for data transfer and RAID functionality.

In this paper, we study the impact of software and hardware RAIDs on the performance of PVFS when they are used as the storage configuration on I/O nodes. We present performance results of PVFS on one of the Linux clusters at Dell Computer Corporation's Scalable Systems Laboratory. We first establish the baseline performance of software RAID and hardware RAID on a single I/O node with the ext2 filesystem and the ext3 journaling filesystem [13]. We then present the performance of PVFS for a workload comprising concurrent reads and writes using ROMIO MPI-IO [12]. We also utilize the BTIO benchmark [10] to study PVFS performance with a noncontiguous access pattern. We found that for most of the test cases software RAIDs provide a similar level of performance as hardware RAIDs, although the former competes with network communications for the host CPU. The only exception is write operations with file synchronization. In this case, hardware RAIDs have better data transfer rates due to the cache memory incorporated on the RAID controllers.

The rest of this paper is organized as follows. The next section provides an overview of PVFS. In Section 3, we describe our experimental environment. Baseline performance of a single I/O node is presented in Section 4. In Sections 5 and 6, we present PVFS's performance and discuss the results. Section 7 concludes the paper and outlines our future studies.

II. Overview of PVFS

Like many other network file systems or parallel file systems, PVFS is implemented using a client-server architecture. It utilizes a group of collaborative user-space processes (daemons) to provide a cluster-wide consistent name space and to store data in a striped fashion across multiple nodes in the cluster. Messages between PVFS clients and servers are exchanged over TCP/IP for reliable communications. All PVFS file system data is stored on cluster nodes' local filesystems, which can be one of the partitions on a disk drive, the entire disk drive, or a logical volume of many disk drives. Figure 1 is a logical view of PVFS. It shows how cluster nodes might be assigned for use with PVFS. They are divided into three types of nodes: compute nodes, on which applications are run; a management node, which handles metadata operations; and I/O nodes, which store file data for PVFS file systems.

[Figure 1: Logical view of PVFS. Compute nodes run applications on top of kernel support, ROMIO, and libpvfs; over the network they reach I/O daemons (iod) on the I/O nodes and the metadata daemon (mgr) on the management node, each backed by local filesystems.]

A. Components of PVFS

The shaded areas in Figure 1 highlight four major components of PVFS. A single metadata server (mgr) stores metadata and controls file operations, including open, close and remove commands. I/O servers (iod) handle all data transfers, storing and retrieving file data stored on each I/O node's local filesystems. These first two components are daemons running on the management and I/O nodes.

On compute nodes, the PVFS native API (a user library of I/O calls, called libpvfs) provides user-space, low-level access to the PVFS servers. This library handles the scatter/gather operations necessary to move data between user buffers and PVFS servers, keeping these operations transparent to the user. For metadata operations, applications communicate through the library with the metadata server. For data access, the metadata server is eliminated from the access path and the I/O servers are contacted directly. PVFS also provides Linux kernel support that allows PVFS file systems to be mounted in the same manner as an NFS or local filesystem on compute nodes. This allows existing programs to access PVFS files without any modification.

B. Interfaces of PVFS

As shown in Figure 1, there are three interfaces through which PVFS may be accessed.

• PVFS native API (via libpvfs): a UNIX-like interface for accessing PVFS files. It allows applications to specify how files will be striped across the I/O nodes in the PVFS system.
• Linux kernel interface: it allows a PVFS file system to be merged into a compute node's local directory hierarchy. Through this interface existing applications or common utilities may manipulate data on PVFS file systems.
• ROMIO MPI-IO interface: ROMIO implements the MPI-2 [8] I/O calls in a portable library. This allows parallel programs using MPI to access PVFS files through the MPI-IO interface.

In this paper, both benchmark programs we used are based on the ROMIO MPI-IO interface.

III. Experimental Environment

The testing environment consists of a total of 24 rack-optimized Dell PowerEdge 1650 servers, 16 of which are used as compute nodes and eight as I/O nodes. One of the I/O nodes is also designated as the metadata server. All of the cluster nodes contain two Intel Pentium III processors running at 1.4 GHz with 512KB of L2 cache, 2GB of main memory, and two
integrated Gigabit Ethernet interfaces. Each compute node has an 18 GB SCSI system disk. Each I/O node has three 18 GB Hitachi DK32DJ-18MC SCSI disks in either a hardware RAID 5 configuration (via a PowerEdge Expandable RAID Controller, PERC 3/Di) or a software RAID 5 configuration (via an embedded Adaptec AIC 7899 SCSI controller). Cluster nodes are interconnected with a Foundry Networks FastIron II Gigabit Ethernet switch.

On the software side, the benchmark programs use the ROMIO implementation of MPI-IO in MPICH [5] version 1.2.4. PVFS version 1.5.4 is built on Red Hat Linux 7.2, which is the operating system running on all cluster nodes.

IV. Baseline Performance of Hardware and Software RAID 5

We first used Bonnie [2] to establish a baseline performance of an I/O node. Bonnie is a widely used benchmark which measures the performance of Unix file system operations. It provides a good indicator of the performance characteristics of sequential operations. We tested Bonnie on one I/O node with both software RAID 5 and hardware RAID 5. To limit the impact from the file system cache, we reduced the system memory to 512MB and ran Bonnie against a 2GB file. To ensure consistent results, we also used another benchmark, the IOzone filesystem benchmark [6], for cross-reference. Bonnie's measurements are summarized in Table 3.

Table 3: Baseline performance of an I/O node.

  RAID 5     File System   Write (MB/s)   CPU Load   Read (MB/s)   CPU Load
  Software   Ext2          50.5           45.7%      89.6          30.8%
  Software   Ext3          27.7           52.9%      88.2          34.8%
  Hardware   Ext2          51.6           40.5%      36.5          12.3%
  Hardware   Ext3          35.3           31.5%      39.3          13.2%

A. Software RAID 5 vs. Hardware RAID 5

Table 3 shows different performance characteristics than Tables 1 and 2. Previously, software RAID 5 had better throughput than hardware RAID 5 for both write and read operations. In this case hardware RAID has better throughput for write operations. For both ext2 and ext3 filesystems, hardware RAID has higher write throughput than software RAID, while enjoying a lower CPU load. The difference is due to the performance characteristics of the Hitachi DK32DJ-18MC disks; in the previous study of the performance trend of RAIDs, we used Seagate ST318406LC disks.

Similar to the previous study, hardware RAIDs also have a better ratio of throughput to CPU load. With the ext2 filesystem, the ratios for hardware RAID on write and read operations are 1.27 and 2.97, while the ratios for software RAID are 1.11 and 2.91.

B. Ext2 vs. Ext3 Filesystem

The ext3 filesystem [13] is a journaling extension to the standard ext2 filesystem on Linux. Journaling results in massively reduced time spent recovering a filesystem after a crash, and is therefore in high demand in environments where high availability is important. In fact, ext3 is an ext2 filesystem with a journal file. The journaling capability means the user does not have to wait for a long consistency check or worry about metadata corruption after a crash.

In the baseline performance evaluation, we also study the penalty of using the ext3 filesystem. For sequential read operations, there is no significant difference between ext2 and ext3; the hardware RAID even demonstrates slightly better read throughput on ext3. On sequential write operations, there is a significant performance penalty with ext3. The software RAID's write throughput decreases by 45% from ext2 to ext3; the hardware RAID experiences a performance reduction of 32%. The performance penalty comes from the journaling mechanism: every modification made to the filesystem is written to the journaling log first, and only once it is committed to the log is the modification allowed to update the main copy on disk.

V. Performance Results of ROMIO perf

The ROMIO source code includes an example MPI-IO test program called perf. It performs concurrent read and write operations to the same file. In this program, each MPI process has a fixed-size data array, 4 MB by default, which is written using MPI_File_write() and read using MPI_File_read() to and from disjoint regions of the file. All MPI processes synchronize (using MPI_Barrier()) before each I/O operation.

The perf program measures four types of concurrent I/O operations:

• Write operations without file synchronization,
• Read operations without file synchronization,
• Write operations with file synchronization (using MPI_File_sync()), and
• Read operations after file synchronization.

These tests provide an upper bound on the MPI-IO performance that can be expected from a given set of I/O nodes and file system. In this section, we run the perf program with up to 16 compute nodes against the PVFS file system constructed from eight I/O nodes.

Figure 2 has three charts that show the aggregate bandwidth of write operations without file synchronization reported by perf, with each MPI process using array sizes (access sizes) of 4MB, 16MB, and 64MB. These charts present the aggregate bandwidth of four configurations (hardware RAID 5 with ext2 and ext3 filesystems, software RAID 5 with ext2 and ext3 filesystems) with increasing numbers of processes. They show that the peak performance increases with larger access sizes.

There is no difference in performance with the 4MB access size. As the access size increases, the charts start to show performance differences among the four configurations. Hardware RAID 5 with the ext2 filesystem has the best performance, with a peak of 766 MB/sec at the 64MB access size. Software RAID 5 with the ext3 filesystem has the worst performance, especially at the 64MB access size.
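The disjoint-region layout used by perf can be sketched without MPI: each of P processes writes its fixed-size array at a file offset proportional to its rank, so writes never overlap. This is a schematic reconstruction of the access pattern described above, not ROMIO's perf source:

```python
# Sketch of perf's access pattern: each MPI process writes a fixed-size
# array to a disjoint file region at offset = rank * array_size.
# (Schematic reconstruction; the real perf uses MPI_File_write() and
# MPI_File_read(), with an MPI_Barrier() before each I/O phase.)

ARRAY_SIZE = 4 * 1024 * 1024  # 4 MB default access size

def region(rank, array_size=ARRAY_SIZE):
    """File region [start, end) accessed by the process with this rank."""
    return rank * array_size, (rank + 1) * array_size

def regions_disjoint(n_procs, array_size=ARRAY_SIZE):
    """Check that no two processes' regions overlap."""
    spans = [region(r, array_size) for r in range(n_procs)]
    return all(a_end <= b_start
               for (_, a_end), (b_start, _) in zip(spans, spans[1:]))

# Rank 1 with the default access size writes bytes [4 MB, 8 MB).
assert region(1) == (4 * 1024 * 1024, 8 * 1024 * 1024)
# With 16 processes and 64 MB arrays, the file covers 1 GB of disjoint space.
assert regions_disjoint(16, 64 * 1024 * 1024)
```

Because the regions are disjoint, the aggregate bandwidth reported by perf reflects pure striped-I/O throughput with no write conflicts between processes.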
[Figure 2: ROMIO perf measurement on write operations without file synchronization. Three panels plot aggregate bandwidth (MB/sec, 0-800) against the number of MPI processes (1-16) for 4MB, 16MB, and 64MB access sizes; each panel compares hardware RAID 5 and software RAID 5 with ext2 and ext3 filesystems.]

[Figure 3: ROMIO perf measurement on read operations without file synchronization. Same panel layout, axes, and configurations as Figure 2.]
[Figure 4: ROMIO perf measurement on write operations including file synchronization. Three panels plot aggregate bandwidth (MB/sec, 0-300) against the number of MPI processes (1-16) for 4MB, 16MB, and 64MB access sizes, for the same four RAID/filesystem configurations.]

[Figure 5: ROMIO perf measurement on read operations after file synchronization. Three panels plot aggregate bandwidth (MB/sec, 0-800) against the number of MPI processes (1-16) for the same access sizes and configurations.]
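The peaks in Figures 2 and 3 can be checked against the effective-throughput model from the Introduction, where each I/O node contributes minimum(Bnetwork, Bstorage). A back-of-the-envelope sketch — the per-node bandwidths are illustrative assumptions taken from Gigabit Ethernet's roughly 125 MB/s wire rate and the Table 3 baselines, not additional measurements:

```python
# Upper-bound model from the Introduction: each I/O node contributes
# min(B_network, B_storage), so the aggregate over n nodes is
# n * min(B_network, B_storage). Assumed inputs: ~125 MB/s Gigabit
# Ethernet wire rate (sustained TCP/IP throughput is somewhat lower,
# especially without jumbo frames) and Table 3 sequential bandwidths.

def pvfs_upper_bound(n_io_nodes, b_network, b_storage):
    """Aggregate PVFS bandwidth bound in MB/s."""
    return n_io_nodes * min(b_network, b_storage)

GIGE_MBPS = 125.0      # 1 Gbit/s expressed in MB/s, before protocol overhead
READ_SW_EXT2 = 89.6    # software RAID 5 ext2 sequential read, Table 3

bound = pvfs_upper_bound(8, GIGE_MBPS, READ_SW_EXT2)
# 8 nodes * 89.6 MB/s storage bound = 716.8 MB/s. The measured read peak
# of 842 MB/sec exceeds this disk-only bound, suggesting some reads were
# served from the I/O nodes' buffer caches.
assert abs(bound - 716.8) < 1e-6
```

The model is only an upper bound: TCP/IP overhead, striping imbalance, and filesystem behavior all pull the achievable aggregate below it, while caching on the I/O nodes can push observed read figures above the storage term.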
Figure 3 shows three charts of the aggregate bandwidth of read operations without file synchronization. All four configurations reach similar peak performance with 16 MPI processes. The best case has a peak of 842 MB/sec. Comparing Figure 3 to Figure 2, we also found that read operations have better bandwidth than write operations. Overall, we are pleased with the performance of PVFS. The aggregate bandwidths of both read and write operations indicate a high degree of utilization of the Gigabit Ethernet network infrastructure.

Figure 4 shows three charts of the aggregate bandwidth of write operations with file synchronization. The measurement includes the time required for a call to the MPI_File_sync() routine after the write operation. The MPI_File_sync() routine forces the updates to a file to be propagated to the storage device before it returns. We found that hardware RAIDs have significantly better performance than software RAIDs: around four to five times better. For example, with the 16MB access size, hardware RAID 5 with ext2 has a peak of 293 MB/sec, while software RAID 5 with ext2 has a peak of 68 MB/sec. This great disparity in write-with-synchronization performance is related to the RAID implementation. The hardware RAID on each I/O node has a dedicated I/O processor and 128MB of cache memory. The RAID controller presents itself to the operating system as a storage device, so the MPI_File_sync() call returns once file data has been written to the controller's cache memory. For software RAID, the operating system knows which disks constitute the RAID, so the MPI_File_sync() call will not return until file data has been written directly to the disks.

Compared to Figure 2, write-with-synchronization operations have substantially lower bandwidth. In a similar study at the Ohio Supercomputer Center [1] with hardware RAIDs, it takes large access sizes (up to 1GB per MPI process) and large numbers of compute nodes for write-with-synchronization operations to achieve around 60% of the peak of write-without-synchronization ones.

Figure 5 shows charts of the aggregate bandwidth of read operations after file synchronization. There is no significant performance difference among the four storage configurations. The read-after-file-synchronization operations have performance curves similar to those of Figure 3, read without file synchronization.

VI. Performance Results of NASA BTIO Benchmark

The BTIO benchmarks [10] are variations on the widely used BT program from the NAS Parallel Benchmark (NPB) suite. Both BTIO and NPB were developed at NASA Ames Research Center. The BT program is a simulated computational fluid dynamics (CFD) application that solves systems of equations resulting from an approximately factored implicit finite-difference discretization of the Navier-Stokes equations. The BTIO benchmarks add three different methods of writing solution files to disk in parallel at regular predetermined intervals. They are intended to test the ability of the system to simultaneously support significant parallel computation and I/O.

The three writing methods for BTIO are Fortran direct unformatted I/O, MPI-IO using MPI_File_write_at() (the "simple" MPI-IO version), and MPI-IO using MPI_File_write_at_all() collective I/O (the "full" MPI-IO version). The access pattern in BTIO is noncontiguous in memory and in the file. We used the "full" MPI-IO version of BTIO, which utilizes collective I/O and MPI derived data types to describe the noncontiguity in memory and file. The ROMIO implementation also has optimizations for this type of request.

[Figure 6: BTIO benchmark results for Class A problem size. I/O bandwidth (MB/sec, 0-100) against the number of MPI processes (4-16) for the four RAID/filesystem configurations.]

Figure 6 shows the performance of the "full" MPI-IO version of BTIO using the Class A problem size (64³). We used 4, 9 and 16 compute nodes since BTIO requires a square number of processors. The program performs 200 iterations and writes the solution matrix every 5 iterations (time steps). The total data transferred is 408 MBytes. Note that BTIO only performs write operations. The maximum performance was reached with 9 compute nodes for all four RAID configurations. Hardware RAID 5 with the ext2 filesystem has the best peak at 93 MB/sec.

[Figure 7: BTIO benchmark results with Class B problem size. Same axes and configurations as Figure 6.]
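The BTIO data volumes follow from the problem sizes: 200 iterations with a write every 5 time steps gives 40 dumps of the 5-variable solution grid in double precision. A sketch of the arithmetic — the benchmark's reported totals differ slightly from this raw count, presumably due to record layout:

```python
# BTIO writes the solution grid (5 double-precision variables per grid
# point) every 5 of 200 time steps, i.e. 40 solution dumps per run.
# Sketch of the data-volume arithmetic; the figures BTIO reports
# (408 MBytes for Class A, 1.6 GBytes for Class B) are close to but not
# identical to this raw payload count.

def btio_bytes(grid, n_vars=5, bytes_per_var=8, iterations=200, write_every=5):
    """Raw solution-data volume for one BTIO run on a grid^3 problem."""
    writes = iterations // write_every          # 40 solution dumps
    per_write = grid ** 3 * n_vars * bytes_per_var
    return writes * per_write

class_a = btio_bytes(64)    # Class A: 64^3 grid
class_b = btio_bytes(102)   # Class B: 102^3 grid
assert class_a == 40 * 64 ** 3 * 5 * 8   # ~419 MB raw, near the reported 408 MBytes
assert round(class_b / 1e9, 1) == 1.7    # ~1.7e9 bytes raw, near the reported 1.6 GBytes
```

The roughly fourfold larger Class B volume explains why its results, discussed next, stress the storage path harder and keep scaling out to the largest node count tested.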
The performance of BTIO with the Class B problem size (102³), shown in Figure 7, exhibits a different trend. The maximum performance was reached with the maximum number of compute nodes we used. With a total data transfer size of 1.6 GBytes, hardware RAID 5 with the ext2 filesystem has the best performance at 99 MB/sec. Software RAID 5 with ext2 is slightly behind. Both ext3 filesystems have lower performance, with software RAID 5 on ext3 experiencing the lowest.

VII. Conclusions and Future Works

From surveying the architectural changes of three generations of standard high volume (SHV) servers, we found there is an increasing gap in clock frequency between host CPUs and the dedicated I/O controllers used for hardware RAIDs: host CPUs have increased in speed faster than dedicated I/O controllers. The direct impact on the performance of software and hardware RAID 5 has been shown in Tables 1 and 2 (and partially in Table 3 for sequential read). In a stand-alone environment, software RAID 5 demonstrates higher performance in sequential accesses than hardware RAID 5.

In a PVFS environment, I/O nodes are responsible for storing and retrieving file data and for transferring or receiving it over a heavy-weight communication protocol, TCP/IP. Our measurement results show that a PVFS file system constructed from hardware RAIDs provides better performance than one constructed from software RAIDs. Nonetheless, for most of the test cases software RAIDs have comparable performance to hardware RAIDs, especially for sequential read operations. The only exception is write operations with file synchronization, where MPI-IO calls will not return until file data has been committed to the storage devices.

Overall, we are encouraged by PVFS's performance with either software RAIDs or hardware RAIDs. The results show a high degree of utilization of our Gigabit Ethernet infrastructure.² We have future plans to investigate the performance of PVFS in the same cluster with a faster interconnect, such as Myrinet. The objective is to compare software RAIDs and hardware RAIDs in a PVFS environment where the sustained bandwidth of the network is higher than that of the storage. More applications with parallel I/O requirements should also be included in future studies.

VIII. Acknowledgement

The authors would like to express their appreciation to colleagues in the Scalable Systems Group at Dell Computer Corporation for their support of this project. Among them, Monica Kashyap started the initial prototyping effort on PVFS, Frank E. Elizondo assisted with project coordination, and Dr. Victor Mashayekhi and Dr. Reza Rooholamini provided tremendous support from the management side. They are best friends to experimental computer scientists. We would also like to thank Troy Baer at the Ohio Supercomputer Center and Philip Carns at Clemson University for their technical assistance during the course of benchmarking. The feedback and suggestions from Megan Riggs and other reviewers have greatly improved the quality of the presentation of this paper.

IX. References

[1] T. Baer, "Parallel I/O Experiences on an SGI 750 Cluster", Proceedings of the 2002 CUG Summit, Manchester, UK, 2002.
[2] T. Bray, Bonnie Benchmark, http://www.textuality.com/bonnie/, 2000.
[3] P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.
[4] Compaq, Intel, and Microsoft, "Virtual Interface Architecture Interface Specification, Version 1.0", December 1997, http://www.viarch.org.
[5] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A High-Performance, Portable Implementation of the MPI Message-Passing Interface Standard", Parallel Computing, 22(6):789-828, September 1996.
[6] IOzone Filesystem Benchmark, http://www.iozone.org/, 2002.
[7] Message Passing Interface Forum, "MPI: A Message-Passing Interface Standard", International Journal of Supercomputer Applications, 8(3/4):165-414, 1994.
[8] Message Passing Interface Forum, "MPI-2: Extensions to the Message-Passing Interface", July 1997, http://www.mpi-forum.org/docs/docs.html.
[9] Myricom, Inc., "The GM Message Passing System", 1999, http://www.myri.com.
[10] NASA Ames Research Center, NAS Application I/O (BTIO) Benchmark, 1996.
[11] R. B. Ross, "Providing Parallel I/O on Linux Clusters", Second Annual Linux Storage Management Workshop, Miami, FL, October 2000.
[12] R. Thakur, W. Gropp, and E. Lusk, "On Implementing MPI-IO Portably and with High Performance", Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, May 1999, pp. 23-32.
[13] S. Tweedie, "EXT3, Journaling Filesystem", Ottawa Linux Symposium, July 2000.

2. Note that our environment does not have support for jumbo frames. Jumbo frames are commonly used on Gigabit Ethernet for improving bandwidth.