
Building Clusters With FreeBSD

Brooks Davis
The Aerospace Corporation
<brooks@{aero,freebsd}.org>
March 8, 2007
http://people.freebsd.org/~brooks/pubs/asiabsdcon2007/

© 2006-2007 The Aerospace Corporation


Tutorial Outline
● Introductions
● Overview of Fellowship
● Cluster Architecture Issues
● Operational Issues
● Thoughts on a Second Cluster
● FreeBSD specifics
Introductions
● Name
● Affiliation
● Interest in clusters
Overview of Fellowship
Overview of Fellowship
● The Aerospace Corporation's corporate,
unclassified computing cluster
● Designed to be a general purpose cluster
– Run a wide variety of applications
– Growth over time
– Remote access for maintainability
● Gaining experience with clusters was a goal
● In production since 2001
● >100 users
Overview of Fellowship
Software
● FreeBSD 6.1
● Sun Grid Engine (SGE) scheduler
● Ganglia cluster monitor
● Nagios network monitor
Overview of Fellowship
Hardware
● 353 dual-processor nodes
– 64 Intel Xeon nodes
– 289 Opteron nodes (152 dual-core)
● 3 TB shared (NFS) disk
● >60TB total storage
● 700GB RAM
● Gigabit Ethernet
– 32 nodes also have 2Gbps Myrinet
Overview of Fellowship
Facilities
● ~80 kVA power draw
– Average US house can draw 40 kVA max
● 273 kBTU/hr ≈ 22 tons of refrigeration (worked out below)
● 600 sq ft floor space
– Excluding HVAC and power distribution
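As a sanity check on those figures (assuming a power factor close to 1, so 80 kVA ≈ 80 kW):

$$
80\,\mathrm{kW} \times 3412\,\tfrac{\mathrm{BTU/hr}}{\mathrm{kW}} \approx 273\,\mathrm{kBTU/hr},
\qquad
\frac{273\,\mathrm{kBTU/hr}}{12\,\mathrm{kBTU/hr\ per\ ton}} \approx 22.7\ \mathrm{tons}
$$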
Overview of Fellowship
Network Topology
[Network diagram: the core systems (fellowship, frodo, gamgee, arwen, elrond, moria) and the node racks (r01n01, r01n02, r01n03, ..., r02n01, ..., r03n01, ...) hang off a Cisco Catalyst 6509 on the private 10.5.0.0/16 network, which uplinks to the Aerospace network.]
Cluster Architecture Issues
● Architecture matters
– Mistakes are compounded when you buy
hundreds of machines
● Have a requirements process
– What are your goals?
– What can you afford?
● Upfront
● Ongoing
Cluster Architecture Issues
● Operating System
● Processor Architecture
● Network Interconnect
● Storage
● Form Factor
● Facilities
● Scheduler
Cluster Architecture Issues
Slide Format
● Trade offs and Considerations
– The trade space and other things to consider
● Options
– Concrete options
● What we did on Fellowship
● How it worked out
Operating System
Trade offs and Considerations
● Cost: Licensing, Support
● Performance: Overhead, Driver quality
● Hardware Support: Processor, Network, Storage
● Administration: Upgrade/patch process, software installation and management
● Staff experience: software porting, debugging, modification, scripting
Operating System
Options
● Linux
– General purpose distros: Debian, Fedora, Red Hat, SuSE, Ubuntu, etc.
– Cluster kits: Rocks, OSCAR
– Vendor specific: Scyld
● BSD: FreeBSD, NetBSD, OpenBSD
● MacOS/Darwin
● Commercial Unix: Solaris, AIX, HPUX, Tru64
● Windows
Operating System
What we did on Fellowship
● FreeBSD
– Started with 4.x
– Moved to 6.x
How it worked
● Netboot works well
● Linux emulation supports commercial code (Mathematica, Matlab)
● No system scope threads in 4.x (fixed in 5.x)
● Had to port SGE, Ganglia, OpenMPI
● No parallel debugger
Processor Architecture
Trade offs and Considerations
● Cost
● Power consumption
● Heat production
● Performance: Integer, floating point, cache size and latency, memory bandwidth and latency, addressable memory
● Software Support: Operating system, hardware drivers, applications (libraries), development tools
Processor Architecture
Options
● IA32 (i386): AMD, Intel, Transmeta, Via

● AMD64 (EM64T): AMD, Intel

● IA64 (Itanium)

● SPARC

● PowerPC

● Power

● Alpha

● MIPS

● ARM
Processor Architecture
What we did on Fellowship
● Intel Pentium III's for the first 86
● Intel Xeons for the next 76
● AMD Opterons for the most recent purchases (169)
● Retired Pentium III's this year
How it worked
● Pentium III's gave good service
● Xeons and Opterons performing well
● Considering 64-bit mode for the future
● Looking at Intel Woodcrest CPUs

Network Interconnects
Trade offs and Considerations
● Cost: NIC, cable, switch ports
● Performance: throughput, latency
● Form factor: cable management and termination
● Standardization: commodity vs proprietary
● Available switches: size, inter-switch links
● Separation of different types of traffic

Network Interconnects
Options
● 10/100 Ethernet

● Gigabit Ethernet

● 10 Gigabit Ethernet: fast

● Infiniband: fast, low latency

● 10 Gb Myrinet: fast, low latency

● Others: Dolphin, Fiber Channel


Network Interconnects
What we did on Fellowship
● Gigabit Ethernet
● One rack of 2Gbps Myrinet nodes
How it worked
● Gigabit Ethernet is now the default option for clusters
● Fast enough for most of our applications
● Some applications would like lower latency
● Looking at 10GbE and 10Gb Myrinet

Storage
Trade offs and Considerations
● Cost

● Capacity

● Throughput

● Latency

● Locality

● Scalability

● Manageability

● Redundancy
Storage
Options
● Local Disk
● Protocol Based Network Storage: host or NAS appliance based
● Storage Area Network
● Clustered Storage
Storage
What we did on Fellowship
● Host based NFS for home directories, node roots, and some software
● Local disks for scratch and swap
● Moved home directories to a NetApp in 2005
How it worked
● NFS is scaling fine so far
● Enhanced Warner Losh's diskprep script to keep disk layouts up to date
● Users keep filling the local disks
● Disk failures are a problem

Form Factor
Trade offs and Considerations
● Cost

● Maximum performance

● Maintainability

● Cooling

● Peripheral options

● Volume (floor space)

● Looks
Form Factor
Options
● PCs on shelves

● Rackmount system

– Cabinets
– 4-post racks
– 2-post racks
● Blades
Form Factor
What we did on Fellowship
● 1U nodes in 2-post racks
● Core equipment in short 4-post racks
● 6 inch wide vertical cable management with direct runs from the switch in the first row
● Moved to 10 inch wide vertical management in the second row and patch panels in both rows
● Now installing new core equipment in cabinets
Form Factor
Form Factor
How it worked
● Node racks are accessible and fairly clean looking
● Patch panels, 10 inch cable management, and some custom cable lengths helped
● Short 4-post racks didn't work well for real servers
● Watch out for heavy equipment!
Facilities
Trade offs and Considerations
● Cost: space, equipment, installation

● Construction time

● Reliability
Facilities
Options
● Plug it in and hope

● Convert a space (office, store room, etc)

● Build or acquire a real machine room

● Use an old mainframe room


Facilities
Facilities
What we did on Fellowship
● Built the cluster in our existing 15,000 sq ft underground machine room
– 500 kVA building UPS and two layers of backup generators
● New UPS and power distribution units (PDUs) being installed for expansion
Facilities
How it worked
● Good space with plenty of cooling
● Power was initially adequate, but is becoming limited
– Adding a new UPS and PDUs
● Cooling issues with new UPS
● Remote access means we don't have to spend much time there
Scheduling Scheme
Trade offs and Considerations
● Cost

● Efficiency

● Support of policies

● Fit to job mix

● User expectations
Scheduling Scheme
Options
● No scheduler

● Custom or application specific scheduler

● Batch job system

● Time sharing
Scheduling Scheme
What we did on Fellowship
● None initially
● Tried OpenPBS (not stable 4 years ago, no experience since)
● Ported Sun Grid Engine (SGE) 5.3 with help from Ron Chen
● Switched to SGE 6 and mandated use in January (a minimal job script is sketched below)
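To make the batch model concrete, here is a minimal sketch of what an SGE job script can look like; it is written in Python purely for illustration, and the job name and paths are made up. Depending on the queue's shell_start_mode, you may also need "#$ -S" to select the interpreter.

```python
#!/usr/bin/env python
# Minimal SGE job script sketch (job name is a made-up example).
# Submit with:  qsub hello_job.py
# Note: depending on the queue's shell_start_mode, add a
# "#$ -S /path/to/python" line so SGE runs this with Python.
#$ -N hello_job        # job name shown by qstat
#$ -cwd                # run in the directory the job was submitted from
#$ -j y                # merge stdout and stderr into one output file

import os
import socket

# SGE exports JOB_ID (among other variables) into the job environment.
print("job %s running on %s" % (os.environ.get("JOB_ID", "unknown"),
                                socket.gethostname()))
```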
Scheduling Scheme
How it worked
● Voluntary adoption was poor
● Forced adoption has gone well
● Users have preconceived notions of computers that don't fit reality with batch schedulers
● We have modified SGE to add features missing from the FreeBSD port with good success
Operational Issues
● Building, Refresh and Upgrade Cycle
● User Configuration Management
● System Configuration management
● Monitoring
● Inventory Management
● Disaster Recovery
Initial Build, Refresh and Major
Upgrade Cycle
Trade offs and Considerations
● Startup cost

● Ongoing cost

● Homogeneity vs Heterogeneity

● Gradual migration vs abrupt transitions


Initial Build, Refresh and Major
Upgrade Cycle
Options
● Build

– Build all at once


– Gradual buildup
● Refresh
– Build a new cluster before retirement
– Build a new cluster in the same location
– Replace parts over time
● Upgrades
– Upgrade everything at once
– Partition and gradually upgrade
– Never upgrade
Initial Build, Refresh and Major
Upgrade Cycle
What we did on Fellowship
● Build

– Gradual buildup of nodes


– Periodic purchase of new core systems for
expansion and replacement
● Refresh
– Replaced PIII's this year
– Xeons to be replaced next year if we don't
expand to a third row
● Upgrades
– Minor OS upgrades in place
– FreeBSD and SGE 6 by partitioning
Initial Build, Refresh and Major
Upgrade Cycle
How it worked
● Build
– Most of our apps don't care
– Different machines had different exposed serial
ports which caused a problem for serial
consoles
● Refresh
– Rapid failures of Pentium III's were unexpected
● Major Upgrades
– Partitioning allowed a gradual transition
– New machines offered incentive to move
– Node locked SSH keys and licenses caused
problems
System Testing
Trade offs and Considerations
● Need to validate system stability and performance
– LLNL says: “bad performance is a bug”
● “Bad batches” of hardware happen
● Lots of hardware means the unlikely is much more common
System Testing
Options
● Leave it to the vendor

● Have a burn-in period

– No user access
– Limited user access
● Periodic testing
System Testing
What we did on Fellowship
● Vendor burn in

– Increasingly strict requirements to ship


● Let users decide where to run (prior to
mandatory scheduling)
● Scheduler group of nodes needing testing
● Working on building up a set of performance
and stress tests
System Testing
How it worked
● Ad hoc testing means problems too often come as surprises
● Users find too many hardware issues before we do
● The scheduler group of nodes is easy to administer
System Configuration
Management
Trade offs and Considerations
● Network Scalability

● Administrator Scalability

● Packages vs custom builds

● Upgrading system images vs new, clean images
System Configuration
Management
Options
● Maintaining individual nodes

● Push images to nodes

● Network booted with shared images

– Read only
– Copy-on-write
System Configuration
Management
What we did on Fellowship
● PXE boot node images with automatic
formatting of local disks for swap and
scratch
● Upgraded copies of the image in 4.x

● Building new images for each upgrade in 6.x

How it worked
● Great overall
● A package build system to help keep frontend and nodes in sync would be nice
● Network bottleneck does not appear to be a problem at this point
User Configuration
Management
Trade offs and Considerations
● Maintainability

● User freedom and comfort

● Number of supported shells


User Configuration
Management
Options
● Make users handle it

● Use /etc/skel to provide defaults and have users do updates
● Use a centrally located file that users source

● Don't let users do anything


User Configuration
Management
What we did on Fellowship
● /etc/skel defaults plus user updates to start
● Added a central script recently
– This script uses an sh script and some wrapper scripts to work with both sh and csh style shells
● Planning a manual update
How it worked
● Bumpy, but improving with the central script
Monitoring
Trade offs and Considerations
● Cost

● Functionality

● Flexibility

● Status vs alarms
Monitoring
Options
● Cluster management systems
● Commercial network management systems: Tivoli, OpenView
● Open Source system monitoring packages: Big Sister, Ganglia, Nagios
● Most schedulers
● SNMP
Monitoring
What we did on Fellowship
● Ganglia early on

● Added Nagios recently

● SGE

How it worked
● Ganglia provides very user friendly output

– Rewrote most of FreeBSD support


● Nagios working well
● Finding SGE increasingly useful
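Ganglia's data is also easy to get at programmatically: gmond answers any TCP connection on its XML port (8649 by default) with a dump of cluster state. A small sketch, assuming a hypothetical frontend host name and the default port:

```python
#!/usr/bin/env python
# Sketch: pull the cluster-state XML that gmond serves on its default
# TCP port (8649) and count the hosts it reports.  The host name is a
# hypothetical placeholder; adjust for your gmond configuration.

import socket
import xml.dom.minidom

GMOND_HOST = "fellowship"   # hypothetical frontend running gmond
GMOND_PORT = 8649           # gmond's default xml_port

def fetch_gmond_xml(host, port):
    # gmond dumps its full XML state to any client that connects.
    sock = socket.create_connection((host, port))
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks)

doc = xml.dom.minidom.parseString(fetch_gmond_xml(GMOND_HOST, GMOND_PORT))
hosts = doc.getElementsByTagName("HOST")
print("gmond reports %d hosts" % len(hosts))
for h in hosts[:5]:
    print("  %s" % h.getAttribute("NAME"))
```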
Disaster Recovery
Trade offs and Considerations
● Cost up front

● Cost of recovery

● Time to recovery

● From what type of disaster

– Hardware failure
– Loss of building
– Data contamination/infection/hacking
Disaster Recovery
Options
● Do nothing

● Local backups

● Off site backups

● Geographically redundant clusters

– Transparent access to multiple clusters


Disaster Recovery
What we did on Fellowship
● Local backups (Bacula, formerly AMANDA)

● Working toward off site backups

How it worked
● No disasters yet

● Local backups are inadequate

● Looking at a second cluster

● Investigating transparent resource discovery and access
Other Issues
● Virtualization
● System Naming and Addressing
● User Access
● Administrator Access
● User Training and Support
● Inventory Management
Thoughts on a Second Cluster
● We are planning to build a second, similar
cluster on the east coast
● Looking at blades for density and
maintenance
● Interested in higher speed, lower latency
interconnects for applications which can use
them
● Considering a completely diskless approach
with clustered storage to improve
maintainability and scalability
FreeBSD Specifics
● Diskless booting
– Image creation
– Disk initialization
● Using ports on a Cluster
● Ganglia demo
● SGE installation and configuration demo
Diskless Booting:
Image Creation
● Hacked copy of nanobsd Makefile
– Removed flash image support
– Added ability to create extra directories for use
as mount points
– Build a list of ports into the image directory via chroot
● Ports directory created with portsnap
● Ports are built using portinstall in a chroot
● Mount linprocfs before every chroot and unmount it
afterward
● Distfile pre-staging is supported for non-redistributable
distfiles and faster rebuilds
● Packages are also supported
● DESTDIR support in ports will eventually
make this obsolete
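The port-build step above amounts to a loop over the port list: mount linprocfs inside the image, run portinstall in a chroot, then unmount. A rough sketch of that loop, not the actual nanobsd-derived Makefile, with a hypothetical image path and port list:

```python
#!/usr/bin/env python
# Rough sketch of the port-build step for a diskless image: mount
# linprocfs inside the image, run portinstall in a chroot, then clean
# up.  IMAGE and PORTS are hypothetical placeholders.

import subprocess

IMAGE = "/scratch/images/node"                 # hypothetical image root
PORTS = ["sysutils/ganglia-monitor-core",      # example port list
         "sysutils/sge"]

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

def install_port(image, port):
    # linprocfs is needed for ports that use the Linux emulation layer.
    linproc = image + "/compat/linux/proc"
    run(["mount", "-t", "linprocfs", "linprocfs", linproc])
    try:
        run(["chroot", image, "portinstall", port])
    finally:
        run(["umount", linproc])

for port in PORTS:
    install_port(IMAGE, port)
```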
Diskless Booting:
Image Creation
TODO
● Switch to nanobsd scripts (in place of obsolete Makefiles)
● Handle sudoers file in images
– Copy one in place after install, extend rc.initdiskless /conf support to /usr/local/etc, or add the ability to override in the port
● Find a way to keep packages in sync between nodes and front end systems
Diskless Booting
Startup Process
● PXE boot with NFS root
● /etc/rc.initdiskless initializes /etc from data in
/conf (mounted from /../conf to allow sharing)
– /conf/base/etc remounts /etc
– /conf/default/etc includes rc.conf which simply
sources rc.conf.{default,bcast,ipaddr} allowing
configuration to live in the right place
● /etc/rc.d/diskprep creates swap, /tmp, and /var and labels them so fstab stays consistent regardless of disk configuration
● Normal boot from this point on
Diskless Booting:
Disk Initialization
● Use the sysutils/diskprep port (a modified version of Warner Losh's tool for embedded deployments)
– If the right GEOM volume label doesn't exist, reconfigure the disk
● Could be improved
– Reboot during initialization is often fatal
– Better control of fsck at boot would be useful
● Option to newfs file systems whose contents we don't care about
– Alternate superblock printout in newfs is too noisy
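The label check at the heart of this step can be sketched as below; the label names and the reinitialization command are placeholders for illustration, not diskprep's real interface.

```python
#!/usr/bin/env python
# Sketch of the label-check idea behind diskprep: if the expected GEOM
# labels are missing, (re)initialize the local disk.  The label names
# and the reinit command below are hypothetical placeholders.

import os
import subprocess

EXPECTED_LABELS = ["scratch0", "swap0"]        # made-up label names

def labels_present(labels):
    # GEOM label providers appear under /dev/label/, so fstab can refer
    # to them no matter which physical disk is installed.
    return all(os.path.exists("/dev/label/" + name) for name in labels)

if labels_present(EXPECTED_LABELS):
    print("labels found; leaving the local disk alone")
else:
    print("labels missing; reinitializing the local disk")
    # Placeholder for the real diskprep invocation and disk device.
    subprocess.check_call(["diskprep", "ad0"])
```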
Using Ports on a Cluster
● Very good for languages and cluster tools
● Unusable for MPI ports due to the need for
different ones with different compilers
– Need a bsd.mpi.mk
● Mixed for libraries
– Some are fine with one compiler but others could benefit from more than one version, particularly Fortran code
● Hard to keep nodes and front ends in sync
– Need an SGE package cluster :)
Using Ports on a Cluster
Useful Ports
● lang/gcc*, lang/icc, lang/ifc, etc.

● net-mgmt/nagios

● sysutils/diskprep

● sysutils/ganglia-monitor-core

● sysutils/ganglia-webfrontend

● sysutils/sge
Ganglia Demo
SGE Installation and
Configuration Demo
Background Slides
Aside: Virtualization
● Virtualization presents another paradigm for
clusters
● Multiple operating systems can be supported, allowing applications to run where they work best
● Migration of jobs can allow for simplified
maintenance
● Time sharing of machines is more practical
than with normal batch systems
System Naming and
Addressing
Trade offs and Considerations
● Ease of accessing resources

● Ease of physically locating equipment from a name or address
● Address allocation efficiency
System Naming and
Addressing
Options
● Naming

– By address (e.g. 10.5.2.10-node)


– By location (e.g. Rack2node10)
– Sets of things (elements, Opera Singers,
FreeBSD committers, etc)
● Addressing
– By location vs arbitrary
– Public vs private
– Packed vs sparse
– IPv6
System Naming and
Addressing
What we did on Fellowship
● Naming

– Core systems are Lord of the Rings characters


– Nodes by rack and unit (r02n10)
– Cluster DNS zone (fellow.aero.org)
● Addressing
– Private network 10.5/16
– Node addresses 10.5.rack.unit
– NAT for downloads
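The convention is mechanical enough to express in a couple of lines; this little helper is just an illustration of the scheme above, not a script we actually use.

```python
#!/usr/bin/env python
# Illustration of the naming/addressing convention above: rack 2,
# unit 10 is node r02n10 at 10.5.2.10.

def node_name(rack, unit):
    return "r%02dn%02d" % (rack, unit)

def node_addr(rack, unit):
    # Addresses encode physical location: 10.5.<rack>.<unit>
    return "10.5.%d.%d" % (rack, unit)

print("%s %s" % (node_name(2, 10), node_addr(2, 10)))   # r02n10 10.5.2.10
```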
System Naming and
Addressing
How it worked
● Has forced some application design issues (no servers on nodes)
● Good demonstration of the issues of a private address space cluster
● IPv6 would be nice
User Access
Trade offs and Considerations
● Ease of use

● User familiarity/comfort

● Control of resources
User Access
Options
● Shell on a frontend machine

● Direct access to nodes

● Single system image

● Desktop integration

● Web based portals

● Application integration
User Access
What we did on Fellowship
● SSH to frontend where users edit and compile code and submit jobs to the scheduler
● Working on grid based solution for easier access
– Web portals
– Integration with clients
How it worked
● Many users don't really get the command prompt thanks to Microsoft and Apple
● Some initial resistance to all access via SSH
Administrator Access
Trade offs and Considerations
● Cost

● Effectiveness of out of band access

● Frequency of use
Administrator Access
Options
● SSH to machines

● IPMI

● Serial consoles

● Local KVM switches

● Remote KVM switches


Administrator Access
What we did on Fellowship
● SSH as primary access
● Serial console to nodes and core systems initially
● Local KVM access to core systems
● Upgraded to remote KVM access

Administrator Access
How it worked
● SSH is great when it works, but high latency with many short connections
● We have abandoned serial consoles as too expensive for the infrequent use
● Remote KVM access is very nice
● All devices except the Cisco 6509 have remote power control
● Would like to investigate IPMI
Inventory Management
Trade offs and Considerations
● Need to know what hardware and software is where
● Need history of success/failure to detect buggy hardware
Inventory Management
Options
● Ad hoc tracking

● Wiki

● Property database

● Some request tracking systems


Inventory Management
What we did on Fellowship
● Ad hoc then some wiki pages

● Investigating a database solution

How it worked
● Sort of works, but things get lost or forgotten
Disclaimer
● All trademarks, service marks, and trade
names are the property of their respective
owners.
