Gluster File System 3.3.0 Administration Guide

Using Gluster File System

Edition 1

Author: GlusterFS Developers <[email protected]>
Copyright 2006-2012 Red Hat, Inc. (http://www.redhat.com). GlusterFS has a dual licensing model for its source code. On the client side, GlusterFS is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation. On the server side, GlusterFS is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This guide describes Gluster File System (GlusterFS) and provides information on how to configure, operate, and manage GlusterFS.
Table of Contents

Preface
  1. Audience
  2. License
  3. Document Conventions
     3.1. Typographic Conventions
     3.2. Pull-quote Conventions
     3.3. Notes and Warnings
  4. We Need Feedback!
1. Introducing Gluster File System
2. Managing the glusterd Service
   2.1. Starting and Stopping glusterd Manually
   2.2. Starting glusterd Automatically
        2.2.1. Red Hat-based Systems
        2.2.2. Debian-based Systems
        2.2.3. Systems Other than Red Hat and Debian
3. Using the Gluster Console Manager Command Line Utility
4. Setting up Trusted Storage Pools
   4.1. Adding Servers to Trusted Storage Pool
   4.2. Removing Servers from the Trusted Storage Pool
5. Setting up GlusterFS Server Volumes
   5.1. Creating Distributed Volumes
   5.2. Creating Replicated Volumes
   5.3. Creating Striped Volumes
   5.4. Creating Distributed Striped Volumes
   5.5. Creating Distributed Replicated Volumes
   5.6. Creating Distributed Striped Replicated Volumes
   5.7. Creating Striped Replicated Volumes
   5.8. Starting Volumes
6. Accessing Data - Setting Up GlusterFS Client
   6.1. Gluster Native Client
        6.1.1. Installing the Gluster Native Client
        6.1.2. Mounting Volumes
   6.2. NFS
        6.2.1. Using NFS to Mount Volumes
   6.3. CIFS
        6.3.1. Using CIFS to Mount Volumes
   6.4. Testing Mounted Volumes
7. Managing GlusterFS Volumes
   7.1. Tuning Volume Options
   7.2. Expanding Volumes
   7.3. Shrinking Volumes
   7.4. Migrating Volumes
   7.5. Rebalancing Volumes
        7.5.1. Rebalancing Volume to Fix Layout Changes
        7.5.2. Rebalancing Volume to Fix Layout and Migrate Data
        7.5.3. Displaying Status of Rebalance Operation
        7.5.4. Stopping Rebalance Operation
   7.6. Stopping Volumes
   7.7. Deleting Volumes
   7.8. Triggering Self-Heal on Replicate
8. Managing Geo-replication
   8.1. Replicated Volumes vs Geo-replication
   8.2. Preparing to Deploy Geo-replication
        8.2.1. Exploring Geo-replication Deployment Scenarios
        8.2.2. Geo-replication Deployment Overview
        8.2.3. Checking Geo-replication Minimum Requirements
        8.2.4. Setting Up the Environment for Geo-replication
        8.2.5. Setting Up the Environment for a Secure Geo-replication Slave
   8.3. Starting Geo-replication
        8.3.1. Starting Geo-replication
        8.3.2. Verifying Successful Deployment
        8.3.3. Displaying Geo-replication Status Information
        8.3.4. Configuring Geo-replication
        8.3.5. Stopping Geo-replication
   8.4. Restoring Data from the Slave
   8.5. Best Practices
9. Managing Directory Quota
   9.1. Enabling Quota
   9.2. Disabling Quota
   9.3. Setting or Replacing Disk Limit
   9.4. Displaying Disk Limit Information
   9.5. Updating Memory Cache Size
   9.6. Removing Disk Limit
10. Monitoring your GlusterFS Workload
   10.1. Running GlusterFS Volume Profile Command
        10.1.1. Start Profiling
        10.1.2. Displaying the I/O Information
        10.1.3. Stop Profiling
   10.2. Running GlusterFS Volume TOP Command
        10.2.1. Viewing Open fd Count and Maximum fd Count
        10.2.2. Viewing Highest File Read Calls
        10.2.3. Viewing Highest File Write Calls
        10.2.4. Viewing Highest Open Calls on Directories
        10.2.5. Viewing Highest Read Calls on Directory
        10.2.6. Viewing List of Read Performance on each Brick
        10.2.7. Viewing List of Write Performance on each Brick
   10.3. Displaying Volume Information
   10.4. Performing Statedump on a Volume
   10.5. Displaying Volume Status
11. POSIX Access Control Lists
   11.1. Activating POSIX ACLs Support
        11.1.1. Activating POSIX ACLs Support on Server
        11.1.2. Activating POSIX ACLs Support on Client
   11.2. Setting POSIX ACLs
        11.2.1. Setting Access ACLs
        11.2.2. Setting Default ACLs
   11.3. Retrieving POSIX ACLs
   11.4. Removing POSIX ACLs
   11.5. Samba and ACLs
   11.6. NFS and ACLs
12. Managing Unified File and Object Storage
   12.1. Components of Object Storage
   12.2. Advantages of using GlusterFS Unified File and Object Storage
   12.3. Preparing to Deploy Unified File and Object Storage
        12.3.1. Pre-requisites
        12.3.2. Dependencies
   12.4. Installing and Configuring Unified File and Object Storage
        12.4.1. Installing Unified File and Object Storage
        12.4.2. Adding Users
        12.4.3. Configuring Proxy Server
        12.4.4. Configuring Authentication System
        12.4.5. Configuring Proxy Server for HTTPS
        12.4.6. Configuring Object Server
        12.4.7. Configuring Container Server
        12.4.8. Configuring Account Server
        12.4.9. Starting and Stopping Server
   12.5. Working with Unified File and Object Storage
        12.5.1. Configuring Authenticated Access
        12.5.2. Working with Accounts
        12.5.3. Working with Containers
        12.5.4. Working with Objects
13. Managing Hadoop Compatible Storage
   13.1. Architecture Overview
   13.2. Advantages
   13.3. Preparing to Install Hadoop Compatible Storage
        13.3.1. Pre-requisites
   13.4. Installing and Configuring Hadoop Compatible Storage
   13.5. Starting and Stopping the Hadoop MapReduce Daemon
14. Troubleshooting GlusterFS
   14.1. Managing GlusterFS Logs
        14.1.1. Rotating Logs
   14.2. Troubleshooting Geo-replication
        14.2.1. Locating Log Files
        14.2.2. Rotating Geo-replication Logs
        14.2.3. Synchronization is not complete
        14.2.4. Issues in Data Synchronization
        14.2.5. Geo-replication status displays Faulty very often
        14.2.6. Intermediate Master goes to Faulty State
   14.3. Troubleshooting POSIX ACLs
        14.3.1. setfacl command fails with setfacl: <file or directory name>: Operation not supported error
   14.4. Troubleshooting Hadoop Compatible Storage
        14.4.1. Time Sync
   14.5. Troubleshooting NFS
        14.5.1. mount command on NFS client fails with RPC Error: Program not registered
        14.5.2. NFS server start-up fails with Port is already in use error in the log file
        14.5.3. mount command fails with rpc.statd related error message
        14.5.4. mount command takes too long to finish
        14.5.5. NFS server glusterfsd starts but initialization fails with nfsrpc-service: portmap registration of program failed error message in the log
        14.5.6. mount command fails with NFS server failed error
        14.5.7. showmount fails with clnt_create: RPC: Unable to receive
        14.5.8. Application fails with "Invalid argument" or "Value too large for defined data type" error
   14.6. Troubleshooting File Locks
15. Command Reference
   15.1. gluster Command
   15.2. glusterd Daemon
16. Glossary
A. Revision History
Preface
This guide describes how to configure, operate, and manage Gluster File System (GlusterFS).
1. Audience
This guide is intended for Systems Administrators interested in configuring and managing GlusterFS. It assumes that you are familiar with the Linux operating system, file system concepts, GlusterFS concepts, and GlusterFS installation.
2. License
The License information is available at http://www.redhat.com/licenses/rhel_rha_eula.html.
3. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.

In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts set (https://fedorahosted.org/liberation-fonts/). The Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not, alternative but equivalent typefaces are displayed. Note: Red Hat Enterprise Linux 5 and later includes the Liberation Fonts set by default.
3.1. Typographic Conventions

If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph are presented in mono-spaced bold. For example:

File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.

Proportional Bold

This denotes words or phrases encountered on a system, including application names; dialog box text; labeled buttons; check-box and radio button labels; menu titles and sub-menu titles. For example:

Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, click the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).

To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.

The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.

Mono-spaced Bold Italic or Proportional Bold Italic

Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:

To connect to a remote machine using ssh, type ssh [email protected] at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh [email protected].

The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.

To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.

Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.

Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:

Publican is a DocBook publishing system.
3.2. Pull-quote Conventions

Output sent to a terminal is set in mono-spaced roman and presented thus:
books books_tests Desktop Desktop1 documentation downloads drafts images mss notes photos scripts stuff svgs svn
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
package org.jboss.book.jca.ex1;

import javax.naming.InitialContext;

public class ExClient
{
   public static void main(String args[]) throws Exception
   {
      InitialContext iniCtx = new InitialContext();
      Object         ref    = iniCtx.lookup("EchoBean");
      EchoHome       home   = (EchoHome) ref;
      Echo           echo   = home.create();

      System.out.println("Created Echo");

      System.out.println("Echo.echo('Hello') = " + echo.echo("Hello"));
   }
}
3.3. Notes and Warnings

Note
Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.
Important
Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled 'Important' will not cause data loss but may cause irritation and frustration.
Warning
Warnings should not be ignored. Ignoring warnings will most likely cause data loss.
4. We Need Feedback!
If you find any issues, please open a bug on our Bugzilla: https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS. For details about mailing lists, see our community page: http://www.gluster.org/interact/mailinglists/. If you want live help, join us in the #gluster IRC channel on freenode.
Chapter 1. Introducing Gluster File System

Figure 1.1. Virtualized Cloud Environments

GlusterFS is designed for today's high-performance, virtualized cloud environments. Unlike traditional data centers, cloud environments require multi-tenancy along with the ability to grow or shrink resources on demand. Enterprises can scale capacity, performance, and availability on demand, with no vendor lock-in, across on-premises, public cloud, and hybrid environments.

GlusterFS is in production at thousands of enterprises spanning media, healthcare, government, education, web 2.0, and financial services.

The following table lists the commercial offerings and their documentation locations:

Product: Red Hat Storage Software Appliance
Documentation Location: http://docs.redhat.com/docs/en-US/Red_Hat_Storage_Software_Appliance/index.html

Product: Red Hat Virtual Storage Appliance
Documentation Location: http://docs.redhat.com/docs/en-US/Red_Hat_Virtual_Storage_Appliance/index.html

Product: Red Hat Storage
Documentation Location: http://docs.redhat.com/docs/en-US/Red_Hat_Storage/index.html
Chapter 2. Managing the glusterd Service
Note
You must start glusterd on all GlusterFS servers.
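The detailed procedures from this chapter are not reproduced in this excerpt. As a minimal sketch, assuming a Red Hat-based or Debian-based system with the GlusterFS packages installed, glusterd is typically started manually and enabled at boot with commands like the following:

# /etc/init.d/glusterd start
# chkconfig glusterd on          (Red Hat-based systems: start glusterd automatically at boot)
# update-rc.d glusterd defaults  (Debian-based systems: start glusterd automatically at boot)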
Chapter 3. Using the Gluster Console Manager Command Line Utility
Chapter 4. Setting up Trusted Storage Pools
Note
Do not self-probe the first server/localhost. The glusterd service must be running on all storage servers that you want to add to the storage pool. See Chapter 2, Managing the glusterd Service for more information.
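The probe step that precedes the verification below was not carried over in this excerpt. A minimal sketch, assuming three additional servers named server2, server3, and server4, is:

# gluster peer probe server2
# gluster peer probe server3
# gluster peer probe server4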
2. Verify the peer status from the first server using the following command:
# gluster peer status
Number of Peers: 3

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

Hostname: server4
Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
Chapter 5. Setting up GlusterFS Server Volumes
# gluster volume create test-volume server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
5.1. Creating Distributed Volumes

Note
Disk/server failure in distributed volumes can result in a serious loss of data because directory contents are spread randomly across the bricks in the volume.
Figure 5.1. Illustration of a Distributed Volume

To create a distributed volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create the distributed volume:

# gluster volume create NEW-VOLNAME [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a distributed volume with four storage servers using tcp:
# gluster volume create test-volume server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
For example, to create a distributed volume with four storage servers over InfiniBand:
# gluster volume create test-volume transport rdma server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.2. Creating Replicated Volumes

Note
The number of bricks should be equal to the replica count for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Figure 5.2. Illustration of a Replicated Volume

To create a replicated volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create the replicated volume:

# gluster volume create NEW-VOLNAME [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a replicated volume with two storage servers:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.3. Creating Striped Volumes

Note
The number of bricks should be equal to the stripe count for a striped volume.
Figure 5.3. Illustration of a Striped Volume

To create a striped volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create the striped volume:

# gluster volume create NEW-VOLNAME [stripe COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a striped volume across two storage servers:
# gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.4. Creating Distributed Striped Volumes

Note
The number of bricks should be a multiple of the stripe count for a distributed striped volume.
Figure 5.4. Illustration of a Distributed Striped Volume

To create a distributed striped volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create the distributed striped volume:

# gluster volume create NEW-VOLNAME [stripe COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a distributed striped volume across eight storage servers:
# gluster volume create test-volume stripe 4 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.5. Creating Distributed Replicated Volumes

Note
The number of bricks should be a multiple of the replica count for a distributed replicated volume. Also, the order in which bricks are specified has a great effect on data protection. Each replica_count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a volume-wide distribute set. To make sure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, and so on.
Figure 5.5. Illustration of a Distributed Replicated Volume

To create a distributed replicated volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create the distributed replicated volume:

# gluster volume create NEW-VOLNAME [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a four-node distributed (replicated) volume with a two-way mirror:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
For example, to create a six node distributed (replicated) volume with a two-way mirror:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.6. Creating Distributed Striped Replicated Volumes

Note
The number of bricks should be a multiple of the stripe count multiplied by the replica count for a distributed striped replicated volume.

To create a distributed striped replicated volume

1. Create a trusted storage pool as described earlier in Section 4.1, Adding Servers to Trusted Storage Pool.

2. Create a distributed striped replicated volume using the following command:

# gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a distributed replicated striped volume across eight storage servers:
# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
5.7. Creating Striped Replicated Volumes

Note
The number of bricks should be a multiple of the replica count and stripe count for a striped replicated volume.
Figure 5.6. Illustration of a Striped Replicated Volume

To create a striped replicated volume

1. Create a trusted storage pool consisting of the storage servers that will comprise the volume. For more information, see Section 4.1, Adding Servers to Trusted Storage Pool.
2. Create a striped replicated volume:

# gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...

For example, to create a striped replicated volume across four storage servers:
# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
If the transport type is not specified, tcp is used as the default. You can also set additional options if required, such as auth.allow or auth.reject. For more information, see Section 7.1, Tuning Volume Options
Note
Make sure you start your volumes before you try to mount them or else client operations after the mount will hang, see Section 5.8, Starting Volumes for details.
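Section 5.8, Starting Volumes is not included in this excerpt. As a minimal sketch, assuming the test-volume created above, starting a volume looks like the following (the success message is indicative and may differ slightly by release):

# gluster volume start test-volume
Starting volume test-volume has been successful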
Chapter 6. Accessing Data - Setting Up GlusterFS Client
Note
Install the 'glusterfs-rdma' RPM if RDMA support is required. 'glusterfs-rdma' provides the RDMA transport module for InfiniBand interconnects.
You can download the software at http://bits.gluster.com/gluster/glusterfs/3.3.0/x86_64/.

3. Install Gluster Native Client on the client.

$ sudo rpm -i glusterfs-3.3.0-1.x86_64.rpm
$ sudo rpm -i glusterfs-fuse-3.3.0-1.x86_64.rpm
$ sudo rpm -i glusterfs-rdma-3.3.0-1.x86_64.rpm
FUSE client        : yes
Infiniband verbs   : yes
epoll IO multiplex : yes
argp-standalone    : no
fusermount         : no
readline           : yes
Note
The configuration summary shown above is a sample; it can vary depending on other installed packages.
5. Build the Gluster Native Client software using the following commands:

# make
# make install

6. Verify that the correct version of Gluster Native Client is installed, using the following command:

# glusterfs --version
Note
Server names used when creating volumes must be resolvable on the client machine. You can use appropriate /etc/hosts entries or a DNS server to resolve server names to IP addresses.
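For example, a minimal sketch of /etc/hosts entries on the client (the IP addresses below are placeholders, not values from this guide):

192.168.1.101 server1
192.168.1.102 server2
192.168.1.103 server3
192.168.1.104 server4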
# mount -t glusterfs HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR

For example:

# mount -t glusterfs server1:/test-volume /mnt/glusterfs
Note
The server specified in the mount command is only used to fetch the gluster configuration volfile describing the volume name. Subsequently, the client will communicate directly with the servers mentioned in the volfile (which might not even include the one used for mount).
Using /etc/fstab, the options would look like the following:

HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR glusterfs defaults,_netdev,loglevel=WARNING,log-file=/var/log/gluster.log 0 0

If the backupvolfile-server option is added while mounting the FUSE client, then when the first volfile server fails, the server specified in the backupvolfile-server option is used as the volfile server to mount the client.

In the fetch-attempts=N option, specify the number of attempts to fetch volume files while mounting a volume. This option is useful when round-robin DNS is configured for the server name.
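A sketch of a manual mount that combines these options, using the placeholder server names from earlier examples (exact option support may vary by release):

# mount -t glusterfs -o backupvolfile-server=server2,fetch-attempts=3 server1:/test-volume /mnt/glusterfs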
6.2. NFS
You can use NFS v3 to access Gluster volumes. GlusterFS 3.3.0 now also includes the Network Lock Manager (NLM) v4 feature. NLM enables applications on NFSv3 clients to do record locking on files. The NLM program is started automatically with the NFS server process.

This section describes how to use NFS to mount Gluster volumes (both manually and automatically).
Note
Gluster NFS server does not support UDP. If the NFS client you are using defaults to connecting using UDP, the following message appears: requested NFS version or transport protocol is not supported. To connect using TCP
Add the following option to the mount command:

-o mountproto=tcp

For example:

# mount -o mountproto=tcp,vers=3 -t nfs server1:/test-volume /mnt/glusterfs

To mount Gluster NFS server from a Solaris client

Use the following command:

# mount -o proto=tcp,vers=3 nfs://HOSTNAME-OR-IPADDRESS:38467/VOLNAME MOUNTDIR

For example:

# mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
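To mount the volume automatically at boot over NFS, an /etc/fstab entry along the following lines can be used. This is only a sketch, reusing the placeholder names from the manual example above; adjust the options for your environment:

server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,vers=3,mountproto=tcp 0 0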
6.3. CIFS
You can use CIFS to access volumes from Microsoft Windows as well as Samba clients. For this access method, Samba packages need to be present on the client side. You can export the GlusterFS mount point as a Samba export and then mount it using the CIFS protocol.

This section describes how to mount CIFS shares on Microsoft Windows-based clients (both manually and automatically) and how to verify that the volume has mounted successfully.
Note
CIFS access using the Mac OS X Finder is not supported; however, you can use the Mac OS X command line to access Gluster volumes using CIFS.
Note
To be able to mount from any server in the trusted storage pool, you must repeat these steps on each Gluster node. For more advanced configurations, see the Samba documentation.
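The Samba export itself is defined in smb.conf on the Gluster node. The following is only a sketch, assuming the volume is already mounted locally at /mnt/glusterfs and that guest access is acceptable; the share name and options are placeholders to adapt to your environment:

[glustertest]
comment = Gluster volume exported through CIFS
path = /mnt/glusterfs
read only = no
guest ok = yes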
6.4. Testing Mounted Volumes

Use the following command:

# df -h

The output of the df command on the client displays the aggregated storage space from all the bricks in a volume, similar to this example:

# df -h /mnt/glusterfs
Filesystem            Size  Used  Avail  Use%  Mounted on
server1:/test-volume   28T   22T   5.4T   82%  /mnt/glusterfs

Change to the directory and list the contents by entering the following:

# cd MOUNTDIR
# ls

For example:

# cd /mnt/glusterfs
# ls
Chapter 7. Managing GlusterFS Volumes
7.1. Tuning Volume Options

Note
It is recommended to set the server.allow-insecure option to ON if there are too many bricks in each volume, or if there are too many services which have already utilized all the privileged ports in the system. Turning this option ON allows the server to accept connections from insecure (unprivileged) ports. Use this option only if your deployment requires it.

To tune volume options

Tune volume options using the following command:

# gluster volume set VOLNAME OPTION PARAMETER

For example, to specify the performance cache size for test-volume:
# gluster volume set test-volume performance.cache-size 256MB
Set volume successful
The following table lists the volume options along with their descriptions and default values:

Note

The default options given here are subject to modification at any given time and may not be the same for all versions.

auth.allow
IP addresses of the clients which should be allowed to access the volume. Default value: * (allow all). Available options: a valid IP address, which can include wildcard patterns such as 192.168.1.*.

auth.reject
IP addresses of the clients which should be denied access to the volume. Available options: a valid IP address, which can include wildcard patterns such as 192.168.2.*.

client.grace-timeout
Specifies the duration for the lock state to be maintained on the client after a network disconnection. Default value: 10. Available options: 10 - 1800 secs.

cluster.self-heal-window-size
Specifies the maximum number of blocks per file on which self-heal happens simultaneously. Default value: 16. Available options: 0 - 1025 blocks.

cluster.data-self-heal-algorithm
Specifies the type of self-heal. If you set the option to "full", the entire file is copied from source to destinations. If the option is set to "diff", only the file blocks that are not in sync are copied to destinations. "reset" uses a heuristic model: if the file does not exist on one of the subvolumes, or a zero-byte file exists (created by entry self-heal), the entire content has to be copied anyway, so there is no benefit from using the "diff" algorithm; if the file size is about the same as page size, the entire file can be read and written with a few operations, which is faster than "diff", which has to read checksums and then read and write. Default value: reset. Available options: full | diff | reset.

cluster.min-free-disk
Specifies the percentage of disk space that must be kept free. Might be useful for non-uniform bricks. Default value: 10%. Available options: percentage of required minimum free disk space.

cluster.stripe-block-size
Specifies the size of the stripe unit that will be read from or written to. Available options: size in bytes.

cluster.self-heal-daemon
Allows you to turn off proactive self-heal on replicated volumes. Default value: on. Available options: On | Off.

diagnostics.brick-log-level
Changes the log-level of the bricks. Default value: INFO.

diagnostics.client-log-level
Changes the log-level of the clients. Default value: INFO.

diagnostics.latency-measurement
Statistics related to the latency of each operation are tracked. Default value: off. Available options: On | Off.

diagnostics.dump-fd-stats
Statistics related to file operations are tracked. Default value: off. Available options: On | Off.

feature.read-only
Enables you to mount the entire volume as read-only for all the clients (including NFS clients) accessing it. Default value: off. Available options: On | Off.

features.lock-heal
Enables self-healing of locks when the network disconnects. Default value: on. Available options: On | Off.

features.quota-timeout
For performance reasons, quota caches the directory sizes on the client. You can set a timeout indicating the maximum duration for which directory sizes in the cache, from the time they are populated, are considered valid. Default value: 0. Available options: 0 - 3600 secs.

geo-replication.indexing
Use this option to automatically sync the changes in the filesystem from Master to Slave. Default value: off. Available options: On | Off.

network.frame-timeout
The time frame after which an operation has to be declared as dead, if the server does not respond for that operation. Default value: 1800 secs.

network.ping-timeout
The time duration for which the client waits to check if the server is responsive. When a ping timeout happens, there is a network disconnect between the client and server. All resources held by the server on behalf of the client get cleaned up. When a reconnection happens, all resources need to be reacquired before the client can resume its operations on the server. Additionally, the locks are acquired and the lock tables updated. This reconnect is a very expensive operation and should be avoided. Default value: 42 secs.

nfs.enable-ino32
For 32-bit NFS clients or applications that do not support 64-bit inode numbers or large files, use this option from the CLI to make Gluster NFS return 32-bit inode numbers instead of 64-bit inode numbers. Applications that will benefit are those that were either built 32-bit and run on 32-bit machines, built 32-bit on 64-bit systems, or built 64-bit but use a library built 32-bit (especially relevant for Python and Perl scripts). Either of the conditions above can lead to applications on Linux NFS clients failing with "Invalid argument" or "Value too large for defined data type" errors. Default value: off. Available options: On | Off.

nfs.volume-access
Set the access type for the specified subvolume. Default value: read-write. Available options: read-write | read-only.

nfs.trusted-write
If there is an UNSTABLE write from the client, a STABLE flag will be returned to force the client to not send a COMMIT request. In some environments, combined with a replicated GlusterFS setup, this option can improve write performance. This flag allows users to trust Gluster replication logic to sync data to the disks and recover when required. COMMIT requests, if received, will be handled in a default manner by fsyncing. STABLE writes are still handled in a sync manner. Default value: off. Available options: On | Off.

nfs.trusted-sync
All writes and COMMIT requests are treated as async. This implies that no write requests are guaranteed to be on server disks when the write reply is received at the NFS client. Trusted sync includes trusted-write behavior. Default value: off. Available options: On | Off.

nfs.export-dir
By default, all subvolumes of NFS are exported as individual exports. This option allows you to export only the specified subdirectory or subdirectories in the volume. It can also be used in conjunction with the nfs.export-volumes option to restrict exports only to the subdirectories specified through this option. You must provide an absolute path. Default value: enabled for all subdirectories. Available options: Enable | Disable.

nfs.export-volumes
Enable/Disable exporting entire volumes; if used in conjunction with nfs.export-dir, this can allow setting up only subdirectories as exports. Default value: on. Available options: On | Off.

nfs.rpc-auth-unix
Enable/Disable the AUTH_UNIX authentication type. This option is enabled by default for better interoperability. However, you can disable it if required. Default value: on. Available options: On | Off.

nfs.rpc-auth-null
Enable/Disable the AUTH_NULL authentication type. It is not recommended to change the default value for this option. Default value: on. Available options: On | Off.

nfs.rpc-auth-allow <IP-Addresses>
Allow a comma-separated list of addresses and/or hostnames to connect to the server. By default, all clients are disallowed. This allows you to define a general rule for all exported volumes. Default value: Reject All. Available options: IP address or host name.

nfs.rpc-auth-reject <IP-Addresses>
Reject a comma-separated list of addresses and/or hostnames from connecting to the server. By default, all connections are disallowed. This allows you to define a general rule for all exported volumes. Default value: Reject All. Available options: IP address or host name.

nfs.ports-insecure
Allow client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting in case insecure ports are to be enabled for all exports using a single option. Default value: off. Available options: On | Off.

nfs.addr-namelookup
Turn off name lookup for incoming client connections using this option. In some setups, the name server can take too long to reply to DNS queries, resulting in timeouts of mount requests. Use this option to turn off name lookups during address authentication. Note that turning this off will prevent you from using hostnames in rpc-auth.addr.* filters. Default value: on. Available options: On | Off.

nfs.register-with-portmap
For systems that need to run multiple NFS servers, you need to prevent more than one from registering with the portmap service. Use this option to turn off portmap registration for Gluster NFS. Default value: on. Available options: On | Off.

nfs.port <PORT-NUMBER>
Use this option on systems that need Gluster NFS to be associated with a non-default port number. Default value: 38465 - 38467.

nfs.disable
Turn off a volume being exported by NFS. Default value: off. Available options: On | Off.

performance.write-behind-window-size
Size of the per-file write-behind buffer. Default value: 1 MB.

performance.io-thread-count
The number of threads in the IO threads translator. Default value: 16.

performance.flush-behind
If this option is set to ON, it instructs the write-behind translator to perform flush in the background, by returning success (or any errors, if any of the previous writes failed) to the application even before the flush is sent to the backend filesystem. Default value: On. Available options: On | Off.

performance.cache-max-file-size
Sets the maximum file size cached by the io-cache translator. Can use the normal size descriptors of KB, MB, GB, TB or PB (for example, 6GB). Maximum size is uint64. Default value: 2^64 - 1 bytes. Available options: size in bytes.

performance.cache-min-file-size
Sets the minimum file size cached by the io-cache translator. Values are the same as for "max" above. Default value: 0B. Available options: size in bytes.

performance.cache-refresh-timeout
The cached data for a file will be retained for 'cache-refresh-timeout' seconds, after which data re-validation is performed. Default value: 1 sec. Available options: 0 - 61.

performance.cache-size
Size of the read cache. Default value: 32 MB. Available options: size in bytes.

server.allow-insecure
Allow client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting in case insecure ports are to be enabled for all exports using a single option. Default value: on. Available options: On | Off.

server.grace-timeout
Specifies the duration for the lock state to be maintained on the server after a network disconnection. Default value: 10. Available options: 10 - 1800 secs.

server.statedump-path
Location of the state dump file.
You can view the changed volume options using the # gluster volume info VOLNAME command. For more information, see Section 7.7, Deleting Volumes.
7.2. Expanding Volumes

Note

When expanding distributed replicated and distributed striped volumes, you need to add a number of bricks that is a multiple of the replica or stripe count. For example, to expand a distributed replicated volume with a replica count of 2, you need to add bricks in multiples of 2 (such as 4, 6, 8, and so on).

To expand a volume

1. On the first server in the cluster, probe the server to which you want to add the new brick using the following command:

# gluster peer probe HOSTNAME

For example:
# gluster peer probe server4
2. Add the brick using the following command:

# gluster volume add-brick VOLNAME NEW-BRICK

For example:
# gluster volume add-brick test-volume server4:/exp4
Add Brick successful
3. Check the volume information using the following command:

# gluster volume info

The command displays information similar to the following:
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 4
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp2
Brick3: server3:/exp3
Brick4: server4:/exp4
4. Rebalance the volume to ensure that all files are distributed to the new brick. You can use the rebalance command as described in Section 7.5, Rebalancing Volumes.
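A minimal sketch of the rebalance step, assuming the expanded test-volume above:

# gluster volume rebalance test-volume start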
7.3. Shrinking Volumes

Note

Data residing on the brick that you are removing will no longer be accessible at the Gluster mount point. Note, however, that only the configuration information is removed - you can continue to access the data directly from the brick, as necessary.

When shrinking distributed replicated and distributed striped volumes, you need to remove a number of bricks that is a multiple of the replica or stripe count. For example, to shrink a distributed striped volume with a stripe count of 2, you need to remove bricks in multiples of 2 (such as 4, 6, 8, and so on). In addition, the bricks you are trying to remove must be from the same sub-volume (the same replica or stripe set).

To shrink a volume

1. Remove the brick using the following command:
# gluster volume remove-brick VOLNAME BRICK start

For example, to remove server2:/exp2:
# gluster volume remove-brick test-volume server2:/exp2 start
Removing brick(s) can result in data loss. Do you want to Continue? (y/n)
2. Enter "y" to confirm the operation. The command displays the following message indicating that the remove brick operation is successfully started:
Remove Brick successful
3. (Optional) View the status of the remove brick operation using the following command:

# gluster volume remove-brick VOLNAME BRICK status

For example, to view the status of the remove brick operation on the server2:/exp2 brick:
# gluster volume remove-brick test-volume server2:/exp2 status
Node                                    Rebalanced-files    size    scanned    status
---------                               ----------------    ----    -------    -----------
617c923e-6450-4065-8e33-865e28d9428f                  34     340        162    in progress
4. Commit the remove brick operation using the following command:

# gluster volume remove-brick VOLNAME BRICK commit

For example, to commit the remove brick operation on the server2:/exp2 brick:
# gluster volume remove-brick test-volume server2:/exp2 commit
5. Check the volume information using the following command:

# gluster volume info

The command displays information similar to the following:
# gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 3
Bricks:
Brick1: server1:/exp1
Brick3: server3:/exp3
Brick4: server4:/exp4
6. Rebalance the volume to ensure that all files are distributed across the remaining bricks. You can use the rebalance command as described in Section 7.5, Rebalancing Volumes.
7.4. Migrating Volumes

Note
You need to have the FUSE package installed on the server on which you are running the replace-brick command for the command to work.
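The steps that start the migration are not carried over in this excerpt. A minimal sketch, assuming the same brick pair used in the examples below, would be:

# gluster volume replace-brick test-volume server3:/exp3 server5:/exp5 start

The pause, abort, status, and commit steps that follow operate on the same brick pair.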
3. To pause the migration operation, if needed, use the following command:

# gluster volume replace-brick VOLNAME BRICK NEW-BRICK pause
For example, to pause the data migration from server3:/exp3 to server5:/exp5 in test-volume:
# gluster volume replace-brick test-volume server3:/exp3 server5:/exp5 pause
Replace brick pause operation successful
4. To abort the migration operation, if needed, use the following command:

# gluster volume replace-brick VOLNAME BRICK NEW-BRICK abort

For example, to abort the data migration from server3:/exp3 to server5:/exp5 in test-volume:
# gluster volume replace-brick test-volume server3:/exp3 server5:/exp5 abort
Replace brick abort operation successful
5. Check the status of the migration operation using the following command:

# gluster volume replace-brick VOLNAME BRICK NEW-BRICK status

For example, to check the data migration status from server3:/exp3 to server5:/exp5 in test-volume:
# gluster volume replace-brick test-volume server3:/exp3 server5:/exp5 status
Current File = /usr/src/linux-headers-2.6.31-14/block/Makefile
Number of files migrated = 10567
Migration complete
The status command shows the current file being migrated along with the current total number of files migrated. After completion of migration, it displays Migration complete.

6. Commit the migration of data from one brick to another using the following command:

# gluster volume replace-brick VOLNAME BRICK NEW-BRICK commit

For example, to commit the data migration from server3:/exp3 to server5:/exp5 in test-volume:
# gluster volume replace-brick test-volume server3:/exp3 server5:/exp5 commit
replace-brick commit successful
7. Verify the migration of the brick by viewing the volume info using the following command:

# gluster volume info VOLNAME

For example, to check the volume information of the new brick server5:/exp5 in test-volume:
# gluster volume info test-volume
Volume Name: test-volume
Type: Replicate
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp2
Brick3: server4:/exp4
Brick4: server5:/exp5

The new volume details are displayed. In the above example, the volume previously contained bricks 1, 2, 3, and 4, and the brick server3:/exp3 has now been replaced by server5:/exp5.
7.5. Rebalancing Volumes

Start the migration operation forcefully on any one of the servers using the following command:

# gluster volume rebalance VOLNAME start force

For example:
# gluster volume rebalance test-volume start force
Starting rebalancing on volume test-volume has been successful
You can display the status of the rebalance operation using the following command:

# gluster volume rebalance VOLNAME status
The time to complete the rebalance operation depends on the number of files on the volume along with the corresponding file sizes. Continue checking the rebalance status, verifying that the number of files rebalanced or total files scanned keeps increasing. For example, running the status command again might display a result similar to the following:
# gluster volume rebalance test-volume status
Node                                    Rebalanced-files    size    scanned
---------                               ----------------    ----    -------
617c923e-6450-4065-8e33-865e28d9428f                 498    1783        378
The rebalance status displays the following when the rebalance is complete:
# gluster volume rebalance test-volume status
Node                                    Rebalanced-files    size    scanned    status
---------                               ----------------    ----    -------    ---------
617c923e-6450-4065-8e33-865e28d9428f                 502    1873        334    completed
A rebalance operation that has been stopped is reported in the same format, with the status column showing stopped (see Section 7.5.4, Stopping Rebalance Operation).
7.6. Stopping Volumes
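The command that begins the stop operation is not carried over in this excerpt. A minimal sketch, assuming test-volume, would be:

# gluster volume stop test-volume

The command then prompts for confirmation, which leads to the step below.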
2. Enter y to confirm the operation. The output of the command displays the following:
Stopping volume test-volume has been successful
7.8. Triggering Self-Heal on Replicate

Trigger self-heal on all the files of a volume:

# gluster volume heal VOLNAME full

For example, to trigger self-heal on all the files of test-volume:
# gluster volume heal test-volume full Heal operation on volume test-volume has been successful
View the list of files that need healing: # gluster volume heal VOLNAME info
For example, to view the list of files on test-volume that need healing:
# gluster volume heal test-volume info Brick server1:/gfs/test-volume_0 Number of entries: 0 Brick server2:/gfs/test-volume_1 Number of entries: 101 /95.txt /32.txt /66.txt /35.txt /18.txt /26.txt /47.txt /55.txt /85.txt ...
View the list of files that are self-healed: # gluster volume heal VOLNAME info healed For example, to view the list of files on test-volume that are self-healed:
# gluster volume heal test-volume info healed Brick server1:/gfs/test-volume_0 Number of entries: 0 Brick server2:/gfs/test-volume_1 Number of entries: 69 /99.txt /93.txt /76.txt /11.txt /27.txt /64.txt /80.txt /19.txt /41.txt /29.txt /37.txt /46.txt ...
View the list of files of a particular volume on which the self-heal failed: # gluster volume heal VOLNAME info failed For example, to view the list of files of test-volume that are not self-healed:
# gluster volume heal test-volume info failed Brick server1:/gfs/test-volume_0 Number of entries: 0 Brick server2:/gfs/test-volume_3 Number of entries: 72 /90.txt /95.txt /77.txt /71.txt
View the list of files of a particular volume which are in split-brain state: # gluster volume heal VOLNAME info split-brain For example, to view the list of files of test-volume which are in split-brain state:
# gluster volume heal test-volume info split-brain Brick server1:/gfs/test-volume_2 Number of entries: 12 /83.txt /28.txt /69.txt ... Brick server2:/gfs/test-volume_2 Number of entries: 12 /83.txt /28.txt /69.txt ...
Chapter 8.
Managing Geo-replication
Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and across the Internet. Geo-replication uses a master-slave model, whereby replication and mirroring occurs between the following partners:
Master - a GlusterFS volume.
Slave - a slave, which can be one of the following types:
A local directory, which can be represented as a file URL like file:///path/to/dir. You can use the shortened form, for example, /path/to/dir.
A GlusterFS volume - the slave volume can be either a local volume like gluster://localhost:volname (shortened form - :volname) or a volume served by a different host like gluster://host:volname (shortened form - host:volname).
Note
Both of the above types can be accessed remotely using an SSH tunnel. To use SSH, add an SSH prefix to either a file URL or a gluster type URL. For example, ssh://root@remote-host:/path/to/dir (shortened form - root@remote-host:/path/to/dir) or ssh://root@remote-host:gluster://localhost:volname (shortened form - root@remote-host::volname).
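As an illustration of these URL forms (the host name, user, and directory used here are hypothetical), a geo-replication session that mirrors a master volume to an SSH-accessed slave directory could be started as follows:

# gluster volume geo-replication Volume1 root@remote-host:/data/remote_dir start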
This section introduces Geo-replication, illustrates the various deployment scenarios, and explains how to configure the system to provide replication and mirroring in your environment.
Section 8.2.2, Geo-replication Deployment Overview
Section 8.2.3, Checking Geo-replication Minimum Requirements
Section 8.2.4, Setting Up the Environment for Geo-replication
Section 8.2.5, Setting Up the Environment for a Secure Geo-replication Slave
Geo-replication over WAN You can configure Geo-replication to replicate data over a Wide Area Network.
Geo-replication over Internet You can configure Geo-replication to mirror data over the Internet.
Multi-site cascading Geo-replication You can configure Geo-replication to mirror data in a cascading fashion across multiple sites.
Python - Python 2.4 (with ctypes external module), or Python 2.5 (or higher)
OpenSSH - version 4.0 (or higher)
rsync - 3.0.0 or higher
GlusterFS - supported versions
If you host multiple slave volumes, you can repeat step 2. for each of the slave volumes and add the following options to the volfile:
option mountbroker-geo-replication.geoaccount2 slavevol2 option mountbroker-geo-replication.geoaccount3 slavevol3
6. Set up Master to access Slave as geoaccount@Slave. You can add multiple slave volumes within the same account (geoaccount) by providing a comma-separated list of slave volumes (without spaces) as the argument of mountbroker-geo-replication.geogroup. You can also have multiple options of the form mountbroker-geo-replication.*. It is recommended to use one service account per Master machine. For example, if there are multiple slave volumes on Slave for the master machines Master1, Master2, and Master3, then create a dedicated service user on Slave for them by repeating Step 2 for each
of them (like geogroup1, geogroup2, and geogroup3), and then add the following corresponding options to the volfile:
option mountbroker-geo-replication.geoaccount1 slavevol11,slavevol12,slavevol13
option mountbroker-geo-replication.geoaccount2 slavevol21,slavevol22
option mountbroker-geo-replication.geoaccount3 slavevol31
Now set up Master1 to ssh to geoaccount1@Slave, etc. You must restart glusterd to make the configuration changes effective.
Note
You may need to configure the Geo-replication service before starting it. For more information, see Section 8.3.4, Configuring Geo-replication.
Volume1 [email protected]:/data/remote_dir Starting....
Display information of a particular master slave session using the following command: # gluster volume geo-replication MASTER SLAVE status For example, to display information of Volume1 and example.com:/data/remote_dir # gluster volume geo-replication Volume1 example.com:/data/remote_dir status The status of the geo-replication between Volume1 and example.com:/data/remote_dir is displayed. Display information of all geo-replication sessions belonging to a master # gluster volume geo-replication MASTER status For example, to display information of Volume1
# gluster volume geo-replication Volume1 status
MASTER     SLAVE                                                  STATUS
______     ______________________________                         ____________
Volume1    ssh://example.com:gluster://127.0.0.1:remove_volume    OK
Volume1    ssh://example.com:file:///data/remote_dir              OK
The status of a session could be one of the following four: Starting: This is the initial phase of the Geo-replication session; it remains in this state for a minute to make sure no abnormalities are present. OK: The geo-replication session is in a stable state. Faulty: The geo-replication session has witnessed some abnormality and the situation has to be investigated further. For further information, see Chapter 14, Troubleshooting GlusterFS. Corrupt: The monitor thread which is monitoring the geo-replication session has died. This situation should not normally occur; if it persists, contact Red Hat Support (www.redhat.com/support/).
See Chapter 15, Command Reference for more information about the gluster command.
The data is syncing from master volume (Volume1) to slave directory (example.com:/data/remote_dir). To view the status of this geo-replication session run the following command on Master:
# gluster volume geo-replication Volume1 [email protected]:/data/remote_dir status MASTER ______ Volume1 SLAVE ______________________________ [email protected]:/data/remote_dir STATUS ____________ OK
Before Failure
Assume that the master volume had 100 files and was mounted at /mnt/gluster on one of the client machines (client). Run the following command on the client machine to view the list of files:
client# ls /mnt/gluster | wc -l
100
The slave directory (example.com:/data/remote_dir) will have the same data as the master volume, which can be verified by running the following command on the slave:
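A minimal check on the slave (assuming the slave directory is /data/remote_dir, as in this example):

example.com# ls /data/remote_dir | wc -l
100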
After Failure
If one of the bricks (machine2) fails, the status of the geo-replication session changes from "OK" to "Faulty". To view the status of this geo-replication session, run the following command on the master:
# gluster volume geo-replication Volume1 [email protected]:/data/remote_dir status MASTER ______ Volume1 SLAVE ______________________________ [email protected]:/data/remote_dir STATUS ____________ Faulty
Machine2 has failed, and you can now see a discrepancy in the number of files between the master and the slave. A few files will be missing from the master volume but will still be available on the slave, as shown below. Run the following command on the client:
client# ls /mnt/gluster | wc -l
52
To restore data from the slave machine 1. Stop all Master's geo-replication sessions using the following command: # gluster volume geo-replication MASTER SLAVE stop For example:
machine1# gluster volume geo-replication Volume1 example.com:/data/remote_dir stop Stopping geo-replication session between Volume1 & example.com:/data/remote_dir has been successful
Note
Repeat # gluster volume geo-replication MASTER SLAVE stop command on all active geo-replication sessions of master volume.
2. Replace the faulty brick in the master by using the following command: # gluster volume replace-brick VOLNAME BRICK NEW-BRICK start For example:
machine1# gluster volume replace-brick Volume1 machine2:/export/dir16 machine3:/export/dir16 start
Replace-brick started successfully
3. Commit the migration of data using the following command: # gluster volume replace-brick VOLNAME BRICK NEW-BRICK commit force For example:
machine1# gluster volume replace-brick Volume1 machine2:/export/dir16 machine3:/export/dir16 commit force
Replace-brick commit successful
4. Verify the migration of brick by viewing the volume info using the following command: # gluster volume info VOLNAME For example:
machine1# gluster volume info Volume Name: Volume1 Type: Distribute Status: Started Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: machine1:/export/dir16 Brick2: machine3:/export/dir16 Options Reconfigured: geo-replication.indexing: on
5. Run the rsync command manually to sync data from the slave to the master volume's client (mount point). For example:
example.com# rsync -PavhS --xattrs --ignore-existing /data/remote_dir/ client:/mnt/gluster
Verify that the data is synced. On the master volume's client, run the following command:
client# ls | wc -l
100
Now the master volume and the slave directory are in sync. 6. Restart the geo-replication session from master to slave using the following command: # gluster volume geo-replication MASTER SLAVE start
Chapter 9.
Managing Directory Quota
Note
For now, only the hard limit is supported: the limit cannot be exceeded, and attempts to use more disk space beyond the set limit are denied. System administrators can also monitor resource utilization to limit the storage for the users depending on their role in the organization. You can set the quota at the following levels:
Directory level - limits the usage at the directory level
Volume level - limits the usage at the volume level
Note
You can set the disk limit on a directory even if it has not yet been created. The disk limit is enforced immediately after creating that directory. For more information on setting disk limits, see Section 9.3, Setting or Replacing Disk Limit.
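As a sketch of the workflow described above (the volume name, directory, and limit value are illustrative, and the confirmation messages may differ), quota is first enabled on the volume and a directory-level limit is then set:

# gluster volume quota test-volume enable
# gluster volume quota test-volume limit-usage /Test/data 10GB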
Disable the quota using the following command: # gluster volume quota VOLNAME disable For example, to disable quota on test-volume:
# gluster volume quota test-volume disable Quota translator is disabled on /test-volume
Note
In a multi-level directory hierarchy, the smallest disk limit in the entire hierarchy is considered for enforcement.
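For example (hypothetical limits), if a 20GB limit is set on /Test and a 10GB limit is set on /Test/data, usage under /Test/data is capped at 10GB, the smaller of the two:

# gluster volume quota test-volume limit-usage /Test 20GB
# gluster volume quota test-volume limit-usage /Test/data 10GB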
Note that the directory listed here is not an absolute directory name, but a path relative to the volume's root ('/'). For example, if 'test-volume' is mounted on '/mnt/glusterfs', then for the above example, '/Test/data' means '/mnt/glusterfs/Test/data'. Display disk limit information on a particular directory on which a limit is set, using the following command: # gluster volume quota VOLNAME list /directory-name For example, to see the set limit on the /data directory of test-volume:
# gluster volume quota test-volume list /data
Path          Limit Set    Size
/Test/data    10 GB        6 GB
Chapter 10.
Monitoring your GlusterFS Workload
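Profiling must first be turned on for the volume with the profile start subcommand (listed in Chapter 15, Command Reference) before the I/O statistics shown in this chapter become available; a minimal sketch for test-volume:

# gluster volume profile test-volume start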
When profiling on the volume is started, the following additional options are displayed in the Volume Info:
diagnostics.count-fop-hits: on diagnostics.latency-measurement: on
Display the I/O information of each brick using the following command: # gluster volume profile VOLNAME info For example, to see the I/O information on test-volume:
# gluster volume profile test-volume info
Brick: Test:/export/2
Cumulative Stats:

Block Size   Read   Write
1b+          0      908
32b+         0      28
64b+         0      8
128b+        0      5
256b+        6      23
512b+        4      16
1024b+       0      15
2048b+       52     120
4096b+       17     846
8192b+       52     234
16384b+      8      134
32768b+      34     286
65536b+      118    1341
131072b+     622    594

%-latency   Avg-latency   Min-Latency   Max-Latency    calls   Fop
---------   -----------   -----------   -----------    -----   --------
4.82        1132.28       21.00         800970.00      4575    WRITE
5.70        156.47        9.00          665085.00      39163   READDIRP
11.35       315.02        9.00          1433947.00     38698   LOOKUP
11.88       1729.34       21.00         2569638.00     7382    FXATTROP
47.35       104235.02     2485.00       7789367.00     488     FSYNC
-----------------------------------
Duration     : 335
BytesRead    : 94505058
BytesWritten : 195571980
Stop profiling using the following command: # gluster volume profile VOLNAME stop For example, to stop profiling of test-volume:
# gluster volume profile test-volume stop
Profiling stopped on test-volume
Sample output of the volume top commands lists, for each of the most active files, the corresponding count (values such as 11, 10, and 54 in this example) together with the file name, for example /clients/client0/~dmtmp/PARADOX/STUDENTS.DB, /clients/client0/~dmtmp/PWRPNT/TIPS.PPT, and /clients/client8/~dmtmp/SEED/LARGE.FIL.
Opendir count   directory name
1001            /clients/client0/~dmtmp
454             /clients/client8/~dmtmp
454             /clients/client2/~dmtmp
454             /clients/client6/~dmtmp
454             /clients/client5/~dmtmp
454             /clients/client9/~dmtmp
443             /clients/client0/~dmtmp/PARADOX
408             /clients/client1/~dmtmp
408             /clients/client7/~dmtmp
402             /clients/client4/~dmtmp
Sample throughput output lists, for each file, the throughput in MBps (values such as 2383.00 and 2184.00 in this example) together with the time of the corresponding operation, for files such as /clients/client0/~dmtmp/PWRPNT/TRIDOTS.POT, /clients/client0/~dmtmp/SEED/LARGE.FIL, and /clients/client5/~dmtmp/WORD/BASEMACH.DOC.
This command will initiate a dd for the specified count and block size and measure the corresponding throughput. View the list of read performance on each brick using the following command: # gluster volume top VOLNAME read-perf [bs blk-size count count] [brick BRICK-NAME] [list-cnt cnt] For example, to view the read performance on brick server:/export/ of test-volume, with a 256-byte block size, a count of 1, and a list count of 10: # gluster volume top test-volume read-perf bs 256 count 1 brick server:/export/ list-cnt 10
Brick: server:/export/dir1
256 bytes (256 B) copied, Throughput: 4.1 MB/s
==========Read throughput file stats========
read throughput(MBps)    filename    Time
Each row lists the read throughput in MBps (values such as 2570.00 and 2184.00 in this example), the file name (for example, /clients/client0/~dmtmp/PARADOX/COURSES.X04 or /clients/client5/~dmtmp/WORD/BASEMACH.DOC), and the time of the read (for example, 2011-01-31 15:38:40.549919).
A similar listing is produced for write throughput, with values in this example ranging from 1008.00 down to 516.00 MBps.
Display information about all volumes using the following command: # gluster volume info all
# gluster volume info all Volume Name: test-volume Type: Distribute Status: Created Number of Bricks: 4 Bricks: Brick1: server1:/exp1 Brick2: server2:/exp2
The statedump files are created on the brick servers in the /tmp directory or in the directory set using the server.statedump-path volume option. The naming convention of the dump file is <brick-path>.<brick-pid>.dump. By default, the output of the statedump is stored in the /tmp/<brick-path>.<brick-pid>.dump file on that particular server. Change the directory of the statedump file using the following command: # gluster volume set VOLNAME server.statedump-path path
Displaying Volume Status For example, to change the location of the statedump file of test-volume:
# gluster volume set test-volume server.statedump-path /usr/local/var/log/glusterfs/dumps/ Set volume successful
You can view the changed path of the statedump file using the following command: # gluster volume info VOLNAME
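The statedump itself is triggered with the statedump command described in Chapter 15, Command Reference; a minimal sketch for test-volume (the dump is written to the configured statedump path):

# gluster volume statedump test-volume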
Display information about all volumes using the following command: # gluster volume status all
# gluster volume status all STATUS OF VOLUME: volume-test BRICK PORT ONLINE PID -------------------------------------------------------arch:/export/4 24010 Y 22455 STATUS OF VOLUME: test-volume BRICK PORT ONLINE PID --------------------------------------------------------
Display additional information about the bricks using the following command: # gluster volume status VOLNAME detail For example, to display additional information about the bricks of test-volume:
# gluster volume status test-volume detail
STATUS OF VOLUME: test-volume
-------------------------------------------
Brick                : arch:/export/1
Port                 : 24009
Online               : Y
Pid                  : 16977
File System          : rootfs
Device               : rootfs
Mount Options        : rw
Disk Space Free      : 13.8GB
Total Disk Space     : 46.5GB
Inode Size           : N/A
Inode Count          : N/A
Free Inodes          : N/A
Number of Bricks: 1
Bricks:
Brick: server:/brick6
Display the list of clients accessing the volumes using the following command: # gluster volume status VOLNAME clients For example, to display the list of clients connected to test-volume:
# gluster volume status test-volume clients Brick : arch:/export/1 Clients connected : 2 Hostname Bytes Read BytesWritten --------------------------127.0.0.1:1013 776 676 127.0.0.1:1012 50440 51200
Display the memory usage and memory pool details of the bricks using the following command: # gluster volume status VOLNAME mem For example, to display the memory usage and memory pool details of the bricks of test-volume:
Memory status for volume : test-volume ---------------------------------------------Brick : arch:/export/1 Mallinfo -------Arena : 434176 Ordblks : 2 Smblks : 0 Hblks : 12 Hblkhd : 40861696 Usmblks : 0
Mempool Stats
-------------
Name                                  HotCount   ColdCount   PaddedSizeof   AllocCount   MaxAlloc
----                                  --------   ---------   ------------   ----------   --------
test-volume-server:fd_t               0          16384       92             57           5
test-volume-server:dentry_t           59         965         84             59           59
test-volume-server:inode_t            60         964         148            60           60
test-volume-server:rpcsvc_request_t   0          525         6372           351          2
glusterfs:struct saved_frame          0          4096        124            2            2
glusterfs:struct rpc_req              0          4096        2236           2            2
glusterfs:rpcsvc_request_t            1          524         6372           2            1
glusterfs:call_stub_t                 0          1024        1220           288          1
glusterfs:call_stack_t                0          8192        2084           290          2
glusterfs:call_frame_t                0          16384       172            1728         6
Display the inode tables of the volume using the following command: # gluster volume status VOLNAME inode For example, to display the inode tables of the test-volume:
# gluster volume status test-volume inode
inode tables for volume test-volume
----------------------------------------------
Brick : arch:/export/1
Active inodes:
GFID                                    Lookups   Ref   IA type
----                                    -------   ---   -------
6f3fe173-e07a-4209-abb6-484091d75499    1         9     2
370d35d7-657e-44dc-bac4-d6dd800ec3d3    1         1     2
LRU inodes:
GFID                                    Lookups   Ref   IA type
----                                    -------   ---   -------
80f98abe-cdcf-4c1d-b917-ae564cf55763    1         0     1
3a58973d-d549-4ea6-9977-9aa218f233de    1         0     1
2ce0197d-87a9-451b-9094-9baa38121155    1         0     2
Display the open fd tables of the volume using the following command: # gluster volume status VOLNAME fd For example, to display the open fd tables of the test-volume:
# gluster volume status test-volume fd
FD tables for volume test-volume
----------------------------------------------
Brick : arch:/export/1
Connection 1:
RefCount = 0  MaxFDs = 128  FirstFree = 4
FD Entry    PID      RefCount    Flags
--------    ---      --------    -----
0           26311    1           2
1           26310    3           2
2           26310    1           2
3           26311    3           2
FirstFree = 0
FirstFree = 0
Display the pending calls of the volume using the following command: # gluster volume status VOLNAME callpool Each call has a call stack containing call frames. For example, to display the pending calls of test-volume:
# gluster volume status test-volume Pending calls for volume test-volume ---------------------------------------------Brick : arch:/export/1 Pending calls: 2 Call Stack1 UID : 0 GID : 0 PID : 26338 Unique : 192138 Frames : 7 Frame 1 Ref Count = 1 Translator = test-volume-server Completed = No Frame 2 Ref Count = 0 Translator = test-volume-posix Completed = No Parent = test-volume-access-control Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 3 Ref Count = 1 Translator = test-volume-access-control Completed = No Parent = repl-locks Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 4 Ref Count = 1 Translator = test-volume-locks Completed = No Parent = test-volume-io-threads Wind From = iot_fsync_wrapper Wind To = FIRST_CHILD (this)->fops->fsync Frame 5 Ref Count = 1 Translator = test-volume-io-threads Completed = No Parent = test-volume-marker Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 6 Ref Count = 1 Translator = test-volume-marker Completed = No Parent = /export/1
Chapter 11.
POSIX Access Control Lists
To set or modify Access ACLs
You can set or modify access ACLs using the following command:
# setfacl -m <entry_type> <file>
The ACL entry types are the POSIX ACLs representations of owner, group, and other. Permissions must be a combination of the characters r (read), w (write), and x (execute). You must specify the ACL entry in the following format and can specify multiple entry types separated by commas.
ACL Entry              Description
u:uid:<permission>     Sets the access ACLs for a user. You can specify the user name or UID.
g:gid:<permission>     Sets the access ACLs for a group. You can specify the group name or GID.
m:<permission>         Sets the effective rights mask. The mask is the combination of all access permissions of the owning group and all of the user and group entries.
o:<permission>         Sets the access ACLs for users other than the ones in the group for the file.
If a file or directory already has a POSIX ACL, and the setfacl command is used, the additional permissions are added to the existing POSIX ACL or the existing rule is modified. For example, to give read and write permissions to user antony: # setfacl -m u:antony:rw /mnt/gluster/data/testfile
Note
An access ACL set for an individual file can override the default ACL permissions.
Effects of a Default ACL
The following are the ways in which the permissions of a directory's default ACL are passed to the files and subdirectories in it:
A subdirectory inherits the default ACL of the parent directory both as its default ACL and as an access ACL.
A file inherits the default ACL as its access ACL.
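As an illustration (the directory and user match the examples below; the d: prefix marks a default ACL entry in standard setfacl syntax), a default ACL granting a user full access to newly created files and subdirectories could be set with:

# setfacl -m d:u:antony:rwx /mnt/gluster/data/doc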
View the default ACLs of a directory using the following command: # getfacl <directory_name> For example, to view the existing ACLs for /data/doc:
# getfacl /mnt/gluster/data/doc
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
default:user:antony:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
Chapter 12.
Managing Unified File and Object Storage
Chapter 12. Managing Unified File and Object Storage Proxy Server All REST requests to the UFO are routed through the Proxy Server. Objects and Containers An object is the basic storage entity and any optional metadata that represents the data you store. When you upload data, the data is stored as-is (with no compression or encryption). A container is a storage compartment for your data and provides a way for you to organize your data. Containers can be visualized as directories in a Linux system. Data must be stored in a container and hence objects are created within a container. It implements objects as files and directories under the container. The object name is a '/' separated path and UFO maps it to directories until the last name in the path, which is marked as a file. With this approach, objects can be accessed as files and directories from native GlusterFS (FUSE) or NFS mounts by providing the '/' separated path. Accounts and Account Servers The OpenStack Object Storage system is designed to be used by many different storage consumers. Each user is associated with one or more accounts and must identify themselves using an authentication system. While authenticating, users must provide the name of the account for which the authentication is requested. UFO implements accounts as GlusterFS volumes. So, when a user is granted read/write permission on an account, it means that that user has access to all the data available on that GlusterFS volume. Authentication and Access Permissions You must authenticate against an authentication service to receive OpenStack Object Storage connection parameters and an authentication token. The token must be passed in for all subsequent container or object operations. One authentication service that you can use as a middleware example is called tempauth. By default, each user has their own storage account and has full access to that account. Users must authenticate with their credentials as described above, but once authenticated they can manage containers and objects within that account. If a user wants to access the content from another account, they must have API access key or a session token provided by their authentication system.
12.3.1. Pre-requisites
GlusterFS's Unified File and Object Storage needs user_xattr support from the underlying disk file system. Use the following command to enable user_xattr on the GlusterFS brick backend: # mount -o remount,user_xattr <device name> For example: # mount -o remount,user_xattr /dev/hda1
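To keep the option across reboots, the same user_xattr option can be added to the file system entry in /etc/fstab; a sketch (the device, mount point, and file system type are illustrative):

/dev/hda1    /export/brick1    ext3    defaults,user_xattr    0 0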
12.3.2. Dependencies
The following packages are installed on GlusterFS when you install Unified File and Object Storage: curl memcached openssl xfsprogs python2.6 pyxattr python-configobj python-setuptools python-simplejson python-webob python-eventlet python-greenlet python-pastedeploy python-netifaces
1. Download the rhel_install.sh install script from http://download.gluster.com/pub/gluster/glusterfs/3.2/UFO/. 2. Run the rhel_install.sh script using the following command: # sh rhel_install.sh 3. Download the swift-1.4.5-1.noarch.rpm and swift-plugin-1.0.-1.el6.noarch.rpm files from http://download.gluster.com/pub/gluster/glusterfs/3.2/UFO/. 4. Install swift-1.4.5-1.noarch.rpm and swift-plugin-1.0.-1.el6.noarch.rpm using the following commands: # rpm -ivh swift-1.4.5-1.noarch.rpm # rpm -ivh swift-plugin-1.0.-1.el6.noarch.rpm
Note
You must repeat the above steps on all the machines on which you want to install Unified File and Object Storage. If you install the Unified File and Object Storage on multiple servers, you can use a load balancer like pound, nginx, and so on to distribute the request across the machines.
Note
During installation, the installation script adds a few sample users to the proxy-server.conf file. It is highly recommended that you remove all the default sample user entries from the configuration file. For more information on setting ACLs, see Section 12.5.3.6, Setting ACLs on Container.
By default, GlusterFS's Unified File and Object Storage is configured to support HTTP protocol and uses temporary authentication to authenticate the HTTP requests.
2. Add the following lines to /etc/swift/proxy-server.conf under [DEFAULT]:
bind_port = 443 cert_file = /etc/swift/cert.crt key_file = /etc/swift/cert.key
The following are the configurable options:
Table 12.1. proxy-server.conf Default Options in the [DEFAULT] section
Option      Default       Description
bind_ip     0.0.0.0       IP Address for server to bind
bind_port   80            Port for server to bind
swift_dir   /etc/swift    Swift configuration directory
workers     1             Number of workers to fork
user        swift         swift user
cert_file                 Path to the ssl .crt
key_file                  Path to the ssl .key
Table 12.2. proxy-server.conf Server Options in the [proxy-server] section
Option                         Default        Description
use                                           paste.deploy entry point for the proxy server. For most cases, this should be egg:swift#proxy.
log_name                       proxy-server   Label used when logging
log_facility                   LOG_LOCAL0     Syslog log facility
log_level                      INFO           Log level
log_headers                    True           If True, log headers in each request
recheck_account_existence      60             Cache timeout in seconds to send memcached for account existence
recheck_container_existence    60             Cache timeout in seconds to send memcached for container existence
object_chunk_size                             Chunk size to read from object servers
client_chunk_size                             Chunk size to read from clients
memcache_servers                              Comma separated list of memcached servers ip:port
node_timeout                                  Request timeout to external services
client_timeout                 60             Timeout to read one chunk from a client
conn_timeout                   0.5            Connection timeout to external services
error_suppression_interval     60             Time in seconds that must elapse since the last error for a node to be considered no longer error limited
error_suppression_limit        10             Error count to consider a node error limited
allow_account_management       false          Whether account PUTs and DELETEs are even callable
The following are the configurable options:
Table 12.3. object-server.conf Default Options in the [DEFAULT] section
Option        Default       Description
swift_dir     /etc/swift    Swift configuration directory
devices       /srv/node     Mount parent directory where devices are mounted
mount_check   true          Whether or not check if the devices are mounted to prevent accidentally writing to the root device
bind_ip       0.0.0.0       IP Address for server to bind
bind_port     6000          Port for server to bind
workers       1             Number of workers to fork

Table 12.4. object-server.conf Server Options in the [object-server] section
Option               Default         Description
use                                  paste.deploy entry point for the object server. For most cases, this should be egg:swift#object.
log_name             object-server   log name used when logging
log_facility         LOG_LOCAL0      Syslog log facility
log_level            INFO            Logging level
log_requests         True            Whether or not to log each request
user                 swift           swift user
node_timeout         3               Request timeout to external services
conn_timeout         0.5             Connection timeout to external services
network_chunk_size   65536           Size of chunks to read or write over the network
disk_chunk_size      65536           Size of chunks to read or write to disk
max_upload_time      65536           Maximum time allowed to upload an object
slow                 0               If > 0, Minimum time in seconds for a PUT or DELETE request to complete
The following are the configurable options:
Table 12.5. container-server.conf Default Options in the [DEFAULT] section
Option        Default       Description
swift_dir     /etc/swift    Swift configuration directory
devices       /srv/node     Mount parent directory where devices are mounted
mount_check   true          Whether or not check if the devices are mounted to prevent accidentally writing to the root device
bind_ip                     IP Address for server to bind
bind_port                   Port for server to bind
workers                     Number of workers to fork
user                        Swift user

Table 12.6. container-server.conf Server Options in the [container-server] section
Option         Default            Description
use                               paste.deploy entry point for the container server. For most cases, this should be egg:swift#container.
log_name       container-server   Label used when logging
log_facility   LOG_LOCAL0         Syslog log facility
log_level      INFO               Logging level
node_timeout   3                  Request timeout to external services
conn_timeout   0.5                Connection timeout to external services
The following are the configurable options:
Table 12.7. account-server.conf Default Options in the [DEFAULT] section
Option        Default       Description
swift_dir     /etc/swift    Swift configuration directory
devices       /srv/node     mount parent directory where devices are mounted
mount_check   true          Whether or not check if the devices are mounted to prevent accidentally writing to the root device
bind_ip                     IP Address for server to bind
bind_port                   Port for server to bind
workers                     Number of workers to fork
user                        Swift user
Table 12.8. account-server.conf Server Options in the [account-server] section
Option         Default          Description
use                             paste.deploy entry point for the account server. For most cases, this should be egg:swift#account.
log_name       account-server   Label used when logging
log_facility   LOG_LOCAL0       Syslog log facility
log_level      INFO             Logging level
To stop the server, enter the following command: # swift-init main stop
For example,
GET /auth/v1.0 HTTP/1.1
Host: auth.example.com
X-Auth-User: test:tester
X-Auth-Key: testing

HTTP/1.1 200 OK
X-Storage-Url: https://example.storage.com:443/v1/AUTH_test
X-Storage-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554
X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554
Content-Length: 0
Date: Wed, 10 Jul 2011 06:11:51 GMT
To authenticate access using cURL (for the above example), run the following command:
curl -v -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass:testing' -k https://auth.example.com:443/auth/v1.0
The X-Storage-Url has to be parsed and used in the connection and request line of all subsequent requests to the server. In the example output, users connecting to the server will send most container/object requests with a host header of example.storage.com and the request line's version and account as v1/AUTH_test.
Note
The authentication tokens are valid for a 24 hour period.
Parameter   Description
limit=n     Limits the number of results to at most n value.
marker      Returns object names greater in value than the specified marker.
format      Specify either json or xml to return the respective serialized response.
For example,
GET /v1/AUTH_test HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 200 Ok Date: Wed, 13 Jul 2011 16:32:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8 Content-Length: 39 songs movies documents reports
To display container information using cURL (for the above example), run the following command:
curl -v -X GET -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test -k
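To apply the query parameters described above to the same listing (the parameter names limit and format are the standard OpenStack Object Storage names, and the values are illustrative), the response can be limited to ten results and serialized as JSON:

curl -v -X GET -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' 'https://example.storage.com:443/v1/AUTH_test?limit=10&format=json' -k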
To display account metadata information using cURL (for the above example), run the following command:
curl -v -X HEAD -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test -k
For example,
PUT /v1/AUTH_test/pictures/ HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 201 Created Date: Wed, 13 Jul 2011 17:32:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8
To create container using cURL (for the above example), run the following command:
curl -v -X PUT -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/pictures -k
The status code of 201 (Created) indicates that you have successfully created the container. If a container with the same name already exists, the status code 202 (Accepted) is displayed.
To display objects of a container List objects of a specific container using the following command:
GET /<apiversion>/<account>/<container>[parm=value] HTTP/1.1 Host: <storage URL> X-Auth-Token: <authentication-token-key>
For example,
GET /v1/AUTH_test/images HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 200 Ok Date: Wed, 13 Jul 2011 15:42:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8 Content-Length: 139 sample file.jpg test-file.pdf You and Me.pdf Puddle of Mudd.mp3 Test Reports.doc
To display objects of a container using cURL (for the above example), run the following command:
curl -v -X GET -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images -k
For example,
HEAD /v1/AUTH_test/images HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 204 No Content Date: Wed, 13 Jul 2011 19:52:21 GMT Server: Apache X-Account-Object-Count: 8 X-Container-Bytes-Used: 472
To display list of objects and storage used in a container using cURL (for the above example), run the following command:
curl -v -X HEAD -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images -k
For example,
DELETE /v1/AUTH_test/pictures HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 204 No Content Date: Wed, 13 Jul 2011 17:52:21 GMT Server: Apache Content-Length: 0 Content-Type: text/plain; charset=UTF-8
To delete a container using cURL (for the above example), run the following command:
curl -v -X DELETE -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/pictures -k
The status code of 204 (No Content) indicates that you have successfully deleted the container. If that container does not exist, the status code 404 (Not Found) is displayed, and if the container is not empty, the status code 409 (Conflict) is displayed.
For example,
POST /v1/AUTH_test/images HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 X-Container-Meta-Zoo: Lion X-Container-Meta-Home: Dog HTTP/1.1 204 No Content Date: Wed, 13 Jul 2011 20:52:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8
To update the metadata of the object using cURL (for the above example), run the following command:
curl -v -X POST -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images -H ' X-Container-Meta-Zoo: Lion' -H 'X-Container-Meta-Home: Dog' -k
The status code of 204 (No Content) indicates that the container's metadata was updated successfully. If that container does not exist, the status code 404 (Not Found) is displayed.
By default, allowing read access via .r will not allow listing objects in the container but allows retrieving objects from the container. To turn on listings, use the .rlistings directive. Also, .r designations are not allowed in headers whose names include the word write. For example, to set all the objects' access rights to "public" inside the container using cURL (for the above example), run the following command:
curl -v -X POST -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images -H 'X-Container-Read: .r:*' -k
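To additionally allow listing the objects in the container, the .rlistings directive described above can be combined with .r:* in the same header; a sketch for the same container:

curl -v -X POST -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images -H 'X-Container-Read: .r:*,.rlistings' -k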
For example,
PUT /v1/AUTH_test/pictures/dog HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 ETag: da1e100dc9e7becc810986e37875ae38 HTTP/1.1 201 Created
To create or update an object using cURL (for the above example), run the following command:
curl -v -X PUT -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/pictures/dog -H 'Content-Length: 0' -k
The status code of 201 (Created) indicates that you have successfully created or updated the object. If there is a missing content-Length or Content-Type header in the request, the status code of 412 (Length Required) is displayed. (Optionally) If the MD5 checksum of the data written to the storage system does not match the ETag value, the status code of 422 (Unprocessable Entity) is displayed.
For example,
PUT /v1/AUTH_test/pictures/cat HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 Transfer-Encoding: chunked X-Object-Meta-PIN: 2343 19 A bunch of data broken up D into chunks. 0
COPY /<apiversion>/<account>/<container>/<sourceobject> HTTP/1.1 Host: <storage URL> X-Auth-Token: < authentication-token-key> Destination: /<container>/<destinationobject>
For example,
COPY /v1/AUTH_test/images/dogs HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 Destination: /photos/cats HTTP/1.1 201 Created Date: Wed, 13 Jul 2011 18:32:21 GMT Server: Apache Content-Length: 0 Content-Type: text/plain; charset=UTF-8
To copy an object using cURL (for the above example), run the following command:
curl -v -X COPY -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' -H 'Destination: /photos/cats' -k https:// example.storage.com:443/v1/AUTH_test/images/dogs
The status code of 201 (Created) indicates that you have successfully copied the object. If there is a missing Content-Length or Content-Type header in the request, the status code of 412 (Length Required) is displayed. You can also use the PUT command to copy an object by using the additional header X-Copy-From: container/obj. To use the PUT command to copy an object, run the following command:
PUT /v1/AUTH_test/photos/cats HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 X-Copy-From: /images/dogs HTTP/1.1 201 Created Date: Wed, 13 Jul 2011 18:32:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8
To copy an object using cURL (for the above example), run the following command:
curl -v -X PUT -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' -H 'X-Copy-From: /images/dogs' -k https://example.storage.com:443/v1/AUTH_test/photos/cats
The status code of 201 (Created) indicates that you have successfully copied the object.
For example,
GET /v1/AUTH_test/images/cat HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 200 Ok Date: Wed, 13 Jul 2011 23:52:21 GMT Server: Apache Last-Modified: Thu, 14 Jul 2011 13:40:18 GMT ETag: 8a964ee2a5e88be344f36c22562a6486 Content-Length: 534210 [.........]
To display the content of an object using cURL (for the above example), run the following command:
curl -v -X GET -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images/cat -k
The status code of 200 (Ok) indicates that the object's data is displayed successfully. If that object does not exist, the status code 404 (Not Found) is displayed.
For example,
HEAD /v1/AUTH_test/images/cat HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 204 No Content Date: Wed, 13 Jul 2011 21:52:21 GMT Server: Apache Last-Modified: Thu, 14 Jul 2011 13:40:18 GMT ETag: 8a964ee2a5e88be344f36c22562a6486 Content-Length: 512000 Content-Type: text/plain; charset=UTF-8 X-Object-Meta-House: Cat X-Object-Meta-Zoo: Cat X-Object-Meta-Home: Cat X-Object-Meta-Park: Cat
To display the metadata of the object using cURL (for the above example), run the following command:
curl -v -X HEAD -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images/cat -k
The status code of 204 (No Content) indicates that the object's metadata is displayed successfully. If that object does not exist, the status code 404 (Not Found) is displayed.
For example,
POST /v1/AUTH_test/images/cat HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 X-Object-Meta-Zoo: Lion X-Object-Meta-Home: Dog HTTP/1.1 202 Accepted Date: Wed, 13 Jul 2011 22:52:21 GMT Server: Apache Content-Length: 0 Content-Type: text/plain; charset=UTF-8
To update the metadata of an object using cURL (for the above example), run the following command:
curl -v -X POST -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/images/cat -H 'X-Object-Meta-Zoo: Lion' -H 'X-Object-Meta-Home: Dog' -k
The status code of 202 (Accepted) indicates that you have successfully updated the object's metadata. If that object does not exist, the status code 404 (Not Found) is displayed.
For example,
DELETE /v1/AUTH_test/pictures/cat HTTP/1.1 Host: example.storage.com X-Auth-Token: AUTH_tkd3ad38b087b49bbbac0494f7600a554 HTTP/1.1 204 No Content Date: Wed, 13 Jul 2011 20:52:21 GMT Server: Apache Content-Type: text/plain; charset=UTF-8
To delete an object using cURL (for the above example), run the following command:
curl -v -X DELETE -H 'X-Auth-Token: AUTH_tkde3ad38b087b49bbbac0494f7600a554' https://example.storage.com:443/v1/AUTH_test/pictures/cat -k
The status code of 204 (No Content) indicates that you have successfully deleted the object. If that object does not exist, the status code 404 (Not Found) is displayed.
Chapter 13.
Managing Hadoop Compatible Storage
13.2. Advantages
The following are the advantages of Hadoop Compatible Storage with GlusterFS: Provides simultaneous file-based and object-based access within Hadoop. Eliminates the centralized metadata server. Provides compatibility with MapReduce applications and rewrite is not required. Provides a fault tolerant file system.
13.3.1. Pre-requisites
The following are the pre-requisites to install Hadoop Compatible Storage: Hadoop 0.20.2 is installed, configured, and is running on all the machines in the cluster. Java Runtime Environment
Maven (mandatory only if you are building the plugin from the source) JDK (mandatory only if you are building the plugin from the source) getfattr - command line utility
The following are the configurable fields:
Property Name     Default Value               Description
fs.default.name   glusterfs://fedora1:9000    Any hostname in the cluster as the server and any port number.
                                              GlusterFS volume to mount.
                                              The directory used to fuse mount the volume.
                                              Any hostname or IP address on the cluster except the client/master.
quick.slave.io    Off                         Performance tunable option. If this option is set to On, the plugin will try to perform I/O directly from the disk file system (like ext3 or ext4) the file resides on. Hence read performance will improve and jobs will run faster.
Note
This option is not tested widely
5. Create a soft link in Hadoop's library and configuration directory for the downloaded files (in Step 3) using the following commands: # ln -s <target location> <source location> For example: # ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib/glusterfs-0.20.2-0.1.jar # ln -s /usr/local/lib/conf/core-site.xml $HADOOP_HOME/conf/core-site.xml 6. (Optional) You can run the following command on the Hadoop master to build the plugin and deploy it along with the core-site.xml file, instead of repeating the above steps: # build-deploy-jar.py -d $HADOOP_HOME -c
Note
You must start Hadoop MapReduce daemon on all servers.
Chapter 14.
Troubleshooting GlusterFS
This section describes how to manage GlusterFS logs and most common troubleshooting scenarios related to GlusterFS.
Note
When a log file is rotated, the contents of the current log file are moved to log-file-name.epoch-time-stamp.
To get the Master-log-file for geo-replication, use the following command: gluster volume geo-replication MASTER SLAVE config log-file For example: # gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file Slave Log File To get the log file for Geo-replication on the slave (glusterd must be running on the slave machine), use the following commands: 1. On the master, run the following command: # gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner 5f6e5200-756f-11e0-a1f0-0800200c9a66 This displays the session owner details. 2. On the slave, run the following command: # gluster volume geo-replication /data/remote_dir config log-file /var/log/gluster/${session-owner}:remote-mirror.log 3. Substitute the session owner details (output of Step 1) into the output of Step 2 to get the location of the log file: /var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
Rotate log file for all sessions for a master volume using the following command: # gluster volume geo-replication master log-rotate For example, to rotate the log file of master Volume1:
# gluster volume geo-replication Volume1 log-rotate
log rotate successful
Rotate the log file for all sessions using the following command: # gluster volume geo-replication log-rotate For example, to rotate the log file for all sessions:
# gluster volume geo-replication log-rotate
log rotate successful
If GlusterFS 3.3 or higher is not installed in the default location (on the master) and has been prefixed to be installed in a custom location, configure the gluster-command option for it to point to the exact location. If GlusterFS 3.3 or higher is not installed in the default location (on the slave) and has been prefixed to be installed in a custom location, configure the remote-gsyncd-command option for it to point to the exact place where geo-replication is located.
14.3.1. setfacl command fails with setfacl: <file or directory name>: Operation not supported error
You may face this error when the backend file system on one of the servers is not mounted with the "-o acl" option. This can be confirmed by viewing the following error message in the log file of the server: "Posix access control list is not supported". Solution: Remount the backend file system with the "-o acl" option. For more information, see Section 11.1.1, Activating POSIX ACLs Support on Server.
14.5.1. mount command on NFS client fails with RPC Error: Program not registered
Start portmap or rpcbind service on the machine where NFS server is running.
This error is encountered when the server has not started correctly. On most Linux distributions this is fixed by starting portmap: $ /etc/init.d/portmap start On some distributions where portmap has been replaced by rpcbind, the following command is required: $ /etc/init.d/rpcbind start After starting portmap or rpcbind, the gluster NFS server needs to be restarted.
14.5.2. NFS server start-up fails with "Port is already in use" error in the log file
Another Gluster NFS server is running on the same machine. This error can arise in case there is already a Gluster NFS server running on the same machine. This situation can be confirmed from the log file, if the following error lines exist:
[2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed:Address already in use [2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use [2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection [2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed [2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 [2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed [2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols
To resolve this error one of the Gluster NFS servers will have to be shutdown. At this time, Gluster NFS server does not support running multiple NFS servers on the same machine.
$ /etc/init.d/portmap start On some distributions where portmap has been replaced by rpcbind, the following command is required: $ /etc/init.d/rpcbind start
14.5.5. NFS server, glusterfsd starts but initialization fails with nfsrpc-service: portmap registration of program failed error message in the log.
NFS start-up can succeed but the initialization of the NFS service can still fail preventing clients from accessing the mount points. Such a situation can be confirmed from the following error messages in the log file:
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could notregister with portmap [2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed [2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 [2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed [2010-05-26 23:33:47] C [nfs.c:531:notify] nfs: Failed to initialize protocols [2010-05-26 23:33:49] E [rpcsvc.c:2614:rpcsvc_program_unregister_portmap] rpc-service: Could not unregister with portmap [2010-05-26 23:33:49] E [rpcsvc.c:2731:rpcsvc_program_unregister] rpc-service: portmap unregistration of program failed [2010-05-26 23:33:49] E [rpcsvc.c:2744:rpcsvc_program_unregister] rpc-service: Program unregistration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
1. Start the portmap or rpcbind service on the NFS server. On most Linux distributions, portmap can be started using the following command: $ /etc/init.d/portmap start On some distributions where portmap has been replaced by rpcbind, run the following command: $ /etc/init.d/rpcbind start After starting portmap or rpcbind, the gluster NFS server needs to be restarted.
2. Stop another NFS server running on the same machine. Such an error is also seen when there is another NFS server running on the same machine but it is not the Gluster NFS server. On Linux systems, this could be the kernel NFS server. Resolution involves stopping the other NFS server or not running the Gluster NFS server on the machine. Before stopping the kernel NFS server, ensure that no critical service depends on access to that NFS server's exports. On Linux, kernel NFS servers can be stopped by using either of the following commands depending on the distribution in use: $ /etc/init.d/nfs-kernel-server stop $ /etc/init.d/nfs stop
3. Restart the Gluster NFS server.
Note
Remember that disabling the NFS server forces authentication of clients to use only IP addresses, and if the authentication rules in the volume file use hostnames, those authentication rules will fail and disallow mounting for those clients. or 2. The NFS version used by the NFS client is other than version 3. Gluster NFS server supports version 3 of the NFS protocol. In recent Linux kernels, the default NFS version has been changed from 3 to 4. It is possible that the client machine is unable to connect to the Gluster NFS server because it is using version 4 messages which are not understood by the Gluster NFS server. The timeout can be resolved by forcing the NFS client to use version 3. The vers option to the mount command is used for this purpose: $ mount nfsserver:export -o vers=3 mount-point
14.5.8. Application fails with "Invalid argument" or "Value too large for defined data type" error.
These two errors generally happen for 32-bit NFS clients or applications that do not support 64-bit inode numbers or large files. Use the following option from the CLI to make the Gluster NFS server return 32-bit inode numbers instead: nfs.enable-ino32 <on|off>
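Since nfs.enable-ino32 is a volume option, it can be applied with the volume set command shown in earlier chapters; a minimal sketch for test-volume:

# gluster volume set test-volume nfs.enable-ino32 on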
Applications that will benefit are those that were either: built 32-bit and run on 32-bit machines, such that they do not support large files by default; or built 32-bit on 64-bit systems. This option is disabled by default, so the Gluster NFS server returns 64-bit inode numbers by default. Applications which can be rebuilt from source are recommended to be rebuilt using the following flag with gcc: -D_FILE_OFFSET_BITS=64
The statedump files are created on the brick servers in the /tmp directory or in the directory set using server.statedump-path volume option. The naming convention of the dump file is <brick-path>.<brick-pid>.dump. The following are the sample contents of the statedump file. It indicates that GlusterFS has entered into a state where there is an entry lock (entrylk) and an inode lock (inodelk). Ensure that those are stale locks and no resources own them before clearing.
[xlator.features.locks.vol-locks.inode] path=/ mandatory=0 entrylk-count=1 lock-dump.domain.domain=vol-replicate-0 xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0](ACTIVE)=type=ENTRYLK_WRLCK on basename=file1, pid = 714782904, owner=ffffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27 16:01:01 2012 conn.2.bound_xl./gfs/brick1.hashsize=14057 conn.2.bound_xl./gfs/brick1.name=/gfs/brick1/inode conn.2.bound_xl./gfs/brick1.lru_limit=16384 conn.2.bound_xl./gfs/brick1.active_size=2
2. Clear the lock using the following command: # gluster volume clear-locks VOLNAME path kind granted entry basename For example, to clear the entry lock on file1 of test-volume:
# gluster volume clear-locks test-volume / kind granted entry file1 Volume clear-locks successful vol-locks: entry blocked locks=0 granted locks=1
3. Clear the inode lock using the following command: # gluster volume clear-locks VOLNAME path kind granted inode range For example, to clear the inode lock on file1 of test-volume:
# gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0 Volume clear-locks successful vol-locks: inode blocked locks=0 granted locks=1
You can perform statedump on test-volume again to verify that the above inode and entry locks are cleared.
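A minimal verification sketch, using the volume statedump command from the command reference and a hypothetical dump file name that follows the naming convention described above:
# gluster volume statedump test-volume
# grep -E "entrylk-count|inodelk-count" /tmp/<brick-path>.<brick-pid>.dump    # the cleared locks should no longer be reported as granted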
Chapter 15.
Command Reference
This section describes the available commands and includes the following sections:
gluster Command
Gluster Console Manager (command line interpreter)
glusterd Daemon
Gluster elastic volume management daemon
volume remove-brick VOLNAME [replica N] BRICK1 ... [start | stop | status | commit | force]
Removes the specified brick(s) from the specified volume. The 'remove-brick' command can be used to reduce the replica count of the volume when the 'replica N' option is given. To ensure data migration from the removed brick to the existing bricks, give the 'start' sub-command at the end of the command. After the 'status' command reports that the remove-brick operation is complete, the user can 'commit' the changes to the volume file. Using 'remove-brick' without the 'start' option works like the 'force' command: it makes the changes to the volume configuration without migrating the data.

Rebalance
volume rebalance VOLNAME start
Starts rebalancing of the data on the specified volume.
volume rebalance VOLNAME stop
Stops rebalancing of the specified volume.
volume rebalance VOLNAME status
Displays the rebalance status of the specified volume.

Log
volume log rotate VOLNAME [BRICK]
Rotates the log file for the corresponding volume/brick.

Debugging
volume top VOLNAME {[open|read|write|opendir|readdir [nfs]] | [read-perf|write-perf [nfs|{bs COUNT count COUNT}]] | [clear [nfs]]} [BRICK] [list-cnt COUNT]
Shows the file operation details on each brick of the volume.
volume profile VOLNAME {start|info|stop} [nfs]
Shows the operation details on the volume depending on the arguments given.
volume status [all | VOLNAME] [nfs|shd|BRICK] [detail|clients|mem|inode|fd|callpool]
Shows details of activity and internal data of the processes (nfs/shd/BRICK) corresponding to the given argument. If no argument is given, this command outputs bare minimum details of the current status of the volume's bricks (including the PID of each brick process).
volume statedump VOLNAME [nfs] [all|mem|iobuf|callpool|priv|fd|inode|history]
Takes a statedump of the process, which captures most of its internal details.

Peer
peer probe HOSTNAME
Probes the specified peer.
peer detach HOSTNAME
Detaches the specified peer.
peer status
Displays the status of peers.
peer help
Displays help for the peer command.
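As an illustrative sketch of how these sub-commands are issued through the gluster CLI (the volume name, hostname, and brick path are hypothetical):
# gluster peer probe server2
# gluster volume remove-brick test-volume server2:/exp2 start
# gluster volume remove-brick test-volume server2:/exp2 status
# gluster volume remove-brick test-volume server2:/exp2 commit    # commit only after status reports the remove-brick operation is complete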
Geo-replication
volume geo-replication MASTER SLAVE start
Start geo-replication between the hosts specified by MASTER and SLAVE. You can specify a local master volume as :VOLNAME. You can specify a local slave volume as :VOLNAME and a local slave directory as /DIRECTORY/SUB-DIRECTORY. You can specify a remote slave volume as DOMAIN::VOLNAME and a remote slave directory as DOMAIN:/DIRECTORY/SUB-DIRECTORY.
volume geo-replication MASTER SLAVE stop
Stop geo-replication between the hosts specified by MASTER and SLAVE. You can specify a local master volume as :VOLNAME and a local master directory as /DIRECTORY/SUB-DIRECTORY. You can specify a local slave volume as :VOLNAME and a local slave directory as /DIRECTORY/SUB-DIRECTORY. You can specify a remote slave volume as DOMAIN::VOLNAME and a remote slave directory as DOMAIN:/DIRECTORY/SUB-DIRECTORY.
volume geo-replication MASTER SLAVE config [options]
Configure geo-replication options between the hosts specified by MASTER and SLAVE. The following configuration options are available:
gluster-command COMMAND
The path where the gluster command is installed.
gluster-log-level LOGFILELEVEL
The log level for gluster processes.
log-file LOGFILE
The path to the geo-replication log file.
log-level LOGFILELEVEL
The log level for geo-replication.
remote-gsyncd COMMAND
The path where the gsyncd binary is installed on the remote machine.
ssh-command COMMAND
The ssh command to use to connect to the remote machine (the default is ssh).
rsync-command COMMAND
The rsync command to use for synchronizing the files (the default is rsync).
volume_id=UID
The command to delete the existing master UID for the intermediate/slave node.
timeout SECONDS
The timeout period.
sync-jobs N
The number of simultaneous files/directories that can be synchronized.
ignore-deletes
If this option is set to 1, a file deleted on the master will not trigger a delete operation on the slave. Hence, the slave will remain as a superset of the master and can be used to recover the master in case of a crash and/or accidental delete.

Other
help
Display the command options.
quit
Exit the gluster command line interface.
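An illustrative sketch of starting geo-replication and changing one of the configuration options listed above (the master volume, slave host, and slave directory are hypothetical):
# gluster volume geo-replication Volume1 example.com:/data/remote_dir start
# gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-level DEBUG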
glusterd Daemon
Options
--debug
FILES
/var/lib/glusterd/*
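A minimal troubleshooting sketch for the daemon; the foreground/debug behavior of --debug is an assumption, not something stated in this guide:
# /etc/init.d/glusterd stop
# glusterd --debug    # assumption: runs glusterd in the foreground with debug-level logging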
Chapter 16.
Glossary
Brick
A Brick is the GlusterFS basic unit of storage, represented by an export directory on a server in the trusted storage pool. A Brick is expressed by combining a server with an export directory in the following format: SERVER:EXPORT
For example: myhostname:/exports/myexportdir/

Cluster
A cluster is a group of linked computers, working together closely, thus in many respects forming a single computer.

Distributed File System
A file system that allows multiple clients to concurrently access data over a computer network.

Filesystem
A method of storing and organizing computer files and their data. Essentially, it organizes these files into a database for the storage, organization, manipulation, and retrieval by the computer's operating system. Source: Wikipedia (http://en.wikipedia.org/wiki/Filesystem)

FUSE
Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a "bridge" to the actual kernel interfaces. Source: Wikipedia (http://en.wikipedia.org/wiki/Filesystem_in_Userspace)

Geo-Replication
Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and across the Internet.

glusterd
The Gluster management daemon that needs to run on all servers in the trusted storage pool.

Metadata
Metadata is data providing information about one or more other pieces of data.

Namespace
Namespace is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols. Each Gluster volume exposes a single namespace as a POSIX mount point that contains every file in the cluster.

Open Source
Open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology.
Before the term open source became widely adopted, developers and producers used a variety of phrases to describe the concept; open source gained hold with the rise of the Internet and the attendant need for massive retooling of the computing source code. Opening the source code enabled a self-enhancing diversity of production models, communication paths, and interactive communities. Subsequently, a new, three-word phrase "open source software" was born to describe the environment that the new copyright, licensing, domain, and consumer issues created. Source: Wikipedia (http://en.wikipedia.org/wiki/Open_source)

Petabyte
A petabyte (derived from the SI prefix peta-) is a unit of information equal to one quadrillion (short scale) bytes, or 1000 terabytes. The unit symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000: 1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B. The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024. Source: Wikipedia (http://en.wikipedia.org/wiki/Petabyte)

POSIX
Portable Operating System Interface (for Unix) is the name of a family of related standards specified by the IEEE to define the application programming interface (API), along with shell and utilities interfaces, for software compatible with variants of the Unix operating system. Gluster exports a fully POSIX compliant file system.

RAID
Redundant Array of Inexpensive Disks (RAID) is a technology that provides increased storage reliability through redundancy, combining multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.

RRDNS
Round Robin Domain Name Service (RRDNS) is a method to distribute load across application servers. RRDNS is implemented by creating multiple A records with the same name and different IP addresses in the zone file of a DNS server.

Storage Pool
A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of that server alone.

Userspace
Applications running in user space don't directly interact with hardware, instead using the kernel to moderate access. Userspace applications are generally more portable than applications in kernel space. Gluster is a user space application.

Volfile
Volfile is a configuration file used by the glusterfs process. Volfiles are usually located at /etc/glusterd/vols/VOLNAME.
Volume
A volume is a logical collection of bricks. Most of the gluster management operations happen on the volume.