Roberto Franchini 
franchini@celi.it 
Codemotion Milano 
29/11/2014
GlusterFS 
A scalable distributed 
file system
whoami(1) 
15 years of experience, proud to be a programmer 
Writes software for information extraction, NLP, opinion mining 
(@scale), and a lot of other buzzwords 
Implements scalable architectures 
Plays with servers 
Member of the JUG-Torino coordination team 
franchini@celi.it 
http://www.celi.it http://www.blogmeter.it 
github.com/robfrank github.com/uim-celi 
twitter.com/robfrankie linkedin.com/in/robfrank
The problem 
Identify a distributed and scalable 
file system 
for today's and tomorrow's 
Big Data
Once upon a time 
2008: One NFS share 
1.5TB ought to be enough for anybody 
2010: Herd of shares 
(1.5TB x N) ought to be enough for anybody 
Nobody could stop the data flood 
It was time for something new
Requirements 
Can be enlarged on demand 
No dedicated HW 
OS is preferred and trusted 
No specialized API 
No specialized Kernel 
POSIX compliance 
Zillions of big and small files 
No NAS or SAN (€€€€€)
Clustered Scale-out General Purpose Storage 
Platform 
− POSIX-y Distributed File System 
− ...and so much more 
Built on commodity systems 
− x86_64 Linux ++ 
− POSIX filesystems underneath (XFS, 
EXT4) 
No central metadata Server (NO SPOF) 
Modular architecture for scale and functionality
Common 
use cases 
Large Scale File Server 
Media / Content Distribution Network (CDN) 
Backup / Archive / Disaster Recovery (DR) 
High Performance Computing (HPC) 
Infrastructure as a Service (IaaS) storage layer 
Database offload (blobs) 
Unified Object Store + File Access
Features 
ACL and Quota support 
Fault-tolerance 
Peer to peer 
Self-healing 
Fast setup 
Enlarge on demand 
Shrink on demand 
Snapshot 
On premise, physical or virtual 
On cloud
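Most of these features are driven by the gluster CLI. As a rough sketch, not a full recipe (the volume name gv0 is an assumption):

  # quota, self-healing and snapshots from the gluster CLI (gv0 is assumed)
  gluster volume quota gv0 enable
  gluster volume quota gv0 limit-usage /projects 100GB
  gluster volume heal gv0 info        # check what self-heal is doing
  gluster snapshot create snap1 gv0   # snapshots need GlusterFS 3.6+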
Architecture
Architecture 
Peer / Node 
− cluster servers (glusterfs server) 
− Runs the gluster daemons and participates in volumes 
Brick 
− A filesystem mountpoint on servers 
− A unit of storage used as a capacity building block
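As an illustrative sketch (host names and brick paths are invented), building the trusted pool and preparing a brick looks roughly like this:

  # from any server already in the pool
  gluster peer probe node2
  gluster peer status
  # a brick is just a mounted local filesystem (XFS is the usual choice)
  mkfs.xfs -i size=512 /dev/sdb1
  mkdir -p /data/brick1
  mount /dev/sdb1 /data/brick1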
Bricks on a node
Architecture 
Translator 
− Logic between bricks or subvolumes that generates a 
subvolume with certain characteristics 
− distribute, replica, stripe are special translators that 
generate RAID-like configurations 
− performance translators 
Volume 
− Bricks combined and passed through translators 
− Ultimately, what's presented to the end user
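A hedged sketch of how bricks and translators become a volume (volume name, hosts and paths are assumptions):

  gluster volume create gv0 replica 2 node1:/data/brick1 node2:/data/brick1
  gluster volume start gv0
  gluster volume info gv0                               # shows type, bricks, options
  gluster volume set gv0 performance.cache-size 256MB   # tune a performance translator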
Volume
Volume types
Distributed 
The default configuration 
Files “evenly” spread across bricks 
Similar to file-level RAID 0 
Server/Disk failure could be catastrophic
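A minimal sketch of a purely distributed volume (names are assumptions); note there is no redundancy:

  gluster volume create dist-vol node1:/data/brick1 node2:/data/brick1
  gluster volume start dist-vol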
Distributed
Replicated 
Files written synchronously to replica peers 
Files read synchronously, 
but ultimately serviced by the first responder 
Similar to file-level RAID 1
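A minimal sketch of a pure replica-2 volume (names are assumptions); every file ends up on both bricks:

  gluster volume create rep-vol replica 2 node1:/data/brick1 node2:/data/brick1
  gluster volume start rep-vol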
Replicated
Distributed + replicated 
Distributed + replicated 
Similar to file-level RAID 10 
Most used layout
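A sketch of the distributed + replicated layout (names are assumptions): with replica 2 and four bricks, consecutive brick pairs form replica sets and files are distributed across the pairs:

  gluster volume create dr-vol replica 2 \
      node1:/data/brick1 node2:/data/brick1 \
      node1:/data/brick2 node2:/data/brick2
  gluster volume start dr-vol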
Distributed replicated
Striped 
Individual files split among bricks (sparse files) 
Similar to block-level RAID 0 
Limited Use Cases 
HPC Pre/Post Processing 
File size exceeds brick size
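A sketch of a striped volume (names are assumptions; the brick count must match the stripe count, and striping was deprecated in later GlusterFS releases):

  gluster volume create str-vol stripe 2 node1:/data/brick1 node2:/data/brick1
  gluster volume start str-vol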
Striped
Moving parts
Components 
glusterd 
Management daemon 
One instance on each GlusterFS server 
Interfaced through gluster CLI 
glusterfsd 
GlusterFS brick daemon 
One process for each brick on each server 
Managed by glusterd
Components 
glusterfs 
Volume service daemon 
One process for each volume service 
NFS server, FUSE client, Self-Heal, Quota, ... 
mount.glusterfs 
FUSE native client mount extension 
gluster 
Gluster Console Manager (CLI)
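A quick, hedged way to see these components at work from the CLI (the volume name gv0 is an assumption):

  gluster peer status        # talks to the local glusterd
  gluster volume status gv0  # lists the glusterfsd brick processes, NFS server
                             # and self-heal daemon with their ports and PIDs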
Clients
Clients: native 
FUSE kernel module allows the filesystem to be built and 
operated entirely in userspace 
Specify mount to any GlusterFS server 
Native Client fetches volfile from mount server, then 
communicates directly with all nodes to access data 
Recommended for high concurrency and high write 
performance 
Load is inherently balanced across distributed volumes
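A minimal mount sketch for the native client (server and volume names are assumptions); the server named in the mount only supplies the volfile:

  mount -t glusterfs server1:/gv0 /mnt/gluster
  # or in /etc/fstab
  server1:/gv0  /mnt/gluster  glusterfs  defaults,_netdev  0 0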
Clients: NFS 
Standard NFS v3 clients 
Standard automounter is supported 
Mount to any server, or use a load balancer 
GlusterFS NFS server includes Network Lock Manager 
(NLM) to synchronize locks across clients 
Better performance for reading many small files from a 
single client 
Load balancing must be managed externally
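A minimal NFS mount sketch (names are assumptions), using the NFSv3 server built into GlusterFS:

  mount -t nfs -o vers=3,mountproto=tcp server1:/gv0 /mnt/gluster-nfs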
Clients: libgfapi 
Introduced with GlusterFS 3.4 
User-space library for accessing data in GlusterFS 
Filesystem-like API 
Runs in application process 
no FUSE, no copies, no context switches 
...but same volfiles, translators, etc.
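A minimal C sketch against libgfapi (volume name, server and file path are assumptions; compile with something like gcc app.c -lgfapi):

  #include <glusterfs/api/glfs.h>
  #include <fcntl.h>
  #include <stdio.h>

  int main(void) {
      glfs_t *fs = glfs_new("gv0");                          /* volume name (assumed) */
      glfs_set_volfile_server(fs, "tcp", "server1", 24007);  /* any server in the pool */
      if (glfs_init(fs) != 0) { perror("glfs_init"); return 1; }
      glfs_fd_t *fd = glfs_creat(fs, "/hello.txt", O_WRONLY, 0644);
      glfs_write(fd, "hello\n", 6, 0);  /* straight to the volume: no FUSE, no context switch */
      glfs_close(fd);
      glfs_fini(fs);
      return 0;
  }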
Clients: SMB/CIFS 
In GlusterFS 3.4 – Samba + libgfapi 
No need for local native client mount & re-export 
Significant performance improvements with FUSE 
removed from the equation 
Must be set up on each server you wish to connect to via 
CIFS 
CTDB is required for Samba clustering
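A hedged smb.conf sketch using the vfs_glusterfs module (share and volume names are assumptions):

  [gluster-gv0]
      path = /
      read only = no
      vfs objects = glusterfs
      glusterfs:volume = gv0
      glusterfs:volfile_server = localhost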
Clients: HDFS 
Access data within and outside of Hadoop 
No HDFS name node single point of failure / bottleneck 
Seamless replacement for HDFS 
Scales with the massive growth of big data
Scalability
Under the hood 
Elastic Hash Algorithm 
No central metadata 
No Performance Bottleneck 
Eliminates risk scenarios 
Location hashed intelligently on filename 
Unique identifiers (GFID), similar to md5sum
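On a brick you can see the GFID stored as an extended attribute; a small sketch (the brick path is an assumption), whose output includes a trusted.gfid=0x... line:

  getfattr -d -m . -e hex /data/brick1/path/to/file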
Scalability 
[Diagram: three Gluster servers, each with ten 3TB disks] 
Scale out performance and availability 
Scale out capacity
Scalability 
Add disks to servers to increase storage size 
Add servers to increase bandwidth and storage size 
Add servers to increase availability (replica factor)
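A hedged sketch of growing a volume online and spreading existing data onto the new bricks (names are assumptions):

  gluster volume add-brick gv0 node3:/data/brick1 node4:/data/brick1
  gluster volume rebalance gv0 start
  gluster volume rebalance gv0 status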
What we do with glusterFS
What we do with GFS 
Daily production of more than 10GB of Lucene inverted 
indexes stored on GlusterFS 
more than 200GB/month 
Search stored indexes to extract different sets of 
documents for every customer 
YES: we open indexes directly on storage 
(it's POSIX!!!)
2010: first installation 
Version 3.0.x 
8 (not dedicated) servers 
Distributed replicated 
No bound on brick size (!!!!) 
Circa 4TB available 
NOTE: stuck to 3.0.x until 2012 due to problems on 3.1 and 
3.2 series, then RH acquired gluster (RH Storage)
2012: (little) cluster 
New installation, version 3.3.2 
4TB available on 8 servers (DELL c5000) 
still not dedicated 
1 brick per server limited to 1TB 
2TB RAID 1 on each server 
Still in production
2012: enlarge 
New installation, upgrade to 3.3.x 
6TB available on 12 servers (still not dedicated) 
Enlarged to 9TB on 18 servers 
Brick sizes bounded AND unbounded
2013: fail 
18 non-dedicated servers: too many 
18 bricks of different sizes 
2 big outages due to bricks running out of space 
Didn’t restart after a move 
but… 
All data were recovered 
(files are scattered across the bricks, so we read them straight from the bricks!)
2014: consolidate 
2 dedicated servers 
12 x 3TB SAS, RAID 6 
4 bricks per server 
28 TB available 
distributed replicated 
4x1Gb bonded NIC 
circa 40 FUSE clients 
(the other servers)
Consolidate 
Gluster Server 1: brick 1, brick 2, brick 3, brick 4 
Gluster Server 2: brick 1, brick 2, brick 3, brick 4
Scale up 
Gluster Server 1: brick 11, brick 12, brick 13, brick 31 
Gluster Server 2: brick 21, brick 22, brick 32, brick 24 
Gluster Server 3: brick 31, brick 32, brick 23, brick 14
Do 
Dedicated server (physical or virtual) 
RAID 6 or RAID 10 (with small files) 
Multiple bricks of same size 
Plan to scale
Do not 
Multi-purpose servers 
Bricks of different sizes 
Very small files 
Writing directly to the bricks (always go through a client)
Some raw tests 
read 
Total transferred file size: 23.10G bytes 
43.46M bytes/sec 
write 
Total transferred file size: 23.10G bytes 
38.53M bytes/sec
Raw tests 
NOTE: ran in production under heavy load, not in a 
clean test environment
Resources 
http://www.gluster.org/ 
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/ 
https://github.com/gluster 
http://www.redhat.com/products/storage-server/ 
http://joejulian.name/blog/category/glusterfs/ 
http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs-project-overview/
Thank you!
Roberto Franchini 
franchini@celi.it
