GPFS: General Parallel File System
Why is it needed?
What is GPFS and what are its features?
Where is it being used?
Why is GPFS needed?
Growth Rate of Components
• CPU performance has increased 8 to 10 times.
• DRAM performance has increased 7 to 9 times.
• Network performance has increased 100 times.
• Bus performance has increased 20 times.
• Hard disk drive (HDD) performance, however, has increased only 1.2 times.
Three Important Functions of Enterprise Storage
• Store data
• Protect data from being lost
• Feed data to the computer’s processors (so they can keep doing work)
Limitations of Existing Solutions
• DAS, NAS, and SAN, each used alone, fall short
• Many data centers have become victims of “filer-sprawl”
• Data administration and management costs (such as migration, backups, and archiving) skyrocket
• I/O performance and application workflow suffer
What is GPFS
• The General Parallel File System (GPFS) is a high-performance clustered file system. It can be deployed in shared-disk or shared-nothing distributed parallel modes.
• Developer(s): IBM
• Operating system: AIX / Linux / Windows Server
• License: Proprietary
• Introduced: 1998 (AIX)
• Max. volume size: 8 YB
• Max. file size: 8 EB
• Max. number of files: 2^64 per file system
• File system permissions: POSIX
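
To put the limits above in perspective, the short calculation below converts them to powers of two. It is only an arithmetic sketch under two assumptions that the slide does not state: that "EB" and "YB" are binary units (2^60 and 2^80 bytes), and that the file-count limit reads as 2^64. The helper name `as_power_of_two` is made up for this illustration.

```python
# Assumption: the slide's EB and YB are binary units (2^60 and 2^80 bytes).
EIB = 2 ** 60
YIB = 2 ** 80

max_file_size = 8 * EIB      # "Max. file size: 8 EB"
max_volume_size = 8 * YIB    # "Max. volume size: 8 YB"
max_files = 2 ** 64          # assumption: file-count limit read as 2^64


def as_power_of_two(n: int) -> str:
    """Format n as '2^k'; raise if n is not an exact power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("not a power of two")
    return f"2^{n.bit_length() - 1}"


print("max file size  :", as_power_of_two(max_file_size), "bytes")    # 2^63
print("max volume size:", as_power_of_two(max_volume_size), "bytes")  # 2^83
print("max files      :", f"{max_files:,}")
```

Under these assumptions, the 8 EB file-size limit is exactly the largest offset range addressable with a signed 64-bit integer.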
GPFS Current Usage
• It is used by many of the world's largest commercial companies, as well as by some of the supercomputers on the TOP500 list.
• For example, GPFS was the file system of the ASC Purple supercomputer, which was composed of more than 12,000 processors and had 2 petabytes of total disk storage spanning more than 11,000 disks.
• IBM's GPFS is used extensively across multiple industries, such as government, oil and gas, life sciences, media/entertainment, and financial services.
GPFS Features
• Standard file system interface with POSIX semantics
– Metadata on shared storage
– Distributed locking for read/write semantics
• Highly scalable
– High capacity (up to 2^99 bytes file system size, up to 2^63 files per file system)
– High throughput (TB/s)
– Wide striping
– Large block size (up to 16MB)
– Multiple nodes write in parallel
• Advanced data management
– ILM (storage pools), Snapshots
– Backup, HSM (DMAPI)
– Remote replication, WAN caching
• High availability
– Fault tolerance (node, disk failures)
– On-line system management (add/remove nodes, disks, ...)
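
The "wide striping" and "large block size" items above can be pictured with a small sketch. The code below spreads the blocks of one file across several disks in round-robin order, so that consecutive blocks can be read or written in parallel. It is a simplified illustration, not GPFS's actual allocation logic: the function name `stripe_blocks`, the round-robin policy, the four disks, and the 100 MiB file are all assumptions chosen for the example.

```python
def stripe_blocks(file_size: int, block_size: int, num_disks: int) -> dict[int, list[int]]:
    """Assign each block index of a file to a disk in round-robin order.

    Hypothetical helper for illustration only; real file systems use
    more elaborate allocation maps.
    """
    if file_size < 0 or block_size <= 0 or num_disks <= 0:
        raise ValueError("sizes and disk count must be positive")
    num_blocks = -(-file_size // block_size)  # ceiling division
    layout: dict[int, list[int]] = {disk: [] for disk in range(num_disks)}
    for block in range(num_blocks):
        layout[block % num_disks].append(block)
    return layout


MIB = 2 ** 20
# A 100 MiB file with 16 MiB blocks (the maximum block size quoted above)
# needs 7 blocks; the last one is only partly filled.
layout = stripe_blocks(file_size=100 * MIB, block_size=16 * MIB, num_disks=4)
for disk, blocks in layout.items():
    print(f"disk {disk}: blocks {blocks}")
```

Because neighbouring blocks land on different disks, a large sequential read touches all disks at once, which is the intuition behind the throughput figure quoted in the list.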