This set of slides describes initial experiments we have designed to discover performance improvements for Hadoop technologies using NVMe devices.
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions (Colleen Corrice)
At Red Hat Storage Day Minneapolis on 4/12/16, Intel's Dan Ferber presented on Intel storage components, benchmarks, and contributions as they relate to Ceph.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... (Odinot Stanislas)
After a short introduction to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks: sequential tests, random tests, and above all a comparison of results before and after optimization. The configuration parameters touched and the optimizations applied (large page numbers, omap data on a separate disk, ...) deliver at least a 2x performance gain.
1. Log structured merge trees store data in multiple levels with different storage speeds and costs, requiring data to periodically merge across levels.
2. This structure allows fast writes by storing new data in faster levels before merging to slower levels, and efficient reads by querying multiple levels and merging results.
3. The merging process involves loading, sorting, and rewriting levels to consolidate and propagate deletions and updates between levels; a simplified sketch of this merge follows below.
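The merge mechanics above can be made concrete with a minimal sketch. This is a toy model of leveled compaction, not the code of any particular store: each level is a plain dict, newer levels shadow older ones on reads, and a merge rewrites two adjacent levels into one while applying overwrites and dropping deletion tombstones.

```python
# Toy model of leveled LSM merging (illustrative only, not any real engine).
# Each level maps key -> value; None marks a deletion tombstone.

def merge_levels(upper, lower):
    """Merge a newer/faster level into the older/slower level below it."""
    merged = dict(lower)      # start from the older data
    merged.update(upper)      # newer entries overwrite older ones
    # Tombstones are dropped once the deletion has been applied to the lower level.
    return {k: v for k, v in merged.items() if v is not None}

def read(levels, key):
    """Query levels from newest to oldest and return the first hit."""
    for level in levels:
        if key in level:
            return level[key]     # may be None if the key was deleted
    return None

# Level 0 (memtable-like) is newest; level 1 is older and larger.
levels = [{"a": 2, "b": None}, {"a": 1, "b": 9, "c": 3}]
print(read(levels, "a"))                       # 2 -- the newest value wins
levels = [merge_levels(levels[0], levels[1])]  # compaction result: {'a': 2, 'c': 3}
print(levels)
```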
Some key value stores using log-structure (Zhichao Liang)
These slides present three key-value stores that use log structures: Riak, RethinkDB, and LevelDB. Note that the claim that RethinkDB employs an append-only B-tree is an estimate made by combining guessing with reasoning!
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud (Patrick McGarry)
Cisco Cloud Services provides an OpenStack platform to Cisco SaaS applications using a worldwide deployment of Ceph clusters storing petabytes of data. The initial Ceph cluster design experienced major stability problems as the cluster grew past 50% capacity. Strategies were implemented to improve stability including client IO throttling, backfill and recovery throttling, upgrading Ceph versions, adding NVMe journals, moving the MON levelDB to SSDs, rebalancing the cluster, and proactively detecting slow disks. Lessons learned included the importance of devops practices, sharing knowledge, rigorous testing, and balancing performance, cost and time.
CephFS performance testing was conducted on a Jewel deployment. Key findings include:
- Single MDS performance is limited by its single-threaded design; operations reached CPU limits
- Improper client behavior can cause MDS OOM issues by exceeding inode caching limits
- Metadata operations like create, open, update showed similar performance, reaching 4-5k ops/sec maximum
- Caching had a large impact on performance when the working set exceeded cache size
This session will cover performance-related developments in Red Hat Gluster Storage 3 and share best practices for testing, sizing, configuration, and tuning.
Join us to learn about:
Current features in Red Hat Gluster Storage, including 3-way replication, JBOD support, and thin-provisioning.
Features that are in development, including network file system (NFS) support with Ganesha, erasure coding, and cache tiering.
New performance enhancements related to remote direct memory access (RDMA), small-file performance, FUSE caching, and solid-state disk (SSD) readiness.
MyRocks is an open source LSM-based MySQL database created by Facebook. These slides introduce an overview of MyRocks and how it was deployed at Facebook, as of 2017.
Ceph - High Performance Without High Costs (Jonathan Long)
Ceph is a high-performance storage platform that provides storage without high costs. The presentation discusses BlueStore, a redesign of Ceph's object store to improve performance and efficiency. BlueStore preserves wire compatibility but uses an incompatible storage format. It aims to double write performance and match or exceed read performance of the previous FileStore design. BlueStore simplifies the architecture and uses algorithms tailored for different hardware like flash. It was in a tech preview in the Jewel release and aims to be default in the Luminous release next year.
This document proposes a unified read-only cache for Ceph using a standalone SSD caching library that can be reused for librbd and RGW. It describes the general architecture including a common libcachefile, policy, and hooks. It then provides more details on shared read-only caching implementations for librbd and RGW, including initial results showing a 4x performance improvement for librbd. Issues discussed include different block vs object caching semantics and status of the RGW caching PR.
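As a rough illustration of the shared read-only cache idea described above, here is a minimal read-through cache with an LRU eviction policy. The class and function names are hypothetical stand-ins for the proposed libcachefile/policy/hook split; this is not the actual Ceph, librbd, or RGW API.

```python
# Hypothetical read-through, read-only cache with LRU eviction.
# Names are illustrative; this is not the Ceph caching library's API.
from collections import OrderedDict

class ReadOnlyCache:
    def __init__(self, backend_read, capacity=1024):
        self.backend_read = backend_read   # key -> bytes (stand-in for a RADOS/RGW fetch)
        self.capacity = capacity
        self.entries = OrderedDict()       # key -> bytes, ordered by recency

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            return self.entries[key]
        data = self.backend_read(key)      # miss: read through to the backend
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return data

# Usage with a stand-in backend:
cache = ReadOnlyCache(backend_read=lambda k: f"object-{k}".encode(), capacity=2)
cache.get("rbd/img1/chunk0")
cache.get("rbd/img1/chunk1")
cache.get("rbd/img1/chunk0")   # served from the cache on the second read
```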
Build an High-Performance and High-Durable Block Storage Service Based on Ceph (Rongze Zhu)
This document discusses building a high-performance and durable block storage service using Ceph. It describes the architecture, including a minimum deployment of 12 OSD nodes and 3 monitor nodes. It outlines optimizations made to Ceph, Qemu, and the operating system configuration to achieve high performance, including 6000 IOPS and 170MB/s throughput. It also discusses how the CRUSH map can be optimized to reduce recovery times and number of copysets to improve durability to 99.99999999%.
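The ten-nines durability figure can be motivated with a back-of-the-envelope copyset calculation. The sketch below is purely illustrative and uses assumed failure and recovery numbers; it only shows why shrinking the number of copysets (distinct disk triples that hold all replicas of some piece of data) raises the probability that no triple loses all of its members before recovery completes.

```python
# Back-of-the-envelope durability estimate from copyset counting.
# All numbers are illustrative assumptions, not measurements from the talk.
annual_disk_failure_rate = 0.02        # assumed AFR per disk
recovery_hours = 1.0                   # assumed re-replication time after a failure
replicas = 3

# Probability that one given disk fails inside a single recovery window.
p_fail_in_window = annual_disk_failure_rate * recovery_hours / (365 * 24)

def annual_durability(num_copysets, windows_per_year=365 * 24):
    # A copyset is lost only if all of its disks fail within the same window.
    p_copyset_lost = p_fail_in_window ** replicas
    p_any_lost_per_window = 1 - (1 - p_copyset_lost) ** num_copysets
    return (1 - p_any_lost_per_window) ** windows_per_year

for copysets in (100_000, 1_000):      # many random copysets vs. a constrained CRUSH map
    print(f"{copysets:>7} copysets -> durability ~ {annual_durability(copysets):.12f}")
```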
This document provides an overview and planning guidelines for a first Ceph cluster. It discusses Ceph's object, block, and file storage capabilities and how it integrates with OpenStack. Hardware sizing examples are given for a 1 petabyte storage cluster with 500 VMs requiring 100 IOPS each. Specific lessons learned are also outlined, such as realistic IOPS expectations from HDD and SSD backends, recommended CPU and RAM per OSD, and best practices around networking and deployment.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook (The Hive)
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
This document summarizes BlueStore, a new storage backend for Ceph that provides faster performance compared to the existing FileStore backend. BlueStore manages metadata and data separately, with metadata stored in a key-value database (RocksDB) and data written directly to block devices. This avoids issues with POSIX filesystem transactions and enables more efficient features like checksumming, compression, and cloning. BlueStore addresses consistency and performance problems that arose with previous approaches like FileStore and NewStore.
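To make the metadata/data split concrete, here is a deliberately simplified sketch of a BlueStore-like write path: object bytes go straight to a raw block device at an allocated offset, while the extent mapping and a checksum are recorded as a key-value entry. The names, the allocator, and the plain dict standing in for RocksDB are all illustrative assumptions, not Ceph's actual on-disk format.

```python
# Simplified sketch of a BlueStore-like write: data to a raw device,
# metadata (extent map + checksum) to a key-value store.
# Illustrative only; this is not Ceph's real layout or API.
import zlib

BLOCK = 4096
device = bytearray(BLOCK * 1024)   # stand-in for a raw block device
kv = {}                            # stand-in for RocksDB
next_free_block = 0

def write_object(name, data):
    global next_free_block
    nblocks = (len(data) + BLOCK - 1) // BLOCK
    offset = next_free_block * BLOCK
    next_free_block += nblocks
    device[offset:offset + len(data)] = data   # data written directly to the device
    kv[f"object/{name}"] = {                   # metadata committed as one KV record
        "offset": offset,
        "length": len(data),
        "crc32": zlib.crc32(data),
    }

def read_object(name):
    meta = kv[f"object/{name}"]
    data = bytes(device[meta["offset"]:meta["offset"] + meta["length"]])
    assert zlib.crc32(data) == meta["crc32"]   # checksum verified on read
    return data

write_object("rbd_data.1", b"hello bluestore")
print(read_object("rbd_data.1"))
```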
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
1) Write-behind logging (WBL) is an alternative to write-ahead logging (WAL) that avoids duplicating data in the log and database for non-volatile memory (NVM) storage.
2) WBL records two commit timestamps - one for the latest persisted changes and one for the latest promised commit. This allows transactions between the timestamps to be ignored during recovery.
3) Recovery for WBL involves analyzing the log to retrieve commit timestamp gaps and long-transaction timestamps rather than replaying the entire log as in WAL; a simplified sketch follows below.
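A minimal sketch of the two-timestamp idea, under a simplified model of my own: the engine tracks the commit timestamp of the latest changes known to be durable on NVM and a later "promised" commit timestamp handed out to in-flight transactions, and recovery only has to treat commits falling in the gap between the two as if they never happened. The class and method names are illustrative, not the paper's implementation.

```python
# Illustrative model of write-behind logging's commit-timestamp gap.
# Not the actual WBL implementation; structure and names are assumptions.

class WriteBehindLog:
    def __init__(self):
        self.persisted_commit_ts = 0   # latest commit whose changes are durable on NVM
        self.promised_commit_ts = 0    # latest commit timestamp handed out
        self.gaps = []                 # (persisted_ts, promised_ts) pairs written to the log

    def begin_group_commit(self, batch_size):
        # Promise a range of commit timestamps before their changes are flushed.
        self.promised_commit_ts += batch_size
        self.gaps.append((self.persisted_commit_ts, self.promised_commit_ts))

    def finish_group_commit(self):
        # The changes up to the promised timestamp are now durable on NVM.
        self.persisted_commit_ts = self.promised_commit_ts

    def recover(self):
        # Recovery inspects the logged gap instead of replaying log records:
        # commits inside an unfinished gap are simply ignored.
        last_persisted, last_promised = self.gaps[-1]
        if self.persisted_commit_ts < last_promised:
            return ("ignore commits between", last_persisted, last_promised)
        return ("nothing to undo",)

wbl = WriteBehindLog()
wbl.begin_group_commit(batch_size=100)
print(wbl.recover())        # crash before the flush: ignore commits (0, 100]
wbl.finish_group_commit()
print(wbl.recover())        # after the flush: nothing to undo
```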
Yfrog uses HBase as its scalable database backend to store and serve 250 million photos from over 60 million monthly users across 4 HBase clusters ranging from 50TB to 1PB in size. The authors provide best practices for configuring and monitoring HBase, including using smaller commodity servers, tuning JVM garbage collection, monitoring metrics like thread usage and disk I/O, and implementing caching and replication for high performance and reliability. Following these practices has allowed Yfrog's HBase deployment to run smoothly and efficiently.
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and... (Danielle Womboldt)
Optimizing Ceph performance by leveraging Intel Optane and 3D NAND TLC SSDs. The document discusses using Intel Optane SSDs as journal/metadata drives and Intel 3D NAND SSDs as data drives in Ceph clusters. It provides examples of configurations and analysis of a 2.8 million IOPS Ceph cluster using this approach. Tuning recommendations are also provided to optimize performance.
Development to Production with Sharded MongoDB Clusters (Severalnines)
Severalnines presentation at MongoDB Stockholm Conference.
Presentation covers:
- mongoDB sharding/clustering concepts
- recommended dev/test/prod setups
- how to verify your deployment
- how to avoid downtime
- what MongoDB metrics to watch
- when to scale
The Hive Think Tank: Rocking the Database World with RocksDB (The Hive)
RocksDB is a new storage engine for MySQL that provides better storage efficiency than InnoDB. It achieves lower space amplification and write amplification than InnoDB through its use of compression and log-structured merge trees. While MyRocks (RocksDB integrated with MySQL) currently has some limitations like a lack of support for online DDL and spatial indexes, work is ongoing to address these limitations and integrate additional RocksDB features to fully support MySQL workloads. Testing at Facebook showed MyRocks uses less disk space and performs comparably to InnoDB for their queries.
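The amplification claims can be illustrated with rough arithmetic. The sketch below compares an idealized leveled LSM against an idealized B-tree using textbook approximations and made-up parameters; real InnoDB and MyRocks figures depend on workload, compression, and configuration.

```python
# Rough, idealized comparison of write and space amplification.
# Formulas and parameters are textbook approximations, not measured values.
import math

def lsm_write_amp(levels, fanout):
    # WAL write + memtable flush, plus each level rewritten ~fanout/2 times on average.
    return 1 + 1 + levels * fanout / 2

def btree_write_amp(page_size=16 * 1024, row_size=100):
    # A B-tree dirties a whole page even for a single small-row update.
    return page_size / row_size

data_size_gb = 1000
fanout = 10
levels = math.ceil(math.log(data_size_gb, fanout))          # ~3 levels for 1 TB at fanout 10

print("LSM write amplification    ~", lsm_write_amp(levels, fanout))   # ~17x
print("B-tree write amplification ~", btree_write_amp())               # ~160x worst case

# Space amplification: leveled compaction keeps ~10% of not-yet-compacted data,
# while B-tree pages are typically only about two-thirds full.
print("LSM space amplification    ~", 1.1)
print("B-tree space amplification ~", round(1 / (2 / 3), 2))
```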
Understanding how memory is managed with MongoDB is instrumental in maximizing database performance and hardware utilisation. This talk covers the workings of low level operating system components like the page cache and memory mapped files. We will examine the differences between RAM, SSD and hard disk drives to help you choose the right hardware configuration. Finally, we will learn how to monitor and analyze memory and disk usage using the MongoDB Management Service, linux administration commands and MongoDB commands.
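As a small hands-on illustration of the memory-mapped-file behavior the talk covers, the sketch below maps a file into memory and reads from it, letting the operating system's page cache fault pages in on demand instead of copying the whole file into application buffers. This is a generic Python example, not MongoDB code, although MongoDB's MMAPv1 storage engine relied on the same OS mechanisms.

```python
# Generic illustration of memory-mapped file access and the page cache (not MongoDB code).
import mmap
import os
import tempfile

# Create a sample file spanning many pages.
path = os.path.join(tempfile.gettempdir(), "mmap_demo.bin")
with open(path, "wb") as f:
    f.write(b"x" * (4096 * 256))          # 1 MiB of data

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)         # map the whole file into the address space
    # Nothing is read yet: pages are faulted in by the kernel on first access
    # and then served from the page cache (RAM) on later accesses.
    first_page = mm[:4096]
    last_byte = mm[-1]
    mm[0:5] = b"hello"                    # writes land in the page cache first
    mm.flush()                            # ask the kernel to write dirty pages back
    mm.close()

print(len(first_page), last_byte)
```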
This document discusses the status update of Hadoop running over Ceph RGW with SSD caching. It describes the RGW-Proxy component that returns the closest RGW instance to data, and the RGWFS component that allows Hadoop to access a Ceph cluster through RGW. Performance testing shows that avoiding object renames in Swift reduces overhead compared to HDFS. The next steps are to finish RGWFS development, address heavy renames in RGW, and open source the code.
This document summarizes a distributed storage system called Ceph. Ceph uses an architecture with four main components - RADOS for reliable storage, Librados client libraries, RBD for block storage, and CephFS for file storage. It distributes data across intelligent storage nodes using the CRUSH algorithm and maintains reliability through replication and erasure coding of placement groups across the nodes. The monitors manage the cluster map and placement, while OSDs on each node store and manage the data and metadata.
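The CRUSH-based placement mentioned above can be approximated with a tiny deterministic-hash sketch: any client can compute, without a lookup table, which OSDs hold an object by hashing the object name to a placement group and then ranking OSDs for that group. This is a toy illustration in the spirit of CRUSH, not the real algorithm; it ignores weights, buckets, and failure domains.

```python
# Toy, table-free placement in the spirit of CRUSH (not the real algorithm).
import hashlib

OSDS = [f"osd.{i}" for i in range(12)]
PG_COUNT = 128
REPLICAS = 3

def _score(pg_id, osd):
    digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).hexdigest()
    return int(digest, 16)

def place(object_name):
    # Hash the object name to a placement group...
    pg_id = int(hashlib.sha256(object_name.encode()).hexdigest(), 16) % PG_COUNT
    # ...then rank all OSDs for that PG and take the top R as the replica set.
    ranked = sorted(OSDS, key=lambda osd: _score(pg_id, osd), reverse=True)
    return pg_id, ranked[:REPLICAS]

pg, replicas = place("rbd_data.1.0000000000000000")
print(f"pg {pg} -> {replicas}")   # every client computes the same answer independently
```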
The Hive Think Tank: Rocking the Database World with RocksDB (The Hive)
Dhruba Borthakur, Facebook
Dhruba Borthakur is an engineer at Facebook. He has been one of the founding engineers of RocksDB, an open-source key-value store optimized for storing data in flash and main-memory storage. He has been one of the founding architects of the Apache Hadoop Distributed File System and has been instrumental in scaling Facebook's Hadoop cluster to multiple petabytes. Dhruba has contributed code to the Apache HBase project. Earlier, he contributed to the development of the Andrew File System (AFS). He has an M.S. in Computer Science from the University of Wisconsin, Madison, and a B.S. in Computer Science from BITS Pilani, India.
HKG15-401: Ceph and Software Defined Storage on ARM servers (Linaro)
HKG15-401: Ceph and Software Defined Storage on ARM servers
---------------------------------------------------
Speakers: Yazen Ghannam, Steve Capper
Date: February 12, 2015
---------------------------------------------------
★ Session Summary ★
Running Ceph in colocation, and ongoing optimizations
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250828
Video: https://www.youtube.com/watch?v=RdZojLL7ttk
Etherpad: http://pad.linaro.org/p/hkg15-401
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
This document discusses secrets management in containers and recommends solutions like Kubernetes Secrets, Docker Swarm Secrets, DC/OS Secrets, Keywhiz, and Hashicorp Vault. It highlights Hashicorp Vault's purpose-built focus on secrets, key rolling capabilities, comprehensive access control, expiration policies, and extensibility. The document then provides a case study of Aqua Security's integration with Hashicorp Vault, which allows for central secret management without persisting secrets to disk, secured communications, control over user/group secret access, usage tracking, and runtime secret rotation/revocation without container restarts.
How to Become a Thought Leader in Your Niche (Leslie Samuel)
Are bloggers thought leaders? Here are some tips on how you can become one. Provide great value, put awesome content out there on a regular basis, and help others.
#askSAP: Journey to the Cloud: SAP Strategy and Roadmap for Cloud and Hybrid ... (SAP Analytics)
www.sap.com/businessobjects-cloud. The momentum of customers moving to the SAP BusinessObjects Cloud is rapidly accelerating – and so are the innovations being introduced by SAP. New features and functionality for cloud and on-premise with SAP BusinessObjects Enterprise offer hybrid use cases that organizations can take advantage of as they embark on their journey to the cloud. View the webinar replay at http://webinars.sap.com/asksap-webinar-series/en/home#section_3.
Rackspace is a managed cloud computing company with over 6,200 employees serving customers in 150 countries. It has 10 data centers worldwide and annualized revenue of over $2 billion. Rackspace aims to be recognized as one of the world's greatest service companies by providing expert managed services across public cloud, private cloud, and dedicated hosting solutions.
Lily for the Bay Area HBase UG - NYC edition (NGDATA)
The document discusses Lily, an open source content application developed by Outerthought that uses HBase for scalable storage and SOLR for search. It provides a high-level overview of Lily's architecture, which maps content to HBase, indexes it in SOLR, and uses a queue implemented on HBase to connect updates between the systems. Future plans for Lily include a 1.0 release with additional features like user management and a UI framework.
Rigorous and Multi-tenant HBase Performance (Cloudera, Inc.)
The document discusses techniques for rigorously measuring Apache HBase performance in both standalone and multi-tenant environments. It introduces the Yahoo! Cloud Serving Benchmark (YCSB) and best practices for cluster setup, workload generation, data loading, and measurement. These include pre-splitting tables, warming caches, setting target throughput, and using appropriate workload distributions. The document also covers challenges in achieving good multi-tenant performance across HBase, MapReduce and Apache Solr.
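To make the workload-distribution point concrete, here is a small sketch that generates a YCSB-style skewed request pattern from a Zipf distribution; this skew is exactly why cache warming and table pre-splitting matter when benchmarking. It illustrates the idea only and is not YCSB's own generator.

```python
# Sketch of a YCSB-style zipfian request pattern (illustration, not YCSB code).
import itertools
import random
from collections import Counter

RECORD_COUNT = 10_000
ZIPF_EXPONENT = 0.99                 # close to YCSB's default skew

# Key i is requested with probability proportional to 1 / (i + 1)^s.
weights = [1.0 / (i + 1) ** ZIPF_EXPONENT for i in range(RECORD_COUNT)]
cum_weights = list(itertools.accumulate(weights))

rng = random.Random(42)
indices = rng.choices(range(RECORD_COUNT), cum_weights=cum_weights, k=100_000)
ops = Counter(f"user{i}" for i in indices)

print("hottest keys:", ops.most_common(5))
top_1_percent = sum(ops[f"user{i}"] for i in range(RECORD_COUNT // 100))
print("share of traffic hitting the top 1% of keys:", top_1_percent / 100_000)
```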
This document summarizes a research project on GPUrdma, which enables direct RDMA communication from GPU kernels without CPU intervention. GPUrdma provides a 5 microsecond latency for GPU-to-GPU communication and up to 50 Gbps bandwidth. It implements a direct data path and control path from the GPU to the InfiniBand HCA. Evaluation shows GPUrdma outperforms CPU-based RDMA by a factor of 4.5x for small messages. The document also discusses using GPUrdma to enable the GPI2 framework for partitioned global address space programming across GPUs.
The document discusses using RDMA (Remote Direct Memory Access) efficiently for key-value services. It summarizes background on key-value stores and RDMA. The presentation then explores using one-sided and two-sided RDMA operations for writes versus reads in key-value systems. Experimental results show that optimizing for writes using inline, unreliable, and unsignaled RDMA writes can outperform read-based approaches. While this approach works well, limitations include its assumption of an asymmetric system model and lack of generality. The presentation concludes by discussing lessons learned about challenging assumptions and the need to experiment and optimize for common cases.
Paper on RDMA enabled Cluster FileSystem at Intel Developer Forum (somenathb)
The document summarizes Veritas Cluster File System (CFS) with Remote Direct Memory Access (RDMA). CFS provides a scalable, shared file system across cluster nodes. RDMA capabilities from InfiniBand Architecture can improve CFS performance by reducing CPU usage and latency through zero-copy data transfers and remote direct memory access. Key CFS components like the Group Lock Manager benefit from RDMA to enhance coherency and recovery. The Common RDMA Transport Access Layer abstracts RDMA calls to enable CFS to support different transports.
The document discusses Remote Direct Memory Access (RDMA) over IP as a way to avoid data copying and reduce host processing overhead for high-speed data transfers. It proposes an architecture with two layers - Direct Data Placement (DDP) and RDMA control - running over IP transports. RDMA over IP aims to make network I/O "free" by allowing the network adapter to directly place data into application buffers without involving the host CPU. This could improve throughput and allow more machines to be supported for high-bandwidth data center applications. Open issues that still need to be addressed include security, interaction with TCP, atomic operations, and impact on network behaviors.
Watch video on Youtube! : http://www.youtube.com/watch?v=aZDKyNtSqOo
Place : TERA TEC Office, 1004, 10th floor, Cheongjin Bldg., Wonhyoro 3-ga, Yongsan-gu, Seoul, Korea
Time : 14:00, Saturday, January 30, 2010
Presentation : Kim Seongyun, Kang Bundo, Noh Taesang - Linux Kernel - Overview and Issues
Seminar Info : http://www.ubuntu.or.kr/viewtopic.php...
About Ubuntu
Ubuntu is an ancient African word meaning 'humanity to others'.
It also means 'I am what I am because of who we all are'.
The Ubuntu operating system brings the spirit of Ubuntu to the world of computers.
http://www.ubuntu.com
About Ubuntu Korea Community
We want to be happy using Ubuntu.
The 'Korean Ubuntu User Forum' welcomes your voluntary support.
http://www.ubuntu-kr.org
Edgecombe County is located in eastern North Carolina with a population of around 56,000 people, the majority of whom are black. The county has struggled with poverty and low incomes, and its economy was traditionally based around manufacturing but has shifted more towards retail and healthcare. Key facts about Edgecombe County's demographics, economy, infrastructure, education levels, and political landscape are presented for context.
This document discusses NoSQL databases and how they relate to big data. It provides examples of column-oriented NoSQL databases like Cassandra, document-oriented databases like MongoDB, and key-value stores like Dynamo. It also briefly summarizes characteristics of different database categories and how big data problems can be differentiated based on the five V's: volume, velocity, variety, value and variability.
Yesterday's thinking may still hold that NVMe (NVM Express) is in transition to a production-ready solution. In this session, we will discuss how NVMe has evolved into a production-ready technology, covering the history and evolution of NVMe and the Linux stack, and where NVMe stands today as the low-latency, highly reliable database key-value store mechanism that will drive the future of cloud expansion. Examples of protocol efficiencies and the types of storage engines that are optimizing for NVMe will be discussed. Please join us for an exciting session on how in-memory computing and persistence have evolved.
The document describes the properties of water, including its states and the structure of its molecule. It explains how water can dissolve many substances and the difference between potable and non-potable water. It then describes laboratory experiments for studying water's changes of state and recognizing its forms. Finally, it explains the natural water cycle, including condensation, precipitation, runoff, filtration, and return to the sea, as well as the urban water cycle, from collection to the...
This document summarizes a presentation about FlashGrid, an alternative to Oracle Exadata that aims to achieve similar performance levels using commodity hardware. It discusses the key components of FlashGrid including the Linux kernel, networking protocols like Infiniband and NVMe, and hardware. Benchmarks show FlashGrid achieving comparable IOPS and throughput to Exadata on a single server. While Exadata has proprietary advantages, FlashGrid offers excellent raw performance at lower cost and with simpler maintenance through the use of standard technologies.
This document discusses persistent memory and the Linux software stack. It begins by covering the evolution of non-volatile memory from battery backed RAM to emerging technologies like PCM and memristors. It then outlines the persistent memory Linux software stack, including the kernel subsystem and NVDIMM architecture. Finally, it discusses using and emulating persistent memory on Linux, including kernel configuration, hardware options, and libraries for programming with persistent memory.
Presentation from OpenStack Summit Tokyo
Online video link is below.
https://www.openstack.org/summit/tokyo-2015/videos/presentation/approaching-open-source-hyper-converged-openstack-using-40gbit-ethernet-network
Accelerating HBase with NVMe and Bucket Cache (Nicolas Poggi)
The Non-Volatile Memory Express (NVMe) standard promises an order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM in TB/$. This talk evaluates the use cases and benefits of NVMe drives for use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
The state of Hive and Spark in the Cloud (July 2017) (Nicolas Poggi)
Originally presented at the BDOOP and Spark Barcelona meetup groups: http://meetu.ps/3bwCTM
Cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Spark and Hive come ready to use, with a general-purpose configuration and upgrade management. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements on performance and the release of v2, making it challenging to keep up-to-date production services both on-premises and in the cloud for compatibility and stability. The talk compares:
• The performance of both v1 and v2 for Spark and Hive
• PaaS cloud services: Azure HDinsight, Amazon Web Services EMR, Google Cloud Dataproc
• Out-of-the-box support for Spark and Hive versions from providers
• PaaS reliability, scalability, and price-performance of the solutions
The comparison uses BigBench, the new Big Data benchmark standard. BigBench combines SQL queries, MapReduce, user code (UDF), and machine learning, which makes it ideal for stressing Spark libraries (SparkSQL, DataFrames, MLlib, etc.).
This document discusses performance improvements to the Lustre parallel file system in versions 2.5 through large I/O patches, metadata improvements, and metadata scaling with distributed namespace (DNE). It summarizes evaluations showing improved throughput from 4MB RPC, reduced degradation with large numbers of threads using SSDs over NL-SAS, high random read performance from SSD pools, and significant metadata performance gains in Lustre 2.4 from DNE allowing nearly linear scaling. Key requirements for next-generation storage include extreme IOPS, tiered architectures using local flash with parallel file systems, and reducing infrastructure needs while maintaining throughput.
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ... (Chester Chen)
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLLib, Powergraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism, and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, Ebay, Quantcast and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
The state of SQL-on-Hadoop in the Cloud (Nicolas Poggi)
With the increase of Hadoop offerings in the Cloud, users are faced with many decisions to make: which Cloud provider, which VMs to choose, cluster sizing, storage type, or even whether to go with fully managed Platform-as-a-Service (PaaS) Hadoop. As the answer is always "it depends on your data and usage", this talk guides participants through an overview of the different PaaS solutions from the leading Cloud providers, highlighting the main results of benchmarking their SQL-on-Hadoop (i.e., Hive) services using the ALOJA benchmarking project. It compares their current offerings in terms of readiness, architectural differences, and cost-effectiveness (performance-to-price) for entry-level Hadoop-based deployments, and briefly presents how to replicate the results and create custom benchmarks from internal apps, so that users can make their own decisions about choosing the right provider for their particular data needs.
Red Hat Storage Server Administration Deep Dive (Red_Hat_Storage)
"In this session for administrators of all skill levels, you’ll get a deep technical dive into Red Hat Storage Server and GlusterFS administration.
We’ll start with the basics of what scale-out storage is, and learn about the unique implementation of Red Hat Storage Server and its advantages over legacy and competing technologies. From the basic knowledge and design principles, we’ll move to a live start-to-finish demonstration. Your experience will include:
Building a cluster.
Allocating resources.
Creating and modifying volumes of different types.
Accessing data via multiple client protocols.
A resiliency demonstration.
Expanding and contracting volumes.
Implementing directory quotas.
Recovering from and preventing split-brain.
Asynchronous parallel geo-replication.
Behind-the-curtain views of configuration files and logs.
Extended attributes used by GlusterFS.
Performance tuning basics.
New and upcoming feature demonstrations.
Those new to the scale-out product will leave this session with the knowledge and confidence to set up their first Red Hat Storage Server environment. Experienced administrators will sharpen their skills and gain insights into the newest features. IT executives and managers will gain a valuable overview to help fuel the drive for next-generation infrastructures."
The document summarizes key topics and industry talks from the China Linux Summit Forum (CLSF) 2010 conference in Shanghai. It discusses presentations on writeback optimization, the BTRFS file system, SSD challenges, VFS scalability, kernel testing frameworks, and talks from companies like Intel, EMC, Taobao, and Baidu on their storage architectures and solutions. Attendees included representatives from Intel, EMC, Fujitsu, Taobao, Novell, Oracle, Baidu, and Canonical discussing topics around file systems, storage, and kernel optimizations.
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B... (DataStax Academy)
Speaker(s): Kathryn Erickson, Engineering at DataStax
During this session we will discuss varying recommended hardware configurations for DSE. We'll get right to the point and provide quick and solid recommendations up front. After we get the main points down, we'll take a brief tour of the history of database storage and then focus on designing a storage subsystem that won't let you down.
Data deduplication is a hot topic in storage and saves significant disk space in many environments, with some trade-offs. We'll discuss what deduplication is and where the open source solutions stand versus commercial offerings. The presentation will lean towards the practical – where attendees can use it in their real-world projects (what works, what doesn't, whether you should use it in production, etcetera).
More at http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009
Galaxy Big Data with MariaDB 10 by Bernard Garros, Sandrine Chirokoff and Stéphane Varoqui.
Presented 26.6.2014 at the MariaDB Roadshow in Paris, France.
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl (ITCamp)
Storage Spaces Direct will provide new, previously unseen possibilities for Microsoft's hypervisor Hyper-V. These are, on the one hand, a high-performance, highly available Scale-Out File Server with the ability to use internal, non-shared disks such as SATA HDDs, SSDs, and even NVMe devices. On the other hand, you can build a hyper-converged Hyper-V cluster where the VMs and their storage run on the same servers. And let's not forget Azure Stack! The first version of Microsoft's private/hosted cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology that will have its role in future private and hosted cloud infrastructure implementations.
QCT Ceph Solution - Design Consideration and Reference Architecture (Patrick McGarry)
This document discusses QCT's Ceph storage solutions, including an overview of Ceph architecture, QCT hardware platforms, Red Hat Ceph software, workload considerations, reference architectures, test results and a QCT/Red Hat whitepaper. It provides technical details on QCT's throughput-optimized and capacity-optimized solutions and shows how they address different storage needs through workload-driven design. Hands-on testing and a test drive lab are offered to explore Ceph features and configurations.
QCT Ceph Solution - Design Consideration and Reference Architecture (Ceph Community)
This document discusses QCT's Ceph storage solutions, including an overview of Ceph architecture, QCT hardware platforms, Red Hat Ceph software, workload considerations, benchmark testing results, and a collaboration between QCT, Red Hat, and Intel to provide optimized and validated Ceph solutions. Key reference architectures are presented targeting small, medium, and large storage capacities with options for throughput, capacity, or IOPS optimization.
Red Hat Storage Day New York - New Reference Architectures (Red_Hat_Storage)
The document provides an overview and summary of Red Hat's reference architecture work including MySQL and Hadoop, software-defined NAS, and digital media repositories. It discusses trends toward disaggregating Hadoop compute and storage and various data flow options. It also summarizes performance testing Red Hat conducted comparing AWS EBS and Ceph for MySQL workloads, and analyzing factors like IOPS/GB ratios, core-to-flash ratios, and pricing. Server categories and vendor examples are defined. Comparisons of throughput and costs at scale between software-defined scale-out storage and traditional enterprise NAS solutions are also presented.
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas... (Red_Hat_Storage)
Red Hat Ceph Storage can utilize flash technology to accelerate applications in three ways: 1) utilize flash caching to accelerate critical data writes and reads, 2) utilize storage tiering to place performance critical data on flash and less critical data on HDDs, and 3) utilize all-flash storage to accelerate performance when all data is critical or caching/tiering cannot be used. The document then discusses best practices for leveraging NVMe SSDs versus SATA SSDs in Ceph configurations and optimizing Linux settings.
In-memory Caching in HDFS: Lower Latency, Same Great Taste (DataWorks Summit)
This document discusses in-memory caching in HDFS to improve query latency. The implementation caches important datasets in the DataNode memory and allows clients to directly access cached blocks via zero-copy reads without checksum verification. Evaluation shows the zero-copy reads approach provides significant performance gains over short-circuit and TCP reads for both microbenchmarks and Impala queries, with speedups of up to 7x when the working set fits in memory. MapReduce jobs see more modest gains as they are often not I/O bound.
Logging at OVHcloud:
Logs Data Platform is OVHcloud's centralized log collection, analysis, and management platform. Its goal is to meet the challenges of indexing more than 4,000 billion logs for a company like OVHcloud. This presentation describes the overall architecture of Logs Data Platform around its central components, Elasticsearch and Graylog, and covers the scalability, availability, performance, and evolvability issues that are the day-to-day work of the Observability team at OVHcloud.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2... (Alan Dix)
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and educations, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
AI and Data Privacy in 2025: Global Trends (InData Labs)
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, presentation slides, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
Big Data Analytics Quick Research Guide by Arthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Dev Dives: Automate and orchestrate your processes with UiPath Maestro (UiPathCommunity)
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
How Can I use the AI Hype in my Business Context? (Daniel Lehner)
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
Procurement Insights Cost To Value Guide.pptx (Jon Hansen)
Procurement Insights' integrated Historic Procurement Industry Archives serve as a powerful complement, not a competitor, to other procurement industry firms. They fill critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
HCL Nomad Web – Best Practices and Managing Multiuser Environments (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web is heralded as the next generation of the HCL Notes client and offers numerous advantages, such as eliminating the need for packaging, distribution, and installation. Nomad Web client updates are installed "automatically" in the background, which significantly reduces the administrative overhead compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how the troubleshooting process in HCL Nomad Web can be simplified to ensure a smooth and efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Finding and interpreting log files
- Accessing the data folder in the browser's cache (using OPFS)
- Understanding the differences between single-user and multi-user scenarios
- Using the client clocking feature
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf (Abi john)
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy now and implement the key findings to improve your business.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11 years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading mobile app development company in Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understanding the differences between single- and multi-user scenarios
- Utilizing Client Clocking
Drupalcamp Finland – Measuring Front-end Energy ConsumptionExove
Accelerating HBase with NVMe and bucket cache
1. Evaluating NVMe drives for accelerating HBase
Nicolas Poggi and David Grier
Denver/Boulder BigData 2017
BSC Data Centric Computing – Rackspace collaboration
2. Outline
1. Intro on Rackspace, BSC, and ALOJA
2. Cluster specs and disk benchmarks
3. HBase use case with NVMe
  1. Read-only workload (different strategies)
  2. Mixed workload
4. Summary
3. Barcelona Supercomputing Center (BSC)
• Spanish national supercomputing center with a 22-year history in computer architecture, networking, and distributed systems research
• Based at BarcelonaTech University (UPC)
• Large ongoing life science computational projects
• Prominent body of research activity around Hadoop
  • 2008-2013: SLA Adaptive Scheduler, Accelerators, Locality Awareness, Performance Management (7+ publications)
  • 2013-Present: cost-efficient upcoming Big Data architectures (ALOJA, 6+ publications)
5. ALOJA: towards cost-effective Big Data
• Research project for automating the characterization and optimization of Big Data deployments
• Open source benchmarking-to-insights platform and tools
• Largest public Big Data repository (70,000+ jobs)
• Community collaboration with industry and academia
http://aloja.bsc.es
[Platform diagram: Big Data benchmarking, online repository, Web/ML analytics]
6. Motivation and objectives
• Explore use cases where NVMe devices can speed up Big Data apps
  • Poor initial results…
  • HBase (this study), based on an Intel report
• Measure the possibilities of NVMe devices
  • System-level benchmarks (FIO, Iometer)
• WiP towards tiered storage for Big Data
  • Extend ALOJA into low-level I/O
• Challenge: benchmark and stress high-end Big Data clusters in a reasonable amount of time (and cost)
First tests (chart): running time of terasort (1TB) under different disks, lower is better: 8512, 8667, 8523, and 9668 seconds for the NVMe, JBOD10+NVMe, JBOD10, and JBOD05 configurations respectively (teragen times also shown). Marginal improvement!!!
7. Cluster and drive specs
All nodes (x5):
• Operating system: CentOS 7.2
• Memory: 128GB
• CPU: single octo-core (16 threads)
• OS disks: 2x 600GB SAS, RAID1
• Network: 10Gb/10Gb redundant
Master node (x1; extra nodes for HA not used in these tests):
• Master storage: 4x 600GB SAS, RAID10, XFS partition
Data nodes (x4):
• NVMe (cache): 1.6TB Intel DC P3608 NVMe SSD (PCIe)
• HDFS data storage: 10x 0.6TB NL-SAS/SATA JBOD, PCIe3 x8, 12Gb/s SAS RAID, Seagate ST3600057SS 15K (XFS partition)
Drives compared:
1. Intel DC P3608 (current generation, 2015)
  • PCI-Express 3.0 x8 lanes
  • Capacity: 1.6TB (two drives)
  • Seq. R/W BW: 5000/2000 MB/s (128k req)
  • Random 4k R/W IOPS: 850k/150k
  • Random 8k R/W IOPS: 500k/60k (8 workers)
  • Price: $10,780 (online search 02/2017)
2. LSI Nytro WarpDrive 4-400 (old generation, 2012)
  • PCI-Express 2.0 x8 lanes
  • Capacity: 1.6TB (two drives)
  • Seq. R/W BW: 2000/1000 MB/s (256k req)
  • R/W IOPS: 185k/120k (8k req)
  • Price: $4,096 (online search 02/2017); $12,195 MSRP in 2012
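For comparison, the quoted street prices work out to roughly $6.7/GB for the 1.6TB P3608 ($10,780 / 1600GB) versus about $2.6/GB for the 1.6TB WarpDrive ($4,096 / 1600GB).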
8. FIO Benchmarks
Objectives:
• Assert vendor specs (BW, IOPS, latency): seq R/W and random R/W
• Verify driver/firmware and OS
• Set performance expectations
Commands are given in the references on the last slides; a sketch of the kind of parameter sweep run is shown after this slide.
[Chart: max bandwidth (bytes/s) and max latency (µs) vs. request size]
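The exact fio command lines are not reproduced in this transcript. Below is a minimal sketch of the kind of sweep described above (request size, I/O depth, and number of jobs, using the libaio engine); the device path, sweep values, runtimes, and output file naming are assumptions for illustration, not the study's settings.

import itertools
import subprocess

# Hypothetical device path and sweep ranges. WARNING: writing to a raw
# device destroys its contents; use a scratch device only.
DEVICE = "/dev/nvme0n1"
REQ_SIZES = ["4k", "8k", "64k", "128k", "1m", "4m"]
IO_DEPTHS = [1, 4, 16, 32]
NUM_JOBS = [1, 8]
PATTERNS = ["read", "write", "randread", "randwrite"]

for bs, depth, jobs, rw in itertools.product(REQ_SIZES, IO_DEPTHS, NUM_JOBS, PATTERNS):
    out = f"fio_{rw}_{bs}_qd{depth}_j{jobs}.json"
    subprocess.run(
        ["fio", "--name=sweep",
         f"--filename={DEVICE}",
         f"--rw={rw}", f"--bs={bs}",
         f"--iodepth={depth}", f"--numjobs={jobs}",
         "--ioengine=libaio", "--direct=1",
         "--runtime=60", "--time_based",
         "--group_reporting",
         "--output-format=json", f"--output={out}"],
        check=True,
    )

The per-device maxima shown on the next slides would then be obtained by taking, for each device, the best bandwidth and the average latency across the JSON outputs of such a sweep.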
9. FIO results: Max Bandwidth
Higher is better. Max BW recorded for each device under different settings (request size, io depth, threads), using libaio.
Results:
• Random R/W is similar in both PCIe SSDs, but not for the SAS JBOD
• The SAS JBOD achieves high sequential R/W: 10 disks reach about 2GB/s
• Both PCIe vendors' numbers were achieved, also on IOPS
• Combining the two WPD disks only improves write performance
Max bandwidth (MB/s) per disk type (new cluster: Intel NVMe and SAS 15K; old cluster: PCIe SSD WPD):

             NVMe (2x P3608)   SAS JBOD (10 disks)   SAS disk (1x 15K)   PCIe WPD (1 disk)   PCIe WPD (2 disks)
randread     4674.99           409.95                118.07              1935.24             4165.65
randwrite    2015.27           843.00                249.06              1140.96             1256.12
read         4964.44           1861.40               198.54              2033.52             3957.42
write        2006.00           1869.49               204.17              1201.52             2066.48
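As a rough consistency check, a single 15K SAS disk reads at about 200 MB/s sequentially in this table, so ten of them in the JBOD adding up to roughly 2GB/s (1861 MB/s measured) is in line with expectations.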
10. FIO results: Latency (smoke test)
Lower is better. Average latency for a 64KB request size and io depth of 1 (varying workers), using libaio.

Average latency by device (µs), 64KB request size, io depth 1:

             NVMe (2x P3608)   SAS JBOD (10 disks)   PCIe WPD (1 disk)   PCIe WPD (2 disks)
randread     381.43            5823.50               519.61              250.06
randwrite    389.90            1996.90               1340.35             252.96
read         369.53            294.33                405.65              204.39
write        369.42            280.20                852.03              410.14
Results:
• The JBOD has the highest latency for random R/W (as expected), but very low latency for sequential I/O
• Combining the two WPD disks lowers latency, below that of the P3608 drives
Notes:
• A more thorough comparison at different settings is still needed.
11. HBase in a nutshell
• Highly scalable Big Data key-value store on top of Hadoop (HDFS)
• Based on Google's Bigtable
• Real-time and random access: indexed, low latency, block cache and Bloom filters
• Linear, modular scalability
• Automatic sharding of tables and failover
• Strictly consistent reads and writes; failover support
• Production ready and battle tested
• Building block of other projects
[Figure: HBase R/W architecture; source: HDP documentation]
(A small access-pattern sketch follows below.)
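To make the key-value access model concrete, here is a minimal sketch of random reads and writes against HBase from Python. It assumes the third-party happybase client and a running HBase Thrift gateway, neither of which is part of this study; the table and column-family names are made up.

import happybase  # third-party client that talks to the HBase Thrift gateway

# Assumed connection details and schema: placeholders, not from the slides.
connection = happybase.Connection("hbase-thrift-host", port=9090)
table = connection.table("usertable")

# Random-access write: one row keyed by a user id, one column family "f"
table.put(b"user1000", {b"f:field0": b"some value"})

# Random-access (indexed, low-latency) read of the same row
row = table.row(b"user1000")
print(row.get(b"f:field0"))

connection.close()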
12. L2 Bucket Cache (BC) in HBase
[Figure: region server (worker) memory layout with BC]
• Adds a second "block" storage tier for HFiles
• Use case: an L2 cache that replaces the OS buffer cache
• Does copy-on-read
• Fixed size, reserved on startup
• 3 different modes (a hedged configuration sketch follows below):
  • Heap: marginal improvement; divides memory with the block cache
  • Offheap (in RAM): uses Java NIO's Direct ByteBuffer
  • File: any local file / device; bypasses HDFS; saves RAM
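The slides do not show the actual configuration used. As an illustration, the three modes are normally selected with the standard hbase.bucketcache.ioengine and hbase.bucketcache.size properties in hbase-site.xml; the sizes below mirror the 32GB/250GB figures from the experiment summary, while the cache file path is a placeholder. A minimal sketch:

# Hedged sketch: BucketCache settings for the three modes described above.
# Property names are standard HBase 1.x options; paths are illustrative.
BUCKET_CACHE_MODES = {
    # L2 cache carved out of the Java heap (shares memory with the L1 BlockCache)
    "heap": {
        "hbase.bucketcache.ioengine": "heap",
        "hbase.bucketcache.size": "8192",          # MB
    },
    # Off-heap RAM via NIO direct ByteBuffers
    # (typically also needs direct memory sized, e.g. HBASE_OFFHEAPSIZE in hbase-env.sh)
    "offheap": {
        "hbase.bucketcache.ioengine": "offheap",
        "hbase.bucketcache.size": "32768",         # MB, ~32GB per worker as in the experiments
    },
    # File-backed cache on a fast local device (RAM disk or NVMe), bypassing HDFS
    "file_nvme": {
        "hbase.bucketcache.ioengine": "file:/mnt/nvme/bucketcache.data",  # placeholder path
        "hbase.bucketcache.size": "256000",        # MB, ~250GB per worker as in the experiments
    },
}

def to_hbase_site_xml(props: dict) -> str:
    """Render a property dict as hbase-site.xml <property> entries."""
    entries = "\n".join(
        f"  <property><name>{k}</name><value>{v}</value></property>"
        for k, v in props.items()
    )
    return f"<configuration>\n{entries}\n</configuration>"

print(to_hbase_site_xml(BUCKET_CACHE_MODES["file_nvme"]))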
13. L2-BucketCache experiments summary
Tested configurations for HBase v1.2.4:
1. HBase default (baseline)
2. HBase w/ BucketCache offheap (size: 32GB per worker node)
3. HBase w/ BucketCache in a RAM disk (size: 32GB per worker node)
4. HBase w/ BucketCache on the NVMe disk (size: 250GB per worker node)
All use the same Hadoop and HDFS configuration (a sketch of the shared HDFS settings follows below):
• HDFS on the JBOD (10 SAS disks, /grid/{0..9})
• 1 replica, short-circuit reads
Experiments:
1. Read-only (workload C)
  1.1 RAM at 128GB / node
  1.2 RAM at 32GB / node
  1.3 Clearing the buffer cache
2. Full YCSB (workloads A-F)
  • RAM at 128GB / node
  • RAM at 32GB / node
Payload: YCSB, 250M records (~2TB raw in HDFS)
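For reference, the shared HDFS settings implied by this slide (one replica, short-circuit reads, JBOD data directories) would look roughly like the sketch below; the property names are standard Hadoop/HDFS options, while the socket path and data-directory layout are illustrative placeholders.

# Hedged sketch of the shared HDFS settings implied above; not the study's exact files.
HDFS_SHARED_CONF = {
    "dfs.replication": "1",
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",  # placeholder path
    # Ten JBOD mount points, /grid/0 .. /grid/9 (subdirectory name is assumed)
    "dfs.datanode.data.dir": ",".join(f"/grid/{i}/hdfs/data" for i in range(10)),
}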
15. E 1.1: Throughput of the 4 configurations (128GB RAM)
Higher is better for ops/sec (Y1), lower is better for latency (Y2). (Offheap runs 2 and 3 are the same run.)
Results:
• Ops/sec improve with the BC: offheap 1.6x, RAMd 1.5x, NVMe 2.3x
• Average latency (request time) improves as well
• The first run is slower in all 4 cases (it warms the OS cache and the BC)
  • Baseline and RAMd are only 6% faster afterwards; NVMe is 16% faster
  • A 3rd run is not faster: the cache is already loaded
• An onheap config was also tested: more than 8GB failed, and 8GB was slower than the baseline
Throughput (ops/sec) of 3 consecutive iterations of Workload C (128GB):

                     Baseline   BC Offheap   BC RAM disk   BC NVMe
WorkloadC_run1       105610     133981       161712        218342
WorkloadC_run2       111530     175483       171236        257017
WorkloadC_run3       111422     175483       170889        253625
Cache time %         5.5        31           5.6           16.2
Speedup (run3)       1          1.57         1.53          2.28
Latency µs (run3)    4476       2841         2917          1964
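For reference, the speedup row is run-3 throughput relative to the baseline, e.g. 253625 / 111422 ≈ 2.28 for the NVMe BucketCache, and average latency moves the other way (from 4476µs down to 1964µs).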
16. E 1.1 Cluster resource consumption: Baseline
[Charts: CPU % (avg), disk R/W MB/s (sum), memory usage KB (avg), network R/W Mb/s (sum)]
Notes:
• The Java heap and OS buffer cache hold 100% of the working set
• Data is read from the disks (HDFS) only in the first part of the run; throughput then stabilizes (see NET and CPU)
• Resources are left free; the bottleneck is in the application and OS path (not shown)
17. E 1.1 Cluster resource consumption: Bucket Cache strategies
[Charts: disk R/W for Offheap (32GB), RAM disk (tmpfs 32GB), and NVMe (250GB); the BC fills on the 1st run; the working set does not fit completely in the 32GB caches]
Notes:
• All 3 BC strategies are faster than the baseline
• The BC's LRU is more effective than the OS buffer cache
• Offheap is slightly more efficient than RAMd (same size), but it seems to take longer to fill (differs per node) and needs more capacity for the same payload (plus the Java heap)
• NVMe can hold the complete working set in the BC
• Reads and writes to memory are not captured by the charts
18. E 1.2-3
Challenge: limit the effect of the OS buffer cache on the experiments.
• 1st approach: a larger payload. Con: high execution time
• 2nd: limit the available RAM (using the stress tool)
• 3rd: clear the buffer cache periodically (drop caches); a minimal sketch is shown below
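The slides do not show how the buffer cache was cleared. One common mechanism (assumed here) is to write to /proc/sys/vm/drop_caches on every data node at a fixed interval, which is what the following sketch does; E1.3 uses a 10-second interval.

import os
import time

INTERVAL_SECS = 10  # E1.3 drops the OS cache every 10 seconds

def drop_page_cache() -> None:
    """Ask the kernel to drop the page cache (requires root)."""
    os.sync()  # flush dirty pages first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("1\n")  # 1 = page cache only; 3 would also drop dentries and inodes

if __name__ == "__main__":
    while True:
        drop_page_cache()
        time.sleep(INTERVAL_SECS)

Limiting the available RAM, by contrast, can be done by pinning memory with the stress tool's virtual-memory workers, as mentioned on the slide.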
19. E1.2: Throughput of the 4 configurations (32GB RAM)
Higher is better for ops/sec (Y1), lower is better for latency (Y2).
Results:
• Ops/sec improve substantially only with NVMe, by up to 8x
• RAMd performs close to the baseline
• The first run is the same as the second on baseline and RAMd: tmpfs "blocks" as RAM is needed
• At lower RAM capacity, an external BC shows more improvement
Throughput (ops/sec) of 2 consecutive iterations of Workload C (32GB):

                     Baseline   BC Offheap   BC RAM disk   BC NVMe
WorkloadC_run1       20578      14715.493    21520         109488
WorkloadC_run2       20598      16995        21534         166588
Speedup (run2)       1          0.83         1.05          8.09
Cache time %         99.9       99.9         99.9          48
Latency µs (run2)    24226      29360        23176         2993
20. E 1.2 Cluster CPU% (avg): Bucket Cache (32GB RAM)
[CPU% and disk-throughput charts for Baseline, RAM disk (/dev/shm 8GB), Offheap (4GB), and NVMe (250GB). Read disk throughput sits between 2.4 and 2.8GB/s for the first three configurations and reaches 38GB/s with NVMe; chart annotations mark a BC failure and the slowest configuration.]
21. E1.3: Throughput of the 4 configurations (dropping the OS buffer cache)
Higher is better for ops/sec (Y1), lower is better for latency (Y2). The cache is dropped every 10 seconds.
Results:
• Ops/sec improve dramatically only with NVMe, by up to 9x
• RAMd performs 1.43x better this time
• The first run is the same as the second on baseline and RAMd, but here the RAMd worked fine
• Having a larger BC improves performance over the RAMd
Throughput (ops/sec) of 2 consecutive iterations of Workload C (dropping the OS cache):

                     Baseline   BC Offheap   BC RAM disk   BC NVMe
WorkloadC_run1       22780      30593        32447         126306
WorkloadC_run2       22770      30469        32617         210976
Speedup (run2)       1          1.34         1.43          9.27
Cache time %         -0.1       -0.1         0.5           67
Latency µs (run2)    21924      16375        15293         2361
23. Benchmark suite: The Yahoo! Cloud Serving Benchmark (YCSB)
• Open source specification and kit for comparing NoSQL databases (since 2010)
• Core workloads:
  • A: Update heavy (50/50 R/W)
  • B: Read mostly (95/5 R/W mix)
  • C: Read only (100% read)
  • D: Read latest (inserts new records and reads them)
  • E: Short ranges (short ranges of records are queried instead of individual records; not used here, as SCAN-type runs take too long)
  • F: Read-modify-write (read a record, modify it, and write it back)
https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
(A hedged example of how these workloads are driven is shown below.)
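The exact YCSB command lines are not in the slides. The sketch below shows how a 250M-record load and a Workload C run against HBase might be launched; the hbase10 binding name, table and column-family names, thread count, and operation count are assumptions for illustration.

import subprocess

RECORDS = 250_000_000          # 250M records, as in the payload description
COMMON = [
    "-p", "table=usertable",          # assumed table name (YCSB default)
    "-p", "columnfamily=family",      # assumed column family
    "-p", f"recordcount={RECORDS}",
    "-threads", "64",                 # assumed client thread count
]

# Load phase (datagen): write-only insert of the records
subprocess.run(
    ["bin/ycsb", "load", "hbase10", "-P", "workloads/workloadc", *COMMON],
    check=True,
)

# Run phase: Workload C (100% reads)
subprocess.run(
    ["bin/ycsb", "run", "hbase10", "-P", "workloads/workloadc",
     "-p", f"operationcount={RECORDS}", *COMMON],
    check=True,
)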
24. E2.1: Throughput and speedup for all workloads (128GB RAM)
Higher is better.
Results:
• Datagen is the same in all configurations (write-only)
• Overall, ops/sec improve with the BC: RAMd 14%, NVMe 37%
• WL D gets the highest speedup with NVMe
• WL F is 6% faster on RAMd than on NVMe
• More iterations would be needed to see the maximum improvement
Throughput (ops/sec) of workloads A-D,F (128GB, 1 iteration):

           Baseline   BC RAM disk   BC NVMe
Datagen    68502      67998         65933
WL A       77049      83379         96752
WL B       80966      87788         115713
WL C       89372      99403         132738
WL D       136426     171123        244759
WL F       48699      65223         62496

Speedup of workloads A-D,F (128GB, 1 iteration):

           Baseline   BC RAM disk   BC NVMe
Datagen    1          0.99          0.96
WL A       1          1.08          1.26
WL B       1          1.08          1.43
WL C       1          1.11          1.49
WL D       1          1.25          1.79
WL F       1          1.34          1.28
Total      1          1.14          1.37
25. E2.1: Throughput and speedup for all workloads (32GB RAM)
Higher is better.
Results:
• Datagen is slower with the RAMd (less RAM left for the OS)
• Overall, ops/sec improve with the BC: RAMd 17%, NVMe 87%
• WL C gets the highest speedup with NVMe
• WL F is now faster with NVMe
• More iterations would be needed to see the maximum improvement
Speedup of workloads A-D,F (32GB, 1 iteration):

           Baseline   BC RAM disk   BC NVMe
Datagen    1          0.88          0.98
WL A       1          1.02          1.37
WL B       1          1.31          2.38
WL C       1          1.42          2.65
WL D       1          1.10          1.50
WL F       1          1.27          1.98
Total      1          1.17          1.81

Throughput (ops/sec) of workloads A-D,F (32GB, 1 iteration):

           Baseline   BC RAM disk   BC NVMe
Datagen    68821      60819         67737
WL A       55682      56702         76315
WL B       33551      43978         79895
WL C       30631      43420         81245
WL D       85725      94631         128881
WL F       25540      32464         50568
26. E2.1: CPU and disk for all workloads (32GB RAM)
[Charts: baseline vs. NVMe CPU% and disk R/W]
• Baseline: high I/O wait; read disk throughput of 1.8GB/s
• NVMe: moderate I/O wait (about half the time); read disk throughput of up to 25GB/s
27. E2.1: Network and memory for all workloads (32GB RAM)
[Charts: baseline vs. NVMe memory usage and network R/W]
• Baseline: higher OS cache utilization; network throughput of about 1Gb/s
• NVMe: lower OS cache utilization; network throughput of up to 2.5Gb/s
29. Bucket Cache results recap (medium-sized WL)
• Full cluster (128GB RAM / node)
  • WL-C: up to 2.7x speedup (warm cache)
  • Full benchmark (CRUD): 0.3 to 0.9x speedup (cold cache)
• Limiting resources
  • 32GB RAM / node
    • WL-C with NVMe gets up to 8x improvement (warm cache); the other techniques failed or gave poor results
    • Full benchmark: between 0.4 and 2.7x speedup (cold cache)
  • Dropping the OS cache, WL-C
    • Up to 9x with NVMe, only < 0.5x with the other techniques (warm cache)
• Latency reduces significantly with cached results
• An onheap BC is not recommended; just give more RAM to the BlockCache
30. Open challenges / lessons learned
• Generating application-level workloads that stress newer HW, at acceptable time and cost, is hard
  • Micro-benchmarks still need to be run per device, node, and cluster
  • Working sets need to be large: > RAM (128GB / node) and > NVMe (1.6TB / node)
• The OS buffer cache is highly effective, at least with HDFS and HBase
  • Still, an application-level RAM "L2 cache" can provide a speedup
  • The app-level LRU is more effective with YCSB's Zipfian distribution (popular records)
• The larger the WL, the higher the gains
  • This can be simulated effectively by limiting resources or dropping caches
[Chart: running time of terasort (1TB) under different disks, lower is better: 8512, 8667, 8523, and 9668 seconds for NVMe, JBOD10+NVMe, JBOD10, and JBOD05 respectively (teragen times also shown).]
31. To conclude…
• NVMe offers significant BW and latency improvements over SAS/SATA, but:
  • JBODs still perform well for sequential R/W, and are cheaper per TB
  • Big Data apps are still designed for rotational media (they avoid random I/O)
• Full tiered-storage support is missing from Big Data frameworks
  • Byte-addressable vs. block access
  • Research shows improvements, but you need to rely on external tools and file systems: Alluxio (Tachyon), Triple-H, new file systems (SSDFS), …
• Fast devices provide a speedup, but caching is still the simple use case…
32. References
• ALOJA
  • http://aloja.bsc.es
  • https://github.com/aloja/aloja
• Bucket Cache and HBase
  • BlockCache (and BucketCache) 101: http://www.n10k.com/blog/blockcache-101/
  • Intel brief on the bucket cache: http://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/apache-hbase-block-cache-testing-brief.pdf
  • HBase status report (Hadoop Summit Europe 2014): http://www.slideshare.net/larsgeorge/hbase-status-report-hadoop-summit-europe-2014
  • HBase performance: http://www.slideshare.net/bijugs/h-base-performance
• Benchmarks
  • FIO: https://github.com/axboe/fio
  • Brian F. Cooper et al. 2010. Benchmarking cloud serving systems with YCSB. http://dx.doi.org/10.1145/1807128.1807152