Cassandra Day Atlanta 2015: Software Development with Apache Cassandra: A Wal...DataStax Academy
Adding a new technology to your development process can be challenging, and the distributed nature of Apache Cassandra can make it daunting. However the drivers, utilities and tooling now available for Apache Cassandra make this process as familiar as possible to developers, with a few minor caveats. After all, it is still a distributed system.
In this presentation, we will do several quick iterations through a simple Java project, demonstrating the following:
• Creating and modifying a data model
• Writing some code working with this model
• Using your local environment for single and multi-node cluster tests
• Integration testing with Jenkins
• Sending it off to production
New and existing users will leave this presentation with the necessary knowledge to make their next Apache Cassandra-based project a success.
This document discusses various techniques for improving Drupal performance and scaling. It covers optimizing hardware resources like RAM and PHP opcode caches. Front-end optimizations include JavaScript and CSS aggregation, caching, and compression. Database engines and using a content delivery network can help. The Pressflow distribution of Drupal is optimized for performance. Monitoring and measuring performance is also important.
This is my talk from the July LVL.UP KL meeting (formerly WebCamp KL) held on August 6th at Mindvalley, Bangsar.
The talk covers a basic introduction to scalability, 5 things to consider/think about and 5 things you can do build at scale.
WebCampKL Group is here - https://www.facebook.com/groups/webcamp/
The video of this talk is available here: http://youtu.be/Djs-8lGpz_U (also added as the 19th slide).
My presentation from Wordconf 2011 about High Performance Wordpress. Covers tuning the whole LAMP stack, some stuff on Wordpress and Caching (both plugins and Varnish).
JEEconf - Nikolas Ischenko - Java embedded why 8 not 11 (one comma was missed)Nikolai Ischenko
1) The document discusses Java versions for use on Raspberry Pi devices, noting that while Java 11 could be used its larger size may not be optimal.
2) It explores using Java 8 instead and demonstrates how to customize the Java runtime using the Liberica toolkit to reduce the size using tools like jrecreate and jlink.
3) Test results show the customized Java 8 runtime has a much smaller footprint than a default Java 11 installation and provides improved startup times using features like Application Class Data Sharing.
OpenNebulaconf2017US: Rapid scaling of research computing to over 70,000 cor...OpenNebula Project
Since 2008, Harvard Research Computing has undertaken a significant scaling challenge increasing their available HPC and storage from 200 cores and 20TB to over 70,000 cores and 35PB of storage. James will discuss the journey and the highlights of extending the computing to support world class research and education. During the evolution of the computing platforms at Harvard they also helped to support and build the Massachusetts Green High Performance Computing Center which is a dedicated high performance research computing facility in Holyoke, MA. This facility continues to support large scale research computing with sustainable energy and advanced networking. Recently the NESE project (New England Storage Exchange) was funded by the National Science Foundation. This is a multi-petabyte object store that is supported by the existing MGHPCC facility supporting the region. The Data Science Initiative at Harvard has also been recently announced and will require even further advanced computation to support their research faculty. Now as the world takes a grip on "cloud" but more importantly remotely provisioned infrastructure, hybrid models for compute and storage are required along with flexibility to be able to further accelerate science. James will discuss their strategy moving forwards and the current and existing infrastructures in place to allow for seamless provisioning of research computing. Justin Riley Team Lead at Harvard, will follow this talk with a deep technical discussion of the specific implementation of the systems that Harvard are designing in concert with the development teams and leadership at OpenNebula to support research computing to make their platforms more resilient and able to continue to scale.
The document discusses optimizing WordPress performance. It recommends minimizing frontend assets like images, implementing caching for assets and application chunks, optimizing themes and plugins, and choosing efficient server setups. Specific plugins like W3 Total Cache and a CDN can improve performance by up to 10 times by caching static content. Nginx is presented as a faster alternative to Apache. Overall, the key takeaways are to simplify code, minimize requests, optimize caching, and reduce payload sizes to improve perceived and actual performance.
OpenNebula provides features for high availability of virtual machines including migrating or recreating VMs if the host fails. It allows grouping VMs together by affinity like VM to host, VM to VM, or role to role. Network migrations can be optimized to use faster interfaces. Reusing VLANs involves collecting used IDs from the database and returning the first free ID. Virtual machines can utilize cgroups for CPU management, pass raw parameters to hypervisors, and use a guest agent for tasks like freezing for consistent snapshots or blocking during backups.
WebLogic Stability; Detect and Analyse Stuck ThreadsMaarten Smeets
Stuck threads are a major cause for stability issues of WebLogic Server environments. Often people in operations and development who are confronted with stuck threads, are at a loss what to do. In this presentation we will talk about what stuck threads actually are and how you can detect them. We will elaborate on how you can get to the root cause of a stuck thread and which tools can help you with that. In order to reduce the impact of having stuck threads in an application, we will talk about using workmanagers. In order to prevent stuck threads we will illustrate several patterns which can be implemented in infrastructure and applications. Next time you see a stuck thread, you will know what to do!
This document summarizes migrating from MySQL replication to Galera Cluster. It describes Galera Cluster as providing synchronous multi-master replication with automatic failover. The migration procedure involves converting an existing asynchronous slave to Galera, building up the Galera Cluster to the desired size, switching readers and writers over to the cluster, and then deactivating the original asynchronous replication. Key benefits of Galera Cluster include strong consistency, high availability, and ability to join new nodes automatically.
Moving mongo db to the cloud strategies and points to considerVinicius M Grippa
Moving to the cloud brings a series of benefits. Flexibility, scalability, automation. But what about the precautions we need to take? This session will cover the main points to take care of before moving your database such as security, performance and options available in the cloud.
What's New in Postgres Plus Advanced Server 9.3EDB
Learn more about EnterpriseDB's Postgres Plus Advanced Server 9.3!
Highlights of Postgres Plus Advanced Server 9.3 include:
Major Partitioning Enhancements
Materialized Views
New RPM packages
New EDB Failover Manager
New capabilities in Postgres Enterprise Manager 4.0
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...OpenNebula Project
OpenNebula provides the infrastructure, the Virtual Machines themselves. But what if you want to deploy applications on those VMs? What is the best method to achieve that? This session is a case study on how we have been using OpenNebula and OneFlow to deploy multi-VM services, and how we have integratet it with Ansible in order to deploy applications.
This is a technical talk where we will look a bit at some Ruby code to integrate with Ansible, a bit of Ansible specific information and with a special focus on application deployment workflows.
This document discusses optimizing Ceph latency through hardware design. It finds that CPU frequency has a significant impact on latency, with higher frequencies resulting in lower latencies. Testing shows 4KB write latency of 2.4ms at 900MHz but 694us at higher frequencies. The document also discusses how CPU power states that wake slowly, like C6 at 85us, can negatively impact latency. Overall it advocates designing hardware with fast CPUs and avoiding slower cores or dual sockets to minimize latency in Ceph deployments.
Sascha Möllering discusses how his company moved from manual server setup and deployment to automated deployments using infrastructure as code and continuous delivery. They now deploy whenever needed using tools like Chef and JBoss to configure servers. Previously they faced challenges like manual processes, difficult rollbacks, and biweekly deployment windows. Now deployments are automated, safer, and can happen continuously.
This document discusses best practices for virtual machines (VMs), storage area networks (SANs), and SQL Server. It provides three "nevers" for VMs: never overallocate virtual CPUs, never use automatic settings, and never assume VMs are alone. It also gives three "always" for SANs: always know your neighbors on the SAN, always test storage performance first with SQLIO before SQL Server, and always be checking performance metrics. Key metrics discussed include processor queue length, SQL Server memory page life expectancy, and physical disk read/write average time. The document emphasizes testing storage, understanding competition from other workloads, and monitoring for subtle performance changes.
Drupal 8 is an even more powerful tool for creating large, fast, capable applications. With architectural improvements, support for Symfony 2, enhanced security, and better mobile integration, Drupal 8 has been eagerly awaited by the worldwide Drupal community.
As your Drupal site traffic grows, you're likely to run up against performance constraints inherent to Apache and Drupal (or any PHP-based framework). In this webinar, we'll show you how to smoothly bypass performance bottlenecks and scale your Drupal site far beyond its current limitations.
Watch the webinar on demand: https://www.nginx.com/resources/webinars/drupal-8-performance/
The document provides an overview of NoSQL databases, discussing Brewer's CAP theorem and the key aspects of availability, partition tolerance, and consistency. It then describes different types of NoSQL databases, including key-value stores, document stores, and column stores. Code examples and links to further resources on MongoDB, CouchDB, SimpleDB, and Azure Table Service are also included.
We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
Ceph Benchmarking Tool (CBT) is a Python framework for benchmarking Ceph clusters. It has client and monitor personalities for generating load and setting up the cluster. CBT includes benchmarks for RADOS operations, librbd, KRBD on EXT4, KVM with RBD volumes, and COSBench tests against RGW. Test plans are defined in YAML files and results are archived for later analysis using tools like awk, grep, and gnuplot.
The document discusses best practices for Galera Cluster, a synchronous multi-master replication solution for MySQL/InnoDB. It covers topics like dealing with conflicts in a multi-master setup, performing state transfers when adding new nodes, different backup methods that assign a global transaction ID, and techniques for upgrading schemas in a clustered environment.
Integrating Puppet with Cloud Infrastructures-Remco OverdijkMaxServ
This document discusses automating cloud infrastructure using Puppet. It begins by describing issues with traditional single server infrastructure like limited scalability and redundancy. It then introduces using tools like AWS, Puppet, and Terraform to provision infrastructure in the cloud with improved scalability, isolation, and zero-downtime deployments. It discusses using Puppet and Terraform to define and provision AWS resources declaratively. It also covers bootstrapping Puppet onto new instances using techniques like autosigning, ENCs, Hiera lookups, AWS user data, and Cloud-init to automate configuration. The document concludes with a demonstration of provisioning a stack of web servers on AWS using Terraform and Puppet.
WordPress + NGINX Best Practices with EasyEngineNGINX, Inc.
Whether for speed, security or scalability, a WordPress site can be improved using NGINX.
View full webinar on-demand at: http://nginx.com/resources/webinars/taste-nginx-conf-wordpress-nginx-best-practices-easyengine/
This document discusses different versions of popular open-source SQL databases and how to install and configure MySQL. It lists versions of MySQL, MariaDB, Percona, and XtraDB Cluster and how to download, install, start, and connect to MySQL. It also shows how to install MySQL using Debian packages or RPMs, how to view server configuration settings, and how to set permissions to allow remote root connections.
These were the opening slides used in the all-day Selenium Grid Workshop, given by Marcus Merrell and Manoj Kumar on November 14, 2016 at the 2016 London Selenium Conference
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanDataStax Academy
Abstract A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure, with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
UKOUG, Lies, Damn Lies and I/O StatisticsKyle Hailey
1. Many factors can cause storage performance anomalies that make benchmarking difficult. Caching, shared infrastructure, I/O consolidation and fragmentation, and tiered storage are some of the top issues.
2. It is important to use real workloads, capture latency histograms rather than just averages, ensure results are reproducible, and run tests long enough to reach steady state.
3. Proper testing methodology is required to accurately characterize storage performance and avoid anomalies. Tools like FIO can help simulate real workloads.
This document discusses scaling NoSQL databases and describes how Google BigTable and Amazon Dynamo influenced the development of NoSQL databases. It explains BigTable's column family data model and how it allows for multi-dimensional scaling and schema-free sparse columns. The document also notes that BigTable and Dynamo helped drive the need to consider data modeling and query requirements when designing NoSQL databases.
Slides from OOW13
The optimizer must try to be all things to all people, and similarly, the collection of optimizer statistics must try to satisfy the needs of all. And many DBA's just leave it at that. But the optimizer offers so much more than that. With a little more effort and discipline, we can achieve much more than a "one-size-fits-all" policy, and maximize the benefit of all of the optimizer features. We'll look at the tools now available under DBMS_STATS to get more stability and better performance with optimizer statistics.
WebLogic Stability; Detect and Analyse Stuck ThreadsMaarten Smeets
Stuck threads are a major cause for stability issues of WebLogic Server environments. Often people in operations and development who are confronted with stuck threads, are at a loss what to do. In this presentation we will talk about what stuck threads actually are and how you can detect them. We will elaborate on how you can get to the root cause of a stuck thread and which tools can help you with that. In order to reduce the impact of having stuck threads in an application, we will talk about using workmanagers. In order to prevent stuck threads we will illustrate several patterns which can be implemented in infrastructure and applications. Next time you see a stuck thread, you will know what to do!
This document summarizes migrating from MySQL replication to Galera Cluster. It describes Galera Cluster as providing synchronous multi-master replication with automatic failover. The migration procedure involves converting an existing asynchronous slave to Galera, building up the Galera Cluster to the desired size, switching readers and writers over to the cluster, and then deactivating the original asynchronous replication. Key benefits of Galera Cluster include strong consistency, high availability, and ability to join new nodes automatically.
Moving mongo db to the cloud strategies and points to considerVinicius M Grippa
Moving to the cloud brings a series of benefits. Flexibility, scalability, automation. But what about the precautions we need to take? This session will cover the main points to take care of before moving your database such as security, performance and options available in the cloud.
What's New in Postgres Plus Advanced Server 9.3EDB
Learn more about EnterpriseDB's Postgres Plus Advanced Server 9.3!
Highlights of Postgres Plus Advanced Server 9.3 include:
Major Partitioning Enhancements
Materialized Views
New RPM packages
New EDB Failover Manager
New capabilities in Postgres Enterprise Manager 4.0
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...OpenNebula Project
OpenNebula provides the infrastructure, the Virtual Machines themselves. But what if you want to deploy applications on those VMs? What is the best method to achieve that? This session is a case study on how we have been using OpenNebula and OneFlow to deploy multi-VM services, and how we have integratet it with Ansible in order to deploy applications.
This is a technical talk where we will look a bit at some Ruby code to integrate with Ansible, a bit of Ansible specific information and with a special focus on application deployment workflows.
This document discusses optimizing Ceph latency through hardware design. It finds that CPU frequency has a significant impact on latency, with higher frequencies resulting in lower latencies. Testing shows 4KB write latency of 2.4ms at 900MHz but 694us at higher frequencies. The document also discusses how CPU power states that wake slowly, like C6 at 85us, can negatively impact latency. Overall it advocates designing hardware with fast CPUs and avoiding slower cores or dual sockets to minimize latency in Ceph deployments.
Sascha Möllering discusses how his company moved from manual server setup and deployment to automated deployments using infrastructure as code and continuous delivery. They now deploy whenever needed using tools like Chef and JBoss to configure servers. Previously they faced challenges like manual processes, difficult rollbacks, and biweekly deployment windows. Now deployments are automated, safer, and can happen continuously.
This document discusses best practices for virtual machines (VMs), storage area networks (SANs), and SQL Server. It provides three "nevers" for VMs: never overallocate virtual CPUs, never use automatic settings, and never assume VMs are alone. It also gives three "always" for SANs: always know your neighbors on the SAN, always test storage performance first with SQLIO before SQL Server, and always be checking performance metrics. Key metrics discussed include processor queue length, SQL Server memory page life expectancy, and physical disk read/write average time. The document emphasizes testing storage, understanding competition from other workloads, and monitoring for subtle performance changes.
Drupal 8 is an even more powerful tool for creating large, fast, capable applications. With architectural improvements, support for Symfony 2, enhanced security, and better mobile integration, Drupal 8 has been eagerly awaited by the worldwide Drupal community.
As your Drupal site traffic grows, you're likely to run up against performance constraints inherent to Apache and Drupal (or any PHP-based framework). In this webinar, we'll show you how to smoothly bypass performance bottlenecks and scale your Drupal site far beyond its current limitations.
Watch the webinar on demand: https://www.nginx.com/resources/webinars/drupal-8-performance/
The document provides an overview of NoSQL databases, discussing Brewer's CAP theorem and the key aspects of availability, partition tolerance, and consistency. It then describes different types of NoSQL databases, including key-value stores, document stores, and column stores. Code examples and links to further resources on MongoDB, CouchDB, SimpleDB, and Azure Table Service are also included.
We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
Ceph Benchmarking Tool (CBT) is a Python framework for benchmarking Ceph clusters. It has client and monitor personalities for generating load and setting up the cluster. CBT includes benchmarks for RADOS operations, librbd, KRBD on EXT4, KVM with RBD volumes, and COSBench tests against RGW. Test plans are defined in YAML files and results are archived for later analysis using tools like awk, grep, and gnuplot.
The document discusses best practices for Galera Cluster, a synchronous multi-master replication solution for MySQL/InnoDB. It covers topics like dealing with conflicts in a multi-master setup, performing state transfers when adding new nodes, different backup methods that assign a global transaction ID, and techniques for upgrading schemas in a clustered environment.
Integrating Puppet with Cloud Infrastructures-Remco OverdijkMaxServ
This document discusses automating cloud infrastructure using Puppet. It begins by describing issues with traditional single server infrastructure like limited scalability and redundancy. It then introduces using tools like AWS, Puppet, and Terraform to provision infrastructure in the cloud with improved scalability, isolation, and zero-downtime deployments. It discusses using Puppet and Terraform to define and provision AWS resources declaratively. It also covers bootstrapping Puppet onto new instances using techniques like autosigning, ENCs, Hiera lookups, AWS user data, and Cloud-init to automate configuration. The document concludes with a demonstration of provisioning a stack of web servers on AWS using Terraform and Puppet.
WordPress + NGINX Best Practices with EasyEngineNGINX, Inc.
Whether for speed, security or scalability, a WordPress site can be improved using NGINX.
View full webinar on-demand at: http://nginx.com/resources/webinars/taste-nginx-conf-wordpress-nginx-best-practices-easyengine/
This document discusses different versions of popular open-source SQL databases and how to install and configure MySQL. It lists versions of MySQL, MariaDB, Percona, and XtraDB Cluster and how to download, install, start, and connect to MySQL. It also shows how to install MySQL using Debian packages or RPMs, how to view server configuration settings, and how to set permissions to allow remote root connections.
These were the opening slides used in the all-day Selenium Grid Workshop, given by Marcus Merrell and Manoj Kumar on November 14, 2016 at the 2016 London Selenium Conference
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanDataStax Academy
Abstract A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure, with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
UKOUG, Lies, Damn Lies and I/O StatisticsKyle Hailey
1. Many factors can cause storage performance anomalies that make benchmarking difficult. Caching, shared infrastructure, I/O consolidation and fragmentation, and tiered storage are some of the top issues.
2. It is important to use real workloads, capture latency histograms rather than just averages, ensure results are reproducible, and run tests long enough to reach steady state.
3. Proper testing methodology is required to accurately characterize storage performance and avoid anomalies. Tools like FIO can help simulate real workloads.
This document discusses scaling NoSQL databases and describes how Google BigTable and Amazon Dynamo influenced the development of NoSQL databases. It explains BigTable's column family data model and how it allows for multi-dimensional scaling and schema-free sparse columns. The document also notes that BigTable and Dynamo helped drive the need to consider data modeling and query requirements when designing NoSQL databases.
Slides from OOW13
The optimizer must try to be all things to all people, and similarly, the collection of optimizer statistics must try to satisfy the needs of all. And many DBA's just leave it at that. But the optimizer offers so much more than that. With a little more effort and discipline, we can achieve much more than a "one-size-fits-all" policy, and maximize the benefit of all of the optimizer features. We'll look at the tools now available under DBMS_STATS to get more stability and better performance with optimizer statistics.
Wes McKinney gave a talk at the 2015 Open Data Science Conference about data frames and the state of data frame interfaces across different languages and libraries. He discussed the challenges of collaboration between different data frame communities due to the tight coupling of user interfaces, data representations, and computation engines in current data frame implementations. McKinney predicted that over time these components would decouple and specialize, improving code sharing across languages.
Slides from OpenWorld 2013 presentation.
Analytics have been there since 8.1.6, but they are still dramatically underused by application developers. This session looks at the syntax and usage of analytic functions, and how they can supercharge your SQL skillset.
How to find and fix your Oracle application performance problemCary Millsap
How long does your code take to run? Is it changing? When it is slow, WHY is it slow? Is it your fault, or somebody else's? Can you prove it? How much faster could your code be? Do you know how to measure the performance of your code as user workloads and data volumes increase? These are fundamental questions about performance, but the vast majority of Oracle application developers can't answer them. The most popular performance tools available to them—and to the database administrators that run their code in production—are incapable of answering any of these questions. But the Oracle Database can give you exactly what you need to answer these questions and many more. You can know exactly where YOUR CODE is spending YOUR TIME. This session explains how.
An introduction to data virtualization in business intelligenceDavid Walker
A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented to the ETIS Conference in Riga, Latvia in October 2013
This document summarizes a presentation about Apache Jackrabbit Oak and its use of MongoDB for content storage. It discusses how Jackrabbit Oak uses multi-version concurrency control and copy-on-write to allow concurrent access to different document versions. Transactions are implemented by assigning commit roots and checking for collisions during commit. Properties are stored with revision IDs to allow retrieval of prior versions.
Python as part of a production machine learning stack by Michael Manapat PyDa...PyData
Over the course of three years, we've built Stripe from scratch and scaled it to process billions of dollars of transaction volume a year by making it easy and painless for merchants to get set up and start accepting payments. While the vast majority of transactions facilitated by Stripe are honest, we do need to protect our merchants from rogue individuals and groups seeing to "test" or "cash" stolen credit cards. To combat this sort of activity, Stripe uses Python (together with Scala and Ruby) as part of its production machine learning pipeline to detect and block fraud in real time. In this talk, I'll go through the scikit-based modeling process for a sample data set that is derived from production data to illustrate how we train and validate our models. We'll also walk through how we deploy the models and monitor them in our production environment and how Python has allowed us to do this at scale.
The lightning talks covered various Netflix OSS projects including S3mper, PigPen, STAASH, Dynomite, Aegisthus, Suro, Zeno, Lipstick on GCE, AnsWerS, and IBM. 41 projects were discussed and the need for a cohesive Netflix OSS platform was highlighted. Matt Bookman then gave a presentation on running Lipstick and Hadoop on Google Cloud Platform using Google Compute Engine and Cloud Storage. He demonstrated running Pig jobs on Compute Engine and discussed design considerations for cloud-based Hadoop deployments. Finally, Peter Sankauskas from @Answers4AWS discussed initial ideas around CloudFormation for Asgard and deploying various Netflix OSS
This document provides an overview of a SQL-on-Hadoop tutorial. It introduces the presenters and discusses why SQL is important for Hadoop, as MapReduce is not optimal for all use cases. It also notes that while the database community knows how to efficiently process data, SQL-on-Hadoop systems face challenges due to the limitations of running on top of HDFS and Hadoop ecosystems. The tutorial outline covers SQL-on-Hadoop technologies like storage formats, runtime engines, and query optimization.
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownDataStax
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure, with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
Caching for Performance Masterclass: The In-Memory DatastoreScyllaDB
Understanding where in-memory data stores help most and where teams get into trouble.
- Where in the stack to cache
- Memcached as a tool
- Modern cache primitives
uCluster (micro-Cluster) is a toy computer cluster composed of 3 Raspberry Pi boards, 2 NVIDIA Jetson Nano boards and 1 NVIDIA Jetson TX2 board.
The presentation shows how to build the uCluster and focuses on few interesting technologies for further consideration when building a cluster at any scale.
The project is for educational purposes and tinkering with various technologies.
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...ScyllaDB
Scylla strives to deliver high throughput at low, consistent latencies under any scenario. But in the field things can and do get slower than one would like. Some of those issues come from bad data modelling and anti-patterns. Some others from lack of resources and bad system configuration, and in rare cases even product malfunction.
But how to tell them apart? And once you do, how to understand how to fix your application or reconfigure your system? Scylla has a rich ecosystem of tools available to answer those questions and in this talk we’ll discuss the proper use of some of them and how to take advantage of each tool’s strength. We will discuss real examples using tools like CQL tracing, nodetool commands, the Scylla monitor and others.
Watch the replay: http://cs.co/9000DCie4
In today’s digital economy, getting ahead means crunching a lot of data. That’s why businesses of all sizes and industries are investing in high-performance computing. However, the last thing IT needs is another tech silo to manage.
Fortunately, the new Cisco UCS C4200 Series chassis and C125 M5 server node help you scale out compute-intensive workloads with ease—with the network fabric you already have. This TechWiseTV Workshop will get you up to speed fast.
Resources:
Watch the related TechWiseTV episode: http://cs.co/9006DAVPC
TechWiseTV: http://cs.co/9009DzrjN
Caches are used in many layers of applications that we develop today, holding data inside or outside of your runtime environment, or even distributed across multiple platforms in data fabrics. However, considerable performance gains can often be realized by configuring the deployment platform/environment and coding your application to take advantage of the properties of CPU caches.
In this talk, we will explore what CPU caches are, how they work and how to measure your JVM-based application data usage to utilize them for maximum efficiency. We will discuss the future of CPU caches in a many-core world, as well as advancements that will soon arrive such as HP's Memristor.
This document discusses streaming SIMD extensions (SSE) and how to use SIMD instructions to boost program performance. It defines SSE as a set of CPU instructions for applications like signal processing that use single instruction, multiple data (SIMD) parallelism. The document outlines what SSE is, the advantages of SIMD, how to identify if an application can benefit from SSE, different SSE versions, coding methods like assembly and intrinsics, and references for further information.
CUDA 6.0 provides performance improvements and new features for several CUDA libraries and tools. Key updates include up to 2x faster kernel launches, new cuFFT and cuBLAS features for multi-GPU support, up to 700 GFLOPS performance from cuFFT, over 3 TFLOPS from cuBLAS, and 5x faster cuSPARSE performance compared to MKL. New features also improve the performance of cuRAND, NPP, and Thrust.
Feature Store Evolution Under Cost Constraints: When Cost is Part of the Arch...ScyllaDB
At P99 CONF 23, ShareChat tackled scaling its ML Feature Store to 1B features/sec—then had to cut costs while maintaining SLAs. Ivan & David share how they optimized compute, reduced waste in Kubernetes, and tackled autoscaling for Apache Flink. It's geared for anyone interested in ML Feature Store and/or cloud cost optimizations.
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
In this deck from the Hot Chips conference, Chris Nicol from Wave Computing presents: A Dataflow Processing Chip for Training Deep Neural Networks.
Watch the video: https://wp.me/p3RLHQ-k6W
Learn more: https://wavecomp.ai/
and
http://www.hotchips.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
This document discusses an all-flash Ceph array design from QCT based on NUMA architecture. It provides an agenda that covers all-flash Ceph and use cases, QCT's all-flash Ceph solution for IOPS, an overview of QCT's lab environment and detailed architecture, and the importance of NUMA. It also includes sections on why all-flash storage is used, different all-flash Ceph use cases, QCT's IOPS-optimized all-flash Ceph solution, benefits of using NVMe storage, and techniques for configuring and optimizing all-flash Ceph performance.
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
This document discusses an all-flash Ceph array design from QCT based on NUMA architecture. It provides an agenda that covers all-flash Ceph and use cases, QCT's all-flash Ceph solution for IOPS, an overview of QCT's lab environment and detailed architecture, and the importance of NUMA. It also includes sections on why all-flash storage is used, different all-flash Ceph use cases, QCT's IOPS-optimized all-flash Ceph solution, benefits of using NVMe storage, QCT's lab test environment, Ceph tuning recommendations, and benefits of using multi-partitioned NVMe SSDs for Ceph OSDs.
Using Apache Spark to analyze large datasets in the cloud presents a range of challenges. Different stages of your pipeline may be constrained by CPU, memory, disk and/or network IO. But what if all those stages have to run on the same cluster? In the cloud, you have limited control over the hardware your cluster runs on.
You may have even less control over the size and format of your raw input files. Performance tuning is an iterative and experimental process. It’s frustrating with very large datasets: what worked great with 30 billion rows may not work at all with 400 billion rows. But with strategic optimizations and compromises, 50+ TiB datasets can be no big deal.
By using Spark UI and simple metrics, explore how to diagnose and remedy issues on jobs:
Sizing the cluster based on your dataset (shuffle partitions)
Ingestion challenges – well begun is half done (globbing S3, small files)
Managing memory (sorting GC – when to go parallel, when to go G1, when offheap can help you)
Shuffle (give a little to get a lot – configs for better out of box shuffle) – Spill (partitioning for the win)
Scheduling (FAIR vs FIFO, is there a difference for your pipeline?)
Caching and persistence (it’s the cost of doing business, so what are your options?)
Fault tolerance (blacklisting, speculation, task reaping)
Making the best of a bad deal (skew joins, windowing, UDFs, very large query plans)
Writing to S3 (dealing with write partitions, HDFS and s3DistCp vs writing directly to S3)
Presented at Spark+AI Summit Europe 2019
https://databricks.com/session_eu19/apache-spark-at-scale-in-the-cloud
Using Apache Spark to analyze large datasets in the cloud presents a range of challenges. Different stages of your pipeline may be constrained by CPU, memory, disk and/or network IO. But what if all those stages have to run on the same cluster? In the cloud, you have limited control over the hardware your cluster runs on.
You may have even less control over the size and format of your raw input files. Performance tuning is an iterative and experimental process. It’s frustrating with very large datasets: what worked great with 30 billion rows may not work at all with 400 billion rows. But with strategic optimizations and compromises, 50+ TiB datasets can be no big deal.
By using Spark UI and simple metrics, explore how to diagnose and remedy issues on jobs:
Sizing the cluster based on your dataset (shuffle partitions)
Ingestion challenges – well begun is half done (globbing S3, small files)
Managing memory (sorting GC – when to go parallel, when to go G1, when offheap can help you)
Shuffle (give a little to get a lot – configs for better out of box shuffle) – Spill (partitioning for the win)
Scheduling (FAIR vs FIFO, is there a difference for your pipeline?)
Caching and persistence (it’s the cost of doing business, so what are your options?)
Fault tolerance (blacklisting, speculation, task reaping)
Making the best of a bad deal (skew joins, windowing, UDFs, very large query plans)
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Title: Sista: Improving Cog’s JIT performance
Speaker: Clément Béra
Thu, August 21, 9:45am – 10:30am
Video Part1
https://www.youtube.com/watch?v=X4E_FoLysJg
Video Part2
https://www.youtube.com/watch?v=gZOk3qojoVE
Description
Abstract: Although recent improvements of the Cog VM performance made it one of the fastest available Smalltalk virtual machine, the overhead compared to optimized C code remains important. Efficient industrial object oriented virtual machine, such as Javascript V8's engine for Google Chrome and Oracle Java Hotspot can reach on many benchs the performance of optimized C code thanks to adaptive optimizations performed their JIT compilers. The VM becomes then cleverer, and after executing numerous times the same portion of codes, it stops the code execution, looks at what it is doing and recompiles critical portion of codes in code faster to run based on the current environment and previous executions.
Bio: Clément Béra and Eliot Miranda has been working together on Cog's JIT performance for the last year. Clément Béra is a young engineer and has been working in the Pharo team for the past two years. Eliot Miranda is a Smalltalk VM expert who, among others, has implemented Cog's JIT and the Spur Memory Manager for Cog.
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxshyamraj55
We’re bringing the TDX energy to our community with 2 power-packed sessions:
🛠️ Workshop: MuleSoft for Agentforce
Explore the new version of our hands-on workshop featuring the latest Topic Center and API Catalog updates.
📄 Talk: Power Up Document Processing
Dive into smart automation with MuleSoft IDP, NLP, and Einstein AI for intelligent document workflows.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company In Saudi Arabia we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy nowand implement the key findings to improve your business.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Perfect for developers, testers, and automation enthusiasts!
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell
With expertise in data architecture, performance tracking, and revenue forecasting, Andrew Marnell plays a vital role in aligning business strategies with data insights. Andrew Marnell’s ability to lead cross-functional teams ensures businesses achieve sustainable growth and operational excellence.
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with Free Compatibility Check and help you with quick time-to-market
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
6. How do I keep my graphs pretty during
a C* upgrade?
September 18th 2013
7. Make a C* Build
$> git clone http://git-wip-
us.apache.org/repos/asf/cassandra.git
$> git checkout –t origin/cassandra-1.2
$> git log
$> vim build.xml (change version number every
time you make a build!)
$> ant clean release
8. Deployment
• Make release
• Test release with CCM
• Push release to Puppet (deals with config, etc)
• Run controlled and scripted rolling restart one datacenter
at a time
– flush
– stop
– start
– validate node
10. So, why not just
apt-get install cassandra?
• Makes running a custom release in the future a
complete nightmare
• Lost visibility into changes in the release
• WHY are you upgrading
• Treat a C* build just as if it was a release of your
code. What commits did you put into your own
release?
11. MY CODE DOESN’T WORK WITHOUT A
STABLE C* CLUSTER
Simply Put:
12. When things go wrong
• Every commit (those by C* committers or my
own) come with potential bugs and regressions
• Gossip Bugs Can Bite Hard:
– CASSANDRA-5665: Gossiper.handleMajorStateChange
can lose existing node ApplicationState
• At 48 nodes, even small mistakes are massive
13. Writing your code to deal with node
failure
• Upgrading a C* cluster means constant node
failures for the duration of the rolling restart
• How does your code deal with read latency and
retries
– CASSANDRA-4705: Eager Retries for reads for 2.0+
• The mythical “constantly failing” code != stability.
– Handle exceptions (and node/read failures) gracefully!
14. Why treat C* like your own code
• Using C* will move much of your own
application logic to C*
• The bugs have to go somewhere!
• Data replication at database layer or at
application layer