This talk introduces the JBOD setup for Apache Kafka and shows how LinkedIn saved more than 30% in Kafka storage cost by adopting it. The talk was given at the LinkedIn Streaming Meetup in May 2017.
Parquet is a columnar storage format for Hadoop data. It was developed by Twitter and Cloudera to optimize storage and querying of large datasets. Parquet provides more efficient compression and I/O compared to traditional row-based formats by storing data by column. Early results show a 28% reduction in storage size and up to a 114% improvement in query performance versus the original Thrift format. Parquet supports complex nested schemas and can be used with Hadoop tools like Hive, Pig, and Impala.
Storing time series data with Apache Cassandra - Patrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself a solid choice, and now you can learn how to do it. We'll look at possible data models and the choices you have to make to be successful. Then let's open the hood and learn how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work, and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
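To make the data-model discussion concrete, here is a minimal sketch of one common Cassandra time-series pattern: partition by a source id plus a day bucket so partitions stay bounded, and cluster rows by timestamp. It assumes the Python cassandra-driver package; the keyspace, table, and bucketing scheme are hypothetical, not taken from the slides.

```python
# Minimal time-series model sketch: partition key (sensor_id, day) bounds
# partition size; clustering on ts keeps rows ordered for range reads.
# Assumes cassandra-driver; names are hypothetical.
from datetime import datetime, timezone

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS metrics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id text,
        day       text,       -- day bucket, e.g. '2017-05-01'
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

now = datetime.now(timezone.utc)
day = now.strftime("%Y-%m-%d")
session.execute(
    "INSERT INTO metrics.readings (sensor_id, day, ts, value) VALUES (%s, %s, %s, %s)",
    ("sensor-1", day, now, 21.5),
)

# Reads hit a single partition: the latest readings for one sensor on one day.
rows = session.execute(
    "SELECT ts, value FROM metrics.readings WHERE sensor_id=%s AND day=%s LIMIT 10",
    ("sensor-1", day),
)
for row in rows:
    print(row.ts, row.value)
```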
Airflow is a workflow management system for authoring, scheduling, and monitoring workflows, or directed acyclic graphs (DAGs) of tasks. Its features include DAGs to define tasks and their relationships, operators to describe tasks, sensors to monitor external systems, hooks to connect to external APIs and databases, and a user interface for visualizing pipelines and monitoring runs. Airflow supports a variety of executors, such as SequentialExecutor, CeleryExecutor, and MesosExecutor, to run tasks on backends like Celery or Kubernetes. It provides security features such as authentication, authorization, and impersonation to manage access.
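As a flavor of those concepts, here is a minimal sketch of an Airflow 2.x DAG with two operators and a dependency between them; the DAG id, schedule, and task logic are hypothetical.

```python
# A minimal Airflow 2.x DAG: one DAG object, two operators, one dependency.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform():
    print("transforming extracted data")


with DAG(
    dag_id="example_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = PythonOperator(task_id="transform_and_load", python_callable=transform)

    extract >> load  # run transform_and_load only after extract succeeds
```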
Connection Pooling in PostgreSQL using pgbouncer - Sameer Kumar
This presentation was given at the 5th Postgres User Group meetup in Singapore.
It explains how to set up pgbouncer and shows demonstration graphs comparing the performance gains of using pgbouncer over direct connections to a PostgreSQL database.
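For readers who want to reproduce a comparison like the one in those graphs, here is a rough sketch that times repeated connect/query/close cycles against PostgreSQL directly and through pgbouncer. The ports (5432 direct, 6432 for pgbouncer), credentials, and iteration count are assumptions, not taken from the presentation.

```python
# Time N connect/query/close cycles: direct vs. via pgbouncer.
# Ports, DSN fields, and N are assumptions for illustration.
import time

import psycopg2


def time_connections(port, n=200):
    start = time.perf_counter()
    for _ in range(n):
        conn = psycopg2.connect(
            host="127.0.0.1", port=port, dbname="postgres", user="postgres"
        )
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            cur.fetchone()
        conn.close()
    return time.perf_counter() - start


direct = time_connections(5432)  # each connect spawns a new backend process
pooled = time_connections(6432)  # pgbouncer reuses pooled server connections
print(f"direct: {direct:.2f}s, via pgbouncer: {pooled:.2f}s")
```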
This document discusses benchmarking Apache Druid using the Star Schema Benchmark (SSB). It describes ingesting the SSB dataset into Druid, optimizing the data and queries, and running performance tests on the 13 SSB queries using JMeter. The results showed that Druid can answer the analytic queries with sub-second latency. Instructions are provided on how others can set up their own Druid benchmark tests to evaluate performance.
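As an illustration of the measurement approach, here is a hedged sketch that times a single SQL query against Druid's standard HTTP SQL endpoint (/druid/v2/sql), similar in spirit to the JMeter tests described; the host, datasource name, and the simplified SSB-style query are assumptions.

```python
# Time one Druid SQL query over HTTP. Host, datasource, and query are
# illustrative; /druid/v2/sql is Druid's standard SQL endpoint.
import time

import requests

DRUID_SQL = "http://localhost:8888/druid/v2/sql"

query = {
    "query": """
        SELECT SUM(lo_extendedprice * lo_discount) AS revenue
        FROM ssb_lineorder
        WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
    """
}

start = time.perf_counter()
resp = requests.post(DRUID_SQL, json=query, timeout=30)
resp.raise_for_status()
elapsed = time.perf_counter() - start

print(f"rows: {len(resp.json())}, latency: {elapsed * 1000:.1f} ms")
```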
Using Queryable State for Fun and Profit - Flink Forward
Flink Forward San Francisco 2022.
A particular feature in our system relies on a streaming 90-minute trailing window of 1-minute samples - implemented as a lookaside cache - to speed up a particular query, allowing our customers to rapidly see an overview of their estate. Across our entire customer base, there is a substantial amount of data flowing into this cache - ~1,000,000 entries/second, with the entire cache requiring ~600GB of RAM. The current implementation is simplistic but expensive. In this talk I describe a replacement implementation as a stateful streaming Flink application leveraging Queryable State. This Flink application reduces the net cost by ~90%. In this session, the implementation is described in detail, including windowing considerations, a sliding-window state buffer that avoids the sliding window replication penalty, and a comparison of queryable state and Redis queries. The talk concludes with a frank discussion of when this distinctive approach is, and is not, appropriate.
by
Ron Crocker
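To illustrate the state-buffer idea from the talk above, here is a language-agnostic sketch (written in Python) of a per-key 90-slot ring buffer of 1-minute samples: each sample is stored once, and writing a new minute implicitly evicts the sample from 90 minutes earlier, avoiding the per-window replication of naive sliding windows. The slot count and aggregate shape are illustrative, not the talk's actual Flink code.

```python
# Per-key ring buffer holding a 90-minute trailing window of 1-minute samples.
from collections import defaultdict

WINDOW_MINUTES = 90


class TrailingWindow:
    def __init__(self):
        self.slots = [None] * WINDOW_MINUTES    # one value slot per minute
        self.minutes = [None] * WINDOW_MINUTES  # which minute each slot holds

    def add(self, epoch_minute, value):
        i = epoch_minute % WINDOW_MINUTES
        # Overwriting slot i implicitly evicts the sample from 90 minutes ago.
        self.slots[i] = value
        self.minutes[i] = epoch_minute

    def query(self, now_minute):
        # Only slots written within the trailing window are still live.
        return [v for v, m in zip(self.slots, self.minutes)
                if m is not None and now_minute - m < WINDOW_MINUTES]


state = defaultdict(TrailingWindow)
state["customer-1"].add(1000, 42.0)
state["customer-1"].add(1001, 43.5)
print(state["customer-1"].query(1001))  # -> [42.0, 43.5]
```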
Parallelization of Structured Streaming Jobs Using Delta Lake - Databricks
We'll tackle the problem of running streaming jobs from another perspective using Databricks Delta Lake, while examining some of the issues we faced at Tubi while running regular structured streaming. We give a quick overview of why we transitioned from Parquet data files to Delta and the problems it solved for us in running our streaming jobs.
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB - YugabyteDB
Slides from the webinar "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" by Amey Banarse, Principal Data Architect at Yugabyte, recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
Introduction to Apache Airflow, its main concepts and features, and an example of a DAG, followed by some lessons and best practices learned from the three years I have been using Airflow to power workflows in production.
Kubernetes introduction: the concepts you need to understand to effectively develop and run applications in a Kubernetes environment. It focuses primarily on application developers, but also provides an overview of managing applications from the operational perspective. It's meant for anyone interested in running and managing containerized applications on more than just a single server.
- What are Internal Developer Portal (IDP) and Platform Engineering?
- What is Backstage?
- How Backstage can help developers build a developer portal to make their jobs easier
Jirayut Nimsaeng
Founder & CEO
Opsta (Thailand) Co., Ltd.
YouTube recording: https://youtu.be/u_nLbgWDwsA?t=850
Dev Mountain Tech Festival @ Chiang Mai
November 12, 2022
Building robust CDC pipeline with Apache Hudi and Debezium - Tathastu.ai
We cover the need for CDC and the benefits of building a CDC pipeline, and compare various CDC streaming and reconciliation frameworks. We also cover the architecture and the challenges we faced while running this system in production. Finally, we conclude the talk by covering Apache Hudi, Schema Registry, and Debezium in detail, along with our contributions to the open-source community.
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud - Noritaka Sekiyama
This document provides an overview and summary of Amazon S3 best practices and tuning for Hadoop/Spark in the cloud. It discusses the relationship between Hadoop/Spark and S3, the differences between HDFS and S3 and their use cases, details on how S3 behaves from the perspective of Hadoop/Spark, well-known pitfalls and tunings related to S3 consistency and multipart uploads, and recent community activities related to S3. The presentation aims to help users optimize their use of S3 storage with Hadoop/Spark frameworks.
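As one concrete example of the multipart-upload tuning the talk covers, here is a hedged boto3 sketch; Hadoop/Spark expose analogous knobs (for example fs.s3a.multipart.size on the S3A connector). The thresholds, file, and bucket names below are illustrative, not recommendations from the slides.

```python
# Tune S3 multipart uploads from Python with boto3's TransferConfig.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MiB
    multipart_chunksize=64 * 1024 * 1024,  # upload in 64 MiB parts
    max_concurrency=10,                    # parallel part uploads
)

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/tmp/large-dataset.parquet",  # hypothetical local file
    Bucket="my-bucket",                     # hypothetical bucket
    Key="warehouse/large-dataset.parquet",
    Config=config,
)
```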
This document discusses Apache Kafka and how it can be used by Oracle DBAs. It begins by explaining how Kafka builds upon the concept of a database redo log by providing a distributed commit log service. It then discusses how Kafka is a publish-subscribe messaging system and can be used to log transactions from any database, application logs, metrics and other system events. Finally, it discusses how schemas are important for Kafka since it only stores messages as bytes, and how Avro can be used to define and evolve schemas for Kafka messages.
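To make the schema point concrete: Kafka stores only bytes, so producers typically serialize each record against an Avro schema before sending. Below is a minimal sketch using the kafka-python and fastavro packages; the topic name and schema are hypothetical. Production setups usually add a schema registry so consumers can resolve schema versions.

```python
# Serialize records with an Avro schema before producing to Kafka.
import io

import fastavro
from kafka import KafkaProducer

# Avro schema for a hypothetical change-log record.
schema = fastavro.parse_schema({
    "type": "record",
    "name": "Transaction",
    "fields": [
        {"name": "table_name", "type": "string"},
        {"name": "op", "type": "string"},
        {"name": "scn", "type": "long"},
    ],
})


def serialize(record):
    # Kafka only sees the resulting bytes; the schema gives them meaning.
    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, schema, record)
    return buf.getvalue()


producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=serialize)
producer.send("db-transactions",
              {"table_name": "ORDERS", "op": "INSERT", "scn": 12345})
producer.flush()
```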
Increasingly, organizations are relying on Kafka for mission critical use-cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
Properly shaping partitions and your jobs enables powerful optimizations, eliminates skew, and maximizes cluster utilization. We will explore various Spark partition shaping methods along with several optimization strategies, including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism.
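As a taste of the salting strategy mentioned above, here is a hedged PySpark sketch: spread a skewed join key across N salt buckets, replicate the small side once per bucket, and join on (key, salt) so the hot key's rows land in many tasks. Table names, the join key, and the salt count are illustrative.

```python
# Salted join sketch: break up a skewed key across N buckets.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("salted-join").getOrCreate()
N = 16  # number of salt buckets (illustrative)

large = spark.table("events")  # hypothetical fact table, skewed on user_id
small = spark.table("users")   # hypothetical small dimension table

# Assign each large-side row a random salt bucket.
salted_large = large.withColumn("salt", (F.rand() * N).cast("int"))

# Replicate the small side once per salt value so every bucket can match.
salted_small = small.crossJoin(
    spark.range(N).withColumnRenamed("id", "salt")
)

# Joining on (user_id, salt) spreads the hot key across up to N tasks.
joined = salted_large.join(salted_small, on=["user_id", "salt"])
```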
Deploying Flink on Kubernetes - David Anderson - Ververica
Kubernetes has rapidly established itself as the de facto standard for orchestrating containerized infrastructures. And with the recent completion of FLIP-6, the refactoring of Flink's deployment and process model, Kubernetes has become a natural choice for Flink deployments. In this talk we will walk through how to get Flink running on Kubernetes.
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark - Bo Yang
The slides explain how shuffle works in Spark and help people understand more details about Spark internals. They show how the major classes are implemented, including ShuffleManager (SortShuffleManager), ShuffleWriter (SortShuffleWriter, BypassMergeSortShuffleWriter, UnsafeShuffleWriter), and ShuffleReader (BlockStoreShuffleReader).
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management - Burasakorn Sabyeying
This document discusses Apache Airflow, an open-source workflow management platform for authoring, scheduling, and monitoring workflows or pipelines. It provides an overview of Airflow's key features and components, including Directed Acyclic Graphs (DAGs) for defining workflows as Python code, various operators for building tasks, and its rich web UI. The document compares Airflow to traditional cron jobs, noting Airflow can handle task dependencies and failures better than cron. It also outlines how to set up an Airflow cluster on multiple nodes for scaling workflows.
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3 - DataWorks Summit
The Hadoop community announced Hadoop 3.0 GA in December 2017 and 3.1 around April 2018, loaded with a lot of features and improvements. One of the biggest challenges for any new major release of a software platform is compatibility, and the Apache Hadoop community has focused on ensuring wire and binary compatibility for Hadoop 2 clients and workloads.
There are many challenges for admins to address while upgrading to a major release of Hadoop. Users running workloads on Hadoop 2 should be able to seamlessly run or migrate their workloads to Hadoop 3. This session dives deep into upgrade aspects and provides a detailed preview of migration strategies, with information on what works and what might not. The talk covers the motivation for upgrading to Hadoop 3 and provides a cluster upgrade guide for admins and a workload migration guide for users of Hadoop.
Speaker
Suma Shivaprasad, Hortonworks, Staff Engineer
Rohith Sharma, Hortonworks, Senior Software Engineer
Flink powered stream processing platform at Pinterest - Flink Forward
Flink Forward San Francisco 2022.
Pinterest is a visual discovery engine that serves over 433MM users. Stream processing allows us to unlock value from realtime data for pinners. At Pinterest, we adopted Flink as the unified stream processing engine. In this talk, we will share our journey in building a stream processing platform with Flink and how we onboarded critical use cases to the platform. Pinterest now supports 90+ near-realtime streaming applications. We will cover the problem statement, how we evaluated potential solutions, and our decision to build the framework.
by
Rainie Li & Kanchi Masalia
Flink Forward San Francisco 2022.
This talk will take you on the long journey of Apache Flink into the cloud-native era. It starts all the way back when Hadoop and YARN were the standard way of deploying and operating data applications.
We're going to deep dive into the cloud-native set of principles and how they map to the Apache Flink internals and recent improvements. We'll cover fast checkpointing, fault tolerance, resource elasticity, minimal infrastructure dependencies, industry-standard tooling, ease of deployment and declarative APIs.
After this talk you'll get a broader understanding of the operational requirements for a modern streaming application and where the current limits are.
by
David Moravek
Netflix's Big Data Platform team manages a data warehouse in Amazon S3 with over 60 petabytes of data and writes hundreds of terabytes of data every day. With a data warehouse at this scale, it is a constant challenge to keep improving performance. This talk focuses on Iceberg, a new table metadata format designed for managing huge tables backed by S3 storage. Iceberg decreases job planning time from minutes to under a second, while also isolating reads from writes to guarantee jobs always use consistent table snapshots; a short snapshot-read sketch follows the list below.
In this session, you'll learn:
• Some background about big data at Netflix
• Why Iceberg is needed and the drawbacks of the current tables used by Spark and Hive
• How Iceberg maintains table metadata to make queries fast and reliable
• The benefits of Iceberg's design and how it is changing the way Netflix manages its data warehouse
• How you can get started using Iceberg
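Here is the short snapshot-read sketch referenced above, assuming Iceberg's Spark integration is on the classpath and a catalog is configured; the table identifier and snapshot id are hypothetical.

```python
# Read an Iceberg table normally and via time travel to a pinned snapshot.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-demo").getOrCreate()

# Normal read: planning uses Iceberg's table metadata, not a file listing.
df = spark.read.format("iceberg").load("db.events")

# Time travel: pin the read to a specific snapshot for consistent results.
old = (spark.read.format("iceberg")
       .option("snapshot-id", 5612840502905200000)  # hypothetical snapshot id
       .load("db.events"))

print(df.count(), old.count())
```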
Speaker
Ryan Blue, Software Engineer, Netflix
Using the New Apache Flink Kubernetes Operator in a Production Deployment - Flink Forward
Flink Forward San Francisco 2022.
Running natively on Kubernetes, the new Apache Flink Kubernetes Operator is a great way to deploy and manage Flink application and session deployments. In this presentation, we provide:
- A brief overview of Kubernetes operators and their benefits
- An introduction to the five levels of the operator maturity model
- An introduction to the newly released Apache Flink Kubernetes Operator and FlinkDeployment CRs (see the sketch below)
- Dockerfile modifications you can make to swap out the UBI images and Java of the underlying Flink Operator container
- Enhancements we're making in versioning/upgradeability/stability and security
- A demo of the Apache Flink Operator in action, with a technical preview of an upcoming product using the Flink Kubernetes Operator
- Lessons learned
- Q&A
by
James Busche & Ted Chang
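Here is the FlinkDeployment sketch referenced in the list above: creating the custom resource from Python with the official kubernetes client. The CR fields follow the operator's flink.apache.org/v1beta1 API, but the image, resources, jar path, and namespace are assumptions for illustration.

```python
# Create a FlinkDeployment custom resource via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()

flink_deployment = {
    "apiVersion": "flink.apache.org/v1beta1",
    "kind": "FlinkDeployment",
    "metadata": {"name": "basic-example"},       # hypothetical name
    "spec": {
        "image": "flink:1.15",                   # assumed image/version
        "flinkVersion": "v1_15",
        "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
        "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
        "job": {
            # Hypothetical application jar shipped in the image.
            "jarURI": "local:///opt/flink/examples/streaming/StateMachineExample.jar",
            "parallelism": 2,
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="flink.apache.org",
    version="v1beta1",
    namespace="default",
    plural="flinkdeployments",
    body=flink_deployment,
)
```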
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given to 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
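A minimal sketch of the cache-aside pattern such a talk typically presents, written here in Python with pymemcache rather than PHP; the key scheme, TTL, and database call are hypothetical.

```python
# Cache-aside with memcached: check the cache, fall back to the DB on a miss.
import json

from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))


def get_user(user_id, db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database entirely
    row = db.fetch_user(user_id)   # cache miss: hypothetical DB helper
    cache.set(key, json.dumps(row), expire=300)  # keep for 5 minutes
    return row
```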
The document summarizes a talk on container performance analysis. It discusses identifying bottlenecks at the host, container, and kernel level using various Linux performance tools. It then provides an overview of how containers work in Linux using namespaces and control groups (cgroups). Finally, it demonstrates some example commands like docker stats, systemd-cgtop, and bcc/BPF tools that can be used to analyze containers and cgroups from the host system.
My contribution to the "Grafana & Friends" meetup.
This presentation covers the context of the observability landscape, the basics of OpenTelemetry and its signals, and an outlook on what to expect next.
Migrate your EOL MySQL servers to HA Compliant GR Cluster / InnoDB Cluster Wi... - Mydbops
This talk focuses on the challenges and strategies to follow when you're planning a MySQL upgrade from the end-of-life versions MySQL 5.5, 5.6, and even 5.7.
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016 - panagenda
Depending on deployment size, operating system, and security considerations, you have different options for configuring IBM Connections. This session shows good and bad examples of how to do it, drawn from multiple customer deployments. Christoph Stoettner describes things he found and how you can optimize your systems. Main topics include simple (documented) tasks that should be applied, missing documentation, automated user synchronization, TDI solutions and user synchronization, performance tuning, security optimization, and planning Single Sign-On for mail, IBM Sametime, and SPNEGO. This is valuable information that will help you be successful in your next IBM Connections deployment project.
A presentation from Christoph Stoettner (panagenda).
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix - The Linux Foundation
This document summarizes Felipe Franciosi's presentation on scaling Xen's aggregate storage performance. It discusses measuring storage performance; state-of-the-art technologies including grant mapping, persistent grants, and tapdisk; and achieving aggregate measurements over 10GB/s using very fast local storage. It also outlines areas for further improvement, such as increasing single-VBD performance and enabling many-VBD configurations to perform better by avoiding data copies.
MySQL Transformation Case Study: 80% Cost Savings & Uninterrupted Availabilit... - Mydbops
Discover how Mydbops achieved an impressive 80% cost savings and ensured uninterrupted availability through a transformative MySQL database case study. Join Vinoth Kanna RS, Co-Founder of Mydbops, as he shares insights into optimizing infrastructure, enhancing observability, and navigating critical technology decisions. Learn from real-world challenges, innovative solutions, and valuable takeaways for your own database management endeavors.
Xen has made strides in scaling network performance such that it can now saturate 10 Gbit/s NICs and more, taking into account aggregate throughput in multi-queued host backends.
Grant usage, one of the main bottlenecks associated with paravirtualized I/O, has seen a number of improvements in the past years. However, 100 Gbit/s network adapters beckon, and 25 Gbit/s will replace current 10 Gbit/s adapters in short order. This will require us to revisit the datapath.
This presentation discusses ways of overcoming this bottleneck and describes the work we have been doing in the area without compromising the current grant-based model. We discuss a number of topics in this talk, such as page recycling mechanisms, grants versus copying, and trends in networking/drivers. Finally, we pitch an optional grant-less model in a way that can be managed by administrators and is transparent to guests.
The document discusses running existing software on Android by either running the binary directly on Android, rebuilding the software for Android, or running the Android system on an existing Linux environment. It provides examples of copying dependency libraries to run an existing binary on Android. It also discusses creating an Android.mk file or using a configure script to rebuild software for the Android environment. Finally, it outlines a process to run the entire Android system framework within a chroot on an Ubuntu Linux system on the target hardware.
Criteo Labs Infrastructure Tech Talk Meetup Nov. 7 - Shuo LI
The document discusses hardware-assisted video transcoding at Dailymotion. It summarizes the legacy software-only transcoding farm and introduces a new GPU-accelerated solution using Intel Media SDK and Nvidia cards. Performance tests showed the new solution was up to 12x faster for single transcodes and had much lower power consumption. Quality was similar or better with look-ahead enabled. The new farm has been deployed successfully.
Fusion-IO - Building a High Performance and Reliable VSAN Environment - VMUG IT
This document discusses how Fusion-io products can improve virtual desktop infrastructure (VDI) and virtual SAN (VSAN) performance. It provides an overview of Fusion-io's flash storage acceleration technology and customer base. It then outlines how Fusion-io's ioMemory products can be used to greatly increase VDI density and improve VSAN performance and scalability by integrating flash as a caching tier compared to traditional spinning disk or SSD-based storage architectures. Sample configurations and cost comparisons are provided that demonstrate significant capital expenditure and operational savings when using ioMemory for VDI and VSAN deployments.
Topics covered in this presentation, which was used for user group meetings, conferences & webinars:
1. Galera Cluster for MySQL - overview
2. Release 3 New Features:
* WAN Replication
* 5.6 Global Transaction ID (GTID) Support
* MySQL Replication Support
* and more features
3. The Galera Cluster Project
BlackStor is a cloud native software defined storage solution that is described as the world's fastest and most reliable. It is built using best of breed open source technologies like Linux kernel, ZFS, and DRBD to provide high performance, stability, and scalability. Testing shows DRBD provides significantly higher performance than Ceph with lower CPU usage. BlackStor aims to deliver fully automated, policy-driven storage with features like multi-tenancy, encryption, backup, and deep insights.
Percona Toolkit includes tools for monitoring and optimizing MySQL performance. Pt-diskstats summarizes disk I/O statistics from /proc/diskstats in an interactive table, showing read and write rates, sizes, and other metrics for each disk or partition. Pt-ioprofile measures I/O usage to identify which files MySQL is accessing and how it spends time reading, writing, and syncing data. These tools help administrators understand where disk I/O is going and identify opportunities to optimize storage usage.
Richard Wareing presented on using XFS realtime subvolumes to improve GlusterFS metadata performance. Traditional solutions like page caching are limited, while dedicated metadata stores add complexity. XFS realtime subvolumes combine benefits by storing metadata on SSDs for improved performance without changing GlusterFS core. Facebook is working on kernel patches to optimize realtime allocation and integration. The presentation addressed strengths and weaknesses of GlusterFS and opportunities to improve scaling and code quality.
Scylla on Kubernetes: Introducing the Scylla Operator - ScyllaDB
The document introduces the Scylla Operator for Kubernetes, which provides a management layer for Scylla on Kubernetes. It addresses some limitations of using StatefulSets alone to run Scylla, such as safe scale down operations and tracking member identity. The operator implements the controller pattern with custom resources to deploy and manage Scylla clusters on Kubernetes. It handles tasks like cluster creation and scale up/down while addressing issues like local storage failures.
DBAs Behaving Badly... Worst Practices For Database Administrators - Rod Colledge - sqlserver.co.il
This document discusses worst practices for database administrators (DBAs). It covers issues related to disaster recovery planning, backups and restores, change control, storage configuration, file configuration, indexing, and administration techniques. Specifically, it highlights not having service level agreements, not testing disaster recovery plans, relying only on backups without verifying them, insufficient test environments, no performance baselines, capacity-centric rather than performance-centric storage design, manual rather than automated administration, and not defining alerts or checklists. The goal is to help DBAs avoid these pitfalls and instead follow best practices for database management and high availability.
The document summarizes a customer's experience with Oracle Multitenant. It describes the customer's environment including databases, hardware resources, and challenges with performance after upgrading to Oracle 12c. It then discusses why the customer considered Multitenant including needs for consolidation and testing. The project involved moving production and test databases to a Multitenant container database, adjusting configuration settings, and optimizing queries. The results were improved performance and ability to scale resources. New features in Oracle 12.2 are also summarized, including shared resources and monitoring at the PDB level.
Redis Developers Day 2014 - Redis Labs Talks - Redis Labs
These are the slides the Redis Labs team used to accompany the session we gave during the first-ever Redis Developers Day on October 2nd, 2014, in London. They include some of the ideas we've come up with to tackle operational challenges in the hyper-dense, multi-tenant Redis deployments that our service, Redis Cloud, consists of.
Data deduplication is a hot topic in storage and saves significant disk space for many environments, with some trade-offs. We'll discuss what deduplication is and where the open source solutions stand versus commercial offerings. The presentation leans towards the practical, where attendees can use it in their real-world projects (what works, what doesn't, should you use it in production, etc.).
In this session (re-reloaded and remastered for HCL Notes 11.0.1 FP2), you will learn how easy it can be to maximize Notes client performance. Let Christoph show you what can be tuned and how to achieve the best possible performance for your HCL Notes client infrastructure. Discover tips and tweaks: how to debug your Notes client, deal with outdated ODS, network latency, and application performance issues, and the measurable benefit that provides to your users. You'll discover the current best practices for streamlining location and connection documents and why the catalog.nsf is still so important. You will leave the session with the knowledge you need to improve your HCL Notes 11.0.1 FP2 client installations and provide a better experience for happier administrators and happier end-users!
In the tube drawing process, a tube is pulled through a die and a plug to reduce its diameter and thickness as per the requirement. Dimensional accuracy of cold drawn tubes plays a vital role in the quality of end products and in controlling rejection in the manufacturing processes of these end products. The springback phenomenon, the elastic strain recovery after removal of forming loads, causes geometrical inaccuracies in drawn tubes, which in turn makes close dimensional tolerances difficult to achieve. In the present work, springback of EN 8 D tube material is studied for various cold drawing parameters. The process parameters in this work include die semi-angle, land width, and drawing speed. The experimentation is done using Taguchi's L36 orthogonal array, and optimization is then done in the data analysis software Minitab 17. The results of ANOVA show that a 15 degree die semi-angle, 5 mm land width, and 6 m/min drawing speed yield the least springback. Furthermore, optimization algorithms named Particle Swarm Optimization (PSO), Simulated Annealing (SA), and Genetic Algorithm (GA) are applied, which show that a 15 degree die semi-angle, 10 mm land width, and 8 m/min drawing speed result in minimal springback, with almost 10.5% improvement. Finally, the results of experimentation are validated with the Finite Element Analysis technique using ANSYS.
Passenger car unit (PCU) of a vehicle type depends on vehicular characteristics, stream characteristics, roadway characteristics, environmental factors, climate conditions, and control conditions. Keeping in view the various factors affecting PCU, a model was developed taking the volume-to-capacity ratio and the percentage share of a particular vehicle type as independent parameters. A microscopic traffic simulation model, VISSIM, has been used in the present study for generating traffic flow data, which is sometimes very difficult to obtain from field surveys. A comparison study was carried out to verify when the adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), and multiple linear regression (MLR) models are appropriate for prediction of PCUs of different vehicle types. From the results, it was observed that ANFIS model estimates were closer to the corresponding simulated PCU values compared to the MLR and ANN models. It is concluded that the ANFIS model showed greater potential in predicting PCUs from the v/c ratio and proportional share for all types of vehicles, whereas the MLR and ANN models did not perform well.
This is all about Artificial Intelligence (AI) and Machine Learning, not at an advanced level; you can study it before the exam or check it for information on AI for a project.
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...Infopitaara
A feed water heater is a device used in power plants to preheat water before it enters the boiler. It plays a critical role in improving the overall efficiency of the power generation process, especially in thermal power plants.
🔧 Function of a Feed Water Heater:
It uses steam extracted from the turbine to preheat the feed water.
This reduces the fuel required to convert water into steam in the boiler.
It supports Regenerative Rankine Cycle, increasing plant efficiency.
🔍 Types of Feed Water Heaters:
Open Feed Water Heater (Direct Contact)
Steam and water come into direct contact.
Mixing occurs, and heat is transferred directly.
Common in low-pressure stages.
Closed Feed Water Heater (Surface Type)
Steam and water are separated by tubes.
Heat is transferred through tube walls.
Common in high-pressure systems.
⚙️ Advantages:
Improves thermal efficiency.
Reduces fuel consumption.
Lowers thermal stress on boiler components.
Minimizes corrosion by removing dissolved gases.
Raish Khanji GTU 8th sem Internship Report.pdf - RaishKhanji
This report details the practical experiences gained during an internship at Indo German Tool Room, Ahmedabad. The internship provided hands-on training in various manufacturing technologies, encompassing both conventional and advanced techniques. Significant emphasis was placed on machining processes, including operation and fundamental understanding of lathe and milling machines. Furthermore, the internship incorporated modern welding technology, notably through the application of an Augmented Reality (AR) simulator, offering a safe and effective environment for skill development. Exposure to industrial automation was achieved through practical exercises in Programmable Logic Controllers (PLCs) using Siemens TIA software and direct operation of industrial robots utilizing teach pendants. The principles and practical aspects of Computer Numerical Control (CNC) technology were also explored. Complementing these manufacturing processes, the internship included extensive application of SolidWorks software for design and modeling tasks. This comprehensive practical training has provided a foundational understanding of key aspects of modern manufacturing and design, enhancing technical proficiency and readiness for future engineering endeavors.
Value Stream Mapping Workshops for Intelligent Continuous Security - Marc Hornbeek
This presentation provides detailed guidance and tools for conducting Current State and Future State Value Stream Mapping workshops for Intelligent Continuous Security.
This paper proposes a shoulder inverse kinematics (IK) technique. The shoulder complex comprises the sternum, clavicle, ribs, scapula, humerus, and four joints.
π0.5: a Vision-Language-Action Model with Open-World Generalization - NABLAS株式会社
This material, "Transfusion / π0 / π0.5", introduces robot foundation models that integrate vision, language, and action.
Built on a Transformer combining diffusion and autoregression, π0.5 enables reasoning and planning in open-world settings.
The Fluke 925 is a vane anemometer, a handheld device designed to measure wind speed, air flow (volume), and temperature. It features a separate sensor and display unit, allowing greater flexibility and ease of use in tight or hard-to-reach spaces. The Fluke 925 is particularly suitable for HVAC (heating, ventilation, and air conditioning) maintenance in both residential and commercial buildings, offering a durable and cost-effective solution for routine airflow diagnostics.
Analysis of reinforced concrete deep beams is based on simplified approximate methods due to the complexity of exact analysis. The complexity is due to a number of parameters affecting the response. To evaluate some of these parameters, a finite element study of the structural behavior of reinforced self-compacting concrete deep beams was carried out using the Abaqus finite element modeling tool. The model was validated against experimental data from the literature. The parametric effects of varied concrete compressive strength, vertical web reinforcement ratio, and horizontal web reinforcement ratio on the beams were tested on eight (8) different specimens under four-point loads. The results of the validation work showed good agreement with the experimental studies. The parametric study revealed that the concrete compressive strength most significantly influenced the specimens' response, with averages of 41.1% and 49% increases in the diagonal cracking and ultimate load, respectively, due to doubling of concrete compressive strength. Although the increase in horizontal web reinforcement ratio from 0.31% to 0.63% led to an average 6.24% increase in the diagonal cracking load, it did not influence the ultimate strength or the load-deflection response of the beams. Similar variation in vertical web reinforcement ratio led to averages of 2.4% and 15% increases in cracking and ultimate load, respectively, with no appreciable effect on the load-deflection response.
Concept of Problem Solving, Introduction to Algorithms, Characteristics of Algorithms, Introduction to Data Structure, Data Structure Classification (Linear and Non-linear, Static and Dynamic, Persistent and Ephemeral data structures), Time complexity and Space complexity, Asymptotic Notation - The Big-O, Omega and Theta notation, Algorithmic upper bounds, lower bounds, Best, Worst and Average case analysis of an Algorithm, Abstract Data Types (ADT)
The role of the lexical analyzer
Specification of tokens
Finite state machines
From regular expressions to an NFA
Convert NFA to DFA (a sketch follows this list)
Transforming grammars and regular expressions
Transforming automata to grammars
Language for specifying lexical analyzers
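As referenced in the list above, here is a compact sketch of the NFA-to-DFA step: the classic subset construction. The NFA encoding (a dict mapping (state, symbol) to a set of states, with None as the epsilon symbol) and the example automaton for (a|b)*ab are illustrative.

```python
# Subset construction: build a DFA whose states are sets of NFA states.
from collections import deque


def epsilon_closure(states, nfa):
    # All NFA states reachable from `states` via epsilon (None) moves.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), set()) - closure:
            closure.add(t)
            stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, alphabet):
    start_set = epsilon_closure({start}, nfa)
    transitions, visited = {}, {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            # Move on `sym` from every NFA state in the current DFA state.
            moved = set()
            for s in current:
                moved |= nfa.get((s, sym), set())
            target = epsilon_closure(moved, nfa)
            transitions[(current, sym)] = target
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return start_set, transitions


# NFA for (a|b)*ab: state 2 is accepting; no epsilon moves in this example.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start, dfa = subset_construction(nfa, 0, "ab")
print(len({s for s, _ in dfa}), "DFA states;",
      "accepting:", {s for s, _ in dfa if 2 in s})
```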