Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover the basics of managing Postgres, including creating database clusters, an overview of configuration, and logging. We will also look at tools that help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
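Check_postgres.pl follows the standard Nagios plugin convention (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), so it is easy to wrap in your own scripts as well. A minimal sketch in Python, assuming the script is on the PATH and using its `backends` action with hypothetical host and thresholds:

```python
import subprocess

# Nagios plugin exit codes, as used by check_postgres.pl
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Hypothetical host and thresholds; the 'backends' action counts open connections.
result = subprocess.run(
    ["check_postgres.pl", "--action=backends",
     "--host=localhost", "--warning=80", "--critical=100"],
    capture_output=True, text=True,
)

print(STATUS.get(result.returncode, "UNKNOWN"), result.stdout.strip())
```

The same invocation can be dropped into a Nagios command definition or a cron job.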
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field (confluent)
This document discusses best practices for using Apache Kafka Connect. It begins with an overview of Kafka Connect basics like connectors, converters, transforms, and plugins. It then discusses choosing the right connectors for different data sources and sinks, and how to test connectors using the Confluent CLI. The document concludes with recommendations for planning Kafka Connect deployments, such as understanding schemas, deploying connectors across workers, tuning configurations, and minimizing rebalances.
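Many of these practices revolve around the Connect REST API (port 8083 by default), which is also the quickest way to experiment with connectors. A minimal sketch, assuming a local worker and hypothetical topic and file names; the connector class shown is the stock FileStreamSinkConnector that ships with Kafka:

```python
import json
import urllib.request

# Hypothetical connector registration against a local Connect worker.
connector = {
    "name": "demo-file-sink",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "demo-topic",          # hypothetical topic
        "file": "/tmp/demo-sink.txt",    # hypothetical sink file
    },
}

req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(connector).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```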
The paperback version is available on lulu.com: http://goo.gl/fraa8o
This is the first volume of the PostgreSQL database administration book. The book covers the steps for installing, configuring, and administering PostgreSQL 9.3 on Debian Linux, addressing both the logical and physical aspects of PostgreSQL. Two chapters are dedicated to backup and restore.
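As a taste of the backup/restore material, the core command-line tools are pg_dump and pg_restore. A minimal sketch driving them from Python, with hypothetical database names and dump path:

```python
import subprocess

# Hypothetical database; -Fc produces a custom-format archive pg_restore understands.
subprocess.run(["pg_dump", "-Fc", "-f", "/tmp/mydb.dump", "mydb"], check=True)

# Restore the archive into a (pre-created) target database.
subprocess.run(["pg_restore", "-d", "mydb_restored", "/tmp/mydb.dump"], check=True)
```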
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022 (HostedbyConfluent)
An instant world requires instant decisions at scale. This includes the ability to digest and react to changes in real time. Thus, event logs such as Apache Kafka can be found in almost every architecture, while databases and similar systems still provide the foundation. Change Data Capture (CDC) has become popular for propagating changes. Nevertheless, integrating all these systems, which often have slightly different semantics, can be a challenge.
In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized views. We will discuss the semantics of different data sources and how to perform joins or stream enrichment between them. This talk illustrates how Flink can be used with systems such as Kafka (for upsert logging), Debezium, JDBC, and others.
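To make the changelog-processor idea concrete, here is a minimal sketch using Flink's SQL engine via pyflink, assuming a hypothetical Kafka topic carrying Debezium-formatted change events and the Kafka SQL connector jar on the classpath:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical CDC source: a Kafka topic with Debezium JSON change events.
t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        customer_id BIGINT,
        amount      DECIMAL(10, 2)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'cdc-demo',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'debezium-json'
    )
""")

# A continuously maintained, materialized-view-style aggregate over the changelog.
t_env.execute_sql(
    "SELECT customer_id, SUM(amount) AS total FROM orders_cdc GROUP BY customer_id"
).print()
```

Because the format is debezium-json, inserts, updates, and deletes from the source database all flow into the aggregate as retractions and additions.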
Presto as a Service - Tips for operation and monitoring (Taro L. Saito)
- Presto as a Service in Treasure Data involves deploying Presto using blue-green deployments with no downtime and automatic error recovery of failed queries.
- Monitoring Presto involves using its JSON API to view queries and query plans, as well as collecting Presto metrics with Fluentd and detecting anomalies (see the sketch after this list).
- Benchmarking compares query performance between Presto versions by running predefined query sets and aggregating the results.
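The coordinator's JSON API mentioned above can be polled with plain HTTP via the standard /v1/query endpoint; a minimal sketch, assuming a hypothetical coordinator address:

```python
import json
import urllib.request

# Hypothetical coordinator address; /v1/query lists recent and running queries.
with urllib.request.urlopen("http://presto-coordinator:8080/v1/query") as resp:
    queries = json.load(resp)

for q in queries:
    print(q["queryId"], q["state"])
```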
This document discusses different ways to implement configuration management in a Spring Cloud application using Spring Cloud Config. It describes using the Spring Cloud Config Server with a Git backend to externalize configuration and manage configs across environments. It also covers using a MySQL database instead of Git and implementing the config server functionality within each application to retrieve configs directly from MySQL. While most configs can be externalized, some like datasource URLs may still need to be defined internally for bootstrapping. Security, encryption, and broadcasting config changes to clients would need additional implementation as well.
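For reference, the Config Server serves properties over plain HTTP at paths of the form /{application}/{profile}, so configs can be inspected without a Spring client. A minimal sketch, assuming a hypothetical server address, application name, and profile:

```python
import json
import urllib.request

# Hypothetical config server, application name, and profile.
url = "http://localhost:8888/myapp/production"
with urllib.request.urlopen(url) as resp:
    env = json.load(resp)

# Each backing source (e.g. a file in the Git repo) appears as a property source.
for source in env["propertySources"]:
    print(source["name"], source["source"])
```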
FreeIPA is the open source answer to Active Directory, bringing the functionality of Kerberos and centralized management to the unix world. This talk will dive into the background of FreeIPA, how to attack it, and its parallels to traditional Active Directory. We will cover the FreeIPA equivalents of credential abuse, discovery, and lateral movement, highlighting the similarities and differences from traditional Active Directory tradecraft. This will culminate in multiple real-world demos showing how chains of abuse, previously accessible only in Windows environments, are now possible in the unix realm, providing a new medium for offensive research into Kerberos and LDAP environments.
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang (Databricks)
As a general computing engine, Spark can process data from various data management/storage systems, including HDFS, Hive, Cassandra and Kafka. For flexibility and high throughput, Spark defines the Data Source API, which is an abstraction of the storage layer. The Data Source API has two requirements.
1) Generality: support reading/writing most data management/storage systems.
2) Flexibility: customize and optimize the read and write paths for different systems based on their capabilities.
Data Source API V2 is one of the most important features coming with Spark 2.3. This talk will dive into the design and implementation of Data Source API V2, comparing it with the Data Source API V1. We also demonstrate how to implement a file-based data source using the Data Source API V2 to show its generality and flexibility.
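From the user's side, sources built on either API version are consumed through the same `format(...)` entry point, which is the abstraction the talk builds on. A minimal PySpark sketch, with hypothetical input and output paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datasource-demo").getOrCreate()

# The format name selects the data source implementation behind the API.
df = (spark.read.format("csv")
      .option("header", "true")
      .load("/tmp/people.csv"))  # hypothetical path

df.show()

# Writing goes through the same abstraction; here to Parquet.
df.write.format("parquet").mode("overwrite").save("/tmp/people_parquet")
```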
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen (confluent)
Flink and Kafka are popular components for building an open source stream processing infrastructure. We present how Flink integrates with Kafka to provide a platform with a unique feature set that matches the challenging requirements of advanced stream processing applications. In particular, we will dive into the following points:
Flink’s support for event-time processing, how it handles out-of-order streams, and how it can perform analytics on historical and real-time streams served from Kafka’s persistent log using the same code. We present Flink’s windowing mechanism, which supports time-, count-, and session-based windows, and intermixing event-time and processing-time semantics in one program.
How Flink’s checkpointing mechanism integrates with Kafka to provide fault tolerance and consistent stateful applications with exactly-once semantics.
We will discuss “Savepoints”, which allow users to save the state of the streaming program at any point in time. Together with a durable event log like Kafka, savepoints allow users to pause/resume streaming programs, go back to prior states, or switch to different versions of the program, while preserving exactly-once semantics.
We explain the techniques behind the combination of low-latency and high-throughput streaming, and how the latency/throughput trade-off can be configured.
We will give an outlook on current developments for streaming analytics, such as streaming SQL and complex event processing.
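As a small illustration of the checkpointing integration, here is a sketch using pyflink's DataStream API, assuming a hypothetical topic, a local broker, and the Kafka connector jar on the classpath:

```python
from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer

env = StreamExecutionEnvironment.get_execution_environment()
# Checkpoint every 10s; exactly-once is the consistency mode discussed in the talk.
env.enable_checkpointing(10_000, CheckpointingMode.EXACTLY_ONCE)

source = (
    KafkaSource.builder()
    .set_bootstrap_servers("localhost:9092")     # hypothetical broker
    .set_topics("events")                        # hypothetical topic
    .set_group_id("flink-demo")
    .set_starting_offsets(KafkaOffsetsInitializer.earliest())
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

stream = env.from_source(
    source, WatermarkStrategy.for_monotonous_timestamps(), "kafka"
)
stream.print()
env.execute("kafka-exactly-once-demo")
```

Replaying from Kafka's persistent log after a restore from a checkpoint (or a savepoint) is what makes the exactly-once guarantee possible.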
Apache Spark Streaming in K8s with ArgoCD & Spark Operator (Databricks)
Over the last year, we have been moving from a batch-processing setup running Airflow jobs on EC2 instances to a powerful & scalable setup using Airflow & Spark in K8s.
The need to keep pace with technology changes, new community advances, and multidisciplinary teams forced us to design a solution that lets us run multiple Spark versions at the same time while avoiding duplicated infrastructure and simplifying deployment, maintenance, and development.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
This document provides instructions for setting up a Python development environment for a tax calculator project. It outlines steps to update packages, install Python and related tools like Pip and Pytest, clone the GitHub repository, check out a new branch, and run tests. It also notes that if Unicode errors occur, the locale can be generated and exported to resolve encoding issues.
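Once the environment is ready, running the suite is just `pytest` from the repository root. A minimal sketch of what such a test might look like, with a hypothetical calculate_tax function standing in for the project's real API:

```python
# test_tax.py -- run with `pytest`
def calculate_tax(income: float, rate: float = 0.2) -> float:
    """Hypothetical stand-in for the project's real tax function."""
    return round(income * rate, 2)

def test_flat_rate():
    assert calculate_tax(1000) == 200.0

def test_custom_rate():
    assert calculate_tax(1000, rate=0.25) == 250.0
```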
This document discusses authentication of RESTful APIs built with Django. It introduces RESTful APIs and common Django packages for building and documenting APIs, including Django REST Framework, Djoser, and Django REST Swagger. It provides an overview of configuring authentication with Djoser and generating documentation with Django REST Swagger. Finally, it mentions similar authentication projects and other API frameworks.
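Wiring Djoser into a Django REST Framework project is mostly settings and URL configuration. A minimal sketch, assuming token authentication (project-specific details omitted):

```python
# settings.py (fragment)
INSTALLED_APPS = [
    # ...
    "rest_framework",
    "rest_framework.authtoken",  # token backend used by Djoser's token endpoints
    "djoser",
]

REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
    ],
}

# urls.py (fragment)
from django.urls import include, path

urlpatterns = [
    path("auth/", include("djoser.urls")),            # registration, user management
    path("auth/", include("djoser.urls.authtoken")),  # /auth/token/login/ etc.
]
```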