Jeremy Custenborder from Confluent talked about how Kafka brings an event-centric approach to building streaming applications, and how to use Kafka Connect and Kafka Streams to build them.
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch... (confluent)
Many companies are adopting Apache Kafka to power their data pipelines, including LinkedIn, Netflix, and Airbnb. Kafka’s ability to handle high throughput real-time data makes it a perfect fit for solving the data integration problem, acting as the common buffer for all your data and bridging the gap between streaming and batch systems.
However, building a data pipeline around Kafka today can be challenging because it requires combining a wide variety of tools to collect data from disparate data systems. One tool streams updates from your database to Kafka, another imports logs, and yet another exports to HDFS. As a result, building a data pipeline can take significant engineering effort and has high operational overhead because all these different tools require ongoing monitoring and maintenance. Additionally, some of the tools are simply a poor fit for the job: the fragmented nature of the data integration tools ecosystem has led to creative but misguided solutions, such as misusing stream processing frameworks for data integration purposes.
We describe the design and implementation of Kafka Connect, Kafka’s new tool for scalable, fault-tolerant data import and export. First we’ll discuss some existing tools in the space and why they fall short when applied to data integration at large scale. Next, we will explore Kafka Connect’s design and how it compares to systems with similar goals, discussing key design decisions that trade off between ease of use for connector developers, operational complexity, and reuse of existing connectors. Finally, we’ll discuss how standardizing on Kafka Connect can ultimately lead to simplifying your entire data pipeline, making ETL into your data warehouse and enabling stream processing applications as simple as adding another Kafka connector.
Kafka Connect allows data ingestion into Kafka from external systems by using connectors. It provides scalability, fault tolerance, and strong delivery guarantees (at-least-once by default, with exactly-once achievable for some connectors). A connector's work is divided into tasks that run inside workers, which can be deployed in either standalone or distributed mode. The Schema Registry works with Kafka Connect to handle schema validation and evolution.
It covers a brief introduction to Apache Kafka Connect, giving insight into its benefits, use cases, and the motivation behind building Kafka Connect, along with a short discussion of its architecture.
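As a rough sketch of how a connector is typically deployed to a running Connect worker, the Java snippet below POSTs a connector configuration to the Connect REST API. The worker address uses the default port 8083, the connector name, file path, and topic are illustrative placeholders, and the FileStreamSource connector class ships with Apache Kafka; the text block requires Java 15+.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Illustrative connector config; file path and topic name are placeholders.
        String config = """
            {
              "name": "demo-file-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/tmp/demo.txt",
                "topic": "demo-topic"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

In standalone mode, the same key/value settings would instead go into a properties file passed to the connect-standalone script rather than through the REST API.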
Kafka Streams: What it is, and how to use it? (confluent)
Kafka Streams is a client library for building distributed applications that process streaming data stored in Apache Kafka. It provides a high-level streams DSL that allows developers to express streaming applications as a set of processing steps. Alternatively, developers can use the lower-level processor API to implement custom business logic. Kafka Streams handles concerns like fault tolerance, scalability, and state management. It represents data as streams for unbounded sequences of events and as tables for evolving state. Common operations include transformations, aggregations, joins, and table operations.
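For readers who have not seen the DSL, here is a minimal, self-contained sketch of what such a set of processing steps can look like in Java; the topic names, broker address, and serdes are assumptions for illustration, not taken from the talk.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class SimpleStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("page-views");      // unbounded input stream
        events.filter((key, value) -> value != null && !value.isEmpty())    // stateless transformation
              .mapValues(value -> value.toUpperCase())
              .to("page-views-clean");                                      // write results back to Kafka

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```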
What's new in Confluent 3.2 and Apache Kafka 0.10.2 (confluent)
With the introduction of the Connect and Streams APIs in 2016, Apache Kafka is becoming the de facto solution for anyone looking to build a streaming platform. The community continues to add capabilities to make it the complete solution for streaming data.
Join us as we review the latest additions in Apache Kafka 0.10.2. In addition, we'll cover what's new in Confluent Enterprise 3.2 that makes it possible to run Kafka at scale.
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming (Guozhang Wang)
Spark Streaming makes it easy to build scalable, robust stream processing applications, but only once you've made your data accessible to the framework. Spark Streaming solves the real-time data processing problem, but to build a large-scale data pipeline we need to combine it with another tool that addresses data integration challenges. The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier.
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL (confluent)
Speaker: Robin Moffatt, Developer Advocate, Confluent
In this talk, we'll build a streaming data pipeline using nothing but our bare hands, the Kafka Connect API and KSQL. We'll stream data in from MySQL, transform it with KSQL and stream it out to Elasticsearch. Options for integrating databases with Kafka using CDC and Kafka Connect will be covered as well.
This is part 2 of 3 in Streaming ETL - The New Data Integration series.
Watch the recording: https://videos.confluent.io/watch/4cVXUQ2jCLgJNmg4kjCRqo?.
A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.
Introduction to Apache Kafka and Confluent... and why they matter (confluent)
Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017.
The talk introduces Apache Kafka (including the Kafka Connect and Kafka Streams APIs) and Confluent (the company founded by the creators of Kafka), and explains why Kafka is an excellent and simple solution for managing data streams in the context of two of the main driving forces and industry trends: the Internet of Things (IoT) and microservices.
This document discusses Apache Kafka and Confluent's Kafka Connect tool for large-scale streaming data integration. Kafka Connect allows importing and exporting data from Kafka to other systems like HDFS, databases, search indexes, and more using reusable connectors. Connectors use converters to handle serialization between data formats. The document outlines some existing connectors and upcoming improvements to Kafka Connect.
Andrew Stevenson from DataMountaineer presented on Kafka Connect. Kafka Connect is a common framework that facilitates data streams between Kafka and other systems. It handles delivery semantics, offset management, serialization/deserialization and other complex tasks, allowing users to focus on domain logic. Connectors can load and unload data from various systems like Cassandra, Elasticsearch, and MongoDB. Configuration files are used to deploy connectors with no code required.
Data integration patterns using Apache Kafka & Kafka Connect (or rather, an ETL implementation) (Keigo Suda)
This document discusses Apache Kafka and Kafka Connect. It provides an overview of Kafka Connect and how it can be used for ETL processes. Kafka Connect allows data to be exported from or imported to Kafka and integrated with other systems through customizable connectors. The document describes how to run Kafka Connect in standalone and distributed modes and highlights some popular connectors available for integrating Kafka with other data sources and sinks.
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi... (HostedbyConfluent)
Having started with classic monolith applications in the late 90s and adopting a new microservice architecture in 2015, our organization needed a convenient, reliable, and low-cost way to push changes back and forth between them. One that preferably utilized technology already on hand and could exchange information between multiple data stores.
In this session we will explore how Kafka Connect and its various connectors satisfied this need. We will review the two disparate tech stacks we needed to integrate, and the strategies and connectors we used to exchange information. Finally, we will cover some enhancements we made to our own processes including integrating Kafka Connect and its connectors into our CI/CD pipeline and writing tools to monitor connectors in our production environment.
The Many Faces of Apache Kafka: Leveraging real-time data at scale (Neha Narkhede)
Since it was open sourced, Apache Kafka has been adopted very widely, from web companies like Uber, Netflix, and LinkedIn to more traditional enterprises like Cerner, Goldman Sachs, and Cisco. At these companies, Kafka is used in a variety of ways: as a pipeline for collecting high-volume log data for load into Hadoop, a means of collecting operational metrics to feed monitoring and alerting applications, for low-latency messaging use cases, and to power near-real-time stream processing.
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche... (HostedbyConfluent)
Some people see their cars just as a means to get them from point A to point B without breaking down halfway, but most of us want it also to be comfortable, performant, easy to drive, and of course - to look good.
We can think of Kafka Connect connectors in a similar way. While the main focus is on getting data from or writing data to the external target system, it also matters how easy the connector is to configure, whether it scales well, whether it provides the best possible data consistency, whether it is resilient to both external-system and Kafka cluster failures, and so on. This talk focuses on the aspects of connector plugin development that are important for achieving these goals. More specifically, we'll cover configuration definition and validation, handling external source partitions and offsets, achieving the desired delivery semantics, and more.
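To make the configuration-definition point concrete, below is a hedged sketch of how a connector plugin might declare and validate its settings with Kafka Connect's ConfigDef API. The connector and its setting names are hypothetical, invented only for illustration, not taken from the talk.

```java
import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class MySourceConnectorConfig extends AbstractConfig {

    // Hypothetical settings for an imaginary connector; names are illustrative only.
    public static final String ENDPOINT_CONFIG = "source.endpoint";
    public static final String POLL_INTERVAL_MS_CONFIG = "poll.interval.ms";

    public static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define(ENDPOINT_CONFIG, Type.STRING, Importance.HIGH,
                    "URL of the external system to read from.")
            .define(POLL_INTERVAL_MS_CONFIG, Type.LONG, 5000L,
                    ConfigDef.Range.atLeast(100L), Importance.MEDIUM,
                    "How often to poll the external system, in milliseconds.");

    public MySourceConnectorConfig(Map<String, String> originals) {
        super(CONFIG_DEF, originals);   // validates types, defaults, and ranges at startup
    }

    public String endpoint() {
        return getString(ENDPOINT_CONFIG);
    }

    public long pollIntervalMs() {
        return getLong(POLL_INTERVAL_MS_CONFIG);
    }
}
```

Declaring settings this way lets the Connect framework surface validation errors and documentation for the connector before any tasks are started.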
Monitoring Apache Kafka with Confluent Control Center (confluent)
Presentation by Nick Dearden, Director, Product and Engineering, Confluent
It’s 3 am. Do you know how your Kafka cluster is doing?
With over 150 metrics to think about, operating a Kafka cluster can be daunting, particularly as a deployment grows. Confluent Control Center is the only complete monitoring and administration product for Apache Kafka and is designed specifically to make the Kafka operator's life easier.
Join Confluent as we cover how Control Center is used to simplify deployment and operability and to ensure message delivery.
Watch the recording: https://www.confluent.io/online-talk/monitoring-and-alerting-apache-kafka-with-confluent-control-center/
Apache Kafka 0.8 basic training - Verisign (Michael Noll)
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
The document discusses deploying Apache Kafka on DC/OS. It provides an overview of Kafka and why it is useful to deploy it on DC/OS. It outlines important considerations for deploying Kafka brokers and Zookeeper as stateful services on DC/OS, including using dedicated disks, placement constraints, and service discovery. The document warns of potential gotchas like broker restarts impacting catch up time and Kafka Streams fault tolerance.
In this presentation we describe the design and implementation of Kafka Connect, Kafka’s new tool for scalable, fault-tolerant data import and export. First we’ll discuss some existing tools in the space and why they fall short when applied to data integration at large scale. Next, we will explore Kafka Connect’s design and how it compares to systems with similar goals, discussing key design decisions that trade off between ease of use for connector developers, operational complexity, and reuse of existing connectors. Finally, we’ll discuss how standardizing on Kafka Connect can ultimately lead to simplifying your entire data pipeline, making ETL into your data warehouse and enabling stream processing applications as simple as adding another Kafka connector.
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017 (Michael Noll)
The document summarizes a presentation on Apache Kafka's Streams API given in Munich, Germany on January 25, 2017. The presentation introduced the Streams API, which allows users to build stream processing applications that run on client machines and integrate natively with Apache Kafka. Key features highlighted included the API's ability to perform both stateful and stateless computations, support for interactive queries, and guarantees of at-least-once processing. The roadmap for future Streams API development was also briefly outlined.
Introduction to Apache Kafka and why it matters - Madrid (Paolo Castagna)
This document provides an introduction to Apache Kafka and discusses why it is an important distributed streaming platform. It outlines how Kafka can be used to handle streaming data flows in a reliable and scalable way. It also describes the various Apache Kafka APIs including Kafka Connect, Streams API, and KSQL that allow organizations to integrate Kafka with other systems and build stream processing applications.
Paolo Castagna is a Senior Sales Engineer at Confluent. His background is in 'big data', and he has seen first hand the shift happening in the industry from batch to stream processing and from big data to fast data. His talk will introduce Kafka Streams and explain why Apache Kafka is a great option and simplification for stream processing.
Power of the Log: LSM & Append Only Data Structures (confluent)
LSM trees provide an efficient way to structure databases by organizing data sequentially in logs. They optimize for write performance by batching writes together sequentially on disk. To optimize reads, data is organized into levels and bloom filters and caching are used to avoid searching every file. This log-structured approach works well for many systems by aligning with how hardware is optimized for sequential access. The immutability of appended data also simplifies concurrency. This log-centric approach can be applied beyond databases to distributed systems as well.
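As a toy illustration of the write path described above (not tied to any particular database), the sketch below buffers writes in a sorted in-memory table and "flushes" it as an immutable sorted segment; reads consult the memtable first and then segments from newest to oldest, where a real LSM store would also use Bloom filters and compaction to keep reads cheap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.TreeMap;

public class TinyLsm {
    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();                  // sorted, in-memory writes
    private final Deque<TreeMap<String, String>> segments = new ArrayDeque<>();  // immutable segments, newest first

    public TinyLsm(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    public void put(String key, String value) {
        memtable.put(key, value);                 // writes are buffered, never updated in place on "disk"
        if (memtable.size() >= memtableLimit) {
            segments.addFirst(memtable);          // "flush" as an immutable sorted segment
            memtable = new TreeMap<>();
        }
    }

    public Optional<String> get(String key) {
        String v = memtable.get(key);             // newest data first
        if (v != null) return Optional.of(v);
        for (TreeMap<String, String> segment : segments) {
            v = segment.get(key);                 // a real LSM would consult Bloom filters here
            if (v != null) return Optional.of(v);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TinyLsm store = new TinyLsm(2);
        store.put("a", "1");
        store.put("b", "2");   // triggers a flush
        store.put("a", "3");   // newer value shadows the flushed one
        System.out.println(store.get("a").orElse("missing"));  // prints 3
    }
}
```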
The document introduces Apache Kafka's Streams API for stream processing. Some key points covered include:
- The Streams API allows building stream processing applications without needing a separate cluster, providing an elastic, scalable, and fault-tolerant processing engine.
- It integrates with existing Kafka deployments and supports both stateful and stateless computations on data in Kafka topics.
- Applications built with the Streams API are standard Java applications that run on client machines and leverage Kafka for distributed, parallel processing and fault tolerance via state stores in Kafka.
KSQL is an open source streaming SQL engine for Apache Kafka. Come hear how KSQL makes it easy to get started with a wide-range of stream processing applications such as real-time ETL, sessionization, monitoring and alerting, or fraud detection. We'll cover both how to get started with KSQL and some under-the-hood details of how it all works.
PostgreSQL + Kafka: The Delight of Change Data Capture (Jeff Klukas)
PostgreSQL is an open source relational database. Kafka is an open source log-based messaging system. Because both systems are powerful and flexible, they’re devouring whole categories of infrastructure. And they’re even better together.
In this talk, you’ll learn about commit logs and how that fundamental data structure underlies both PostgreSQL and Kafka. We’ll use that basis to understand what Kafka is, what advantages it has over traditional messaging systems, and why it’s perfect for modeling database tables as streams. From there, we’ll introduce the concept of change data capture (CDC) and run a live demo of Bottled Water, an open source CDC pipeline, watching INSERT, UPDATE, and DELETE operations in PostgreSQL stream into Kafka. We’ll wrap up with a discussion of use cases for this pipeline: messaging between systems with transactional guarantees, transmitting database changes to a data warehouse, and stream processing.
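Downstream of a CDC pipeline like the one demoed, the change events land in ordinary Kafka topics, so any Kafka consumer can process them. Here is a minimal sketch; the topic name, group id, and broker address are assumptions, and the actual payload format depends on the CDC tool in use.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CdcEventReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdc-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("postgres.public.users"));   // hypothetical CDC topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record represents one INSERT/UPDATE/DELETE, keyed by the row's primary key.
                    System.out.printf("key=%s change=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```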
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St... (Michael Noll)
My talk at Strata Data Conference, London, May 2017.
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57619
Abstract:
Modern businesses have data at their core, but this data is changing continuously. How can you harness this torrent of information in real time? The answer: stream processing.
The core platform for streaming data is Apache Kafka, and thousands of companies are using Kafka to transform and reshape their industries, including Netflix, Uber, PayPal, Airbnb, Goldman Sachs, Cisco, and Oracle. Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: to succeed, many technologies need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we engineers would like to work and how we actually end up working in practice.
Michael Noll explains how Apache Kafka helps you radically simplify your data processing architectures by building normal applications to serve your real-time processing needs rather than building clusters or similar special-purpose infrastructure—while still benefiting from properties typically associated exclusively with cluster technologies, like high scalability, distributed computing, and fault tolerance. Michael also covers Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced interactive queries functionality. Along the way, Michael shares common use cases that demonstrate that stream processing in practice often requires database-like functionality and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications (for example, in the form of event-driven, containerized microservices). As you’ll see, Kafka makes such architectures equally viable for small-, medium-, and large-scale use cases.
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Reducing Microservice Complexity with Kafka and Reactive Streams (jimriecken)
My talk from ScalaDays 2016 in New York on May 11, 2016:
Transitioning from a monolithic application to a set of microservices can help increase performance and scalability, but it can also drastically increase complexity. Layers of inter-service network calls add latency and increase the risk of failure where previously only local function calls existed. In this talk, I'll speak about how to tame this complexity using Apache Kafka and Reactive Streams to:
- Extract non-critical processing from the critical path of your application to reduce request latency
- Provide back-pressure to handle both slow and fast producers/consumers (a minimal sketch of this idea follows the list below)
- Maintain high availability, high performance, and reliable messaging
- Evolve message payloads while maintaining backwards and forwards compatibility.
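Purely as an illustration of the back-pressure idea referenced in the list above (not the specific stack used in the talk), here is a small sketch using the JDK's built-in java.util.concurrent.Flow API, where the subscriber controls the pace by requesting one item at a time and a fast publisher blocks when the bounded buffer fills up.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class BackpressureSketch {
    public static void main(String[] args) throws InterruptedException {
        // SubmissionPublisher implements Flow.Publisher with a bounded buffer,
        // so a fast producer is slowed down when the subscriber lags behind.
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription subscription) {
                    this.subscription = subscription;
                    subscription.request(1);                 // pull one item at a time
                }
                @Override public void onNext(String item) {
                    System.out.println("processed " + item); // pretend this is slow work
                    subscription.request(1);                 // ask for the next item only when ready
                }
                @Override public void onError(Throwable t) { t.printStackTrace(); }
                @Override public void onComplete() { System.out.println("done"); }
            });

            for (int i = 0; i < 5; i++) {
                publisher.submit("event-" + i);              // blocks if the subscriber's buffer is full
            }
        }
        TimeUnit.SECONDS.sleep(1);                           // give the async subscriber time to drain
    }
}
```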
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban (Data Con LA)
Jay Kreps, open source visionary and co-founder of Confluent and several open source projects, will be visiting LA. I have asked him to come present at our group. He will present his vision and answer questions regarding Kafka and other projects.
Bio:
Jay is the co-founder and CEO of Confluent, a company built around real-time data streams and the open source messaging system Apache Kafka. He is the original author of several open source projects including Apache Kafka, Apache Samza, Voldemort, and Azkaban.
Keystone processes over 1 trillion events per day with at-least once processing semantics in the cloud. We will explore in detail how we have modified and leverage Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the Amazon AWS cloud within a year.
In this presentation Guido Schmutz talks about Apache Kafka, Kafka Core, Kafka Connect, Kafka Streams, Kafka and "Big Data"/"Fast Data" ecosystems, the Confluent Data Platform, and Kafka in architecture.
1. Kafka is described as a "WAL (write-ahead logging) system" and "the global commit log thingy" that was used as part of LinkedIn's data pipeline architecture.
2. LinkedIn had an ad hoc approach to data pipelines between systems that became more complex over time, so they built pipelines using Kafka.
3. The Kafka ecosystem includes storage using Kafka brokers, publishing and subscribing using producers and consumers, and stream processing using tools like Kafka Streams and KSQL.
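To ground the publishing point from the summary above, here is a minimal Java producer; the topic name, key, and broker address are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always land in the same partition of the topic.
            producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked:home"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("wrote to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        } // close() flushes any outstanding records
    }
}
```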
0-60: Tesla's Streaming Data Platform (Jesse Yates, Tesla), Kafka Summit SF 2019 (confluent)
Tesla ingests trillions of events every day from hundreds of unique data sources through our streaming data platform. Find out how we developed a set of high-throughput, non-blocking primitives that allow us to transform and ingest data into a variety of data stores with minimal development time. Additionally, we will discuss how these primitives allowed us to completely migrate the streaming platform in just a few months. Finally, we will talk about how we scale team size sub-linearly to data volumes, while continuing to onboard new use cases.
This document summarizes an event-driven architecture presentation using Java. It discusses using Apache Kafka/Amazon Kinesis for messaging, Docker for containerization, Vert.x for reactive applications, Apache Camel/AWS Lambda for integration, and Google Protocol Buffers for data serialization. It covers infrastructure components, software frameworks, local and AWS deployment, and integration testing between Kinesis and Kafka. The presentation provides resources for code samples and Docker images discussed.
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b... (LINE Corporation)
This document discusses LINE's use of Apache Kafka to build a company-wide data pipeline to handle 150 billion messages per day. LINE uses Kafka as a distributed streaming platform and message queue to reliably transmit events between internal systems. The author discusses LINE's architecture, metrics like 40PB of accumulated data, and engineering challenges like optimizing Kafka's performance through contributions to reduce latency. Building systems at this massive scale requires a focus on scalability, reliability, and leveraging open source technologies like Kafka while continuously improving performance.
Being Ready for Apache Kafka - Apache: Big Data Europe 2015 (Michael Noll)
These are the slides of my Kafka talk at Apache: Big Data Europe in Budapest, Hungary. Enjoy! --Michael
Apache Kafka is a high-throughput distributed messaging system that has become a mission-critical infrastructure component for modern data platforms. Kafka is used across a wide range of industries by thousands of companies such as Twitter, Netflix, Cisco, PayPal, and many others.
After a brief introduction to Kafka this talk will provide an update on the growth and status of the Kafka project community. Rest of the talk will focus on walking the audience through what's required to put Kafka in production. We’ll give an overview of the current ecosystem of Kafka, including: client libraries for creating your own apps; operational tools; peripheral components required for running Kafka in production and for integration with other systems like Hadoop. We will cover the upcoming project roadmap, which adds key features to make Kafka even more convenient to use and more robust in production.
Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming apps. It was developed by LinkedIn in 2011 to solve problems with data integration and processing. Kafka uses a publish-subscribe messaging model and is designed to be fast, scalable, and durable. It allows both streaming and storage of data and acts as a central data backbone for large organizations.
Meetup #7 | Session 2 | 21/03/2018 | Taboola
In this talk, we will present our multi-DC Kafka architecture, and discuss how we tackle sending and handling 10B+ messages per day, with maximum availability and no tolerance for data loss.
Our architecture includes technologies such as Cassandra, Spark, HDFS, and Vertica - with Kafka as the backbone that feeds them all.
Apache Kafka is a high-throughput distributed messaging system that can be used for building real-time data pipelines and streaming apps. It provides a publish-subscribe messaging model and is designed as a distributed commit log. Kafka allows for both push and pull models where producers push data and consumers pull data from topics which are divided into partitions to allow for parallelism.
Over 100 million subscribers from over 190 countries enjoy the Netflix service. This leads to over a trillion events, amounting to 3 PB, flowing through the Keystone infrastructure to help improve customer experience and glean business insights. The self-serve Keystone stream processing service processes these messages in near real-time with at-least once semantics in the cloud. This enables the users to focus on extracting insights, and not worry about building out scalable infrastructure. I’ll share the details about this platform, and our experience building it.
Web Analytics using Kafka - August talk w/ Women Who Code (Purnima Kamath)
Purnima Kamath's presentation discusses using Apache Kafka for web analytics. It introduces Kafka as a distributed commit log that can throttle high volumes of event data from web servers to prevent request drop-offs. The presentation covers Kafka's publish-subscribe model using topics and partitions, how it guarantees ordering and allows for replays. It also demonstrates how Kafka Streams enables real-time extract, transform, load operations on streaming data and maintains application state in local stores. The demo shows a sample web analytics pipeline using Kafka to capture device, gender, browser and preference change events.
This document discusses streaming data architectures and technologies. It begins with defining streaming processing as processing data continuously as it arrives, rather than in batches. It then covers streaming architectures, scalable data ingestion technologies like Kafka and Flume, and real-time streaming processing systems like Storm, Samza and Spark Streaming. The document aims to provide an overview of building distributed streaming systems for processing high volumes of real-time data.
Sadayuki Furuhashi created Kumofs, a distributed key-value store, and MessagePack, a cross-language object serialization library. Kumofs is optimized for low latency with zero-hop reads and no single points of failure. It scales out linearly as servers are added without impacting applications. MessagePack is a compact binary format like JSON used for cross-language communication. MessagePack-RPC is a cross-language messaging library that uses an asynchronous, pipelined protocol over an event-driven I/O model.
- Sadayuki Furuhashi is a computer science student at the University of Tsukuba who created Kumofs, a distributed key-value store, and MessagePack, a cross-language communication library.
- Kumofs is optimized for low-latency and scalability through a consistent hashing algorithm and dynamic rebalancing without impacting applications. It has no single point of failure.
- MessagePack is a compact binary serialization format like JSON used for fast cross-language communication. MessagePack-RPC builds on this to enable asynchronous, pipelined RPC between languages.
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfvictordsane
Microsoft Azure is a cloud platform that empowers businesses with scalable computing, data analytics, artificial intelligence, and cybersecurity capabilities.
Arguably the biggest hurdle for most organizations is understanding how to get started.
Microsoft Azure is a consumption-based cloud service. This means you pay for what you use. Unlike traditional software, Azure resources (e.g., VMs, databases, storage) are billed based on usage time, storage size, data transfer, or resource configurations.
There are three primary Azure purchasing models:
• Pay-As-You-Go (PAYG): Ideal for flexibility. Billed monthly based on actual usage.
• Azure Reserved Instances (RI): Commit to 1- or 3-year terms for predictable workloads. This model offers up to 72% cost savings.
• Enterprise Agreements (EA): Best suited for large organizations needing comprehensive Azure solutions and custom pricing.
Licensing Azure: What You Need to Know
Azure doesn’t follow the traditional “per seat” licensing model. Instead, you pay for:
• Compute Hours (e.g., Virtual Machines)
• Storage Used (e.g., Blob, File, Disk)
• Database Transactions
• Data Transfer (Outbound)
Purchasing and subscribing to Microsoft Azure is more than a transactional step, it’s a strategic move.
Get in touch with our team of licensing experts via [email protected] to further understand the purchasing paths, licensing options, and cost management tools, to optimize your investment.
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdfVarsha Nayak
In recent years, organizations have increasingly sought robust open source alternative to Jasper Reports as the landscape of open-source reporting tools rapidly evolves. While Jaspersoft has been a longstanding choice for generating complex business intelligence and analytics reports, factors such as licensing changes and growing demands for flexibility have prompted many businesses to explore other options. Among the most notable alternatives to Jaspersoft, Helical Insight stands out for its powerful open-source architecture, intuitive analytics, and dynamic dashboard capabilities. Designed to be both flexible and budget-friendly, Helical Insight empowers users with advanced features—such as in-memory reporting, extensive data source integration, and customizable visualizations—making it an ideal solution for organizations seeking a modern, scalable reporting platform. This article explores the future of open-source reporting and highlights why Helical Insight and other emerging tools are redefining the standards for business intelligence solutions.
The rise of e-commerce has redefined how retailers operate—and reconciliation...Prachi Desai
As payment flows grow more fragmented, the complexity of reconciliation and revenue recognition increases. The result? Mounting operational costs, silent revenue leakages, and avoidable financial risk.
Spot the inefficiencies. Automate what’s slowing you down.
https://www.taxilla.com/ecommerce-reconciliation
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps CyclesMarjukka Niinioja
Teams delivering API are challenges with:
- Connecting APIs to business strategy
- Measuring API success (audit & lifecycle metrics)
- Partner/Ecosystem onboarding
- Consistent documentation, security, and publishing
🧠 The big takeaway?
Many teams can build APIs. But few connect them to value, visibility, and long-term improvement.
That’s why the APIOps Cycles method helps teams:
📍 Start where the pain is (one “metro station” at a time)
📈 Scale success across strategy, platform, and operations
🛠 Use collaborative canvases to get buy-in and visibility
Want to try it and learn more?
- Follow APIOps Cycles in LinkedIn
- Visit the www.apiopscycles.com site
- Subscribe to email list
-
Generative Artificial Intelligence and its ApplicationsSandeepKS52
The exploration of generative AI begins with an overview of its fundamental concepts, highlighting how these technologies create new content and ideas by learning from existing data. Following this, the focus shifts to the processes involved in training and fine-tuning models, which are essential for enhancing their performance and ensuring they meet specific needs. Finally, the importance of responsible AI practices is emphasized, addressing ethical considerations and the impact of AI on society, which are crucial for developing systems that are not only effective but also beneficial and fair.
Eliminate the complexities of Event-Driven Architecture with Domain-Driven De...SheenBrisals
The distributed nature of modern applications and their architectures brings a great level of complexity to engineering teams. Though API contracts, asynchronous communication patterns, and event-driven architecture offer assistance, not all enterprise teams fully utilize them. While adopting cloud and modern technologies, teams are often hurried to produce outcomes without spending time in upfront thinking. This leads to building tangled applications and distributed monoliths. For those organizations, it is hard to recover from such costly mistakes.
In this talk, Sheen will explain how enterprises should decompose by starting at the organizational level, applying Domain-Driven Design, and distilling to a level where teams can operate within a boundary, ownership, and autonomy. He will provide organizational, team, and design patterns and practices to make the best use of event-driven architecture by understanding the types of events, event structure, and design choices to keep the domain model pure by guarding against corruption and complexity.
How Insurance Policy Administration Streamlines Policy Lifecycle for Agile Op...Insurance Tech Services
A modern Policy Administration System streamlines workflows and integrates with core systems to boost speed, accuracy, and customer satisfaction across the policy lifecycle. Visit https://www.damcogroup.com/insurance/policy-administration-systems for more details!
In a tight labor market and tighter economy, PMOs and resource managers must ensure that every team member is focused on the highest-value work. This session explores how AI reshapes resource planning and empowers organizations to forecast capacity, prevent burnout, and balance workloads more effectively, even with shrinking teams.
Bonk coin airdrop_ Everything You Need to Know.pdfHerond Labs
The Bonk airdrop, one of the largest in Solana’s history, distributed 50% of its total supply to community members, significantly boosting its popularity and Solana’s network activity. Below is everything you need to know about the Bonk coin airdrop, including its history, eligibility, how to claim tokens, risks, and current status.
https://blog.herond.org/bonk-coin-airdrop/
Agentic Techniques in Retrieval-Augmented Generation with Azure AI SearchMaxim Salnikov
Discover how Agentic Retrieval in Azure AI Search takes Retrieval-Augmented Generation (RAG) to the next level by intelligently breaking down complex queries, leveraging full conversation history, and executing parallel searches through a new LLM-powered query planner. This session introduces a cutting-edge approach that delivers significantly more accurate, relevant, and grounded answers—unlocking new capabilities for building smarter, more responsive generative AI applications.
Traditional Retrieval-Augmented Generation (RAG) pipelines work well for simple queries—but when users ask complex, multi-part questions or refer to previous conversation history, they often fall short. That’s where Agentic Retrieval comes in: a game-changing advancement in Azure AI Search that brings LLM-powered reasoning directly into the retrieval layer.
This session unveils how agentic techniques elevate your RAG-based applications by introducing intelligent query planning, subquery decomposition, parallel execution, and result merging—all orchestrated by a new Knowledge Agent. You’ll learn how this approach significantly boosts relevance, groundedness, and answer quality, especially for sophisticated enterprise use cases.
Key takeaways:
- Understand the evolution from keyword and vector search to agentic query orchestration
- See how full conversation context improves retrieval accuracy
- Explore measurable improvements in answer relevance and completeness (up to 40% gains!)
- Get hands-on guidance on integrating Agentic Retrieval with Azure AI Foundry and SDKs
- Discover how to build scalable, AI-first applications powered by this new paradigm
Whether you're building intelligent copilots, enterprise Q&A bots, or AI-driven search solutions, this session will equip you with the tools and patterns to push beyond traditional RAG.
Top 5 Task Management Software to Boost Productivity in 2025Orangescrum
In this blog, you’ll find a curated list of five powerful task management tools to watch in 2025. Each one is designed to help teams stay organized, improve collaboration, and consistently hit deadlines. We’ve included real-world use cases, key features, and data-driven insights to help you choose what fits your team best.
AI and Deep Learning with NVIDIA TechnologiesSandeepKS52
Artificial intelligence and deep learning are transforming various fields by enabling machines to learn from data and make decisions. Understanding how to prepare data effectively is crucial, as it lays the foundation for training models that can recognize patterns and improve over time. Once models are trained, the focus shifts to deployment, where these intelligent systems are integrated into real-world applications, allowing them to perform tasks and provide insights based on new information. This exploration of AI encompasses the entire process from initial concepts to practical implementation, highlighting the importance of each stage in creating effective and reliable AI solutions.
Integrating Survey123 and R&H Data Using FMESafe Software
West Virginia Department of Transportation (WVDOT) actively engages in several field data collection initiatives using Collector and Survey 123. A critical component for effective asset management and enhanced analytical capabilities is the integration of Geographic Information System (GIS) data with Linear Referencing System (LRS) data. Currently, RouteID and Measures are not captured in Survey 123. However, we can bridge this gap through FME Flow automation. When a survey is submitted through Survey 123 for ArcGIS Portal (10.8.1), it triggers FME Flow automation. This process uses a customized workbench that interacts with a modified version of Esri's Geometry to Measure API. The result is a JSON response that includes RouteID and Measures, which are then applied to the feature service record.
Build enterprise-ready applications using skills you already have!PhilMeredith3
Process Tempo is a rapid application development (RAD) environment that empowers data teams to create enterprise-ready applications using skills they already have.
With Process Tempo, data teams can craft beautiful, pixel-perfect applications the business will love.
Process Tempo combines features found in business intelligence tools, graphic design tools and workflow solutions - all in a single platform.
Process Tempo works with all major databases such as Databricks, Snowflake, Postgres and MySQL. It also works with leading graph database technologies such as Neo4j, Puppy Graph and Memgraph.
It is the perfect platform to accelerate the delivery of data-driven solutions.
For more information, you can find us at www.processtempo.com
Marketo & Dynamics can be Most Excellent to Each Other – The SequelBradBedford3
So you’ve built trust in your Marketo Engage-Dynamics integration—excellent. But now what?
This sequel picks up where our last adventure left off, offering a step-by-step guide to move from stable sync to strategic power moves. We’ll share real-world project examples that empower sales and marketing to work smarter and stay aligned.
If you’re ready to go beyond the basics and do truly most excellent stuff, this session is your guide.
Marketo & Dynamics can be Most Excellent to Each Other – The SequelBradBedford3
Confluent: Building a real-time streaming platform using Kafka Streams and Kafka Connect (20-minute version)
1. Building a real-time streaming platform using Kafka Connect + Kafka Streams
Jeremy Custenborder, Systems Engineer, Confluent
26. • Everything in the company is a real-time stream
• > 1.2 trillion messages written per day
• > 3.4 trillion messages read per day
• ~ 1 PB of stream data
• Thousands of engineers
• Tens of thousands of producer processes
#2: Hi, I’m Neha Narkhede…
There is a big paradigm shift happening around the world where companies are moving rapidly towards leveraging data in real-time and fundamentally moving away from batch-oriented computing. But how do you do that? Well that is what today’s talk is about. I’m going to summarize 6 years of work in 15 mins, so let’s get started.
#4: Unordered, unbounded, and large-scale datasets are increasingly common in day-to-day business. Stream data means different things for different businesses: for retail it might mean streams of orders and shipments, for finance streams of stock ticker data, and for web companies streams of user activity data. Stream data is everywhere. At the same time, there is a huge push towards getting faster results: doing instant credit card fraud detection, processing credit card payments instantly instead of only 5 times a day, and being able to detect and alert on a problem that causes retail sales to dip within seconds instead of a day later (you can only imagine what that would do to retail companies over Black Friday).
#5: So the takeaway is that businesses operate in real time, not in batch: if you go to a store to buy something, you don’t wait there for several hours to get it. So the data processing required to make key business decisions and to operate a business effectively should also happen in real time.
Here are some examples to support that claim…
#6: Event = something that happened. Different for different businesses.
#8: Log files are also event streams. For instance, every line in a log file is an event that in this case tells you how the service is being used.
#9: There is an inherent duality between tables and streams: traditional databases are all about tables full of state, but they are not designed to respond to the streams of events that modify those tables.
#10: Tables have rows that store the latest value for a unique key. But there is no notion of time.
#11: If you look at how a table gets constructed over time, you will notice that…
#12: The operations are actually a stream of events, where each event is just the operation that modifies the table.
Every database does this internally, and it is called a changelog.
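A minimal pure-Java sketch of that duality (keys and values are illustrative): replaying a changelog of upsert events materializes the table, and any further table mutation is itself just one more changelog event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a table is the latest value per key, and the ordered
// sequence of operations that produced it is the changelog.
public class ChangelogSketch {

    record Upsert(String key, String value) {}

    public static void main(String[] args) {
        List<Upsert> changelog = List.of(
                new Upsert("user-42", "viewed product A"),
                new Upsert("user-7",  "viewed product B"),
                new Upsert("user-42", "added product A to cart")); // overwrites earlier state

        // Replaying the stream of events materializes the table...
        Map<String, String> table = new HashMap<>();
        for (Upsert event : changelog) {
            table.put(event.key(), event.value());
        }

        // ...and any further table mutation is itself another changelog event.
        List<Upsert> newEvents = new ArrayList<>();
        Upsert checkout = new Upsert("user-42", "checked out");
        table.put(checkout.key(), checkout.value());
        newEvents.add(checkout);

        System.out.println(table);     // latest value per key, no notion of time
        System.out.println(newEvents); // the operations, in order
    }
}
```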
#13: So events are everywhere, what next? We need to fundamentally move to event-centric thinking. For a retail website, there are possibly various avenues that generate the “product view” event. A standard thing to do is to ensure that all product view data ends up in Hadoop so you can run analytics on user interest to power various business functions from marketing to product positioning and so on.
#15: The reality is about 100x more complex. In some corner, you are using some messaging system for app-to-app communication. You might have a custom way of loading data from various databases into Hadoop. But then more destinations appear over time, and now you have to feed the same data to a search system, various caches, etc.
This is a common reality and a simplified version.
300 services
~100 databases
Multi-datacenter
Trolling: load into Oracle, search, etc
#16: The core insight is that a data pipeline is also an event stream.
#17: What you need instead of that scary picture is a central streaming platform at the heart of a datacenter. A central nervous system that collects data from various sources and feeds all other systems and apps that need to consume and process data in real-time.
Why does this make sense?
#18: Why is a streaming platform needed? Because data sources and destinations add up over time. Initially you might have just the web app that produces the product view event and maybe you’ve only thought about analyzing it in Hadoop.
#19: But over time, the mobile app shows up and also produces the same data, and several more applications appear as destinations for search, recommendations, security, etc.
Event centric thinking involves building a forward-compatible architecture. You will never be able to foresee what future apps might show up that will need the same data. So capture it in a central, scalable streaming platform that asynchronously feeds downstream systems.
#20: So how do you build such a streaming platform?
#22: At a high level, Kafka is a pub-sub messaging system: producers capture events, events are sent to and stored on a central cluster of brokers, and consumers subscribe to topics, which are named categories of data. End to end, the producer-to-consumer data flow is real-time.
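A minimal sketch of that flow using the standard Kafka Java clients (the broker address, topic name, and event payload are placeholders): a producer appends an event to a topic, and a consumer subscribed to that topic reads it.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PubSubSketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Producer: capture an event and append it to a topic on the broker cluster.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("page-views", "user-42", "viewed product A"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "analytics");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumer: subscribe to the topic and read events as they arrive.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("page-views"));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
        }
    }
}
```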
#23: Magic of Kafka is in the implementation. It is not just a pub-sub messaging system, it is a modern distributed platform…
How so?
#27: All that means, you can throw lots of data at Kafka and have it be made available throughout the company within milliseconds. At LinkedIn and several other companies, Kafka is deployed at a large scale…
#28: In the last 5 years since it was open-sourced, it has been widely adopted by 1000s of companies worldwide.
#29: So Kafka is the foundation of the central streaming platform.
#31: Infrastructure is really only as useful as the data it has. The next step in moving to a streaming-platform-based data architecture is solving the ETL problem.
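One concrete way to wire up that ETL step is Kafka Connect's REST API (distributed mode, default port 8083). The sketch below registers the FileStreamSource connector that ships with Kafka so that each new line of a log file becomes an event on a topic; the host, file path, topic, and connector name are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Connector config for the FileStreamSource connector bundled with Kafka:
        // it tails a file and writes each line to a topic as an event.
        String body = """
                {
                  "name": "demo-file-source",
                  "config": {
                    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                    "tasks.max": "1",
                    "file": "/var/log/app/access.log",
                    "topic": "app-logs"
                  }
                }
                """;

        // POST the config to a Connect worker's REST API; the worker schedules
        // the connector's tasks across the cluster.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```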
#38: Doesn’t mean you drop everything on the floor if anything slows down
Streaming algorithms—online space
Can compute median
#39: About how inputs are translated into outputs (very fundamental)
#40: HTTP/REST
All databases
Run all the time
Each request totally independent—No real ordering
Can fail individual requests if you want
Very simple!
About the future!
#41: “Ed, the MapReduce job never finishes if you watch it like that”
Job kicks off at a certain time
Cron!
Processes all the input, produces all the output
Data is usually static
Hadoop!
DWH, JCL
Archaic but powerful. Can do analytics! Complex algorithms!
Also can be really efficient!
Inherently high latency
#42: Generalizes request/response and batch.
Program takes some inputs and produces some outputs
Could be all inputs
Could be one at a time
Runs continuously forever!
#43: For some time, stream processing was thought of as a faster map-reduce layer useful for faster analytics, requiring deployment of a central cluster much like Hadoop. But in my experience, I’ve learnt that the most compelling applications that do stream processing look much more like an event-driven microservice and less like a Hive query or Spark job.
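A minimal sketch of that event-driven-microservice style using the Kafka Streams DSL (topic names, the filtering logic, and the broker address are placeholders): the service is just an ordinary Java program that reads one topic, transforms it, and writes another, with no separate processing cluster to deploy.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PageViewService {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // An event-driven microservice: read page-view events, keep only the
        // product views, normalize them, and write them to a downstream topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> views = builder.stream("page-views");
        views.filter((userId, page) -> page.startsWith("product/"))
             .mapValues(page -> page.toUpperCase())
             .to("product-views");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```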
#44: Companies == streams
What does a retail store do?
Streams
Retail
- Sales
- Shipments and logistics
- Pricing
- Re-ordering
- Analytics
- Fraud and theft
#50: Let’s dive into the real-time analytics and apps area
#53: There is only one thing you can do if you think the world needs to change and you live in Silicon Valley: quit your job and do it.
Mission: Build a Streaming Platform
Product: Confluent Platform