Distributed streaming performance, consistency, reliable delivery, durability, optimisations, event time processing and other concepts discussed and explained on Akka Persistence and other examples.
Data in Motion: Streaming Static Data Efficiently 2Martin Zapletal
Updated version for SD Berlin 2016. Distributed streaming performance, consistency, reliable delivery, durability, optimisations, event time processing and other concepts discussed and explained on Akka Persistence and other examples.
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Martin Zapletal
The document discusses distributed machine learning and data processing. It covers several topics including reasons for using distributed machine learning, different distributed computing architectures and primitives, distributed data stores and analytics tools like Spark, streaming architectures like Lambda and Kappa, and challenges around distributed state management and fault tolerance. It provides examples of failures in distributed databases and suggestions to choose the appropriate tools based on the use case and understand their internals.
Yet another presentation about Event Sourcing? Yes and no. Event Sourcing is a really great concept. Some could say it’s a Holy Grail of the software architecture. I might agree with that, while remembering that everything comes with a price. This session is a summary of my experience with ES gathered while working on 3 different commercial products. Instead of theoretical aspects, I will focus on possible challenges with ES implementation. What could explode (very often with delayed ignition)? How and where to store events effectively? What are possible schema evolution solutions? How to achieve the highest level of scalability and live with eventual consistency? And many other interesting topics that you might face when experimenting with ES.
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...Codemotion
Yet another presentation about Event Sourcing? Yes and no. Event Sourcing is a really great concept. Some could say it’s a Holy Grail of the software architecture. True, but everything comes with a price. This session is a summary of my experience with ES gathered while working on 3 different commercial products. Instead of theoretical aspects, I will focus on possible challenges with ES implementation. What could explode? How and where to store events effectively? What are possible schema evolution solutions? How to achieve the highest level of scalability and live with eventual consistency?
Our product uses third generation Big Data technologies and Spark Structured Streaming to enable comprehensive Digital Transformation. It provides a unified streaming API that allows for continuous processing, interactive queries, joins with static data, continuous aggregations, stateful operations, and low latency. The presentation introduces Spark Structured Streaming's basic concepts including loading from stream sources like Kafka, writing to sinks, triggers, SQL integration, and mixing streaming with batch processing. It also covers continuous aggregations with windows, stateful operations with checkpointing, reading from and writing to Kafka, and benchmarks compared to other streaming frameworks.
Using Spark to Load Oracle Data into CassandraJim Hatcher
The document discusses lessons learned from using Spark to load data from Oracle into Cassandra. It describes problems encountered with Spark SQL handling Oracle NUMBER and timeuuid fields incorrectly. It also discusses issues generating IDs across RDDs and limitations on returning RDDs of tuples over 22 items. The resources section provides references for learning more about Spark, Scala, and using Spark with Cassandra.
A dive into akka streams: from the basics to a real-world scenarioGioia Ballin
Reactive streaming is becoming the best approach to handle data flows across asynchronous boundaries. Here, we present the implementation of a real-world application based on Akka Streams. After reviewing the basics, we will discuss the development of a data processing pipeline that collects real-time sensor data and sends it to a Kinesis stream. There are various possible point of failures in this architecture. What should happen when Kinesis is unavailable? If the data flow is not handled in the correct way, some information may get lost. Akka Streams are the tools that enabled us to build a reliable processing logic for the pipeline that avoids data losses and maximizes the robustness of the entire system.
This document provides an overview of Rxjs (Reactive Extensions for JavaScript). It begins by explaining why Rxjs is useful for dealing with asynchronous code in a synchronous way and provides one paradigm for asynchronous operations. It then discusses the history of callbacks and promises for asynchronous code. The bulk of the document explains key concepts in Rxjs including Observables, Operators, error handling, testing with Schedulers, and compares Promises to Rxjs. It provides examples of many common Rxjs operators and patterns.
Rxjs provides a paradigm for dealing with asynchronous operations in a way that resembles synchronous code. It uses Observables to represent asynchronous data streams over time that can be composed using operators. This allows handling of events, asynchronous code, and other reactive sources in a declarative way. Key points are:
- Observables represent asynchronous data streams that can be subscribed to.
- Operators allow manipulating and transforming streams through methods like map, filter, switchMap.
- Schedulers allow controlling virtual time for testing asynchronous behavior.
- Promises represent single values while Observables represent continuous streams, making Observables more powerful for reactive programming.
- Cascading asynchronous calls can be modeled elegantly using switch
This document provides an introduction to RxJS, a library for reactive programming using Observables. It discusses some of the key concepts in RxJS including Observables, operators, and recipes. Observables can be created from events, promises, arrays and other sources. Operators allow chaining transformations on Observables like map, filter, switchMap. RxJS is useful for complex asynchronous flows and is easier to test than promises using marble testing. Overall, RxJS enables rich composition of asynchronous code and cancellation of Observables.
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
This document summarizes a user's journey developing a custom aggregation function for Apache Spark using a T-Digest sketch. The user initially implemented it as a User Defined Aggregate Function (UDAF) but ran into performance issues due to excessive serialization/deserialization. They then worked to resolve it by implementing the function as a custom Aggregator using Spark 3.0's new aggregation APIs, which avoided unnecessary serialization and provided a 70x performance improvement. The story highlights the importance of understanding how custom functions interact with Spark's execution model and optimization techniques like avoiding excessive serialization.
NGRX provides tools for implementing Redux patterns in Angular applications. The key components are:
- The Store holds the single source of truth application state as plain JavaScript objects and uses reducers to immutable update state in response to actions.
- Actions describe state changes and are dispatched to the Store which passes them to reducers. Reducers pure functions that return new state.
- Selectors allow slices of state to be accessed.
- Additional libraries provide debugging with devtools, routing integration, side effect handling, and entity management functionality. Files can be organized by domain or feature module.
Time series with Apache Cassandra - Long versionPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Using Akka Persistence to build a configuration datastoreAnargyros Kiourkos
Akka is a library implementing the actor model on the JVM. It enables the development of distributed, concurrent, fault-tolerant and scalable applications. Akka persistence enables actors to persist their internal state so that it can be recovered where an actor is restarted. We demonstrate the practical application of Akka persistence to build a configuration datastore using CQRS and Event Sourcing
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Dan Robinson
Heap's analytics infrastructure is built around PostgreSQL. The most important choice to make when building a system this way is the schema you'll use to represent your data. This foundation will determine your write throughput, what sorts of read queries will be fast, what indexing strategies will be available to you, and what data inconsistencies will be possible. With the wrong choice, you won't be able to leverage PostgreSQL's most powerful features.
This talk walks through the different schemas we've used to power Heap over the last three years, their relative strengths and weaknesses, and the mistakes we've made.
Reactive Design Patterns — J on the BeachRoland Kuhn
Our software needs to become reactive, this realization is widely understood: we need to consider responsiveness, maintainability, elasticity and scalability from the outset. Not all systems need to implement all these to the same degree, specific project requirements will determine where effort is most wisely spent, but in the vast majority of cases the need to go reactive will demand that we design our applications differently.
In this presentation we explore several architecture elements that are commonly found in reactive systems (like the circuit breaker, various replication techniques, or flow control protocols). These patterns are language agnostic and also independent of the abundant choice of reactive programming frameworks and libraries, they are well-specified starting points for exploring the design space of a concrete problem: thinking is strictly required!
This document discusses stateful streaming data pipelines using Apache Apex. It introduces Apache Apex and describes its key components like tuples, operators, and the directed acyclic graph (DAG) structure. It then discusses challenges around checkpointing large operator state and introduces managed state and spillable data structures as solutions. Managed state incrementally checkpoints state to disk and allows configuring memory thresholds. Spillable data structures decouple data from serialization and provide map, list, and set interfaces to stored data. Examples demonstrate building complex data structures on top of managed state.
1. Rxjs provides a better way of handling asynchronous code through observables which are streams of values over time. Observables allow for cancellable, retryable operations and easy composition of different asynchronous sources.
2. Common Rxjs operators like map, filter, and flatMap allow transforming and combining observable streams. Operators make observables quite powerful for tasks like async logic, event handling, and API requests.
3. In Angular, observables are used extensively for tasks like HTTP requests, routing, and component communication. Key aspects are using async pipes for subscriptions and unsubscribing during lifecycle hooks. Rxjs greatly simplifies many common asynchronous patterns in Angular applications.
The document summarizes a workshop on Cassandra data modeling. It discusses four use cases: (1) modeling clickstream data by storing sessions and clicks in separate column families, (2) modeling a rolling time window of data points by storing each point in a column with a TTL, (3) modeling rolling counters by storing counts in columns indexed by time bucket, and (4) using transaction logs to achieve eventual consistency when modeling many-to-many relationships by serializing transactions and deleting logs after commit. The document provides recommendations and alternatives for each use case.
RxJava is a library for composing asynchronous and event-based programs using observable sequences. It provides APIs for asynchronous programming with observable streams and the ability to chain operations and transformations on these streams using reactive extensions. The basic building blocks are Observables, which emit items, and Subscribers, which consume those items. Operators allow filtering, transforming, and combining Observable streams. RxJava helps address problems with threading and asynchronous operations in Android by providing tools to manage execution contexts and avoid callback hell.
This document provides an agenda for a presentation on Big Data Analytics with Cassandra, Spark, and MLLib. The presentation covers Spark basics, using Spark with Cassandra, Spark Streaming, Spark SQL, and Spark MLLib. It also includes examples of querying and analyzing Cassandra data with Spark and Spark SQL, and machine learning with Spark MLLib.
Schedulers are a very powerful way of controlling the execution of the Reactive Extensions for JavaScript streams. In this presentation we examine the different types of schedulers and the different queues in the JavaScript runtime (Tasks and Micro tasks)
RxJs - demystified provides an overview of reactive programming and RxJs. The key points covered are:
- Reactive programming focuses on propagating changes without explicitly specifying how propagation happens.
- Observables are at the heart of RxJs and emit values in a push-based manner. Operators allow transforming, filtering, and combining observables.
- Common operators include map, filter, reduce, buffer, and switchMap. Over 120 operators exist for tasks like error handling, multicasting, and conditional logic.
- Marble diagrams visually demonstrate how operators transform observable streams.
- Creating observables from events, promises, arrays and iterables allows wrapping different data sources in a uniform API
Unit Testing RxJS streams using jasmine-marbles. The demo shows how we can create a Web Component that uses NGRX/store + NGRX/effects like workflow and how we test our effect streams
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data at high velocity and high volume. This talk will give you an overview of the many ways you can be successful by introducing Apache Cassandra concepts. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models. There will also be examples of how you can use Apache Spark along with Apache Cassandra to create a real time data analytics platform. It’s so easy, you will be shocked and ready to try it yourself.
Advanced Apache Cassandra Operations with JMXzznate
Nodetool is a command line interface for managing a Cassandra node. It provides commands for node administration, cluster inspection, table operations and more. The nodetool info command displays node-specific information such as status, load, memory usage and cache details. The nodetool compactionstats command shows compaction status including active tasks and progress. The nodetool tablestats command displays statistics for a specific table including read/write counts, space usage, cache usage and latency.
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
The demand for stream processing is increasing a lot these day. Immense amounts of data has to be processed fast from a rapidly growing set of disparate data sources. This pushes the limits of traditional data processing infrastructures. These stream-based applications include trading, social networks, Internet of things, system monitoring, and many other examples.
In this talk we are going to discuss various state of the art open-source distributed streaming frameworks, their similarities and differences, implementation trade-offs and their intended use-cases. Apart of that, I’m going to speak about Fast Data, theory of streaming, framework evaluation and so on. My goal is to provide comprehensive overview about modern streaming frameworks and to help fellow developers with picking the best possible for their particular use-case.
The document summarizes Martin Odersky's talk at Scala Days 2016 about the road ahead for Scala. The key points are:
1. Scala is maturing with improvements to tools like IDEs and build tools in 2015, while 2016 sees increased activity with the Scala Center, Scala 2.12 release, and rethinking Scala libraries.
2. The Scala Center was formed to undertake projects benefiting the Scala community with support from various companies.
3. Scala 2.12 focuses on optimizing for Java 8 and includes many new features. Future releases will focus on improving Scala libraries and modularization.
4. The DOT calculus provides a formal
Rxjs provides a paradigm for dealing with asynchronous operations in a way that resembles synchronous code. It uses Observables to represent asynchronous data streams over time that can be composed using operators. This allows handling of events, asynchronous code, and other reactive sources in a declarative way. Key points are:
- Observables represent asynchronous data streams that can be subscribed to.
- Operators allow manipulating and transforming streams through methods like map, filter, switchMap.
- Schedulers allow controlling virtual time for testing asynchronous behavior.
- Promises represent single values while Observables represent continuous streams, making Observables more powerful for reactive programming.
- Cascading asynchronous calls can be modeled elegantly using switch
This document provides an introduction to RxJS, a library for reactive programming using Observables. It discusses some of the key concepts in RxJS including Observables, operators, and recipes. Observables can be created from events, promises, arrays and other sources. Operators allow chaining transformations on Observables like map, filter, switchMap. RxJS is useful for complex asynchronous flows and is easier to test than promises using marble testing. Overall, RxJS enables rich composition of asynchronous code and cancellation of Observables.
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
This document summarizes a user's journey developing a custom aggregation function for Apache Spark using a T-Digest sketch. The user initially implemented it as a User Defined Aggregate Function (UDAF) but ran into performance issues due to excessive serialization/deserialization. They then worked to resolve it by implementing the function as a custom Aggregator using Spark 3.0's new aggregation APIs, which avoided unnecessary serialization and provided a 70x performance improvement. The story highlights the importance of understanding how custom functions interact with Spark's execution model and optimization techniques like avoiding excessive serialization.
NGRX provides tools for implementing Redux patterns in Angular applications. The key components are:
- The Store holds the single source of truth application state as plain JavaScript objects and uses reducers to immutable update state in response to actions.
- Actions describe state changes and are dispatched to the Store which passes them to reducers. Reducers pure functions that return new state.
- Selectors allow slices of state to be accessed.
- Additional libraries provide debugging with devtools, routing integration, side effect handling, and entity management functionality. Files can be organized by domain or feature module.
Time series with Apache Cassandra - Long versionPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Using Akka Persistence to build a configuration datastoreAnargyros Kiourkos
Akka is a library implementing the actor model on the JVM. It enables the development of distributed, concurrent, fault-tolerant and scalable applications. Akka persistence enables actors to persist their internal state so that it can be recovered where an actor is restarted. We demonstrate the practical application of Akka persistence to build a configuration datastore using CQRS and Event Sourcing
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Dan Robinson
Heap's analytics infrastructure is built around PostgreSQL. The most important choice to make when building a system this way is the schema you'll use to represent your data. This foundation will determine your write throughput, what sorts of read queries will be fast, what indexing strategies will be available to you, and what data inconsistencies will be possible. With the wrong choice, you won't be able to leverage PostgreSQL's most powerful features.
This talk walks through the different schemas we've used to power Heap over the last three years, their relative strengths and weaknesses, and the mistakes we've made.
Reactive Design Patterns — J on the BeachRoland Kuhn
Our software needs to become reactive, this realization is widely understood: we need to consider responsiveness, maintainability, elasticity and scalability from the outset. Not all systems need to implement all these to the same degree, specific project requirements will determine where effort is most wisely spent, but in the vast majority of cases the need to go reactive will demand that we design our applications differently.
In this presentation we explore several architecture elements that are commonly found in reactive systems (like the circuit breaker, various replication techniques, or flow control protocols). These patterns are language agnostic and also independent of the abundant choice of reactive programming frameworks and libraries, they are well-specified starting points for exploring the design space of a concrete problem: thinking is strictly required!
This document discusses stateful streaming data pipelines using Apache Apex. It introduces Apache Apex and describes its key components like tuples, operators, and the directed acyclic graph (DAG) structure. It then discusses challenges around checkpointing large operator state and introduces managed state and spillable data structures as solutions. Managed state incrementally checkpoints state to disk and allows configuring memory thresholds. Spillable data structures decouple data from serialization and provide map, list, and set interfaces to stored data. Examples demonstrate building complex data structures on top of managed state.
1. Rxjs provides a better way of handling asynchronous code through observables which are streams of values over time. Observables allow for cancellable, retryable operations and easy composition of different asynchronous sources.
2. Common Rxjs operators like map, filter, and flatMap allow transforming and combining observable streams. Operators make observables quite powerful for tasks like async logic, event handling, and API requests.
3. In Angular, observables are used extensively for tasks like HTTP requests, routing, and component communication. Key aspects are using async pipes for subscriptions and unsubscribing during lifecycle hooks. Rxjs greatly simplifies many common asynchronous patterns in Angular applications.
The document summarizes a workshop on Cassandra data modeling. It discusses four use cases: (1) modeling clickstream data by storing sessions and clicks in separate column families, (2) modeling a rolling time window of data points by storing each point in a column with a TTL, (3) modeling rolling counters by storing counts in columns indexed by time bucket, and (4) using transaction logs to achieve eventual consistency when modeling many-to-many relationships by serializing transactions and deleting logs after commit. The document provides recommendations and alternatives for each use case.
RxJava is a library for composing asynchronous and event-based programs using observable sequences. It provides APIs for asynchronous programming with observable streams and the ability to chain operations and transformations on these streams using reactive extensions. The basic building blocks are Observables, which emit items, and Subscribers, which consume those items. Operators allow filtering, transforming, and combining Observable streams. RxJava helps address problems with threading and asynchronous operations in Android by providing tools to manage execution contexts and avoid callback hell.
This document provides an agenda for a presentation on Big Data Analytics with Cassandra, Spark, and MLLib. The presentation covers Spark basics, using Spark with Cassandra, Spark Streaming, Spark SQL, and Spark MLLib. It also includes examples of querying and analyzing Cassandra data with Spark and Spark SQL, and machine learning with Spark MLLib.
Schedulers are a very powerful way of controlling the execution of the Reactive Extensions for JavaScript streams. In this presentation we examine the different types of schedulers and the different queues in the JavaScript runtime (Tasks and Micro tasks)
RxJs - demystified provides an overview of reactive programming and RxJs. The key points covered are:
- Reactive programming focuses on propagating changes without explicitly specifying how propagation happens.
- Observables are at the heart of RxJs and emit values in a push-based manner. Operators allow transforming, filtering, and combining observables.
- Common operators include map, filter, reduce, buffer, and switchMap. Over 120 operators exist for tasks like error handling, multicasting, and conditional logic.
- Marble diagrams visually demonstrate how operators transform observable streams.
- Creating observables from events, promises, arrays and iterables allows wrapping different data sources in a uniform API
Unit Testing RxJS streams using jasmine-marbles. The demo shows how we can create a Web Component that uses NGRX/store + NGRX/effects like workflow and how we test our effect streams
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data at high velocity and high volume. This talk will give you an overview of the many ways you can be successful by introducing Apache Cassandra concepts. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models. There will also be examples of how you can use Apache Spark along with Apache Cassandra to create a real time data analytics platform. It’s so easy, you will be shocked and ready to try it yourself.
Advanced Apache Cassandra Operations with JMXzznate
Nodetool is a command line interface for managing a Cassandra node. It provides commands for node administration, cluster inspection, table operations and more. The nodetool info command displays node-specific information such as status, load, memory usage and cache details. The nodetool compactionstats command shows compaction status including active tasks and progress. The nodetool tablestats command displays statistics for a specific table including read/write counts, space usage, cache usage and latency.
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
The demand for stream processing is increasing a lot these day. Immense amounts of data has to be processed fast from a rapidly growing set of disparate data sources. This pushes the limits of traditional data processing infrastructures. These stream-based applications include trading, social networks, Internet of things, system monitoring, and many other examples.
In this talk we are going to discuss various state of the art open-source distributed streaming frameworks, their similarities and differences, implementation trade-offs and their intended use-cases. Apart of that, I’m going to speak about Fast Data, theory of streaming, framework evaluation and so on. My goal is to provide comprehensive overview about modern streaming frameworks and to help fellow developers with picking the best possible for their particular use-case.
The document summarizes Martin Odersky's talk at Scala Days 2016 about the road ahead for Scala. The key points are:
1. Scala is maturing with improvements to tools like IDEs and build tools in 2015, while 2016 sees increased activity with the Scala Center, Scala 2.12 release, and rethinking Scala libraries.
2. The Scala Center was formed to undertake projects benefiting the Scala community with support from various companies.
3. Scala 2.12 focuses on optimizing for Java 8 and includes many new features. Future releases will focus on improving Scala libraries and modularization.
4. The DOT calculus provides a formal
Everyone in the Scala world is using or looking into using Akka for low-latency, scalable, distributed or concurrent systems. I'd like to share my story of developing and productionizing multiple Akka apps, including low-latency ingestion and real-time processing systems, and Spark-based applications.
When does one use actors vs futures?
Can we use Akka with, or in place of, Storm?
How did we set up instrumentation and monitoring in production?
How does one use VisualVM to debug Akka apps in production?
What happens if the mailbox gets full?
What is our Akka stack like?
I will share best practices for building Akka and Scala apps, pitfalls and things we'd like to avoid, and a vision of where we would like to go for ideal Akka monitoring, instrumentation, and debugging facilities. Plus backpressure and at-least-once processing.
The document discusses Akka 2.4 and commercial features available through the Reactive Platform. Key points include: Akka 2.4 requires Java 8 but provides backwards compatibility; Cluster Tools, Persistence, and Distributed PubSub are now stable features; Persistence allows cross-Scala version snapshot compatibility; a Split Brain Resolver is available in beta for cluster failure scenarios; and extended Java 6 support is provided through the Reactive Platform.
Large volume data analysis on the Typesafe Reactive PlatformMartin Zapletal
The document discusses several topics related to distributed machine learning and distributed systems including:
- Reasons for using distributed machine learning being either due to large data volumes or hopes of increased speed
- Failure rates of hardware and network links in large data centers
- Examples of database inconsistencies and data loss caused by network partitions in different distributed databases
- Key aspects of distributed data processing including data storage, integration, computing primitives, and analytics
This document provides an overview of installing and deploying Apache Spark, including:
1. Spark can be installed via prebuilt packages or by building from source.
2. Spark runs in local, standalone, YARN, or Mesos cluster modes and the SparkContext is used to connect to the cluster.
3. Jobs are deployed to the cluster using the spark-submit script which handles building jars and dependencies.
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
Deep Learning architectures, such as deep neural networks, are currently the hottest emerging areas of data science, especially in Big Data. Deep Learning could be effectively exploited to address some major issues of Big Data, such as fast information retrieval, data classification, semantic indexing and so on. In this work, we designed and implemented a framework to train deep neural networks using Spark, fast and general data flow engine for large scale data processing, which can utilize cluster computing to train large scale deep networks. Training Deep Learning models requires extensive data and computation. Our proposed framework can accelerate the training time by distributing the model replicas, via stochastic gradient descent, among cluster nodes for data resided on HDFS.
This document provides a history and market overview of Apache Spark. It discusses the motivation for distributed data processing due to increasing data volumes, velocities and varieties. It then covers brief histories of Google File System, MapReduce, BigTable, and other technologies. Hadoop and MapReduce are explained. Apache Spark is introduced as a faster alternative to MapReduce that keeps data in memory. Competitors like Flink, Tez and Storm are also mentioned.
The document provides an overview and summary of Curator, a client library for Apache ZooKeeper. Curator aims to simplify ZooKeeper development by providing a friendlier API, handling retries, and implementing common patterns ("recipes") like leader election and locks. It consists of a client, framework, and recipes components. The framework handles connection management and retries, while recipes implement distributed primitives. Details about common recipes like locks and leader election are provided.
This document discusses scaling machine learning using Apache Spark. It covers several key topics:
1) Parallelizing machine learning algorithms and neural networks to distribute computation across clusters. This includes data, model, and parameter server parallelism.
2) Apache Spark's Resilient Distributed Datasets (RDDs) programming model which allows distributing data and computation across a cluster in a fault-tolerant manner.
3) Examples of very large neural networks trained on clusters, such as a Google face detection model using 1,000 servers and a IBM brain-inspired chip model using 262,144 CPUs.
Spark's distributed programming model uses resilient distributed datasets (RDDs) and a directed acyclic graph (DAG) approach. RDDs support transformations like map, filter, and actions like collect. Transformations are lazy and form the DAG, while actions execute the DAG. RDDs support caching, partitioning, and sharing state through broadcasts and accumulators. The programming model aims to optimize the DAG through operations like predicate pushdown and partition coalescing.
Why Scala Is Taking Over the Big Data WorldDean Wampler
The document discusses how Scala is becoming increasingly popular for big data applications. It provides context about the evolution of Hadoop and MapReduce as the main computing framework for big data in 2008. It notes that implementing algorithms with just MapReduce steps was difficult, and that the Hadoop API was low-level and tedious to work with. Scala is taking over as it allows for more complex algorithms to be implemented more easily compared to the MapReduce model and Hadoop's API.
- Scala originated from Martin Odersky's work on functional programming languages like OCaml in the 1980s and 1990s. It was designed to combine object-oriented and functional programming in a practical way.
- Key aspects of Scala include being statically typed while also being flexible, its unified object and module system, and treating libraries as primary over language features.
- Scala has grown into an large ecosystem centered around the JVM and also targets JavaScript via Scala.js. Tooling continues to improve with faster compilers and new IDE support.
- Future work includes establishing TASTY as a portable intermediate representation, connecting Scala to formal foundations through the DOT calculus, and exploring new type
Gary Vaynerchuk provides tips for using Snapchat effectively. He notes that while only 1% of marketers currently use Snapchat, it has over 100 million active monthly users, making it a good platform to engage younger audiences. He then provides several tips, such as making emojis huge, finding hidden colors, using geofitlers to tag locations, replaying snaps once per day, drawing on an iPad for better quality, cross-promoting to other networks, using QR codes, and checking out the new Snapchat Discover feature. He encourages readers to get started using these tips on Snapchat.
This document describes a reactive platform architecture designed for scalability. It advocates for a shift in priorities from consistency to availability to enable responsiveness under load. The architecture uses microservices and serverless functions to improve scalability but notes serverless is not yet good for managing state. It also discusses challenges around infrastructure complexity and operational costs that increase with microservices compared to monolithic architectures.
Using Apache Spark to Solve Sessionization Problem in Batch and StreamingDatabricks
This document discusses sessionization techniques using Apache Spark batch and streaming processing. It describes using Spark to join previous session data with new log data to generate user sessions in batch mode. For streaming, it covers using watermarks and stateful processing to continuously generate sessions from streaming data. Key aspects covered include checkpointing to provide fault tolerance, configuring the state store, and techniques for reprocessing data in batch and streaming contexts.
This document describes a quiz management system created by Joyita Kundu. It includes details on the database tables, menu design, form design and event coding. The database contains tables for login information, questions and results. The menu system allows users to take IP or GK tests. Forms are used for login, registration, the quiz and results. Event coding handles form interactions and database queries. The system allows users to take timed tests, view results and track performance over time.
Event sourcing in the functional world (22 07-2021)Vitaly Brusentsev
This document provides an overview of using event sourcing with F# and functional programming principles. It discusses domain design using event storming, defining primitive types and events, implementing domain logic as pure functions, serialization, error handling, and using Cosmos DB for event storage and change feeds to build eventually consistent read models. The key aspects covered include defining aggregates, commands, and events as discriminated unions; using Railway Oriented Programming for error handling; serializing to and from DTOs; implementing handlers as pure functions that return aggregates and events; and processing events in Azure Functions using change feeds to update read models.
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)Dan Robinson
At Heap, we lean on PostgreSQL for all our backend heavy lifting. We support an expressive set of queries — conversion funnels with filtering and grouping, retention analysis, and behavioral cohorting to name a few — across billions of users and tens of billions of events. Results need to come back in a matter of seconds and reflect up-to-the-minute data.
This talk will discuss these challenges, with a particular focus on:
- Using CitusDB for interactive analysis across 50 terabytes of data and counting.
- PostgreSQL and Kafka: two great tastes that taste great together.
- UDFs in C and PL/pgSQL, partial indexes for pre-aggregation, and other tricks up our sleeves.
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
Defining customized scalable aggregation logic is one of Apache Spark’s most powerful features. User Defined Aggregate Functions (UDAF) are a flexible mechanism for extending both Spark data frames and Structured Streaming with new functionality ranging from specialized summary techniques to building blocks for exploratory data analysis.
Vadym Khondar is a senior software engineer with 8 years of experience, including 2.5 years at EPAM. He leads a development team that works on web and JavaScript projects. The document discusses reactive programming, including its benefits of responsiveness, resilience, and other qualities. Examples demonstrate using streams, behaviors, and other reactive concepts to write more declarative and asynchronous code.
The document discusses event sourcing and command query responsibility segregation (CQRS) patterns at different levels of implementation. It covers the benefits of event sourcing including having a complete log of state changes, improved debugging, performance, scalability, and microservices integration. It also discusses challenges with increasing complexity, eventual consistency, and different event store and serialization options.
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...Databricks
Yelp’s ad platform handles millions of ad requests everyday. To generate ad metrics and analytics in real-time, they built they ad event tracking and analyzing pipeline on top of Spark Streaming. It allows Yelp to manage large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion.
This session will start with an overview of the entire pipeline and then focus on two specific challenges in the event consolidation part of the pipeline that Yelp had to solve. The first challenge will be about joining multiple data sources together to generate a single stream of ad events that feeds into various downstream systems. That involves solving several problems that are unique to real-time applications, such as windowed processing and handling of event delays. The second challenge covered is with regards to state management across code deployments and application restarts. Throughout the session, the speakers will share best practices for the design and development of large-scale Spark Streaming pipelines for production environments.
Casting for not so strange Actos
- A presentation about the Actors pattern and a look at prototypes in 4 different programming languages - Jruby (Celluloid), Erlang, Elixir and Scala (Akka)
- Presented in "Strange Group Berlin" meetup on 12.03.2015 held at 6Wunderkinder
The Ring programming language version 1.5 book - Part 8 of 31Mahmoud Samir Fayed
This document summarizes key classes and methods from the Ring web library (weblib.ring).
The Application class contains methods for encoding, decoding, cookies, and more. The Page class contains methods for generating common HTML elements and structures. Model classes like UsersModel manage data access and object relational mapping. Controller classes handle requests and coordinate the view and model.
This document discusses refactoring code to improve its design without changing external behavior. It notes that refactoring involves making small, incremental changes rather than large "big bang" refactorings. Code smells that may indicate a need for refactoring include duplication, long methods, complex conditional logic, speculative code, and overuse of comments. Techniques discussed include extracting methods, removing duplication, using meaningful names, removing temporary variables, and applying polymorphism. The document emphasizes that refactoring is an investment that makes future changes easier and helps avoid bugs, and encourages learning from other programming communities.
The Ring programming language version 1.4.1 book - Part 13 of 31Mahmoud Samir Fayed
This document provides documentation on Ring's web library API for generating HTML pages and elements. It describes classes and methods for creating pages, adding content and attributes, handling forms, and more. The Page class allows adding various HTML elements to the page content through methods like text(), html(), h1(), etc. The Application class contains methods for encoding, cookies, and page structure. WebLib enables generating complete HTML pages in Ring code.
Kamil Chmielewski, Jacek Juraszek - "Hadoop. W poszukiwaniu złotego młotka."sjabs
The document discusses Hadoop and its applications. It provides examples of companies like Facebook and their use of Hadoop. It also discusses Hadoop components like HDFS, MapReduce, Pig and HBase. It provides examples of using Hadoop with databases like MongoDB and search engines like Solr. It notes that not every problem requires large-scale solutions and discusses potential use cases for Hadoop including log analysis, indexing documents and building recommendation systems.
In this talk, I will describe how to use xState to implement state machine and state charts and create highly predictable user interfaces.
Presentation from https://www.meetup.com/ReactNYC/events/259413726/
The Ring programming language version 1.7 book - Part 48 of 196Mahmoud Samir Fayed
This document provides code examples and documentation for Ring's web library (weblib.ring). It describes classes and methods for generating HTML pages, forms, tables and other elements. This includes the Page class for adding common elements like text, headings, paragraphs etc., the Application class for handling requests, cookies and encoding, and classes representing various HTML elements like forms, inputs, images etc. It also provides an overview of how to create pages dynamically using View and Controller classes along with Model classes for database access.
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)MongoSF
The document appears to be notes from a MongoDB training session that discusses various MongoDB features like MapReduce, geospatial indexes, and GridFS. It also covers topics like database commands, indexing, and querying documents with embedded documents and arrays. Examples are provided for how to implement many of these MongoDB features and functions.
Proper distribution of functionalities throughout many machines is very hard, especially when we leave those decisions for later. Akka toolkit gives us many tools for scaling out and we can start using them very early in a development process, enhancing our chances of success. In this introductory talk, I want to go through a very simple example and show snippets of single-noded and sharded implementations.
This was a talk given at HTML5DevConf SF in 2015.
Ever wanted to write your own Browserify or Babel? Maybe have an idea for something new? This talk will get you started understanding how to use a JavaScript AST to transform and generate new code.
Download Wondershare Filmora Crack [2025] With Latesttahirabibi60507
Copy & Past Link 👉👉
http://drfiles.net/
Wondershare Filmora is a video editing software and app designed for both beginners and experienced users. It's known for its user-friendly interface, drag-and-drop functionality, and a wide range of tools and features for creating and editing videos. Filmora is available on Windows, macOS, iOS (iPhone/iPad), and Android platforms.
Adobe After Effects Crack FREE FRESH version 2025kashifyounis067
🌍📱👉COPY LINK & PASTE ON GOOGLE http://drfiles.net/ 👈🌍
Adobe After Effects is a software application used for creating motion graphics, special effects, and video compositing. It's widely used in TV and film post-production, as well as for creating visuals for online content, presentations, and more. While it can be used to create basic animations and designs, its primary strength lies in adding visual effects and motion to videos and graphics after they have been edited.
Here's a more detailed breakdown:
Motion Graphics:
.
After Effects is powerful for creating animated titles, transitions, and other visual elements to enhance the look of videos and presentations.
Visual Effects:
.
It's used extensively in film and television for creating special effects like green screen compositing, object manipulation, and other visual enhancements.
Video Compositing:
.
After Effects allows users to combine multiple video clips, images, and graphics to create a final, cohesive visual.
Animation:
.
It uses keyframes to create smooth, animated sequences, allowing for precise control over the movement and appearance of objects.
Integration with Adobe Creative Cloud:
.
After Effects is part of the Adobe Creative Cloud, a suite of software that includes other popular applications like Photoshop and Premiere Pro.
Post-Production Tool:
.
After Effects is primarily used in the post-production phase, meaning it's used to enhance the visuals after the initial editing of footage has been completed.
Avast Premium Security Crack FREE Latest Version 2025mu394968
🌍📱👉COPY LINK & PASTE ON GOOGLE https://dr-kain-geera.info/👈🌍
Avast Premium Security is a paid subscription service that provides comprehensive online security and privacy protection for multiple devices. It includes features like antivirus, firewall, ransomware protection, and website scanning, all designed to safeguard against a wide range of online threats, according to Avast.
Key features of Avast Premium Security:
Antivirus: Protects against viruses, malware, and other malicious software, according to Avast.
Firewall: Controls network traffic and blocks unauthorized access to your devices, as noted by All About Cookies.
Ransomware protection: Helps prevent ransomware attacks, which can encrypt your files and hold them hostage.
Website scanning: Checks websites for malicious content before you visit them, according to Avast.
Email Guardian: Scans your emails for suspicious attachments and phishing attempts.
Multi-device protection: Covers up to 10 devices, including Windows, Mac, Android, and iOS, as stated by 2GO Software.
Privacy features: Helps protect your personal data and online privacy.
In essence, Avast Premium Security provides a robust suite of tools to keep your devices and online activity safe and secure, according to Avast.
FL Studio Producer Edition Crack 2025 Full Versiontahirabibi60507
Copy & Past Link 👉👉
http://drfiles.net/
FL Studio is a Digital Audio Workstation (DAW) software used for music production. It's developed by the Belgian company Image-Line. FL Studio allows users to create and edit music using a graphical user interface with a pattern-based music sequencer.
F-Secure Freedome VPN 2025 Crack Plus Activation New Versionsaimabibi60507
Copy & Past Link 👉👉
https://dr-up-community.info/
F-Secure Freedome VPN is a virtual private network service developed by F-Secure, a Finnish cybersecurity company. It offers features such as Wi-Fi protection, IP address masking, browsing protection, and a kill switch to enhance online privacy and security .
Get & Download Wondershare Filmora Crack Latest [2025]saniaaftab72555
Copy & Past Link 👉👉
https://dr-up-community.info/
Wondershare Filmora is a video editing software and app designed for both beginners and experienced users. It's known for its user-friendly interface, drag-and-drop functionality, and a wide range of tools and features for creating and editing videos. Filmora is available on Windows, macOS, iOS (iPhone/iPad), and Android platforms.
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Ranjan Baisak
As software complexity grows, traditional static analysis tools struggle to detect vulnerabilities with both precision and context—often triggering high false positive rates and developer fatigue. This article explores how Graph Neural Networks (GNNs), when applied to source code representations like Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), can revolutionize vulnerability detection. We break down how GNNs model code semantics more effectively than flat token sequences, and how techniques like attention mechanisms, hybrid graph construction, and feedback loops significantly reduce false positives. With insights from real-world datasets and recent research, this guide shows how to build more reliable, proactive, and interpretable vulnerability detection systems using GNNs.
⭕️➡️ FOR DOWNLOAD LINK : http://drfiles.net/ ⬅️⭕️
Maxon Cinema 4D 2025 is the latest version of the Maxon's 3D software, released in September 2024, and it builds upon previous versions with new tools for procedural modeling and animation, as well as enhancements to particle, Pyro, and rigid body simulations. CG Channel also mentions that Cinema 4D 2025.2, released in April 2025, focuses on spline tools and unified simulation enhancements.
Key improvements and features of Cinema 4D 2025 include:
Procedural Modeling: New tools and workflows for creating models procedurally, including fabric weave and constellation generators.
Procedural Animation: Field Driver tag for procedural animation.
Simulation Enhancements: Improved particle, Pyro, and rigid body simulations.
Spline Tools: Enhanced spline tools for motion graphics and animation, including spline modifiers from Rocket Lasso now included for all subscribers.
Unified Simulation & Particles: Refined physics-based effects and improved particle systems.
Boolean System: Modernized boolean system for precise 3D modeling.
Particle Node Modifier: New particle node modifier for creating particle scenes.
Learning Panel: Intuitive learning panel for new users.
Redshift Integration: Maxon now includes access to the full power of Redshift rendering for all new subscriptions.
In essence, Cinema 4D 2025 is a major update that provides artists with more powerful tools and workflows for creating 3D content, particularly in the fields of motion graphics, VFX, and visualization.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfTechSoup
In this webinar we will dive into the essentials of generative AI, address key AI concerns, and demonstrate how nonprofits can benefit from using Microsoft’s AI assistant, Copilot, to achieve their goals.
This event series to help nonprofits obtain Copilot skills is made possible by generous support from Microsoft.
What You’ll Learn in Part 2:
Explore real-world nonprofit use cases and success stories.
Participate in live demonstrations and a hands-on activity to see how you can use Microsoft 365 Copilot in your own work!
Explaining GitHub Actions Failures with Large Language Models Challenges, In...ssuserb14185
GitHub Actions (GA) has become the de facto tool that developers use to automate software workflows, seamlessly building, testing, and deploying code. Yet when GA fails, it disrupts development, causing delays and driving up costs. Diagnosing failures becomes especially challenging because error logs are often long, complex and unstructured. Given these difficulties, this study explores the potential of large language models (LLMs) to generate correct, clear, concise, and actionable contextual descriptions (or summaries) for GA failures, focusing on developers’ perceptions of their feasibility and usefulness. Our results show that over 80% of developers rated LLM explanations positively in terms of correctness for simpler/small logs. Overall, our findings suggest that LLMs can feasibly assist developers in understanding common GA errors, thus, potentially reducing manual analysis. However, we also found that improved reasoning abilities are needed to support more complex CI/CD scenarios. For instance, less experienced developers tend to be more positive on the described context, while seasoned developers prefer concise summaries. Overall, our work offers key insights for researchers enhancing LLM reasoning, particularly in adapting explanations to user expertise.
https://arxiv.org/abs/2501.16495
Why Orangescrum Is a Game Changer for Construction Companies in 2025Orangescrum
Orangescrum revolutionizes construction project management in 2025 with real-time collaboration, resource planning, task tracking, and workflow automation, boosting efficiency, transparency, and on-time project delivery.
Landscape of Requirements Engineering for/by AI through Literature ReviewHironori Washizaki
Hironori Washizaki, "Landscape of Requirements Engineering for/by AI through Literature Review," RAISE 2025: Workshop on Requirements engineering for AI-powered SoftwarE, 2025.
This presentation explores code comprehension challenges in scientific programming based on a survey of 57 research scientists. It reveals that 57.9% of scientists have no formal training in writing readable code. Key findings highlight a "documentation paradox" where documentation is both the most common readability practice and the biggest challenge scientists face. The study identifies critical issues with naming conventions and code organization, noting that 100% of scientists agree readable code is essential for reproducible research. The research concludes with four key recommendations: expanding programming education for scientists, conducting targeted research on scientific code quality, developing specialized tools, and establishing clearer documentation guidelines for scientific software.
Presented at: The 33rd International Conference on Program Comprehension (ICPC '25)
Date of Conference: April 2025
Conference Location: Ottawa, Ontario, Canada
Preprint: https://arxiv.org/abs/2501.10037
Download YouTube By Click 2025 Free Full Activatedsaniamalik72555
Copy & Past Link 👉👉
https://dr-up-community.info/
"YouTube by Click" likely refers to the ByClick Downloader software, a video downloading and conversion tool, specifically designed to download content from YouTube and other video platforms. It allows users to download YouTube videos for offline viewing and to convert them to different formats.
43. Events by tag
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0 event 2,
tag 1
1 0
event 0 event 1 event 2,
tag 1
44. 0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 2,
tag 1
1 0
event 0 event 1
event 0
event 2,
tag 1
45. 0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0 event 2,
tag 1
1 0
event 1event 0 event 2,
tag 1
46. 0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0 event 2,
tag 1
1 0
event 0 event 1 event 2,
tag 1
47. event 0
event 0
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 2,
tag 1
1 0
event 1 event 2,
tag 1
48. event 0
event 0 event 1
0 0
0 1
event 100,
tag 1
event 101 event 102
event 2,
tag 1
1 0
event 2,
tag 1
event 1,
tag 1
49. 0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 2,
tag 1
1 0
event 2,
tag 1
event 0
event 0 event 1
event 1,
tag 1
50. event 1,
tag 1
event 2,
tag 1
event 0
event 0 event 1
event 1,
tag 10 0
0 1
event 100,
tag 1
event 101 event 102
1 0
event 2,
tag 1
51. event 2,
tag 1
event 0
event 0 event 1
0 0
0 1
event 100,
tag 1
event 101 event 102
1 0
event 2,
tag 1
event 1,
tag 1
52. 0 0
0 1
1 0
event 2,
tag 1
event 0
event 0 event 1
event 100,
tag 1
event 101 event 102
event 2,
tag 1
event 1,
tag 1
53. Events by tag
Id 0,
event 1
Id 1,
event 2
Id 0,
event 100
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
1 0
event 0 event 1 event 2,
tag 1
Id 0,
event 2
tag 1 1/1/2016
tag 1 1/2/2016
event 2,
tag 1
SELECT * FROM $eventsByTagViewName$tagId WHERE
tag$tagId = ? AND
timebucket = ? AND
timestamp > ? AND
timestamp <= ?
ORDER BY timestamp ASC
LIMIT ?
54. Id 1,
event 2
Id 0,
event 100
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
Id 0,
event 2
1 0
event 0 event 1 event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
event 2,
tag 1
55. Id 1,
event 2
Id 0,
event 100
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
Id 0,
event 2
1 0
event 0 event 1 event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
event 2,
tag 1
56. Id 0,
event 100
Id 1,
event 2
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
Id 0,
event 2
1 0
event 0 event 1 event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
event 2,
tag 1
57. Id 0,
event 100
Id 1,
event 2
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
1 0
event 0 event 1 event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
event 2,
tag 1
Id 0,
event 2
58. 0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
event 2,
tag 1
1 0
event 0 event 1 event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
59. tag 1 1/1/2016
tag 1 1/2/2016
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
1 0
event 0 event 1 event 2,
tag 1
persistence
_id
seq
0 1
1 . . .
event 2,
tag 1
60. Id 0,
event 100
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
1 0
event 0 event 1 event 2,
tag 1
persistence
_id
seq
0 ?
1 . . .
event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
61. Id 0,
event 100
Id 0,
event 2
Id 0,
event 1
0 0
0 1
event 1,
tag 1
event 100,
tag 1
event 101 event 102
event 0
1 0
event 0 event 1 event 2,
tag 1
persistence
_id
seq
0 ?
1
event 2,
tag 1
tag 1 1/1/2016
tag 1 1/2/2016
. . .
62. seqNumbers match {
case None =>
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case Some(s) =>
s.isNext(pid, seqNr) match {
case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
seqNumbers = Some(s.updated(pid, seqNr))
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case SequenceNumbers.After =>
replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
// end loop
case SequenceNumbers.Before =>
// duplicate, discard
if (!backtracking)
log.debug(s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
s"but current sequence number is [${s.get(pid)}]")
loop(n - 1)
}
}
63. seqNumbers match {
case None =>
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case Some(s) =>
s.isNext(pid, seqNr) match {
case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
seqNumbers = Some(s.updated(pid, seqNr))
replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
loop(n - 1)
case SequenceNumbers.After =>
replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
// end loop
case SequenceNumbers.Before =>
// duplicate, discard
if (!backtracking)
log.debug(s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
s"but current sequence number is [${s.get(pid)}]")
loop(n - 1)
}
}
64. def replay(): Unit = {
val backtracking = isBacktracking
val limit =
if (backtracking) maxBufferSize
else maxBufferSize - buf.size
val toOffs =
if (backtracking && abortDeadline.isEmpty) highestOffset
else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)
context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit, backtracking,
self, session, preparedSelect, seqNumbers, settings))
context.become(replaying(limit))
}
def replaying(limit: Int): Receive = {
case env @ UUIDPersistentRepr(offs, _) => // Deliver buffer
case ReplayDone(count, seqN, highest) => // Request more
case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
// Causality violation, wait and retry. Only applicable if all events for persistence_id are tagged
case ReplayFailed(cause) => // Failure
case _: Request => // Deliver buffer
case Continue => // Do nothing
case Cancel => // Stop
}
65. def replay(): Unit = {
val backtracking = isBacktracking
val limit =
if (backtracking) maxBufferSize
else maxBufferSize - buf.size
val toOffs =
if (backtracking && abortDeadline.isEmpty) highestOffset
else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)
context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit, backtracking,
self, session, preparedSelect, seqNumbers, settings))
context.become(replaying(limit))
}
def replaying(limit: Int): Receive = {
case env @ UUIDPersistentRepr(offs, _) => // Deliver buffer
case ReplayDone(count, seqN, highest) => // Request more
case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
// Causality violation, wait and retry. Only applicable if all events for persistence_id are tagged
case ReplayFailed(cause) => // Failure
case _: Request => // Deliver buffer
case Continue => // Do nothing
case Cancel => // Stop
}
78. node_id
0
1
Id 0,
event 0
Id 0,
event 1
Id 1,
event 0
Id 0,
event 2
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 1,
event 0
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
tag 1 0
allIds
Id 0,
event 1
Id 2,
event 1
0 1
0 0 event 0 event 1
79. tag 1 0
allIds
Id 0,
event 1
Id 2,
event 1
0 1
0 0 event 0 event 1
val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)
Future.sequence(boundStatements).flatMap { stmts =>
val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
stmts.foreach(batch.add)
session.underlying().flatMap(_.executeAsync(batch))
}
80. tag 1 0
allIds
Id 0,
event 1
Id 2,
event 1
0 1
0 0 event 0 event 1
val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)
Future.sequence(boundStatements).flatMap { stmts =>
val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
stmts.foreach(batch.add)
session.underlying().flatMap(_.executeAsync(batch))
}
81. val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
...
session.underlying().flatMap { s =>
val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
val batchResult = s.executeAsync(batch))
...
}
tag 1 0
allIds
Id 0,
event 1
Id 2,
event 1
0 1
0 0 event 0 event 1
82. val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
...
session.underlying().flatMap { s =>
val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
val batchResult = s.executeAsync(batch))
...
}
tag 1 0
allIds
Id 0,
event 1
Id 2,
event 1
0 1
0 0 event 0 event 1
85. Ordering
10 2
1 12:34:57 1
KEY TIME VALUE
2 12:34:58 2
KEY TIME VALUE
0 12:34:56 0
KEY TIME VALUE
86. 0
1
2
1 12:34:57 1
KEY TIME VALUE
2 12:34:58 2
KEY TIME VALUE
0 12:34:56 0
KEY TIME VALUE
87. Distributed causal stream merging
Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
node_id
88. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
node_id
89. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
node_id
90. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
node_id
persistence
_id
seq
0 0
1 . . .
2 . . .
91. persistence
_id
seq
0 1
1 . . .
2 . . .
Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 0
node_id
0
1
Id 2,
event 0
Id 0,
event 0
Id 0,
event 1
Id 0,
event 3
92. persistence
_id
seq
0 2
1 0
2 0
Id 0,
event 1
Id 0,
event 0
Id 1,
event 0
node_id
0
1
Id 2,
event 0
Id 0,
event 0
Id 0,
event 1
Id 0,
event 2
Id 0,
event 3
Id 2,
event 0
Id 0,
event 2
Id 1,
event 0
93. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
node_id
Id 1,
event 0
persistence
_id
seq
0 3
1 0
2 0
94. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
node_id
Id 1,
event 0
0 0 Id 0,
event 0
Id 0,
event 1
Replay
95. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
node_id
Id 1,
event 0
0 0 Id 0,
event 0
Id 0,
event 1
96. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 1,
event 0
0 0 Id 0,
event 0
Id 0,
event 1
node_id
97. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
1
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 1,
event 0
0 0 Id 0,
event 0
Id 0,
event 1
node_id
persistence
_id
seq
0 2
98. Id 0,
event 2
Id 0,
event 1
Id 0,
event 0
Id 1,
event 00
Id 2,
event 0
Id 0,
event 3
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 1,
event 0
0 0 Id 0,
event 0
Id 0,
event 1
persistence
_id
seq
0 2
stream_id seq
0 1
1 2
1
node_id
100. Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
Id 1,
event 0
101. Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
Id 1,
event 0
102. Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
Id 1,
event 0
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 3
Id 1,
event 0
ACK ACK ACK ACK ACK
103. Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
Id 1,
event 0
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 3
Id 1,
event 0
ACK ACK ACK ACK ACK
104. Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 2
Id 0,
event 3
Id 1,
event 0
Id 0,
event 0
Id 0,
event 1
Id 2,
event 0
Id 0,
event 3
Id 1,
event 0
ACK ACK ACK ACK ACK
126. 1 Id 0
Event 1
Table and stream duality
1
4
3
5
2
1 State X
Id 0
Event 2
Id 0
Event 1
127. Snapshot
for offset N
Table and stream duality
1
4
3
5
2
1 Id 0
Event 1
1 State X
Id 0
Event 2
Id 0
Event 1
4
128. Table and stream duality
Snapshot
for offset N
1
4
3
5
2
1 Id 0
Event 1
1 State X
Id 0
Event 2
Id 0
Event 1
4
N
Id 0
Offset 123
State X
Id 11
Offset 123
State X
129. Cache / view / index /
replica / system / service
Continuous stream applying
transformation function
Updates to the source
of truth data
Original table
Infinite streams application
132. Client 1
Client 2
Client 3
Update
Update
Update
Model devices Model devices Model devices
Input data Input data Input data
Parameter devices
P
ΔP
ΔP
ΔP
133. Challenges
● All the solved problems
○ Exactly once delivery
○ Consistency
○ Availability
○ Fault tolerance
○ Cross service invariants and consistency
○ Transactions
○ Automated deployment and configuration management
○ Serialization, versioning, compatibility
○ Automated elasticity
○ No downtime version upgrades
○ Graceful shutdown of nodes
○ Distributed system verification, logging, tracing, monitoring, debugging
○ Split brains
○ ...
134. Conclusion
● From request, response, synchronous, mutable state
● To streams, asynchronous messaging
● Production ready distributed systems
136. MANCHESTER LONDON NEW YORK
@zapletal_martin @cakesolutions
347 708 1518
[email protected]
We are hiring
http://www.cakesolutions.net/careers