This document discusses the history and development of on-board diagnostics (OBD) standards. It describes the Clean Air Act and Air Quality Act, which led to the establishment of emissions standards. The first OBD standard was introduced to help ensure reliable emissions control systems. OBD-II, a later, enhanced standard, was made mandatory for all vehicles from 1996 onward. It established a standardized way for technicians to access diagnostic information from a vehicle to help with repairs. The document then outlines the objectives and approach of a project to design a low-cost OBD-II scanner.
As presented at the Global SAFe Summit 2018 in Washington, DC
https://www.safesummit.com/sessions/the-synergistic-nature-of-pi-objectives/
The document discusses onboard diagnostics (OBD) systems and how to diagnose problems related to check engine lights. It provides the following key points:
1. OBD systems monitor components that could cause emissions problems and illuminate the check engine light if issues are detected. Codes are generated but do not always immediately turn on the light.
2. Scan tools are needed to read codes and determine what systems are being monitored and if any issues exist. More advanced scan tools are needed for some diagnostics.
3. Driving cycles may be required to fully run all monitors and diagnose some intermittent problems. Monitoring readiness and ensuring all diagnostics have run is important for emissions testing.
4. In addition to
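To make point 2 concrete, here is a minimal Python sketch of decoding the first two data bytes of an OBD-II Mode 01 PID 01 (monitor status) response, the request a scan tool uses to read the check-engine light and readiness monitors. The bit layout shown follows the commonly documented SAE J1979 convention, but it is an illustration and should be verified against the specification for a given vehicle.

```python
def decode_pid_0101(byte_a, byte_b):
    """Decode the first two data bytes of an OBD-II Mode 01 PID 01
    (monitor status since DTCs cleared) response.  Bit positions follow
    the commonly documented SAE J1979 layout; verify against the spec."""
    return {
        "mil_on": bool(byte_a & 0x80),        # bit 7: check-engine light
        "dtc_count": byte_a & 0x7F,           # bits 0-6: stored trouble codes
        # Byte B readiness bits: 1 = that monitor's test not yet complete.
        "misfire_incomplete": bool(byte_b & 0x10),
        "fuel_system_incomplete": bool(byte_b & 0x20),
        "components_incomplete": bool(byte_b & 0x40),
    }
```

For example, a response of `0x82 0x00` would indicate the light is on with two stored codes and all continuous monitors complete.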
Updated with latest version as presented at the Canberra Agile & Scrum meetup on July 20, 2017. Previously titled "Using Agile techniques to manage risk more effectively".
Given that the "Waterfall" process model has been dominant in the IT industry for many decades, how many IT and project management professionals are aware that its inventor warned the world in 1970 that Waterfall is "risky and invites failure"?
From a risk management perspective, is waterfall ever an appropriate choice for complex IT initiatives given what we know now?
In this session we will outline how, as a risk management strategy, using the waterfall model for non-trivial systems development initiatives is systemically high risk as compared with the Iterative Incremental Development (IID) model that has been used in pockets of the IT industry since the late 1950s. Today, many organisations use the IID strategy under the umbrella term of 'Agile'. The majority of these employ Lean Product Development patterns that were first described in the Harvard Business Review in 1986 using a metaphor borrowed from the game of rugby, i.e. 'Scrum'.
If you are not using a disciplined agile approach, are you facing more risk as you approach a high-stakes deadline than you need to?
The varied contexts that we work in come with varied types of risk. For a greenfield, date-driven release, the primary risks may be cost and schedule related. For teams designing a new product for an emerging market, the primary risk may be business risk. For teams doing innovative R&D, the primary risk may be technical risk. For a young team in a new technical or business domain, the primary risk may be social risk. In this session, we will use real world examples of such varied challenges to illustrate how risk-tuned Agile helped us to manage risk effectively.
Whilst we will always have to deal with risk to create value, the good news is that there are now many powerful risk management techniques that can be overlaid on top of IID to tune your development process to the type of risk you face. The question is: which ones are most appropriate for the type of risk you are facing? In this workshop we outline a series of powerful risk management tools that tune an agile development process to effectively manage the type of risk that you face.
This document discusses landslides, including their causes, types, effects, indicators, prevention, and safety measures. It defines landslides as the downward movement of soil, rock, and vegetation under gravity. Key points include that landslides occur when resisting forces are less than driving forces, and can be triggered by heavy rainfall, earthquakes, erosion, deforestation, and human activities like excavation. The document outlines common landslide types and describes their impacts, such as damage to infrastructure, loss of life, and secondary hazards like flooding. It provides guidance on landslide hazard mapping, mitigation strategies, and safety precautions during landslide events.
A strategy is a plan of action designed to achieve a vision, derived from the Greek word for command. Military strategy deals with planning campaigns and troop movements and deception of enemies. Business strategy involves defining an organization's direction and allocating resources to pursue this strategy. Game theory studies strategic decision making and conflict and cooperation between rational decision makers. Popular strategy games include Risk, StarCraft, and Chess.
Spotify provides personalized music recommendations to over 100 million active users based on their listening history and the listening history of similar users. It utilizes various recommendation approaches, including collaborative filtering using latent factor models to create lower-dimensional representations of users and songs. Spotify also uses natural language processing models on playlist data and deep learning on audio features to power recommendations. Personalizing music at Spotify's massive scale across 30 million tracks presents challenges around cold starts, repeated consumption, and measuring recommendation quality.
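The latent-factor idea mentioned above can be sketched in a few lines: learn a small vector for each user and each item so that their dot product approximates observed listening signals. The following is a toy pure-Python illustration of that collaborative-filtering core; the function names, hyperparameters, and data are illustrative and have no connection to Spotify's actual implementation.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=400, lr=0.05, reg=0.02, seed=0):
    # ratings: list of (user, item, value) observations, e.g. play counts.
    # Learns k-dimensional user/item vectors by stochastic gradient descent,
    # the core of latent-factor collaborative filtering.
    rnd = random.Random(seed)
    U = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)   # gradient step with
                V[i][f] += lr * (err * uf - reg * vf)   # L2 regularisation
    return U, V

def predict(U, V, u, i):
    # Predicted affinity of user u for item i: dot product of latent vectors.
    return sum(a * b for a, b in zip(U[u], V[i]))
```

Unobserved (user, item) pairs can then be ranked by `predict` to generate recommendations; at real scale this is done with distributed solvers rather than a Python loop.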
The document provides an introduction to agile methods for executives. It discusses how agile approaches can help organizations adapt to increasingly volatile business environments. The key benefits of agile include shorter time to market, increased productivity, improved alignment with business needs, and greater predictability. The document outlines agile concepts like iterative development, minimal viable products, continuous delivery and focus on customer value. It also summarizes common agile frameworks like Scrum and how agility can be scaled in large organizations.
The evolution of machine learning and IoT have made it possible for manufacturers to build more effective applications for predictive maintenance than ever before. Despite the huge potential that machine learning offers for predictive maintenance, it's challenging to build solutions that can handle the speed of IoT data streams and the massively large datasets required to train models that can forecast rare events like mechanical failures. Solving these challenges requires knowledge about state-of-the-art dataware, such as MapR, and cluster computing frameworks, such as Spark, which give developers foundational APIs for consuming and transforming data into feature tables useful for machine learning.
The primary reasons for using parallel computing:
Save time - wall clock time
Solve larger problems
Provide concurrency (do multiple things at the same time)
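The third point in the list above, concurrency, can be sketched with a few lines of Python: split the input into chunks and process them at the same time, then combine the partial results. This is an illustrative sketch only; chunk sizes and worker counts are arbitrary, and real speedups depend on the workload and runtime.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    # Split the input into one chunk per worker, sum the chunks
    # concurrently, then combine the partial results.
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, parts))
```

The same decompose/compute/combine pattern underlies both the time-saving and larger-problem motivations: each worker only ever holds one chunk.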
Virtual memory management in multi processor Mach OS - AJAY KHARAT
Virtual memory management in multi-processor Mach OS allows processes to access more memory than is physically installed by using virtual addresses. The Mach kernel provides basic services like tasks, threads, messages, and ports to enable parallel and distributed applications. Tasks have their own virtual address spaces that are divided into pages which are allocated to physical frames. The virtual memory system provides protection at the page level by using protection codes in page table entries to control read, write, and execute permissions.
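The page-level protection described above can be sketched as a page-table entry carrying a protection code that is checked on every access. The structure below is a simplified illustration of that idea, not the actual Mach kernel data structures; the bit values and class names are invented for the example.

```python
READ, WRITE, EXECUTE = 0b001, 0b010, 0b100  # illustrative protection bits

class PageTableEntry:
    def __init__(self, frame, prot):
        self.frame = frame   # physical frame backing this virtual page
        self.prot = prot     # allowed access modes for the page

def check_access(pte, requested):
    # Access is permitted only if every requested mode is allowed;
    # otherwise the hardware would raise a protection fault.
    return (pte.prot & requested) == requested
```

A read-write page would then pass a `WRITE` check but fail an `EXECUTE` check, which is how the kernel enforces per-page permissions.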
Location Based System For Mobile Devices Using RFID - vein
This document summarizes a location-based system for mobile devices using RFID technology. The system uses a Java application on a mobile phone with an attached RFID reader to locate the user's position in a building based on fixed RFID tags. The system architecture is presented along with related work in other location-based systems using radio signal strength analysis or fixed RFID readers. Screenshots of a mobile emulator executing the application are also shown. In conclusion, the presented system was able to accurately locate users and send them information based on their location and interests.
This presentation details the issues and scheduling techniques of real-time systems, including both online and offline scheduling for uniprocessor systems. Applications of real-time systems are also covered.
Presentation - Programming a Heterogeneous Computing Cluster - Aashrith Setty
This document provides an overview of programming a heterogeneous computing cluster using the Message Passing Interface (MPI). It begins with background on heterogeneous computing and MPI. It then discusses the MPI programming model and environment management routines. A vector addition example is presented to demonstrate an MPI implementation. Point-to-point and collective communication routines are explained. Finally, it covers groups, communicators, and virtual topologies in MPI programming.
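The vector addition example mentioned above follows MPI's scatter/compute/gather pattern. The sketch below simulates that pattern in plain Python so the partitioning logic is visible; a real implementation would use MPI_Scatterv, local addition, and MPI_Gatherv (or their mpi4py equivalents) across actual processes, and the function names here are illustrative.

```python
def scatter(vec, nprocs):
    # Split a vector into nearly equal chunks, one per rank,
    # mimicking how MPI_Scatterv partitions data.
    base, extra = divmod(len(vec), nprocs)
    chunks, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        chunks.append(vec[start:start + size])
        start += size
    return chunks

def vector_add(a, b, nprocs=4):
    # Each simulated "rank" adds its local chunks; the results are
    # then gathered back in rank order, as MPI_Gatherv would.
    local = [
        [x + y for x, y in zip(ca, cb)]
        for ca, cb in zip(scatter(a, nprocs), scatter(b, nprocs))
    ]
    return [x for chunk in local for x in chunk]
```

The point of the decomposition is that each rank touches only its own chunk, so the work parallelises with no shared state beyond the scatter and gather steps.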
This document provides an overview of task scheduling algorithms for load balancing in cloud computing. It begins with introductions to cloud computing and load balancing. It then surveys several existing task scheduling algorithms, including Min-Min, Max-Min, Resource Awareness Scheduling Algorithm, QoS Guided Min-Min, and others. It discusses the goals, workings, results and problems of each algorithm. It identifies the need for an optimized task scheduling algorithm. It also discusses tools like CloudSim that can be used to simulate scheduling algorithms and evaluate performance.
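As a sketch of one surveyed heuristic, here is the classic Min-Min algorithm as it is commonly described: given an expected-time-to-compute (ETC) matrix, repeatedly assign the task with the smallest earliest completion time to the machine that achieves it. This is an illustrative implementation under that common textbook formulation, not code from the surveyed paper.

```python
def min_min(etc):
    # etc[t][m]: expected execution time of task t on machine m.
    n_tasks, n_machines = len(etc), len(etc[0])
    ready = [0.0] * n_machines          # when each machine becomes free
    unassigned = set(range(n_tasks))
    schedule = {}
    while unassigned:
        # Over all unassigned tasks, find the globally smallest
        # completion time (machine ready time + execution time).
        ct, t, m = min(
            (etc[t][m] + ready[m], t, m)
            for t in unassigned for m in range(n_machines)
        )
        schedule[t] = m
        ready[m] = ct
        unassigned.remove(t)
    return schedule, max(ready)         # assignment and makespan
```

Min-Min tends to finish short tasks first, which can starve long tasks; Max-Min inverts the outer choice to address exactly that, which is one of the trade-offs such surveys compare.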
Scalable and Order-robust Continual Learning with Additive Parameter Decompos... - MLAI2
While recent continual learning methods largely alleviate the catastrophic forgetting problem on toy-sized datasets, some issues remain to be tackled before they can be applied to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Secondly, it needs to tackle the problem of order-sensitivity, where the performance of the tasks varies largely based on the order of the task arrival sequence, as this may cause serious problems in domains where fairness plays a critical role (e.g. medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which, instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our Additive Parameter Decomposition (APD), the task-adaptive parameters for earlier tasks remain mostly unaffected, as we update them only to reflect changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, we can achieve even better scalability with APD using hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters. We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness.
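The core decomposition in the abstract can be sketched in one line: a task's effective parameters are the shared parameters plus a sparse, task-specific correction. The toy example below illustrates that composition with plain Python lists; the function name and the mask representation are invented for illustration and are not the paper's implementation.

```python
def task_parameters(shared, adaptive, mask):
    # APD-style composition: a task's effective weights are the shared
    # parameters plus a sparse task-adaptive correction.  The 0/1 mask
    # selects the few entries this task is allowed to personalise, so
    # entries with mask 0 stay identical to the shared weights.
    return [s + a * m for s, a, m in zip(shared, adaptive, mask)]
```

Because most mask entries are zero, earlier tasks' adaptive parts stay small and untouched when the shared parameters are later refined, which is what limits forgetting and order-sensitivity.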
The document summarizes a weather application project. The project aimed to create an application that provides users with personalized weather alerts, detailed forecasts, and real-time updates tailored to their location. The objective was to give users accurate and up-to-date weather information to help them plan activities and stay safe during severe weather. The application was developed using HTML, CSS, and JavaScript to deliver a seamless user experience and integrate real-time weather data.
Cloud computing and grid computing 360 degree compared - Md. Hasibur Rashid
Cloud computing builds upon concepts from cluster and grid computing. Cluster computing links multiple computers to share workloads, while grid computing dynamically aggregates distributed resources for tasks. Cloud computing provides scalable resources and services over the internet. It extends concepts from grid computing by offering virtualized, dynamically provisioned resources on-demand. Key differences are that cloud computing has loose coupling between providers and consumers, supports scaling, and offers services under a pay-per-use business model. Common cloud services are SaaS, PaaS, and IaaS. Challenges include dynamic scalability, security, and standardization. Cloud computing shows promise for further research in areas like security, interoperability and dynamic pricing models.
The document discusses a draft cloud computing initiative vision and strategy for the federal government. It aims to establish secure, easy-to-use IT services through cloud computing. The goals are to drive adoption of cost-effective cloud solutions and provide services like infrastructure, platform, and software as a service. Various considerations around delivery models, security, and governance are also outlined.
This document discusses analytics for IoT and making sense of data from sensors. It first provides an overview of Innohabit Technologies' vision and products related to contextual intelligence platforms, machine learning analytics, and predictive network health analytics. It then discusses how analytics can help make sense of the endless sea of data from IoT sensors, highlighting key applications of analytics in areas like industrial IoT, smart retail, autonomous vehicles, and more. The benefits of analytics adoption in industrial IoT contexts include optimized asset maintenance, production operations, supply chain management, and more.
Real Time Analytics with Apache Cassandra - Cassandra Day Munich - Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now common place. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
Real-time Stream Processing with Apache Flink @ Hadoop Summit - Gyula Fóra
Apache Flink is an open source project that offers both batch and stream processing on top of a common runtime, exposing a common API. This talk focuses on the stream processing capabilities of Flink.
This tutorial was presented in KDD 2016 conference in San Francisco, CA. You can find the main presentation at http://www.slideshare.net/NeeraAgarwal2/streaming-analytics
RBea: Scalable Real-Time Analytics at King - Gyula Fóra
This talk introduces RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. We have built RBEA to make real-time analytics easily accessible to game teams across King without having to worry about operational details. RBEA is built on top of Apache Flink and uses the framework’s capabilities to its full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. We will talk about how we have built a high-level DSL on the abstractions provided by Flink and how we tackled different technical challenges that have come up while developing the system.
Large-Scale Stream Processing in the Hadoop Ecosystem - Gyula Fóra
Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing the incoming data in real-time. Performing large scale stream processing or analysis requires specialized tools and techniques which have become publicly available in the last couple of years.
This talk will give a deep, technical overview of the top-level Apache stream processing landscape. We compare several frameworks including Spark, Storm, Samza and Flink. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner to help select the best tools for the specific applications. We will touch on the topics of API expressivity, runtime architecture, performance, fault-tolerance and strong use-cases for the individual frameworks.
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin - Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now common place. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
Real-time analytics as a service at King - Gyula Fóra
This talk introduces RBea, our scalable real-time analytics platform at King built on top of Apache Flink. The design goal of RBea is to make stream analytics easily accessible to game teams across King. RBea is powered by Apache Flink and uses the framework’s capabilities to its full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. RBea provides a high-level scripting DSL that is more approachable to developers without stream-processing experience and uses code-generation to execute user-scripts efficiently at scale.
In this talk I will cover the technical details of the RBea architecture and will also look at what real-time analytics brings to the table from the business perspective. If time permits I will also give some outlook on our future plans to generalise and further grow the platform.
Streaming analytics provides real-time processing of continuous data streams. It contrasts with batch processing which operates on bounded datasets. Streaming analytics is used for applications like clickstream analysis, fraud detection, and IoT. Key concepts include event time windows, exactly-once processing, and state management. LinkedIn's streaming platform standardizes profile data in real-time using techniques like stream-table joins, broadcast joins, and reprocessing prior data. Popular open source streaming systems include Kafka Streams, Spark Streaming, Flink, and Storm.
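Of the key concepts listed above, event-time windowing is the easiest to show concretely: each event is assigned to a window based on its own timestamp, not on when it arrives. The sketch below is a minimal illustration of tumbling (non-overlapping, fixed-size) windows; the function names and millisecond units are chosen for the example, and real engines like Flink or Kafka Streams add watermarking and state handling on top of this.

```python
def tumbling_window(ts_ms, size_ms):
    # Map an event timestamp to its event-time window's [start, end) bounds.
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)

def window_counts(events, size_ms):
    # events: iterable of (timestamp_ms, payload).  Events are counted
    # per window by event time, so arrival order does not matter.
    counts = {}
    for ts, _ in events:
        w = tumbling_window(ts, size_ms)
        counts[w] = counts.get(w, 0) + 1
    return counts
```

Note that an out-of-order event still lands in the correct window, which is exactly why event time (rather than processing time) matters for use cases like clickstream analysis.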
This presentation examines some of the top stream analytics platforms in the enterprise. The slide deck explores the characteristics of enterprise stream analytics solutions and discusses the capabilities of some of the top stream analytics platforms in the current market.
Reliable Data Ingestion in Big Data / IoT - Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making them available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real-time (stream processing) as well as in batch (typical big data processing). In recent years, some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale in a horizontal fashion, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Processing data from social media streams and sensors in real time is becoming increasingly prevalent, and there are plenty of open-source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache stream-processing projects: Apache Storm, Apache Spark and Apache Samza.
The end of polling: why and how to transform a REST API into a Data Streamin... (Audrey Neveu)
We know interactivity is the key to keeping our users' interest alive, but we can't reduce animation to UI anymore. Twitter, Waze, Slack… users are used to having real-time data in the applications they love. But how can you turn your static API into a stream of data?
When talking about data streaming, we often think of WebSockets. But have you ever heard of Server-Sent Events? In this tools-in-action session we will compare both technologies to understand which one you should opt for depending on your use case, and I'll show you how we have gone even further by reducing the amount of data to transfer with JSON-Patch.
And because real-time data is not only needed by the web (and because it's much more fun), I'll show you how we can make a drone dance on streamed APIs.
More complex streaming applications generally need to store some state of the running computations in a fault-tolerant manner. This talk discusses the concept of operator state and compares state management in current stream processing frameworks such as Apache Flink Streaming, Apache Spark Streaming, Apache Storm and Apache Samza.
We will go over the recent changes in Flink streaming that introduce a unique set of tools to manage state in a scalable, fault-tolerant way backed by a lightweight asynchronous checkpointing algorithm.
Talk presented in the Apache Flink Bay Area Meetup group on 08/26/15
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events.
So far this has mostly been a development experience, with frameworks such as Oracle Event Processing, Apache Storm or Spark Streaming. With Oracle Stream Analytics, analytics on event streams can be put in the hands of the business analyst. It simplifies the implementation of event-processing solutions so that every business analyst is able to graphically and declaratively define event-stream-processing pipelines, without having to write a single line of code or Continuous Query Language (CQL). Event processing is no longer "complex"! This session presents Oracle Stream Analytics through some selected demo use cases.
Apache Kafka - Scalable Message Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable message broker, built for exchanging huge amounts of messages between a source and a target.
This session starts with an introduction to Apache Kafka, presents its role in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem is covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as Kafka consumers or producers.
Big Data Architectures @ JAX / BigDataCon 2016 (Guido Schmutz)
Every IT project stands or falls with its architecture. This is even more true for Big Data projects, where no standards have had decades to prove their worth. Nevertheless, good and effective solutions are spreading and becoming established here as well. The talk explains which building blocks matter for the various deployment scenarios in the Big Data space and how they can be cast into concrete solutions. It covers both traditional Big Data architectures and current approaches such as the Lambda and Kappa architectures. Stream-processing infrastructures and their combination with Big Data technologies are also a topic. Starting from a product- and technology-independent reference architecture, the talk presents several possible solutions based on open-source components.
Distributed Real-Time Stream Processing: Why and How 2.0 (Petr Zapletal)
The demand for stream processing is increasing rapidly these days. Immense amounts of data have to be processed fast from a rapidly growing set of disparate data sources. This pushes the limits of traditional data-processing infrastructures. Stream-based applications include trading, social networks, the Internet of Things, system monitoring, and many other examples.
In this talk we discuss various state-of-the-art open-source distributed streaming frameworks: their similarities and differences, implementation trade-offs and intended use cases. Apart from that, I'm going to speak about Fast Data, the theory of streaming, framework evaluation and so on. My goal is to provide a comprehensive overview of modern streaming frameworks and to help fellow developers pick the best possible one for their particular use case.
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016 (Gyula Fóra)
Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing incoming data in real time. Performing large-scale stream analysis requires specialized tools and techniques which have become widely available in the last couple of years. This talk gives a deep, technical overview of the Apache stream-processing landscape. We compare several frameworks, including Flink, Spark, Storm, Samza and Apex. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner, to help select the best tools for specific applications. We will touch on API expressivity, runtime architecture, performance, fault tolerance and strong use cases for the individual frameworks. This talk is targeted at anyone interested in streaming analytics, from either a user's or a contributor's perspective. Attendees can expect to get a clear view of the available open-source stream-processing architectures.
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis later. You have to be able to include part of your analytics right where you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last three years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open-source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for event and stream processing, describe the differences you may find between the more traditional CEP and the more modern stream-processing solutions, and show that a combination of both brings the most value.
From Pipelines to Refineries: scaling big data applications, with Tim Hunter (Databricks)
Big data tools are challenging to combine into a larger application: ironically, big data applications themselves do not tend to scale very well. These issues of integration and data management are only magnified by increasingly large volumes of data. Apache Spark provides strong building blocks for batch processes, streams and ad-hoc interactive analysis. However, users face challenges when putting together a single coherent pipeline that could involve hundreds of transformation steps, especially when confronted by the need of rapid iterations. This talk explores these issues through the lens of functional programming. It presents an experimental framework that provides full-pipeline guarantees by introducing more laziness to Apache Spark. This framework allows transformations to be seamlessly composed and alleviates common issues, thanks to whole program checks, auto-caching, and aggressive computation parallelization and reuse.
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft), Kafka Summ... (Confluent)
Eventing and streaming open a world of compelling new possibilities for our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk focuses on event-driven and streaming architectures and how Apache Kafka can help you implement them. It also discusses key tradeoffs you will face along the way, from partitioning schemes to the impact of availability vs. consistency (CAP theorem). Finally, we'll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools, and even features of Kafka, to work around them. This talk assumes a basic understanding of Kafka and distributed computing, but includes brief refresher sections.
Jay Kreps on Project Voldemort: Scaling Simple Storage at LinkedIn (LinkedIn)
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn. This was a presentation made at QCon 2009 and is embedded on LinkedIn's blog - http://blog.linkedin.com/
Today's technical landscape features workloads that can no longer be accomplished on a single server using technology from years past. As a result, we must find new ways to accommodate the increasing demands on our compute performance. Some of these new strategies introduce trade-offs and additional complexity into a system.
In this presentation, we give an overview of scaling and how to address the performance concerns that businesses are facing today.
This document discusses hardware provisioning best practices for MongoDB. It covers key concepts like bottlenecks, working sets, and replication vs sharding. It also presents two case studies where these concepts were applied: 1) For a Spanish bank storing logs, the working set was 4TB so they provisioned servers with at least that much RAM. 2) For an online retailer storing products, testing found the working set was 270GB, so they recommended a replica set with 384GB RAM per server to avoid complexity of sharding. The key lessons are to understand requirements, test with a proof of concept, measure resource usage, and expect that applications may become bottlenecks over time.
Patterns of the Lambda Architecture - April 2015 - Hadoop Summit, Europe (Flip Kromer)
This talk centers on two things: a set of patterns for the architecture of high-scale data systems; and a framework for understanding the tradeoffs we make in designing them.
Fixing Twitter: Improving The Performance And Scalability Of The Worlds Most ... (smallerror)
Twitter's operations team manages software performance, availability, capacity planning, and configuration management for Twitter. They use metrics, logs, and analysis to find weak points and take corrective action. Some techniques include caching everything possible, moving operations to asynchronous daemons, and optimizing databases to reduce replication delay and locks. The team also created several open source projects like CacheMoney for caching and Kestrel for asynchronous messaging.
InfluxEnterprise Architecture Patterns, by Tim Hall & Sam Dillard (InfluxData)
1. The document provides an overview of InfluxEnterprise, including its core open source functionality, high availability features, scalability, fine-grained authorization, support options, and on-premise or cloud deployment options.
2. It discusses signs that an organization may be ready for InfluxEnterprise, such as high CPU usage, issues with single node deployments, and needing improved data durability or throughput.
3. The document covers InfluxEnterprise cluster architecture including meta nodes, data nodes, replication patterns, ingestion and query rates for different replication configurations, and examples for mothership, durable data ingest, and integrating with ElasticSearch deployments.
This is the story of how we managed to scale and improve Tappsi’s RoR RESTful API to handle our ever-growing load - told from different perspectives: infrastructure, data storage tuning, web server tuning, RoR optimization, monitoring and architecture design.
Headaches and Breakthroughs in Building Continuous Applications (Databricks)
At SpotX, we have built and maintained a portfolio of Spark Streaming applications -- all of which process records in the millions per minute. From pure data ingestion, to ETL, to real-time reporting, to live customer-facing products and features, continuous applications are in our DNA. Come along with us as we outline our journey from square one to present in the world of Spark Streaming. We'll detail what we've learned about efficient processing and monitoring, reliability and stability, and long term support of a streaming app. Come learn from our mistakes, and leave with some handy settings and designs you can implement in your own streaming apps.
4. The world does not wait
• Big data applications are built with the sole purpose of gathering an understanding of the world that gives the business an advantage.
• The necessity of building streaming applications arises from the fact that in many applications the value of the information gathered drops dramatically with time.
9. Batch/streaming duality
• Streaming applications can bring value by giving an approximate answer just in time. If timing is not an issue (e.g. daily results), batch pipelines can provide a good solution.
[Chart: information value vs. time, comparing streaming and batch pipelines]
11. Start big, grow small
• Despite vendor advertising, jumping into a streaming application is not always advisable.
• It is harder to get right, and you encounter limitations: probabilistic data structures, weaker guarantees, …
• The value of the data you are about to gather is not clear in a discovery phase.
• Some new libraries provide the same set of primitives for both batch and streaming. It is possible to develop the core of the idea as a batch job and just translate it to a streaming pipeline later.
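The "same primitives for batch and streaming" point can be sketched in plain Python (the helper names are hypothetical, not from any specific library): the core logic is written once over an iterable, so it runs unchanged on a bounded list or an unbounded generator.

```python
import itertools

def core_pipeline(events):
    """The business logic, written once over any iterable of events."""
    # Filter, then project: works for bounded lists and unbounded generators alike.
    return (e["value"] * 2 for e in events if e["value"] > 0)

# Batch: a bounded dataset.
batch = [{"value": 3}, {"value": -1}, {"value": 5}]
print(list(core_pipeline(batch)))  # [6, 10]

# Streaming: an unbounded source, consumed incrementally.
def sensor_stream():
    n = 0
    while True:
        n += 1
        yield {"value": n}

first_three = list(itertools.islice(core_pipeline(sensor_stream()), 3))
print(first_three)  # [2, 4, 6]
```

This is the idea behind unified batch/streaming APIs: prototype on a bounded dataset during discovery, then point the same core at an unbounded source.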
12. Not always practical
• As a developer, you can face any of the following situations:
• It is mandatory
• It is doubtful
• It will never be necessary
20. Lambda architecture
• Batch layer (e.g. Spark, HDFS): processes the master dataset (append-only) to precompute batch views (the views the front end will query).
• Speed layer (streaming): calculates ephemeral views based only on recent data.
• Motto: take reprocessing and recovery into account.
21. Lambda architecture
• Problems:
• Maintaining two code bases in sync (often different, because the speed layer cannot reproduce the same computation).
• Synchronising the two layers in the query layer is an additional problem.
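The two layers and the query-time merge can be illustrated with a toy count-per-user example (all names are hypothetical): the batch view covers the master dataset up to the last batch run, and the speed layer covers only events the batch layer has not yet seen.

```python
# Master dataset (append-only) and recent events not yet processed by the batch layer.
master = [("alice", 1), ("bob", 1), ("alice", 1)]
recent = [("alice", 1), ("carol", 1)]

def count_view(events):
    """Precompute a count-per-user view (each layer runs it on its own data)."""
    view = {}
    for user, n in events:
        view[user] = view.get(user, 0) + n
    return view

batch_view = count_view(master)   # recomputed from scratch on each batch run
speed_view = count_view(recent)   # ephemeral, recent data only

def query(user):
    """The query layer must merge both views: one of lambda's pain points."""
    return batch_view.get(user, 0) + speed_view.get(user, 0)

print(query("alice"))  # 3
print(query("carol"))  # 1
```

Even in this tiny sketch the synchronisation problem is visible: the query result is only correct if `recent` holds exactly the events the batch view has not absorbed yet.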
24. Kappa approach
• Only maintain one code base, and reduce the accidental complexity that comes from using too many technologies.
• You can roll back and reprocess if something goes wrong.
• Not a silver bullet and not a prescription of technologies, just a framework.
27. Concepts are basic
• There are multiple frameworks available nowadays that change terminology to try to differentiate themselves.
• It makes starting on streaming a bit confusing…
28. Concepts are basic
• It makes starting on streaming a bit confusing…
• Actually, many concepts are shared between the frameworks, and they are quite logical.
29. Step 1: data structure
• The basic data structure is made of 4 elements:
• Sink: where is this thing going?
• Partition key: to which shard?
• Sequence id: when was this produced?
• Data: anything that can be serialised (JSON, Avro, a photo, …)
[Diagram: a record as the tuple (sink, partition key, sequence id, data)]
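A minimal sketch of that 4-element record in Python (the field names follow the slide; the class itself is hypothetical):

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Record:
    sink: str           # where is this thing going?
    partition_key: str  # to which shard?
    sequence_id: int    # when was this produced?
    data: Any           # anything serialisable: JSON, Avro, a photo, ...

r = Record(sink="clicks", partition_key="user-42",
           sequence_id=1717000000, data={"page": "/home"})
print(r.partition_key)  # user-42
```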
30. Step 2: hashing
• The holy-grail trick of big data for splitting work, and also a major building block of streaming.
• We use hashing in the reverse of the classical way: we force the clashing of the things that are of interest to us.
[Diagram: a record (sink, partition key, sequence id, data) routed to a shard by h(k) mod N]
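The h(k) mod N routing fits in a few lines. Note how it deliberately "clashes" all records sharing a key onto the same partition. A stable hash such as CRC32 is used here as an assumption (Python's built-in `hash` is randomised per process, so it would not give stable routing):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Route a record to a shard: h(k) mod N."""
    h = zlib.crc32(key.encode("utf-8"))
    return h % num_partitions

N = 8
# All records for the same key land on the same partition...
assert partition_for("user-42", N) == partition_for("user-42", N)
# ...and every partition index stays within range.
assert all(0 <= partition_for(f"user-{i}", N) < N for i in range(1000))
```

This is why all events for one key arrive at one consumer in order, which is what makes per-key aggregations possible.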
31. Step 3: fault tolerance
“Distributed computing is parallel computing when you cannot trust anything or anyone”
32. Step 3: fault tolerance
• At any point, any node producing the data at the source can stop working:
• Non-persistent: data is lost.
• Persistent: data is replicated, so it can always be recovered from another node.
33. Step 3: fault tolerance
• At any point, any node computing our pipeline can go down:
• At most once: we let data be lost; once delivered, it is not reprocessed.
• At least once: we ensure delivery; messages can be reprocessed.
• Exactly once: we ensure delivery and no reprocessing.
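The difference between at-least-once and exactly-once can be simulated: a source that redelivers a message after a crash, paired with an idempotent consumer that deduplicates by sequence id (all names here are hypothetical):

```python
def redelivering_source():
    """An at-least-once source: message 2 is delivered twice after a simulated crash."""
    for seq in [1, 2, 2, 3]:  # duplicate delivery of seq 2
        yield seq

# Naive consumer: the duplicate is counted twice, so at-least-once effects show up.
naive_total = sum(1 for _ in redelivering_source())
print(naive_total)  # 4

# Idempotent consumer: tracking processed ids restores exactly-once *effects*.
seen = set()
processed = []
for seq in redelivering_source():
    if seq in seen:
        continue  # drop the redelivery
    seen.add(seq)
    processed.append(seq)
print(processed)  # [1, 2, 3]
```

This is the common practical compromise: an at-least-once transport plus idempotent processing, rather than true exactly-once delivery.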
34. Step 3: fault tolerance
• At any point, any node computing our pipeline can go down.
• Checkpointing: if we have been running the pipeline for hours and something goes wrong, do we have to start from the beginning?
• Streaming systems put mechanisms in place to checkpoint progress, so the new worker knows the previous state and where to start from.
• This usually involves other systems to save checkpoints and synchronise.
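A toy version of checkpointing (all names hypothetical): progress is saved every few messages, so a restarted worker resumes from the last checkpoint rather than from offset 0.

```python
checkpoint_store = {"offset": 0}  # stands in for an external system (e.g. ZooKeeper, a DB)

def run_worker(log, start, crash_at=None):
    """Process the log from `start`, checkpointing every 2 messages."""
    out = []
    for offset in range(start, len(log)):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("worker died")
        out.append(log[offset] * 10)
        if (offset + 1) % 2 == 0:
            checkpoint_store["offset"] = offset + 1  # persist progress
    return out

log = [1, 2, 3, 4, 5]
try:
    run_worker(log, start=0, crash_at=3)  # crashes after checkpointing offset 2
except RuntimeError:
    pass

# The replacement worker resumes from the checkpoint, not from the beginning.
resumed = run_worker(log, start=checkpoint_store["offset"])
print(resumed)  # [30, 40, 50]
```

Note that the message at offset 2 (processed after the last checkpoint, before the crash) is reprocessed on resume: checkpointing alone gives at-least-once behaviour, which is why it is usually paired with idempotent processing.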
35. Step 4: delivery
• One at a time: we process each message individually, which increases the processing cost per message.
• Micro-batch: we always process data in batches gathered at specified time intervals or sizes, which makes it impossible to reduce message-processing latency below the batch interval.
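The two delivery modes can be contrasted with a small micro-batcher that groups messages by batch size (a time interval would work the same way; the names are hypothetical):

```python
def micro_batches(stream, batch_size):
    """Group a stream into fixed-size micro-batches (Spark Streaming does this by time)."""
    batch = []
    for msg in stream:
        batch.append(msg)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the remainder

msgs = list(range(7))

# One at a time: 7 separate processing invocations, lowest per-message latency.
one_at_a_time = [[m] for m in msgs]

# Micro-batch of 3: only 3 invocations, but a message may wait for its batch to fill.
batched = list(micro_batches(msgs, 3))
print(batched)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is visible directly: micro-batching amortises per-invocation overhead over several messages, at the price of a latency floor equal to the batch boundary.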
38. [Diagram: records of the form (topic, partition key, data) appended to several topics]
39. [Diagram: the same records being routed to partitions via h(k) mod N]
44. Framework comparison (reconstructed; the row labels were framework logos in the original slide and are inferred from the conclusions slide)

| Framework       | One-at-a-time | Mini-batch | Exactly once | Deploy       | Windowing | Functional | Catch                |
|-----------------|---------------|------------|--------------|--------------|-----------|------------|----------------------|
| Storm           | Yes           | Yes *      | Yes *        | Custom, YARN | Yes *     | ~          | DRPC                 |
| Spark           | No            | Yes        | Yes          | YARN, Mesos  | Yes       | Yes        | MLlib, ecosystem     |
| Flink           | Yes           | Yes        | Yes          | YARN         | Yes       | Yes        | Flexible windowing   |
| Samza           | Yes           | ~          | No           | YARN         | ~         | No         | DB update log plugin |
| Google Dataflow | Yes           | Yes        | Yes          | Google       | Yes       | ~          | Google ecosystem     |
| Kinesis         | Yes           | you        | No           | AWS          | you       | No         | AWS ecosystem        |

* with Trident ("you" = left to you to implement)
45. Flink basic concepts
• Stream: a source of data that feeds computations (a batch dataset is a bounded stream).
• Transformations: operations that take one or more streams as input and compute an output stream. They can be stateless or stateful (with exactly-once semantics).
• Sink: an endpoint that receives the output stream of a transformation.
• Dataflow: a DAG of streams, transformations and sinks.
49. Samza basic concepts
• Streams: persistent sets of immutable messages of a similar type and category, with a transactional nature.
• Jobs: code that performs logical transformations on a set of input streams to append to a set of output streams.
• Partitions: each stream breaks into partitions, each a totally ordered sequence of messages.
• Tasks: each task consumes data from one partition.
• Dataflow: a composition of jobs that connects a set of streams.
• Containers: the physical unit of parallelism.
52. Storm basic concepts
• Spout: a source of data from any external system.
• Bolts: transformations of one or more streams into another set of output streams.
• Stream grouping: how streaming data is shuffled between bolts.
• Topology: a set of spouts and bolts that processes a stream of data.
• Tasks and workers: the unit of work deployable into one container. A worker can process one or more tasks; each task is deployed to one worker.
55. Spark basic concepts
• DStream: a continuous stream of data represented by a series of RDDs. Each RDD contains the data for a specific time interval.
• Input DStream and Receiver: the source of data that feeds a DStream.
• Transformations: operations that transform one DStream into another DStream (stateless, or stateful with exactly-once semantics).
• Output operations: operations that periodically push the data of a DStream to a specific output system.
58. Conclusions…
• Think of streaming when there is a hard constraint on time-to-information.
• Use a queue system as your place of orchestration.
• Select the processing system that best suits your use case:
• Samza: early stage; more to come in the near future.
• Spark: a good option if mini-batch will always work for you.
• Storm: a good option if you can set up the infrastructure. DRPC provides an interesting pattern for some use cases.
• Flink: a reduced ecosystem because of its shorter history, but its design learnt from all past frameworks and is the most flexible.
• Datastream: the original inspiration for Flink. A good and flexible model if you want to go the managed route and make use of the Google toolbox (Bigtable, etc.).
• Kinesis: only if you have some legacy. You are probably better off using the Spark connector in AWS EMR.
59. Where to go…
• All code examples are available on GitHub:
• Kafka: https://github.com/torito1984/kafka-playground.git, https://github.com/torito1984/kafka-doyle-generator.git
• Spark: https://github.com/torito1984/spark-doyle.git
• Storm: https://github.com/torito1984/trident-doyle.git
• Flink: https://github.com/torito1984/flink-sherlock.git
• Samza: https://github.com/torito1984/samza-locations.git