Hortonworks DataFlow delivers data to streaming analytics platforms, inclusive of Storm, Spark and Flink
These are slides from an Apache Flink Meetup: Integration of Apache Flink and Apache Nifi, Feb 4 2016.
Learn more: http://hortonworks.com/hdf/
Log data can be complex to capture, typically collected in limited amounts and difficult to operationalize at scale. HDF expands the capabilities of log analytics integration options for easy and secure edge analytics of log files in the following ways:
More efficient collection and movement of log data by prioritizing, enriching and/or transforming data at the edge to dynamically separate critical data. The relevant data is then delivered into log analytics systems in a real-time, prioritized and secure manner.
Cost-effective expansion of existing log analytics infrastructure by improving error detection and troubleshooting through more comprehensive data sets.
Intelligent edge analytics to support real-time content-based routing, prioritization, and simultaneous delivery of data into Connected Data Platforms, log analytics and reporting systems for comprehensive coverage and retention of Internet of Anything data.
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
This document discusses Apache NiFi, an open source software project that provides a dataflow solution for gathering, processing, and delivering data between systems. NiFi addresses challenges with traditional messaging systems by allowing for data routing, transformation, prioritization, and provenance tracking. It uses a flow-based programming model where data moves through a directed graph of processes connected by queues. The project started at the National Security Agency in 2006 and became a top-level Apache project in 2015.
Yifeng Jiang gives a presentation introducing Apache Nifi. He begins with an overview of himself and the agenda. He then provides an introduction to Nifi including terminology like FlowFile and Processor. Key aspects of Nifi are demonstrated including the user interface, provenance tracking, queue prioritization, cluster architecture, and a demo of real-time data processing. Example use cases are discussed like indexing JSON tweets and indexing data from a relational database. The presentation concludes that Nifi is an easy to use and powerful system for processing and distributing data with 90 built-in processors.
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
This document discusses Apache NiFi and stream processing. It provides an overview of NiFi's key concepts of managing data flow, data provenance, and securing data. NiFi allows users to visually build data flows with drag and drop processors. It offers features such as guaranteed delivery, data buffering, prioritized queuing, and data provenance. NiFi is based on Flow-Based Programming and is used to reliably transfer data between systems, enrich and prepare data, and deliver data to analytic platforms.
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiBryan Bende
This document provides an overview of a presentation about taking dataflow management to the edge with Apache NiFi and MiniFi. The presentation discusses the problem of moving data between systems with different formats, protocols, and security requirements. It introduces Apache NiFi as a solution for dataflow management and introduces Apache MiniFi for managing dataflows at the edge. The presentation includes a demo and time for Q&A.
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
This talk will discuss the state of the Apache NiFi Ecosystem & Community.
Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. It is data source agnostic, supporting disparate and distributed sources of differing formats, schemas, protocols, speeds and sizes such as machines, geo location devices, click streams, files, social feeds, log files and videos and more. It is configurable plumbing for moving data around, similar to how Fedex, UPS or other courier delivery services move parcels around. And just like those services, Apache NiFi allows you to trace your data in real time, just like you could trace a delivery.
NiFi processors allow data to be processed as it flows through the system. This document discusses how to create a custom NiFi processor by using the nifi-processor-bundle-archetype Maven archetype to generate the project structure. It also covers deploying the custom processor by building a NAR file with Maven and placing it in the NiFi installation directory so that the new processor will be available. Key methods for customizing processor behavior like init, onSchedule, and onTrigger are also outlined.
NiFi Best Practices for the EnterpriseGregory Keys
The document discusses best practices for implementing Apache NiFi in an enterprise. It recommends establishing a Center of Excellence (COE) to align stakeholders, provide guidance, and develop standards and processes for NiFi deployment. The COE should work with business leaders to understand data flow needs and ensure NiFi is delivering business value. When scaling NiFi across a large enterprise, it may make sense to have multiple semi-autonomous NiFi clusters for different business groups rather than one large cluster. Reusable templates, components, and patterns can help with development efficiencies.
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
Apache NiFi - Flow Based Programming MeetupJoseph Witt
These are the slides from the July 11th Meetup in Toronto for the Flow Based Programming meetup group at Lighthouse covering Enterprise Dataflow with Apache NiFi.
Apache NiFi: latest developments for flow management at scaleAbdelkrim Hadjidj
The document discusses Apache NiFi, an open source dataflow management platform. It provides an overview of NiFi's capabilities including over 225 processors for common data access, transformation, and management tasks. The presentation demonstrates NiFi and its web-based user interface, zero-master clustering architecture, and extensibility via custom processors and controllers. New features discussed include component versioning, change data capture from MySQL, and a record-based processing mechanism for improved data handling.
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
This document introduces NiFi Registry, a new component of Apache NiFi that allows for versioning and centralized management of NiFi flows. It provides capabilities for deploying flows between environments through version control and management of parameterized variables. The architecture involves a metadata database and flow persistence provider to store versions of flows and their associated metadata. Examples of deployment scenarios and existing tools for automation are also presented.
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
Introducing the new Hortonworks DataFlow (HDF) release, HDF 2.0. Also provides introduction to the flow management part of the platform, powered by Apache NIFI and MINIFI.
Learn about HDF and how you can easily augment your existing data systems - Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
The document introduces Apache NiFi, an open source tool for data flow. It discusses how data from the Internet of Things is growing faster than can be consumed and highlights Apache NiFi's ability to securely collect, process and distribute this data in motion. The key concepts of Apache NiFi are described as managing the flow of information, ensuring data provenance, and securing the control and data planes. Example use cases are provided and the document demonstrates Apache NiFi's visual interface for creating data flows between processors to ingest, transform and output data in real-time.
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Pre-requisites: Registrants must bring a laptop that has the latest VirtualBox installed and an image for Hortonworks DataFlow (HDF) Sandbox will be provided.
Speakers: Andy LoPresto, Timothy Spann
This document provides an overview of Apache NiFi and the new MiNiFi project. It begins with introductions to Apache NiFi, its key features, and what is new in version 1.0.0. It then introduces MiNiFi, describing it as a way to deploy NiFi flows to edge systems with limited resources. The rest of the document demonstrates the NiFi and MiNiFi architectures and how they work together, and provides an example deployment to a courier service. It concludes with a demo of NiFi and MiNiFi.
The document discusses improvements to Apache NiFi and NiFi Registry for software development lifecycles. Key improvements include:
1) Parameterized flows in NiFi that allow sensitive values to be parameterized and referenced securely.
2) Version control improvements like forcing commits and tracking component enable/disable state.
3) Granular proxy permissions and public buckets in NiFi Registry for access control.
4) Versioning of extension bundles alongside flows in NiFi Registry.
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
A walk-through of various options in integration Apache Spark and Apache NiFi in one smooth dataflow. There are now several options in interfacing between Apache NiFi and Apache Spark with Apache Kafka and Apache Livy.
As Apache Solr becomes more powerful and easier to use, the accessibility of high quality data becomes key to unlocking the full potential of Solr’s search and analytic capabilities. Traditional approaches to acquiring data frequently involve a combination of homegrown tools and scripts, often requiring significant development efforts and becoming hard to change, hard to monitor, and hard to maintain. This talk will discuss how Apache NiFi addresses the above challenges and can be used to build production-grade data pipelines for Solr. We will start by giving an introduction to the core features of NiFi, such as visual command & control, dynamic prioritization, back-pressure, and provenance. We will then look at NiFi’s processors for integrating with Solr, covering topics such as ingesting and extracting data, interacting with secure Solr instances, and performance tuning. We will conclude by building a live dataflow from scratch, demonstrating how to prepare data and ingest to Solr.
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
Apache NiFi is a software platform created by Apache to automate the flow of data between systems. It addresses challenges of global enterprise data flow with features like visual command and control, data lineage tracking, data prioritization, and secure data transfer. NiFi is commonly used for reliable transfer of data between systems, delivery of data to analytic platforms, and data enrichment/preparation tasks like format conversion and extraction. It is not intended for distributed computation, complex event processing, or joins.
Originally created for Hadoop Summit 2016: Melbourne.
http://www.hadoopsummit.org/melbourne/
Apache NiFi is becoming a defacto tool for handling orchestration, routing and mediation of data in the highly complex and heterogeneous world of Big Data, connecting many components (in-motion and at-rest) of its ecosystem into one homogenous and secure data flow. And while features such as security, provenance, dynamic prioritization and extensibility have long captured the attention of the enterprises, the innovation in NiFi land continues. This hands-on talk consisting of live demos and code will concentrate on what’s new an exciting in the world of NiFi. It will cover the newest and most advanced features of NiFi as well as demonstrate some of the "work in progress" essentially giving you a preview into the future.
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
NiFi processors allow data to be processed as it flows through the system. This document discusses how to create a custom NiFi processor by using the nifi-processor-bundle-archetype Maven archetype to generate the project structure. It also covers deploying the custom processor by building a NAR file with Maven and placing it in the NiFi installation directory so that the new processor will be available. Key methods for customizing processor behavior like init, onSchedule, and onTrigger are also outlined.
NiFi Best Practices for the EnterpriseGregory Keys
The document discusses best practices for implementing Apache NiFi in an enterprise. It recommends establishing a Center of Excellence (COE) to align stakeholders, provide guidance, and develop standards and processes for NiFi deployment. The COE should work with business leaders to understand data flow needs and ensure NiFi is delivering business value. When scaling NiFi across a large enterprise, it may make sense to have multiple semi-autonomous NiFi clusters for different business groups rather than one large cluster. Reusable templates, components, and patterns can help with development efficiencies.
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
Apache NiFi - Flow Based Programming MeetupJoseph Witt
These are the slides from the July 11th Meetup in Toronto for the Flow Based Programming meetup group at Lighthouse covering Enterprise Dataflow with Apache NiFi.
Apache NiFi: latest developments for flow management at scaleAbdelkrim Hadjidj
The document discusses Apache NiFi, an open source dataflow management platform. It provides an overview of NiFi's capabilities including over 225 processors for common data access, transformation, and management tasks. The presentation demonstrates NiFi and its web-based user interface, zero-master clustering architecture, and extensibility via custom processors and controllers. New features discussed include component versioning, change data capture from MySQL, and a record-based processing mechanism for improved data handling.
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
This document introduces NiFi Registry, a new component of Apache NiFi that allows for versioning and centralized management of NiFi flows. It provides capabilities for deploying flows between environments through version control and management of parameterized variables. The architecture involves a metadata database and flow persistence provider to store versions of flows and their associated metadata. Examples of deployment scenarios and existing tools for automation are also presented.
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
Introducing the new Hortonworks DataFlow (HDF) release, HDF 2.0. Also provides introduction to the flow management part of the platform, powered by Apache NIFI and MINIFI.
Learn about HDF and how you can easily augment your existing data systems - Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
The document introduces Apache NiFi, an open source tool for data flow. It discusses how data from the Internet of Things is growing faster than can be consumed and highlights Apache NiFi's ability to securely collect, process and distribute this data in motion. The key concepts of Apache NiFi are described as managing the flow of information, ensuring data provenance, and securing the control and data planes. Example use cases are provided and the document demonstrates Apache NiFi's visual interface for creating data flows between processors to ingest, transform and output data in real-time.
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Pre-requisites: Registrants must bring a laptop that has the latest VirtualBox installed and an image for Hortonworks DataFlow (HDF) Sandbox will be provided.
Speakers: Andy LoPresto, Timothy Spann
This document provides an overview of Apache NiFi and the new MiNiFi project. It begins with introductions to Apache NiFi, its key features, and what is new in version 1.0.0. It then introduces MiNiFi, describing it as a way to deploy NiFi flows to edge systems with limited resources. The rest of the document demonstrates the NiFi and MiNiFi architectures and how they work together, and provides an example deployment to a courier service. It concludes with a demo of NiFi and MiNiFi.
The document discusses improvements to Apache NiFi and NiFi Registry for software development lifecycles. Key improvements include:
1) Parameterized flows in NiFi that allow sensitive values to be parameterized and referenced securely.
2) Version control improvements like forcing commits and tracking component enable/disable state.
3) Granular proxy permissions and public buckets in NiFi Registry for access control.
4) Versioning of extension bundles alongside flows in NiFi Registry.
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
A walk-through of various options in integration Apache Spark and Apache NiFi in one smooth dataflow. There are now several options in interfacing between Apache NiFi and Apache Spark with Apache Kafka and Apache Livy.
As Apache Solr becomes more powerful and easier to use, the accessibility of high quality data becomes key to unlocking the full potential of Solr’s search and analytic capabilities. Traditional approaches to acquiring data frequently involve a combination of homegrown tools and scripts, often requiring significant development efforts and becoming hard to change, hard to monitor, and hard to maintain. This talk will discuss how Apache NiFi addresses the above challenges and can be used to build production-grade data pipelines for Solr. We will start by giving an introduction to the core features of NiFi, such as visual command & control, dynamic prioritization, back-pressure, and provenance. We will then look at NiFi’s processors for integrating with Solr, covering topics such as ingesting and extracting data, interacting with secure Solr instances, and performance tuning. We will conclude by building a live dataflow from scratch, demonstrating how to prepare data and ingest to Solr.
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
Apache NiFi is a software platform created by Apache to automate the flow of data between systems. It addresses challenges of global enterprise data flow with features like visual command and control, data lineage tracking, data prioritization, and secure data transfer. NiFi is commonly used for reliable transfer of data between systems, delivery of data to analytic platforms, and data enrichment/preparation tasks like format conversion and extraction. It is not intended for distributed computation, complex event processing, or joins.
Originally created for Hadoop Summit 2016: Melbourne.
http://www.hadoopsummit.org/melbourne/
Apache NiFi is becoming a defacto tool for handling orchestration, routing and mediation of data in the highly complex and heterogeneous world of Big Data, connecting many components (in-motion and at-rest) of its ecosystem into one homogenous and secure data flow. And while features such as security, provenance, dynamic prioritization and extensibility have long captured the attention of the enterprises, the innovation in NiFi land continues. This hands-on talk consisting of live demos and code will concentrate on what’s new an exciting in the world of NiFi. It will cover the newest and most advanced features of NiFi as well as demonstrate some of the "work in progress" essentially giving you a preview into the future.
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiDataWorks Summit
Apache NiFi MiNiFi enables data collection in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it.
Concept: Apache NiFi offers a revolutionary data flow management system and extensive integration of existing data production, consumption and analysis ecosystems, all of which are robust data delivery and a (data) logging infrastructure It is protected by. Learn about the additional project Apache MiNiFi, which extends the scope of NiFi's power to the maximum. MiNiFi is a lightweight application that can be placed on hardware that is one order of magnitude smaller than the existing standard data collection platform and is less powerful. As a JVM-enabled native agent MiNiFi enables data gathering in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it. Regional governance and regulatory policies are applied to geopolitical boundaries and comply with legal requirements. And all of this configuration can be done from the existing NiFi and central control using the stable data UI that the data flow administrator has already liked and trusted.
Required prior knowledge / targeted participants: Developers and data flow administrators need some knowledge of Apache NiFi as a platform for routing, conversion, and data delivery through the system (a brief overview is provided ). In this talk we will focus on extending data collection, routing, data history, and NiFi control functions, through IoT / edge integration via MiNiFi.
Key Points: Participants will learn about the opportunity to collect and capture data flows close to the source of data, "edge", such as IoT devices, vehicles, machines, etc. Participants prioritize, filter, protect, and manipulate this data in the initial data lifecycle and understand the potential for data visibility and performance improvement.
The document provides an introduction and overview of Apache NiFi and its architecture. It discusses how NiFi can be used to effectively manage and move data between different producers and consumers. It also summarizes key NiFi features like guaranteed delivery, data buffering, prioritization, and data provenance. Finally, it briefly outlines the NiFi architecture and components as well as opportunities for the future of the MiniFi project.
This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Pre-requisites: Registrants must bring a laptop that has the latest VirtualBox installed and an image for Hortonworks DataFlow (HDF) Sandbox will be provided.
Speaker: Andy LoPresto
The document discusses Apache NiFi and its role in the Hadoop ecosystem. It provides an overview of NiFi, describes how it can be used to integrate with Hadoop components like HDFS, HBase, and Kafka. It also discusses how NiFi supports stream processing integrations and outlines some use cases. The document concludes by discussing future work, including improving NiFi's high availability, multi-tenancy, and expanding its ecosystem integrations.
Building Data Pipelines for Solr with Apache NiFiBryan Bende
This document provides an overview of using Apache NiFi to build data pipelines that index data into Apache Solr. It introduces NiFi and its capabilities for data routing, transformation and monitoring. It describes how Solr accepts data through different update handlers like XML, JSON and CSV. It demonstrates how NiFi processors can be used to stream data to Solr via these update handlers. Example use cases are presented for indexing tweets, commands, logs and databases into Solr collections. Future enhancements are discussed like parsing documents and distributing commands across a Solr cluster.
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.1.
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Speaker: Andrew Psaltis, Principal Solution Engineer, Hortonworks
This document discusses extending the functionality of Apache NiFi through custom processors and controller services. It provides an overview of the NiFi architecture and repositories, describes how to create extensions with minimal dependencies using Maven archetypes, and notes that most extensions can be developed within hours. Quick prototyping of data flows is possible using existing binaries, applications, and scripting languages. Resources for the NiFi developer guide and example Maven projects are also listed.
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately.
Abstract: Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have a passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Takeaways: Attendees will learn about opportunities to bring their data flow and capture closer to the "edge" -- sources of data like IoT devices, vehicles, machinery, etc. They will understand the possibilities to prioritize, filter, secure, and manipulate this data earlier in the data lifecycle to enhance their data visibility and performance.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
This document provides an overview of Apache NiFi 1.0 and discusses its new enhancements, including a modernized UI with a complete interface redesign, multitenant authorization capabilities, zero master clustering, and foundational work for software development lifecycles. It also outlines NiFi's use for data flow management and integration with downstream systems.
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
Demand for increased capture of information to drive analytic insights into an organizations' assets and infrastructure is growing at unprecedented rates. However, as data volume growth soars, the ability to provide seamless ingestion pipelines becomes operationally complex as the magnitude of data sources and types expands.
This talk will focus on the efforts of the Apache NiFi community including subproject, MiNiFi; an agent based architecture and its relation to the core Apache NiFi project. MiNiFi is focused on providing a platform that meets and adapts to where data is born while providing the core tenets of NiFi in provenance, security, and command and control. These capabilities provide versatile avenues for the bi-directional exchange of information across data and control planes while dealing with the constraints of operation at opposite ends of the scale spectrum tackling the first and last miles of dataflow management.
We will highlight ongoing and new efforts in the community to provide greater flexibility with deployment and configuration management of flows. Versioned flows provide greater operational flexibility and serve as a powerful foundation to orchestrate the collection and transmission from the point of data's inception through to its transmission to consumers and processing systems.
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
This document discusses Apache NiFi and Apache MiNiFi. It begins with an overview of NiFi, describing its key features like guaranteed delivery, data buffering, and data provenance. It then introduces MiNiFi as a smaller version of NiFi that can operate on edge devices with limited resources. A use case is presented of a courier service gathering data from disparate sources using both NiFi and MiNiFi. The document concludes by discussing the NiFi ecosystem and encouraging participation in the community.
Learn how Hortonworks Data Flow (HDF), powered by Apache Nifi, enables organizations to harness IoAT data streams to drive business and operational insights. We will use the session to provide an overview of HDF, including detailed hands-on lab to build HDF pipelines for capture and analysis of streaming data.
Recording and labs available at:
http://hortonworks.com/partners/learn/#hdf
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell
With expertise in data architecture, performance tracking, and revenue forecasting, Andrew Marnell plays a vital role in aligning business strategies with data insights. Andrew Marnell’s ability to lead cross-functional teams ensures businesses achieve sustainable growth and operational excellence.
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxshyamraj55
We’re bringing the TDX energy to our community with 2 power-packed sessions:
🛠️ Workshop: MuleSoft for Agentforce
Explore the new version of our hands-on workshop featuring the latest Topic Center and API Catalog updates.
📄 Talk: Power Up Document Processing
Dive into smart automation with MuleSoft IDP, NLP, and Einstein AI for intelligent document workflows.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and educations, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with Free Compatibility Check and help you with quick time-to-market
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Quantum Computing Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around adoption of GenAI in business - benefits, opportunities and limitations. I also discussed how my research on Theory of Cognitive Chasms helps address some of these issues
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Aqusag Technologies
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Perfect for developers, testers, and automation enthusiasts!
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.