Overview of the newest features of Hortonworks DataFlow highlighting the new processors, new user interface, edge intelligence powered by Apache MiNiFi and new support for multi-tenancy and new zero master clustering architecture
Learn more: http://hortonworks.com/hdf/
Log data can be complex to capture, typically collected in limited amounts and difficult to operationalize at scale. HDF expands the capabilities of log analytics integration options for easy and secure edge analytics of log files in the following ways:
More efficient collection and movement of log data by prioritizing, enriching and/or transforming data at the edge to dynamically separate critical data. The relevant data is then delivered into log analytics systems in a real-time, prioritized and secure manner.
Cost-effective expansion of existing log analytics infrastructure by improving error detection and troubleshooting through more comprehensive data sets.
Intelligent edge analytics to support real-time content-based routing, prioritization, and simultaneous delivery of data into Connected Data Platforms, log analytics and reporting systems for comprehensive coverage and retention of Internet of Anything data.
Scaling real time streaming architectures with HDF and Dell EMC IsilonHortonworks
Streaming Analytics are the new normal. Customers are exploring use cases that have quickly transitioned from batch to near real time. Hortonworks Data Flow / Apache NiFi and Isilon provide a robust scalable architecture to enable real time streaming architectures. Explore our use cases and demo on how Hortonworks Data Flow and Isilon can empower your business for real time success
The document introduces Hortonworks DataFlow (HDF) and Apache NiFi. It discusses how HDF addresses challenges with enterprise data flow, such as variability in data formats/schemas, size/speed of data, security, and scaling. HDF provides a visual user interface and secure end-to-end data routing. It also offers data provenance for governance and compliance. HDF and Apache NiFi can be used for real-time data ingestion and management, data as a service, regulatory compliance, security applications like asset/personnel protection and fraud prevention, and other big data use cases.
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Learn how Hortonworks Data Flow (HDF), powered by Apache Nifi, enables organizations to harness IoAT data streams to drive business and operational insights. We will use the session to provide an overview of HDF, including detailed hands-on lab to build HDF pipelines for capture and analysis of streaming data.
Recording and labs available at:
http://hortonworks.com/partners/learn/#hdf
ODPi is a non-profit organization committed to simplifying and standardizing the big data ecosystem. It provides common runtime specifications and reference implementations to enable "test once, run everywhere" compatibility. ODPi operates under an open governance model where any company can join as a member and all members have an equal vote. The organization's specifications aim to reduce costs and complexity for ISVs, end customers, and Hadoop platform providers.
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks
How To: Hortonworks DataFlow 2.0 with Ambari and Ranger for integrated installation, deployment and operations of Apache NiFi.
On demand webinar with demo: http://hortonworks.com/webinar/getting-goal-big-data-faster-enterprise-readiness-data-motion/
This document provides an agenda and overview of topics for a Hortonworks data movement and management meetup. The agenda includes networking, introductions, discussions on Falcon use cases and releases, Hive disaster recovery, server-side extensions, ADF/instance search, Hive-based ingestion/export, Spark integration, and Sqoop 2 features. An overview of Falcon describes its high-level abstraction of Hadoop data processing services. Usage scenarios focus on dataset replication, lifecycle management, and lineage/traceability. The document also discusses Falcon examples for replication, retention, and late data handling.
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
VIEW THE ON-DEMAND WEBINAR: http://hortonworks.com/webinar/introduction-hortonworks-dataflow/
Learn about Hortonworks DataFlow (HDFTM) and how you can easily augment your existing data systems – Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
This document provides an overview of Apache NiFi 1.0 and discusses its new enhancements, including a modernized UI with a complete interface redesign, multitenant authorization capabilities, zero master clustering, and foundational work for software development lifecycles. It also outlines NiFi's use for data flow management and integration with downstream systems.
This document provides an introduction to Apache Spark and Zeppelin. It describes Spark as an open source cluster computing framework, and its APIs for Scala, Java, Python and R. Key Spark components are outlined like Spark Core, Spark SQL, MLlib and GraphX. RDDs are defined as Spark's primary abstraction, and DataFrames/Datasets are presented as higher-level APIs built on RDDs. The benefits of Spark SQL for structured data are highlighted. Examples demonstrate basic Spark and SQL usage. Finally, Apache Zeppelin and the Hortonworks sandbox are introduced as tools for interactive data analytics on Spark and Hadoop clusters.
View the recording:
http://hortonworks.com/webinar/accelerating-real-time-data-ingest-hadoop/
Hadoop didn’t disrupt the data center. The exploding amounts of data did. But, let’s face it, if you can’t move your data to Hadoop, then you can’t use it in Hadoop. The experts from Hortonworks, the #1 leader in Hadoop development, and Attunity, a leading data management software provider, cover:
- How to ingest your most valuable data into Hadoop using Attunity Replicate
- About how customers are using Hortonworks DataFlow (HDF) powered by Apache NiFi
- How to combine the real-time change data capture (CDC) technology with connected data platforms from Hortonworks
We discuss how Attunity Replicate and Hortonworks Data Flow (HDF) work together to move data into Hadoop.
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
Hortonworks SmartSense provides proactive recommendations that improve cluster performance, security and operations. And since 30% of issues are configuration related, Hortonworks SmartSense makes an immediate impact on Hadoop system performance and availability, in some cases boosting hardware performance by two times. Learn how SmartSense can help you increase the efficiency of your Hadoop hardware, through customized cluster recommendations.
View the on-demand webinar: https://hortonworks.com/webinar/boosts-hadoop-hardware-performance-2x-smartsense/
The document discusses Apache NiFi and its role in the Hadoop ecosystem. It provides an overview of NiFi, describes how it can be used to integrate with Hadoop components like HDFS, HBase, and Kafka. It also discusses how NiFi supports stream processing integrations and outlines some use cases. The document concludes by discussing future work, including improving NiFi's high availability, multi-tenancy, and expanding its ecosystem integrations.
This document discusses extending the functionality of Apache NiFi through custom processors and controller services. It provides an overview of the NiFi architecture and repositories, describes how to create extensions with minimal dependencies using Maven archetypes, and notes that most extensions can be developed within hours. Quick prototyping of data flows is possible using existing binaries, applications, and scripting languages. Resources for the NiFi developer guide and example Maven projects are also listed.
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsHortonworks
Customers are preparing themselves to analyze and manage an increasing quantity of structured and unstructured data. Business leaders introduce new analytical workloads faster than what IT departments can handle. Legacy IT infrastructure needs to evolve to deliver operational improvements and cost containment, while increasing flexibility to meet future requirements. By providing HDP on IBM Power Systems, Hortonworks and IBM are giving customers have more choice in selecting the appropriate architectural platform that is right for them. In this webinar, we’ll discuss some of the challenges with deploying big data platforms, and how choosing solutions built with HDP on IBM Power Systems can offer tangible benefits and flexibility to accommodate changing needs.
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
Hortonworks Data Platform (HDP) 2.3 includes several new capabilities:
1) It improves the user experience with more guided configuration, customizable dashboards, and improved workload management.
2) It enhances security with new data encryption at rest and extends data governance.
3) It adds proactive cluster monitoring through Hortonworks SmartSense to enhance support.
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 & 2.4 has made similar strides too. In this talk, we want to highlight some of the new features and enhancements, such as:
• Apache Spark and Kubernetes
• Native Vectorized ORC and SQL Cache Readers
• Pandas UDFs for PySpark
• Continuous Stream Processing
• Barrier Execution
• Avro/Image Data Source
• Higher-order Functions
Speaker: Robert Hryniewicz, AI Evangelist, Hortonworks
1. Sanjay Radia discusses designing data systems for data everywhere, including on-premises and in the cloud.
2. He advocates for applying a consistent set of tools, managing data in one place, and sharing security and governance across systems.
3. Radia also notes the importance of leveraging innovation from open source communities to continually improve the systems.
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
Demand for increased capture of information to drive analytic insights into an organizations' assets and infrastructure is growing at unprecedented rates. However, as data volume growth soars, the ability to provide seamless ingestion pipelines becomes operationally complex as the magnitude of data sources and types expands.
This talk will focus on the efforts of the Apache NiFi community including subproject, MiNiFi; an agent based architecture and its relation to the core Apache NiFi project. MiNiFi is focused on providing a platform that meets and adapts to where data is born while providing the core tenets of NiFi in provenance, security, and command and control. These capabilities provide versatile avenues for the bi-directional exchange of information across data and control planes while dealing with the constraints of operation at opposite ends of the scale spectrum tackling the first and last miles of dataflow management.
We will highlight ongoing and new efforts in the community to provide greater flexibility with deployment and configuration management of flows. Versioned flows provide greater operational flexibility and serve as a powerful foundation to orchestrate the collection and transmission from the point of data's inception through to its transmission to consumers and processing systems.
Yifeng Jiang presented on Apache Hive's present and future capabilities. Hive has achieved 100x performance improvements through technologies like ORC file format, Tez execution engine, and vectorized processing. Upcoming features like LLAP caching and a persistent Hive server aim to provide sub-second query response times for interactive analytics. Hive continues to evolve as the standard SQL interface for Hadoop, supporting a wide range of use cases from ETL and reporting to real-time analytics.
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
The document discusses new features in Apache Ambari 2.2, including Express Upgrade which allows automated upgrades of Hadoop clusters with downtime but potentially faster completion compared to Rolling Upgrade. Other new features include exporting metric graph data, user-selectable timezones and time ranges for dashboards, automatic logout of inactive Ambari web users, and saving Kerberos admin credentials for easier cluster changes.
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 has made similar strides too, introducing new features and resolving over 1300 JIRA issues. Likewise, Apache Spark 2.4 will have many JIRA issues resolved over 1100. In this talk, I want to skim and go through those notable features and changes.
Hortonworks aims to make Apache Hadoop easier for enterprises to adopt by addressing technology and knowledge gaps. As the largest corporate contributor to Apache Hadoop projects, Hortonworks will release regular sustaining releases to make Hadoop more robust, easy to install, manage, use, integrate and extend. Their goal is for anyone to easily deploy Hadoop directly from Apache and for Hadoop to become the standard for managing all enterprise data.
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Speaker: Andrew Psaltis, Principal Solution Engineer, Hortonworks
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
VIEW THE ON-DEMAND WEBINAR: http://hortonworks.com/webinar/introduction-hortonworks-dataflow/
Learn about Hortonworks DataFlow (HDFTM) and how you can easily augment your existing data systems – Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
This document provides an overview of Apache NiFi 1.0 and discusses its new enhancements, including a modernized UI with a complete interface redesign, multitenant authorization capabilities, zero master clustering, and foundational work for software development lifecycles. It also outlines NiFi's use for data flow management and integration with downstream systems.
This document provides an introduction to Apache Spark and Zeppelin. It describes Spark as an open source cluster computing framework, and its APIs for Scala, Java, Python and R. Key Spark components are outlined like Spark Core, Spark SQL, MLlib and GraphX. RDDs are defined as Spark's primary abstraction, and DataFrames/Datasets are presented as higher-level APIs built on RDDs. The benefits of Spark SQL for structured data are highlighted. Examples demonstrate basic Spark and SQL usage. Finally, Apache Zeppelin and the Hortonworks sandbox are introduced as tools for interactive data analytics on Spark and Hadoop clusters.
View the recording:
http://hortonworks.com/webinar/accelerating-real-time-data-ingest-hadoop/
Hadoop didn’t disrupt the data center. The exploding amounts of data did. But, let’s face it, if you can’t move your data to Hadoop, then you can’t use it in Hadoop. The experts from Hortonworks, the #1 leader in Hadoop development, and Attunity, a leading data management software provider, cover:
- How to ingest your most valuable data into Hadoop using Attunity Replicate
- About how customers are using Hortonworks DataFlow (HDF) powered by Apache NiFi
- How to combine the real-time change data capture (CDC) technology with connected data platforms from Hortonworks
We discuss how Attunity Replicate and Hortonworks Data Flow (HDF) work together to move data into Hadoop.
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
Hortonworks SmartSense provides proactive recommendations that improve cluster performance, security and operations. And since 30% of issues are configuration related, Hortonworks SmartSense makes an immediate impact on Hadoop system performance and availability, in some cases boosting hardware performance by two times. Learn how SmartSense can help you increase the efficiency of your Hadoop hardware, through customized cluster recommendations.
View the on-demand webinar: https://hortonworks.com/webinar/boosts-hadoop-hardware-performance-2x-smartsense/
The document discusses Apache NiFi and its role in the Hadoop ecosystem. It provides an overview of NiFi, describes how it can be used to integrate with Hadoop components like HDFS, HBase, and Kafka. It also discusses how NiFi supports stream processing integrations and outlines some use cases. The document concludes by discussing future work, including improving NiFi's high availability, multi-tenancy, and expanding its ecosystem integrations.
This document discusses extending the functionality of Apache NiFi through custom processors and controller services. It provides an overview of the NiFi architecture and repositories, describes how to create extensions with minimal dependencies using Maven archetypes, and notes that most extensions can be developed within hours. Quick prototyping of data flows is possible using existing binaries, applications, and scripting languages. Resources for the NiFi developer guide and example Maven projects are also listed.
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsHortonworks
Customers are preparing themselves to analyze and manage an increasing quantity of structured and unstructured data. Business leaders introduce new analytical workloads faster than what IT departments can handle. Legacy IT infrastructure needs to evolve to deliver operational improvements and cost containment, while increasing flexibility to meet future requirements. By providing HDP on IBM Power Systems, Hortonworks and IBM are giving customers have more choice in selecting the appropriate architectural platform that is right for them. In this webinar, we’ll discuss some of the challenges with deploying big data platforms, and how choosing solutions built with HDP on IBM Power Systems can offer tangible benefits and flexibility to accommodate changing needs.
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
Hortonworks Data Platform (HDP) 2.3 includes several new capabilities:
1) It improves the user experience with more guided configuration, customizable dashboards, and improved workload management.
2) It enhances security with new data encryption at rest and extends data governance.
3) It adds proactive cluster monitoring through Hortonworks SmartSense to enhance support.
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 & 2.4 has made similar strides too. In this talk, we want to highlight some of the new features and enhancements, such as:
• Apache Spark and Kubernetes
• Native Vectorized ORC and SQL Cache Readers
• Pandas UDFs for PySpark
• Continuous Stream Processing
• Barrier Execution
• Avro/Image Data Source
• Higher-order Functions
Speaker: Robert Hryniewicz, AI Evangelist, Hortonworks
1. Sanjay Radia discusses designing data systems for data everywhere, including on-premises and in the cloud.
2. He advocates for applying a consistent set of tools, managing data in one place, and sharing security and governance across systems.
3. Radia also notes the importance of leveraging innovation from open source communities to continually improve the systems.
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
Demand for increased capture of information to drive analytic insights into an organizations' assets and infrastructure is growing at unprecedented rates. However, as data volume growth soars, the ability to provide seamless ingestion pipelines becomes operationally complex as the magnitude of data sources and types expands.
This talk will focus on the efforts of the Apache NiFi community including subproject, MiNiFi; an agent based architecture and its relation to the core Apache NiFi project. MiNiFi is focused on providing a platform that meets and adapts to where data is born while providing the core tenets of NiFi in provenance, security, and command and control. These capabilities provide versatile avenues for the bi-directional exchange of information across data and control planes while dealing with the constraints of operation at opposite ends of the scale spectrum tackling the first and last miles of dataflow management.
We will highlight ongoing and new efforts in the community to provide greater flexibility with deployment and configuration management of flows. Versioned flows provide greater operational flexibility and serve as a powerful foundation to orchestrate the collection and transmission from the point of data's inception through to its transmission to consumers and processing systems.
Yifeng Jiang presented on Apache Hive's present and future capabilities. Hive has achieved 100x performance improvements through technologies like ORC file format, Tez execution engine, and vectorized processing. Upcoming features like LLAP caching and a persistent Hive server aim to provide sub-second query response times for interactive analytics. Hive continues to evolve as the standard SQL interface for Hadoop, supporting a wide range of use cases from ETL and reporting to real-time analytics.
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
The document discusses new features in Apache Ambari 2.2, including Express Upgrade which allows automated upgrades of Hadoop clusters with downtime but potentially faster completion compared to Rolling Upgrade. Other new features include exporting metric graph data, user-selectable timezones and time ranges for dashboards, automatic logout of inactive Ambari web users, and saving Kerberos admin credentials for easier cluster changes.
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 has made similar strides too, introducing new features and resolving over 1300 JIRA issues. Likewise, Apache Spark 2.4 will have many JIRA issues resolved over 1100. In this talk, I want to skim and go through those notable features and changes.
Hortonworks aims to make Apache Hadoop easier for enterprises to adopt by addressing technology and knowledge gaps. As the largest corporate contributor to Apache Hadoop projects, Hortonworks will release regular sustaining releases to make Hadoop more robust, easy to install, manage, use, integrate and extend. Their goal is for anyone to easily deploy Hadoop directly from Apache and for Hadoop to become the standard for managing all enterprise data.
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Speaker: Andrew Psaltis, Principal Solution Engineer, Hortonworks
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
MiNiFi is a recently started sub-project of Apache NiFi that is a complementary data collection approach which supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation. Simply, MiNiFi agents take the guiding principles of NiFi and pushes them to the edge in a purpose built design and deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi.
The config.yml is available here: https://gist.github.com/JPercivall/f337b8abdc9019cab5ff06cb7f6ff09a
Hortonworks Data In Motion Series Part 4Hortonworks
How real-world enterprises leverage Hortonworks DataFlow/Apache NiFi to to create real-time data flows in record time to enable new business opportunities, improve customer retention, accelerate big data projects from months to minutes through increased efficiency and reduced costs.
On-Demand webinar: http://hortonworks.com/webinar/paradigm-shift-business-usual-real-time-dataflows-record-time/
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
This document discusses Hortonworks' HDF 2.0 platform for managing data in motion and at rest. The platform includes tools for data ingestion, streaming, and storage. It also allows partners to integrate their solutions and get certified. Use cases highlighted include log analytics, IoT, and connected vehicles. The ecosystem supports ingesting data from various sources and processing it using tools like NiFi, Kafka, and Storm.
Dynamic Column Masking and Row-Level Filtering in HDPHortonworks
As enterprises around the world bring more of their sensitive data into Hadoop data lakes, balancing the need for democratization of access to data without sacrificing strong security principles becomes paramount. In this webinar, Srikanth Venkat, director of product management for security & governance will demonstrate two new data protection capabilities in Apache Ranger – dynamic column masking and row level filtering of data stored in Apache Hive. These features have been introduced as part of HDP 2.5 platform release.
This document discusses Hortonworks Data Cloud for AWS, which allows users to easily deploy and manage Hadoop clusters on AWS. It offers pre-configured cluster types for Spark and Hive workloads. Users launch a "cloud controller" instance to create and manage clusters. Clusters are launched using AMIs containing Ambari and HDP. The document outlines the setup process, including launching the cloud controller, creating clusters, and common issues users may encounter.
Enabling the Real Time Analytical EnterpriseHortonworks
This document discusses enabling real-time analytics in the enterprise. It begins with an overview of the challenges of real-time analytics due to non-integrated systems, varied data types and volumes, and data management complexity. A case study on real-time quality analytics in automotive is presented, highlighting the need to analyze varied data sources quickly to address issues. The Hortonworks/Attunity solution is then introduced using Attunity Replicate to integrate data from various sources in real-time into Hortonworks Data Platform for analysis. A brief demonstration of data streaming from a database into Kafka and then Hortonworks Data Platform is shown.
The document discusses strategies for managing Hive tables stored in cloud storage systems. It notes key differences between cloud storage and traditional file systems, including that cloud storage uses paths instead of directories and keys instead of users/permissions. It then outlines several approaches for micro-managing Hive tables in cloud storage to avoid issues like rename collisions and inconsistent reads. This includes using transactional properties, partitioning, and a "take a number" approach for inserts to track and isolate concurrent writes. Measurements show a 21% reduction in partition load time using these strategies.
How to Use Apache Zeppelin with HWX HDBHortonworks
Part five in a five-part series, this webcast will be a demonstration of the integration of Apache Zeppelin and Pivotal HDB. Apache Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. This webinar will demonstrate the configuration of the psql interpreter and the basic operations of Apache Zeppelin when used in conjunction with Hortonworks HDB.
Agenda:
1.Data Flow Challenges in an Enterprise
2.Introduction to Apache NiFi
3.Core Features
4.Architecture
5.Demo –Simple Lambda Architecture
6.Use Cases
7.Q & A
This document discusses using Apache Spark and Apache NiFi together for data lakes. It outlines the goals of a data lake including having a central data repository, reducing costs, enabling easier discovery and prototyping. It also discusses what is needed for a Hadoop data lake, including automation of pipelines, governance, and interactive data discovery. The document then provides an example ingestion project and describes using Apache Spark for functions like cleansing, validating, and profiling data. It outlines using Apache NiFi for the pipeline design with drag and drop functionality. Finally, it demonstrates ingesting and preparing data, data self-service and transformation, data discovery, and operational monitoring capabilities.
How Universities Use Big Data to Transform EducationHortonworks
Student performance data is increasingly being captured as part of software-based and online classroom exercises and testing. This data can be augmented with behavioral data captured from sources such as social media, student-professor meeting notes, blogs, student surveys, and so forth to discover new insights to improve student learning. The results transcend traditional IT departments to focus on issues like retention, research, and the delivery of content and courses through new modalities.
Hortonworks is partnering with Microsoft to show you how the Hortonworks Data Platform (HDP) running on the Microsoft stack enables you to develop a “single view of a student”.
Getting involved with Open Source at the ASFHortonworks
The document discusses getting involved with open source projects at the Apache Software Foundation. It provides an overview of the ASF, how it works, and how to contribute to Apache projects. The key points are:
- The ASF is a non-profit organization that oversees hundreds of open source projects and thousands of volunteers. Popular projects include Hadoop, Hive, and Pig.
- To get involved, individuals can start by joining mailing lists, reviewing documentation, reporting issues, and submitting code patches. More responsibilities come with becoming a committer or PMC member.
- Projects follow an open development process based on consensus. Voting on decisions helps include contributors from different time zones.
- Contributing is rewarding
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
S3Guard: What's in your consistency model?Hortonworks
S3Guard provides a consistent metadata store for S3 using DynamoDB. It allows file system operations on S3, like listing and getting file status, to be consistent by checking results from S3 against metadata stored in DynamoDB. Mutating operations write to both S3 and DynamoDB, while read operations first check S3 results against DynamoDB to handle eventual consistency in S3. The goal is to improve performance of real workloads by providing consistent metadata operations on S3 objects written with S3Guard enabled.
Top 5 Strategies for Retail Data AnalyticsHortonworks
It’s an exciting time for retailers as technology is driving a major disruption in the market. Whether you are just beginning to build a retail data analytics program or you have been gaining advanced insights from your data for quite some time, join Eric and Shish as we explore the trends, drivers and hurdles in retail data analytics
The path to a Modern Data Architecture in Financial ServicesHortonworks
Delivering Data-Driven Applications at the Speed of Business: Global Banking AML use case.
Chief Data Officers in financial services have unique challenges: they need to establish an effective data ecosystem under strict governance and regulatory requirements. They need to build the data-driven applications that enable risk and compliance initiatives to run efficiently. In this webinar, we will discuss the case of a global banking leader and the anti-money laundering solution they built on the data lake. With a single platform to aggregate structured and unstructured information essential to determine and document AML case disposition, they reduced mean time for case resolution by 75%. They have a roadmap for building over 150 data-driven applications on the same search-based data discovery platform so they can mitigate risks and seize opportunities, at the speed of business.
Pivotal - Advanced Analytics for Telecommunications Hortonworks
Innovative mobile operators need to mine the vast troves of unstructured data now available to them to help develop compelling customer experiences and uncover new revenue opportunities. In this webinar, you’ll learn how HDB’s in-database analytics enable advanced use cases in network operations, customer care, and marketing for better customer experience. Join us, and get started on your advanced analytics journey today!
In 2017, more and more corporations are looking to reduce operational overheads in their enterprise data warehouse (EDW) installations. Hortonworks just launched Industry’s first turn key EDW Optimization solution together with our partners Syncsort and AtScale. Join Hortonworks’ CTO Scott Gnau to learn more about this exciting solution and its 3 use cases.
This document summarizes Apache Hadoop release 0.23, which is scheduled to be the first stable release since 0.20 in 2009. Key highlights include improvements to HDFS federation, MapReduce, and high availability. The release aims to support large clusters of thousands of machines with high concurrency. Extensive testing is being done to validate performance gains from changes like MapReduce shuffle reimplementation and optimizations for small jobs. The 0.23 branch is expected in August 2011 with an alpha release in October and production release in late Q1 2012.
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
Introducing the new Hortonworks DataFlow (HDF) release, HDF 2.0. Also provides introduction to the flow management part of the platform, powered by Apache NIFI and MINIFI.
Learn about HDF and how you can easily augment your existing data systems - Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
HDF Powered by Apache NiFi IntroductionMilind Pandit
The document discusses Apache NiFi and its role in managing enterprise data flows, providing an overview of NiFi's key features and capabilities for reliable data transfer, preparation, and routing. It also demonstrates how NiFi is used in common use cases and provides examples of building simple data flows in NiFi to ingest, filter, and deliver data.
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.1.
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
This document discusses Apache NiFi and stream processing. It provides an overview of NiFi's key concepts of managing data flow, data provenance, and securing data. NiFi allows users to visually build data flows with drag and drop processors. It offers features such as guaranteed delivery, data buffering, prioritized queuing, and data provenance. NiFi is based on Flow-Based Programming and is used to reliably transfer data between systems, enrich and prepare data, and deliver data to analytic platforms.
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiBryan Bende
This document provides an overview of a presentation about taking dataflow management to the edge with Apache NiFi and MiniFi. The presentation discusses the problem of moving data between systems with different formats, protocols, and security requirements. It introduces Apache NiFi as a solution for dataflow management and introduces Apache MiniFi for managing dataflows at the edge. The presentation includes a demo and time for Q&A.
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
Apache NiFi is a software platform created by Apache to automate the flow of data between systems. It addresses challenges of global enterprise data flow with features like visual command and control, data lineage tracking, data prioritization, and secure data transfer. NiFi is commonly used for reliable transfer of data between systems, delivery of data to analytic platforms, and data enrichment/preparation tasks like format conversion and extraction. It is not intended for distributed computation, complex event processing, or joins.
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
This document discusses Apache NiFi and Apache MiNiFi. It begins with an overview of NiFi, describing its key features like guaranteed delivery, data buffering, and data provenance. It then introduces MiNiFi as a smaller version of NiFi that can operate on edge devices with limited resources. A use case is presented of a courier service gathering data from disparate sources using both NiFi and MiNiFi. The document concludes by discussing the NiFi ecosystem and encouraging participation in the community.
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
This document provides an overview and agenda for an HDF 3.0 Deep Dive presentation. It discusses new features in HDF 3.0 like record-based processing using a record reader/writer and QueryRecord processor. It also covers the latest efforts in the Apache NiFi community like component versioning and introducing a registry to enable capabilities like CI/CD, flow migration, and auditing of flows. The presentation demonstrates record processing in NiFi and concludes by discussing the evolution of Apache NiFi and its ecosystem.
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
This talk will discuss the state of the Apache NiFi Ecosystem & Community.
Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. It is data source agnostic, supporting disparate and distributed sources of differing formats, schemas, protocols, speeds and sizes such as machines, geo location devices, click streams, files, social feeds, log files and videos and more. It is configurable plumbing for moving data around, similar to how Fedex, UPS or other courier delivery services move parcels around. And just like those services, Apache NiFi allows you to trace your data in real time, just like you could trace a delivery.
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
Companies who use Kafka today struggle with monitoring and managing Kafka clusters. Kafka is a key backbone of IoT streaming analytics applications. The challenge is understanding what is going on overall in the Kafka cluster including performance, issues and message flows. No open source tool caters to the needs of different users that work with Kafka: DevOps/developers, platform team, and security/governance teams. See how the new Hortonworks Streams Messaging Manager enables users to visualize their entire Kafka environment end-to-end and simplifies Kafka operations.
In this session learn how SMM visualizes the intricate details of how Apache Kafka functions in real time while simultaneously surfacing every nuance of tuning, optimizing, and measuring input and output. SMM will assist users to quickly understand and operate Kafka while providing the much-needed transparency that sophisticated and experienced users need to avoid all the pitfalls of running a Kafka cluster.
HDF 3.1 : An Introduction to New FeaturesTimothy Spann
Hortonworks Data Flow 3.1 introduces new features to improve ease of use, stream processing, cross-product integration, and flow management. Key enhancements include NiFi registry for version control of flows, improved Kafka 1.0 support, and new processors for deeper ecosystem integration. HDF 3.1 provides tools for engineers to aggregate, mediate, and gain insights from data across multiple sources when deployed with Hortonworks Data Platform.
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
When interacting with analytics dashboards in order to achieve a smooth user experience, two major key requirements are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/Hbase work well for storing large volumes of data, although they are not optimized for ingesting streaming data and making it available for queries in realtime. Also, long query latencies make these systems sub-optimal choices for powering interactive dashboards and BI use-cases.
In this talk we will present Druid as a complementary solution to existing hadoop based technologies. Druid is an open-source analytics data store, designed from scratch, for OLAP and business intelligence queries over massive data streams. It provides low latency realtime data ingestion and fast sub-second adhoc flexible data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid moving from lambda architecture to Exactly once Ingestion
6) Future Work
The document discusses security features in Hortonworks Data Platform (HDP) and Pivotal HD. It covers authentication with Kerberos, authorization and auditing using Apache Ranger, perimeter security with Apache Knox, and data encryption at rest and in transit. Various security flows are illustrated including typical access to Hive through Beeline and adding authorization, firewall routing, and encryption. Installation and configuration of Ranger and Knox are also outlined.
This document discusses using Apache NiFi and Spark to build a smarter home. It describes using NiFi on a Raspberry Pi and EC2 to collect sensor data from smart home devices and transmit it to an HDP cluster for storage and analysis with Pig and Spark. It outlines the architecture as a hub-and-spoke model and shows the evolution of the NiFi flows from sequential blocking writes to attribute-based routing. Key discoveries include privacy issues, using MAC addresses to predict arrivals, and motion sensors being less useful alone. Challenges involved Oracle vs OpenJDK, backpressure, and site-to-site configuration.
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
How can you simplify the management and monitoring of your Hadoop environment? Ensure IT can focus on the right business priorities supported by Hadoop? Take a look at this presentation and learn how you can simplify the management and monitoring of your Hadoop environment, and ensure IT can focus on the right business priorities supported by Hadoop.
Apache NiFi - Flow Based Programming MeetupJoseph Witt
These are the slides from the July 11th Meetup in Toronto for the Flow Based Programming meetup group at Lighthouse covering Enterprise Dataflow with Apache NiFi.
This document provides an overview of Apache NiFi and data flow fundamentals. It begins with an introduction to Apache NiFi and outlines the agenda. It then discusses data flow and streaming fundamentals, including challenges in moving data effectively. The document introduces Apache NiFi's architecture and capabilities for addressing these challenges. It also previews a live demo of NiFi and discusses the NiFi community.
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
The HDF 3.3 release delivers several exciting enhancements and new features. But, the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies the provisioning and cluster management within any cloud environment to help your business toward its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
Cybersecurity today is a big data problem. There’s a ton of data landing on you faster than you can load, let alone search it. In order to make sense of it, we need to act on data-in-motion, use both machine learning, and the most advanced pattern recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient, helps them find the hidden gems, or bombs in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
The document provides an overview of Apache Druid, an open-source distributed real-time analytics database. It discusses Druid's architecture including segments, indexing, and nodes like brokers, historians and coordinators. It also covers integrating Druid with Hortonworks Data Platform for unified querying and visualization of streaming and historical data.
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades worth of clinical expertise. Clearsense is delivering smart, real-time streaming data, to its healthcare customers enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with EaseHortonworks
Every division in an organization builds its own database to keep track of its business. When the organization becomes big, those individual databases grow as well. The data from each database may become silo-ed and have no idea about the data in the other database.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data ManagementHortonworks
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in ideation phase to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today and thousands of use cases across IIOT, Healthcare and Manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
The document discusses Apache NiFi and streaming change data capture (CDC) with Attunity Replicate. It provides an overview of NiFi's capabilities for dataflow management and visualization. It then demonstrates how Attunity Replicate can be used for real-time CDC to capture changes from source databases and deliver them to NiFi for further processing, enabling use cases across multiple industries. Examples of source systems include SAP, Oracle, SQL Server, and file data, with targets including Hadoop, data warehouses, and cloud data stores.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights integrated Historic Procurement Industry Archives, serves as a powerful complement — not a competitor — to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value- driven proprietary service offering here.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and educations, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with Free Compatibility Check and help you with quick time-to-market
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic Investments, Inorganic Growth, Business Model Pivoting are critical activities that business don't do/change everyday. In cases like this, it may benefit your business to choose a temporary external consultant.
An unbiased plan driven by clearcut deliverables, market dynamics and without the influence of your internal office equations empower business leaders to make right choices.
Getting things done within a budget within a timeframe is key to Growing Business - No matter whether you are a start-up or a big company
Talk to us & Unlock the competitive advantage
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Perfect for developers, testers, and automation enthusiasts!
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.