Learn how Cloudera provides a unified platform that breaks down data silos commonly seen in organizations. By unifying the data needed for applied machine learning, organizations are better equipped to gather valuable insights from their data.
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma... | Cloudera, Inc.
This presentation details how we are now in the 6th wave of automation, which is based on Machine Learning. In this 6th wave, Cloudera plays a critical role in providing the data platform for Machine Learning and Analytics built for the Cloud.
Cloudera - The Modern Platform for Analytics | Cloudera, Inc.
This presentation provides an overview of Cloudera and how a modern platform for Machine Learning and Analytics better enables a data-driven enterprise.
Data Science and Machine Learning for the Enterprise | Cloudera, Inc.
Overview of Machine Learning and how the Cloudera Data Science Workbench provides full access to data while supporting IT SLAs. The presentation includes details on Fast Forward Labs and The Value of Interpretability in Models.
Big data journey to the cloud maz chaudhri 5.30.18 | Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Consolidate your data marts for fast, flexible analytics 5.24.18 | Cloudera, Inc.
In this webinar, Cloudera and AtScale will showcase:
How a company can modernize their analytic architecture to deliver flexibility and agility to more end-users.
How using AtScale’s Universal Semantic layer can end the data chaos and allow business users to use the data in the modern platform.
The performance of AtScale and Cloudera’s analytic database, shown with newly completed TPC-DS standard benchmarking.
Best practices for migrating from legacy appliances.
Big data journey to the cloud 5.30.18 asher bartch | Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Get started with Cloudera's cyber solution | Cloudera, Inc.
Cloudera empowers cybersecurity innovators to proactively secure the enterprise by accelerating threat detection, investigation, and response through machine learning and complete enterprise visibility. Cloudera’s cybersecurity solution, based on Apache Spot, enables anomaly detection, behavior analytics, and comprehensive access across all enterprise data using an open, scalable platform. But what’s the easiest way to get started?
Preparing data for analysis and insights is the foundation of any data-driven exercise. Moving workloads to a PaaS, be it data engineering, analytic database, or data science, requires a two-step leap of faith: trusting the public cloud, and then trusting your PaaS vendor. In this webinar we will discuss the architecture of a PaaS solution for data management and cover the nitty-gritty details of what exactly this involves, with the following:
An exploration of the architecture of Cloudera Altus PaaS - the industry’s first multi-function, multi-cloud data and analytic platform-as-a-service
A dive into use cases and a demo of Altus
The synergy between AWS and Altus to help you securely standardize on a combination of public cloud and data management
Self-service Big Data Analytics on Microsoft Azure | Cloudera, Inc.
In this presentation Microsoft will join Cloudera to introduce a new Platform-as-a-Service (PaaS) offering that helps data engineers use on-demand cloud infrastructure to speed the creation and operation of data pipelines that power sophisticated, data-driven applications - without onerous administration.
A Community Approach to Fighting Cyber Threats | Cloudera, Inc.
3 Things to Learn About:
* Infinitely scale data storage, access, and machine learning
* Provide community-defined open data models for complete enterprise visibility
* Open up application flexibility while building on a future-proofed architecture
Cloudera Altus: Big Data in the Cloud Made Easy | Cloudera, Inc.
Cloudera Altus makes it easier for data engineers, ETL developers, and anyone who regularly works with raw data to process that data in the cloud efficiently and cost effectively. In this webinar we introduce our new platform-as-a-service offering and explore challenges associated with data processing in the cloud today, how Altus abstracts cluster overhead to deliver easy, efficient data processing, and unique features and benefits of Cloudera Altus.
Machine Learning in the Enterprise 2019 | Timothy Spann
Machine Learning in the Enterprise 2019. These are the slides for my upcoming demo on integrating Machine Learning and Streaming with Apache NiFi and Cloudera Data Science Workbench. This is for the February 12th, 2019 Future of Data Princeton meetup.
Big data journey to the cloud rohit pujari 5.30.18 | Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.
Leveraging the cloud for analytics and machine learning 1.29.19 | Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you will see how fast and easy it is to deploy a modern data management platform in your cloud, on your terms.
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... | Cloudera, Inc.
You like to use R, and you need to use big data. dplyr, one of the most popular packages for R, makes it easy to query large data sets in scalable processing engines like Apache Spark and Apache Impala.
But there can be pitfalls: dplyr works differently with different data sources—and those differences can bite you if you don’t know what you’re doing.
Ian Cook is a data scientist, an R contributor, and a curriculum developer at Cloudera University. In this webinar, Ian will show you exactly what you need to know about sparklyr (from RStudio) and the package implyr (from Cloudera). He will show you how to write dplyr code that works across these different interfaces. And, he will solve mysteries:
Do I need to know SQL to use dplyr?
When is a “tbl” not a “tibble”?
Why is 1 not always equal to 1?
When should you collect(), collapse(), and compute()?
How can you use dplyr to combine data stored in different systems?
Leveraging the Cloud for Big Data Analytics 12.11.18 | Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you will see how fast and easy it is to deploy a modern data management platform in your cloud, on your terms.
Managing Successful Data Projects: Technology Selection and Team Building | Cloudera, Inc.
Recent years have seen dramatic advancements in the technologies available for managing and processing data. While these technologies provide powerful tools to build data applications, they also require new skills. Ted Malaska and Jonathan Seidman explain how to evaluate these new technologies and build teams to effectively leverage these technologies and achieve ROI with your data initiatives.
Cloud Data Warehousing with Cloudera Altus 7.24.18 | Cloudera, Inc.
This webinar will help you maximize the full potential of the cloud. Understand how to leverage cloud environments for different analytic workloads to empower business analysts and keep IT happy. An intricate, beautiful balance. Then learn best practices in design, performance tuning, workload considerations, and hybrid or multi-cloud strategies.
How Komatsu is driving operational efficiencies using IoT and machine learni... | Cloudera, Inc.
In this joint webinar, Jason Knuth, data scientist and analytics lead at Komatsu, shares how they are analyzing over 17 billion data points every day from connected devices and using machine learning and analytics to improve mining operations.
3 Things to Learn:
- How data is driving digital transformation to help businesses innovate rapidly
- How Choice Hotels (one of the largest hoteliers) is using Cloudera Enterprise to gain meaningful insights that drive their business
- How Choice Hotels has transformed business through innovative use of Apache Hadoop, Cloudera Enterprise, and deployment in the cloud, from developing customer experiences to meeting IT compliance requirements
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18 | Cloudera, Inc.
Webinar on Cloudera Enterprise 6.0 where we will discuss how to build new applications on the modern platform for machine learning and analytics. This webinar will take a look at the latest software enhancements and how they’ll help you improve your productivity and innovate new analytics applications.
Topics include: The transformative value of real-time data and analytics, and current barriers to adoption. The importance of an end-to-end solution for data-in-motion that includes ingestion, processing, and serving. Apache Kudu’s role in simplifying real-time architectures.
How to develop a Big Data strategy in the public cloud with the P... offering | Cloudera, Inc.
The public cloud is an attractive proposition for companies seeking agility in their big data projects, whether that means processing data in bulk or running complex analyses on it for better decision-making.
Workload Experience Manager (XM) gives you the visibility necessary to efficiently migrate, analyze, optimize, and scale workloads running in a modern data warehouse. In this recorded webinar we discuss common challenges of running at scale with a modern data warehouse, the benefits of end-to-end visibility into workload lifecycles, an overview of Workload XM with a live demo, real-life customer before/after scenarios, and what's next for Workload XM.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World | Cloudera, Inc.
3 Things to Learn About:
* On-premises versus the cloud: What’s the same and what’s different?
* Design and benefits of analytics in the cloud
* Best practices and architectural considerations
How to Build Continuous Ingestion for the Internet of Things | Cloudera, Inc.
The Internet of Things is moving into the mainstream and this new world of data-driven products is transforming a vast number of industry sectors and technologies.
However, IoT creates a new challenge: how to build and operationalize continual data ingestion from such a wide and ever-changing array of endpoints so that the data arrives consumption-ready and can drive analysis and action within the business.
In this webinar, Sean Anderson from Cloudera and Kirit Busu, Director of Product Management at StreamSets, will discuss Hadoop's ecosystem and IoT capabilities and provide advice about common patterns and best practices. Using specific examples, they will demonstrate how to build and run end-to-end IoT data flows using StreamSets and Cloudera infrastructure.
This document discusses enterprise data science and machine learning. It begins by noting that data is now more plentiful and machine learning opportunities are everywhere. However, challenges remain around scaling data science work, making models production-ready, and meeting different team needs. The document then introduces Cloudera's Data Science Workbench for addressing these challenges. It claims the Workbench provides a secure, self-service environment allowing data scientists direct access to enterprise data and tools while meeting IT requirements. Examples are given of how it supports the full data science pipeline from exploration to production. In demos, it highlights features like connecting to Hadoop clusters securely and enabling collaboration. Overall, the document pitches Cloudera's Workbench as a solution
Machine Learning Model Deployment: Strategy to Implementation | DataWorks Summit
This talk will introduce participants to the theory and practice of machine learning in production. The talk will begin with an intro on machine learning models and data science systems and then discuss data pipelines, containerization, real-time vs. batch processing, change management and versioning.
As part of this talk, the audience will learn more about:
• How data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models.
• How organizations can accelerate machine learning from research to production while preserving the flexibility and agility that data scientists and modern business use cases demand.
A small demo will showcase how to rapidly build, train, and deploy machine learning models in R, Python, and Spark, and continue with a discussion of API services, RESTful wrappers/Docker, PMML/PFA, Onyx, SQLServer embedded models, and lambda functions.
Speakers
Sagar Kewalramani, Solutions Architect
Cloudera
Justin Norman, Director, Research and Data Science Services
Cloudera Fast Forward Labs
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING | Matt Stubbs
Date: 13th November 2018
Location: Data-Driven Ldn Theatre
Time: 13:10 - 13:40
Speaker: Brian Goral
Organisation: Cloudera
About: The field of machine learning (ML) ranges from the very practical and pragmatic to the highly theoretical and abstract. This talk describes several of the challenges facing organisations that want to leverage more of their data through ML, including some examples of the applied algorithms that are already delivering value in business contexts.
Introducing Cloudera Data Science Workbench for HDP 2.12.19 | Cloudera, Inc.
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
The 5 Biggest Data Myths in Telco: Exposed | Cloudera, Inc.
The document discusses common myths in the telecommunications industry regarding big data and analytics. It addresses five myths: 1) that data is too diverse to analyze, 2) that open source means open security, 3) that big data platforms do not provide adequate return on investment, 4) that big data tools are too difficult for teams to learn, and 5) that legacy systems cannot handle additional data solutions. For each myth, it provides facts and examples to demonstrate why the myths are unfounded and how organizations can leverage big data to drive insights.
Introducing the data science sandbox as a service 8.30.18 | Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
The document discusses challenges in building machine learning platforms and pipelines. It covers topics like data exploration challenges due to versioning issues; managing large numbers of model experiments with different hyperparameters, datasets, and performance tracking; and difficulties deploying models at scale for monitoring. The presentation demonstrates examples of machine learning applications in industries like telecommunications, manufacturing, and finance. It also discusses trends in deep learning, distributed learning, transfer learning, and edge device machine learning.
Keynote: The Journey to Pervasive Analytics | Cloudera, Inc.
We are in the middle of a data rush. When you are right in the center of a storm, it can seem overwhelming. Where should I start? What do I need to think about? What is the best long-term bet? But don’t forget that more data should mean great news. More data should mean more insight, more guidance, and more strategic direction. However, more data doesn’t automatically rally your entire business around common goals and insights. You need a platform and architecture that can support a thriving, analytic-driven business culture that embraces a pervasive analytics strategy.
Federated Learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy, reduces network communication costs, and taps edge device computing resources. The principles of data minimization established by the GDPR and the growing prevalence of smart sensors make the advantages of federated learning more compelling. Federated learning is a great fit for smartphones, industrial and consumer IoT, healthcare and other privacy-sensitive use cases, and industrial sensor applications.
We’ll present the Fast Forward Labs team’s research on this topic and the accompanying prototype application, “Turbofan Tycoon”: a simplified working example of federated learning applied to a predictive maintenance problem. In this demo scenario, customers of an industrial turbofan manufacturer are not willing to share the details of how their components failed with the manufacturer, but want the manufacturer to provide them with a strategy to maintain the part. Federated learning allows us to satisfy the customer's privacy concerns while providing them with a model that leads to fewer costly failures and less maintenance downtime.
We’ll discuss the advantages and tradeoffs of taking the federated approach. We’ll assess the state of tooling for federated learning, circumstances in which you might want to consider applying it, and the challenges you’d face along the way.
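As a rough illustration of the federated approach described above (this is not the "Turbofan Tycoon" prototype), here is a minimal sketch of federated averaging on a toy linear model. Everything in it is an assumption made for illustration: the synthetic client data, the hypothetical helpers `local_update`, `federated_average`, and `make_client_data`, and the learning rate and round counts. Each client trains on its own data, and only model weights, never raw records, are sent back to be averaged.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y  # prediction error of the model y = w*x + b
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Combine client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(rng, n, lo, hi):
    """Hypothetical private dataset: y = 2x + 1 plus a little noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
# Three clients whose data covers different input ranges and never leaves them.
clients = [
    make_client_data(rng, 30, 0.0, 1.0),
    make_client_data(rng, 50, 1.0, 2.0),
    make_client_data(rng, 20, -1.0, 0.0),
]

global_weights = (0.0, 0.0)
for _ in range(200):
    # Each client starts from the current global model and trains locally.
    updates = [local_update(global_weights, data) for data in clients]
    # The server only ever sees the resulting weights, not the data.
    global_weights = federated_average(updates, [len(d) for d in clients])

print("global model (slope, intercept):", global_weights)
```

In this sketch the only traffic between clients and server is a pair of floats per round, which is the property that makes the approach attractive when raw data cannot be shared.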
Speaker
Chris Wallace
Data Scientist
Cloudera
This document discusses an approach to enterprise metadata integration using a multilayer metadata model. Key points include:
- Status dashboards provide facts from technical, operational, application, and quality metadata layers
- A graph database allows for context exploration across the entire cluster
- The integration of metadata from multiple sources provides a more holistic view of business knowledge
Machine Learning Models: From Research to Production 6.13.18 | Cloudera, Inc.
Learn more about how data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models, and how organisations can accelerate machine learning from research to production, while preserving the flexibility and agility data scientists and modern business use cases demand.
The Edge to AI Deep Dive Barcelona Meetup March 2019 | Timothy Spann
The Edge to AI Deep Dive Barcelona Meetup March 2019
A deep dive demo of using MiNiFi, NiFi, CDSW for real-time AI at the edge, in a local cluster, in the cloud and in a Data Science platform at scale with real-time streaming and data storage.
Apache NiFi, MiNiFi, NiFi Registry, Cloudera Data Science Workbench (CDSW), Python, Pyspark, Spark SQL, Apache Calcite, Apache Parquet, Apache MXNet, GluonCV.
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b... | Data Con LA
Feature engineering (writing code to map raw input data into a set of signals that will be fed into a machine learning algorithm) is the dark art of data science. Although the process of crafting new features is tedious and failure-prone, the key to a successful model is a diverse set of high-quality features that are informed by domain experts. Recently, academic researchers have begun to focus on the problem of feature engineering, and have started to publish research that addresses the relative lack of tools that are designed to support the feature engineering process. In this talk, I will review some of my favorite papers and present some efforts to convert these ideas into tools that leverage the principles of reactive application design in order to make feature engineering (dare I say it) fun.
Part 3: Models in Production: A Look From Beginning to End | Cloudera, Inc.
The document discusses the different roles involved in developing machine learning models from beginning to end. It describes the typical workflow as including data engineering to prepare data, exploratory data science to develop models, and operational model deployment to production applications. It provides examples of tasks for each role such as data engineers ingesting and transforming sensor data, data scientists building and evaluating predictive models, and model deployment engineers validating models and creating APIs.
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
The document discusses using Cloudera DataFlow to address challenges with collecting, processing, and analyzing log data across many systems and devices. It provides an example use case of logging modernization to reduce costs and enable security solutions by filtering noise from logs. The presentation shows how DataFlow can extract relevant events from large volumes of raw log data and normalize the data to make security threats and anomalies easier to detect across many machines.
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
The document outlines the 2021 finalists for the annual Data Impact Awards program, which recognizes organizations using Cloudera's platform and the impactful applications they have developed. It provides details on the challenges, solutions, and outcomes for each finalist project in the categories of Data Lifecycle Connection, Cloud Innovation, Data for Enterprise AI, Security & Governance Leadership, Industry Transformation, People First, and Data for Good. There are multiple finalists highlighted in each category demonstrating innovative uses of data and analytics.
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most cutting-edge data projects and represent innovation and leadership in their respective industries.
The document outlines the agenda for Cloudera's Enterprise Data Cloud event in Vienna. It includes welcome remarks, keynotes on Cloudera's vision and customer success stories. There will be presentations on the new Cloudera Data Platform and customer case studies, followed by closing remarks. The schedule includes sessions on Cloudera's approach to data warehousing, machine learning, streaming and multi-cloud capabilities.
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to an ad-hoc exploration of all data to optimize business processes and into the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers, such as:
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
The document discusses the benefits and trends of modernizing a data warehouse. It outlines how a modern data warehouse can provide deeper business insights at extreme speed and scale while controlling resources and costs. Examples are provided of companies that have improved fraud detection, customer retention, and machine performance by implementing a modern data warehouse that can handle large volumes and varieties of data from many sources.
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
Cloudera SDX is by no means restricted to just the platform; it extends well beyond. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Spark and Deep Learning Frameworks at Scale 7.19.18Cloudera, Inc.
We'll outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, along with its extended ecosystem of libraries and deep learning frameworks using Cloudera's Data Science Workbench.
Cloudera's big data platform can help organizations comply with the EU's General Data Protection Regulation (GDPR) in three key ways:
1. It provides a single system to securely store, govern, and manage all analytic workloads and personal data across on-premises, cloud, structured, and unstructured data sources.
2. Its shared services like data catalog, security, governance, and lifecycle management can be applied uniformly across the platform to meet GDPR principles like data minimization, storage limitation, and accuracy.
3. Specific capabilities like its GDPR data hub, consent management, and ability to delete individual data records upon request help automate key GDPR requirements at scale.
To disrupt and innovate, you need access to data. All of your data. The challenge for many organisations is that the data they need is locked away in a variety of silos. And there's perhaps no bigger silo than one of the most widely deployed business applications: SAP. Bringing together all your data for analytics and machine learning unlocks new insights and business value. Together, Cloudera and Datavard hold the key to breaking SAP data out of its silo, providing access to unlimited and untapped opportunities that currently lie hidden.
Multi task learning stepping away from narrow expert models 7.11.18Cloudera, Inc.
Join this webinar as Friederike Schüür covers:
A conceptual introduction to multi-task learning (MTL), how and why it works
A technical deep dive, from MTL random forests to MTL neural networks
Applications of MTL, from structured data to text and images
The benefits of MTL to organizations, from financial services to healthcare and agriculture
Cloudera training secure your cloudera cluster 7.10.18Cloudera, Inc.
Exclusively through Cloudera OnDemand, Cloudera Security Training introduces you to the tools and techniques that Cloudera's solution architects use to protect the clusters our customers rely on for critical machine learning and analytics workloads. This webinar will give you a sneak peek at our new on-demand security course and show you the immense scope of Cloudera training. From authentication and authorization to encryption, auditing, and everything in between, this course gives you the skills you need to properly secure your Cloudera cluster.
Mieke Jans is a Manager at Deloitte Analytics Belgium. She learned about process mining from her PhD supervisor while she was collaborating with a large SAP-using company for her dissertation.
Mieke extended her research topic to investigate the data availability of process mining data in SAP and the new analysis possibilities that emerge from it. It took her 8-9 months to find the right data and prepare it for her process mining analysis. She needed insights from both process owners and IT experts. For example, one person knew exactly how the procurement process took place at the front end of SAP, and another person helped her with the structure of the SAP-tables. She then combined the knowledge of these different persons.
GenAI for Quant Analytics: survey-analytics.aiInspirient
Pitched at the Greenbook Insight Innovation Competition as part of IIEX North America 2025 on 30 April 2025 in Washington, D.C.
Join us at survey-analytics.ai!
#2: Thank you, Joao.
Our next speaker comes to us all the way from Brooklyn, New York, where the Fast Forward Labs team is headquartered. Cloudera acquired Fast Forward Labs last year to advance machine learning in the enterprise. The FFL team has a clear vision of the future and a deep expertise in applying machine learning and AI to practical business problems.
Brian Goral guides large client programs and provides operations direction for Cloudera's Fast Forward Labs team. When he’s available at the FFL offices in Brooklyn, he joins the research and advising teams, bringing emerging machine learning concepts to life in client business use cases. Prior to joining the Fast Forward Labs team, Brian spent a 15-plus-year career focused on data collection systems and the application of global data to decision-making and policy-setting. An alumnus of Michigan State University with a Master's from the University of North Carolina, Brian hails from Milwaukee, though he now hangs his hat in New York City. Without further ado, please welcome Brian Goral.
#8: Zebra Medical Vision, an Israeli startup, uses deep learning to diagnose diseases of the bone system, liver, lungs, cardiovascular system, and the brain.
#9: Hospitals have used machine learning to predict rehospitalization for years. With new techniques, they are able to markedly improve the predictive power of readmission models.
-- Mining text from provider notes and other documents in the EHR system
-- Building models specific to individual diseases and diagnoses
CHS transformed readmissions modeling into priority scores available to care managers.
Over a period of a year and a half to two years, Carolinas Health System was able to drop the readmission rate from 21 percent to 14 percent.
#10: For the insurance industry, researchers at Purdue University developed a system that uses machine learning to assess disaster damage. This makes it possible for insurers to rapidly predict claims, and serves as an independent check on human assessors.
#11: Lloyds Banking Group uses deep learning to develop unique identifiers for each customer’s voice. The bank uses these voice profiles to confirm the identity of people who contact the call center, reducing fraud and improving operations.
#13: Many organizations STRUGGLE to profit from machine learning
#14: It’s difficult because it’s often hard to ask the right questions; difficult because it’s not straightforward programming, it really is science and experimentation; difficult because of the rapidly growing volume; and difficult because the metrics are generally measured somewhere relative to chance, which can be hard to communicate. When a data set tells you that you have 95% certainty in a particular outcome, that’s one thing, but what about when the data only supports a solution with 65% accuracy and you need to dig deeper? Executives in a lot of organizations are not used to being given that kind of response.
You also know, better than most organizations, that a company, one of your clients, can’t outsource understanding of its own workplace or necessarily outsource its internal data product development without risking poor integration.
And with so much going on under the umbrella of Artificial Intelligence these days, executives need trusted advisors to navigate a fast-moving landscape of machine learning, or “AI” as many people term it.
Data products underpin much of the business and government decision-making occurring today. Data, and machine learning in particular, is a huge but difficult-to-execute opportunity for every organization.
#15: Data products themselves are difficult at the tactical level
but there are also these strategic barriers to transformation.
#20: Departmental purchases
Different ML tools for different ML use cases
Absence of common best practices and standards
Provisioning is all over the map
One team might use Databricks; another uses a competing black box software you’ll never get your data back from.
#21: It’s important to consolidate
It saves money and process and smooths interactions - and with ML in particular there’s an added benefit - multi-task learning.
#22: Multi-task learning is building an ML model that is trained on several different tasks, to the benefit/enhancement of each.
In the health care world this might look something like training separate models on claims data and patient care data and finding new cost savings insights in each when the commonalities and differences in the models are combined and exploited.
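To make the idea concrete, here is a minimal sketch of one common form of multi-task learning, hard parameter sharing, where two tasks share one set of feature weights and each keeps its own small task-specific "head". It is not the method described in the talk. The data is synthetic, and the task names ("claims", "care"), the function names, and the hyperparameters are all hypothetical, chosen only for illustration.

```python
import random


def make_synthetic_tasks(n_rows=50, seed=0):
    """Build two synthetic regression tasks driven by one hidden signal.

    Assumption for illustration only: both targets are scaled copies of the
    same hidden factor, so a shared representation can help both tasks.
    """
    rng = random.Random(seed)
    tasks = {"claims": [], "care": []}
    for name, scale in (("claims", 2.0), ("care", -1.0)):
        for _ in range(n_rows):
            x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            signal = 0.5 * x[0] - 0.3 * x[1]  # hidden factor common to both tasks
            tasks[name].append((x, scale * signal))
    return tasks


def train_multitask(tasks, n_features, epochs=300, lr=0.05, seed=0):
    """Hard parameter sharing with plain gradient descent.

    Prediction for a task is heads[task] * dot(shared, features), so the
    shared weights are updated by every task while each head is updated
    only by its own task.
    """
    rng = random.Random(seed)
    shared = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    heads = {name: 1.0 for name in tasks}
    history = []  # summed per-task mean squared error, recorded each epoch
    for _ in range(epochs):
        total_loss = 0.0
        grad_shared = [0.0] * n_features
        grad_heads = {name: 0.0 for name in tasks}
        for name, rows in tasks.items():
            for x, y in rows:
                z = sum(w * xi for w, xi in zip(shared, x))
                err = heads[name] * z - y
                total_loss += err * err / len(rows)
                grad_heads[name] += 2.0 * err * z / len(rows)
                for i, xi in enumerate(x):
                    grad_shared[i] += 2.0 * err * heads[name] * xi / len(rows)
        shared = [w - lr * g for w, g in zip(shared, grad_shared)]
        for name in heads:
            heads[name] -= lr * grad_heads[name]
        history.append(total_loss)
    return shared, heads, history


if __name__ == "__main__":
    tasks = make_synthetic_tasks()
    shared, heads, history = train_multitask(tasks, n_features=2)
    print(f"combined loss: {history[0]:.4f} -> {history[-1]:.4f}")
    print("shared weights:", shared)
    print("task heads:", heads)
```

The design point is that the shared weights receive gradient from both tasks, so whatever the tasks have in common is learned once, while the per-task heads capture how they differ.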
#24: We hear a lot about data science “heroes” -- genius individual contributors who, singlehandedly, produce brilliant insights that change lives,
#25: In fact, successful data science requires a collaborative approach from many contributors
#29: Just saying something is a deep learning problem is like saying you’re going to deliver a vehicle with a 12-cylinder engine. Each of these vehicles has a 12-cylinder engine, but there are a lot of differences in producing them.
#30: We’re there because academic research is doing some amazing work, but they’re not trying to solve your applied problem. The general formulation of an algorithm does not equal your use case solution. I can pull up some very cool use cases of deep learning using Cloudera in areas like disaster recovery, cardiac monitoring in health care, and voice identification in banking - but there is a big difference between saying that you have a deep learning challenge and bringing you to a solution. We’re here to help bridge that gap.
#41: But obviously it takes more than good people and processes. You need the right technology.
Let’s get down to brass tacks on what the software is about
We’re based on an open source core. A complete, integrated enterprise platform leveraging open source
HOSS business model - core set of platform capabilities – we contribute actively into that community.
and we layer value added software on top - that’s how we run our business.
But what’s truly differentiating about our platform is the enterprise experience you get. It’s why we’re able to claim 7 of the top ten banks and 9 of the top ten telcos are our customers. For regulated industries, the enterprise experience is critical.
Multi-cloud – No vendor lock-in. Work in the environment of your choice. Better pricing leverage
Managed TCO – Multiple pricing and deployment options
Integrated – Integrated components with shared metadata, security and operations
Secure - Protect sensitive data from unauthorized access – encryption, key management
Compliance – Full auditing and visibility
Governance – Ensure data veracity