Putting Analytics in Big Data Analytics
Jake Cornelius
Director of Product Management, Pentaho Corporation
Learn more @ http://www.cloudera.com/hadoop/
1. The document discusses Pentaho's approach to big data analytics using a component-based data integration and visualization platform.
2. The platform allows business analysts and data scientists to prepare and analyze big data without advanced technical skills.
3. It provides a visual interface for building reusable data pipelines that can be run locally or deployed to Hadoop for analytics on large datasets.
The document discusses the importance of a hybrid data model for Hadoop-driven analytics. It notes that traditional data warehousing is not suitable for large, unstructured data in Hadoop environments due to limitations in handling data volume, variety, and velocity. The hybrid model combines a data lake in Hadoop for raw, large-scale data with data marts and warehouses. It argues that Pentaho's suite provides tools to lower technical barriers for extracting, transforming, and loading (ETL) data between the data lake and marts/warehouses, enabling analytics on Hadoop data.
Pentaho provides open source business analytics tools, including Kettle for extraction, transformation and loading (ETL) of data, and Weka for machine learning and data mining. Kettle allows users to run ETL jobs directly on Hadoop clusters, and its JDBC layer enables SQL queries to be pushed down to databases for better performance. While bringing Weka analytics to Hadoop data provides gains, challenges include ensuring that the machine learning algorithms are truly parallel and keeping clients notified of database updates.
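To make the Kettle point concrete, here is a minimal sketch of running a Kettle transformation from Java, assuming the Kettle 4.x embedding API; the transformation file name is hypothetical, standing in for a pipeline designed in the visual editor:

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RunKettleTransformation {
        public static void main(String[] args) throws Exception {
            // Initialize the Kettle environment (plugins, variables, logging).
            KettleEnvironment.init();
            // Load a transformation built in the visual designer (hypothetical file name).
            TransMeta meta = new TransMeta("weblog_enrichment.ktr");
            Trans trans = new Trans(meta);
            trans.execute(null);       // start with no runtime arguments
            trans.waitUntilFinished(); // block until the ETL run completes
            if (trans.getErrors() > 0) {
                throw new IllegalStateException("Transformation finished with errors");
            }
        }
    }

The same .ktr file can be executed locally during development and later deployed to run against a Hadoop cluster, which is what makes the pipelines reusable.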
Big Data Integration Webinar: Getting Started With Hadoop Big Data (Pentaho)
This document discusses getting started with big data analytics using Hadoop and Pentaho. It provides an overview of installing and configuring Hadoop and Pentaho on a single machine or cluster. Dell's Crowbar tool is presented as a way to quickly deploy Hadoop clusters on Dell hardware in about two hours. The document also covers best practices like leveraging different technologies, starting with small datasets, and not overloading networks. A demo is given and contact information provided.
Why Your Product Needs an Analytic Strategy (Pentaho)
The document discusses strategies for enhancing products with analytics capabilities. It outlines three strategic approaches: 1) enhance current software products with analytics, 2) target new opportunities using existing data through direct data monetization or new products/services, and 3) reinvent value propositions using new data technologies like big data. The document provides examples of implementing analytics capabilities for different user personas and considerations for analytics deployments. It argues that analytics can provide benefits like improved decisions, customer stickiness, and new revenue opportunities.
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data (Pentaho)
This document discusses a project between Pentaho and Verizon to leverage big data analytics. Verizon generates vast amounts of call detail record (CDR) data from mobile networks that is currently stored in a data warehouse for 2 years and then archived to tape. Pentaho's platform will help optimize the data warehouse by using Hadoop to store all CDR data history. This will free up data warehouse capacity for high value data and allow analysis of the full 10 years of CDR data. Pentaho tools will ingest raw CDR data into Hadoop, execute MapReduce jobs to enrich the data, load results into Hive, and enable analyzing the data to understand calling patterns by geography over time.
The document outlines Pentaho's roadmap and focus areas for business analytics products. It discusses enhancements planned for Pentaho Business Analytics 5.1, including new features for analyzing MongoDB data and improved visualizations. It also summarizes R&D activities like integrating real-time data processing with Storm and Spark. The roadmap focuses on hardening the Pentaho platform for large enterprises, extending capabilities for big data engineering and analytics, and improving embedded analytics.
The document discusses Pentaho's business intelligence (BI) platform for big data analytics. It describes Pentaho as providing a modern, unified platform for data integration and analytics that allows for native integration into the big data ecosystem. It highlights Pentaho's open source development model and that it has over 1,000 commercial customers and 10,000 production deployments. Several use cases are presented that demonstrate how Pentaho helps customers unlock value from big data stores.
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ... (Pentaho)
This document discusses approaches to implementing Hadoop, NoSQL, and analytical databases. It describes:
1) The current landscape of big data databases including Hadoop, NoSQL, and analytical databases that are often used together but come from different vendors with different interfaces.
2) Common uses of transactional databases, Hadoop, NoSQL databases, and analytical databases.
3) The complexity of current implementation approaches that involve multiple coding steps across various tools.
4) How Pentaho provides a unified platform and visual tools to reduce the time and effort needed for implementation by eliminating disjointed steps and enabling non-coders to develop workflows and analytics for big data.
30 for 30: Quick Start Your Pentaho Evaluation (Pentaho)
These slides are from our recent 30 for 30 webinar, tailored towards people who have downloaded the Pentaho evaluation and want to know more about all the data integration and business analytics components that are part of the trial, how to easily integrate data, and best practices for installing and developing content.
Check out this presentation from Pentaho and ESRG to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies.
Learn more in the brief that inspired the presentation, Product Innovation with Big Data: http://www.pentaho.com/resources/whitepaper/product-innovation-big-data
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model (DataWorks Summit)
This document discusses Dignity Health's move to using Hadoop for healthcare analytics to build better predictive models. It outlines their goals of saving costs and lives by leveraging over 30 TB of clinical data using Hadoop and SAS technologies on their Dignity Health Insights platform. The presentation agenda covers Dignity Health, healthcare analytics challenges, their big data ecosystem architecture featuring Hadoop, and how they are using this infrastructure for applications like sepsis surveillance analytics.
Breakout: Operational Analytics with Hadoop (Cloudera, Inc.)
Operationalizing models and responding to large volumes of data, fast, requires bolt-on systems that can struggle with processing (transforming the data), consistency (always responding to data), and scalability (processing and responding to large volumes of data). If data volumes become too large, these traditional systems fail to deliver their responses, resulting in significant losses to organizations. Join this breakout to learn how to overcome the roadblocks.
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014 (Pentaho)
Bo Borland's presentation at MongoDB World in NYC, June 24, 2014. Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyze Disparate Data in a Single MongoDB View
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight (Precisely)
The document discusses moving legacy data and workloads from traditional data warehouses to Hadoop. It describes how ELT processes on dormant data waste resources and how offloading this data to Hadoop can optimize costs and performance. The presentation includes a demonstration of using Tableau for self-service analytics on data in Hadoop and a case study of a financial organization reducing ELT development time from weeks to hours by offloading mainframe data to Hadoop.
All data accessible to all my organization - Presentation at OW2con'19, June... (OW2)
This document discusses how Dremio provides a unified access point for data across an entire organization. It summarizes how Dremio allows various users, including data engineers, scientists, analysts and business users, to access all kinds of data sources through SQL or REST APIs. Dremio also enables features like data catalogs, collaborative workspaces, and workload monitoring that help organizations better manage and govern their data.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight (Steven Totman)
Demand for quicker access to multiple integrated sources of data continues to rise. Immediate access to data stored in a variety of systems - such as mainframes, data warehouses, and data marts - to mine visually for business intelligence is the competitive differentiation enterprises need to win in today’s economy.
Stop playing the waiting game and learn about a new end-to-end solution for combining, analyzing, and visualizing data from practically any source in your enterprise environment.
Leading organizations are already taking advantage of this architectural innovation to gain modern insights while reducing costs and propelling their businesses ahead of the competition.
Are you tired of waiting? Don't let your architecture hold you back. Access this webinar and hear from a team of industry experts on how you can Break the Barriers to Big Data Insight.
Explore how data integration (or “mashups”) can maximize analytic value and help business teams create streamlined data pipelines that enable ad-hoc analytic inquiries. You’ll learn why businesses are increasingly focused on blending data on demand and at the source, the concrete analytic advantages that this approach delivers, and the types of architectures required for delivering trusted, blended data. We provide a checklist to assess your data integration needs and capabilities, and review some real-world examples of how blending various data types has created significant analytic value and concrete business impact.
Rob Peglar - Introduction to Analytics, Big Data and Hadoop (Ghassan Al-Yafie)
This document provides an introduction to analytics and big data using Hadoop. It discusses the growth of digital data and challenges of big data. Hadoop is presented as a solution for storing and processing large, unstructured datasets across commodity servers. The key components of Hadoop - HDFS for distributed storage and MapReduce for distributed processing - are described at a high level. Examples of industries using big data analytics are also listed.
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... (ArabNet ME)
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture is not able to cope with the realities of today's business. It cannot scale to meet our SLAs due to the separation of storage and compute, economically store the volumes and types of data we currently confront, provide the agility necessary for innovation, or, most importantly, provide a full 360-degree view of our customers, products, and business. In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
This document summarizes Patrick de Vries' presentation on connecting everything at the Hadoop Summit 2016. The presentation discusses KPN's use of Hadoop to manage increasing data and network capacity needs. It outlines KPN's data flow process from source systems to Hadoop for processing and generating reports. The presentation also covers lessons learned in implementing Hadoop including having strong executive support, addressing cultural challenges around data ownership, and leveraging existing investments. Finally, it promotes joining a new TELCO Hadoop community for telecommunications providers to share use cases and lessons.
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ... (MongoDB)
Drawing on Pentaho's wide experience in solving customers' big data issues, Davy Nys will position the importance of analytics in the IoT:
[-] Understanding the challenges behind data integration & analytics for IoT
[-] Future proofing your information architecture for IoT
[-] Delivering IoT analytics, now and tomorrow
[-] Real customer examples of where Pentaho can help
Better Together: The New Data Management Orchestra (Cloudera, Inc.)
To ingest, store, process and leverage big data for maximum business impact requires integrating systems, processing frameworks, and analytic deployment options. Learn how Cloudera’s enterprise data hub framework, MongoDB, and Teradata Data Warehouse working in concert can enable companies to explore data in new ways and solve problems that not long ago might have seemed impossible.
Gone are the days of NoSQL and SQL competing for center stage. Visionary companies are driving data subsystems to operate in harmony. So what’s changed?
In this webinar, you will hear from executives at Cloudera, Teradata and MongoDB about the following:
How to deploy the right mix of tools and technology to become a data-driven organization
Examples of three major data management systems working together
Real world examples of how business and IT are benefiting from the sum of the parts
Join industry leaders Charles Zedlewski, Chris Twogood and Kelly Stirman for this unique panel discussion, moderated by BI Research analyst, Colin White.
1. The document discusses a Gartner report that assesses 20 vendors of data science and machine learning platforms. It evaluates the platforms' abilities to support the full data science life cycle.
2. The report places vendors in four categories - Leaders, Challengers, Visionaries, and Niche Players. It outlines the strengths and cautions of platforms from vendors like Amazon Web Services, Alteryx, and Anaconda.
3. Key criteria for evaluating the platforms include ease of use, support for different personas, capabilities for tasks like modeling and deployment, and growth and innovation. The report aims to help users choose the right platform for their needs.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight (Cloudera, Inc.)
Rethink data management and learn how to break down barriers to Big Data insight with Cloudera's enterprise data hub (EDH), Syncsort offload solutions, and Tableau Software visualization and analytics.
This document discusses building an integrated data warehouse with Oracle Database and Hadoop. It describes why a data warehouse may need Hadoop to handle big data from sources like social media, sensors and logs. Examples are given of using Hadoop for ETL and analytics. The presentation provides an overview of Hadoop and how to connect it to the data warehouse using tools like Sqoop and external tables. It also offers tips on getting started and avoiding common pitfalls.
The document summarizes Pentaho's open source business intelligence and data integration products, including their new capabilities for Hadoop and big data analytics. It discusses Pentaho's partnerships with Amazon Web Services and Cloudera to more easily integrate Hadoop data. It also outlines how Pentaho helps users analyze and visualize both structured and unstructured data from Hadoop alongside traditional data sources.
The document discusses big data and Hadoop. It notes that big data comes in terabytes and petabytes, sometimes generated daily. Hadoop is presented as a framework for distributed computing on large datasets using MapReduce. While Hadoop can store and process massive amounts of data across commodity servers, it was not designed for business intelligence requirements. The document proposes addressing this by adding data integration and transformation capabilities to Hadoop through tools like Pentaho Data Integration, to enable it to better meet the needs of big data analytics.
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho (BICC Thomas More)
7th BI congress of BICC-Thomas More: April 3, 2014
A travel report from Business Intelligence to Big Data
The travel industry is changing fast. This presentation takes a journey through classic and modern BI destinations, showing a series of snapshots of different use cases in the travel industry. During the session we highlight the capacity and flexibility a BI tool needs in order to guide you on your journey from classic BI implementations to modern big data challenges.
How advanced analytics is impacting the banking sector (Michael Haddad)
The document discusses how advanced analytics is impacting the banking sector. It covers topics like regulatory changes forcing banks to invest in compliance; new digital technologies changing how customers interact with banks; and data analytics helping banks reduce risk, deliver personalized services, and retain skills. It also discusses Hitachi Data Systems' acquisition of Pentaho and how their combined platform can provide unified data integration and business analytics across structured, unstructured, and streaming data sources.
Putting Business Intelligence to Work on Hadoop Data Stores (DATAVERSITY)
An inexpensive way of storing large volumes of data, Hadoop is also scalable and redundant. But getting data out of Hadoop is tough due to a lack of a built-in query language. Also, because users experience high latency (up to several minutes per query), Hadoop is not appropriate for ad hoc query, reporting, and business analysis with traditional tools.
The first step in overcoming Hadoop's constraints is connecting to Hive, a data warehouse infrastructure built on top of Hadoop, which provides the relational structure necessary for scheduled reporting of large datasets stored in Hadoop files. Hive also provides a simple query language called HiveQL, which is based on SQL and enables users familiar with SQL to query this data.
But to really unlock the power of Hadoop, you must be able to efficiently extract data stored across multiple (often tens or hundreds) of nodes with a user-friendly ETL (extract, transform and load) tool that will then allow you to move your Hadoop data into a relational data mart or warehouse where you can use BI tools for analysis.
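To ground the HiveQL point, here is a minimal Java sketch of querying Hive over JDBC, assuming the original HiveServer driver (org.apache.hadoop.hive.jdbc.HiveDriver) that shipped with early Hive releases; the weblogs table is hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver and connect to HiveServer.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // HiveQL reads like SQL; Hive compiles it into MapReduce jobs.
            ResultSet rs = stmt.executeQuery(
                    "SELECT country, COUNT(*) AS hits FROM weblogs GROUP BY country");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            conn.close();
        }
    }

Each such query may take minutes because it runs as batch MapReduce jobs, which is exactly the latency constraint described above.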
Advanced Reporting and ETL for MongoDB: Easily Build a 360-Degree View of You... (MongoDB)
The document discusses Pentaho's analytics and ETL solutions for MongoDB. It provides an overview of the Pentaho company and its platform for unified business analytics and data integration. It then outlines how Pentaho can be used to build a 360-degree view of customers by extracting, transforming and loading data from source systems into MongoDB and performing analytics and reporting on the MongoDB data. It demonstrates these capabilities with examples and screenshots.
Big Data has been a "buzz word" for a few years now, and it's generated a fair amount of hype. But, while the technology landscape is still evolving, product companies in the software, web, and hardware areas have actually led the way in delivering real value from data sources like weblogs, sensors, and social media as well as systems like Hadoop, NoSQL, and Analytical Databases. These organizations have built "Big Data Apps" that leverage fast, flexible data frameworks to solve a wide array of user problems, scale to massive audiences, and deliver superior predictive intelligence.
Join this webinar to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies. You will hear from:
- Ben Hopkins, Product Marketing Manager at Pentaho, who will discuss what Big Data means for product strategy and why it represents a new toolset for product teams to meet user needs and build competitive advantage
- Jim Stascavage, VP of Engineering at ESRG, who will discuss how his company has innovated with Big Data and predictive analytics to deliver technology products that optimize fuel consumption and maintenance cycles in the maritime and heavy industry sectors, leveraging trillions of sensor data points a year.
Who Should Attend
Product Managers, Product Marketing Managers, Project Managers, Development Managers, Product Executives, and anyone responsible for addressing customer needs & influencing product strategy.
Pentaho Big Data Analytics with Vertica and Hadoop (Mark Kromer)
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake... (NoSQLmatters)
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen... (MongoDB)
1) The document discusses Pentaho's beliefs around Internet of Things (IoT) analytics, including applying the right data source and processing for different analytics needs, gaining insights by blending multiple data sources on demand, and planning for agility, flexibility and near real-time analytics.
2) It describes how emerging big data use cases demand blending different data sources and provides examples like improving operations and customer experience.
3) The document advocates an Extract-Transform-Report approach for IoT analytics that provides flexibility to integrate diverse data sources and enables real-time insights.
Blending Hadoop and MongoDB with Pentaho [11:10 am - 11:30 am]
For eCommerce companies, knowing how promoted wish-lists can spark consumer spending is an analytics goldmine. In this lightning talk, Bo Borland will demonstrate how Pentaho analytics can blend click-stream data about promoted wish-lists with sales transaction records using Hadoop, MongoDB and Pentaho to reveal patterns in online shopping behavior. Regardless of your industry or specific use model, come to this session to learn how to blend MongoDB data with any data source for greater business insight. Pentaho offers the first end-to-end analytic solution for MongoDB. From data ingestion to pixel perfect reporting and ad hoc “slice and dice” analysis, the solution meets today’s growing demand for a 360-degree view of your business.
Open Analytics 2014 - Pedro Alves - Innovation through Open Source (OpenAnalytics Spain)
Delivering the Future of Analytics: Innovation through Open Source Pentaho was born out of the desire to achieve positive, disruptive change in the business analytics market, dominated by bureaucratic megavendors offering expensive heavy-weight products built on outdated technology platforms. Pentaho’s open, embeddable data integration and analytics platform was developed with a strong open source heritage. This provided Pentaho a first-mover advantage to engage early with adopters of big data technologies and solve the difficult challenges of integrating both established and emerging data types to drive analytics. Continued technology innovations to support the big data ecosystem, have kept customers ahead of the big data curve. With the ability to drastically reduce the time to design, develop and deploy big data solutions, Pentaho counts numerous big data customers, both large and small, across the financial services, retail, travel, healthcare and government industries around the world.
Evolution of Big Data at Intel - Crawl, Walk and Run Approach (DataWorks Summit)
Intel's big data journey began in 2011 with an evaluation of Hadoop. Since then, Intel has expanded its use of Hadoop and Cloudera across multiple environments. Intel's 3-year roadmap focuses on evolving its Hadoop platform to support more advanced analytics, real-time capabilities, and integrating with traditional BI tools. Key strategies include designing for scalability, following an iterative approach to understand data, and leveraging open source technologies.
This document discusses strategies for filling a data lake by improving the process of data onboarding. It advocates using a template-based approach to streamline data ingestion from various sources and reduce dependence on hardcoded procedures. The key aspects are managing ELT templates and metadata through automated metadata extraction. This allows generating integration jobs dynamically based on metadata passed at runtime, providing flexibility to handle different source data with one template. It emphasizes reducing the risks associated with large data onboarding projects by maintaining a standardized and organized data lake.
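As a toy illustration of the template-plus-metadata idea described above, the sketch below assembles a load statement from a column-metadata map at runtime; the table and column names are invented for the example and do not come from the presentation:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class TemplateJobGenerator {
        // One template serves every source table: the SQL is assembled
        // from metadata passed in at runtime rather than hardcoded.
        static String generateLoadSql(String sourceTable, String targetTable,
                                      Map<String, String> columns) {
            String columnList = String.join(", ", columns.keySet());
            return "INSERT INTO " + targetTable + " (" + columnList + ") "
                 + "SELECT " + columnList + " FROM " + sourceTable;
        }

        public static void main(String[] args) {
            Map<String, String> cols = new LinkedHashMap<>();  // preserves column order
            cols.put("customer_id", "BIGINT");
            cols.put("order_date", "DATE");
            cols.put("amount", "DECIMAL(10,2)");
            System.out.println(generateLoadSql("staging.orders", "lake.orders", cols));
        }
    }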
Web Briefing: Unlock the power of Hadoop to enable interactive analytics (Kognitio)
This document provides an agenda and summaries for a web briefing on unlocking the power of Hadoop to enable interactive analytics and real-time business intelligence. The agenda includes demonstrations on SQL and Hadoop with in-memory acceleration, interactive analytics with Hadoop, and modern data architectures. It also includes presentations on big data drivers and patterns, interoperating Hadoop with existing data tools, and using Hadoop to power new targeted applications.
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
Driving Real Insights Through Data Science (VMware Tanzu)
Major changes in industries have been brought about by the emergence of data-driven discoveries and applications. Many organizations are bringing together their data and looking to drive change. But the ability to generate new insights in real time from massive sets of data is still far from commonplace.
At this event, data technology experts and data scientists from Pivotal provided the latest business perspective on how data science and engineering can be used to accelerate the generation of new insights.
For information about upcoming Pivotal events, please visit: http://pivotal.io/news-events/#events
The document discusses using Cloudera DataFlow to address challenges with collecting, processing, and analyzing log data across many systems and devices. It provides an example use case of logging modernization to reduce costs and enable security solutions by filtering noise from logs. The presentation shows how DataFlow can extract relevant events from large volumes of raw log data and normalize the data to make security threats and anomalies easier to detect across many machines.
Cloudera Data Impact Awards 2021 - Finalists (Cloudera, Inc.)
The document outlines the 2021 finalists for the annual Data Impact Awards program, which recognizes organizations using Cloudera's platform and the impactful applications they have developed. It provides details on the challenges, solutions, and outcomes for each finalist project in the categories of Data Lifecycle Connection, Cloud Innovation, Data for Enterprise AI, Security & Governance Leadership, Industry Transformation, People First, and Data for Good. There are multiple finalists highlighted in each category demonstrating innovative uses of data and analytics.
2020 Cloudera Data Impact Awards Finalists (Cloudera, Inc.)
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most-cutting edge data projects and represent innovation and leadership in their respective industries.
The document outlines the agenda for Cloudera's Enterprise Data Cloud event in Vienna. It includes welcome remarks, keynotes on Cloudera's vision and customer success stories. There will be presentations on the new Cloudera Data Platform and customer case studies, followed by closing remarks. The schedule includes sessions on Cloudera's approach to data warehousing, machine learning, streaming and multi-cloud capabilities.
Machine Learning with Limited Labeled Data 4/3/19 (Cloudera, Inc.)
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 (Cloudera, Inc.)
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to an ad-hoc exploration of all data to optimize business processes and into the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19 (Cloudera, Inc.)
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers such as -
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
Introducing Cloudera Data Science Workbench for HDP 2.12.19 (Cloudera, Inc.)
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19 (Cloudera, Inc.)
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Leveraging the cloud for analytics and machine learning 1.29.19 (Cloudera, Inc.)
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19 (Cloudera, Inc.)
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Leveraging the Cloud for Big Data Analytics 12.11.18 (Cloudera, Inc.)
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
The document discusses the benefits and trends of modernizing a data warehouse. It outlines how a modern data warehouse can provide deeper business insights at extreme speed and scale while controlling resources and costs. Examples are provided of companies that have improved fraud detection, customer retention, and machine performance by implementing a modern data warehouse that can handle large volumes and varieties of data from many sources.
Extending Cloudera SDX beyond the Platform (Cloudera, Inc.)
Cloudera SDX is by no means restricted to just the platform; it extends well beyond it. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18 (Cloudera, Inc.)
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
Analyst Webinar: Doing a 180 on Customer 360 (Cloudera, Inc.)
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18 (Cloudera, Inc.)
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Introducing the data science sandbox as a service 8.30.18 (Cloudera, Inc.)
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
Big Data Analytics Quick Research Guide (Arthur Morgan)
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Technology Trends in 2025: AI and Big Data Analytics (InData Labs)
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy now and implement the key findings to improve your business.
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights (Andrew Marnell)
With expertise in data architecture, performance tracking, and revenue forecasting, Andrew Marnell plays a vital role in aligning business strategies with data insights. Andrew Marnell’s ability to lead cross-functional teams ensures businesses achieve sustainable growth and operational excellence.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ... (SOFTTECHHUB)
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
HCL Nomad Web – Best Practices and Managing Multiuser Environments (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client updates are installed “automatically” in the background, which significantly reduces the administrative effort compared to traditional HCL Notes clients. However, troubleshooting in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how the troubleshooting process in HCL Nomad Web can be simplified to ensure a smooth and efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understanding the differences between single- and multi-user scenarios
- Using the Client Clocking feature
HCL Nomad Web – Best Practices and Managing Multiuser Environments (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Spark is a powerhouse for large datasets, but when it comes to smaller data workloads, its overhead can sometimes slow things down. What if you could achieve high performance and efficiency without the need for Spark?
At S&P Global Commodity Insights, having a complete view of global energy and commodities markets enables customers to make data-driven decisions with confidence and create long-term, sustainable value. 🌍
Explore delta-rs + CDC and how these open-source innovations power lightweight, high-performance data applications beyond Spark! 🚀
Linux Support for SMARC: How Toradex Empowers Embedded Developers (Toradex)
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Generative Artificial Intelligence (GenAI) in Business (Dr. Tathagat Varma)
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around adoption of GenAI in business - benefits, opportunities and limitations. I also discussed how my research on Theory of Cognitive Chasms helps address some of these issues
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx (shyamraj55)
We’re bringing the TDX energy to our community with 2 power-packed sessions:
🛠️ Workshop: MuleSoft for Agentforce
Explore the new version of our hands-on workshop featuring the latest Topic Center and API Catalog updates.
📄 Talk: Power Up Document Processing
Dive into smart automation with MuleSoft IDP, NLP, and Einstein AI for intelligent document workflows.
How Can I use the AI Hype in my Business Context? (Daniel Lehner)
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
What is Model Context Protocol (MCP) - The new technology for communication bw... (Vishnu Singh Chundawat)
The MCP (Model Context Protocol) is a framework designed to manage context and interaction within complex systems. This SlideShare presentation will provide a detailed overview of the MCP Model, its applications, and how it plays a crucial role in improving communication and decision-making in distributed systems. We will explore the key concepts behind the protocol, including the importance of context, data management, and how this model enhances system adaptability and responsiveness. Ideal for software developers, system architects, and IT professionals, this presentation will offer valuable insights into how the MCP Model can streamline workflows, improve efficiency, and create more intuitive systems for a wide range of use cases.
The Evolution of Meme Coins: A New Era for Digital Currency ppt.pdf (Abi john)
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I... (Impelsys Inc.)
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
2. Traditional BI
[Diagram: data flows from a data source into selected data mart(s); the rest of the data goes to tape or trash, leaving a trail of unanswerable questions ("?").]
3. Big Data Architecture
[Diagram: a data source feeds data lake(s), which in turn feed data mart(s), a data warehouse, and ad-hoc extracts.]
4. Pentaho Data Integration
[Diagram: Pentaho Data Integration is used to design, deploy, and orchestrate data movement between Hadoop and data marts, data warehouses, and analytical applications.]
5. [Diagram: the Big Data pyramid - raw data is loaded into files / HDFS (Hadoop), optimized in Hive and the data marts and data warehouse (RDBMS), and visualized by applications and systems in the web tier through reporting, dashboards, and analysis.]
6. [Diagram: the same pyramid by technology layer - HDFS and Hive within Hadoop at the base, a data mart in the RDBMS layer, and reporting / dashboards / analysis in the web tier at the top.]
7. Demo
8. Pentaho for Hadoop Announcements
• Pentaho for Hadoop Download Capability
• Includes support for development; production support will follow with GA
• Collaborative effort between Pentaho and the Pentaho Community
• 60+ beta sites over a three-month beta cycle
• Pentaho contributed code for API integration with HIVE to the open source Apache Foundation
• Pentaho and Cloudera Partnership
• Combines Pentaho's business intelligence and data integration capabilities with Cloudera's Distribution for Hadoop (CDH)
• Enables business users to take advantage of Hadoop with the ability to easily and cost-effectively mine, visualize and analyze their Hadoop data
9. Pentaho for Hadoop Announcements (cont)
• Pentaho and Impetus Technologies Partnership
• Incorporates Pentaho Agile BI and Pentaho BI Suite for Hadoop into the Impetus Large Data Analytics practice
• First major SI to adopt Pentaho for Hadoop
• Facilitates large data analytics projects including expert consulting services, best practices support in Hadoop implementations and nCluster, including deployment on private and public clouds
10. Pentaho for Hadoop Resources & Events
Resources
Download: www.pentaho.com/download/hadoop
Pentaho for Hadoop webpage - resources, press, events, partnerships and more: www.pentaho.com/hadoop
Big Data Analytics: 5-part video series with James Dixon, Pentaho CTO
Events
Hadoop World: NYC - Oct 12, Gold Sponsor, Exhibitor, Richard Daley presenting, ‘Putting Analytics in Big Data Analysis’
London Hadoop User Group - Oct 12, London
Agile BI Meets Big Data - Oct 13, New York City
11. Thank You.
Join the conversation. You can find us on:
Pentaho Facebook Group
@Pentaho
http://blog.pentaho.com
Pentaho - Open Source Business Intelligence Group
Editor's Notes
#3: In a traditional BI system where we have not been able to store all of the raw data, we have solved the problem by being selective.
Firstly we selected the attributes of the data that we knew we had questions about. Then we cleansed it and aggregated it to transaction levels or higher, and packaged it up in a form that is easy to consume. Then we put it into an expensive system that we could not scale, whether technically or financially. The rest of the data was thrown away or archived on tape, which, for the purposes of analysis, is the same as throwing it away.
TRANSITION
The problem is we don’t know what is in the data that we are throwing away or archiving. We can only answer the questions that we could predict ahead of time.
#4: When we look at the Big Data architecture we described before we recall that
* We want to store all of the data, so we can answer both known and unknown questions
* We want to satisfy our standard reporting and analysis requirements
* We want to satisfy ad-hoc needs by providing the ability to dip into the lake at any time to extract data
* We want to balance performance and cost as we scale
We need the ability to take the data in the Data Lake and easily convert it into data suitable for a data mart, data warehouse or ad-hoc data set - without requiring custom Java code
#5: Fortunately we have an embeddable data integration engine, written in Java
We have taken our Data Integration engine, PDI and integrated with Hadoop in a number of different areas:
* We have the ability to move files between Hadoop and external locations
* We have the ability to read and write to HDFS files during data transformations
* We have the ability to execute data transformations within the MapReduce engine (a hand-coded equivalent is sketched after this list for contrast)
* We have the ability to extract information from Hadoop and load it into external data bases and applications
* And we have the ability to orchestrate all of this so you can integrate Hadoop into the rest of your data architecture with scheduling, monitoring, logging etc
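For contrast with the visual approach described in these notes, below is a hand-coded sketch of the kind of MapReduce job PDI is meant to spare you from writing: counting hits per URL in a weblog. It assumes the Hadoop 2.x MapReduce API and a space-delimited log format in which field 7 holds the requested URL; both assumptions are illustrative, not from the presentation.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WeblogHitCount {
        public static class UrlMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text url = new Text();
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                // Emit (url, 1) per log line; field 7 is assumed to be the URL.
                String[] fields = line.toString().split(" ");
                if (fields.length > 6) {
                    url.set(fields[6]);
                    ctx.write(url, ONE);
                }
            }
        }
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text url, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get();
                ctx.write(url, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "weblog hit count");
            job.setJarByClass(WeblogHitCount.class);
            job.setMapperClass(UrlMapper.class);
            job.setCombinerClass(SumReducer.class);  // pre-aggregate on the map side
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }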
#6: Putting this into diagram form, so we can indicate the different layers in the architecture and also show the scale of the data, we get this Big Data pyramid.
* At the bottom of the pyramid we have Hadoop, containing our complete set of data.
* Higher up we have our data mart layer. This layer has less data in it, but has better performance.
* At the top we have application-level data caches.
* Looking down from the top, from the perspective of our users, they can see the whole pyramid - they have access to the whole structure. The only thing that varies is the query time, depending on what data they want.
* Here we see that the RDBMS layer lets us optimize access to the data. We can decide how much data we want to stage in this layer. If we add more storage in this layer, we can increase performance for a larger subset of the data lake, but it costs more money.
#7: In this demo we will show how easy it is to execute a series of Hadoop and non-Hadoop tasks; a hand-coded sketch of the HDFS and Hive steps appears after the step list. We are going to
TRANSITION 1
Get a weblog file from an FTP server
TRANSITION 2
Make sure the source file does not already exist within the Hadoop file system
TRANSITION 3
Copy the weblog file into Hadoop
TRANSITION 4
Read the weblog and process it - add metadata about the URLs, add geocoding, and enrich the operating system and browser attributes
TRANSITION 5
Write the results of the data transformation to a new, improved, data file
TRANSITION 6
Load the data into Hive
TRANSITION 7
Read an aggregated data set from Hadoop
TRANSITION 8
And write it into a database
TRANSITION 9
Slice and dice the data with the database
TRANSITION 10
And execute an ad-hoc query against Hadoop
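A hand-coded sketch of the HDFS and Hive steps from the demo above (checking for and copying the file into Hadoop, then loading it into Hive), assuming the Hadoop FileSystem API and the early HiveServer JDBC driver; all paths and the weblogs table name are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadWeblogIntoHive {
        public static void main(String[] args) throws Exception {
            // Copy the fetched, enriched weblog file into HDFS.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path src = new Path("/tmp/weblog_enriched.txt");
            Path dst = new Path("/user/pentaho/weblogs/weblog_enriched.txt");
            if (fs.exists(dst)) {
                fs.delete(dst, false); // mirror the "make sure it does not exist" step
            }
            fs.copyFromLocalFile(src, dst);

            // Register the file's contents with Hive so it can be queried.
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = conn.createStatement();
            stmt.execute("LOAD DATA INPATH '/user/pentaho/weblogs/weblog_enriched.txt' "
                       + "OVERWRITE INTO TABLE weblogs");
            conn.close();
        }
    }

In the presentation these steps are individual job entries in Pentaho Data Integration, chained together visually rather than written by hand.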