BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI - Mark Rittman
The document discusses Oracle's Big Data SQL, which brings Oracle SQL capabilities to Hadoop data stored in Hive tables. It allows querying Hive data using standard SQL from Oracle Database and viewing Hive metadata in Oracle data dictionary tables. Big Data SQL leverages the Hive metastore and uses direct reads and SmartScan to optimize queries against HDFS and Hive data. This provides a unified SQL interface and optimized query processing for both Oracle and Hadoop data.
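The SmartScan idea the summary mentions can be pictured with a minimal Python sketch. This is purely illustrative (invented data and function names, not Oracle code): the point is that filters and column projection are pushed down to the storage layer, so only matching rows and needed columns travel back to the database tier.

```python
# Illustrative sketch (not Oracle's implementation): SmartScan-style
# processing applies the WHERE predicate and column projection at the
# storage layer, minimising the data shipped to the database tier.

def smart_scan(rows, predicate, columns):
    """Filter and project at the 'storage cell' before returning data."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

# Hypothetical Hive table contents
hive_rows = [
    {"id": 1, "country": "UK", "clicks": 120},
    {"id": 2, "country": "US", "clicks": 45},
    {"id": 3, "country": "UK", "clicks": 7},
]

# Only UK rows, and only the columns the query needs, leave the storage layer.
result = smart_scan(hive_rows, lambda r: r["country"] == "UK", ["id", "clicks"])
print(result)  # [{'id': 1, 'clicks': 120}, {'id': 3, 'clicks': 7}]
```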
Part 4 - Hadoop Data Output and Reporting using OBIEE11g - Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
Once insights and analysis have been produced within your Hadoop cluster by analysts and technical staff, it’s usually the case that you want to share the output with a wider audience in the organisation. Oracle Business Intelligence has connectivity to Hadoop through Apache Hive compatibility, and other Oracle tools such as Oracle Big Data Discovery and Big Data SQL can be used to visualise and publish Hadoop data. In this final session we’ll look at what’s involved in connecting these tools to your Hadoop environment, and also consider where data is optimally located when large amounts of Hadoop data need to be analysed alongside more traditional data warehouse datasets.
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ... - Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014
In this presentation we cover some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle’s products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI).
End-to-end Hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl... - Mark Rittman
This document discusses an end-to-end example of using Hadoop, OBIEE, ODI and Oracle Big Data Discovery to analyze big data from various sources. It describes ingesting website log data and Twitter data into a Hadoop cluster, processing and transforming the data using tools like Hive and Spark, and using the results for reporting in OBIEE and data discovery in Oracle Big Data Discovery. ODI is used to automate the data integration process.
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c - Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we’ll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We’ll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop’s schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a “data factory” to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data.
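The schema-on-read approach described above can be sketched in a few lines of Python. This is an illustrative toy (the log format and field names are invented): the raw files stay untouched on disk, and structure is imposed only at query time, which is what makes Hadoop cheap for landing low-value, low-density raw data.

```python
# Illustrative sketch: schema-on-read keeps raw data as-is and applies a
# schema only when the data is queried, unlike schema-on-write where data
# is parsed and validated at load time. Log format here is invented.

raw_log = [
    "2014-10-06 12:01:33 /index.html 200",
    "2014-10-06 12:01:35 /about.html 404",
]

def read_with_schema(lines):
    """Apply a schema at query time; the files on disk stay untouched."""
    out = []
    for line in lines:
        date, time, path, status = line.split()
        out.append({"date": date, "time": time, "path": path,
                    "status": int(status)})
    return out

# "Query": find error responses, parsing the raw lines on the fly.
errors = [r for r in read_with_schema(raw_log) if r["status"] >= 400]
print(errors)
```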
ODI12c as your Big Data Integration Hub - Mark Rittman
Presentation from the recent Oracle OTN Virtual Technology Summit, on using Oracle Data Integrator 12c to ingest, transform and process data on a Hadoop cluster.
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13 - Mark Rittman
The latest releases of OBIEE and ODI come with the ability to connect to Hadoop data sources, using MapReduce to integrate data from clusters of "big data" servers complementing traditional BI data sources. In this presentation, we will look at how these two tools connect to Apache Hadoop and access "big data" sources, and share tips and tricks on making it all work smoothly.
What is Big Data Discovery, and how it complements traditional business anal... - Mark Rittman
Data Discovery is an analysis technique that complements traditional business analytics, and enables users to combine, explore and analyse disparate datasets to spot opportunities and patterns that lie hidden within your data. Oracle Big Data Discovery takes this idea and applies it to your unstructured and big data datasets, giving users a way to catalogue, join and then analyse all types of data across your organization.
In this session we'll look at Oracle Big Data Discovery and how it provides a "visual face" to your big data initiatives, and how it complements and extends the work that you currently do using business analytics tools.
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar... - Mark Rittman
Presentation from the Rittman Mead BI Forum 2015 masterclass, pt.2 of a two-part session that also covered creating the Discovery Lab. Goes through setting up Flume log and Twitter feeds into CDH5 Hadoop using the ODI12c Advanced Big Data Option, then looks at the use of OBIEE11g with Hive, Impala and Big Data SQL, before finally using Oracle Big Data Discovery for faceted search and data mashup on top of Hadoop.
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors - Mark Rittman
- The document discusses Oracle tools for extracting, transforming, and loading (ETL) big data from Hadoop into Oracle databases, including Oracle Data Integrator 12c, Oracle Loader for Hadoop, and Oracle Direct Connector for HDFS.
- It provides an overview of using Hadoop for ETL tasks like data loading, processing, and exporting data to structured databases, as well as tools like Hive, Pig, and Spark for these functions.
- Key benefits of the Oracle Hadoop connectors include pushing data transformations to Hadoop clusters for scale and leveraging SQL interfaces to access Hadoop data for business intelligence.
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree... - Mark Rittman
This document summarizes a presentation about adding a Hadoop-based data reservoir to an Oracle data warehouse. The presentation discusses using a data reservoir to store large amounts of raw customer data from various sources to enable 360-degree customer analysis. It describes loading and integrating the data reservoir with the data warehouse using Oracle tools and how organizations can use it for more personalized customer marketing through advanced analytics and machine learning.
Innovation in the Data Warehouse - StampedeCon 2016 - StampedeCon
Enterprise Holdings first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points necessary for your own cluster, for example: cloud vs on premises, physical vs virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned about which pieces of our architecture worked well, and rant about those which didn’t. No deep Hadoop knowledge is necessary; the talk is suitable for architects and executives.
Unlock the value in your big data reservoir using oracle big data discovery a... - Mark Rittman
The document discusses Oracle Big Data Discovery and how it can be used to analyze and gain insights from data stored in a Hadoop data reservoir. It provides an example scenario where Big Data Discovery is used to analyze website logs, tweets, and website posts and comments to understand popular content and influencers for a company. The data is ingested into the Big Data Discovery tool, which automatically enriches the data. Users can then explore the data, apply additional transformations, and visualize relationships to gain insights.
The document discusses the Lambda architecture, which combines batch and stream processing. It provides an example implementation using Hadoop, Kafka, Storm and other tools. The Lambda architecture handles batch loading and querying of large datasets as well as real-time processing of data streams. It also discusses using YARN and Spark for distributed processing and refreshing enrichments.
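The batch/speed split at the heart of the Lambda architecture can be reduced to a tiny Python sketch. Everything here is invented for illustration (view names, counts): a complete-but-stale batch view is merged at query time with a small, fresh speed-layer view built from the stream.

```python
# Illustrative sketch: a Lambda architecture serves queries by merging a
# batch view (complete but recomputed infrequently) with a speed-layer
# view (incremental, covering only recent data). Counts are invented.

batch_view = {"page_a": 1000, "page_b": 500}   # recomputed nightly
speed_view = {"page_a": 12, "page_c": 3}       # today's stream so far

def query(key):
    """Serve a metric by combining the batch and real-time layers."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

print(query("page_a"))  # 1012 = 1000 batch + 12 streamed today
print(query("page_c"))  # 3    = seen only in the speed layer so far
```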
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... - StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight - Steven Totman
Demand for quicker access to multiple integrated sources of data continues to rise. Immediate access to data stored in a variety of systems - such as mainframes, data warehouses, and data marts - to mine visually for business intelligence is the competitive differentiation enterprises need to win in today’s economy.
Stop playing the waiting game and learn about a new end-to-end solution for combining, analyzing, and visualizing data from practically any source in your enterprise environment.
Leading organizations are already taking advantage of this architectural innovation to gain modern insights while reducing costs and propelling their businesses ahead of the competition.
Are you tired of waiting? Don't let your architecture hold you back. Access this webinar and hear from a team of industry experts on how you can Break the Barriers to Big Data Insight.
Introduction to Kudu - StampedeCon 2016 - StampedeCon
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
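The storage trade-off the Kudu abstract describes can be made concrete with a small Python sketch. Data and names are invented and this only models the access patterns, not any real storage engine: a columnar layout makes full-column scans cheap, while a row layout makes fetching a whole record by key cheap.

```python
# Illustrative sketch of the row-store vs column-store trade-off: columnar
# layouts (Parquet-like) scan a single column fast; row layouts
# (HBase-like) fetch one whole record by key fast. Data is invented.

rows = [{"id": i, "name": f"user{i}", "amount": i * 10} for i in range(5)]

# Columnar layout: each column stored contiguously -> fast full-column scans.
columns = {k: [r[k] for r in rows] for k in rows[0]}
total = sum(columns["amount"])           # touches only the 'amount' column

# Row layout: fast point lookup of an entire record by key.
by_id = {r["id"]: r for r in rows}
record = by_id[3]                        # touches one row, all its columns

print(total, record["name"])  # 100 user3
```

Kudu's pitch, per the abstract, is to sit between these two extremes: fast scans and fast random access behind a single API.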
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight - Precisely
The document discusses moving legacy data and workloads from traditional data warehouses to Hadoop. It describes how ELT processes on dormant data waste resources and how offloading this data to Hadoop can optimize costs and performance. The presentation includes a demonstration of using Tableau for self-service analytics on data in Hadoop and a case study of a financial organization reducing ELT development time from weeks to hours by offloading mainframe data to Hadoop.
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14) - Mark Rittman
Presentation from ODTUG KScope'14, Seattle, on using TimesTen as a standalone analytic database, and going beyond the use of the Exalytics Summary Advisor.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight - Cloudera, Inc.
Rethink data management and learn how to break down barriers to Big Data insight with Cloudera's enterprise data hub (EDH), Syncsort offload solutions, and Tableau Software visualization and analytics.
Format Wars: from VHS and Beta to Avro and Parquet - DataWorks Summit
The document discusses different data storage formats such as text, Avro, Parquet, and their suitability for writing and reading data. It provides examples of how to choose a format based on factors like query needs, data types, and whether schemas need to evolve. The document also demonstrates how Avro can handle schema evolution by adding or changing fields while still reading existing data.
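The schema-evolution behaviour attributed to Avro above can be sketched without any Avro library. This toy (invented record and field names) mimics Avro's resolution rule for an added field: when the reader schema has a field the writer schema lacked, the field's default value fills the gap, so existing data stays readable.

```python
# Illustrative sketch (no Avro library): Avro resolves an old record
# against a newer reader schema by filling added fields from their
# declared defaults -- this is why data written with the old schema
# remains readable after the schema evolves.

writer_record = {"id": 7, "name": "ida"}            # written with v1 schema

# v2 reader schema: (field, default); 'country' was added with a default.
reader_schema = [("id", None), ("name", None), ("country", "unknown")]

def resolve(record, schema):
    """Project a record onto the reader schema, applying defaults."""
    return {field: record.get(field, default) for field, default in schema}

print(resolve(writer_record, reader_schema))
# {'id': 7, 'name': 'ida', 'country': 'unknown'}
```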
This document discusses strategies for filling a data lake by improving the process of data onboarding. It advocates using a template-based approach to streamline data ingestion from various sources and reduce dependence on hardcoded procedures. The key aspects are managing ELT templates and metadata through automated metadata extraction. This allows generating integration jobs dynamically based on metadata passed at runtime, providing flexibility to handle different source data with one template. It emphasizes reducing the risks associated with large data onboarding projects by maintaining a standardized and organized data lake.
The document discusses Ivan Zoratti's presentation on using MySQL for big data. It defines big data and how it can be structured as either unstructured or structured data. It then outlines various technologies that can be used with MySQL like storage engines, partitioning, columnar databases, and the MariaDB optimizer. The presentation provides an overview of how these technologies can help manage large and complex data sets with MySQL.
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T... - VMware Tanzu
Pivotal HAWQ, one of the world’s most advanced enterprise SQL-on-Hadoop technologies, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your analytic efforts. The slides from this technical webinar present a deep dive on this powerful modern data architecture for analytics and data science.
Learn more here: http://pivotal.io/big-data/pivotal-hawq
Integrated Data Warehouse with Hadoop and Oracle Database - Gwen (Chen) Shapira
This document discusses building an integrated data warehouse with Oracle Database and Hadoop. It provides an overview of big data and why data warehouses need Hadoop. It also gives examples of how Hadoop can be integrated into a data warehouse, including using Sqoop to import and export data between Hadoop and Oracle. Finally, it discusses best practices for using Hadoop efficiently and avoiding common pitfalls when integrating Hadoop with a data warehouse.
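Sqoop, mentioned above for moving data between Hadoop and Oracle, parallelises an import by splitting the range of a numeric key across mappers, each of which issues its own bounded query. The sketch below is a simplification of that idea in Python, not Sqoop's actual code; the table and bounds are invented.

```python
# Illustrative sketch of Sqoop-style parallel import: divide the min/max
# range of a split column into contiguous chunks, one per mapper, so each
# mapper can run e.g. "... WHERE id BETWEEN start AND end" independently.

def split_boundaries(lo, hi, num_mappers):
    """Divide [lo, hi] into contiguous, non-overlapping id ranges."""
    step = (hi - lo + 1) // num_mappers
    splits = []
    for i in range(num_mappers):
        start = lo + i * step
        # Last mapper absorbs any remainder so the full range is covered.
        end = hi if i == num_mappers - 1 else start + step - 1
        splits.append((start, end))
    return splits

# Four mappers importing ids 1..1000 from a hypothetical Oracle table.
print(split_boundaries(1, 1000, 4))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]
```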
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh... - Mark Rittman
As presented at OGh SQL Celebration Day in June 2016, NL. Covers new features in Big Data SQL including storage indexes, storage handlers and ability to install + license on commodity hardware
Oracle Cloud: Big Data Use Cases and Architecture - Riccardo Romani
Oracle Italy Systems Presales Team presents: Big Data in any flavor, on-prem, public cloud and cloud at customer.
Presentation done at Digital Transformation event - February 2017
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics - Mark Rittman
This is a session for Oracle DBAs and developers that looks at cutting-edge big data technologies such as Spark and Kafka, and through demos shows how Hadoop is now a real-time platform for fast analytics, data integration and predictive modeling.
Fulfilling Real-Time Analytics on Oracle BI Applications Platform - Perficient, Inc.
Shiv Bharti is the Practice Director of Perficient's National Oracle Business Intelligence Practice. He has over 15 years of experience implementing Oracle BI solutions. Perficient is an Oracle Platinum Partner that has completed over 400 Oracle BI projects. The presentation discusses Oracle BI Applications, real-time BI, metadata modeling steps for real-time analytics using OBIEE, and a customer case study where Perficient implemented Oracle BI Applications for a large manufacturing company.
Oracle BI Hybrid BI: Mode 1 + Mode 2, Cloud + On-Premise Business Analytics - Mark Rittman
Mark Rittman, founder of Rittman Mead, discusses Oracle's approach to hybrid BI deployments and how it aligns with Gartner's vision of a modern BI platform. He explains how Oracle BI 12c supports both traditional top-down modeling and bottom-up data discovery. It also enables deploying components on-premises or in the cloud for flexibility. Rittman believes the future is bi-modal, with IT enabling self-service analytics alongside centralized governance.
The document discusses how utilities can embrace data-driven business models to compete in a changing landscape. It presents examples of data-driven companies like Google's Project SunRoof and Tesla Solar Roof. Oracle argues that it can help utilities innovate by providing solutions that work with utilities' existing network assets and business processes. Oracle's cross-competence organization combines customer experience, technologies, artificial intelligence, and big data. Examples are provided of potential data-driven concepts using Oracle solutions like predictive maintenance apps, augmented customer experiences, and digital finance tools.
Deploying OBIEE in the Cloud - Oracle Openworld 2014Mark Rittman
Introduction to Oracle BI Cloud Service (BICS) including administration, data upload, creating the repository and creating dashboards and reports. Also includes a short case-study around Salesforce.com reporting created for the BICS beta program.
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...Mark Rittman
Short presentation at the Oracle France BI Customer event in Paris, October 2013, on the advantage of running Endeca Information Discovery on Oracle Exalytics In-Memory Machine.
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...Jérôme Françoisse
This document discusses migrating from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI). It describes the similarities and differences between OWB and ODI, the migration utility for converting OWB objects to ODI, the steps for performing a migration, and issues that may be encountered such as mappings that do not migrate or require changes. It also covers OWB and ODI architecture and using the ODI scheduler and monitoring tools.
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
This document discusses using Hadoop and Hive for ETL work. It provides an overview of using Hadoop for distributed processing and storage of large datasets. It describes how Hive provides a SQL interface for querying data stored in Hadoop and how various Apache tools can be used to load, transform and store data in Hadoop. Examples of using Hive to view table metadata and run queries are also presented.
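The Hive pattern described above, SQL over files already sitting in Hadoop, can be illustrated with a small generator for the kind of HiveQL involved. A hedged sketch: the table name, columns and HDFS path are made-up examples, not content from the talk.

```python
# Sketch: building the HiveQL that declares a Hive table over an existing
# HDFS directory, after which the data is queryable with ordinary SQL.
# Table, columns and location below are invented for illustration.

def hive_external_table_ddl(table, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement over an HDFS directory."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}';"
    )

ddl = hive_external_table_ddl(
    "web_logs",
    [("host", "STRING"), ("request", "STRING"), ("status", "INT")],
    "/user/etl/web_logs")
print(ddl)
# A typical follow-up query is then plain SQL, e.g.:
#   SELECT status, COUNT(*) FROM web_logs GROUP BY status;
```

Because the table is EXTERNAL, dropping it in Hive removes only the metadata; the files in HDFS are left in place.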
2-in-1 : RPD Magic and Hyperion Planning "Adapter"Gianni Ceresa
The best way to explain the powerful modeling capabilities of OBIEE is with a real use case, and for this session it will be the relational database of Hyperion Planning applications, a model designed not for reporting but for application needs. We use the advanced modeling capabilities of OBIEE to transform the relational schema of Hyperion Planning with what we call some "RPD magic." The goal is to fill the gap when reporting against Planning using only Essbase, where the content of the relational database is missing, ending with a federation of Essbase and the relational source without ETL, using just OBIEE and some magic.
As 11.1.1.9, with a native Planning driver, was released just a few weeks before the presentation, the session was updated to cover this new driver and then completed with some proper RPD magic to still keep the dual content.
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsMichael Rainey
Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.
Presented at the Oracle Technology Network Virtual Technology Summit February/March 2015.
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
Slides from a two-day OBIEE11g seminar in Dubai, February 2015, at the Oracle University Expert Summit. Covers the following topics:
1. OBIEE 11g Overview & New Features
2. Adding Exalytics and In-Memory Analytics to OBIEE 11g
3. Source Control and Concurrent Development for OBIEE
4. No Silver Bullets - OBIEE 11g Performance in the Real World
5. Oracle BI Cloud Service Overview, Tips and Techniques
6. Moving to Oracle BI Applications 11g + ODI
7. Oracle Essbase and Oracle BI EE 11g Integration Tips and Techniques
8. OBIEE 11g and Predictive Analytics, Hadoop & Big Data
Oracle Warehouse Builder (OWB) and Oracle Data Integrator (ODI) are both Oracle products used for data integration. ODI is the strategic tool Oracle has chosen for the future, so no further versions of OWB will be released. The question, then, is: how can OWB developers make the switch to ODI?
This talk aims at introducing this product with a particular focus for Oracle Warehouse Builder developers. It covers key aspects of the product while similarities and differences with its predecessor are highlighted. The big question is of course covered : How to migrate from Oracle Warehouse Builder to Oracle Data Integrator?
After this discovery, the OWB developer can serenely start their journey.
The document discusses Endeca, a product for delivering search and navigation capabilities. It provides an overview of Endeca 3.0's components, including the Endeca Server, Integrator Server, and Portal. It also covers how Endeca integrates with other technologies like Oracle BI Publisher, and how to configure Endeca, map data, implement security features, and optimize installations.
Ougn2013 high speed, in-memory big data analysis with oracle exalyticsMark Rittman
The document discusses Oracle Exalytics, a platform for high speed, big data analysis. Exalytics combines Oracle Business Intelligence software with specialized hardware to enable high-density visualization of large datasets and support of many concurrent users. It also integrates Oracle Essbase, Endeca, and Hadoop to provide additional analytic capabilities for both structured and unstructured data.
Mark Rittman is an Oracle ACE Director and co-founder of Rittman Mead, a specialist Oracle BI consulting firm. He has over 15 years of experience with Oracle technologies including BI, OLAP, and the Oracle database. He is a regular speaker at Oracle OpenWorld and columnist for Oracle Magazine. He has authored two books on Oracle BI through Oracle Press. This document provides an introduction and overview of Oracle Business Intelligence including its semantic business model, interactive dashboards, and integration capabilities. Demonstrations are shown of the semantic model, dashboard creation, and integration with Oracle Fusion Middleware.
GoldenGate and Oracle Data Integrator - A Perfect Match...Michael Rainey
Oracle Data Integrator and Oracle GoldenGate excel as standalone products, but paired together they are the perfect match for real-time data warehousing. Following Oracle’s Next Generation Reference Data Warehouse Architecture, this discussion will provide best practices on how to configure, implement, and process data in real-time using ODI and GoldenGate. Attendees will see common real-time challenges solved, including parent-child relationships within micro-batch ETL.
Presented at Rittman Mead BI Forum 2013 Masterclass.
This document summarizes a case study on using Exadata and Oracle Business Intelligence Enterprise Edition (OBIEE) for a large online retailer. The retailer needed to analyze large amounts of transaction data in real-time to optimize pricing, inventory, and customer satisfaction. They implemented Exadata to handle high data volumes and processing power, OBIEE for self-service analytics and dashboards, and an agile development approach to quickly deliver insights from data. This new system provided real-time access to data and a single consistent view of the business to help the retailer gain competitive advantages.
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)Mark Rittman
A presentation from ODTUG 2013 on tools other than OBIEE for Exalytics, focusing on analysis of non-traditional data via Endeca, "big data" via Hadoop and statistical analysis / predictive modeling through Oracle R Enterprise, and the benefits of running these tools on Oracle Exalytics
How to Integrate OBIEE and Essbase / EPM Suite (OOW 2012)Mark Rittman
Oracle plans to integrate Oracle Essbase and the EPM product suite with Oracle Business Intelligence Enterprise Edition and Oracle Fusion Middleware. So with the latest release of Oracle Business Intelligence Enterprise Edition, 11.1.1.6, how do you connect Oracle Business Intelligence Enterprise Edition to your Oracle Essbase databases and how well does it handle Oracle Essbase features such as scenario and account dimensions, changing outlines, and unbalanced/parent-child hierarchies? How well do Oracle Business Intelligence Enterprise Edition’s ad hoc reporting tools handle Oracle Essbase hierarchies and member selections in the 11.1.1.6 release? Can we still embed Oracle Business Intelligence Enterprise Edition dashboards in Oracle Workspaces? Learn the answers in this session.
Presentation by Mark Rittman, Technical Director, Rittman Mead, on ODI 11g features that support enterprise deployment and usage. Delivered at BIWA Summit 2013, January 2013.
KScope14 - Real-Time Data Warehouse Upgrade - Success StoriesMichael Rainey
Providing real-time data to its global customers is a necessity for IFPI (International Federation of the Phonographic Industry), a not-for-profit organization with a mission to safeguard the rights of record producers and promote the value of recorded music. Using Oracle Streams and Oracle Warehouse Builder (OWB) for real-time data replication and integration, meeting this goal was becoming a challenge. The solution was difficult to maintain and overall throughput was degrading as data volume increased. The need for greater stability and performance led IFPI to implement Oracle GoldenGate and Oracle Data Integrator. This session will describe the innovative approach taken to complete the migration from a Streams and OWB implementation to a more robust, maintainable, and performant GoldenGate and ODI integrated solution.
Oracle Exalytics - Tips and Experiences from the Field (Enkitec E4 Conference...Mark Rittman
Presentation by Rittman Mead's Mark Rittman and Stewart Bryson on our experiences 1-year on with Exalytics. Includes sections on aggregate caching and datamart loading into TT, use of Essbase as a TT alternative, and deployment patterns we see on client sites.
Inside Oracle Exalytics and Oracle TimesTen for Exalytics - Hotsos 2012Mark Rittman
Mark Rittman presented on Oracle Exalytics and Oracle TimesTen for Exalytics at the Hotsos Symposium 2012. He discussed (1) what Exalytics is as an in-memory appliance for Oracle Business Intelligence that combines specialized hardware and optimized software, (2) how it addresses performance issues for analytics workloads by caching data and aggregates in memory, and (3) its architecture which includes optimized versions of OBIEE and Essbase running on TimesTen for fast in-memory analytics.
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
The document discusses the future of analytics, data integration, and business intelligence (BI) on big data platforms like Hadoop. It covers how BI has evolved from old-school data warehousing to enterprise BI tools to utilizing big data platforms. New technologies like Impala, Kudu, and dataflow pipelines have made Hadoop fast and suitable for analytics. Machine learning can be used for automatic schema discovery. Emerging open-source BI tools and platforms, along with notebooks, bring new approaches to BI. Hadoop has become the default platform and future for analytics.
Using Oracle Big Data Discovey as a Data Scientist's ToolkitMark Rittman
As delivered at Trivadis Tech Event 2016 - how Big Data Discovery along with Python and pySpark was used to build predictive analytics models against wearables and smart home data
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we’ll look at the options that are available, compare database vendor solutions with their open-source alternative, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete “data fabric” solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Mark Rittman
The document discusses using Hadoop and NoSQL technologies like Apache HBase to perform social network analysis on Twitter data related to a company's website and blog. It describes ingesting tweet and website log data into Hadoop HDFS and processing it with tools like Hive. Graph algorithms from Oracle Big Data Spatial & Graph were then used on the property graph stored in HBase to identify influential Twitter users and communities. This approach provided real-time insights at scale compared to using a traditional relational database.
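The "identify influential Twitter users" idea above can be shown in miniature with a follower edge list and a simple in-degree count. This is a toy illustration only: Oracle Big Data Spatial & Graph runs far richer algorithms (such as PageRank) over HBase at scale, and the account names below are invented sample data.

```python
# Toy sketch of influence scoring on a follower graph: count incoming
# edges per account. Edge list and names are invented for illustration.
from collections import Counter

follows = [  # (follower, followed)
    ("ann", "rittmanmead"), ("bob", "rittmanmead"),
    ("cat", "rittmanmead"), ("ann", "bob"), ("cat", "ann"),
]

def in_degree(edges):
    """Count incoming edges per node: a crude influence score."""
    return Counter(target for _, target in edges)

scores = in_degree(follows)
print(scores.most_common(1))  # the most-followed account
```

In-degree is the simplest centrality measure; iterative algorithms like PageRank refine it by weighting a follower's own influence.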
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
Mark Rittman, CTO of Rittman Mead, gave a keynote presentation on big data for Oracle developers and DBAs with a focus on Apache Spark, real-time analytics, and predictive analytics. He discussed how Hadoop can provide flexible, cheap storage for logs, feeds, and social data. He also explained several Hadoop processing frameworks like Apache Spark, Apache Tez, Cloudera Impala, and Apache Drill that provide faster alternatives to traditional MapReduce processing.
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
Mark Rittman from Rittman Mead presented on Oracle Big Data Discovery. He discussed how many organizations are running big data initiatives involving loading large amounts of raw data into data lakes for analysis. Oracle Big Data Discovery provides a visual interface for exploring, analyzing, and transforming this raw data. It allows users to understand relationships in the data, perform enrichments, and prepare the data for use in tools like Oracle Business Intelligence.
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
Mark Rittman gave a presentation on the future of analytics on Oracle Big Data Appliance. He discussed how Hadoop has enabled highly scalable and affordable cluster computing using technologies like MapReduce, Hive, Impala, and Parquet. Rittman also talked about how these technologies have improved query performance and made Hadoop suitable for both batch and interactive/ad-hoc querying of large datasets.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on technical foundations around SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work along with more specialised underlying storage that each now work best with – and we’ll take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
This talk focuses on what a data reservoir is, how it relates to the RDBMS data warehouse, and how Big Data Discovery gives business and BI users access to it
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...Mark Rittman
OBIEE12c comes with an updated version of Essbase that, in this release, focuses entirely on the query acceleration use case. This presentation looks at the new release and explains how the new BI Accelerator Wizard manages the creation of Essbase cubes to accelerate OBIEE query performance
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
Mark Rittman presented on deploying full OBIEE systems to Oracle Cloud. This involves migrating the data warehouse to Oracle Database Cloud Service, updating the RPD to connect to the cloud database, and uploading the RPD to Oracle BI Cloud Service. Using the wider Oracle PaaS ecosystem allows hosting a full BI platform in the cloud.
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMaxim Salnikov
Imagine if apps could think, plan, and team up like humans. Welcome to the world of AI agents and agentic user interfaces (UI)! In this session, we'll explore how AI agents make decisions, collaborate with each other, and create more natural and powerful experiences for users.
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDinusha Kumarasiri
AI is transforming APIs, enabling smarter automation, enhanced decision-making, and seamless integrations. This presentation explores key design principles for AI-infused APIs on Azure, covering performance optimization, security best practices, scalability strategies, and responsible AI governance. Learn how to leverage Azure API Management, machine learning models, and cloud-native architectures to build robust, efficient, and intelligent API solutions
Not So Common Memory Leaks in Java WebinarTier1 app
This SlideShare presentation is from our May webinar, “Not So Common Memory Leaks & How to Fix Them?”, where we explored lesser-known memory leak patterns in Java applications. Unlike typical leaks, subtle issues such as thread local misuse, inner class references, uncached collections, and misbehaving frameworks often go undetected and gradually degrade performance. This deck provides in-depth insights into identifying these hidden leaks using advanced heap analysis and profiling techniques, along with real-world case studies and practical solutions. Ideal for developers and performance engineers aiming to deepen their understanding of Java memory management and improve application stability.
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...Egor Kaleynik
This case study explores how we partnered with a mid-sized U.S. healthcare SaaS provider to help them scale from a successful pilot phase to supporting over 10,000 users—while meeting strict HIPAA compliance requirements.
Faced with slow, manual testing cycles, frequent regression bugs, and looming audit risks, their growth was at risk. Their existing QA processes couldn’t keep up with the complexity of real-time biometric data handling, and earlier automation attempts had failed due to unreliable tools and fragmented workflows.
We stepped in to deliver a full QA and DevOps transformation. Our team replaced their fragile legacy tests with Testim’s self-healing automation, integrated Postman and OWASP ZAP into Jenkins pipelines for continuous API and security validation, and leveraged AWS Device Farm for real-device, region-specific compliance testing. Custom deployment scripts gave them control over rollouts without relying on heavy CI/CD infrastructure.
The result? Test cycle times were reduced from 3 days to just 8 hours, regression bugs dropped by 40%, and they passed their first HIPAA audit without issue—unlocking faster contract signings and enabling them to expand confidently. More than just a technical upgrade, this project embedded compliance into every phase of development, proving that SaaS providers in regulated industries can scale fast and stay secure.
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...Andre Hora
Unittest and pytest are the most popular testing frameworks in Python. Overall, pytest provides some advantages, including simpler assertion, reuse of fixtures, and interoperability. Due to such benefits, multiple projects in the Python ecosystem have migrated from unittest to pytest. To facilitate the migration, pytest can also run unittest tests, thus, the migration can happen gradually over time. However, the migration can be timeconsuming and take a long time to conclude. In this context, projects would benefit from automated solutions to support the migration process. In this paper, we propose TestMigrationsInPy, a dataset of test migrations from unittest to pytest. TestMigrationsInPy contains 923 real-world migrations performed by developers. Future research proposing novel solutions to migrate frameworks in Python can rely on TestMigrationsInPy as a ground truth. Moreover, as TestMigrationsInPy includes information about the migration type (e.g., changes in assertions or fixtures), our dataset enables novel solutions to be verified effectively, for instance, from simpler assertion migrations to more complex fixture migrations. TestMigrationsInPy is publicly available at: https://github.com/altinoalvesjunior/TestMigrationsInPy.
Exploring Wayland: A Modern Display Server for the FutureICS
Wayland is revolutionizing the way we interact with graphical interfaces, offering a modern alternative to the X Window System. In this webinar, we’ll delve into the architecture and benefits of Wayland, including its streamlined design, enhanced performance, and improved security features.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
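The instrumentation idea described above, monitoring execution and recording which calls raise, can be sketched with a simple wrapper. The function names and data here are invented for illustration; the study's actual tooling instruments whole test suites.

```python
# Sketch: wrap a function so every call records whether it raised an
# exception, giving the per-call "exceptional behaviour" frequency the
# study measures. parse_int and the inputs are invented examples.
import functools

def monitor_exceptions(fn, stats):
    """Wrap fn so each call increments counters for calls and raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        stats["calls"] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            stats["raised"] += 1
            raise
    return wrapper

def parse_int(s):
    return int(s)  # raises ValueError on non-numeric input

stats = {"calls": 0, "raised": 0}
parse_int = monitor_exceptions(parse_int, stats)

for value in ["1", "2", "oops", "4"]:
    try:
        parse_int(value)
    except ValueError:
        pass  # the exceptional path a good test suite should also exercise

print(stats)  # one raising call out of four
```

Aggregating such counters across a whole suite yields exactly the kind of ratio the paper reports (e.g. "1 in 10 calls exercise exceptional behaviors").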
Who Watches the Watchmen (SciFiDevCon 2025)Allon Mureinik
Tests, especially unit tests, are the developers’ superheroes. They allow us to mess around with our code and keep us safe.
We often trust them with the safety of our codebase, but how do we know that we should? How do we know that this trust is well-deserved?
Enter mutation testing – by intentionally injecting harmful mutations into our code and seeing if they are caught by the tests, we can evaluate the quality of the safety net they provide. By watching the watchmen, we can make sure our tests really protect us, and we aren’t just green-washing our IDEs to a false sense of security.
Talk from SciFiDevCon 2025
https://www.scifidevcon.com/courses/2025-scifidevcon/contents/680efa43ae4f5
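The mutation-testing idea from the talk can be hand-rolled in a few lines: inject a small harmful mutation and check that the test suite notices. Real tools (mutmut for Python, PIT for Java) automate this; the functions below are invented examples, not code from the presentation.

```python
# Sketch of mutation testing by hand: an original function, a mutant with
# one operator changed, and a tiny "suite" that should kill the mutant.
# All functions here are invented for illustration.

def is_adult(age):           # original code under test
    return age >= 18

def is_adult_mutant(age):    # mutation: >= changed to >
    return age > 18

def run_suite(fn):
    """A tiny 'test suite'; returns True if all checks pass."""
    return fn(18) is True and fn(17) is False and fn(30) is True

original_passes = run_suite(is_adult)           # suite is green
mutant_killed = not run_suite(is_adult_mutant)  # suite caught the mutation
print(original_passes, mutant_killed)
```

Note that the boundary check `fn(18)` is what kills this mutant; without it the suite would stay green for both versions, and the surviving mutant would expose the gap in coverage.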
Societal challenges of AI: biases, multilinguism and sustainabilityJordi Cabot
Towards a fairer, inclusive and sustainable AI that works for everybody.
Reviewing the state of the art on these challenges and what we're doing at LIST to test current LLMs and help you select the one that works best for you
Societal challenges of AI: biases, multilinguism and sustainabilityJordi Cabot
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
1. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected]
W : www.rittmanmead.com
Hadoop (BDA) and Oracle Technologies
on BI Projects
Mark Rittman, CTO, Rittman Mead
Dutch Oracle Users Group, Jan 14th 2015
2. About the Speaker
•Mark Rittman, Co-Founder of Rittman Mead
•Oracle ACE Director, specialising in Oracle BI&DW
•14 Years Experience with Oracle Technology
•Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
•Oracle Business Intelligence Developers Guide
•Oracle Exalytics Revealed
•Writer for Rittman Mead Blog : http://www.rittmanmead.com/blog
•Email : [email protected]
•Twitter : @markrittman
3. About Rittman Mead
•Oracle BI and DW Gold partner
•Winner of five UKOUG Partner of the Year awards in 2013 - including BI
•World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
•Approximately 80 consultants worldwide
•All expert in Oracle BI and DW
•Offices in US (Atlanta), Europe, Australia and India
•Skills in broad range of supporting Oracle tools:
‣OBIEE, OBIA
‣ODI EE
‣Essbase, Oracle OLAP
‣GoldenGate
‣Endeca
4. Agenda
•Part 1 : The Hadoop (BDA) technical stack for Oracle BI/DW projects
‣Why are Oracle BI/DW customers adopting Hadoop (BDA) technologies?
‣What are the Oracle and Cloudera products being used?
‣New Oracle products on the roadmap - Big Data Discovery, Big Data SQL futures
‣Where do OBIEE, ODI etc fit in with these new products?
‣Rittman Mead’s development platform
•Part 2 : Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences
‣What is Cloudera CDH, and the BDA, like to work with?
‣How do we approach projects and PoCs?
‣What architecture and approach do we actually take, now?
‣How well do OBIEE and ODI work with Hadoop and BDA?
‣What are the emerging techs, products and architectures we see for 2015+?
5. Part 1 : The Hadoop (BDA) technical stack for Oracle BI/DW projects
or … How did we get here?
6. 15+ Years in Oracle BI and Data Warehousing
•Started back in 1997 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
•Went on to use Oracle Developer/2000 and Designer/2000
•Our initial users queried the DW using SQL*Plus
•And later on, we rolled-out Discoverer/2000 to everyone else
•And life was fun…
7. The Oracle-Centric DW Architecture
•Over time, this data warehouse architecture developed
•Added Oracle Warehouse Builder to automate and model the DW build
•Oracle 9i Application Server (yay!) to deliver reports and web portals
•Data Mining and OLAP in the database
•Oracle 9i for in-database ETL (and RAC)
•Data was typically loaded from Oracle RDBMS and EBS
•It was turtles Oracle all the way down…
8. The State of the Art for BI & DW Was This..
•Oracle Discoverer “Drake” - Combining Relational and OLAP Analysis for Oracle RDBMS
•Oracle Portal, part of Oracle 9iAS
•Oracle Warehouse Builder 9iAS / “Paris”
9. Then Came Siebel Analytics … and OBIEE
10. The Oracle BI & DW World Changed
•Siebel Analytics replaced Oracle Discoverer
•Oracle Data Integrator replaced Oracle Warehouse Builder
•Hyperion Essbase replaced Oracle OLAP
•You were as likely to be loading from SQL Server as from Oracle
•They made us do things we didn’t like to do …
‣Add a mid-tier virtual DW engine on top of the database
‣Export data out of Oracle into an OLAP server
‣Improve query performance using tools outside of the Oracle data warehouse
‣It was all a bit scary…
‣… Not to mention that WebLogic stuff
11. Introducing - The Oracle Reference DW Architecture
•Recognizing the difference between long-term storage of DW data (the “foundation” layer)
•And organizing the data for queries and easy navigation (the “access + performance layer”)
•Also recognising where OBIEE had been game-changing - federated queries
•… Things are good again
12. and now … this happened
14. Today’s Oracle Information Management Architecture
[Architecture diagram: an Event Engine, Data Reservoir, Data Factory, Enterprise Information Store, Reporting and a Discovery Lab. Input events plus structured enterprise data and other data flow in; actionable events, actionable information and actionable insights flow out across the execution, innovation and discovery streams.]
15. Today’s Layered Data Warehouse Architecture
[Architecture diagram, summarised:]
•Data Sources - structured sources (operational data, COTS data, master & reference data, streaming & BAM) plus data engines & poly-structured sources (content, docs, web & social media, SMS)
•Raw Data Reservoir - immutable raw data reservoir; raw data at rest is not interpreted
•Foundation Data Layer - immutable modelled data in a business-process-neutral form, abstracted from business process changes
•Access & Performance Layer - past, current and future interpretation of enterprise data, structured to support agile access & navigation
•Discovery Lab Sandboxes - project-based data stores to support specific discovery objectives
•Rapid Development Sandboxes - project-based data stores to facilitate rapid content / presentation delivery
•Cross-cutting capabilities - data ingestion, information interpretation, information services, virtualization & query federation, enterprise performance management, pre-built & ad-hoc BI assets
16. The Oracle Data Warehousing Platform - 2014
17. Introducing … The “Data Reservoir”?
•A reservoir is a lake that can also process and refine (your data)
•Wide-ranging source of low-density, lower-value data to complement the DW
18. Oracle’s Big Data Products
•Oracle Big Data Appliance
‣Optimized hardware for Hadoop processing
‣Cloudera Distribution incl. Hadoop
‣Oracle Big Data Connectors, ODI etc
•Oracle Big Data Connectors
•Oracle Big Data SQL
•Oracle NoSQL Database
•Oracle Data Integrator
•Oracle R Distribution
•OBIEE, BI Publisher and Endeca Info Discovery
19. Just Released - Oracle Big Data SQL
•Part of Oracle Big Data 4.0 (BDA-only)
‣Also requires Oracle Database 12c, Oracle Exadata Database Machine
•… More on this later
[Diagram: SQL queries run against the Exadata Database Server; Oracle Big Data SQL extends SmartScan offload from the Exadata Storage Servers to the Hadoop cluster.]
20. Coming Soon : Oracle Big Data Discovery
•Combining of Endeca Server search, analysis and visualisation capabilities with Apache Spark data munging and transformation
‣Analyse, parse, explore and “wrangle” data using graphical tools and a Spark-based transformation engine
‣Create a catalog of the data on your Hadoop cluster, then search that catalog using Endeca Server
‣Create recommendations of other datasets, based on what you’re looking at now
‣Visualize your datasets, discover new insights
21. Coming Soon : Oracle Data Enrichment Cloud Service
•Cloud-based service for loading, enriching, cleansing and supplementing Hadoop data
•Part of the Oracle Data Integration product family
•Used up-stream from Big Data Discovery
•Aims to solve the “data quality problem” for Hadoop
22. Combining Oracle RDBMS with Hadoop + NoSQL
•High-value, high-density data goes into Oracle RDBMS
•Better support for fast queries, summaries, referential integrity etc
•Lower-value, lower-density data goes into Hadoop + NoSQL
‣Also provides flexible schema, more agile development
•Successful next-generation BI+DW projects combine both - neither on their own is sufficient
23. Productising the Next-Generation IM Architecture
24. Still a Key Role for Data Integration, and BI Tools
•Fast, scaleable low-cost / flexible-schema data capture using Hadoop + NoSQL (BDA)
•Long-term storage of the most important downstream data - Oracle RDBMS (Exadata)
•Fast analysis + business-friendly interface : OBIEE, Endeca (Exalytics), RTD etc
25. OBIEE for Enterprise Analysis Across all Data Sources
•Dashboards, analyses, OLAP analytics, scorecards, published reporting, mobile
•Presented as an integrated business semantic model
•Optional mid-tier query acceleration using Oracle Exalytics In-Memory Machine
•Access data from RDBMS, applications, Hadoop, OLAP, ADF BCs etc
[Diagram: application sources, Hadoop / NoSQL sources and DW / OLAP sources feed an in-memory caching layer and the enterprise semantic business model, surfaced through the business presentation layer (reports, dashboards).]
26. Bringing it All Together : Oracle Data Integrator 12c
•ODI provides an excellent framework for running Hadoop ETL jobs
‣ELT approach pushes transformations down to Hadoop - leveraging power of cluster
•Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation
‣Whilst still preserving RDBMS push-down
‣Extensible to cover Pig, Spark etc
•Process orchestration
•Data quality / error handling
•Metadata and model-driven
27. Oracle’s Product Strategy
28. Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences
•Working with (Cloudera) Hadoop, plus Hive, NoSQL etc
•Working with the Oracle Big Data Appliance
•Typical Hadoop + BI Use-Cases
•How Rittman Mead approaches Hadoop + Oracle BI projects
•Hadoop things that keep the CIO awake at night…
•ODI and Hadoop
•OBIEE and Hadoop
•Oracle Big Data SQL
•Futures - Apache Spark, Next-Generation Hive, Big Data Discovery
29. Why is Hadoop of Interest to Us?
•Gives us an ability to store more data, at more detail, for longer
•Provides a cost-effective way to analyse vast amounts of data
•Hadoop & NoSQL technologies can give us “schema-on-read” capabilities
•There’s vast amounts of innovation in this area we can harness
•And it’s very complementary to Oracle BI & DW
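The “schema-on-read” point above is easiest to see in code. A minimal sketch in Python (the event fields and sample records are invented for illustration): raw lines are stored untouched, and structure is only imposed when a query runs:

```python
import json
from collections import Counter

# Raw events land in the reservoir as-is - no schema is enforced at load time
raw_events = [
    '{"user": "anna", "action": "view", "page": "/home"}',
    '{"user": "ben", "action": "click", "page": "/products"}',
    'corrupt line - kept on load, skipped on read',
]

def read_with_schema(lines):
    """Impose structure at read time; tolerate records that don't fit."""
    for line in lines:
        try:
            yield json.loads(line)
        except ValueError:
            continue  # schema-on-read: bad records surface at query time, not load time

# A "query" over the parseable events
action_counts = Counter(event["action"] for event in read_with_schema(raw_events))
print(action_counts)  # view: 1, click: 1
```

Contrast with a relational load, where the corrupt line would have to be rejected or fixed before it could be stored at all.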
30. Oracle & Hadoop Use-Cases
•Use Hadoop as a low-cost, horizontally-scalable DW archive
•Use Hadoop, Hive and MapReduce for low-cost ETL staging
•Support standalone-Hadoop / Spark analysis with Oracle reference data
•Extend the DW with new data sources, datatypes, detail-level data
31. The Killer, Tech-Focused Use Case : Data Reservoir
•A reservoir is a lake that can also process and refine (your data)
•Wide-ranging source of low-density, lower-value data to complement the DW
32. Typical Business Use Case : 360 Degree View of Cust / Process
•OLTP transactional data tells us what happened (in the past), but not “why”
•Common customer requirement now is to get a “360 degree view” of their activity
‣Understand what’s being said about them
‣External drivers for interest, activity
‣Understand more about customer intent, opinions
•One example is to add details of social media mentions, likes, tweets and retweets etc to the transactional dataset
‣Correlate twitter activity with sales increases, drops
‣Measure impact of social media strategy
‣Gather and include textual, sentiment, contextual data from surveys, media etc
33. Initial PoC over 4-6 Weeks
•Focus on high-productivity data analyst tools to identify key data, insights
•Typically performed using R, CDH on VMs, lots of scripting, lots of client interaction
•Focus on the “discovery” phase
‣Governance, dashboards, productionizing can come later
34. Discovery vs. Exploitation Project Phases
•Discovery and monetising steps in Big Data projects have different requirements
•Discovery phase
‣Unbounded discovery
‣Self-Service sandbox
‣Wide toolset
•Promotion to Exploitation
‣Commercial exploitation
‣Narrower toolset
‣Integration to operations
‣Non-functional requirements
‣Code standardisation & governance
35. Rittman Mead Development Lab
[Lab topology diagram: four VMware ESXi 5.5 hosts (vmhost2-5, each Core i7 4x3.6GHz, 64GB RAM, 6TB disk) running a 5-node BDP 2.2 cluster, a 5-node CDH 5.3 cluster, a Kerberos-secured 6-node CDH 5.2 cluster (16-32GB RAM per node), plus Oracle RDBMS, OBIEE 11g, ODI12c, BI Apps 11g and KDC/LDAP VMs; an iSCSI LUN shared VMFS cluster filesystem on a 6TB Synology DS414 NAS (for testing VMware VMotion failover, large HDFS datasets etc); a Mac Mini server (Core i7 2x3.6GHz, 16GB RAM, 1TB disk) running OS X Server, DNS, EM 12c R4 and VCenter; and a Vigor 2830n router for VPN, DHCP etc.]
•4 x 64GB VM Servers
• 256GB RAM across cluster
• 36TB Storage
• VMWare ESXi 5.5 + VCenter
•Additional iSCSI 6TB storage
• Synology DS414 NAS
•Demo / Free Software Installs
‣Cloudera CDH5 Express - and BDP2.2 for Tez
‣Oracle RDBMS, OBIEE, ODI etc
‣Oracle Big Data Connectors
‣Oracle EM 12cR4
36. Cluster Management
•VMWare VSphere 5 + VCenter Server
•Oracle Enterprise Manager 12cR4 Cloud Control
•OSX Server Yosemite
37. BigDataLite Demonstration VM
•Demo / Training VM downloadable from OTN
•Contains Cloudera Hadoop + Oracle Big Data Connectors + Big Data SQL
•Similar to setup on Oracle BDA
•Contains OBIEE enabling technologies:
‣Apache Hive (SQL access over Hadoop)
‣Apache HDFS (file storage)
‣Oracle Direct Connector for HDFS
‣Oracle R Advanced Analytics for Hadoop
‣Oracle Big Data SQL
•Great way to get started with Hadoop
‣Requires 8GB RAM, modern laptop etc
38. So … how well does it work?
39. Part 2 : Rittman Mead Hadoop (BDA) + Oracle BI Project Experiences
40. Typical RM Project BDA Topology
•Starter BDA rack, or full rack
•Kerberos-secured using included KDC server
•Integration with corporate LDAP for Cloudera Manager, Hue etc
•Developer access through Hue, Beeline, R Studio
•End-user access through OBIEE, Endeca and other tools
‣With final datasets usually exported to Exadata or Exalytics
41. Oracle Big Data Appliance
•Engineered system for big data processing and analysis
•Optimized for enterprise Hadoop workloads
•288 Intel® Xeon® E5 Processors
•1152 GB total memory
•648TB total raw storage capacity
‣Cloudera Distribution of Hadoop
‣Cloudera Manager
‣Open-source R
‣Oracle NoSQL Database Community Edition
‣Oracle Enterprise Linux + Oracle JVM
‣New - Oracle Big Data SQL
42. Working with Oracle Big Data Appliance
•Don’t underestimate the value of “pre-integrated” - massive time-saver for client
‣No need to integrate Big Data Connectors, ODI Agent etc with HDFS, Hive etc etc
•Single support route - raise SR with Oracle, they will route to Cloudera if needed
•Single patch process for whole cluster - OS, CDH etc etc
•Full access to Cloudera Enterprise features
•Otherwise … just another CDH cluster in terms of SSH access etc
•We like it ;-)
43. Cloudera Distribution including Hadoop (CDH)
•Like Linux, you can set up your Hadoop system manually, or use a distribution
•Key Hadoop distributions include Cloudera CDH, Hortonworks HDP, MapR etc
•Cloudera CDH is the distribution Oracle use on Big Data Appliance
‣Provides HDFS and Hadoop framework for BDA
‣Includes Pig, Hive, Sqoop, Oozie, HBase
‣Cloudera Impala for real-time SQL access
‣Cloudera Manager & Hue
44. Cloudera Manager and Hue
•Web-based tools provided with Cloudera CDH
•Cloudera Manager used for cluster admin, maintenance (like Enterprise Manager)
‣Commercial tool developed by Cloudera
‣Not enabled by default in BigDataLite VM
•Hue is a developer / analyst tool for working with Pig, Hive, Sqoop, HDFS etc
‣Open source project included in CDH
46. Working with Cloudera Hadoop (CDH) - Observations
•Very good product stack, enterprise-friendly, big community, can do lots with free edition
•Cloudera have their favoured Hadoop technologies - Spark, Kafka
•Also makes use of Cloudera-specific tools - Impala, Cloudera Manager etc
•But ignores some tools that have value - Apache Tez for example
•Easy for an Oracle developer to get productive with the CDH stack
•But beware of some immature technologies / products
‣Hive != Oracle SQL
‣Spark is very much an “alpha” product
‣Limitations in things like LDAP integration, end-to-end security
‣Lots of products in stack = lots of places to go to diagnose issues
47. CDH : Things That Work Well
•HDFS as a low-cost, flexible data store / reservoir; Hive for SQL access to structured + semi-structured HDFS data
•Pig, Spark, Python, R for data analysis and munging
•Cloudera Manager and Hue for web-based admin + dev access
[Diagram: real-time logs / events, RDBMS imports and file / unstructured imports land in the HDFS cluster filesystem, catalogued by the Hive Metastore / HCatalog.]
48. Oracle Big Data Connectors
•Oracle-licensed utilities to connect Hadoop to Oracle RDBMS
‣Bulk-extract data from Hadoop to Oracle, or expose HDFS / Hive data as external tables
‣Run R analysis and processing on Hadoop
‣Leverage Hadoop compute resources to offload ETL and other work from Oracle RDBMS
‣Enable Oracle SQL to access and load Hadoop data
49. Working with the Oracle Big Data Connectors
•Oracle Loader for Hadoop, Oracle SQL Connector for HDFS - rarely used
‣Sqoop works both ways (Oracle>Hadoop, Hadoop>Oracle) and is “good enough”
‣OSCH replaced by Oracle Big Data SQL for direct Oracle>Hive access
•Oracle R Advanced Analytics for Hadoop has been very useful though
‣Run MapReduce jobs from R
‣Run R functions across Hive tables
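For context, the Sqoop command lines referred to above look like the one assembled below - a sketch only, with a hypothetical JDBC URL, schema and HDFS path; the flags shown (--connect, --table, --target-dir, --num-mappers) are standard Sqoop import options:

```python
def sqoop_import_cmd(jdbc_url, username, table, target_dir, num_mappers=4):
    """Assemble a 'sqoop import' command line for an Oracle > Hadoop bulk copy."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--table", table,                   # source table to copy
        "--target-dir", target_dir,         # HDFS directory for the imported files
        "--num-mappers", str(num_mappers),  # degree of parallelism for the import
    ]

cmd = sqoop_import_cmd(
    "jdbc:oracle:thin:@//dbhost:1521/orcl",  # hypothetical connection details
    "scott", "SALES", "/user/etl/sales")
print(" ".join(cmd))
```

The reverse direction is the same shape with `sqoop export` and an `--export-dir` pointing at the HDFS files to push back into the RDBMS.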
50. Oracle R Advanced Analytics for Hadoop Key Features
•Run R functions on Hive Dataframes
•Write MapReduce functions in R
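The MapReduce pattern that ORAAH exposes to R programmers can be sketched in plain Python - map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group (a toy word count, not the Hadoop or ORAAH API):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, value) pair per word in each input record."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big data sql"])))
print(counts)  # {'big': 2, 'data': 2, 'sql': 1}
```

On a real cluster the map and reduce functions run distributed across nodes and the shuffle happens over the network; the programming model is the same.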
51. Initial Data Scoping & Discovery using R
•R is typically used at start of a big data project to get a high-level understanding of the data
•Can be run as R standalone, or using Oracle R Advanced Analytics for Hadoop
•Do basic scan of incoming dataset, get counts, determine delimiters etc
•Distribution of values for columns
•Basic graphs and data discovery
•Use findings to drive design of parsing logic, Hive data structures, need for data scrubbing / correcting etc
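As an illustration of that initial scan (the slides use R; this is an equivalent sketch in Python over an invented in-memory sample): detect the delimiter, count rows, and profile the distribution of one column:

```python
import csv
from collections import Counter

# A small sample of an incoming pipe-delimited file (invented data)
sample = "id|country|status\n1|UK|ok\n2|US|ok\n3|UK|error\n"

# Let the csv module guess the delimiter from the sample
dialect = csv.Sniffer().sniff(sample, delimiters="|,;\t")
print("delimiter:", dialect.delimiter)

rows = list(csv.DictReader(sample.splitlines(), delimiter=dialect.delimiter))
print("row count:", len(rows))

# Distribution of values in one column - informs cleansing rules and Hive DDL
print(Counter(row["country"] for row in rows))
```

The same checks scale down to a head-sample of a multi-terabyte file, which is usually enough to design the parsing logic before any cluster job runs.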
52. Design Pattern : Discovery Lab
[Architecture diagram: an Event Engine, Data Reservoir, Data Factory, Enterprise Information Store, Reporting and a Discovery Lab. Input events plus structured enterprise data and other data flow in; actionable events, actionable information and actionable insights flow out across the execution, innovation and discovery streams.]
53. Design Pattern : Discovery Lab
•Specific focus on identifying commercial value for exploitation
•Small group of highly skilled individuals (aka Data Scientists)
•Iterative development approach – data oriented NOT development oriented
•Wide range of tools and techniques applied
‣Searching and discovering unstructured data
‣Finding correlations and clusters
‣Filtering, aggregating, deriving and enhancing data
•Data provisioned through Data Factory or own ETL
•Typically separate infrastructure but could also be unified Reservoir if resource managed effectively
54. For the Future - Oracle Big Data Discovery
55. Interactive Analysis & Exploration of Hadoop Data
56. Share and Collaborate on Big Data Discovery Projects
57. Typical RM Big Data Project Tools Used
[Diagram, summarised:]
•Data loading - real-time via Flume conf scripts; batch via Sqoop cmd-line exec
•Data prep (a.k.a. “data munging”) - via R scripts, Python scripts etc
•Data analysis (a.k.a. “the magic”) - via R scripts, Python scripts, Pig, Spark etc
•Sharing output - via Hive tables, Impala tables, HDFS files etc
•Data export - batch via Sqoop cmd-line exec
•The earlier steps form the “Discovery” phase, the later ones the “Exploitation” phase
58. Data Loading into Hadoop
•Default load type is real-time, streaming loads
‣Batch / bulk loads only typically used to seed system
•Variety of sources including web log activity, event streams
•Target is typically HDFS (Hive) or HBase
•Data typically lands in “raw state”
‣Lots of files and events, need to be filtered/aggregated
‣Typically semi-structured (JSON, logs etc)
‣High volume, high velocity
-Which is why we use Hadoop rather than RDBMS (speed vs. ACID trade-off)
‣Economics of Hadoop means it’s often possible to archive all incoming data at detail level
[Diagram: real-time logs / events and file / unstructured imports feed the loading stage.]
59. Apache Flume : Distributed Transport for Log Activity
•Apache Flume is the standard way to transport log files from source through to target
•Initial use-case was webserver log files, but can transport any file from A>B
•Does not do data transformation, but can send to multiple targets / target types
•Mechanisms and checks to ensure successful transport of entries
•Has a concept of “agents”, “sinks” and “channels”
•Agents collect and forward log data
•Sinks store it in final destination
•Channels store log data en-route
•Simple configuration through INI files
•Handled outside of ODI12c
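For reference, the INI-style configuration mentioned above looks like the minimal sketch below - one agent wiring a spooling-directory source through a memory channel to an HDFS sink (the agent name and paths are hypothetical):

```ini
# An agent is a named collection of sources, channels and sinks
agent1.sources  = weblogs
agent1.channels = mem1
agent1.sinks    = hdfs1

# Source: pick up completed log files dropped into a local directory
agent1.sources.weblogs.type     = spooldir
agent1.sources.weblogs.spoolDir = /var/log/incoming
agent1.sources.weblogs.channels = mem1

# Channel: buffer events in memory en-route
agent1.channels.mem1.type = memory

# Sink: write events into HDFS, ready for Hive to read
agent1.sinks.hdfs1.type      = hdfs
agent1.sinks.hdfs1.hdfs.path = /user/flume/weblogs
agent1.sinks.hdfs1.channel   = mem1
```

Swapping the memory channel for a file channel trades throughput for durability if the agent is restarted mid-flight.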
60. Apache Kafka : Reliable, Message-Based
•Developed by LinkedIn, designed to address Flume issues around reliability, throughput
‣(though many of those issues have been addressed since)
•Designed for persistent messages as the common use case
‣Website messages, events etc vs. log file entries
•Consumer (pull) rather than Producer (push) model
•Supports multiple consumers per message queue
•More complex to set up than Flume, and can use Flume as a consumer of messages
‣But gaining popularity, especially alongside Spark Streaming
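The consumer-pull model described above can be sketched in plain Python - an append-only log plus per-consumer offsets, so several consumers read the same queue independently (this shows the pattern, not the Kafka API):

```python
class MiniLog:
    """Append-only message log; consumers pull batches and track their own offsets."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer name -> next position to read

    def produce(self, msg):
        # Producers just append; nothing is pushed to consumers
        self.messages.append(msg)

    def poll(self, consumer, max_records=10):
        # Each consumer pulls from its own committed offset
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[consumer] = start + len(batch)  # commit the new offset
        return batch

log = MiniLog()
for event in ["signup", "click", "purchase"]:
    log.produce(event)

print(log.poll("spark_streaming"))  # ['signup', 'click', 'purchase']
print(log.poll("flume_sink"))       # same messages, independent offset
print(log.poll("spark_streaming"))  # [] - this consumer is caught up
```

Because messages persist after being read, a slow or restarted consumer simply resumes from its last offset - the property that made Kafka attractive for streaming pipelines.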
61. GoldenGate for Continuous Streaming to Hadoop
•Oracle GoldenGate is also an option, for streaming RDBMS transactions to Hadoop
•Leverages GoldenGate & HDFS / Hive Java APIs
•Sample Implementations on MOS Doc.ID 1586210.1 (HDFS) and 1586188.1 (Hive)
•Likely to be formal part of GoldenGate in future release - but usable now
•Can also integrate with Flume for delivery to HDFS - see MOS Doc.ID 1926867.1
NoSQL Databases
•Family of database types that reject tabular storage,
SQL access and ACID compliance
•Useful as a way of landing data quickly + supporting
random cell-level access by ETL process
•Focus is on scalability, speed and schema-on-read
‣Oracle NoSQL Database - speed and scalability
‣Apache HBase - speed, scalability and Hadoop
‣MongoDB - native storage of JSON documents
•May or may not run on Hadoop, but associated with it
•Great choice for high-velocity data capture
•CRUD approach vs write-once/read many in HDFS
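The schema-on-read idea can be sketched in a few lines of Python (the record fields are invented for the example): documents land exactly as the source produced them, and the reader's schema, with its defaults, is applied only at query time.

```python
import json

# Schema-on-read: write whatever arrives; apply field names, defaults
# and projection only when the data is read.

raw_records = [
    '{"id": 1, "ip": "1.2.3.4", "country": "UK"}',
    '{"id": 2, "ip": "5.6.7.8"}',                     # no country captured
    '{"id": 3, "country": "US", "extra": "ignored"}'  # new field; old readers unaffected
]

def read_with_schema(raw):
    """Project each raw document onto the schema this reader expects."""
    doc = json.loads(raw)
    return {
        "id": doc.get("id"),
        "ip": doc.get("ip", "0.0.0.0"),        # default applied at read time
        "country": doc.get("country", "??"),
    }

rows = [read_with_schema(r) for r in raw_records]
```

Contrast this with an RDBMS, where record 2 would have been rejected (or padded) at write time by the table definition.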
ODI on Hadoop - Big Data Projects Discover ETL Tools
•ODI provides an excellent framework for running Hadoop ETL jobs
‣ELT approach pushes transformations down to Hadoop - leveraging power of cluster
•Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation
‣Whilst still preserving RDBMS push-down
‣Extensible to cover Pig, Spark etc
•Process orchestration
•Data quality / error handling
•Metadata and model-driven
ODI on Hadoop - How Well Does It Work?
•Very good for set-based processing of Hadoop data (HiveQL)
‣Can run python, R etc scripts as procedures
•Brings metadata and team-based ETL development to Hadoop
•Process orchestration, error-handling etc
•Rapid innovation from the ODI Product Dev team - Spark KMs etc coming soon
•But requires Hadoop devs to learn ODI, or add ODI developer to the project
Options for Sharing Hadoop Output with Wider Audience
•During the discovery phase of a Hadoop project, audience are likely technical
‣Most comfortable with data analyst tools, command-line, low-level access to the data
•During the exploitation phase, audience will be less technical
‣Emphasis on graphical tools, and integration with wider reporting toolset + metadata
•Three main options for visualising and sharing Hadoop data
1.Coming Soon - Oracle Big Data Discovery (Endeca on Hadoop)
2.OBIEE reporting against Hadoop direct using Hive/Impala, or Oracle Big Data SQL
3.OBIEE reporting against an export of the Hadoop data, on Exalytics / RDBMS
Oracle Business Analytics and Big Data Sources
•OBIEE 11g can also make use of big data sources
‣OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
‣Oracle R Enterprise can expose R models through DB functions, columns
‣Oracle Exalytics has InfiniBand connectivity to Oracle BDA
•Endeca Information Discovery can analyze unstructured and semi-structured sources
‣Increasingly tight integration between OBIEE and Endeca
New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive
•MapReduce jobs are typically written in Java, but Hive can make this simpler
•Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
•Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically
creates MapReduce jobs against data previously loaded into the Hive HDFS tables
•Approach used by ODI and OBIEE to gain access to Hadoop data
•Allows Hadoop data to be accessed just like any other data source
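For example, the kind of HiveQL a BI tool might emit looks like ordinary SQL (the apachelog and categories table names appear in the Hive examples later in this deck; the column names are assumed):

```sql
-- Hive compiles this into one or more MapReduce jobs behind the scenes
SELECT c.category, COUNT(*) AS page_views
FROM   apachelog a
JOIN   categories c ON (a.post_id = c.post_id)   -- HiveQL joins must be equi-joins
GROUP  BY c.category;
```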
Importing Hadoop/Hive Metadata into RPD
•HiveODBC driver has to be installed into Windows environment, so that
BI Administration tool can connect to Hive and return table metadata
•Import as ODBC datasource, change physical DB type to Apache Hadoop afterwards
•Note that OBIEE queries cannot span >1 Hive schema (no table prefixes)
OBIEE 11.1.1.7 / HiveServer2 ODBC Driver Issue
•Most customers using BDAs are using CDH4 or CDH5 - which uses HiveServer2
•OBIEE 11.1.1.7 only ships/supports HiveServer1 ODBC drivers
•But … OBIEE 11.1.1.7 on Windows can use the Cloudera HiveServer2 ODBC drivers
‣which isn’t supported by Oracle
‣but works!
Dealing with Hadoop / Hive Latency Option 1 : Impala
•Hadoop access through Hive can be slow - due to inherent latency in Hive
•Hive queries use MapReduce in the background to query Hadoop
•Spins-up Java VM on each query
•Generates MapReduce job
•Runs and collates the answer
•Great for large, distributed queries ...
•... but not so good for “speed-of-thought” dashboards
Dealing with Hadoop / Hive Latency Option 1 : Use Impala
•Hive is slow - because it’s meant to be used for batch-mode queries
•Many companies / projects are trying to improve Hive - one of which is Cloudera
•Cloudera Impala is an open-source but
commercially-sponsored in-memory MPP platform
•Replaces Hive and MapReduce in the Hadoop stack
•Can we use this, instead of Hive, to access Hadoop?
‣It will need to work with OBIEE
‣Warning - it won’t be a supported data source (yet…)
How Impala Works
•A replacement for Hive, but uses Hive concepts and
data dictionary (metastore)
•MPP (Massively Parallel Processing) query engine
that runs within Hadoop
‣Uses same file formats, security,
resource management as Hadoop
•Processes queries in-memory
•Accesses standard HDFS file data
•Option to use Apache AVRO, RCFile,
LZO or Parquet (column-store)
•Designed for interactive, real-time
SQL-like access to Hadoop
[Diagram: OBIEE Presentation Server and BI Server connect through the Cloudera Impala ODBC driver to Impala daemons running alongside HDFS on each node of a multi-node Hadoop cluster]
Connecting OBIEE 11.1.1.7 to Cloudera Impala
•Warning - unsupported source - limited testing and no support from MOS
•Requires Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/SLES) - 32/64 bit
•ODBC Driver / DSN connection steps similar to Hive
So Does Impala Work, as a Hive Substitute?
•With ORDER BY disabled in DB features, it appears to
•But not extensively tested by me, or Oracle
•But it’s certainly interesting
•Reduces 30s, 180s queries down to 1s, 10s etc
•Impala, or one of the competitor projects
(Drill, Dremel etc) assumed to be the
real-time query replacement for Hive, in time
‣Oracle announced planned support for
Impala at OOW2013 - watch this space
Dealing with Hadoop / Hive Latency Option 2 : Export to Data Mart
•In most cases, for general reporting access, exporting into RDBMS makes sense
•Export Hive data from Hadoop into Oracle Data Mart or Data Warehouse
•Use Oracle RDBMS for high-value data analysis, full access to RDBMS optimisations
•Potentially use Exalytics for in-memory RDBMS access
[Diagram: Hadoop data pipeline - loading stage (real-time logs/events, RDBMS imports, file/unstructured imports), processing stage, then store/export stage (RDBMS exports, file exports)]
Dealing with Hadoop / Hive Latency Option 3 : Big Data SQL
•Preferred solution for customers with Oracle Big Data Appliance is Big Data SQL
•Oracle SQL Access to both relational, and Hive/NoSQL data sources
•Exadata-type SmartScan against Hadoop datasets
•Response-time equivalent to Impala or Hive on Tez
•No issues around HiveQL limitations
•Insulates end-users around differences
between Oracle and Hive datasets
Oracle Big Data SQL
•Part of Oracle Big Data 4.0 (BDA-only)
‣Also requires Oracle Database 12c, Oracle Exadata Database Machine
•Extends Oracle Data Dictionary to cover Hive
•Extends Oracle SQL and SmartScan to Hadoop
•Extends Oracle Security Model over Hadoop
‣Fine-grained access control
‣Data redaction, data masking
‣Uses fast c-based readers where possible
(vs. Hive MapReduce generation)
‣Map Hadoop parallelism to Oracle PQ
‣Big Data SQL engine works on top of YARN
‣Like Spark, Tez, MR2
[Diagram: SQL queries from the Exadata Database Server are handled by Oracle Big Data SQL, with SmartScan running on both the Exadata Storage Servers and the Hadoop cluster]
View Hive Table Metadata in the Oracle Data Dictionary
•Oracle Database 12c 12.1.0.2.0 with Big Data SQL option can view Hive table metadata
‣Linked by Exadata configuration steps to one or more BDA clusters
•DBA_HIVE_TABLES and USER_HIVE_TABLES expose Hive metadata
•Oracle SQL*Developer 4.0.3, with Cloudera Hive drivers, can connect to Hive metastore
SQL> col database_name for a30
SQL> col table_name for a30
SQL> select database_name, table_name
2 from dba_hive_tables;
DATABASE_NAME TABLE_NAME
------------------------------ ------------------------------
default access_per_post
default access_per_post_categories
default access_per_post_full
default apachelog
default categories
default countries
default cust
default hive_raw_apache_access_log
Big Data SQL Server Dataflow
•Read data from HDFS Data Node
‣Direct-path reads
‣C-based readers when possible
‣Use native Hadoop classes otherwise
•Translate bytes to Oracle
•Apply SmartScan to Oracle bytes
‣Apply filters
‣Project columns
‣Parse JSON/XML
‣Score models
[Diagram: on the Data Node, the Big Data SQL Server's External Table Services (RecordReader, SerDe) read bytes from disk, translate them to Oracle format, then Smart Scan filters and projects the result]
Hive Access through Oracle External Tables + Hive Driver
•Big Data SQL accesses Hive tables through external table mechanism
‣ORACLE_HIVE external table type imports Hive metastore metadata
‣ORACLE_HDFS requires metadata to be specified
•Access parameters cluster and tablename specify Hive table source and BDA cluster
CREATE TABLE access_per_post_categories(
hostname varchar2(100),
request_date varchar2(100),
post_id varchar2(10),
title varchar2(200),
author varchar2(100),
category varchar2(100),
ip_integer number)
organization external
(type oracle_hive
default directory default_dir
access parameters(com.oracle.bigdata.tablename=default.access_per_post_categories));
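The ip_integer column in this example stores the client IP address as a single number; assuming the usual big-endian packing of the four octets (an assumption, since the slide doesn't show the loading code), the encoding works like this Python sketch:

```python
def ip_to_int(ip):
    """Pack a dotted-quad IPv4 address into one integer,
    as an ip_integer-style column might store it (assumed encoding)."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# Storing the address as an integer is what makes the BETWEEN-style
# range joins to IP/country lookup tables (shown later) possible.
```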
Use Rich Oracle SQL Dialect over Hadoop (Hive) Data
•Ranking Functions
‣rank, dense_rank, cume_dist,
percent_rank, ntile
•Window Aggregate Functions
‣Avg, sum, min, max, count, variance,
first_value, last_value
•LAG/LEAD Functions
•Reporting Aggregate Functions
‣Sum, Avg, ratio_to_report
•Statistical Aggregates
‣Correlation, linear regression family,
covariance
•Linear Regression
‣Fitting of ordinary-least-squares
regression line to set of number pairs
•Descriptive Statistics
•Correlations
‣Pearson’s correlation coefficients
•Crosstabs
‣Chi squared, phi coefficient
•Hypothesis Testing
‣Student t-test, Binomial test
•Distribution
‣Anderson-Darling test - etc.
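As a hypothetical illustration (the column names are assumed), analytic functions like these can be applied directly over the ORACLE_HIVE external table defined earlier:

```sql
-- Oracle analytic SQL over Hadoop-resident data (column names assumed)
SELECT author,
       COUNT(*)                              AS page_hits,
       RANK() OVER (ORDER BY COUNT(*) DESC)  AS popularity_rank,
       RATIO_TO_REPORT(COUNT(*)) OVER ()     AS share_of_hits
FROM   access_per_post_categories
GROUP  BY author;
```

None of these functions exist in HiveQL's dialect, which is part of the appeal of querying the Hive data through Oracle.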
Leverages Hive Metastore for Hadoop Java Access Classes
•As with other next-gen SQL access layers, uses common Hive metastore table metadata
•Provides route to underlying Hadoop data for Oracle Big Data SQL c-based SmartScan
Extending SmartScan, and Oracle SQL, Across All Data
•Brings query-offloading features of Exadata
to Oracle Big Data Appliance
•Query across both Oracle and Hadoop sources
•Intelligent query optimisation applies SmartScan
close to ALL data
•Use same SQL dialect across both sources
•Apply same security rules, policies,
user access rights across both sources
Example : Using Big Data SQL to Add Dimensions to Hive Data
•We want to add country and post details to a Hive table containing page accesses
•Post and Country details are stored in Oracle RDBMS reference tables
[Screenshots: Hive weblog activity table, Oracle dimension lookup tables, and the combined output in report form]
Create ORACLE_HIVE External Table over Hive Table
•Use the ORACLE_HIVE access driver type to create Oracle external table over Hive table
•ACCESS_PER_POST_EXTTAB now appears in Oracle data dictionary
Import Oracle Tables, Create RPD joining Tables Together
•No need to use Hive ODBC drivers - Oracle OCI connection instead
•No issue around HiveServer1 vs HiveServer2; also Big Data SQL handles authentication
with Hadoop cluster in background, Kerberos etc
•Transparent to OBIEE - all appear as Oracle tables
•Join across schemas if required
Create Physical Data Model from Imported Table Metadata
•Join ORACLE_HIVE external table containing log data, to reference tables from Oracle DB
Create Business Model and Presentation Layers
•Map incoming physical tables into a star schema
•Add aggregation method for fact measures
•Add logical keys for logical dimension tables
•Remove columns from fact table that aren’t measures
Create Initial Analyses Against Combined Dataset
•Create analyses using
full SQL features
•Access to Oracle RDBMS
Advanced Analytics functions
through EVALUATE,
EVALUATE_AGGR etc
•Big Data SQL SmartScan feature
provides fast, ad-hoc access
to Hive data, avoiding MapReduce
Oracle / Hive Query Federation at the RDBMS Level
•Oracle Big Data SQL feature (not BI Server) takes care of query federation
•SQL required for fact table (web log activity) access sent to Big Data SQL agent on BDA
•Only the columns (projection) and rows (filtering) required to answer the query are sent back to Exadata
•Storage Indexes used on both Exadata Storage Servers and BDA nodes to skip block reads
for irrelevant data
•HDFS caching used to speed-up
access to commonly-used
HDFS data
Access to Full Set of Oracle Join Types
•No longer restricted to HiveQL equi-joins - Big Data SQL supports all Oracle join operators
•Use to join Hive data (using a view over the external table) to an IP-range country lookup table using the BETWEEN join operator
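What that BETWEEN join computes can be sketched in plain Python (the IP range rows are made-up sample data): for each log row, find the lookup row whose range contains the integer IP, something a HiveQL equi-join cannot express.

```python
import bisect

# Sample (invented) IP-range rows: (range_start, range_end, country),
# sorted by range_start and non-overlapping.
ip_ranges = [
    (16777216, 16777471, "AU"),
    (16777472, 16778239, "CN"),
    (16778240, 16779263, "AU"),
]
starts = [r[0] for r in ip_ranges]

def country_for_ip(ip_integer):
    """Equivalent of: lookup.range_start <= log.ip_integer
                  AND log.ip_integer <= lookup.range_end"""
    i = bisect.bisect_right(starts, ip_integer) - 1
    if i >= 0:
        start, end, country = ip_ranges[i]
        if ip_integer <= end:
            return country
    return None   # IP falls outside every known range
```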
Add In Time Dimension Table
•Enables time-series reporting; pre-req for forecasting (linear regression-type queries)
•Map to Date field in view over ORACLE_HIVE table
‣Convert incoming Hive STRING field to Oracle DATE for better time-series manipulation
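A sketch of that string-to-date conversion in Python (the incoming Apache-log style format is an assumption; in Oracle SQL the equivalent would be a TO_DATE call with a matching format mask in the view definition):

```python
from datetime import datetime

def parse_log_date(s):
    """Parse an Apache-log style date string (assumed format) into a
    proper date/time value, enabling time-series grouping and sorting."""
    return datetime.strptime(s, "%d/%b/%Y:%H:%M:%S")

d = parse_log_date("20/Jan/2015:10:05:59")
```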
Now Enables Time-Series Reporting Incl. Country Lookups
What About Oracle Big Data SQL and ODI12c?
•Hive, and MapReduce, are well suited to batch-type ETL jobs, but …
•Not all join types are available in Hive - joins must be equality joins
•Any data from external Oracle RDBMS sources has to be staged in Hadoop before joining
•Limited set of HiveQL functions vs. Oracle SQL
•Oracle-based mappings have to import
Hive data into DB before accessing it
Combining Oracle and Hadoop (Hive) Data in Mappings
•Example scenario : log data in Hadoop needs to be enriched with customer data in Oracle
‣Hadoop (Hive) contains log activity and customer etc IDs
‣Reference / customer data held in Oracle RDBMS
•How do we create a mapping that joins both datasets?
movieapp_log_odistage.custid = CUSTOMER.CUSTID
Options for Importing Oracle / RDBMS Data into Hadoop
•Could export RDBMS data to file, and load using IKM File to Hive
•Oracle Big Data Connectors only export to Oracle,
not import to Hadoop
•One option is to use Apache Sqoop, and new
IKM SQL to Hive-HBase-File knowledge module
•Hadoop-native, automatically runs in parallel
•Uses native JDBC drivers, or OraOop (for
example)
•Bi-directional in-and-out of Hadoop to RDBMS
•Join performed in Hive, using HiveQL
‣With HiveQL limitations (only equi-joins)
[Diagram: Sqoop extract brings the Oracle customer table into Hive, joined on movieapp_log_odistage.custid = customer.custid]
New Option - Using Oracle Big Data SQL
•Oracle Big Data SQL provides ability for Exadata to reference Hive tables
•Use feature to create join in Oracle, bringing across Hive data through ORACLE_HIVE table
Oracle Big Data SQL and Data Integration
•Gives us the ability to easily bring in Hadoop (Hive) data into Oracle-based mappings
•Allows us to create Hive-based mappings that use Oracle SQL for transforms, joins
•Faster access to Hive data for real-time ETL scenarios
•Through Hive, bring NoSQL and semi-structured data access to Oracle ETL projects
•For our scenario - join weblog + customer data in Oracle RDBMS, no need to stage in Hive
Using Big Data SQL in an ODI12c Mapping
•By default, Hive table has to be exposed as an ORACLE_HIVE external table in Oracle first
•Then register that Oracle external table in ODI repository + model
[Screenshots: 1. External table creation in Oracle; 2. Register in ODI Model; 3. Logical mapping using just Oracle tables]
Custom KM : LKM Hive to Oracle (Big Data SQL)
•ODI12c Big Data SQL example on BigDataLite VM uses a custom KM for Big Data SQL
‣LKM Hive to Oracle (Big Data SQL) - KM code downloadable from java.net
‣Allows Hive+Oracle joins by auto-creating an ORACLE_HIVE external table definition to enable Big Data SQL Hive table access
ODI12c Mapping Creates Temp Exttab, Joins to Oracle
[Screenshots: 1. Hive table access point uses LKM Hive to Oracle (Big Data SQL); 2. Big Data SQL Hive external table created as a temporary object; 3. IKM Oracle Insert performs the load; 4. Main integration SQL routine uses a regular Oracle SQL join]
Finally … What Keeps the CIO Awake at Night
•Security and Privacy Regulations
‣Are we analysing and sharing data in compliance with privacy regulations?
-And if we are - would customers think our use of it is ethical?
‣Do I know if the data in my Hadoop cluster is *really* secure?
Hadoop Security “By Default”
•Connections between Hadoop services, and by users to services, aren’t authenticated
•Security is fragmented : HDFS, Hive, OS user accounts, Hue, CM all separate models
•No single place to define security policies, groups, access rights
•No single tool to audit access and permissions
•By default, everything is open and trusted - reflects roots in academia, R&D, marketing depts
“Secured” Hadoop : Kerberos, Sentry, Data Encryption etc
•Available for most Hadoop distributions, part of core Hadoop
•Kerberos Authentication - enables service-to-service, and client-to-service authentication
using MIT Kerberos or MS AD Kerberos
•Apache Sentry - Role-based Access Control for Hive, Impala and HDFS (CDH5.3+)
•Transparent at-rest HDFS encryption (CDH5.3+)
•Closes security loopholes, goes some way to Oracle-type data security
Oracle Big Data SQL : Single RDBMS/Hadoop Security Model
•Potential to extend Oracle security model over Hadoop (Hive) data
‣Masking / Redaction
‣VPD
‣FGAC
Summary
•Hadoop and Oracle Big Data Appliance are increasingly appearing in BI+DW Projects
•Gives DW projects the ability to store more data, cheaper and more flexibly than before
•Enables non-relational (non-SQL) query tools and analysis techniques (R, Spark etc)
•Extends BI’s capability to report and analyze across wider data sources
•Maturity varies widely, both in the Hadoop tools themselves and in Oracle's integration with Hadoop
•Trend is for Oracle to “productize” big data, creating tools + products around Oracle BDA
•We are probably at early stages - but very interesting times to be an Oracle BI+DW dev!
Thank You for Attending!
•Thank you for attending this presentation, and more information can be found at http://www.rittmanmead.com
•Contact us at [email protected] or [email protected]
•Look out for our book, “Oracle Business Intelligence Developers Guide” out now!
•Follow-us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)
Hadoop and Oracle Technologies
on BI Projects
Mark Rittman, CTO, Rittman Mead
Dutch Oracle Users Group, Jan 14th 2015