These are the slides from my presentation on running R in the database using Oracle R Enterprise. The second half of the presentation was a live demo of Oracle R Enterprise; unfortunately, the demo is not included in these slides.
This document discusses troubleshooting an issue where a database failed to mount after a role transition from primary to standby due to inconsistencies in the control file's temporary file mappings for PDBs. The issue was identified as a match for bug 24601586, which is caused when temporary tablespaces are dropped and recreated on the primary. The solution involved applying the patch for the bug, recreating the control file to fix the mappings, and adding tempfiles to the tablespaces on the new primary database. Lessons included applying relevant patches before PDB temp tablespace maintenance and checking for control file mapping mismatches when troubleshooting ORA-600 errors.
The Extract-Transform-Load (ETL) process is one of the most time-consuming tasks facing anyone who wishes to analyze data. Imagine if you could quickly, easily, and scalably merge and query data without having to spend hours in data prep. Well, you don’t have to imagine it. You can with Apache Drill. In this hands-on, interactive presentation Mr. Givre will show you how to unleash the power of Apache Drill and explore your data without any kind of ETL process.
This document provides an introduction to Cassandra including:
- Datastax is a company that contributes to Apache Cassandra and sells Datastax Enterprise.
- Cassandra was created at Facebook and is now open source software with the current version being 3.2.
- Cassandra's key features include linear scalability, continuous availability, multi-datacenter support, operational simplicity, and Spark integration.
Overview of accessing relational databases from R. Focuses on and demonstrates the DBI family (RMySQL, RPostgreSQL, ROracle, RJDBC, etc.) but also introduces RODBC. Highlights DBI's dbApply() function to combine the strengths of SQL and *apply() on large data sets. Demonstrates the sqldf package, which provides SQL access to standard R data.frames.
Presented at the May 2011 meeting of the Greater Boston useR Group.
Study after study shows that data scientists spend 50-90 percent of their time gathering and preparing data. In many large organizations this problem is exacerbated by data being stored on a variety of systems, with different structures and architectures. Apache Drill is a relatively new tool which can help solve this difficult problem by allowing analysts and data scientists to query disparate datasets in place using standard ANSI SQL, without having to define complex schemata or rebuild their entire data infrastructure. In this talk I will introduce the audience to Apache Drill (including some hands-on exercises) and present a case study of how Drill can be used to query a variety of data sources; a short query sketch follows the topic list below. The presentation will cover:
* How to explore and merge data sets in different formats
* Using Drill to interact with other platforms such as Python and others
* Exploring data stored on different machines
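As a flavor of the schema-free querying Drill enables, here is a minimal sketch that runs SQL against a raw JSON file through Drill's REST API. The drillbit address and file path are placeholders, not examples from the talk:

```python
# A minimal sketch of querying a raw JSON file through Drill's REST API
# (assumes a drillbit on localhost:8047 and a file at /tmp/users.json;
# both are placeholders, not examples from the talk).
import requests

resp = requests.post(
    "http://localhost:8047/query.json",
    json={
        "queryType": "SQL",
        # Drill queries the file in place: no ETL, no schema definition.
        "query": "SELECT t.name, t.age FROM dfs.`/tmp/users.json` t LIMIT 10",
    },
)
resp.raise_for_status()
for row in resp.json()["rows"]:
    print(row)
```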
OrientDB vs Neo4j - Comparison of query/speed/functionality – Curtis Mosters
This presentation gives an overview of OrientDB and Neo4j. It also compares some specific queries, their speed, and the overall functionality of both databases.
The queries may not be optimized in either case, but at least they produce the same results and are both written as queries. In Neo4j this would ideally be done in Java code, but that is much harder to write, so this presentation is a direct comparison rather than an attempt to get the best possible results.
It is also done with real data, roughly 200 GB of it in the end.
User Defined Aggregation in Apache Spark: A Love Story – Databricks
This document summarizes a user's journey developing a custom aggregation function for Apache Spark using a T-Digest sketch. The user initially implemented it as a User Defined Aggregate Function (UDAF) but ran into performance issues due to excessive serialization/deserialization. They then worked to resolve it by implementing the function as a custom Aggregator using Spark 3.0's new aggregation APIs, which avoided unnecessary serialization and provided a 70x performance improvement. The story highlights the importance of understanding how custom functions interact with Spark's execution model and optimization techniques like avoiding excessive serialization.
This document introduces Spark SQL and the Catalyst query optimizer. It discusses that Spark SQL allows executing SQL on Spark, builds SchemaRDDs, and optimizes query execution plans. It then provides details on how Catalyst works, including its use of logical expressions, operators, and rules to transform query trees and optimize queries. Finally, it outlines some interesting open issues and how to contribute to Spark SQL's development.
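As a hedged illustration of Catalyst at work in today's DataFrame API (the data and names below are invented), explain() exposes the logical and physical plans the optimizer produces:

```python
# The data and names here are invented; explain() prints the plans that
# Catalyst parses, analyzes, optimizes, and finally compiles to a physical plan.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("catalyst-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.createOrReplaceTempView("t")

# Catalyst pushes the filter down and prunes unused columns before execution.
spark.sql("SELECT label FROM t WHERE id > 1").explain(extended=True)
```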
2021 04-20 apache arrow and its impact on the database industry.pptx – Andrew Lamb
The talk will motivate why Apache Arrow and related projects (e.g. DataFusion) are a good choice for implementing modern analytic database systems. It reviews the major components of most databases, explains where Apache Arrow fits in, and explains the additional integration benefits of using Arrow.
Apache Spark, the Next Generation Cluster Computing – Gerger
This document provides a three-sentence summary of the key points:
Apache Spark is an open source cluster computing framework that is faster than Hadoop MapReduce by running computations in memory through RDDs, DataFrames and Datasets. It provides high-level APIs for batch, streaming and interactive queries along with libraries for machine learning. Spark's performance is improved through techniques like Catalyst query optimization, Tungsten in-memory columnar formats, and whole stage code generation.
In this talk, we’ll discuss technical designs for supporting HBase as a “native” data source to Spark SQL, achieving both query and load performance and scalability: near-precise execution locality for queries and loading, fine-tuned partition pruning, predicate pushdown, plan execution through coprocessors, and an optimized and fully parallelized bulk loader. Point and range queries on dimensional attributes will benefit particularly well from these techniques. Preliminary test results vs. established SQL-on-HBase technologies will be provided. The speaker will also share the future plan and real-world use cases, particularly in the telecom industry.
DataSource V2 and Cassandra – A Whole New World – Databricks
Data Source V2 has arrived for the Spark Cassandra Connector, but what does this mean for you? Speed, Flexibility and Usability improvements abound and we’ll walk you through some of the biggest highlights and how you can take advantage of them today.
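A hedged sketch of what reading through the connector looks like from PySpark; the keyspace, table, and connector package version are placeholders rather than specifics from the talk:

```python
# Keyspace, table, and package version are placeholders; the connector jar
# must be supplied, e.g. via
#   --packages com.datastax.spark:spark-cassandra-connector_2.12:3.2.0
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cassandra-read").getOrCreate()

orders = (spark.read
          .format("org.apache.spark.sql.cassandra")  # the connector's source name
          .options(keyspace="store", table="orders")
          .load())

# Filters like this can be pushed down to Cassandra by the connector.
orders.filter(orders.customer_id == 42).show()
```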
This document discusses Hadoop and big data. It begins with definitions of big data and how Hadoop can help with large, complex datasets. It then discusses how Hadoop works with other tools like Pig and Hive. The document outlines different scenarios for big data and whether Hadoop is suitable. It also discusses how big data frameworks have evolved from Google papers. Finally, it provides examples of big data use cases and how education is being democratized with big data tools.
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library – Ilya Ganelin
In this talk I discuss my recent experience working with Spark DataFrames and the Spark TimeSeries library. For DataFrames, the focus will be on usability: a lot of the documentation does not cover common use cases like the intricacies of creating DataFrames, adding or manipulating individual columns, and doing quick-and-dirty analytics. For the time-series library, I dive into the kinds of use cases it supports and why it’s actually super useful.
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf – Databricks
Organizations building big data analytics solutions for streaming environments struggle with adapting legacy batch systems for streaming, supporting multiple columnar analytical databases, providing time series aggregations, and streaming Fact and Dimensional data into star schemas. In this session, you will learn how we overcame these challenges and developed an end-user self-service, no-code required “ETL” framework. Extensible and operationally robust, this developer framework includes a Spark Structured Streaming app for Kafka, Hadoop/Hive (ORC, Parquet), OpenTSDB/HBase, and Vertica data pipelines.
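A minimal sketch of the general shape of such a pipeline (Kafka in, columnar Parquet out) with Spark Structured Streaming; the brokers, topic, and paths are placeholders, and this is not the speaker's actual framework:

```python
# Brokers, topic, and paths are placeholders; this is the general shape,
# not the speaker's actual framework.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-parquet").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers binary key/value columns; cast before writing columnar files.
query = (events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
         .writeStream
         .format("parquet")
         .option("path", "/data/events")
         .option("checkpointLocation", "/chk/events")
         .start())

query.awaitTermination()
```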
Analyzing Real-World Data with Apache Drill – tshiran
This document provides an overview of Apache Drill, an open source SQL query engine for analysis of both structured and unstructured data. It discusses how Drill allows for schema-free querying of data stored in Hadoop, NoSQL databases and other data sources using SQL. The document outlines some key features of Drill, such as its flexible data model, ability to discover schemas on the fly, and distributed execution architecture. It also presents examples of using Drill to analyze real-world data from sources like HDFS, MongoDB and more.
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ... – Julian Hyde
Apache Calcite is an open source framework for building data management systems that allows for optimized query processing over heterogeneous data sources. It uses a flexible relational algebra and extensible adapter-based architecture that allows it to incorporate diverse data sources. Calcite's rule-based optimizer transforms logical query plans into efficient physical execution plans tailored for different data sources. It has been adopted by many projects and companies and is also used in research.
Enterprise data is moving into Hadoop, but some data has to stay in operational systems. Apache Calcite (the technology behind Hive’s new cost-based optimizer, formerly known as Optiq) is a query-optimization and data federation technology that allows you to combine data in Hadoop with data in NoSQL systems such as MongoDB and Splunk, and access it all via SQL.
Hyde shows how to quickly build a SQL interface to a NoSQL system using Calcite. He shows how to add rules and operators to Calcite to push down processing to the source system, and how to automatically build materialized data sets in memory for blazing-fast interactive analysis.
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori... – CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sf2z6i
This CloudxLab Introduction to Spark SQL & DataFrames tutorial helps you understand Spark SQL and DataFrames in detail. Below are the topics covered in this deck, with a short code sketch after the list:
1) Introduction to DataFrames
2) Creating DataFrames from JSON
3) DataFrame Operations
4) Running SQL Queries Programmatically
5) Datasets
6) Inferring the Schema Using Reflection
7) Programmatically Specifying the Schema
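A minimal PySpark sketch of topics 2 and 4 above, creating a DataFrame from JSON and running SQL programmatically (the file path is a placeholder):

```python
# The JSON file path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-basics").getOrCreate()

people = spark.read.json("/data/people.json")   # topic 2: schema is inferred
people.printSchema()
people.select("name", "age").show()             # topic 3: DataFrame operations

people.createOrReplaceTempView("people")        # topic 4: SQL, programmatically
spark.sql("SELECT name FROM people WHERE age > 21").show()
```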
The Yahoo! Hadoop grid makes use of a managed service to pull data into the clusters. However, when it comes to getting data out of the clusters, the choices are limited to proxies such as HDFSProxy and HTTPProxy. With the introduction of HCatalog services, customers of the grid now have their data represented in a central metadata repository. HCatalog abstracts away file locations and the underlying storage format of data for users, along with several other advantages such as sharing of data among MapReduce, Pig, and Hive. In this talk, we will focus on how the ODBC/JDBC interface of HiveServer2 accomplishes the use case of getting data out of the clusters when HCatalog is in use and users no longer want to worry about files, partitions, and their locations. We will also demo the data-out capabilities and go through other nice properties of the data-out feature.
Presenter(s):
Sumeet Singh, Director, Product Management, Yahoo!
Chris Drome, Technical Yahoo!
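A hedged sketch of the data-out path described above from a client's point of view: instead of pulling files through a proxy, the client sends SQL to HiveServer2 and lets HCatalog resolve files and partitions. PyHive is one of several Thrift-based clients, not necessarily the one demoed, and the host and table are invented:

```python
# Host, credentials, and table are invented; PyHive is one of several
# Thrift-based HiveServer2 clients, not necessarily the one demoed.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000,
                    username="analyst")
cur = conn.cursor()
# HCatalog resolves files, partitions, and storage formats behind this query.
cur.execute("SELECT page, views FROM web_stats WHERE dt = '2013-06-01' LIMIT 10")
for page, views in cur.fetchall():
    print(page, views)
```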
This document outlines an agenda for an advanced GoldenGate training covering various topics:
1) Methods for initializing data including using keys and commit SCNs.
2) Handling DML and DML errors with techniques like REPERROR and exception tables.
3) Advanced DDL synchronization and errors including filtering, substitution, and derived objects.
4) Data mapping, manipulation, and selecting rows using filters and WHERE clauses.
5) Monitoring and troubleshooting GoldenGate configurations.
Spark RDDs, DataFrames, and Datasets all represent distributed collections of data. RDDs use Java objects to represent data but have no optimization and require expensive serialization. DataFrames use Catalyst optimization and binary serialization to improve efficiency. Datasets build on DataFrames with additional optimizations and type safety. Datasets can also regenerate RDDs and interoperate between formats more easily than DataFrames.
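A small sketch of the contrast described, in PySpark (Datasets exist only in the Scala and Java APIs, so this covers RDDs and DataFrames; the data is invented):

```python
# The data is invented; Datasets exist only in the Scala/Java API, so this
# PySpark sketch contrasts RDDs and DataFrames.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-df").getOrCreate()
pairs = [("a", 1), ("b", 2), ("a", 3)]

# RDD: Spark sees opaque Python objects, so nothing can be optimized for us.
rdd_sums = (spark.sparkContext.parallelize(pairs)
            .reduceByKey(lambda x, y: x + y)
            .collect())
print(rdd_sums)

# DataFrame: a declarative plan that Catalyst optimizes, stored in a compact
# binary columnar format instead of Java/Python objects.
df = spark.createDataFrame(pairs, ["key", "value"])
df.groupBy("key").sum("value").show()
```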
How to teach an elephant to rock'n'roll – PGConf APAC
The document discusses techniques for optimizing PostgreSQL queries, including:
1. Using index only scans to efficiently skip large offsets in queries instead of scanning all rows (a small sketch of this idea follows the list).
2. Pulling the LIMIT clause under joins and aggregates to avoid processing unnecessary rows.
3. Employing indexes creatively to perform DISTINCT operations by scanning the index instead of the entire table.
4. Optimizing DISTINCT ON queries by looping through authors and returning the latest row for each instead of a full sort.
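A hedged illustration of the first technique: replacing a large OFFSET (which still produces and discards every skipped row) with a keyset condition an index can satisfy directly. The table and connection details are invented, not the speaker's examples:

```python
# Table, column, and connection details are invented; this shows the general
# idea, not the speaker's exact example.
import psycopg2

conn = psycopg2.connect("dbname=demo")
cur = conn.cursor()

# Slow: the executor still produces and discards 100000 rows.
cur.execute("SELECT id, title FROM posts ORDER BY id LIMIT 20 OFFSET 100000")

# Fast: with an index on id, the scan starts right where we left off.
last_seen_id = 100000
cur.execute("SELECT id, title FROM posts WHERE id > %s ORDER BY id LIMIT 20",
            (last_seen_id,))
print(cur.fetchall())
```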
The document discusses Oracle's Advanced Analytics Option which extends the Oracle Database into a comprehensive advanced analytics platform. It includes Oracle Data Mining for in-database predictive analytics and data mining, and Oracle R Enterprise which integrates the open-source R statistical programming language with the database. The option aims to bring algorithms to the data within the database to eliminate data movement and reduce total cost of ownership compared to traditional statistical environments.
Beyond shuffling - Scala Days Berlin 2016 – Holden Karau
This session will cover our and the community's experiences scaling Spark jobs to large datasets and the resulting best practices, along with code snippets to illustrate them.
The planned topics are:
Using Spark counters for performance investigation
Spark collects a large number of statistics about our code, but how often do we really look at them? We will cover how to investigate performance issues and figure out where to best spend our time using both counters and the UI.
Working with Key/Value Data
Replacing groupByKey for awesomeness
groupByKey makes it too easy to accidentally collect individual records which are too large to process. We will talk about how to replace it in different common cases with more memory-efficient operations; one such replacement is sketched after this topic list.
Effective caching & checkpointing
Being able to reuse previously computed RDDs without recomputing can substantially reduce execution time. Choosing when to cache, checkpoint, or what storage level to use can have a huge performance impact.
Considerations for noisy clusters
Functional transformations with Spark Datasets
How to keep some of the benefits of Spark’s DataFrames while still being able to work with arbitrary Scala code
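A minimal sketch of the groupByKey replacement mentioned above: for a per-key sum, reduceByKey combines values before the shuffle, so a key's full list of records never has to sit in one executor's memory (the data is invented):

```python
# Data is invented; the point is that reduceByKey computes partial sums
# before the shuffle, so no single executor must hold all values for a key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("beyond-shuffling").getOrCreate()
pairs = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)] * 1000)

# Risky: materializes every value for a key before summing.
grouped_sums = pairs.groupByKey().mapValues(sum)

# Better: map-side combining shrinks the shuffle and the memory footprint.
reduced_sums = pairs.reduceByKey(lambda x, y: x + y)
print(reduced_sums.collect())
```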
Efficient Data Storage for Analytics with Apache Parquet 2.0 – Cloudera, Inc.
Apache Parquet is an open-source columnar storage format for efficient data storage and analytics. It provides efficient compression and encoding techniques that enable fast scans and queries of large datasets. Parquet 2.0 improves on these efficiencies through enhancements like delta encoding, binary packing designed for CPU efficiency, and predicate pushdown using statistics. Benchmark results show Parquet provides much better compression and query performance than row-oriented formats on big data workloads. The project is developed as an open-source community with contributions from many organizations.
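A small PySpark sketch of the efficiencies described: written as Parquet, a dataset can be read back with column pruning and statistics-based predicate pushdown limiting what is scanned (the path and column are illustrative):

```python
# Path and column are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

users = spark.range(1_000_000).withColumnRenamed("id", "user_id")
users.write.mode("overwrite").parquet("/data/users.parquet")

# Column pruning reads only user_id, and min/max statistics let Parquet
# skip row groups that cannot match the predicate.
spark.read.parquet("/data/users.parquet") \
     .filter("user_id BETWEEN 100 AND 200") \
     .show()
```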
Innovate Analytics with Oracle Data Mining & Oracle R – Capgemini
This document summarizes a presentation about innovating analytics with Oracle Data Mining and R. The presentation introduces data mining and R, how they can be used with Oracle BI 11g, and Oracle's predictive analytics stack. It provides examples of data mining use cases and encourages organizations to start predictive analytics projects by leveraging existing BI investments. The presentation aims to provide an understanding of data mining and R, how predictive analytics can benefit organizations, and how to get started with a predictive analytics project.
This document provides a high-level overview of MapReduce and Hadoop. It begins with an introduction to MapReduce, describing it as a distributed computing framework that decomposes work into parallelized map and reduce tasks. Key concepts like mappers, reducers, and job tracking are defined. The structure of a MapReduce job is then outlined, showing how input is divided and processed by mappers, then shuffled and sorted before being combined by reducers. Example map and reduce functions for a word counting problem are presented to demonstrate how a full MapReduce job works.
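A pure-Python sketch of the word-count flow described: map emits (word, 1) pairs, a shuffle groups them by key, and reduce sums each group. Hadoop distributes these phases across a cluster; this snippet only mirrors the data flow on one machine:

```python
from collections import defaultdict

def mapper(line):
    # Map: emit an intermediate (word, 1) pair per word.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce: combine all values that arrived for one key.
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle and sort: group intermediate pairs by key before reducing.
groups = defaultdict(list)
for line in lines:
    for word, one in mapper(line):
        groups[word].append(one)

print(sorted(reducer(word, counts) for word, counts in groups.items()))
```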
This document provides an introduction to Hadoop and HDFS. It defines big data and Hadoop, describing how Hadoop uses a scale-out approach to distribute data and processing across clusters of commodity servers. It explains that HDFS is the distributed file system of Hadoop, which splits files into blocks and replicates them across multiple nodes for reliability. HDFS is optimized for large streaming reads and writes of large files. The document also gives an overview of the Hadoop ecosystem and common Hadoop distributions.
OUG Ireland Meet-up - Updates from Oracle Open World 2016 – Brendan Tierney
OUG Ireland meet-up held on 20th October 2016, with presentations on updates from Oracle Open World 2016, covering Tech/Database, Big Data, Analytics, and Oracle Cloud.
Predictive analytics: Mining gold and creating valuable product – Brendan Tierney
My presentation about building predictive analytics and machine learning solutions, presented through a number of real-world projects that I've worked on over the past couple of years.
The document discusses data mining, defining it as the exploration and analysis of large quantities of data to discover interesting patterns or rules. It describes techniques such as decision trees, neural networks, and genetic algorithms, and discusses how data mining can be applied in various business areas.
This document discusses Canary and OpenCanary honeypot tools. Canary is a commercial tool that mimics operating systems and services to detect interactions from attackers. It has a graphical user interface and integrates with services like Slack. OpenCanary is an open source alternative that is less feature-rich but free to use. It requires configuring via text files rather than a GUI. The document also explores using Vagrant to automatically deploy OpenCanary virtual machines.
The document provides an overview of analyzing performance data using the Automatic Workload Repository (AWR) in Oracle databases. It discusses how AWR collects snapshots of data from V$ views over time and stores them in database history views. It highlights some key views used in AWR analysis and factors to consider like snapshot intervals and timestamps. Examples are provided to show how to query AWR views to identify top SQL statements by CPU usage and analyze performance metrics trends over time.
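A hedged sketch of the kind of AWR query described, finding top SQL by CPU between two snapshots via the DBA_HIST_SQLSTAT history view; the snapshot IDs and connection details are placeholders, and querying AWR requires the appropriate licensing:

```python
# Connection details and snapshot IDs are placeholders; AWR views require
# the appropriate database licensing.
import oracledb

conn = oracledb.connect(user="perf", password="...", dsn="db.example.com/orcl")
cur = conn.cursor()
cur.execute("""
    SELECT sql_id, SUM(cpu_time_delta) AS cpu_time
    FROM   dba_hist_sqlstat
    WHERE  snap_id BETWEEN :begin_snap AND :end_snap
    GROUP  BY sql_id
    ORDER  BY cpu_time DESC
    FETCH FIRST 10 ROWS ONLY
""", {"begin_snap": 1000, "end_snap": 1010})
for sql_id, cpu_time in cur:
    print(sql_id, cpu_time)
```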
Building a Successful Internal Adversarial Simulation Team - Chris Gates & Ch... – Chris Gates
Brucon 2016
The evolution chain in security testing is fundamentally broken due to a lack of understanding, reduction of scope, and a reliance on vulnerability “whack a mole.” To help break the barriers of the common security program, we are going to have to divorce ourselves from the metrics of vulnerability statistics and Pavlovian risk color charts and really get to work on how our security programs perform during a REAL event. To do so, we must create an entirely new set of metrics, tests, procedures, implementations, and repeatable processes. It is extremely rare that a vulnerability causes a direct risk to an environment; it is usually what the attacker DOES with the access gained that matters. In this talk we will discuss the way that internal and external teams have been created to simulate a REAL-WORLD attack and work hand in hand with the defensive teams to measure the environment's resistance to the attacks. We will demonstrate attacks, capabilities, TTP tracking, trending, positive metrics, and hunt integration, and most of all we will lay out a road map to STOP this nonsense of Red vs. Blue and realize that we are all on the same team, sparring and training every day to be ready for the fight when it comes to us.
The document discusses building a home arcade system. It details three attempts using different hardware configurations - a Raspberry Pi, Windows laptop with Maximus Arcade emulator, and potentially a Windows PC with Hyperspin frontend. The Raspberry Pi setup had issues with exiting games without a keyboard. The Maximus Arcade setup on a laptop worked better out of the box but had video card issues. The goal is to build an easy-to-use system for kids to play retro games.
GAUCbe 2015 - Dashboard Building - Involving clients to find the right metric... – Devid Dekegel
* Google Analytics User conference Belgium 2015 presentation *
The size and speed of our digital data is growing at an enormous pace, and more data sources are added each day. We all have an enormous pool of data at our disposal that can give us magical insights about our brands. But it is easy to get lost in all this data and lose focus on what really matters. During this talk we will share our way of working at Colruyt Group and how we manage to keep data simple and work more closely with our stakeholders to build the most suitable dashboards.
This document provides an executive summary of a report on shared ownership in the UK. The key points are:
1) There is large demand for shared ownership with 85,000 approvals reported annually. Housing associations are committed to further growth.
2) Awareness of shared ownership is growing, with 51% of the public able to correctly describe it, but more can be done to increase understanding.
3) The sector is working to improve standards through a new charter. Modeling shows shared ownership remains affordable even with interest rate rises.
4) There is market capacity for 60,000 shared ownership units annually. Lender appetite is growing as data issues are addressed. Overall, shared ownership is becoming
Justin Dunne has over 15 years of experience in business development, sales, and property management roles. He has a proven track record of exceeding sales goals and quotas across various industries including real estate, telecommunications, healthcare, and technology. His resume highlights consistent top performance, including being named rookie of the month and employee of distinction.
The document summarizes the key findings of the Outward Bound Trust regarding the UK government's proposed apprenticeship levy. The main points are:
- The levy of 0.5% on company payrolls over £3 million will be used to fund three million new apprenticeships by 2020.
- It will be collected through PAYE starting in April 2017. Employers will receive an allowance of £15,000 and a 10% top-up from the government to spend on apprenticeships.
- Funds must be spent on approved apprenticeship training within 18 months or will expire. Employers can only spend funds on approved training providers listed on the new Digital Apprentices
MapReduce provides a programming model for processing large datasets in a distributed, parallel manner. It involves two main steps - the map step where the input data is converted into intermediate key-value pairs, and the reduce step where the intermediate outputs are aggregated based on keys to produce the final results. Hadoop is an open-source software framework that allows distributed processing of large datasets across clusters of computers using MapReduce.
The document contains sample questions and explanations for the Oracle 1z0-591 exam. It discusses topics like how data is stored in a star schema, how to rank sales amounts, required components for OBIEE installations, methods for developing OBIEE implementations, and examples of valid dynamic repository variables like CurrentMonth.
The London Underground map created by Harry Beck in 1933 was a modernist representation that rationalized the subterranean space beneath London. It used colored, straight lines with no resemblance to the physical layout of the city above. This abstracted the chaotic nature of London into an organized network that was easy to navigate. The angular lines reflected the Futurism and Vorticism artistic movements of the time that emphasized speed, technology and order. Beck's map was a break from previous maps that depicted the actual winding routes. It provided comfort by simplifying the complex underground system and has become the standard format for subway maps worldwide since.
The document appears to be a presentation about Oracle's R technologies and how they address challenges with the R programming language. It discusses Oracle R Distribution, Oracle R Enterprise, Oracle R Advanced Analytics for Hadoop, and ROracle. It also covers how Oracle has added capabilities for embedded R execution in the Oracle Database using SQL, including functions like rqEval and rqScriptCreate that allow running R scripts and accessing database contents directly from R.
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T... – Lucidworks
The document discusses leveraging Lucene/Solr as a knowledge graph and intent engine. It describes building an intent engine that incorporates type-ahead prediction, spelling correction, entity and entity-type resolution, semantic query parsing, and query augmentation using a knowledge graph. The intent engine aims to understand the user's intent beyond the literal query string and help express their intent through an interactive search experience.
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
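A minimal PGQL sketch in the spirit of the session (the graph, labels, and properties are invented); note how the MATCH clause reads much like a SQL FROM clause, which is the point of the language:

```python
# The graph, labels, and properties are invented; PGQL's MATCH clause
# plays the role of SQL's FROM clause over vertices and edges.
pgql_query = """
SELECT q.name
FROM MATCH (p:Person) -[:knows]-> (q:Person)
WHERE p.name = 'Alice'
ORDER BY q.name
"""
print(pgql_query)
```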
U-SQL - Azure Data Lake Analytics for Developers – Michael Rys
This document introduces U-SQL, a language for big data analytics on Azure Data Lake Analytics. U-SQL unifies SQL with imperative coding, allowing users to process both structured and unstructured data at scale. It provides benefits of both declarative SQL and custom code through an expression-based programming model. U-SQL queries can span multiple data sources and users can extend its capabilities through C# user-defined functions, aggregates, and custom extractors/outputters. The document demonstrates core U-SQL concepts like queries, joins, window functions, and the metadata model, highlighting how U-SQL brings together SQL and custom code for scalable big data analytics.
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ... – Databricks
Of all the developers’ delights, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs (RDDs, DataFrames, and Datasets) available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as best practice; 2) their performance and optimization benefits; and 3) scenarios in which to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you’ll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (This talk is a vocalization of the blog, along with the latest developments in Apache Spark 2.x DataFrame/Dataset and Spark SQL APIs: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
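A small PySpark sketch of the interoperation the talk covers; since Datasets exist only in Scala and Java, the Python round trip shown here is between RDDs and DataFrames (the data is invented):

```python
# Data is invented; Datasets are Scala/Java-only, so in Python the round
# trip the talk describes is between RDDs and DataFrames.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("three-apis").getOrCreate()

rdd = spark.sparkContext.parallelize(
    [Row(name="Ann", age=34), Row(name="Bo", age=29)])

df = spark.createDataFrame(rdd)   # RDD -> DataFrame (schema taken from Rows)
df.filter(df.age > 30).show()

back = df.rdd                     # DataFrame -> RDD of Row objects
print(back.map(lambda r: r.name).collect())
```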
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark – Vital.AI
This document provides an overview of MetaQL, which allows composing queries across NoSQL, SQL, SPARQL, and Spark databases using a domain model. Key points include:
- MetaQL uses a domain model to define concepts and compose typed queries in code that can execute across different databases.
- This separates concerns and improves developer efficiency over managing schemas and databases separately.
- Examples demonstrate MetaQL queries in graph, path, select, and aggregation formats across SQL, NoSQL, and RDF implementations.
Rafael Bagmanov «Scala in a wild enterprise» – e-Legion
This document discusses Scala adoption in the enterprise. It describes how Scala was used to build OpenGenesis, an open-source deployment orchestration tool that was successfully deployed in a large financial institution. While Scala works well with common J2EE patterns like Spring MVC, Spring, and JPA/Squeryl, there are challenges around hiring Scala developers and establishing coding standards. The greatest challenges are cultural and involve people.
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn... – jaxLondonConference
Presented at JAX London
In this session we'll look at some of the design and implementation strategies you can employ when building a Neo4j-based graph database solution, including architectural choices, data modelling, and testing.
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji – Data Con LA
Abstract: Of all the developers' delights, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs (RDDs, DataFrames, and Datasets) available in Apache Spark 2.x. In particular, I will emphasize why and when you should use each set as best practice, outline their performance and optimization benefits, and underscore scenarios in which to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them.
This document discusses using document databases like CouchDB with TYPO3 Flow. It provides an overview of persistence basics in Flow and Doctrine ORM. It then covers using CouchDB as a document database, including its REST API, basics, and the TYPO3.CouchDB package. It notes limitations and introduces alternatives like Radmiraal.CouchDB that support multiple backends. Finally, it discusses future support for multiple persistence backends in Flow.
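A hedged sketch of the CouchDB basics the talk covers: everything is a JSON document behind a plain REST API (the server address, credentials, and database name are placeholders):

```python
# Server address, credentials, and database name are placeholders.
import requests

base = "http://admin:secret@localhost:5984"

requests.put(f"{base}/blog")                     # create a database
requests.post(f"{base}/blog",                    # store a JSON document
              json={"type": "post", "title": "Hello CouchDB"})

docs = requests.get(f"{base}/blog/_all_docs").json()
print(docs["total_rows"])
```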
Alex Mang – Patterns for scalability in Microsoft Azure applications – Codecamp Romania
The document discusses patterns for scalability in Microsoft Azure applications. It covers queue-based load leveling, competing consumers, and priority queue patterns for handling application load and message processing. It also discusses materialized view and sharding patterns for scaling databases, where materialized views optimize queries and sharding partitions data horizontally across multiple servers. The talk includes demos of priority queue and sharding patterns to illustrate their implementations.
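A tiny sketch of the sharding pattern's core idea, routing each tenant to one of several databases by hashing the shard key; the shard hosts are invented, and production systems usually keep the shard map in a lookup store rather than hard-coding it:

```python
# Shard hosts are invented; real systems usually keep the tenant-to-shard
# map in a lookup store instead of hashing directly.
import hashlib

SHARDS = ["db-shard-0.example.net",
          "db-shard-1.example.net",
          "db-shard-2.example.net"]

def shard_for(tenant_id: str) -> str:
    # Hash the shard key so tenants spread evenly across shards.
    digest = hashlib.sha1(tenant_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("contoso"))   # the same tenant always lands on one shard
```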
In this talk we will look to the future, introducing Spark as an alternative to the classic Hadoop MapReduce engine. We will describe the most important differences between the two, detail the main components that make up the Spark ecosystem, and introduce basic concepts for getting started with developing basic applications on it.
Scalable Machine Learning with R and Spark on Azure HDInsight – OSS On Azure
From the "Azure, Taking Flight with Open Source" webinar series, part 2, 2017-12-14
Presenters: Han Seok-jin (Microsoft), Choi Deok-soon (Rockplace)
- Introduction to scalable machine learning with R and Spark on Azure HDInsight
- Inquiries: Rockplace MS business division ([email protected])
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types like strings, hashes, lists, sets, sorted sets, and streams. Examples use cases for Redis are also provided like leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns like embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts like tables, items, attributes, and primary keys.
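A minimal sketch of the leaderboard use case mentioned, using Redis sorted sets via the redis-py client (the host and key names are illustrative):

```python
# Host and key names are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379)

r.zadd("leaderboard", {"alice": 3200, "bob": 2900, "carol": 4100})
r.zincrby("leaderboard", 150, "bob")   # bump a player's score

# Top three players, highest score first.
for player, score in r.zrevrange("leaderboard", 0, 2, withscores=True):
    print(player.decode(), int(score))
```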
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Best Practices for Building and Deploying Data Pipelines in Apache Spark – Databricks
Many data pipelines share common characteristics and are often built in similar but bespoke ways, even within a single organisation. In this talk, we will outline the key considerations which need to be applied when building data pipelines, such as performance, idempotency, reproducibility, and tackling the small file problem. We’ll work towards describing a common Data Engineering toolkit which separates these concerns from business logic code, allowing non-Data-Engineers (e.g. Business Analysts and Data Scientists) to define data pipelines without worrying about the nitty-gritty production considerations.
We’ll then introduce an implementation of such a toolkit in the form of Waimak, our open-source library for Apache Spark (https://github.com/CoxAutomotiveDataSolutions/waimak), which has massively shortened our route from prototype to production. Finally, we’ll define new approaches and best practices about what we believe is the most overlooked aspect of Data Engineering: deploying data pipelines.
The document provides an introduction to the R programming language. It discusses that R is an open-source programming language for statistical analysis and graphics. It can run on Windows, Unix and MacOS. The document then covers downloading and installing R and R Studio, the R workspace, basics of R syntax like naming conventions and assignments, working with data in R including importing, exporting and creating calculated fields, using R packages and functions, and resources for R help and tutorials.
HCL Nomad Web – Best Practices and Managing Multiuser Environments – panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster – All Things Open
Presented at All Things Open RTP Meetup
Presented by Brent Laster - President & Lead Trainer, Tech Skills Transformations LLC
Talk Title: AI 3-in-1: Agents, RAG, and Local Models
Abstract:
Learning and understanding AI concepts is satisfying and rewarding, but the fun part is learning how to work with AI yourself. In this presentation, author, trainer, and experienced technologist Brent Laster will help you do both! We’ll explain why and how to run AI models locally, the basic ideas of agents and RAG, and show how to assemble a simple AI agent in Python that leverages RAG and uses a local model through Ollama.
No experience is needed on these technologies, although we do assume you do have a basic understanding of LLMs.
This will be a fast-paced, engaging mixture of presentations interspersed with code explanations and demos building up to the finished product – something you’ll be able to replicate yourself after the session!
The cost benefit of implementing a Dell AI Factory solution versus AWS and Azure
Our research shows that hosting GenAI workloads on premises, either in a traditional Dell solution or using managed Dell APEX Subscriptions, could significantly lower your GenAI costs over 4 years compared to hosting these workloads in the cloud. In fact, we found that a Dell AI Factory on-premises solution could reduce costs by as much as 71 percent vs. a comparable AWS SageMaker solution and as much as 61 percent vs. a comparable Azure ML solution. These results show that organizations looking to implement GenAI and reap the business benefits to come can find many advantages in an on-premises Dell AI Factory solution, whether they opt to purchase and manage it themselves or engage with Dell APEX Subscriptions. Choosing an on-premises Dell AI Factory solution could save your organization significantly over hosting GenAI in the cloud, while giving you control over the security and privacy of your data as well as any updates and changes to the environment, and while ensuring your environment is managed consistently.
Train Smarter, Not Harder – Let 3D Animation Lead the Way!
Discover how 3D animation makes inductions more engaging, effective, and cost-efficient.
Check out the slides to see how you can transform your safety training process!
Slide 1: Why 3D animation changes the game
Slide 2: Site-specific induction isn’t optional—it’s essential
Slide 3: Visitors are most at risk. Keep them safe
Slide 4: Videos beat text—especially when safety is on the line
Slide 5: TechEHS makes safety engaging and consistent
Slide 6: Better retention, lower costs, safer sites
Slide 7: Ready to elevate your induction process?
Can an animated video make a difference to your site's safety? Let's talk.
The Future of Cisco Cloud Security: Innovations and AI Integration – Re-solution Data Ltd
Stay ahead with Re-Solution Data Ltd and Cisco cloud security, featuring the latest innovations and AI integration. Our solutions leverage cutting-edge technology to deliver proactive defense and simplified operations. Experience the future of security with our expert guidance and support.
Vaibhav Gupta BAML: AI workflows without hallucinations – john409870
Shipping Agents
Vaibhav Gupta, Cofounder @ Boundary (in/vaigup) – boundaryml/baml

Imagine if every API call you made failed only 5% of the time.
Imagine if every LLM call you made failed only 5% of the time.
Fault-tolerant systems are hard, but now everything must be fault tolerant.
We need to change how we think about these systems.

Aaron Villalpando, Cofounder @ Boundary

We used to write websites like this:
But now we do this:
Problems web dev had:
- Strings. Strings everywhere.
- State management was impossible.
- Dynamic components? Forget about it.
- Reuse components? Good luck.
- Iteration loops took minutes.
- Low engineering rigor.
React added engineering rigor.
The syntax we use changes how we think about problems.

We used to write agents like this:
Problems agents have:
- Strings. Strings everywhere.
- Context management is impossible.
- Changing one thing breaks another.
- New models come out all the time.
- Iteration loops take minutes.
- Low engineering rigor.
Agents need the expressiveness of English, but the structure of code.

F*** You, Show Me The Prompt.
<show don't tell>
Less prompting + More engineering = Reliability + Maintainability

BAML
The team: Sam (turned down OpenAI to join), Greg (ex-founder, one of the earliest BAML users), Antonio (MIT PhD, 20+ years in compilers), Chris (made his own database, 400k+ YouTube views).

Vaibhav Gupta (in/vaigup, [email protected]) – boundaryml/baml
Thank you!
Viam product demo_ Deploying and scaling AI with hardware.pdf – camilalamoratta
Building AI-powered products that interact with the physical world often means navigating complex integration challenges, especially on resource-constrained devices.
You'll learn:
- How Viam's platform bridges the gap between AI, data, and physical devices
- A step-by-step walkthrough of computer vision running at the edge
- Practical approaches to common integration hurdles
- How teams are scaling hardware + software solutions together
Whether you're a developer, engineering manager, or product builder, this demo will show you a faster path to creating intelligent machines and systems.
Resources:
- Documentation: https://on.viam.com/docs
- Community: https://discord.com/invite/viam
- Hands-on: https://on.viam.com/codelabs
- Future Events: https://on.viam.com/updates-upcoming-events
- Request personalized demo: https://on.viam.com/request-demo
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:... – Raffi Khatchadourian
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. While hybrid approaches aim for the "best of both worlds," the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges---and resultant bugs---involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation---the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.
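As a hedged illustration of the hybridization the study examines (using TensorFlow's tf.function, one concrete hybrid mechanism; the function itself is invented), the same imperative code can run eagerly for debugging or be traced into a graph for speed:

```python
# The function is invented; tf.function is TensorFlow's hybrid mechanism,
# one concrete instance of the approaches the study examines.
import tensorflow as tf

def square_sum(x, y):
    return tf.reduce_sum(x * x + y * y)

graph_square_sum = tf.function(square_sum)   # traced into a dataflow graph

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
print(square_sum(x, y).numpy())        # eager: natural to write and debug
print(graph_square_sum(x, y).numpy())  # graph: optimized, but only traceable code
```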
Canadian book publishing: Insights from the latest salary survey - Tech Forum...BookNet Canada
Join us for a presentation in partnership with the Association of Canadian Publishers (ACP) as they share results from the recently conducted Canadian Book Publishing Industry Salary Survey. This comprehensive survey provides key insights into average salaries across departments, roles, and demographic metrics. Members of ACP’s Diversity and Inclusion Committee will join us to unpack what the findings mean in the context of justice, equity, diversity, and inclusion in the industry.
Results of the 2024 Canadian Book Publishing Industry Salary Survey: https://publishers.ca/wp-content/uploads/2025/04/ACP_Salary_Survey_FINAL-2.pdf
Link to presentation recording and transcript: https://bnctechforum.ca/sessions/canadian-book-publishing-insights-from-the-latest-salary-survey/
Presented by BookNet Canada and the Association of Canadian Publishers on May 1, 2025 with support from the Department of Canadian Heritage.
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop oriented for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Web & Graphics Designing Training at Erginous Technologies in Rajpura offers practical, hands-on learning for students, graduates, and professionals aiming for a creative career. The 6-week and 6-month industrial training programs blend creativity with technical skills to prepare you for real-world opportunities in design.
The course covers Graphic Designing tools like Photoshop, Illustrator, and CorelDRAW, along with logo, banner, and branding design. In Web Designing, you’ll learn HTML5, CSS3, JavaScript basics, responsive design, Bootstrap, Figma, and Adobe XD.
Erginous emphasizes 100% practical training, live projects, portfolio building, expert guidance, certification, and placement support. Graduates can explore roles like Web Designer, Graphic Designer, UI/UX Designer, or Freelancer.
For more info, visit erginous.co.in, message us on Instagram at erginoustechnologies, or call directly at +91-89684-38190. Start your journey toward a creative and successful design career today!
GyrusAI - Broadcasting & Streaming Applications Driven by AI and MLGyrus AI
Gyrus AI: AI/ML for Broadcasting & Streaming
Gyrus is a Vision AI company developing Neural Network Accelerators and ready-to-deploy AI/ML models for Video Processing and Video Analytics.
Our Solutions:
Intelligent Media Search
Semantic & contextual search for faster, smarter content discovery.
In-Scene Ad Placement
AI-powered ad insertion to maximize monetization and user experience.
Video Anonymization
Automatically masks sensitive content to ensure privacy compliance.
Vision Analytics
Real-time object detection and engagement tracking.
Why Gyrus AI?
We help media companies streamline operations, enhance media discovery, and stay competitive in the rapidly evolving broadcasting & streaming landscape.
🚀 Ready to Transform Your Media Workflow?
🔗 Visit Us: https://gyrus.ai/
📅 Book a Demo: https://gyrus.ai/contact
📝 Read More: https://gyrus.ai/blog/
🔗 Follow Us:
LinkedIn - https://www.linkedin.com/company/gyrusai/
Twitter/X - https://twitter.com/GyrusAI
YouTube - https://www.youtube.com/channel/UCk2GzLj6xp0A6Wqix1GWSkw
Facebook - https://www.facebook.com/GyrusAI
Overview of running R in the Oracle Database
1. www.oralytics.com | t: @brendantierney | e: brendan.tierney@oralytics.com
Running R in the Database using Oracle R Enterprise
Brendan Tierney
Code Demo
2. Brendan Tierney
§ Data Warehousing since 1997
§ Data Mining since 1998
§ Analytics since 1993
3. Agenda
• What is R?
• Oracle Advanced Analytics Option
• Oracle R Technologies & Oracle R Enterprise
• Examples of using ORE
• Creating & running R in the Database
• How to run R in the Database using SQL
• Using ORE with other products
Code Demo
4. Agenda
5. R
§ R: open source statistical computing and graphics language
§ Started in 1993 as an alternative to SAS, SPSS and other proprietary statistical packages
  • Originally called S, renamed to R in 1996
§ R is a client and server bundled together as one executable
  • It is a single user tool
  • It is not multi-threaded
  • Constrained to a single CPU
§ Millions of R users worldwide
§ Thousands of libraries available at http://cran.r-project.org
§ Free
Milestones:
2017-01-09: 9870 packages
2016-06-01: 8492 packages
2015-03-13: 6400 packages
2015-02-15: 6325 packages
2014-10-29: 6000 packages
2013-11-08: 5000 packages
2012-08-23: 4000 packages
2011-05-12: 3000 packages
2009-10-04: 2000 packages
2007-04-12: 1000 packages
2004-10-01: 500 packages
2003-04-01: 250 packages
6. § Is used everywhere
§ Particularly in the USA
  – And elsewhere
7. Examples of R
8. > library(RJDBC)
> # Create the connection driver and open the connection
> jdbcDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="c:/ojdbc6.jar")
> jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//localhost:1521/orcl", "dmuser", "dmuser")
> # List the tables in the schema
> #dbListTables(jdbcConnection)
> # Get the DB connection details - it gets LOTS of info - do not run unless it is really needed
> dbGetInfo(jdbcConnection)
> # Query the Oracle instance name
> #instanceName <- dbGetQuery(jdbcConnection, "SELECT instance_name FROM v$instance")
> #print(instanceName)
> tableNames <- dbGetQuery(jdbcConnection, "SELECT table_name from user_tables where table_name not like 'DM$%' and table_name not like 'ODMR$%'")
> print(tableNames)
  TABLE_NAME
1 INSUR_CUST_LTV_SAMPLE
2 OUTPUT_1_2
> viewNames <- dbGetQuery(jdbcConnection, "SELECT view_name from user_views")
> print(viewNames)
  VIEW_NAME
1 MINING_DATA_APPLY_V
2 MINING_DATA_BUILD_V
3 MINING_DATA_TEST_V
4 MINING_DATA_TEXT_APPLY_V
5 MINING_DATA_TEXT_BUILD_V
6 MINING_DATA_TEXT_TEST_V
> dbDisconnect(jdbcConnection)
Using RJDBC
9. > library(ROracle)
> drv <- dbDriver("Oracle")
> # Create the connection string
> host <- "localhost"
> port <- 1521
> sid <- "orcl"
> connect.string <- paste("(DESCRIPTION=", "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
+                         "(CONNECT_DATA=(SID=", sid, ")))", sep = "")
> con <- dbConnect(drv, username = "dmuser", password = "dmuser", dbname = connect.string)
> rs <- dbSendQuery(con, "select view_name from user_views")
> # Fetch records from the result set into a data.frame
> data <- fetch(rs)
> # Extract all rows
> dim(data)
[1] 6 1
> data
  VIEW_NAME
1 MINING_DATA_APPLY_V
2 MINING_DATA_BUILD_V
3 MINING_DATA_TEST_V
4 MINING_DATA_TEXT_APPLY_V
5 MINING_DATA_TEXT_BUILD_V
6 MINING_DATA_TEXT_TEST_V
> dbCommit(con)
> dbClearResult(rs)
> dbDisconnect(con)
Using ROracle
- Needs the Oracle Client in the search path
- Pulls the data to the client
- Has a set of R functions tuned for the Oracle DB
10. The Challenges
§ Scalability
  § Regardless of the number of cores on your CPU, R will only use 1 on a default build
§ Performance
  § R reads data into memory by default. Easy to exhaust RAM by storing unnecessary data. Typically R will throw an exception at 2GB.
  § Parallelization can be a challenge. It is not the default. Packages are available.
§ Production Deployment
  § Difficulties deploying R in production
  § Typically need to re-code in …..
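The parallelization point is easy to demonstrate. Here is a minimal sketch (not from the deck) using base R's parallel package, which ships with R but does nothing unless invoked explicitly:

# Minimal sketch: base R ships the 'parallel' package, but a default
# session still runs on a single core unless you ask for more.
library(parallel)
detectCores()     # number of cores available on this machine
# Square 1..8 across 2 forked worker processes. Forking (mc.cores > 1)
# is not available on Windows; use makeCluster()/parLapply() there.
res <- mclapply(1:8, function(x) x^2, mc.cores = 2)
unlist(res)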
11. Agenda
12. Comprehensive Advanced Analytics Platform
13. In-Database Data Mining Algorithms
Technique: Classification
  Logistic Regression (GLM) - classical statistical technique
  Decision Trees - popular / rules / transparency
  Naïve Bayes - embedded
  Support Vector Machine - wide / narrow data / text
Technique: Regression
  Multiple Regression - classical statistical technique
  Support Vector Machine - wide / narrow data / text
Technique: Anomaly Detection
  One-Class SVM - lack of examples
Technique: Attribute Importance
  Minimum Description Length - attribute reduction; identify useful data; reduce data noise
Technique: Association Rules
  Apriori - market basket analysis; link analysis
Technique: Clustering
  Enhanced k-Means, O-Cluster, Expectation Maximization - product grouping; text mining; gene and protein analysis
Technique: Feature Extraction
  Non-Negative Matrix Factorization, Principal Components Analysis, Singular Value Decomposition - text analysis; feature reduction
14. Oracle Data Mining
§ PL/SQL Packages
  § DBMS_DATA_MINING
  § DBMS_DATA_MINING_TRANSFORM
  § DBMS_PREDICTIVE_ANALYTICS
§ SQL Functions
  – PREDICTION
  – PREDICTION_PROBABILITY
  – PREDICTION_BOUNDS
  – PREDICTION_COST
  – PREDICTION_DETAILS
  – PREDICTION_SET
  – CLUSTER_ID
  – CLUSTER_DETAILS
  – CLUSTER_DISTANCE
  – CLUSTER_PROBABILITY
  – CLUSTER_SET
  – FEATURE_ID
  – FEATURE_DETAILS
  – FEATURE_SET
  – FEATURE_VALUE
§ 12c – Predictive Queries
  § aka Dynamic Queries
  § Transient, dynamic Data Mining models
  § Can scale to many 100+ models all in one statement
15. Statistical Functions in Oracle
All of these are FREE with the Database
These are often forgotten about
16. Comprehensive Advanced Analytics Platform
17. Comprehensive Advanced Analytics Platform
18. Agenda
19. Oracle R Technologies
R Distribution: Oracle's supported redistribution of open source R, provided as a free download from Oracle, enhanced with dynamic loading of high performance linear algebra libraries.
Oracle R Enterprise: Integration of R with Oracle Database. A component of the Oracle Advanced Analytics Option. Oracle R Enterprise makes the open source R statistical programming language and environment ready for the enterprise with scalability, performance, and ease of production deployment.
Oracle R Advanced Analytics for Hadoop: High performance native access to the Hadoop Distributed File System (HDFS) and MapReduce programming framework for R users. Oracle R Advanced Analytics for Hadoop is a component of the Oracle Big Data Connectors software suite.
ROracle: An open source R package, maintained by Oracle and enhanced to use the Oracle Call Interface (OCI) libraries to handle database connections - providing a high-performance, native C-language interface to Oracle Database.
20. Oracle R Enterprise
§ R installed in ORACLE_HOME
§ Fully integrated with the database
§ Overcomes the limitations of R
  § Utilizes the DB performance and scalability features
§ Full integration into the DB engine
  § Can run R inside the DB
  § Can store R objects in the DB
  § Can run R objects using SQL & PL/SQL
§ Easily integrated into other Oracle Tools and Applications
§ Greatly expands the statistics & analytics
§ Easily integrates new "bells & whistles" packages as they become available
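As a rough illustration of "store R objects in the DB", the sketch below uses the ORE client API; the connection details and the datastore name are illustrative, not values from this deck:

# Rough sketch, assuming an ORE client installation; user/sid/host are examples.
library(ORE)
ore.connect(user = "dmuser", sid = "orcl", host = "localhost",
            password = "dmuser", all = TRUE)   # schema tables appear as ore.frames
ore.ls()                              # list the database objects now visible to R
fit <- lm(dist ~ speed, data = cars)  # an ordinary, local R object
ore.save(fit, name = "myDatastore")   # persist the R object inside the database
rm(fit)
ore.load(name = "myDatastore")        # restore it in this or a later session
ore.disconnect()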
21. Oracle R Enterprise
22. The Magic: The Transparency Layer
§ No need to learn a different programming paradigm or environment
  • If you are an R programmer
§ Operate on database data as though they were R objects, using R syntax
§ Requires minimal change to base R scripts for database data
§ Implicitly translates R to SQL for in-database execution, performance, and scalability
23. Oracle R Enterprise Transparency Layer
> AggData <- aggregate(CUSTOMER_V$CUST_ID,
+                      by = list(CUST_GENDER = CUSTOMER_V$CUST_GENDER),
+                      FUN = length)
> # Display the results
> AggData
  CUST_GENDER     x
F           F 18325
M           M 37175

The SQL generated and executed in the database:
select cust_gender, count(*) X
from   customer_v
group by cust_gender;

CUST_GENDER          X
----------- ----------
F                18325
M                37175
24. The Magic: The Transparency Layer
§ R interface to in-Database Statistical functions
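For example (a sketch; MINING_DATA_BUILD_V is the ODM demo schema view used elsewhere in this deck), familiar R statistics functions are overloaded so that the computation happens in the database:

# Sketch: these overloaded functions push the work into the database.
mdbv <- MINING_DATA_BUILD_V   # an ore.frame proxy, no rows are pulled
mean(mdbv$AGE)                # computed in-database via generated SQL
sd(mdbv$AGE)
summary(mdbv$AGE)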
25. The Magic: The Transparency Layer
§ R interface to in-Database Predictive Analytics functions
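A sketch of what this looks like with the ore.odm* wrappers (columns and view names follow the ODM demo schema; treat the exact arguments as illustrative):

# Sketch: fit an in-database GLM through ORE's wrapper for the ODM algorithm.
glmFit <- ore.odmGLM(AFFINITY_CARD ~ AGE + YRS_RESIDENCE + HOUSEHOLD_SIZE,
                     data = MINING_DATA_BUILD_V, type = "logistic")
summary(glmFit)
# Scoring another ore.frame also runs inside the database.
scored <- predict(glmFit, MINING_DATA_APPLY_V, supplemental.cols = "CUST_ID")
head(scored)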
26. The Magic: The Transparency Layer
§ In R, all commands are executed immediately
§ In ORE, commands are not executed immediately
  § They are stacked and accumulated
§ When a final result is needed (for computation or viewing):
  § Oracle will perform some optimization and reorganization of the commands
  § Oracle will perform query optimization
  § Oracle will translate into in-database SQL and PL/SQL functions/procedures/packages
  § Execute the SQL (R) commands on the data in the database
  § Oracle will manage the results
  § The returned results will be translated back into R format
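A small sketch of this deferred execution (the view name comes from the ODM demo schema):

# Sketch: ORE operations accumulate; nothing runs until a result is needed.
mdbv   <- MINING_DATA_BUILD_V                  # proxy only, no data moved
adults <- mdbv[mdbv$AGE >= 18, c("AGE", "AFFINITY_CARD")]  # still no SQL executed
nrow(adults)               # a result is needed, so the SQL runs in the database now
local <- ore.pull(adults)  # only ore.pull() copies rows back to the R client
class(local)               # a plain data.frame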
27. Agenda
You can run all the example code if you have the ODM Demo Schema created
Code Demo
29. Agenda
31. Agenda
32. Embedded R Execution using SQL
34. --
-- Now let us use the Demo_GLM_Batch script to score data in Real-Time
-- The data values are passed to the GLM model
--
select * from table(rqTableEval(
  cursor(select 'M' CUST_GENDER,
                23 AGE,
                'Married' CUST_MARITAL_STATUS,
                'United States of America' COUNTRY_NAME,
                'B: 30,000 - 49,999' CUST_INCOME_LEVEL,
                'Assoc-A' EDUCATION,
                '3' HOUSEHOLD_SIZE,
                5 YRS_RESIDENCE
         from dual),
  cursor(select 'myDatastore' "datastore_name", 1 "ore.connect" from dual),
  'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY',
  'Demo_GLM_Batch')) order by 1, 2, 3;

What-if analysis: you can easily use this in-database R code in your applications.
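The deck itself does not show the Demo_GLM_Batch script; a hypothetical sketch of how such a script might be registered from R follows (the model object name and the datastore handling are assumptions):

# Hypothetical sketch only; the real Demo_GLM_Batch source is not in the slides.
ore.scriptCreate("Demo_GLM_Batch",
  function(dat, datastore_name) {
    ore.load(name = datastore_name)          # restore, e.g., a saved 'glmFit' model
    pred <- predict(glmFit, newdata = dat)   # 'glmFit' assumed to be in the datastore
    cbind(dat, PRED = pred)                  # must match the signature query above
  })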
35. Agenda
36. Integrating with OBIEE, BI Publisher & any other Language or Tool
[Architecture diagram: the Database Server Machine hosts the Oracle Database (tables, in-database SQL functions) along with the R language installation, ORE installed packages, and R scripts]
37. --
-- Create an embedded R script
-- Called using the ORE SQL API
--  - performs an aggregation of the data
--  - creates a graphic plot
--
begin
  sys.rqScriptDrop('AgeProfile');
  sys.rqScriptCreate('AgeProfile',
    'function(dat) {
       mdbv <- dat
       aggdata <- aggregate(mdbv$AFFINITY_CARD,
                            by = list(Age = mdbv$AGE),
                            FUN = length)
       res <- plot(aggdata$Age, aggdata$x, type = "l") }');
end;
/
--
-- Execute the embedded R script
--  - Graphic created in PNG format for import into OBIEE
--  - change PNG to XML for BI Publisher
--
select * from table(rqTableEval( cursor(select * from MINING_DATA_BUILD_V),
                                 cursor(select 1 "ore.connect" from dual),
                                 'PNG', 'AgeProfile'));

[ORE Demo: data and chart output]
39. Embedded R Execution using SQL
40. The Challenges: With ORE - "Yes we can"
§ Scalability ✔
  § Regardless of the number of cores on your CPU, R will only use 1 on a default build
§ Performance ✔
  § R reads data into memory by default. Easy to exhaust RAM by storing unnecessary data. Typically R will throw an exception at 2GB.
  § Parallelization can be a challenge. It is not the default. Packages are available.
§ Production Deployment ✔
  § Difficulties deploying R in production
  § May need to re-code in …..
Scale with the Database. In-database R execution. Easy integration with all your applications.
41. Data Mining / Data Science in Oracle
Is just SQL (Oracle Data Mining, SQL Statistics functions) + R (Oracle R Enterprise)
42. The Challenges: With ORE - "Yes we can"
What if I want to use a new R Package? Easy, just install it on the DB server and off you go!!!
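For instance (illustrative; the package name is just an example), once a package has been installed in the database server's R engine, embedded execution can load and use it:

# Illustrative sketch: run a function inside the database's R engine, using a
# package installed on the DB server (e1071 is only an example name).
res <- ore.doEval(function() {
  library(e1071)
  as.character(packageVersion("e1071"))
})
res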
43. brendan.tierney@oralytics.com
@brendantierney
www.oralytics.com
ie.linkedin.com/in/brendantierney
44. Word Cloud of the Oracle Advanced Analytics web-pages
http://www.oralytics.com/2015/01/creating-word-cloud-of-oracle-oaa.html