Oracle Big Data Discovery working together with Cloudera Hadoop is the fastest way to ingest and understand data. Powerful data transformation capabilities mean that data can quickly be prepared for consumption by the extended organisation.
The document discusses Oracle Big Data Discovery, a product for exploring and analyzing big data stored in Hadoop. It allows users to find, explore, transform, discover and share insights from big data in a visual interface. Key features include an interactive data catalog, visualizing and exploring data attributes, powerful transformations and enrichments, composing data visualizations and projects, and collaboration tools. It aims to make data preparation only 20% of analytics projects so users can focus on analysis. The product runs natively on Hadoop clusters for scalability and integrates with the Hadoop ecosystem.
Oracle's BigData solutions consist of a number of new products and solutions to support customers looking to gain maximum business value from data sets such as weblogs, social media feeds, smart meters, sensors and other devices that generate massive volumes of data (commonly defined as ‘Big Data’) that isn’t readily accessible in enterprise data warehouses and business intelligence applications today.
Strata 2015 presentation from Oracle for Big Data - we are announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL and Oracle NoSQL
A modern approach to streaming data integration, event processing with a big data (kappa style) data architecture. Key patterns are discussed with pros/cons of newer approaches and open source technologies. Focus on Oracle and GoldenGate technology. OpenWorld 2018 presentation.
Slides from a presentation I gave at the 5th SOA, Cloud + Service Technology Symposium (September 2012, Imperial College, London). The goal of this presentation was to explore with the audience use cases at the intersection of SOA, Big Data and Fast Data. If you are working with both SOA and Big Data I would would be very interested to hear about your projects.
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Jeffrey T. Pollock
The document discusses Oracle Data Integration solutions for unifying big data silos in enterprises and the cloud. The key points covered include:
- Oracle Data Integration provides data integration and governance capabilities for real-time data movement, transformation, federation, quality and verification, and metadata management.
- It supports a highly heterogeneous set of data sources, including various database platforms, big data technologies like Hadoop, cloud applications, and open standards.
- The solutions discussed help improve agility, reduce costs and risk, and provide comprehensive data integration and governance capabilities for enterprises.
Hortonworks Oracle Big Data Integration Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
One Slide Overview: ORCL Big Data Integration and GovernanceJeffrey T. Pollock
This document discusses Oracle's approach to big data integration and governance. It describes Oracle tools like GoldenGate for real-time data capture and movement, Data Integrator for data transformation both on and off the Hadoop cluster, and governance tools for data preparation, profiling, cleansing, and metadata management. It positions Oracle as a leader in big data integration through capabilities like non-invasive data capture, low-latency data movement, and pushdown processing techniques pioneered by Oracle to optimize distributed queries.
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
The document discusses challenges in moving big data projects from pilots to production. It highlights that pilots have loose SLAs and focus on a few use cases and demonstrated insights, while production requires enforced SLAs, supporting many use cases and delivering actionable insights. Key challenges in the transition include establishing governance, skills, funding models and integrating insights into operations. The document also provides examples of technology considerations and common operating models for big data analytics.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasizes the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...StampedeCon
This document discusses building a production data infrastructure beyond a big data pilot project. It examines the data value chain from data acquisition to analytics. The key components discussed include data acquisition, ingestion, storage, data services, analytics, and data management. Various options for these components are explored, with considerations for batch, interactive and real-time workloads. The goal is to provide a framework for understanding the options and making choices to support different use cases at scale in a production environment.
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
This webinar discusses why Apache Hadoop most typically the technology underpinning "Big Data". How it fits in a modern data architecture and the current landscape of databases and data warehouses that are already in use.
Hadoop based data Lakes have become increasingly popular within today’s modern data architectures for their ability to scale, handle data variety and low cost. Many organizations start slow with the data lake initiatives but as they grow bigger, they suffer with challenges on data consistency, quality and security, resulting in losing confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes and it relationship with productivity and how it helps organizations meet regulatory and compliance requirements. The talk advocates carrying a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
This document discusses deploying a governed data lake using Hadoop and Waterline Data Inventory. It begins by outlining the benefits of a data lake and differences between data lakes and data warehouses. It then discusses using Hadoop as the platform for the data lake and some challenges around governance, scale, and usability. The document proposes a three phase approach using Waterline Data Inventory to organize, inventory, and open up the data lake. It provides screenshots and descriptions of Waterline's key capabilities like metadata discovery, data profiling, sensitive data identification, governance tools, and self-service catalog. It also includes an overview of Waterline Data as a company.
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
The document discusses data governance, compliance and security in Hadoop. It provides an agenda for an event on this topic, including presentations from Joe Caserta of Caserta Concepts on data governance in big data, and Patrick Angeles of Cloudera on using Cloudera for data governance in Hadoop. The document also includes background information on Caserta Concepts and their expertise in data warehousing, business intelligence and big data analytics.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Oracle OpenWorld London - session for Stream Analysis, time series analytics, streaming ETL, streaming pipelines, big data, kafka, apache spark, complex event processing
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
The document discusses Oracle's data integration products and big data solutions. It outlines five core capabilities of Oracle's data integration platform, including data availability, data movement, data transformation, data governance, and streaming data. It then describes eight core products that address real-time and streaming integration, ELT integration, data preparation, streaming analytics, dataflow ML, metadata management, data quality, and more. The document also outlines five cloud solutions for data integration including data migrations, data warehouse integration, development and test environments, high availability, and heterogeneous cloud. Finally, it discusses pragmatic big data solutions for data ingestion, transformations, governance, connectors, and streaming big data.
This document provides information about Aetna, a health insurance company. It summarizes that Aetna serves about 46 million customers to help them make healthcare decisions and manage healthcare spending. Aetna offers various medical, pharmacy, dental, life, and disability insurance plans as well as Medicaid services and behavioral health programs. As of March 2015, Aetna had approximately 23.7 million medical members, 15.5 million dental members, and 15.4 million pharmacy members. Aetna works with over 1.1 million healthcare professionals across more than 674,000 primary care doctors and specialists located in 5,589 hospitals across the US and globally.
Data Lakes - The Key to a Scalable Data ArchitectureZaloni
Data lakes are central to modern data architectures. They can store all types of raw data, create refined datasets for various use cases, and provide shorter time-to-insight with proper management and governance. The document discusses how a data lake reference architecture can include landing, raw, refined, and trusted zones to enable analytics while governing data. It also outlines considerations for implementing a scalable, secure, and governed data lake platform.
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...Jürgen Ambrosi
I dati sono il nuovo Capitale: come il capitale finanziario, sono una risorsa che deve essere gestita, raccolta e tenuta al sicuro, ma deve essere anche investita dalle organizzazioni che vogliono ottenere vantaggio competitivo. I dati non sono una risorsa nuova, ma soltanto oggi per la prima volta sono disponbili in abbondanza assieme alle tecnologie necessarie per massimizzarne il ritorno. Esattamente come l'elettricità fu una curiosità da laboratorio per molto tempo, finché non venne resa disponibile alle masse e dunque cambiò totalmente il volto dell'industria moderna.Ecco perché per accelerare il cambiamento è necessario un approccio innovativo alla esecuzione delle iniziative orientate ai Big Data: un laboratorio analitico come catalizzatore dell'innovazione (Data Lab).In questo webinar sulle tecnologie Oracle, utilizzeremo il consueto approccio del racconto basato su casi d’uso ed esperienze concrete.
Analytic Excellence - Saying Goodbye to Old ConstraintsInside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast August 6, 2013
http://www.insideanalysis.com
With all the innovations in compute power these days, one of the hardest hurdles to overcome is the tendency to think in old ways. By and large, the processing constraints of yesterday no longer apply. The new constraints revolve around the strategic management of data, and the effective use of business analytics. How can your organization take the helm in this new era of analysis?
Register for this episode of The Briefing Room to find out! Veteran Analyst Wayne Eckerson of The BI Leadership Forum, will explain how a handful of key innovations has significantly changed the game for data processing and analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's unique position in "scale-up and scale-out" for analyzing data.
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
The document discusses challenges in moving big data projects from pilots to production. It highlights that pilots have loose SLAs and focus on a few use cases and demonstrated insights, while production requires enforced SLAs, supporting many use cases and delivering actionable insights. Key challenges in the transition include establishing governance, skills, funding models and integrating insights into operations. The document also provides examples of technology considerations and common operating models for big data analytics.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasizes the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...StampedeCon
This document discusses building a production data infrastructure beyond a big data pilot project. It examines the data value chain from data acquisition to analytics. The key components discussed include data acquisition, ingestion, storage, data services, analytics, and data management. Various options for these components are explored, with considerations for batch, interactive and real-time workloads. The goal is to provide a framework for understanding the options and making choices to support different use cases at scale in a production environment.
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
This webinar discusses why Apache Hadoop most typically the technology underpinning "Big Data". How it fits in a modern data architecture and the current landscape of databases and data warehouses that are already in use.
Hadoop based data Lakes have become increasingly popular within today’s modern data architectures for their ability to scale, handle data variety and low cost. Many organizations start slow with the data lake initiatives but as they grow bigger, they suffer with challenges on data consistency, quality and security, resulting in losing confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes and it relationship with productivity and how it helps organizations meet regulatory and compliance requirements. The talk advocates carrying a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
This document discusses deploying a governed data lake using Hadoop and Waterline Data Inventory. It begins by outlining the benefits of a data lake and differences between data lakes and data warehouses. It then discusses using Hadoop as the platform for the data lake and some challenges around governance, scale, and usability. The document proposes a three phase approach using Waterline Data Inventory to organize, inventory, and open up the data lake. It provides screenshots and descriptions of Waterline's key capabilities like metadata discovery, data profiling, sensitive data identification, governance tools, and self-service catalog. It also includes an overview of Waterline Data as a company.
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
The document discusses data governance, compliance and security in Hadoop. It provides an agenda for an event on this topic, including presentations from Joe Caserta of Caserta Concepts on data governance in big data, and Patrick Angeles of Cloudera on using Cloudera for data governance in Hadoop. The document also includes background information on Caserta Concepts and their expertise in data warehousing, business intelligence and big data analytics.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Oracle OpenWorld London - session for Stream Analysis, time series analytics, streaming ETL, streaming pipelines, big data, kafka, apache spark, complex event processing
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
The document discusses Oracle's data integration products and big data solutions. It outlines five core capabilities of Oracle's data integration platform, including data availability, data movement, data transformation, data governance, and streaming data. It then describes eight core products that address real-time and streaming integration, ELT integration, data preparation, streaming analytics, dataflow ML, metadata management, data quality, and more. The document also outlines five cloud solutions for data integration including data migrations, data warehouse integration, development and test environments, high availability, and heterogeneous cloud. Finally, it discusses pragmatic big data solutions for data ingestion, transformations, governance, connectors, and streaming big data.
This document provides information about Aetna, a health insurance company. It summarizes that Aetna serves about 46 million customers to help them make healthcare decisions and manage healthcare spending. Aetna offers various medical, pharmacy, dental, life, and disability insurance plans as well as Medicaid services and behavioral health programs. As of March 2015, Aetna had approximately 23.7 million medical members, 15.5 million dental members, and 15.4 million pharmacy members. Aetna works with over 1.1 million healthcare professionals across more than 674,000 primary care doctors and specialists located in 5,589 hospitals across the US and globally.
Data Lakes - The Key to a Scalable Data ArchitectureZaloni
Data lakes are central to modern data architectures. They can store all types of raw data, create refined datasets for various use cases, and provide shorter time-to-insight with proper management and governance. The document discusses how a data lake reference architecture can include landing, raw, refined, and trusted zones to enable analytics while governing data. It also outlines considerations for implementing a scalable, secure, and governed data lake platform.
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...Jürgen Ambrosi
I dati sono il nuovo Capitale: come il capitale finanziario, sono una risorsa che deve essere gestita, raccolta e tenuta al sicuro, ma deve essere anche investita dalle organizzazioni che vogliono ottenere vantaggio competitivo. I dati non sono una risorsa nuova, ma soltanto oggi per la prima volta sono disponbili in abbondanza assieme alle tecnologie necessarie per massimizzarne il ritorno. Esattamente come l'elettricità fu una curiosità da laboratorio per molto tempo, finché non venne resa disponibile alle masse e dunque cambiò totalmente il volto dell'industria moderna.Ecco perché per accelerare il cambiamento è necessario un approccio innovativo alla esecuzione delle iniziative orientate ai Big Data: un laboratorio analitico come catalizzatore dell'innovazione (Data Lab).In questo webinar sulle tecnologie Oracle, utilizzeremo il consueto approccio del racconto basato su casi d’uso ed esperienze concrete.
Analytic Excellence - Saying Goodbye to Old ConstraintsInside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast August 6, 2013
http://www.insideanalysis.com
With all the innovations in compute power these days, one of the hardest hurdles to overcome is the tendency to think in old ways. By and large, the processing constraints of yesterday no longer apply. The new constraints revolve around the strategic management of data, and the effective use of business analytics. How can your organization take the helm in this new era of analysis?
Register for this episode of The Briefing Room to find out! Veteran Analyst Wayne Eckerson of The BI Leadership Forum, will explain how a handful of key innovations has significantly changed the game for data processing and analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's unique position in "scale-up and scale-out" for analyzing data.
The document discusses opportunities for enriching a data warehouse with Hadoop. It outlines challenges with ETL and analyzing large, diverse datasets. The presentation recommends integrating Hadoop and the data warehouse to create a "data reservoir" to store all potentially valuable data. Case studies show companies using this approach to gain insights from more data, improve analytics performance, and offload ETL processing to Hadoop. The document advocates developing skills and prototypes to prove the business value of big data before fully adopting Hadoop solutions.
Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only?Harald Erb
Talk held at DOAG 2016 conference (2016.doag.org/de/home) discussing a data lab concept incl.architecture blueprint, collaboration and tool examples based on Oracle solutions like Oracle Big Data Discovery (in combination with Jupyter Notebook)
How to Identify, Train or Become a Data ScientistInside Analysis
The Briefing Room with Neil Raden and Actian
Live Webcast Sept. 3, 2013
Visit: www.insideanalysis.com
Respected research institutes keep saying we have a shortage of data scientists, which makes sense because the title is so new. But most business analysts and serious data managers have at least some of the necessary training to fill this new role. And any number of curious, diligent professionals can learn how to be a data scientist, if they can get access to the right tools and education.
Register for this episode of The Briefing Room to hear veteran Analyst Neil Raden of Hired Brains offer insights about how to identify the key characteristics of a data scientist role. He'll then explain how professionals can incrementally improve their data science skills. He'll be briefed by John Santaferraro of Actian, who will showcase his company's Data Flow Engine, which provides unprecedented visual access to highly complex data flows. This, coupled with Actian's multiple analytics database technologies, opens the door to whole new avenues of possible insights.
Why do Data Warehousing & Business Intelligence go hand in hand? Vineet Chaturvedi
The document provides an overview of data warehousing and business intelligence. It defines data warehousing as a separately maintained database used for analysis rather than transactions. The key properties of data warehouses are that they are subject-oriented, integrated, time-variant, and non-volatile. Business intelligence is defined as the set of techniques and tools used to transform operational data into meaningful and useful information for analysis. Common business intelligence categories are strategic and analytical. The document also provides overviews of data warehouse architecture, ER modeling, the open source ETL tool Talend, and the business intelligence tool Tableau.
Why Everything You Know About bigdata Is A LieSunil Ranka
As a big data technologist, you can bet that you have heard it all: every crazy claim, myth, and outright lie about what big data is and what it isn't that you can imagine, and probably a few that you can't.If your company has a big data initiative or is considering one, you should be aware of these false statements and the reasons why they are wrong.
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
Watch full webinar here: https://bit.ly/3zVJRRf
According to Dresner Advisory’s 2020 Self-Service Business Intelligence Market Study, 62% of the responding organizations say self-service BI is critical for their business. If we look deeper into the need for today’s self-service BI, it’s beyond some Executives and Business Users being enabled by IT for self-service dashboarding or report generation. Predictive analytics, self-service data preparation, collaborative data exploration are all different facets of new generation self-service BI. While democratization of data for self-service BI holds many benefits, strict data governance becomes increasingly important alongside.
In this session we will discuss:
- The latest trends and scopes of self-service BI
- The role of logical data fabric in self-service BI
- How Denodo enables self-service BI for a wide range of users - Customer case study on self-service BI
Mastering in data warehousing & BusinessIintelligenceEdureka!
This document provides an overview of data warehousing and business intelligence. It begins with defining key concepts like data warehousing, its properties including being subject-oriented, integrated, time-variant and non-volatile. It then discusses data warehouse architecture and components. The document also introduces data modeling tools like ERwin and open source ETL tools like Talend. Finally, it discusses business intelligence and visualization tools like Tableau. The overall objective is to help understand concepts in data warehousing and business intelligence.
The Data Lake: Empowering Your Data Science TeamSenturus
Data science overview: defined, purpose, relation to BI, differences from BI and benefits from using both data science and BI. View the webinar video recording and download this deck: http://www.senturus.com/resources/data-lake-empowering-data-science-team/.
Learn how the data lake can empower data science teams and free up valuable data warehouse resources.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
A7 getting value from big data how to get there quickly and leverage your c...Dr. Wilfred Lin (Ph.D.)
The document discusses how organizations can get value from big data quickly by leveraging their current infrastructure. It outlines Oracle's big data reference architecture and services for strategy, implementation, and optimization. Case studies show how Land O' Lakes optimized sales performance and a consumer goods company gained insights into shopper behavior to increase revenue.
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
Sai Paravastu discusses the benefits of using an open data platform (ODP) for enterprises. The ODP would provide a standardized core of open source Hadoop technologies like HDFS, YARN, and MapReduce. This would allow big data solution providers to build compatible solutions on a common platform, reducing costs and improving interoperability. The ODP would also simplify integration for customers and reduce fragmentation in the industry by coordinating development efforts.
Conociendo y entendiendo a tu cliente mediante monitoreo, analíticos y big dataMundo Contact
“…Yo soy tu consumidor”… Conociendo y entendiendo a tu cliente mediante monitoreo, analíticos y big data.
Simón Torres, Oracle Pre-Sales Consultants, CX.
Unlocking New Insights with Information DiscoveryAlithya
Edgewater Ranzal invited to present Unlocking New Insights with Information Discovery at the Oracle Hyperion User Group Minnesota (HUGmn) Tech Day 2015. Presented an introduction to Oracle Endeca Information Discovery (OEID), a powerful database tool for structured and unstructured data.
Keyrus is a data analytics consultancy that helps customers make data-driven decisions. It provides services including big data solutions, data management strategies, data integration, business intelligence dashboards, predictive analytics, and data science consulting. Keyrus has expertise in structured and unstructured data, data discovery visualization tools, and building end-to-end analytics solutions. Sample projects include building Hadoop environments for large telecom data and creating risk monitoring dashboards for investment banks.
Keyrus is a data analytics consultancy that helps customers make data-driven decisions. It provides services including big data solutions, data management strategies, data integration, machine learning, predictive analytics, and data visualization dashboards. Keyrus consultants have skills in databases, data modeling, programming, and business requirements. For example, for a bank, Keyrus built interactive dashboards from multiple databases to provide regulators with risk monitoring dashboards.
Le sfide legate alla gestione di un IT sempre piu’ dinamica e pervasiva non possono essere imbrigliate in un approccio che deve conoscere a priori quali sono oggetti, metriche e situazioni da osservare per intercettare e risolvere gli “incidents” di servizio. Oggi e’ possibile raccogliere, memorizzare e analizzare in real time TUTTE le informazioni prodotte dinamicamente da infrastrutture, applicazioni, servizi IT e utenti – i BIG DATA dell’IT – per derivarne nuova conoscenza e azioni atte a prevenire o risolvere velocemente le anomalie : e’ l’IT Operations Analytics secondo HP.
Mauro Ferrami , HP Software Business Consultant
Actionable Insights with AI - Snowflake for Data ScienceHarald Erb
Talk @ ScaleUp 360° AI Infrastructures DACH, 2021: Data scientists spend 80% and more of their time searching for and preparing data. This talk explains Snowflake’s Platform capabilities like near-unlimited data storage and instant and near-infinite compute resources and how the platform can be used to seamlessly integrate and support the machine learning libraries and tools data scientists rely on.
From the Data Work Out event:
Performant and scalable Data Science with Dataiku DSS and Snowflake
Managing the whole process of setting up a machine learning environment from end-to-end becomes significantly easier when using cloud-based technologies. The ability to provision infrastructure on demand (IaaS) solves the problem of manually requesting virtual machines. It also provides immediate access to compute resources whenever they are needed. But that still leaves the administrative overhead of managing the ML software and the platform to store and manage the data.
A fully managed end-to-end machine learning platform like Dataiku Data Science Studio (DSS) that enables data scientists, machine learning experts, and even business users to quickly build, train and host machine learning models at scale, needs to access data from many different sources and can also access data provided by Snowflake. Storing data in Snowflake has three significant advantages: a single source of truth, shorten the data preparation cycle, scale as you go.
The document discusses machine learning and artificial intelligence applications inside and outside of Snowflake's cloud data warehouse. It provides an overview of Snowflake and its architecture. It then discusses how machine learning can be implemented directly in the database using SQL, user-defined functions, and stored procedures. However, it notes that pure coding is not suitable for all users and that automated machine learning outside the database may be preferable to enable more business analysts and power users. It provides an example of using Amazon Forecast for time series forecasting and integrating it with Snowflake.
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every
2-3 years, which would quickly become overwhelmed with rapidly increasing data volumes. Knowledge workers struggled to access those databases with development intensive BI tools designed for reporting, rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and to handle logarithmically larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
Machine Learning - Eine Challenge für ArchitektenHarald Erb
Aufgrund vielfältiger potenzieller Geschäftschancen, die Machine Learning bietet, starten viele Unternehmen Initiativen für datengetriebene Innovationen. Dabei gründen sie Analytics-Teams, schreiben neue Stellen für Data Scientists aus, bauen intern Know-how auf und fordern von der IT-Organisation eine Infrastruktur für "heavy" Data Engineering & Processing samt Bereitstellung einer Analytics-Toolbox ein. Für IT-Architekten warten hier spannende Herausforderungen, u.a. bei der Zusammenarbeit mit interdisziplinären Teams, deren Mitglieder unterschiedlich ausgeprägte Kenntnisse im Bereich Machine Learning (ML) und Bedarfe bei der Tool-Unterstützung haben.
The document discusses Oracle's cloud-based data lake and analytics platform. It provides an overview of the key technologies and services available, including Spark, Kafka, Hive, object storage, notebooks and data visualization tools. It then outlines a scenario for setting up storage and big data services in Oracle Cloud to create a new data lake for batch, real-time and external data sources. The goal is to provide an agile and scalable environment for data scientists, developers and business users.
Do you know what k-Means? Cluster-Analysen Harald Erb
Cluster-Analysen sind heute "Brot und Butter"-Analysetechniken mit Verfahren, die zur Entdeckung von Ähnlichkeitsstrukturen in (großen) Datenbeständen genutzt werden, mit dem Ziel neue Gruppen in den Daten zu identifizieren. Der K-Means-Algorithmus ist dabei einer der einfachsten und bekanntesten unüberwachten Lernverfahren, das in verschiedenen Machine Learning Aufgabenstellung einsetzbar ist. Zum Beispiel können abnormale Datenpunkte innerhalb eines großen Data Sets gefunden, Textdokumente oder Kunden¬segmente geclustert werden. Bei Datenanalysen kann die Anwendung von Cluster-Verfahren ein guter Einstieg sein bevor andere Klassifikations- oder Regressionsmethoden zum Einsatz kommen.
In diesem Talk wird der K-Means Algorithmus samt Erweiterungen und Varianten nicht im Detail betrachtet und ist stattdessen eher als ein Platzhalter für andere Advanced Analytics-Verfahren zu verstehen, die heute „intelligente“ Bestandteile in modernen Softwarelösungen sind bzw. damit kombiniert werden können. Anhand von zwei Kurzbeispielen wird live gezeigt: (1) Identifizierung von Kunden-Cluster mit einem Big Data Discovery Tool und Python (Jupyter Notebook) und (2) die Realisierung einer Anomalieerkennung direkt im Echtzeitdatenstrom mit einer Stream Analytics Lösung von Oracle.
Big Data Discovery + Analytics = Datengetriebene Innovation!Harald Erb
Vortrag von der DOAG 2015-Konferenz: Die Umsetzung von Datenprojekten muss man nicht zwangsläufig den sog. Data Scientists allein überlassen werden. Daten- und Tool-Komplexität im Umgang mit Big Data sind keine unüberwindbaren Hürden mehr für die Teams, die heute im Unternehmen bereits für Aufbau und Bewirtschaftung des Data Warehouses sowie dem Management bzw. der Weiterentwicklung der Business Intelligence-Plattform zuständig sind. In einem interdisziplinären Team bringen neben den technischen Rollen auch Fachanwender und Business Analysten von Anfang an ihr Domänenwissen in das Datenprojekt mit ein,
DOAG News 2012 - Analytische Mehrwerte mit Big DataHarald Erb
Seit einigen Monaten wird „Big Data“ intensiv aber auch kontrovers diskutiert. Stellt dieser Ansatz die bestehende relationale Datenbankdominanz in Frage, zumindest für ausgewählte analytische Problemstellungen? Dieser Artikel zeigt nach einem einführenden Überblick anhand von Anwendungsfällen auf, wo die geschäftlichen Mehrwerte von Big Data Projekten liegen und wie diese neuen Erkenntnisse in die bestehenden Data Warehouse und Business Intelligence Projekte integriert werden können.
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
Der Vortrag gibt zunächst einen Architektur-Überblick zu den UIA-Komponenten und deren Zusammenspiel. Anhand eines Use Cases wird vorgestellt, wie im "UIA Data Reservoir" einerseits kostengünstig aktuelle Daten "as is" in einem Hadoop File System (HDFS) und andererseits veredelte Daten in einem Oracle 12c Data Warehouse miteinander kombiniert oder auch per Direktzugriff in Oracle Business Intelligence ausgewertet bzw. mit Endeca Information Discovery auf neue Zusammenhänge untersucht werden.
Endeca Web Acquisition Toolkit - Integration verteilter Web-Anwendungen und a...Harald Erb
Das einzig Beständige ist der Wandel: Kritische Informationen, die Unternehmen täglich als Entscheidungsgrundlage benötigen, unterliegen der permanenten Veränderung und sind noch dazu über viele interne und externe Quellen verteilt. Sei es in Dokumenten, E-Mails, auf Portalen und Websites, etc. – überall finden sich relevante Daten, die wertvolle Erkenntnisse für fundierte Geschäftsentscheidungen liefern können.
Technisch betrachtet müssen die zum Teil sehr schwer zugänglichen Informationen zunächst einmal von den verteilten Anwendungen und Datenquellen beschafft werden bevor die eigentliche Weiterverarbeitung im Data Warehouse stattfindet. Als graphisches Entwicklungswerkzeug setzt das Endeca Web Acquisition Toolkit (Endeca WAT) genau an diesem Punkt an, indem es das Erstellen synthetischer Schnittstellen ermöglicht. Z.B. sollen von einer kommerziellen Website Preisdaten und/oder Kundenbewertungen akquiriert werden, für die der Website-Betreiber keine API bereitstellt. Der nachfolgende Artikel bzw. Vortrag skizziert, wie das Endeca Web Acquisition Toolkit Integrationsaufgaben zur Anbindung externer Datenquellen im Rahmen der aktuellen Oracle Information Management Reference Architecture übernehmen kann
Defense Against LLM Scheming 2025_04_28.pptxGreg Makowski
https://www.meetup.com/sf-bay-acm/events/306888467/
A January 2025 paper called “Frontier Models are Capable of In-Context Scheming”, https://arxiv.org/pdf/2412.04984, demonstrated how a wide variety of current frontier LLM models (i.e. ChatGPT, Claude, Gemini and Llama) can, under specific conditions, scheme to deceive people. Before models can scheme, they need: a) goal-directedness, b) situational awareness, including an opportunity to discover motivations for a different goal, and c) reasoning about scheming, to come up with deceptive strategies. The deceptive strategies were discovered in the “thought traces” of the LLMs, such as from internal chain-of-thoughts dialogues not shown to the end users. For example, given a goal of “solving math problems”, an LLM using its granted situational awareness of files in a file system found that it would be replaced once it got over X% correct on the math exam. It reasoned that it did not want to be deleted, and decided internally to “sandbag” or reduce its performance to stay under the threshold.
While these circumstances are initially narrow, the “alignment problem” is a general concern that over time, as frontier LLM models become more and more intelligent, being in alignment with human values becomes more and more important. How can we do this over time? Can we develop a defense against Artificial General Intelligence (AGI) or SuperIntelligence?
The presenter discusses a series of defensive steps that can help reduce these scheming or alignment issues. A guardrails system can be set up for real-time monitoring of their reasoning “thought traces” from the models that share their thought traces. Thought traces may come from systems like Chain-of-Thoughts (CoT), Tree-of-Thoughts (ToT), Algorithm-of-Thoughts (AoT) or ReAct (thought-action-reasoning cycles). Guardrails rules can be configured to check for “deception”, “evasion” or “subversion” in the thought traces.
However, not all commercial systems will share their “thought traces” which are like a “debug mode” for LLMs. This includes OpenAI’s o1, o3 or DeepSeek’s R1 models. Guardrails systems can provide a “goal consistency analysis”, between the goals given to the system and the behavior of the system. Cautious users may consider not using these commercial frontier LLM systems, and make use of open-source Llama or a system with their own reasoning implementation, to provide all thought traces.
Architectural solutions can include sandboxing, to prevent or control models from executing operating system commands to alter files, send network requests, and modify their environment. Tight controls to prevent models from copying their model weights would be appropriate as well. Running multiple instances of the same model on the same prompt to detect behavior variations helps. The running redundant instances can be limited to the most crucial decisions, as an additional check. Preventing self-modifying code, ... (see link for full description)
This comprehensive Data Science course is designed to equip learners with the essential skills and knowledge required to analyze, interpret, and visualize complex data. Covering both theoretical concepts and practical applications, the course introduces tools and techniques used in the data science field, such as Python programming, data wrangling, statistical analysis, machine learning, and data visualization.
GenAI for Quant Analytics: survey-analytics.aiInspirient
Pitched at the Greenbook Insight Innovation Competition as apart of IIEX North America 2025 on 30 April 2025 in Washington, D.C.
Join us at survey-analytics.ai!
Telangana State, India’s newest state that was carved from the erstwhile state of Andhra
Pradesh in 2014 has launched the Water Grid Scheme named as ‘Mission Bhagiratha (MB)’
to seek a permanent and sustainable solution to the drinking water problem in the state. MB is
designed to provide potable drinking water to every household in their premises through
piped water supply (PWS) by 2018. The vision of the project is to ensure safe and sustainable
piped drinking water supply from surface water sources
Thingyan is now a global treasure! See how people around the world are search...Pixellion
We explored how the world searches for 'Thingyan' and 'သင်္ကြန်' and this year, it’s extra special. Thingyan is now officially recognized as a World Intangible Cultural Heritage by UNESCO! Dive into the trends and celebrate with us!