Unlock the value in your big data reservoir using Oracle Big Data Discovery a... - Mark Rittman
The document discusses Oracle Big Data Discovery and how it can be used to analyze and gain insights from data stored in a Hadoop data reservoir. It provides an example scenario where Big Data Discovery is used to analyze website logs, tweets, and website posts and comments to understand popular content and influencers for a company. The data is ingested into the Big Data Discovery tool, which automatically enriches the data. Users can then explore the data, apply additional transformations, and visualize relationships to gain insights.
Riga Dev Day 2016 - Adding a data reservoir and Oracle BDD to extend your Ora... - Mark Rittman
This talk focuses on what a data reservoir is, how it relates to the RDBMS data warehouse, and how Big Data Discovery gives business and BI users access to it.
What is Big Data Discovery, and how it complements traditional business anal... - Mark Rittman
Data discovery is an analysis technique that complements traditional business analytics, enabling users to combine, explore and analyse disparate datasets to spot opportunities and patterns that lie hidden within their data. Oracle Big Data Discovery takes this idea and applies it to your unstructured and big data datasets, giving users a way to catalogue, join and then analyse all types of data across the organization.
In this session we'll look at Oracle Big Data Discovery, how it provides a "visual face" to your big data initiatives, and how it complements and extends the work that you currently do using business analytics tools.
The document discusses Oracle's strategy to enable spatial and graph use cases on big data platforms. It provides an overview of Oracle's Big Data Spatial and Graph product, which allows for property graph analysis and spatial analysis on Hadoop. The spatial features allow for location data enrichment, proximity analysis, and preparation of map and imagery data. The graph features are useful for analysis of social media relationships, internet of things interactions, and cybersecurity.
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics - Mark Rittman
This is a session for Oracle DBAs and developers that looks at cutting-edge big data technologies like Spark and Kafka, and through demos shows how Hadoop is now a real-time platform for fast analytics, data integration and predictive modeling.
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di... - Mark Rittman
The document discusses using Hadoop and NoSQL technologies like Apache HBase to perform social network analysis on Twitter data related to a company's website and blog. It describes ingesting tweet and website log data into Hadoop HDFS and processing it with tools like Hive. Graph algorithms from Oracle Big Data Spatial & Graph were then used on the property graph stored in HBase to identify influential Twitter users and communities. This approach provided real-time insights at scale compared to using a traditional relational database.
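The influencer-scoring idea described above can be illustrated with a minimal, self-contained sketch: PageRank over a toy "who endorses whom" graph in plain Python. This is not the Oracle Big Data Spatial & Graph API; the graph data, damping factor and iteration count are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a directed graph given as
    (source, target) pairs, where an edge means 'source endorses target'."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Each node shares its damped rank among the nodes it links to.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: distribute its rank evenly across all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank

# Toy retweet graph: several accounts all endorse one influential account.
edges = [("a", "influencer"), ("b", "influencer"),
         ("c", "influencer"), ("influencer", "a")]
scores = pagerank(edges)
top = max(scores, key=scores.get)  # → "influencer"
```

In a production setting this computation would run as a distributed graph algorithm over the property graph in HBase rather than an in-memory loop, but the ranking logic is the same.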
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree... - Mark Rittman
This document summarizes a presentation about adding a Hadoop-based data reservoir to an Oracle data warehouse. The presentation discusses using a data reservoir to store large amounts of raw customer data from various sources to enable 360-degree customer analysis. It describes loading and integrating the data reservoir with the data warehouse using Oracle tools and how organizations can use it for more personalized customer marketing through advanced analytics and machine learning.
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh... - Mark Rittman
As presented at OGh SQL Celebration Day in June 2016, NL. Covers new features in Big Data SQL including storage indexes, storage handlers and ability to install + license on commodity hardware
Gluent New World #02 - SQL-on-Hadoop: A bit of History, Current State-of-the... - Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened-up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on the technical foundations of SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work, along with the more specialised underlying storage formats each now works best with. We’ll also look to the future to see how SQL querying, data integration and analytics are likely to come together over the next five years to make Hadoop the default platform for mixed old-world/new-world analytics workloads.
Enkitec E4 Barcelona - SQL and Data Integration Futures on Hadoop - Mark Rittman
Mark Rittman gave a presentation on the future of analytics on Oracle Big Data Appliance. He discussed how Hadoop has enabled highly scalable and affordable cluster computing using technologies like MapReduce, Hive, Impala, and Parquet. Rittman also talked about how these technologies have improved query performance and made Hadoop suitable for both batch and interactive/ad-hoc querying of large datasets.
Oracle BI Hybrid BI: Mode 1 + Mode 2, Cloud + On-Premise Business Analytics - Mark Rittman
Mark Rittman, founder of Rittman Mead, discusses Oracle's approach to hybrid BI deployments and how it aligns with Gartner's vision of a modern BI platform. He explains how Oracle BI 12c supports both traditional top-down modeling and bottom-up data discovery. It also enables deploying components on-premises or in the cloud for flexibility. Rittman believes the future is bi-modal, with IT enabling self-service analytics alongside centralized governance.
Using Oracle Big Data Discovery as a Data Scientist's Toolkit - Mark Rittman
As delivered at Trivadis Tech Event 2016 - how Big Data Discovery along with Python and pySpark was used to build predictive analytics models against wearables and smart home data
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use... - Mark Rittman
OBIEE12c comes with an updated version of Essbase that, in this release, focuses entirely on the query acceleration use-case. This presentation looks at the new release and explains how the new BI Accelerator Wizard manages the creation of Essbase cubes to accelerate OBIEE query performance.
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future? - Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we’ll look at the options that are available, compare database vendor solutions with their open-source alternatives, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete “data fabric” solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
End-to-end Hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl... - Mark Rittman
This document discusses an end-to-end example of using Hadoop, OBIEE, ODI and Oracle Big Data Discovery to analyze big data from various sources. It describes ingesting website log data and Twitter data into a Hadoop cluster, processing and transforming the data using tools like Hive and Spark, and using the results for reporting in OBIEE and data discovery in Oracle Big Data Discovery. ODI is used to automate the data integration process.
INFORMATICA ONLINE TRAINING BY QUONTRA SOLUTIONS WITH PLACEMENT ASSISTANCE
We offer online IT training with placement and project assistance across different platforms, with real-time industry consultants providing quality training for IT professionals, corporate clients and students. A special feature of Quontra Solutions is extensive support for both Informatica online training and placement; we also help with resume preparation and conduct mock interviews.
Emphasis is given to the important topics that are essential and most used in real-time projects. Quontra Solutions is an online training leader when it comes to high-end, effective and efficient IT training, and we focus on providing the most effective and competent training to both students and professionals who are eager to enrich their technical skills.
Training Features at Quontra Solutions:
We believe that online training should be measured by three major aspects: quality, content, and the relationship between trainer and student. Beyond the classes themselves, the material we provide is in tune with the latest IT training standards, so students need not worry about whether the training is outdated.
Course content:
• Basics of data warehousing concepts
• Power center components
• Informatica concepts and overview
• Sources
• Targets
• Transformations
• Advanced Informatica concepts
Please Visit us for the Demo Classes, we have regular batches and weekend batches.
QUONTRASOLUTIONS
204-226 Imperial Drive,Rayners Lane, Harrow-HA2 7HH
Phone : +44 (0)20 3734 1498 / 99
Email: [email protected]
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar... - Mark Rittman
Presentation from the Rittman Mead BI Forum 2015 masterclass, pt. 2 of a two-part session that also covered creating the Discovery Lab. Goes through setting up Flume log and Twitter feeds into CDH5 Hadoop using the ODI12c Advanced Big Data Option, then looks at the use of OBIEE11g with Hive, Impala and Big Data SQL, before finally using Oracle Big Data Discovery for faceted search and data mashup on top of Hadoop.
ODI12c as your Big Data Integration Hub - Mark Rittman
Presentation from the recent Oracle OTN Virtual Technology Summit, on using Oracle Data Integrator 12c to ingest, transform and process data on a Hadoop cluster.
Mark Rittman presented on how a tweet about a smart kettle went viral. He analyzed the tweet data using Oracle Big Data Spatial and Graph on a Hadoop cluster. Over 3,000 tweets were captured from over 30 countries in 48 hours. Key influencers were identified using PageRank and by their large number of followers. Visualization tools like Cytoscape and Tom Sawyer Perspectives showed how the tweet spread over time and geography. The analysis revealed that the tweet went viral after being shared by the influential user @erinscafe on the first day.
Turn Data Into Actionable Insights - StampedeCon 2016 - StampedeCon
At Monsanto, emerging technologies such as IoT, advanced imaging and geo-spatial platforms, together with molecular breeding, ancestry and genomics data sets, have made us rethink how we approach developing, deploying, scaling and distributing our software to accelerate predictive and prescriptive decisions. We created a cloud-based data science platform for the enterprise to address this need. Our primary goals were to perform analytics@scale and to integrate analytics with our core product platforms.
In this talk we share our journey of transformation, showing how we enabled: a collaborative discovery analytics environment where data science teams perform model development; data provisioning through APIs and streams; and deployment of models to production on our auto-scaling big-data compute in the cloud, performing streaming, cognitive, predictive, prescriptive, historical and batch analytics@scale and integrating analytics with our core product platforms to turn data into actionable insights.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... - StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
This document provides an overview of big data concepts and technologies for managers. It discusses problems with relational databases for large, unstructured data and introduces NoSQL databases and Hadoop as solutions. It also summarizes common big data applications, frameworks like MapReduce, Spark, and Flink, and different NoSQL database categories including key-value, column-family, document, and graph stores.
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI - Mark Rittman
The document discusses Oracle's Big Data SQL, which brings Oracle SQL capabilities to Hadoop data stored in Hive tables. It allows querying Hive data using standard SQL from Oracle Database and viewing Hive metadata in Oracle data dictionary tables. Big Data SQL leverages the Hive metastore and uses direct reads and SmartScan to optimize queries against HDFS and Hive data. This provides a unified SQL interface and optimized query processing for both Oracle and Hadoop data.
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... - Rittman Analytics
Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves, and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform, and why IaaS and data-warehousing-as-a-service will have such a big impact, sooner than you think.
This document provides an overview of social media and big data analytics. It discusses key concepts like Web 2.0, social media platforms, big data characteristics involving volume, velocity, variety, veracity and value. The document also discusses how social media data can be extracted and analyzed using big data tools like Hadoop and techniques like social network analysis and sentiment analysis. It provides examples of analyzing social media data at scale to gain insights and make informed decisions.
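The lexicon-based sentiment analysis mentioned above can be sketched in a few lines of plain Python: score a piece of text by counting positive and negative words. The word lists here are illustrative assumptions; real systems use much larger lexicons and handle negation, sarcasm and context.

```python
# Tiny illustrative sentiment lexicons (assumed, not from any named library).
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "slow"}

def sentiment(text):
    """Return a score in [-1, 1]: positive minus negative word counts,
    normalised by the number of sentiment-bearing words found."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment("I love this product, it is great!"))   # 1.0
print(sentiment("Terrible support and awful delays"))   # -1.0
```

At social-media scale this per-document scoring step is what gets parallelised across a Hadoop or Spark cluster, with the scores then aggregated by topic, time window or user.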
Social network analysis & Big Data - Telecommunications and more - Wael Elrifai
Social Network Analysis: Practical Uses and Implementation is a presentation that discusses social network analysis and its practical applications. It covers key topics such as defining social networks and social network analysis, why social network analysis matters, identifying influencers in social networks, roles within social networks, the graph theory concepts involved, calculating metrics from social networks, and recommended approaches to social network analysis.
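One of the simplest metrics covered in talks like this is degree centrality, a quick way to flag potential influencers. A minimal sketch in plain Python follows; the edge list is an illustrative assumption.

```python
from collections import Counter

def degree_centrality(edges):
    """For an undirected network given as (a, b) pairs, return each node's
    degree divided by (n - 1), the maximum possible degree."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    n = len(degrees)
    return {node: d / (n - 1) for node, d in degrees.items()}

# Toy network: alice is connected to every other node.
edges = [("alice", "bob"), ("alice", "carol"),
         ("alice", "dave"), ("bob", "carol")]
centrality = degree_centrality(edges)
# alice touches all 3 other nodes, so her centrality is 1.0
```

Degree centrality only counts direct connections; metrics like betweenness or eigenvector centrality (and community measures like modularity) capture influence that flows through the wider network structure.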
Big Data Analytics: A Social Network Approach - Andry Alamsyah
This document discusses using social network analysis approaches for big data analytics. It begins by introducing social network metrics like centrality and modularity that can be applied to large social network datasets. It then provides examples of how social network analysis has been used to detect terrorist cells and identify research communities. Finally, it outlines the author's research interests and publications in areas like sentiment analysis on social media and using social networks to analyze industries.
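Metrics like the centrality measures mentioned above are straightforward to compute on small graphs without specialist tooling. As a minimal illustration (the toy follower graph and node names are invented for the example), degree centrality in plain Python looks like this:

```python
from collections import defaultdict

def degree_centrality(edges):
    """Compute normalized degree centrality for an undirected edge list.

    Returns a dict mapping each node to degree / (n - 1): the fraction of
    the other nodes it is directly connected to.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    n = len(adjacency)
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}

# A toy follower graph: 'amy' is connected to all three others, so she
# scores 1.0 and stands out as the network's influencer.
edges = [("amy", "bob"), ("amy", "cat"), ("amy", "dan"), ("bob", "cat")]
scores = degree_centrality(edges)
```

On a real social dataset you would use a graph library (NetworkX, for instance, offers the same measure out of the box), but the underlying arithmetic is no more than this.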
Knowing what data matters, and what doesn't, is critical to creating your own social media metrics tracking system. This presentation reviews the basics of Google Analytics, Facebook Insights, and YouTube Insights, and the data you need to track in order to know what your online community wants, develop engaging content, support the community, and meet your goals. The presentation also includes references to several DIY social media metrics dashboards you can use in your business.
The document discusses using social media analytics to track activity, audience, engagement, and referral metrics across platforms like Facebook, Twitter, YouTube, and Pinterest. It provides sample metrics for September 2015, such as 351 total posts across platforms, over 13,000 Facebook followers, and over 93,000 total user engagements. The document questions what actions to take based on these collected analytics numbers.
While most organizations embrace the idea of big data, they have yet to figure out how to handle the implications of the big data explosion from social media. In this presentation we highlight some of the key challenges that organizations face while implementing big data.
Big data is large amounts of unstructured data that require new techniques and tools to analyze. Key drivers of big data growth are increased storage capacity, processing power, and data availability. Big data analytics can uncover hidden patterns to provide competitive advantages and better business decisions. Applications include healthcare, homeland security, finance, manufacturing, and retail. The global big data market is expected to grow significantly, with India's market projected to reach $1 billion by 2015. This growth will increase demand for data scientists and analysts to support big data solutions and technologies like Hadoop and NoSQL databases.
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
A Different Perspective on Business with Social DataTzar Umang
Do business the intelligent way with social data and analytics: harness the power of social media and sentiment, and use it to improve your brand or your current campaign.
Telecom Data Analysis Using Social Media FeedsJuhi Srivastava
This document discusses using social media data and text analysis techniques to gain business insights. It covers extracting data from social media, preprocessing the text, performing sentiment analysis and classification using Naive Bayes and other algorithms, analyzing word frequencies and associations through word clouds and clustering, and segmenting customers for cross-sell/upsell opportunities based on spending and sentiment. Potential applications discussed include customer churn prediction, sarcasm detection, and building unique models of customer behavior over time.
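The Naive Bayes classification step described here can be sketched as a tiny multinomial classifier in plain Python. The training samples and labels below are invented for illustration; a real deployment would train on a labelled tweet corpus and more likely use a library such as scikit-learn:

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word frequency table
    label_counts = Counter()             # label -> number of documents
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Return the most likely label, using log-probs and Laplace smoothing."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data standing in for labelled customer tweets.
samples = [
    ("great service loved it", "pos"),
    ("awful support very slow", "neg"),
    ("loved the new plan", "pos"),
    ("slow network awful price", "neg"),
]
model = train(samples)
label = classify("loved the service", *model)
```

The same trained model feeds naturally into the segmentation step the summary mentions: once each customer's tweets carry a sentiment label, spend and sentiment can be cross-tabulated to find cross-sell/upsell candidates.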
The document analyzes social media activity for BJP, Congress, and AAP from August 1, 2014 to December 9, 2014. It finds that BJP has the highest share of voice (SOV) at 51%. BJP receives the most positive commentary while Congress receives the most negative commentary. Videos generate the most mentions, with Congress enjoying the maximum share for videos.
Recent years have seen an increased use of social media data as a cheaper alternative to more traditional methods of market research. Social media services generate a large quantity of data every day and some of the data is available through their Application Programming Interfaces (APIs). This presentation outlines some of the research work carried out as part of the Uncertainty of Identity (http://www.uncertaintyofidentity.com) project. In particular, the use of social media data for activity pattern analysis and demographic profiling is explored.
Social Media in Australia: The Case of TwitterAxel Bruns
Professor Axel Bruns at Queensland University of Technology leads research tracking social media use in Australia, particularly Twitter. His team has identified over 2.8 million active Australian Twitter accounts through 2013. They map these accounts' follower/followee networks and track hashtags, links shared, and other activities to understand how public discourse and information spread occurs online. Their goal is developing a comprehensive model of Australia's online public sphere through large-scale, data-driven analysis of social media over time.
www.its.leeds.ac.uk/people/c.calastri
Social networks, i.e. the circles of people we are socially connected to, have been recognised to play a role in shaping our travel and activity behaviour. This not only has to do with socialisation being the purpose of travel, but also with enabling mobility and other activities through the so-called social capital. Another theme in the literature connecting social environment and travel behaviour is social influence, i.e. the investigation of how travel behaviour can be affected by observation or comparison with other people. Research about the impact of social influence on travel choices is still at its infancy. In this talk, I will give an overview of how choice modelling can be used to investigate the relationships between social networks, travel and activities. I will touch upon work that I have done so far, in particular I will describe my applications of the Multiple Discrete-Continuous Extreme Value (MDCEV) model to frequency of social interactions as well as to allocation of time to different activities, taking the social dimension into account. In these studies, I make use of social network and travel data collected in places as diverse as Switzerland and Chile. I will also discuss ongoing work making use of longitudinal life-course data to model the impact of family of origin and the “mobility environment” people grew up in on travel decision of adults. Finally, I will outline future plans about modelling behavioural changes due to social influence using the smartphone app travel data that are being collected in Leeds within the “Choices and consumption: modelling long and short term decisions in a changing world” (“DECISIONS”) project.
Multimedia Data Collection using Social Media Analysis Benoit HUET
The opening keynote of VIGTA 2012 – First International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications
In conjunction with the Advanced Visual Interfaces International Working Conference in Capri Italy, May 21-25, 2012
Spatio-temporal demographic classification of the Twitter usersDr Muhammad Adnan
Use of social media continues to increase day by day, with implications for the creation of ‘big’ data – Twitter alone was forecast to have created 1.8 zettabytes of data in 2011. This talk presents initial work towards the creation of geo-temporal geodemographic classifications using Twitter social media data. London was chosen as the study area because of its high incidence of users and the consequent expectation that higher penetration might be associated with lower demographic bias.
Friendship and mobility user movement in location based social networksFread Mzee
This document summarizes a study on how friendship and social networks influence human mobility patterns based on location-based social network and mobile phone data. The study found that short-range daily travel exhibits strong periodic patterns and is not influenced by social ties, while long-distance travel is more influenced by social networks. It also found that social relationships can explain 10-30% of human movement, with periodic behavior explaining 50-70%. Based on these findings, the study developed a human mobility model combining periodic short-range movements with social network-influenced travel, which better predicted future location dynamics.
Statistical analytical programming for social media analysis .Felicita Florence
This document discusses using SAS programming to analyze social media recruitment data. It includes importing data files, merging files, conducting frequency analysis, means analysis, ANOVA, correlation, regression, and creating graphs and charts like bar charts, pie charts, and scatterplots. SAS code is provided for merging data, conducting statistical tests, and creating various graphs and visualizations to analyze the social media recruitment data.
A guide to realistic social media and measurementAdam Vincenzini
Social media measurement and performance analysis is one of the most debated topics in the current marketing environment.
Recently I hosted a workshop for the PRIA which attempted to put social media measurement in perspective, especially when linking it to tangible business objectives.
This is not an exhaustive presentation, nor will it answer every question linked to social media measurement, but it will hopefully give you a useful resource to refer to.
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
Mark Rittman from Rittman Mead presented on Oracle Big Data Discovery. He discussed how many organizations are running big data initiatives involving loading large amounts of raw data into data lakes for analysis. Oracle Big Data Discovery provides a visual interface for exploring, analyzing, and transforming this raw data. It allows users to understand relationships in the data, perform enrichments, and prepare the data for use in tools like Oracle Business Intelligence.
Building the Inform Semantic Publishing Ecosystem: from Author to AudienceVital.AI
This document summarizes Inform, a content enrichment solution that uses semantic technologies to increase engagement for publishers. It analyzes content to extract topics, entities and related content which it then links and distributes through various channels. The summary focuses on how Inform processes content from authors, analyzes it semantically, and distributes related content to audiences through its platform and various distribution channels like widgets, microsites and social networks like Facebook.
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
This document discusses using social media, cloud computing, machine learning, open source, and big data analytics to analyze Twitter data. It describes how to collect tweets using the Twitter API, classify tweets in real-time using machine learning models on AWS, store classified tweets in MongoDB on AWS, and present results. Cost estimates for real-time classification of 1 million tweets per day are provided. Use cases described include tracking food poisoning reports and disease occurrence. Future directions discussed include developing turnkey services and linking to additional open data sources.
Talk given to the Philly Python Users Group (PUG) on October 1, 2015: http://www.meetup.com/phillypug/ Thanks SIG (http://www.sig.com) for hosting!
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Rehgan Avon
2018 Women in Analytics Conference
https://www.womeninanalytics.org/
Over the last year I’ve become obsessed with learning how to be a better "cloud computing evangelist to data scientists" - specifically to the R community. I’ve learned that this isn’t often an easy undertaking. Most people (data scientists or not) are skeptical of changing up the tools and workflows they’ve come to rely on when those systems seem to be working. Resistance to change increases even further with barriers to quick adoption, such as having to teach yourself a completely new technology or framework. I’d like to give a talk about how working in the cloud changes data science and how exploring these tools can lead to a world of new possibilities within the intersection of DevOps and Data Analytics.
Topics to discuss:
- Working through functionality/engineering challenges with R in a cloud environment
- Opportunities to customize and craft your ideal version of R/RStudio
- Making and embracing a decision on what is “real” about your analysis or daily work (Chapter 6 in R for Data Science)
- Running multiple R instances in the cloud (why would you want to do this?)
- Becoming an R/Data Science Collaboration wizard: Building APIs with Plumber in the Cloud
Let's analyze how world reacts to road traffic by sentiment analysis finalSajeetharan
Sentiment analysis uses natural language processing to identify opinions in text as positive, negative, or neutral. Analyzing Twitter data through sentiment analysis can provide insight into public opinions on various topics. The presentation described how sentiment analysis of Twitter data on road traffic could work, using Azure cognitive services and Logic Apps for processing without code. A demo then showed these Azure services in action for sentiment analysis.
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)Mark Rittman
A presentation from ODTUG 2013 on tools other than OBIEE for Exalytics, focusing on analysis of non-traditional data via Endeca, "big data" via Hadoop and statistical analysis / predictive modeling through Oracle R Enterprise, and the benefits of running these tools on Oracle Exalytics
Big Data in Action – Real-World Solution ShowcaseInside Analysis
The Briefing Room with Radiant Advisors and IBM
Live Webcast on February 25, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511
The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.
Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore self-structured data.
Visit InsideAnalysis.com for more information.
The free, one-hour webinar, Sourcing with Social Media: Tips from a Corporate Sleuth, was originally held Nov. 13, 2013.
During this session, the principal in a competitive-intelligence firm will teach you how to harness social media to identify “influencers” – both regionally and nationally – in industries you cover, as well as how to contact them successfully.
YOU WILL LEARN HOW TO:
Find people who are experts on the topics related to your story
Identify sources at a regional and a national level
Get from their handle to their real name, and find them on many sites
“Listen in” on these people as they broadcast across a variety of social media
Determine their tone related to the topic – pro/against, etc.
Determine the extent of their reach; when they talk, how many people listen?
Determine the best way to reach out to them and make contact
YOUR INSTRUCTOR
Sean Campbell is a co-owner of Cascade Insights, a competitive-intelligence and market-research firm near Portland, Ore., that serves the technology industry. Before founding Cascade in 2006, he co-created and sold 3 Leaf, a technical consultancy that worked for some of the world’s largest technology companies, including Microsoft and Intel.
He teaches courses in industry analysis and competitive intelligence in Willamette University’s MBA Program.
His book, “Going Beyond Google: Gathering Internet Intelligence,” which he co-authored with Cascade co-owner Scott Swigart, was listed as a best read by the Strategic and Competitive Intelligence Professionals Association for 2009 and 2010.
View or download this Cascade Insights primer to learn more about Campbell’s company.
For more information about training opportunities for business journalists, please visit http://businessjournalism.org.
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including: cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker, 451 Research’s Jim Curtis and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of a multi-cloud, infrastructure-agnostic analytics
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research & Jacque Istok, Head of Data, Pivotal
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
As organizations modernize their data and analytics platforms, the data lake concept has gained momentum as a shared enterprise resource for supporting insights across multiple lines of business. The perception is that data lakes are vast, slow-moving bodies of data, but innovations like Apache Kafka for streaming-first architectures put real-time data flows at the forefront. Combining real-time alerts and fast-moving data with rich historical analysis lets you respond quickly to changing business conditions with powerful data lake analytics to make smarter decisions.
Join this complimentary webinar with industry experts from 451 Research and Arcadia Data who will discuss:
- Business requirements for combining real-time streaming and ad hoc visual analytics.
- Innovations in real-time analytics using tools like Confluent’s KSQL.
- Machine-assisted visualization to guide business analysts to faster insights.
- Elevating user concurrency and analytic performance on data lakes.
- Applications in cybersecurity, regulatory compliance, and predictive maintenance on manufacturing equipment all benefit from streaming visualizations.
This document discusses how big data and analytics are moving from on-premises data warehouses to hybrid cloud environments that leverage technologies like Hadoop, Spark, and machine learning. It provides examples of how Oracle is helping customers with this transition by offering big data cloud services that give them flexibility to run workloads both on-premises and in the cloud while simplifying data management and enabling new types of advanced analytics.
In the last weeks of 2017, Google released a Rich Results Testing Tool to help webmasters understand what pages can generate rich results, based on their structured data implementation.
This new tool, coming from the search giant, is just one of the many recent affirmations of structured data’s continued and growing importance to search optimization in 2018 and beyond.
But why is structured data important to search? How does it impact your SEO strategy? And most importantly, what can you do to optimize structured data and maximize your potential in the SERPs?
Knowledge extraction and incorporation is currently considered to be beneficial for efficient Big Data analytics. Knowledge can take part in workflow design, constraint definition, parameter selection and configuration, human interactive and decision-making strategies. Here we present BIGOWL, an ontology to support knowledge management in Big Data analytics. BIGOWL is designed to cover a wide vocabulary of terms concerning Big Data analytics workflows, including their components and how they are connected, from data sources to the analytics visualization. It also takes into consideration aspects such as parameters, restrictions and formats. This ontology defines not only the taxonomic relationships between the different concepts, but also instances representing specific individuals to guide the users in the design of Big Data analytics workflows. For testing purposes, two case studies are developed, which consists in: first, real-world streaming processing with Spark of traffic Open Data, for route optimization in urban environment of New York city; and second, data mining classification of an academic dataset on local/cloud platforms. The analytics workflows resulting from the BIGOWL semantic model are validated and successfully evaluated.
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
The document discusses the future of analytics, data integration, and business intelligence (BI) on big data platforms like Hadoop. It covers how BI has evolved from old-school data warehousing to enterprise BI tools to utilizing big data platforms. New technologies like Impala, Kudu, and dataflow pipelines have made Hadoop fast and suitable for analytics. Machine learning can be used for automatic schema discovery. Emerging open-source BI tools and platforms, along with notebooks, bring new approaches to BI. Hadoop has become the default platform and future for analytics.
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
Mark Rittman, CTO of Rittman Mead, gave a keynote presentation on big data for Oracle developers and DBAs with a focus on Apache Spark, real-time analytics, and predictive analytics. He discussed how Hadoop can provide flexible, cheap storage for logs, feeds, and social data. He also explained several Hadoop processing frameworks like Apache Spark, Apache Tez, Cloudera Impala, and Apache Drill that provide faster alternatives to traditional MapReduce processing.
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
- Mark Rittman presented on deploying full OBIEE systems to Oracle Cloud. This involves migrating the data warehouse to Oracle Database Cloud Service, updating the RPD to connect to the cloud database, and uploading the RPD to Oracle BI Cloud Service. Using the wider Oracle PaaS ecosystem allows hosting a full BI platform in the cloud.
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
Slides from a two-day OBIEE11g seminar in Dubai, February 2015, at the Oracle University Expert Summit. Covers the following topics:
1. OBIEE 11g Overview & New Features
2. Adding Exalytics and In-Memory Analytics to OBIEE 11g
3. Source Control and Concurrent Development for OBIEE
4. No Silver Bullets - OBIEE 11g Performance in the Real World
5. Oracle BI Cloud Service Overview, Tips and Techniques
6. Moving to Oracle BI Applications 11g + ODI
7. Oracle Essbase and Oracle BI EE 11g Integration Tips and Techniques
8. OBIEE 11g and Predictive Analytics, Hadoop & Big Data
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
This document discusses using Hadoop and Hive for ETL work. It provides an overview of using Hadoop for distributed processing and storage of large datasets. It describes how Hive provides a SQL interface for querying data stored in Hadoop and how various Apache tools can be used to load, transform and store data in Hadoop. Examples of using Hive to view table metadata and run queries are also presented.
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014
In this presentation we cover some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle’s products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI).
Part 4 - Hadoop Data Output and Reporting using OBIEE11gMark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
Once insights and analysis have been produced within your Hadoop cluster by analysts and technical staff, it’s usually the case that you want to share the output with a wider audience in the organisation. Oracle Business Intelligence has connectivity to Hadoop through Apache Hive compatibility, and other Oracle tools such as Oracle Big Data Discovery and Big Data SQL can be used to visualise and publish Hadoop data. In this final session we’ll look at what’s involved in connecting these tools to your Hadoop environment, and also consider where data is optimally located when large amounts of Hadoop data need to be analysed alongside more traditional data warehouse datasets
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we’ll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We’ll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop’s schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a “data factory” to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data.
2. [email protected] www.rittmanmead.com @rittmanmead 2
•Oracle Gold Partner with offices in the UK and USA (Atlanta)
•70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects
•Oracle ACE Director (Mark Rittman, CTO) + 2 Oracle ACEs
•Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com)
•Regular users of social media
(Facebook, Twitter, Slideshare etc)
•Regular column in Oracle Magazine and other publications
•Hadoop R&D lab for “dogfooding” solutions developed for customers
About Rittman Mead
3. [email protected] www.rittmanmead.com @rittmanmead 3
Business Scenario
•Rittman Mead want to understand drivers and audience for their website
‣What is our most popular content? Who are the most in-demand blog authors?
‣Who are the influencers? What communities exist around our web presence?
•Three data sources in scope:
‣RM Website Logs
‣Twitter Stream
‣Website Posts, Comments etc
4. [email protected] www.rittmanmead.com @rittmanmead X
•Initial iteration of project focused on capturing and ingesting web + social media activity
•Apache Flume used for capturing website hits, page views
•Twitter Streaming API used to capture tweets referring to RM website or RM staff
•Activity landed into Hadoop (HDFS), processed and enriched and presented using Hive
Overall Project Architecture - Phase 1
5. [email protected] www.rittmanmead.com @rittmanmead X
•Provided real-time counts of page views, correlated with Twitter activity stored in Hive tables
•Accessed using Oracle Big Data SQL + joined to Oracle RDBMS reference data
•Delivered using OBIEE reports and dashboards
•Data Warehousing, but cheaper + real-time
•Answered questions such as
‣What are our most popular site pages?
‣Which pages attracted the most attention on Twitter, Facebook?
‣What topics are popular?
Real-Time Metrics around Site Activity - “What?”
Combine with Oracle Big Data SQL for structured OBIEE dashboard analysis
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
6. [email protected] www.rittmanmead.com @rittmanmead X
•Oracle Big Data Discovery used to go back to the raw event data and add more meaning
•Enrich data, extract nouns + terms, add reference data from file, RDBMS etc
•Understand sentiment + meaning of tweets, link disparate + loosely coupled events
•Faceted search dashboards
Oracle BDD for Data Wrangling + Data Enrichment
7. [email protected] www.rittmanmead.com @rittmanmead 4
OBIEE and BDD for the “What” and “Why” Questions…
•Counts of page views, tweets, mentions etc helped us understand what content was popular
•Analysis of tweet sentiment, meaning and correlation with content answered why
Combine with Oracle Big Data SQL for structured OBIEE dashboard analysis
Combine with site content, semantics, text enrichment
Catalog and explore using Oracle Big Data Discovery
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
Why is some content more popular?
Does sentiment affect viewership?
What content is popular, where?
8. [email protected] www.rittmanmead.com @rittmanmead 5
•Previous counts assumed that all tweet references were equally important
•But some Twitter users are far more influential than others
‣Sit at the centre of a community, have 1000’s of followers
‣A reference by them has massive impact on page views
‣Positive or negative comments from them drive perception
•Can we identify them?
‣Potentially “reach out” with analyst program
‣Study what website posts go “viral”
‣Understand our audience, and the conversation, better
But Who Are The Influencers In Our Community?
Influencer Identification
Communication Stream (e.g. tweets)
Find the people that are central in the given network – e.g. influencer marketing
9. [email protected] www.rittmanmead.com @rittmanmead 6
•Rittman Mead website features many types of content
‣Blogs on BI, data integration, big data, data warehousing
‣Op-Eds (“OBIEE12c - Three Months In, What’s the Verdict?”)
‣Articles on a theme, e.g. performance tuning
‣Details of new courses, new promotions
•Different communities likely to form around these content types
•Different influencers and patterns of recommendation, discovery
•Can we identify some of the communities, segment our audience?
What Communities and Networks Make Up Our Audience?
Community Detection
Identify groups of people that are close to each other – e.g. target group marketing
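Community detection itself is handled later in the deck by the Oracle tooling; as a concrete illustration of the idea, here is a minimal label-propagation sketch in plain Python. The account names and edges are invented, and the deterministic tie-breaking is a simplification of what real implementations do:

```python
from collections import Counter, defaultdict

def label_propagation(edges, max_iter=10):
    """Detect communities: each node repeatedly adopts the most common
    label among its neighbours; ties keep the current label if possible."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {n: n for n in neighbours}  # start: every node is its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[nb] for nb in sorted(neighbours[node]))
            top = max(counts.values())
            tied = sorted(lab for lab, c in counts.items() if c == top)
            best = labels[node] if labels[node] in tied else tied[0]
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Two small groups of (hypothetical) Twitter accounts interacting
# mostly among themselves
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
         ("dan", "eve"), ("eve", "fay"), ("dan", "fay")]
labels = label_propagation(edges)
communities = defaultdict(set)
for node, community in labels.items():
    communities[community].add(node)
print(dict(communities))  # two communities: {ann, bob, cat} and {dan, eve, fay}
```

On real interaction graphs the edges would carry weights and the node ordering would be randomised, but the propagation idea is the same.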
12. [email protected] www.rittmanmead.com @rittmanmead 8
Common Big Data Graph Analysis Use-Cases
•Product Recommendation (purchase records linking customers to items): recommend the most similar item purchased by similar people
•Influencer Identification (communication stream, e.g. tweets): find the people that are central in the given network – e.g. influencer marketing
•Community Detection: identify groups of people that are close to each other – e.g. target group marketing
•Graph Pattern Matching: find all the sets of entities that match a given pattern – e.g. fraud detection
13. [email protected] www.rittmanmead.com @rittmanmead 9
Graph Example : RM Blog Post Referenced on Twitter
Example tweet: “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”
[Diagram: “Follows” edges between Twitter users spread the tweet through the network, with page view counts growing at each hop]
14. [email protected] www.rittmanmead.com @rittmanmead 10
Network Effect Magnified by Extent of Social Graph
Example tweet: “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”
[Diagram: the same tweet propagated across a larger social graph, multiplying the resulting page views]
15. [email protected] www.rittmanmead.com @rittmanmead 11
Retweets by Influential Twitter Users Drive Visits
Example tweet and retweet: “RT: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”
[Diagram: a retweet by an influential user drives a further jump in page views]
17. [email protected] www.rittmanmead.com @rittmanmead X
Property Graph Terminology
Example tweet: “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”
[Diagram: two nodes (“vertices”), each with vertex properties, joined by a directed connection (“edge”) of edge type “Mentions”]
18. [email protected] www.rittmanmead.com @rittmanmead 13
Property Graph Terminology
[Diagram: “Mentions” and “Retweets” edges, each a directed connection (“edge”) between two nodes (“vertices”), linking the example tweet to Twitter users]
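The terminology above maps naturally onto a small data structure. The sketch below is a toy in-memory property graph in plain Python, not the Oracle API: the user names come from this deck's influencer list, and the weights anticipate the values given on the later edge-weights slide.

```python
# Minimal in-memory property graph mirroring the terminology above:
# vertices and directed edges, each edge carrying a type and properties.

vertices = {
    1: {"name": "markrittman"},   # vertex properties
    2: {"name": "rmoff"},
}

edges = [
    # (edge id, from vertex, to vertex, edge type, edge properties)
    (1, 2, 1, "mentions", {"weight": 30}),
    (2, 2, 1, "retweets", {"weight": 100}),
]

def out_edges(vertex_id, edge_type=None):
    """Directed traversal: edges leaving a vertex, optionally filtered by type."""
    return [e for e in edges
            if e[1] == vertex_id and (edge_type is None or e[3] == edge_type)]

for eid, src, dst, etype, props in out_edges(2):
    print(f"{vertices[src]['name']} --{etype}({props['weight']})--> "
          f"{vertices[dst]['name']}")
```

Stores such as HBase and Oracle NoSQL persist exactly this shape of data, just at trillions-of-edges scale.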
19. [email protected] www.rittmanmead.com @rittmanmead 14
•Different types of Twitter interaction could imply more or less “influence”
‣Retweet of another user’s tweet implies that person is worth quoting, or that you endorse their opinion
‣Reply to another user’s tweet could be a weaker recognition of that person’s opinion or view
‣Mention of a user in a tweet is a weaker recognition that they are part of a community / debate
Determining Influencers - Factors to Consider
20. [email protected] www.rittmanmead.com @rittmanmead 15
Relative Importance of Edge Types Added via Weights
[Diagram: the example tweet’s “Mentions” edge carries Weight = 30 and its “Retweet” edge Weight = 100, each stored as an edge property]
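As a rough illustration of how such weights could feed an influence measure, the sketch below computes a weighted in-degree score. Only the mention (30) and retweet (100) weights appear in the deck; the reply weight of 50 is an assumed value, and weighted in-degree is a much cruder proxy than the Page Rank used later in the session.

```python
from collections import defaultdict

# Edge weights per interaction type. Mentions = 30 and retweets = 100 come
# from the slide; the reply weight of 50 is an assumption for illustration.
WEIGHTS = {"retweet": 100, "reply": 50, "mention": 30}

def influence_scores(interactions):
    """Weighted in-degree: sum the weight of every interaction a user receives."""
    scores = defaultdict(float)
    for source, target, kind in interactions:
        scores[target] += WEIGHTS[kind]
    return dict(scores)

# Hypothetical interactions between users named in this deck
interactions = [
    ("rmoff", "markrittman", "retweet"),
    ("borkur", "markrittman", "mention"),
    ("mRainey", "rmoff", "reply"),
    ("markrittman", "rmoff", "mention"),
]
scores = influence_scores(interactions)
print(scores)  # markrittman: 130.0, rmoff: 80.0
```

Weighted in-degree only counts direct interactions; Page Rank (coming up) also accounts for how influential the interacting users themselves are.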
21. [email protected] www.rittmanmead.com @rittmanmead X
•Graph, spatial and raster data processing for big data
‣Primarily documented + tested against Oracle BDA
‣Installable on commodity cluster using CDH
•Data stored in Apache HBase or Oracle NoSQL DB
‣Complements Spatial & Graph in Oracle Database
‣Designed for trillions of nodes, edges etc
•Out-of-the-box spatial enrichment services
•Over 35 of the most popular graph analysis functions
‣Graph traversal, recommendations
‣Finding communities and influencers
‣Pattern matching
Oracle Big Data Spatial & Graph
22. [email protected] www.rittmanmead.com @rittmanmead
•Data loaded from files or through Java API into HBase
•In-Memory Analytics layer runs common graph and spatial algorithms on data
•Visualised using R or other graphics packages
Oracle Big Data Graph and Spatial Architecture
Massively Scalable Graph Store
• Oracle NoSQL
• HBase
Lightning-Fast In-Memory Analytics
• YARN Container
• Standalone Server
• Embedded
23. [email protected] www.rittmanmead.com @rittmanmead 16
•ODI12c used to prepare two files in Oracle Flat File Format
‣Extracted vertices and edges from existing data in Hive
‣Wrote vertices (Twitter users) to .opv file, edges (RTs, replies etc) to .ope file
•For exercise, only considered 2-3 days of tweets
‣Did not include follows (user A followed user B), as these are not reported by the Twitter Streaming API
‣Could approximate larger follower networks by multiplying edge weights by a follower scale
-Useful for Page Rank, but does it skew actual detection of influencers in the exercise?
Preparing Vertices and Edges for Ingestion
24. [email protected] www.rittmanmead.com @rittmanmead 17
Oracle Flat File Format Vertices and Edge Files
• Unique ID for the vertex
• Property name (“name”)
• Property value datatype (1 = String)
• Property value (“markrittman”)
Vertex File (.opv)
• Unique ID for the edge
• Leading edge vertex ID
• Trailing edge vertex ID
• Edge Type (“mentions”)
• Edge Property (“weight”)
• Edge Property datatype and value
Edge File (.ope)
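To make the two file layouts concrete, here is a hypothetical Python sketch that writes a .opv/.ope pair from a list of interactions. The exact column order and the datatype code for doubles (4 here) are assumptions based on the field descriptions above; only 1 = String is confirmed by the slide, so check the Oracle flat file format documentation before using this for a real load.

```python
def write_flat_files(users, interactions, opv_path, ope_path):
    """Write vertex (.opv) and edge (.ope) files in an approximation of
    the Oracle flat file format described on this slide."""
    ids = {u: i + 1 for i, u in enumerate(sorted(users))}  # unique vertex IDs
    with open(opv_path, "w") as opv:
        for user, vid in ids.items():
            # vertex ID, property name, datatype (1 = String), property value
            opv.write(f"{vid},name,1,{user},,\n")
    with open(ope_path, "w") as ope:
        for eid, (src, dst, etype, weight) in enumerate(interactions, start=1):
            # edge ID, leading vertex, trailing vertex, edge type,
            # edge property name, datatype (4 = Double is assumed), value
            ope.write(f"{eid},{ids[src]},{ids[dst]},{etype},weight,4,,{weight},\n")

# Hypothetical sample data using names and weights from this deck
users = {"markrittman", "rmoff", "borkur"}
interactions = [("rmoff", "markrittman", "retweets", 100),
                ("borkur", "markrittman", "mentions", 30)]
write_flat_files(users, interactions,
                 "biwa_connections.opv", "biwa_connections.ope")
```

In the actual project this transformation was done by ODI12c against Hive, as described on the previous slide; the sketch just shows the shape of the output files.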
25. [email protected] www.rittmanmead.com @rittmanmead 18
cfg = GraphConfigBuilder.forPropertyGraphHbase()
.setName("connectionsHBase")
.setZkQuorum("bigdatalite").setZkClientPort(2181)
.setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)
.setInitialVertexNumRegions(3).setSplitsPerRegion(1)
.addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
.build();
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();
vfile="../../data/biwa_connections.opv"
efile="../../data/biwa_connections.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);
// read through the vertices
opg.getVertices();
// read through the edges
opg.getEdges();
Loading Edges and Vertices into HBase
Uses “Gremlin” Shell for HBase
• Creates connection to HBase
• Sets initial configuration for database
• Builds the database ready for load
• Defines location of Vertex and Edge files
• Creates instance of
OraclePropertyGraphDataLoader
• Loads data from files
• Prepares the property graph for use
• Loads in Edges and Vertices
• Now ready for in-memory processing
26. [email protected] www.rittmanmead.com @rittmanmead 19
Calculating Most Influential Tweeters Using Page Rank
vOutput="/tmp/mygraph.opv"
eOutput="/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);
session = Pgx.createSession("session-id-1");
analyst = session.createAnalyst();
graph = session.readGraphWithProperties(opg.getConfig());
rank = analyst.pagerank(graph, 0.001, 0.85, 100);
rank.getTopKValues(10);
==>PgxVertex with ID 1=0.13885623487462861
==>PgxVertex with ID 3=0.08686102641801993
==>PgxVertex with ID 101=0.06757752513733056
==>PgxVertex with ID 6=0.06743774001139484
==>PgxVertex with ID 37=0.0481517609757462
==>PgxVertex with ID 17=0.042234536894569276
==>PgxVertex with ID 29=0.04109794527311113
==>PgxVertex with ID 65=0.032058649698044187
==>PgxVertex with ID 15=0.023075360575195276
==>PgxVertex with ID 93=0.019265959946506813
• Initiates an in-memory analytics session
• Runs Page Rank algorithm to determine influencers
• Outputs top ten vertices (users)
Top 10 vertices
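The PGX call above hides the algorithm itself; as a reference point, here is a self-contained Python sketch of Page Rank using the same parameters passed to analyst.pagerank() (tolerance 0.001, damping 0.85, up to 100 iterations). This is not the PGX implementation, and the toy edge list reuses names from the influencer list rather than the real dataset.

```python
def pagerank(edges, tol=0.001, damping=0.85, max_iter=100):
    """Iterative Page Rank over a directed edge list of (src, dst) pairs."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}  # teleport term
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
            else:
                # dangling vertex: spread its rank evenly over all vertices
                for v in nodes:
                    new[v] += damping * rank[src] / n
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Toy interaction graph: edges point at the user being retweeted/mentioned
edges = [("rmoff", "markrittman"), ("borkur", "markrittman"),
         ("mRainey", "markrittman"), ("markrittman", "rmoff"),
         ("borkur", "rmoff")]
ranks = pagerank(edges)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top[0])  # the most "influential" vertex: markrittman
```

The key property, visible even at this scale, is that an interaction from a highly-ranked user (rmoff) is worth more than one from a low-ranked user, which is exactly why Page Rank beats simple counts for influencer detection.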
28. [email protected] www.rittmanmead.com @rittmanmead 21
•Open source graph analysis tool with Oracle Big Data Graph and Spatial plug-in
•Available shortly from Oracle; connects to Oracle NoSQL or HBase and runs Page Rank etc
•Alternative to the command line for In-Memory Analytics once the base graph is created
Visualising Property Graphs with Cytoscape
29. [email protected] www.rittmanmead.com @rittmanmead 22
Calculating Top 10 Users using Page Rank Algorithm
Top 10 influencers:
markrittman
rmoff
rittmanmead
mRainey
JeromeFr
Nephentur
borkur
BIExperte
i_m_dave
dw_pete
34. [email protected] www.rittmanmead.com @rittmanmead 27
Determining Communities via Twitter Interactions
• Clusters based on actual interaction patterns, not hashtags
• Detects real communities, not ones that exist just in theory
35. [email protected] www.rittmanmead.com @rittmanmead 28
Conclusions, and Further Reading
•Tools such as OBIEE are great for understanding what (counts, page views, popular items)
•Oracle Big Data Discovery can be useful for understanding “why?” (sentiment, terms etc)
•Graph Analysis can help answer “who?”
•Who are our audience? What are our communities? Who are their important influencers?
•Oracle Big Data Graph and Spatial can answer these questions to “big data” scale
•Articles on the Rittman Mead Blog
‣http://www.rittmanmead.com/category/oracle-big-data-appliance/
‣http://www.rittmanmead.com/category/big-data/
‣http://www.rittmanmead.com/category/oracle-big-data-discovery/
•Rittman Mead offer consulting, training and managed services for Oracle Big Data
‣http://www.rittmanmead.com/bigdata