Pentaho Big Data Analytics with Vertica and Hadoop - Mark Kromer
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
Big Data Analytics Projects - Real World with Pentaho - Mark Kromer
This document discusses big data analytics projects and technologies. It provides an overview of Hadoop, MapReduce, YARN, Spark, SQL Server, and Pentaho tools for big data analytics. Specific scenarios discussed include digital marketing analytics using Hadoop, sentiment analysis using MongoDB and SQL Server, and data refinery using Hadoop, MPP databases, and Pentaho. The document also addresses myths and challenges around big data and provides code examples of MapReduce jobs.
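The MapReduce examples from that deck are not reproduced here, but the pattern they follow is easy to sketch. The snippet below is a minimal, self-contained word-count illustration in Python with invented sample input: map_phase emits (word, 1) pairs, reduce_phase sums them, and a tiny driver stands in for Hadoop's shuffle-and-sort.

```python
# Minimal word-count MapReduce sketch; illustrative only, not code from the deck.
from itertools import groupby

def map_phase(lines):
    # Mapper role: emit a (key, value) pair per word.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reducer role: pairs must be grouped by key (Hadoop guarantees this after
    # the shuffle); here we sort locally to simulate that guarantee.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    docs = ["big data analytics", "big data on hadoop", "analytics at scale"]
    for word, total in reduce_phase(map_phase(docs)):
        print(word, total)
```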
Here I talk about examples and use cases for Big Data & Big Data Analytics, and how we accomplished massive-scale sentiment, campaign, and marketing analytics for Razorfish using a collection of database, Big Data, and analytics technologies.
Big Data in the Cloud with Azure Marketplace Images - Mark Kromer
The document discusses strategies for modern data warehousing and analytics on Azure including using Hadoop for ETL/ELT, integrating streaming data engines, and using lambda and hybrid architectures. It also describes using data lakes on Azure to collect and analyze large amounts of data from various sources. Additionally, it covers performing real-time stream analytics, machine learning, and statistical analysis on the data and discusses how Azure provides scalability, speed of deployment, and support for polyglot environments that incorporate many data processing and storage options.
Big Data Analytics with Hadoop, MongoDB and SQL Server - Mark Kromer
This document discusses SQL Server and big data analytics projects in the real world. It covers the big data technology landscape, big data analytics, and three big data analytics scenarios using different technologies like Hadoop, MongoDB, and SQL Server. It also discusses SQL Server's role in the big data world and how to get data into Hadoop for analysis.
This document discusses Big Data solutions in Microsoft Azure. It introduces Azure cloud services and provides an overview of Big Data and how it differs from traditional databases. It then outlines Microsoft's Big Data solutions built on Hortonworks Data Platform, including HDInsight which allows running Hadoop on Azure. HDInsight supports various data storage options and processing tools like Hive, Pig, and Storm. The document also covers designing HDInsight clusters and Azure Data Lake for unlimited storage of structured and unstructured data.
The document discusses data architecture solutions for solving real-time, high-volume data problems with low latency response times. It recommends a data platform capable of capturing, ingesting, streaming, and optionally storing data for batch analytics. The solution should provide fast data ingestion, real-time analytics, fast action, and quick time to value. Multiple data sources like logs, social media, and internal systems would be ingested using Apache Flume and Kafka and analyzed with Spark/Storm streaming. The processed data would be stored in HDFS, Cassandra, S3, or Hive. Kafka, Spark, and Cassandra are identified as key technologies for real-time data pipelines, stream analytics, and high availability persistent storage.
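As a rough sketch of the streaming leg of that pipeline, the PySpark Structured Streaming job below reads a Kafka topic and produces per-minute counts. The broker address and topic name are assumptions for illustration, the spark-sql-kafka package must be on the Spark classpath, and a production job would write to HDFS, Cassandra, or S3 rather than the console.

```python
# Sketch: Kafka ingest + streaming aggregation, the core of the pipeline above.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window, count

spark = SparkSession.builder.appName("realtime-pipeline").getOrCreate()

# Read the raw event stream from Kafka (hypothetical broker and topic).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS body", "timestamp"))

# Real-time analytics: event counts per one-minute window.
counts = (events
          .groupBy(window(col("timestamp"), "1 minute"))
          .agg(count("*").alias("events")))

# Console sink for illustration; a real pipeline would use a file or Cassandra
# sink with a checkpoint location and a watermark on the event time.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```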
The document discusses modernizing a data warehouse using the Microsoft Analytics Platform System (APS). APS is described as a turnkey appliance that allows organizations to integrate relational and non-relational data in a single system for enterprise-ready querying and business intelligence. It provides a scalable solution for growing data volumes and types that removes limitations of traditional data warehousing approaches.
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B... - Mark Rittman
Mark Rittman from Rittman Mead presented on Oracle Big Data Discovery. He discussed how many organizations are running big data initiatives involving loading large amounts of raw data into data lakes for analysis. Oracle Big Data Discovery provides a visual interface for exploring, analyzing, and transforming this raw data. It allows users to understand relationships in the data, perform enrichments, and prepare the data for use in tools like Oracle Business Intelligence.
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data... - Dipti Borkar
Born at Facebook, Presto is an open source, high-performance, distributed SQL query engine. With the disaggregation of storage and compute, Presto was created to simplify querying of all data lakes - cloud data lakes like S3 and on-premises data lakes like HDFS. Presto's high performance and flexibility have made it a very popular choice for interactive query workloads on large Hadoop-based clusters as well as AWS S3, Google Cloud Storage, and Azure Blob storage. Today it has grown to support many users and use cases including ad hoc query, data lakehouse analytics, and federated querying. In this session, we will give an overview of Presto including its architecture and how it works, the problems it solves, and the most common use cases. We'll also share the latest innovations in the project as well as the future roadmap.
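For readers who want to see what an interactive query looks like in practice, here is a minimal sketch using the presto-python-client (prestodb); the coordinator host, catalog, and table are assumptions, not details from the talk.

```python
# Minimal Presto query sketch using the presto-python-client (prestodb).
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",  # hypothetical coordinator
    port=8080,
    user="analyst",
    catalog="hive",                          # could equally be an S3/ADLS-backed catalog
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT region, count(*) AS orders
    FROM orders
    GROUP BY region
    ORDER BY orders DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```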
This document discusses how to build a successful data lake by focusing on the right data, platform, and interface. It emphasizes the importance of saving raw data to analyze later, organizing the data lake into zones with different governance levels, and providing self-service tools to find, understand, provision, prepare, and analyze data. It promotes the use of a smart data catalog like Waterline Data to automate metadata tagging, enable data discovery and collaboration, and maximize business value from the data lake.
What is an Open Data Lake? - Data Sheets | Whitepaper - Vasu S
A data lake, where data is stored in an open format and accessed through open standards-based interfaces, is defined as an Open Data Lake.
https://www.qubole.com/resources/data-sheets/what-is-an-open-data-lake
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.
Introduction to Kudu - StampedeCon 2016 - StampedeCon
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
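A rough illustration of that "fast scans and fast random access from a single API" point, seen from Spark through the kudu-spark connector (the connector jar must be on the Spark classpath; the master address, table, and column names are assumptions):

```python
# Reading a Kudu table from Spark: the same table serves both analytic scans
# and key-based lookups, which is the gap Kudu fills between Parquet and HBase.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kudu-scan").getOrCreate()

metrics = (spark.read
           .format("org.apache.kudu.spark.kudu")
           .option("kudu.master", "kudu-master.example.com:7051")
           .option("kudu.table", "impala::default.metrics")
           .load())

# Analytic scan: columnar storage keeps large aggregations fast.
metrics.groupBy("host").avg("cpu_pct").show()

# Key-based lookup: the same table also supports indexed, row-level access,
# which append-only columnar files on HDFS cannot offer.
metrics.filter("host = 'web-01' AND ts = 1718000000").show()
```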
The document introduces the Teradata Aster Discovery Platform for scalable analytics using analytic algorithms on commodity hardware. It discusses use cases like credit risk analysis, fraud detection, and sentiment analysis. It provides an overview of the discovery process model and instructions for downloading, installing, and using Aster including setting up the Aster Management Console and AsterLens for visualization. It then provides examples of using various Aster analytic functions like k-means clustering, market basket analysis, data unpacking, and nPath analysis for applications in marketing, pricing, and web analytics. It concludes that Aster provides more powerful analytics capabilities than SQL alone for exploring big data.
Cortana Analytics Workshop: Operationalizing Your End-to-End Analytics Solution - MSAdvAnalytics
Wee Hyong Tok. With Azure Data Factory (ADF), existing data movement and analytics processing services can be composed into data pipelines that are highly available and managed in the cloud. In this demo-driven session, you learn by example how to build, operationalize, and manage scalable analytics pipelines. Go to https://channel9.msdn.com/ to find the recording of this session.
The ecosystem for big data and analytics has become too large and complex, with too many vendors, distributions, engines, projects, and iterations. This leads to problems like analysis paralysis during platform decisions, solutions becoming obsolete too quickly, and constant stress over choosing the right engine for each job. The document suggests that industries, vendors, investors, analysts, technologists, and customers all need to take steps to reduce complexity and focus on standards and merit-based evaluations of options.
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info... - DataStax
Proofpoint is a $3 billion public cloud security company that acquired Nexgate, an early DataStax customer, in 2014. Proofpoint uses Cassandra all over the organization, with a current production deployment of 3 TB of data across 23 nodes in 9 data centers and 4 clusters. Over time, Proofpoint has evolved its Cassandra usage from a single data center with 3 nodes in 2012 to multiple data centers with a Solr deployment today. Proofpoint discussed several use cases for Cassandra including detecting phishing, analyzing spam patterns, trending topics analysis, archive searching, threat event correlation, and more. The presentation provided advice on Cassandra best practices and contacts for further information.
These slides provide highlights of my book HDInsight Essentials. Book link is here: http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign... - Michael Rys
Presentation by James Baker and myself on running cost-effective big data workloads with Azure Synapse and Azure Data Lake Storage (ADLS) at Microsoft Ignite 2020. It covers the modern data warehouse architecture supported by Azure Synapse, integration benefits with ADLS, and cost-reducing features such as Query Acceleration, integrated Spark and SQL processing with shared metadata, and .NET for Apache Spark support.
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to be complementary to them. In this presentation you will hear what is Big Data and Data Lake and what are the most popular technologies used in Big Data world. We will also speak about Hadoop and Spark, and how they integrate with traditional systems and their benefits.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
This webinar covered graph databases and how they can solve problems that were previously difficult for traditional databases. It included presentations on why graph databases are useful, common use cases like recommendations and network analysis, different types of graph databases, and a demonstration of the DataStax Enterprise graph database. There was also a question and answer session where attendees could ask about graph databases and DataStax Enterprise graph.
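To make the use cases concrete, here is a small recommendation-style traversal written with gremlinpython, which works against Gremlin-enabled graph stores such as DataStax Enterprise Graph; the endpoint, vertex labels, and edge names are invented for the sketch.

```python
# "Customers who bought what Alice bought also bought..." as a graph traversal.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # hypothetical endpoint
g = traversal().withRemote(conn)

recommendations = (g.V().has("person", "name", "alice")
                    .out("bought")    # products Alice bought
                    .in_("bought")    # other people who bought those products
                    .out("bought")    # what else those people bought
                    .dedup()
                    .values("name")
                    .toList())
print(recommendations)
conn.close()
```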
The document discusses how managing data is key to unlocking value from the Internet of Things. It emphasizes that variety, not size, is most important with big data. Example use cases mentioned include predictive maintenance, search and root cause analysis. The technology landscape is changing with new architectures like data lakes and new patterns such as event histories and timelines. Managing data is also changing with schema on read, loosely coupled schemas, and increased importance of metadata. The document concludes that data management patterns and practices are foundational to effective analytics with IoT data.
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio... - Dataconomy Media
Dev Lakhani, Data Scientist at Batch Insights talks on "Real Time Big Data Applications for Investment Banks and Financial Institutions" at the first Big Data Frankfurt event that took place at Die Zentrale, organised by Dataconomy Media
Big Data: Architecture and Performance Considerations in Logical Data Lakes - Denodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and goes over some performance tips. It also includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/9Jwfu6.
Short introduction to different options for ETL & ELT in the Cloud with Microsoft Azure. This is a small accompanying set of slides for my presentations and blogs on this topic
Big Data Analytics in the Cloud with Microsoft Azure - Mark Kromer
The document discusses Big Data Analytics in the Cloud using Microsoft Azure services. Key points include:
1) Azure provides tools for collecting, processing, analyzing and visualizing big data including Azure Data Lake, HDInsight, Data Factory, Machine Learning, and Power BI. These services can be used to build solutions for common big data use cases and architectures.
2) U-SQL is a language for preparing, transforming and analyzing data that allows users to focus on the what rather than the how of problems. It uses SQL and C# and can operate on structured and unstructured data.
3) Visual Studio provides an integrated environment for authoring, debugging, and monitoring U-SQL scripts and jobs.
This is a 200-level run-through of the Microsoft Azure Big Data Analytics for the Cloud data platform, based on the Cortana Intelligence Suite offerings.
Microsoft Enterprise Cube is a business performance management solution that helps telecommunications service providers integrate their disparate subscriber data sources to gain insights. It provides a single view of subscriber usage across systems to identify high-value subscribers, underutilized services, and opportunities to improve loyalty. The solution uses familiar Microsoft technologies like SQL Server, SharePoint and Office to deliver customizable reports and analytics at a low total cost of ownership. It supports compliance needs and scales to accommodate growing data storage requirements of service providers.
Microsoft Cloud BI Update 2012 for SQL Saturday Philly - Mark Kromer
This document provides an overview and update of Microsoft's Cloud Business Intelligence (BI) solutions in version 3.0 from June 2012. It discusses the objectives of Cloud BI including providing data access and answers to business questions anytime from mobile devices. An overview of the session covers Windows Azure, SQL Azure, SQL Azure Reporting Services, mobile BI delivery, cloud data integration, data mining in the cloud, and hybrid scenarios. Key features of SQL Azure like import/export, data-tier applications, data sync, and federations for database scale-out are also summarized.
What's new in SQL Server 2012 for philly code camp 2012.1 - Mark Kromer
A high-level run through the SQL Server roadmap focused on the new technologies and features of SQL Server 2012. Mark Kromer presented this deck to the Philly .NET Code Camp at Penn State Abington on May 12, 2012.
Microsoft Event Registration System Hosted on Windows Azure - Mark Kromer
This document describes a Windows Azure event registration app built in 2 weeks by 1 developer. It allows interactive check-in for live events on Windows 8 slates and mobile devices. It uses SQL Azure databases to store registration data and Windows Azure storage for photo sharing. The app provides check-in, photo viewing, and social media integration across Windows 8, Windows 7, and Windows Phone 7 platforms.
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server - Mark Kromer
These are my slides from May 2013 Philly Code Camp at Penn State Abington. I will post the samples, code and scripts on my blog here following the event this Saturday: http://www.kromerbigdata.com
This document discusses big data and SQL Server. It covers what big data is, the Hadoop environment, big data analytics, and how SQL Server fits into the big data world. It describes using Sqoop to load data between Hadoop and SQL Server, and SQL Server features for big data analytics like columnstore and PolyBase. The document concludes that a big data analytics approach is needed for massive, variable data, and that SQL Server 2012 supports this with features like columnstore and tabular SSAS.
This document discusses big data solutions and analytics. It defines big data in terms of volume, velocity, and variety of data. It contrasts big data analytics with traditional business intelligence, noting that big data looks for untapped insights rather than dashboards. It also provides examples of scalable big data platform architectures and advanced analytics capabilities. Finally, it outlines Anexinet's big data offerings including strategy, starter solutions, projects, and partnerships.
This file is intended to help with understanding and using MDX (Multi Dimensional eXpressions), the query language used for BI (Business Intelligence) modeling and querying. Although the material was written for SQL Server 2000, it remains a good resource for understanding MDX.
Microsoft SQL Server Data Warehouses for SQL Server DBAs - Mark Kromer
The document discusses Microsoft SQL Server data warehousing solutions. It provides an agenda for a presentation that includes an overview of Microsoft's data warehousing offerings, how to establish baseline metrics for Fast Track reference configurations, and how to design balanced server and storage configurations for data warehousing workloads. It also discusses software and hardware best practices, such as data striping and storage configuration recommendations. Overall, the document outlines topics and solutions to help customers accelerate their data warehouse deployments using Microsoft SQL Server.
This document compares cloud platforms Amazon Web Services (AWS) and Microsoft Azure. It finds that AWS is more oriented toward infrastructure as a service (IaaS) while Azure is more platform as a service (PaaS) oriented, though both platforms offer services across IaaS and PaaS. The document also compares specific cloud storage, databases, networking, deployment, middleware, tools and high availability/disaster recovery features between AWS and Azure.
A brief comparison of two cloud platforms, AWS and Azure. Compare Microsoft Azure services, pricing, customers, and more with Amazon AWS through these slides.
Advanced Reporting and ETL for MongoDB: Easily Build a 360-Degree View of You... - MongoDB
The document discusses Pentaho's analytics and ETL solutions for MongoDB. It provides an overview of Pentaho Company and its platform for unified business analytics and data integration. It then outlines how Pentaho can be used to build a 360-degree view of customers by extracting, transforming and loading data from source systems into MongoDB and performing analytics and reporting on the MongoDB data. It demonstrates these capabilities with examples and screenshots.
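The reporting side of that 360-degree view is easy to picture with a plain MongoDB aggregation; the sketch below uses pymongo with an invented connection string and order schema, and is not Pentaho's actual implementation.

```python
# Per-customer rollup over an orders collection: the kind of summary a
# 360-degree customer report is built on.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical connection
orders = client["crm"]["orders"]

pipeline = [
    {"$group": {
        "_id": "$customer_id",
        "order_count": {"$sum": 1},
        "lifetime_value": {"$sum": "$order_total"},
        "last_order": {"$max": "$order_date"},
    }},
    {"$sort": {"lifetime_value": -1}},
    {"$limit": 20},
]
for customer in orders.aggregate(pipeline):
    print(customer)
```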
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data - Pentaho
This document discusses a project between Pentaho and Verizon to leverage big data analytics. Verizon generates vast amounts of call detail record (CDR) data from mobile networks that is currently stored in a data warehouse for 2 years and then archived to tape. Pentaho's platform will help optimize the data warehouse by using Hadoop to store all CDR data history. This will free up data warehouse capacity for high value data and allow analysis of the full 10 years of CDR data. Pentaho tools will ingest raw CDR data into Hadoop, execute MapReduce jobs to enrich the data, load results into Hive, and enable analyzing the data to understand calling patterns by geography over time.
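Once the enriched CDRs sit in Hive, the "calling patterns by geography over time" analysis boils down to a grouped query. Here is a hedged sketch in PySpark with Hive support enabled; the table and column names are assumptions, not details from the webinar.

```python
# Monthly call volume and minutes by region over the enriched CDR history.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cdr-analysis")
         .enableHiveSupport()
         .getOrCreate())

patterns = spark.sql("""
    SELECT region,
           date_format(call_start, 'yyyy-MM') AS month,
           COUNT(*)                           AS calls,
           SUM(duration_seconds) / 60.0       AS total_minutes
    FROM   cdr_enriched
    GROUP  BY region, date_format(call_start, 'yyyy-MM')
    ORDER  BY region, month
""")
patterns.show()
```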
The document discusses Pentaho's business intelligence (BI) platform for big data analytics. It describes Pentaho as providing a modern, unified platform for data integration and analytics that allows for native integration into the big data ecosystem. It highlights Pentaho's open source development model and that it has over 1,000 commercial customers and 10,000 production deployments. Several use cases are presented that demonstrate how Pentaho helps customers unlock value from big data stores.
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ... - MongoDB
Drawing on Pentaho's wide experience in solving customers' big data issues, Davy Nys will position the importance of analytics in the IoT:
- Understanding the challenges behind data integration & analytics for IoT
- Future proofing your information architecture for IoT
- Delivering IoT analytics, now and tomorrow
- Real customer examples of where Pentaho can help
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By Pentaho - MongoDB
Dominik Claßen, Sales Engineering Team Lead at Pentaho
Drawing on Pentaho's wide experience in solving customers' big data issues, Dominik positions the importance of analytics in the IoT.
- Understanding the challenges behind data integration & analytics for IoT
- Future proofing your information architecture for IoT
- Delivering IoT analytics, now and tomorrow
- Real customer examples of where Pentaho can help
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen... - MongoDB
1) The document discusses Pentaho's beliefs around Internet of Things (IoT) analytics, including applying the right data source and processing for different analytics needs, gaining insights by blending multiple data sources on demand, and planning for agility, flexibility and near real-time analytics.
2) It describes how emerging big data use cases demand blending different data sources and provides examples like improving operations and customer experience.
3) The document advocates an Extract-Transform-Report approach for IoT analytics that provides flexibility to integrate diverse data sources and enables real-time insights.
Big Data has been a "buzz word" for a few years now, and it's generated a fair amount of hype. But, while the technology landscape is still evolving, product companies in the software, web, and hardware areas have actually led the way in delivering real value from data sources like weblogs, sensors, and social media as well as systems like Hadoop, NoSQL, and Analytical Databases. These organizations have built "Big Data Apps" that leverage fast, flexible data frameworks to solve a wide array of user problems, scale to massive audiences, and deliver superior predictive intelligence.
Join this webinar to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies. You will hear from:
- Ben Hopkins, Product Marketing Manager at Pentaho, who will discuss what Big Data means for product strategy and why it represents a new toolset for product teams to meet user needs and build competitive advantage
- Jim Stascavage, VP of Engineering at ESRG, who will discuss how his company has innovated with Big Data and predictive analytics to deliver technology products that optimize fuel consumption and maintenance cycles in the maritime and heavy industry sectors, leveraging trillions of sensor data points a year.
Who Should Attend
Product Managers, Product Marketing Managers, Project Managers, Development Managers, Product Executives, and anyone responsible for addressing customer needs & influencing product strategy.
How advanced analytics is impacting the banking sector - Michael Haddad
The document discusses how advanced analytics is impacting the banking sector. It covers topics like regulatory changes forcing banks to invest in compliance; new digital technologies changing how customers interact with banks; and data analytics helping banks reduce risk, deliver personalized services, and retain skills. It also discusses Hitachi Data Systems' acquisition of Pentaho and how their combined platform can provide unified data integration and business analytics across structured, unstructured, and streaming data sources.
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI - Denodo
Watch full webinar here: https://bit.ly/3zVJRRf
According to Dresner Advisory’s 2020 Self-Service Business Intelligence Market Study, 62% of the responding organizations say self-service BI is critical for their business. If we look deeper into the need for today’s self-service BI, it’s beyond some Executives and Business Users being enabled by IT for self-service dashboarding or report generation. Predictive analytics, self-service data preparation, collaborative data exploration are all different facets of new generation self-service BI. While democratization of data for self-service BI holds many benefits, strict data governance becomes increasingly important alongside.
In this session we will discuss:
- The latest trends and scopes of self-service BI
- The role of logical data fabric in self-service BI
- How Denodo enables self-service BI for a wide range of users
- Customer case study on self-service BI
Open Analytics 2014 - Pedro Alves - Innovation through Open Source - OpenAnalytics Spain
Delivering the Future of Analytics: Innovation through Open Source. Pentaho was born out of the desire to achieve positive, disruptive change in the business analytics market, dominated by bureaucratic megavendors offering expensive heavy-weight products built on outdated technology platforms. Pentaho's open, embeddable data integration and analytics platform was developed with a strong open source heritage. This gave Pentaho a first-mover advantage to engage early with adopters of big data technologies and solve the difficult challenges of integrating both established and emerging data types to drive analytics. Continued technology innovations to support the big data ecosystem have kept customers ahead of the big data curve. With the ability to drastically reduce the time to design, develop and deploy big data solutions, Pentaho counts numerous big data customers, both large and small, across the financial services, retail, travel, healthcare and government industries around the world.
The document outlines Pentaho's roadmap and focus areas for business analytics products. It discusses enhancements planned for Pentaho Business Analytics 5.1, including new features for analyzing MongoDB data and improved visualizations. It also summarizes R&D activities like integrating real-time data processing with Storm and Spark. The roadmap focuses on hardening the Pentaho platform for large enterprises, extending capabilities for big data engineering and analytics, and improving embedded analytics.
Check out this presentation from Pentaho and ESRG to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies.
Learn more in the brief that inspired the presentation, Product Innovation with Big Data: http://www.pentaho.com/resources/whitepaper/product-innovation-big-data
BAR360 open data platform presentation at DAMA, Sydney - Sai Paravastu
Sai Paravastu discusses the benefits of using an open data platform (ODP) for enterprises. The ODP would provide a standardized core of open source Hadoop technologies like HDFS, YARN, and MapReduce. This would allow big data solution providers to build compatible solutions on a common platform, reducing costs and improving interoperability. The ODP would also simplify integration for customers and reduce fragmentation in the industry by coordinating development efforts.
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1) - Vishal Bamba
Transamerica is an insurance company with two business units focused on investments/retirement and life/protection. They have a rich data environment across systems but lack an enterprise view. Their POC used Cloudera, Informatica, and Datameer on a 10 node Hadoop cluster to integrate data from various sources, perform data quality/cleansing, create customer profiles, and run analytics/visualizations to power use cases like prospect scoring. Key lessons included investing in a POC, partnering with vendors, establishing governance over managed/curated data, and aligning with larger strategies.
This document discusses Saxo Bank's plans to implement a data governance solution called the Data Workbench. The Data Workbench will consist of a Data Catalogue and Data Quality Solution to provide transparency into Saxo's data ecosystem and improve data quality. The Data Catalogue will be built using LinkedIn's open source DataHub tool, which provides a metadata search and UI. The Data Quality Solution will use Great Expectations to define and monitor data quality rules. The document discusses why a decentralized, domain-driven approach is needed rather than a centralized solution, and how the Data Workbench aims to establish governance while staying lean and iterative.
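For a flavour of what defining data quality rules with Great Expectations looks like, here is a minimal sketch using the library's older pandas-dataset style; the API has changed considerably across releases, and the column names are invented, so treat this as illustrative only.

```python
# Two simple expectations over a toy trades dataset (older Great Expectations API style).
import pandas as pd
import great_expectations as ge

trades = ge.from_pandas(pd.DataFrame({
    "trade_id": [101, 102, None],
    "price": [10.5, -1.0, 3.2],
}))

trades.expect_column_values_to_not_be_null("trade_id")
trades.expect_column_values_to_be_between("price", min_value=0)

result = trades.validate()
print(result.success)   # False here: one missing id and one negative price
```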
Enabling Next Gen Analytics with Azure Data Lake and StreamSets - Streamsets Inc.
This document discusses enabling next generation analytics with Azure Data Lake. It provides definitions of big data and discusses how big data is a cornerstone of Cortana Intelligence. It also discusses challenges with big data like obtaining skills and determining value. The document then discusses Azure HDInsight and how it provides a cloud Spark and Hadoop service. It also discusses StreamSets and how it can be used for data movement, deployed on an Azure VM or a local machine. Finally, it discusses a use case of StreamSets at a major bank to move data from on-premises systems to Azure Data Lake and consolidate migration tools.
When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Data & Analytics with CIS & Microsoft Platforms - Sonata Software
Sonata Software provides data and analytics services using Microsoft platforms and technologies. They help customers leverage data to drive intelligent actions and personalization at scale. Sonata has expertise in data warehousing, business analytics, AI, machine learning, and developing industry-specific analytics solutions and AI accelerators on the Microsoft stack. They assist customers with data strategy, analytics, visualization, and migrating to Azure-based platforms.
Pentaho 7.0 aims to bridge the gap between data preparation and analytics by allowing analytics from anywhere in the data pipeline. It brings analytics into data prep workflows, enables sharing analytics during prep, and improves reporting. It also provides enhanced support for big data technologies like Spark, Hadoop security, and metadata injection to automate data onboarding. A demo shows the ability to visually inspect data during prep to identify issues. Analysts say this allows more collaboration between business and IT and accelerates insights.
Fabric Data Factory Pipeline Copy Perf Tips.pptx - Mark Kromer
This document provides performance tips for pipelines and copy activities in Azure Data Factory (ADF). It discusses:
- Using pipelines for data orchestration with conditional execution and parallel activities.
- The Copy activity provides massive-scale data movement within pipelines. Using Copy for ELT can land data quickly into a data lake.
- Gaining more throughput by using multiple parallel Copy activities but this can overload the source.
- Optimizing copy performance by using binary format, file lists/folders instead of individual files, and SQL source partitioning.
- Metrics showing copying Parquet files to a lakehouse at 5.1 GB/s while CSV and SQL loads were slower due to transformation.
Build data quality rules and data cleansing into your data pipelines - Mark Kromer
This document provides guidance on building data quality rules and data cleansing into data pipelines. It discusses considerations for data quality in data warehouse and data science scenarios, including verifying data types and lengths, handling null values, domain value constraints, and reference data lookups. It also provides examples of techniques for replacing values, splitting data based on values, data profiling, pattern matching, enumerations/lookups, de-duplicating data, fuzzy joins, validating metadata rules, and using assertions.
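The checks listed above translate naturally into Spark DataFrame operations, which is the kind of engine these pipelines typically run on. A minimal PySpark sketch follows, with the file paths, column names, and reference table assumed for illustration.

```python
# Data-quality rules expressed as DataFrame transformations.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-rules").getOrCreate()
orders = spark.read.parquet("/lake/raw/orders")           # hypothetical input
countries = spark.read.parquet("/lake/ref/countries")     # reference data for lookups

cleaned = (orders
    # Null handling: default missing quantities, drop rows with no business key.
    .fillna({"quantity": 0})
    .dropna(subset=["order_id"])
    # Domain value constraint: status must come from a known set.
    .filter(F.col("status").isin("NEW", "SHIPPED", "CANCELLED"))
    # Type and length checks: cast to the target schema, trim to two characters.
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("country_code", F.upper(F.substring("country_code", 1, 2)))
    # De-duplicate on the business key.
    .dropDuplicates(["order_id"]))

# Reference-data lookup: split valid rows from rejects instead of dropping them.
joined = cleaned.join(countries, "country_code", "left")
valid = joined.filter(F.col("country_name").isNotNull())
rejects = joined.filter(F.col("country_name").isNull())
```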
Mapping Data Flows Training deck Q1 CY22 - Mark Kromer
Mapping data flows allow for code-free data transformation at scale using an Apache Spark engine within Azure Data Factory. Key points:
- Mapping data flows can handle structured and unstructured data using an intuitive visual interface without needing to know Spark, Scala, Python, etc.
- The data flow designer builds a transformation script that is executed on a JIT Spark cluster within ADF. This allows for scaled-out, serverless data transformation.
- Common uses of mapping data flows include ETL scenarios like slowly changing dimensions, analytics tasks like data profiling, cleansing, and aggregations.
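Under the covers, a simple data flow (source, derived column, aggregate, sink) compiles to ordinary Spark work. The PySpark below is only an illustration of that underlying work, with file paths and column names invented; in ADF the equivalent is designed visually and the Spark execution is generated for you.

```python
# What a basic mapping data flow amounts to on the Spark engine.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataflow-equivalent").getOrCreate()

sales = spark.read.csv("/lake/raw/sales.csv", header=True, inferSchema=True)    # source
enriched = sales.withColumn("net_amount", F.col("amount") - F.col("discount"))  # derived column
summary = (enriched.groupBy("region")                                            # aggregate
           .agg(F.sum("net_amount").alias("net_sales"),
                F.countDistinct("customer_id").alias("customers")))
summary.write.mode("overwrite").parquet("/lake/curated/sales_by_region")         # sink
```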
Data cleansing and prep with synapse data flows - Mark Kromer
This document provides resources for data cleansing and preparation using Azure Synapse Analytics Data Flows. It includes links to videos, documentation, and a slide deck that explain how to use Data Flows for tasks like deduplicating null values, saving data profiler summary statistics, and using metadata functions. A GitHub link shares a tutorial document for a hands-on learning experience with Synapse Data Flows.
Data cleansing and data prep with synapse data flows - Mark Kromer
This document contains links to resources about using Azure Synapse Analytics for data cleansing and preparation with Data Flows. It includes links to videos and documentation about removing null values, saving data profiler summary statistics, and using metadata functions in Azure Data Factory data flows.
Mapping Data Flows Perf Tuning April 2021 - Mark Kromer
This document discusses optimizing performance for data flows in Azure Data Factory. It provides sample timing results for various scenarios and recommends settings to improve performance. Some best practices include using memory optimized Azure integration runtimes, maintaining current partitioning, scaling virtual cores, and optimizing transformations and sources/sinks. The document also covers monitoring flows to identify bottlenecks and global settings that affect performance.
This document discusses using Azure Data Factory (ADF) for data lake ETL processes in the cloud. It describes how ADF can ingest data from on-premises, cloud, and SaaS sources into a data lake for preparation, transformation, enrichment, and serving to downstream analytics or machine learning processes. The document also provides several links to YouTube videos and articles about using ADF for these tasks.
Azure Data Factory Data Wrangling with Power Query - Mark Kromer
Azure Data Factory now allows users to perform data wrangling tasks through Power Query activities, translating M scripts into ADF data flow scripts executed on Apache Spark. This enables code-free data exploration, preparation, and operationalization of Power Query workflows within ADF pipelines. Examples of use cases include data engineers building ETL processes or analysts operationalizing existing queries to prepare data for modeling, with the goal of providing a data-first approach to building data flows and pipelines in ADF.
Azure Data Factory Data Flow Performance Tuning 101 - Mark Kromer
The document provides performance timing results and recommendations for optimizing Azure Data Factory data flows. Sample 1 processed a 421MB file with 887k rows in 4 minutes using default partitioning on an 80-core Azure IR. Sample 2 processed a table with the same size and transforms in 3 minutes using source and derived column partitioning. Sample 3 processed the same size file in 2 minutes with default partitioning. The document recommends partitioning strategies, using memory optimized clusters, and scaling cores to improve performance.
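Partitioning is the main lever behind numbers like these. A rough PySpark analogue of the "repartition on a key versus keep the current partitioning" trade-off follows; the paths and partition count are illustrative assumptions only.

```python
# Repartition only when a downstream wide operation benefits; an unnecessary
# shuffle usually costs more than it saves.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()
events = spark.read.parquet("/lake/raw/events")

print(events.rdd.getNumPartitions())    # partitioning inherited from the source

by_customer = events.repartition(80, "customer_id")
(by_customer.groupBy("customer_id").count()
 .write.mode("overwrite").parquet("/lake/out/counts_by_customer"))
```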
Azure Data Factory Data Flows Training (Sept 2020 Update) - Mark Kromer
Mapping data flows allow for code-free data transformation using an intuitive visual interface. They provide resilient data flows that can handle structured and unstructured data using an Apache Spark engine. Mapping data flows can be used for common tasks like data cleansing, validation, aggregation, and fact loading into a data warehouse. They allow transforming data at scale through an expressive language without needing to know Spark, Scala, Python, or manage clusters.
Data quality patterns in the cloud with ADF - Mark Kromer
Azure Data Factory can be used to build modern data warehouse patterns with Azure SQL Data Warehouse. It allows extracting and transforming relational data from databases and loading it into Azure SQL Data Warehouse tables optimized for analytics. Data flows in Azure Data Factory can also clean and join disparate data from Azure Storage, Data Lake Store, and other data sources for loading into the data warehouse. This provides simple and productive ETL capabilities in the cloud at any scale.
Azure Data Factory Data Flows Training v005 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows building data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be authored and designed visually, with transformations, expressions, and results previews, and then operationalized with Data Factory scheduling, monitoring, and control flow.
Data Quality Patterns in the Cloud with Azure Data Factory - Mark Kromer
This document discusses data quality patterns when using Azure Data Factory (ADF). It presents two modern data warehouse patterns that use ADF for orchestration: one using traditional ADF activities and another leveraging ADF mapping data flows. It also provides links to additional resources on ADF data flows, data quality patterns, expressions, performance, and connectors.
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be operationalized with Data Factory's scheduling, control flow, and monitoring capabilities.
ADF Mapping Data Flows Training Slides V1 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine to transform data at scale in the cloud in a resilient manner for big data scenarios involving unstructured data. Mapping Data Flows can be operationalized with Azure Data Factory's scheduling, control flow, and monitoring capabilities.
Azure Data Factory ETL Patterns in the Cloud - Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
Big Data Analytics Quick Research Guide by Arthur Morgan - Arthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
AI and Data Privacy in 2025: Global Trends - InData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
- AI and data privacy: Key findings
- Statistics on AI data privacy in today's world
- Tips on how to overcome data privacy challenges
- Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker... - TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I... - Impelsys Inc.
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On... - Aqusag Technologies
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ... - SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However, more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI, the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level, AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights has integrated its Historic Procurement Industry Archives and serves as a powerful complement, not a competitor, to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
Semantic Cultivators: The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI, but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a LinkedIn webinar for Tecnovy on 28.04.2025.
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around the adoption of GenAI in business: benefits, opportunities, and limitations. I also discussed how my research on the Theory of Cognitive Chasms helps address some of these issues.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and support a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
#3: Pentaho 5.0 reinforces Pentaho’s mission of delivering the future of analytics. Pentaho has continued to invest in BI and DI together, with over 100 new features in PDI and over 250 in the platform overall.
Continued investments in big data include new integrations, specifically with MongoDB and Cassandra, and continue to shield customers from changes in the market.
An open core and pluggable platform allows Pentaho to innovate quickly.
Pentaho is battle-tested, with over 1,200 commercial customers.
#4: Icons are nice and the build-order is great!
My suggestion for the top 3 icons on the left-hand side:
Customer
Provisioning
Billing
Suggestion for the bottom 3 icons:
Web
Network
Social Media
(note: Location seems to be important to AT&T but we can just mention this)
I need to come up with an explanation for why the arrow below “Just in Time Integration” is bi-directional instead of just flowing to Analytics.
#14: Reference Architecture Notes
Financial services company: ingest data from various sources into a single Big Data store, then process and summarize the data at the customer unique ID level
Information is available in the call center application for service, accessible by research analysts, and leveraged in predictive applications as well
Pentaho Data Integration can ingest into NoSQL, pull out of NoSQL, and connect to Pentaho Business Analytics for end-user needs (a sketch of the per-customer summarization step follows below)
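In the architecture described above, the per-customer roll-up would normally be built as a PDI transformation, but to make the idea concrete, here is a minimal sketch in Python of the same summarization against a MongoDB store. The collection names (raw_interactions, customer_summary) and fields (customer_id, channel, amount) are hypothetical examples, not taken from the reference architecture.

# Hedged sketch: per-customer summarization into a NoSQL store.
# Collection and field names below are hypothetical examples.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["bigdata_store"]

pipeline = [
    # Roll raw events up to one summary document per customer unique ID.
    {"$group": {
        "_id": "$customer_id",
        "interactions": {"$sum": 1},
        "total_spend": {"$sum": "$amount"},
        "channels": {"$addToSet": "$channel"},
    }},
    # Write the summaries where the call center app and analysts can read them.
    {"$merge": {"into": "customer_summary", "whenMatched": "replace"}},
]
db["raw_interactions"].aggregate(pipeline)

In the actual deployment this step would live inside a PDI transformation, with Pentaho Business Analytics reporting on the resulting customer_summary collection.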
#17: Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards; by contrast, vertical scaling adds more capacity to a single server. Each shard is an independent database, and collectively, the shards make up a single logical database.
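As a minimal sketch of how this looks in practice, assuming a cluster fronted by a mongos router on localhost and a hypothetical analyticsdb.events collection keyed on customer_id, the admin commands are roughly:

# Hedged sketch: enabling sharding for a database and collection.
# Database, collection, and key names are hypothetical examples.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect to the mongos router, not an individual shard

# Enable sharding on the database, then shard the collection on a hashed key
# so documents are spread evenly across the shards.
client.admin.command("enableSharding", "analyticsdb")
client.admin.command("shardCollection", "analyticsdb.events",
                     key={"customer_id": "hashed"})

# Reads and writes still go through mongos, which routes each operation
# to the shard(s) owning the relevant chunks of the key range.
events = client["analyticsdb"]["events"]
events.insert_one({"customer_id": 12345, "action": "page_view"})
print(events.count_documents({"customer_id": 12345}))

Each shard behind the router holds its own slice of the data, which is what lets the shards collectively behave as the single logical database described above.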